Saturday, March 11, 2017

Python regex pattern to select/filter URLs

links = ['http://ift.tt/2nodMfW', 'http://ift.tt/2my50gc', 'http://ift.tt/10xcuxv', 'https://www.twitter.com/NPR']

Objective: keep only the links that contain the (/yyyy/mm/dd/ddddddddd/) format, e.g. /2017/03/10/519650091/

For some reason I cannot get it right: the result always still includes the Facebook, Twitter, and 2017/03/20170311-format links.

import re

sel_links = []

def selectedLinks(links):
    r = re.compile("^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$")
    for link in links:
        if r.search(link) != "None":
            sel_links.append(link)
    return set(sel_links)

selectedLinks(links)
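Two things in the attempt above probably explain why nothing gets filtered out. First, `r.search(link)` returns either a match object or the `None` object, never the string `"None"`, so the `!= "None"` comparison is always true and every link is appended. Second, even with that fixed, the `^` and `$` anchors require the whole string to be just the date path, which a full URL never is. A minimal sketch of one possible fix follows. The `example.com` URLs are hypothetical stand-ins (the links in the question are shortened, so their real targets are unknown), `selected_links` and `DATE_ID_PATTERN` are illustrative names, and the trailing slash in the pattern is an assumption taken from the stated format.

```python
import re

# Hypothetical sample data: stand-ins for the shortened links in the question,
# chosen only to exercise the pattern.
links = [
    "http://www.example.com/2017/03/10/519650091/some-story",
    "http://www.example.com/2017/03/20170311",
    "https://www.facebook.com/NPR",
    "https://www.twitter.com/NPR",
]

# /yyyy/mm/dd/ followed by exactly nine digits and a closing slash.
# No ^ or $ anchors, so the segment may appear anywhere inside the URL.
DATE_ID_PATTERN = re.compile(r"/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9}/")


def selected_links(links):
    """Return the set of links containing a /yyyy/mm/dd/ddddddddd/ segment."""
    # search() yields a match object (truthy) or None (falsy),
    # so the result can be tested directly.
    return {link for link in links if DATE_ID_PATTERN.search(link)}


print(selected_links(links))
```

Building the set inside the function also avoids the module-level `sel_links` list, which would otherwise keep growing across repeated calls.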
