Saturday, 10 February 2018

Python HTML parsing issue

Given an HTML page, I would like to extract only a list of pairs like (id1, value1), (id2, value2), ... The page looks like this:

    <div class="col m3 s12 col_title"><div class="font-small grey-text truncate content" title="value1">value1</div></div>
    <div class="col m7 s12 col_id"><div class="content wrap">id1</div></div>

Every value is followed by a "content wrap" div that contains its id. I was thinking of something like:

    import re
    match = re.compile(r'title="(.+?)".+?wrap">(.+?)<', re.DOTALL).findall(source)  # [(value, id), ...]
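A regex-free alternative might look like the sketch below, which uses the standard library's `html.parser`. It assumes every value sits in a `title` attribute of a div and every id is the text of a div whose class is exactly `content wrap`, as in the snippet above. The class name `PairParser` and the sample `source` string (including its second record) are made up for illustration.

```python
from html.parser import HTMLParser


class PairParser(HTMLParser):
    """Collects (id, value) pairs; hypothetical helper, not a library class."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # resulting (id, value) tuples
        self._value = None     # last title="..." seen, waiting for its id
        self._in_id = False    # True while inside a "content wrap" div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "title" in attrs:
            # Assumption: the title attribute carries the value.
            self._value = attrs["title"]
        elif classes == ["content", "wrap"]:
            self._in_id = True

    def handle_data(self, data):
        text = data.strip()
        if self._in_id and self._value is not None and text:
            self.pairs.append((text, self._value))
            self._value = None

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_id = False


# Sample input modelled on the snippet in the question (assumed, not real data).
source = """
<div class="col m3 s12 col_title"><div class="font-small grey-text truncate content" title="value1">value1</div></div>
<div class="col m7 s12 col_id"><div class="content wrap">id1</div></div>
<div class="col m3 s12 col_title"><div class="font-small grey-text truncate content" title="value2">value2</div></div>
<div class="col m7 s12 col_id"><div class="content wrap">id2</div></div>
"""

parser = PairParser()
parser.feed(source)
print(parser.pairs)
```

Unlike the regex, this does not depend on attribute order or on how the tags are split across lines, though it still relies on each value appearing before its id.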

Thanks,
