Tuesday, 17 December 2019

Regex pattern for indexed document titles

I have a long document whose titles carry hierarchical numeric indexes of varying depth, for example 8.1, 8.1.1, ... 8.1.1.1.1.1.1, such as:

<h1 class="topicTitle-h1">8.12.1.1.12.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1 title01</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1 title02</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.2.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1.3.2.3.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1.1.1 title05</h1>
<h1 class="topicTitle-h1">8.1.4.2.5.9.3 title03</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.1 title06</h1>
<h1 class="topicTitle-h1">8.1.11.12.14.3.1 title03</h1>

I tried to extract only title03 with the regular expression re.search(r'\">\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3} (.*)</h1>',x), but it matches every title without exception instead of matching only the titles whose index has the form d.d.d.d.d.d.d (seven levels).

Thanks in advance.
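
One possible approach, offered as a sketch rather than a definitive answer: in a regular expression an unescaped `.` matches any character, not only a literal dot, which may be part of why the pattern above is looser than intended. Escaping the dots and requiring exactly six `.number` groups after the first number restricts the match to seven-level indexes. The names `SEVEN_LEVEL`, `seven_level_titles`, and `sample` are hypothetical, and the sketch assumes the title text never contains a `<` character.

```python
import re

# Exactly seven dot-separated numeric levels, a space, then the title text.
# Each dot is escaped so it matches only a literal '.'.
SEVEN_LEVEL = re.compile(r'">(\d{1,3}(?:\.\d{1,3}){6}) ([^<]*)</h1>')


def seven_level_titles(html):
    """Return (index, title) pairs for headings whose index has seven levels."""
    return SEVEN_LEVEL.findall(html)


# A few of the headings from the question, used as sample input.
sample = '''<h1 class="topicTitle-h1">8.12.1.1.12.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1 title01</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.1 title06</h1>
<h1 class="topicTitle-h1">8.1.11.12.14.3.1 title03</h1>'''

for index, title in seven_level_titles(sample):
    print(index, title)
```

Because the pattern requires a space immediately after the seventh number, indexes with fewer or more than seven levels are skipped.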
