I have a long document whose titles are numbered hierarchically, in ascending depth, for example 8.1, 8.1.1, ... 8.1.1.1.1.1.1.
The headings look like this:
<h1 class="topicTitle-h1">8.12.1.1.12.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1 title01</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1 title02</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.2.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1.3.2.3.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1.1.1 title05</h1>
<h1 class="topicTitle-h1">8.1.4.2.5.9.3 title03</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.1 title06</h1>
<h1 class="topicTitle-h1">8.1.11.12.14.3.1 title03</h1>
I tried to extract only the title03 entries with the regular expression re.search(r'\">\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3} (.*)</h1>',x)
but it matches every title without exception, instead of matching only those numbered d.d.d.d.d.d.d (seven levels).
Thanks in advance.
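
One thing worth checking is that an unescaped `.` in a regular expression matches any character, not only a literal dot. Below is a minimal sketch of one possible approach, assuming every heading has the exact `<h1 class="topicTitle-h1">` form shown above and that each number component is 1 to 3 digits. The names `SEVEN_LEVELS` and `seven_level_titles` are made up for illustration.

```python
import re

# Sample headings copied from the question.
html = """\
<h1 class="topicTitle-h1">8.12.1.1.12.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1 title01</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1 title02</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.2.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1.3.2.3.1 title03</h1>
<h1 class="topicTitle-h1">8.1.1.1.1 title05</h1>
<h1 class="topicTitle-h1">8.1.4.2.5.9.3 title03</h1>
<h1 class="topicTitle-h1">8.1.1.1.1.1 title06</h1>
<h1 class="topicTitle-h1">8.1.11.12.14.3.1 title03</h1>
"""

# Assumption: "seven levels" means one number followed by exactly six
# ".number" groups. The dots are escaped so they match only a literal ".",
# and the single space after the last group stops longer numbers from matching.
SEVEN_LEVELS = re.compile(r'">(\d{1,3}(?:\.\d{1,3}){6}) ([^<]*)</h1>')


def seven_level_titles(text):
    """Return (number, title) pairs for headings with exactly seven levels."""
    return SEVEN_LEVELS.findall(text)


matches = seven_level_titles(html)
for number, title in matches:
    print(number, title)
```

Using `findall` on the whole text returns every seven-level heading at once, whereas `re.search` stops at the first match. For anything beyond this fixed heading format, an HTML parser would likely be more robust than a regex.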