Thursday, 14 June 2018

PHP Regular Expression pick up matches only after certain word in text

This is my first question here. :) I have been searching for a solution to this problem for a few days, but it is not fully solved yet. What I have is a bunch of HTML code. I would like to extract the prices of some goods from a local ad site, to analyze them collectively later. But this site duplicates a few ads as a paid promotion, so when my code runs it collects prices from both the promoted and the regular ads, which creates duplicates in the results. I'd like to avoid that. The only way to tell them apart is to search for matches only after the top paid ads are over. That point is marked by the "promoted-after" piece of text. So here is my RegEx:

'/<p\s*class="price">\s*<strong>([\d $€\.]*)\s*<\/strong>/i'

It works great for ALL the prices it finds, including the duplicates from paid ads. But when I modify it to:

'/promoted-after.*\G<p\s*class="price">\s*<strong>([\d $грн€\.]*)\s*<\/strong>/is'

It correctly skips the top part, but then saves only the last price of all the ads. How can it be modified to correctly save all the prices AFTER the "promoted-after" tag? Here is an example of the input:

<p>a lot of some html code here</p>
<p class="price">   <strong>2680 $</strong>
<div>a lot of some random html code here</div>
<p class="price">   <strong>3250 $</strong>
<p>a lot of some good html code here</p>
<p class="price">   <strong>3450 $</strong>
<div id="promoted-after"></div>
<p class="price">   <strong>400 $</strong>
<td>a lot of some strange html code here</td>
<p class="price">   <strong>401 $</strong>
<div>a lot of some awesome html code here</div>
<p class="price">   <strong>402 $</strong>
<span>a lot of some ugly html code here</span>
<p class="price">   <strong>403 $</strong>
<div>a lot of some nice html code here</div>
<p class="price">   <strong>404 $</strong>
<table>a lot of some best html code here</table>

P.S. I use preg_match_all
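
For illustration, here is a minimal sketch of two possible ways to collect only the prices that follow the marker. It is not a definitive solution. It assumes the marker text "promoted-after" appears exactly once and before every regular ad, as in the sample input. The variable names ($html, $pricePattern, $chained and so on) are made up for this example. The first approach finds the marker with strpos and passes its byte offset to preg_match_all as the starting position. The second uses a single pattern in which \G chains each match to the end of the previous one, so the first match has to start at the marker.

```php
<?php
// Sample input copied from the question.
$html = <<<'HTML'
<p>a lot of some html code here</p>
<p class="price">   <strong>2680 $</strong>
<div>a lot of some random html code here</div>
<p class="price">   <strong>3250 $</strong>
<p>a lot of some good html code here</p>
<p class="price">   <strong>3450 $</strong>
<div id="promoted-after"></div>
<p class="price">   <strong>400 $</strong>
<td>a lot of some strange html code here</td>
<p class="price">   <strong>401 $</strong>
<div>a lot of some awesome html code here</div>
<p class="price">   <strong>402 $</strong>
<span>a lot of some ugly html code here</span>
<p class="price">   <strong>403 $</strong>
<div>a lot of some nice html code here</div>
<p class="price">   <strong>404 $</strong>
<table>a lot of some best html code here</table>
HTML;

// The first price pattern from the question, unchanged.
$pricePattern = '/<p\s*class="price">\s*<strong>([\d $€\.]*)\s*<\/strong>/i';

// Approach 1: find the marker, then start matching at its byte offset.
$prices = [];
$offset = strpos($html, 'promoted-after');
if ($offset !== false) {
    // The 5th argument tells preg_match_all where in the subject to begin.
    preg_match_all($pricePattern, $html, $m, PREG_PATTERN_ORDER, $offset);
    $prices = $m[1];
}

// Approach 2: one pattern. A match must begin either at the marker or
// exactly where the previous match ended (\G). The (?!\A) lookahead stops
// \G from matching at the very start of the subject. The lazy .*? skips
// ahead only as far as the next price.
$chained = '/(?:promoted-after|\G(?!\A)).*?<p\s*class="price">\s*<strong>([\d $€\.]*)\s*<\/strong>/is';
preg_match_all($chained, $html, $m2);
$pricesChained = $m2[1];

print_r($prices);
print_r($pricesChained);
```

On the sample input, both arrays should hold the five prices from "400 $" to "404 $" and none of the three promoted ones. The offset variant leaves the original pattern untouched, which may make it the easier one to maintain.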
