Monday, November 4, 2019

Non-greedy wild card appears to match greedily?

I need to understand why my regular expression matches greedily when I have specified a non-greedy quantifier.

Given string = '.GATA..GATA..ETS..ETS.'

I want to return the shortest substring of the form GATA...ETS.

I use the regex pattern r'(GATA).*?(ETS)':

import re

pattern = r'(GATA).*?(ETS)'
syntax_finder = re.compile(pattern, re.IGNORECASE)

for match in syntax_finder.finditer(string):
    print(match)

This prints <re.Match object; span=(1, 16), match='GATA..GATA..ETS'>

However, I want it to return 'GATA..ETS'.

Does anyone know why this is happening?

I am not looking for a solution to this exact matching problem. I will be doing a lot of these types of searches with more complicated patterns of GATA and ETS, but I will always want it to return the shortest match.
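To make the behaviour concrete, here is a minimal sketch that reproduces it. As far as I understand, the engine scans left to right and commits to the leftmost position where a match can start, and `.*?` only keeps the match short from that starting point. The second pattern is an assumption on my part rather than a confirmed general solution: it adds a negative lookahead so that the gap between the two motifs cannot run across another GATA. The names `lazy`, `tempered`, and `matched_text` are made up for this illustration.

```python
import re

string = '.GATA..GATA..ETS..ETS.'

# Original pattern: the quantifier is lazy, but the match still starts at the
# leftmost GATA and then extends to the first ETS reachable from there.
lazy = re.compile(r'(GATA).*?(ETS)', re.IGNORECASE)

# Hypothetical alternative: each character in the gap is consumed only if
# another GATA does not begin at that position, so a match cannot span a
# second GATA.
tempered = re.compile(r'(GATA)(?:(?!GATA).)*?(ETS)', re.IGNORECASE)


def matched_text(regex, text):
    """Return the full matched substring of every non-overlapping match."""
    return [m.group(0) for m in regex.finditer(text)]


print(matched_text(lazy, string))      # ['GATA..GATA..ETS']
print(matched_text(tempered, string))  # ['GATA..ETS']
```

For more complicated patterns, the lookahead would presumably need to exclude whatever the opening motif is in each case.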

Thanks!
