Saturday, April 20, 2019

How to return whole non-Latin strings matching a reduplication pattern, such as AAB or ABB

I am working with strings of non-Latin characters. I want to match strings with reduplication patterns, such as AAB, ABB, ABAB, etc. I tried the following code:

import re

patternAAB = re.compile(r'\b(\w)\1\w\b')
match = patternAAB.findall(rawtext)
print(match) 

However, it returns only the first character of each matched string. I know this happens because of the capturing parentheses around the first \w.

I tried adding capturing parentheses around the whole matched block, but Python gives

error: cannot refer to an open group at position 7
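The position in this message is consistent with a pattern like `\b((\w)\1\w)\b`, where the new outer parentheses become group 1 and `\1` then refers to a group that is still open. A minimal sketch of that situation and of one possible fix (renumbering the backreference to `\2`), assuming that is the pattern that was tried; `rawtext` is a hypothetical sample, not the real input:

```python
import re

rawtext = "哈哈笑"  # hypothetical sample input

# The outer group is now group 1, so \1 points at a group that has not
# been closed yet when the backreference is reached.
try:
    re.compile(r'\b((\w)\1\w)\b')
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# Renumbered: group 1 is the whole string, group 2 is the repeated character.
pattern_aab = re.compile(r'\b((\w)\2\w)\b')
pairs = pattern_aab.findall(rawtext)            # list of (whole, A) tuples
whole_matches = [whole for whole, _ in pairs]   # keep only the whole strings
print(whole_matches)
```

With more than one group in the pattern, `findall` returns tuples, so the whole string has to be picked out of each tuple.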

I also found this method, but it didn't work for me:

patternAAB = re.compile(r'\b(\w)\1\w\b')
match = patternAAB.search(rawtext)
if match:
    print(match.group(1))
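A likely reason this version also prints a single character is that `group(1)` is the first capture, while `group(0)` (or `group()` with no argument) is the entire match. A small sketch of the difference, with `rawtext` set to a hypothetical sample:

```python
import re

rawtext = "哈哈笑"  # hypothetical sample input

pattern_aab = re.compile(r'\b(\w)\1\w\b')
match = pattern_aab.search(rawtext)
if match:
    first_char = match.group(1)  # only the captured character A
    whole = match.group(0)       # the entire AAB string
    print(first_char, whole)
```

Note that `search` stops at the first match, so this only helps when a single result is enough.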

How could I match the pattern and return the whole matching string?

# Ex. 哈哈笑 
# string matches AAB pattern so my code returns 哈 
# but not the entire string
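One way to get every whole match, sketched under assumptions: `finditer` yields match objects, and `group(0)` on each gives the full string regardless of how many capture groups the pattern has. The `PATTERNS` table, the `find_reduplications` helper, and the sample text are all hypothetical names and data chosen for illustration.

```python
import re

# Hypothetical pattern table; the keys are illustrative labels only.
PATTERNS = {
    "AAB": re.compile(r'\b(\w)\1\w\b'),
    "ABB": re.compile(r'\b\w(\w)\1\b'),
    "ABAB": re.compile(r'\b(\w)(\w)\1\2\b'),
}

def find_reduplications(text, kind):
    """Return every whole string in text that matches the named pattern."""
    return [m.group(0) for m in PATTERNS[kind].finditer(text)]

sample = "哈哈笑 你好 笑哈哈 研究研究"  # hypothetical, space-separated words
for kind in PATTERNS:
    print(kind, find_reduplications(sample, kind))
```

Two caveats apply to this sketch. The patterns do not require A and B to differ, so a string like AAA would also count as AAB. And `\b` only matches where a word character meets a non-word character, so the words need to be separated by spaces or punctuation for these patterns to find them.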
