I need to match a very large list of keywords (>1000000) against a string efficiently using Python. I found some really good libraries that try to do this fast:
1) FlashText (http://ift.tt/2guItlC)
2) Aho-Corasick Algorithm etc.
However, I have a peculiar requirement: in my context, a keyword such as 'XXXX YYYY' should return a match if my string is ' XXXX is a very good indication of YYYY'. Note that 'XXXX YYYY' does not occur as a substring here, but XXXX and YYYY are both present in the string, and that is good enough for a match.
I know how to do it naively. What I am looking for is efficiency. Are there any other good libraries for this?
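To make the matching rule concrete, here is a minimal sketch of one way it could be expressed with an inverted index from words to keywords. It assumes words are separated by whitespace and compared case-sensitively. The names `build_index` and `find_matches` are hypothetical helpers, not part of FlashText or any Aho-Corasick library, and this has not been benchmarked at the >1000000 keyword scale.

```python
from collections import defaultdict


def build_index(keywords):
    """Map each distinct word to the ids of the keywords that contain it."""
    index = defaultdict(list)
    sizes = []  # number of distinct words in each keyword
    for kw_id, keyword in enumerate(keywords):
        words = set(keyword.split())  # assumption: whitespace tokenization
        sizes.append(len(words))
        for word in words:
            index[word].append(kw_id)
    return index, sizes


def find_matches(text, keywords, index, sizes):
    """Return keywords whose words all appear somewhere in text."""
    hits = defaultdict(int)
    for word in set(text.split()):
        # .get avoids creating empty entries for words no keyword uses
        for kw_id in index.get(word, ()):
            hits[kw_id] += 1
    # A keyword matches when every one of its distinct words was seen
    return [keywords[i] for i, count in hits.items() if count == sizes[i]]


keywords = ["XXXX YYYY", "XXXX ZZZZ", "YYYY"]
index, sizes = build_index(keywords)
matches = find_matches(" XXXX is a very good indication of YYYY",
                       keywords, index, sizes)
print(sorted(matches))
```

The work per string is proportional to the number of (word, keyword) pairs touched through the index rather than to the total number of keywords, though very common words would still pull in long candidate lists.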