vendredi 10 juin 2016

increasing length subsequence searching

I have to implement basically an algorithm for searching sequence in a text (let's imagine without loss of generality to talk about strings and chars). I've found lot of searching algorithms (for example this beautiful site http://ift.tt/yhaHnH) but not what I wanted. I'll try to explain what I need by an example:

If I have the text:

ababbacacababcacacababacabababcabbbac

so the alphabet is {a,b,c}; I want first to find all length-1 subsequence i.e. all occurrences of "a", "b" and "c". BUT in the second step I would search for all possible length-2 subsequence i.e. all occurrences of "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc" and I want to exploit the information I got from step 1 in order to reduce the complexity. For example, if in the first step I do not find any "a" occurrences, in the second step I wouldn't lose time in searching "aa","ab" or "ac" because I already know they do not exist.

I already have in mind some sort of data structure to manage some well known string searching algorithm modifying it in order to search only a subset of strings that I will compute online but that is my idea. I would like to know if there exist approaches in literature that already manage this problem.

Thank you very much, I'll appreciate every help you could give me.

Federico

Aucun commentaire:

Enregistrer un commentaire