Tuesday, December 29, 2015

Pattern search algorithm

I have some lines of letters and want to find the patterns that occur often.

Here are the lines:

 ---------TAAA-GAGAG----T--T-------T
 -------------T------A---------T----
 ----------AAA-GAGAG---C-----------T
 ------C------------T-A----T-----TA-
 -----AC----------------TT--------C-
 -------------------T---------------
 ---A---------------T---------------
 -------------------------T----T----
 ------C---------------------T-----T
 ----------AAA-GAGAG---C------------
 ----A--------------------T-------A-
 --------G-G-----------G-------T----
 ----A--------------------T---------
 ---------T-------------------------
 -----AC------T--------------------T
 --GA-G------------GT--------------T
 -----A--G-AAA-GAGAG-AA-------------
 -----------------------------------
 -T-------C-G-------T---TT------T--T
 TT-----------------T---------------
 -------------------T---TT-T--------
 ---A----G------------A--------T----
 -----------------------------------
 -------T--AA--G-GAG---C------------
 -T-A----------------A--------------
 ------------T------------------T---
 -----------G-----------------------
 --G-A------------------C---T---T---
 ----A---G---A-------A------T-------
 --------G-------------------T-----T
 -TG--------A---------A-T-----------
 --G--A--------GAGAG---CT-----------
 ---A-------G------------T----G---A-
 T-------G----T---------------------
 -T----C-GCAA--GAGAG-A-C-----T--TT--
 -----A----AAA-GAGAG-A-T------G---A-
 -T---------G-----------------------
 ---A---T---------------------G-----
 ---A---------T-------A---T---A---A-
 -----------------------------------
 TC--A----T----------G-------T-T--G-
 -T----CT---G-T-----T-A----T-T--T---
 -------------T-----TA------------A-
 --G----T-----------------T-T--T----
 ---A------AA--GAGAG---C-----T------
 --------------GAGAG-A-C------------
 ----------AA--GAGAG---C-G----------

As you can see, there is a pattern in the middle: "GAGAG".

So I can say that G+A+G+A+G comes from the same data.
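One way to make "there is a pattern in the middle" concrete is to treat every line as fixed-width, with `-` meaning "no letter", and to count how many lines contain each run of consecutive letters at each starting column. The Java sketch below does that. The class `PositionalMotifCounter` and its methods are hypothetical names. The sketch assumes all lines are aligned, so a column index refers to the same position in every line.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts positioned runs of letters across aligned lines.
class PositionalMotifCounter {

    /** Maps "column:motif" to the number of lines that contain that motif starting at that column. */
    static Map<String, Integer> count(List<String> lines, int minLen) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (int start = 0; start < line.length(); start++) {
                // Grow the window while it stays inside a run of letters ('-' ends the run).
                for (int end = start + 1;
                     end <= line.length() && Character.isLetter(line.charAt(end - 1));
                     end++) {
                    if (end - start >= minLen) {
                        // The key includes the column, so each line counts at most once per key.
                        counts.merge(start + ":" + line.substring(start, end), 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    /** Keeps only the motifs that were seen in at least minSupport lines. */
    static Map<String, Integer> frequent(List<String> lines, int minLen, int minSupport) {
        Map<String, Integer> result = new TreeMap<>();
        count(lines, minLen).forEach((key, n) -> {
            if (n >= minSupport) result.put(key, n);
        });
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "---------TAAA-GAGAG----T--T-------T",
            "----------AAA-GAGAG---C-----------T",
            "--------------GAGAG-A-C------------",
            "-------------------T---------------");
        System.out.println(frequent(lines, 3, 2));
    }
}
```

On these four sample lines, `frequent(lines, 3, 2)` reports `14:GAGAG` in 3 lines. It also reports every shorter piece of that run (`14:GAG`, `15:AGAG`, and so on), so a natural next step would be to keep only the longest run at each position.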

I want to split out those lines and group them.
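Once a motif and its column are known, splitting the lines into "has the motif there" and "does not" is a simple partition. This sketch assumes an exact match at a fixed column. `MotifGrouper` and `split` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: splits lines by whether they contain a motif at a fixed column.
class MotifGrouper {

    /** true -> lines with `motif` starting exactly at `column`; false -> all other lines. */
    static Map<Boolean, List<String>> split(List<String> lines, String motif, int column) {
        // String.startsWith(prefix, offset) returns false when the motif would run past the
        // end of the line, so short lines simply land in the "false" group.
        return lines.stream()
                .collect(Collectors.partitioningBy(line -> line.startsWith(motif, column)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "----------AAA-GAGAG---C-----------T",
            "-------------------T---------------",
            "-------T--AA--G-GAG---C------------",
            "--------------GAGAG-A-C------------");
        Map<Boolean, List<String>> groups = split(lines, "GAGAG", 14);
        System.out.println("GAGAG group: " + groups.get(true));
        System.out.println("others:      " + groups.get(false));
    }
}
```

A strict match like this leaves out partial matches. For example, the line `-------T--AA--G-GAG---C------------` has only `G-GAG` at column 14 and ends up in the "others" group. Whether that is the desired behavior depends on how the data was produced.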

Wrong case: the first line contains T+A+A+A, but no other line has T together with A+A+A; only A+A+A always appears together. So in this case T and A do not form a group.
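The "wrong case" rule can be read as a co-occurrence test: two letters at fixed columns belong to the same group only if every line containing one also contains the other. The sketch below makes that assumption and identifies each letter by its column plus the letter itself (for example `"9:T"`). `CoOccurrence` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: tests whether two positioned letters always occur in the same lines.
class CoOccurrence {

    /** One "column:letter" item per letter in the line; '-' is treated as empty. */
    static Set<String> items(String line) {
        Set<String> items = new HashSet<>();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isLetter(c)) items.add(i + ":" + c);
        }
        return items;
    }

    /** Number of lines that contain every item in `wanted`. */
    static int support(List<String> lines, Set<String> wanted) {
        int n = 0;
        for (String line : lines) {
            if (items(line).containsAll(wanted)) n++;
        }
        return n;
    }

    /** True if a and b occur at least once and every line with one of them also has the other. */
    static boolean alwaysTogether(List<String> lines, String a, String b) {
        if (a.equals(b)) return support(lines, Set.of(a)) > 0; // Set.of rejects duplicate elements
        int both = support(lines, Set.of(a, b));
        return both > 0
            && both == support(lines, Set.of(a))
            && both == support(lines, Set.of(b));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "---------TAAA-GAGAG----T--T-------T",
            "----------AAA-GAGAG---C-----------T",
            "----------AAA-GAGAG---C------------",
            "-----A--G-AAA-GAGAG-AA-------------");
        System.out.println(alwaysTogether(lines, "9:T", "10:A"));  // false: T is only in the first line
        System.out.println(alwaysTogether(lines, "10:A", "12:A")); // true: A+A+A appear together
    }
}
```

With the strict rule, a single stray letter breaks a group. In the full data above, the line `----A---G---A-------A------T-------` has an `A` in column 12 without the `A` in column 10, so `"10:A"` and `"12:A"` are not always together across all lines. Comparing the joint count against a threshold (for example, a fixed fraction of each individual count) would be a looser alternative.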

Does anybody know an algorithm for this kind of problem, or another way to find such patterns?
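Frequent itemset mining is a standard family of algorithms for finding items that tend to occur together, and Apriori is the classic example. The question does not name this technique; it is suggested here as one option. The sketch below treats each line as a set of `column:letter` items and finds every set of items that appears in at least `minSupport` lines. `AprioriSketch` and its methods are hypothetical names, and support counting is deliberately naive (one pass over all lines per candidate).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Apriori-style frequent itemset mining over "column:letter" items.
class AprioriSketch {

    /** One "column:letter" item per letter in the line; '-' and other non-letters are skipped. */
    static Set<String> items(String line) {
        Set<String> items = new HashSet<>();
        for (int i = 0; i < line.length(); i++) {
            if (Character.isLetter(line.charAt(i))) items.add(i + ":" + line.charAt(i));
        }
        return items;
    }

    /** True if removing any single item from `candidate` leaves a known frequent itemset. */
    static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> frequent) {
        for (String item : candidate) {
            Set<String> subset = new HashSet<>(candidate);
            subset.remove(item);
            if (!frequent.contains(subset)) return false;
        }
        return true;
    }

    /** Every itemset contained in at least minSupport lines, mapped to its support. */
    static Map<Set<String>, Integer> frequentItemsets(List<String> lines, int minSupport) {
        List<Set<String>> transactions = new ArrayList<>();
        for (String line : lines) transactions.add(items(line));

        Map<Set<String>, Integer> result = new LinkedHashMap<>();
        // Level-1 candidates: every single item seen in any line.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) candidates.add(Set.of(item));
        }

        int k = 1;
        while (!candidates.isEmpty()) {
            // Count support for each k-item candidate and keep the frequent ones.
            List<Set<String>> frequent = new ArrayList<>();
            for (Set<String> candidate : candidates) {
                int support = 0;
                for (Set<String> t : transactions) {
                    if (t.containsAll(candidate)) support++;
                }
                if (support >= minSupport) {
                    frequent.add(candidate);
                    result.put(candidate, support);
                }
            }
            // Join step: merge pairs whose union has k + 1 items.
            // Prune step: drop candidates with an infrequent k-subset (they cannot be frequent).
            Set<Set<String>> frequentLookup = new HashSet<>(frequent);
            Set<Set<String>> next = new HashSet<>();
            for (int i = 0; i < frequent.size(); i++) {
                for (int j = i + 1; j < frequent.size(); j++) {
                    Set<String> union = new HashSet<>(frequent.get(i));
                    union.addAll(frequent.get(j));
                    if (union.size() == k + 1 && allSubsetsFrequent(union, frequentLookup)) {
                        next.add(union);
                    }
                }
            }
            candidates = next;
            k++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "----------AAA-GAGAG---C-----------T",
            "----------AAA-GAGAG---C------------",
            "---A------AA--GAGAG---C-----T------",
            "----------AA--GAGAG---C-G----------");
        Map<Set<String>, Integer> found = frequentItemsets(lines, 4);
        // Print only the largest itemsets; all their subsets are frequent too.
        int largest = found.keySet().stream().mapToInt(Set::size).max().orElse(0);
        found.forEach((set, support) -> {
            if (set.size() == largest) System.out.println(set + " -> " + support);
        });
    }
}
```

Unlike counting contiguous runs, itemsets can include letters with gaps between them, such as the `C` that appears a few columns after `GAGAG` in many of the lines above. Note that "frequent" only means "often together". Whether a set is *always* together is a separate check: every line containing any one of its items must contain all of them. Also, the number of frequent itemsets grows quickly; the four sample lines share 8 items, which already gives 255 frequent itemsets at `minSupport = 4`.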

I am trying to do this in Java.

Thank you.
