Tuesday, March 31, 2015

Finding a specific pattern using grep (or any other program)

I am looking for a very specific pattern in a database of protein sequences. The structure I am looking for is C1XXXC2-X(n)-C3XXXC4, where C is cysteine, X is any other amino acid, and X(n) is a stretch of variable length.


For example: MALEAQMTLRMFVLVAMASTVHVLSSSFSEDLGTVPLSKVFRSETRFTLIQSLRALLSRQLEAEVHQPEIGHPGFSDETSSRTGKRGGLGRCIHNCMNSRGGLNFIQCKTMCS


As you can see, this sequence has ONLY four C's, and they are arranged in the pattern I described above. There cannot be any other C's anywhere else in the sequence!


I was using this command: grep "[A-Z]"C...C"[A-Z]"C...C"[A-Z]*", but it also returned sequences with more than four C's.
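One way to enforce both conditions with a single grep call is sketched below, assuming each line holds one complete, uppercase protein sequence (FASTA headers and wrapped sequence lines would need to be handled first). In the command above, `.` and `[A-Z]` both also match C, the lone `[A-Z]` between the two pairs matches exactly one residue rather than X(n), and without `^` and `$` grep prints a line as soon as the motif occurs anywhere in it. Replacing every X with `[^C]` (any character except C) and anchoring both ends addresses all three. `PATTERN` and `SEQS` are illustrative names, and the two short extra sequences are made-up counterexamples, not real proteins.

```shell
#!/bin/sh
# Sketch, assuming one uppercase sequence per line (no FASTA headers, no wrapping).
# [^C] matches any residue except cysteine, so the XXX gaps, the X(n) spacer and
# the flanking regions cannot contain an extra C. ^ and $ force the WHOLE line
# to match, which is what rules out a fifth C anywhere in the sequence.
PATTERN='^[^C]*C[^C]{3}C[^C]*C[^C]{3}C[^C]*$'

# Input: the sequence from the question, plus two hypothetical counterexamples:
# one with a fifth C, one with only two residues between the first pair of C's.
SEQS='MALEAQMTLRMFVLVAMASTVHVLSSSFSEDLGTVPLSKVFRSETRFTLIQSLRALLSRQLEAEVHQPEIGHPGFSDETSSRTGKRGGLGRCIHNCMNSRGGLNFIQCKTMCS
ACAAACGGGCAAACGGC
ACAACGGGCAAACG'

# -E enables extended regular expressions, so {3} needs no backslashes.
# Only the first line (the sequence from the question) is printed.
printf '%s\n' "$SEQS" | grep -E "$PATTERN"
```

Here X(n) is treated as zero or more residues; if at least one residue is required between C2 and C3, the middle `[^C]*` can be changed to `[^C]+`. To run it on a file with one sequence per line, the same pattern works as `grep -E "$PATTERN" file.txt`.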


Thank you in advance for any help.

