Wednesday, 12 June 2019

How to find matching patterns in a list of sequences in Perl

I am new to Perl and need some help with the following code. Given a file.txt containing 3 (or more) sequences:

S>seq1
MDQNGASGSHPNRASTRPGAHARERGATVSAAANRSNIIDEMAKICEADRQTFAIARRTR NESQFFGFRTASNKAIEITEAMEKRGAMFLTQSKATDQLNGWQPSDEPDKTSAESEPWFR.
S>seq2
MKIKLVTVGDAKEEYLIQGINEYLKRLNSYAKRETIEVPDEKAPEKLSDAEMLQVKEKEG EYVFVLAINGKQLSSEEFSKEIFQTGISGKSNLTFTTCFSLGLSDSVLQRIMKGEPYHKL.
S>seq3
MWKTVAPIFAAIFAVGLCGTFRTNTRKGEPTTKCFVFVHDTKARIYQCTFKTWSCPWLNN IVSAQFQFVTGANYKIVVKLVGELFTETALFNWSSPTTIFTGLGTLITADKTLDCDSNML

For each of the three following types of models, I want to find out:
1) how many sequences contain at least one occurrence of the model.
2) which sequence contains the largest number of occurrences of that model.

model_x
(KT)(KT).(AF)

model_y
(AF)..(DE)

model_z
(AF).(KT)

(KT)(KT).(AF) = an element K or T, followed by an element K or T, followed by any element (.), followed by an element A or F (example of a matching sequence: TTCF)

(AF)..(DE) = an element A or F, followed by any two elements (..), followed by an element D or E (example of a matching sequence: FIVE)

(AF).(KT) = an element A or F, followed by any element (.), followed by an element K or T (example of a matching sequence: FVK)
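The notation above maps almost directly onto Perl regular expressions: each parenthesised group such as (KT) corresponds to a character class [KT], and "." already means "any element". A minimal sketch of that translation follows. The helper names model_to_regex and count_occurrences are hypothetical, and the sketch assumes that overlapping occurrences should be counted separately (hence the zero-width lookahead); dropping the (?= ) wrapper would count non-overlapping occurrences instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the model notation into a compiled regex
# by rewriting "(" and ")" as "[" and "]", e.g. (KT)(KT).(AF) -> [KT][KT].[AF]
sub model_to_regex {
    my ($model) = @_;
    (my $pattern = $model) =~ tr/()/[]/;
    return qr/$pattern/;
}

# Hypothetical helper: count every position where the model matches.
# Assumption: overlapping occurrences count separately, so the pattern
# is wrapped in a zero-width lookahead.
sub count_occurrences {
    my ($regex, $sequence) = @_;
    my $count = () = $sequence =~ /(?=$regex)/g;
    return $count;
}

# The three example sequences given with the model definitions.
print count_occurrences(model_to_regex('(KT)(KT).(AF)'), 'TTCF'), "\n";
print count_occurrences(model_to_regex('(AF)..(DE)'),    'FIVE'), "\n";
print count_occurrences(model_to_regex('(AF).(KT)'),     'FVK'),  "\n";
```

Each of the three example sequences contains exactly one occurrence of its model, so the script prints 1 three times.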

I tried the following code, but it prints only one line of output:

#!/usr/bin/perl -w

use warnings;
use strict;

my $row;
my $overline;
if(!open(DATA, "file.txt")){
die "File not found\n";
}
while ($row = <DATA>) {
  if ($row =~ /^S>/) {
    $overline = $row;
  }
  elsif ($row =~ /([KT][KT].[AF])+/g) {
    print($overline, $row);
  }
  elsif ($row =~ /([AF]..[DE])+/g) {
    print($overline, $row);
  }
  elsif ($row =~ /([AF].[KT])+/g) {
    print($overline, $row);
  }
}
close (DATA);

Thank you very much for helping me with this code or pointing me in the correct direction!
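
One possible direction, sketched under stated assumptions rather than offered as a definitive solution: collect each sequence under its "S>" header first, then test every model against every sequence independently (an if/elsif chain stops at the first model that matches a line), counting matches in list context. The function names read_sequences and summarize and the demo records demoA to demoC are hypothetical. The sketch also assumes that whitespace inside a sequence is only line wrapping, that overlapping occurrences count separately, and that ties go to the first name in sorted order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three models written as Perl character classes.
my %models = (
    model_x => qr/[KT][KT].[AF]/,
    model_y => qr/[AF]..[DE]/,
    model_z => qr/[AF].[KT]/,
);

# Read records whose header lines start with "S>" and return a
# hash reference of name => sequence. Wrapped lines are joined.
sub read_sequences {
    my ($fh) = @_;
    my (%seq, $name);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^S>(\S+)/) {
            $name = $1;
            $seq{$name} = '';
        }
        elsif (defined $name) {
            $line =~ s/\s+//g;    # assumption: whitespace is only wrapping
            $seq{$name} .= $line;
        }
    }
    return \%seq;
}

# For one model, return: how many sequences have at least one hit,
# the name of the sequence with the most hits, and that hit count.
sub summarize {
    my ($regex, $seq) = @_;
    my ($with_hit, $best_name, $best_count) = (0, undef, 0);
    for my $name (sort keys %$seq) {
        # The lookahead lets overlapping occurrences be counted.
        my $count = () = $seq->{$name} =~ /(?=$regex)/g;
        $with_hit++ if $count > 0;
        ($best_name, $best_count) = ($name, $count) if $count > $best_count;
    }
    return ($with_hit, $best_name, $best_count);
}

# Hypothetical demo data read from an in-memory file handle; for the
# real input, use: open my $fh, '<', 'file.txt' or die ...
my $demo = "S>demoA\nTTCFAAAA\nS>demoB\nFIVEKKAF\nTTAFVK\nS>demoC\nGGGGGG\n";
open my $fh, '<', \$demo or die "Cannot open demo data: $!\n";
my $seq = read_sequences($fh);
close $fh;

for my $model (sort keys %models) {
    my ($with_hit, $best_name, $best_count) = summarize($models{$model}, $seq);
    printf "%s: %d sequence(s) with a match; most occurrences in %s (%d)\n",
        $model, $with_hit, defined $best_name ? $best_name : 'none', $best_count;
}
```

On the demo data this reports 2 matching sequences for model_x and 1 each for model_y and model_z, with demoB holding the most occurrences in every case (2, 1 and 3 respectively).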
