I am new to Perl and would appreciate some help with the following code. Given a file.txt containing 3 (or more) sequences:
S>seq1
MDQNGASGSHPNRASTRPGAHARERGATVSAAANRSNIIDEMAKICEADRQTFAIARRTR NESQFFGFRTASNKAIEITEAMEKRGAMFLTQSKATDQLNGWQPSDEPDKTSAESEPWFR.
S>seq2
MKIKLVTVGDAKEEYLIQGINEYLKRLNSYAKRETIEVPDEKAPEKLSDAEMLQVKEKEG EYVFVLAINGKQLSSEEFSKEIFQTGISGKSNLTFTTCFSLGLSDSVLQRIMKGEPYHKL.
S>seq3
MWKTVAPIFAAIFAVGLCGTFRTNTRKGEPTTKCFVFVHDTKARIYQCTFKTWSCPWLNN IVSAQFQFVTGANYKIVVKLVGELFTETALFNWSSPTTIFTGLGTLITADKTLDCDSNML
For each of the following three types of models, I want to find out:
1) how many sequences contain at least one occurrence of the model.
2) which sequence contains the largest number of occurrences of that model.
model_x
(KT)(KT).(AF)
model_y
(AF)..(DE)
model_z
(AF).(KT)
(KT)(KT).(AF) = element K or T followed by element K or T followed by any element (.) followed by an element A or F (example of matching sequence=TTCF)
(AF)..(DE) = element A or F followed by any two elements (..) followed by an element D or E (example of matching sequence=FIVE)
(AF).(KT) = element A or F followed by any element (.) followed by an element K or T (example of matching sequence=FVK)
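In Perl regex syntax, each parenthesized group of alternatives in the notation above corresponds to a bracketed character class, so (KT) becomes [KT], while the dot keeps its meaning of "any single element". The following is a minimal sketch that checks the three example strings against their models; the hash names %model and %example are illustrative choices, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each model written as a Perl regex: (KT) in the notation above becomes
# the character class [KT]; "." still matches any single element.
my %model = (
    model_x => qr/[KT][KT].[AF]/,
    model_y => qr/[AF]..[DE]/,
    model_z => qr/[AF].[KT]/,
);

# The example strings quoted in the model descriptions.
my %example = (
    model_x => 'TTCF',
    model_y => 'FIVE',
    model_z => 'FVK',
);

for my $name (sort keys %model) {
    my $verdict = $example{$name} =~ $model{$name} ? 'matches' : 'does not match';
    print "$name: $example{$name} $verdict\n";
}
```

Precompiling the patterns with qr// keeps each model in one place, so the same regex can be reused for both testing and counting.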
I tried the following code, but it prints only one line of output:
#!/usr/bin/perl -w
use warnings;
use strict;

my $row;
my $overline;

if (!open(DATA, "file.txt")) {
    die "File not found\n";
}

while ($row = <DATA>) {
    if ($row =~ /^S>/) {
        # Header line: remember it for the sequence row that follows.
        $overline = $row;
    }
    elsif ($row =~ /([KT][KT].[AF])+/g) {
        print($overline, $row);
    }
    elsif ($row =~ /([AF]..[DE])+/g) {
        print($overline, $row);
    }
    elsif ($row =~ /([AF].[KT])+/g) {
        print($overline, $row);
    }
}

close(DATA);
Thank you very much for helping me with this code or pointing me in the correct direction!
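For reference, here is a minimal sketch of one way to approach both questions. It tests every model against every sequence, instead of using an elsif chain that stops at the first model that matches, and it counts occurrences rather than only detecting them. Several points are assumptions rather than facts from the post: the sub names (read_sequences, count_matches, summarize) are hypothetical, the sample data is read from an in-memory string standing in for file.txt, the space inside each sequence line is treated as a wrapping artifact and removed, occurrences are counted without overlap, and ties for the largest count go to the first name in sorted order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %model = (
    model_x => qr/[KT][KT].[AF]/,
    model_y => qr/[AF]..[DE]/,
    model_z => qr/[AF].[KT]/,
);

# Read records: a header line starting with "S>" followed by sequence lines.
sub read_sequences {
    my ($fh) = @_;
    my (%seq, $name);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^S>(\S+)/) {
            $name = $1;
            $seq{$name} = '';
        }
        elsif (defined $name) {
            # Assumption: spaces and the trailing "." are not elements.
            $line =~ s/[^A-Za-z]//g;
            $seq{$name} .= $line;
        }
    }
    return %seq;
}

# Count non-overlapping occurrences of a pattern in a string.
sub count_matches {
    my ($string, $re) = @_;
    my $count = () = $string =~ /$re/g;
    return $count;
}

# For one model, return: number of sequences with at least one occurrence,
# name of the sequence with the most occurrences, and that largest count.
sub summarize {
    my ($re, %seq) = @_;
    my ($hits, $best, $best_count) = (0, undef, 0);
    for my $name (sort keys %seq) {
        my $n = count_matches($seq{$name}, $re);
        $hits++ if $n > 0;
        ($best, $best_count) = ($name, $n) if $n > $best_count;
    }
    return ($hits, $best, $best_count);
}

# Sample input copied from the post, standing in for file.txt.
my $text = <<'END';
S>seq1
MDQNGASGSHPNRASTRPGAHARERGATVSAAANRSNIIDEMAKICEADRQTFAIARRTR NESQFFGFRTASNKAIEITEAMEKRGAMFLTQSKATDQLNGWQPSDEPDKTSAESEPWFR.
S>seq2
MKIKLVTVGDAKEEYLIQGINEYLKRLNSYAKRETIEVPDEKAPEKLSDAEMLQVKEKEG EYVFVLAINGKQLSSEEFSKEIFQTGISGKSNLTFTTCFSLGLSDSVLQRIMKGEPYHKL.
S>seq3
MWKTVAPIFAAIFAVGLCGTFRTNTRKGEPTTKCFVFVHDTKARIYQCTFKTWSCPWLNN IVSAQFQFVTGANYKIVVKLVGELFTETALFNWSSPTTIFTGLGTLITADKTLDCDSNML
END

open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
my %seq = read_sequences($fh);
close $fh;

for my $m (sort keys %model) {
    my ($hits, $best, $best_count) = summarize($model{$m}, %seq);
    printf "%s: %d sequence(s) with a match; most occurrences in %s (%d)\n",
        $m, $hits, (defined $best ? $best : 'none'), $best_count;
}
```

To read the real file, the in-memory open could be replaced with open my $fh, '<', 'file.txt'. If overlapping occurrences should also be counted, one option is to wrap each pattern in a lookahead, for example qr/(?=[AF].[KT])/, so that each match consumes no characters.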