Given a multi-FASTA file:
>seq1
CCTTTGGATGGCAAAATTTNTNGTAAA
AGGGCACCCANTTCTGGC
>seq2
NNNNNGGGGCGTAANGAGGGGCACGG
TNCC
>seq3
AAAAAANNNNTAC
I want to find motifs matching the pattern [NC].[CT] (an N or C, followed by any character, followed by a C or T) in each sequence, where each sequence begins with a header line starting with '>', and count how many sequences contain the motif. The main problem I am running into with this code is iterating over each sequence. Here is my code:
#!/usr/bin/perl -w
use warnings;

if (!open(MY_HANDLE, "genbank.txt")) {
    die "Cannot open the file";
}
@content = <MY_HANDLE>;
close(MY_HANDLE);

foreach $row (@content) {
    chomp($row);
    if (@matches1 = $row =~ /([AT][AN]..[CG]+)/g) {
        $numMat = scalar(@matches1);
        print("@matches1,$numMat\n");
    }
    elsif (@matches2 = $row =~ /[NC].[CT]+/g) {
        print("@matches2\n");
    }
}
Thanks a lot
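One possible way to handle the per-sequence iteration, as a minimal sketch rather than a definitive solution: join the wrapped lines of each record into one string before matching, so that a motif spanning a line break is not missed, then apply the pattern once per sequence. The helper name read_fasta is hypothetical (it is not part of the original script), and the sample data from the question is read from an in-memory filehandle as a stand-in for "genbank.txt".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): reads FASTA records
# from a filehandle and returns a list of [header, sequence] pairs.
# Wrapped sequence lines are concatenated into a single string.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;            # tolerate Windows line endings
        next if $line eq '';
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];  # a header line starts a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line; # append to the current sequence
        }
    }
    return @records;
}

# Sample data copied from the question; an in-memory filehandle is
# used here only so the sketch runs without an input file.
my $fasta = <<'END';
>seq1
CCTTTGGATGGCAAAATTTNTNGTAAA
AGGGCACCCANTTCTGGC
>seq2
NNNNNGGGGCGTAANGAGGGGCACGG
TNCC
>seq3
AAAAAANNNNTAC
END

open(my $fh, '<', \$fasta) or die "Cannot open in-memory data: $!";
my @records = read_fasta($fh);
close($fh);

my $with_motif = 0;
for my $rec (@records) {
    my ($id, $seq) = @$rec;
    # In list context, /g returns every non-overlapping match.
    my @matches = $seq =~ /[NC].[CT]/g;
    $with_motif++ if @matches;
    printf "%s\t%d\t%s\n", $id, scalar(@matches), join(',', @matches);
}
print "Sequences containing the motif: $with_motif\n";
```

To read a real file, the in-memory open could be replaced with open(my $fh, '<', 'genbank.txt'). Note that /g counts non-overlapping matches only; if overlapping occurrences matter, a lookahead such as /(?=([NC].[CT]))/g is one option.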