Friday, June 14, 2019

Perl: how to find motifs in a multi-FASTA file

Given a multi-FASTA file:

>seq1  
CCTTTGGATGGCAAAATTTNTNGTAAA
AGGGCACCCANTTCTGGC  
>seq2
NNNNNGGGGCGTAANGAGGGGCACGG
TNCC
>seq3   
AAAAAANNNNTAC

I want to find motifs matching the pattern [NC].[CT] (an N or C, followed by any single character, followed by a C or T) in each sequence, where each sequence begins with a header line starting with '>', and count how many sequences contain the motif. The main problem I am running into is iterating over each sequence rather than over each line of the file. Here is my code:

#!/usr/bin/perl

use strict;
use warnings;

# Read the whole file into memory, one array element per line.
open(my $fh, '<', "genbank.txt") or die "Cannot open the file: $!";
my @content = <$fh>;
close($fh);

# Note: this loop visits individual lines, not whole sequences.
foreach my $row (@content) {
    chomp($row);
    if (my @matches1 = $row =~ /([AT][AN]..[CG]+)/g) {
        my $numMat = scalar(@matches1);
        print("@matches1,$numMat\n");
    }
    elsif (my @matches2 = $row =~ /[NC].[CT]+/g) {
        print("@matches2\n");
    }
}


Thanks a lot
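
One possible way to iterate per sequence instead of per line is to join the wrapped lines of each record before matching. The following is only a minimal sketch under some assumptions: the sub name count_motif_records is made up for illustration, the example data is read from an in-memory string instead of genbank.txt, and matches are counted without overlap, which may or may not be what is wanted here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a filehandle and a compiled regex, and returns
# the number of records containing the motif plus a per-record hit list.
sub count_motif_records {
    my ($fh, $motif) = @_;
    my (%seq, @order, $id);

    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # drop newline and trailing blanks
        next if $line eq '';
        if ($line =~ /^>(\S+)/) {        # a header line starts a new record
            $id = $1;
            push @order, $id;
            $seq{$id} = '';
        }
        elsif (defined $id) {
            $seq{$id} .= $line;          # join wrapped sequence lines
        }
    }

    my ($with_motif, @report) = (0);
    for my $name (@order) {
        my @hits = $seq{$name} =~ /$motif/g;   # non-overlapping matches
        $with_motif++ if @hits;
        push @report, [ $name, scalar @hits ];
    }
    return ($with_motif, \@report);
}

# Demo on the example data from the question, read from an in-memory string.
my $fasta = <<'END';
>seq1
CCTTTGGATGGCAAAATTTNTNGTAAA
AGGGCACCCANTTCTGGC
>seq2
NNNNNGGGGCGTAANGAGGGGCACGG
TNCC
>seq3
AAAAAANNNNTAC
END

open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my ($count, $report) = count_motif_records($fh, qr/[NC].[CT]/);
close($fh);

print "$_->[0]: $_->[1] match(es)\n" for @$report;
print "Sequences containing the motif: $count\n";
```

To run it on a real file, the in-memory open could be swapped for open(my $fh, '<', "genbank.txt"). Because the lines of a record are concatenated first, a motif that straddles a line break is still found. On the three example sequences above, each one contains at least one match for [NC].[CT].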
