I have a data set with 7754 observations and 5 variables:
name protein mutation_CDS mutation_nonCDS seq
B*07 02 01 01 ATGCTGGTCATGGCGCCCCGAACCGTCCTCCTGCTGCTCTCGG
B*07 02 01 48 ATGCTGGTCATGGCGCCCCGAACCGTCCTCCTGCTGCTCTCGG
...
B*18 153 NA NA ATGCGGGTCACGGCGCCCCGAACCCTCCTCCTGCTGCTCTGGG
B*18 155 NA NA ATGCGGGTCACGGCGCCCCGAACCCTCCTCCTGCTGCTCTGGG
...
Within the "name" variable I have 36 different names, each occurring in a different number of rows: "B07" "B08" "B13" "B14" "B15" "B18" "B27" "B35" "B37" "B38" "B39" "B40" "B41" "B42" "B44" "B45" "B46" "B47" "B48" "B49" "B50" "B51" "B52" "B53" "B54" "B55" "B56" "B57" "B58" "B59" "B67" "B73" "B78" "B81" "B82" "B83"
My idea was to create a loop in R that takes the whole data set (7754 obs.), looks for each unique name (e.g. "B*07"), and stores all rows having this name (with all columns) in a new data set labelled "B07", OR writes them directly to a .txt (or .fasta) file.
I would end up with separate files, for example a file called B07 containing only the B*07 information (620 obs. and 5 variables):
name protein mutation_CDS mutation_nonCDS seq
B*07 02 01 01 ATGCTGGTCATGGCGCCCCGAACCGTCCTCCTGCTGCTCTCGG
B*07 02 01 48 ATGCTGGTCATGGCGCCCCGAACCGTCCTCCTGCTGCTCTCGG
and a file called B18 containing only the B*18 information (X obs. and 5 variables):
B*18 153 NA NA ATGCGGGTCACGGCGCCCCGAACCCTCCTCCTGCTGCTCTGGG
B*18 155 NA NA ATGCGGGTCACGGCGCCCCGAACCCTCCTCCTGCTGCTCTGGG
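The splitting described above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the data frame is called `dat` (a hypothetical name), it is filled here with a four-row toy stand-in using shortened sequences, and the output goes to a temporary directory rather than a real project folder. It uses base R's `split()` instead of an explicit search loop, and strips the `*` from the names so they can be used as file names.

```r
# Toy stand-in for the real 7754-row data set (hypothetical name `dat`,
# sequences shortened for readability).
dat <- data.frame(
  name            = c("B*07", "B*07", "B*18", "B*18"),
  protein         = c("02", "02", "153", "155"),
  mutation_CDS    = c("01", "01", NA, NA),
  mutation_nonCDS = c("01", "48", NA, NA),
  seq             = c("ATGCTG", "ATGCTG", "ATGCGG", "ATGCGG"),
  stringsAsFactors = FALSE
)

# split() returns a named list holding one data frame per unique name.
parts <- split(dat, dat$name)

# Drop the "*" so the labels are file-name friendly: "B*07" -> "B07".
names(parts) <- gsub("*", "", names(parts), fixed = TRUE)

# Write each part to its own tab-separated text file.
out_dir <- tempdir()  # assumption: replace with the desired output folder
for (nm in names(parts)) {
  write.table(parts[[nm]],
              file = file.path(out_dir, paste0(nm, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
```

After this, `parts[["B07"]]` holds only the B*07 rows with all 5 columns, and `B07.txt` and `B18.txt` sit in `out_dir`.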
In simple words, I want to split the 7754 observations into parts according to their name pattern:
"B*07" 620 of 7754
"B*08" X of 7754
"B*13" X of 7754
etc.
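The per-name counts in a summary like this could be obtained with base R's `table()`. A minimal sketch, assuming the name column is available as a character vector (here a made-up six-element vector called `name`, not the real data):

```r
# Made-up name column for illustration only.
name <- c("B*07", "B*07", "B*07", "B*08", "B*13", "B*13")

# table() counts how many rows carry each unique name.
counts <- table(name)

# Build one summary line per name: "<name>" <rows> of <total rows>.
summary_lines <- sprintf('"%s" %d of %d',
                         names(counts), as.integer(counts), length(name))
cat(summary_lines, sep = "\n")
```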
Does anyone have an idea how to do that?