Tuesday, 3 May 2016

R regex semicolon

I want to check if all elements of a list have the pattern I need them to have, otherwise I will stop the whole script.

The example list looks like this:

[1] Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;
[2] Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanosphaera;
[3] Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanosphaera;
[4] Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;
[5] Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;
[6] Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;
[7] Bacteria;Actinobacteria;Actinobacteria;Coriobacteriales;Coriobacteriaceae;Gordonibacter;
[8] Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Coriobacteriaceae;;
[9] Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Coriobacteriaceae;;

I want all entries to have exactly six semicolons. I tried pattern matching with grepl, but I am having trouble finding the right pattern. Here is what I tried:

if(!any(grepl(";{6}", taxonomy))) {
  # Throw error message if the taxonomy is not in the right format
  stop("Wrong number of taxonomic classes\n Taxonomic levels have to be separated by semicolons (six in total).  IMPORTANT: if taxonomic information at any level is missing, the semicolons are still needed:\n
      e.g.Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;
      e.g.Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;;")
} else {

But I always get FALSE.
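
A likely reason is that ";{6}" only matches six semicolons in a row, which none of the entries contain. The sketch below shows two possible ways to check for exactly six semicolons per entry. It assumes taxonomy is a character vector; the two sample entries are copied from the list above, and the names has_six and n_semicolons are made up for illustration.

```r
# Sample data: entries [1] and [8] from the list in the question
taxonomy <- c(
  "Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;",
  "Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Coriobacteriaceae;;"
)

# ";{6}" asks for six consecutive semicolons, so no entry matches
consecutive <- grepl(";{6}", taxonomy)

# Option 1: six fields, each made of zero or more non-semicolon
# characters followed by exactly one semicolon, anchored at both ends
has_six <- grepl("^([^;]*;){6}$", taxonomy)

# Option 2: count the semicolons in each entry directly
n_semicolons <- lengths(regmatches(taxonomy, gregexpr(";", taxonomy, fixed = TRUE)))

# all() instead of any(): stop as soon as a single entry is malformed
if (!all(has_six)) {
  stop("Wrong number of taxonomic classes")
}
```

Note that !any(...) only triggers when no entry matches at all, so !all(...) is probably closer to the stated goal of checking every element. Option 1 also rejects entries with text after the sixth semicolon, while option 2 only counts semicolons.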
