Thursday, 15 December 2016

R: grep() pattern is a vector

I have two data frames. df1 contains a small set of chemical compounds I have been using, and I want to see which of these compounds also appear in df2. The problem is that many values in df2 do not exactly match my values in df1 (e.g. df1 = "Altretamin", df2 = "Altretamine" or "Altretamin Hydrochloride" or "ALTRETAMIN HCL"). To get around these partial-match and upper/lower-case problems I want to use an ifelse / grepl / grep construct, but grep itself has issues when it reaches a NULL in the loop, and when I try to work around this with an ifelse statement I get the following warnings:

    df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"))
    df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"))

    index <- NULL
    for (i in 1:length(df1$CompoundName)) {
      index[i] <- 
        ifelse(grepl(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE), 
               grep(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE), 0)
      print(index[i])
    }
    index

This gives:

    [1] 1
    [1] 0
    [1] 0
    [1] 0
    Warning messages:
    1: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
      number of items to replace is not a multiple of replacement length
    2: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
      number of items to replace is not a multiple of replacement length
    3: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
      number of items to replace is not a multiple of replacement length
    4: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
      number of items to replace is not a multiple of replacement length
    > index
    [1] 1 0 0 0
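
The warnings seem to come from `ifelse()` being vectorised over its test: it returns one value per element of df2, and that whole vector is then assigned to the single slot `index[i]`. A minimal sketch for one pattern, where `targets`, `hits` and `res` are names made up for illustration:

```r
# Compound names from df2, as a plain character vector
targets <- c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib")

# grepl() returns one logical per element of `targets`, not a single value
hits <- grepl("Nilotinib", targets, ignore.case = TRUE)
length(hits)  # 4

# ifelse() is as long as its test, so the result also has 4 elements
res <- ifelse(hits, grep("Nilotinib", targets, ignore.case = TRUE), 0)
length(res)   # 4

# index[i] has room for only one element; R keeps the first and warns,
# so the match in position 2 is lost
res[1]        # 0
```

This would explain why only "Bosutinib", which happens to match in the first row, is recorded in `index`.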

So far I have worked out that grepl returns a whole vector of TRUE/FALSE values for each "i" (one per row of df2) rather than a single TRUE or FALSE that I could use in the loop. Is there a good way around this? Or is there another way to match patterns that are not exact?
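
One possible direction, sketched under the assumption that the first matching row of df2 is what is wanted and that 0 should mark "no match". `first_match` is a hypothetical helper, not a base R function, and the names are compared as lower-cased literal substrings rather than as regular expressions:

```r
df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: position of the first element of `targets` that
# contains `pattern` (case-insensitive), or 0 when nothing matches
first_match <- function(pattern, targets) {
  hits <- grep(tolower(pattern), tolower(targets), fixed = TRUE)
  if (length(hits) == 0) 0L else hits[1]
}

# One integer per compound in df1, so no length mismatch on assignment
index <- vapply(df1$CompoundName, first_match, integer(1),
                targets = df2$CompoundName, USE.NAMES = FALSE)
index
```

With the sample data this should give 1, 2, 0 and 3. It only covers the case where the df1 name is contained in the df2 name; for spelling variants that are not substrings, base R's `agrep()` (approximate matching) may be worth a look.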
