I have two data frames: df1 contains a small set of chemical compounds I have been using, and I want to see which of these compounds also appear in df2. The problem is that many values in df2 do not exactly match my values in df1 (e.g. df1 = "Altretamin", df2 = "Altretamine", "Altretamin Hydrochloride" or "ALTRETAMIN HCL"). To get around these partial-match and upper/lower-case problems I want to use ifelse together with grepl/grep. However, grep causes trouble in the loop when it finds no match (it then returns an empty vector, integer(0)), and when I try to work around this with an ifelse statement I get the following warnings:
df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"))
df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"))
index <- NULL
for (i in 1:length(df1$CompoundName)) {
  index[i] <-
    ifelse(grepl(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE),
           grep(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE), 0)
  print(index[i])
}
index
This gives:
[1] 1
[1] 0
[1] 0
[1] 0
Warning messages:
1: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName, :
number of items to replace is not a multiple of replacement length
2: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName, :
number of items to replace is not a multiple of replacement length
3: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName, :
number of items to replace is not a multiple of replacement length
4: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName, :
number of items to replace is not a multiple of replacement length
> index
[1] 1 0 0 0
So far I have figured out that, for each i, grepl returns a logical vector with one TRUE or FALSE per row of df2, rather than the single TRUE or FALSE value I could use in the loop. Is there a good way around these issues? Or is there another way to match patterns that are not exact?
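A minimal sketch of the vector behaviour described above, using the example data from the question. grepl returns one logical value per row of df2. Wrapping it in any() (or match(TRUE, ...)) reduces that vector to a single value per compound, so neither the loop nor ifelse is needed. The names hits, found and first_row are illustrative only. Note the assumption: this only checks whether each df1 name occurs as a case-insensitive substring of a df2 name. A shorter df2 entry such as "Altretamin" would therefore not be found by the pattern "Altretamine".

```r
# Example data from the question (stringsAsFactors = FALSE keeps plain
# character columns on older R versions too)
df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"),
                  stringsAsFactors = FALSE)

# grepl() gives one TRUE/FALSE per element of df2$CompoundName, not a single value
hits <- grepl(df1$CompoundName[4], df2$CompoundName, ignore.case = TRUE)
print(hits)
# [1] FALSE FALSE  TRUE FALSE

# any() collapses the logical vector to "is there at least one match?"
found <- sapply(df1$CompoundName, function(p)
  any(grepl(p, df2$CompoundName, ignore.case = TRUE)))

# match(TRUE, ...) gives the first matching row of df2, or NA when nothing matches
first_row <- sapply(df1$CompoundName, function(p)
  match(TRUE, grepl(p, df2$CompoundName, ignore.case = TRUE)))

# Compounds from df1 that have at least one (partial, case-insensitive) match in df2
print(df1$CompoundName[found])
```

With this data, found is TRUE for Bosutinib, Nilotinib and Altretamine and FALSE for Cabozantinib, and first_row is 1, 2, NA, 3. If compound names may contain regex metacharacters such as parentheses or "+", one option is to compare tolower() versions of both columns with fixed = TRUE, because R ignores ignore.case (with a warning) when fixed = TRUE.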