Thursday, June 4, 2015

Wildcard pattern matching and output of pattern (bolded) in subject string

I have a data frame with several columns. Column 2 holds my patterns, and column 4 holds long strings (subjects) in which I want to find the pattern. I'd like to loop through the data frame so that:

for each row i in 1:nrow(dataframe)

match the pattern in dataframe.col2 against the string in dataframe.col4, allowing gaps: if the pattern is ABCDEF, then A?B?C?D?E?F is also allowed, where each ? stands for one optional extra character in the subject

and then print the result by mapping the pattern onto the long string and bolding the matched part to show where in the sequence it is (here the match is ABCYDWEF, starting at position 8):

ID_num ZXCVBNMABCYDWEFZXCVBNMXCDFGHJK

So the result should be a new data frame that looks exactly like the old one, except that in column 4 the pattern is in bold (if there is a match). This way I can quickly scan all the matches to see which are best. That also depends on the position of the pattern in the string, so it would help if that information could be captured too. Thanks!
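
To make the request concrete, here is a minimal sketch of the matching and bolding step. It is written in Python with only the standard `re` module, not against an actual data frame, so everything in it is an assumption for illustration: the function names `gapped_regex` and `mark_matches`, the `<b>`/`</b>` markers standing in for bold, and the placeholder values for columns 1 and 3. It reads "? for 1 character" as "zero or one extra character between consecutive pattern letters", which is what the example string suggests.

```python
import re

def gapped_regex(pattern):
    """Compile `pattern` so 0 or 1 extra characters may sit between its letters."""
    # re.escape keeps characters such as "." or "*" in the pattern literal.
    return re.compile(".?".join(re.escape(ch) for ch in pattern))

def mark_matches(pattern, subject, open_tag="<b>", close_tag="</b>"):
    """Return (subject with matches wrapped in tags, list of 1-based (start, end))."""
    if not pattern:
        # An empty pattern would match everywhere, so treat it as "no match".
        return subject, []
    rx = gapped_regex(pattern)
    # Positions are 1-based and inclusive, to mirror how positions are usually
    # counted in a sequence; this convention is an assumption.
    positions = [(m.start() + 1, m.end()) for m in rx.finditer(subject)]
    marked = rx.sub(lambda m: open_tag + m.group(0) + close_tag, subject)
    return marked, positions

# Placeholder rows standing in for the data frame: (col1, col2, col3, col4).
# Only the pattern and the subject string come from the question.
rows = [
    ("ID_num", "ABCDEF", "col3", "ZXCVBNMABCYDWEFZXCVBNMXCDFGHJK"),
]

new_rows = []
for col1, pat, col3, subject in rows:
    marked, positions = mark_matches(pat, subject)
    # Same columns as before, with col4 marked up and the positions appended.
    new_rows.append((col1, pat, col3, marked, positions))

for row in new_rows:
    print(row)
```

For the example row this marks the subject as `ZXCVBNM<b>ABCYDWEF</b>ZXCVBNMXCDFGHJK` and records the match at positions 8 to 15. Matches are found left to right and do not overlap, so if a pattern could match at two overlapping places only the first one is reported.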
