Thursday, October 5, 2017

Finding all string matches from another dataframe in R

I am relatively new to R.

I have a dataframe locs that has one variable, V1, and looks like this:

V1
edmonton general hospital
cardiovascular institute, hospital san carlos, madrid spain
hospital of santa maria, lisbon, portugal

and another dataframe cities that has two variables that look like this:

city              country
edmonton          canada
san carlos        spain
los angeles       united states
santa maria       united states
tokyo             japan
madrid            spain
santa maria       portugal
lisbon            portugal

I want to create two new variables in locs that record every city from cities whose name appears as a string match within V1, along with the corresponding countries, so that locs looks like this:

V1                                                            city                  country
edmonton general hospital                                     edmonton              canada
cardiovascular institute, hospital san carlos, madrid spain   san carlos, madrid    spain
hospital of santa maria, lisbon, portugal                     santa maria, lisbon   portugal, united states

A few things to note: a single V1 entry may match multiple cities and therefore multiple countries. Also, if a country is repeated (for instance, both san carlos and madrid are in spain), then I only want one instance of that country.
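
For reference, the requirement above can be sketched in base R as follows. This is only one possible approach, not a definitive solution: it assumes plain substring matching (so a city name embedded in a longer word would also match), it assumes both dataframes hold character columns rather than factors, and it sorts the countries alphabetically, which is an assumption that happens to agree with the desired output shown above. The name hits is a hypothetical helper introduced for illustration.

```r
# Sample data copied from the question
locs <- data.frame(
  V1 = c("edmonton general hospital",
         "cardiovascular institute, hospital san carlos, madrid spain",
         "hospital of santa maria, lisbon, portugal"),
  stringsAsFactors = FALSE
)

cities <- data.frame(
  city    = c("edmonton", "san carlos", "los angeles", "santa maria",
              "tokyo", "madrid", "santa maria", "lisbon"),
  country = c("canada", "spain", "united states", "united states",
              "japan", "spain", "portugal", "portugal"),
  stringsAsFactors = FALSE
)

# For each V1 entry, keep the rows of cities whose city name occurs in it.
# fixed = TRUE treats the city name as a literal substring, not a regex.
hits <- lapply(locs$V1, function(text) {
  found <- vapply(cities$city, grepl, logical(1),
                  x = text, fixed = TRUE, USE.NAMES = FALSE)
  cities[found, ]
})

# Collapse the matches into one comma-separated string per row;
# unique() drops repeated cities and repeated countries.
locs$city <- vapply(hits, function(h) {
  paste(unique(h$city), collapse = ", ")
}, character(1))

locs$country <- vapply(hits, function(h) {
  paste(sort(unique(h$country)), collapse = ", ")  # alphabetical order is an assumption
}, character(1))

print(locs)
```

Rows with no matching city would end up with empty strings in both new columns. If substring matches inside longer words are a concern, the fixed match could be swapped for a regular expression with word boundaries, at the cost of having to escape any special characters in the city names.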

Please advise.

Thanks.
