Thursday, January 16, 2020

Subsetting a dataframe based on a vector of strings

I have a large data frame called Genetics that I need to break down. It has 4 columns: the first, patientID, is sometimes duplicated, and the other 3 describe the patients.

As mentioned, some patient IDs are duplicated, and I want to find out which ones without losing the remaining columns.

dedupedGenID <- unique(Genetics$patientID)

only gives me the unique IDs, without the other columns.
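One way to keep every column is to subset rows with duplicated() instead of calling unique() on the ID column alone. The sketch below uses a small made-up data frame standing in for Genetics; the columns age, sex and variant are hypothetical, and "first occurrence wins" is an assumption about which row to keep per patient.

```r
# Toy stand-in for Genetics; age, sex and variant are hypothetical columns.
Genetics <- data.frame(
  patientID = c("P1", "P2", "P1", "P3", "P2", "P4"),
  age       = c(34, 51, 34, 29, 51, 62),
  sex       = c("F", "M", "F", "F", "M", "M"),
  variant   = c("A", "B", "C", "A", "D", "B"),
  stringsAsFactors = FALSE
)

# duplicated() is TRUE for the 2nd, 3rd, ... occurrence of each ID,
# so negating it keeps the first row of every patient with all columns.
dedupedGenFull <- Genetics[!duplicated(Genetics$patientID), ]

# The IDs that occur more than once (each listed a single time).
dupIDs <- unique(Genetics$patientID[duplicated(Genetics$patientID)])

print(dedupedGenFull)
print(dupIDs)
```

Because the row index is a logical vector with exactly one entry per row, no recycling is involved and all four columns survive.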

To subset the data frame by those unique IDs, I ran:

dedupedGenFull <- Genetics[str_detect(Genetics$patientID, pattern = dedupedGenID), ]

This produces the warning "longer object length is not a multiple of shorter object length", and dedupedGenFull ends up with only 55 rows, even though dedupedGenID is a character vector of length 1837.
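A likely explanation: stringr's str_detect() is vectorised over both string and pattern, so it pairs patientID[i] with dedupedGenID[i] rather than checking each ID against the whole set. When the two lengths differ, the shorter one is recycled, which is what the warning refers to (recent stringr versions may raise an error here instead). The sketch below reproduces the same warning with base R's == on hypothetical toy IDs and shows %in%, which performs a set-membership test, as the alternative.

```r
# Toy vectors standing in for Genetics$patientID (hypothetical values).
ids        <- c("P1", "P2", "P1", "P3", "P2")   # 5 rows
unique_ids <- unique(ids)                        # "P1" "P2" "P3"

# Element-wise comparison: ids[i] is compared with unique_ids[i] and the
# shorter vector is recycled; 5 is not a multiple of 3, so R warns.
pairwiseMsg <- tryCatch(ids == unique_ids,
                        warning = function(w) conditionMessage(w))
print(pairwiseMsg)

# Membership test: is each ids[i] anywhere in unique_ids? No recycling.
inSet <- ids %in% unique_ids
print(inSet)

# Subsetting rows by a set of IDs uses the same membership test.
# 'score' and 'wanted' are hypothetical.
Genetics <- data.frame(patientID = ids, score = 1:5, stringsAsFactors = FALSE)
wanted   <- c("P1", "P3")
subsetByIDs <- Genetics[Genetics$patientID %in% wanted, ]
print(subsetByIDs)
```

Note that filtering with %in% on the full list of unique IDs keeps every row, because every ID is in that list; removing duplicates needs duplicated() instead.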

My questions are: how do I perform that subsetting step correctly? And how do I do the opposite, i.e. subset the data frame so that I get the IDs and other columns of only those patients whose IDs are duplicated?
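For the second question, a minimal sketch, again on a hypothetical toy data frame: combining duplicated() with duplicated(..., fromLast = TRUE) flags every occurrence of a repeated ID, including the first one, so all rows of repeating patients are kept along with their other columns.

```r
# Toy stand-in for Genetics; age, sex and variant are hypothetical columns.
Genetics <- data.frame(
  patientID = c("P1", "P2", "P1", "P3", "P2", "P4"),
  age       = c(34, 51, 34, 29, 51, 62),
  sex       = c("F", "M", "F", "F", "M", "M"),
  variant   = c("A", "B", "C", "A", "D", "B"),
  stringsAsFactors = FALSE
)

# Scanning forwards flags later copies; scanning backwards flags earlier
# copies. Their OR is TRUE for every row whose ID appears more than once.
isRepeated <- duplicated(Genetics$patientID) |
  duplicated(Genetics$patientID, fromLast = TRUE)

repeatedPatients <- Genetics[isRepeated, ]
# Optional: group the rows of each patient together.
repeatedPatients <- repeatedPatients[order(repeatedPatients$patientID), ]

# The complement: patients that appear exactly once.
singlePatients <- Genetics[!isRepeated, ]

print(repeatedPatients)
print(singlePatients)
```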

Any thoughts would be appreciated.
