Wednesday, 20 November 2019

For each row in Pandas dataframe, check if row contains string from list

I have a list of strings, like this:

List=['plastic', 'cardboard', 'wood']

I have a column of dtype string in my dataframe, like this:

Column=['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood']

For each row in the column, I want to know whether the row contains a word from the list. If it does, I want to keep only the text that comes before that word, like this:

New_Column=['beer', 'water', 'eggs', 'fruits']

Is there a way to systematize this for each row of my dataframe (millions of rows)? Thanks

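PS. I have tried building a function with regular expression pattern matching, like this:

import re

# Alternation of the listed words, escaped so that each one matches literally
pattern=re.compile(r'\b(?:' + '|'.join(map(re.escape, List)) + r')\b')

def truncate(row, pattern):
    Column=row['Column']
    # search() rather than match(): the word is not at the start of the string
    found=pattern.search(Column)
    if found:
        # keep only the text before the first listed word
        Column=Column[:found.start()].rstrip()
    # rows without a listed word are returned unchanged
    return Column

df['New_column']=df.apply(truncate,axis=1, pattern=pattern)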
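
With millions of rows, a row-wise apply is likely to be slow, so a vectorized string operation may be worth trying instead. The sketch below is one possible approach, not a definitive answer: it builds a single regular expression from the list and uses Series.str.replace to drop the first listed word and everything after it. The names truncate_before_words and words are hypothetical, the sample data is the one from the question (with 'cardboard' spelled as in the column), and it assumes the listed words should match as whole words only.

```python
import re

import pandas as pd


def truncate_before_words(series, words):
    """Keep only the text that precedes the first listed word in each string.

    Hypothetical helper for illustration; rows without a listed word
    are returned unchanged.
    """
    # Escape each word so it is matched literally, then join into one alternation.
    alternation = '|'.join(map(re.escape, words))
    # Optional whitespace, a whole listed word, then the rest of the string.
    pattern = r'\s*\b(?:' + alternation + r')\b.*'
    return series.str.replace(pattern, '', regex=True)


words = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame(
    {'Column': ['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood']}
)

df['New_Column'] = truncate_before_words(df['Column'], words)
print(df['New_Column'].tolist())
# ['beer', 'water', 'eggs', 'fruits']
```

Because the whole column is processed in one call, the pattern is built only once rather than per row. If partial matches inside longer words should also count, the two \b anchors can be removed.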
