I have a list of strings, like this:
List=['plastic', 'cardboard', 'wood']
I have a column of dtype string in my dataframe, like this:
Column=['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood']
For each row in the column, I want to check whether it contains a word from the list and, if it does, keep only the text that comes before that word, like this:
New_Column=['beer', 'water', 'eggs', 'fruits']
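The rule in the example above can be checked on the sample data with plain Python. This is a minimal sketch that makes a few assumptions the question does not state. The list words are spelled as they appear in the column (`'cardboard'`). Only whole words count, which is why the pattern uses `\b` word boundaries. The cut happens at the first list word found. A row with no list word is returned unchanged. `truncate_before_word` is a hypothetical helper name.

```python
import re

words = ['plastic', 'cardboard', 'wood']
column = ['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood']

# One alternation of all list words, escaped in case they contain regex metacharacters.
# Assumption: whole words only, so \b stops "wood" from matching inside "woodland".
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')

def truncate_before_word(text):
    """Hypothetical helper: return the text before the first list word.

    Assumption: text without any list word is returned unchanged.
    """
    m = pattern.search(text)  # search() scans the whole string; match() only tries position 0
    return text[:m.start()].rstrip() if m else text

new_column = [truncate_before_word(s) for s in column]
print(new_column)  # ['beer', 'water', 'eggs', 'fruits']
```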
Is there an efficient way to apply this to every row of my dataframe, which has millions of rows? Thanks.
P.S. I have tried writing a function based on regular expression matching, like this:
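import re
# Build one alternation from the list words, escaped in case they contain regex metacharacters
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, List)) + r')\b')
def truncate(row, pattern):
    text = row['Column']
    # search() scans the whole string; match() only tries at position 0
    m = pattern.search(text)
    if m:
        # keep only the text before the matched word instead of deleting the word
        text = text[:m.start()].rstrip()
    return text
df['New_column'] = df.apply(truncate, axis=1, pattern=pattern)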
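For millions of rows, one option is to replace the row-wise `df.apply(..., axis=1)` call, which builds a Series for every row, with the pandas string method `Series.str.extract`, which runs one regex over the whole column. This is a sketch, not a benchmarked solution. It assumes the column is named `'Column'` as in the question. It adds a hypothetical row `'milk'` to show the no-match case. As an assumption, a row with no list word keeps its original text. It should usually be considerably faster than `apply`, but it is worth timing on your own data.

```python
import re
import pandas as pd

words = ['plastic', 'cardboard', 'wood']
# Sample data from the question plus a hypothetical 'milk' row with no list word
df = pd.DataFrame({'Column': ['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood', 'milk']})

# Lazy group (.*?) captures everything before the first list word;
# \s* outside the group drops the space in front of that word
pattern = r'^(.*?)\s*\b(?:' + '|'.join(map(re.escape, words)) + r')\b'

# str.extract applies the regex to the whole column; rows without a match become NaN
extracted = df['Column'].str.extract(pattern, expand=False)

# Assumption: rows with no list word keep their original text
df['New_column'] = extracted.fillna(df['Column'])
print(df['New_column'].tolist())  # ['beer', 'water', 'eggs', 'fruits', 'milk']
```

Because the lazy group stops at the first list word and the preceding whitespace is matched outside the group, no separate `strip` step is needed.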