Thursday, 24 February 2022

How to identify string patterns across rows in a Pandas DataFrame when there are other values in between the rows

    Entity     Identified
0   Ref-Name   T.M.Chanika
1   Location   Kandy
2   Email      chanikalakmini23@gmail.com
3   Degree     Certificate Level of REVIT
4   Degree     Certificate Level of ADVANCED COMPUTER
5   Skill      Auto Cad
6   Degree     Certificate Level of ADVANCED COMPUTER
7   Skill      3D MODELLING using
8   Skill      Auto Cad
9   Institute  University of Moratuwa of Mechanical Engineering
10  Degree     NVQ Level 05 -Technical

Here I want to find patterns in the output of a NER model, for example a (Name, Location, Email) pattern and a (Degree, Institute) pattern. How can we do this using Pandas? If all the entities of a pattern appear in consecutive rows, we can use something like this:

pattern = ['Ref-Name','Location','Email']

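pat_i = [df.iloc[i-len(pattern):i]  # slice of rows forming the current window
         for i in range(len(pattern), len(df) + 1)  # for each window of len(pattern) consecutive rows, including the last one
         if df['Entity'].iloc[i-len(pattern):i].tolist() == pattern]  # keep the window if its labels match the pattern
pat_i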

[     Entity                 Identified
 0  Ref-Name                T.M.Chanika 
 1  Location                      Kandy
 2     Email  chanikalakmini23@gmail.com]

How can we identify a pattern when there are other values in between its entities, as with Degree and Institute?
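
One possible direction, offered as a minimal sketch and not a definitive answer: scan the Entity column once and treat the pattern as an ordered subsequence, skipping any rows that do not carry the next expected label. The helper name find_pattern_with_gaps is hypothetical, and the greedy rule (take the first Degree seen, then the first Institute after it) is an assumption; a different pairing rule, such as the nearest Degree before each Institute, may fit the data better.

```python
import pandas as pd

def find_pattern_with_gaps(df, pattern, column="Entity"):
    """Return one sub-DataFrame per in-order match of `pattern`.

    Hypothetical helper: rows whose label is not the next expected
    one are skipped, so other entities may sit between the matches.
    """
    matches = []
    picked = []  # index labels matched so far in the current attempt
    for idx, label in df[column].items():
        if label == pattern[len(picked)]:
            picked.append(idx)
            if len(picked) == len(pattern):
                matches.append(df.loc[picked])
                picked = []  # start looking for the next occurrence
    return matches

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Entity": ["Ref-Name", "Location", "Email", "Degree", "Degree", "Skill",
               "Degree", "Skill", "Skill", "Institute", "Degree"],
    "Identified": ["T.M.Chanika", "Kandy", "chanikalakmini23@gmail.com",
                   "Certificate Level of REVIT",
                   "Certificate Level of ADVANCED COMPUTER", "Auto Cad",
                   "Certificate Level of ADVANCED COMPUTER",
                   "3D MODELLING using", "Auto Cad",
                   "University of Moratuwa of Mechanical Engineering",
                   "NVQ Level 05 -Technical"],
})

degree_institute = find_pattern_with_gaps(df, ["Degree", "Institute"])
print(degree_institute[0])  # rows 3 (Degree) and 9 (Institute)
```

With the sample data this pairs the first Degree (row 3) with the Institute in row 9; the trailing Degree in row 10 has no Institute after it, so it is not reported. The same helper also returns rows 0 to 2 for ['Ref-Name', 'Location', 'Email'], since consecutive rows are just the case with no gaps.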
