Tuesday, January 14, 2020

How do I correct this regular expression and function in order to verify the correctness of a pandas column value pattern?

I would like to check that the strings in my pandas column follow a specific pattern. I want to do so with a function check_pattern and a regular expression. The data should consist only of digits, except that there is a dash after the first two digits. A correct value would be 08-15643. Wrong values would be, for example, 07-456d, 04-47897-1, 084564, etc.

Please take a look at the data and my code:

import re
import pandas as pd

str_list = ['19-123', '08-156445787', '08-156468787-1']
df = pd.DataFrame(str_list)
df.rename(columns={df.columns[0]: "Strings"}, inplace=True)

def check_pattern(Strings):
    is_correct_pattern = False
    pattern = re.compile("^[0-9]{2}'-'[0-9]")
    if pattern.match(Strings) == True:
        is_correct_pattern = True
    return is_correct_pattern

df['Correct_pattern'] = df['Strings'].apply(lambda x: check_pattern(x))

My output should be the original dataframe df with an additional column Correct_pattern. With the data given above, the result should be True, True, False for that column. If you have another idea for solving this, I am also interested :)
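
For reference, here is a minimal sketch of how the function could be corrected. It is not a definitive answer, and it assumes every value in the column is a string. Three things seem to go wrong in the code above. The quotes around the dash are part of the pattern, so it looks for a literal '-'. re.match returns a match object or None, so comparing it with == True never succeeds. And without an end anchor, trailing characters such as -1 would still be accepted. The name PATTERN is just an illustrative choice.

```python
import re

# Two digits, a dash, then one or more digits, and nothing else.
PATTERN = re.compile(r"[0-9]{2}-[0-9]+")

def check_pattern(value):
    """Return True if the whole string looks like '08-15643'."""
    # fullmatch returns a match object or None, so test against None
    # instead of comparing with True.
    return PATTERN.fullmatch(value) is not None

str_list = ['19-123', '08-156445787', '08-156468787-1']
results = [check_pattern(s) for s in str_list]
print(results)  # [True, True, False]
```

With the dataframe from the question, this would be applied as df['Correct_pattern'] = df['Strings'].apply(check_pattern). A vectorized alternative that avoids the helper function is df['Strings'].str.match(r'[0-9]{2}-[0-9]+$'), which returns a boolean Series directly.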
