I would like to check that the strings in my pandas column follow a specific pattern. I want to do so with a function check_pattern and a regular expression. The data should consist only of digits, except for a dash after the first two digits. A correct value would be 08-15643. Wrong values could be, for example, 07-456d, 04-47897-1, 084564, etc.
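Here is a minimal sketch of that rule as a regular expression. It assumes that at least one digit must follow the dash and that nothing else may appear in the string. Calling fullmatch on the compiled pattern only succeeds if the whole string matches, so trailing text such as the "d" in 07-456d or the "-1" in 04-47897-1 causes a mismatch. PATTERN is a name chosen for illustration.

```python
import re

# Assumption: exactly two digits, a dash, then one or more digits, and nothing else.
PATTERN = re.compile(r"[0-9]{2}-[0-9]+")

valid = ["08-15643"]
invalid = ["07-456d", "04-47897-1", "084564"]

for s in valid + invalid:
    # fullmatch() requires the entire string to match, so no ^/$ anchors are needed
    print(s, bool(PATTERN.fullmatch(s)))
```

[0-9] is used rather than \d because in Python 3, \d on str patterns also matches non-ASCII digits.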
Please take a look at the data and my code:
    import re

    import pandas as pd

    str_list = ['19-123', '08-156445787', '08-156468787-1']
    df = pd.DataFrame(str_list)
    df.rename(columns={df.columns[0]: "Strings"}, inplace=True)

    def check_pattern(Strings):
        is_correct_pattern = False
        pattern = re.compile("^[0-9]{2}'-'[0-9]")
        if pattern.match(Strings) == True:
            is_correct_pattern = True
        return is_correct_pattern

    df['Correct_pattern'] = df['Strings'].apply(lambda x: check_pattern(x))
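The function above has three problems. The quotes in "'-'" become part of the regex, so it looks for literal ' characters around the dash. match() returns a Match object or None and never the literal True, so "== True" is always false. The pattern is also not anchored at the end, so a suffix such as "-1" would still be accepted. Here is a minimal corrected sketch that keeps the check_pattern/apply approach, under the same assumption as before (two digits, a dash, then one or more digits):

```python
import re

import pandas as pd

# Two digits, a dash, then one or more digits (the dash is not wrapped in quotes).
PATTERN = re.compile(r"[0-9]{2}-[0-9]+")


def check_pattern(value: str) -> bool:
    # fullmatch() anchors at both ends and returns a Match object or None;
    # bool() turns that into True/False instead of comparing with == True.
    return bool(PATTERN.fullmatch(value))


str_list = ["19-123", "08-156445787", "08-156468787-1"]
df = pd.DataFrame({"Strings": str_list})
df["Correct_pattern"] = df["Strings"].apply(check_pattern)
print(df)
```

The lambda is not needed, because passing check_pattern directly to apply is equivalent.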
The output should be the original DataFrame df with an additional column Correct_pattern. For the data above, that column should contain True, True, False. If you have another idea for solving this, I am also interested :)
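One alternative is to skip the Python-level function and use pandas' vectorized string methods. This sketch assumes pandas 1.1 or later, where Series.str.fullmatch is available. The na=False argument makes missing values come out as False rather than NaN.

```python
import pandas as pd

df = pd.DataFrame({"Strings": ["19-123", "08-156445787", "08-156468787-1"]})

# Applies the regex to every element at once; fullmatch anchors at both ends.
df["Correct_pattern"] = df["Strings"].str.fullmatch(r"[0-9]{2}-[0-9]+", na=False)
print(df)
```

On older pandas versions, df["Strings"].str.match(r"[0-9]{2}-[0-9]+$") gives a similar result. str.match only anchors at the start, so the explicit $ is required, and $ also accepts a single trailing newline.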