When the user uploads a spreadsheet with headers, I parse it into JSON. Then, I need to match their fields with the fields in my database. For example, if they uploaded a fname
and lname
field, I'd recognize those, and concatenate them into the full name
field in my database.
My idea is to write patterns based on regex, column name, and column position. For example, a Last Name
has a header name that probably starts with "Last" or at least "L". The values probably have a regex pattern of [a-zA-Z]{2,20}
and the previous column's values (presumably first name) probably follows the same regex pattern. I'll keep my regex pattern simple & scan a sample of the records to rule out edge cases like "Jo Ann O'hoolihan".
Does this approach sound reasonable? Has anyone already created something similar? Of course, I'll have the user confirm that the algorithm guessed correctly, but I find it strange how little information I've found googling about the pattern. Any ideas, comments, or sources welcome!
Aucun commentaire:
Enregistrer un commentaire