lundi 1 juin 2015

Is a workflow appropriate to import CSV, find and replace, normalize, then insert into db

I have a simple but tedious requirement, to build a system that each month imports several csv files into a database.

Each CSV file has different fields, and needs different sets of rules i.e.

  • column mapping
  • validation (i.e. do I have the correct number of fields, are they of the right type?)
  • capitalization
  • string replacement
  • regepr replacement

All of the above will be defined either at the file level or for individual fields.

I would like to use php (but it's not mandatory).

Since the requirement is pretty common, and it's quite a lot of work (I need to also code an editor for the rules), and I don't like reinventing the wheel, I am wondering if a workflow - based approach could prove satisfactory, leaving the user to choose which sets of rules are to be applied for each condition. Otherwise, please help me find the words that best describe this set of requirements, I am not mothertongue and I'm having a very hard time finding out best practices and alternative approaches, or at least some design-pattern level code to best implement this.

Right now I'm coding a javascript-editor for the rules that dynamically creates rules in json-format such as:

var rule = {
  ruleName:"Import from Agrabah",
  senderEmail:"jack@example.com",
  fileValidations: {
    format:"UTF-8",
    columns:8
  },
  ignoreLines: 1,
  find: ["Sir","Doc"],
  replace: ["Sr","Dr"],
  fields: [{
    index:0,
    column:"Name",
    case:"ucword"
  },{
    index:1,
    column:"Last",
    case:"ucword"
  },{
    index:2,
    column:"Province",
    case:"uc",
    find: ['ps'],
    replace: ['pu']
  }]
}

These are then saved in the php backend and used to define how each received file should be processed.

Each csv line is converted to an object through these rules, using the column names as properties of the object. Once this is done, a simple script updates/inserts the database record.

If my approach requires improvements or doesn't follow best practices please mention it

Please do not close this question, I am not asking for an opinion on 'the best product', rather the approach, patterns that may help me achieve a great result.

Aucun commentaire:

Enregistrer un commentaire