Friday, 27 July 2018

How to organize the code for an analytics solution with different algorithms?

I'm currently building an analytics solution that will:

  1. read the data from CSV files
  2. clean, pre-process, and normalize some fields
  3. apply different algorithms depending on the case (k-means, sales forecasting, etc.)
  4. save the results as CSV files somewhere

I'm planning to have several standalone batch jobs with no dependencies among them, and each of them would run the four steps described above. Which design pattern would you use to implement this in Python? Strategy? Would you have a parent class called Analysis and use inheritance (like below)? Are there any better solutions available?

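class Analysis:
    """Parent class for one batch job; every method is a placeholder for a step."""

    def __init__(self, name):
        self.name = name

    def readData(self):
        # Step 1: read the data from CSV files.
        print("read data")

    def cleanData(self):
        # Step 2: clean, pre-process, and normalize some fields.
        print("clean data")

    def etl(self):
        print("etl")

    def process(self):
        # Step 3: apply the algorithm for this case (k-means, forecasting, ...).
        print("process")

    def saveResults(self):
        # Step 4: save the results as CSV files.
        print("save results")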
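
For comparison with the inheritance version, here is a minimal sketch of the Strategy alternative mentioned above. It is one possible shape, not a definitive design. The shared steps (read, clean, save) are plain functions, and the algorithm is passed in as a callable. All names here (run_job, total_sales, the "amount" column, the file names) are hypothetical. The cleaning step is only a placeholder, and the sketch assumes well-formed CSV files whose rows never have more fields than the header.

```python
import csv
from typing import Callable, Dict, List

Row = Dict[str, str]
# A "strategy" here is just a callable: cleaned rows in, result rows out.
Algorithm = Callable[[List[Row]], List[Row]]


def read_data(path: str) -> List[Row]:
    """Step 1: read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def clean_data(rows: List[Row]) -> List[Row]:
    """Step 2 (placeholder): strip whitespace and drop rows with no values."""
    cleaned = [{k: (v or "").strip() for k, v in row.items()} for row in rows]
    return [row for row in cleaned if any(row.values())]


def save_results(rows: List[Row], path: str) -> None:
    """Step 4: write result rows to a CSV file; nothing is written if empty."""
    if not rows:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def run_job(in_path: str, out_path: str, algorithm: Algorithm) -> List[Row]:
    """Run the four steps; only step 3 (the algorithm) varies between jobs."""
    rows = clean_data(read_data(in_path))
    results = algorithm(rows)
    save_results(results, out_path)
    return results


def total_sales(rows: List[Row]) -> List[Row]:
    """Hypothetical strategy: sum an assumed 'amount' column."""
    total = sum(float(row["amount"]) for row in rows)
    return [{"metric": "total_sales", "value": str(total)}]


# Hypothetical usage, one call per standalone batch job:
# run_job("sales.csv", "sales_total.csv", total_sales)
```

With this layout each batch job is a single run_job call with a different function, so adding k-means or a forecasting model means writing one new function rather than a new subclass. If jobs later need different read or clean steps, those could be passed in the same way.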
