Tuesday, April 16, 2019

Python Design Pattern: using class attributes to store data vs. local function variables

I often run into the same question. A common pattern is that I create a class that performs a sequence of operations, e.g. it loads data, transforms or cleans it, and saves it. The question then is how to pass or store the intermediate data between those steps. Consider the following two options:

# Hypothetical helper module; both functions are assumed to exist elsewhere.
from helpers import read_csv_as_string, store_data_to_database

class DataManipulator:
    ''' Intermediate data states are saved in self.results'''

    def __init__(self):
        self.results = None

    def load_data(self):
        '''do stuff to load data, set self.results'''
        self.results = read_csv_as_string('some_file.csv')

    def transform(self):
        ''' transforms data, eg get first 10 chars'''
        transformed = self.results[:10]
        self.results = transformed

    def save_data(self):
        ''' stores string to database'''
        store_data_to_database(self.results)

    def run(self):
        self.load_data()
        self.transform()
        self.save_data()

DataManipulator().run()

class DataManipulator2:
    ''' Intermediate data states are not saved but passed along'''


    def load_data(self):
        ''' do stuff to load data, return results'''
        return read_csv_as_string('some_file.csv')

    def transform(self, results):
        ''' transforms data, eg get first 10 chars'''
        return results[:10]

    def save_data(self, data):
        ''' stores string to database'''
        store_data_to_database(data)

    def run(self):
        results = self.load_data()
        transformed_results = self.transform(results)
        self.save_data(transformed_results)

DataManipulator2().run()

For testing, I find DataManipulator2 cleaner, because each method takes its input as an argument and returns its output. At the same time, I like the clean run function of DataManipulator. Which approach is more Pythonic?
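
To make the testing point concrete, here is a minimal sketch of what a test for transform might look like in each style. The classes StatefulManipulator and StatelessManipulator are hypothetical stand-ins for DataManipulator and DataManipulator2, cut down to transform only so the example runs without a CSV file or a database. The test function names and the sample string are also assumptions made for illustration.

```python
# Hypothetical stand-ins for the two classes above, reduced to transform()
# so the example runs without a CSV file or a database.
class StatefulManipulator:
    """Mirrors DataManipulator: transform() reads and writes self.results."""

    def __init__(self):
        self.results = None

    def transform(self):
        self.results = self.results[:10]


class StatelessManipulator:
    """Mirrors DataManipulator2: transform() takes input and returns output."""

    def transform(self, results):
        return results[:10]


def test_stateful_transform():
    manipulator = StatefulManipulator()
    # The test has to put the object into the right state first...
    manipulator.results = "abcdefghijklmnop"
    manipulator.transform()
    # ...and then inspect the attribute to see what happened.
    assert manipulator.results == "abcdefghij"


def test_stateless_transform():
    # Input and expected output are both visible in one expression.
    assert StatelessManipulator().transform("abcdefghijklmnop") == "abcdefghij"


# Run both tests directly; a test runner such as pytest would also find them.
test_stateful_transform()
test_stateless_transform()
```

In the stateful version the test depends on knowing which attribute transform reads and writes, while in the stateless version the method signature already says it.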
