Thursday, January 21, 2021

How to Create a scikit-learn or PyTorch-like Pipeline Design Pattern in Python?

I was studying the sklearn Pipeline object and the PyTorch nn.Module class. In both cases it is possible to compose other objects: in PyTorch by attaching them in the constructor of an nn.Module subclass, and in sklearn by adding a (name, object) tuple to the list passed to Pipeline. When the pipeline is executed, it runs the whole sequence of transformations in order, in both PyTorch and sklearn. What I understood from studying the sklearn Pipeline documentation is that each pipeline step should implement an interface with fit, transform and fit_transform methods. Once an object does that, it can be added to a Pipeline and run like any regular pipeline step. My question is: which design pattern is this, and how can I create my own module like that? The sklearn repository is so big that I am lost as to where to find what. For example, I can create a new custom transformer class such as:

import pandas as pd


class MultiColumnEncoder:
    def __init__(self, columns=None):
        """Multicolumn Encoder class encode the provided columns in to integer"""
        self.columns = columns

    def fit(self, df: pd.DataFrame, y=None):
        """Just a dummy fit"""
        # In a proper class definition, the fitting logic goes here
        ...
        # Return self (not a DataFrame) so that fit(df).transform(df) can be chained
        return self

    def transform(self, df: pd.DataFrame):
        """
        Transforms columns of X specified in self.columns using
        LabelEncoder(). If no columns specified, transforms all
        columns in X.
        """
        ...  # encoding logic elided; it produces transformed_df
        return transformed_df

    def fit_transform(self, df: pd.DataFrame, y=None):
        ...  # elided in the original
        return self.fit(df).transform(df)

Now, as it satisfies the Pipeline interface, I can use it in my pipeline along with other transformers:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

preprocess_pipeline = Pipeline([
    ('encoding', MultiColumnEncoder(columns=['column_1', 'column_2'])),
    ('stdscaler', StandardScaler())
])
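
To make the sklearn side of the question concrete, here is a minimal pure-Python sketch of a container that chains steps through a shared fit/transform interface. It is not scikit-learn's actual implementation; the names SimplePipeline, AddConstant and MeanCenter are hypothetical and exist only for illustration, and the data is a plain list rather than a DataFrame.

```python
class AddConstant:
    """Hypothetical stateless step: adds a fixed value to every item."""

    def __init__(self, value):
        self.value = value

    def fit(self, data, y=None):
        return self  # nothing to learn

    def transform(self, data):
        return [x + self.value for x in data]


class MeanCenter:
    """Hypothetical stateful step: learns the mean in fit, subtracts it in transform."""

    def fit(self, data, y=None):
        self.mean_ = sum(data) / len(data)
        return self

    def transform(self, data):
        return [x - self.mean_ for x in data]


class SimplePipeline:
    """Hypothetical container that runs (name, step) pairs in order."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, data, y=None):
        # Fit each step on the output of the previous one.
        for _, step in self.steps:
            data = step.fit(data, y).transform(data)
        return self

    def transform(self, data):
        # Pass the data through every already-fitted step.
        for _, step in self.steps:
            data = step.transform(data)
        return data

    def fit_transform(self, data, y=None):
        return self.fit(data, y).transform(data)


pipe = SimplePipeline([("shift", AddConstant(10)), ("center", MeanCenter())])
result = pipe.fit_transform([1, 2, 3])
print(result)  # [-1.0, 0.0, 1.0]
```

The container itself exposes the same fit/transform methods as its steps, so a SimplePipeline could be used as a step inside another SimplePipeline. The sketch relies only on duck typing: any object with fit returning self and a transform method can be plugged in.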

For PyTorch, I can add each sequential operation, such as convolution, dropout and batch norm, in the __init__ method, and the forward method will run them sequentially. So which design pattern is needed to design such an architecture? I have searched several resources but have not found the answer.
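
The PyTorch-style arrangement described above can be imitated in plain Python as well. The sketch below is an assumption-laden illustration, not PyTorch code: the base class Module and the layers Scale, Clip and Net are hypothetical stand-ins. The base class records any child module assigned as an attribute in __init__, and calling an instance dispatches to its forward method.

```python
class Module:
    """Hypothetical base class: records child modules assigned as attributes."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the registry itself.
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name, value):
        # Register children automatically when they are attached.
        if isinstance(value, Module):
            self._children[name] = value
        object.__setattr__(self, name, value)

    def __call__(self, x):
        return self.forward(x)

    def forward(self, x):
        raise NotImplementedError


class Scale(Module):
    """Hypothetical layer: multiplies every item by a factor."""

    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return [v * self.factor for v in x]


class Clip(Module):
    """Hypothetical layer: replaces negative items with zero."""

    def forward(self, x):
        return [max(v, 0) for v in x]


class Net(Module):
    def __init__(self):
        super().__init__()
        # Children are attached in the constructor...
        self.scale = Scale(2)
        self.clip = Clip()

    def forward(self, x):
        # ...and forward decides the order in which they run.
        return self.clip(self.scale(x))


net = Net()
output = net([-1, 2, 3])
child_names = list(net._children)
print(output)       # [0, 4, 6]
print(child_names)  # ['scale', 'clip']
```

Unlike the pipeline container, the order of execution here lives in forward rather than in a list of steps, which is what allows branching or reusing a child more than once.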

Thanks
