I was studying the sklearn Pipeline object and also the PyTorch nn.Module. In both cases it is possible to attach other objects: in PyTorch you assign submodules inside the constructor of your nn.Module subclass, and in sklearn you add (name, transformer) tuples to the list passed to Pipeline. When the pipeline is executed, it runs the whole sequence of transformations in order, both for PyTorch and sklearn. From studying the sklearn Pipeline docs, I understood that each pipeline step should implement an interface exposing fit, transform and fit_transform methods; once it does, the object can be added to a Pipeline and run like any other pipeline step. My question is: which kind of design pattern is this, and how can I create my own module like that? The sklearn repository is so big that I am lost about where to find what. For example, I can create a new custom transformer class such as:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

class MultiColumnEncoder:
    """Encodes the provided columns into integers."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, df: pd.DataFrame, y=None):
        # One possible implementation: fit a LabelEncoder per column
        # (all columns if none were specified).
        cols = self.columns if self.columns is not None else df.columns
        self.encoders_ = {col: LabelEncoder().fit(df[col]) for col in cols}
        return self  # returning self lets fit(df).transform(df) chain

    def transform(self, df: pd.DataFrame):
        """
        Transforms the columns of df specified in self.columns using the
        fitted LabelEncoders. If no columns were specified, transforms all
        columns of df.
        """
        out = df.copy()
        for col, enc in self.encoders_.items():
            out[col] = enc.transform(out[col])
        return out

    def fit_transform(self, df: pd.DataFrame, y=None):
        return self.fit(df, y).transform(df)
Now, since it satisfies the Pipeline interface, I can use it in my pipeline along with other transformers:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

preprocess_pipeline = Pipeline([
    ('encoding', MultiColumnEncoder(columns=['column_1', 'column_2'])),
    ('stdscaler', StandardScaler())
])
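For completeness, this is how I would expect to call it; the toy DataFrame below is just something I made up to show the call pattern:

import pandas as pd

# hypothetical data, only to illustrate the call pattern
df = pd.DataFrame({'column_1': ['a', 'b', 'a'],
                   'column_2': ['x', 'y', 'x']})

encoded_and_scaled = preprocess_pipeline.fit_transform(df)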
For PyTorch, I can add each sequential operation, such as convolution, dropout, and batch normalization, in the __init__ method, and the forward method will run them sequentially. So which kind of design pattern is needed to design such an architecture? I have searched several resources but have not found the answer.
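On the PyTorch side, here is a minimal sketch of the kind of module I mean; the specific layers and sizes are arbitrary, just for illustration:

import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        # submodules are attached in the constructor, like steps in a pipeline
        self.conv = nn.Conv2d(1, 8, kernel_size=3)
        self.bn = nn.BatchNorm2d(8)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        # forward() runs the attached modules sequentially
        return self.dropout(self.bn(self.conv(x)))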
Thanks