jeudi 11 juin 2015

Pattern for instantiating from list of (classes, args, kwargs) tuples

I have a set of (potentially quite complex) Python classes that perform a certain transformation of data, e.g. Principal Component Analysis (PCA). Let's call these blocks. Blocks are organized in pipelines which can have any sequence of blocks. Several pipelines are organized in what we call models.

I apply these models-with-pipelines-with-blocks in parallel to thousands of datasets. Therefore, I have thousands of instantiations of models (and pipelines, and blocks), since each dataset has its own associated parameters that are stored in the blocks (e.g., the principal components for the PCA).

Now, since I want to have a set of predefined pipelines of blocks, I define some as module attributes in my pipelines.py:

PIPELINE_A = [(Block1, args_block1, kwargs_block1),
              (Block2, args_block2, kwargs_block2),
              ...
              (BlockN, args_blockN, kwargs_blockN)]

The same goes for the models in my models.py:

MODEL_1 = {'in': (PIPELINE_A, args_pl_A, kwargs_pl_A),
           ...
           'out': (PIPELINE_B, args_pl_B, kwargs_pl_B)}

These can be considered pipeline/model configurations from which I can instantiate pipeline and model objects. I don't instantiate them in the module yet, because they should be different for all the datasets.

Because I always need a class name, arguments and keywords arguments, I have to keep these all stored in a tuple until I need them to instantiate the objects (e.g., [cl(*ar, **kw) for cl, ar, kw in MODEL_1). This is ugly, and I suspect there is a better way.

I wonder if anyone can recommend a design pattern or trick for this situation.

P.S. I realize I could instantiate a proto-pipeline/model in my modules and then make copies from that. Blocks can get really complex, though, and I wonder if Python handles deep copying well enough for that.

Aucun commentaire:

Enregistrer un commentaire