Saturday, December 29, 2018

Scikit-learn pipelines: how to deal with hyperparameters in a clean way

Let's suppose that I have a Pipeline in scikit-learn (pipe and filter pattern).

Now let's suppose that this Pipeline is wrapped in a class of its own and that it carries a pre-made list of hyperparameters, or a pre-made hyperparameter grid, as a static class constant. The class is still a pipeline, so it has .fit and .transform: it inherits from the Pipeline object, but at construction it already knows which pipeline steps to compose.
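To make the setup concrete, here is a minimal sketch of such a wrapped pipeline. The class name `ScaledPCA`, its two steps, and the `PARAM_GRID` constant are illustrative assumptions of mine; scikit-learn itself defines nothing called `PARAM_GRID`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ScaledPCA(Pipeline):
    """Hypothetical pre-composed pipeline: the steps are fixed at construction."""

    # Static class constant (an assumed convention, not a scikit-learn feature).
    # Keys follow scikit-learn's "<step name>__<parameter>" naming.
    PARAM_GRID = {"pca__n_components": [1, 2]}

    def __init__(self):
        # The class already knows which steps to compose.
        super().__init__(
            steps=[("scale", StandardScaler()), ("pca", PCA(n_components=2))]
        )


# Small synthetic dataset: 20 samples, 4 features.
X = np.random.RandomState(0).normal(size=(20, 4))

pipe = ScaledPCA()
X_reduced = pipe.fit(X).transform(X)  # behaves like any other Pipeline
print(X_reduced.shape)  # (20, 2)
```

The parameterless `__init__` keeps the sketch short. scikit-learn's own estimators expose every constructor argument through `get_params`, so a class like this one should be checked carefully before relying on cloning it.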

I now want pipelines of pipelines: one master pipeline that composes all of the sub-pipelines. In the end, I want to perform a hyperparameter search, so the master pipeline needs to get the hyperparameters of every sub-pipeline and add the string prefix as needed.

So the question is: how can I do this (optionally, in a clean way)? I feel that I am breaking some of the SOLID principles of Object-Oriented Programming (OOP) by having to pull one massive hyperparameter grid out of the pipelines recursively, prepending prefixes to the parameter names with __ separators.
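For reference, the recursive approach I am describing could look roughly like the sketch below. It illustrates the problem rather than recommending a solution. The helper `collect_param_grid`, the `PARAM_GRID` constant, and the `Features` and `Classifier` classes are all hypothetical names; only `Pipeline`, its `steps` attribute, and the `__` naming of nested parameters come from scikit-learn.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class Features(Pipeline):
    """Hypothetical sub-pipeline with its own grid as a class constant."""

    PARAM_GRID = {"pca__n_components": [1, 2]}

    def __init__(self):
        super().__init__(steps=[("scale", StandardScaler()), ("pca", PCA())])


class Classifier(Pipeline):
    """Hypothetical sub-pipeline wrapping a single final estimator."""

    PARAM_GRID = {"logreg__C": [0.1, 1.0]}

    def __init__(self):
        super().__init__(steps=[("logreg", LogisticRegression())])


def collect_param_grid(estimator, prefix=""):
    """Recursively merge PARAM_GRID constants, prefixing keys with step names."""
    # Grid declared on this object (empty if it declares none).
    own_grid = getattr(estimator, "PARAM_GRID", {})
    grid = {prefix + key: values for key, values in own_grid.items()}
    # Descend into the steps, extending the prefix with "<step name>__".
    for name, step in getattr(estimator, "steps", []):
        grid.update(collect_param_grid(step, prefix=f"{prefix}{name}__"))
    return grid


master = Pipeline([("features", Features()), ("classifier", Classifier())])
grid = collect_param_grid(master)
print(grid)
# {'features__pca__n_components': [1, 2], 'classifier__logreg__C': [0.1, 1.0]}

# Sanity check: every collected key is a parameter name the master recognises.
unknown_keys = set(grid) - set(master.get_params())
print(unknown_keys)  # set()
```

The resulting dictionary has the shape that the `param_grid` argument of `GridSearchCV` expects. The awkward part is visible in `collect_param_grid`: the master has to reach into every sub-pipeline and know about the naming convention, which is exactly the coupling I would like to avoid.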

Any suggestion will be appreciated. It's obvious that, in my thought experiment, something is wrong with the way hyperparameters are shared, and I'd like to fix that. Can you suggest a design pattern or a more usable way to do this?
