Saturday, 23 April 2022

Optimal design patterns for reading and restructuring datasets with different formats?

Problems:

  1. (I want to solve this) Homogenize image datasets that come in different formats: HDF5, image folders (with different structures), etc.

  2. (Just to give you context) The datasets are then concatenated, preprocessed by client code, and stored in an HDF5 file with a fixed, predefined structure.

My solution to 1:

Use the template method pattern, as the following pseudo-UML shows:

pseudo-UML diagram
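
A minimal Python sketch of the kind of structure the diagram describes may help make the question concrete. Only `ConcreteStructurizerFolder`, `ConcreteStructurizerHDF5`, and `reorganize()` come from the post. The base class name `Structurizer`, the steps `read()` and `to_common_format()`, and the config keys are hypothetical stand-ins, and the actual file reading is replaced by placeholders.

```python
from abc import ABC, abstractmethod


class Structurizer(ABC):
    """Hypothetical base class: fixes the order of the steps."""

    def __init__(self, cfg):
        self.cfg = cfg

    def reorganize(self):
        # Template method: the skeleton is identical for every dataset format.
        samples = self.read()
        return [self.to_common_format(s) for s in samples]

    @abstractmethod
    def read(self):
        """Return the raw samples of the dataset (format specific)."""

    def to_common_format(self, sample):
        # Hook with a default behaviour; subclasses may override it.
        return {"image": sample, "source": self.cfg["name"]}


class ConcreteStructurizerFolder(Structurizer):
    def read(self):
        # Placeholder: real code would walk a directory and load image files.
        return list(self.cfg["files"])


class ConcreteStructurizerHDF5(Structurizer):
    def read(self):
        # Placeholder: real code would open the HDF5 file and read its datasets.
        return list(self.cfg["records"])
```

With this layout, `ConcreteStructurizerFolder({"name": "d0", "files": [...]}).reorganize()` and the HDF5 variant return records of the same shape. Only `read()` differs between formats, which is the point of the template method.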

Drawbacks I have noticed in this solution to 1:

  1. The client code has to change every time a new dataset is added, because it has no way of knowing which ConcreteStructurizer to use for a given dataset. In other words, the client does something like this:

     if dataset_0: use ConcreteStructurizerFolder
         ConcreteStructurizerFolder(cfg_dataset_0).reorganize()
     ...
     if dataset_n: use ConcreteStructurizerHDF5
         ConcreteStructurizerHDF5(cfg_dataset_n).reorganize()

Could you propose a better/optimal approach/design pattern?

PS: I am learning software design (I have a physics background), so I'd be grateful for a pedagogical, well-explained answer. Thanks.
