Friday, 24 September 2021

Best Library or Object Oriented Design Pattern to Describe Dataflow Dependency in Python

I am writing code in Python 3. The project generates JSON files from input JSON files by taking them through multiple processing steps, each of which I have implemented as a separate function. Algorithmically speaking, nothing is complicated.

For example, one module loads the input as pandas DataFrames and some nested dictionaries. One function then does stage1 processing on the loaded input, and another does stage2 processing on the stage1 output. Stage3 processing then uses the output of stage2 as well as the loaded inputs.

Essentially, the computation diagram is a directed acyclic graph with one starting point and one endpoint, where each node is a function (which I have already written).
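To make the shape of that graph concrete, here is a minimal sketch that writes the dependencies down as a plain dictionary and asks the standard library for a valid execution order. It assumes Python 3.9 or later (for `graphlib`); the node names simply mirror the stages described above, and the dictionary layout is an illustrative choice, not a fixed convention.

```python
from graphlib import TopologicalSorter

# Each key is a node; its value is the set of nodes it depends on.
# Node names are illustrative and mirror the stages described above.
dependencies = {
    "input_reader": set(),
    "stage1": {"input_reader"},
    "stage2": {"stage1"},
    "stage3": {"stage2", "input_reader"},
}

# static_order() yields the nodes so that every node comes after its dependencies.
# It raises graphlib.CycleError if the graph is not acyclic.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Because every stage here depends on the one before it, only one order is possible; in a graph with independent branches, several orders would be valid.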

My question is: which object-oriented design pattern or ready-made library best captures this kind of dataflow? Without such a design, I have to code the whole thing as:

processed_input = input_reader(source_json)
intermediate_op1 = stage1(processed_input)
intermediate_op2 = stage2(intermediate_op1)
intermediate_op3 = stage3(intermediate_op2, processed_input)
...
# and so on

My intention is to describe the graph as an object, together with the raw input. The object would then execute the graph to compute the final output from the input source file names. I could probably write a class that executes the functions in sequence, but to initialize that class I would still have to supply the code above.
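One possible shape for such an object is sketched below, under stated assumptions: the `Pipeline` class, its `add` and `run` methods, and the stand-in stage functions are all hypothetical and written only for illustration, and `graphlib` requires Python 3.9 or later. The idea is that each node is registered with the names of the nodes it depends on, so the wiring becomes data rather than a fixed sequence of calls.

```python
from graphlib import TopologicalSorter


class Pipeline:
    """Hypothetical helper: stores a DAG of functions and runs it in dependency order."""

    def __init__(self):
        self._funcs = {}  # node name -> callable
        self._deps = {}   # node name -> ordered tuple of upstream node names

    def add(self, name, func, depends_on=()):
        self._funcs[name] = func
        self._deps[name] = tuple(depends_on)
        return self  # allow chained registration

    def run(self, **inputs):
        # Raw inputs are treated as nodes whose values are already known.
        results = dict(inputs)
        graph = {name: set(deps) for name, deps in self._deps.items()}
        for name in TopologicalSorter(graph).static_order():
            if name in results:
                continue
            if name not in self._funcs:
                raise KeyError(f"no function or input supplied for node {name!r}")
            # Upstream results are passed positionally, in the declared order.
            args = [results[dep] for dep in self._deps[name]]
            results[name] = self._funcs[name](*args)
        return results


# Stand-in stage functions (assumptions, not the real processing steps).
def input_reader(source_json):
    return [source_json]

def stage1(data):
    return data + ["stage1"]

def stage2(data):
    return data + ["stage2"]

def stage3(data, processed_input):
    return data + ["stage3"] + processed_input


pipeline = (
    Pipeline()
    .add("processed_input", input_reader, depends_on=["source_json"])
    .add("intermediate_op1", stage1, depends_on=["processed_input"])
    .add("intermediate_op2", stage2, depends_on=["intermediate_op1"])
    .add("intermediate_op3", stage3, depends_on=["intermediate_op2", "processed_input"])
)
results = pipeline.run(source_json="input.json")
print(results["intermediate_op3"])
```

The graph still has to be declared somewhere, but the declaration is now separate from the execution, so the same object can be rerun on different source files or inspected without calling any stage.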

Kindly let me know if the question or my intention is unclear; I will try to rephrase it.
