I am currently evaluating different design-pattern options for our pipelines. The Kedro framework seems like a good option (it allows a modular design pattern, visualization methods, etc.).
The pipelines should be built from many modules that either write their output to a file or pipe it to the next module (depending on a condition). In the second case (piping to the next module), Kedro falls short, as it reads the whole output into memory and then forwards it to the next step (or is there a possibility of Unix-style piping?). I am working with Big Data, so this is a deal-breaker for me.

Why is this workflow different from a usual Unix pipe? Unix pipes read in a certain buffer size and forward the data right away (I guess the buffer gets swapped to disk rather than kept entirely in memory?) -- see the sketch below for the kind of chunked hand-off I have in mind. I would appreciate it if you could point me to another framework that allows such functionality (I also don't mind implementing the design pattern from scratch).
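To make the desired behavior concrete, here is a minimal plain-Python sketch (not Kedro) of generator-based piping between modules, where each stage forwards one chunk at a time instead of materializing the full output; the file names, chunk size, and transformation are placeholders:

```python
from typing import Iterator


def read_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """First module: yield the input in fixed-size chunks instead of loading it all."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def transform(chunks: Iterator[bytes]) -> Iterator[bytes]:
    """Intermediate module: process each chunk as it arrives and forward it immediately."""
    for chunk in chunks:
        yield chunk.upper()  # placeholder transformation


def sink(chunks: Iterator[bytes], path: str) -> None:
    """Final module: write chunks to file as they arrive, like the downstream end of a pipe."""
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)


if __name__ == "__main__":
    # Only one chunk is held in memory at a time, similar to a Unix pipe buffer.
    sink(transform(read_chunks("input.dat")), "output.dat")
```

This is essentially what a Unix pipe gives me for free between processes; the question is whether a pipeline framework can express the same chunk-at-a-time hand-off between its nodes.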