Friday, August 21, 2020

How to design a data provider

I have an application made of many individual scripts. The output of each one is the input of the next. Each script reads data at the beginning and saves the modified data as its last step. In short (a rough sketch of this chain follows the list):

  • script1.py: reads MariaDB data into a DataFrame -> does stuff -> saves raw data to mysql.sql (SQLite3 format)
  • script2.py: reads the SQLite3 file -> does stuff -> saves raw data to data.txt (tab-separated values)
  • program3.exe: reads data.txt -> does stuff -> writes another.txt (tab-separated values)
  • script4.py: reads another.txt -> does stuff -> creates data4.csv
  • script5.py: reads data4.csv -> does stuff -> inserts MariaDB entries
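
A rough sketch of the chain as it runs today (purely illustrative; the commands and paths are placeholders for my real calls):

    import subprocess

    def run_pipeline():
        # Each step reads the previous step's output file and writes its own.
        subprocess.run(["python", "script1.py"], check=True)   # MariaDB -> mysql.sql (SQLite3)
        subprocess.run(["python", "script2.py"], check=True)   # mysql.sql -> data.txt (TSV)
        subprocess.run(["program3.exe"], check=True)           # data.txt -> another.txt (TSV)
        subprocess.run(["python", "script4.py"], check=True)   # another.txt -> data4.csv
        subprocess.run(["python", "script5.py"], check=True)   # data4.csv -> MariaDB

    if __name__ == "__main__":
        run_pipeline()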

What I am searching for and asking is: is there a design pattern (or other mechanism) for creating a data provider for a situation like this? The "data provider" should be an abstraction layer which (a sketch of what I imagine follows this list):

  • has different data source types (such as a MariaDB connection, CSV files, TXT files, and others) predefined, with an easy way to extend that list;
  • reads data from the specified source and delivers it to the given script/program (e.g. by executing the script with a parameter);
  • validates whether the output of each application part (each script/program) is valid, or takes over the task of generating this data.
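
To make the first two points concrete, here is a minimal sketch of the kind of abstraction I have in mind, assuming pandas is available; the class names and the registry are hypothetical, not an existing library:

    from abc import ABC, abstractmethod
    import sqlite3

    import pandas as pd

    class DataSource(ABC):
        """One predefined source/sink type (CSV, TSV, SQLite3, ...)."""

        @abstractmethod
        def read(self) -> pd.DataFrame: ...

        @abstractmethod
        def write(self, df: pd.DataFrame) -> None: ...

    class CsvSource(DataSource):
        def __init__(self, path, sep=","):
            self.path, self.sep = path, sep

        def read(self):
            return pd.read_csv(self.path, sep=self.sep)

        def write(self, df):
            df.to_csv(self.path, sep=self.sep, index=False)

    class SqliteSource(DataSource):
        def __init__(self, path, table):
            self.path, self.table = path, table

        def read(self):
            with sqlite3.connect(self.path) as con:
                return pd.read_sql_query(f"SELECT * FROM {self.table}", con)

        def write(self, df):
            with sqlite3.connect(self.path) as con:
                df.to_sql(self.table, con, if_exists="replace", index=False)

    # Registry of known source types; adding a new type means adding one entry.
    SOURCE_TYPES = {
        "csv": CsvSource,
        "tsv": lambda path: CsvSource(path, sep="\t"),
        "sqlite": SqliteSource,
    }

A MariaDB-backed source would follow the same interface, wrapping a database connection instead of a file path.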

In general "Data provider" would run script1.py with some parameter (dataframe?) in some sandbox, take over data before it is saved and prepare data for script2.py proper execution. OR it just could run script1.py with some parameter, wait for execution, check if output is valid, convert (if necessary) that output to another format and run script2.py with well-prepared data.

I have access to the Python script sources (script1.py ... script5.py) and I can modify them. I am unable to modify the source code of program3.exe, but it will always be a part of the whole process. What is the best way (or just a way) to design such a layer?
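
Because the Python scripts can be modified, I imagine refactoring each of them so the actual work is exposed as a DataFrame-in / DataFrame-out function, while program3.exe keeps its file-based contract and is simply wrapped; this is only a hypothetical sketch:

    import subprocess

    import pandas as pd

    def script2_step(df: pd.DataFrame) -> pd.DataFrame:
        # The existing logic of script2.py, extracted into a function the
        # data provider can call directly instead of going through files.
        ...
        return df

    def program3_step(df: pd.DataFrame) -> pd.DataFrame:
        # program3.exe cannot be changed, so the provider writes the input
        # file it expects, runs it, and reads the output file back.
        df.to_csv("data.txt", sep="\t", index=False)
        subprocess.run(["program3.exe"], check=True)
        return pd.read_csv("another.txt", sep="\t")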
