Saturday, May 16, 2020

How to design a multi-client preprocessing software pipeline using AWS?

My software goal is to automate a preprocessing pipeline. The pipeline has three stages (a minimal sketch of them follows this list):

  1. Fetching the data - either via an API or via the client uploading a CSV to an S3 bucket.

  2. Processing the data - the goal is to transform the data from the different clients into a single unified end schema.

  3. Storing the unified schema in a database. I know this is a very common kind of system, but I have failed to find the best design for it.
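To make the three stages concrete, here is a minimal sketch, assuming hypothetical names throughout: the bucket, the column mappings in `normalize`, and sqlite standing in for the real database are all placeholders for illustration, not a definitive design.

```python
import csv
import io
import sqlite3

import boto3


def fetch_csv_from_s3(bucket, key):
    """Stage 1: read a client-uploaded CSV from S3 into a list of row dicts."""
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return list(csv.DictReader(io.StringIO(body.read().decode("utf-8"))))


def normalize(rows, client_id):
    """Stage 2: map client-specific columns onto one unified schema.
    The column names used here are placeholders for illustration only."""
    return [
        (client_id, row.get("customer_id") or row.get("id"), row.get("amount"))
        for row in rows
    ]


def store(records, db_path="unified.db"):
    """Stage 3: persist the unified records (sqlite stands in for the real DB)."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS unified_records"
            " (client_id TEXT, customer_id TEXT, amount TEXT)"
        )
        conn.executemany("INSERT INTO unified_records VALUES (?, ?, ?)", records)


def run_pipeline(client_id, bucket, key):
    """Glue the three stages together for one client."""
    store(normalize(fetch_csv_from_s3(bucket, key), client_id))
```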

The requirements are:

  1. The system is not real time: for each client I plan to fetch the new data every X days, and it does not matter if the run finishes even a day later. (A sketch of how such a schedule could be wired up follows this list.)
  2. The processing part is unique per client's data; of course there are some common features, but there are also a lot of client-specific features and manipulations.
  3. I want the system to be fully automated.
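One way the "every X days" cadence could be wired up is with an EventBridge rule per client; a hedged sketch, where the rule name, target id, and function ARN are all placeholders:

```python
import json

import boto3

events = boto3.client("events")


def schedule_client_fetch(client_id, fetch_lambda_arn, every_days=7):
    """Create one EventBridge rule per client that fires every X days and
    passes the client id to the fetch Lambda in the event payload."""
    rule_name = "fetch-" + client_id
    events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate({} days)".format(every_days),
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "fetch-target",
            "Arn": fetch_lambda_arn,  # ARN of the fetch Lambda (placeholder)
            "Input": json.dumps({"client_id": client_id}),
        }],
    )
```

Note that in a real deployment the fetch Lambda would also need a resource policy (via lambda `add_permission`) allowing events.amazonaws.com to invoke it.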

I thought of the following :

  1. The Lambda solution: schedule a Lambda for each client that fetches the data every X days; that Lambda then triggers another Lambda that does the processing. But if I have 100 clients, that means maintaining 200 Lambdas, which would be awful to handle. (A sketch of this fetch-then-process chain appears after this list.)

  2. 2.1 The EC2/ECS solution: make a project called Api with a different fetch script for each client, and set up a schedule for each script on EC2 or ECS.

2.2 Have another project called Processing, where a parent class holds the common code and each client's subclass inherits from it; the Api script then activates the relevant processing class. (See the inheritance sketch after this list.)
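Here is a minimal sketch of the fetch-then-process Lambda chain from option 1, assuming hypothetical names and event shapes: "processing-lambda", the `client_id` payload field, and the S3 key layout are all illustrative.

```python
import json

import boto3

lambda_client = boto3.client("lambda")


def fetch_handler(event, context):
    """First Lambda: triggered on a per-client schedule. Fetches the new data
    for one client, then asynchronously invokes the processing Lambda."""
    client_id = event["client_id"]                   # passed via the schedule's Input
    raw_key = "raw/{}/latest.csv".format(client_id)  # illustrative key layout
    # ... call the client's API here, or locate the CSV they uploaded to S3 ...
    lambda_client.invoke(
        FunctionName="processing-lambda",  # hypothetical function name
        InvocationType="Event",            # async: fire and forget
        Payload=json.dumps({"client_id": client_id, "key": raw_key}),
    )


def processing_handler(event, context):
    """Second Lambda: normalizes one client's raw file into the unified schema."""
    client_id, key = event["client_id"], event["key"]
    # ... run the client-specific processing and store the result ...
```

Since the client id travels in the event payload, the two handlers can be deployed once and shared by every client; only the schedule rules are then per-client, which is one way to soften the 200-Lambda concern.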
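And a hedged sketch of the parent/subclass layout from option 2.2; the class and method names (BaseProcessor, client_specific, PROCESSORS) are hypothetical.

```python
from abc import ABC, abstractmethod


class BaseProcessor(ABC):
    """Parent class: holds the processing steps common to every client."""

    def run(self, rows):
        return [self.client_specific(self.common_clean(row)) for row in rows]

    def common_clean(self, row):
        # e.g. strip whitespace from every field -- shared by all clients
        return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

    @abstractmethod
    def client_specific(self, row):
        """Each client subclass implements its own manipulations here."""


class ClientAProcessor(BaseProcessor):
    def client_specific(self, row):
        row["amount"] = float(row.pop("total_usd", 0))  # illustrative mapping
        return row


# The Api script picks the right subclass by client id.
PROCESSORS = {"client_a": ClientAProcessor}


def process(client_id, rows):
    return PROCESSORS[client_id]().run(rows)
```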

In the end I am very confused about what the best practice is. I have only found examples that handle a single client, or general architecture approaches/block diagrams that are too broad. Since I know this is such a common kind of system, I would appreciate learning from others' experience. Any reference links or wisdom would be appreciated.
