Wednesday, 26 July 2017

Design for parallel processing in the system

In the current system, there are several long-running processes (and sub-processes) that run for every account in the system. Each process fetches an account's data and performs some operations on it (the data per account can range from a few thousand to millions of records). These are not one-time processes; they run again every few hours.

Earlier, a single program performed all of the above tasks, picking up one account at a time. That was not a desirable solution, since it would take days to complete the job for all accounts in the system. So we ended up with a separate program for each account, so that the accounts can be processed in parallel.

Now that the account data is growing day by day, we need to scale this further.

To scale this, we decided to design a producer/consumer queue architecture. For the queuing system I will be using RabbitMQ; I cannot switch to any alternative.
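As a minimal sketch of the producer/consumer shape described above, the following uses Python's standard-library `queue.Queue` as an in-process stand-in for a RabbitMQ queue; it does not talk to a broker. The names `process_record`, `produce`, `consume`, and `run_pipeline` are hypothetical, and the per-record work is a placeholder.

```python
import queue
import threading

SENTINEL = None  # marker that tells a consumer to stop


def process_record(record):
    # Placeholder for the real per-record work (assumption: doubles a value).
    account_id, value = record
    return account_id, value * 2


def produce(work_queue, records):
    # The producer only enqueues; it does no processing itself.
    for record in records:
        work_queue.put(record)


def consume(work_queue, results, lock):
    # Each consumer pulls records until it sees the sentinel.
    while True:
        record = work_queue.get()
        if record is SENTINEL:
            break
        result = process_record(record)
        with lock:
            results.append(result)


def run_pipeline(records, consumer_count):
    work_queue = queue.Queue()
    results = []
    lock = threading.Lock()
    consumers = [
        threading.Thread(target=consume, args=(work_queue, results, lock))
        for _ in range(consumer_count)
    ]
    for consumer in consumers:
        consumer.start()
    produce(work_queue, records)
    for _ in consumers:
        work_queue.put(SENTINEL)  # one stop marker per consumer
    for consumer in consumers:
        consumer.join()
    return results
```

With a real broker, the queue would live in RabbitMQ and the producer and consumers would typically be separate processes, but the division of responsibilities would be the same.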

There are two approaches I can think of right now:

  1. Build a single-queue design (there can be other queues for other sub-tasks) in which the data of all accounts is put on one queue, and consumers pick data from that queue and process it. The number of consumers can be increased to process the records faster. But I am not sure whether increasing the number of consumers will solve the problem. The consumers process the data in FIFO order, so data added to the queue last will always be picked up last. So this cannot ensure that the system performs better at the level of an individual account.

  2. Introduce multiple queues, one per account, so that dedicated consumers can be started for each account queue. Somehow I do not feel comfortable with this approach: the number of accounts is dynamic and will grow over time, and creating that many queues does not seem like a good solution to me.
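The trade-off between the two options can be illustrated with a small broker-free simulation. It is only a sketch under stated assumptions: every record takes one time unit, each consumer handles one record per tick, and the function names and sample data are hypothetical. It reports the tick at which each account's last record finishes.

```python
from collections import deque


def single_queue_finish_times(records, consumer_count):
    """Option 1: one shared FIFO queue drained by `consumer_count` consumers."""
    pending = deque(records)  # account ids, in arrival order
    finish = {}
    tick = 0
    while pending:
        tick += 1
        # Each consumer takes at most one record per tick.
        for _ in range(min(consumer_count, len(pending))):
            account = pending.popleft()
            finish[account] = tick  # overwritten until the account's last record
    return finish


def per_account_finish_times(records, consumers_per_account=1):
    """Option 2: one queue per account, each with its own dedicated consumers."""
    counts = {}
    for account in records:
        counts[account] = counts.get(account, 0) + 1
    # Ceiling division: ticks needed to drain each account's own queue.
    return {
        account: -(-count // consumers_per_account)
        for account, count in counts.items()
    }


# Hypothetical workload: a large account enqueued before a small one.
records = ["big"] * 6 + ["small"] * 2
print(single_queue_finish_times(records, consumer_count=2))  # {'big': 3, 'small': 4}
print(per_account_finish_times(records))                     # {'big': 6, 'small': 2}
```

In this sample both layouts use two consumers in total. The shared queue finishes the large account sooner but makes the small account wait behind it, while dedicated queues let the small account finish early and leave the large account limited by its own consumer.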

Are there any other suggestions for solving this problem?
