Thursday, 2 November 2017

RabbitMQ - parallel processing with strict ordering guarantees

At HappyFunPizzaCorp we have a POS system which generates two events: a new_order event and a payment event. Both events contain a pizza order_id key for cross-referencing.

Both events are published to an exchange, TheExchange. new_order events are always emitted before payment events.
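For concreteness, here is a minimal publisher sketch in Python using the pika client. The exchange name TheExchange and the order_id field come from the description above; the connection settings, the extra payload fields, and the use of the routing key to carry order_id are illustrative assumptions.

```python
import json
import pika

# Assumes a local broker and that TheExchange has already been declared
# (its type is discussed below). Only order_id comes from the post; the
# other payload fields are made up for illustration.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

def publish(event_type, order_id, **fields):
    channel.basic_publish(
        exchange="TheExchange",
        routing_key=str(order_id),  # the cross-referencing key
        body=json.dumps({"type": event_type, "order_id": order_id, **fields}),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent
    )

# new_order is always emitted before payment for the same order_id.
publish("new_order", 42, items=["margherita"])
publish("payment", 42, amount=9.50)
connection.close()
```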

We are a very busy place, selling, say, 100,000 pizzas per second, so processing all of the events serially is not an option.

So the question is: how do we parallelize the processing of our workload while still guaranteeing that the new_order event is processed before the payment event for the same pizza?

A simple queue with multiple consumers won't do: the broker round-robins deliveries between the consumers, so the payment event for a pizza can be processed concurrently with, or even before, its new_order event.

Another solution would be to use a sharded exchange with the order_id as the sharding key. That sounds good, but we now have inherent parallelism between queues: how do our consumers connect? We could fix a pre-defined set of queues up front, so that re-sharding never moves an order_id between queues and never reorders messages because of re-sharding. But then, if we have multiple consumer instances, how do they decide which queues to consume from? On top of that, we want to auto-scale our consumers.
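As a sketch of what the sharded-exchange option could look like: RabbitMQ's consistent-hash exchange plugin (rabbitmq_consistent_hash_exchange) hashes the routing key onto the bound queues, so publishing with routing_key set to the order_id keeps every event for one pizza in a single queue, in publish order. The queue names, queue count, and binding weights below are illustrative assumptions.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Requires the rabbitmq_consistent_hash_exchange plugin to be enabled.
channel.exchange_declare(
    exchange="TheExchange", exchange_type="x-consistent-hash", durable=True
)

NUM_QUEUES = 8  # fixed up front so order_ids never migrate between queues
for i in range(NUM_QUEUES):
    q = f"orders.shard.{i}"
    channel.queue_declare(queue=q, durable=True)
    # For x-consistent-hash, the binding key is a weight, not a pattern.
    channel.queue_bind(exchange="TheExchange", queue=q, routing_key="1")

connection.close()
```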

Our current solution is to use a consensus protocol (Raft/Paxos, via ZooKeeper) to determine how many queues, and which queues, each consumer process should service. We have a pre-set (high) number of sharded queues in the system, which should not change. The consumers of the queues are exclusive (so each queue has at most one consumer, which preserves ordering), and they coordinate via the consensus protocol which queues are serviced by whom.
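Here is a minimal sketch of the consumer side under that setup. The assigned_queues() helper is a hypothetical stand-in for the ZooKeeper-based coordination; the point is the exclusive flag, which makes the broker refuse a second consumer on the same queue and so keeps per-order processing strictly ordered.

```python
import json
import pika

def assigned_queues():
    # Placeholder: in the real system this assignment comes from the
    # consensus protocol, not a hard-coded list.
    return ["orders.shard.0", "orders.shard.1"]

def handle(ch, method, properties, body):
    event = json.loads(body)
    # Within one queue, new_order for an order_id is always seen before payment.
    print(f"processing {event['type']} for order {event['order_id']}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_qos(prefetch_count=1)

for q in assigned_queues():
    # exclusive=True: the broker rejects any second consumer on this queue,
    # so each shard is processed by exactly one consumer at a time.
    channel.basic_consume(queue=q, on_message_callback=handle, exclusive=True)

channel.start_consuming()
```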

While this setup seems like it will work, it appears overly complex, and I am wondering whether there is a reference solution we are missing.
