Monday, June 1, 2015

What is the best way to implement a fast, scalable statistics aggregation architecture?

The problem:

When displaying user statistics on our e-commerce website (e.g. sales/shopping analytics), we use a fan-in approach: certain flows in the system trigger an event to a RabbitMQ worker, which aggregates statistics per user in MongoDB. Most of the calculation is done at insert time, so retrieving statistics for display is trivial and very lightweight.
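For illustration, here is a minimal sketch of what such a worker might look like, assuming Python with the pika and pymongo libraries. The queue name, event fields, and database/collection names are hypothetical, not the actual ones from our system:

```python
import json

import pika
from pymongo import MongoClient

# Hypothetical names: the database, collection and event schema are assumptions.
mongo = MongoClient("mongodb://localhost:27017")
stats = mongo.analytics.user_stats

def on_event(channel, method, properties, body):
    """Fold one order event into the per-user aggregate document."""
    event = json.loads(body)
    # $inc is applied atomically on the MongoDB server, so concurrent
    # workers cannot lose each other's updates; upsert creates the
    # document lazily on the first event for a user.
    stats.update_one(
        {"_id": event["user_id"]},
        {"$inc": {"order_count": 1, "revenue": event["order_total"]}},
        upsert=True,
    )
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="user-stats-events", durable=True)
channel.basic_consume(queue="user-stats-events", on_message_callback=on_event)
channel.start_consuming()
```

With this in place, a page view only has to read one precomputed document per user instead of running aggregate queries.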

When an event is not received by the worker, the counters start to drift away from the true counts in MySQL.

Counters may drift due to:

  • Service outages
  • Bugs
  • Multi-worker synchronisation (two workers can update the same document concurrently, so updates must be atomic, e.g. MongoDB's $inc; see the sketch after this list)
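To make the atomicity point concrete, here is a short sketch (again assuming pymongo; the names are hypothetical) contrasting a lossy read-modify-write with the atomic $inc form:

```python
from pymongo import MongoClient

stats = MongoClient("mongodb://localhost:27017").analytics.user_stats

# BROKEN: read-modify-write. If two workers run this concurrently,
# both read the same old value and one increment is silently lost.
doc = stats.find_one({"_id": "user-42"})
stats.update_one({"_id": "user-42"},
                 {"$set": {"order_count": doc["order_count"] + 1}})

# SAFE: $inc is applied atomically on the server, so concurrent
# increments from multiple workers are never lost.
stats.update_one({"_id": "user-42"}, {"$inc": {"order_count": 1}})
```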

As the number of users/orders/messages keeps growing, computing these statistics on the fly with MySQL counts and joins becomes less and less scalable, so that is not an option. That is why we chose the fan-in approach in the first place.
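For context, this is the kind of on-the-fly query we would otherwise have to run per user on every page view. This is a sketch assuming mysql-connector-python, with a hypothetical schema; the point is that its cost grows with data volume, unlike a precomputed counter lookup:

```python
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="shop"
)
cursor = conn.cursor()

# Hypothetical schema: a JOIN plus aggregates like this scans ever more
# rows as the orders table grows, so latency rises with data volume.
cursor.execute(
    """
    SELECT COUNT(o.id), COALESCE(SUM(o.total), 0)
    FROM users u
    JOIN orders o ON o.user_id = u.id
    WHERE u.id = %s
    """,
    (42,),
)
order_count, revenue = cursor.fetchone()
cursor.close()
conn.close()
```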

What is the best solution to overcome this drift in a robust and scalable way?
