Monday, May 4, 2020

Storing raw data for complex calculations

We have a web application for reporting where all the data resides in a relational database. For most of the reports, we can do all the calculations with the built-in aggregate functions provided by the engine, or by pre-calculating and caching the results. Generating those reports for the users is very fast.

There are only two cases where we need to run a very complex, trial-and-error algorithm that cannot be expressed in SQL, and nothing can be pre-cached in the database: there are so many input variations that caching the results is impossible. The algorithm also requires all the data at once, so it cannot be divided and run in parallel, nor can the data be fetched in chunks.

So we currently prepare those reports by fetching the raw data from the database and running the logic in C#. This is slow, however, because we have to pull all of that data, currently up to 8 GB, to the backend. Also, many heavy concurrent requests can exhaust the available memory on the VMs.
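To put that 8 GB in perspective, a back-of-the-envelope row count follows from the field widths. Assuming each row is stored as two 8-byte numerics plus an 8-byte date (these widths are an assumption; SQL `numeric` mapped to C# `decimal` would be 16 bytes each), the data set works out to roughly 358 million rows:

```csharp
using System;

class MemoryEstimate
{
    static void Main()
    {
        // Assumed fixed-width layout: two 8-byte numerics + one 8-byte date (ticks).
        const int bytesPerRow = 8 + 8 + 8;               // 24 bytes per row
        const long totalBytes = 8L * 1024 * 1024 * 1024; // 8 GB
        long rows = totalBytes / bytesPerRow;
        Console.WriteLine(rows); // prints 357913941
    }
}
```

At that scale, even a modest per-row overhead (object headers, boxing, collection bookkeeping) multiplies into gigabytes, which is why the representation of the rows matters as much as the transfer itself.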

We are already fetching the minimum number of rows and columns (2 numeric fields and 1 date field) needed by the algorithm, so we cannot improve performance by reducing the size of the transferred data. We therefore tried to speed it up as a PoC by doing the following:

  • In-memory databases for caching the raw data (Redis, Memcached): while reading from the store was very fast, deserialization was a big problem, taking up to 9 seconds for the data we have. We tried multiple deserialization algorithms, but none of them were fast enough for this amount of data.
  • Storing the raw data in memory (local cache): this obviously gave the best possible performance, finishing the calculations in less than a second. However, it is not ideal: it does not scale, and according to many sources online it can even cause problems with garbage collection.
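On the first bullet, one way the deserialization cost could be attacked, if the Redis route were revisited, is to skip general-purpose serializers entirely: with only two numerics and a date per row, a fixed-width binary layout turns decoding into a tight copy loop over the buffer. This is a hypothetical sketch, not something from the PoC; the field widths (two `double`s plus `DateTime` ticks) are assumptions:

```csharp
using System;
using System.Buffers.Binary;

// Fixed-width binary layout per row: double, double, DateTime ticks = 24 bytes.
// The whole data set becomes one contiguous byte[] suitable for a Redis value.
public static class RowCodec
{
    public const int RowSize = 24;

    public static byte[] Encode(double[] v1, double[] v2, DateTime[] dates)
    {
        var buf = new byte[v1.Length * RowSize];
        for (int i = 0; i < v1.Length; i++)
        {
            var span = buf.AsSpan(i * RowSize);
            BinaryPrimitives.WriteDoubleLittleEndian(span, v1[i]);
            BinaryPrimitives.WriteDoubleLittleEndian(span.Slice(8), v2[i]);
            BinaryPrimitives.WriteInt64LittleEndian(span.Slice(16), dates[i].Ticks);
        }
        return buf;
    }

    public static (double[] V1, double[] V2, DateTime[] Dates) Decode(byte[] buf)
    {
        int n = buf.Length / RowSize;
        var v1 = new double[n];
        var v2 = new double[n];
        var dates = new DateTime[n];
        for (int i = 0; i < n; i++)
        {
            var span = buf.AsSpan(i * RowSize);
            v1[i] = BinaryPrimitives.ReadDoubleLittleEndian(span);
            v2[i] = BinaryPrimitives.ReadDoubleLittleEndian(span.Slice(8));
            dates[i] = new DateTime(BinaryPrimitives.ReadInt64LittleEndian(span.Slice(16)));
        }
        return (v1, v2, dates);
    }
}
```

Decoding into parallel arrays (rather than a list of row objects) also avoids per-row allocations, which is relevant to the GC concerns mentioned below.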

My question is whether there is an architectural recommendation for speeding up these reports to something close to local-cache speed, without the limitations that approach brings. We use .NET Core and SQL Server, if that helps.
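For reference, the local-cache PoC from the second bullet amounts to keying the loaded raw data by the report's input parameters in a process-local dictionary. A minimal sketch, with hypothetical names and a loader delegate standing in for the database fetch:

```csharp
using System;
using System.Collections.Concurrent;

// Process-local cache of raw report data, keyed by the report's input parameters.
// Fastest option in the PoC, but bounded by a single VM's memory and not shared
// across backend instances, which is the scalability concern raised above.
public sealed class LocalRawDataCache
{
    private readonly ConcurrentDictionary<string, (double[] V1, double[] V2, DateTime[] Dates)> _cache
        = new ConcurrentDictionary<string, (double[], double[], DateTime[])>();

    public (double[] V1, double[] V2, DateTime[] Dates) GetOrLoad(
        string key,
        Func<(double[], double[], DateTime[])> load)
    {
        // Loads at most once per key per process; later requests hit memory.
        return _cache.GetOrAdd(key, _ => load());
    }
}
```

The long-lived large arrays this holds end up on the large object heap, which is the garbage-collection concern the sources cited above are pointing at.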
