We are supposed to receive an hourly report (3h range) or a daily report (24h range). The report contains Entry records keyed by (reference, id1, id2) that need to be matched against Transaction records in our db, which carry the same (reference, id1, id2). When a match is found, we include that record in the final comparison: the amount given in the report versus the sum of the amounts of the matching records in our db.
We are using JEE 7, and so far we have come up with the following approach:
1. Parse/de-serialize the streamed XML Entry elements into Java objects using Jackson.
2. Once an in-memory threshold of 10K objects is reached, sum/filter them, serialize the result to JSON, and store it temporarily in the db (see the streaming/batching sketch after this list).
3. By the end of parsing we therefore have all Entry data filtered/summed and serialized to JSON in the db. From that point on we compare these Entry objects with our Transaction records in parallel.
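A minimal sketch of steps 1-2, assuming Jackson's jackson-dataformat-xml module, a report whose repeating element is named Entry, and a hypothetical Entry POJO plus BatchHandler callback (those names are mine, not from your code). It binds one Entry at a time from the StAX stream and flushes every 10K objects, so the full report is never held in memory:

import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class ReportParser {

    private static final int BATCH_SIZE = 10_000;

    // Hypothetical Entry DTO matching the report fields.
    public static class Entry {
        public String reference;
        public String id1;
        public String id2;
        public BigDecimal amount;
    }

    // Callback that sums/filters a batch, serializes it to JSON and stages it in the db.
    public interface BatchHandler {
        void flush(List<Entry> entries);
    }

    public void parse(String reportPath, BatchHandler handler) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        XmlMapper xmlMapper = new XmlMapper();
        List<Entry> batch = new ArrayList<>(BATCH_SIZE);

        try (FileInputStream in = new FileInputStream(reportPath)) {
            XMLStreamReader sr = xif.createXMLStreamReader(in);
            while (sr.hasNext()) {
                // Bind one <Entry> element at a time; the rest of the document stays on the stream.
                if (sr.next() == XMLStreamConstants.START_ELEMENT && "Entry".equals(sr.getLocalName())) {
                    batch.add(xmlMapper.readValue(sr, Entry.class));
                    if (batch.size() == BATCH_SIZE) {
                        handler.flush(batch);   // sum/filter, serialize to JSON, stage in db
                        batch.clear();
                    }
                }
            }
            if (!batch.isEmpty()) {
                handler.flush(batch);           // remaining tail of the report
            }
            sr.close();
        }
    }
}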
Here I want some optimization, because all we want is to compare Entry(reference, id1, id2) = Transaction(reference, id1, id2), and if a match is found we include that Transaction in our final calculation because it was part of the report. So essentially every (reference, id1, id2) combination should be unique.
I am reading about the ORA_HASH function and multi-column (composite) indexes to speed up this lookup. There is also the Oracle limitation of 1000 elements in a SQL IN clause (see here).
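One way to sidestep the IN-clause limit entirely is to stage the report keys in a staging (or global temporary) table and join on the composite key, which a composite index on Transaction(reference, id1, id2) can serve. The sketch below is only an illustration of that join approach: the table/column names entry_stage, transaction and transaction_id are assumptions, not your actual schema, and it reuses the Entry DTO from the parsing sketch above.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class TransactionLookup {

    // Stage the report keys with JDBC batching: one round trip per batch instead of one per key.
    public void stageKeys(Connection con, List<ReportParser.Entry> batch) throws Exception {
        String sql = "INSERT INTO entry_stage (reference, id1, id2) VALUES (?, ?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (ReportParser.Entry e : batch) {
                ps.setString(1, e.reference);
                ps.setString(2, e.id1);
                ps.setString(3, e.id2);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    // A single JOIN on the composite key replaces huge IN lists and can use the composite index.
    public List<Long> findMatchedTransactionIds(Connection con) throws Exception {
        String sql =
            "SELECT t.transaction_id " +
            "FROM transaction t " +
            "JOIN entry_stage s ON s.reference = t.reference AND s.id1 = t.id1 AND s.id2 = t.id2";
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                ids.add(rs.getLong(1));
            }
        }
        return ids;
    }
}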
We are also using EJB @Asynchronous methods to achieve this parallelism, and we still need a way to know when these async calls have finished without introducing a Thread.sleep delay.
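For knowing when the async work is done without sleeping, a JEE 7 @Asynchronous method can return a Future via javax.ejb.AsyncResult; the caller collects the Futures and blocks on get(). A rough sketch (bean and method names are made up, and in a real project each bean would live in its own public class/file):

import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

@Stateless
public class ComparisonService {

    // One async call per staged chunk of Entry data; returns its partial matched sum.
    @Asynchronous
    public Future<BigDecimal> compareChunk(List<ReportParser.Entry> chunk) {
        BigDecimal matchedSum = BigDecimal.ZERO;
        // ... look up matching Transaction rows for this chunk and accumulate their amounts ...
        return new AsyncResult<>(matchedSum);   // completes the container-managed Future for the caller
    }
}

@Stateless
class ReconciliationCoordinator {

    @EJB
    private ComparisonService comparisonService;

    // Fan out the chunks, then wait on the Futures instead of sleeping.
    public BigDecimal reconcile(List<List<ReportParser.Entry>> chunks) throws Exception {
        List<Future<BigDecimal>> results = new ArrayList<>();
        for (List<ReportParser.Entry> chunk : chunks) {
            results.add(comparisonService.compareChunk(chunk));
        }
        BigDecimal total = BigDecimal.ZERO;
        for (Future<BigDecimal> f : results) {
            total = total.add(f.get());   // blocks until that chunk finishes; no polling, no Thread.sleep
        }
        return total;
    }
}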
I would like to know a few optimizations for this approach, and in general whether the approach is OK. After we have marked all the matching Transaction records on the RHS, step 4 is to query and sum all these amounts and compare the total with the one that came with the report; that concludes the process.
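For step 4, if each reconciliation run gets an id, the matched Transactions can be marked with a single UPDATE joined to the staging table and then summed by that run id; comparing the BigDecimal total with the report total finishes the run. Again a sketch with assumed names (report_run_id and the table names are my invention):

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FinalReconciliation {

    // Mark every Transaction that has a matching staged report key for this run.
    public void markMatched(Connection con, long runId) throws Exception {
        String sql =
            "UPDATE transaction t SET t.report_run_id = ? " +
            "WHERE EXISTS (SELECT 1 FROM entry_stage s " +
            "              WHERE s.reference = t.reference AND s.id1 = t.id1 AND s.id2 = t.id2)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, runId);
            ps.executeUpdate();
        }
    }

    // Step 4: sum the marked amounts and compare with the total carried by the report.
    public boolean totalsMatch(Connection con, long runId, BigDecimal reportTotal) throws Exception {
        String sql = "SELECT NVL(SUM(amount), 0) FROM transaction WHERE report_run_id = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, runId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getBigDecimal(1).compareTo(reportTotal) == 0;   // compareTo ignores scale differences
            }
        }
    }
}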