mercredi 13 novembre 2019

Spring Boot: parse paginated list of large number of db entries while deleting some of them

  • Technologies: backend - Spring Boot java 8, database - postgres, entities mapped via Spring Data Jpa/Hibernate
  • I have the task of querying a table with a large number (millions) of distinct entries and for each entry validate (via REST HTTP call) a certain case from an external system. If the validation case fails, the certain entry has to be removed.
  • This task has to be done for all entries of that table, so we cannot filter by anything. It also requires a more complex validation business logic so we have to do it in Java, not with only SQL scripts
  • Due to the large amount of distinct entries, one idea is to use PagingAndSortingRepository.
  • Therefore the java task would be executed in a dedicated thread, as follows:
    • for each page of n entries, execute the validation logic for each entry. If validation fails for that entry, it should be deleted from the database. Potential solutions:
      1. delete the entry directly - Open point: Let's say we removed 2 entries from the total of 10 from page 1, then we cannot continue with page 2 since now the pagination order has changed (2 entries from initially page 2 would now come to page 1). We can process page 1 again in order to make sure we didn't skip any entries, but now some entries would be processed twice (increased load on external system that is called for validation). Is there a way to modify/delete data retrieved in a paginated way such that it does not always break initial pagination?
      2. flag it for future deletion somehow so we don't modify initial pagination results - maybe add the unique id of the entry to a list and after we processed all pages, finally delete all entries mentioned in that list - Open point: The list could grow quite large and potentially cause memory issues.
      3. ??

What would be the most efficient solution considering the large dataset and considering the cost of each "validation check" on external system?

Aucun commentaire:

Enregistrer un commentaire