Tuesday, June 29, 2021

Database design issues due to concurrency in multi-node architecture

I will try my best to walk through the issue. Please let me know if anything needs clarification.

The problem arises in an application running in a distributed environment, where multiple nodes (on AWS) share a single persistent data store. In that datastore we have three tables:

instance_info (id, instance_details, ...)   -> legacy table
task_info (id, task_id, ...)    -> legacy table
new_table (id, instance_info_id, task_info_id, ...)

"id" in all the tables are the PKs of their respective tables. In new_table the column "task_info_id" is a foreign key to table task_info, and "instance_info_id" is to table instance_info.
On new_table, a unique constraint exists for columns instance_info_id & task_info_id, while the other tables don't consist of any unique constraint. The situation is that there could be multiple tasks running concurrently (on a single node) that try to insert data in the task_info table, leading to having multiple entries in the table (due to the absence of unique constraints).
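
For concreteness, here is a minimal DDL sketch of the schema as described above (assuming PostgreSQL syntax; column types, and any columns beyond those named in this post, are placeholders):

-- Sketch only; the "..." columns from the post are omitted.
CREATE TABLE instance_info (
    id               BIGSERIAL PRIMARY KEY,
    instance_details TEXT
    -- ... other legacy columns
);

CREATE TABLE task_info (
    id      BIGSERIAL PRIMARY KEY,
    task_id BIGINT
    -- ... other legacy columns; note there is no unique constraint on task_id
);

CREATE TABLE new_table (
    id               BIGSERIAL PRIMARY KEY,
    instance_info_id BIGINT NOT NULL REFERENCES instance_info (id),
    task_info_id     BIGINT NOT NULL REFERENCES task_info (id),
    -- ... other columns
    UNIQUE (instance_info_id, task_info_id)
);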

I have thought of the following approaches, both of which seem bad (a rough SQL sketch of each appears after this list):

  1. Add a unique constraint to the task_info table, so that duplicate inserts fail with an IntegrityConstraintViolation. However, this would break an existing retry mechanism in the legacy code.
  2. Lock the whole table during write operations, blocking all other writes. However, this could create performance problems for us.
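
For illustration, the two rejected approaches would look roughly like this (again assuming PostgreSQL; the constraint name and the inserted value are placeholders):

-- Approach 1: a unique constraint on task_info, so that a duplicate insert
-- fails with an integrity-constraint violation. This breaks the legacy retry
-- mechanism, which (as far as I can tell) relies on inserting a fresh row
-- per attempt.
ALTER TABLE task_info
    ADD CONSTRAINT uq_task_info_task_id UNIQUE (task_id);

-- Approach 2: serialize writers by locking the whole table inside each
-- inserting transaction. EXCLUSIVE mode blocks all concurrent writes
-- (reads still proceed) until the transaction commits.
BEGIN;
LOCK TABLE task_info IN EXCLUSIVE MODE;
INSERT INTO task_info (task_id) VALUES (42);
COMMIT;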

There is a distributed cache (Memcache) available, but it seems risky: if the cache goes down, we could lose data.
Is there an efficient design approach (with minimal or no changes to the legacy code/design) that we could look into?
