I will try my best to walk through the issue; please let me know if anything needs clarification.
The problem arises in an application running in a distributed environment where multiple nodes (on AWS) share a single persistent data store. In that data store we have three tables:
instance_info (id, instance_details, ...) -> legacy table
task_info (id, task_id, ...) -> legacy table
new_table (id, instance_info_id, task_info_id, ...)
"id" in all the tables are the PKs of their respective tables. In new_table the column "task_info_id" is a foreign key to table task_info, and "instance_info_id" is to table instance_info.
On new_table, a unique constraint exists for columns instance_info_id & task_info_id, while the other tables don't consist of any unique constraint. The situation is that there could be multiple tasks running concurrently (on a single node) that try to insert data in the task_info table, leading to having multiple entries in the table (due to the absence of unique constraints).
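To make the race concrete, here is a minimal sketch of the check-then-insert pattern that produces the duplicates. The JDBC access style and the task_id lookup column are simplifications for illustration, not the actual legacy code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TaskInfoDao {

    // Returns the existing row's id, or inserts a new row and returns its id.
    // Without a unique constraint on task_id, two concurrent callers can both
    // pass the SELECT check before either INSERT commits, leaving duplicates.
    public long findOrCreateTask(Connection conn, String taskId) throws SQLException {
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT id FROM task_info WHERE task_id = ?")) {
            select.setString(1, taskId);
            try (ResultSet rs = select.executeQuery()) {
                if (rs.next()) {
                    return rs.getLong("id"); // someone already inserted it
                }
            }
        }
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO task_info (task_id) VALUES (?)",
                Statement.RETURN_GENERATED_KEYS)) {
            insert.setString(1, taskId);
            insert.executeUpdate();
            try (ResultSet keys = insert.getGeneratedKeys()) {
                keys.next();
                return keys.getLong(1);
            }
        }
    }
}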
I have thought of the following approaches, both of which seem bad:
- Add a unique constraint to the task_info table so that duplicate inserts fail with an integrity constraint violation (a rough sketch follows after this list). However, this breaks existing retry functionality in the legacy code.
- Lock the whole table during write operations so that concurrent writes are blocked. However, this could create performance issues for us.
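For illustration, the first (rejected) approach would look roughly like the sketch below. It assumes a unique constraint on task_info.task_id and plain JDBC; the exact exception type can differ depending on the driver or ORM in use:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class UniqueInsertSketch {

    // With UNIQUE(task_id) in place, the second concurrent insert fails fast.
    // The downside in our case: the legacy retry mechanism re-runs the same
    // insert and now hits this exception instead of simply adding another row.
    public static boolean tryInsertTask(Connection conn, String taskId) throws SQLException {
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO task_info (task_id) VALUES (?)")) {
            insert.setString(1, taskId);
            insert.executeUpdate();
            return true;  // we created the row
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            return false; // another task (or a retry) got there first
        }
    }
}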
There is also a distributed cache (Memcached), but relying on it seems doubtful, since cache downtime would lose whatever state it holds.
Is there an efficient design approach (with minimal or no changes to the legacy code/design) that can address this?
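For completeness, a cache-based guard would presumably rely on Memcached's atomic add operation, roughly as in this hypothetical sketch (using the spymemcached client; the key name and expiry are invented for illustration). The concern above is that if the cache node goes down, these guard keys vanish and duplicates become possible again:

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class CacheGuardSketch {

    // Memcached "add" only succeeds for the first caller of a given key, so
    // only that caller would proceed to insert into task_info.
    public static boolean acquireGuard(MemcachedClient cache, String taskId) throws Exception {
        return cache.add("task-guard:" + taskId, 3600, "1").get();
    }

    public static void main(String[] args) throws Exception {
        MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        if (acquireGuard(cache, "task-42")) {
            // safe to insert into task_info here, but only while the cache is up
        }
        cache.shutdown();
    }
}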