I have a server that is sending commands to other servers, and I want to design an acknowledgement system so that my server knows whether the commands were executed.
In most cases this is easy: the other server can just send back a message when the command completes or terminates abnormally.
The part I'm stuck on is what happens if one of the servers goes offline after receiving the command. Some of these commands can take a very long time to execute (around 20 minutes), and a server that recovers from a crash does not resume the command it was running.
That means I can't just keep a queue of unacknowledged commands and consider them failed after a fixed amount of time, because a legitimately running command can take a long time depending on the situation.
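To make the problem concrete, here is a minimal sketch of the fixed-timeout queue I am ruling out. The class name `PendingCommands`, the command ID `"long-job"`, and the 5-minute timeout are all made up for illustration; they are not part of my actual system.

```python
from dataclasses import dataclass, field


@dataclass
class PendingCommands:
    """Hypothetical tracker: a command is presumed failed after timeout_s."""
    timeout_s: float
    sent_at: dict = field(default_factory=dict)  # command_id -> send time (s)

    def send(self, command_id, now):
        # Remember when the command was dispatched.
        self.sent_at[command_id] = now

    def ack(self, command_id):
        # The worker reported completion or abnormal termination.
        self.sent_at.pop(command_id, None)

    def expired(self, now):
        # Every command that has gone unacknowledged for longer than timeout_s.
        return [cid for cid, t in self.sent_at.items()
                if now - t > self.timeout_s]


# Assumed numbers: a 5-minute timeout and a command that needs about 20 minutes.
pending = PendingCommands(timeout_s=5 * 60)
pending.send("long-job", now=0.0)

# Ten minutes in, the worker is still healthy and busy, yet the command
# is already reported as failed.
false_failures = pending.expired(now=10 * 60)
```

Raising the timeout above the longest possible run time avoids the false failure, but then a crash one minute into the command goes unnoticed until the full timeout has elapsed.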
I also can't just ping the servers to see if they are healthy: a server that crashed and has since recovered will answer the ping normally, even though the command it was running is gone.
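A toy simulation of why the ping tells me nothing, assuming the worker keeps its running commands only in memory. `Worker`, `crash_and_restart`, and `ping` are hypothetical names for this sketch, not my real API.

```python
class Worker:
    """Hypothetical worker whose in-flight commands live only in memory."""

    def __init__(self):
        self.running = set()

    def receive(self, command_id):
        # Start executing the command.
        self.running.add(command_id)

    def crash_and_restart(self):
        # In-memory state is lost; the command is not resumed after restart.
        self.running = set()

    def ping(self):
        # A plain health check: says nothing about past crashes.
        return "ok"


worker = Worker()
worker.receive("long-job")
ping_before = worker.ping()

worker.crash_and_restart()
ping_after = worker.ping()

# Both pings look identical, but the command is no longer running.
command_lost = "long-job" not in worker.running
```

From the sender's side, the two ping responses are indistinguishable, so a health check alone cannot separate "still working on it" from "crashed, came back, and forgot it".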
Currently, there's no way to check on the server's current activity either.
Ideally, I'd like to avoid polling to reduce overhead.
Are there any design patterns that will work for this?