I have an API with a series of endpoints that all perform very long-running jobs, some of which may take up to 48 hours to complete.
Of course, I can’t keep the client waiting 48 hours for a response, so I am looking for the best way to handle these cases.
I have an idea of what to do, but I am unsure whether it is a worthwhile solution or how this is usually done in production applications. Furthermore, I’d like to implement a way to cancel jobs while they are running, if need be, and to monitor their overall progress.
My current setup works as follows:
1. The API receives the request to start a long job.
2. An entry for the job info is stored in the DB with the job status set to PENDING.
3. The job info is placed in a RabbitMQ message and sent off by the RabbitMQ producer.
4. The job id is returned to the client that initiated the API call, with a 202 Accepted status.
5. A RabbitMQ consumer receives the message with the job info and calls the class responsible for executing the long-running job; the job status is updated to IN-PROGRESS.
6. The client can now check on the status of the job through another endpoint that accepts the job id and returns the current status and info.
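To make the flow above concrete, here is a minimal sketch of it in Python. It is only an illustration under stated assumptions: an in-memory dict stands in for the jobs table, `queue.Queue` stands in for RabbitMQ, and `start_job`, `consume_one`, `get_status`, and `run_long_job` are hypothetical names, not part of my actual code. The status is written as IN_PROGRESS in the sketches only because it is easier to type as an identifier.

```python
import queue
import uuid

jobs = {}                   # stand-in for the jobs table: job_id -> record
job_queue = queue.Queue()   # stand-in for the RabbitMQ queue


def start_job(payload):
    """Steps 1-4: store the job as PENDING, publish it, reply 202 with the id."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "PENDING", "payload": payload}
    job_queue.put(job_id)
    return 202, {"job_id": job_id}


def run_long_job(payload):
    """Hypothetical placeholder for the real long-running work."""
    return sum(payload)


def consume_one():
    """Step 5: take one message, mark it IN_PROGRESS, run it, record the outcome."""
    job_id = job_queue.get()
    record = jobs[job_id]
    record["status"] = "IN_PROGRESS"
    try:
        record["result"] = run_long_job(record["payload"])
        record["status"] = "COMPLETE"
    except Exception as exc:
        record["status"] = "FAILED"
        record["error"] = str(exc)
    finally:
        job_queue.task_done()


def get_status(job_id):
    """Step 6: the status endpoint, looked up by job id."""
    record = jobs.get(job_id)
    if record is None:
        return 404, {"error": "unknown job"}
    return 200, {"job_id": job_id, "status": record["status"]}
```

In the real setup the consumer would run in its own process and the record would live in the database, so the producer and consumer share nothing but the queue and the table.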
I think this approach works, and it seems scalable, but I have a few concerns that maybe someone could shed some light on or help me address:
A. I want to be able to kill a job from another exposed endpoint if need be. What is the best way of accomplishing this? I was thinking that I could also persist the thread id with the job info in step 5, when the service updates the status to IN-PROGRESS and begins processing. Then, when I hit the cancel-job endpoint, I could give it the job id and kill the associated thread. Is that a viable solution, or is there a better way to handle it?
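For comparison with the thread-id idea, one alternative I am aware of is cooperative cancellation: the cancel endpoint only records a request, and the job checks for it between units of work. The sketch below is a rough illustration, not a claim about the right design. It assumes the job can be split into steps, it uses a hypothetical in-process registry (`cancel_flags`), and `run_job` and `cancel_job` are made-up names. With consumers in separate processes, the flag would presumably have to live in the job row instead of in a `threading.Event`.

```python
import threading
import time

cancel_flags = {}   # job_id -> threading.Event; hypothetical in-process registry
statuses = {}       # job_id -> status string; stand-in for the jobs table


def run_job(job_id, total_steps):
    """Worker loop that checks for a cancel request between units of work."""
    flag = cancel_flags.setdefault(job_id, threading.Event())
    statuses[job_id] = "IN_PROGRESS"
    for _ in range(total_steps):
        if flag.is_set():
            statuses[job_id] = "CANCELLED"
            return
        time.sleep(0.01)  # placeholder for one unit of real work
    statuses[job_id] = "COMPLETE"


def cancel_job(job_id):
    """Cancel endpoint: record a cancel request instead of killing a thread."""
    flag = cancel_flags.get(job_id)
    if flag is None:
        return 404
    flag.set()
    return 202
```

The trade-off is that cancellation only takes effect at the next check, so a job with very long individual steps would react slowly.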
B. I would like to implement an update strategy that lets me quantify the overall progress of a job, so that instead of just seeing IN-PROGRESS or PENDING from the front end I can see the percentage of the job that is complete. The front end will be a desktop app, so eventually I’d like to use this info to drive a progress bar. However, I’m concerned about performance, because I will need to write to the job table every time the progress is incremented and also read from the table every time the client hits the status endpoint (I’m thinking every 10 seconds or so). Is there a better way to handle this, based on the given info?
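To illustrate the write-side concern, one option I have considered is throttling: persist the percentage only when it has moved by some minimum amount, or when enough time has passed since the last write. The sketch below is only an assumption-laden example. `ProgressReporter`, its thresholds, and the `write` callback (which would wrap the real UPDATE on the job table) are all hypothetical.

```python
import time


class ProgressReporter:
    """Persists progress only when it changed enough or enough time has passed."""

    def __init__(self, write, min_delta=1.0, min_interval=5.0, clock=time.monotonic):
        self._write = write                # callable that stores the percentage
        self._min_delta = min_delta        # minimum change, in percentage points
        self._min_interval = min_interval  # minimum seconds between small updates
        self._clock = clock
        self._last_value = None
        self._last_time = None

    def update(self, done, total):
        """Report progress; returns True if the value was actually persisted."""
        percent = 100.0 * done / total
        now = self._clock()
        if self._last_value is None:
            due = True  # always persist the first report
        else:
            changed = percent != self._last_value
            moved = percent - self._last_value >= self._min_delta
            stale = changed and now - self._last_time >= self._min_interval
            due = moved or stale or (changed and done >= total)
        if due:
            self._write(round(percent, 1))
            self._last_value = percent
            self._last_time = now
        return due
```

With `min_delta=1.0`, a job that reports 10,000 increments would write to the table roughly a hundred times instead of 10,000, which is still fine-grained enough for a progress bar polled every 10 seconds.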
If it makes any difference, only one job should be processing at a time. This is for an admin portal, so only a few people will have access to this feature, and if a job of a specific type is already IN-PROGRESS, then another of that type won’t be allowed until it is COMPLETE, CANCELLED, or FAILED.
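The one-job-per-type rule could be sketched as a check at submission time. This is a minimal illustration with assumptions of my own: PENDING is treated as active alongside IN_PROGRESS, a rejected request gets a 409 status, `job_rows` stands in for the jobs table, and `try_start` is a hypothetical name. The lock makes the check-and-insert atomic within one process. Against a real database, that guarantee would need to come from a transaction or a constraint instead.

```python
import threading

ACTIVE = {"PENDING", "IN_PROGRESS"}   # assumption: queued jobs also block new ones
job_rows = []                          # stand-in for the jobs table
_submit_lock = threading.Lock()        # keeps check-and-insert atomic in-process


def try_start(job_type):
    """Accept a new job only if no job of the same type is still active."""
    with _submit_lock:
        for row in job_rows:
            if row["type"] == job_type and row["status"] in ACTIVE:
                return 409, {"error": "job type already active", "job_id": row["id"]}
        row = {"id": len(job_rows) + 1, "type": job_type, "status": "PENDING"}
        job_rows.append(row)
        return 202, {"job_id": row["id"]}
```

Once the earlier job reaches COMPLETE, CANCELLED, or FAILED, the same call accepts a new job of that type.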