I have a list of jobs (more than 1000K jobs).
Sometimes my server crashes or the program hangs.
When I start executing the jobs again, processing starts from the beginning. The jobs run in multiple threads.
I want to develop a robust, standard solution so that job processing resumes from where it stopped.
If my approach is wrong, or there is a better or easier solution, please share links to docs on how to implement it.
I have never implemented this kind of solution before, so below are the steps I am thinking of implementing:
1) My program runs 50 threads
2) Sort the jobs in the list
3) Take the next 50 jobs from the list
4) When those 50 jobs have completed, write to a file (I submit the 50 jobs in parallel)
a. Create a .lock file
b. Write "completed: 50" to the .lock file
5) After each successful batch of 50, increment the completed count in the .lock file
6) If the server crashes, start the program again
7) The program first reads the .lock file to get the completed count
8) Resume processing from the job at that position in the sorted list
9) In case I have to process all the jobs from the beginning, I will provide a reset feature that sets the completed count in the .lock file back to 0
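
The steps above can be sketched roughly as follows. This is only a minimal illustration under some assumptions, not a definitive implementation: the file name `jobs.checkpoint` and the function names (`load_completed`, `save_completed`, `reset`, `run_all`) are hypothetical, jobs are assumed to be sortable, and `run_job` stands for whatever callable executes one job. The count is written to a temporary file and then renamed over the real file, so a crash in the middle of a write should not leave a half-written count behind.

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHECKPOINT = "jobs.checkpoint"  # hypothetical name; the post calls it .lock
BATCH_SIZE = 50                 # one thread per job in a batch, as in step 1


def load_completed(path=CHECKPOINT):
    """Return the number of jobs already completed, or 0 if no file exists."""
    try:
        with open(path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def save_completed(count, path=CHECKPOINT):
    """Write the count to a temp file, then atomically rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(count))
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    os.replace(tmp, path)


def reset(path=CHECKPOINT):
    """Step 9: force the next run to start from the first job."""
    save_completed(0, path)


def run_all(jobs, run_job, path=CHECKPOINT, batch_size=BATCH_SIZE):
    """Run jobs in sorted order, batch by batch, resuming from the checkpoint."""
    jobs = sorted(jobs)          # the order must be identical on every run
    done = load_completed(path)  # steps 7 and 8: skip what is already done
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(done, len(jobs), batch_size):
            batch = jobs[start:start + batch_size]
            # Consuming the results re-raises the first exception, if any,
            # so the checkpoint only advances when the whole batch succeeded.
            list(pool.map(run_job, batch))
            save_completed(start + len(batch), path)
    return load_completed(path)
```

One consequence of keeping a single counter: if the program dies partway through a batch, the whole batch is run again on restart, including jobs in it that had already finished. Each job therefore needs to be safe to run twice. If that is not acceptable, recording the ID of every completed job (for example in a database table) instead of one count is the usual alternative.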
I searched a few docs before posting this question but could not find the right ones.