I was asked this question in an interview:
A company is developing its own people management system and wants to migrate data into its own database. To do this, there are 3 APIs (getPersonalInfo, getPayrollInfo, getDomainAssociationInfo) that can provide the data, and each API costs $1 per successful response. There are 1 million employees (the interviewer's intention was just to convey that there is a huge amount of data). This is the approach that first came to my mind:
Use the executor framework to process records in parallel threads, and use Spring Batch, since a huge amount of data has to be read and written.
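Roughly what I have in mind, as a minimal sketch (fetchAllForId is a hypothetical placeholder for the three API calls, and the pool size is just a guess to be tuned):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;

public class MigrationDriver {

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(50); // pool size to be tuned

        // Submit one fetch task per employee id; the pool bounds how many run in parallel.
        LongStream.rangeClosed(1, 1_000_000)
                  .forEach(id -> pool.submit(() -> fetchAllForId(id)));

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS); // wait for all fetch tasks to finish
    }

    private static void fetchAllForId(long id) {
        // placeholder: call getPersonalInfo, getPayrollInfo, getDomainAssociationInfo here
    }
}
```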
For every id, all three APIs need to respond, and those responses need to be stored together. For that, maintain an object, say ProcessedData:
ProcessedData {id, PersonalAPIResponse{responseCode, data}, PayrollAPIResponse{responseCode, data}, DomainAssociationAPIResponse{responseCode, data}}
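One possible shape for it, assuming each API response carries an HTTP-style response code plus a payload (the field names are my own guesses):

```java
public record ProcessedData(
        long id,
        ApiResponse personalInfo,
        ApiResponse payrollInfo,
        ApiResponse domainAssociationInfo) {

    // Assumed shape of a single API response: an HTTP-style code plus the payload.
    public record ApiResponse(int responseCode, String data) {}

    // An id is ready to write only when all three calls returned 200.
    public boolean allSuccessful() {
        return personalInfo.responseCode() == 200
                && payrollInfo.responseCode() == 200
                && domainAssociationInfo.responseCode() == 200;
    }
}
```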
If the response code from all three APIs for a particular id is 200 (success), that id is added to a list, say 'prepareWriteList'; otherwise, if any API call fails, it is added to a queue, say 'retryExecuteQueue'.
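The routing step could look something like this (reusing the ProcessedData sketch above; the thread-safe collections are my assumption, since many fetch threads would report results concurrently):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

public class ResultRouter {

    private final List<ProcessedData> prepareWriteList = new CopyOnWriteArrayList<>();
    private final BlockingQueue<ProcessedData> retryExecuteQueue = new LinkedBlockingQueue<>();

    public void route(ProcessedData result) {
        if (result.allSuccessful()) {
            prepareWriteList.add(result);      // all three APIs returned 200
        } else {
            retryExecuteQueue.offer(result);   // at least one call failed; hand it to the retry side
        }
    }

    public List<ProcessedData> getPrepareWriteList() {
        return prepareWriteList;
    }

    public BlockingQueue<ProcessedData> getRetryExecuteQueue() {
        return retryExecuteQueue;
    }
}
```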
Another executor picks tasks from 'retryExecuteQueue', checks the response code in each API response object, treats anything that is not 200 as unsuccessful, and calls those APIs again to get the data. If the retry succeeds, the id is added to 'prepareWriteList'; otherwise it goes back into 'retryExecuteQueue'.
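A sketch of that retry worker (refetchFailedCalls is a hypothetical helper standing in for the real API client; it would re-call only the APIs whose stored response code is not 200):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;

public class RetryWorker implements Runnable {

    private final BlockingQueue<ProcessedData> retryExecuteQueue;
    private final List<ProcessedData> prepareWriteList;

    public RetryWorker(BlockingQueue<ProcessedData> retryExecuteQueue,
                       List<ProcessedData> prepareWriteList) {
        this.retryExecuteQueue = retryExecuteQueue;
        this.prepareWriteList = prepareWriteList;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                ProcessedData failed = retryExecuteQueue.take();    // blocks until work arrives
                ProcessedData retried = refetchFailedCalls(failed); // re-call only the non-200 APIs
                if (retried.allSuccessful()) {
                    prepareWriteList.add(retried);
                } else {
                    retryExecuteQueue.put(retried);                 // still failing: back on the queue
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private ProcessedData refetchFailedCalls(ProcessedData failed) {
        // placeholder: re-call only the APIs whose responseCode != 200 and rebuild ProcessedData
        return failed;
    }
}
```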
When 'prepareWriteList' reaches its maximum size, the ItemReader hands the chunk to the ItemWriter, which writes the data into the local DB.
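In Spring Batch terms that "max size" is the chunk size. A Spring Batch 4-style sketch (ProcessedDataDao is a hypothetical DAO for the local DB; in Spring Batch 5 StepBuilderFactory is replaced by StepBuilder, and a JdbcBatchItemWriter could replace the lambda writer):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WriteStepConfig {

    @Bean
    public Step writeStep(StepBuilderFactory steps,
                          ResultRouter router,         // assumed to be registered as a bean
                          ProcessedDataDao dao) {      // hypothetical DAO for the local DB
        return steps.get("writeStep")
                .<ProcessedData, ProcessedData>chunk(1000)                     // "max size" = chunk size
                .reader(new ListItemReader<>(router.getPrepareWriteList()))
                .writer(items -> dao.insertBatch(items))                       // one batched insert per chunk
                .build();
    }
}
```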
I want to know what a better approach could be. Also, I did not get any clue as to why the interviewer mentioned the cost of the APIs; as per his clarification, each call can fetch data for only one id at a time, and we are charged only for successful responses, not for failures.
Any better approach? Please suggest. Thanks in advance!