Tuesday, October 29, 2019

Design a data-source-independent application using batch data

We have a legacy application that reads data from MongoDB for each user (query results range from small to large depending on the user's request), creates a file for each user, and drops it on an FTP server or S3. We read the data through a MongoDB cursor and write each batch to the file as soon as it arrives, so file-writing performance is decent. The application works well, but it is tightly bound to MongoDB and the MongoDB cursor.
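
A minimal Python sketch of the streaming pattern described above. It assumes only that the cursor is an iterable of dict-like documents (a PyMongo cursor behaves that way, but here a generator stands in for it); `iter_batches`, `write_in_batches` and the CSV output format are hypothetical choices for illustration, not the application's actual code.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, Mapping, TextIO


def iter_batches(rows: Iterable[Mapping], batch_size: int) -> Iterator[List[Mapping]]:
    """Group an iterable (e.g. a database cursor) into lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def write_in_batches(rows: Iterable[Mapping], fields: List[str], out: TextIO,
                     batch_size: int = 1000) -> int:
    """Write each batch to `out` as soon as it is available; return the row count."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    count = 0
    for batch in iter_batches(rows, batch_size):
        writer.writerows(batch)  # only one batch is held in memory at a time
        out.flush()              # flush buffered output so the batch reaches the file now
        count += len(batch)
    return count


if __name__ == "__main__":
    # Hypothetical stand-in for a Mongo cursor over one user's query result.
    fake_cursor = ({"user": "u1", "n": i} for i in range(5))
    buf = io.StringIO()
    print(write_in_batches(fake_cursor, ["user", "n"], buf, batch_size=2))
    print(buf.getvalue())
```

The point of the structure is that the writer never needs the full result set, which is what keeps memory flat for large queries; the coupling to MongoDB lives entirely in whatever produces `rows`.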

Now we have to redesign this application because we need to support different data sources, e.g. MongoDB, PostgreSQL, Kinesis, S3, etc. So far we have considered the following ideas:

1. Build a data API for each source that exposes paginated REST responses. This is a feasible solution, but it might be slow for large query results compared to the current cursor-based approach.

2. Build a data abstraction layer by feeding batch data into Kafka and reading the batch data stream in our file generator. However, users usually ask for sorted data, so we would need to read the messages in sequence. We would lose the benefit of Kafka's high throughput and would need a lot of extra work to combine these data messages before writing them to the file.
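
The "extra work to combine" in option 2 is essentially a k-way merge. A minimal sketch, assuming each Kafka partition receives batch messages that are already sorted by the user's sort key (Kafka only guarantees ordering within a partition, so this sorting has to be done by the producer); plain lists stand in for partition consumers, and `flatten` and `merge_sorted_partitions` are hypothetical helpers, not part of any Kafka client.

```python
import heapq
from typing import Dict, Iterable, Iterator, List


def flatten(batches: Iterable[List[Dict]]) -> Iterator[Dict]:
    """Turn a stream of batch messages into a stream of individual rows."""
    for batch in batches:
        yield from batch


def merge_sorted_partitions(partitions: List[Iterable[List[Dict]]],
                            sort_key: str) -> Iterator[Dict]:
    """Lazily k-way merge rows from several partitions, each already sorted by sort_key."""
    streams = [flatten(p) for p in partitions]
    # heapq.merge pulls one row at a time from each stream; it never loads whole partitions.
    return heapq.merge(*streams, key=lambda row: row[sort_key])


if __name__ == "__main__":
    # Hypothetical partitions: each is a list of batch messages, sorted within the partition.
    p0 = [[{"ts": 1}, {"ts": 4}], [{"ts": 7}]]
    p1 = [[{"ts": 2}, {"ts": 3}], [{"ts": 9}]]
    for row in merge_sorted_partitions([p0, p1], "ts"):
        print(row)
```

The sketch also makes the trade-off visible: the merge can only emit the next row once every unfinished partition has delivered its next message, so the slowest partition sets the pace of the whole file.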

We are looking for a solution that replaces the current MongoDB cursor and makes our file generator independent of the data source.
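
One way to frame this goal is to make the file generator depend on a small interface instead of the MongoDB cursor. A minimal sketch, assuming every source can be adapted to yield batches of dict rows; `BatchSource`, `IterableSource`, `PaginatedSource`, `fetch_page` and `generate_file` are hypothetical names, and the paginated adapter assumes the API returns a "next" key (keyset-style pagination) rather than page numbers.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Protocol, Tuple

Row = Dict[str, object]


class BatchSource(Protocol):
    """Anything that can yield a user's query result as a sequence of row batches."""

    def batches(self) -> Iterator[List[Row]]: ...


class IterableSource:
    """Adapter for cursor-like sources (e.g. a Mongo cursor or a DB-API cursor wrapper)."""

    def __init__(self, rows: Iterable[Row], batch_size: int = 1000) -> None:
        self.rows = rows
        self.batch_size = batch_size

    def batches(self) -> Iterator[List[Row]]:
        batch: List[Row] = []
        for row in self.rows:
            batch.append(row)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:  # emit the final, possibly shorter, batch
            yield batch


# Hypothetical signature: fetch_page(after_key, limit) -> (rows, next_key);
# next_key is None on the last page.
PageFetcher = Callable[[Optional[object], int], Tuple[List[Row], Optional[object]]]


class PaginatedSource:
    """Adapter for a paginated data API (option 1)."""

    def __init__(self, fetch_page: PageFetcher, page_size: int = 1000) -> None:
        self.fetch_page = fetch_page
        self.page_size = page_size

    def batches(self) -> Iterator[List[Row]]:
        key: Optional[object] = None
        while True:
            rows, key = self.fetch_page(key, self.page_size)
            if rows:
                yield rows
            if key is None:
                return


def generate_file(source: BatchSource, write_batch: Callable[[List[Row]], None]) -> int:
    """File generator that depends only on the BatchSource interface."""
    total = 0
    for batch in source.batches():
        write_batch(batch)  # e.g. append to a local file before uploading to FTP/S3
        total += len(batch)
    return total


if __name__ == "__main__":
    data: List[Row] = [{"id": i} for i in range(5)]

    def fake_fetch(after: Optional[object], limit: int) -> Tuple[List[Row], Optional[object]]:
        # Hypothetical in-memory API: the "key" is simply the next offset.
        start = 0 if after is None else int(after)
        nxt = start + limit if start + limit < len(data) else None
        return data[start:start + limit], nxt

    written: List[Row] = []
    print(generate_file(IterableSource(iter(data), batch_size=2), written.extend))
    print(generate_file(PaginatedSource(fake_fetch, page_size=2), written.extend))
```

With this shape, each new source (PostgreSQL, Kinesis, S3 objects read line by line, and so on) needs only a new adapter. The design choice here is to push filtering and sorting down into each source, so the generator never has to reorder rows.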
