I have to write an API client system that connects to multiple API servers, does a job, and disconnects. It does two simple things, but needs to do them at scale (i.e. aiming for 200-500 million outbound API client calls per day):
(1) A simple client connects to an API server (HTTP/REST), sends a query, receives a text-based response, saves the response for later, and moves on to the next server/query.
Once responses start coming in, a separate process will:
(2) parse the text in the responses and add the results to a large file/queue for reporting.
I currently have a test system in C#: 20 console applications running on one machine, each with 20 client threads carrying out the work. I need to be able to scale this up on demand. What is the best approach? I am sure a solid pattern exists for this simple problem.
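To put the target in perspective, here is a rough back-of-envelope sizing using Little's law (average in-flight calls = call rate x latency). The 200 ms average round trip is purely an assumed figure for illustration, not a measurement, and the function names are hypothetical; the real latency would have to be measured against the actual API servers.

```python
# Back-of-envelope sizing using Little's law: in-flight calls = rate * latency.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# ASSUMPTION: hypothetical average round trip per call; measure the real value.
ASSUMED_LATENCY_S = 0.2


def calls_per_second(calls_per_day: float) -> float:
    """Average request rate needed to reach a daily call target."""
    return calls_per_day / SECONDS_PER_DAY


def in_flight(calls_per_day: float, latency_s: float) -> float:
    """Average number of concurrent requests implied by Little's law."""
    return calls_per_second(calls_per_day) * latency_s


if __name__ == "__main__":
    for target in (200e6, 500e6):
        rate = calls_per_second(target)
        conc = in_flight(target, ASSUMED_LATENCY_S)
        print(f"{target:,.0f}/day -> {rate:,.0f} calls/s, ~{conc:,.0f} in flight")
```

Under that assumed latency, the target works out to roughly 2,300-5,800 calls per second and a few hundred to a little over a thousand requests in flight on average. For comparison, 20 x 20 = 400 blocking threads at 200 ms per call would top out at about 2,000 calls per second, so the current test setup would sit just below the low end of the range if the assumption held.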
My thoughts so far are:
-> Design a management system that, depending on the volume of API servers to be queried in a given hour, orchestrates the provisioning of virtual machines (I am not trying to reinvent the wheel, and will hook into an existing framework such as Chef or Puppet if suitable).
-> Have a central system that collects data from the API clients (perhaps a Node instance passing the data off to RabbitMQ for later pickup/processing).
-> Have a separate management system that orchestrates the text parsing of the data received from the API clients.
-> As the project is bound by network latency, I believe the development language is not really relevant so long as it has good network support.
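The split between fetching and parsing described above can be sketched as a two-stage pipeline joined by a queue. This is only a minimal in-process illustration in Python: `asyncio.Queue` stands in for a real broker such as RabbitMQ, and `fetch`, `parse`, and the `server|query|status` response format are hypothetical placeholders, not part of any real API.

```python
import asyncio


async def fetch(server: str, query: str) -> str:
    """Placeholder for the real HTTP call; returns a fake text response."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return f"{server}|{query}|ok"


def parse(raw: str) -> dict:
    """Placeholder parser for the fake 'server|query|status' text."""
    server, query, status = raw.split("|")
    return {"server": server, "query": query, "status": status}


async def producer(jobs, queue: asyncio.Queue) -> None:
    # Stage 1: call each server and hand the raw response to the queue.
    for server, query in jobs:
        await queue.put(await fetch(server, query))
    await queue.put(None)  # sentinel: no more responses


async def consumer(queue: asyncio.Queue, report: list) -> None:
    # Stage 2: parse responses independently of the fetch stage.
    while (raw := await queue.get()) is not None:
        report.append(parse(raw))


async def run_pipeline(jobs) -> list:
    # A bounded queue applies back-pressure if parsing falls behind.
    queue: asyncio.Queue = asyncio.Queue(maxsize=1000)
    report: list = []
    await asyncio.gather(producer(jobs, queue), consumer(queue, report))
    return report


if __name__ == "__main__":
    jobs = [(f"server-{i}", "status") for i in range(5)]
    for row in asyncio.run(run_pipeline(jobs)):
        print(row)
```

With a real message broker in the middle, the two stages could run on separate machines and be scaled independently, which is the point of keeping the clients free of any parsing work.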
My main questions, then, are:
(1) What would be the most appropriate language/framework to implement this in, to keep the system lean and cost-effective? For example, there is no point in spinning up multiple Windows VMs if they have a bigger footprint/overhead/cost than doing the same thing on Linux. (In that case I could use the Mono framework: I would get the benefit of C#, which my team knows, at the lower cost of Linux VMs.)
(2) Is my thinking correct that I have to spin up multiple VMs to do this (albeit small VMs running X client applications each)?
(3) Another approach I thought of is to write the clients in JavaScript. The bottleneck for the API client is the network and the API server's response time, not client-side processing, so it might be well suited to async work. In that case, could one Node server run 100x more API clients than I could ever get out of even a bunch of micro Windows VMs?
(4) Finally, am I reinventing the wheel? Is there anything on Amazon or Azure already that I can plug into that would provide a ready framework for what I need?
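To make the async idea in question (3) concrete, here is a minimal sketch of one process driving many concurrent clients on a single event loop, with a semaphore capping the number of requests in flight. It uses Python's `asyncio` purely for illustration; the same pattern exists in Node and in C# `async`/`await`. `call_api` is a hypothetical placeholder that simulates latency instead of making a real HTTP request, and the limit of 500 is an arbitrary assumed value.

```python
import asyncio
import time


async def call_api(server: str) -> str:
    """Placeholder for a real HTTP request; only simulates the round trip."""
    await asyncio.sleep(0.05)  # ASSUMPTION: simulated network latency
    return f"response from {server}"


async def bounded_call(server: str, limit: asyncio.Semaphore) -> str:
    # The semaphore keeps the number of in-flight calls under the cap.
    async with limit:
        return await call_api(server)


async def run_clients(servers, max_in_flight: int = 500) -> list:
    limit = asyncio.Semaphore(max_in_flight)
    # gather() returns results in the same order as the input servers.
    return await asyncio.gather(*(bounded_call(s, limit) for s in servers))


if __name__ == "__main__":
    servers = [f"server-{i}" for i in range(2000)]
    start = time.perf_counter()
    results = asyncio.run(run_clients(servers))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} calls completed in {elapsed:.2f} s")
```

Because each waiting call costs only a small coroutine rather than an OS thread, the practical limits become sockets, bandwidth, and the remote servers' rate limits rather than thread count. Whether that amounts to 100x more clients per machine would need to be measured.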
All comments, suggestions, and guidance are most welcome.
Many thanks.