Wednesday, December 26, 2018

Maintaining the processing order of streaming events, along with other constraints

I have an events service that records all activities (viewed / submitted) by the user. Each event has a session_id along with other attributes such as page_id, url, page_type (viewed / submitted), etc.

I have the following problems that I need to cater for:

  1. Since a lot of events would be pushed, I want to write / push them somewhere in the fastest way possible.
  2. Event processing for different session_ids should be done in parallel. For events with the same session_id, though, the processing should be sequential. For example, a customer-payment event should be processed before a form-submitted event.
  3. Event processing is done by a separate service. This service exposes a URL where event data is pushed in order to be processed. Now, I don't want to overload this service with more requests than it can handle: if it can handle 2k requests concurrently, I should be able to limit my concurrent calls to no more than 2000.

Here is what I have been able to come up with till now.

For Problem 1:

I have a separate service that pushes events received from the browser to AWS DynamoDB. I can then enable Streams on the table. And by setting the keys properly when creating the table (partition key session_id, sort key created_at), I can ensure that event logs for a single session_id are stored in sorted order.

However, I don't know how to solve the other two problems. The solutions that I have in mind can solve either of the two but not both.

  1. I can set up a pooling service that ensures the total number of requests to the event-processing service doesn't exceed a certain amount. If the incoming requests exceed that, it will queue them and process them as soon as the event-processing server is free, i.e. the number of concurrent connections is less than 2000. But this solution will not ensure that events belonging to the same session_id are processed sequentially. If I have a pool limit of 2000 connections and 20 events for the same session, my pooling service will make all 20 requests to the event-processing service in parallel.
  2. I can have a service that spawns a new process per session_id when processing events, and route all events belonging to the same session_id to that single process. These processes need to be lightweight so that my service doesn't bloat when there are many concurrent sessions; I could write the service in Go or Erlang here. But this doesn't ensure that the event-processing service gets no more than the specified number of requests in parallel.
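The two approaches above aren't mutually exclusive: a lightweight worker per session_id keeps same-session events sequential, while a single counting semaphore caps the total number of in-flight calls across all sessions. Here is a minimal Go sketch of that combination under stated assumptions: processEvent is a stub standing in for the real HTTP call, and the limit is shrunk from 2000 to 2 for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// maxConcurrent caps in-flight calls to the event-processing service
// (2000 in the question; kept tiny for the sketch).
const maxConcurrent = 2

var (
	// sem is a counting semaphore: one buffered slot per allowed in-flight call.
	sem       = make(chan struct{}, maxConcurrent)
	mu        sync.Mutex
	processed []string // records completion order, for illustration only
)

// processEvent stands in for the POST to the event-processing service's URL.
func processEvent(sessionID, event string) {
	sem <- struct{}{}        // acquire a slot; blocks when maxConcurrent are in flight
	defer func() { <-sem }() // release the slot
	mu.Lock()
	processed = append(processed, sessionID+":"+event)
	mu.Unlock()
}

// sessionWorker drains one session's queue serially, so events of the
// same session_id are never processed concurrently with each other.
func sessionWorker(queue <-chan string, sessionID string, wg *sync.WaitGroup) {
	defer wg.Done()
	for ev := range queue {
		processEvent(sessionID, ev)
	}
}

// run dispatches each event to its session's worker, creating the
// worker lazily on the first event for that session_id.
func run(events [][2]string) []string {
	mu.Lock()
	processed = nil
	mu.Unlock()
	var wg sync.WaitGroup
	queues := map[string]chan string{}
	for _, e := range events {
		sid, ev := e[0], e[1]
		q, ok := queues[sid]
		if !ok {
			q = make(chan string, 16)
			queues[sid] = q
			wg.Add(1)
			go sessionWorker(q, sid, &wg)
		}
		q <- ev
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
	return processed
}

func main() {
	out := run([][2]string{
		{"s1", "customer-payment"},
		{"s2", "page-viewed"},
		{"s1", "form-submitted"},
	})
	// s2 may interleave anywhere, but s1's customer-payment always
	// completes before s1's form-submitted.
	fmt.Println(out)
}
```

In a real deployment the stub would make the HTTP request, and the per-session workers could be Erlang processes or goroutines behind a consistent hash; the semaphore pattern stays the same either way.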

Can someone help in figuring out the solution or point me in the right direction?
