I am in the process of designing an SQS FIFO handling strategy and trying to do it the right way, but I'm facing a few fundamental questions.
For example, I have the following in the queue:
- 100 messages with group id "group1_100"
- 5 messages with "group2_5"
- 1 message with "group3_1"
- 1 message with "group4_1"
Example of handling:
I requested 10 messages using long polling:
1.1) Can I be sure that I'll receive 10 messages (the requested maximum) per retrieve request if there are more than 10 messages of the same group?
1.2) Could SQS return me 7 messages from group1_100, 2 from group2_5 and 1 from group4_1?
If yes:
1.2.1) For example, handling these 7 messages from group1_100 will take 10 seconds. Using a thread pool, the 3 messages from group2_5 and group4_1 are already handled in parallel, so I tell SQS that they were handled, and then:
1.2.2) I make another request to SQS (so as not to wait until the 7 messages from group1_100 are finished). So:
1.2.2.1) Is it possible that I receive more messages from group1_100 (which should be handled after the first 7, i.e. in the right order)? I assume not, because SQS guarantees that messages within a group are delivered strictly in order. Am I right?
If the answer to 1.2 is "yes", I need to control how the different group ids are handled, to prevent a situation where my processors work only on "group1_100" just because its messages came first (from a group/time point of view), right? That would also prevent a theoretical attack like: "as a client, we send you a lot of messages with one group id, and all of your processors end up busy handling only us".
So this can happen in the case of single-threaded handling, right?

And could it be resolved by the following strategy?
Make a request to get 10 messages -> divide them into n groups (grouping by group id) and start handling the groups in parallel. When all messages from a group are handled, make a batch call to SQS to acknowledge (delete) them -> in parallel, start fetching the next pack of messages to handle (checking a constant total_handling_messages_limit_at_time) and handle them the same way, throttling the fetching so it doesn't run too many times per second/minute.
This would let us split the handling inside one worker using a thread pool, and avoid getting stuck on long-running tasks that all share one group id (see the sketch below).
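To make the strategy concrete, here is a minimal sketch, assuming boto3, a hypothetical QUEUE_URL, and a placeholder handle() function. The concurrent prefetch of the next batch and the total_handling_messages_limit_at_time check are left out to keep it short; each group is processed in order on its own thread and its messages are deleted in one batch call when the group finishes:

```python
# Sketch under stated assumptions, not a production implementation.
import boto3
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example.fifo"  # hypothetical

def handle(message: dict) -> None:
    """Placeholder for the real business logic."""
    print("handled", message["Body"])

def process_group(group_id: str, messages: list) -> None:
    # Messages of one group are handled strictly in the order they arrived.
    for msg in messages:
        handle(msg)
    # One batch delete per finished group (a receive batch has at most 10 messages,
    # so this stays within the 10-entry limit of delete_message_batch).
    sqs.delete_message_batch(
        QueueUrl=QUEUE_URL,
        Entries=[
            {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
            for i, m in enumerate(messages)
        ],
    )

def run_once(executor: ThreadPoolExecutor) -> None:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
        AttributeNames=["MessageGroupId"],
    )
    by_group = defaultdict(list)
    for msg in resp.get("Messages", []):
        by_group[msg["Attributes"]["MessageGroupId"]].append(msg)
    # Fan out: one task per group, so one large group cannot block the others.
    futures = [executor.submit(process_group, g, msgs) for g, msgs in by_group.items()]
    for f in futures:
        f.result()

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        run_once(pool)
```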
Am I right in general? Or am I missing something here?