I am looking to set up Kafka as an intermediary between data coming from IoT machines and a service that will process that data. I am having trouble identifying the right way to design my topics for my use case and would love some advice.
I need to read sensor data from many machines, and each machine can have many sensors (e.g. temperature, pressure, parts, etc.). The order of the messages my consumers read is important and needs to be sequential.
I have come up with three possible designs, but I am not sure which, if any, is best:
a) Each machine will write to its own topics, each with a single partition to guarantee sequence. So machine 100 will write to topics called machine100TempSensor1, machine100TempSensor2, machine100PressureSensor1, etc.
b) All machines will write to a shared topic, partitioned by machine/sensor. Using the same example as above, machine 100 will write to a topic called 'temperature' but the messages will be keyed on the machine and sensor (see the producer sketch after this list).
e.g.
(Topic: temperature, key: machine100TempSensor1)
(Topic: temperature, key: machine100TempSensor2)
(Topic: temperature, key: machine200TempSensor1)
c) Produce all temperature-related messages to a single 'temperature' topic and filter the messages as I process the data.
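For context, here is a rough sketch of how I imagine the producer side of option b) would look (Java client; the broker address, key format, class name, and JSON payload are just placeholders I made up):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class SensorProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key = machine + sensor, so every reading from the same sensor
            // hashes to the same partition and therefore keeps its order.
            String key = "machine100TempSensor1";                  // placeholder key format
            String value = "{\"temp\": 72.4, \"ts\": 1700000000}"; // placeholder payload
            producer.send(new ProducerRecord<>("temperature", key, value));
        }
    }
}

My understanding is that ordering would then be guaranteed per key (per machine/sensor), because all records with the same key land on the same partition.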
My concerns with each solution:
a) - Kafka only guarantees ordering at the partition level, so would creating topics with a single partition be a good idea, or does that go against what a topic is meant to be?
- If I wanted to read temperature data from all machines, I would have to know all the topic names and request data from each specific topic instead of one general 'temperature' topic (see the regex-subscription sketch after this list for a possible workaround).
- Kafka only allows one consumer per consumer group to read from a given partition, so I would end up having to create many consumer groups.
b) - A single 'temperature' topic could easily have 30+ partitions, if not hundreds or thousands once I consider scaling (but I would get the benefit of reading all partitions at once).
- Since only one consumer per group can read from a given partition, I would end up with a consumer group for every consumer.
c) - I feel there could be a big performance cost in filtering out thousands of irrelevant messages.
- I will run into the same issue when it comes time to push the processed data back to Kafka.
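Regarding the topic-name issue in a), if I understand correctly I could subscribe with a regex instead of listing every topic (assuming a client version that supports subscribe(Pattern)); the group id and pattern below are only examples:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;

public class AllTemperatureTopicsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "temperature-reader");      // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Matches machine100TempSensor1, machine200TempSensor3, ... as new machines appear.
            consumer.subscribe(Pattern.compile("machine\\d+TempSensor\\d+"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s -> %s%n", record.topic(), record.value());
                }
            }
        }
    }
}

But that still leaves me with a large number of single-partition topics, which is what worries me about a).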
Something else to consider: I would like the ability to process only certain machines/sensors.
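For example, with option b) I imagine I could filter on the key inside the consumer so that only the machines/sensors I care about get processed (again, broker address, group id, and the key names are placeholders):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class SelectiveTemperatureConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "temperature-processor");     // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Only these machine/sensor keys will be processed (placeholders).
        Set<String> wanted = Set.of("machine100TempSensor1", "machine200TempSensor1");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("temperature"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    if (!wanted.contains(record.key())) {
                        continue; // skip machines/sensors I am not interested in
                    }
                    System.out.printf("processing %s -> %s%n", record.key(), record.value());
                }
            }
        }
    }
}

But this is basically the filtering from option c), which is where my performance concern comes from.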
Hopefully I have been able to explain everything clearly.