Why is Kafka preferred in event-driven architecture? Message ordering and scalable partitions.
Before we jump into message ordering, let me introduce Kafka partitions, because they are the magic behind it. Kafka topics are divided into partitions to make horizontal scaling possible. Messages are distributed across these partitions, so we can process multiple messages in parallel by attaching a consumer to each partition.
Imagine you have a topic for the User domain object; let's name it user-topic. When creating this Kafka topic you choose a partition count. If you configure the partition count as 5, your microservice infrastructure can process 5 messages from this topic in parallel. Before drilling into scalability, let's see what makes a partition work.
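For reference, here is a minimal sketch of creating such a topic programmatically with Kafka's AdminClient. The broker address and replication factor are placeholders, not something from this post; you may just as well create the topic with the kafka-topics CLI.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateUserTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // user-topic with 5 partitions -> up to 5 messages processed in parallel
            NewTopic userTopic = new NewTopic("user-topic", 5, (short) 1); // replication factor 1 for a local setup
            admin.createTopics(Collections.singletonList(userTopic)).all().get();
        }
    }
}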
Partition Key
Any message in Kafka can be strictly bound to a partition based on the partition key.
Header: partitionKey
PartitionKey is an important header which we can send along with a message to a Kafka topic. Messages sent with the same partitionKey will always end up in the same partition of a topic. Consider the messages given below, where () represents the headers and {} represents the payload.
User (partitionKey=12345) {id=12345, email=a@a.com, status=NEW}
User (partitionKey=12345) {id=12345, email=a@a.com, status=EDITED}
User (partitionKey=12345) {id=12345, email=a@a.com, status=ACTIVE}
All the above messages will end up in the same partition of user-topic, say partition-0, and they will sit there in exactly the order they were pushed to the topic: the NEW status first, then EDITED, then ACTIVE. This way events can always be processed in the exact order they were emitted.
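To make this concrete for a moment outside of Spring, here is a rough sketch using the plain Kafka producer client, where the user id is used as the record key (the plain-client equivalent of the partitionKey header used later in this post). Kafka's default partitioner sends records with the same key to the same partition, in send order. The broker address and JSON payloads are illustrative only.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserEventOrderingDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key "12345" -> same partition, preserving NEW -> EDITED -> ACTIVE order
            producer.send(new ProducerRecord<>("user-topic", "12345", "{\"id\":12345,\"status\":\"NEW\"}"));
            producer.send(new ProducerRecord<>("user-topic", "12345", "{\"id\":12345,\"status\":\"EDITED\"}"));
            producer.send(new ProducerRecord<>("user-topic", "12345", "{\"id\":12345,\"status\":\"ACTIVE\"}"));
        }
    }
}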
What’s the Big deal?
Imagine you have ten replicas of user-microservice listening to user-topic for events. Across all the consumer threads of those 10 replicas, exactly one will consume from partition-0.
Ordering of events within a partition is guaranteed, 100%.
No two user-microservice replicas will process the same user (e.g. the one with id 12345) at the same time. This is a big deal: if two replicas fetched the NEW and EDITED payloads in parallel, we could not guarantee which event gets processed first, and that may lead to erroneous processing of events.
This is the major problem with RabbitMQ's competing-consumers model: you do not get per-key ordering together with parallel consumption, and hence it is not advisable for this style of event-driven architecture.
Now let’s talk about the scalability part of this post.
Choosing partition count
The number of partitions is directly linked to the number of messages your service can process in parallel. Ideally this number should be decided before you design your microservices, based on your volume forecast, but it can always be increased later in Kafka when your volume grows (keep in mind that adding partitions changes which partition a given key maps to from then on). Pick a round number that divides evenly among your likely replica counts; multiples of 5 or 10 work well.
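Increasing the partition count later can also be done programmatically. A minimal sketch with AdminClient.createPartitions, again with a placeholder broker address:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreaseUserTopicPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Grow user-topic from 5 to 10 partitions (partitions can only be increased, never decreased)
            admin.createPartitions(
                    Collections.singletonMap("user-topic", NewPartitions.increaseTo(10))
            ).all().get();
        }
    }
}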
Partition Rebalancing
The Spring Cloud Stream Kafka binder has wonderful support for distributing consumers across partitions, and not just individual consumers but also multiple instances/replicas of the same microservice. This way the partitions stay balanced across replicas on the fly.
For this you have to configure a few things in your application.yml, like the examples given below.
Consumer config
spring.cloud.stream.bindings:
  user-sink:
    group: user-microservice
    contentType: application/json
    destination: user-topic
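For completeness, here is a minimal sketch of what the consuming side could look like with the older annotation-based binding model that matches the user-sink binding name above (the UserSink interface and the User class are hypothetical; newer Spring Cloud Stream releases favour functional bindings instead):

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.messaging.SubscribableChannel;

// Hypothetical binding interface wired to the "user-sink" binding configured above
interface UserSink {
    @Input("user-sink")
    SubscribableChannel userSink();
}

@EnableBinding(UserSink.class)
public class UserEventListener {

    @StreamListener("user-sink")
    public void handle(User user) { // User is a hypothetical domain class
        // Events sharing a partitionKey arrive here one at a time and in order,
        // because exactly one consumer in the "user-microservice" group owns that partition.
        System.out.println("Processing user " + user.getId() + " with status " + user.getStatus());
    }
}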
Producer config
spring.cloud.stream.bindings:
  user-source:
    contentType: application/json
    destination: user-topic
    producer:
      partition-key-expression: headers['partitionKey']
spring.cloud.stream.kafka.bindings.user-source.producer.sync: true
(spring.cloud.stream.kafka.bindings.user-source.producer.sync makes the producer wait for Kafka's acknowledgement, so you know whether the message was actually written.)
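And a matching sketch of the publishing side, again using the annotation-based model (UserSource and User are hypothetical). The point is that the partitionKey header set here is exactly what partition-key-expression evaluates to pick the partition:

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.support.MessageBuilder;

// Hypothetical binding interface wired to the "user-source" binding configured above
interface UserSource {
    @Output("user-source")
    MessageChannel userSource();
}

@EnableBinding(UserSource.class)
public class UserEventPublisher {

    private final UserSource bindings;

    public UserEventPublisher(UserSource bindings) {
        this.bindings = bindings;
    }

    public void publish(User user) { // User is a hypothetical domain class
        // Same user id -> same partitionKey header -> same partition, so ordering per user is preserved
        bindings.userSource().send(MessageBuilder.withPayload(user)
                .setHeader("partitionKey", user.getId())
                .build());
    }
}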
The main thing to notice in the configurations above is the group property. If we declare a group for our consumer, then all replicas sharing that group will have the topic's partitions balanced equally among them.
If we declare 10 partitions and start the first replica, that single replica is assigned all 10 partitions (and with the consumer concurrency set to 10, one consumer thread is attached to each partition).
If we then scale to 2 replicas, 5 partitions will be allocated to each one. AUTOMATICALLY. That is, 5 for replica 1 and 5 for replica 2.
Awesome right?
Points to remember before scaling your microservice
- Double the partition count in Kafka
- Scale the microservice by 1 replica
- Check the logs to confirm that partitions are revoked and redistributed across the old and new replicas.
Kafka, Isn’t she beautiful :)