Why is Kafka preferred in event-driven architecture? Message ordering and scalable partitions.
Before we jump into message ordering, let me introduce Kafka partitions, because they are the magic behind it. Kafka topics are divided into partitions to make horizontal scaling possible. Messages are distributed across these partitions, so we can process multiple messages in parallel by attaching a consumer to each partition.
Imagine you have a topic for the User domain object; let's name it user-topic. When creating this Kafka topic you choose a partition count. If you configure the partition count as 5, your microservice infrastructure can process 5 messages from this topic in parallel. Before drilling into scalability, let's see what makes a partition work.
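For reference, here is a minimal sketch of creating such a topic programmatically with Kafka's AdminClient. The broker address and replication factor are placeholders, not something from this post; you may just as well create the topic with the kafka-topics CLI.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateUserTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // user-topic with 5 partitions -> up to 5 messages processed in parallel
            NewTopic userTopic = new NewTopic("user-topic", 5, (short) 1); // replication factor 1 for a local setup
            admin.createTopics(Collections.singletonList(userTopic)).all().get();
        }
    }
}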
Partition Key
Any message in Kafka can be strictly bound to a partition based on the partition key.
Header: partitionKey
PartitionKey is an important header which we can send along with a message to a Kafka topic. Messages sent with the same partitionKey will always end up in the same partition of a topic. Consider the messages given below, where () represents the headers and {} represents the payload.
User (partitionKey=12345) {id=12345, email=a@a.com, status=NEW}
User (partitionKey=12345) {id=12345, email=a@a.com, status=EDITED}
User (partitionKey=12345) {id=12345, email=a@a.com, status=ACTIVE}
All the above messages will end up in the same partition of user-topic, say partition-0, and they will sit there in exactly the order they were pushed to the topic: the NEW status first, then EDITED, then ACTIVE. This way events can always be processed in the exact order they were emitted.
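To make this concrete for a moment outside of Spring, here is a rough sketch using the plain Kafka producer client, where the user id is used as the record key (the plain-client equivalent of the partitionKey header used later in this post). Kafka's default partitioner sends records with the same key to the same partition, in send order. The broker address and JSON payloads are illustrative only.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserEventOrderingDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key "12345" -> same partition, preserving NEW -> EDITED -> ACTIVE order
            producer.send(new ProducerRecord<>("user-topic", "12345", "{\"id\":12345,\"status\":\"NEW\"}"));
            producer.send(new ProducerRecord<>("user-topic", "12345", "{\"id\":12345,\"status\":\"EDITED\"}"));
            producer.send(new ProducerRecord<>("user-topic", "12345", "{\"id\":12345,\"status\":\"ACTIVE\"}"));
        }
    }
}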
What’s the Big deal?
Imagine you have ten replicas of user-microservice listening to user-topic for events. Across all the consumer threads of those 10 replicas, exactly one will consume from partition-0.
Ordering of events within a partition is guaranteed, 100%.
No two user-microservice replicas will process the same user (e.g. the one with id 12345) at the same time. This is a big deal: if two replicas fetched the NEW and EDITED payloads in parallel, we could not guarantee which event gets processed first, and that may lead to erroneous processing of events.
This is the major problem with RabbitMQ's competing-consumers model: you do not get per-key ordering together with parallel consumption, and hence it is not advisable for this style of event-driven architecture.
Now let’s talk about the scalability part of this post.
Choosing partition count
The number of partitions is directly linked to the number of messages your service can process in parallel. Ideally this number should be decided before you design your microservices, based on your volume forecast, but it can always be increased later in Kafka when your volume grows (keep in mind that adding partitions changes which partition a given key maps to from then on). Pick a round number that divides evenly among your likely replica counts; multiples of 5 or 10 work well.
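Increasing the partition count later can also be done programmatically. A minimal sketch with AdminClient.createPartitions, again with a placeholder broker address:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreaseUserTopicPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Grow user-topic from 5 to 10 partitions (partitions can only be increased, never decreased)
            admin.createPartitions(
                    Collections.singletonMap("user-topic", NewPartitions.increaseTo(10))
            ).all().get();
        }
    }
}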
Partition Rebalancing
The Spring Cloud Stream Kafka binder has wonderful support for distributing consumers across partitions, and not just individual consumers but also multiple instances/replicas of the same microservice. This way the partitions stay balanced across replicas on the fly.
For this you have to configure a few things in your application.yml, like the examples given below.
Consumer config
spring.cloud.stream.bindings:
  user-sink:
    group: user-microservice
    contentType: application/json
    destination: user-topic
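For completeness, here is a minimal sketch of what the consuming side could look like with the older annotation-based binding model that matches the user-sink binding name above (the UserSink interface and the User class are hypothetical; newer Spring Cloud Stream releases favour functional bindings instead):

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.messaging.SubscribableChannel;

// Hypothetical binding interface wired to the "user-sink" binding configured above
interface UserSink {
    @Input("user-sink")
    SubscribableChannel userSink();
}

@EnableBinding(UserSink.class)
public class UserEventListener {

    @StreamListener("user-sink")
    public void handle(User user) { // User is a hypothetical domain class
        // Events sharing a partitionKey arrive here one at a time and in order,
        // because exactly one consumer in the "user-microservice" group owns that partition.
        System.out.println("Processing user " + user.getId() + " with status " + user.getStatus());
    }
}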
Producer config
spring.cloud.stream.bindings:
  user-source:
    contentType: application/json
    destination: user-topic
    producer:
      partition-key-expression: headers['partitionKey']
spring.cloud.stream.kafka.bindings.user-source.producer.sync: true
(spring.cloud.stream.kafka.bindings.user-source.producer.sync makes the producer wait for Kafka's acknowledgement, so you know whether the message was actually written.)
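And a matching sketch of the publishing side, again using the annotation-based model (UserSource and User are hypothetical). The point is that the partitionKey header set here is exactly what partition-key-expression evaluates to pick the partition:

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.support.MessageBuilder;

// Hypothetical binding interface wired to the "user-source" binding configured above
interface UserSource {
    @Output("user-source")
    MessageChannel userSource();
}

@EnableBinding(UserSource.class)
public class UserEventPublisher {

    private final UserSource bindings;

    public UserEventPublisher(UserSource bindings) {
        this.bindings = bindings;
    }

    public void publish(User user) { // User is a hypothetical domain class
        // Same user id -> same partitionKey header -> same partition, so ordering per user is preserved
        bindings.userSource().send(MessageBuilder.withPayload(user)
                .setHeader("partitionKey", user.getId())
                .build());
    }
}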
The main thing to notice in the configurations above is the group property. If we declare a group for our consumer, then all replicas sharing that group will have the topic's partitions balanced equally among them.
If we declare 10 partitions and start the first replica, that single replica is assigned all 10 partitions (and with the consumer concurrency set to 10, one consumer thread is attached to each partition).
If we then scale to 2 replicas, 5 partitions will be allocated to each one. AUTOMATICALLY. That is, 5 for replica 1 and 5 for replica 2.
Awesome right?
Points to remember before scaling your microservice
- Double the partition count in Kafka
- Scale the microservice by 1 replica
- Check the logs to confirm that partitions are revoked and redistributed across the old and new replicas.
Kafka, Isn’t she beautiful :)