How many Kafka partitions are needed?

Apache Kafka is a distributed system that runs as a cluster of nodes, each of which is referred to as a broker. Kafka topics are partitioned, and those partitions are replicated across the brokers in the cluster.
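To picture how partitions and their replicas end up spread over brokers, here is a simplified round-robin placement sketch. It is an illustration under stated assumptions, not Kafka's actual assignment algorithm: Kafka also randomizes the starting broker and shifts follower placement. The function name `assign_replicas` is hypothetical.

```python
def assign_replicas(num_partitions: int, num_brokers: int,
                    replication_factor: int) -> dict[int, list[int]]:
    """Simplified round-robin placement (NOT Kafka's exact algorithm).

    Partition p gets replicas on brokers p, p+1, ... (wrapping around);
    the first broker in each list is treated as the partition leader.
    """
    if replication_factor > num_brokers:
        # Kafka likewise refuses a replication factor larger than the broker count.
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [(p + r) % num_brokers for r in range(replication_factor)]
        for p in range(num_partitions)
    }


# Hypothetical layout: 6 partitions, 3 brokers, replication factor 3.
for partition, replicas in assign_replicas(6, 3, 3).items():
    print(f"partition {partition}: leader={replicas[0]} replicas={replicas}")
```

Each broker leads two of the six partitions here, which is the point of spreading partitions across a cluster: the load of serving reads and writes is shared rather than concentrated on one node.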

Why are Partitions Important?

Partitions let you parallelize a topic by dividing its data across multiple brokers, and throughput increases as parallelism increases. Partitions also play an important role in guaranteeing message order: Kafka preserves the order of messages within a single partition, not across partitions. Check out our article on how Kafka guarantees message order to learn more.
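The ordering guarantee works because records with the same key are routed to the same partition. The sketch below illustrates that idea. One assumption matters here: Kafka's default partitioner hashes the key bytes with murmur2, whereas this sketch uses `zlib.crc32` as a dependency-free stand-in. The partition numbers will therefore differ from what a real producer picks, although the "same key, same partition" behavior is the same. The function `partition_for_key` and the sample events are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for_key(key: str, num_partitions: int) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not crc32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical events, listed in the order a producer would send them.
messages = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add_to_cart"),
    ("user-1", "checkout"),
]

# Each partition is an append-only log, modeled here as a list.
partitions = defaultdict(list)
for key, value in messages:
    partitions[partition_for_key(key, 6)].append((key, value))

for number, log in sorted(partitions.items()):
    print(number, log)
```

Every "user-1" event lands in the same partition in send order, so a consumer of that partition sees login, add_to_cart and checkout in sequence. Events for different keys may sit in different partitions, and Kafka makes no ordering promise between them.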

Are More Partitions Better?

You don't necessarily want more partitions than you need, because each additional partition increases the number of open files on the brokers and adds to replication latency. For most implementations, follow the rule of thumb of no more than 10 partitions per topic and 10,000 partitions per Kafka cluster. Going beyond those numbers can require additional monitoring and optimization.
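If you plan topics in code or configuration, you can check a proposed layout against these rules of thumb automatically. The following is a minimal sketch. The constant and function names are hypothetical. It also assumes the per-cluster guideline counts partitions only, not partitions multiplied by their replicas.

```python
# Rule-of-thumb limits from the paragraph above; adjust them for your cluster.
MAX_PARTITIONS_PER_TOPIC = 10
MAX_PARTITIONS_PER_CLUSTER = 10_000


def check_partition_plan(topic_partitions: dict[str, int]) -> list[str]:
    """Return one warning per guideline that the planned layout exceeds.

    Assumption: replicas are not counted toward the per-cluster total.
    """
    warnings = []
    for topic, count in topic_partitions.items():
        if count > MAX_PARTITIONS_PER_TOPIC:
            warnings.append(
                f"{topic}: {count} partitions is above the "
                f"{MAX_PARTITIONS_PER_TOPIC}-per-topic guideline"
            )
    total = sum(topic_partitions.values())
    if total > MAX_PARTITIONS_PER_CLUSTER:
        warnings.append(
            f"cluster: {total} partitions is above the "
            f"{MAX_PARTITIONS_PER_CLUSTER}-per-cluster guideline"
        )
    return warnings


plan = {"orders": 6, "clicks": 24}  # hypothetical topics
for warning in check_partition_plan(plan):
    print(warning)
```

A warning here doesn't mean the layout is wrong. It flags topics that, per the guideline above, are likely to need extra monitoring and tuning.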

Calculating Kafka Partition Requirements

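To estimate the number of partitions a Kafka implementation needs, we use the following calculation:

Number of partitions = desired throughput / partition speed

Round the result up to the next whole number, since a topic can't have a fraction of a partition. Conservatively, you can estimate that a single partition of a single Kafka topic handles about 10 MB/s.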
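The formula translates directly into a small helper. This is a minimal sketch: the function name `partitions_needed` is hypothetical, the 10 MB/s default is the conservative estimate above, and the one-partition minimum reflects that every topic needs at least one partition.

```python
import math


def partitions_needed(throughput_mb_s: float,
                      partition_speed_mb_s: float = 10.0) -> int:
    """Number of partitions = desired throughput / partition speed, rounded up.

    Defaults to the conservative 10 MB/s-per-partition estimate and never
    returns fewer than one partition.
    """
    if partition_speed_mb_s <= 0:
        raise ValueError("partition speed must be positive")
    if throughput_mb_s < 0:
        raise ValueError("throughput cannot be negative")
    return max(1, math.ceil(throughput_mb_s / partition_speed_mb_s))


print(partitions_needed(58))       # 6
print(partitions_needed(100, 25))  # 4
```

Rounding up rather than to the nearest integer keeps the estimate conservative. For example, 58 MB/s needs 5.8 partitions' worth of capacity, and 5 partitions would fall short.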

Kafka Partition Calculator

[Interactive calculator: Number of Partitions = Throughput (MB/s) / Partition Speed (MB/s)]

As an example, say your desired throughput is 5 TB per day, which averages about 58 MB/s (5,000,000 MB / 86,400 s). At the estimate of 10 MB/s per partition, 58 / 10 = 5.8, which rounds up to 6 partitions. Play around with the Kafka partition calculator above to see how the partition count changes as throughput increases and decreases.
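The unit conversion in this example is easy to get wrong, so here it is spelled out. This is a sketch. It assumes decimal units (1 TB = 1,000,000 MB) and an even flow of data across the day. The helper name `tb_per_day_to_mb_per_s` is hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    # Decimal units assumed: 1 TB = 1,000,000 MB.
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


throughput = tb_per_day_to_mb_per_s(5)  # about 57.87 MB/s
partitions = math.ceil(throughput / 10)  # 10 MB/s per partition, rounded up
print(f"{throughput:.0f} MB/s -> {partitions} partitions")  # 58 MB/s -> 6 partitions
```

Because this is an average, traffic that peaks well above it during busy hours may call for sizing against the peak rate instead.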

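For the example above, you can create a topic with 6 partitions using the following command:

bin/kafka-topics.sh --zookeeper ip_addr_of_zookeeper:2181 --create --topic my-topic --partitions 6 --replication-factor 3 --config max.message.bytes=64000 --config flush.messages=1

On Kafka 3.0 and later, kafka-topics.sh no longer accepts --zookeeper. Replace it with --bootstrap-server followed by a broker's host:port.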
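To avoid copying the partition count by hand, you could build the topic-creation command from the calculation. The following is a minimal sketch. The function name and the `localhost:9092` broker address are hypothetical placeholders. It uses `--bootstrap-server`, which kafka-topics.sh accepts from Kafka 2.2 onward, instead of `--zookeeper`, and it omits the optional `--config` overrides.

```python
import math
import shlex


def kafka_topics_create_cmd(topic: str, throughput_mb_s: float,
                            bootstrap_server: str,
                            partition_speed_mb_s: float = 10.0,
                            replication_factor: int = 3) -> list[str]:
    """Build a kafka-topics.sh --create argument list (hypothetical helper)."""
    # Same sizing rule as the article: throughput / speed, rounded up, minimum 1.
    partitions = max(1, math.ceil(throughput_mb_s / partition_speed_mb_s))
    return [
        "bin/kafka-topics.sh",
        "--bootstrap-server", bootstrap_server,
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


cmd = kafka_topics_create_cmd("my-topic", 58, "localhost:9092")
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` from the Kafka installation directory. Printing it first with `shlex.join` lets you review the command before running it.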

Managed Apache Kafka®

At Dattell, we provide fully managed Kafka in our clients’ environments — cloud or on-premise. Visit our Managed Apache Kafka® services page for more details.