Updated May 2020
Kafka Consumer’s Role
The role of the Kafka consumer is to read data from Kafka. Optimizing your consumers can help avoid errors and increase the performance of your application.
While the focus of this blog post is on the consumer, we will also review several broker configurations that affect consumer performance.
Top 5 Configurations for Kafka Consumer Optimization
#1 Use rebalance delay to reduce the number of consumer group rebalances
Kafka consumers read messages from brokers starting at a specific position in each partition, called an offset. Kafka keeps track of the offset (that is, which messages have already been read) at the consumer group level.
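By default, when consumers join a new consumer group, the brokers wait three seconds for more consumers to join before performing the first rebalance. This delay is the broker setting group.initial.rebalance.delay.ms. After the three seconds, the brokers rebalance which partitions are assigned to which consumer. A longer delay means potentially fewer rebalances, but a longer wait before processing begins. Read more on the Kafka consumer rebalance time here.
Typically we see production clusters run with a three-second delay and development clusters with zero delay.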
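As a rough illustration of the production versus development split, the sketch below prints the broker property line for each environment. It assumes the delay described above is the broker setting group.initial.rebalance.delay.ms; the environment names and the helper function are hypothetical, chosen only for this example.

```python
# Hypothetical per-environment delays, taken from the rule of thumb above:
# three seconds in production, zero in development.
REBALANCE_DELAY_MS = {"production": 3000, "development": 0}


def broker_property_line(environment: str) -> str:
    """Return a server.properties line for the given environment.

    Assumes the broker setting is group.initial.rebalance.delay.ms.
    """
    delay_ms = REBALANCE_DELAY_MS[environment]
    return f"group.initial.rebalance.delay.ms={delay_ms}"


for env in ("production", "development"):
    print(f"{env}: {broker_property_line(env)}")
```

The resulting line would go in the broker configuration file, not in the consumer configuration.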
#2 Process messages once with exactly_once processing.guarantee
Assuming processing.guarantee is set to exactly_once (a Kafka Streams setting), each message will be processed only once. The default processing guarantee is at_least_once. With the default, Kafka balances performance with its best effort to deliver each message once to each consumer group, so a message may be delivered again after a failure.
#3 Good network connections increase throughput and availability
A good network connection from the Kafka consumers to the brokers is paramount to the success of your consumers. When a consumer drops out of a consumer group due to a failure or poor network connection, the brokers will take a second or two to rebalance the distribution of partitions to consumers. While the brokers are rebalancing, consumers are not able to consume any messages. Keep your broker cluster running at peak performance by ensuring proper network connectivity.
Most cloud service providers offer the ability to place instances physically close together within a data center to increase network performance. Locating instances close to each other decreases latency and increases network throughput. For AWS, see placement groups as an example.
#4 Set number of partitions equal to or a multiple of the number of consumers
A consumer can read from many partitions. If there are four consumers and eight partitions, each consumer will read from two partitions.
Within a consumer group, a partition can only send data to a single consumer. If there are eight consumers and four partitions, four of the eight consumers will each read from one partition and the other four consumers will do nothing.
Aim to have the number of partitions either be exactly the number of consumers or a multiple thereof. If the number of partitions is not a multiple of the number of consumers, a few of the consumers will have additional load. For example, if there are four consumers and nine partitions, three consumers will read from two partitions and one consumer will read from three partitions.
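The arithmetic behind these examples can be sketched in a few lines. The function below is a simplified model that assumes partitions are spread as evenly as possible across one consumer group; it is not Kafka's actual assignor, and the function name is hypothetical.

```python
from typing import List


def partitions_per_consumer(num_partitions: int, num_consumers: int) -> List[int]:
    """Return how many partitions each consumer gets under an even spread.

    Simplified model: every consumer gets the same base share, and any
    leftover partitions go one each to the first few consumers.
    """
    base, extra = divmod(num_partitions, num_consumers)
    return [base + 1 if i < extra else base for i in range(num_consumers)]


# The three scenarios discussed above.
print(partitions_per_consumer(8, 4))  # [2, 2, 2, 2]
print(partitions_per_consumer(4, 8))  # [1, 1, 1, 1, 0, 0, 0, 0]
print(partitions_per_consumer(9, 4))  # [3, 2, 2, 2]
```

A zero in the result is an idle consumer, and uneven counts show which consumers carry the additional load.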
#5 Keep messages below one megabyte in size
Kafka brokers are optimized to process batches of small messages. According to the benchmarks, the optimal message size is around 100 bytes. Large messages, for example 10 megabytes each, will bog down the Kafka brokers and, in turn, the consumers.
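One way to enforce a size budget is to measure each record before it is produced. The sketch below assumes records are serialized as UTF-8 JSON and treats one megabyte as 1,000,000 bytes; both the serialization format and the helper names are assumptions made for illustration, not part of any Kafka client API.

```python
import json

# Assumed budget: "one megabyte" interpreted as 1,000,000 bytes.
MAX_MESSAGE_BYTES = 1_000_000


def encoded_size(record: dict) -> int:
    """Return the size in bytes of the record serialized as UTF-8 JSON."""
    return len(json.dumps(record).encode("utf-8"))


def fits_size_budget(record: dict, limit: int = MAX_MESSAGE_BYTES) -> bool:
    """Return True if the serialized record is within the size budget."""
    return encoded_size(record) <= limit


small = {"id": 1}
large = {"blob": "x" * 10_000_000}  # roughly 10 MB once serialized

print(encoded_size(small), fits_size_budget(small))
print(encoded_size(large), fits_size_budget(large))
```

Records that fail the check could be split up, compressed, or stored elsewhere with only a reference sent through Kafka.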
Kafka Consumer Optimization Summary
In short, for Kafka consumer optimization, be sure to use rebalance delay, process messages with exactly_once processing, ensure good network connections, set the number of partitions to a multiple of the number of consumers, and keep messages below 1 MB in size.