Kafka Consumer Offset

Understanding Kafka Consumer Offset

Updated January 2023

One aspect of Kafka that can cause confusion for new users is the consumer offset. In this post, we define the consumer offset and outline the factors that determine it.

Defining Kafka Consumer Offset

The consumer offset tracks a consumer's position in the sequential stream of messages within each partition of a Kafka topic. Keeping track of the offset, or position, is important for nearly all Kafka use cases and can be an absolute necessity in certain instances, such as financial services.

The Kafka consumer offset allows processing to continue from where it last left off if the stream application is turned off or if there is an unexpected failure.  

In other words, by having the offsets persist in a data store (the internal __consumer_offsets topic in modern Kafka, or ZooKeeper for older clients), data continuity is retained even when the stream application shuts down or fails.
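The resume behavior can be sketched with a small in-memory simulation. This is illustrative only: OffsetStore and consume are hypothetical names standing in for Kafka's commit machinery, not a real client API.

```python
# Minimal in-memory simulation of offset persistence and resume.
# Illustrative only: real Kafka commits offsets to the internal
# __consumer_offsets topic; OffsetStore and consume are hypothetical.

class OffsetStore:
    """Stands in for Kafka's committed-offset storage."""
    def __init__(self):
        self._committed = {}  # (group, topic, partition) -> next offset to read

    def commit(self, group, topic, partition, offset):
        self._committed[(group, topic, partition)] = offset

    def fetch(self, group, topic, partition):
        return self._committed.get((group, topic, partition))


def consume(messages, store, group, topic, partition, crash_after=None):
    """Process messages from the last committed offset, committing as we go."""
    start = store.fetch(group, topic, partition) or 0
    processed = []
    for offset in range(start, len(messages)):
        if crash_after is not None and len(processed) == crash_after:
            return processed  # simulate an unexpected failure mid-stream
        processed.append(messages[offset])
        store.commit(group, topic, partition, offset + 1)
    return processed


messages = [f"msg-{i}" for i in range(5)]
store = OffsetStore()
first = consume(messages, store, "g1", "orders", 0, crash_after=3)  # "fails" after 3
second = consume(messages, store, "g1", "orders", 0)                # resumes where it left off
```

When the second call runs, it picks up at the offset committed before the failure rather than reprocessing the whole partition.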

As discussed in a previous post about creating a Kafka topic, the offset is one of three coordinates that, taken together, uniquely identify a message. First there is the topic; within a topic are partitions; and the position of a message within its partition is the offset.
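As a quick illustration, the three coordinates can be modeled as a simple tuple; the topic name, partition number, and offset value below are made up.

```python
from collections import namedtuple

# A message is located by three coordinates: topic, partition, and offset.
MessageCoordinate = namedtuple("MessageCoordinate", ["topic", "partition", "offset"])

# Hypothetical example: the message at position 1047 in partition 2 of "payments".
coord = MessageCoordinate(topic="payments", partition=2, offset=1047)
```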

If you’re curious about how to determine the number of partitions, check out this simple formula for optimizing partitions.

Determining Kafka Consumer Offset

New Consumer Groups

Initially, when a Kafka consumer starts for a new topic, the offset begins at zero (0).  Easy enough.

On the other hand, if a new consumer group is started on an existing topic, there is no stored offset for that group.  In this scenario, the consumer will begin either from the beginning of the topic or from the end of the topic. The beginning of a topic holds the smallest available offset; the end of the topic holds the greatest.

Whether you start at the beginning or end of a topic is determined by your use case.  If you start the offset at the beginning of a topic, then you will be replaying data. This approach is good for building out a new server and populating it with data, or for doing load testing on a Kafka cluster.  If your needs don’t require any of those functions, then you likely will want to start at the end of the topic.
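In the consumer configuration this choice is the auto.offset.reset setting: earliest starts from the beginning, latest from the end. The decision can be sketched as a small hypothetical helper (not a real client API):

```python
def starting_offset(committed, beginning, end, reset_policy="latest"):
    """Pick where a consumer begins reading a partition.

    committed    -- the group's stored offset, or None for a new group
    beginning    -- the smallest available offset in the partition
    end          -- the offset just past the last message (the "latest" offset)
    reset_policy -- 'earliest' (replay from start) or 'latest' (new data only),
                    mirroring the consumer's auto.offset.reset setting
    """
    if committed is not None:
        return committed  # existing group: resume where it left off
    if reset_policy == "earliest":
        return beginning
    if reset_policy == "latest":
        return end
    raise ValueError(f"unknown reset policy: {reset_policy!r}")


# New group on an existing topic whose partition holds offsets 0..99:
replay_start = starting_offset(None, beginning=0, end=100, reset_policy="earliest")
tail_start = starting_offset(None, beginning=0, end=100, reset_policy="latest")
```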

Existing Consumer Groups

What about existing consumer groups?  Let's say, for instance, that a consumer group consumes 12 messages before failing.  When the consumer starts up again, it will continue from the last committed offset, because that offset is stored by Kafka (or ZooKeeper, for older clients).  Strictly speaking, it resumes from the last offset that was committed, so if the failure happened between processing a message and committing its offset, a few messages may be processed twice.

If you are ever curious where a group's offset stands, you can use the kafka-consumer-groups tool.  This tool reports the committed offsets and the lag of consumers for the various topics and partitions.  The committed offsets for a group are reported even when its consumers are not running, though per-consumer member details appear only while consumers are connected.
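For reference, a typical invocation of the tool looks like the following; the broker address and group name are placeholders, and flags may vary slightly between Kafka versions.

```shell
# Describe a group's committed offsets and lag per topic-partition.
# localhost:9092 and my-consumer-group are placeholders for your own values.
bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --group my-consumer-group
```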

New to Kafka consumer groups?

Check out our primer on how Kafka uses consumer groups for event scaling, in addition to bridging two traditional messaging models: pub-sub and shared message queues.

Log Retention’s Impact on Offset

Log retention times can also impact consumer offset.  Let’s consider an example where the log retention is set to three (3) days.  What would happen if 32 messages were received over a couple of hours, and then four (4) days go by before the next message is received?  Where would the offset begin?

The answer is that it depends on the offset retention period.  The default retention period for committed offsets in Kafka is one week (7 days).  If Kafka were configured with the default, then, to answer the question above, the offset would begin at 32.

If the amount of time passed were two weeks (14 days) instead, the consumer would fall back to the latest offset, since the committed offset would have been removed after one week (7 days).
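The expiry rule above can be sketched as a small helper. This is a simplified model of Kafka's behavior, assuming the default auto.offset.reset of latest; the function name is illustrative.

```python
OFFSET_RETENTION_DAYS = 7  # Kafka's default offset retention window

def resume_offset(committed, days_idle, latest, retention_days=OFFSET_RETENTION_DAYS):
    """Where a consumer resumes after its group has been idle for days_idle days.

    A committed offset still inside the retention window is honored; otherwise
    the commit has been purged and the consumer falls back to the latest offset
    (assuming the default auto.offset.reset of 'latest').
    """
    if committed is not None and days_idle <= retention_days:
        return committed
    return latest


# Four idle days: the committed offset (32) survives the 7-day window.
after_four_days = resume_offset(committed=32, days_idle=4, latest=32)
# Fourteen idle days: the commit has expired, so the consumer starts at the end.
after_two_weeks = resume_offset(committed=32, days_idle=14, latest=40)
```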

The finite offset retention period exists to keep the internal offsets store from growing without bound.  However, it isn't set in stone. If you want to extend retention beyond a week, set the offsets.retention.minutes property in the broker configuration.
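For example, a broker configuration entry along these lines would keep committed offsets for two weeks (the property takes minutes; 14 days x 24 h x 60 min = 20160):

```properties
# server.properties -- keep committed offsets for 14 days instead of the 7-day default
offsets.retention.minutes=20160
```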

More information is available in our posts on Kafka consumers and Kafka consumer optimization.


Published by

Dattell - Kafka & Elasticsearch Support
