Kafka is structured by its four primary components: topics, producers, consumers, and brokers. In this post, we will discuss topics. For information about optimizing Kafka, check out our post Kafka Optimization.
Kafka organizes message feeds into categories called topics. Messages are sent to and read from specific topics. In other words, producers write data to topics, and consumers read data from topics.
Kafka is a distributed system that runs in a cluster. Each node in a Kafka cluster is referred to as a broker. Topics are partitioned and replicated across the brokers in the cluster. Partitioning lets users parallelize a topic, meaning the data for any topic can be divided over multiple brokers.
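To make the idea of partitions and replicas spread over brokers concrete, here is a minimal Python sketch. The function name assign_replicas and the broker ids are hypothetical, and the round-robin placement is a simplification for illustration; Kafka's actual placement logic is more involved.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Place each partition's replicas on distinct brokers, round-robin.

    Simplified illustration only; this is not Kafka's real placement logic.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {}
    for partition in range(num_partitions):
        # Start at a different broker for each partition, then take the
        # next brokers in order so no broker holds two copies.
        layout[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return layout


# Hypothetical cluster: three brokers with ids 0, 1, and 2.
layout = assign_replicas(num_partitions=3, replication_factor=2, brokers=[0, 1, 2])
for partition, replicas in layout.items():
    print(f"partition {partition}: replicas on brokers {replicas}")
```

In this sketch every broker ends up holding two partition copies, which is why both the storage and the traffic for a single topic can be shared across the cluster.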
Since a topic can be split into partitions over multiple machines, multiple consumers can read a topic in parallel. This organization sets Kafka up for high message throughput.
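A rough way to picture parallel reads is to hand each consumer its own subset of partitions. The sketch below assumes a simple round-robin split and hypothetical consumer names; Kafka's own assignment strategies are configurable and may distribute partitions differently.

```python
def assign_partitions(partitions, consumers):
    """Deal partitions out to consumers one at a time (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical topic with four partitions read by two consumers.
assignment = assign_partitions(
    partitions=[0, 1, 2, 3],
    consumers=["consumer-a", "consumer-b"],
)
for consumer, owned in assignment.items():
    print(f"{consumer} reads partitions {owned}")
```

Because each partition has exactly one reader here, the two consumers can work through the topic at the same time without coordinating on individual messages.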
You might be wondering at this point how consumers keep track of the messages in the different partitions of a particular topic. This is handled with what are called offsets. Each message in a partition is assigned an offset, a sequential number that marks its position in that partition. Kafka maintains message order within each partition, and a message's offset does not change once it is assigned.
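The offset mechanism can be sketched with an append-only list standing in for a single partition. The class PartitionLog and its methods are hypothetical names used for illustration, not part of any Kafka client API.

```python
class PartitionLog:
    """Append-only list standing in for one partition of a topic."""

    def __init__(self):
        self._messages = []

    def append(self, value):
        """Store a message and return the offset it was assigned."""
        offset = len(self._messages)
        self._messages.append(value)
        return offset

    def read_from(self, offset):
        """Return (offset, message) pairs starting at the given offset."""
        return list(enumerate(self._messages[offset:], start=offset))


log = PartitionLog()
for value in ["a", "b", "c"]:
    log.append(value)

# A consumer only needs to remember the next offset it wants to read.
position = 0
batch = log.read_from(position)
position = batch[-1][0] + 1
print(batch, position)
```

Since messages are only ever appended, an offset keeps pointing at the same message, so a consumer that stores its position can resume exactly where it left off.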
Just like a mailing address includes a country, city, and street number to identify a location, messages within Kafka can be identified using the combination of the topic, partition, and offset. Together, these three pieces of information are unique to a message.
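As a small illustration of this addressing scheme, the triple can be modeled as a tuple and used as a lookup key. The type MessageId and the topic name orders are hypothetical.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    """Topic, partition, and offset: together they address one message."""
    topic: str
    partition: int
    offset: int


# The same offset can appear in several partitions, so the offset
# alone is not enough to identify a message.
store = {
    MessageId("orders", 0, 0): "first message in partition 0",
    MessageId("orders", 1, 0): "first message in partition 1",
    MessageId("orders", 0, 1): "second message in partition 0",
}

print(store[MessageId("orders", 1, 0)])
```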
Creating a Kafka Topic
There are two ways to create a Kafka topic.
Auto Topics. Set the property auto.create.topics.enable to true. This parameter is set to true by default. Topics will be created automatically when applications produce to, consume from, or fetch metadata for a topic that does not yet exist. It is good practice to check num.partitions for the default number of partitions and default.replication.factor for the default number of replicas of the created topic.
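One way to check what an auto-created topic would look like is to read the three properties from the broker configuration. The sketch below parses a hypothetical excerpt of a properties file; the fallback values are assumed to match the documented Apache Kafka broker defaults and should be verified against the version in use.

```python
# Assumed broker defaults; confirm them in the documentation for your version.
BROKER_DEFAULTS = {
    "auto.create.topics.enable": "true",
    "num.partitions": "1",
    "default.replication.factor": "1",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def auto_topic_settings(text):
    """Return the settings an auto-created topic would receive."""
    effective = {**BROKER_DEFAULTS, **parse_properties(text)}
    return {
        "auto_create": effective["auto.create.topics.enable"].lower() == "true",
        "partitions": int(effective["num.partitions"]),
        "replication_factor": int(effective["default.replication.factor"]),
    }


# Hypothetical excerpt from a broker's server.properties file.
sample = """
# topic defaults
num.partitions=3
default.replication.factor=2
"""
settings = auto_topic_settings(sample)
print(settings)
```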
Manual Topics. Topics can be created manually with the Kafka utility. Run kafka-topics.sh and specify the topic name, replication factor, and any other relevant attributes.
/bin/kafka-topics.sh --create \
--zookeeper : \
It is best practice to manually create all input/output topics before starting an application, rather than relying on automatic topic creation. However, internal topics do not need to be manually created.
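For teams that script topic creation, a small helper can assemble the kafka-topics.sh arguments. This is a sketch under stated assumptions: the topic name, partition count, replication factor, and the ZooKeeper address localhost:2181 are all hypothetical values, and the script path assumes the command is run from the Kafka installation directory. Note that newer Kafka releases replace the --zookeeper option with --bootstrap-server, which takes a broker address instead.

```python
import shlex


def build_create_topic_command(topic, partitions, replication_factor, zookeeper):
    """Return the argument list for creating a topic with kafka-topics.sh."""
    return [
        "bin/kafka-topics.sh", "--create",
        "--zookeeper", zookeeper,
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


# All values below are placeholders; substitute your own.
cmd = build_create_topic_command(
    topic="example-topic",
    partitions=3,
    replication_factor=2,
    zookeeper="localhost:2181",
)
print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
```

The resulting list can be passed to subprocess.run on a machine where Kafka is installed, which keeps topic definitions in version control rather than in shell history.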
View Topics. To view the list of topics, run the following command:
> bin/kafka-topics.sh --list --zookeeper :
Granular View of Topics. For a more granular view of the topics and partitions:
> bin/kafka-topics.sh --describe --zookeeper :
For more information on Kafka check out our other blog posts: Kafka Optimization, Kafka FAQ, Monitoring Kafka, and Kafka vs. RabbitMQ.