Kafka: Physical vs. Logical Separation

Kafka data can be separated in two ways: physically, across clusters, and logically, across topics. Which approach is appropriate for your implementation depends on your use case.

PHYSICAL SEPARATION

Physical separation of clusters is the best approach in four primary cases.

#1

If the message sizes coming to your system vary widely, from exceedingly small to exceedingly large, physical separation is best.

Messages are processed in the order they are received, so a single large message adds considerable delay for the smaller messages queued behind it.
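The delay described above can be illustrated with a small model. This is a minimal sketch, not a Kafka benchmark: the function name, the message sizes, and the 100 MB/s throughput figure are all assumptions chosen for illustration.

```python
def completion_times(sizes_bytes, throughput_bytes_per_s):
    """Return the time (in seconds) at which each message finishes when
    messages are handled strictly in arrival order."""
    finished = []
    clock = 0.0
    for size in sizes_bytes:
        clock += size / throughput_bytes_per_s
        finished.append(clock)
    return finished


# Hypothetical workload: one 50 MB message followed by three 1 KB messages,
# handled at an assumed 100 MB/s.
THROUGHPUT = 100_000_000
mixed = completion_times([50_000_000, 1_000, 1_000, 1_000], THROUGHPUT)
small_only = completion_times([1_000, 1_000, 1_000], THROUGHPUT)

print("small messages behind a large one finish at:", mixed[1:])
print("small messages on their own finish at:     ", small_only)
```

In this model every small message waits at least half a second behind the large one, while the same messages on their own finish in well under a millisecond.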


#2

Physical separation is also best when messages are consumed at intervals instead of continuously.

Kafka performs best when messages are consumed shortly after they are received, while they are still in the operating system's page cache. If a consumer wakes up once per hour to consume messages, Kafka will need to pull those messages from disk and load them into RAM, which degrades the performance of the entire Kafka cluster.
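To make the effect concrete, here is a toy model of a cache that holds only the most recently written messages. It is a sketch under stated assumptions: the `PageCache` class, the LRU eviction policy, and the sizes are hypothetical stand-ins, and real operating systems manage memory in more elaborate ways.

```python
from collections import OrderedDict


class PageCache:
    """Toy LRU cache standing in for memory that holds recent messages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.disk_reads = 0

    def write(self, offset):
        # Newly produced messages land in memory first.
        self._touch(offset)

    def read(self, offset):
        if offset in self.pages:
            self.pages.move_to_end(offset)  # cache hit
        else:
            self.disk_reads += 1            # cache miss: fetch from disk
            self._touch(offset)

    def _touch(self, offset):
        self.pages[offset] = True
        self.pages.move_to_end(offset)
        while len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used


def disk_reads(total_messages, cache_capacity, consume_every):
    """Count disk reads when the consumer drains its backlog only after
    every `consume_every` produced messages."""
    cache = PageCache(cache_capacity)
    next_to_read = 0
    for offset in range(total_messages):
        cache.write(offset)
        if (offset + 1) % consume_every == 0:
            while next_to_read <= offset:
                cache.read(next_to_read)
                next_to_read += 1
    return cache.disk_reads


print("continuous consumer:", disk_reads(1000, 100, consume_every=1))
print("interval consumer:  ", disk_reads(1000, 100, consume_every=500))
```

In this model the continuous consumer never touches disk, while the interval consumer misses the cache on every read because its backlog is larger than the cache.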

#3

Exceedingly high overall bandwidth for a single service also merits physical separation.

If a specific subset of the messages in a cluster is the source of the majority of the bandwidth, reduce the overall risk to your message queue by creating a separate cluster dedicated to the high-bandwidth messages.

Autoscaling operations also become easier when a cluster serves only a single service.
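One way to spot a candidate for its own cluster is to compare each service's share of total bandwidth. The sketch below assumes you already collect per-service throughput figures; the function name, the 50% threshold, and the service names are hypothetical.

```python
def dominant_services(bytes_per_sec, share_threshold=0.5):
    """Return the services whose share of total bandwidth meets or
    exceeds the threshold (an assumed cut-off, tune it to your needs)."""
    total = sum(bytes_per_sec.values())
    if total == 0:
        return []
    return sorted(
        name
        for name, rate in bytes_per_sec.items()
        if rate / total >= share_threshold
    )


# Hypothetical throughput measurements in bytes per second.
measured = {"clickstream": 800, "billing": 50, "audit": 150}
print(dominant_services(measured))
```

With these sample figures, only the clickstream service crosses the threshold and would be the one to move to a dedicated cluster.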

#4

The fourth case for a separate cluster is when you have critical messages whose delivery must be guaranteed.

As mentioned previously, autoscaling operations are more robust when only accounting for a single service.

Additionally, physical separation protects your critical messages from unrelated, less important services: a bug in one of those services cannot degrade the Kafka cluster that holds the critical data.

Drawbacks to Physical Separation

The drawback to implementing physically separated clusters is that there are now multiple clusters to monitor and alert on, and multiple clusters to expand and manage.

These perceived drawbacks are minimized by automating the majority of those tasks.

LOGICAL SEPARATION

Logical separation of data is a bit trickier than physical separation because much thought and consideration must be given to the creation of Kafka topics and which services apply to each topic.

Unless multiple services need the same stream of messages, all of the data a service needs should go into a single topic dedicated to that service.

Keep the number of topics to a minimum to maintain high performance of the Kafka cluster.
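The topic rule above can be sketched as a small planning helper. This is one possible reading of the rule, not a prescribed method: the function, the `shared.` and `service.` naming scheme, and the example services are all hypothetical.

```python
from collections import defaultdict


def plan_topics(needs):
    """Map topic names to the streams they carry.

    `needs` maps each service name to the set of streams it consumes.
    Streams used by several services get a shared topic; everything else
    a service needs goes into one topic dedicated to that service.
    """
    consumers = defaultdict(set)
    for service, streams in needs.items():
        for stream in streams:
            consumers[stream].add(service)

    topics = defaultdict(set)
    for stream, services in consumers.items():
        if len(services) > 1:
            topics[f"shared.{stream}"].add(stream)
        else:
            (service,) = services
            topics[f"service.{service}"].add(stream)
    return {name: sorted(streams) for name, streams in sorted(topics.items())}


# Hypothetical services and the streams each one consumes.
needs = {
    "billing": {"invoices", "payments", "orders"},
    "shipping": {"orders", "labels"},
}
for topic, streams in plan_topics(needs).items():
    print(topic, streams)
```

Here five service-to-stream relationships collapse into three topics, which keeps the topic count low while still giving each service a single place to read its own data.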

Benefit of Logical Separation

The benefit of logical separation is that missing data is easily retrieved from Kafka because of the organization of information into topics.

Drawback

The drawback to this approach is that for optimization purposes, each topic should have its own dedicated disk to maintain sequential read and write performance.

—

Implementing both physical and logical separation of data will increase the performance of your Kafka clusters, reduce cost, and reduce downtime.

