
Origins of Kafka and Why it Plays Well With Elasticsearch

Updated September 2020

Many companies leverage both Apache Kafka and the Elastic Stack (Elasticsearch, Logstash, and Kibana) for log and/or event processing.  Kafka typically serves as the transport layer, storing and processing data, often in very large volumes.

Kafka stages data before it makes its way to the Elastic Stack.  Logstash transforms the data into a uniform format. Elasticsearch provides long-term, horizontally scalable storage, search, and analysis.  Kibana serves as the visualization platform, offering graphs, charts, dashboards, login capability, and more.

Kafka’s Origins

The origin of the technology behind Kafka provides insight into its success and core competency.  Created at LinkedIn, Kafka was built to manage enormous volumes of event data. As with other message brokers, Kafka groups data into topics, using producer-consumer and queue semantics.
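Those topic and consumer-group semantics can be illustrated with a small in-memory sketch. This is not the real Kafka client API, and the topic and group names are made up; the point is only that each consumer group keeps its own offset into a topic's ordered log, so independent groups read the same stream without interfering with one another.

```python
from collections import defaultdict

# Illustrative in-memory sketch of Kafka-style topic semantics -- not the
# real client API. Producers append to a named topic's log; each consumer
# group tracks its own read offset into that log.
class MiniBroker:
    def __init__(self):
        self.topics = defaultdict(list)   # topic -> ordered log of messages
        self.offsets = defaultdict(int)   # (topic, group) -> next offset to read

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, topic, group):
        """Return the next unread message for this consumer group, or None."""
        offset = self.offsets[(topic, group)]
        log = self.topics[topic]
        if offset >= len(log):
            return None
        self.offsets[(topic, group)] = offset + 1
        return log[offset]

broker = MiniBroker()
broker.publish("page-views", {"user": "a", "url": "/home"})
# Two groups read the same message independently; neither consumes it
# "away" from the other.
analytics_msg = broker.poll("page-views", "analytics")
audit_msg = broker.poll("page-views", "audit")
```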

A distinguishing factor in Kafka’s design is that, unlike other message brokers, the complexity of the technology is shifted from producers to consumers, and it relies heavily on the file system cache.  For more details on the differences between Kafka and other message brokers, like RabbitMQ and Redis, check out our post on this topic.

High Throughput Data

Kafka typically comes out on top for use cases that require the smooth handling of high throughput data and access to stream history.  Companies are often drawn to Elasticsearch because it is likewise built to handle large amounts of data, making the two technologies a natural pair in high throughput use cases.

Logstash Integration With Kafka

Logstash, the ETL layer of the Elastic Stack, integrates natively with Kafka through its Java APIs, offering both input and output plugins that let users read from and write to Kafka directly from Logstash.
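As a sketch, a minimal Logstash pipeline that reads from Kafka and writes to Elasticsearch might look like the following. The broker address, topic name, and index pattern here are placeholders, not values from this post:

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # assumed broker address
    topics => ["app-logs"]                  # hypothetical topic name
    group_id => "logstash"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"      # daily index pattern
  }
}
```

A filter block (for example, grok or mutate) would typically sit between the input and output to make the data uniform before it reaches Elasticsearch.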


When to Pair Kafka With Elasticsearch

Event Spikes

Log and event data are inherently inconsistent, with message volume and flow rates sometimes changing in unpredictable ways.  Some surges are somewhat predictable, such as during a planned website promotion or at specific times of the year. However, volume and flow rate increases can also come unexpectedly, such as from a bug in a program that logs information excessively.

Employing Kafka can protect Logstash and Elasticsearch from such data spikes, preventing a system crash and data loss.

Elasticsearch Isn’t Reachable

If Elasticsearch is down, whether for a planned outage, such as an upgrade, or an unplanned one, using a message queue like Kafka prevents data loss by holding the data until Elasticsearch is back online.

When to Pair Elasticsearch With Kafka

Monitoring Kafka cluster performance is crucial for diagnosing system issues and preventing future problems.  Additionally, monitoring dispels any doubt in your users’ minds that all messages are being properly processed. Remember, if you lose a user’s data, it can be extremely difficult to regain trust.

We recommend Elasticsearch for Kafka monitoring because it is free and highly versatile.  Learn more about how to use Elasticsearch for Kafka monitoring in our post Kafka Monitoring With Elasticsearch and Kibana.

Considerations for Employing Kafka

There are many considerations when employing Kafka, including understanding best practices for Kafka topics and Kafka partitions, as well as overall Kafka optimization and performance tuning.  And, of course, it is crucial that Kafka is secured appropriately.

Below, we've gathered several of our most popular posts on Kafka to help you get started.

Performance Tuning.  Learn how to tune Kafka performance with tips including how to optimize batch sizes (hint: the minimum should be 1 KB), how to optimize the number of partitions, and more. Continue reading.

Monitoring Kafka. The key to ensuring Kafka uptime and maintaining peak performance is through monitoring. Continue reading.

Securing Kafka.  There are six key components to securing Kafka. These best practices will help you optimize Kafka and protect your data from avoidable exposure. Continue reading.

Kafka Optimization.  Learn how to optimize Kafka in 5 minutes. We’ve outlined the most important settings and configurations for optimum Kafka performance. Continue reading.

Kafka Topics.  In this post we define what Kafka topics are and explain how to create them. Continue reading.

Kafka Partitions.  To determine the optimal number of partitions, use the calculation #Partitions = Desired Throughput / Partition Speed, where partition speed can be estimated at 10 MB/s. Continue reading.
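The partition calculation above can be captured in a few lines of Python. This is a sketch: the 10 MB/s per-partition figure is the rough estimate from the post, not a universal constant, so measure your own cluster before settling on a number.

```python
import math

def partitions_needed(target_throughput_mb_s, partition_speed_mb_s=10.0):
    """Estimate partition count: #Partitions = Desired Throughput / Partition Speed.

    The 10 MB/s default is the rough per-partition estimate used in the post.
    Rounds up, and never returns fewer than one partition.
    """
    return max(1, math.ceil(target_throughput_mb_s / partition_speed_mb_s))

# For a hypothetical 250 MB/s target throughput:
print(partitions_needed(250))   # -> 25
```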

Have questions about Kafka or Elasticsearch?

Get in touch with our expert engineers who have assisted hundreds of companies with Apache Kafka, Elasticsearch, and supporting technologies.