Updated September 2020
Many companies use both Apache Kafka and the Elastic Stack (Elasticsearch, Logstash, and Kibana) for log and event processing. Kafka often serves as the transport layer, storing and processing data, typically in large volumes.
Kafka stages data before it makes its way to the Elastic Stack. Logstash transforms the data, making it uniform. Elasticsearch provides long-term, horizontally scalable storage, search, and analysis. Kibana is the visualization platform, with graphs, charts, dashboards, login capability, and more.
The origin of the technology behind Kafka provides insight into its success and core competency. Created at LinkedIn, Kafka was built to manage enormous volumes of event data. As with other message brokers, Kafka groups data into topics, and it supports both publish-subscribe and queue semantics.
A distinguishing factor in Kafka’s design is that, unlike other message brokers, the complexity of the technology is shifted from producers to consumers, and it relies heavily on the file system cache. For more details on the differences between Kafka and other message brokers, like RabbitMQ and Redis, check out our post on this topic.
High Throughput Data
Kafka typically comes out on top for use cases that require smooth handling of high-throughput data and access to stream history. Companies are often drawn to Elasticsearch because it is also built to handle large amounts of data, so the two technologies pair well in high-throughput use cases.
Logstash Integration With Kafka
Logstash, the ETL layer for the Elastic Stack, integrates natively with Kafka. The integration uses Java APIs, and both input and output plugins are available, allowing users to read from and write to Kafka directly from Logstash.
When to Pair Kafka With Elasticsearch
Log and event data are inherently inconsistent: message volume and flow rates can change sharply. Some surges are predictable, such as those during a planned website promo or at specific times of the year. Others come unexpectedly, such as when a bug causes a program to log information excessively.
Employing Kafka can protect Logstash and Elasticsearch from such data spikes by buffering the incoming data, which helps prevent system crashes and data loss.
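To make the buffering argument concrete, here is a minimal sketch in Python. It is a toy model, not Kafka itself: the function name `process`, the per-tick arrival counts, and the fixed downstream capacity are all assumptions made for illustration. It compares a pipeline that drops whatever it cannot handle in a tick with one that carries the excess forward as a backlog.

```python
def process(arrivals, capacity, buffered):
    """Simulate a downstream stage that handles at most `capacity` messages per tick.

    arrivals: messages arriving in each tick (hypothetical numbers)
    buffered: if True, unprocessed messages wait in a backlog (the role Kafka plays);
              if False, they are lost.
    Returns (processed, dropped, remaining_backlog).
    """
    backlog = 0
    dropped = 0
    processed = 0
    for incoming in arrivals:
        pending = backlog + incoming
        done = min(pending, capacity)
        processed += done
        leftover = pending - done
        if buffered:
            backlog = leftover   # excess waits for a later tick
        else:
            dropped += leftover  # excess is lost
            backlog = 0
    return processed, dropped, backlog


# A spike of 900 messages in the third tick, against a capacity of 300 per tick.
arrivals = [100, 100, 900, 100, 100, 100]

print(process(arrivals, capacity=300, buffered=False))  # (800, 600, 0)
print(process(arrivals, capacity=300, buffered=True))   # (1400, 0, 0)
```

In this model the buffered pipeline falls behind during the spike and catches up over the following ticks, while the unbuffered one loses 600 messages. A real deployment also has to size retention and disk so the backlog fits.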
Elasticsearch Isn’t Reachable
If Elasticsearch is down, whether for a planned outage such as an upgrade or for an unplanned one, a message queue like Kafka prevents data loss by holding the data until Elasticsearch is back online.
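The idea of holding data until the destination returns can be sketched as follows. This is an illustrative stand-in only: `BufferedLog` and `FakeElasticsearch` are hypothetical classes that do not use any Kafka or Elasticsearch client API. The sketch keeps an append-only list of records and a committed offset that advances only after a record is delivered successfully.

```python
class BufferedLog:
    """Toy stand-in for one topic partition: append-only records plus a committed offset."""

    def __init__(self):
        self.records = []
        self.committed = 0  # index of the next record to deliver

    def append(self, record):
        self.records.append(record)

    def drain(self, sink):
        """Deliver pending records in order; stop without advancing on the first failure."""
        delivered = 0
        while self.committed < len(self.records):
            try:
                sink(self.records[self.committed])
            except ConnectionError:
                break  # destination is down; the record stays in the log
            self.committed += 1
            delivered += 1
        return delivered


class FakeElasticsearch:
    """Hypothetical destination that can be switched offline."""

    def __init__(self):
        self.online = True
        self.indexed = []

    def index(self, doc):
        if not self.online:
            raise ConnectionError("Elasticsearch unreachable")
        self.indexed.append(doc)


log = BufferedLog()
es = FakeElasticsearch()

log.append("event-1")
log.drain(es.index)   # delivered while the destination is online

es.online = False     # outage begins
log.append("event-2")
log.append("event-3")
log.drain(es.index)   # nothing delivered, nothing lost

es.online = True      # destination is back
log.drain(es.index)   # backlog is delivered in order
print(es.indexed)     # ['event-1', 'event-2', 'event-3']
```

Because the offset only moves forward after a successful delivery, the events produced during the outage are still waiting when the destination returns.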
When to Pair Elasticsearch With Kafka
Monitoring Kafka cluster performance is crucial for diagnosing system issues and preventing future problems. Additionally, monitoring dispels any doubt in your users’ minds that all messages are being properly processed. Remember, if you lose a user’s data, it can be extremely difficult to regain trust.
We recommend Elasticsearch for Kafka monitoring because it is free and highly versatile. Learn more about how to use Elasticsearch for Kafka monitoring in our post Kafka Monitoring With Elasticsearch and Kibana.
Considerations for Employing Kafka
There are many considerations when employing Kafka, including best practices for Kafka Topics and Kafka Partitions, as well as overall Kafka Optimization and Performance Tuning. It is also crucial that Kafka is secured appropriately.
Below, we have gathered several of our most popular posts on Kafka to help you get started.
Kafka Partitions. To optimize Kafka for the number of partitions, use the calculation #Partitions = Desired Throughput / Partition Speed, where partition speed is estimated as 10 MB/s.
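As a quick worked example of the partition calculation, here is a small Python sketch. The function name `estimate_partitions` is hypothetical, and rounding up to a whole partition is an assumption added here, since the formula itself does not say how to handle fractions. The 10 MB/s default is the estimate quoted above.

```python
import math


def estimate_partitions(desired_throughput_mb_s, partition_speed_mb_s=10.0):
    """Estimate #Partitions = Desired Throughput / Partition Speed.

    Rounds up to a whole partition (an assumption; the formula gives a ratio only).
    """
    if desired_throughput_mb_s <= 0 or partition_speed_mb_s <= 0:
        raise ValueError("throughput and partition speed must be positive")
    return math.ceil(desired_throughput_mb_s / partition_speed_mb_s)


print(estimate_partitions(250))  # 250 MB/s at 10 MB/s per partition -> 25
print(estimate_partitions(255))  # 25.5 rounds up -> 26
```

Treat the result as a starting point. Measuring the actual per-partition speed in your own environment and passing it as `partition_speed_mb_s` will give a more reliable estimate than the 10 MB/s default.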