Zero Downtime Migration to Kafka


Published October 2023
One of the emerging trends we’ve observed in the data architecture space is the growing interest in moving from externally hosted Kafka to running Kafka in a company’s own environment. In this post, we’ll detail why companies are migrating Kafka to their internal environments, the importance of zero downtime migration, and how …

What is Kafka Streams?


Published September 2023
Traditional batch processing systems, albeit effective for fixed datasets, are often ill-suited for real-time, high-throughput scenarios. Kafka Streams enables organizations to act on data as it arrives, making it particularly useful for applications that require immediate response. Kafka Streams is a client library for building real-time applications and microservices. It’s a part …

Optimizing Kafka Brokers: Lessons From Managing Fortune 500 Implementations


Published August 2023
Optimizing Kafka broker performance will have a direct impact on your overall Kafka implementation. In this blog post, we cover what a Kafka broker does, why it’s essential, and how to optimize Kafka broker settings. We also share firsthand experiences optimizing Kafka brokers from our work with Fortune 500 companies. We encourage …

Is Kafka free?


Updated September 2023
Apache Kafka is available under the Apache License v2.0, and is completely free. The license is extremely permissive, allowing users to include Kafka in consumer products and modify the code. Confluent Kafka is available either for free under Confluent’s community license (not open source) or as a paid version. The paid version of …

What is a Kafka partition?


Published May 2023
Apache Kafka partitions are integral to Kafka’s ability to scale. The act of partitioning divides up a single topic into multiple partitions. Each of the partitions can then exist on a separate node within the Kafka cluster. The work of storing, writing, and processing messages is then distributed across multiple nodes in …
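
As a rough illustration of the idea above, the sketch below spreads a topic's partitions across the nodes of a cluster in round-robin order. The function names and broker names are hypothetical, and this is a simplified model rather than Kafka's own placement logic, which also has to account for replicas.

```python
from collections import defaultdict


def assign_partitions(num_partitions, brokers):
    """Map each partition index to a broker in round-robin order.

    Simplified model for illustration only; it ignores replication.
    """
    if not brokers:
        raise ValueError("at least one broker is required")
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


def partitions_per_broker(assignment):
    """Group partition indexes by the broker that hosts them."""
    grouped = defaultdict(list)
    for partition, broker in sorted(assignment.items()):
        grouped[broker].append(partition)
    return dict(grouped)


# Hypothetical three-node cluster hosting a six-partition topic.
assignment = assign_partitions(6, ["broker-1", "broker-2", "broker-3"])
print(partitions_per_broker(assignment))
```

With six partitions and three brokers, each broker ends up hosting two partitions, so the storage and processing work for the topic is shared evenly.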

Does Kafka Guarantee Message Order?


Published August 2023
Yes, Apache Kafka does guarantee message order within a partition. In this article we will walk you through how Kafka guarantees order using the detailed figure below. Producers’ Role in Kafka Ordering: Data is sent to Kafka from producers. Producers are often applications that generate messages. And each message sent to Kafka will correlate with …

Is Kafka a message queue?


Updated August 2023
Apache Kafka is not a traditional message queue. Kafka is a free-to-use, distributed messaging system that includes components of both a message queue and a publish-subscribe model. Kafka addresses the shortcomings of each of those traditional approaches, allowing it to provide fault tolerant, high throughput stream processing. Traditional shared …

How to Check Kafka Version


Published October 2022
Here are two quick steps to check which version of Apache Kafka is running.
Step 1: Change directories to the Kafka home directory.
Step 2: Use command-line utilities to enter the following command:
It will return the version running. Here the Kafka version running is 3.3.1.

Kafka Consumer Basics


Updated August 2023
In this article we provide succinct answers to common Apache Kafka consumer questions. You will learn the basics of how Kafka consumer groups work, how many consumers you can have per topic, and other important Kafka consumer facts. Answers to commonly asked questions about Kafka consumers: Can a Kafka consumer read from multiple …

Kafka vs Pulsar


Updated August 2023
Pulsar and Kafka achieve the same result. They both guarantee messages reach their intended destination(s). Yet, there are important differences between the two message queues. These differences can make one of the technologies a better fit, depending on your use case. In this post we cover 8 ways in which Apache Kafka …

How to Prevent a Kafka Outage


Published August 2023
Apache Kafka is a highly reliable tool when configured correctly for your use case. It should be the piece of your data architecture that you can be sure will remain online. Here we put together eight important best practices to help shore up your Kafka implementation. 8 Tips to Prevent Kafka Downtime …

Kafka on Kubernetes


Updated August 2023
Companies are coming to us specifically for assistance with deploying and managing Apache Kafka on Kubernetes. With many teams already familiar with Kubernetes, it can sometimes be the best choice to spin up Kafka servers on Kubernetes alongside their other applications. Kafka on Kubernetes presents some challenges, though. In this post we …

What is Kafka Connect?


Updated February 2023
Kafka Connect is a free tool for efficiently moving data into and out of Apache Kafka. Kafka Connect simplifies streaming data while also improving scalability and reliability. Features of Kafka Connect: Standardizes integrations with Kafka. Kafka Connect provides a shared framework for all Kafka connectors, which improves efficiency for connector development and …

Kafka Uses Consumer Groups for Scaling Event Streaming


Updated August 2023
Apache Kafka is a distributed messaging system that implements pieces of the two traditional messaging models, Shared Message Queues and Publish-Subscribe. Both Shared Message Queues and Publish-Subscribe models present limitations for handling high throughput use cases. Apache Kafka provides fault tolerant, high throughput stream processing that can handle even the most complicated …

Kafka Case Studies


Updated March 2023
Below are eight case studies showcasing how our Kafka experts have supported clients with Kafka challenges. These case studies cover a variety of fields and highlight the vast applications for Kafka across industries. Information Security. CHALLENGE: Client needs to monitor all servers, devices, applications, and laptops in this …

Kafka Definitions


Updated July 2022
We are taking a break from Kafka optimization posts to get back to the basics of Apache Kafka and define fundamental Kafka concepts. Kafka Definitions: A Primer for Apache Kafka Fundamentals. Kafka Producer. A Kafka producer is a standalone application, or an addition to your application, that sends data to Kafka broker(s). Kafka Broker. A …

Kafka Consumer Optimization


Updated August 2023
Kafka Consumer’s Role. The role of the Kafka consumer is to read data from Kafka. Kafka consumer optimization can help avoid errors and increase performance of your application. While the focus of this blog post is on the consumer, we will also review several broker configurations which affect the performance of consumers. Top …

What is a Kafka Topic?


Updated April 2022
Kafka topics are the categories used to organize messages. Each topic has a name that is unique across the entire Kafka cluster. Messages are sent to and read from specific topics. In other words, producers write data to topics, and consumers read data from topics. Kafka topics are multi-subscriber. This means that …

Open Source Monitoring for Kafka


Updated December 2021
Monitoring is a critical component of ensuring Kafka uptime and maintaining peak performance. Open source monitoring of disk performance, memory usage, CPU, network traffic, and load allows you to identify abnormal metrics in real time and address potential issues before a performance dip or outage occurs. In other words, monitoring Apache Kafka …

Load Balancing With Kafka


Updated February 2023
What is Kafka load balancing? Load balancing with Kafka is a straightforward process and is handled by the Kafka producers by default. While it isn’t traditional load balancing, it does spread out the message load between partitions while preserving message ordering. Round-robin approach: By default, producers choose the partition assignment for each …
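
To make the idea of producer-side load balancing concrete, here is a minimal sketch of a partition chooser. It is a toy model under stated assumptions, not Kafka's own partitioner: the class name `SketchPartitioner` is hypothetical, and CRC32 stands in for the hash a real client would use. Keyless messages rotate through the partitions, while keyed messages always map to the same partition so that messages sharing a key stay in order.

```python
import itertools
import zlib


class SketchPartitioner:
    """Toy partition chooser: hash keyed messages, rotate keyless ones."""

    def __init__(self, num_partitions):
        if num_partitions < 1:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self._next = itertools.cycle(range(num_partitions))

    def partition_for(self, key=None):
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(self._next)
        # Same key always maps to the same partition, which keeps
        # messages for that key in order. CRC32 is a stand-in hash.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
# Six keyless messages cycle through partitions 0, 1, 2 twice.
print([partitioner.partition_for() for _ in range(6)])
# A repeated key (hypothetical "order-42") lands on one partition.
print(partitioner.partition_for("order-42") == partitioner.partition_for("order-42"))
```

The trade-off shown here is the usual one: rotating keyless messages gives the most even spread, while hashing by key gives up some evenness in exchange for per-key ordering.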

Kafka Use Cases


Updated April 2021
Apache Kafka is a high-throughput, open source message queue used by Fortune 100 companies, government entities, and startups alike. Part of Kafka’s appeal is its wide array of use cases. In this post we will outline several of Kafka’s use cases, from event sourcing to tracking web activities to metrics and more. …

Performance Tuning for Apache Kafka


For Apache Kafka performance tuning, measure latency and throughput for your Kafka implementation. Latency is the measure of how long it takes Kafka to process a single event. Throughput is the measure of how many events arrive within a particular period of time.
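
The two definitions above can be sketched as a small calculation. This is only an illustration: the function names and the sample timestamps are hypothetical, and in practice these numbers would come from your own instrumentation or monitoring rather than a hard-coded list.

```python
from statistics import mean


def average_latency(events):
    """Mean seconds between an event's arrival and the end of its processing."""
    return mean(done - arrived for arrived, done in events)


def throughput(events, window_start, window_end):
    """Events that arrived per second within [window_start, window_end)."""
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    arrived_in_window = sum(
        1 for arrived, _ in events if window_start <= arrived < window_end
    )
    return arrived_in_window / (window_end - window_start)


# Hypothetical sample: (arrival time, processing-finished time) in seconds.
events = [(0.0, 0.25), (0.5, 0.75), (1.0, 1.5), (1.5, 1.75)]
print(average_latency(events))       # average seconds per event
print(throughput(events, 0.0, 2.0))  # events per second in the window
```

Tracking both numbers together matters because a change that raises throughput, such as larger batches, can also raise the latency of individual events.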

Kafka Monitoring With Elasticsearch and Kibana


Monitoring Kafka cluster performance is crucial for diagnosing system issues and preventing future problems. We recommend using Elasticsearch for Kafka monitoring because Elasticsearch is free and highly versatile as a single source of truth throughout any organization.

Frequently Asked Questions: Apache Kafka


Our team is experienced with implementing and fixing Kafka on a wide range of systems for an even wider range of business needs. From our real-world experience with Kafka consulting, we have found that many new clients share common questions about the technology.
Here are some quick answers to those questions.

Kafka Optimization


Issues with Apache Kafka performance are directly tied to system optimization and utilization. Here, we compiled the best practices for a high volume, clustered, general use case.

When to Consider Physical and Logical Separation With Kafka


When companies scale, their data handling needs change, and systems that worked a year ago become overtaxed by the increase in message volume. One particular component of the data handling system, the cluster architecture, should be revisited.