Kafka organizes message feeds into categories called topics. Each topic has a name that is unique across the entire Kafka cluster. Messages are sent to and read from specific topics. In other words, producers write data to topics, and consumers read data from topics. Kafka topics are multi-subscriber. This means that a topic can have … Continue reading What is a Kafka Topic?
One of the attractions of a hosted service, such as hosted Kafka, is the peace of mind that a third party is responsible for ensuring uptime. In this post we challenge that misconception and suggest an improved approach to managed Kafka. Let’s start with a hard truth. There are big names in managed Kafka as … Continue reading Hosted Kafka: Why Managed Kafka in Your Cloud or Data center is a Better Choice Than Hosted Kafka
Kafka’s primary role in many data architecture designs is ensuring that no data is lost. Databases can fail. Servers can fail. Applications can fail. But a well-designed Kafka deployment should provide 24/7, reliable, fault-tolerant message collection and processing. One way to ensure an expertly designed and managed Kafka deployment is to employ a Kafka … Continue reading Uptime Guarantees for Managed Kafka as a Service
The key to ensuring Kafka uptime and maintaining peak performance is monitoring. Reviewing disk performance, memory usage, CPU, network traffic, and load in real time lets you identify abnormal metrics or trends before a performance dip or outage occurs. Furthermore, monitoring Kafka provides assurance to your users that all messages are correctly processed. There … Continue reading Open Source Monitoring for Kafka
There are a handful of providers offering Kafka as a Service. If you are in the market for managed Kafka you might be wondering what factors to consider when choosing a provider. In this post, we break down the five most important considerations. #1 Is the service fully managed? If the service is truly a … Continue reading 5 Factors to Consider When Choosing a Kafka as a Service Provider
Load balancing with Kafka is a straightforward process and is handled by the Kafka producers by default. While it isn’t traditional load balancing, it does spread out the message load between partitions while preserving message ordering. By default, producers choose the partition assignment for each incoming message within a topic, following a round-robin approach. As … Continue reading Load Balancing With Kafka
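To make the round-robin idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the Kafka producer client: the function name `assign_round_robin`, the message labels, and the partition count are all assumptions chosen for illustration.

```python
from itertools import cycle


def assign_round_robin(messages, num_partitions):
    """Spread messages across partitions in round-robin order.

    Toy model only: real Kafka producers do this inside the client library.
    """
    partitions = {p: [] for p in range(num_partitions)}
    next_partition = cycle(range(num_partitions))  # 0, 1, 2, 0, 1, 2, ...
    for message in messages:
        partitions[next(next_partition)].append(message)
    return partitions


# Hypothetical example: seven messages spread over three partitions.
layout = assign_round_robin(["m0", "m1", "m2", "m3", "m4", "m5", "m6"], 3)
for partition, contents in layout.items():
    print(partition, contents)
```

Each partition ends up with a similar share of the load, and within any single partition the messages stay in the order they were sent.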
One aspect of Kafka that can cause some confusion for new users is the consumer offset. In this post, we define consumer offset and outline the factors that determine the offset. Defining Kafka Consumer Offset The consumer offset is a way of tracking the sequential order in which messages are received by Kafka topics. Keeping … Continue reading Understanding Kafka Consumer Offset
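As a rough illustration of offset tracking, the sketch below models a single partition as a Python list and a consumer that remembers the position of the next message to read. The `ToyConsumer` class and its `poll` method are hypothetical names for this example and are not the Kafka consumer API.

```python
class ToyConsumer:
    """Reads from an in-memory log and tracks its own offset (toy model)."""

    def __init__(self, log):
        self.log = log      # ordered list standing in for one partition
        self.offset = 0     # index of the next message to read

    def poll(self, max_messages=2):
        """Return up to max_messages starting at the current offset."""
        batch = self.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)  # advance past what was just read
        return batch


# Hypothetical example: three messages read in two polls.
consumer = ToyConsumer(["a", "b", "c"])
first = consumer.poll()
second = consumer.poll()
print(first, second, consumer.offset)
```

Because the offset only moves forward by the number of messages actually read, a consumer that stops and resumes picks up where it left off instead of rereading or skipping messages.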
The California Consumer Privacy Act (CCPA) allows the sale of customer data, even personally identifiable data, but it does add new restrictions. In this post we will discuss whether or not the new law applies to your organization, the restrictions on selling data without consent, and the restrictions for selling personally identifiable data. If you’re … Continue reading How CCPA Affects the Sale of Customer Data
ZooKeeper is used in distributed systems for service synchronization and as a naming registry. When working with Apache Kafka, ZooKeeper is primarily used to track the status of nodes in the Kafka cluster and maintain a list of Kafka topics and messages. ZooKeeper was originally developed by Yahoo to address the bugs that can arise … Continue reading What is ZooKeeper & How Does it Support Kafka?
Many companies leverage both Apache Kafka and the Elastic Stack (Elasticsearch, Logstash, and Kibana) for log and/or event processing. Kafka is often used as the transport layer, storing and processing data, typically large amounts of data. Kafka stages data before it makes its way to the Elastic Stack. Logstash transforms the data, making it uniform. … Continue reading Origins of Kafka and Why it Plays Well With Elasticsearch
AWS released Open Distro for Elasticsearch, adding to the performance and usability of the already essential log analytics and search technology. In this post we outline the new features that Open Distro provides and an overview of what this new open source technology means for Elastic Stack users. Open Distro for Elasticsearch Technology and Features … Continue reading Getting to Know Open Distro for Elasticsearch
Apache Kafka is a high-throughput, open source message queue used by Fortune 100 companies, government entities, and startups alike. Part of Kafka’s appeal is its wide array of use cases. In this post we will outline several of Kafka’s use cases from event sourcing to tracking web activities to metrics and more. Use Cases for … Continue reading Kafka Use Cases
For performance tuning you will want to measure latency and throughput for your Kafka implementation. Latency is the measure of how long it takes Kafka to process a single event. Throughput is the measure of how many events arrive within a particular period of time. To achieve the best balance of latency and throughput, tune … Continue reading Kafka Performance Tuning
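The two measurements can be sketched with a few lines of Python. This assumes you already have, for each event, the time it arrived and the time processing finished; the `summarize` function, the field names, and the sample timestamps are illustrative assumptions rather than Kafka tooling.

```python
def summarize(events):
    """Compute mean latency and throughput from (arrived_at, processed_at) pairs.

    Timestamps are in seconds. The measurement window is assumed to run
    from the first arrival to the last completed event.
    """
    latencies = [done - arrived for arrived, done in events]
    window = max(done for _, done in events) - min(arrived for arrived, _ in events)
    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_eps": len(events) / window,  # events per second
    }


# Hypothetical sample: four events over a four-second window.
stats = summarize([(0.0, 0.5), (1.0, 1.5), (2.0, 2.5), (3.0, 4.0)])
print(stats)
```

Tracking both numbers together matters because a change that raises throughput, such as larger batches, can also raise per-event latency.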
Optimizing Elasticsearch for shard size is an important component for achieving maximum performance from your cluster. To get started let’s review a few definitions that are an important part of the Elasticsearch jargon. If you are already familiar with Elasticsearch, you can continue straight to the next section. Defining Elasticsearch Jargon: Cluster, Replicas, Shards, and … Continue reading Elasticsearch Shards — Definitions, Sizes, Optimizations, and More
The way nodes are organized in an Elasticsearch cluster changes depending on the size of the cluster. For small, medium, and large Elasticsearch clusters there will be different approaches for optimization. Dattell’s team of engineers are experts at designing, optimizing, and maintaining Elasticsearch implementations and supporting technologies. Find out more about our Elasticsearch services here. … Continue reading Elasticsearch Optimization for Small, Medium, and Large Clusters
There are six key components to securing Kafka. These best practices will help you optimize Kafka and protect your data from avoidable exposure. #1 Encryption By default, data is plaintext in Kafka, which leaves it vulnerable to a man-in-the-middle attack as data is routed over your network. Transport layer security (TLS) and/or a secure sockets … Continue reading Kafka Optimization: Kafka Security Checklist
Apache Kafka is a distributed system, running in a cluster with each of the nodes referred to as brokers. Kafka topics are partitioned and replicated across the brokers throughout the entirety of the implementation. These partitions allow users to parallelize topics, meaning data for any topic can be divided over multiple brokers. A critical component … Continue reading Kafka Optimization — How many partitions are needed?
Earlier this year, California passed the California Consumer Privacy Act of 2018, or CCPA for short. Beginning in January 2020, companies will be required to comply with this new law. It places new restrictions on how companies handle personal data, including minimum damages for class action suits in response to data breaches. Dattell is a … Continue reading The California Consumer Privacy Act of 2018 (CCPA): How to Prevent Data Breaches
In this post we review the California Consumer Privacy Act (CCPA) and outline why it is important for technology teams to understand it.
In this post, we will define what Kafka topics are and explain how to create them.
Yes, your data is valuable. However, like oil in the ground, its value isn’t fully realized until it is cleaned up and processed. And just as crude oil can be valuable for transportation, plastic manufacturing, and heating, company data too can be processed to extract multiple layers of value. In this post, we will discuss the ways in which your data can provide value for your business and customers.
There are several message queue programs to choose from: Kafka, RabbitMQ, ActiveMQ, ZeroMQ, Redis, among others. How do you choose which is right for you?
Monitoring Kafka cluster performance is crucial for diagnosing system issues and preventing future problems. We recommend using Elasticsearch for Kafka monitoring because Elasticsearch is free and highly versatile as a single source of truth throughout any organization.
From our real-world experience with Elasticsearch consulting, we found that there are common questions that many new clients have about the technology.
Here are some quick answers to those questions.
Our team is experienced with implementing and fixing Kafka on a wide range of systems for an even wider range of business needs. From our real-world experience with Kafka consulting, we found that there are common questions that many new clients have about the technology.
Here are some quick answers to those questions.
We outlined the four primary ways for backing up data and their benefits and drawbacks to help you decide on which approach best meets your company’s needs.
When we are driving, we are routinely making data-driven decisions using the gauges on our dashboard to guide us. Data-driven decision making should be just as easy when it comes to business.
We broke down the thought process for choosing between AWS Elasticsearch and a custom Elasticsearch solution here to help you think through what will be right for you and your team.
With this guide, you will be able to define the business and technical requirements for your data platform, making the implementation process efficient and successful.
When designing a custom data architecture, business analytics, or operational intelligence platform for a client, four benefits of open source tools make them undoubtedly a better option in the vast majority of cases.
The implementation of a data handling platform, whether it is a centralized reporting system, Business Analytics, Operational Intelligence, or single point of truth for your company, will improve the way you make data-driven decisions.
Dattell is pleased to announce our formal partnership with Elastic, Inc. as an official reseller of the Elastic X-pack.
Dattell is partnering with the creators of Kafka and Confluent to strengthen our commitment to improve messaging and data infrastructure for companies of all sizes.
Issues with Apache Kafka performance are directly tied to system optimization and utilization. Here, we compiled the best practices for a high-volume, clustered, general use case.
When companies scale, their data handling needs change, and systems that worked a year ago are now over-taxed with the increase in message volume. One particular component of the data handling system, the cluster architecture, should be revisited.