Published July 2022 With companies revisiting their budgets to brace for a possible recession, now is the time to review your data storage costs and find places to reduce those fees without sacrificing performance. In this article we consolidate our top tips for saving money on data storage costs. From the top we want to … Continue reading How to Save Money on Data Storage Costs
How to Prevent a Kafka Outage
Published August 2023 Apache Kafka is a highly reliable tool when configured correctly for your use case. It should be the piece of your data architecture that you can be sure will remain online. Here we put together eight important best practices to help shore up your Kafka implementation. 8 Tips to Prevent Kafka Downtime … Continue reading How to Prevent a Kafka Outage
Data Engineering Study
Published June 27, 2022 Data engineering is the field dedicated to building data infrastructure to ingest, process, and store large amounts of data. This is a quickly growing field, with both the number of jobs in data engineering and the number of tools on the market steadily increasing. Despite the popularity of data engineering as … Continue reading Data Engineering Study
What is a Virtual CIO?
Published June 2022 Virtual CIOs provide the leadership and expertise to build, grow, and maintain reliable data architecture. They are often hired by midsized companies that are looking for a trusted authority to drive data architecture and the supporting team. Virtual CIOs are also referred to as vCIOs, fractional CIOs, part-time CIOs, and CIOs for … Continue reading What is a Virtual CIO?
What is OpenSearch?
Updated May 2022 OpenSearch is open source search and analytics software. It’s a community-led project with Amazon Web Services (AWS) leading the development. It was first created as a fork from Elasticsearch 7.10.2 and Kibana 7.10.2 in 2021. The OpenSearch search engine is simply referred to as OpenSearch, and the dashboard tool is … Continue reading What is OpenSearch?
How to fix the MEMBER_ID Error in Kafka
Updated May 2022 If you found this post it’s likely because you got the Kafka member_id error. Let’s first cover why the error popped up and then go through two ways to resolve the error. Reason for the Kafka Member ID Error When a new consumer joins a group it enters with the member.id set … Continue reading How to fix the MEMBER_ID Error in Kafka
Kafka on Kubernetes
Updated August 2023 Companies are coming to us specifically for assistance with deploying and managing Apache Kafka on Kubernetes. With many teams already familiar with Kubernetes, it can sometimes be the best choice to spin up Kafka servers on Kubernetes alongside their other applications. Kafka on Kubernetes presents some challenges though. In this post we … Continue reading Kafka on Kubernetes
Elasticsearch Basics: What it is, Licensing, Languages, and Getting Help
Updated March 2023 Elasticsearch is a distributed search and analytics engine. It is built on top of Apache Lucene. Elasticsearch was first released in 2010 by the company now known as Elastic. It was originally completely open source, but license changes have limited its usage. More on that below. Elasticsearch is part of a group … Continue reading Elasticsearch Basics: What it is, Licensing, Languages, and Getting Help
Apache Pulsar Tutorial & Online Course
Updated February 2023 Two of our team members had a vision to make Apache Pulsar accessible to everyone through a free online course. Our team members who are experienced with building highly available Apache Pulsar messaging platforms put together this video series that leads learners through installing, configuring, and running Pulsar. Visit the Pulsar Course … Continue reading Apache Pulsar Tutorial & Online Course
How to Query Elasticsearch With Boolean Queries
Updated January 2023 Boolean queries in Elasticsearch are a popular query type because of their versatility and ease of use. Boolean queries, or bool queries, find or match documents by using boolean clauses. For the vast majority of cases, the filtering clause will be used because it can be cached for faster search times. In … Continue reading How to Query Elasticsearch With Boolean Queries
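As a rough illustration of the filtering clause mentioned above, here is a minimal sketch of a bool query body built as a plain Python dictionary. The field name `status` and the value `published` are hypothetical placeholders, and the helper `build_filtered_query` is our own name for illustration, not part of any Elasticsearch client.

```python
import json


def build_filtered_query(field, value):
    """Return a bool query body that matches documents via a filter clause.

    Filter clauses do not contribute to relevance scoring, which is part
    of why Elasticsearch can cache them.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {field: value}}
                ]
            }
        }
    }


# Hypothetical field and value, chosen only for illustration.
body = build_filtered_query("status", "published")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a search request, assuming an index that contains the field in question.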
Hosted Apache Pulsar: Why managed Pulsar in your environment is a better choice
Updated June 2023 One of the attractions of hosted Apache Pulsar is the peace of mind that a third party is responsible for ensuring uptime. However, that conclusion doesn’t consider what a company loses by using a third party hosted service. Fully managed Pulsar services, hosted directly in your internal environment (cloud or on-prem), still … Continue reading Hosted Apache Pulsar: Why managed Pulsar in your environment is a better choice
What is Kafka Connect?
Updated February 2023 Kafka Connect is a free tool for efficiently moving data into and out of Apache Kafka. Kafka Connect simplifies streaming data while also improving scalability and reliability. Features of Kafka Connect Standardizes integrations with Kafka. Kafka Connect provides a shared framework for all Kafka connectors, which improves efficiency for connector development and … Continue reading What is Kafka Connect?
BookKeeper for Pulsar
Updated January 2023 As discussed in a previous article, “What is Apache Pulsar?”, Pulsar is a two-layer system with Pulsar brokers acting as the serving layer and Apache BookKeeper bookies providing the persistent storage layer. In this post we will review BookKeeper’s role, important terminology, and an introduction to configuring Ledgers. Apache BookKeeper Basics BookKeeper … Continue reading BookKeeper for Pulsar
Subscription Types in Apache Pulsar
Updated August 2023 Apache Pulsar is a publish-subscribe distributed messaging system. When consumers subscribe to topics in Pulsar, there are four different types to choose from: Exclusive, Failover, Shared, and Key_Shared. In this article we will review the different subscription types and what factors to consider when choosing between them. If you are interested in … Continue reading Subscription Types in Apache Pulsar
What is Apache Pulsar?
Updated August 2023 Apache Pulsar is an open source, publish-subscribe messaging system. It’s unique because of its two-layer system where the serving and storage layers are separated. Pulsar runs with two supporting technologies, Apache BookKeeper and Apache ZooKeeper. The three technologies together provide a high throughput, low latency distributed messaging system. Pulsar Broker – Serving … Continue reading What is Apache Pulsar?
How to Query Elasticsearch in Kibana
Updated July 2022 Kibana Query Syntax When querying Elasticsearch in Kibana you can either use the traditional Lucene query syntax or the newer Kibana Query Language (KQL). If you are using Kibana 7.0 or later, Kibana Query Language is included as a default. In this article we provide the basics for both approaches and provide … Continue reading How to Query Elasticsearch in Kibana
Solr vs Elasticsearch
Updated December 2022 Both Apache Solr and Elasticsearch are popular open source* search engines built on top of Lucene. This article is intended to help readers learn more about the technologies in relation to one another to guide technology decisions. * Check out this article for information about recent Elasticsearch licensing changes. Elasticsearch is no … Continue reading Solr vs Elasticsearch
How to Index Elasticsearch
Updated January 2023 An Index in Elasticsearch is used to both organize and distribute data within a cluster. In this post we will define both components of an Index and then outline how to create, add to, delete, and reindex Indices in Elasticsearch. We will also touch on querying, but Elasticsearch querying is covered in … Continue reading How to Index Elasticsearch
Kafka Uses Consumer Groups for Scaling Event Streaming
Updated August 2023 Apache Kafka is a distributed messaging system that implements pieces of the two traditional messaging models, Shared Message Queues and Publish-Subscribe. Both Shared Message Queues and Publish-Subscribe models present limitations for handling high throughput use cases. Apache Kafka provides fault tolerant, high throughput stream processing that can handle even the most complicated … Continue reading Kafka Uses Consumer Groups for Scaling Event Streaming
Kafka Case Studies
Updated March 2023 Below are eight case studies showcasing how our Kafka experts have supported clients with Kafka challenges. These case studies cover a variety of fields and highlight the vast applications for Kafka across industries. Learn about Kafka support Information Security CHALLENGE: Client needs to monitor all servers, devices, applications, and laptops in this … Continue reading Kafka Case Studies
Comparing Confluent Kafka and Apache Kafka
Updated August 2023 In this post we will compare Apache Kafka and the Confluent Kafka Platform, describing what they have in common and what sets them apart. What is Confluent Kafka and Apache Kafka? Apache Kafka is a free, open source message broker that provides high throughput, high availability, and low latency. Apache Kafka can … Continue reading Comparing Confluent Kafka and Apache Kafka
Elasticsearch Definitions
Updated March 2023 When learning Elasticsearch, it’s important to start with a good foundational understanding of key concepts and terms. In this post we define and explain important Elasticsearch concepts. Elasticsearch Terms and Definitions Elasticsearch Node. An Elasticsearch node is a single Elasticsearch process, and the minimum number of nodes for a highly available Elasticsearch … Continue reading Elasticsearch Definitions
Kafka Definitions
Updated July 2022 Taking a break from Kafka optimization posts to get back to the basics of Apache Kafka and define fundamental Kafka concepts. Kafka Definitions: A Primer for Apache Kafka Fundamentals Kafka Producer. A Kafka producer is a standalone application, or addition to your application, that sends data to Kafka broker(s). Kafka Broker. A … Continue reading Kafka Definitions
Kafka Consumer Optimization
Updated August 2023 Kafka Consumer’s Role. The role of the Kafka consumer is to read data from Kafka. Kafka consumer optimization can help avoid errors and increase performance of your application. While the focus of this blog post is on the consumer, we will also review several broker configurations which affect the performance of consumers. Top … Continue reading Kafka Consumer Optimization
Top 10 Apache Kafka Features That Drive Its Popularity
Updated January 2022 Apache Kafka is hugely popular because of its features that guarantee uptime, make it easy to scale, enable Kafka to handle high volumes, and much more. In this article we will discuss the Top 10 Apache Kafka features to help you evaluate if Kafka is the right technology for your company’s business … Continue reading Top 10 Apache Kafka Features That Drive Its Popularity
What is a Kafka Topic?
Updated April 2022 Kafka topics are the categories used to organize messages. Each topic has a name that is unique across the entire Kafka cluster. Messages are sent to and read from specific topics. In other words, producers write data to topics, and consumers read data from topics. Kafka topics are multi-subscriber. This means that … Continue reading What is a Kafka Topic?
Hosted Kafka: Why Managed Kafka in Your Cloud or Data Center is a Better Choice Than Hosted Kafka
In this post we challenge the misconception that managed Kafka services need to be hosted on third party platforms.
Uptime Guarantees for Managed Kafka as a Service
Updated September 2022 Kafka’s primary role in many data architecture designs is ensuring that no data is lost. Databases can fail. Servers can fail. Applications can fail. But a well designed Kafka deployment should provide 24/7, reliable, fault-tolerant message collection and processing. One way to ensure an expertly designed and managed Kafka deployment is to … Continue reading Uptime Guarantees for Managed Kafka as a Service
Open Source Monitoring for Kafka
Updated December 2021 Monitoring is a critical component of ensuring Kafka uptime and maintaining peak performance. Open source monitoring of disk performance, memory usage, CPU, network traffic, and load allows you to identify abnormal metrics in real time and address potential issues before a performance dip or outage occurs. In other words, monitoring Apache Kafka … Continue reading Open Source Monitoring for Kafka
5 Factors to Consider When Choosing a Kafka as a Service Provider
Updated April 2021 There are a handful of providers offering Kafka as a Service. If you are in the market for managed Kafka you might be wondering what factors to consider when choosing a provider. In this post, we break down the five most important considerations. #1 Is the service fully managed? If the service … Continue reading 5 Factors to Consider When Choosing a Kafka as a Service Provider
Load Balancing With Kafka
Updated February 2023 What is Kafka load balancing? Load balancing with Kafka is a straightforward process and is handled by the Kafka producers by default. While it isn’t traditional load balancing, it does spread out the message load between partitions while preserving message ordering. Round-robin approach: By default, producers choose the partition assignment for each … Continue reading Load Balancing With Kafka
Understanding Kafka Consumer Offset
Updated January 2023 One aspect of Kafka that can cause some confusion for new users is the consumer offset. In this post, we define consumer offset and outline the factors that determine Kafka Consumer offset. Defining Kafka Consumer Offset The consumer offset is a way of tracking the sequential order in which messages are received … Continue reading Understanding Kafka Consumer Offset
What is ZooKeeper & How Does it Support Kafka?
Updated July 2022 ZooKeeper is used in distributed systems for service synchronization and as a naming registry. When working with Apache Kafka, ZooKeeper is primarily used to track the status of nodes in the Kafka cluster and maintain a list of Kafka topics and messages. Jump to info on using Kafka without ZooKeeper ZooKeeper … Continue reading What is ZooKeeper & How Does it Support Kafka?
Origins of Kafka and Why it Plays Well With Elasticsearch
Updated June 2022 Many companies leverage both Apache Kafka and the Elastic Stack (Elasticsearch, Logstash, and Kibana) for log and/or event processing. Kafka is often used as the transport layer, storing and processing data, typically large amounts of data. Kafka stages data before it makes its way to the Elastic Stack. Logstash transforms the data, … Continue reading Origins of Kafka and Why it Plays Well With Elasticsearch
Kafka Use Cases
Updated April 2021 Apache Kafka is a high-throughput, open source message queue used by Fortune 100 companies, government entities, and startups alike. Part of Kafka’s appeal is its wide array of use cases. In this post we will outline several of Kafka’s use cases from event sourcing to tracking web activities to metrics and more. … Continue reading Kafka Use Cases
Performance Tuning for Apache Kafka
For Apache Kafka performance tuning, measure latency and throughput for your Kafka implementation. Latency is the measure of how long it takes Kafka to process a single event. Throughput is the measure of how many events arrive within a particular period of time.
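To make the two definitions concrete, here is a minimal sketch that computes both from recorded timestamps. The `Event` record and its fields `arrived_at` and `processed_at` are hypothetical names for illustration, and the sketch assumes you already capture those timestamps (in seconds) somewhere in your pipeline. It does not use any Kafka API.

```python
from dataclasses import dataclass


@dataclass
class Event:
    arrived_at: float    # when the event arrived, in seconds (hypothetical field)
    processed_at: float  # when processing finished, in seconds (hypothetical field)


def average_latency(events):
    """Mean time taken to process a single event, in seconds."""
    return sum(e.processed_at - e.arrived_at for e in events) / len(events)


def throughput(events, window_start, window_end):
    """Events arriving per second within [window_start, window_end)."""
    arrived = sum(1 for e in events if window_start <= e.arrived_at < window_end)
    return arrived / (window_end - window_start)


# Illustrative sample data, not real measurements.
sample = [Event(0.0, 0.5), Event(1.0, 1.25), Event(2.0, 2.75)]
latency_seconds = average_latency(sample)
events_per_second = throughput(sample, 0.0, 2.0)
print(latency_seconds, events_per_second)
```

In practice you would likely track percentiles as well as the mean, since a handful of slow events can hide behind a healthy average.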
Elasticsearch Shards — Definitions, Sizes, Optimizations, and More
Updated September 2022 Optimizing Elasticsearch for shard size is an important component for achieving maximum performance from your cluster. To get started let’s review a few definitions that are an important part of the Elasticsearch jargon. If you are already familiar with Elasticsearch, you can continue straight to the next section. Defining Elasticsearch Jargon: Cluster, … Continue reading Elasticsearch Shards — Definitions, Sizes, Optimizations, and More
Elasticsearch Optimization for Small, Medium, and Large Clusters
Updated January 2023 The way nodes are organized in an Elasticsearch cluster changes depending on the size of the cluster. For small, medium, and large Elasticsearch clusters there will be different approaches for optimization. Dattell’s team of engineers are experts at designing, optimizing, and maintaining Elasticsearch implementations and supporting technologies. Click here to learn more … Continue reading Elasticsearch Optimization for Small, Medium, and Large Clusters
Kafka Optimization: Kafka Security Checklist
Updated July 2022 There are six key components to securing Kafka. These best practices will help you optimize Kafka and protect your data from avoidable exposure. #1 Encryption By default, data is plaintext in Kafka, which leaves it vulnerable to a man-in-the-middle attack as data is routed over your network. Transport layer security (TLS) and/or … Continue reading Kafka Optimization: Kafka Security Checklist
Kafka Optimization — How many partitions are needed?
Updated February 2023 Apache Kafka is a distributed system, running in a cluster with each of the nodes referred to as brokers. Kafka topics are partitioned and replicated across the brokers throughout the entirety of the implementation. Why are Partitions Important? Partitions allow users to parallelize topics, meaning data for any topic can be divided … Continue reading Kafka Optimization — How many partitions are needed?
Creating a Kafka Topic: What Are Kafka Topics & How Are They Created?
In this post, we will define what Kafka topics are and explain how to create them.
Kafka vs. RabbitMQ: How to choose an open source message broker
There are several message queue programs to choose from: Kafka, RabbitMQ, ActiveMQ, ZeroMQ, Redis, among others. How do you choose which is right for you?
Kafka Monitoring With Elasticsearch and Kibana
Monitoring Kafka cluster performance is crucial for diagnosing system issues and preventing future problems. We recommend using Elasticsearch for Kafka monitoring because Elasticsearch is free and highly versatile as a single source of truth throughout any organization.
Top 10 Questions: Elasticsearch Consulting & Managed Services
Dattell’s engineers work one-on-one with companies to design, implement, manage, and improve their Elasticsearch deployments. Get answers to top questions about Elasticsearch consulting and managed services.
Frequently Asked Questions: Apache Kafka
Our team is experienced with implementing and fixing Kafka on a wide range of systems for an even wider range of business needs. From our real-world experience with Kafka consulting, we found that there are common questions that many new clients have about the technology.
Here are some quick answers to those questions.
4 Approaches to Data Backup
We outline the four primary ways of backing up data, along with their benefits and drawbacks, to help you decide which approach best meets your company’s needs.
Dashboards for Data-Driven Decision Making: As Easy as Driving Your Car
When we are driving, we are routinely making data-driven decisions using the gauges on our dashboard to guide us. Data-driven decision making should be just as easy when it comes to business.
6 Tips for Choosing Between AWS Elasticsearch and a Custom Elasticsearch Solution
We broke down the thought process for choosing between AWS Elasticsearch and a custom Elasticsearch solution here to help you think through what will be right for you and your team.
3 Business Questions to Guide Data Collection, Storage, and Insights
With this guide, you will be able to define the business and technical requirements for your data platform, making the implementation process efficient and successful.
Open Source Tools for Data Architecture, Business Analytics, and Operational Intelligence: Tech’s Little Free Library
When designing a custom data architecture, business analytics, or operational intelligence platform for a client, four benefits of open source tools make them undoubtedly a better option in the vast majority of cases.