OpenSearch Cluster Optimization for Small, Medium, and Large Clusters
Published May 2023
Small, medium, and large OpenSearch clusters require different approaches to optimization. Dattell’s engineers are experts in designing, optimizing, and maintaining OpenSearch. Find out more about our OpenSearch support services.
Optimizing a Small OpenSearch Cluster
The minimum number of nodes for a small, highly available OpenSearch cluster is three (3). Three nodes might …
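Why three? Electing a cluster manager requires a majority of the voting nodes. A minimal sketch, assuming every node is cluster-manager-eligible and a simple-majority quorum:

```python
def quorum_size(nodes: int) -> int:
    """Smallest majority of `nodes` voting (cluster-manager-eligible) nodes."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Voting nodes that can fail while the rest still form a majority."""
    return nodes - quorum_size(nodes)


for n in range(1, 6):
    print(f"{n} node(s): quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

With two nodes, losing either one leaves no majority, so three is the smallest cluster that can survive a node failure.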
What is a Kafka partition?
Published May 2023
Apache Kafka partitions are integral to Kafka’s ability to scale. Partitioning divides a single topic into multiple partitions, and each partition can then live on a separate node within the Kafka cluster. The work of storing, writing, and processing messages is then distributed across multiple nodes in …
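A simplified sketch of how a topic’s partitions spread across brokers, assuming plain round-robin placement of partition leaders and no replicas. Kafka’s real assignment also staggers replicas and can be rack-aware, and the broker names here are hypothetical:

```python
from collections import defaultdict


def assign_partitions(num_partitions: int, brokers: list[str]) -> dict[str, list[int]]:
    """Place partition leaders on brokers round-robin (simplified: no replicas)."""
    if not brokers:
        raise ValueError("need at least one broker")
    placement: dict[str, list[int]] = defaultdict(list)
    for partition in range(num_partitions):
        placement[brokers[partition % len(brokers)]].append(partition)
    return dict(placement)


print(assign_partitions(6, ["broker-1", "broker-2", "broker-3"]))
# {'broker-1': [0, 3], 'broker-2': [1, 4], 'broker-3': [2, 5]}
```

Each broker now handles the reads and writes for only a third of the topic, which is the scaling effect described above.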
How does Geo-Replication Work in Apache Pulsar?
Published May 2023
Apache Pulsar offers geo-replication as out-of-the-box functionality. This sets Pulsar apart from some other message queues that require external tools for geo-replication.
Pulsar Geo-Replication
Geo-replication is the replication of messages across multiple clusters of a Pulsar instance. To clarify, by a Pulsar instance we mean multiple processes of Pulsar brokers, …
Top 10 Reasons to Use Open Source Software
Published April 2023
Dattell specializes in support for open source software: OpenSearch, Kafka, and Pulsar. The benefits of using open source software are vast. Learn about the top 10 reasons companies choose open source software.
#1 Lower Costs
Open source software is often available for free, allowing companies to save on software licensing fees. For …
Top 10 Reasons to Unify Data Into a Single Data Store
Published March 2023
Companies often find their data scattered across disparate data stores. Legacy systems, and subdivisions within a company running their own systems, are two common reasons for this. In this article we explain 10 reasons why it is important and beneficial for companies to unify data …
Top 10 Reasons Companies Use OpenSearch
Published March 2023
In this post we review the top 10 reasons why companies choose OpenSearch. In brief, companies choose OpenSearch because it is cost-effective, scalable, customizable, easy to integrate, fast, based on open standards, and future-proof, and because it offers community support, comprehensive security plug-ins, and real-time analytics. For those new to OpenSearch, it …
Is Elasticsearch a NoSQL database?
Published March 2023
Yes, Elasticsearch is a NoSQL database. It does support SQL queries, but that support is not full-featured. SQL databases, or relational databases, can be too rigid for some use cases. As an alternative, NoSQL databases provide more scalability and flexibility. NoSQL stands for “Not only SQL”.
Horizontal Scaling with NoSQL
NoSQL databases are …
Automating OpenSearch with Ansible
Published March 2023
Ansible is an open source automation tool that can automate OpenSearch implementations, maintenance, and workflows. Below we discuss the benefits of using Ansible to improve OpenSearch setup, documentation, and recovery.
Ansible for Initial Setup and Configuration of OpenSearch
Ansible can be used for the initial setup and configuration of an OpenSearch cluster. …
Does Kafka Guarantee Message Order?
Published February 2023
Yes, Apache Kafka guarantees message order within a partition. In this article we walk you through how Kafka guarantees order using the detailed figure below.
Producers’ Role in Kafka Ordering
Data is sent to Kafka from producers. Producers are often applications that generate messages, and each message sent to Kafka will correlate with …
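Ordering holds per partition, so related messages are usually given the same key: the key determines the partition, and a partition is appended to and read in order. A sketch of that idea, assuming a hash-then-modulo partitioner. Kafka’s default partitioner hashes keys with murmur2; CRC32 is swapped in here only for brevity, and the keys and events are made up:

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition (CRC32 stands in for Kafka's murmur2 hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(messages: list[tuple[str, str]], num_partitions: int) -> dict[int, list[tuple[str, str]]]:
    """Append each (key, value) to its partition's log in send order."""
    logs: dict[int, list[tuple[str, str]]] = {p: [] for p in range(num_partitions)}
    for key, value in messages:
        logs[partition_for(key, num_partitions)].append((key, value))
    return logs


sent = [("user-1", "login"), ("user-2", "login"), ("user-1", "add-to-cart"), ("user-1", "checkout")]
logs = produce(sent, num_partitions=3)

# Every user-1 event lands in the same partition, in the order it was sent.
user1_events = [value for key, value in logs[partition_for("user-1", 3)] if key == "user-1"]
print(user1_events)  # ['login', 'add-to-cart', 'checkout']
```

Messages with different keys may land in different partitions, and Kafka makes no ordering promise between partitions.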
Free Apache Pulsar Learning and Installation Resources
Published February 2023
The Apache Pulsar Learning Hub offers free educational resources, including Pulsar fundamentals, installation tutorial videos, and sample code. The Pulsar Learning Hub is composed of three parts. Part I is an introduction to the structure of Pulsar. This series of videos will give you …
How to Index OpenSearch
Published January 2023
An OpenSearch Index is used to both organize and distribute data within a cluster. In this post we define both components of an Index and outline how to create, add to, delete, and reindex Indices in OpenSearch. We will also touch on querying, but querying OpenSearch is covered in more depth in …
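The create, add, reindex, and delete operations map onto standard OpenSearch REST calls. A sketch that builds those requests as (method, path, body) tuples and prints them roughly as you would enter them in Dashboards Dev Tools. The index names and document are hypothetical, and actually sending the requests (for example with curl or an OpenSearch client) is left out:

```python
import json
from typing import Optional

# Each helper returns (HTTP method, path, JSON body or None) for an OpenSearch REST call.


def create_index(name: str, shards: int = 1, replicas: int = 1):
    settings = {"index": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return ("PUT", f"/{name}", {"settings": settings})


def add_document(index: str, doc: dict, doc_id: Optional[str] = None):
    # POST lets OpenSearch generate an id; PUT stores the document under doc_id.
    if doc_id is None:
        return ("POST", f"/{index}/_doc", doc)
    return ("PUT", f"/{index}/_doc/{doc_id}", doc)


def reindex(source: str, dest: str):
    # Copies documents from source into dest, which may use new settings or mappings.
    return ("POST", "/_reindex", {"source": {"index": source}, "dest": {"index": dest}})


def delete_index(name: str):
    return ("DELETE", f"/{name}", None)


for method, path, body in (
    create_index("movies"),
    add_document("movies", {"title": "Arrival", "year": 2016}, doc_id="1"),
    create_index("movies-v2", shards=2),
    reindex("movies", "movies-v2"),
    delete_index("movies"),
):
    print(method, path)
    if body is not None:
        print(json.dumps(body))
```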
Is Kafka a message queue?
Published December 2022
Apache Kafka is not a traditional message queue. Kafka is a distributed messaging system that includes components of both a message queue and a publish-subscribe model. Kafka improves on the deficits of each of those traditional approaches, allowing it to provide fault-tolerant, high-throughput stream processing. Traditional shared message queues are …
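Kafka’s hybrid model comes from consumer groups: consumers inside one group split a topic’s partitions between them (queue-like), while every separate group reads all of the partitions (publish-subscribe). A simplified round-robin sketch; the group and consumer names are hypothetical, and Kafka’s real assignors are more involved:

```python
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Split one group's partitions across its consumers round-robin (simplified)."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
groups = {
    "billing": ["billing-a", "billing-b"],  # queue-like: members share the work
    "analytics": ["analytics-a"],           # a separate group receives every partition
}
for group, members in groups.items():
    print(group, assign(partitions, members))
# billing {'billing-a': [0, 2], 'billing-b': [1, 3]}
# analytics {'analytics-a': [0, 1, 2, 3]}
```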
Enterprise Managed OpenSearch
Updated January 2023
Anyone in charge of their company’s data pipeline has the following five priorities in mind: reliability, security, speed, cost, and ownership. In this article we discuss how enterprise managed OpenSearch provides peace of mind, especially by giving you someone to call when a cluster fails in the middle of the night. And we …
What is Apache Pulsar’s Two-Layer System?
Published November 2022
Apache Pulsar has two layers: a serving layer and a storage layer. The Pulsar brokers carry out the data serving, and the BookKeeper bookies provide the storage. In this article we describe how Pulsar’s two-layer system works and review its benefits and drawbacks.
Serving Layer: Role of Pulsar Brokers
Data is sent …
How to Check Elasticsearch Version
Published November 2022
Here we provide three simple ways to check which version of Elasticsearch is installed and running on your machine. The first uses the Kibana Dev Console, and the other two use the command line. Let’s dive in.
Check Elasticsearch Version Running Using Kibana Dev Console
Perhaps the most convenient way to …
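In the Kibana Dev Console, `GET /` returns basic cluster information, and its `version.number` field holds the Elasticsearch version. The same JSON comes back from a command-line request such as `curl localhost:9200`, assuming an unsecured local node on the default port. A small sketch that extracts the version from that body; the sample below is trimmed and illustrative, not real output:

```python
import json


def elasticsearch_version(root_response: str) -> str:
    """Return version.number from the JSON body of `GET /` on an Elasticsearch node."""
    return json.loads(root_response)["version"]["number"]


# Trimmed, illustrative body; real responses contain more fields.
sample = (
    '{"name": "node-1", "cluster_name": "my-cluster", '
    '"version": {"number": "7.17.9"}, "tagline": "You Know, for Search"}'
)
print(elasticsearch_version(sample))  # 7.17.9
```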
How to Query OpenSearch With Boolean Queries
Updated January 2023
OpenSearch boolean queries find or match documents using boolean clauses. In this article we describe how to construct a boolean query, or bool query. We will also work through several example OpenSearch bool queries.
OpenSearch Boolean Clauses
The four boolean clauses used for bool queries are filter, must, must_not, and should. filter …
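A sketch that assembles a bool query body using all four clauses, leaving out any clause that is empty. The field names `status`, `title`, `tags`, and `body` are hypothetical; the printed JSON could be sent as the body of `GET <index>/_search` in Dashboards Dev Tools:

```python
import json


def bool_query(*, filter_clauses=(), must=(), must_not=(), should=(), minimum_should_match=None):
    """Assemble an OpenSearch bool query body, omitting empty clauses."""
    bool_body = {}
    for name, clauses in (
        ("filter", filter_clauses),  # must match; no scoring, eligible for caching
        ("must", must),              # must match; contributes to the score
        ("must_not", must_not),      # must not match; no scoring
        ("should", should),          # optional when must/filter exist; matches raise the score
    ):
        if clauses:
            bool_body[name] = list(clauses)
    if minimum_should_match is not None:
        bool_body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": bool_body}}


query = bool_query(
    filter_clauses=[{"term": {"status": "published"}}],
    must=[{"match": {"title": "opensearch"}}],
    must_not=[{"term": {"tags": "draft"}}],
    should=[{"match": {"body": "cluster"}}],
)
print(json.dumps(query, indent=2))
```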
How to Check Kafka Version
Published October 2022
Here are two quick steps to check which version of Apache Kafka is running.
Step 1: Change directories to the Kafka home directory.
Step 2: Use command-line utilities to enter the following command:
It will return the version running. Here the Kafka version running is 3.3.1.
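For scripted checks, one option is the `--version` flag that Kafka’s command-line tools such as `bin/kafka-topics.sh` accept, which prints the version followed by a commit id. The sketch below assumes that output format, assumes it is written to stdout, and uses a hypothetical install path; only the parsing step is exercised without a Kafka installation:

```python
import subprocess
from pathlib import Path


def parse_kafka_version(output: str) -> str:
    """Take the leading version token from output such as '3.3.1 (Commit:...)'."""
    tokens = output.split()
    if not tokens:
        raise ValueError("empty version output")
    return tokens[0]


def kafka_version(kafka_home: str) -> str:
    """Run <kafka_home>/bin/kafka-topics.sh --version and parse the result."""
    script = Path(kafka_home) / "bin" / "kafka-topics.sh"
    result = subprocess.run([str(script), "--version"], capture_output=True, text=True, check=True)
    return parse_kafka_version(result.stdout)


print(parse_kafka_version("3.3.1 (Commit:0123456789abcdef)"))  # 3.3.1
# print(kafka_version("/opt/kafka"))  # hypothetical install path
```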
What are Apache Pulsar Functions?
Published October 2022
Apache Pulsar Functions handle simple processing tasks. For complicated processing, Pulsar Functions do not remove the need for separate technologies such as Apache Heron, Apache Storm, or Apache Flink. However, processing logic is often simple enough to be handled natively with Pulsar Functions. Handling the computations natively within …
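Pulsar also accepts Python functions written without its SDK: a plain function that receives each input message and returns the value to publish, which the Pulsar docs call the native style. A minimal sketch in that style, modeled on the docs’ classic exclamation example; the function body is illustrative:

```python
def process(message):
    # The Pulsar runtime calls this once per message on the input topic; the
    # return value is published to the configured output topic.
    return "{}!".format(message)


if __name__ == "__main__":
    # Local smoke test only; inside Pulsar the runtime invokes process().
    print(process("hello"))  # hello!
```

Keeping the logic a pure function like this also makes it easy to unit-test before deploying it to a cluster.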
Kafka Consumer Basics
Updated February 2023
In this article we provide succinct answers to common Apache Kafka consumer questions. You will learn the basics of how Kafka consumer groups work, how many consumers you can have per topic, and other important Kafka consumer facts.
Answers to commonly asked questions about Kafka consumers
Can a Kafka consumer read from multiple …
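The question of how many consumers a topic can usefully have comes down to one rule: within a group, each partition is assigned to at most one consumer, so consumers beyond the partition count sit idle. A sketch for a single topic, modeled on Kafka’s range assignment strategy (consumers sorted by id, contiguous ranges, the first `n % c` consumers take one extra partition); the consumer ids are hypothetical:

```python
def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Range-style assignment of one topic's partitions to a group's consumers."""
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(3, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}   <- c4 has nothing to read
```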
OpenSearch Shard Optimization
Published September 2022
Optimizing shard size is an important part of getting maximum performance from your OpenSearch cluster. OpenSearch shards enable parallel data processing both within a single node and across multiple OpenSearch nodes. OpenSearch automatically manages the allocation of shards across nodes. However, choosing the number of shards needed is up to …
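Choosing a shard count often starts from a target shard size. A back-of-the-envelope sketch; the 30 GB default target and the growth factor are assumptions to tune for your own workload, not fixed rules:

```python
import math


def primary_shards(index_size_gb: float, target_shard_gb: float = 30.0, growth_factor: float = 1.0) -> int:
    """Primary shards needed so each shard stays at or under target_shard_gb.
    growth_factor > 1 leaves headroom for expected index growth."""
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb * growth_factor / target_shard_gb))


def total_shards(primaries: int, replicas: int) -> int:
    """Primaries plus one full copy per replica."""
    return primaries * (1 + replicas)


p = primary_shards(200, growth_factor=1.25)  # 250 GB projected / 30 GB per shard
print(p, total_shards(p, replicas=1))        # 9 18
```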
OpenSearch Terms and Definitions
Published September 2022
In this post we round up the most searched-for OpenSearch terms and definitions.
OpenSearch Node
An OpenSearch node is a single OpenSearch process, and the minimum number of nodes for a highly available OpenSearch cluster is three.
OpenSearch Cluster
An OpenSearch cluster is one or more OpenSearch nodes with the same …
Apache Pulsar Support FAQ
Published September 2022
New clients often have similar questions about Apache Pulsar support services. Below are a few of the most common questions prospective clients ask when they reach out.
Part A: Technical Questions
We are encountering Pulsar scaling issues. Can you help?
We routinely scale and optimize Pulsar for …
Vector Search for OpenSearch
Published August 2022
OpenSearch includes a plugin for vector search. In this post, we introduce vector search and compare the different methods available. We will also point you in the right direction for example code. For personalized help, contact us to learn more about our OpenSearch support services.
What is vector search?
Here’s the …
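The OpenSearch k-NN plugin supports both exact search, which compares the query against every stored vector, and approximate search, which uses index structures such as HNSW to trade a little accuracy for speed. To show what exact search computes, here is a brute-force sketch using cosine similarity; the document ids and two-dimensional vectors are made up:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_knn(query: list[float], docs: dict[str, list[float]], k: int) -> list[str]:
    """Brute-force k-NN: score every stored vector, keep the k most similar."""
    scored = sorted(((cosine(query, vec), doc_id) for doc_id, vec in docs.items()), reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


docs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(exact_knn([1.0, 0.1], docs, k=2))  # ['a', 'b']
```

The cost grows linearly with the number of stored vectors, which is why approximate methods become attractive for large indices.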
Kafka vs Pulsar
Updated January 2023
Pulsar and Kafka achieve the same result: they both guarantee that messages reach their intended destination(s). Yet there are important differences between the two message queues, and these differences can make one of the technologies a better fit depending on your use case. In this post we cover 8 ways in which Apache Kafka …
Preparing for a Cloud Outage
Published August 2022
Nearly all of our clients and a majority of companies are using the cloud for at least a portion of their infrastructure. It’s important for companies to plan for cloud outages to minimize the damage they cause. In this post we will cover how to minimize damage and recover quickly after …
OpenSearch vs. Elasticsearch
Updated January 2023
Because OpenSearch originated as a fork of Elasticsearch, the two can appear near-identical to the unacquainted. However, they are distinct, and they diverge further with each new release. Here we will discuss how the two search engines compare when it comes to security, licensing, core features, documentation, community support, dashboards, …
Elasticsearch Support Services FAQ
Published July 2022
Our team of engineers has been architecting, optimizing, and managing Elasticsearch for over 6 years. We’ve found that new clients often have similar questions about Elasticsearch support services. Below are a few of the most common questions prospective clients ask when they reach out. Let us …
How to Choose a Managed Kafka Service Provider
Published July 2022
It can be difficult to choose a managed Kafka service provider because the options can appear so different and yet so similar. Here we break down the 8 biggest factors to consider when comparing providers.
8 Considerations for Choosing a Managed Apache Kafka Provider
#1 Preventative maintenance
Preventative maintenance guided …
Planning a Successful Migration to OpenSearch
Updated March 2023
There are plenty of posts and documentation on the nitty-gritty approaches for migrating to OpenSearch (e.g., rolling updates and snapshots). Start with the OpenSearch documentation for that. Here we cover something much larger and more important: the plans and support that need to be in place to facilitate a successful …
How to Save Money on Data Storage Costs
Published July 2022
With companies revisiting their budgets to brace for a possible recession, now is the time to review your data storage costs and find places to reduce those fees without sacrificing performance. In this article we consolidate our top tips for saving money on data storage costs. From the top we want to …
How to Prevent a Kafka Outage
Published June 2022
Apache Kafka is a highly reliable tool when configured correctly for your use case. It should be the piece of your data architecture that you can be sure will remain online. Here are eight important best practices to help shore up your Kafka implementation.
8 Tips to Prevent Kafka Downtime …
Data Engineering Study
Published June 27, 2022
Data engineering is the field dedicated to building data infrastructure to ingest, process, and store large amounts of data. This is a quickly growing field, with both the number of jobs in data engineering and the number of tools on the market steadily increasing. Despite the popularity of data engineering as …
What is a Virtual CIO?
Published June 2022
Virtual CIOs provide the leadership and expertise to build, grow, and maintain reliable data architecture. They are often hired by midsized companies that are looking for a trusted authority to drive data architecture and the supporting team. Virtual CIOs are also referred to as vCIOs, fractional CIOs, part-time CIOs, and CIOs for …
What is OpenSearch?
Updated May 2022
OpenSearch is open source search and analytics software. It’s a community-led project, with Amazon Web Services (AWS) leading development. It was first created in 2021 as a fork of Elasticsearch 7.10.2 and Kibana 7.10.2. The OpenSearch search engine is simply referred to as OpenSearch, and the dashboard tool is …
How to fix the MEMBER_ID Error in Kafka
Updated May 2022
If you found this post, it’s likely because you got the Kafka MEMBER_ID error. Let’s first cover why the error popped up and then go through two ways to resolve it.
Reason for the Kafka Member ID Error
When a new consumer joins a group it enters with the member.id set …
Kafka on Kubernetes
Updated January 2023
Companies are coming to us specifically for assistance with deploying and managing Apache Kafka on Kubernetes. With many teams already familiar with Kubernetes, spinning up Kafka servers on Kubernetes alongside their other applications can sometimes be the best choice. Kafka on Kubernetes presents some challenges, though. In this post we …
Elasticsearch Basics: What it is, Licensing, Languages, and Getting Help
Updated March 2023
Elasticsearch is a distributed search and analytics engine. It is built on top of Apache Lucene. Elasticsearch was first released in 2010 by the company now known as Elastic. It was originally completely open source, but license changes have limited its usage. More on that below. Elasticsearch is part of a group …
Apache Pulsar Tutorial & Online Course
Updated February 2023
Two of our team members had a vision to make Apache Pulsar accessible to everyone through a free online course. Our team members who are experienced in building highly available Apache Pulsar messaging platforms put together this video series, which leads learners through installing, configuring, and running Pulsar. Visit the Pulsar Course …
How to Query Elasticsearch With Boolean Queries
Updated January 2023
Boolean queries in Elasticsearch are a popular query type because of their versatility and ease of use. Boolean queries, or bool queries, find or match documents by using boolean clauses. In the vast majority of cases you will use the filter clause, because its results can be cached for faster search times. In …
Hosted Apache Pulsar: Why managed Pulsar in your environment is a better choice
Updated March 2023
One of the attractions of hosted Apache Pulsar is the peace of mind that a third party is responsible for ensuring uptime. However, that conclusion doesn’t consider what a company loses by using a third-party hosted service. Fully managed Pulsar services, hosted directly in your internal environment (cloud or on-prem), still …
What is Kafka Connect?
Updated February 2023
Kafka Connect is a free tool for efficiently moving data into and out of Apache Kafka. Kafka Connect simplifies streaming data while also improving scalability and reliability.
Features of Kafka Connect
Standardizes integrations with Kafka. Kafka Connect provides a shared framework for all Kafka connectors, which improves efficiency for connector development and …
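Connectors are typically created by POSTing a JSON config to the Kafka Connect REST API. A sketch that builds such a payload for the FileStream source connector that ships as an example with Apache Kafka; the connector name, file path, topic, and worker URL are assumptions (8083 is Connect’s default REST port):

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083/connectors"  # assumed worker address


def file_source_config(name: str, path: str, topic: str) -> dict:
    """Connector config that streams lines of a local file into a Kafka topic."""
    return {
        "name": name,
        "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": path,
            "topic": topic,
        },
    }


def create_connector(payload: dict, url: str = CONNECT_URL) -> dict:
    """POST the config to a running Connect worker and return its JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = file_source_config("local-file-source", "/tmp/input.txt", "connect-test")
print(json.dumps(payload, indent=2))
# create_connector(payload)  # requires a running Connect worker
```

Because every connector is configured through the same framework and REST interface, swapping in a different connector mostly means changing `connector.class` and its connector-specific keys.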
BookKeeper for Pulsar
Updated January 2023
As discussed in a previous article, “What is Apache Pulsar?”, Pulsar is a two-layer system with Pulsar brokers acting as the serving layer and Apache BookKeeper bookies providing the persistent storage layer. In this post we review BookKeeper’s role and important terminology, and give an introduction to configuring ledgers.
Apache BookKeeper Basics
BookKeeper …
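Ledger durability in Pulsar is governed by three BookKeeper settings: ensemble size (how many bookies a ledger’s entries are spread across), write quorum (how many bookies store each entry), and ack quorum (how many must acknowledge a write). They must satisfy ensemble size >= write quorum >= ack quorum. A small validator that also renders the matching broker-level defaults in `broker.conf`; the 3/3/2 values are an example, not a recommendation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerQuorum:
    ensemble_size: int  # E: bookies a ledger's entries are spread across
    write_quorum: int   # Qw: bookies each entry is written to
    ack_quorum: int     # Qa: acknowledgements needed before a write is confirmed

    def validate(self) -> None:
        if not (self.ensemble_size >= self.write_quorum >= self.ack_quorum >= 1):
            raise ValueError("need ensemble_size >= write_quorum >= ack_quorum >= 1")

    def as_broker_conf(self) -> str:
        # Broker-level defaults applied to newly created ledgers.
        self.validate()
        return "\n".join(
            [
                f"managedLedgerDefaultEnsembleSize={self.ensemble_size}",
                f"managedLedgerDefaultWriteQuorum={self.write_quorum}",
                f"managedLedgerDefaultAckQuorum={self.ack_quorum}",
            ]
        )


quorum = LedgerQuorum(ensemble_size=3, write_quorum=3, ack_quorum=2)
print(quorum.as_broker_conf())
```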
Subscription Types in Apache Pulsar
Updated July 2022
Apache Pulsar is a publish-subscribe distributed messaging system. When consumers subscribe to topics in Pulsar, there are four different types to choose from: Exclusive, Failover, Shared, and Key_Shared. In this article we will review the different subscription types and what factors to consider when choosing between them. If you are interested in …
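A toy model of how the four subscription types route messages among the consumers on one subscription. It is deliberately simplified: real Failover picks the active consumer by priority and name, Shared dispatch depends on consumer permits rather than strict round-robin, and Key_Shared maps key hashes onto hash ranges instead of the CRC32-modulo used here. The consumer names and messages are made up:

```python
import zlib
from itertools import cycle


def dispatch(sub_type: str, consumers: list[str], messages: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Toy model: which consumer on one subscription receives each (key, value)."""
    if not consumers:
        raise ValueError("a subscription needs at least one consumer")
    received: dict[str, list[str]] = {c: [] for c in consumers}
    if sub_type == "Exclusive":
        # Only one consumer may attach; a second one would be rejected.
        if len(consumers) > 1:
            raise ValueError("Exclusive subscriptions allow a single consumer")
        received[consumers[0]] = [value for _, value in messages]
    elif sub_type == "Failover":
        # Simplification: the first consumer is active, the rest are standbys.
        received[consumers[0]] = [value for _, value in messages]
    elif sub_type == "Shared":
        # Simplification: messages are spread round-robin across consumers.
        turn = cycle(consumers)
        for _, value in messages:
            received[next(turn)].append(value)
    elif sub_type == "Key_Shared":
        # Simplification: each key hashes to one consumer, so per-key order holds.
        for key, value in messages:
            owner = consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
            received[owner].append(value)
    else:
        raise ValueError(f"unknown subscription type: {sub_type}")
    return received


msgs = [("k1", "m1"), ("k2", "m2"), ("k1", "m3"), ("k3", "m4")]
for sub_type in ("Failover", "Shared", "Key_Shared"):
    print(sub_type, dispatch(sub_type, ["c1", "c2"], msgs))
```

The trade-off the article discusses shows up directly: Shared spreads load but can split one key’s messages across consumers, while Key_Shared keeps each key on a single consumer.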
What is Apache Pulsar?
Updated July 2022
Apache Pulsar is an open source, publish-subscribe messaging system. It’s unique because of its two-layer system, in which the serving and storage layers are separated. Pulsar runs with two supporting technologies, Apache BookKeeper and Apache ZooKeeper. Together, the three technologies provide a high-throughput, low-latency distributed messaging system.
Pulsar Broker – Serving …
How to Query Elasticsearch in Kibana
Updated July 2022
Kibana Query Syntax
When querying Elasticsearch in Kibana you can use either the traditional Lucene query syntax or the newer Kibana Query Language (KQL). If you are using Kibana 7.0 or later, Kibana Query Language is the default. In this article we provide the basics for both approaches and provide …
Solr vs Elasticsearch
Updated December 2022
Both Apache Solr and Elasticsearch are popular open source* search engines built on top of Lucene. This article is intended to help readers learn more about the technologies in relation to one another to guide technology decisions.
* Check out this article for information about recent Elasticsearch licensing changes. Elasticsearch is no …
How to Index Elasticsearch
Updated January 2023
An Index in Elasticsearch is used to both organize and distribute data within a cluster. In this post we will define both components of an Index and then outline how to create, add to, delete, and reindex Indices in Elasticsearch. We will also touch on querying, but Elasticsearch querying is covered in …
Kafka Uses Consumer Groups for Scaling Event Streaming
Updated July 2022
Apache Kafka is a distributed messaging system that implements pieces of the two traditional messaging models, Shared Message Queues and Publish-Subscribe. Both of those models have limitations for handling high-throughput use cases. Apache Kafka provides fault-tolerant, high-throughput stream processing that can handle even the most complicated …
Kafka Case Studies
Updated March 2023
Below are eight case studies showcasing how our Kafka experts have supported clients with Kafka challenges. These case studies cover a variety of fields and highlight the wide range of applications for Kafka across industries.
Information Security
CHALLENGE: Client needs to monitor all servers, devices, applications, and laptops in this …
Comparing Confluent Kafka and Apache Kafka
Updated July 2022
In this post we will compare Apache Kafka and the Confluent Kafka Platform, describing what they have in common and what sets them apart.
What are Confluent Kafka and Apache Kafka?
Apache Kafka is an open source message broker that provides high throughput, high availability, and low latency. Apache Kafka can be …