Published November 2023 Apache Pulsar servers may need occasional restarts to maintain optimal performance and keep current with software updates. Thoughtful planning is needed to perform these restarts correctly and avoid disruption. In this instructional guide, we will walk you through the best practices for restarting a Pulsar server. Reasons for Restarting a Pulsar Server … Continue reading Instructions for Restarting a Pulsar Server
Published November 2023 Apache Kafka servers may need occasional restarts to maintain optimal performance and stability. There are four common reasons to restart your Kafka server. Configuration Changes. A restart may be necessary after adjusting topic or security settings. Software Updates. Bug fixes and new feature updates are essential for Kafka security and performance. A … Continue reading Instructions for Restarting a Kafka Server
Published November 2023 One intriguing feature of OpenSearch is its native support for multi-tenancy. But what exactly is multi-tenancy, and why is it so important? Let’s dive in. What is OpenSearch Multi-tenancy? Multi-tenancy is the ability of a single instance of software to serve many users (tenants). Each tenant operates in a dedicated, isolated environment. … Continue reading Understanding OpenSearch Multi-tenancy
An OpenSearch testing environment provides a safe space to validate changes and avoids costly disruptions to production.
Published October 2023 Not every company has a testing environment for Pulsar, but every company should. In this post, you’ll learn why a testing environment is crucial for Apache Pulsar, what it should encompass, when it should be employed, the errors it can help avert in production, and its inherent limitations. The Importance of a … Continue reading Why You Need a Testing Environment for Apache Pulsar
Published October 2023 Not every company has a testing environment for Kafka, but every company should. In this post, you’ll learn why a testing environment is indispensable for Apache Kafka, what it should include, when to use it, the problems it helps avoid, and its inherent limitations. Why a Kafka Testing Environment is Essential Implementing … Continue reading Why You Need a Testing Environment for Kafka
Published October 2023 One of the emerging trends we’ve observed in the data architecture space is companies’ growing interest in moving from externally hosted Kafka to running Kafka in their own environments. In this post, we’ll detail why companies are migrating Kafka to their internal environments, the importance of zero downtime migration, and how … Continue reading Zero Downtime Migration to Kafka
Published October 2023 One of the emerging trends we’ve observed in the data architecture space is the growing interest in Apache Pulsar. In this post, we’ll cover why companies are migrating to Pulsar, the importance of zero downtime migration, and how we ensure a seamless transition to Pulsar. Why Migrate to Apache Pulsar? Apache Pulsar … Continue reading Zero Downtime Migration to Apache Pulsar
Published September 2023 Traditional batch processing systems, while effective for fixed datasets, are often ill-suited for real-time, high-throughput scenarios. Kafka Streams enables organizations to act on data as it arrives, making it particularly useful for applications that require immediate responses. Kafka Streams is a client library for building real-time applications and microservices. It’s a part … Continue reading What is Kafka Streams?
Published September 2023 Apache Pulsar is a multi-tenant, high-performance data streaming and messaging technology. Its capabilities make it a go-to choice for businesses looking to make real-time, data-driven decisions. However, the way you choose to implement Apache Pulsar can significantly impact your operations. The two primary options are Pulsar as a Service and Managed Pulsar … Continue reading Comparing Pulsar as a Service and Managed Pulsar in Your Environment
Published September 2023 Depending on your data infrastructure needs, one data streaming tool, such as Apache Pulsar or Apache Kafka, may be better suited than another option. In this post, we outline 10 use cases for Apache Pulsar. We provide detailed examples and discuss when Pulsar is the superior choice over other solutions. If you’re … Continue reading Apache Pulsar Use Cases
Published September 2023 The choice of search platforms can significantly impact business operations. As experts in high-throughput implementations, we’ve witnessed the rise of various search platforms, each with its unique offerings. Today, we’ll explore OpenSearch and Google Cloud Search in depth, drawing parallels and highlighting differences. Introducing OpenSearch & Cloud Search OpenSearch: A relatively new … Continue reading Comparing OpenSearch and Google Cloud Search
Published August 2023 We often get asked: “Can a Kafka consumer listen to multiple topics?” The short answer is: Yes, a Kafka consumer can listen to (or subscribe to) multiple topics. In this post we discuss examples where consumers would listen to multiple topics versus only one. And we address concerns about potential impacts on … Continue reading Kafka Consumers: The Power of Listening to Multiple Topics
Published August 2023 Apache Kafka is the backbone of many modern data architectures due to its ability to handle real-time data streams efficiently. One of the most commonly asked questions in the Kafka community is, “How many topics can Kafka support?” In this blog post, we will dive deep into this question of topic capacity … Continue reading Scaling Kafka: Unraveling the Mystery of Topic Capacity
Published August 2023 The Apache Pulsar schema registry ensures a smooth transfer of information between producers and consumers. A schema registry is important for a distributed messaging platform because a producer sometimes writes data in a format that a consumer cannot read. For instance, maybe a producer sends data in CSV format … Continue reading Pulsar Schema Registry
Published August 2023 Yes, Elasticsearch can be loosely defined as a database. But more accurately, it’s a distributed search engine. Elasticsearch can be used for text-based data, numerical data, geospatial data, vector data, and aggregating data. Elasticsearch is a NoSQL database, meaning SQL queries are supported but not full-featured. We have two articles on how … Continue reading Is Elasticsearch a Database?
Published August 2023 Optimizing Kafka broker performance will have a direct impact on your overall Kafka implementation. In this blog post, we cover what a Kafka broker does, why it’s essential, and how to optimize Kafka broker settings. We also share firsthand experiences optimizing Kafka brokers from our work with Fortune 500 companies. We encourage … Continue reading Optimizing Kafka Brokers: Lessons From Managing Fortune 500 Implementations
Published July 2023 Both RabbitMQ and Apache Pulsar are popular options for distributed systems and messaging architectures. The two open source technologies can handle high-throughput message traffic and ensure reliable communication between various apps and other components of complex data systems. In this post we will compare the features, strengths, and differences between RabbitMQ and … Continue reading RabbitMQ vs Pulsar
Published July 2023 Yes, OpenSearch can be loosely defined as a database. But more accurately, it’s a distributed search engine. OpenSearch can be used for text-based data, numerical data, geospatial data, vector data, and aggregating data. OpenSearch is a NoSQL database, meaning SQL queries are supported but not full-featured. We have several educational articles for … Continue reading Is OpenSearch a Database?
Updated September 2023 Apache Kafka is available under the Apache License v2.0 and is completely free. The license is extremely permissive, allowing users to include Kafka in consumer products and modify the code. Confluent Kafka is available either for free under Confluent’s community license (not open source) or as a paid version. The paid version of … Continue reading Is Kafka free?
Published June 2023 You can test whether Elasticsearch is running by using the curl tool, which is freely available to download. You’ll notice we used port 9200. That’s the default port for Elasticsearch. We also used localhost. You could alternatively use the local IP address of your machine, for example 192.168.1.1. If you get a response … Continue reading How to Check if Elasticsearch is Running
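The same check can also be scripted. The sketch below uses only Python's standard library and assumes an unsecured node on the default `http://localhost:9200`; a cluster with security enabled would answer with an authentication error, which this helper simply reports as "not running".

```python
import json
import urllib.request


def is_elasticsearch_running(url: str = "http://localhost:9200", timeout: float = 2.0) -> bool:
    """Return True if the node at `url` answers its root endpoint with cluster details."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
            # A running node replies to GET / with JSON that includes a "version" object.
            return response.status == 200 and "version" in body
    except (OSError, ValueError):
        # Connection refused, timeouts, and HTTP errors (URLError/HTTPError subclass
        # OSError), or a body that is not JSON, all count as "not running".
        return False


if __name__ == "__main__":
    print(is_elasticsearch_running())
```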
Published May 2023 Small, medium, and large OpenSearch clusters require different approaches for optimization. Dattell’s engineers are experts in designing, optimizing, and maintaining OpenSearch. Find out more about our OpenSearch support services. Optimizing a Small OpenSearch Cluster The minimum number of nodes for a small, highly available OpenSearch cluster is three (3). Three nodes might … Continue reading OpenSearch Cluster Optimization for Small, Medium, and Large Clusters
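Why three? Assuming the usual rule that electing an OpenSearch cluster manager requires a strict majority of cluster-manager-eligible nodes, a short Python sketch shows how many node failures each cluster size can survive:

```python
def cluster_manager_quorum(eligible_nodes: int) -> int:
    """Votes needed to elect a cluster manager: a strict majority."""
    return eligible_nodes // 2 + 1


def tolerated_failures(eligible_nodes: int) -> int:
    """Eligible nodes that can be lost while a majority still remains."""
    return eligible_nodes - cluster_manager_quorum(eligible_nodes)


for n in (1, 2, 3, 4, 5):
    print(f"{n} nodes: quorum {cluster_manager_quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

Two nodes survive zero failures, the same as one, so three is the smallest size that keeps a majority after losing a node. Four nodes still survive only one failure, which is why odd node counts are a common choice.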
Published May 2023 Apache Kafka partitions are integral to Kafka’s ability to scale. The act of partitioning divides up a single topic into multiple partitions. Each of the partitions can then exist on a separate node within the Kafka cluster. The work of storing, writing, and processing messages is then distributed across multiple nodes in … Continue reading What is a Kafka partition?
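To make the idea of spreading one topic across partitions concrete, here is a toy Python sketch of key-based partitioning. It is an illustration only: Kafka's default partitioner hashes keys with murmur2, whereas this sketch uses CRC32 from the standard library. The property it demonstrates is the same, though: identical keys always land on the same partition.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Illustration only: Kafka's default partitioner uses murmur2, not CRC32.
    """
    return zlib.crc32(key) % num_partitions


# One hypothetical topic split into 3 partitions, each modeled as a list.
topic_partitions = {p: [] for p in range(3)}
for key, value in [(b"user-1", "login"), (b"user-2", "click"), (b"user-1", "logout")]:
    topic_partitions[choose_partition(key, 3)].append((key, value))

print(topic_partitions)
```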
Published May 2023 Apache Pulsar offers geo-replication as out-of-the-box functionality. This sets Pulsar apart from some other message queues that require external tools for geo-replication. Pulsar Geo-Replication Geo-replication is the replication of messages across multiple clusters of a Pulsar instance. For clarification, by a Pulsar instance we mean multiple processes of Pulsar brokers, … Continue reading How does Geo-Replication Work in Apache Pulsar?
Published April 2023 Dattell specializes in support for open source software: OpenSearch, Kafka, and Pulsar. The benefits of using open source software are vast. Learn about the top 10 reasons companies choose open source software. #1 Lower Costs Open source software is often available for free, allowing companies to save on software licensing fees. For … Continue reading Top 10 Reasons to Use Open Source Software
Published March 2023 Often companies find themselves with data stored in disparate data stores. The use of legacy systems and subdivisions within a company using their own systems are two common reasons for this scattered data storage. In this article we explain 10 reasons why it is important and beneficial for companies to unify data … Continue reading Top 10 Reasons to Unify Data Into a Single Data Store
Updated August 2023 In this post we review the top 10 reasons why companies choose OpenSearch. In brief, companies choose OpenSearch for its cost-effectiveness, scalability, customizability, community support, comprehensive security plug-ins, real-time analytics, ease of integration, fast performance, foundation in open standards, and future-proof design. For those new to OpenSearch, it … Continue reading Top 10 Reasons Companies Use OpenSearch
Published March 2023 Yes, Elasticsearch is a NoSQL database. This means that SQL queries are supported but not full-featured. SQL databases, or relational databases, can be too rigid for some use cases. As an alternative, NoSQL databases provide more scalability and flexibility. NoSQL stands for “Not only SQL”. Horizontal Scaling with NoSQL NoSQL databases are … Continue reading Is Elasticsearch a NoSQL database?
Published March 2023 Ansible is an open source automation tool that can automate OpenSearch implementations, maintenance, and workflows. Below we discuss the benefits of using Ansible to improve OpenSearch setup, documentation, and recovery. Ansible for Initial Setup and Configuration of OpenSearch Ansible can be used for the initial setup and configuration of an OpenSearch cluster. … Continue reading Automating OpenSearch with Ansible
Published August 2023 Yes, Apache Kafka does guarantee message order. In this article we will walk you through how Kafka guarantees order using the detailed figure below. Producers’ Role in Kafka Ordering Data is sent to Kafka from producers. Producers are often applications that generate messages. And each message sent to Kafka will correlate with … Continue reading Does Kafka Guarantee Message Order?
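Kafka's ordering guarantee applies within a single partition: each partition is an append-only log, and consumers read it in offset order. Producers that give related messages the same key send them to the same partition, so those messages keep their relative order. The toy Python model below (a hypothetical class, not a Kafka API) shows that reading a partition from offset 0 upward returns one key's events in the order they were written.

```python
class PartitionLog:
    """Toy append-only partition: every record gets the next offset."""

    def __init__(self) -> None:
        self.records: list[tuple[int, str, str]] = []

    def append(self, key: str, value: str) -> int:
        offset = len(self.records)
        self.records.append((offset, key, value))
        return offset


def events_for_key(log: PartitionLog, key: str) -> list[str]:
    """Read from offset 0 upward, keeping only one key's values."""
    return [value for _, k, value in log.records if k == key]


log = PartitionLog()
for key, value in [("order-42", "created"), ("order-7", "created"),
                   ("order-42", "paid"), ("order-42", "shipped")]:
    log.append(key, value)

print(events_for_key(log, "order-42"))  # ['created', 'paid', 'shipped']
```

Across different partitions there is no such guarantee, which is why the choice of message key matters when order is important.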
Published February 2023 The Apache Pulsar Learning Hub offers free educational resources. It includes fundamentals of Pulsar, installation tutorial videos, and sample code. Go to the Pulsar Learning Hub The Pulsar Learning Hub is composed of three parts. Part I is an introduction to the structure of Pulsar. This series of videos will give you … Continue reading Free Apache Pulsar Learning and Installation Resources
Published August 2023 An OpenSearch index is used to both organize and distribute data within a cluster. In this post we define both components of an index and outline how to create, add to, delete, and reindex indices in OpenSearch. We will also touch on querying, but querying OpenSearch is covered in more depth in … Continue reading How to Index OpenSearch
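As a rough illustration of index creation, the Python sketch below builds the JSON body that a `PUT /<index-name>` request to OpenSearch carries. The index name and fields are hypothetical, and the shard and replica counts are placeholders rather than recommendations.

```python
import json

# Hypothetical index name and fields, chosen only for illustration.
index_name = "product-reviews"
create_index_body = {
    "settings": {
        "index": {
            "number_of_shards": 3,    # primary shards; cannot be changed on an existing index
            "number_of_replicas": 1,  # one copy of each primary, placed on a different node
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "rating": {"type": "integer"},
            "created_at": {"type": "date"},
        }
    },
}

# Print the request line and body, e.g. to paste into the Dashboards Dev Tools console.
print(f"PUT /{index_name}")
print(json.dumps(create_index_body, indent=2))
```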
Updated August 2023 Apache Kafka is not a traditional message queue. Kafka is a free-to-use, distributed messaging system that includes components of both a message queue and a publish-subscribe model. Kafka addresses the shortcomings of each of those traditional approaches, allowing it to provide fault-tolerant, high-throughput stream processing. Traditional shared … Continue reading Is Kafka a message queue?
Updated August 2023 Anyone in charge of their company’s data pipeline has five priorities in mind: reliability, security, speed, cost, and ownership. In this article we discuss how enterprise managed OpenSearch provides peace of mind, especially by giving you someone to call when a cluster fails in the middle of the night. And we … Continue reading Enterprise Managed OpenSearch
Published August 2023 Apache Pulsar has two layers, a serving layer and a storage layer. The Pulsar brokers carry out the data serving. The BookKeeper bookies provide the storage. In this article we will describe how Pulsar’s two-layer system works and review its benefits and drawbacks. Serving Layer: Role of Pulsar Brokers Data is sent … Continue reading What is Apache Pulsar’s Two-Layer System?
Published November 2022 Here we provide three simple ways to check which version of Elasticsearch is installed and running on your machine. The first uses the Kibana Dev Console, and the other two use the command line. Let’s dive in. Check Elasticsearch Version Running Using Kibana Dev Console Perhaps the most convenient way to … Continue reading How to Check Elasticsearch Version
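Whichever route you take, sending `GET /` to a node (from the Kibana Dev Console, or with curl against port 9200) returns JSON whose `version.number` field holds the running version. The Python sketch below pulls out that field; the sample response is abbreviated and its values are made up for illustration.

```python
import json

# Abbreviated, made-up example of the body returned by GET / on an Elasticsearch node.
sample_response = """
{
  "name": "node-1",
  "cluster_name": "my-cluster",
  "version": {"number": "8.11.1"},
  "tagline": "You Know, for Search"
}
"""


def elasticsearch_version(root_response_json: str) -> str:
    """Extract version.number from the root-endpoint response body."""
    return json.loads(root_response_json)["version"]["number"]


print(elasticsearch_version(sample_response))  # 8.11.1
```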
Updated August 2023 OpenSearch boolean queries find or match documents using boolean clauses. In this article we describe how to construct a boolean query, or bool query. We will also work through several example OpenSearch bool queries. OpenSearch Boolean Clauses The four boolean clauses used for bool queries are filter, must, must_not, and should. filter … Continue reading How to Query OpenSearch With Boolean Queries
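Here is a minimal sketch of a bool query that combines all four clauses, built as a Python dictionary and printed as the JSON you would send in the body of a `_search` request. The field names (`title`, `rating`, `status`) are hypothetical.

```python
import json

# Hypothetical field names; adapt them to your own index mapping.
bool_query = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "kafka"}}],          # must match; contributes to score
            "filter": [{"range": {"rating": {"gte": 4}}}],     # must match; no scoring, cacheable
            "should": [{"match": {"title": "streams"}}],      # optional here; boosts score on match
            "must_not": [{"term": {"status": "archived"}}],   # matching documents are excluded
        }
    }
}

print(json.dumps(bool_query, indent=2))
```

Because this query also has `must` and `filter` clauses, the `should` clause is optional by default and only raises the relevance score of documents that match it.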
Published October 2022 Here are two quick steps to check which version of Apache Kafka is running. Step 1: Change directories to the Kafka home directory. Step 2: Use command-line utilities to enter the following command: It will return the version running. Here the Kafka version running is 3.3.1.
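If you want to script this check, here is a sketch in Python. It assumes a standard Kafka distribution whose `bin/` command-line tools accept a `--version` flag (available in recent Kafka releases) and print a version string such as `3.3.1` at the start of their output. The helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Matches a MAJOR.MINOR.PATCH version string.
VERSION_PATTERN = re.compile(r"(\d+\.\d+\.\d+)")


def parse_kafka_version(cli_output: str) -> Optional[str]:
    """Return the MAJOR.MINOR.PATCH string at the start of the output, if any."""
    match = VERSION_PATTERN.match(cli_output.strip())
    return match.group(1) if match else None


def kafka_version(kafka_home: str) -> Optional[str]:
    """Run a Kafka CLI tool with --version from the Kafka home directory.

    Assumption: a standard Kafka layout where bin/kafka-topics.sh accepts --version.
    """
    result = subprocess.run(
        [f"{kafka_home}/bin/kafka-topics.sh", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_kafka_version(result.stdout)
```

For example, `print(kafka_version("/opt/kafka"))` would print the version for an installation in the hypothetical directory `/opt/kafka`.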
Published October 2022 Apache Pulsar functions handle simple processing tasks. Pulsar functions do not completely remove the need for separate technologies, such as Apache Heron, Apache Storm, or Apache Flink, for complicated processing. However, processing logic or computation is often simple enough to be handled natively with Pulsar functions. Handling the computations natively within … Continue reading What are Apache Pulsar Functions
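To show how small such logic can be: Pulsar supports "language-native" Python functions, which are plain functions with no Pulsar imports. Pulsar calls the function once per incoming message and publishes its return value to the function's output topic. In the minimal sketch below, appending an exclamation mark is only placeholder logic.

```python
def process(message):
    """Language-native Pulsar function: receives one message value and
    returns the value that Pulsar publishes to the output topic."""
    return "{}!".format(message)
```

Saved to a file, a function like this is typically deployed with `pulsar-admin functions create`, which also specifies its input and output topics.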
Updated August 2023 In this article we provide succinct answers to common Apache Kafka consumer questions. You will learn the basics of how Kafka consumer groups work, how many consumers you can have per topic, and other important Kafka consumer facts. Answers to commonly asked questions about Kafka consumers Can a Kafka consumer read from multiple … Continue reading Kafka Consumer Basics
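A key consumer-group fact is that each partition is assigned to exactly one consumer in the group, so consumers beyond the partition count sit idle. The Python sketch below is a simplified, single-topic version of range-style assignment, one of the strategies Kafka clients support. It illustrates the idea and does not reproduce the client library.

```python
def range_assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Simplified range-style assignment of one topic's partitions to a group.

    Each partition goes to exactly one consumer; earlier consumers (in sorted
    order) get one extra partition when the split is uneven.
    """
    members = sorted(consumers)
    per_consumer, extra = divmod(partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(3, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

With three partitions and four consumers, `c4` receives nothing. This is why the partition count caps useful consumer parallelism within a single group.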
Updated August 2023 Optimizing shard size is an important part of achieving maximum performance from your OpenSearch cluster. OpenSearch shards enable parallel data processing both within a single node and across multiple OpenSearch nodes. OpenSearch automatically manages the allocation of shards across nodes. However, choosing the number of shards needed is up to … Continue reading OpenSearch Shard Optimization
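One common way to choose a starting shard count is to divide the expected index size by a target shard size. The Python sketch below illustrates that arithmetic. The 30 GB default is an assumption chosen for illustration, since commonly cited guidance ranges from roughly 10 to 50 GB per shard depending on the workload. Treat the result as a starting point to validate against your own cluster.

```python
import math


def suggested_primary_shards(index_size_gb: float, target_shard_size_gb: float = 30.0) -> int:
    """Rough starting point: enough primary shards to keep each near the target size.

    The 30 GB default target is an illustrative assumption, not a fixed rule.
    """
    if index_size_gb <= 0:
        return 1
    return max(1, math.ceil(index_size_gb / target_shard_size_gb))


print(suggested_primary_shards(200))  # ceil(200 / 30) = 7
```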
Published August 2023 In this post we round up the most-searched-for OpenSearch terms and definitions. OpenSearch Node An OpenSearch node is a single OpenSearch process, and the minimum number of nodes for a highly available OpenSearch cluster is three. OpenSearch Cluster An OpenSearch cluster is one or more OpenSearch nodes with the same … Continue reading OpenSearch Terms and Definitions
Updated August 2023 Read about common questions that new clients have when searching for Apache Pulsar support services. Part A: Technical Questions We are encountering Pulsar scaling issues. Can you help? We routinely scale and optimize Pulsar for companies receiving large amounts of data, along with data bursts. We also set up … Continue reading Apache Pulsar Support FAQ
Updated August 2023 OpenSearch includes a plugin for vector search. In this post, we introduce vector search and compare the different methods available. We will also point you in the right direction for example code. For personalized help, contact us to learn more about our OpenSearch support services. What is vector search? Here’s the … Continue reading Vector Search for OpenSearch
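To make the core idea concrete, the plain-Python sketch below performs exact (brute-force) k-nearest-neighbor search with cosine similarity. It scores every document vector against the query and keeps the top k. This illustrates the concept and does not use the OpenSearch plugin's API. The plugin offers exact search as well as approximate methods such as HNSW graphs, which trade some accuracy for much faster queries on large datasets. The document IDs and vectors are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query: list[float], docs: dict[str, list[float]], k: int) -> list[str]:
    """Score every document against the query and return the k most similar IDs."""
    ranked = sorted(docs, key=lambda doc_id: cosine_similarity(query, docs[doc_id]), reverse=True)
    return ranked[:k]


docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
print(exact_knn([1.0, 0.0, 0.0], docs, k=2))  # ['doc-a', 'doc-b']
```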
Updated August 2023 Pulsar and Kafka achieve the same result. They both guarantee messages reach their intended destination(s). Yet, there are important differences between the two message queues. These differences can make one of the technologies a better fit, depending on your use case. In this post we cover 8 ways in which Apache Kafka … Continue reading Kafka vs Pulsar
Published August 2022 Nearly all of our clients and a majority of companies are using the cloud for at least a portion of their infrastructure. It’s important for companies to plan for cloud outages to minimize the damage caused by them. In this post we will cover how to minimize damage and recover quickly after … Continue reading Preparing for a Cloud Outage
Updated January 2023 With OpenSearch originating as a fork from Elasticsearch, the two databases can appear to be near-identical to the unacquainted. However, they are unique, becoming more so with each new update. Here we will discuss how the two search engines compare when it comes to security, licensing, core features, documentation, community support, dashboards, … Continue reading OpenSearch vs. Elasticsearch
Published July 2022 Our team of engineers has been architecting, optimizing, and managing Elasticsearch for over 6 years. We’ve found that there are common questions that new clients have about Elasticsearch support services. Below are a few of the most common questions new clients ask when they reach out. Let us … Continue reading Elasticsearch Support Services FAQ
Published July 2022 It can be difficult to choose a managed Kafka service provider because they can all somehow appear so different and yet also so similar. Here we break down the 8 biggest factors to consider when comparing providers. 8 Considerations for Choosing a Managed Apache Kafka Provider #1 Preventative maintenance Preventative maintenance guided … Continue reading How to Choose a Managed Kafka Service Provider
Updated March 2023 There are plenty of posts and documentation on the nitty-gritty approaches for migrating to OpenSearch (e.g., rolling updates, snapshots). Start with the OpenSearch documentation for that. Here we are covering something much larger and more important: The plans and support that need to be in place to facilitate a successful … Continue reading Planning a Successful Migration to OpenSearch