Published November 2023 Apache Pulsar servers may need occasional restarts to maintain optimal performance and keep current with software updates. Thoughtful planning is needed to perform these restarts correctly and avoid disruption. In this instructional guide, we will walk you through the best practices for restarting a Pulsar server. Reasons for Restarting a Pulsar Server … Continue reading Instructions for Restarting a Pulsar Server
Published November 2023 Apache Kafka servers may need occasional restarts to maintain optimal performance and stability. There are four common reasons to restart your Kafka server. Configuration changes: a restart may be necessary after adjusting topic or security settings. Software updates: bug fixes and new feature updates are essential for Kafka security and performance. A … Continue reading Instructions for Restarting a Kafka Server
Published November 2023 One intriguing feature of OpenSearch is its native support for multi-tenancy. But what exactly is multi-tenancy, and why is it so important? Let’s dive in. What is OpenSearch Multi-tenancy? Multi-tenancy is the ability of a single instance of software to serve many users (tenants). Each tenant operates in a dedicated, isolated environment. … Continue reading Understanding OpenSearch Multi-tenancy
An OpenSearch testing environment provides a safe space to validate changes and helps avoid costly disruptions to production.
Published October 2023 Not every company has a testing environment for Pulsar, but every company should. In this post, you’ll learn why a testing environment is crucial for Apache Pulsar, what it should encompass, when it should be employed, the errors it can help avert in production, and its inherent limitations. The Importance of a … Continue reading Why You Need a Testing Environment for Apache Pulsar
Published October 2023 Not every company has a testing environment for Kafka, but every company should. In this post, you’ll learn why a testing environment is indispensable for Apache Kafka, what it should include, when to use it, the problems it helps avoid, and its inherent limitations. Why a Kafka Testing Environment is Essential Implementing … Continue reading Why You Need a Testing Environment for Kafka
Published September 2023 Traditional batch processing systems, while effective for fixed datasets, are often ill-suited to real-time, high-throughput scenarios. Kafka Streams enables organizations to act on data as it arrives, making it particularly useful for applications that require immediate responses. Kafka Streams is a client library for building real-time applications and microservices. It’s a part … Continue reading What is Kafka Streams?
Published September 2023 The choice of search platform can significantly impact business operations. As experts in high-throughput implementations, we’ve witnessed the rise of various search platforms, each with its own strengths. Today, we’ll explore OpenSearch and Google Cloud Search in depth, drawing parallels and highlighting differences. Introducing OpenSearch & Cloud Search OpenSearch: A relatively new … Continue reading Comparing OpenSearch and Google Cloud Search
Published August 2023 We often get asked: “Can a Kafka consumer listen to multiple topics?” The short answer is: Yes, a Kafka consumer can listen to (or subscribe to) multiple topics. In this post we discuss examples where consumers would listen to multiple topics versus only one. And we address concerns about potential impacts on … Continue reading Kafka Consumers: The Power of Listening to Multiple Topics
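A consumer subscribed to several topics still receives a single stream of records. Each record carries the name of the topic it came from (for example, `ConsumerRecord.topic()` in the Java client), so the application usually routes records by topic. The sketch below simulates that routing in plain Python so it runs without a broker. `Record`, `dispatch`, and the topic names `orders`, `payments`, and `audit` are hypothetical stand-ins and are not part of any Kafka client library.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Record:
    """Hypothetical stand-in for a consumed record (real clients also expose topic and value)."""
    topic: str
    value: str


def dispatch(records: Iterable[Record],
             handlers: Dict[str, Callable[[Record], None]]) -> List[Record]:
    """Send each record to the handler registered for its topic.

    Records from topics with no handler are returned so the caller can decide
    what to do with them (log them, send them to a dead-letter topic, etc.).
    """
    unhandled: List[Record] = []
    for record in records:
        handler = handlers.get(record.topic)
        if handler is None:
            unhandled.append(record)
        else:
            handler(record)
    return unhandled


# Usage: in a real application, `batch` would come from polling one consumer
# subscribed to several topics. Here it is built by hand.
seen = defaultdict(list)
handlers = {
    "orders": lambda r: seen["orders"].append(r.value),
    "payments": lambda r: seen["payments"].append(r.value),
}
batch = [
    Record("orders", "o-1"),
    Record("payments", "p-1"),
    Record("audit", "a-1"),
    Record("orders", "o-2"),
]
leftover = dispatch(batch, handlers)
```

Returning unhandled records, rather than dropping them, makes it visible when the subscription picks up a topic the application did not plan for. This can happen with pattern-based subscriptions.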
Published August 2023 Optimizing Kafka broker performance will have a direct impact on your overall Kafka implementation. In this blog post, we cover what a Kafka broker does, why it’s essential, and how to optimize Kafka broker settings. We also share firsthand experiences optimizing Kafka brokers from our work with Fortune 500 companies. We encourage … Continue reading Optimizing Kafka Brokers: Lessons From Managing Fortune 500 Implementations
Published June 2023 You can check whether Elasticsearch is running using the curl tool, which you can download from the official curl website (curl.se). Send the request to port 9200, the default port for Elasticsearch, on localhost. You could alternatively use the local IP address of your machine, for example 192.168.1.1. If you get a response … Continue reading How to Check if Elasticsearch is Running
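For a scripted alternative to curl, here is a minimal Python sketch that uses only the standard library. It assumes the node serves plain HTTP without authentication, as in the curl check described above. A cluster with security enabled (the default for new Elasticsearch 8.x installations) would need HTTPS and credentials. The function name `is_elasticsearch_running` is hypothetical. Treating a `version` field in the root response as the success signal is also an assumption of this sketch.

```python
import json
import urllib.error
import urllib.request


def is_elasticsearch_running(url: str = "http://localhost:9200", timeout: float = 5.0) -> bool:
    """Return True if a node at `url` answers with a JSON body containing "version".

    Assumes plain HTTP with no authentication (hypothetical helper, not an
    official client API).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    # Assumption: the root endpoint of a running node reports a version object.
    return isinstance(body, dict) and "version" in body


if __name__ == "__main__":
    print("running" if is_elasticsearch_running() else "not reachable")
```

As with curl, you can pass your machine's local IP address instead of localhost, for example `is_elasticsearch_running("http://192.168.1.1:9200")`.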
Published May 2023 Small, medium, and large OpenSearch clusters require different approaches to optimization. Dattell’s engineers are experts in designing, optimizing, and maintaining OpenSearch. Find out more about our OpenSearch support services. Optimizing a Small OpenSearch Cluster The minimum number of nodes for a small, highly available OpenSearch cluster is three (3). Three nodes might … Continue reading OpenSearch Cluster Optimization for Small, Medium, and Large Clusters
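The usual reasoning behind a three-node minimum is majority quorum. Electing a cluster manager requires a strict majority of the cluster-manager-eligible nodes, so the cluster keeps working only while a majority of them is reachable. A short Python sketch of that arithmetic follows. The function names are illustrative, and the full post may weigh additional factors.

```python
def quorum_size(eligible_nodes: int) -> int:
    """Smallest strict majority of the cluster-manager-eligible nodes."""
    if eligible_nodes < 1:
        raise ValueError("a cluster needs at least one eligible node")
    return eligible_nodes // 2 + 1


def tolerable_failures(eligible_nodes: int) -> int:
    """Eligible nodes that can be lost while a majority is still reachable."""
    return eligible_nodes - quorum_size(eligible_nodes)


for n in range(1, 6):
    print(f"{n} node(s): quorum {quorum_size(n)}, tolerates {tolerable_failures(n)} failure(s)")
```

Two nodes tolerate no more failures than one, because both must be present to form a majority. Three is therefore the smallest count that survives the loss of a node.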
Published April 2023 Dattell specializes in support for open source software: OpenSearch, Kafka, and Pulsar. The benefits of using open source software are vast. Learn about the top 10 reasons companies choose open source software. #1 Lower Costs: Open source software is often available for free, allowing companies to save on software licensing fees. For … Continue reading Top 10 Reasons to Use Open Source Software
Published March 2023 Often companies find themselves with data stored in disparate data stores. The use of legacy systems and subdivisions within a company using their own systems are two common reasons for this scattered data storage. In this article we explain 10 reasons why it is important and beneficial for companies to unify data … Continue reading Top 10 Reasons to Unify Data Into a Single Data Store
Updated August 2023 Anyone responsible for their company’s data pipeline has five priorities in mind: reliability, security, speed, cost, and ownership. In this article we discuss how enterprise managed OpenSearch provides peace of mind, especially by giving you someone to call when a cluster fails in the middle of the night. And we … Continue reading Enterprise Managed OpenSearch
Updated August 2023 Choosing the right shard size is an important part of getting maximum performance from your OpenSearch cluster. OpenSearch shards enable parallel data processing both within a single node and across multiple nodes. OpenSearch automatically manages how shards are allocated across nodes. However, choosing the number of shards needed is up to … Continue reading OpenSearch Shard Optimization
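One common way to turn this choice into a number is to divide the expected index size by a target size per shard and round up. The sketch below assumes you choose that target yourself. The 30 GB figure in the example is an illustrative assumption, not a value taken from this post, and `primary_shard_count` is a hypothetical helper. The primary shard count is fixed when an index is created, and changing it later requires reindexing or the split/shrink APIs. That makes this estimate worth running before you create the index.

```python
import math


def primary_shard_count(index_size_gb: float, target_shard_size_gb: float) -> int:
    """Primary shards needed so that no shard exceeds the target size (minimum one).

    Hypothetical helper: the target size is an input you choose, not an
    OpenSearch default.
    """
    if target_shard_size_gb <= 0:
        raise ValueError("target shard size must be positive")
    if index_size_gb < 0:
        raise ValueError("index size cannot be negative")
    return max(1, math.ceil(index_size_gb / target_shard_size_gb))


# Illustrative example: a 100 GB index with an assumed 30 GB target per shard.
print(primary_shard_count(100, 30))
```

Rounding up keeps every shard at or below the target. The minimum of one covers very small indices.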
Updated August 2023 OpenSearch includes a plugin for vector search. In this post, we introduce vector search and compare the different methods available. We will also point you in the right direction for example code. For personalized help, contact us to learn more about our OpenSearch support services. What is vector search? Here’s the … Continue reading Vector Search for OpenSearch
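At its core, vector search ranks documents by how close their embedding vectors are to a query vector. The simplest method is exact (brute-force) nearest-neighbor search, which scores every document. Approximate methods such as HNSW trade a little accuracy for much faster queries on large collections. The Python sketch below illustrates exact k-NN with cosine similarity on toy two-dimensional vectors. It is a conceptual example, not the OpenSearch k-NN plugin's API, and the document IDs and vectors are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vectors have no direction")
    return dot / (norm_a * norm_b)


def exact_knn(query: Sequence[float],
              docs: Dict[str, Sequence[float]],
              k: int) -> List[Tuple[str, float]]:
    """Score every document against the query and return the top k (brute force)."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up embeddings for illustration only.
docs = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.7, 0.7],
    "doc-c": [0.0, 1.0],
}
top = exact_knn([1.0, 0.1], docs, k=2)
print(top)
```

Because exact search compares the query with every stored vector, its cost grows linearly with the number of documents. That cost is the motivation for approximate indexes on large datasets.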
Updated August 2023 Pulsar and Kafka serve the same core purpose: both guarantee that messages reach their intended destination(s). Yet there are important differences between the two messaging systems. These differences can make one technology a better fit, depending on your use case. In this post we cover 8 ways in which Apache Kafka … Continue reading Kafka vs Pulsar
Published August 2022 Nearly all of our clients and a majority of companies are using the cloud for at least a portion of their infrastructure. It’s important for companies to plan for cloud outages to minimize the damage caused by them. In this post we will cover how to minimize damage and recover quickly after … Continue reading Preparing for a Cloud Outage
Updated January 2023 Because OpenSearch originated as a fork of Elasticsearch, the two can appear near-identical to the unacquainted. However, they are distinct, and they diverge further with each new release. Here we will discuss how the two search engines compare when it comes to security, licensing, core features, documentation, community support, dashboards, … Continue reading OpenSearch vs. Elasticsearch
Published July 2022 Our team of engineers has been architecting, optimizing, and managing Elasticsearch for over 6 years. We’ve found that new clients tend to ask similar questions about Elasticsearch support services. Below are a few of the most common questions new clients have when they reach out. Let us … Continue reading Elasticsearch Support Services FAQ
Published July 2022 With companies revisiting their budgets to brace for a possible recession, now is the time to review your data storage costs and find places to reduce those fees without sacrificing performance. In this article we consolidate our top tips for saving money on data storage costs. From the top we want to … Continue reading How to Save Money on Data Storage Costs
Published June 27, 2022 Data engineering is the field dedicated to building data infrastructure to ingest, process, and store large amounts of data. This is a quickly growing field, with both the number of jobs in data engineering and the number of tools on the market steadily increasing. Despite the popularity of data engineering as … Continue reading Data Engineering Study
Published June 2022 Virtual CIOs provide the leadership and expertise to build, grow, and maintain reliable data architecture. They are often hired by midsized companies that are looking for a trusted authority to drive data architecture and the supporting team. Virtual CIOs are also referred to as vCIOs, fractional CIOs, part-time CIOs, and CIOs for … Continue reading What is a Virtual CIO?
Updated May 2022 OpenSearch is an open source search and analytics suite. It’s a community-driven project, with Amazon Web Services (AWS) leading development. It was created in 2021 as a fork of Elasticsearch 7.10.2 and Kibana 7.10.2. The OpenSearch search engine is simply referred to as OpenSearch, and the dashboard tool is … Continue reading What is OpenSearch?
Updated May 2022 If you found this post, it’s likely because you got the Kafka MEMBER_ID error. Let’s first cover why the error occurs and then go through two ways to resolve it. Reason for the Kafka Member ID Error When a new consumer joins a group, it enters with the member.id set … Continue reading How to fix the MEMBER_ID Error in Kafka
Updated March 2023 Elasticsearch is a distributed search and analytics engine built on top of Apache Lucene. Elasticsearch was first released in 2010 by Shay Banon, who later co-founded the company now known as Elastic. It was originally fully open source, but license changes in 2021 restricted how it can be used. More on that below. Elasticsearch is part of a group … Continue reading Elasticsearch Basics: What it is, Licensing, Languages, and Getting Help
We outline the four primary ways to back up data, along with their benefits and drawbacks, to help you decide which approach best meets your company’s needs.
When we are driving, we are routinely making data-driven decisions using the gauges on our dashboard to guide us. Data-driven decision making should be just as easy when it comes to business.
With this guide, you will be able to define the business and technical requirements for your data platform, making the implementation process efficient and successful.
When designing a custom data architecture, business analytics, or operational intelligence platform for a client, we find that four benefits of open source tools make them the better option in the vast majority of cases.
Implementing a data handling platform will improve the way you make data-driven decisions. This holds whether the platform is a centralized reporting system, a business analytics or operational intelligence platform, or a single source of truth for your company.