Solr vs Elasticsearch


Updated October 2020 Both Solr and Elasticsearch are popular open source search engines built on top of Lucene.  This article is intended to help readers learn more about the technologies in relation to one another to guide technology decisions. Quick Reference Comparison of Elasticsearch vs Solr As far as speed and performance go, Elasticsearch and … Continue reading Solr vs Elasticsearch

How to Index Elasticsearch


Updated October 2020 An Index in Elasticsearch is used to both organize and distribute data within a cluster.  In this post we will define both components of an Index and then outline how to create, add to, delete, and reindex Indices in Elasticsearch.  We will also touch on querying, but querying will be covered in … Continue reading How to Index Elasticsearch

Kafka Uses Consumer Groups for Scaling Event Streaming


Updated September 2020 Apache Kafka is a distributed messaging system that implements pieces of the two traditional messaging models, Shared Message Queues and Publish-Subscribe.  Both Shared Message Queues and Publish-Subscribe models present limitations for handling high throughput use cases.   Apache Kafka provides fault tolerant, high throughput stream processing that can handle even the most complicated … Continue reading Kafka Uses Consumer Groups for Scaling Event Streaming

Kafka Case Studies


Updated August 2020 Apache Kafka’s high throughput and high availability make its applications vast.  In this post we dive into eight Kafka case studies.  These accounts are taken from work our Kafka solutions architects have done in the field with our clients. Medical Manufacturing Client automating the drug manufacturing process with multiple machines needs Kafka … Continue reading Kafka Case Studies

Elasticsearch Definitions


Updated June 2020 Taking a break from Elasticsearch optimization posts to get back to the basics to define fundamental Elasticsearch concepts. Elasticsearch Definitions:  A Primer for Elasticsearch Fundamentals Elasticsearch Node.  An Elasticsearch node is a single Elasticsearch process, and the minimum number of nodes for a highly available Elasticsearch cluster is three. Continue reading about … Continue reading Elasticsearch Definitions

Kafka Definitions


Updated September 2020 Taking a break from Kafka optimization posts to get back to the basics of Apache Kafka and define fundamental Kafka concepts. Kafka Definitions:  A Primer for Apache Kafka Fundamentals Kafka Producer.  A Kafka producer is a standalone application, or addition to your application, that sends data to Kafka broker(s). Kafka Broker.  A … Continue reading Kafka Definitions

Kafka Consumer Optimization


Updated May 2020 Kafka Consumer’s Role. The role of the Kafka consumer is to read data from Kafka.  Kafka consumer optimization can help avoid errors and increase performance of your application.   While the focus of this blog post is on the consumer, we will also review several broker configurations which affect the performance of consumers. Top … Continue reading Kafka Consumer Optimization

What is a Kafka Topic?


Updated August 2019 Kafka organizes message feeds into categories called topics. Each topic has a name that is unique across the entire Kafka cluster. Messages are sent to and read from specific topics.  In other words, producers write data to topics, and consumers read data from topics.  Kafka topics are multi-subscriber.  This means that a … Continue reading What is a Kafka Topic?

Open Source Monitoring for Kafka


Updated September 2020 The key to ensuring Kafka uptime and maintaining peak performance is monitoring.  By reviewing disk performance, memory usage, CPU, network traffic, and load in real time, abnormal metrics or trends can be identified before a performance dip or outage occurs.  Furthermore, monitoring Kafka provides assurance to your users that all messages are correctly … Continue reading Open Source Monitoring for Kafka

Load Balancing With Kafka


Updated September 2020 What is Kafka load balancing? Load balancing with Kafka is a straightforward process and is handled by the Kafka producers by default.  While it isn’t traditional load balancing, it does spread out the message load between partitions while preserving message ordering. Round-robin approach:  By default, producers choose the partition assignment for each … Continue reading Load Balancing With Kafka

Kafka Use Cases


Updated August 2020 Apache Kafka is a high-throughput, open source message queue used by Fortune 100 companies, government entities, and startups alike. Part of Kafka’s appeal is its wide array of use cases.  In this post we will outline several of Kafka’s use cases from event sourcing to tracking web activities to metrics and more. … Continue reading Kafka Use Cases

Performance Tuning for Apache Kafka


For Apache Kafka performance tuning, measure latency and throughput for your Kafka implementation. Latency is the measure of how long it takes Kafka to process a single event. Throughput is the measure of how many events arrive within a particular period of time.
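To make the two metrics concrete, here is a minimal sketch in plain Python with no Kafka client involved. The function names and the `(sent_at, processed_at)` timestamp pairs are hypothetical, chosen only for illustration; in practice these timestamps would come from your own instrumentation.

```python
from statistics import mean


def mean_latency(events):
    """Average time to process one event.

    events: list of (sent_at, processed_at) pairs, in seconds.
    These pairs are an assumed input format, not a Kafka API.
    """
    return mean(processed - sent for sent, processed in events)


def throughput(arrival_times, window_start, window_end):
    """Events arriving per second within [window_start, window_end)."""
    arrived = sum(1 for t in arrival_times if window_start <= t < window_end)
    return arrived / (window_end - window_start)


# Three events, each with a send time and a processed time.
events = [(0.0, 0.5), (1.0, 1.25), (2.0, 2.75)]
latency_s = mean_latency(events)
events_per_s = throughput([sent for sent, _ in events], 0.0, 4.0)
```

Tracking both numbers together matters because a change that raises throughput, such as larger batches, can also raise per-event latency.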

Elasticsearch Shards — Definitions, Sizes, Optimizations, and More


Updated October 2020 Optimizing Elasticsearch for shard size is an important component for achieving maximum performance from your cluster. To get started let’s review a few definitions that are an important part of the Elasticsearch jargon. If you are already familiar with Elasticsearch, you can continue straight to the next section. Defining Elasticsearch Jargon:  Cluster, … Continue reading Elasticsearch Shards — Definitions, Sizes, Optimizations, and More

Elasticsearch Optimization for Small, Medium, and Large Clusters


Updated September 2020 The way nodes are organized in an Elasticsearch cluster changes depending on the size of the cluster.  For small, medium, and large Elasticsearch clusters there will be different approaches for optimization. Dattell’s team of engineers are experts at designing, optimizing, and maintaining Elasticsearch implementations and supporting technologies.  Find out more about our … Continue reading Elasticsearch Optimization for Small, Medium, and Large Clusters

The California Consumer Privacy Act of 2018 (CCPA): How to Prevent Data Breaches


Updated December 2018 Earlier this year, California passed the California Consumer Privacy Act of 2018, or CCPA for short. Beginning in January 2020, companies will be required to comply with this new law. It places new restrictions on how companies handle personal data, including minimum damages for class action suits in response to data breaches. … Continue reading The California Consumer Privacy Act of 2018 (CCPA): How to Prevent Data Breaches

Is my data valuable?


Yes, your data is valuable. However, like oil in the ground, its value isn’t fully realized until it is cleaned up and processed. And just as crude oil can be valuable for transportation, plastic manufacturing, and heating, company data too can be processed to extract multiple layers of value. In this post, we will discuss the ways in which your data can provide value for your business and customers.

Kafka Monitoring With Elasticsearch and Kibana


Monitoring Kafka cluster performance is crucial for diagnosing system issues and preventing future problems. We recommend using Elasticsearch for Kafka monitoring because Elasticsearch is free and highly versatile as a single source of truth throughout any organization.

Frequently Asked Questions: Apache Kafka


Our team is experienced with implementing and fixing Kafka on a wide range of systems for an even wider range of business needs. From our real-world experience with Kafka consulting, we found that there are common questions that many new clients have about the technology.
Here are some quick answers to those questions.

4 Approaches to Data Backup


We outlined the four primary ways for backing up data and their benefits and drawbacks to help you decide on which approach best meets your company’s needs.

Kafka Optimization


Issues with Apache Kafka performance are directly tied to system optimization and utilization. Here, we compiled the best practices for a high volume, clustered, general use case.

When to Consider Physical and Logical Separation With Kafka


When companies scale, their data handling needs change, and systems that worked a year ago are now over-taxed with the increase in message volume. One particular component of the data handling system, the cluster architecture, should be revisited.