Elasticsearch Cluster Optimization

Elasticsearch Optimization for Small, Medium, and Large Clusters

Updated January 2023

The way nodes are organized in an Elasticsearch cluster changes depending on the size of the cluster.  For small, medium, and large Elasticsearch clusters there will be different approaches for optimization.

Dattell’s engineers are experts at designing, optimizing, and maintaining Elasticsearch implementations and supporting technologies. Learn more about our Elasticsearch support services.

Optimizing a Small Elasticsearch Cluster

The most important concept to remember when creating a small, highly available Elasticsearch cluster is that the minimum number of nodes is three (3).  

At first, three might seem like one more than you need. After all, two nodes would appear to be enough for fault tolerance.

However, two nodes aren’t enough for a distributed system.

Distributed systems, like Elasticsearch, have the potential for a split brain problem.

The split brain problem occurs when two different nodes both claim to be the master node. This can happen when a network partition separates two nodes from one another.



In a two-node system, if one node (Node A) drops out, the second node (Node B) continues to ingest data. The split brain occurs when Node A comes back online and claims to be the master while Node B holds the correct data. In the two-node scenario each node has 50% of the vote on which data set is correct, so neither side can win.

To determine which data set is correct, a majority vote must occur. If you have a third node and require at least two nodes to vote for a master, you eliminate the possibility of a split brain.
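The majority rule above can be checked with a little arithmetic. The following is a minimal Python sketch, not Elasticsearch's actual election code; the function names are hypothetical and chosen only for illustration.

```python
def quorum(voting_nodes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    if voting_nodes < 1:
        raise ValueError("need at least one voting node")
    return voting_nodes // 2 + 1


def can_elect_master(voting_nodes: int, reachable_nodes: int) -> bool:
    """True if one side of a partition still holds a majority of the votes."""
    return reachable_nodes >= quorum(voting_nodes)


def split_brain_possible(voting_nodes: int, required_votes: int) -> bool:
    """True if two disjoint groups of nodes could each gather required_votes."""
    return 2 * required_votes <= voting_nodes


# Three nodes, two votes required: one node can fail and a master
# can still be elected, but two isolated groups can never both win.
print(quorum(3))                    # 2
print(can_elect_master(3, 2))       # True
print(split_brain_possible(3, 2))   # False

# Two nodes: requiring one vote allows a split brain, while requiring
# two votes means the loss of either node blocks the election.
print(split_brain_possible(2, 1))   # True
print(can_elect_master(2, 1))       # False
```

With two nodes you have to choose between the risk of a split brain and the loss of fault tolerance. Three nodes is the smallest cluster that avoids both.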

Optimizing a Medium Sized Elasticsearch Cluster

As your cluster grows to about 8 to 12 nodes, it’s time to consider dedicating nodes to specific tasks. Doing this allows you to optimize each node for its task.

Data Nodes (Hot/Warm) – Data nodes are optimized for storage space and search and need less compute power. Data nodes can be further split into hot nodes, which run on your fastest hardware, and warm nodes, which can run on cheaper hardware like spinning disks.

Master Nodes – These nodes are important for maintaining a consistent view of the cluster. Typically you want to have three (3) master nodes. Make sure to scale the RAM of the dedicated master nodes to keep pace with the growth of the cluster, because every shard takes up a small but nonzero amount of space on the master node(s).

Coordinator Nodes – Coordinator nodes can improve performance by reducing the burden on data nodes. They are aptly named, as they coordinate query execution and load balance queries across the cluster.

Ingest Nodes – Ingest nodes can be used like a “lightweight” Logstash to run ETL transformations on incoming data as it is indexed.

Machine Learning Nodes – These nodes will be the most compute heavy of any of the node types.
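As a rough illustration of how the node types above map to configuration, the Python sketch below prints the `node.roles` line for each type. It assumes a recent Elasticsearch version where the `node.roles` setting and the `data_hot` and `data_warm` roles are available; the tier names and the `node_roles_setting` helper are hypothetical, and a real cluster will likely need additional roles (for example, somewhere to hold non-time-series data), so check the documentation for your version.

```python
# Hypothetical tier names (keys) mapped to Elasticsearch role names (values).
# An empty role list makes a node coordinating-only.
ROLES_BY_TIER = {
    "master": ["master"],
    "hot": ["data_hot"],
    "warm": ["data_warm"],
    "coordinator": [],
    "ingest": ["ingest"],
    "ml": ["ml"],
}


def node_roles_setting(tier: str) -> str:
    """Return the elasticsearch.yml line for a dedicated node of this tier."""
    roles = ROLES_BY_TIER[tier]
    return "node.roles: [" + ", ".join(roles) + "]"


for tier in ROLES_BY_TIER:
    print(f"{tier:<12} {node_roles_setting(tier)}")
```

Each printed line would go into the `elasticsearch.yml` of the nodes dedicated to that task.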

Optimizing a Large Elasticsearch Cluster

With medium-to-large clusters (40 nodes and above), you will want to start performance testing. Tools like Rally allow you to measure metrics such as your ideal shard size, indexing time, CPU usage, and min/max throughput.

Once you reach a triple-digit number of nodes, the cluster gets harder to manage. For instance, if you have 30 teams logging data to a single Elasticsearch cluster and one team sends bad data to that cluster, then all 30 teams will be affected when the failure occurs.

Furthermore, Elasticsearch clusters don’t scale well past 400 TB because at that point the shard count generally starts to introduce lag into the cluster.
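To get a feel for why shard count becomes a problem at this scale, here is a back-of-the-envelope Python sketch. The 50 GB target shard size and the single replica are assumptions made for illustration, not figures from this article, and the function name is hypothetical; substitute the shard size your own performance testing supports.

```python
import math


def estimate_shards(total_data_tb: float,
                    target_shard_gb: float = 50.0,
                    replicas: int = 1) -> tuple:
    """Estimate (primary, total) shard counts for a given data volume.

    target_shard_gb and replicas are illustrative assumptions.
    """
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    total_gb = total_data_tb * 1000  # decimal terabytes to gigabytes
    primaries = math.ceil(total_gb / target_shard_gb)
    return primaries, primaries * (1 + replicas)


primaries, total = estimate_shards(400)
print(primaries, total)  # 8000 16000
```

Under these assumptions, 400 TB of primary data already means thousands of shards for the master nodes to track, and smaller shards push that number higher still.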

As your cluster approaches 400 TB, consider splitting it into multiple smaller clusters.

If you are concerned about being able to search information across clusters after you split up a large cluster, fret not. Cross-cluster search (which replaced tribe nodes) allows you to search across multiple Elasticsearch clusters.
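In a cross-cluster search request, each remote index is addressed as `cluster_alias:index_name`. The Python sketch below builds that target string. The aliases `cluster_one` and `cluster_two` and the helper names are hypothetical, and the sketch assumes the remote clusters have already been registered on the cluster receiving the query (for example through the `cluster.remote.<alias>.seeds` setting).

```python
from typing import Iterable


def cross_cluster_target(index_pattern: str, cluster_aliases: Iterable[str]) -> str:
    """Build a comma-separated index expression spanning several remote clusters."""
    return ",".join(f"{alias}:{index_pattern}" for alias in cluster_aliases)


def search_path(index_pattern: str, cluster_aliases: Iterable[str]) -> str:
    """Build the request path for a search across the given remote clusters."""
    return "/" + cross_cluster_target(index_pattern, cluster_aliases) + "/_search"


# Hypothetical aliases for two clusters produced by splitting one large cluster.
print(search_path("logs-*", ["cluster_one", "cluster_two"]))
# /cluster_one:logs-*,cluster_two:logs-*/_search
```

A `GET` request to that path on the querying cluster searches the matching indices on both remote clusters and merges the results.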

Elastic Stack Consulting Services

If you are interested in 24/7 support, consulting, and/or fully managed Elasticsearch services on your environment, you can find more information on our Elasticsearch consulting page.


Published by

Dattell - Kafka & Elasticsearch Support

Benefit from the experience of our Kafka, Pulsar, Elasticsearch, and OpenSearch expert services to help your team deploy and maintain high-performance platforms that scale. We support Kafka, Elasticsearch, and OpenSearch both on-prem and in the cloud, whether on stand alone clusters or running within Kubernetes. We’ve saved our clients $100M+ over the past six years. Without our guidance companies tend to overspend on hardware or purchase unnecessary licenses. We typically save clients multiples more money than our fees cost in addition to building, optimizing, and supporting fault-tolerant, highly available architectures.
