How We Reduced Elasticsearch Index Size by 60% Without Losing Query Performance

As data volumes grow, managing the cost and performance of Elasticsearch becomes a critical challenge. One of our recent projects involved helping a client reduce their Elasticsearch index size by over 60% while maintaining, and in some cases improving, query performance.

Here’s how we did it, and how you can apply these strategies to your own clusters.

Revisit Your Mappings

One of the biggest causes of bloated indices is overly dynamic or unnecessary field mapping. Elasticsearch will automatically try to map every field in an incoming document unless explicitly told otherwise, which can create many unnecessary or redundant fields that waste space and increase index complexity.

We addressed this by:

  • Eliminating fields that were mapped but never queried, reducing index size and speeding up indexing.
  • Disabling the _all field (only relevant on pre-7.x clusters, where it still exists) and setting index: false on metadata fields used only at ingest.
  • Defining field mappings explicitly to avoid Elasticsearch creating multiple analyzed and non-analyzed versions of the same field.
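
For illustration, a minimal sketch of an explicit mapping along these lines, in Kibana Dev Tools syntax, might look like the following; the index name and fields are hypothetical stand-ins rather than the client's actual schema:

    # Create the index with strict, explicit mappings so unexpected fields
    # are rejected instead of being silently auto-mapped.
    PUT /events-v2
    {
      "mappings": {
        "dynamic": "strict",
        "properties": {
          "@timestamp":      { "type": "date" },
          "message":         { "type": "text" },
          "service":         { "type": "keyword" },
          "ingest_batch_id": { "type": "keyword", "index": false, "doc_values": false }
        }
      }
    }

With dynamic set to strict, documents containing unmapped fields are rejected at index time, so schema changes have to be made deliberately rather than accumulating silently; a field like ingest_batch_id stays in _source but is neither searchable nor aggregatable.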

Why It Helps: Reduces index size, improves write speed, lowers fielddata/cache usage, and simplifies queries.

The Tradeoff: Less flexibility if document structures change often; schema updates may be required.

Switch to Keyword-Only Where Appropriate

Many users default to text for every string, but if you’re not doing full-text search on a field, keyword is the better choice. It stores values as-is and is optimized for filtering, sorting, and aggregations.

We reviewed field usage and replaced text fields with keyword for any field used only in filters, terms aggregations, or visualizations.
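
Because an existing field's type cannot be changed in place, the switch happens on a new index (or at the next rollover), followed by a reindex where historical data matters. A minimal sketch with hypothetical field names:

    # Fields used only for filtering, sorting, or aggregations become keyword;
    # only genuinely searched fields stay text.
    PUT /logs-v2
    {
      "mappings": {
        "properties": {
          "http_method":  { "type": "keyword" },
          "service_name": { "type": "keyword" },
          "status":       { "type": "keyword" },
          "message":      { "type": "text" }
        }
      }
    }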

Why It Helps: keyword fields are smaller and faster to query in aggregations.

The Tradeoff: You lose support for full-text search on those fields, so be deliberate.

Normalize Data at Ingest

Before data reached Elasticsearch, we transformed high-cardinality or verbose string fields into compact, normalized values.

  • Transformed verbose labels into enum codes.
  • Truncated timestamps to minute-level precision where applicable.
  • Flattened nested arrays that weren’t queried as nested structures.
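
One way to do this, sketched below in Kibana Dev Tools syntax, is an ingest pipeline that rewrites documents before they are stored; the pipeline name, field names, and label-to-code table are hypothetical, and the timestamp truncation assumes the target date format accepts minute-level precision:

    # Map verbose status labels to small integer codes and truncate
    # timestamps to minute precision before the document is indexed.
    PUT /_ingest/pipeline/normalize-events
    {
      "description": "Shrink verbose fields before indexing",
      "processors": [
        {
          "script": {
            "source": """
              ctx.status_code = params.codes.getOrDefault(ctx.remove('status'), -1);
              if (ctx['@timestamp'] instanceof String && ctx['@timestamp'].length() >= 16) {
                ctx['@timestamp'] = ctx['@timestamp'].substring(0, 16);
              }
            """,
            "params": {
              "codes": {
                "ORDER_COMPLETED_SUCCESSFULLY": 0,
                "ORDER_FAILED_VALIDATION": 1
              }
            }
          }
        }
      ]
    }

The pipeline can then be attached per request with ?pipeline=normalize-events or made the default for an index via the index.default_pipeline setting.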

Why It Helps: Smaller documents lead to fewer disk writes, better caching efficiency, and lower heap usage.

The Tradeoff: You may need to maintain mappings between enums and their labels at the application layer.

Use Index Templates with Efficient Settings

We standardized index configurations using templates that applied best practices automatically:

  • Targeted 10-30GB shard sizes to reduce overhead and improve merge efficiency.
  • Reduced primary shard counts to better align with node count.
  • Applied best_compression codec for older, low-access indices.
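
A composable index template (available since Elasticsearch 7.8) ties these settings together; the pattern and values below are placeholders rather than the client's actual configuration:

    # Applied automatically to every new index matching the pattern.
    PUT /_index_template/logs-template
    {
      "index_patterns": ["logs-*"],
      "template": {
        "settings": {
          "index.number_of_shards": 2,
          "index.number_of_replicas": 1,
          "index.codec": "best_compression"
        }
      }
    }

Because index.codec is a static setting, it is easiest to apply when an index is created, for example at rollover; existing segments only pick up the new codec once they are rewritten, such as by a force-merge.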

Why It Helps: Right-sizing shards improves performance and stability. Compression significantly reduces disk usage.

The Tradeoff: Compression can slightly slow indexing and retrieval; only use it where query speed isn't critical.

Rethink Retention & Lifecycle Policies

We implemented ILM (Index Lifecycle Management) to manage data across hot, warm, and cold phases:

  • Rolled over indices daily instead of hourly to reduce shard count.
  • Automatically shrank indices in the warm phase.
  • Deleted stale data in the cold phase according to compliance policy.
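
A trimmed-down version of such a policy might look like this; the age thresholds and shard counts are illustrative, and max_primary_shard_size requires Elasticsearch 7.13 or later:

    # Daily rollover in hot, shrink and force-merge in warm,
    # delete once the retention window has passed.
    PUT /_ilm/policy/logs-policy
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": { "max_age": "1d", "max_primary_shard_size": "30gb" }
            }
          },
          "warm": {
            "min_age": "7d",
            "actions": {
              "shrink": { "number_of_shards": 1 },
              "forcemerge": { "max_num_segments": 1 }
            }
          },
          "delete": {
            "min_age": "90d",
            "actions": { "delete": {} }
          }
        }
      }
    }

Referencing the policy from the index template via index.lifecycle.name ensures every newly created index picks it up automatically.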

Why It Helps: Keeps hot data fast and compact, reduces operational burden, and controls storage growth.

The Tradeoff: Misconfigured policies can delete or freeze data prematurely; careful planning is required.

Monitor with Metrics

We continuously tracked index health using:

  • _cat/indices and _stats for size, segment count, and doc count.
  • Kibana dashboards for tracking field cardinality and ingest rate.
  • Alerts for unusual segment merging behavior or sudden shard growth.
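
The checks themselves are plain API calls against the cluster; for example, against a hypothetical logs-* pattern:

    # Largest indices first, with doc counts and primary store size.
    GET /_cat/indices/logs-*?v=true&h=index,pri,rep,docs.count,pri.store.size,store.size&s=store.size:desc

    # Document, store, and segment statistics for the same indices.
    GET /logs-*/_stats/docs,store,segments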

Why It Helps: Real-time observability enables proactive tuning and fast root-cause analysis.

The Tradeoff: Requires regular review and tuning of dashboards and alerts to remain useful.

The Results

Across the board, these changes:

  • Reduced total storage usage by 61%
  • Improved query latency on common dashboards by 25%
  • Cut down index recovery time after node restarts

By focusing on data modeling, mapping hygiene, and lifecycle strategy, we delivered measurable ROI without sacrificing performance.

Need help optimizing your Elasticsearch deployment?
Schedule a free consultation today.

24x7 Elasticsearch Support & Managed Services

Visit our Elasticsearch page for more details on our support services.
