How we stream Kafka data into Elasticsearch with millisecond latency
How we stream Kafka data into Elasticsearch with millisecond latency
How we stream Kafka data into Elasticsearch with millisecond latency
Streaming Kafka data into Elasticsearch at millisecond latencies takes more than flipping a connector switch. In this post, we share how we built a low-latency pipeline for a fintech client needing near-real-time searchability.
Disclaimer: We wrote this post to help guide readers on reducing latency for their use cases. However, keep in mind that each implementation will be unique, and what worked best here might not work best for your situation.
Pick the right connector
Choosing the wrong connector can add seconds of delay. We evaluated Kafka Connect vs. custom consumers. Kafka Connect with the Elasticsearch Sink Connector won on simplicity. We used flush.timeout.ms, linger.ms, and batch.size tuning to optimize throughput.
Use JSON or Avro with schemas
Consistent serialization reduces parsing overhead and prevents mapping explosions. We used Confluent Schema Registry and converted timestamps and enums at ingest. This yielded, faster indexing and fewer mapping errors.
Enable ingestion pipelines
Enabling ingestion pipelines keeps indexing fast while enriching data in-flight. We used ingest pipelines for transformation and enrichment:
- Added geo tags and user agent parsing
- Flattened nested data
Monitor lag and end-to-end latency
Monitoring pinpoints slow stages and supports SLAs. We have an entire post dedicated to monitoring Kafka with Elasticsearch. Check it out for a full background on what to monitor. For reducing latency it was important that we tracked:
- Kafka consumer lag
- Time from event creation to index appearance
- Elasticsearch ingest and refresh rates
Keep indexing fast
Lastly, we made a few optimizations to reduces indexing pressure and refresh contention.
- Set refresh_interval to 5s
- Used index templates with fast analyzers
- Shard sizing tuned to <30GB per shard
Summing it up
By carefully tuning Kafka Connect, serialization formats, and Elasticsearch indexing, we helped a client achieve millisecond-latency searchability across millions of events per day.
Need help building a real-time search pipeline? Let’s talk.
24x7 Kafka Support & Consulting
24x7 Kafka Support & Consulting
24x7 Kafka Support & Consulting
Visit our Apache Kafka® page for more details on our support services.