
The Senior Dev Made Our System 30% Faster While I Watched in Disbelief (Kafka + Elasticsearch)

Backend · September 8, 2024 · 8 min read

Our data ingestion was basically trash. Like, embarrassingly bad. We'd been processing messages one by one like it's 2010, and our Elasticsearch was crying every time we sent it a single document. The whole team knew it was slow, but we just... kept shipping features on top of this mess.

Then one of our senior engineers took a look. Took him maybe 10 minutes to spot what we'd all missed for months.

The problem was staring us in the face the whole time. Every single message from Kafka was getting processed individually. Every document was hitting Elasticsearch solo. Like ordering one item at a time from Amazon instead of filling up your cart. We were literally doing the most inefficient thing possible.

Here's what he found:

  1. Kafka Consumption: Concurrency set to 1. Why? Nobody knew. We were processing messages like we were afraid of parallelism.
  2. Elasticsearch Writes: Single document writes everywhere. The bulk API was just sitting there, unused, probably judging us.
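To make the cost of the anti-pattern concrete, here's a toy simulation (no real Kafka or Elasticsearch involved, and the function names are made up for illustration) that just counts network round trips: one request per document versus one bulk request per batch.

```python
def index_one_by_one(docs):
    """Simulate single-document writes: one round trip per document."""
    round_trips = 0
    for _ in docs:
        round_trips += 1  # one HTTP request per document
    return round_trips

def index_in_batches(docs, batch_size=10_000):
    """Simulate bulk writes: one round trip per full-or-partial batch."""
    return -(-len(docs) // batch_size)  # ceiling division

print(index_one_by_one(range(10_000_000)))  # -> 10000000
print(index_in_batches(range(10_000_000)))  # -> 1000
```

Same 10 million documents, four orders of magnitude fewer requests. Every one of those requests carries fixed overhead (connection handling, request parsing, an index refresh cycle), which is exactly what the bulk API exists to amortize.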

I watched him open the Kafka and Elasticsearch docs, spend maybe 20 minutes reading, then just... fix it. No drama, no complicated refactor. Just did what the docs said we should've done from day one:

  • Kafka side: Bumped up to 10k messages per batch, cranked concurrency to 10. Suddenly processing 22k messages/second per pod.
  • Elasticsearch side: Built a proper batching layer that collects documents before sending them to ES using the bulk API. You know, what it's actually designed for.
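The batching layer itself is conceptually tiny. Here's a minimal sketch of the idea, with assumptions labeled: the class name, the 10k threshold, and the injected `flush` callable are all illustrative. In production the flush callable would wrap Elasticsearch's `_bulk` endpoint (e.g. via `elasticsearch.helpers.bulk`); injecting it here lets the buffering logic stand on its own.

```python
from typing import Any, Callable

class BulkBuffer:
    """Collects documents and flushes them in batches.

    The flush callable receives the full batch; in real use it would
    send one _bulk request to Elasticsearch. Threshold is illustrative.
    """

    def __init__(self, flush: Callable[[list[Any]], None], max_docs: int = 10_000):
        self._flush = flush
        self._max_docs = max_docs
        self._buf: list[Any] = []

    def add(self, doc: Any) -> None:
        self._buf.append(doc)
        if len(self._buf) >= self._max_docs:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, including a partial final batch."""
        if self._buf:
            self._flush(self._buf)
            self._buf = []

# Usage with a stub sink: 25,000 docs at batch size 10,000 -> 3 bulk calls.
batches: list[list[Any]] = []
buf = BulkBuffer(batches.append, max_docs=10_000)
for i in range(25_000):
    buf.add({"id": i})
buf.flush()  # drain the partial final batch
print([len(b) for b in batches])  # -> [10000, 10000, 5000]
```

One real-world wrinkle this sketch skips: you'd also want a time-based flush (e.g. every few seconds) so a slow trickle of messages doesn't sit in the buffer forever waiting for the count threshold.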

The results made me feel stupid and impressed at the same time:

  • 10 million records went from 12+ hours to under 1 hour
  • System throughput up 30%+ across the board
  • Infrastructure costs down because we went from 64 pods to 16 (finance team actually sent a thank you email)

The part that stung? This wasn't some genius optimization or obscure trick. It's literally in the first chapter of both docs. "Use batching for high throughput." We'd just never bothered to read it properly. We were so busy building features, we never stopped to ask if we were doing the basics right.

Watching someone senior work is humbling. They don't do magic - they just actually read the documentation and understand the fundamentals. The solution was right under our noses the whole time. We just needed someone to point at it and say "hey, maybe try doing it the way it's designed to be used?"

Now whenever something feels slow, the first thing I do is check if I'm batching. Usually I'm not. Usually it's obvious once someone points it out.
