
I Made Our System 30% Faster by Actually Reading the Docs (Kafka + Elasticsearch Speedrun)

Backend · March 15, 2025 · 8 min read

So our data ingestion was basically trash. Like, embarrassingly bad. We were processing messages one by one like it's 2010, and our Elasticsearch was crying every time we sent it a single document. Classic rookie mistake, but hey, we've all been there.

The problem? We were being inefficient AF. Every single message from Kafka was getting processed individually, and every document was hitting Elasticsearch solo. That's like... ordering one item at a time from Amazon instead of filling up your cart. Just painful to watch.

Here's what was actually happening:

  1. Kafka Consumption: We had concurrency set to 1 (lol why?) and were processing messages like we were afraid of parallelism. Spoiler alert: we weren't.
  2. Elasticsearch Writes: Single document writes everywhere. The bulk API was just sitting there, unused, probably judging us.

So I did what any reasonable dev would do. I RTFM'd and actually implemented batching properly:

  • Kafka side: Bumped the batch size up to 10k messages and cranked concurrency from 1 to 10. Suddenly we were processing 22k messages/second per pod. Not bad for an afternoon's work.
  • Elasticsearch side: Built a proper batching layer that collects documents before yeeting them to ES using the bulk API. Because apparently that's what it's for.
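
Here's roughly what those two changes look like together, as a minimal Python sketch rather than our actual code. It's stdlib only: `chunked` stands in for a batched Kafka poll, the thread pool stands in for consumer concurrency, and `send_bulk` is a hypothetical callback where the real HTTP call would go. The function names and the `events` index are made up for illustration. The one real thing is the body format: the bulk API takes newline-delimited JSON, an action line followed by the document, with a newline at the very end.

```python
from __future__ import annotations

import json
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator


def chunked(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of up to `size` items (stand-in for a batched Kafka poll)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def build_bulk_body(index: str, docs: list[dict]) -> str:
    """Build the NDJSON body for Elasticsearch's _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    return "\n".join(lines) + "\n"  # the body must end with a newline


def run_pipeline(
    messages: Iterable[dict],
    send_bulk: Callable[[str], None],  # hypothetical: performs the real request
    index: str = "events",             # hypothetical index name
    batch_size: int = 10_000,
    concurrency: int = 10,
) -> int:
    """Send messages as bulk requests from a worker pool; return the request count."""

    def handle(batch: list[dict]) -> None:
        send_bulk(build_bulk_body(index, batch))

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every batch and re-raises any worker exception.
        return len(list(pool.map(handle, chunked(messages, batch_size))))
```

In a real setup `send_bulk` would POST the body to `/_bulk` with `Content-Type: application/x-ndjson` (or you'd let your Elasticsearch client's bulk helper build the body for you), and it would need to check the response for per-item errors, which this sketch skips.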

The results were honestly satisfying:

  • 10 million records went from 12+ hours to under 1 hour (I may have done a little victory dance)
  • System throughput up 30%+ across the board
  • Infrastructure costs down because we went from 64 pods to 16 (finance team loved this)

The real lesson? Batching isn't just some fancy optimization. It's literally how you're supposed to do things at scale. Every operation has fixed overhead (network round trips, per-request bookkeeping), so group the work up and pay that overhead once per batch instead of once per item. Your future self (and your infrastructure bill) will thank you.
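
If you want to see the overhead argument in numbers, here's a toy cost model. The 5 ms per-request overhead and 10 µs per-item cost are invented for illustration, not measurements from our system, and `total_cost_us` is just a name I picked.

```python
def total_cost_us(n_items: int, batch_size: int, overhead_us: int, per_item_us: int) -> int:
    """Total time in microseconds: fixed overhead per request plus work per item."""
    requests = -(-n_items // batch_size)  # ceiling division
    return requests * overhead_us + n_items * per_item_us


# Made-up numbers: 1M items, 5 ms overhead per request, 10 µs of work per item.
one_by_one = total_cost_us(1_000_000, batch_size=1, overhead_us=5_000, per_item_us=10)
batched = total_cost_us(1_000_000, batch_size=1_000, overhead_us=5_000, per_item_us=10)

print(one_by_one / 1_000_000)  # 5010.0 seconds
print(batched / 1_000_000)     # 15.0 seconds
```

The per-item work is identical in both runs. The whole difference is how many times you pay the fixed overhead.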

Also, maybe read the docs before implementing things. Just a thought.