Optimizing Kafka Streams State Store with RocksDB

Kafka Streams is a powerful library for building real-time stream processing applications on Apache Kafka. One of its key features is local state management using RocksDB, a high-performance embedded key-value store.

While Kafka Streams works well out of the box, tuning RocksDB can significantly improve performance, especially for high-throughput or state-heavy workloads.


Why Tune RocksDB?

By default, RocksDB is optimized for general use. However, as your Kafka Streams application grows in complexity or volume, you might observe:

  • Increased disk I/O
  • Latency spikes
  • High memory usage
  • Compaction overhead

RocksDB exposes a variety of configuration options to help address these issues.


Key RocksDB Configurations

Kafka Streams allows you to customize RocksDB by implementing the RocksDBConfigSetter interface.

1. Increase Write Buffer Size

Sets the size of the in-memory write buffer (the memtable). A larger buffer means fewer, larger flushes to disk, at the cost of more memory per store.

options.setWriteBufferSize(64 * 1024 * 1024L); // 64MB

2. Use More Write Buffers

Allows RocksDB to keep multiple memtables in memory to reduce write stalls.

options.setMaxWriteBufferNumber(4);
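Together, these two settings bound the off-heap memtable memory per store instance, and Kafka Streams creates one RocksDB instance per store per partition, so the totals multiply quickly. A rough back-of-envelope sketch (the store and partition counts below are illustrative assumptions, not values from this article):

```java
public class MemtableBudget {
    public static void main(String[] args) {
        // Values from the snippets above
        long writeBufferSize = 64L * 1024 * 1024; // 64 MB per memtable
        int maxWriteBufferNumber = 4;             // memtables kept in memory

        // Hypothetical topology: 3 state stores, 8 partitions each
        int stores = 3;
        int partitions = 8;

        // Worst case: every store instance fills all of its memtables at once
        long perStoreInstance = writeBufferSize * maxWriteBufferNumber;
        long total = perStoreInstance * stores * partitions;

        System.out.println("Per store instance: " + (perStoreInstance >> 20) + " MB");
        System.out.println("Worst-case total:   " + (total >> 20) + " MB");
    }
}
```

For the hypothetical topology above this comes to 256 MB per store instance and a 6 GB worst case, which is why write buffer settings deserve a budget calculation before deployment.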

3. Enable Dynamic Level Compaction

Lets RocksDB size compaction levels dynamically based on the data in the last level, which reduces space amplification.

options.setLevelCompactionDynamicLevelBytes(true);

4. Add Bloom Filters

Speeds up key lookups, especially beneficial for point queries.

BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
tableConfig.setFilterPolicy(new BloomFilter(10, false));
options.setTableFormatConfig(tableConfig);
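To gauge what 10 bits per key buys you, the textbook Bloom-filter approximation (not a RocksDB API, and assuming an optimal number of hash functions) puts the false-positive rate near 1% and the memory cost at bits-per-key times key count; the key count below is a hypothetical store size:

```java
public class BloomEstimate {
    public static void main(String[] args) {
        double bitsPerKey = 10.0;   // matches new BloomFilter(10, false) above
        long keys = 10_000_000L;    // hypothetical number of keys in the store

        // Textbook approximation with optimal hash count: fpp ~ 0.6185^(bits/key)
        double fpp = Math.pow(0.6185, bitsPerKey);

        // Filter memory: bits per key times key count, converted to megabytes
        double megabytes = bitsPerKey * keys / 8 / (1024 * 1024);

        System.out.printf("False-positive rate ~ %.2f%%%n", fpp * 100);
        System.out.printf("Filter memory ~ %.1f MB%n", megabytes);
    }
}
```

At 10 bits per key the filter costs roughly 12 MB per 10 million keys for a sub-1% false-positive rate, which is usually a good trade for point-lookup-heavy stores.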

5. Set Block Cache Size

Caches frequently accessed data blocks in memory to reduce disk reads. Continuing with the tableConfig from the previous snippet:

tableConfig.setBlockCacheSize(128 * 1024 * 1024L); // 128MB

Full Example: Custom RocksDBConfigSetter

import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.BloomFilter;
import org.rocksdb.Options;

public class CustomRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        options.setWriteBufferSize(64 * 1024 * 1024L);           // 64 MB memtable
        options.setMaxWriteBufferNumber(4);
        options.setLevelCompactionDynamicLevelBytes(true);

        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(128 * 1024 * 1024L);       // 128 MB block cache
        tableConfig.setFilterPolicy(new BloomFilter(10, false)); // ~10 bits per key
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Nothing shared is allocated above; if you create a Cache or Filter
        // object, close it here to avoid leaking native memory.
    }
}

Then register the config class in your Kafka Streams configuration:

props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);

Monitoring and Best Practices

  • Measure before and after tuning – Benchmark performance to evaluate impact.
  • Monitor key metrics – Use JMX or external tools to track RocksDB stats like compactions, write stalls, and block cache hit ratios.
  • Adjust based on workload – Tune differently for write-heavy, read-heavy, or large-window aggregations.
  • Mind your environment – Cloud or container-based deployments may require adjusting IOPS, memory limits, or volume sizes.
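Note that most of the per-store RocksDB metrics (write rates, block cache hit stats, write-stall durations) are only recorded when the metrics recording level is raised to DEBUG; a minimal configuration sketch:

```java
Properties props = new Properties();
// Per-store RocksDB metrics are recorded at DEBUG level (KIP-471);
// they then appear in the stream-state-metrics group via
// KafkaStreams#metrics() or JMX
props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");
```

Be aware that DEBUG-level recording adds some overhead of its own, so consider enabling it while tuning and reverting afterwards.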

Conclusion

Tuning RocksDB for Kafka Streams can significantly boost your application’s performance and scalability. With the right combination of buffer sizes, cache settings, and compaction strategies, you can reduce latency, lower resource usage, and improve throughput for your stream processing pipelines.



Happy streaming! 🚀

Written on April 8, 2025