Kafcache: Memcached and Kafka Streams

Introducing Kafcache, a bridge between Memcached and Kafka Streams state stores for processing topologies in which low latency matters. At TokenAnalyst we take low latency seriously and use Memcached when applying our machine learning models, for instance for labelling, as well as for lookup-intensive data transformations.

Head over to the Kafcache repository and try it out.

Kafka Streams

Kafka Streams is a wonderful library for stream processing that builds on top of a resilient Kafka cluster. Kafka Streams applications (think of them as stream processors or microservices) consume from Kafka topics and write back to Kafka topics. They can be either stateless or stateful. For the latter, state can be kept in the out-of-the-box RocksDB backend or in the in-memory backend. RocksDB writes state to disk, which means that, depending on the setup, the state can survive a restart of the Kafka Streams application or of the host instance.

Why does it matter?

However, in certain scenarios, writing to disk is simply too slow and results in higher processing latency. At TokenAnalyst, we take low latency seriously and try to minimize performance bottlenecks where possible. We're running our processing services on memory-optimized AWS instances and store our full state in memory. Instead of using the provided in-memory implementation, we're experimenting with an ephemeral Memcached storage backend. I call the bridge library Kafcache. The provided Kafka Streams in-memory implementation is designed for small state that can easily be held in the JVM heap space. Our state, however, easily ranges from 100 GiB to 200 GiB.

Kafcache

Kafcache is built on ScalaCache, which means other supported cache backends, such as Redis, can be used as well. On the Kafka side, it is implemented as a KeyValueStore backend that serializes keys and values with ByteArray serdes. Due to the basic nature of Memcached operations (essentially get, set and delete by key), range queries, e.g. for window queries, are not implemented. As a result, the Memcached backend cannot be used for windowed KTables.
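
To make the range-query limitation concrete, here is a minimal Java sketch of a byte-oriented store sitting on top of a cache that only knows get, set and delete. It is not Kafcache's code and uses neither the Kafka Streams nor the ScalaCache API: ByteStore, CacheClient, InMemoryCacheClient and CacheBackedStore are hypothetical names, and the in-memory map merely stands in for a Memcached client. Encoding keys as Base64 is likewise an assumption made here to obtain whitespace-free string keys from arbitrary bytes.

```java
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a Memcached-like client: key-based access only,
// no way to iterate over keys in order.
interface CacheClient {
    byte[] get(String key);
    void set(String key, byte[] value);
    void delete(String key);
}

// In-memory substitute so the sketch runs without a Memcached server.
final class InMemoryCacheClient implements CacheClient {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

    public byte[] get(String key) { return entries.get(key); }
    public void set(String key, byte[] value) { entries.put(key, value); }
    public void delete(String key) { entries.remove(key); }
}

// Hypothetical, simplified store interface working on raw bytes.
// This is NOT the Kafka Streams KeyValueStore interface.
interface ByteStore {
    byte[] get(byte[] key);
    void put(byte[] key, byte[] value);
    byte[] delete(byte[] key);
    Iterable<byte[]> range(byte[] from, byte[] to);
}

final class CacheBackedStore implements ByteStore {
    private final CacheClient client;

    CacheBackedStore(CacheClient client) { this.client = client; }

    // Assumption: Base64 turns arbitrary key bytes into a printable cache key.
    private static String cacheKey(byte[] key) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(key);
    }

    public byte[] get(byte[] key) {
        return client.get(cacheKey(key));
    }

    // In this sketch a null value is treated as a delete.
    public void put(byte[] key, byte[] value) {
        if (value == null) {
            client.delete(cacheKey(key));
        } else {
            client.set(cacheKey(key), value);
        }
    }

    // Returns the previous value, or null if the key was absent.
    public byte[] delete(byte[] key) {
        byte[] previous = get(key);
        client.delete(cacheKey(key));
        return previous;
    }

    // The cache cannot enumerate keys between two bounds, so a range scan
    // cannot be answered; windowed lookups would need exactly this operation.
    public Iterable<byte[]> range(byte[] from, byte[] to) {
        throw new UnsupportedOperationException("range queries need ordered key iteration");
    }
}
```

With this sketch, new CacheBackedStore(new InMemoryCacheClient()) supports point lookups, writes and deletes, while any call to range fails immediately. That mirrors why a store of this kind cannot back windowed tables.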