Apache Kafka is an open-source distributed publish-subscribe system, which is widely used in data centers for messaging between applications, log aggregation, and stream processing. The existing Kafka implementation uses TCP/IP for communication, which has various inefficiencies such as a high message dispatch cost due to OS involvement and excessive memory copies. Recently, the
availability of cost-effective RDMA-capable network controllers within data centers and cloud infrastructures has encouraged
many modern applications to adopt RDMA networking, which offers the potential to outperform classical TCP/IP. We introduce
KafkaDirect, an extension to Apache Kafka that uses RDMA to accelerate the three most network-intensive datapaths: record production,
record replication, and record consumption. In this work, we explore the design choices, including which RDMA operations to
use to take full advantage of offloaded communication. Our RDMA design relies on one-sided RDMA requests to attain true zero-copy
communication, completely avoiding the need for intermediate buffers in Kafka servers and thereby ensuring low-latency,
high-throughput communication. KafkaDirect can offer up to a 9x increase in throughput for both Kafka producers and Kafka consumers, and
can provide 4x and 50x reductions in latency for Kafka producers and Kafka consumers, respectively.
@inproceedings{,
  author={Konstantin Taranov and Steve Byan and Virendra Marathe and Torsten Hoefler},
  title={{KafkaDirect: Zero-copy Data Access for Apache Kafka over RDMA Networks}},
  year={2022},
  month={Jun.},
  booktitle={Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data},
  source={http://www.unixer.de/~htor/publications/},
}