TR-4947: Apache Kafka workload with NetApp NFS storage - Functional validation and performance

09/15/2025 Contributors

Shantanu Chakole, Karthikeyan Nagalingam, and Joe Scott, NetApp

Kafka is a distributed publish-subscribe messaging system with a robust queue that can accept large amounts of message data. With Kafka, applications can write and read data to topics in a very fast manner. Because of its fault tolerance and scalability, Kafka is often used in the big data space as a reliable way to ingest and move many data streams very quickly. Use cases include stream processing, website-activity tracking, metrics collection and monitoring, log aggregation, real time analytics, and so on.

Although normal Kafka operations on NFS work well, the silly rename issue crashes the application during the resizing or repartitioning of a Kafka cluster running on NFS. This is a significant issue because a Kafka cluster must be resized or repartitioned for load-balancing or maintenance purposes. You can find additional details here.

This document describes the following subjects:

The silly-rename problem and solution validation
Reducing CPU utilization to reduce the I/O wait time
Faster Kafka broker recovery time
Performance in the cloud and on-premises

Why use NFS storage for Kafka workloads?

Kafka workloads in production applications can stream huge amounts of data between applications. This data is held and stored in the Kafka broker nodes in the Kafka cluster. Kafka is also known for availability and parallelism, which it achieves by breaking topics into partitions and then replicating those partitions throughout the cluster. This eventually means that the huge amount of data that flows through a Kafka cluster is generally multiplied in size. NFS makes rebalancing data as the number of brokers changes very quick and easy. For large environments, rebalancing data across DAS when the number of brokers changes is very time consuming, and, in most Kafka environments, the number of brokers changes frequently.

Other benefits include the following:

Maturity. NFS is a mature protocol, which means most aspects of implementing, securing, and using it are well understood.
Open. NFS is an open protocol, and its continued development is documented in internet specifications as a free and open network protocol.
Cost-effective. NFS is a low-cost solution for network file sharing that is easy to set up because it uses the existing network infrastructure.
Centrally managed. Centralized management of NFS decreases the need for added software and disk space on individual user systems.
Distributed. NFS can be used as a distributed file system, reducing the need for removable media storage devices.

Why NetApp for Kafka workloads?

The NetApp NFS implementation is considered a gold standard for the protocol and is used in countless enterprise NAS environments. In addition to the credibility of NetApp, it also offers the following benefits:

Reliability and efficiency
Scalability and performance
High availability (HA partner in a NetApp ONTAP cluster)
Data protection
- Disaster recovery (NetApp SnapMirror). Your site goes down or you want to jump start at a different site and continue from where you left off.
- Manageability of your storage system (administration and management using NetApp OnCommand).
- Load balancing. The cluster allows you to access different volumes from data LIFs hosted on different nodes.
- Nondisruptive operations. LIFs or volume moves are transparent to the NFS clients.

TR-4947: Apache Kafka workload with NetApp NFS storage - Functional validation and performance

Creating your file...

Why use NFS storage for Kafka workloads?

Why NetApp for Kafka workloads?