Kafka on Kubernetes – Stateful Spring Boot Configuration
Running Apache Kafka on Kubernetes combines Kafka’s event-streaming capabilities with Kubernetes’ scalability and orchestration features. However, operating Kafka in this environment requires careful configuration to ensure reliable, scalable, and performant deployments. From StatefulSets and persistent storage to tools like Helm charts and the Strimzi operator, this guide covers everything you need to know.
This article includes practical Spring Boot examples to integrate Kafka applications, along with Kubernetes deployment best practices for high availability (HA).
Table of Contents
- Introduction to Kafka on Kubernetes
- StatefulSet Configuration
- Storage Considerations for Kafka
- Helm Charts and Strimzi Operator
- Best Practices for High Availability
- Spring Boot Kafka Integration Example
- External Resources for Further Learning
- Final Thoughts
Introduction to Kafka on Kubernetes
Why Run Kafka on Kubernetes?
Kubernetes provides a highly scalable, fault-tolerant platform for deploying distributed systems like Kafka. By leveraging Kubernetes features like StatefulSets, persistent volumes, and custom resource operators, you can deploy Kafka clusters that are easier to manage and scale dynamically.
Key advantages include:
- Automated Scaling with Kubernetes’ Horizontal Pod Autoscaler (HPA).
- Resilience through self-healing mechanisms for failed pods.
- Portability across cloud providers or on-premise environments.
For a high-level overview, check out the Apache Kafka Wikipedia page.
StatefulSet Configuration
Kafka brokers require unique identities and stable storage. Kubernetes’ StatefulSets are designed to meet these requirements, ensuring that each Kafka broker:
- Maintains a consistent network identity (e.g., `kafka-0`, `kafka-1`).
- Retains state across restarts with persistent volumes.
Sample StatefulSet Definition for Kafka
Here’s an example Kubernetes StatefulSet configuration for a 3-broker Kafka deployment:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  labels:
    app: kafka
spec:
  replicas: 3
  serviceName: kafka-headless
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: confluentinc/cp-kafka:7.2.0
          ports:
            - containerPort: 9092
          env:
            # NOTE: the pod name (e.g. kafka-0) is a string, while the Confluent
            # image expects a numeric broker ID; in practice the ordinal is
            # extracted from the hostname in a startup script.
            - name: KAFKA_BROKER_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: "zookeeper-headless:2181"
            # NOTE: each broker must advertise its own pod DNS name; a fixed
            # kafka-0 address is only correct for the first replica and is
            # usually derived per pod at startup.
            - name: KAFKA_ADVERTISED_LISTENERS
              value: "PLAINTEXT://kafka-0.kafka-headless.default.svc.cluster.local:9092"
          volumeMounts:
            - name: kafka-storage
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: kafka-storage
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
```
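The StatefulSet names `kafka-headless` as its governing service but the manifest for it is not shown above; a minimal headless Service might look like the following sketch (port name and labels are assumptions matching the StatefulSet):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
  labels:
    app: kafka
spec:
  clusterIP: None   # headless: each pod gets its own stable DNS record
  selector:
    app: kafka
  ports:
    - name: broker
      port: 9092
```

With `clusterIP: None`, DNS resolves `kafka-0.kafka-headless.default.svc.cluster.local` directly to the first pod rather than load-balancing across brokers.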
Key Points
- Headless Service: `kafka-headless` provides a unique DNS entry for each pod (e.g., `kafka-0.kafka-headless`).
- Persistent Volumes: attached using `volumeClaimTemplates` to retain data across restarts.
- Broker Identity: the `KAFKA_BROKER_ID` environment variable ensures each broker has a unique ID.
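The Confluent image expects a numeric broker ID, but StatefulSet pod names such as `kafka-0` are strings. A common pattern is to strip the numeric ordinal off the hostname in a startup script; a minimal shell sketch (the hostname value is simulated here):

```shell
# Simulate a StatefulSet pod hostname such as kafka-2
POD_NAME="kafka-2"

# Strip everything up to the last "-" to obtain the ordinal, which
# doubles as the numeric broker ID
BROKER_ID="${POD_NAME##*-}"

echo "$BROKER_ID"
```

In a real container this would read `$(hostname)` instead of a hard-coded value and export `KAFKA_BROKER_ID` before starting the broker.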
Storage Considerations for Kafka
Kafka requires highly performant and reliable storage for message logs. On Kubernetes, this means provisioning Persistent Volumes (PVs) through Persistent Volume Claims (PVCs).
Storage Performance
- I/O Throughput: Use fast storage like SSDs, as Kafka is I/O intensive.
- Replication Factor: Set appropriate replication for high availability while balancing storage usage.
- Disk Size: Ensure your volumes are large enough to hold your topic partitions for an extended duration.
Storage Class Example
Below is an example StorageClass for dynamic PV provisioning; this one uses the GCE persistent-disk provisioner, so adjust the `provisioner` and `parameters` for your platform:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-storage-class
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  fstype: ext4
```
Reference this storage class from the `volumeClaimTemplates` section of the StatefulSet via the `storageClassName` field.
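For example, the claim template from the earlier StatefulSet would gain one line (names match the examples above):

```yaml
volumeClaimTemplates:
  - metadata:
      name: kafka-storage
    spec:
      storageClassName: kafka-storage-class
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```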
Helm Charts and Strimzi Operator
Using Helm Charts
Helm simplifies the deployment of Kafka by providing pre-configured templates. To deploy Kafka with Helm:
- Add the Bitnami Kafka Helm repo:

```shell
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
```

- Install the Kafka Helm chart:

```shell
helm install kafka bitnami/kafka --set replicaCount=3
```

- Monitor the pods and services:

```shell
kubectl get pods
kubectl get svc
```
Find additional options in the Bitnami Kafka Helm documentation.
Strimzi Kafka Operator
Strimzi is a popular Kubernetes operator for deploying and managing Kafka clusters. Operators automate the complex lifecycle tasks (e.g., scaling, rolling updates, configuration management).
- Install Strimzi (quote the URL so the shell does not interpret the `?`):

```shell
kubectl create namespace kafka
kubectl apply -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
```

- Deploy a Kafka cluster:

```shell
kubectl apply -f kafka-cluster.yaml -n kafka
```
Refer to the Strimzi documentation for customizations.
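The `kafka-cluster.yaml` manifest referenced above is not shown in this guide; a minimal Strimzi `Kafka` custom resource might look like the following (the cluster name, replica counts, and storage sizes are illustrative assumptions):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim
      size: 10Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

The operator watches this resource and creates the StatefulSets, Services, and PVCs on your behalf, so the manual manifests from earlier sections are not needed with Strimzi.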
Best Practices for High Availability
Running Kafka in a highly available fashion on Kubernetes requires careful planning:
- Replication Factor: set the replication factor for partitions to 3 or higher, ensuring data redundancy.
- Anti-Affinity Rules: use pod anti-affinity to spread Kafka brokers across different nodes.
- Monitoring and Alerts: Integrate monitoring tools like Prometheus and Grafana to observe Kafka metrics.
- Failure Recovery: Kafka's controller automatically elects new partition leaders when a broker fails; tune settings such as `min.insync.replicas` so failover does not sacrifice durability.
- Multi-Zone Deployment: Extend your cluster across availability zones to ensure failover capabilities.
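The anti-affinity rule mentioned above lives in the StatefulSet's pod template; a sketch using a hard scheduling constraint (label values assumed to match the earlier manifest):

```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: kafka
          topologyKey: kubernetes.io/hostname
```

Using `topologyKey: topology.kubernetes.io/zone` instead spreads brokers across availability zones, which pairs with the multi-zone recommendation above.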
Spring Boot Kafka Integration Example
Adding Dependencies
Add the following dependencies to handle Kafka in your Spring Boot application (Maven):
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
```
Configuring Kafka Properties
Set up Kafka connection properties for your Kubernetes-deployed Kafka cluster:
`application.properties`:

```properties
# A single pod address works, but listing several brokers (or a dedicated
# bootstrap Service) avoids a single point of failure at startup
spring.kafka.bootstrap-servers=kafka-0.kafka-headless.default.svc.cluster.local:9092
spring.kafka.consumer.group-id=my-group
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
```
Producer and Consumer Bean Configuration
Producer Configuration:
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate(ProducerFactory<String, String> producerFactory) {
        return new KafkaTemplate<>(producerFactory);
    }

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> configs = new HashMap<>();
        configs.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "kafka-0.kafka-headless.default.svc.cluster.local:9092");
        configs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(configs);
    }
}
```
Consumer Example with @KafkaListener:
```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumerService {

    @KafkaListener(topics = "topic-example", groupId = "my-group")
    public void listen(String message) {
        System.out.println("Received message: " + message);
    }
}
```
This application connects to your Kubernetes-deployed Kafka cluster and listens to events from the configured topic.
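If the cluster is managed by Strimzi, the `topic-example` topic the listener subscribes to can also be declared declaratively as a `KafkaTopic` resource (the cluster name `my-cluster` here is an assumption):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: topic-example
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 3
```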
External Resources for Further Learning
- Kubernetes StatefulSet Documentation
- Helm Charts for Kafka
- Strimzi Kafka Operator
- Apache Kafka Documentation
- Kafka on Wikipedia
Final Thoughts
Deploying Kafka on Kubernetes unlocks the full potential of distributed event streaming while leveraging Kubernetes’ scalability, resilience, and orchestration capabilities. By utilizing StatefulSets, Helm charts, or operators like Strimzi, you can simplify Kafka cluster management and scale seamlessly.
This article covered deploying Kafka, managing storage, integrating with Spring Boot, and implementing high-availability architectures. With the right tools and practices in place, you can confidently deploy and manage Kafka on Kubernetes for any production-grade workload.
Bookmark this guide for your future Kafka deployments!