Kafka Architecture Explained: Broker, Topic, Partition, and ZooKeeper
Apache Kafka is a high-throughput, distributed event-streaming platform trusted by companies across the globe. Designed to handle large volumes of data in real time, it powers applications that require high scalability, fault tolerance, and performance. If you’re just stepping into the world of Kafka, its architecture might seem daunting, but fear not! This guide breaks it down into bite-sized, easy-to-understand components. We’ll also touch on how you can get started with Kafka using Java Spring Boot.
By the end of this guide, you’ll have a solid grasp of Kafka’s core building blocks, message flow, and practical steps to apply your knowledge.
What Is Kafka Architecture?
At its core, Kafka is a distributed system operating as a cluster that communicates through topics, partitions, and brokers. It is designed to reliably send, store, and process streams of data. To understand Kafka better, we need to explore its key components and their roles.
Key Components of Kafka
Let’s break down Kafka’s architecture and its essential components:
1. Broker
A Kafka broker is a server that receives, stores, and serves streams of data. Kafka typically operates as a cluster of multiple brokers; each broker stores a portion of the data and manages multiple partitions for scalability.
- Think of brokers as storage units in a warehouse where different sections contain chunks of information.
Features of Kafka brokers:
- They manage both message storage and retrieval.
- They handle partitioning and replication of data across the cluster for redundancy.
2. Topic
A Kafka topic is a category or stream of data. Producers send data to topics, while consumers fetch data from them.
- Topics are log-based and maintain messages sequentially.
- Topics can be divided into multiple partitions for parallel processing.
Example: A “sales” Kafka topic could store all events related to new transactions in an online store.
3. Partition
Each topic in Kafka is split into partitions, which allow Kafka to scale horizontally. Messages are distributed among partitions based on a message key (or round-robin when no key is provided), ensuring data is balanced and processed efficiently.
Core traits:
- Each partition is an append-only commit log in which every message is assigned a sequential offset.
- Partitions are replicated across brokers for fault tolerance.
Example: If your “customer-updates” topic has two partitions and producers key each message by customer ID, all updates for a given customer hash to the same partition, so that customer’s updates are always read back in order. A simplified sketch of this key-to-partition mapping follows below.
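To make the mapping concrete, here is a simplified sketch. It is not Kafka’s actual implementation (the real default partitioner applies a murmur2 hash to the serialized key); the helper name partitionFor is made up for this illustration:

public class PartitionSketch {
    // Illustrative only: Kafka's default partitioner hashes the serialized
    // key with murmur2 and takes the result modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is always a valid partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Every message keyed "customer-42" maps to the same partition.
        System.out.println(partitionFor("customer-42", 2));
    }
}

Because the mapping is deterministic, messages that share a key all live in one partition and therefore preserve their relative order.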
4. ZooKeeper or KRaft
Historically, Kafka depended on Apache ZooKeeper, a centralized service for distributed coordination. As Kafka evolved, a new architecture called KRaft (Kafka Raft) was introduced to replace ZooKeeper for metadata management.
- ZooKeeper:
  - Stores the metadata for Kafka topics, partitions, and brokers.
  - Monitors broker health and manages leader elections for partitions.
- KRaft:
  - A Raft-based consensus mechanism built directly into Kafka for improved performance and simpler operations.
Older clusters may still depend on ZooKeeper, but KRaft is the default in recent Kafka releases, and ZooKeeper support was removed entirely in Kafka 4.0.
Message Flow in Kafka From Producer to Consumer
Kafka’s architecture revolves around the smooth flow of messages between producers (senders) and consumers (receivers). Here’s how it works:
- Producer Publishes a Message: The producer decides which topic a message is sent to; the partition is selected based on the message key or by round-robin distribution.
- Broker Stores the Message: The Kafka broker takes the message, appends it to the specified topic and partition, and assigns it an offset.
- Consumer Fetches the Message: Consumers subscribe to a topic and constantly poll the broker for new events. Kafka ensures they receive messages in the same order as their offsets within a partition.
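Here is a minimal, self-contained sketch of that flow using the plain Kafka Java client. The topic name “sales”, the broker address, and the group id are assumptions for illustration:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MessageFlowDemo {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // 1. Producer publishes: the key ("order-123") determines the partition.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("sales", "order-123", "order created"));
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "demo-group");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // 2. The broker has appended the message to the partition log and
        //    assigned it an offset; 3. the consumer polls and reads records
        //    back in offset order within each partition.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("sales"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}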
Partitioning and Replication Basics
Partitioning and replication are at the heart of Kafka’s scalability and resilience.
- Partitioning: Kafka divides topics into partitions, enabling parallel message production and consumption. With partition keys, producers can control which partition gets specific messages, ensuring logical grouping.
- Replication: Each partition is replicated across multiple brokers for fault tolerance. For every partition, exactly one broker serves as the leader while the rest hold follower replicas; if the leader goes down, a follower seamlessly takes over.
Example: with a replication factor of 3, each partition has one leader copy plus two follower backups, as the topic-creation sketch below shows.
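As a rough sketch of how such a topic is created programmatically (the topic name, counts, and broker address are assumptions), the Kafka AdminClient accepts a partition count and replication factor per topic:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions for parallelism; replication factor 3 gives each
            // partition one leader and two followers spread across brokers.
            NewTopic topic = new NewTopic("customer-updates", 3, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}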
Kafka’s Storage Model
Kafka brokers store data in logs corresponding to topic partitions. These logs retain messages for a configurable period, regardless of whether they are read.
Key highlights:
- Retention: Kafka retains messages for a set period (e.g., 7 days) or until the log size reaches a limit.
- Compaction: For cases like transactional data, Kafka can compact the log to keep only the latest value for a key.
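Both behaviors are set per topic. Extending the AdminClient sketch above (the values shown are illustrative, not recommendations):

import java.util.Map;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicConfigSketch {
    public static void main(String[] args) {
        // Time-based retention: delete log segments older than 7 days.
        NewTopic salesTopic = new NewTopic("sales", 3, (short) 3)
                .configs(Map.of("retention.ms", "604800000"));

        // Log compaction: keep only the latest value for each key.
        NewTopic customerTopic = new NewTopic("customer-updates", 3, (short) 3)
                .configs(Map.of("cleanup.policy", "compact"));

        // Pass either topic to admin.createTopics(...) as in the earlier sketch.
    }
}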
Getting Started With Kafka Using Spring Boot
Now, let’s take the theory and make it practical! Here’s a simple setup to connect your Spring Boot application to Kafka.
Dependencies in your Maven POM file
Add the Spring for Apache Kafka dependency to your pom.xml file:
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
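No explicit <version> tag is needed here: Spring Boot’s dependency management supplies a spring-kafka version compatible with your Boot release.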
Producer Configuration
Create your producer configuration in a KafkaConfig class:
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaConfig {

    // Builds the factory that creates Kafka producer instances.
    @Bean
    public ProducerFactory<String, String> producerFactory() {
        return new DefaultKafkaProducerFactory<>(producerConfig());
    }

    // KafkaTemplate is the high-level API used to send messages.
    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }

    private Map<String, Object> producerConfig() {
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return config;
    }
}
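If you prefer, Spring Boot can build an equivalent ProducerFactory and KafkaTemplate automatically from spring.kafka.producer.* properties in application.yml; the explicit configuration class above just makes the settings easy to see and customize.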
Sending Messages With a Kafka Producer
Use the KafkaTemplate to send a message to a Kafka topic:
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/kafka")
public class KafkaController {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaController(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // POST /api/kafka/publish?message=hello sends the message to "my_topic".
    @PostMapping("/publish")
    public String publish(@RequestParam("message") String message) {
        kafkaTemplate.send("my_topic", message);
        return "Message published.";
    }
}
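If related messages need to stay in order relative to each other, use the overload that also takes a key, so they all land on the same partition (the key value here is illustrative):

// Messages sharing a key are routed to the same partition.
kafkaTemplate.send("my_topic", "customer-42", message);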
Setting Up a Kafka Consumer
Configure the Kafka consumer in your application.yml:
spring:
  kafka:
    consumer:
      bootstrap-servers: localhost:9092
      group-id: group_id
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
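The topic subscription itself lives in code rather than in application.yml. Here is a minimal listener sketch; the class name and the log line are illustrative:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class KafkaMessageListener {

    // Invoked for every message that arrives on "my_topic" for this group.
    @KafkaListener(topics = "my_topic", groupId = "group_id")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}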
Take the Next Step
Kafka is a game-changer for real-time event streaming, offering scalability, fault tolerance, and performance for modern applications. Whether you’re managing customer events or processing massive data streams, its architecture delivers reliability at scale.
Want hands-on experience? Set up a Kafka cluster using Spring Boot, and watch the magic happen.