Understanding Kafka Topics: Configuration, Retention Policies, and Best Practices
Apache Kafka has become the go-to platform for managing high-throughput, real-time event streams. At the center of Kafka’s architecture is the concept of Kafka Topics. These topics are integral to organizing, storing, and processing data in a scalable way. But what are Kafka Topics exactly? How do you set them up efficiently while following industry best practices? And what configurations should you prioritize?
This comprehensive guide will answer all these questions and more. We’ll explore how Kafka Topics work, their key configuration parameters, retention policies, compaction versus deletion strategies, and effective naming conventions. You’ll also find Spring Boot code snippets to quickly integrate Kafka with your applications.
What Are Kafka Topics?
A Kafka Topic is essentially a logical channel or category where messages are stored. Producers write these messages to a topic, and consumers then read them. Topics are partitioned for scalability, enabling Kafka to handle massive amounts of data while ensuring fault tolerance.
Here’s an analogy to simplify it:
- Think of a Kafka Topic as a massive filing cabinet.
- Each partition within a topic is like a drawer in that cabinet.
- Producers write into these drawers, while consumers pull data from them simultaneously.
Anatomy of Kafka Topics
- Partitions enable data to be split and distributed across multiple brokers for parallel processing and high availability.
- Producers send data to specific topics.
- Consumers subscribe to topics, reading data from partitions.
For example, an e-commerce platform might use topics like `orders`, `inventory_update`, and `user_activity` to process and analyze user behavior in real time.
Key Takeaway
Partitions are fundamental to Kafka’s scalability. With proper design, you can accommodate large datasets and achieve high throughput across distributed systems.
Topic Creation and Configuration
Creating and configuring Kafka Topics can be done in several ways, including the Kafka CLI, programmatically through admin APIs, or automated during application startup. Below, we explore each approach.
1. Creating a Kafka Topic via CLI
The Kafka CLI gives you direct control over topic creation. Here’s how to create a topic named `orders` with five partitions and a replication factor of three:
bin/kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--replication-factor 3 \
--partitions 5 \
--topic orders
- Replication Factor ensures fault tolerance by creating multiple copies of the data across brokers.
- Partitions make it possible for Kafka to achieve parallelism.
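The same topic can also be created programmatically through Kafka’s admin API, which is useful for automated provisioning. Below is a minimal sketch using the Java AdminClient; the class name CreateOrdersTopic is only illustrative, and the broker address mirrors the CLI example above:

import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateOrdersTopic {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // AdminClient is AutoCloseable, so try-with-resources closes the connection for us
        try (AdminClient admin = AdminClient.create(props)) {
            // Same topic as the CLI example: 5 partitions, replication factor 3
            NewTopic orders = new NewTopic("orders", 5, (short) 3);
            admin.createTopics(List.of(orders)).all().get(); // blocks until the broker confirms creation
        }
    }
}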
2. Spring Boot YAML Configuration for Default Topic
If you’re using Spring Boot, you can point the Kafka client at your brokers and define a default topic directly in `application.yml`:
spring:
  kafka:
    bootstrap-servers: localhost:9092
    template:
      default-topic: orders
This sets `orders` as the default topic for Spring Boot’s auto-configured `KafkaTemplate`, so calls to `kafkaTemplate.sendDefault(...)` publish there without naming the topic each time. The property by itself does not create the topic on the broker, though; for that, declare a `NewTopic` bean as shown below.
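To have the topic actually created at startup, a common approach is to declare a `NewTopic` bean; Spring Boot’s auto-configured KafkaAdmin then creates any declared topics that don’t already exist. A minimal sketch, with an illustrative class name:

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class KafkaTopicConfig {

    // Picked up by Spring Boot's auto-configured KafkaAdmin at application startup
    @Bean
    public NewTopic ordersTopic() {
        return TopicBuilder.name("orders")
                .partitions(5)
                .replicas(3)
                .build();
    }
}

If the topic already exists, the application simply starts up without recreating it.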
3. Key Configuration Parameters for Topics
When creating or managing Kafka Topics, pay close attention to these parameters:
- Partitions: More partitions mean better parallelism but also higher resource usage. For most workloads, start with 3-5 partitions per topic and adjust based on throughput needs.
- Replication Factor: Set this value to at least 2 or 3 to ensure data durability and fault tolerance across brokers.
- Log Retention: Configured with properties such as `log.retention.bytes` or `log.retention.hours`, this determines how long messages remain in a topic before being deleted or compacted.
- Min Insync Replicas: A critical setting for write durability. For example: `min.insync.replicas=2`.
These parameters are applied together in the sketch after this list.
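Here is one way the parameters above might be combined on a single topic using Spring Kafka’s TopicBuilder (this simply extends the `NewTopic` bean shown earlier). The values are only examples, and note that `min.insync.replicas` only protects writes from producers configured with `acks=all`:

import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class OrdersTopicConfig {

    @Bean
    public NewTopic ordersTopic() {
        return TopicBuilder.name("orders")
                .partitions(5)                                        // parallelism
                .replicas(3)                                          // fault tolerance
                .config(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2") // write durability with acks=all
                .config(TopicConfig.RETENTION_MS_CONFIG, "259200000") // 72 hours, a per-topic override
                .build();
    }
}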
Retention Policies
Kafka retention policies determine how long messages are kept in a topic. This feature is essential for managing storage while maintaining data availability.
Time-Based Retention
With time-based retention, messages in a Kafka Topic expire after a set number of hours or days. For instance:
log.retention.hours=72
This configuration tells Kafka to delete messages older than 72 hours. Note that `log.retention.hours` is a broker-wide default; the per-topic override is `retention.ms`. Use time-based retention when older data is less relevant to your workflows, such as in real-time alerting systems.
Size-Based Retention
Size-based retention limits the amount of space consumed by a topic:
log.retention.bytes=1073741824
Here, each partition will retain up to 1GB of messages. This approach works for high-throughput systems, ensuring storage constraints aren’t exceeded.
Best Practice for Combining Policies
You can combine time-based and size-based retention on the same topic. Kafka removes data as soon as either limit is reached, whichever comes first, which keeps storage usage predictable. The sketch below shows both limits applied to an existing topic.
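For existing topics, both limits can be changed at runtime through the admin API. The sketch below uses AdminClient’s incrementalAlterConfigs; the topic name and values are illustrative:

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class UpdateOrdersRetention {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");

            // Keep 72 hours of history OR 1 GB per partition, whichever limit is reached first
            Collection<AlterConfigOp> ops = List.of(
                    new AlterConfigOp(new ConfigEntry("retention.ms", "259200000"), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("retention.bytes", "1073741824"), AlterConfigOp.OpType.SET));

            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}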
Compaction vs Deletion
Kafka provides two strategies for cleaning up topic data:
Log Deletion
With log deletion (the default policy), Kafka removes all data that exceeds the retention limits. This keeps topics lightweight and is ideal for ephemeral data pipelines like event notifications.
Use Cases:
- Monitoring logs
- Temporary event queues
Log Compaction
Log compaction retains only the latest version of each unique key in a topic, making it suitable for datasets where maintaining the current state is crucial.
Configuring Log Compaction: You can enable log compaction for a topic as follows:
bin/kafka-configs.sh --alter \
--bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name user-profiles \
--add-config cleanup.policy=compact
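If you manage topics from the application instead, the same cleanup policy can be set when declaring the topic. A minimal Spring Kafka sketch, with the topic name mirroring the CLI example above:

import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class UserProfilesTopicConfig {

    // Compacted topic: Kafka keeps the latest record per key, so messages sent here must be keyed
    @Bean
    public NewTopic userProfilesTopic() {
        return TopicBuilder.name("user-profiles")
                .partitions(3)
                .replicas(3)
                .config(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT)
                .build();
    }
}

Compaction runs in the background and retains at least the most recent value for each key; a key is removed by producing a null payload (a tombstone) for it.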
Use Cases:
- User profile updates
- Maintaining inventory counts
Naming Conventions for Kafka Topics
Good naming conventions reduce ambiguity and simplify topic management in large deployments.
Recommended Practices:
- Descriptive Names: Choose names that explain the purpose of the topic. For example, `user_signups` or `orders.placed`.
- Use Hierarchies: Dots can be used to represent hierarchical structures, such as `ecommerce.orders.created`.
- Environment Prefixes: Add prefixes like `dev` or `prod` to avoid confusion across environments. For example, `prod.orders`.
- Version Control: Include version numbers where schema evolution is expected, such as `orders.v1`.
Spring Boot Kafka Examples
Integrating Kafka with Spring Boot is straightforward. Below are code snippets to help you build a producer and consumer.
Kafka Producer Configuration
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        // Broker address plus String serializers for key and value
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}
Kafka Consumer Configuration
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        // Broker address, consumer group, and String deserializers for key and value
        Map<String, Object> config = new HashMap<>();
        config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ConsumerConfig.GROUP_ID_CONFIG, "group_id");
        config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return new DefaultKafkaConsumerFactory<>(config);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        return factory;
    }
}
Sending and Receiving Messages
Producer:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/publish")
public class KafkaController {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @GetMapping("/{message}")
    public String sendMessage(@PathVariable String message) {
        kafkaTemplate.send("orders", message);
        return "Message sent successfully.";
    }
}
Consumer:
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumer {

    @KafkaListener(topics = "orders", groupId = "group_id")
    public void consume(String message) {
        System.out.println("Consumed message: " + message);
    }
}
Final Thoughts
Kafka Topics are the backbone of any real-time data streaming architecture. By effectively configuring partitions, retention policies, and cleanup strategies, you can build resilient, scalable, and efficient data pipelines. The examples and best practices shared here should give you the tools to master Kafka Topics and take full advantage of Kafka’s capabilities.