Nº04 · Processing

Apache Kafka

The nervous system for real-time data.

Platform · Intermediate · Data Engineer · python

What is it?

Apache Kafka is a distributed event streaming platform. Some systems publish events to topics and others consume them in real time, without either side needing to know about the other. Kafka stores those streams durably, so multiple consumers can read them at their own pace.
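
To make "durable" and "at their own pace" concrete, here is a toy in-memory model of a single topic. It is only a sketch of the idea, not Kafka's API: the class ToyTopic and the consumer names "analytics" and "billing" are invented for illustration, and a real topic is partitioned and stored on disk across brokers.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for one topic: an append-only list of events."""

    def __init__(self):
        self._log = []                     # events stay in the log after being read
        self._offsets = defaultdict(int)   # next position to read, tracked per consumer

    def publish(self, event):
        self._log.append(event)

    def poll(self, consumer, max_events=1):
        """Return up to max_events unread events for this consumer."""
        start = self._offsets[consumer]
        batch = self._log[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


topic = ToyTopic()
for amount in (100.5, 20.0, 7.25):
    topic.publish({"amount": amount})

fast = topic.poll("analytics", max_events=10)  # reads everything at once
slow = topic.poll("billing", max_events=1)     # reads only the first event
print(len(fast), len(slow))
```

Because reading only advances a per-consumer offset, the slow reader can come back later and pick up the remaining events, while the fast reader is unaffected.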

What is it for?

Moving data in real time between services, databases and pipelines.
Decoupling producers and consumers (one event, many readers).
Feeding stream processing (Spark, Flink) or ingestion into a data lake.

When to use it / when not

Use it when you need a durable, high-throughput event bus, or when you are building an event-driven architecture in which several systems react to the same stream.

Think twice for purely batch data (a daily file doesn't need Kafka) or for a simple task queue, where a traditional queue is lighter.
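
For contrast, this is roughly what task-queue semantics look like. The sketch uses Python's standard-library queue as a stand-in (an assumption for illustration; in production a "traditional queue" would normally be a dedicated queue service). Each task goes to exactly one worker and is gone once taken, which is all a simple job queue needs.

```python
import queue
import threading

tasks = queue.Queue()
for n in range(5):
    tasks.put(n)

done = []
lock = threading.Lock()


def worker():
    """Take tasks until the queue is empty; each task is handled exactly once."""
    while True:
        try:
            n = tasks.get_nowait()
        except queue.Empty:
            return
        with lock:
            done.append(n * n)  # the "work": square the number


workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sorted(done))
```

There is no stored history and no second reader here. If that is enough for your use case, Kafka is probably more than you need.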

Get started in 1 minute

You need a running broker. The fastest way to try one locally is with Docker:

docker run -d -p 9092:9092 apache/kafka:latest
pip install confluent-kafka

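import json

from confluent_kafka import Producer

# Connect to the local broker started with Docker above.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Called during flush() once the broker confirms or rejects the event.
    print(f"Delivery failed: {err}" if err else f"Event published to topic '{msg.topic()}'")

# The key ("ES") determines the partition; the value is a JSON-encoded payload.
producer.produce("sales", key="ES", value=json.dumps({"amount": 100.5}), on_delivery=on_delivery)
producer.flush()  # block until every queued event has been delivered or has failed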
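
To see the other side of the topic, here is a minimal consumer sketch that reads the event back. It assumes the same local broker and the "sales" topic from the snippet above. The group name "quickstart-reader", the helper names and the 10-second timeout are hypothetical choices for illustration, not anything Kafka requires.

```python
import json


def decode_event(key, value):
    """Turn the raw bytes Kafka returns into a (key, payload) pair."""
    decoded_key = key.decode("utf-8") if key is not None else None
    return decoded_key, json.loads(value)


def read_sales(max_events=1, timeout_s=10.0):
    """Read up to max_events from 'sales'; needs the local broker running."""
    # Imported here so decode_event can be used without the library installed.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "quickstart-reader",   # hypothetical consumer group name
        "auto.offset.reset": "earliest",   # no stored offset yet: start from the oldest event
    })
    consumer.subscribe(["sales"])
    events = []
    try:
        while len(events) < max_events:
            msg = consumer.poll(timeout_s)
            if msg is None:                # nothing arrived within the timeout
                break
            if msg.error():
                raise RuntimeError(msg.error())
            events.append(decode_event(msg.key(), msg.value()))
    finally:
        consumer.close()                   # leave the consumer group cleanly
    return events
```

With the broker running and at least one event published, calling read_sales() should return a list of (key, payload) pairs. A second consumer with a different group.id would receive the same events independently.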

Official documentation

The source of truth lives there. Here we orient you; the depth is up to you.

What to learn next

Apache Spark

The distributed engine for processing data at large scale.

Intermediate · python

Nº30 · Processing

Trino

One SQL to query data wherever it lives.

Intermediate · sql

Nº04 · Updated 2026-06-08