Nº02 · Orchestration
Apache Airflow
Orchestrate data pipelines as code: schedule, run and monitor.
What is it?
Apache Airflow is a platform for orchestrating workflows. You define your pipelines in Python as DAGs (directed acyclic graphs of tasks), and Airflow schedules them, runs the tasks in dependency order, retries the ones that fail, and shows the status of every run in a web UI.
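To make the ordering of tasks concrete, here is a minimal sketch that uses only Python's standard library, not Airflow itself. The task names and the `deps` mapping are hypothetical, and `graphlib.TopologicalSorter` (Python 3.9+) only stands in for the kind of dependency resolution a scheduler performs on a DAG; it is not how Airflow is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key maps to the set of tasks it depends on.
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(dependencies):
    """Return one valid run order in which every task follows its upstream tasks."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(deps)
print(order)
```

If the mapping contained a cycle (for example, `extract` depending on `report`), `static_order()` would raise `graphlib.CycleError`, which is why the graph has to be acyclic.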
What is it for?
- Coordinating ETL/ELT pipelines with dependencies between steps.
- Scheduling recurring jobs (daily, hourly) with retries and alerts.
- Getting visibility: what ran, when, what failed and why.
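As a rough illustration of what "retries" means for a recurring job, the sketch below re-runs a failing callable a fixed number of times using only the standard library. The helper `run_with_retries` and the flaky task are hypothetical names made up for this example. In Airflow you would not write this loop yourself; tasks accept `retries` and `retry_delay` arguments and the scheduler handles the re-runs.

```python
import time


def run_with_retries(task, retries=2, retry_delay=0.0):
    """Call task(); on an exception, retry up to `retries` more times.

    Returns (result, attempts). Re-raises the last error if every attempt fails.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > retries:
                raise
            time.sleep(retry_delay)


# Hypothetical flaky task: fails on the first two calls, then succeeds.
calls = {"n": 0}


def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("source not ready")
    return [1, 2, 3]


result, attempts = run_with_retries(flaky_extract, retries=2)
print(result, attempts)
```

With `retries=2` the task gets three attempts in total: the first run plus two retries.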
When to use it / when not
Use it when you have batch flows with multiple steps and dependencies that need scheduling, observability and retries.
Think twice for real-time streaming (prefer Kafka or Spark Streaming) or for a single simple script that cron already handles: Airflow adds infrastructure you don't always need.
Get started in 1 minute
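pip install apache-airflow

from airflow.decorators import dag, task
import pendulum

# Runs once a day; catchup=False skips backfilling the runs between start_date and today.
@dag(schedule="@daily", start_date=pendulum.datetime(2026, 1, 1), catchup=False)
def sales_pipeline():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def load(rows):
        print(f"Loaded {len(rows)} rows")

    # Passing extract()'s output to load() makes load depend on extract.
    load(extract())

# Calling the decorated function creates the DAG object so Airflow can discover it.
sales_pipeline()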
Official documentation
The source of truth lives there. Here we orient you; the depth is up to you.
Open official docs ↗
Nº02 · Updated 2026-06-08