Open-source curation · Python-first · in Spanish & English

Nº02 · Orchestration

Apache Airflow

Orchestrate data pipelines as code: schedule, run and monitor.

Platform · Intro · Data Engineer · python

What is it?

Apache Airflow is a platform for orchestrating workflows. You define your pipelines in Python as DAGs (directed acyclic graphs of tasks), and Airflow schedules them, runs the tasks in dependency order, retries the ones that fail, and shows you everything in a UI.
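
To make "runs the tasks in dependency order" concrete, here is a minimal sketch that uses only Python's standard library (`graphlib`), not Airflow itself. The task names and the `execution_order` helper are hypothetical, and this is an illustration of the idea rather than a description of how Airflow's scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "validate": {"extract"},
    "load": {"clean", "validate"},
}


def execution_order(deps):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

`clean` and `validate` do not depend on each other, so either may come first. The graph only guarantees that `extract` precedes both and that `load` comes last. A cycle (for example, `extract` depending on `load`) would raise `graphlib.CycleError`, which is why the graph has to be acyclic.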

What is it for?

  • Coordinating ETL/ELT pipelines with dependencies between steps.
  • Scheduling recurring jobs (daily, hourly) with retries and alerts.
  • Getting visibility: what ran, when, what failed and why.
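
The "retries and alerts" behaviour can be sketched in plain Python. In Airflow you would normally configure it with task-level arguments such as `retries`, `retry_delay` and `on_failure_callback` rather than write the loop yourself. The helper `run_with_retries` and the flaky task below are hypothetical, standard-library-only stand-ins for the idea.

```python
import time


def run_with_retries(task, retries=2, retry_delay=0.0, on_failure=None):
    """Call task(); if it raises, retry up to `retries` more times."""
    attempts = retries + 1  # the first try plus the allowed retries
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == attempts:
                # Out of retries: send the alert, then surface the error.
                if on_failure is not None:
                    on_failure(exc)
                raise
            time.sleep(retry_delay)


# Hypothetical flaky task: fails twice, then succeeds.
calls = {"count": 0}


def flaky_extract():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source unavailable")
    return [1, 2, 3]


alerts = []
rows = run_with_retries(flaky_extract, retries=2, on_failure=alerts.append)
print(rows, calls["count"], alerts)
```

With `retries=2` the task gets three attempts in total, so the third call succeeds and no alert is recorded. With `retries=1` the second failure would reach `on_failure` and the exception would propagate.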

When to use it / when not

Use it when you have batch flows with multiple steps and dependencies that need scheduling, observability and retries.

Think twice for real-time streaming (prefer Kafka or Spark Streaming) or for a single simple script that cron already handles: Airflow adds infrastructure, such as a scheduler and a metadata database, that you don't always need.

Get started in 1 minute

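pip install apache-airflow

# Save this as a .py file in your Airflow DAGs folder.
from airflow.decorators import dag, task
import pendulum


# One run per day; catchup=False skips runs between start_date and now.
@dag(schedule="@daily", start_date=pendulum.datetime(2026, 1, 1), catchup=False)
def sales_pipeline():
    @task
    def extract():
        # The return value is handed to downstream tasks (via XCom).
        return [1, 2, 3]

    @task
    def load(rows):
        print(f"Loaded {len(rows)} rows")

    # Passing extract()'s result to load() makes load depend on extract.
    load(extract())


# Calling the decorated function registers the DAG.
sales_pipeline()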

Official documentation

The official Apache Airflow documentation is the source of truth. This page only orients you; the depth is up to you.

Nº02 · Updated 2026-06-08