Nº10 · Orchestration

Dagster

Pipeline orchestration centered on the data (assets), not just on tasks.

Platform · Intermediate · Data Engineer · python

What is it?

Dagster is a modern, asset-centric data pipeline orchestrator. Instead of thinking only in terms of tasks that run in order, you define in Python the data assets you produce — a table, a model, a dataset — and their dependencies. The execution graph is inferred from those relationships, and with it come types, tests, and observability out of the box.

That shift (from "tasks" to "assets") is what sets it apart from traditional orchestrators: the system knows what data each step produces, not just that something ran.
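To make the idea of an inferred graph concrete, here is a minimal standard-library sketch. It is not Dagster's implementation; it only imitates the convention that a parameter name refers to an upstream asset. The helpers `infer_dependencies` and `run_in_order` are hypothetical names introduced for this illustration.

```python
import inspect
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def sales():
    # Source "asset": no parameters, so no upstream dependencies
    return [{"country": "PE", "amount": 150}, {"country": "CL", "amount": 80}]


def total_by_country(sales):
    # Downstream "asset": the parameter name points at the upstream function
    totals = {}
    for row in sales:
        totals[row["country"]] = totals.get(row["country"], 0) + row["amount"]
    return totals


def infer_dependencies(functions):
    """Map each function name to the set of parameter names it declares."""
    return {fn.__name__: set(inspect.signature(fn).parameters) for fn in functions}


def run_in_order(functions):
    """Run every function after its dependencies, passing upstream results in."""
    by_name = {fn.__name__: fn for fn in functions}
    deps = infer_dependencies(functions)
    results = {}
    # static_order() yields each node only after all of its predecessors
    for name in TopologicalSorter(deps).static_order():
        kwargs = {dep: results[dep] for dep in deps[name]}
        results[name] = by_name[name](**kwargs)
    return results


# Declaration order does not matter: the graph decides the execution order
results = run_in_order([total_by_country, sales])
print(results["total_by_country"])
```

Nobody writes "run `sales`, then `total_by_country`" here: the order falls out of the declared relationships, which is the property the asset model relies on.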

What is it for?

Modeling pipelines as a graph of assets. You declare each asset and its dependencies; Dagster builds the DAG and keeps the lineage.
Local development + UI. You run and debug everything on your machine, and the web interface shows the graph, the runs, and the state of each asset.
Stack integration. It connects naturally with dbt, Spark, warehouses, and data sources under one observable graph.
Schedules and sensors. Schedule materializations by time or trigger them on events.

When to use it / when not

Use it when you want a data-aware orchestrator with good developer experience: pipelines with several sources and dependencies between tables, local testing, lineage, and observability without wiring them by hand.

Think twice for:

A single simple cron job (one script once a day): standing up an orchestrator is overkill, and plain cron is enough.
Teams already invested in Airflow: Airflow is more traditional and task-centric, with a huge ecosystem of operators. Whether to switch depends on the team and the use case, not on which tool is "better" in the abstract.

Get started in 1 minute

Define an asset that produces data and another that consumes it — the heart of Dagster's model.

pip install dagster dagster-webserver

# pipeline.py
from dagster import asset

@asset
def sales():
    # Source asset: produces the data
    return [{"country": "PE", "amount": 150}, {"country": "CL", "amount": 80}]

@asset
def total_by_country(sales):
    # Downstream asset: depends on `sales` via the parameter name
    totals = {}
    for row in sales:
        totals[row["country"]] = totals.get(row["country"], 0) + row["amount"]
    return totals

Launch the UI with dagster dev -f pipeline.py and open http://localhost:3000: you'll see the sales → total_by_country graph and can materialize it with one click.

Official documentation

The source of truth lives there. Here we orient you; the depth is up to you.


What to learn next

Apache Airflow

Orchestrate data pipelines as code: schedule, run and monitor.

Intro · python

Nº11 · Processing

dbt

Transform data in your warehouse with SQL, treated like software.

Intro · sql

Nº05 · Orchestration

Apache NiFi

Move data between systems with visual flows, no code required.

Intermediate · OSS

Nº10 · Updated 2026-06-26