Nº01 · Orchestration
Airbyte
Move data from any source to your warehouse with ready-made connectors.
What is it?
Airbyte is an open-source data integration (EL/ELT) platform: its job is to move data from a source to a destination. What sets it apart is its library of hundreds of pre-built connectors — APIs, databases, and SaaS apps as sources; warehouses and lakes as destinations — so you don't have to write and maintain each connector by hand.
It fits the ELT pattern: Airbyte does the Extract and the Load, and the Transform happens afterward, inside the destination, usually with dbt.
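To make the split concrete, here is a minimal sketch of ELT in Python. It assumes an in-memory SQLite database as a stand-in for the warehouse, and the table names raw_orders and orders_by_customer are hypothetical. The load step plays Airbyte's role. The SQL statement plays the role a dbt model would: it runs inside the destination, after the data has landed.

```python
import sqlite3

# Stand-in for the warehouse (assumption): an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Extract + Load (Airbyte's role): land source records as-is in a raw table.
# Table and column names are hypothetical.
source_records = [
    (1, "acme", 120.0),
    (2, "acme", 80.0),
    (3, "globex", 50.0),
]
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_records)

# Transform (dbt's role): build a model with SQL, inside the destination.
conn.execute(
    """
    CREATE TABLE orders_by_customer AS
    SELECT customer, SUM(amount) AS total_amount
    FROM raw_orders
    GROUP BY customer
    """
)

rows = conn.execute(
    "SELECT customer, total_amount FROM orders_by_customer ORDER BY customer"
).fetchall()
print(rows)  # [('acme', 200.0), ('globex', 50.0)]
```

Because the raw table stays untouched, the model can be rebuilt or changed later without extracting from the source again.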
What is it for?
- Ingestion without writing connectors. Plug in Stripe, Postgres, Salesforce, or a file, and Airbyte handles extraction, authentication, and pagination; you only supply the configuration.
- Incremental syncs. Instead of reloading everything on each run, it reads only records that are new or changed, using a cursor (such as an updated_at column) that is saved as state between runs.
- EL before transforming. It lands raw data in the warehouse so dbt can model it afterward — separating load from transform keeps the pipeline simple and auditable.
- Custom connectors. If a source is missing, the CDK (Connector Development Kit) lets you build a connector that lives in the same ecosystem.
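The incremental-sync idea in the list above can be sketched in a few lines of Python. This is not Airbyte's implementation, only an illustration of cursor-based incremental reads under stated assumptions: the source is a hypothetical list of records, each with an updated_at cursor field holding an ISO-8601 date string (which sorts correctly as text), and the state is a plain dict that would be persisted between runs. The helper incremental_read is hypothetical.

```python
def incremental_read(records, state, cursor_field="updated_at"):
    """Return records newer than the saved cursor, plus the updated state.

    Hypothetical helper for illustration. Cursor values are ISO-8601 strings,
    so plain string comparison orders them chronologically.
    """
    last_seen = state.get(cursor_field)
    new = [r for r in records if last_seen is None or r[cursor_field] > last_seen]
    new_state = dict(state)
    if new:
        # Advance the cursor to the highest value read in this run.
        new_state[cursor_field] = max(r[cursor_field] for r in new)
    return new, new_state


source = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-02"},
]
state = {}  # would be persisted between runs

# Run 1: no state yet, so everything is read.
first, state = incremental_read(source, state)

# Between runs: one record is updated and a new one is added.
source[0] = {"id": 1, "updated_at": "2024-01-04"}
source.append({"id": 3, "updated_at": "2024-01-03"})

# Run 2: only the changed and new records come back.
second, state = incremental_read(source, state)

# Run 3: nothing changed, so nothing is read.
third, state = incremental_read(source, state)

print([r["id"] for r in first], [r["id"] for r in second], third, state)
# [1, 2] [1, 3] [] {'updated_at': '2024-01-04'}
```

The sketch compares strictly (>), so a record sharing the saved cursor value would be skipped. A production connector has to handle that boundary, for example by comparing inclusively and deduplicating downstream.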
When to use it / when not
Use it when you need to bring data from many heterogeneous sources into a central destination on a recurring, batch basis, without reinventing extraction for every API.
Think twice for:
- Real-time streaming / low latency: Airbyte is batch by design. For continuous, low-latency event streams, a streaming platform such as Kafka is a better fit.
- Heavy transformations: business logic and modeling belong in dbt (in SQL) or Spark, not in the ingestion layer.
- A trivial, one-off extraction: if you only need the data once, a short Python script is lighter than standing up the platform.
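For comparison, the one-off case above might look like the following sketch: a short script that follows page-number pagination and writes CSV. The API shape (an items list and a has_more flag on each page) and the URL are hypothetical. The helpers extract_all, to_csv, and fetch_json are also hypothetical. fetch_page is passed in so the demo runs offline with fake pages, and fetch_json shows how a real HTTP call could be plugged in.

```python
import csv
import io
import json
from typing import Callable
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Real HTTP GET returning parsed JSON (not called in the offline demo)."""
    with urlopen(url) as resp:
        return json.load(resp)


def extract_all(fetch_page: Callable[[int], dict]) -> list:
    """Follow page-number pagination until a page reports has_more == False."""
    records, page = [], 1
    while True:
        body = fetch_page(page)
        records.extend(body["items"])
        if not body.get("has_more"):
            return records
        page += 1


def to_csv(records: list, fieldnames: list) -> str:
    """Serialize a list of dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Offline demo with two fake pages (hypothetical API shape).
FAKE_PAGES = {
    1: {"items": [{"id": 1}, {"id": 2}], "has_more": True},
    2: {"items": [{"id": 3}], "has_more": False},
}
rows = extract_all(lambda page: FAKE_PAGES[page])
print(to_csv(rows, ["id"]))

# Against a real (hypothetical) endpoint it could be:
# rows = extract_all(lambda p: fetch_json(f"https://api.example.com/orders?page={p}"))
```

This is roughly the work an Airbyte connector packages for you, along with state, retries, and schema handling. For a single run, the script is often enough.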
Get started in 1 minute
The realistic way to try Airbyte in one minute is PyAirbyte, a Python library that runs Airbyte connectors directly, without standing up the whole platform:
pip install airbyte

import airbyte as ab

# 'source-faker' generates fake test data: no credentials, no server needed
source = ab.get_source(
    "source-faker",
    config={"count": 100},
    install_if_missing=True,
)
source.check()               # verify the config and that the connector runs
source.select_all_streams()  # faker exposes users, products, and purchases
result = source.read()       # records land in a local DuckDB cache by default
df = result["products"].to_pandas()  # the 'products' stream as a pandas DataFrame
print(df.head())
This runs a real connector on your machine. The full platform (UI, scheduler, and the complete catalog of source and destination connectors) is installed separately with abctl local install, which requires Docker.
Official documentation
The source of truth lives there. Here we orient you; the depth is up to you.
Updated 2026-06-26