Nº01 · Orchestration

Airbyte

Move data from any source to your warehouse with ready-made connectors.

Platform—Intro—Data Engineer

What is it?

Airbyte is an open-source data integration (EL/ELT) platform: its job is to move data from a source to a destination. What sets it apart is its library of hundreds of pre-built connectors — APIs, databases, and SaaS apps as sources; warehouses and lakes as destinations — so you don't have to write and maintain each connector by hand.

It fits the ELT pattern: Airbyte does the Extract and the Load, and the Transform happens afterward, inside the destination, usually with dbt.
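To make the ELT split concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for the warehouse. The table names, the sample records, and the `load_raw` / `transform` helpers are all hypothetical and are not part of Airbyte or dbt. The sketch only illustrates the order of operations: raw data lands first, and the modeling happens afterward as SQL inside the destination.

```python
import sqlite3

# Hypothetical raw records, standing in for what a connector would extract.
raw_orders = [
    {"id": 1, "customer": "ana", "amount_cents": 1250},
    {"id": 2, "customer": "ben", "amount_cents": 800},
    {"id": 3, "customer": "ana", "amount_cents": 2000},
]


def load_raw(conn, records):
    """Extract + Load: land the records unchanged in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_orders (id, customer, amount_cents) "
        "VALUES (:id, :customer, :amount_cents)",
        records,
    )
    conn.commit()


def transform(conn):
    """Transform: runs afterward, as SQL inside the destination."""
    conn.execute("DROP TABLE IF EXISTS customer_totals")
    conn.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(amount_cents) / 100.0 AS total "
        "FROM raw_orders GROUP BY customer"
    )
    conn.commit()
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database playing the warehouse
load_raw(conn, raw_orders)
totals = transform(conn)
print(totals)  # [('ana', 32.5), ('ben', 8.0)]
```

Because the raw table is left untouched, the transform step can be rerun or changed later without extracting from the source again.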

What is it for?

Ingestion without writing connectors. Plug in Stripe, Postgres, Salesforce, or a file and Airbyte handles the extraction and pagination; you just configure.
Incremental syncs. Instead of reloading everything each time, it brings only what's new or changed, keeping state between runs.
EL before transforming. It lands raw data in the warehouse so dbt can model it afterward — separating load from transform keeps the pipeline simple and auditable.
Custom connectors. If a source is missing, the CDK (Connector Development Kit) lets you build a connector that lives in the same ecosystem.
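The incremental sync idea above can be sketched in a few lines of plain Python. This is an illustration of cursor-based state under simple assumptions, not Airbyte's actual implementation: the `incremental_sync` function, the `updated_at` cursor field, and the shape of the state dictionary are all hypothetical.

```python
# Hypothetical source table; "updated_at" plays the role of the cursor field.
SOURCE_ROWS = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-02"},
]


def incremental_sync(source_rows, state):
    """Return the rows newer than the saved cursor, plus the updated state."""
    cursor = state.get("cursor")
    # ISO-8601 date strings sort chronologically, so string comparison works.
    new_rows = [
        row for row in source_rows
        if cursor is None or row["updated_at"] > cursor
    ]
    if new_rows:
        # Advance the cursor to the newest value seen in this run.
        state = {"cursor": max(row["updated_at"] for row in new_rows)}
    return new_rows, state


state = {}  # empty state: the first run behaves like a full load
first, state = incremental_sync(SOURCE_ROWS, state)

SOURCE_ROWS.append({"id": 3, "updated_at": "2024-01-03"})
second, state = incremental_sync(SOURCE_ROWS, state)  # only the new row
third, state = incremental_sync(SOURCE_ROWS, state)   # nothing changed

print(len(first), [row["id"] for row in second], third, state)
# 2 [3] [] {'cursor': '2024-01-03'}
```

The strict `>` comparison would skip a row that arrives later with the same cursor value as the saved one. A real connector has to handle that case, which this sketch leaves out.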

When to use it / when not

Use it when you need to bring data from many heterogeneous sources into a central destination on a recurring, batch basis, without reinventing extraction for every API.

Think twice for:

Real-time streaming / low latency: Airbyte runs syncs as scheduled batches. For continuous event streams, a streaming platform such as Kafka is the better fit.
Heavy transformations: business logic and modeling belong in dbt (in SQL) or Spark, not in the ingestion layer.
A trivial, one-off extraction: if you only need to pull the data once, a small Python script is lighter than standing up the platform.

Get started in 1 minute

The honest way to try Airbyte in 1 minute is PyAirbyte, the library that runs connectors from Python without standing up the whole platform:

pip install airbyte

import airbyte as ab

# 'source-faker' generates test data — no credentials, no server
source = ab.get_source(
    "source-faker",
    config={"count": 100},
    install_if_missing=True,
)
source.check()
source.select_all_streams()

result = source.read()
df = result["products"].to_pandas()   # one connector stream as a DataFrame
print(df.head())

This runs a real connector on your machine. The full platform (UI, scheduler, all connectors and destinations) is stood up separately with abctl local install, which requires Docker.

Official documentation

The source of truth lives there. Here we orient you; the depth is up to you.

What to learn next

Apache NiFi

Move data between systems with visual flows, no code required.

Intermediate · OSS

Nº04 · Processing

Apache Kafka

The nervous system for real-time data.

Intermediate · python

Nº11 · Processing

dbt

Transform data in your warehouse with SQL, treated like software.

Intro · sql

Nº01 · Updated 2026-06-26