Open-source curation · Python-first · in Spanish & English

The catalogue

Nº01 · Orchestration

Airbyte

Move data from any source to your warehouse with ready-made connectors.

Platform · Intro · Data Engineer

What is it?

Airbyte is an open-source data integration (EL/ELT) platform: its job is to move data from a source to a destination. What sets it apart is its library of hundreds of pre-built connectors — APIs, databases, and SaaS apps as sources; warehouses and lakes as destinations — so you don't have to write and maintain each connector by hand.

It fits the ELT pattern: Airbyte does the Extract and the Load, and the Transform happens afterward, inside the destination, usually with dbt.
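
As a rough illustration of that split, the sketch below uses Python's built-in sqlite3 as a stand-in for the warehouse. The table names (raw_orders, orders_by_customer) and the sample rows are invented for this example; in a real setup Airbyte would perform the load step and dbt would own the SQL.

```python
import sqlite3

# Hypothetical records, standing in for what a connector would extract.
RAW_ORDERS = [
    {"id": 1, "customer": "ana", "amount_cents": 1200},
    {"id": 2, "customer": "luis", "amount_cents": 800},
    {"id": 3, "customer": "ana", "amount_cents": 500},
]


def extract_and_load(conn, records):
    """EL: land the records untouched in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:id, :customer, :amount_cents)",
        records,
    )
    conn.commit()


def transform(conn):
    """T: model the raw data with SQL, inside the destination."""
    conn.execute("DROP TABLE IF EXISTS orders_by_customer")
    conn.execute(
        "CREATE TABLE orders_by_customer AS "
        "SELECT customer, SUM(amount_cents) AS total_cents "
        "FROM raw_orders GROUP BY customer"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
extract_and_load(conn, RAW_ORDERS)
transform(conn)
totals = dict(conn.execute("SELECT customer, total_cents FROM orders_by_customer"))
print(totals)
```

Because the raw table is never modified, the transform can be dropped and rebuilt at any time without re-extracting from the source.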

What is it for?

  • Ingestion without writing connectors. Plug in Stripe, Postgres, Salesforce, or a file and Airbyte handles the extraction and pagination; you just configure.
  • Incremental syncs. Instead of reloading everything each time, it brings only what's new or changed, keeping state between runs.
  • EL before transforming. It lands raw data in the warehouse so dbt can model it afterward — separating load from transform keeps the pipeline simple and auditable.
  • Custom connectors. If a source is missing, the CDK (Connector Development Kit) lets you build a connector that lives in the same ecosystem.
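
To make the incremental-sync idea concrete, here is a minimal sketch of cursor-based state in plain Python. It is not Airbyte's implementation or state format: the incremental_sync function, the "cursor" key, and the updated_at field are all assumptions chosen for illustration.

```python
# Hypothetical source table; 'updated_at' acts as the cursor field.
SOURCE_ROWS = [
    {"id": 1, "name": "ana", "updated_at": "2024-01-01"},
    {"id": 2, "name": "luis", "updated_at": "2024-01-05"},
]


def incremental_sync(source_rows, state, cursor_field="updated_at"):
    """Return rows newer than the saved cursor, plus the updated state."""
    last_seen = state.get("cursor")
    new_rows = [
        row for row in source_rows
        if last_seen is None or row[cursor_field] > last_seen
    ]
    if new_rows:
        # ISO-8601 date strings sort chronologically, so max() is the newest.
        state = {"cursor": max(row[cursor_field] for row in new_rows)}
    return new_rows, state


state = {}  # the first run has no saved state, so everything is read
first_batch, state = incremental_sync(SOURCE_ROWS, state)

# A row changes at the source between runs.
SOURCE_ROWS.append({"id": 1, "name": "ana maria", "updated_at": "2024-01-09"})
second_batch, state = incremental_sync(SOURCE_ROWS, state)

print(len(first_batch), len(second_batch), state)
```

The second run reads only the changed row, because the saved cursor filters out everything already synced.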

When to use it / when not

Use it when you need to bring data from many heterogeneous sources into a central destination on a recurring, batch basis, without reinventing extraction for every API.

Think twice for:

  • Real-time streaming / low latency: Airbyte is batch by design. For continuous event streams, a streaming platform such as Kafka is the better fit.
  • Heavy transformations: business logic and modeling belong in dbt (in SQL) or Spark, not in the ingestion layer.
  • A trivial, one-off extraction: if you only need to pull the data once, a small Python script is lighter than standing up the platform.
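
For a sense of scale on that last point, a one-off export can be a handful of standard-library lines. This is only a sketch: the records list is invented and stands in for a single API response or query result.

```python
import csv
import io

# Hypothetical records, standing in for a single API response or query result.
records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "luis@example.com"},
]


def write_csv(rows, out):
    """Write a list of dicts as CSV, taking the header from the first row."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# In-memory buffer for the demo; pass an open file object to write to disk.
buffer = io.StringIO()
write_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```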

Get started in 1 minute

The honest way to try Airbyte in 1 minute is PyAirbyte, the library that runs connectors from Python without standing up the whole platform:

# Shell: install PyAirbyte first
pip install airbyte

# Python
import airbyte as ab

# 'source-faker' generates test data — no credentials, no server
source = ab.get_source(
    "source-faker",
    config={"count": 100},
    install_if_missing=True,
)
source.check()
source.select_all_streams()

result = source.read()
df = result["products"].to_pandas()   # one connector stream as a DataFrame
print(df.head())

This runs a real connector on your machine. The full platform (UI, scheduler, all connectors and destinations) is stood up separately with abctl local install, which requires Docker.

Official documentation

The source of truth lives there. Here we orient you; the depth is up to you.

Nº01 · Updated 2026-06-26