Open-source curation · Python-first · in Spanish & English


Nº14 · Analysis

DuckDB

The analytical database that runs inside your process — no server.

Engine / DB · Intro · Data Engineer · Data Scientist · sql

What is it?

DuckDB is an in-process (embedded) analytical database: no server to run, no infrastructure to manage — it runs inside your Python script, your notebook or your terminal. Think "SQLite, but for analytics": a fast SQL engine that lives next to your code.

What is it for?

  • Querying Parquet and CSV files directly with SQL, without loading them into a database first.
  • Fast local analysis over medium-to-large datasets (GBs) on a single machine, without paying for a data warehouse.
  • Living alongside pandas/Polars: query a DataFrame with SQL and get another DataFrame back, mixing the best of both worlds.

When to use it / when not

Use it when you want analytical SQL over local files or object storage, to prototype transformations, or to speed up exploration that gets slow in pandas.

Think twice if you need concurrent writes from many users, an always-on transactional service (that's PostgreSQL), or distributed petabyte-scale processing (that's Spark or Trino). DuckDB shines on a single node.

Get started in 1 minute

pip install duckdb

import duckdb

# Query a Parquet file directly, without loading it into a table
df = duckdb.sql("""
    SELECT country, SUM(amount) AS total
    FROM 'sales.parquet'
    GROUP BY country
    ORDER BY total DESC
""").df()

print(df)

That's it: no server, no upfront schema, no CREATE TABLE. DuckDB reads the file, runs the SQL and hands you back a pandas DataFrame (the .df() call requires pandas to be installed).


Official documentation

The source of truth lives there. Here we orient you; the depth is up to you.



Nº14 · Updated 2026-06-08