Nº14 · Analysis
DuckDB
The analytical database that runs inside your process — no server.
What is it?
DuckDB is an in-process (embedded) analytical database: no server to run, no infrastructure to manage — it runs inside your Python script, your notebook or your terminal. Think "SQLite, but for analytics": a fast SQL engine that lives next to your code.
What is it for?
- Querying Parquet and CSV files directly with SQL, without loading them into a database first.
- Fast local analysis over medium-to-large datasets (GBs) on a single machine, without paying for a data warehouse.
- Living alongside pandas/Polars: read a DataFrame with SQL and return another DataFrame, mixing the best of both worlds.
When to use it / when not
Use it when you want analytical SQL over local files or object storage, to prototype transformations, or to speed up exploration that gets slow in pandas.
Think twice if you need concurrent writes from many users, an always-on transactional service (that's PostgreSQL territory), or distributed petabyte-scale processing (that's Spark or Trino territory). DuckDB shines on a single node.
Get started in 1 minute
pip install duckdb pandas
import duckdb
# Query a Parquet file directly, without loading it into a table
df = duckdb.sql("""
SELECT country, SUM(amount) AS total
FROM 'sales.parquet'
GROUP BY country
ORDER BY total DESC
""").df()
print(df)
That's it: no server, no upfront schema, no CREATE TABLE. DuckDB reads the file,
runs the SQL and hands you back a pandas DataFrame.
Official documentation
The source of truth lives there. Here we orient you; the depth is up to you.
Open official docs ↗
Nº14 · Updated 2026-06-08