Open-source curation · Python-first · in Spanish & English

Nº29 · Languages

SQL

The universal language for asking questions of your data.

Language · Intro · Base / cross-cutting · Data Engineer · Data Scientist · sql

What is it?

SQL (Structured Query Language) is the standard language for querying and manipulating data in relational databases (tables with rows and columns). It is declarative: instead of coding step by step how to traverse the data, you describe what result you want — which columns, with which filters, grouped how — and the engine decides the most efficient way to produce it.
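To make the declarative idea concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module only because it needs no installation; the `sales` table, its rows, and the `amount >= 60` filter are invented for illustration. The loop spells out how to traverse the rows, while the query only states what result is wanted.

```python
import sqlite3

# Hypothetical sample data, invented for this illustration.
rows = [("PE", 100), ("PE", 50), ("CL", 80)]

# Imperative: we spell out HOW to walk the rows and accumulate totals.
totals_loop = {}
for country, amount in rows:
    if amount >= 60:
        totals_loop[country] = totals_loop.get(country, 0) + amount

# Declarative: we state WHAT we want; the engine decides how to compute it.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (country TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
totals_sql = dict(
    con.execute(
        "SELECT country, SUM(amount) FROM sales "
        "WHERE amount >= 60 GROUP BY country"
    ).fetchall()
)
con.close()

# Both approaches should agree on the per-country totals.
print(totals_loop == totals_sql)
```

On three rows the difference is cosmetic; on large tables the declarative form lets the engine choose indexes, join order, and parallelism without any change to the query text.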

Although an ISO standard exists, each engine speaks its own dialect (PostgreSQL, MySQL, SQLite, DuckDB, BigQuery, Snowflake…). The differences are minor compared to the shared core: what you learn in one transfers almost entirely to the rest.
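As a small illustration of what a dialect difference looks like in practice, the sketch below runs two spellings of the same aggregation against SQLite through Python's standard sqlite3 module. It assumes SQLite rejects a column list after a derived-table alias (`AS sales(country, amount)`), which is how its documented grammar reads; the code handles either outcome. The sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Spelling 1: inline table with a column list after the alias.
# Assumption: SQLite's grammar does not include this form, so an error is expected.
derived_table = """
SELECT country, SUM(amount) AS total
FROM (VALUES ('PE', 100), ('PE', 50), ('CL', 80)) AS sales(country, amount)
GROUP BY country
ORDER BY total DESC
"""
try:
    con.execute(derived_table).fetchall()
    dialect_error = None
except sqlite3.OperationalError as exc:
    dialect_error = str(exc)

# Spelling 2: same logic, with the inline table moved into a named CTE.
cte_version = """
WITH sales(country, amount) AS (
    VALUES ('PE', 100), ('PE', 50), ('CL', 80)
)
SELECT country, SUM(amount) AS total
FROM sales
GROUP BY country
ORDER BY total DESC
"""
result = con.execute(cte_version).fetchall()
con.close()

print("first spelling error:", dialect_error)
print("CTE result:", result)
```

The SELECT, GROUP BY, and ORDER BY clauses are identical in both spellings; only the way the inline table is declared changes, which is typical of how small most dialect gaps are.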

What is it for?

  • Querying data. SELECT … WHERE … ORDER BY is the bread and butter: filter rows, sort them, and project the columns you care about.
  • Summarizing and combining. GROUP BY with aggregate functions (SUM, COUNT, AVG) and JOIN to combine tables are the heart of almost any tabular analysis.
  • Defining and populating. With DDL (CREATE TABLE) and DML (INSERT, UPDATE) you model the schema and maintain the data.
  • The foundation of the analytics stack. dbt transforms with SQL, Trino federates sources with SQL, warehouses (BigQuery, Snowflake) are queried with SQL. Mastering it unlocks half a dozen tools.
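The first three uses above can be sketched in one short script. This is a minimal example under stated assumptions: it runs on SQLite through Python's standard sqlite3 module for convenience, and the `customers` and `orders` tables, their columns, and their rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define the schema (hypothetical tables for illustration).
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount INTEGER
);
""")

# DML: populate the tables, then correct one row.
con.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "PE"), (2, "CL"), (3, "PE")],
)
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 3, 50), (3, 2, 80), (4, 2, 20)],
)
con.execute("UPDATE orders SET amount = 30 WHERE id = 4")

# Query: JOIN to combine tables, GROUP BY with aggregates, ORDER BY to sort.
summary = con.execute("""
SELECT c.country,
       COUNT(*)      AS n_orders,
       SUM(o.amount) AS total,
       AVG(o.amount) AS average
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.country
ORDER BY total DESC
""").fetchall()
con.close()

for row in summary:
    print(row)
```

Each country ends up as one row carrying its order count, total, and average, which is the shape most dashboards and metrics tables consume.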

When to use it / when not

Use it for practically any analysis over structured data: exploring tables, building metrics, feeding dashboards, transforming data in the warehouse. It is usually more concise and readable than the equivalent procedural code, and the engine handles optimization.

Think twice for:

  • Complex iterative or procedural logic (loops, elaborate branching): a general-purpose language like Python is more natural.
  • Machine learning: training lives in the Python ecosystem (scikit-learn, PyTorch). SQL contributes feature preparation, not the model.
  • Deeply nested or unstructured data (deep JSON, free text, images): extensions exist, but another tool often fits better.
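One way to picture the hand-off described in these caveats: SQL prepares one feature row per entity, and Python takes over for the branching logic. The sketch below is illustrative only. The `orders` table, the thresholds, and the `label` function are hypothetical, and the rule-based function merely stands in for whatever procedural or modelling step would follow in a real project.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 100), ("ana", 50), ("luis", 80)],  # invented rows
)

# SQL side: aggregate raw rows into one feature row per customer.
features = con.execute("""
SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total_spent
FROM orders
GROUP BY customer
ORDER BY customer
""").fetchall()
con.close()

# Python side: branching logic that would be awkward to express in plain SQL.
# Hypothetical rule; a trained model would typically take its place.
def label(n_orders: int, total_spent: int) -> str:
    if n_orders >= 2 and total_spent >= 120:
        return "loyal"
    if total_spent >= 80:
        return "promising"
    return "new"

labels = {customer: label(n, total) for customer, n, total in features}
print(labels)
```

The split keeps the set-based work where the engine can optimize it and leaves the step-by-step decisions in a general-purpose language.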

Get started in 1 minute

The fastest way to run SQL without installing a server is DuckDB: an analytical database that lives inside your process.

pip install duckdb   # installs the Python package
duckdb               # opens the interactive CLI (if the command is not found, install the DuckDB CLI separately)

-- Create data on the fly and aggregate it: the heart of SQL in 4 lines
SELECT country, SUM(amount) AS total
FROM (VALUES ('PE', 100), ('PE', 50), ('CL', 80)) AS sales(country, amount)
GROUP BY country
ORDER BY total DESC;

When you want to practice on real data, DuckDB queries a CSV directly: SELECT * FROM 'file.csv' LIMIT 10; — nothing to load first.

Official documentation

The source of truth lives there. Here we orient you; the depth is up to you.

Nº29 · Updated 2026-06-25