Open-source curation · Python-first · in Spanish & English

Nº30 · Processing

Trino

One SQL engine to query data wherever it lives.

Engine / DB · Intermediate · Data Engineer · sql

What is it?

Trino is a distributed SQL query engine designed to run a single query across very different sources (a data lake on S3, PostgreSQL, Hive, Kafka) without first copying the data into one system. Each source is attached as a catalog, and you address its tables as catalog.schema.table in standard SQL.
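
In practice, a catalog is usually declared in a small properties file placed in the server's catalog directory. As a minimal sketch, assuming a PostgreSQL source, the snippet below writes such a file. The host, database name, user and password are placeholders invented for illustration, and the local catalog/ directory is only a staging location.

```shell
# Sketch: declare a PostgreSQL catalog for Trino.
# Host, database, user and password are hypothetical placeholders.
mkdir -p catalog
cat > catalog/postgresql.properties <<'EOF'
connector.name=postgresql
connection-url=jdbc:postgresql://db.example.internal:5432/shop
connection-user=trino_reader
connection-password=change-me
EOF

# The file name without its extension is the catalog name used in SQL.
basename catalog/postgresql.properties .properties
```

Because the file name becomes the catalog name, tables in this source would be addressed as postgresql.<schema>.<table>. In the official Docker image the catalog directory is expected to be /etc/trino/catalog, and with the default file-based setup the server typically needs a restart to pick up a new file.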

What is it for?

  • Querying a data lake (Parquet/Iceberg on object storage) with interactive SQL.
  • Federating sources: joining a PostgreSQL table with files on S3 in one query.
  • Acting as a query layer for dashboards (Superset, etc.) over the lake.

When to use it / when not

Use it for interactive analytics over large volumes in the lake, or when you need to query several sources at once without a prior ETL.

Think twice when the data fits comfortably on a single machine (DuckDB is simpler), for heavy ETL transformations (Spark fits better), or for transactional workloads (PostgreSQL).

Get started in 1 minute

Spin up a local Trino with Docker (it ships the tpch sample catalog) and connect with its CLI:

docker run -d -p 8080:8080 --name trino trinodb/trino
docker exec -it trino trino   # opens the CLI inside the container

-- Against the built-in tpch sample catalog (nothing to configure):
SELECT nationkey, name FROM tpch.tiny.nation LIMIT 5;

-- Trino's real point: federate different sources in a single query
-- (once you wire up hive/postgresql catalogs):
SELECT s.country, SUM(s.amount) AS total, c.region
FROM hive.analytics.sales s
JOIN postgresql.public.catalog c ON s.country = c.country
GROUP BY s.country, c.region
ORDER BY total DESC;
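
To check which catalogs are actually wired up, or to run a statement from a script instead of the interactive prompt, the CLI can also execute a single statement and exit. The helper below is a hypothetical convenience wrapper, not something Trino ships. It assumes the container is named trino as in the docker run command above, and it relies on the CLI's --execute and --output-format options.

```shell
# Hypothetical helper (not part of Trino): run one SQL statement through the
# CLI inside the container named "trino" and print the result as CSV with a header.
trino_sql() {
  docker exec trino trino --output-format CSV_HEADER --execute "$1"
}

# Example call (needs the container from the quick start to be running):
#   trino_sql 'SHOW CATALOGS'
```

On a fresh container, SHOW CATALOGS should list the bundled sample catalogs such as tpch. The hive and postgresql catalogs appear only after their properties files are in place.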

Official documentation

The source of truth lives there. Here we orient you; the depth is up to you.

Nº30 · Updated 2026-06-08