Nº30 · Processing
Trino
One SQL engine to query data wherever it lives.
What is it?
Trino is a distributed SQL query engine designed to run a single query across very different sources (a data lake on S3, PostgreSQL, Hive, Kafka) without moving the data. Each source is exposed as a catalog, and you address its tables as catalog.schema.table in standard SQL.
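As a minimal sketch of what "a source connects as a catalog" can look like in practice: each catalog is a small properties file, and the file name becomes the catalog name used in SQL. The directory name, host, database, and credentials below are placeholders invented for illustration, not values from this page.

```shell
# Hypothetical local directory that holds one .properties file per catalog.
mkdir -p catalog

# postgresql.properties -> catalog "postgresql" in queries.
# Host, database, user, and password are placeholders.
cat > catalog/postgresql.properties <<'EOF'
connector.name=postgresql
connection-url=jdbc:postgresql://db.example.internal:5432/shop
connection-user=trino_reader
connection-password=change-me
EOF

# Keep the sample catalog too, since a mounted directory
# takes the place of the catalogs bundled in the image.
cat > catalog/tpch.properties <<'EOF'
connector.name=tpch
EOF

ls catalog
```

With the official Docker image, a directory like this can be mounted at /etc/trino/catalog, for example by adding -v "$PWD/catalog:/etc/trino/catalog" to the docker run command. Other connectors (Hive, Iceberg, Kafka) need their own properties, which depend on your setup and Trino version.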
What is it for?
- Querying a data lake (Parquet/Iceberg on object storage) with interactive SQL.
- Federating sources: joining a PostgreSQL table with files on S3 in one query.
- Acting as a query layer for dashboards (Superset, etc.) over the lake.
When to use it / when not to
Use it for interactive analytics over large volumes in the lake, or when you need to query several sources at once without a prior ETL.
Think twice for small or medium datasets that fit on a single node (DuckDB is simpler), for heavy ETL transformations (Spark fits better), or for transactional workloads (use PostgreSQL or another OLTP database).
Get started in 1 minute
Spin up a local Trino with Docker (the image ships with the tpch sample catalog), give the server a few seconds to start, and connect with its CLI:
docker run -d -p 8080:8080 --name trino trinodb/trino
docker exec -it trino trino # opens the CLI inside the container
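If you prefer not to open an interactive session, the CLI can also run statements and exit. The sketch below assumes the container is named trino, as in the command above, and only lists what the server exposes. The variable name is made up for this example.

```shell
# Discovery statements; tpch is the sample catalog bundled with the image.
EXPLORE_SQL='SHOW CATALOGS; SHOW SCHEMAS FROM tpch; SHOW TABLES FROM tpch.tiny; DESCRIBE tpch.tiny.nation;'

# Run only when the quick-start container named "trino" is up.
if docker ps --format '{{.Names}}' 2>/dev/null | grep -qx trino; then
  docker exec trino trino --execute "$EXPLORE_SQL"
else
  echo "Container 'trino' is not running; start it with the docker run command first."
fi
```

If the CLI reports that the server is still initializing, wait a few seconds and try again.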
-- Against the built-in tpch sample catalog (nothing to configure):
SELECT nationkey, name FROM tpch.tiny.nation LIMIT 5;
-- Trino's real strength: federating different sources in a single query
-- (once you wire up hive/postgresql catalogs; table and column names here are placeholders):
SELECT s.country, SUM(s.amount) AS total, c.region
FROM hive.analytics.sales s
JOIN postgresql.public.catalog c ON s.country = c.country
GROUP BY s.country, c.region
ORDER BY total DESC;
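The federated query above needs hive and postgresql catalogs that a fresh container does not have. As a rough stand-in, the sketch below joins two catalogs that should already be present in the default image: tpch and the in-memory memory catalog. The presence of the memory catalog is an assumption about the image defaults, and the file and table names are invented for illustration.

```shell
# Write a two-catalog query to a file (file and table names are illustrative).
cat > federation_demo.sql <<'EOF'
-- Copy a small table into the in-memory catalog to act as a second source.
CREATE TABLE IF NOT EXISTS memory.default.region_copy AS
SELECT regionkey, name FROM tpch.tiny.region;

-- Join across both catalogs in a single statement.
SELECT n.name AS nation, r.name AS region
FROM tpch.tiny.nation n
JOIN memory.default.region_copy r ON n.regionkey = r.regionkey
ORDER BY n.name
LIMIT 5;
EOF

# Run it only when the quick-start container named "trino" is up.
if docker ps --format '{{.Names}}' 2>/dev/null | grep -qx trino; then
  docker cp federation_demo.sql trino:/tmp/federation_demo.sql
  docker exec trino trino --file /tmp/federation_demo.sql
else
  echo "Container 'trino' is not running; start it with the docker run command first."
fi
```

The join works the same way as with real sources: only the catalog prefix in each table name changes. Tables in the memory catalog live in the server's RAM and are lost when the container restarts.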
Official documentation
The official documentation is the source of truth. This page only orients you; the depth is up to you.
Nº30 · Updated 2026-06-08