Trino
Trino is an open-source, distributed SQL query engine designed for low-latency ad-hoc querying and analytics over large-scale datasets. Formerly known as PrestoSQL, Trino is built on a Massively Parallel Processing (MPP) architecture.
Key Characteristics
- Decoupled Storage: Unlike a traditional database, Trino does not have a native storage format. Instead, it queries data in place through connectors to remote sources, including data lakes (with table formats like Apache Iceberg or Delta Lake), relational databases (MySQL, PostgreSQL), and NoSQL stores (Elasticsearch, Cassandra).
- Query Federation: Allows users to write a single SQL query that joins data across multiple distinct storage systems (e.g., joining an Iceberg table on AWS S3 with customer profile data stored in PostgreSQL).
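- In-Memory Pipeline Execution: Designed for high performance, Trino by default processes queries in memory across a cluster of workers, streaming intermediate results between stages instead of writing them to disk. Optional features such as spill-to-disk and fault-tolerant execution relax this when queries exceed available memory or must survive worker failures.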
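To make the query federation point concrete, here is a minimal Python sketch that builds one SQL statement spanning two catalogs. Trino addresses tables as catalog.schema.table, so a join across systems is written like any other join. The catalog, schema, table, and column names (iceberg.sales.orders, postgresql.public.customers, customer_id, total_price, customer_name) are hypothetical, as are the host, port, and user defaults. The run_query helper assumes the `trino` Python client package is installed and a coordinator is reachable; it is not exercised here.

```python
# Hypothetical catalog.schema.table names; adjust to the catalogs on your cluster.
ORDERS_TABLE = "iceberg.sales.orders"            # Iceberg table on object storage
CUSTOMERS_TABLE = "postgresql.public.customers"  # table living in PostgreSQL


def build_federated_query(orders_table: str, customers_table: str, limit: int = 10) -> str:
    """Return a single SQL statement that joins tables from two catalogs."""
    return (
        "SELECT c.customer_name, sum(o.total_price) AS revenue\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {customers_table} AS c ON o.customer_id = c.customer_id\n"
        "GROUP BY c.customer_name\n"
        "ORDER BY revenue DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str, host: str = "localhost", port: int = 8080, user: str = "analyst"):
    """Execute sql on a Trino coordinator and return all rows.

    Assumes `pip install trino` and a reachable cluster; connection values
    are placeholders.
    """
    from trino.dbapi import connect  # imported lazily so the sketch loads without the client

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


FEDERATED_SQL = build_federated_query(ORDERS_TABLE, CUSTOMERS_TABLE)

if __name__ == "__main__":
    print(FEDERATED_SQL)
```

The only federation-specific part is the fully qualified table names: the engine resolves each catalog to its connector and performs the join itself, so the client submits one ordinary statement.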
Trino is widely used to provide interactive BI and ad-hoc query capabilities on top of enterprise data lakes.
Part of the Data & AI Terms glossary.