Snowflake
Snowflake is a fully managed, cloud-native data platform used primarily as an enterprise data warehouse. It runs on the infrastructure of the major public clouds (AWS, Google Cloud, Microsoft Azure) and separates storage, compute, and cloud services into independent layers.
Core Architecture
Snowflake is built on a patented multi-cluster, shared-data architecture, a hybrid of shared-disk and shared-nothing designs, that decouples storage from compute across three layers:
- Database Storage: When data is loaded, Snowflake reorganizes it into its internal compressed, columnar format and stores it in cloud object storage as immutable units called micro-partitions (each holding roughly 50 to 500 MB of uncompressed data). These stored objects are not directly visible to users; they are accessible only through SQL queries run via Snowflake.
- Query Processing (Compute): Queries run on “virtual warehouses.” Each warehouse is an independent MPP (massively parallel processing) compute cluster made up of compute nodes provisioned from the underlying cloud provider, and it does not share compute resources with other warehouses. Workloads such as ETL jobs and BI queries can therefore scale independently without contending for resources.
- Cloud Services: A collection of services that coordinate activity across Snowflake, including user authentication, metadata management, query parsing and optimization, transaction control (ACID compliance), and access control.
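A toy model can make the link between the storage layer and query speed concrete. Snowflake records metadata for each micro-partition, including the range of values in each column, and uses it to skip (prune) partitions that cannot match a filter. The Python sketch below assumes a simplified setup: the names `MicroPartition`, `load`, `prune`, and `scan_between` are hypothetical, partitions hold a handful of rows rather than tens of megabytes, and only min/max ranges are tracked. It illustrates the idea, not Snowflake's internal implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class MicroPartition:
    """Hypothetical stand-in for a micro-partition: a few rows stored column by column."""
    columns: dict[str, list[Any]]
    stats: dict[str, tuple[Any, Any]]  # column -> (min, max), recorded once at load time


def load(rows: list[dict[str, Any]], rows_per_partition: int) -> list[MicroPartition]:
    """Reorganize row-oriented input into columnar partitions, collecting min/max stats."""
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        cols = {name: [row[name] for row in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in cols.items()}
        partitions.append(MicroPartition(cols, stats))
    return partitions


def prune(partitions: list[MicroPartition], col: str, lo: Any, hi: Any) -> list[int]:
    """Consult only the stats to find partitions that may hold rows with lo <= col <= hi."""
    return [
        i for i, p in enumerate(partitions)
        if p.stats[col][1] >= lo and p.stats[col][0] <= hi
    ]


def scan_between(partitions: list[MicroPartition], col: str, lo: Any, hi: Any):
    """Filter on lo <= col <= hi, reading column data only from partitions that survive pruning."""
    survivors = prune(partitions, col, lo, hi)
    values = [v for i in survivors for v in partitions[i].columns[col] if lo <= v <= hi]
    return survivors, values


rows = [{"order_id": i, "amount": i * 10} for i in range(1, 13)]
parts = load(rows, rows_per_partition=4)          # order_id ranges: 1-4, 5-8, 9-12
print(scan_between(parts, "order_id", 6, 7))      # ([1], [6, 7])
```

Because a filter on `order_id` between 6 and 7 can only match the second partition, the other two are never read. Pruning of this kind pays off most when table data is naturally ordered or clustered on the filtered column.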
Snowflake supports standard SQL and integrates with a wide range of data integration, business intelligence, and machine learning tools, serving as a central repository for structured, semi-structured (e.g., JSON, Avro, Parquet), and unstructured data.
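Semi-structured data is typically queried in Snowflake SQL with path notation on a VARIANT column, for example `src:customer.name` or `src:items[1].sku`, where `src` is the column name. The Python sketch below mimics that traversal on an already-parsed JSON document so the idea can be seen without a Snowflake account. `get_path` is a hypothetical helper: its path starts after the column name, it supports only dot-separated keys and `[n]` array indexes, and it returns `None` for a missing path, loosely mirroring the NULL that Snowflake yields.

```python
import json
import re
from typing import Any

# Matches either a bare key ("items") or an array index ("[1]"); dots are skipped.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def get_path(doc: Any, path: str) -> Any:
    """Hypothetical helper: follow a dot/bracket path through parsed JSON, or return None."""
    current = doc
    for key, index in _TOKEN.findall(path):
        if key:  # object field access, like the ":" / "." steps in Snowflake paths
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        else:  # array element access, like "[1]"
            i = int(index)
            if not isinstance(current, list) or i >= len(current):
                return None
            current = current[i]
    return current


record = json.loads('{"customer": {"name": "Ada"}, "items": [{"sku": "A1"}, {"sku": "B2"}]}')
print(get_path(record, "customer.name"))   # Ada
print(get_path(record, "items[1].sku"))    # B2
print(get_path(record, "customer.email"))  # None
```

No schema is declared up front: the structure is discovered at query time by walking the document, which is what lets one column hold records of varying shape.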
Part of the Data & AI Terms glossary.