
May 7, 2025

Introducing Trino on VAST: Real-Time SQL for AI-Ready Infrastructure

Authored by

Colleen Quinn, Product Marketing Manager

This blog post was written in 2025 and reflects product capabilities at that time. Some information may be outdated.

Building powerful AI models isn’t enough. Real-world AI success depends on giving models real-time access to live, dynamic, multi-modal data—for context, inference, and action.

Today’s AI agents and applications depend on timely, composable views of data from across the enterprise. But legacy architectures force teams to build fragile ETL pipelines, duplicate systems, and manually stitch together joins—before inference can even begin.

That’s why VAST is expanding the range of engines that run natively on our platform—so AI and analytics pipelines can query governed, real-time data directly at the source, without operational sprawl or movement delays.

Today, we’re excited to announce support for the Trino SQL query engine running on VAST’s serverless compute infrastructure. Trino now joins the Spark query engine as part of VAST’s native engine ecosystem, giving teams a choice of high-performance compute engines that run directly on live, governed data.

With the VAST Data Platform, users can unify storage, database, stream processing, and vector search into a single, all-flash system built for AI and analytics. With Trino now available alongside Spark on dedicated VAST compute nodes, teams can choose the right engine for each workload—without ever leaving the platform where their data lives.

Simpler, Faster, Easier-to-Use

Trino is widely known for its powerful distributed SQL capabilities. But until now, running Trino meant choosing between limited cloud warehouses, complex open-source clusters, or commercial offerings locked into legacy formats like Parquet. Trino on VAST changes the equation. It’s the first deployment that combines Trino’s query flexibility with the raw speed, simplicity, and security of the VAST Data Platform.

Trino brings fast, distributed SQL analytics directly to structured, semi-structured, and real-time event data stored in the VAST DataBase, with no ETL, no query sprawl, and no data duplication. As part of the VAST DataEngine, Trino on VAST combines the flexibility of open SQL with the performance of flash-native storage and simplified configuration and management, so teams can move faster, manage less, and start querying immediately without replatforming.

Trino on VAST is built for scale, simplicity, and speed:

Zero Infrastructure to Manage: Trino runs natively on VAST’s serverless compute layer—no clusters to configure, no nodes to monitor, and no YAML to debug.
High Availability by Default: VAST automatically handles failover events, monitors Trino health, and restarts services when needed—keeping AI and analytics pipelines running smoothly with no manual intervention.
Always Up to Date: The Trino engine is maintained and upgraded by VAST as part of each platform release, so you’re always running a secure, optimized, and current version.
Performance Boosted by DASE Architecture: Trino gains low-latency, high-throughput performance simply by running on VAST’s all-flash Disaggregated Shared-Everything (DASE) architecture, accessing data directly over NVMe at scale with no bottlenecks or transfer delays.
Unified Access to All Data: Query files, objects, and tables from a single SQL engine, governed by the same atomic permissions applied across the VAST stack.

Traditional architectures treat Trino as an afterthought—bolting it onto fragmented storage layers riddled with immutable files and maintenance overhead. But with VAST, Trino becomes a first-class, real-time SQL engine that directly accesses live, governed data.

Built for Real-Time AI and Event-Driven Workflows

Trino on VAST isn’t just for analytics; it’s an engine for real-time AI.

Today’s AI agents and pipelines rely on live event data—behavior logs, telemetry, metadata, and more—to power real-time decisions. Kafka is the de facto standard for streaming—but integrating it into an analytics stack is anything but simple.

In most environments, turning Kafka streams into queryable data means deploying brokers, managing frontends like ksqlDB or Kafka Connect, creating staging layers, and constantly tuning performance. The result is an expensive, fragile, high-latency pipeline just to make event data useful.

The VAST Event Broker eliminates that complexity.

Native Kafka protocol support: Ingest events directly into VAST, with no separate brokers, no frontends, and no extra clusters.
Live event tables: Every Kafka topic is automatically stored as a fully queryable, real-time table in the VAST DataBase.
Unified event streaming and querying: Stream, store, and query event data on a single platform—no pipelines, no duplication.

With the VAST Event Broker, Trino users can query live event streams just like any other table—simplifying pipelines, accelerating AI, and turning streaming data into instant context.
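
To make the "every topic is a table" idea concrete, here is a toy model in Python. It uses the standard-library sqlite3 module as a stand-in for the VAST DataBase, and the table schema (event_offset, event_key, event_value, event_ts) and the helpers create_topic_table, produce, and events_per_key are hypothetical; this is not the VAST Event Broker or Trino API. It only illustrates the pattern described above: each produced event lands as a row, and an ordinary SQL query sees it as soon as it commits, with no staging layer in between.

```python
import sqlite3

# Toy model only: sqlite3 stands in for the VAST DataBase and the schema is
# hypothetical. Topic names are interpolated into SQL, so they must be trusted.

def create_topic_table(conn: sqlite3.Connection, topic: str) -> None:
    """Create a table that plays the role of a Kafka topic."""
    conn.execute(
        f'CREATE TABLE "{topic}" ('
        "event_offset INTEGER PRIMARY KEY, "  # auto-assigned, like a log offset
        "event_key TEXT, "
        "event_value TEXT, "
        "event_ts REAL)"
    )

def produce(conn: sqlite3.Connection, topic: str, key: str, value: str, ts: float) -> None:
    """Append one event; it is queryable as soon as the insert commits."""
    conn.execute(
        f'INSERT INTO "{topic}" (event_key, event_value, event_ts) VALUES (?, ?, ?)',
        (key, value, ts),
    )
    conn.commit()

def events_per_key(conn: sqlite3.Connection, topic: str) -> list:
    """An ordinary SQL aggregate over the live topic table."""
    return conn.execute(
        f'SELECT event_key, COUNT(*) FROM "{topic}" '
        "GROUP BY event_key ORDER BY event_key"
    ).fetchall()

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_topic_table(db, "clicks")
    produce(db, "clicks", "user-1", "view:home", 1.0)
    produce(db, "clicks", "user-2", "view:cart", 2.0)
    print(events_per_key(db, "clicks"))  # [('user-1', 1), ('user-2', 1)]
    produce(db, "clicks", "user-1", "click:buy", 3.0)
    print(events_per_key(db, "clicks"))  # [('user-1', 2), ('user-2', 1)]
```

In a real deployment the producer would speak the Kafka protocol and the query would run in Trino; the sketch only shows that there is no separate pipeline between the write and the read.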

The Truth About Parquet: Breaking Tradeoffs with the VAST DataBase

Trino on VAST is more than a fast SQL engine—it’s a new way to unify how teams access and act on all enterprise data.

Most Trino deployments query data stored in Parquet format. But Parquet is inherently immutable: appending new data, updating rows, or deleting records requires rewriting files, leading to file fragmentation, write amplification, and slower query performance over time.
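
The cost of immutability shows up even in a toy model. The sketch below treats each data file as an immutable tuple of (row_id, value) pairs. This is a deliberate simplification: real Parquet files are columnar, split into row groups, and compressed, and table formats can defer rewrites with delete files. The names ImmutableFile and update_row are hypothetical. The point it illustrates is that updating a single row still forces the whole file containing it to be rewritten.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImmutableFile:
    """Toy stand-in for a Parquet data file: readable, never modified in place."""
    rows: tuple  # (row_id, value) pairs

def update_row(files, row_id, new_value):
    """Update one row; return (new_files, rows_rewritten).

    The file holding row_id cannot be edited, so it is replaced by a full
    copy carrying the new value. All other files are reused unchanged.
    """
    new_files = []
    rows_rewritten = 0
    for f in files:
        if any(rid == row_id for rid, _ in f.rows):
            new_rows = tuple(
                (rid, new_value if rid == row_id else val) for rid, val in f.rows
            )
            new_files.append(ImmutableFile(new_rows))
            rows_rewritten += len(new_rows)
        else:
            new_files.append(f)
    return new_files, rows_rewritten

if __name__ == "__main__":
    # Four files of 1,000 rows each (row ids 0..3999), all values 0.
    files = [
        ImmutableFile(tuple((i, 0) for i in range(start, start + 1000)))
        for start in range(0, 4000, 1000)
    ]
    files, rewritten = update_row(files, 1500, 42)
    print(rewritten)  # 1000 rows written to change 1 row
```

Here one logical change costs 1,000 rows of writes, a write amplification of 1,000x in this toy setup; the real factor depends on file and row-group sizes.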

Even with a transactional table format like Apache Iceberg or Delta Lake to manage metadata and file optimization, users face constant maintenance tasks like compaction, vacuuming, and version reconciliation.
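
A second toy model shows where compaction comes from. If every append writes a new small file, a full scan has to open more and more files over time, and compaction periodically rewrites them into fewer, larger ones. The helpers append_batch and compact and the file-as-list representation are hypothetical simplifications, not the behavior of Iceberg, Delta Lake, or any other specific table format.

```python
def append_batch(files, batch):
    """Append by writing a brand-new file; existing files are never touched."""
    return files + [list(batch)]

def compact(files, target_rows):
    """Rewrite every row into files of at most target_rows rows each."""
    all_rows = [row for f in files for row in f]
    return [all_rows[i:i + target_rows] for i in range(0, len(all_rows), target_rows)]

if __name__ == "__main__":
    files = []
    for b in range(100):
        # 100 small appends of 10 rows each, e.g. from a streaming writer.
        files = append_batch(files, range(b * 10, b * 10 + 10))
    print(len(files))  # 100 files a full scan must open

    files = compact(files, target_rows=500)
    print(len(files))  # 2 files, at the cost of rewriting all 1,000 rows
```

Compaction restores scan efficiency, but it is itself a full rewrite that has to be scheduled and monitored, which is the kind of ongoing maintenance the paragraph above describes.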

The VAST DataBase eliminates these bottlenecks with a fully transactional, real-time-native architecture:

Faster appends, updates, and deletes — with VAST DataBase native table structures that eliminate the need for vacuuming, compaction, or costly file rewrites.
Higher performance and lower CPU overhead — with VAST’s Element Store reducing scan size by up to 32x compared to Parquet.
Simplified access across all data types — with unified querying of files, objects, tables, and vectors through a single engine, governed by atomic permissions and optimized for real-time analytics.

With VAST, Trino users are no longer boxed in by fragmented lakehouse layers. Instead, they get a fully integrated, transactional, high-performance query experience—built for speed, scale, and AI.

Powered by DASE: A Unified Foundation for Speed and Scale

At the heart of this breakthrough is VAST’s Disaggregated, Shared-Everything (DASE) architecture, a radical redesign of how data platforms should scale.

Unlike traditional systems that either tightly couple compute and storage or scatter data across silos and replicas, DASE delivers a single, flash-native platform where compute engines like Trino operate directly on the same real-time data fabric as the storage layer.

Rather than tying data to the node that stores it, DASE lets every compute node reach every SSD directly over NVMe. This eliminates the latency, bottlenecks, and data shuffling that cripple external query engines, giving Trino direct access to the VAST DataBase for live, high-speed querying at petabyte and exabyte scale.

Most platforms treat Trino as an add-on. VAST treats it as a first-class engine—fully integrated into our all-flash, Disaggregated Shared-Everything (DASE) architecture.

This flips the script:

Direct NVMe access: Every Trino engine runs natively on VAST compute nodes, with direct, low-latency access to all of your data across thousands of SSDs in parallel.
No shared-nothing penalties: Any compute node can read any data on the all-flash storage directly, rather than going through the node that owns a partition, avoiding the overhead of partitioned or isolated data islands.
No remote servers or data shipping: Queries execute directly where the data lives—eliminating movement, delay, and complexity.
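
The shared-nothing penalty is easiest to see when data is skewed. The toy model below compares two idealized schedulers: in the shared-nothing case a partition can be scanned only by the node that stores it, while in the shared-everything case any node can read any split, so work is spread across all nodes. The function names, the split size, and the cost model (scan time proportional to rows) are assumptions for illustration, not VAST’s or Trino’s actual scheduler.

```python
def shared_nothing_makespan(partition_sizes, owner):
    """Each partition must be scanned by its owning node; return the busiest node's load."""
    load = {}
    for p, size in enumerate(partition_sizes):
        load[owner[p]] = load.get(owner[p], 0) + size
    return max(load.values())

def shared_everything_makespan(partition_sizes, num_nodes, split_size):
    """Any node may read any split; give largest splits first to the least-loaded node."""
    splits = []
    for size in partition_sizes:
        full, rest = divmod(size, split_size)
        splits += [split_size] * full + ([rest] if rest else [])
    load = [0] * num_nodes
    for s in sorted(splits, reverse=True):
        i = load.index(min(load))
        load[i] += s
    return max(load)

if __name__ == "__main__":
    sizes = [800, 100, 100, 100]  # one hot partition, sizes in rows
    print(shared_nothing_makespan(sizes, owner=[0, 1, 2, 3]))  # 800
    print(shared_everything_makespan(sizes, num_nodes=4, split_size=50))  # 300
```

The greedy largest-first assignment is a standard list-scheduling heuristic; it is used here only to show that when no single node owns the data, the hot partition no longer dictates when the query finishes.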

This makes VAST the world’s fastest, most scalable, and most transactional platform for Trino deployments.

Governed from End to End

Enterprise AI demands more than speed—it requires precision control. Trino on VAST delivers built-in, policy-based governance with row-, column-, and cell-level filtering enforced at query time. Identity-aware policies integrate with Open Policy Agent (OPA), audit trails are stored as queryable tables, and all open source code is scanned and verified as part of VAST’s secure release process. Governance isn’t an afterthought—it’s embedded across the full stack, from raw data to real-time queries.
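
The three filtering levels can be illustrated with a small Python sketch that applies a per-user policy to a result set. The Policy fields and the apply_policy function are hypothetical and do not reflect OPA’s Rego policy language or VAST’s enforcement engine; the post describes enforcement at query time, which is modeled here as a plain function over rows.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """Hypothetical per-user policy (not OPA/Rego, not a VAST API)."""
    allowed_regions: set  # row level: rows outside these regions are dropped
    hidden_columns: set  # column level: these columns are removed
    # cell level: column -> predicate on the full row; True means mask that cell
    masked_cells: dict = field(default_factory=dict)

def apply_policy(rows, policy, mask="***"):
    """Return only what the policy allows this user to see."""
    visible_rows = []
    for row in rows:
        if row["region"] not in policy.allowed_regions:
            continue  # row-level filter
        visible = {}
        for col, val in row.items():
            if col in policy.hidden_columns:
                continue  # column-level filter
            should_mask = policy.masked_cells.get(col)
            # cell-level mask: the predicate sees the full row, including hidden columns
            visible[col] = mask if should_mask and should_mask(row) else val
        visible_rows.append(visible)
    return visible_rows

if __name__ == "__main__":
    rows = [
        {"id": 1, "region": "EU", "tax_id": "A1", "salary": 100, "vip": True},
        {"id": 2, "region": "US", "tax_id": "B2", "salary": 200, "vip": False},
        {"id": 3, "region": "EU", "tax_id": "C3", "salary": 300, "vip": False},
    ]
    analyst = Policy(
        allowed_regions={"EU"},
        hidden_columns={"tax_id"},
        masked_cells={"salary": lambda r: r["vip"]},
    )
    print(apply_policy(rows, analyst))
    # [{'id': 1, 'region': 'EU', 'salary': '***', 'vip': True}, {'id': 3, 'region': 'EU', 'salary': 300, 'vip': False}]
```

Because the cell predicate receives the whole row, a cell can be masked based on a column the user is not allowed to see, which is the main reason cell-level rules are evaluated before column filtering is reflected in the output.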

Not Just Faster… Smarter.

The future of AI depends on one foundational capability: instant, in-place access to live data—so agents, copilots, and autonomous systems can reason, retrieve, and act in real time. Yet traditional data stacks remain stuck in a past defined by latency, fragile pipelines, and siloed infrastructure. VAST was built to change that.

With the launch of Trino on VAST’s serverless compute platform, we’re giving customers a fast, federated SQL engine that runs directly on the data—without movement, without orchestration, and without infrastructure sprawl. And Trino is just the beginning. It joins Spark as part of a growing family of engines that can run natively on VAST’s all-flash architecture, delivering governed, low-latency access to structured, unstructured, and vector data at scale.

This is more than a performance story—it’s a new foundation for operationalizing AI, analytics, and automation across your enterprise. Trino on VAST unlocks the speed, simplicity, and real-time intelligence today’s AI workloads demand.

Ready to see Trino on VAST in action? Join a live VAST demo to experience the future of AI-driven analytics.