May 25, 2026

The Data Stack Is Breaking Under AI Scale

Authored by

Jim Crook - Director, Corporate Communications

Over the last several years, the standard blueprint for a data platform has looked like an explosion in a Lego factory. To build even a basic AI pipeline, architects have been forced to stitch together a disparate mess of technologies: a Kafka cluster for ingestion, a Spark cluster for ETL, a data lake for raw storage, a specialized vector database to handle the “AI bits,” and on and on.

Each line on that architectural diagram represents not just a connection but a “data tax,” a compounding cost of latency, redundant infrastructure, and governance gaps.

As enterprises move from experimental GenAI to production-scale workloads, the friction of this fragmented stack is becoming untenable. Essentially, we’re hitting a wall where the complexity of moving data is outweighing the value of the data itself.

According to Fouad Teban, VAST solutions engineering VP, the solution isn’t more or better integration; it’s architectural collapse. During a recent talk, Teban identified the root cause of this complexity as the very design we’ve relied on for thirty years: the shared-nothing architecture.

In a traditional shared-nothing MPP (Massively Parallel Processing) database, each node owns a specific slice of the data. While this arrangement worked for the static BI reports of the 2000s, it is fundamentally brittle for the AI era.

Because nodes in this model cannot see each other’s memory or disks, complex AI queries trigger massive “East-West” traffic as data is shuffled across the network to find its partner for a join. The speed of your entire AI pipeline becomes the speed of your slowest, most congested node.
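To make the shuffle cost concrete, here is a minimal sketch in plain Python, not a model of any particular database. It assumes a hypothetical four-node cluster where event rows are partitioned by `event_id` but the join needs them grouped by `user_id`; the function names, the modulo partitioning scheme, and the sample data are all illustrative assumptions.

```python
from collections import defaultdict

def owner(key, num_nodes):
    """Node that must hold a row for a join on `key` (simple modulo partitioning)."""
    return key % num_nodes

def shuffle_for_join(rows_by_node, key_index, num_nodes):
    """Repartition rows by join key; return (new placement, rows sent over the network)."""
    placed = defaultdict(list)
    moved = 0
    for node, rows in rows_by_node.items():
        for row in rows:
            target = owner(row[key_index], num_nodes)
            if target != node:
                moved += 1  # this row crosses the network (East-West traffic)
            placed[target].append(row)
    return dict(placed), moved

# Hypothetical layout: rows are stored by event_id, but the join key is user_id
NUM_NODES = 4
events = [(event_id, event_id * 7 % 10) for event_id in range(20)]  # (event_id, user_id)
rows_by_node = defaultdict(list)
for ev in events:
    rows_by_node[ev[0] % NUM_NODES].append(ev)

placed, moved = shuffle_for_join(rows_by_node, key_index=1, num_nodes=NUM_NODES)
busiest = max(len(rows) for rows in placed.values())
print(f"{moved} of {len(events)} rows moved; busiest node holds {busiest} rows")
```

Whenever the storage partitioning key differs from the join key, most rows have to change nodes before the join can start, and the node that ends up with the most rows sets the pace for the whole query.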

“At a certain scale, these clusters just fall over,” Teban said.

The Ingestion Paradox: Collapsing the Streaming Silo

Teban first homed in on the wall between streaming and storage. Many real-time AI systems depend on streaming infrastructure such as Kafka or Redpanda. This creates constant pipeline fatigue: engineers spend half their lives maintaining connectors just to shuttle data from a topic into a table.

Teban’s implicit point: streaming systems exist because storage systems historically could not ingest and expose real-time data fast enough.

The DASE architecture, introduced by VAST a decade ago, changes the model by collapsing much of that boundary between streaming ingestion and persistent shared storage. Instead of continuously shuttling data between disconnected systems, organizations can ingest, persist, query, and serve data against a unified underlying platform optimized for rapid writes and immediate availability. Crucially, ingested data is transformed into a columnar format for deep analytical queries, thus blurring the lines between OLTP and OLAP data, as Teban noted.

And because storage and compute scale independently, ingestion throughput can grow without being tightly coupled to query infrastructure. This eliminates the need for a separate streaming tier and allows a single query to join fresh telemetry with years of historical context. For the data platform engineer, the “infinite retention” of Kafka topics becomes a native property of the global namespace, with no perpetual tuning required.
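The shape of such a unified query can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in here to show fresh and historical rows living in one table and being joined in one statement; it says nothing about VAST's actual interfaces, and the `telemetry` table, its columns, and the sample readings are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry (device_id TEXT, ts INTEGER, temp_c REAL)")

# Historical rows and freshly ingested rows land in the same table (sample data)
history = [("pump-1", t, 60.0) for t in range(100)] + [("pump-2", t, 55.0) for t in range(100)]
fresh = [("pump-1", 100, 75.0), ("pump-2", 100, 55.5)]
conn.executemany("INSERT INTO telemetry VALUES (?, ?, ?)", history)
conn.executemany("INSERT INTO telemetry VALUES (?, ?, ?)", fresh)

FRESH_CUTOFF = 100  # rows with ts >= cutoff count as "fresh" in this sketch

# One statement compares each fresh reading with that device's historical baseline
query = """
SELECT f.device_id, f.temp_c, h.avg_temp, f.temp_c - h.avg_temp AS delta
FROM telemetry AS f
JOIN (
    SELECT device_id, AVG(temp_c) AS avg_temp
    FROM telemetry
    WHERE ts < ?
    GROUP BY device_id
) AS h ON h.device_id = f.device_id
WHERE f.ts >= ?
ORDER BY delta DESC
"""
results = conn.execute(query, (FRESH_CUTOFF, FRESH_CUTOFF)).fetchall()
for device_id, latest, baseline, delta in results:
    print(device_id, latest, baseline, delta)
```

In a split stack, the fresh rows would sit in a topic and the baseline in a warehouse, so the same comparison would need a connector and a sync job before it could run.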

Vectors as a First-Class Citizen

If the collapse of streaming simplifies the when of data, integrated vector search simplifies the what.

The industry is currently suffering from what we might call “specialized database fatigue.” We’ve seen the rise of dedicated vector stores, but for senior technical leaders, these represent yet another silo to secure and govern. In Teban’s view, a vector is simply another column type. “Embeddings live along with the metadata and the unstructured data in the same cluster, avoiding a lot of silos and redundancy,” he noted.

By treating embeddings as a native data type within a DASE-backed columnar database, you no longer need to sync metadata in Postgres with vectors in a standalone store. When vector search is integrated into the core architecture, similarity searches can scale to trillions of rows with sub-second latency. In testing, Teban noted, the VAST DataBase sustained over 1,000 queries per second at scale, demonstrating near-linear performance growth as client concurrency increased from 75 to 375 threads.

The Architecture Behind Our 11× Vector Benchmark

Users can run complex SQL queries that filter by structured metadata while simultaneously performing semantic searches, all without the performance degradation inherent in shared-nothing data shuffling. (Learn how the DASE architecture enables an 11x vector search improvement over Milvus 2.6.)
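As a rough illustration of what “a vector is simply another column type” means for a query, the sketch below stores an embedding next to ordinary metadata fields and answers a filtered similarity search in one pass. It is a brute-force toy in plain Python, not how any production engine indexes vectors; the `Row` fields, the `hybrid_search` helper, and the three-dimensional embeddings are assumptions made for the example, and embeddings are assumed to be non-zero.

```python
import math
from dataclasses import dataclass

@dataclass
class Row:
    doc_id: int
    department: str   # structured metadata column
    year: int         # structured metadata column
    embedding: tuple  # the vector, stored as just another column

def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def hybrid_search(rows, query_vec, department, min_year, k):
    """Filter on metadata first, then rank the surviving rows by similarity."""
    candidates = [r for r in rows if r.department == department and r.year >= min_year]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r.embedding, query_vec),
        reverse=True,
    )
    return ranked[:k]

rows = [
    Row(1, "legal", 2024, (0.9, 0.1, 0.0)),
    Row(2, "legal", 2019, (1.0, 0.0, 0.0)),    # best match, but fails the year filter
    Row(3, "finance", 2025, (1.0, 0.0, 0.0)),  # best match, but wrong department
    Row(4, "legal", 2025, (0.0, 1.0, 0.0)),
]
top = hybrid_search(rows, query_vec=(1.0, 0.0, 0.0), department="legal", min_year=2023, k=2)
print([r.doc_id for r in top])
```

Because the metadata and the embedding sit in the same row, the filter and the ranking never disagree about which documents exist, which is the consistency problem that syncing a relational store with a separate vector store has to solve by hand.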

The Great ETL Repatriation

For continuous AI-scale pipelines, the cloud’s utility model increasingly breaks down under the weight of persistent compute, data movement, and always-on transformation jobs. Teban pointed to cloud bill shock driven by continuous Spark jobs that never shut down because the data never stops flowing.

Here too, VAST has an answer, said Teban. By supporting Apache Spark and Trino natively on compute nodes that sit atop the storage layer in a VAST cluster, organizations can run an entire Bronze-to-Gold (in medallion lakehouse architecture terms) transformation pipeline against shared data in place, without copying datasets between separate analytics systems.
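For readers unfamiliar with medallion terminology, the stages can be sketched in a few lines of plain Python. This is not Spark or Trino code and does not reflect any VAST API; the records, field names, and the `to_silver` and `to_gold` helpers are hypothetical, and the intermediate Silver stage is included because that is how the medallion pattern is usually described.

```python
from collections import defaultdict

# Bronze: raw records as ingested, including malformed ones (hypothetical sample data)
bronze = [
    {"device": "pump-1", "temp_c": "61.5"},
    {"device": "pump-1", "temp_c": "bad-reading"},
    {"device": "pump-2", "temp_c": "55.0"},
    {"device": "pump-1", "temp_c": "63.5"},
    {"device": None, "temp_c": "40.0"},
]

def to_silver(records):
    """Silver: validated, typed rows; records that fail parsing are dropped."""
    clean = []
    for rec in records:
        if not rec.get("device"):
            continue  # no device identifier, cannot be attributed
        try:
            clean.append({"device": rec["device"], "temp_c": float(rec["temp_c"])})
        except (TypeError, ValueError):
            continue  # unparseable reading
    return clean

def to_gold(rows):
    """Gold: business-level aggregate, here the mean temperature per device."""
    readings = defaultdict(list)
    for row in rows:
        readings[row["device"]].append(row["temp_c"])
    return {device: sum(vals) / len(vals) for device, vals in readings.items()}

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each stage reads the output of the previous one. When those stages live in separate systems, every arrow between them is a copy; the argument above is that running them against shared data in place removes those copies.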

VAST’s all-flash AI lakehouse with simplified governance

Consider it a zero-migration strategy. Because the platform supports the tools data scientists already use, such as Jupyter, Spark, and Python, repatriating an expensive cloud-native stack becomes a matter of shifting the gravity, not rewriting the code.

Implications for the AI-Driven Enterprise

The convergence of native streaming, integrated vectors, and repatriated ETL signals the end of “best-of-breed” fragmentation. Teban suggests the AI-driven enterprise cannot afford the latency of a fragmented stack.

The unresolved question for the next year is no longer how to connect these disparate systems, but rather which silos you are willing to let go of first. The most successful architects will be those who stop building bridges between nodes and start collapsing the distance between the data and the insight.
