Thought Leadership
Apr 21, 2026

AI Factories Only Work If Data Is Always Ready

Authored by

Nicole Hemsoth Prickett, Head of Industry Relations

At the jam-packed VAST FWD user event in Salt Lake City, the collaboration between VAST Data and NVIDIA was explored in the context of enterprise agentic AI.

Sagi Grimberg, VP of Architecture at VAST, and Adel El Hallak, Vice President of Product Management, Agentic AI for the Enterprise at NVIDIA, walked through what it takes to build systems where agents operate continuously across enterprise data rather than issuing isolated queries against static snapshots.

AI factories are quickly becoming the dominant model for this shift. Behind that trend is a known cadence: assemble the stack, connect models to infrastructure, and intelligence becomes something the system produces continuously across GPUs, networking, orchestration, and software.

The thing is, in practice, enterprise data is messy and constantly changing. It’s distributed across systems, constrained by identity and access controls, and all the while the pipelines that represent it lag behind.

As that gap widens, agents are left to operate on stale or missing context, which means accuracy degrades over time even as outputs continue to appear correct.

The Constraint Moves From Models to Data

For a long time, the model could take the blame as the limiting factor. If outputs were weak, you trained a better one. If reasoning failed, you scaled compute. Those approaches worked when systems were smaller, interactions were one-off, and being a little wrong didn’t really matter, but these are different times.

Given the right inputs and tools, models can handle complex, multi-step tasks with a level of abstraction that would have seemed unrealistic even recently. What they can’t do (and what no amount of model improvement compensates for) is operate on data that is incomplete, stale, inaccessible, or just plain poorly governed.

The real constraint now is that enterprise data is fragmented.

It’s spread across systems and formats, it’s constantly changing, and it’s bound to identity and access policies. The model isn’t the problem; the problem is that agents need to reliably reach the data they need, in the form they need it, at the moment they need it. When they can’t, every downstream step inherits the error, and in systems that chain reasoning across multiple stages, those errors compound quickly.

This is a compounding problem. As enterprise data changes, the system’s representation of that data falls out of sync, and once that happens, agent accuracy begins to decay because every step in the process is now operating on a version of the enterprise that no longer exists.

Most Enterprise Data Isn’t Ready

Inside most enterprises, data exists everywhere and nowhere at once. It lives in documents, logs, tickets, code repositories, shared drives, collaboration tools, and operational systems that were never designed to be queried in real time by anything other than a human who already knows where to look.

The pipelines built around that data reflect assumptions that are now outdated: that ingestion is periodic, that embeddings are generated once and revisited later, and that indexes are updated in batches. They’re built around the idea that permissions get layered on after the fact. All of this works when queries are occasional and the cost of being out of date is tolerable, but it definitely doesn’t work in a world of continuous use.

Because data changes constantly. Documents are updated, logs stream in, code evolves, and signals arrive from systems that never stop running. But the representation of that data inside AI systems lags behind. Embeddings drift out of sync with source material, indexes reflect a past state, and access control becomes inconsistent across layers. You can see where El Hallak and Grimberg are going with the idea of decay.

What makes this tricky, too, is that from the outside the system is still responding, still returning answers and generating output, but the gap between what the data is and what the system believes it to be widens over time, and the error keeps accumulating.

RAG Was Built for Snapshots

Retrieval augmented generation was the first serious attempt to bridge models and enterprise data, and while it’s still important, it was built on assumptions that aren’t holding up at scale.

RAG works great in controlled environments where the dataset is known, the scope is limited, and the system can be prepped ahead of time. You ingest, get your embeddings, build an index, then expose it to the model at query time.
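As a toy sketch (not any vendor’s implementation), that snapshot-style pipeline boils down to embedding a fixed corpus once and ranking against the frozen index at query time. The bag-of-words “embedding” here is a stand-in for a real embedding model:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector. A real pipeline would
    # call an embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: ingest a fixed snapshot of documents.
corpus = {
    "doc1": "quarterly revenue grew in the cloud segment",
    "doc2": "incident report for the storage outage",
}

# Step 2: generate embeddings once, up front.
index = {doc_id: embed(text) for doc_id, text in corpus.items()}

# Step 3: at query time, retrieve against the frozen index.
def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return ranked[:k]

print(retrieve("what caused the storage outage"))  # ['doc2']
```

The key property, and the key weakness, is that step 2 runs once: anything that changes after the batch is invisible to step 3.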

That works well enough, but the problem is that RAG assumes the data can sit still long enough to be prepared.

The thing is, data is continuously created, updated, and deleted. New sources appear, new formats go beyond text (images, logs, video, code, and so on), and access policies shift as users and roles change. The volume grows to the point where selecting a subset of “important” data becomes its own problem.

In that environment, a pipeline that runs only periodically is already behind. By the time embeddings are generated and indexed, the underlying data might have changed. And by the time a query is off to the races, the system is pulling from a version of reality that no longer exists.
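That staleness gap is easy to make concrete. In this hypothetical sketch, the index records when each document was embedded and the source records when it last changed; anything modified after its embedding is what queries will silently misrepresent:

```python
# Toy illustration of the staleness gap in a batch pipeline.
# Timestamps are arbitrary units; names are illustrative.

source_mtime = {"handbook.md": 100, "runbook.md": 250}  # last-modified times
embedded_at  = {"handbook.md": 200, "runbook.md": 200}  # batch ran at t=200

def stale_docs() -> list[str]:
    """Documents whose source changed after they were last embedded."""
    return [d for d, t in source_mtime.items() if t > embedded_at.get(d, -1)]

print(stale_docs())  # ['runbook.md'] — queries retrieve its old content
```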

Data Has to Be Kept Ready, Not Made Ready

What replaces that approach isn’t a better retrieval trick, it’s a different way of building the system. If data is always changing, you can’t prepare it once and be done. It has to be kept ready all the time.

This is the problem the VAST InsightEngine is designed to solve.

A key capability of the VAST AI OS, InsightEngine is a fully assembled system designed to keep enterprise data continuously usable.

The SyncEngine ingests data from enterprise systems such as Google Drive, Jira, and Confluence while preserving identity and access semantics.

The DataEngine processes that data in real time, generating embeddings, enriching metadata, and maintaining pipelines that do not pause.

The DataStore provides a unified file and object substrate, and on top of that sits a vector database designed to operate at trillion-scale, where all enterprise data can exist within a single logical space.
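One way to picture what “preserving identity and access semantics” through ingestion means: the access metadata travels with the content from the source system onward, rather than being bolted on later. This is a hypothetical record shape, not VAST’s actual API:

```python
from dataclasses import dataclass, field

@dataclass
class IngestRecord:
    """Illustrative ingest record: ACLs ride along with the content."""
    doc_id: str
    source: str                 # e.g. "confluence", "jira", "gdrive"
    text: str
    allowed_groups: set[str] = field(default_factory=set)

def ingest(doc_id: str, source: str, text: str, acl) -> IngestRecord:
    # Capture the source system's ACL at ingest time so every later
    # stage (embedding, indexing, retrieval) can enforce it.
    return IngestRecord(doc_id, source, text, set(acl))

rec = ingest("HR-42", "confluence", "salary bands for 2026", {"hr", "execs"})
print(rec.allowed_groups)  # {'hr', 'execs'}
```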

Instead of splitting data across multiple indexes, everything lives in one place, so the system doesn’t have to guess where to look. It can pull the right context together on the fly from all the data.

Instead of keeping indexes in memory, the system spreads them across fast flash, so performance doesn’t fall apart at scale. When you run a query, it finds similar results, re-ranks them, and checks permissions at the same time, so everything returned is both relevant and allowed.
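Doing the permission check inside the same scan that does the ranking, rather than as a separate post-filter, can be sketched like this. Everything here is a toy stand-in (two-dimensional vectors, a dict for the index), not a real vector database:

```python
import math

# doc_id: (embedding, groups allowed to read it)
index = {
    "wiki-1": ([1.0, 0.0], {"everyone"}),
    "hr-42":  ([0.9, 0.1], {"hr"}),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def query(vec, caller_groups, k=5):
    # Similarity ranking and ACL filtering happen in one pass, so the
    # result set is both relevant and allowed for this caller.
    hits = [
        (cosine(vec, emb), doc_id)
        for doc_id, (emb, acl) in index.items()
        if acl & caller_groups      # permission check inside the scan
    ]
    return [d for _, d in sorted(hits, reverse=True)[:k]]

print(query([1.0, 0.0], {"everyone"}))  # ['wiki-1'] — hr-42 filtered out
```

The design point is that a result the caller isn’t allowed to see never enters the candidate set, so it can’t leak through re-ranking.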

The result is data that is always kept AI-ready.

The Pipeline Becomes a Loop

Once data has to stay ready, the idea of a pipeline as a set of steps stops making sense, as El Hallak showed the crowd.

The usual flow of ingest, embed, index, and query assumes the system can pause between stages and catch up, but in a continuous system there is no pause.

What replaces it is a loop that never stops. Data is ingested as it changes, embeddings are updated as models and content evolve, and queries run against whatever the current state is at that moment. The results don’t just end there either, they generate new data that feeds back into the system, so the whole thing keeps moving.
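The loop described above can be sketched as a single event queue: change events update the index immediately, and query results themselves become new events. This is a toy model of the idea, not any vendor’s implementation:

```python
from collections import deque

index = {}   # doc_id -> current content (stands in for embeddings)
events = deque([("upsert", "doc1", "v1 of the runbook")])

def step():
    kind, doc_id, payload = events.popleft()
    if kind == "upsert":
        # Embed/index the change as it arrives — no batch window.
        index[doc_id] = payload
    elif kind == "query":
        answer = index.get(doc_id, "")
        # Agent output feeds back into the system as new data.
        events.append(("upsert", f"answer:{doc_id}", answer))

events.append(("query", "doc1", None))
while events:
    step()
print(sorted(index))  # ['answer:doc1', 'doc1']
```

There is no “done” state: the queue drains only until the next change or query arrives, which is the sense in which the pipeline becomes a loop.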

This is how NVIDIA’s stack is designed to operate, with components like NVIDIA NeMo Retriever running continuously instead of in batches. In that setup, batch ingestion isn’t just inefficient, it creates gaps where the system falls out of sync with the data. Agents make that worse because they aren’t just reading data, they’re producing it, adding more load and more pressure to keep everything current.

Keeping that loop running is what matters, and NVIDIA and VAST have made that far simpler.

You Can’t Pre-Select What Matters

Once the system runs continuously, scale stops being something you plan for later. You can’t rely on narrowing the dataset or deciding ahead of time what’s relevant, because the context for any given query can pull from documents, logs, code, and prior interactions that weren’t obviously connected in advance.

If the system can’t search across all of it, it can’t guarantee correctness.

That’s why VAST is built to handle trillions of vectors in a single space. The point isn’t raw size, it’s avoiding fragmentation. Traditional vector databases rely on memory for indexing, which forces you to split data into smaller chunks and introduces guesswork about where relevant information lives.

By spreading indexes across fast flash instead of keeping them in memory, the system can scale without losing performance or increasing latency. It keeps everything in one place and makes it all searchable in real time.
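A rough stdlib-only analogy for storage-resident indexing, as opposed to RAM-resident, is scanning vectors through a memory-mapped file so the working set is paged in on demand. This is a brute-force stand-in for the idea, nothing like a real ANN index on flash:

```python
import mmap, os, struct, tempfile

DIM = 2
vecs = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]

# Write the "index" to storage instead of holding it in a Python list.
path = os.path.join(tempfile.mkdtemp(), "index.bin")
with open(path, "wb") as f:
    for v in vecs:
        f.write(struct.pack(f"{DIM}f", *v))

def nearest(q):
    """Scan the on-storage index via mmap; return the best row's offset."""
    best, best_dot = -1, float("-inf")
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for i in range(len(m) // (4 * DIM)):
            v = struct.unpack_from(f"{DIM}f", m, i * 4 * DIM)
            dot = sum(a * b for a, b in zip(q, v))
            if dot > best_dot:
                best, best_dot = i, dot
    return best

print(nearest((1.0, 0.0)))  # 0
```

The trade-off it gestures at: capacity scales with storage rather than RAM, at the cost of relying on fast media (like flash) to keep lookups quick.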

AI Factories Depend on This Layer

The AI factory model depends on continuous ops across a full stack of hardware, orchestration, models, and applications, but it assumes the data underneath is already usable, and in most cases it isn’t.

This stack is already running inside NVIDIA’s own AI factory, where the NVIDIA AI-Q research assistant operates on continuously ingested enterprise data, with VAST handling ingestion, unified vector storage, and retrieval, and NVIDIA’s software handling embedding, reranking, orchestration, and model execution. This is a working system, not a reference design.

Without a data layer that stays current, the factory still produces outputs, but it can’t guarantee they’re right. As systems move from single queries to agents and then to multi-agent workflows, the weaknesses in the data layer become obvious.
