You know a technology set is reaching wider adoption when thought leaders move beyond spending entire panels unpacking definitions and start talking nuts and bolts.
AI factories are making the transition from definition to deployment, as was evident recently at the VAST FWD event, where leaders from VAST, NVIDIA and Cisco worked through how the architecture for AI factories will be shaped and refined.
All panelists agreed that what matters most now for AI factories is less about how fast you can train a model, and more about how reliably you can run it, how often it’s used, and how many requests it can handle at once. They also converged on the idea that AI factories are defined by the need for continuous inference, something all three companies support at their core.
Whereas most of the model training workloads, from a data and storage perspective, look more batch in nature, in inferencing we're moving to more and more real-time pipelines. Real-time pipelines that have a wider variety of data access at their disposal. There's an inherent need, because it now becomes mission critical, to be always on,
said John Mao, VP of Technology Alliances at VAST.
What he’s saying here changes the operating model completely from the “old” world of separate training and inference (and infrastructure). At its core, an AI factory can’t (and shouldn’t) pause between jobs. It has to respond continuously, across many inputs, more like a live service than a scheduled workload.
Of course, that also means the environment gets more complex. “In the world of inferencing, it's going to look very different. You're going to have lots of different applications, lots of different agents. That is going to be a whole new set of security requirements, access controls. Multi-tenancy is going to become a critical component of that as well,” Mao added.
An AI factory now has to handle many workloads at once, keep them separated, and still deliver consistent performance. At that point, we have to stop thinking of AI factories as just a cluster and instead consider them a system that has to run continuously under real production conditions.
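The separation and consistency requirements Mao describes can be illustrated with a toy scheduler. This is a hedged sketch, not any vendor's implementation; the class and its parameters are invented for illustration. Each tenant gets its own queue and a cap on in-flight work, so one busy tenant can't starve the others:

```python
from collections import defaultdict, deque

class TenantDispatcher:
    """Illustrative only: keep tenants' requests separated and cap each
    tenant's in-flight work so one noisy tenant can't crowd out the rest."""

    def __init__(self, max_in_flight_per_tenant=4):
        self.max_in_flight = max_in_flight_per_tenant
        self.queues = defaultdict(deque)   # tenant_id -> pending requests
        self.in_flight = defaultdict(int)  # tenant_id -> running requests

    def submit(self, tenant_id, request):
        self.queues[tenant_id].append(request)

    def next_batch(self):
        """Walk across tenants, respecting each tenant's in-flight cap."""
        batch = []
        for tenant_id, pending in self.queues.items():
            while pending and self.in_flight[tenant_id] < self.max_in_flight:
                batch.append((tenant_id, pending.popleft()))
                self.in_flight[tenant_id] += 1
        return batch

    def complete(self, tenant_id):
        self.in_flight[tenant_id] -= 1
```

A real AI factory enforces this with hardware partitioning, network QoS, and access controls rather than a Python loop, but the invariant is the same: tenant isolation is scheduled, not incidental.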
Enterprises need deterministic outcomes, so once you start putting something into production, there's a whole lot of additional concerns that you didn't have to think about before, around governance and security. And from our own experience deploying our own AI factory within NVIDIA, it was really interesting. At first, there was so much enthusiasm, and then we started tripping over the data,
Jacob Liberman, Director of Product Management at NVIDIA told the crowd.
What he’s getting at here is that the challenge isn’t about getting models to run (with compute as a main limitation) as in the prior AI era. It is getting them to run against the right data, at the right time, with the right controls.
In a continuous system, every request depends on data access but if that access is inconsistent, delayed, or poorly governed, the entire system becomes unreliable.
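To see why inconsistent data access makes the whole system feel unreliable, consider a small illustrative calculation (all numbers here are made up): even with fixed model compute, a handful of slow data fetches set the tail latency that users actually experience.

```python
def percentile(sorted_vals, p):
    """Nearest-rank percentile on an already-sorted list."""
    idx = max(0, round(p / 100 * len(sorted_vals)) - 1)
    return sorted_vals[idx]

INFER_MS = 20  # assume a fixed 20 ms of model compute per request

# 100 requests: 95 hit a fast, well-placed store (5 ms fetch),
# 5 hit a slow, differently governed remote source (400 ms fetch).
fetch_ms = [5] * 95 + [400] * 5
latencies = sorted(fetch + INFER_MS for fetch in fetch_ms)

p50 = percentile(latencies, 50)  # 25 ms: compute looks perfectly healthy
p99 = percentile(latencies, 99)  # 420 ms: the data layer sets the tail
```

The median says the system is fine; the tail says it isn't, and in a continuous service the tail is what callers depend on.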
The panelists agree this is why AI factories tend to stall after initial success. The compute layer scales cleanly, but the data layer doesn't when data is fragmented, stored in different formats, governed by different policies, or accessed at different latencies. And what's worse, the more frequently the system is used, the more those inconsistencies show up.
In other words, what looks like a performance issue is almost always a data problem, which leads us to the most important takeaway from the VAST FWD panel:
Factories Need a Data Supply Chain
Once the data layer becomes the limiting factor, it becomes clear that the AI factory is only half the system. The compute side executes inference and training, but it depends entirely on how well data is prepared, organized, and delivered. Without that, the factory has nothing consistent to operate on.
As NVIDIA’s Jacob Liberman explained:
The AI data platform idea was initially conceived as this appliance, and then over time, as it started getting adopted and as customers started to mature, it also contributed to the optimization aspect. So yeah, I think enterprises want to start by getting access and insight from their unstructured data, and if we can do that by delivering them a secure AI factory and AI data platform as an appliance that just jump-starts them, anyone who's gone through the pain of trying to build a do-it-yourself pipeline will immediately see the value in that appliance delivery.
Of course, an AI factory doesn’t run in clean steps, it runs all the time. Each request pulls data, runs inference, and produces a result, and that cycle repeats continuously with no reset between runs.
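That always-on cycle can be sketched in a few lines. This is an illustrative loop, not a real serving framework; `fetch_data`, `run_inference`, and `emit` are hypothetical placeholders for whatever a given deployment actually uses:

```python
import queue

def serve_forever(requests: "queue.Queue", fetch_data, run_inference, emit):
    """Minimal sketch of the always-on cycle: each request pulls data,
    runs inference, and emits a result, with no reset between runs."""
    while True:
        req = requests.get()   # block until the next request arrives
        if req is None:        # sentinel, used here only to stop the sketch
            break
        ctx = fetch_data(req)             # data access on every cycle
        result = run_inference(req, ctx)  # compute is only one step of three
        emit(req, result)
```

Note that data access sits inside the hot loop: every single request pays the data layer's cost, which is why the panel keeps returning to it.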
That’s what makes AI factories different. They handle many requests at once, from many applications, with uneven demand and changing inputs. Keeping that running smoothly and consistently is what actually defines whether the factory works.
Making the Factory Work in Production
Getting a model to run is one thing. Running it reliably, at scale, and under real business conditions is much harder.
Danny McGinnis, VP of Product Management at Cisco, described how quickly the conversation has shifted.
I think the piece about what I've seen with AI is that a year ago, most customers were thinking about either proofs of concept, or should I even look at AI? Or if I am, it was very much reserved for different lines of business. You had a lot of shadow AI that was happening inside of a lot of organizations. I'd say the biggest shift that I've seen in the last 12 months, maybe even six months, with customers, is now they're really thinking about AI as not just a competitive advantage, but how do I maximize the spend?
That shift brings with it a fair amount of pressure because systems have to deliver results, use resources efficiently, and hold up under constant use.
This is the first time where the business, I think, is driving the big change. You're actually chasing the technology, as opposed to using technology to drive some more efficiency. This time, you're actually saying, in order to have a competitive advantage, or not have a competitive disadvantage, I have to have this,
McGinnis added.
So at that point, the AI factory has to meet real expectations for performance, cost, and reliability, or it doesn’t get used.
Which Leads Us to Why AI Factories Have to Be Pre-Built Systems
At this point, it becomes clear why AI factories are not something you assemble piece by piece. Too many parts have to work together at the same time: compute, data, networking, security, and orchestration, all under constant load.
What the panel kept coming back to is that this only works when the system is already integrated and tested as a whole. The more you try to stitch components together yourself, the more likely it is something breaks under real usage. That is why the focus is shifting toward pre-built, validated systems that are designed to run these workloads from the start.
If the stack is not aligned, the system does not hold up. If it is, you can start using it immediately and scale from there. It is here where the partnership between VAST, NVIDIA, and Cisco is most clearly valuable.
NVIDIA brings the models, frameworks, and reference architectures, Cisco delivers the infrastructure and operational backbone to run it reliably, and VAST provides the data layer that keeps everything fed and consistent.
Together, it turns what would otherwise be a complex, fragile build into something that can actually be deployed, used, and scaled in a real environment.
Instead of spending months wiring systems together and debugging edge cases, teams can focus on using the system to solve real problems. The integration work is already done, the performance characteristics are known, and the operational model is clear.
So with AI factory mandates now front-of-mind priorities, what would normally be a long, uncertain build phase becomes a much shorter path to production, which is ultimately what determines whether an AI factory delivers value or stalls before it ever gets fully used.
The challenge in creating production-ready AI factories is moving from simply adding more hardware to making data, infrastructure, and operations work together without parts falling over. Many companies aren't likely to build this successfully on their own, because the complexity is high and mistakes show up quickly in production.
Pre-built, tested systems are becoming the starting point, and what matters most is how effectively they are used. The leaders from VAST, Cisco, and NVIDIA agree that it’s here where real differentiation is starting to show up.



