Jun 2, 2026

AI Inference Is Changing the Shape of Data Architecture and the Cloud Itself

Authored by

Derrick Harris, Technology Storyteller

As AI inference workloads ramp up and organizations need to deliver real-time intelligence to a broad swath of users, data architecture is taking on a position of unprecedented importance. And it’s shifting the dynamics of the technology industry from storage hardware up to hyperscale cloud computing platforms.

We recently sat down with Microsoft’s Kanchan Mehrotra to discuss what she’s seeing within her world of managing AI and HPC infrastructure products for Microsoft Azure. The short version is that although data architecture always matters when you’re moving applications into production, AI inference introduces new wrinkles — and whole new use cases — that are changing what customers expect from their cloud providers.

Keep reading for some highlights from that discussion, and watch the video below to hear everything that Kanchan had to say about how AI is reshaping data infrastructure in the cloud.

Scaling production-ready systems

The deep learning era was about scaling jobs. The only question was, “Can we train a bigger model faster?”

The Gen AI era is about scaling systems. It is not just about training. It's the full life cycle we are talking about: fine-tuning, RAG, and large-scale inference. Customers are starting to care about things like latency, SLOs, multi-tenancy, and safety boundaries. As a result, the product focus has moved from peak performance in the lab to sustained reliability in production.

Technically, that translates into a few shifts. One is that data architecture matters more. Deep learning taught us how to handle large volumes of data, with petabytes being ingested into the system. But with Gen AI, it's about velocity. We're looking at a continuous loop, so the data architecture and governance really matter.

Savvy customers demand adaptability in AI operations

AI has moved from “build a model” to “run a business.” That means we start caring a lot more about latency, cost per token, and power efficiency. Customers are constantly iterating: they're plugging in their own data and refining policies. Access control, data quality, and security are no longer optional features that may or may not be needed. They're fully integrated parts of the infrastructure requirements.
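To make "cost per token" and "power efficiency" concrete, here is a minimal sketch of how the two metrics might be computed for a single inference deployment. The function names and every number below are hypothetical, chosen only for illustration; they are not Azure or VAST figures, and real accounting would also cover networking, storage, and idle capacity.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Serving cost in USD per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Energy efficiency: one watt is one joule per second."""
    return tokens_per_second / power_watts


# Hypothetical deployment: $4.00 per hour, 2,000 tokens/s, 500 W average draw.
cost = cost_per_million_tokens(hourly_cost_usd=4.0, tokens_per_second=2000.0)
efficiency = tokens_per_joule(tokens_per_second=2000.0, power_watts=500.0)
print(f"${cost:.3f} per million tokens, {efficiency:.1f} tokens per joule")
```

Tracking both numbers together shows the trade-off: a change that raises throughput lowers cost per token, but it only improves efficiency if power draw rises more slowly than throughput.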

We are also seeing huge model diversity. It's not just LLMs, but also multimodal models, reasoning models that spend more time thinking, and small models that are optimized for special use cases. Our job at Azure is to build the infrastructure in the platform in an adaptive way, so that whatever model you're running just works.

The cloud as an intelligent operating system

AI technology is moving faster than our vocabulary. But if we look at the current trajectory, the biggest change is a fundamental shift in mindset. We are moving from intelligence at any cost to intelligence per watt.

In five to 10 years, the cloud won't be a place to rent VMs. It will be an intelligent operating system. We are heading toward intelligence as a utility, where customers don't buy compute cycles. They buy verified outcomes and agentic labor. The cloud will become the strategic orchestrator of planetary intelligence, and its shape will evolve toward hybrid by default. While the edge handles immediate physical actions, the cloud remains the indispensable control plane and the source of truth for training, governance, and global security. It is transitioning from a hardware host into a vital fabric for autonomous AI.

We aren't just building infrastructure. We're building the engine for the next era of human productivity.

Why Microsoft partners with VAST on cloud infrastructure

You see more partnerships because AI infrastructure has really become a full-stack systems problem. In the old world, you could optimize individual components in silos, and they would just work. But no one really wins that way today. Storage, specifically, is no longer a passive backend. It needs to keep up with your expensive GPUs, or you're going to waste a lot of GPU power and time.
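As a rough illustration of that waste, the sketch below estimates how many GPU-hours sit idle when storage delivers less throughput than a job needs. It assumes a fully data-bound job with no caching or prefetch overlap, which is a deliberate simplification; the function names and numbers are hypothetical, not measurements.

```python
def gpu_busy_fraction(required_gbps: float, delivered_gbps: float) -> float:
    """Fraction of time GPUs can stay busy, capped at 1.0 (simplified model)."""
    if required_gbps <= 0:
        return 1.0  # the job needs no data, so storage cannot stall it
    return min(1.0, delivered_gbps / required_gbps)


def wasted_gpu_hours(num_gpus: int, hours: float,
                     required_gbps: float, delivered_gbps: float) -> float:
    """GPU-hours spent waiting on data under the model above."""
    idle_fraction = 1.0 - gpu_busy_fraction(required_gbps, delivered_gbps)
    return num_gpus * hours * idle_fraction


# Hypothetical cluster: 64 GPUs for 24 hours, needing 100 GB/s but getting 75 GB/s.
wasted = wasted_gpu_hours(num_gpus=64, hours=24.0,
                          required_gbps=100.0, delivered_gbps=75.0)
print(f"{wasted:.0f} GPU-hours idle out of {64 * 24}")
```

Under these assumed numbers, a quarter of the paid-for GPU time is lost to waiting on data, which is why storage throughput is treated as part of the system design and not as an afterthought.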

At a system level, VAST is built to handle the data bottleneck. It delivers the high throughput and predictable latency you need for running thousands of clients at the same time. And that's a very common pattern that we have in the AI world.

But the real game changer for us and for our customers is VAST's global namespace. AI workflows are distributed by nature, so you may be running training in one place and inference in another. The global namespace gives you a consistent view of your data across the entire footprint. It removes the friction of data movement and data silos. Ultimately, it's about speed to value.
