Apr 24, 2025

Beyond File: Rethinking Data Infrastructure for AI at Scale

Authored by

Nicole Hemsoth Prickett, Head of Industry Relations

There was no parade. No grand blog post. No obituary in a supercomputing magazine.

There was just a quiet architectural shift, and a startling statement at GTC from Microsoft Principal Engineer Glenn Lockwood: one of the world’s largest AI training supercomputers—the Microsoft system code-named Eagle—runs with no parallel file system whatsoever.

And nothing broke.

For decades, the architectural prescription for scalable AI and HPC infrastructure was canonical: you wired GPUs to InfiniBand and pointed your storage stack at a monolithic parallel file system. Lustre, GPFS (now Spectrum Scale)—pick your poison.

It was a default more than a decision. Nobody questioned it. Until, suddenly, it didn’t make sense anymore.

We can see how this could leave the HPC world confused—alarmed, even—by the idea that a massive LLM could be trained on object storage over S3. It violated the sacred formula. How could a workload this latency-sensitive, this bandwidth-thirsty, tolerate the glacial awkwardness of object APIs?

The answer, as it turned out, was not that the storage got faster—but that the workload got smarter.

Checkpointing, that ancient ceremony of synchronous suffering, had gone asynchronous. Frameworks like NVIDIA’s NeMo had evolved past the old blocking write model, allowing intermediate training state to be buffered in CPU memory and flushed opportunistically. No more stalling thousands of GPUs to babysit a write. No more I/O as the bottleneck. No more excuse for needing a file system that can do synchronized dancing at 100 gigabits per second.
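To make the pattern concrete, here is a minimal sketch in PyTorch-flavored Python—an illustration of asynchronous checkpointing in general, not NeMo’s actual implementation. The `async_checkpoint` helper is hypothetical: training state is copied into host memory, then flushed to storage on a background thread while the GPUs go back to computing.

```python
import threading
import torch

def async_checkpoint(model: torch.nn.Module, step: int, path: str) -> threading.Thread:
    """Illustrative async checkpoint: stage state in CPU memory, persist off the critical path."""
    # 1. Briefly pause to copy GPU tensors into host memory (pinned buffers in a real system).
    cpu_state = {
        "step": step,
        "model": {name: tensor.detach().cpu() for name, tensor in model.state_dict().items()},
        # Optimizer and scheduler state would be staged the same way.
    }

    # 2. Flush opportunistically in the background; the training loop resumes immediately.
    def _flush():
        torch.save(cpu_state, path)  # could just as easily be a PUT to an object store

    writer = threading.Thread(target=_flush, daemon=True)
    writer.start()
    return writer  # join() before the next checkpoint to avoid overlapping writes
```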

And here’s the part that really stings if you’ve spent the last decade tuning striping policies: asynchronous checkpointing wasn’t just a clever workaround. It was a liberation. GPU time, once constantly held hostage by defensive I/O, was finally free to do what it was built for—compute.

But infrastructure inertia is real, and many large organizations still walk into the conversation assuming that file systems are the cost of doing business at scale. They are not.

In fact, the more progressive the workload, the less sense file makes at all. In Lockwood’s experience, the absence of a file system didn’t just reduce complexity—it vaporized entire classes of shakeout pain. No stale mounts. No storage hangs when the fabric flaps. No mysterious errors that resolve themselves only after a full reboot and a sacrifice to the sysadmin gods. Without a file system, there was simply less to break.

What we’re witnessing isn’t a debate about performance. It’s a philosophical shift in where storage lives, what it’s responsible for, and who—machine or human—is consuming it.

The file system was built as an abstraction layer for people. But in a world where workloads don’t talk through shells or POSIX calls, that interface has become vestigial.

Training frameworks don’t need files—they need memory. Objects. Tokens. Key-value pairs. Structures that can be moved and retrieved without the overhead of legacy semantics.

The file system didn’t lose a fight. It aged out of relevance.

The Smart Kids Know Inference is the True Architecture Driver

As the GTC panel featuring VAST Data’s Jeff Denworth also spelled out, it’s easy to focus on training. That’s where the money goes. That’s where the clusters are biggest, the model sizes most eye-popping, and the press releases most breathless. But it’s not where the architectural pressure is coming from anymore.

The real challenge is inference.

Training is a controlled environment. You know the data, the schedule, the duration. If something breaks, you restart a job. Inconvenient, yes—but survivable.

Inference lives in a completely different world. It’s reactive, transactional, and revenue-adjacent. Models must respond in real time to human or machine queries. There’s no room for delay, and even less tolerance for failure.

And yet, the demands of inference are more complex than they appear.

It’s not just about getting a fast answer—it’s about preserving context. Language models don’t generate responses in a vacuum. They rely on token histories, attention states, and a growing body of short-term memory that must be retained across sessions.

If a user returns to continue a conversation, that state must be recalled immediately. If it’s lost, the system either reprocesses the entire session—burning GPU cycles—or degrades the user experience with a cold start.

This is where the traditional object storage model starts to break down: Pulling token state over TCP-based S3 is too slow for real-time inference. You can’t pin users to specific GPUs forever, hoping their cached data remains local. And you sure can’t afford to recompute full session histories every time.

The architectural response to this is already forming: key-value caches, tuned specifically for the memory access patterns of modern inference. These caches allow context data—attention maps, token sequences, session metadata—to be stored as discrete, queryable elements. They’re fast, GPU-aware, and structured to serve precisely what the model needs, when it needs it.
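As a toy illustration of the idea—not any particular product’s design—consider a cache keyed by session and layer that hands back exactly the blocks of attention state a model needs, evicting the least-recently-used entries when memory runs short. The `SessionKVCache` name and its interface are assumptions for this sketch.

```python
from collections import OrderedDict
import torch

class SessionKVCache:
    """Toy inference-context cache: attention key/value blocks addressed by (session, layer)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # (session_id, layer) -> torch.Tensor, in LRU order

    def put(self, session_id: str, layer: int, kv_block: torch.Tensor) -> None:
        key = (session_id, layer)
        if key in self.entries:  # replacing an existing block: release its bytes first
            self.used -= self.entries[key].nelement() * self.entries[key].element_size()
        self.entries[key] = kv_block
        self.entries.move_to_end(key)
        self.used += kv_block.nelement() * kv_block.element_size()
        # Evict least-recently-used blocks once over budget; a real system would
        # demote them to a slower tier rather than drop them outright.
        while self.used > self.capacity and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used -= evicted.nelement() * evicted.element_size()

    def get(self, session_id: str, layer: int):
        key = (session_id, layer)
        if key not in self.entries:
            return None  # miss: recompute the prefix or fetch from a slower tier
        self.entries.move_to_end(key)
        return self.entries[key]
```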

At VAST, this thinking has evolved into a design called VUA—short for VAST Undivided Attention.

It delivers key-value data to GPUs over RDMA, bypassing slow stacks and delivering just the relevant fragments of session state. It’s not a storage system in the conventional sense. It’s an intelligence layer for inference, deciding what data matters and serving it with surgical precision.

What’s notable here isn’t just the shift from file to object, or from object to key-value. It’s the increasing role of inference logic in dictating storage behavior.

We’re no longer building infrastructure to store data. We’re building it to serve context. And as that context becomes dynamic, hierarchical, and performance-critical, the architectural response must move up the stack with it.

Inference isn’t a downstream concern. It’s the engine driving upstream infrastructure decisions. And for organizations serious about deploying AI in production, the old rules of storage simply don’t apply.

The End of Unstructured, and the Beginning of…Something Else

For decades, “unstructured data” has been the catch-all term for the messy majority of enterprise information—text documents, PDFs, logs, cat photos, videos of your model train set, scientific output, emails, and so on.

It was called unstructured because it wasn’t SQL. Because it didn’t live in a schema. Because nobody really knew what to do with it except archive it, back it up, and hope someday it might be useful.

That someday is now.

What’s happening, particularly in organizations building AI infrastructure at scale, is the slow erasure of the structured/unstructured divide. The binary classification doesn’t hold when large language models can parse natural language as fluidly as SQL engines parse tables. But, the beautiful thing is that once data can be embedded—translated into vector space—it can be queried semantically. It can be reasoned over. It becomes searchable by meaning, not just location or filename or regex pattern.

This is the real impact of RAG (Retrieval-Augmented Generation), even if no one called it out explicitly on stage.

When a model needs to augment its knowledge in real time, it doesn’t go looking for files. It queries a vector index. It pulls context from a key-value store. It asks for relevance, recency, and resolution—not for bytes on disk.
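A minimal sketch of that retrieval step, with a stand-in `embed()` in place of a real embedding model so it runs on its own (both `embed` and `retrieve_context` are hypothetical names for this illustration): chunks are embedded into vectors, and a query pulls back the most semantically relevant ones to hand to the model as context.

```python
import numpy as np

def embed(texts: list[str], dim: int = 256) -> np.ndarray:
    """Stand-in embedder (hashed character trigrams) so the sketch is runnable;
    a real pipeline would call an embedding model here."""
    vecs = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for j in range(max(len(text) - 2, 0)):
            vecs[i, hash(text[j:j + 3]) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)  # unit-normalize for cosine similarity

def retrieve_context(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Rank chunks by semantic similarity to the query and return the most relevant."""
    chunk_vecs = embed(chunks)        # in practice these live in a vector index, built once
    query_vec = embed([query])[0]
    scores = chunk_vecs @ query_vec   # cosine similarity, since vectors are unit-normalized
    best = np.argsort(-scores)[:top_k]
    return [chunks[i] for i in best]

# The retrieved chunks are prepended to the prompt, so the model answers from
# relevance and recency rather than from bytes on disk.
```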

And so the architecture shifts again. Storage isn’t just storage. It’s context infrastructure. It’s a substrate for intelligent querying. It’s a dynamic, memory-aware, versioned, AI-facing service that just happens to persist data as a side effect.

That’s a long way from a file system.

The ripple effects are enormous and hard to overstate: Organizations sitting on decades of historical data are suddenly seeing value where there was only entropy. Embedding that data, making it available to inference engines, building access paths that speak vectors instead of POSIX—these are not incremental improvements. They are the beginning of a redefinition.

Which leads to a final, uncomfortable question: in a world where AI agents interact directly with data—via APIs, caches, attention maps, semantic graphs—what happens to the human-facing abstractions? Do we still need files?

Do we still need paths and mount points and hierarchies built for terminal sessions?

The honest answer is: yeah, sometimes. But less and less.

Legacy apps, compatibility, and user access will keep the file interface around for a while—just like COBOL and FORTRAN still echo through parts of HPC. But in modern, AI-native infrastructure, the file system is becoming a translation layer. A lowest-common-denominator interface that wraps something fundamentally different underneath.

Where we’re headed, files aren’t the unit of work. Context is.

Infrastructure for Intelligence, Not Storage

This is not just a transition from file to object. It’s not even just a matter of adding key-value stores or accelerating inference paths. What’s happening is deeper: the traditional concept of storage is being displaced by something more active, more situational, and more aware of the workloads it serves.

In an AI-native architecture, data is no longer a static resource to be stored and fetched. It is a live substrate—queried, ranked, streamed, contextualized. Its value is no longer measured in capacity or IOPS but in how quickly and precisely it can contribute to inference, training, or reasoning.

To meet that bar, infrastructure must adapt. That means rethinking the role of the file system, the shape of memory hierarchies, and the APIs that connect compute to data. It means building for asynchronous workflows, real-time inference, dynamic caching, and eventually, fully agentic data interaction.

For organizations already operating at this scale—Microsoft, Meta, OpenAI—these shifts are not theoretical. They are already reflected in deployed systems. For everyone else, the opportunity is clear: don’t just modernize storage. Leap over it.

The platforms you build now will define your AI capabilities in two years. Don’t build them for yesterday’s assumptions.

Build for where the data—and the intelligence—are actually going.

Have we reached the end of the unstructured data era? How is storage infrastructure changing in your experience? Join the conversation on Cosmos.
