May 7, 2025

We’ve Arrived at the 122TB Drive Inflection Point

Nicole Hemsoth Prickett

The great irony of modern AI infrastructure is that its limiting factor is often architectural in the most old-school sense: power, square footage, and the heat budget of a first-floor utility closet.

Let’s say you’ve just signed the check for a few thousand GPUs. You’ve got models to train, inference to serve, and a five-year runway filled with unknowns.

The problem isn’t compute. It’s everything around compute. And increasingly, the real showstopper isn’t network bottlenecks or software orchestration—it’s something embarrassingly physical.

You’ve run out of space.

“We’re seeing this across the board,” says Alon Horev, cofounder and VP of Technology at VAST. “There are companies who know they’re going to scale—hyperscalers, cloud-native AI platforms—and they design for density from day one. But then there’s a whole category of datacenters that simply can’t expand. They’re out of racks, out of power, out of runway.”

This isn’t hypothetical. This is happening right now in high-frequency trading hubs, in AI-first startups, and even in surprisingly constrained data halls in energy-rich locations.

Sprawl Is a Lie You Tell Yourself the First Year

The early logic feels sound. You start with what’s prescribed: performance. Throughput. Low-latency metadata operations. A file system that can feed your GPU pods like a deranged buffet line. To get that, people go wide: lots of SSDs, small drives, high performance. In other words, they build out.

“This works until it doesn’t,” Horev says flatly. “You chase performance and burn through all your rackspace before you even realize it. And now you’ve got ten petabytes sitting across eighty servers, and nowhere to put the next eighty.”

It’s the same pattern across customers—webscale, enterprise, doesn’t matter. The cluster is performant but inflexible. And adding capacity becomes not just a logistical burden, but a performance tax. Try mixing small and large drives in one system and see how long it takes before you're debugging imbalance and throughput skew that has nothing to do with software.

“We’ve seen clusters that are a graveyard of good intentions,” he adds. “They were optimized for benchmarks, not for time.”

Why 122TB Is the New Baseline

The move to ultra-high-capacity SSDs isn’t about someone’s love of big numbers. It’s about physics.

At 122TB per drive, you get ~2PB per server. That means half the rackspace, half the power, fewer cables, fewer NICs, fewer things to go wrong. It’s not just density. It’s gravitational collapse—in a good way.
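
The rack math is easy to sanity-check. Here is a minimal back-of-the-envelope sketch in Python, assuming a 16-bay NVMe server and 8 storage servers per rack; those are illustrative figures, not a specific VAST configuration:

```python
import math

# Back-of-the-envelope density math for a raw-capacity target.
# Illustrative assumptions, not a specific VAST configuration:
# 16 NVMe bays per storage server, 8 storage servers per rack.
DRIVE_BAYS_PER_SERVER = 16
SERVERS_PER_RACK = 8

def footprint(target_tb: float, drive_tb: float) -> tuple[int, float]:
    """Servers and racks needed to hold target_tb of raw flash."""
    servers = math.ceil(target_tb / (drive_tb * DRIVE_BAYS_PER_SERVER))
    return servers, servers / SERVERS_PER_RACK

print(footprint(10_000, 10))   # 10 PB on 10 TB drives:  (63, 7.875)
print(footprint(10_000, 122))  # 10 PB on 122 TB drives: (6, 0.75)
```

Same petabytes, an order of magnitude fewer boxes to power, cable, and cool.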

“The hyperscalers? They get it. They assume they’ll need 10x what they have today. So they plan for scale. But enterprise and mid-size orgs—they’re the ones that suffer later,” Horev explains. “They go with something that looks cheaper up front, but it breaks their scale-out future.”

And it’s not just about the capacity per se. The drives themselves perform. These aren’t slow, archival-class bricks—they’re part of an all-flash system with full NVMe characteristics. At the end of the day that means you’re not compromising speed for space, you’re collapsing both into a denser, more efficient system.

The 10PB Inflection Point: When Big Drives Start Making Big Sense

There’s a quiet calculus that happens when your infrastructure crosses a certain threshold—not just in petabytes, but in psychology.

Under that threshold, flexibility wins. Over it, density rules.

“Let’s say I have a performance target of 600 gigabytes per second, and I think 2PB is a good starting point,” Horev explains. “That means I might use roughly 200 SSDs of 10TB each to hit the performance and capacity goal. But now I’ve locked in a footprint that takes 8x the rackspace I could’ve used with denser SSDs.” Cross the 10PB mark, though, and the equation shifts.

“Once you’re deploying hundreds of drives, the game changes,” he adds. “At that scale, the biggest cost isn’t the drive—it’s the rack, the power, the airflow, the people touching it. That’s when density starts to pay for itself in ways that are hard to model in Excel.”
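
One way to see why this is hard to model in Excel is to make the overhead explicit: most of the cost scales with servers and racks, not terabytes. A rough sketch of that framing follows, where every dollar figure and chassis size is a placeholder to swap for your own numbers, not vendor pricing:

```python
import math

# At scale, the dominant costs track boxes and racks, not terabytes.
# All dollar figures below are placeholders, not vendor pricing.
def storage_cost(total_tb, drive_tb, drives_per_server,
                 cost_per_tb, cost_per_server, cost_per_rack,
                 servers_per_rack=8):
    servers = math.ceil(total_tb / (drive_tb * drives_per_server))
    racks = math.ceil(servers / servers_per_rack)
    return (total_tb * cost_per_tb        # the flash itself
            + servers * cost_per_server   # chassis, NICs, cabling, support
            + racks * cost_per_rack)      # space, power, cooling, hands-on time

# 10 PB with the same per-TB flash price; only the drive size changes.
kwargs = dict(cost_per_tb=50, cost_per_server=30_000, cost_per_rack=100_000)
print(storage_cost(10_000, drive_tb=10,  drives_per_server=16, **kwargs))
print(storage_cost(10_000, drive_tb=122, drives_per_server=16, **kwargs))
```

The flash line item is identical in both calls; what collapses with dense drives is everything attached to the server and rack count.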

It’s not that smaller drives don’t have their place. In fact, they remain ideal for edge sites, performance-first clusters, or environments where rebuild risk is the highest concern. But above a certain size, the concerns around drive failure impact or bandwidth density begin to fade, overtaken by the very real constraint of running out of room.

“You’re no longer optimizing for drive-level resilience, you’re optimizing for datacenter-level survivability,” Horev says.

Every server you don’t deploy is power you don’t cool, switches you don’t need, people you don’t hire to cable and rack and monitor and maintain. At a certain scale, the cost of not going dense is the real premium.

But What Happens When You Hit the Wall?

There’s a moment in nearly every deployment when the spreadsheet stops matching the reality.

“We’ve seen teams who spec’d for 5PB and were out of space in three weeks,” Horev says. “They thought they could trickle in checkpoints or inference logs or versioned datasets. But they didn’t model the reality of iteration.”

The reality is: AI teams don’t just train once. They checkpoint, tune, fork, retrain. And the storage footprint compounds—not gradually, but exponentially. By the time leadership realizes it, the data has become unmovable, and the system too fragile to re-architect mid-flight.

The Tiering Illusion: Two Systems That Should Be One

Here’s the conventional wisdom: fast file system for active data, object store for everything else. Two tiers, two interfaces, and an endless dance of copying, syncing, and hot-cold data management.

But that separation was never about logic. It was about limitations—hardware, cost, architecture.

For years, the default architecture separated performance from capacity. Fast file systems, needed for high-throughput jobs, were built on many small, fast drives. Object stores, designed for cost-efficient scale, relied on large, dense drives. But that logic breaks down when datasets grow massive—especially with multi-modal AI workloads like video, which require both speed and size in the same pipeline.

“Users suffer when they have to move data between fast file and cheap object,” Horev says. “The industry assumes two separate pools. We’re saying: you don’t need two.”

“With dense drives in a system like VAST’s, there’s no reason to split anymore,” Horev continues. “You get the capacity of an object store with the performance of a fast file system. Same hardware. Same APIs. Same rack.”

You kill tiering not with software tricks, but with architecture that no longer needs the compromise.
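
In practice, killing tiering means the same bytes are reachable through both interfaces without a copy step. Here is a minimal sketch of what a unified file-and-object namespace looks like to the application; the endpoint, bucket, mount path, and credentials are all hypothetical rather than taken from any specific VAST deployment:

```python
# Same data, two protocols, one pool: read an object over S3,
# then read the same bytes through the file interface.
# Endpoint, bucket, and mount path are hypothetical; credentials
# are assumed to come from the environment.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.example.internal",  # hypothetical endpoint
)

# Object view: how an ingest or archival job might see the data.
obj = s3.get_object(Bucket="training-data", Key="shards/shard-000001.tar")
via_s3 = obj["Body"].read()

# File view: how a GPU training job might see the same data, assuming
# the same namespace is exported over NFS and mounted locally.
with open("/mnt/datasets/training-data/shards/shard-000001.tar", "rb") as f:
    via_file = f.read()

assert via_s3 == via_file  # one copy of the data, no tiering dance
```

No sync jobs, no hot-cold promotion logic, no second system to capacity-plan.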

Planning for Density, Not Just Performance

One of the biggest failures in current-gen AI buildouts is the lack of clear guidance on how much storage is actually needed.

“There’s no prescription,” Horev admits. “You might be told how fast a GPU node needs to be fed, but nobody says, ‘You’ll need 80PB in year two.’ So people guess.”

Some estimate 1PB per 1,000 GPUs as a loose rule of thumb. But depending on your workload—vision models, synthetic data, transformer checkpoints—it could be twice that. Or ten times.
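
If you do have to guess, at least make the multiplier explicit. A small sketch of that rule of thumb is below; the per-workload multipliers are illustrative stand-ins for your own checkpoint cadence, dataset growth, and retention policy:

```python
# Rough capacity guess from the "1 PB per 1,000 GPUs" rule of thumb,
# with a workload multiplier to capture the 2x-10x spread noted above.
BASE_PB_PER_1000_GPUS = 1.0

# Illustrative multipliers only; measure your own footprint growth.
WORKLOAD_MULTIPLIER = {
    "text_pretraining": 1.0,
    "vision_or_synthetic_data": 2.0,
    "heavy_checkpointing_and_forks": 10.0,
}

def capacity_estimate_pb(num_gpus: int, workload: str) -> float:
    return (num_gpus / 1000) * BASE_PB_PER_1000_GPUS * WORKLOAD_MULTIPLIER[workload]

print(capacity_estimate_pb(8_000, "text_pretraining"))              # 8.0 PB
print(capacity_estimate_pb(8_000, "heavy_checkpointing_and_forks")) # 80.0 PB
```

The spread between those two answers is the difference between a comfortable year two and a forklift upgrade.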

And once you’ve guessed low, the penalty isn’t just buying more gear. It’s unwinding a design decision that already calcified across your racks.

Scale Is Inevitable. Waste Isn’t.

We used to chase performance. Now we chase room. Inference is eating the world, but only if we can fit it through the datacenter door.

“You can’t always get more space. You can’t always get more power,” Horev adds. “But you can use what you have better. That’s what these drives are about.”

Because in the end, the real cost of not consolidating performance and capacity into one tier isn’t just dollars or watts—it’s the moment six months from now when your GPUs are hungry, your datasets are swelling, and you realize the constraint wasn’t ambition or budget.

It was your racks.
