For years, the storage industry has been conditioned to treat shortages as temporary. Cyclical downturns, followed by density gains, followed by relief.
But as we unpack in a recent episode of the Shared Everything podcast from VAST, what’s unfolding now looks different.
The current flash supply crunch is not simply a matter of demand outrunning manufacturing for a quarter or two. Instead, we’re looking at the collision of AI-scale consumption with the physical limits of NAND production, the long timelines of fab expansion, and a market that no longer has meaningful slack.
That reality is forcing a reframing of where leverage actually exists, as VAST Co-Founder Jeff Denworth and Solidigm’s Scott Shadley explain.
AI infrastructure has changed the profile of storage demand in three important ways. First, the sheer volume of data being ingested has exploded. Training pipelines, inference logs, embeddings, checkpoints, and intermediate artifacts all want to live close to compute and stay accessible.
Second, access patterns have shifted. Training and inference touch data in small, scattered reads rather than long sequential streams, so random access is no longer a performance optimization; it is a requirement.
And third, tolerance for latency and inefficiency has collapsed. When GPUs cost tens of thousands of dollars per unit and are deployed by the hundreds of thousands, every cycle spent waiting on I/O becomes wasted compute.
At the same time, NAND manufacturing has lost the easy gains it once enjoyed. Layer counts continue to rise, but fab throughput does not scale linearly with density. Process times increase, toolchains diverge, and new fabs take years to bring online. Even the hard drive industry, long assumed to be the safety valve during flash shortages, is facing its own mechanical and power constraints. The result is a market where buying more media is no longer a reliable strategy.
In conversations with customers navigating the current crunch, one pattern shows up repeatedly. Organizations are discovering that their real flash consumption has far less to do with the amount of data they generate than with how inefficiently that data is stored. Triplicated databases sitting on top of triplicated object stores. Erasure codes chosen for simplicity rather than efficiency. Performance workarounds that multiply capacity requirements by three, six, or nine times.
Those design decisions were tolerable when flash was abundant and prices were falling, but they are untenable when allocation is fixed and lead times stretch into quarters.
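To see how quickly those multipliers compound, here is a rough back-of-the-envelope sketch. The replication factors are assumptions for illustration, not measurements from any particular deployment: a database keeping three replicas on top of an object store that itself keeps three copies ends up writing nine raw bytes for every logical byte stored.

```python
# Back-of-the-envelope sketch of how layered copies compound.
# The replication factors below are illustrative assumptions, not measured values.

def raw_bytes_per_logical_byte(layer_copy_factors):
    """Multiply the copy factor of each layer in the storage stack."""
    total = 1.0
    for factor in layer_copy_factors:
        total *= factor
    return total

# A triplicated database sitting on top of a triplicated object store.
stacked = raw_bytes_per_logical_byte([3, 3])
print(f"Raw bytes written per logical byte: {stacked:.0f}x")  # 9x

# 100 TB of logical data under that stack.
logical_tb = 100
print(f"Flash needed for {logical_tb} TB logical: {logical_tb * stacked:.0f} TB")
```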
VAST approaches the problem from a different direction, as Denworth outlines in this episode.
Instead of assuming flash is cheap and infinite, the system is built around extracting maximum usable capacity from every drive. That starts with erasure coding designed for flash from day one, delivering overhead on the order of a few percent rather than the double-digit penalties common in legacy systems. It continues with data reduction that operates globally across the dataset, not as a bolt-on feature or a best-effort optimization, but as a first-class architectural principle.
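For a sense of scale, here is a minimal sketch of why stripe geometry matters. The stripe widths below are illustrative examples, not VAST’s actual layout: protection overhead is simply parity capacity divided by data capacity, so replication and narrow stripes pay double-digit penalties or worse, while very wide stripes pay a few percent.

```python
# Minimal sketch: erasure-coding overhead as a function of stripe geometry.
# Stripe widths below are illustrative examples, not any vendor's actual layout.

def ec_overhead(data_strips, parity_strips):
    """Extra raw capacity consumed by parity, as a fraction of data capacity."""
    return parity_strips / data_strips

geometries = {
    "3-way replication (2 extra copies)": (1, 2),
    "narrow stripe, 4+2": (4, 2),
    "medium stripe, 8+2": (8, 2),
    "very wide stripe, 146+4": (146, 4),
}

for name, (data, parity) in geometries.items():
    print(f"{name:38s} overhead = {ec_overhead(data, parity):6.1%}")
```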
This goes well beyond theory. Across production environments, customers consistently see usable capacity that is a multiple of the raw flash they deploy. In practical terms, this means organizations constrained by NAND allocation can meet ingest and retention goals without waiting for the supply chain to catch up. In extreme cases, environments that previously required multiple replicated copies for performance and protection collapse into a single, efficient footprint.
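As a rough illustration of how those effects stack, assume a few percent of erasure-coding overhead and a modest 2:1 global data reduction ratio; the figures are assumptions for the example, not quoted customer results. Compared with a triplicated layout that achieves no reduction, the efficient layout serves several times more logical capacity per raw terabyte.

```python
# Rough illustration of logical capacity served per raw TB of flash.
# Overheads and reduction ratios are assumptions for the example,
# not quoted results from any customer environment.

def usable_per_raw_tb(ec_overhead, reduction_ratio, copies=1):
    """Logical TB served per raw TB, given EC overhead, data reduction, and full copies."""
    return reduction_ratio / (copies * (1 + ec_overhead))

baseline = usable_per_raw_tb(ec_overhead=0.0, reduction_ratio=1.0, copies=3)    # triplicated, no reduction
efficient = usable_per_raw_tb(ec_overhead=0.03, reduction_ratio=2.0, copies=1)  # wide EC + 2:1 reduction

print(f"Triplicated baseline : {baseline:.2f} logical TB per raw TB")
print(f"Efficient layout     : {efficient:.2f} logical TB per raw TB")
print(f"Improvement          : {efficient / baseline:.1f}x")
```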
This also reframes the flash shortage from a procurement problem into a software problem.
What’s notable is how quickly this shift is happening. Efficiency used to be discussed in the context of power, cooling, and cost optimization. Those still matter, but they are no longer the primary drivers. Today, efficiency determines whether a deployment can happen at all. When hyperscalers are willing to place open-ended orders and enterprises are competing for the same constrained supply, the ability to make flash go two or three times further becomes a strategic advantage.
There’s also a second-order effect. As AI workloads increasingly demand instant access to data beyond GPU and CPU memory, flash becomes an extension of the compute fabric rather than a passive storage tier. Algorithms that assume random access and consistent latency unlock capabilities that simply are not possible on spinning media or heavily replicated architectures. This tight coupling between compute and data makes inefficiency even more expensive, because it directly impacts utilization upstream.
None of this eliminates the need for more flash. Fabs will expand, supply will eventually loosen, and the market will find a new equilibrium. But the lesson of this cycle is unlikely to fade. AI has exposed how fragile capacity planning becomes when it relies on abundance rather than design.
In the near term, the organizations that succeed will not be the ones that secured the largest allocations, but the ones that extracted the most value from what they already had. In that sense, the flash crunch is doing something the industry rarely gets forced to do: confront inefficiency head-on, and treat software architecture as a first-order supply chain lever.



