In a relatively short period of time, AI infrastructure giant CoreWeave has aggressively built the largest AI-native cloud on the market, driven by clusters packed with thousands of the highest-end GPUs.
Accordingly, power density, cooling, and raw data throughput are hard constraints, which means every decision about infrastructure and software has to be future-proof.
As Chen Goldberg, EVP of Product and Engineering at CoreWeave, told the packed audience at the recent VAST FWD event, this journey has been much less about scale for its own sake and more about cleverly shifting workload expectations. Her team’s clusters are directly tied to revenue, research timelines, and live services, and even the smallest inefficiencies show up immediately in utilization and cost. The margin of error shrinks as scale increases, and scale increases often.
As Goldberg describes it, workloads can run on hundreds to thousands of nodes, but the chief concern is less about sheer scale and more about coordination between those nodes. The failure or slowdown of even one component isn’t contained; it ripples through everything from utilization to scheduling, ultimately affecting the economics of the whole system.
When GPUs Wait, the System Is Already Broken
When it comes to first-order scaling problems, many might assume the fear is running out of compute, but Goldberg says that in practice, the issue is inverted.
On the ground, the system begins to degrade while there's still ample compute available because the GPUs are no longer being fed consistently. That means the bottleneck shifts from how fast work can be executed to how reliably work can be delivered.
“Your GPU resources are staying idle, which costs a lot of money,” she says, noting idleness is less of a capacity issue and much more a failure in coordination. Even more specifically, that’s a data delivery failure.
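To see why idle GPUs hurt so quickly, a quick back-of-envelope calculation helps. The cluster size, hourly rate, and idle fraction below are illustrative assumptions, not CoreWeave figures:

```python
# Back-of-envelope sketch of what idle GPUs cost. All numbers here are
# illustrative assumptions, not CoreWeave figures.

GPU_COUNT = 4096       # GPUs in a hypothetical training cluster
HOURLY_RATE = 3.00     # assumed $/GPU-hour
IDLE_FRACTION = 0.10   # assume 10% of GPU time lost waiting on data

def idle_cost_per_day(gpus: int, rate: float, idle_frac: float) -> float:
    """Dollars burned per day by GPUs that sit waiting for data."""
    return gpus * rate * 24 * idle_frac

if __name__ == "__main__":
    cost = idle_cost_per_day(GPU_COUNT, HOURLY_RATE, IDLE_FRACTION)
    print(f"~${cost:,.0f}/day lost to idle time")  # ~$29,491/day
```

Even a modest idle fraction compounds into a significant daily cost at cluster scale, which is why data delivery, not raw capacity, becomes the number that matters.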
At CoreWeave’s scale, every request depends on timely access to data, predictable movement of that data across nodes, and the ability to reuse or stage it without introducing latency spikes. When those conditions aren’t met, the system doesn’t slow down evenly across the board; it fragments. Some nodes stall while others continue, schedulers lose efficiency, and utilization drops in ways that compound quickly across thousands of GPUs.
CoreWeave’s growth journey, as Goldberg described it, shows that the primary constraint is the system’s ability to keep data flowing consistently enough for orchestration to make reliable decisions. Once that breaks, adding more GPUs doesn’t recover performance; it amplifies the inefficiency.
The System Is the Loop, Not the Model
What CoreWeave is encountering operationally is the same shift that VAST Data CEO and founder Renen Hallak has been describing from the architectural side, a topic he delved into in his keynote at VAST FWD.
The industry focused on models and training as if they were the whole system, which worked when workloads were simple and self-contained but stops working once they run continuously and rely on ongoing context. The model is just one component in a loop of ingestion, transformation, retrieval, inference, and action, and once you see the system that way, the issues Goldberg raises make more sense.
When every step depends on data, inconsistency in how it’s accessed or moved becomes systemic, and performance is no longer gated by model speed but by whether the loop can run without interruption, with data shaping behavior across the entire system.
CoreWeave’s experience, where coordination failures emerge before compute is exhausted, is a direct reflection of that reality.
In his VAST FWD keynote, VAST cofounder Jeff Denworth described how continuous systems can’t be built on fragmented infrastructure, where work never gets the chance to behave like interdependent processes that carry context forward over time.
In a fragmented environment, each layer introduces its own variability. Storage behaves one way, networking another, orchestration makes decisions based on partial visibility, and the system compensates through retries, buffering, and overprovisioning. At small scale, those inefficiencies are absorbed. But at CoreWeave scale, with thousands of nodes participating in a single workload, they accumulate into systemic instability. Latency becomes inconsistent, data arrives out of order or too late to be useful, and scheduling decisions degrade because the system can't rely on a consistent performance envelope.
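The compounding effect of per-node variability can be shown with a toy simulation: in a synchronous workload, a step finishes only when the slowest node does, so step time is the maximum of N per-node latencies. The node counts and latency distribution below are illustrative assumptions, not measurements:

```python
# Toy simulation: a synchronous step completes only when the slowest
# node finishes, so even small per-node jitter stretches step time as
# node count grows. Distribution and node counts are assumptions.
import random

def step_time(num_nodes: int, mean: float = 1.0, jitter: float = 0.1) -> float:
    """One synchronous step: wait for the slowest of num_nodes nodes."""
    return max(random.gauss(mean, jitter) for _ in range(num_nodes))

if __name__ == "__main__":
    random.seed(0)
    for n in (1, 64, 4096):
        avg = sum(step_time(n) for _ in range(200)) / 200
        print(f"{n:>5} nodes: avg step time {avg:.2f}x baseline")
```

The single-node average stays near the mean, while thousands of nodes pay a persistent straggler penalty, which is exactly the instability that fragmented layers feed.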
So if AI infrastructure is going to operate as a continuous loop, then the underlying architecture has to behave as a unified system, where data access, movement, and execution are aligned tightly enough that the system can sustain forward progress without interruption.
The System Stops Hiding Its Weaknesses
What CoreWeave’s story at VAST FWD makes clear, once you move past the surface description of large clusters and rapid growth, is that scale isn’t a number. It’s a change in how the system behaves under pressure.
“Infrastructure is not falling behind in the pace of innovation compared to new models, new developer tools, like we are right up there,” Goldberg told the crowd, explaining it has to evolve in lockstep because the system is now sensitive to every inefficiency in that layer. The infrastructure becomes visible precisely because the system can no longer absorb inconsistencies without consequence.
As CoreWeave takes on more customers and different types of workloads, the system has to manage far more than compute. It has to keep thousands of moving parts working consistently, even though each one behaves a little differently. At that point, the challenge isn’t just scaling; it’s staying stable while scaling, and that depends on whether the data layer behaves predictably enough for everything else to rely on it.
From Layers to a System: Rebuilding Control Across the Stack
Once scale exposes variability as the real problem, the architectural response can't be piecemeal, Goldberg says.
CoreWeave’s answer was to stop treating infrastructure as a set of independent layers and instead rebuild it as a coordinated system with shared visibility and control. This is where their design departs most clearly from traditional cloud builders, which have usually assumed flat separation between compute, storage, and networking, all the while relying on abstractions in those boundaries to deal with more complexity.
“Instead of thinking about the cloud as different layers, isolated,” Goldberg says, they imagined a system where decisions about scheduling, placement, and recovery are made with full awareness of how data, compute, and network conditions interact in real time. In this view, the orchestration layer isn’t dispatching work into a set of black boxes, she says; it’s coordinating a single system where all the parts behave consistently enough that you can rely on them working together.
This is exactly what the VAST founders described. If the system is the continuous loop, then the control plane has to operate across that loop, not within isolated segments of it. CoreWeave collapsed the distance between layers so that the system could respond to variability before it propagates. And as Chen outlines, that’s essential at the scale they're operating, because once coordination breaks, recovery isn't localized. It spreads across the entire workload.
Feeding the GPU at CoreWeave Scale
Feeding a single GPU is trivial. Feeding thousands of GPUs, each running workloads with different context sizes, access patterns, and timing requirements, is the real test.
For Goldberg and team, data has to arrive at the right place, at the right time, at a throughput level that matches the hardware’s capabilities, and it has to do that consistently enough that schedulers can rely on those assumptions when placing work.
Variability at any point in that chain translates directly into idle compute, stalled pipelines, and degraded utilization across the cluster. CoreWeave’s unified control plane can only make effective decisions if the underlying data layer behaves predictably. This is the point where VAST moves from being a component in the stack to being a condition for the system to function at all.
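One common way to keep accelerators fed is to stage the next batch while the current one is being consumed, so compute never blocks on I/O. The sketch below is a minimal illustration of that prefetching pattern; `load_batch` and `run_step` are hypothetical stand-ins, not CoreWeave or VAST APIs:

```python
# Minimal prefetching sketch: a background thread stages upcoming
# batches into a bounded queue while the consumer computes, overlapping
# data delivery with work. load_batch/run_step are hypothetical stubs.
import queue
import threading
import time

def load_batch(i: int) -> bytes:
    time.sleep(0.01)              # stand-in for a storage/network fetch
    return b"batch-%d" % i

def run_step(batch: bytes) -> None:
    time.sleep(0.01)              # stand-in for GPU compute

def prefetch(n_batches: int, depth: int = 2):
    q: queue.Queue = queue.Queue(maxsize=depth)

    def producer():
        for i in range(n_batches):
            q.put(load_batch(i))  # blocks when the staging buffer is full
        q.put(None)               # sentinel: no more data

    threading.Thread(target=producer, daemon=True).start()
    while (batch := q.get()) is not None:
        yield batch

if __name__ == "__main__":
    for batch in prefetch(4):
        run_step(batch)           # fetch of batch i+1 overlaps compute on batch i
```

The bounded queue depth is the key design choice: it caps memory used for staged data while still hiding fetch latency, but it only works if the data layer delivers batches with predictable timing.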
“If you think about GPUs as the engines, we think about storage as the highway system… if there is traffic, nothing actually works,” Goldberg told the audience. And, she adds, at this scale, traffic is a dominant factor shaping system behavior.
What’s interesting too is that she doesn’t lump VAST into the storage bucket. “It's not something that you think about… it's part of the AI fabric,” she says.
For CoreWeave, the data layer has to have consistent performance across thousands of nodes, across different workload types, and across constant changes in scheduling and placement. Without that, the orchestration layer loses the ability to make reliable decisions, and the entire system begins to buckle under its own variability.
What VAST enables in this context goes well beyond higher throughput (although that is necessary). It provides uniformity in how data is accessed and moved, which allows the rest of the system to operate with fewer unknowns. That uniformity is what allows CoreWeave to scale without introducing the kind of fragmentation Jeff Denworth described earlier. It ensures that as more GPUs, more workloads, and more customers are added, the data layer doesn’t become a source of divergence in system behavior.
Efficiency Emerges from Data Consistency
Once the data layer behaves predictably, the effect shows up clearly in utilization, Goldberg says. “We optimize for high utilization from power, cooling, networking, storage, all of that is factoring into efficiency.” For her team, if utilization is high, the system is behaving coherently. If it drops, something in the coordination of those layers has broken down.
For Goldberg and team, utilization is defined more broadly. Instead of just how busy the GPUs are, she says it’s a reflection of whether the system can sustain forward progress without interruption.
When data shows up reliably, the system can keep everything moving smoothly and make good decisions about where work should run. When it doesn’t, things fall out of sync. Some nodes sit idle while others move ahead, queues get uneven, and the system becomes less efficient because it can’t rely on consistent behavior.
The unified control plane depends on predictable inputs to make effective decisions. The continuous loop depends on uninterrupted access to data. Fragmentation introduces variability that erodes both.
By stabilizing the data layer, CoreWeave is able to treat utilization as a controllable property of the system rather than an outcome it can only observe after the fact. That shift is what allows scaling to remain efficient instead of degrading as the system grows.
When the System Holds Together, Infrastructure Disappears for the User
The most telling outcome of all this work doesn’t show up in a benchmark or a throughput number; it’s found in how customers describe their experience, which in this case is almost understated to the point of being easy to miss.
“Some of the customers… what they love about using VAST that they actually don’t have to think about it,” Goldberg says.
That only happens when the system underneath is coherent enough that its complexity doesn’t leak into the user’s workflow. CoreWeave and VAST achieve this by ensuring that the data layer behaves the same way regardless of where or how it is accessed.
The unified control plane can only abstract complexity if the underlying components are predictable. The continuous loop only feels seamless if data is always available when needed. Fragmentation disappears from the user’s perspective only when it has been removed from the system itself.
What the customer experiences as simplicity is the result of a system that has been engineered to behave consistently under conditions where inconsistency would normally dominate.
As the system gets better at moving data consistently, workloads stop running as one-off jobs and start running as continuous pipelines that build on prior results. Data is always available, so the system doesn’t have to start over each time, and work can flow from step to step without interruption.
That shifts the challenge from running models to keeping data moving reliably, since everything depends on that flow staying consistent as the system scales.
The takeaway from Goldberg’s insights at VAST FWD is that her team at CoreWeave scaled successfully because they removed variability in how data moves through the system, not because they added more compute.
She emphasized that when data is inconsistent, everything falls apart. But when it’s predictable, the system holds together. At this scale, performance depends on keeping data flowing reliably across thousands of nodes, which is why the focus shifts to making the data layer stable enough for the rest of the system to depend on.



