Thought Leadership
Apr 20, 2026

How Jump Trading Rebuilt Data Infrastructure Around Shared Access

Authored by

Nicole Hemsoth Prickett, Head of Industry Relations

The crowd at VAST FWD 2026 was all ears when Lucas Wojcik, HPC Systems Engineer at Jump Trading, took the stage. It’s not often the world gets to hear firsthand what it takes to build ultra-fast, mission-critical infrastructure that has to adapt at the pace of research and markets.

Jump’s environment is built around continuous, large-scale research, where infrastructure has to support thousands of CPUs and GPUs without adding friction, Wojcik told the audience. “We tend to do a lot of things at scale, and when you're enabling research at scale, it requires many things, namely automation, flexibility, self-service systems and a mindset of constant improvement. This is because if you're not flexible, you're going to break, and if you're not already automating stuff, well, humans are imperfect we'll just say.”

That expectation shapes how any new system is evaluated at Jump Trading, he explained.

Wojcik said VAST first came into consideration because it presented an approach that didn’t align with how high-performance storage had historically been built. “We heard that this new kid on the block was offering NFS with HPC performance, which at the time sounded like a pipe dream, and they were offering low-cost all-flash systems, which not a lot of places were at that time.” He added that NFS had never been associated with that level of performance, which made direct testing the only viable path forward.

Wojcik and his team began with standard benchmarking to validate (or break) the system. “We started off with the normal things you do when you get a new piece of tech at your company. You start using synthetic benchmarks, dd’s, IO500s, FIO tests, just to kind of see if the product actually holds up to what they were saying, or if they are who they say they were type thing.” And sure enough, as he unpacked in detail for the rest of the session, the system held, which was enough to move into real workloads, where limits typically emerge.
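As a rough illustration, a synthetic validation pass of this kind might start with an FIO job like the sketch below. The mount path, block size, and parallelism are illustrative assumptions, not Jump’s actual test matrix:

```shell
# Write a simple sequential-read FIO job targeting an NFS mount under test.
# /mnt/vast/bench and every parameter here are illustrative assumptions.
cat > seq_read.fio <<'EOF'
[global]
; bypass the client page cache so the array, not RAM, is measured
directory=/mnt/vast/bench
direct=1
bs=1m
iodepth=32
time_based=1
runtime=60

[seq-read]
rw=read
numjobs=16
size=10g
EOF

# fio --output-format=json seq_read.fio   # run it against the mount
grep -c '^\[' seq_read.fio               # two sections: global plus one job
```

Runs like this establish a baseline before the more adversarial, workload-shaped testing described later in the session.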

Software First, Not Box-Bound

The shift came when Jump stopped treating the system as fixed hardware and started treating it as flexible software. The limit they hit, Wojcik says, was cluster size, not the platform itself, so instead of waiting on new gear they repurposed old servers as C-nodes, which opened big doors.

VAST validated the hardware and brought it online fast enough to keep work moving, and because capacity was no longer tied to a vendor-defined configuration, Jump could keep moving too. As Wojcik put it, they could expand using what they already had and avoid being blocked by supply chain delays.

That flexibility showed up exactly when it was needed too. Lead times were long and demand was immediate, so the ability to extend the cluster in place meant zero pause in production (whereas in a typical environment, that kind of change would be slowed by hardware constraints and approval cycles).

Wojcik said the same pattern applied to support. Jump worked directly with engineers who were looking at their workloads and making changes in response, which meant problems didn’t sit waiting for a future release. Fixes were often delivered overnight and deployed the next day, which he says turned the relationship into something closer to co-development, where real usage shaped how the system improved.

Where File Systems Actually Break Under Load

Wojcik made it clear Jump goes beyond benchmarking, preferring to roll out workloads that push filesystem behavior to failure: modifying open or deleted files, stressing unusual NFS semantics, and driving extreme concurrency across thousands of clients. He stressed that these are real quant patterns, not edge cases, and they expose where systems break.

At Jump’s pace, metadata becomes a bottleneck, coordination adds latency, and cluster-sizing assumptions fall apart as parallelism increases. This is where coordination, locking, and consistency start to degrade under load, and where the real test for VAST was on full display.

Those same constraints forced a move from NFSv3 to NFSv4. The original goal was simple, standard NFS, but NFSv3 could not handle the access patterns at scale. NFSv4 introduced better handling of complex file interactions and metadata coordination, and Jump adopted it early, encountering issues but gaining capabilities that matched their workloads. He said it also laid the groundwork for features like delegations, which reduce metadata overhead and improve scaling behavior.

But the more important shift happened at the system level. As Wojcik describes it, storage moved from isolated pod-level deployments to a shared layer across the datacenter.

Instead of constant rsync jobs, duplicated datasets, and delays, everything operates from a single namespace. “Now what we have is we have all data accessible to all pods, and that takes away all of the complex data management from all of our quants,” he said, adding that users no longer have to deal with staging or replicating data.

They just don't have to think about that anymore, because the data actually follows the compute, which is a massive advantage.

From Single Protocol to Unified Data Access

Though Jump Trading started with NFS, the system evolved into a multiprotocol layer serving different workloads against the same data.

NFS remains the core for POSIX access, while S3 became essential for object workflows without cloud latency or duplication. “If somebody needs POSIX, they can access that data from NFS. If they want to access it over S3, that’s there for them too, and it’s as simple as a dropdown menu.” NVMe over TCP is newer, supporting block storage for Kubernetes and production systems and extending the same model of multiple access paths to a single dataset without copies.
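The dual-path idea can be sketched in a few lines. The hostname, view name, and file path below are hypothetical stand-ins, not Jump’s real environment:

```shell
# One dataset, two addresses, no copies. All names here are hypothetical.
VIEW="research"

# POSIX clients would mount the view over NFS:
#   sudo mount -t nfs -o vers=4.1 vast.example.com:/${VIEW} /mnt/${VIEW}
# Object clients would reach the same data as an S3 bucket:
#   aws s3 ls "s3://${VIEW}/daily/" --endpoint-url https://vast.example.com

# The same file is reachable under both namespaces:
echo "/mnt/${VIEW}/daily/ticks.parquet"
echo "s3://${VIEW}/daily/ticks.parquet"
```

The point of the model is that the choice of protocol is per-consumer, while the data itself stays in one place.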

Driver-Level Scaling and Multipath Throughput

The funny thing is that the performance shift came from something Jump didn’t even want to use at first. Wojcik says they avoided kernel modules until an NFS bug forced the issue, and once they adopted the VAST client driver, behavior changed immediately. Reads went from around 8–9 GB/s to more than six times that, writes pushed toward 20 GB/s, and cluster throughput hit 1.25 TB/s at around 0.6 ms latency. This wasn’t tuning, Wojcik says, it was pathing. Multipathing let clients use all of their network interfaces, and better balancing across C-nodes removed single-path limits, so previously unused bandwidth became usable.
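The arithmetic behind that jump is easy to sketch. The talk gave the before and after throughput but not the NIC layout, so assume, purely for illustration, clients with four 100 GbE interfaces: a single TCP path caps out near one interface’s line rate, while multipathing aggregates all of them:

```shell
# Back-of-envelope bandwidth ceilings; the 4 x 100 GbE layout is an assumption.
nic_gbps=100                            # per-interface line rate, Gb/s
nics=4                                  # interfaces per client
single_path=$(( nic_gbps / 8 ))         # one path: ~12 GB/s ceiling
multi_path=$(( nics * nic_gbps / 8 ))   # all paths: ~50 GB/s ceiling
echo "single-path ceiling: ${single_path} GB/s"
echo "multipath ceiling:   ${multi_path} GB/s"
```

Those ceilings are consistent with reads moving from the 8–9 GB/s range to roughly six times that once every interface carried traffic.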

That same shift shows up in observability, he says. Audit logging exposes every file operation and reconstructs it into timelines tied to users and paths, so debugging moves from guesswork to evidence. Teams can see exactly what happened and when. “When a user comes to you and says, my data got deleted or it disappeared, you can show them that they indeed deleted their data.” For his team, what started as visibility became a reliable debugging layer for both users and operators under complex workloads.
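The kind of reconstruction Wojcik describes can be mimicked with ordinary tooling. The record format below is synthetic and the names are invented (VAST’s actual audit schema differs); the point is that per-user, time-ordered timelines fall out of flat operation logs:

```shell
# Synthetic audit records: timestamp, user, operation, path.
cat > audit.log <<'EOF'
2026-04-02T09:14:07Z alice OPEN   /research/daily/ticks.parquet
2026-04-02T10:30:12Z bob   READ   /research/daily/ref.csv
2026-04-02T09:14:09Z alice WRITE  /research/daily/ticks.parquet
2026-04-02T11:02:41Z alice UNLINK /research/daily/ticks.parquet
EOF

# "My data disappeared": show the user their own operations, in time order.
# ISO 8601 timestamps sort lexicographically, so plain sort is chronological.
awk '$2 == "alice"' audit.log | sort
```

The final line of the timeline is the user’s own UNLINK, which is exactly the evidence Wojcik says ends the guesswork.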

Once throughput and visibility are in place, second-order effects matter more. Data reduction at roughly 2:1 cuts required hardware, but the bigger impact is everything attached to it: power, rack space, networking, and operational overhead. In an environment with heavy dataset overlap, this reflects real reuse while lowering the cost of running at scale.

That expansion continues into production systems, he says. The CSI driver brings the same platform into Kubernetes, supporting persistent volumes for container workloads. The pattern is consistent: a question leads to direct engineering engagement, testing happens within days, and production follows shortly after. The model stays the same (one data platform, multiple access paths), but the scope keeps expanding.
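A CSI-backed persistent volume claim of the kind he describes might look like the sketch below. The storage class name `vast-nfs` and the sizing are assumptions for illustration, not Jump’s actual manifests:

```shell
# A minimal PVC sketch for a CSI-provisioned volume (names are assumed).
cat > pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: research-scratch
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: vast-nfs
  resources:
    requests:
      storage: 1Ti
EOF

# kubectl apply -f pvc.yaml   # bind it, then mount it from pods as usual
grep -q 'kind: PersistentVolumeClaim' pvc.yaml && echo "manifest ready"
```

A shared-namespace platform pairs naturally with `ReadWriteMany` claims, since many pods can mount the same data without per-pod copies.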

What keeps this manageable is the operational model for Jump Trading. Wojcik says a very small team runs the environment using APIs and custom tooling instead of manual processes. Everything can be automated, including provisioning, cluster operations, and upgrades, which keeps the system aligned with infrastructure that expects programmatic control.

Continuous Operation Without Interruption

Upgrades show how far this has come. Early on, they required tight coordination and active troubleshooting. Now they run during the day on systems that are never idle, without users noticing. The complexity is still there, but it’s contained within operations, so changes don’t interrupt active workloads and the system can evolve continuously.

What stands out is how expectations changed once this was in place. Multiprotocol access became necessary, the NFS driver became required for performance, observability became central, and data reduction proved to impact more than capacity. Even further, engineering response times reset expectations, and new features could be introduced live.

As the team at Jump Trading looks to the future, Wojcik says NFSv4 delegations aim to reduce metadata overhead under concurrency. VAST DataSpace extends the namespace across locations so compute can move without moving data. VAST Event Broker introduces event-driven workflows tied to data activity. Block storage continues expanding into production alongside containers.

For Jump, the platform became programmable, observable, and shared across compute domains. The constraint shifts away from raw throughput and toward coordination, access patterns, and how effectively data is reused at scale.
