SciNet, the supercomputing center at the University of Toronto, is one of Canada's five centralized HPC facilities and home to the TOP500 supercomputer Trillium. The center serves the entire academic community in Canada, providing researchers with the resources and expertise necessary to perform diverse, large-scale research, ranging from cutting-edge bioinformatics to climate science and astrophysics. Their mission demands a general-purpose, highly available system capable of handling the largest parallel compute jobs.
Enabling Researchers to Perform Computational Science
SciNet boosts AI and HPC performance with the VAST AI OS
Industry: Education
Use Case
With the installation of their new, third-generation supercomputer, a system boasting 240,000 CPU cores and 250 NVIDIA H100 GPUs, SciNet faced a critical challenge: ensuring the underlying data infrastructure could keep pace with the massive compute power and, crucially, with the burgeoning demands of AI workloads. "We support virtually every scientific discipline, from traditional fields like physics, high-energy physics, and engineering to emerging areas like bioinformatics, social sciences, and AI/statistics. Computing is essential for all academic work," explains Daniel Gruner, CTO of SciNet.
The previous architecture, which ran for over seven years, relied on a complex, two-tiered system: roughly 18 petabytes on GPFS for bulk storage and a separate flash burst buffer dedicated to handling the IOPS-intensive small-file and temporary data loads. Gruner was clear about the limitations of this model: "I was adamant I didn't want to do the same thing again. We are adventurous, if you will, but not in a silly sense. There's a good reason for looking to do our work differently and better."
The VAST deployment provided SciNet with immediate, structural advantages critical for their accelerated science programs:
Eliminating the Burst Buffer Concept: The VAST platform renders the burst buffer obsolete. The entire 30 petabytes acts as a single, consistent performance tier. This move dramatically simplified the data path and ensures that the system is not IO-bound when running large-scale jobs, such as a single ocean-simulation run that produces a petabyte of output.
Powering AI Workflows: By choosing VAST, SciNet ensured their GPUs stay fully fed with data. The VAST architecture efficiently handles the chaotic IO patterns characteristic of AI/ML, whether the parallel IO of large model training or the intense metadata operations of complex data lakes.
Preparing for Massive AI Expansion: SciNet is already planning the next phase of their AI capability, which requires an additional 30 petabytes of storage capacity. The proven scalability of the VAST platform will allow them to grow capacity seamlessly, making it the foundation for their long-term AI strategy.

"We are greatly expanding our AI capability and VAST will be central to this, inclusive of AI agents."