Sumit Pal: Why VAST is Poised to Catalyze Enterprise AI Adoption

Authored by

Sumit Pal

This blog post was written in 2023 and reflects product capabilities at that time. Some information may be outdated.

Sumit Pal is a data management industry consultant and former Gartner VP Analyst with more than 30 years of experience developing scalable software systems and big data and AI/ML strategies.

The AI Renaissance in Enterprise Data Infrastructure

Exponential data creation and accumulation is fast outpacing human comprehension and cognitive limits. The confluence of data flywheels, the algorithm economy, cloud-enabled data platforms, and AI has made every enterprise aspire to become data driven and to leverage emerging, bleeding-edge technologies to build Data/AI/ML apps at scale for competitive advantage. Organizations today are focused on harnessing the power of AI to evolve their business. However, AI infrastructure in most enterprises is deployed by duct-taping together storage and processing technologies developed in the pre-AI era.

To prepare organizations for the AI renaissance, VAST has built the foundation for data-powered AI systems with the VAST Data Platform. It offers a distributed data namespace architected with the right balance of tradeoffs to leverage data effectively and efficiently in the era of deep learning.

VAST has been a game changer in enterprise data infrastructure, bringing an alternative approach to storage: an all-flash data platform with an innovative cost-to-capacity ratio. VAST has developed algorithms on top of flash hardware and network technologies to improve flash economics, bringing to life an ecosystem powered by flash.

As the next logical step, the elite VAST team is forging ahead with its dream to position itself at the core foundational layer of multiple technology inflection points—analytics, AI, generative AI—that all rely on data, to be an AI data platform for enterprises.

How VAST Made its Mark

One of the major challenges for any organization is managing enterprise storage and associated silos. Data is generated, stored, and used across datacenters, edge, and cloud providers. Managing a distributed storage environment is challenging and complex with no data map to guide data teams.

VAST addresses challenges of siloed data across tiers with efficient algorithms to enable flash economics, making it possible for organizations to deploy heterogeneous data workloads on a scalable, affordable data platform with universal access across files and objects. VAST has innovated to allow organizations to consolidate tiers of data and serve applications from a single tier of infrastructure that combines the performance of HPC systems with the economics of an archive.

This lays the foundation for machines to ingest, collect, process, and consume data at a global scale by bringing together enterprise data into a unified computing environment. VAST built efficiency into the platform’s core, which makes it possible to address challenges that plague applications, such as the complexities of storage tiering and data gravity.

An Architecture Built for the AI Era

VAST provides exabyte scalability and multi-tenancy to consolidate data and applications on a unified scale-out flash tier. With multi-protocol (file/object) data access and the ability to execute disparate workflows, VAST makes it possible for organizations to run existing applications and emerging workflows within a converged platform.

The DASE (Disaggregated Shared Everything) architecture allows customers to get the best of all worlds, from real-time storage performance and ad-hoc queries to complex data science jobs. The platform separates computing and storage and leverages all-flash to accelerate real-time access to shared storage with scalability, high availability, and low latency in a single unified namespace.

Customers leverage VAST Data to consolidate workloads ranging from home directories and NAS storage to object stores for data lakes and backup repositories. Unifying these workloads lowers cost, reduces data movement, and eases administrative burden. VAST also shrinks the data footprint by applying similarity-based data reduction to heterogeneous data.

A Smarter Way to Catalog Data

Scaling storage and data management beyond petabytes brings the challenge of managing metadata. Finding relevant data with low latency is difficult in petabyte-sized datasets. According to McKinsey, employees in most organizations spend 30% of their time searching for the right data. VAST addresses this with an integrated, built-in metadata index called VAST Catalog. This can be a game changer that addresses fundamental problems afflicting data ecosystems: the findability and searchability of data at scale with low latency.

VAST catalogs files and objects in its datastore, enabling users to enrich and tag associated data with user-defined context and query metadata with semantics. This provides a tightly integrated, synchronized catalog and opens vast (pun intended) possibilities where:

Operations leverage the Catalog for dynamic capacity management.
Archival applications use it for faster lookups, backups, and data migration.
Applications replace POSIX functions with SQL for speedup.
AI and ML applications leverage it as a unified store where training data and features are co-located, allowing simpler lineage and model tracking.
Users enjoy a huge productivity boost from a search tool that leverages the underlying architecture for fast data discovery and exploration.
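To make the idea of replacing POSIX functions with SQL concrete, here is a minimal sketch in Python. It is not the VAST Catalog API. It uses the standard-library sqlite3 module as a stand-in for a metadata index, and the table name (catalog), its columns, and the helper function names are all hypothetical. It contrasts a tree walk that stats every file with a single query over already-indexed metadata.

```python
import os
import sqlite3
import tempfile


def find_large_files_posix(root, suffix, min_size):
    """POSIX-style scan: walk the tree and stat every file on each search."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(suffix) and os.stat(path).st_size >= min_size:
                matches.append(path)
    return sorted(matches)


def build_catalog(root):
    """Index file metadata once into a SQL table (a stand-in for a catalog)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog (path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)"
    )
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            rows.append((path, name, st.st_size, st.st_mtime))
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_catalog_size ON catalog(size)")
    return conn


def find_large_files_sql(conn, suffix, min_size):
    """Catalog-style search: one SQL query, no per-file stat calls.

    Note: SQLite's LIKE is case-insensitive for ASCII, and the suffix is
    assumed to contain no LIKE wildcards (% or _).
    """
    cur = conn.execute(
        "SELECT path FROM catalog WHERE name LIKE ? AND size >= ?",
        ("%" + suffix, min_size),
    )
    return sorted(row[0] for row in cur)


def demo():
    """Build a small temporary tree and run both searches against it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        files = {"a/small.csv": 10, "a/b/big.csv": 5000, "a/b/big.log": 5000}
        for rel, size in files.items():
            with open(os.path.join(root, *rel.split("/")), "wb") as fh:
                fh.write(b"x" * size)
        conn = build_catalog(root)
        posix = find_large_files_posix(root, ".csv", 1000)
        sql = find_large_files_sql(conn, ".csv", 1000)
        conn.close()
        return (
            [os.path.basename(p) for p in posix],
            [os.path.basename(p) for p in sql],
        )
```

In this toy version the index is built by walking the tree once. In an integrated catalog, the storage system would presumably keep the index in sync itself, which is where the speedup over repeated tree walks would come from.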

With its rich capabilities, VAST Catalog opens numerous possibilities, especially as a foundation for a next-generation semantic layer for the data platform. It can be the workhorse for an enterprise data catalog, an essential feature of the data mesh and data fabric paradigms.

VAST supplements its datastore and Catalog with plugins for SQL engines like Trino, Starburst, and Apache Spark to push down queries and offload heavy lifting to VAST. With an ultra-fast interface to the VAST storage system, this optimizes data operations with less data movement between query and storage layers.
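The effect of query pushdown on data movement can be illustrated with a toy sketch. This is not how the VAST plugins for Trino or Spark are implemented. ToyStore, scan_all, and scan_where are hypothetical names invented for this illustration. The point is only that filtering and column selection done at the storage layer return fewer rows to the query layer.

```python
from dataclasses import dataclass


@dataclass
class ToyStore:
    """Hypothetical storage layer holding rows as dictionaries."""
    rows: list

    def scan_all(self):
        # No pushdown: every row and every column crosses to the query layer.
        return list(self.rows)

    def scan_where(self, predicate, columns):
        # Pushdown: filter and project inside the storage layer.
        return [{c: r[c] for c in columns} for r in self.rows if predicate(r)]


def query_without_pushdown(store):
    """Fetch everything, then filter in the query layer."""
    fetched = store.scan_all()
    result = [{"id": r["id"]} for r in fetched if r["label"] == "cat"]
    return result, len(fetched)


def query_with_pushdown(store):
    """Ask the storage layer to filter and project before returning rows."""
    fetched = store.scan_where(lambda r: r["label"] == "cat", ["id"])
    return fetched, len(fetched)


def demo():
    """Run both query styles over 1000 rows, 10 of which are labelled 'cat'."""
    store = ToyStore(
        rows=[
            {"id": i, "label": "cat" if i % 100 == 0 else "dog", "blob": "x" * 64}
            for i in range(1000)
        ]
    )
    plain, moved_plain = query_without_pushdown(store)
    pushed, moved_pushed = query_with_pushdown(store)
    return plain, moved_plain, pushed, moved_pushed
```

Both paths return the same answer. What differs is the number of rows that travel between the storage and query layers.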

Analysis

VAST Data is innovating rapidly to build a next-generation data infrastructure platform. Its vision is a data analysis platform for unstructured data, focused on advanced AI and deep learning for petabyte-plus workloads. This will enable organizations to use AI to answer analytical questions, backed by intelligent infrastructure for digital transformation and accelerated decision making.

With its solid scaffolding of the unstructured data layer, VAST is building on its success and pushing boundaries to incorporate database functionality with an optimal data access layer to power data analytics and AI ecosystems. Coming up on the horizon are game-changing capabilities that I’ve been briefed on by the VAST team, but am not at liberty to divulge just yet. It’s best you join the VAST event on August 1st to see for yourself.

As companies of all sizes and verticals look to innovate with Generative AI, I strongly feel VAST Data is poised to be the catalyst for enterprises to leverage emerging AI and ML applications through the convergence of rock-solid fast storage and advanced data analysis software.