Real-Time Data Engineering & Real-Time Global Data Analytics
Breathing Life Into Your Data
The VAST DataEngine brings data to life in a machine that can continuously process and learn on data from the natural world.
No more batch. No more silos of data processing. Just continuous, recursive computing.
Shipping in 2024, the VAST DataEngine will redefine the data computing paradigm by introducing serverless functions and real-time triggers into the VAST Data Platform. Once logic and state are merged, files, objects and tables come to life from edge to cloud.
Data Platforms Need To Evolve
For decades, datastores have been unaware of applications, and applications have been equally unaware of data events. This division between applications and data has produced fragmented approaches to building data pipelines and a batch-processing mentality that separates data streams from deep data analysis.
The VAST Data Platform aims to break the tradeoff between data streaming and global insight by engineering data processing and event notifications natively into the system.
By supporting new types of data (functions and triggers), the VAST Data Platform makes data dynamic, adding support for procedural functions in the same way that JavaScript made websites interactive.
With the VAST DataEngine, data and changes to data trigger action, that action is performed on the data, and the cycle repeats recursively without end. The DataEngine is the basis for perpetual AI training and inference and, we hope, for the AI-powered discoveries of the future.
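The trigger-and-function cycle described above can be pictured with a small, self-contained sketch. It is not the DataEngine API: `Event`, `Trigger`, `run`, and the event kinds are hypothetical names chosen for illustration, and the loop is bounded by `max_events` so the example terminates.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical names throughout: Event, Trigger and run() are illustrative
# and are not part of any VAST API.


@dataclass
class Event:
    kind: str      # e.g. "object.created"
    key: str       # name of the file, object or table row that changed
    payload: dict


@dataclass
class Trigger:
    kind: str                                     # event kind to match
    function: Callable[[Event], Optional[Event]]  # may emit a follow-up event


def run(triggers: list[Trigger], first: Event, max_events: int = 100) -> list[str]:
    """Dispatch events to matching functions until no new events are produced."""
    queue = deque([first])
    log = []
    while queue and len(log) < max_events:
        event = queue.popleft()
        log.append(f"{event.kind}:{event.key}")
        for trigger in triggers:
            if trigger.kind == event.kind:
                follow_up = trigger.function(event)
                if follow_up is not None:
                    queue.append(follow_up)  # the new data change re-enters the loop
    return log


def extract_text(event: Event) -> Event:
    # Pretend to derive a text record from a newly written object.
    return Event("row.inserted", event.key + ".txt", {"chars": len(event.payload["body"])})


def index_row(event: Event) -> None:
    # Terminal step: consumes the event and emits nothing further.
    return None


triggers = [Trigger("object.created", extract_text), Trigger("row.inserted", index_row)]
history = run(triggers, Event("object.created", "scan-001", {"body": "hello"}))
print(history)
```

In this sketch, writing one object produces a derived row, which in turn fires a second function. That is the "data triggers action, action changes data" recursion in miniature.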
A Programmable Computing Engine in Software
The DataEngine is a containerized computing environment that customers deploy on their choice of CPUs, GPUs and DPUs – from edge to cloud. By embedding logic directly into the VAST Data Platform, the system can schedule processing events in real time, triggered by data activities.
DataEngine Programmable Environment
VAST’s DataEngine provides a programmable environment in Python for developers to bring their own code. It also includes a number of built-in functions for getting value from the VAST Data Platform out of the box.
These include:
Data indexing
File header indexing
PII data detection
Ransomware detection
Streaming between tables/topics/files
Data augmentation
Next-Generation Event Streaming Infrastructure
The VAST DataEngine features a new data streaming interface designed to write events natively into the VAST DataBase.
For the first time, it’s possible to analyze all data by ingesting streaming data in real time into VAST’s exabyte-scale transactional and analytical database.
A Real-Time Event Router
The VAST Event Router unifies unstructured and structured data event management into a common platform, providing event consumers simple tools to trigger action.
The VAST Data Platform is designed to create structure and insight from unstructured data.
By storing triggers and functions as state in the VAST Data Platform, your code becomes dynamically managed by a global data store that supports global code versioning, global code distribution and global code security policies.
A Simple Python SDK
The VAST DataEngine is a serverless platform, programmed in Python, that integrates stateful functions into an exabyte-scale datastore.
By integrating streaming and data processing with an exabyte-scale datastore and database, the Data Platform enables comprehensive function calling with minimal code.
Introducing the VAST DataSet
Deep learning data engineering is tough. Data engineers write large dataset files down to archive storage for training, creating a number of problems associated with rigid data management:
If model training requires data variation, new datasets are written down to storage, often creating redundant data because datasets use overlapping training example data
Because conventional datasets are not embedded with training code, it can often be difficult to reproduce training models as data and code continue to evolve independently
With the DataEngine, VAST is introducing a new concept called the VAST DataSet. This new approach to data management leverages the VAST DataBase to create materialized views of example data without copying and re-copying data into blunt data containers. DataSets can scale to exabytes. Each DataSet includes an indexed set of examples and the code used for training so that it’s easy to reproduce models on the fly.
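To make the idea concrete, here is a minimal sketch of a dataset defined as an index over shared examples plus the training code, rather than as a copied file. It is an illustration under stated assumptions, not the VAST DataSet interface: `ExampleStore`, `DataSetView`, and `fingerprint` are hypothetical names, and an in-memory dictionary stands in for the database.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical model: ExampleStore and DataSetView are illustrative names,
# not VAST DataSet APIs.


class ExampleStore:
    """Single shared copy of every training example, keyed by ID."""

    def __init__(self):
        self.rows = {}

    def put(self, example_id, features, label):
        self.rows[example_id] = (features, label)


@dataclass(frozen=True)
class DataSetView:
    """A dataset defined by an index of example IDs plus the training code."""

    store: ExampleStore
    example_ids: tuple
    training_code: str

    def examples(self):
        # Resolve IDs against the shared store at read time; nothing is copied.
        return [self.store.rows[i] for i in self.example_ids]

    def fingerprint(self):
        # Hash of the index and the code, so a training run can be identified later.
        digest = hashlib.sha256()
        digest.update(repr(self.example_ids).encode())
        digest.update(self.training_code.encode())
        return digest.hexdigest()


store = ExampleStore()
for i in range(6):
    store.put(i, [i, i * 2], i % 2)

code = "model.fit(examples, epochs=3)"
variant_a = DataSetView(store, (0, 2, 4, 5), code)
variant_b = DataSetView(store, (1, 3, 4, 5), code)

# Examples 4 and 5 appear in both variants but are stored only once.
shared = sorted(set(variant_a.example_ids) & set(variant_b.example_ids))
print(shared, len(store.rows))
print(variant_a.fingerprint() != variant_b.fingerprint())
```

Two dataset variants with overlapping examples reference the same stored rows, and each variant carries a fingerprint that ties its example index to the code used for training.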
A Global Execution Environment
The VAST DataEngine is built on a container framework that allows for services to be globally executed across the VAST DataSpace.
Real-time Insights, Continuous AI Training and Smarter Global Workflows
Optimize Data Operations
Transform operations by automating data-driven workflows across global environments. With the VAST DataEngine, data operations are seamlessly managed from ingestion to action.
Event Triggers
The VAST DataEngine can utilize event triggers, enabling the system to act on data in pre-defined ways.
A Collection of Built-in Functions
Perform functions created or provided by VAST customers and orchestrated by the VAST DataEngine to deliver additional data value.
Kafka-Compatible Broker
Accept Kafka APIs, storing each topic as a table in the VAST DataStore and each message as a row/record in that table.
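The topic-as-table, message-as-row mapping can be sketched with an in-memory SQLite database. This is a stand-in for illustration only: `TableBackedBroker` is a hypothetical class, it does not speak the Kafka wire protocol, and its `seq` column only plays a role analogous to a Kafka offset (SQLite numbering starts at 1, whereas Kafka offsets start at 0).

```python
import sqlite3

# Hypothetical stand-in: TableBackedBroker is illustrative, uses SQLite only
# for demonstration, and is not a VAST or Kafka API.


class TableBackedBroker:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")

    @staticmethod
    def _table(topic):
        # Quote the identifier so topic names containing dots or dashes are valid.
        return '"' + topic.replace('"', '""') + '"'

    def produce(self, topic, key, value):
        table = self._table(topic)
        self.db.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(seq INTEGER PRIMARY KEY AUTOINCREMENT, msg_key TEXT, msg_value TEXT)"
        )
        cursor = self.db.execute(
            f"INSERT INTO {table} (msg_key, msg_value) VALUES (?, ?)", (key, value)
        )
        self.db.commit()
        return cursor.lastrowid  # position of the new row in the topic's table

    def consume(self, topic, from_seq=1):
        table = self._table(topic)
        return self.db.execute(
            f"SELECT seq, msg_key, msg_value FROM {table} "
            "WHERE seq >= ? ORDER BY seq",
            (from_seq,),
        ).fetchall()

    def query(self, sql):
        # Because messages are rows, they can also be analyzed with plain SQL.
        return self.db.execute(sql).fetchall()


broker = TableBackedBroker()
broker.produce("sensor.readings", "s1", "21.5")
broker.produce("sensor.readings", "s2", "19.0")
broker.produce("sensor.readings", "s1", "22.1")

tail = broker.consume("sensor.readings", from_seq=2)
counts = broker.query(
    'SELECT msg_key, COUNT(*) FROM "sensor.readings" GROUP BY msg_key ORDER BY msg_key'
)
print(tail)
print(counts)
```

The point of the design is visible in the last call: once each message is a row, the same data serves both streaming consumers and analytical queries without a separate copy.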
A Global Execution Engine
Learn where data is located and optimize performance by moving function execution closer to previously accessed data.
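One simple way to picture locality-aware placement is a scheduler that records which site accessed which data and then runs a function at the site with the most prior accesses. The sketch below is an assumption-laden illustration, not the DataEngine's actual placement logic: `LocalityScheduler`, its methods, and the site names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: LocalityScheduler and the site names are illustrative
# and do not describe the DataEngine's real scheduling policy.


class LocalityScheduler:
    def __init__(self, default_site):
        self.default_site = default_site
        self.access_log = defaultdict(Counter)  # data path -> site -> access count

    def record_access(self, path, site):
        self.access_log[path][site] += 1

    def place(self, paths):
        """Pick the site that has accessed the requested paths most often."""
        votes = Counter()
        for path in paths:
            votes.update(self.access_log.get(path, {}))
        if not votes:
            return self.default_site  # no history yet: fall back to the default
        # Break ties by site name so placement is deterministic.
        return min(votes, key=lambda site: (-votes[site], site))


scheduler = LocalityScheduler(default_site="cloud-east")
scheduler.record_access("/video/cam1", "edge-paris")
scheduler.record_access("/video/cam1", "edge-paris")
scheduler.record_access("/video/cam1", "cloud-east")
scheduler.record_access("/video/cam2", "edge-paris")

known = scheduler.place(["/video/cam1", "/video/cam2"])
unknown = scheduler.place(["/new/data"])
print(known, unknown)
```

A production scheduler would also weigh factors such as data size, available compute and network cost. Access counts are used here only because the text mentions previously accessed data.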
A New Approach to Data Management
The VAST DataSet leverages the VAST DataBase to create materialized views of example data without copying and re-copying data into blunt data containers.
Innovation begins with understanding
The VAST DataEngine Explained
The compute engine of the VAST Data Platform, the VAST DataEngine brings insights to life by adding functions and triggers to data.
The VAST Data Platform Explained
Learn how VAST's revolutionary DASE architecture defies all conventional definitions of data platforms, delivering all-flash performance at archive economics to simplify the data center and accelerate all modern applications.