Nov 7, 2025

Running Lightning-Fast Functions and AI Workloads on Kubernetes


Authored by Simon Golan, Senior Solutions Engineer, and Ram Bansal, Developer Advocate

Artificial intelligence is changing how we think about data infrastructure.

Pipelines are evolving into agentic workflows, and cracks are forming in the collection of siloed legacy systems we recently took for granted. As with most big technology shifts — like the move from the client-server world to the as-a-service world — these architectural changes ultimately boil down to performance and complexity, although AI adds some new wrinkles, the biggest being the introduction of generative AI models into the application stack.

Since the early 2000s, the web has subsumed just about everything in its path, and complexity crept into data architectures at each step along the way. Modern web applications often deploy disparate systems for data ingestion, ETL, data processing, data warehousing, transactional databases, analytic databases, and more, all managed independently and stitched together into complex pipelines. While Kubernetes has simplified the orchestration of these disparate data systems (as well as the rest of the app stack), the systems themselves typically remain siloed.

The VAST Data AI OS breaks down those silos while delivering an unparalleled combination of scalability and performance. It’s built for today’s data-driven web applications and tomorrow’s agentic AI workflows alike. In this post, we’ll touch on a key component — the VAST DataEngine — and then demonstrate how it enables end-to-end agentic workloads on Kubernetes.

A visualization of the components and processes for executing these demos.

A single system for AI and data-driven applications

Although the VAST story began with a novel architecture for high-performance, exabyte-scale storage, the plan was always to become the data layer for data-intensive applications, including AI. And with our latest release, we have all the primary pieces in place:

  • High-performance, multitenant storage layer (block, file, and object)
  • Kafka-compatible event broker (up to 1 million events/s per compute node)
  • Built-in, high-performance databases (transactional, analytical, and vector)
  • Embedded/managed Trino and Spark
  • Serverless functions
  • Data and compute orchestration
  • GPUDirect support for fast data paths to NVIDIA GPUs

This lets users run data-driven applications with minimal moving parts and operational overhead. Better yet, because they take advantage of VAST’s foundational data platform, our data services achieve extremely high performance for use cases such as real-time RAG pipelines.

Compared with traditional data-intensive applications, AI workloads (training, reinforcement learning, and AI agents) really are a new beast. Yes, running AI applications in production still requires interacting with and utilizing existing data systems. However, how they do so can be fundamentally different, and the forthcoming wave of agentic workflows and pipelines will continue to change things.

But because we want to get to the demo, we’ll leave it at this: AI workloads require lightning-fast data processing, movement, and access, and the fastest possible connection with the GPU compute layer. In addition to the operational costs of maintaining them, the types of distributed data pipelines common for large-scale web workloads introduce too much latency and too many failure points for complex, real-time, and potentially mission-critical AI applications.

If you want more details on the architectural considerations for running AI workloads, we’ve shared some links at the bottom of this post.

Demo: Real-time video analysis using VAST DataEngine on Kubernetes

For this demo, we’ve built two simple agentic pipelines to analyze real-time video streams. VAST DataEngine handles much of the work, although users must still write their own serverless functions and configure their own RAG pipeline using their preferred video-language model (VLM) and embedding model. The resulting vectors are inserted into the VAST vector database for real-time action and insights.

Not highlighted here, but nonetheless important, are DataEngine’s observability features. They work like other observability stacks, but within the same UI and without the need for external Grafana, tracing systems, or other complex infrastructure. Logs and traces belong to the user’s application code, but building functions with the VAST runtime allows this telemetry to flow into the VAST UI.


1. Write and containerize your serverless functions

After you’ve written your serverless functions, you can package and deploy them to the VAST runtime environment using either the CLI or the GUI.

Then, set your triggers. In this case, the initial trigger fires whenever a new video file hits our VAST S3 bucket: DataEngine’s event broker publishes it to a topic we’ve labeled video-events and invokes a function we’ve labeled video-segmenter, which cuts the video into 5-second segments and sends them to a different S3 bucket. That bucket, in turn, fires another event-based trigger. A rough sketch of such a segmenter function follows below.
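To make that concrete, here’s a minimal sketch of what a segmenter function might look like in Python. The handler signature, event shape, endpoint, and bucket names are illustrative assumptions, not the actual VAST runtime API; the segmentation itself is plain ffmpeg.

```python
# Illustrative sketch only: the handler signature, event fields, endpoint,
# and bucket names are assumptions, not the actual VAST runtime API.
import subprocess
import tempfile
from pathlib import Path

import boto3  # S3-compatible access to the VAST object store

s3 = boto3.client("s3", endpoint_url="https://vast.example.com")  # hypothetical endpoint

def handler(event):
    """Invoked for each new video object landing in the incoming bucket."""
    bucket, key = event["bucket"], event["key"]
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / Path(key).name
        s3.download_file(bucket, key, str(src))

        # Cut the video into 5-second segments with ffmpeg (stream copy, no re-encode).
        subprocess.run(
            ["ffmpeg", "-i", str(src), "-c", "copy", "-f", "segment",
             "-segment_time", "5", "-reset_timestamps", "1",
             str(Path(tmp) / "segment_%04d.mp4")],
            check=True,
        )

        # Upload each segment to the bucket whose events fire the next trigger.
        for seg in sorted(Path(tmp).glob("segment_*.mp4")):
            s3.upload_file(str(seg), "video-segments", f"{Path(key).stem}/{seg.name}")
```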

2. Build your pipelines

That simple function-and-pipeline combination is the basis for two distinct application pipelines. These include several other triggers and functions, some fully managed by the VAST platform and others external NVIDIA NIM microservices — but all running as containers on your Kubernetes cluster.

The first pipeline grabs those 5-second clips as they hit the S3 bucket and sends them to a VLM that reasons about what’s happening in each clip, generating a text-based summary. Those results are then sent to an embedding model that vectorizes the data. From there, the pipeline inserts the vectors and metadata into the VAST DataBase for fast RAG retrieval, log analysis, and other downstream uses.
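As a rough sketch of that per-clip flow, the snippet below calls a VLM and an embedding model through OpenAI-compatible endpoints (the interface NVIDIA NIMs expose) and hands the result to a stand-in insert function. The endpoints, model names, the video message format, and the insert_vectors() helper are all assumptions for illustration.

```python
# Illustrative sketch of the per-clip flow. Endpoints, model names, the
# video message format, and insert_vectors() are assumptions, not the
# actual pipeline code.
from openai import OpenAI

vlm = OpenAI(base_url="http://vlm-nim:8000/v1", api_key="not-used")         # hypothetical VLM NIM
embedder = OpenAI(base_url="http://embed-nim:8000/v1", api_key="not-used")  # hypothetical embedding NIM

def insert_vectors(table: str, rows: list[dict]) -> None:
    """Stand-in for the actual VAST DataBase vector insert call."""
    ...

def process_clip(clip_url: str, clip_id: str) -> None:
    # 1. Ask the VLM to reason about what's happening in the 5-second clip.
    summary = vlm.chat.completions.create(
        model="vlm-model",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this clip."},
                {"type": "video_url", "video_url": {"url": clip_url}},  # assumes video input support
            ],
        }],
    ).choices[0].message.content

    # 2. Vectorize the text summary with the embedding model.
    vector = embedder.embeddings.create(
        model="embedding-model",  # placeholder model name
        input=summary,
    ).data[0].embedding

    # 3. Insert vector plus metadata for fast retrieval.
    insert_vectors("video_segments",
                   [{"clip_id": clip_id, "summary": summary, "vector": vector}])
```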

For the second pipeline, we’ve set up an agent to run every 5 minutes, scan the video-segment database for any new inserts, and then run similarity search against their descriptions.
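Here’s a sketch of what that agent’s body might look like. The scheduling itself is handled by a DataEngine trigger; fetch_rows(), similarity_search(), handle_alert(), and the match threshold are all stand-ins for the real calls and tuning.

```python
# Illustrative sketch of the scheduled agent. The scheduling itself is a
# DataEngine trigger; fetch_rows(), similarity_search(), handle_alert(),
# and the threshold are stand-ins for the real calls and tuning.
from datetime import datetime, timedelta, timezone

def fetch_rows(table: str, inserted_after: datetime) -> list[dict]:
    """Stand-in for the actual DataBase query for recent inserts."""
    return []

def similarity_search(vector: list[float], against: str, top_k: int) -> list[dict]:
    """Stand-in for the actual vector similarity search call."""
    return []

def handle_alert(row: dict, match: dict) -> None:
    """Stand-in for the downstream action (alert, message, etc.)."""

def scan_new_segments(event) -> None:
    """Runs every 5 minutes via a scheduled trigger."""
    since = datetime.now(timezone.utc) - timedelta(minutes=5)
    for row in fetch_rows("video_segments", inserted_after=since):
        # Compare each new clip description against reference conditions
        # (e.g., "smoke or flames visible") via vector similarity.
        for match in similarity_search(row["vector"], against="alert_conditions", top_k=3):
            if match["score"] > 0.8:  # illustrative threshold
                handle_alert(row, match)
```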

Note that while we use the VAST interactive canvas for this demo, you also can set up these pipelines via the CLI.

3. Connect to your application

The next step is to integrate these pipelines with your application. Here, we show the UI backend of a hypothetical application connected to a real-time video feed (although, for the sake of simplicity, we’re connecting to a YouTube video). Once that’s set up, you can see video traffic immediately hitting the VAST system.
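If you want to reproduce the feed side of this setup, one simple approach (not necessarily what our demo backend does) is to resolve the YouTube stream with yt-dlp and let ffmpeg write rolling chunks that you then upload to the VAST S3 bucket. The bucket name and chunk length below are assumptions.

```python
# One way to feed a YouTube stream into the pipeline; not necessarily how
# the demo backend does it. Chunk length and upload target are assumptions.
import subprocess

VIDEO_URL = "https://www.youtube.com/watch?v=..."  # placeholder

# Resolve the direct media URL for the stream.
stream_url = subprocess.run(
    ["yt-dlp", "-g", "-f", "best", VIDEO_URL],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Let ffmpeg write rolling 30-second chunks; each finished chunk is then
# uploaded to the VAST S3 bucket (e.g., with boto3), which fires the
# event-driven pipeline described above.
subprocess.run(
    ["ffmpeg", "-i", stream_url, "-c", "copy", "-f", "segment",
     "-segment_time", "30", "chunk_%04d.mp4"],
    check=True,
)
```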

4. Run your AI workflows

Finally, it’s time to see what these simple agentic workflows can do. In the first sample application, we use our reasoning pipeline to enable semantic search over our vector database for specific occurrences in our video dataset. Although we show a manual prompt for “a man walking with a red suitcase,” you could easily set up a trigger that takes an action or sends an alert whenever the VLM identifies a man with a red suitcase (see below). Or you could expose this via a chat interface to ask questions, such as “Is there a juice stand on West 34th St.?”
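Under the hood, that manual prompt boils down to embedding the query with the same model used at ingest and running a nearest-neighbor search. A minimal sketch, with the endpoint, model name, and search helper as assumptions:

```python
# Illustrative sketch of the query path. Endpoint, model name, and
# similarity_search() are assumptions, not the actual client API.
from openai import OpenAI

embedder = OpenAI(base_url="http://embed-nim:8000/v1", api_key="not-used")  # hypothetical NIM

def similarity_search(vector: list[float], table: str, top_k: int) -> list[dict]:
    """Stand-in for the actual vector database search call."""
    return []

def search_clips(prompt: str, top_k: int = 5) -> list[dict]:
    # Embed the prompt with the same model used at ingest so the query
    # and the stored summaries share one embedding space.
    query_vec = embedder.embeddings.create(
        model="embedding-model",  # placeholder
        input=prompt,
    ).data[0].embedding
    return similarity_search(query_vec, table="video_segments", top_k=top_k)

results = search_clips("a man walking with a red suitcase")
```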

For our similarity search pipeline, we’ve set up an example application that sends a text message to first responders when a fire is detected. Here, we also demonstrate some of the VAST DataEngine observability features by digging in to verify that the system did, in fact, identify a fire and that the SMS was sent.
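As one possible implementation of that alert leg (the demo doesn’t specify an SMS provider), here’s a sketch using Twilio’s Python client; the credentials, numbers, and row fields are placeholders.

```python
# One possible implementation of the SMS alert; the demo does not specify
# a provider. Credentials, numbers, and row fields are placeholders.
import os

from twilio.rest import Client

twilio = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])

def send_fire_alert(row: dict, match: dict) -> None:
    """Called by the agent when a new clip matches the fire condition."""
    twilio.messages.create(
        body=f"Possible fire detected: {row['summary']} (score {match['score']:.2f})",
        from_=os.environ["ALERT_FROM_NUMBER"],    # placeholder sender
        to=os.environ["FIRST_RESPONDER_NUMBER"],  # placeholder recipient
    )
```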

Hopefully, this gives you a good sense of how the VAST DataEngine supports real-time AI and agentic workflows running on Kubernetes. If you want to learn more, read the links below, dig into the entire VAST Data AI OS, and visit us at KubeCon NA in Booth #1741.

Additional reading

The Unified Operating System for Enterprise AI

Building Enterprise Infrastructure for AI Agents

Operationalizing Enterprise AI with Real-Time Pipelines and Vector Search

From Fragile to Flow: Supercharge Your Apache NiFi Data Pipelines with the VAST DataBase

Unleash Real-Time Data: Introducing the VAST DataBase Apache Flink Connector

Accelerating Inference

Introducing AgentEngine

Vectors, Vision, Velocity: VAST RAG Pipelines in Action

Introducing Trino on VAST: Real-Time SQL for AI-Ready Infrastructure
