In this latest episode of the Shared Everything podcast, we sit down with Norm Marks, VP of automotive at NVIDIA, to chat through the technical and infrastructure shifts reshaping all things autonomous, from vehicles to robotics and industrial AI systems.
Marks brings nearly three decades of experience in automotive software, having worked across engineering, manufacturing, customer experience, and analytics as the industry evolved from early business intelligence systems to predictive analytics, machine learning, and now generative and agentic AI.
What stands out to him today is not simply the capability of the technology but the speed at which it is spreading across the enterprise.
“The speed at which this has transformed from Gen AI and chatbots to now full blown agents… the pace has been dizzying,” he tells us.
Marks walks us through the progression of autonomous driving in three stages. Early systems, what he now refers to as AV 1.0, focused on detection. Sensors identified lane markers, nearby vehicles, and driver attention, enabling things like adaptive cruise control, braking assistance, and in-cabin monitoring systems.
The current generation of autonomous systems operates at a more advanced level, he says, shifting from detection to prediction. Instead of simply identifying a pedestrian standing near the road, the system attempts to predict whether that pedestrian will step into traffic and adjusts behavior accordingly. NVIDIA now describes the next phase as AV 3.0, where reasoning models evaluate context and potential outcomes in a way that more closely mirrors how human drivers interpret complex traffic situations.
Those architectural shifts dramatically increase the computational scale required to train autonomy models. Early training environments might have required clusters of roughly 10,000 GPUs. Predictive autonomy models increased that requirement to around 40,000 GPUs, while reasoning-based systems can now require 80,000 GPUs or more in large-scale training environments.
Much of that scale is, of course, driven by data. Autonomous driving systems depend on vast quantities of real-world driving data captured from vehicles operating in the field. But real-world data alone cannot provide enough examples of rare or dangerous edge cases. As a result, simulation and synthetic data generation have become essential components of modern training pipelines. Engineers can capture a single real-world driving scenario and generate thousands of variations of that environment under different weather conditions, traffic patterns, lighting environments, or behavioral outcomes.
Depending on the maturity of the program, synthetic data may represent a substantial portion of the training dataset. Companies with large fleets collecting real-world driving data may rely on synthetic generation primarily for rare events, while newer programs may rely on synthetic data for more than half of their training inputs.
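To make the variation idea concrete, here is a toy sketch (not NVIDIA's actual pipeline; all scenario names and condition lists are invented for illustration) of how one captured driving scene can fan out into many synthetic training variants by sweeping environmental conditions:

```python
import itertools

# One real-world capture, reduced to a minimal description.
BASE_SCENARIO = {
    "location": "urban_intersection",
    "actors": ["pedestrian_near_crosswalk", "oncoming_vehicle"],
}

# Condition axes to sweep; real pipelines vary far more dimensions.
WEATHER = ["clear", "rain", "fog", "snow"]
LIGHTING = ["day", "dusk", "night"]
TRAFFIC = ["light", "moderate", "heavy"]
PEDESTRIAN_BEHAVIOR = ["waits", "steps_into_traffic"]

def generate_variants(base):
    """Yield one synthetic scenario per combination of conditions."""
    for weather, lighting, traffic, behavior in itertools.product(
        WEATHER, LIGHTING, TRAFFIC, PEDESTRIAN_BEHAVIOR
    ):
        yield {
            **base,
            "weather": weather,
            "lighting": lighting,
            "traffic": traffic,
            "pedestrian_behavior": behavior,
        }

variants = list(generate_variants(BASE_SCENARIO))
print(len(variants))  # 4 * 3 * 3 * 2 = 72 variants from one capture
```

Even this four-axis toy turns a single capture into 72 scenarios; production simulators that also vary sensor noise, actor trajectories, and road geometry multiply that into the thousands of variations Marks describes.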
“It’s creating a situation where it’s not just a storage and a data gravity issue… it’s leading to hybrid deployments,” Marks explains. “So you might be all in on-prem at your headquarters, but then you want to use the cloud in another geography where you only need a smaller data set.”
The result is an enormous expansion in both data volume and infrastructure complexity. Training pipelines increasingly span multiple geographies as manufacturers adapt models for different markets.
As Marks says, a company may train core models using large on-premises clusters at headquarters, while deploying additional training environments in regional cloud deployments where localized driving datasets are processed. He adds that these deployments are increasingly hybrid, shaped by data gravity, regulatory considerations, and the need to train models close to where data is collected.
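The placement logic behind that hybrid pattern can be sketched as a simple rule: train where the bulk of the data already lives, and use in-region cloud capacity for smaller or residency-restricted datasets. The threshold and labels below are purely illustrative assumptions, not anything Marks specifies:

```python
# Assumed cutoff for when a dataset is heavy enough that moving it
# costs more than training next to it ("data gravity"); illustrative only.
ON_PREM_THRESHOLD_TB = 500

def placement(region, dataset_tb, residency_restricted):
    """Pick a training location for a regional dataset."""
    if residency_restricted:
        # Regulation forces training inside the data's home geography.
        return f"in-region cloud ({region})"
    if dataset_tb >= ON_PREM_THRESHOLD_TB:
        # Large datasets gravitate to the big headquarters cluster.
        return "on-prem headquarters cluster"
    # Small localized datasets: cheaper to burst to nearby cloud.
    return f"regional cloud ({region})"

print(placement("eu-west", 1200, residency_restricted=True))
# in-region cloud (eu-west)
```

The point of the sketch is that placement is decided per dataset, not per company, which is why a single training program ends up spanning on-prem and multiple cloud regions at once.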
Autonomy training is only one part of that broader architecture. Automotive engineering environments now combine traditional high performance computing workloads such as computational fluid dynamics with digital twin environments used to simulate vehicle behavior and manufacturing systems. Those digital twins increasingly feed into AI pipelines capable of optimizing factory operations in real time.
The infrastructure supporting these environments often involves multiple GPU architectures optimized for different workloads. One cluster may accelerate CAE simulations, another may render digital twin environments, and a third may handle large-scale autonomy training or robotics development. The result is a layered computational environment in which data, simulation, and AI training pipelines operate simultaneously across distributed GPU clusters.
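That layered setup can be pictured as a simple workload router: separate GPU pools specialized per workload, with jobs dispatched to the matching pool. The pool names and workload labels are hypothetical, not an NVIDIA reference design:

```python
# Hypothetical mapping of workload types to specialized GPU pools.
CLUSTERS = {
    "cae": "hpc-pool-a",            # CFD / CAE simulation
    "digital_twin": "viz-pool-b",   # digital twin rendering
    "autonomy": "train-pool-c",     # large-scale AV / robotics training
}

def route(job_type: str) -> str:
    """Return the GPU pool responsible for a given job type."""
    try:
        return CLUSTERS[job_type]
    except KeyError:
        raise ValueError(f"no cluster handles workload {job_type!r}")

print(route("digital_twin"))  # viz-pool-b
```

The design choice the sketch reflects is that each pool's hardware and interconnect are tuned to its workload, so the scheduler routes by job type rather than treating the GPUs as one undifferentiated pool.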
That same pattern is now extending beyond vehicles themselves. Robotics systems and humanoid platforms are increasingly trained using similar techniques, combining real-world data capture with large-scale synthetic environments. Marks notes that training traditional industrial robots may have required only a few thousand GPUs in the past, while humanoid robotics systems can require clusters of ten thousand GPUs or more.
Across all of these systems, the underlying challenge is increasingly architectural rather than algorithmic. As Marks explains during the discussion, organizations are effectively building AI factories: infrastructure platforms designed to continuously train, simulate, and deploy intelligent systems across global environments.
The models may receive the most attention, but the real engineering challenge lies in building the systems capable of feeding them.



