Thought Leadership
Oct 16, 2025

From DeepMind to the Frontier of Machine Autonomy


Authored by

Nicole Hemsoth Prickett

Laurent Sifre, DeepMind veteran behind AlphaGo and Chinchilla, is building H Company’s Surfer H, an open, efficient agent that learns to use computers like we do.

Laurent Sifre was at the heart of it all, inside the walls of a place that had already rewritten the laws of what was computationally possible. 

Sifre spent almost a decade at DeepMind, where the words AlphaGo, AlphaFold, and Chinchilla became shorthand for entirely new paradigms of reasoning. When he talks about the culture there, he calls it a research lab that thought in strike teams, not papers. 

“Usually in research you write a paper every few months, send it to a conference, and have a small incremental impact,” he said. “At DeepMind we would form teams of ten or twenty people, work on a single problem in secret for a year or two, and aim for a Nature paper.”

The goals were monumental and often achieved. AlphaGo shattered the assumptions of human intuition. AlphaFold turned protein structure prediction into a solved problem. AlphaStar brought reinforcement learning to real-time strategy. And Chinchilla quietly recalibrated how the entire industry thought about scale.

Chinchilla was his proof that the model-size arms race had gone astray. After GPT-3, the world chased parameter count as if it were horsepower. Companies trained ever-larger models on the same static dataset of roughly three hundred billion tokens.

“We did the same internally,” he said in a recent interview, “and it didn’t really work very well.”

But what did work was a new proportionality between compute, tokens, and parameters. That equation became the Chinchilla scaling law. It was a rediscovery of balance, and it defined a generation of more data-efficient models across the field.
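To make the balance concrete, the paper’s rounded rules of thumb put training compute at roughly C ≈ 6·N·D FLOPs, with about twenty training tokens per parameter. That is enough to sketch the compute-optimal allocation in a few lines of Python; the constants below are the approximations commonly quoted from Hoffmann et al. (2022), not exact fit values.

    # Back-of-the-envelope Chinchilla allocation, using the rounded
    # rules of thumb C ~= 6*N*D and D/N ~= 20 from Hoffmann et al. (2022).
    def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
        tokens_per_param = 20.0
        n = (compute_flops / (6 * tokens_per_param)) ** 0.5  # optimal parameter count
        d = tokens_per_param * n                             # optimal token count
        return n, d

    # At ~5.7e23 FLOPs this lands near Chinchilla itself: ~70B params, ~1.4T tokens.
    n, d = chinchilla_optimal(5.7e23)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")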

That instinct for efficiency has followed him into his next act. Early in 2024, Sifre left Google to co-found H Company, where he now serves as Chief Technology Officer.

The company is building what he calls computer-use agents: software that interacts with interfaces the same way people do. “They perceive the screen, they click, scroll, type, and zoom,” he explained. “The reason this matters is that much of the world’s software legacy isn’t accessible through APIs.”

That single constraint, Sifre argues, defines the next bottleneck in automation.

Corporate data is no longer opening outward. Slack, he noted, has already begun limiting access to its search APIs to promote its own licensed agents. More companies are expected to follow, guarding structured data the way they guard source code.

“Computer-use agents have an advantage,” he said, “because they operate like humans and you can’t restrict humans.” It is an inversion of the API economy. Where the last decade focused on structured calls, the next one will depend on perceptual dexterity.

To make that possible, H Company builds smaller, domain-specific models instead of a single general one.

Their flagship series, Holo 1, specializes in recognizing and localizing user-interface elements. Their latest release, Holo 1.5, reaches state-of-the-art precision for identifying buttons, forms, and content areas within complex graphical environments.

These models feed the agents that navigate between legacy systems, executing tasks without depending on brittle API wrappers. The goal is functional autonomy rather than linguistic showmanship.

The company’s approach to commercialization borrows from Sifre’s colleague and CEO, Gautier Cloix, who built Palantir’s French operations.

Rather than selling access by the token, H Company sells value. “If you charge per million tokens, you end up competing on price,” Sifre said. “It’s a much better conversation when you focus on the value created for the customer and take a fraction of that.”

Building an agent that manipulates a browser is nothing like training a model that writes code.

“The environment is messy and slow,” he said. “You click something and the page takes seconds to load. If you hammer the same site from a single IP you get blocked. You get CAPTCHAs. Everything can go wrong.”

To survive in that environment, their models learn through reinforcement. Agents generate trajectories (sequences of actions and results) that are then optimized for success rather than imitation.

It is end-to-end learning, a cycle that depends on careful orchestration between actors producing data and learners refining the model. “You have to coordinate both sides at scale,” he said.
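In outline, the cycle looks something like the sketch below, with invented stand-ins for the policy and environment. The real system is distributed, but the division of labor is the same: actors roll out trajectories and score them by task success, while learners consume batches and update the weights.

    # Schematic actor/learner cycle (the policy and env objects here are
    # hypothetical stand-ins, not H Company's interfaces).
    from collections import deque

    replay = deque(maxlen=10_000)  # trajectories waiting for the learner

    def actor_step(policy, env, task):
        trajectory, obs = [], env.reset(task)
        while not env.done():
            action = policy.act(obs)           # click, scroll, type, ...
            obs = env.step(action)             # observe the page's response
            trajectory.append((action, obs))
        replay.append((trajectory, env.success()))  # reward = task success

    def learner_step(policy, batch_size=64):
        if len(replay) >= batch_size:
            batch = [replay.popleft() for _ in range(batch_size)]
            policy.update(batch)   # optimize for success, not imitation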

This kind of training demands infrastructure that behaves less like a lab and more like a miniature hyperscaler. H Company runs clusters of GPUs tied to high-capacity disk arrays. They use TRL, a reinforcement-learning library, to manage training and vLLM for inference.

Every few iterations, model checkpoints are written to networked storage, where they can be accessed by thousands of concurrent processes.
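As a rough illustration of how those pieces fit together, TRL’s GRPO trainer can generate rollouts through vLLM and periodically write checkpoints to a shared path. The model name, reward function, dataset, and checkpoint path below are placeholders, not H Company’s configuration.

    # Hedged sketch: TRL's GRPOTrainer with vLLM-backed generation and
    # periodic checkpoints to networked storage. All specifics are assumed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    def task_success_reward(completions, **kwargs):
        # Toy reward: credit trajectories that report task completion.
        return [1.0 if "TASK_COMPLETE" in c else 0.0 for c in completions]

    config = GRPOConfig(
        output_dir="/mnt/shared/checkpoints",  # networked storage (assumed mount)
        save_steps=50,                         # checkpoint every few iterations
        use_vllm=True,                         # rollouts served by vLLM
        num_generations=8,                     # trajectories sampled per prompt
    )

    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-7B-Instruct",      # stand-in open-weights model
        reward_funcs=task_success_reward,
        args=config,
        train_dataset=load_dataset("trl-lib/tldr", split="train"),
    )
    trainer.train()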

In this context Sifre mentioned VAST Data, noting that the company’s file and object architecture can handle both NFS and S3 protocols in a unified system, a capability that matters when hundreds of learners are streaming and mixing data shards simultaneously.

“Each learner reads different permutations of files and keeps many open at the same time,” he explained. “That’s where you need very strong storage infrastructure, like VAST.”
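The access pattern he is describing is easy to caricature in code: every learner shuffles the same set of shard files into its own order and keeps a fan-in of them open at once, interleaving reads. The paths and shard layout below are invented for illustration.

    # Toy illustration of per-learner shard permutation with many open files.
    import random
    from pathlib import Path

    SHARD_DIR = Path("/mnt/shared/trajectories")  # assumed NFS mount

    def stream_shards(learner_seed: int, fan_in: int = 16):
        shards = sorted(SHARD_DIR.glob("shard-*.jsonl"))
        rng = random.Random(learner_seed)
        rng.shuffle(shards)                            # per-learner permutation
        handles = [open(s) for s in shards[:fan_in]]   # many files open at once
        while handles:
            h = rng.choice(handles)                    # interleave reads across shards
            line = h.readline()
            if line:
                yield line
            else:
                h.close()
                handles.remove(h)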

What H Company builds depends on low-latency file access, heavy concurrency, and reliable checkpointing, exactly the conditions that collapse on ordinary network storage once scale arrives.

In the kind of continuous reinforcement regime Sifre envisions, every actor writes data while every learner reads it. The system lives or dies by its bandwidth and its ability to parallelize without collision. The underlying architecture determines the pace of discovery.

H Company’s compute resides partly on AWS and partly on Scaleway, a French cloud provider located only a short walk from their Paris office. “We thought it was important to have some of the compute located in France,” he said. “Some of our future customers will care that their data never crosses borders.”

Sovereign control is becoming as much a selling point as speed. In Europe, the preference for regional clouds is moving from regulation to expectation, especially among banks and healthcare organizations.

While the company continues to refine its internal systems, it is preparing a public platform that will let developers create accounts, launch agents, and monitor performance through dashboards. The system will show which tasks complete successfully, which fail, and what data those agents produce.

The first iteration focuses on web automation, but the roadmap extends to desktop and mobile environments. It is, in effect, an early layer for computer-use intelligence.

Sifre says some enterprise clients have asked for one hundred thousand discrete tasks per day, each requiring its own inference loop. Meeting that demand at a viable cost means finding ways to cache results, reuse partial computations, and reserve the most advanced models for when they are truly necessary.

“Even small models become expensive at that volume,” he said. Caching, when done carefully, can absorb much of that load, but only if evaluation frameworks are robust enough to detect when reuse begins to degrade accuracy.
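The shape of that idea, under an assumed interface, is a cache keyed by the normalized task plus a small audit fraction that is re-executed so the evaluation layer can catch reuse going stale. The run_agent callable here is hypothetical.

    # Sketch of result caching with a drift audit; interfaces are assumed.
    import hashlib
    import random

    cache: dict[str, str] = {}
    AUDIT_RATE = 0.02  # re-run ~2% of cache hits to check for staleness

    def cached_run(task: str, run_agent) -> str:
        key = hashlib.sha256(task.strip().lower().encode()).hexdigest()
        if key in cache:
            if random.random() < AUDIT_RATE:
                fresh = run_agent(task)        # spot-check the cached answer
                if fresh != cache[key]:        # signal for the eval framework
                    cache[key] = fresh
            return cache[key]
        result = run_agent(task)
        cache[key] = result
        return result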

The product now emerging from this work is Surfer H, a cost-efficient web agent built on open weights and modular design. It separates intelligence into three parts: a Policy that decides, a Localizer that perceives, and a Validator that checks success.

Surfer H runs purely through the browser, mirroring the way people act online. On benchmarks such as WebVoyager, which covers hundreds of real-world tasks, the system shows that small, specialized models can outperform larger closed ones when efficiency is measured in accuracy per dollar. With the Holo 1-7B model it achieves more than ninety-two percent task accuracy at roughly thirteen cents per task, defining a new Pareto edge for agentic performance.
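That division of labor maps naturally onto a simple control loop. The interfaces below are invented for illustration; Surfer H’s actual components are the open-weights models H Company publishes, not this toy code.

    # Schematic Policy/Localizer/Validator loop with hypothetical interfaces.
    from dataclasses import dataclass

    @dataclass
    class Action:
        kind: str      # "click", "type", "scroll", ...
        target: str    # element description, e.g. "search button"
        x: int = 0
        y: int = 0

    def run_episode(task, policy, localizer, validator, browser, max_steps=30):
        for _ in range(max_steps):
            screenshot = browser.screenshot()
            action = policy.next_action(task, screenshot)    # Policy decides
            if action.target:
                action.x, action.y = localizer.locate(       # Localizer perceives
                    screenshot, action.target)
            browser.execute(action)
            if validator.is_done(task, browser.screenshot()):  # Validator checks
                return True
        return False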

The pattern behind all this work traces back to the same discipline that produced Chinchilla. The idea is to find balance between size, data, and outcome, whether in a transformer model or an embodied agent clicking through a website.

When asked about the future, Sifre described it as a shift from episodic training to continuous learning, where actors, learners, and storage move together without pause. It is a machine that never sleeps, growing by doing. The pieces, he thinks, are already in place: smaller specialized models, infrastructure capable of sustaining the traffic, and a clear understanding of where inference remains fragile.
