Bosch is generating synthetic data to fine-tune LLMs for manufacturing, improving decision support in plants while ensuring humans remain in control.
In the controlled chaos of a manufacturing plant, where precision, efficiency, and safety are constantly negotiated against time and cost, the idea of training a large language model on the messy rhythms of human decision-making seems almost absurd, according to Alessandro Oltramari.
Oltramari is the President of the Carnegie Bosch Institute at Carnegie Mellon University and a Group Leader at Bosch Research. He’s spent the better part of two decades trying to reconcile symbolic reasoning with the statistical force of machine learning, from the early days of applied ontologies through the long arc of cognitive architectures like CMU-developed ACT-R.
Where much of the AI world is still content to pour more data into ever-larger models, Oltramari has been asking a simpler, harder question: what if the right data doesn’t exist at all?
That is the problem Bosch faces every day across its factories, from automotive systems to household appliances. Decision-making is still, and will likely remain, a human activity. Good old-fashioned human operators monitor lines, adjust machines, and intervene when sensors don’t tell the whole story.
These are not tasks that produce the kind of vast behavioral datasets you might find in psychology labs or online platforms. So rather than waiting for data that doesn’t exist, Oltramari and his colleagues started to generate it.
ACT-R is a cognitive architecture developed to model the mechanisms of human cognition: memory, attention, perception, control, and the like. Instead of collecting millions of examples of an operator adjusting a machine, Bosch researchers built an ACT-R model of the operator’s decision process. That model then produces what he calls synthetic traces: the reasoning path, the calls to memory, the triggering of procedural rules, all the things that factor into the weighing of uncertain signals.
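The idea can be pictured with a toy sketch. The rules, sensor names, and thresholds below are invented for illustration, not Bosch’s actual ACT-R models: a handful of production rules fire against sensor readings, and each decision records its reasoning trace alongside the chosen action.

```python
import json
import random

# Illustrative production rules, loosely inspired by ACT-R's procedural
# rules. Rule names, signals, and thresholds are hypothetical.
RULES = [
    # (name, condition, action)
    ("check_vibration", lambda s: s["vibration"] > 0.8, "slow_spindle"),
    ("check_temperature", lambda s: s["temperature"] > 90, "pause_line"),
    ("default_monitor", lambda s: True, "continue"),
]

def decide(sensors):
    """Fire the first matching rule and record the reasoning trace."""
    trace = []
    for name, condition, action in RULES:
        fired = condition(sensors)
        trace.append({"rule": name, "fired": fired})
        if fired:
            return action, trace
    return "continue", trace

def synthetic_trace(seed):
    """Produce one synthetic decision trace as a JSON-serializable record."""
    rng = random.Random(seed)
    sensors = {"vibration": rng.random(), "temperature": rng.uniform(60, 110)}
    action, trace = decide(sensors)
    return {"sensors": sensors, "reasoning": trace, "action": action}

if __name__ == "__main__":
    # Sweeping the seed yields decision traces at whatever scale is needed.
    print(json.dumps(synthetic_trace(0), indent=2))
```

A real cognitive model adds memory retrieval, timing, and noise on top of rules like these; the point of the sketch is only that the model, not the plant, is what generates the data.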
“In the plants we don’t have the same amount of human behavioral data that, you know, Binz and Schulz could find in the cognitive psychology literature, and that’s where we started thinking about using cognitive models of these specific tasks in the manufacturing facility to create synthetic data at scale that we could use to fine-tune the models,” he explained to a crowd at Arizona State University.
He adds that those traces become a kind of synthetic training set for large language models. And when Bosch fine-tuned an LLM with this data, the model’s performance on factory decision tasks improved dramatically compared to its pre-trained baseline.
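One plausible way such traces could be serialized into fine-tuning data is as prompt/completion pairs; the field names and wording here are hypothetical, not Bosch’s actual format.

```python
import json

def trace_to_example(record):
    """Turn a synthetic decision trace into a prompt/completion pair
    for supervised fine-tuning. Field names are illustrative."""
    sensors = ", ".join(f"{k}={v:.2f}" for k, v in record["sensors"].items())
    fired = [step["rule"] for step in record["reasoning"] if step["fired"]]
    prompt = f"Sensor readings: {sensors}. What should the operator do, and why?"
    completion = (f"Rules considered: {', '.join(fired)}. "
                  f"Recommended action: {record['action']}.")
    return {"prompt": prompt, "completion": completion}

# One record of the shape a cognitive model might emit:
record = {
    "sensors": {"vibration": 0.91, "temperature": 72.0},
    "reasoning": [{"rule": "check_vibration", "fired": True}],
    "action": "slow_spindle",
}
print(json.dumps(trace_to_example(record)))
```

Mapping many such records to JSON lines yields a conventional supervised fine-tuning set, with the model’s reasoning path preserved in the completion rather than just the final action.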
But the effectiveness of this approach depends on fidelity, he says. “This only works if the cognitive models that you create are able to replicate with high fidelity these human decisions and processes in reasoning that we observe in the plants.”
In industry, in other words, there’s no practical way to collect the necessary behavioral data across hundreds of plants, product lines, and contexts. The kinds of cognitive models he describes offer a shortcut, essentially creating synthetic minds standing in for human operators, producing data at a scale that reality cannot provide.
But here’s the thing. The results were encouraging but not magical.
As Oltramari is quick to point out, fine-tuning, even on rich synthetic data, doesn’t really change the reasoning architecture of a model.
“At the end of the day, if we think that human reasoning is something that emerges from a combination of knowledge, a combination of cognitive mechanisms, and grounding in the experiences of the world, then the mechanism of fine-tuning, which is very much like training, doesn’t add any capability in principle to the specific internal mechanisms of machine learning,” he says. “So why are we expecting better reasoning if we are only providing more data for the model to kind of enrich its own latent semantic space?”
Oltramari has seen this before. Earlier experiments fine-tuning LLMs on common sense knowledge bases produced improvements in specific tasks but left intact the deeper reasoning errors: failures in temporal logic, spatial reasoning, analogies, metaphors.
“We see errors in temporal reasoning, errors in spatial reasoning, errors in analogical reasoning, metaphorical reasoning,” he says. “All these different reasonings that have been studied profusely in the literature. So what we observe is that the knowledge we could find to fine-tune these models wasn’t sufficient to cure them from their hallucinations and mistakes.”
In the industrial context, the consequences of such errors are amplified. A misstep in a question-answering benchmark is an academic curiosity but a misstep in a factory can be dangerous, even catastrophic.
This is why Oltramari insists on neurosymbolic approaches, which are systems where symbolic verifiers or rule-based reasoners can clean up the contradictions that large models produce. He points to examples like DeepMind’s AlphaGeometry, where symbolic solvers vet the candidate solutions generated by a model, as the kind of hybrid thinking necessary for AI in the real world. “I think this is a very valuable approach and a very promising combination of symbolic reasoning and machine learning at scale,” he says.
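The generate-then-verify pattern he describes can be sketched roughly as follows, with invented constraint names standing in for real plant rules: a statistical model proposes ranked candidate actions, and a symbolic checker rejects any that violate hard constraints, deferring to a human when nothing safe remains.

```python
# Hedged sketch of generate-then-verify. Constraints are illustrative.
CONSTRAINTS = [
    ("no_speedup_when_hot", lambda state, action:
        not (state["temperature"] > 90 and action == "increase_speed")),
    ("pause_requires_empty_buffer", lambda state, action:
        not (action == "pause_line" and state["buffer_full"])),
]

def verify(state, action):
    """Return the list of constraints the proposed action violates."""
    return [name for name, ok in CONSTRAINTS if not ok(state, action)]

def select_action(state, candidates):
    """Keep the first candidate that passes every symbolic check."""
    for action in candidates:
        if not verify(state, action):
            return action
    return "escalate_to_operator"  # no safe candidate: defer to a human

state = {"temperature": 95, "buffer_full": True}
# Candidates as a model might rank them, best first:
print(select_action(state, ["increase_speed", "pause_line", "slow_spindle"]))
# → slow_spindle
```

The division of labor mirrors the AlphaGeometry example: the learned component supplies breadth and ranking, the symbolic component supplies guarantees, and neither is asked to do the other’s job.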
Even beyond decision traces, the industrial world is full of knowledge, but not the kind that easily fits into a language model.
Bosch, like every large manufacturer, has decades of schematics, spreadsheets, diagrams, knowledge graphs, and operator heuristics spread across formats and systems. There are parts lists, databases of tolerances, annotated circuit diagrams, and know-how passed from one worker to another.
“There is a lot of domain knowledge that has been collected and piled up over the years in different formats, from spreadsheets, databases, knowledge graphs, ontologies, and circuit diagrams to the tacit knowledge of the human operators,” he says. “All this knowledge is there and is very useful to bootstrap some of these technologies based on AI that we are developing.”
The challenge is that none of this is homogeneous, and none of it scales neatly into a single training set.
LLMs can be pointed at multimodal inputs to summarize or translate, but if they make errors in consolidation, those errors propagate downstream, poisoning the decision pipeline.
For Oltramari, this is where symbolic and semantic methods remain indispensable. Structured knowledge, grounded in ontologies, provides the scaffolding for AI to reason reliably in industrial settings.
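A minimal sketch of what ontology-grounded checking might look like, with an entirely invented schema: extracted facts are validated against expected types, allowed values, and ranges before they can enter a downstream pipeline, so that a consolidation error is caught rather than propagated.

```python
# Illustrative schema standing in for an ontology's type and range
# constraints. Property names, units, and bounds are hypothetical.
SCHEMA = {
    "part_tolerance_mm": {"type": float, "min": 0.0, "max": 5.0},
    "line_speed_rpm": {"type": (int, float), "min": 0, "max": 20000},
    "material": {"type": str, "allowed": {"steel", "aluminum", "polymer"}},
}

def validate_fact(key, value):
    """Return None if the fact conforms to the schema, else an error string."""
    spec = SCHEMA.get(key)
    if spec is None:
        return f"unknown property: {key}"
    if not isinstance(value, spec["type"]):
        return f"{key}: wrong type {type(value).__name__}"
    if "allowed" in spec and value not in spec["allowed"]:
        return f"{key}: {value!r} not in allowed set"
    if "min" in spec and not (spec["min"] <= value <= spec["max"]):
        return f"{key}: {value} out of range"
    return None

# Facts as an LLM consolidation step might emit them:
facts = {"part_tolerance_mm": 0.05, "line_speed_rpm": 125000, "material": "steel"}
errors = [e for k, v in facts.items() if (e := validate_fact(k, v))]
print(errors)  # → ['line_speed_rpm: 125000 out of range']
```

Real ontology tooling does far more (class hierarchies, relations, logical consistency), but even this flat version shows the role of structure: it gives the pipeline something to check generated content against.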
And it’s also why programs like the NSF’s Open Knowledge Network matter: they aim to federate disparate knowledge resources, turning fractured data into usable substrates. Without that foundation, fine-tuned LLMs risk becoming very confident systems built on very brittle ground.
But the irony is that the more automation creeps into factories, the more valuable the human operator becomes.
Oltramari says Bosch is not trying to replace humans in decision loops.
“In general, and this is not just Bosch, this is any company that focuses on hardware and mechanical engineering, there are a lot of tasks that we do that can be augmented by AI, but in a framework where humans are still the ones that make decisions,” he says. “It’s important to focus on human–machine collaboration rather than delegating completely.”
AI can support operators by surfacing context, simulating decision outcomes, or catching contradictions. But in sensitive environments where physical consequences are immediate and irreversible, the human remains in control.
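That division of labor can be sketched as a simple approval gate (the names and structure here are illustrative): the system may suggest and explain, but nothing executes without an explicit operator confirmation.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    """An AI-generated recommendation, inert until a human approves it."""
    action: str
    rationale: str
    confirmed: bool = False

def execute(suggestion, confirm):
    """Run the suggested action only if the operator callback approves it."""
    suggestion.confirmed = confirm(suggestion)
    if not suggestion.confirmed:
        return "logged_only"  # suggestion recorded, nothing actuated
    return f"executed:{suggestion.action}"

s = Suggestion("slow_spindle", "vibration trending above threshold")
print(execute(s, confirm=lambda sg: False))  # operator declines → logged_only
print(execute(s, confirm=lambda sg: True))   # operator approves → executed:slow_spindle
```

In a plant the `confirm` callback would be a UI prompt or a physical control, but the invariant is the same: the irreversible step belongs to the human.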
This is why cognitive modeling resonates so deeply in manufacturing. It reflects not just the decisions humans make, but the bounded rationality under which they operate. Large models built on statistical prediction cannot replicate this dynamic, but cognitive models can approximate it.
And when combined, the two approaches can yield systems that are more aligned with the actual rhythms of industrial life.