AI Hallucinations and the Autonomous Enterprise


Large Language Model Hallucinations
Currently, existing SAP customers are being led to believe that the erroneous outputs of large language models (LLMs)—so-called “hallucinations”—are merely temporary teething problems that will soon be brought under control through ever-larger data sets, more sophisticated architectures, or rigorous fact-checking. However, findings from theoretical computer science debunk this narrative and reveal that SAP and other AI companies are laboring under a dangerous misconception.
Independent researchers, led by Sourav Banerjee, Ayushi Agarwal, and Saloni Singla, have shown in their highly acclaimed 2024 paper that hallucinations are neither a solvable engineering problem nor an ERP data issue. Rather, hallucinations are an inevitable mathematical and logical property of large language models. The researchers coined the apt term “structural hallucination” for this phenomenon.
Kurt Gödel and Alan Turing
For IT decision-makers in the SAP environment, this requires a look at the history of mathematics, specifically Kurt Gödel’s first incompleteness theorem and Alan Turing’s halting problem from the 1930s. These results show that a perfect “truth machine” is mathematically impossible.
Applied to the inner workings of modern LLMs, this means that at every single stage of the processing workflow, from the compilation of training data through intent classification and fact retrieval to the actual text generation, there is a non-zero probability of error that cannot be eliminated through optimization.
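The way these per-stage error probabilities add up can be illustrated with a minimal sketch. It assumes, purely for illustration, that the four stages fail independently of one another; the stage names and error rates are hypothetical and are not taken from the paper.

```python
from math import prod


def pipeline_error_probability(stage_error_rates):
    """Probability that at least one stage errs, assuming independent stages."""
    rates = list(stage_error_rates)
    for p in rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"error rate out of range: {p}")
    # The pipeline is error-free only if every single stage is error-free.
    return 1.0 - prod(1.0 - p for p in rates)


# Hypothetical per-stage error rates (illustrative values only).
stages = {
    "training_data": 0.01,
    "intent_classification": 0.005,
    "fact_retrieval": 0.02,
    "text_generation": 0.01,
}

overall = pipeline_error_probability(stages.values())
print(f"Overall error probability: {overall:.4f}")
```

Under these assumptions, the overall error probability is larger than that of the weakest single stage, and it stays above zero as long as any one stage has a non-zero error rate.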
AI researchers and computer scientists have mathematically proven that no training database can ever be 100 percent complete. Even if the knowledge were present in the system, the LLM—due to its probabilistic nature—cannot guarantee that it will accurately extract the correct facts from a massive data fabric.
The Halting Problem in Language Models: Ontology Failure
The situation is made even more critical by the undecidability of the halting problem, which applies in full to LLMs. A language model can never predict a priori how many tokens it will generate or at what point its computation will come to a complete halt. Because the model does not know when its own text generation will end, the sequence of generated tokens is unpredictable in advance, which inevitably makes the system prone to producing self-contradictory, paradoxical, or simply false statements.
Existing SAP customers must also understand that even downstream control mechanisms such as fact-checking or Retrieval-Augmented Generation (RAG), which are often touted as a panacea, can never completely eliminate structural hallucinations. These verification steps consist of a finite number of steps and are themselves not error-free.
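Why imperfect checks reduce the error rate but do not eliminate it can be sketched with simple arithmetic. The following example assumes that each check misses a fixed share of the remaining errors, independently of the other checks; the function name and all figures are hypothetical.

```python
from math import prod


def residual_error_rate(base_error_rate, check_miss_rates):
    """Share of outputs that are wrong AND slip past every check.

    Assumes each check misses errors independently of the others.
    """
    return base_error_rate * prod(check_miss_rates)


# Hypothetical figures: 5% of raw answers are wrong, a RAG grounding step
# misses 20% of those, and a downstream fact-checker misses 10% of the rest.
base = 0.05
residual = residual_error_rate(base, [0.20, 0.10])
print(f"Residual error rate: {residual:.4%}")
```

Each additional check shrinks the residual rate, but as long as every check has a miss rate greater than zero, the product remains greater than zero.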
Transformation Strategy: SAP Autonomous Enterprise with an Error Rate Greater Than Zero
The implications of these findings for existing SAP customers are dramatic and shed light on the risks of SAP’s current transformation strategy. If SAP plans to use Agentic AI to integrate hundreds of autonomous AI agents deep into the business-critical processes of SAP S/4HANA or the SAP Business Technology Platform (SAP BTP), then purely statistical, probabilistic models will be unleashed on highly sensitive, deterministic ERP tasks.
When such AI makes autonomous decisions regarding supply chains, payroll, or year-end financial statements, an error rate greater than zero is not an acceptable compromise, but a business-critical risk. An error in a production SAP system has immediate business, financial, and legal consequences. IT decision-makers must therefore not be blinded by the rhetoric that ever-increasing computing power and ever-larger language models are the solution.
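What a small but non-zero error rate means at ERP transaction volumes can be made concrete with a short calculation. This is a sketch under the assumption that decisions err independently with the same probability; the error rate and the transaction volume are hypothetical.

```python
import math


def expected_errors(error_rate, decisions):
    """Expected number of faulty decisions at a given volume."""
    return error_rate * decisions


def probability_of_any_error(error_rate, decisions):
    """1 - (1 - p)^n, assuming independent decisions with equal error rate."""
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    # log1p/expm1 keep the result accurate for very small error rates.
    return -math.expm1(decisions * math.log1p(-error_rate))


# Hypothetical: one faulty decision in 10,000, applied to 50,000 postings.
rate = 1e-4
volume = 50_000

print(f"Expected faulty postings: {expected_errors(rate, volume):.1f}")
print(f"Probability of at least one: {probability_of_any_error(rate, volume):.4f}")
```

Even an error rate that looks negligible per decision makes at least one faulty posting close to certain once the volume is large enough.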
SAP Performance Limits and the Laws of ERP
Scaling merely pushes the limits of performance; it does not override the laws of mathematics. Humans as the controlling authority, the often-derided “human in the loop”, are therefore not a bothersome, temporary stopgap on the path to perfect artificial intelligence, but rather a permanent, mathematical necessity for safeguarding corporate autonomy. Anyone who entrusts their ERP system to these hallucinating algorithms without any safeguards is ignoring nearly a century of fundamental computer science research.
For SAP decision-makers, the conclusion is therefore that using generative LLMs for business-critical, strictly deterministic tasks is highly negligent. If an LLM’s probability calculations are to determine whether or not a salary payment is made at the end of the month, the architectural dysfunctionality of this purely statistical approach becomes apparent.
LLMs vs. Energy-Based Models
AI pioneers are therefore strongly advocating for research into alternative architectures, such as energy-based models, which seek logical consistency and physically feasible states rather than simply stringing words together. SAP itself has also had to respond to these limitations of traditional LLMs and has introduced specialized foundation models such as RPT-1, which is specifically trained on tabular relationships and circumvents the error-prone language-token paradigm when dealing with enterprise data.
However, as long as traditional LLMs form the core of the new business AI strategy, every existing SAP customer must recognize, through conceptual education, that technical advances and computing power cannot overcome mathematical limitations. Strict, deterministic governance outside the AI model, acting as a control mechanism for these probabilistic black boxes, therefore remains indispensable.
Human in the Loop
There is no alternative to the “human-in-the-loop” approach. From a mathematical perspective, all technical mitigation strategies, from RAG to RPT-1, are revealed to be purely risk management measures that lower the probability of error but never reduce it to zero. Since hallucinations are based on the same theoretical impossibilities that Kurt Gödel (incompleteness theorem) and Alan Turing (halting problem) proved for formal systems as early as the 1930s, there is no fully autonomous solution.
For business-critical SAP processes in which Agentic AI is intended to independently trigger orders or initiate financial transactions in the future, the human as the final authority (Human in the Loop) is therefore not merely a cumbersome stopgap solution on the path to the perfect machine. Anyone making decisions in the SAP environment for which the company bears legal and financial liability must recognize that the validation of probabilistic AI results through deterministic rules and human expertise remains an inevitable, permanent, and mathematical necessity.
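How such a validation step might look in principle can be sketched as follows. The example is a minimal illustration, not an SAP API: all class names, function names, rules, and data are hypothetical. An AI-proposed salary payment is first checked against deterministic rules and is executed only after explicit human approval.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Decision(Enum):
    REJECTED = "rejected"
    NEEDS_HUMAN_APPROVAL = "needs_human_approval"
    APPROVED = "approved"


@dataclass(frozen=True)
class ProposedPayment:
    """A payment proposed by an AI agent (hypothetical structure)."""
    employee_id: str
    amount: Decimal
    currency: str


def rule_violations(payment, contract_salaries):
    """Deterministic checks against master data; returns a list of violations."""
    violations = []
    expected = contract_salaries.get(payment.employee_id)
    if expected is None:
        violations.append("unknown employee")
    elif payment.amount != expected:
        violations.append("amount differs from contract salary")
    if payment.currency != "EUR":  # assumed company currency
        violations.append("unexpected currency")
    return violations


def gate(payment, contract_salaries, human_approved):
    """Release a payment only if the rules pass AND a human has signed off."""
    if rule_violations(payment, contract_salaries):
        return Decision.REJECTED
    if not human_approved:
        return Decision.NEEDS_HUMAN_APPROVAL
    return Decision.APPROVED


# Usage with hypothetical master data.
salaries = {"E100": Decimal("4200.00")}
proposal = ProposedPayment("E100", Decimal("4200.00"), "EUR")
print(gate(proposal, salaries, human_approved=False))
```

The design choice in this sketch is that the model's output never triggers the posting directly: the deterministic rules can only reject, and the final release always depends on a human decision.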






