Beware of Open Source Washing in AI


SAP users, too, can no longer avoid AI and machine learning (ML). These technologies can be applied in a wide range of SAP areas. Examples include predictive or analytical AI for master data analysis, the optimization of production processes and supply chains, and quality control, as well as generative AI. When AI is used in business-critical areas such as SAP environments, a high degree of traceability and explainability must always be guaranteed. Companies do not want algorithms, training data or models to be a black box. They want AI that also observes legal and ethical principles.
Transparency through open source
The value of proven open source strategies and technologies becomes apparent during implementation. As in software development, they stand for transparency. However, there is a risk of open source washing here, meaning that something is labeled open source without meeting the substance of open source. A comparison of the core elements of open source software and open source LLMs shows this risk. Open source software is characterized by transparent, comprehensible algorithms, visible error handling and the opportunity to drive further development together with the community. In contrast, many so-called open source LLMs are freely available but offer little insight into aspects such as training data, weights, model-internal guardrails or a reliable roadmap. Surprises are the order of the day.
From a company perspective, traceability and the underlying data are fundamental, if only for liability or compliance reasons. Several questions arise. Should a monolithically developed foundation model be used, and if so, which one? How can its limitations and risks be countered? And how much effort must be invested in operation, fine-tuning and monitoring? For SAP users, compact, domain-specific models are an interesting option here. They are easier and quicker to train for a defined area of application and can be operated and integrated in a more controlled manner.
Generic LLMs, which vary in how open they are about pretraining data and usage restrictions, can now also be adapted to a specific business purpose using a different approach. Red Hat and IBM have launched the InstructLab community project for this purpose.
With InstructLab, retraining a model requires less data and fewer computing resources. Users, and if desired the community, can continuously improve the models by contributing "knowledge" and "skills" upstream. This follows true open source principles and avoids generating thousands of new model variants. In addition, a retrieval-augmented generation (RAG) process can be strengthened further by applying the RAG technique to a model that has been tuned with InstructLab.
Flexible and hybrid MLOps platforms
The platform used is always an important part of the AI environment. Instead of many isolated, often non-scalable sandbox environments, SAP users expect a flexible, hybrid MLOps platform for production use, such as Red Hat OpenShift AI. Such a platform supports the training, deployment and uniform monitoring of all AI applications in the cloud, at the edge and on-premises. AI will continue to play an important role in SAP environments in the future, both inside and outside SAP systems, with the two sides integrated with each other and highly scalable.
Given the strict regulatory requirements arising from the GDPR or the EU AI Act, for example, trustworthy AI is essential. An open source approach is the right basis for this, as it offers transparency, innovation and security. However, the approach must be genuinely open source, including the training data and AI models, and not just open source washing.
