AI transparency: why the black box phenomenon is rarely a problem for machine learning

"Machine learning only gives us a black box; we can't do anything with it." This sentence, which I often encounter in practice, usually comes from the future users of an AI project who are less versed in data analytics. But is this statement true?
Ansgar Heidemann, Windhoff Group
November 9, 2023
This text has been automatically translated from German to English.

In this article, I will use five concrete arguments to show why this statement is rarely true without qualification. There are even cases in which it is unavoidable, or simply not that bad, that prediction models created with machine learning (ML) are a black box.

We speak of a black box when we do not understand the logic of a model. Using the example of a predictive model (predictive analytics), this means that the output of the machine learning model cannot be easily traced back to the specific input data. In contrast to classic statistical analyses, the mathematical rules of a complex algorithmic model cannot be described with a handful of (linear) parameters.
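To make the contrast tangible, here is a small, hypothetical sketch in Python with scikit-learn (an illustration of the principle, not of any specific SAP product): a classic linear model can be written down as a handful of coefficients, while an ensemble of hundreds of decision trees cannot be summarized that way.

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import GradientBoostingRegressor

    # Synthetic example data with five influencing factors.
    X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

    # Classic statistics: the entire model is five coefficients plus an intercept.
    linear = LinearRegression().fit(X, y)
    print("Linear model:", linear.coef_, linear.intercept_)

    # Machine learning: 300 trees with thousands of split rules in total;
    # no handful of parameters summarizes this input-output mapping.
    ensemble = GradientBoostingRegressor(n_estimators=300, random_state=0).fit(X, y)
    print("Tree nodes in the ensemble:",
          sum(tree[0].tree_.node_count for tree in ensemble.estimators_))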

In fact, in my opinion, transparency and explainability are only essential in high-risk applications of artificial intelligence (AI). Here, however, we are not helpless in the face of the black box problem. Research has developed numerous methods around Explainable Artificial Intelligence (XAI). SAP, for example, is continuously incorporating these new options into its technologies in order to make the logic of the models created easier to understand.

Argument 1: A lack of transparency also has advantages. There are use cases for machine learning where the lack of traceability becomes a real strength. Imagine that an internal or external actor wants to deliberately manipulate a (partially) automated decision-making system based on ML. However, since they do not know which input data leads to which output in the model, this becomes much more difficult. Scientists call this aspect "gameability" (Langer and König 2021).

When evaluating algorithmically supported decision-making processes, the researchers also identified further advantages, depending on the application. The result: with non-transparent models, such processes become more efficient. The reason is as plausible as it is simple: machines cannot be distracted by unnecessary details and a flood of information. In addition, opacity contributes to data protection when personal data is included in the training process.

Argument 2: The trade-off between performance and transparency. With any demand for transparency, it should be borne in mind that transparency comes at the expense of model accuracy. Machine learning is designed to discover detailed, non-linear patterns in data and encode them in the resulting models (Kellogg et al. 2020, pp. 370-371). To increase transparency, this complexity could be gradually reduced, or less complex algorithms could be used. However, both measures reduce the accuracy of the forecast results and thereby undermine the very strength of machine learning. This important aspect leads directly to the third argument.
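The trade-off can be illustrated with a hedged sketch (again Python and scikit-learn on synthetic data, so the exact numbers carry no meaning): a transparent model that anyone can inspect is compared against an opaque ensemble; how large the accuracy gap turns out depends on how non-linear the patterns in the real data are.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic classification data with some label noise.
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                               flip_y=0.05, random_state=1)

    # Transparent: the whole model is 20 coefficients anyone can read.
    simple = LogisticRegression(max_iter=1000)
    # Opaque: 300 trees whose combined logic nobody inspects end to end.
    black_box = RandomForestClassifier(n_estimators=300, random_state=1)

    print("Logistic regression:", cross_val_score(simple, X, y, cv=5).mean())
    print("Random forest:", cross_val_score(black_box, X, y, cv=5).mean())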

Argument 3: Lack of transparency is a key characteristic of machine learning. While traditional statistics are designed to understand the data, machine learning attempts to make existing data usable, for example to create the most accurate forecasts possible based on historical data. If pattern recognition in data is the main goal of the project (data mining), then machine learning could simply be the wrong tool (Rudin 2019). In a project, statistics or descriptive data analysis in the style of a dashboard (slice and dice, drill-down) can be combined with machine learning. Each tool then fulfills its main purpose and synergies are created. Every craftsman uses a whole toolbox. To round off the metaphor: Machine learning is a very good universal tool, comparable to a cordless screwdriver. But that doesn't mean it can be used to saw boards.

Argument 4: Hey, as long as it works?! Let me start with a short thought experiment. Would you rather fly in an airplane that you yourself have thoroughly inspected and dissected down to every screw, or in one that has passed all the prescribed test standards and test flights with flying colors? Almost everyone lacks the engineering knowledge, or at least the patience, for the first option, so we go for the second. This analogy comes from Cassie Kozyrkov, known for her work as Chief Decision Scientist at Google.

Kozyrkov also points out that machine learning has its own version of this test flight: the so-called out-of-sample test. It is basically an exam: the data sets (the tasks) are different from those provided for training (the homework). In practice, it is often more worthwhile to carry out these out-of-sample tests in detail and thoroughly than to chase after a desired level of transparency. This argument carries a lot of weight, but it requires a rethink in how operational decisions are justified, which is why it takes time to sink in with users. Data scientists are challenged here to explain the implications of their test strategies and results in easy-to-understand terms.
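For readers who want to see what this exam looks like in practice, here is a minimal sketch, assuming Python and scikit-learn; the principle is the same in any ML stack: a portion of the data is held back from training and used only for grading.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=1000, random_state=42)

    # Hold back 25 percent of the records before training starts (the exam).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    # Train only on the remaining 75 percent (the homework).
    model = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

    # The grade that counts: accuracy on data the model has never seen.
    print("Out-of-sample accuracy:", model.score(X_test, y_test))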

Argument 5: Research and technology are not standing still - Explainable AI. Finally, I would like to point out an important fact: transparency in machine learning cannot always be dispensed with. It is absolutely essential in high-risk applications that have a direct impact on people's lives. This includes decision support through machine learning in sensitive areas that can close doors for those affected, such as lending, recruiting and human resources. In such areas, ensuring fairness and equal treatment is an absolute priority and cannot be achieved without transparency and explainability of the models. Fortunately, technological progress is not standing still here. SAP, for example, has been incorporating Explainable AI into its predictive analytics products for several years. In technologies such as SAP Analytics Cloud or the Hana Predictive Analysis Library, even complex machine learning models can be made more transparent to a certain degree. The methods extract information such as the effect of individual influencing factors on a model's output, or approximate the model with comprehensible rule systems. They are built around fundamental questions: What if? And above all: what if influencing factor X changes?
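The following sketch shows these two ideas generically (Python with scikit-learn; it illustrates the principle, not the SAP implementation): permutation importance estimates the effect of individual influencing factors, and a manual what-if query shows how the prediction shifts when factor X changes.

    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    X, y = make_regression(n_samples=500, n_features=4, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X, y)

    # Effect of individual influencing factors: shuffle one feature at a
    # time and measure how much the model's score degrades.
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    print("Influence of each factor:", result.importances_mean)

    # What-if question: change factor 0 for a single record and compare.
    record = X[:1].copy()
    baseline = model.predict(record)
    record[0, 0] += 1.0
    print("Prediction shift if factor 0 rises by 1:", model.predict(record) - baseline)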

Conclusion: Explainable AI


Using machine learning just because it fits in well with the AI hype around ChatGPT does not always make sense. If knowledge extraction is a goal, a different tool should perhaps be chosen early on in the project. Once a suitable use case for machine learning has been found, however, extensive testing is the decisive factor for validation. Transparency can also be created for complex models if required, but this means investing in additional expertise and in Explainable AI methods. My final opinion with regard to the title of this article: the black box phenomenon should be a show-stopper for innovative ML projects only in critical, exceptional cases.

Ansgar Heidemann, Windhoff Group

Ansgar Heidemann is a Business Intelligence consultant at the Windhoff Group and an external doctoral student at TU Dortmund University.


1 comment

  • Dear readers,

    The original title of this text was: "Maschinelles Lernen und das Black Box Phänomen – Warum fehlende Transparenz nicht immer ein Problem ist" ("Machine learning and the black box phenomenon: why a lack of transparency is not always a problem"). The current title comes from the E3 editorial team. Artificial intelligence is a popular term that likes to get mixed into everything 😉

    I am sure there are other opinions on how necessary the transparency of ML models really is. Let's discuss the topic!

