Know more, plan better
It is often forgotten that the battle for skilled workers is fought not only on the labor market, but also within the companies themselves. Those who succeed in permanently retaining qualified employees secure their wealth of experience and their own competitiveness. This is the conclusion of a pilot project on the topic of machine learning, which the Windhoff Group carried out together with the German Pension Insurance Association (DRV Bund).
"At DRV Bund, strategic human resources planning will be more important than ever in the coming years. Due to Germany's age structure, we find ourselves in a double demographic dilemma: On the one hand, as the number of retirees increases, so do the volume of applications and thus our personnel requirements. On the other hand, a very large number of employees will leave us in the medium term due to age, which reduces our workforce. Added to this are an ever more intense war for talent and an increasingly dynamic world of work," explains Dr. Michael Tekieli, responsible for People Analytics at DRV Bund.
Changing priorities
So far, attrition has not been one of the organization's pain points, but it will definitely become more relevant in the near future. Dr. Tekieli adds: "To avoid turnover becoming a pain point, we need targeted solutions. These should enable us to at least anticipate changes in the personnel landscape early and accurately and, in the best case, to use our knowledge to proactively and effectively counteract success-critical departures."
Against this background, two questions were formulated for the pilot project: To what extent can machine learning help to identify turnover risks? Can Explainable Artificial Intelligence (XAI) help to understand the reasons for non-age-related turnover? To answer these questions, the project managers decided to use Smart Predict in SAP Analytics Cloud (SAC).
The first step was to create a coherent data basis from internal (ERP and HR systems) and external sources. In practice, this meant that an employee's personal and professional data were supplemented by aspects from the corporate environment. In total, 40 descriptive attributes were coded. Next, the most important influencing factors were identified: age, age of the youngest child, actual working hours excluding absences, length of service in months, absolute salary increase in the past twelve months, and the severity of pandemic-related restrictions as measured by the Covid Stringency Index.
In the search for suitable descriptive variables, the project team drew on scientific publications as well as its own creativity. The target variables were non-age-related departures within different time horizons: the coming 1, 3, 6, or 12 months. The attributes were collected for each of the 25,000+ employees on a monthly basis for 2018 through 2020. This resulted in a data set of 650,000 rows, or 230 megabytes.
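The horizon-based target variables can be illustrated with a minimal Python sketch. It assumes one row per employee and month and derives a 0/1 label for each of the four horizons. The function names, column names, and the example employee are hypothetical and are not taken from the project.

```python
from datetime import date
from typing import Dict, Optional

HORIZONS = (1, 3, 6, 12)  # months ahead, as listed in the article


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def label_row(snapshot: date, exit_month: Optional[date]) -> Dict[str, int]:
    """Return one 0/1 target per horizon for a single employee-month row.

    exit_month is the month of a non-age-related departure, or None if the
    employee did not leave in the observed period (hypothetical convention).
    """
    labels = {}
    for h in HORIZONS:
        if exit_month is None:
            labels[f"leaves_within_{h}m"] = 0
        else:
            gap = months_between(snapshot, exit_month)
            # Positive only if the departure lies in the next h months.
            labels[f"leaves_within_{h}m"] = int(0 < gap <= h)
    return labels


# Hypothetical employee observed in January 2019 who leaves in May 2019.
example = label_row(date(2019, 1, 1), date(2019, 5, 1))
print(example)
```

For this example the departure is four months away, so the 1-month and 3-month labels are 0 and the 6-month and 12-month labels are 1.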
Smart Predict
Smart Predict allows the specialist department to analyze the collected data itself, as a self-service. The decisive arguments for SAC were fast results through automated machine learning, transparent results thanks to XAI, and the high prediction quality of its machine learning algorithms. The analysis works intuitively and can be performed without programming knowledge, so neither IT experts nor data science resources are necessary.
To address reservations and build acceptance, an out-of-sample test was carried out. Among other things, it showed that an algorithm trained on data from before June 2020 would also have detected employees leaving in the second half of 2020, with a hit rate of 12.5 percent. The test also showed that the learned correlations were "robust," i.e., transferable into the future. Overall, more than 50 percent of non-age-related departures were detected by machine learning.
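The idea of an out-of-sample test can be sketched in a few lines of Python: the model only sees rows before a cutoff date and is then judged on later rows. The sketch below is an illustration under stated assumptions, not the project's procedure. The rows are invented, and a simple per-group departure rate stands in for the Smart Predict model. The article does not define "hit rate," so the sketch reports both precision (share of flagged employees who actually left) and recall (share of leavers who were flagged) without claiming which figure corresponds to which.

```python
from collections import defaultdict
from datetime import date

CUTOFF = date(2020, 6, 1)  # train on rows before June 2020, as in the article

# Hypothetical rows: (snapshot month, salary_band, left_within_6m)
rows = [
    (date(2019, 3, 1), "low", 1),
    (date(2019, 3, 1), "high", 0),
    (date(2019, 9, 1), "low", 1),
    (date(2019, 9, 1), "low", 0),
    (date(2020, 2, 1), "high", 0),
    (date(2020, 2, 1), "high", 0),
    (date(2020, 7, 1), "low", 1),
    (date(2020, 7, 1), "low", 0),
    (date(2020, 8, 1), "high", 1),
    (date(2020, 9, 1), "high", 0),
]

# Split by time, never randomly: the holdout must lie in the "future".
train = [r for r in rows if r[0] < CUTOFF]
test = [r for r in rows if r[0] >= CUTOFF]

# Stand-in "training": estimate the historical departure rate per group.
counts = defaultdict(lambda: [0, 0])  # group -> [departures, rows]
for _, band, left in train:
    counts[band][0] += left
    counts[band][1] += 1
rate = {band: d / n for band, (d, n) in counts.items()}
overall = sum(r[2] for r in train) / len(train)

# Flag a holdout row when its group's rate exceeds the overall training rate.
flagged = [rate.get(band, 0.0) > overall for _, band, _ in test]
actual = [left == 1 for _, _, left in test]

true_pos = sum(f and a for f, a in zip(flagged, actual))
precision = true_pos / sum(flagged) if any(flagged) else 0.0
recall = true_pos / sum(actual) if any(actual) else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Splitting by date rather than at random is the point of the exercise: it checks whether patterns learned from the past still hold for a later period.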
Another way of increasing acceptance was a plausibility check of the patterns that the predictive analytics model uses to generate its forecasts. These patterns were also an important means of understanding which employees might leave the company and why. However, modern machine learning algorithms are so complex that the effect of individual influencing factors on the probability of leaving cannot be understood directly. This is referred to as the black box phenomenon. In recent years, the research field of XAI has produced approaches that explain the black box step by step.
For example, SAC has used SHAP values since the Q3 2021 release. They explain the significance and influence of the various attributes at the local level, i.e., for the individual employee. This enables domain experts to check the results for plausibility and causality and to extract new knowledge.
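SHAP values are based on Shapley values from game theory. The following Python sketch computes exact Shapley values for a tiny, hypothetical scoring function to show what a "local" explanation looks like. It is not SAC's implementation: the scoring rule, the feature names, and the single reference employee (used here in place of a background data set) are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def churn_score(features: dict) -> float:
    """Hypothetical scoring rule standing in for a trained model."""
    score = 0.05
    short_tenure = features["tenure_months"] < 24
    no_raise = features["salary_increase"] == 0
    if short_tenure:
        score += 0.10
    if no_raise:
        score += 0.08
    if short_tenure and no_raise:  # interaction between the two factors
        score += 0.06
    return score


def shapley_values(model, x: dict, reference: dict) -> dict:
    """Exact Shapley values of x's features relative to a reference point."""
    names = list(x)
    n = len(names)
    values = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                without = {m: (x[m] if m in subset else reference[m]) for m in names}
                with_feature = dict(without, **{name: x[name]})
                total += weight * (model(with_feature) - model(without))
        values[name] = total
    return values


# Hypothetical reference employee and the employee to be explained.
baseline = {"tenure_months": 120, "salary_increase": 1500, "age": 45}
employee = {"tenure_months": 10, "salary_increase": 0, "age": 31}

phi = shapley_values(churn_score, employee, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The contributions add up to the difference between the employee's score and the reference score, which is what makes such values convenient for a plausibility check. The exact computation grows exponentially with the number of attributes, so practical tools rely on approximations or model-specific shortcuts.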
In addition to evaluating individual employees, the team also analyzed prototypical examples using automated clustering procedures (smart grouping). The link to the existing personnel planning for age-related turnover is made by aggregating the expected values across various dimensions. This makes it possible to identify which departments and positions are likely to be particularly affected by non-age-related turnover. Dr. Tekieli adds: "For us as an organization, the only option is to evaluate and present the data at an aggregated level. This is how we strike a healthy balance between improving the operational framework for our employees, taking into account the diversity of needs, and at the same time ensuring the protection of personal data in accordance with the GDPR at all times."
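Aggregating expected values can be illustrated briefly: the expected number of departures in a group is the sum of the individual departure probabilities in that group. The Python sketch below assumes hypothetical department names and probabilities; only the aggregated totals would be reported.

```python
from collections import defaultdict

# Hypothetical per-employee departure probabilities: (department, probability)
predictions = [
    ("Claims", 0.10),
    ("Claims", 0.30),
    ("Claims", 0.20),
    ("IT", 0.50),
    ("IT", 0.25),
]


def expected_departures(rows):
    """Sum individual probabilities per group to get expected departures."""
    totals = defaultdict(float)
    for department, probability in rows:
        totals[department] += probability
    return dict(totals)


by_department = expected_departures(predictions)
for department, expected in sorted(by_department.items()):
    print(f"{department}: {expected:.2f} expected departures")
```

The same summation works for any other dimension, such as position or location, by swapping the grouping key.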
Convincing project results
In summary: with modern technologies, objective answers can be extracted from historical HR data. Turnover can be anticipated months in advance, and its probability can even be quantified. Consequently, HR retention measures (for example, targeted development opportunities) can be initiated proactively to keep groups of employees in the company.
"I was initially very skeptical about the use of automated machine learning, since the supposedly important step of hyperparameter tuning was omitted. I was all the more pleasantly surprised by the prediction quality. An important success factor of automated machine learning is certainly data quality. If you leave all steps to the machine, the quality of the training data becomes even more important. The rather unpopular and dry topic of data integrity should therefore be at least as important for organizations as a consistent and clear machine learning dashboard," Dr. Tekieli notes.