Mirroring of mission-critical databases
Würth IT sees itself as a medium-sized, internationally oriented full service provider for the IT of numerous trading and service companies within and outside the Würth Group.
Since the first IT projects realized by the Künzelsau-based company in the 1980s and the merger with Comgroup in 2002, the IT service provider has continued to develop.
Würth IT has grown strongly in recent years. Following the establishment of a branch in China in 2004, further locations in India and the USA have been added.
Würth IT works around the clock for its customers, which places particularly high demands on availability, data protection and data backup.
"New requirements mean we have to reinvent ourselves every year"
says Jörg Engel, Head of Unix Platform Services at Würth IT, where he is responsible for the SAP basis and all databases and Unix systems, among other things:
"We supply a large number of stores and branches with IT - and online marketing continues to gain in importance for our users."
The number of users is enormous: the Würth Group unites over 400 companies that are supplied with services by Würth IT. Around 15 percent of sales are generated with customers that do not belong to the Würth Group. However, SAP software also plays a major role within the Group.
The services primarily include ERP applications, SharePoint services, telephony and IT operations. In total, there are around 1400 operating system instances, around 70 percent of which run on virtual machines.
ERP systems used worldwide run on the central systems, with SAP solutions being the main focus here. Worldwide can be understood literally here, because the companies in the Würth Group "follow the sun" and work around the clock. The companies connected to the global SAP environment generate around five billion euros.
With the digitalization of more and more business processes, the maintenance windows for the Würth Group's IT staff are shrinking towards zero. IT work has long been carried out in the background, but it does not always go unnoticed by users.
"Zero-downtime backup has always been important for the round-the-clock operation of our users, but as the systems became busier, the backups themselves were putting a noticeable load on them. We had to look for new approaches"
explains Harald Holl, Member of the Management Board and, as Head of IT, responsible for the data centers.
IT load that slows operations
Part of the workload traditionally comes from IT itself. The aim of every IT manager is, of course, to keep this load as low as possible. In the past, the applications to be backed up had to be switched off completely for a meaningful backup, but for some years now, backups have been running in the background while IT operations continue.
However, even with zero-downtime backup, online data backup places a noticeable load on the systems that all users feel. This was also the case at Würth IT. One cause of the high load during data backup was the checkpoint times of the database when creating snapshots. Due to the size of the database, which is used in 24-hour operation, log file sets of several terabytes per day are created.
Without the Libelle solution, a restore took at least 13.5 hours in the best case, provided no complications arose. It comprised three steps: restoring the database of around 28 terabytes, restoring the log files (400 to 650 logs of 4.5 gigabytes each) and manually restoring the redo groups.
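The log figures quoted above can be checked with a little arithmetic. The following is a minimal sketch that only uses the numbers from the article; the function name is hypothetical and the conversion assumes 1 terabyte = 1000 gigabytes.

```python
# Figures quoted in the article.
LOG_SIZE_GB = 4.5               # size of one log file
LOGS_MIN, LOGS_MAX = 400, 650   # number of logs to restore
DB_SIZE_TB = 28                 # approximate database size


def log_volume_tb(log_count: int, log_size_gb: float = LOG_SIZE_GB) -> float:
    """Total log volume in terabytes (assumption: 1 TB = 1000 GB)."""
    return log_count * log_size_gb / 1000


low = log_volume_tb(LOGS_MIN)
high = log_volume_tb(LOGS_MAX)

print(f"Log volume to restore: {low:.1f} to {high:.1f} TB")
print(f"Database plus logs:    {DB_SIZE_TB + low:.1f} to {DB_SIZE_TB + high:.1f} TB")
```

The result of roughly 1.8 to 2.9 terabytes of logs is in line with the "several terabytes" of log files mentioned earlier, on top of the 28 terabyte database itself.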
"We had to rethink the entire backup scenario and the high-availability concept"
says Holl:
"The existing systems were optimized to their limits. But we didn't want to start developing our own, we were looking for a standard solution."
Users and providers of IT solutions, however, understand "standard" in very different ways. For providers, a base product to which various modules are attached via standardized APIs already counts as standard software.
Users tend to think of standard software as solutions that can be put into operation through configuration - for example, using the templates provided.
The team around Engel and Holl found what they were looking for in the Stuttgart-based software company Libelle and its DBShadow solution, which was brought to their attention by Würth IT's system house, SMC Spengler IT Software Consulting GmbH.
After tests of other software had required a high level of individual effort, the decision to switch to the Libelle solution was made quickly.
However, Würth IT allowed itself an extensive POC (proof of concept) in which many different scenarios and a wide variety of usage and benefit aspects were tested.
Recovery times
DBShadow now makes use of the mirror data center and its patented time funnel: after a one-time initial copy of the database, all transactions run into a buffer on the mirror side, known as the time funnel.
The retention time of the data in the funnel is determined dynamically by Würth IT. Once this defined waiting time has elapsed, the transactions are applied to the mirror database. This creates a constant time offset between the productive and mirror systems, while all transaction data that has not yet been applied is already physically on the mirror side.
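The delayed-apply idea described above can be illustrated with a small model. This is only a sketch under stated assumptions, not Libelle's implementation: the class `TimeFunnel`, its method names and the four-hour offset are all hypothetical, transactions are plain strings, and they are assumed to arrive in commit order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TimeFunnel:
    """Toy model of a delayed-apply buffer on the mirror side (hypothetical)."""
    delay_s: float                                 # configured retention time in seconds
    buffer: deque = field(default_factory=deque)   # (commit_time, txn) waiting in the funnel
    applied: list = field(default_factory=list)    # stands in for the mirror database

    def receive(self, commit_time: float, txn: str) -> None:
        """A transaction arrives on the mirror side and waits in the funnel."""
        self.buffer.append((commit_time, txn))

    def apply_due(self, now: float) -> None:
        """Write every transaction whose retention time has elapsed to the mirror."""
        while self.buffer and self.buffer[0][0] <= now - self.delay_s:
            _, txn = self.buffer.popleft()
            self.applied.append(txn)


funnel = TimeFunnel(delay_s=4 * 3600)   # hypothetical four-hour offset
funnel.receive(0, "txn-1")
funnel.receive(3600, "txn-2")
funnel.apply_due(now=4 * 3600)

print(funnel.applied)        # ['txn-1']
print(list(funnel.buffer))   # [(3600, 'txn-2')]
```

In this model the mirror always lags the productive system by the configured delay, yet the not-yet-written transaction `txn-2` is already held on the mirror side, which is the property the article describes.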
From an infrastructure perspective, the database on the mirror side runs completely independently of the productive database, while the DBShadow processes link the two systems at a logical level.
This means that in the event of a classic disaster, a running, up-to-date system is available again in minutes. Jörg Engel can also perform the regular backup on the mirror system at any time without disrupting the ongoing operation of the productive system.
In a disaster recovery (DR) case, the use of DBShadow therefore eliminates the need to restore the 28 terabyte database, restore the log files, re-establish the logical links of the databases and recreate the redo groups.
The time required to make a productive system available again is now only ten minutes, or a maximum of five hours in the worst-case scenario. Without the Libelle solution, the time required - if everything works - would be 13 hours and 30 minutes.
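To put these figures side by side, the following short calculation compares the quoted recovery times. It is a sketch that uses only the numbers given in the article; the variable names are hypothetical.

```python
# Recovery times quoted in the article, converted to minutes.
RESTORE_WITHOUT_MIN = 13.5 * 60   # classic restore, best case: 13 h 30 min
RECOVERY_TYPICAL_MIN = 10         # with DBShadow: ten minutes
RECOVERY_WORST_MIN = 5 * 60       # with DBShadow, worst case: five hours

typical_factor = RESTORE_WITHOUT_MIN / RECOVERY_TYPICAL_MIN
worst_factor = RESTORE_WITHOUT_MIN / RECOVERY_WORST_MIN

print(f"Typical case: {typical_factor:.0f} times faster")   # 81 times faster
print(f"Worst case:   {worst_factor:.1f} times faster")     # 2.7 times faster
```

Even the worst case with the mirror is therefore faster than the best case of the classic restore.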
An important additional benefit is the insurance against the consequences of user errors. While hardware-based methods such as snapshots primarily protect against technical errors, the Libelle solution can also protect against logical and human errors.
Such errors are far more common than data loss due to hardware failures. Even in the event of logical errors, such as user errors, faulty software updates or similar, the productive system can be switched to the shadow database via "point-in-time recovery".
In just a few minutes, all valid transactions are then recovered from the time funnel to the shadow database - and precisely up to a definable point in time before the user error or fault. The shadow database is then switched online as a productive system.
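The recovery to a definable point in time can be sketched as follows. This is a simplified illustration, not the DBShadow API: the function `point_in_time_recover`, the data layout and the sample transactions are hypothetical. The mirror state is modeled as a list of already written transactions and the time funnel as a list of `(commit_time, transaction)` pairs.

```python
def point_in_time_recover(mirror_state, funnel, recover_until):
    """Return the shadow state after writing all buffered transactions
    committed strictly before `recover_until`; later ones are discarded."""
    recovered = list(mirror_state)              # do not mutate the caller's list
    for commit_time, txn in sorted(funnel):     # replay in commit order
        if commit_time >= recover_until:
            break                               # stop just before the fault
        recovered.append(txn)
    return recovered


mirror_state = ["txn-1"]                        # already written to the mirror
funnel = [
    (100.0, "txn-2"),
    (200.0, "txn-3"),
    (300.0, "faulty-update"),                   # the logical error
    (400.0, "txn-5"),
]

shadow = point_in_time_recover(mirror_state, funnel, recover_until=300.0)
print(shadow)   # ['txn-1', 'txn-2', 'txn-3']
```

In this model the faulty transaction can only be excluded while it is still waiting in the funnel, so the error has to be noticed before the configured retention time has elapsed.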
Disaster Recovery
Following the recommendations of the BSI and common sense, large distances from physical sources of danger are important. These can be filling stations or tank farms as well as companies where chemicals are processed.
It is not always known how close such dangers are. Because of bandwidth restrictions and latency, many companies set up their replication mirrors in a data center that is a few kilometers away at best. Würth IT uses the Libelle concept to mirror to the neighboring mirror data center and will soon also use the long-distance option of DBShadow to mirror databases to Switzerland.
This also counteracts the risk of data being destroyed by large-scale power failures, regional disasters and attacks and the like.
Engel also justifies the decision in favor of DBShadow with its user-friendliness:
"Whether we use the shadow database for data security, for the rapid rollout of software in the affiliated companies of the Würth Group or for ease of use, we make rapid progress with Libelle."
"With other solutions, the individual effort was far too high"
confirms Holl.
In production, the motto is now "Libelle once a day", meaning a check of four displays in the DBShadow GUI. Jörg Engel explains:
"We monitor the shadow, although this is not really necessary due to the reliability of the solution, but it is our most important database."
Würth IT rates the ability to correct semantic errors very highly, although it has not yet had to prove itself in a real incident.
"It's good if we don't need a rollback. The Libelle time funnel is an insurance policy for this"
confirm Holl and Engel.