
Logwriter & Datawriter

With Hana, SAP has for several years been developing a new technical basis for its applications. One motivation may be that a technological paradigm shift is imminent with non-volatile RAM (NVRAM).
Jürgen Meynert, Fujitsu
27 February 2014
This text has been automatically translated from German to English.

In-memory technology requires fundamentally new programming models that cannot be realized by adapting existing software; it demands radically new approaches. A paradigm shift is therefore imminent not only in hardware technology but also in software technology.

As technology has advanced, the access speed of storage systems has not kept pace with increases in processor speed. At CPU clock rates of 3 GHz, which corresponds to cycle times of 0.3 nanoseconds, processing steps in the processor take on the order of nanoseconds (ns), while accesses to external storage are on the order of milliseconds (ms). That is a disproportion of 1 to 1,000,000!

As a consequence, CPUs in information-processing applications spend most of their time waiting for IO. It is not enough simply to make storage faster, for example with ultra-fast flash devices, since even light, and thus data, can cover only a very limited distance within a nanosecond (< 30 cm in 1 ns).

Thus, fast data access can ultimately only be achieved by keeping the data available close to the processor: in the RAM, or better yet in the cache.
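The disproportion described above can be made concrete with a quick back-of-the-envelope calculation. This is a sketch using the rounded orders of magnitude cited in the text:

```python
# Latency hierarchy, using the orders of magnitude cited above.
CPU_CYCLE_S = 0.3e-9           # 3 GHz clock -> 0.3 ns per cycle
DISK_ACCESS_S = 1e-3           # external storage access: ~1 ms

ratio = DISK_ACCESS_S / 1e-9   # ns vs. ms -> factor 1,000,000
print(f"ns vs. ms disproportion: 1 : {ratio:,.0f}")

# Light travels roughly 30 cm in 1 ns, so physics alone caps how far
# away "fast" data can physically live from the processor.
SPEED_OF_LIGHT_M_S = 299_792_458
distance_per_ns_cm = SPEED_OF_LIGHT_M_S * 1e-9 * 100
print(f"light per ns: {distance_per_ns_cm:.1f} cm")
```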

Code to Data

Further acceleration of processing speed can be achieved by executing application code directly in the database, thus avoiding comparatively high latencies in communication between the application and the database.

Whereas data used to be channeled through the database to the application, in the future application code will be brought to the data. This is the best way to describe the paradigm shift: Instead of "data to code," it will be "code to data" in the future.
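The contrast can be sketched in a few lines. The following toy example uses SQLite purely as a stand-in database; the table and column names are invented for illustration. In the first variant every row crosses the database/application boundary, in the second only the small aggregated result does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 100.0), ("EMEA", 50.0), ("APJ", 75.0)])

# "Data to code": every row is shipped to the application and
# aggregated there, one boundary crossing per row.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0.0) + amount

# "Code to data": the aggregation runs inside the database engine;
# only the small result set crosses the boundary.
pushed = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

assert totals == pushed
print(pushed)
```

With millions of rows and real network latency between application server and database, the difference between the two variants is exactly the communication overhead the paradigm shift is meant to eliminate.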

Today's RAM, however, is still volatile, so write operations in main memory must be protected by a persistence layer, which ultimately means storage again. For read access, even to very large amounts of data, RAM is already well equipped today: with ever higher packing density of the memory elements and simultaneously falling prices, computers with large RAM capacities (up to several TB) are available at reasonable cost.

Since reading from RAM is uncritical, SAP Hana and other in-memory solutions initially focused on read-intensive applications such as reporting and business intelligence (Online Analytical Processing, OLAP).

For transactional systems (Online Transaction Processing, OLTP), advantages arise on the one hand because online reporting on transactional data becomes possible without performance losses in transaction processing, and on the other hand because code paths with a high volume of communication between database and application benefit from being moved into the database.

But whether OLAP or OLTP, an in-memory database (IMDB) requires persistence, because at the latest when the computer is switched off, the data in RAM disappears.

Persistence layer and performance

Since data accesses in an IMDB take place predominantly in RAM, one might expect storage to play a minor role as a persistence layer in terms of performance and to serve primarily as a safeguard against data loss. In fact, SAP's performance requirements for the persistence layer were, and partly still are, higher than for classic databases.

In general, two write mechanisms can be identified in databases: the logwriter and the datawriter. The logwriter promptly (synchronously) documents in a separate area each individual change (insert, update, delete) carried out on the database. The datawriter updates the changes to the tables in storage from time to time (asynchronously) and maintains a consistent, but usually not up-to-date (since asynchronous), image of the database.

The logwriter is critical for transaction processing and for database recovery, should that ever become necessary. A transaction is only considered complete when the logwriter has reported it as documented; only then can processing continue. This ensures that the last valid state can be restored after an unplanned termination of the database by replaying onto the last consistent data image those log entries not yet contained in it (roll forward).
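The interplay of logwriter, datawriter, and roll forward can be sketched as a toy model. This is not Hana's actual implementation, just the general write-ahead-log pattern described above: synchronous log append on commit, asynchronous savepoint, replay after a crash.

```python
import json

class ToyDB:
    """Toy write-ahead-log model: synchronous log, asynchronous datawriter."""

    def __init__(self):
        self.mem = {}        # in-memory table (volatile)
        self.log = []        # logwriter target: every committed change
        self.savepoint = {}  # datawriter target: consistent but stale image

    def commit(self, key, value):
        # The transaction counts as complete only once the log entry
        # is persisted; only then does processing continue.
        self.log.append(json.dumps({"k": key, "v": value}))
        self.mem[key] = value

    def write_savepoint(self):
        # Datawriter: asynchronous, consistent snapshot of the tables.
        self.savepoint = dict(self.mem)
        self.log.clear()     # entries before the savepoint are obsolete

    def recover(self):
        # Roll forward: replay the log onto the last consistent image.
        self.mem = dict(self.savepoint)
        for entry in self.log:
            rec = json.loads(entry)
            self.mem[rec["k"]] = rec["v"]

db = ToyDB()
db.commit("a", 1)
db.write_savepoint()
db.commit("b", 2)   # after the savepoint: only in RAM and in the log
db.mem = {}         # simulate a crash: RAM contents are gone
db.recover()
print(db.mem)       # {'a': 1, 'b': 2}
```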

Logwriter & Datawriter

In the early revisions of Hana, the logwriter was designed to write all changes to the log area in small block sizes. Extensive changes to the database therefore produced a very large number of IO operations. Hence SAP's requirement at the time that the persistence layer be able to write at least 100,000 IOps (IO operations per second).

This can be achieved with reasonable effort only with local flash devices (PCIe-based). Most single-node installations of Hana therefore had, and still have, PCIe-based flash devices. Later, Hana was extended with a scale-out architecture for cases in which the maximum possible main-memory capacity of a single computer was no longer sufficient to hold a larger database completely.

With this option, Hana can be distributed across several computer nodes. The computers can be configured so that not all of them are active; one or more nodes can serve as failover nodes in case an active node fails. However, this requires (external) persistence that all computers can read, because otherwise a failover node cannot read in the data of a failed computer.

This meant that the concept of writing log data very quickly to a local device was no longer tenable. The logwriter was accordingly optimized to write variable block sizes, which also made the high IO rates unnecessary: in a scale-out scenario, just under 20,000 IOps per computer node were sufficient. Nevertheless, SAP maintained the 100,000 IOps requirement for single nodes until very recently.

In addition to the logwriter there is, as already mentioned, the datawriter. At first glance one would think it is not performance-critical, since it writes asynchronously. In fact, Hana writes so-called savepoints at configurable intervals; the default is five minutes. The storage must therefore be fast enough that at least the volume changed between two savepoints can be written within the available time interval.
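The resulting sizing rule is simple: the datawriter must sustain at least the changed volume divided by the savepoint interval. The change volume below is an illustrative assumption, not an SAP sizing figure:

```python
# Minimum sustained datawriter throughput between two savepoints.
changed_gb = 60        # assumed data volume modified between savepoints
interval_s = 5 * 60    # default savepoint interval: five minutes

min_throughput_gb_s = changed_gb / interval_s
print(f"required: {min_throughput_gb_s:.1f} GB/s sustained")
```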

Since the datawriter operates on a copy-on-write basis, the write load tends to be sequential: modified blocks are not overwritten in place; instead, the changes are written to newly allocated blocks. This simplifies the requirements on the persistence layer, because sequential IO can be implemented much more efficiently than random IO.
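A minimal sketch of the copy-on-write idea (a toy block store, not Hana's on-disk format): updates append to newly allocated blocks, so the physical write pattern is sequential, while a small index maps logical blocks to their latest physical position.

```python
# Toy copy-on-write block store: changed blocks are appended instead of
# overwritten in place, so writes hit the device sequentially.
class CowStore:
    def __init__(self):
        self.blocks = []    # physical blocks, append-only
        self.index = {}     # logical block id -> physical position

    def write(self, logical_id, data):
        self.blocks.append(data)                 # sequential append
        self.index[logical_id] = len(self.blocks) - 1

    def read(self, logical_id):
        return self.blocks[self.index[logical_id]]

store = CowStore()
store.write("blk0", b"v1")
store.write("blk0", b"v2")   # update: the old block stays untouched
print(store.read("blk0"))    # b'v2'
print(len(store.blocks))     # 2 physical blocks, written in order
```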

Since the column-based internal architecture of Hana is comparable to a database that is one hundred percent indexed, the data footprint is much smaller than in other databases. On the other hand, Hana performs more frequent internal reorganizations, which must then also be reflected in the persistence layer.

This increases the write throughput requirements on the datawriter. In contrast, one would expect the IO throughput requirements for reading data to be rather low, since Hana is supposed to read its data from RAM anyway.

This may be true for normal operation, but not when Hana is booted. Assuming 1 TB of data has to be read into main memory, this still takes around 20 minutes at a throughput of 1 GB/s. That would not be a problem if restarts of the database were the exception.
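The raw transfer time is easy to bound. This sketch computes only the pure sequential read; a real restart adds overhead on top (which is presumably why the figure above is rounded up to 20 minutes):

```python
# Lower bound on restart time: pure sequential read of the data area.
# Real restarts add processing overhead on top of the raw transfer,
# so treat this as a floor, not an estimate.
data_tb = 1.0
throughput_gb_s = 1.0

transfer_min = data_tb * 1024 / throughput_gb_s / 60
print(f"raw transfer: {transfer_min:.1f} min")   # ~17 min
```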

Since Hana is under constant development, with the aim of one day making optimum use of NVRAM, updates must be installed at regular intervals, and these are often accompanied by a restart of the database. This explains SAP's requirement that the persistence layer also provide high read throughput for the data area.

OLAP versus OLTP

Even though, as mentioned above, the main use of IMDBs tends to lie in OLAP, SAP is already promoting OLTP applications on Hana as well (Suite on Hana). Technically, OLTP systems are possible with both single-node and scale-out architectures.

From a performance perspective, however, there is a significant difference. As already explained, for OLTP applications a performance advantage on Hana can be achieved by moving code sections into the database to avoid time-consuming communication between application and database.

If, however, Hana is distributed across several computer nodes in a scale-out landscape, it becomes very difficult to distribute code and data tables across the nodes in such a way that the code always finds its tables on the same computer on which it is currently running. If the code has to fetch data from a neighboring node, communication overhead between the nodes arises again, with latencies comparable to those that would occur if the code had simply remained on the application server.

For this reason, a single-node implementation of Hana is definitely preferable to a scale-out architecture for OLTP.

At the same time, SAP has so far insisted on fast (internal) log devices for Hana as a single node. However, internal log devices are unacceptable for business-critical OLTP applications, since loss of the computer or the log device also entails loss of data.

Business-critical data, especially log data, should always be written (mirrored) to a second location so that, in an emergency, you can recover the database from a second source up to the last completed transaction.

Fujitsu was quick to integrate the Hana single-node architecture into the FlexFrame operating concept and placed the log data on external, mirrorable storage units. Although the previously required 100,000 IOps are not achievable there, they have long been technically unnecessary. In return, Hana gains the secure and flexible operation known from FlexFrame for business-critical applications with their typically high SLAs.

In the meantime, SAP too has moved away from the high IO requirements for the logwriter in order to prepare Hana for flexible integration into data center operations.

Efficient operating concept and shadow databases

The requirement for secure data storage and an efficient operating concept is met by the integration of Hana in FlexFrame. Mirrored shared storage ensures high availability both locally and across data centers.

Restart times remain an open issue. Depending on the size of the database, a complete restart can take an excessively long time even with high-performance IO channels.

In the course of Hana's further development, SAP is working on the concept of the shadow database, which would ideally minimize switchover times, since shadow databases usually run along almost synchronously with the primary database.

After failure of the primary database, activation and complete recovery of the shadow database would take only a few minutes until operations can be resumed.

Shadow databases are not yet available in Hana today. As a precursor, however, Hana offers the system replication option, which replicates the log data synchronously to a second instance and, at regular intervals, preloads and updates Hana's column store (the column structure) in that instance's main memory.

In the event of a failover, this eliminates the complete reloading of the column store, since most of it is already preloaded. This reduces restart times in critical environments to a reasonable level.

For applications that allow only minimal downtime, the recommendation would be to run an instance with system replication locally next to the productive Hana instance, and to mirror the persistence of the production instance to a second data center for disaster recovery.

Since the instance with system replication uses only a small part of the computer's resources, other, non-productive systems could run in parallel on that computer node.

ScaleOut

It remains to be discussed how a scale-out architecture compares with a single node. Basically, for both OLTP and OLAP, given the same database size, the single node is the preferred alternative, provided the available RAM capacities allow it.

There are two main reasons for this. The first was already mentioned in the discussion of OLTP: communication between the database nodes costs comparatively much time and has a negative impact on performance.

For OLAP applications, the problem of cleverly assigning code sections to the data is not as relevant as with OLTP: the mathematical structure of the queries usually allows them to be processed in a well-distributed manner. Nevertheless, the latency problem remains, because the partial results of a query must eventually be merged on one node and consolidated into a final result.

A second problem arises, for example, with joins over tables that are distributed across several nodes. Before the join can be executed, the data of the tables involved must be transferred to the node executing the join and stored there temporarily. This costs time on the one hand and additional main memory on the other.
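The cost structure of such a cross-node join can be illustrated with a toy model (invented table contents; real systems ship compressed column fragments, not Python lists): the remote table's rows must be copied to the executing node and buffered there before the join can run.

```python
# Toy model of a cross-node join: rows living on node A must be shipped
# to the executing node and buffered there before joining.
node_a_orders = [(1, "order-1"), (2, "order-2")]   # lives on node A
node_b_regions = {1: "EMEA", 2: "APJ"}             # lives on node B

# Step 1: ship node A's rows to node B. This costs network time and,
# on node B, additional main memory for the temporary copy.
shipped = list(node_a_orders)

# Step 2: execute the join locally on node B against the buffered copy.
joined = [(oid, name, node_b_regions[oid]) for oid, name in shipped]
print(joined)
```

On a single node both tables are local, so step 1 (and its memory cost) disappears entirely, which is exactly the argument for the single-node recommendation that follows.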

With a single node, there is no need for data transfer and intermediate storage, since all data is local. This results in the recommendation that applications should be operated with a single node instance for as long as possible.

Current developments in hardware technology accommodate this approach. With the hardware officially available in February 2014, up to 12 TB of RAM can be installed in a single machine.

SAP has meanwhile announced that, on the new hardware, it will support up to 6 TB per computer for productive OLTP systems and, for OLAP, up to 2 TB on eight-socket machines, compared with 1 TB in the past.

This sounds plausible, since the CPU performance of the new processor generation has roughly doubled. However, the performance of the Hana technology itself has also been improved constantly and significantly over the past few years, so that from a technical point of view even RAM capacities of more than 2 TB per node would be justifiable in a scale-out architecture.


Jürgen Meynert, Fujitsu

Works as a senior consultant at Fujitsu Technology Solutions.


