Hana: The whole is greater than the sum of its parts

Reportage: A short story about the long road to in-memory computing with the Hana database. When the concepts behind Hana were first introduced within SAP, I thought that any database with enough memory is an in-memory computing database and correspondingly fast. At least I wasn't alone in that misconception.
Werner Dähn, rtdi.io
November 28, 2019

Lately there is more talk of "warm storage" (Dynamic Tiering and Native Storage Extension). So is SAP going the opposite way and ending up where the classic databases came from: disk-based databases with memory as a cache?

Rows for OLTP, columns for OLAP

You often hear that row storage is better for reading entire records. Even Hasso Plattner implied as much in the last Sapphire keynote. The usual argument:

In a row-based database, all the data of a record sits together. A catchy argument, but it falls short: it disregards the topological distance.

This argument comes from the era of hard disks, where the read head position plus the angular velocity of the rotating platter determined the access time, and 4 kB sectors were read in one go.

With SSDs there are no moving parts: an address is sent and a 4 kB sector is returned. The access time per sector is thus significantly shorter, but the data should still sit close together, in the same 4 kB sector.

Memory, by contrast, returns only 16 bytes per access, but of course much faster, and it is irrelevant which memory address is requested next.

In other words: to read 4 kB at maximum speed on hard disks and SSDs, the data should sit in one sector. Memory, on the other hand, doesn't care at all, since reading 4 kB means 256 accesses of 16 bytes each anyway.

From this point of view, row-oriented and column-oriented storage are equally fast, provided all data is already in memory. The memory addresses differ, but that has no influence on the speed.
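To make that concrete, here is a minimal Python sketch of my own (not Hana code): the same records once in row layout and once in column layout, both fully in RAM and both directly addressable.

    # Minimal sketch: the same four records stored row-oriented and
    # column-oriented. Both layouts live entirely in RAM; reading record 2
    # is a handful of direct accesses either way, only the addresses differ.

    records = [
        ("4711", "T-Shirt", "M"),
        ("4712", "Sweater", "S"),
        ("4713", "T-Shirt", "L"),
        ("4714", "Jacket",  "XL"),
    ]

    # Row store: one contiguous tuple per record.
    row_store = records

    # Column store: one contiguous list per attribute.
    col_store = {
        "material": [r[0] for r in records],
        "text":     [r[1] for r in records],
        "size":     [r[2] for r in records],
    }

    # Reading the complete record at position 2:
    print(row_store[2])                               # one tuple
    print(tuple(col_store[c][2] for c in col_store))  # one access per column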

Only for disk operations does it hold that reading whole records favors row-oriented storage and OLAP analyses favor column orientation. That refutes the usual reasoning, but the claim "with enough memory even a normal database is an in-memory database" is still unanswered.

The answer follows from the column orientation: a single record contains a wide variety of information, such as material number, text, and many attributes like color, type, and size.

Such heterogeneous information cannot be compressed nearly as well as each column on its own, with its rather repetitive patterns. Hana goes one step further and looks at each distinct value of such a column individually.

For example, the Size field contains only the values M, S, L, and XL, so four bit strings are generated. The bit string for M might look like this: 0100-0000-0000-1000. It says that the value M occurs in records 2 and 13.

A computer can compress such bit strings very well and very fast. For other columns, such as the material number, which differs for every record, other methods are used.
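The bit-vector idea can be sketched in a few lines of Python (purely illustrative, not Hana's actual implementation):

    # Illustrative sketch of bit-vector encoding for a low-cardinality
    # column (not Hana's real code). One bit vector per distinct value;
    # bit i is set if record i carries that value.

    sizes = ["S", "M", "S", "L", "XL", "M", "L", "S"]

    bit_vectors = {}
    for pos, value in enumerate(sizes):
        bit_vectors.setdefault(value, 0)
        bit_vectors[value] |= 1 << pos       # set the bit at the record position

    for value, bits in bit_vectors.items():
        # print as a bit string with record 0 on the left
        print(value, format(bits, f"0{len(sizes)}b")[::-1])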


Search and find

In an ERP system, data is accessed either via the primary key or via a search: "show sales order 1234" or "find all sales items for sales order 1234".

The former works the same way in both types of database. A classic database consults the index, finds the address of the row, and reads the record at that address. In Hana, you consult the index, find the record number, and fetch the value at that position from each column.

The search is where the differences show: with row orientation, there had better be a secondary index containing the addresses of all sales item records belonging to a sales order.

Otherwise the database has no choice but to scan the entire table. With Hana, the indexing comes for free from the way the data is stored, an immense practical advantage.
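A small sketch of my own (assuming the dictionary encoding described above) shows why the encoded column itself answers such a search without any secondary index:

    # Own sketch: searching "all items for sales order 1234" in a
    # dictionary-encoded column. The encoded column itself answers the
    # query; no separate secondary index is required.

    order_col = [1234, 1235, 1234, 1236, 1234]   # sales order per item record

    # dictionary: distinct value -> small integer code
    dictionary = {v: c for c, v in enumerate(sorted(set(order_col)))}
    encoded    = [dictionary[v] for v in order_col]

    wanted = dictionary[1234]                    # one dictionary lookup
    positions = [i for i, code in enumerate(encoded) if code == wanted]
    print(positions)                             # -> [0, 2, 4]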

Each of these points - column orientation, compression, indexing, and in-memory - has its own advantages and disadvantages. The unique selling point of SAP Hana is that, even today, it is the only database that intelligently combines all of these points so that the advantages reinforce each other.

Keeping everything in memory becomes feasible when you compress. If you organize the data in columns, you can compress better. And thanks to columnar organization and compression, indexes other than the primary key are no longer needed.

Because all data is in memory, reading a complete record is as fast as with row-oriented storage, so column-oriented storage can be used even for such queries.

On the other hand, you gain a lot with Hana for the normal cases. OLAP queries such as "Total sales per year" are very fast because the data is already organized in columns.

Searches on any attribute are much faster because each column in itself acts as an index. And any query that does not read 100 percent of the table's columns is faster.
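A small sketch of my own illustrates why such a query touches only the columns it needs, no matter how wide the table is:

    # Own sketch: "total sales per year" touches exactly two columns,
    # however wide the table is; a row store would drag every record's
    # full width through memory.

    from collections import defaultdict

    year_col  = [2017, 2017, 2018, 2018, 2019]
    sales_col = [100.0, 50.0, 75.0, 25.0, 200.0]
    # ... dozens of further columns may exist but are never read

    totals = defaultdict(float)
    for year, amount in zip(year_col, sales_col):
        totals[year] += amount
    print(dict(totals))   # -> {2017: 150.0, 2018: 100.0, 2019: 200.0}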

But this is exactly the crux of the matter: my entire line of reasoning rests on the premise that all data is already in memory and actually fits into memory.

Isn't it a waste of memory and therefore money to always keep everything in RAM (random access memory), regardless of whether it is needed or not?

SAP gives the administrator several optimization levers here. The first is large object data types (BLOB, CLOB, and NCLOB): these are not held in memory but always remain on disk.

The pointer to the data is kept in memory, but not the content. A good idea, but it doesn't help much, because such data types hardly ever occur in an ERP system.

First use

The next optimization: partitions are only fetched into memory when they are used for the first time, not preemptively. So if a table consists of one billion records divided into ten partitions for ten years, only the partition for the current year would be loaded into memory.

Also a good idea: it reduces startup time and initial memory requirements. But over time every partition will have been used at least once by someone, so eventually everything is in RAM again and stays there until the next restart.

There is a feature for this: the retention period. With this setting, such partitions are evicted from RAM after a set time without any access. Finally a setting that removes things from RAM. Note that this switch is off by default!
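The interplay of lazy loading and the retention period can be pictured like this (a pure illustration with invented names, not Hana's implementation):

    # Pure illustration of lazy partition loading plus a retention period
    # (all names invented; not Hana's actual implementation).
    import time

    RETENTION_SECONDS = 3600              # unload after one hour without access
    loaded = {}                           # partition id -> [data, last access time]

    def load_from_disk(pid):
        return f"partition {pid} data"    # stand-in for the real disk read

    def read(pid):
        # first use: fetch the partition into RAM, not as a precaution
        if pid not in loaded:
            loaded[pid] = [load_from_disk(pid), time.time()]
        loaded[pid][1] = time.time()      # remember the access for the retention check
        return loaded[pid][0]

    def evict_idle():
        # retention period: drop partitions nobody has touched for too long
        now = time.time()
        for pid in [p for p, (_, last) in loaded.items()
                    if now - last > RETENTION_SECONDS]:
            del loaded[pid]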

This is already very good, but it has two gaps. First, a single user reading even one record triggers the complete load of that partition into memory.

If such a partition is one gigabyte in size, that can take a good two seconds. Second, none of this helps if the complete database needs 1.1 terabytes of RAM but only 1 terabyte is available.

This is where the latest feature, the Native Storage Extension, comes to the rescue. With it, the entire partition is no longer loaded, only the pages that actually contain the requested data.

And if not enough RAM is available, pages that are not needed are evicted again. In its mode of operation this really is like a classic disk-based database: for tables with this setting, RAM is used only as a cache.
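The page-based behavior boils down to a small buffer cache; here is my own simplified sketch of the idea (invented names, not the actual Native Storage Extension code):

    # Own simplification of the idea behind the Native Storage Extension:
    # only the pages actually touched enter RAM, and the least recently
    # used page is dropped when the buffer is full.
    from collections import OrderedDict

    CAPACITY = 3                       # pages that fit into the buffer
    buffer = OrderedDict()             # page id -> page content, in LRU order

    def get_page(page_id):
        if page_id in buffer:
            buffer.move_to_end(page_id)          # mark as recently used
        else:
            if len(buffer) >= CAPACITY:
                buffer.popitem(last=False)       # evict the least recently used page
            buffer[page_id] = f"page {page_id}"  # stand-in for the disk read
        return buffer[page_id]

    for pid in [1, 2, 3, 1, 4]:        # page 2 is evicted when page 4 arrives
        get_page(pid)
    print(list(buffer))                # -> [3, 1, 4]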

Multi-temperature data

But that is not how it is intended. Hana is also an in-memory computing database, so all (!) used (!) data should be available in memory. Only in this way can OLTP and OLAP queries be handled by the same database, and only in this way can short and predictable response times be achieved.

These additional functions are only suitable for multi-temperature data: data that is needed only from time to time, but for which the user should not be sent off to an archive database.

If I were to use these features for all tables and allocate too little RAM, the disadvantages of column-oriented storage would start to show. I would no longer have all the advantages without the disadvantages.

Instead, Hana represents the central entry point for data at different temperatures and hides the physical differences: the user submits a query, some data comes from the in-memory store, some from disk, and some is federated from an external system via the Hana Smart Data Access feature. The user notices none of this, except for possibly longer response times. Hana becomes a data fabric.

The physics is hidden, but it is still there: reading Hana data structures from memory is faster than reading them from disk via the Native Storage Extension.

The gap is even wider with Dynamic Tiering, which has an SAP IQ database in the background, because Hana's data structures are no longer used there at all; the interface is therefore limited to SQL.

And if the user accesses an external system via Smart Data Access, the response can only be as fast as the external system allows.


Werner Dähn is a Data Integration Specialist and the Managing Director of rtdi.io.

