V² for Big Data
Admittedly, the term "V²" is my own invention, and my math teacher would reprimand me for it. The first V stands for the Roman numeral 5, and the second V for the initial letter shared by the five terms that we will now discuss and that serve as the basis for Big Data. The term "Big Data" itself is not protected and is often used as a label in marketing.
First of all, "V" as in Volume. This refers to the large to huge amounts of data that each of us, and every company, generates today.
According to various sources, the amount of data stored worldwide in 2018 was 33 ZB (zettabytes). 1 ZB is equivalent to 1,000 EB (exabytes) or 1 billion TB (terabytes).
With an assumed annual growth of around 27 percent, this volume doubles roughly every three years (!). Interestingly, the largest data producer is the manufacturing industry, followed by trade, financial services, infrastructure, and media and entertainment.
Health and transport come only after these, although I see a strong upward trend in both areas, especially as the speed of data transmission increases (for example, with 5G).
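The doubling claim can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not an official calculation: the function names are illustrative, and it simply assumes that the quoted figures (33 ZB in 2018, 27 percent growth per year) hold constant.

```python
import math

TB_PER_ZB = 10**9  # 1 ZB = 1,000 EB = 1 billion TB, as stated in the text


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years until a quantity doubles at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_volume_zb(start_zb: float, annual_growth_rate: float, years: int) -> float:
    """Compound the starting volume over a number of whole years."""
    return start_zb * (1 + annual_growth_rate) ** years


def zb_to_tb(zettabytes: float) -> float:
    """Convert zettabytes to terabytes."""
    return zettabytes * TB_PER_ZB


if __name__ == "__main__":
    # Assumed inputs, taken from the figures quoted in the column.
    start_zb, growth = 33.0, 0.27
    print(f"Doubling time: {doubling_time_years(growth):.1f} years")
    for offset in range(4):
        volume = projected_volume_zb(start_zb, growth, offset)
        print(f"{2018 + offset}: {volume:.1f} ZB = {zb_to_tb(volume):.2e} TB")
```

At 27 percent per year the doubling time comes out at just under three years, which matches the rule of thumb above.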
Big Data is also about the speed at which large amounts of data are processed, sometimes with complex algorithms. This speed, "Velocity", is the second "V".
The more up-to-date data is, the greater its value. One example: when you visit a website that carries advertising, cookies are used to show you the ad that a provider has purchased for you at that very moment.
This leads on to the third "V", Variety: not only does collected data come in very different file formats, but most of it (about 80 percent) is also unstructured, such as text, audio, video, chats, movement profiles, and so on.
One of the expectations of Big Data is to make this data analyzable and thus usable. One example is behavioral prediction: unstructured data is already used today to predict impending disasters by linking communication from the affected areas with weather data, historical data, and geodata.
Regardless of the amount of data, the data must be valid. Hence the fourth "V", Validity. Is what I measure representative, and how strongly does it correlate with the behavior I want to predict?
Although the human birth rate and the stork population correlate, there is no causal relationship. Conversely, Walmart is said to have discovered some 20 years ago that diapers and beer sell well together (especially on Fridays).
The explanation is that (allegedly) young fathers were sent out to buy diapers and picked up beer for themselves while they were at it. Because of the cost structure and speed of SAP systems, the data collection and processing for the "Vs" described above tends to be carried out in systems outside SAP, and the results are then transferred to SAP.
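To make the stork example from the validity discussion concrete, here is a small Python sketch. All numbers are invented for illustration and are not real statistics; it assumes a hidden third factor (the size of a region) that drives both stork counts and births.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: six regions of increasing size (arbitrary units).
region_size = [10, 20, 30, 40, 50, 60]

# Both series are derived from region size plus a small fixed deviation.
# Neither series is computed from the other.
storks = [3 * s + d for s, d in zip(region_size, [2, -3, 1, -1, 3, -2])]
births = [100 * s + d for s, d in zip(region_size, [-40, 10, 50, -50, -10, 40])]

if __name__ == "__main__":
    print(f"Correlation storks vs. births: {pearson(storks, births):.3f}")
```

The two series correlate very strongly even though the code never links them directly; the correlation comes entirely from the shared dependence on region size. A valid prediction model has to rule out this kind of hidden common cause.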
The fifth "V" stands for "Value", i.e., the value of the insights gained through Big Data. This ranges from targeted marketing to the optimization of business processes or value chains to entirely new business models. Even though Big Data offers many opportunities, it does not come for free, so the benefits must be correspondingly high.
I do not want to keep from you that other "Vs" in the current discussion also deserve attention. For example, there is "V" as in "Veracity": the higher the quality of the data, the greater its informative benefit. Or "Volatility", which refers to the availability of data.
Or "Viability", meaning the right selection from the available data. As you can see, there are many levers that determine the benefits of Big Data. If you have read this column to the end and your head is not yet buzzing from all these "V" words, you can confidently make a "V" sign, as in "Victory".