{"id":62135,"date":"2019-10-02T11:00:14","date_gmt":"2019-10-02T09:00:14","guid":{"rendered":"http:\/\/e3mag.com\/?p=62135"},"modified":"2020-02-08T16:11:33","modified_gmt":"2020-02-08T15:11:33","slug":"big-data-architecture","status":"publish","type":"post","link":"https:\/\/e3mag.com\/en\/big-data-architecture\/","title":{"rendered":"Big Data Architecture"},"content":{"rendered":"<p>As a software architect, my goal is to solve complicated tasks with simple solutions. Each component of a solution has advantages and disadvantages; the art lies in combining them so that, in sum, the advantages remain and the disadvantages cancel each other out.<\/p>\n<p>For many SAP users, the first step will be to enable analytics with Big Data, that is, to find interesting information in these huge volumes of data.<\/p>\n<p>But instead of building a completely new infrastructure for the users, I combine the Big Data system with the existing data warehouse.<\/p>\n<p>The data scientist gets the data lake, a storage area in which all the raw data is available, together with a powerful tool for processing this raw data. The results of his work are new key figures, which I add to the data warehouse. This has several advantages:<\/p>\n<ul>\n<li>The business user continues to use his familiar analysis tools; he simply has more key figures at his disposal.<\/li>\n<li>The data scientist has access to all data, both Big Data and ERP data.<\/li>\n<li>For IT, the effort is manageable.<\/li>\n<\/ul>\n<p>This solution is also attractive when weighing costs, benefits and the probability of success: by building on the existing infrastructure, I reduce the project scope, and with it the project risk and the implementation costs, while still exploiting the full potential benefit.<\/p>\n<p>In this approach, a Big Data solution thus consists of only two components: the data lake holding the raw data and a server cluster in which the data preparation takes place.<\/p>\n<h3>Data Lake or SAP Vora<\/h3>\n<p>In the past, SAP offered SAP Vora as a data lake, and it sells the Altiscale solution under the name Big Data Services. Fundamentally, however, a data lake is just a large file system. If SAP's sales team nevertheless proposes Vora, Altiscale or DataHub, their price and performance should be scrutinized very critically.<\/p>\n<p>Why not simply start with a local hard disk or the central file server in the first project phase? 
As long as there is enough space and storage costs are not too high, this remains a valid option throughout. The files can be copied elsewhere at any time without difficulty, so this choice does not block anything in the future.<\/p>\n<h3>Preparation with Apache Spark<\/h3>\n<p>To process this data, most projects today use the open source framework Apache Spark. It allows data processing programs to be written in just a few lines of code and executed in parallel on a server cluster.<\/p>\n<p>There is no reason for me to reinvent the wheel here, especially since installing Spark is very simple and takes about ten minutes: download the package to a small Linux machine, extract it and start a master and a first worker with the start-all script (sbin\/start-all.sh).<\/p>\n<h3>Challenge: Algorithms<\/h3>\n<p>With this approach, the technology is manageable. The difficult part is developing the algorithms for the new key figures: How can one extract information from the mass data that ultimately shows up in the company's profit?<\/p>\n<p>This is precisely where the success of a Big Data project is decided. That's why I think it makes sense to invest here, for example in training a data scientist.<\/p>\n<p>In upcoming columns, I will answer questions such as these: Why use Apache Spark and not an ETL tool? Why do you need a data lake if the data is already in the data warehouse?<\/p>","protected":false},"excerpt":{"rendered":"<p>Big Data is a big topic, but the multitude of possibilities is overwhelming. Every software vendor comes with different products and different goals. 
I would like to bring some structure into this jungle and make it easier to get started.<\/p>","protected":false},"author":1891,"featured_media":62136,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","footnotes":""},"categories":[7,35911,36004],"tags":[937,210,927,67],"coauthors":[36006],"class_list":["post-62135","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-meinung","category-mag-1909","category-smart-big-data-integration","tag-analytics","tag-big-data","tag-data-warehouse","tag-linux","pmpro-has-access"],"acf":[],"featured_image_urls_v2":{"full":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",1000,450,false],"thumbnail":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-150x150.jpg",150,150,true],"medium":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",400,180,false],"medium_large":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-768x346.jpg",768,346,true],"large":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",1000,450,false],"image-100":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-100x45.jpg",100,45,true],"image-480":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-480x216.jpg",480,216,true],"image-640":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-640x288.jpg",640,288,true],"image-720":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-720x324.jpg",720,324,true],"image-960":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-960x432.jpg",960,432,true],"image-1168":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integra
tion.jpg",1000,450,false],"image-1440":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",1000,450,false],"image-1920":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",1000,450,false],"1536x1536":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",1000,450,false],"2048x2048":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",1000,450,false],"trp-custom-language-flag":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",18,8,false],"bricks_large_16x9":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",1000,450,false],"bricks_large":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",1000,450,false],"bricks_large_square":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",1000,450,false],"bricks_medium":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",600,270,false],"bricks_medium_square":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration.jpg",600,270,false],"profile_24":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-24x24.jpg",24,24,true],"profile_48":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-48x48.jpg",48,48,true],"profile_96":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-96x96.jpg",96,96,true],"profile_150":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-150x150.jpg",150,150,true],"profile_300":["https:\/\/e3mag.com\/wp-content\/uploads\/2019\/08\/Smart-and-Big-Data-Integration-300x300.jpg",300,300,true]},"post_excerpt_stackable_v2":"<p>Big Data is a big topic, but the multitude of possibilities is overwhelming. 
Every software vendor comes with different products and different goals. I would like to bring some structure into this jungle and make it easier to get started.<\/p>\n","category_list_v2":"<a href=\"https:\/\/e3mag.com\/en\/category\/opinion\/\" rel=\"category tag\">Opinion of the SAP Community<\/a>, <a href=\"https:\/\/e3mag.com\/en\/category\/mag-1909\/\" rel=\"category tag\">MAG 19-09<\/a>, <a href=\"https:\/\/e3mag.com\/en\/category\/opinion\/smart-big-data-integration\/\" rel=\"category tag\">Smart &amp; Big Data Integration<\/a>","author_info_v2":{"name":"Werner D\u00e4hn, rtdi.io","url":"https:\/\/e3mag.com\/en\/author\/werner-daehn\/"},"comments_num_v2":"0 comments","_links":{"self":[{"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/posts\/62135","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/users\/1891"}],"replies":[{"embeddable":true,"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/comments?post=62135"}],"version-history":[{"count":0,"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/posts\/62135\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/media\/62136"}],"wp:attachment":[{"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/media?parent=62135"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/categories?post=62135"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/tags?post=62135"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/e3mag.com\/en\/wp-json\/wp\/v2\/coauthors?post=62135"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}