Upcoming #BW4HANA Webcasts

Here is a list of ASUG webcasts covering topics around BW/4HANA; click on a title to register:

For a complete list of ASUG BI webcasts look here.


Technical Summary of #BW4HANA

[This can be considered an extended version of the introductory blog What is #BW4HANA?]

Overview

BW/4HANA is a data warehousing application sitting on top of HANA as the underlying DBMS. A data warehouse (DW) is designed specifically to be a central repository for all data in a company. This well-structured data traditionally originates from transactional systems, ERP, CRM, and LOB applications. Each individual system is consistent, whereas the union of the systems and the underlying data is not. This is why disparate data from those systems has to be harmonized – that is, extracted, transformed, loaded (ETL) or logically exposed (federated) – into the warehouse within a single relational schema. The predictable data structure (of such a schema) optimizes processing and reporting.

BW/4HANA allows you to define a DW architecture via high-level building blocks, almost like Lego bricks. From this model, a set of tables, views and other relational objects is generated. BW/4HANA manages the lifecycle of those tables and views, e.g. when columns are added or removed. It also manages the relationships between the tables. For example, it ensures referential integrity, which is extremely beneficial for query processing as it avoids the use of outer joins, whose performance is, in general, far inferior to that of inner joins. BW/4HANA not only manages the lifecycle of tables, views etc. but also the lifecycle of the data sitting in those tables or being exposed by the views. Data typically enters a DW in its original format but then gets harmonized with data from other systems. For legal compliance and other reasons, it is usually important to track how the data “travels” from its entry into the DW to its exposure to the end users. In many cases, data is retained for a certain period in an active (hot data) layer in the DW until it is moved to less expensive media outside of HANA, e.g. nearline storage (NLS) in IQ or Hadoop. Even then, BW/4HANA provides online access to that data, albeit at a small performance penalty.
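To see why guaranteed referential integrity pays off at query time, consider this toy sketch (illustrative Python, not BW/4HANA code): when every fact row is known to have a matching master-data row, the cheap inner join returns exactly the same result as the defensive outer join that would otherwise be required.

```python
# Toy illustration: why guaranteed referential integrity lets a query
# engine substitute a cheap inner join for a defensive outer join.
sales = [  # fact rows: (customer_id, amount)
    (1, 100.0), (2, 250.0), (1, 75.0),
]
customers = {1: "ACME", 2: "Globex"}  # master data: id -> name

# Inner join: drops fact rows without matching master data ...
inner = [(customers[cid], amt) for cid, amt in sales if cid in customers]

# Outer join: keeps such rows, pairing them with None.
outer = [(customers.get(cid), amt) for cid, amt in sales]

# With referential integrity guaranteed, no fact row is unmatched,
# so both joins produce identical results.
assert inner == outer
```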

On top of its data management layer, BW/4HANA also provides an analytic layer with an analytic manager at its core. The latter is a unique asset as it differs from traditional OLAP engines in the sense that it refrains from processing data itself (i.e. in ABAP) but simply compiles query execution graphs that are sent to HANA’s execution engines, mainly the calculation engine but also the SQL, OLAP, planning and other engines and libraries. Those engines return (partial) results which are then assembled within BW/4HANA’s analytic manager to form an overall query result. It is important to understand that typical analytic queries consist of a sequence of operations which cannot be arbitrarily reordered for optimization due to mathematical constraints. For example, currency values have to be converted to a single currency before the values are aggregated; swapping aggregation and currency conversion would yield incorrect results. In order to leverage HANA’s extremely fast aggregation power, currency conversion (and similarly unit conversion) logic has been brought into HANA.
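A small example (in Python, with made-up exchange rates) illustrates why this ordering constraint is mathematical rather than an implementation detail: with mixed currencies, “convert, then aggregate” and “aggregate, then convert” produce different numbers, and only the former is correct.

```python
# Made-up rates and amounts, purely for illustration.
rates_to_eur = {"USD": 0.5, "GBP": 1.25, "EUR": 1.0}
rows = [("USD", 120.0), ("GBP", 200.0), ("EUR", 40.0)]

# Correct: convert each value to the target currency, then aggregate.
correct = sum(amount * rates_to_eur[cur] for cur, amount in rows)

# Wrong: aggregate the raw values first -- no single rate applied
# afterwards can be right for a mixed-currency total.
wrong = sum(amount for _, amount in rows) * rates_to_eur["USD"]

print(correct)  # 350.0
print(wrong)    # 180.0 -- the order of operations matters
```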

Many of the approaches mentioned above started within BW-on-HANA but are now extended within BW/4HANA, mainly exploiting the advantage that HANA is the only supported DBMS underneath. In the remainder, we will elaborate on this further.

Openness

Probably one of the most popular and widely recognized strengths of BW/4HANA is the set of features that allow BW/4HANA …

  • to expose its data and a subset of the semantics on top (e.g. hierarchies, currency logic, fiscal logic, formulas) via HANA’s calculation views to a SQL tool or programmer,
  • to incorporate SQL tables, views and SQLScript procedures seamlessly into a BW/4HANA-based DW architecture,
  • to leverage any specialized library (e.g. AFLs, PAL) in batch or online processing.

So, it is possible to easily interact with any SQL environment, tool and approach. This is so popular that many BW-on-HANA customers (and this should be even more the case for BW/4HANA customers) have started to discard their SQL-oriented data warehouses in favor of using native SQL within BW-on-HANA or BW/4HANA. Bell Helicopter presented an example at the ASUG 2016 conference; see the figure below. They will deprecate their 4 Oracle-based data warehouses and move them into HANA. For more details see their slides or here.


Bell Helicopter’s plans as presented at ASUG 2016

Simplicity

Depending on how one counts, BW offers 10 to 15 different object types / building blocks – these are the “Lego bricks” mentioned above – for building a data warehouse. In BW/4HANA, there are only 4, and they are at least as expressive and powerful as their predecessors – see the figure below. BW/4HANA’s building blocks are more versatile: data models can now be built with fewer building blocks without compromising expressiveness. They will, therefore, be easier to maintain, thus more flexible and less error-prone. Existing models can be enhanced and adjusted and can thus be kept alive over a longer period that goes beyond their initial scope.

Another great asset of BW/4HANA is that it knows what type of data sits in which table. The usage and access pattern of each table is very well known to BW/4HANA. From that information, it can automatically derive which data needs to sit in the hot store (memory) and which data can be put into the warm store (disk or non-volatile RAM) to yield a more economical usage of the underlying hardware. This is unique to BW/4HANA compared to handcrafted data warehouses, which also require a handcrafted, i.e. specifically (manually) implemented, data lifecycle management. A toy sketch of this kind of derivation follows the figure below.


Object types in classic BW vs BW/4HANA
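To give a flavour of what such an automatically derived placement rule could look like, here is a deliberately naive Python sketch; the function, its inputs and the 90-day threshold are my own illustration, not BW/4HANA’s actual algorithm.

```python
from datetime import date, timedelta

def storage_tier(last_accessed: date, in_reporting_layer: bool,
                 warm_after_days: int = 90) -> str:
    """Hypothetical rule: derive 'hot' (in-memory) or 'warm' storage
    from a table's role and its last access date."""
    if in_reporting_layer:
        return "hot"  # data queried by end users stays in memory
    age = date.today() - last_accessed
    return "hot" if age <= timedelta(days=warm_after_days) else "warm"

# Example: a staging table untouched for 200 days goes to the warm store.
print(storage_tier(date.today() - timedelta(days=200), False))  # warm
print(storage_tier(date.today(), True))                         # hot
```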

Modern UIs

With the switch from BW or BW-on-HANA to BW/4HANA comes a shift away from legacy SAPGUI-based UIs for administrators, expert users and DW architects to modern UIs based on HANA Studio or Fiori-like, browser-based UIs. Currently, this shift has been accomplished for the main modeling UIs, and SAPGUI will still be necessary in the short term. But it is not only about using modern technology and changing the visualization of existing UIs; there have also been significant changes in how a DW architecture is defined and managed. Most prominently, there is a new data flow modeler (figure below; left-hand side) which visualizes the DW architecture in a very intuitive and user-friendly way, thereby moving away from the classic tree-based BW workbench (figure below; right-hand side).


Advances of UIs in BW/4HANA vs classic BW


Big Data

BW/4HANA will be tightly integrated with SAP’s planned Big Data Hub tooling. This caters to the fact that traditional data warehouses are gradually being complemented with big data environments, leading to an architecture of modern data warehouses; see the figure below. Typically, “data pipelines” (data movement processes that refine, combine, harmonize, transform, convert unstructured → structured etc.) span the various storage layers of such an environment. It will be possible to incorporate BW/4HANA’s process chains into such data pipelines to allow for an end-to-end view, scheduling, monitoring and overall management. BW/4HANA will leverage VORA as a bridge between HANA and HDFS, e.g. for accessing NLS data that might have been moved to HDFS, or for machine learning or transformation processes that involve (e.g. high-volume) data in HDFS. A simple sketch of such a pipeline follows the figure.


An EDW in the context of a big data system landscape
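To make the idea of embedding a process chain into a wider pipeline concrete, here is a deliberately simplistic Python sketch; every name in it (the step functions, the chain ID) is a hypothetical placeholder, not a real API.

```python
# Illustrative only: a pipeline as an ordered list of steps spanning
# storage layers, with a BW/4HANA process chain as one of the steps.
def ingest_raw_to_hdfs():
    print("land raw files in HDFS")

def refine_and_structure():
    print("refine, combine, harmonize (e.g. via VORA on HDFS data)")

def trigger_process_chain(chain_id: str):
    print(f"run BW/4HANA process chain {chain_id} to load the EDW")

pipeline = [
    ingest_raw_to_hdfs,
    refine_and_structure,
    lambda: trigger_process_chain("ZSALES_LOAD"),  # made-up chain ID
]

# An end-to-end scheduler would also monitor, retry and log each step.
for step in pipeline:
    step()
```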

This article has also been published on LinkedIn. You can follow me on Twitter via @tfxz.

Quality of Sensor Data: A Study Of Webcams


Fig 1: Noisy GPS data: allegedly running across a lake.

For a while, I’ve been wondering about the quality of sensor data. Naively – and many conversations that I had on this went along that route – it can be assumed that sensors always send correct data unless they fail completely. A first counter-example that many of us can relate to is GPS, e.g. as integrated into a smartphone. See the figure to the right, which visualises part of a running route and shows me allegedly running across a lake.

Now, sensor does not equal sensor, i.e. it is not appropriate to generalise about “sensors”. The quality of measurements and data varies a lot with the actual measure (e.g. temperature), the environment, the connectivity of the sensor, the assumed precision and many more factors.

In this blog, I analyse a fairly simple, yet real-world setup, namely that of 3 webcams that take images every 30 minutes and send them via the FTP protocol to an FTP server. The setup is documented in the following figure, which you can read from right to left in the following way (a minimal sketch of the camera-side upload follows the figure):

  1. There are 3 webcams, each connected to a router via WLAN.
  2. The router is linked to an IP provider via a long-range WIFI connection based on microwave technology.
  3. Then there is a standard link via the internet from IP provider to IP provider.
  4. A router connects to the second IP provider.
  5. The FTP server is connected to that router.

Fig 2: The connection from the webcams on the right to the FTP server on the left.
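As a minimal illustration of what the sending side in steps 1 to 5 does, here is a Python sketch; host name, credentials and file name are placeholders, not my actual configuration.

```python
from ftplib import FTP, error_perm

def upload_image(path: str) -> bool:
    """Send one captured image to the FTP server; report success."""
    try:
        with FTP("ftp.example.org", timeout=30) as ftp:  # placeholder host
            ftp.login("webcam", "secret")                # placeholder login
            with open(path, "rb") as f:
                ftp.storbinary(f"STOR {path}", f)
        return True
    except (OSError, error_perm):
        return False  # connection dropped, auth failed, timeout, ...

print(upload_image("cam1_20160901_1200.jpg"))
```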

So, once an image is captured, it travels from 1. to 5. I have been running this setup for a number of years now. During that time, I’ve incorporated a number of reliability measures like rebooting the cameras and the (right-hand) router once per day. From experience, steps 1. and 2. are the most vulnerable in this setup: both long-range WIFI and WLAN are liable to a number of failure modes. In my specific setup, there is no physical obstacle or frequency-polluted environment. However, weather conditions, like widely varying humidity and temperature, are the most likely source of distortion.

So, what is the experiment and what are the results? I’ve been looking at the image data sent over the course of approx. 3 months. In total, around 8000 images were transmitted. I counted the successful (fig 3) vs the unsuccessful (fig 4) transmissions; a sketch of how such a count can be computed follows the figures. I did not track the images that completely failed to be transmitted, i.e. that did not reach the FTP server at all and therefore did not leave any trace. 5.3% of the images were distorted (as in fig 4), i.e. roughly every 19th image failed to be transmitted correctly. Moreover, that rate was not constant (e.g. per week): there were times of heavy failures and times of no failures.


Fig 3: Successfully transmitted image.


Fig 4: Distorted image.
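One plausible way to automate the count, assuming that a distorted transmission typically leaves a truncated file: a JPEG starts with the marker FF D8 and ends with FF D9, so files missing the trailing marker can be flagged. The directory name is a placeholder and the heuristic is my own; not every distortion necessarily manifests this way.

```python
from pathlib import Path

def looks_intact(jpeg: Path) -> bool:
    """Heuristic: an intact JPEG starts with FF D8 and ends with FF D9."""
    data = jpeg.read_bytes()
    return data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"

images = sorted(Path("webcam_archive").glob("*.jpg"))  # placeholder dir
failed = sum(1 for img in images if not looks_intact(img))
if images:
    print(f"{failed}/{len(images)} distorted "
          f"({100.0 * failed / len(images):.1f}%)")
```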

This is an initial and simple analysis, but one that matches real-world conditions and setups pretty well and is therefore no artificial simulation. In the future, I might refine the analysis, e.g. by counting non-transmissions too or by correlating the quality with temperature, humidity or other potential influencing factors.

You can follow me on Twitter via @tfxz.

What To Take Away From The #HLF16

From 18-23 Sep 2016, I had the privilege to join approx. 200 young researchers and 21 laureates of the most prestigious awards in Mathematics and Computer Science at the 4th Heidelberg Laureate Forum (HLF). Similar to the Lindau Nobel Meetings, it intends to bring together and seed the exchange between the past and the future generations of researchers.

I thought it worthwhile to write up the experience from someone with my background and perspective, which is the following: I am a member of the HLF foundation council and could therefore join this illustrious event. I hold a master’s degree from the Karlsruhe Institute of Technology (KIT) and a PhD from Edinburgh University, both in CS. For almost 18 years, I have been working for SAP in development, most of the time confronted with high-performance problems of CS in general and DBMSs in particular. So, generation- and career-wise, I’m somewhere in between the two focus groups of the HLF.

The Setup

… was excellent: there were lectures by the laureates during the morning and breakout events (panels, workshops, poster sessions and the like) in the afternoon. In between, there were ample breaks for people to mingle, talk to each other, exchange ideas, make contacts and/or friends – basically everything to nourish creativity, curiosity and inspiration. Many years ago, a friend commented: “At conferences, breaks are as important as the scheduled events; presentations are there only to seed topics for lively discussions during the breaks.” I think the HLF is an excellent implementation of that notion.

Meeting People

Over the course of the week, I talked to many attendees, both young researchers and laureates. Topics circled – as in the presentations – around the past and the future: lessons learned from past problems in order to tackle the coming ones. Sir Andrew Wiles did a great job in his lecture on how people tried to prove Fermat’s Last Theorem until, after more than 300 years, a new approach was triggered which he finally brought to a successful end. Similarly, Barbara Liskov chose to talk about the lessons learned in the early 1970s, when early versions of modularity and data abstraction made it into modern programming languages, which finally led to object orientation.

On the Wednesday morning of the “HLF week”, a group of HLF participants visited SAP. The young researchers learned about opportunities at SAP while the laureates were given a demo of SAP’s digital boardroom. On this occasion, too, good questions and discussions came up.

The Presentations

The lectures given by the laureates were mostly excellent. I’ve decided to pick 3 that I personally enjoyed most, 1 from each discipline:

  • Mathematics: Sir Michael Atiyah is 87 years old. He took advantage of the fact that as a laureate you don’t have to prove anything to anyone and, thus, can take a few steps back and look at the research in your area from a distance. His lecture on “The Soluble and the Insoluble” discusses this as “both a philosophical question, and a practical one, which depends on what one is trying to achieve and the means, time and money available. The explosion in computer technology keeps changing the goal posts.”

  • Computer Science: Raj Reddy picked “Too Much Information and Too Little Time” as a topic. Among other things, he pointed to cognitive science and argued that modern software needs to take human limitations (making errors, forgetting, impatience, “going for least effort” etc.) and strengths (tolerance of ambiguity, imprecision and errors; rich experience and knowledge; natural language) far more into account.

  • Physics: Brian Schmidt gave the Lindau Lecture on the “State of the Universe”. I am no expert in astrophysics but, still, there were a lot of fascinating facts and “aha effects” in it. It is amazing what science can do.

There is a lot more material from this and previous events. If you are interested in finding out more details, you might want to look at the HLF website, where you will also find more video recordings of the laureates’ lectures.

This has been cross-published on LinkedIn. You can follow me on Twitter via @tfxz.

3 Turing Award And 1 Nobel Prize Winners Visiting SAP

On Wednesday 21 Sep, SAP hosted a group of visitors participating in the 4th Heidelberg Laureate Forum (HLF), a meeting of young researchers and winners of the most prestigious awards in Mathematics and Computer Science – the Abel Prize, the Fields Medal, the Nevanlinna Prize and the ACM Turing Award – all tantamount to a Nobel Prize in their respective disciplines. Amongst the visitors were Joseph Sifakis, Sir Tony Hoare and Vint Cerf (all Turing Award winners) as well as Brian Schmidt (Nobel Prize in Physics); see the photo below.


From left to right: K. Bliznak (SAP), Mrs O. Ioannidi, J. Sifakis, Sir T. Hoare, B. Schmidt, V. Cerf, T. Zurek (SAP)

The visit took around 2.5 hrs and comprised a tour through SAP’s inspiration pavilion and a demo of SAP’s digital boardroom. The 4 laureates showed huge interest in both, while still being critical here and there: at the big data wall, Vint Cerf noted the omission of some of the advances in the internet in the 1970s and 1980s. Brian Schmidt commented: “Vint, you are too biased!”. The digital boardroom demo triggered a good set of questions that went beyond the pure visualisation on the 3 screens and extended to how to incorporate data sitting in non-SAP or legacy systems, who composes the dashboards, and how this might be applicable in their respective areas. They even speculated on how SAP might create upsell opportunities. It was a lively exchange of ideas and, overall, a bit different from the common visits by SAP customers.


Vint Cerf commenting on the Digital Boardroom

If you are interested in finding out more details on the HLF, you might want to look at the HLF website, where you will also find video recordings of the laureates’ lectures.

This blog has also been published on SciLogs. You can follow me on Twitter via @tfxz.

#BW4HANA Launch Demo

On 7 Sep 2016, BW/4HANA was launched in San Francisco. Videos from the event can be found on the event’s website. The BW/4HANA launch demo in particular has attracted a lot of attention. For further reuse – like embedding it into PPT – we have uploaded it to YouTube:

The demo is also available as a screencam:

More info on BW/4HANA can be found here. This blog has been cross-published on SCN. You can follow me on Twitter via @tfxz.

PS: The demo was also shown at SAP TechEd Bangalore (5’33”, Oct 2016).

 

Data Modeling with #BW4HANA


One of the most striking differences between BW and BW/4HANA is data modeling. On the one hand, there are fewer but more versatile objects to choose from (see figure 1). On the other hand, there is a new, more intuitive alternative to BW’s long-standing admin workbench (RSA1), namely the Data Flow Modeler (see figure 2). It shows physical and virtual containers (like DSOs or composite providers) as boxes. Data transformations, combinations, transitions etc. are indicated as directed lines between those boxes. From those boxes and lines it is possible to access the respective editors for those objects. In that way, a DW architect can navigate along the paths that the data takes from entering the system to the multidimensional views that serve as sources for pivot tables, charts and other suitable visualisations. This is not only great for the DW architect but also allows for rapid prototyping scenarios, e.g. when a DW architect sits down with a business user to quickly create a new model or modify a given one. Figure 3 shows an example.

Figure 1: Fewer and more logical objects when architecting a DW with BW/4HANA.

Figure 2: The new Data Flow Modeler in BW/4HANA.

Figure 3: The same scenario, once in the traditional admin workbench (left) and once in BW/4HANA’s Data Flow Modeler (right).

This blog has also been published here. You can follow me on Twitter via @tfxz.