[This can be considered as an extended version of the introductory blog What is #BW4HANA?]
BW/4HANA is a data warehousing application sitting on top of HANA as the underlying DBMS. A data warehouse (DW) is designed specifically to be a central repository for all data in a company. This well-structured data traditionally originates from transactional systems, ERP, CRM, and LOB applications. Each individual system is consistent, whereas the union of the systems and the underlying data is not. This is why disparate data from those systems has to be harmonized, that is, extracted, transformed and loaded (ETL) or logically exposed (federated), into the warehouse within a single relational schema. The predictable data structure of such a schema optimizes processing and reporting.
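To make the harmonization step more concrete, here is a minimal Python sketch. It is not how BW/4HANA implements ETL; the record layouts, field names and the target schema are assumptions chosen purely for illustration. It shows two source systems that are each consistent on their own being mapped onto one shared schema.

```python
# Hypothetical records from two source systems with different layouts.
# Field names and values are illustrative assumptions only.
ERP_CUSTOMERS = [{"KUNNR": "0001", "NAME1": "Acme", "LAND1": "DE"}]
CRM_CUSTOMERS = [{"id": 2, "company": "Globex", "country": "us"}]


def from_erp(row):
    """Map an ERP-style record onto the shared (hypothetical) warehouse schema."""
    return {
        "customer_id": int(row["KUNNR"]),   # strip leading zeros
        "name": row["NAME1"],
        "country": row["LAND1"].upper(),
        "source": "ERP",
    }


def from_crm(row):
    """Map a CRM-style record onto the same warehouse schema."""
    return {
        "customer_id": row["id"],
        "name": row["company"],
        "country": row["country"].upper(),  # harmonize country code casing
        "source": "CRM",
    }


# After the transformation, every row has the same, predictable structure.
warehouse = [from_erp(r) for r in ERP_CUSTOMERS] + [from_crm(r) for r in CRM_CUSTOMERS]
for row in warehouse:
    print(row)
```

Once all rows share one structure, downstream processing and reporting no longer need to know which system a record came from.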
BW/4HANA allows you to define a DW architecture via high-level building blocks, almost like Lego bricks. Out of this model, a set of tables, views and other relational objects is generated. BW/4HANA manages the lifecycle of those tables and views, e.g. when columns are added or removed. It also manages the relationships between the tables. For example, it asserts referential integrity, which is extremely beneficial for query processing as it avoids the use of outer joins, whose performance is, in general, far inferior to that of inner joins. BW/4HANA not only manages the lifecycle of tables, views etc. but also the lifecycle of the data sitting in those tables or being exposed by the views. Data typically enters a DW in its original format but then gets harmonized with data from other systems. For legal compliance and other reasons, it is usually important to track the data in the DW, i.e. how it “travels” from its entry in the DW to its exposure to the end users. In many cases, data remains for a certain period in an active (hot data) layer in the DW until it is moved to less expensive media outside of HANA, e.g. nearline storage (NLS) in IQ or Hadoop. Still, BW/4HANA provides online access to that data, albeit at a small performance penalty.
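The remark on referential integrity can be illustrated with a small Python sketch. The tables and helper functions below are hypothetical and only model the join semantics, not anything HANA or BW/4HANA actually executes. The point is that once every foreign key is guaranteed to have a match, an inner join returns exactly the rows a left outer join would, so the cheaper inner join can be used safely.

```python
# Hypothetical dimension table: customer id -> country.
CUSTOMERS = {1: "DE", 2: "US"}

# Hypothetical fact rows: (customer_id, revenue).
ORDERS_OK = [(1, 10), (2, 20), (1, 5)]
ORDERS_BROKEN = [(1, 10), (3, 99)]  # customer 3 has no dimension row


def inner_join(orders, customers):
    """Keep only fact rows that find a matching dimension row."""
    return [(cid, rev, customers[cid]) for cid, rev in orders if cid in customers]


def left_outer_join(orders, customers):
    """Keep every fact row; pad missing dimension values with None."""
    return [(cid, rev, customers.get(cid)) for cid, rev in orders]


def has_referential_integrity(orders, customers):
    """True if every foreign key in the fact rows exists in the dimension."""
    return all(cid in customers for cid, _ in orders)


for name, orders in [("ok", ORDERS_OK), ("broken", ORDERS_BROKEN)]:
    same = inner_join(orders, CUSTOMERS) == left_outer_join(orders, CUSTOMERS)
    print(name, has_referential_integrity(orders, CUSTOMERS), same)
```

With `ORDERS_BROKEN`, the inner join silently drops the unmatched row, which is why a system that cannot guarantee integrity has to fall back to outer joins.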
On top of its data management layer, BW/4HANA also provides an analytic layer with an analytic manager at its core. The latter is a unique asset as it differs from traditional OLAP engines in that it refrains from processing data itself (i.e. in ABAP) and simply compiles query execution graphs that are sent to HANA’s execution engines, mainly the calculation engine but also the SQL, OLAP, planning and other engines and libraries. Those engines return (partial) results which are then assembled within BW/4HANA’s analytic manager to form an overall query result. It is important to understand that typical analytic queries consist of a sequence of operations which cannot be arbitrarily reordered for optimization due to mathematical constraints. For example, currency values have to be converted to a single currency before the values are aggregated; swapping aggregation and currency conversion would yield incorrect results. In order to leverage HANA’s extremely fast aggregation power, currency conversion (and similarly unit conversion) logic has been brought into HANA.
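The currency example can be checked with a few lines of Python. This is only a sketch of the arithmetic, with made-up exchange rates and sales rows; it does not reflect how the analytic manager or HANA implement currency conversion.

```python
from collections import defaultdict

# Hypothetical exchange rates to EUR (illustrative values only).
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.5, "GBP": 2.0}

# Hypothetical fact rows: (region, amount, currency).
SALES = [
    ("North", 100.0, "USD"),
    ("North", 50.0, "EUR"),
    ("South", 10.0, "GBP"),
    ("South", 40.0, "USD"),
]


def convert_then_aggregate(rows, rates):
    """Correct order: convert each row to the target currency, then sum."""
    totals = defaultdict(float)
    for region, amount, currency in rows:
        totals[region] += amount * rates[currency]
    return dict(totals)


def aggregate_then_convert(rows, rates):
    """Wrong order: sum raw amounts per region, then apply a single rate
    (here the rate of the first currency seen in each region)."""
    sums = defaultdict(float)
    first_currency = {}
    for region, amount, currency in rows:
        sums[region] += amount
        first_currency.setdefault(region, currency)
    return {r: s * rates[first_currency[r]] for r, s in sums.items()}


correct = convert_then_aggregate(SALES, RATES_TO_EUR)
wrong = aggregate_then_convert(SALES, RATES_TO_EUR)
print(correct)  # {'North': 100.0, 'South': 40.0}
print(wrong)    # {'North': 75.0, 'South': 100.0}
```

Because the two orders of operation disagree, the conversion has to run before the aggregation, which is the reason for placing it in the same engine that does the aggregating.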
Many of the approaches mentioned above started within BW-on-HANA but are now extended within BW/4HANA, mainly by exploiting the fact that HANA is the only supported DBMS underneath. In the remainder, we will elaborate on this further.
Probably one of the most popular and widely recognized strengths of BW/4HANA is the range of features that allow BW/4HANA …
- to expose its data and a subset of the semantics on top (e.g. hierarchies, currency logic, fiscal logic, formulas) via HANA’s calculation views to a SQL tool or programmer,
- to incorporate SQL tables, views and SQLScript procedures seamlessly into a BW/4HANA-based DW architecture,
- to leverage any specialized library (e.g. AFLs, PAL) in batch or online processing.
So, it is possible to easily interact with any SQL environment, tool and approach. This is so popular that many BW-on-HANA customers (and this should be even more the case for BW/4HANA) have started to discard their SQL-oriented data warehouses in favor of using native SQL within BW-on-HANA or BW/4HANA. Bell Helicopter presented an example at the ASUG 2016 conference; see the figure below. They will deprecate their four Oracle-based data warehouses and move them into HANA. For more details see their slides or here.
Bell Helicopter’s plans as presented at ASUG 2016
Depending on how one counts, BW offers 10 to 15 different object types or building blocks (these are the “Lego bricks” mentioned above) for building a data warehouse. In BW/4HANA, there are only 4, which are at least as expressive and powerful as the previous 15; see the figure below. BW/4HANA’s building blocks are more versatile. Data models can now be built with fewer building blocks without compromising expressiveness. They will, therefore, be easier to maintain and thus more flexible and less error-prone. Existing models can be enhanced and adjusted and can thus be kept alive for a longer period that goes beyond an initial scope.
Another great asset of BW/4HANA is that it knows what type of data sits in which table. The usage and access pattern of each table is very well known to BW/4HANA. From that information, it can automatically derive which data needs to sit in the hot store (memory) and which data can be put into the warm store (disk or non-volatile RAM) to yield a more economical usage of the underlying hardware. This is unique to BW/4HANA compared to handcrafted data warehouses, which also require handcrafted, i.e. specifically (manually) implemented, data lifecycle management.
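As a rough illustration of deriving the storage tier from knowledge about each table, consider the following Python sketch. The metadata fields, the role names, the threshold and the rule itself are all assumptions made for this example; BW/4HANA's actual placement logic is not described here and is certainly more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    name: str           # hypothetical table name
    role: str           # hypothetical role, e.g. "reporting" or "staging"
    reads_per_day: int  # hypothetical access statistic


def choose_store(table, hot_read_threshold=100):
    """Toy rule: reporting tables and frequently read tables stay hot,
    everything else is placed in the warm store."""
    if table.role == "reporting" or table.reads_per_day >= hot_read_threshold:
        return "hot"
    return "warm"


TABLES = [
    TableInfo("sales_reporting", "reporting", 5000),
    TableInfo("sales_staging", "staging", 3),
    TableInfo("sales_history", "history", 250),
]

placement = {t.name: choose_store(t) for t in TABLES}
print(placement)
```

In a handcrafted warehouse, both the metadata and a rule like `choose_store` would have to be written and maintained by hand, which is the effort the paragraph above refers to.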
Object types in classic BW vs BW/4HANA
With the switch from BW or BW-on-HANA to BW/4HANA comes a shift away from legacy SAPGUI-based UIs for administrators, expert users and DW architects to modern UIs based on HANA Studio or Fiori-like, browser-based UIs. Currently, this shift has been accomplished for the main modeling UIs, and SAPGUI will still be necessary in the short term. This is not only about using modern technology and changing the visualization of existing UIs; there have also been significant changes in how to define and manage a DW architecture. Most prominently, there is a new data flow modeler (figure below; left-hand side) which visualizes the DW architecture in a very intuitive and user-friendly way, thereby moving away from the classic tree-based BW workbench (figure below; right-hand side).
Advances of UIs in BW/4HANA vs classic BW
BW/4HANA will be tightly integrated with SAP’s planned Big Data Hub tooling. This caters for the fact that traditional data warehouses are gradually being complemented with big data environments, which leads to an architecture of modern data warehouses; see the figure below. Typically, “data pipelines” (data movement processes that refine, combine, harmonize, transform, convert unstructured → structured etc.) span the various storage layers of such an environment. It will be possible to incorporate BW/4HANA’s process chains into such data pipelines to allow for an end-to-end view, scheduling, monitoring and overall management. BW/4HANA will leverage VORA as a bridge between HANA and HDFS, e.g. for accessing NLS data that might have been moved to HDFS, or for machine learning or transformation processes that involve (e.g. high-volume) data in HDFS.
An EDW in the context of a big data system landscape
This article has also been published on LinkedIn. You can follow me on Twitter via @tfxz.