SAP Products and the DW Quadrant

In a recent blog, I introduced the Data Warehousing Quadrant, a problem description for a data platform that is used for analytic purposes. Traditionally, such a platform is called a data warehouse (DW) but labels such as data mart, big data platform, data hub etc. are also used. In this blog, I will map some of the SAP products onto that quadrant, which will hopefully yield a more consistent picture of the SAP strategy.

To recap: the DW quadrant has two dimensions. One indicates the challenges regarding data volume, performance, query and loading throughput and the like. The other shows the complexity of the modeling on top of the data layer(s). A good proxy for that complexity is the number of tables, views, data sources, load processes, transformations etc. Big numbers indicate many dependencies between all those objects and, thus, high effort when things get changed, removed or added. But it is not only the effort: there is also a higher risk of accidentally changing, for example, the semantics of a KPI. Figure 1 shows the space outlined by the two dimensions. The space is then divided into four subcategories: the data marts, the very large data warehouses (VLDWs), the enterprise data warehouses (EDWs) and the big data warehouses (BDWs).

Figure 1: The DW quadrant.

Now, there are several SAP products that are relevant to the problem space outlined by the DW quadrant. Some observers (customers, analysts, partners, colleagues) would like SAP to provide a single answer or a single product for that problem space. Fundamentally, that answer is HANA. However, HANA is a modern RDBMS; a DW requires tooling on top. So, something more than just HANA is required. Figure 2 assigns SAP products / bundles to the respective subquadrants. The idea behind this is a “flexible rule of thumb” rather than a hard assignment. For example, BW/4HANA can play a role in more than just the EDW subquadrant. We will discuss this below. However, it becomes clear where the sweet spots or focus areas of the respective products lie.

Figure 2: SAP products assigned to subquadrants.

From a technical and architectural perspective, there are many relationships between those SAP products. For example, operational analytics in S/4 heavily leverages the BW embedded inside S/4. Another example is BW/4HANA’s ability to combine with any SQL object, like SQL-accessible tables, views, procedures / scripts. This allows smooth transitions or extensions of an existing system into one or the other direction of the quadrant. Figure 3 indicates such transitions and extension options:

  1. Data Mart → VLDW: This is probably the most straightforward path as HANA has all the capabilities for scale-up and scale-out to move along the performance dimension. All products listed in the data mart subquadrant can be extended using SQL based modeling.

  2. Data Mart → EDW: S/4 uses BW’s analytic engine to report on CDS objects. Similarly, BW/4HANA can consume CDS views either via the query or in many cases also for extraction purposes. Native HANA data marts combine with BW/4HANA similarly to the HANA SQL DW (see 3.).

  3. VLDW ⇆ EDW: Here again, I refer you to the blog describing how BW/4HANA can combine with native SQL. This allows BW/4HANA to be complemented with native SQL modeling and vice versa!

  4. VLDW or EDW → BDW: Modern data warehouses incorporate unstructured and semi-structured data that gets preprocessed in distributed file or NoSQL systems connected to a traditional, RDBMS-based (structured) data warehouse. The HANA platform and BW/4HANA will address such scenarios. Watch out for announcements around SAPPHIRE NOW 😀

Figure 3: Transition and extension options.

The possibility to evolve an existing system – located somewhere in the space of the DW quadrant – to address new and/or additional scenarios, i.e. to move along one or both dimensions, is an extremely important and valuable asset. Data warehouses do not remain static; they are permanently evolving. This means that investments are secure and so is the ROI.

This blog has also been published here. You can follow me on Twitter via @tfxz.


The Data Warehousing Quadrant

A good understanding or a good description of a problem is a prerequisite to finding a solution. This blog presents such a problem description, namely for a data platform that is used for analytic purposes. Traditionally, this is called a data warehouse (DW) but labels such as data mart, big data platform, data hub etc. are also used in this context. I’ve named this problem description the Data Warehousing Quadrant. An initial version has been shown in this blog. Since then, I’ve used it in many meetings with customers, partners, analysts, colleagues and students. It has the nice effect that it makes people think about their own data platform (problem) as they try to locate where they are and where they want to go. This is extremely helpful as it triggers the right dialog. Only if you work on the right questions will you find the right answers. Or, put the other way: if you start with the wrong questions – a situation that occurs far more often than you’d expect – then you are unlikely to find the right answers.

The Data Warehousing Quadrant (Fig. 1) has two problem dimensions that are independent from each other:

  1. Data Volume: This is a technical dimension which comprises all sorts of challenges caused by data volume and/or significant performance requirements such as: query performance, ETL or ELT performance, throughput, high number of users, huge data volumes, load balancing etc. This dimension is reflected on the vertical axis in fig. 1.

  2. Model Complexity: This reflects the challenges triggered by the semantics, the data models, and the transformation and load processes in the system. The more data sources are connected to the DW, the more data models, tables and processes exist. So, the number of tables, views and connected sources is probably a good proxy for the complexity of modeling inside the DW. Why is this complexity relevant? The lower it is, the less governance is required in the system. The more tables, models and processes there are, the more dependencies between all these objects exist and the more difficult it becomes to manage all those dependencies whenever something (like a column of a table) needs to be added, changed or removed. This is the day-to-day management of the “life” of a DW system. This dimension is reflected on the horizontal axis in fig. 1.
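To make the dependency-management point concrete, here is a minimal sketch of how an impact analysis over such dependencies could work. All object names are made up for illustration:

```python
from collections import deque

# Hypothetical dependency graph of DW objects: each key is consumed
# by the objects in its list (tables feed views, views feed KPIs).
DEPENDENTS = {
    "sales_orders":   ["v_orders_clean"],
    "customers":      ["v_orders_clean", "v_customer_360"],
    "v_orders_clean": ["v_revenue_kpi"],
    "v_customer_360": ["v_revenue_kpi"],
    "v_revenue_kpi":  [],
}

def impact_set(changed_object):
    """Return every object that is directly or transitively affected
    when `changed_object` is altered (e.g. a column is dropped)."""
    affected, queue = set(), deque([changed_object])
    while queue:
        for dep in DEPENDENTS.get(queue.popleft(), []):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected

print(sorted(impact_set("customers")))
# -> ['v_customer_360', 'v_orders_clean', 'v_revenue_kpi']
```

With a handful of objects this is trivial; with thousands of tables, views and load processes, exactly this kind of traversal is what governance tooling has to do for every change.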

Figure 1: The DW quadrant.

Now, these two dimensions create a space that can be divided into four (sub-) quadrants which we discuss in the following:

Bottom-Left: Data Marts

Here, the typical scenarios are, for example,

  • a departmental data mart, e.g. a marketing department sets up a small, maybe even open-source-based RDBMS system and creates a few tables that help to track a marketing campaign. Those tables hold data on customers that were approached, their reactions or answers to questionnaires, addresses etc. SQL or other views allow some basic evaluations. After a few weeks, the marketing campaign ends, hardly any or no data gets added and the data, the underlying tables and views slowly “die” as they are not used anymore. Probably one or two colleagues are sufficient to handle the system, both setting it up and creating the tables and views. They know the data model intimately, data volume is manageable and change management is hardly relevant as the data model is either simple (thus changes are simple) or has a limited lifespan (≈ the duration of the marketing campaign).

  • An operational data mart. This can also be the data that is managed via a certain operational application as you find them e.g. in an ERP, CRM or SRM system. Here, tables, data are given and data consistency is managed by the related application. There is no requirement to involve additional data from other sources as the nature of the analyses is limited to the data sitting in that system. Typically, data volumes and number of relevant tables are limited and do not constitute a real challenge.
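As an illustration of the first scenario: a departmental data mart can literally be a couple of tables plus a view. The following sketch uses SQLite with invented campaign tables and columns:

```python
import sqlite3

# Hypothetical marketing-campaign mart: two tables and one view are
# all it takes; names and columns are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE contacts  (id INTEGER PRIMARY KEY, name TEXT, channel TEXT);
    CREATE TABLE responses (contact_id INTEGER, answered INTEGER);
    CREATE VIEW v_response_rate AS
        SELECT c.channel,
               AVG(r.answered) AS response_rate
        FROM contacts c JOIN responses r ON r.contact_id = c.id
        GROUP BY c.channel;
""")
con.executemany("INSERT INTO contacts VALUES (?,?,?)",
                [(1, "Ada", "email"), (2, "Ben", "email"), (3, "Cyd", "phone")])
con.executemany("INSERT INTO responses VALUES (?,?)",
                [(1, 1), (2, 0), (3, 1)])

rates = list(con.execute("SELECT * FROM v_response_rate ORDER BY channel"))
print(rates)   # -> [('email', 0.5), ('phone', 1.0)]
```

One or two people can own something like this end to end, which is exactly why governance is hardly an issue at this corner of the quadrant.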

Top-Left: Very Large Data Warehouses (VLDWs)

Here, a typical situation is that there is a small number of business processes – each one supported by an operational RDBMS – with at least one of them producing huge amounts of data. Imagine the sales orders submitted via Amazon’s website: this article cites 426 items ordered per second on Cyber Monday in 2013. Now, the model complexity is comparatively low as only a few business processes, and thus tables (that describe those processes), are involved. However, the major challenges originate in the sheer volume of data produced by at least one of those processes. Consequently, topics such as DB partitioning, indexing, other tuning, scale-out and parallel processing are dominant, while managing the data models or their lifecycles is fairly straightforward.
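The pruning benefit of DB partitioning mentioned above can be sketched as follows. This is a toy in-memory illustration of the principle, not how any particular RDBMS implements it:

```python
from datetime import date

# Toy range partitioning by order month (hypothetical layout): a query
# that filters on the partitioning key only touches one partition.
partitions = {
    "2013-11": [("o1", date(2013, 11, 30), 99.0)],
    "2013-12": [("o2", date(2013, 12, 2), 25.0),
                ("o3", date(2013, 12, 2), 75.0)],
}

def orders_on(day):
    part = partitions.get(f"{day.year}-{day.month:02d}", [])  # prune first,
    return [o for o in part if o[1] == day]                   # then scan one partition

print(orders_on(date(2013, 12, 2)))  # scans only the 2013-12 partition
```

At Amazon-like volumes the same idea means that a query over one day of orders ignores years of history, which is why partitioning, indexing and scale-out dominate this subquadrant.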

Bottom-Right: Enterprise Data Warehouses (EDWs)

When we talk about enterprises, we look at a whole bunch of underlying business processes: financial, HR, CRM, supply chain, orders, deliveries, billing etc. Each of these processes is typically supported by some operational system which has a related DB in which it stores the data describing the ongoing activities within the respective process. There are natural dependencies and relationships between those processes – e.g. there has to be an order before something is delivered or billed – so it makes sense for business analysts to explore and analyse those business processes not only in an isolated way but also to look at those dependencies and overlaps. Everyone understands that orders might be hampered if the supply chain is not running well. In order to underline this with facts, the data from the supply chain and the order systems need to be related and combined to see the mutual impacts.

Data warehouses that cover a large set of business processes within an enterprise are therefore called enterprise data warehouses (EDWs). Their characteristic is the large set of data sources (reflecting the business processes) which, in turn, translates into a large number of (relational) tables. A lot of work is required to cleanse and harmonise the data in those tables. In addition, the dependencies between the business processes and their underlying data are reflected in the semantic modeling on top of those tables. Overall, a lot of knowledge and IP goes into building up an EDW. This makes it sometimes expensive but also extremely valuable.

An EDW does not remain static. It gets changed and adjusted, new sources get added, some models get refined. Changes in the day-to-day business – e.g. changes in a company’s org structure – translate into changes in the EDW. This, by the way, applies to the other DWs mentioned above, too. However, the lifecycle is more prominent with EDWs than in the other cases. In other words: here, the challenges of the model complexity dimension dominate the life of an EDW.

Top-Right: Big Data Warehouses (BDWs)

Finally, there is the top-right quadrant, which starts to become relevant with the advent of big data. Please be aware that “big data” refers not only to data volumes but also to incorporating types of data that have not been used that much so far. Examples are

  • videos + images,
  • free text from email or social networks,
  • complex log and sensor data.

This requires additional technologies, which are currently surging in the wider environment of Hadoop, Spark and the like. Those infrastructures are used to complement traditional DWs to form BDWs, aka modern data warehouses, aka big data hubs (BDHs). Basically, those BDWs see challenges from both dimensions, data volume and modeling complexity. The latter is augmented by the fact that models might span various processing and data layers, e.g. Hadoop + RDBMS.

How To Use The DW Quadrant?

Now, how can the DW quadrant help? I have introduced it to various customers and analysts and it made them think. They always start mapping their respective problems or perspectives to the space outlined by the quadrant. It is useful to explain and express a situation and potential plans of how to evolve a system. Here are two examples:

SAP addresses those two dimensions or the forces that push along those dimensions via various products, namely SAP HANA and VORA for the data volume and performance challenges, while BW/4HANA and tooling for BDH will help along the complexity. Obviously, the combination of those products is then well suited to address the cases of big data warehouses.

An additional aspect is that no system is static; it evolves over time. In terms of the DW quadrant, this means that you might start bottom-left as a data mart and then grow along one or the other or both dimensions. These dynamics can force you to change tooling and technologies. E.g. you might start as a data mart using an open source RDBMS (MySQL et al.) and Emacs (for editing SQL). Over time, data volumes grow – which might require switching to a more scalable and advanced commercial RDBMS product – and/or sources and models are added, which requires a development environment for models that has a repository, SQL-generating graphical editors etc. PowerDesigner or BW/4HANA are examples of the latter.

This blog can also be found on Linkedin. You can follow me on Twitter via @tfxz.

#BW4HANA and a SQL-Based DW Hand-in-Hand

This blog looks at one of BW/4HANA’s biggest strengths, namely that it embraces both (1) a guided or managed approach – using the highly integrated BW or BW/4 based tools and editors – and (2) a freestyle or SQL-oriented one – as prevalent in many handcrafted data warehouses (DWs) based on some relational database (RDBMS). And it is not restricted to running those approaches side-by-side! They can also be combined in many ways, which allows you to tap into the best of both worlds. For instance, data can be loaded into an arbitrary table using basic SQL capabilities, and that table can then be exposed to BW/4HANA as if it were an infoprovider that can be secured via BW/4HANA’s rich set of security features.

In fact, many SAP customers have one or more BW systems for (1) and one or more DW systems for (2). Those systems depend on each other as data is copied from one to the other so that each system can provide a coherent view of the data. Keeping such a system landscape in sync is not only a technical challenge. Often, separate IT teams own the respective systems. There exists a natural rivalry; they compete for resources, ownership, who has the better SLAs, whose requirements get precedence in situations that affect both teams or systems, and so on. Fig. 1 shows that situation.

Fig. 1: Typical customer landscape with a Business Warehouse (BW) and a SQL-based data warehouse side-by-side.

The reason for the organisational and technical separation shown in fig. 1 is typically that approaches (1) and (2) appear to be mutually exclusive and thus ought to be separated. This has become a common perception and practice. Now, and as mentioned above, BW/4HANA offers the possibility not only of a coexistence of (1) and (2) in one single system but also of synergetic combinations of (1) and (2) – see figure 2.

Fig. 2: BW/4HANA combines the best of both worlds in one and the same system.

Examples for synergies between (1) and (2) – the frequently cited mixed scenarios – have been documented in various presentations, webinars, blogs and the like, sometimes still in the context of BW-on-HANA but all of that is even more applicable now to BW/4HANA as the latter has seen a number of enhancements. Here is a non-exhaustive list of material:

In a simplified way, or as a summary, there are the following options:

  1. SQL → BW/4HANA: Any SQL-consumable table or view can be incorporated into BW/4HANA, e.g. augmented by BW/4HANA based semantics (like currency logic) or infrastructure (like BW/4HANA defined security).
  2. BW/4HANA → SQL: Most of the BW/4HANA based data objects (i.e. infoproviders but also BW queries) can be exposed as SQL-consumable views, potentially with a loss of some semantics.
  3. BW/4HANA ⇄ SQL: There are a number of “exit options” that allow adding SQL, SQLScript, R or any other HANA-supported code to BW/4HANA processing. The most popular place is the HANA Analysis Process (HAP) in BW/4HANA.

This blog can also be found on SCN and on SAP HANA. You can follow me on Twitter via @tfxz.

Native DSO in #HANADW

There is an excellent series of short videos that introduce the native data store object (NDSO) for HANA. The NDSO can be considered a more intelligent table that, in particular, allows deltas to be captured. This is especially useful when data is regularly loaded to be transformed or cleansed afterwards: rather than going through the complete data set in the table, one can focus on the changes since the last transformation or cleansing happened. This reduces the amount of data that needs to be processed and, thus, increases the throughput / performance of the process. Frequently, the effect is significant. The DSO idea originated in SAP’s Business Warehouse (BW) and has evolved into the more versatile and powerful advanced DSO (ADSO) in BW/4HANA.
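The delta-capture idea can be sketched as follows. This is only a toy illustration of the principle – an active table plus a change log – and not the actual NDSO implementation or API:

```python
import sqlite3

# Sketch of the delta idea: keep an "active" table plus a change log,
# so downstream steps only process rows changed since the last load.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE active     (key TEXT PRIMARY KEY, amount REAL);
    CREATE TABLE change_log (key TEXT, amount REAL, load_id INTEGER);
""")

def load(rows, load_id):
    for key, amount in rows:
        old = con.execute("SELECT amount FROM active WHERE key=?", (key,)).fetchone()
        if old is None or old[0] != amount:  # only real changes produce a delta
            con.execute("INSERT OR REPLACE INTO active VALUES (?,?)", (key, amount))
            con.execute("INSERT INTO change_log VALUES (?,?,?)", (key, amount, load_id))

load([("A", 10.0), ("B", 20.0)], load_id=1)
load([("A", 10.0), ("B", 25.0), ("C", 5.0)], load_id=2)   # A is unchanged

delta = con.execute("SELECT key FROM change_log WHERE load_id=2 ORDER BY key").fetchall()
print([k for (k,) in delta])   # -> ['B', 'C']: only these need downstream processing
```

The second load touched three rows, but the downstream transformation only has to look at two; with large tables and small daily changes, that ratio is where the performance gain comes from.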

Here are 4 videos as an introduction to the NDSO:

There are more videos on the HANA DWF features in this list.

You can follow me on Twitter via @tfxz.

HANA Data Warehousing: The #HANADW

With this blog, I’d like to shed some light on the direction that SAP is taking towards a unified offering for building a data warehouse on top of HANA. The unofficial working title is the HANA DW. I’ve divided the blog into 3 sections, each addressing the most pressing questions that I’ve received from customers who have already seen flavours of this.

The Vision for Data Warehousing on HANA: the HANA DW.

As outlined in my blog Data Warehousing on HANA: Managed or Freestyle? BW or Native?, there are two approaches (preferences) for building a DW, not only on HANA but in general:

  1. SQL-based: Meaning that the DW architects use SQL as the main building paradigm which gives them a lot of freedom but also bears the risk that too much diversity jeopardises the lifecycle of the DW as it becomes increasingly complex to manage dependencies (e.g. impact of changes) and integration (e.g. same entities – like products, customers – represented in different ways, using different data types etc).
  2. Managed via best practices: Here, high-value building blocks (like data flows, transformations, hierarchies, BW’s DSOs, BW’s data requests but also naming conventions) are used to construct and manage the DW. This is a faster way as long as the building blocks serve the need. It gets cumbersome whenever there is a scenario that requires deviating from the standard path offered via the building blocks.

In recent years, BW-on-HANA has offered approach #2, extended and combined with #1 – the so-called mixed scenario. A tangible example is described here. Many customers have adopted such a mixed approach; in fact, it has become the mainstream for BW-on-HANA.

The HANA DW takes a similar direction but starts with #1 and complements with #2 which, in the end, yields the same result. It goes along the following notion:

  • Start with a naked HANA DB that offers all sorts of SQL capabilities that you need. Fundamentally, you can now write your SQL code in Notepad, Emacs, VI etc, store that SQL code in files and execute them in HANA either manually or via generic tools like cron.
  • Now, writing SQL code from scratch in a text editor is cumbersome, even if there is some syntax highlighting or automatic syntax completion. Most people acquire tools that allow them to graphically model / design / create stuff to generate the underlying SQL statements.
  • Whichever method you use to get to the SQL statements, there will be the need to maintain them. Scenarios get extended or adjusted. This translates into changes on the SQL level. For purposes like auditing, or simply to have the option to return to an earlier setup, it is good practice to track the evolution (i.e. the changes) and to keep the versions of those (SQL or higher-level) artifacts. This is no different from other kinds of programming environments, and one can borrow infrastructure from there, like Git. The latter and services related to it are (or will be) offered by the HANA platform. They constitute a repository.
    There are two more tasks that the repository should support:

    • managing the dependencies between the objects (e.g. a transformation using certain stored procedures which, in turn, use certain tables), and
    • the release management of those (SQL or higher-level) artifacts, e.g. to allow them being developed and tested in one system w/o jeopardising the production system.
  • Finally, there are certain recurring patterns of SQL: things that you need to do over and over again. Examples are tracking incoming data (e.g. via something like the data request in BW), how to derive data changes (like in a DSO), how to store hierarchies etc. Such “patterns” basically translate into higher-level (abstract) artifacts that are created and maintained at the abstraction level to then be translated into a series of SQL statements.
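As an illustration of the first of these recurring patterns – tracking incoming data via something like BW’s data request – here is a minimal sketch with invented names:

```python
# Sketch of the "data request" pattern: every load gets a request id,
# so downstream consumers can fetch exactly the requests they have
# not yet processed. All names here are illustrative only.
requests = []          # (request_id, rows) in arrival order
consumed_up_to = 0     # bookmark of the downstream transformation

def new_request(rows):
    requests.append((len(requests) + 1, rows))

def unprocessed():
    """Return all requests newer than the bookmark, then advance it."""
    global consumed_up_to
    fresh = [rows for rid, rows in requests if rid > consumed_up_to]
    consumed_up_to = len(requests)
    return fresh

new_request([("A", 1)])
new_request([("B", 2)])
first = unprocessed()    # both requests on the first run
new_request([("C", 3)])
second = unprocessed()   # only the new request afterwards
print(first, second)
```

In a real system the bookmark would of course be persisted per consumer; the point is that such a pattern is generic enough to be offered as a higher-level artifact rather than re-implemented in hand-written SQL each time.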

The HANA DW will support this process in the following way; figure 1 below visualises this:

  • The HANA DB provides all the SQL functionality you need.
  • The HANA platform will provide the development infrastructure, especially to support a repository and related services.
  • Tooling on top will create either direct (HANA) SQL* or higher-level artifacts that translate into (HANA) SQL*.
  • Those tools will keep their artifacts in the HANA repository, allowing to support the complete lifecycle incl. auditing, versioning, dependency analysis (especially also between artifacts maintained by different tools).
  • Tools constitute optional added value that you can use but that you don’t have to use. Consider BW-on-HANA as such a tool too.

It is planned to bring the currently existing SAP products related to data warehousing into this HANA DW setup. This will allow SQL-based data warehousing (1.) enriched via higher-level / higher-purpose artifacts (2.). The second pillar in figure 2 describes that evolution. The third pillar indicates that tooling will evolve, potentially into a series of apps or services that can also manage a cloud-based DW.

Figure 1: The vision for the HANA DW.

Figure 2: Short-, mid- and long-term evolution of the HANA DW.

The Role of BW-on-HANA.

From the above, it should have become obvious that BW-on-HANA will form an important, but optional, part of the HANA DW. If it is convenient for the purpose of the DW, then it should be used or added to a HANA DW. Another potential scenario is that an existing BW-on-HANA will gradually evolve into a HANA DW as it is complemented with other tooling in the fashion described above. The border line will be blurry. In any case, BW-on-HANA will extend and enhance its existing functionality, enabling more and more direct SQL access + options and leveraging / interacting with the HANA repository. A stand-alone BW-on-HANA system, as it exists today, can be considered a special instance of a HANA DW. It will continue to exist, evolve, excel. Anyone investing into BW-on-HANA today is on a safe track.

The Role of HANA Vora and Hadoop: the HANA Big DW.

Many customers are looking at ways to complement existing data warehouses with Hadoop. HANA Vora will play a pivotal role in combining the HANA and Hadoop platforms. Therefore, HANA Vora will allow the HANA DW to be extended into a HANA Big DW (current working title). We will elaborate on that at a later stage.

* Please consider HANA SQL here as a placeholder comprising all sorts of more specialised languages and extensions like MDX, SQLScript, calc engine expressions etc.

This post is also available on Linkedin Pulse, SCN and Startup.Focus. You can follow me on Twitter via @tfxz.

From Data To Intelligence, From Hadoop To DBMS

For many years, it has been best practice to use well-structured data (typically originating in an operational RDBMS) with well-defined and well-known semantics (derived from the application feeding that operational RDBMS) for analysing business and other processes. Those analyses convert data into intelligence, meaning knowledge that allows better, more educated decisions to be taken.

Knowledge is the final result, i.e. answers to certain questions. Finding those questions constitutes the step prior to gaining insight. Given the right set of well-defined questions (or queries), it is typically straightforward to find the answers (or compute the query results). So, business analysts have been striving for a while to find the “golden queries”, i.e. the right questions to ask. This has led to approaches like data exploration, outlier analysis, data mining, machine learning etc. Fundamentally, they are expected to provide methods to find good questions that lead to useful results (insights). Nowadays, finding good questions is considered a fine art, and data scientists are supposed to be at the forefront of that effort.
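As a tiny illustration of how outlier analysis can surface a question worth asking (why did one day’s revenue spike?), here is a simple z-score scan over made-up numbers:

```python
import statistics

# Hypothetical daily revenue figures; day 3 hides a question worth asking.
daily_revenue = [100.0, 104.0, 98.0, 250.0, 101.0, 97.0, 103.0]

mean = statistics.mean(daily_revenue)
sd = statistics.stdev(daily_revenue)

# Flag days more than two standard deviations away from the mean.
outliers = [(day, value) for day, value in enumerate(daily_revenue)
            if abs(value - mean) / sd > 2]

print(outliers)   # -> [(3, 250.0)]: the spike on day 3 stands out
```

The method does not answer anything by itself; it merely points the analyst at “what happened on day 3?” – which is exactly the question-finding step described above.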

It is important to understand how knowledge and intelligence are acquired in order to understand how technical architectures that support that process need to look. For many years, we have built OLTP, OLAP and data warehouse systems to collect and analyse data. Those systems continue to be relevant. However, they need to be complemented by new infrastructures like Hadoop that can cater for data that has no clear structure, initially no obvious value or purpose, little semantics and that, above all, frequently comes in huge volumes. However, while we have more or less learned to manage the data volumes, it is still necessary to tackle the many unknowns in big data. Forrester therefore states:

Of Gartner’s “3Vs” of big data (volume, velocity, variety), the variety of data sources is seen by our clients as both the greatest challenge and the greatest opportunity.

In fact, this also explains why a file system like HDFS, incl. processing infrastructures like MapReduce or Spark, is so suitable: it can address that variety better than a traditional RDBMS.

Data lifecycle by combining HANA and Hadoop.

At SAP, we are currently working on ways to integrate Hadoop not only technically (e.g. access Hadoop from SQL or allow Hadoop to access HANA) but to provide and define ways for our customers to look at Hadoop (and “big data”) as an extension of their existing data management setups. This requires understanding how customers work with data: from finding areas where they expect valuable questions, to finding those questions (queries), to then deriving the results.

PS: On Sep 29, 2015, I gave a 5-min presentation on this topic using the slide shown below. It was presented at the HPTS 2015 workshop.

From data to intelligence.

What’s the Difference Between a Classic #SAPBW and #BWonHANA?

Fig. 1: High level comparison between a classic BW and the two versions of BW-on-HANA. A PPT version of this pic can be found here.

This is yet another question that I get from all angles: partners, customers, but even colleagues. BW has been the spearhead SAP application to run on HANA. Actually, it is also one of the top drivers of HANA revenue. We’ve created the picture in figure 1 to describe – on a high level – what has happened. I believe that this not only tells a story about BW’s evolution but underlines the overall HANA strategy of becoming not only a super-fast DBMS but an overall compelling and powerful platform.

Classic BW

Classic BW (7.3ff) follows the classic architecture with a central DBMS server and one or more application servers attached. The latter communicate with the DBMS in SQL via the DBSL layer. Features and functions of BW – the red boxes in the left-most picture of fig. 1 – are (mostly) implemented in ABAP on the application server.

BW 7.3 on HANA

At SAPPHIRE Madrid in November 2011, BW 7.3 was the first version to be released on HANA as a DBMS. There, the focus was (a) to enable HANA as a DBMS underneath BW and (b) to provide a few dedicated and extremely valuable performance improvements by pushing the run-time (!) of certain BW features down to the HANA server. The latter is shown in the centre of fig. 1 by moving some of the red boxes from the application server into the HANA server. As the BW features and functions are still parameterised, defined and orchestrated from within the BW code in the application server, they are still represented as striped boxes in the application server. Actually, customers and their users do not notice a difference in usage other than better performance. Examples are: faster query processing, planning performance (PAK), DSO activation. Frequently, these features have been implemented in HANA using specialised HANA engines (most prominently the calculation and planning engines) or libraries that go well beyond a SQL scope. The latter are core components of the HANA platform and are accessed via proprietary, optimised protocols.

BW 7.4 on HANA

The next step in the evolution of BW has been the 7.4 release on HANA. Beyond additional functions being pushed down into HANA, there have been a number of features (pictured as dark blue boxes in fig. 1) that extend the classic BW scope and allow doing things that were not possible before. Examples are the HANA analysis process (e.g. using PAL or R) and the reworked modeling environment with new Eclipse-based UIs that integrate smoothly with (native) HANA modeling UIs and concepts, leading also to a reduced set of infoprovider types that are necessary to create the data warehouse. Especially the latter have triggered comments like

  • “This is not BW.”
  • “Unbelievable but BW has been completely renewed.”
  • “7.4 doesn’t do justice to the product! You should have given it a different name!”

It is especially those dark blue boxes that surprise many, both inside and outside SAP. They are the essence that makes dual approaches, like within the HANA EDW, possible, which, in turn, leads to a simplified environment for the customer.

This blog has been cross-published here. You can follow me on Twitter via @tfxz.