SAP Products and the DW Quadrant

In a recent blog, I introduced the Data Warehousing Quadrant, a problem description for a data platform that is used for analytic purposes. Such a platform is called a data warehouse (DW), but labels such as data mart, big data platform, data hub etc. are also used. In this blog, I will map some of the SAP products into that quadrant, which will hopefully yield a more consistent picture of the SAP strategy.

To recap: the DW quadrant has two dimensions. One indicates the challenges regarding data volume, performance, query and loading throughput and the like. The other one shows the complexity of the modeling on top of the data layer(s). A good proxy for the complexity is the number of tables, views, data sources, load processes, transformations etc. Big numbers indicate many dependencies between all those objects and, thus, high effort when things get changed, removed or added. But it is not only the effort: there is also a higher risk of accidentally changing, for example, the semantics of a KPI. Figure 1 shows the space outlined by the two dimensions. The space is divided into four subcategories: the data marts, the very large data warehouses (VLDWs), the enterprise data warehouses (EDWs) and the big data warehouses (BDWs).

Figure 1: The DW quadrant.
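To make the two dimensions a bit more tangible, here is a minimal sketch in Python that places a system into one of the four subquadrants. The thresholds, the class name and the field names are assumptions made up for illustration; the quadrant itself does not define hard numbers, and the mapping of subquadrants to the two dimensions is inferred from the transitions discussed further below.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration; the blog gives no numbers.
VOLUME_THRESHOLD_TB = 10.0      # "big" along the volume / performance dimension
OBJECT_COUNT_THRESHOLD = 1000   # "big" along the modeling-complexity dimension


@dataclass
class DataPlatform:
    name: str
    volume_tb: float    # rough proxy for volume, performance and throughput challenges
    object_count: int   # tables, views, data sources, load processes, transformations


def subquadrant(p: DataPlatform) -> str:
    """Return the DW subquadrant for a platform, based on the two proxies."""
    high_volume = p.volume_tb >= VOLUME_THRESHOLD_TB
    high_complexity = p.object_count >= OBJECT_COUNT_THRESHOLD
    if high_volume and high_complexity:
        return "BDW"
    if high_volume:
        return "VLDW"
    if high_complexity:
        return "EDW"
    return "data mart"


if __name__ == "__main__":
    for platform in (
        DataPlatform("sales mart", volume_tb=0.5, object_count=40),
        DataPlatform("corporate DW", volume_tb=3.0, object_count=5000),
    ):
        print(platform.name, "->", subquadrant(platform))
```

In practice the boundaries are fluid, which is why the assignment of products to subquadrants should be read as a rule of thumb rather than a hard classification.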

Now, there are several SAP products that are relevant to the problem space outlined by the DW quadrant. Some observers (customers, analysts, partners, colleagues) would like SAP to provide a single answer or a single product for that problem space. Fundamentally, that answer is HANA. However, HANA is a modern RDBMS, and a DW requires tooling on top, so more is required than just HANA. Figure 2 assigns SAP products / bundles to the respective subquadrants. The assignment is meant as a “flexible rule of thumb” rather than a hard mapping. For example, BW/4HANA can play a role in more than just the EDW subquadrant, as discussed below. However, it becomes clear where the sweet spots or focus areas of the respective products are.

Figure 2: SAP products assigned to subquadrants.

From a technical and architectural perspective, there are many relationships between those SAP products. For example, operational analytics in S/4 heavily leverages the BW embedded inside S/4. Another example is BW/4HANA’s ability to combine with any SQL object, like SQL-accessible tables, views, procedures / scripts. This allows smooth transitions or extensions of an existing system into one or the other direction of the quadrant. Figure 3 indicates such transitions and extension options:

  1. Data Mart → VLDW: This is probably the most straightforward path as HANA has all the capabilities for scale-up and scale-out to move along the performance dimension. All products listed in the data mart subquadrant can be extended using SQL based modeling.

  2. Data Mart → EDW: S/4 uses BW’s analytic engine to report on CDS objects. Similarly, BW/4HANA can consume CDS views either via the query or in many cases also for extraction purposes. Native HANA data marts combine with BW/4HANA similarly to the HANA SQL DW (see 3.).

  3. VLDW ⇆ EDW: Here again, I refer you to the blog describing how BW/4HANA can combine with native SQL. This allows BW/4HANA to be complemented with native SQL modeling and vice versa!

  4. VLDW or EDW → BDW: Modern data warehouses incorporate unstructured and semi-structured data that gets preprocessed in distributed file or NoSQL systems that are connected to a traditional (structured), RDBMS based data warehouse. The HANA platform and BW/4HANA will address such scenarios. Watch out for announcements around SAPPHIRE NOW 😀

Figure 3: Transition and extension options.

The possibility to evolve an existing system – located somewhere in the space of the DW quadrant – to address new and/or additional scenarios, i.e. to move along one or both dimensions, is an extremely important and valuable asset. Data warehouses do not remain static; they are permanently evolving. This means that investments are secure and so is the ROI.

This blog has also been published here. You can follow me on Twitter via @tfxz.

S/4HANA and #BWonHANA

For many years, the absence of powerful analytic modeling and processing within SAP’s R/3 and Business Suite applications led customers to install an instance of Business Warehouse (BW) next to such a system. Essentially, BW closed the analytic gap of those solutions. To that end, data was loaded from R/3 or the Business Suite into BW. The price for this workaround was maintaining those load processes, running two systems and accepting the time lag between the data being created in R/3 or Business Suite and the data being visible in BW reporting. Now that operational analytics is possible within S/4HANA, this workaround has become obsolete. This is sometimes misperceived as BW becoming obsolete. Let’s have a look into the situation to understand what is dead and what is alive.

In my recent blog on S/4HANA and Data Warehousing, I’ve made the point that data warehouses – in general – are still necessary, even with the advent of operational analytics in S/4HANA. Here, I would like to tackle the topic from BW’s point of view. To that end, it is important to understand that BW can be deployed in two ways; fig. 1 shows this visually:

  • embedded: into any application that runs on Netweaver like SAP CRM, SCM, HR, FIN etc, and
  • stand-alone: as an enterprise data warehouse (EDW) that allows data originating from various, disconnected systems to be harmonised so that it can be analysed in a consistent way. This provides an understanding across the many disparate business processes that exist in an enterprise.

Figure 1: Deployment options for BW.

So in both deployment options, BW exists and can be used for operational or cross-system analytics. This leads to 4 potential situations, also depicted in fig. 2:

  1. Embedded, used for operational analytics
    • BW reuses application’s storage, semantics, security.
    • No redundancy.
  2. Stand-alone, used for operational analytics
    • Data, semantics and security are copied to BW as a workaround as no analytics options are available in the app.
    • This is the case that should be replaced as it is sufficient but not necessary.
    • It is not ideal but a workaround to fill a gap within the app.
  3. Embedded, used for cross-system analytics
    • BW within Netweaver (on which S/4HANA, SAP Business Suite, SAP CRM etc are built) is used as a DW.
    • Technically this is possible.
    • However, it is currently not recommended due to concerns around workload + governance around the system.
  4. Stand-alone, used for cross-system analytics
    • BW as a stand-alone data warehouse.
    • This is the most frequent deployment of BW.

Figure 2: Theoretical combinations of BW use cases and BW deployment options.

Case 2. is solved in S/4HANA by replacing it with case 1., which is not a prominent fact but a fact nonetheless. This frequently leads to the misunderstanding that “BW is obsolete”, which should actually read that case 2. is mostly obsolete (in the context of S/4HANA). Furthermore, case 4. continues to be valid, as is any data warehouse approach that blends S/4HANA data with data from other systems for deeper insight into what is going on in the enterprise. The advent of IoT scenarios makes this even more imperative than before.

This blog has been cross-published here. You can follow me on Twitter via @tfxz.

PS: The figures can be found in this slidedeck.

PPS: There is an excellent real-world example for case 1. in the blog How Nucor Simplified Their BW Landscape By Using An Embedded BW, maybe with a certain overlap with case 3.

S/4HANA and Data Warehousing

One of the promises of S/4HANA is that analytics is integrated into the [S/4HANA] applications to bring analyses (insights) and the potentially resulting actions closely together. The HANA technology provides the prerequisites as it allows “OLTP and OLAP workloads” to be handled easily. The latter is sometimes translated into a statement that data warehouses would become obsolete in the light of S/4HANA. However, the actual translation should read “I don’t have to offload data anymore from my application into a data warehouse in order to analyse that data in an operational (isolated) context.”. The fundamental thing here is that analytics is not restricted to pure operational analytics. This blog elaborates that difference.

To put it simply: a business application manages a business process. Just take the Amazon website: it’s an application that handles Amazon’s order process. It allows orders to be created, changed and read. Those orders are stored in a database. A complex business (i.e. an enterprise) has many such business processes, thus many apps that support those processes. Even though some apps share a database – like in SAP’s Business Suite or S/4HANA – there are usually multiple databases involved in running a modern enterprise:

  • Simply take a company’s email server which is part of a communications process. The emails, the address book, the traffic logs etc sit in a database and constitute valuable data for analysis.
  • Take a company’s webserver: it’s a simple app that manages access to information of products, services and other company assets. The clickstream tracked in log files constitutes a form of (non-transactional) database.
  • Cash points (till, check-outs) in a retail or grocery store form part of the billing process and write to the billing database.
  • Some business processes incorporate data from 3rd parties like partners, suppliers or market research companies meaning that their databases get incorporated too.

The list can be easily extended when considering traditional processes (order, shipping, billing, logistics, …) and all the big data scenarios that arise on a daily basis; see here for a sample. The latter add to the list of new, additional databases and, thus, potential data sources to be analysed. From all of that, it becomes obvious that not all of those applications will be hosted within S/4HANA. It is even unlikely that all the underlying data is physically stored within one single database. It is quite probable that it needs to be brought either physically or, at least, logically to one single place in order to be analysed. That single place hosts the analytic processing environment, i.e. some engines that apply semantics to the data.

Now, whatever the processing environment is (HANA, Hadoop, Exadata, BLU, Watson, …) and whatever technical power it provides, there is one fundamental fact: if the data to be processed is not consistent, meaning harmonised and clean, then the results of the analyses will be poor. “Garbage in – garbage out” applies here. Even if all originating data sources are consistent and clean, the union of their data is unlikely to be consistent. It starts with non-matching material codes, country IDs or customer numbers, stretches to noisy sensor data and goes up to DB clocks (whose values are materialised in timestamps) that are not in sync – simply look at Google’s efforts to tackle that problem.

In summary: while analytics in S/4HANA is operational, there are two facts that make non-operational (i.e. beyond a single, isolated business process) and strategic analyses challenging:

  1. It is likely that enterprise data sits in more than 1 system.
  2. Data that originates from various systems is probably not clean and consistent when being combined.

A popular choice to tackle that challenge is a data warehouse. It has the fundamental task to expose the enterprise data in a harmonised and consistent way (“single version of the truth”). This can be done by physically copying data into a single DB to then transform, cleanse, harmonise the data there. It can also be done by exposing data in a logical way via views that comprise code to transform, cleanse, harmonise the data (federation). Both approaches do the same thing, simply at different moments in time: before or during query execution. But both approaches do cleanse and harmonise. There is no way around it. So, either physical or logical data warehousing is a task that does not go away. Operational analytics in S/4HANA cannot and does not intend to replace the strategic, multi-system analytics of a physical or logical data warehouse. This should not be obscured by the fact that they can leverage the same technical assets, e.g. HANA.
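The difference between the physical and the logical approach can be illustrated with a small, product-neutral sketch. It uses Python’s built-in sqlite3 module purely as a stand-in for any SQL database; the two source tables, the country mapping and all names are hypothetical. The same harmonisation logic is applied once as a copy (physical) and once as a view (logical).

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Two hypothetical source systems that encode countries differently.
cur.execute("CREATE TABLE crm_orders (customer TEXT, country TEXT, amount REAL)")
cur.execute("CREATE TABLE shop_orders (customer TEXT, country TEXT, amount REAL)")
cur.executemany("INSERT INTO crm_orders VALUES (?, ?, ?)",
                [("C1", "DE", 100.0), ("C2", "FR", 50.0)])
cur.executemany("INSERT INTO shop_orders VALUES (?, ?, ?)",
                [("C1", "GER", 25.0), ("C3", "FRA", 10.0)])

# Harmonisation rule (assumed for this example): map every source code to one standard code.
cur.execute("CREATE TABLE country_map (source_code TEXT PRIMARY KEY, iso2 TEXT)")
cur.executemany("INSERT INTO country_map VALUES (?, ?)",
                [("DE", "DE"), ("FR", "FR"), ("GER", "DE"), ("FRA", "FR")])

HARMONISE = """
    SELECT o.customer, m.iso2 AS country, o.amount
    FROM (SELECT * FROM crm_orders UNION ALL SELECT * FROM shop_orders) AS o
    JOIN country_map AS m ON m.source_code = o.country
"""

# Physical DW: harmonise once, at load time, and store the result.
cur.execute(f"CREATE TABLE dw_orders AS {HARMONISE}")
# Logical DW: harmonise at query time via a view (federation).
cur.execute(f"CREATE VIEW v_orders AS {HARMONISE}")

REPORT = "SELECT country, SUM(amount) FROM {src} GROUP BY country ORDER BY country"
physical = cur.execute(REPORT.format(src="dw_orders")).fetchall()
logical_before = cur.execute(REPORT.format(src="v_orders")).fetchall()

# A new order arrives in a source system after the load.
cur.execute("INSERT INTO shop_orders VALUES ('C4', 'GER', 5.0)")
physical_after = cur.execute(REPORT.format(src="dw_orders")).fetchall()
logical_after = cur.execute(REPORT.format(src="v_orders")).fetchall()

print("physical:", physical_after)  # unchanged until the next load
print("logical: ", logical_after)   # already reflects the new order
```

Both variants need the mapping table and the join; only the moment at which the work is done differs. The copy serves reports without touching the sources again but lags behind until the next load, whereas the view is always current but depends on the sources at query time.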

On purpose, this blog has been neutral to the underlying product or approach used for data warehousing. This avoids mixing up technical product features with general tasks. In a subsequent blog, I will tackle the topic of the relationship between S/4HANA and BW-on-HANA.

This blog has been cross-published here. You can follow me on Twitter via @tfxz.

Business Warehouse by SAP seems very popular – #SAPBW

This is an English summary for an article published in Computerwoche and CIO.de. I’ve borrowed that summary from a news syndication email as the original article is in German:

Computerwoche’s annual survey among German system integrator’s customers concludes that SAP’s ERP solutions continue to dominate the market. In the scope of the survey, customers evaluated 570 IT software application projects and rated their satisfaction with the work of their house system. NetWeaver Business Intelligence (SAP BI), business warehouse solution by SAP, scored very high among the customers. Also, customers revealed plans to introduce SAP’s cloud offering Business by Design. In addition to ERP installations, customer relationship management (CRM), document management systems (DMS) and content management system solutions (CMS) are currently in demand […]

10 Years of “No Aggregates” in #SAPBW

Fig. 1: “High Performance BI” (aka BWA) as explained by Shai Agassi in his keynote at Teched San Diego in 2004.

Cleaning up the hard drive of my laptop this week, I came across a number of documents that reminded me that in spring 2004 we – i.e. the BW and TREX development teams – had the first version of the BWA up and running internally. Back then, it was still called project Euclid. Later that year, Euclid was presented publicly at Techeds in San Diego and Munich – see also Fig. 1. Euclid paved the way to SAP’s first commercial in-memory product, the BW Accelerator (BWA). The BWA was first shipped with NetWeaver 2004s (i.e. BW 7.0). It removed the need to define and maintain aggregates, i.e. materialized aggregations (group-by’s and filters). At that time, BW’s change run – the process that adjusted aggregates to master data changes, like changes in attributes and hierarchies – was one of the top-3 critical processes in almost every customer installation.
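For readers who have never had to deal with aggregates, the following toy sketch in Python shows why they need maintenance at all. It is not BW code and not how BW implements aggregates or the change run; the data, the names and the function are made up for illustration. An aggregate is a stored group-by result; when a master data attribute changes, the stored result becomes stale until it is rebuilt, whereas aggregating on the fly always reflects the current master data.

```python
from collections import defaultdict

# Fact rows: (customer, revenue). Hypothetical sample data.
facts = [("C1", 100), ("C2", 50), ("C3", 70)]

# Master data: the attribute "region" of each customer.
customer_region = {"C1": "North", "C2": "North", "C3": "South"}


def aggregate_by_region(facts, customer_region):
    """Group-by over the facts using the current master data."""
    totals = defaultdict(int)
    for customer, revenue in facts:
        totals[customer_region[customer]] += revenue
    return dict(totals)


# Materialise the aggregate once, as a classic aggregate would be.
aggregate = aggregate_by_region(facts, customer_region)

# Master data change: customer C2 moves to another region.
customer_region["C2"] = "South"

stale_aggregate = dict(aggregate)                           # still the old grouping
on_the_fly = aggregate_by_region(facts, customer_region)    # current grouping

# The toy equivalent of a change run: rebuild the aggregate after the change.
aggregate = aggregate_by_region(facts, customer_region)

print("stale:     ", stale_aggregate)
print("on the fly:", on_the_fly)
```

With a fast enough engine, the on-the-fly variant is all that is needed, which is what makes the rebuild step, and with it the whole maintenance process, disappear.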

One of the first customers adopting BWA simply upgraded from BW 3.5 to BW 7.0, leaving everything in place but adding BWA to their BW system. Thus, nothing changed for the end user except for the power of BWA. They reported the following effects:

  • more than 90% of queries on indexed cubes ran faster than before
  • one third of queries saw response time cuts by more than half
  • the average query runtime was cut by two thirds

Those impressive improvements in query performance translated into a better usage of their BW system. Adding BWA was the only change in the system. Thus the improved usage could be attributed to the improved query run times:

  • the number of query runs per week increased by over 60%
  • over two thirds of the queries saw an increased usage

Fig. 2: Number of support tickets related to BW aggregates. Number of tickets in 2006 = 100%.

Another episode involved a European customer who ordered a BWA instance for a proof-of-concept and then did not allow the hardware vendor to remove it, as they wanted to keep and use it immediately. Since then, 1000+ BWAs have been installed. Nowadays, BW-on-HANA basically runs a “BWA within”, providing even better query performance without aggregates and removing the need to synchronize a standard RDBMS and a BWA, as “all (RDBMS and BWA) is one”, namely HANA.

Aggregates usage has significantly declined over the years despite significant growth rates of around 15-20% for BW installations per year. Fig. 2 shows how the number of aggregates-related support tickets raised by BW customers has decreased by 60% over the course of 8 years while the number of BW installations roughly tripled. Oddly, some people still talk about aggregates in the context of BW, even 10 years after their way out started. Those comments are trailing 10 years behind the fact.

This blog has been cross-published here. You can follow me on Twitter under @tfxz.

Down to Earth (or Mars)?

Everyone heralds the landing of NASA’s Mars rover as a tremendous success of engineering. Interestingly, the computing technology on board the rover is 8 years old and seems to be outdated as described in an article published by Information Management last week: processors clocking at 133 MHz*, 20 MB of code, 4.5 GB storage, modem-like data bandwidths, 2 megapixel camera etc. It is an excellent reminder of (a) what is possible when focusing on the essential stuff, and (b) that the popular “more & faster” is sometimes only secondary compared to “reliable, robust and minimal”, especially in this instance. And maybe it is a reminder that “reliable, robust and minimal” exists at all and is not given by default. Reflecting on that and what it means for my own activities, I’d list three experiences that fall into that category. I think they might be worth sharing; I’d be happy to learn of similar examples from anybody else. So here we go …

1. Stable vs Excellent Performance

A few years ago, when showing the first prototypes of BWA to customers, we found out that many of them rated the fact that BWA’s query performance was stable even higher than the speed. Stable in this context means that if the query runs in N sec today then it runs in N sec tomorrow. And even if some additional data had been uploaded over night the performance would degrade only a bit. This contrasts a lot with cost-based query optimizers that depend so much on DB statistics for their cost calculations. As such, new statistics – e.g. created by a nightly admin task – can lead to largely different execution plans and thus query performances. So if queries have been tested thoroughly – e.g. in preparation of a go-live – then there is still a chance that performances could deviate largely from one day to the other.

Don’t get me wrong. Cost-based query optimizers are extremely powerful mechanisms. But the effect described above is a reminder of how complexity (of the software in this case) can translate into the opposite of what had been intended.

2. The Value of Having All Necessary Data in One Place

Another interesting fact is that many data warehouses (DWs) have so called staging areas that receive data 1:1 from the various source systems. This applies to ELT based approaches like in SAP BW but also to many home-grown and hand-crafted DWs. The staging layer removes the dependency on the source systems being available at all times, i.e.

  • that the network connections to those systems are available and provide the necessary bandwidth, and
  • that the systems themselves have no planned or unforeseen downtime.

50+ sources are frequent for DWs and even some data marts. Consequently, the probability that at least one source system is not available (or not available with the necessary bandwidth) at any given time is quite high. So the staging layer relieves us from that dependency and provides availability of source data at the expense of some latency and redundancy. In other words: it trades latency and redundancy for a certain service level. In contrast, federated approaches emphasize “no latency” and “no redundancy” but work only if sources are, in sum, highly reliable and available. For an anecdote on what still happens these days in real world setups see my comment to this blog.
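A quick back-of-the-envelope calculation shows how quickly this adds up. The sketch below assumes, purely for illustration, that every source is available 99% of the time and that outages are independent of each other; neither number nor assumption comes from a real landscape.

```python
def prob_any_source_down(n_sources: int, availability: float) -> float:
    """Probability that at least one of n independent sources is unavailable."""
    return 1.0 - availability ** n_sources


# Assumed values: 50 sources, each available 99% of the time.
p = prob_any_source_down(50, 0.99)
print(f"{p:.1%}")  # roughly 40% under these assumptions
```

So even with individually quite reliable sources, a federated query that touches all of them would hit at least one unavailable source roughly four times out of ten under these assumptions.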

3. Solar-Powered Mobile Webcam

One of my recent pet projects is running an outdoor webcam in a remote location that regularly sends pictures to my FTP server at home. The setup looks like this:

  1. an outdoor webcam with an integrated server for FTP and SMTP (email) connectivity,
  2. a mobile WLAN router that connects the webcam to the internet (router and webcam are 1.5m apart with a brick wall in between),
  3. a car battery,
  4. a converter, providing the right voltage and connecting the battery to webcam and router,
  5. solar panels to charge 3.,
  6. an FTP server at home to receive the pictures.

The mode of operation is that webcam and router are switched on for some hours during the day (a) to not go beyond the battery’s capacity and (b) to force a daily reboot of webcam and router. The webcam sends a picture to the FTP server every hour. After some testing, the system started to work well and has remained unchanged (no changes to the configuration, no updates etc; only the fixes cited below). So here is a list of unexpected incidents that I had not taken into account:

  • A long spell of cloudy weather during winter time left the battery uncharged which led to the router losing its configuration. After adding another solar panel and decreasing the hours of operations things ran smoothly.
  • On some days, the feed from the webcam to the FTP server stopped in the afternoon but resumed on the next day when the webcam and the router were rebooted. After some analysis I tracked the reason down to (a) the WLAN connection being less stable due to high temperatures and (b) a weakness in the webcam’s firmware to react to unstable WLAN connections; a different firmware from another vendor offering the same hardware/webcam did work.
  • My FTP server uses a dynamic IP address based on a dynamic DNS service. On 3 occasions, the (daily) registration of the changed IP address at the dynamic DNS service failed which made the FTP server unreachable. Fortunately, that problem fixed itself with the next IP address change and a new registration attempt.
  • The FTP server’s unavailability, in turn, affected the mobile router in the aftermath which then frequently cut off the FTP transmissions thereby sending only parts of the JPG file (which could still be viewed but omitted, e.g., the lower part of the picture). Clearing all its caches did the trick and restored normal operation.
  • The mobile router lost its configuration once, probably during one of the reboots. It had to be reconfigured.
  • Some day last June, FTP connections started to time out leaving 0-byte-sized JPG files on my FTP server. That continued for weeks. Finally, I found out that the mobile operator was working on its network (allegedly upgrading it to 4G) thereby affecting 3G and GPRS connections. I only got that one resolved by switching to a different mobile operator.
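Two of the incidents above, partially transmitted pictures and 0-byte files, can be spotted automatically on the receiving side. The following Python sketch is not what I run on my server; it is a minimal illustration that relies on the fact that a JPEG file starts with the SOI marker (FF D8) and ends with the EOI marker (FF D9). The function names are made up, and the check is a heuristic: it can miss a truncated file that happens to end on those two bytes.

```python
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def classify_bytes(data: bytes) -> str:
    """Classify an uploaded picture as 'empty', 'not-jpeg', 'truncated' or 'ok'."""
    if not data:
        return "empty"        # e.g. the FTP connection timed out right after opening
    if not data.startswith(JPEG_SOI):
        return "not-jpeg"
    # Tolerate trailing zero padding, then require the end-of-image marker.
    if not data.rstrip(b"\x00").endswith(JPEG_EOI):
        return "truncated"    # transmission was cut off part-way
    return "ok"


def classify_upload(path: Path) -> str:
    """Read a file from the upload directory and classify it."""
    return classify_bytes(path.read_bytes())


if __name__ == "__main__":
    print(classify_bytes(b""))                           # empty
    print(classify_bytes(b"\xff\xd8" + b"pixels"))       # truncated
    print(classify_bytes(b"\xff\xd8pixels\xff\xd9"))     # ok
```

Run regularly over the upload directory, a check like this would have flagged the weeks of 0-byte files much earlier than my occasional manual look at the pictures did.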

Long story short: even such a simple system with some basic components and some basic mode of operations can fail in unexpected ways. As the webcam is located 2000 km from home, going onsite is not an easy option. Reliability is therefore highly desirable; in contrast, performance requirements are rather minimal in this case. I can really relate to the colleagues who put in every effort to make the Mars rover’s systems as reliable as possible. They have done an admirable job!


* For completeness: this article describes the RAD750 chip in more detail and talks about 200 MHz.

July 2012 Listing of BW and HANA Related Blogs

Whenever I come across a good blog that is related to BW, BW-on-HANA or HANA stand-alone, I store a link to that blog somewhere for future reference. That list is potentially useful for others. So here it is:

  1. SAP BW as an Enterprise Data Warehouse and How SAP HANA Changes the Game by Sven Jensen (9 Jul 2012)
  2. On the #SAPchat:
  3. Building the Business Case for SAP BW on HANA by Sven Jensen (19 Jul 2012)
  4. Toward an analysis of datawarehouse and business intelligence challenges – part 1 by Ethan Jewett (23 Jul 2012)
  5. Toward an analysis of datawarehouse and business intelligence challenges – part 2 by Ethan Jewett (24 Jul 2012)
  6. SAP HANA Crossing the Chasm – Some Unsolicited Advice by Stefan Schaffer (23 Jul 2012)

Here are some older blogs:

  1. Does SAP HANA Replace BW? (Hint: No.) by Steve Lucas (13 Jun 2012)
  2. Does SAP HANA Replace BW? (Hint: No.) – Part 2 by John Appleby (20 Jun 2012)
  3. Why Should I Care About SAP BW? (22 Jun 2012)
  4. SAP BW and SAP HANA – a simpler view by @donaldmac (25 Jun 2012)
  5. Aareal Bank went live with SAP BW Powered by SAP HANA (26 Jun 2012)
  6. Having data is a waste of time when you can’t agree on an interpretation. on Twitter (29 Jun 2012)