From Data To Intelligence, From Hadoop To DBMS

For many years, it has been best practice to analyse business and other processes using well-structured data (typically originating in an operational RDBMS) with well-defined and well-known semantics (derived from the application feeding that operational RDBMS). Such analyses convert data into intelligence, that is, knowledge that allows better and more informed decisions.

Knowledge is the final result, i.e. answers to certain questions. The questions therefore constitute the step prior to gaining insight. Given the right set of well-defined questions (or queries), it is typically straightforward to find the answers (or compute the query results). So, business analysts have been striving for a while to find the “golden queries”, i.e. the right questions to ask. This has led to approaches like data exploration, outlier analysis, data mining and machine learning. Fundamentally, they are expected to provide methods for finding good questions that lead to useful results (insights). Nowadays, finding good questions is considered a fine art, and data scientists are supposed to be at the forefront of that effort.

It is important to understand how knowledge and intelligence are acquired in order to understand what the technical architectures that support that process need to look like. For many years, we have built OLTP, OLAP and data warehouse systems to collect and analyse data. Those systems continue to be relevant. However, they need to be complemented by new infrastructures like Hadoop that can cater for data that has no clear structure, initially no obvious value or purpose, little semantics and that, above all, frequently comes in huge volumes. While we have more or less learned to manage the data volumes, it is still necessary to tackle the many unknowns in big data. Forrester therefore states:

Of Gartner’s “3Vs” of big data (volume, velocity, variety), the variety of data sources is seen by our clients as both the greatest challenge and the greatest opportunity.

In fact, this also explains why a file system like HDFS, together with processing infrastructures like MapReduce or Spark, is so suitable: it can address that variety better than a traditional RDBMS.
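To make the point about variety a bit more concrete, here is a minimal sketch in plain Python (not actual Hadoop or Spark code). It assumes a handful of made-up raw lines as they might land in a file on HDFS, with no shared schema. The names `RAW_LINES`, `map_record`, `reduce_counts` and `profile` are purely illustrative. The structure is interpreted only at read time, in a map step, and a reduce step then summarises what kinds of records are there, which is typically a first step towards finding useful questions.

```python
import json
from collections import Counter

# Hypothetical raw records (made up for illustration): no shared schema,
# and one line is not even valid JSON.
RAW_LINES = [
    '{"type": "order", "customer": "c1", "amount": 120.0}',
    '{"type": "click", "user": "c1", "url": "/products/42"}',
    '{"type": "sensor", "device": "d7", "temp_c": 21.5}',
    'free-text note without any structure',
    '{"type": "click", "user": "c2", "url": "/cart"}',
]


def map_record(line):
    """Map step: interpret one line at read time and emit a (key, 1) pair."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        # Unlike a load into fixed tables, nothing is rejected up front;
        # unreadable lines are simply counted under their own key.
        return ("unparsed", 1)
    if not isinstance(record, dict):
        return ("unparsed", 1)
    return (record.get("type", "unknown"), 1)


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def profile(lines):
    """Run the map and reduce steps over all lines."""
    return reduce_counts(map_record(line) for line in lines)


if __name__ == "__main__":
    print(profile(RAW_LINES))
```

In an RDBMS, these records would first have to be forced into predefined tables. Here, the decision about which structure matters is deferred until a question is actually asked.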

Data lifecycle by combining HANA and Hadoop


At SAP, we are currently working on ways not only to integrate Hadoop technically (e.g. accessing Hadoop from SQL or allowing Hadoop to access HANA) but also to define ways for our customers to look at Hadoop (and “big data”) as an extension to their existing data management setups. This requires understanding how customers work with data: from identifying areas where they expect valuable questions, to finding those questions (queries), to then deriving the results.

PS: On Sep 29, 2015, I gave a 5-minute presentation on this topic at the HPTS 2015 workshop, using the slide shown below.

From data to intelligence.

