Quality of Sensor Data: A Study Of Webcams


Fig 1: Noisy GPS data: allegedly running across a lake.

For a while, I’ve been wondering about the quality of sensor data. Naively – and many conversations I have had on this went along that route – one might assume that sensors always send correct data unless they fail completely. A first counter-example that many of us can relate to is GPS, e.g. as integrated into a smartphone. See the figure to the right, which visualises part of a running route and shows me allegedly running across a lake.

Now, one sensor does not equal another, i.e. it is not appropriate to generalise about “sensors”. The quality of measurements and data depends a lot on the actual quantity being measured (e.g. temperature), the environment, the connectivity of the sensor, the assumed precision and many other factors.

In this blog, I analyse a fairly simple, yet real-world setup, namely that of 3 webcams that take images every 30 minutes and send them via FTP to an FTP server. The setup is documented in the following figure, which you can read from right to left in the following way:

  1. There are 3 webcams, each connected to a router via WLAN.
  2. The router is linked to an IP provider via a long-range WIFI connection based on microwave technology.
  3. Then there is a standard link via the internet from IP provider to IP provider.
  4. A router connects to the second IP provider.
  5. The FTP server is connected to that router.

Fig 2.: The connection between the webcams on the right to the FTP server on the left.

So, once an image is captured, it travels from 1. to 5. I have been running this setup for a number of years now. During that time, I’ve incorporated a number of reliability measures like rebooting the cameras and the (right-hand) router once per day. From experience, steps 1. and 2. are the most vulnerable in this setup: both long-range WIFI and WLAN are liable to a number of failure modes. In my specific setup, there is no physical obstacle or frequency-polluted environment. However, weather conditions are the most likely source of distortion, like widely varying humidity and temperature.
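The cameras’ firmware does the capture and upload on its own; purely to illustrate what the FTP leg of that path amounts to, here is a minimal Python sketch. The host name, credentials and spool directory are hypothetical placeholders, not my actual setup.

```python
# Minimal sketch of the FTP leg (steps 1 to 5): pick up the newest image and
# push it to the FTP server. Host, credentials and paths are hypothetical
# placeholders; the real webcams do all of this in firmware.
from ftplib import FTP, all_errors
from pathlib import Path

FTP_HOST = "ftp.example.org"     # hypothetical
FTP_USER = "webcam1"             # hypothetical
FTP_PASS = "secret"              # hypothetical
IMAGE_DIR = Path("/var/webcam")  # hypothetical local spool directory

def upload_latest_image() -> bool:
    """Upload the newest JPEG in IMAGE_DIR; return True on success."""
    images = sorted(IMAGE_DIR.glob("*.jpg"))
    if not images:
        return False
    latest = images[-1]
    try:
        with FTP(FTP_HOST, timeout=30) as ftp:
            ftp.login(FTP_USER, FTP_PASS)
            with latest.open("rb") as f:
                ftp.storbinary(f"STOR {latest.name}", f)
        return True
    except all_errors:
        # A failure anywhere along the WLAN / long-range WIFI / internet path
        # surfaces here as a timeout or protocol error.
        return False
```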

So, what is the experiment and what are the results? I looked at the image data sent over the course of approx. 3 months. In total, around 8000 images were transmitted. I counted the successful (fig 3) vs the unsuccessful (fig 4) transmissions. I did not track the images that completely failed to be transmitted, i.e. that did not reach the FTP server at all and therefore left no trace. 5.3% of the images were distorted (as in fig 4), i.e. roughly every 19th image failed to be transmitted correctly. In addition, that rate was not constant (e.g. per week): there were times of heavy failures and times of no failures at all. A sketch of how such a count can be reproduced is shown below, after the two figures.


Fig 3: Successfully transmitted image.


Fig 4: Distorted image.
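To illustrate the counting step, here is a rough sketch in Python that classifies each received JPEG as OK or distorted by attempting a full decode with the Pillow library. This is only a heuristic (a partially distorted image like the one in fig 4 may still decode), the directory path is a hypothetical placeholder, and images that never reached the server at all remain invisible to it, just as in my count.

```python
# Rough sketch of the counting step: try to fully decode each received JPEG;
# treat decode failures as distorted transmissions. The path is a hypothetical
# placeholder, and visibly garbled but still decodable images are not caught.
from pathlib import Path
from PIL import Image

IMAGE_DIR = Path("/srv/ftp/webcams")  # hypothetical FTP target directory

ok, distorted = 0, 0
for path in sorted(IMAGE_DIR.glob("**/*.jpg")):
    try:
        with Image.open(path) as img:
            img.load()  # force a full decode, not just reading the header
        ok += 1
    except OSError:     # truncated or corrupted file
        distorted += 1

total = ok + distorted
if total and distorted:
    rate = distorted / total
    print(f"{total} images, {distorted} distorted "
          f"({rate:.1%}, roughly every {round(1 / rate)}th image)")
```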

This is an initial and simple analysis, but one that matches real-world conditions and setups pretty well and is therefore not an artificial simulation. In the future, I might refine the analysis, e.g. by also counting non-transmissions or by correlating quality with temperature, humidity or other potential influencing factors.

You can follow me on Twitter via @tfxz.

What To Take Away From The #HLF16

From 18-23 Sep 2016, I had the privilege to join approx. 200 young researchers and 21 laureates of the most prestigious awards in Mathematics and Computer Science at the 4th Heidelberg Laureate Forum (HLF). Similar to the Lindau Nobel Laureate Meetings, it intends to bring together the past and future generations of researchers and to seed the exchange between them.

I thought it worthwhile to write up the experience from someone with my background and perspective, which is the following: I am a member of the HLF Foundation Council and could therefore join this illustrious event. I hold a master’s degree from the Karlsruhe Institute of Technology (KIT) and a PhD from Edinburgh University, both in CS. For almost 18 years, I have been working for SAP in development, most of that time confronted with high-performance problems in CS in general and DBMSs in particular. So, generation- and career-wise, I’m somewhere in between the two focus groups of the HLF.

The Setup

… was excellent: there were lectures by the laureates in the mornings and breakout events (panels, workshops, poster sessions and the like) in the afternoons. In between, there were ample breaks for people to mingle, talk to each other, exchange ideas, make contacts and/or friends, basically everything to nourish creativity, curiosity and inspiration. Many years ago, a friend commented: “At conferences, breaks are as important as the scheduled events; presentations are there only to seed topics for lively discussions during the breaks.” I think the HLF is an excellent implementation of that notion.

Meeting People

Over the course of the week, I talked to many attendees, both young researchers and laureates. Topics circled – as in the presentations – around the past and the future: lessons learned from past problems in order to tackle the coming ones. Sir Andrew Wiles did a great job in his lecture on how people tried to prove Fermat’s Last Theorem until, after more than 300 years, a new approach was triggered which he finally carried to a successful conclusion. Similarly, Barbara Liskov chose to talk about the lessons learned in the early 1970s, when early versions of modularity and data abstraction made it into modern programming languages, which ultimately led to object orientation.

On the Wednesday morning of the “HLF week”, a group of attendees visited SAP. The young researchers learned about opportunities at SAP while the laureates were shown a demo of SAP’s Digital Boardroom. On this occasion, too, good questions and discussions came up.

The Presentations

The lectures given by the laureates were mostly excellent. I’ve decided to pick 3 which I personally enjoyed most, one from each discipline:

  • Mathematics: Sir Michael Atiyah is 87 years old. He took advantage of the fact that, as a laureate, you don’t have to prove anything to anyone and can therefore take a few steps back and look at the research in your area from a distance. His lecture on “The Soluble and the Insoluble” discusses this as “both a philosophical question, and a practical one, which depends on what one is trying to achieve and the means, time and money available. The explosion in computer technology keeps changing the goal posts.”

  • Computer Science: Raj Reddy picked “Too Much Information and Too Little Time” as his topic. Amongst other things, he pointed to cognitive science and argued that modern software needs to take human limitations (making errors, forgetting, impatience, going for the least effort, etc.) and strengths (tolerance of ambiguity, imprecision and errors; rich experience and knowledge; natural language) far more into account.

  • Physics: Brian Schmidt gave the Lindau Lecture on the “State of the Universe”. I am no expert in astrophysics but, still, there were a lot of fascinating facts and “aha” moments in it. It is amazing what science can do.

There is a lot more material from this and previous events. If you are interested in finding out more details, you might want to look at the HLF website, where you will also find more video recordings of the laureates’ lectures.

This has been cross-published on LinkedIn. You can follow me on Twitter via @tfxz.

3 Turing Award And 1 Nobel Prize Winners Visiting SAP

On Wednesday 21 Sep, SAP hosted a group of visitors participating in the 4th Heidelberg Laureate Forum (HLF), a meeting of young researchers and winners of the most prestigious awards in Mathematics and Computer Science, the Abel Prize, the Fields Medal, the Nevanlinna Prize and the ACM Turing Award, all tantamount to a Nobel Prize in their respective discipline. Amongst the visitors were the Turing Award winners Vint Cerf, Sir Tony Hoare and Joseph Sifakis as well as the Nobel Prize winner Brian Schmidt.


From left to right: K. Bliznak (SAP), Mrs O. Ioannidi, J. Sifakis, Sir T. Hoare, B. Schmidt, V. Cerf, T. Zurek (SAP)

The visit took around 2.5 hrs and comprised a tour through SAP’s inspiration pavilion and a demo of SAP’s Digital Boardroom. The 4 laureates showed a huge interest in both, while remaining critical here and there: at the big-data wall, Vint Cerf noted the omission of some of the advances in the internet in the 1970s and 1980s. Brian Schmidt commented: “Vint, you are too biased!”. The Digital Boardroom demo triggered a good set of questions that went beyond the pure visualisation on the 3 screens and extended to how to incorporate data sitting in non-SAP or legacy systems, who composes the dashboards, how this might be applicable in their respective areas, etc. They even speculated on how SAP might create upsell opportunities. It was a lively exchange of ideas and, overall, a bit different from the usual visits by SAP customers.


Vint Cerf commenting on the Digital Boardroom

If you are interested in finding out more details on the HLF, you might want to look at the HLF website, where you will also find video recordings of the laureates’ lectures.

This blog has also been published on SciLogs. You can follow me on Twitter via @tfxz.

#BW4HANA Launch Demo

On 7 Sep 2016, BW/4HANA was launched in San Francisco. Videos from the event can be found on the event’s website. The BW/4HANA launch demo in particular has attracted a lot of attention. For further reuse – like embedding it into PPT – we have uploaded it to YouTube:

The demo is also available as a screencam:

More info on BW/4HANA can be found here. This blog has been cross-published on SCN. You can follow me on Twitter via @tfxz.

PS: The demo was also shown at SAP TechEd Bangalore (5’33”, Oct 2016).


Data Modeling with #BW4HANA


One of the most striking differences between BW and BW/4HANA is data modeling. On one hand, there are fewer but more versatile objects to choose from (see figure 1). On the other hand, there is a new, more intuitive alternative to BW’s long-standing admin workbench (RSA1), namely the Data Flow Modeler (see figure 2). It shows physical and virtual containers (like DSOs or composite providers) as boxes. Data transformations, combinations, transitions etc. are indicated as directed lines between those boxes. From those boxes and lines, it is possible to access the respective editor for each object. In that way, a DW architect can navigate along the paths that the data takes from entering the system to the multidimensional views that serve as sources for pivot tables, charts and other suitable visualisations. This is not only great for the DW architect but also enables rapid prototyping, e.g. when a DW architect sits down with a business user to quickly create a new model or modify an existing one. Figure 3 shows an example.

Figure 1: Fewer and more logical objects when architecting a DW with BW/4HANA.

Figure 2: The new Data Flow Modeler in BW/4HANA.

Figure 3: The same scenario, once in the traditional admin workbench (left) and once in BW/4HANA’s Data Flow Modeler (right).

This blog has also been published here. You can follow me on Twitter via @tfxz.

Why #BW4HANA?

With the recent announcement of BW/4HANA, some questions have arisen about the motivation for a new product rather than evolving an existing one, namely BW-on-HANA. With this blog, we want to shed some light on the discussions we have had and on why we think that this is the best way forward. Here are the 3 fundamental reasons:

1. Classic DBs vs HANA Platform

Nowadays, HANA has become much more than a pure, classic RDBMS that offers standard SQL processing on a new (in-memory) architecture. There are a number of specialized engines and libraries that allow all sorts of processing capabilities to be brought close to where the data sits, rather than moving the data to a processing layer such as SAP’s classic application server. Predictive, geo-spatial, time-series, planning, statistical and other engines and libraries all combine with SQL but go well beyond the traditional Open SQL scope that has been prevalent in SAP applications for almost 3 decades. Recall that Open SQL constitutes the (least) common denominator of the classic RDBMSs that have been supported by SAP applications. A long time ago, BW broke with that approach a bit by introducing RDBMS-specific classes and function groups that allowed it to leverage specific SQL and optimizer capabilities of the underlying RDBMS. Still, the mandate has to be to push BW’s data processing more and more to where the data sits. Accommodating a “common denominator” notion (i.e. complying with “standard-ish SQL”) impedes innovation at times, as it prevents the adoption of highly DW-relevant and effective capabilities of HANA.

2. Legacy Objects / Backward Compatibility

BW was originally architected around the properties and cost models imposed by the classic RDBMSs. Over the past decade, cautious re-architecting has made it possible to continuously innovate BW and to safeguard the investments of BW customers. There has been a strong emphasis on keeping newer versions of BW as compatible with past versions as possible. Similar to sticking to “standard-ish SQL”, this impedes innovation in some areas. BW/4HANA breaks with this strict notion of backward compatibility and replaces it with tooling for conversions that might require user interaction here and there, and thus some effort. However, this allows legacy to be removed not only inside the software product but also in existing DW instances that move from BW to BW/4HANA.

Now, with some “baggage” removed, it has become easier to focus on new, innovative things without being squeezed by considerations of backward compatibility in order to keep older scenarios running that you would build differently nowadays (e.g. with BW/4HANA‘s new object types). In that sense, BW/4HANA is a much better breeding ground for innovations than BW-on-HANA can ever be. This is not because BW-on-HANA is a bad product but because it comes with a guarantee of supporting older stuff too, which BW/4HANA does not.

3. Guidance

Finally, and basically as a result of 1. and 2., many of our partners and customers have asked us for guidance on which of the many options that BW provides they should use for their implementation. Some of those options are there simply because they were introduced some time ago but would actually be obsolete in a new product. So, SAP has decided to reduce the complexity of choices and created a product, namely BW/4HANA, that offers only those building blocks that customers and partners should use now and in the future. The product has become simpler, and that will translate into simplified DW architectures.

I hope this blog helps you understand why SAP has moved from BW to BW/4HANA. In simple terms, it’s similar to choosing between renovating and re-architecting your existing house, or building and moving to a new house, with the latter fitting your furniture and all the other stuff that you cherish. We all hope that you will feel comfortable in the new home.


This blog has also been published here. You can follow me on Twitter via @tfxz.

PS: More details are revealed at Sep 7’s SAP and Amazon Web Services Special Event.

PPS: In this 4 min video, Lothar Henkes and I describe the motivation and plans for BW/4HANA. It was recorded at the BW/4HANA launch event in San Francisco on 7 Sep 2016.

What is #BW4HANA?

BW/4HANA is an evolution of BW that is completely optimised and tailored to HANA. The BW/4HANA code can only run on HANA, as it is interwoven with HANA engines and libraries. The ABAP part is several million lines of code smaller than that of BW-on-HANA. It is free of any burden to stay, e.g., within a certain “common denominator” scope of SQL, like SQL92 or Open SQL, and can instead go for any optimal combination with what the HANA platform offers. The latter is especially important as it extends into the world of big data via HANA VORA, an asset that will be heavily used by BW/4HANA.

So, what are BW/4HANA’s major selling points? What are the “themes” or “goals” that will drive the evolution of BW/4HANA? Here they are:

1. Simplicity


Depending on how one counts, BW offers 10 to 15 different object types (building blocks like InfoCubes or MultiProviders) for building a data warehouse. In BW/4HANA, there will be only 4, and they are at least as expressive and powerful as the previous 15. BW/4HANA’s building blocks are more versatile: data models can now be built with fewer building blocks without compromising expressiveness. They will, therefore, be easier to maintain, and thus more flexible and less error-prone. Existing models can be enhanced and adjusted and can thus be kept alive over a longer period that goes beyond their initial scope.

Another great asset of BW/4HANA is that it knows what type of data sits in which table. From that information, it can automatically derive which data needs to sit in the hot store (memory) and which data can be put into the warm store (disk or non-volatile RAM) to yield a more economical usage of the underlying hardware. This is unique to BW/4HANA compared to handcrafted data warehouses, which also require a handcrafted, i.e. specifically implemented, data lifecycle management.
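To make the idea concrete, here is a toy sketch of such a derivation; the roles and thresholds are made up for illustration and are not BW/4HANA’s actual rules, which it derives internally from its metadata.

```python
# Toy illustration of the data-temperature idea; the roles and the 90-day
# threshold are invented for this sketch, not BW/4HANA's actual mechanism.
from datetime import date, timedelta

def suggest_store(table_role: str, newest_record: date) -> str:
    """Suggest 'hot' (in-memory) or 'warm' (disk / non-volatile RAM) placement."""
    age = date.today() - newest_record
    if table_role in ("master_data", "reporting_layer"):
        return "hot"    # frequently queried, keep in memory
    if table_role == "staging" and age > timedelta(days=90):
        return "warm"   # old inbound data that is rarely touched
    return "hot"

# Example: staging data last loaded a year ago is suggested for the warm store.
print(suggest_store("staging", date.today() - timedelta(days=365)))  # -> warm
```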

2. Openness


BW/4HANA – like BW – offers a managed approach to data warehousing. This means that prefabricated templates (building blocks) are offered for building a data warehouse in a standardised way. This provides huge opportunities to optimise the resulting models for HANA with respect to performance, footprint and data lifecycle. In contrast to classic BW, it is possible to deviate from this standard approach wherever needed and appropriate. On one hand, BW/4HANA models and data can be exposed as HANA views that can be accessed via standard SQL; BW/4HANA’s security is thereby not compromised but becomes part of those HANA views. On the other hand, any type of HANA table or view can be easily and directly incorporated into BW/4HANA without the need to replicate data. Both capabilities mean that BW/4HANA combines with and complements any native SQL data warehousing approach. It can be regarded as a powerful suite of tools for architecting a data warehouse on HANA, with all the options to combine it with other SQL-based tools.
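As an illustration of the SQL side of this openness, here is a minimal sketch of how an external client could query such a generated HANA view with plain SQL, using SAP’s hdbcli Python driver. The connection parameters, schema and view name are hypothetical placeholders rather than the names BW/4HANA actually generates.

```python
# Minimal sketch: plain-SQL access to a HANA view exposed by BW/4HANA.
# Connection details, schema and view name are hypothetical placeholders.
from hdbcli import dbapi  # SAP HANA client for Python

conn = dbapi.connect(
    address="hana.example.corp",  # hypothetical host
    port=30015,                   # hypothetical SQL port
    user="REPORTING_USER",
    password="***",
)
try:
    cur = conn.cursor()
    # Standard SQL against the generated view; BW/4HANA's authorisations are
    # part of the view itself, so they are not bypassed by this access path.
    cur.execute(
        'SELECT "CALMONTH", SUM("REVENUE") '
        'FROM "BW4_GENERATED"."SALES_QUERY_VIEW" '   # hypothetical schema/view
        'GROUP BY "CALMONTH" ORDER BY "CALMONTH"'
    )
    for row in cur.fetchall():
        print(row[0], row[1])
finally:
    conn.close()
```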

3. Modern UIs


BW/4HANA will offer modern UIs for data modeling, administration and monitoring that run in HANA Studio or a browser. In the midterm, SAPGUI will become obsolete in that respect. Similarly, SAP’s Digital Boardroom, BusinessObjects Cloud, Lumira, Analysis for Office and Design Studio will be the perfect match as analytic clients on top of BW/4HANA.

4. High Performance


Excellent performance has been at the heart of BW since the advent of HANA. As elaborated above, BW/4HANA will be free of any such burdens and will leverage optimal access to HANA. This will be especially interesting in the context of big data scenarios, as HANA VORA offers a highly optimised “bridge” between the worlds of HANA (RDBMS) and Hadoop/Spark (distributed processing on a file system). Most customers need to enhance and complement existing data warehouses with scenarios that address categories of data beyond traditional, business-process-triggered (OLTP) data, namely machine-generated data (IoT) and human-sourced information (social networks).

The figure below summarises the most important selling points. It is also available as a slide.

Major BW/4HANA selling points.

This blog has been cross published here and here. You can follow me on Twitter via @tfxz.

PS: In the meantime …