For a while, I’ve been wondering how good the quality of sensor data really is. Naively – and many conversations that I had on this went along that route – it can be assumed that sensors always send correct data unless they fail completely. A first counter-example that many of us can relate to is GPS, e.g. integrated into a smartphone. See the figure to the right, which visualises part of a running route and shows me allegedly running across a lake.
Now, sensor does not equal sensor, i.e. it is not appropriate to generalise about “sensors”. The quality of measurements and data depends a lot on the quantity being measured (e.g. temperature), the environment, the connectivity of the sensor, the assumed precision and many more effects.
In this blog, I analyse a fairly simple, yet real-world setup, namely that of 3 webcams that take images every 30 minutes and send them via FTP to an FTP server. The setup is documented in the following figure, which you can read from right to left as follows:
1. There are 3 webcams, each connected to a router via WLAN.
2. The router is linked to an IP provider via a long-range WIFI connection based on microwave technology.
3. Then there is a standard link via the internet from IP provider to IP provider.
4. A router connects to the second IP provider.
5. The FTP server is connected to that router.
So, once an image is captured, it travels from 1. to 5. I have been running this setup for a number of years now. During that time, I’ve incorporated a number of reliability measures like rebooting the cameras and the (right-hand) router once per day. From experience, steps 1. and 2. are the most vulnerable in this setup: both long-range WIFI and WLAN are prone to a number of failure modes. In my specific setup, there is no physical obstacle and no frequency-polluted environment. However, weather conditions, like widely varying humidity and temperature, are the most likely source of distortion.
So, what is the experiment and what are the results? I’ve been looking at the image data sent over the course of approx. 3 months. In total, around 8000 images were transmitted. I counted the successful (fig 3) vs the unsuccessful (fig 4) transmissions. I did not track the images that completely failed to be transmitted, i.e. that did not reach the FTP server at all and therefore did not leave any trace. 5.3% of the images were distorted (as in fig 4), i.e. roughly every 19th image failed to be transmitted correctly. In addition, that rate was not constant (e.g. per week): there were times of heavy failures and times of no failures.
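To make the arithmetic concrete, here is a minimal sketch of how such a failure rate, overall and per calendar week, could be computed. It assumes each received image has already been labelled as intact or distorted and carries a timestamp; the function name `failure_stats`, the record layout and the sample data are made up for illustration and are not the data from the experiment.

```python
from collections import defaultdict
from datetime import datetime


def failure_stats(records):
    """Compute the overall and per-ISO-week share of distorted images.

    `records` is an iterable of (timestamp, ok) pairs, one per image that
    reached the server (hypothetical layout, assumed for this sketch).
    """
    total = 0
    failed = 0
    per_week = defaultdict(lambda: [0, 0])  # (iso_year, iso_week) -> [failed, total]
    for ts, ok in records:
        iso = ts.isocalendar()
        key = (iso[0], iso[1])
        per_week[key][1] += 1
        total += 1
        if not ok:
            per_week[key][0] += 1
            failed += 1
    overall = failed / total if total else 0.0
    weekly = {key: f / t for key, (f, t) in sorted(per_week.items())}
    return overall, weekly


# Made-up sample: six images over two weeks, one of them distorted.
sample = [
    (datetime(2024, 1, 1, 8, 0), True),
    (datetime(2024, 1, 1, 8, 30), False),
    (datetime(2024, 1, 2, 8, 0), True),
    (datetime(2024, 1, 2, 8, 30), True),
    (datetime(2024, 1, 8, 8, 0), True),
    (datetime(2024, 1, 8, 8, 30), True),
]
overall, weekly = failure_stats(sample)

# A rate of 5.3% corresponds to roughly one failure in every 19 images.
one_in = round(1 / 0.053)
```

Grouping by ISO week is just one possible choice; grouping per day or per camera would work the same way and might show the bursts of failures more clearly.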
This is an initial and simple analysis, but one that matches real-world conditions and setups pretty well and is therefore not an artificial simulation. In the future, I might refine the analysis, for example by counting non-transmissions too or by correlating the quality with temperature, humidity or other potential influencing factors.
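As a rough idea of how non-transmissions could be counted, the sketch below compares the expected 30-minute capture schedule of one camera with the timestamps of the images that actually arrived. It assumes that every camera captures on a fixed schedule and that an arrival timestamp is available per image (e.g. from the file name or file modification time); the function name `missing_slots`, the 5-minute tolerance and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def missing_slots(start, end, received,
                  interval=timedelta(minutes=30),
                  tolerance=timedelta(minutes=5)):
    """Return the scheduled capture times in [start, end) for which no
    image arrived within `tolerance` of the slot."""
    missing = []
    slot = start
    while slot < end:
        # A slot counts as covered if any received image is close enough to it.
        if not any(abs(ts - slot) <= tolerance for ts in received):
            missing.append(slot)
        slot += interval
    return missing


# Made-up sample: four expected slots, the 08:30 image never arrived.
received = [
    datetime(2024, 1, 1, 8, 1),
    datetime(2024, 1, 1, 9, 2),
    datetime(2024, 1, 1, 9, 29),
]
missing = missing_slots(datetime(2024, 1, 1, 8, 0),
                        datetime(2024, 1, 1, 10, 0),
                        received)
```

The number of missing slots added to the number of distorted images would then give a fuller picture of the failure rate than the distorted images alone.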