There is a lot of debate about whether data warehouses (and BW in particular) are dead, and whether all sorts of technologies surpass the allegedly old-fashioned discipline of data warehousing. I strongly believe that these are separate things: technologies are tools, and data warehousing is a governance process that uses those tools to achieve consistent data. Only if your data is consistent are the results of your analyses valid and valuable. Consistency is not a given when data originates from more than one source. Even if every source is individually consistent (which the underlying application, such as SAP ERP, should ensure), the union of those consistent sources is not automatically consistent, e.g.
- Customer XYZ in source 1 might be partner 123 in source 2. If you want to combine data from those sources, you need to declare somewhere that XYZ and 123 are the same entity.
- Financial data in source 3 might be tracked in fiscal periods while sales data in source 4 is listed by calendar month. Again, fiscal periods must be translated into calendar dates before the two can be meaningfully combined.
- Source 5 might use a message queue (e.g. iTunes), which constitutes a kind of batch process and thereby introduces a lag between the moment a purchase transaction is executed and the time it is committed in the DB. As a result, even real-time replication (which is based on transaction time) does not yield consistency. Typically, data warehouses implement barriers that release data for analysis only once matching data has arrived or other consistency constraints are met.
- Along the same lines, there is an anecdotal example of a top-brand customer whose upload process involved collecting data on a memory stick and having a student cycle to an internet cafe to upload the data to a server in Europe. Traffic on the cycle route and occupancy in the internet cafe were major factors affecting the time lag. This happened in a high-tech location in Asia. This is the real world, even if it is arguably an extreme instance.
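To make the first three cases concrete, here is a minimal Python sketch. It is not how BW or LSA implements any of this; the mapping table `KEY_MAP`, the harmonised id `C-0001`, the April fiscal-year start and the row layout are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical mapping table: (source system, local key) -> harmonised entity id.
KEY_MAP = {
    ("source1", "XYZ"): "C-0001",
    ("source2", "123"): "C-0001",
}


def harmonise(source, local_key):
    """Return the shared entity id, or fail loudly if no mapping was declared."""
    try:
        return KEY_MAP[(source, local_key)]
    except KeyError:
        raise LookupError(f"no mapping declared for {source}/{local_key}") from None


def fiscal_to_calendar(fiscal_year, period, start_month=4):
    """First calendar day of a fiscal period.

    Assumes 12 monthly periods and a fiscal year that is named after the
    calendar year in which it starts (here: April, purely as an example).
    """
    if not 1 <= period <= 12:
        raise ValueError("period must be between 1 and 12")
    offset = (start_month - 1) + (period - 1)  # months since January of fiscal_year
    return date(fiscal_year + offset // 12, offset % 12 + 1, 1)


def releasable(sales_rows, finance_rows):
    """Release barrier: allow analysis only if both sources cover the same months."""
    sales_months = {row["month"] for row in sales_rows}
    finance_months = {
        fiscal_to_calendar(row["fiscal_year"], row["period"]) for row in finance_rows
    }
    return sales_months == finance_months
```

With these assumptions, `harmonise("source1", "XYZ")` and `harmonise("source2", "123")` resolve to the same id, and `releasable` holds data back until the sales months and the translated fiscal periods line up. Each function is trivial on its own; the effort lies in declaring and maintaining the mappings and rules behind them.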
Naturally, each of these examples is easy to handle individually. But in most cases there are many of them. There are a lot of tools to help, but there is no way to automate all of this. Even ontologies and standards (e.g. country codes, address normalisations, currency codes, …) are only tools that ease the task. In the end, a “consistency management architecture” like LSA (layered scalable architecture) or its successor LSA++ is required as a governance discipline. The latter can be implemented via data warehousing tools or a harmonised toolset like SAP’s BW.
If done properly, a data warehouse evolves into a very rich semantic treasure over the years. This can be of enormous value to an enterprise. I have seen many BW installations that have gone down that path. This is why many BW customers are grateful that those investments are safe and that SAP provides innovative technologies (within BW-on-HANA) in a non-disruptive way. People who understand the lifecycle of such a system also understand that data warehouses, with BW systems as prominent examples, are not dead, even if they go by a different name in the future.
You can follow me on Twitter @tfxz.