BUILDING THE DATA WAREHOUSE: A RETROSPECTION

By W H Inmon

8.28.22

Just yesterday I happened to look at a book that was written in 1988: BUILDING THE DATA WAREHOUSE. I normally never reread a book I have written, but this one just reappeared on my desk. So I looked back 35 years or so and reflected on what was happening with technology in that day and age. The first thing I looked for was anything I had said that I would not agree with today. Fortunately, I found none of that. What I did find is that today I would emphasize some things a bit differently and organize the material differently.

The world was a different place 35 years ago. In that day and age the definition of a data base was “a single source of data for all processing.” When data warehousing was introduced, it posed a serious challenge to this definition. With the data warehouse came the notion that there were different kinds of data bases with very different purposes. There were operational data bases, where transactions were performed, and there were analytical data bases, where analysis was performed. These different types of data bases had very different properties, and very different procedures surrounded them. In short, the data warehouse was a radical idea 35 years ago, and it challenged the basic principles the world was operating on at the time.

At the time, the conventional notion of a data base was supported by everyone: academia, vendors, venture capitalists, everyone. The notion of a data warehouse was considered heresy in those circles. But 35 years later, the data warehouse is conventional wisdom, and the concept that there are different types of data bases is the accepted industry standard. This is remarkable, because the data warehouse was NEVER supported by vendors or the investment community. Indeed, several prominent vendors have been openly hostile to the concept of the data warehouse. And even today, the vendors pay lip service to the data warehouse but offer little or no real support for data warehousing.

So what is it that vendors detest about the data warehouse (other than the fact that they didn’t invent data warehousing)? Vendors fight the data warehouse tooth and nail because it requires that data be integrated. And integration involves a four-letter word: work. Building a data warehouse requires that the data going into it be turned into enterprise data. Enterprise data is data that is:

   Consistently defined

   Consistently named

   Consistently calculated

   Consistently organized

   Representative of the entire organization

Anyone who thinks that achieving enterprise data is easy has never done it. It is not easy, and it requires work and the use of one’s intellect. And vendors and consultants and investors just hate to do that. Vendors are always looking for the quick fix with no slippery slopes. And that just isn’t integration. You cannot integrate data without getting your hands dirty, breaking a sweat, and using your brain. Can’t be done.

Now all of that was 35 years ago. Or was it? The vendors just don’t give up. Every day they create new gimmicks that try to bypass the chasm of unintegrated data. So what gimmicks have been tried, and which are being tried today?

Today we have all sorts of new things to buy: AI, ML, data mesh. In the past we had the “virtual data warehouse,” the dimensional model, and other gimmicks. Now, there is nothing wrong with AI, ML and data mesh. But if they are to succeed, these technologies have to be built on believable data. The data they operate on must be accurate, consistent, up to date and complete. If that data does not have these qualities, then the “new and improved” technologies have little chance of success. I don’t care how sophisticated AI or ML is; if it operates on garbage, it will produce garbage results.

What modern technologies rely on is not just data but believable, useful data. That foundation of data is the architectural underpinning of today’s new technologies. Everything that sits on top of that data is a technology. But mention to the vendor that the new technology is not going to work if the data it operates on is not reliable, and the vendor magically changes the conversation: “Let’s just forget about believable data. That is an ‘old’ subject, and we don’t do that.”

It is like the building in San Francisco that was not built on bedrock. The building is tall, beautiful and expensive. And it is falling over. I don’t want to be on Market Street or in Chinatown when it finally topples.

A final thought on looking back at the book that started it all. I looked over the people mentioned in the book. Some are still close friends. Others have drifted away, and I haven’t been in contact with them in years. But the book was dedicated to Kevin Gould and Jeanne Friedman, who have been steadfast friends all this time. Thank you, Kevin and Jeanne.

Bill Inmon is the father of the data warehouse. Bill lives in Denver with his wife and his two Scottie dogs, Jeb and Lena. Lena has taken to digging holes in the backyard in order to get to the tomatoes that are ripening on the vine. Jeb has better manners than that.