Preparing to take maximum advantage of all of the new data and analytical capabilities rapidly arriving on the computing scene will mean that most businesses have to rethink how they assemble, distill, and use information. My advice: Plan on killing your data warehouse.
Actually, you won’t have to take it out back, have it kneel down, and shoot it in the back of the head, gangster style. This is more of a Dr. Frankenstein operation in which you will put the data warehouse on the operating table, cut it up, and create a new way to process information out of a combination of old and new parts. Parts of your data warehouse can even stay alive during this process. Consider it a form of vivisection. What has to die is the idea that the information a business needs comes from data warehouses the way they are currently implemented.
The replacement I envision for the data warehouse is a data lake, a concept I’ve written about on Forbes.com and in a problem statement on CITOResearch.com (Preparing for Big Data).
The basic idea is simple. A data lake contains a large amount of data from various sources and forms that is ready to be distilled into information to support decisions or business processes.
Here are the primary differences between a data warehouse and a data lake:
The transformation driven by a data lake will implement the paradigm of operational intelligence: a more real-time way of using both structured and unstructured data, drawn from both real-time sources and historical repositories, so that analysis can be as automated as possible.
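To make that paradigm a little more concrete, here is a minimal sketch in Python, not a reference design. It assumes a hypothetical historical repository of response times and a small stream of incoming events, some structured and some plain log lines. Every name, the log-line format, and the alert threshold are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical historical repository: past response times (ms) per service.
historical = {
    "checkout": [120, 130, 125, 118],
    "search": [40, 42, 39, 45],
}

def parse_event(raw):
    """Accept a structured dict or an unstructured log line."""
    if isinstance(raw, dict):
        return raw["service"], float(raw["ms"])
    # Assumed unstructured form: "<service> took <ms>ms"
    service, _, ms = raw.split()
    return service, float(ms.removesuffix("ms"))

def check(raw, factor=2.0):
    """Compare one real-time event against its historical baseline."""
    service, ms = parse_event(raw)
    baseline = mean(historical[service])
    # Flag the event automatically when it exceeds factor x baseline.
    return {
        "service": service,
        "ms": ms,
        "baseline": baseline,
        "alert": ms > factor * baseline,
    }

# Real-time stream mixing structured and unstructured records.
stream = [{"service": "checkout", "ms": 310}, "search took 41ms"]
alerts = [check(event) for event in stream]
```

The point of the sketch is only that the historical data supplies the baseline while the real-time data supplies the event, and no person has to run a report for the comparison to happen.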
The question that interests me now is this: How can we craft a meaningful architecture for a data lake? What will be included from the world of business intelligence and what will be tossed off the operating table? How will the capabilities that I assert should be part of a data lake support each other? How can we make the idea of a data lake more than just a list of new ideas and new technology?
Right now, the data lake is still a somewhat fuzzy vision, but it is a vision that must be pursued. Current data warehouses are not up to the task of handling the volumes of machine data, aka Big Data, in ways that allow businesses to be responsive.