As in any fast-moving industry, the terms used to describe the technology sometimes change as fast as the technology itself. While recruiting for leaders in the Big Data industry, I have tried my best to keep up with the latest terminology.
One of the latest buzzwords, which I heard recently from a Senior Data Scientist, is the concept of “Data Oceans.” It is hard to think of a more evocative way to describe the complexity and diversity of the Big Data space at the moment, so I thought I’d explore what exactly a Data Ocean might be.
I am sure that you are all familiar with the basics of Big Data from my previous articles, so I’ll skip straight to what a “Data Lake” is. Yes, not a Data Ocean, a Data Lake.
A Data Lake is a single place to keep all the data from one part of the business in its original, raw and unmodelled format. The term is not reserved for a collection of files held in Hadoop; it simply means a large pool of data that is specific to a certain part of the business. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data is then analyzed in more detail to help answer the question. It might be called a lake because it contains a limited number of “species” and is constrained in size. You can still go fishing in a Data Lake, but you can be certain that you won’t be catching any sharks.
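For readers who like to see an idea in code, here is a minimal sketch of that "query first, analyze second" pattern. Everything in it is an assumption made for illustration: the sample sales records, the `query_lake` helper and the field names are hypothetical, and a real lake would sit on a storage platform rather than in a Python list.

```python
import json
from statistics import mean

# Hypothetical raw records, kept exactly as they arrived (one JSON document per line).
RAW_LAKE = [
    '{"region": "north", "product": "widget", "amount": 120.0}',
    '{"region": "south", "product": "widget", "amount": 80.0}',
    '{"region": "north", "product": "gadget", "amount": 200.0}',
    '{"region": "north", "product": "widget", "amount": 100.0}',
]


def query_lake(raw_lines, **criteria):
    """Parse each raw line when it is read and yield records matching every criterion."""
    for line in raw_lines:
        record = json.loads(line)
        if all(record.get(key) == value for key, value in criteria.items()):
            yield record


# Step 1: fish out only the data relevant to the business question.
subset = list(query_lake(RAW_LAKE, region="north", product="widget"))

# Step 2: analyze the smaller set in more detail.
average_amount = mean(record["amount"] for record in subset)
print(len(subset), average_amount)
```

The point of the sketch is that the raw data is never reshaped in advance. Structure is applied only when a question is asked, and only to the records that matter for that question.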
Data Oceans are very different.
To paraphrase my contact, a Data Ocean is a collection of unmodelled data from the entire business, from every possible area, kept in a single repository, normally built on Hadoop. These oceans are vast, but as analytics technology improves, it is becoming ever easier to “fish” for whatever data you need. It is all there, ready for you, at your fingertips.
When a Data Ocean strategy is deployed, the connections between disparate sets of data can be analyzed easily, because everything sits in one repository. The data will still be there in five years’ time, the formats will be consistent, and it will become easier to fish “back in time” for data that you did not realize you would need.
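As a rough illustration of connecting disparate data sets and fishing "back in time", here is a small sketch. The two data sets (sales and support tickets), the shared `customer_id` field, the dates and the `join_on_customer` helper are all hypothetical, chosen only to show the idea; they do not come from any real system.

```python
from collections import defaultdict
from datetime import date

# Two hypothetical data sets from different areas of the business.
SALES = [
    {"customer_id": 1, "day": date(2015, 3, 1), "amount": 250.0},
    {"customer_id": 2, "day": date(2019, 6, 9), "amount": 90.0},
    {"customer_id": 1, "day": date(2019, 7, 2), "amount": 40.0},
]
SUPPORT_TICKETS = [
    {"customer_id": 1, "day": date(2015, 3, 4), "topic": "delivery"},
    {"customer_id": 2, "day": date(2019, 6, 10), "topic": "refund"},
]


def join_on_customer(sales, tickets, since):
    """Attach ticket topics to each sale made on or after `since`, matched by customer."""
    topics_by_customer = defaultdict(list)
    for ticket in tickets:
        if ticket["day"] >= since:
            topics_by_customer[ticket["customer_id"]].append(ticket["topic"])
    return [
        {**sale, "topics": topics_by_customer.get(sale["customer_id"], [])}
        for sale in sales
        if sale["day"] >= since
    ]


# A recent view, and a view that reaches further back in time.
recent = join_on_customer(SALES, SUPPORT_TICKETS, since=date(2019, 1, 1))
everything = join_on_customer(SALES, SUPPORT_TICKETS, since=date(2010, 1, 1))
print(len(recent), len(everything))
```

Because both data sets live side by side and share an identifier, widening the time window is just a change to one argument. That only works if the older data was kept in the first place.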
Just as the world’s oceans have hidden depths, so Data Oceans will have entire regions that are never touched by the analytics tools. Some may say that storing data nobody looks at is an unnecessary expense, but the benefit of having it to hand when an unforeseen question arises outweighs the cost.
Big Data is at its most powerful when there is a large number of variables to compare, and you can’t get much bigger than a Data Ocean. Is it time to create one in your organization? If so, we can help you find the people to do it!