Is Big Data the biggest information revolution since the invention of the printing press? The writers of The Big Data Revolution like to think so.
The amount of available data has grown exponentially, partly because of data colossi like Google and Amazon, but also because of small, innovative companies that focus on collecting and analyzing Big Data. But will data be the main resource of our future economy, and change the way we work and live forever? There is a lot to be said against this proposition, but once the hype has passed its peak, Big Data will probably remain a factor we can’t ignore.
What is Big Data?
According to the book, the amount of data has become so enormous that we can speak of a different kind of data. Data is only Big Data when it covers all possible cases, when we can say of the data collection: “N = everything”.
A good example is the FareCast application created by the American Etzioni, which is now integrated into Bing's search services. FareCast let users predict whether the price of a given flight ticket would rise or fall in the near future. Based on this information, the user could decide whether to buy the ticket immediately or wait a little longer. Etzioni had access to the main reservation databases of the American aviation industry, and thus to the prices of all seats on all flights in civil aviation over the preceding years.
Three of the most important features of Big Data are observable in this example (as also discussed in this talk on the Next Web):
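- More (the sheer amount of available data)
- Messy (the data is unstructured and incomplete)
- Correlations (predictions rest on correlations rather than causal relationships)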
First, the amount of data on ticket prices was unmatched. Second, the data was messy: not every database had the same structure, so no one-to-one comparison was possible, and the data was not complete. Third, FareCast based its predictions not on causal relationships but on correlations. This is perhaps the most distinguishing aspect of Big Data, because it challenges our way of thinking. Imagine that the data analysis shows that the color of the plane is related to the ticket price. The data does not show a causal relationship, only a correlation. Demonstrating a causal relationship is very complicated, and also unnecessary as long as the correlation works and yields reliable forecasts.
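To make the idea of "correlation without causation" concrete, here is a minimal Python sketch. It is not how FareCast worked internally (the book does not describe that in detail); the numbers are invented for illustration, and the helper name `pearson` is my own. It only measures how strongly two columns move together, without saying anything about why.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel out in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented example data, NOT real ticket prices.
days_before_departure = [60, 45, 30, 21, 14, 7, 3]
price = [210, 215, 230, 260, 300, 340, 390]

r = pearson(days_before_departure, price)
print(f"correlation between days left and price: {r:.2f}")
```

A value close to -1 says that in this toy dataset prices tend to rise as departure approaches. It says nothing about the cause, which is exactly the point the book makes: the forecast can be useful without the explanation.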
Big Data is best understood when opposed to “Small Data”. Small Data involves a dataset that is collected through taking samples. It is by definition a smaller representation of the whole. Because the dataset is small, it is of utmost importance that the data is valid. Messiness on this small scale will directly distort the image projected by the dataset. In practice, Small Data is driven by causal relationships instead of correlations, because it is used to prove or undermine a given hypothesis. Big Data works in the opposite direction: an existing set of data is used to discover any potential correlations. Because the dataset is so big, it is possible to discover relationships that could not be observed by taking samples.
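The difference between a sample and "N = everything" can be sketched in a few lines of Python. The population size, the number of rare cases, and the sample size below are all assumptions chosen for illustration; they do not come from the book.

```python
import random

# Fixed seed so the sketch gives the same result on every run.
rng = random.Random(42)

POPULATION_SIZE = 100_000  # assumed size of the full dataset
RARE_CASES = 50            # assumed number of records showing a rare pattern
SAMPLE_SIZE = 500          # assumed size of a classic survey-style sample

# True marks a record that shows the rare pattern, False one that does not.
population = [True] * RARE_CASES + [False] * (POPULATION_SIZE - RARE_CASES)
rng.shuffle(population)

# Draw a random sample without replacement, as "Small Data" research would.
sample = rng.sample(population, SAMPLE_SIZE)

rare_in_population = sum(population)
rare_in_sample = sum(sample)

print(f"rare cases in the full dataset: {rare_in_population}")
print(f"rare cases in the sample:       {rare_in_sample}")
```

With these numbers, a sample of 500 is expected to contain only a fraction of one rare case on average, so the pattern is easy to miss entirely. The full dataset always contains all 50.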
But where is this enormous amount of data coming from? From datafication. Datafication is perhaps the most far-reaching aspect of Big Data, because these datasets need to be collected somehow. This is done by turning more and more aspects of our daily lives into data. Google of course collects data on our searching and surfing behaviour, mobile devices can track our geo-location, and there are even experiments with floors fitted with pressure sensors that can datafy our movements.
Another aspect of “Big Data Thinking” is making creative use of existing data that was collected for some other purpose. Etzioni's application is an example of this. But Google is the champion of data recycling: the geo-data used for Google Maps is now being used to develop the Google Driverless Car.
Big Data will no doubt grow even bigger and more important. But will it trigger a true revolution and “change the way we work and live”, as the book says? There will certainly be changes; some can already be observed today. The transformation of Amazon is characteristic in this respect. Previously the company relied on reviews by professional book critics, but now Amazon uses recommendations and related items to sell its products. This change in strategy is also crucial for marketers to recognize.
But the book makes even more far-reaching claims about the influence of Big Data in the world, and these are less credible than Amazon-like transformations. The authors paint a future in which Big Data is the single most important driving force in the economy. New occupations will arise: data broker and algorithmist. The data broker makes a living by mediating in data transfers, and the algorithmist will be a kind of judge who makes sure that data is used in an impartial manner, so that data misuse is prevented. This is a kind of future-gazing in which the role of Big Data is overrated.
Moreover, the concept of Big Data itself has some weak points. Big Data is supposed to have three major aspects: more, messy, and correlations. Take correlations first. According to the book, correlations will become more important, and when a correlation is demonstrable, the need to prove any causal relationship will diminish. But man will never be able to ignore his drive to understand the world. Mankind will not be satisfied with living in a world of only correlations.
Second, messy is not a defining aspect either. As technology advances, the ability to gather data will improve, and messiness will decrease. Moreover, this is a desirable goal. So pursuing Big Data does not necessarily mean pursuing messiness.
The final aspect: more. The doctrine of “N = everything”. This aspect is distinguishing, because it is true that having access to all data opens up possibilities that access to samples alone does not. But what if you want to research a subject that has not yet been datafied? A subject having to do with psychology, or the thoughts and feelings of people? Sometimes samples and questionnaires are the only tools we have.
But Big Data will grow bigger, and the applications in this book are inspiring. Just don’t make it bigger than it is.
(This has been a translation of the Dutch article published on MarketingFacts.nl)