Although almost no one can tell you when data is "big" or not, we all want to do "something" with big data.

But collecting terabytes of data doesn’t guarantee that we will put that data to good use. Three recent trends are beginning to change the status quo.

Methods for analysing big data have improved, so we are better able to focus on the important data and ultimately make a shift from analytics to actual actions.

Big Data is actually any dataset that is too large to be analysed in memory. And that’s a point you reach rather quickly with the data you collect online. Even with only a little traffic, your logfiles grow into monstrous collections of potentially useful information.

Because some large organisations (Google, Facebook, and other ass-kicking companies) have used this type of data successfully, we now all want big data. However, collecting a huge number of logfiles is not sufficient by itself.

A very uninformative visualisation of "Big Data"

Be very critical when you collect data

First, you need to decide deliberately what data you collect. Does your logfile contain IP addresses, timestamps and URLs? What can you do with that information later?

If you do not know what a specific URL contains, how do you know why this URL interests your visitors?
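To make that question concrete, here is a minimal sketch of what such a bare logfile gives you once parsed. The line format (`<ip> <ISO timestamp> <url>`, separated by spaces) and the names `LogEntry` and `parse_line` are assumptions for illustration, not a format any particular server produces.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    """One parsed log line (hypothetical three-field format)."""
    ip: str
    timestamp: datetime
    url: str


def parse_line(line: str) -> LogEntry:
    """Parse a line of the assumed form '<ip> <ISO timestamp> <url>'."""
    ip, raw_time, url = line.split()
    return LogEntry(ip, datetime.fromisoformat(raw_time), url)


entry = parse_line("192.0.2.10 2013-01-02T14:05:00 /products/shoes")
print(entry.ip, entry.timestamp.hour, entry.url)
```

Note what is missing: nothing in the parsed entry says what the page contained or why the visitor wanted it.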

Fortunately we are seeing a shift from gathering - often useless - log data to data collection that is actually informative for predicting the behavior of your users.

More and more companies understand the psychological processes that drive consumers' decisions. We try to collect data that has significant predictive power when it comes to consumer behaviour.

The recent focus on persuasion, on which arguments influence your customers, is a good example. If your log files contain information about the type of influence that is used on specific URLs, and whether this specific influence attempt is successful or not, then you are actually able to meaningfully alter future interactions.

By knowing what drives the behaviour of your consumers and explicitly collecting this behavioural information, you are entering the realm of highly effective big data.
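As a rough sketch of what such an enriched log could look like, assume each record stores the URL, the persuasion strategy shown there, and whether the attempt succeeded. The strategy labels, the sample records and the function `success_rates` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical enriched log records: (url, strategy shown, attempt succeeded?)
records = [
    ("/products/shoes", "scarcity", True),
    ("/products/shoes", "social_proof", False),
    ("/products/bags", "scarcity", True),
    ("/products/bags", "social_proof", True),
    ("/products/hats", "scarcity", False),
]


def success_rates(rows):
    """Return the fraction of successful attempts per persuasion strategy."""
    shown = defaultdict(int)
    succeeded = defaultdict(int)
    for _url, strategy, success in rows:
        shown[strategy] += 1
        if success:
            succeeded[strategy] += 1
    return {strategy: succeeded[strategy] / shown[strategy] for strategy in shown}


rates = success_rates(records)
print(rates)
```

Because the outcome is logged next to the strategy, the result tells you which kind of argument to try next, which a bare list of URLs cannot do.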

From actionable data to automatic decisions

Even big behavioural data, inspired by social science knowledge about your customers, is not enough. The question remains: what will you end up doing with this information?

Many of today's big data solutions won’t do much more than count specific lines in your log files. After counting, they probably make a graph of the outcome.

Thus, a pretty picture appears that indicates that visitors with IP addresses from the Netherlands visit Dutch URLs more frequently than visitors from England. Or you can clearly see that between 02.00 ECT and 06.00 ECT you have fewer visitors from Europe.
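Stripped of the chart, that kind of analysis boils down to something like the sketch below. The rows, the country codes and the `/nl/` URL prefix for Dutch pages are assumptions for illustration.

```python
from collections import Counter

# Hypothetical pre-parsed log rows: (visitor country, URL requested)
rows = [
    ("NL", "/nl/schoenen"),
    ("NL", "/nl/tassen"),
    ("GB", "/en/shoes"),
    ("NL", "/en/shoes"),
    ("GB", "/nl/tassen"),
]


def dutch_url_visits(log_rows):
    """Count, per country, the requests that went to URLs under /nl/."""
    counts = Counter()
    for country, url in log_rows:
        if url.startswith("/nl/"):
            counts[country] += 1
    return counts


counts = dutch_url_visits(rows)
print(counts)
```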

Network plots: very pretty indeed, but what should a marketer do with this?

As far as I am concerned, this is only the beginning. Graphics are fun, and perhaps informative for online marketers who want to adjust running experiments. But too often companies purchase big data applications that merely create beautiful charts.

After a month or two of looking at these charts, the same companies find out that it really is not that clear what should be done with the information.

Fortunately, a new trend is emerging: big data is more frequently being used to make decisions automatically rather than to create charts. Search engines use counts of common results to offer suggestions directly, making it easier for users to find what they seek.

Recommendation engines, PersuasionAPI, and other decisioning technologies decide in real time how best to compose your current page. By specifying clear success criteria, you can automatically optimise your page for each individual customer, without the intervention of graphs or their interpretation by marketers.
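To give a feel for what "deciding automatically" can mean, here is a minimal sketch of one well-known approach, an epsilon-greedy policy: mostly show the variant with the best observed success rate, and occasionally try another. This is not how PersuasionAPI or any specific product works; the class `EpsilonGreedy`, its parameters and the variant names are assumptions for illustration.

```python
import random


class EpsilonGreedy:
    """Pick page variants by observed success rate, exploring with probability epsilon."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shown = {variant: 0 for variant in variants}
        self.successes = {variant: 0 for variant in variants}

    def rate(self, variant):
        """Observed success rate; 0.0 for a variant that was never shown."""
        n = self.shown[variant]
        return self.successes[variant] / n if n else 0.0

    def choose(self):
        """Explore a random variant with probability epsilon, else exploit the best."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shown))
        return max(self.shown, key=self.rate)

    def record(self, variant, success):
        """Store the outcome of showing a variant (the success criterion)."""
        self.shown[variant] += 1
        if success:
            self.successes[variant] += 1


# With epsilon=0.0 the policy never explores, so the choice is deterministic.
policy = EpsilonGreedy(["scarcity", "social_proof"], epsilon=0.0)
policy.record("scarcity", False)
policy.record("social_proof", True)
print(policy.choose())  # social_proof
```

In this sketch the success criterion is simply the boolean passed to `record`. Optimising per customer would mean keeping one such policy per visitor or per segment, which is a design choice rather than something the sketch prescribes.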

A new order of data analysis

Finally, there is one last change upon us that will accelerate the meaningful use of big data: the possibilities to analyse the data, and ultimately to respond, are becoming more extensive.

A few years ago, running big data on standard database solutions caused many headaches. Hadoop, Hive, and MapReduce are opening up possibilities for the analysis of big data. These new technologies make it possible to store a lot of data and analyze it efficiently.
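For readers who have not met MapReduce, the idea can be sketched in a few lines of plain Python: a map step emits key-value pairs, the pairs are grouped by key, and a reduce step summarises each group. This is only an in-memory illustration of the pattern, not the Hadoop API; the function names and the log line format are assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (url, 1) for a log line of the assumed form '<ip> <timestamp> <url>'."""
    _ip, _timestamp, url = line.split()
    return [(url, 1)]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Summarise one group: here, the total number of visits per URL."""
    return key, sum(values)


lines = [
    "192.0.2.10 2013-01-02T14:05:00 /products/shoes",
    "192.0.2.11 2013-01-02T14:06:00 /products/bags",
    "192.0.2.10 2013-01-02T14:07:00 /products/shoes",
]
mapped = chain.from_iterable(map_phase(line) for line in lines)
totals = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(totals)
```

Because each map call looks at one line only and each reduce call at one key only, a real framework can spread both steps over many machines.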

But Hadoop and Hive have their limitations: the analysis of the dataset is never real-time. This is why a Hadoop / Hive setup is good for graphics, but less effective for real-time decisions. The latter is especially difficult when you want to customize content between page views (within a session) based on the previous behavior of your visitors.

But developments don’t stop there either. Streaming analytics capabilities are constantly expanding, and with solutions such as Storm almost anyone can use them. Technology combined with statistics brings us one step closer to the effective use of big behavioral data.

More and more companies follow the above trends. I think that in the coming years we will see new solutions that combine technology with knowledge about decision making and behavior to ultimately optimize your online activities.
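The difference with batch analysis is that a streaming computation updates its answer with every incoming event instead of rereading the whole dataset. A minimal sketch, assuming events arrive in time order with timestamps in seconds; the class `SlidingWindowCounter` is hypothetical and has nothing to do with Storm's actual API.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen in the last `window_seconds`, updated per event."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps, oldest first

    def add(self, timestamp):
        """Register one event; assumes timestamps never decrease."""
        self.events.append(timestamp)
        self._expire(timestamp)

    def _expire(self, now):
        """Drop events that have fallen out of the window."""
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()

    def count(self, now):
        """Number of events within the window ending at `now`."""
        self._expire(now)
        return len(self.events)


counter = SlidingWindowCounter(window_seconds=60)
for t in (0, 10, 50, 70):
    counter.add(t)
print(counter.count(now=70))  # 2
```

The answer is available immediately after each event, which is what makes this style usable between two page views of the same session.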

Maurits Kaptein

Published 2 January, 2013 by Maurits Kaptein

Maurits Kaptein is Chief Science Officer at Science Rockstars and a contributor to Econsultancy. 


