And even with that connection, Big Data can be a dangerous tool. It’s easy to generate meaningless correlations if you don’t think systemically about your problems. It’s easy to generate spurious results if you don’t pay constant attention to data quality. You can waste a lot of money if no-one acts on your findings. Nope, Big Data really won’t save the world.

But we knew this all along. I’ve seen very little written about Big Data that didn’t stress the need to manage data quality, to build integrated teams, to understand the limitations of the algorithms (that’s what “data science” is all about), to frame good questions.  

If you’re not prepared to do all this, avoid Big Data.

So where’s the backlash coming from? Partly, it’s built into the tech industry’s DNA. We put technologies through a regular cycle of hype, then backlash, before we eventually settle upon reality. Big Data has now entered the backlash phase.

And partly, I suspect we’ve now had time to see some disappointing projects. There are many ways to fail with Big Data.  

For example:

  • Focus on Big. People get focused on processing as much data as possible, rather than on using the data which best fits their questions. Yes, Big Data gives you tools to extract signal from noisy data, but that doesn’t mean you can create (valid) signals from irrelevant data.
  • Hammer think. As the saying goes, “If all you have is a hammer, then everything looks like a nail”. Making the hammer bigger doesn’t make it more widely applicable. Many people seem to be trying to apply Big Data indiscriminately.
  • Tech-driven fantasies. Hadoop is a great tool. It makes sense for most IT organisations to get some understanding of it, even run a few prototypes. But if Big Data is a tech-driven project, it will fail.  

    You need a balanced set of business, data and technical skills. Few IT departments can assemble that mix.

  • Random data walks. This is the data scientists’ version of the above. Data is fascinating – given enough of it, you can find endless correlations, generate innumerable hypotheses, go down any number of blind alleys. You need to bound this exploration with some sort of business rationale.
  • Short-termism. It’s often easy to generate a few quick wins. Generating sustained results requires much more attention to organisational change – embedding analysis into day-to-day decisions, changing processes, building skills. This is all hard stuff. It’s easy to get bogged down after a few early successes.
  • Failure to act. When Big Data initiatives get isolated from regular business operations, they produce insight that no-one is committed to act on. No matter how good your insight is, without action it’s valueless.
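The “random data walks” trap is easy to demonstrate. Here is a minimal sketch in Python, offered as an illustration rather than a recipe: the function names, the 50 rows, the 1,000 columns and the 0.3 cut-off are all assumptions chosen for the demo. It builds a target series and a pile of candidate columns out of pure random noise, then counts how many columns look “correlated” with the target anyway.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_spurious(n_rows=50, n_columns=1000, threshold=0.3, seed=0):
    """Count noise columns whose |correlation| with a noise target
    reaches the threshold. All defaults are illustrative assumptions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_columns):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(target, column)) >= threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious()
    print(f"{hits} of 1000 pure-noise columns look correlated with the target")
```

Every hit here is noise by construction, yet with only 50 rows you should expect a few percent of the columns to clear the bar by chance. Add more columns and you get more “findings”. That is why the exploration needs a business rationale to bound it, not just more data.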

And what can you do to avoid such failures?  

Here are some things I’d keep in mind when kicking off any sort of Big Data initiative:

  1. Data informs judgement; it doesn’t replace it. Your goal is to make good decisions. Data helps you do that. But data alone can’t make the decisions: you need to exercise judgement somewhere.
  2. Ignore the hype. The hype drags us into tools and technologies. It pulls us towards new algorithms and “interesting” data sets. You need to cut through this and focus only on the data, tools and techniques that apply to your specific problems.
  3. Focus on value, not size. You need the right data to help make your decisions. If that data is noisy, then Big Data techniques may help you find signal. But that’s a secondary issue – the starting place is solving high value problems, not using lots of data.
  4. Integrate. Big Data initiatives have to deliver capabilities that can be integrated into daily business operations. They need integrated business, data and technical teams. Standalone Big Data is a waste of time.
  5. Ongoing experimentation, not repeatable processes. You’ll often need to iterate to find the best answer to any specific question – generate a hypothesis, choose data and tools to test it, gather feedback, and refine. Trying to define the “perfect” process for a decision too early cuts you off from that feedback and refinement.
  6. Manage data quality. That’s a really boring, old-world thing to say. But it’s no less true for that. Big Data gives you some tools to accommodate noise in your data, but you’ll always get better results if you start with decent data in the first place.

The bottom line is clear: without data, we tend to make poor decisions. If you doubt that, read Thinking, Fast and Slow by Daniel Kahneman. We’re all subject to a range of biases and shortcomings that naturally limit our ability to make good decisions. Paying careful attention to data can help us overcome these shortcomings.

Big Data isn’t a panacea. It simply adds another set of tools to our bag. Used inappropriately, those tools will dazzle us with inconsequential results, overwhelm us with trivial findings and drain resources from more useful work.

But, used well, Big Data can help us make corporate decisions based on reality rather than fantasy. In many organisations, that’d be a big change.