Big Data is about more than Hadoop and a bunch of fancy technology: there are some very real organisational barriers too.
It's a bit of a mirage. As soon as you get your head around it, it ceases to exist.
How so? The accepted definition for Big Data talks about exploiting “data sets whose size is beyond the ability of commonly used tools to process it within tolerable time”. By that definition, as soon as you’re comfortably handling the data, it ceases to be big.
Nonetheless, Big Data is clearly trending amongst the tech analysts, and it’s doing so for good reasons. The volume of data we’re handling is growing dramatically: social media, the internet of things, and the mass of data produced by smart electric grids, intelligent traffic systems and the like.
By one widely quoted estimate, 90% of the data ever created has been created in the last two years...
And yes, it’s not just about size. Gartner’s “3Vs” (Volume, Velocity, Variety) are all growing. We’re being asked to process data ever more quickly so we can respond to events as they happen, and that data is coming from an ever wider array of channels, sensors and formats.
Our data is fast and complex as well as big.
So let’s all go out and buy Hadoop, and our problems will be solved. Hurrah!
Not so fast. I can see at least six things that are going to get in the way of Big Data in the typical organisation:
Big Data consumes a lot of technical infrastructure: storage, bandwidth, CPU and so on. And it generates highly variable workloads as it does so.
You need lots of infrastructure at some times, very little at others. Fortunately, the Cloud is made for this. The challenge isn’t so much technical as one of finding a reliable cloud vendor and getting the economic model right.
Just don’t underestimate how challenging that can be in the current, rather opaque market for cloud services.
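To make the "economic model" point concrete, here is a rough sketch comparing the cost of provisioning for peak load all day against paying only for the nodes you use each hour. Everything in it is an assumption made up for illustration: the demand profile, the prices (in cents per node-hour) and the function names. Real cloud pricing is far less tidy than this.

```python
def fixed_cost(hourly_demand, price_per_node_hour):
    """Cost of owning enough nodes to cover peak demand, every hour."""
    peak_nodes = max(hourly_demand)
    return peak_nodes * len(hourly_demand) * price_per_node_hour


def elastic_cost(hourly_demand, price_per_node_hour):
    """Cost of paying only for the nodes actually used in each hour."""
    return sum(hourly_demand) * price_per_node_hour


def utilisation(hourly_demand):
    """Average demand as a fraction of peak demand."""
    return sum(hourly_demand) / (max(hourly_demand) * len(hourly_demand))


# Hypothetical day: 2 nodes ticking over for 21 hours,
# then a 40-node batch job for 3 hours.
demand = [2] * 21 + [40] * 3

# Assumed prices in cents per node-hour. The elastic price carries
# a deliberate premium over the fixed one.
FIXED_PRICE = 10
ELASTIC_PRICE = 25

fixed = fixed_cost(demand, FIXED_PRICE)
elastic = elastic_cost(demand, ELASTIC_PRICE)

print(f"fixed:   {fixed} cents/day")
print(f"elastic: {elastic} cents/day")
print(f"utilisation: {utilisation(demand):.1%}")
```

In this toy model, elastic capacity is cheaper whenever utilisation is lower than the ratio of the fixed price to the elastic price (here 10/25, or 40%). The spikier your workload, the bigger the premium you can afford to pay for elasticity. The hard part in practice is finding out what the real prices are.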
The application stack behind Big Data is complex. Some of it is immature. The Cloudera Hadoop distribution, for example, contains a dozen applications, and some of these are still pretty new.
This creates several challenges: you need to climb several learning curves at once, integrate many tools with your existing application stack, and build a stable operating environment out of these disparate pieces.
You need a deep stack of skills to do Big Data. As well as business specialists (to ask the right questions) and technologists (to tame the infrastructure and applications), you need “data scientists”.
These are the people who understand the statistical algorithms, can drive the visualisation tools, etc. They’re not easy to find. And once you’ve found them, you need to integrate them with the rest of your team, build appropriate reward and reporting structures, and so on.
Big Data projects operate on a different cycle to traditional ones. It’s not so much “plan then do” as “experiment, learn and evolve”. It requires a mindset that’s attuned to research as much as delivery, yet which is able to temper research with business objectives.
Good Big Data teams will be very tolerant of “failure”. (If 50% of your experiments don’t fail, then you’re probably not testing the boundaries.) And they’ll allocate plenty of capacity to exploring the horizon and trying new stuff.
Most organisational data is highly fragmented. The web team has a bunch of logs. Sales owns some of the customer data. Operations owns some more.
This creates challenges at several levels: syntactic (defining common formats), semantic (agreeing definitions) and political (negotiating ownership and responsibilities).
It also creates data quality problems as no-one’s responsible for the complete picture, so no-one ensures that data is correct, consistent and up to date.
Big Data needs to face all these challenges head on. (As data warehousing did before it. But Big Data has the added complications of semi-structured data and rapidly changing data definitions.)
You can only do this effectively if you can ascribe clear value to the outcomes. Otherwise you have no way to prioritise activity across your portfolio of experiments and investments.
Yet few organisations are able to put clear valuations on their current data, let alone on the fuzzy web that Big Data exposes.
Of those six challenges, the first two, infrastructure and applications, are fairly straightforward. The tools we need are (largely) there. We just need to learn how to use them and to fine-tune their economics.
It’s in the next two that the challenge lies: building multi-skilled teams with the right attitude. Right now, many Big Data projects are merely playing with the data, exploring the tools and shifting data around within its silos.
If we could build some stable, cross-functional teams and focus them on business-led experimentation, then we’d probably begin to find real value in the data we have stashed away. And along the way, we’d start to break down some of the silos that have grown around our data.
As ever, the real challenge isn’t the technology. It’s shifting our organisations to address the opportunities that Big Data creates.