First off, what is it?
Well, don’t let anyone tell you it’s all down to sample size, or about measuring everything. It’s about combining datasets (sometimes ‘dirty’ ones), contrasting them in different ways, and doing it as quickly as possible.
Sometimes this necessitates great computing power, but not always. You can read more about technologies such as Hadoop and Greenplum in this nice little article.
Datasets are multiplying as we measure lots more than we used to. This means our thinking has to broaden – no longer is ‘what can we do with our database of email addresses?’ the question, rather ‘what data can we look at to give us the best idea possible of a customer’s stage in the buying cycle and what they’ll be receptive to next?’
The definition of big data isn’t really important and one can get hung up on it. Much better to look at ‘new’ uses of data.
So, here are some examples of new and possibly ‘big’ data use, both online and off.
This article from the Wall Street Journal details Netflix’s well known Hadoop data processing platform.
Cloud architecture is highly scalable and allows Netflix to quickly provision computing resources as it sees the need. Traffic patterns are analysed across device types and localities to help improve the reliability of video streaming and plan for growth.
The technology also powers Netflix’s recommendation engine, which draws on a customer’s viewing habits and stated preferences.
Sticking with Netflix, this piece in the Washington Post theorises that Netflix could vary its price if it had enough information on each user to know how much they might pay.
To a certain degree this already happens in online retail, with airlines targeting previous browsers and some stores (such as Staples) changing prices depending on which physical store the customer is nearest.
The Wall Street Journal has also documented that Orbitz, the travel website, has in the past shown Mac visitors higher prices than those on Windows. Taking variables such as IP address, device, age and past visits, throwing them into a database and calculating a charging threshold can conceivably be termed big data.
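To make the idea concrete, here is a toy sketch (not Orbitz’s actual model, and with entirely invented weights) of how visitor attributes might feed a charging threshold: each signal nudges a score, and the score selects a price multiplier.

```python
# Hypothetical illustration only: signals and weights are invented,
# not taken from any real pricing system.

def price_tier(visitor):
    """Return a price multiplier from simple, made-up signals."""
    score = 0.0
    if visitor.get("os") == "mac":
        score += 1.0          # e.g. Mac users assumed to spend more
    if visitor.get("past_visits", 0) > 3:
        score += 0.5          # repeat browsers may be closer to buying
    if visitor.get("device") == "mobile":
        score += 0.25
    # Map the score onto a small set of price multipliers.
    if score >= 1.5:
        return 1.10           # show options ~10% dearer
    if score >= 0.75:
        return 1.05
    return 1.00

print(price_tier({"os": "mac", "past_visits": 5}))   # 1.1
print(price_tier({}))                                # 1.0
```

In a real system the weights would be fitted from historical conversion data rather than hand-picked, but the shape – many weak per-visitor variables combined into one threshold – is the same.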
Out-of-home advertising
I’ve previously covered Route, who have combined lots of data on footfall and traffic, including the tracked day-to-day movements of 28,000 people. It’s hoped that the accuracy of predicting eyes on billboards will increase, leading to fairer pricing.
I can’t recommend often enough that you read this piece from the NY Times on how Target uses a wealth of customer data to predict future purchasing habits. Specifically, pregnancy kicks off a chain of purchases that are fairly distinctive – Target’s data collection is spookily prescient, sending one teen customer nappy vouchers before her own father knew she was pregnant.
Politics has traditionally seen data siloed, with canvassing done on little more than a list of postcodes. Obama’s election campaigns began to change this. Check out this article from Slate on Project Narwhal.
Narwhal would bring new efficiency across the campaign’s operations. No longer will canvassers be dispatched to knock on the doors of people who have already volunteered to support Obama. And if a donor has given the maximum $2,500 in permitted contributions, emails will stop hitting him up for money and start asking him to volunteer instead.
Those familiar with Narwhal’s development say the completion of such a technical infrastructure would also be a gift to future Democratic candidates who have struggled to organize political data that has been often arbitrarily siloed depending on which software vendor had primacy at a given moment.
WeatherSignal works by repurposing the sensors in Android devices to map atmospheric readings. Handsets such as the Samsung Galaxy S4 contain a barometer, hygrometer (humidity), ambient thermometer and light meter.
Obviously, the prospect of millions of personal weather stations feeding into one machine that will average out readings is exciting, and one that has the potential to improve forecasting.
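The core of that ‘one machine averaging out readings’ idea is simple: bucket handsets into coarse grid cells and average each cell. A minimal sketch, assuming readings arrive as (latitude, longitude, pressure-in-hPa) tuples:

```python
# Toy version of crowd-sourced sensor aggregation: group phone
# readings by a coarse lat/lon grid cell and average each cell.
from collections import defaultdict

def grid_average(readings, cell_deg=1.0):
    """Average pressure readings within cells of `cell_deg` degrees."""
    cells = defaultdict(list)
    for lat, lon, pressure in readings:
        key = (round(lat / cell_deg), round(lon / cell_deg))
        cells[key].append(pressure)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}

readings = [
    (51.5, -0.1, 1012.0),   # two phones in roughly the same cell
    (51.6, -0.2, 1014.0),
    (48.9,  2.3, 1009.0),   # one phone near Paris
]
print(grid_average(readings))
```

A production system would also need to discount bad sensors and indoor readings, which is where the scale of millions of handsets – letting outliers wash out in the average – really matters.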
You can read about OpenSignal’s work in this article in Scientific American.
This piece from Cloud Times details how IBM are predicting heart disease with big data. Analysis of electronic health record data could reveal symptoms at earlier stages than previously possible.
IBM uses the Apache Unstructured Information Management Architecture (UIMA) to extract the known signs and symptoms of heart failure from available text.
With no single strong indicator, analysis relies on weak signals or ‘co-morbidities’: hypertension, diabetes, associated medications, ECG and genomic data, and so on. Drawing probabilities out of disparate databases of differing sizes is a task for big data analytics.
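The shape of that ‘many weak signals into one probability’ step looks like a logistic model. Here is a hand-rolled sketch (not IBM’s UIMA pipeline, and with invented weights) that sums per-signal log-odds contributions and maps the total to a probability:

```python
# Illustrative only: weights and baseline are invented, not clinical.
import math

# Hypothetical per-signal log-odds contributions.
WEIGHTS = {
    "hypertension": 0.9,
    "diabetes": 0.7,
    "on_diuretics": 0.5,
    "abnormal_ecg": 1.1,
}
BASELINE = -3.0  # log-odds of the condition with no signals present

def risk(signals):
    """Combine weak signals into one probability via summed log-odds."""
    z = BASELINE + sum(WEIGHTS[s] for s in signals if s in WEIGHTS)
    return 1 / (1 + math.exp(-z))  # logistic link: log-odds -> probability

print(round(risk([]), 3))                                  # baseline risk
print(round(risk(["hypertension", "diabetes", "abnormal_ecg"]), 3))
```

In practice the weights would be fitted against large volumes of health records, and the signals themselves extracted from free-text notes – which is the part UIMA handles.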
Again from IBM, this VentureBeat article looks at a model built on data from the World Health Organization. IBM looked at local climate and temperature to find correlations with how malaria spreads, and the analysis is used to predict the location of future outbreaks. The Spatio Temporal Epidemiological Modeler (STEM) is free and open source.
Justin Lessler of Johns Hopkins Bloomberg School of Public Health:
There are a lot of tacit assumptions out there about how changes in climate will impact the distribution of diseases like malaria. This work suggests that things probably are not so simple. A change that has a huge effect on malaria transmission in one place might not be as important somewhere else.
Crimson is a system that tracks variables including complications, hospital readmissions and measures of cost, and colour codes how well a doctor is performing against his or her peers.
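A simplified sketch of that colour-coding idea (not Crimson’s actual method): score each doctor against the peer mean in standard deviations, then map the score to a colour.

```python
# Toy peer comparison: z-score a doctor's metric against peers and
# colour-code it. Thresholds and data are invented for illustration.
def colour_code(doctor_value, peer_values):
    n = len(peer_values)
    mean = sum(peer_values) / n
    sd = (sum((v - mean) ** 2 for v in peer_values) / n) ** 0.5
    z = (doctor_value - mean) / sd
    if z > 1.0:
        return "red"      # well above peers (bad for e.g. readmissions)
    if z > 0.5:
        return "amber"
    return "green"

peers = [4.1, 3.8, 4.4, 4.0, 3.7]   # hypothetical avg length of stay, days
print(colour_code(5.2, peers))      # red
```

The value of the real system is less the arithmetic than the data plumbing: pulling complications, readmissions and costs for every physician into one comparable record.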
This piece in the Wall Street Journal suggests the technology has reduced average stay and average cost at the Long Beach Memorial Hospital.
In one case, a pharmacist warned a physician that the data showed he was prescribing Levaquin, an antibiotic, at a far higher rate than his peers. Given concerns about generating drug-resistant bacteria, the physician was encouraged to reduce his use of the antibiotic.
This particular medical group has used big data to make sure medication is correctly prescribed: 2012 data showed 76% of patients getting recommended shots, compared with 56% in 2010.
I’ll let you help me out with that in the comments.