Analytics approaches every marketer should know #3: Predictive analytics

In this post, we will cover the practice of predictive analytics and show that it is not necessarily about predicting the future, but rather a way to figure out what is happening right now, and how marketers can use that information to their advantage.

Before we begin, though, we’d like to let you know that Econsultancy runs an Advanced Data & Analytics Training course.

What is analytics?

We have previously defined analytics as a practice, a process, and a discipline whose purpose is to turn data into actionable insight.

With predictive analytics, however, the focus is more on the insight than the action.

Predictive analytics overview

With descriptive and diagnostic analytics, we are able to describe data and offer explanations for why certain events happened. Notably, both techniques use data about events that happened in the past. The data itself, therefore, is never in question, even if the diagnoses are controversial.

With predictive analytics, we are still relying on data from past events, but instead of using the data to describe or explain the past, predictive analytics uses data to get more data.

So why are we using existing data to get more data? Two reasons:

The new data is either too difficult to get or not yet available.
The new data will help us to make better decisions.

Note that, contrary to popular perception, the data we get from predictive analytics will not necessarily be used to predict the future. Instead, predictive analytics is mostly used to estimate what a data point would be if we were able to measure it directly.

This confusing yet crucial point is probably best explained with an example.

An example of predictive analytics

One good example of predictive analytics which is relevant for marketers is sentiment analysis (inspired by a post by Dr. Michael Wu, Lithium’s chief scientist).

Say you need to find out whether comments on social media, overall, are positive or negative about a new product line. You could, in theory, gather all of the comments, read them individually, and keep count of how many were positive and how many were negative.

Or, instead, you could run the comments through a sentiment analysis algorithm which ‘scores’ each comment according to how positive or negative it is. Then, using the average score, you would have your answer: an average greater than zero means sentiment is net positive, and less than zero means it is net negative.
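To make the score-then-average step concrete, here is a minimal Python sketch. The word list and its weights are hypothetical, and a fixed, hand-written list stands in for a trained engine; it only illustrates how per-comment scores roll up into one overall answer.

```python
import re
from statistics import mean

# Hypothetical word weights for illustration only; a real sentiment engine
# learns its rules from marked training data rather than using a fixed list.
LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "broken": -2, "hate": -2}

def score_comment(text: str) -> int:
    """Sum the weights of known words; unknown words count as zero."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def overall_sentiment(comments: list[str]) -> float:
    """Average per-comment score: above zero is net positive, below zero net negative."""
    return mean([score_comment(c) for c in comments])

comments = [
    "I love the new range, great colours",   # scores +4
    "Strap arrived broken, bad packaging",   # scores -3
    "Good value",                            # scores +1
]
print(overall_sentiment(comments))  # 0.666..., i.e. net positive
```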

But how does a sentiment analysis engine work? How does it know what is positive or negative? The algorithm can perform this task because it ‘learns’ the difference between positive and negative comments through predictive analytics.

Using sample text marked as ‘positive’ or ‘negative’, the sentiment analysis algorithm learns which word combinations are likely to be positive and which are likely to be negative. After sufficient training, the algorithm has rules to help it decide the tone of a passage.

So when a new passage, which is not marked as ‘positive’ or ‘negative’, is presented to the algorithm, it uses the rules it learned previously to indicate whether it is positive or negative. The algorithm, therefore, takes existing data (the comments) to create new, more useful data (the overall sentiment).
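The train-then-classify process described above can be sketched with a naive Bayes classifier, one common technique for this kind of text classification (not necessarily what any particular commercial engine uses). The marked sample comments are invented for illustration, and the sketch assumes both labels appear in the training data.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def train(marked):
    """Learn per-label word counts from (text, label) pairs marked 'positive' or 'negative'."""
    word_counts = {"positive": Counter(), "negative": Counter()}
    doc_counts = Counter()
    for text, label in marked:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set(word_counts["positive"]) | set(word_counts["negative"])
    return word_counts, doc_counts, vocab

def classify(model, text: str) -> str:
    """Pick the label with the higher naive Bayes log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # assumes every label was seen
        denom = sum(counts.values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical marked training comments.
training = [
    ("love it, works great", "positive"),
    ("great quality and fast delivery", "positive"),
    ("really happy with this", "positive"),
    ("broken on arrival, very disappointed", "negative"),
    ("poor quality, want a refund", "negative"),
    ("delivery was slow and the box was broken", "negative"),
]
model = train(training)

# New, unmarked comments: the learned rules supply the missing marks.
new_comments = ["great product, really happy", "arrived broken, poor packaging", "fast delivery, love it"]
predictions = [classify(model, c) for c in new_comments]
print(Counter(predictions))  # two positive, one negative
```

Counting the predicted labels is what turns individual guesses into the overall-sentiment figure the paragraph describes.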

So, with a sentiment analysis algorithm, marketers can perform predictive analytics. They can ‘predict’ what the overall sentiment would be if they read and scored all of the messages individually.

The distinguishing features of predictive analytics

Produces utility data

One of the most apparent differences between predictive and descriptive analytics is that the output of predictive analytics is data to be used, not just read. In the example above, the sentiment score for each individual comment is not particularly useful on its own; it has to be averaged and interpreted.

Requires an algorithm

Additionally, unlike with diagnostic analytics, you will probably need to write your own algorithm to make the prediction.

To understand why this is the case, have a look at some of the data sets predictive analytics is used to obtain:

Social media influencer scores.
Whether a customer is ‘in-market’ or has a particular interest.
Where a customer is in the purchase funnel.
The experience consumers ‘must have’ before they buy.
The likelihood of a customer to cancel your service.
A ‘lead score’, often used by business-to-business (B2B) marketers.

Each of these requires a significant amount of data to be effective, and if the new data is to be consistent and reliable, an algorithm is required to process the data uniformly.

Needs training data

Also, for the new data sets to be accurate, predictive analytics requires actual data for training. Training data must also be ‘marked’ with the outcome so that the algorithm can be calibrated. In the example above, all of the comments used to train the sentiment algorithm had to be ‘marked’ as positive or negative.

Note that creating an algorithm doesn’t require fancy machine learning or artificial intelligence (AI). Many companies derive their B2B lead scoring algorithm through a collaboration between marketing and sales.
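A hand-built lead score of this kind can be as simple as a points table. The attributes, point values and sales-ready threshold in this sketch are hypothetical stand-ins for whatever marketing and sales agree on; what matters is that the same rules are applied uniformly to every lead.

```python
from dataclasses import dataclass

@dataclass
class Lead:
    job_title: str
    company_size: int
    pricing_page_visits: int
    downloaded_whitepaper: bool
    requested_demo: bool

# Hypothetical rules and point values, as agreed between marketing and sales.
DECISION_MAKER_TITLES = {"marketing director", "head of marketing", "cmo"}
SALES_READY_THRESHOLD = 50

def lead_score(lead: Lead) -> int:
    score = 0
    if lead.job_title.lower() in DECISION_MAKER_TITLES:
        score += 20  # likely decision-maker
    if lead.company_size >= 200:
        score += 15  # fits the target account size
    score += min(lead.pricing_page_visits, 3) * 10  # capped so visits can't dominate
    if lead.downloaded_whitepaper:
        score += 5
    if lead.requested_demo:
        score += 30
    return score

lead = Lead("Head of Marketing", 350, 2, True, False)
score = lead_score(lead)
print(score, score >= SALES_READY_THRESHOLD)  # 60 True
```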

Is not exact

Finally, unlike descriptive analytics, predictive analytics only offers results which are possibly true. As with diagnostic analytics, the analyst has to take a stand on the predictions and will typically need some evidence to support the algorithm’s results.

How to do predictive analytics

Now that you perhaps have a better idea of what predictive analytics is, how do you actually do it?

1) Think of data that you want, but don’t have

The first step is to reflect on your current marketing programme and think of something that you’d like to know, but currently do not.

For example, if you are trying to boost your ecommerce sales, what do people who buy things do before buying? Do they visit the site multiple times, watch product videos, or linger on the site?

If you knew the answer to that question, you could focus your marketing efforts on getting people to have that ‘must-have’ experience as often and as quickly as possible.

2) Build a training set

Every algorithm needs to be trained with real data. To get training data, you first need to distinguish source data which has the correct attributes from data which does not. Then you need to mark each case with the result.

In the case of the sentiment analysis algorithm, someone must determine which words and phrases are negative and which are positive, and then mark them as such so that the algorithm can learn the difference.
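For the ecommerce question from step 1, building a training set might mean discarding records that lack the attributes you need, then marking each remaining visitor with whether they went on to buy. The field names and records in this sketch are hypothetical.

```python
# Hypothetical observed behaviour, one record per visitor.
sessions = [
    {"visitor": "a", "visits": 3, "watched_video": True, "minutes_on_site": 12},
    {"visitor": "b", "visits": 1, "watched_video": False, "minutes_on_site": 2},
    {"visitor": "c", "visits": 4, "watched_video": True, "minutes_on_site": 9},
    {"visitor": "d", "visits": 2, "watched_video": False, "minutes_on_site": 15},
    {"visitor": "e", "visits": 1, "watched_video": True, "minutes_on_site": 3},
    {"visitor": "f", "visits": 2},  # incomplete record: missing attributes
]
# Hypothetical order data: the visitors who went on to buy.
purchasers = {"a", "c", "d"}

REQUIRED_FIELDS = {"visits", "watched_video", "minutes_on_site"}

def build_training_set(sessions, purchasers):
    """Keep records with every required attribute, then mark each copy with the outcome."""
    return [
        dict(row, bought=row["visitor"] in purchasers)
        for row in sessions
        if REQUIRED_FIELDS.issubset(row)
    ]

training_set = build_training_set(sessions, purchasers)
print(sum(r["bought"] for r in training_set), "of", len(training_set), "visitors marked as buyers")
# 3 of 5 visitors marked as buyers
```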

3) Write the algorithm

While this sounds complicated and difficult, it need not be. An algorithm is simply a list of instructions to follow in order to transform one data set to another.

So to start off, you can simply look at your data and identify features common to the cases that achieve your goal. Your algorithm could then be that people who do ‘X’, represented by the data set, also tend to do ‘Y’, the desirable goal.
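One way to test an ‘X tends to lead to Y’ rule is to compare the conversion rate of visitors who showed a behaviour with that of visitors who did not; the ratio is often called lift. The behaviours and marked cases below are invented, and eight cases are far too few to trust in practice.

```python
# Hypothetical marked cases: (behaviours observed, bought?)
cases = [
    ({"repeat_visit", "watched_video"}, True),
    ({"watched_video"}, True),
    ({"repeat_visit"}, False),
    (set(), False),
    ({"watched_video", "long_session"}, True),
    ({"long_session"}, False),
    ({"repeat_visit", "long_session"}, True),
    (set(), False),
]

def conversion_rate(outcomes):
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def behaviour_lift(cases, behaviour):
    """Conversion rate of cases with the behaviour divided by the rate of cases without it."""
    did = [bought for behaviours, bought in cases if behaviour in behaviours]
    did_not = [bought for behaviours, bought in cases if behaviour not in behaviours]
    base = conversion_rate(did_not)
    return conversion_rate(did) / base if base else float("inf")

for b in ["repeat_visit", "watched_video", "long_session"]:
    print(b, round(behaviour_lift(cases, b), 2))
# repeat_visit 1.67, watched_video 5.0, long_session 1.67
```

A lift well above 1 (here, watching a product video) marks a candidate ‘X’; a lift near 1 suggests the behaviour is unrelated to the goal.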

Additionally, as mentioned previously, the process does not have to be automated. You could simply look for behaviours (e.g. pages viewed) in Google Analytics and see whether that behaviour frequently led to your goal (e.g. a purchase).

4) Test performance

To test performance you need to find additional ‘marked’ data and see how well the algorithm’s output corresponds with the marks.

Testing data should not be the same as the training data. This is because you may devise an algorithm which is optimized only for the training data but performs poorly on any other data.
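This sketch shows why held-out data matters. The setup is hypothetical: the marked cases are generated for illustration, and `memorizer` is a deliberately bad ‘algorithm’ that simply remembers its training data.

```python
import random

def train_test_split(cases, test_fraction=0.3, seed=42):
    """Shuffle a copy of the marked cases and hold some back for testing only."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, cases):
    """Share of marked cases where the prediction matches the mark."""
    return sum(predict(features) == mark for features, mark in cases) / len(cases)

# Hypothetical marked cases: features are (visitor_id, watched_video), mark is bought.
# Video-watchers usually buy, but every fifth case goes against the pattern.
cases = []
for i in range(20):
    watched = i % 2 == 0
    bought = watched if i % 5 != 0 else not watched
    cases.append(((i, watched), bought))

train, test = train_test_split(cases)

# Over-fitted "algorithm": recalls every training case exactly and
# guesses "didn't buy" for anything it has not seen before.
lookup = dict(train)
def memorizer(features):
    return lookup.get(features, False)

# Simple general rule: people who watch a video buy.
def video_rule(features):
    return features[1]

for name, predict in [("memorizer", memorizer), ("video rule", video_rule)]:
    print(f"{name}: train {accuracy(predict, train):.0%}, test {accuracy(predict, test):.0%}")
```

The memorizer always scores 100% on its own training data; only the separate test set reveals how little it has actually learned.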

5) Review, improve, repeat

Once exposed to real data, the performance of the algorithm will probably be underwhelming. But with some testing and additional data analysis (beyond the scope of this post), it is likely that you can improve it over time.

Nothing tests an algorithm better than putting it to real use, though. Results like customer churn can be tested and improved with result data alone, but less concrete results, like interest segments or lead scores, may require collaboration with other departments.

Regardless, implementing a predictive algorithm is an iterative process, and the more it is reviewed, the more likely it is to become useful.

Predictive analytics best practices

Start simple

As with all analytics, it’s better to start with predictive analytics that work on a small scale than to try something ambitious which fails.

So for the first few attempts, use an outcome that is absolutely true (e.g. did buy/didn’t buy) and look for one or two explanatory variables.

The ‘must-have experience’ mentioned in step 1 is a good example of a simple predictive algorithm. You are simply looking for a single common experience customers have before buying something.

Aim for high-quality data before deploying

While testing is difficult and can be discouraging, your predictive output should be of reasonable quality before you launch the algorithm. There is no definite rule for how accurate your model should be, but it should offer enough predictive power to make a visible impact on business performance.
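One practical check on ‘enough predictive power’ is to compare the model against a naive baseline, such as always guessing the most common outcome. The churn marks and predictions in this sketch are hypothetical.

```python
from collections import Counter

def accuracy(predictions, marks):
    """Share of cases where the prediction matches the mark."""
    return sum(p == m for p, m in zip(predictions, marks)) / len(marks)

def baseline_accuracy(marks):
    """Accuracy of always guessing the most common outcome."""
    _, most_common_count = Counter(marks).most_common(1)[0]
    return most_common_count / len(marks)

def recall(predictions, marks):
    """Share of actual positives (here, cancellations) that the model flagged."""
    caught = sum(p and m for p, m in zip(predictions, marks))
    return caught / sum(marks)

# Hypothetical held-out test set: did each customer cancel, and what did the model predict?
marks       = [False, False, True, False, False, True, False, False, True, False]
predictions = [False, False, True, False, True,  True, False, False, False, False]

print(f"model accuracy {accuracy(predictions, marks):.0%}")      # 80%
print(f"always-'no' baseline {baseline_accuracy(marks):.0%}")    # 70%
print(f"cancellations caught {recall(predictions, marks):.0%}")  # 67%
```

Here the model beats the baseline by ten points and catches two of the three cancellations; whether that is enough to act on remains a business judgement.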

Not everything will work

Even the best ideas for predictive analytics often do not work. Behaviour which seems logically to lead to your goal may only do so a small percentage of the time.

On the bright side, proving that a data set is unrelated to your goal is still useful information – and the steps you take to find out that an algorithm doesn’t work are a good start to finding one which is genuinely predictive.

So…

Although it is intoxicating to think we can predict the future with data, the reality is that we can, at best, only really be sure about what is happening right now.

Fortunately, marketers can still derive useful information by discovering connections between an existing data set and a desirable goal. Marketers can then encourage the original behaviour in an attempt to engineer the goal.

In this way, even though predictive analytics is not a crystal ball, it remains a worthwhile practice which can deliver real business value and, with some effort, a return on investment.