Machine learning sounds like something that computer nerds do, but not marketers.
Well, here’s a dip-your-toe-in introduction to how anyone can use machine learning to improve their digital ad campaigns.
Machine learning is an intriguing topic. Whether you’ve read about US retailer Target discovering a young woman was pregnant before she told her parents or you have seen it in action with Amazon recommendations, it’s exciting to think that computers can do things which seem almost magical.
But did you ever think you can use it for your own marketing campaigns?
It starts with a need
So let’s say you’re marketing for a fairly straightforward business – an online jewelry store – and your main strategy is display advertising online.
Your challenge is to create a campaign which gets clicks, sure, but you want to get the right kinds of clicks – clicks from buyers, particularly big buyers.
In fact, you know that most of your client’s revenue comes from people who spend more than $100. So you need to find more of them, even at the expense of the <$100 buyers.
You devise a number of ads to attract the big rollers – carefully mixing product photos, models, and your brand logo with the right copy.
Then, after running the campaign for a few days you have a look at the analytics… only to find that they are not very useful.
Well, marketing analytics can tell you which campaigns brought in the highest average revenue and even the number of big spenders, but you need to know more.
You need to know what elements are attracting the big spenders – so you can do more of that, and less of the other stuff.
And that’s where your machine learning journey starts.
Machine learning in a nutshell
Machine learning is a vast subject with many methods and applications, but it is typically used to solve problems by finding patterns that we cannot see ourselves.
That is, you harness the unbiased and massive power of computers to see things that we as biased, slow humans cannot – and then come up with new rules for how to do things better.
For example, in the Target story, the retailer wanted to give expecting couples discounts on the items they would need as new parents – and hopefully turn them into lifelong, loyal customers.
But they had to find them first. So, they hired a machine learning expert to help identify buying habits of someone who had just become pregnant. Once they knew this, they were able to target these people with special offers for pregnancy products.
But how did they do that?
Apparently the expert first identified customers who were already parents and looked at their buying habits leading up to the birth.
Then he used a machine learning program to detect pre-pregnancy buying patterns – and fired off an alert to the marketing team when other customers had made similar purchases.
The marketing team was then able to make sure that these customers received direct mail with the special offers – which they could then track to see if they correctly identified the expecting families.
How to get your machine to learn
Okay, but how can you, as a marketer, use the same approach? Where do you start?
Well, the first thing to do is to forget the ‘machine’ part and focus on the learning. That is, start with finding the rules and then worry about the automation of those rules.
Fortunately, there is a standard process to developing the learning rules. It’s not hard, but it is important to understand the steps first before starting – so read through them while keeping your own marketing tasks in mind.
1) Find your features
First you have to take the real-world problem and map it to something that you can put in a spreadsheet.
In this spreadsheet, the columns are the different aspects, or ‘features’, of your campaign. Things like the platform, the copy, or the photo.
The rows, then, are the data points. What were the features of each ad which led to the purchase? Which photo did they see, which copy did they read, and which platform did they click from?
For our jewelry ads, I use the following features to describe each ad that led to a purchase:
|Feature||Possible values|
|Which platform was the ad on?||Google, Facebook|
|What did the ad photo feature?||Woman, Product, Logo|
|What did the ad copy emphasize?||Question, You, Product, Price|
|Was there a ‘call to action’ (e.g. Click Here)?||Yes, No|
Now, of course, there are many other features we could use. Time of purchase, pages visited, etc. – but these are simple and illustrate how machine learning works rather well.
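As a rough sketch, the feature list above could be written down in Python before any data is collected, so that every ad can be checked against it. The names `FEATURES` and `validate_ad` and the short feature keys are made up for illustration; only the allowed values come from the table.

```python
# Hypothetical schema: the keys are illustrative names, the values come from the table.
FEATURES = {
    "platform": {"Google", "Facebook"},
    "photo": {"Woman", "Product", "Logo"},
    "copy": {"Question", "You", "Product", "Price"},
    "cta": {"Yes", "No"},
}


def validate_ad(ad):
    """Return a list of problems found in one ad description (empty if it fits)."""
    problems = []
    for name, allowed in FEATURES.items():
        if name not in ad:
            problems.append(f"missing feature: {name}")
        elif ad[name] not in allowed:
            problems.append(f"unexpected value for {name}: {ad[name]}")
    return problems


example_ad = {"platform": "Facebook", "photo": "Product", "copy": "Price", "cta": "Yes"}
print(validate_ad(example_ad))  # an empty list means the ad fits the schema
```

A check like this is optional, but it catches typos in the tags before they end up as stray categories in the data.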
2) Identify the result
Then, you need to have a clear, desirable result – and a clear, negative result. That way we can train the computer to find the pattern which leads, most often, to the right result.
For this example, a positive result is a $100 or greater spend, and a negative result is a spend under $100.
Notice that we do not include clicks with no purchases for this test – though, indeed, that may be another valid test to run as well. The reason for this is that we are looking for the ad which draws in big vs. small spenders, so we need to look at data for people who bought something.
So, for this example, the result is simple: if they spent $100 or more then we use ‘TRUE’ and if they did not, ‘FALSE’.
We could, of course, just stick with the dollar amount – but we will not, for reasons I explain below.
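In code, turning a purchase amount into that result might look like the short sketch below. The function name `label_purchase` is an assumption, and exactly $100 is treated as a big spend, following the "$100 or greater" definition above.

```python
BIG_SPENDER_THRESHOLD = 100.0  # dollars, the cut-off used in this example


def label_purchase(amount):
    """Return 'TRUE' for a spend of $100 or more, otherwise 'FALSE'."""
    return "TRUE" if amount >= BIG_SPENDER_THRESHOLD else "FALSE"


for amount in (35.00, 100.00, 249.99):
    print(amount, label_purchase(amount))
```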
3) Gather the data
The third task is to gather the data for our features.
But what if you don’t have the right data? Ah, this is why you need to know the whole procedure before starting.
Possibly the most frustrating part of machine learning is coming up with a list of features and a result – and then realizing we simply do not have all of the data.
Many times I have gone to build a report on how different ads perform, only to find that I had not tagged the ads properly to see the differences in Google Analytics.
But since we’re reading this before we’ve done any work, we can make sure that our data will cover the features.
We tag each link in our ads with the appropriate URL variable so that when a prospective buyer clicks we know the platform, the copy, the photo, and the CTA that brought them to our site.
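A minimal sketch of that tagging, using Python's standard `urllib.parse` module. The parameter names (`ad_platform`, `ad_copy`, `ad_photo`, `ad_cta`) and the example.com address are placeholders, not a standard; use whatever variables your analytics setup expects.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def tag_url(base_url, platform, copy, photo, cta):
    """Append the ad's features to its landing-page URL as query parameters."""
    # Hypothetical parameter names, chosen only for this illustration.
    params = {"ad_platform": platform, "ad_copy": copy, "ad_photo": photo, "ad_cta": cta}
    return f"{base_url}?{urlencode(params)}"


def read_tags(url):
    """Recover the feature values from a tagged URL."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}


url = tag_url("https://example.com/rings", "Facebook", "Price", "Product", "Yes")
print(url)
print(read_tags(url))
```

Each purchase can then be matched to the tags on the link the buyer arrived through.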
We then combine the data with the features and end up with a table that looks like this:
4) Pick your machine learning program
OK, this is a bit hard. If you do even a little bit of reading on the topic of machine learning programs (or algorithms), you find that there is an enormous variety of algorithms. It’s bewildering and paralyzing when you first start out.
The reason for this variety is that each machine learning algorithm has its own specialty use case, and some can produce very complex models to help you predict the future.
Now, I’m not qualified to speak authoritatively on the subject, so I will defer to someone who is: Ben Lorica, the chief data scientist at O’Reilly Media.
“Good features allow a simple model to beat a complex model.”
So, though I think it’s important to be familiar with a few models, choosing features and preparing the data set will help you solve your problem much more effectively than bellyaching about what machine learning method to use.
To make things simple, I chose the model which clearly tells you what’s working from a machine learning perspective – a decision tree.
5) Split your data
One important part of the machine learning methodology is to split your data so that you have one set of data for learning and another for testing. Typically the learning set is much larger than the testing set, so we will use 400 examples for learning and then test the result on 100 examples.
What we are looking for here is whether the model that is built by the machine from the learning data actually works on the testing data.
That is, did the machine actually ‘learn’ well enough to be of any use in the future? Or did it just learn how to predict the learning data and is useless on data outside of that?
These are very important questions to ask and, again, there are many different opinions and methods for how to do this most effectively. But the four-to-one split between learning and testing data seems to be well-accepted, so we’ll go with that.
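Here is one way the four-to-one split could be done, sketched in Python. The function name `split_examples` and the fixed seed are assumptions; the rows are shuffled first so that neither set ends up with only the earliest or latest purchases.

```python
import random


def split_examples(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (learning_rows, testing_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


rows = list(range(500))  # stand-ins for 500 purchase records
learning, testing = split_examples(rows)
print(len(learning), len(testing))  # 400 100
```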
6) Run the algorithm
For this example, I’m using the decision tree software C5.0, which has a free demo version.
The software is very easy-to-use. All you need is a template for the features and CSV files for the training and testing data. Then you hit ‘run’ – and the program does the rest.
It really is that simple – but if you are confused about it there are many tutorials available to help you out.
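For readers curious about what a decision tree learner does under the hood, here is a minimal sketch in Python. It is not C5.0 and makes no claim to match its output: it is a much simpler ID3-style learner that repeatedly splits on the feature with the highest information gain. The function names and the toy rows are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Measure how mixed a list of labels is (0 means all the same)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """How much splitting on `feature` reduces the entropy of `target`."""
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def build_tree(rows, features, target):
    """Return a label (leaf) or a dict describing a split on the best feature."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, f, target))
    if information_gain(rows, best, target) <= 0:
        return majority
    remaining = [f for f in features if f != best]
    branches = {
        value: build_tree([r for r in rows if r[best] == value], remaining, target)
        for value in {r[best] for r in rows}
    }
    return {"feature": best, "branches": branches, "default": majority}


def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


# Invented toy rows, far too few for real use; they only show the mechanics.
toy_rows = [
    {"copy": "Price", "photo": "Product", "big_spender": "TRUE"},
    {"copy": "Price", "photo": "Product", "big_spender": "TRUE"},
    {"copy": "Price", "photo": "Logo", "big_spender": "FALSE"},
    {"copy": "Question", "photo": "Product", "big_spender": "FALSE"},
    {"copy": "Question", "photo": "Logo", "big_spender": "FALSE"},
    {"copy": "Product", "photo": "Woman", "big_spender": "FALSE"},
    {"copy": "You", "photo": "Product", "big_spender": "FALSE"},
]
tree = build_tree(toy_rows, ["copy", "photo"], "big_spender")
print(predict(tree, {"copy": "Price", "photo": "Product"}))
```

Dedicated tools such as C5.0 add pruning, handling of numeric values, and other refinements on top of this basic idea.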
7) Review the results
It takes no time at all for it to process our 400 training and 100 test cases. Then it produces an output file which is quite easy to read, though it does take some interpretation.
Here’s what our example tells us:
- Line one says that if the copy is either a Question or a Product, then FALSE, or the buyer is not likely to pay more than $100. 190 out of 195 fall into that category.
- Lines two and three: If the copy is a Price, and the photo is the Product, then the buyer is likely to buy over $100. 38 TRUE, 8 FALSE.
- Line four: But if the copy includes Price and the photo is a Logo or a Woman, then the buyer will tend not to buy over $100.
- Lines 5-9: If the copy mentions You, and the platform is Facebook and the photo is a Product or a Woman then the buyer tends to be >$100. Otherwise, not.
As you can see, the first few recommendations are quite clear but as we go on, they get a bit more obscure – and can probably be ignored.
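Read as plain rules, the tree described above can be sketched as a small Python function. The function name is an assumption, and the woman photo is spelled "Woman" here, following the feature table. The call-to-action feature does not appear in the rules as described, so it is left out.

```python
def predicts_big_spender(platform, photo, copy):
    """Apply the rules read off the tree output, as described in the text."""
    if copy in ("Question", "Product"):
        return False  # line one: these buyers rarely spend over $100
    if copy == "Price":
        return photo == "Product"  # lines two to four
    if copy == "You":
        # lines five to nine: only Facebook ads with a Product or Woman photo
        return platform == "Facebook" and photo in ("Product", "Woman")
    return False


print(predicts_big_spender("Google", "Product", "Price"))
print(predicts_big_spender("Google", "Logo", "You"))
```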
Now have a look at the evaluation of the tree – and the test data:
It looks complicated, but it’s actually quite straightforward. All it is telling you is that the decision tree had a 7% error rate when run against the training data – and the error rate only increased to 13% when run against the test data.
So, it seems that we probably have a useful algorithm for predicting which ads tend to draw in the big spenders.
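The error rate itself is simple arithmetic: the share of cases where the prediction and the actual result disagree. A minimal sketch, with an assumed function name and placeholder lists built so that 13 of 100 predictions are wrong (13% of the 100 test cases):

```python
def error_rate(predicted, actual):
    """Return the fraction of cases where the prediction was wrong."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Placeholder labels only: 100 test cases with 13 wrong predictions.
actual = ["TRUE"] * 100
predicted = ["FALSE"] * 13 + ["TRUE"] * 87
print(f"{error_rate(predicted, actual):.0%}")  # 13%
```

By the same arithmetic, a 7% error on 400 training cases corresponds to roughly 28 misclassified purchases.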
Why did we use TRUE (>$100) and FALSE (<$100)?
By the way, this is the reason why we did not just give our algorithm the actual purchase amount – and instead split the result into TRUE and FALSE around the $100 mark.
If we had given the purchase amount, then the algorithm would have decided which dollar amount produced the most error-free results.
Perhaps the amount would be $100, but more likely it would have been $20 or $85 – or some other amount that did not matter to us.
So, importantly, determine what it is you want to know – and don’t expect the algorithm to figure it out!
8) Take action
So, now we know to remove Question and Product copy – but to include Price when the photo is of the Product to get more buyers spending more than $100.
And then, of course, run the test again in a few days to see if we are still getting equal results.
Now a true machine learning marketing system would follow these rules automatically – and keep running the tree with new data in order to keep improving its predictions.
This is where machine learning gets really interesting, as you can end up with a system which changes and improves itself over time.
But as that sort of automation may be difficult to do with your marketing systems, it’s just important at this point that you have an understanding of what is possible.
So, though it’s unlikely that machine learning experts will take our marketing jobs any time soon, it’s important for us to be familiar with new technology and know what is possible.
Hopefully from this simple example you can see the preparation necessary to use machine learning with a marketing program so that you can take steps towards some computer-assisted marketing automation.
And though a simple machine learning program may not be able to identify your customers as well as Target did, it certainly helps us identify what is – and what is not – working with our campaigns.