The next frontier is to build algorithms capable of making decisions in dynamic settings, where even humans cannot precisely articulate what guides their own actions.

This can be anything from driving safely to determining the ROI of a social media marketing campaign. These are activities that require constant attention to a shifting environment and force algorithms to predict and account for the consequences of their actions.

Reinforcement learning (RL) is an approach to teaching machines to interact with an environment, receiving rewards for performing the right actions until they successfully meet their goal.

For instance, Google’s AlphaGo algorithm was tasked with beating a human player at the game of Go. Rather than being explicitly programmed with “if…then” rules, it was given data describing the moves available to a player and had to determine the best course of action to earn rewards (strong moves) and avoid penalties (having its stones surrounded and captured by the opponent).

The particular appeal of reinforcement learning is that it teaches systems to focus on the long-term reward of winning the game, rather than merely predicting the best immediate move without considering its consequences later on.

The typical framing of a reinforcement learning scenario is a loop: an agent observes the state of its environment, takes an action, and receives a reward, which it then uses to choose better actions in the future.
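
A minimal Python sketch of that loop, using a made-up "show an offer, hope for a click" environment and a placeholder random policy (the offer names and click probabilities are illustrative assumptions):

```python
import random

# Toy environment: showing an offer to a visitor sometimes earns a click (reward = 1).
# The offers and their click probabilities are made up for illustration.
CLICK_PROBABILITY = {"offer_a": 0.05, "offer_b": 0.12, "offer_c": 0.08}

def environment_step(action):
    """The environment responds to an action with a reward: 1 for a click, 0 otherwise."""
    return 1 if random.random() < CLICK_PROBABILITY[action] else 0

def agent_act(observation):
    """A placeholder policy; a real RL agent would use past rewards to improve this."""
    return random.choice(list(CLICK_PROBABILITY))

total_reward = 0
for step in range(10_000):                    # the agent-environment loop
    observation = "new_visitor"               # state the agent observes
    action = agent_act(observation)           # agent chooses an action
    total_reward += environment_step(action)  # environment returns a reward

print(f"Average reward per interaction: {total_reward / 10_000:.3f}")
```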

In online marketing, such an approach can translate to massive improvements in personalisation, ad campaign management and pricing, as the following three cases illustrate.

1. Developing highly personalised ads, optimised for the long term

A/B testing is the simplest analogue of reinforcement learning in marketing. You are likely familiar with its goal: determine the best offer to pitch to prospects. The problem is that A/B testing is a stopgap: it helps you choose the best option based on limited, current data, tested against a select group of consumers.

So how do you act when you have seven or 12 different offers, developed to appeal to hundreds of thousands of consumers over the course of the next five years? You need to apply personalisation at scale – and that’s exactly where reinforcement learning comes to the fore.
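
As a rough illustration of the mechanics (a toy sketch, not the Adobe system discussed below), an epsilon-greedy policy can keep learning which of several offers to show, balancing exploration of under-tested offers against exploitation of the current best performer. The click probabilities below are simulated:

```python
import random

N_OFFERS = 7               # e.g. seven competing offers
EPSILON = 0.1              # fraction of traffic reserved for exploration
shows = [0] * N_OFFERS     # how often each offer has been shown
clicks = [0] * N_OFFERS    # how often each offer was clicked

# Simulated "true" click probabilities, unknown to the algorithm.
TRUE_CTR = [0.02, 0.05, 0.03, 0.04, 0.06, 0.01, 0.035]

def choose_offer():
    """Epsilon-greedy: mostly show the best-known offer, occasionally explore."""
    if random.random() < EPSILON or not any(shows):
        return random.randrange(N_OFFERS)
    return max(range(N_OFFERS), key=lambda i: clicks[i] / shows[i] if shows[i] else 0.0)

for visit in range(100_000):
    offer = choose_offer()
    shows[offer] += 1
    clicks[offer] += random.random() < TRUE_CTR[offer]

best = max(range(N_OFFERS), key=lambda i: clicks[i] / max(shows[i], 1))
print(f"Best offer found: #{best} with an observed CTR of {clicks[best] / shows[best]:.2%}")
```

Unlike a one-off A/B test, this keeps adjusting as traffic arrives; the Adobe approach below goes a step further by also valuing what each offer does to future visits.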

Researchers from Adobe have proposed an ad personalisation solution that accounts for the long-term effect of each proposed pitch. Consider this: the probability of selling to an existing customer is typically put at 60-70%, compared with just 5-20% for a new prospect.

The Adobe team decided to put this idea to the test and developed two algorithms pursuing different goals, contrasted in the toy sketch after the list:

  • A greedy optimisation algorithm (non-RL) was designed to maximise the probability of an immediate click on the tested offers, and thus instant profit.
  • An LTV (lifetime value) optimisation algorithm was tasked with increasing the number of clicks users made over multiple visits to the website.
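
A toy contrast of the two objectives, with hypothetical numbers rather than anything from the Adobe study: a "discount" offer earns more clicks now but drives the customer away, while a "loyalty" offer earns fewer clicks now but keeps them coming back.

```python
# Hypothetical offers: (click probability on this visit, probability the customer returns).
OFFERS = {
    "discount": (0.10, 0.20),   # clicks well now, but the customer rarely comes back
    "loyalty":  (0.06, 0.80),   # clicks less now, but the customer keeps returning
}
GAMMA = 0.95                    # discount factor applied to future visits

def immediate_clicks(offer):
    """Greedy objective: expected clicks on the current visit only."""
    click_p, _ = OFFERS[offer]
    return click_p

def lifetime_clicks(offer):
    """LTV objective: expected discounted clicks over all future visits,
    assuming the same offer is shown each time (a geometric series)."""
    click_p, return_p = OFFERS[offer]
    return click_p / (1 - GAMMA * return_p)

for name in OFFERS:
    print(f"{name}: immediate={immediate_clicks(name):.3f}, lifetime={lifetime_clicks(name):.3f}")
# The greedy policy picks "discount" (0.10 > 0.06 immediate clicks), while the
# LTV policy picks "loyalty" (0.25 expected lifetime clicks vs roughly 0.12).
```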

Both algorithms were tested on two datasets from the banking industry. The first included 200,000 interaction records from a month’s worth of marketing campaign data covering 7 offers. The second contained 4 million interactions with 12 different offers. They used two metrics, illustrated on a toy click log after the list, to estimate the success of each algorithm:

  • CTR: total number of clicks divided by total number of visits x 100
  • LTV: total number of clicks divided by total number of unique visitors x 100, so repeat clicks from returning visitors count towards LTV but not CTR
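
On a toy click log (hypothetical data), the difference between the two metrics comes down to what you divide by: visits or unique visitors.

```python
# Hypothetical click log: one (visitor_id, clicked) row per visit.
log = [
    ("a", 1), ("a", 0), ("a", 1),   # visitor "a": 3 visits, 2 clicks
    ("b", 0), ("b", 1),             # visitor "b": 2 visits, 1 click
    ("c", 0),                       # visitor "c": 1 visit, no clicks
]

total_clicks = sum(clicked for _, clicked in log)
total_visits = len(log)
unique_visitors = len({visitor for visitor, _ in log})

ctr = total_clicks / total_visits * 100       # 3 / 6 * 100 = 50.0
ltv = total_clicks / unique_visitors * 100    # 3 / 3 * 100 = 100.0

print(f"CTR = {ctr:.1f}, LTV = {ltv:.1f}")
```

Repeat clicks from returning visitors raise LTV but leave CTR unchanged, which is why a policy that nurtures return visits looks better on the second metric.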

As expected, the greedy algorithm performed best when measured by CTR, while the LTV algorithm delivered better results on the LTV metric. What’s more interesting is that the LTV algorithm (now part of Adobe Marketing Cloud) could improve its own performance over time, building new advertising policies on top of existing ones.

For businesses, that translates into the following: instead of creating one-off attractive offers (e.g. sales) that resonate only with a fraction of your visitors, you can create personalised offers that generate higher ROI over the course of a few years when presented to both new and returning customers.

2. Optimising display ad budget spending in real-time

The goal of an advertising campaign is to maximise KPIs such as clicks or profit within the allocated budget. Modern ad tech tools have made significant progress in that direction. However, most of them still assume that the market is stationary: their algorithms do not account for changes in other bidders’ behaviour.

For instance, if you are running multiple accounts in the same niche at the same time, your tools cannot estimate how the strategy for one account will affect the others, and vice versa.

Reinforcement learning lets you both maximise the ROI of an individual campaign and identify the best response to strategy changes by other ad bidders, all in real time.
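
To make that concrete, here is a heavily simplified single-agent sketch (not the multi-agent Alibaba system described next): an epsilon-greedy bidder scores a few candidate bid levels by a recency-weighted average profit, so its preferred bid shifts when competitors start bidding more aggressively. All values are illustrative.

```python
import random

BID_LEVELS = [0.3, 0.6, 0.9, 1.2]   # candidate bids (hypothetical currency units)
VALUE_PER_WIN = 1.5                 # assumed value of winning the impression
EPSILON, DECAY = 0.1, 0.995         # exploration rate; how quickly old data fades

avg_profit = {bid: 0.0 for bid in BID_LEVELS}

for auction in range(20_000):
    # Non-stationary market: competitors start bidding higher halfway through.
    low, high = (0.1, 0.7) if auction < 10_000 else (0.5, 1.3)
    competitor_bid = random.uniform(low, high)

    bid = (random.choice(BID_LEVELS) if random.random() < EPSILON
           else max(BID_LEVELS, key=avg_profit.get))

    # First-price auction: pay our own bid if we win, earn nothing otherwise.
    profit = (VALUE_PER_WIN - bid) if bid > competitor_bid else 0.0
    avg_profit[bid] = DECAY * avg_profit[bid] + (1 - DECAY) * profit

print({bid: round(score, 3) for bid, score in avg_profit.items()})
# Early on the cheaper 0.6 bid is the most profitable; once the market shifts,
# the recency weighting pushes the policy towards the 0.9 bid instead.
```

The recency weighting is what lets the policy track a non-stationary market instead of averaging over competitor behaviour that no longer holds.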

A group of Chinese scientists affiliated with Alibaba Group recently conducted a large-scale case study illustrating how RL models can accomplish just that. They used a MARL (multi-agent reinforcement learning) algorithm to optimise bidding on Taobao, the largest e-commerce platform in China.

The proposed algorithm participated in a series of ad auctions and consistently outperformed both manual ad bidding and a contextual bidding algorithm, a solution that does not optimise budget allocation over time. The results:

  • Manual bidding resulted in 100% ROI with 99.52% of budget spent
  • RL-powered bidding generated 340% ROI with 99.51% of budget spent

Over time, RL algorithms can improve their performance even further by aggregating more historical auction data and user feedback, and by being tested against tighter budget constraints.

3. Using inverse reinforcement learning to understand customer demands

Consumer needs and preferences change over time. A mobile data plan from the 2010s will not impress the modern user. When a competitor launches a better pricing offer, you need to respond fast.

Being able to estimate and anticipate such dynamic market changes can help you create better pricing for recurring services such as SaaS products or subscription services like internet/mobile/cloud plans, as well as improve your marketing campaigns for such offers.

A scientist from NYU Tandon School of Engineering has recently developed an Inverse Reinforcement Learning (IRL) model that can help identify customers’ likely responses to different plan changes, based on their service usage habits. For instance, you can estimate how much data the starter plan should include, and how much a certain demographic of consumers is willing to pay for it.

The model can also simulate the best upgrade offers by predicting the future consumption patterns of particular user groups and scoring how attractive different plans are to a customer in terms of their total expected utility.

For instance, as a web hosting provider, you can estimate that customer group A will likely want to purchase an additional 5GB of storage in the next 3 months and will be comfortable paying an extra $15/month for it.
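
The mathematics of inverse RL is beyond the scope of this article, but the core idea of working backwards from observed behaviour to the utility (reward) function that explains it can be sketched crudely. The example below is an illustration of that idea, not the NYU model: it fits price and data-allowance weights to a segment's observed plan choices using a simple softmax choice model, with all plans and choice counts invented for the example. The fitted weights could then be used to score a hypothetical upgrade such as the extra 5GB for $15/month mentioned above.

```python
import math
from itertools import product

# Hypothetical plans: (monthly price in $, data allowance in GB). Illustrative only.
PLANS = {"basic": (10, 2), "standard": (20, 8), "premium": (40, 12)}

# Observed choices made by a customer segment (again, made-up data).
observed = ["standard"] * 60 + ["basic"] * 25 + ["premium"] * 15

def log_likelihood(w_price, w_data):
    """How well a softmax (logit) choice model with
    utility = w_price * price + w_data * data explains the observed choices."""
    utility = {name: w_price * p + w_data * d for name, (p, d) in PLANS.items()}
    z = sum(math.exp(u) for u in utility.values())
    return sum(utility[choice] - math.log(z) for choice in observed)

# The "inverse" step: search for the utility weights that best explain behaviour.
grid_price = [x / 100 for x in range(-30, 1)]   # price weight, expected negative
grid_data = [x / 100 for x in range(0, 31)]     # data weight, expected positive
w_price, w_data = max(product(grid_price, grid_data), key=lambda w: log_likelihood(*w))

print(f"Inferred utility weights: price {w_price:+.2f}, data {w_data:+.2f}")
# These weights can now score a hypothetical upgrade (say, +5GB for +$15/month)
# against the plans this segment already chooses.
```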

Reinforcement learning applications are yet to move from the labs to the mainstream, but the early tests are encouraging. As Google, IBM and other tech giants ramp up their research spending, we should expect more RL-powered marketing tools to start shaking up the industry in the next couple of years.
