In this blog post, we’ll be talking to the founder of LoveTheSales, Stuart McClure, about the company’s use of machine learning in ecommerce, as well as taking a more technical dive into algorithm training with CTO David Bishop.

Before we start, if you’re interested in marketing applications of AI, Econsultancy’s Supercharged conference takes place in London on May 1, 2018 and is chock-full of case studies and advice on how to build out your data science capability. Speakers come from Ikea, Danske Bank, Just Eat, Age UK, RBS and more.

AI for product tagging

LoveTheSales was founded in 2013 and is a retail sales aggregator where consumers can find more than a million sales items from hundreds of retailers.

Machine learning is used to classify these products, tagging them to enable the website to sort them into the right categories and to show a user products they may be interested in.

McClure explains the need for a machine learning-led approach:

“Given that we’ve got that massive catalogue of products that can change and shift very quickly – as products on sale can change in price or sell out – we recognised that we needed a pretty clever tool. We had a legacy tool that we built, based on Boolean search, which was alright, but it didn’t really understand the nuances of products. As a startup and essentially a tech business, we knew that AI would help.”

“We built a tool that could classify a million plus products based on training data. It would then sort products into the right parts of the website.”

Simply put, the models that the LoveTheSales team used would tag a trainer, for example, understanding it was a trainer made of a specific material, in a specific colour, with a certain heel etc. Similarly, a shirt would be classified as such, with tags for sleeve length, collar type, pattern type and more.

Additionally, brand tags, price tags and such can give a good idea of the types of products a particular consumer is viewing.

The homepage

That all sounds very simple, you may think. Why exactly is cognitive computing necessary?

The answer lies in the reliability of retailer data. McClure explains that this was one of the first problems the company encountered. “One retailer might give us amazing data and another could give us the same set of products but with awful data,” he said, continuing, “We use a text based classification tool, training various models with both positive and negative examples.”

What this means is that the retailer data for a trainer, for example, may not even include the word ‘trainer’, but if it includes the word ‘sneaker’, or if the product description includes words or phrases that the model predicts are associated with a trainer product, then a tag can be applied with a certain degree of accuracy.
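To make the idea concrete, here is a minimal sketch of a text classifier trained on positive and negative examples. It is not the LoveTheSales implementation: it is a tiny linear classifier trained with a hinge loss (the loss a linear SVM minimises), with regularisation left out for brevity. The product descriptions, function names and the `is_trainer` helper are all invented for illustration.

```python
def tokenize(text):
    """Split a product description into lower-case word tokens."""
    return text.lower().split()


def train_hinge_classifier(examples, epochs=20):
    """Learn one weight per word from (text, label) pairs, label +1 or -1.

    Whenever an example is misclassified or falls inside the margin
    (label * score < 1), every word in it is nudged towards its label.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            tokens = tokenize(text)
            score = bias + sum(weights.get(t, 0.0) for t in tokens)
            if label * score < 1:
                for t in tokens:
                    weights[t] = weights.get(t, 0.0) + label
                bias += label
    return weights, bias


# Hypothetical retailer data: +1 = trainer, -1 = not a trainer.
EXAMPLES = [
    ("white leather trainer with rubber sole", +1),
    ("cotton shirt long sleeve button collar", -1),
    ("running sneaker lace-up mesh", +1),
    ("slim fit denim jeans", -1),
    ("mens sneaker rubber sole lace-up", +1),
    ("leather belt with metal buckle", -1),
]

weights, bias = train_hinge_classifier(EXAMPLES)


def is_trainer(text):
    """Apply the 'trainer' tag when the learned score is positive."""
    return bias + sum(weights.get(t, 0.0) for t in tokenize(text)) > 0


# Neither 'trainer' nor 'sneaker' appears, but the surrounding words do.
print(is_trainer("mesh lace-up with rubber sole"))
print(is_trainer("striped cotton shirt"))
```

In this toy run the first description is tagged as a trainer purely because words such as ‘mesh’, ‘lace-up’ and ‘sole’ were seen in positive examples, which is the effect McClure describes.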

McClure points out that, as with many other machine learning use cases, the models can prove eerily accurate. He says, “The really cool thing is, we’ll have examples, loads of them, where you’ll get say 100 shirts and there’ll be a piece of data that has nothing in it at all to say it’s a shirt, but the model has classified it correctly as a shirt because of the surrounding context.”

As an aside, to my mind the most intriguing example of this eerie accuracy is the pricing model developed by Airbnb. In an article about the Aerosolve algorithm, Dan Hill states that the algorithm has become so effective that the pricing tips given to hosts in data-rich cities can be an accurate indicator of new micro-neighbourhoods. Essentially, Airbnb is mapping gentrification and charging guests accordingly.

A more technical explanation of training

David Bishop, CTO at LoveTheSales, has previously written in greater detail how the training of this tagging model works.

I think it’s worthwhile reposting some of his explanation of supervised learning, to offer the layman greater insight.

Though the initial language is daunting – Bishop declaring “We have architected a hierarchical tree of chained 2-class linear (Positive vs Negative) Support Vector Machines (LibSVM), each responsible for binary document classification of each hierarchical class” – a few diagrams can add some clarity.

Bishop writes that one might have assumed that these support vector machines (which are essentially models that give a ‘yes’ or ‘no’ answer to whether an item is, for example, a trainer) would be structured in a similar way to the website menu hierarchy pictured below, i.e. ask first ‘Is it clothing?’, then ‘Is it men’s clothing?’, then ‘Is it a men’s shirt?’

However, this structure requires two new SVMs to be trained every time a new sub-category is added (e.g. men’s swimwear and women’s swimwear). As Bishop states, “Overall, deep hierarchical structures can be too rigid to work with.”

A naive approach to support vector machine structure

What Bishop and his team did was to flatten the data structures as shown below into sub-trees. This means simple set-based logic can be used to ‘traverse the SVM hierarchy’ – e.g. Mens Slim-fit jeans = (Mens and Jeans and Slim Fit) and not Womens

The number of SVMs required is therefore reduced using this method.
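The set-based rule quoted above can be sketched in a few lines. This is an illustration only: it assumes each flat SVM has already answered ‘yes’ or ‘no’, and that the ‘yes’ answers for a product are collected into a set of tags. The `matches` helper and the tag names are hypothetical.

```python
def matches(tags, required, excluded=frozenset()):
    """True when every required tag is present and no excluded tag is."""
    return required <= tags and not (excluded & tags)


# Mens Slim-fit jeans = (Mens and Jeans and Slim Fit) and not Womens
REQUIRED = {"mens", "jeans", "slim fit"}
EXCLUDED = {"womens"}

# Tags for which the individual SVMs answered 'yes' (invented products).
product_a = {"mens", "jeans", "slim fit", "denim"}
product_b = {"mens", "womens", "jeans", "slim fit"}  # unisex listing
product_c = {"mens", "jeans"}                        # no fit information

print(matches(product_a, REQUIRED, EXCLUDED))  # True
print(matches(product_b, REQUIRED, EXCLUDED))  # False
print(matches(product_c, REQUIRED, EXCLUDED))  # False
```

Because the final category is computed from the tags, no dedicated ‘mens slim-fit jeans’ model needs to be trained.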

Flattened structure for SVMs

Illustration of set-based logic

This set-based logic means a new class (such as ‘children’s’) would add a whole new set of final categories (children’s shirts, children’s tops and so on), with additional training data needed only to classify whether the item is ‘children’s’ or not (one more SVM).
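A rough count shows why this matters. The sketch below assumes a deliberately simplified catalogue with one SVM per audience and one per product type; the class names are made up and the real LoveTheSales taxonomy is certainly larger.

```python
from itertools import product


def final_categories(audiences, product_types):
    """Every audience/product-type combination the site can display."""
    return [f"{a} {p}" for a, p in product(audiences, product_types)]


def svm_count(audiences, product_types):
    """One binary SVM per flat class, not one per combination."""
    return len(audiences) + len(product_types)


audiences = ["mens", "womens"]
product_types = ["shirts", "tops", "jeans"]

before = (svm_count(audiences, product_types),
          len(final_categories(audiences, product_types)))

# Add a single new class: one extra SVM to train.
audiences_after = audiences + ["childrens"]
after = (svm_count(audiences_after, product_types),
         len(final_categories(audiences_after, product_types)))

print(before)  # (5, 6): 5 SVMs, 6 final categories
print(after)   # (6, 9): 6 SVMs, 9 final categories
```

With one model per final category, the same change would have meant training three new SVMs rather than one, and the gap widens as more product types are added.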

This is where it gets clever. Bishop writes that they were able to “[re-use] training data, via linked data relationships.”

“For example”, he writes, “given some basic domain knowledge of the categories – we know for certain that ‘Washing machines’ can never be ‘Carpet cleaners’”.

‘Re-using’ that data means that positive training examples for washing machines can be used as negative training examples for carpet cleaners.

Therefore when training data is added to improve the ‘Carpet Cleaners’ SVM – “it inadvertently improves the ‘Washing machines’ class, via linked negative data.”
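As a sketch of how such linked negative data might be assembled (the data structures and example descriptions here are assumptions, not Bishop’s code), each class’s training set can be built from its own positives plus the positives of every class it is known to exclude:

```python
def training_set(target, positives, excludes):
    """Positives for `target`, plus the positives of mutually exclusive
    classes re-used as negatives."""
    examples = [(text, +1) for text in positives[target]]
    for other in excludes.get(target, ()):
        examples += [(text, -1) for text in positives[other]]
    return examples


# Hand-labelled positive examples per class (invented descriptions).
positives = {
    "washing machines": ["8kg front-loading washing machine"],
    "carpet cleaners": ["upright carpet cleaner with shampoo tank"],
}

# Domain knowledge: these classes can never overlap.
excludes = {
    "washing machines": ["carpet cleaners"],
    "carpet cleaners": ["washing machines"],
}

before = training_set("washing machines", positives, excludes)

# Someone labels one more carpet cleaner...
positives["carpet cleaners"].append("compact carpet washer for stairs")

# ...and the washing machine SVM gains a negative example for free.
after = training_set("washing machines", positives, excludes)
print(len(before), len(after))  # 2 3
```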

Bishop adds that “another chance for reuse, that is apparent when considering a hierarchy, is that the positive training data for any child nodes, is also always positive training data for it’s parent. For example: ‘Jeans’ are always ‘Clothing’.”
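The parent/child re-use can be sketched in the same spirit. The `PARENT` mapping, the example descriptions and the function name below are hypothetical; the point is only that a parent’s positive set can be derived by walking down the hierarchy.

```python
# Hypothetical hierarchy: child class -> parent class.
PARENT = {"jeans": "clothing", "shirts": "clothing"}

# Positives labelled directly against each class (invented examples).
DIRECT_POSITIVES = {
    "clothing": ["wool winter coat"],
    "jeans": ["slim fit denim jeans"],
    "shirts": ["long sleeve oxford shirt"],
}


def positives_for(cls, direct=DIRECT_POSITIVES, parent=PARENT):
    """Own positives plus, recursively, those of every child class."""
    examples = list(direct.get(cls, []))
    for child, child_parent in parent.items():
        if child_parent == cls:
            examples.extend(positives_for(child, direct, parent))
    return examples


clothing_positives = positives_for("clothing")
print(clothing_positives)
```

Here ‘clothing’ ends up with three positive examples although only one was labelled against it directly, and ‘jeans’ keeps just its own.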

You can see how this sort of methodology quickly reduces the amount of ‘manual labour’ in providing training examples for SVMs, and offers greater flexibility.

Machine learning doesn’t have to be ‘that’ difficult

LoveTheSales is run by a small team, but that doesn’t preclude development of AI-based tech. The machine learning described above makes use of an open-source library and, according to McClure, the team is “working on an additional tier of this sort of technology which is basically a machine learning recommendation engine.”

It’s clear that McClure believes the use of machine learning, though democratised, is rather mythologised in the press and made to seem more difficult than it actually is.

However, he does note, when discussing the calculation of customer lifetime value, that “having your data in the right format” is key. As retailers improve their data infrastructure and work from reliable data, the machine learning part becomes much more achievable.

Are you a retailer using machine learning? Get in touch to let us know how, or leave a comment below.