In the first post in our series on analytics, we discussed descriptive analytics, which you can use to keep others informed about what has happened.
The next step on your analytics journey is to discover why something has happened, and for that you need diagnostic analytics. Here’s an overview of the practice, including an example, a step-by-step guide and some best practices.
Before we start, though, we’d like to let you know that Econsultancy runs an Advanced data analytics training course.
What is analytics?
In the previous post, we defined analytics in detail, but in essence analytics is a practice, a process and a discipline whose purpose is to turn data into actionable insight.
Diagnostic analytics overview
Previously, we discussed how descriptive analytics will tell you what just happened. To understand why, however, you need to do some more work. You need to perform diagnostic analytics.
In many cases, when there is a single ‘root cause’ of the situation, diagnostic analytics can be quick and simple – you just need to find that root cause.
But if no root cause is apparent, you need to use diagnostic techniques to discover a causal relationship between two or more data sets.
The analyst also needs to make clear which data is relevant to the analysis, so that the relationship between the data sets is easy to see.
An example of diagnostic analytics
In a descriptive report, you note that website revenue is down 8% from the same quarter last year. In an attempt to get ahead of your boss’s questions, you conduct diagnostic analytics to find out why.
First, you look for a root cause. Perhaps there was a change in ad spend, a rise in cart abandonments, or even a change to Google’s algorithm that affected your web traffic.
Finding nothing, you then look at the data sets that contribute to revenue: impressions, clicks, conversions, and new customer sign-ups.
You discover from the data that changes in revenue closely track changes in new customer sign-ups, so you isolate these two data series in a graph showing the relationship. This leaves you, or one of your colleagues, to conduct diagnostic analysis on new customer sign-ups to find out why they are down.
The distinguishing features of diagnostic analytics
Like descriptive analytics, diagnostic analytics relies on past ‘owned’ data but, unlike descriptive analytics, it will often include outside information if that helps determine what happened.
From the example above, it’s clear that domain knowledge is also more important with diagnostic analytics. External information from a wide range of sources should be considered in root cause analysis.
And, when comparing data sets in search of a relationship, statistical analysis, specifically regression analysis, may be required to reach a diagnosis (see point 2 below).
Finally, with diagnostic analytics you are trying to tell a story that isn’t apparent from the data alone, so the analyst needs to go ‘out on a limb’ and offer an opinion.
How to do diagnostic analytics
1) Identify something worth investigating
The first step in doing diagnostic analytics is to find something that is worth investigating. Typically this is something bad, like a fall in revenue or clicks, but it could also be an unexpected performance boost.
Either way, the change you’re looking to diagnose should be unusual, i.e. outside the normal range of variation for that metric, because trying to explain routine ups and downs in volatile data is a pointless exercise.
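One simple way to judge whether a change is genuinely unusual rather than ordinary noise is a z-score: how many standard deviations the latest value sits from its recent history. The sketch below assumes a threshold of 2 standard deviations, which is a common rule of thumb rather than anything prescribed here, so tune it to how volatile your data is. The function names and the weekly conversion figures are hypothetical.

```python
from statistics import mean, stdev


def z_score(history, latest):
    """How many (sample) standard deviations `latest` is from the mean of `history`."""
    sigma = stdev(history)  # needs at least two data points
    if sigma == 0:
        raise ValueError("history has no variation; a z-score is undefined")
    return (latest - mean(history)) / sigma


def is_worth_investigating(history, latest, threshold=2.0):
    """Flag a change only if it falls outside the series' normal range of variation."""
    return abs(z_score(history, latest)) > threshold


# Hypothetical weekly conversions for the previous eight weeks (mean 100, sd 2).
history = [100, 102, 98, 101, 99, 100, 103, 97]
print(is_worth_investigating(history, 103))  # False: z = 1.5, normal variation
print(is_worth_investigating(history, 80))   # True: z = -10, well outside it
```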
2) Do the analysis
As shown in the example above, diagnostic analytics may be as straightforward as finding a single root cause, e.g. revenue dropped last month because new customer sign-ups were down.
More complex analyses, however, may require multiple data sets and a search for a correlation using regression analysis. How to carry out regression analysis is beyond the scope of this post, but there are many excellent tutorials available to help you with it.
What you are trying to accomplish in this step is to find a statistically valid relationship between two data sets, where a rise (or fall) in one is consistently accompanied by a rise (or fall) in the other, and where there is good reason to believe one drives the other.
More advanced techniques in this area include data mining and principal component analysis, but straightforward regression analysis is a great place to get started.
3) Selectively filter your diagnoses
While it may be interesting that a variety of factors contributed to a change in performance, it’s not helpful to list every possible cause in a report.
Instead, an analyst should aim to identify the one or two most influential factors behind the issue being diagnosed.
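If you have scored several candidate factors (for instance with correlations or R-squared values from step 2), this filtering rule can be made explicit. The sketch below keeps at most two factors and drops anything weaker than 0.5 in absolute terms; both cut-offs, the `top_factors` name, and the factor names and scores are illustrative assumptions.

```python
def top_factors(scores, max_factors=2, min_strength=0.5):
    """Return at most `max_factors` (name, score) pairs whose absolute score
    is at least `min_strength`, strongest first."""
    strong = [(name, s) for name, s in scores.items() if abs(s) >= min_strength]
    strong.sort(key=lambda item: abs(item[1]), reverse=True)
    return strong[:max_factors]


# Hypothetical correlation scores for candidate causes of a revenue drop.
candidate_scores = {
    "new_signups": 0.92,
    "cart_abandonment": -0.71,
    "ad_spend": 0.55,
    "time_of_year": 0.18,
}
print(top_factors(candidate_scores))
# [('new_signups', 0.92), ('cart_abandonment', -0.71)]
```

Sorting on the absolute value keeps strong negative relationships (more abandoned carts, less revenue) in contention alongside positive ones.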
4) State your conclusion clearly
Finally, a diagnostic report must come to a conclusion and make a very clear case for it.
It does not have to include all of the background work, but you should:
- identify the issue you’re diagnosing,
- state why you think it happened, and
- provide your supporting evidence
Diagnostic analytics best practices
Here are a few more things to keep in mind when doing diagnostic analytics.
Correlation does not prove causation
Correlation will tell you when two variables (say clicks and conversions) move in sync with one another.
While it’s tempting to draw conclusions from that fact, the correlation must also make sense before it can be considered evidence of causation.
For some dramatic illustrations of why this is the case, please refer to this excellent collection of spurious (meaningless) correlations.
Be wary of using multiple explanatory variables
When doing regression analysis, you can always push up your ‘correlation score’ (R-squared) by adding more explanatory variables, because R-squared never decreases as variables are added.
Doing so should, however, be avoided, as you are both confounding your analysis (remember: keep it to two factors at most) and ‘overfitting’ your model. That is, you are no longer using reason to find an answer, but instead just throwing data at the problem and seeing what works.
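One standard way to see the cost of extra variables is adjusted R-squared, which penalises each additional predictor: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. In this sketch, using hypothetical values, three extra variables nudge R-squared from 0.60 to 0.62, yet adjusted R-squared falls, a sign that the extra variables are not earning their place.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_observations - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_observations - 1) / dof


# Hypothetical: 20 weekly observations.
one_predictor = adjusted_r_squared(0.60, 20, 1)
four_predictors = adjusted_r_squared(0.62, 20, 4)  # three extra variables added
print(f"{one_predictor:.3f} vs {four_predictors:.3f}")  # 0.578 vs 0.519
```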
But don’t be drawn to easy answers, either
When you are thinking of root causes or of possible correlations, think broadly of everything that could have affected the outcome.
Marketers are typically trying to explain campaign performance or web traffic, and the contributing factors are endless.
The number of paid impressions, changes in advertising creative and audience targeting are all obvious places to check, but also consider things like the time of year, competitive offers and platform algorithm changes.
Currently, analytics seems to be largely focused on describing data through reports. The potential of the practice, however, is far greater than displaying data and letting the audience draw their own conclusions.
Analysts can do better. They can provide deeper insight into the data by using diagnostic analytics to explain why certain things happen.
Ideally, marketing reports should contain both: descriptive charts and graphs to keep people informed about the systems and results that concern them, and separate diagnostic reports that aim to explain a significant phenomenon such as a decline in new business or a change in web browsing behaviour.
Not only will this help the reader to understand why some decisions have been made, but it also provides evidence that the report writer understands the data and the point of collecting it. That is, we collect data so that we can make better-informed decisions through analytics.