You know what one of my favourite feelings in the world is? 

Just to clarify, I mean at work. More specifically, one of the best feelings you can get when doing email marketing.

I love the feeling I get when one of my subject line tests teaches me something about my audience. What can I say? I’m a super cool dude who gets excited when a subject line delivers an amazing response. 

That moment when the opens, clicks and conversions start showing up and you’re like, “I’m the king/queen of email!” 

Yeah, I know you know that feeling too.

But that feeling is rare and fleeting, because most marketers completely screw up their email subject line split tests. 

In this post, you’ll learn how to feel that pleasure more often or, if you’d rather, how to avoid the pain of crappy split tests.

1. Not knowing what you’re testing until the last second

How often does this happen? You spend hours constructing a beautiful email campaign. It looks awesome, it’s responsive, and you’re convinced it’s the best-looking email of all time.

You then spend a while longer with your data team figuring out the perfect segment to send it to. You’ve done your propensity modelling, your demographic selections and whatnot. It’s the perfect group.

Then you spend a while uploading the creative into your ESP. You fight with their HTML editor for a bit (standard), test it in a few email clients, and pat yourself on the back for a job well done.

And now, you think, “Oh hey, I should really split test the subject line. That’s what you’re supposed to do, right?”

So you think of one line, loosely based on what you think probably worked last time. Then you think of a second one. And you click launch.

And you make a bunch of money from the email because, well, email works.

But here’s the thing: a split test, be it A/B or A/B…Z, shouldn’t be viewed as a quick way to make a few more bucks.

It should be viewed as a controlled experiment, because controlled experiments are how we learn about the world around us.

So check out these two subject lines, tested earlier this week in a campaign for a well-known publisher (I won’t name the publisher as they’re a client of mine… and I intend to keep them as a client, so naming and shaming is not a great idea):

A: “Subscribe now to to save up to $1.50 per issue!”

B: “For the latest trends subscribe now and get the best product reviews around”

Any guesses which one of the above won? The answer is A.

Why did it win?  Here are just a few of the reasons why version A could have won:

  • ‘Subscribe’ works better earlier than later in the subject line.
  • Exclamation points incite action.
  • Mentioning the price is good.
  • Including the brand name is good.
  • Including the industry name is bad.
  • People don’t care about the content in the publication.
  • ‘Save’ is a better word than ‘get’.
  • ‘Latest’ is a bad word.
  • Using 15 more characters is bad.
  • Leading with a second-person verb conjugation is good.
  • Leading with a prepositional phrase is bad.
  • Using the word ‘to’ three times in a subject line gets awesome results.
  • Using the word ‘and’ is bad.

I could go on and list off a few hundred more potential variables. How many more can you come up with?
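To see the problem mechanically, you can write a few of those candidate variables down as checks and count how many of them differ between A and B. The sketch below is illustrative only: the feature names and the `differing_features` helper are hypothetical, and the checks are rough guesses rather than a validated feature set.

```python
import re

# Hypothetical feature checks for a subject line (illustrative, not exhaustive)
FEATURES = {
    "starts_with_subscribe": lambda s: s.lower().startswith("subscribe"),
    "has_exclamation": lambda s: "!" in s,
    "mentions_price": lambda s: bool(re.search(r"\$\d", s)),
    "uses_save": lambda s: "save" in s.lower().split(),
    "uses_latest": lambda s: "latest" in s.lower().split(),
    "uses_and": lambda s: "and" in s.lower().split(),
    "count_of_to": lambda s: s.lower().split().count("to"),
    "length": len,
}

def differing_features(a: str, b: str) -> list[str]:
    """Return the names of the features on which subject lines a and b differ."""
    return [name for name, check in FEATURES.items() if check(a) != check(b)]

a = "Subscribe now to to save up to $1.50 per issue!"
b = "For the latest trends subscribe now and get the best product reviews around"
print(differing_features(a, b))

# A controlled variant changes exactly one thing: "!" becomes "."
print(differing_features(a, a.replace("!", ".")))
```

Every one of these checks differs between A and B, so the winning line cannot tell you which one mattered. The controlled variant differs on a single feature, which is what makes its result interpretable.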

This is the thing. By doing this split test, they learned nothing. 

Most people conduct split tests without a robust experimental design methodology. By doing this, you’re ignoring the whole point of split testing: to learn about the world around you, or, more specifically, to learn what drives your audience to respond to your messaging.

The subject line is one of the few causal variables you can control at the point of launch. If you follow a poor testing methodology, you run the risk of either learning nothing, or thinking you’ve learned things that aren’t true.

2. Focusing on one-offs, not longitudinal gains

OK, so let’s go back to the example above. Subject line A got about 1% more opens than B. Fantastic!

Most people will, at this stage, calculate a confidence level to determine statistical significance, and then say something like, “We are 95% confident that A is better than B”.

First of all, this is an incorrect interpretation. Strictly speaking, a result that is significant at 95% confidence means this: “If there were really no difference between A and B, a gap at least this large would turn up by chance less than 5% of the time”. It is not the probability that A is better than B.

Perhaps a slight semantic difference, but an important one.
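For reference, here is a minimal sketch of where that confidence number typically comes from, assuming a standard two-sided, two-proportion z-test with a pooled standard error (one common choice; your ESP or analytics tool may use a different test). The send and open counts are hypothetical, chosen only to illustrate.

```python
import math

def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Two-sided z-test for a difference between two open rates (pooled standard error)."""
    p_a = opens_a / sent_a
    p_b = opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (p_a - p_b) / se
    # p-value: chance of a gap at least this large IF there were no real difference
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 20% vs 19% open rate on 10,000 sends each
z, p = two_proportion_z_test(2000, 10_000, 1900, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

“95% confident” corresponds to p below 0.05. With these made-up numbers, a one-point gap on 10,000 sends per cell does not clear that bar.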

This isn’t the main issue, however. The main issue is the variance of variance.

Whoah. That’s a mouthful.

To illustrate, try this little experiment. Run an A/B test where everything in A and B is exactly the same (often called an A/A test), sent to random samples from your list. Same creative, same subject line, same everything.

Of course, these should give pretty much the same results… but sometimes they won’t. You may be surprised by how different the results can be.
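If you’d rather not burn a real send, you can get a feel for this with a simulation. The sketch below is a hypothetical model, not data from any real list: it assumes every recipient opens independently with the same 20% probability, randomly splits a 10,000-person list in half, and repeats the identical-email test 20 times. `simulate_aa_test` is an illustrative helper name.

```python
import random

def simulate_aa_test(list_size, true_open_rate, seed=None):
    """Randomly split a hypothetical list in half, send both halves the identical
    email, and return the two observed open rates. Any gap is pure chance."""
    rng = random.Random(seed)
    recipients = list(range(list_size))
    rng.shuffle(recipients)  # random assignment to the two cells
    half = list_size // 2
    cell_a, cell_b = recipients[:half], recipients[half:]
    # Each recipient opens independently with the same probability in both cells
    opens_a = sum(rng.random() < true_open_rate for _ in cell_a)
    opens_b = sum(rng.random() < true_open_rate for _ in cell_b)
    return opens_a / len(cell_a), opens_b / len(cell_b)

# 20 hypothetical A/A tests on a 10,000-person list with a 20% open rate
gaps = []
for i in range(20):
    rate_a, rate_b = simulate_aa_test(10_000, 0.20, seed=i)
    gaps.append(abs(rate_a - rate_b))
print(f"largest gap between identical emails: {max(gaps):.2%}")
```

Whatever gap the identical emails produce is your noise floor: a “win” of similar size in a real A/B test tells you very little.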

With large binomial samples that carry a lot of natural variance, the important thing to look at is the variance of the variance (how much the results themselves swing from test to test), not the confidence with which one hypothesis is proven or disproven.

(Note: a binomial distribution describes the number of “successes” across many trials where each trial has only two outcomes, like heads or tails. In an email context, each recipient either opens or doesn’t open, or converts or doesn’t convert.)
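A binomial model also tells you how much natural wobble to expect in a single send. The sketch below assumes every recipient opens independently with the same probability (a simplification) and uses the standard binomial mean n·p and standard deviation √(n·p·(1−p)). The 10,000 sends and 20% open rate are hypothetical.

```python
import math

def open_count_spread(sends, open_rate):
    """Mean and standard deviation of the number of opens, assuming each
    recipient opens independently with probability open_rate (binomial model)."""
    mean = sends * open_rate
    sd = math.sqrt(sends * open_rate * (1 - open_rate))
    return mean, sd

mean, sd = open_count_spread(10_000, 0.20)
# Normal approximation: roughly 95% of sends land within about 2 sd of the mean
low, high = mean - 2 * sd, mean + 2 * sd
print(f"expected opens: {mean:.0f}, typical range: {low:.0f} to {high:.0f}")
```

Under these assumptions, identical sends of this size can easily land dozens of opens apart purely by chance.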

What most people care about is how well A did vs. B, and the statistical significance of this result.  But this isn’t what you should care about if you want to learn about your audience.

Without considering and comparing the amplitude of variance across a series of tests, you run the risk of thinking something is more important than it actually is.

Looking for one-off wins (A vs B) is great if you’re a meth addict looking for your next hit. But we should be learning over time, so that we can apply the results in a robust and profitable manner.

What you should do is run a series of controlled experiments over time, and then learn from the longitudinal results, not just individual data points.
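As a rough sketch of what “learning from the longitudinal results” can look like: record the lift from every test of the same hypothesis, then look at the average and at how much the lift itself varies between tests. The open rates below are made up purely for illustration, and the price-versus-no-price hypothesis is a hypothetical example.

```python
import statistics

# Made-up open rates from five campaigns that each tested the SAME hypothesis,
# e.g. "mentioning the price beats not mentioning it" (illustration only).
# Each tuple is (open rate with price, open rate without price).
series = [
    (0.212, 0.205),
    (0.198, 0.201),
    (0.220, 0.207),
    (0.189, 0.190),
    (0.205, 0.196),
]

# Relative lift of the price variant in each individual test
lifts = [(with_price - without) / without for with_price, without in series]

mean_lift = statistics.mean(lifts)
spread = statistics.stdev(lifts)  # how much the lift itself swings from test to test
wins = sum(lift > 0 for lift in lifts)

print(f"mean lift {mean_lift:.1%}, std dev {spread:.1%}, won {wins} of {len(lifts)}")
```

In this made-up series the spread across tests is larger than the average lift, so any single win would have been a poor guide on its own.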

This requires a lot of planning, a lot of number crunching, and a lot of patience.  But it’s the only sound way to provide durable, predictable revenue uplift.

3. Confusing correlation with causation (aka the eighth deadly sin)

Have you ever been to Israel? Well, here’s an interesting fact.

Now, there are many different viewpoints on whether or not Israeli hummus is better than that of neighbouring nations. I’m not getting involved in that debate.

Anyway, the Israeli diet can broadly be described as Mediterranean: lots of olive oil, fresh fruit and veg, and healthy fish. This diet has been widely linked with a lower incidence of coronary heart disease.

And yet, Israeli Jews have a higher-than-average incidence of heart trouble.

So, for years, we’ve been told that the Mediterranean diet is good for a healthy heart. We’ve been told that the link is obviously causal.

Yet, an outlier like this shows that the link is not necessarily causal at all.

It is certainly a strong correlation, and the Mediterranean diet may well reduce the odds of getting heart disease. But there are clearly other variables at work here, such as smoking rates, genetic factors, exercise frequency and the like. For all we know, some element of the diet could even contribute to heart disease, and the assumed link is misleading!

For those of you who skipped statistics classes in college, let me refresh your memory:

A correlation occurs when variable X is related to variable Y. For example, when you see puddles in the street, it is often raining.

Causality occurs when variable X causes Y. For example, when it is raining, it causes puddles in the street.

See the difference? Puddles are related to rain, and rain causes puddles.

So, why does this matter? Because when you run controlled experiments it’s vital that you can identify which variables are causal, and which are correlative.
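Here is a small simulation of how a correlation can masquerade as a cause in email data, and how random assignment exposes it. Everything in it is a hypothetical assumption: in this made-up world only the send day affects opens, the “price in the subject line” style has no effect at all, and the team habitually uses price lines on weekends. Sampling noise is ignored to keep it short.

```python
import random

# Made-up world: ONLY the send day affects open rates.
# The "price in subject line" style has no causal effect at all.
WEEKEND_OPEN_RATE = 0.25
WEEKDAY_OPEN_RATE = 0.18

def average_open_rates(campaigns, randomize, seed=0):
    """Simulate many campaigns and return the mean open rate for subject
    lines with a price (key True) and without a price (key False)."""
    rng = random.Random(seed)
    rates = {True: [], False: []}
    for _ in range(campaigns):
        is_weekend = rng.random() < 0.3
        if randomize:
            # Controlled experiment: a coin flip decides, independent of send day
            uses_price = rng.random() < 0.5
        else:
            # Habit, not design: the team mostly uses price lines on weekends
            uses_price = rng.random() < (0.8 if is_weekend else 0.2)
        rates[uses_price].append(WEEKEND_OPEN_RATE if is_weekend else WEEKDAY_OPEN_RATE)
    return {uses_price: sum(r) / len(r) for uses_price, r in rates.items()}

observed = average_open_rates(2000, randomize=False)
controlled = average_open_rates(2000, randomize=True)
print(f"habit-driven: price {observed[True]:.1%} vs no price {observed[False]:.1%}")
print(f"randomized:   price {controlled[True]:.1%} vs no price {controlled[False]:.1%}")
```

In the habit-driven data, price lines look clearly better because they are correlated with weekend sends. Once a coin flip decides the subject line, the difference all but disappears, which is the only answer consistent with how this toy world was built.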

Taking our subject line example from point one, what if you decided the causal factor was “People don’t care about the content in the publication”? Fine, fair enough.

So tomorrow you send out another email whose subject line doesn’t mention the content, but you have no context with which to interpret the results from that campaign.

You’re effectively testing because you think you should, not because you’re learning about what makes your audience tick.

Am I right or am I wrong?

Who knows? But one thing I’m curious about is common practice across the industry: where people are with their email subject line split testing strategy.

So, do me a favour. If you do email marketing, take a couple of minutes and fill out this survey: The State of Split Testing

It’s the industry’s first ever look at how people run controlled subject line experiments in their business. I’ll be analysing and sharing the results in a future Econsultancy blog post – you’ll learn how you stack up against your peers, and where there are areas of opportunity.

Don’t screw up your split tests!

With a bit of rigour and methodology planning you can be your business’ subject line superhero. Or, you can be a subject line meth head, chasing that fleeting gain in subject lines week on week.

It’s up to you. All I know is that thousands of people much smarter than me have learned through experience how to use experimental design to learn about human behaviour.

Why should we be any different?

PS – That survey link is here.  Fill it out!