From selecting creative elements for testing to reaching statistical significance, this four-part blog series reviews basic and advanced tips for conducting a successful creative test.

Last time, we discussed how to prioritise and test keyword tokens and use dynamic keyword insertion to increase creative relevancy. Today, in part three of this four-part series, we’ll walk through how to prioritise tests based on return and the importance of limiting test elements.

Prioritise tests based on return

As paid search programs grow, it becomes increasingly challenging to implement and manage creative tests across all groups within an account.

To optimise creative at scale, prioritise tests to focus on groups with the most potential to shift overall account performance. These groups are characterised by a high share of impressions, clicks or conversions within an account.
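As a minimal sketch of this prioritisation, the snippet below ranks groups by their share of an account-level metric. The `Group` class, the `rank_by_share` helper and all of the figures are hypothetical and only illustrate the idea; real numbers would come from your own account reports.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    impressions: int
    clicks: int
    conversions: int


def rank_by_share(groups, metric="impressions"):
    """Return (name, share) pairs sorted by share of the account total, largest first."""
    total = sum(getattr(g, metric) for g in groups)
    if total == 0:
        # No volume anywhere: every group has a zero share.
        return [(g.name, 0.0) for g in groups]
    shares = [(g.name, getattr(g, metric) / total) for g in groups]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical weekly figures, for illustration only.
groups = [
    Group("B", impressions=1_000, clicks=30, conversions=2),
    Group("A", impressions=10_000, clicks=300, conversions=15),
    Group("Other", impressions=4_000, clicks=100, conversions=5),
]

ranking = rank_by_share(groups, "impressions")
for name, share in ranking:
    print(f"{name}: {share:.1%} of account impressions")
```

Passing "clicks" or "conversions" as the metric ranks the same groups by those shares instead.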

Due to limited resources, our fictional retailer, PowPow Sports, decided to only test creative in two of the groups within their account.

Group A received 10,000 impressions per week, while group B received 1,000. Each test resulted in equal improvements in performance within its respective group.

The table below highlights the improvements in group performance after creative testing and shows the potential performance of another, untested group, Other.

This example simplifies a common challenge: groups with little to no volume are often prioritised over higher-volume groups.

Though both groups benefited from a creative test, group A experienced a greater increase in clicks and conversions. Each test took the same amount of time to implement, but one resulted in a greater revenue return on time investment.
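The arithmetic behind this comparison can be sketched as follows. The impression counts come from the example above; the baseline CTR and the relative lift are assumed values chosen purely for illustration, and `incremental_clicks` is a hypothetical helper.

```python
def incremental_clicks(impressions, baseline_ctr, relative_lift):
    """Extra clicks gained when CTR improves by `relative_lift` (0.25 means +25%)."""
    return impressions * baseline_ctr * relative_lift


BASELINE_CTR = 0.02   # assumed: both groups start at a 2% CTR
RELATIVE_LIFT = 0.25  # assumed: each test improves CTR by 25%

gain_a = incremental_clicks(10_000, BASELINE_CTR, RELATIVE_LIFT)
gain_b = incremental_clicks(1_000, BASELINE_CTR, RELATIVE_LIFT)

print(f"Group A: about {gain_a:.0f} extra clicks per week")
print(f"Group B: about {gain_b:.0f} extra clicks per week")
```

Because the relative improvement is the same in both groups, the absolute gain scales with volume: under these assumptions group A gains roughly ten times as many clicks as group B for the same implementation effort.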

Prioritising creative tests for high-volume groups has the greatest potential for incremental improvements in overall account performance.

Limit test elements

A new creative might be subject to one or many test elements. It can be triggered by a single set or multiple sets of keyword tokens, and it might share impressions with one or many other creative within the group.

Without controlling these variables, it becomes difficult to reach statistical significance and to determine what factors contributed towards a successful or unsuccessful creative test.

Limiting the number of elements within a creative test makes it easier to identify why one creative performed better than another.

For example, assume that PowPow Sports is testing two new creative. One, creative B, tests a free shipping offer; the other, creative C, tests several formatting and language elements. Even if creative C performed better, it would be unclear which of its test elements contributed to its success.

Testing each element one at a time will better determine its individual impact on creative performance.

Good Test

Bad Test

To promote an optimal creative testing environment, keep keyword lists concise when building out new campaigns and groups. Groups that contain a small set of highly granular keywords allow the creative within that group to focus on a small set of tokens.

Rather than having to test tokens to improve relevancy, creative within these groups can test compelling offers and calls-to-action that drive greater increases in CTR and conversion rate.

The rate at which a creative test reaches statistical significance depends on the number of creative within the group, because impressions are split among them. Testing a large number of creative requires a large number of impressions.

With smaller, low volume groups, this requirement becomes an issue. For a group that receives only 1,000 total monthly impressions, testing ten creative variations might take several months to reach statistical significance.
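To see why the number of creative matters, here is a rough sketch that estimates how long a test might need. It uses the standard sample-size approximation for comparing two proportions; the baseline CTR, expected lift, significance level and power are all assumptions, impressions are assumed to rotate evenly, and no correction is made for comparing many variations at once.

```python
from math import ceil
from statistics import NormalDist


def impressions_per_creative(base_ctr, relative_lift, alpha=0.05, power=0.8):
    """Approximate impressions each creative needs to detect the lift (two-sided test)."""
    new_ctr = base_ctr * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = base_ctr * (1 - base_ctr) + new_ctr * (1 - new_ctr)
    return ceil((z_alpha + z_beta) ** 2 * variance / (new_ctr - base_ctr) ** 2)


def months_to_significance(monthly_impressions, n_creative, needed_per_creative):
    """Months until every creative has collected enough impressions, with even rotation."""
    per_creative_per_month = monthly_impressions / n_creative
    return needed_per_creative / per_creative_per_month


# Assumed inputs: 5% baseline CTR and a 50% relative lift worth detecting.
needed = impressions_per_creative(base_ctr=0.05, relative_lift=0.5)

for n in (2, 10):
    months = months_to_significance(1_000, n, needed)
    print(f"{n} creative: about {months:.1f} months")
```

The absolute figures are very sensitive to the assumed baseline and lift. The point of the sketch is the scaling: with a fixed monthly volume, testing ten creative takes five times as long as testing two.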

For larger, high volume groups, reaching statistical significance is less of a concern. However, the opportunity cost of running on underperforming creative must be monitored much more closely.

Underperforming creative within these groups accrue a high volume of impressions that would be better served by top-performing creative, so they should be paused once statistical significance is reached.
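One way to decide when that point has been reached is a two-proportion z-test on CTR, sketched below. This is one common choice of test rather than the only valid one; the function names, the 5% threshold and the sample figures are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(clicks_a, imps_a, clicks_b, imps_b):
    """Two-sided p-value for the difference in CTR between two creative."""
    ctr_a = clicks_a / imps_a
    ctr_b = clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    if std_err == 0:
        # Both creative have 0% or 100% CTR: no measurable difference.
        return 1.0
    z = (ctr_a - ctr_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def should_pause(clicks_low, imps_low, clicks_top, imps_top, alpha=0.05):
    """True when the first creative has a lower CTR and the gap is significant."""
    is_worse = clicks_low / imps_low < clicks_top / imps_top
    p_value = two_proportion_p_value(clicks_low, imps_low, clicks_top, imps_top)
    return is_worse and p_value < alpha


# Hypothetical results: (clicks, impressions) for the weaker and stronger creative.
print(should_pause(150, 10_000, 250, 10_000))  # high-volume group
print(should_pause(20, 1_000, 25, 1_000))      # low-volume group
```

With these sample figures the high-volume comparison is flagged for pausing, while the low-volume one is not yet conclusive and would keep running.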