Statistical significance, or "stat sig" for short, means that a result can be attributed to a specific cause rather than to random chance.
In the case of testing ad creative, reaching stat sig means that if you ran the same test multiple times, you would see similar results at least 80% of the time.
One of the greatest determining factors of stat sig is sample size. In ad creative testing, that means the number of people who were shown a given version of the ad creative.
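To make the sample-size point concrete, here's a quick sketch in Python. It runs the same significance test on the same relative lift at two different sample sizes; the counts are invented, and the G-test it uses is the one this post comes back to below.

```python
# Illustrative only: the same 20% relative lift in conversion rate,
# measured at two different sample sizes. All counts are made up.
from scipy.stats import chi2_contingency

for n in (200, 5000):  # impressions per variant
    a_conv = round(n * 0.050)  # variant A converts at 5.0%
    b_conv = round(n * 0.060)  # variant B converts at 6.0%
    table = [[a_conv, n - a_conv], [b_conv, n - b_conv]]
    # lambda_="log-likelihood" selects the G-test instead of Pearson's chi-square
    p = chi2_contingency(table, lambda_="log-likelihood")[1]
    print(f"n = {n:>5} per variant -> p = {p:.3f}")

# The identical lift is inconclusive at n=200 but significant at n=5000:
# sample size, not the size of the lift alone, is what drives stat sig.
```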
Facebook was the first to bring this measure of validity to most marketers’ attention in the world of paid social. Exiting Learning Mode (i.e. reaching 50 conversions) has long been the stat sig standard that most marketers stand by.
But for today’s most reliable ad testing method — multivariate testing (MVT) — it’s not always realistic. Because MVT involves testing a large number of ads and creative elements at once, reaching statistical significance for each ad and element is more difficult than with, say, A/B testing.
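To put rough numbers on it: splitting 100,000 impressions between two A/B variants leaves 50,000 impressions per ad, while spreading the same traffic across a hypothetical 48-ad multivariate grid (four images x four headlines x three calls to action) leaves only about 2,000 impressions per ad.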
There are two silver linings to this cloud:
1. With the right tool, you can reach stat sig without hitting 50 conversions.
2. Even without stat sig, there are still clear early winners and losers you can act on.
Let’s double click on both of these insights.
With the right tool, you can reach stat sig without hitting 50 conversions. Marpipe is the only automated ad testing platform with a live statistical significance calculator built right in. We call it the Confidence Meter, and it tells you if your data for each variant group is scientifically proven — or not.
Each multivariate test run on Marpipe contains multiple variant groups — some of which reach high confidence sooner than others. When a variant group reaches high confidence, it means you have enough data to make creative decisions. And when enough variant groups reach a high confidence level, you can move on to your next test.
Look to the Confidence Meter to understand:
The Confidence Meter does NOT tell you that:
Gray means:
Yellow means:
Green means:
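For intuition only, here is a hypothetical sketch of how a meter like this could map a variant group's data to a color. The thresholds, the color semantics, and the function itself are assumptions for illustration, not Marpipe's actual implementation; the 0.20 cutoff loosely echoes the 80% figure mentioned earlier.

```python
# Hypothetical sketch; NOT Marpipe's actual Confidence Meter logic.
# Assumes gray = not enough data yet, yellow = a promising trend,
# green = high confidence. All thresholds below are invented.
from scipy.stats import chi2_contingency

def confidence_color(conv_a: int, n_a: int, conv_b: int, n_b: int) -> str:
    if min(n_a, n_b) < 100:  # assumed minimum sample per variant group
        return "gray"        # not enough data to run the test meaningfully
    table = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    p = chi2_contingency(table, lambda_="log-likelihood")[1]  # G-test p-value
    if p < 0.05:
        return "green"   # high confidence: enough data to make creative decisions
    if p < 0.20:
        return "yellow"  # roughly the 80%-confidence zone: keep the test running
    return "gray"        # no detectable difference so far

print(confidence_color(48, 1000, 71, 1000))  # -> "green" on these made-up counts
```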
What underlying statistical methods do we use?
Marpipe uses the G-test (also known as the likelihood-ratio test), which determines whether the proportions of outcomes, such as conversions versus non-conversions, differ significantly between two or more groups. It has been a standard significance test in science and statistics for decades.
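As a minimal illustration with invented counts, SciPy exposes the G-test through its chi2_contingency function when you pass lambda_="log-likelihood":

```python
# G-test (likelihood-ratio test) on invented counts for two variant groups.
from scipy.stats import chi2_contingency

# Rows: variant groups; columns: converted vs. did not convert.
observed = [
    [30, 970],  # group A: 30 conversions out of 1,000 people
    [52, 948],  # group B: 52 conversions out of 1,000 people
]

g_stat, p_value, dof, expected = chi2_contingency(
    observed, lambda_="log-likelihood"  # "log-likelihood" selects the G statistic
)
print(f"G = {g_stat:.2f}, p = {p_value:.4f}")
# A small p-value (conventionally < 0.05) means the difference in
# conversion proportions is unlikely to be random chance.
```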
Why don't we apply multiple-comparison corrections?
Marpipe lets you break down and analyze your results in a nearly infinite number of ways, or just one. If we applied a statistical correction for every possible breakdown, the significance threshold would become so strict that you would be more likely to conclude there is no meaningful result when, in fact, there is one.
Because of this, we highly suggest deciding on a primary hypothesis before running a test; checking many breakdowns after the fact raises the odds of a spurious "winner." And when a pattern seems to emerge across tests, we also suggest creating a new test to validate it.
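To see why, here's a small simulation sketch: two variants share the same true conversion rate, yet checking twenty breakdowns without a pre-chosen hypothesis flags a spurious winner in most runs. All counts and thresholds are invented for illustration.

```python
# Hypothetical simulation: what happens if you trust every breakdown.
# Both variants share the SAME true conversion rate, so any "winner" is noise.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n_sims, n_breakdowns, alpha = 1000, 20, 0.05
runs_with_false_positive = 0

for _ in range(n_sims):
    flagged = False
    for _ in range(n_breakdowns):
        a = rng.binomial(500, 0.05)  # variant A conversions in this breakdown
        b = rng.binomial(500, 0.05)  # variant B, same true 5% rate
        table = [[a, 500 - a], [b, 500 - b]]
        p = chi2_contingency(table, lambda_="log-likelihood", correction=False)[1]
        if p < alpha:
            flagged = True
    runs_with_false_positive += flagged

print(f"{runs_with_false_positive / n_sims:.0%} of runs flagged a spurious winner")
# Theory predicts roughly 1 - 0.95**20 ≈ 64% of runs: checking 20 breakdowns
# without a pre-chosen hypothesis makes a false "winner" more likely than not.
```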
Not reaching stat sig does not render MVT unreliable. There are still clear winners and losers, even with smaller sample sizes per ad. It just means we have to look at early indicators of success rather than stat sig to help us make quick decisions about which ads and creative elements are or aren’t performing.
In short, we need to analyze any creative outliers rising to the top to see if those ads and creative elements are worth further testing that could eventually get us to statistical significance.
Early indicators can look like this:
It’s also wise to take another look at your results using a different KPI. An ad or creative element that hasn’t reached stat sig for purchases may well have reached it for clicks, simply because clicks happen far more often than purchases. That’s a good sign that, given more time and budget, the ad or element could reach stat sig for purchases, too.
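As a sketch of that idea, here are invented counts for the same two variants scored on two KPIs; clicks are common enough to reach significance, while purchases are not.

```python
# Illustrative counts only: one pair of variants, two KPIs.
from scipy.stats import chi2_contingency

def g_test_p(successes_a, successes_b, n=2000):
    """p-value of a G-test on two groups of n impressions each."""
    table = [[successes_a, n - successes_a], [successes_b, n - successes_b]]
    return chi2_contingency(table, lambda_="log-likelihood")[1]

# Clicks happen often, so the gap between variants reaches stat sig...
print(f"clicks:    p = {g_test_p(120, 170):.3f}")  # well under 0.05
# ...while purchases are rare, so the same directional gap does not (yet).
print(f"purchases: p = {g_test_p(6, 10):.3f}")     # far above 0.05
```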
Statistical significance is an important hallmark of data validity. But the old 50-conversion benchmark isn’t always possible — or necessary. Combine this knowledge with the right multivariate tool, and you can achieve stat sig while testing large numbers of ads and creative elements at once.