A/B Testing in Data Science

Understand A/B testing from a data science perspective.

What is A/B Testing in Data Science?

A/B testing is a type of experiment in which you split your web traffic or user base into two groups and show each group a different version of a web page, app, email, and so on, with the goal of comparing the results to find the more successful version. In an A/B test, one element is changed between the original (a.k.a. “the control”) and the test version to see whether this modification has any impact on user behavior or conversion rates.

From a data scientist’s perspective, A/B testing is a form of statistical hypothesis testing or a significance test.

A/B Testing need-to-know terms

The data science behind A/B testing can get complex pretty quickly, so we’ve highlighted a few need-to-know terms to cover the basics.

Null hypothesis 

The null hypothesis, or H0, posits that there is no difference between the two versions being compared. In A/B testing, the null hypothesis assumes that changing one element on a web page (or marketing asset) has no impact on user behavior.

Alternative hypothesis 

On the flip side, an alternative hypothesis suggests the opposite of the null hypothesis: that changing an element will impact user behavior. Take the example below: 

Null hypothesis: The size of a call-to-action button does not impact click rates. 

Alternative hypothesis: Larger call-to-action buttons result in higher click rates.

Statistical significance

Statistical significance indicates that the results of an A/B test are unlikely to be due to chance alone (that is, you can reject the null hypothesis).

This is measured with the p-value, or probability value: the probability of seeing results at least this extreme if the null hypothesis were true. So, a low p-value says it’s unlikely the results of the A/B test were due to random chance.

A common rule of thumb is that when the p-value is 0.05 (5%) or lower, the A/B test is considered statistically significant.
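
To make this concrete, here is a minimal sketch of a two-sided, two-proportion z-test in Python. The visitor and conversion counts are made up purely for illustration:

```python
from scipy.stats import norm

# Hypothetical results: these visitor and conversion counts are made up for illustration.
control_visitors, control_conversions = 10_000, 520   # 5.2% conversion rate
variant_visitors, variant_conversions = 10_000, 580   # 5.8% conversion rate

p_control = control_conversions / control_visitors
p_variant = variant_conversions / variant_visitors

# Pooled conversion rate and standard error under the null hypothesis (no real difference)
p_pooled = (control_conversions + variant_conversions) / (control_visitors + variant_visitors)
se = (p_pooled * (1 - p_pooled) * (1 / control_visitors + 1 / variant_visitors)) ** 0.5

# Two-proportion z-test: how many standard errors apart are the two conversion rates?
z = (p_variant - p_control) / se
p_value = 2 * norm.sf(abs(z))   # two-sided p-value

print(f"control: {p_control:.1%}, variant: {p_variant:.1%}, z = {z:.2f}, p-value = {p_value:.3f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

With these made-up numbers the p-value comes out around 0.06, so the test would narrowly miss the 5% threshold even though the variant looks better on the surface.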

Confidence level

Think of the confidence level as the flip side of the p-value threshold. The confidence level indicates how confident you can be that the results of your experiment are due to the changed variable rather than random chance (that is, that the results are not a fluke).

If a test is considered statistically significant when the p-value is 5% or lower, then the corresponding confidence level is 95%.
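
As a rough illustration, here is a sketch of a 95% confidence interval for the difference in conversion rates, reusing the hypothetical counts from the p-value example above:

```python
from scipy.stats import norm

# Reusing the hypothetical counts from the p-value sketch above (made up for illustration).
control_visitors, control_conversions = 10_000, 520
variant_visitors, variant_conversions = 10_000, 580

p_control = control_conversions / control_visitors
p_variant = variant_conversions / variant_visitors
diff = p_variant - p_control

# Unpooled standard error for the difference in conversion rates
se = (p_control * (1 - p_control) / control_visitors
      + p_variant * (1 - p_variant) / variant_visitors) ** 0.5

confidence = 0.95
z_crit = norm.ppf(1 - (1 - confidence) / 2)   # ~1.96 for a 95% confidence level

low, high = diff - z_crit * se, diff + z_crit * se
print(f"observed lift: {diff:.2%}, {confidence:.0%} CI: [{low:.2%}, {high:.2%}]")
# If the interval contains 0, the result is not significant at the matching 5% level.
```

For these numbers the interval runs from slightly below zero to roughly +1.2%, which contains zero and matches the not-quite-significant p-value above.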

Frequently asked questions

What should you consider from a data perspective before A/B testing?

Having the right data infrastructure in place is essential to ensure accuracy (and speed) in A/B testing. Data needs to be cleaned, consolidated, and updated in real time for teams to gain valuable insight. A few things to consider from a data perspective when it comes to A/B testing:

  • Understand your baseline before you begin. That is, how are your web pages, paid ad campaigns, and so forth, performing now? This baseline will help you understand what’s working versus what needs improvement, and will provide context for future A/B tests. You can also run an A/A test, which shows the same page to two different groups; this helps confirm that your testing software and traffic split aren’t introducing drastic differences in user behavior before you kick off your experimentation program.

  • Determine who the target audience is in your experiment (e.g., new leads vs. current customers, marketing vs. developer personas, etc.)

  • Know the sample size you will need to reach statistical significance. (An online sample size calculator can help determine this, and there’s a rough sketch of the calculation after this list.)

  • Question whether the results of the experiment are simply due to a “novelty effect” (i.e., when consumers gravitate to something new out of curiosity rather than genuine usefulness).
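
As referenced above, here is a minimal sketch of the standard sample size calculation for a two-sided, two-proportion test. The baseline rate, target rate, significance level, and power below are assumptions you would replace with your own:

```python
from scipy.stats import norm

def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.80):
    """Rough sample size per group for a two-sided, two-proportion test.

    p_baseline is the current conversion rate; p_target is the rate you want
    to be able to detect. The 5% significance level and 80% power defaults
    are common conventions, not requirements.
    """
    z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # ~0.84 for 80% power
    p_bar = (p_baseline + p_target) / 2

    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p_baseline * (1 - p_baseline)
                             + p_target * (1 - p_target)) ** 0.5) ** 2
    return int(numerator / (p_target - p_baseline) ** 2) + 1

# Example: detecting a lift from a 5% to a 6% conversion rate (hypothetical numbers)
print(sample_size_per_group(0.05, 0.06))   # roughly 8,200 visitors per group
```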

What are some examples of A/B tests?

Businesses across all industries can find value in A/B tests. For a B2C e-commerce website, the marketing team might run an experiment in which two different versions of a call-to-action are shown to customers to see whether the change has an impact on clicks. A B2B company might run an A/B test on the subject line of its nurture emails to see which generates a higher open rate.

How long should an A/B test run?

While there are some varying opinions on this, an A/B test should typically run for at least one to three weeks, depending on the sample size needed to reach statistical significance.
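
For a rough sense of how sample size translates into run time, here is a small sketch using hypothetical traffic numbers; both the required sample size and the daily visitor count are assumptions:

```python
# Hypothetical planning numbers; swap in your own sample size estimate and traffic.
required_per_group = 8_200     # e.g., from the sample size sketch above
groups = 2                     # the control plus one variant
daily_visitors = 2_000         # total daily traffic, split evenly across versions

days_needed = required_per_group * groups / daily_visitors
print(f"roughly {days_needed:.0f} days at this traffic level")   # ~8 days
```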

What confidence level indicates statistical significance?

It’s generally accepted that reaching a 95% confidence level (a p-value of 0.05 or lower) in an A/B test indicates that you’ve reached statistical significance.