A/B TESTING PITFALLS AND BEST PRACTICES
WHO'S THIS DUDE? ALEJANDRO PARDO LOPEZ Client Side Developer && Team Leader @ Booking.com @apardolopez
WHAT IS A/B TESTING?
Test different versions of a website or feature by randomly assigning users to two or more groups, exposing each group to a different variant of the website/feature, and comparing the impact of the variants against each other
WHY A/B TESTING?
Ability to detect and measure the real impact of our changes in the UX or performance
NO HiPPOs, NO EXPERT OPINION (HiPPO: Highest Paid Person's Opinion)
COMMON A/B TEST
2-variant test (Base and Variant)
User base split 50% - 50%
EXAMPLES
Source: http://unbounce.com/a-b-testing/shocking-results/
WHAT DO WE WANT TO MEASURE
USUAL SUSPECTS
Conversion: successful sign-ups / purchases / clicks on a CTA, per visitor
Bounce rates
Time spent
Other user metrics (e.g. navigation times)
OTHER USEFUL METRICS
Front-end performance (page load times, navigation times)
Back-end performance (CPU wallclock, SQL wallclock)
Errors
External impact (e.g. number of Customer Care tickets)
BEYOND THE TYPICAL A/B TEST
MULTIVARIANT TESTS (I.E. MORE THAN 2 VARIANTS)
Test multiple variations of the same feature
Compare each variant against the others
The more variants, the bigger your user base needs to be to detect a change
Use a power calculator to determine how many users you need to detect a given amount of impact, e.g. http://www.evanmiller.org/ab-testing/sample-size.html
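As a rough sketch of what such a power calculator computes (assumptions: two-sided test at alpha = 0.05 with 80% power, z-values hardcoded, normal approximation for comparing two proportions; the function name is made up):

```javascript
// Approximate sample size per variant for comparing two proportions.
// baseline: current conversion rate (e.g. 0.20)
// mde: minimum detectable effect, absolute (e.g. 0.05 means 20% -> 25%)
function sampleSizePerVariant( baseline, mde ) {
    var zAlpha = 1.96;   // z for alpha/2 = 0.025 (two-sided, alpha = 0.05)
    var zBeta  = 0.8416; // z for 80% power
    var p1 = baseline;
    var p2 = baseline + mde;
    var variance = p1 * ( 1 - p1 ) + p2 * ( 1 - p2 );
    return Math.ceil( Math.pow( zAlpha + zBeta, 2 ) * variance / ( mde * mde ) );
}

// Detecting a 5 percentage point lift on a 20% baseline needs
// roughly 1100 visitors per variant; halving the detectable
// effect roughly quadruples the required sample.
var n = sampleSizePerVariant( 0.20, 0.05 );
```

Note how quickly the required sample grows as the effect you want to detect shrinks; with many variants, each one needs this many visitors on its own.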
EXAMPLES OF MULTIVARIANTS
CTA colour (Google's famous 41 shades of blue)
Copy experiments (CTAs, email headlines)
REDUCED USER GROUP (E.G. 10% OR LESS OF TOTAL TRAFFIC)
Expose experimental features to a reduced group of users for early feedback
Early detection of errors
Enable potentially dangerous code safely (e.g. heavy DB queries)
GRACEFUL DEGRADATION - EMERGENCY SWITCHES Disable lightactions Reduce data shown to reduce queries to overloaded DB Hide buttons that lead to pages in trouble (e.g. in another datacenter that is under pressure)
A/A EXPERIMENTS No change Used for validating the tracking framework and analysis report
INTERPRETING RESULTS OF A/B TESTS
CONCLUSIVE VS INCONCLUSIVE RESULTS
CONCLUSIVE RESULTS
The difference measured between the variants is statistically significant: we are confident it is not due to chance
WHUT???!!!!
IN OTHER WORDS...
We can confidently say which one is the winner (or loser)
Source: https://developer.amazon.com/sdk/ab-testing.html
INCONCLUSIVE RESULTS
If there was an effect, it was too small to be measured
SECONDARY METRICS FTW!!!
Secondary sign-ups
Performance impact
Errors
An inconclusive result can also be the target impact (e.g. a code refactor that should change nothing)
TRUSTWORTHY DATA
"When running online experiments, getting numbers is easy; getting numbers you can trust is hard" (Ron Kohavi)
TRUSTWORTHY DATA Without data you can trust, you cannot make a decision. Basically, you know nothing about the results of your test
ROBOTS
They can bias your results:
Visitor numbers will be inflated
Visitor numbers can be altered in just one variant, making distributions uneven
Conversion rates can be affected as well, due to the increase in visitors
But also clicks! Some robots execute JavaScript
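A minimal sketch of user-agent based bot filtering (the pattern list is illustrative, not exhaustive; real bot detection also looks at behaviour, request rates, and IP ranges, and robots that fake their user agent will slip through):

```javascript
// Very naive bot check based on the User-Agent string.
// Catches only crawlers that declare themselves.
var BOT_PATTERN = /bot|crawler|spider|crawling|slurp|mediapartners/i;

function looksLikeBot( userAgent ) {
    return BOT_PATTERN.test( userAgent || '' );
}

// Exclude such visitors before assigning them to a variant, so they
// inflate neither visitor counts nor conversion denominators.
```

Filtering has to happen consistently in both variants; dropping robots from only one side skews the distributions just as badly as counting them.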
INTERFERING EXPERIMENTS Modifications on same features running at the same time can bias results
INTERFERING EXPERIMENTS E.g. button color change and position change
TRACKING
AKA PUTTING USERS IN YOUR EXPERIMENT
WRONG TRACKING === USELESS DATA ...and wasted time... ...and unmeasured impact on the site...
...and rage++...
TRACKING CHALLENGES
ASSIGN USERS TO VARIANTS RANDOMLY Distribution of visitors should match the expected split
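One common way to get a stable, roughly uniform split is to hash the user id together with the experiment name (a sketch; the polynomial hash here is illustrative, not what any particular framework uses):

```javascript
// Deterministic variant assignment: the same user always lands in the
// same variant of a given experiment, and different experiments get
// independent splits because the experiment name is part of the hash.
function hashString( str ) {
    var h = 0;
    for ( var i = 0; i < str.length; i++ ) {
        h = ( h * 31 + str.charCodeAt( i ) ) >>> 0; // keep a 32-bit unsigned int
    }
    return h;
}

function assignVariant( userId, experiment, percentInB ) {
    var bucket = hashString( userId + ':' + experiment ) % 100;
    return bucket < percentInB ? 'b' : 'base';
}
```

Because assignment is a pure function of user id and experiment name, no per-user state needs to be stored, and the observed split can be checked against the expected one as a sanity check on the tracking.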
AVOID NOISE
Track only users who are actually exposed to the change
Otherwise, spotting a change in the results is much harder, and the experiment has to run longer
e.g. tracking everyone visiting the website when the change is only on the product page
TRACK ALL VARIANTS Don't forget any (e.g. track base)
USING JAVASCRIPT FOR TRACKING
VERY POWERFUL More precise tracking (e.g. tracking based on user interactions)
TRACK USERS ONLY WHEN THEY ARE EXPOSED TO THE CHANGE
Lightboxes
The change is actually viewed in the browser viewport
BUT WEAKER TOO
Sensitive to JS errors
Cookies can be overridden by HTTP requests (use server-side cookies instead)
EXAMPLE TRACKING API
// The tracking API: track( feature ) assigns the visitor to a variant and returns it

if ( track( featureA ) === 'b' ) {
    /* Cool stuff */
}
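A minimal sketch of what a track() function like this might do (assumptions: the server renders the variant assignments into the page as an `assignments` map, and the exposure beacon is left as a comment; none of these names come from a real framework):

```javascript
// Hypothetical minimal track(): returns the visitor's variant for a
// feature and reports the exposure the first time it is called.
var assignments = { featureA: 'b', featureB: 'base' }; // rendered by the server
var reported = {};

function track( feature ) {
    var variant = assignments[ feature ] || 'base'; // unknown features fall back to base
    if ( !reported[ feature ] ) {
        reported[ feature ] = true;
        // In a real page this would fire a tracking pixel or AJAX call:
        // new Image().src = '/track?feature=' + feature + '&variant=' + variant;
    }
    return variant;
}
```

Reporting the exposure inside track() itself is what makes the pitfalls below possible: the visitor is only counted if the call is actually reached.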
$('.item').on('click', function( e ){
    var title = 'Base title for lightbox';

    /* Do some stuff */

    showLightbox({ title: title });
});
$('.item').on('click', function( e ){
    var title = 'Base title for lightbox';

    /* Do some stuff */

    if ( track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }

    showLightbox({ title: title });
});
TRACKING PITFALL #1
$('.item').on('click', function( e ){
    var title = 'Base title for lightbox',
        position;

    /* Do some stuff */
    // If '#elem' does not exist, offset() returns undefined and the
    // next line throws, so track() below is never reached and the
    // visitor is silently dropped from the experiment.
    position = $('#elem').offset().top;
    console.log( position );
    /* End do some stuff */

    if ( track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }

    showLightbox({ title: title });
});
TRACKING PITFALL #1
Track as early as possible, but at the point where the change is shown
TRACKING PITFALL #2
$('.item').on('click', function( e ){
    var title = 'Base title for lightbox',
        content;

    /* Do some stuff */
    // Synchronous call, returns default content or variant B content.
    content = readContentFromServer() || {};
    /* End do some stuff */

    // Bug: when useVariantBcontent is falsy, the && short-circuits and
    // track() never runs, so visitors in base are never tracked.
    if ( content.useVariantBcontent && track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }

    showLightbox({ title: title });
});
TRACKING PITFALL #2 Always track base
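A fixed version of the decision from pitfall #2, extracted into a plain function so the logic is easy to see (the function name is made up, and track and content are passed in to keep the sketch self-contained): track() runs first and unconditionally, so base visitors are always counted.

```javascript
// track() is called before any content check, so the exposure is
// reported for base visitors too; the variant content is only a
// second condition for actually changing the title.
function pickLightboxTitle( track, content ) {
    var inVariantB = track( 'featureA' ) === 'b'; // always tracks, base included
    if ( inVariantB && content.useVariantBcontent ) {
        return 'New title for lightbox';
    }
    return 'Base title for lightbox';
}
```

The same rule applies whatever the surrounding handler looks like: evaluate track() outside any condition that depends on variant-only state.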
"ADVANCED" TRACKING
TRACK WHEN AN ELEMENT BECOMES VISIBLE
Useful for elements below the fold, that require scrolling to be seen
// Footer content is changed in the template, based on the variant
(function(){
    track.onView( '#selector', feature );
})();
// Simple onView implementation
track.onView = function( selector, feature ) {
    if ( !selector || !feature ) return;

    var data = { selector: selector, feature: feature };

    var trackIfVisible = function( data ){
        if ( isVisible( data.selector ) ) {
            track.feature( data.feature );
            return true;
        }
        return false;
    };

    if ( !trackIfVisible( data ) ) {
        // Throttled so we don't run the whole check on every scroll event
        throttle( trackIfVisible, data ).on( 'scroll' );
    }
};
TRACKING PITFALL #3
(Slide visuals: the same page, with "Some content", "New content added by variant B", and "Some other content", shown three times at different scroll positions relative to the fold)
TRACKING PITFALL #3 onView is sensitive to the element's position on the page; you might get visitor distribution issues between variants
TRACK WHEN USER NAVIGATES AWAY
Get me outta here!
TRACKING PITFALL #4
Assume you might lose some visitors in the experiment
Firing a tracking pixel or an AJAX call while the browser is loading another page is completely unreliable
You can store the feature in localStorage or a cookie and track it on the next page load (still not 100% reliable)
Alternatively, pass a parameter in the URL so the server can do the tracking when rendering the next page
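A sketch of the localStorage approach described above: persist the pending exposure before the user navigates away, and flush it on the next page load. The storage object is injected so the sketch works with anything exposing getItem/setItem (such as window.localStorage); the key name and function names are made up.

```javascript
// Queue a feature exposure that could not be reported reliably
// (e.g. the user is already navigating away), then report the
// queued exposures on the next page load.
var PENDING_KEY = 'pendingTracks'; // illustrative key name

function queueTrack( storage, feature ) {
    var pending = JSON.parse( storage.getItem( PENDING_KEY ) || '[]' );
    pending.push( feature );
    storage.setItem( PENDING_KEY, JSON.stringify( pending ) );
}

function flushTracks( storage, track ) {
    var pending = JSON.parse( storage.getItem( PENDING_KEY ) || '[]' );
    pending.forEach( function( feature ){ track( feature ); } );
    storage.setItem( PENDING_KEY, '[]' ); // clear the queue after reporting
}
```

This is still not 100% reliable (the user may never come back), which is why passing a URL parameter and letting the server track during the next render is the more robust option.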
QUESTIONS?
FEEDBACK