A/B TESTING PITFALLS AND BEST PRACTICES

WHO'S THIS DUDE? ALEJANDRO PARDO LOPEZ Client Side Developer && Team Leader @ Booking.com @apardolopez

WHAT IS A/B TESTING?

Test different versions of a website or feature by randomly assigning your users to two or more groups, exposing each group to a different variant of the website/feature, and comparing the impact of the variants against each other

WHY A/B TESTING?

Ability to detect and measure the real impact of our changes on UX or performance

NO HiPPOs (HIGHEST PAID PERSON'S OPINION), NO EXPERT OPINION

COMMON A/B TEST
2-variant test (Base and Variant)
User base split 50% / 50%

EXAMPLES

Source: http://unbounce.com/a-b-testing/shocking-results/

WHAT DO WE WANT TO MEASURE?

USUAL SUSPECTS
Conversion (successful sign up / purchase / click on CTA, per visitor)
Bounce rates
Time spent
Other user metrics (e.g. navigation times)

OTHER USEFUL METRICS
Front-end performance (page load times, navigation times)
Back-end performance (CPU wallclock, SQL wallclock)
Errors
External impact (e.g. number of Customer Care tickets)

BEYOND THE TYPICAL A/B TEST

MULTIVARIANT TESTS (I.E. MORE THAN 2 VARIANTS)
Test multiple variations of the same feature
Compare each variant against the others
The more variants, the bigger your user base needs to be to detect a change
Use a power calculator to determine how many users you need to detect a certain amount of impact, e.g. http://www.evanmiller.org/ab-testing/sample-size.html (see the sketch below)
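
As a rough illustration of what such a calculator computes, here is a simplified two-proportion sample-size estimate. This is only a sketch of the standard normal approximation, with significance and power hardcoded; it is not the exact method Evan Miller's calculator uses.

// Simplified sample-size estimate for comparing two conversion rates.
// Normal quantiles are hardcoded: 1.96 for 95% significance (two-sided),
// 0.84 for 80% power.
function sampleSizePerVariant( p1, p2 ) {
    var zAlpha = 1.96,
        zBeta = 0.84,
        variance = p1 * ( 1 - p1 ) + p2 * ( 1 - p2 ),
        delta = p1 - p2;
    return Math.ceil( Math.pow( zAlpha + zBeta, 2 ) * variance / ( delta * delta ) );
}

// Detecting a lift from 10% to 11% conversion:
// sampleSizePerVariant( 0.10, 0.11 ) => roughly 14,732 users per variant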

EXAMPLES OF MULTIVARIANTS
CTA colour (Google's famous 41 shades of blue)
Copy experiments (CTAs, email headlines)

REDUCED USER GROUP (E.G. 10% OR LESS OF TOTAL TRAFFIC)
Expose experimental features to a reduced group of users for early feedback
Early detection of errors
Enabling potentially dangerous code (e.g. heavy DB queries)

GRACEFUL DEGRADATION - EMERGENCY SWITCHES
Disable light actions
Reduce the data shown, to reduce queries to an overloaded DB
Hide buttons that lead to pages in trouble (e.g. served from another datacenter that is under pressure)
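
A minimal sketch of what such an emergency switch can look like in code. The switch names and states here are made up for illustration; in practice they would come from a config service that operators can flip at runtime, without a deploy.

// Switch states hardcoded for illustration; in reality they would be
// fetched from a config service that can be changed at runtime.
var switches = { recommendations: true, reviews: false };

function isSwitchEnabled( name ) {
    return switches[ name ] === true;
}

function renderRecommendations() {
    if( !isSwitchEnabled( 'recommendations' ) ) {
        return; // Degrade gracefully: skip the expensive block entirely
    }
    /* ...expensive, DB-heavy rendering goes here... */
}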

A/A EXPERIMENTS
No change between the variants
Used for validating the tracking framework and the analysis reports

INTERPRETING RESULTS OF A/B TESTS

CONCLUSIVE VS INCONCLUSIVE RESULTS

CONCLUSIVE RESULTS
We are confident that the difference between the variants is statistically significant

WHUT???!!!!

IN OTHER WORDS...

We can confidently say which one is the winner (or loser)

Source: https://developer.amazon.com/sdk/ab-testing.html

INCONCLUSIVE RESULTS

If there was an effect, it was too small to be measured

SECONDARY METRICS FTW!!!
Secondary sign ups
Performance impact
Errors
An inconclusive result can also be the target impact (e.g. a code refactor that should change nothing)

TRUSTWORTHY DATA

"When running online experiments, getting numbers is easy; getting numbers you can trust is hard"

TRUSTWORTHY DATA Without data you can trust, you cannot make a decision. Basically, you know nothing about the results of your test

ROBOTS
They can bias your results
Visitor numbers will be inflated
Visitor numbers can be altered in just one variant, making the distribution uneven
Conversion rates can be affected as well, due to the increase in visitors
But also clicks! Some robots parse JavaScript (see the sketch below)
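
One naive way to keep self-identified robots out of an experiment is to check the user agent before assigning a variant. This is only a sketch: many robots do not identify themselves, so real bot filtering is considerably more involved.

// Naive check for self-identified robots; does not catch bots
// that fake a regular browser user agent.
function looksLikeBot( userAgent ) {
    return /bot|crawler|spider|crawling/i.test( userAgent || '' );
}

// Only assign real-looking visitors to the experiment
if( !looksLikeBot( navigator.userAgent ) ) {
    var variant = track( featureA );
}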

INTERFERING EXPERIMENTS
Modifications to the same feature running at the same time can bias results

INTERFERING EXPERIMENTS
E.g. a button colour change and a position change running simultaneously

TRACKING

AKA PUTTING USERS IN YOUR EXPERIMENT

WRONG TRACKING === USELESS DATA
...and wasted time...
...and unmeasured impact on the site...

...and rage++...

TRACKING CHALLENGES

ASSIGN USERS TO VARIANTS RANDOMLY
The distribution of visitors should match the expected split (see the sketch below)
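
A common way to get a stable random split is to hash the user id together with the experiment name, so the same user always lands in the same variant while different experiments split users independently. A minimal sketch, assuming a string user id:

// Cheap 32-bit string hash, good enough to illustrate the idea
function hashString( str ) {
    var hash = 0, i;
    for( i = 0; i < str.length; i++ ) {
        hash = ( hash * 31 + str.charCodeAt( i ) ) >>> 0;
    }
    return hash;
}

// 50/50 split; hashing userId together with the experiment name keeps
// assignments stable per user but independent across experiments
function assignVariant( userId, experimentName ) {
    return hashString( userId + ':' + experimentName ) % 100 < 50 ? 'a' : 'b';
}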

AVOID NOISE
Track only people that are actually exposed to the change
Otherwise, spotting a change in the results is much harder, and the experiment has to run for longer
E.g. tracking everyone visiting the website when the change is only on the product page

TRACK ALL VARIANTS
Don't forget any of them (e.g. remember to track base)

USING JAVASCRIPT FOR TRACKING

VERY POWERFUL
More precise tracking (e.g. tracking based on user interactions)

TRACK USERS ONLY WHEN THEY ARE EXPOSED TO THE CHANGE
Lightboxes
Change is actually viewed in the browser viewport

BUT WEAKER TOO
Sensitive to JS errors
Cookie overrides by HTTP requests (use server-side cookies instead)

EXAMPLE TRACKING API

track( feature )

if( track( featureA ) === 'b' ) {
    /* Cool stuffs */
}

$('.item').on('click', function( e ){
    var title = 'Base title for lightbox';
    /* Do some stuff */
    showLightbox({ title: title });
});

$('.item').on('click', function( e ){
    var title = 'Base title for lightbox';
    /* Do some stuff */
    if( track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }
    showLightbox({ title: title });
});

TRACKING PITFALL #1

$('.item').on('click', function( e ){
    var title = 'Base title for lightbox',
        position;
    /* Do some stuff */
    position = $('#elem').offset().top;
    console.log( position );
    /* End do some stuff */
    if( track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }
    showLightbox({ title: title });
});

TRACKING PITFALL #1

Track as early as possible, but at the point where the change is shown. Any JS error between the user action and the track() call (here, $('#elem').offset() returns undefined if #elem is missing, so .top throws) silently drops the user from the experiment
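
One possible restructuring of the example above: keep unrelated, error-prone code out of the path between the click and the tracking call.

$('.item').on('click', function( e ){
    var title = 'Base title for lightbox';
    // Track first: the change is shown right here, and no code above
    // this line can throw and silently skip the tracking
    if( track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }
    showLightbox({ title: title });
    /* Unrelated, potentially error-prone work goes after the tracking */
});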

TRACKING PITFALL #2

$('.item').on('click', function( e ){
    var title = 'Base title for lightbox',
        content;
    /* Do some stuff */
    // Synchronous call, returns default content or variant B content.
    content = readContentFromServer() || {};
    /* End do some stuff */
    if( content.useVariantBcontent && track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }
    showLightbox({ title: title });
});

TRACKING PITFALL #2
Always track base: because && short-circuits, track() only runs when content.useVariantBcontent is truthy, so users who receive the base content are never tracked
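
One possible fix for the snippet above is to reorder the condition so track() always runs, putting users from both variants into the experiment regardless of which content the server returned:

// track() runs first, so both base and variant users are tracked
var inVariantB = ( track( featureA ) === 'b' );
if( inVariantB && content.useVariantBcontent ) {
    title = 'New title for lightbox';
}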

"ADVANCED" TRACKING

TRACK WHEN AN ELEMENT BECOMES VISIBLE

Useful for elements below the fold that require scrolling to be seen

// Footer content is changed in the template, based on the variant
(function(){
    track.onView( '#selector', feature );
})();

// Simple onView implementation
track.onView = function( selector, feature ) {
    if( !selector || !feature ) return;

    var trackIfVisible = function( data ){
        if( isVisible( data.selector ) ) {
            track.feature( data.feature );
            return true;
        }
        return false;
    };

    // trackIfVisible expects a single data object
    if( !trackIfVisible({ selector: selector, feature: feature }) ) {
        // We don't want to run all the function code on each scroll event
        throttle( trackIfVisible, { selector: selector, feature: feature } )
            .on( 'scroll' );
    }
};
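
The isVisible helper is assumed by the slide; a minimal jQuery-based version could look like this (checking only whether the element's top edge is inside the current viewport):

// Minimal sketch of the assumed isVisible helper
function isVisible( selector ) {
    var $elem = $( selector );
    if( !$elem.length ) return false;
    var top = $elem.offset().top,
        viewportTop = $( window ).scrollTop(),
        viewportBottom = viewportTop + $( window ).height();
    return top >= viewportTop && top <= viewportBottom;
}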

TRACKING PITFALL #3

[Diagram, repeated across three slides: a page with "Some content", "New content added by variant B", and "Some other content", with the new block sitting at different scroll depths relative to the fold]

TRACKING PITFALL #3
On-view tracking is sensitive to the element's position: if the change sits at a different scroll depth in each variant, the variants get tracked at different rates and you get visitor distribution issues

TRACK WHEN USER NAVIGATES AWAY

Get me outta here!

TRACKING PITFALL #4

Assume you might lose some visitors in the experiment
Calling a tracking pixel or firing an AJAX request while the browser is loading another page is completely unreliable
You can store the feature in localStorage or a cookie and track on the next page load (still not 100% reliable; see the sketch below)
Alternatively, pass a parameter in the URL so the server can do the tracking while rendering the next page
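
A sketch of the localStorage approach described above, assuming featureA is a string identifier; note that beforeunload itself is not guaranteed to fire, so this remains best-effort:

// When the user navigates away, remember that tracking is still pending
$( window ).on( 'beforeunload', function(){
    localStorage.setItem( 'pendingTrack', featureA );
});

// On the next page load, flush any pending tracking event
var pending = localStorage.getItem( 'pendingTrack' );
if( pending ) {
    track( pending );
    localStorage.removeItem( 'pendingTrack' );
}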

QUESTIONS?

FEEDBACK