Last episode, we talked about ABBA, our first A/B testing tool. We used it to test UI changes, new features, content recommendations — anything and everything we could think of. ABBA was so good and worked so well for so long…that we decided to get rid of it. Years of using ABBA taught us what makes for good experimentation, and we eventually realized we needed a better tool, built from scratch. Listen to find out why we pulled the plug on ABBA and how Spotify’s Experimentation Platform was born. And in case you missed it, a version of our internal platform will be available to the public as Confidence, a new enterprise product for developer teams — read today’s announcement: “Coming Soon: Confidence — An Experimentation Platform from Spotify”.
But first, let’s talk buttons. Everyone always has so many questions about buttons. How do you know which color they should be? Or how big they should be? Or whether the corners should be round or square? The easy answer: an A/B test! But if only all product experimentation were as simple as testing buttons. Senior staff engineer Mark Grey returns to talk with host Dave Zolotusky, along with senior engineer Dima Kunin, who helped build Spotify’s Experimentation Platform and had the honor of finally retiring ABBA. They discuss the ins and outs of enabling experimentation at scale: targeting criteria, controlling eligibility, the importance of measuring exposure, using properties instead of feature flags, the advantages of separating your app configuration from your experiments, fallback states, sample ratio mismatches, and all the other questions you have to answer about your experimentation process before you can even ask something as simple as “what color should a button be”, let alone “will this machine learning model consistently provide recommendations users appreciate over the next year”.
Plus, did you definitely, positively, absolutely eat the bread? Or did you just buy the bread? And a bonus trick question: What’s the difference between “treatments”, “variants”, and “groups” — and why is it always so hard to name things?
Learn more about ABBA and its successor, Spotify’s Experimentation Platform:
Plus, find out lots more about how we do experimentation at Spotify on our engineering blog — including a little light reading on automated salting and bucket reuse, choosing sequential testing frameworks, comparing quantiles at scale, and how we scale other scientific best practices across the org.
Read what else we’re nerding out about on the Spotify Engineering Blog: engineering.atspotify.com
You should follow us on Twitter @SpotifyEng and on LinkedIn!