December 7, 2021
What does it take to go from the idea for a new playlist to delivering that playlist to Spotify users around the world? From kickoff to prototyping, QA, and finally shipping, publishing a new playlist on Spotify is a long process, and one that is full of new learnings every time.
We recently launched a new playlist experience, Blend, where any user can invite another user to create a playlist that merges the two users’ preferences into one shared playlist. Prior to Blend, the team worked on similar products, Family Mix and Duo Mix, which create shared playlists for users on the same Family or Duo plan. Those products were well received, so we decided to expand the product line by creating a version of the opt-in, automatic, shared, personalized playlist that could work for any two users.
Anytime we create a new playlist on Spotify, we aim to do something we haven’t been able to do before. This means we can’t always rely on past experience, and we often face new challenges that require new solutions. With Blend in particular, we were taking ideas from Family Mix and Duo Mix and expanding them to a much larger user group. One major complication was the increase in the scale of users we had to handle. We also tackled the unique challenge of combining content generation with an invitation flow.
Every playlist is built around a set of qualities. For example, with Discover Weekly, our main quality is discovery. For the Daily Mixes, our main qualities are familiarity and consistency. When we work with multiple users, however, we have the challenge of balancing even more qualities. We want the playlist to be:
- Relevant: Does the track we’re selecting for a user actually reflect their taste? Or is it a song they heard once by mistake?
  - This is especially important for track attribution: if we place a user’s profile picture next to a song, we need to be sure that this particular user would agree the song is representative of their taste.
- Coherent: Is there a flow to the playlist, or do the tracks seem completely random and unrelated to each other?
- Equal: Are both users equally represented in the mix?
- Democratic: Does music that both users like rise to the top?
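To make these qualities concrete, here is a minimal sketch of how each one could be turned into a per-track score. Everything here is a hypothetical illustration: the method names, the equality formula, and the [0, 1] score ranges are assumptions, not the actual Blend implementation.

```java
// Hypothetical per-track scoring helpers for the qualities above.
// All scores are in [0, 1]; none of this is Spotify's real code.
public class TrackScorer {

    // Relevant: average the track's relevance to each user,
    // where each per-user relevance would come from their taste profile.
    static double relevance(double relevanceUserA, double relevanceUserB) {
        return (relevanceUserA + relevanceUserB) / 2.0;
    }

    // Equal: penalize a mix that skews toward one user.
    // 1.0 means perfectly balanced, 0.0 means one user dominates entirely.
    static double equality(int tracksFromA, int tracksFromB) {
        int total = tracksFromA + tracksFromB;
        if (total == 0) return 1.0;
        double shareA = (double) tracksFromA / total;
        return 1.0 - Math.abs(shareA - 0.5) * 2.0;
    }

    // Democratic: reward tracks both users already like.
    static double democratic(boolean likedByA, boolean likedByB) {
        return (likedByA && likedByB) ? 1.0 : 0.0;
    }
}
```

A real system would combine signals like these (plus a coherence term over the whole sequence) into a single ranking objective; the sketch just shows how each quality can be measured independently.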
One of the key decisions we made for this product was whether to “minimize misery” or “maximize happiness”. In other words, is it better to pick each person’s favorite tracks even if the other people in the group don’t like them, or to pick tracks that nobody dislikes even if no one’s favorites are ever selected? “Minimize misery” favors the democratic and coherent qualities over relevance; “maximize happiness” favors relevance over the democratic and coherent qualities. Our solution leans toward maximizing happiness, where we try to select songs that are personally relevant to each user. This decision was based on feedback from employee testing and from our data curation team.
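The two strategies above correspond to classic aggregation functions from group recommendation: “least misery” scores a track by its least satisfied user, while a happiness-maximizing strategy scores it by the average (or maximum) satisfaction. A small sketch, with made-up scores, shows how they can disagree:

```java
import java.util.Arrays;

// Two classic group-aggregation strategies, sketched for illustration.
public class GroupScore {

    // "Minimize misery": a track is only as good as its least
    // satisfied listener, so nobody gets a song they hate.
    static double minimizeMisery(double[] userScores) {
        return Arrays.stream(userScores).min().orElse(0.0);
    }

    // "Maximize happiness": score by average satisfaction, so one
    // user's strong favorite can still win even if the other is lukewarm.
    static double maximizeHappiness(double[] userScores) {
        return Arrays.stream(userScores).average().orElse(0.0);
    }
}
```

For a track scored {0.9, 0.1} by the two users, least misery gives 0.1 (it will almost never be picked), while the average gives 0.5, so a happiness-maximizing ranker can still surface one user’s favorite.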
It’s a little easier to create a mix for users with similar taste, because they already listen to a lot of the same music. But if we have two users with no overlap in their listening history, it is significantly harder to create the perfect mix. We needed a method that works for both kinds of pairs, while also considering how any change to the Blend algorithm affects all combinations of users.
Generating a Blend is a fairly heavy process, between fetching data for both users and trying to strike the ideal balance across all of our qualities. While we were iterating toward the best algorithm, we weren’t too worried about latency. Once we were happy with the quality of the blends and started thinking about scaling the service, we realized how slow generation had become. We then spent a lot of time making serving as fast as we could. What we learned was that our code base had hot spots: some sections of code ran more than 50 times per blend generation, while other sections ran only once. Optimizing code that runs only once has little effect on latency; improving the hot spots, however, made a huge difference. The biggest example was taking advantage of Java’s short-circuit evaluation by switching the order of two function calls inside an if statement. This simple code change cut our latency to about a tenth of what it had been.
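The short-circuit win can be sketched as follows. The function names and the loop are made up for illustration; the point is only the `&&` semantics: in Java, the right-hand operand is evaluated only when the left-hand one is true, so putting the cheap, usually-false check first skips the expensive check on most iterations.

```java
// Illustrative sketch of a short-circuit hot-spot fix (hypothetical names).
public class ShortCircuit {
    static int cheapCalls = 0;
    static int expensiveCalls = 0;

    // A fast check, e.g. a precomputed flag lookup. Usually false.
    static boolean cheapCheck(boolean flag) {
        cheapCalls++;
        return flag;
    }

    // A slow check, e.g. one that recomputes or fetches data.
    static boolean expensiveCheck() {
        expensiveCalls++;
        return true;
    }

    static void filterTracks(int candidates) {
        for (int i = 0; i < candidates; i++) {
            boolean rarelyTrue = i % 50 == 0;
            // Because && short-circuits, expensiveCheck() runs only when
            // cheapCheck(...) returns true. With the operands in the other
            // order, the expensive call would run on every iteration.
            if (cheapCheck(rarelyTrue) && expensiveCheck()) {
                // keep the candidate track
            }
        }
    }
}
```

Running `filterTracks(100)` with this ordering makes the expensive call only twice (for i = 0 and i = 50) instead of 100 times, which is the same shape of saving described above.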
We improved content quality using both qualitative and quantitative methods. While we usually rely on dogfooding our own playlists when we make changes, we also need to make sure we test different types of blends (for example, a pair with high taste overlap and a pair with low taste overlap). We created offline metrics to measure how each of our qualities performed, and we work closely with a data curation team, our “humans in the loop”, which evaluates and confirms the quality of the content our recommendation systems produce.
For example, when the team wanted to make a change to make the playlists more coherent, we:
- Dogfooded our own playlists: the team ran with the change for about a month before evaluating it. During this time we got a better sense of whether we actually liked the change.
- Conducted a heuristic review, where our data curation team reviewed several blends, including pairs with different levels of taste overlap.
  - This process helps identify usability and comprehension issues, particularly those related to content quality and user experience.
- Used a tool called the “Content Recommendation Scorecard”.
  - Each track is scored on a number of features, such as relevance and coherence.
- From the Content Recommendation Scorecard, we could see that the new method scored more strongly on the features we wanted to optimize.
- The review gave the team enough confidence to roll the new approach out to all users.
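The scorecard comparison described above can be sketched very simply: score every track on every feature for both the old and new algorithm, average per feature, and check that the new method does better on the features being optimized. The matrix layout and the strict-improvement criterion are illustrative assumptions, not the real tool.

```java
// Hypothetical sketch of a content-recommendation scorecard comparison.
public class Scorecard {

    // scores[i][j] = score of track i on feature j, each in [0, 1]
    // (e.g. feature 0 = relevance, feature 1 = coherence).
    static double featureAverage(double[][] scores, int feature) {
        if (scores.length == 0) return 0.0;
        double sum = 0.0;
        for (double[] track : scores) {
            sum += track[feature];
        }
        return sum / scores.length;
    }

    // The new algorithm "meets the bar" on a feature when its average
    // score beats the old algorithm's average on the same tracks.
    static boolean improvesOn(double[][] newScores,
                              double[][] oldScores,
                              int feature) {
        return featureAverage(newScores, feature)
                > featureAverage(oldScores, feature);
    }
}
```

In practice the human reviewers produce the per-track scores; the aggregation is the easy part, and the comparison is what builds (or blocks) confidence in a rollout.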
Creating a social playlist presented a new set of challenges for our playlist algorithms. We had to optimize for many qualities at once: relevance, coherence, equality, and democratic decision making. And we had to consider both pairs of users with high taste overlap and pairs with very little.
When creating Blend, we also wanted a way to let users learn about the similarities and differences in their music tastes. So we built the Blend data story, where we can show users information like an artist that brings them together and a score for how well their tastes match. This year, during the Wrapped campaign, we gave users a Wrapped Blend experience: we changed the Blend data story to use data from Wrapped, showing users information like their top shared artist and top shared genre of the year.
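One simple way a “taste match” style score can be computed is as set overlap between the two users’ top artists, for example Jaccard similarity. This is purely an assumed illustration of the idea of scoring taste similarity, not Blend’s actual formula.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical taste-match sketch: Jaccard similarity over top artists.
public class TasteMatch {

    // Returns |A ∩ B| / |A ∪ B| in [0, 1]; 1.0 means identical top artists.
    static double jaccard(Set<String> topArtistsA, Set<String> topArtistsB) {
        if (topArtistsA.isEmpty() && topArtistsB.isEmpty()) return 0.0;
        Set<String> intersection = new HashSet<>(topArtistsA);
        intersection.retainAll(topArtistsB);
        Set<String> union = new HashSet<>(topArtistsA);
        union.addAll(topArtistsB);
        return (double) intersection.size() / union.size();
    }
}
```

Any artist in the intersection is also a natural candidate for “an artist that brings them together” in the data story.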
We’re still working hard to improve Blend and to build a product that helps our users feel closer through music, while thinking of more fun ways to enhance the social experience on Spotify. If this kind of work sounds interesting, our personalization team is hiring!
Tags: backend, data modeling