Alexandra: Welcome to The Data Democratization podcast where we bring you the best frontline stories of data and data privacy. I’m Alexandra Ebert, MOSTLY AI’s Chief Trust Officer. Here’s my co-host, Jeffrey Dobin, Privacy Expert and Lawyer from Duality Technologies. Hi, Jeff.
Jeffrey: Hey, Alexandra, what’s up? Good to be back here. Where do we travel to this time?
Alexandra: Well, it’s only virtually, but we will be talking to somebody from Stockholm, Sweden.
Jeffrey: Stockholm, Sweden. Wow. When I think of Stockholm, one of the first things that come to mind is one of my favorite apps, Spotify. Obviously, they’re super data-driven, always making great recommendations for the songs I should listen to. I actually visited their office once in New York City, it’s a huge office.
Alexandra: Oh, nice.
Jeffrey: Yes, filled with, obviously recording studios for all the artists that come through, and then pictures of these artists all over the walls. It’s really inspiring. Then the other favorite thing that I found when I visited, there was all the amazing food. I’d love to go back in real life because traveling there over Zoom just isn’t the same.
Alexandra: Yes, you’re right.
Jeffrey: Today, you’re going to hear from one of Spotify’s senior data scientists, Ryan McCabe. Ryan is extremely experienced, and he has an exceptional understanding of the business side of things at Spotify. Over the years, he’s specialized in building customer-facing digital products and has practical advice to offer to really any size organization from startups to mature businesses looking to jumpstart their data-driven capabilities.
Alexandra: Absolutely. Ryan also has some advice for those looking to make a career in data science. This episode really is for everyone.
Jeffrey: It sure is. Let’s jump right into things.
Alexandra: Good morning, Ryan. It’s great to have you on the show. Can you share a little bit with our readers? Where does this podcast find you and also a little bit about your background?
Ryan McCabe: Hey, Alexandra, thanks for having me. I’m actually in Stockholm, Sweden right now, in my home office. It’s actually a relatively beautiful morning in Stockholm, which is a big bonus. A little bit about myself as you introduced me, my name is Ryan McCabe. I’m a data scientist, currently at Spotify. My background is with consumer-facing applications and digital products.
Alexandra: Where does your passion for data science come from, and how did you arrive at Spotify?
Ryan: I think it arrived from back in my studies, I studied economics and more along the lines of behavioral economics and economic policy. It was just very interesting to me that you could measure human behavior at such a scale, and you could experiment and have impact and really see how these things play out. I think that’s really where it comes from. It turns out, it’s like a pretty useful tool in the marketplace.
Alexandra: Absolutely, and how did you arrive at Spotify?
Ryan: Yes, through a recruiter. This is how these things work. They reached out when I was working at Prezi. I was there for three years. Prezi is a zooming presentation software in Budapest, where I was located previously.
Alexandra: Lots of experience with startups, that are now pretty big organizations, of course. With all your experience in supporting the development of digital products at startups, are there some key takeaways that also large organizations like big banks, financial institutions, could benefit from? Any tips that you have for them?
Ryan: I’ve stepped up each move in my career to a bigger place, and Spotify is the biggest place that I’ve worked at so far. We have over 5000 employees now, which for me is quite big, coming from Prezi, which had hundreds of employees, and then before that Brickflow, which was like a small startup in Budapest. I think it really depends on how much you want to spend on data science.
I think you see plenty of tech companies that spend way too much and spend way too much too early. You have plenty of older industries or more traditional industries that could really benefit from better analytics but it takes a huge upfront investment to get going. It’s not just hiring a bunch of data scientists, you’d have to have data processing and ways to collect data around your business. The first advice is nothing like organizational, it’s more around what are you measuring, and what should you be measuring? How do you get to be measuring that thing?
Alexandra: You said something interesting that you think that many organizations spend too much on data science too early? How will an organization know whether it’s the right time or whether it’s still too early to invest in data science capabilities?
Ryan: Well, if you’re not already measuring things, then it’s too early. I have done some consulting for small startups while working my regular jobs, and usually what I say to them is, “Okay, good thing you’ve hired me, so I can tell you not to get a data scientist.” A lot of the work that I end up doing, is working with some of their engineers to start saying, “Okay, what would some data pipelines look like, or what data do we have, and what data do we need?” Which is very different from a lot of the expectations when they try to hire a data scientist. It’s really hard for a data scientist to be productive in an environment where there’s not much data, or it’s very difficult to access.
This is my experience in startups, where someone from their board says, “We want more data at the board meetings,” and tells them to hire data scientists, but the people on the board don’t get it. It’s really up to the CEO or the founders of the company to say, “Well, hold on a bit, this is a long-term investment.” Especially if it’s like a product, which isn’t a core digital product.
Alexandra: Yes, of course. That’s definitely one challenge. When you mentioned that startups, of course, first have to build these data pipelines. When we look at some of the large enterprises that are interested in starting with data science, AI, we oftentimes also find organizations that not yet have done all this preparation work of having the data in a clean and reliable format. What’s your experience with large enterprises if they want to tap into data science, what are the kind of homework points they should accomplish first?
Ryan: Yes, you should do an audit of what you have today, and then what would a good MVP look like. This isn’t just for building out your data, this is for everything. You could use this for your life. I’m not a life coach, but it’s been really good to ask, “Okay, where am I today and where do I want to be?” and go from there. There’s a lot of folks out there that are very willing to take your money to help you make these transformations. They’re not all at Boston Consulting or McKinsey.
There’s plenty of small consultancies that can help embed with your team and automatically embed best practices, and help you avoid some of those pitfalls. Also, we’re in a world where the cloud is pretty advanced now, and you have these really big players like Amazon, Google, and Microsoft. They have done this probably tens of thousands of times by now. It’s definitely easier to get started in this space now than it was back when I was trying to implement solutions for smaller startups.
Alexandra: Yes, definitely. I think cloud technologies definitely make it easy and all the experience that has been made by now. Can you share a few of the data science and data best practices that you see within Spotify? Are there some elements that other organizations could learn from?
Ryan: Yes, there’s plenty. Spotify definitely has the best data culture of any company that I’ve been at or heard about, but we’re also really willing to spend a lot on it. I know that’s not a reality for a lot of companies. I think a lot of it, you can have organizational maturity around data, without spending a lot on data science, or analytics. A lot of folks try to go out and buy a seasoned manager, an analytics manager.
I’d say if you have one really performant and experienced individual contributor, really senior staff, a data scientist, that also happens to be extroverted and willing to work with stakeholders. They don’t necessarily have to be extroverted but really passionate about getting the organization to ask better questions and experienced enough to be able to provide them with tools to maybe answer their own questions, depending on the organizational skill, or able to bring on more junior folks and really shape them.
Alexandra: To help our listeners better understand, can you give an example of what would be a really good question to ask a data scientist which may be some improvement and is not yet ready to be answered by a data scientist?
Ryan: Yes. A quite common bad question to ask a data scientist is, how do I meet my KPI? You’ll find this with stakeholders all the time, where their performance is evaluated by some KPI and they want to meet it. It’s pretty rare that you’ll find performance ambitious people that– I guess the analogy is like we’re all rats in the maze. 95% of the rats in the maze are just trying to cut through the maze. There’s not that many people that are like, “Oh, shit, I’m in a maze. What’s going on?” and crawl over the top but there are some rats who do figure it out.
I think a lot of the data scientist’s job in any organization is getting better questions out of the stakeholder, by reformulating it for them. There’s the question you hear, “I want to meet my KPI,” and it’s just, “Okay, well, what’s the most sustainable way to do that?” Let’s say it’s somebody that has some revenue KPI. I think that’s quite common.
Let’s say it’s a subscription business, well, you can really boost short-term revenue by increasing prices, especially on the renewal base, but in a year, you’re going to have a lot of trouble. There’s all sorts of tricks that you could do. It’s really about getting the stakeholder to think a little bit more holistically about their product or the thing that they have control over, and making sure that their incentives on the KPI are more aligned with the company’s overall performance or more aligned with their performance in one or two years.
Alexandra: An holistic perspective and then asking better questions. One other thing I would be interested to hear from you, we oftentimes hear it from prospects, we talk with the data scientists, on one hand, you already mentioned it strategy to get access to relevant data, but also that in some organizations, some manager decides, “Okay, we want to move forward with artificial intelligence without already having a specific use case in mind.” Then sometimes they experience challenges with the business stakeholders of exciting them for the possibilities of AI and really finding a meaningful project to work on. What’s your experience on that? Did all companies that you joined already have an established data science team or was it also some change management that was necessary and lots of people work to get everybody within the organization on board?
Ryan: I think Spotify is really the first company that I was at that really, really had consistent machine learning in production. The common joke is with machine learning and AI, it’s like high school sex. Everyone’s talking about it. Everyone’s saying that they’re doing it, but no one’s actually doing it. I think that’s the case still.
Alexandra: It means Spotify is the cool kid from high school.
Ryan: Well, so maybe Spotify has the scale, and the money to do it and the need. Our competitors all have the same catalog as us, all the same songs. When there’s millions of pieces of content, and one user, and a limited amount of space on your home screen, then yes that is a problem which requires not only analytics but production-level machine learning. There’s a lot of problems which people have that could be solved with some good logging, some good data, and a 25-year old that knows SQL.
Alexandra: For example, I think it was Accenture, said that I think 80% of C-level executives, actually want to scale AI within their organization. Also have their KPIs attached to it, but failed to do that. What are the secrets of Spotify? Why does it work so well within Spotify to really have AI and machine learning in production and why do all these other big organizations struggle to accomplish the same?
Ryan: Because it’s a need and a strategic method that we have to have this. You can imagine if it was only a search. How would you discover music? You’d have to do it the old-fashioned way. I remember as a kid, going to the library and judging albums by their cover, and then popping them in with some headphones, on the wall, and listening to it out. It took forever to participate in any kind of music discovery. That was just the need of the business. I think there are consistent things that are underserved in businesses.
After going to Spotify, I realized, “Oh, maybe we should have solved that with this machine learning when I was at Prezi, for example.” Anytime, where you have, a lot of pieces of content or a lot of things, and you have to show a limited set of them to a customer, then there’s an opportunity for machine learning there.
There are plenty of other opportunities that are not around that. Fraud is a very, very common example. These are things that are helped by machine learning, but 99% of the business machine learning cases are not really AI. You don’t need neural networks. You don’t need even reinforcement learning. A lot of times, you just need some offline random forest and an interpretable model that you can dissect and learn about.
Alexandra: If we come back to the AI and machine learning use cases, can you share about one particular use case that you find super exciting, and what the impact of this machine learning algorithm that was developed was today?
Ryan: Yes. I find the very boring use cases interesting.
Alexandra: Why is that?
Ryan: Because a lot of times they can have a lot of impact or a lot of times that it’s not something that you would necessarily think of, but it’s a very good, good thing to apply it to.
Alexandra: Do you see this in the industry that sometimes people want to attempt the more sexy problems and a little bit reluctant to look into the more boring ones and therefore, miss the opportunities?
Ryan: Of course. I based a lot of my career success on choosing projects that people shy away from and hate or are the ones that they’ve failed at before. You can imagine, in any organization where there’s a lot of data scientists, and a lot of really good problems, there’s competition for those problems. This is natural. I think this is good general advice as well, maybe in life: when others zig, maybe you should give some consideration to zagging. One of those areas, for me at Spotify, was messaging, push messaging. I looked at some data, and I realized that wow, only a very limited part of the population is actually getting push messages. That was because of the way that we designed the system and the organization.
We had a self-serve model, and then marketers could go in and design a push message for a campaign and then send those. It turns out, they’re all targeting the same people. The people that benefit from these are not necessarily the people that you’ve targeted. It’s like, “Okay, well, it seems like we have an opportunity here.” And 90% of the opportunity wasn’t the machine learning opportunity. 90% of the opportunity was making sure that everyone who wants messaging is getting a standard minimal treatment of that, so like a message a week, because this is a product or a feature that can be annoying.
If people are getting a message a week, then they’re getting them frequently enough so that if they don’t want it, they can opt out. That way, we’re standardizing the treatment and we can really measure the effect experimentally. How to choose which message to give to which user, that’s the machine learning problem. Basically, we gathered all the existing messages that everyone’s ever sent. Then did some user research, figured out what people want or what they say that they want, that they’re not getting. Make sure we have those things. Maybe improve the UX of it. There’s new capabilities from the platforms.
Now iOS, and Android, you can have images in your notifications now, which is nice, and some other cool things around settings. Basically, we ended up with that similar problem where you have a user and a message to send and a lot of messages to choose from. You have to sort them somehow. For this, we chose to use a reinforcement learning model, a multi-armed bandit problem. For part of the population, we’re getting a random order, sort order of messages, and then we’re measuring some outcome. Then for the rest of the population, we’re giving an optimized sort of those messages.
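To make the setup Ryan describes a little more concrete, here is a minimal sketch of one simple bandit strategy, epsilon-greedy, written in Python. It is an illustration only and not Spotify's actual system: the message names, the 10% exploration share, and the use of "message opened" as the measured outcome are all assumptions made for the example.

```python
import random
from collections import defaultdict


class MessageBandit:
    """Epsilon-greedy ranking of push messages (illustrative sketch only)."""

    def __init__(self, messages, explore_share=0.1, seed=None):
        self.messages = list(messages)
        # Assumed share of requests that receive a random sort order.
        self.explore_share = explore_share
        self.rng = random.Random(seed)
        self.sends = defaultdict(int)  # how often each message was sent
        self.opens = defaultdict(int)  # how often each message was opened

    def open_rate(self, message):
        """Observed open rate so far; 0.0 for messages never sent."""
        sent = self.sends[message]
        return self.opens[message] / sent if sent else 0.0

    def rank(self):
        """Return (ordering, explored) for one user request."""
        order = list(self.messages)
        if self.rng.random() < self.explore_share:
            # Exploration group: random sort order, used to keep learning.
            self.rng.shuffle(order)
            return order, True
        # Everyone else: messages sorted by the best observed outcome.
        order.sort(key=self.open_rate, reverse=True)
        return order, False

    def record(self, message, opened):
        """Feed the measured outcome back into the estimates."""
        self.sends[message] += 1
        if opened:
            self.opens[message] += 1


if __name__ == "__main__":
    # Hypothetical messages with made-up "true" open rates for a simulation.
    true_rates = {"new_release": 0.30, "playlist_update": 0.10, "concert_nearby": 0.05}
    bandit = MessageBandit(true_rates, explore_share=0.1, seed=42)
    simulator = random.Random(0)
    for _ in range(5000):
        order, _ = bandit.rank()
        top = order[0]
        bandit.record(top, simulator.random() < true_rates[top])
    for message in bandit.messages:
        print(message, round(bandit.open_rate(message), 3))
```

The random-order group is what lets a system like this keep measuring messages it would otherwise stop sending. A production system would likely use a more careful strategy and a longer-term outcome than opens.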
Alexandra: What was the impact of these new initiatives? Did you see anything in how frequently people then use Spotify?
Ryan: Of course, they use it more. As time goes on, people tend to interact with the messages more, which is a very good sign. Then when we take the messaging away, we can see that. A lot of times when you do something annoying that brings the user back in, like let’s say, if we send a push message every single day, it’s like, “Hey, what do you look at on the experimental charts?” If you keep giving it, then it looks great but long-term, it’s a horrible strategy. If you took it away, after giving a message every day for two weeks, what you would see is a big substitution effect.
Let’s say so you give it every day for two weeks and then take it away. On day 14, the test group engagement is going to be way lower than that of the control. We had a great positive treatment effect and it’s long-term and that’s fantastic. Also, there is no need for any huge organization around deciding who gets what message or some complicated spreadsheet or some gatekeeper around messaging. It’s a very simple system and if any marketer or product person for that matter, wants to send out a message, they can. There’s a system that deals with it and it’s there. If it’s a good message, it’ll get more exposure and if it’s a bad message, it won’t.
Alexandra: For people, it makes their lives easier. I would also like to learn more about Spotify Wrapped. For our listeners who don’t know exactly what Spotify Wrapped is: I think at the end of the year, you get an overview of your listening habits for one year. In 2020, we had this one decade of listening habits, since Spotify is one of the services that has been around for quite some time, where you worked with a tremendous amount of data but still managed to spend significantly less on the processing. What were the learnings from this project and what was accomplished there?
Ryan: That’s an interesting one. It’s one of our most beloved kind of– Really, it’s organic marketing if you think about it. You can imagine, processing all that data for every user, going back a year is a very cumbersome task. I didn’t work directly on this, but a team in my organization in New York did, some good friends of mine. They were able to optimize something that was very expensive in a Dataflow job, and were able to bring this experience to users much, much more cheaply.
The thing is, a lot of these problems which I described at Spotify aren’t something that you could go take to your startup and apply. It’s only because we have 350 million, or whatever it is now, monthly active users. For example, that messaging use case doesn’t work for 95% of applications. Also, you don’t have a bunch of people in different parts of the organization wanting to send a message or having a need to send a message. Every person that wants to send the message is in the same room as you are. I guess it is similar with that one Dataflow job. Looking at costs, it’s very easy to say, “Okay, what’s our most expensive job?” If you’re a big, scaled company, then it can save a significant amount of money if you’re able to optimize that.
Alexandra: Absolutely. If you say that things like that won’t work with many organizations because they are not in this position of having 350 million users, what would you say are the low-hanging AI and machine learning fruit for let’s say, many of our listeners come from the banking or insurance space, large organizations that of course, have digital banking apps and other services? What would you say are the low-hanging fruits there? What should you look first into?
Ryan: I think the lowest hanging fruit is data accessibility. When Spotify moved over to the cloud, which was not that long ago, actually (it was before I joined, which was three years ago, but not too long before), the usage of data went up a bunch. Of course, that’s expensive but it’s also a very, very good thing. We weren’t just measuring the counts of things. We were measuring the rates and the rates between groups. The level of sophistication of the questions increased dramatically with those costs. That’s a very good cost. In a lot of organizations, data is very much siloed and not very open or there’s no automated process around this. You have to ask somebody to get access to this and then they have to check their email.
Alexandra: It takes months and you still don’t have your data. This is actually the business problem we solve with our synthetic data software that we oftentimes hear from data scientists that have all these great ideas. They wait three months, four months to get access to data and with synthetic data, it gets much faster and enables them to be innovative. Getting access to data
Ryan: This is a real problem and you can’t do any kind of machine optimization if you don’t have access to data.
Alexandra: Having this in place. Coming back to Spotify and these impressive amounts of data that get processed here, how do you protect user privacy?
Ryan: Well, we follow the law. I don’t have much exposure to this directly. I have teams who I have a great amount of trust in that really make sure that we’re following the law to the letter. The world has just changed post-GDPR. I used to know what my user ID was, I don’t anymore. The way I would track logs was to go into my account, click around, and then go into the data and check. Now, I make an anonymized test user, a fake user, and have to use that.
I think this is really the job of not data scientists to think about which data should be accessible. I’d say, not to shift the responsibility to users, but it’s more on what the public wants. Then also what the business really needs to provide their service. I think if you ask folks, “Do you like having personalization in Spotify or any app?” They’d say, “Yes, of course, it would be pretty useless without personalization.” Then if you ask them, “Do you like that company X is going through your data?” The answer tends to be no.
I think, Spotify is really lucky in this case, where we don’t really need much else than what you listen to. It turns out what you listen to is a good predictor of what you’re going to listen to. We don’t have that problem. We’re not going to track you off of the platform because the most valuable information is what music you listen to or what podcast you’re listening to.
I know there’s plenty of organizations and big tech companies that are toeing a very fine line when it comes to privacy. I think it’s not only something that they should wait for lawmakers to address, because in law, in a lot of cases, they’re very slow, and they might overreach in some places and underreach in others. This is a little bit above my paygrade but I really think it’s up to these market leaders in tech to help educate policymakers and really be a little bit forward in what regulations should be in place around their privacy.
It’s also highly controversial, so you can see that Apple has differentiated itself in the marketplace by saying, “Hey, we’re going to protect your data, and we’re not going to sell it, we’re going to keep it safe.” That’s a very good thing. It’s something that I value, personally. I think it’s something that their consumer base values, which is–
Alexandra: That’s something that many people value, and I’m actually happy to see that we’re currently in an age where people really appreciate it and demand that their privacy is protected. Therefore, I think it’s one of the ways forward that organizations really figure out ways to find this balance of, on the one hand, protecting the privacy and then, of course, delivering great customer experience and the services that are really helpful and beneficial for the consumer base.
I think many organizations are likely moving in the right direction. What I want to touch upon, again: you mentioned that it’s really also a collaborative effort of getting data scientists access to data. On the one hand, of course, users should be in charge of what they want and what they don’t want, but did I understand correctly that within Spotify there are also teams that really look into how you and your data scientists get access to everything you need to accomplish the great work you do?
Ryan: We have an internal system for data scientists to go and check what’s up with a data set, what is the level of personal data in this data set. Like I said, most of the products and services we’re producing don’t require any personal data. To send an email, of course, you need that literal email string. I don’t need that level of data to count things or check A/B tests. I’m interested in how a change that we ship affects hip hop listeners versus classical music listeners. Those are huge groups of people, and that is actually a lot more useful. If you want to iterate on a product, then what individual email addresses or I don’t know. I’m even struggling to find an example because this is not very useful.
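As a rough illustration of the kind of group-level comparison Ryan mentions, the following Python sketch computes engagement rates per listener segment and test group, then the difference between treatment and control. Nothing here reflects Spotify's actual tooling: the segment names, the `engaged` flag, and the sample rows are hypothetical, and the rows deliberately carry no user identifier or email address.

```python
from collections import defaultdict

# Hypothetical event rows: (listener_segment, test_group, engaged).
# Note that no personal identifier is needed for this analysis.
sample_events = [
    ("hip_hop", "control", True),
    ("hip_hop", "control", False),
    ("hip_hop", "treatment", True),
    ("hip_hop", "treatment", True),
    ("classical", "control", True),
    ("classical", "control", False),
    ("classical", "treatment", False),
    ("classical", "treatment", True),
]


def engagement_rates(events):
    """Return {(segment, group): engaged share} from anonymous event rows."""
    counts = defaultdict(lambda: [0, 0])  # (segment, group) -> [engaged, total]
    for segment, group, engaged in events:
        tally = counts[(segment, group)]
        tally[0] += int(engaged)
        tally[1] += 1
    return {key: engaged / total for key, (engaged, total) in counts.items()}


def lift_by_segment(rates):
    """Return {segment: treatment rate minus control rate}."""
    segments = sorted({segment for segment, _ in rates})
    return {
        segment: rates[(segment, "treatment")] - rates[(segment, "control")]
        for segment in segments
        if (segment, "treatment") in rates and (segment, "control") in rates
    }


if __name__ == "__main__":
    rates = engagement_rates(sample_events)
    for segment, lift in lift_by_segment(rates).items():
        print(f"{segment}: lift = {lift:+.2f}")
```

A real analysis would also need sample sizes large enough for a significance test. The point of the sketch is only that rates between groups can be compared without touching individual-level personal data.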
Alexandra: I completely agree. I think it’s really the patterns and the insights that you can find about a broader group of people as opposed to really being that granular and looking into one individual. Because after all, we want to recommend, several people, which music to listen to, and figure out what you can learn from others to make these recommendations.
You mentioned that this of course makes privacy protection easy, and I think it is one of the ways we can really accomplish this balance of, on the one hand, using information that is not that privacy-sensitive but still delivering outstanding customer experience and great music recommendations, as in the case of Spotify. Really impressive what you guys are doing here. At the end of our sessions, we usually play a little this-or-that game. Just answer with the first thing that comes to your mind. Are you ready?
Ryan: Yes, sure.
Alexandra: Wonderful. Our first question would be, data science or statistics? It’s a tricky one.
Ryan: Can I elaborate? I’d say data science. The difference is that the skill set is much more broad. I work in the product side, in R&D. Basically, I have a lot of the skills that a product manager might have because it’s required for my job, but I also have these data skills. I think that is much more broad of a skill set than a statistician. When I’m looking to hire data scientists, I’m not just looking for somebody that can tell me about the confidence intervals. I’m looking for somebody that can connect the dots, and can ask really good questions, and also be able to answer them themselves.
Alexandra: I can imagine, especially with what you shared, that it’s oftentimes the job of the data scientist to educate the business people on how to rephrase their questions so that he or she can actually help.
Ryan: Absolutely. It’s definitely data science, and it’s here to stay. If you’re a statistician, that’s great.
Alexandra: Get some product skills.
Ryan: Or some other skills, some complementary skills so you can apply what you know.
Alexandra: For example, what could be good?
Ryan: Business in general. Whatever industry knowledge you have is very useful. I don’t think it would be very good for me to be a data scientist in the telecom space or oil and gas. I have specialized in an industry, which is consumer-based digital products. It would be a big learning curve to enter a different space and be effective. I can apply the knowledge that I’ve built over the years about consumers and digital products to anything, or subscription models for that matter. These are complicated things and more complicated than often ACI. It’s really important to have domain knowledge, and one of the common mistakes that people make with hiring is to underemphasize the importance of domain knowledge.
Alexandra: I think that’s a very valuable takeaway for our listeners that are in the process of building their AI teams and data science teams. Next questions, desktop or mobile?
Ryan: Mobile, of course. Everything’s on mobile now, and it’s still eating everything up. I spend a lot of time on desktop, I think it’s more efficient.
Alexandra: Training reinforcement models on your smartphone is still a little bit challenging, I could imagine.
Ryan: Mobile is here to stay.
Alexandra: Next question: house or techno?
Alexandra: Apple or Google.
Alexandra: Why Google?
Ryan: Because they’re our cloud service provider.
Alexandra: Cloud service is more important than the privacy aspects of Apple.
Ryan: Apple is a great company and so is Google, but–
Alexandra: The cloud capabilities of course are promising I could imagine. The last question, forest or beaches.
Ryan: Well, I’m in Sweden. I can’t say anything against the forest because there are some trees out here that will hear me. Definitely, I prefer the sunshine.
Alexandra: Let’s hope that soon we’ll be able to travel to beautiful beaches again, but thank you so much, Ryan. It was really interesting to talk with you and I think there are so many valuable key takeaways for our listeners. Thank you very much for taking the time.
Ryan: Of course, thanks for having me.
Jeffrey: Ryan is awesome and Spotify sounds like a really fun place to work. We can learn a lot from how they do things over there.
Alexandra: We sure can. Let’s pull together our most important takeaways, shall we?
Jeffrey: Let’s do it. Number one, sort out your data infrastructure before you start the actual data science. It’s very difficult for a data scientist to work in an environment where there is no data or it’s difficult to access. Set your team up for success by auditing the data you have and figuring out a smart MVP approach to your infrastructure.
Alexandra: Good points. Number two, a single business-minded individual contributor who is passionate about making your organization data-driven can have a bigger impact than signing on huge consultancy firms. A seasoned data scientist can automatically embed best practices in your team and get your business stakeholders to ask better questions, which is one of Ryan’s secrets to really getting value out of data science.
Jeffrey: Takeaway number three. We all know that machine learning is great for personalized recommendations and a technology that’s obviously here to stay but still, many organizations struggle to move from machine learning pilot projects to ML in production. To change that in your organization, Ryan shared some great places to start. Make data accessible, increase data usage through cloud migration, and automate the data access process.
Alexandra: Wise words from Ryan. Takeaway number four, data protection is a true differentiator. You don’t need to use personal data in most services and working with fake users is a great option too.
Jeffrey: Indeed. If any of you business leaders are listening right now, make sure you empower your data scientists to work with good data. If you don’t have a data scientist yet, make sure you hire one with specific domain expertise.
Alexandra: That’s the way to go. We hope you found this episode as interesting as we did, see you next time.
Jeffrey: If I may ask our audience real quick, if you could take a moment to give us a review or subscribe to our podcast, especially on Spotify, we’d be most appreciative.
Alexandra: Absolutely, goodbye.
The Data Democratization podcast was hosted by Alexandra Ebert and Jeffrey Dobin. It’s produced, edited, and engineered by Agnes Fekete and sponsored by MOSTLY AI, the world’s leading synthetic data company.