The synthetic data future with Tobi Hann, CEO of MOSTLY AI

Alexandra: Welcome to the 26th episode of the Data Democratization Podcast. I'm Alexandra Ebert, your host and MOSTLY AI's Chief Trust Officer. Today, I have a fellow MOSTLY on the show, our Chief Executive Officer, Tobi Hann. In his life before MOSTLY AI, Tobi was already a serial entrepreneur who co-founded and scaled numerous companies.

Besides his thorough understanding of the business, and the startup world, he's a self-professed data nerd, with a deep interest in machine learning and number crunching for actionable insights. In today's episode, we will talk about the future of synthetic data technology, how the market is evolving, and of course, about MOSTLY AI's recent $25 million Series B investment, and which exciting new steps are now ahead of the company. Without further ado, let's meet Tobi.

Welcome, Tobi. It's such an honor to have you here as a guest on the Data Democratization Podcast. Today, I want to discuss the role of synthetic data, the vision of synthetic data both from MOSTLY AI's point of view and from what you see on the market, and of course, the big news everybody has seen, our Series B investment, as well as your advice on finding the right investor.

Before we discuss all of these topics, could you briefly introduce yourself to our listeners? How did you become the CEO of MOSTLY AI? What's your journey and what motivates you to do the work you do?

Tobi: Hi, Alexandra. Great to be here on the podcast, and thanks for having me. My name is Tobi. I'm the CEO of MOSTLY AI. I have a background in business. I studied business in Austria and in the US. I worked in management consulting, I worked in different startups, in different roles. What was always part of my journey, and what I'm really excited about is seeing companies grow and develop and scale. I got to know the founders of MOSTLY AI pretty soon after they started the company, and we started discussions. I ended up joining, in 2019, as the Chief Operating Officer. You were there. We were only 10 people or so, right?

Alexandra: I can remember, yes.

Tobi: It was early days. I then took on the CEO role in 2020, and it has been a great journey so far, and obviously, lots of exciting things ahead. It's great to be at MOSTLY.

Alexandra: I can only agree with that, definitely. That was a great start back then in the tiny office, and now to see how the company has evolved and what's happening in the market is definitely something I'm super excited and pumped about. Maybe let's talk about the funding round. I already mentioned it earlier: just a few weeks ago, MOSTLY AI announced our Series B funding round, where we got a $25 million investment, which is the biggest investment round for synthetic data among European companies so far. Maybe for those who are not yet that aware of synthetic data and the role it has in the market, why is there such a hype around this technology?

Tobi: You're right. We're super excited about our Series B, and very grateful for all the support that we got from our existing investors and new investors. It's an exciting space to be in. Synthetic data, as a category, is newly created, or still taking shape. There's a lot of buzz and hype around it. Why? Because it really has the fundamental possibility to change how companies work with data going forward. Our strong belief is that in the future, every company of a certain size will employ and use synthetic data in their data stack.

This is, I think, recognized more and more throughout the industry. The companies that we speak with are looking more and more at synthetic data, and so, of course, is the investment space, because it really will be a global category that's starting to evolve here. That, of course, will be a big business opportunity.

Alexandra: I can only agree. When you talk about synthetic data as its own category, I'm just thinking about some out there who see synthetic data in the category of privacy-enhancing technologies or emerging privacy-enhancing technologies. Do you agree with this categorization or do you see that there's something beyond privacy when it comes to synthetic data?

Tobi: When we started the company, obviously, privacy was at the forefront of the idea. Creating synthetic data, a data copy of existing data that is fully anonymous and privacy-preserving. That, of course, is still one of the major benefits of synthetic data. However, what we've also seen over the past couple of years is that the power of synthetic data, the value that you can get from synthetic data, actually goes beyond that.

In the past, we said that synthetic data is as good as real data. Now, we're saying that synthetic data is actually better than real data. What do I mean by that? Synthetic data, the way we understand it, is generated with machine learning algorithms. It can be very complex, very representative of certain input data, but since it's created artificially, with mathematical and statistical models, it can actually be shaped and modified during the synthesization process. You can create synthetic data that not only represents existing datasets, but that represents entirely new worlds.

That allows you to, for example, correct for biases, create fair synthetic data, or create much more data when you only had a limited number of data points, or, for example, augment data and fill gaps in data. Gartner is predicting that in the next year, the majority of data that will go into machine learning models will actually be synthetic data, because it's just the more flexible, the more relevant data that can really help organizations to speed up many of their initiatives with data, for example, AI, machine learning, but also testing as an example.

Alexandra: Definitely. This is also part of the reason why we are so convinced that synthetic data will play a significant role now with everything in regards to responsible AI and ethical AI, but also, as you mentioned, with this imagination part and being able to tweak and change data. So many companies, just in the last few years with the pandemic, experienced the issues that arise if the environment suddenly changes and all the historic data you have does not accurately reflect these changes.

Lots of potential ahead for the synthetic data category, so to say. One other question I have for you: as a startup and as a scale-up, finding investors, of course, is part of the game, but it's always also a strategic decision. While the majority of our listeners are from enterprise organizations, for all the startup CEOs out there, what's your advice on finding the right investor? What do you have to consider?

Tobi: Great question. There's a lot of capital out there in the market right now, and a lot of VCs have raised lots of money that they're looking to deploy. There are also great opportunities out there. There are many really cool startups and scale-ups out there. My advice would be, think about, who is an investor that gets your business and the challenges that come with your business?

Of course, if you're a B2B company like us, if you're an enterprise software company, that's a very different kind of business than being a B2C mobile app, for example. Clearly, it makes sense to engage with investors that understand what you're doing, that have seen the challenges that companies in that space, or in similar spaces, have, and that have portfolio companies in that space. Find someone that truly gets what you're doing, and also, and I think that's really, really important, believes in your vision, believes in what you're actually trying to achieve.

I can say that we were very lucky that with all our financing rounds, so far, we were always able to find those kinds of investors and have supporters that really share this long-term vision that we have for synthetic data.

Alexandra: Sure, that the future is going to be synthetic. Maybe to dig deeper here, if you say, it's important that investors get the business context you're operating in, why is that important? What role does an investor play or can play in the success of a startup or scale-up?

Tobi: Money, obviously, is one part. That's the common denominator that you would get from investors, but when you work with an investor in a startup, and especially if you're the CEO, you interact with your investors on almost a daily basis, maybe sometimes a weekly basis. It's a very close interaction. What's really important is that, first of all, you enjoy working with investors, you enjoy working with the people on their teams, but also that they can support you with many different topics that go well beyond the money. It can be, for example, recruiting, helping with introductions to certain service companies like executive search firms, or helping with introductions to portfolio companies. It's always really valuable, I think, to have conversations with other CEOs and other scale-ups, maybe a little bit earlier or a little bit later in the process, because there's so much to learn from that. You really want an investor that not only gets your business but that is also like a true [unintelligible 00:10:13] partner and helps you on your journey.

Alexandra: That makes sense. You just mentioned that it's valuable for you to also interact and engage with other CEOs of startups and scale-ups. What are the different domains where you find the opportunity to take inspiration and learnings from? Is it super narrow, with the privacy-enhancing technologies and data science-focused services, or have you experienced that areas that are a little bit further away from our core business have also offered helpful insights and learnings for you?

Tobi: I think there's something to learn from really everyone and all the different startups and scale-ups out there. Of course, when I have conversations with others that are also in the B2B enterprise software domain, there's a lot of overlap and a lot of topics that we can talk about. Even when I talk to a B2C crypto startup, I've had great conversations, because, for example, what really is a common theme is people and the team. How do you scale the team? How do you attract talent? All those kinds of things. That's a very common topic. I think there's benefit in having those conversations, and it can be very broad, across the different industries and different themes that those companies are working on.

Alexandra: I can only agree. I think it is also so nice to see that you really are one of these learning CEOs who brought so many experiences to MOSTLY AI when you joined the company, but you're definitely also a role model for everybody when it comes to constantly learning, getting new insights, and better understanding how we can improve the different areas we are interacting in and the challenges we have to figure out as a startup. That's really something I find super positive.

Another question that you are potentially already tired of, since you have been asked it quite a few times in the last few weeks: what's the plan for the investment? How do you plan to use it for MOSTLY AI? What are the big next steps ahead?

Tobi: One thing clearly is growing the team. We already have a fantastic team of close to 40 people, but we also see the limitations. There's so much more we can do and will do, and we need to bring on the right people for that. We'll scale the team across the board. We're hiring across the board, more on the technical side of things, in the engineering and data science space, but most certainly also on the sales and operational side of the business.

We're also going to really put a focus on the US market opportunity. It's a huge market. Companies are really eager to work and innovate with data. We know this from our conversations, we know this from our initial clients that we have there. For us, it's going to be a focus. That's basically the two main things. People and the US. Then we'll invest in the product. We have many great ideas and topics that we want to address, needs that we see out there that can be served with our synthetic data platform. We'll heavily invest in the product too.

Alexandra: Exciting times ahead. For everybody listening, I think Tobi just said it: no matter which area you're in, if you're drawn to our vision and mission to make the future fairer and smarter and to help people really work differently, more ethically, and in a more privacy-friendly way with data, then it's quite likely that we have a job out there that you should take a look at. Just a little bit of advertising here for us on the side. When we talk about the business value of synthetic data, what are the elements that stand out for you the most?

Tobi: Synthetic data brings a really wide range of benefits to organizations. I think it's sometimes even more of a challenge to say, "What's the one big one?" First of all, with privacy comes security and potentially avoiding fines, GDPR fines, and things like this. In that sense, it's almost a little bit like an insurance. Then, of course, with much more automated processes when synthetic data is generated, it's so much faster and less labor-intensive. You can free up resources. You can save time and cost. It is something that helps you on that side as well.

Thirdly, and I think that's the part that I'm most excited about, it really is an enabler for innovation, because synthetic data allows you to unlock data assets that, in the past, you were just not able to work with, both internally and externally. That's really the exciting part about synthetic data, I think, because it will allow organizations, in general, to leverage data sets so much more and will help with data democratization. That will really bring fundamentally new innovation to many companies and organizations.

Alexandra: Definitely. This always brings to my mind the story one of our clients from the banking sector once shared. When they first started out with synthetic data, they, as you just mentioned, had access to a granularity of data they weren't used to before. They found, in that case, since it was a bank, some [unintelligible 00:15:40] in financial transactions and in income streams where they were like, "There must be a bug in there. Nobody would have income like that, or different income streams like that. Nobody would spend this money like that."

We were like, "Quite likely it's not a bug. Please check back." Then they went through the whole process to evaluate the real data. Indeed, it was the case that the customers had behaviors they weren't even aware of and couldn't think of. Now, with synthetic data, the level of customer understanding has deepened so much, which of course translates to a whole new level of customer-centricity. I think this is one of the very big advantages synthetic data will bring to more and more businesses in the future.

Tobi: Absolutely. I think it's a great example of internal data access and of democratizing data access internally. Now think about opening up also towards external partners like startups, universities, research partners, and those kinds of things. Great things are really bound to happen when that takes place. It's only really the beginning that we see currently.

Alexandra: Definitely. I think you could even go way beyond the business value and come to the value for the research community and also society at large. If you think of all the valuable data resources we have out there that, of course, due to privacy reasons, can't be openly shared, synthetic data is just the right tool to, as you said, open up access to data. Therefore I see huge promise and potential to help our society find new solutions for cancer, Alzheimer's, and many other diseases.

Just thinking of a conversation I had a few years ago with an Alzheimer's researcher who said, "The problem we have with Alzheimer's is not the funding or the knowledge, but really the data access, because this disease comes into existence decades before you experience the first symptoms, but nobody gives out the data due to privacy concerns." I really have huge hopes for synthetic data to not only transform how businesses interact with data, but also how we, as a society, can benefit from it.

Tobi: Absolutely. I think that's also one of the beauties of working in this space. What we are doing is really fundamentally having a positive impact. A positive impact on how data is handled and how data is worked with in general, but also on the potential outcomes of this. As for the research projects that you mentioned, I see great potential for that. We recently concluded a research project on the European level with cancer data, and it's great to see how synthetic data can really help here, making it easier for parties to get access to health data. We have those conversations on many levels. There's so much potential if you think about Corona and COVID data.

I think it's sometimes just a little unfortunate that things take time. It is such a new technology, such a new category. Especially in the public space, in the healthcare space, it's very sensitive data. It still takes some convincing. It takes some time. We see in the future great potential in that space for sure.

Alexandra: Definitely. I think synthetic data is also a strategic topic. When I think of our clients, those where really the senior levels and C levels think about synthetic data and the impact it can have on an organization are much further down their synthetic data journey, and it's just stunning to see the different projects they're working on. Maybe let's talk about the business side again and the use cases. We all know there are plenty of use cases that can be realized with synthetic data. Do you have one or two favorite business use cases that you can think of and share with our listeners?

Tobi: You mentioned one in the space of product development: having access to granular data that really allows an organization to fine-tune their products and develop new machine learning algorithms. I think that's a big one, where we think that when you open up those data assets to a broader audience within an organization, great things are bound to happen. I think the second one, where we also see increasing interest, is software application testing.

If you think about it, a lot of companies are, these days, moving to the cloud or implementing new IT systems, migrating systems, replacing old systems. Almost every company, to some extent, is also developing software. For all of that, you need data to test those things. In the past, organizations would often use actual production data. It's now clearly understood that that's not the way to do it. Really having representative data that can be a drop-in replacement for actual production data in testing environments, we see great potential for that. I'm excited for that use case.

Alexandra: Me as well. Maybe another question. We have world-class researchers, especially in the field of data science, as part of our team. Of course, they focus a lot on the aspects of deep learning and how to tweak things in regards to synthetic data here, and then, of course, privacy is a huge topic. What else is keeping our researchers busy at the moment?

Tobi: There are a lot of topics that we're busy with; some are smaller and some are larger. One area that's of particular interest, and where we also see a lot of potential, is fairness and biases of data, because most often, existing data out there has biases and is, to an extent, unfair. What you can do with synthetic data is you can actually correct for these biases; you can create fair synthetic data. It's quite a complex topic. There are different concepts for fairness, and it's not easy to define what is actually fair. Our team is currently spending quite some time on those kinds of topics.
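
To illustrate the point that there are different concepts for fairness, here is a minimal sketch, not MOSTLY AI's method. It compares two commonly used definitions on a tiny, made-up set of predictions: demographic parity (both groups receive positive predictions at the same rate) and equal opportunity (both groups have the same true positive rate). The records, group names, and function names are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each record is (group, true_label, predicted_label).
# These values are invented for illustration, not real data.
Record = Tuple[str, int, int]

RECORDS: List[Record] = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 1), ("B", 0, 0),
]


def positive_rate(records: List[Record], group: str) -> float:
    """Share of a group's records that received a positive prediction."""
    preds = [pred for g, _, pred in records if g == group]
    return sum(preds) / len(preds)


def true_positive_rate(records: List[Record], group: str) -> float:
    """Share of a group's truly positive records that were predicted positive."""
    preds = [pred for g, label, pred in records if g == group and label == 1]
    return sum(preds) / len(preds)


def demographic_parity_gap(records: List[Record]) -> float:
    """Absolute difference in positive prediction rates between groups A and B."""
    return abs(positive_rate(records, "A") - positive_rate(records, "B"))


def equal_opportunity_gap(records: List[Record]) -> float:
    """Absolute difference in true positive rates between groups A and B."""
    return abs(true_positive_rate(records, "A") - true_positive_rate(records, "B"))


if __name__ == "__main__":
    print("demographic parity gap:", demographic_parity_gap(RECORDS))
    print("equal opportunity gap:", equal_opportunity_gap(RECORDS))
```

On this toy data, both groups receive positive predictions at the same rate, so the demographic parity gap is zero, yet group B's truly positive cases are recognized only half as often as group A's. The same predictions look fair under one definition and unfair under the other, which is one reason defining "fair" takes deliberate choices.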

Alexandra: That's also an area I'm personally super passionate about. I'm also happy to see, in my day-to-day business, where I speak quite a lot with regulators, that they are also really pumped about this idea of being in a position where you don't only have to repeat historic mistakes, or biases in that case, but really can have a normative impact on our future, which more and more will also be influenced by machine learning. Therefore, they find it super valuable to be in a position where you can impose human values, adjust some past mistakes in data, and have something that's fair according to your current understanding.

I think, in the future, this is definitely going to be something that will be widely used, and I can't wait to see what we as a company can contribute here. As a last question for you: of course, as a startup, the long-term vision is something that's sometimes harder to answer than for a company that has been around for decades, maybe. How do you see synthetic data as a technology? What's your long-term vision for synthetic data and, of course, also for MOSTLY AI?

Tobi: As I mentioned before, our strong belief is that, in the near future, every organization, every company of a certain size will use synthetic data in their data stack. They will use it for several reasons. When we started the company, our initial focus was certainly around privacy and creating one-to-one copies of existing datasets that were fully private and fully anonymous. That's going to be still a big driver for the uptake of synthetic data. We already touched upon fairness and modifying data, existing datasets to not only create one-to-one copies of data but modified data. That's another big area.

The long-term vision really is for synthetic data to evolve beyond that, to evolve into something where you create entirely new worlds of data that you didn't have before. It is machine learning generated synthetic data, which means it's more flexible; you can create worlds that weren't there before. We think that it will not only enable or accelerate the adoption of AI, but it will also help with the whole aspect around explainability of AI, because often with AI these days, we have the problem or the challenge that some models are like a black box and are difficult to explain.

In order to explain the AI models, we believe that you need data. You need to have data to do benchmarks and A/B tests and so forth. That's going to be synthetic data. Everything around AI, explainable AI, trustworthy AI, that's also a big long-term vision that we see for synthetic data.

Alexandra: Agreed. I think this is already now something that regulators are interested to learn more about, because in the context of ensuring the fairness of a system, you can't only assess the code. You need to have data to see how it would treat certain individuals compared to others. We all know historic data sometimes doesn't have this richness and diversity. Using your human imagination to create examples that could happen but potentially weren't seen by the algorithm is also something that could help a regulator or even a company to better understand: is my model truly fair, or do I have some hidden biases that I'm not yet aware of?

Plenty of applications for synthetic data to better understand machine learning and foster explainability as well. Tobi, thank you so much for everything you shared. It was a pleasure to have you on the show. I'm already looking forward to one of our next episodes, maybe in a few weeks or months' time. Thank you very much.

Tobi: Thank you so much for having me and great job, Alexandra. Thanks for hosting this podcast.

Alexandra: Sure, it's a pleasure.

I don't know about you, but I personally can't wait for the future of synthetic data to arrive, in particular with regard to data democratization and open synthetic data for medical research and the betterment of society. What is your point of view on synthetic data? What were your takeaways from the conversation with Tobi, and what do you envision for the future of synthetic data? That's something the Data Democratization Podcast team and I are super curious to hear. Let us know your thoughts by commenting on LinkedIn or sending us an email at podcast@mostly.ai. Until then, see you next time.
