And is it really possible to securely anonymize the location data that is currently being shared to combat the spread of COVID-19?
To answer these and more questions, SOSA’s Global Cyber Center (GGC) invited our CEO Michael Platzer to join them on their Cyber Insights podcast for an interview. For those of you who don’t know SOSA: it’s a leading global innovation platform that helps corporates and governments alike to build and scale their open innovation efforts. What follows is a transcript of the podcast episode, but if you prefer to watch the video, you can access it on the GGC blog.
William: Wonderful, now Michael, when you think about the broad array of cybersecurity trends that are unfolding today – ranging from new threats to new regulations – what is really top of mind for you in 2020?
Michael: Thanks for having me! We are Mostly AI and we are a deep-tech startup founded here in Europe while preparing for GDPR. Very early on, we had this realization that synthetic data will offer a fundamentally new approach to data anonymization. The idea is quite simple. Rather than aggregating, masking or obfuscating existing data, you would allow the machine to generate new data or fake data. But we rather prefer to say “AI-generated synthetic data”. And the benefit is that you can retain all the statistical information of the original data, but you break the 1:1 relationship to the original individuals. So you cannot re-identify anymore – and thus it’s not personal data anymore, it’s not subject to privacy regulations anymore. So you are really free to innovate and to collaborate on this data – but without putting your customers’ privacy at risk. It’s really a fundamental game-changer that requires quite a bit of heavy lifting on the AI-engineering side. But we are proud to have an excellent team here and to really see that the need for our product is growing fast.
William: Very interesting! Now, we know that location data is among our most accessible PII – we kind of give it out all the time via our mobile device. In the wake of the coronavirus, we are seeing calls to use our location data to track the spread of this pandemic. Is it possible to really effectively anonymize and secure our location data? Or can this data just be reverse engineered? Could using synthetic data help?
Michael: Yes definitely, and we are also engaging with decision-makers at this moment in this crisis. Location data is incredibly difficult to anonymize. There have been enough studies that show how easy it is to re-identify location traces. So what organizations end up with is only sharing highly aggregated count statistics. For example, how many people are at which location at which time. But you lose the dimension at the individual level. And this is so important if you want to figure out what type of socio-demographic segments are adapting to these new social distancing measures, and for how long they do that. And is it 100% of the population that’s adapting, are social contacts reducing by 60% or is it maybe a tiny fragment of segments that is still spreading the virus? To get to this kind of level of intelligence you need to work at a granular level. So not on an aggregated level, but on a granular level. Synthetic data allows you to retain the information on a granular level but break the tie to us individually. We just, coincidentally, in February wrote a blogpost on synthetic location traces – so before the corona crisis started – because we had been researching this for the last year. It’s on our company blog and I can only invite people to read it. Super exciting new opportunities now to anonymize location traces!
William: That is exciting – and it sounds as if it could be very helpful, especially given what we are all going through! Now, Michael, there is an expanding list of techniques to protect data today, from encryption schemes to tokenization, anonymization, etc. Should CISOs look at the landscape as a “grocery shelf” with ingredients to be selected and combined or should they search for one technique to rule them all?
Michael: Well, I don’t believe that there is a one-size-fits-all solution out there. And those different solutions really serve different purposes. It’s important to understand that encryption allows you to safely share data with people that you trust – or you think that you trust. Whether that’s people or machines, in the end there is someone sitting there who decrypts the data and then has access to the full data. And you hope that you can trust that person. Now, synthetic data allows you to share data with people where you don’t necessarily need to rely on trust, because you have controlled for the risk of a privacy leak. It’s still super valuable, highly relevant information. It contains your business secrets, it contains all the structure and correlations that are available to run your analytics, to train your machine learning algorithms. But you have zeroed out your privacy risk! In that sense, synthetic data and encryption serve two different purposes. So every CISO needs to see what their particular challenge and problem is that needs to be overcome.
William: Well Michael, we’re coming up on our time here. Are there any concluding remarks or anything you would like to add before we hang up?
Michael: Well, we just closed our financing round so we’re set for further growth both in Europe as well as the US. We’re excited about the growing demand for data anonymization solutions, also for our solution. Happy to collaborate with innovative companies who take privacy seriously. And of course, I wish everyone the best of health and that we come out of the current crisis stronger – also as a global community.