Episode 35

Striking a balance - a conversation about privacy with Meta's Pedro Pavón

Hosted by
Alexandra Ebert
A fascinating conversation offering behind-the-scenes glimpses of Meta's take on data privacy with Pedro Pavón, Meta's Global Policy Director responsible for monetization, privacy, and fairness. Meta has been the focus of attention for its role in society and public discourse. Pedro sheds light on some of the challenges and solutions social media companies need to tackle to preserve privacy, facilitate research, and mitigate impacts on democracy. In this conversation, we'll talk about:
  • Social media challenges: fake news and its impact on democracy,
  • how to balance individuals' right to privacy and the collective interest, 
  • how natural language processing and voice commands help the illiterate,
  • personalization and media bubbles,
  • how to research news behavior without endangering individuals' privacy,
  • how the massive amount of available information and shorter content formats impact people,
  • AI fairness in the recommender and advertising ecosystem,
  • the role of privacy professionals and philosophers in AI development,
  • how to define fairness and the technical challenges of definition,
  • how fair synthetic data can unlock sensitive attributes,
  • privacy enhancing technologies to facilitate fairness without creating tension with privacy,
  • the future of synthetic data as a privacy-enhancing technology for AI fairness,
  • how far we are from federal privacy regulation in the US,
  • the challenges of data anonymization and reidentification,
  • how synthetic data can help with measuring data privacy,
  • the role of privacy professionals in the metaverse and the new privacy issues that will need to be taken care of.

Transcript

Alexandra Ebert: Hello, and welcome to Episode 35 of the Data Democratization Podcast. I'm Alexandra Ebert, your host, and MOSTLY AI's Chief Trust Officer. It's great to be back after our summer break, and I can promise you we have an outstanding lineup of guests for you, the first one today being none other than the charismatic and inspiring Pedro Pavón.

Pedro is the global Director of Ads and Monetization Privacy, as well as Fairness, at Meta. If you join us for a conversation today, you will learn more about the benefits but also the open research challenges of social media ranging from fake news to media bubbles, to figuring out how this wealth of information we nowadays have at our fingertips is changing how we learn and interact with content.

We also talked about AI fairness, how to approach it, why there is a tension with privacy, and why Pedro thinks privacy-enhancing technologies like synthetic data will be critical to facilitating fairness in algorithms. Of course, we also discuss the state of US privacy laws and why Pedro is less optimistic than some others that we will soon have a federal bill in place.

Besides that, he also shared his point of view on the FTC statement that we saw before the summer, where they warned data brokers and data owners not to share something as "anonymous data" when it is in fact not as anonymous as you would like it to be. Lastly, we looked into the bright future for privacy professionals, with topics like blockchain and metaverse privacy. Pedro also shared a few pieces of personal wisdom that I'm sure will benefit not only the privacy pros listening today but, in fact, everyone, so it's definitely an episode you don't want to miss. Let us dive in.

Alexandra: Welcome to the Data Democratization Podcast, Pedro. It's wonderful to finally have you on the show, and great to see you here today. Before we jump right into our topics, could you briefly introduce yourself to our listeners and maybe also share what makes you so passionate about the work you do, particularly in the context of privacy and fairness?

Pedro Pavon: Awesome. Sounds good. Thank you for having me, Alexandra. This is a big honor for me. I've followed your podcast pretty closely and I'm a big fan. A little quick history about myself. I work at Meta. I run the Monetization Privacy Policy team. That's fancy talk for a public policy team focused on supporting all of Meta's revenue-generating products and services. Just think ads, think messaging, think metaverse monetization, think e-commerce. These are the types of areas that my team supports.

Before coming to Meta, I was in-house at a couple of other really great companies, Oracle and Salesforce, where I did similar things on legal teams and ran legal teams. Most of the work was oriented around privacy but also some commercial stuff and some M&A stuff. Before that, I was in private practice. My first couple of jobs out of law school were actually public service jobs at the Justice Department and at the Department of Energy here in the States. Before all that, I guess I was a baby. I was a baby in the '80s and I just ran around playing with my friends.

Why I'm passionate about the work I do, I think it's two things. One, I think individual agency and autonomy have a lot to do with your capacity and ability to exclude others from your innermost thoughts, from your intentions sometimes, and just from your general activities. I'm a strong believer in the chilling effect of gaze upon behavior, and making sure that people have spaces where they can think authentically, act authentically, and behave authentically is important; preserving that matters a lot to me.

All that said, balancing all of those human individual interests with collective interests of how much society can benefit from information sharing and from personalization is really important to me as well. I think the democratization of information has been one of the most powerful forces for good in the last 10 to 15 years. I know some of the negative aspects of it get all the headlines, but there are a lot of really positive things that come from people having access to more information, especially when it's of high quality and of high fidelity, meaning it's the truth and it's factual.

Maintaining systems, products, and flows of good high-quality connection and information amongst people who don't have access to a lot of resources besides the internet or a cell phone or whatever is really important to me. I think it gives voice to a lot of communities that have never had a voice, and it gives access to a lot of people who otherwise wouldn't have it, to information banks and education and knowledge. Preserving the free no-cost access to all of that is something that has always been really interesting and important to me and part of why I joined Meta and do the work that I do. I hope that explains me in a quick nutshell.

Alexandra: It definitely does, and I can very well understand now why you're so passionate about your work, particularly now thinking about education. When I was, I think, just starting college, I did a research project on the impact of AI systems that help people access information, translate data, translate languages, and so on and so forth. Even in Austria, where I'm based, we have a very high rate of illiterate people.

Simply having the ability to use voice gives so many more people access to information and access to knowledge, and I'm particularly hopeful, when it comes to the advancements we see in translation from language to language, that this will help significantly in parts of the world where there simply isn't the kind of educational system that we have in many areas of the Western world. [crosstalk] It's definitely so important that this work is happening and that there is this way to access free information much faster than any state could build it up themselves.

Pedro: I agree. Not just faster than any state can build, but also independent of a state's notions about what's the right information to share. We see what's happening in Russia and other parts of the world where governments dominate the narrative. I'm not sure that is a healthy way to disseminate knowledge through propaganda or information colored through political agenda. Does that happen everywhere? Of course. Does it happen more so in places where the government has strict control over what you can read, what you can say, and what you can hear? Yes.

The internet is a great tool for combating some of those challenges and it creates a lot of friction and a lot of thrash around the world, I get that, but I think ultimately, knowledge can help us prevail.

Alexandra: I agree, but as you mentioned, of course, it's not frictionless; there are challenges. What has your experience been on how to balance, or what steps can mitigate the risk, when it comes to, on the one hand, allowing people to access free information and the truth that is out there without governments having a say in what type of information people in some parts of the world can access, versus, on the other hand, what we've heard in the past few years when it comes to fake news?

People being in their media bubbles and so on and so forth, which also created a new situation for democracies, because we went from traditional media, where everybody was reading the same five newspapers, to a much more personalized world where we have things like fake news to deal with, but also topics like, okay, you have your own media bubble and only see the content that you interact with. How do you balance these two things, or what are the things to keep in mind when we look at this friction between the two topics?

Pedro: What's interesting to me is that the idea that a bubble, meaning I'm only reading or accessing information and content that reinforces my worldview or whatever, is something new is not real, not true. If you read The Washington Times or The Washington Post, you're being given information in different ways, tailored to different audiences. That's been in existence since the first newspaper was published-

Alexandra: Sure.

Pedro: If you only watch cable news, depending on which cable news network you watch, your information is going to be tailored in one way or another. I think what the internet and particularly social media can do is personalize that to the individual. Where Fox News and MSNBC do it to a community and you choose to go there, you go on social media and it's being tailored to you as a person. It's designed to stimulate you and your interest based on your actions and your specific behaviors, and that tends to scare people more.

The question I ask is, what difference does it make if it's based on my personal information or the information about a community I'm a member and participant of if the content I'm getting is the same? What we're actually worried about is the fake tainted content. Let's focus on thinking through how we vet information so that people understand the fidelity of what they're consuming, right?

That's not the same thing as censorship. I'm not saying don't let people say whatever they want. What I'm suggesting is if you want to say cockamamie things not based on fact, flagging that the validity of what you're saying is being challenged for the person consuming it is a very powerful signal to send to that person, regardless of how the information's getting tailored and ending up in front of them.

I think it's really important to think about content first and then personalization second, versus how we always think about it, which is that the personalization itself is creating the problem, but that's not true. I think it's part of the problem, but I don't think it is the catalyst for the whole thing. I think the content itself is, and so we have to think really carefully about how we improve systems that give people signal about the value, about the integrity-- [crosstalk]

Alexandra: The quality of the content and integrity.

Pedro: The quality and integrity of the content they're receiving in ways that are going to be meaningful for them and don't trigger them into thinking this is just an agenda to get me to change my mind. Because that's another trap when it comes to content, right? Like if I say to you, you're watching your favorite YouTube guy talk, whatever he's saying, and there's something under it that says, hey, the truthiness of this content is under investigation or whatever, that can be perceived as an agenda-driven message itself, right?

Figuring out how to give people information and facts about the information they consume so that they can make better choices, turn it off, turn it on, listen to it anyway, listen to it differently and receive it, I think is an important undertaking, and I know lots of companies are thinking about how to make this happen. At the same time though, better understanding the effects of personalized content and ads on individual humans is research that has to continue and work that must continue to be done. Because we need-- I don't think the world fully understands how individual personalization is really affecting people, and so continuing to investigate that is really important, to make sure we're not…

Alexandra: …overlooking something.

Pedro: It's really, really important, exactly. Exactly.

Alexandra: Understood. Well, it absolutely makes sense, and it's good that you point out that fake news, or the quality of the content, definitely is the bigger topic to address. One other argument I've heard comes from a different point of view. I once had Paul Nemitz on the show. He's an advisor to the European Commission. Prior to that, he was the godfather of the GDPR who brought us the level of privacy protection and regulation we have in the European Union today. He wrote an excellent book that I can only recommend to everyone. The problem is it's currently only available in German, and I think we're still waiting on the English version, where he lays out the impact of modern internet communication technologies on democracy and many other parts of societal living.

I think his argument was that he's concerned about this high degree of personalization for the reason that people lose sight and oversight of the information that's available across the spectrum as a whole, and therefore, when people have arguments coming from opposing points of view, they really lose the capability to understand, okay, where is this other person coming from. They must be nuts, because what they are referencing is something I've never seen or heard, even in a peripheral way, when looking at the news I receive. He was like, okay, it would be great to have more content that's overlapping so that people could better discuss things, because he says this is the prerequisite for democratic reasoning, that people have this baseline from which they're discussing.

Another argument I've heard that was quite interesting came from a researcher who has looked at privacy over the past hundreds of years. Lots of her work happens by analyzing newspaper texts and books from earlier days, and, from the very early days, records you have from churches and so on and so forth, so it was publicly available information.

The point she made was that with this high degree of personalization, plus the lost ability to have 10 newspapers that you have to monitor and, for example, archive as a national library, you as a researcher lose access to all the information that people are exposed to. In the future, this will make it potentially challenging for researchers to do what you just pointed out: researching how personalization affects individuals, how different information streams in fact affect behavior, and so on and so forth, because it's so highly personalized that what type of information I saw in my social media stream is not publicly available. These were some of the other points of view I've heard on that, but I'm not the deep expert on the topic, so I'm curious to hear your thoughts.

Pedro: I think, first of all, I don't necessarily understand that last point well, which it's got to be hard to understand or research how people are being swayed by the information they're given because we don't have access to the information. I think that's almost always been true. I get the, you can't see my timeline issue but also, 150 years ago, you weren't at the saloon where I was getting my information at the table. We look to newspapers as primary sources under the assumption that, I don't know, 150 years ago, most people were reading newspapers. I just don't think that's true.

I think people have always gotten their information through these informal pathways, one of which is now social media and feeds on the internet and TikTok and Instagram and whatever. It's just a new venue with a lot more information, and that's what I think we have to look at, which is how does this access to much more content than ever affect people. Because we do have a lot of data around and we need to understand it better, because that is what is certainly universally happening to everyone, which is that you have more access to content at your fingertips than any human being before you, even 10 years ago, 15 years ago, a hundred years ago, definitely 200 years ago, certainly 500 years ago, obviously, right?

You have access to all of this information. How does that reshape your brain? How does that reshape your ability to be critical? How does that reshape your ability to be skeptical? These are really important questions that we don't understand. What I know for sure is there's no going back. I look at the generations of people coming behind me. I think I'm an old millennial, right? When I look at Gen Z and Gen Y and all the folks coming after and I listen and watch them, they learn differently than I do. I learn mostly from books and long-form reading. That's how I learn. I read a long magazine article. I read a book about a topic I'm interested in. I watch a documentary. This is not how a 15-year-old brain is being formed.

They learn in 30-second, two-minute intervals, and for me, I look at that and I go, how could they be absorbing anything? The superpower they have that I don't have is an ability to toggle that is so incredible that my brain will never be able to match it. For example, they can go through 15 independent TikToks in 10 minutes and absorb interesting nuggets of content and memory from that exercise in a way that I physically can't; I don't think my brain is wired to do it. How they're going to spread knowledge and information and interpret content is just going to be different. Trying to study it from the perspective of how I learned I think is a bad move. I think it's a mistake.

I think there are different questions to be asked. How does this new access to information affect people who learned to learn the old way? How does it affect people who are learning to learn in this new way? What are the differences and what are the risks for each? I bet somebody way smarter than me is thinking about this already. I'm sure hundreds of scientists are, and I'm really interested in that research coming to light, because that's where I think a lot of the light bulbs are going to come from in understanding how this proliferation of content and personalization is affecting people. I think what we're going to learn is that it affects different types of people in different ways.

There's not going to be any magic wand that you're going to be able to wave to make the world a perfect, fair, and objective place, but that's never been true anyway.

Alexandra: That's true. I can only second that. I'm also beyond excited to see what research will tell us in, let's say, 10 years about what they found out on that, including the point you just made about these different age groups and population groups, whether Millennials or even younger generations, which haven't interacted with technology for the majority of their lives.

I think here the point that you made earlier in our conversation about flagging content is key: if there's some reason to do so, tell people, hey, with this source you might want to be more critical and more skeptical, versus, this is one of your trusted sources, you maybe don't have to be that critical. I think this will be a very important part, because if we look at the patterns we see nowadays, with attention spans going down and people switching from content to content to content, then just because of the speed of it and everybody being overwhelmed with the flow of information, critical thinking and questioning the source of the content is potentially something that isn't done every time to 100%. Assisting and aiding people here, particularly across all generations, is something that I think can help.

Pedro: Some of the fears and some of the concerns, they're being applied to a new surface but they're not new. When we transitioned to radio in the early part of the 20th century, I guess, I don't know the exact date, but I think that's right. You can just Google and read about all of the newspaper articles about how radio was going to destroy reading and change people's ability to absorb information and affect school children negatively, and whatever. Then TV comes along and it's even worse.

I think you're old enough to know all the hysteria around TV brainwashing kids and all this stuff. Then video games came along and video games were going to destroy all of the children's brains and turn them into violent zombies that were going to do all these things. Then the internet came along and then it was the internet that was going to destroy my generation and my contemporaries, and now social media is going to destroy--

Look, the reality is human beings are super resilient, and we adapt and we learn and we become smarter for going through a hazard period of some risk where we're still trying to figure out how things are affecting us. I think that's where we are right now with social media. I don't love panic button-driven, conclusory discussions. I think that they're shortsighted and wrong. That's my personal opinion. I think what is much more productive is to approach it from this perspective: entire generations of people are born into the social media and internet age now. We're not that far removed from all generations alive having been brought up in a world where the internet has always existed. When we get to that point, civilization is going to be different. That is called progress.

Now, with that being said, we should be vigilant and mindful of new risks that new technologies present and that new ways of learning and absorbing information create, but fear-mongering and over-the-top apocalyptic stuff, I just think history tells us that that's not the way things work.

Alexandra: Absolutely. I think also from my personal experience, I work a lot with regulators, particularly on the European Union level. We've seen in the past few years that everybody was afraid of artificial intelligence and that it's going, like you pointed out, to terminate us or take over the world, take all our jobs, and so on and so forth. This panic and hysteria is actually taking the focus and attention away from those topics that need to be addressed today, like, for example, fairness.

Now also with regulators, some of them haven't even grasped artificial intelligence, others have but they now moved on and say, okay, the future of, let's say blockchain, the metaverse, neurotechnology, and so on and so forth, this is the new bad evil and we have to increase panic and so on and so forth. That's absolutely the wrong approach because, as you just pointed out, we've had all these discussions with all the new technologies that we've introduced even only during the past century.

As we've seen, radio didn't kill our ability to read, and TV didn't significantly alter us as human beings. Looking at what needs to be addressed and addressing that in a more rational and calm manner, as opposed to having mass panic and dystopian scenarios about AI or anything else taking over the world, I think is the better way to go.

Pedro: Agreed.

Alexandra: Perfect. I absolutely wanted to talk about AI fairness with you, particularly in the context of recommender systems and the advertising ecosystem, because this is something you mentioned or talked about at an IAPP panel earlier this year, and it's something we haven't yet discussed that closely on the podcast. AI fairness is something that's top of mind for our Data Democratization Podcast listeners, but what is particular about fairness when it comes to recommender systems and the ad ecosystem? Can you elaborate on that a little bit?

Pedro: Look, I think when you have a computer making decisions that affect critical components of your life, whether that be your health or your economic situation or your liberty, scrutinizing those systems is really important. It's interesting to me that some of the most noteworthy early adoptions of AI actually came in really critical, human rights-oriented spaces.

Think of the common story told in one of the great books on this topic, Weapons of Math Destruction. The author talks a lot about algorithmic decision-making around people's freedom, like parole boards and people's ability to be released from jail, and how the algorithm was found to be discriminatory, but all these jurisdictions adopted it just for efficiency and for cost, and in the guise of removing human prejudice, just inserted technological prejudice, which was a reflection of the human prejudice that was built into the system, because people made the system, right?

I think fairness is still a concept that we're working out in the context of computational decision-making. That said, fairness is much more complicated than what it sounds like on the surface, which means, are we being mindful of like how the outcomes are affecting all the people that the algorithm's affecting. I think a double click on the discussion of fairness goes something like this, let's say an algorithm makes decisions that affect a community of people fairly 99% of the time. 99% of the time, the algorithm is objectively behaving fairly, but 1% it's behaving unfairly. Does that make it an unfair algorithm or does that mean that we need to blow it up and start from scratch? I think the answer there is, it depends right?

Alexandra: On the use case, sure.

Pedro: It depends. What's the algorithm deciding and what's that impact on the entire population, but specifically, what's the impact on that 1% of the population who is being either put at risk or being treated not as fair as the rest of the population, or just receiving different outcomes systematically because of that algorithm. For example, if it's a matter of, do I get a coupon discount at McDonald's and 99% of the time, it's acting fundamentally fairly, but 1% of the time, it's excluding a group of people unfairly for whatever mathematical reason I don't understand. That sucks and we need to fix that. I don't think we need to shut the algorithm down in order to fix that. I think we try to iterate on it and make it better and figure out what's causing this unfair outcome.

If it's an outcome, and this is a hypothetical, but if the algorithm is making a life and death decision and disproportionately impacting a very small population of the user base, then we got to stop using it until we fix it. You can't use it. A 1% death rate is unacceptable.

Alexandra: Absolutely.

Pedro: A 1% failure rate on something of less consequence might be acceptable, and we have to bring in a lot of smart people to decide whether it is acceptable to continue to use the algorithm while we figure out what's causing this discrepancy. There are a lot of ethical exercises that go into deciding how we're going to approach fairness in algorithmic decision-making. I think one of them is how do we develop criteria for deciding absolute baselines for, like, stopping adoption? I don't know, I'm not an ethicist.

I think a lot of privacy professionals talk about this like they are the experts, and I don't think we are. I think we need to turn to philosophers and ethicists and social scientists and psychologists and people who understand human beings in ways that privacy people do not, to figure out what the answers there are. Unfortunately, I think we are not doing a good job at that so far, and when I say we, I mean the privacy profession at large. We're not well suited to make these types of societal and civilization-oriented decisions.

Alexandra: Agreed.

Pedro: In the context of fairness, communities identified as being treated unfairly should have disproportionately loud voices in those discussions. If an algorithm is being racist against a particular ethnic group, that ethnic group should have a very amplified voice in deciding how we calibrate whatever technology is creating the disparate impact, up to and including whether we need to turn it off. In most cases, the scenario I just gave probably sounds like it needs to be turned off, but that might not always be the case. I think understanding that and making sure we have an inclusive conversation about impact and effect is important.

All of this is great if you know what the disparate impact is, if you know what the unfair outcome is. I think the more opaque and difficult challenge, and it's somewhat more technical, is identifying the disparate impact, identifying the unfair outcome. If we don't know that it's there, we can't fix it. There's a tension with privacy there, I think, because sometimes being able to see whether a particular group of people is being disproportionately negatively affected by something requires identifying who they are.

Alexandra: Absolutely. Or at least some of their characteristics.

Pedro: Oh, yes, exactly. Or identifying the characteristics that make them who they are. In order to do that, you've got to learn something about them, and you've got to save that information, and then you've got to use that information. This is a random hypothetical, but if Google or Amazon or Meta said, hey, we want to make sure that communities of color aren't being disproportionately negatively affected by X algorithm that does Y thing, and we need to go ask people what ethnic group they're a part of in order to study this as part of how the product works, the backlash would be immediate. We know it.

Alexandra: Absolutely. You're speaking to my heart here with everything you just pointed out.

Pedro: Yes. The backlash would be immediate. Maybe the answer is that there's a better way to do this and we need to think about that and that doesn't require us to know who groups are. I am not scientifically sophisticated enough to know whether that path exists, but--

Alexandra: It actually does.

Pedro: Yes, and if it does, then let's go figure out how to make that happen. What is most important, though, is that we leave no stone unturned in the fairness discussion to make sure that, you know, we aren't setting entire communities of people up for failure in the name of efficiency or automation, because I don't think that's a good path for anybody.

Alexandra: Definitely. So many points in there in what you just mentioned, Pedro, that I want to follow up on. I'm also with you that I think privacy professionals have a role to play in the fairness discussions, in the responsible AI discussions, but I'm also not a fan of the positioning that we see at some privacy institutions of, okay, privacy professionals are the ones who should own responsible AI and fairness, because as you just pointed out, they're not trained philosophers, ethicists, and so on and so forth. Therefore, I think they bring important skills to the table, but it should be a much bigger, much more inclusive effort within organizations where you have people from all the different departments, backgrounds, and so on and so forth.

Also, what you said about having disproportionately large or loud voices for particularly the minorities that are badly affected by algorithms or any decision-making systems is, I think, an important thing to consider, because it's still such an early phase that nobody in the world has all the answers yet. I think one important thing to keep in mind is how we can keep this communication channel open and how we could have capabilities or features in our products that allow users to communicate, to mention, to point out that there is something unfair happening.

For example, I'm thinking here of a conversation I once had with an LGBTQ researcher who was focusing on AI fairness, and one issue he pointed out was that, for example, with Amazon Alexa in the United States, there are sometimes issues for the LGBTQ community because they or their friends are misgendered by the device, and there's actually no way to change that or to even let Amazon know, hey, we are unhappy about it; could you please add a tiny feature that would allow us to set different pronouns or something like that? Just having some simple tweaks and additions to products that would allow particularly those minorities to have a loud voice is, I think, something particularly helpful in these early days of AI fairness.

Then, since you also mentioned this tension between privacy and fairness, absolutely. I actually was asked by the European Parliament earlier, or actually at the end of last year, to advise them on the responsible AI aspects of the upcoming AI Act. There, there was also this tension: of course, you want to protect privacy, you want to protect people's most sensitive information about their ethnicity, their gender, their sexual orientation, but on the other hand, you want algorithms that don't discriminate based on that, and the state of the art is still that you need to know what these characteristics are for different user groups to actually develop an algorithm that is truly colorblind, gender-blind, and so on and so forth.

Under current laws, actually, GDPR and various anti-discrimination laws, both in Europe, in the UK, and also in the US, AI fairness practitioners are oftentimes prohibited from accessing the sensitive attributes. They have to operate blind and don't even know whether their algorithms are biased. There was also a survey done by, I think, Microsoft Research two years ago that identified this as the biggest challenge: oftentimes they don't know whether their algorithms are biased or not.

This is something that I'm so happy we at MOSTLY AI can help with through our synthetic data, because it allows organizations to have representative yet completely anonymous, artificial information about their customers or their training data, which can also contain the sensitive attributes. This helps when you train the algorithm, because you can set it up in a way that it doesn't take gender or other such aspects into account.

Actually, if this is something of interest for the people listening today, one of our previous episodes, two episodes ago, was about Humana, the US health insurer, and their use of fair synthetic data, where they use synthetic data not only to access these attributes but, in that case, to make the data sets fairer, mitigate historic biases, and have something they can develop their models on, making sure that, in that case, healthcare resource allocation is fair across the different spectrum of individuals.

There are technologies out there; synthetic data is one, but by far not the only one. Luckily, there's so much work happening, and therefore I'm positive that in the future we will be in a position to say, okay, privacy shouldn't be in tension with fairness; you can actually reconcile both.

Pedro: I think you're right. I think privacy-enhancing technologies are going to be a critical component of balancing privacy and fairness towards fairness if that makes sense. What I mean by that is towards the pursuit of fairness, not at the expense of privacy, right?

Alexandra: Exactly.

Pedro: I don't know that we're fully there yet, and we've got a ways to go, but I'm really optimistic about a lot of the emerging technologies in the PETs space. You just mentioned one, synthetic data, but there's a ton, and their application in the context of determining, measuring, and improving fairness. I think they are going to be an essential piece of the toolkit for getting this right. I don't think they solve everything, and I think there's always going to be some tension, but they should definitely work to improve things. We should pursue those just as aggressively as we're pursuing fairness outcomes, because I think they're going to be a critical component of a fairer, more inclusive technology surface for the world to interact with.
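To make this exchange more concrete for technically minded readers, here is a minimal, purely illustrative sketch of how a team might audit a model's decisions for disparate impact using a synthetic dataset that retains a sensitive attribute. The column names (gender, approved) and the 80%-rule threshold are assumptions for the example, not a description of Meta's or MOSTLY AI's actual tooling.

```python
# Illustrative sketch only: a demographic-parity check run on synthetic data
# that retains a sensitive attribute. Column names and the 0.8 threshold are
# assumptions for this example, not any vendor's actual metric.
import pandas as pd

# Stand-in for a privacy-safe synthetic dataset carrying the sensitive
# attribute ("gender") alongside the model's decisions ("approved").
synthetic = pd.DataFrame({
    "gender":   ["f", "f", "f", "f", "m", "m", "m", "m"],
    "approved": [1, 0, 1, 0, 1, 1, 1, 0],
})

# Positive-outcome rate per group (demographic parity).
rates = synthetic.groupby("gender")["approved"].mean()
print(rates)

# Disparate impact ratio: worst-off group vs. best-off group.
# The common "80% rule" flags ratios below 0.8 for closer review.
ratio = rates.min() / rates.max()
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:
    print("potential disparate impact -- investigate before deployment")
```

Because the records are synthetic rather than real customer data, a check like this can run without exposing any individual's sensitive attribute, which is the reconciliation of privacy and fairness the two are describing.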

Alexandra: I see. Yes, absolutely. I think that's also a point to keep in mind: privacy is a little bit more straightforward than fairness, because if you talk about fairness, you can ask seven people and you will get 10 different opinions on what is fair or not. With privacy, or when we look into data, there's not a spectrum of, okay, is this anonymous or not; it either is anonymous or it isn't, and this is also something that's a little bit easier to address, which actually brings me to another point that I wanted to discuss with you, which is the state of US privacy laws.

We've seen discussions about a bill on the federal level. We've also had, two weeks ago, or, since I'm not sure when exactly we're going to publish this episode, I'll say at the beginning or middle of July 2022, the statement from the Federal Trade Commission about the sensitivity of location data and health data, where they really wanted to issue a warning about the bad practice of saying something is anonymous when it's not, and then selling this data to data brokers and so on and so forth. What's your perception, since you're based in the United States, of the current state of US privacy? Where do you think we will be going in the next few months?

Pedro: Yes. I think we're definitely headed towards stricter regulation. That's the trajectory of the world, but in the US specifically, right now, at this moment in time, there is a cornucopia of federal bills, both an omnibus approach, where it would be one federal unifying privacy bill for the whole United States, and some piecemeal privacy bills, particularly focused on children and other discrete topic areas, that are all floating around in the US Congress and starting to get more traction than I've ever seen, which is really good.

A lot of my colleagues here in the states are optimistic about the prospects of a federal privacy law. I still think that we're not as close as other people feel we are to a federal privacy bill. I'm hopeful. My opinion is we need one and we should have one, and as long as it applies equally to companies and organizations so that it doesn't create unfair competitive advantages for company X over company Y and is applied evenly, I think it's a good thing, the idea of federal privacy bill or federal privacy law.

I do see two, I don't even know what the right word is, but, like, touchstone principle areas of debate that could make it really hard for a bill to pass. One is this idea of preemption, and preemption is fancy American legal talk for: will the federal bill supersede all state laws and govern? That is a very hot topic in the States that is debated vigorously. I'm not sure we've found a political middle ground on that issue yet. I think the other one is private right of action, a highly contested issue where the political parties in the States have taken very different stances, which I think will make it difficult for a meaningful bill to pass.

Alexandra: Absolutely. Maybe quickly for our listeners, could you explain private right of action?

Pedro: Not to turn this into a law class, but simply speaking, it's the ability for private citizens to sue under the law for violations of the law by organizations and companies. I, Pedro, believe my rights under the federal privacy law were violated, and in the context in which they were violated, I can sue as an individual or as a class, meaning like a class action.

Alexandra: Yes, which could then mean multimillion dollars in fines if enough people came together.

Pedro: Exactly. I think that is a highly contested issue, and so the combination of those two makes it difficult for me to be as optimistic as some of my colleagues are about, like, the possibility of a law being passed by spring. I'm not that optimistic. What I will say is the debate is-- it's possible. I think these things are not unachievable. I think the art of compromise is something that, in the US Congress, is not as well practiced as it might have been historically. Passing a federal privacy bill, I think, has a cachet on both sides of the political aisle in the United States, and so I'm hopeful, but I'm not foolishly hopeful.

Alexandra: Yes, it still will take some time.

Pedro: If you asked me whether it would be a good thing or not, I absolutely think it would.

Alexandra: Yes, definitely. It could make life so much easier for--

Pedro: Look at what GDPR has done for Europe in unifying the way the member states think about privacy. Has it made it perfectly clean and simple? No. There's debate amongst DPAs and there's all these discussions about whether enforcement is being pursued aggressively enough. These are all debates that come no matter what happens if you pass a unifying law but I'd rather be having the debates Europe is having now than the debate about whether or not we need a law in the first place. I think we need to just figure out how to make that happen.

Alexandra: Yes, definitely, definitely. Although I must admit that when I saw all the fuss about the Supreme Court ruling when it comes to reproductive health and its connection with privacy, and so on and so forth, I thought this would potentially be something that would turn the dial and get us a little bit faster to a federal privacy law, but let's see when we will eventually have it.

Pedro: It's possible.

Alexandra: Also, from your perspective, since the FTC is the main enforcer when it comes to privacy in certain respects, how impactful do you think statements are like what we saw a few weeks ago from the FTC, with its clear warning for data brokers and the so-called anonymization practices that are still in place in so many organizations, which research has shown over and over again are simply not anonymous, but still personal data? Do you think this will dramatically change how business is done in many organizations, or do you think it's just loud words, and not much behind it?

Pedro: I think the FTC is intent on signaling what it's going to find acceptable or not, and where it's going to go with regard to enforcement. I commented about this on LinkedIn when I saw the FTC statement on anonymization and sensitive data. Here's my reaction to it, and this is my individual personal reaction to it. I think it's good that the FTC is raising the volume on deceptive practices around companies and organizations making claims about the characteristics of the data that they collect and use.

I do think it's important to be a little bit more nuanced than what I saw, because it is not true that a company or an organization that implements an anonymization technique or technology that they have actually vetted, gone through a rigorous process of incorporating, and have good-faith confidence is meeting the threshold they articulate it as meeting, but that later learns that either something changed in the way the technology works, meaning somebody was able to create a technology that undermines what they implemented, or that they made an unknown error or mistake that was not unreasonable, should be treated the same way a company who just lies-

Alexandra: Yes, sure.

Pedro: -should be treated. I just don't think that's the right approach. I would have loved to have seen a little bit more nuance there, where it's like, we will appreciate good faith efforts, and we do take into account the rigor behind the claims that you make when making our determination, while still stating that if you say something's anonymous and it turns out that it's not, we're going to investigate and pursue enforcement. I think that's fine.

I think explaining the proportionality of the enforcement and punishment based on the actions of the organization or company, and whether or not they were acting in good faith, would have been nice to see in there. I didn't see that. I think it was a little bit too sledgehammer-like. I understand why the FTC would do it. I really do. Understanding the way the FTC usually operates, they go after egregious actors first, and for egregious actors, what I'm saying won't apply to them.

Alexandra: Yes, absolutely that makes sense. Of course, more nuance would have been desirable, but I also see so many-- we work a lot with financial institutions, insurance organizations, which have their traditional anonymization practices like masking and so on in place, but I've rarely encountered any organization that has done very vigilant vetting of the anonymization that was performed to really see if that's fulfilling the anonymization requirements under GDPR and the other laws. I think that's one thing that I hope is going to change.

One other thing that I'm also hopeful about: I've actually worked a lot with computational privacy researchers, and Yves-Alexandre de Montjoye is one of the leaders in that space, who published quite a few papers, which made it to the New York Times, Washington Post, The Guardian, and so on and so forth, stating and warning about the pitfalls of traditional anonymization. One interesting thing he pointed out is that, particularly with traditional anonymization technologies, where you stick to the original data but just distort some parts of it and delete some parts of it, it's terribly hard, if not impossible, to measure from a mathematical point of view how high your risk of reidentification is.

This is why he's also one of the advocates for synthetic data and other emerging technologies, where you can measure this much more easily, because you have just a smaller group of privacy risks that need to be addressed. This is, I think, one thing. Then, of course, besides more nuance from regulators, also guidance from regulators on what they're expecting to see is something that we're lacking both in the European Union and in the United States, which I hope is going to come to make, as I've already mentioned, lives easier for everybody.

Pedro: Agreed.
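As a rough illustration of the measurability Alexandra refers to, here is a minimal sketch of a distance-to-closest-record style check, an editorial assumption rather than the specific method used by the researchers mentioned: if synthetic records sit systematically closer to the real training records than an unseen holdout set does, the generator may be memorizing, and therefore leaking, individual records.

```python
# Minimal, illustrative distance-to-closest-record (DCR) check.
# The data here is random noise standing in for real and synthetic records;
# the comparison logic is the point, not the numbers.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 5))      # real records used to fit the generator
holdout = rng.normal(size=(1000, 5))    # real records never seen by the generator
synthetic = rng.normal(size=(1000, 5))  # stand-in for generated synthetic records

nn = NearestNeighbors(n_neighbors=1).fit(train)

def median_dcr(records: np.ndarray) -> float:
    """Median distance from each record to its closest training record."""
    distances, _ = nn.kneighbors(records)
    return float(np.median(distances))

dcr_synthetic = median_dcr(synthetic)
dcr_holdout = median_dcr(holdout)
print(f"synthetic DCR: {dcr_synthetic:.3f}, holdout DCR: {dcr_holdout:.3f}")

# If synthetic records sit much closer to the training data than unseen
# holdout records do, the generator may be leaking individual records.
if dcr_synthetic < dcr_holdout:
    print("synthetic records are closer to training data than the holdout -- inspect further")
```

The design idea mirrors Alexandra's point: a generated dataset can be scored against a holdout baseline to produce a comparable number, which is much harder to do for data that merely masks or distorts the original records.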

Alexandra: Perfect, perfect. We are running out of time, Pedro. Actually, I was curious about your thoughts on how the role of privacy professionals is going to evolve in the future. Maybe you can also share some tips for our listeners on how they should prepare to be fit for metaverse privacy, blockchain privacy, and everything else that's going to come.

Pedro: Yes, look, I've been at this for a long time, and it's definitely the most exciting time to be a privacy practitioner ever. That will probably also be true in the future, meaning we are at that pivotal point of turning our profession into what it's going to be. We are like the Impressionist artists in the late 1800s creating Impressionism; we're doing that for privacy right now. It's a super exciting time.

I think some of the things we talked about earlier are critical, which is approaching our profession with humility and understanding that we are not the ones who are going to have all the answers and the need to develop the expertise and the disciplines to speak intelligently and expertly about areas that are privacy adjacent is going to be something important going forward.

Whether that means we bring other disciplines into the profession, which is what I'm hopeful for, or that we collaborate more closely with other disciplines, I don't know. I think expanding the idea of what a privacy professional is, is something that's underway right now and I hope we do it carefully and thoughtfully so that we don't water down what it means. We need to expand it. I just don't want us to turn into a pseudo profession, which would not be a good outcome.

There are a lot of new interesting surfaces for people to practice. The metaverse is an up-and-coming and emerging place and concept. If I was graduating from law school right now and interested in privacy, I'd be really excited about what my job is going to look like in 10 years if I'm interested in the metaverse privacy issues that for sure are going to arise. There are some folks working on it now, but I think in 10 years, it'll be really exciting, really, really exciting.

Alexandra: Definitely.

Pedro: I'll probably be too old by then to be interesting, but the next crop of folks, I think is going to have a lot to work on. I think the future is bright for our profession. The top tips I'd give anyone either coming into the profession or trying to make a name for themselves would be, be humble, listen to your colleagues, listen to opposing views carefully, and be inclusive in your work, meaning bring other points of view and bring other ideas in, no ideas are bad. If there's a group, a person, or someone in the meeting on the design team, in the privacy team, wherever, that isn't being heard from or having a chance to make their point or provide their perspective, enable that. Enable it, enable, enable, enable, it's much more important to listen than to talk.

Alexandra: Absolutely, and I can only applaud you for that, Pedro. I think that's so important, regardless of whether you're a privacy professional or any other professional. As you just pointed out, exciting times ahead for privacy. Thank you so much for taking the time today. It was a pleasure to have you on the podcast. I can only recommend that our listeners also visit your podcast, Data Protection Breakfast Club. I'd love to join you there at some point in time, and I definitely enjoy the conversations you and your partner have there with the exciting guests you have. It's another resource where they can follow you and hear more of your wisdom on privacy and fairness. Thank you, Pedro. It was a pleasure.

Pedro: Thank you for having me. This was super fun.

Alexandra: Wow, what a great conversation. I think you now see that I didn't overpromise when describing Pedro as both super charismatic and inspiring to talk to. If you want to learn more from him, follow him on Twitter and LinkedIn, and absolutely also check out his podcast, which is called Data Protection Breakfast Club. For our podcast, if you have any questions or comments on today's episode, as always, reach out to us on LinkedIn or write us a short email at podcast@mostly.ai. We will be back with our next episode in two weeks. See you!

Ready to try synthetic data generation?

The best way to learn about synthetic data is to experiment with synthetic data generation. Try it for free or get in touch with our sales team for a demo.