Behrang Raji on ethical AI, fairness and data anonymization

Alexandra Ebert: This is the Data Democratization Podcast. Welcome to the show. I’m Alexandra Ebert, MOSTLY AI’s Chief Trust Officer and I have Jeffrey Dobin with me, Duality Technologies privacy expert. Hi, Jeff.

Jeffrey Dobin: Hey, Alexandra.

Alexandra: What’s new in privacy protection over the Atlantic?

Jeffrey: What’s new over here? Well, I guess exciting times for us privacy nerds. Colorado, following California and Virginia, has announced that it’s the third state to get essentially a data privacy statute passed, so look out for the CPA out in Colorado, not to be confused with California’s CCPA. What’s going on with you in Europe?

Alexandra: Definitely a lot. We had some very interesting developments lately, especially the Norwegian Data Protection Authority did something I want to mention. They issued a GDPR fine not long ago for using production data in testing. What makes this case so interesting, especially for us here at MOSTLY AI is that this fine came with a recommendation. The Data Protection Authority recommended or actually complained about this organization using production data for testing where synthetic data would have been sufficient and a viable alternative.

They said it was not okay that production data was used there. A fine that could have been prevented if synthetic data was used and I think that’s a fantastic development.

Jeffrey: Yes, I read that. It’s pretty cool. Sounds like Europe is on the right track in its adoption and also citing the right uses of privacy-enhancing technologies. Talking of data protection authorities, I believe we have a guest today who might be working for one.

Alexandra: Yes, you’re right. Today, we have Behrang Raji on the show. He’s an officer for the Hamburg Commissioner for Data Protection and Freedom of Information. By background and by trade, he’s a lawyer and he’s also currently doing a PhD and writing a thesis about AI in the public sector, so super exciting topic.

Then, of course, he also has deep expertise in open data and the European data strategy. Besides that, he’s also interested in philosophy and especially philosophy of law, and how technologies will completely change the landscape, including ethical AI, synthetic data, and smart contracts so there’s really lots of interesting topics that we discussed in today’s episode.

Jeffrey: Absolutely. It sounds fascinating. Let’s go ahead and meet Behrang.

Alexandra: Hello, Behrang. It’s so great to have you on the show today. Can you introduce yourself to our listeners, and maybe also share a little bit about what you’re currently working on?

Behrang Raji: Hi, Alexandra, thank you for inviting me to your great podcast. I’m very happy to be able to talk to you about some exciting topics here today. Before we start, first of all, I must point out that all my statements made here are my private opinion and do not represent the opinion of my authority. And yes, I’m an officer for the Hamburg Commissioner for Data Protection and Freedom of Information. HmbBfDI, as we shorten it.

What I’m doing there: I supervise companies and their compliance with the GDPR. A central part of my work is to deal with complaints from data subjects and to help them enforce their rights. What I can tell you is that we receive an enormous number of complaints, so this consumes many resources.

Alexandra: We can only imagine, since privacy is really top of mind, not only for businesses but also for consumers nowadays.

Behrang: Yes, and what else? We are also involved in many committees and if time allows, we also advise companies and other authorities. What I mean is, we also advise the public sector. What I particularly find very exciting about my job is the interface between law, technology, and somehow also political aspects.

The interaction of these fields is incredibly exciting. What I observe around the topic of AI is that it’s also becoming more and more relevant for us, starting with business models of credit agencies that assess the creditworthiness of consumers, but also autonomous vehicles, intelligent AI systems for video surveillance, job application systems that assess how competent a person could be, and of course, also the state, which uses AI systems for better performance.

What I’m really concerned about is that we can hardly cope with these new challenges because of too few staff and also because of the lack of structures. We don’t even have such a thing as an AI unit. So I think that future issues are best tackled yesterday.

Alexandra: It would make sense if data protection authorities started creating and initializing some AI specialist groups to deal with this increasing interest of the society and economy in adopting AI technologies.

Behrang: Yes, I would appreciate it. That would be a nice approach. I’m also working on my doctorate, and there I’m currently dealing with many well-known questions such as fairness and transparency of AI systems.

Alexandra: Yes, you’re actually a passionate advocate for ethical AI. For you, what does ethical AI mean in the first place, and what does it look like in practice?

Behrang: In any case, we have to note that there is no AI regulation yet. At this point in time, ethical issues are therefore an important element of social influence on the future direction. As a part of the European data strategy, the EU Commission’s High-Level Expert Group has formulated basic ethical principles, and we have to understand them as a foundation which will certainly also be reflected in future regulations.

These are, first, respect for human autonomy, second, prevention of harm, and of course, fairness, and last but not least, explainability as a part of transparency. In regard to ethical AI, actually, I would rather say that I’m a big fan of digital ethics as an extension of ethics, and ethical AI, I think, is a part of this. Among all the–

Alexandra: Sorry, quick interruption. Is digital ethics different from normal ethics? In the end, it’s people we’re dealing with, so does this change or how does this change?

Behrang: Yes. I think ethical debates about digitalization, that’s what we are talking about, should be an integral part of the discussion about technological progress. It’s the relationship between technology, and the people, and the society, but let’s look first at the relationship between technology and law because I think it’s also an important aspect.

The use of technology aims to realize certain purposes. Which purposes these are in concrete terms is up to the decision of the users, so the technology itself is free of any specification of purpose. The hammer, for example, can be used to hammer nails, but also heads. This leads to deficits in reflection and responsibility, for which we need ethical decisions as a society and also a legal framework.

In this respect, it is the task of digital ethics to take up emerging questions in the digitalization process and to ask what is technically possible and desirable in our society.

Alexandra: Yes, I think that’s always the interesting combination. What’s on the one hand possible, but what’s also desirable and how should these things be used. From your point of view, what would you say are the most pressing questions of digital ethics nowadays?

Behrang: We should understand the discussion of ethical AI as a reflection on what should be considered legally and ethically fair and good in our society. I think it’s very important when we are talking about ethical AI to ask whether or how fairness can be implemented in AI systems, or how things look with the product safety of such technologies. At the same time, we have to think about the purposes for which these systems should or can be used in our society.

Alexandra: Yes. I think also with the recent AI proposal that was issued by the European Commission, we have this use-based and risk-based approach where it was not about regulating the technology as such, but really always having the use case in mind and having different requirements and obligations, depending on what type of AI application or what intended purpose of the AI application the business is actually looking to focus on. Are you satisfied with this proposal? Is there anything missing from your point of view? What are your thoughts on this new AI act that was suggested?

Behrang: I think it was two days ago, the 21st of June, that the EDPB, for your listeners, the European Data Protection Board, and the European Data Protection Supervisor adopted a joint opinion on the proposal for AI regulation. I see it the same way as the EDPB. Overall, it’s a very ambitious attempt to regulate AI systems as a risk technology. This is good and it’s the right way, especially the European way. There are some points that need to be improved. It will be a long road before this law can really come into force.

When we are talking about AI and AI systems, we are talking about data-driven systems and of course, they will routinely process personal data. What I’m missing in the regulation is a clear distinction and relationship between the AI regulation and the GDPR. It’s still not quite clear. Individual provisions have to be reviewed to see whether they allow a legal basis for the processing of personal data through the back door. It’s not clear: is there a legal basis in the AI regulation, or should only the GDPR be the legal framework for processing data? Furthermore, it’s important to completely ban biometric video surveillance in public places. That’s what the EDPB also said.

The AI regulation takes a risk-based approach, and how the risk classes relate to risk classifications in the GDPR is also not entirely clear. The AI regulation is about the risk classification of the purpose, and the GDPR is about the classification of the processing. These cannot be considered separately because processing of data is always purpose-related.

Alexandra: It’s not clear how these tools exactly work together. Let’s hope that this will be resolved. One other thing that I was wondering when I read through the proposal was that on the one hand, I was happy to see that the AI act puts a high emphasis on data quality, and that companies really should care about creating fair systems that don’t discriminate against people.

If I understood correctly, the majority of obligations in there are only self-assessments and there’s not necessarily some kind of external audit or AI certification. What’s your point of view? Could AI certification audits be beneficial, even desirable? Is there a good reason why they are not in the act?

Behrang: Sure. That’s also a very important point when we are talking about fairness because this is a very important and relevant aspect. Criteria-based validations, audits, and also certifications are, from my point of view, the central mechanisms for making fairness measurable and testable. Suppose, for example, an AI system says that a short commute to work is a criterion indicating that someone may be loyal to the company for a longer period of time. Such a system also discriminates against those people who cannot afford the higher housing costs near the workplace. We need measures for how we want to decide whether these decisions of AI systems are fair and do not discriminate against other groups.

Alexandra: I agree. That’s really of utmost importance. It’s so important that these decisions are not only made within the data science teams of some organizations, but that a public discussion can happen about what do we actually consider as fair outcome and in which direction do we want to move as a society? In general, besides you mentioning that AI certification and audits would be helpful to also ensure fairness, what else can corporates do to avoid bias and avoid discrimination when they are building their AI systems? Any recommendations from your side?

Behrang: One important thing is that we need experts in companies who are able to make such validations. This is a very important point. When we are talking about fairness, when the goal is to avoid bias, and about how fairness could be implemented in AI systems, we first have to have a good understanding of fairness in law. Then a very interesting question is if or how fairness in law can be implemented in AI systems and machine learning algorithms.

Alexandra: What should our listeners know about fairness in law?

Behrang: The major aim of fairness in law, I think, is to treat people equally. Equal treatment also means equal opportunities and equal rights. That depends extremely on the context. It may therefore be necessary to give preference to certain disadvantaged groups in order to achieve these aforementioned goals. This contextuality, which AI systems, from my point of view, cannot learn on their own, makes it necessary to conduct audits, taking into account different definitions of fairness and different audit criteria. From my point of view, audits, as I mentioned before, and also validation, are the key to algorithmic fairness. If the aim of the AI system is to measure competence, as in the aforementioned example, it must be figured out what the right criteria are to measure that.
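
The point about different definitions of fairness can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the audit records, the group labels "A" and "B", and both helper functions are hypothetical and do not come from the conversation. The sketch compares two common audit criteria, the gap in selection rates (often called demographic parity) and the gap in selection rates among qualified people (often called equal opportunity), and shows that the same system can pass one and fail the other.

```python
from collections import defaultdict

# Hypothetical audit records: (group, truly_qualified, selected_by_system).
records = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", False, False),
    ("B", True, True), ("B", False, False), ("B", False, False), ("B", False, False),
]

def selection_rate(rows):
    """Share of all people in the group that the system selected."""
    return sum(1 for _, _, selected in rows if selected) / len(rows)

def true_positive_rate(rows):
    """Share of the qualified people in the group that the system selected."""
    qualified = [row for row in rows if row[1]]
    return sum(1 for row in qualified if row[2]) / len(qualified)

# Group the records so each criterion can be evaluated per group.
by_group = defaultdict(list)
for row in records:
    by_group[row[0]].append(row)

# Criterion 1: difference in overall selection rates between the groups.
parity_gap = selection_rate(by_group["A"]) - selection_rate(by_group["B"])
# Criterion 2: difference in selection rates among qualified people only.
opportunity_gap = true_positive_rate(by_group["A"]) - true_positive_rate(by_group["B"])

print(f"selection-rate gap: {parity_gap:.2f}")
print(f"equal-opportunity gap: {opportunity_gap:.2f}")
```

On this toy data the selection-rate gap is large while the equal-opportunity gap is zero, which is why an audit has to state which definition of fairness it applies in a given context.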

Alexandra: Absolutely. Since you also mentioned this example of equality, equal treatment, and also equal opportunity, there’s this wonderful picture that illustrates how difficult it is sometimes to come up with what actually is considered fair. It’s the difference between equality and equity. You can see a fence and behind the fence, there is a football game and three people want to watch this football game. One person is approximately, I don’t know, two meters tall, another person is a small child, and another person would be around 1.60 to 1.70 meters.

If you give everybody the same thing, so maybe a small box they can stand on, the 2-meter-tall guy will be way up above with his head in the air and will still be able to see the game, but the child won’t be able to see over the fence yet. Maybe the person who is 1.70 meters can see a little bit over the fence. But if you give a bigger box to the small child and no box to the 2-meter guy because he doesn’t need it, then everybody has equal opportunity to see the game. I think this really illustrates an important point: not only giving exactly equal treatment but really recognizing the special needs that different groups of people have and catering for those.

Behrang: Yes, absolutely. That’s a very good point. There are different examples like COMPAS. It’s a system which is used in the US, but there’s a quite similar system also used in Switzerland, I think it’s called ROS. COMPAS stands for Correctional Offender Management Profiling for Alternative Sanctions.

Alexandra: Impressive, I didn’t know that, I just knew COMPAS, but I never knew what exactly it stands for.

Behrang: Yes, right. This system would not be compatible with our understanding of constitutional law and the finding of guilt by a legal judge.

Alexandra: Sorry to interrupt, just for listeners who don’t know COMPAS: it’s a decision support tool that’s used by the US justice system to assess, I think, how likely it is that a person re-offends, and therefore to determine whether they are allowed to be released from prison early or have to stay in prison for longer. It was shown that there was severe discrimination against people with darker skin, and therefore the system was biased and let criminals with lighter skin out of prison much sooner than darker-skinned individuals. Sorry to interrupt, but just to give the perspective for listeners who are not aware of this famous example of bias in AI.

Behrang: Sure, thank you very much. A really interesting point is that, as you said, the system came to racist results: it classified people of color with a high recidivism prognosis and, conversely, privileged white criminals. The interesting thing about this is that the company which created that system tried to create fairness by not processing ethnicity as input data. What it shows is that this approach, let’s call it fairness through blindness, achieved exactly the opposite of what it was trying to achieve, because other data, such as place of residence, correlates strongly with ethnicity, at least in the US. This data is thus indirectly, as a proxy, taken into account as a decision-relevant criterion in the system.

In this case, we see that synthetic data could help when we are talking about implementing fairness, accuracy, and evidence of the results, which are very important points. If we use synthetic data, we can test it; we can look at whether there is discrimination in this data in the process.
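
The proxy effect described here can be reproduced on toy data. The following Python sketch is an illustration only, not the COMPAS logic: the applicants, the group names, the districts, and the `blind_model` rule are all made up. It shows how a rule that never sees ethnicity can still produce very different outcomes per group when place of residence correlates with ethnicity.

```python
# Hypothetical people: (ethnicity, district). The model never sees the ethnicity.
applicants = [
    ("group_x", "north"), ("group_x", "north"), ("group_x", "north"), ("group_x", "south"),
    ("group_y", "south"), ("group_y", "south"), ("group_y", "south"), ("group_y", "north"),
]

def blind_model(district):
    """Toy risk rule that only looks at the district, never at the ethnicity."""
    return "high_risk" if district == "south" else "low_risk"

def high_risk_rate(group):
    """Share of a group that the blind rule labels as high risk."""
    districts = [district for ethnicity, district in applicants if ethnicity == group]
    flagged = sum(1 for district in districts if blind_model(district) == "high_risk")
    return flagged / len(districts)

rate_x = high_risk_rate("group_x")
rate_y = high_risk_rate("group_y")
print(f"group_x flagged: {rate_x:.2f}, group_y flagged: {rate_y:.2f}")
```

Because three of four people in each hypothetical group live in different districts, the blind rule flags the two groups at very different rates. Testing outcomes per group, for instance on synthetic data, is one way such a proxy could be detected.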

Alexandra: Yes, absolutely. This is also the work that we do around fair synthetic data: creating a data set in the first place where you, for example, don’t have any racial bias or gender bias or something like that, so that the algorithm doesn’t even learn about all these different discriminatory aspects in the data. I think one other point where synthetic data can also help, coming back to the AI audits, is explainability. You could create synthetic individuals that are not the average Janes and Joes of a society but the more unusual edge cases, and see what would happen if, I don’t know, some kind of public sector or social algorithm would treat a 90-year-old who is still working differently than somebody who is of a more regular working age, and so on and so forth. That way, not only does the organization itself have a tool to figure out whether it really has an ethical algorithm in place, but external bodies would also be able to gain these insights. You can’t only assess the code of an algorithm; you need the data to really see it come to life and see what decisions it makes on different groups of the population.

Behrang: Absolutely, that could also be a benefit for us as regulators. When we are talking about best practices or bad practices, I think that was also one question you asked before, I think that depends on the use case. I think that the purpose of the system, what is to be predicted, must be clearly defined in advance. That is one best practice. Then the criteria of measurement have to be defined. Synthetic data, as you said, is very important, or at least very useful, and so is a governance structure. This is also a very important aspect because when the public sector, for example, wants to use AI systems, I’m sure that they are not the right people to check what the AI system is actually doing. Ideally, there should be a team of experts who set and review measurement criteria. These can be, for example, as I mentioned before, the evidence of the results, accuracy of the results, and diversity of the data sets.

Alexandra: Yes, definitely all very important points. One thing that would be of interest to me, though, if you say that there’s so much reason for having these AI audits, and also AI certifications, how could we get to that stage within Europe if it’s currently not in the proposed AI act? Is this something that potentially could be added now during the review process or are there some other mechanisms that could suggest that AI audits should be mandatory for organizations that want to deploy AI in high-risk use cases? What are your insights from the regulatory side here?

Behrang: I don’t have more insights than you at this point. I think that should be added; at least some very general criteria should be implemented in law. For sure, the AI regulation would be a good place for that.

Alexandra: Makes sense. Now, we’ve talked so much about AI, but I definitely also want to cover the topic of privacy. What is your take on current anonymization practices? Also, what are the most common mistakes that companies make in data anonymization? Are there any misconceptions that you encounter frequently when consulting organizations or looking into the data protection standards within European businesses?

Behrang: Of course, I look at the whole issue primarily from a legal perspective, but first to the basic problem. Often, too many attributes are deleted from the data, which leads to a strong reduction in the value of the data. That’s a big problem for the controllers, for companies. The other problem, which is very important for me, is when too little is blurred so that the threshold of anonymization is not even reached. Then we have to speak of pseudonymization at best.

Now, briefly to the legal perspective: anonymous data are not regulated in the GDPR. Recital 26 of the GDPR just clarifies that anonymous data is not in the scope of the GDPR. That shows why it’s so interesting for companies to deal with anonymous data. The measure of when data is actually anonymous is also not entirely clear. What I mean by that is that it depends on the identifiability of data subjects. It has not yet been clarified for whom exactly identification must be impossible. The European Court of Justice, the ECJ, seems to be leaning towards a relative approach, in which it would only matter that the controller himself can no longer identify the data subject. However, this was not entirely clear in the prior decision on dynamic IP addresses. My assessment is that the authorities will probably tend to assume an absolute understanding of the meaning of identifiability. In that case, it depends on the possibilities of any third party. If it is possible for any third party to single out a person from the supposedly anonymous data stock, then it cannot be assumed that the data is anonymous.

I think that the authorities will set the requirements for anonymization very high and will deal with the requirements of pseudonymization more easily. What I can say in this context is that there will be new guidelines on anonymization published by the EDPB in the near future. What does that mean in concrete terms, for example, in the context of synthetic data?

It depends on the use case, I know, but if the original data which were inwardly randomized are kept, for example, for testing purposes, then we are not talking about anonymous data in this case either. This legal uncertainty, as a whole, is very unsatisfying, from my point of view. It would be very good if we had a clear and workable clarification by the European Court of what the term identifiability exactly means.
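
The idea of singling out can be illustrated with a simple uniqueness check in Python. This is a rough sketch under stated assumptions, not a legal test and not the method any authority or vendor uses: the released records, the choice of quasi-identifiers, and the `singled_out` helper are hypothetical. It counts which records remain unique on a combination of attributes after direct identifiers such as names have been removed.

```python
from collections import Counter

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "20095", "birth_year": 1980, "sex": "f"},
    {"zip": "20095", "birth_year": 1980, "sex": "f"},
    {"zip": "20095", "birth_year": 1975, "sex": "m"},
    {"zip": "22765", "birth_year": 1990, "sex": "f"},
]
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def singled_out(rows, keys):
    """Return the rows whose combination of quasi-identifiers occurs exactly once."""
    combos = Counter(tuple(row[key] for key in keys) for row in rows)
    return [row for row in rows if combos[tuple(row[key] for key in keys)] == 1]

unique_rows = singled_out(released, QUASI_IDENTIFIERS)
print(f"{len(unique_rows)} of {len(released)} records are unique on {QUASI_IDENTIFIERS}")
```

Any record that is unique on such a combination could be linked by someone who knows those attributes from another source, which is the kind of risk the absolute understanding of identifiability is concerned with.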

Alexandra: Exactly. Legal certainty I think is always something that’s desirable to have. Can you help me to better understand what you just mentioned? You said that if the original data that was synthesized is kept around then under this one perspective, it wouldn’t be seen as anonymous data. Why is that the case? Because even if you keep the original data, you still can’t prove in any way or derive from the synthetic data anything about a real individual, infer something about them, single them out, and so on and so forth.

You can’t make any connection between the synthetic data set and the original data set. Why would that be the case?

Behrang: Because I think that a third party, or you yourself when you are synthesizing the data, has the original data and can verify. I think you can explain it better, but when you can single out from the synthetic data which individual this is or was, then under an absolute understanding of the term identifiability, we are not talking about anonymous data, even for the synthetic data.

Alexandra: Interesting point that you’re making. I know that this is the case for legacy anonymization techniques. If you have a traditionally anonymized data set and, due to the pitfalls of these technologies, you are able to single something out from the anonymous data set because you have the anonymous data set and you have the original data set, then, of course, the anonymous data set is in fact not anonymous but only pseudonymized.

With synthetic data, I could put one employee in front of the original data and in front of the synthetic data, and it wouldn’t help that he or she has access to the original data because it still would not allow them to single anything out about any individual within the data set.

Behrang: If it’s really as you described it, then we’re also talking about anonymous data in these cases.

Alexandra: Exactly, but of course, I completely agree that there’s definitely different discussions within the regulatory landscape currently, and that there are sometimes questions from the business side on how to actually deal with anonymization. What are the most important findings within your thesis that you currently have? What are the challenges for AI adoption within the public sector that you just mentioned and considered?

Behrang: I think we have already mentioned two or three very important aspects. The first is that we need a good governance structure in companies and also in the public sector when there is a plan to use AI systems. Another point of my thesis is that I’m reviewing whether there are any approaches in constitutional law that can be used for a legal framework for AI systems. That would take up too much space, I think.

Alexandra: [chuckles] I can imagine, with the common length of theses you can’t cover that in 30 minutes talking time.

Behrang: But another very interesting, although somewhat philosophical, question is whether we need some kind of enforcement deficit in a democratic state, because to follow the law out of insight, we need the opportunity to cross the line. If everything is forbidden by design because we use smart contracts or AI systems which make it impossible to do something that is forbidden, then the development of such insight is not possible at all. That’s also one aspect of my thesis.

Alexandra: That’s definitely an interesting aspect. We’re also covering the realm of predictive policing here, and of actually using technology to not even allow people to break the law. I’d say this would be a paradigm shift in how we currently approach this as a society: we have rules in place that govern how we interact with each other and how we should behave to keep our society functioning and thriving, but we still have the opportunity to break the law, and most of us decide not to do it due to insight.

Is this what you’re highlighting that we potentially need to think about?

Behrang: Yes, that’s what we need to think about because that’s very fundamental for a democracy that you always have the possibility to decide by yourself whether you want to break the law or not.

Alexandra: You were suggesting when you started out that we could need a law that allows us to break the law? Was this the question that you were posing?

Behrang: No, but I think in some use cases, it is better not to implement total enforcement systems with AI. A yardstick is a human who’s able to make a decision.

Alexandra: A human-centric approach.

Behrang: Yes, right.

Alexandra: I think this is also something that, at least, I would wish to be the case in the world and in the region, but at least, from what I see going on with the European Union installing their ethical AI principles, which you just outlined at the beginning of the episode, number one being respect for human autonomy and then also, as you mentioned, forbidding facial recognition surveillance, video surveillance, and so on and so forth.

I, personally, am rather positive that we’re not moving in the direction of this totalitarian and controlled society. But of course, I think it’s always important to have people like you and also people from activist groups really closely keeping an eye on these situations and ringing the alarm bells early on to make sure that we can uphold the democratic values as we’ve always had in the European Union.

Behrang: Yes, I totally agree with that.

Alexandra: Perfect. In one of our previous conversations, we were also talking about standards in general but also for AI. Do you have the impression that we need some standards that we don’t yet have? If yes, which standards?

Behrang: I think anonymization for training of AI systems is a very important field. What we need more is the AI regulation, especially a regulation for the purposes of AI systems. What we also need is a better legal framework for the training of AI systems. That’s a very important thing. If we are talking about anonymization, I think we need clarification by the European Court but also useful guidelines for controllers on what they have to consider to reach the threshold of anonymity. That would be very helpful and one step forward, I think.

Alexandra: Yes. I think this type of clarity would definitely be helpful. What we currently see is, I’m just thinking about the statement that was issued from the Norwegian Data Protection Authority, where they actually fined a Norwegian Institution for using production data for testing purposes. They specifically recommended that they could have done the same thing with anonymous synthetic data. Of course, synthetic data is something that’s different depending on which vendor is creating it and depending on whether you create it yourself and which frameworks you use.

Also here, of course, we are interested in having an industry-wide standard on a way to assess the privacy criteria of synthetic data. The same holds true of course for other technologies.

Behrang: This decision was a big victory for you guys. Wasn’t it?

Alexandra: Absolutely. We were very happy when we opened the newspapers that morning. Good to see that, but in general, there are so many data protection authorities that are thinking fondly and positively about synthetic data. Also, Ulrich Kelber, the German Federal Commissioner for Data Protection and Freedom of Information, is somebody who regularly mentions synthetic data as one of the solutions to really reconcile data utilization with data innovation. He always highlights that the purpose of GDPR is not to restrict any type of data processing, since we can’t afford that on an economic level, but just to ensure that it’s done in the right way. As you won’t be surprised, I’m, of course, a big advocate of synthetic data. I think that that is one of the right ways to go. In general, having companies care about that and recognize that legacy anonymization techniques sometimes are not sufficient and that there is a difference between anonymization and pseudonymization is something that I also wish would reach everyone who’s dealing with that. We still see some data leakage and data breaches that could have been prevented if this knowledge were more widely distributed within the industry.

Behrang: Yes, I think that synthetic data is a great approach for preserving privacy. But I think the use of anonymous data could also have a big impact on the interests of individuals and groups, and then we are again at the question of the purposes for which this data is used. That’s why, again, ethical talks and also legal approaches are so important, I think.

Alexandra: Yes, absolutely. In general, just making data more accessible to a broader public. It is so important to really have these open discussions about ethics, about fairness, to actually understand what’s going on. Currently, we still live in a situation where the majority of data is within the big tech companies. AI is developed, but we don’t necessarily know where it’s deployed, how it impacts us, and how it is created.

I think really finding ways to make data accessible and having people with different mindsets and perspectives look at this data can also help us as a society to spot some issues that potentially are not within the awareness of the data science teams in the big tech companies. I think data democratization, which is the name of our podcast, is also something that we can do an even better job on, on a European Union level.

In general, what would you say is your advice, as somebody from a data protection authority, for all those companies that want to innovate more with their data, but still always want to comply with regulations and do the right thing?

Behrang: That’s a really good question. I would always advise thinking about legal issues in advance. I think this is a very important point that has to be highlighted. A data protection impact assessment or even a simple risk analysis helps to identify open questions before a project is started. At this point, I would proactively contact the authorities with these questions. That is just another tip I would like to give: at this early stage, the authorities should be seen more as partners than opponents. Finally, of course, lawyers who are specialized in data protection law are also happy to help.

Alexandra: Yes. Maybe a provocative question, because we know that within GDPR, one of the intentions was that the data protection authorities don’t only take care of handing out fines, but actually help people to do the right thing, coaching them, consulting them. At the beginning of the episode, you mentioned that there’s also a lack of staff within data protection authorities. If all of the organizations that want to innovate more sent their questions to data protection authorities, would your response times be fast enough to allow them to innovate at the pace that they’re interested in? Or do you see some capacity issues, so that we maybe need to get more people into data protection authorities to really make this work at scale?

Behrang: Yes, I can tell you we can’t handle all of these new challenges, and we can’t always give legal advice and help people the way we would want to. Still, I think it is a chance to try to get into a dialogue with the authorities and try to get help. Yes, the response time is a problem. It’s because of the tasks we have and the limited staff.

Alexandra: There’s definitely a lot on your shoulders. I see that the data protection authorities around Europe are doing great work and that there are guidelines on topics that are of particular interest for a bigger group of people. Maybe, also, if somebody is listening, what would be the top reasons to apply to a data protection authority and join the team there to really help move in a more privacy-friendly and more innovative direction? What would be your shout-out to people who are interested in working with data protection authorities? Why should they come and do that?

Behrang: In the beginning, I said that first, you are doing a great thing for the people.

Alexandra: Agreed.

Behrang: Then another point is that your decisions have a big impact on societal developments, and you are involved in very interesting projects of the state and also of companies, where you see the interaction between technology, law, and political interests in these fields. I think it’s a very interesting job.

Alexandra: I can imagine that this must also be incredibly rewarding, to know that you’re really contributing on that level to societal development and upholding European standards and values. I also talked with some lawyers recently and had them on the podcast, Axel von dem Bussche, for example. He also highlighted that on the private-sector side, among lawyers, it’s really a capacity issue at the moment. There are not enough skilled people who have this data knowledge, who have these insights. Therefore, I think it’s also an issue that we need to tackle at the university education level, to really get more people into this new space at the intersection of data, privacy, and law. Lots to do still on a European level.

Behrang: Sure, sure.

Alexandra: Perfect. Perfect. Before we come to an end, Behrang, are there any final remarks that you want to leave our listeners with?

Behrang: Well, I think that data protection and data protection law is a very important aspect, which needs to be considered in times of AI and big data. Yes, I think that for future legal frameworks as well, data protection is one aspect that should always be considered, and in the future it will take on a more and more important role.

Alexandra: I completely agree. If you’re a lawyer and not yet specialized in data protection and AI issues, then go ahead and do that because we need you. We need your support.

Behrang: Yes.

Alexandra: Perfect. Thank you so much.

Jeffrey: All right. This conversation was a real treat for those interested in the legal aspects of data protection. As a lawyer and a data privacy guru, also sometimes referred to as a bit of a nerd, I can tell you that I really liked this interview.

Alexandra: Me as well. It was a really great conversation with Behrang. I think that especially those companies looking to innovate with AI and also comply with GDPR can learn a lot and take away a lot from this conversation. Shall we put together the most important takeaways?

Jeffrey: Yes, sure. Let’s get started then with takeaway number one. As part of the European Union data strategy, the European Commission set up a high-level expert group to formulate basic ethical AI principles. These principles are as follows. Number one, respect for human autonomy. Number two, prevention of harm. Number three, fairness and explainability as part of transparency.

Alexandra: Takeaway number two. Data-driven systems like AI routinely process personal data. It’s still not clear what the relationship between the GDPR and the newly proposed European AI regulation will be, so there’s a little bit of legal uncertainty at the moment that hopefully will be clarified. Then Behrang mostly talked about audits and certifications, which he thinks should really become a central mechanism to ensure fairness in ethical AI.

Jeffrey: Takeaway number three. When the goal is to avoid bias, we need to have a really good understanding of fairness in law. Fairness in law tries to treat people with equality, but to achieve this fairness, certain groups might need to be treated differently than others. AI systems can’t always learn this on their own, so audits are the key to algorithmic fairness.

Alexandra: That’s right. It was also this equality versus equity discussion: how to give access to the means necessary to achieve the same outcome, depending on what a person actually needs. Takeaway number four, synthetic data. We talked about synthetic data quite a bit in today’s episode. Synthetic data can be used to test systems for discrimination, and fair synthetic data in particular can help to fix the biases that we sometimes find entering AI models. Also, synthetic data can help test AI systems against edge cases that would not have been included in the training data. It’s a valuable tool here as well. Lastly, synthetic data can also help with explainability and providing transparency, especially to regulators.

Jeffrey: Absolutely. Takeaway number five. Behrang often encounters different data anonymization mistakes. You’re probably all too familiar with this, but when too many attributes are deleted from data sets, this obviously reduces the value of the data. Then on the other end of the spectrum, when too little data is obfuscated or blurred, the data isn’t really anonymized properly. True anonymization depends on identifiability, and he was talking about how, so far, this really hasn’t been clearly defined. Behrang expects that authorities will move towards a truer, or absolute, understanding of identifiability. This should also lead to higher standards of anonymization.

Alexandra: From what we’ve heard from Behrang, the bar for anonymization is definitely going to become higher and higher, which I think is a good thing because it increases the level of privacy protection for European citizens. Then takeaway number six. Behrang’s advice for companies that want to innovate and that also want to ensure compliance is that they should think about legal issues in advance. A data protection impact assessment, for example, helps to identify open questions even before a project starts. It’s always a good idea to do one.

Jeffrey: I like that proactive approach, begin before the project even starts. Then takeaway number seven. Another great recommendation he made is to proactively contact authorities for advice. He says, “Rather than seeing them as the enemy, look at them more as a partner or resource that you can take advantage of, and embrace as someone on your side.”

Alexandra: Absolutely. I think there’s so much great knowledge within the data protection authorities within Europe and organizations definitely should be more open in approaching them and discussing open questions with them. Thanks a lot, Jeff. I think summing this up was really helpful for our listeners.

Jeffrey: Thank you. Thank you to all our listeners who tuned in.
