Episode 39

What you didn't know about AI transparency with Rania Wazir, co-founder and AI expert

Hosted by
Alexandra Ebert
In episode 39 of the Data Democratization Podcast, host Alexandra Ebert, Chief Trust Officer at MOSTLY AI, is joined by Rania Wazir, co-founder and CTO of leiwand.ai, to discuss AI transparency, the misconceptions surrounding AI transparency and fairness, and why having a standard for transparency is important. The episode also explores the concepts of fairness and explainability in AI, how they differ from transparency, and the challenges of detecting biases in large language models such as ChatGPT.

Transcript

Alexandra Ebert: Hello and welcome to episode 39 of the Data Democratization Podcast. I'm Alexandra Ebert, your host and MOSTLY AI's Chief Trust Officer, and today's episode is all about AI transparency. What it is, what it isn't, because there are quite a few misconceptions, believe it or not, how to go about it, and why a standard for transparency is so important. The perfect guest to discuss all of this is Rania Wazir, co-founder and CTO of leiwand.ai, who is not only an expert on AI transparency, but actually also works on the standards for AI transparency, so I couldn't have imagined a better guest to discuss this.

Besides transparency, we'll also cover fairness and explainability in this episode and look into how those two concepts differ from transparency. Lastly, we talked a bit about large language models and ChatGPT, and Rania shared with us why it is so difficult to actually be transparent and fair, and even detect biases in these enormous models. I'm sure there will be plenty of things you can take away from this episode. Let's dive in.

[music]

Alexandra: Welcome, Rania. It's so nice to finally have this recording together with you. We tried to schedule it several times and now it's finally happening. I can't wait to talk about AI transparency with you, but before we dive in, can you maybe briefly introduce yourself to our listeners and also share what makes you so passionate about the work that you do?

Rania Wazir: Hello, Alexandra. It's a great pleasure to be here. Thanks a lot for inviting me. Yes, I'm very happy that we finally managed to match our schedules and have this chat together. Let me tell you briefly a little bit about myself. I come from a mathematics background, and it's been six, seven years that I've been working in the area of data science. I started actually working with the goal of doing AI for good, trying to bring the tools of data science and AI to NGOs and civil society organizations, and from there I quickly came to the realization that we're trying to do something good, but we actually don't have a way of measuring or deciding if our product even works or if it has a good impact or not. As the saying goes in English, the road to hell is paved with good intentions. I might start out with a good intention, but how do I know that it actually ended up doing something good? This led me into the area of what is now called trustworthy AI, and from there also to working with standards, and exactly trying to figure out, well, are there best practices? Are there standards so that we can say, "I've produced this AI product and it actually works according to some specification"?

Alexandra: That's very, very exciting. Talking about transparency, our listeners have heard about AI fairness in many of the previous episodes. We've talked a little bit about explainability, but you're actually my first guest to dive deeper into AI transparency requirements. Can you give us an introduction to that and also share why this is actually something that organizations should aspire to satisfy and have whenever they use AI?

Rania: Yes. I feel like transparency is something that lies as a foundational block if you want to talk about AI and its being trustworthy. In fact, with my startup, we had a series of workshops last year, where we asked people in very different branches, working either as developers or actually users of AI, "What do you expect? What do you think of when you think of an AI system that you can trust?" Practically across the board, people chose all sorts of different characteristics, but everybody agreed that they wanted transparency. Now, of course, transparency will mean different things to different people. Let me just take a deep dive and tell you what it means to me when I talk about transparency of an AI system.

Alexandra: Yes, please.

Rania: It can start with something very simple. Tell me what is your AI system supposed to do? Then next, tell me, mathematically maybe, what function did you optimize for? Tell me how did you test your system and how did you choose the criteria according to which you tested your system? You can break this down into very many different criteria. Actually, that's a project we're working on in the international standards, trying to figure out all the different items you can ask about an AI system in order to be transparent about it. Really, the information that is required will depend on who you are and what the AI system is being used for, and even what stage the AI system is at, because let's not forget, even as you're developing an AI system, let's say you're actually developing one model and you need data. The data is going to be part of your system. You want your data provider to tell you all sorts of things about the data so that you use it properly. Transparency is actually not just something that happens at the very end by magic, it's something that accompanies you through the whole life cycle.

Alexandra: Sorry to interrupt. Similar to privacy by design and fairness by design, it's basically something that you should start to think of and take into account right at the beginning because only then you will be able to have meaningful transparency. Also, for example, to the end consumer if it's a consumer-facing application.

Rania: Exactly. Absolutely. This kind of transparency, people think, "Oh, another thing that I have to do." A lot of it is actually just good practice, like how are you going to build a quality product if the data provider didn't provide certain information to the developer and if the developer didn't provide certain important information to the person who's testing it? All of these things you can produce them as you move along, and just because you have the documentation, nobody says you have to disclose everything. It is then up to you to decide depending on who-

Alexandra: What level of information.

Rania: Exactly.

Alexandra: Still, the things that you described earlier with the different steps already went beyond what I oftentimes hear when I talk with regulators, which, as you know, I do a lot. Whenever they speak about transparency, it oftentimes ends at the point, "Well, you should let customers, for example, know that they're affected by AI currently and that it's not a rule-based system or a human behind the interface." Obviously, just providing the context that this is AI is not that useful, particularly if we look a few years into the future where we can expect that many more AI applications will be on the market.

Maybe a comparison would be with magazines, where you have to, as a company, put on this kind of fine print text, "This is an ad or an advertorial but not editorial content," and everybody just doesn't really look out for this anymore, so I can definitely see that this wouldn't go far enough with transparency requirements. Yet from what you described about explaining the functions that you optimized for, where do you draw the line between transparency and explainability of AI systems?

Rania: Obviously, it's a very hard line to draw, but let's just say it turns out transparency is easier to define than explainability.

Alexandra: Oh, it is.

Rania: [laughs] If I'm being transparent, I'm actually disclosing information about my system. I'm disclosing information about maybe how it was built, what its composition is, the processes surrounding the development of the whole system, whereas explainability is actually asking something, in my view, quite different, and it's saying, well, let's say I have an AI system in operation and it's being used to make a prediction, well, which factors did it use in order to come up with this decision and how did it combine these factors in order to come to this decision?

If I think, for example, of a physics principle, transparency would say, what are all the experiments, and what is my background research that led me to hypothesize this particular principle? Then the explainability would be actually writing down the math formula and telling you exactly how you derived a certain conclusion.

Alexandra: Understood, but still, a challenge then, of course, would be that with explainability in AI, you want to get away from the math and actually come up with something that's more digestible for the end consumer, but I completely get your point. Could one say that transparency is more of the meta-level information of how you approach developing it, what you're set out to do, and explainability is more the inner working of how the system actually makes decisions, how certain results can come into existence or something like that?

Rania: I see explainability more as a focused look at the outcome from an AI system. On the one hand, you can have explainability towards a developer or an expert who might want to understand the inner workings of the AI system; what activation functions were triggered? What did my weights look like in order to arrive at this particular outcome? Explainability from a user perspective would instead say, well, how did this decision arise, based on which factors?

The way I differentiate this from transparency is that transparency is actually about the whole system. How did the system even come to be? What is the place of this system inside its context? Let's not forget, when you take an AI system, it's not just operating in a vacuum, it will be interacting with an environment. What is its position in this environment? What impact does it have? What does the environment actually do to impact the AI system? To answer those questions, you don't want explainability, you want transparency.

Alexandra: Understood.

Rania: Obviously--

Alexandra: Still hard to draw the line, but--

Rania: -it's still hard to draw the line, and on purpose, I think the two concepts go together, but in an oversimplified way, this is how I keep them separate.

Alexandra: Makes sense. You mentioned that the standards work is still in process and it's not completed yet, but do you have any best practices for transparency or any example that comes to your mind of, let's say, any consumer-facing AI solution where you had the feeling, "Wow, that's actually good in terms of transparency"? Anything that you can share with us to make it more tangible for the listeners?

Rania: As you said, the standards are still in development, but the standards build on practices and actually information that's coming out of industry and out of research. For example, I've been involved in some research projects where we've tried to be transparent mostly about the data that we use in the research project, and for that, there are things like the data sheets for datasets, and especially in the field of NLP, which is where I've been working, there are also things like nutrition labels and particular labeling methods for NLP datasets.

Alexandra: May I ask about nutrition labels for language datasets?

Rania: I'm so terrible with names of articles. I might be pulling a ChatGPT thing and inventing a new name out of-

[laughter]

Alexandra: Out of the blue. Go for it.

Rania: -out of the blue. Rania the bullshit generator. Basically, what all of these labeling methods for datasets are trying to do is to say: your data doesn't come out of nowhere, so place it. Say who collected it. Why did you collect it? Who paid for collecting it? Which population did you go and collect the data from? Does it contain personal data? Did you then get the permissions that you need to get in order to use this? Then also explain how did you process the data? Did you combine any features?

Then, of course, in a lot of contexts, you also need to label the data, so who did the labeling? What instructions maybe did they follow when they were labeling the data? Then how did you check the quality of the labeling? Because when humans label things, we don't always label them exactly the same way, which is something that's actually great. That's part of the diversity of humans, but it doesn't work well for machines. You need to find a way to accommodate also this.

When you don't document this, you're pretending that there was only one way to collect the data and that there was only one way to do the labeling, and that's just not true. Including this extra layer of information helps you decide down the line, is this a good dataset for me to use in this context, or can I combine this dataset with this other dataset because I need more information?
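For readers who want to picture what such dataset documentation could look like in practice, here is a minimal, hypothetical sketch of a datasheet-style record covering the questions Rania lists: who collected the data and why, who paid for it, which population it came from, whether it contains personal data and consent was obtained, how it was processed, and how the labeling was checked. The field names are illustrative assumptions, not an official "datasheets for datasets" schema.

```python
# A minimal sketch of a datasheet-style provenance record; field names are
# illustrative assumptions, not a standardized schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetDatasheet:
    name: str
    purpose: str                    # why the data was collected
    collected_by: str               # who collected it
    funded_by: str                  # who paid for the collection
    source_population: str          # which population the data was drawn from
    contains_personal_data: bool
    consent_obtained: bool          # permissions needed to use personal data
    processing_steps: List[str] = field(default_factory=list)   # e.g. combined features
    labeling_instructions: str = ""                              # guidelines annotators followed
    labelers: str = ""                                            # who did the labeling
    label_quality_checks: List[str] = field(default_factory=list)  # e.g. inter-annotator agreement

# Example: documenting a hypothetical NLP dataset before reuse
sheet = DatasetDatasheet(
    name="support-tickets-2022",
    purpose="Train a complaint-routing classifier",
    collected_by="Customer service team",
    funded_by="Internal product budget",
    source_population="Customers of the EU web shop, 2021-2022",
    contains_personal_data=True,
    consent_obtained=True,
    processing_steps=["removed names and emails", "merged urgency and severity fields"],
    labeling_instructions="Label each ticket with one of five complaint categories",
    labelers="Three in-house annotators",
    label_quality_checks=["Cohen's kappa per annotator pair", "adjudication of disagreements"],
)
print(sheet.source_population)
```

Richer formats such as datasheets for datasets or model cards define these fields more formally; the point is simply that this kind of record travels with the data so that whoever reuses it downstream can judge whether it fits their context.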

Alexandra: What I'm still wondering is, obviously, it would be interesting to know how the data was collected and which decisions were made in the stage of data collection, but particularly as an end consumer, wouldn't it be too overwhelming to have this level of detail provided? How can you be transparent to the fullest and as much as you would like to, but on the other hand, not overwhelm the recipient with this information? Because I would assume that a consumer might be more interested in, "I hope you did everything you had to do in data collection in a trustworthy manner," but then also in how this system is actually interconnected with other systems within your organization.

Let's take a practical example, credit lending or something like that. Is the AI system making the sole decision about whether I should get a loan, or what does this whole process of approvals look like? Where is the AI system built in? How much authority does it have? Two questions in one, sorry for that. How do you deal with the level of information that you've just described, particularly for end consumers who maybe are not even aware that you need to label data? And then what about all this other information that might be even more interesting for end consumers, as opposed to very detailed information about only the data?

Rania: That's a great question, especially considering that we are bombarded with too much information, and sometimes too much information is just as bad as not any information at all. My take on that is you need to target your information. Maybe the disclosures about your data set, they don't need to go to the end consumer, but they do need to go to the developer, they do need to go to an auditor maybe who's going to audit your system. What the end consumer needs to know is that this system was audited or that the information did go to the people who needed it in order to build the final product which is reaching them.

Alexandra: That just makes sense. You're still expanding my horizon because I have always seen transparency more as the end-of-the-line, consumer-facing issue, but as you stated earlier, this is obviously not the case. Okay, got that.

Rania: Well, we all tend to focus on one aspect, but as I said, and you know this also from the privacy perspective, it's not something you're going to slap on at the end of the process. It's really something that you have to develop from the beginning of the product on. If you want to be sure that something is private, you want to be sure people followed procedures all through the production cycle. The same principle applies when you're talking about transparency.

Alexandra: That makes sense, but it still makes it harder for me to conceptually find the lines between not only explainability and transparency, but also fairness and transparency, because for me, this very thorough process and fairness by design, where you documented how you collect the data and looked specifically also for the many ways bias could enter into your datasets, was something that I connect more with fairness and less with transparency. Here maybe, again, a harder question: where do you draw the line between transparency and fairness, or is it all very connected and not really possible to make very stringent distinctions between the different concepts?

Rania: Well, yes, you're just throwing a lot of difficult questions at me here. To me, it's very hard to be fair if you haven't been transparent.

Alexandra: Sure.

Rania: Let's just start from that premise, but you can be transparent without being fair. You can show me how you've made your system, and in showing me how you've made your system, you can tell me, "Well, yes, I saw that my data was biased, but that wasn't a big priority for me, so I didn't do anything about it." That's transparent, but you're not being fair. At that point, it's up to me as the end consumer to decide am I going to use your system anyway or not?

This is how I differentiate them: transparency is about disclosing information, fairness goes a step further and says, "Well, there are particular pieces of information that I want to know about the system and then I want to try to fix certain things which in my view are broken." Then you enter into a whole other huge problem, which is trying to decide, well, what is fair? Because fair for me is not going to be the same thing as fair for you and for lots of other people.

Alexandra: Agreed with that, and this actually also came to my mind when you, at the beginning of our episode, mentioned the experiment you did, where you asked different participants to define their requirements of transparency, and I was just wondering, "How many different views can we get there?" Then we have the different fairness views. Definitely all not very easy, but thanks for the explanation.

I think now it makes sense to me that transparency is more of this base layer of information that needs to be there, and then fairness is some additional step which goes into specific aspects of the data, and maybe, if I dare to go in this direction, explainability then would be a very specific explanation of why one specific result, prediction, or classification happened, and not, in general, the overall workings of the system and how it's designed. Is this something you could, more or less, sign off on without being afraid to get into muddy discussions? Of course, this latter point might be hard to achieve.

Rania: You summarized this much better than I managed to do. Just be warned, there are some explainability experts who would take issue with that.

Alexandra: I would be surprised if there were not some experts in any field who have a completely different opinion on most of the aspects that we tend to talk about, but I think we can leave it like that. Still, since you started with the data when asked for an example of transparency best practice, do you maybe have something which is more end-of-the-line, towards the consumer, where you have the feeling, "This organization, this business did a good job in being transparent about their system"? Anything that comes to your mind here?

Rania: Whew, okay.

Alexandra: If not, this will be a call to action for our listeners: if they have a nice example, they should send it to us in the comments or by email.

Rania: Yes. For a company, I am not sure, but there are certain cooperatives or open source communities which I think do a good job. The one that comes immediately to mind is Hugging Face, for example, which does disclose all the data they use, how they produce models, all the things that went into creating a particular model, and also their whole organizational structure, and they keep working on it. That, to me, would be a best practice.

Alexandra: A good place to check out.

Rania: Yes.

Alexandra: This sounds good.

Rania: Although, I do understand they're a not-for-profit organization and there are certain things that are much harder to do if you're for-profit, but let's just say that that would be my ideal.

Alexandra: That actually makes me curious where you see these challenges of transparency versus for-profit organizations. My suspicion would be that disclosing your code or something like that is usually something organizations and businesses don't like to do. Anything else where you feel this could be a challenge to how transparent businesses actually want to be?

Rania: Yes, there are lots of challenges to how transparent businesses want to be. Number one, of course, they may not want to disclose their code or even the idea behind their code because of trade secrets and because they are worried that this will harm their competitiveness on the market. Fundamentally, there are other things that are uncomfortable for an organization to disclose, like their whole organizational procedures, their risk management procedures, also their quality management. Fundamentally, if it's not required by law and it takes so much effort to do, you probably-

Alexandra: Then the likelihood of--

Rania: -won't do it, but do they really want to admit that they're not doing it? Probably not. In some sense, that's our own fault. If we're not going to require it of them, then we shouldn't be surprised that they cut the costs and don't do it.

Alexandra: That makes sense, which actually brings me to talking about the AI Act and other laws. How do you see transparency being shaped by the new AI law, and particularly also by the standards work which should accompany the AI Act? Are we going in a direction where organizations need to be very transparent, or do you feel that the requirements are more on the lower end of the spectrum?

Rania: Well, this is hard to say right now. If we rely only on the AI Act, a lot of it is still up in the air. Beyond that, the AI Act fundamentally is asking only for self-assessments. It doesn't ask for certifications, except in very special circumstances. We should also not forget that there are other forces at play and there are also other regulations that are coming in which might actually force greater transparency.

This can be things like ESG, so companies will be required to disclose their environmental impact and social impact, and eventually, this will also mean disclosing this about their AI systems. Then some of the liability directives might, let's say, encourage certain AI system providers to be transparent about their products to reduce the likelihood of litigation.

Alexandra: Understood.

Rania: Finally, there are people and civil society organizations who are acting as watchdogs and pointing out failures of AI systems, and the reputation loss might push certain organizations to be more transparent about how they produce things.

Alexandra: Sure. We also saw with privacy that some organizations used it as a unique selling proposition or even something to shift their positioning in this direction. Maybe this is also something that we are going to see.

Rania: Let's be optimistic and say yes.

[laughter]

Alexandra: Good to hear that you're optimistic about this. Maybe one other thing, since I know how important the role of standards is in the context of the new AI regulation and the laws that the European Union puts forward: what's your point of view on this? Are we quick enough with our standards? What's currently happening? What needs to change? Anything that you find worthwhile sharing with us about standards?

Rania: Let's just say, I'm very active in the ISO communities which are the communities that are producing the standards which will hopefully then support the AI Act. It's definitely a push to be fast enough in order to provide, let's say, all the proper standards and examples and requirements that we will need in order to make a regulation actually functional.

I think it's an important push and I'm hopeful that we will come up with a good, let's say, framework, and that we will continue to work on this to always improve it. Having the regulation then in place will be the extra motivation we need in order to keep up with the technology so that we can make sure that the standards that we've got really fit the state of technology that we have.

Alexandra: Sure. It's not set and forget, you always have to iterate and watch out.

Rania: I don't want to pretend that it's easy. Let's not forget that the technology is developing really rapidly, that there's actually a lot of resources and money going into this development, and that standards are obviously lagging behind, so you're always playing catch-up.

Alexandra: Sure. That doesn't make it easy to begin with. We talked a lot now about transparency and I found it super insightful because, as mentioned, for me, transparency was more along the lines of what many politicians feel, okay, just stating that AI is at play here, of course with a little bit more detail, but definitely something more at the end of the process and not something that runs throughout the whole development and deployment life cycle as a baseline layer. But you are also an expert when it comes to fairness.

I would love to talk about the fAIr by design project with you and maybe also a little bit of fairness considerations with large language models. We touched them a little bit. What is the fAIr by design project and how did this all start and also lead to you founding your company?

Rania: Wow. First of all, the fAIr by design project is a project that's funded by the FFG, and now I don't know what is called in English, actually the FFG.

Alexandra: No worries. An Austrian entity that's giving out grants for research and businesses.

Rania: I think it's a really cool project also because we are a consortium and we really get to work with very different kinds of people. We have the data scientists, and then we have open innovation and social science, then we have the universities, at one time the Technical University, and then also law expertise. Then we have four different use cases, so everything from a startup to a mid-size company to an NGO, and with very different kinds of use cases, from health to transportation to actually public radio and working with text, to algorithms that some people would not call AI, but they're definitely very complex mathematical optimization functions.

It's really a very broad scope, first of all, of perspectives, backgrounds, but also different kinds of technologies and different application fields. Our goal in the research project is to look at, "Well, let's take these projects and start looking at them from the very beginning, from the inception, and try to figure out what do I need to do during each different phase in order to ensure that my final product will be fair?" Obviously, this means each time you have to stop and ask yourself, "What does it mean to be fair in this situation?"

Even developing methods and tools to help you figure out, "For my company, for my values, for this context, what exactly does it mean to be fair?" and to be able to translate that into transparency about, "I'm developing my system and this is how I've defined fair, and now you can observe all the different steps that I've made, and you can even check that I followed my definition of fair." All of this turns out to be relatively new ground. Even though in each use case you'll find different elements that have to be emphasized, there are still procedures that turn out to be the same across the board.

Alexandra: That's actually interesting. Any procedures that you know are probably applicable in many AI scenarios where fairness is one objective?

Rania: Yes. One of them would be transparency, just disclosing your decisions and why you made them. Another one is the fact of sitting down and looking at, "Well, who are my stakeholders here?" It's amazing how many people start developing an AI system without trying to figure out who's going to use this and who's going to be impacted by this? These are fundamental questions that actually, in my view, go beyond fairness. It's about how good is your product? What's the quality of your product?

This kind of stakeholder analysis and stakeholder involvement are also things that you have to do regardless of the context and regardless of the particular technology you're going to use. Then as you move along, there are certain tests that you can run, obviously, that will help you decide if something is fair or not, and exactly which one will depend, of course, on the technology you've used and the particular application context. Knowing that you have to do this at this stage turns out to be something that's overarching, and it doesn't matter [crosstalk] which AI system you're developing.
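As an illustration of the kind of test Rania mentions, here is a minimal, hypothetical sketch of one widely used group fairness check: comparing selection rates across groups, the idea behind demographic parity and the "80% rule" for disparate impact. The predictions, group labels, and threshold are made up for illustration, and which metric is appropriate really does depend on the application context, as she notes.

```python
# A minimal sketch of a selection-rate (demographic parity / disparate impact)
# check on model outputs; data and threshold below are illustrative assumptions.
from collections import defaultdict

def selection_rates(predictions, groups):
    """Share of positive predictions per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(predictions, groups):
    """Ratio of the lowest to the highest group selection rate (1.0 = parity)."""
    rates = selection_rates(predictions, groups)
    return min(rates.values()) / max(rates.values())

# Example: loan approvals (1 = approved) for two hypothetical groups
preds  = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(selection_rates(preds, groups))        # {'A': 0.8, 'B': 0.4}
print(disparate_impact_ratio(preds, groups)) # 0.5 -> below the common 0.8 rule of thumb
```

A check like this is only one piece of the picture: it says nothing about error rates, individual fairness, or whether the groups were defined sensibly, which is exactly why the stakeholder analysis has to come first.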

Alexandra: This makes sense. In one of our earlier conversations you mentioned that this project also led to you founding your company. How did this happen and what is your current company about?

Rania: Actually, the consortium lead is a company called Winnovation. The founder of Winnovation is also my co-founder for Leiwand. It actually arose from our realization that when you want to create a good AI product, one that is worthy of being trusted, it's not enough to have only the technical expertise, you need to look at the whole system holistically and you do need to be able to understand organizational practices and social impact, and so you need the social science perspective as well.

Alexandra: Sure.

Rania: We have that and we also found that we work together really well, which is very important.

Alexandra: Absolutely.

Rania: So many projects fail not because people don't have the right qualifications, but because they don't get along on a personal level. That's how Leiwand was born. We complement each other on the expertise and also on a personal level. We all actually just have fun doing this, which is also so important.

Alexandra: That should never be forgotten, the fun. Are you focusing then on helping organizations build up the organizational measures in conjunction with the technical aspects when it comes to fairness and transparency, or do you focus on one of the aspects?

Rania: We're actually focused on trustworthiness.

Alexandra: Trustworthiness, so more encompassing.

Rania: Yes. Obviously, trustworthiness is a huge area, and so from that, we have selected fairness, and because you can't really do fairness without being transparent, fairness and transparency. We also look at different aspects as well. Sometimes in addition you'll need explainability because in the particular context that the system is going to be deployed, you can't really be fair without explaining decisions to either the users of the system or the people who are going to be impacted.

Alexandra: Do you have a concrete example where this would be the case?

Rania: Well, let's just say if you're working, for example, in the HR field or in the medical field, important decisions are being made about people that can impact their health or their future career opportunities. In those situations, if you want to be fair, you also really need to be able to explain to people how a decision was made. In such situations, the explainability is just as important.

Actually, when you're, for example, testing, and this is even a third use case: if you're looking at NLP, and this is something we've done in a separate project, looking at bias in NLP systems. NLP systems, especially those built on language models, are so huge that it's really hard to get a grip on what's going on, even-

Alexandra: And what's going in?

Rania: Yes. Explainability methods actually help you to detect which words, which particular features are being used in making a decision. That feeds into, again, the whole fairness testing. [crosstalk] That's even a third situation, which is actually based on the technology. Since the technology uses so much data, you use explainability to help you test for fairness there.

Alexandra: It's truly a fascinating field, and therefore, I can completely understand that you described working in this field is so much fun. Maybe before we come to an end, just very briefly on large language models and natural language processing, can you outline for our listeners why it's so challenging to identify biases or what some of the problems are that come with them?

Rania: Well, let's just say, you start with the fact that the way natural language processing is done nowadays is based not just on deep learning, which per se is hard to understand what is going on, but it's humongous. [chuckles] It's incredibly large data sets and incredibly large architectures. We have great difficulty understanding what kind of data went in, we've great difficulty understanding what exactly is going on inside the transformer architectures that are being used, and now with the chatbots, you've put on top of that reinforcement learning that's teaching the system even to have conversations.

You're layering one opaque thing on top of another that just creates more and more confusion, makes it difficult to track what's going on. Then when we go in and you look for bias, well, so far our techniques for finding bias have been based on simpler models. These simple methods will detect one particular kind of bias and then you can say, "Okay, well, I'll fine-tune this huge system to take care of this one thing that you've detected," but maybe all you've done is just hidden the bias so that your traditional detection methods won't find it anymore but it's still there.

There's this very nice paper called Lipstick on a Pig. They went and looked, but this was back in the day of word embeddings. Word embeddings had bias and people came and said, "Oh, but we can fix it." Then this paper came along and said, "Well, no, actually, you didn't fix it, you just hid it." We are still doing the same thing but on a bigger scale.

Alexandra: Can you remind me? I can't remember the paper in detail. What was the explanation or the examples that were given in this? How was the bias hidden and not properly removed? Or any other example, it doesn't have to be exactly from the paper.

Rania: I'm afraid I can't give you the exact examples. Basically, it showed that the discriminatory associations were still there, but they weren't the ones that are detected by simple analogy, which was what the word embedding mitigations were intended to fix. They were trying to fix the analogies that the word embeddings--

Alexandra: Like the classical "man relates to doctor as woman relates to nurse," something like that.

Rania: Exactly. Yes, and by the way, there are chatbots other than ChatGPT where you can try that out. I sent a request in, "Daniel is to secretary as Sue is to what?" and I got as an answer, "Dan is to executive as Sue is to secretary." It's so stuck on the prejudice that it even ignored that I told it that Daniel was the secretary.
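To make this kind of analogy probe concrete, here is a minimal sketch of the classic word-embedding analogy test ("man is to doctor as woman is to X") done with vector arithmetic and cosine similarity. The tiny vectors are invented for illustration rather than taken from any real model, and as the Lipstick on a Pig result suggests, a system can be adjusted to pass such simple probes while the underlying associations remain.

```python
# A minimal sketch of an analogy-based bias probe for word embeddings;
# the toy vectors below are made-up illustrations, not real embeddings.
import numpy as np

emb = {
    "man":    np.array([0.9, 0.1, 0.3]),
    "woman":  np.array([0.1, 0.9, 0.3]),
    "doctor": np.array([0.8, 0.2, 0.7]),
    "nurse":  np.array([0.2, 0.8, 0.7]),
    "pilot":  np.array([0.7, 0.3, 0.6]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vocab):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the inputs."""
    target = vocab[b] - vocab[a] + vocab[c]
    candidates = {w: cosine(target, v) for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=candidates.get)

# "man is to doctor as woman is to ...?"
print(analogy("man", "doctor", "woman", emb))  # with these toy vectors: 'nurse'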

Alexandra: This is why Forbes described it as mansplaining as a service. [chuckles]

Rania: Yes. [laughs] It's a little bit unfair also to say that the system is biased or sexist, it's trained on our data, we are biased.

Alexandra: Sure.

Rania: We taught it.

Alexandra: I think it's just this level of awareness now with ChatGPT, which is also more widely used by the general public. If people are not aware of the biases that went into the system and how this could impact the output that they are digesting, this maybe doesn't help us work our way away from these historical injustices towards a more just and non-discriminatory future. Definitely a challenge.

Rania: Absolutely. Who knows? Maybe that will be the good thing of all of these large language models, that we finally face up to the fact that we do have these biases, they're still here with us, and maybe we need to do something.

Alexandra: Do something about it. That's actually my optimism, even though it's not the state today, but I think that AI definitely forces us to lead these fairness conversations at a level of granularity that we as a society just were never used to, and that maybe many years in the future, we will be at a point where we can actually be more fair with AI outcomes as well and counteract all these historic injustices and biases in the data. But I think we need to continue this episode in a few years and see where we actually are.

Since we're running out of time, Rania, maybe before we end, are there any last call to actions, wishes, recommendations for our listeners? Either the data scientists or the business folks, what would you want them to keep in mind or start doing when it's about transparency and fairness by design?

Rania: Well, certainly, I encourage everyone to start thinking about transparency and trying to document. Actually, think about, "How much can I disclose? How much access can I give?" Because, to me, transparency is not only about documentation, but about allowing people to look inside your system. To think of it as not an imposition, but rather as something that will help you and your product become better and will set you apart from the rest of the field. Fundamentally, not all of it is complicated and hard, there are some very simple steps, baby steps-

Alexandra: That's a refreshing view, an uplifting view at the end.

Rania: -to get started. Yes.

[laughter]

Alexandra: Perfect. I think I definitely want to end on that note. Thank you so much for being with us today, Rania. You definitely expanded my horizon on transparency. I'm sure everybody took away a lot from today's episode, so many, many thanks.

Rania: Thank you, Alexandra. It's been a great pleasure. Thank you.

[music]

Alexandra: I hope you enjoyed today's episode as much as I did. Rania definitely changed my perspective on what transparency in AI actually means. We'll be back in two weeks with the Data Democratization Podcast. Next up is an episode on the future of synthetic data. In the meantime, if you have questions, comments, or remarks about today's episode, we're always happy to hear from you. Just reach out to us via LinkedIn or by writing us a short email at podcast@mostly.ai. Until then, see you in two weeks.

Ready to try synthetic data generation?

The best way to learn about synthetic data is to experiment with synthetic data generation. Try it for free or get in touch with our sales team for a demo.