You may remember that back in 2015, Google was called out for its photo app that mistakenly labeled pictures of people with darker skin color as “gorillas”. As you can imagine, it was a PR disaster at the time. Of course, the company publicly apologized, said that such a result was unacceptable, and promised to fix the mistake. But apparently – as Wired uncovered three and a half years later – they never got around to truly fixing the underlying issue. Instead, they implemented a workaround: they blocked their AI from identifying gorillas (and some other primates) altogether to prevent another miscategorization. A Google spokesperson confirmed to Wired that certain image categories were blocked after the 2015 incident and remained so, and added that “Image labeling technology is still early and unfortunately it’s nowhere near perfect”. But what does this mean for other companies and their chances of successfully fighting bias in AI, if one of the biggest tech companies – which employs some of the brightest AI experts – was not able to come up with a better solution? Firstly, it proves the point that mitigating bias in machine learning models is extremely difficult. Secondly, it raises the question of whether Google really was not capable of resolving the issue, or whether it simply was not willing to dedicate the necessary resources. But before we look into how companies could tackle bias in Artificial Intelligence (and whether AI regulations could be the motivational factor to do so), let us start with a (non-exhaustive) list of reasons why algorithms are biased.

Reason #1: Insufficient Training Data

As mentioned in part 1 of our Fairness Series, a major cause of bias in AI is that not enough training data is collected – or, more precisely, that only limited data is available for certain demographic groups or for groups with unusual characteristics. The consequences of insufficiently diverse data can easily be observed in facial recognition technology. A study showed that models performed significantly better on pictures of white males (99% accuracy) than on pictures of black females (65% accuracy), because the majority of the images used for training showed white men.
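
Disparities like this are straightforward to surface once predictions are broken down by group. Below is a minimal sketch of such a disaggregated accuracy check – the data and the `accuracy_by_group` helper are toy placeholders, not taken from the study mentioned above:

```python
# Minimal sketch: compute accuracy separately for each demographic group.
# The arrays below are made-up toy data for illustration only.
import numpy as np

def accuracy_by_group(y_true, y_pred, group):
    """Return the model's accuracy for each group separately."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    return {g: float((y_pred[group == g] == y_true[group == g]).mean())
            for g in np.unique(group)}

y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(accuracy_by_group(y_true, y_pred, group))  # {'A': 1.0, 'B': 0.5}
```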

Reason #2: Humans Are Biased – And So Is The Data That AI Is Trained On

Whether we like it or not, as humans we all carry (un)conscious biases, and they are reflected in the data we collect about our world. As this is the exact same data that is used to train AI models, it is not surprising that these biases find their way into algorithms. Imagine a hiring algorithm that is trained on existing U.S. employment data. Last year, women accounted for only 5% of CEOs in the top 500 companies, and they held significantly fewer senior management positions than their male co-workers. What would this mean for the algorithm? Quite likely, it would pick up that being female correlates poorly with being a CEO. And if hiring managers were to look for the ideal candidate to fill an open senior management position, the system would probably surface mainly the resumes of male applicants. Another common problem with human bias occurs in supervised machine learning, where humans often label the data that is used to train a model. Even if they are well-intentioned and mean no harm, their unconscious biases can sneak into the training sample.
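
The mechanism can be illustrated with a small, entirely made-up simulation: if the historical labels are skewed against one group, even a simple model will pick up group membership (or a proxy for it) as a predictive signal. A minimal sketch, assuming scikit-learn is available:

```python
# Toy illustration (synthetic data): a classifier trained on biased
# historical hiring labels learns to use gender as a predictive signal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
gender = rng.integers(0, 2, n)   # 0 = female, 1 = male
skill = rng.normal(size=n)       # identically distributed for both groups
# Biased historical outcome: men were hired far more often at equal skill.
hired = (skill + 2.0 * gender + rng.normal(scale=0.5, size=n) > 1.5).astype(int)

model = LogisticRegression().fit(np.column_stack([gender, skill]), hired)
print("learned weight on gender:", model.coef_[0][0])  # clearly positive
```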

Reason #3: De-Biasing Data Is Exceptionally Hard To Do

If you want a fair algorithm, but – as we have just established – historical data is biased, why not simply clean the data to make it fair? One approach that has been tried is removing sensitive attributes, for example a person’s race. Unfortunately, research has shown that this does not prevent models from becoming biased. Why? Because correlated attributes can act as proxies. Think of a neighborhood that is known to be home to predominantly black people. Even if the race column were excluded from the training data, the ZIP code of this neighborhood would serve as a proxy indicating a person’s race. It has been shown that even with sensitive columns removed, proxies enabled systematic discrimination against minorities – for example, the denial of bank loans or of access to Amazon’s same-day delivery option. To counteract this, some researchers advise keeping the sensitive columns in the dataset, as they can serve as a more straightforward lever to mitigate data bias. For example, if you aim for a model that treats males and females equally, you can use the gender column to directly monitor and correct potential violations of your desired equality criteria during model training.
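
As a sketch of that last point: if the gender column is kept alongside the data (without necessarily being fed to the model as an input), a simple measure such as the demographic parity gap can be monitored during training and evaluation. The helper below is hypothetical and deliberately minimal, not taken from any specific library:

```python
# Minimal sketch: use a retained sensitive column to monitor the
# demographic parity gap (difference in positive-prediction rates).
import numpy as np

def demographic_parity_gap(y_pred, sensitive):
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    groups = np.unique(sensitive)
    assert len(groups) == 2, "sketch assumes a binary sensitive attribute"
    return abs(y_pred[sensitive == groups[0]].mean()
               - y_pred[sensitive == groups[1]].mean())

# Toy predictions: 60% positive for males vs. 20% for females.
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
gender = ["m"] * 5 + ["f"] * 5
print(demographic_parity_gap(y_pred, gender))  # 0.4 -> a gap worth correcting
```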

Reason #4: De-Biasing AI Models Is Very Difficult Too

There is a multitude of reasons why it is difficult to develop a machine learning model that is free of bias. One aspect is that many decisions involved in the construction of a model can potentially introduce bias – and their downstream impacts often do not become apparent until much later. For example, the choices AI researchers made about how speech was analyzed and modeled led to a speech recognition algorithm that performed significantly worse for female speakers than for male ones. Another aspect that has been criticized is that common practices in deep learning are not designed to help with bias detection. Even though models are usually tested before they are deployed, this testing typically happens on a holdout sample drawn from the training dataset. While this certainly helps to evaluate an algorithm’s accuracy, it does not help to detect bias, since the test data is just as biased as the training data. Lastly, building an unbiased model requires expert knowledge that not every AI engineer may have obtained (yet) – an additional point of concern as more and more “off the shelf” algorithms become available that can be used by non-experts.
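
The problem with a plain holdout score can be illustrated with a few made-up numbers: the overall accuracy may look acceptable while one subgroup does much worse, which is exactly what an aggregate test on equally biased data will hide. A minimal pandas sketch with hypothetical results:

```python
# Toy holdout results for a hypothetical speech recognizer: the overall
# score hides a much lower accuracy for the underrepresented group.
import pandas as pd

results = pd.DataFrame({
    "speaker_gender": ["male"] * 8 + ["female"] * 2,   # imbalanced holdout
    "correct":        [1, 1, 1, 1, 1, 1, 1, 0, 0, 1],  # 1 = correct transcription
})

print("overall accuracy:", results["correct"].mean())        # 0.8
print(results.groupby("speaker_gender")["correct"].mean())   # female 0.5, male 0.875
```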

Reason #5: Diversity Amongst AI Professionals Is Not As High As It Should Be

Lack of diversity is another contributing factor to bias in AI: at Facebook and Google, less than 2% of technical roles are held by employees with darker skin color – and women account for only 22% of AI professionals globally. A famous example of why diversity helps to mitigate bias comes from Joy Buolamwini, founder of the Algorithmic Justice League and graduate researcher at the MIT Media Lab. When the Ghanaian-American computer scientist joined her research group, she discovered that facial recognition tools performed poorly on her darker skin tone – and sometimes only worked if she wore a white mask. Another quite peculiar incident that can be attributed to a lack of diversity happened to a South Korean woman who was sleeping on her floor when a robotic vacuum cleaner “attacked” and ingested her hair. Rest assured that firefighters managed to rescue her (minus about 10 strands of hair). Had the product development team consisted of a more diverse group of people with different cultural backgrounds, somebody might have asked whether all future users actually sleep in beds. But if something is not part of a person’s own reality, it is hard to consider it and ask the necessary questions.

Reason #6: Fairness Comes At A Cost (That Companies May Not Be Willing To Pay)

Depending on what is most important to a company developing an AI algorithm, it could optimize the model to maximize profit, revenue, or the number of customers. Whatever it decides on, its main objective will be to maximize the model’s accuracy with respect to that goal. But what happens if the company also wants a fair model? Then the model is forced to balance two conflicting objectives, and achieving fairness inevitably comes at the cost of maximum accuracy. Since companies in our economy tend to optimize for profit, it is questionable how many businesses would voluntarily take the path of fairness – or whether regulations would be required to “persuade” them.
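
One common way to formalize this trade-off is to add a fairness penalty to the training objective and let a weight decide how much accuracy the model owner is willing to give up. The sketch below is illustrative only – the log-loss and the parity penalty are simple stand-ins, not a prescription for any particular system:

```python
# Minimal sketch of an accuracy/fairness trade-off: log-loss plus a
# weighted demographic parity penalty. lam = 0 is pure accuracy
# optimization; larger lam trades accuracy for a smaller gap.
import numpy as np

def penalized_objective(y_true, y_prob, sensitive, lam=1.0):
    y_true, y_prob, sensitive = map(np.asarray, (y_true, y_prob, sensitive))
    eps = 1e-12
    # standard log-loss term (the accuracy objective)
    log_loss = -np.mean(y_true * np.log(y_prob + eps)
                        + (1 - y_true) * np.log(1 - y_prob + eps))
    # fairness penalty: gap in mean predicted score between two groups
    groups = np.unique(sensitive)  # sketch assumes exactly two groups
    gap = abs(y_prob[sensitive == groups[0]].mean()
              - y_prob[sensitive == groups[1]].mean())
    return log_loss + lam * gap
```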

Reason #7: External Audits Could Help – If Privacy Were Not An Issue

Especially where AI applications are used in high-stakes environments, there have been calls for external audits to systematically vet algorithms for potential biases. This may be an excellent idea – if privacy were not an issue. To thoroughly evaluate an algorithm, auditors would benefit from access not only to the model but also to its training data. But if a company shared the privacy-sensitive customer data it used to develop its model, it would quickly come into conflict with GDPR, CCPA, and other privacy regulations. However, synthetic data – a new approach to big data anonymization – could provide a solution. Synthetic data tools generate fully anonymous, yet realistic and representative datasets. Their high accuracy enables an organization to train its machine learning models directly on the synthetic data, while the strong privacy protection allows synthetic datasets to be shared externally with auditors without infringing on people’s privacy.

Reason #8: Fairness Is Hard To Define

In the 1970s, only 5% of musicians in the top five orchestras were female. Blind auditions increased the share of women to 30%, which is certainly an improvement – but many people would agree that this is not yet fair. It is much harder, however, to reach an agreement about what would be fair. Should 50% of the orchestra be women, because roughly half of the world’s population is female? Or would it be fairer if the same percentage of female and male applicants were accepted – say, 20% each? With an orchestra of roughly 100 full-time musicians and, say, 200 female and 300 male applicants, that would mean 40 seats going to women and 60 to men. Others might argue that, due to centuries of injustice (and the overrepresentation of male musicians in orchestras), employing significantly more women would be fairest. As you can see, it is pretty hard to define fairness. One reason is that different people hold different values. Another is that there are so many ways to define fairness – in general as well as mathematically (Arvind Narayanan, an associate professor of computer science at Princeton, even compiled an astonishing list of 21 fairness definitions).
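
To make the two competing notions from the orchestra example concrete, here is a tiny calculation using the hypothetical numbers above (100 seats, 200 female and 300 male applicants), showing that demographic parity on outcomes and equal acceptance rates point to different seat allocations:

```python
# Toy comparison of two fairness criteria for the orchestra example.
applicants = {"female": 200, "male": 300}
seats = 100

# Criterion 1: demographic parity on outcomes -> split the seats 50/50.
parity_seats = {g: seats // 2 for g in applicants}

# Criterion 2: equal acceptance rate for every group -> 100/500 = 20% each.
rate = seats / sum(applicants.values())
equal_rate_seats = {g: int(n * rate) for g, n in applicants.items()}

print(parity_seats)      # {'female': 50, 'male': 50}
print(equal_rate_seats)  # {'female': 40, 'male': 60}
```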

Reason #9: What Was Fair Yesterday Can Be Biased Tomorrow

Do you remember Microsoft’s “Tay”? The innocent AI chatbot started as a harmless experiment and was intended to learn from conversations with Twitter users – which it did (but probably not as imagined). In less than a day, Tay became misogynistic and racist:
Fig. 1: Microsoft’s AI chatbot “Tay” became racist and misogynistic in less than a day.
Microsoft quickly took Tay off the web. What remained is a cautionary example: even if you take measures to mitigate bias during the initial training phase, many algorithms are designed to learn continuously and are therefore vulnerable to becoming biased over time. (An issue where even external audits would be stretched to their limits if they are not designed for constant monitoring.)
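
In practice, this means fairness checks cannot remain a one-off, pre-deployment exercise; the same metric has to be recomputed on recent live predictions. A minimal sketch of such a recurring check – the metric, window structure, and threshold are arbitrary placeholders:

```python
# Minimal sketch: recompute a bias metric over successive windows of live
# predictions so that drift in a continuously learning system is caught early.
import numpy as np

def positive_rate_gap(y_pred, sensitive):
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    groups = np.unique(sensitive)  # sketch assumes exactly two groups
    return abs(y_pred[sensitive == groups[0]].mean()
               - y_pred[sensitive == groups[1]].mean())

def monitor(windows, threshold=0.1):
    """`windows` is an iterable of (y_pred, sensitive) batches over time."""
    history = []
    for t, (y_pred, sensitive) in enumerate(windows):
        gap = positive_rate_gap(y_pred, sensitive)
        history.append(gap)
        if gap > threshold:
            print(f"window {t}: gap {gap:.2f} exceeds {threshold} -> review/retrain")
    return history
```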

Reason #10: The Vicious Bias Cycle (Biased AI Will Lead To More Bias)

If bias in AI is not successfully addressed, it will perpetuate and potentially even amplify biases in our society. A good example is Google’s search algorithm, which was accused of returning racist images. When users searched for terms like “black hands”, the algorithm predominantly showed pictures of black hands working in the earth and other derogatory depictions. Since more users click on the top search results than on potentially more neutral images that never make it to the first page, the algorithm becomes even more likely to display the biased results – and thus further contributes to perpetuating bias.
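
The feedback loop can be illustrated with a deliberately simplified toy simulation (made-up numbers): whatever is ranked on top collects more clicks, the clicks push it further up, and a small initial skew keeps widening.

```python
# Toy simulation of a click feedback loop amplifying an initial skew.
scores = {"stereotypical result": 0.55, "neutral result": 0.45}

for step in range(10):
    top = max(scores, key=scores.get)
    scores[top] += 0.05                                 # clicks reinforce the top result
    total = sum(scores.values())
    scores = {k: v / total for k, v in scores.items()}  # renormalize to a distribution
    print(step, {k: round(v, 3) for k, v in scores.items()})
# The gap between the two results grows with every iteration.
```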

How To Achieve Fairness In AI?

Precisely answering this question would go beyond the scope of this post. However, a first and very important step would be for society to demand fairness in AI and put it on the agenda of regulators (which would in turn improve the chances that de-biased AI also makes it onto the priority list of conscientious companies). Once the relevant stakeholders have decided that they want anti-discriminatory algorithms, a next step would be to define fairness and establish a shared understanding of which outcomes are considered ethically acceptable. Thirdly, researchers and AI practitioners should continue to collaborate on the development of solutions that reduce AI bias. As diverse training data plays such an important role in the mitigation of bias, it is imperative that companies start to collect data that reflects the full spectrum of human diversity. Additionally, removing bias at its source – namely, in existing datasets – would allow models to be developed without having to worry about bias correction afterwards. Fair Synthetic Data seems particularly promising in this regard, but we will dive deeper into this topic in part 4 of our Fairness Series. Another important aspect is diversity in teams. Achieving it, however, may not be the easiest task, considering not only the gender gap but also the underrepresentation of ethnic minorities in data science. Companies might therefore not be able to tackle this issue alone but will also depend on universities and governments making AI and data science education more attractive and inclusive. At this point, we would love to hear your thoughts and ideas on how fairness in AI could be achieved. Message us anytime for a fun conversation or a friendly debate. Tomorrow, part 3 of our Fairness Series will follow, where we shine the spotlight on the concept and definition of fairness.