Four years ago, the software engineer Jacky Alciné caused a storm by pointing out to Google that its algorithm had the unsavoury tendency to classify his black friends as gorillas. Following a public outcry over the blatant racism, the giant apologised and diligently ‘fixed’ the problem. Last year, Amazon got into hot water after discovering that its advanced AI hiring software heavily favoured men for technical positions. Again, retraction followed the outcry. More dramatically, a faulty Facebook translation got a Palestinian man arrested in Israel by mistranslating the caption on a photo he had posted of himself posing next to a bulldozer: it read ‘Attack them!’ instead of ‘Good morning!’. The man underwent several hours of questioning before the mistake came to light.
But the GAFA aren’t the only ones struggling to navigate the dangers of at-scale AI, and one can easily find a plethora of examples of discriminatory data science. Take the work coming out of the MIT Media Lab, where Joy Buolamwini[i] showed in early 2018 that three of the latest gender-recognition AIs, from IBM, Microsoft and Megvii, could indeed infer a person’s gender from a photograph 99 per cent of the time, as proclaimed… provided the subject was a white man. For dark-skinned women, error rates climbed to nearly 35 per cent. You can imagine the public relations trouble.
One could easily think that this is solely a matter of writing smarter code: better translation algorithms, better image-recognition software and so on. But closer inspection shows that waving the magic wand of perfect software engineering would not get us very far, because the roots of these issues run deeper than code. AI algorithms learn from given (often external) sources of data, so their behaviour naturally reflects the leanings and affinities of the information those sources contain. Amazon’s hiring bot perpetuated the preferences of the HR department; Google’s recognition tool had simply seen mostly pictures of white people. Even an autonomous AI gathering data dynamically could only learn from the environment it is exposed to. And that leads to discrimination, often for the simple reason that learning is by nature discriminatory and suffers from the problem of unknown unknowns.
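The mechanism is easy to reproduce in miniature. The following Python sketch (the data, group names and offsets are all invented for illustration) ‘learns’ a single decision threshold from a training set in which one group supplies 95 per cent of the examples. The learned rule fits the majority group almost perfectly and performs barely better than chance on the minority group, even though nothing in the code is written with either group in mind:

```python
import random

random.seed(0)

# Toy data: each example is (feature, label, group). The "correct"
# decision threshold differs per group because of a group-specific offset.
def make_examples(n, group, shift):
    out = []
    for _ in range(n):
        label = random.randint(0, 1)
        feature = shift + label + random.uniform(-0.4, 0.4)
        out.append((feature, label, group))
    return out

# Group "A" dominates the training set, mirroring a skewed data source.
train = make_examples(950, "A", shift=0.0) + make_examples(50, "B", shift=1.0)
test_a = make_examples(500, "A", shift=0.0)
test_b = make_examples(500, "B", shift=1.0)

def accuracy(threshold, data):
    # Predict label 1 when the feature exceeds the threshold.
    return sum((f > threshold) == bool(l) for f, l, _ in data) / len(data)

# "Learning": pick the threshold that maximises overall training accuracy.
# With 95% group-A examples, the winning threshold is the one that fits A.
candidates = [x / 100 for x in range(-50, 250)]
best = max(candidates, key=lambda t: accuracy(t, train))

print(f"learned threshold: {best:.2f}")
print(f"accuracy on group A: {accuracy(best, test_a):.0%}")
print(f"accuracy on group B: {accuracy(best, test_b):.0%}")
```

The model optimises exactly what it was asked to optimise, overall accuracy, and the disparity emerges purely from the composition of the data. This is why a per-group breakdown of the kind printed above, rather than a single aggregate score, is what exposes the problem.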
This effect is known as “algorithmic bias” and is becoming a common issue for data scientists. Google didn’t go out of its way to be racist, Facebook didn’t intend to get users arrested, and IBM et al. didn’t decide to make their facial-recognition software blind to black women. They were ‘victims’ of the environment their AI learned from, and the negative impact on people’s lives was collateral damage of this limitation.
As data science becomes increasingly mainstream, managing algorithmic bias cannot become the elephant in the room. It is crucial that organisations implement fair protocols that lead to fair outcomes and decisions; doing so will become part of the social contract. However educated on the issue people may be, it is commonly assumed that this work will be done at the data level, confined to data science labs. In reality, it must be part of a broader societal effort, not only because training models to be consistent and robust is very difficult, but because many key considerations belong outside the lab.
Call to Action
The issues of ethics and bias in data science are major in scale, and they will sneak into everyone’s life regardless of the attention they are paid. It is becoming painfully clear that many layers of society must come together to define the future of AI. It is a sin of arrogance to believe that tech alone can deal with this, and we encourage all companies to engage in dialogue with lawmakers and politicians. Data scientists should have their own Hippocratic Oath, but their responsibility should end there: their value lies in creating robust models that achieve their aim of enhanced decision-making. The rest of society must assist so that clear auditing protocols can (and will) be used widely by public and private companies alike.
By Dany Majard, Data Scientist, Outra, UK