20 Feb
The machine learning diversity issue: how not to repeat history

Let’s talk about the machine learning diversity problem.

It goes without saying that diversity in any industry is massively important. We all come from different backgrounds and cultures, and each brings its own ideas and outlook on the world.

In the coding industry, diversity goes further still: it is vital to the successful evolution of technology. If the tech and coding industry ignores its lack of diversity, humanity could repeat the worst moments of its history. There is a very real AI diversity problem, and it needs to be resolved.

To understand this concerning issue, let’s first take a look at how coding has evolved, and the way machine learning actually works. 


How has coding evolved?

In traditional coding, we write code, or ‘instructions’ for a computer to follow. 

Google Translate used to have over 1 million lines of human-written code. Now, thanks to advances within the coding industry and the development of ‘machine learning’, Google Translate runs on around 500 lines of code, which call on machine learning to do the rest of the job.

How does machine learning work?

Machine learning is now commonly used in place of hand-written code for tasks like translation. It works like this:

You give the computer many examples of inputs and their matching outputs, and the computer writes its own code, or ‘instructions’, for getting from A (the input) to B (the output).

In other words, machine learning is the computer finding the patterns that connect each input to its output, and turning those patterns into its own instructions.
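To make ‘examples in, rules out’ a little more concrete, here is a minimal Python sketch. It assumes the simplest possible kind of rule, a straight line (y = slope * x + intercept), and learns it from example pairs with ordinary least squares. The function names learn_rule and apply_rule are hypothetical, and real systems such as modern translation models learn vastly more complex rules, but the division of labour is the same: people supply the examples, the computer works out the rule.

```python
# A minimal sketch, assuming the "rule" is a straight line: y = slope * x + intercept.
# learn_rule and apply_rule are hypothetical names used only for illustration.


def learn_rule(inputs: list[float], outputs: list[float]) -> tuple[float, float]:
    """Work out slope and intercept from example pairs (ordinary least squares)."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_rule(rule: tuple[float, float], x: float) -> float:
    """Use the learned rule on a new input."""
    slope, intercept = rule
    return slope * x + intercept


# Example inputs (A) and outputs (B). Nobody tells the computer the rule is "double it, add 1".
examples_in = [1.0, 2.0, 3.0, 4.0]
examples_out = [3.0, 5.0, 7.0, 9.0]

rule = learn_rule(examples_in, examples_out)
print(rule)                    # (2.0, 1.0)
print(apply_rule(rule, 10.0))  # 21.0
```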

In practice, you give the computer examples (inputs) and tell it what each example is (outputs). Season 2 of the Netflix show Explained (https://www.netflix.com/watch/81097620?trackId=13752289&tctx=0%2C0%2Cc5bc9f8e-98cc-4b0f-a9fb-ca40eee13e11-10505337%2C%2C) has an episode called ‘Coding’ that uses a party as an example to explain machine learning.

You show the computer images (input) of, for example, a party. You tell the computer these examples are of ‘a party’ (output). 

You then show the computer examples of photos that are not a party (input) and tell the computer that these examples are ‘not a party’ (output). 

This is how the computer learns to tell whether an image shows ‘a party’ or ‘not a party’. From these examples, it builds a classification system based on what a party looked like in the images it was given. Once it has created this classification, or set of ‘rules’, the computer can label new photos it has never seen before as ‘a party’ or ‘not a party’.
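The party example can be sketched in a few lines of Python. This is an illustration under loud assumptions, not how the Explained episode or any real image model works: each photo is reduced to three hypothetical yes/no features (music playing, decorations visible, crowded room), and the classifier is a simple nearest-centroid model, which averages the examples for each label and gives a new photo whichever label's average it is closest to.

```python
# A minimal nearest-centroid sketch of learning "a party" vs "not a party".
# Assumption: each photo is already reduced to three hypothetical 0/1 features:
# (music playing, decorations visible, crowded room).
from math import dist

Photo = tuple[float, ...]


def centroid(photos: list[Photo]) -> Photo:
    """Average each feature across the examples: the model's 'picture' of a label."""
    return tuple(sum(values) / len(photos) for values in zip(*photos))


def train(examples: dict[str, list[Photo]]) -> dict[str, Photo]:
    """Learning step: turn the labelled examples into one centroid per label."""
    return {label: centroid(photos) for label, photos in examples.items()}


def classify(model: dict[str, Photo], photo: Photo) -> str:
    """Pick the label whose centroid is closest (Euclidean distance) to the photo."""
    return min(model, key=lambda label: dist(model[label], photo))


examples = {
    "a party": [(1, 1, 1), (1, 0, 1), (1, 1, 0)],         # inputs labelled "a party"
    "not a party": [(0, 0, 0), (0, 0, 1), (1, 0, 0)],     # inputs labelled "not a party"
}
model = train(examples)

print(classify(model, (1, 1, 1)))  # a party
print(classify(model, (0, 0, 1)))  # not a party
```

A real image classifier would learn from raw pixels and discover its own features, but the principle carries over: the ‘rules’ come entirely from the labelled examples it was given.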

How could machine learning cause humanity to repeat history?

Let’s say that, because of an oversight or because the people working on the project have very similar life experiences, the examples of a party given to the computer didn’t include any black or Asian people. If a black or Asian person then appears in a photo of a party, the computer could label the photo ‘not a party’, because their presence doesn’t fit the computer’s classification of a party: a classification built entirely from the examples that people gave it.

This isn’t to say that such a gross oversight would be deliberate. However, an oversight like this inadvertently creates a racist machine learning program.
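The same kind of toy classifier shows how an oversight like this creeps in. Everything here is hypothetical: photos are reduced to 0/1 features, and group_b_present is an invented stand-in for whatever visual traits of the people in a photo a model might latch onto. In the skewed training set, people from group B only ever appear in ‘not a party’ photos, so the model learns that their presence counts against a party.

```python
# A hypothetical sketch of bias from a skewed training set (nearest-centroid classifier).
# Each photo: (music, decorations, crowded, group_b_present), each 0 or 1.
# group_b_present is an invented stand-in feature, for illustration only.
from math import dist

Photo = tuple[int, ...]


def train(examples: dict[str, list[Photo]]) -> dict[str, tuple[float, ...]]:
    """Average the example photos for each label."""
    return {
        label: tuple(sum(values) / len(photos) for values in zip(*photos))
        for label, photos in examples.items()
    }


def classify(model: dict[str, tuple[float, ...]], photo: Photo) -> str:
    """Return the label whose average example is closest to the photo."""
    return min(model, key=lambda label: dist(model[label], photo))


# Skewed data: group B people only ever appear in the "not a party" examples.
skewed = {
    "a party": [(1, 1, 1, 0), (1, 0, 1, 0), (1, 1, 0, 0)],
    "not a party": [(0, 0, 0, 1), (0, 0, 1, 1), (1, 0, 0, 0)],
}


def with_both_groups(photos: list[Photo]) -> list[Photo]:
    """Hypothetical fix: every kind of scene also appears with the other group present."""
    return photos + [(*photo[:3], 1 - photo[3]) for photo in photos]


diverse = {label: with_both_groups(photos) for label, photos in skewed.items()}

party_group_a = (1, 0, 1, 0)  # music, no decorations, crowded, group A
party_group_b = (1, 0, 1, 1)  # the same scene, but with group B present

biased = train(skewed)
print(classify(biased, party_group_a))  # a party
print(classify(biased, party_group_b))  # not a party

fairer = train(diverse)
print(classify(fairer, party_group_a))  # a party
print(classify(fairer, party_group_b))  # a party
```

Two photos that differ only in who is in them get different labels from the skewed model. In this toy example, training on scenes that include both groups removes the difference; real datasets are rarely fixed this easily, which is why the people choosing the examples matter so much.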

With this, we start to see how easily these huge issues could arise. There are a million ways in which bias can occur without diversity in the coding industry, the image of ‘a party’ being one example. 

Men, women, non-binary people, members of the LGBTQ+ community, people of every race, old people, young people, people of different religions and politics: we all have unique life experiences that shape who we are. If we want our technology to reflect these important cultural differences, diversifying the coding industry is an absolute must. A wider variety of life experience can only benefit machine learning.

That said, the coding industry needs to go one step further and address the systemic societal issues within the data it feeds into machine learning. Take the criminal justice system: its historical data has structural racism built into it. Feeding that data into machine learning will repeat the mistakes we have made in the past.

Historical data will lead us to remake mistakes from the past if not manually corrected by the tech and coding industry. 
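Here is a minimal sketch of how historical bias carries over, using invented records rather than real data. The ‘model’ simply learns how often past cases with the same features were flagged as high risk (an assumption made for illustration; real risk-assessment tools are more complex). Both groups have identical records, but because the past decisions flagged group B more often, the model learns to do the same.

```python
# Hypothetical sketch: a model trained on biased historical decisions repeats them.
# The records below are invented for illustration, not real data.
from collections import defaultdict

# (prior_offences, group, was_flagged_high_risk)
# Both groups have the same records, but past decisions flagged group B more often.
history = [
    (0, "A", False), (0, "A", False), (0, "A", False), (0, "A", True),
    (0, "B", True), (0, "B", True), (0, "B", False), (0, "B", True),
]


def train(records):
    """Learn the historical flag rate for each (prior_offences, group) combination."""
    counts = defaultdict(lambda: [0, 0])  # key -> [times flagged, total cases]
    for priors, group, flagged in records:
        counts[(priors, group)][0] += int(flagged)
        counts[(priors, group)][1] += 1
    return {key: n_flagged / n_total for key, (n_flagged, n_total) in counts.items()}


def predict_high_risk(model, priors, group):
    """Flag a new case if most similar past cases were flagged."""
    return model[(priors, group)] > 0.5


model = train(history)
print(model[(0, "A")], model[(0, "B")])  # 0.25 0.75
print(predict_high_risk(model, 0, "A"))  # False
print(predict_high_risk(model, 0, "B"))  # True
```

Nothing in the code itself is prejudiced; the disparity comes entirely from the labels, which record past human decisions.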

We live in a really exciting time. We’re watching the tech industry change the course of humanity in front of our eyes. Computers only began to be commercialised for business in the ’50s and ’60s, and the first laptops didn’t appear until the early ’80s, only around 40 years ago! Machine learning has the ability to change the world in a really fundamental way that could be wonderful, but it won’t be wonderful by itself. It needs people, people with different life experiences, to make it great.

If you’d like to be part of the ever-evolving coding industry, check out this shecancode article (https://shecancode.io/blog/top-ten-coding-bootcamps-in-europe) on ten great coding bootcamps in Europe!

A blog by Gabrielle Lazareff



Explained - Season 2 - Coding