Training Data Matters
The training data in Joy’s study consisted mostly of white male faces, and the most widely used facial recognition data sets share that skew. According to the New York Times, one widely used data set is more than 75 percent male and more than 80 percent white.
These data sets aren’t just used in school projects and harmless apps either. Big names in data like Microsoft, IBM, and even Google have come up short. Their systems failed to recognize the gender of the darkest-skinned women nearly fifty percent of the time, a failure rate so high they might as well be guessing at random.
Until standards emerge, the responsibility is on those developing facial recognition software to make sure they use diverse data sets. Bias in training data can be mitigated, but only if someone sees that it’s there and knows how to correct it.
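Seeing that bias is there starts with a simple audit: count how each demographic group is represented in the training set. A minimal sketch of that idea, using a hypothetical `(gender, skin_tone)` annotation scheme (real audits use richer metadata and more careful category definitions):

```python
from collections import Counter

def audit_demographics(labels):
    """Return the share of each demographic group in a labeled data set.

    `labels` is a list of (gender, skin_tone) tuples -- a hypothetical
    annotation scheme chosen for illustration.
    """
    total = len(labels)
    counts = Counter(labels)
    return {group: count / total for group, count in counts.items()}

# Toy data set skewed the way the article describes: mostly white male faces.
toy_labels = (
    [("male", "lighter")] * 70
    + [("female", "lighter")] * 15
    + [("male", "darker")] * 10
    + [("female", "darker")] * 5
)

shares = audit_demographics(toy_labels)
for group, share in sorted(shares.items()):
    print(group, f"{share:.0%}")
```

A report like this makes the 70/5 imbalance between lighter-skinned men and darker-skinned women impossible to miss, which is the first step toward rebalancing the data.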
Joy, for one, has channeled her desire to improve facial recognition software into founding the Algorithmic Justice League, a group whose purpose is to raise awareness of and address issues of inclusion and bias in tech.