SOURCES OF MACHINE LEARNING BIAS
AI is far from infallible. We’re constantly reminded of this in the
headlines we read, as well as in our own lives.
When we produce AI training data, we know to look for the biases that can
influence machine learning. In our experience, there are four distinct kinds of
bias that data scientists and AI developers need to be aware of and guard
against. This paper offers a brief overview of these sources of bias in machine
learning and suggests ways to mitigate their impact on your AI systems.
Algorithmic Bias
In machine learning itself, bias has a technical meaning: it is a property of the
model. Data scientists try to find a balance between a model’s bias and its
variance, another property, which refers to the sensitivity of a model to
variations in the training data.
Models with high variance tend to flex to fit the training data very well. They
can more easily accommodate complexity, but they are also more sensitive to
noise and may not generalize well to data outside the training data set.
Models with high bias are rigid. They are less sensitive to variations in data and
may miss underlying complexities. At the same time, they are more resistant
to noise. Finding the appropriate balance between these two properties for a
given model in a given environment is a critical data science skill set.
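As a rough illustration of this tradeoff (not from the original paper, and
assuming numpy and scikit-learn are available), the sketch below fits a
high-bias linear model and a high-variance, high-degree polynomial model to
the same noisy data and compares their training and held-out errors:

# Sketch: a high-bias (linear) model vs. a high-variance (degree-12
# polynomial) model fit to the same noisy, nonlinear data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy nonlinear target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "high bias (linear)": LinearRegression(),
    "high variance (degree-12 polynomial)": make_pipeline(
        PolynomialFeatures(degree=12), LinearRegression()
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    # The flexible model usually shows a lower training error but a larger gap
    # between training and test error -- the signature of high variance.
    print(f"{name}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")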
When the word bias is used outside machine learning, it most often refers to
something else: an inclination or prejudice for or against a person or group.
The dictionary definitions of bias, reflecting this popular use of the term, do
not include the concept of algorithmic machine learning bias. Many of these
popular definitions do, however, apply to machine learning. Unlike algorithmic
bias, the other three sources of AI bias are all found in the data used to train
AI models.
Sample Bias
Sample bias occurs when the data used to train the model do not accurately
represent the problem space the model will operate in.
There are established techniques both for selecting samples from populations
and for validating their representativeness: methods for identifying the
population characteristics that need to be captured in a sample, and for
analyzing how well a sample fits the population.
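As a minimal sketch of such a representativeness check (the category names
and proportions below are hypothetical, and scipy is assumed to be available),
a chi-square goodness-of-fit test can compare the category counts in a
training sample against known population proportions:

# Sketch: testing whether a training sample's category mix matches known
# population proportions using a chi-square goodness-of-fit test.
from scipy.stats import chisquare

# Hypothetical population proportions for a demographic attribute.
population_props = {"group_a": 0.48, "group_b": 0.32, "group_c": 0.20}

# Hypothetical counts observed in the training sample.
sample_counts = {"group_a": 620, "group_b": 250, "group_c": 130}

total = sum(sample_counts.values())
observed = [sample_counts[g] for g in population_props]
expected = [population_props[g] * total for g in population_props]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
if p_value < 0.05:
    # A small p-value suggests the sample's mix differs from the population,
    # i.e. a possible source of sample bias worth investigating.
    print(f"Sample may not be representative (p = {p_value:.4f})")
else:
    print(f"No evidence of sample bias in this attribute (p = {p_value:.4f})")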
Prejudice Bias
Prejudice bias arises when training data reflects the stereotypes and
prejudices found in broader society, and mitigating it means paying attention
to what a model is exposed to. A model will not conclude that nurses are
female, for example, if it is exposed to images of male nurses in proportion to
the number of males and females in the world (but perhaps disproportionate
to what can be found in a workplace). A chatbot that has learned hate speech
through exposure to it will have to be constrained to stop using it. And the
humans who label and annotate training data may have to be trained to avoid
introducing their own societal prejudices or stereotypes into the training data.
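As a hedged sketch of the proportional-exposure idea above (the column
names, proportions, and the rebalance helper below are invented for
illustration, and pandas is assumed), one could resample labeled training data
so that a sensitive attribute appears in chosen target proportions:

# Sketch: resampling a labeled dataset so a sensitive attribute appears in
# target proportions (e.g. real-world proportions rather than workplace ones).
import pandas as pd

def rebalance(df: pd.DataFrame, attribute: str, targets: dict, seed: int = 0) -> pd.DataFrame:
    """Downsample each attribute group so the result matches `targets`
    (a mapping of attribute value -> desired proportion)."""
    # Largest total size achievable without upsampling any group.
    n_total = min(len(df[df[attribute] == v]) / p for v, p in targets.items())
    parts = []
    for value, prop in targets.items():
        group = df[df[attribute] == value]
        parts.append(group.sample(n=int(n_total * prop), random_state=seed))
    return pd.concat(parts).sample(frac=1, random_state=seed)  # shuffle rows

# Hypothetical labeled data: occupation labels plus a gender annotation.
data = pd.DataFrame({
    "image_id": range(10),
    "occupation": ["nurse"] * 10,
    "gender": ["female"] * 8 + ["male"] * 2,
})

# Target real-world proportions rather than the skewed workplace proportions.
balanced = rebalance(data, attribute="gender", targets={"female": 0.5, "male": 0.5})
print(balanced["gender"].value_counts())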
Measurement Bias
Measurement bias occurs when the device or method used to observe or
measure the data systematically distorts it. The distortion could be the fault
of a device. For example, a camera with a chromatic filter will generate
images with a consistent color bias, and an 11-7/8 inch long “foot ruler” will
always overrepresent lengths.
As with sample bias, there are established techniques for detecting and
mitigating measurement bias. It’s good practice to compare the outputs of
different measuring devices, for example. Survey design has well-understood
practices for avoiding systematic distortion. And it’s essential to train labeling
and annotation workers before they are put to work on real data.
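As a small sketch of comparing the outputs of different measuring devices
(the readings below are invented, and scipy is assumed to be available), a
paired test on readings of the same objects can flag a systematic offset that
points to measurement bias:

# Sketch: detecting a systematic offset between two devices that measured
# the same set of objects, using a paired t-test on the differences.
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical paired readings (same 8 objects, two instruments).
device_a = np.array([10.2, 11.9, 9.8, 15.1, 12.4, 10.9, 13.3, 14.0])
device_b = np.array([10.0, 11.6, 9.5, 14.8, 12.0, 10.7, 13.0, 13.6])

differences = device_a - device_b
result = ttest_rel(device_a, device_b)

print(f"Mean offset: {differences.mean():.3f}")
if result.pvalue < 0.05:
    # A consistent, significant offset points to measurement bias in at least
    # one device; its readings may need recalibration or correction.
    print(f"Systematic difference detected (p = {result.pvalue:.4f})")
else:
    print(f"No systematic difference detected (p = {result.pvalue:.4f})")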
AI models and algorithms are built by humans, and the data that train these
algorithms are assembled, cleaned, labeled and annotated by humans. Math
that is sometimes a poor fit for the problem seeks out patterns in data that
are sometimes biased. The results, predictably, are not always what the
designers intended.
Not all data science teams have the in-house skills to avoid and mitigate
training data bias. New data science teams, especially, may struggle with this.
And even those teams that, by virtue of training or experience, do have the
necessary skills still find that it is a lot of work.