What does it mean for a model to be fair?

Facundo Santiago
Oct 21, 2018 · 6 min read

As Machine Learning models are increasingly used to inform, decide, and act upon people's lives in real time, a lot of concern has been raised about the fairness of the outcomes of such decisions, particularly about avoiding decisions that affect groups disparately because of gender, race, religion, etc. But what does it mean to be fair?

It is important to distinguish model fairness from other unforgivable curses models suffer from:

Dataset bias: One widely used facial-recognition data set was estimated to be more than 75 percent male and more than 80 percent white, according to a research study. As a result, Facial Recognition Is Accurate, if You’re a White Guy.

Model security: This would be the case of the well-known Twitter bot “Tay”, which learned to create racist tweets by being gamed by other users. We need to rethink what security means in these contexts.

One of the best ways to appreciate the need for a definition is to encounter a natural problem and find oneself more or less forced to make the definition in order to solve it. Let’s look at a concrete example taken from Wachter et al. (2017), where the authors train a three-layer fully-connected neural network to predict a student’s average grade in the first year of law school, based on grade point average (GPA) prior to law school, race, and law school entrance exam (LSAT) scores.

Predicted scores based on the GPA, LSAT and Race features. Scores have been normalized, i.e. a student with a score of 0 is as good as the average of the students. A negative score means a below-average result, a positive score an above-average result.

A common concern around neural networks is interpretability: it is hard to tell how each feature impacts the final score. One technique that can shed some light on this is called counterfactual feature values. It finds the smallest change in the feature values that would change the prediction to a different, meaningful output, which lets us understand what changes are required to get a desired outcome. Let’s imagine that we are interested in seeing how the input features would need to change to get a predicted score of 0 (as good as the average of the students).

Originally predicted scores and original feature values. The last 3 columns represent the counterfactual feature values that would result in a predicted score of 0.

Recall that the method finds the “smallest change of the feature values that would change the prediction”. If you take a closer look at instances 4 and 5, the model is suggesting changing race from black (coded as 1) to white (coded as 0). Would you say that there is a racial bias in the model? What does fairness of the outcomes really mean? What does fairness in a model mean?

Some comments:

You may (I do) cast doubt on how race is used as a categorical feature. Since the method finds the smallest change in the feature values (i.e. the closest point in the feature space), how far is category 0 from category 1? Is it close? Is it far? How can you measure that? (In case you’re wondering, this particular implementation uses the Manhattan distance to compute it.)
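To make the technique a bit more concrete, here is a minimal, simplified sketch of such a search — not the authors’ implementation. It minimizes a Wachter-style loss that trades off how far the prediction is from the target score of 0 against the Manhattan distance to the original instance. The model `predict_score` and its coefficients are made-up stand-ins for the trained network.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for the trained network: maps (GPA, LSAT, race) to a
# normalized first-year score. The real model in the paper is a neural network.
def predict_score(x):
    gpa, lsat, race = x
    return 0.8 * (gpa - 3.0) + 0.05 * (lsat - 35.0) - 0.3 * race

def counterfactual(x_orig, target=0.0, lam=10.0):
    """Search for the closest point (Manhattan distance) whose prediction is ~target."""
    def loss(x):
        prediction_gap = (predict_score(x) - target) ** 2   # pull the prediction toward the target
        distance = np.abs(x - x_orig).sum()                 # stay close to the original instance (L1)
        return lam * prediction_gap + distance

    return minimize(loss, x0=x_orig, method="Nelder-Mead").x

# GPA, LSAT, race (1 = black, 0 = white, as coded in the example)
x_student = np.array([3.1, 34.0, 1.0])
print(counterfactual(x_student))   # the smallest change that yields a ~0 predicted score
```

Notice that the optimizer happily nudges the race code as if it were any other continuous number, which is precisely what makes the distance question above so slippery.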

A definition:

According to J. Angwin, J. Larson, S. Mattu, and L. Kirchner (2016):

Individual fairness considers individuals who belong to different sensitive groups, yet share similar non-sensitive features (qualifications), and requires them to receive similar decision outcomes. On the other hand, group fairness is based on different sensitive groups receiving beneficial decision outcomes in similar proportions.

Individual vs. group fairness

It is interesting to see that group fairness does not necessarily imply individual fairness. Here is an example taken from the same source as above: imagine that your company is hiring a talented person and you are considering people from groups A and A’. However, in group A most talented people steer toward engineering, while in group A’ most talented people steer toward accounting. If you look for your candidates in engineering, then you will be picking the wrong subset from A’, even while maintaining group fairness. Of course, this naive error comes from blindly ignoring how A and A’ are composed.

If the datasets we use are representative, complete, unbiased samples from the real population, then (most) Machine Learning techniques will satisfy statistical parity: the property that the demographics of those receiving a given prediction are identical to the demographics of the population as a whole. However, statistical parity relates to group fairness, not precisely to individual fairness. We’ve seen before how easy it is to play the fool when you don’t know how the groups are composed.
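As a rough illustration of the difference — a toy sketch on synthetic data, not a real model — the snippet below computes a statistical-parity gap, which is a group-level quantity, and then a crude individual-level probe: how often the decision flips for an otherwise identical person whose sensitive feature is changed.

```python
import numpy as np

def statistical_parity_gap(y_pred, sensitive):
    """Group fairness: difference in positive-outcome rates between the two groups."""
    return abs(y_pred[sensitive == 0].mean() - y_pred[sensitive == 1].mean())

def counterfactual_flip_rate(model, X, sensitive_col):
    """A crude individual-fairness probe: flip only the sensitive feature and
    count how often the decision changes for otherwise identical individuals."""
    X_flipped = X.copy()
    X_flipped[:, sensitive_col] = 1 - X_flipped[:, sensitive_col]
    return (model.predict(X) != model.predict(X_flipped)).mean()

class ToyModel:
    """Hypothetical stand-in for a trained classifier that leans on column 2 (the sensitive one)."""
    def predict(self, X):
        return ((X[:, 0] + 0.5 * X[:, 1] - 0.8 * X[:, 2]) > 0).astype(int)

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=500),            # e.g. GPA
                     rng.normal(size=500),            # e.g. LSAT
                     rng.integers(0, 2, size=500)])   # sensitive feature (0/1)
model = ToyModel()
y_pred = model.predict(X)

print("statistical parity gap:", statistical_parity_gap(y_pred, X[:, 2]))
print("flip rate when only the sensitive feature changes:",
      counterfactual_flip_rate(model, X, sensitive_col=2))
```

A model can look acceptable on the first number while doing badly on the second, which is exactly the gap between group and individual fairness described above.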

You may then be tempted to simply remove race from the dataset. Grgic-Hlaca et al. (2016) showed how understandable models can easily mislead our intuitions, and that predominantly using features people believed to be fair slightly increased the racism exhibited by algorithms while decreasing accuracy. The reason (and the following is an opinion) is that race encodes a lot of other features that are more related to the society we live in than to the person themselves.
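One quick way to see how much a dropped column still leaks through the remaining features is to try to predict the sensitive attribute from them. The sketch below does this on entirely synthetic data with a made-up zip_code proxy; with real data you would substitute your own feature table.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: 'zip_code' is a made-up proxy that agrees with race 80% of the
# time, while 'gpa' is unrelated noise.
rng = np.random.default_rng(0)
race = rng.integers(0, 2, size=2000)
zip_code = np.where(rng.random(2000) < 0.8, race, 1 - race)
gpa = rng.normal(loc=3.0, scale=0.4, size=2000)

features_without_race = pd.DataFrame({"zip_code": zip_code, "gpa": gpa})

# If the remaining features can recover race well above chance (AUC ~ 0.5),
# dropping the race column did not put the information out of the model's reach.
leakage = cross_val_score(RandomForestClassifier(n_estimators=100),
                          features_without_race, race,
                          cv=5, scoring="roc_auc").mean()
print(f"proxy leakage (AUC): {leakage:.2f}")
```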

Let me share an even more drastic report: a 2002 Institute of Medicine study on racial and ethnic disparities in health care stressed that “a large body of research underscores the existence of disparities.” As examples, the report stated:

… minorities are less likely to be given appropriate cardiac medications or to undergo bypass surgery, and are less likely to receive kidney dialysis or transplants. By contrast, they are more likely to receive certain less-desirable procedures, such as lower limb amputations for diabetes and other conditions.

How on earth do we make sense of this? According to Dr. David Williams, professor of African and African American Studies at Harvard University, the research shows that when people hold a negative stereotype about a group and meet someone from that group, they often treat that person differently and honestly don’t even realize it. Williams noted that most Americans would object to being labeled as “racist” or even as “discriminating”, but he added, “Welcome to the human race. It is a normal process about how all of us process information. The problem for our society is that the level of negative stereotypes is very high.”

In this context, fairness in a model actually means trying to correct the distortions society has been creating inside itself for centuries, out of power, fear, or ignorance. Fairness means discovering the implications that linger inside a given feature. Models are only as good as the data you have, and the data you have is only as good as the people, interactions, and effects it captures.

It is critical to fix bias and make our models fair, because we shape our tools and afterward our tools shape us. Technologies are not neutral in our society; they have values, worldviews, and assumptions hard-coded into them. We need to think carefully about whether what the tool does and what we actually want to achieve are the same thing.

It is the job of all of us who work in technology to think about how our products and services shape and interact with society: which new scenarios they will enable, which others will vanish, which effects they will amplify and which ones they will diminish.

If you want to get hands-on with this subject, in my post Model interpretability — Making your model confesses I give a short review of the methods available to check whether your models are playing it safe, the advantages and disadvantages of each technique, and code samples (mostly in Python). This includes techniques like LIME, Feature Importance, and Shapley values, among others. Check it out.


Facundo Santiago

Product Manager @ Microsoft AI. Graduate adjunct professor at University of Buenos Aires. Frustrated sociologist.