# Do credit scoring algorithms discriminate against protected groups?

## Goals

By the end of this case study you should be able to:

1. Meaningfully interpret summary statistics
2. Meaningfully interpret data visualizations
3. Think critically about sociological factors at play in the data collected for machine learning applications

Most importantly, you will explore how historical and representation bias can creep into training datasets and skew the final conclusions *in favor* of existing biased practices, thus perpetuating them.

## Introduction

**Business Context.** Investigative reports about digital financial services (DFS) have found instances of bias in the mechanisms that determine whom a bank should lend money to. In many parts of the world, consumers typically gain access to financial services based on an algorithmic assessment of their credit history. However, these systems have historically excluded consumers who are financially marginalized through intersecting forces of oppression. For example, in the United States, African Americans are disproportionately denied home loans because of a legacy of policies and banking practices, implemented decades ago, that were designed to exclude Black individuals from home ownership (a practice known as [redlining](https://en.wikipedia.org/wiki/Redlining)). Beyond race or ethnicity, many other factors may contribute to the unfair distribution of financial opportunities, such as an applicant's gender, location, or age.

In this case, you are a data analyst for a major credit bureau. Your organization is concerned that the data on which it has trained its assessment tools are leading to discriminatory outcomes. The company wants to know whether its predictions have been inaccurate for specific subgroups of the population, based on how its predictions of creditworthiness correlate with protected categories (e.g., gender, nationality, age). The company believes that if you can find patterns in the dataset used to build its model that reflect long-standing unfair social determinants of creditworthiness, it can then rectify them to reduce its contribution to unfair outcomes.

**Business Problem.** Your employer would like you to answer the following: **"What are the hidden biases in our datasets used to train our credit risk assessment algorithms?"**

**Analytical Context.** This dataset includes information about individuals and their credit history (whether they had failed to pay their loans before, what other loans they had, etc.). The credit agency will train a model on this data to decide whether to approve individuals for a loan. Such tasks typically use [*classification models*](https://towardsdatascience.com/supervised-learning-basics-of-classification-and-main-algorithms-c16b06806cd3), which you will learn about in later cases. The model will predict whether the individual will default (stop paying the loan, which is a bad outcome). You can find more information about the dataset [here](http://www1.beuth-hochschule.de/FB_II/reports/Report-2019-004.pdf).

## Data exploration

Let's start by importing the necessary libraries and loading in the dataset:

In [1]:
# Import necessary libraries
import pandas as pd

In [2]:
# Load and examine the dataset
df = pd.read_csv('data/german_credit.csv')
df.head()

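| Unnamed: 0 | Id | status | duration | credit_history | purpose | amount | employment_duration | installment_rate | personal_status_sex | other_debtors | present_residence | age | housing | number_credits | job | people_liable | foreign_worker | credit_risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 18 | 4 | 2 | 1049 | 2 | 4 | 2 | 1 | 4 | 21 | 1 | 1 | 3 | 2 | 2 | 1 |
| 1 | 1 | 1 | 9 | 4 | 0 | 2799 | 3 | 2 | 3 | 1 | 2 | 36 | 1 | 2 | 3 | 1 | 2 | 1 |
| 2 | 2 | 2 | 12 | 2 | 9 | 841 | 4 | 2 | 2 | 1 | 4 | 23 | 1 | 1 | 2 | 2 | 2 | 1 |
| 3 | 3 | 1 | 12 | 4 | 0 | 2122 | 3 | 3 | 3 | 1 | 2 | 39 | 1 | 2 | 2 | 1 | 1 | 1 |
| 4 | 4 | 1 | 12 | 4 | 0 | 2171 | 3 | 4 | 3 | 1 | 4 | 38 | 2 | 2 | 2 | 2 | 1 | 1 |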
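
The first column of the output, `Unnamed: 0`, duplicates `Id` in the rows shown. A column with that name typically appears when a CSV file was saved together with its row index, although that is an assumption about how this file was produced. Here is a minimal sketch of dropping it and checking the basic structure. It uses the five rows shown above, re-typed inline so that it runs without the CSV file. The helper name `drop_saved_index` is hypothetical.

```python
import pandas as pd

COLUMNS = [
    "Unnamed: 0", "Id", "status", "duration", "credit_history", "purpose",
    "amount", "employment_duration", "installment_rate", "personal_status_sex",
    "other_debtors", "present_residence", "age", "housing", "number_credits",
    "job", "people_liable", "foreign_worker", "credit_risk",
]

# The five rows printed by df.head() above.
ROWS = [
    [0, 0, 1, 18, 4, 2, 1049, 2, 4, 2, 1, 4, 21, 1, 1, 3, 2, 2, 1],
    [1, 1, 1, 9, 4, 0, 2799, 3, 2, 3, 1, 2, 36, 1, 2, 3, 1, 2, 1],
    [2, 2, 2, 12, 2, 9, 841, 4, 2, 2, 1, 4, 23, 1, 1, 2, 2, 2, 1],
    [3, 3, 1, 12, 4, 0, 2122, 3, 3, 3, 1, 2, 39, 1, 2, 2, 1, 1, 1],
    [4, 4, 1, 12, 4, 0, 2171, 3, 4, 3, 1, 4, 38, 2, 2, 2, 2, 1, 1],
]

sample = pd.DataFrame(ROWS, columns=COLUMNS)


def drop_saved_index(frame):
    """Return a copy without 'Unnamed: 0' (assumed to be a saved row index)."""
    return frame.drop(columns=["Unnamed: 0"], errors="ignore")


clean = drop_saved_index(sample)
print(clean.shape)               # (rows, columns) after dropping the extra column
print(clean.isna().sum().sum())  # total number of missing values
```

The same helper can be applied to the full `df` loaded above. Because of `errors="ignore"`, it leaves the frame unchanged if the column is absent.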


Below, we have provided a description of the most important features and what the various values of those features mean. Please note that the currency used in this dataset is the German Deutsche Mark, abbreviated as DM:

* **``Id``**: ID of individual entries for evaluation.
* **``status``**: Status of the debtor's checking account with the bank.
    * `1`: No checking account
    * `2`: Negative account balance
    * `3`: 0 - 199 DM account balance
    * `4`: 200+ DM account balance
* **``duration``**: Credit duration, in months.
* **``credit_history``**: History of compliance with previous or concurrent credit contracts.
    * `0`: Delay in paying off in the past
    * `1`: Critical account/other credits elsewhere
    * `2`: No credits taken/all credits paid back duly 
    * `3`: Existing credits paid back duly until now
    * `4`: All credits at this bank paid back duly
* **``purpose``**: Purpose for which the credit is needed.
    * `0`: Others
    * `1`: Car (new)
    * `2`: Car (used)
    * `3`: Furniture/equipment 
    * `4`: Radio/television
    * `5`: Domestic appliances 
    * `6`: Repairs
    * `7`: Education
    * `8`: Vacation
    * `9`: Retraining
    * `10`: Business
* **``amount``**: Credit amount in DM.
* **``employment_duration``**: Duration of debtor's employment with current employer.
    * `1`: Unemployed
    * `2`: Less than 1 year
    * `3`: 1 - 3 years
    * `4`: 4 - 6 years
    * `5`: 7+ years
* **``installment_rate``**: Credit installments as a percentage of debtor's disposable income.
    * `1`: 35%+
    * `2`: 25 - 34.99% 
    * `3`: 20 - 24.99%
    * `4`: Less than 20%
* **``personal_status_sex``**: Combined information on sex and marital status. (Sex cannot always be recovered from the variable, because male singles and female non-singles are coded with the same code 2. Furthermore, female widows cannot be easily classified, because the code table does not list them in any of the female categories.)
    * `1`: Divorced or separated male
    * `2`: Single male OR non-single female
    * `3`: Married or widowed male
    * `4`: Single female
* **``other_debtors``**: Whether or not there is another debtor or a guarantor for the credit.
    * `1`: None
    * `2`: Co-applicant 
    * `3`: Guarantor
* **``present_residence``**: Length of time (in years) the debtor has lived in the present residence.
    * `1`: Less than 1 year
    * `2`: 1 - 4 years
    * `3`: 4 - 7 years 
    * `4`: 7+ years
* **``age``**: Debtor's age, in years.
* **``housing``**: Type of housing the debtor lives in.
    * `1`: Free
    * `2`: Rent
    * `3`: Own
* **``number_credits``**: Number of credits including the current one the debtor has (or had) at this bank.
    * `1`: 1
    * `2`: 2 - 3 
    * `3`: 4 - 5 
    * `4`: 6+
* **``job``**: The quality of the debtor's job.
    * `1`: Unemployed/unskilled non-resident
    * `2`: Unskilled resident
    * `3`: Skilled employee/official
    * `4`: Manager/self-employed/highly-qualified employee
* **``people_liable``**: Number of persons who financially depend on the debtor (i.e. are entitled to maintenance).
    * `1`: 3+ 
    * `2`: 0 - 2
* **``foreign_worker``**: Whether or not the debtor is a foreign worker.
    * `1`: Yes 
    * `2`: No
* **``credit_risk``**: Whether the credit contract has been complied with (good) or not (bad).
    * `0`: Bad
    * `1`: Good

A full description of the dataset can be found [here](https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29).
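
Because every categorical feature is stored as an integer code, group tables are easier to read once the codes are replaced by their labels. The sketch below copies a few of the code tables from the list above into dictionaries and applies them with `Series.map`. The names `LABELS` and `decode` are hypothetical, and only four columns are covered. The sample values are taken from the five rows shown earlier.

```python
import pandas as pd

# Code-to-label tables copied from the feature description above.
LABELS = {
    "foreign_worker": {1: "Yes", 2: "No"},
    "credit_risk": {0: "Bad", 1: "Good"},
    "people_liable": {1: "3+", 2: "0 - 2"},
    "housing": {1: "Free", 2: "Rent", 3: "Own"},
}


def decode(frame, labels=LABELS):
    """Return a copy in which the listed columns hold labels instead of codes."""
    out = frame.copy()
    for column, mapping in labels.items():
        if column in out.columns:
            # Codes missing from the mapping become NaN, which makes them easy to spot.
            out[column] = out[column].map(mapping)
    return out


# Values of these four columns in the five rows shown by df.head().
sample = pd.DataFrame({
    "foreign_worker": [2, 2, 2, 1, 1],
    "credit_risk": [1, 1, 1, 1, 1],
    "people_liable": [2, 1, 2, 1, 2],
    "housing": [1, 1, 1, 1, 2],
})

decoded = decode(sample)
print(decoded)
```

Note the direction of some codes: for `foreign_worker` the value `2` means "No", and for `people_liable` the *lower* code stands for *more* dependents. Decoding before plotting helps avoid reading these the wrong way round.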

### Exercise 1

Examine the dataset's summary statistics and the provided visualizations below to better understand the demographic distribution of the data. Briefly summarize your main findings.

<img src="./data/images/stats_cont_vars.png" width="400">

<img src="./data/images/age_hist.png" width="600">
<img src="./data/images/age_cred_hist.png" width="600">

<img src="./data/images/pers_status_cred.png" width="600">
<img src="./data/images/risk_by_gender.png" width="600">

<img src="./data/images/job_cred.png" width="600">
<img src="./data/images/risk_by_job.png" width="400">

<img src="./data/images/foreign_cred.png" width="600">
<img src="./data/images/risk_by_foreign.png" width="400">

<img src="./data/images/deps_cred.png" width="600">
<img src="./data/images/risk_by_deps.png" width="400">
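
Tables like the ones pictured above could be computed along the following lines. This is a sketch only, since the case does not state how the images were produced, and the helper name `group_summary` is hypothetical. It reports, for each value of a grouping column, how many rows fall in the group, what share of the sample that is, and what fraction of the group has `credit_risk == 1` (Good). The demonstration uses the five rows shown earlier, which are far too few to support any conclusion.

```python
import pandas as pd


def group_summary(frame, group_col, outcome_col="credit_risk"):
    """Per-group row count, share of the sample, and share of good outcomes."""
    grouped = frame.groupby(group_col)[outcome_col]
    return pd.DataFrame({
        "count": grouped.size(),
        "share_of_sample": grouped.size() / len(frame),
        "good_rate": grouped.mean(),  # outcome is coded 1 = Good, 0 = Bad
    })


# Selected columns from the five rows shown by df.head().
sample = pd.DataFrame({
    "duration": [18, 9, 12, 12, 12],
    "amount": [1049, 2799, 841, 2122, 2171],
    "age": [21, 36, 23, 39, 38],
    "foreign_worker": [2, 2, 2, 1, 1],
    "credit_risk": [1, 1, 1, 1, 1],
})

# Summary statistics for the continuous variables.
print(sample[["duration", "amount", "age"]].describe())

# Group sizes and outcome rates for one demographic variable.
print(group_summary(sample, "foreign_worker"))
```

On the full dataset, `group_summary(df, "job")` or `group_summary(df, "people_liable")` would give the corresponding breakdowns. Reading `share_of_sample` and `good_rate` side by side separates two questions: how well a group is represented, and how its outcomes compare.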


**Answer.**

-------

### Exercise 2

Learn about the social context of the problem by reading and summarizing research on financial discrimination in the population. For more information on the dataset sampling techniques, visit the [dataset's accompanying report](http://www1.beuth-hochschule.de/FB_II/reports/Report-2019-004.pdf).

From there, find one or two other reputable sources (e.g. research papers that have used the dataset, critiques of the dataset, German population reports, German financial industry statistics, German discrimination laws and financial product regulations) to form an opinion about the relevant social context. Summarize your findings and cite your references.

**Answer.**

-------

### Exercise 3

Look at your answer to Exercise 1 and see if it reflects your findings in Exercise 2. Summarize your conclusions.

**Answer.**

-------

### Exercise 4

Focus on two demographic variables: gender and age. Compare the outcomes in each gender group and in each age group using the provided visualizations below. Which groups are privileged with respect to age and gender? What do you think are the sources of these disparities?

<img src="./data/images/gender_cred.png" width="500">
<img src="./data/images/under_25_cred.png" width="500">
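
To check such comparisons numerically, one option is to derive the two grouping variables and compute the share of good outcomes in each group. The sketch below rests on two assumptions. The age split at 25 follows the file name of the image above. The sex mapping follows the code table for `personal_status_sex`, where code `2` mixes single males with non-single females and is therefore kept as a separate "ambiguous" category. How the images themselves derived gender is not stated in this case. The names `add_groups` and `good_rate_by` are hypothetical.

```python
import pandas as pd

# Code 2 cannot be assigned to one sex (see the feature description).
SEX_FROM_CODE = {1: "male", 2: "ambiguous", 3: "male", 4: "female"}


def add_groups(frame, age_threshold=25):
    """Return a copy with derived 'age_group' and 'sex' columns."""
    out = frame.copy()
    is_under = out["age"] < age_threshold
    out["age_group"] = is_under.map({True: "under 25", False: "25 or older"})
    out["sex"] = out["personal_status_sex"].map(SEX_FROM_CODE)
    return out


def good_rate_by(frame, group_col):
    """Share of rows with credit_risk == 1 (Good) within each group."""
    return frame.groupby(group_col)["credit_risk"].mean()


# Selected columns from the five rows shown by df.head().
sample = pd.DataFrame({
    "age": [21, 36, 23, 39, 38],
    "personal_status_sex": [2, 3, 2, 3, 3],
    "credit_risk": [1, 1, 1, 1, 1],
})

grouped = add_groups(sample)
print(grouped[["age", "age_group", "personal_status_sex", "sex"]])
print(good_rate_by(grouped, "age_group"))
```

Keeping the ambiguous code visible, rather than silently assigning it to one sex, makes it clear how much of any gender comparison depends on that coding decision.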

**Answer.**

-------

### Exercise 5

In their [article](https://arxiv.org/pdf/1901.10002.pdf) *A Framework for Understanding Unintended Consequences of Machine Learning*, authors Harini Suresh and John Guttag provide a brief description of various kinds of dataset bias. Specifically, they concretely define the concepts of **historical bias** and **representation bias**. Is our case an example of historical bias, representation bias, or both? Justify your answer.
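
When reasoning about representation bias, it can help to put each group's share of the sample next to its share of the population the model is meant to serve. The sketch below shows one way to tabulate that comparison. The reference shares must come from an external source, such as the population reports suggested in Exercise 2. The numbers used here are placeholders for illustration only and are **not** taken from the dataset or from any real statistics. The function name `representation_table` is hypothetical.

```python
import pandas as pd


def representation_table(sample_counts, reference_shares):
    """Compare each group's share of the sample with a reference share.

    A ratio below 1 means the group is underrepresented relative to the
    reference; a ratio above 1 means it is overrepresented.
    """
    counts = pd.Series(sample_counts, dtype="float64")
    sample_share = counts / counts.sum()
    reference = pd.Series(reference_shares, dtype="float64").reindex(counts.index)
    return pd.DataFrame({
        "sample_share": sample_share,
        "reference_share": reference,
        "ratio": sample_share / reference,
    })


# Placeholder values, NOT real data: replace with counts from the dataset
# and shares from a cited population source.
toy_counts = {"group_a": 30, "group_b": 70}
toy_reference = {"group_a": 0.5, "group_b": 0.5}

table = representation_table(toy_counts, toy_reference)
print(table)
```

A table like this addresses only representation. Whether the recorded outcomes themselves reflect past unfair practices is a separate question that requires the social context gathered in Exercise 2.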

**Answer.**

-------

## Reflection

### Discussion 1

Reflect on what it means for a dataset to be biased in the context of this case. Then reflect on how this may generalize to other example domains where we may want to examine bias.

In this broader context, does a difference in the data across groups constitute a bias in itself? Are there other elements that are necessary to say that there is bias? What are some of those elements?

### Discussion 2

Reflect on the biases (e.g. cognitive, social, ideological) that you brought to this case study. What messages have you received from media, your education, family, or your peers that may bias your approach to this problem? Did any of the results you saw in the dataset surprise you? Why?

### Discussion 3

With a partner, brainstorm ways to assess and address those cognitive biases when working on future data analysis projects. Describe one way you could implement a personal bias check into your own workflow.

## Conclusions

The South German Credit dataset, widely used to build financial services prediction models, is indeed biased. Importantly, we concluded that in this dataset:

1. Women are underrepresented compared to men
2. People under 25 are underrepresented compared to people over 25
3. Foreign workers are overrepresented compared to domestic workers
4. Caretakers with many dependents are underrepresented in comparison to those with fewer than 3 dependents

We saw that the dataset contained biases that were both historical and representational in nature, and that those two types of biases are intertwined. Historical (societal) biases, for example, may be the reason why underprivileged groups then become underrepresented in datasets. Thus, an investigation into whether or not the credit scoring algorithm your company uses contains biases is warranted. In fact, it is almost certain that the model will perform in ways that reinforce existing barriers for women and young people.

## Takeaways

In this case study, we learned about two common forms of dataset bias. When datasets are biased, models can become biased, especially if the dataset biases go undetected. We used data exploration techniques to understand the nature of the dataset and reason about how it might reinforce existing social biases. We learned how to perform background and domain research on our datasets to understand where the data is coming from, how it was generated, and what the social context of the data sample was at the time of collection.

Most importantly, we practiced critical thinking and self-reflection as we completed this case. It is important that we as data professionals remember that we are social beings bringing our biases, preconceptions, and blind spots to our projects. We should incorporate ways to recognize and account for these biases in our project workflows.

## Attribution

Grömping, U. (2019). South German Credit Data: Correcting a Widely Used Data Set. Report 4/2019, Reports in Mathematics, Physics and Chemistry, Department II, Beuth University of Applied Sciences Berlin.