# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pylab
sns.set()

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

# Data exploration

In this section, we will explore the data in order to determine the properties of the data frame we will be using.

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4870 entries, 0 to 4869
Data columns (total 65 columns):
id                    4870 non-null object
ad                    4870 non-null object
education             4870 non-null int8
ofjobs                4870 non-null int8
yearsexp              4870 non-null int8
honors                4870 non-null int8
volunteer             4870 non-null int8
military              4870 non-null int8
empholes              4870 non-null int8
occupspecific         4870 non-null int16
occupbroad            4870 non-null int8
workinschool          4870 non-null int8
email                 4870 non-null int8
computerskills        4870 non-null int8
specialskills         4870 non-null int8
firstname             4870 non-null object
sex                   4870 non-null object
race                  4870 non-null object
h                     4870 non-null float32
l                     4870 non-null float32
call                  4870 non-null float32
city        

In [4]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


# Question 1

The data set contains 4870 records. In this study, we are interested in whether or not a candidate received a callback. This outcome is binary, similar to a pass/fail test, so the callback for each record can be modeled as a Bernoulli trial, and the total number of callbacks in a group of records can be modeled with a Binomial distribution. Per the Central Limit Theorem (CLT), for large samples the Binomial distribution is well approximated by a normal distribution. Each group here is large enough for this approximation to apply, so a two-sample test on the difference in callback rates is appropriate.

In [5]:
# number of callbacks for black-sounding names
b_call = int(sum(data.call[data.race=='b']))
print("Number of callbacks for black-sounding names:",b_call)
# number of callbacks for white-sounding names
w_call = int(sum(data.call[data.race=='w']))
print("Number of callbacks for white-sounding names:",w_call)
# number of non-callbacks for black-sounding names
b_nocall = len(data.call[(data.race=='b') & (data.call==0)])
print("Number of non-callbacks for black-sounding names:",b_nocall)
# number of non-callbacks for white-sounding names
w_nocall = len(data.call[(data.race=='w') & (data.call==0)])
print("Number of non-callbacks for white-sounding names:",w_nocall)

Number of callbacks for black-sounding names: 157
Number of callbacks for white-sounding names: 235
Number of non-callbacks for black-sounding names: 2278
Number of non-callbacks for white-sounding names: 2200
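
A common rule of thumb for the normal approximation is that each group should contain at least 10 callbacks and at least 10 non-callbacks. The sketch below applies that check to the counts printed above; the helper name `clt_condition_holds` and the threshold of 10 are assumptions introduced here, not part of the original exercise.

```python
# Counts copied from the notebook output above.
calls = {"b": 157, "w": 235}
no_calls = {"b": 2278, "w": 2200}


def clt_condition_holds(successes, failures, threshold=10):
    """Rule of thumb (assumed threshold): both counts must reach `threshold`."""
    return successes >= threshold and failures >= threshold


for race in ("b", "w"):
    n = calls[race] + no_calls[race]   # resumes sent with this name type
    rate = calls[race] / n             # observed callback rate
    ok = clt_condition_holds(calls[race], no_calls[race])
    print(f"{race}: n={n}, callback rate={rate:.4f}, rule of thumb satisfied: {ok}")
```

Both groups pass the check comfortably, which supports using a normal-based test.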


# Question 2

The purpose of this analysis is to determine whether the callback rate differs between white-sounding and black-sounding names. To answer this question, we will perform a hypothesis test using the following hypotheses:

**Null hypothesis**: Ho -> There is no difference in callback rate between black-sounding and white-sounding names.  
**Alternative hypothesis**: Ha -> There is a difference in callback rate between black-sounding and white-sounding names.
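
In symbols, writing $p_w$ and $p_b$ for the true callback rates of white-sounding and black-sounding names (notation introduced here, not in the original exercise), the two-sided test reads:

$$
H_0: p_w - p_b = 0 \qquad \text{versus} \qquad H_a: p_w - p_b \neq 0
$$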

# Question 3  

In this section, we will compute:
- the margin of error
- the confidence interval
- the p-value associated to our hypothesis testing  

The test statistic for our hypothesis test is the difference in mean callback rate between the two name types. Since we do not know the standard deviation of the callback rate in the full populations, we will use a two-sample t-test that does not assume equal variances (Welch's t-test). We will work with a 95% confidence level (note: any level could be used, but 95% is the most common choice).
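
For reference, the quantities computed in this section can be written compactly as follows. This is a sketch of the standard unequal-variance (Welch) formulas; the symbols $\bar{x}$, $s^2$, $n$, and $\nu$ are notation introduced here for the sample means, sample variances, sample sizes, and degrees of freedom, with subscripts $b$ and $w$ for the two name types.

$$
\mathrm{SE} = \sqrt{\frac{s_b^2}{n_b} + \frac{s_w^2}{n_w}}, \qquad
\nu = \frac{\left(\frac{s_b^2}{n_b} + \frac{s_w^2}{n_w}\right)^2}{\frac{(s_b^2/n_b)^2}{n_b - 1} + \frac{(s_w^2/n_w)^2}{n_w - 1}}
$$

$$
t = \frac{\bar{x}_w - \bar{x}_b}{\mathrm{SE}}, \qquad
\text{margin of error} = t^{*}_{0.975,\,\nu}\,\mathrm{SE}, \qquad
\mathrm{CI}_{95\%} = (\bar{x}_w - \bar{x}_b) \pm t^{*}_{0.975,\,\nu}\,\mathrm{SE}
$$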

In [6]:
# Extract the samples related to each name types
b_call = data[data.race=='b'].call
w_call = data[data.race=='w'].call

In [7]:
# Compute observed mean difference
diff_w_b_mean = w_call.mean()-b_call.mean()
print("Difference in mean callback rate:",diff_w_b_mean)

Difference in mean callback rate: 0.03203285485506058


In [8]:
# Compute standard error: t-test, two samples
SE = np.sqrt(b_call.var()/b_call.size+w_call.var()/w_call.size)
print("Standard Error:",SE)

Standard Error: 0.00778490682688


In [9]:
# Compute degree of freedom: t-test, two samples
v_b = b_call.var()
v_w = w_call.var()

n_b = b_call.size
n_w = w_call.size

DF = int((v_b/n_b+v_w/n_w)**2/(((v_b/n_b)**2/(n_b-1))+((v_w/n_w)**2/(n_w-1))))
print("Degree of freedom:", DF)

Degree of freedom: 4711


In [10]:
# Compute critical value: t-test, two samples
t_crit = stats.t.ppf(0.975,df=DF)
print("Critical value:",t_crit)

Critical value: 1.96046767176


In [11]:
# Compute margin of error:
margin = t_crit * SE
print("Margin of error:",margin)

Margin of error: 0.0152620581617


In [12]:
# Compute 95% confidence interval: t-test, two samples
CI_95 = [diff_w_b_mean-margin,diff_w_b_mean+margin]
print("95% Confidence interval:",CI_95)

95% Confidence interval: [0.016770796693321634, 0.047294913016799521]


In [13]:
# Compute the t-score associated to the t-test
t_score = diff_w_b_mean/SE
print("t-score:",t_score)

t-score: 4.11473837355


In [14]:
# Compute p-value associated with t-test:
p_val = stats.t.sf(abs(t_score),df=DF)*2  # two-sided: abs() keeps this valid if the t-score is negative
print("p-value:",p_val)

p-value: 3.94238662022e-05


In [15]:
# Verification:
stats.ttest_ind(w_call,b_call,equal_var=False)

Ttest_indResult(statistic=4.1147052908617514, pvalue=3.9429415136459352e-05)

**Analysis**: The p-value (about 3.9e-05) is far below the 0.05 threshold. Therefore, we reject the null hypothesis and conclude that there is a significant difference in callback rate between name types. We reach the same conclusion by inspecting the confidence interval: the value 0.0 is not contained in the 95% confidence interval.
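
Because Question 1 argues that the normal approximation applies, a two-proportion z-test is a natural cross-check on the t-test. The sketch below is an alternative to the notebook's method, not part of it. It uses the callback counts printed earlier and a pooled standard error under the null hypothesis; the function name `two_proportion_ztest` is introduced here for illustration.

```python
import numpy as np
from scipy import stats


def two_proportion_ztest(calls_1, n_1, calls_2, n_2):
    """Two-sided z-test for a difference of two proportions (pooled SE)."""
    p_1, p_2 = calls_1 / n_1, calls_2 / n_2
    # Under the null hypothesis both groups share one callback rate.
    p_pool = (calls_1 + calls_2) / (n_1 + n_2)
    se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se_pool
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


# White-sounding names first, black-sounding names second (counts from above).
z, p_value = two_proportion_ztest(235, 2435, 157, 2435)
print("z-score:", z)
print("p-value:", p_value)
```

With thousands of resumes per group the t distribution is nearly normal, so the z-score should land close to the t-score computed above.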

# Question 4, 5

Beyond the purely statistical meaning of our test, we can conclude that, when only the name type differs, resumes with black-sounding names receive fewer callbacks than resumes with white-sounding names. This result alone does not show that race/name is the most important factor in callback success, so it cannot serve as the final conclusion of our analysis. The data set contains several other features that have not been used here. If we were to pursue this analysis, we would examine whether race remains a key feature once those other features are taken into account.

For instance, we would group applicants by category (age, degrees, experience, industry) and repeat this analysis. We expect that we would still find statistically significant differences in callback rate, but the results would be more specific. They could then be used to identify the industries and employers that show a statistically significant difference in callback rate, and to inform laws that protect equality between applicants regardless of race.
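
The grouped analysis described above could be sketched as follows. This is a minimal illustration under stated assumptions: the helper `callback_gap_by_group` is hypothetical, and the small `toy` frame is made-up data used only to show the call signature, not the study data. On the real data one would pass the notebook's `data` frame and a column such as `sex`.

```python
import pandas as pd
from scipy import stats


def callback_gap_by_group(df, group_col):
    """Welch t-test of white vs. black callback rates within each level of group_col."""
    rows = []
    for level, subset in df.groupby(group_col):
        w = subset.loc[subset["race"] == "w", "call"]
        b = subset.loc[subset["race"] == "b", "call"]
        if len(w) < 2 or len(b) < 2:
            continue  # too few resumes in this subgroup to run a test
        t_stat, p_val = stats.ttest_ind(w, b, equal_var=False)
        rows.append({group_col: level, "n_w": len(w), "n_b": len(b),
                     "gap": w.mean() - b.mean(), "t": t_stat, "p_value": p_val})
    return pd.DataFrame(rows)


# Made-up toy data, NOT the study data: it only demonstrates usage.
toy = pd.DataFrame({
    "race": ["w", "w", "w", "b", "b", "b", "w", "w", "w", "b", "b", "b"],
    "call": [1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    "sex": ["f"] * 6 + ["m"] * 6,
})
result = callback_gap_by_group(toy, "sex")
print(result)
```

Running many subgroup tests increases the chance of a false positive, so the resulting p-values would need a multiple-comparison adjustment before drawing conclusions.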

# Experimentation with $\chi^2$-Test

Instead of separating the dataset into two groups based on name type, we will now perform a $\chi^2$ test directly on the counts. To do so, we first need to generate a contingency table.

In [16]:
# Contingency table for chi-square test
contingency_table = pd.crosstab(data.race,data.call)
contingency_table.columns = ['No_call','Call']
contingency_table.index = ['Black','White']
contingency_table

Unnamed: 0,No_call,Call
Black,2278,157
White,2200,235


We will now perform a $\chi^2$-test with the following hypotheses:
  
**Null hypothesis**: Ho -> Name type (race) and callback outcome are independent.  
**Alternative hypothesis**: Ha -> Name type (race) and callback outcome are statistically dependent.

In [17]:
chi2,p,dof,expected = stats.chi2_contingency(contingency_table)
print('p-value:',p)

p-value: 4.99757838996e-05


With such a low p-value, we reject the null hypothesis in favor of the alternative. The earlier conclusion remains valid: the callback rate is associated with the name type.
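
For a 2x2 table, `stats.chi2_contingency` applies Yates' continuity correction by default, which is one reason its p-value differs slightly from the t-test's. The sketch below, built from the counts in the contingency table above, compares the corrected and uncorrected statistics and checks that the uncorrected $\chi^2$ equals the square of a pooled two-proportion z-score. Variable names are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Rows: Black, White. Columns: No_call, Call (counts from the table above).
table = np.array([[2278, 157],
                  [2200, 235]])

# Default behavior: Yates' continuity correction is applied for 2x2 tables.
chi2_corr, p_corr, dof, expected = stats.chi2_contingency(table)
# Same test without the continuity correction.
chi2_raw, p_raw, _, _ = stats.chi2_contingency(table, correction=False)

# Pooled two-proportion z-score computed from the same counts.
n_b, n_w = table.sum(axis=1)
p_b, p_w = table[0, 1] / n_b, table[1, 1] / n_w
p_pool = table[:, 1].sum() / table.sum()
z = (p_w - p_b) / np.sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_w))

print("Expected counts under independence:\n", expected)
print("chi2 with correction:", chi2_corr, "p-value:", p_corr)
print("chi2 without correction:", chi2_raw, "p-value:", p_raw)
print("z squared:", z ** 2)
```

The uncorrected statistic and `z ** 2` agree up to floating-point error, which shows that the $\chi^2$ test and the two-sided two-proportion test are the same test in this 2x2 setting.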