# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
# pd.read_stata is the public entry point for reading Stata .dta files
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [5]:
data_final = data[['race','call']]
data_final.head()

Unnamed: 0,race,call
0,w,0.0
1,w,0.0
2,b,0.0
3,b,0.0
4,w,0.0


## Q1.A) What test is appropriate for this problem? 
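    A chi-square test of independence is suitable when both variables are categorical and we want to know whether they are related. Here we test whether the callback rate differs significantly between resumes with black-sounding names and resumes with white-sounding names. Because there are only two groups and two outcomes, a two-sample z-test for proportions is an equivalent choice.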

## Q1.B) Does CLT apply?
    Each resume is a trial with probability p of success (a callback) and 1-p of failure (no callback), so the number of callbacks in each group follows a binomial distribution. By the CLT, a binomial distribution can be approximated by a normal distribution if the following conditions are satisfied:

    - All the conditions of a binomial distribution hold: the trials are independent and share the same probability of success.
    - Each trial has exactly two outcomes, whose probabilities p and q = 1-p sum to 1.
    - The total number of trials, n, is large enough.
    - The probability of success (p) is not so close to 0 or 1 that the distribution is badly skewed for the given n.
    - The expected number of successes is large enough; a common rule of thumb is np ≥ 10.
    - The expected number of failures is large enough; a common rule of thumb is nq ≥ 10.
    

In [6]:
# Total number of trials
print("n =" ,len(data_final))

n = 4870


In [7]:
data_final_1 = data_final.groupby(['call', 'race']).size().unstack('race')
data_final_1

race,b,w
call,Unnamed: 1_level_1,Unnamed: 2_level_1
0.0,2278,2200
1.0,157,235


In [8]:
data_final_1['b'].sum()

2435

In [9]:
data_final_1['w'].sum()

2435

In [10]:
# Probability of success in this case is the probability of getting a call
# Select cells by label (row call == 1, column race) rather than by position
p_b = data_final_1.loc[1, 'b']/data_final_1['b'].sum()
p_w = data_final_1.loc[1, 'w']/data_final_1['w'].sum()
print("probability of success for black =",p_b)
print("probability of success for white =",p_w)

probability of success for black = 0.064476386037
probability of success for white = 0.0965092402464


Both probabilities are fairly close to 0, but each group contains 2435 resumes, so the sample is large enough to compensate.

For each resume the outcome is either a success (callback) or a failure (no callback), with probabilities p and 1-p respectively, so the probabilities of the two outcomes sum to 1.

In [11]:
# The expected number of successes, n*p, should be comfortably large (a common rule of thumb is n*p ≥ 10)

# For blacks
np_b = data_final_1['b'].sum()*p_b
print("Product of number of trials and probability of success for blacks:", np_b)

# For whites (use the white-name sample size, not the black-name one)
np_w = data_final_1['w'].sum()*p_w
print("Product of number of trials and probability of success for whites:", np_w)


Product of number of trials and probability of success for blacks: 157.0
Product of number of trials and probability of success for whites: 235.0
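
The check above covers only the expected number of successes. The sketch below checks successes and failures for both groups, using the counts from the contingency table. The helper name `normal_approx_ok` and the threshold of 10 are assumptions for illustration; textbooks variously use 5 or 10.

```python
# Counts taken from the contingency table above.
callbacks = {"b": 157, "w": 235}
resumes = {"b": 2435, "w": 2435}


def normal_approx_ok(successes, n, threshold=10):
    """Hypothetical helper: True when both n*p and n*(1-p) reach the threshold."""
    failures = n - successes  # n*(1-p) is simply the observed number of failures
    return successes >= threshold and failures >= threshold


checks = {race: normal_approx_ok(callbacks[race], resumes[race]) for race in callbacks}

for race, ok in checks.items():
    n_fail = resumes[race] - callbacks[race]
    print(race, "n*p =", callbacks[race], "n*(1-p) =", n_fail, "ok:", ok)
```

Both groups clear the threshold by a wide margin, so the normal approximation is reasonable here.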


## Q2) What are the null and alternate hypotheses?

#### Hypothesis test for Chi2
    Ho : Callbacks are independent of race (the callback rate is the same for both groups)
    H1 : Callbacks are not independent of race (the callback rates differ)
    Alpha = 0.05


## Q3) Write a story describing the statistical significance in the context of the original problem

In [12]:
from scipy.stats import chi2_contingency
data_final_1 = data_final.groupby(['call', 'race']).size().unstack('race')
data_final_1

race,b,w
call,Unnamed: 1_level_1,Unnamed: 2_level_1
0.0,2278,2200
1.0,157,235


In [13]:
# chi2_contingency runs a chi-square test of independence on the contingency table.
# For a 2x2 table it applies Yates' continuity correction by default.
# Run the test once and unpack the results instead of calling it for every print.
chi2_calculated, p_value, dof, expected = chi2_contingency(data_final_1)

print("Chi2 test statistic('chi2_calculated'): ", chi2_calculated)
print("p-value: ", p_value)
print("Degrees of freedom: ", dof)
#print("expected data_final: \n", expected)

Chi2 test statistic('chi2_calculated'):  16.4490285842
p-value:  4.99757838996e-05
Degrees of freedom:  1


In [14]:
from scipy.stats import chi2
chi2_critical = chi2.isf(q=0.05, df=1)
chi2_critical

3.8414588206941245

#### Conclusion
    Since chi2_calculated (16.45) is far greater than chi2_critical (3.84, from the chi-square table with df = 1), the p-value is very small (about 0.00005). Since p-value < 0.05, we can reject the null hypothesis Ho.
    This implies that there is indeed a statistically significant relationship between callbacks and race.
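
To make the reported statistic less of a black box, the sketch below recomputes it by hand from the observed counts using only the standard library. It assumes Yates' continuity correction, which `chi2_contingency` applies by default to 2x2 tables, and uses the identity that for 1 degree of freedom the chi-square tail probability equals `erfc(sqrt(x / 2))`. The variable names are illustrative.

```python
from math import erfc, sqrt

# Observed counts from the contingency table above:
# rows = no callback / callback, columns = b / w.
observed = [[2278, 2200],
            [157, 235]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2_stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence of call and race.
        expected = row_totals[i] * col_totals[j] / n
        # Yates' continuity correction: shrink |O - E| by 0.5 before squaring.
        chi2_stat += (abs(obs - expected) - 0.5) ** 2 / expected

# For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2_stat / 2))

print("chi2:", chi2_stat)
print("p-value:", p_value)
```

The expected callback count is 196 in each group, against 157 and 235 observed, and that gap is what drives the large statistic.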

## Q4) Compute margin of error, confidence interval, and p-value.

Standard error for a two way table = sqrt(p1*q1/n1 + p2*q2/n2)

Difference in proportions of success for blacks and whites = p1 - p2

    where
     p1 : proportion of success (callbacks) for blacks 
     q1 : proportion of failure (no callbacks) for blacks 
     n1 : Number of resumes related to a black name
 
     p2 : proportion of success (callbacks) for whites 
     q2 : proportion of failure (no callbacks) for whites 
     n2 : Number of resumes related to a white name
 
     Alpha = 0.05
     Confidence level = 95%
     
 
 Margin of error = Z*Standard error
     Z at alpha/2 ~ 1.96
 
 Confidence interval = [(p1 - p2) - margin of error, (p1 - p2) + margin of error]
        

In [15]:
data_final_1 = data_final.groupby(['call', 'race']).size().unstack('race')
data_final_1

race,b,w
call,Unnamed: 1_level_1,Unnamed: 2_level_1
0.0,2278,2200
1.0,157,235


In [16]:
# Standard error
from math import sqrt
# Blacks: sample size, callback proportion, no-callback proportion
n1 = data_final_1['b'].sum()
p1 = data_final_1.loc[1, 'b']/n1
q1 = data_final_1.loc[0, 'b']/n1

# Whites: sample size, callback proportion, no-callback proportion
n2 = data_final_1['w'].sum()
p2 = data_final_1.loc[1, 'w']/n2
q2 = data_final_1.loc[0, 'w']/n2

standard_error = sqrt(p1*q1/n1 + p2*q2/n2)
print("Standard error:", standard_error)

Standard error: 0.007783370586676755


In [17]:
# Margin of error
Z = 1.96
margin_of_error = Z*(standard_error)
print("Margin of error:", margin_of_error, 'or', margin_of_error*100,'%')

Margin of error: 0.01525540634988644 or 1.525540634988644 %


In [18]:
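# Confidence Interval
# Take the difference as white minus black (p2 - p1) so that both limits are positive;
# this is the (p1 - p2) interval with the sign flipped. Taking abs() of each limit
# instead would swap the lower and upper bounds.
lower_limit = (p2 - p1) - margin_of_error
upper_limit = (p2 - p1) + margin_of_error

print("Lower Limit of CI:", lower_limit,'or', lower_limit*100,'%')
print("Upper Limit of CI:", upper_limit,'or', upper_limit*100,'%')

Lower Limit of CI: 0.0167774478596 or 1.67774478596 %
Upper Limit of CI: 0.0472882605593 or 4.72882605593 %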


#### Conclusion
With 95% confidence, we can conclude that resumes with black-sounding names receive fewer callbacks than resumes with white-sounding names, with a gap of between 1.68 and 4.73 percentage points.
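
The exercise also asks for a p-value. The sketch below computes one for the difference in proportions, assuming a two-sided two-proportion z-test with a pooled standard error under the null hypothesis. This is a complementary technique rather than the chi-square computation used earlier, and the variable names are illustrative.

```python
from math import erfc, sqrt

# Counts from the contingency table above.
calls_b, n_b = 157, 2435
calls_w, n_w = 235, 2435

p_b = calls_b / n_b
p_w = calls_w / n_w

# Under Ho the two callback rates are equal, so pool them to estimate the common rate.
p_pooled = (calls_b + calls_w) / (n_b + n_w)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n_b + 1 / n_w))

# Test statistic for the difference (white minus black).
z = (p_w - p_b) / se_pooled

# Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
p_value = erfc(abs(z) / sqrt(2))

print("z:", z)
print("p-value:", p_value)
```

The square of this z statistic equals the chi-square statistic without the continuity correction (about 16.88), which is why it is slightly larger than the corrected value of 16.45 reported earlier. Both tests reject Ho at alpha = 0.05.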