# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

# remove warnings
import warnings
warnings.filterwarnings("ignore")

In [2]:
data = pd.io.stata.read_stata('C:/Users/jwhoj/Desktop/EDA_racial_discrimination/data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [4]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [5]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [6]:
data[['race']].head()

Unnamed: 0,race
0,w
1,w
2,b
3,b
4,w


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

# 1.) What test is appropriate for this problem? Does CLT apply?

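A two-sample test for a difference in proportions is appropriate, because the outcome ('call') is binary
and we are comparing callback rates between two groups. It can be carried out by resampling (bootstrap)
or with a frequentist two-sample test; both are used below.
The central limit theorem does apply because 
a.) Each group is large, with well over 10 callbacks and 10 non-callbacks, so the sample proportions are approximately normal
b.) Names were assigned to the resumes at random, so the observations can be treated as independent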
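For a binary outcome, a common rule of thumb (stricter than n > 30) is that each group should contain at least 10 successes and 10 failures. A minimal sketch of that check, using the callback counts printed above and an assumed group size of 2435 resumes each; the helper name `clt_ok` and the threshold are illustrative choices, not part of the original exercise:

```python
# Callback counts come from the notebook output; 2435 resumes per group is assumed.
groups = {"b": (157, 2435), "w": (235, 2435)}


def clt_ok(successes, n, threshold=10):
    """Rule-of-thumb check: at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


for name, (calls, n) in groups.items():
    print(name, "callbacks:", calls, "no callback:", n - calls,
          "normal approximation ok:", clt_ok(calls, n))
```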

# 2.) What are the null and alternate hypotheses?
H0: The null hypothesis is that resumes with black-sounding and white-sounding
names have the same callback rate. 
Ha: The alternative hypothesis is that the callback rates for black-sounding
and white-sounding names differ.
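
Stated in symbols, and assuming $p_w$ and $p_b$ (notation introduced here for illustration) denote the true callback probabilities for white-sounding and black-sounding names, the two-sided hypotheses can be written as:

$$
H_0: p_w - p_b = 0 \qquad \text{versus} \qquad H_a: p_w - p_b \neq 0
$$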

In [7]:
w = data[data.race=='w']
b = data[data.race=='b']

# 3.) Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.

In [8]:
# Number of black and white sounding names
b_names = len(b)
w_names = len(w)

# Number of callbacks
b_call = sum(data[data.race=='b'].call)
w_call = sum(data[data.race=='w'].call)

# Call rate
b_rate = b_call/b_names
w_rate = w_call/w_names

print('number of black names', b_names)
print('number of callback for blacks', b_call)

print('number of white names', w_names)
print('number of callback for whites', w_call)

print('call rate for blacks', b_rate)
print('call rate for whites', w_rate)

number of black names 2435
number of callback for blacks 157.0
number of white names 2435
number of callback for whites 235.0
call rate for blacks 0.06447638603696099
call rate for whites 0.09650924024640657


In [9]:
# Create an array
call_array = np.array(data.call)

In [10]:
# Difference between call rates
diff_cr = w_rate - b_rate
diff_cr

0.032032854209445585

# Bootstrap

In [11]:
# Simulate the null hypothesis that the name has no effect on callbacks.
# Each replicate draws b_names values, with replacement, from the pooled callbacks.
# Note: this is neither a within-group bootstrap nor a strict shuffle of the race labels.
bs = np.empty(10000)
for i in range(10000):
    bs_sample_b = np.random.choice(call_array, size=b_names)

    # bs_b is the simulated call rate for black-sounding names;
    # bs_w attributes the remaining pooled callbacks to white-sounding names
    bs_b = np.sum(bs_sample_b)/b_names
    bs_w = (np.sum(call_array)-np.sum(bs_sample_b))/w_names
    bs[i] = bs_w - bs_b

In [12]:
# One-sided p-value: fraction of simulated differences at least as large as the observed difference
p = np.sum(bs >= diff_cr) / 10000
print('p-value is', p)

p-value is 0.0014


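The p-value is 0.0014 (it varies slightly from run to run because the resampling is random), which is below
the alpha value of 0.05, so we reject the null hypothesis. Note that this p-value is one-sided: it counts only
simulated differences at least as large as the observed one. The difference in callback rates between resumes
with black-sounding and white-sounding names is statistically significant. 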
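
The resampling cell above draws with replacement from the pooled callbacks. A strict permutation test, which shuffles the race labels instead, could be sketched as follows. The function name `permutation_p_value`, the fixed seed, the reduced number of shuffles, and the 0/1 arrays rebuilt from the printed counts are assumptions for illustration and are not part of the original notebook:

```python
import numpy as np


def permutation_p_value(calls_w, calls_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in callback rates."""
    rng = np.random.default_rng(seed)
    calls_w = np.asarray(calls_w, dtype=float)
    calls_b = np.asarray(calls_b, dtype=float)
    observed = calls_w.mean() - calls_b.mean()
    pooled = np.concatenate([calls_w, calls_b])
    n_w = len(calls_w)
    diffs = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the pooled outcomes is equivalent to reassigning race labels.
        shuffled = rng.permutation(pooled)
        diffs[i] = shuffled[:n_w].mean() - shuffled[n_w:].mean()
    # Two-sided: count shuffles at least as extreme in either direction.
    p_value = np.mean(np.abs(diffs) >= abs(observed))
    return observed, p_value


# Rebuild 0/1 callback arrays from the counts printed earlier (235 and 157 of 2435).
calls_w = np.r_[np.ones(235), np.zeros(2435 - 235)]
calls_b = np.r_[np.ones(157), np.zeros(2435 - 157)]
observed, p_value = permutation_p_value(calls_w, calls_b, n_perm=2000)
print("observed difference:", round(observed, 4))
print("permutation p-value:", p_value)
```

In the notebook itself, `w.call` and `b.call` could be passed directly instead of the rebuilt arrays.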

In [13]:
# Frequentist approach 

# Difference in sample means
x = np.mean(w.call) - np.mean(b.call)
print('Difference in sample means:', round(x, 4))

# Lengths
w_names = len(w)
print('Len white:', w_names)
b_names = len(b)
print('Len black:', b_names)

# Degrees of freedom
degrees_of_freedom = w_names + b_names - 2
print('Degrees of freedom:', degrees_of_freedom)

# Standard deviations
w_std = w.call.std()
b_std = b.call.std()
std = np.sqrt(((w_names - 1)*w_std**2 + (b_names - 1)*b_std**2) / degrees_of_freedom)
print('Standard deviation:', round(std, 3))

# Significance level (alpha) corresponding to a 95% confidence level
alpha = 0.05
print('Alpha for 95% confidence level:', alpha)


Difference in sample means: 0.032
Len white: 2435
Len black: 2435
Degrees of freedom: 4868
Standard deviation: 0.272
Alpha for 95% confidence level: 0.05


In [28]:
# Standard Error 
standard_error = (np.sqrt((1/w_names)+(1/b_names))*std) 
print('Standard error is:', standard_error)

Standard error is: 0.007784906919813795


Because we are comparing the means of two groups, we use the standard error of the difference in means, which measures how precisely that difference is estimated.

# T-interval and t-test

For further exploration, I would like to implement the two-sample t-test, because we know only the sample
standard deviations, not the population standard deviation. Since we have two sample groups, we want to compare
the mean callback rate of each group (white vs. black).
The general purpose of a two-sample t-test is to test whether two population means are equal. With 4868 degrees
of freedom, the t distribution is practically identical to the standard normal distribution.

In [18]:
import scipy.stats as st

# Critical t value for a two-sided 95% confidence interval (close to 1.96 with this many degrees of freedom)
t = st.t.ppf(0.975, degrees_of_freedom)

# Calculate margin of error
margin_of_error = t * std * np.sqrt(1/w_names + 1/b_names)

# Difference in sample means
x = np.mean(w.call) - np.mean(b.call)

# Calculate interval
low = x - margin_of_error
high = x + margin_of_error

print("T confidence interval:", (low, high))

T confidence interval: (0.016770923005034827, 0.04729478670508633)


In [19]:
# Calculate the T-test for the means of TWO INDEPENDENT samples
# https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_ind.html#scipy.stats.ttest_ind 
t_test = st.ttest_ind(w.call, b.call)
print('t-test statistic', t_test[0])
print('p-value:', t_test[1])

# Conclusion of test
# alpha = 0.05
if t_test[1] < alpha:
    print('Reject null hypothesis')
else:
    print('Fail to reject null hypothesis')

t-test statistic 4.114705290861751
p-value: 3.940802103128886e-05
Reject null hypothesis


The two-sample t-test produces a p-value of about 3.94e-05, which is well below the alpha value of 0.05.
This leads us to reject the null hypothesis and conclude that there is a statistically significant difference
in callback rates between the two groups. 
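
Because 'call' is binary, the same comparison is often done as a two-proportion z-test with a pooled callback rate. The following is a minimal sketch of that alternative, not the method used in the notebook; the function name is hypothetical, and `statistics.NormalDist` from the standard library stands in for `scipy.stats.norm` to keep the example dependency-free:

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equal proportions, using the pooled rate under H0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Counts printed earlier in the notebook: 235 and 157 callbacks out of 2435 each.
z, p_value = two_proportion_z_test(235, 2435, 157, 2435)
print("z statistic:", z)
print("p-value:", p_value)
```

With samples this large, the z statistic and p-value should land very close to the t-test results above.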

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

# 4.) Write a story describing the statistical significance in the context of the original problem.

Because the names were randomly assigned to otherwise identical resumes, the other resume characteristics
are balanced across the two groups, so the difference in callback rates can be attributed to the name.
Resumes with white-sounding names received callbacks about 9.7% of the time, compared with about 6.4%
for black-sounding names, a gap of roughly 3.2 percentage points. The p-values fall below the alpha level
and the 95% confidence interval (about 0.017 to 0.047) excludes zero, so the gap is unlikely to be due to
chance alone. Note that the data measure callbacks, not hiring decisions. 


# 5.) Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

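The analysis does not mean that race/name is the most important factor in callback success. 
It only shows that the name has a statistically significant effect; it says nothing about the size 
of that effect relative to other resume characteristics. To amend the analysis, we can compare the 
other factors in the dataset and see how strongly each one is associated with the callback rate. 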

In [20]:
from IPython.display import display
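# numeric_only=True keeps the mean restricted to numeric columns (required in recent pandas versions)
callback = data.groupby('call').mean(numeric_only=True)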
pd.options.display.max_columns = None
display(callback)

call,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,occupbroad,workinschool,email,computerskills,specialskills,h,l,adid,fracblack,fracwhite,lmedhhinc,fracdropout,fraccolp,linc,col,eoe,parent_sales,parent_emp,branch_sales,branch_emp,fed,fracblack_empzip,fracwhite_empzip,lmedhhinc_empzip,fracdropout_empzip,fraccolp_empzip,linc_empzip,manager,supervisor,secretary,offsupport,salesrep,retailsales,req,expreq,comreq,educreq,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind
0.0,3.619696,3.660563,7.751228,0.048013,0.410451,0.098928,0.437472,213.860875,3.460697,0.563644,0.475435,0.823805,0.31331,0.498437,0.501563,644.512939,0.313002,0.539347,10.142312,0.18704,0.211467,9.542634,0.720634,0.290755,580.99585,2191.904541,203.60231,768.117249,0.113438,0.078705,0.845355,10.661489,0.101429,0.332549,10.030473,0.15431,0.07816,0.332291,0.114113,0.153417,0.167709,0.792318,0.438142,0.124609,0.10987,0.440822,0.075257,0.085306,0.028584,0.085753,0.216615,0.26686,0.15096,0.165922
1.0,3.604592,3.670918,8.890306,0.107143,0.423469,0.076531,0.568878,235.936224,3.719388,0.512755,0.522959,0.783163,0.505102,0.545918,0.454082,734.767883,0.285926,0.581963,10.203009,0.169992,0.24076,9.644422,0.706633,0.295918,677.433594,3557.262451,105.912766,597.571411,0.130802,0.084698,0.820905,10.572091,0.105458,0.352861,10.046303,0.127551,0.066327,0.339286,0.170918,0.125,0.170918,0.729592,0.403061,0.127551,0.071429,0.395408,0.043367,0.056122,0.05102,0.076531,0.183673,0.278061,0.19898,0.155612


Comparing the resumes that received a callback (call = 1) with those that did not (call = 0), 
there are visible differences in several column means, such as 'yearsexp', 'honors', 'military', 
and 'empholes'. I would carry out further analysis to see whether race plays a part in these differences. 

In [21]:
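# numeric_only=True keeps the mean restricted to numeric columns (required in recent pandas versions)
race_df = data.groupby('race').mean(numeric_only=True)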
display(race_df)

race,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,occupbroad,workinschool,email,computerskills,specialskills,h,l,call,adid,fracblack,fracwhite,lmedhhinc,fracdropout,fraccolp,linc,col,eoe,parent_sales,parent_emp,branch_sales,branch_emp,fed,fracblack_empzip,fracwhite_empzip,lmedhhinc_empzip,fracdropout_empzip,fraccolp_empzip,linc_empzip,manager,supervisor,secretary,offsupport,salesrep,retailsales,req,expreq,comreq,educreq,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind
b,3.616016,3.658316,7.829569,0.051335,0.414374,0.101848,0.445996,216.744969,3.487885,0.560986,0.479671,0.832444,0.32731,0.502259,0.497741,0.064476,651.777832,0.313214,0.540329,10.143023,0.185319,0.21264,9.547022,0.722793,0.29117,587.686462,2287.05127,196.050659,755.416992,0.114765,0.079096,0.843762,10.65568,0.101692,0.333873,10.031505,0.151951,0.077207,0.33306,0.118686,0.151129,0.167967,0.787269,0.435318,0.124846,0.106776,0.437372,0.07269,0.082957,0.03039,0.08501,0.213963,0.267762,0.154825,0.165092
w,3.620945,3.664476,7.856263,0.054209,0.408624,0.092402,0.450103,214.530595,3.475154,0.558111,0.47885,0.808624,0.330185,0.502259,0.497741,0.096509,651.777832,0.308439,0.545211,10.151353,0.186026,0.214998,9.554592,0.716222,0.29117,587.686462,2287.05127,196.050659,755.416992,0.114765,0.079096,0.843762,10.65568,0.101692,0.333873,10.031505,0.152361,0.077207,0.332649,0.118686,0.151129,0.167967,0.787269,0.435318,0.124846,0.106776,0.436961,0.07269,0.082957,0.03039,0.08501,0.213963,0.267762,0.154825,0.165092


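Just by glancing at the data, the resume characteristics are nearly identical for black-sounding and 
white-sounding names, as expected when names are assigned at random. The clear exception is 'call' 
(0.064 vs. 0.097), which suggests that the name itself, rather than the other resume characteristics, 
drives the difference in callback rates. 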
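
One way to follow up on the other factors is to compare callback rates by race within each level of another column. The sketch below is only an illustration: the helper `callback_rate_by` is hypothetical, and the small `toy` frame is made-up data that merely mimics the 'race', 'honors', and 'call' columns, so its numbers say nothing about the study:

```python
import pandas as pd


def callback_rate_by(df, factor):
    """Callback rate for each (factor level, race) pair, with races as columns."""
    return df.groupby([factor, "race"])["call"].mean().unstack("race")


# Toy rows for illustration only; these are NOT taken from the study data.
toy = pd.DataFrame({
    "race":   ["b", "b", "b", "b", "w", "w", "w", "w"],
    "honors": [0, 0, 1, 1, 0, 0, 1, 1],
    "call":   [0, 0, 0, 1, 0, 1, 1, 1],
})
rates = callback_rate_by(toy, "honors")
print(rates)
```

In the notebook, calling `callback_rate_by(data, 'honors')` would show whether the gap between the 'b' and 'w' columns persists within each level of that factor.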