# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
print(sum(data[data.race=='b'].call))
# number of callbacks for white-sounding names
print(sum(data[data.race=='w'].call))

157.0
235.0


In [4]:
data.head()
# The resume is coded into columns such as years of experience ('yearsexp'), 'volunteer', and 'military'.
# We are not trying to determine which of these factors influence callback rates; we only test
# whether the name on the resume sounds black or white.

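|   | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|----|----|-----------|--------|----------|--------|-----------|----------|----------|---------------|-----|---------|--------|-------|----------|----------|-------|------------|------------|---------|-----------|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |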


# Question 1
What test is appropriate for this problem? Does CLT apply?

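The appropriate test for this situation is a two-proportion z-test, because the outcome ('call') is binary and we are comparing the callback proportions of two independent groups.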

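The CLT also applies, for two reasons. First, the samples are large: each group contains 2,435 resumes, far more than 30, and the success-failure condition for proportions holds, since each group has well over 10 callbacks (157 and 235) and well over 10 non-callbacks. Second, the observations are independent: names were randomly assigned to resumes, and, just as different people sending in different resumes in real life is an independent process, the outcome of one resume does not influence the outcome of the next. The employer has a set of criteria, and each resume either meets them or not, without being influenced by the resume before it or influencing the resume after it.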
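The large-sample condition can be checked directly. This is a minimal sketch using the counts printed earlier in this notebook (2,435 resumes per group; 157 callbacks for black-sounding names, 235 for white-sounding names); the helper name `success_failure_ok` and the conventional threshold of 10 are illustrative choices, not part of the original analysis.

```python
# Check the large-sample (success-failure) condition for each group.
# Counts are the ones printed earlier in this notebook.
group_sizes = {"b": 2435, "w": 2435}
callbacks = {"b": 157, "w": 235}

def success_failure_ok(n, successes, threshold=10):
    """Return True if both successes and failures are at least `threshold`."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

for race in ("b", "w"):
    n, x = group_sizes[race], callbacks[race]
    print(race, "successes:", x, "failures:", n - x,
          "condition met:", success_failure_ok(n, x))
```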

# Question 2
What are the null and alternate hypotheses?

The null hypothesis: race has no effect on the callback rate for resumes. In other words, for this case study specifically:

H0: The callback rate for resumes with black-sounding names is equal to the callback rate for resumes with white-sounding names.

The alternative hypothesis: race does have an effect on the callback rate for resumes. In other words:

H1: The callback rate for resumes with black-sounding names is NOT equal to the callback rate for resumes with white-sounding names.
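In symbols, writing $p_b$ and $p_w$ for the (unknown) population callback rates of resumes with black- and white-sounding names (notation introduced here for convenience), the two-sided hypotheses are:

$$
H_0:\ p_b = p_w \qquad \text{versus} \qquad H_1:\ p_b \neq p_w
$$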

# Question 3
Compute margin of error, confidence interval, and p-value.
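For reference, these are the standard large-sample formulas behind the calculations below. Here $x_w, x_b$ are the callback counts, $n_w = n_b = 2435$ the group sizes, $\hat p_w, \hat p_b$ the sample callback rates, $\hat p$ the pooled proportion, and $\Phi$ the standard normal CDF (notation introduced here). The z statistic uses the pooled standard error implied by $H_0$, while the textbook (Wald) confidence interval for the difference uses the unpooled standard error:

$$
\hat p = \frac{x_w + x_b}{n_w + n_b}, \qquad
z = \frac{\hat p_w - \hat p_b}{\sqrt{\hat p\,(1-\hat p)\left(\frac{1}{n_w} + \frac{1}{n_b}\right)}}
$$

$$
\text{MoE} = z_{0.975}\sqrt{\frac{\hat p_w(1-\hat p_w)}{n_w} + \frac{\hat p_b(1-\hat p_b)}{n_b}}, \qquad
\text{CI} = (\hat p_w - \hat p_b) \pm \text{MoE}, \qquad
p = 2\,\Phi(-|z|)
$$

with $z_{0.975} \approx 1.96$.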

In [5]:
#PERFORMING QUANTITATIVE EDA 
print(data['id'])
print(data.shape)
print(data.columns)

0       b
1       b
2       b
3       b
4       b
5       b
6       b
7       b
8       b
9       b
10      b
11      b
12      b
13      b
14      b
15      b
16      b
17      b
18      b
19      b
20      b
21      b
22      b
23      b
24      b
25      b
26      b
27      b
28      b
29      b
       ..
4840    a
4841    a
4842    b
4843    b
4844    b
4845    b
4846    a
4847    a
4848    a
4849    a
4850    b
4851    b
4852    b
4853    b
4854    a
4855    a
4856    a
4857    a
4858    a
4859    a
4860    a
4861    a
4862    b
4863    b
4864    b
4865    b
4866    a
4867    a
4868    a
4869    a
Name: id, Length: 4870, dtype: object
(4870, 65)
Index(['id', 'ad', 'education', 'ofjobs', 'yearsexp', 'honors', 'volunteer',
       'military', 'empholes', 'occupspecific', 'occupbroad', 'workinschool',
       'email', 'computerskills', 'specialskills', 'firstname', 'sex', 'race',
       'h', 'l', 'call', 'city', 'kind', 'adid', 'fracblack', 'fracwhite',
       'lmedhhinc', 'fracdropout

In [6]:
#we are only interested in the 'race' and 'call' columns for this particular experiment
print(data['race'])

0       w
1       w
2       b
3       b
4       w
5       w
6       w
7       b
8       b
9       b
10      b
11      w
12      b
13      w
14      b
15      w
16      w
17      b
18      w
19      b
20      b
21      w
22      w
23      w
24      w
25      b
26      b
27      w
28      b
29      b
       ..
4840    b
4841    b
4842    b
4843    w
4844    b
4845    w
4846    w
4847    w
4848    b
4849    b
4850    b
4851    w
4852    w
4853    b
4854    w
4855    w
4856    b
4857    b
4858    b
4859    b
4860    w
4861    w
4862    w
4863    w
4864    b
4865    b
4866    b
4867    w
4868    b
4869    w
Name: race, Length: 4870, dtype: object


In [22]:
resume_white = data[data.race == 'w']
total_white = resume_white.shape[0]
resume_black = data[data.race == 'b']
total_black = resume_black.shape[0]

print(total_white)
print(total_black)

white_call = sum(data[data.race == 'w'].call)
print(white_call)
black_call = sum(data[data.race == 'b'].call)
print(black_call)

white_proportion = white_call/total_white
black_proportion = black_call/total_black
pool_proportion = (white_call + black_call)/(total_white + total_black)
print('White proportion of callbacks: ' + str(white_proportion) + ' Black proportion of callbacks: ' + str(black_proportion))
print('The pooled proportion is: ' + str(pool_proportion))

2435
2435
235.0
157.0
White proportion of callbacks: 0.0965092402464 Black proportion of callbacks: 0.064476386037
The pooled proportion is: 0.0804928131417
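The same counts, rates, and pooled proportion can be computed in one pass with a pandas `groupby`. Because the `.dta` file is not bundled here, this sketch rebuilds a stand-in DataFrame (`demo`, a hypothetical name) containing only the 'race' and 'call' columns from the counts printed above; in the notebook you would call the same `groupby` on `data` directly.

```python
import numpy as np
import pandas as pd

# Stand-in for `data` (assumption): only 'race' and 'call', rebuilt from the
# counts printed above (2435 resumes per group, 157 callbacks for 'b', 235 for 'w').
demo = pd.DataFrame({
    "race": ["b"] * 2435 + ["w"] * 2435,
    "call": np.concatenate([
        np.r_[np.ones(157), np.zeros(2435 - 157)],
        np.r_[np.ones(235), np.zeros(2435 - 235)],
    ]),
})

# Group size, number of callbacks, and callback rate per race in one call.
summary = demo.groupby("race")["call"].agg(n="count", callbacks="sum", rate="mean")
print(summary)

# Pooled proportion across both groups.
pooled = demo["call"].mean()
print("pooled proportion:", pooled)
```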


In [9]:
white_mean = np.mean(resume_white.call)
white_std = np.std(resume_white.call)
print('Mean of white callback rate: ' + str(white_mean) + ", STD of white callback: " + str(white_std))
black_mean = np.mean(resume_black.call)
black_std = np.std(resume_black.call)
print('Mean of black callback rate: ' + str(black_mean) + ", STD of black callback: " + str(black_std))

Mean of white callback rate: 0.09650924056768417, STD of white callback: 0.29528486728668213
Mean of black callback rate: 0.0644763857126236, STD of black callback: 0.24559901654720306
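Because 'call' is a 0/1 variable, its mean is exactly the callback proportion, and `np.std` (which uses the population formula, `ddof=0`, by default) equals the Bernoulli standard deviation $\sqrt{p(1-p)}$. This is why the two-sample and two-proportion calculations that follow give nearly the same answers. A minimal check on a reconstructed 0/1 array for white-sounding names (235 callbacks out of 2,435):

```python
import numpy as np

# A 0/1 outcome array like 'call': 235 callbacks out of 2435 resumes.
calls = np.r_[np.ones(235), np.zeros(2435 - 235)]

p = calls.mean()                      # mean of a 0/1 column == callback proportion
std_numpy = np.std(calls)             # np.std defaults to ddof=0 (population formula)
std_bernoulli = np.sqrt(p * (1 - p))  # Bernoulli standard deviation

print(p, std_numpy, std_bernoulli)
```

The values printed in the cell above differ from this check only in the late decimal places, most likely because the Stata file stores 'call' as a lower-precision float.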


In [10]:
# Using alpha = 0.05 for a two-tailed, two-sample test, compute the margin of error,
# starting with the pooled standard deviation of the two samples
pooled_std = np.sqrt(((total_white - 1)*white_std**2 + (total_black - 1)*black_std**2) / (total_white + total_black - 2))
MoE = 1.96 * pooled_std * np.sqrt(1/total_white + 1/total_black)
print('The margin of error is: ' + str(MoE))

The margin of error is: 0.0152552843854
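The textbook Wald interval for a difference of two proportions uses the unpooled standard error $\sqrt{\hat p_w(1-\hat p_w)/n_w + \hat p_b(1-\hat p_b)/n_b}$. With equal group sizes and `ddof=0` standard deviations, that is algebraically the same as the pooled-standard-deviation calculation above, so the margin of error matches. This sketch (the function name `diff_proportions_ci` is hypothetical) derives the critical value from `scipy.stats.norm.ppf` instead of hard-coding 1.96:

```python
import numpy as np
from scipy.stats import norm

def diff_proportions_ci(x1, n1, x2, n2, conf=0.95):
    """Wald confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = norm.ppf(1 - (1 - conf) / 2)  # about 1.96 for conf=0.95
    moe = z_crit * se
    diff = p1 - p2
    return diff, moe, (diff - moe, diff + moe)

# White-sounding names: 235 of 2435 called back; black-sounding names: 157 of 2435.
diff, moe, ci = diff_proportions_ci(235, 2435, 157, 2435)
print("difference:", diff, "margin of error:", moe, "CI:", ci)
```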


In [11]:
#the confidence interval is calculated as the difference of the means + or - the margin of error:
confidence_lb = (white_mean - black_mean) - MoE
confidence_ub = (white_mean - black_mean) + MoE
print('The confidence interval is: (' + str(confidence_lb) + ',' + str(confidence_ub) + ')')

The confidence interval is: (0.0167775704696,0.0472881392405)


In [31]:
# The z-value (test statistic) for the two-sample z-test; the p-value is computed further below
z_val = (white_mean - black_mean)/np.sqrt((white_std**2)/total_white + (black_std**2)/total_black)
print('The z-value is: ' + str(z_val))

The z-value is: 4.11558342208


In [23]:
#margin of error for 2 proportions calculation:
proportion_MoE = 1.96* np.sqrt(pool_proportion*(1-pool_proportion)*(1/total_white + 1/total_black))
print('The margin of error is : ' + str(proportion_MoE))
# Note: this is very close, but not identical, to the two-sample margin of error above,
# because it uses the pooled proportion instead of each group's own variance

The margin of error is : 0.0152819123109


In [24]:
#confidence interval calculation for 2 proportions test:
prop_conf_lb = (white_proportion - black_proportion) - proportion_MoE
prop_conf_ub = (white_proportion - black_proportion) + proportion_MoE
print('The confidence interval is: (' + str(prop_conf_lb) + ',' + str(prop_conf_ub) + ')')
#note that this too is close but not identical to the 2 sample z test calculation. 

The confidence interval is: (0.0167509418986,0.0473147665203)


In [25]:
#finally, to check the z-value using the 2 proportion z test
prop_z_val = (white_proportion - black_proportion)/ np.sqrt(pool_proportion*(1-pool_proportion)*(1/total_white + 1/total_black))
print('The z-value of the 2 proportion test is: ' + str(prop_z_val))

The z-value of the 2 proportion test is: 4.10841215243


In [33]:
# Two-tailed p-value: twice the upper-tail area beyond |z| under the standard normal
from scipy.stats import norm
p_value_2sample = norm.sf(abs(z_val))*2
p_value_2props = norm.sf(abs(prop_z_val))*2
print('2 sample z-test p-val: ' + str(p_value_2sample))
print('2 proportion z-test p-val: ' + str(p_value_2props))

2 sample z-test p-val: 3.86201280197e-05
2 proportion z-test p-val: 3.98388683759e-05
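As an independent cross-check (not part of the original analysis), a chi-square test of independence on the 2x2 table of counts, run without Yates' continuity correction, is mathematically equivalent to the pooled two-proportion z-test: its statistic equals $z^2$ and its p-value matches. A sketch using `scipy.stats.chi2_contingency` and the counts printed earlier:

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

# 2x2 table of counts: rows = race (w, b), columns = (callback, no callback).
table = np.array([[235, 2435 - 235],
                  [157, 2435 - 157]])

# correction=False disables Yates' correction, so chi2 equals the squared
# pooled two-proportion z statistic.
chi2, p_chi2, dof, expected = chi2_contingency(table, correction=False)

z_from_chi2 = np.sqrt(chi2)
print("chi2:", chi2, "sqrt(chi2):", z_from_chi2, "p-value:", p_chi2)
```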


# Question 4
Write a story describing the statistical significance in the context of the original problem.
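We ran the calculations using two tests: a two-proportion z-test and a two-sample z-test. They differ only in how the standard error is estimated (the two-proportion test pools both groups under H0, while the two-sample test uses each group's own variance), and they yielded nearly identical results. Because the outcome is a binary callback, the two-proportion z-test is the natural one to report. The final conclusion is: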

Confidence interval: (0.0167509418986, 0.0473147665203). We are 95% confident that the callback rate for white-sounding names exceeds the rate for black-sounding names by between 1.675 and 4.731 percentage points. Keep in mind that the callback rates were low to begin with: 6.45% for black-sounding names and 9.65% for white-sounding names. A gap of 1.675 to 4.731 percentage points is therefore large relative to the black callback rate of 6.45%: it corresponds to white-sounding names receiving roughly 26% to 73% more callbacks, with a point estimate of about 50% more (9.65 / 6.45 ≈ 1.50). So the practical conclusion from this data set is: "We are 95% confident that, for otherwise identical resumes, a white-sounding name earns roughly 26% to 73% more callbacks than a black-sounding name."
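The relative-effect arithmetic can be reproduced from the counts and the interval reported above. This sketch computes the ratio of callback rates and a rough translation of the interval into "percent more callbacks" relative to the observed black callback rate; dividing interval bounds by a point estimate is an approximation, not a formal interval for the ratio.

```python
# Observed callback rates and the 95% CI for (white - black) reported above.
white_rate = 235 / 2435
black_rate = 157 / 2435
ci_low, ci_high = 0.0167509418986, 0.0473147665203

# Ratio of callback rates: how many times as many callbacks white-sounding names got.
rate_ratio = white_rate / black_rate

# Rough translation of the CI into "percent more callbacks", relative to the
# observed black callback rate (approximate, since it divides by a point estimate).
pct_more_low = 100 * ci_low / black_rate
pct_more_high = 100 * ci_high / black_rate

print(f"rate ratio: {rate_ratio:.3f}")
print(f"roughly {pct_more_low:.1f}% to {pct_more_high:.1f}% more callbacks")
```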

Z-value and p-value: The z-value was 4.108, which far exceeds the two-tailed critical value of 1.96 at α = 0.05. In other words, the observed difference between white and black callback rates lies 4.108 standard errors away from the difference of zero expected under the null hypothesis. If the null hypothesis were true, the probability of observing a difference at least this extreme in either direction would be 0.003984% (p ≈ 3.98 × 10⁻⁵). This is extremely unlikely, so we reject the null hypothesis at the 5% significance level. Unfortunately, this data supports the claim that hiring managers show significant racial bias. Perhaps this is because they read applications on a computer screen rather than meeting applicants face to face. It is very easy to say "no" to an applicant on paper, so something as simple as a black-sounding name can be enough to trigger a racial bias that leads the hiring manager to turn down the application.
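To make the meaning of the p-value concrete, a permutation test (swapped in here purely as an illustration; it is not the method used above) shuffles the race labels many times, which simulates the null hypothesis that the name has no effect, and counts how often a shuffled difference is at least as large as the observed one. The outcomes are rebuilt from the counts printed earlier; the seed and the number of permutations are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Rebuild the 0/1 callback outcomes from the counts printed earlier:
# first 2435 entries = white-sounding names (235 callbacks),
# last 2435 entries = black-sounding names (157 callbacks).
n = 2435
calls = np.r_[np.ones(235), np.zeros(n - 235), np.ones(157), np.zeros(n - 157)]
observed = calls[:n].mean() - calls[n:].mean()

n_perm = 5000
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(calls)  # random relabeling, as if H0 were true
    perm_diffs[i] = shuffled[:n].mean() - shuffled[n:].mean()

# Two-sided permutation p-value; adding 1 to numerator and denominator
# gives a conservative estimate that is never exactly 0.
p_perm = (np.sum(np.abs(perm_diffs) >= abs(observed)) + 1) / (n_perm + 1)
print("observed difference:", observed)
print("largest |shuffled difference|:", np.abs(perm_diffs).max())
print("permutation p-value (conservative estimate):", p_perm)
```

With only 5,000 shuffles the smallest attainable value is about 0.0002, so this can confirm that the p-value is tiny but cannot resolve a value as small as 4 × 10⁻⁵.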

# Question 5
Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

My analysis does not mean that race/name is the most important factor in callback success, because many other factors are at play. For example, a resume listing a PhD is far more qualified than one listing a bachelor's degree, regardless of the name on it. I believe that applicants must first meet the qualifications, and race may then play a role when two or more applicants have similar qualifications (assuming a racist hiring manager). If an applicant handily exceeds the qualifications, race would probably not play a big role in hiring except in cases of extreme racism. The test above only compares callback rates between the two name groups: because names were randomly assigned, it shows that the name has an effect, but it does not rank that effect against the other factors. I would like to see a regional breakdown of where the "black name resumes" were declined most often, displayed on a heat map. That would add depth to exploring the roots of racial discrimination within our nation.
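A regional breakdown could start from the 'city' column listed in the dataset's columns, assuming it serves as the regional grouping. This sketch (the function name `callback_gap_by_group` is hypothetical) builds the per-group table of callback rates by race and the white-minus-black gap, which is the matrix a heat map would display. The toy frame uses made-up city labels and outcomes purely to show the mechanics; it is not the study's data.

```python
import pandas as pd

def callback_gap_by_group(df, group_col="city"):
    """Callback rate by race within each group, plus the white-minus-black gap.

    Assumes df has a 'race' column with values 'b'/'w' and a 0/1 'call' column.
    """
    rates = df.pivot_table(index=group_col, columns="race", values="call", aggfunc="mean")
    rates["gap_w_minus_b"] = rates["w"] - rates["b"]
    return rates.sort_values("gap_w_minus_b", ascending=False)

# Toy frame for illustration only (hypothetical city labels and outcomes).
toy = pd.DataFrame({
    "city": ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "race": ["w", "w", "b", "b", "w", "w", "b", "b"],
    "call": [1, 0, 0, 0, 1, 0, 1, 0],
})
result = callback_gap_by_group(toy)
print(result)
```

In the notebook the same call would be `callback_gap_by_group(data)`; with more than two regions, the resulting table could be passed to a plotting library to draw the heat map.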

Another amendment to the analysis could be to use pictures instead of names (as discussed above in Q4). It would be very interesting to see whether showing applicant pictures, which reveal their race, would produce results similar to those from using different-sounding names.