# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating whether the resume carried a black-sounding or a white-sounding name. The 'call' column has two values, 1 and 0, indicating whether or not the resume received a callback from an employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
# Load the Stata file through the public pandas API
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
data.head()

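|   | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|----|----|-----------|--------|----------|--------|-----------|----------|----------|---------------|-----|---------|--------|-------|----------|----------|-------|------------|------------|---------|-----------|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |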


In [5]:
data.race.value_counts()

w    2435
b    2435
Name: race, dtype: int64

In [6]:
data.call.value_counts()

0.0    4478
1.0     392
Name: call, dtype: int64

In [7]:
data.groupby(['race', 'call']).size()

race  call
b     0.0     2278
      1.0      157
w     0.0     2200
      1.0      235
dtype: int64

In [8]:
data_w = data[data.race == 'w']
data_b = data[data.race == 'b']

In [9]:
callpct_w = sum(data_w.call) / len(data_w)
callpct_b = sum(data_b.call) / len(data_b)
callpct_w, callpct_b

(0.096509240246406572, 0.064476386036960986)

### 1. What test is appropriate for this problem? Does CLT apply?

A two-proportion z-test is appropriate. The outcome ('call') is binary, so each group's callback rate is a sample proportion: the number of callbacks divided by the number of resumes in the group. The explanatory variable is 'race', and the test asks whether the difference in callback proportions between the two groups is statistically significant, measured against a null hypothesis of no difference. The Central Limit Theorem applies: race was assigned to resumes at random, each group is large (2,435 resumes), and each group contains well over 10 callbacks and 10 non-callbacks, so the sampling distribution of the difference in proportions is approximately normal. That approximation is what justifies using a z-test.
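The large-sample condition behind the normal approximation can be checked directly from the group counts. The following is a minimal sketch that uses the counts reported by the notebook's `groupby` output. The threshold of 10 is a common rule of thumb, and the helper name `clt_conditions_met` is a hypothetical choice for illustration; neither comes from the original analysis.

```python
# Counts copied from the groupby(['race', 'call']) output in the notebook.
counts = {
    "b": {"callbacks": 157, "no_callbacks": 2278},
    "w": {"callbacks": 235, "no_callbacks": 2200},
}

# Rule-of-thumb minimum count per cell (an assumption, not from the notebook).
MIN_COUNT = 10


def clt_conditions_met(groups, min_count=MIN_COUNT):
    """Return True if every group has at least min_count successes and failures."""
    return all(
        g["callbacks"] >= min_count and g["no_callbacks"] >= min_count
        for g in groups.values()
    )


print(clt_conditions_met(counts))
```

Both groups clear the threshold by a wide margin, which supports treating the difference in proportions as approximately normal.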


### 2. What are the null and alternate hypotheses?

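The analysis is set up as a one-tailed test on the population callback proportions, written here as p_b (black-sounding names) and p_w (white-sounding names). The sample proportions callpct_b and callpct_w are the estimates of these quantities.
* Null hypothesis: resumes with black-sounding names are at least as likely to receive a callback as resumes with white-sounding names
    * p_b - p_w >= 0
* Alternative hypothesis: resumes with black-sounding names are less likely to receive a callback than resumes with white-sounding names
    * p_b - p_w < 0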

### 3. Compute margin of error, confidence interval, and p-value.

In [10]:
sum(data.call) / len(data)

0.080492813141683772

In [11]:
pooled_prop = (callpct_b * len(data_b) + callpct_w * len(data_w)) / len(data)
pooled_prop

0.080492813141683772

In [12]:
# Standard Error
SE = np.sqrt(pooled_prop * (1 - pooled_prop) * ((1/len(data_b)) + (1/len(data_w))))
SE

0.0077968940361704568

The standard error of the difference in callback proportions, computed from the pooled proportion under the null hypothesis, is 0.007797.

In [13]:
# Critical z-value for a lower-tailed test at the 5% significance level
crit_value = stats.norm.ppf(0.05)
crit_value

-1.6448536269514729

In [14]:
# Confidence Interval
conf_int = stats.norm.interval(0.95, loc=(callpct_b - callpct_w), scale=SE)
conf_int

(-0.047314485711614819, -0.016751222707276352)

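The 95% confidence interval for the difference in callback rates (black-sounding minus white-sounding names) runs from -4.73 to -1.68 percentage points. The interval excludes zero, which is consistent with a real gap between the groups. Note that this interval is built from the pooled standard error; a confidence interval is more commonly built from the unpooled standard error, although with these sample sizes the two differ only slightly.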
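Question 3 also asks for the margin of error, which the cells above do not compute explicitly. The sketch below recomputes it from the group counts using only the Python standard library, so it runs without the dataset. It uses the unpooled standard error, which is an assumption that differs from the pooled value used in the notebook's interval, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Group sizes and callback counts copied from the notebook outputs.
n_b, n_w = 2435, 2435
calls_b, calls_w = 157, 235

p_b = calls_b / n_b          # callback rate, black-sounding names
p_w = calls_w / n_w          # callback rate, white-sounding names
diff = p_b - p_w             # observed difference in proportions

# Unpooled standard error of the difference (assumption: used here for the
# interval, whereas the notebook's interval uses the pooled SE).
se_unpooled = math.sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)

# Two-sided 95% critical value of the standard normal distribution.
z_crit = NormalDist().inv_cdf(0.975)

margin_of_error = z_crit * se_unpooled
ci = (diff - margin_of_error, diff + margin_of_error)

print(f"margin of error: {margin_of_error:.4f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The margin of error is the half-width of the interval, so it can be read as the uncertainty, in proportion units, around the observed difference of roughly -0.032.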

In [15]:
z_stat = (callpct_b - callpct_w) / SE
z_stat

-4.1084121524343464

In [16]:
# P-value
p_value = stats.norm.cdf(z_stat)
p_value

1.9919434187925383e-05

The p-value of 0.00199% (about 2 in 100,000) is far below the 5% significance level, and the z-statistic of -4.11 lies well beyond the critical value of -1.645, so the null hypothesis is rejected.
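As a cross-check, the z-statistic and p-value can be recomputed from the raw counts alone. This is a minimal standard-library sketch, not the notebook's own method: the helper name `two_proportion_ztest` is hypothetical, and the two-sided p-value is included only for comparison with the one-tailed value used above.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    Returns the z-statistic for p1 - p2, the lower-tail p-value,
    and the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    std_normal = NormalDist()
    p_lower = std_normal.cdf(z)                 # H1: p1 - p2 < 0
    p_two_sided = 2 * std_normal.cdf(-abs(z))   # H1: p1 != p2
    return z, p_lower, p_two_sided


# Counts from the notebook: callbacks and group sizes for 'b' and 'w'.
z, p_lower, p_two_sided = two_proportion_ztest(157, 2435, 235, 2435)
print(f"z = {z:.4f}, one-tailed p = {p_lower:.3e}, two-sided p = {p_two_sided:.3e}")
```

Because the two-sided p-value is exactly twice the one-tailed value, the conclusion at the 5% level is the same under either formulation.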

### 4. Write a story describing the statistical significance in the context of the original problem.

In the sample, 6.45% of resumes with black-sounding names received a callback, compared with 9.65% of resumes with white-sounding names. If the null hypothesis were true (black-sounding names are at least as likely as white-sounding names to receive callbacks), a gap at least this large would occur by chance only about 0.00199% of the time. This is strong evidence that the result is not due to chance and that there is a statistically significant difference between the groups.

### 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

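No. The test shows that callback rates differ significantly by the race the name suggests, but it does not compare that effect with the effect of other factors (e.g. education level or years of experience), so it cannot show that race/name is the most important factor. Confounding variables would normally be a concern, but their impact is minimized here because the 'b' and 'w' labels were assigned to resumes at random, so other resume characteristics should be balanced between the two groups on average. To judge relative importance, the analysis could be extended, for example with a logistic regression of 'call' on 'race' together with the other resume characteristics in the dataset.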