# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating a black-sounding or white-sounding name. The 'call' column has two values, 1 and 0, indicating whether or not the resume received a callback from the employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.
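Because each row is one resume and 'call' is 0/1, the callback rate of a group is simply the mean of 'call' within that group. A minimal sketch using only the standard library and a handful of made-up rows (hypothetical toy data, not from the study); in the notebook the pandas equivalent would be `data.groupby('race')['call'].mean()`.

```python
from collections import defaultdict

# Hypothetical toy rows (race, call); NOT taken from the real dataset.
rows = [("b", 0), ("b", 1), ("b", 0), ("b", 0),
        ("w", 1), ("w", 0), ("w", 1), ("w", 0)]

def callback_rates(rows):
    """Return {race: fraction of resumes in that group that received a call}."""
    calls = defaultdict(int)
    totals = defaultdict(int)
    for race, call in rows:
        calls[race] += call
        totals[race] += 1
    return {race: calls[race] / totals[race] for race in totals}

rates = callback_rates(rows)
print(rates)  # {'b': 0.25, 'w': 0.5}
```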

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
data.loc[data.race == 'b', 'call'].sum()

157.0

In [4]:
data.head()

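| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |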


### Answers
   1. What test is appropriate for this problem? Does CLT apply?
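      - I will use an independent two-sample t-test on the binary 'call' variable, comparing the mean callback rates of the two groups. The CLT applies: names were assigned randomly, so the two samples are independent, and each group has 2435 resumes with far more than 10 callbacks and 10 non-callbacks, so the sampling distribution of each callback rate is approximately normal.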
   2. What are the null and alternate hypotheses?
      - The null hypothesis is that there is no difference in callback rates between resumes with black-sounding names and resumes with white-sounding names (H0: p_b = p_w). The alternative hypothesis is that the two callback rates differ (H1: p_b ≠ p_w).
   3. Compute margin of error, confidence interval, and p-value.
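      - At the 95% confidence level (α = 0.05), the margin of error is about 0.0098 for black-sounding names and 0.0117 for white-sounding names, giving confidence intervals for the callback rate of roughly 0.055 to 0.074 and 0.085 to 0.108, respectively. The two-sample t-test gives t = -4.11 with a p-value of about 3.9e-05.
   4. Write a story describing the statistical significance in the context of the original problem.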
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?
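      - No. The analysis shows that race has a significant effect on callbacks, not that it is the most important factor; other factors, such as years of experience, also affect callback rates. To amend the analysis, we controlled for years of experience by restricting both groups to resumes with exactly 6 years of experience. In that subsample we did not find a significant difference between the two groups, although the subsample is much smaller than the full dataset, so that test has less power.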
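The CLT condition in answer 1 can be checked numerically. A common rule of thumb (an assumption here, not stated in the original) is that the normal approximation to a sample proportion is reasonable when both the number of successes and the number of failures are at least 10. The sketch below uses the 157 callbacks for black-sounding names reported above and 235 callbacks for white-sounding names; the 235 is back-calculated from the white-name confidence interval computed further down (its midpoint, about 0.0965, equals 235/2435), so treat it as an assumption.

```python
# Success/failure rule of thumb for the normal approximation (CLT) to a proportion.
n = 2435                   # resumes per group, from len(data_b)
callbacks = {"b": 157,     # from the callback count for black-sounding names
             "w": 235}     # assumption: back-calculated from the white-name CI

def clt_ok(successes, n, threshold=10):
    """True if both successes and failures reach the threshold."""
    return successes >= threshold and (n - successes) >= threshold

for race, k in callbacks.items():
    print(race, "callbacks:", k, "non-callbacks:", n - k, "CLT ok:", clt_ok(k, n))
```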

In [5]:
len(data)

4870

In [6]:
data_b = data.loc[data.race == 'b']
data_w = data.loc[data.race == 'w']

In [7]:
len(data_b)

2435

In [8]:
np.std(data_b.call)

0.24559901654720306

In [9]:
np.std(data_w.call)

0.29528486728668213

#### In this case, we will use an independent two-sample t-test to test whether the two samples have the same mean.

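#### The standard (pooled) two-sample t-test assumes that the two groups have equal variances; equal sample sizes are not required. Here the standard deviations differ (0.246 vs. 0.295; for 0/1 data the variance is p(1 - p), so different callback rates imply different variances), so the equal-variance assumption does not strictly hold and Welch's t-test (`equal_var=False` in `scipy.stats.ttest_ind`) is the safer choice. Because both groups have the same size (2435), the pooled and Welch t-statistics are identical here; only the degrees of freedom differ, which has a negligible effect on the p-value at this sample size.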
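To see that the pooled and Welch statistics coincide when the group sizes are equal, both can be computed from the callback counts alone, since a 0/1 column with k ones out of n has mean k/n and sample variance n/(n-1)·p(1-p). This standard-library sketch uses 157 callbacks for black-sounding names (reported above) and 235 for white-sounding names, the latter an assumption back-calculated from the white-name confidence interval; under that assumption it reproduces the t = -4.1147 reported by `stats.ttest_ind`.

```python
import math

n = 2435
k_b, k_w = 157, 235  # callbacks; 235 is back-calculated from the white-name CI (assumption)

def bernoulli_sample_var(k, n):
    """Sample variance (ddof=1) of a 0/1 column containing k ones out of n."""
    p = k / n
    return n / (n - 1) * p * (1 - p)

p_b, p_w = k_b / n, k_w / n
s2_b, s2_w = bernoulli_sample_var(k_b, n), bernoulli_sample_var(k_w, n)

# Student's t: pooled variance, assumes equal population variances.
sp2 = ((n - 1) * s2_b + (n - 1) * s2_w) / (2 * n - 2)
t_pooled = (p_b - p_w) / math.sqrt(sp2 * (1 / n + 1 / n))

# Welch's t: separate variances, no equal-variance assumption.
t_welch = (p_b - p_w) / math.sqrt(s2_b / n + s2_w / n)

print(t_pooled, t_welch)  # both approximately -4.1147
```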


In [10]:
# Student's (pooled-variance) two-sample t-test; scipy's default is equal_var=True
stats.ttest_ind(data_b.call, data_w.call)

Ttest_indResult(statistic=-4.1147052908617514, pvalue=3.9408021031288859e-05)
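A test that is often used for comparing two rates is the two-proportion z-test, which uses the pooled callback rate under the null hypothesis. The sketch below is a swapped-in technique, not what the notebook ran: it computes the z-statistic and two-sided p-value from the callback counts with the standard library, again assuming 235 callbacks for white-sounding names (back-calculated from the white-name CI). With samples this large it should lead to the same conclusion as the t-test.

```python
import math

def two_proportion_ztest(k1, n1, k2, n2):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p1, p2 = k1 / n1, k2 / n2
    p_pool = (k1 + k2) / (n1 + n2)             # common callback rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

z, p = two_proportion_ztest(157, 2435, 235, 2435)  # black vs. white names
print(z, p)
```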

In [19]:
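# 95% margin of error and confidence interval for the callback rate of black-sounding names
# (the p-value comes from the t-test above)
# ME = z * s / sqrt(n) with z = 1.96; np.std uses ddof=0, a negligible difference at n = 2435
# RESOURCE: http://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/margin-of-error/
ME = 1.96*np.std(data_b.call)/np.sqrt(len(data_b))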
print('margin of error for black names= ', ME)
print('confidence interval for black names = ', np.mean(data_b.call)-ME, 'to',np.mean(data_b.call)+ME)

margin of error for black names=  0.00975513338481
confidence interval for black names =  0.0547212523278 to 0.0742315190974


In [18]:
# same calculation for white-sounding names
ME_w = 1.96*np.std(data_w.call)/np.sqrt(len(data_w))
print('margin of error for white names= ', ME_w)
print('confidence interval for white names = ', np.mean(data_w.call)-ME_w, 'to',np.mean(data_w.call)+ME_w)

margin of error for white names=  0.0117286433284
confidence interval for white names =  0.0847805972393 to 0.108237883896
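Question 3 is really about the gap between the groups, so a confidence interval for the difference in callback rates is a natural complement to the two per-group intervals. A minimal sketch of the 95% Wald interval with unpooled standard errors, assuming 235 callbacks for white-sounding names (back-calculated from the white-name CI above) and 157 for black-sounding names; if the interval excludes 0, it agrees with rejecting the null hypothesis at the 5% level.

```python
import math

def diff_ci(k1, n1, k2, n2, z=1.96):
    """Difference p1 - p2, its margin of error, and the 95% Wald CI (unpooled SE)."""
    p1, p2 = k1 / n1, k2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    me = z * se
    return p1 - p2, me, (p1 - p2 - me, p1 - p2 + me)

diff, me, (lo, hi) = diff_ci(235, 2435, 157, 2435)  # white minus black
print(diff, me, lo, hi)
```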


#### The t-test above gives t = -4.11, well below -1.96, the critical value for a two-sided test at the 5% level (p ≈ 3.9e-05). A difference in callback rates this large would be very unlikely if the null hypothesis were true, so we reject the null hypothesis: resumes with black-sounding names received significantly fewer callbacks, which is consistent with discrimination.

#### However, this test does not account for other factors that may influence callbacks, so on its own it may not tell the whole story. Let's look at other factors.

In [None]:
list(data)

#### I selected several columns that I think may affect the callback rate:

In [None]:
data_select = data[['education','yearsexp','sex','race','occupspecific','call']]

In [None]:
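# resource: https://stackoverflow.com/questions/17114904/python-pandas-replacing-strings-in-dataframe-with-numbers
# encode the categorical columns as 0/1 (f, b -> 1; m, w -> 0) so they can enter the correlation matrix
mapping = {'f': 1, 'm': 0, 'b': 1, 'w': 0}
corrmat = data_select.replace({'sex': mapping, 'race': mapping}).corr()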

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# heatmap of pairwise correlations between the selected columns
sns.heatmap(corrmat, square=True)
plt.show()
print(corrmat)

#### From the correlation matrix above, yearsexp and occupspecific also appear to be correlated with the callback rate (correlation alone does not show that they cause it). We need to control for factors that may affect the result.
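One way to control for a variable such as yearsexp without discarding rows is a stratified comparison: compute the callback rate for each race within each level of that variable. Because names were assigned at random, race should be unrelated to yearsexp on average, but the stratified table is a useful check that the gap holds at each experience level. A minimal standard-library sketch on hypothetical toy rows (not study data); with pandas the same table would be `data.groupby(['yearsexp', 'race'])['call'].mean().unstack()`.

```python
from collections import defaultdict

# Hypothetical toy rows (yearsexp, race, call); NOT taken from the real dataset.
rows = [(6, "b", 0), (6, "b", 1), (6, "w", 1), (6, "w", 1),
        (10, "b", 0), (10, "b", 0), (10, "w", 1), (10, "w", 0)]

def stratified_rates(rows):
    """Return {yearsexp: {race: callback rate}} computed within each stratum."""
    calls = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(lambda: defaultdict(int))
    for years, race, call in rows:
        calls[years][race] += call
        totals[years][race] += 1
    return {years: {race: calls[years][race] / totals[years][race]
                    for race in totals[years]}
            for years in totals}

table = stratified_rates(rows)
print(table)  # {6: {'b': 0.5, 'w': 1.0}, 10: {'b': 0.0, 'w': 0.5}}
```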

In [None]:
data_b['yearsexp'].describe()

In [None]:
data_w['yearsexp'].describe()

In [None]:
data_b['occupspecific'].describe()

In [None]:
data_w['occupspecific'].describe()

#### In the following test, we restrict both groups to resumes with exactly 6 years of experience.

In [None]:
len(data_b.loc[data_b['yearsexp']==6])

In [None]:
len(data_w.loc[data_w['yearsexp']==6])

In [None]:
data_b_6 = data_b.loc[data_b['yearsexp']==6][:408]

In [None]:
data_w_6 = data_w.loc[data_w['yearsexp']==6][:408]
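Keeping only the first 408 matching rows of each group discards data, and a t-test does not need equal group sizes. The sketch below spells out Welch's t-statistic and its Welch-Satterthwaite degrees of freedom with the standard library so the formula is visible; the input vectors are made-up toy data. In the notebook, the corresponding call would be `stats.ttest_ind(b6.call, w6.call, equal_var=False)`, where `b6` and `w6` are hypothetical names for the full 6-year subsets taken without the `[:408]` slice.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (ddof=1)

def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical 0/1 callback vectors of unequal length (toy data).
a = [1, 0, 0, 0]
b = [1, 1, 0, 0, 0, 0]
t, df = welch_t(a, b)
print(t, df)
```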

In [None]:
stats.ttest_ind(data_b_6.call,data_w_6.call)

#### t test conclusion: 
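The t-statistic is greater than -1.96, so at the 5% level we fail to reject the null hypothesis: within this 6-year subsample we find no significant difference in callback rates between the two groups. This does not prove that there is no discrimination; the subsample has only 408 resumes per group, far fewer than the 2435 per group in the full test, so it has much less power to detect a difference of the size found earlier.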
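To see why a non-significant result with 408 resumes per group is weak evidence of "no difference", one can compare the smallest gap in callback rates each sample size could reliably detect. A rough sketch using the normal approximation, assuming a two-sided test at α = 0.05, 80% power, equal group sizes, and a baseline callback rate of about 8% (close to the overall rate in the full sample); these settings are assumptions, and the 6-year subsample's own rate may differ. The observed full-sample gap uses 157 callbacks for black-sounding names and an assumed 235 for white-sounding names, back-calculated from the white-name CI.

```python
import math
from statistics import NormalDist

def min_detectable_diff(n_per_group, p=0.08, alpha=0.05, power=0.80):
    """Approximate minimum detectable difference between two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.84
    se = math.sqrt(2 * p * (1 - p) / n_per_group)
    return (z_alpha + z_beta) * se

observed_gap = 235 / 2435 - 157 / 2435  # white minus black, full sample
for n in (2435, 408):
    print(n, round(min_detectable_diff(n), 4), "observed gap:", round(observed_gap, 4))
```

Under these assumptions the full sample can detect a gap of roughly 2 percentage points, while 408 per group needs a gap of roughly 5 points, larger than the gap actually observed.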