# Resume Experiment Analysis

Luopeiwen Yi

How much harder is it to get a job in the United States if you are Black than if you are White? Or, expressed differently, what is the *effect* of race on the difficulty of getting a job in the US?

In this exercise, we will be analyzing data from a real-world experiment designed to help answer this question. Namely, we will be analyzing data from a randomized experiment in which 4,870 fictitious resumes were sent out to employers in response to job adverts in Boston and Chicago in 2001. The resumes differ in various attributes, including the names of the applicants, and different resumes were randomly allocated to job openings.

The "experiment" part of the experiment is that resumes were randomly assigned Black- or White-sounding names, and the researchers then watched to see whether employers called the "applicants" with Black-sounding names at the same rate as the applicants with White-sounding names.

(Which names constituted "Black-sounding names" and "White-sounding names" was determined by analyzing names on Massachusetts birth certificates to determine which names were most associated with Black and White children, and then surveys were used to validate that the names were perceived as being associated with individuals of one racial category or the other. Also, please note I subscribe to the logic of [Kwame Anthony Appiah](https://www.theatlantic.com/ideas/archive/2020/06/time-to-capitalize-blackand-white/613159/) and chose to capitalize both the B in Black and the W in White). 

You can access the original article [here](https://www.aeaweb.org/articles?id=10.1257/0002828042002561).

**Note to Duke students:** if you are on the Duke campus network, you'll be able to access almost any academic journal article directly; if you are off campus and want access, you can just go to the [Duke Library](https://library.duke.edu/) website and search for the article title. Once you find it, you'll be asked to log in, after which you'll have full access to the article. You will also find this pattern holds true at nearly any major university in the US.


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats
import seaborn as sns
from scipy.stats import ttest_ind
import warnings
from scipy.stats import chi2_contingency


warnings.filterwarnings("ignore")

pd.set_option("mode.copy_on_write", True)

results = {}
```


## Gradescope Autograding

Please follow [all standard guidance](https://www.practicaldatascience.org/html/autograder_guidelines.html) for submitting this assignment to the Gradescope autograder, including storing your solutions in a dictionary called `results` and ensuring your notebook runs from the start to completion without any errors.

For this assignment, please name your file `exercise_resume_experiment.ipynb` before uploading.

You can check that you have answers for all questions in your `results` dictionary with this code:

```python
assert set(results.keys()) == {
    "ex2_pvalue_computerskills",
    "ex2_pvalue_female",
    "ex2_pvalue_yearsexp",
    "ex3_pvalue_education",
    "ex4_validity",
    "ex5_pvalue",
    "ex5_white_advantage_percent",
    "ex5_white_advantage_percentage_points",
    "ex6_black_pvalue",
    "ex8_black_college",
    "ex8_black_nocollege",
    "ex8_college_heterogeneity",
    "ex9_gender_and_discrimination",
    "ex10_experiment_v_us",
}
```


### Submission Limits

Please remember that you are **only allowed FOUR submissions to the autograder.** Your last submission (if you submit 4 or fewer times), or your third submission (if you submit more than 4 times) will determine your grade. Submissions that error out will **not** count against this total.

That's one more than usual in case there are issues with exercise clarity.

## Checking for Balance

The first step in analyzing any experiment is to check whether you have *balance* across your treatment arms: do the people who were randomly assigned to the treatment group look like the people who were randomly assigned to the control group? In this case, do the resumes that ended up with Black-sounding names look like the resumes with White-sounding names?

Checking for balance is critical for two reasons. First, it's always possible that random assignment will create profoundly different groups, because the *Law of Large Numbers* is only a "law" in the limit. So we want to make sure we have reasonably similar groups from the outset. Second, it's always possible that the randomization wasn't actually implemented correctly, and you would be amazed at the number of ways that "random assignment" can go wrong! So if you ever do find you're getting unbalanced data, you should worry not only about whether the groups have baseline differences, but also about whether the "random assignment" was actually random!
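The simulation below is a minimal sketch of why even a correctly implemented randomization can produce an apparently "unbalanced" covariate. It assumes a single normally distributed pre-treatment covariate and a fair coin-flip assignment. The covariate's mean and spread are arbitrary illustrative values, not taken from the resume data, and `share_of_significant_balance_tests` is a hypothetical helper. Across many simulated experiments, roughly 5% of balance t-tests should come out significant at the 0.05 level purely by chance.

```python
import numpy as np
from scipy.stats import ttest_ind


def share_of_significant_balance_tests(n_units=4870, n_sims=2000, alpha=0.05, seed=0):
    """Simulate many randomized experiments and return the share whose
    balance t-test on a pre-treatment covariate is significant at `alpha`.
    (Hypothetical helper for illustration only.)"""
    rng = np.random.default_rng(seed)
    significant = 0
    for _ in range(n_sims):
        # A pre-treatment covariate with arbitrary, illustrative parameters
        covariate = rng.normal(loc=8.0, scale=5.0, size=n_units)
        # Coin-flip assignment: independent of the covariate by construction
        treated = rng.integers(0, 2, size=n_units).astype(bool)
        pvalue = ttest_ind(covariate[treated], covariate[~treated]).pvalue
        significant += int(pvalue < alpha)
    return significant / n_sims


share = share_of_significant_balance_tests()
print(f"Share of 'imbalanced' draws: {share:.3f}")
```

A single significant balance test is therefore not, on its own, proof that randomization failed. It is a prompt to look at the size of the difference and at how the assignment was implemented.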

### Exercise 1

Download the data set from this experiment (`resume_experiment.dta`) from [GitHub](https://github.com/nickeubank/MIDS_Data/tree/master/resume_experiment). To aid the autograder, please load the data directly from a URL.


```python
resume = pd.read_stata(
    "https://github.com/nickeubank/MIDS_Data/raw/master/resume_experiment/resume_experiment.dta"
)
resume.head()
```

| | education | ofjobs | yearsexp | computerskills | call | female | black |
|---|---|---|---|---|---|---|---|
| 0 | 4 | 2 | 6 | 1 | 0.0 | 1.0 | 0.0 |
| 1 | 3 | 3 | 6 | 1 | 0.0 | 1.0 | 0.0 |
| 2 | 4 | 1 | 6 | 1 | 0.0 | 1.0 | 1.0 |
| 3 | 3 | 4 | 6 | 1 | 0.0 | 1.0 | 1.0 |
| 4 | 3 | 3 | 22 | 1 | 0.0 | 1.0 | 0.0 |



### Exercise 2

- `black` is the treatment variable in the data set (whether the resume has a "Black-sounding" name).
- `call` is the dependent variable of interest (did the employer call the fictitious applicant for an interview?).

In addition, the data include a number of variables describing the other features of each fictitious resume, including the applicant's education level (`education`), years of experience (`yearsexp`), gender (`female`), computer skills (`computerskills`), and number of previous jobs (`ofjobs`). Each resume has a random selection of these attributes, so on average the Black-named fictitious applicant resumes have the same qualifications as the White-named applicant resumes.

Check for balance in terms of the average values of applicant gender (`female`), computer skills (`computerskills`), and years of experience (`yearsexp`) across the two arms of the experiment (i.e., by `black`). Calculate both the differences in means across treatment arms *and* test for statistical significance of these differences. Do gender, computer skills, and years of experience look balanced across race groups in terms of both statistical significance and magnitude of difference?

Store the p-values associated with your t-test of these variables in `ex2_pvalue_female`, `ex2_pvalue_computerskills`, and `ex2_pvalue_yearsexp`. **Round your values to 2 decimal places.**


```python
# Split the data into the two treatment arms
group_black = resume[resume["black"] == 1]
group_white = resume[resume["black"] == 0]

# Differences in means (Black minus White) for gender, computer skills, and years of experience
mean_diff_female = group_black["female"].mean() - group_white["female"].mean()
mean_diff_computerskills = (
    group_black["computerskills"].mean() - group_white["computerskills"].mean()
)
mean_diff_yearsexp = group_black["yearsexp"].mean() - group_white["yearsexp"].mean()

print(
    f"The difference in means for gender between the Black group and White group is around: {mean_diff_female:.2f}."
)
print(
    f"The difference in means for computer skills between the Black group and White group is around: {mean_diff_computerskills:.2f}."
)
print(
    f"The difference in means for years of experience between the Black group and White group is around: {mean_diff_yearsexp:.2f}."
)
```

```
The difference in means for gender between the Black group and White group is around: 0.01.
The difference in means for computer skills between the Black group and White group is around: 0.02.
The difference in means for years of experience between the Black group and White group is around: -0.03.
```


```python
# Statistical significance (t-tests)
t_test_female = ttest_ind(group_black["female"], group_white["female"])
t_test_computerskills = ttest_ind(
    group_black["computerskills"], group_white["computerskills"]
)
t_test_yearsexp = ttest_ind(group_black["yearsexp"], group_white["yearsexp"])

print(
    "t-test for the difference in means for gender between the Black group and White group: "
)
print(t_test_female)
print(
    "t-test for the difference in means for computer skills between the Black group and White group: "
)
print(t_test_computerskills)
print(
    "t-test for the difference in means for years of experience between the Black group and White group: "
)
print(t_test_yearsexp)
```

```
t-test for the difference in means for gender between the Black group and White group: 
TtestResult(statistic=0.8841321196036144, pvalue=0.37666855949097355, df=4868.0)
t-test for the difference in means for computer skills between the Black group and White group: 
TtestResult(statistic=2.1664271042751966, pvalue=0.03032693395539194, df=4868.0)
t-test for the difference in means for years of experience between the Black group and White group: 
TtestResult(statistic=-0.18461970685747395, pvalue=0.8535350182481283, df=4868.0)
```


```python
# Storing p-values with 2 decimal places
results["ex2_pvalue_female"] = round(t_test_female.pvalue, 2)
results["ex2_pvalue_computerskills"] = round(t_test_computerskills.pvalue, 2)
results["ex2_pvalue_yearsexp"] = round(t_test_yearsexp.pvalue, 2)

print(
    f"p-value of t-test for the difference in means for gender between the Black group and White group: {round(t_test_female.pvalue, 2)}"
)
print(
    f"p-value of t-test for the difference in means for computer skills between the Black group and White group: {round(t_test_computerskills.pvalue, 2)}"
)
print(
    f"p-value of t-test for the difference in means for years of experience between the Black group and White group: {round(t_test_yearsexp.pvalue, 2)}"
)
```

```
p-value of t-test for the difference in means for gender between the Black group and White group: 0.38
p-value of t-test for the difference in means for computer skills between the Black group and White group: 0.03
p-value of t-test for the difference in means for years of experience between the Black group and White group: 0.85
```


> - **Magnitude of Difference**: All three variables (gender, computer skills, and years of experience) show very small differences in means between the Black and White groups (0.01, 0.02, and -0.03, respectively), suggesting the groups are substantively balanced on these characteristics.
> - **Statistical Significance**: Only computer skills shows a statistically significant difference between the Black and White groups (p = 0.03 < 0.05), suggesting some imbalance in this characteristic. In contrast, there is no statistically significant difference in gender (p = 0.38) or years of experience (p = 0.85) across the two race groups, consistent with balance on these characteristics.
> - Overall, gender and years of experience appear balanced across race groups in terms of both statistical significance and magnitude. Computer skills shows a statistically significant difference despite the small magnitude of that difference.
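The three separate t-tests above can also be collected into a single balance table. The helper below, `balance_table`, is a hypothetical convenience function and is not part of pandas or SciPy. It is a sketch that assumes the treatment column is coded 0/1, and it uses the same pooled-variance `ttest_ind` default as the cells above. The toy DataFrame is made up purely to show the output shape.

```python
import pandas as pd
from scipy.stats import ttest_ind


def balance_table(df, treatment, covariates):
    """Hypothetical helper: for each covariate, return the difference in means
    (treated minus control) and the two-sample t-test p-value."""
    treated = df[df[treatment] == 1]
    control = df[df[treatment] == 0]
    rows = []
    for var in covariates:
        test = ttest_ind(treated[var], control[var])
        rows.append(
            {
                "covariate": var,
                "diff_in_means": treated[var].mean() - control[var].mean(),
                "pvalue": test.pvalue,
            }
        )
    return pd.DataFrame(rows).set_index("covariate")


# Hypothetical toy data, only to illustrate the output format
toy = pd.DataFrame(
    {
        "black": [1, 1, 1, 0, 0, 0],
        "female": [1, 0, 1, 1, 0, 0],
        "yearsexp": [6, 4, 8, 5, 7, 6],
    }
)
tab = balance_table(toy, "black", ["female", "yearsexp"])
print(tab)
```

On the real data, `balance_table(resume, "black", ["female", "computerskills", "yearsexp"])` should reproduce the differences and p-values printed above.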

### Exercise 3

Do a similar tabulation for education (`education`). Education is a categorical variable coded as follows:

- 0: Education not reported
- 1: High school dropout
- 2: High school graduate
- 3: Some college
- 4: College graduate or higher

Because these are categorical, you shouldn't just calculate and compare means. Instead, compare the share or count of observations with each value (e.g., with a chi-squared contingency table). You may also find the `pd.crosstab` function useful.

Does education look balanced across racial groups?

Store the p-value from your chi-squared test in results under the key `ex3_pvalue_education`. **Please round to 2 decimal places.**

```python
# Creating a contingency table of education level (rows) by race arm (columns)
contingency_table_education = pd.crosstab(resume["education"], resume["black"])

print("The contingency table is:")
print(contingency_table_education)
```

```
The contingency table is:
black       0.0   1.0
education
0            18    28
1            18    22
2           142   132
3           513   493
4          1744  1760
```


```python
# Performing a chi-squared test of independence
chi2, p, dof, expected = chi2_contingency(contingency_table_education)

print(
    "The p-value from my chi-squared test for the difference in the distribution of education between the Black group and White group is around:",
    round(p, 2),
)
# Storing p-value, rounded to 2 decimal places
results["ex3_pvalue_education"] = round(p, 2)
```

```
The p-value from my chi-squared test for the difference in the distribution of education between the Black group and White group is around: 0.49
```


> There is no statistically significant difference in the distribution of education levels between the two racial groups (p = 0.49 > 0.05), indicating that education appears to be balanced across racial groups.
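The chi-squared test compares the observed counts with the counts we would expect if education were independent of the name's racial association. The sketch below computes the statistic by hand and checks it against `scipy.stats.chi2_contingency`. It uses the crosstab counts printed above, copied in manually so that the block runs on its own.

```python
import numpy as np
from scipy import stats

# Observed counts copied from the crosstab above
# (rows: education 0-4; columns: black == 0, black == 1)
observed = np.array([[18, 28], [18, 22], [142, 132], [513, 493], [1744, 1760]])

# Expected count in each cell under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-squared statistic, degrees of freedom, and upper-tail p-value
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
pvalue = stats.chi2.sf(stat, dof)

# SciPy applies the Yates continuity correction only when dof == 1,
# so for this 5 x 2 table the two computations should agree
scipy_stat, scipy_p, scipy_dof, _ = stats.chi2_contingency(observed)
print(round(stat, 3), dof, round(pvalue, 2))
print(round(scipy_stat, 3), scipy_dof, round(scipy_p, 2))
```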

### Exercise 4

What do you make of the overall results on resume characteristics? Why do we care about whether these variables look similar across the race groups? And if they didn't look similar, would that be a threat to internal or external validity? 

Answer in markdown, then also store your answer to the question of whether imbalances are a threat to internal or external validity in `"ex4_validity"` as the string `"internal"` or `"external"`.


> - The overall results on resume characteristics suggest that gender, years of experience, and education level are balanced across the race groups, while computer skills shows a small but statistically significant difference. Balance is crucial to the validity of the experimental design, which aims to assess the impact of race (as signaled by "Black-sounding" vs. "White-sounding" names) on the likelihood of being called for an interview.
> - We care about whether these variables look similar across race groups because balance rules out confounding. If the groups differ only in the name on the resume, any difference in callback rates can be attributed to the race signal. For instance, if one racial group had significantly higher computer skill levels, a difference in callback rates could be driven by this factor rather than by the race signal sent by the name on the resume.
> - If these variables did not look similar across the race groups, that would primarily pose a threat to the internal validity of the study. Differences in gender, education, experience, or computer skills across race groups could confound the results, making it difficult to attribute any difference in callback rates to the race signal alone. This could lead to incorrect conclusions about the impact of racial bias in hiring practices.

```python
results["ex4_validity"] = "internal"
```

## Estimating Effect of Race

### Exercise 5

The variable of interest in the data set is `call`, which indicates a callback for an interview. Perform a two-sample t-test comparing applicants with Black-sounding names and applicants with White-sounding names.

Interpret your results—in both percentage *and* in percentage points, what is the effect of having a Black-sounding name (as opposed to a White-sounding name) on your resume?

Store how much more likely a White applicant is to receive a callback than a Black applicant, in percentage and in percentage points, in `"ex5_white_advantage_percent"` and `"ex5_white_advantage_percentage_points"`. Please scale percentages so a value of `1` is 1% and percentage points so a value of `1` corresponds to 1 percentage point. **Please round these answers to 2 decimal places.**

Store the p-value of the difference in `"ex5_pvalue"`. **Please round your p-value to 5 decimal places.**

```python
# Splitting data based on 'black' variable
group_black_callback = resume[resume["black"] == 1]["call"]
group_white_callback = resume[resume["black"] == 0]["call"]

t_stat_race = ttest_ind(group_black_callback, group_white_callback)

print(
    "Two-sample t-test comparing the callback for an interview for applicants with black sounding names and white sounding names:"
)
print(t_stat_race)
```

```
Two-sample t-test comparing the callback for an interview for applicants with black sounding names and white sounding names:
TtestResult(statistic=-4.114705356750735, pvalue=3.940800981423711e-05, df=4868.0)
```


```python
# Extract the p-value by name rather than by tuple position
p_value = t_stat_race.pvalue

print(
    "p-value of t-test for the difference in means for callback between applicants with black sounding names and white sounding names is around:",
    round(p_value, 5),
)
```

```
p-value of t-test for the difference in means for callback between applicants with black sounding names and white sounding names is around: 4e-05
```


```python
# Calculating how much more likely a White applicant is to receive a callback than a Black applicant
white_advantage_percent = (
    (group_white_callback.mean() - group_black_callback.mean())
    / group_black_callback.mean()
    * 100
)  # Relative difference, as a percentage of the Black callback rate
white_advantage_percentage_points = (
    group_white_callback.mean() - group_black_callback.mean()
) * 100  # Absolute difference, in percentage points

print(
    f"An applicant with white sounding name is {round(white_advantage_percent, 2)}% more likely to receive a call back than an applicant with black sounding name."
)
print(
    f"An applicant with white sounding name is {round(white_advantage_percentage_points, 2)} percentage points more likely to receive a call back than an applicant with black sounding name."
)
```

```
An applicant with white sounding name is 49.68% more likely to receive a call back than an applicant with black sounding name.
An applicant with white sounding name is 3.2 percentage points more likely to receive a call back than an applicant with black sounding name.
```


```python
# Storing the rounded values
results["ex5_white_advantage_percent"] = round(white_advantage_percent, 2)
results["ex5_white_advantage_percentage_points"] = round(
    white_advantage_percentage_points, 2
)
results["ex5_pvalue"] = round(p_value, 5)
```

> There is a statistically significant difference in the callback rates for interviews between applicants with Black-sounding names and those with White-sounding names (p < 0.001). An applicant with a White-sounding name is 49.68% more likely to receive a callback than an applicant with a Black-sounding name. In absolute terms, the callback rate for White applicants is 3.2 percentage points higher than for Black applicants. Overall, this analysis suggests that having a Black-sounding name on a resume has a statistically significant negative effect (3.2 percentage points) on the likelihood of being called back for an interview, compared to having a White-sounding name.
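The distinction between a relative gap (percent) and an absolute gap (percentage points) can be shown with a small worked example. The function `white_advantage` is a hypothetical helper. The counts are illustrative and assumed, not read from the data file: 2,435 resumes per arm (the column totals of the education crosstab) with 235 vs. 157 callbacks. These counts were chosen so that they reproduce the 49.68% and 3.2 figures printed above.

```python
def white_advantage(white_rate, black_rate):
    """Hypothetical helper: return (relative gap in percent,
    absolute gap in percentage points) between two callback rates."""
    gap = white_rate - black_rate     # absolute difference in probability
    percent = gap / black_rate * 100  # relative to the Black callback rate
    percentage_points = gap * 100     # 1 == one percentage point
    return percent, percentage_points


# Illustrative (assumed) counts: 235 vs. 157 callbacks out of 2,435 resumes per arm
percent, pp = white_advantage(235 / 2435, 157 / 2435)
print(round(percent, 2), round(pp, 2))
```

The same 3.2-point gap would be a much smaller *percent* difference if baseline callback rates were high. This is why the two scales tell different stories.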

### Exercise 6

Now, use a linear probability model (a linear regression with a 0/1 dependent variable!) to estimate the differential likelihood of being called back by applicant race (i.e. the racial discrimination by employers). Please use [statsmodels](https://www.statsmodels.org/stable/index.html).

Since we have a limited dependent variable, be sure to use [heteroskedastic robust standard errors.](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html) Personally, I prefer the `HC3` implementation, as it tends to do better with smaller samples than other implementations.
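To show what the `HC3` correction actually does, the sketch below computes OLS standard errors by hand with NumPy. `HC0` weights each observation by its squared residual. `HC3` additionally inflates each squared residual by 1 / (1 - h_i)^2, where h_i is the observation's leverage (its diagonal entry of the hat matrix). The helper `ols_robust_se` and the simulated 0/1 data are hypothetical and meant only for illustration.

```python
import numpy as np


def ols_robust_se(X, y, kind="HC3"):
    """Hypothetical helper: OLS coefficients with heteroskedasticity-robust SEs.

    X must already include a column of ones for the intercept.
    kind="HC0" uses squared residuals; kind="HC3" divides each squared
    residual by (1 - h_i)**2, where h_i is observation i's leverage.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage: diagonal of the hat matrix X (X'X)^-1 X'
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    if kind == "HC3":
        weights = resid**2 / (1 - leverage) ** 2
    elif kind == "HC0":
        weights = resid**2
    else:
        raise ValueError("kind must be 'HC0' or 'HC3'")
    meat = X.T @ (X * weights[:, None])
    cov = XtX_inv @ meat @ XtX_inv  # sandwich estimator
    return beta, np.sqrt(np.diag(cov))


# Simulated 0/1 outcome with a 0/1 treatment (hypothetical data)
rng = np.random.default_rng(0)
treat = rng.integers(0, 2, size=1000)
y = (rng.random(1000) < 0.10 - 0.03 * treat).astype(float)
X = np.column_stack([np.ones_like(treat, dtype=float), treat])
beta, se_hc3 = ols_robust_se(X, y, "HC3")
_, se_hc0 = ols_robust_se(X, y, "HC0")
print(beta, se_hc0, se_hc3)
```

In statsmodels you would not hand-roll this. `.get_robustcov_results(cov_type="HC3")` (as used below) or `.fit(cov_type="HC3")` does the same job. The manual version is only meant to show why `HC3` standard errors are never smaller than `HC0` ones.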

Interpret these results—what is the *effect* of having a Black-sounding name (as opposed to a White-sounding name) on your resume in terms of the likelihood you'll be called back? 

How does this compare to the estimate you got above in exercise 5?

Store the p-value associated with `black` in `"ex6_black_pvalue"`. **Please round your p-value to 5 decimal places.**

```python
## fit the model
model_callback_race = smf.ols(formula="call ~ black", data=resume).fit()

## use robust standard errors
robust_results = model_callback_race.get_robustcov_results(cov_type="HC3")

## summary output
robust_results.summary()
```

| | | | |
|---|---|---|---|
| Dep. Variable: | call | R-squared: | 0.003 |
| Model: | OLS | Adj. R-squared: | 0.003 |
| Method: | Least Squares | F-statistic: | 16.92 |
| Date: | Mon, 04 Mar 2024 | Prob (F-statistic): | 3.96e-05 |
| Time: | 21:32:34 | Log-Likelihood: | -562.24 |
| No. Observations: | 4870 | AIC: | 1128.0 |
| Df Residuals: | 4868 | BIC: | 1141.0 |
| Df Model: | 1 | | |
| Covariance Type: | HC3 | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0965 | 0.006 | 16.121 | 0.000 | 0.085 | 0.108 |
| black | -0.0320 | 0.008 | -4.114 | 0.000 | -0.047 | -0.017 |

| | | | |
|---|---|---|---|
| Omnibus: | 2969.205 | Durbin-Watson: | 1.44 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 18927.068 |
| Skew: | 3.068 | Prob(JB): | 0.0 |
| Kurtosis: | 10.458 | Cond. No. | 2.62 |


```python
# Look up 'black' by name instead of a hard-coded position; np.asarray makes
# this work whether params is a NumPy array or a pandas Series
black_idx = robust_results.model.exog_names.index("black")
black_coefficient = np.asarray(robust_results.params)[black_idx]
black_pvalue = np.asarray(robust_results.pvalues)[black_idx]

print(
    f"Coefficient for the difference in likelihood of being called back by having black sounding name comparing to having white sounding name: {round(black_coefficient, 5)}"
)
print(
    f"P-value for the difference in likelihood of being called back by having black sounding name comparing to having white sounding name: {round(black_pvalue, 5)}"
)

## store the result
results["ex6_black_pvalue"] = round(black_pvalue, 5)
```

```
Coefficient for the difference in likelihood of being called back by having black sounding name comparing to having white sounding name: -0.03203
P-value for the difference in likelihood of being called back by having black sounding name comparing to having white sounding name: 4e-05
```


> - The coefficient of -0.03203 indicates that an applicant with a Black-sounding name on the resume is about 3.2 percentage points less likely to receive a callback than an applicant with a White-sounding name. The p-value of 4e-05 indicates that this difference in callback rates is statistically significant. Overall, this analysis suggests that having a Black-sounding name on a resume has a statistically significant negative effect on the likelihood of being called back for an interview.
> - The result from the linear probability model matches the estimate from Exercise 5 (a difference of about 3.2 percentage points between the racial groups' callback rates). This is expected: with a single binary regressor, the OLS coefficient equals the difference in group means. Both analyses confirm that the difference in callback rates between applicants with Black-sounding names and those with White-sounding names is statistically significant (p < 0.05).
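The equivalence between the two estimates can be checked directly. The sketch below regresses a 0/1 outcome on an intercept and a 0/1 treatment using `np.linalg.lstsq`, then compares the results with the raw group means. The data are simulated and the callback probabilities (0.065 and 0.097) are illustrative assumptions. Robust standard errors would change only the standard error, not the coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
black = rng.integers(0, 2, size=n)
# Hypothetical callback probabilities by group
call = (rng.random(n) < np.where(black == 1, 0.065, 0.097)).astype(float)

# Linear probability model: call = intercept + slope * black
X = np.column_stack([np.ones(n), black])
(intercept, slope), *_ = np.linalg.lstsq(X, call, rcond=None)

diff_in_means = call[black == 1].mean() - call[black == 0].mean()
print(intercept, call[black == 0].mean())  # intercept = control-group mean
print(slope, diff_in_means)                # slope = difference in means
```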

### Exercise 7

Even when doing a randomized experiment, adding control variables to your regression *can* improve the statistical efficiency of your estimates of the treatment effect (the upside is the potential to explain residual variation; the downside is more parameters to be estimated). Adding controls can be particularly useful when randomization left some imbalances in covariates (which you may have seen above). 

Now let's see if we can improve our estimates by adding in other variables as controls. Add in `education`, `yearsexp`, `female`, and `computerskills`—be sure to treat education as a categorical variable!
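The efficiency argument can be illustrated with a hypothetical simulation. Here the covariate `x` is independent of treatment but strongly predicts the outcome. Adding it leaves the treatment estimate unbiased and shrinks its standard error, because it absorbs residual variance. `classical_se` is an illustrative helper using conventional (non-robust) standard errors. In the resume data the controls explain far less variation (R-squared rises only from 0.003 to 0.008 in the regressions for Exercises 6 and 7), so any gain there should be modest.

```python
import numpy as np


def classical_se(X, y):
    """Hypothetical helper: OLS coefficients and conventional (homoskedastic) SEs."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


rng = np.random.default_rng(2)
n = 2000
treat = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=n)  # pre-treatment covariate, independent of treatment
y = 0.5 * treat + 2.0 * x + rng.normal(size=n)

ones = np.ones(n)
_, se_without = classical_se(np.column_stack([ones, treat]), y)
_, se_with = classical_se(np.column_stack([ones, treat, x]), y)
print(se_without[1], se_with[1])  # SE on the treatment coefficient
```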

```python
## fit the model
model_callback_all = smf.ols(
    formula="call ~ black + C(education) + yearsexp + female + computerskills",
    data=resume,
).fit()

## robust standard error
robust_new_model = model_callback_all.get_robustcov_results(cov_type="HC3")

## summary
robust_new_model.summary()
```

| | | | |
|---|---|---|---|
| Dep. Variable: | call | R-squared: | 0.008 |
| Model: | OLS | Adj. R-squared: | 0.006 |
| Method: | Least Squares | F-statistic: | 4.35 |
| Date: | Mon, 04 Mar 2024 | Prob (F-statistic): | 3.04e-05 |
| Time: | 21:32:34 | Log-Likelihood: | -551.02 |
| No. Observations: | 4870 | AIC: | 1120.0 |
| Df Residuals: | 4861 | BIC: | 1178.0 |
| Df Model: | 8 | | |
| Covariance Type: | HC3 | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0821 | 0.040 | 2.053 | 0.040 | 0.004 | 0.160 |
| C(education)[T.1] | -0.0017 | 0.057 | -0.030 | 0.976 | -0.113 | 0.110 |
| C(education)[T.2] | -8.953e-05 | 0.042 | -0.002 | 0.998 | -0.082 | 0.082 |
| C(education)[T.3] | -0.0025 | 0.039 | -0.065 | 0.948 | -0.079 | 0.074 |
| C(education)[T.4] | -0.0047 | 0.038 | -0.124 | 0.901 | -0.080 | 0.070 |
| black | -0.0316 | 0.008 | -4.076 | 0.000 | -0.047 | -0.016 |
| yearsexp | 0.0032 | 0.001 | 3.665 | 0.000 | 0.001 | 0.005 |
| female | 0.0112 | 0.010 | 1.165 | 0.244 | -0.008 | 0.030 |
| computerskills | -0.0186 | 0.011 | -1.616 | 0.106 | -0.041 | 0.004 |

| | | | |
|---|---|---|---|
| Omnibus: | 2950.646 | Durbin-Watson: | 1.448 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 18631.25 |
| Skew: | 3.047 | Prob(JB): | 0.0 |
| Kurtosis: | 10.395 | Cond. No. | 225.0 |


```python
# Look up 'black' by name rather than by hard-coded position 5, which would
# silently break if the formula's term order changed
black_idx_new = robust_new_model.model.exog_names.index("black")
black_coefficient_new = np.asarray(robust_new_model.params)[black_idx_new]
black_pvalue_new = np.asarray(robust_new_model.pvalues)[black_idx_new]

print(
    f"Coefficient for the difference in likelihood of being called back by having black sounding name comparing to having white sounding name, controlling for other factors: {round(black_coefficient_new, 5)}"
)
print(
    f"P-value for the difference in likelihood of being called back by having black sounding name comparing to having white sounding name: {round(black_pvalue_new, 5)}"
)
```

```
Coefficient for the difference in likelihood of being called back by having black sounding name comparing to having white sounding name, controlling for other factors: -0.03161
P-value for the difference in likelihood of being called back by having black sounding name comparing to having white sounding name: 5e-05
```


> - The coefficient of -0.03161 indicates that an applicant with a Black-sounding name on the resume is about 3.16 percentage points less likely to receive a callback than an applicant with a White-sounding name, controlling for education, years of experience, gender, and computer skills. The p-value of 5e-05 indicates that this difference is statistically significant. The estimate is nearly unchanged from the model without controls (-0.03203), as we would expect if randomization was successful. It still indicates a statistically significant negative effect of having a Black-sounding name on the likelihood of being called back for an interview.

## Estimating Heterogeneous Effects

### Exercise 8

As you may recall from some past readings (such as this one on the [migraine medication Aimovig](https://ds4humans.com/30_questions/15_answering_exploratory_questions.html#faithful-representations)), our focus on estimating *Average Treatment Effects* runs the risk of papering over variation in how individuals respond. In the case of Aimovig, for example, nearly no patients actually experienced the Average Treatment Effect of the medication; around half of patients experienced no benefit, while the other half experienced a benefit of about twice the average treatment effect.

So far in this analysis we've been focusing on the *average* effect of having a Black-sounding name (as compared to a White-sounding name). But we can also use our regression framework to look for evidence of *heterogeneous treatment effects*: effects that differ for different types of people in our data. We accomplish this by *interacting* our treatment variable with a variable we think may be related to experiencing a different treatment effect. For example, if we think that applicants with Black-sounding names who have a college degree are likely to experience less discrimination, we can interact `black` with an indicator for having a college degree. If having a college degree reduces discrimination, we would expect the interaction term to be positive, partially offsetting the negative coefficient on `black`.
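The interaction arithmetic can be made concrete with simulated data. The callback probabilities below are hypothetical, chosen only so that the racial gap is smaller for college graduates. With only `black`, `college`, and their product, the model is saturated. The `black` coefficient therefore recovers the Black-White gap among non-graduates, and `black` plus the interaction recovers the gap among graduates. The exercise below adds further controls, so the coefficients will no longer equal raw cell-mean differences exactly, but the same addition rule applies.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
black = rng.integers(0, 2, size=n)
college = rng.integers(0, 2, size=n)
# Hypothetical callback probabilities: smaller racial gap for college graduates
p = 0.09 - 0.04 * black + 0.012 * black * college
call = (rng.random(n) < p).astype(float)

# Saturated model: intercept, black, college, black x college
X = np.column_stack([np.ones(n), black, college, black * college])
b0, b_black, b_college, b_inter = np.linalg.lstsq(X, call, rcond=None)[0]


def cell_mean(b, c):
    """Mean callback rate for black == b and college == c."""
    return call[(black == b) & (college == c)].mean()


gap_nocollege = b_black          # effect of `black` when college == 0
gap_college = b_black + b_inter  # effect of `black` when college == 1
print(gap_nocollege, cell_mean(1, 0) - cell_mean(0, 0))
print(gap_college, cell_mean(1, 1) - cell_mean(0, 1))
```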

Is there more or less racial discrimination (the absolute magnitude difference in call back rates between Black and White applicants) among applicants who have a college degree? Store your answer as the string `"more discrimination"` or `"less discrimination"` under the key `"ex8_college_heterogeneity"`.

Please still include `education`, `yearsexp`, `female`, and `computerskills` as controls.

**Note:** it's relatively safe to assume that someone hiring employees who sees a resume that does *not* report education levels will assume the applicant does not have a college degree. So treat "No education reported" as "not having a college degree."

In percentage points, what is the difference in call back rates:

- between White applicants without a college degree and Black applicants without a college degree (`ex8_black_nocollege`).
- between White applicants with a college degree and Black applicants with a college degree (`ex8_black_college`).

Use negative values to denote a lower probability for Black applicants to get a callback. **Scale so a value of `1` is a one percentage point difference. Please round your answer to 2 decimal places.**

Focus on the coefficient values, even if the significance is low.

```python
# Indicator for having a college degree (education == 4);
# "No education reported" (0) is treated as not having a degree
resume["college"] = (resume["education"] == 4).astype(int)

# Interact 'black' with 'college' and include the other controls.
# Note: education enters linearly here; C(education) would be perfectly
# collinear with 'college', since 'college' equals the education == 4 dummy.
model_college_black = smf.ols(
    formula="call ~ black * college + education + yearsexp + female + computerskills",
    data=resume,
).fit(cov_type="HC3")

model_college_black.summary()
```

| | | | |
|---|---|---|---|
| Dep. Variable: | call | R-squared: | 0.008 |
| Model: | OLS | Adj. R-squared: | 0.007 |
| Method: | Least Squares | F-statistic: | 5.064 |
| Date: | Mon, 04 Mar 2024 | Prob (F-statistic): | 9.67e-06 |
| Time: | 21:32:35 | Log-Likelihood: | -550.76 |
| No. Observations: | 4870 | AIC: | 1118.0 |
| Df Residuals: | 4862 | BIC: | 1169.0 |
| Df Model: | 7 | | |
| Covariance Type: | HC3 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0883 | 0.030 | 2.906 | 0.004 | 0.029 | 0.148 |
| black | -0.0405 | 0.015 | -2.735 | 0.006 | -0.070 | -0.011 |
| college | -0.0071 | 0.021 | -0.346 | 0.729 | -0.047 | 0.033 |
| black:college | 0.0124 | 0.017 | 0.711 | 0.477 | -0.022 | 0.046 |
| education | -0.0014 | 0.010 | -0.133 | 0.894 | -0.022 | 0.019 |
| yearsexp | 0.0032 | 0.001 | 3.670 | 0.000 | 0.001 | 0.005 |
| female | 0.0112 | 0.010 | 1.157 | 0.247 | -0.008 | 0.030 |
| computerskills | -0.0187 | 0.011 | -1.648 | 0.099 | -0.041 | 0.004 |

| | | | |
|---|---|---|---|
| Omnibus: | 2950.174 | Durbin-Watson: | 1.448 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 18623.672 |
| Skew: | 3.046 | Prob(JB): | 0.0 |
| Kurtosis: | 10.393 | Cond. No. | 87.2 |


```python
# Getting the main-effect and interaction coefficients by name
# (positional indexing like params[1] on a pandas Series is deprecated)
coeff_black = model_college_black.params["black"]
coeff_college = model_college_black.params["college"]
coeff_interaction_black_college = model_college_black.params["black:college"]

print(
    f"Coefficient for the difference in likelihood of getting callback for Black applicants comparing to White applicants, controlling for other factors: {round(coeff_black, 5)}"
)
print(
    f"Coefficient for the difference in likelihood of getting callback for White applicants with college degree comparing to White applicants without college degree, controlling for other factors: {round(coeff_college, 5)}"
)
print(
    f"Coefficient for the difference in likelihood of getting callback for Black applicants with college degree comparing to Black applicants without college degree VS White applicants with college degree comparing to White applicants without college degree, controlling for other factors: {round(coeff_interaction_black_college, 5)}"
)
```

```
Coefficient for the difference in likelihood of getting callback for Black applicants comparing to White applicants, controlling for other factors: -0.04053
Coefficient for the difference in likelihood of getting callback for White applicants with college degree comparing to White applicants without college degree, controlling for other factors: -0.0071
Coefficient for the difference in likelihood of getting callback for Black applicants with college degree comparing to Black applicants without college degree VS White applicants with college degree comparing to White applicants without college degree, controlling for other factors: 0.01237
```


In [20]:
# Black-White gap in callback rates without a college degree (the main effect of a Black-sounding name)
ex8_black_nocollege = round((coeff_black * 100), 2)  # Scaling to percentage points

# Black-White gap in callback rates with a college degree (main effect plus interaction term)
ex8_black_college = round(
    (coeff_black + coeff_interaction_black_college) * 100, 2
)  # Scaling to percentage points

# A positive interaction term means the Black-White gap is smaller for college graduates
ex8_college_heterogeneity = (
    "less discrimination"
    if coeff_interaction_black_college > 0
    else "more discrimination"
)

print(
    f"Black applicants without a college degree are {abs(ex8_black_nocollege)} percentage points less likely than White applicants without a college degree to get a callback."
)
print(
    f"Black applicants with a college degree are {abs(ex8_black_college)} percentage points less likely than White applicants with a college degree to get a callback."
)
print(
    f"There is {ex8_college_heterogeneity} among applicants who have a college degree."
)

results["ex8_black_nocollege"] = ex8_black_nocollege
results["ex8_black_college"] = ex8_black_college
results["ex8_college_heterogeneity"] = ex8_college_heterogeneity

Black applicants without a college degree are 4.05 percentage points less likely than White applicants without a college degree to get a callback.
Black applicants with a college degree are 2.82 percentage points less likely than White applicants with a college degree to get a callback.
There is less discrimination among applicants who have a college degree.
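
To see why the Black-White gap for college graduates is the sum `coeff_black + coeff_interaction_black_college`, it can help to look at the interaction model without controls. Then `call ~ black * college` is saturated (one parameter per race-by-degree cell), so OLS reproduces the four cell callback rates exactly: `black` equals the gap among non-graduates and `black + black:college` equals the gap among graduates. The sketch below is a minimal, standard-library-only illustration on **synthetic** data; the callback probabilities, the 0/1 `college` indicator, and the helper names `solve`, `ols`, and `cell_mean` are hypothetical and not taken from the study. With controls added, as in the model above, the same sums give covariate-adjusted gaps rather than raw cell differences.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data only: hypothetical callback probabilities per (black, college) cell
random.seed(0)
true_rate = {(0, 0): 0.09, (1, 0): 0.05, (0, 1): 0.10, (1, 1): 0.07}
black, college, call = [], [], []
for _ in range(4000):
    b, c = random.randint(0, 1), random.randint(0, 1)
    black.append(b)
    college.append(c)
    call.append(1 if random.random() < true_rate[(b, c)] else 0)

# Design matrix for call ~ black * college: columns [1, black, college, black:college]
X = [[1, b, c, b * c] for b, c in zip(black, college)]
b0, b_black, b_college, b_inter = ols(X, call)


def cell_mean(bv, cv):
    """Raw callback rate in one race-by-degree cell."""
    ys = [y for b, c, y in zip(black, college, call) if b == bv and c == cv]
    return sum(ys) / len(ys)


gap_no_degree = cell_mean(1, 0) - cell_mean(0, 0)
gap_degree = cell_mean(1, 1) - cell_mean(0, 1)
print(f"black:               {b_black:.5f}  (raw gap without degree: {gap_no_degree:.5f})")
print(f"black + interaction: {b_black + b_inter:.5f}  (raw gap with degree:    {gap_degree:.5f})")
```

Each printed coefficient matches the corresponding raw difference in cell means, which is exactly the logic behind `ex8_black_nocollege` and `ex8_black_college`.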


### Exercise 9

Now let's compare men and women—is the penalty for having a Black-sounding name greater for Black men or Black women? Store your answer as `"greater discrimination for men"` or `"greater discrimination for women"` in `"ex9_gender_and_discrimination"`.

Focus on the coefficient values, even if the significance is low.

Again, please still include `education`, `yearsexp`, `female`, and `computerskills` as controls.

In [21]:
# Interact black with female so the penalty for a Black-sounding name can differ between men and women
model_gender_black = smf.ols(
    formula="call ~ black*female + yearsexp + computerskills + C(education)",
    data=resume,
).fit(cov_type="HC3")

model_gender_black.summary()

| Dep. Variable: | call | R-squared: | 0.008 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.006 |
| Method: | Least Squares | F-statistic: | 3.866 |
| Date: | Mon, 04 Mar 2024 | Prob (F-statistic): | 6.76e-05 |
| Time: | 21:32:35 | Log-Likelihood: | -551.0 |
| No. Observations: | 4870 | AIC: | 1122.0 |
| Df Residuals: | 4860 | BIC: | 1187.0 |
| Df Model: | 9 | | |
| Covariance Type: | HC3 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0807 | 0.040 | 1.996 | 0.046 | 0.001 | 0.160 |
| C(education)[T.1] | -0.0021 | 0.057 | -0.037 | 0.971 | -0.114 | 0.110 |
| C(education)[T.2] | -0.0001 | 0.042 | -0.003 | 0.998 | -0.082 | 0.082 |
| C(education)[T.3] | -0.0026 | 0.039 | -0.066 | 0.947 | -0.079 | 0.074 |
| C(education)[T.4] | -0.0048 | 0.038 | -0.125 | 0.900 | -0.080 | 0.070 |
| black | -0.0287 | 0.016 | -1.840 | 0.066 | -0.059 | 0.002 |
| female | 0.0131 | 0.014 | 0.919 | 0.358 | -0.015 | 0.041 |
| black:female | -0.0038 | 0.018 | -0.213 | 0.831 | -0.039 | 0.031 |
| yearsexp | 0.0032 | 0.001 | 3.668 | 0.000 | 0.001 | 0.005 |

| Omnibus: | 2950.616 | Durbin-Watson: | 1.448 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 18630.964 |
| Skew: | 3.047 | Prob(JB): | 0.0 |
| Kurtosis: | 10.395 | Cond. No. | 226.0 |


In [22]:
# Getting the coefficients for the interaction term and the main effect
# Getting the main effects and the interaction term by name (matching the summary table above)
coeff_black_new = model_gender_black.params["black"]
coeff_female = model_gender_black.params["female"]
coeff_interaction_black_female = model_gender_black.params["black:female"]

print(
    f"Coefficient for the difference in callback likelihood for Black vs. White male applicants, controlling for other factors: {round(coeff_black_new, 5)}"
)
print(
    f"Coefficient for the difference in callback likelihood for female vs. male White applicants, controlling for other factors: {round(coeff_female, 5)}"
)
print(
    f"Coefficient for the difference between the female-male gap for Black applicants and the female-male gap for White applicants, controlling for other factors: {round(coeff_interaction_black_female, 5)}"
)

Coefficient for the difference in callback likelihood for Black vs. White male applicants, controlling for other factors: -0.02866
Coefficient for the difference in callback likelihood for female vs. male White applicants, controlling for other factors: 0.01313
Coefficient for the difference between the female-male gap for Black applicants and the female-male gap for White applicants, controlling for other factors: -0.00384


In [23]:
# A negative interaction term means the Black-White gap is larger for women than for men
ex9_gender_and_discrimination = (
    "greater discrimination for men"
    if coeff_interaction_black_female > 0
    else "greater discrimination for women"
)

if ex9_gender_and_discrimination == "greater discrimination for men":
    print("The penalty for having a Black-sounding name is greater for Black men.")
else:
    print("The penalty for having a Black-sounding name is greater for Black women.")

results["ex9_gender_and_discrimination"] = ex9_gender_and_discrimination

The penalty for having a Black-sounding name is greater for Black women.


In [24]:
print(
    f"The interaction term of {round(coeff_interaction_black_female * 100, 2)} percentage points suggests that the callback penalty for a Black-sounding name is slightly larger for women than for men, relative to White applicants of the same gender."
)
print(
    "Therefore, the penalty for having a Black-sounding name is greater for Black women."
)

The interaction term of -0.38 percentage points suggests that the callback penalty for a Black-sounding name is slightly larger for women than for men, relative to White applicants of the same gender.
Therefore, the penalty for having a Black-sounding name is greater for Black women.
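
Because the exercise asks us to focus on point estimates even when significance is low, it may help to make two things explicit: the group-specific penalties implied by the coefficients, and how precisely the interaction is estimated. The sketch below plugs in the rounded coefficients printed above and the HC3 standard error of `black:female` from the summary table (0.018), and uses a normal-approximation Wald test, which is what the table's `z` and `P>|z|` columns report for `cov_type="HC3"`. Because the inputs are rounded, the results only approximately reproduce the table.

```python
from statistics import NormalDist

# Rounded estimates from the Exercise 9 output above
coef_black = -0.02866         # Black-White gap among men (female = 0)
coef_black_female = -0.00384  # additional gap among women (black:female)
se_black_female = 0.018       # HC3 standard error of black:female, as printed

# Implied Black-White gaps in percentage points
penalty_men_pp = coef_black * 100
penalty_women_pp = (coef_black + coef_black_female) * 100

# Wald z-test and 95% confidence interval for the interaction (normal approximation)
z = coef_black_female / se_black_female
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
z_crit = NormalDist().inv_cdf(0.975)
ci = (coef_black_female - z_crit * se_black_female,
      coef_black_female + z_crit * se_black_female)

print(f"Penalty for men:   {penalty_men_pp:.2f} pp")
print(f"Penalty for women: {penalty_women_pp:.2f} pp")
print(f"Interaction z = {z:.2f}, p = {p_value:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The implied penalties are about 2.87 percentage points for men and 3.25 for women, but the interaction's confidence interval comfortably spans zero (p is roughly 0.83), so the data cannot reliably distinguish the two penalties; the answer above rests on the sign of the point estimate alone.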


### Exercise 10

Calculate and/or look up the following online:

- What is the share of applicants in our dataset with college degrees?
- What share of Black adult Americans have college degrees (i.e., have completed a bachelor's degree)?

Is the share of Black applicants with college degrees in this data `"greater"`, or `"less"` than in the US? Store your answer as one of those strings in `"ex10_experiment_v_us"`

In [25]:
# Share of all applicants in the dataset with college degrees (education code 4 = college degree)
share_college_degrees = (resume["education"] == 4).mean()

# Share of Black applicants in the dataset with college degrees
share_black_college_degrees = (
    resume.loc[resume["black"] == 1, "education"] == 4
).mean()

# Print the results
print(
    f"Share of applicants with college degrees in this dataset: {share_college_degrees:.2%}"
)

print(
    f"Share of Black applicants with college degrees in this dataset: {share_black_college_degrees:.2%}"
)

Share of applicants with college degrees in this dataset: 71.95%
Share of Black applicants with college degrees in this dataset: 72.28%


> According to the [Census' American Community Survey](https://www.census.gov/programs-surveys/acs), in 2021, 12% of the total U.S. population identified as Black or African American. Among Black residents aged 25 or over, 22.6% had earned a bachelor's degree or higher. This rate is up from 17.9% in 2010 but falls short of the national rate of 32.9%.

In [26]:
national_share_black_college_degrees = 0.226  # ACS 2021, Black adults aged 25+
# Compare the dataset share with the national share
if share_black_college_degrees > national_share_black_college_degrees:
    ex10_experiment_v_us = "greater"
else:
    ex10_experiment_v_us = "less"


print(
    f"The share of Black applicants with college degrees in this data is {ex10_experiment_v_us} than in the US."
)

results["ex10_experiment_v_us"] = ex10_experiment_v_us

The share of Black applicants with college degrees in this data is greater than in the US.


### Exercise 11

Bearing in mind your answers to Exercise 8 and to Exercise 10, how do you think the Average Treatment Effect you estimated in Exercises 5 and 6 might generalize to the experience of the average Black American (i.e., how do you think the ATE for the average Black American would compare to the ATE estimated from this experiment)?


> - The analysis revealed a statistically significant negative effect of having a Black-sounding name on interview callback rates: applicants with Black-sounding names were 3.2 percentage points less likely to receive a callback than applicants with White-sounding names. The point estimates also suggest that name-based discrimination varies with educational attainment: Black applicants face a penalty with or without a college degree, but the estimated penalty is smaller for those with a degree (2.82 vs. 4.05 percentage points).
> - However, the share of Black applicants with college degrees in the dataset is far higher than the share among Black Americans nationally (72.28% in the dataset vs. 22.6% nationally). This discrepancy indicates that the sample is not representative of the broader Black American population in terms of educational attainment.
> - Because the sample over-represents college graduates, the ATE estimated in Exercises 5 and 6 may understate the discrimination faced by the broader Black population, particularly those without college degrees, who appear to face a larger penalty. The ATE for the average Black American is therefore likely to be larger in magnitude than the ATE estimated from this experiment.
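
One rough way to quantify the argument above is to reweight the degree-specific gaps from Exercise 8 by the sample's and the nation's shares of college graduates. This is a back-of-the-envelope sketch under strong assumptions: that the callback penalty depends on education only through degree status, that the degree-specific gaps estimated here would hold for Black job seekers nationally, and that the ACS share for Black adults aged 25+ approximates the share among Black job seekers. It also ignores sampling uncertainty in both gaps. The helper `weighted_gap` is hypothetical.

```python
# Black-White callback gaps by degree status, in percentage points (Exercise 8 results)
gap_by_degree = {"college": -2.82, "no_college": -4.05}


def weighted_gap(share_college):
    """Average the degree-specific gaps, weighting by a given share of college graduates.

    Assumes the penalty depends on education only through degree status.
    """
    return (share_college * gap_by_degree["college"]
            + (1 - share_college) * gap_by_degree["no_college"])


sample_share = 0.7228   # Black applicants with a degree in this dataset (Exercise 10)
national_share = 0.226  # Black adults aged 25+ with a bachelor's degree or higher (ACS 2021)

print(f"Gap at the sample's education mix:   {weighted_gap(sample_share):.2f} pp")
print(f"Gap at the national education mix:   {weighted_gap(national_share):.2f} pp")
```

At the sample's education mix the weighted gap is about -3.16 percentage points, roughly in line with the 3.2 percentage point ATE from Exercise 5; at the national mix it grows to about -3.77 percentage points, consistent with the expectation that the ATE for the average Black American is larger in magnitude.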

### Exercise 12

What does your answer to Exercise 10 imply about the study's *internal* validity?

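> My answer to Exercise 10 does not undermine the study's internal validity. Internal validity concerns whether the study correctly estimates the causal effect for the sample it studies. Because Black- and White-sounding names were randomly assigned to resumes, the difference in callback rates is a credible estimate of the effect of a Black-sounding name for these resumes and employers; an unrepresentative educational mix in the sample does not bias that comparison.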

### Exercise 13

What does your answer to Exercise 10 imply about the study's *external* validity?

> My answer to Exercise 10 implies that the study's external validity is limited, i.e., its findings may not carry over to the broader population of Black Americans. The sample's educational profile differs sharply from that of Black Americans as a whole (72.28% vs. 22.6% with a bachelor's degree), and the estimated penalty is larger for applicants without a college degree. The ATE estimated in Exercises 5 and 6 therefore likely understates the callback penalty facing the average Black American.

## What Did We Just Measure?

It's worth pausing for a moment to think about exactly what we've measured in this experiment. Was it the effect of race on hiring? Or the difference in the experience of the average White job applicant from the average Black job applicant?

Well... no. What we have measured in this experiment is **just** the effect of having a Black-sounding name (as opposed to a White-sounding name) on a resume on the likelihood of getting a follow-up call from employers hiring in Boston or Chicago, given otherwise identical resumes. In that sense, what we've measured is a small *piece* of the difference in the experience of Black and White Americans when seeking employment. As anyone looking for a job knows, getting a callback is a crucial step toward getting a job, so this difference, even if it is just one part of the overall difference, is remarkable.

In [27]:
results

{'ex2_pvalue_female': 0.38,
 'ex2_pvalue_computerskills': 0.03,
 'ex2_pvalue_yearsexp': 0.85,
 'ex3_pvalue_education': 0.49,
 'ex4_validity': 'internal',
 'ex5_white_advantage_percent': 49.68,
 'ex5_white_advantage_percentage_points': 3.2,
 'ex5_pvalue': 4e-05,
 'ex6_black_pvalue': 4e-05,
 'ex8_black_nocollege': -4.05,
 'ex8_black_college': -2.82,
 'ex8_college_heterogeneity': 'less discrimination',
 'ex9_gender_and_discrimination': 'greater discrimination for women',
 'ex10_experiment_v_us': 'greater'}

In [28]:
assert set(results.keys()) == {
    "ex2_pvalue_computerskills",
    "ex2_pvalue_female",
    "ex2_pvalue_yearsexp",
    "ex3_pvalue_education",
    "ex4_validity",
    "ex5_pvalue",
    "ex5_white_advantage_percent",
    "ex5_white_advantage_percentage_points",
    "ex6_black_pvalue",
    "ex8_black_college",
    "ex8_black_nocollege",
    "ex8_college_heterogeneity",
    "ex9_gender_and_discrimination",
    "ex10_experiment_v_us",
}