In [1]:
# Add all necessary imports here
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.reload_library()
plt.style.use("ggplot")

### Cover Slide 1

### Cover Slide 2

# Job description 

To succeed we need to understand the unique character of each of the world’s communities, what Facebook means or could mean to them, and how best to make our technology work for them. We’re looking for people with strong quantitative research skills to help in this effort. 

The ideal candidate will be a social scientist with expertise in quantitative research methodologies or a quantitative specialist with experience solving social problems. They’ll be comfortable improvising and have the ability to work cross-functionally and thrive in a fast-paced organization. All product groups across Facebook are hiring including Instagram.

### Responsibilities

- Help shape the research agenda and drive research projects from end-to-end
- Collaborate with product teams to define relevant questions about user growth and engagement
- Deploy appropriate quantitative methodologies to answer those questions
- Develop novel approaches where traditional methods won’t do
- Collaborate with qualitative researchers as needed and iterate quickly to generate usable insights for product and business decisions
- Deliver insights and recommendations clearly to relevant audiences

### Minimum Qualifications

- Ability to communicate analyses and results to any audience
- Master’s or Ph.D. in the social sciences (e.g., Psychology, Communication, Sociology, Political Science, Economics), or in a quantitative field (e.g., Statistics, Informatics, Econometrics) with experience answering social questions
- Knowledge in data manipulation and analysis (R/SAS/Stata, SQL/Hive)
- Knowledge in quantitative research methodologies (e.g., survey sampling and design, significance testing, regression modeling, experimental design, behavioral data analysis)

### Preferred Qualifications

- Experience with Unix, Python, and large datasets (> 1TB)

# Presentation Guidelines 

- Conversational 
- Tell the “story” of 1 impactful research project
- Tell us your role on the project, your research plan, implementation of the method, findings, and what happened as a result of the research
- You can also discuss any constraints, stakeholders, teammates, roadblocks that occurred, etc.
- Your project should demonstrate how your research had an impact on the product or the subject at hand. 
- Dedicate ~10 minutes addressing how you would change your research design if you had the opportunity to do it again – this time with unlimited resources
- Please describe how your research could be applied to make real-world changes, and describe what your recommendations for those changes would be. 


## More details

- Include 1 project that you are most proud of and that you consider relevant to the work done at the company. Provide adequate detail about your role, the team, project duration, etc. Within the details you share, consider highlighting the following areas:
- How the project influenced your company or the industry
- Your approach and contribution to the team
- How you helped enable success for your team
- We suggest you tell the “story” of the project. Think like you are already a Researcher here—consider the challenges, the scope, and the methodologies.
- Also, we’d like to hear what you would have done differently if given the opportunity, this time with unlimited resources.
- You will be presenting to 3-4 researchers who will also be interviewing you one-on-one later in the day.
- Be prepared to speak to your choice of method(s), participants, analysis and results, as well as the possibility of entertaining alternative criteria proposed by the audience.
- Please provide sufficient methodological detail so that your mastery of the method(s) is obvious.
- Your presentation should fit comfortably into a 45-minute timeframe, and should include approximately 3-5 minutes to tell us about yourself and 5 minutes for questions. Note that questions are typically asked as they arise, as opposed to at the end of the presentation


# About Me

- PhD Sociology/Demography at UW-Madison

- Consultancy within and outside academia

- Data Science projects

- [http://sdaza.com/](http://sdaza.com/)

# Consultancy

<img src="img/consultancy.png" align="middle" style="margin:10px 0px 0px 0px">


# PhD Journey!

<img src="img/researchProjects.png" align="middle" style="margin:10px 0px 0px 0px">


# Consultancy & PhD

- [Applied Population Lab, UW-Madison](https://apl.wisc.edu/)
- **acsr**: R package to extract and compute statistics from the ACS and US Census (https://github.com/sdaza/acsr)

<img src="img/poverty.png" align="middle" style="margin:10px 0px 0px 100px">



# Consultancy & PhD

- [Center for Applied Social Studies, Catholic University](http://sociologia.uc.cl/desuc/)
- Sampling design, weighting, non-response, multiple imputation
- **sampler**: R package to design samples adjusting for DEFF (https://github.com/sdaza/sampler)

# Data Science

- Data Incubator Fellow
- [Tracking congress member tweets](https://sdaza-capstone.herokuapp.com/)

<img src="img/track.png" align="middle" width="600" height="600" style="margin:10px 0px 0px 50px">

# Why Facebook?

- Make a positive social impact

- Impact millions of users

- Solve social problems from an applied perspective

- Team culture, diversity

# American exceptionalism 

![rates](img/incarceration_rates.jpg)

# Expansion of Punishment

<img src="img/growth_incarceration.png" align="middle" width="500" height="500" style="margin:10px 0px 0px 50px">


# US Health Disadvantage

<img src="img/health_disadvantage.png" align="middle" width="700" height="700" style="margin:10px 0px 0px -10px">


# Why?

<img src="img/incarceration_mortality.png" align="middle" width="800" height="800" style="margin:10px 0px 0px -10px">

# Research Questions

- What is the long-term association between imprisonment and mortality in the US?

- How much of the gap in mortality between the US and the UK can be attributed to different incarceration regimes?

# Approach

<img src="img/strategy.png" align="middle" width="600" height="600" style="margin:10px 0px 0px 50px">

# Data

- **Panel Study of Income Dynamics (PSID)**

    - Since 1968, the survey has followed the same families
    
    - The first wave included roughly 5,000 families (18,000 individuals)
    
    - Recent waves have about 9,000 families (22,000 individuals)

-   **<span style="color:gray">National Longitudinal Survey of Youth 1979 (NLSY)</span>**

    - <span style="color:gray">12,686 respondents ages 14-22 when first interviewed in 1979</span>

# Measures

- **Mortality**

    - Year of death from the National Death Index (NDI) and non-response records
    
    - 6,457 deaths

- **Incarceration**

    - Reports of whether a member of a household was incarcerated (n=630)
    
    - 1995 wave: have ever served time in jail or prison (n=836)

- **Incarceration PSID**
<img src="img/imprisonment_psid.png" align="middle" width="700" height="700" style="margin:10px 0px 0px 0px">


- **Covariates**

    - Age, gender, race, educational attainment, household income, and health

# Statistical Model

- **Survival Parametric Models (Gompertz)**
    - Validate the data setup against the underlying US population

### Survival

- The hazard function h(t) gives the instantaneous potential per unit time for the event to occur, given that the individual has survived up to time t.
- Independent censoring essentially means that within any subgroup of interest, the subjects who are censored at time t should be representative of all the subjects in that subgroup who remained at risk at time t with respect to their survival experience.
- Non-informative censoring occurs if the distribution of survival times (T) provides no information about the distribution of censorship times (C), and vice versa.

# Gompertz 

- The Gompertz model specifies a hazard (rate) that increases exponentially with time
- The Gompertz model usually fits old-age mortality very well
- Verify that the adult mortality pattern found in the data (constant and slope of the Gompertz model) is consistent with US data
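
The Gompertz check above can be sketched in a few lines. This is a minimal illustration, not the model fitted in the project: the parameter values `a` and `b` are hypothetical, and the function names are made up for the example. It shows that the log-hazard is linear in age, so the "constant" and "slope" can be read off a straight-line fit.

```python
import numpy as np

def gompertz_hazard(age, a, b):
    """Gompertz hazard h(x) = a * exp(b * x): exponential growth with age."""
    return a * np.exp(b * np.asarray(age, dtype=float))

def gompertz_survival(age, a, b):
    """Survival S(x) = exp(-(a / b) * (exp(b * x) - 1)), from integrating h."""
    age = np.asarray(age, dtype=float)
    return np.exp(-(a / b) * (np.exp(b * age) - 1.0))

# Hypothetical parameter values, for illustration only
a, b = 1e-4, 0.09
ages = np.arange(50, 91, 10)  # ages 50, 60, 70, 80, 90

hazards = gompertz_hazard(ages, a, b)
survival = gompertz_survival(ages, a, b)

# log h(x) = log(a) + b * x, so a linear fit recovers constant and slope
slope, constant = np.polyfit(ages, np.log(hazards), 1)
```

Comparing the fitted `constant` and `slope` with values estimated from national life tables is one way to run the consistency check described in the last bullet.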

- **Survival Semi-parametric Models (Cox)**

   - Unobserved heterogeneity (gamma frailty, family identifier)

   - Non-proportional hazard adjustments
   
   - Marginal Structural Models (MSM)
        - Attrition (non-independent censoring)
        - Time-varying confounders
        

## Cox

- The Cox model is semi-parametric: even if the regression parameters (the betas) are known, the distribution of the outcome remains unknown, because the baseline survival (or hazard) function is not specified.
- A key reason why the Cox model is widely popular is that it does not rely on distributional assumptions for the outcome.
- The Cox proportional hazards model is a *robust* model: its results closely approximate the results of the correct parametric model.
- It is a proportional hazards model: the hazard ratio is assumed constant over time.
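
To make the "baseline hazard is not specified" point concrete, here is a minimal sketch of the Cox partial likelihood. It assumes no tied event times, uses a toy dataset invented for the example, and finds the coefficient with a simple grid search instead of a proper optimizer. The baseline hazard never appears in the calculation: only the ordering of event times and the covariates of those still at risk matter.

```python
import numpy as np

def cox_neg_log_partial_likelihood(beta, times, events, X):
    """Negative log partial likelihood, assuming no tied event times."""
    beta = np.atleast_1d(beta)
    linear_predictor = X @ beta
    nll = 0.0
    for i in np.flatnonzero(events):
        # Risk set: everyone still under observation at the i-th event time
        at_risk = times >= times[i]
        nll -= linear_predictor[i] - np.log(np.sum(np.exp(linear_predictor[at_risk])))
    return nll

# Toy data (hypothetical): follow-up time, event indicator, binary exposure
times = np.array([2.0, 3.0, 5.0, 7.0, 8.0, 10.0])
events = np.array([1, 1, 0, 1, 1, 0])
X = np.array([[1.0], [0.0], [1.0], [1.0], [0.0], [0.0]])

# Grid search over beta; a real analysis would use a survival library
betas = np.linspace(-3.0, 3.0, 601)
losses = [cox_neg_log_partial_likelihood(b, times, events, X) for b in betas]
beta_hat = betas[int(np.argmin(losses))]
hazard_ratio = np.exp(beta_hat)
```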

### MSM

- A time-dependent Cox proportional hazards model can give biased estimates when:

1. There exists a time-dependent covariate that is both a risk factor for mortality and also predicts subsequent exposure
2. Past exposure history predicts the risk factor

Covariates satisfying condition 1 are called time-dependent confounders.

For example, past CD4 count is a time-dependent confounder for the effect of zidovudine on survival, because it is a risk factor for mortality and a predictor of subsequent initiation of zidovudine therapy, and past zidovudine history is an independent predictor of subsequent CD4 count. In fact, all standard methods (for example, Cox or Poisson regression) that predict the mortality rate at each time using a summary of zidovudine history up to that time may produce biased estimates of the causal effect of zidovudine whether or not one adjusts for past CD4 count in the analysis.
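
Marginal structural models usually handle this problem with inverse probability weights. The sketch below shows only the weight construction (stabilized weights as a cumulative product over time within each person). It assumes the per-period probabilities of the observed exposure have already been estimated by two models: a numerator model without time-varying confounders and a denominator model with them. The column names and the numbers are hypothetical.

```python
import pandas as pd

def stabilized_weights(df, id_col, time_col, p_num_col, p_den_col):
    """Cumulative product, within person, of numerator / denominator probabilities."""
    out = df.sort_values([id_col, time_col]).copy()
    ratio = out[p_num_col] / out[p_den_col]
    out["sw"] = ratio.groupby(out[id_col]).cumprod()
    return out

# Hypothetical person-period data with already-estimated probabilities
person_periods = pd.DataFrame({
    "pid":   [1, 1, 2, 2],
    "wave":  [1, 2, 1, 2],
    "p_num": [0.50, 0.50, 0.50, 0.40],  # P(observed exposure | baseline covariates)
    "p_den": [0.25, 0.50, 0.50, 0.80],  # P(observed exposure | baseline + time-varying)
})

weighted = stabilized_weights(person_periods, "pid", "wave", "p_num", "p_den")
```

The resulting `sw` column would then be used as a weight in the outcome model. Weights for attrition (censoring) can be built the same way and multiplied in.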

# Sensitivity Analysis

- Specifications

- Missing data
    - Last observation carried forward (LOCF) and backwards
    - Multiple imputation (100 imputations)

- Sampling weights
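
A minimal sketch of the carry-forward and carry-backward step on panel data, using pandas. The data frame and column names are hypothetical. Values are filled within each person only, first forward in time and then backward for leading gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format panel: one row per person and wave
panel = pd.DataFrame({
    "pid":    [1, 1, 1, 1, 2, 2, 2],
    "wave":   [1, 2, 3, 4, 1, 2, 3],
    "income": [np.nan, 30.0, np.nan, 35.0, 20.0, np.nan, np.nan],
})

panel = panel.sort_values(["pid", "wave"])

# Forward fill (LOCF), then backward fill, never crossing person boundaries
panel["income_filled"] = (
    panel.groupby("pid")["income"].transform(lambda s: s.ffill().bfill())
)
```

Because this treats filled values as if they were observed, it understates uncertainty, which is one reason to compare it against multiple imputation.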

# Decomposition

How much of the gap in mortality between the US and the UK can be attributed to differential imprisonment experiences?

$$ D_{(x)} = MU_{(x)} - mu_{(x)} $$

$$ MU_{(x)} = MUo_{(x)} \times (P_{(x)} \times (E-1) + 1) =  MUo_{(x)} \times H_{(x)}$$

$$ mu_{(x)} = muo_{(x)} \times (p_{(x)} \times(E-1) + 1) = muo_{(x)} \times h_{(x)}$$
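
The three equations translate directly into code. This sketch assumes that uppercase symbols refer to the US and lowercase to the UK, that `P` and `p` are the shares of each population with prison experience, and that `E` is the relative mortality risk associated with imprisonment. All numeric values are hypothetical.

```python
def exposure_multiplier(prevalence, effect):
    """H(x) = P(x) * (E - 1) + 1: average excess risk in a partly exposed population."""
    return prevalence * (effect - 1.0) + 1.0

def observed_rate(baseline_rate, prevalence, effect):
    """MU(x) = MUo(x) * H(x)."""
    return baseline_rate * exposure_multiplier(prevalence, effect)

# Hypothetical inputs for a single age x
E = 2.0                    # relative risk for those ever imprisoned
MUo, P = 0.012, 0.05       # US baseline rate and prevalence of imprisonment
muo, p = 0.009, 0.02       # UK baseline rate and prevalence of imprisonment

MU = observed_rate(MUo, P, E)
mu = observed_rate(muo, p, E)
D = MU - mu                # mortality gap at age x
```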

# Decomposition

$$D_{(x)} = A_{(x)} +B_{(x)}$$

- $A_{(x)}$ is the contribution of the population who has not experienced prison 

- $B_{(x)}$ is the contribution of the population who has been in prison

- $\frac{B_{(x)}}{D_{(x)}}$ is the fraction of the difference attributable to imprisonment
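
The slides do not spell out how $A_{(x)}$ and $B_{(x)}$ are computed. One reading that matches the descriptions above is to split each country's rate into the part contributed by people without prison experience, baseline rate times (1 - prevalence), and the part contributed by people with prison experience, baseline rate times E times prevalence. The sketch below uses that assumption, with hypothetical numbers and a hypothetical function name.

```python
def decompose_gap(MUo, muo, P, p, E):
    """Split D = MU - mu into never-imprisoned (A) and ever-imprisoned (B) parts.

    Assumes rate = baseline * (1 - prevalence) + baseline * E * prevalence,
    which is algebraically equal to baseline * (prevalence * (E - 1) + 1).
    """
    A = MUo * (1.0 - P) - muo * (1.0 - p)
    B = E * (MUo * P - muo * p)
    return A, B

# Hypothetical inputs for a single age x
MUo, muo = 0.012, 0.009   # baseline rates (US, UK)
P, p = 0.05, 0.02         # prevalence of prison experience (US, UK)
E = 2.0                   # relative risk for those ever imprisoned

A, B = decompose_gap(MUo, muo, P, p, E)
D = A + B
share_imprisonment = B / D
```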

# Models

<img src="img/table.png" align="middle" width="630" height="630"  style="margin:-25px 0px 0px 0px">

### Interpretation

- Relative risk or hazard ratio
- exp(0.84) = 2.32
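
The conversion from a model coefficient to a hazard ratio is a one-liner. The coefficient 0.84 comes from the slide; the variable names are only for illustration.

```python
import math

log_hazard_coefficient = 0.84                       # coefficient on the log-hazard scale
hazard_ratio = math.exp(log_hazard_coefficient)     # multiplicative effect on the hazard
percent_increase = (hazard_ratio - 1.0) * 100.0     # same effect as a percentage
```

A hazard ratio of about 2.32 means the mortality hazard is roughly 2.3 times that of the reference group at any given time, holding the other covariates fixed.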


# Incidence Mortality

<img src="img/curve_mortality.png" align="middle" width="630" height="630"  style="margin:0px 0px 0px 0px">

### Transition probabilities

- The probability of dying, given that the respondent survived until time t and was incarcerated.
- Predicted probabilities for a Black male with less than high school education, average income, and not in poor health.
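
A sketch of how such a conditional probability can be derived from a fitted hazard model. It assumes a Gompertz baseline with a proportional effect of incarceration. The baseline parameters are hypothetical, and 0.84 is the coefficient quoted in the interpretation notes.

```python
import math

def cumulative_hazard(age, a, b, log_hr=0.0):
    """Gompertz cumulative hazard, scaled by a proportional effect exp(log_hr)."""
    return math.exp(log_hr) * (a / b) * (math.exp(b * age) - 1.0)

def prob_dying(age, a, b, log_hr=0.0, width=1.0):
    """P(death in [age, age + width) | alive at age) = 1 - S(age + width) / S(age)."""
    delta = cumulative_hazard(age + width, a, b, log_hr) - cumulative_hazard(age, a, b, log_hr)
    return 1.0 - math.exp(-delta)

# Hypothetical baseline parameters for the reference profile
a, b = 1e-4, 0.09

q_not_incarcerated = prob_dying(60, a, b)
q_incarcerated = prob_dying(60, a, b, log_hr=0.84)
```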

# Mortality Gap US - UK

<img src="img/decomp.png" align="middle" width="630" height="630"  style="margin:0px 0px 0px 0px">

# Conclusions

- Incarceration is associated with a moderate increase in the risk of mortality

    - Losses of life expectancy at age 50 of about 4 years or 12% of current U.S. life expectancy    

- The fraction of the mortality gap between the US and the UK that can be attributed to imprisonment experience ranges from **2%** to **10%**

# Limitations

- Heterogeneity / Interactions / Sample size

- Causality

- Bayesian approach

    - Previous knowledge (priors)
    
    - Model Uncertainty: BMA or stacking

# Limitations

- Measurement of incarceration

    - Underestimation?
    
    - Clear treatment


- UK: better data

### Q&A Slide