# PUI 2018 - HOMEWORK 6 - Part 3
## Effectiveness of Post-Prison Employment Programs in NYC
Principles of Urban Informatics, Fall 2018 class
@ New York University's Center for Urban Science & Progress
Taught by Federica Bianco

October 16, 2018
Author: Zoe Martiniak (zem232)

Reproducing the results reported in OPRE's "Final Results of the Hard-to-Employ Demonstration and Evaluation Project and Selected Sites from the Employment Retention and Advancement Project" report, March 2012.

In [1]:
import os
import sys
import numpy as np
import pylab as pl
%pylab inline

from IPython.display import Image


Populating the interactive namespace from numpy and matplotlib


## Part 1. Null Hypothesis:

the % of former prisoners employed after release is the same or lower for candidates who participated in the program as for the control group, significance level $\alpha=0.05$

0 represents the control group, and 1 represents the program group. 

$$H_0 : P_0-P_1\geq0$$
$$H_a : P_0-P_1<0$$
$$ \alpha=0.05$$

Our hypothesis represents a test of proportions. The Binomial distribution applies since it is a yes/no (boolean) test for each subject: the former inmate was or was not employed in a CEO transitional job. The following results indicate the percent of former prisoners who were ever employed in a CEO transitional job (Years 1-3):

$$ P_0=0.035, \ P_1=0.701$$

### Total employed after release from prison (first row, "Ever employed", in Employment (Years 1-3))
The test below uses overall employment rather than CEO transitional jobs. The following results indicate the percent of former prisoners who were ever employed after release:

$$ P_0=0.704, \ P_1=0.838$$

In [2]:
# Setting significance threshold
alpha=0.05

# Converting 0 for control group, 1 for program group

P_0 = 70.4 * 0.01 
P_1 = 83.8 * 0.01

if P_0 - P_1 >= 0:
    # we are done
    print ("the Null holds")
else:
    print ("we must assess the statistical significance")

#sample sizes 

n_0 = 409
n_1 = 564
    
# let's get the counts by multiplying the proportions by the sample sizes

Nt_0 = P_0 * n_0
Nt_1 = P_1 * n_1

we must assess the statistical significance


In [3]:
# defining sample proportion
sp = (P_0 * n_0 + P_1 * n_1) / (n_1 + n_0)
print (sp)

0.781673175745


In [4]:
# Functions to calculate p & the standard error

P = lambda p0, p1, n0, n1: (p0 * n0 + p1 * n1) / (n0 + n1)
#standard error
se = lambda p, n0, n1: np.sqrt(p * (1 - p) * (1.0 / n0 + 1.0 / n1))

In [6]:
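# z-score of the difference between two proportions
zscore=lambda p0,p1,s:(p0-p1)/s
# program minus control, so a positive z supports H_a (P_1 > P_0)
z_2y = zscore(P_1,P_0, se(P(P_0,P_1,n_0,n_1),n_0,n_1))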
print(z_2y)

4.99440125464


In [7]:
# one-sided p-value: upper-tail probability of the standard normal at z
from scipy.stats import norm
p_2y = norm.sf(z_2y)


def report_result(p,a):
    print ('is the p value ' + 
           '{0:.2f} smaller than the critical value {1:.2f}?'.format(p,a))
    if p < a:
        print ("YES!")
    else: 
        print ("NO!")
    
    print ('the Null hypothesis is {}'.format(\
                            'rejected' if p < a  else 'not rejected') )

    
report_result(p_2y, alpha)


is the p value 0.00 smaller than the critical value 0.05?
YES!
the Null hypothesis is rejected
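
The Part 1 test can be cross-checked in one self-contained step. The sketch below is not the notebook's own code: the function name `two_proportion_ztest` and its argument names are made up for illustration, and the p-value comes from `math.erfc` instead of a table lookup. It assumes the same pooled standard error used above.

```python
import math

def two_proportion_ztest(p_control, p_program, n_control, n_program):
    """One-sided z-test for H_a: p_program > p_control, using the pooled standard error."""
    pooled = (p_control * n_control + p_program * n_program) / (n_control + n_program)
    std_err = math.sqrt(pooled * (1 - pooled) * (1.0 / n_control + 1.0 / n_program))
    z = (p_program - p_control) / std_err
    # upper-tail probability of a standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# proportions and sample sizes from Part 1
z, p_value = two_proportion_ztest(0.704, 0.838, 409, 564)
print("z = {0:.2f}, p-value = {1:.1e}".format(z, p_value))
```

Returning both the z-score and the p-value keeps the decision rule (`p_value < alpha`) separate from the calculation.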


## Part 2. Null Hypothesis:
the % of former prisoners convicted of a felony after release is the same or higher for candidates who participated in the program as for the control group, significance level $\alpha=0.05$

0 represents the control group, and 1 represents the program group. 

Since a lower conviction rate is the desired outcome, the alternative hypothesis is that the program group's rate is lower than the control group's:

$$H_0 : P_0-P_1\leq0$$
$$H_a : P_0-P_1>0$$
$$ \alpha=0.05$$

Our hypothesis represents a test of proportions. The Binomial distribution applies since it is a yes/no (boolean) test for each subject: the former inmate was or was not convicted of a felony. The following results indicate the percent of former prisoners who were convicted of a felony after release:

$$ P_0=0.117, \ P_1=0.10$$


In [8]:
alpha = 0.05

# 0 for control group, 1 for program group
P_0 = 11.7 * 0.01
P_1 = 10.0 * 0.01

if P_0 - P_1 <= 0:
    # we are done
    print ("the Null holds")
else:
    print ("we must assess the statistical significance")

# sample sizes
n_0 = 409
n_1 = 564

# counts: proportions multiplied by the sample sizes
Nt_0 = P_0 * n_0
Nt_1 = P_1 * n_1

we must assess the statistical significance


In [9]:
# pooled sample proportion
sp = (P_0 * n_0 + P_1 * n_1) / (n_1 + n_0)
print (round(sp, 3))

0.107


In [31]:
# control minus program, so a positive z supports H_a (P_0 > P_1);
# zscore, P and se are the functions defined in Part 1
z_felony = zscore(P_0, P_1, se(P(P_0, P_1, n_0, n_1), n_0, n_1))
print (round(z_felony, 2))

0.85


In [32]:
# one-sided p-value: upper-tail probability of the standard normal at z
from scipy.stats import norm
p_felony = norm.sf(z_felony)

# report_result is the function defined in Part 1
report_result(p_felony, alpha)

is the p value 0.20 smaller than the critical value 0.05?
NO!
the Null hypothesis is not rejected


## Part 3: Chi Square for Employment
### Contingency table



| Post-release Employment | Employed | Not Employed | Total |
|---------------------------|----------|--------------|--------|
| Test Sample   | 0.838 x 564 = 472.63 | 0.162 x 564 = 91.37 | 564 |
| Control Sample  | 0.704 x 409 = 287.94 | 0.296 x 409 = 121.06 | 409 |
| Total               | 760.57 | 212.43 | 973 |

Chi-square statistic:
$\chi^2 = \sum_i \frac{(f_{i,observed} - f_{i,expected})^2}{f_{i,expected}}$
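
As a rough illustration of the formula, the sketch below computes the statistic directly from observed and expected counts and compares it with the 2x2 shortcut formula. The function names and the `toy` table are hypothetical, chosen only for this example. Expected counts are taken as row total times column total divided by the grand total, which is the usual independence assumption.

```python
def chisq_from_definition(observed):
    """Sum of (observed - expected)^2 / expected over all cells of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = float(sum(row_totals))
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f_obs in enumerate(row):
            # expected count if rows and columns were independent
            f_exp = row_totals[i] * col_totals[j] / total
            stat += (f_obs - f_exp) ** 2 / f_exp
    return stat

def chisq_shortcut_2x2(observed):
    """Closed-form chi-square for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = observed
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

# made-up counts, not taken from the report
toy = [[30.0, 10.0], [20.0, 40.0]]
print(chisq_from_definition(toy), chisq_shortcut_2x2(toy))
```

Both functions should agree up to floating-point rounding, which is why the shortcut is safe to use for 2x2 tables.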


In [25]:
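# ct = 2x2 contingency table (numpy array)
def evalChisq(ct):
    # shortcut formula for a 2x2 table: N * (ad - bc)^2 / (product of row and column totals)
    return ((ct[0,0]*ct[1,1] - ct[1,0]*ct[0,1])**2 * ct.sum()
            / (ct[0].sum() * ct[1].sum() * ct[:,0].sum() * ct[:,1].sum()))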
    

In [28]:
# CEO transitional job: rows = program, control; columns = employed, not employed
sample_values = np.array([[0.701 * 564, 0.299 * 564], [0.035 * 409, 0.965 * 409]])
print (round(evalChisq(sample_values), 1))

431.4


In [29]:
employment = np.array([[0.838 * 564, 0.162 * 564], [0.704 * 409, 0.296 * 409]])
print (evalChisq(employment))

24.9440438924
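
For a 2x2 table, the chi-square statistic without continuity correction equals the square of the pooled two-proportion z-score, so this value can be checked against the z-score from Part 1. A quick sketch that recomputes both quantities from the same proportions and sample sizes (all variable names are local to this example):

```python
import math

p_control, p_program = 0.704, 0.838
n_control, n_program = 409, 564

# pooled two-proportion z-score, as in Part 1
pooled = (p_control * n_control + p_program * n_program) / (n_control + n_program)
std_err = math.sqrt(pooled * (1 - pooled) * (1.0 / n_control + 1.0 / n_program))
z = (p_program - p_control) / std_err

# 2x2 contingency table cells: program row (a, b), control row (c, d)
a, b = p_program * n_program, (1 - p_program) * n_program
c, d = p_control * n_control, (1 - p_control) * n_control
n = a + b + c + d
chisq = (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

print("z^2 = {0:.3f}, chi-square = {1:.3f}".format(z ** 2, chisq))
```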


In [39]:
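# degrees of freedom for an r x c table: (r - 1) * (c - 1)
DOF = (employment.shape[0] - 1) * (employment.shape[1] - 1)
x2 = evalChisq(employment)
# critical value of the chi-square distribution for DOF = 1 at alpha = 0.05
cv = 3.841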
def chi_result(x2,cv):
    print ('Does the chisq statistic value ' + 
           '{0:.2f} exceed the critical value {1:.2f}?'.format(x2,cv))
    if x2 > cv:
        print ("YES!")
    else: 
        print ("NO!")
    
    print ('the Null hypothesis is {}'.format(\
                            'rejected' if x2 > cv  else 'not rejected') )

chi_result(x2,cv)

Does the chisq statistic value 24.94 exceed the critical value 3.84?
YES!
the Null hypothesis is rejected


The chi-square statistic exceeds the critical value for 1 degree of freedom at $\alpha=0.05$ (the 95th percentile of the chi-square distribution), so we have significant evidence to reject the null hypothesis that the variables are independent (i.e. there is an association between employment after prison release and enrollment in the program).
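
The critical value 3.841 need not be read from a table. With 1 degree of freedom, a chi-square variable is the square of a standard normal, so both the critical value and a p-value can be sketched with the standard library alone. This assumes Python 3.8+ for `statistics.NormalDist`; the helper name `chisq_pvalue_1dof` is hypothetical.

```python
import math
from statistics import NormalDist  # available from Python 3.8

alpha = 0.05
# upper-alpha critical value for 1 degree of freedom:
# the square of the two-sided standard normal quantile
critical_value = NormalDist().inv_cdf(1 - alpha / 2) ** 2

def chisq_pvalue_1dof(x2):
    """Upper-tail probability P(X > x2) for a chi-square variable with 1 degree of freedom."""
    # P(Z^2 > x2) = P(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2))
    return math.erfc(math.sqrt(x2 / 2.0))

print("critical value = {0:.3f}".format(critical_value))
print("p-value for 24.94 = {0:.1e}".format(chisq_pvalue_1dof(24.94)))
```

This shortcut only holds for 1 degree of freedom; larger tables need the general chi-square distribution.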

## Part 4: Chi Square for Felony Recidivism
### Contingency table

| Convicted of a felony | Yes | No | Total |
|---------------------------|-----------|-----------|----------------|
| Test Sample   | 0.10 x 564 = 56.4 | 0.90 x 564 = 507.6 | 564 |
| Control Sample  | 0.117 x 409 = 47.9 | 0.883 x 409 = 361.1 | 409 |
| Total               | 104.3 | 868.7 | 973 |

In [47]:
# rows = program, control; columns = convicted, not convicted
felonies = np.array([[0.1 * 564, 0.9 * 564], [0.117 * 409, 0.883 * 409]])
print (round(evalChisq(felonies), 2))

0.72


In [48]:
x2_felonies=evalChisq(felonies)
cv=3.841
chi_result(x2_felonies,cv)

Does the chisq statistic value 0.72 exceed the critical value 3.84?
NO!
the Null hypothesis is not rejected


The chi-square statistic does not exceed the critical value, so we cannot reject the null hypothesis that the variables are independent (i.e. these data show no significant association between felony recidivism and enrollment in the program).