# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources (README.md file)
- Happy learning!

In [225]:
# import numpy and pandas

import pandas as pd
import numpy as np
import scipy.stats as st
from scipy.stats import t

In [3]:
# Video support https://www.youtube.com/watch?v=zJ8e_wAWUzE

# Challenge 1 - Exploring the Data

In this challenge, we will examine all salaries of employees of the City of Chicago. We will start by loading the dataset and examining its contents.

In [4]:
# Your code here:

data = pd.read_csv('Current_Employee_Names__Salaries__and_Position_Titles.csv')
data.head(5)

Unnamed: 0,Name,Job Titles,Department,Full or Part-Time,Salary or Hourly,Typical Hours,Annual Salary,Hourly Rate
0,"AARON, JEFFERY M",SERGEANT,POLICE,F,Salary,,101442.0,
1,"AARON, KARINA",POLICE OFFICER (ASSIGNED AS DETECTIVE),POLICE,F,Salary,,94122.0,
2,"AARON, KIMBERLEI R",CHIEF CONTRACT EXPEDITER,GENERAL SERVICES,F,Salary,,101592.0,
3,"ABAD JR, VICENTE M",CIVIL ENGINEER IV,WATER MGMNT,F,Salary,,110064.0,
4,"ABASCAL, REECE E",TRAFFIC CONTROL AIDE-HOURLY,OEMC,P,Hourly,20.0,,19.86


Examine the salaries dataset (loaded here as `data`) using the `head` function below.

In [5]:
# Your code here:
data.head(5)


Unnamed: 0,Name,Job Titles,Department,Full or Part-Time,Salary or Hourly,Typical Hours,Annual Salary,Hourly Rate
0,"AARON, JEFFERY M",SERGEANT,POLICE,F,Salary,,101442.0,
1,"AARON, KARINA",POLICE OFFICER (ASSIGNED AS DETECTIVE),POLICE,F,Salary,,94122.0,
2,"AARON, KIMBERLEI R",CHIEF CONTRACT EXPEDITER,GENERAL SERVICES,F,Salary,,101592.0,
3,"ABAD JR, VICENTE M",CIVIL ENGINEER IV,WATER MGMNT,F,Salary,,110064.0,
4,"ABASCAL, REECE E",TRAFFIC CONTROL AIDE-HOURLY,OEMC,P,Hourly,20.0,,19.86


In [6]:
data.shape

(33183, 8)

In [7]:
data.dtypes

Name                  object
Job Titles            object
Department            object
Full or Part-Time     object
Salary or Hourly      object
Typical Hours        float64
Annual Salary        float64
Hourly Rate          float64
dtype: object

The output of `head` shows quite a bit of missing data. Let's examine how much data is missing in each column. Produce this output in the cell below.

In [8]:
# Your code here:

data.isnull().sum()


Name                     0
Job Titles               0
Department               0
Full or Part-Time        0
Salary or Hourly         0
Typical Hours        25161
Annual Salary         8022
Hourly Rate          25161
dtype: int64

Let's also look at the count of hourly vs. salaried employees. Write the code in the cell below.

In [9]:
# Your code here:
data['Salary or Hourly'].unique()


array(['Salary', 'Hourly'], dtype=object)

In [10]:
data['Salary or Hourly'].value_counts()

Salary    25161
Hourly     8022
Name: Salary or Hourly, dtype: int64

In [11]:
data.groupby(['Salary or Hourly']).agg({'Typical Hours':'count','Annual Salary':"count"})

Unnamed: 0_level_0,Typical Hours,Annual Salary
Salary or Hourly,Unnamed: 1_level_1,Unnamed: 2_level_1
Hourly,8022,0
Salary,0,25161


In [12]:
# Check where the missing values are, split by employee type
data.isnull().groupby(data['Salary or Hourly']).sum().astype(int)


Unnamed: 0_level_0,Name,Job Titles,Department,Full or Part-Time,Salary or Hourly,Typical Hours,Annual Salary,Hourly Rate
Salary or Hourly,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Hourly,0,0,0,0,0,0,8022,0
Salary,0,0,0,0,0,25161,0,25161


What this information indicates is that the table contains information about two types of employees - salaried and hourly. Some columns apply only to one type of employee while other columns only apply to another kind. This is why there are so many missing values. Therefore, we will not do anything to handle the missing values.

There are different departments in the city. List all departments and the count of employees in each department.

In [13]:
# Your code here:
data['Department'].unique()

array(['POLICE', 'GENERAL SERVICES', 'WATER MGMNT', 'OEMC',
       'CITY COUNCIL', 'AVIATION', 'STREETS & SAN', 'FIRE',
       'FAMILY & SUPPORT', 'PUBLIC LIBRARY', 'TRANSPORTN',
       "MAYOR'S OFFICE", 'HEALTH', 'BUSINESS AFFAIRS', 'LAW', 'FINANCE',
       'CULTURAL AFFAIRS', 'COMMUNITY DEVELOPMENT', 'PROCUREMENT',
       'BUILDINGS', 'ANIMAL CONTRL', 'CITY CLERK', 'BOARD OF ELECTION',
       'DISABILITIES', 'HUMAN RESOURCES', 'DoIT', 'BUDGET & MGMT',
       'TREASURER', 'INSPECTOR GEN', 'HUMAN RELATIONS', 'COPA',
       'BOARD OF ETHICS', 'POLICE BOARD', 'ADMIN HEARNG',
       'LICENSE APPL COMM'], dtype=object)

In [14]:
data['Department'].value_counts()

POLICE                   13414
FIRE                      4641
STREETS & SAN             2198
OEMC                      2102
WATER MGMNT               1879
AVIATION                  1629
TRANSPORTN                1140
PUBLIC LIBRARY            1015
GENERAL SERVICES           980
FAMILY & SUPPORT           615
FINANCE                    560
HEALTH                     488
CITY COUNCIL               411
LAW                        407
BUILDINGS                  269
COMMUNITY DEVELOPMENT      207
BUSINESS AFFAIRS           171
COPA                       116
BOARD OF ELECTION          107
DoIT                        99
PROCUREMENT                 92
INSPECTOR GEN               87
MAYOR'S OFFICE              85
CITY CLERK                  84
ANIMAL CONTRL               81
HUMAN RESOURCES             79
CULTURAL AFFAIRS            65
BUDGET & MGMT               46
ADMIN HEARNG                39
DISABILITIES                28
TREASURER                   22
HUMAN RELATIONS             16
BOARD OF

# Challenge 2 - Hypothesis Tests

In this section of the lab, we will test whether the hourly wage of all hourly workers is significantly different from $30/hr. Import the correct one sample test function from scipy and perform the hypothesis test for a 95% two sided confidence interval.
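
As a minimal sketch of the test described above, the snippet below runs a two-sided one-sample t-test on a small made-up list of hourly wages. The `wages` values and the names `MU0` and `ALPHA` are illustrative assumptions, not taken from the dataset. It computes the statistic by hand as (sample mean - hypothesized mean) / (sample std / sqrt(n)) and compares it with `scipy.stats.ttest_1samp`.

```python
import math
import statistics

from scipy import stats

# Hypothetical hourly wages, for illustration only (not from the Chicago dataset)
wages = [28.5, 31.0, 35.6, 36.2, 19.9, 40.2, 33.1, 29.8, 37.3, 34.4]
MU0 = 30.0    # value stated in the null hypothesis
ALPHA = 0.05  # significance level for a 95% two-sided test

n = len(wages)
sample_mean = statistics.mean(wages)
sample_std = statistics.stdev(wages)  # sample standard deviation (n - 1 in the denominator)

# Manual test statistic and two-sided p-value from the t distribution with n - 1 degrees of freedom
t_manual = (sample_mean - MU0) / (sample_std / math.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), n - 1)

# The same test through SciPy's one-sample helper (two-sided by default)
result = stats.ttest_1samp(wages, MU0)

print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={result.statistic:.4f}, p={result.pvalue:.4f}")
print("reject H0" if result.pvalue < ALPHA else "fail to reject H0")
```

Both routes should print the same statistic and p-value, which is a quick way to confirm that the manual formula is set up correctly.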

In [15]:
data.sample(5)

Unnamed: 0,Name,Job Titles,Department,Full or Part-Time,Salary or Hourly,Typical Hours,Annual Salary,Hourly Rate
11766,"HAN, HUI L",PRINCIPAL DATA BASE ANALYST,DoIT,F,Salary,,113868.0,
13236,"HUBBARD, SHAVADA L",CROSSING GUARD,OEMC,P,Hourly,20.0,,19.38
8326,"ESPEJO, ANTONIO U",POLICE OFFICER,POLICE,F,Salary,,87006.0,
25197,"RODRIGUEZ, AGUSTIN",POLICE OFFICER,POLICE,F,Salary,,84054.0,
19868,"MITCHELL, RODNEY L",PLUMBING INSPECTOR,WATER MGMNT,F,Salary,,102510.0,


In [16]:
# Your code here:
#We get a dataframe with only the people paid hourly
data_hourly_rate = data[data['Hourly Rate'].notnull()]
data_hourly_rate.sample(5)

Unnamed: 0,Name,Job Titles,Department,Full or Part-Time,Salary or Hourly,Typical Hours,Annual Salary,Hourly Rate
20579,"MULLAN, JOSEPH F",ELECTRICAL MECHANIC,AVIATION,F,Hourly,40.0,,46.1
16986,"LIZZO, JOSEPH",MOTOR TRUCK DRIVER,STREETS & SAN,F,Hourly,40.0,,35.6
9527,"FUENTES, VICTOR A",LAMP MAINTENANCE WORKER,TRANSPORTN,F,Hourly,40.0,,38.14
30557,"VAZQUEZ, GEORGE L",ELECTRICAL MECHANIC (AUTOMOTIVE),GENERAL SERVICES,F,Hourly,40.0,,46.1
17304,"LOUZON, FRANK G",WATCHMAN,GENERAL SERVICES,F,Hourly,40.0,,21.98


In [29]:
# Take a random sample of 35 people
hourly_wage_sample1 = data_hourly_rate['Hourly Rate'].sample(35)
hourly_wage_sample1

10113    45.35
6947     40.20
3839     15.22
23663    43.03
29485    48.25
22303    36.21
7876     35.60
2695     36.21
29883    21.20
3153     37.56
23118    32.04
6145     36.21
22715    35.60
1118     19.38
2638     46.10
22776    45.07
21273    49.37
1279     37.25
22809    46.10
18413    32.04
18184    32.04
25817    17.68
29860    47.44
23819    46.10
15957    20.31
12817    40.20
17662    19.86
15348    23.31
588      38.33
19521    35.60
1358     35.60
10088    40.20
20011    41.10
25926    22.36
2779      2.65
Name: Hourly Rate, dtype: float64

In [30]:
#Hypotheses
# H0: mu = 30    H1: mu != 30
#Because the alternative hypothesis H1 is "different from 30" (the mean could be more or less than 30), this is a two-tailed test
#(the rejection region lies in both tails of the distribution).
# At the 5% significance level each tail holds 0.025. Critical z value = 1.96 (about 2.03 for t with 34 degrees of freedom)




In [38]:
#compute sample mean and sample std
mean1 = hourly_wage_sample1.mean()
std1 = hourly_wage_sample1.std()
display(mean1)
display(std1)

34.307714285714276

11.179614348213894

In [45]:
#compute the test statistics
stat1 = (mean1-30)/(std1/np.sqrt(35))
stat1

2.2795760751045226

In [46]:
#Compute the p-value for a two-tailed test
#display() returns None, so store the value first and display it afterwards
p_val_sample1 = st.t.sf(abs(stat1),35-1)*2
display(p_val_sample1)

0.02902809096154299

In [50]:
# Same calculation with the scipy.stats package
st.ttest_1samp(hourly_wage_sample1,30)

Ttest_1sampResult(statistic=2.2795760751045226, pvalue=0.02902809096154299)

In [51]:
#Result
if p_val_sample1 > 0.05:
    print("With the available data we don't have enough evidence to reject the null hypothesis H0")
else:
    print('We reject the null hypothesis H0')

We reject the null hypothesis H0


We are also curious about salaries in the police force. The chief of police in Chicago claimed in a press briefing that salaries this year are higher than last year's mean of $86000/year for all salaried employees. Test this one-sided hypothesis at the 95% confidence level.

Hint: A one tailed test has a p-value that is half of the two tailed p-value. If our hypothesis is greater than, then to reject, the test statistic must also be positive.
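
The hint can be checked numerically. The sketch below uses made-up salary values (an assumption for illustration) and compares half of the two-sided p-value with the one-sided p-values that `ttest_1samp` returns through its `alternative` argument, which needs SciPy 1.6 or newer.

```python
from scipy import stats

# Hypothetical annual salaries, for illustration only
salaries = [84054, 90024, 87006, 93354, 96060, 72510, 101442, 88000, 91500, 86500]
MU0 = 86000  # last year's mean stated in H0

two_sided = stats.ttest_1samp(salaries, MU0)
# H1: mu > MU0, so the rejection region is the upper tail
greater = stats.ttest_1samp(salaries, MU0, alternative="greater")
# H1: mu < MU0 would use the lower tail instead
less = stats.ttest_1samp(salaries, MU0, alternative="less")

print(f"t statistic:      {two_sided.statistic:.4f}")
print(f"two-sided p / 2:  {two_sided.pvalue / 2:.4f}")
print(f"p with 'greater': {greater.pvalue:.4f}")
print(f"p with 'less':    {less.pvalue:.4f}")
```

The sample mean here is above 86000, so the statistic is positive: the `'greater'` p-value matches half of the two-sided one, and the `'less'` p-value is its complement. The value of `alternative` follows the direction of H1, not H0.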

In [54]:
# Your code here:
police_deparment = data[data['Department'] == 'POLICE']
police_deparment

Unnamed: 0,Name,Job Titles,Department,Full or Part-Time,Salary or Hourly,Typical Hours,Annual Salary,Hourly Rate
0,"AARON, JEFFERY M",SERGEANT,POLICE,F,Salary,,101442.0,
1,"AARON, KARINA",POLICE OFFICER (ASSIGNED AS DETECTIVE),POLICE,F,Salary,,94122.0,
9,"ABBATE, TERRY M",POLICE OFFICER,POLICE,F,Salary,,93354.0,
11,"ABDALLAH, ZAID",POLICE OFFICER,POLICE,F,Salary,,84054.0,
12,"ABDELHADI, ABDALMAHD",POLICE OFFICER,POLICE,F,Salary,,87006.0,
...,...,...,...,...,...,...,...,...
33177,"ZYGMUNT, DAWID",POLICE OFFICER,POLICE,F,Salary,,72510.0,
33178,"ZYLINSKA, KATARZYNA",POLICE OFFICER,POLICE,F,Salary,,72510.0,
33179,"ZYMANTAS, LAURA C",POLICE OFFICER,POLICE,F,Salary,,48078.0,
33180,"ZYMANTAS, MARK E",POLICE OFFICER,POLICE,F,Salary,,90024.0,


In [57]:
police_sample1 = police_deparment['Annual Salary'].sample(35)
police_sample1

11695     93354.0
20762     87006.0
30158    104628.0
2754      93354.0
10517     96060.0
24748    123894.0
4976      96060.0
30177     72510.0
630      104628.0
32823     97056.0
7047      48078.0
27601     84054.0
5214      40392.0
10223     90024.0
16907     94122.0
10067     94524.0
27364     93354.0
13712     87006.0
6183      72510.0
27537    100980.0
23479     72510.0
22842     87006.0
3714      87006.0
20663     90024.0
21147     96060.0
7781      93354.0
15790    107988.0
13353     76266.0
6579      48078.0
28719     84054.0
2169      48078.0
5625      90024.0
3736      90024.0
8541     101442.0
23563     90024.0
Name: Annual Salary, dtype: float64

In [None]:
#We are also curious about salaries in the police force. The chief of police in Chicago claimed in a press
#briefing that salaries this year are higher than last year's mean of $86000/year for all salaried employees.
#Test this one-sided hypothesis at the 95% confidence level.

In [None]:
# H0: mu <= 86000    H1: mu > 86000

In [58]:
#compute sample mean and sample std
mean2 = police_sample1.mean()
std2 = police_sample1.std()
display(mean2)
display(std2)

86729.4857142857

17959.152320771493

In [60]:
#compute the test statistic against last year's mean of 86000 (not 8600)
stat2 = (mean2-86000)/(std2/np.sqrt(35))
round(stat2, 4)

0.2403

In [62]:
# p-value 1-tailed (upper tail, because H1 is mu > 86000)
p_val_police = st.t.sf(stat2,35-1)
round(p_val_police, 3)

0.406

In [68]:
# alternative='less' tests H1: mu < 86000, which is not our hypothesis; shown only for comparison
round(st.ttest_1samp(police_sample1,86000,alternative='less').pvalue, 3)

0.594

In [67]:
# alternative='greater' matches H1: mu > 86000 and reproduces the manual p-value
round(st.ttest_1samp(police_sample1,86000,alternative="greater").pvalue, 3)

0.406

In [None]:
#Result
if p_val_police > 0.05:
    print("With the available data we don't have enough evidence to reject the null hypothesis H0")
else:
    print('We reject the null hypothesis H0')

In [None]:
# Question: when should `alternative` be used, and when is it 'less' or 'greater'?
# In the first example, which is two-tailed, the function is called without `alternative`, because we only check whether the mean equals a single value.
# The second example is one-tailed, so the function is called with `alternative`. But how do you know
# which alternative to use: 'greater' because H1 is >, or the inverse of H0? What is the logic? The manual formula
# gives the same value as alternative='greater'.

Using the `crosstab` function, find the department that has the most hourly workers. 
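
As a small sketch of the idea, the snippet below builds a cross-tabulation on a made-up dataframe with the same two column names and picks the department with the largest hourly count. The `toy` data and the `top_department` name are hypothetical.

```python
import pandas as pd

# Hypothetical mini-dataset with the same two columns as the notebook
toy = pd.DataFrame({
    "Department": ["POLICE", "POLICE", "OEMC", "OEMC", "OEMC", "STREETS & SAN"],
    "Salary or Hourly": ["Salary", "Salary", "Hourly", "Hourly", "Salary", "Hourly"],
})

# Rows are departments, columns are pay types, cells are employee counts
counts = pd.crosstab(toy["Department"], toy["Salary or Hourly"])
print(counts)

# Department whose "Hourly" count is the largest
top_department = counts["Hourly"].idxmax()
print(top_department)
```

`idxmax` returns the row label directly, which avoids the `reset_index` and `sort_values` steps when only the top department is needed.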

In [70]:
data.sample(5)

Unnamed: 0,Name,Job Titles,Department,Full or Part-Time,Salary or Hourly,Typical Hours,Annual Salary,Hourly Rate
4139,"CARNEY, JEANETTE I",CROSSING GUARD - PER CBA,OEMC,P,Hourly,20.0,,14.54
28036,"SORICH, ANDREW D",PROPERTY CUSTODIAN - AUTO POUND,STREETS & SAN,F,Salary,,70092.0,
11608,"HALEEM, MAHMOUD A",POLICE OFFICER,POLICE,F,Salary,,93354.0,
4188,"CARRERA, JASMIN A",POLICE OFFICER,POLICE,F,Salary,,48078.0,
7942,"DYSON, ANTOINETTE",CONCRETE LABORER,TRANSPORTN,F,Hourly,40.0,,40.2


In [193]:
#data[data['Department'] == 'HEALTH'].sample(10)

In [194]:
#data[data['Salary or Hourly'] == 'Salary']

In [192]:
pd.crosstab(data['Department'],data['Salary or Hourly']).reset_index().sort_values(by=['Hourly'],ascending=False).head(1)

Salary or Hourly,Department,Hourly,Salary
31,STREETS & SAN,1862,336


In [160]:
#The whole department has 2198 rows
data[data['Department'] == 'STREETS & SAN'].shape

(2198, 8)

In [188]:
#Only the hourly workers in the STREETS & SAN department
data[(data['Department'] == 'STREETS & SAN') & (data['Salary or Hourly'] == 'Hourly')].shape

(1862, 8)

In [None]:
#data['Hourly Rate'][data['Hourly Rate'].isnull() == False].sort_values(ascending=False) 

In [190]:
#Cross-tabulating mean hourly rates: the department with the highest mean hourly rate is HEALTH
pd.crosstab(data['Department'],data['Salary or Hourly'], values =data['Hourly Rate'][data['Hourly Rate'].isnull() == False] ,aggfunc=np.mean).reset_index().sort_values(by=['Hourly'],ascending=False).head(3)

Salary or Hourly,Department,Hourly
11,HEALTH,81.953333
21,WATER MGMNT,42.178698
10,GENERAL SERVICES,41.775503


The workers from the department with the most hourly workers have complained that their hourly wage is less than $35/hour. Using a one sample t-test, test this one-sided hypothesis at the 95% confidence level.

In [196]:
# Your code here:
#Dataframe to analyze
data[(data['Department'] == 'STREETS & SAN') & (data['Salary or Hourly'] == 'Hourly')].head(5)


Unnamed: 0,Name,Job Titles,Department,Full or Part-Time,Salary or Hourly,Typical Hours,Annual Salary,Hourly Rate
7,"ABBATE, JOSEPH L",POOL MOTOR TRUCK DRIVER,STREETS & SAN,F,Hourly,40.0,,35.6
21,"ABDUL-SHAKUR, TAHIR",GENERAL LABORER - DSS,STREETS & SAN,F,Hourly,40.0,,21.43
24,"ABERCROMBIE, TIMOTHY",MOTOR TRUCK DRIVER,STREETS & SAN,F,Hourly,40.0,,35.6
36,"ABRAMS, DANIELLE T",SANITATION LABORER,STREETS & SAN,F,Hourly,40.0,,36.21
39,"ABRAMS, SAMUEL A",POOL MOTOR TRUCK DRIVER,STREETS & SAN,F,Hourly,40.0,,35.6


In [198]:
#Hourly rates of the department's hourly workers; the sample is taken from here
data.loc[(data['Department'] == 'STREETS & SAN') & (data['Salary or Hourly'] == 'Hourly'), 'Hourly Rate']


7        35.60
21       21.43
24       35.60
36       36.21
39       35.60
         ...  
33106    36.13
33107    35.60
33147    35.60
33149    36.21
33156    22.12
Name: Hourly Rate, Length: 1862, dtype: float64

In [202]:
#Sampling 40
streetsdep_sample = data.loc[(data['Department'] == 'STREETS & SAN') & (data['Salary or Hourly'] == 'Hourly'), 'Hourly Rate'].sample(40)
streetsdep_sample

5136     28.48
20243    35.60
14556    20.12
21574    35.60
12763    36.21
32004    35.60
14632    20.12
4294     35.60
22715    35.60
12175    35.60
12201    35.60
18051    37.25
3622     36.21
4173     35.60
11230    36.21
22643    28.48
4985     36.21
32166    36.21
25412    35.60
10980    35.60
30403    19.50
15264    36.21
33105    35.60
8972     36.21
11465    36.21
31667    36.21
10974    35.60
26942    35.60
23544    28.48
32111    32.04
1229     28.48
9191     36.13
31601    35.60
14479    36.21
29876    37.25
22755    35.60
1073     36.21
24911    36.21
9601     36.22
16861    36.21
Name: Hourly Rate, dtype: float64

In [213]:
#compute sample mean and sample std
mean3 = streetsdep_sample.mean()
std3 = streetsdep_sample.std()
display(mean3)
display(std3)

33.93200000000001

4.678320540207038

In [214]:
#The workers from the department with the most hourly workers have complained that their hourly wage 
# is less than $35/hour. Using a one sample t-test, test this one-sided hypothesis at the 95% confidence level.

#H0: mu >= 35    H1: mu < 35


In [215]:
#compute the test statistics
stat3 = (mean3-35)/(std3/np.sqrt(40))
stat3

-1.443814083295087

In [216]:
# one-tailed (lower tail) p-value; sf(|t|) equals cdf(t) here because stat3 is negative
display(st.t.sf(abs(stat3),40-1))

0.07838848090244929

In [219]:
p_val_sample_street =0.07838848090244929

In [217]:
st.ttest_1samp(streetsdep_sample,35,alternative='less')

Ttest_1sampResult(statistic=-1.443814083295096, pvalue=0.07838848090244795)

In [218]:
st.ttest_1samp(streetsdep_sample,35,alternative="greater")

Ttest_1sampResult(statistic=-1.443814083295096, pvalue=0.9216115190975521)

In [221]:
#Result
if p_val_sample_street > 0.05:
    print("With the available data we don't have enough evidence to reject the null hypothesis H0")
else:
    print('We reject the null hypothesis H0')

With the available data we don't have enough evidence to reject the null hypothesis H0


# Challenge 3: To practice - Constructing Confidence Intervals

While testing a hypothesis is a good way to gather empirical evidence for rejecting or failing to reject it, another way to gather evidence is to construct a confidence interval. A confidence interval gives a range of plausible values for the true mean of the population. A 95% confidence interval comes from a procedure that, over repeated sampling, produces intervals containing the population mean about 95% of the time.

To read more about confidence intervals, click [here](https://en.wikipedia.org/wiki/Confidence_interval).
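
One way to see what "95% confidence" refers to is a small simulation. The sketch below is an illustration under simplifying assumptions: a normal population with a known standard deviation, so it uses a z interval rather than the t interval built later in this challenge. All names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 33.0   # hypothetical population mean
TRUE_STD = 12.0    # hypothetical population standard deviation, assumed known
N = 35             # sample size
TRIALS = 2000      # number of repeated samples
CONFIDENCE = 0.95

# Critical z value for a two-sided interval (about 1.96 for 95%)
z_crit = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
half_width = z_crit * TRUE_STD / math.sqrt(N)

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_STD) for _ in range(N)]
    center = mean(sample)
    # Count the intervals that contain the true mean
    if center - half_width <= TRUE_MEAN <= center + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"z critical value: {z_crit:.3f}")
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95: the confidence level describes how often the procedure captures the true mean, not the probability attached to any single interval.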


In the cell below, we will construct a 95% confidence interval for the mean hourly wage of all hourly workers. 

The confidence interval is computed in SciPy using the `t.interval` function. You can read more about this function [here](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.t.html).

To compute the confidence interval of the hourly wage, use the 0.95 for the confidence level, number of rows - 1 for degrees of freedom, the mean of the sample for the location parameter and the standard error for the scale. The standard error can be computed using [this](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.sem.html) function in SciPy.
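
The recipe above can be sketched as follows on a small made-up sample (the `sample` values are an assumption for illustration). The snippet also rebuilds the interval from the textbook formula, mean plus or minus the critical t value times the standard error, to confirm that both routes agree.

```python
import numpy as np
from scipy import stats

# Hypothetical hourly wages, for illustration only
sample = np.array([35.6, 21.4, 36.2, 40.2, 19.9, 46.1, 32.0, 38.3, 28.5, 35.6])

dof = len(sample) - 1            # degrees of freedom
sample_mean = sample.mean()      # location parameter
std_error = stats.sem(sample)    # standard error: s / sqrt(n), with ddof=1

# The confidence level is passed positionally because its keyword name
# changed between SciPy releases
low, high = stats.t.interval(0.95, dof, loc=sample_mean, scale=std_error)

# Same interval from the textbook formula: mean +/- t_crit * standard error
t_crit = stats.t.ppf(1 - 0.05 / 2, dof)
manual = (sample_mean - t_crit * std_error, sample_mean + t_crit * std_error)

print((low, high))
print(manual)
```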

In [222]:
# Your code here:
meanc3 = hourly_wage_sample1.mean()
stdc3 = hourly_wage_sample1.std()
dof = len(hourly_wage_sample1) -1
confidence = 0.95


In [228]:
#We now need the value of t. The function that calculates the inverse cumulative distribution is ppf. 
#We need to apply the absolute value because the cumulative distribution works with the left tail, so
#the result would be negative.

t_crit = np.abs(t.ppf((1-confidence)/2,dof))
t_crit

2.032244509317718

In [229]:
#Now, we can apply the original formula to calculate the 95% confidence interval.
(meanc3-stdc3*t_crit/np.sqrt(len(hourly_wage_sample1)), meanc3+stdc3*t_crit/np.sqrt(len(hourly_wage_sample1))) 


(30.467382392227925, 38.14804617920063)

In [None]:
# We can say, with 95% confidence, that the population mean lies between 30.47 and 38.15

Now construct the 95% confidence interval for all salaried employees in the police department in the cell below.

In [230]:
data.head(2)

Unnamed: 0,Name,Job Titles,Department,Full or Part-Time,Salary or Hourly,Typical Hours,Annual Salary,Hourly Rate
0,"AARON, JEFFERY M",SERGEANT,POLICE,F,Salary,,101442.0,
1,"AARON, KARINA",POLICE OFFICER (ASSIGNED AS DETECTIVE),POLICE,F,Salary,,94122.0,


In [233]:
# Your code here:
#Annual salaries of salaried police employees; the sample is taken from here
police_salary_sample = data.loc[(data['Department'] == 'POLICE') & (data['Salary or Hourly'] == 'Salary'), 'Annual Salary'].sample(35)
police_salary_sample

3030      84054.0
23666     84054.0
22509     90024.0
29980    104628.0
26753    100980.0
14716     84054.0
19456     87006.0
21548     87006.0
20287     94122.0
9143     114846.0
14877     48078.0
18581     87006.0
4668      90024.0
25929     87006.0
20998     87006.0
13156     80016.0
31550     48078.0
13775    104628.0
27970     93354.0
14810     87006.0
22222     48078.0
25256     63876.0
6313      48078.0
12023     48078.0
31041     90024.0
27617     40392.0
26246     84054.0
11947     96060.0
31962     97440.0
15246     90024.0
27047     50628.0
30978     90024.0
22538     90024.0
25780     53340.0
22055     94524.0
Name: Annual Salary, dtype: float64

In [235]:
meanc3pol = police_salary_sample.mean()
stdc3pol = police_salary_sample.std()
dof_pol = len(police_salary_sample) -1
confidence = 0.95

In [236]:
t_crit_pol = np.abs(t.ppf((1-confidence)/2,dof_pol))
t_crit_pol

2.032244509317718

In [237]:
#Now, we can apply the original formula to calculate the 95% confidence interval.
(meanc3pol-stdc3pol*t_crit_pol/np.sqrt(len(police_salary_sample)), meanc3pol+stdc3pol*t_crit_pol/np.sqrt(len(police_salary_sample))) 

(73955.13910471294, 87623.14660957277)

# Bonus Challenge - Hypothesis Tests of Proportions

Another type of one sample test is a hypothesis test of proportions. In this test, we examine whether the proportion of a group in our sample is significantly different than a fraction. 

You can read more about one sample proportion tests [here](http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/SAS/SAS6-CategoricalData/SAS6-CategoricalData2.html).

In the cell below, use the `proportions_ztest` function from `statsmodels` to perform a hypothesis test that will determine whether the number of hourly workers in the City of Chicago is significantly different from 25% at the 95% confidence level.
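
Before calling the library function, it can help to see the arithmetic behind a one-sample proportion z-test. The sketch below uses only the standard library, with the counts taken from the `value_counts` output earlier in this notebook (8022 hourly workers out of 33183 employees). The standard error here is based on the sample proportion, which we believe matches the default behaviour of `proportions_ztest(count, nobs, value)`; treat that equivalence as an assumption and compare against the `statsmodels` result.

```python
import math
from statistics import NormalDist

count = 8022    # hourly workers (from value_counts above)
nobs = 33183    # all employees
P0 = 0.25       # proportion stated in the null hypothesis
ALPHA = 0.05    # significance level for a 95% two-sided test

p_hat = count / nobs
# Standard error based on the sample proportion (assumed statsmodels default)
std_error = math.sqrt(p_hat * (1 - p_hat) / nobs)
z_stat = (p_hat - P0) / std_error
# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"sample proportion: {p_hat:.4f}")
print(f"z = {z_stat:.3f}, p = {p_value:.5f}")
print("reject H0" if p_value < ALPHA else "fail to reject H0")
```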

In [None]:
# Your code here:

