# Assignment #09
Original file retrieved from: https://github.com/wcj365/python-stats-dataviz/blob/master/assignments/assignment_09.ipynb
### Point Estimate and Interval Estimate (Confidence Interval)

A random survey of enrollment at **35** community colleges across the United States yielded the following figures: 

            6,414; 1,550; 2,109; 9,350; 21,828; 4,300; 5,944; 5,722; 2,825; 2,044;

            5,481; 5,200; 5,853; 2,750; 10,012; 6,357; 27,000; 9,414; 7,681; 3,200; 

            17,500; 9,200; 7,380; 18,314; 6,557; 13,713; 17,768; 7,493; 2,771; 2,861; 

            1,263; 7,285; 28,165; 5,080; 11,622

#### Perform a point estimate and an interval estimate with a **95% confidence level** using the **t-distribution**.

_Since the population variance is unknown and must be estimated from the sample, we use the t-distribution instead of the normal distribution._
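The practical effect of that choice shows up in the critical values. A minimal sketch, assuming a two-sided 95% interval and df = 34 (the values used later in this notebook); the variable names are illustrative. The t-distribution has heavier tails, so its critical value is larger and the resulting interval is wider.

```python
from scipy import stats

# Two-sided 95% interval: alpha/2 = 0.025 goes into each tail.
confidence = 0.95
alpha = 1 - confidence
df = 35 - 1  # n - 1 degrees of freedom for n = 35 colleges

z_crit = stats.norm.ppf(1 - alpha / 2)   # standard normal critical value
t_crit = stats.t.ppf(1 - alpha / 2, df)  # Student's t critical value

print(f"z critical value: {z_crit:.4f}")
print(f"t critical value (df={df}): {t_crit:.4f}")

# With many degrees of freedom the t critical value approaches the normal one.
print(f"t critical value (df=10000): {stats.t.ppf(1 - alpha / 2, 10000):.4f}")
```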

In [31]:
# Import Python packages
from statistics import mean
from scipy import stats
import numpy as np

### Step 0 - Data Preprocessing 

Process the raw data into a list of integers. To calculate descriptive statistics, Python needs a list of numbers rather than text.

- Make each line of numbers a string object, then concatenate the strings.
- The end result is a single string containing 35 numbers separated by `"; "`.

**Note:** Don't type the list by hand. Write code to automate the data preparation.
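One way to automate this without building four separate strings is to paste the figures into a single triple-quoted string and let `split` handle the line breaks. A minimal sketch; the names `raw_text` and `enrollments` are illustrative and not part of the original notebook:

```python
# Paste the survey figures exactly as they appear, line breaks included.
raw_text = """
6,414; 1,550; 2,109; 9,350; 21,828; 4,300; 5,944; 5,722; 2,825; 2,044;
5,481; 5,200; 5,853; 2,750; 10,012; 6,357; 27,000; 9,414; 7,681; 3,200;
17,500; 9,200; 7,380; 18,314; 6,557; 13,713; 17,768; 7,493; 2,771; 2,861;
1,263; 7,285; 28,165; 5,080; 11,622
"""

# Split on ";", strip surrounding whitespace and newlines from each token,
# skip any empty tokens (e.g. from a trailing ";"), remove the thousands
# separators, and convert each token to an int.
enrollments = [
    int(token.strip().replace(",", ""))
    for token in raw_text.split(";")
    if token.strip()
]

print(len(enrollments), enrollments[:5])
```

Splitting on `";"` alone and stripping afterwards makes the parsing indifferent to whether a line break or a space follows each semicolon.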

In [12]:
data_1 = "6,414; 1,550; 2,109; 9,350; 21,828; 4,300; 5,944; 5,722; 2,825; 2,044; " 
data_2 = "5,481; 5,200; 5,853; 2,750; 10,012; 6,357; 27,000; 9,414; 7,681; 3,200; "
data_3 = "17,500; 9,200; 7,380; 18,314; 6,557; 13,713; 17,768; 7,493; 2,771; 2,861; "
data_4 = "1,263; 7,285; 28,165; 5,080; 11,622"
data_raw = data_1 + data_2 + data_3 + data_4
data_raw

'6,414; 1,550; 2,109; 9,350; 21,828; 4,300; 5,944; 5,722; 2,825; 2,044; 5,481; 5,200; 5,853; 2,750; 10,012; 6,357; 27,000; 9,414; 7,681; 3,200; 17,500; 9,200; 7,380; 18,314; 6,557; 13,713; 17,768; 7,493; 2,771; 2,861; 1,263; 7,285; 28,165; 5,080; 11,622'

- Convert the single string to a list of strings using the string method `split()`.
- Make sure to specify the delimiter (separator), here `'; '`.
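The delimiter matters because the comma already serves as the thousands separator inside each figure. A short illustration on the first three values (the variable name `sample` is illustrative):

```python
sample = "6,414; 1,550; 2,109"

# Splitting on "," breaks the numbers apart, because "," is also the thousands separator.
print(sample.split(","))   # ['6', '414; 1', '550; 2', '109']

# Splitting on "; " keeps each figure intact.
print(sample.split("; "))  # ['6,414', '1,550', '2,109']
```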



In [13]:
data_cleanse = data_raw.split('; ')
print(data_cleanse)

['6,414', '1,550', '2,109', '9,350', '21,828', '4,300', '5,944', '5,722', '2,825', '2,044', '5,481', '5,200', '5,853', '2,750', '10,012', '6,357', '27,000', '9,414', '7,681', '3,200', '17,500', '9,200', '7,380', '18,314', '6,557', '13,713', '17,768', '7,493', '2,771', '2,861', '1,263', '7,285', '28,165', '5,080', '11,622']


#### Using List Comprehension
Create a list of integers from the list of strings using a list comprehension or a for loop.

_Remove the thousands separator "," first, then convert each string to an integer._
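The order matters because `int()` does not accept thousands separators. A minimal sketch using one value from the survey (the name `token` is illustrative):

```python
token = "21,828"

# int() rejects the embedded comma and raises ValueError.
try:
    int(token)
except ValueError as err:
    print("ValueError:", err)

# Removing the separator first makes the conversion succeed.
value = int(token.replace(",", ""))
print(value)  # 21828
```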

In [17]:
data = [int(string.replace(',','')) for string in data_cleanse]
print(data)

[6414, 1550, 2109, 9350, 21828, 4300, 5944, 5722, 2825, 2044, 5481, 5200, 5853, 2750, 10012, 6357, 27000, 9414, 7681, 3200, 17500, 9200, 7380, 18314, 6557, 13713, 17768, 7493, 2771, 2861, 1263, 7285, 28165, 5080, 11622]


### Step 1 - Calculate and Display the Sample Size and Sample Mean

- Calculate and display 
    - the sample size
    - the sample mean
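The sample mean is the sum of the observations divided by the sample size $n$. A minimal sketch that checks `statistics.mean` against that definition; `n` and `x_bar` are illustrative names:

```python
from statistics import mean

data = [6414, 1550, 2109, 9350, 21828, 4300, 5944, 5722, 2825, 2044,
        5481, 5200, 5853, 2750, 10012, 6357, 27000, 9414, 7681, 3200,
        17500, 9200, 7380, 18314, 6557, 13713, 17768, 7493, 2771, 2861,
        1263, 7285, 28165, 5080, 11622]

n = len(data)          # sample size
x_bar = sum(data) / n  # sample mean: sum of observations divided by n

print(n, x_bar, mean(data))
```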




In [24]:
sample_size = len(data)
sample_mean = round(mean(data))

print(f"Sample size = {sample_size}\nSample mean = {sample_mean}")


Sample size = 35
Sample mean = 8629


The point estimate of the mean enrollment of US community colleges is the sample mean, **8629** (8,628.74 before rounding).

### Step 2 - Calculate and Display the Sample Standard Deviation & Sample Standard Error

Sample Standard Deviation $S=\sqrt{\dfrac{1}{n-1}\sum\limits_{i=1}^n (X_i-\bar{X})^2}$

Sample Standard Error = $\dfrac{S}{\sqrt{n}}$
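The two formulas can be checked directly against the library functions used below. A minimal sketch, assuming the same 35 enrollment figures; `s` and `se` are illustrative names for $S$ and $S/\sqrt{n}$:

```python
import math

import numpy as np
from scipy import stats

data = [6414, 1550, 2109, 9350, 21828, 4300, 5944, 5722, 2825, 2044,
        5481, 5200, 5853, 2750, 10012, 6357, 27000, 9414, 7681, 3200,
        17500, 9200, 7380, 18314, 6557, 13713, 17768, 7493, 2771, 2861,
        1263, 7285, 28165, 5080, 11622]

n = len(data)
x_bar = sum(data) / n

# S = sqrt( sum((x_i - x_bar)^2) / (n - 1) )
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

# Standard error of the mean = S / sqrt(n)
se = s / math.sqrt(n)

print(f"S  = {s:.2f}  (np.std with ddof=1: {np.std(data, ddof=1):.2f})")
print(f"SE = {se:.2f}  (scipy.stats.sem:   {stats.sem(data):.2f})")
```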

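Note: The default **Delta Degrees of Freedom (ddof)** of NumPy's `std` function is 0, which divides by $n$ and is appropriate for population data. For sample data, we need to specify **ddof=1** so that the sum of squared deviations is divided by $n-1$, matching the formula for $S$ above. SciPy's `stats.sem` already uses `ddof=1` by default.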
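To see what `ddof` changes, the sketch below computes both versions on the enrollment data. Since one divides by $n$ and the other by $n-1$, their ratio should equal $\sqrt{n/(n-1)}$. The variable names are illustrative:

```python
import math

import numpy as np

data = [6414, 1550, 2109, 9350, 21828, 4300, 5944, 5722, 2825, 2044,
        5481, 5200, 5853, 2750, 10012, 6357, 27000, 9414, 7681, 3200,
        17500, 9200, 7380, 18314, 6557, 13713, 17768, 7493, 2771, 2861,
        1263, 7285, 28165, 5080, 11622]
n = len(data)

pop_sd = np.std(data)             # ddof=0: divides by n
sample_sd = np.std(data, ddof=1)  # ddof=1: divides by n - 1

# The two estimates differ by a factor of sqrt(n / (n - 1)).
print(pop_sd, sample_sd, sample_sd / pop_sd, math.sqrt(n / (n - 1)))
```

With $n = 35$ the correction is under 2%, but it grows quickly for small samples.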

For the enrollment data, we round the statistics to whole numbers (no decimal places).


#### Calculate and display:  
- the sample standard deviation using NumPy's `std` function
- the sample standard error

In [34]:
data_std = np.std(data, ddof=1).round()
data_sem = stats.sem(data).round()

print('Sample standard deviation =', data_std)
print('Sample standard error =', data_sem)


Sample standard deviation = 6944.0
Sample standard error = 1174.0


### Step 3 - Calculate t Critical Value using t-Distribution 

$\alpha$ = 1 - Confidence Level = 1 - 95% = 0.05

$\dfrac{\alpha}{2}$ = 0.025

n (sample size) = 35

df (degrees of freedom) = n - 1 = 35 - 1 = 34



#### We will use the percent point function (PPF, the inverse of the CDF) of `scipy.stats.t` to calculate the t critical value $t_{0.025,34}$, the value that leaves an area of 0.025 in the upper tail.

- Calculate and display the t critical value using the `scipy.stats.t.ppf` function.
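Because the t-distribution is symmetric about 0, `ppf(0.025, df)` and `ppf(0.975, df)` differ only in sign, and applying the CDF to the upper quantile recovers 0.975. A minimal sketch of both facts, assuming df = 34; the names `lower_q` and `upper_q` are illustrative:

```python
from scipy import stats

confidence = 0.95
alpha = 1 - confidence
n = 35
df = n - 1

lower_q = stats.t.ppf(alpha / 2, df)      # 2.5th percentile (negative)
upper_q = stats.t.ppf(1 - alpha / 2, df)  # 97.5th percentile (positive)

# Symmetry: the two quantiles have the same magnitude and opposite signs.
print(lower_q, upper_q)

# ppf is the inverse of the CDF: feeding the quantile back returns the probability.
print(stats.t.cdf(upper_q, df))
```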


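In [52]:
# t_{0.025,34} leaves 0.025 in the upper tail, so it is the 0.975 quantile of t with df = n - 1.
df = sample_size - 1
t_crit = stats.t.ppf(1 - 0.05 / 2, df).round(2)

print('t critical value =', t_crit)

t critical value = 2.03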


### Step 4 - Calculate the Margin of Error

Margin of Error = t Critical Value × Sample Standard Error = $t_{\alpha/2,n-1}\left(\dfrac{S}{\sqrt{n}}\right)$
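The formula can be wrapped in a small helper so the confidence level is a parameter. A minimal sketch; `margin_of_error` is a hypothetical helper, not part of the original notebook, and it does not round any intermediate value:

```python
from scipy import stats


def margin_of_error(sample, confidence=0.95):
    """Hypothetical helper: t_{alpha/2, n-1} * S / sqrt(n) for the mean of `sample`."""
    n = len(sample)
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, n - 1)
    return t_crit * stats.sem(sample)  # stats.sem uses ddof=1 by default


data = [6414, 1550, 2109, 9350, 21828, 4300, 5944, 5722, 2825, 2044,
        5481, 5200, 5853, 2750, 10012, 6357, 27000, 9414, 7681, 3200,
        17500, 9200, 7380, 18314, 6557, 13713, 17768, 7493, 2771, 2861,
        1263, 7285, 28165, 5080, 11622]

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: margin of error = {margin_of_error(data, level):.1f}")
```

A higher confidence level needs a larger critical value, so the margin of error grows with it. At 95% the unrounded result is a couple of units larger than the 2383 computed in this notebook, because here the t value is not rounded to 2.03.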

#### Calculate and display: 
- the margin of error



In [58]:
margin_err = (t_crit*data_sem).round()
print("The margin of error equals " , margin_err)



The margin of error equals  2383.0


### Step 5 - Calculate Lower and Upper Limit of the Confidence Interval

Lower Limit = Sample Mean - Margin of Error

Upper Limit = Sample Mean + Margin of Error

#### Calculate and display 
- the lower limit
- the upper limit



In [61]:
lim_low, lim_up = sample_mean - margin_err, sample_mean + margin_err
print(f"Lower Limit: {lim_low}\nUpper Limit: {lim_up}")
# Lower Limit =  6246.0
# Upper Limit =  11012.0



Lower Limit: 6246.0
Upper Limit: 11012.0


### Step 6 - Now We Have the 95% Confidence Interval
Confidence Interval ($\sigma$ unknown) = $\bar{X}\pm t_{\alpha/2,n-1}\left(\dfrac{S}{\sqrt{n}}\right)$ = Sample Mean $\pm$ Margin of Error
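SciPy can compute the same interval in one call, which makes a useful cross-check. A minimal sketch: `stats.t.interval` returns the central interval `loc ± t * scale`, so passing the sample mean as `loc` and the standard error as `scale` reproduces the formula above. The confidence level is passed positionally because older SciPy releases named that argument `alpha`:

```python
from statistics import mean

from scipy import stats

data = [6414, 1550, 2109, 9350, 21828, 4300, 5944, 5722, 2825, 2044,
        5481, 5200, 5853, 2750, 10012, 6357, 27000, 9414, 7681, 3200,
        17500, 9200, 7380, 18314, 6557, 13713, 17768, 7493, 2771, 2861,
        1263, 7285, 28165, 5080, 11622]
n = len(data)

# Central 95% interval: mean(data) +/- t_{0.025, n-1} * S / sqrt(n)
low, high = stats.t.interval(0.95, n - 1, loc=mean(data), scale=stats.sem(data))
print(f"95% CI without intermediate rounding: ({low:.1f}, {high:.1f})")
```

The limits land a few units away from those computed step by step in this notebook, because the step-by-step version rounds the mean, the standard error, and the t value before combining them. The difference is negligible relative to the width of the interval.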

#### Print the 95% confidence interval in the form (lower limit, upper limit)



In [63]:
print(f"The 95% Confidence Interval = ( {lim_low}, {lim_up} )")

The 95% Confidence Interval = ( 6246.0, 11012.0 )


### The End
