[Link to videos and exercises](https://www.khanacademy.org/math/statistics-probability/confidence-intervals-one-sample)

In [21]:
# a cell to import modules and define helper functions
import math
import scipy.stats as ss

def z_table_probability(z):
    # ss.norm.cdf(-z) is the area of the lower tail, from -infinity to -z.
    # By symmetry the upper tail has the same area, so the probability of
    # landing within z standard deviations of the mean is 1 minus both tails.
    p = 1 - (ss.norm.cdf(-z) * 2)
    print("For critical value z* %.3f, probability is: %.2f" % (z, p))
    return p

def z_table_zscore(p):
    # ppf takes a cumulative probability measured from -infinity, while p is
    # the central (two-tailed) area, so the upper boundary sits at p/2 + 0.5
    z = ss.norm.ppf(p/2+0.5)
    print("For confidence interval of %.2f, we need to be in < %.3f standard deviations from the mean" % (p, z))
    return z

def t_table_zscore(p, n):
    # same two-tailed conversion as for z, but on a t distribution
    # with n - 1 degrees of freedom
    t = ss.t.ppf(p/2+0.5, n-1)
    print("t-score for confidence level of %d%% with df = %d is: %.3f" % (p*100, n-1, t))
    return t

def transform_margin_to_interval(mu, interval):
    print("The probability interval is: %.3f - %.3f" % (mu-interval, mu+interval))
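
The two-tailed conversions used by the helpers can be checked with a round trip. This is a minimal sketch that uses the standard library's `statistics.NormalDist` instead of SciPy; the names `critical_z` and `central_area` are illustrative and are not used elsewhere in this notebook.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def critical_z(level):
    # A central area of `level` leaves (1 - level) / 2 in each tail, so the
    # upper boundary sits at cumulative probability level / 2 + 0.5.
    return std_normal.inv_cdf(level / 2 + 0.5)

def central_area(z):
    # cdf(-z) is the area of one tail; removing both tails leaves the middle.
    return 1 - 2 * std_normal.cdf(-z)

z_star = critical_z(0.92)
print("z* = %.3f" % z_star)
print("recovered confidence level = %.2f" % central_area(z_star))
```

Converting a confidence level to z* and back should return the level you started with, which is a quick way to confirm that the `p/2 + 0.5` step is right.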

# 1. Estimating a population proportion

Conditions for inference on a proportion:

1. Random 
2. Normal (the sampling distribution of $\hat{p}$ must be approximately normal)
    * expect at least 10 successes and at least 10 failures: $np \geq 10$ and $n(1-p) \geq 10$
    * when estimating a mean instead of a proportion, the usual rule of thumb is a sample of at least 30
3. Independent 
    * if sampling without replacement, sample should be < 10% of population
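
The numeric parts of these conditions can be sketched as a small check. The function name `check_proportion_conditions` and the population size of 5000 are hypothetical, chosen only for illustration. The randomness condition depends on how the data were collected and cannot be checked from the counts.

```python
def check_proportion_conditions(n, successes, population_size=None):
    """Rule-of-thumb checks for a one-sample interval on a proportion."""
    failures = n - successes
    checks = {
        # at least 10 successes and at least 10 failures
        "normal": successes >= 10 and failures >= 10,
    }
    if population_size is not None:
        # sampling without replacement: sample below 10% of the population
        checks["independent"] = n < 0.10 * population_size
    return checks

print(check_proportion_conditions(n=200, successes=96, population_size=5000))
```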
    
    
How to choose between a z distribution and a t distribution:

* if you know the population's standard deviation and the sample size is > 30: use the z distribution
* if you don't know the population's standard deviation or the sample size is < 30: use the t distribution

## Calculating confidence levels and critical values of z*
### Problem 1: Calculate critical value z*
![](img/confidence_intervals_p1.png)
![](img/confidence_intervals_c1.png)

In [20]:
z = z_table_zscore(0.92)

For confidence interval of 0.92, we need to be in < 1.751 standard deviations from the mean


### Problem 2: Calculate confidence level from z*
![](img/confidence_intervals_p2.png)
![](img/confidence_intervals_c2.png)

In [19]:
p = z_table_probability(1.476)

For critical value z* 1.476, probability is: 0.86


### Problem 3: Calculate a confidence interval for a proportion
![](img/confidence_intervals_p3.png)

$(statistic)  \pm (critical\ value)(standard\ deviation\ of\ statistic)$

$\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

In [18]:
CONFIDENCE_LEVEL = 0.9
SAMPLE_SIZE = 200
SUCCESSES_NUMBER = 96

p = SUCCESSES_NUMBER / SAMPLE_SIZE
interval = z_table_zscore(CONFIDENCE_LEVEL) * math.sqrt((p*(1-p))/SAMPLE_SIZE)

print("Probability of success is %.2f " % p)
print("Margin of error for %.2f confidence interval is %.3f" % (CONFIDENCE_LEVEL, interval))

transform_margin_to_interval(p, interval)

For confidence interval of 0.90, we need to be in < 1.645 standard deviations from the mean
Probability of success is 0.48 
Margin of error for 0.90 confidence interval is 0.058
The probability interval is: 0.422 - 0.538
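
One way to see what the 90% confidence level means is a small simulation. It draws many samples from a population with a known proportion and counts how often the interval contains that proportion. This is only a sketch: the true proportion of 0.48, the number of trials and the seed are assumptions made for the simulation, and the function names are illustrative.

```python
import math
import random
from statistics import NormalDist

def proportion_interval(successes, n, confidence_level):
    # p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(confidence_level / 2 + 0.5)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage(true_p, n, confidence_level, trials, seed=0):
    # Share of simulated samples whose interval contains the true proportion.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_interval(successes, n, confidence_level)
        hits += low <= true_p <= high
    return hits / trials

print("Share of intervals containing the true p: %.3f" % coverage(0.48, 200, 0.9, 2000))
```

The printed share should land near the chosen confidence level. It will not match exactly, because the normal approximation is itself approximate and the simulation has sampling noise.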


### Problem 4: Calculate minimal sample size for a margin of error
![](img/confidence_intervals_p4.png)

We need to solve 

$z^*\sqrt{\frac{p(1-p)}{n}} \leq margin\ of \ error$ 

which results in

$n \geq \frac{(z^*)^2}{(margin\ of\ error)^2} p(1-p)$
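
The algebra between the two inequalities can be spelled out as follows, writing $E$ for the margin of error (a shorthand introduced only for this derivation). Both sides are non-negative, so squaring keeps the direction of the inequality:

$(z^*)^2\frac{p(1-p)}{n} \leq E^2$

Multiplying both sides by $n$ and dividing by $E^2$ (both positive) isolates the sample size:

$\frac{(z^*)^2\,p(1-p)}{E^2} \leq n$

which is the bound on $n$ stated above, read right to left.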

In [17]:
CONFIDENCE_LEVEL = 0.95
MARGIN_OF_ERROR = 0.02
# set to None if success probability is unknown
SUCCESS_PROBABILITY = 0.9

# if we don't know p, use p = 0.5: it maximizes p(1-p), giving the most conservative sample size
if SUCCESS_PROBABILITY is None:
    SUCCESS_PROBABILITY = 0.5

# solving algebraically for n
min_sample_size = (z_table_zscore(CONFIDENCE_LEVEL)**2 / MARGIN_OF_ERROR**2) \
    * SUCCESS_PROBABILITY * (1-SUCCESS_PROBABILITY)
print("If we want margin of error less than %d%%, sample size should be > %.2f" % \
      (MARGIN_OF_ERROR*100, min_sample_size))

For confidence interval of 0.95, we need to be in < 1.960 standard deviations from the mean
If we want margin of error less than 2%, sample size should be > 864.33
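
Because a sample size must be a whole number, the bound is rounded up in practice. The sketch below, with the illustrative helper `min_sample_size`, also shows how the required size grows as the planning value for p moves toward 0.5. The planning values 0.5 and 0.7 are added for comparison and do not come from the problem.

```python
import math
from statistics import NormalDist

def min_sample_size(confidence_level, margin_of_error, p=0.5):
    # n >= (z* / margin)^2 * p * (1 - p), rounded up to a whole observation
    z_star = NormalDist().inv_cdf(confidence_level / 2 + 0.5)
    n = (z_star / margin_of_error) ** 2 * p * (1 - p)
    return math.ceil(n)

for p in (0.5, 0.7, 0.9):
    print("planning p = %.1f -> n >= %d" % (p, min_sample_size(0.95, 0.02, p)))
```

With p = 0.9 this rounds the 864.33 computed above up to 865. The default of p = 0.5 gives the largest, most conservative requirement.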


# 2. Estimating a population mean

The standard deviation of the sampling distribution of the sample mean $\bar{x}$ is

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

However, we usually don't know the population standard deviation $\sigma$, so we substitute the sample standard deviation $s_x$ as an estimate for $\sigma$. When we do this, we call the result the **standard error** of $\bar{x}$ to distinguish it from the standard deviation.

So the formula for the standard error of $\bar{x}$ is:

$\sigma_{\bar{x}} \approx \frac{s_x}{\sqrt{n}}$
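
As a small illustration, the standard error can be computed directly from raw data. The measurements below are hypothetical and serve only to show the arithmetic.

```python
import math
import statistics

sample = [2.1, 2.4, 2.3, 2.2, 2.5]  # hypothetical measurements

n = len(sample)
x_bar = statistics.mean(sample)
s_x = statistics.stdev(sample)        # sample standard deviation (divides by n - 1)
standard_error = s_x / math.sqrt(n)   # estimate of the standard deviation of x_bar

print("x_bar = %.3f, s_x = %.3f, standard error = %.3f" % (x_bar, s_x, standard_error))
```

`statistics.stdev` divides by n - 1. `statistics.pstdev` divides by n and would be the wrong choice when the data are a sample.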

Calculating a t interval for a mean:

$\bar{x} \pm t^*\frac{s_x}{\sqrt{n}}$

To find $t^*$ we need to know the degrees of freedom (df, which here is $n-1$) and the confidence level.
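
To get a feel for how the degrees of freedom affect the critical value, the sketch below prints t* for several df values at a 90% confidence level, next to the corresponding z*. The df values are chosen for illustration. The calls are the same `ss.t.ppf` and `ss.norm.ppf` used by the helper functions at the top.

```python
import scipy.stats as ss

CONFIDENCE_LEVEL = 0.9
# cumulative probability at the upper boundary of the central area
upper_quantile = CONFIDENCE_LEVEL / 2 + 0.5

z_star = ss.norm.ppf(upper_quantile)
t_stars = {df: ss.t.ppf(upper_quantile, df) for df in (5, 11, 17, 30, 100, 1000)}

for df, t_star in t_stars.items():
    print("df = %4d: t* = %.3f" % (df, t_star))
print("z* = %.3f" % z_star)
```

Small samples give a noticeably larger t*, and therefore a wider interval. As df grows, t* approaches z*.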

### Problem 1: Finding the critical value t* for a desired confidence level
![](img/confidence_intervals_p5.png)

In [16]:
CONFIDENCE_LEVEL = 0.9
SAMPLE_SIZE = 18

t = t_table_zscore(CONFIDENCE_LEVEL, SAMPLE_SIZE)

t-score for confidence level of 90% with df = 17 is: 1.740


### Problem 2: Calculating a t interval for a mean
![](img/confidence_intervals_p6.png)

In [15]:
SAMPLE_SIZE = 12
SAMPLE_MU = 2.29
SAMPLE_SIGMA = 0.2
CONFIDENCE_LEVEL = 0.9

interval = t_table_zscore(CONFIDENCE_LEVEL, SAMPLE_SIZE) * (SAMPLE_SIGMA / math.sqrt(SAMPLE_SIZE))

print("Margin of error is: %.3f" % interval)
transform_margin_to_interval(SAMPLE_MU, interval)

t-score for confidence level of 90% with df = 11 is: 1.796
Margin of error is: 0.104
The probability interval is: 2.186 - 2.394


### Problem 3: Calculating the minimum sample size for a margin of error
![](img/confidence_intervals_p7.png)

We need to solve 

$t^*\frac{s_x}{\sqrt{n}} \leq margin\ of \ error$

However, finding a t score requires the degrees of freedom, which depend on the sample size, and the sample size is exactly what we are trying to find.
Instead, we use an estimate of the population standard deviation $\sigma$ and $z^*$ in place of $t^*$:

$z^*\frac{\sigma}{\sqrt{n}} \leq margin\ of \ error$

which results in

$n \geq \left(\frac{z^* \sigma}{margin\ of\ error}\right)^2$

In [12]:
CONFIDENCE_LEVEL = 0.9
MARGIN_OF_ERROR = 20
ESTIMATED_SIGMA = 50

z = z_table_zscore(CONFIDENCE_LEVEL)
min_sample_size = (z * ESTIMATED_SIGMA / MARGIN_OF_ERROR)**2

print("If we want margin of error less than %d, sample size should be > %.2f" % \
      (MARGIN_OF_ERROR, min_sample_size))

For confidence interval of 0.90, we need to be in < 1.645 standard deviations from the mean
If we want margin of error less than 20, sample size should be > 16.91
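
Since the sample size has to be a whole number, the result is rounded up. The sketch below does that and then checks the z-based margin of error on either side of the cutoff. It uses `statistics.NormalDist` from the standard library, and the helper name `margin` is illustrative.

```python
import math
from statistics import NormalDist

CONFIDENCE_LEVEL = 0.9
MARGIN_OF_ERROR = 20
ESTIMATED_SIGMA = 50

z_star = NormalDist().inv_cdf(CONFIDENCE_LEVEL / 2 + 0.5)

def margin(n):
    # z-based margin of error for a sample of size n
    return z_star * ESTIMATED_SIGMA / math.sqrt(n)

# round the algebraic bound up to the next whole observation
n_required = math.ceil((z_star * ESTIMATED_SIGMA / MARGIN_OF_ERROR) ** 2)

print("smallest whole sample size: %d" % n_required)
print("margin with n = %d: %.2f" % (n_required, margin(n_required)))
print("margin with n = %d: %.2f" % (n_required - 1, margin(n_required - 1)))
```

One observation fewer pushes the margin back above the target, which is why the bound is rounded up and not to the nearest integer.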
