[Link to videos and exercises](https://www.khanacademy.org/math/statistics-probability/confidence-intervals-one-sample)

# 1. Estimating a population proportion

Conditions for inference on a proportion:

1. Random 
2. Normal (the sampling distribution of $\hat{p}$ is approximately normal when the sample is large enough)
    * expect at least 10 successes and at least 10 failures: $np \geq 10$ and $n(1-p) \geq 10$
    * having more than 30 samples is the analogous rule of thumb for sample means
3. Independent 
    * if sampling without replacement, sample should be < 10% of population
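
The numeric conditions above can be checked mechanically. The following is a minimal sketch, not part of the original notebook: the helper name `check_proportion_conditions`, its parameters, and the population size of 5000 are assumptions made for illustration, and it assumes the "at least 10" form of the large-counts rule. The randomness condition depends on how the data was collected, so it cannot be verified from the counts.

```python
def check_proportion_conditions(successes, n, population_size=None, threshold=10):
    """Report whether the large-counts and 10% conditions hold (hypothetical helper)."""
    failures = n - successes
    # Normal condition: at least `threshold` observed successes and failures
    normal_ok = successes >= threshold and failures >= threshold
    # Independence (10% rule) only applies when sampling without replacement
    # from a finite population, so it needs the population size
    if population_size is None:
        independent_ok = None  # cannot be checked without the population size
    else:
        independent_ok = n <= 0.10 * population_size
    return {"normal": normal_ok, "independent": independent_ok}

# 96 successes out of 200 are the numbers used in Problem 3;
# the population size of 5000 is a made-up value
result = check_proportion_conditions(successes=96, n=200, population_size=5000)
print(result)
```

A result of `None` for `"independent"` means the 10% rule could not be evaluated, which is different from the rule being violated.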

## Calculating confidence levels and critical values of z*
### Problem 1: calculate critical value z*
![](img/confidence_intervals_p1.png)
![](img/confidence_intervals_c1.png)
### Problem 2: calculate confidence interval
![](img/confidence_intervals_p2.png)
![](img/confidence_intervals_c2.png)

In [1]:
import scipy.stats as ss

def z_table_probability(z):
    # cdf(-z) is the area of the lower tail (from -inf to -z); by symmetry
    # the upper tail has the same area, so we subtract both tails from 1,
    # e.g. for z* = 1.476: 7 + (86) + 7
    p = 1 - (ss.norm.cdf(-z) * 2)
    print("For critical value z* %.3f, probability is %.2f" % (z, p))
    return p

def z_table_zscore(p):
    # ppf works on the cumulative probability (from -inf), but p is the central
    # area within z standard deviations of the mean, so we add the lower tail.
    # For example, "find z* for a 92% confidence interval" means we have
    # (4 + 92) + 4, and we pass the probability in brackets to ppf
    z = ss.norm.ppf(p/2+0.5)
    print("For confidence interval of %.2f, we need to be in < %.3f standard deviations from the mean" % (p, z))
    return z

z_table_zscore(0.92)
z_table_probability(1.476)

For confidence interval of 0.92, we need to be in < 1.751 standard deviations from the mean
For critical value z* 1.476, probability is 0.86


0.8600561077685378

### Problem 3: calculate critical value z*
![](img/confidence_intervals_p3.png)

$(statistic)  \pm (critical\ value)(standard\ deviation\ of\ statistic)$

$\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

In [2]:
import math

CONFIDENCE_LEVEL = 0.9
SAMPLE_SIZE = 200
SUCCESSES_NUMBER = 96

p = SUCCESSES_NUMBER / SAMPLE_SIZE
print("Probability of success is %.2f " % p)
interval = z_table_zscore(CONFIDENCE_LEVEL) * math.sqrt((p*(1-p))/SAMPLE_SIZE)
print("Margin of error for %.2f confidence interval is %.3f" % (CONFIDENCE_LEVEL, interval))
print("The probability interval is %.3f - %.3f" % (p-interval, p+interval))

Probability of success is 0.48 
For confidence interval of 0.90, we need to be in < 1.645 standard deviations from the mean
Margin of error for 0.90 confidence interval is 0.058
The probability interval is 0.422 - 0.538


### Problem 4: calculate minimal sample size for a margin of error
![](img/confidence_intervals_p4.png)

We need to solve

$z^*\sqrt{\frac{p(1-p)}{n}} \leq margin\ of\ error$

for $n$, which results in

$n \geq \frac{(z^*)^2}{(margin\ of\ error)^2}\, p(1-p)$

In [3]:
CONFIDENCE_LEVEL = 0.95
MARGIN_OF_ERROR = 0.02
# set to None if success probability is unknown
SUCCESS_PROBABILITY = 0.9

# p(1-p) is largest at p = 0.5, so if we don't know p this is the
# conservative choice (it gives the largest required sample size)
if SUCCESS_PROBABILITY is None:
    SUCCESS_PROBABILITY = 0.5

# solving algebraically for n
min_sample_size = (z_table_zscore(CONFIDENCE_LEVEL)**2 / MARGIN_OF_ERROR**2) * SUCCESS_PROBABILITY * (1-SUCCESS_PROBABILITY)
print("If we want margin of error less than %d%%, sample size should be > %d" % (MARGIN_OF_ERROR*100, min_sample_size))

For confidence interval of 0.95, we need to be in < 1.960 standard deviations from the mean
If we want margin of error less than 2%, sample size should be > 864
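
Because the sample size has to be a whole number, the algebraic result is rounded up. The sketch below is an illustration rather than the notebook's own method: it uses the standard library's `statistics.NormalDist` in place of `scipy.stats`, and the helper names `min_sample_size` and `margin_for` are assumptions. It rounds up and then confirms that the margin of error is met at that size but not at one observation fewer.

```python
import math
from statistics import NormalDist

def critical_z(confidence_level):
    # same transformation as z_table_zscore: central area -> cumulative probability
    return NormalDist().inv_cdf(confidence_level / 2 + 0.5)

def min_sample_size(confidence_level, margin_of_error, p=0.5):
    # n >= (z* / margin)^2 * p * (1 - p), rounded up to a whole observation
    z = critical_z(confidence_level)
    return math.ceil((z / margin_of_error) ** 2 * p * (1 - p))

def margin_for(confidence_level, n, p):
    # margin of error obtained with a sample of size n
    return critical_z(confidence_level) * math.sqrt(p * (1 - p) / n)

n = min_sample_size(0.95, 0.02, p=0.9)
print("minimum sample size:", n)
print("margin at n:     %.5f" % margin_for(0.95, n, 0.9))
print("margin at n - 1: %.5f" % margin_for(0.95, n - 1, 0.9))
```

With the inputs from the cell above this gives a minimum sample size of 865, one more than the truncated value 864 printed there.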


# 2. Estimating a population mean

To estimate the margin of error we need the standard deviation of the sampling distribution of the sample mean $\bar{x}$:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

However we usually don't know the population standard deviation $\sigma$, so we substitute the sample standard deviation $s_x$ as an estimate for $\sigma$. When we do this, we call it the **standard error** of $\bar{x}$ to distinguish it from the standard deviation.

So the formula for the standard error of $\bar{x}$ is:

$\sigma_{\bar{x}} \approx \frac{s_x}{\sqrt{n}}$
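
As a small illustration of the standard error formula, the sketch below computes $s_x$ and the standard error for a made-up sample. The data values are hypothetical, and the standard library's `statistics.stdev` (which divides by $n-1$) is an assumed choice for the sample standard deviation.

```python
import math
import statistics

# hypothetical measurements, used only to illustrate the formula
sample = [4.1, 5.3, 4.8, 5.0, 4.6, 5.4, 4.9, 5.1]

n = len(sample)
s_x = statistics.stdev(sample)       # sample standard deviation (n - 1 denominator)
standard_error = s_x / math.sqrt(n)  # estimate of sigma / sqrt(n)

print("s_x = %.3f, standard error = %.3f" % (s_x, standard_error))
```

`statistics.pstdev` would divide by $n$ instead, which is appropriate only when the data is the whole population.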

Calculating a t-interval for a mean:

$\bar{x} \pm t^*\frac{s_x}{\sqrt{n}}$

To find $t^*$ we need to know two things: the degrees of freedom (df, which for a one-sample mean is $n-1$) and the confidence level.

### Problem 1: Finding the critical value t* for a desired confidence level
![](img/confidence_intervals_p5.png)

In [26]:
CONFIDENCE_LEVEL = 0.9
SAMPLE_SIZE = 18

def t_table_zscore(p, n):
    # ppf works on the cumulative probability (from -inf), but p is the central
    # area around the mean, so we add the lower tail: p + (1-p)/2 = p/2 + 0.5
    t = ss.t.ppf(p/2+0.5, n-1)
    print("t-score for confidence level of %d%% with df = %d is %.3f" % (p*100, n-1, t))
    return t

t = t_table_zscore(CONFIDENCE_LEVEL, SAMPLE_SIZE)

t-score for confidence level of 90% with df = 17 is 1.740
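
Once $t^*$ is known, the t-interval follows from the formula at the start of this section. The sketch below is an assumption-laden illustration: the 18 data values are made up, the helper name `t_interval` is hypothetical, and $t^* = 1.740$ is copied from the output above rather than recomputed.

```python
import math
import statistics

def t_interval(sample, t_star):
    """Return (low, high) for x_bar +/- t* * s_x / sqrt(n) (hypothetical helper)."""
    n = len(sample)
    mean = statistics.mean(sample)
    # standard error uses the sample standard deviation (n - 1 denominator)
    margin = t_star * statistics.stdev(sample) / math.sqrt(n)
    return mean - margin, mean + margin

# 18 made-up observations, so df = 17 matches the critical value found above
sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7, 12.2,
          12.4, 11.9, 12.0, 12.7, 11.6, 12.1, 12.5, 11.8, 12.3]

low, high = t_interval(sample, t_star=1.740)
print("90%% t-interval: %.3f - %.3f" % (low, high))
```

Passing `t_star` in explicitly keeps the helper independent of how the critical value was obtained, whether from `ss.t.ppf` or from a printed t-table.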
