In [5]:
import numpy as np

Today we work through a few more examples of hypothesis tests set up in different situations, to expand our experience beyond the example we ended the last class with.

## Small Sample Tests for Differences of Means

Our first example simultaneously demonstrates how we handle differences in means and how we use the t-Distribution for a test with a small sample size.

This example is based on work my dad did as an industrial engineer:  The shop lead at an oven factory has developed two different procedures for making a part that is needed in the new model. The question confronting us is which procedure is faster (and thus, assuming the same error rates, the one we should use).  The shop team proceeds to make a set of the parts using each procedure while being timed by the engineering team (note: if you are the one holding the stopwatch in this situation, do not expect the shop team to like you very much). The results are the following:

Procedure 1:  $n_1 = 11$, $\bar{Y}_1 = 5.6$ minutes, and $S_1 = 0.35$ minutes.

Procedure 2:  $n_2 = 9$, $\bar{Y}_2 = 5.4$ minutes, and $S_2 = 0.42$ minutes.

We will need to assume that the populations have the same standard deviation.

Our hypotheses could be:

$H_0:$ the second procedure is not faster than the first: $\mu_1 - \mu_2 \leq 0$. We will test this at a confidence level of 99%, i.e. $\alpha = 0.01$.

$H_a:$ the second procedure is faster: $\mu_1 - \mu_2 > 0$. 

### Pooled Standard Deviation

To start we need to combine the two sample standard deviations into the pooled standard deviation $S_p$. The way to remember this (I am now convinced) is that the pooled variance is the weighted average of the sample variances, weighted by their degrees of freedom:

In [6]:
n1 = 11
Ybar1 = 5.6
S1 = 0.35
n2 = 9
Ybar2 = 5.4
S2 = 0.42

In [7]:
Sp = np.sqrt( ((n1-1)*S1**2 + (n2-1)*S2**2 ) / (n1+n2-2) )
Sp

0.3826951208933236
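Written out, the cell above evaluates the degrees-of-freedom weighted average of the two sample variances and then takes the square root. The intermediate values are rounded here for readability:

$$ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = \frac{10 (0.35)^2 + 8 (0.42)^2}{18} \approx 0.1465, \qquad S_p = \sqrt{S_p^2} \approx 0.3827 $$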

Our result from the estimator work is that 

$$ T = \frac{ (\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2) }{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2} }} $$

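follows a Student's t-Distribution with $n_1 + n_2 - 2$ degrees of freedom.

Evaluated at the boundary of the null hypothesis, $\mu_1 - \mu_2 = 0$, the p-value (the smallest Type I error rate at which we would reject $H_0$) is then $P(T > t_\star)$ with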

$$ t_\star = \frac{ (\bar{Y}_1 - \bar{Y}_2) }{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2} }} $$

In [8]:
tstar = (Ybar1 - Ybar2) / Sp / np.sqrt( 1/n1 + 1/n2)
tstar

1.1627321199889935

In [9]:
from scipy.stats import t

In [12]:
pvalue = 1 - t.cdf(tstar, n1+n2-2)
pvalue

0.1300611529709157

Our p-value comes in at 13%, well above the significance level $\alpha = 0.01$ implied by our 99% confidence target, so **we have insufficient evidence to reject the null hypothesis**.
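As a cross-check, SciPy can run the same pooled test directly from the summary statistics with `scipy.stats.ttest_ind_from_stats`. This is only a sketch of an alternative route to the numbers above, not a replacement for the hand calculation. The names `t_check` and `p_one_sided` are introduced here for illustration.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the example (times in minutes).
n1, Ybar1, S1 = 11, 5.6, 0.35
n2, Ybar2, S2 = 9, 5.4, 0.42

# equal_var=True requests the pooled-variance (Student) test used above.
result = ttest_ind_from_stats(Ybar1, S1, n1, Ybar2, S2, n2, equal_var=True)

# SciPy reports a two-sided p-value by default. For the upper-tail
# alternative mu_1 - mu_2 > 0 we keep half of it when the statistic is
# positive, and the complement of half of it otherwise.
t_check = result.statistic
p_one_sided = result.pvalue / 2 if t_check > 0 else 1 - result.pvalue / 2

print(t_check, p_one_sided)
```

Both printed values should agree with `tstar` and `pvalue` computed by hand above, up to floating point rounding.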

### Rejection Region

Setting aside the specific outcome, we can also ask what the rejection region would be:



In [14]:
# the t-value we would need to reject the null hypothesis:

t_alpha = t.ppf(0.99, n1+n2-2)
t_alpha

2.552379630179453

In [15]:
# working backwards to the corresponding d_alpha, the difference between Y_1 and Y_2 we would need

d_alpha = t_alpha*Sp * np.sqrt(1/n1 + 1/n2)
d_alpha

0.4390314133927263

Choosing 0.5 minutes as the smallest difference of practical significance (that is, taking the alternative $\mu_1 - \mu_2 = 0.5$), we would estimate the probability of a Type II error to be:

In [16]:
beta = t.cdf( (d_alpha - 0.5) / Sp / np.sqrt(1/n1 + 1/n2), n1+n2-2)
beta

0.3635598671550872

Note that with the data in hand this $\beta$ is fixed, and there is nothing to do about it unless we are willing to repeat the experiment. If we do repeat the experiment, we can use $S_p$ as our *a priori* estimate of the standard deviation and identify a choice of $n_1$ and $n_2$ that brings $\beta$ down to an acceptable level.
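The sample size question can be sketched as a simple search. The code below reuses the approximation from the $\beta$ cell above, treating $S_p$ as known and modelling the statistic under the alternative as a shifted central t. The function names, the restriction to equal sample sizes $n_1 = n_2 = n$, and the target $\beta \leq 0.10$ are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import t

def beta_one_sided(n1, n2, Sp, delta, alpha=0.01):
    """Approximate Type II error for the upper-tail pooled t-test.

    Follows the calculation in the notes: Sp is treated as known and the
    statistic under the alternative is modelled as a shifted central t.
    """
    df = n1 + n2 - 2
    se = Sp * np.sqrt(1 / n1 + 1 / n2)   # standard error of Ybar1 - Ybar2
    t_alpha = t.ppf(1 - alpha, df)       # critical value of the test
    return t.cdf(t_alpha - delta / se, df)

def smallest_equal_n(Sp, delta, alpha=0.01, target_beta=0.10, n_max=1000):
    """Smallest common sample size n1 = n2 = n with beta <= target_beta."""
    for n in range(2, n_max + 1):
        if beta_one_sided(n, n, Sp, delta, alpha) <= target_beta:
            return n
    return None  # no sample size up to n_max reaches the target

Sp = 0.3826951208933236  # pooled standard deviation computed earlier
print(beta_one_sided(11, 9, Sp, 0.5))  # same setting as the beta cell above
print(smallest_equal_n(Sp, 0.5))       # hypothetical target: beta <= 0.10
```

An exact power calculation would use the noncentral t-distribution, so the sample size returned here should be read as a planning estimate.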

### Assumptions

Note that the $T$ distribution comes with some assumptions, namely that the two samples have been drawn from normally distributed populations. This may or may not be true. A test is called a **Robust Statistical Test** if small departures from the assumptions, like the distribution of times not being exactly normal, do not change the conclusions. Provided that the p-value is not right at $\alpha$, or equivalently that $\bar{Y}_1 - \bar{Y}_2$ is not right on the decision boundary, we do not expect small departures to have a big effect on our conclusion or error estimates.

The two sample test is also robust to departures from the assumption that the population standard deviations are identical:  if they are not exactly equal, we do not expect it to change our conclusions.
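One way to probe the equal standard deviation assumption on our data is to rerun the test without it. The sketch below swaps in Welch's unequal-variance t-test (`equal_var=False` in `scipy.stats.ttest_ind_from_stats`), which is a different test from the pooled one used in these notes, and compares the two upper-tail p-values.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the example (times in minutes).
n1, Ybar1, S1 = 11, 5.6, 0.35
n2, Ybar2, S2 = 9, 5.4, 0.42

# Pooled test (equal population variances assumed) versus Welch's test
# (no equal-variance assumption).
pooled = ttest_ind_from_stats(Ybar1, S1, n1, Ybar2, S2, n2, equal_var=True)
welch = ttest_ind_from_stats(Ybar1, S1, n1, Ybar2, S2, n2, equal_var=False)

# Both statistics are positive here, so halving the two-sided p-value
# gives the upper-tail p-value.
p_pooled = pooled.pvalue / 2
p_welch = welch.pvalue / 2

print(pooled.statistic, p_pooled)
print(welch.statistic, p_welch)
```

If the two p-values land close together and on the same side of $\alpha$, the conclusion does not hinge on the equal-variance assumption for this data set.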


## Two Tail Tests

Two tail tests arise when our alternative hypothesis is that the mean is different from some value, rather than specifically bigger or specifically smaller than it. In the example above, if we rephrased the question as "Is there evidence that the assembly times for the two procedures are different?", we would want to use a two tailed test.

Here we distribute the allowed Type I error across both the left and right tails, effectively halving the tail probability we use to compute $t_\alpha$:

In [17]:
# the t-values for a two tailed test

t_alpha = t.ppf(0.995, n1+n2-2)
- t_alpha, t_alpha

(-2.878440472713585, 2.878440472713585)

Then we can work backwards to find the rejection region, which will now have two pieces: $d < d_{-\alpha}$ and $d > d_\alpha$:

In [18]:
-t_alpha*Sp*np.sqrt(1/n1+1/n2), t_alpha*Sp*np.sqrt(1/n1+1/n2)

(-0.49511670370658084, 0.49511670370658084)

The probability of a Type II error in this case is then the mass of the alternative distribution over this interval. Again, this estimate is hamstrung by the need to assume a particular alternative difference of the means; here we again use 0.5 minutes.

In [21]:
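# Shift both critical t-values by the standardized alternative difference (0.5 minutes).
# By the symmetry of the t-distribution, the sign of the shift does not change beta.
l = (-t_alpha*Sp*np.sqrt(1/n1+1/n2) + 0.5)/Sp / np.sqrt(1/n1+1/n2)
u = (t_alpha*Sp*np.sqrt(1/n1+1/n2) + 0.5)/Sp/np.sqrt(1/n1 + 1/n2)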

beta = t.cdf(u, n1+n2-2) - t.cdf(l, n1+n2-2)
beta

0.4888230451599327

Again, for this problem a $\beta$ near 49% is not great. However, if we can repeat the experiment, then we can ask how large $n_1$ and $n_2$ should be so that $\beta$ is small enough.
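For the data we actually observed, the two-tailed decision and p-value can be sketched as follows. The variable names (`se`, `d_obs`, `t_crit`, `d_crit`, `p_two_sided`, `reject`) are introduced here for illustration; the numbers come from the example above.

```python
import numpy as np
from scipy.stats import t

# Summary statistics from the example (times in minutes).
n1, Ybar1, S1 = 11, 5.6, 0.35
n2, Ybar2, S2 = 9, 5.4, 0.42
df = n1 + n2 - 2

# Pooled standard deviation and standard error of the difference in means.
Sp = np.sqrt(((n1 - 1) * S1**2 + (n2 - 1) * S2**2) / df)
se = Sp * np.sqrt(1 / n1 + 1 / n2)

d_obs = Ybar1 - Ybar2              # observed difference in means
t_obs = d_obs / se                 # observed t statistic
t_crit = t.ppf(1 - 0.01 / 2, df)   # alpha = 0.01 split across both tails
d_crit = t_crit * se               # boundary expressed as a difference in minutes

# Two-sided p-value: probability mass in both tails beyond |t_obs|.
p_two_sided = 2 * t.sf(abs(t_obs), df)
reject = abs(d_obs) > d_crit

print(p_two_sided, reject)
```

Because the t-distribution is symmetric, the two-sided p-value is twice the one-sided value found earlier, so the two-tailed test is even further from rejecting $H_0$.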

## Testing Hypotheses for Variances

The procedure can be adapted to any distribution, and in particular we can use it to address questions about the variance of a population.

The key idea is that, for a sample of size $n$ drawn from a normal distribution with variance $\sigma_0^2$, the statistic

$$\chi^2 = \frac{(n-1) S^2}{\sigma_0^2} $$

follows a $\chi^2$ distribution with $n - 1$ degrees of freedom.

A cholesterol test read properly should give a variance of no more than $0.8 (\mbox{mg/L})^2$ on samples from the same blood draw. A new technician in the lab is being evaluated on the procedure and is asked to run the test 5 times.

What is the maximum amount of sample variance we should allow for them to pass the test with 99% confidence?

- $H_0:$ is that $\sigma^2 \leq \sigma_0^2 = 0.8$

- $H_a:$ is that $\sigma^2 > 0.8$

It is reasonable to assume that the test, run on a single blood draw, gives results that are normally distributed with a mean at the true cholesterol level, and therefore the ratio above will follow the $\chi^2$ distribution with $n - 1 = 4$ degrees of freedom.

The question posed calls for an upper tail test:  With $\sigma_0^2$ fixed, what is the probability that $S^2$ would be very large?




In [22]:
from scipy.stats import chi2

In [23]:
chi2_alpha = chi2.ppf(0.99, 4)
chi2_alpha

13.276704135987622

With this we then work backwards to a decision boundary for $S^2$:

In [24]:
chi2_alpha * 0.8 / 4

2.6553408271975245
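Spelled out, the step from the critical $\chi^2$ value to the boundary on $S^2$ is a rearrangement of the test statistic. The notation $\chi^2_{0.99,\,4}$ for the 0.99 quantile with 4 degrees of freedom is introduced here for convenience, and the numbers are rounded:

$$ \frac{(n-1) S^2}{\sigma_0^2} > \chi^2_{0.99,\,4} \quad \Longleftrightarrow \quad S^2 > \frac{\chi^2_{0.99,\,4} \, \sigma_0^2}{n-1} = \frac{13.2767 \times 0.8}{4} \approx 2.655 $$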

Note what this tells us:  With only 5 measurements in our sample (4 degrees of freedom), the sample variance has to be more than three times the allowed $0.8$ before we would reject the null hypothesis!
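A quick simulation can sanity check the boundary. The sketch below assumes normally distributed measurements with variance exactly $0.8$ (the edge of $H_0$) and counts how often a sample of 5 exceeds the boundary; the seed, the number of trials, and the mean of zero are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.stats import chi2

sigma0_sq = 0.8   # largest variance allowed under H_0
n = 5             # number of repeated measurements
alpha = 0.01      # 99% confidence

# Decision boundary for the sample variance, as computed above.
s2_max = chi2.ppf(1 - alpha, n - 1) * sigma0_sq / (n - 1)

# Simulate many technicians whose true variance sits exactly at sigma0_sq.
# The mean is set to 0 because the sample variance does not depend on it.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma0_sq), size=(200_000, n))
s2 = samples.var(axis=1, ddof=1)   # ddof=1 gives the sample variance S^2

# Fraction of simulated samples that would (wrongly) fail the test.
false_alarm_rate = np.mean(s2 > s2_max)
print(s2_max, false_alarm_rate)
```

The false alarm rate should come out close to $\alpha = 0.01$, up to simulation noise.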

## Hypotheses about Two Population Variances

For the manufacturing example we started with, one of our assumptions was that the population variances were equal. Is that assumption consistent with the data?

A reminder of the results:

Procedure 1:  $n_1 = 11$, and $S_1 = 0.35$ minutes.

Procedure 2:  $n_2 = 9$, and $S_2 = 0.42$ minutes.

Note that it will not matter what the means are.

What we know is that the statistic:

$$ F = \frac{S_1^2 \sigma_2^2}{S_2^2 \sigma_1^2} $$

follows an F-distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom.

Under our null hypothesis $\sigma_1 = \sigma_2$, so we are asking whether the fact that $S_1 < S_2$ can be explained by randomness alone.

The p-value will be $P(F < f_\star)$, where $f_\star = S_1^2 / S_2^2$ is the observed ratio:

In [26]:
f_star = 0.35**2 / 0.42**2
f_star

0.6944444444444444

In [28]:
from scipy.stats import f

In [30]:
f.cdf(f_star, 10, 8)

0.2891197317010914

So we do not have sufficient evidence to conclude that the variances are not equal. Let's check where the decision boundary would be for a two-tailed test with a confidence of 0.95. We need to find left and right boundaries $a$ and $b$ such that

$$P(F < a) = 0.025 \quad \mbox{and} \quad P(F > b) = 0.025 $$

In [32]:
a = f.ppf(0.025, 10, 8)
b = f.ppf(0.975, 10, 8)
a, b

(0.2594107151696225, 4.295126960172586)

In [34]:
# Check that the probability mass between a and b is our 0.95

f.cdf(b, 10, 8) - f.cdf(a, 10, 8)

0.95

So our rejection region is then

$$ S_1^2 < a S_2^2 \quad \mbox{or} \quad S_1^2 > b S_2^2 $$
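Applying this rule to the observed data can be sketched as below. The two-sided p-value uses the common convention of doubling the smaller tail probability, which is an assumption here and not something derived in these notes; the variable names are likewise illustrative.

```python
from scipy.stats import f

# Sample sizes and sample standard deviations from the example.
n1, S1 = 11, 0.35
n2, S2 = 9, 0.42
dfn, dfd = n1 - 1, n2 - 1   # numerator and denominator degrees of freedom

f_obs = S1**2 / S2**2       # observed ratio of sample variances

# Two-tailed boundaries at 95% confidence (2.5% in each tail).
a = f.ppf(0.025, dfn, dfd)
b = f.ppf(0.975, dfn, dfd)
reject = (f_obs < a) or (f_obs > b)

# Two-sided p-value: double the smaller of the two tail probabilities.
lower_tail = f.cdf(f_obs, dfn, dfd)
p_two_sided = 2 * min(lower_tail, 1 - lower_tail)

print(f_obs, (a, b), reject, p_two_sided)
```

The observed ratio of about 0.69 sits well inside the interval $(a, b)$, so the data give no reason to abandon the equal-variance assumption used in the pooled t-test.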



### Discussion

The tests for variances are **not robust**.  They are very sensitive to departures from the normal distribution for the population being studied. To my thinking, this is related to the degree of variability they predict we will see in the sample variances.
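The lack of robustness can be illustrated with a small simulation. The sketch below applies the $\chi^2$ boundary from the cholesterol example to samples whose true variance is exactly $0.8$, once for a normal population and once for a Laplace population (a heavier-tailed distribution chosen purely for illustration). The seed and number of trials are arbitrary.

```python
import numpy as np
from scipy.stats import chi2

sigma0_sq, n, alpha = 0.8, 5, 0.01

# Decision boundary for S^2 derived under the normality assumption.
s2_max = chi2.ppf(1 - alpha, n - 1) * sigma0_sq / (n - 1)

rng = np.random.default_rng(1)
trials = 200_000

# Both populations have variance sigma0_sq, so H_0 is true in both cases.
normal = rng.normal(0.0, np.sqrt(sigma0_sq), size=(trials, n))
# A Laplace distribution with scale s has variance 2 * s**2.
laplace = rng.laplace(0.0, np.sqrt(sigma0_sq / 2), size=(trials, n))

# Fraction of samples whose variance exceeds the boundary (Type I errors).
rate_normal = np.mean(normal.var(axis=1, ddof=1) > s2_max)
rate_laplace = np.mean(laplace.var(axis=1, ddof=1) > s2_max)

print(rate_normal, rate_laplace)
```

The normal case should reject at roughly the nominal 1%, while the heavier-tailed population rejects noticeably more often even though its variance satisfies $H_0$.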

# More Discussion

That will be it for this foray into hypothesis testing. Our book goes into some additional detail on the construction of maximum power tests.
