# Two Sample t-test for Independent Samples

## Paired Data

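* Paired subjects
    * the same subject measured before and after
    * twins
* Pairing lowers variability, because each comparison is made within a pair and differences between subjects cancel out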
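To see why pairing helps, here is a small sketch with hypothetical before/after scores (made-up numbers for illustration only, not course data). Baselines vary a lot between subjects, but each subject's change is small and consistent. The standard error of the paired differences is therefore much smaller than the standard error an independent-samples analysis would use on the same numbers.

```r
# Hypothetical before/after scores for five subjects (illustrative only)
before <- c(60, 75, 90, 105, 120)
after  <- before + c(4, 6, 5, 7, 3)   # every subject changes by a similar amount
n <- length(before)

# Paired analysis: work with the within-subject differences
d <- after - before
se_paired <- sd(d) / sqrt(n)

# Independent-samples analysis ignores the pairing
se_indep <- sqrt(sd(after)^2 / n + sd(before)^2 / n)

c(se_paired = se_paired, se_indep = se_indep)

# The corresponding tests in R; paired = TRUE analyses the differences
t.test(after, before, paired = TRUE)
t.test(after, before)
```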

## Independent Samples

* Samples
    * Random
    * Drawn independently from different populations
* Experiments
    * Subjects randomly assigned to one of two treatments
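Random assignment is what makes the two treatment groups independent samples. A minimal sketch of one way to do it in R, assuming 12 subjects with hypothetical IDs split evenly between two treatments (the seed is arbitrary and only makes the assignment reproducible):

```r
# Randomly assign 12 hypothetical subjects to two treatments, 6 per group
set.seed(2024)                              # arbitrary seed for reproducibility
subjects <- paste0("S", 1:12)               # hypothetical subject IDs
group <- sample(rep(c("Drug", "Placebo"), each = 6))   # random permutation of labels
assignment <- data.frame(subject = subjects, group = group)

assignment
table(assignment$group)
```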

## Details for a 2 sample test

### Hypotheses

$$ H_0: \mu_1 - \mu_2 = 0 $$
$$ H_a: \mu_1 - \mu_2 \; (<,>,\ne) \; 0 $$
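Each form of $H_a$ uses a different tail of the $t$ distribution. A minimal sketch, with `p_value` a hypothetical helper name; its `alternative` values mirror the ones `t.test` accepts. The check values are the $t$ statistic and df reported for the drug example later in these notes.

```r
# p-value for a t statistic t_s with df degrees of freedom,
# for each form of the alternative hypothesis
p_value <- function(t_s, df, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         less      = pt(t_s, df),                      # H_a: mu1 - mu2 < 0
         greater   = pt(t_s, df, lower.tail = FALSE),  # H_a: mu1 - mu2 > 0
         two.sided = 2 * pt(-abs(t_s), df))            # H_a: mu1 - mu2 != 0
}

p_value(3.4456, 9.4797, "greater")
p_value(3.4456, 9.4797, "two.sided")
```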

### Statistic

$$ \bar{y}_1 - \bar{y}_2$$

### Standard Error

$$SE_1 = \frac{s_1}{\sqrt{n_1}}, SE_2 = \frac{s_2}{\sqrt{n_2}}$$

$$ SE_{\bar{y}_1-\bar{y}_2}  = \sqrt{SE_1^2+SE_2^2}$$

### Degrees of Freedom (Satterthwaite approximation)

$$ df = \frac{\left(SE_1^2+SE_2^2\right)^2}{\frac{SE_1^4}{n_1-1} + \frac{SE_2^4}{n_2-1}} $$

### Test Statistic

$$ t_s = \frac{(\bar{y}_1-\bar{y}_2) - 0}{SE_{\bar{y}_1-\bar{y}_2}}$$
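When a problem gives only summary statistics (means, standard deviations, sample sizes), the formulas above can be applied directly. A sketch with a hypothetical helper `welch_summary`, checked on the drug example data (Treat as group 1, Control as group 2):

```r
# Welch two-sample t statistic from summary statistics,
# following the SE, df, and test-statistic formulas above
welch_summary <- function(ybar1, s1, n1, ybar2, s2, n2) {
  se1 <- s1 / sqrt(n1)
  se2 <- s2 / sqrt(n2)
  se_diff <- sqrt(se1^2 + se2^2)
  # Satterthwaite approximation to the degrees of freedom
  df <- (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))
  t_s <- ((ybar1 - ybar2) - 0) / se_diff
  c(diff = ybar1 - ybar2, se = se_diff, df = df, t = t_s)
}

treat   <- c(101, 110, 103, 93, 99, 104)
control <- c(91, 87, 99, 77, 88, 91)
res <- welch_summary(mean(treat), sd(treat), length(treat),
                     mean(control), sd(control), length(control))
res
```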

## Confidence Interval

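$$ \left( \bar{y}_1 - \bar{y}_2\right) \pm t_c \cdot SE_{\bar{y}_1-\bar{y}_2}$$

Where the critical value is `t_c = abs(qt(tail_area, df))` and `tail_area = (1 - confidence level)/2` is the area in each tail (for example, `0.05` for a 90% interval).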
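The same recipe works for any confidence level. A sketch with a hypothetical helper `welch_ci`, applied to the drug example at 90% confidence:

```r
# Two-sided confidence interval for mu1 - mu2 at a chosen confidence level
welch_ci <- function(diff, se_diff, df, conf.level = 0.95) {
  tail_area <- (1 - conf.level) / 2      # area in each tail
  t_c <- abs(qt(tail_area, df))          # critical value, equal to qt(1 - tail_area, df)
  c(lower = diff - t_c * se_diff, upper = diff + t_c * se_diff)
}

# Drug example: Treat minus Control
treat   <- c(101, 110, 103, 93, 99, 104)
control <- c(91, 87, 99, 77, 88, 91)
se1 <- sd(treat) / sqrt(length(treat))
se2 <- sd(control) / sqrt(length(control))
se_diff <- sqrt(se1^2 + se2^2)
df_welch <- (se1^2 + se2^2)^2 /
  (se1^4 / (length(treat) - 1) + se2^4 / (length(control) - 1))

ci <- welch_ci(mean(treat) - mean(control), se_diff, df_welch, conf.level = 0.90)
ci
```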

### Example

* randomized experiment
    * 6 subjects given a drug
    * 6 subjects given a placebo
* Reaction time was measured
* **Research question:** Did the drug increase reaction time? 


We subtract $Treat - Control$; if the drug increases reaction time, this difference should be positive.

Thus $H_a: \mu_{treat} - \mu_{control} > 0$

```r
Control = c(91, 87, 99, 77, 88, 91)
Treat = c(101, 110, 103, 93, 99, 104)
```

## Long way

```r
mean_diff <- mean(Treat) - mean(Control)
mean_diff
```

```r
n_1 <- length(Treat)
n_2 <- length(Control)
SE_1 <- sd(Treat)/sqrt(n_1)
SE_2 <- sd(Control)/sqrt(n_2)
SE_diff <- sqrt(SE_1^2 + SE_2^2)
SE_diff
```

```r
t_s <- (mean_diff - 0)/SE_diff
t_s
```

```r
df <- (SE_1^2 + SE_2^2)^2/(SE_1^4/(n_1 - 1) + SE_2^4/(n_2 - 1))
df
```

```r
# Greater-than alternative: upper-tail probability, so lower.tail = FALSE
pval <- pt(t_s, df, lower.tail = FALSE)
pval
```

```r
# 90% two-sided CI: 5% of the area in each tail
tail_area <- 0.05
t_c <- abs(qt(tail_area, df))
t_c
```

```r
lower <- mean_diff - t_c*SE_diff
upper <- mean_diff + t_c*SE_diff
c(lower, upper)
```

## Short way - One Tail Test

```r
t.test(Treat, Control, alternative = "greater")
```

```
	Welch Two Sample t-test

data:  Treat and Control
t = 3.4456, df = 9.4797, p-value = 0.003391
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 6.044949      Inf
sample estimates:
mean of x mean of y 
101.66667  88.83333 
```

## Short way - Two tail CI

```r
t.test(Treat, Control, alternative = "two.sided", conf.level = 0.90)
```

```
	Welch Two Sample t-test

data:  Treat and Control
t = 3.4456, df = 9.4797, p-value = 0.006782
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
  6.044949 19.621717
sample estimates:
mean of x mean of y 
101.66667  88.83333 
```
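The long-way numbers can be checked against `t.test` by keeping its result object instead of only printing it. A sketch: `statistic`, `parameter`, `p.value`, and `conf.int` are components of the `htest` list that `t.test` returns; `df_welch` is just a local name chosen here.

```r
Control <- c(91, 87, 99, 77, 88, 91)
Treat   <- c(101, 110, 103, 93, 99, 104)

# Long way (same steps as above)
SE_1 <- sd(Treat) / sqrt(length(Treat))
SE_2 <- sd(Control) / sqrt(length(Control))
SE_diff <- sqrt(SE_1^2 + SE_2^2)
t_s <- (mean(Treat) - mean(Control)) / SE_diff
df_welch <- (SE_1^2 + SE_2^2)^2 /
  (SE_1^4 / (length(Treat) - 1) + SE_2^4 / (length(Control) - 1))

# Short way: keep the result object
res <- t.test(Treat, Control, alternative = "two.sided", conf.level = 0.90)

res$statistic   # t statistic
res$parameter   # Welch (Satterthwaite) degrees of freedom
res$p.value     # two-sided p-value
res$conf.int    # 90% CI for mu_Treat - mu_Control

# The two approaches agree up to floating-point rounding
all.equal(unname(res$statistic), t_s)
all.equal(unname(res$parameter), df_welch)
```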


## Two Sample t-test in JMP

Now I will demonstrate the two-sample t-test in JMP. The steps are:

1. Enter the data stacked
    1. Data in one column
    2. Group in another
2. Analyze > Fit Y by X
    1. Response in Y
    2. Groups in X
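3. Add a normal quantile plot to check each group for approximate normality
4. Perform a t-test
    1. Pay attention to the order of subtraction (which group's mean is subtracted from which)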
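For comparison, the same stacked layout can be used in R. This is a sketch of an R analogue, not a JMP procedure. With R's formula interface, `t.test` estimates mean(first factor level) minus mean(second level), and factor levels default to alphabetical order (which would give Control minus Treat here), so the levels are set explicitly to get Treat minus Control.

```r
Control <- c(91, 87, 99, 77, 88, 91)
Treat   <- c(101, 110, 103, 93, 99, 104)

# Stacked layout: responses in one column, group labels in another
reaction <- data.frame(
  time  = c(Treat, Control),
  group = factor(rep(c("Treat", "Control"), times = c(length(Treat), length(Control))),
                 levels = c("Treat", "Control"))   # test uses Treat - Control
)

# Normal quantile plot for each group
par(mfrow = c(1, 2))
for (g in levels(reaction$group)) {
  qqnorm(reaction$time[reaction$group == g], main = g)
  qqline(reaction$time[reaction$group == g])
}

# Welch test via the formula interface: response ~ group
res <- t.test(time ~ group, data = reaction, alternative = "greater")
res
```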

## Data for Group Work

Here are the data values for the homework.

```r
honey_locust <- c(271.5, 21.1, 18.7, 32.2, 29.2, 32.2, 99.9, 12.3, 38.6, 1.8, 45.6, 53.2, 32.2, 38.6, 49.3, 49.3, 18.7, 21.1, 45.6, 32.2, 42, 29.2, 45.6, 12.3, 61.4, 5.9, 49.3, 8.8, 7.3, 7.3, 8.8, 29.2, 45.6, 29.2, 45.6, 23.6, 45.6, 29.2, 10.5, 35.3, 5.9, 16.4)
norway_maple <- c(262.7, 168.1, 38.6, 45.6, 23.6, 74.7, 116.7, 182.4, 141.3, 61.4, 182.4, 262.7, 128.7, 99.9, 205, 128.7, 74.7, 168.1, 122.7, 182.4, 45.6, 49.3, 70.1, 116.7, 57.2, 45.6, 122.7, 237.1, 18.7, 10.5, 3.6, 29.2, 42, 29.2, 4.7, 8.8)
```