## Great Lakes

The DAAG package includes the `greatLakes` dataset, which records the levels of several Great Lakes. Suppose you want to know whether Lake Erie or Lake Huron has the higher level. You can set up a hypothesis test as follows. First, load the DAAG package with `library(DAAG)`. Then create two vectors with these commands: `erie <- greatLakes[,1]` and `huron <- greatLakes[,2]`. Finally, use these vectors in a hypothesis test to determine whether the mean levels of the two lakes are equal.

### What is the null hypothesis for this test?

```R
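# H0: the true mean level of Lake Erie equals that of Lake Huron
#     (mu_erie == mu_huron)
```

### What is the alternative hypothesis for this test?

```R
# H1: the true mean level of Lake Erie differs from that of Lake Huron
#     (mu_erie != mu_huron)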
```

## Description
Various data sets and functions used or referred to in the book Maindonald, J.H. and Braun, W.J.
(3rd edn 2010) "Data Analysis and Graphics Using R", plus other selected datasets and functions.

## Details
For a list of the package's contents, use
```R
library(help="DAAG")
```
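As a complement to `library(help="DAAG")`, the sketch below lists only the datasets that a package ships. The helper `list_datasets` is a hypothetical name introduced here for illustration; it is not part of DAAG. It relies on base R's `data(package = ...)`, and the DAAG lookup is guarded so that it runs only if DAAG is installed.

```R
# Hypothetical helper (not part of DAAG): list the datasets a package ships.
list_datasets <- function(pkg) {
  # data(package = ...) returns an object whose $results matrix has the
  # columns "Package", "LibPath", "Item" and "Title".
  res <- data(package = pkg)$results
  res[, c("Item", "Title"), drop = FALSE]
}

# Works for any installed package; "datasets" ships with base R.
head(list_datasets("datasets"))

# For DAAG, run the lookup only if the package is installed.
if (requireNamespace("DAAG", quietly = TRUE)) {
  daag_items <- list_datasets("DAAG")
  # Show the entry whose name matches greatLakes
  print(daag_items[grepl("greatLakes", daag_items[, "Item"]), , drop = FALSE])
}
```

Once DAAG is loaded, `help("greatLakes", package = "DAAG")` opens the documentation page for the dataset itself.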

## Where does the Great Lakes data come from?

https://cran.r-project.org/web/packages/DAAG/index.html

```R
# install the DAAG package
install.packages("DAAG")
```

```
The downloaded binary packages are in
	/var/folders/wk/6why77bn1kn0l0pkd4vd3zl00000gn/T//RtmpZnA2ss/downloaded_packages
```


```R
# load the DAAG package
library(DAAG)
```

```R
# ensure you can see the dataset greatLakes
head(greatLakes)
```

| Erie | michHuron | Ontario | StClair |
|---|---|---|---|
| 174.015 | 176.8867 | 74.8725 | 174.9567 |
| 174.1808 | 176.745 | 74.95583 | 175.0767 |
| 173.9083 | 176.625 | 74.5675 | 174.815 |
| 174.0258 | 176.4883 | 74.66583 | 174.8692 |
| 173.9275 | 176.445 | 74.65667 | 174.8108 |
| 173.7575 | 176.2642 | 74.44333 | 174.6525 |


```R
# list column names
colnames(greatLakes)
```

```R
# extract the Erie column into a time series called erie
erie <- greatLakes[, 1]
```

```R
# print the series to check that the values were extracted correctly
print(erie)
```

```
Time Series:
Start = 1918
End = 2009
Frequency = 1
 [1] 174.0150 174.1808 173.9083 174.0258 173.9275 173.7575 173.8433 173.5933
 [9] 173.6408 173.8183 173.9517 174.3117 174.2700 173.7050 173.7358 173.6767
[17] 173.3317 173.4108 173.4883 173.8200 173.8408 173.8458 173.7950 173.7000
[25] 173.8767 174.2317 174.0917 174.1883 174.1358 174.2350 174.2208 173.9733
[33] 174.1350 174.3242 174.5200 174.3308 174.2883 174.3483 174.1200 174.0367
[41] 173.8083 173.8500 174.0942 174.1008 173.9075 173.7267 173.6075 173.7125
[49] 173.8992 174.0592 174.2042 174.3900 174.2583 174.3117 174.5000 174.7408
[57] 174.6892 174.6158 174.5733 174.2975 174.3725 174.3917 174.5217 174.3983
[65] 174.4250 174.5208 174.5225 174.7283 174.8983 174.6675 174.2617 174.2258
[73] 174.2942 174.3267 174.3567 174.5042 174.3667 174.2842 174.3750 174.7175
[81] 174.5508 174.1025 173.9900 173.9142 174.0583 173.9658 174.1158 174.1700
[89] 174.1467 174.1392 174.1592 174.2483
```


```R
# extract the michHuron column into a time series called huron
huron <- greatLakes[, 2]
```
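The two extractions above select columns by position, which breaks silently if the column order ever changes. Selecting by name is a common alternative. The sketch below is self-contained: `lakes` is a hypothetical stand-in for `greatLakes`, rebuilt from the first six rows printed by `head(greatLakes)`, so it runs without DAAG installed.

```R
# Toy stand-in for greatLakes (an assumption for illustration): the first
# six rows shown by head(greatLakes), as a multivariate time series
# starting in 1918.
lakes <- ts(
  cbind(
    Erie      = c(174.015, 174.1808, 173.9083, 174.0258, 173.9275, 173.7575),
    michHuron = c(176.8867, 176.745, 176.625, 176.4883, 176.445, 176.2642)
  ),
  start = 1918
)

# Select by column name instead of by position.
erie_by_name  <- lakes[, "Erie"]
huron_by_name <- lakes[, "michHuron"]

# Compare name-based and position-based selection.
all.equal(erie_by_name, lakes[, 1])
all.equal(huron_by_name, lakes[, 2])

# The result is still a time series that remembers its start year.
is.ts(erie_by_name)
start(erie_by_name)
```

With the real dataset the equivalent calls would be `greatLakes[, "Erie"]` and `greatLakes[, "michHuron"]`, using the column names shown in the `head(greatLakes)` output.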

```R
print(huron)
```

```
Time Series:
Start = 1918
End = 2009
Frequency = 1
 [1] 176.8867 176.7450 176.6250 176.4883 176.4450 176.2642 176.1867 175.9192
 [9] 175.8850 176.1483 176.4433 176.8958 176.6508 176.1183 175.9408 175.8675
[17] 175.7667 175.8908 175.9392 175.9225 176.1408 176.2692 176.1417 176.1217
[25] 176.3342 176.6267 176.5967 176.5700 176.6017 176.5667 176.5308 176.2108
[33] 176.2608 176.7358 177.0850 176.9333 176.8292 176.7225 176.4400 176.2633
[41] 176.0675 176.0058 176.4775 176.3767 176.2225 175.9225 175.6825 175.9158
[49] 176.1608 176.3008 176.4467 176.6958 176.6783 176.8050 176.8883 177.1233
[57] 177.0933 176.9733 176.8992 176.5050 176.5908 176.7942 176.8033 176.6983
[65] 176.5983 176.8333 176.8950 177.1267 177.2925 176.9700 176.5642 176.4008
[73] 176.3500 176.4692 176.4792 176.6958 176.6783 176.5275 176.6542 176.9842
[81] 176.7167 176.2358 175.9783 175.9508 176.1183 175.8917 176.1108 176.0900
[89] 176.0158 175.9433 176.0050 176.2583
```


```R
# Welch two-sample t-test (unequal variances), two-sided alternative
t_ind <- t.test(erie, huron, alternative = "two.sided", var.equal = FALSE)
t_ind
```

```
	Welch Two Sample t-test

data:  erie and huron
t = -44.455, df = 178, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.398172 -2.194310
sample estimates:
mean of x mean of y
 174.1382  176.4345
```
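The printed summary above can also be read programmatically, because `t.test` returns a list of class `"htest"`. The sketch below shows how to pull out individual components and recomputes the Welch statistic by hand as a check. To stay self-contained it uses only the first eight values of each printed series (the names `erie_s` and `huron_s` are introduced here for illustration), so its numbers will not match the full-data output.

```R
# Small self-contained samples: the first eight values of each series
# as printed above (a subset chosen only for illustration).
erie_s  <- c(174.0150, 174.1808, 173.9083, 174.0258,
             173.9275, 173.7575, 173.8433, 173.5933)
huron_s <- c(176.8867, 176.7450, 176.6250, 176.4883,
             176.4450, 176.2642, 176.1867, 175.9192)

res <- t.test(erie_s, huron_s, alternative = "two.sided", var.equal = FALSE)

# Components of the "htest" object
res$statistic   # t statistic
res$parameter   # Welch-adjusted degrees of freedom
res$p.value     # p-value
res$conf.int    # confidence interval for mean(x) - mean(y)
res$estimate    # the two sample means

# Welch statistic by hand: difference in means divided by the
# unpooled standard error.
se <- sqrt(var(erie_s) / length(erie_s) + var(huron_s) / length(huron_s))
t_manual <- (mean(erie_s) - mean(huron_s)) / se
all.equal(unname(res$statistic), t_manual)
```

The same accessors apply to the `t_ind` object created in the notebook, for example `t_ind$p.value` or `t_ind$conf.int`.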


```R
# confirm that "mean of x" in the t-test output is the Erie mean
mean(erie)

# confirm that "mean of y" in the t-test output is the Huron mean
mean(huron)
```

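The p-value (< 2.2e-16) is far below any conventional significance level, so we reject H0 and conclude that the mean levels differ. The 95 percent confidence interval for the difference (Erie minus Huron) runs from -2.398 to -2.194 and excludes 0, and the sample means (174.1382 versus 176.4345) show that Lake Huron has the higher mean level.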
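The Welch test above treats the two series as independent samples. Because both series cover the same years (1918 to 2009), each Erie value has a natural partner in the Huron series, so a paired test is one reasonable alternative. The original question also asks which lake is higher, which suggests a one-sided alternative. The sketch below illustrates both variants; it is not the analysis the notebook ran, and it uses only the first eight printed values of each series (hypothetical names `erie_s` and `huron_s`) to stay self-contained. Yearly lake levels are likely autocorrelated, which none of these tests accounts for.

```R
# First eight values of each series as printed above (illustrative subset).
erie_s  <- c(174.0150, 174.1808, 173.9083, 174.0258,
             173.9275, 173.7575, 173.8433, 173.5933)
huron_s <- c(176.8867, 176.7450, 176.6250, 176.4883,
             176.4450, 176.2642, 176.1867, 175.9192)

# Paired t-test: compares the two lakes year by year.
paired <- t.test(erie_s, huron_s, paired = TRUE)

# A paired t-test is a one-sample t-test on the yearly differences.
one_sample <- t.test(erie_s - huron_s, mu = 0)
all.equal(unname(paired$statistic), unname(one_sample$statistic))

# One-sided Welch test. H1: mean(erie) < mean(huron),
# that is, Lake Huron has the higher mean level.
one_sided <- t.test(erie_s, huron_s, alternative = "less", var.equal = FALSE)
one_sided$p.value
```

On the full data the corresponding calls would be `t.test(erie, huron, paired = TRUE)` and `t.test(erie, huron, alternative = "less", var.equal = FALSE)`.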

```R
# learn more about DAAG!
# library(help="DAAG")

# learn about clipr
# install.packages('clipr')
# library(clipr)
# rc <- read_clip(allow_non_interactive = TRUE)
# print(rc)

# learn more about clipr
# library(help="clipr")
```