# ECON 325: Hypothesis Testing
---

#### Authors
---
Oliver (Junye) Xu (xjy099@mail.ubc.ca) <br>
Colby Chambers (colby.chambers@ubc.ca)

#### Prerequisites
---
* Introduction to Jupyter
* Introduction to R
* Introduction to Visualization
* Central Tendency
* Distribution
* Dispersion and Dependence
* Confidence Intervals


#### Outcomes
---
After completing this notebook, you will be able to:

* Set up hypotheses to address a research question
* Conduct 1-sample and 2-sample $t$-tests to address these questions in the context of population means
* Use the critical value and $p$-value approaches to determine whether or not to reject a null hypothesis
* Interpret type I and type II errors in order to explore how sample and population statistics relate

#### Introduction
---
In the previous notebook, we covered a basic estimation tool in statistics: confidence intervals. In this notebook, we will build on this knowledge and learn about an important inference technique, perhaps one of the most important concepts in elementary statistics: **hypothesis testing**. Hypothesis testing allows us to use sample data to test claims we have about a population. We first state a hypothesis about some phenomenon (e.g. the value of a population mean, or the relationship between two variables in our dataset). We then conduct a test to determine whether the sample data gives us credible reason to reject this initial hypothesis. This is only a cursory overview of hypothesis testing. We will dive into the concept in much more detail throughout this notebook; along the way, we will rely on helpful built-in functions in R to make this process more convenient. As you go through this notebook, pay careful attention not just to the mechanics but also to the logic of hypothesis testing. This is perhaps the single most important concept in elementary econometrics, so a careful understanding of this material will serve you well in future courses and beyond. Let's get started.

```r
# loading tests
source("testing_hypothesis_testing.r")

# importing packages
library(tidyverse)
library(haven)

# reading in the data and dropping observations with missing wages
census_data <- read_dta("01_census2016.dta")
census_data <- filter(census_data, !is.na(wages))
```

## The Hypothesis Testing Procedure
---

A hypothesis test always involves two hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis (${H_0}$) expresses a default claim which is to be tested, while the alternative hypothesis (${H_1}$) expresses the contrary of the null hypothesis. Typically, our alternative hypothesis expresses what we may hope to show about our data. For example, suppose a common claim is that the mean wage of Canadian men is \$50,000 per year. As a default claim, we assert the null hypothesis that the mean wage of Canadian men equals 50,000. Our alternative hypothesis then states that the mean wage of Canadian men is not 50,000. If we find sufficiently strong evidence in the data against the null hypothesis, we reject it in favour of the alternative hypothesis. If we don't find such strong evidence, we **fail to reject** the null hypothesis; as we will see later, this is not the same as proving that the mean wage of Canadian men is 50,000.

To determine whether to reject this null hypothesis in favour of the alternative hypothesis, we need two figures: a **significance level** (denoted by $\alpha$) and a **test statistic**. The test statistic is a number we calculate from our data, which is a function of various features of that data such as its mean, standard deviation, and sample size. The significance level, on the other hand, is a probability that we choose at the outset of the hypothesis testing process. It is the probability of rejecting the null hypothesis when the null hypothesis is in fact true. Equivalently, it sets how unlikely our sample statistic (such as the sample mean wage of Canadian men) must be, under the assumption that the null hypothesis is true, before we are willing to reject the null. Together, these two figures provide the criterion under which we reject or fail to reject our null hypothesis.

There are two common approaches we can use when testing a hypothesis: the **critical value approach (rejection region)** and **$p$-value approach**. Let's now concretely explore the steps in both approaches. Steps 1-3 apply identically to both the critical value and $p$-value approaches, and must always be followed. Step 4, the interpretation step, diverges between the two approaches.

We will start with one important type of test: the **one sample $t$-test**. This kind of test is used to evaluate statements about whether a population mean is equal to a particular value, for instance, our example above with average wages being equal to \$50,000. This test is appropriate in situations where:

1.  The sample mean is (at least approximately) normally distributed: either the population itself is normally distributed, or the sample is large enough (a common rule of thumb is $n > 30$) to invoke the Central Limit Theorem, and
2.  We don't know the standard deviation of the population of interest. 

This is very similar to the situation in the previous notebook where we constructed confidence intervals for a sample mean without knowing the population standard deviation. While common, this is just one of many possible hypothesis tests; we will use it as our running example for brevity's sake as we go through the general hypothesis testing procedure.

> **Tip**: Wikipedia actually has quite a useful article containing a chart of [Common Hypothesis Tests](https://en.wikipedia.org/wiki/Test_statistic) for different kinds of statistics.

#### Step 1: State the Null Hypothesis and Alternative Hypothesis

The null hypothesis that a population mean wage $\mu^{wage}$ equals a certain value $\mu_{0}$ is:

$$ {H_0}: \mu^{wage} = \mu_{0}$$

At this point, we have 3 choices for how to formulate our alternative hypothesis:

1. **Two-Sided Test**: If we want the rejection of the null hypothesis to allow us to argue that $\mu^{wage}$ is different from the specific value $\mu_{0}$, then we can express our alternative hypothesis as:

$$ {H_1}: \mu^{wage} \neq \mu_{0}$$

2. **One-Sided Test (Left-Tailed)**: If we want the rejection of the null hypothesis to allow us to argue that $\mu^{wage}$ is less than the specific value $\mu_{0}$, then we can express our alternative hypothesis as:

$$ {H_1}: \mu^{wage} < \mu_{0}$$

3. **One-Sided Test (Right-Tailed)**: If we want the rejection of the null hypothesis to allow us to argue that $\mu^{wage}$ is greater than the specific value $\mu_{0}$, then we can express our alternative hypothesis as:

$$ {H_1}: \mu^{wage} > \mu_{0}$$

In all of these hypotheses, and for hypotheses in general, it is important to remember that we construct our hypotheses about *population parameters*, not sample statistics. This is because the purpose of the hypothesis testing procedure is for us to make inferences about a population. When we collect a sample, we can immediately calculate its sample mean, variance, or other features, so there is no point in constructing hypotheses about these sample statistics! Further, while the example above considers hypotheses about a population mean, we can make hypotheses about a population variance, proportion, or various other parameters of interest. The population mean is simply our focus here because we are concentrating on the one sample t-test for now.

#### Step 2: Choose a Significance Level $\alpha$
Before calculating any test statistics, we must choose a significance level. As a reminder, this is the probability of rejecting the null hypothesis when it is actually true. We most commonly set our significance level at 0.05, or 5%. You will understand the importance of this number more when we reach the interpretation stage, but it is important to recognize that this value must be selected before running our test. One important note: remember from the previous notebook on confidence intervals that the confidence level is denoted as $1 - \alpha$? Here, $\alpha$ itself is the significance level, meaning that the confidence level and significance level add up to 1. It is important not to use these two terms interchangeably: they are distinct quantities.
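The link between the two levels can be sketched in R. On simulated wages (invented numbers, not the census data), the $(1 - \alpha)$ confidence interval reported by `t.test()` excludes the null value $\mu_0$ exactly when the two-sided p-value (introduced in Step 4) is below $\alpha$. So a 95% confidence level and a 5% significance level describe the same cutoff.

```r
# Simulated wages for illustration only (not the census data)
set.seed(1)
alpha <- 0.05
wages_sim <- rnorm(n = 100, mean = 55000, sd = 20000)
mu0 <- 50000

# The same alpha drives both the test and the (1 - alpha) confidence interval
res <- t.test(wages_sim, mu = mu0, conf.level = 1 - alpha)

ci <- res$conf.int
outside_ci <- mu0 < ci[1] || mu0 > ci[2]   # does the interval exclude mu0?
rejects <- res$p.value < alpha             # does the test reject H0?

c(ci_lower = ci[1], ci_upper = ci[2], p_value = res$p.value)
# For the one sample t-test these two decisions always agree
outside_ci == rejects
```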

#### Step 3: Compute the Test Statistic
This is the most mathematical step, and truly the only one requiring any calculation in the procedure. Our test statistic gives us a numeric benchmark against which we decide whether to reject our null hypothesis in Step 4. Calculating the test statistic is quite quick, but it is important to understand the intuition behind it and how it is derived. Any time we calculate a test statistic, we use the same approach: we take our sample statistic, subtract the mean of its sampling distribution under the null hypothesis, then divide the result by the standard deviation of that sampling distribution (its standard error). The details will differ slightly depending on the situation we find ourselves in. The type of parameter we are making inferences about, as well as our sample size and the shape of our population distribution, all play important roles in determining exactly how we calculate the mean and standard deviation of the relevant sampling distribution; however, the general process outlined above always holds. As noted, we will look below at calculating the test statistic for one case: the one sample t-test.
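In symbols, this general recipe can be written as follows. Here $\hat{\theta}$ denotes the sample statistic, $\theta_{0}$ the value of the corresponding population parameter under the null hypothesis, and $\operatorname{SE}(\hat{\theta})$ the standard deviation of the sampling distribution of $\hat{\theta}$. This general notation is ours, introduced only for illustration:

$$\text{test statistic} = \frac{\hat{\theta} - \theta_{0}}{\operatorname{SE}(\hat{\theta})}$$

For a population mean with unknown population standard deviation, $\hat{\theta} = \bar x$, $\theta_{0} = \mu_{0}$, and the standard error is estimated by $s / \sqrt{n}$.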

If we don't know the standard deviation of our population but do know that the distribution of our sample statistic is normal (since either the population distribution is normal or the sample size is greater than 30), we calculate our test statistic using the following **one sample t-test** formula:

$$t = \frac{\bar x - \mu_{0}}{s / \sqrt{n}}$$

where $\bar x$ is the sample mean we have found, $\mu_{0}$ is the population mean we are assuming to be true under the null hypothesis $H_{0} : \mu = \mu_{0}$, $s$ is the sample standard deviation, and $n$ is the sample size. Here we simply use the sample standard deviation in place of the unknown population standard deviation to estimate the standard deviation of the distribution of sample means. Because of this substitution, under the null hypothesis the statistic follows a Student's t-distribution with $n - 1$ degrees of freedom rather than the standard normal distribution. This mirrors how we used the t-distribution in place of the z-distribution when calculating confidence intervals in the previous notebook.
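The formula can be checked against R's built-in `t.test()`. This minimal sketch uses a simulated sample (the mean, standard deviation, and $\mu_0$ are made up for illustration) and computes the statistic and its degrees of freedom by hand.

```r
# Simulated sample for illustration (hypothetical numbers, not census data)
set.seed(42)
x_sim <- rnorm(n = 50, mean = 52000, sd = 15000)
mu0 <- 50000

# One sample t-statistic: (x_bar - mu0) / (s / sqrt(n))
n <- length(x_sim)
t_manual <- (mean(x_sim) - mu0) / (sd(x_sim) / sqrt(n))

# Under H0 the statistic follows a t-distribution with n - 1 degrees of freedom
dof <- n - 1

# Compare with R's built-in test
res <- t.test(x_sim, mu = mu0)
c(manual = t_manual, t.test = unname(res$statistic))
c(dof_manual = dof, dof_t.test = unname(res$parameter))
```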

Again, the test statistic can be calculated using many different formulas, depending on the shape of our population distribution, what we know about it, the size of our sample, and whether our hypothesis is about a population mean, proportion, or variance. However, in the case we focus on here, a hypothesis about the mean of a population whose standard deviation we don't know, the above formula is the one to use.

#### Step 4: Interpret the Results
This is the final step of our hypothesis test. It allows us to decide whether to reject or fail to reject the null hypothesis we have set up. Importantly, this is the only step in the hypothesis testing process where we can choose between two methods that arrive at the same answer: the critical value approach or the p-value approach. Let's look at each of them individually.

##### The Critical Value (or Rejection Region) Approach:

In this approach, we compare our calculated test statistic to a **critical value** (or values) corresponding to our chosen significance level. The critical value serves as the cutoff point beyond which we reject our null hypothesis. This means that if our calculated test statistic is more extreme than the critical value (situated further out in the tail of the relevant distribution), we reject our null hypothesis. If the test statistic is instead within these bounds, we fail to reject our null hypothesis.

How are critical values computed? Depending on our test, we choose the critical value so that, if the null hypothesis is true, the test statistic has probability $\alpha$ of being more extreme than it. The set of values beyond the critical value is called the **rejection region**, and its location depends on the form of the alternative hypothesis. The diagrams below should help to make the process more clear.


<img src="media/rejection 2.png" width = 500, height = 200>

The first diagram shows a one-sided, left-tailed test: we can reject a null hypothesis such as $H_{0}:\mu = \mu_{0}$ in favour of the alternative hypothesis $H_{1}:\mu < \mu_{0}$. The critical value is the value that the test statistic has probability $\alpha$ of falling below when the null is true, and the red region to its left is the rejection region.

<img src="media/rejection 3.png" width = 500, height = 200>

This second diagram shows a one-sided, right-tailed test: we can reject a null hypothesis such as $H_{0}:\mu = \mu_{0}$ in favour of the alternative hypothesis $H_{1}:\mu > \mu_{0}$. In either case, the red region is our rejection region. If our calculated test statistic falls in this region, it is more extreme than the critical value corresponding to our chosen significance level. This signals to us that we can reject the null hypothesis in favour of the alternative hypothesis. If our calculated test statistic falls within the white region, we fail to reject our null.

<img src="media/rejection region.png" width = 500, height = 200>

The above diagram, on the other hand, shows how to use the critical value approach for a two-sided test. There are now two red regions, reflecting the fact that our alternative hypothesis is now $H_{1}: \mu \neq \mu_{0}$. Again, if our calculated test statistic falls within either of these rejection regions, we reject our null hypothesis; if it falls within the white region, we fail to reject our null. Notice that in all three cases, the total red area equals $\alpha$, our chosen significance level. For the two-sided test, however, this area is split between the two tails, so each tail holds only $\alpha / 2$. As a result, each critical value sits further from the centre than the critical value of the corresponding one-sided test, and a test statistic in a given direction must be more extreme to lead to rejection under a two-sided test than under a one-sided test in that same direction. In this sense, the two-sided test is more conservative: it makes it harder to reject the null on the strength of a deviation in a single direction. Thus, if we want to set up a null hypothesis against some alternative we are hoping to support, a two-sided test is often the safer default, since it does not let us choose the direction of the test to suit the data.
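The critical values for all three set-ups can be computed with R's `qt()` function, the quantile function of the t-distribution. This sketch assumes $\alpha = 0.05$ and 99 degrees of freedom (a hypothetical sample of 100). It also shows that an illustrative test statistic of 1.8 would lead to rejection in a right-tailed test but not in a two-sided test.

```r
alpha <- 0.05
dof <- 99  # e.g. a sample of n = 100 observations, so n - 1 = 99

# Left-tailed test (H1: mu < mu0): reject if t is below this value
crit_left <- qt(alpha, df = dof)
# Right-tailed test (H1: mu > mu0): reject if t is above this value
crit_right <- qt(1 - alpha, df = dof)
# Two-sided test (H1: mu != mu0): alpha / 2 in each tail
crit_two <- qt(c(alpha / 2, 1 - alpha / 2), df = dof)

round(c(left = crit_left, right = crit_right), 3)
round(crit_two, 3)

# A hypothetical test statistic of 1.8 clears the one-sided cutoff
# but not the (further out) two-sided cutoff
t_obs <- 1.8
reject_one_sided <- t_obs > crit_right
reject_two_sided <- abs(t_obs) > crit_two[2]
c(one_sided = reject_one_sided, two_sided = reject_two_sided)
```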

##### The p-value Approach:

In this approach, we again use our test statistic to make inferences about our population, but instead of comparing it to a critical value, we calculate what is called a **p-value**. This is the probability of seeing a test statistic at least as extreme as the one we have calculated, under the assumption that our null hypothesis is actually true. For example, if our calculated test statistic is 2 in a one sample t-test with a right-tailed alternative, the p-value is $P(t_{n - 1} \geq 2)$, where $t_{n-1}$ denotes a Student's t-distributed random variable with $n - 1$ degrees of freedom; for a two-sided alternative, we count both tails, so the p-value is $P(|t_{n - 1}| \geq 2) = 2 \, P(t_{n - 1} \geq 2)$. Note that the p-value does not depend on $\alpha$. A small p-value means that a test statistic as extreme as ours would be unlikely if the null hypothesis were true, so the smaller (closer to 0) the p-value, the stronger the evidence against the null hypothesis. If the p-value is less than or equal to our significance level $\alpha$, we reject the null hypothesis. If it is larger than $\alpha$, it is not especially unlikely to find a test statistic like ours when the null is true, so we fail to reject the null hypothesis.
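These p-value calculations can be sketched with R's `pt()` function, the cumulative distribution function of the t-distribution. This is a minimal illustration assuming a test statistic of 2 and 99 degrees of freedom (i.e. a hypothetical sample of 100).

```r
t_obs <- 2     # calculated test statistic
dof <- 99      # assuming n = 100, so n - 1 = 99 degrees of freedom (hypothetical)

# Right-tailed (H1: mu > mu0): P(T >= t_obs)
p_right <- pt(t_obs, df = dof, lower.tail = FALSE)
# Left-tailed (H1: mu < mu0): P(T <= t_obs)
p_left <- pt(t_obs, df = dof)
# Two-sided (H1: mu != mu0): both tails, P(|T| >= |t_obs|)
p_two <- 2 * pt(-abs(t_obs), df = dof)

round(c(right = p_right, left = p_left, two_sided = p_two), 4)
```

The two-sided p-value is simply twice the one-tailed one, because the t-distribution is symmetric around 0.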

Underpinning both of these approaches is the assumption that our null hypothesis is true. For example, suppose our null hypothesis is that Canadian men have a mean income of 50,000, which can be expressed as $H_{0}: \mu = 50000$, and that our alternative hypothesis is $H_{1}: \mu \neq 50000$. Imagine we've chosen a significance level of $\alpha = 0.05$ (5%) and, using a sample of 100 men, calculate our test statistic to be 2.5 based on the sample mean, sample standard deviation, and sample size. With 99 degrees of freedom, our critical values are about -1.98 and 1.98 (remember we have two critical values for a two-sided test). We notice that our calculated test statistic is more extreme than our upper critical value (2.5 > 1.98). This tells us that, under the assumption that our null hypothesis is true, it would be quite unlikely to see a test statistic as extreme as the one we calculated. For good measure, we further calculate our p-value to be about 0.014, meaning there is roughly a 1.4% chance of seeing a test statistic at least this extreme (in either direction) if the null is true. Under both approaches, it becomes clear that our data are hard to reconcile with our original assumption that the null hypothesis is true. We thus have evidence, at the 5% significance level, to reject the null hypothesis in favour of the alternative hypothesis. Notice we never say that the null hypothesis is "wrong" or "right". Since we only have information about our sample and not our population, we can never say the null hypothesis is correct or incorrect. Instead, our hypothesis test controls how often we make mistakes: a test at the 5% significance level rejects a true null hypothesis only 5% of the time. Let's now run through the hypothesis testing procedure more quickly with a few examples.
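To see that the two approaches always lead to the same decision, here is a small sketch. `decide_t_test()` is a hypothetical helper written for this illustration (it is not part of R or any package); it applies both decision rules to a t-statistic. The inputs (a two-sided test with $t = 2.5$ and 99 degrees of freedom) are illustrative values.

```r
# Hypothetical helper (not part of any package): applies both decision rules
decide_t_test <- function(t_obs, dof, alpha = 0.05,
                          alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  if (alternative == "two.sided") {
    crit <- qt(1 - alpha / 2, df = dof)          # upper critical value
    reject_crit <- abs(t_obs) > crit
    p <- 2 * pt(-abs(t_obs), df = dof)
  } else if (alternative == "less") {
    crit <- qt(alpha, df = dof)                  # lower critical value
    reject_crit <- t_obs < crit
    p <- pt(t_obs, df = dof)
  } else {
    crit <- qt(1 - alpha, df = dof)              # upper critical value
    reject_crit <- t_obs > crit
    p <- pt(t_obs, df = dof, lower.tail = FALSE)
  }
  list(critical_value = crit, p_value = p,
       reject_by_critical = reject_crit, reject_by_p = p <= alpha)
}

# Illustrative two-sided test: t = 2.5 with 99 degrees of freedom (n = 100)
result <- decide_t_test(t_obs = 2.5, dof = 99)
str(result)
```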

## Applications of the Procedure
---

#### Example 1: One Sample T-Test

Let's work with our census data. Keep in mind that our census data is only a sample of the entire Canadian population. However, for the purpose of this example, let's suppose that our census data represents the entire Canadian population. We will then randomly draw a subset of observations from our census data to serve as our sample, and use it to test a claim about the average wage of the population as a whole. Suppose that, based on the literature on labour market earnings, we adopt the null hypothesis that the mean wage of Canadians is \$54,000 per year. We will set this up against a two-sided alternative, since this is a more stringent alternative hypothesis and requires stronger evidence in order to reject the null (as explained above).

$$H_{0}: \mu = 54000$$
$$H_{1}: \mu \neq 54000$$

We will choose our significance level to be 5% since this is the standard.

$$ \alpha = 0.05$$

We will now define our population and sample based on the specifications above.

```r
set.seed(123)

# Creating our sample of n = 100 wages drawn from the "population"
n <- 100
x <- sample(x = na.omit(census_data$wages), size = n)
```

For fun, let's look at the sample statistic we have found for the sample mean.

```r
mean(x)
```

The sample mean we have found is about 57600. This is somewhat above our null hypothesis value of 54000, but we will have to conduct a t-test to determine whether this difference is large enough for us to reject the null hypothesis, or whether we could plausibly have drawn a sample mean this far from 54000 by pure chance.

```r
# conducting our one sample t-test
t.test(x = x, mu = 54000, alternative = "two.sided", conf.level = 0.95)
```

The `t.test()` function in R is helpful because it reports both the test statistic and the p-value, so we can decide immediately whether to reject or fail to reject the null hypothesis. Our p-value is about 0.47, which is much larger than our significance level of 0.05. This means that, assuming the null is true, there is about a 47% chance of seeing a sample mean at least as far from 54000 as 57600. It is thus not at all surprising to draw such a sample mean when the null value of 54000 is in fact true, so we fail to reject the null hypothesis. To confirm this, we can also use the critical value approach. Let's use the `qt()` function below to do so.

```r
# finding the lower and upper critical values (two-sided, alpha = 0.05, n - 1 = 99 df)
qt(p = 0.025, df = 99, lower.tail = TRUE)
qt(p = 0.025, df = 99, lower.tail = FALSE)
```

We can see from the above that our test statistic of about 0.72 lies well within the lower and upper critical values of about -1.98 and 1.98. Thus, again, the sample mean of 57600 we have actually found is quite plausible if the null hypothesis is true, and we do not have strong evidence to reject the null hypothesis. We say that our finding of a sample mean of 57600 is not **statistically significant**. Statistically significant results are those which lead us to reject the null hypothesis. In this case, our large p-value and non-extreme test statistic prevent us from rejecting the null. If we had received very different results (say, a p-value below 0.05, or equivalently a test statistic larger in magnitude than 1.98), we would say that finding such a sample mean would be very unlikely if the null hypothesis were true. That sample mean would then be a statistically significant result, allowing us to reject the null in favour of the alternative hypothesis.

One final point about the p-value. In this case, the p-value was about 0.47. This does not mean that the probability that the null hypothesis is true is 47%. It instead means that, conditioning on the null hypothesis being true, the probability of seeing a sample mean at least as far away from 54000 as 57600 is 47%. This is why we cannot reject the null and did not obtain statistically significant results: it is quite plausible to draw a sample mean this far from 54000 by chance.
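This conditional interpretation of the p-value can be checked by simulation. The sketch below assumes, purely for illustration, a normal population whose mean equals the null value of 54000; its standard deviation of 50000 is an arbitrary choice that does not affect the distribution of the t-statistic. It draws many samples of size 100 and records how often the t-statistic is at least as extreme as the 0.72 reported above.

```r
set.seed(2024)
mu0 <- 54000       # value of the population mean under H0
n <- 100
t_obs <- 0.72      # test statistic reported in Example 1
n_sims <- 10000

# Draw many samples from a hypothetical normal population in which H0 is true
t_sims <- replicate(n_sims, {
  s <- rnorm(n, mean = mu0, sd = 50000)
  (mean(s) - mu0) / (sd(s) / sqrt(n))
})

# Share of null samples whose t-statistic is at least as extreme as t_obs
p_sim <- mean(abs(t_sims) >= abs(t_obs))
p_theory <- 2 * pt(-abs(t_obs), df = n - 1)
round(c(simulated = p_sim, theoretical = p_theory), 3)
```

The simulated share lands close to the theoretical two-sided p-value, which is exactly what "the probability of a result at least this extreme, given that the null is true" means.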

#### Example 2: Two Sample T-Test
Next, let us look at the **two-sample t-test**. Unlike the one-sample t-test, where we use a sample mean to test a hypothesis about a population mean, the two-sample t-test uses two sample means to test a hypothesis about the difference between two population means. More specifically, in this case, the two sample t-test will test whether the means of two independent populations differ from each other. We will use the two sample unpooled t-test with unequal variances (also known as Welch's t-test). It is used to test hypotheses about the difference between population means when both populations are normally distributed (or the sum of their sample sizes exceeds 40, a common rule of thumb for invoking normality), the observations are independent between the two groups (i.e. observations are not paired across populations), and we do not assume that the two population standard deviations, which are unknown, are equal. For this example, we will test the hypothesis that there is no difference between the mean wages of Canadian men and women. We will set this up against a two-sided alternative.

$$H_{0}: \mu_{men} = \mu_{women}$$
$$H_{1}: \mu_{men} \neq \mu_{women}$$

We will again set our significance level at 5%.

$$ \alpha = 0.05$$

Again, we will assume our census data represents our population and take two random samples from it, one consisting exclusively of women and the other exclusively of men.

```r
set.seed(123)

# Creating our samples (in this dataset's coding, sex == 1 is female and sex == 2 is male)
n <- 100
samplefemale <- sample(x = filter(census_data, sex == 1)$wages, size = n)
samplemale <- sample(x = filter(census_data, sex == 2)$wages, size = n)
```

For fun, let's again look at our sample statistics.

```r
mean(samplefemale)
mean(samplemale)
```

We can already see a large difference in mean wages between men and women here. However, we will have to conduct our t-test to determine whether or not this difference is statistically significant and thus whether or not we can reject our null hypothesis.

```r
# conducting our two sample (unpooled, unequal variances) t-test
t.test(x = samplefemale, y = samplemale, conf.level = 0.95)
```

Our t-test yields a p-value of about 0.002, much smaller than our significance level of 0.05. Thus, our result is statistically significant and we reject our null hypothesis in favour of the alternative that the mean wages of men and women are not equal. In fact, with a p-value this small, we would reject the null at any significance level above about 0.2%. However, this result reveals nothing about why the difference exists and does not control for any other relevant factors. You will learn more about this in upcoming courses.
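To see what `t.test()` computes for the unpooled (Welch) test, the sketch below reproduces its test statistic and its Welch-Satterthwaite degrees of freedom by hand. The two groups are simulated with made-up means and standard deviations rather than drawn from the census data.

```r
set.seed(7)
# Simulated earnings for two independent groups (hypothetical numbers)
group_a <- rnorm(60, mean = 50000, sd = 12000)
group_b <- rnorm(80, mean = 45000, sd = 20000)

m_a <- mean(group_a)
m_b <- mean(group_b)
v_a <- var(group_a) / length(group_a)   # estimated variance of each sample mean
v_b <- var(group_b) / length(group_b)

# Unpooled (Welch) test statistic for H0: mu_a = mu_b
t_welch <- (m_a - m_b) / sqrt(v_a + v_b)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_a + v_b)^2 /
  (v_a^2 / (length(group_a) - 1) + v_b^2 / (length(group_b) - 1))

res <- t.test(group_a, group_b)   # var.equal = FALSE is the default
c(manual_t = t_welch, t.test_t = unname(res$statistic))
c(manual_df = df_welch, t.test_df = unname(res$parameter))
```

Because the two variances are estimated separately, the degrees of freedom are generally not a whole number.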

We ran this t-test on two independent populations (men and women) by passing two separate vectors to `t.test()`. Alternatively, we can use the formula interface `y ~ x`, where the variable on the left-hand side of the tilde (`~`) is the outcome we are comparing (e.g. `wages`) and the variable on the right-hand side is a grouping variable with two levels (e.g. `sex`); by default, this runs the same independent two sample test. We can also pass the arguments `paired` and `var.equal` to `t.test()`. Both are set to `FALSE` by default, but we can set `paired = TRUE` if our two samples come in matched pairs (a specific case of dependent samples, such as the same people observed twice), or `var.equal = TRUE` if we believe the variances of the two populations are equal, which gives the pooled two sample t-test.
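A minimal sketch of these options on a small simulated data frame. The `people` data frame and its values are hypothetical, not the census data. The paired example passes two vectors rather than a formula, which keeps the pairing explicit.

```r
set.seed(99)
# Hypothetical data frame: one row per person, with a group label
people <- data.frame(
  wages = c(rnorm(50, 48000, 15000), rnorm(50, 55000, 15000)),
  sex   = factor(rep(c("female", "male"), each = 50))
)

# Formula interface: outcome on the left of ~, grouping variable on the right.
# This is still an independent two sample (Welch) test.
welch_formula <- t.test(wages ~ sex, data = people)
welch_vectors <- t.test(people$wages[people$sex == "female"],
                        people$wages[people$sex == "male"])

# Pooled test, assuming equal population variances
pooled <- t.test(wages ~ sex, data = people, var.equal = TRUE)

# Paired test: the same (hypothetical) 30 people measured twice
before <- rnorm(30, 50000, 10000)
after  <- before + rnorm(30, 1500, 3000)
paired <- t.test(after, before, paired = TRUE)

c(welch = welch_formula$p.value, pooled = pooled$p.value, paired = paired$p.value)
```

A paired test is equivalent to a one sample t-test on the within-pair differences (`after - before`) against 0.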

#### Example 3: Pearson Correlation Test

Another parameter we can make hypotheses about is the correlation coefficient. We can use hypothesis testing to draw inferences about the correlation between two variables in a population from sample data. Let's do this with `wages` and `mrkinc`. Recall from the Dispersion and Dependence notebook that two variables are highly positively correlated if their correlation coefficient is close to 1, while they are highly negatively correlated if it is close to -1. Let's suppose that we have reason to believe that `wages` and `mrkinc` are correlated (hence their population correlation coefficient is not 0). To find support for this, we will set this up as an alternative hypothesis, to be supported only if we can reject the null hypothesis that there is no correlation. In this way, the data must provide enough evidence to reject our null hypothesis before we find support for this belief. Let's set up the hypotheses below.

$${H_0}: \rho = 0$$
$${H_1}: \rho \neq 0$$

where $\rho$ is the population correlation coefficient between the wages and market income of Canadians. Let's further choose our significance level to be the default 5% (95% confidence level).

$$\alpha = 0.05$$

Let's now look at our sample statistic (sample correlation coefficient) to shed some light on the number whose significance we will be testing in our hypothesis test.

```r
# finding the correlation between wages and mrkinc; use = "complete.obs" drops rows with NA entries
cor(census_data$wages, census_data$mrkinc, use = "complete.obs")
```

This correlation coefficient appears to be quite different from 0, giving us some reason to believe we will be able to reject the null hypothesis in favour of our alternative hypothesis of some relationship between `wages` and `mrkinc` (in this case, a possibly very strong positive relationship). However, there is always a small chance that we happen to have drawn a sample with a strong correlation that does not exist in the population. To guard against this kind of false positive, we will conduct a Pearson correlation test. Luckily, instead of having to calculate a test statistic and then calculate critical values or a p-value ourselves, we can use the `cor.test()` function.

```r
# Pearson correlation test; cor.test() has no `use` argument and
# automatically drops observations with a missing value in either variable
cor.test(census_data$wages, census_data$mrkinc)
```

The correlation test reports a p-value of less than 2.2e-16, which is R's way of saying the p-value is smaller than it can meaningfully display, and certainly far below $\alpha = 0.05$. Thus, we see that this correlation is statistically significant, and we reject the null hypothesis in favour of the alternative hypothesis that the true correlation coefficient is not zero.
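Behind the scenes, the Pearson test converts the sample correlation $r$ into a t-statistic, $t = r\sqrt{n-2}/\sqrt{1-r^2}$, which follows a t-distribution with $n - 2$ degrees of freedom when the population correlation is zero. The sketch below checks this against `cor.test()` on simulated variables standing in for `wages` and `mrkinc` (the numbers are invented for illustration).

```r
set.seed(3)
# Simulated pair of related variables (hypothetical, standing in for wages and mrkinc)
n <- 200
wages_sim <- rnorm(n, 50000, 15000)
mrkinc_sim <- 0.5 * wages_sim + rnorm(n, 10000, 20000)

r <- cor(wages_sim, mrkinc_sim)          # sample correlation coefficient
# Under H0 (population correlation = 0), t follows a t-distribution with n - 2 df
t_r <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_r <- 2 * pt(-abs(t_r), df = n - 2)     # two-sided p-value

res <- cor.test(wages_sim, mrkinc_sim)   # Pearson is the default method
c(manual_t = t_r, cor.test_t = unname(res$statistic))
c(manual_p = p_r, cor.test_p = res$p.value)
```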

## Type I and Type II Errors
---

One thing that is crucial to remember is that our hypothesis test may not always lead to the correct decision. While a hypothesis test provides evidence for us to reject or fail to reject a null hypothesis, it is not conclusive. That is why we never say that we "accept" the null hypothesis, instead preferring to say that we "fail to reject" the null hypothesis when no strong evidence exists against it. Similarly, we never say that we "accept" the alternative hypothesis, only that we "reject the null hypothesis in favour of the alternative hypothesis". Neither hypothesis can conclusively be proven true or false using a sample. If it could, we would already know the population parameter itself and there would be no need to construct hypotheses about it!

Because of this lack of certainty, we may occasionally make incorrect decisions about rejecting or failing to reject a null hypothesis. These errors are called **type I errors** and **type II errors**. A type I error occurs when we incorrectly reject a true null hypothesis, otherwise known as a "false positive". This can occur when we draw a sample statistic which appears very unlikely under the null hypothesis and conclude that our null hypothesis should be rejected. In reality, that sample statistic could have just been an unlikely draw under a true null hypothesis. Type II errors, on the other hand, occur when we fail to reject a false null hypothesis, otherwise known as a "false negative". This can occur when we draw a sample statistic which seems reasonable under our null hypothesis and therefore fail to reject the null. In reality, the null hypothesis is false, and our sample simply did not provide enough evidence against it. A helpful infographic illustrating these concepts is below.

<img src="media/Graphical-representation-of-type-1-and-type-2-errors.png" width = 600, height = 300>

The probability of making a type I error is denoted by $\alpha$ and is the significance level of our test. Conversely, the probability of correctly failing to reject a true null hypothesis is $1 - \alpha$. The probability of a type II error is denoted by $\beta$, while the probability of correctly rejecting a false null hypothesis is $1 - \beta$, known as the **power** of the test. A higher confidence level (corresponding to a lower significance level) reduces the chance of a false positive and increases the chance of a false negative. In other words, for a fixed sample size, the smaller the $\alpha$, the larger the $\beta$. This means there is always a tradeoff between making type I and type II errors. As well, it is important to remember that we select our significance level, and hence the probability of falsely rejecting a true null hypothesis, before we even calculate our test statistic (it is Step 2). In contrast, we cannot directly choose $\beta$, the probability of failing to reject a false null. It depends on things such as the true value of the population parameter, the variability of the data, and the sample size.
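The tradeoff between $\alpha$ and $\beta$ can be illustrated with `power.t.test()` from R's built-in stats package. The design below (a one sample, two-sided test with n = 100, a true mean 5000 above $\mu_0$, and a population standard deviation of 20000) is an illustrative assumption. The second part simulates many samples in which the null is true and checks that a 5% test rejects about 5% of the time.

```r
# Hypothetical design: n = 100, true mean 5000 above mu0, population sd 20000
alphas <- c(0.10, 0.05, 0.01)
power <- sapply(alphas, function(a)
  power.t.test(n = 100, delta = 5000, sd = 20000, sig.level = a,
               type = "one.sample", alternative = "two.sided")$power)
beta <- 1 - power   # probability of a type II error

# Smaller alpha -> larger beta (lower power), holding n fixed
round(data.frame(alpha = alphas, power = power, beta = beta), 3)

# Simulated type I error rate: H0 (mu = 50000) is true in every sample
set.seed(11)
p_null <- replicate(5000, t.test(rnorm(100, 50000, 20000), mu = 50000)$p.value)
type1 <- mean(p_null <= 0.05)   # should be close to alpha = 0.05
type1
```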

## Exercises
---

#### Exercise 1:
Suppose that you are investigating the mean years of education among all citizens in a country who are over the age of 18. Let's say you want to find support for the idea that the mean years of education among adults in this country is greater than 12, indicating that the average years of schooling accumulated per person amounts to some degree of post-secondary education.

##### Question 1:
How would you set up your null and alternative hypotheses?

<span style=color:red>Explain your reasoning here:</span>

##### Question 2:
Suppose instead that you now want to find support for the claim that the mean years of schooling completed by adults in this country is not 12, indicating that the average years of schooling completed does not correspond exactly to completing primary and secondary school and nothing more. How would you set up your null and alternative hypotheses?

<span style=color:red>Explain your reasoning here:</span>

##### Question 3:
Regardless of the situation outlined above, let's say that you choose a 5% significance level and conduct a one sample t-test (since you're testing a hypothesis about the mean of a single population for which you don't know the standard deviation). You receive a p-value of 0.02 and correctly reject your null hypothesis. Have you proved that your null hypothesis is false?

```r
answer_1 <- "x" # your answer of "yes" or "no" in place of "x" here

test_1()
```

<span style=color:red>Explain your reasoning here:</span>

##### Question 4:
Suppose your friend conducts the exact same test and receives the same results, but then concludes that the results are not statistically significant. What error has your friend made? 

```r
answer_2 <- "x" # your answer of "type 1" or "type 2" in place of "x" here

test_2()
```

<span style=color:red>Explain your reasoning here:</span>

#### Exercise 2:
Suppose you instead want to compare the mean earnings of those who have and have not graduated from high school. You determine that these are independent populations. Further, even though you don't know the population standard deviations of earnings in each group, you determine that these standard deviations are likely not the same, arguing that there is a wider spread of earnings among those who graduated high school. For these reasons, you conduct an unpooled, unequal variances two sample t-test, the type of two sample t-test we explored earlier in our applications. You then set up the following hypotheses.

$$H_{0}: \mu_{\text{graduated}} = \mu_{\text{didn't graduate}}$$
$$H_{1}: \mu_{\text{graduated}} \neq \mu_{\text{didn't graduate}}$$

You then choose a significance level of 5%, the default level. Suppose a friend instead sets up a one-sided alternative, namely that $\mu_{\text{graduated}} > \mu_{\text{didn't graduate}}$.

##### Question 1:
Assuming the null hypothesis, significance level, sample data and type of test used are identical for both you and your friend, who is more likely to receive statistically significant results?

```r
answer_3 <- "x" # your answer of "you" or "your friend" in place of "x" here

test_3()
```

<span style=color:red>Explain your reasoning here:</span>

##### Question 2:
Moving forward with your two-sided hypothesis test, you find a sample mean of 60000 for those who graduated high school and 25000 for those who didn't. Given your chosen significance level and the sampling distributions of the two sample means, you find that your test statistic is 1.5, while the critical values from the Student's t-distribution are -2 and 2. Should you reject the null hypothesis that there is no difference between the mean earnings of the two populations?

```r
answer_4 <- "x" # your answer of "yes" or "no" in place of "x" here

test_4()
```

<span style=color:red>Explain your reasoning here:</span>