# Comparing Group Means: T-tests and One-way ANOVA
https://scholarworks.iu.edu/dspace/handle/2022/19735


## 1. Introduction
### 1.1 Background of the T-test: Key Assumptions
The t-test assumes that **samples are randomly drawn from normally distributed populations with unknown population variances.** If this assumption cannot be made, you may try nonparametric methods. The variables of interest should be **random variables**, whose values change randomly. A constant such as the number of parents of a person is not a random variable. In addition, the occurrence of one measurement in a variable should be independent of the occurrence of others. In other words, the occurrence of an event does not change the probability that other events occur. This property is called **statistical independence**. Time series data are likely to be statistically dependent because they are often autocorrelated.

T-tests assume **random sampling without any selection bias**. If a researcher intentionally selects samples with preferred properties and then compares them with other samples, inferences based on this non-random sampling are neither reliable nor generalizable. In an experiment, each subject should be randomly assigned to either the control or the treated group so that the two groups do not differ systematically except for the treatment applied. When subjects can decide whether or not to participate (non-random assignment), however, the independent sample t-test may under- or over-estimate the difference between the control and treated groups. In this case of self-selection, propensity score matching and treatment effect models may produce more robust and reliable estimates of mean differences.

Another key component, closely related to random sampling, is **population normality**. If this assumption is violated, the sample mean may no longer be the best measure of central tendency and the t-test may not be valid. Figure 1 illustrates the standard normal probability distribution on the left and a bimodal distribution on the right. Even if the two distributions have the same mean and variance, we cannot say much about their mean difference.

![](http://ou8qjsj0m.bkt.clouddn.com//17-11-19/957879.jpg)

The violation of normality is more problematic in the one-tailed test than in the two-tailed one (Hildebrand et al. 2005: 329). Figure 2 shows how the violation influences statistical inferences. The red curve on the left is the standard normal probability distribution with its 1 percent one-tailed rejection region shaded in red. The blue curve is a skewed non-normal distribution with its 1 percent rejection region (critical region) shaded in blue. The test statistic, indicated by a vertical green line, falls in the rejection region of the skewed non-normal distribution but not in the red shaded area of the standard normal distribution. If the populations follow such a non-normal distribution, the one-tailed t-test based on normality mistakenly fails to reject the null hypothesis.

![](http://ou8qjsj0m.bkt.clouddn.com//17-11-19/65932100.jpg)

**Due to the Central Limit Theorem, the normality assumption is not as problematic as imagined in the real world.** The Theorem says that the distribution of a sample mean (e.g., $\bar{y_1}$ and $\bar{y_2}$) is approximately normal when its sample size is sufficiently large. **When $n_1 + n_2 \ge 30$, in practice, you do not need to worry too much about normality.**
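The Central Limit Theorem can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the exponential population, the sample size of 30, the 5,000 replications, and the helper `skewness()` are all choices made here, not part of the original text. It compares the skewness of raw draws with the skewness of sample means.

```r
# Illustrative simulation (all settings are assumptions, not from the source)
set.seed(123)

# Simple moment-based skewness; 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Raw draws from a strongly right-skewed population
raw <- rexp(5000, rate = 1)

# Means of 5000 samples, each of size 30, from the same population
means30 <- replicate(5000, mean(rexp(30, rate = 1)))

skew_raw   <- skewness(raw)      # clearly positive
skew_means <- skewness(means30)  # much closer to zero

c(raw = skew_raw, means = skew_means)
```

The sample means are far less skewed than the raw data, which is why the t-test tolerates moderate non-normality once the sample is reasonably large.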

When the sample size is small and normality is questionable, you might draw a histogram, P-P plot, or Q-Q plot, or conduct the Shapiro-Wilk W (N<=2000), Shapiro-Francia W (N<=5000), Kolmogorov-Smirnov D (N>2000), or Jarque-Bera test. If the normality assumption is violated, you might try such nonparametric methods as the Kolmogorov-Smirnov Test, Kruskal-Wallis Test, or Wilcoxon Rank-Sum Test.
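As a minimal sketch of these checks in R, the code below runs the Shapiro-Wilk test with `shapiro.test()` and the Wilcoxon Rank-Sum Test with `wilcox.test()`, both from base R's `stats` package. The data are simulated here purely for illustration; the variable names and distribution settings are assumptions.

```r
# Simulated data for illustration only (not from the source document)
set.seed(2024)
x_normal <- rnorm(40, mean = 20, sd = 4)  # roughly normal sample
x_skewed <- rexp(200, rate = 1)           # strongly right-skewed sample

# Shapiro-Wilk W test: H0 is that the data come from a normal distribution
sw_normal <- shapiro.test(x_normal)
sw_skewed <- shapiro.test(x_skewed)
sw_normal$p.value
sw_skewed$p.value  # a small p-value is evidence against normality

# A Q-Q plot gives a visual check; uncomment to draw it
# qqnorm(x_skewed); qqline(x_skewed)

# Nonparametric alternative for two independent samples
g1 <- rexp(25, rate = 1)
g2 <- rexp(25, rate = 0.5)
wrs <- wilcox.test(g1, g2)  # Wilcoxon Rank-Sum Test
wrs$p.value
```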

### 1.2 T-test and Analysis of Variance
The t-test can be conducted on one sample, paired samples, and independent samples. **The one sample t-test checks if the population mean is different from a hypothesized value (oftentimes zero).** If you have two samples that are not independent but paired, you need to compute the differences of individual matched pairs. A typical example is outcome measurements taken before and after a treatment. **The paired t-test examines if the mean of the differences (effect of treatment) is discernible from zero (no effect). Therefore, the underlying methods of the one sample t-test and the paired t-test are in fact identical.**

If two samples are taken from different populations and their elements are not paired, the independent sample t-test compares the means of the two samples. In a GPA data set of male and female students, for example, the GPA of the first male student has nothing to do with that of the first female student. When the two samples have the same population variance, the independent samples t-test uses the pooled variance when computing the standard error. Otherwise, individual variances need to be used in the computation, and the degrees of freedom should be approximated. The folded F test is used to evaluate the equality of two variances. In both cases, the null hypothesis is that the two populations have the same mean. Figure 3 illustrates these four types of t-tests and one-way ANOVA.

![](http://ou8qjsj0m.bkt.clouddn.com//17-11-19/46607077.jpg)

While the independent sample t-test is limited to comparing the means of two groups, the one-way ANOVA (Analysis of Variance) can compare more than two groups. ANOVA uses the F statistic to test if all groups have the same mean. Therefore, the t-test is considered a special case of the one-way ANOVA. When comparing the means of two groups (one degree of freedom), the t statistic is the square root of the F statistic of ANOVA ($F=t^2$). Do not confuse this F statistic with the folded F test for examining the equality of the two variances.
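The relationship $F=t^2$ can be checked numerically. The following sketch uses simulated two-group data (the group sizes, means, and variable names are assumptions made for illustration) and compares the pooled-variance t statistic from `t.test()` with the F statistic from `aov()`.

```r
# Simulated two-group data for illustration only
set.seed(1)
g <- factor(rep(c("a", "b"), times = c(12, 15)))
y <- rnorm(27, mean = ifelse(g == "a", 10, 11), sd = 1)

# Independent sample t-test with pooled variance
tt <- t.test(y ~ g, var.equal = TRUE)
t_squared <- unname(tt$statistic)^2

# One-way ANOVA on the same data
anova_table <- summary(aov(y ~ g))[[1]]
F_value <- anova_table[["F value"]][1]

# The two quantities agree up to floating-point error
all.equal(t_squared, F_value)
```

The equality holds only for the pooled-variance t-test; the unequal variance version uses a different standard error.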

These analyses do not necessarily posit any causal relationship between the left-hand and right-hand side variables. Whether data are balanced (that is, whether the numbers of observations are equal across groups) does not matter in the t-test and one-way ANOVA. Table 1 compares the independent sample t-test and one-way ANOVA.

![](http://ou8qjsj0m.bkt.clouddn.com//17-11-19/26292397.jpg)

## 2. One sample T-Test
Suppose we obtain n measurements $y_1$ through $y_n$ that were randomly selected from a normally distributed population with unknown parameters $\mu$ and $\sigma^2$. One example is the SAT scores of 100 undergraduate students who were randomly chosen.

The one sample t-test examines whether the unknown population mean $\mu$ differs from a hypothesized value c. The null hypothesis of the one sample t-test is $H_0 : \mu = c$. The t statistic is computed as follows.

$t=\frac{\bar{y}-c}{s_{\bar{y}}} \sim t(n-1)$

- y is a variable to be tested
- $\bar{y}=\frac{\sum{y_i}}{n}$ is the mean of y
- $s^2=\frac{\sum(y_i-\bar{y})^2}{n-1}$ is the variance of y
- $s_{\bar{y}}=\frac{s}{\sqrt{n}}$ is the standard error of $\bar{y}$
- n is the number of observations

The t statistic follows Student’s t probability distribution with n-1 degrees of freedom (Gosset 1908).
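The formula can be computed step by step and compared with `t.test()`. This is a minimal sketch on simulated scores; the data, the hypothesized value `c0`, and the variable names are assumptions for illustration and do not come from the source.

```r
# Simulated scores for illustration only
set.seed(42)
y  <- rnorm(100, mean = 1050, sd = 100)
c0 <- 1000                      # hypothesized population mean (assumed)

n        <- length(y)
y_bar    <- mean(y)             # sample mean
s        <- sd(y)               # sample standard deviation (n - 1 denominator)
se       <- s / sqrt(n)         # standard error of the mean
t_manual <- (y_bar - c0) / se   # t statistic with n - 1 degrees of freedom

# Two-tailed p-value from Student's t distribution
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Built-in test for comparison
res <- t.test(y, mu = c0)
c(manual = t_manual, builtin = unname(res$statistic))
```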

### 2.1 One Sample T-test in R

```r
# Read the smoking data: skip the first 33 lines of the file and read 44 rows
df <- read.table('http://www.indiana.edu/~statmath/stat/all/ttest/smoking.txt',
                 sep='', skip=33, nrows=44, header=FALSE,
                 col.names=c('state', 'cigar', 'bladder', 'lung', 'kidney', 'leukemia', 'area'))
```

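```r
head(df)
```

| state | cigar | bladder | lung | kidney | leukemia | area |
|---|---|---|---|---|---|---|
| AK | 30.34 | 3.46 | 25.88 | 4.32 | 4.9 | 3 |
| AL | 18.2 | 2.9 | 17.05 | 1.59 | 6.15 | 3 |
| AZ | 25.82 | 3.52 | 19.8 | 2.75 | 6.61 | 4 |
| AR | 18.24 | 2.99 | 15.98 | 2.02 | 6.94 | 3 |
| CA | 28.6 | 4.46 | 22.07 | 2.66 | 7.06 | 4 |
| CT | 31.1 | 5.11 | 22.83 | 3.35 | 7.2 | 1 |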


```r
# smoke is TRUE when cigar > 23.77 and FALSE otherwise
df$smoke <- df$cigar > 23.77
head(df)
```

| state | cigar | bladder | lung | kidney | leukemia | area | smoke |
|---|---|---|---|---|---|---|---|
| AK | 30.34 | 3.46 | 25.88 | 4.32 | 4.9 | 3 | True |
| AL | 18.2 | 2.9 | 17.05 | 1.59 | 6.15 | 3 | False |
| AZ | 25.82 | 3.52 | 19.8 | 2.75 | 6.61 | 4 | True |
| AR | 18.24 | 2.99 | 15.98 | 2.02 | 6.94 | 3 | False |
| CA | 28.6 | 4.46 | 22.07 | 2.66 | 7.06 | 4 | True |
| CT | 31.1 | 5.11 | 22.83 | 3.35 | 7.2 | 1 | True |


```r
# west is TRUE when area is 3 or 4 and FALSE otherwise
df$west <- df$area >= 3
head(df)
```

| state | cigar | bladder | lung | kidney | leukemia | area | smoke | west |
|---|---|---|---|---|---|---|---|---|
| AK | 30.34 | 3.46 | 25.88 | 4.32 | 4.9 | 3 | True | True |
| AL | 18.2 | 2.9 | 17.05 | 1.59 | 6.15 | 3 | False | True |
| AZ | 25.82 | 3.52 | 19.8 | 2.75 | 6.61 | 4 | True | True |
| AR | 18.24 | 2.99 | 15.98 | 2.02 | 6.94 | 3 | False | True |
| CA | 28.6 | 4.46 | 22.07 | 2.66 | 7.06 | 4 | True | True |
| CT | 31.1 | 5.11 | 22.83 | 3.35 | 7.2 | 1 | True | False |


```r
# Attach the data frame so that columns such as lung can be referenced by name
attach(df)
```

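The following command conducts the one sample t-test with the null hypothesis that the population mean is 20 at the .01 significance level (.99 confidence level). The small t statistic and large p-value (0.5892) mean that we fail to reject the null hypothesis that the population mean of the lung cancer death rate is 20 at the .01 level. Note that failing to reject does not prove that the mean is exactly 20.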

```r
t.test(lung, mu=20, conf.level=.99)
```

```
	One Sample t-test

data:  lung
t = -0.5441, df = 43, p-value = 0.5892
alternative hypothesis: true mean is not equal to 20
99 percent confidence interval:
 17.93529 21.37108
sample estimates:
mean of x 
 19.65318 
```
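The reported p-value and confidence interval can be approximately reproduced from the printed summary values alone. The sketch below uses only the rounded numbers shown in the output (mean, t, and df), so the results match the output only up to rounding; the variable names are chosen here for illustration.

```r
# Values copied from the printed t.test output (rounded)
y_bar  <- 19.65318
t_stat <- -0.5441
df_t   <- 43
mu0    <- 20

# Standard error recovered from t = (y_bar - mu0) / se
se <- (y_bar - mu0) / t_stat

# Two-tailed p-value
p_value <- 2 * pt(-abs(t_stat), df = df_t)

# 99 percent confidence interval: y_bar +/- t_(0.995, df) * se
ci <- y_bar + c(-1, 1) * qt(0.995, df = df_t) * se

round(p_value, 4)
round(ci, 3)
```

Because the hypothesized value 20 lies inside the 99 percent confidence interval, the interval and the test lead to the same conclusion.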


## 3. Paired T-test: Dependent Samples
T-tests can compare the means of two samples, which may or may not be independent. When each element of one sample is matched to its corresponding element of the other sample, the two samples are paired. The paired t-test examines the mean of individual differences of paired measurements and thus is appropriate for pre-post situations. Suppose we want to investigate the effectiveness of a new medicine on lung cancer by checking patients before and after they took the medicine.

The paired t-test is based on the pairwise differences in values of matched observations of two samples, $d_i = y_{1i} - y_{2i}$ . The difference of matched pairs is treated as a variable; the logic of the paired t-test and one sample t-test is identical.

$t_{\bar{d}}=\frac{\bar{d}-D_0}{s_{\bar{d}}} \sim t(n-1)$

- $\bar{d}=\frac{\sum d_i}{n}$
- $s_d^2=\frac{\sum(d_i-\bar{d})^2}{n-1}$
- $s_{\bar{d}}=\frac{s_d}{\sqrt{n}}$

The null hypothesis is that the population mean of individual differences of paired observations is $D_0$ (zero unless explicitly specified), $H_0 : \mu_d = D_0$. If the null hypothesis is rejected, there is evidence of a statistically significant difference (effect) between the two samples (pre and post outcomes).

### 3.1 Paired T-test in R
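Setting `paired=TRUE` in `t.test()` conducts the paired t-test in R. The result is equivalent to a one sample t-test on the pairwise differences.

```r
# pre and post are numeric vectors of matched measurements of equal length
t.test(pre, post, mu=0, paired=TRUE)
```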
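To make the equivalence with the one sample t-test concrete, the sketch below simulates hypothetical `pre` and `post` measurements (the data and effect size are assumptions for illustration) and runs both tests.

```r
# Simulated pre/post measurements for illustration only
set.seed(7)
pre  <- rnorm(15, mean = 50, sd = 10)
post <- pre - rnorm(15, mean = 2, sd = 3)  # assumed average drop of about 2

# Paired t-test on the two vectors
paired_res <- t.test(pre, post, mu = 0, paired = TRUE)

# One sample t-test on the pairwise differences d_i = pre_i - post_i
d <- pre - post
one_res <- t.test(d, mu = 0)

# Same t statistic, degrees of freedom, and p-value
c(paired = unname(paired_res$statistic), one_sample = unname(one_res$statistic))
```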

## 4. Comparing Independent Samples with Equal Variances
This section discusses the most typical form of t-test, which compares the means of two independent random samples $y_1$ and $y_2$. They are independent in the sense that they are drawn from different populations and each element of one sample is not paired with (linked to) a corresponding element of the other sample.

An example is comparing the death rate from lung cancer between heavy cigarette consuming states and light consuming states. Since each state is either a heavy or a light consumer, observations of the two groups are not linked. The typical null hypothesis of the independent sample t-test is that the mean difference of the two groups is zero, $H_0 : \mu_1 - \mu_2 = 0$.

### 4.1 F test for Equal Variances
T-tests assume that samples are randomly drawn from normally distributed populations with unknown parameters. In addition to these random sampling and normality assumptions, you should check the equal variance assumption when examining the mean difference of two independent samples. The population variances of the two groups $\sigma_1^2$ and $\sigma_2^2$ need to be equal in order to use the pooled variance. Otherwise, the t-test is not reliable due to the incorrect variance and degrees of freedom used.

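Expectation and variance of the difference in sample means (the second equality is what justifies the pooled variance):

- $E(\bar{y_1}-\bar{y_2})=\mu_1-\mu_2$
- $Var(\bar{y_1}-\bar{y_2})=\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}=\sigma^2(\frac{1}{n_1}+\frac{1}{n_2})$
    - the last equality holds when $\sigma_1^2=\sigma_2^2=\sigma^2$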

In practice, unequal variances of two independent samples are less problematic when two samples have the same number of observations (balanced data) (Hildebrand et al. 2005: 362). The problem will be critical if one sample has a larger variance and a much smaller sample size compared to the other (362).

The folded form F-test is commonly used to examine whether two populations have the same variance, $H_0 :\sigma_1^2 = \sigma_2^2$ . The F statistic is

$F=\frac{s_L^2}{s_S^2} \sim F(n_L-1,n_S-1)$

where L and S respectively indicate groups with larger and smaller sample variances.
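A minimal sketch of the folded F test follows, using simulated data (the sample sizes, standard deviations, and variable names are assumptions for illustration). It computes the folded statistic by hand and compares its two-tailed p-value with base R's `var.test()`, which reports the ratio in the order the samples are supplied rather than as larger over smaller.

```r
# Simulated samples with clearly different variances (illustration only)
set.seed(11)
y1 <- rnorm(20, mean = 0, sd = 1)
y2 <- rnorm(25, mean = 0, sd = 3)

v <- c(var(y1), var(y2))        # sample variances
n <- c(length(y1), length(y2))  # sample sizes

L <- which.max(v)               # group with the larger sample variance
S <- which.min(v)               # group with the smaller sample variance

# Folded F statistic: larger variance over smaller variance
F_folded <- v[L] / v[S]

# Two-tailed p-value: double the upper-tail probability
p_folded <- 2 * pf(F_folded, df1 = n[L] - 1, df2 = n[S] - 1, lower.tail = FALSE)

# Built-in two-sided F test of equal variances
vt <- var.test(y1, y2)
c(folded = p_folded, var_test = vt$p.value)
```

With variances this different, the two p-values coincide; a small p-value suggests that the pooled variance should not be used.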

### 4.2 Overview of the Independent Sample T-test
If the null hypothesis of equal variances is not rejected, the pooled variance $s^2_{pool}$ can be used. The pooled variance is the average of the individual sample variances weighted by their degrees of freedom, $n_1-1$ and $n_2-1$. The null hypothesis of the independent sample t-test is $H_0 :\mu_1 - \mu_2 = D_0$ and the degrees of freedom are $n_1+n_2-2=(n_1-1)+(n_2-1)$. The t statistic is computed as follows.

$t=\frac{(\bar{y_1}-\bar{y_2})-D_0}{s_{pool}\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim t(n_1+n_2-2)$

where $s^2_{pool}=\frac{\sum(y_{1i}-\bar{y_1})^2+\sum(y_{2i}-\bar{y_2})^2}{n_1+n_2-2}=\frac{(n_1-1)s^2_1+(n_2-1)s^2_2}{n_1+n_2-2}$
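The pooled-variance t statistic can be computed directly from these formulas. The sketch below uses simulated samples (the data and names are assumptions for illustration) and checks the result against `t.test()` with `var.equal=TRUE`.

```r
# Simulated independent samples with equal population variance (illustration only)
set.seed(3)
y1 <- rnorm(20, mean = 22, sd = 4)
y2 <- rnorm(24, mean = 18, sd = 4)
n1 <- length(y1)
n2 <- length(y2)
D0 <- 0  # hypothesized mean difference

# Pooled variance: sample variances weighted by their degrees of freedom
s2_pool <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)

# t statistic with n1 + n2 - 2 degrees of freedom
t_pooled <- (mean(y1) - mean(y2) - D0) / (sqrt(s2_pool) * sqrt(1 / n1 + 1 / n2))
p_pooled <- 2 * pt(-abs(t_pooled), df = n1 + n2 - 2)

# Built-in pooled-variance t-test for comparison
res_pooled <- t.test(y1, y2, mu = D0, var.equal = TRUE)
c(manual = t_pooled, builtin = unname(res_pooled$statistic))
```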

When the equal variance assumption is violated, the t-test needs to use individual variances in the approximate t and the degrees of freedom. This test may be called the unequal variance t-test (Hildebrand et al. 2005: 363). Notice that the approximation below is based both on the number of observations and variances of two independent samples. The approximate t is

$t'=\frac{\bar{y_1}-\bar{y_2}-D_0}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}} \sim t(df_{Satterthwaite})$

where 

- $df_{Satterthwaite}=\frac{(n_1-1)(n_2-1)}{(n_1-1)(1-c)^2+(n_2-1)c^2}$
- $c=\frac{s^2_1/n_1}{s^2_1/n_1+s^2_2/n_2}$ 
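The approximate t and Satterthwaite's degrees of freedom can be computed from these formulas and compared with R's default `t.test()`, which does not assume equal variances. The data below are simulated for illustration, and the variable names are assumptions.

```r
# Simulated samples with unequal variances and sizes (illustration only)
set.seed(5)
y1 <- rnorm(12, mean = 22, sd = 8)
y2 <- rnorm(30, mean = 18, sd = 2)
n1 <- length(y1)
n2 <- length(y2)
D0 <- 0  # hypothesized mean difference

a <- var(y1) / n1    # s_1^2 / n_1
b <- var(y2) / n2    # s_2^2 / n_2
c_ratio <- a / (a + b)

# Satterthwaite's approximation of the degrees of freedom
df_satt <- (n1 - 1) * (n2 - 1) /
  ((n1 - 1) * (1 - c_ratio)^2 + (n2 - 1) * c_ratio^2)

# Approximate t statistic using individual variances
t_approx <- (mean(y1) - mean(y2) - D0) / sqrt(a + b)

# Built-in unequal variance t-test (var.equal = FALSE is the default)
res_welch <- t.test(y1, y2, mu = D0)
c(df_manual = df_satt, df_builtin = unname(res_welch$parameter))
```

The approximated degrees of freedom are generally not an integer and never exceed $n_1+n_2-2$.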

### 4.3 Independent Sample T-test in R