# Using Pearson's $\chi^2$ Test of Independence

# 1. Pearson's $\chi^2$ (Independence) Test

Pearson's $\chi^2$ can also be used to solve a different sort of problem: testing whether two categorical variables are independent of one another.


# 1.1 Pearson's $\chi^2$ (Independence) Test Example

Suppose we would like to teach cats to dance.

We have two training systems: using food as a reward, and using affection as a reward. Suppose that, after a week of training, we test each cat's dancing ability. This gives us two categorical variables, _training_ and _dance_, each with two levels.


|Cat dances?|Food as reward|Affection as reward|
|---|---|---|
|Yes|28|48|
|No|10|114|


From these data, are the _training_ and _dance_ variables independent?

*Source: Field _et al._ (2012)

### 1.1.1 Pearson's $\chi^2$ Independence Test: Test Statistic
The test statistic is $\chi^2$, computed as

$$ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}, $$

where

$$ E_{i,j} = \frac{\text{row-total}_i \times \text{column-total}_j}{N}. $$

Here $O_{i,j}$ is the observed count in cell $(i, j)$, $E_{i,j}$ is the expected count for cell $(i, j)$ under the null hypothesis that the two variables are independent, and $N$ is the total number of observations.
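As a minimal sketch of the formula above, the expected counts and the statistic can be computed by hand in R from the observed counts in the cat-training table. The object names (`observed`, `expected`, `chi_sq`) are illustrative choices, not names used elsewhere in this document.

```r
# Observed counts from the example table (rows: dance, columns: training)
observed <- matrix(
  c(28, 48,
    10, 114),
  nrow = 2, byrow = TRUE,
  dimnames = list(dance = c("Yes", "No"),
                  training = c("Food", "Affection"))
)

N <- sum(observed)

# E[i, j] = row-total_i * column-total_j / N
expected <- outer(rowSums(observed), colSums(observed)) / N

# Pearson's chi-squared statistic, with no continuity correction applied
chi_sq <- sum((observed - expected)^2 / expected)

print(round(expected, 2))
print(round(chi_sq, 2))
```

For these data the expected counts are 14.44 and 61.56 (dancers) and 23.56 and 100.44 (non-dancers), and the uncorrected statistic is approximately 25.36.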

### 1.1.2 Pearson's $\chi^2$ Independence Test: Degrees of Freedom and Assumptions
Note:
  - Degrees of freedom: $df = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns.
  - The test assumes that observations are independent of one another.
    + E.g., in the example above, each cat could be in only one _training_ condition.
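The degrees of freedom determine which $\chi^2$ distribution the statistic is compared against. The following is a rough sketch for the 2 × 2 cat example; the variable names are illustrative, and the statistic 23.52 is simply the value that `chisq.test` reports in the output in section 2.1.

```r
# Dimensions of the contingency table in the cat example
n_rows <- 2  # dance: Yes / No
n_cols <- 2  # training: Food / Affection

# df = (r - 1)(c - 1)
df <- (n_rows - 1) * (n_cols - 1)

# Critical value at the 5% significance level: statistics above this
# value lead to rejecting the null hypothesis of independence
critical_value <- qchisq(0.95, df = df)

# Upper-tail p-value for a given statistic (here, the value from section 2.1)
statistic <- 23.52
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

print(df)
print(critical_value)
print(p_value)
```

With one degree of freedom the 5% critical value is about 3.84, so any statistic larger than that is significant at the 0.05 level.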

# 2. Pearson's $\chi^2$ Independence Test in R

In [3]:
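# Outcome: 76 cats danced, 124 did not
can_dance <- c(rep(TRUE, 76), rep(FALSE, 124))

# Training condition, in the same order as can_dance:
# dancers (28 food, 48 affection), then non-dancers (10 food, 114 affection)
training <- c(rep("food", 28), rep("affection", 48), rep("food", 10), rep("affection", 114))

# One row per cat
cats <- data.frame(can_dance, training)

head(cats)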

| |can_dance|training|
|---|---|---|
| |`<lgl>`|`<chr>`|
|1|TRUE|food|
|2|TRUE|food|
|3|TRUE|food|
|4|TRUE|food|
|5|TRUE|food|
|6|TRUE|food|


## 2.1 Running $\chi^2$ Test of Independence

In [4]:
# sanity check to make sure data are correct
xtab1 <- table(cats$can_dance, cats$training)

print(xtab1)

       
        affection food
  FALSE       114   10
  TRUE         48   28
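Beyond checking the raw counts, it can help to look at the proportion of cats that danced within each training condition. The sketch below rebuilds the data so that it runs on its own; `props` is an illustrative name, and `prop.table` with `margin = 2` divides each cell by its column total.

```r
# Rebuild the example data so this block is self-contained
can_dance <- c(rep(TRUE, 76), rep(FALSE, 124))
training <- c(rep("food", 28), rep("affection", 48),
              rep("food", 10), rep("affection", 114))
cats <- data.frame(can_dance, training)

xtab1 <- table(cats$can_dance, cats$training)

# Column proportions: share of dancers and non-dancers within each training group
props <- prop.table(xtab1, margin = 2)

print(round(props, 3))
```

Each column of `props` sums to 1, so the `TRUE` row can be read directly as the fraction of cats in each training condition that danced (28 of 38 for food, 48 of 162 for affection).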


In [5]:
test1 <- chisq.test(cats$training, cats$can_dance)

print(test1)


	Pearson's Chi-squared test with Yates' continuity correction

data:  cats$training and cats$can_dance
X-squared = 23.52, df = 1, p-value = 1.236e-06
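Note that the output above reports Yates' continuity correction, which `chisq.test` applies by default to 2 × 2 tables (`correct = TRUE`). The reported statistic therefore differs from the value given by the uncorrected formula in section 1.1.1. As a sketch, the uncorrected test can be requested with `correct = FALSE`; the name `test2` is an illustrative choice, and the data are rebuilt here only so that the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
can_dance <- c(rep(TRUE, 76), rep(FALSE, 124))
training <- c(rep("food", 28), rep("affection", 48),
              rep("food", 10), rep("affection", 114))
cats <- data.frame(can_dance, training)

# correct = FALSE turns off Yates' continuity correction
test2 <- chisq.test(cats$training, cats$can_dance, correct = FALSE)
print(test2)

# Expected counts under the null hypothesis of independence
print(test2$expected)
```

Without the correction the statistic is approximately 25.36 rather than 23.52. In both cases the p-value is far below 0.05, so we reject the null hypothesis that _training_ and _dance_ are independent.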

