# Test 39: Fisher’s exact test for consistency in a 2 × 2 table

## Objective

- You have a 2x2 table
- The columns are made up of 2 classes
- The rows are made up of 2 samples
- Is there evidence that the two samples differ in the proportion of elements falling into each class (i.e. that they are not drawn from the same population)?

## Assumptions

- The classification is dichotomous
- The elements originate from 2 sources
- The number of elements is small, so that some expected frequencies are less than 5

## Method

### Hypergeometric Distribution

- This test is derived from simple sampling without replacement (i.e. the hypergeometric distribution)

- Imagine you have a bag with 7 red balls, and 6 green balls

- I claim that I have special powers. Without looking into the bag, I am able to sense whether a ball is red or green.

- To test this claim, you ask me to draw 7 balls from this bag, trying to draw only red balls

- Of these 7, I drew 6 red balls and 1 green ball

- Let's put this result into a 2x2 table

| | Red | Green | Row Total |
| - | - | - | - |
| Sampled | 6 | 1 | 7 |
| Not Sampled | 1 | 5 | 6 | 
| Column Total | 7 | 6 | n = 13 |

- Does this confirm that I have special powers? 

- Let $k$ be the number of red balls I manage to draw

- Let's assume that I do not. The probability that I get this result by randomly picking balls is simply the probability that $k = 6$

- We can compute the probability that $k=6$ by counting the number of ways to get 6 reds and 1 green out of the total number of ways to draw 7 balls randomly
$$\begin{aligned}
    P(k=6) &= \frac{\binom{7}{6} \cdot \binom{6}{1}}{\binom{13}{7}} \\
    &= \frac{7 \cdot 6}{1716} \\
    &= 0.02448
\end{aligned}$$
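
The counting argument above can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ (for `math.comb`) and SciPy are available; the variable names (`n_red`, `n_green`, `n_drawn`) are illustrative and not part of the original notes.

```python
from math import comb

from scipy.stats import hypergeom

# Bag composition and number of draws from the example above
n_red, n_green, n_drawn = 7, 6, 7
n_total = n_red + n_green

# Direct counting: ways to pick 6 of the 7 reds and 1 of the 6 greens,
# divided by the number of ways to pick any 7 of the 13 balls
p_counting = comb(n_red, 6) * comb(n_green, 1) / comb(n_total, n_drawn)

# Same quantity from SciPy's hypergeometric PMF.
# Argument order is pmf(k, M, n, N): M = population size,
# n = number of "success" items in the population, N = number of draws
p_scipy = hypergeom.pmf(6, n_total, n_red, n_drawn)

print(f"{p_counting:.5f}")  # 0.02448
print(f"{p_scipy:.5f}")     # 0.02448
```

Both routes give the same value because the hypergeometric PMF is exactly this ratio of binomial coefficients.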

### Deriving the p-value

- Ok, so now we know the probability of such a scenario happening by chance, or the probability mass function (PMF) of this outcome

- However, we don't just want to know this; we want the p-value. That is, what is the probability of seeing something equal to or more extreme than this value?

- Why do we want the p-value? Because to conclude that something is rare, it's not enough to know what the absolute probability is. It's more important to know where it stands among all possible outcomes! (i.e. how extreme is this value)
    - For example, 2.4% may seem small, but what if that 2.4% is actually in the middle of the distribution? i.e. 50% of all possible outcomes are on the right of the 2.4%. 
    - In that case, it's obviously not an extreme value at all, though it is seemingly "rare" to see something like this

- So how do we compute the p-value? The p-value is simply the sum of the probabilities of all outcomes that are at least as extreme as $k=6$, including $k=6$ itself.

- Practically speaking, we find all outcomes whose probability is no larger than that of $k=6$ and sum them up
    - In the one-tail test, the alternative hypothesis is: is the proportion of successes (reds) larger in my sampled class compared to my unsampled class. Concretely:
    $$\begin{aligned}
        \sum p &= P(k=6) + P(k=7) \\
        &= \frac{\binom{7}{6} \cdot \binom{6}{1}}{\binom{13}{7}} + \frac{\binom{7}{7} \cdot \binom{6}{0}}{\binom{13}{7}} \\
        &= 0.02448 + 0.000583 \\
        &= 0.02506
    \end{aligned}$$

    - In the two-tail test, the alternative hypothesis is: is the proportion of successes (reds) **different** in my sampled class compared to my unsampled class 
        - Using the same logic as the one tail test, let's compute everything that is "rarer" than the $k=6$ case
        - $P(k=6) = \frac{\binom{7}{6} \cdot \binom{6}{1}}{\binom{13}{7}} = 0.0245$
        - $P(k=7) = \frac{\binom{7}{7} \cdot \binom{6}{0}}{\binom{13}{7}} = 0.000583$
        - $P(k=1) = \frac{\binom{7}{1} \cdot \binom{6}{6}}{\binom{13}{7}} = 0.00408$
        - $P(k=2) = \frac{\binom{7}{2} \cdot \binom{6}{5}}{\binom{13}{7}} = 0.0734$
        - The cases that are rarer than $P(k=6)$ are $P(k=7)$ and $P(k=1)$ ($k=0$ is impossible, since there are only 6 green balls to fill 7 draws)
        - Therefore, the p-value is $0.02448 + 0.00058 + 0.00408 = 0.02914$
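
The one-tail and two-tail sums above can be reproduced in a few lines. This is a sketch under stated assumptions: the helper `pmf` and the small relative tolerance used when comparing floating-point probabilities are illustrative choices, and the cross-check uses `scipy.stats.fisher_exact` with its `alternative` parameter.

```python
from math import comb

from scipy.stats import fisher_exact

n_red, n_green, n_drawn = 7, 6, 7
k_obs = 6  # observed number of reds among the 7 drawn balls


def pmf(k):
    """Hypergeometric probability of drawing exactly k reds."""
    return comb(n_red, k) * comb(n_green, n_drawn - k) / comb(n_red + n_green, n_drawn)


# Feasible values of k: at least 1 red is forced because only 6 greens exist
k_values = range(max(0, n_drawn - n_green), min(n_drawn, n_red) + 1)
probs = {k: pmf(k) for k in k_values}

# One-tail: outcomes with at least as many reds as observed
p_one_tail = sum(p for k, p in probs.items() if k >= k_obs)

# Two-tail: outcomes whose probability does not exceed the observed one
# (the tiny relative tolerance guards against floating-point ties)
p_two_tail = sum(p for p in probs.values() if p <= probs[k_obs] * (1 + 1e-7))

# Cross-check against SciPy on the 2x2 table from the example
table = [[6, 1], [1, 5]]
_, p_one_scipy = fisher_exact(table, alternative="greater")
_, p_two_scipy = fisher_exact(table, alternative="two-sided")

print(f"one-tail: {p_one_tail:.4f} (scipy: {p_one_scipy:.4f})")
print(f"two-tail: {p_two_tail:.4f} (scipy: {p_two_scipy:.4f})")
```

The exact values are $43/1716 \approx 0.0251$ for the one-tail test and $50/1716 \approx 0.0291$ for the two-tail test.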

### Test statistic

- In fact, the p-value computed above is exactly the test statistic for Fisher's exact test!

- In general, let's suppose we have a 2x2 table below:

| | Class 1A | Class 1B | Total |
| - | - | - | - |
| Class 2A | a | b | a + b |
| Class 2B | c | d | c + d |
| Total | a+c | b+d | n=a+b+c+d |

- The test statistic is 
$$\begin{aligned}
    \sum p &= \sum \frac{\frac{(a+c)!}{a_i! c_i!} \frac{(b+d)!}{b_i! d_i!}}{\frac{n!}{(a+b)!(c+d)!}} \\
    &= \sum \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{a_i! c_i!b_i! d_i! n!} \\
    &= \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{n!} \sum \frac{1}{a_i! c_i!b_i! d_i!} \\
\end{aligned}$$

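- The confusing portion of this notation is the subscript $i$: each $i$ indexes one possible table $(a_i, b_i, c_i, d_i)$ with the same row and column totals as the observed table, and we sum over all $i$ whose PMF is no larger than the PMF of the observed table (for the two-tail test)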

- Since the test statistic is a sum of PMFs, it is bounded between 0 and 1

- Therefore, it can be compared directly against the chosen significance level (e.g. $\alpha = 0.05$): reject the null hypothesis if $\sum p \le \alpha$. No additional table look-up is needed
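
The factorial form of the test statistic translates fairly directly into code. The sketch below is one possible implementation, not a reference one: the function names `table_probability` and `fisher_two_sided`, the `rel_tol` parameter, and the choice of $\alpha = 0.05$ are all assumptions made for illustration.

```python
from math import factorial


def table_probability(a, b, c, d):
    """PMF of one 2x2 table with fixed margins, in the factorial form above."""
    n = a + b + c + d
    numerator = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    denominator = factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(n)
    return numerator / denominator


def fisher_two_sided(a, b, c, d, rel_tol=1e-7):
    """Sum the PMFs of all tables no more probable than the observed one."""
    row1, col1, col2 = a + b, a + c, b + d
    p_observed = table_probability(a, b, c, d)
    total = 0.0
    # Enumerate every a_i that keeps all four cells non-negative;
    # the other three cells follow from the fixed row and column totals
    for a_i in range(max(0, row1 - col2), min(row1, col1) + 1):
        b_i = row1 - a_i
        c_i = col1 - a_i
        d_i = col2 - b_i
        p_i = table_probability(a_i, b_i, c_i, d_i)
        if p_i <= p_observed * (1 + rel_tol):
            total += p_i
    return total


alpha = 0.05  # assumed significance level
p_value = fisher_two_sided(6, 1, 1, 5)
print(f"p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

For the ball example this gives a p-value of about 0.0291, which is below 0.05. Exact integer factorials keep the arithmetic simple for small tables; for large counts, a log-gamma based computation would likely be preferable.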

## Proof

- Derivation from first principles above, no need for simulation

```python
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
```