# Test 45: The sign test for a median

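- Because this test is nonparametric, it makes no assumption about the shape of the population distribution

- The test statistic is a discrete count that follows a binomial distribution, so the cutoff values in the tables can only attain certain significance levels

- Keep in mind that this means the actual significance level is usually somewhat below the nominal $\alpha$ rather than exactly equal to it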
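To make the point about the significance level concrete, the sketch below lists the two-sided levels the test can actually attain for a given number of non-tied observations. The helper name `attainable_levels` and the cap `max_level` are illustrative assumptions, not part of the original notes.

```python
from scipy.stats import binom


def attainable_levels(n, max_level=0.2):
    """Two-sided levels 2 * P(X <= c), X ~ Bin(n, 0.5), for each cutoff c.

    Hypothetical helper for illustration: stops once the level exceeds max_level.
    """
    levels = {}
    for c in range(n // 2 + 1):
        level = 2 * float(binom.cdf(c, n, 0.5))
        if level > max_level:
            break
        levels[c] = level
    return levels


levels = attainable_levels(20)
for c, level in levels.items():
    print(f"reject when T <= {c}: attained level = {level:.4f}")
```

With 20 non-tied observations, the largest attainable level that does not exceed 0.05 is about 0.041 (cutoff 5); the next cutoff already jumps to about 0.115.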

## Objective

- From a population, I have some sample of size $N$
- Is the population median equal to some specified $M_0$?

## Assumptions

- Observations in the sample are independent of each other 
- Any sample values equal to $M_0$ should be discarded 

## Method

- The sign test forms the basis for many other tests, so it's useful to understand the computation from scratch

### Sign Test from Binomial

- Let's suppose we have some population $X$ with median $M$

- I have a hypothesis that the median of $X$ is $M_0$

- I draw a sample $x$ of size $n$ (sorting it is optional; it only makes the counts easier to read off)

- Remove all values in $x$ that are equal to $M_0$

- Comparing the values in $x$ to $M_0$: 
    - Count the number of values that are smaller than $M_0$, call it $N_1$
    - Count the number of values that are larger than $M_0$, call it $N_2$
    - Let the total count be $N = N_1 + N_2$ (the sample size after removing ties)

- What's the idea here? If $M_0$ is truly the median value (i.e. the middle of the population), then 50% of all values in the population are above it and 50% are below it (ignoring values exactly equal to it)
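The counting step above can be sketched as follows. The function name `sign_counts` and the toy data are made up for illustration.

```python
import numpy as np


def sign_counts(sample, m0):
    """Return (n1, n2, n): counts below m0, above m0, and their total.

    Values equal to m0 (ties) are discarded before counting.
    """
    sample = np.asarray(sample)
    kept = sample[sample != m0]      # drop ties with the hypothesized median
    n1 = int(np.sum(kept < m0))      # values below m0
    n2 = int(np.sum(kept > m0))      # values above m0
    return n1, n2, n1 + n2


values = [3, 7, 5, 9, 5, 1, 8, 6]    # toy data, two ties at 5
n1, n2, n = sign_counts(values, m0=5)
print(n1, n2, n)
```

Here two values fall below 5, four fall above it, and the two values equal to 5 are dropped, so the effective sample size is 6.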

- Get $T = \min{(N_1, N_2)}$ as the test statistic

- Under the null hypothesis, each of the $N$ remaining values falls below $M_0$ with probability 0.5, so $N_1 \sim \text{Bin}(N, 0.5)$ (and likewise $N_2$)

- The question simply reduces to this: if every value we draw is either above or below $M_0$ with probability 50%, what is the probability of observing a count of **at most** $T$?

- This is simply the binomial CDF, with $X \sim \text{Bin}(N, p)$ and $p = 0.5$
$$\begin{aligned}
    P(X \le T) &= \sum_{i=0}^{T} \binom{N}{i} p^{i} (1-p)^{N-i} 
\end{aligned}$$
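As a quick check of the formula, the sum can be evaluated by hand and compared with SciPy's `binom.cdf`. The helper `binom_cdf_half` and the numbers $T = 2$, $N = 6$ are illustrative choices only.

```python
from math import comb

from scipy.stats import binom


def binom_cdf_half(t, n):
    """P(X <= t) for X ~ Bin(n, 0.5), summed term by term.

    With p = 0.5 every term is C(n, i) / 2**n.
    """
    return sum(comb(n, i) for i in range(t + 1)) / 2**n


t, n = 2, 6
manual = binom_cdf_half(t, n)            # (1 + 6 + 15) / 64
library = float(binom.cdf(t, n, 0.5))    # same quantity from SciPy
print(manual, library)
```

Both routes give $22/64 \approx 0.344$, so a count of at most 2 out of 6 is not unusual under the null.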

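- For one tailed test:
    - Take $T$ to be the count that the alternative predicts is small: $N_1$ if the alternative is $M > M_0$, or $N_2$ if it is $M < M_0$
    - If $P(X \le T) \le \alpha$, then we reject the null hypothesis that $M = M_0$

- For two tailed test:
    - With $T = \min{(N_1, N_2)}$, if $2 \cdot P(X \le T) \le \alpha$, then we reject the null hypothesis that $M = M_0$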
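Putting the steps together, a minimal sketch of the whole test might look like the following. The function `sign_test`, its `alternative` labels, and the sample values are assumptions made for illustration; the cross-check uses `scipy.stats.binomtest`, which assumes SciPy 1.7 or later.

```python
import numpy as np
from scipy.stats import binom, binomtest


def sign_test(sample, m0, alternative="two-sided"):
    """Return (p_value, n1, n2) for the sign test of H0: median == m0."""
    sample = np.asarray(sample)
    kept = sample[sample != m0]          # discard ties
    n1 = int(np.sum(kept < m0))
    n2 = int(np.sum(kept > m0))
    n = n1 + n2
    if n == 0:                           # every value tied: no evidence either way
        return 1.0, n1, n2
    if alternative == "two-sided":
        p = min(1.0, 2 * binom.cdf(min(n1, n2), n, 0.5))
    elif alternative == "greater":       # H1: median > m0, so few values below m0
        p = binom.cdf(n1, n, 0.5)
    elif alternative == "less":          # H1: median < m0, so few values above m0
        p = binom.cdf(n2, n, 0.5)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return float(p), n1, n2


sample = [12, 15, 11, 18, 14, 16, 17, 13, 19, 20, 10, 21]   # made-up data
p_value, n1, n2 = sign_test(sample, m0=12)
check = binomtest(min(n1, n2), n1 + n2, 0.5).pvalue          # exact binomial test
print(p_value, check)
```

For this sample one tie is dropped, leaving 2 values below and 9 above the hypothesized median, so the two-tailed p-value is $2 \cdot 67/2048 \approx 0.065$ and the null is not rejected at $\alpha = 0.05$. The one-tailed test against $M > M_0$ gives about 0.033 and would reject.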

### [Deprecated] Book Method

**Does not converge to the expected rejection rate in simulation**

- I have a sample of size $N$

- Count the number of observations below the hypothesized median $M_0$, and call this $N_1$

- Count the number of observations above the hypothesized median $M_0$, and call this $N_2$

- For a two sided test (median $\neq$ $M_0$), the test statistic is $\min{(N_1, N_2)}$
- Else for a one sided test (median $\gt \text{or} \lt M_0$), the test statistic is $N_1 \text{ or } N_2$ respectively, i.e. the count the alternative expects to be small

- Read the critical value from Table 17, with the relevant value of $N$
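When the table is not at hand, a critical value can be derived from the binomial distribution directly. The sketch below assumes the table lists the largest cutoff $c$ whose tail probability stays at or below $\alpha$; the function name is hypothetical, and the result may differ from Table 17 if the book tabulates a different quantity.

```python
from scipy.stats import binom


def sign_test_critical_value(n, alpha=0.05, two_sided=True):
    """Largest c with (2 or 1) * P(X <= c) <= alpha for X ~ Bin(n, 0.5).

    Returns None when even c = 0 exceeds alpha (n too small to ever reject).
    """
    factor = 2 if two_sided else 1
    critical = None
    for c in range(n // 2 + 1):
        if factor * binom.cdf(c, n, 0.5) <= alpha:
            critical = c
        else:
            break
    return critical


for n in (5, 8, 12, 20, 100):
    print(n, sign_test_critical_value(n))
```

Under this convention the null is rejected when the test statistic is at or below the returned value. For 20 non-tied observations the two-tailed value at $\alpha = 0.05$ is 5, and for 100 it is 39.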

## Proof

### Proof of Sign Test

In [122]:
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns

In [289]:
sample_size = 20
population = np.random.randint(0, 10, size=1_000)
true_median = np.median(population)

def get_test_statistic_binom(hypothesized_median, test_type, alpha):
    """Draw one sample and return True if the sign test rejects the null."""
    sample = np.random.choice(population, size=sample_size)
    # Discard ties with the hypothesized median
    sample = sample[sample != hypothesized_median]

    n1 = np.sum(sample < hypothesized_median)
    n2 = np.sum(sample > hypothesized_median)
    test_statistic = min(n1, n2)
    corrected_sample_size = len(sample)

    # P(X <= T) for X ~ Bin(N, 0.5)
    cdf = scipy.stats.binom(corrected_sample_size, 0.5).cdf(test_statistic)

    ## Reject if the p-value is at most alpha
    if test_type == 'one-tailed':
        # NOTE: min(n1, n2) picks the direction after seeing the data; a true
        # one-tailed test fixes the direction (n1 or n2) in advance
        return cdf <= alpha

    if test_type == 'two-tailed':
        return (2*cdf) <= alpha

    raise ValueError(f"unknown test_type: {test_type!r}")

def get_test_statistic(hypothesized_median):
    """Draw one sample and return True if the table-based sign test rejects the null."""
    sample = np.random.choice(population, size=sample_size)
    # Discard ties with the hypothesized median
    sample = sample[sample != hypothesized_median]

    n1 = np.sum(sample < hypothesized_median)
    n2 = np.sum(sample > hypothesized_median)
    test_statistic = min(n1, n2)
    corrected_sample_size = len(sample)

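    # https://openpress.usask.ca/app/uploads/sites/76/2020/11/Sign-Test-Critical-Values-Table.pdf
    # Two-tailed critical values at alpha = 0.05, keyed by the number of
    # non-tied observations: the largest c with 2 * P(X <= c) <= 0.05
    critical_value = {
        8: 0,
        9: 1,
        10: 1,
        11: 1,
        12: 2,
        13: 2,
        14: 2,
        15: 3,
        16: 3,
        17: 4,
        18: 4,
        19: 4,
        20: 5,
        21: 5,
        22: 5,
        23: 6,
        24: 6,
        25: 7,
    }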

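    ## Reject if test statistic is at most the critical value
    critical = critical_value.get(corrected_sample_size)
    if critical is None:
        raise ValueError(f"no critical value tabulated for n={corrected_sample_size}")
    return test_statistic <= critical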

In [291]:
# Each entry is a reject / do-not-reject decision, so the mean is the rejection rate
rejections = [get_test_statistic_binom(true_median, 'two-tailed', 0.05) for _ in range(3_000)]
np.mean(rejections)

0.050666666666666665

### [Deprecated] Book Method

In [4]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns

In [119]:
mu = 50
m0 = 50

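def get_test_statistic(mu):
    """Return min(n1, n2) for one sample of size 100 drawn around mu."""
    population = np.random.choice([
        mu-5, mu-4, mu-4, mu-2, mu-1, mu+1, mu+2, mu+3, mu+4, mu+5
    ], 10_000)
    # The population median is used as the hypothesized median
    # (this local m0 shadows the global m0 defined above)
    m0 = np.median(population)

    sample_size = 100
    sample = np.random.choice(population, sample_size)

    # Values equal to m0 are counted in neither n1 nor n2
    n1 = np.sum(sample < m0)
    n2 = np.sum(sample > m0)

    return min(n1, n2)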

get_test_statistic(mu)

40

In [121]:
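# Fraction of runs where the statistic exceeds 22. The sign test rejects when
# the statistic is at or below the critical value, so this is the share of
# runs that do NOT reject
test_reject_null = [get_test_statistic(50) > 22 for _ in range(3_000)]
np.mean(test_reject_null)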

0.9996666666666667
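For comparison, here is a hedged sketch of how a table-style simulation could be set up so that the rejection rate lands near the nominal level. It departs from the cell above in three ways, all of them assumptions: the critical value is derived from the binomial distribution instead of being read from Table 17, the population is continuous (normal) so ties do not occur, and the null is rejected when the statistic is at or below the critical value.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
sample_size = 100
alpha = 0.05
true_median = 50

# Largest c with 2 * P(X <= c) <= alpha for X ~ Bin(sample_size, 0.5)
critical = max(
    c for c in range(sample_size // 2 + 1)
    if 2 * binom.cdf(c, sample_size, 0.5) <= alpha
)


def rejects_null(m0):
    """Draw one sample and apply the two-tailed sign test against m0."""
    sample = rng.normal(loc=true_median, scale=5, size=sample_size)
    n1 = np.sum(sample < m0)
    n2 = np.sum(sample > m0)
    return min(n1, n2) <= critical      # reject when the statistic is small


rejection_rate = np.mean([rejects_null(true_median) for _ in range(3_000)])
print(critical, rejection_rate)
```

With these choices the critical value is 39, and the rejection rate under a true null should come out a little below 0.05, since the attained level for 100 observations is roughly 0.035.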