# Test 46: The sign test for two medians (paired observations)

## Objective

- You have a sample with some set of values
- After some treatment is applied to the sample, you get another set of values
- Has the median changed between the two sets?

## Assumptions

- The observations in the two samples should be taken in pairs, one from each distribution.

- The two observations in each pair should be taken under the **same conditions**.
    - Different pairs do not need to be taken under similar conditions.

## Method

- You have a "before" sample $x$ and an "after" sample $y$

- Compute $y-x$ for each pair and record the sign (either positive or negative)
    - Discard all zero differences (ties)
    - Count the number of positives as $N_1$
    - Count the number of negatives as $N_2$

- Let test statistic $r = \min{(N_1, N_2)}$ be the number of times that the least frequent sign occurs

- Method 1
    
    - If $r$ is less than the critical value in Table 18 (Critical values of r for the sign test for paired observations), we reject the null hypothesis that the two population medians are equal

- Method 2
    - Alternatively, assume that, under the null hypothesis that the medians are the same, each nonzero difference is positive or negative with probability $p = 0.5$
    - With $N = N_1 + N_2$ nonzero differences, the p-value for an observed test statistic $r_0$ is simply the binomial CDF:
    $$\begin{aligned}
        P(r \le r_0) &= \sum_{i=0}^{r_0} \binom{N}{i} p^{i} (1-p)^{N-i}
    \end{aligned}$$

    - Compare this p-value with the desired $\alpha$ to decide whether $r_0$ lies in the rejection region (for a two-tailed test, compare $2P(r \le r_0)$ with $\alpha$)
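As a minimal sketch of Method 2, the steps above can be written as a small function. The name `sign_test` and the `before`/`after` values are hypothetical and chosen only for illustration; they are not part of the original notes.

```python
import numpy as np
from scipy import stats

def sign_test(x, y):
    """Sign test for paired samples x (before) and y (after).

    Returns (r, n, p_lower, p_two_sided), where r is the count of the
    least frequent sign and n is the number of nonzero differences.
    """
    diffs = np.asarray(y) - np.asarray(x)
    diffs = diffs[diffs != 0]            # discard ties
    n_pos = int(np.sum(diffs > 0))       # N_1
    n_neg = int(np.sum(diffs < 0))       # N_2
    n = n_pos + n_neg                    # N
    r = min(n_pos, n_neg)
    p_lower = float(stats.binom.cdf(r, n, 0.5))   # P(r <= r_0) under H0
    p_two_sided = min(1.0, 2 * p_lower)           # doubled, capped at 1
    return r, n, p_lower, p_two_sided

# Hypothetical paired measurements (illustrative values only)
before = [10, 12, 9, 11, 14, 13, 10, 12, 11, 15]
after = [12, 14, 9, 13, 15, 12, 13, 15, 14, 17]

r, n, p_lower, p_two_sided = sign_test(before, after)
print(r, n, p_lower, p_two_sided)   # 1 9 0.01953125 0.0390625
```

Here one pair is tied and dropped, leaving $N = 9$ with eight positive signs and one negative sign, so $r_0 = 1$ and $P(r \le 1) = (1 + 9)/2^9$.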

## Proof

In [1]:
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns

In [79]:
# 1,000 integers drawn uniformly from 1..9 (the upper bound of randint is exclusive)
population = np.random.randint(1, 10, 1_000)
true_median = np.median(population)

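def get_test_statistic(tail, alpha):
    """Run one sign test on two samples from the same population.

    Returns True if the null hypothesis of equal medians is rejected.
    """
    # Both samples come from the same population, so the null hypothesis is true
    sample1 = np.random.choice(population, 50)
    sample2 = np.random.choice(population, 50)
    diffs = sample1 - sample2
    diffs = diffs[diffs != 0]  # discard ties (zero differences)
    corrected_sample_size = len(diffs)

    n1 = np.sum(diffs < 0)
    n2 = np.sum(diffs > 0)
    r = min(n1, n2)

    ## Method 1
    # if tail == 'one-tailed':
    #     return r < 17
    # if tail == 'two-tailed':
    #     return (2*r) < 17

    ## Method 2
    # r is the smaller of the two counts, so cdf(r) is small
    # whenever either sign is in excess
    pval = scipy.stats.binom(corrected_sample_size, 0.5).cdf(r)
    if tail == 'one-tailed':
        return pval < alpha
    if tail == 'two-tailed':
        return (2 * pval) < alpha
    raise ValueError("tail must be 'one-tailed' or 'two-tailed'")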

In [81]:
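# Each entry is True when the null hypothesis was rejected in that simulation
rejections = [get_test_statistic('one-tailed', 0.05) for _ in range(3_000)]
# Fraction of rejections: the empirical Type I error rate, since the null is true
np.mean(rejections)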

0.06866666666666667
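The simulated rate sits above the nominal 0.05. One plausible reading is that, because $r$ is the smaller of the two counts, the rule `cdf(r) < alpha` rejects for an excess of either sign, so it behaves like a two-sided test run at a one-sided threshold. The sketch below computes the exact rejection rate of each rule by enumeration for a fixed number of nonzero differences. The function name `exact_rejection_rate` and the chosen values of `n` are assumptions for illustration; in the simulation above the number of nonzero differences varies from run to run.

```python
import numpy as np
from scipy import stats

def exact_rejection_rate(n, alpha=0.05, two_tailed=False):
    """Exact probability of rejecting a true null with n nonzero differences."""
    ks = np.arange(n + 1)                  # possible counts of positive signs
    pmf = stats.binom.pmf(ks, n, 0.5)      # probability of each count under H0
    r = np.minimum(ks, n - ks)             # test statistic for each count
    pval = stats.binom.cdf(r, n, 0.5)      # lower-tail probability P(R <= r)
    if two_tailed:
        pval = 2 * pval
    reject = pval < alpha
    return float(pmf[reject].sum())

# Illustrative sizes near the roughly 44 nonzero differences expected above
for n in (40, 44, 48):
    one = exact_rejection_rate(n, 0.05)
    two = exact_rejection_rate(n, 0.05, two_tailed=True)
    print(n, round(one, 4), round(two, 4))
```

Under this enumeration the doubled p-value keeps the rejection rate at or below $\alpha$, while the undoubled rule can exceed it.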