# Test 48: The signed rank test for two means (paired observations)

## Objective

- I have 2 samples, each of which is an array of values, from 2 different populations
- Every pair of values in the same array index position was obtained under similar conditions
- Is there a significant difference in the means of the 2 samples?

## Assumptions

- Observations in the two samples are taken in pairs, one from each population, with both members of a pair obtained under the same conditions
- Different pairs do not need to be taken under similar conditions
- Any pair of observations with equal values is ignored in the analysis

## Method

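- I have sample arrays $x$ and $y$ of equal length, where $x_i$ and $y_i$ form a pair

- Compute the difference within each pair, $d_i = x_i - y_i$, and discard any pair with $d_i = 0$

- Rank the differences by absolute value $|d_i|$, ignoring sign, from 1 (smallest) upwards; tied absolute values share the average of the ranks they span

- Then, where $d_i$ is negative, attach a minus sign to its rank

- Sum the positive ranks as $N_1$ and the absolute values of the negative ranks as $N_2$

- The test statistic is $T = \min(N_1, N_2)$

- Find the critical value from Table 19 (signed rank test for paired differences), using $n$ = the number of non-zero differences; a small $T$ indicates a difference, so reject the null hypothesis when $T$ falls below the critical value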
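The steps above can be sketched in Python as follows. This is a minimal sketch, not the author's code: the helper names `average_ranks` and `signed_rank_statistic` and the example arrays are hypothetical, and tied absolute differences are given average ranks.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share the average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks within each group of tied values by their mean
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def signed_rank_statistic(x, y):
    """Return (N1, N2, T) for paired samples x and y (hypothetical helper)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]                      # pairs with equal values are ignored
    ranks = average_ranks(np.abs(d))   # rank by absolute value, ignoring sign
    n1 = float(ranks[d > 0].sum())     # sum of positive ranks
    n2 = float(ranks[d < 0].sum())     # sum of |negative ranks|
    return n1, n2, min(n1, n2)


# Hypothetical example data
x = [12, 15, 9, 20, 14, 11]
y = [10, 15, 12, 16, 13, 8]
n1, n2, t = signed_rank_statistic(x, y)
print(n1, n2, t)  # 11.5 3.5 3.5
```

Here the second pair is dropped ($d = 0$), leaving $n = 5$ differences $2, -3, 4, 1, 3$; the two differences with $|d| = 3$ share rank 3.5. As a check, $N_1 + N_2 = n(n+1)/2 = 15$.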

## Proof

```python
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
population = np.random.normal(10, 5, 1_000)
true_mean = np.mean(population)
sample_size = 50
# Critical value at alpha = 0.05 for n = 50, from Table 19
critical_value_alpha5pct_n50 = 434

def get_test_statistic():
    """Run one simulated test; return True if H0 (no difference) is rejected."""
    # Both samples come from the same population, so H0 is true
    sample1 = np.random.choice(population, sample_size)
    sample2 = np.random.choice(population, sample_size)
    diffs = sample1 - sample2
    diffs = diffs[diffs != 0]  # pairs with equal values are ignored

    # Rank |d| from 1 (smallest) to n (largest); ties in |d| are not averaged here
    ranks = np.argsort(np.argsort(np.abs(diffs))) + 1

    n1 = np.sum(ranks[diffs > 0])  # sum of positive ranks
    n2 = np.sum(ranks[diffs < 0])  # sum of |negative ranks|
    test_statistic = min(n1, n2)
    # Strictly, the n = 50 critical value applies only when no zero differences were dropped
    return test_statistic < critical_value_alpha5pct_n50
```

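```python
# Rejection rate over 1,000 simulated trials under H0; ideally close to alpha = 0.05
np.mean([get_test_statistic() for _ in range(1_000)])
```

Output:

```
0.06
```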
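The hard-coded `critical_value_alpha5pct_n50 = 434` can be sanity-checked against the large-sample normal approximation to the signed rank statistic. This sketch assumes the tabulated value is two-sided at α = 0.05; `approx_critical_value` is a hypothetical helper, and the approximation does not replace the exact table at small $n$.

```python
import math
from statistics import NormalDist


def approx_critical_value(n, alpha=0.05):
    """Normal approximation to the lower critical value of T = min(N1, N2).

    Under H0, the sum of positive ranks has mean n(n+1)/4 and
    variance n(n+1)(2n+1)/24; alpha is split across both tails.
    """
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean - z * sd


print(round(approx_critical_value(50), 1))  # 434.5
```

For $n = 50$ the mean is 637.5 and the standard deviation is about 103.6, so the approximation lands at about 434.5, consistent with the table value of 434.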