# Test 55: The rank sum difference test for the multiple comparison of K population means

## Objective

- You have $K$ random samples (with multiple observations each)
- Do they come from $K$ populations with the same mean?
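
Stated formally, each pairwise comparison tests the hypotheses below. The symbol $\mu_i$ for the mean of population $i$ is notation assumed here for illustration, not taken from the source.

$$
H_0: \mu_i = \mu_j \quad \text{versus} \quad H_1: \mu_i \neq \mu_j, \qquad 1 \le i < j \le K
$$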

## Assumptions

- The $K$ samples must have the same size.
- The frequency distributions of the populations are assumed to be continuous.

## Method

- The $K$ samples are combined and sorted, then each observation is assigned a rank.
- The largest value is assigned rank 1.
- Compute the rank sum $R_i$ for each sample.
- The test statistic for two samples $i$ and $j$ is the rank sum difference $R_i - R_j$.
- Compare the test statistic with the critical value from the two-sided rank-sum difference test table (Table 23). Because the test is two-sided, the null hypothesis of equal means is rejected if the absolute value of the test statistic exceeds the critical value.
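
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `rank_sum_differences` and the toy data are made up for this example, and the double `argsort` ranking assumes there are no ties, which is consistent with the continuity assumption.

```python
import numpy as np

def rank_sum_differences(samples):
    """Return the rank sums and the matrix of pairwise rank sum differences.

    `samples` is a (K, n) array-like: K samples of equal size n.
    (Hypothetical helper, written for illustration only.)
    """
    samples = np.asarray(samples, dtype=float)
    k, n = samples.shape
    pooled = samples.reshape(-1)
    # Negating the data before ranking gives the largest value rank 1.
    # argsort().argsort() yields 0-based ranks, hence the +1.
    ranks = (-pooled).argsort().argsort().reshape(k, n) + 1
    rank_sums = ranks.sum(axis=1)
    # differences[i, j] holds R_i - R_j
    differences = rank_sums[:, None] - rank_sums[None, :]
    return rank_sums, differences

# Toy data: 3 samples with 3 observations each
toy = [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]
rank_sums, differences = rank_sum_differences(toy)
print(rank_sums)          # rank sum of each sample
print(differences[0, 1])  # R_0 - R_1
```

Each entry of `differences` would then be compared, in absolute value, with the tabulated critical value.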

## Proof

In [1]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns

In [8]:
K = 5
sample_size=25

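def get_test_statistic():
    # Draw K samples from the same normal population, so the null hypothesis holds
    samples = np.random.normal(10,5,K*sample_size).reshape(K, sample_size)
    # argsort().argsort() gives 0-based ranks with the smallest value ranked 0.
    # Relative to "largest = rank 1", this only flips the sign of the difference
    # (the offset cancels), and the null distribution is symmetric around 0.
    ranks = samples.reshape(-1).argsort().argsort().reshape(K, sample_size)
    rank_sums = np.sum(ranks, axis=1)
    # Rank sum difference between the first two samples
    test_statistic = rank_sums[0] - rank_sums[1]
    return test_statistic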

In [18]:
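# Simulate the null distribution of the rank sum difference
test_statistic_distribution = [get_test_statistic() for _ in range(3_000)]
# 95th percentile of the signed statistic (an upper, one-sided 5% point)
np.percentile(test_statistic_distribution, 95)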

415.0
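
The percentile above is taken over the signed difference, so it corresponds to a one-sided 5% point. A two-sided critical value can be estimated from the absolute difference instead. The sketch below is one way to do that under the same simulation settings. The function name `simulate_rank_sum_differences`, the seed, and the use of `np.random.default_rng` are choices made for this example, and it looks at a single fixed pair of samples, as the simulation above does.

```python
import numpy as np

def simulate_rank_sum_differences(k, sample_size, n_sim, rng):
    """Simulate R_0 - R_1 under the null hypothesis (all samples share one population)."""
    diffs = np.empty(n_sim)
    for s in range(n_sim):
        samples = rng.normal(10, 5, size=(k, sample_size))
        # 1-based ranks of the pooled data, with the largest value ranked 1
        ranks = (-samples.reshape(-1)).argsort().argsort().reshape(k, sample_size) + 1
        rank_sums = ranks.sum(axis=1)
        diffs[s] = rank_sums[0] - rank_sums[1]
    return diffs

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
diffs = simulate_rank_sum_differences(k=5, sample_size=25, n_sim=3_000, rng=rng)

one_sided = np.percentile(diffs, 95)          # upper 5% point of the signed difference
two_sided = np.percentile(np.abs(diffs), 95)  # 5% two-sided point, based on |R_0 - R_1|
print(one_sided, two_sided)
```

The two-sided estimate is never smaller than the one-sided one, since $|R_i - R_j| \ge R_i - R_j$ for every simulated draw.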