# Test 52: Wilcoxon–Mann–Whitney rank sum test of two populations

## Objective

- You have 2 random samples
- Did they come from 2 populations with the same mean?

## Assumptions

- The two populations have continuous frequency distributions with the same shape and spread

## Method

- You have 2 samples with sizes $n_1, n_2$, which we'll call $x$ and $y$

- Combine them into a sample of size $n_1 + n_2 = N$, and arrange them in order of increasing values, then assign ranks to each observations

- Compute the sum of ranks for $x$ and $y$, which we denote as $r_1$ and $r_2$
    - Let's assume that $n_1 < n_2$
    - Then $R = r_1$
    - Else $R = r_2$

- Let $n = \min(n_1, n_2)$

- Next, compute $R^1 = n \cdot (N + 1) - R$

- Find the critical value from the Wilcoxon tables (Table 21); if either $R$ or $R^1$ less than the critical value, then the null hypothesis of equal means is rejected



## Proof

In [23]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns

In [41]:
# def get_test_statistic():
#     ...

critical_val = 51

def get_test_statistic():
    sample1 = np.random.normal(10,5, 9)
    sample2 = np.random.normal(10,5, 8)
    concat = np.concatenate((sample1, sample2))
    ranks = concat.argsort().argsort()

    r1 = np.sum(ranks[:len(sample1)])
    r2 = np.sum(ranks[len(sample1):])
    if len(sample1) < len(sample2):
        r = r1
    elif len(sample1) > len(sample2):
        r = r2
    elif len(sample1) == len(sample2):
        r = min(r1, r2)

    R = (min(len(sample1), len(sample1)) * (len(sample1) + len(sample2) + 1)) - r

    if (r < critical_val) or (R < critical_val):
        return True
    return False

In [46]:
test_statistic_distribution = [get_test_statistic() for _ in range(3_000)]
np.mean(test_statistic_distribution)

0.101