# Test 55: The rank sum difference test for the multiple comparison of K population means

## Objective

- You have $K$ random samples (with multiple observations each)
- Do they come from $K$ populations with the same mean?

## Assumptions

- The $K$ samples must have the same size
- The frequency distributions of the populations are assumed to be continuous

## Method

- The $K$ samples are combined and sorted, then each observation is assigned a rank.
- The smallest value is assigned rank 1.
- Compute the rank sum $R_i$ for each sample
- The test statistic for a pair of samples $i$ and $j$ is the absolute difference $|R_i - R_j|$
- The test statistic is compared with the critical value from the two-sided rank-sum difference test table (Table 23). If the test statistic exceeds the critical value, then the null hypothesis of equal means is rejected for that pair
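
In symbols, the steps above can be sketched as follows. The notation is introduced here for illustration and is not from the original text: $r_{ij}$ is the pooled rank of observation $j$ in sample $i$, $n$ is the common sample size, $N = Kn$ is the total number of observations, and $c$ is the tabulated critical value.

$$
R_i = \sum_{j=1}^{n} r_{ij}, \qquad \sum_{i=1}^{K} R_i = \frac{N(N+1)}{2}, \qquad \text{reject equality of populations } i \text{ and } j \text{ if } |R_i - R_j| > c
$$

The middle identity is a quick arithmetic check on the rank sums: with 16 pooled observations, for instance, the rank sums must add up to 136.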

## Example

- Scenario: 
    - A perfume manufacturer has four floral fragrances and wishes to compare each one against the others in a preference test. Selected perfume testers can give a perfume a score between 1 and 100. Each of the four fragrances is scored by four testers.

- Results
    | | Perfume 1 | Perfume 2 | Perfume 3 | Perfume 4 |
    | --- | --- | --- | --- | --- |
    | Tester 1 | 70 | 12 | 10 | 29 |
    | Tester 2 | 52 | 18 | 43 | 31 |
    | Tester 3 | 51 | 35 | 28 | 41 |
    | Tester 4 | 67 | 36 | 26 | 44 |

- Following the method above, replace each score with its rank among all 16 pooled scores (the lowest score, 10, gets rank 1 and the highest, 70, gets rank 16)
    | | Perfume 1 | Perfume 2 | Perfume 3 | Perfume 4 |
    | --- | --- | --- | --- | --- |
    | Tester 1 | 16 | 2 | 1 | 6 |
    | Tester 2 | 14 | 3 | 11 | 7 |
    | Tester 3 | 13 | 8 | 5 | 10 |
    | Tester 4 | 15 | 9 | 4 | 12 |

- To find out whether the fragrances differ from each other, compute the rank sum of each perfume:
    - P1 = 16+14+13+15 = 58
    - P2 = 2+3+8+9 = 22
    - P3 = 1+11+5+4 = 21
    - P4 = 6+7+10+12 = 35

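- The critical value from Table 23 is 34.6

- P1 - P2 = 36 > 34.6 and P1 - P3 = 37 > 34.6, so perfume 1 differs from perfumes 2 and 3. The remaining differences (P1 - P4 = 23, P4 - P3 = 14, P4 - P2 = 13, P2 - P3 = 1) are below 34.6, so no other pair is declared different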
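
The worked example can be checked with a short script. This is a minimal sketch, assuming `scipy.stats.rankdata` is an acceptable way to rank the pooled scores (it assigns rank 1 to the smallest value); the variable names are illustrative and the critical value is simply the 34.6 quoted above.

```python
from itertools import combinations

import numpy as np
from scipy.stats import rankdata

# Scores from the results table: rows are testers, columns are perfumes 1 to 4.
scores = np.array([
    [70, 12, 10, 29],
    [52, 18, 43, 31],
    [51, 35, 28, 41],
    [67, 36, 26, 44],
])
critical_value = 34.6  # value quoted in the text for this example

# Rank all 16 scores together; the smallest score gets rank 1.
ranks = rankdata(scores.ravel()).reshape(scores.shape)
rank_sums = ranks.sum(axis=0)  # one rank sum per perfume (column)

# Absolute rank sum difference for every pair of perfumes.
differences = {
    (i + 1, j + 1): abs(rank_sums[i] - rank_sums[j])
    for i, j in combinations(range(scores.shape[1]), 2)
}
significant_pairs = [pair for pair, d in differences.items() if d > critical_value]

print(rank_sums)
print(significant_pairs)
```

Only the pairs whose difference exceeds the critical value end up in `significant_pairs`; every other pair is left as "not shown to differ".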

## Proof

In [9]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns

- A small trick gives the rank of every element of an array:
    - Let $a$ be element $i$ of `array.argsort()` (that is, `a = array.argsort()[i]`)
    - Then, for `array` to be sorted, `array[a]` must be placed at index $i$
    - For example:
        - `array = [1,7,3,2,5,4]`
        - `array.argsort() = [0,3,2,5,4,1]`
        - Since `array.argsort()[1] = 3`, `array[3] = 2` must be placed at index `1` for the array to be sorted

    - What happens when we do `array.argsort().argsort()`?
        - Using the same example, `array.argsort() = [0,3,2,5,4,1]`
            - `array.argsort()` lists, for each sorted position, the original index of the value that belongs there
            - `array.argsort()` of size $n$ contains each integer in $[0, n-1]$ exactly once, so it is a permutation
            - Argsorting a permutation inverts it: the result lists, for each original index, the sorted position it lands in
            - For example, `array.argsort()[5] = 1` says that `array[1] = 7` belongs at index `5`, so `array.argsort().argsort()[1] = 5`
            - The full result is `array.argsort().argsort() = [0,5,2,1,4,3]`, the 0-based rank of each element of `array`

    - That is
        - `array.argsort()` answers: "Which original index belongs at each sorted position?"
        - `array.argsort().argsort()` answers: "At which sorted position (0-based rank) does each original index land?"
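
The double-argsort trick described above can be checked directly on the example array. This is a small sketch; the names `order`, `ranks` and `inverse` are illustrative, and the last step shows an equivalent way to build the same ranks by inverting the permutation explicitly.

```python
import numpy as np

array = np.array([1, 7, 3, 2, 5, 4])

order = array.argsort()  # order[i] = original index of the i-th smallest value
ranks = order.argsort()  # ranks[k] = 0-based rank of array[k]

print(order)         # [0 3 2 5 4 1]
print(ranks)         # [0 5 2 1 4 3]
print(array[order])  # [1 2 3 4 5 7]

# Equivalent construction: write position i into slot order[i],
# which inverts the permutation without a second sort.
inverse = np.empty_like(order)
inverse[order] = np.arange(order.size)
print(np.array_equal(inverse, ranks))  # True
```

Note that the ranks start at 0 rather than 1, and that tied values would get distinct ranks rather than averaged ones, which is harmless for continuous data.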

In [20]:
from itertools import combinations

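K = 5
sample_size = 5
critical_value = 63.5

def get_test_statistic():
    # Draw K samples from the same normal population, so the null hypothesis is true
    samples = np.random.normal(10, 5, K * sample_size).reshape(K, sample_size)
    # Rank the pooled observations (0-based; the constant offset cancels in the differences)
    ranks = samples.reshape(-1).argsort().argsort().reshape(K, sample_size)
    rank_sums = list(np.sum(ranks, axis=1))

    # Absolute rank sum difference |R_i - R_j| for every pair of samples
    test_statistics = np.array([
        (max(x) - min(x)) for x in combinations(rank_sums, 2)
    ])
    return test_statistics

get_test_statistic() > critical_value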

array([False, False, False, False, False, False, False, False, False,
       False])

In [45]:
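# For each simulated data set, record whether any pair exceeds the critical value
test_statistic_distribution = [any(get_test_statistic() > critical_value) for _ in range(3_000)]
# Fraction of simulations with at least one false rejection under the null hypothesis
np.mean(test_statistic_distribution)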

np.float64(0.03133333333333333)
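
The simulation above draws from NumPy's global random state, so the estimate changes from run to run. A reproducible variant might look like the sketch below. It assumes the same setup ($K = 5$, sample size 5, critical value 63.5, normal data with mean 10 and standard deviation 5); the function name `familywise_rejection_rate` and the `seed` parameter are hypothetical additions, not part of the original notebook.

```python
import numpy as np


def familywise_rejection_rate(K=5, sample_size=5, critical_value=63.5,
                              n_sims=3_000, seed=0):
    """Estimate how often at least one pair exceeds the critical value under the null."""
    rng = np.random.default_rng(seed)  # seeded generator, so reruns give the same answer
    rejections = 0
    for _ in range(n_sims):
        # All K samples come from the same normal population
        samples = rng.normal(10, 5, size=(K, sample_size))
        ranks = samples.ravel().argsort().argsort().reshape(K, sample_size)
        rank_sums = ranks.sum(axis=1)
        # Some pair exceeds the critical value exactly when the largest gap does
        rejections += int(rank_sums.max() - rank_sums.min() > critical_value)
    return rejections / n_sims


n_sims = 3_000
rate = familywise_rejection_rate(n_sims=n_sims)
std_error = (rate * (1 - rate) / n_sims) ** 0.5  # binomial standard error of the estimate
print(f"rejection rate = {rate:.4f} +/- {std_error:.4f}")
```

Checking only the largest gap (`max - min`) is equivalent to checking all pairs with `any(...)`, because the largest pairwise difference is always the one between the biggest and smallest rank sums. The standard error gives a sense of how much of the estimate is simulation noise.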