# Test 49: Wilcoxon inversion test (U-test)

## Objective

- I have 2 samples from 2 populations
- Do these 2 populations have the same frequency distribution?

## Assumptions

- The two frequency distributions are continuous 
- The two samples are random and independent

## Method

- Take 2 samples from 2 populations, of sizes $n_1$ and $n_2$ 

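- Merge the 2 samples and arrange them in ascending order, keeping track of which sample each value came from

- For each value from the second sample, count the values from the first sample that are smaller than it; this count is its inversion value

- For example, let $x = [1,3,5]$ and $y = [2,4,6]$

- Then the combined and sorted array is simply $[1,2,3,4,5,6]$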

- `2` has an inversion value of 1, since it is larger than 1 value from $x$
- `4` has an inversion value of 2, since it is larger than 2 values from $x$
- `6` has an inversion value of 3, since it is larger than 3 values from $x$

- In total, this gives an inversion count of $1+2+3 = 6$
- The non-inversion count is therefore $n_1 n_2 - 6 = 9 - 6 = 3$

- The test statistic $U$ is the smaller of the two counts: $U = \min(6, 3) = 3$

- Compare $U$ with the Wilcoxon inversion test critical value (Table 20); reject the hypothesis of identical distributions if $U$ is less than or equal to the critical value
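
The worked example above can be reproduced with a direct pairwise count. This is a minimal sketch, not an efficient implementation; the helper name `inversion_count` is chosen here for illustration, and the samples are assumed to contain no ties.

```python
def inversion_count(x, y):
    """Count the pairs (x_i, y_j) in which the x value is smaller than the y value."""
    return sum(1 for xi in x for yj in y if xi < yj)

x = [1, 3, 5]
y = [2, 4, 6]

inversions = inversion_count(x, y)               # 1 + 2 + 3
non_inversions = len(x) * len(y) - inversions    # n1 * n2 - inversions
u_statistic = min(inversions, non_inversions)

print(inversions, non_inversions, u_statistic)   # 6 3 3
```

The pairwise loop costs $n_1 n_2$ comparisons, which is fine for the small samples the critical value table covers.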

## Proof

In [1]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns

In [33]:
sample_size = 5
critical_value = 2  # value the statistic is compared against for n1 = n2 = 5

def get_test_statistic():
    """Simulate one test under the null hypothesis; return True if it is rejected."""
    # Both samples come from the same population, so any rejection is a false positive.
    # Integer draws can tie, which the continuity assumption excludes;
    # ties are counted as inversions here.
    x = np.random.randint(1, 20, sample_size)
    y = np.random.randint(1, 20, sample_size)
    # x = np.array([1,3,5])
    # y = np.array([2,4,6])

    xsorted = np.sort(x)
    ysorted = np.sort(y)

    # Walk both sorted samples once: for each y value, xpointer ends up equal to
    # the number of x values that are <= that y value (its inversion value).
    inversion_count = 0
    xpointer = 0
    for y_value in ysorted:
        while xpointer < len(xsorted) and xsorted[xpointer] <= y_value:
            xpointer += 1
        inversion_count += xpointer

    non_inversion_count = (len(x) * len(y)) - inversion_count
    test_statistic = min(inversion_count, non_inversion_count)
    return test_statistic <= critical_value
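
As a sanity check, the single-pass count over the sorted samples can be compared with a brute-force pairwise count. The sketch below uses only the standard library; the function names `merge_inversions` and `pairwise_inversions` are hypothetical, and ties are counted as inversions in both, which is an assumption made here so that the two counts are comparable on integer data.

```python
import random

def merge_inversions(x, y):
    """Count pairs with x_i <= y_j in one pass over the sorted samples."""
    xs, ys = sorted(x), sorted(y)
    count = 0
    i = 0  # number of x values already known to be <= the current y value
    for y_value in ys:
        while i < len(xs) and xs[i] <= y_value:
            i += 1
        count += i
    return count

def pairwise_inversions(x, y):
    """Count the same pairs by checking every (x_i, y_j) combination."""
    return sum(1 for xi in x for yj in y if xi <= yj)

random.seed(0)
mismatches = 0
for _ in range(1000):
    # Unequal sample sizes, to exercise the general case n1 != n2.
    x = [random.randint(1, 19) for _ in range(5)]
    y = [random.randint(1, 19) for _ in range(7)]
    if merge_inversions(x, y) != pairwise_inversions(x, y):
        mismatches += 1

print(mismatches)  # 0
```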


In [47]:
np.mean([get_test_statistic() for _ in range(1_000)])

0.035
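
The simulated rejection rate can be compared with the exact one. Assuming continuous distributions (so no ties), every interleaving of the two samples is equally likely when both populations are identical, and the probability that $U \le 2$ for $n_1 = n_2 = 5$ can be found by enumeration. The sketch below is an illustration under that assumption; the variable names are chosen here and are not part of the original notebook.

```python
from itertools import combinations
from math import comb

n1 = n2 = 5
critical_value = 2

rejections = 0
# Choose which positions of the sorted combined sample are occupied by x values.
for x_positions in combinations(range(n1 + n2), n1):
    x_set = set(x_positions)
    y_positions = [p for p in range(n1 + n2) if p not in x_set]
    # An inversion is an x value that precedes a y value.
    inversions = sum(1 for xp in x_positions for yp in y_positions if xp < yp)
    u = min(inversions, n1 * n2 - inversions)
    if u <= critical_value:
        rejections += 1

total = comb(n1 + n2, n1)
print(rejections, total, round(rejections / total, 4))  # 8 252 0.0317
```

The exact rate of about 3.2% is below the nominal 5% level, and the simulated value of 0.035 is close to it. The simulation draws integers, so ties may account for part of the difference in addition to sampling noise.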