## References

H.B. Mann and D.R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other”, The Annals of Mathematical Statistics, Vol. 18, pp. 50-60, 1947.

Hollander, M., Wolfe, D.A. and Chicken, E. (2014) Nonparametric Statistical Methods. 3rd Edition, John Wiley & Sons, Inc., New York.

## Assumptions

We obtain $N=m+n$ observations $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$.


* The observations $X_1, \ldots, X_m$ are a random sample from population 1; that is, the $X$'s are independent and identically distributed. The observations $Y_1, \ldots, Y_n$ are a random sample from population 2; the $Y$'s are independent and identically distributed.

* The $X$'s and $Y$'s are mutually independent. Thus, in addition to assumptions of independence within each sample, we also assume independence between the two samples.

* Populations 1 and 2 are **continuous populations**.

For a test that handles discrete populations, see the Brunner-Munzel test.

## Hypothesis

### Null Hypothesis

Let $F$ be the distribution function corresponding to population 1 and let $G$ be the distribution function corresponding to population 2. Typically, group 1 is the control group and group 2 is the treatment group.

The null hypothesis $H_0$ is that the two populations are identical, that is:

$$
\begin{align*}
H_0: F(t) = G(t) \quad \text{for all } t.
\end{align*}
$$

Note that although the null hypothesis asserts that $X$ and $Y$ have the same CDF, the common CDF is not specified.

### Alternative Hypothesis

The alternative hypothesis in a two-sample location problem typically states that $Y$ is generally larger (or smaller) than $X$. One useful model for this is the **translation model**, also called the **location-shift model**:

$$
G(t) = F(t - \Delta), \quad \text{for all } t
$$

This means population 2 is identical to population 1, but shifted by an amount $\Delta$. Another way to express this is:

$$
Y \stackrel{d}{=} X + \Delta
$$

where $\stackrel{d}{=}$ indicates "has the same distribution as." The parameter $\Delta$ is the **location shift** or **treatment effect**. If $X$ is from population 1 (control) and $Y$ is from population 2 (treatment), then $\Delta$ represents the **expected effect of the treatment**. If the mean $E(X)$ of population 1 exists, and $E(Y)$ is the mean of population 2, then:

$$
\Delta = E(Y) - E(X)
$$

is the difference in population means. Under the location-shift model, the null hypothesis $H_0$ becomes:

$$
H_0: \Delta = 0
$$

which asserts that the population means are equal, implying no treatment effect.
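The location-shift model can be checked numerically. The following is a minimal sketch, assuming an exponential population and a shift of $\Delta = 2$ (both chosen only for illustration; the model itself does not require any particular family). It compares the difference in sample means with $\Delta$, and the empirical value of $G(t)$ with that of $F(t - \Delta)$. The helper `ecdf` is a hypothetical name introduced here.

```python
import random
from statistics import fmean

rng = random.Random(0)  # fixed seed so the sketch is reproducible
delta = 2.0             # assumed treatment effect (illustrative)
size = 20_000

# X ~ Exponential(1); Y has the same shape, shifted by delta
x = [rng.expovariate(1.0) for _ in range(size)]
y = [rng.expovariate(1.0) + delta for _ in range(size)]

# Under the model, E(Y) - E(X) = delta
mean_difference = fmean(y) - fmean(x)


def ecdf(sample, t):
    """Empirical CDF: fraction of the sample that is <= t."""
    return sum(value <= t for value in sample) / len(sample)


# Under the model, G(t) = F(t - delta) for every t; check one point
t = 3.0
g_at_t = ecdf(y, t)
f_at_t_minus_delta = ecdf(x, t - delta)

print(f"mean difference: {mean_difference:.3f}")
print(f"G({t}) = {g_at_t:.3f}, F({t - delta}) = {f_at_t_minus_delta:.3f}")
```

Both comparisons agree only up to sampling error, since $X$ and $Y$ are drawn independently.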

## Mann-Whitney U Statistic

The Mann-Whitney statistic is defined as:

$$
\begin{align*}
U=\sum_{i=1}^m \sum_{j=1}^n \phi\left(X_i, Y_j\right)
\end{align*}
$$

where

$$
\begin{align*}
\phi\left(X_i, Y_j\right)= \begin{cases}1 & \text { if } X_i<Y_j \\ 0 & \text { otherwise. }\end{cases}
\end{align*}
$$

The generalization of the Mann-Whitney U statistic to the case of tied observations is:

$$
\begin{align*}
U=\sum_{i=1}^m \sum_{j=1}^n \phi^*\left(X_i, Y_j\right)
\end{align*}
$$

where

$$
\begin{align*}
\phi^*\left(X_i, Y_j\right)= \begin{cases}1, & \text { if } X_i<Y_j \\ \frac{1}{2}, & \text { if } X_i=Y_j \\ 0, & \text { if } X_i>Y_j\end{cases}
\end{align*}
$$
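The double sum can be written directly as a pair of loops. This is a small sketch of the tie-aware definition, not an efficient implementation; the function name `mann_whitney_u` and the sample values are made up for illustration.

```python
def mann_whitney_u(x, y):
    """Sum phi*(x_i, y_j) over all pairs: 1 if x_i < y_j, 1/2 if tied, else 0."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


x = [1, 2, 3]
y = [2, 4]

u = mann_whitney_u(x, y)          # counts pairs with x_i < y_j
u_swapped = mann_whitney_u(y, x)  # counts pairs with y_j < x_i

print(u, u_swapped, len(x) * len(y))
```

Every pair contributes a total of 1 across the two calls, so the two statistics always sum to $mn$. Note that SciPy's `mannwhitneyu(x, y).statistic` reports the statistic for its first argument, which counts pairs with $X_i > Y_j$; under the definition above that corresponds to `u_swapped`.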

### Null Distribution

Some textbooks and some software find it more convenient to use $U^\prime$ instead of $U$ as the test statistic. The two statistics are related by

$$
\begin{align*}
U^{\prime} = mn - U
\end{align*}
$$

In the absence of ties, the possible values of $U$ and $U^{\prime}$ are $0, 1, \ldots, mn$. Furthermore, when $H_0$ is true, the mean and variance of $U$ and $U^{\prime}$ are, respectively,

$$
\begin{align*}
E_0(U) & =E_0\left(U^{\prime}\right)=\frac{mn}{2} \\
\operatorname{Var}_0(U) & =\operatorname{Var}_0\left(U^{\prime}\right)=\frac{m n(m+n+1)}{12}
\end{align*}
$$


The null distributions of $U$ and $U^{\prime}$ are symmetric about the mean $\frac{mn}{2}$.
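For small samples these properties can be verified by brute force. Under $H_0$ with no ties, every assignment of the ranks $1, \ldots, m+n$ to the two samples is equally likely, so the null distribution follows from enumerating all $\binom{m+n}{m}$ assignments. The sketch below assumes $m = 3$ and $n = 4$; the function name `exact_null_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def exact_null_distribution(m, n):
    """Return {u: P(U = u)} under H0, assuming no ties."""
    all_ranks = range(1, m + n + 1)
    counts = Counter()
    for x_ranks in combinations(all_ranks, m):
        x_set = set(x_ranks)
        y_ranks = [r for r in all_ranks if r not in x_set]
        # U counts the pairs in which the X rank is below the Y rank
        u = sum(1 for xr in x_ranks for yr in y_ranks if xr < yr)
        counts[u] += 1
    total = sum(counts.values())
    return {u: Fraction(c, total) for u, c in sorted(counts.items())}


m, n = 3, 4
dist = exact_null_distribution(m, n)

mean = sum(u * p for u, p in dist.items())
variance = sum((u - mean) ** 2 * p for u, p in dist.items())
is_symmetric = all(dist[u] == dist[m * n - u] for u in dist)

print(f"mean = {mean}, variance = {variance}, symmetric = {is_symmetric}")
```

Exact fractions are used so that the comparison with $\frac{mn}{2}$ and $\frac{mn(m+n+1)}{12}$ is not affected by floating-point rounding.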

### Alternative Hypotheses

Let $F(t)$ and $G(t)$ be the cumulative distribution functions (CDFs) of the distributions underlying $X$ and $Y$, respectively. The alternative hypotheses for the Mann-Whitney U test are defined as follows:

#### 1. **Two-Sided Alternative (Two-Sided)**:

   - $F(t) \neq G(t)$ for at least one $t$.

   - **Interpretation**: The distributions underlying $X$ and $Y$ are not equal.

#### 2. **Lower Tail Alternative (Less)**:

   - $F(t) > G(t)$ for all $t$.

   - **Interpretation**: The distribution underlying $X$ is stochastically **less than** that of $Y$. In other words, $X$ tends to take on smaller values compared to $Y$. This is because the probability that $X$ is less than or equal to any given $t$ is greater than the probability that $Y$ is less than or equal to $t$.

#### 3. **Upper Tail Alternative (Greater)**:

   - $F(t) < G(t)$ for all $t$.
   
   - **Interpretation**: The distribution underlying $X$ is stochastically greater than that of $Y$. This implies that $X$ tends to take on larger values compared to $Y$, as the probability of $X$ being less than or equal to any given $t$ is smaller than the probability of $Y$ being less than or equal to $t$.

These hypotheses describe the relationship between the CDFs. Even though the direction of the inequalities might seem counterintuitive, they correctly indicate that if $F(t) > G(t)$, samples drawn from $X$ tend to be less than those drawn from $Y$. Similarly, if $F(t) < G(t)$, samples drawn from $X$ tend to be greater than those drawn from $Y$.
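The direction of the inequality is easier to see with concrete numbers. The sketch below assumes two normal populations with the same spread, where $Y$ is shifted upward by $\Delta = 5$ (illustrative values only). It checks that $F(t) > G(t)$ on a grid of points and computes $P(X < Y)$, using the fact that $Y - X$ is normal with mean $\Delta$ and variance $2\sigma^2$ when $X$ and $Y$ are independent.

```python
from statistics import NormalDist

mu, sigma, delta = 17.0, 3.0, 5.0  # illustrative values
F = NormalDist(mu, sigma)          # distribution of X
G = NormalDist(mu + delta, sigma)  # distribution of Y, shifted upward

# Grid t = 5.0, 5.5, ..., 35.0 covering the bulk of both distributions
grid = [t / 2 for t in range(10, 71)]
f_exceeds_g = all(F.cdf(t) > G.cdf(t) for t in grid)

# Y - X ~ N(delta, 2 * sigma^2), so P(X < Y) = P(Y - X > 0)
prob_x_less_than_y = 1 - NormalDist(delta, sigma * 2**0.5).cdf(0.0)

print(f"F(t) > G(t) on the grid: {f_exceeds_g}")
print(f"P(X < Y) = {prob_x_less_than_y:.3f}")
```

Here $F$ lies above $G$ everywhere on the grid, and $X$ falls below $Y$ well over half the time, which is the "less" alternative.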


## Examples

The following examples illustrate the Mann-Whitney U test under two stress scenarios:

* Two-Sided Alternative: The distributions underlying $X$ and $Y$ are equal but the sample sizes are small.

* Lower Tail Alternative: The distribution underlying $X$ is stochastically less than that of $Y$.

In [1]:
import random

import numpy as np
from scipy.stats import PermutationMethod, mannwhitneyu, norm



### No Ties & Either Sample Size is Small (≤ 8)

The `exact` method computes the exact p-value by comparing the observed $U$ statistic to the exact distribution of the $U$ statistic under the null hypothesis; no correction is made for ties.

In [24]:
# Generate random n and m
n_control_example_1 = random.randint(3, 8)
n_treatment_example_1 = random.randint(8, 12)

# Define the mean and standard deviation for the normal distribution
mean_example_1 = 10
std_dev_example_1 = 3

# Generate data for control and treatment groups from the same normal distribution
control_group_example_1 = norm.rvs(
    loc=mean_example_1, scale=std_dev_example_1, size=n_control_example_1
)
treatment_group_example_1 = norm.rvs(
    loc=mean_example_1, scale=std_dev_example_1, size=n_treatment_example_1
)

print(f"The size of the control group is {n_control_example_1}")
print(f"The size of the treatment group is {n_treatment_example_1}")

The size of the control group is 4
The size of the treatment group is 11


In [25]:
mannwhitneyu(
    control_group_example_1,
    treatment_group_example_1,
    alternative="two-sided",
    method="exact",
)

MannwhitneyuResult(statistic=np.float64(28.0), pvalue=np.float64(0.48937728937728936))

### Ties Present & Either Sample Size is Small (≤ 8)

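Passing a `PermutationMethod` instance as `method` conducts the permutation version of the test, which remains valid when ties are present. The data generated below are drawn from continuous normal distributions, so they contain no ties; the call is the same when ties occur.

**Note**: The variances of the two populations are still assumed to be equal.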
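To make the idea concrete, here is a minimal pure-Python sketch of a permutation test on data that does contain ties. It pools the observations, recomputes the tie-aware $U$ for every way of assigning $m$ of the pooled values to the first sample, and reports the fraction of assignments with $U$ at least as large as observed. A large $U$ (many pairs with $X_i < Y_j$) supports the alternative that $X$ is stochastically less than $Y$. The function names and the data are hypothetical, and this is an illustration of the principle rather than a description of SciPy's internals.

```python
from fractions import Fraction
from itertools import combinations


def u_statistic(x, y):
    """Tie-aware U: 1 if x_i < y_j, 1/2 if x_i == y_j, 0 otherwise."""
    return sum(
        Fraction(1) if xi < yj else Fraction(1, 2) if xi == yj else Fraction(0)
        for xi in x
        for yj in y
    )


def permutation_pvalue_x_less(x, y):
    """P(U >= observed U) over all reassignments of pooled values to groups."""
    pooled = list(x) + list(y)
    u_observed = u_statistic(x, y)
    hits = total = 0
    for x_idx in combinations(range(len(pooled)), len(x)):
        chosen = set(x_idx)
        x_perm = [pooled[i] for i in x_idx]
        y_perm = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        hits += u_statistic(x_perm, y_perm) >= u_observed
        total += 1
    return Fraction(hits, total)


# Small made-up samples with tied values (2 appears in both groups)
x_tied = [1, 2, 2, 3]
y_tied = [2, 4, 4, 5, 6]

p_value = permutation_pvalue_x_less(x_tied, y_tied)
print(f"U = {u_statistic(x_tied, y_tied)}, p-value = {float(p_value):.4f}")
```

Full enumeration is only feasible for small samples; for larger ones the assignments are sampled at random, which is the role of `n_resamples` in the cells below.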

In [32]:
n_control_example_2 = random.randint(5, 7)
n_treatment_example_2 = random.randint(9, 12)

# Treatment effect for the location-shift model
delta_example_2 = 5

# Generate data for control and treatment groups from normal distributions with different means
mean_example_2 = 17
std_dev_example_2 = 3

control_group_example_2 = norm.rvs(
    loc=mean_example_2, scale=std_dev_example_2, size=n_control_example_2
)
treatment_group_example_2 = norm.rvs(
    loc=mean_example_2 + delta_example_2,
    scale=std_dev_example_2,
    size=n_treatment_example_2,
)

print(f"The size of the control group is {n_control_example_2}")
print(f"The size of the treatment group is {n_treatment_example_2}")

The size of the control group is 5
The size of the treatment group is 10


In [34]:
rs = np.random.RandomState(12345)
res_wrt_x = mannwhitneyu(
    control_group_example_2,
    treatment_group_example_2,
    alternative="less",
    method=PermutationMethod(n_resamples=9999, random_state=rs),
)
res_wrt_x

MannwhitneyuResult(statistic=np.float64(2.0), pvalue=np.float64(0.001332001332001332))

The test statistic in the output is the Mann-Whitney U statistic with respect to the first sample $X$, with the following hypotheses:

- $H_0$: Samples drawn from the distribution of $X$ (control) are the same as, or stochastically greater than, those drawn from the distribution of $Y$ (treatment).

- $H_1$: Samples drawn from the distribution of $X$ (control) are stochastically less than those drawn from the distribution of $Y$ (treatment).

To obtain the test statistic with respect to the second sample $Y$:

$$
\begin{align*}
U_{Y} = m \times n - U_{X}
\end{align*}
$$

where $U_{X}$ is the Mann-Whitney U statistic with respect to the first sample $X$.

In [35]:
(
    control_group_example_2.shape[0] * treatment_group_example_2.shape[0]
    - res_wrt_x.statistic
)

np.float64(48.0)

In [36]:
mannwhitneyu(
    treatment_group_example_2,
    control_group_example_2,
    alternative="greater",
    method=PermutationMethod(n_resamples=9999, random_state=rs),
)

MannwhitneyuResult(statistic=np.float64(48.0), pvalue=np.float64(0.001332001332001332))