## References

H.B. Mann and D.R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other”, The Annals of Mathematical Statistics, Vol. 18, pp. 50-60, 1947.

## Assumptions

We obtain $N=m+n$ observations $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$.


* A1. The observations $X_1, \ldots, X_m$ are a random sample from population 1; that is, the $X$'s are independent and identically distributed. The observations $Y_1, \ldots, Y_n$ are a random sample from population 2; that is, the $Y$'s are independent and identically distributed.

* A2. The $X$'s and $Y$'s are mutually independent. Thus, in addition to assumptions of independence within each sample, we also assume independence between the two samples.

* A3. Populations 1 and 2 are continuous populations.

## Hypothesis

### Null Hypothesis

Let $F$ be the distribution function corresponding to population 1 and let $G$ be the distribution function corresponding to population 2.

The null hypothesis $H_0$ is that the two populations are identical, that is:

$$
\begin{align*}
H_0: F(t) = G(t) \quad \text{for all } t.
\end{align*}
$$

Note that although the null hypothesis asserts that $X$ and $Y$ have the same CDF, the common CDF is not specified.

### Alternative Hypothesis

The alternative hypothesis in a two-sample location problem typically states that $Y$ is generally larger (or smaller) than $X$. One useful model for this is the **translation model**, also called the **location-shift model**:

$$
G(t) = F(t - \Delta), \quad \text{for all } t
$$

This means population 2 is identical to population 1, but shifted by an amount $\Delta$. Another way to express this is:

$$
Y \stackrel{d}{=} X + \Delta
$$

where $\stackrel{d}{=}$ indicates "has the same distribution as." The parameter $\Delta$ is the **location shift** or **treatment effect**. If $X$ is from population 1 (control) and $Y$ is from population 2 (treatment), then $\Delta$ represents the **expected effect of the treatment**. If the mean $E(X)$ of population 1 exists, and $E(Y)$ is the mean of population 2, then:

$$
\Delta = E(Y) - E(X)
$$

is the difference in population means. Under the location-shift model, the null hypothesis $H_0$ becomes:

$$
H_0: \Delta = 0
$$

which asserts that the two populations are identical (so that, when the means exist, they are equal), implying no treatment effect.
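
As a rough numerical illustration of the location-shift model, the sketch below assumes a normal population 1. The values `loc`, `scale`, `delta`, the grid `t`, and the seed are illustrative choices, not part of the model. It checks the identity $G(t) = F(t - \Delta)$ on a grid and that the difference in sample means is close to $\Delta$.

```python
import numpy as np
from scipy.stats import norm

# Illustrative assumptions: normal populations with a shift of delta = 5
loc, scale, delta = 10.0, 3.0, 5.0

# Check the CDF identity G(t) = F(t - delta) on a grid of t values
t = np.linspace(-5.0, 35.0, 401)
F_shifted = norm.cdf(t - delta, loc=loc, scale=scale)  # F(t - delta)
G = norm.cdf(t, loc=loc + delta, scale=scale)          # G(t)
cdf_identity_holds = bool(np.allclose(F_shifted, G))

# Simulate Y = X + delta (in distribution) and compare the sample means
rng = np.random.default_rng(0)
x_sample = rng.normal(loc, scale, size=100_000)
y_sample = rng.normal(loc, scale, size=100_000) + delta
mean_difference = y_sample.mean() - x_sample.mean()

print(cdf_identity_holds)
print(mean_difference)  # should be close to delta
```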

## Mann-Whitney Statistic

The Mann-Whitney statistic is defined as:

$$
\begin{align*}
U=\sum_{i=1}^m \sum_{j=1}^n \phi\left(X_i, Y_j\right)
\end{align*}
$$

where

$$
\begin{align*}
\phi\left(X_i, Y_j\right)= \begin{cases}1 & \text { if } X_i<Y_j \\ 0 & \text { otherwise. }\end{cases}
\end{align*}
$$

The generalization of the Mann-Whitney statistic to the case of tied observations is:

$$
\begin{align*}
U=\sum_{i=1}^m \sum_{j=1}^n \phi^*\left(X_i, Y_j\right)
\end{align*}
$$

where

$$
\begin{align*}
\phi^*\left(X_i, Y_j\right)= \begin{cases}1, & \text { if } X_i<Y_j \\ \frac{1}{2}, & \text { if } X_i=Y_j \\ 0, & \text { if } X_i>Y_j\end{cases}
\end{align*}
$$
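
The definition above can be sketched directly with NumPy broadcasting. The helper name `mann_whitney_u` and the toy data are assumptions made for illustration. Note that the statistic returned by SciPy's `mannwhitneyu(x, y)` appears to follow the opposite convention (it counts pairs with $X_i > Y_j$), so it should equal $mn - U$ for the $U$ defined here.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def mann_whitney_u(x, y):
    """U = sum over all pairs of phi*(X_i, Y_j):
    1 if X_i < Y_j, 1/2 if X_i == Y_j, 0 otherwise."""
    x = np.asarray(x, dtype=float)[:, None]  # column vector, shape (m, 1)
    y = np.asarray(y, dtype=float)[None, :]  # row vector, shape (1, n)
    return float(np.sum(x < y) + 0.5 * np.sum(x == y))

# Toy data with one tie (the value 2 appears in both samples)
x_toy = [1, 2, 3]
y_toy = [2, 4]

u = mann_whitney_u(x_toy, y_toy)
print(u)  # 4.5

# Comparison with SciPy's statistic for the first sample (expected to be m*n - u)
scipy_stat = mannwhitneyu(x_toy, y_toy).statistic
print(scipy_stat, len(x_toy) * len(y_toy) - u)
```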

### Null Distribution

Some textbooks and some software find it more convenient to use $U^\prime$ instead of $U$ as the test statistic. The two statistics are related by

$$
\begin{align*}
U^{\prime}=mn-U
\end{align*}
$$

The possible values of $U$ and $U^{\prime}$ are $0, 1, \ldots, mn$. Furthermore, when $H_0$ is true, the mean and variance of $U$ and $U^{\prime}$ are, respectively,

$$
\begin{align*}
E_0(U) & =E_0\left(U^{\prime}\right)=\frac{mn}{2} \\
\operatorname{Var}_0(U) & =\operatorname{Var}_0\left(U^{\prime}\right)=\frac{m n(m+n+1)}{12}
\end{align*}
$$


The null distributions of $U$ and $U^{\prime}$ are symmetric about the mean $\frac{mn}{2}$.
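
These properties can be checked by brute force for small samples. The sketch below assumes no ties, so that under $H_0$ every assignment of $m$ of the ranks $1, \ldots, N$ to the $X$ sample is equally likely. The function name `exact_null_distribution` and the choice $m = 3$, $n = 4$ are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def exact_null_distribution(m, n):
    """Exact null distribution of U (number of pairs with X_i < Y_j), assuming no ties."""
    N = m + n
    counts = Counter()
    for x_ranks in combinations(range(1, N + 1), m):
        x_set = set(x_ranks)
        y_ranks = [r for r in range(1, N + 1) if r not in x_set]
        u = sum(1 for xr in x_ranks for yr in y_ranks if xr < yr)
        counts[u] += 1
    total = sum(counts.values())
    return {u: Fraction(c, total) for u, c in sorted(counts.items())}

m, n = 3, 4
dist = exact_null_distribution(m, n)

mean_u = sum(u * p for u, p in dist.items())
var_u = sum((u - mean_u) ** 2 * p for u, p in dist.items())
is_symmetric = all(dist[u] == dist[m * n - u] for u in dist)

# Compare with mn/2 = 6 and mn(m+n+1)/12 = 8
print(mean_u, var_u, is_symmetric)  # 6 8 True
```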

### Alternative Hypotheses

Let $F(t)$ and $G(t)$ be the cumulative distribution functions (CDFs) of the distributions underlying $X$ and $Y$, respectively. The alternative hypotheses for the Mann-Whitney U test are defined as follows:

#### 1. **Two-Sided Alternative (‘two-sided’)**:

   - $F(t) \neq G(t)$ for at least one $t$.  

   - **Interpretation**: The distributions underlying $X$ and $Y$ are not equal.

#### 2. **Lower Tail Alternative (‘less’)**:

   - $F(t) > G(t)$ for all $t$.  

   - **Interpretation**: The distribution underlying $X$ is stochastically **less than** that of $Y$. In other words, $X$ tends to take on smaller values compared to $Y$. This is because the probability that $X$ is less than or equal to any given $t$ is greater than the probability that $Y$ is less than or equal to $t$.

#### 3. **Upper Tail Alternative (‘greater’)**:

   - $F(t) < G(t)$ for all $t$.  
   
   - **Interpretation**: The distribution underlying $X$ is stochastically greater than that of $Y$. This implies that $X$ tends to take on larger values compared to $Y$, as the probability of $X$ being less than or equal to any given $t$ is smaller than that of $Y$.

These hypotheses describe the relationship between the CDFs. Even though the direction of the inequalities might seem counterintuitive, they correctly indicate that if $F(t) > G(t)$, samples drawn from $X$ tend to be smaller than those drawn from $Y$. Similarly, if $F(t) < G(t)$, samples drawn from $X$ tend to be larger than those drawn from $Y$.
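
The direction of the inequalities can be made concrete with a small sketch. It assumes, purely for illustration, that $X \sim N(0, 1)$ and $Y \sim N(1, 1)$ are independent, so $X$ should be stochastically less than $Y$. The grid `t` is an arbitrary choice.

```python
import numpy as np
from scipy.stats import norm

# Illustrative assumption: X ~ N(0, 1) and Y ~ N(1, 1)
t = np.linspace(-5.0, 6.0, 221)
F = norm.cdf(t, loc=0.0, scale=1.0)  # CDF of X
G = norm.cdf(t, loc=1.0, scale=1.0)  # CDF of Y

# 'less' alternative: F(t) > G(t) at every grid point
x_stochastically_less = bool(np.all(F > G))

# For these independent normals, X - Y ~ N(-1, 2), so P(X < Y) = Phi(1 / sqrt(2))
prob_x_less_y = float(norm.cdf(1.0 / np.sqrt(2.0)))

print(x_stochastically_less)
print(prob_x_less_y)  # larger than 1/2, i.e. X tends to be smaller than Y
```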


## Cases 

In [22]:
from scipy.stats import mannwhitneyu, PermutationMethod
import numpy as np
from scipy.stats import norm
import random



### No Ties & Either Sample Size is Small (< 8)

The `exact` method computes the exact p-value by comparing the observed $U$ statistic to the exact distribution of the $U$ statistic under the null hypothesis; no correction is made for ties.

In [34]:
# Draw random sample sizes: m for x and n for y, matching the notation above
m = random.randint(3, 8)
n = random.randint(8, 12)

# Define the mean and standard deviation for the normal distribution
mean = 10
std_dev = 3

# Generate data for x and y from the same normal distribution
x = norm.rvs(loc=mean, scale=std_dev, size=m)
y = norm.rvs(loc=mean, scale=std_dev, size=n)

x, y

(array([ 4.90604786,  7.1535022 ,  9.0975125 ,  3.64767472,  6.73705288,
        12.11357852,  9.43230283]),
 array([ 6.91743252,  9.35244026, 10.81840993, 13.30595309, 10.61167553,
        10.27983169, 10.44614491, 11.31749122, 18.49084223, 11.56369562,
        10.8920939 , 10.76189071]))

In [35]:
mannwhitneyu(x, y, alternative='two-sided', method="exact")

MannwhitneyuResult(statistic=np.float64(14.0), pvalue=np.float64(0.01706755576724617))
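
The exact p-value above can be reproduced by enumeration. The sketch below assumes no ties, takes the sample sizes (7 and 12) and the statistic (14) from the run shown above, and assumes that SciPy's statistic for the first sample equals its rank sum minus $m(m+1)/2$ and that the two-sided p-value is twice the smaller tail probability. The variable names are illustrative.

```python
from itertools import combinations

size_x, size_y = 7, 12   # sizes of x and y in the run shown above
observed_u = 14          # statistic reported by mannwhitneyu(x, y, ...)

# Under H0 (no ties), every choice of size_x ranks out of 1..N for x is equally likely.
# Assumed convention: statistic for x = (rank sum of x) - size_x * (size_x + 1) / 2.
N = size_x + size_y
offset = size_x * (size_x + 1) // 2
null_u = [sum(ranks) - offset for ranks in combinations(range(1, N + 1), size_x)]

# Tail probabilities of the exact null distribution
lower = sum(u <= observed_u for u in null_u) / len(null_u)
upper = sum(u >= observed_u for u in null_u) / len(null_u)

# Two-sided p-value: twice the smaller tail, capped at 1
p_two_sided = min(1.0, 2.0 * min(lower, upper))
print(len(null_u), p_two_sided)
```

The result should agree with the p-value reported by `mannwhitneyu` up to floating-point rounding.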

### Ties Present & Either Sample Size is Small (< 8)

The permutation method, requested by passing a `PermutationMethod` instance as `method`, conducts the permutation version of the test.

**Note**: The variances of the two samples are assumed to be equal.

In [65]:
# Draw random sample sizes: m for x and n for y
m = random.randint(3, 7)
n = random.randint(8, 12)

# Shift the mean of y by delta so that x tends to be smaller than y
delta = 5

x = norm.rvs(loc=mean, scale=std_dev, size=m)
y = norm.rvs(loc=mean + delta, scale=std_dev, size=n)

x, y

(array([11.69749582, 11.78467761,  9.35781118, 13.36375481, 11.1938952 ,
        13.34797627]),
 array([21.25842092, 16.48838726, 12.10239594, 12.85302506, 11.50726303,
        13.41354991, 14.88162135, 12.22757211, 13.11881165, 17.89117204]))

In [71]:
rs = np.random.RandomState(12345)
res_wrt_x = mannwhitneyu(x, y, alternative='less', method=PermutationMethod(n_resamples=9999, random_state=rs))
res_wrt_x

MannwhitneyuResult(statistic=np.float64(12.0), pvalue=np.float64(0.027972027972027972))

The test statistic in the output is the Mann-Whitney U statistic with respect to the first sample $X$, with the following hypotheses:

- $H_0$: The distribution of $X$ is the same as or greater than the distribution of $Y$.

- $H_1$: The distribution of $X$ is less than the distribution of $Y$.

To obtain the test statistic with respect to the second sample $Y$:

$$
\begin{align*}
U_{Y} = m \times n - U_{X}
\end{align*}
$$

where $U_{X}$ is the Mann-Whitney U statistic with respect to the first sample $X$.

In [73]:
x.shape[0] * y.shape[0] - res_wrt_x.statistic

np.float64(48.0)

In [74]:
mannwhitneyu(y, x, alternative='greater', method=PermutationMethod(n_resamples=9999, random_state=rs))

MannwhitneyuResult(statistic=np.float64(48.0), pvalue=np.float64(0.027972027972027972))
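
Both calls return the same p-value even though the random state has advanced between them. A plausible explanation is that there are only $\binom{16}{6} = 8008$ distinct ways to split the pooled data, fewer than `n_resamples=9999`, in which case SciPy's `permutation_test` is documented to compute the exact null distribution. The sketch below enumerates all splits by hand for the `'less'` alternative, using the data printed above. The helper `u_wrt_first` is a hypothetical name, and counting ties as 1/2 is an assumption about the convention.

```python
from itertools import combinations
import numpy as np

# Data copied from the output above
x = np.array([11.69749582, 11.78467761, 9.35781118, 13.36375481,
              11.1938952, 13.34797627])
y = np.array([21.25842092, 16.48838726, 12.10239594, 12.85302506,
              11.50726303, 13.41354991, 14.88162135, 12.22757211,
              13.11881165, 17.89117204])

def u_wrt_first(a, b):
    """Count pairs with a_i > b_j, counting ties as 1/2 (assumed convention)."""
    return float(np.sum(a[:, None] > b[None, :]) + 0.5 * np.sum(a[:, None] == b[None, :]))

pooled = np.concatenate([x, y])
observed = u_wrt_first(x, y)

# Enumerate every way of assigning len(x) of the pooled values to the first sample
n_splits = 0
n_as_small = 0
for idx in combinations(range(len(pooled)), len(x)):
    in_x = np.zeros(len(pooled), dtype=bool)
    in_x[list(idx)] = True
    n_splits += 1
    # 'less': count splits whose statistic is at most the observed one
    n_as_small += int(u_wrt_first(pooled[in_x], pooled[~in_x]) <= observed)

p_less = n_as_small / n_splits
print(observed, n_splits, p_less)
```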