# Kolmogorov-Smirnov Test


The Kolmogorov-Smirnov (KS) test is a non-parametric test for comparing two continuous one-dimensional probability distributions. It quantifies the distance between the distributions as the largest absolute difference between their cumulative distribution functions. The two distributions can come from two different samples, **or one can be a sample and the other a theoretical distribution**. Let us test whether our generated normal random variable follows a normal distribution. st.kstest is the function that performs the KS test.

![KS Hypothesis](KS_plot.png)

The graph shows two curves: the red line is the **CDF**, or **cumulative distribution function**, of the theoretical distribution, whilst the blue one is the **empirical CDF**, which is the cumulative distribution of the sample.

The test answers the question "How likely is it that we would see a collection of samples like this if they were drawn from that probability distribution?" or, in the two-sample case, "How likely is it that we would see two sets of samples like this if they were drawn from the same (but unknown) probability distribution?".
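The distance between the two curves can also be computed by hand. The following is a minimal sketch, assuming a standard normal as the theoretical distribution; the names `sample`, `d_manual` and the fixed seed are illustrative and not part of the original notebook. Because the empirical CDF is a step function, the gap is checked just after and just before each jump.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)      # seed chosen only for reproducibility (assumption)
sample = rng.standard_normal(1000)  # illustrative sample, standard normal

# Sort the sample and evaluate the theoretical CDF at every observation
xs = np.sort(sample)
n = xs.size
cdf = st.norm.cdf(xs)

# The empirical CDF jumps by 1/n at each observation:
# compare the theoretical CDF with its value after and before each jump
d_plus = np.max(np.arange(1, n + 1) / n - cdf)
d_minus = np.max(cdf - np.arange(0, n) / n)
d_manual = max(d_plus, d_minus)

# Compare with the statistic reported by scipy
d_scipy, p_value = st.kstest(sample, 'norm')
print(d_manual, d_scipy)
```

The two printed values should agree, which shows that D is simply the largest vertical gap between the empirical and the theoretical CDF.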

In [9]:
import numpy as np
import scipy.stats as st

In [10]:
# generate normally distributed random numbers, similar to R's rnorm()
x = np.random.randn(1000)

# Kolmogorov-Smirnov test of x against the theoretical distribution 'norm'
# D is the KS statistic: the greatest vertical distance between the
# empirical CDF of x and the theoretical CDF; p is the p-value
D, p = st.kstest(x, 'norm')
print(p)

0.1768260698073626


![KS Hypothesis](hypothesis_Kolmogorov_Smirnov.png)

We get a p-value higher than the threshold (a commonly used value is 0.05), so we cannot reject the null hypothesis that the sample was drawn from a normal distribution. This does not prove normality; it only means the data are consistent with it. We can also check that a uniformly distributed random variable does not pass as normal by chance. In this case we get a p-value far below the threshold, so we reject the null hypothesis: these random numbers are not normally distributed.

In [11]:
# generate random numbers in the interval [0, 1) with a uniform distribution
y = np.random.rand(1000) 

D, p = st.kstest(y,'norm')
print(p)

3.988092097923799e-232


In [12]:
# passing a second sample instead of a distribution name runs the two-sample test
D, p = st.kstest(y, x)
print(D)
print(p)

0.496
4.361853008614382e-112
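The last cell compares two samples directly. The same comparison can be sketched with `st.ks_2samp`, SciPy's dedicated two-sample function. The significance level `alpha = 0.05`, the seed and the variable names below are assumptions made for illustration, not values taken from the notebook.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)            # seed chosen only for reproducibility (assumption)
normal_sample = rng.standard_normal(1000)
uniform_sample = rng.random(1000)         # uniform on [0, 1)

# Two-sample KS test: null hypothesis is that both samples
# come from the same (unknown) continuous distribution
result = st.ks_2samp(uniform_sample, normal_sample)

alpha = 0.05                              # assumed significance level
reject = result.pvalue < alpha
print(result.statistic, result.pvalue, reject)
```

With samples this different, the p-value is far below `alpha`, so the null hypothesis of a common distribution is rejected, in line with the output above.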
