## Jensen-Shannon Divergence

*NOTE: To follow this derivation, you first need to understand Kullback-Leibler divergence (KL divergence). If the derivation of KL divergence is not yet clear to you, read that notebook first; this notebook assumes full knowledge of everything there.*

- We have established that the KL divergence between distributions $P(X)$ and $Q(X)$, denoted $D_{KL}(P || Q)$, is the cross entropy of approximating $P(X)$ with $Q(X)$ minus the entropy of $P(X)$
$$\begin{aligned}
    D_{KL}(P || Q) &= \sum_X P(X) \log_2\left(\frac{P(X)}{Q(X)}\right)
\end{aligned}$$

- However, KL divergence has an immediate limitation as a measure of distance: it is not symmetric, so in general
$$\begin{aligned}
    D_{KL}(P || Q) \neq D_{KL}(Q || P)
\end{aligned}$$
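
A quick numerical check can make the asymmetry concrete. The sketch below is only illustrative: the helper name `kl_divergence` and the two-outcome distributions are made up for this example, and it assumes every probability is strictly positive.

```python
import numpy as np

def kl_divergence(p, q):
    # Hypothetical helper: D_KL(P || Q) in bits (log base 2).
    # Assumes p and q have strictly positive entries that sum to 1.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log2(p / q)))

# Illustrative distributions over two outcomes (made up for this example)
p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

kl_pq = kl_divergence(p, q)  # D_KL(P || Q)
kl_qp = kl_divergence(q, p)  # D_KL(Q || P)

# Swapping the arguments changes the result
print(kl_pq, kl_qp)
```

Here swapping the arguments changes which distribution supplies the weights in the sum, which is why the two directions generally disagree.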

- The Jensen-Shannon divergence adapts the KL divergence into a symmetric measure of distance between $P(X)$ and $Q(X)$

### Theory

- To make the measure symmetric, for a given $P(X)$ and $Q(X)$, first form the midpoint distribution $M(X) = \frac{P(X) + Q(X)}{2}$

- Then, compute the KL divergences $D_{KL}(P || M)$ and $D_{KL}(Q || M)$, which measure how far each distribution is from the midpoint

- Finally, take the equal-weight average of the two KL divergences; this is the Jensen-Shannon divergence
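
Written out, the three steps above amount to the following. The symbol $D_{JS}$ is notation introduced here for the Jensen-Shannon divergence; it does not appear elsewhere in this notebook.

$$\begin{aligned}
    M(X) &= \frac{P(X) + Q(X)}{2} \\
    D_{JS}(P || Q) &= \frac{1}{2} D_{KL}(P || M) + \frac{1}{2} D_{KL}(Q || M)
\end{aligned}$$

Swapping $P$ and $Q$ leaves $M$ unchanged and only reorders the two terms in the sum, so $D_{JS}(P || Q) = D_{JS}(Q || P)$.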

### Implementation

In [None]:
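import numpy as np

def getdist():
    # Draw 10 random positive weights and normalise them into a probability distribution
    x = np.random.rand(10)
    return x / np.sum(x)

def yj_kl_div(p, q):
    # D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))
    # np.log is the natural log, so the result is in nats rather than bits
    # Assumes p and q are NumPy arrays with strictly positive entries
    return np.sum(p * np.log(p / q))

def yj_js_div(p, q):
    # Midpoint distribution M = (P + Q) / 2
    m = (p + q) / 2
    # Equal-weight average of the two KL divergences to the midpoint
    js_div = 0.5 * (yj_kl_div(p, m) + yj_kl_div(q, m))
    return js_div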


In [None]:
p = getdist()
q = getdist()
yj_js_div(p, q)

np.float64(0.10552098302073679)

In [None]:
from scipy.spatial.distance import jensenshannon
jensenshannon(p, q) ** 2  # scipy returns the Jensen-Shannon distance (the square root of the divergence), so square it

np.float64(0.10552098302073681)
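
As a further sanity check, the sketch below verifies a few properties one would expect from the definition: symmetry, a value of zero for identical distributions, and (with the natural log) an upper bound of $\ln 2$, reached when the two distributions have no overlap. The helper names `kl_div_safe` and `js_div_safe` and the example distributions are hypothetical, not part of the notebook above; unlike the loop-free version, they skip terms where a probability is zero, using the convention $0 \log 0 = 0$.

```python
import numpy as np

def kl_div_safe(p, q):
    # Hypothetical helper: D_KL(P || Q) in nats.
    # Terms with p == 0 are skipped (convention: 0 * log(0 / q) = 0).
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_div_safe(p, q):
    # Hypothetical helper: Jensen-Shannon divergence in nats.
    # The midpoint m is positive wherever p or q is, so the KL terms stay finite.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = (p + q) / 2
    return 0.5 * (kl_div_safe(p, m) + kl_div_safe(q, m))

# Illustrative distributions (made up for this example)
a = np.array([0.1, 0.2, 0.7])
b = np.array([0.3, 0.3, 0.4])

js_ab = js_div_safe(a, b)
js_ba = js_div_safe(b, a)                              # same value as js_ab
js_aa = js_div_safe(a, a)                              # identical inputs give 0
js_disjoint = js_div_safe([1.0, 0.0], [0.0, 1.0])      # no overlap gives ln 2

print(js_ab, js_ba, js_aa, js_disjoint, np.log(2))
```

The masking matters only when a distribution contains exact zeros; for the strictly positive outputs of `getdist()` it should agree with `yj_js_div`.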