In [1]:
## basic tutorial on calculating KL Divergence
## tutorial url:
## https://www.statology.org/kl-divergence-python/

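In statistics, the Kullback–Leibler (KL) divergence quantifies how much one probability distribution differs from another. It is often loosely called a distance, but it is not a true distance metric: it is not symmetric and it does not satisfy the triangle inequality.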

If we have two probability distributions, P and Q, we typically write the KL divergence using the notation KL(P || Q), which means “P’s divergence from Q.”

We calculate it using the following formula:

KL(P || Q) = Σ P(x) ln(P(x) / Q(x)), where the sum runs over all outcomes x

The KL divergence is never negative, and it is zero only when the two distributions are identical.
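
To make the formula concrete, here is a minimal pure-Python sketch of it. The function name `kl_divergence` and the two-outcome example distributions are illustrative assumptions, not part of the original tutorial. The handling of zero probabilities follows the usual conventions (a term with P(x) = 0 contributes nothing, and the divergence is infinite when P(x) > 0 but Q(x) = 0).

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions of equal length."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0:
            continue  # convention: 0 * ln(0 / q) contributes 0
        if q_x == 0:
            return math.inf  # P puts mass where Q has none
        total += p_x * math.log(p_x / q_x)
    return total

# hypothetical two-outcome distributions, chosen only for illustration
p_example = [0.5, 0.5]
q_example = [0.9, 0.1]

kl_different = kl_divergence(p_example, q_example)
kl_identical = kl_divergence(p_example, p_example)

print(kl_different)   # positive, since the distributions differ
print(kl_identical)   # 0.0, since the distributions are identical
```

The second call shows the zero case: every ratio P(x) / Q(x) equals 1, so every logarithm is 0.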

In Python, the scipy.special.rel_entr() function computes the individual terms P(x) ln(P(x) / Q(x)) element by element. Summing those terms gives the KL divergence between the two distributions.

In [3]:
from scipy.special import rel_entr

In [2]:
# define two probability distributions over the same eight outcomes
P = [.05, .1, .2, .05, .15, .25, .08, .12]
Q = [.3, .1, .2, .1, .1, .02, .08, .1]
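
Because rel_entr() works element by element and does not normalize its inputs, it can be worth confirming that both lists are valid probability distributions before summing. The check below is a small sketch; the helper name `is_distribution` and the tolerance are assumptions made for illustration.

```python
import math

P = [.05, .1, .2, .05, .15, .25, .08, .12]
Q = [.3, .1, .2, .1, .1, .02, .08, .1]

def is_distribution(values, tol=1e-9):
    """True if all values are non-negative and they sum to 1 within tol."""
    return all(v >= 0 for v in values) and math.isclose(sum(values), 1.0, abs_tol=tol)

p_ok = is_distribution(P)
q_ok = is_distribution(Q)
same_length = len(P) == len(Q)  # both must cover the same outcomes

print(p_ok, q_ok, same_length)
```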

In [5]:
# calculate KL(P || Q) by summing the elementwise terms
sum(rel_entr(P, Q))

0.589885181619163

Because the formula uses the natural logarithm, the result is expressed in nats, which is short for natural units of information.

Thus, we would say that the KL divergence of P from Q is approximately 0.590 nats.
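
If bits are preferred over nats, the value can be converted by dividing by ln(2), since changing the logarithm base only rescales the result. A short sketch, with variable names chosen for illustration:

```python
import math

kl_nats = 0.589885181619163       # KL(P || Q) reported above, in nats
kl_bits = kl_nats / math.log(2)   # same quantity expressed with log base 2

print(round(kl_bits, 3))  # 0.851
```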

Also note that the KL divergence is not symmetric: in general, KL(P || Q) is not equal to KL(Q || P). Swapping the two distributions gives a different value.

In [6]:
# calculate KL(Q || P) with the arguments swapped
sum(rel_entr(Q, P))

0.497549319448034