In [1]:
import numpy as np

In [2]:
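def entropy(p):
    r"""Compute the Shannon entropy of a distribution, in bits.

    The Shannon entropy is defined as
    :math:`H(p) = -\sum_i p(x_i) \log_2 p(x_i)`,
    where terms with :math:`p(x_i) = 0` contribute 0.

    Parameters
    ----------
    p : np.ndarray
        One-dimensional array of probabilities that sums to 1.

    Returns
    -------
    float
        The entropy in bits (base-2 logarithm).

    Examples
    --------
    >>> distribution = np.array([0.25, 0.25, 0.5])
    >>> print(entropy(distribution))
    1.5

    """
    assert isinstance(p, np.ndarray), '`p` must be a numpy array'
    assert np.isclose(np.sum(p), 1.), '`p` must be a probability distribution'

    # Drop zero-probability outcomes, since 0 * log 0 is taken to be 0
    p = p[np.nonzero(p)]

    return np.sum(-p * np.log2(p))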

In [5]:
entropy(np.array([0.3,0.4,0.3]))

1.5709505944546684
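A quick sanity check on the value above (a sketch, not part of the original notebook): the entropy of a distribution over \( n \) outcomes is at most \( \log_2 n \), which the uniform distribution attains, so \([0.3, 0.4, 0.3]\) should land just below \( \log_2 3 \approx 1.585 \). The helper `entropy_bits` is hypothetical; it mirrors `entropy` without the input checks so the cell runs on its own.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits (hypothetical helper mirroring `entropy`)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log 0 is taken as 0, so zero entries are dropped
    return float(-np.sum(p * np.log2(p)))

n = 3
h_uniform = entropy_bits(np.full(n, 1.0 / n))
h_example = entropy_bits([0.3, 0.4, 0.3])

print(h_uniform, np.log2(n))   # the uniform distribution attains the maximum log2(n)
print(h_example < h_uniform)   # a non-uniform distribution has lower entropy
```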

In [6]:
def kl_divergence(p, q):
    r"""Compute the Kullback-Leibler divergence between two distributions, in bits.

    The KL divergence is defined as
    :math:`D_{KL}(p \parallel q) = \sum_i p(x_i) (\log p(x_i) - \log q(x_i))`,
    which can be rewritten as
    :math:`D_{KL}(p \parallel q) = \sum_i p(x_i) \log \frac{p(x_i)}{q(x_i)}`
    and is more convenient to compute (one logarithm per term).

    Some interesting properties of the KL divergence:
      - The KL divergence is always non-negative, i.e.
        :math:`D_{KL}(p \parallel q) \geq 0`, with equality if and only if
        :math:`p = q`.

      - The KL divergence is additive for independent distributions: if
        :math:`P = P_1 \otimes P_2` and :math:`Q = Q_1 \otimes Q_2`, then
        :math:`D_{KL}(P \parallel Q) = D_{KL}(P_1 \parallel Q_1) + D_{KL}(P_2 \parallel Q_2)`.

    Parameters
    ----------
    p : np.ndarray
        One-dimensional array of probabilities that sums to 1.
    q : np.ndarray
        Array of the same shape as ``p`` that sums to 1, with
        ``q[i] > 0`` wherever ``p[i] > 0``.

    Returns
    -------
    float
        The divergence in bits (base-2 logarithm).

    Examples
    --------
    >>> p = np.array([0.7, 0.2, 0.05, 0.05])
    >>> q = np.array([0.05, 0.05, 0.2, 0.7])
    >>> print(round(kl_divergence(p, q), 6))
    2.774781

    """
    assert isinstance(p, np.ndarray), '`p` must be a numpy array'
    assert np.isclose(np.sum(p), 1.), '`p` must be a probability distribution'

    assert isinstance(q, np.ndarray), '`q` must be a numpy array'
    assert np.isclose(np.sum(q), 1.), '`q` must be a probability distribution'
    assert p.shape == q.shape, '`p` and `q` must have the same shape'

    # The KL divergence is finite only if Q(x) = 0 implies P(x) = 0;
    # the implication `a => b` is equivalent to `not a or b`
    q_mask, p_mask = q == 0, p == 0
    assert np.all(~q_mask | p_mask), 'The KL divergence is finite only if Q(x)=0 implies P(x)=0'

    # Keep only outcomes with P(x) > 0 (terms with P(x) = 0 contribute 0).
    # Using the same mask for both arrays keeps p and q aligned, and by the
    # check above q is nonzero wherever p is.
    p, q = p[~p_mask], q[~p_mask]

    return np.sum(p * np.log2(p / q))

In [8]:
kl_divergence(np.array([0.3,0.4,0.3]), np.array([0.3,0.4,0.3]))

0.0
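The two properties listed in the `kl_divergence` docstring can be checked numerically. The sketch below assumes a hypothetical stand-alone helper `kl_bits` (the same formula as `kl_divergence`, without the input checks) and draws random strictly positive distributions with `numpy.random.Generator.dirichlet`; the joint distribution of two independent variables is the outer product of their marginals.

```python
import numpy as np

def kl_bits(p, q):
    """D_KL(p || q) in bits (hypothetical helper; assumes q > 0 wherever p > 0)."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0  # terms with p(x) = 0 contribute 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

rng = np.random.default_rng(0)
p1, q1 = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
p2, q2 = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))

# Non-negativity
print(kl_bits(p1, q1) >= 0)

# Additivity: independent pairs have outer-product joint distributions
P, Q = np.outer(p1, p2), np.outer(q1, q2)
print(np.isclose(kl_bits(P, Q), kl_bits(p1, q1) + kl_bits(p2, q2)))
```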

Suppose one random variable X has 50 data points and another random variable Y has 52 data points. Can I calculate the KL divergence between these two distributions?

To compute the **Kullback-Leibler (KL) divergence** between two distributions \( P \) and \( Q \), both distributions must be defined over the same set of outcomes (the same sample space).

For your case:

1. You have two random variables \( X \) (50 points) and \( Y \) (52 points).
2. Before computing the KL divergence, you need to estimate the probability distributions \( P \) of \( X \) and \( Q \) of \( Y \) from these samples.

### Steps to Check Feasibility:
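- **Check the Support**: \( D_{\text{KL}}(P \parallel Q) \) is finite only if \( P(x) > 0 \) implies \( Q(x) > 0 \). That is, every value where \( P \) has probability mass must also have positive probability under \( Q \); otherwise the divergence is infinite (often described as undefined).
- **Estimate Probability Distributions**: Since you only have finite samples, estimate a probability mass function (PMF) with a **histogram**, or a probability density with **kernel density estimation (KDE)**. With a KDE, the sum below becomes an integral over densities.
- **Bin Alignment**: If using histograms, both distributions must use the same bin edges, so that each \( x \) in the sum refers to the same bin in \( P \) and in \( Q \).
- **Compute KL Divergence**: Apply the formula below; the base of the logarithm only sets the unit (base 2, as in `kl_divergence` above, gives bits):
  
  $$
  D_{\text{KL}}(P \parallel Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}
  $$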
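The steps above can be sketched for histogram-based estimates as follows. `histogram_kl` is a hypothetical helper, not from the original notebook; it assumes the shared bin edges are computed from the pooled samples with `np.histogram_bin_edges`, and the synthetic normal samples merely stand in for the 50 values of \( X \) and 52 values of \( Y \).

```python
import numpy as np

def histogram_kl(x, y, bins=10):
    """Estimate D_KL(P || Q) in bits from samples x ~ P and y ~ Q (hypothetical helper).

    Both histograms use the same bin edges, computed from the pooled samples.
    Raises ValueError if some bin holds samples of x but none of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    p = np.histogram(x, bins=edges)[0] / len(x)  # counts -> PMF, so sample size cancels out
    q = np.histogram(y, bins=edges)[0] / len(y)
    if np.any((p > 0) & (q == 0)):
        raise ValueError("support of P is not contained in support of Q")
    mask = p > 0  # bins with P(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Synthetic stand-ins for the 50 values of X and 52 values of Y
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=50)
y = rng.normal(0.5, 1.0, size=52)

try:
    print(histogram_kl(x, y, bins=5))
except ValueError as err:
    print("KL divergence is infinite:", err)
```

Normalizing each histogram by its own sample count is what makes 50 versus 52 points irrelevant to the formula itself; the choice of `bins` still changes the estimate.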

### Possible Issues:
- **Different Sample Sizes**: The 50 versus 52 samples are not a problem in themselves, because each estimated distribution is normalized to sum to 1. The real concern is that both samples are small, so the estimated distributions (and therefore the KL estimate) will be noisy and sensitive to choices such as the number of bins.
- **Zero Probability Issue**: If \( Q(x) = 0 \) for any \( x \) where \( P(x) > 0 \), the KL divergence is infinite. A common fix is smoothing, for example adding a small pseudo-count to every bin before normalizing (Laplace, or additive, smoothing).
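A sketch of the additive (Laplace) smoothing mentioned above, under the same assumptions as a plain histogram estimate: shared bin edges from the pooled samples, plus a pseudo-count `alpha` (a hypothetical parameter; `alpha = 1` is classic Laplace smoothing) added to every bin of both histograms before normalizing. Because every bin then has positive probability under both \( P \) and \( Q \), the result is always finite. `smoothed_histogram_kl` is a hypothetical helper, and the normal samples are synthetic stand-ins.

```python
import numpy as np

def smoothed_histogram_kl(x, y, bins=10, alpha=1.0):
    """D_KL(P || Q) in bits from histograms with additive smoothing (hypothetical helper).

    `alpha` (> 0) is a pseudo-count added to every bin of both histograms
    before normalizing; alpha = 1 is classic Laplace smoothing.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    p = np.histogram(x, bins=edges)[0] + alpha  # no bin is left at zero
    q = np.histogram(y, bins=edges)[0] + alpha
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log2(p / q)))

# Synthetic stand-ins for the 50 values of X and 52 values of Y
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=50)
y = rng.normal(0.5, 1.0, size=52)
print(smoothed_histogram_kl(x, y, bins=10, alpha=1.0))
```

The smoothed value depends on both `alpha` and `bins`, so it is best used to compare distributions under fixed settings rather than read as an exact divergence.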

### Conclusion:
Yes, you **can** calculate the KL divergence, but you need to **estimate the distributions carefully** and ensure that the support of \( P \) is contained in the support of \( Q \). If it is not, use smoothing to keep the divergence finite.

In [None]:
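def symmetric_kl_divergence(p, q):
    r"""Compute the symmetric Kullback-Leibler divergence, in bits.

    Kullback and Leibler themselves defined the symmetric divergence as
    :math:`D_{KL}(p \parallel q) + D_{KL}(q \parallel p)`.
    It is finite only if ``p`` and ``q`` have the same support, since the
    support condition must hold in both directions.

    """
    return kl_divergence(p, q) + kl_divergence(q, p)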
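A small, self-contained illustration of why the symmetric version is useful: the plain KL divergence generally changes when its arguments are swapped, while the sum of both directions does not. `kl_bits` and `symmetric_kl_bits` are hypothetical stand-ins for `kl_divergence` and `symmetric_kl_divergence` without the input assertions; like the original, the symmetric sum is finite only when \( p \) and \( q \) have the same support.

```python
import numpy as np

def kl_bits(p, q):
    """D_KL(p || q) in bits (hypothetical helper; assumes q > 0 wherever p > 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def symmetric_kl_bits(p, q):
    """Symmetric divergence D_KL(p || q) + D_KL(q || p) (hypothetical helper)."""
    return kl_bits(p, q) + kl_bits(q, p)

p = np.array([0.5, 0.4, 0.1])
q = np.array([0.2, 0.3, 0.5])

print(kl_bits(p, q), kl_bits(q, p))                          # the two directions differ
print(symmetric_kl_bits(p, q) == symmetric_kl_bits(q, p))    # the sum does not depend on order
```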