# Introduction to MLE (and MAP)
---
## Overview
We will be looking at parameter estimation in this notebook. In particular, we will be:
- Defining distributions
- Plotting likelihood functions
- Estimating parameters
- Comparing numerical methods (that estimate parameters even in the absence of a closed-form solution)

I thought this would be a good opportunity to introduce maximum a posteriori (MAP), which is a Bayesian method for parameter estimation. MAP incorporates a prior distribution over the possible values of the parameter to be estimated.
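
As a rough sketch in symbols (the notation $\theta$ for the parameter and $X$ for the data is an assumption introduced here, not taken from the rest of the notebook), MLE maximises the likelihood alone, whereas MAP maximises the likelihood weighted by the prior $p(\theta)$:

$$ \hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \; p(X \mid \theta), \qquad \hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \; p(X \mid \theta)\, p(\theta) $$

When the prior is flat over the range of $\theta$ being considered, the two estimates coincide.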

**CAUTION:** this is NOT an introduction to Bayesian inference. My intention is to provide examples where **prior information** *might* help with parameter estimation. Note:
- MAP is not representative of Bayesian inference. A MAP estimate is a point estimate equal to the mode of the posterior distribution. In general, Bayesians treat parameters and hypotheses as random variables. Because they are random variables, you can (in most cases) meaningfully report statistics of the posterior distribution other than the mode (e.g. mean, standard deviation).
- There are unsolved problems with regard to the interpretation of prior probabilities. These problems are anathema to frequentists and divide (philosophically inclined) Bayesians (see https://plato.stanford.edu/entries/epistemology-bayesian/#PotPro).

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import scipy
import itertools

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

sns.set_style("darkgrid")
sns.set(rc={'figure.figsize':(12,9)})

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# 1. Distributions and likelihood functions

## MLE:
- We will be using `scipy`: a Python library for scientific computing. 
- `scipy` was introduced in the previous notebook on limited binary outcomes. `scipy` was used to implement Tobit from scratch.
- Do not worry if you had difficulty understanding the Tobit implementation! 
- The examples in this notebook are a lot more straightforward than the Tobit example, which required us to specify a piece-wise likelihood function. We will revisit the Tobit implementation at the end of this notebook.

## MAP:
- We will be using `pymc3`: a high-level Python library for Bayesian inference via probabilistic programming.

## 1.1 Normal distribution
- Recall that the general form of the Gaussian probability density function (PDF) is given by:

$$ {\displaystyle f(x)={\frac {1}{\sigma {\sqrt {2\pi }}}}e^{-{\frac {1}{2}}\left({\frac {x-\mu }{\sigma }}\right)^{2}}} $$
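
For $n$ independent observations $x_1, \dots, x_n$ (the index notation is introduced here for illustration), the likelihood is the product of the individual densities, and taking logs turns that product into a sum:

$$ L(\mu, \sigma) = \prod_{i=1}^{n} f(x_i), \qquad \ell(\mu, \sigma) = \log L(\mu, \sigma) = -\frac{n}{2}\log\left(2\pi\sigma^{2}\right) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\left(x_i-\mu\right)^{2} $$

Setting the partial derivatives of $\ell$ to zero gives the closed-form estimates $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^{2}$, which are handy for checking any numerical result.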

In [None]:
# 1. Let's define the pdf of the Normal distribution from scratch

def gaussian_pdf(mu: float,
                 sigma: float,
                 x: float) -> float:
    """Given mu, sigma, and a single observation x, 
    return the value of the Gaussian pdf.
    """
    sigma_2 = sigma**2
    pi = np.pi  # NumPy's constant; avoids relying on the top-level scipy alias
    a1 = (sigma_2*2*pi)**-(1/2)
    a2 = -1/(2*sigma_2)*(x-mu)**2
    y = a1*np.exp(a2)
    return y


# 2. Let's define the likelihood and log-likelihood

def gaussian_likelihood(pdf, params, X, ll=False):
    """Given a Gaussian pdf, params (mu, sigma), and a 
    vector X of observations, return the value of the Gaussian 
    likelihood function.
    
    Note: if ll=True, returns the log-likelihood value.
    """
    def _pdf(mu, sigma, ll):
        """Converts the pdf function into a sequence of
        functions, where the inner-most function takes a 
        single argument x.

        This design pattern is called "currying".
        """
        def _pdf_x(x):
            if not(ll):
                func = pdf(mu, sigma, x)
            else:
                func = np.log(pdf(mu, sigma, x))
            return func
        return _pdf_x
    
    mu, sigma = params
    
    # Map X to the pdf / log-pdf function
    val_seq = np.array(list(map(_pdf(mu, sigma, ll), X)))   
    
    # If likelihood
    if not(ll):
        # Apply the PRODUCT operator
        value = np.prod(val_seq)
    # If log-likelihood
    else:
        # Apply SUMMATION
        value = sum(val_seq)
    return value

## 1.2 Bernoulli distribution
- Recall that the general form of the Bernoulli probability mass function (PMF) is given by:

$$ {\displaystyle f(x;p)= p^{x}(1-p)^{1-x}} $$ 
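
For $n$ independent observations $x_1, \dots, x_n \in \{0, 1\}$ (again, the index notation is introduced here for illustration), the corresponding log-likelihood and its closed-form maximiser are:

$$ \ell(p) = \sum_{i=1}^{n}\left[x_i \log p + (1-x_i)\log(1-p)\right], \qquad \hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i $$

In other words, the maximum likelihood estimate of $p$ is simply the sample proportion of successes.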

In [None]:
# 1. Let's define the Bernoulli distribution

def bernoulli_pmf(p: float, x: int) -> float:
    """Given probability of success p and a single observation x, 
    return the value of the Bernoulli pmf.
    """
    y = p**x*(1-p)**(1-x)
    return y
    

# 2. Let's define the likelihood and log-likelihood

def bernoulli_likelihood(pmf, p, X, ll=False):
    """Given a Bernoulli pmf, the probability of success p, and a 
    vector X of observations, return the value of the Bernoulli 
    likelihood function.
    
    Note: if ll=True, returns the log-likelihood value.
    """
    def _pmf(p, ll):
        """Converts the pmf function into a sequence of
        functions, where the inner-most function takes a 
        single argument x.

        This design pattern is called "currying".
        """
        def _pmf_x(x):
            if not(ll):
                func = pmf(p, x)
            else:
                func = np.log(pmf(p, x))
            return func
        return _pmf_x

    # Map X to the pmf / log-pmf function
    val_seq = np.array(list(map(_pmf(p, ll), X)))   
    
    # If likelihood
    if not(ll):
        # Apply the PRODUCT operator
        value = np.prod(val_seq)
    # If log-likelihood
    else:
        # Apply SUMMATION
        value = sum(val_seq)
    return value

# 2. Plotting likelihoods

## 2.1 Normal distribution

In [None]:
# Let's create a sample of size N=50 that is drawn from a normal distribution 
# with mean = 0 and sigma = 1

mu, sigma, N = 0, 1, 50
X_normal = np.random.normal(mu, sigma, N)

In [None]:
# Let's plot the gaussian likelihood on a 3D plane (likelihood value, mu, sigma)
# by defining a range of possible mu and sigma values.
# Use numpy's linspace function to create an array of decimal step values

mu_range = np.linspace(-1, 1, num=100)
sigma_range = np.linspace(0.05, 4.5, num=100)

In [None]:
# Calculate likelihood values

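# 1. Compute the Cartesian product of mu_range and sigma_range.
# itertools.product returns a one-shot iterator, so materialise it as a list;
# otherwise the second comprehension below would have nothing left to iterate over.
mu_sigma_seq = list(itertools.product(mu_range, sigma_range))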

# 2. Apply gaussian_likelihood in a list comprehension over sequence from 1.
# 2.1 Likelihood
likelihood_val_seq = [gaussian_likelihood(gaussian_pdf,
                                          (mu, sigma),
                                          X_normal,
                                          ll=False)
                      for mu, sigma in mu_sigma_seq]

# 2.2. Log-likelihood
log_likelihood_val_seq = [gaussian_likelihood(gaussian_pdf,
                                              (mu, sigma),
                                              X_normal,
                                              ll=True)
                          for mu, sigma in mu_sigma_seq]

In [None]:
# Plot Gaussian likelihood function in 3D


In [None]:
# Plot Gaussian log-likelihood function in 3D


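One possible way to fill in the 3D plot is sketched below. It is a minimal, self-contained example rather than the notebook's own implementation: the sample `X_demo` is a hypothetical stand-in for `X_normal`, the log-likelihood is computed with NumPy broadcasting instead of `gaussian_likelihood`, and the colour map is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in sample; the notebook's X_normal could be used instead
rng = np.random.default_rng(42)
X_demo = rng.normal(0.0, 1.0, 50)

# Same parameter ranges as in the cells above
mu_grid = np.linspace(-1, 1, num=100)
sigma_grid = np.linspace(0.05, 4.5, num=100)

# indexing="ij" gives MU[i, j] = mu_grid[i] and SIGMA[i, j] = sigma_grid[j]
MU, SIGMA = np.meshgrid(mu_grid, sigma_grid, indexing="ij")

# Gaussian log-likelihood at every (mu, sigma) pair via broadcasting
n = X_demo.size
sq_dev = ((X_demo[None, None, :] - MU[..., None]) ** 2).sum(axis=-1)
LL = -0.5 * n * np.log(2 * np.pi * SIGMA**2) - sq_dev / (2 * SIGMA**2)

# Surface plot of the log-likelihood over the (mu, sigma) grid
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(MU, SIGMA, LL, cmap="viridis")
ax.set_xlabel("mu")
ax.set_ylabel("sigma")
ax.set_zlabel("log-likelihood")

# Grid point with the highest log-likelihood
i, j = np.unravel_index(np.argmax(LL), LL.shape)
print(mu_grid[i], sigma_grid[j])
```

The log-likelihood drops very steeply as sigma approaches 0.05, so the region around the maximum may look flat; narrowing `sigma_grid` is one way to make the peak easier to see.
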
## 2.2. Bernoulli distribution
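
Because the Bernoulli likelihood has a single parameter, it can be evaluated over a one-dimensional grid of `p` values. The sketch below is self-contained and illustrative only: the sample `X_bern`, its true `p = 0.3`, and the grid bounds are assumptions made here, and the log-likelihood is written out directly rather than through `bernoulli_likelihood`.

```python
import numpy as np

# Hypothetical sample of 50 Bernoulli draws with true p = 0.3 (illustrative values)
rng = np.random.default_rng(0)
X_bern = rng.binomial(1, 0.3, size=50)

# Keep p strictly inside (0, 1) so log(p) and log(1 - p) stay finite
p_grid = np.linspace(0.01, 0.99, num=99)

# Bernoulli log-likelihood: k successes and (n - k) failures
k = X_bern.sum()
n = X_bern.size
ll_grid = k * np.log(p_grid) + (n - k) * np.log(1 - p_grid)

# The grid maximiser should sit next to the sample proportion
p_best = p_grid[np.argmax(ll_grid)]
print(p_best, X_bern.mean())
```

Passing `p_grid` and `ll_grid` to `plt.plot` would give the corresponding log-likelihood curve.
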

# 3. Numerical optimisation in Python
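- `scipy.optimize` provides minimisers (e.g. `minimize`) but no dedicated maximiser.
- Therefore, to maximise a likelihood we minimise its negative: the negative likelihood or, more commonly, the negative log-likelihood.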
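
The idea can be sketched as follows. This is a minimal example under stated assumptions, not the notebook's own implementation: `X_demo` is a hypothetical sample, `neg_log_likelihood` is a helper name introduced here, `scipy.stats.norm.logpdf` stands in for the hand-written `gaussian_pdf`, and the starting guess, solver, and bounds are arbitrary but reasonable choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical sample drawn from N(0, 1)
rng = np.random.default_rng(42)
X_demo = rng.normal(0.0, 1.0, 50)


def neg_log_likelihood(params, X):
    """Negative Gaussian log-likelihood for params = (mu, sigma)."""
    mu, sigma = params
    return -np.sum(norm.logpdf(X, loc=mu, scale=sigma))


result = minimize(
    neg_log_likelihood,
    x0=np.array([0.5, 2.0]),               # arbitrary starting guess
    args=(X_demo,),
    method="L-BFGS-B",
    bounds=[(None, None), (1e-6, None)],   # keep sigma strictly positive
)

mu_hat, sigma_hat = result.x
print(mu_hat, sigma_hat)

# Closed-form MLE for comparison (np.std uses ddof=0 by default)
print(X_demo.mean(), X_demo.std())
```

The numerical estimates should agree closely with the sample mean and the (biased) sample standard deviation, which are the closed-form maximum likelihood estimates for a Gaussian.
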