
# **2.4 - Maximum Likelihood Estimation (MLE)**

### **2.4.0 - Python Libraries for Maximum Likelihood Estimation (MLE)**

In [None]:
import numpy as np, pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from scipy.optimize import minimize
import scipy.stats as stats
import pymc3 as pm3
import numdifftools as ndt
import statsmodels.api as sm
from statsmodels.base.model import GenericLikelihoodModel

### **2.4.1 - MLE for Random Samples**

##### Definition 2.4.1.1 - Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE) is a method for estimating the parameters of a probability distribution by maximizing a likelihood function, which measures how probable the observed data are under each candidate parameter value. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate.
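
As a minimal sketch of this idea, the block below evaluates a likelihood function on a grid of candidate parameter values and picks the maximizer. The coin-flip data, the Bernoulli model, and the helper name `bernoulli_likelihood` are assumptions made only for illustration; they do not come from the course notes.

```python
import numpy as np

# Hypothetical data: 10 coin flips (1 = heads), chosen only for illustration.
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def bernoulli_likelihood(p, data):
    """Likelihood of success probability p for independent 0/1 observations."""
    heads = data.sum()
    tails = data.size - heads
    return p**heads * (1 - p)**tails

# Evaluate the likelihood on a grid over the parameter space (0, 1).
grid = np.linspace(0.001, 0.999, 999)
likelihoods = bernoulli_likelihood(grid, flips)

# The grid point with the largest likelihood approximates the MLE.
p_hat = grid[np.argmax(likelihoods)]
print(f"grid-search MLE: {p_hat:.3f}, sample proportion: {flips.mean():.3f}")
```

For this model the maximizer should coincide with the sample proportion of heads, up to the grid resolution. A grid search is used here only because it makes the "point that maximizes the likelihood" visible; it does not scale to several parameters.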

##### Definition 2.4.1.2 - Maximum Likelihood Estimates

Let $X_1, X_2, \ldots, X_n$ have a joint PMF (discrete) or PDF (continuous) of
\begin{equation*}
  f(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_m)
\end{equation*}
where the parameters $\theta_1, \theta_2, \ldots, \theta_m$ have unknown values. (The number of parameters $m$ need not equal the sample size $n$.) When $x_1, x_2, \ldots, x_n$ are the observed sample values and $f(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_m)$ is regarded as a function of $\theta_1, \theta_2, \ldots, \theta_m$, this function is called the likelihood function. The maximum likelihood estimates (MLEs) $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m$ are those values of the $\theta_i$'s that maximize the likelihood function, so that
\begin{equation*}
  f(x_1, x_2, \ldots, x_n; \hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m) \geq f(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_m)
\end{equation*}
for all $\theta_1, \theta_2, \ldots, \theta_m$. The maximum likelihood estimators are obtained by substituting the random variables $X_1, X_2, \ldots, X_n$ for the observed values $x_1, x_2, \ldots, x_n$.
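
In practice the maximization is often done numerically, and on the log scale, since the logarithm is increasing and turns products into sums. The sketch below assumes an independent sample from an exponential distribution with a single unknown rate parameter, so the joint PDF factors as $\prod_i \lambda e^{-\lambda x_i}$. The simulated data, the true rate of 2.0, and the function name `neg_log_likelihood` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Simulated sample: the true rate of 2.0 is an assumption for this illustration.
sample = rng.exponential(scale=1 / 2.0, size=500)

def neg_log_likelihood(params, x):
    """Negative log of the joint PDF prod_i lam * exp(-lam * x_i)."""
    lam = params[0]
    return -(x.size * np.log(lam) - lam * x.sum())

# Minimizing the negative log-likelihood is equivalent to maximizing the likelihood.
# The lower bound keeps the rate strictly positive so that log(lam) is defined.
result = minimize(neg_log_likelihood, x0=[1.0], args=(sample,),
                  method="L-BFGS-B", bounds=[(1e-6, None)])
lam_hat = result.x[0]

# For this model the maximizer is known in closed form: 1 / sample mean.
print(f"numerical MLE: {lam_hat:.4f}, closed form 1/mean: {1 / sample.mean():.4f}")
```

The two printed values should agree to within the optimizer's tolerance, which gives a quick sanity check on the numerical routine before applying it to models without a closed-form solution.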

##### Definition 2.4.1.3 - Maximum Likelihood Estimates in a Normal Distribution

Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then, the likelihood function is
\begin{equation*}
  f(x_1, x_2, \ldots, x_n; \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x_1 - \mu)^2}{2\sigma^2}}\right)\left(\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x_2 - \mu)^2}{2\sigma^2}}\right)\cdots\left(\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x_n - \mu)^2}{2\sigma^2}}\right)
\end{equation*}
\begin{equation*}
  f(x_1, x_2, \ldots, x_n; \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^ne^{-\displaystyle\sum_{i = 1}^n\frac{(x_i - \mu)^2}{2\sigma^2}}
\end{equation*}
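Because the logarithm is an increasing function, maximizing the likelihood is equivalent to maximizing the log-likelihood
\begin{equation*}
  \ln f(x_1, x_2, \ldots, x_n; \mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^n(x_i - \mu)^2
\end{equation*}
Setting the partial derivatives of this function with respect to $\mu$ and $\sigma^2$ equal to zero and solving gives the maximum likelihood estimators
\begin{equation*}
  \hat{\mu} = \overline{X}, \quad \hat{\sigma}^2 = \frac{\displaystyle\sum_{i = 1}^n(X_i - \overline{X})^2}{n}
\end{equation*}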
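
The closed-form normal estimates can be checked against a numerical maximization. The following is a sketch under stated assumptions: the data are simulated with assumed values $\mu = 5$ and $\sigma = 2$, and the optimizer works with $\log\sigma$ (a reparameterization chosen here, not taken from the notes) so that the variance stays positive without explicit constraints.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
# Simulated sample; mu = 5 and sigma = 2 are assumed values for illustration.
x = rng.normal(loc=5.0, scale=2.0, size=1000)

def neg_log_likelihood(params, data):
    """Negative normal log-likelihood, parameterized by (mu, log sigma)."""
    mu, log_sigma = params
    sigma2 = np.exp(2 * log_sigma)  # always positive
    n = data.size
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((data - mu) ** 2) / (2 * sigma2)

# Numerical MLE via the default quasi-Newton method for unconstrained problems.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
mu_hat = result.x[0]
sigma2_hat = np.exp(2 * result.x[1])

# Closed-form MLEs: sample mean and the average squared deviation (divide by n).
mu_closed = x.mean()
sigma2_closed = np.mean((x - mu_closed) ** 2)  # same as np.var(x) with ddof=0

print(f"mu:      numerical {mu_hat:.4f}, closed form {mu_closed:.4f}")
print(f"sigma^2: numerical {sigma2_hat:.4f}, closed form {sigma2_closed:.4f}")
```

Note that the variance MLE divides by $n$, which matches `np.var(x)` with its default `ddof=0`; the usual unbiased sample variance divides by $n - 1$ instead (`ddof=1`).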

### **2.4.2 - MLE for Linear Regression**

### **2.4.3 - Examples of MLE**
