In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# Gaussian discriminant analysis

## 1.2 The Gaussian Discriminant Analysis model

<font size=4>
&emsp;&emsp;When we have a classification problem in which the input features $x$ are continuous-valued random variables, we can then use the Gaussian Discriminant Analysis (GDA) model, which models $p(x\,|\,y)$ using a multivariate normal distribution. The model is:
</font>

<font size=4>
$$
\begin{align}
y&\sim\mathrm{Bernoulli}(\phi) \\
x\,|\,y=0&\sim\mathcal{N}(\mu_0,\Sigma) \\
x\,|\,y=1&\sim\mathcal{N}(\mu_1,\Sigma) 
\end{align}
$$
</font>
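
<font size=4>
&emsp;&emsp;To make the generative story concrete, the following is a minimal sketch that draws samples from this model with NumPy. The function name `sample_gda` and the parameter values are illustrative assumptions, not part of the original notes.
</font>

```python
import numpy as np

def sample_gda(m, phi, mu0, mu1, sigma, seed=0):
    """Draw m pairs (x, y) from the GDA model (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # y ~ Bernoulli(phi)
    y = (rng.random(m) < phi).astype(int)
    # x | y ~ N(mu_y, Sigma): select the class mean for every sample,
    # then add zero-mean noise with the shared covariance Sigma
    means = np.where(y[:, None] == 1, mu1, mu0)
    noise = rng.multivariate_normal(np.zeros(len(mu0)), sigma, size=m)
    return means + noise, y

# Illustrative parameter values (assumed for this example only)
mu0 = np.array([0.0, 0.0])
mu1 = np.array([3.0, 3.0])
sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
X, y = sample_gda(2000, 0.4, mu0, mu1, sigma)
```

<font size=4>
&emsp;&emsp;Both classes share the same $\Sigma$, so the two point clouds have the same shape and orientation and differ only in their centers.
</font>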

<font size=4>
&emsp;&emsp;Writing out the distribution, this is:
</font>

<font size=4>
$$
\begin{align}
p(y) &= \phi^y(1-\phi)^{1-y} \\
p(x\,|\,y=0) &= \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\,\exp\left( -\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0) \right) \\
p(x\,|\,y=1) &= \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\,\exp\left( -\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1) \right)
\end{align}
$$
</font>
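
<font size=4>
&emsp;&emsp;These densities can be evaluated directly. The sketch below assumes $x$ and $\mu$ are 1-D arrays of length $n$ and that $\Sigma$ is invertible; the helper names `gaussian_pdf` and `bernoulli_pmf` are hypothetical.
</font>

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at one point x (1-D array of length n)."""
    n = mu.shape[0]
    diff = x - mu
    # (x - mu)^T Sigma^{-1} (x - mu), via a linear solve instead of an explicit inverse
    maha = diff @ np.linalg.solve(sigma, diff)
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm_const

def bernoulli_pmf(y, phi):
    """p(y) = phi^y (1 - phi)^(1 - y) for y in {0, 1}."""
    return phi ** y * (1 - phi) ** (1 - y)

# With n = 2 and Sigma = I, the density at the mean is 1 / (2 * pi)
p_at_mean = gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```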

<font size=4>
&emsp;&emsp;Here, the parameters of our model are $\phi,\Sigma,\mu_0,\mu_1$. (Note that while there are two different mean vectors $\mu_0$ and $\mu_1$, this model is usually applied using only one covariance matrix $\Sigma$.) The log-likelihood of the data is given by
</font>

<font size=4>
$$
\begin{align}
l(\phi,\mu_0,\mu_1,\Sigma)
&= \log\,\prod^m_{i=1}p(x^{(i)},y^{(i)};\phi,\mu_0,\mu_1,\Sigma) \\
&= \log\,\prod^m_{i=1}p(x^{(i)}\,|\,y^{(i)};\mu_0,\mu_1,\Sigma)\,p(y^{(i)};\phi)
\end{align}
$$
</font>
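
<font size=4>
&emsp;&emsp;Taking the logarithm turns the product into a sum over samples, which is also how the log-likelihood is best computed numerically. The sketch below assumes `X` has shape $(m, n)$, `y` is a 0/1 integer array, and $0<\phi<1$; the function name `gda_log_likelihood` is hypothetical.
</font>

```python
import numpy as np

def gda_log_likelihood(X, y, phi, mu0, mu1, sigma):
    """Sum over i of log p(x_i | y_i) + log p(y_i) under the GDA model."""
    m, n = X.shape
    # Row i holds the mean of the class that sample i belongs to
    mus = np.where(y[:, None] == 1, mu1, mu0)
    diff = X - mus
    # Squared Mahalanobis distance of every sample to its class mean
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    log_px_given_y = -0.5 * (n * np.log(2 * np.pi) + logdet + maha)
    log_py = y * np.log(phi) + (1 - y) * np.log(1 - phi)
    return np.sum(log_px_given_y + log_py)

# Tiny example: each point sits exactly on its class mean
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1])
ll = gda_log_likelihood(X, y, 0.5, np.zeros(2), np.ones(2), np.eye(2))
```

<font size=4>
&emsp;&emsp;In this example each sample contributes $-\log(2\pi)+\log 0.5=-\log(4\pi)$, so the total is $-2\log(4\pi)$.
</font>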

<font size=4>
&emsp;&emsp;By maximizing $l$ with respect to the parameters, we find the maximum likelihood estimates of the parameters to be:
</font>

<font size=4>
$$
\begin{align}
\phi &= \frac{1}{m}\sum^{m}_{i=1}1\{y^{(i)}=1 \} \\
\mu_0 &= \frac{\sum^{m}_{i=1}1\{y^{(i)}=0\}x^{(i)}}{\sum^{m}_{i=1}1\{y^{(i)}=0\}} \\
\mu_1 &= \frac{\sum^{m}_{i=1}1\{y^{(i)}=1\}x^{(i)}}{\sum^{m}_{i=1}1\{y^{(i)}=1\}} \\
\Sigma &= \frac{1}{m}\sum^{m}_{i=1}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T
\end{align}
$$
</font>
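
<font size=4>
&emsp;&emsp;The four estimates translate almost line by line into NumPy. This is a minimal sketch, assuming `X` has shape $(m, n)$, `y` is a 0/1 array, and both classes appear in the data; `fit_gda` is a hypothetical name.
</font>

```python
import numpy as np

def fit_gda(X, y):
    """Maximum likelihood estimates (phi, mu0, mu1, Sigma) for GDA."""
    m = X.shape[0]
    phi = np.mean(y == 1)                 # fraction of samples with y = 1
    mu0 = X[y == 0].mean(axis=0)          # mean of the class-0 samples
    mu1 = X[y == 1].mean(axis=0)          # mean of the class-1 samples
    # Center every sample on its own class mean, then pool the outer products
    diff = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = diff.T @ diff / m             # note the 1/m factor, not 1/(m - 1)
    return phi, mu0, mu1, sigma

# Small hand-checkable data set (illustrative values)
X = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 2.0]])
y = np.array([0, 0, 1, 1])
phi, mu0, mu1, sigma = fit_gda(X, y)
```

<font size=4>
&emsp;&emsp;For this data set the class means are $(1,1)$ and $(5,3)$, the centered samples are $(\mp1,\mp1)$ and $(\mp1,\pm1)$, and the pooled covariance is the identity matrix.
</font>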

## 1.3 Detailed derivation

### 1.3.1 Preliminaries

<font size=4>
&emsp;&emsp;The derivation of Gaussian discriminant analysis relies on the following four identities:
</font>

<font size=4>
$$
\triangledown_xx^TAx=2Ax,\quad\text{where } A \text{ is a symmetric matrix}\quad(1)
$$
</font>

<font size=4>
$$
\triangledown_A\big|\,A\,\big|=\big|\,A\,\big|\,(A^{-1})^T\quad(2)
$$
</font>

<font size=4>
$$
\triangledown_A\log\,\big|\,A\,\big|=A^{-1},\quad\text{where } A \text{ is a positive definite matrix}\quad(3)
$$
</font>

<font size=4>
$$
\triangledown_Ax^TAx=xx^T,\quad\text{where } A \text{ is a symmetric matrix}\quad(4)
$$
</font>

<font size=4>
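&emsp;&emsp;For Eq. (1): $x^TAx$ is a quadratic form, and for a general square matrix $\triangledown_xx^TAx=(A+A^T)x$. Since $A$ is symmetric, $A^T=A$, and therefore $\triangledown_xx^TAx=2Ax$.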
</font>
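
<font size=4>
&emsp;&emsp;Eq. (1) can be sanity-checked numerically. The sketch below compares a central-difference gradient with $2Ax$ for a random symmetric $A$, and with $(A+A^T)x$ for a general square matrix. The helper `numerical_grad` is hypothetical, and the random test matrices are assumptions of this example.
</font>

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at the array x."""
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x, dtype=float)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))   # general (non-symmetric) matrix
A = B + B.T                   # symmetric matrix
x = rng.normal(size=3)

# Symmetric case: gradient should equal 2 A x
grad_num = numerical_grad(lambda v: v @ A @ v, x)
grad_sym = 2 * A @ x

# General case: gradient should equal (B + B^T) x
grad_gen_num = numerical_grad(lambda v: v @ B @ v, x)
grad_gen = (B + B.T) @ x
```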

<font size=4>
Proof of Eq. (2):
</font>

<font size=4>
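&emsp;&emsp;From the cofactor expansion of the determinant along column $j$, where $A_{\backslash{i},\backslash{j}}$ denotes $A$ with row $i$ and column $j$ removed,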
</font>

<font size=4>
$$
\big|\,A\,\big|=\sum^n_{i=1}(-1)^{i+j}A_{ij}\big|\,A_{\backslash{i},\backslash{j}}\big| \quad(\text{for any } j\in\{1,\cdots,n\})
$$
</font>

<font size=4>
&emsp;&emsp;Taking $j=l$, so that only the $i=k$ term depends on $A_{kl}$, we obtain
</font>

<font size=4>
$$
\frac{\partial}{\partial{A_{kl}}}\big|\,A\,\big|=\frac{\partial}{\partial{A_{kl}}}\sum^n_{i=1}(-1)^{i+j}A_{ij}\big|\,A_{\backslash{i},\backslash{j}}\big|=(-1)^{k+l}\big|\,A_{\backslash{k},\backslash{l}}\big|=(adj(A))_{lk}
$$
</font>

<font size=4>
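&emsp;&emsp;where adj(A) denotes the adjugate of the matrix A, which satisfies $adj(A)=\big|\,A\,\big|\,A^{-1}$ for invertible A. Therefore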
</font>

<font size=4>
$$
\triangledown_A\big|\,A\,\big|=(adj(A))^T=\big|\,A\,\big|\,(A^{-1})^T
$$
</font>
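
<font size=4>
&emsp;&emsp;The two ingredients of this proof can be checked on a small example: the cofactor expansion reproduces $|A|$ for every column, and the matrix of cofactors equals $|A|\,(A^{-1})^T$. The helper `cofactor_matrix` and the $3\times3$ matrix below are illustrative assumptions.
</font>

```python
import numpy as np

def cofactor_matrix(A):
    """C[k, l] = (-1)^(k+l) * det(A with row k and column l removed)."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for k in range(n):
        for l in range(n):
            minor = np.delete(np.delete(A, k, axis=0), l, axis=1)
            C[k, l] = (-1) ** (k + l) * np.linalg.det(minor)
    return C

# Deliberately non-symmetric, invertible example matrix with |A| = 25
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])
C = cofactor_matrix(A)
det_A = np.linalg.det(A)

# Cofactor expansion along each column j: sum_i A[i, j] * C[i, j]
expansions = (A * C).sum(axis=0)

# Closed form of the gradient from Eq. (2)
grad_formula = det_A * np.linalg.inv(A).T
```

<font size=4>
&emsp;&emsp;Because the example matrix is not symmetric, the transpose in Eq. (2) matters here: comparing `C` with `det_A * np.linalg.inv(A)` instead would not match.
</font>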

<font size=4>
Proof of Eq. (3):
</font>

<font size=4>
&emsp;&emsp;Since the matrix A is positive definite, $\big|\,A\,\big|>0$, so $\log\,\big|\,A\,\big|$ is well defined. From
</font>

<font size=4>
$$
\frac{\partial\,\log\,\big|\,A\,\big|}{\partial\,A_{ij}}=\frac{\partial\,\log\,\big|\,A\,\big|}{\partial\,\big|\,A\,\big|}\frac{\partial\,\big|\,A\,\big|}{\partial\,A_{ij}}=\frac{1}{\big|\,A\,\big|}\frac{\partial\,\big|\,A\,\big|}{\partial\,A_{ij}}
$$
</font>

<font size=4>
&emsp;&emsp;together with Eq. (2), we obtain
</font>

<font size=4>
$$
\triangledown_A\log\,\big|\,A\,\big|=\frac{1}{\big|\,A\,\big|}\triangledown_A\,\big|\,A\,\big|=(A^{-1})^T=A^{-1}
$$
</font>

<font size=4>
&emsp;&emsp;Because the matrix A is symmetric, $(A^{-1})^T=A^{-1}$, so the final result carries no transpose.
</font>
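
<font size=4>
&emsp;&emsp;Eq. (3) can also be checked by finite differences. The sketch below treats every entry of $A$ as an independent variable, which is the convention used in the proof. The helper `numerical_grad` and the example matrix are assumptions of this example.
</font>

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at the array x."""
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x, dtype=float)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Symmetric, strictly diagonally dominant with positive diagonal,
# hence positive definite
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# slogdet returns (sign, log|det|); for positive definite A the sign is +1
log_det = lambda M: np.linalg.slogdet(M)[1]

grad_num = numerical_grad(log_det, A)
grad_formula = np.linalg.inv(A)
```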

<font size=4>
Proof of Eq. (4):
</font>

<font size=4>
&emsp;&emsp;From
</font>

<font size=4>
$$
\frac{\partial\,(x^TAx)}{\partial\,A_{lk}}=\frac{\partial}{\partial\,A_{lk}}\sum_i\sum_jA_{ij}x_ix_j=x_lx_k
$$
</font>

<font size=4>
&emsp;&emsp;we obtain
</font>

<font size=4>
$$
\triangledown_Ax^TAx=xx^T
$$
</font>
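
<font size=4>
&emsp;&emsp;Since $x^TAx$ is linear in the entries of $A$, the partial derivative with respect to $A_{lk}$ equals the change in $x^TAx$ when $A_{lk}$ is increased by one. The sketch below uses this to rebuild the gradient entry by entry and compare it with $xx^T$. The random $x$ and $A$ are assumptions of this example.
</font>

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
A = rng.normal(size=(3, 3))   # entries are treated as independent variables

f = lambda M: x @ M @ x

grad = np.zeros((3, 3))
for l in range(3):
    for k in range(3):
        E = np.zeros((3, 3))
        E[l, k] = 1.0                  # unit perturbation of entry (l, k)
        grad[l, k] = f(A + E) - f(A)   # exact up to rounding, f is linear in A

outer = np.outer(x, x)                 # x x^T
```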

### 1.3.2 Derivation
