# Modern Data Science 
**(Module 03: Pattern Classification)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use, change and distribute this package.

Prepared by and for 
**Student Members** |
2006-2018 [TULIP Lab](http://www.tulip.org.au), Australia

---


# Session B - Decision Boundaries


### Bayes' Rule:


$P(\omega_j|x) = \frac{p(x|\omega_j) \cdot P(\omega_j)}{p(x)}, \quad \text{where } p(x) = \sum_k p(x|\omega_k) \cdot P(\omega_k)$
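A minimal numeric sketch of Bayes' rule for a single observation $x$ and two classes. The likelihood and prior values are made up for illustration.

```python
import numpy as np

# Hypothetical class-conditional likelihoods p(x | w_j) evaluated at one observation x
likelihoods = np.array([0.30, 0.10])
# Hypothetical priors P(w_j); they must sum to 1
priors = np.array([0.40, 0.60])

# Evidence p(x) = sum_j p(x | w_j) P(w_j)
evidence = np.sum(likelihoods * priors)

# Posterior P(w_j | x) = p(x | w_j) P(w_j) / p(x)
posteriors = likelihoods * priors / evidence
print(evidence, posteriors)
```

Here the evidence is 0.18 and the posteriors are 2/3 and 1/3, which sum to 1 as they must.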

### Discriminant Functions:

We assign $\vec{x}$ to the class whose discriminant function is largest. Defining the discriminant function as the posterior probability gives a **minimum-error classification** (the Bayes classifier).
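A small sketch of a minimum-error (Bayes) classifier on a hypothetical 1D, two-class problem with Gaussian class-conditional densities; the means, standard deviation and priors are illustrative assumptions. It also shows that dividing by $p(x)$ or taking logarithms does not change which class wins, which is what the derivation below relies on.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of the 1D normal distribution N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Hypothetical two-class problem: class means, a shared standard deviation and the priors
means = np.array([0.0, 2.0])
sigma = 1.0
priors = np.array([0.5, 0.5])

def classify_posterior(x):
    """Index of the class with the largest posterior P(w_j | x)."""
    joint = gaussian_pdf(x, means, sigma) * priors   # p(x | w_j) P(w_j)
    posteriors = joint / joint.sum()                 # divide by the evidence p(x)
    return int(np.argmax(posteriors))

def classify_log(x):
    """Same decision from g_j(x) = ln p(x | w_j) + ln P(w_j); p(x) and ln leave the argmax unchanged."""
    g = np.log(gaussian_pdf(x, means, sigma)) + np.log(priors)
    return int(np.argmax(g))

for x in [-1.0, 0.9, 1.1, 3.0]:
    print(x, classify_posterior(x), classify_log(x))
```

With equal priors and equal variances the boundary sits halfway between the means (at $x = 1$), so both functions assign $-1.0$ and $0.9$ to class 0 and $1.1$ and $3.0$ to class 1.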

$g_1(\vec{x}) = P(\omega_1 | \; \vec{x}), \quad  g_2(\vec{x}) = P(\omega_2 | \; \vec{x})$

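The evidence $p(\vec{x})$ is the same for both classes, so it does not change which discriminant is larger and can be dropped. Taking the natural logarithm, which is monotonically increasing, also leaves the decision unchanged:

$$\Rightarrow \quad \begin{aligned}
g_1(\vec{x}) &= \ln\big(p(\vec{x}\,|\,\omega_1)\big) + \ln\big(P(\omega_1)\big) \\
g_2(\vec{x}) &= \ln\big(p(\vec{x}\,|\,\omega_2)\big) + \ln\big(P(\omega_2)\big)
\end{aligned}$$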

<br>
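Since the priors are equal in this example ($P(\omega_1) = P(\omega_2)$), the prior terms are identical for both classes and can be dropped:

$$\Rightarrow \quad \begin{aligned}
g_1(\vec{x}) &= \ln\big(p(\vec{x}\,|\,\omega_1)\big) \\
g_2(\vec{x}) &= \ln\big(p(\vec{x}\,|\,\omega_2)\big)
\end{aligned}$$

Both class-conditional densities are Gaussian with covariance $\Sigma = \sigma^2 I$, where $\sigma^2 = 1$, $\vec{\mu}_1 = [0 \;\; 0]^t$ and $\vec{\mu}_2 = [1 \;\; 1]^t$. Dropping the normalization constant, which is the same for both classes, gives $\ln p(\vec{x}\,|\,\omega_i) = -\frac{1}{2\sigma^2}(\vec{x}-\vec{\mu}_i)^t(\vec{x}-\vec{\mu}_i)$, so

$$\begin{aligned}
\Rightarrow g_1(\vec{x}) &= -\frac{1}{2\sigma^2} \bigg[\, \vec{x}^{\,t}\vec{x} - 2\, \vec{\mu}_1^{\,t} \vec{x} + \vec{\mu}_1^{\,t}\vec{\mu}_1 \bigg] \\
&= -\frac{1}{2} \bigg[\, \vec{x}^{\,t} \vec{x} - 2 \; [0 \;\; 0] \; \vec{x} + [0 \;\; 0] \begin{bmatrix} 0 \\ 0 \end{bmatrix} \bigg] \\
&= -\frac{1}{2} \vec{x}^{\,t} \vec{x}
\end{aligned}$$

$$\begin{aligned}
\Rightarrow g_2(\vec{x}) &= -\frac{1}{2\sigma^2} \bigg[\, \vec{x}^{\,t}\vec{x} - 2\, \vec{\mu}_2^{\,t} \vec{x} + \vec{\mu}_2^{\,t}\vec{\mu}_2 \bigg] \\
&= -\frac{1}{2} \bigg[\, \vec{x}^{\,t} \vec{x} - 2 \; [1 \;\; 1] \; \vec{x} + [1 \;\; 1] \begin{bmatrix} 1 \\ 1 \end{bmatrix} \bigg] \\
&= -\frac{1}{2} \bigg[\, \vec{x}^{\,t} \vec{x} - 2 \; [1 \;\; 1] \; \vec{x} + 2 \bigg]
\end{aligned}$$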
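A numerical sanity check of the two results above, assuming the values used in the derivation ($\sigma^2 = 1$, $\vec{\mu}_1 = [0\;\;0]^t$, $\vec{\mu}_2 = [1\;\;1]^t$). Each simplified $g_i$ differs from the full 2D Gaussian log-density $\ln \mathcal{N}(\vec{x};\,\vec{\mu}_i, I)$ only by the constant $\ln(2\pi)$, which is the same for both classes and therefore cannot affect the decision.

```python
import numpy as np

# Values from the worked example: sigma^2 = 1, mu_1 = [0, 0], mu_2 = [1, 1]
mu1 = np.array([0.0, 0.0])
mu2 = np.array([1.0, 1.0])

def log_gaussian(x, mu):
    """Full log-density of the 2D normal N(mu, I): -1/2 |x - mu|^2 - ln(2 pi)."""
    d = x - mu
    return -0.5 * d @ d - np.log(2 * np.pi)

def g1(x):
    # g_1(x) = -1/2 x^t x
    return -0.5 * x @ x

def g2(x):
    # g_2(x) = -1/2 [x^t x - 2 [1 1] x + 2]
    return -0.5 * (x @ x - 2 * np.sum(x) + 2)

rng = np.random.default_rng(0)
for x in rng.normal(size=(5, 2)):
    # Both differences print the same constant, ln(2 pi) ~ 1.8379
    print(g1(x) - log_gaussian(x, mu1), g2(x) - log_gaussian(x, mu2))
```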

### Decision Boundary

$g_1(\vec{x}) = g_2(\vec{x})$ 

$\Rightarrow  -\frac{1}{2} \vec{x}^{\,t} \vec{x} = -\frac{1}{2} \; \bigg[ \; \vec{x}^{\,t} \vec{x} - 2\;  [1 \;\; 1] \;\; \vec{x} + 2\; \bigg] \;$ 

$\Rightarrow -2[1\;\; 1] \vec{x} + 2 = 0$

$\Rightarrow [-2\;\; -2] \;\;\vec{x} + 2 = 0$

$\Rightarrow -2x_1 - 2x_2 + 2 = 0$

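$\Rightarrow -x_1 - x_2 + 1 = 0 \quad \Longleftrightarrow \quad x_1 + x_2 = 1$

The decision boundary is the line $x_1 + x_2 = 1$: the perpendicular bisector of the segment joining the two class means $[0 \;\; 0]^t$ and $[1 \;\; 1]^t$.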
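A quick check of the boundary under the same assumptions as the derivation ($\sigma^2 = 1$, means $[0\;\;0]^t$ and $[1\;\;1]^t$, equal priors): points on $x_1 + x_2 = 1$ give $g_1 = g_2$, while each mean lies on its own class's side.

```python
import numpy as np

def g1(x):
    # g_1(x) = -1/2 x^t x  (mu_1 = [0, 0], sigma^2 = 1)
    return -0.5 * x @ x

def g2(x):
    # g_2(x) = -1/2 [x^t x - 2 [1 1] x + 2]  (mu_2 = [1, 1], sigma^2 = 1)
    return -0.5 * (x @ x - 2 * np.sum(x) + 2)

# Points on the line x1 + x2 = 1: the difference g1 - g2 is zero up to rounding
for x1 in np.linspace(-3, 3, 7):
    x = np.array([x1, 1 - x1])
    print(x, g1(x) - g2(x))

origin = np.array([0.0, 0.0])
ones = np.array([1.0, 1.0])
print(g1(origin) > g2(origin))  # True: the origin is assigned to class 1
print(g2(ones) > g1(ones))      # True: [1, 1] is assigned to class 2
```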

# Linear Discriminant Analysis
In this part, we apply linear discriminant analysis (LDA) to a synthetic data set: we generate the data ourselves and then fit a linear classifier to it.

### Step 1: Create the data set

We are going to sample 500 points from each of three 2D Gaussian distributions. The means of the three Gaussians are $\mu_1 = [a, b]^T$, $\mu_2 = [a+2, b+4]^T$ and $\mu_3 = [a+4, b]^T$ respectively, where **a** is *the last digit of your roll number* and **b** is *the second-last digit of your roll number*. <br>
All three covariance matrices are the identity: $\Sigma_1 = \Sigma_2 = \Sigma_3 = I$. <br>
To generate points from 2D Gaussians, we first need to know how to generate random numbers.

##### How to generate random numbers?
Use the `numpy.random` module.

In [None]:
%matplotlib inline
# Sample a random number uniformly from [0, 1)
# Try running this cell several times by pressing Ctrl-Enter
import numpy as np
import matplotlib.pyplot as plt

print(np.random.random())
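The cell above uses NumPy's legacy global random state. NumPy 1.17 and later also provide `np.random.default_rng`, which returns a `Generator` object; seeding it makes the results reproducible, which helps when comparing your plots with someone else's. A minimal sketch (the seed 42 is arbitrary):

```python
import numpy as np

# A Generator seeded with a fixed value produces the same sequence on every run
rng = np.random.default_rng(seed=42)
first = rng.random()          # one float drawn uniformly from [0, 1)
batch = rng.random(3)         # an array of three such floats
print(first, batch)

# Re-creating the generator with the same seed reproduces the first draw
rng_again = np.random.default_rng(seed=42)
print(rng_again.random() == first)
```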

##### How to sample from a gaussian?
Use `np.random.randn()` to draw a sample from the standard 1D Gaussian (mean 0, variance 1).

In [None]:
print(np.random.randn())

##### Let's sample 1000 points!
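Use `np.random.normal(mu, sigma, size)`, where `sigma` is the standard deviation (not the variance) and `size` is the number of points. Here we assume the mean is 3 and the standard deviation is 1.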

In [None]:
points = np.random.normal(3, 1, 1000)
# A histogram plot. It looks like a gaussian distribution centered around 3
plt.hist(points)
plt.show()
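To check that the second argument is a standard deviation rather than a variance, a quick sketch with `scale=2.0` (an illustrative value, not the one used above): the sample standard deviation comes out close to 2, not close to $\sqrt{2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
# loc is the mean and scale is the standard deviation (not the variance)
samples = rng.normal(loc=3.0, scale=2.0, size=100_000)
print(samples.mean(), samples.std())   # close to 3 and 2 respectively
```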

##### Generate samples from a 2D gaussian
Use `np.random.multivariate_normal(mean, cov, 100)` to generate 100 points from a 2D Gaussian with mean vector `mean` and covariance matrix `cov`.

In [None]:
mean = np.array([3, 3])
cov = np.eye(2) # the identity matrix

points = np.random.multivariate_normal(mean, cov, 100)
# scatter plot with x axis as the first column of points and y axis as the second column
plt.scatter(points[:, 0], points[:, 1])
plt.show()
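One way to see what a multivariate normal sampler computes: if $z \sim \mathcal{N}(0, I)$ and $LL^T = \Sigma$ is the Cholesky factorization, then $\mu + Lz \sim \mathcal{N}(\mu, \Sigma)$. This is an equivalent construction, not necessarily NumPy's internal code (NumPy may factorize the covariance differently). The correlated covariance below is an illustrative choice, not the one used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(1)
mean = np.array([3.0, 3.0])
# Hypothetical covariance with correlation between the two coordinates
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])

L = np.linalg.cholesky(cov)                 # lower-triangular, L @ L.T == cov
z = rng.standard_normal(size=(50_000, 2))   # rows are independent N(0, I) samples
points = mean + z @ L.T                     # each row is mean + L z

print(points.mean(axis=0))                  # close to mean
print(np.cov(points, rowvar=False))         # close to cov
```

With `cov = np.eye(2)` the Cholesky factor is the identity, so the samples are just `mean + z`, which is why the scatter plot above is a round cloud.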

#### Sample from three different 2D Gaussians
The means and covariances are those given in Step 1. The code below uses $a = 7$ and $b = 5$, which gives $\mu_1 = [7, 5]^T$, $\mu_2 = [9, 9]^T$ and $\mu_3 = [11, 5]^T$; replace these with the digits of your own roll number. As before, $\Sigma_1 = \Sigma_2 = \Sigma_3 = I$. <br>

In [None]:
#covariance matrix for all 3 distributions
cov = np.eye(2)

d1 = np.random.multivariate_normal([7, 5], cov, 500)
d2 = np.random.multivariate_normal([9, 9], cov, 500)
d3 = np.random.multivariate_normal([11, 5], cov, 500)

data = np.vstack([d1, d2, d3])
plt.scatter(d1[:, 0], d1[:, 1], color='red')
plt.scatter(d2[:, 0], d2[:, 1], color='blue')
plt.scatter(d3[:, 0], d3[:, 1], color='green')
plt.show()
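The three means depend on your roll-number digits. A small helper (hypothetical, not part of the original notebook) that builds the means from `a` and `b` and returns the data together with class labels, so the sampling cell does not need hard-coded numbers:

```python
import numpy as np

def class_means(a, b):
    """Means mu_1 = [a, b], mu_2 = [a+2, b+4], mu_3 = [a+4, b] from Step 1."""
    return np.array([[a, b], [a + 2, b + 4], [a + 4, b]], dtype=float)

def sample_classes(a, b, n=500, seed=None):
    """Draw n points from N(mu_i, I) for each class; return data (3n x 2) and labels (3n,)."""
    rng = np.random.default_rng(seed)
    means = class_means(a, b)
    data = np.vstack([rng.multivariate_normal(mu, np.eye(2), n) for mu in means])
    labels = np.repeat(np.arange(3), n)   # 0, 1, 2 stand for classes 1, 2, 3
    return data, labels

print(class_means(7, 5))                  # rows [7, 5], [9, 9], [11, 5], as in the cell above
data, labels = sample_classes(7, 5, seed=0)
print(data.shape, labels.shape)
```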

### Step 2: Estimate the Parameters
##### Estimate 3 means and a covariance matrix from data
We have assumed that $\Sigma = \sigma^2 I$. <br>
Convince yourself that the Maximum Likelihood Estimate for $\sigma^2$ is $\frac{1}{2n}\sum\limits_{i=1}^n (x_i-\mu)^T(x_i-\mu)$, where $n$ is the number of samples. <br>
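A sketch of the standard derivation, with $d$ the dimension of the data ($d = 2$ here) and $\mu$ held fixed (in practice it is replaced by its own MLE, the sample mean). Set the derivative of the log-likelihood with respect to $\sigma^2$ to zero:

$$\begin{aligned}
\ln L(\sigma^2) &= \sum_{i=1}^n \ln \mathcal{N}(x_i;\, \mu,\, \sigma^2 I) = -\frac{nd}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^T(x_i-\mu) \\
\frac{\partial \ln L}{\partial \sigma^2} &= -\frac{nd}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i-\mu)^T(x_i-\mu) = 0 \\
\Rightarrow \hat\sigma^2 &= \frac{1}{nd}\sum_{i=1}^n (x_i-\mu)^T(x_i-\mu) = \frac{1}{2n}\sum_{i=1}^n (x_i-\mu)^T(x_i-\mu) \quad \text{for } d = 2
\end{aligned}$$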

Let's compute the maximum likelihood estimates separately for the three sets of data points (generated from 3 different Gaussians), denote them $\hat\sigma_1^2$, $\hat\sigma_2^2$ and $\hat\sigma_3^2$, and then take the combined estimate as the average of the three.

In [None]:
#MLE of mean of the 3 distributions
m1 = np.mean(d1, axis = 0)
m2 = np.mean(d2, axis = 0)
m3 = np.mean(d3, axis = 0)

print(m1)
print(m2)
print(m3)

#MLE of sigma^2 for each of the 3 distributions (Sigma = sigma^2 I, data dimension 2)

t1 = d1 - m1
s1 = np.trace(np.dot(np.transpose(t1), t1)) / (2*len(d1))

t2 = d2 - m2
s2 = np.trace(np.dot(np.transpose(t2), t2)) / (2*len(d2))

t3 = d3 - m3
s3 = np.trace(np.dot(np.transpose(t3), t3)) / (2*len(d3))

print(s1)
print(s2)
print(s3)

#Combined estimate - the average of the 3 estimates
s = (s1 + s2 + s3)/3
print(s)
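A self-contained sketch (it regenerates data with a fixed seed rather than reusing `d1`, `d2`, `d3`) showing that the trace formula equals the sum of squared deviations divided by $2n$, and that, because every class has the same number of points, the average of the three estimates equals the estimate obtained by pooling all centred points.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three classes with identity covariance; means as in the cell above (a = 7, b = 5)
classes = [rng.multivariate_normal(mu, np.eye(2), 500) for mu in ([7, 5], [9, 9], [11, 5])]

def sigma2_mle(d):
    """MLE of sigma^2 for Sigma = sigma^2 I: sum of squared deviations / (dimension * n)."""
    t = d - d.mean(axis=0)
    return np.sum(t ** 2) / t.size      # t.size == n * dimension

per_class = [sigma2_mle(d) for d in classes]
average = np.mean(per_class)

# Pooled estimate over all centred points of all classes
pooled = sum(np.sum((d - d.mean(axis=0)) ** 2) for d in classes) / sum(d.size for d in classes)
print(per_class, average, pooled)

# Same value as the trace formula used in the notebook
t1 = classes[0] - classes[0].mean(axis=0)
print(np.isclose(per_class[0], np.trace(t1.T @ t1) / (2 * 500)))
```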

### Step 3: Draw the Decision Boundaries
Refer to your notes or textbook to convince yourself that, in the particular case where all the class-conditional normal distributions have the same prior and the same covariance matrix of the form $\sigma^2I$, the discriminant functions can be taken as $$g_i(x) = \mu_i^Tx - \frac{1}{2}\mu_i^T\mu_i$$ (the term $x^Tx$, which is common to all classes, and the positive factor $1/\sigma^2$ have been dropped). Find the point at which $g_1(x) = g_2(x) = g_3(x)$. <br>
Draw the three decision boundaries by solving $g_1(x) = g_2(x)$, $g_1(x) =  g_3(x)$ and $g_2(x) = g_3(x)$
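A vectorized sketch of these discriminants: it assigns each point to the class with the largest $g_i$ and finds the common point by solving the 2x2 linear system $g_1 = g_2$, $g_1 = g_3$. The exact means for $a = 7$, $b = 5$ are used here in place of the estimates `m1`, `m2`, `m3`, so the result can be checked by hand.

```python
import numpy as np

# Means for a = 7, b = 5 (in the notebook, use the estimates m1, m2, m3 from Step 2)
M = np.array([[7.0, 5.0], [9.0, 9.0], [11.0, 5.0]])

def discriminants(X, M):
    """g_i(x) = mu_i^T x - 1/2 mu_i^T mu_i for every row x of X (one column per class)."""
    return X @ M.T - 0.5 * np.sum(M ** 2, axis=1)

def classify(X, M):
    """Index (0, 1 or 2) of the largest discriminant for each row of X."""
    return np.argmax(discriminants(X, M), axis=1)

# Common point: g_1 = g_2 and g_1 = g_3, i.e. (mu_1 - mu_j)^T x = 1/2 (|mu_1|^2 - |mu_j|^2)
sq = np.sum(M ** 2, axis=1)
A = np.vstack([M[0] - M[1], M[0] - M[2]])
rhs = 0.5 * np.array([sq[0] - sq[1], sq[0] - sq[2]])
common = np.linalg.solve(A, rhs)
print(common)                              # approximately [9, 6.5] for these means
print(discriminants(common[None, :], M))   # the three discriminants are equal there

# Each mean is assigned to its own class
print(classify(M, M))
```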

In [None]:
plt.scatter(d1[:, 0], d1[:, 1], color='red')
plt.scatter(d2[:, 0], d2[:, 1], color='blue')
plt.scatter(d3[:, 0], d3[:, 1], color='green')

#Solve g1(x)=g2(x) and g1(x)=g3(x) simultaneously to get the common point of intersection (g2(x)=g3(x) also holds there)
a = np.array([[m1.item(0)-m2.item(0), m1.item(1)-m2.item(1)],
              [m1.item(0)-m3.item(0), m1.item(1)-m3.item(1)]])
b = np.array([0.5 * ((m1.item(0) * m1.item(0)) + (m1.item(1) * m1.item(1)) - 
                     (m2.item(0) * m2.item(0)) - (m2.item(1) * m2.item(1))),
              0.5 * ((m1.item(0) * m1.item(0)) + (m1.item(1) * m1.item(1)) - 
                     (m3.item(0) * m3.item(0)) - (m3.item(1) * m3.item(1)))])
sol = np.linalg.solve(a, b)
print('Common point:')
print(sol)
plt.scatter(sol.item(0), sol.item(1), color='black')

#Plot the decision boundary g1(x)=g2(x)
a = m1.item(0)-m2.item(0)
b = m1.item(1)-m2.item(1)
c = 0.5 * ((m1.item(0) * m1.item(0)) + (m1.item(1) * m1.item(1)) - 
           (m2.item(0) * m2.item(0)) - (m2.item(1) * m2.item(1)))
x = np.linspace(2, sol.item(0), 20)
y = (c - (a*x)) /b
plt.plot(x, y, color='blue')   

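#Plot the decision boundary g1(x)=g3(x)
# m1 and m3 have (almost) the same y-coordinate, so b is close to zero and this boundary is nearly
# vertical; solve for x as a function of y instead and draw it downwards from the common point
a = m1.item(0)-m3.item(0)
b = m1.item(1)-m3.item(1)
c = 0.5 * ((m1.item(0) * m1.item(0)) + (m1.item(1) * m1.item(1)) -
           (m3.item(0) * m3.item(0)) - (m3.item(1) * m3.item(1)))
y = np.linspace(0, sol.item(1), 10)
x = (c - (b*y)) /a
plt.plot(x, y, color='red')  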

#Plot the decision boundary g3(x)=g2(x)
a = m3.item(0)-m2.item(0)
b = m3.item(1)-m2.item(1)
c = 0.5 * ((m3.item(0) * m3.item(0)) + (m3.item(1) * m3.item(1)) -
           (m2.item(0) * m2.item(0)) - (m2.item(1) * m2.item(1)))
x = np.linspace(sol.item(0), 15, 20)
y = (c - (a*x)) /b
plt.plot(x, y, color='green')   

plt.axis([3,15,0,13])
plt.show()
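A possible follow-up check, not part of the original exercise: the fraction of the 1500 training points that fall on the correct side of the boundaries. The sketch regenerates data with a fixed seed so it runs on its own; in the notebook you would reuse `d1`, `d2`, `d3` and `m1`, `m2`, `m3` instead.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([[7.0, 5.0], [9.0, 9.0], [11.0, 5.0]])   # a = 7, b = 5 as in Step 1
X = np.vstack([rng.multivariate_normal(mu, np.eye(2), 500) for mu in true_means])
y = np.repeat(np.arange(3), 500)

# Estimated means, as in Step 2
M = np.array([X[y == k].mean(axis=0) for k in range(3)])

# Linear discriminants g_i(x) = mu_i^T x - 1/2 mu_i^T mu_i, one column per class
G = X @ M.T - 0.5 * np.sum(M ** 2, axis=1)
predicted = np.argmax(G, axis=1)

accuracy = np.mean(predicted == y)
print(f"training accuracy: {accuracy:.3f}")
```

Because the classes overlap a little (unit variance, means 4 to about 4.5 apart), a few points near the boundaries are misclassified, so the accuracy is high but usually below 1.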