<img src="images/ublogo.png"/>

### CSE610 - Bayesian Non-parametric Machine Learning

  - Lecture Notes
  - Instructor - Varun Chandola
  - Term - Fall 2020

### Objective
The objective of this notebook is to provide detailed discussions about using Gaussian Processes (GP) for classification problems.

<div class="alert alert-info">

**Note:** This material is based on Chapter 3 of the GPML book.

</div>

We now turn our attention to *classification* problems, where the target is not real-valued but categorical, i.e., $y \in \{C_1,C_2,\ldots,C_\mathcal{C}\}$, where $\mathcal{C}$ is the number of possible classes that the target can belong to. We will work with probabilistic classification, where we are interested in:
$$
P(Y = C_1\vert X = {\bf x})
$$
> There are two general approaches to building probabilistic classifiers. One is to start with the joint probability distribution, $p({\bf x},y)$, which can be decomposed as $p({\bf x}\vert y)p(y)$, and then use *Bayes' rule* to get $p(y\vert{\bf x})$. The other is to model $p(y\vert {\bf x})$ directly, without making any assumptions about $p({\bf x})$. Models built with the former approach are called **generative models**; those built with the latter are called **discriminative models**.

<div class="alert alert-warning">
Gaussian process classification is discriminative.
</div>

We will see that, to a certain extent, applying GP to probabilistic classification is similar to applying GP to regression, as we saw earlier. However, the problem is much more demanding due to one key reason: *the likelihood function in a classification setting cannot be Gaussian*. This leads to a challenging issue that requires approximate solutions.

### Linear logistic regression model
Consider a binary classification setting, i.e., the targets can either be $+1$ or $-1$. In the logistic regression model, we directly model the conditional probability for the targets, as follows:
$$
p(y = +1 \vert {\bf x},{\bf w}) = \sigma({\bf w}^\top{\bf x})
$$
> $\sigma(\cdot)$ can be any *sigmoid* function, i.e., any monotonically increasing function from $\mathbb{R}$ to $[0,1]$.

> A popular choice for $\sigma(\cdot)$ is the *logistic response function*, $\sigma(z) = \frac{1}{1 + \exp(-z)}$, in which case the classification model is called **linear logistic regression** (or just **logistic regression**).

> However, there are other possible choices as well. For instance, the **linear probit regression** model uses the probit function, which is the cumulative distribution function of a standard normal distribution, i.e., $\sigma(z) = \int_{-\infty}^z\mathcal{N}(u\vert 0,1)du$.

<div class="alert alert-info">

**Note:** For the subsequent discussions, we will use the logistic function as the response function.

</div>

Using this model, we can write the probability that $y$ takes either of the two labels:
$$
p(y = +1 \vert {\bf x},{\bf w}) = \sigma({\bf w}^\top{\bf x})\text{;   } p(y = -1 \vert {\bf x},{\bf w}) = 1 - p(y = +1 \vert {\bf x},{\bf w}) = \sigma(-{\bf w}^\top{\bf x})
$$
The above result uses the fact that the logistic function is symmetric, i.e., $1 - \sigma(z) = \sigma(-z)$.
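
The two response functions and the symmetry property can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `logistic` and `probit` are illustrative choices, not part of the original notes. The probit function is written via the error function, using the standard identity $\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$.

```python
import math

def logistic(z):
    """Logistic response function, sigma(z) = 1 / (1 + exp(-z))."""
    # Branch on the sign of z so that exp() is only called on non-positive values.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def probit(z):
    """Probit response function: the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    # The last two columns should agree: 1 - sigma(z) == sigma(-z).
    print(f"z={z:+.1f}  logistic={logistic(z):.4f}  probit={probit(z):.4f}  "
          f"1-logistic(z)={1.0 - logistic(z):.4f}  logistic(-z)={logistic(-z):.4f}")
```

Both functions map $0$ to $0.5$ and satisfy $1 - \sigma(z) = \sigma(-z)$, which is what allows the two label probabilities to be written with a single expression.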

Thus, if we are given a training data set, $\mathcal{D} = \{({\bf x}_i,y_i)\vert i = 1,\ldots,N\}$, the probability of the $i^{th}$ label is given by:
$$
p(y_i\vert {\bf x}_i,{\bf w}) = \sigma(y_i{\bf w}^\top{\bf x}_i)
$$

#### The non-Bayesian formulation of logistic regression
To learn the optimal weights, ${\bf w}$, one can calculate the log-likelihood of the training data set, i.e., 
$$
\mathcal{L}(\mathcal{D}\vert {\bf w}) = \sum_{i=1}^N \log \sigma(y_i{\bf w}^\top{\bf x}_i)
$$
and then find the ${\bf w}$ that maximizes the log-likelihood. This can be done using a gradient-based optimizer.
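
As an illustration, the maximization can be sketched with plain gradient ascent, using the fact that for the logistic function $\nabla_{\bf w} \log\sigma(y_i{\bf w}^\top{\bf x}_i) = \sigma(-y_i{\bf w}^\top{\bf x}_i)\,y_i{\bf x}_i$. Everything specific here is an assumption made for the example: the toy data set, the step size, the iteration count, and the helper names (`log_likelihood`, `gradient`, `fit`). In practice one would more likely hand the objective to an off-the-shelf optimizer.

```python
import math

def sigmoid(z):
    """Logistic function, evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_sigmoid(z):
    """log(sigma(z)), rewritten per sign of z to stay numerically stable."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def log_likelihood(w, X, y):
    """Sum over the data of log sigma(y_i * w^T x_i)."""
    return sum(log_sigmoid(yi * dot(w, xi)) for xi, yi in zip(X, y))

def gradient(w, X, y):
    """Gradient of the log-likelihood: sum_i sigma(-y_i w^T x_i) * y_i * x_i."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        coeff = sigmoid(-yi * dot(w, xi)) * yi
        for j, xij in enumerate(xi):
            g[j] += coeff * xij
    return g

def fit(X, y, step=0.1, n_iter=2000):
    """Plain gradient ascent from w = 0 (step and n_iter are assumed values)."""
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = gradient(w, X, y)
        w = [wj + step * gj for wj, gj in zip(w, g)]
    return w

# Hypothetical toy data with labels in {+1, -1}; the last point makes the
# classes non-separable, so the maximizer is finite.
X = [(1.0, 2.0), (2.0, 1.5), (1.5, 0.5), (-1.0, -1.5),
     (-2.0, -0.5), (-0.5, -1.0), (1.0, 1.0)]
y = [+1, +1, +1, -1, -1, -1, -1]

w_hat = fit(X, y)
print("w_hat =", w_hat)
print("log-likelihood at w = 0:  ", log_likelihood([0.0, 0.0], X, y))
print("log-likelihood at w_hat:  ", log_likelihood(w_hat, X, y))
```

Note that this produces a single point estimate of ${\bf w}$, with no measure of uncertainty attached to it.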

#### The Bayesian formulation of logistic regression
Assuming a Gaussian prior on ${\bf w}$, i.e., ${\bf w} \sim \mathcal{N}({\bf 0},\Sigma_p)$, the posterior distribution for ${\bf w}$ will be:
$$
p({\bf w}\vert \mathcal{D}) = \frac{p(\mathcal{D}\vert{\bf w})p({\bf w})}{\int p(\mathcal{D}\vert{\bf w}')p({\bf w}')d{\bf w}'}
$$

Now, this is the start of all our troubles :(
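
With a product of logistic terms as the likelihood and a Gaussian prior, the normalizing integral in the denominator is not available in closed form. For a scalar weight $w$ it can still be approximated by brute force on a grid, which gives a feel for what the posterior looks like. The sketch below does this with the trapezoid rule; the 1-D data set, the prior variance, and the grid limits are all assumptions chosen for illustration, and the approach does not scale beyond a handful of dimensions.

```python
import math

def logistic(z):
    """Logistic function, evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical 1-D training set with labels in {+1, -1}.
xs = [0.5, 1.0, 1.5, -1.0, -0.5, 0.8]
ys = [+1, +1, +1, -1, -1, -1]
prior_var = 1.0  # assumed prior variance, i.e. w ~ N(0, 1)

def unnormalized_posterior(w):
    """Likelihood times prior: prod_i sigma(y_i * w * x_i) * N(w | 0, prior_var)."""
    likelihood = 1.0
    for x, y in zip(xs, ys):
        likelihood *= logistic(y * w * x)
    prior = math.exp(-0.5 * w * w / prior_var) / math.sqrt(2.0 * math.pi * prior_var)
    return likelihood * prior

def trapezoid(f_vals, h):
    """Trapezoid rule on an evenly spaced grid with spacing h."""
    return h * (sum(f_vals) - 0.5 * (f_vals[0] + f_vals[-1]))

# Evenly spaced grid over a range where the prior has essentially all its mass.
n, lo, hi = 2001, -10.0, 10.0
h = (hi - lo) / (n - 1)
grid = [lo + i * h for i in range(n)]
vals = [unnormalized_posterior(w) for w in grid]

evidence = trapezoid(vals, h)              # approximates the denominator
posterior = [v / evidence for v in vals]   # normalized posterior density on the grid
post_mean = trapezoid([w * p for w, p in zip(grid, posterior)], h)

print("approximate evidence:      ", evidence)
print("approximate posterior mean:", post_mean)
```

The number of grid points needed grows exponentially with the dimension of ${\bf w}$, so this only serves as a sanity check in one or two dimensions.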