# 6. Kernel Methods

### *Table of Contents*

* 6.1 [Dual Representations](#6.1-Dual-Representations)

In Chapters [3](ch3_linear_models_for_regression.ipynb) and [4](ch4_linear_models_for_classification.ipynb), we considered linear parametric models governed by a vector $\mathbf{w}$ of adaptive parameters. During the learning phase, a set of training data is used to obtain a point estimate of the parameter vector or to determine its posterior distribution. The training set may then be discarded, and predictions are based only on the learned parameters. The same approach is employed for non-linear models such as neural networks.

However, there is a class of techniques in which the training data, or a subset of them, are kept and also used in the prediction phase. For instance, *memory-based* methods, such as Parzen density models and nearest-neighbors, store the entire training set in order to make predictions for future data points. These methods typically require a metric that measures the similarity of any pair of vectors in the input space. They are generally fast to train, because they just store the training data, and slow at making predictions, because they have to pass over the training set, possibly multiple times.
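As a rough illustration of the train/predict trade-off described above, the following is a minimal sketch of a 1-nearest-neighbor classifier. The class name `NearestNeighbour`, the choice of the squared Euclidean distance as the metric, and the toy data are assumptions made for this example only.

```python
import numpy as np


class NearestNeighbour:
    """Minimal 1-nearest-neighbour classifier (illustrative sketch)."""

    def fit(self, X, t):
        # "Training" only stores the data, which is why it is fast.
        self.X = np.asarray(X, dtype=float)
        self.t = np.asarray(t)
        return self

    def predict(self, X_new):
        X_new = np.asarray(X_new, dtype=float)
        # Pairwise squared Euclidean distances, shape (M, N):
        # every prediction has to pass over the whole training set.
        diff = X_new[:, None, :] - self.X[None, :, :]
        d2 = np.sum(diff ** 2, axis=-1)
        # Return the target of the closest stored training point.
        return self.t[np.argmin(d2, axis=1)]


# Toy data (assumed for illustration).
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
t_train = np.array([0, 0, 1])

model = NearestNeighbour().fit(X_train, t_train)
predictions = model.predict(np.array([[0.2, 0.1], [4.0, 4.5]]))
print(predictions)
```

Note that the cost of `predict` grows with the number of stored training points, whereas `fit` does no computation at all.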

Interestingly, many linear parametric models can be re-cast into an equivalent *dual representation* in which the predictions are also based on linear combinations of a *kernel function* evaluated on the training data points. Assuming models based on a fixed nonlinear *feature space* mapping $\boldsymbol\phi(\mathbf{x})$, the kernel function is defined by

$$
k(\mathbf{x},\mathbf{x}') = \boldsymbol\phi(\mathbf{x})^T\boldsymbol\phi(\mathbf{x}')
$$

where the kernel is a symmetric function of its arguments, $k(\mathbf{x},\mathbf{x}')=k(\mathbf{x}',\mathbf{x})$. The simplest example of a kernel function is obtained from the identity feature mapping $\boldsymbol\phi(\mathbf{x})=\mathbf{x}$, which gives $k(\mathbf{x},\mathbf{x}')=\mathbf{x}^T\mathbf{x}'$, referred to as the **linear kernel**.
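To make the definition concrete, here is a small sketch that evaluates a kernel as an inner product of feature vectors. The feature map `phi` (the degree-2 monomials of a two-dimensional input) is a hypothetical choice for illustration; for that particular map the kernel reduces to $(\mathbf{x}^T\mathbf{x}')^2$, which the code checks numerically.

```python
import numpy as np


def phi(x):
    """Hypothetical feature map for 2-D inputs: degree-2 monomials."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


def kernel(x, y, feature_map=phi):
    """k(x, x') = phi(x)^T phi(x') for the given feature map."""
    return feature_map(x) @ feature_map(y)


def linear_kernel(x, y):
    """Kernel induced by the identity feature map phi(x) = x."""
    return np.dot(x, y)


x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_xy = kernel(x, y)
k_yx = kernel(y, x)

print(k_xy, k_yx)                # symmetric in its arguments
print(np.dot(x, y) ** 2)         # matches (x^T x')^2 for this phi
print(kernel(x, y, feature_map=lambda v: v), linear_kernel(x, y))
```

Passing the identity map as `feature_map` recovers the linear kernel, as stated in the text.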

The concept of a kernel formulated as an inner product in a feature space allows us to build interesting extensions of well-known algorithms by making use of the *kernel trick*, also known as *kernel substitution*. The idea is that if an algorithm is formulated so that the input vector $\mathbf{x}$ enters only through scalar products, then those scalar products can be replaced with some other choice of kernel.
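As one possible illustration of kernel substitution, consider the squared Euclidean distance, which can be written purely in terms of scalar products: $\|\mathbf{x}-\mathbf{x}'\|^2 = \mathbf{x}^T\mathbf{x} - 2\mathbf{x}^T\mathbf{x}' + \mathbf{x}'^T\mathbf{x}'$. Replacing each scalar product with a kernel gives a distance in feature space without ever computing $\boldsymbol\phi(\mathbf{x})$. The sketch below assumes a quadratic kernel $k(\mathbf{x},\mathbf{x}')=(\mathbf{x}^T\mathbf{x}')^2$ and uses a hypothetical explicit feature map `phi` only to check the result.

```python
import numpy as np


def quadratic_kernel(x, y):
    """Assumed example kernel: k(x, x') = (x^T x')^2."""
    return np.dot(x, y) ** 2


def kernel_sq_distance(x, y, k):
    """Squared distance in feature space using only kernel evaluations."""
    return k(x, x) - 2.0 * k(x, y) + k(y, y)


def phi(x):
    """Explicit feature map matching quadratic_kernel for 2-D inputs."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Distance computed via the kernel trick (no feature vectors needed).
implicit = kernel_sq_distance(x, y, quadratic_kernel)
# Same distance computed explicitly in feature space, for comparison.
explicit = np.sum((phi(x) - phi(y)) ** 2)
# With the linear kernel the ordinary squared distance is recovered.
ordinary = kernel_sq_distance(x, y, np.dot)

print(implicit, explicit, ordinary)
```

The explicit map is needed here only for verification; in practice the appeal of the substitution is that the kernel can be evaluated directly in input space.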

> One of the most significant developments has been the extension of kernels to handle symbolic objects.

## 6.1 Dual Representations