# Question 4

## Nearest centroid classifier

The nearest centroid classifier (NCC) is a classification model that compares a new instance with the estimated mean (centroid) vector of each class. The new instance is then assigned the label of the class whose mean vector is closest.

Consider:
1. A set of classes, $\Omega = \left\{1, 2, \cdots, C\right\}$, where $C$ is the number of classes.
1. A training set, given by $\left\{\mathbf{x}_n, \omega_n\right\}_{n=1}^N$, where $N$ is the size of the training set, and $\omega_n \in \Omega$ is the label for the $n$-th instance of the training set.

With these definitions, the nearest centroid classifier can be described as follows:

1. Training phase
    1. Compute the estimated mean vector of each class $i \in \Omega$: $$\hat{\mathbf{m}}_i = \frac{1}{\mid N_i\mid} \sum_{n \in N_i} \mathbf{x}_n ,$$ where $N_i$ is the set of indices of the training samples belonging to class $i$, and $\mid N_i\mid$ is its cardinality.
1. Test phase
    1. For a new instance, $\mathbf{x}_n$, where $n>N$, compute the Euclidean distance between the new vector and each estimated mean vector, $d(\mathbf{x}_n, \hat{\mathbf{m}}_i)$.
    1. Assign the instance to the class $i^\star$ that satisfies $$d(\mathbf{x}_n, \hat{\mathbf{m}}_{i^\star}) < d(\mathbf{x}_n, \hat{\mathbf{m}}_i), \quad \forall i\neq i^\star \text{ and } i,i^\star \in \Omega.$$
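
The two phases above can be sketched in a few lines of NumPy. This is a minimal sketch rather than a reference implementation: the function names `fit_centroids` and `predict_ncc` and the toy training set are assumptions introduced here for illustration only.

```python
import numpy as np

def fit_centroids(X, y):
    """Training phase: return the class labels and one mean vector per class."""
    classes = np.unique(y)
    # Row i of `centroids` is the mean of the samples labeled classes[i]
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_ncc(x, classes, centroids):
    """Test phase: return the label of the centroid closest to x."""
    distances = np.linalg.norm(centroids - x, axis=1)
    return classes[np.argmin(distances)]

# Hypothetical toy training set with two classes (not taken from the text)
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]])
y_train = np.array([1, 1, 2, 2])

classes, centroids = fit_centroids(X_train, y_train)
label = predict_ncc(np.array([2.0, 1.0]), classes, centroids)
print(centroids)
print(label)
```

Note that `np.argmin` returns the first minimum when two centroids are equally close, whereas the strict inequality in the decision rule assumes a unique closest centroid.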

Let us define the estimated mean vectors
$$\hat{\mathbf{m}}_1 = \begin{bmatrix} 51.69 & 12.82 & 43.54 & 38.86 \end{bmatrix}^\mathsf{T}$$
$$\hat{\mathbf{m}}_2 = \begin{bmatrix} 71.51 & 20.75 & 64.11 & 50.77 \end{bmatrix}^\mathsf{T}$$
$$\hat{\mathbf{m}}_3 = \begin{bmatrix} 47.64 & 17.40 & 35.46 & 30.24 \end{bmatrix}^\mathsf{T}$$
and the new instance
$$\mathbf{x}_n = \begin{bmatrix} 80.07 & 48.07 & 52.40 & 32.01 \end{bmatrix}^\mathsf{T}$$

In [36]:
from numpy import array, delete, set_printoptions
from numpy.linalg import norm

x   = array([80.07, 48.07, 52.40, 32.01])
m_1 = array([51.69, 12.82, 43.54, 38.86])
m_2 = array([71.51, 20.75, 64.11, 50.77])
m_3 = array([47.64, 17.40, 35.46, 30.24])

# set float precision of numpy array
set_printoptions(precision=2)

# compute the Euclidean distances
euclidean_dist = list(map(norm, (x-m_1, x-m_2, x-m_3)))
# winning Euclidean distance
min_euc_dist = min(euclidean_dist)
# winning class
winning_class = euclidean_dist.index(min_euc_dist) + 1
# other Euclidean distances
other_euc_dist = delete(euclidean_dist, winning_class-1)

print(f"The new vector is assigned to class {winning_class}: its Euclidean distance is {min_euc_dist:.2f}, whereas the others are {other_euc_dist}")

The new vector is assigned to class 2: its Euclidean distance is 36.18, whereas the others are [46.62 47.77]
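
The same distances can also be computed in vectorized form. The following is a sketch of one possible alternative to the cell above, assuming the centroids are stacked row-wise in a matrix (the name `M` is introduced here for illustration); `numpy.argmin` then returns the zero-based index of the closest centroid.

```python
import numpy as np

x = np.array([80.07, 48.07, 52.40, 32.01])
# Centroids stacked row-wise: row i holds the mean vector of class i+1
M = np.array([
    [51.69, 12.82, 43.54, 38.86],
    [71.51, 20.75, 64.11, 50.77],
    [47.64, 17.40, 35.46, 30.24],
])

# Broadcasting subtracts x from every row; axis=1 yields one norm per row
distances = np.linalg.norm(M - x, axis=1)
winning_class = int(np.argmin(distances)) + 1  # classes are numbered from 1

print(winning_class, np.round(distances, 2))
```

This form avoids listing each centroid explicitly and scales to any number of classes.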


# Question 5

## Maximum correlation coefficient classifier

The maximum correlation classifier behaves almost like the nearest centroid classifier, but it uses the inner product of the normalized vectors (a similarity measure) as the decision criterion, so the winning class is the one with the largest value rather than the smallest. The decision rule therefore becomes

$$\left<\bar{\mathbf{x}}_n, \bar{\mathbf{m}}_{i^\star} \right> > \left<\bar{\mathbf{x}}_n, \bar{\mathbf{m}}_{i}\right> , \forall i\neq i^\star \text{ and } i,i^\star \in \Omega,$$

where $\bar{\mathbf{v}} = \mathbf{v}/\lVert \mathbf{v} \rVert$ and $\lVert \mathbf{v} \rVert$ is the (Euclidean) norm of $\mathbf{v}$.
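
To see why the normalization matters, consider the following sketch. The vectors are hypothetical toy values chosen for illustration, and `normalize` is a helper name introduced here. Without normalization, a centroid with a large norm can yield the largest inner product even when it points in a different direction from the instance.

```python
import numpy as np

def normalize(v):
    """Scale v to unit Euclidean norm."""
    return v / np.linalg.norm(v)

# Hypothetical toy vectors (not taken from the text)
x   = np.array([1.0, 0.0])
m_a = np.array([2.0, 0.0])    # same direction as x
m_b = np.array([10.0, 10.0])  # different direction, larger norm

# Inner products without and with normalization
raw = [np.inner(x, m) for m in (m_a, m_b)]
normalized = [np.inner(normalize(x), normalize(m)) for m in (m_a, m_b)]

print(int(np.argmax(raw)))         # index of the largest raw inner product
print(int(np.argmax(normalized)))  # index of the largest normalized inner product
```

Here the raw inner product favors `m_b`, whereas the normalized one favors `m_a`, which is collinear with `x`.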

In [37]:
from numpy import array, delete, inner, set_printoptions
from numpy.linalg import norm

x   = array([80.07, 48.07, 52.40, 32.01])
m_1 = array([51.69, 12.82, 43.54, 38.86])
m_2 = array([71.51, 20.75, 64.11, 50.77])
m_3 = array([47.64, 17.40, 35.46, 30.24])

# set float precision of numpy array
set_printoptions(precision=2)

# normalize each vector to unit Euclidean norm, as the decision rule requires
x_bar = x / norm(x)
m_bars = [m / norm(m) for m in (m_1, m_2, m_3)]

# inner products of the normalized vectors
inner_prods = [inner(x_bar, m_bar) for m_bar in m_bars]
# winning inner product
max_inner_prod = max(inner_prods)
# winning class
winning_class = inner_prods.index(max_inner_prod) + 1
# other inner products
other_inner_prods = delete(inner_prods, winning_class-1)

print(f"The new vector is assigned to class {winning_class}: its normalized inner product is {max_inner_prod:.2f}, whereas the others are {other_inner_prods}")

The new vector is assigned to class 3: its normalized inner product is 0.97, whereas the others are [0.94 0.95]
