# Part 4: Classification

In this notebook, we use quantitative measurements to study the different phenotypic groups present in a large image dataset. The end goal is to group samples (whether full images or individual objects) into different classes, a process referred to as *classification*. We review commonly used (non-machine-learning) strategies for classification.

In [None]:
import os
import numpy as np
import imageio.v2 as imageio
import matplotlib.pyplot as plt

plt.rcParams['figure.dpi'] = 200

## 1. Data loading

**1.1** Load the feature matrix for the entire BBBC010 dataset; this is a per-image feature matrix

## 2. Feature selection and dimensionality reduction

**2.1** Investigate feature distributions with violin plots

**2.2** Fisher score for feature selection
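As a rough illustration of this step, the sketch below computes one common two-class form of the Fisher score, the squared difference of the class means divided by the sum of the class variances, for each feature. The function name `fisher_score` and the synthetic data are hypothetical and stand in for the real feature matrix and labels; other definitions of the score exist.

```python
import numpy as np

def fisher_score(X, y):
    """Two-class Fisher score per feature: (mu0 - mu1)^2 / (var0 + var1).

    X is an (n_samples, n_features) array, y holds 0/1 class labels.
    Features that are constant within both classes would divide by zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    class0, class1 = X[y == 0], X[y == 1]
    numerator = (class0.mean(axis=0) - class1.mean(axis=0)) ** 2
    denominator = class0.var(axis=0) + class1.var(axis=0)
    return numerator / denominator

# Synthetic stand-in data (assumption): only feature 0 separates the classes.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 50)
X_demo = rng.normal(size=(100, 3))
X_demo[:, 0] += 3.0 * y_demo

scores = fisher_score(X_demo, y_demo)
ranking = np.argsort(scores)[::-1]  # feature indices, most discriminative first
```

Features with the highest scores are the ones that best separate the two classes on their own, which makes them natural candidates to keep.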

**2.3** Principal component analysis
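A minimal sketch of principal component analysis using only NumPy, via the singular value decomposition of the mean-centered feature matrix. The helper name `pca` and the synthetic data are assumptions made for illustration; in practice the features are often standardized first when they have different units.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto its first principal components using an SVD."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    projected = X_centered @ components.T
    explained_ratio = singular_values**2 / np.sum(singular_values**2)
    return projected, components, explained_ratio[:n_components]

# Synthetic stand-in data (assumption): four features with very different spreads.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4)) * np.array([5.0, 1.0, 0.5, 0.1])

projected, components, explained_ratio = pca(X_demo, n_components=2)
```

The `explained_ratio` values indicate how much of the total variance each retained component captures, which helps decide how many components to keep.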

## 3. Classification

**3.1** K-means clustering
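The following is a small NumPy sketch of K-means (Lloyd's algorithm): assign each sample to its nearest center, recompute the centers, and repeat until they stop moving. The function name `kmeans`, its parameters, and the synthetic blobs are hypothetical; a library implementation would usually add better initialization and multiple restarts.

```python
import numpy as np

def kmeans(X, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initialization from the data."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every sample to every center, shape (n_samples, k).
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in data (assumption): two well-separated 2D blobs.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])

labels, centers = kmeans(X_demo, k=2)
```

Note that the cluster indices are arbitrary: which group ends up labelled 0 or 1 depends on the initialization, so the labels need to be matched to meaningful class names afterwards.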

**3.2** Linear Discriminant Analysis (LDA)
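As a hedged sketch of two-class LDA in its Fisher formulation: the projection direction is obtained from the within-class scatter matrix and the difference of the class means, and samples are split at the midpoint of the projected means (which assumes equal class priors). The names `fit_lda` and `predict_lda` and the synthetic data are hypothetical, and the class labels are assumed to be available, for example from a previous clustering step.

```python
import numpy as np

def fit_lda(X, y):
    """Fisher's two-class LDA: returns the direction w and a decision threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix (sum of the per-class scatter matrices).
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    # Midpoint between the projected class means (equal priors assumed).
    threshold = 0.5 * (w @ mu0 + w @ mu1)
    return w, threshold

def predict_lda(X, w, threshold):
    """Assign class 1 to samples whose projection exceeds the threshold."""
    return (np.asarray(X, dtype=float) @ w > threshold).astype(int)

# Synthetic stand-in data (assumption): two 2D Gaussian classes.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
y_demo = np.repeat([0, 1], 50)

w, threshold = fit_lda(X_demo, y_demo)
y_pred = predict_lda(X_demo, w, threshold)
accuracy = np.mean(y_pred == y_demo)
```

Unlike K-means, this approach is supervised: it needs a label for every training sample in order to estimate the class means and scatter.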

## 4. Statistics

**4.1** Kolmogorov–Smirnov test - check whether the samples from the two clusters are drawn from the same distribution
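To make the idea concrete, the sketch below computes the two-sample Kolmogorov–Smirnov statistic, the largest absolute difference between the two empirical cumulative distribution functions, with NumPy only. The function name `ks_statistic` and the synthetic samples are hypothetical. This sketch does not compute a p-value; `scipy.stats.ks_2samp` returns both the statistic and a p-value if SciPy is available.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all observed x."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Fraction of each sample that is <= every value in the pooled grid.
    ecdf_a = np.searchsorted(a, grid, side="right") / len(a)
    ecdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(ecdf_a - ecdf_b))

# Synthetic stand-in data (assumption): one feature measured in two clusters.
rng = np.random.default_rng(0)
reference = rng.normal(0, 1, 200)
same_distribution = rng.normal(0, 1, 200)
shifted_distribution = rng.normal(1, 1, 200)

d_same = ks_statistic(reference, same_distribution)
d_shifted = ks_statistic(reference, shifted_distribution)
```

A statistic close to 0 means the two empirical distributions nearly coincide, while a value close to 1 means they barely overlap.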

**4.2** Correlation matrices for each feature
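A minimal sketch of a feature correlation matrix, assuming the features are stored as columns of an array. `np.corrcoef` with `rowvar=False` returns the Pearson correlation between every pair of columns. The synthetic data are an assumption made for illustration.

```python
import numpy as np

# Synthetic stand-in data (assumption): feature 1 is almost a rescaled copy
# of feature 0, feature 2 is independent noise.
rng = np.random.default_rng(0)
feature0 = rng.normal(size=200)
feature1 = 2.0 * feature0 + 0.1 * rng.normal(size=200)
feature2 = rng.normal(size=200)
X_demo = np.column_stack([feature0, feature1, feature2])

# rowvar=False treats each column as a variable and each row as an observation.
corr = np.corrcoef(X_demo, rowvar=False)
```

Computing one such matrix per cluster (by first selecting the rows of each cluster) makes it possible to compare how the features co-vary within each group. Strongly correlated feature pairs carry largely redundant information.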

## 5. Evaluating classification performance

**5.1** Load the ground truth (GT)

**5.2** Classification metrics
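As an illustration, the sketch below computes accuracy, precision, recall, and F1 score for a binary classification from the counts of true/false positives and negatives. The function name `binary_metrics` and the toy label vectors are hypothetical; which class is treated as "positive" (here, label 1) is a choice to make explicitly.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating label 1 as the positive class."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels (assumption), not taken from the dataset.
y_true_demo = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_demo = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true_demo, y_pred_demo)
# → accuracy 0.75, precision 0.75, recall 0.75, f1 0.75
```

When the predictions come from an unsupervised method such as K-means, the cluster indices may need to be swapped to match the ground-truth encoding before computing these metrics.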

**5.3** Confusion matrix
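A short sketch of how a confusion matrix can be built with NumPy, with rows indexing the true class and columns the predicted class. The function name `confusion_matrix` and the toy label vectors are hypothetical, and class labels are assumed to be integers starting at 0.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Count samples per (true class, predicted class) pair."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy labels (assumption), not taken from the dataset.
y_true_demo = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_demo = [1, 1, 0, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true_demo, y_pred_demo)
# → [[3 1]
#    [1 3]]
```

The diagonal holds the correctly classified samples, and the off-diagonal entries show which classes are confused with each other. The matrix can be displayed as an image, for example with `plt.imshow(cm)`.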

## 6. BONUS: Classifying individual objects

**6.1** Load the per-object feature matrix

**6.2** Adapt the analysis we did above to classify individual worms into two categories: dead or alive

**6.3** Identify images in which there are misclassified worms and find a good way to visualize the result