Linear Discriminant Analysis (LDA) is a machine learning classification algorithm. In this repository, we implement the model from scratch, without relying on a built-in library implementation. We use the Raisin Dataset from the UCI Machine Learning Repository, which has 2 classes and 7 features.
First, clone the repo using the following command:
git clone https://github.com/zillur-av/LDA.git
This project is an implementation of Linear Discriminant Analysis (LDA) from scratch in Python. LDA is a classification algorithm used in machine learning that finds a linear combination of features that best separates two or more classes of objects or events. The algorithm is used extensively in pattern recognition and has applications in fields such as image and speech recognition.
LDA works by finding a linear combination of features that maximizes the ratio of the between-class scatter to the within-class scatter. The goal is to find a decision boundary that best separates the classes, by minimizing the overlap between the different classes while maximizing the distance between them.
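For reference, the ratio described above is commonly written as the Fisher criterion. This is the standard textbook form, shown as a sketch rather than as the exact expression used in this repository. Here S_b and S_w are the between-class and within-class scatter matrices, and w is the projection direction.

```latex
J(w) = \frac{w^T S_b \, w}{w^T S_w \, w}
```

A large numerator means the projected class means are far apart, and a small denominator means each class is tightly clustered after projection.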
The algorithm consists of the following steps:
- Calculate the mean vector for each class.
- Calculate the within-class scatter matrix S_w.
- Calculate the between-class scatter matrix S_b.
- Compute the optimal weights w and bias term w_0 that maximize the ratio of S_b to S_w.
- Use the decision boundary defined by w and w_0 to classify new examples.
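The steps above can be sketched end to end as follows. This is a minimal illustration and not the repository's actual code: it assumes NumPy is available, that labels are encoded as 0 and 1, and that the bias is the midpoint threshold between the two class means. The names `fit_lda` and `predict_lda` are hypothetical, and the data is synthetic rather than the Raisin Dataset. In the two-class case the weights depend only on S_w and the class means, so S_b is not computed explicitly here.

```python
import numpy as np

def fit_lda(X, y):
    """Fit a two-class LDA and return (w, w0). Labels are assumed to be 0 and 1."""
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)

    # Within-class scatter: sum of outer products of the centered examples.
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    for c, mu in ((0, mu0), (1, mu1)):
        centered = X[y == c] - mu
        S_w += centered.T @ centered

    # Solve S_w w = mu1 - mu0 instead of forming the inverse explicitly.
    w = np.linalg.solve(S_w, mu1 - mu0)
    # Assumed bias: places the boundary halfway between the class means.
    w0 = -0.5 * w @ (mu0 + mu1)
    return w, w0

def predict_lda(X, w, w0):
    """Predict class 1 where the linear score is positive, else class 0."""
    return (X @ w + w0 > 0).astype(int)

# Synthetic, well-separated two-class data (illustrative only).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

w, w0 = fit_lda(X, y)
accuracy = (predict_lda(X, w, w0) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

To use the Raisin Dataset instead, `X` would be the 7 feature columns and `y` the two class labels mapped to 0 and 1.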
The following equations are used in the implementation:
Mean vector for each class:
\mu_c = \frac{1}{n_c} \sum_{i=1}^{n_c} x_i
where n_c is the number of examples in class c, x_i is the i-th example in class c, and \mu_c is the mean vector for class c.
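As a small worked example, the class mean is the average of the rows belonging to each class. The sketch below assumes NumPy and uses made-up toy values. The helper name `class_means` is hypothetical.

```python
import numpy as np

# Toy data: 4 examples, 2 features, labels 0/1 (illustrative values only).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 10.0]])
y = np.array([0, 0, 1, 1])

def class_means(X, y):
    """Return {class label: mean vector} by averaging the rows of each class."""
    return {int(c): X[y == c].mean(axis=0) for c in np.unique(y)}

means = class_means(X, y)
print(means[0])  # [2. 3.]
print(means[1])  # [6. 8.]
```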
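Within-class scatter matrix:
S_w = \sum_{c=1}^{C} \sum_{i=1}^{n_c} (x_i - \mu_c)(x_i - \mu_c)^T
where n_c is the number of examples in class c, x_i is the i-th example in class c, \mu_c is the mean vector for class c, C is the total number of classes, and T denotes the transpose of a matrix.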
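The within-class scatter can be accumulated one class at a time by centering each class on its own mean and summing the outer products. The sketch below assumes NumPy and toy values. `within_class_scatter` is a hypothetical helper name.

```python
import numpy as np

def within_class_scatter(X, y):
    """Sum over classes of (x_i - mu_c)(x_i - mu_c)^T for the examples in each class."""
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    for c in np.unique(y):
        X_c = X[y == c]
        centered = X_c - X_c.mean(axis=0)
        # centered.T @ centered equals the sum of the per-example outer products.
        S_w += centered.T @ centered
    return S_w

# Toy data (illustrative values only).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 10.0]])
y = np.array([0, 0, 1, 1])

S_w = within_class_scatter(X, y)
print(S_w)  # entries are 4, 6, 6, 10 for this toy data
```

Each class's contribution equals its sample covariance matrix multiplied by (n_c - 1), which gives a quick way to cross-check the result against `np.cov`.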
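Between-class scatter matrix:
S_b = \sum_{c=1}^{C} n_c (\mu_c - \mu)(\mu_c - \mu)^T
where n_c is the number of examples in class c, \mu_c is the mean vector for class c, C is the total number of classes, and \mu is the overall mean vector of the examples from all classes.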
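A sketch of the between-class scatter, assuming NumPy, toy values, and that \mu is the mean over all examples. `between_class_scatter` is a hypothetical helper name. The last lines illustrate a useful sanity check: the total scatter around the overall mean splits into the within-class and between-class parts.

```python
import numpy as np

def between_class_scatter(X, y):
    """Sum over classes of n_c * (mu_c - mu)(mu_c - mu)^T."""
    mu = X.mean(axis=0)  # overall mean of all examples
    n_features = X.shape[1]
    S_b = np.zeros((n_features, n_features))
    for c in np.unique(y):
        X_c = X[y == c]
        diff = (X_c.mean(axis=0) - mu).reshape(-1, 1)  # column vector
        S_b += len(X_c) * (diff @ diff.T)
    return S_b

# Toy data (illustrative values only).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 10.0]])
y = np.array([0, 0, 1, 1])

S_b = between_class_scatter(X, y)
print(S_b)  # entries are 16, 20, 20, 25 for this toy data

# Total scatter around the overall mean; S_t - S_b is the within-class scatter.
centered = X - X.mean(axis=0)
S_t = centered.T @ centered
print(S_t - S_b)  # entries are 4, 6, 6, 10 for this toy data
```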
Optimal weights:
w = S_w^{-1} (\mu_1 - \mu_0)
where S_w is the within-class scatter matrix, and \mu_0 and \mu_1 are the mean vectors of the two classes.
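In code, the weight vector is usually obtained by solving the linear system S_w w = \mu_1 - \mu_0 rather than inverting S_w explicitly. The sketch below assumes NumPy and the sign convention that w points from class 0 toward class 1. The numeric values are toy inputs, not results from the Raisin Dataset.

```python
import numpy as np

# Toy inputs (illustrative values only).
S_w = np.array([[4.0, 6.0],
                [6.0, 10.0]])
mu0 = np.array([2.0, 3.0])
mu1 = np.array([6.0, 8.0])

# Solve S_w w = (mu1 - mu0); this avoids forming the inverse of S_w.
w = np.linalg.solve(S_w, mu1 - mu0)
print(w)  # [ 2.5 -1. ]
```

`np.linalg.solve` raises `LinAlgError` when S_w is singular, which can happen with perfectly collinear features. In that case a pseudo-inverse (`np.linalg.pinv`) or a small ridge term added to the diagonal of S_w are common workarounds.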
Bias term:
w_0 = -\frac{1}{2} w^T (\mu_0 + \mu_1)
where \mu_0 and \mu_1 are the mean vectors of the two classes.
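A sketch of the final classification step, assuming NumPy and a midpoint bias, which places the decision boundary halfway between the two class means along w. This choice treats both classes as equally likely and may differ from the repository's exact formula. The `predict` helper and all numeric values are illustrative.

```python
import numpy as np

# Toy inputs (illustrative values only).
w = np.array([2.5, -1.0])
mu0 = np.array([2.0, 3.0])
mu1 = np.array([6.0, 8.0])

# Assumed midpoint bias: the score is exactly zero at (mu0 + mu1) / 2.
w0 = -0.5 * w @ (mu0 + mu1)

def predict(X, w, w0):
    """Assign class 1 where w^T x + w_0 > 0, and class 0 otherwise."""
    scores = X @ w + w0
    return (scores > 0).astype(int)

X_new = np.array([[2.0, 3.0],   # the class 0 mean
                  [6.0, 8.0]])  # the class 1 mean
print(w0)                     # -4.5
print(predict(X_new, w, w0))  # [0 1]
```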