# **Financial Applications with ML & AI**

<img style="float: right;" src="../../docs/img/logo_bourbaki.png" width="100"/>

## **Module I:** Default Analysis
#### Topic: Classification with Linear Discriminant Analysis

##### Name: Julio César Avila Torreblanca

- **Objective**: apply linear discriminant analysis to a default classification problem.

- **Contents**:
    - Notes:
        - LDA Algorithm
    - Code:
        1. Libraries and parameters
        2. Read Data
        3. EDA
        4. Modeling
        5. Evaluation
        6. Conclusions

----


# NOTES: Linear Discriminant Analysis for Default Classification

### Introduction to Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a supervised machine learning algorithm used mainly for classification and also for supervised dimensionality reduction. It finds linear combinations of features that best separate two or more classes (in this case, defaulters and non-defaulters).

### Purpose of LDA in Default Classification

In financial applications, LDA can be used to classify whether a client will default on a loan based on historical data. It also reduces dimensionality while preserving as much class-discriminatory information as possible: with $c$ classes it yields at most $c-1$ discriminant directions, so a binary default problem is summarized by a single discriminant score.

### Mathematical Formulation

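LDA projects the data onto directions $w$ that maximize the ratio of between-class variance to within-class variance (Fisher's criterion), so that the projected classes are as far apart as possible relative to their spread:

$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$

Let $c$ be the number of classes, $D_i$ the set of samples in class $i$, $N_i = |D_i|$ its size, $\mu_i$ its mean vector, and $\mu$ the overall mean vector. The procedure is:

1. Compute the mean vector $\mu_i$ of each class and the overall mean vector $\mu$.
2. Compute the within-class scatter matrix $S_W$:

$$S_W = \sum_{i=1}^{c} \sum_{x \in D_i} (x - \mu_i)(x - \mu_i)^T$$

3. Compute the between-class scatter matrix $S_B$:

$$S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T$$

4. Solve the generalized eigenvalue problem $S_B w = \lambda S_W w$, which is equivalent to finding the eigenvectors of $S_W^{-1} S_B$ when $S_W$ is invertible.

The eigenvectors with the largest eigenvalues are the linear discriminant coefficients. Because $S_B$ has rank at most $c-1$, there are at most $c-1$ useful discriminants; for a two-class problem such as default vs. non-default, the single direction is $w \propto S_W^{-1}(\mu_1 - \mu_0)$.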
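The four steps above can be sketched directly in NumPy. This is a minimal from-scratch illustration, assuming synthetic two-class data in place of real loans (the function name `lda_directions` and the simulated "default"/"non-default" samples are hypothetical). It uses `scipy.linalg.eigh` to solve $S_B w = \lambda S_W w$ as a symmetric generalized eigenproblem, which requires $S_W$ to be positive definite, and checks that the leading direction is parallel to $S_W^{-1}(\mu_1 - \mu_0)$.

```python
import numpy as np
from scipy.linalg import eigh


def lda_directions(X, y):
    """Return (eigenvalues, eigenvectors, S_W); eigenvectors are columns sorted by decreasing eigenvalue."""
    mu = X.mean(axis=0)                        # overall mean vector
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))   # within-class scatter
    S_B = np.zeros((n_features, n_features))   # between-class scatter
    for label in np.unique(y):
        X_c = X[y == label]
        mu_c = X_c.mean(axis=0)                # class mean vector mu_i
        centered = X_c - mu_c
        S_W += centered.T @ centered           # sum over x in D_i of (x - mu_i)(x - mu_i)^T
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += len(X_c) * (diff @ diff.T)      # N_i (mu_i - mu)(mu_i - mu)^T
    # Generalized symmetric eigenproblem S_B w = lambda S_W w (S_W must be positive definite).
    eigvals, eigvecs = eigh(S_B, S_W)          # eigenvalues are returned in ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], S_W


# Hypothetical synthetic data: 200 "non-default" (0) and 100 "default" (1) samples, 3 features.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(200, 3))
X1 = rng.normal(loc=[1.5, 0.5, 0.0], scale=1.0, size=(100, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 100)

eigvals, W, S_W = lda_directions(X, y)
w = W[:, 0]                                    # leading linear discriminant

# Two-class check: w should be parallel to S_W^{-1}(mu_1 - mu_0).
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
w_closed = np.linalg.solve(S_W, mu1 - mu0)
cosine = abs(w @ w_closed) / (np.linalg.norm(w) * np.linalg.norm(w_closed))
print(round(cosine, 6))                        # prints 1.0
print(eigvals)                                 # only the first eigenvalue is non-zero (c - 1 = 1)
```

Solving the generalized problem with `eigh` avoids forming $S_W^{-1} S_B$ explicitly, which is not symmetric and can return complex eigenvectors from a general-purpose solver.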

### Implementation Steps

1. Standardize the dataset.
2. Compute the LDA components.
3. Project the data onto the LDA components.
4. Use the projections for classification.
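A minimal sketch of these four steps with scikit-learn, assuming a synthetic imbalanced dataset from `make_classification` in place of the Lending Club data (the roughly 80/20 class split, the label meaning 0 = no default and 1 = default, and all parameters are illustrative). The scaler is fitted on the training split only so that test-set statistics do not leak into training.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, classification_report

# Synthetic stand-in for a default dataset: about 20% of samples in class 1 ("default").
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           n_redundant=0, weights=[0.8, 0.2], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# 1. Standardize: fit the scaler on the training set only.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# 2. Compute the LDA components (at most n_classes - 1 = 1 for a binary target).
lda = LinearDiscriminantAnalysis(n_components=1)
lda.fit(X_train_std, y_train)

# 3. Project the test data onto the discriminant axis.
z_test = lda.transform(X_test_std)            # shape (n_test, 1)

# 4. Classify and evaluate.
y_pred = lda.predict(X_test_std)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=["no default", "default"]))
```

With an imbalanced target, the per-class recall in the report is usually more informative than overall accuracy, since always predicting "no default" would already score about 80%.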

### Advantages of LDA

- **Dimensionality Reduction**: LDA can reduce the number of features needed while retaining the class discriminatory information.
- **Computationally Efficient**: Compared to other classification methods like SVM and neural networks, LDA is relatively fast.
- **Interpretability**: The linear combinations of features are easy to interpret.
- **Uses Feature Covariance**: Accounts for correlations between features through a within-class covariance matrix pooled across the classes.
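To illustrate the interpretability point, a minimal sketch that fits LDA on standardized synthetic features and ranks the discriminant weights in `coef_` by absolute value. The feature names `x0` to `x4` and the data are hypothetical. Standardizing makes the magnitudes comparable, but strongly correlated features can share or swap weight, so the ranking is a rough guide rather than a formal importance measure.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: 5 features, 2 of them informative.
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
feature_names = [f"x{i}" for i in range(X.shape[1])]  # hypothetical feature names

X_std = StandardScaler().fit_transform(X)
lda = LinearDiscriminantAnalysis().fit(X_std, y)

# For a binary target, coef_ has shape (1, n_features): one weight per feature.
weights = pd.Series(lda.coef_[0], index=feature_names)
ranked = weights.reindex(weights.abs().sort_values(ascending=False).index)
print(ranked)
```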

### Disadvantages of LDA

- **Normality Assumption**: LDA assumes that, within each class, the features follow a multivariate normal distribution, which may not hold in many real-world applications.
- **Linearity Assumption**: It produces linear decision boundaries, which may not capture complex relationships well.
- **Sensitive to Outliers**: Because it relies on sample means and covariances, LDA is sensitive to outliers, which can distort the estimated discriminant.
- **Equal Covariance Assumption**: LDA assumes all classes share the same covariance matrix; when the classes have very different covariance structures, or the Gaussian assumption is seriously violated, its performance can degrade.

---


# Coding

# 1. Libraries and Parameters

In [1]:
import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import classification_report, confusion_matrix

# 2. Read Data

In [None]:
df = pd.read_excel("/content/drive/MyDrive/Cruso-ApsFinancieras/semana1/lending_clubFull_Data_Set.xlsx", index_col=0)
df

# 3. EDA

#### Missing values

In [None]:
df.isna().sum().sort_values()

In [None]:
# Share of missing values per column: 'index' holds the column name, 'n' the fraction of NaNs
na_values = (df.isna().sum().sort_values() / len(df)).reset_index(name='n')
na_values

In [None]:
# Columns with more than 10% missing values
aux = na_values[na_values['n'] > 0.1]
aux

In [None]:
columns_to_drop = list(aux['index'])
columns_to_drop
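The cells above build `columns_to_drop` but do not yet remove those columns. A minimal sketch of that step, assuming a small toy DataFrame in place of the Lending Club data (the column names and values are hypothetical). `df.isna().mean()` gives the same share as `df.isna().sum() / len(df)`, and the threshold is the same 10% used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for the loan data (20 rows).
n = 20
df = pd.DataFrame({
    "loan_amnt": np.arange(1, n + 1) * 1000.0,                    # no missing values
    "int_rate": [np.nan] + [0.12] * (n - 1),                      # 5% missing
    "mths_since_last_delinq": [np.nan] * 12 + [6.0] * (n - 12),   # 60% missing
})

# Share of missing values per column.
na_share = df.isna().mean()

# Drop columns with more than 10% missing values.
columns_to_drop = na_share[na_share > 0.1].index.tolist()
df_clean = df.drop(columns=columns_to_drop)
print(columns_to_drop)   # ['mths_since_last_delinq']
print(df_clean.shape)    # (20, 2)
```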

# 4. Modeling

# 5. Evaluation

# 6. Conclusions