## Classifying Firms' Credit Ratings Using Three Models

### 1. INTRODUCTION
The aim is to classify a firm's credit rating into one of 16 categories and to predict whether a firm is considered investment grade. To do so, the dataset is split into an 80% training set and a 20% test set. Two approaches are followed: the first creates dummy variables and adds them to the dataset, and the second does not use dummy variables. The last column holds the rating, and the second-to-last column holds the investment-grade value: '1' indicates investment grade and '0' indicates that it is not.

### 2. Methodology
#### 2.1 Data cleaning and exploration

<div style="text-align:justify">
First, the dataset is checked for null values using isnull(). Because it contains none, nothing needs to be removed. Every model needs two things: input features (input variables) and a target variable (output variable). Given the dataset and the aim of the modelling, the input features are all columns except the last two (InvGrd and Rating). The target variable is the InvGrd column, because the classification is based on its 0 and 1 values. The data is split into two sets using train_test_split(). Its random_state parameter fixes the random seed so that the data is split in the same way every time the program runs. The training and test data are then selected using iloc[].
</div>  
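The steps described above can be sketched on a small toy table. This is only an illustration under stated assumptions: the column names `feature_1` and `feature_2` and all values are hypothetical stand-ins for the real credit data, and `test_size=0.2` is chosen to match the 80%/20% split mentioned in the introduction.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the credit dataset:
# two numeric features, with InvGrd and Rating as the last two columns.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "feature_1": rng.normal(size=10),
    "feature_2": rng.normal(size=10),
    "InvGrd": [0, 1] * 5,
    "Rating": ["A1", "Ba2"] * 5,
})

# Count missing values per column; all zeros means nothing has to be removed.
null_counts = toy.isnull().sum()
print(null_counts)

# Input features: every column except the last two (InvGrd and Rating).
X = toy.iloc[:, :-2]
# Target variable: the InvGrd column.
y = toy["InvGrd"]

# A fixed random_state makes the shuffle, and hence the split, repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=50
)
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.2, random_state=50)

# True: both calls selected exactly the same training rows.
print(X_train.index.equals(X_train_again.index))
```

Changing `random_state` to another integer gives a different but equally repeatable split.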

#### 2.2 Using dummy variables
Here, in addition to the 26 input features, 16 dummy variables are added. The Rating column, which has 16 categories, is converted into dummy variables using pd.get_dummies(), and these are merged with the existing features using pd.merge() with a left join on the row index.
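A minimal sketch of this step on a toy table, assuming hypothetical rating labels and a single hypothetical feature column:

```python
import pandas as pd

# Hypothetical toy data; the real dataset has 26 features and 16 rating categories.
toy = pd.DataFrame({
    "feature_1": [0.5, 1.2, -0.3, 0.8],
    "InvGrd": [1, 0, 1, 0],
    "Rating": ["A1", "Ba2", "A1", "B3"],
})

# One indicator column per distinct rating category.
dummies = pd.get_dummies(toy["Rating"])

# Left join on the row index so that every original row keeps its dummy columns.
merged = pd.merge(
    toy.reset_index(),
    dummies.reset_index(),
    on="index",
    how="left",
).drop("index", axis=1)

print(merged.columns.tolist())
```

When both frames share the same row index, as they do here, `pd.concat([toy, dummies], axis=1)` should give the same result with less code.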


#### 2.3 Without using dummy variables
Here only the 26 original features are used as inputs, so no dummy variables are present.

### 3. MODELS
All the models shown here are fitted using dummy variables.
#### 1. Linear Regression
In this model, the alpha parameter controls the strength of regularisation, which helps to reduce overfitting. The Ridge() function is used to create the regression object with an alpha value of 0.1.

In [None]:
# Import pandas under its conventional alias pd
import pandas as pd
# train_test_split splits a dataset into training and testing subsets for model evaluation
from sklearn.model_selection import train_test_split
# os is used here to set the working directory where the CSV file is located
import os
# Ridge and Lasso provide regularised linear regression
from sklearn.linear_model import Ridge, Lasso
# accuracy_score computes the accuracy of classification predictions
from sklearn.metrics import accuracy_score
# scale standardises features to zero mean and unit variance
from sklearn.preprocessing import scale

In [22]:
os.chdir('F:/SUB3 - Big data for computational finance/Project')

In [23]:
# Load the dataset
Credit_data = pd.read_csv('MLF_GP1_CreditScore.csv')

In [24]:
# Get the dummy variables of Rating and merge them with the dataset
Credit_data_dummies = pd.merge(
    Credit_data.reset_index(),
    pd.get_dummies(Credit_data["Rating"]).reset_index(),
    left_on="index", right_on="index", how="left",
).drop("index", axis=1)

In [25]:
from sklearn.preprocessing import scale
# Features (A): all columns except InvGrd and Rating, standardised
A = scale(Credit_data_dummies.drop(['InvGrd', 'Rating'], axis=1).values)
# Target (B): the InvGrd column
B = Credit_data_dummies["InvGrd"].values
# Split into training and test sets
A_train, A_test, B_train, B_test = train_test_split(A, B, test_size=0.25, random_state=50)

Ridge gave the same accuracy (0.77) for alpha values of 0.01, 0.1 and 1.  
Lasso gave an accuracy of 0.75 for alpha = 0.1 and 0.76 for alpha = 0.01.  
Hence alpha is set to 0.1 for Ridge and 0.01 for Lasso regression.  
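A comparison of this kind can be sketched as a small loop. The example below is an illustration rather than the run reported above: it uses synthetic data from `make_classification` in place of the credit dataset, so the printed accuracies will not match the figures quoted.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (an assumption, not the credit dataset).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=50
)

results = {}
for name, estimator in (("ridge", Ridge), ("lasso", Lasso)):
    for alpha in (0.01, 0.1, 1.0):
        # Fit a regularised linear regression on the 0/1 target.
        model = estimator(alpha=alpha).fit(X_train, y_train)
        # Threshold the continuous predictions at 0.5 to obtain class labels.
        labels = (model.predict(X_test) >= 0.5).astype(int)
        results[(name, alpha)] = accuracy_score(y_test, labels)

for (name, alpha), acc in results.items():
    print(f"{name} alpha={alpha}: accuracy={acc:.2f}")
```

Collecting the scores in a dictionary keeps every alpha value visible side by side, which makes it easier to see whether the choice of alpha matters for a given model.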

In [26]:

# Train a linear regression model with Ridge regularisation
ridge = Ridge(alpha=0.1)
ridge.fit(A_train, B_train)

# Train a linear regression model with Lasso regularisation
lasso = Lasso(alpha=0.1)
lasso.fit(A_train, B_train)


Lasso(alpha=0.1)

In [28]:

# Predict the target variable for the testing set
B_pred_ridge = ridge.predict(A_test)
B_pred_lasso = lasso.predict(A_test)

# Convert the predicted values to binary (1 if investment grade, 0 if not)
B_pred_ridge_bin = [1 if a >= 0.5 else 0 for a in B_pred_ridge]
B_pred_lasso_bin = [1 if a >= 0.5 else 0 for a in B_pred_lasso]

In [29]:
# Compute the accuracy of the models
accuracy_ridge = accuracy_score(B_test, B_pred_ridge_bin)
accuracy_lasso = accuracy_score(B_test, B_pred_lasso_bin)

print(f"Accuracy of Ridge Regression Model: {accuracy_ridge:.2f}")
print(f"Accuracy of Lasso Regression Model: {accuracy_lasso:.2f}")


Accuracy of Ridge Regression Model: 1.00
Accuracy of Lasso Regression Model: 0.95


#### 2. Logistic Regression

<div style="text-align:justify">
This section implements logistic regression, which is used for binary classification. In this model, the penalty parameter selects the type of regularisation (l1 corresponds to lasso-style and l2 to ridge-style regularisation), which helps to reduce overfitting. With penalty='l1' and solver='saga' for both the ridge and lasso models, a warning reported that the coefficients did not converge within the given number of iterations, and accuracy was 0.72 for both. 
With penalty='l1', solver='saga' and max_iter=10000, accuracy was 0.75 with no warning. 
With LogisticRegression(penalty='l1', solver='liblinear'), accuracy was 0.76. 
With LogisticRegression(penalty='l2', solver='liblinear'), accuracy was 0.77. 
The newton-cg solver behaves the same as liblinear. Hence penalty='l2' with solver='liblinear' is selected.
</div>
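The penalty and solver combinations discussed above can be compared in one loop. This is a sketch on synthetic data from `make_classification`, which is an assumption made for illustration, so the accuracies it prints are not those reported for the credit dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (an assumption, not the credit dataset).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=50
)

# Each entry is one penalty/solver combination from the discussion.
configs = [
    {"penalty": "l1", "solver": "saga", "max_iter": 10000},
    {"penalty": "l1", "solver": "liblinear"},
    {"penalty": "l2", "solver": "liblinear"},
    {"penalty": "l2", "solver": "newton-cg"},
]

scores = []
for params in configs:
    clf = LogisticRegression(**params).fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    scores.append((params["penalty"], params["solver"], acc))

for penalty, solver, acc in scores:
    print(f"penalty={penalty}, solver={solver}: accuracy={acc:.2f}")
```

Raising `max_iter` for the saga solver is what removes the convergence warning; the other solvers usually converge within their default limit on data of this size.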

In [30]:
#Import 'LogisticRegression' from linear model present in 'scikit-learn(library)'
from sklearn.linear_model import LogisticRegression

In [31]:
# Ridge-style logistic regression: l2 penalty with the 'liblinear' solver
ridge = LogisticRegression(penalty='l2', solver='liblinear')
ridge.fit(A_train, B_train)

# Lasso-style logistic regression: l1 penalty with the 'liblinear' solver
# (liblinear supports both the l1 and l2 penalties)
lasso = LogisticRegression(penalty='l1', solver='liblinear')
lasso.fit(A_train, B_train)

# ridge.predict() takes A_test (input features) as input
# and predicts the target variable (InvGrd)
B_pred_ridge = ridge.predict(A_test)
# lasso.predict() takes A_test (input features) as input
# and predicts the target variable (InvGrd)
B_pred_lasso = lasso.predict(A_test)

# Compute the accuracy of the binary predictions made by each model
accuracy_ridge = accuracy_score(B_test, B_pred_ridge)
accuracy_lasso = accuracy_score(B_test, B_pred_lasso)

# Print the accuracy values rounded to two decimal places (.2f)
print(f"Accuracy of Ridge Logistic Regression Model: {accuracy_ridge:.2f}")
print(f"Accuracy of Lasso Logistic Regression Model: {accuracy_lasso:.2f}")

Accuracy of Ridge Logistic Regression Model: 1.00
Accuracy of Lasso Logistic Regression Model: 1.00


#### 3. Neural Network

<div style="text-align:justify">
The input features are first pre-processed by scaling, which improves the performance of the model. The mean and standard deviation of the training data are calculated and stored, and these statistics are then used to scale both the training and the test data. The model uses a sequential architecture, chosen mainly because information flows in one direction, from input to output. Dense() creates fully connected layers, in which every neuron is connected to every output of the previous layer. By iteratively modifying the weights and biases during training, the dense layers learn complicated patterns and representations from the input data. An activation function is applied to each dense layer's output, adding non-linearity to the model and enabling it to capture non-linear relationships in the data. The rectified linear unit (ReLU) is used as the activation function in the hidden layers to reduce the vanishing gradient problem. The output layer uses a sigmoid activation for binary classification. The optimizer is 'adam': the Adam algorithm adapts the learning rate for each parameter, unlike methods with a fixed learning rate. Because the task is binary classification, binary_crossentropy is used as the loss, which measures the difference between the predicted probabilities and the true labels. The metrics argument is a list containing 'accuracy', which measures the proportion of correctly predicted samples out of all samples.
</div>
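To make the description above concrete, the following NumPy sketch runs a single forward pass through a network with the same layer sizes (64, 32, 16, 1) and computes the binary cross-entropy loss. It is an illustration only: the weights are random and untrained, the toy inputs are hypothetical, and the helper functions `relu`, `sigmoid` and `binary_crossentropy` are written here by hand rather than taken from Keras.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: keeps positive values, zeroes out negatives.
    return np.maximum(z, 0.0)

def sigmoid(z):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    # Mean negative log-likelihood of the true labels under probabilities p.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(8, 5))  # hypothetical toy inputs
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])           # hypothetical toy labels

# Standardise with the mean and standard deviation of the (training) data.
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std

# Random, untrained weights for layers of size 64, 32, 16 and 1.
layer_sizes = [X.shape[1], 64, 32, 16, 1]
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

# Hidden layers use ReLU; the output layer uses a sigmoid.
a = X_scaled
for W, b in zip(weights[:-1], biases[:-1]):
    a = relu(a @ W + b)
probs = sigmoid(a @ weights[-1] + biases[-1]).ravel()

# Threshold at 0.5 to turn probabilities into class labels.
labels = (probs >= 0.5).astype(int)
loss = binary_crossentropy(y, probs)
accuracy = float(np.mean(labels == y))
print(f"loss={loss:.3f}, accuracy={accuracy:.2f}")
```

Training would repeatedly adjust `weights` and `biases` to lower `loss`; an optimizer such as Adam automates those updates.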

In [32]:
# StandardScaler standardises the features of the dataset
# to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
# Sequential builds a model by stacking layers one after another
from keras.models import Sequential
# Dense is a fully connected layer:
# each neuron is connected to every output of the previous layer
from keras.layers import Dense

In [33]:
# Standardise the input features
scaler = StandardScaler()
# Compute the mean and standard deviation of the training data and scale it with these statistics
A_train = scaler.fit_transform(A_train)
# Scale the test data with the statistics computed from the training data
A_test = scaler.transform(A_test)

# Create a Sequential neural network model,
# to which layers are added one after another
model = Sequential()
# Add a dense layer with 64 neurons and the rectified linear unit (ReLU) activation function;
# input_dim specifies the number of input features
model.add(Dense(64, activation='relu', input_dim=A_train.shape[1]))
#Adding dense layer with 32 neurons to the model,
#with rectified linear unit(ReLU) activation function
model.add(Dense(32, activation='relu'))
#Adding dense layer with 16 neurons to the model,
#with rectified linear unit(ReLU) activation function
model.add(Dense(16, activation='relu'))
#Adds output layer with 1 neuron and sigmoid function,used for binary classification
model.add(Dense(1, activation='sigmoid'))

# Configure the training process:
# optimizer specifies the optimisation algorithm (Adam here),
# loss specifies the loss function (binary_crossentropy is commonly used for binary classification),
# metrics lists the evaluation metrics
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# model.fit trains the model on the training data:
# epochs is the number of times the training data is passed through the network,
# batch_size is the number of samples used in each iteration,
# verbose controls progress output (0 = silent, 1 = progress bar, 2 = one line per epoch)
model.fit(A_train, B_train, epochs=50, batch_size=32, verbose=1)

# model.predict() returns the predicted probability of class 1 for the test data
B_pred = model.predict(A_test)
# Label each prediction as 0 or 1 using a threshold of 0.5
B_pred_bin = [1 if a >= 0.5 else 0 for a in B_pred.ravel()]

# Compute the accuracy of the binary predictions made by the neural network model
accuracy = accuracy_score(B_test, B_pred_bin)

# Print the accuracy value rounded to two decimal places (.2f)
print(f"Accuracy of Neural Networks Model: {accuracy:.2f}")

Epoch 1/50
...
Epoch 50/50
Accuracy of Neural Networks Model: 0.99


### 4. RESULTS

When random_state is 42, accuracy is 0.76 for the linear and logistic models and 0.80 for the neural network.
When random_state is 60, accuracy is 0.78 for the linear and logistic models and 0.80 for the neural network.
When random_state is 50, accuracy is 0.81 for all models.
Hence random_state is set to 50, which gives an accuracy of 0.81.
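The sensitivity to random_state described above can be checked with a short loop over seeds. The sketch below uses synthetic data from `make_classification` and a plain logistic regression as a stand-in model; both are assumptions for illustration, so the numbers will differ from those reported here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (an assumption, not the credit dataset).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

accuracies = {}
for seed in (42, 50, 60):
    # Only the train/test split changes between iterations.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracies[seed] = accuracy_score(y_test, clf.predict(X_test))

mean_accuracy = float(np.mean(list(accuracies.values())))
for seed, acc in accuracies.items():
    print(f"random_state={seed}: accuracy={acc:.2f}")
print(f"mean accuracy over seeds: {mean_accuracy:.2f}")
```

Reporting the mean over several seeds alongside the best single seed gives a sense of how much of the difference comes from the split rather than from the model.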

Table 4.1 Accuracy without using dummy variables

| Regularisation method | Linear Regression | Logistic Regression | Neural Network |
|-----------------------|-------------------|---------------------|----------------|
| Ridge                 | 0.81              | 0.81                | 0.81           |
| Lasso                 | 0.81              | 0.81                |                |


When random_state is 42, accuracy is 1.00 for all models, except for the Lasso linear regression approach, where it is 0.87.
When random_state is 60, accuracy is 0.78 for the linear and logistic models and 0.80 for the neural network.
When random_state is 50, accuracy is 1.00 for all models, except for the Lasso linear regression approach (0.95) and the neural network (0.99).
Hence random_state is set to 50, which gives good accuracy.

Table 4.2 Accuracy after using dummy variables

| Regularisation method | Linear Regression | Logistic Regression | Neural Network |
|-----------------------|-------------------|---------------------|----------------|
| Ridge                 | 1.00              | 1.00                | 0.99           |
| Lasso                 | 0.95              | 1.00                |                |

It can be observed that accuracy is higher when dummy variables are used than when they are not. This is likely because the dummy variables are derived from the Rating column, which is closely tied to whether a firm is investment grade. At the beginning of the analysis every model had an accuracy of around 0.75 to 0.77. When parameters such as penalty, alpha, solver and random_state were changed, accuracy above 0.80 was achievable. This means that parameters affect different datasets in different ways. Hence parameters must be selected to suit the dataset, after proper analysis of the data, the accuracy and all other relevant factors.