# Logistic Regression

<font color=darkblue>
    
Using logistic regression to classify the financial condition of a new bank.



## Import Libraries & Data Overview

The file **Banks.csv** includes data on a sample of 20 banks.

The “Financial Condition” column (`FinancialCondition` in the file) records an expert's judgment of each bank's financial condition. This response variable takes one of two values, weak or strong. In the data it is coded as 1 for a financially weak bank and 0 for a financially strong one.

The predictors are two ratios used in the financial analysis of banks: TotLns&Lses/Assets, the ratio of total loans and leases to total assets, and TotExp/Assets, the ratio of total expenses to total assets. The file also contains a TotCap/Assets column, which is not used here.

The goal is to classify the financial condition of a new bank using the two ratios.


### Q1.1 Load Packages & Import Dataset

```python
# Load the required packages
import pandas as pd
import numpy as np

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

import warnings
warnings.filterwarnings("ignore")
```

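```python
# Import the dataset.
# google.colab is available only inside Google Colab; elsewhere, place
# Banks.csv in the working directory and skip this cell.
from google.colab import files
upload = files.upload()
```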


Saving Banks.csv to Banks.csv


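```python
bank_data = pd.read_csv('Banks.csv')
bank_data.head()
```

|   | Obs | FinancialCondition | TotCap/Assets | TotExp/Assets | TotLns&Lses/Assets |
|---|-----|--------------------|---------------|---------------|--------------------|
| 0 | 1 | 1 | 9.7 | 0.12 | 0.65 |
| 1 | 2 | 1 | 1.0 | 0.11 | 0.62 |
| 2 | 3 | 1 | 6.9 | 0.09 | 1.02 |
| 3 | 4 | 1 | 5.8 | 0.1 | 0.67 |
| 4 | 5 | 1 | 4.3 | 0.11 | 0.69 |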


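```python
bank_data.describe()
```

|       | Obs | FinancialCondition | TotCap/Assets | TotExp/Assets | TotLns&Lses/Assets |
|-------|-----|--------------------|---------------|---------------|--------------------|
| count | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 10.5 | 0.5 | 9.32 | 0.1045 | 0.6285 |
| std | 5.91608 | 0.512989 | 4.797214 | 0.026052 | 0.159779 |
| min | 1.0 | 0.0 | 1.0 | 0.07 | 0.3 |
| 25% | 5.75 | 0.0 | 7.125 | 0.08 | 0.525 |
| 50% | 10.5 | 0.5 | 9.2 | 0.1 | 0.64 |
| 75% | 15.25 | 1.0 | 11.3 | 0.12 | 0.7225 |
| max | 20.0 | 1.0 | 20.5 | 0.16 | 1.02 |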


```python
bank_data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Obs                 20 non-null     int64  
 1   FinancialCondition  20 non-null     int64  
 2   TotCap/Assets       20 non-null     float64
 3   TotExp/Assets       20 non-null     float64
 4   TotLns&Lses/Assets  20 non-null     float64
dtypes: float64(3), int64(2)
memory usage: 928.0 bytes
```


## Logistic Regression Model

### Q2.1 Modeling

Run a logistic regression model (on the entire dataset) that models the status of a bank as a function of the two financial measures, TotLns&Lses/Assets and TotExp/Assets.

Specify the success class as weak (this is similar to creating a dummy that is 1 for financially weak banks and 0 otherwise), and use the default cutoff value of 0.5.


```python
# Predictors: the two financial ratios, in this column order
#   x1 = TotExp/Assets (total expenses / total assets)
#   x2 = TotLns&Lses/Assets (total loans and leases / total assets)
x = bank_data[['TotExp/Assets', 'TotLns&Lses/Assets']]

# Response: financial condition, coded 1 for financially weak and 0 for strong,
# so "weak" is the success class
y = bank_data['FinancialCondition']

# Note: scikit-learn's default applies an L2 penalty (C=1.0), so the fitted
# coefficients are shrunk toward zero relative to an unpenalized fit
lr_model = LogisticRegression()
lr_model.fit(x, y)
```

### Q2.2 Estimated Equations

Write the estimated equation that associates the financial condition of a bank with its two predictors in three formats:

a. The logit as a function of the predictors

b. The odds as a function of the predictors

c. The probability as a function of the predictors
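For reference, the three formats are related as shown below. Here p is the probability that a bank is financially weak (class 1), x1 = TotExp/Assets and x2 = TotLns&Lses/Assets (the column order used in Q2.1), and β0, β1, β2 are the fitted intercept and coefficients. Each form follows from the previous one by exponentiating or by solving for p.

$$
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2
$$

$$
\text{odds} = \frac{p}{1-p} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}
$$

$$
p = \frac{\text{odds}}{1+\text{odds}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}
$$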


```python
# Extract the intercept and coefficients from the fitted model.
# Column order in x is ['TotExp/Assets', 'TotLns&Lses/Assets'],
# so x1 = TotExp/Assets and x2 = TotLns&Lses/Assets.
beta_zero = lr_model.intercept_[0]
beta_one = lr_model.coef_[0][0]
beta_two = lr_model.coef_[0][1]

# a. The logit as a linear function of the predictors
logit_func = f"{beta_zero:.4f} + {beta_one:.4f} * x1 + {beta_two:.4f} * x2"

# b. The odds are e raised to the whole logit
odds = f"e^({logit_func})"

# c. The probability is the logistic function of the logit
prob = f"1 / (1 + e^(-({logit_func})))"

# Print the results
print("Logit Function is:", logit_func)
print("Odds are:", odds)
print("Probability is:", prob)
```

```
Logit Function is: -0.4733 + 0.1608 * x1 + 0.7264 * x2
Odds are: e^(-0.4733 + 0.1608 * x1 + 0.7264 * x2)
Probability is: 1 / (1 + e^(-(-0.4733 + 0.1608 * x1 + 0.7264 * x2)))
```

Here x1 = TotExp/Assets and x2 = TotLns&Lses/Assets, following the column order of `x`.


## Classify Financial Condition of New Bank

### Q3.1 Estimates for New Bank

Consider a new bank whose total loans and leases/assets ratio = 0.6 and total expenses/assets ratio = 0.11.

From your logistic regression model, estimate the following four quantities for this bank:

the logit, the odds, the probability of being financially weak, and the classification of the bank (use cutoff = 0.5).
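As a cross-check that does not need the fitted model, the four quantities can be computed by hand from the rounded coefficients printed in Q2.2. This minimal sketch hard-codes those rounded values (so results may differ from the model in the last digits); the function name `bank_estimates` is illustrative only.

```python
import math

# Rounded coefficients printed in Q2.2
# (x1 = TotExp/Assets, x2 = TotLns&Lses/Assets)
B0, B1, B2 = -0.4733, 0.1608, 0.7264

def bank_estimates(x1, x2, cutoff=0.5):
    """Return (logit, odds, probability of weak, class) for one bank."""
    logit = B0 + B1 * x1 + B2 * x2
    odds = math.exp(logit)
    prob = odds / (1 + odds)            # same as 1 / (1 + exp(-logit))
    label = 1 if prob >= cutoff else 0  # 1 = financially weak, 0 = strong
    return logit, odds, prob, label

# New bank from the question: TotExp/Assets = 0.11, TotLns&Lses/Assets = 0.6
logit, odds, prob, label = bank_estimates(x1=0.11, x2=0.6)
print(f"logit={logit:.4f} odds={odds:.4f} prob={prob:.4f} class={label}")
# logit=-0.0198 odds=0.9804 prob=0.4951 class=0
```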

```python
# New bank: TotExp/Assets = 0.11 (x1), TotLns&Lses/Assets = 0.6 (x2)
x1 = 0.11
x2 = 0.6

# Logit, odds, and probability of being financially weak (class 1)
logit_value = lr_model.intercept_[0] + lr_model.coef_[0][0] * x1 + lr_model.coef_[0][1] * x2
odds_value = np.exp(logit_value)
prob_value = float(1 / (1 + np.exp(-logit_value)))

print("\nLogit value = ", logit_value)
print("\nOdds value  = ", odds_value)
print("\nProbability = ", prob_value)

# Classify with cutoff 0.5: class 1 = financially weak, class 0 = strong
if prob_value >= 0.5:
    print("\nProbability is at least 0.5, so the bank is classified as financially weak (class 1)")
else:
    print("\nProbability is below 0.5, so the bank is classified as financially strong (class 0)")

# Cross-check with scikit-learn: column 1 of predict_proba is P(class 1 = weak).
# Passing a DataFrame with the training column names avoids a feature-name warning.
new_bank = pd.DataFrame([[x1, x2]], columns=x.columns)
prob_final = lr_model.predict_proba(new_bank)[0][1]
print("\nProbability of being a financially weak bank is ", prob_final)
```

Evaluating the model with the coefficients reported in Q2.2 at x1 = 0.11 and x2 = 0.6 gives a logit of about -0.0198, odds of about 0.980, and a probability of about 0.495 of being financially weak. Because this probability is below the 0.5 cutoff, the new bank is classified as financially strong (class 0), although it sits very close to the boundary.


### Q3.2 Classify New Bank

We use a cutoff value of 0.5 to classify a record based on propensity.

Instead, if we want to classify the record using the odds or logit, what value should we take as a cutoff?

A probability cutoff translates directly into cutoffs on the odds and the logit, because both are increasing functions of the probability. Since odds = p / (1 - p), a probability of 0.5 corresponds to odds of 0.5 / 0.5 = 1, and since logit = ln(odds), it corresponds to a logit of ln(1) = 0. Classifying a bank as financially weak when its odds exceed 1, or when its logit exceeds 0, therefore gives exactly the same classifications as a probability cutoff of 0.5. If a different probability cutoff c is chosen to balance false positives against false negatives, the equivalent odds cutoff is c / (1 - c) and the equivalent logit cutoff is ln(c / (1 - c)).
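The mapping between the three cutoffs can be checked numerically. This sketch converts a probability cutoff into its odds and logit equivalents and confirms, on a grid of probabilities that avoids the boundary value 0.5 itself, that the three rules assign identical labels. The helper names are illustrative only.

```python
import numpy as np

def probability_cutoff_to_odds(c):
    # odds = p / (1 - p), evaluated at the probability cutoff c (0 < c < 1)
    return c / (1 - c)

def probability_cutoff_to_logit(c):
    # logit = ln(odds)
    return np.log(probability_cutoff_to_odds(c))

# The default probability cutoff of 0.5 maps to odds 1 and logit 0
print(probability_cutoff_to_odds(0.5), probability_cutoff_to_logit(0.5))  # 1.0 0.0

# Grid 0.005, 0.015, ..., 0.995 (0.5 itself is not on it)
p = np.linspace(0.005, 0.995, 100)
odds = p / (1 - p)
logit = np.log(odds)
by_prob = p > 0.5
by_odds = odds > 1
by_logit = logit > 0
print(np.array_equal(by_prob, by_odds), np.array_equal(by_prob, by_logit))  # True True
```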

### Q3.3 Cutoff Value

When a bank in poor financial condition is misclassified as financially strong, the misclassification cost is much higher than when a financially strong bank is misclassified as weak.

To minimize the expected cost of misclassification, should the cutoff value for classification (which is currently at 0.5) be increased or decreased?

The cutoff should be decreased. Lowering the cutoff below 0.5 means a bank is classified as financially weak at a lower estimated probability of weakness, so more banks are labeled weak. This reduces the number of weak banks misclassified as strong (the costly error) at the price of more strong banks misclassified as weak (the cheaper error). When the first error is much more expensive than the second, this trade lowers the expected cost of misclassification.
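A standard way to make this precise is sketched below with hypothetical costs (the problem gives none). Let cost_fn be the cost of classifying a weak bank as strong and cost_fp the cost of classifying a strong bank as weak. For a bank with P(weak) = p, labeling it weak has expected cost (1 - p) * cost_fp, while labeling it strong has expected cost p * cost_fn. Picking the cheaper label gives the cutoff cost_fp / (cost_fp + cost_fn), which falls below 0.5 whenever cost_fn > cost_fp.

```python
def cost_minimizing_cutoff(cost_fn, cost_fp):
    """Probability cutoff that minimizes expected misclassification cost.

    cost_fn: cost of classifying a financially weak bank (class 1) as strong
    cost_fp: cost of classifying a financially strong bank (class 0) as weak
    Labeling a bank weak is cheaper when (1 - p) * cost_fp < p * cost_fn,
    i.e. when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def expected_cost(p, classify_as_weak, cost_fn, cost_fp):
    # Expected cost of one decision for a bank whose P(weak) = p
    return (1 - p) * cost_fp if classify_as_weak else p * cost_fn

# Hypothetical costs: missing a weak bank is 5 times as costly as a false alarm
cost_fn, cost_fp = 5.0, 1.0
cutoff = cost_minimizing_cutoff(cost_fn, cost_fp)
print(f"cost-minimizing cutoff: {cutoff:.3f}")  # 0.167, well below 0.5

# A bank with P(weak) = 0.3 is labeled strong at cutoff 0.5, yet under
# these costs labeling it weak has the lower expected cost
p = 0.3
print(expected_cost(p, True, cost_fn, cost_fp), expected_cost(p, False, cost_fn, cost_fp))
```

With equal costs the formula returns 0.5, which recovers the default cutoff.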