# **Project: Bankruptcy Prevention**

## **Business Objective:**

This project aims to classify whether a business is likely to go bankrupt or not based on various risk factors. The target variable is binary, indicating either bankruptcy or non-bankruptcy. The goal is to model the probability of a business going bankrupt using the given features.

## **Data Description:**

The dataset covers 250 companies. Each row has 6 features that score a different business risk or flexibility factor, plus the target variable `class`, for 7 columns in total.

Below is a description of the variables in the dataset:

1. **industrial_risk**: 
   - 0 = low risk
   - 0.5 = medium risk
   - 1 = high risk
   
2. **management_risk**: 
   - 0 = low risk
   - 0.5 = medium risk
   - 1 = high risk
   
3. **financial_flexibility**: 
   - 0 = low flexibility
   - 0.5 = medium flexibility
   - 1 = high flexibility
   
4. **credibility**: 
   - 0 = low credibility
   - 0.5 = medium credibility
   - 1 = high credibility
   
5. **competitiveness**: 
   - 0 = low competitiveness
   - 0.5 = medium competitiveness
   - 1 = high competitiveness
   
6. **operating_risk**: 
   - 0 = low risk
   - 0.5 = medium risk
   - 1 = high risk
   
7. **class** (target variable): 
   - bankruptcy
   - non-bankruptcy
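
Because the target is stored as text, a model that outputs a bankruptcy probability usually needs it as a number first. A minimal sketch of one possible encoding follows; the toy rows, the `target` column name, and the choice of 1 for bankruptcy are assumptions for illustration, not something the dataset prescribes.

```python
import pandas as pd

# Toy rows in the same shape as the dataset (illustrative values only)
toy = pd.DataFrame({
    "financial_flexibility": [0.0, 0.5, 1.0],
    "class": ["bankruptcy", "non-bankruptcy", "non-bankruptcy"],
})

# Assumed encoding: 1 = bankruptcy, 0 = non-bankruptcy
class_map = {"bankruptcy": 1, "non-bankruptcy": 0}
toy["target"] = toy["class"].map(class_map)

print(toy)
```

The six features are already numeric and ordered (0 < 0.5 < 1), so they can be used as they are.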


## Import Necessary Libraries

In [18]:
# Import Necessary Libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter
import warnings
warnings.filterwarnings("ignore")

In [146]:
# Load the Dataset
df = pd.read_excel("Bankruptcy.xlsx")

# EDA (Exploratory Data Analysis)

## Descriptive Analysis

In [147]:
df.head()

| | industrial_risk | management_risk | financial_flexibility | credibility | competitiveness | operating_risk | class |
|---|---|---|---|---|---|---|---|
| 0 | 0.5 | 1.0 | 0.0 | 0.0 | 0.0 | 0.5 | bankruptcy |
| 1 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | bankruptcy |
| 2 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | bankruptcy |
| 3 | 0.5 | 0.0 | 0.0 | 0.5 | 0.0 | 1.0 | bankruptcy |
| 4 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | bankruptcy |


In [148]:
df.tail()

| | industrial_risk | management_risk | financial_flexibility | credibility | competitiveness | operating_risk | class |
|---|---|---|---|---|---|---|---|
| 245 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | non-bankruptcy |
| 246 | 1.0 | 1.0 | 0.5 | 1.0 | 1.0 | 0.0 | non-bankruptcy |
| 247 | 0.0 | 1.0 | 1.0 | 0.5 | 0.5 | 0.0 | non-bankruptcy |
| 248 | 1.0 | 0.0 | 0.5 | 1.0 | 0.5 | 0.0 | non-bankruptcy |
| 249 | 1.0 | 0.0 | 0.5 | 0.5 | 1.0 | 1.0 | non-bankruptcy |


In [14]:
df.shape

(250, 7)

In [15]:
df.size

1750

In [16]:
df.columns

Index(['industrial_risk', 'management_risk', 'financial_flexibility',
       'credibility', 'competitiveness', 'operating_risk', 'class'],
      dtype='object')

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 250 entries, 0 to 249
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   industrial_risk        250 non-null    float64
 1   management_risk        250 non-null    float64
 2   financial_flexibility  250 non-null    float64
 3   credibility            250 non-null    float64
 4   competitiveness        250 non-null    float64
 5   operating_risk         250 non-null    float64
 6   class                  250 non-null    object 
dtypes: float64(6), object(1)
memory usage: 13.8+ KB


## Class Distribution and Feature-wise Breakdown 

In [149]:
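# Cross-tabulate each feature against the target and add a per-level total
columns = df.columns.drop("class")
data_description = {}
for column in columns:
    cross_tab = pd.crosstab(df[column], df["class"])
    cross_tab["total"] = df[column].value_counts()
    data_description[column] = cross_tab
# Stack the per-feature tables, using the feature names as the outer index level
description = pd.concat([data_description[feature] for feature in columns], keys=columns)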

In [150]:
description

| feature | value | bankruptcy | non-bankruptcy | total |
|---|---|---|---|---|
| industrial_risk | 0.0 | 26 | 54 | 80 |
| industrial_risk | 0.5 | 28 | 53 | 81 |
| industrial_risk | 1.0 | 53 | 36 | 89 |
| management_risk | 0.0 | 11 | 51 | 62 |
| management_risk | 0.5 | 23 | 46 | 69 |
| management_risk | 1.0 | 73 | 46 | 119 |
| financial_flexibility | 0.0 | 102 | 17 | 119 |
| financial_flexibility | 0.5 | 4 | 70 | 74 |
| financial_flexibility | 1.0 | 1 | 56 | 57 |
| credibility | 0.0 | 87 | 7 | 94 |
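
Raw counts are easier to compare once they are turned into a bankruptcy rate per feature level. The sketch below does this for `financial_flexibility`, using the counts from the table above; the `bankruptcy_rate` column name is an assumption introduced here.

```python
import pandas as pd

# Counts copied from the financial_flexibility rows of the table above
counts = pd.DataFrame(
    {"bankruptcy": [102, 4, 1], "non-bankruptcy": [17, 70, 56]},
    index=pd.Index([0.0, 0.5, 1.0], name="financial_flexibility"),
)
counts["total"] = counts["bankruptcy"] + counts["non-bankruptcy"]

# Share of companies at each level that went bankrupt
counts["bankruptcy_rate"] = (counts["bankruptcy"] / counts["total"]).round(3)

print(counts)
```

On the full data, `pd.crosstab(df[column], df["class"], normalize="index")` should give the same row-wise shares directly.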


### Handling Null Values

In [89]:
# Checking for null values.
df.isna().sum().sum()

0

### Removing Duplicate Rows

In [151]:
duplicated = df[df.duplicated()]

In [95]:
duplicated.to_excel("duplicate_data.xlsx",index=False)

In [159]:
data = df.drop_duplicates()

In [160]:
data.duplicated().sum()

0
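
`df.duplicated()` flags every row that repeats an earlier one, and `drop_duplicates()` keeps only the first occurrence. A minimal sketch on a toy frame (the values are illustrative and not taken from `Bankruptcy.xlsx`) shows the mask and how the class shares can shift once repeats are dropped:

```python
import pandas as pd

# Toy frame: row 2 repeats row 0 (illustrative values only)
toy = pd.DataFrame({
    "industrial_risk": [0.5, 0.0, 0.5, 1.0],
    "operating_risk": [0.5, 1.0, 0.5, 1.0],
    "class": ["bankruptcy", "bankruptcy", "bankruptcy", "non-bankruptcy"],
})

dup_mask = toy.duplicated()        # True for repeats after the first occurrence
deduped = toy.drop_duplicates()    # keeps the first copy of each distinct row

# Class shares before and after, to see how deduplication shifts the balance
before = toy["class"].value_counts(normalize=True)
after = deduped["class"].value_counts(normalize=True)

print(dup_mask.tolist())
print(before, after, sep="\n")
```

With six features that each take only three values, identical rows may well describe different companies, so whether to drop them is a judgment call worth stating explicitly.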

In [153]:
data.info()


In [161]:
# Repeat the cross-tabulation on the deduplicated data
columns = data.columns.drop("class")
data_description = {}
for column in columns:
    cross_tab = pd.crosstab(data[column], data["class"])
    cross_tab["total"] = data[column].value_counts()
    data_description[column] = cross_tab
cleaned_data_description = pd.concat([data_description[feature] for feature in columns], keys=columns)

In [162]:
cleaned_data_description

| feature | value | bankruptcy | non-bankruptcy | total |
|---|---|---|---|---|
| industrial_risk | 0.0 | 6 | 29 | 35 |
| industrial_risk | 0.5 | 6 | 30 | 36 |
| industrial_risk | 1.0 | 13 | 19 | 32 |
| management_risk | 0.0 | 4 | 25 | 29 |
| management_risk | 0.5 | 7 | 25 | 32 |
| management_risk | 1.0 | 14 | 28 | 42 |
| financial_flexibility | 0.0 | 23 | 10 | 33 |
| financial_flexibility | 0.5 | 1 | 35 | 36 |
| financial_flexibility | 1.0 | 1 | 33 | 34 |
| credibility | 0.0 | 18 | 4 | 22 |


### Comparing the Data Distribution Before and After Removing Duplicate Rows

In [128]:
distribution_comparision = pd.concat([description,cleaned_data_description],axis=1)

In [163]:
distribution_comparision

| feature | value | bankruptcy | non-bankruptcy | total | bankruptcy | non-bankruptcy | total |
|---|---|---|---|---|---|---|---|
| industrial_risk | 0.0 | 26 | 54 | 80 | 6 | 29 | 35 |
| industrial_risk | 0.5 | 28 | 53 | 81 | 6 | 30 | 36 |
| industrial_risk | 1.0 | 53 | 36 | 89 | 13 | 19 | 32 |
| management_risk | 0.0 | 11 | 51 | 62 | 4 | 25 | 29 |
| management_risk | 0.5 | 23 | 46 | 69 | 7 | 25 | 32 |
| management_risk | 1.0 | 73 | 46 | 119 | 14 | 28 | 42 |
| financial_flexibility | 0.0 | 102 | 17 | 119 | 23 | 10 | 33 |
| financial_flexibility | 0.5 | 4 | 70 | 74 | 1 | 35 | 36 |
| financial_flexibility | 1.0 | 1 | 56 | 57 | 1 | 33 | 34 |
| credibility | 0.0 | 87 | 7 | 94 | 18 | 4 | 22 |


In [131]:
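# `x` is a Styler created in a cell that is not shown here. Rendering it fails,
# most likely because the concatenated table repeats the column names
# 'bankruptcy', 'non-bankruptcy', and 'total'.
x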

KeyError: '`Styler.apply` and `.map` are not compatible with non-unique index or columns.'

<pandas.io.formats.style.Styler at 0x219fbc3d0d0>
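
The `KeyError` above says the Styler needs unique column labels, and a side-by-side `pd.concat(..., axis=1)` of two tables with the same column names does not provide them. One possible fix, sketched here under the assumption that the labels `before` and `after` are acceptable names, is to pass `keys=` so each block gets its own outer column level. The counts are the `industrial_risk` 0.0 and 0.5 rows from the comparison table.

```python
import pandas as pd

# Two small count tables with identical column names
# (industrial_risk levels 0.0 and 0.5, before and after deduplication)
before = pd.DataFrame({"bankruptcy": [26, 28], "total": [80, 81]}, index=[0.0, 0.5])
after = pd.DataFrame({"bankruptcy": [6, 6], "total": [35, 36]}, index=[0.0, 0.5])

# Without keys, the column labels repeat
plain = pd.concat([before, after], axis=1)

# keys= adds an outer column level, so every (group, column) label is unique
comparison = pd.concat([before, after], axis=1, keys=["before", "after"])

print(plain.columns.is_unique)
print(comparison.columns.is_unique)
print(comparison)
```

The labelled table also makes it clear which three columns belong to which version of the data.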

In [135]:
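import pandas as pd

# Sample DataFrame (separate names, so the project's `df` and `data` are not overwritten)
sample = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}
sample_df = pd.DataFrame(sample)

# Function to bold a specific column: one CSS string per cell
def bold_column(column):
    return ['font-weight: bold'] * len(column)

# Apply styling to bold column 'B'
styled_df = sample_df.style.apply(bold_column, subset=['B'])

# Display the styled DataFrame
styled_df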


| | A | B | C |
|---|---|---|---|
| 0 | 1 | 4 | 7 |
| 1 | 2 | 5 | 8 |
| 2 | 3 | 6 | 9 |


In [137]:
distribution_comparision.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 18 entries, ('industrial_risk', 0.0) to ('operating_risk', 1.0)
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   bankruptcy      18 non-null     int64
 1   non-bankruptcy  18 non-null     int64
 2   total           18 non-null     int64
 3   bankruptcy      18 non-null     int64
 4   non-bankruptcy  18 non-null     int64
 5   total           18 non-null     int64
dtypes: int64(6)
memory usage: 1.8+ KB
