<a href="https://colab.research.google.com/github/yah12ak/YASH-AK-PORTFOLIO/blob/main/yash_EDA_project_Paisa_Bazaar_Banking_Fraud_Analysis_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Paisa Bazaar Credit Score Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Member  -** Yash Akarnia


# **Project Summary -**

The Paisa Bazaar Credit Score Analysis project aims to explore and understand the financial behavior of customers that influences their credit score. The dataset contains detailed information about each customer’s personal and financial profile, including income, occupation, number of bank accounts, number of credit cards, loan details, payment delays, and overall credit utilization. By performing Exploratory Data Analysis (EDA), the main objective is to extract meaningful patterns and insights that can help financial institutions assess customer risk and make better lending decisions.

Credit score is one of the most important metrics that determines a customer’s creditworthiness. A higher credit score generally represents a lower risk of default, while a low credit score indicates financial instability or irregular repayment behavior. In this project, we will explore how various factors such as annual income, number of loans, delayed payments, and outstanding debt contribute to a person’s credit score. Through statistical analysis and data visualization, we aim to identify key trends, anomalies, and relationships among these variables.

The dataset used in this project consists of 28 variables including customer ID, age, occupation, annual income, number of credit cards, credit mix, and credit history age. Before starting the analysis, the dataset will go through a data cleaning process where missing values, outliers, and inconsistent entries will be handled. This ensures that the data is accurate and ready for analysis. After cleaning, data transformation techniques such as encoding categorical variables and converting date columns into numerical formats will be applied.

The next step involves exploring distributions of key features and visualizing the relationships using graphs and plots. For example, we can observe how income levels vary across different credit score categories, or how the number of delayed payments affects the credit utilization ratio. These visual insights can help financial companies identify patterns among customers with poor or excellent credit scores.

The main goals of this project are to:

1. Understand the key factors affecting credit scores.
2. Identify trends in customer financial behavior.
3. Detect potential data anomalies that could impact analysis results.
4. Provide actionable insights for credit risk management and customer segmentation.

In conclusion, the Paisa Bazaar EDA project plays a crucial role in understanding how financial habits impact credit scores. The insights derived from this analysis can help businesses like Paisa Bazaar design better financial products, personalize offers, and make data-driven lending decisions. From a data science perspective, this project provides valuable experience in data cleaning, visualization, and feature interpretation, all essential skills for real-world financial analytics.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**



The main challenge in this project is to analyze and understand the different factors that affect a customer’s credit score using the given dataset from Paisa Bazaar. Credit score plays a vital role in determining whether a customer is eligible for loans, credit cards, or other financial services. However, many customers fail to maintain a good credit score because of irregular payments, high debt, multiple loans, or poor credit utilization.

Financial institutions need to accurately identify high-risk customers in order to minimize loan defaults and losses. Therefore, this project focuses on exploring various features such as income, loan types, payment behavior, credit history, and outstanding debt to understand how they influence a customer’s credit score.

The goal is to perform Exploratory Data Analysis (EDA) to find patterns and correlations among these factors. By analyzing these relationships, we can help businesses like Paisa Bazaar understand their customers better and build effective strategies for loan approval, interest rate determination, and customer support.

#### **Define Your Business Objective?**





The main business objective of the Paisa Bazaar Credit Score Analysis project is to identify and understand the key financial and behavioral factors that impact a customer’s credit score. This knowledge can help the company design better strategies for credit risk assessment, loan approvals, and personalized financial products.

By analyzing customer data such as income, number of bank accounts, credit utilization, and payment delays, the business can:

1. Classify customers into low, medium, or high credit risk categories.
2. Predict financial behavior and identify potential defaulters in advance.
3. Enhance decision-making for offering loans or credit cards based on reliable insights.
4. Personalize offers and financial advice based on a customer’s spending and repayment patterns.

In addition, understanding these insights helps Paisa Bazaar improve customer trust and satisfaction by providing responsible lending options. From a business perspective, this analysis not only reduces financial losses but also helps in maintaining long-term customer relationships through data-driven strategies.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries


# Import the NumPy library, commonly used for numerical operations
import numpy as np
# Import the Pandas library, essential for data manipulation and analysis
import pandas as pd
# Import the Seaborn library, used for statistical data visualization
import seaborn as sns
# Import the Matplotlib.pyplot
import matplotlib.pyplot as plt

# Import the warnings module to manage warning messages
import warnings
# Filter out (ignore) all warning messages to keep the output clean
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('/content/dataset.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?


After analyzing the dataset, we observed the following points:

The dataset contains 28 columns (the exact row count is reported by `df.shape` above) describing customer financial details.

Some columns have missing values and duplicate entries, which need cleaning before analysis.

Data types include both numerical (Age, Income, Debt, etc.) and categorical (Occupation, Credit Mix, Payment Behaviour).

The Credit_Score column is the target variable that shows whether a customer’s credit score is Good, Standard, or Poor.

Initial inspection suggests that data cleaning and transformation will be essential before visualization and modeling.
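
The checks above (data types, missing values, unique counts) can also be collected into one table. The following is a minimal sketch, not part of the original notebook: the helper name `summarize_dataframe` and the small `toy` frame are made up for illustration, and only standard pandas calls are used.

```python
import pandas as pd

def summarize_dataframe(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing %, unique count."""
    summary = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isnull().sum(),
        "missing_pct": (frame.isnull().mean() * 100).round(2),
        "unique": frame.nunique(),
    })
    # Columns with the most missing values come first
    return summary.sort_values("missing", ascending=False)

# Tiny made-up frame for illustration only (not the project data)
toy = pd.DataFrame({
    "Age": [23, None, 41, 35],
    "Occupation": ["Engineer", "Doctor", None, None],
    "Credit_Score": ["Good", "Poor", "Standard", "Good"],
})

overview = summarize_dataframe(toy)
print(overview)
print("Duplicate rows:", toy.duplicated().sum())
```

In the notebook, the same helper could be called as `summarize_dataframe(df)` right after loading the data.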

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description


ID : Unique identifier for each record

Customer_ID : Unique ID assigned to each customer

Month : Month of record collection

Name : Customer’s full name

Age : Customer’s age in years

SSN : Unique social security or identification number

Occupation : Customer’s occupation or job type

Annual_Income : Yearly income of the customer

Monthly_Inhand_Salary : Salary received in hand each month

Num_Bank_Accounts : Number of bank accounts held

Num_Credit_Card : Total number of active credit cards

Interest_Rate : Interest rate applied on loans/credit cards

Num_of_Loan : Total number of loans held by the customer

Type_of_Loan : Types of loans taken (home, car, personal, etc.)

Delay_from_due_date : Average delay days from due date

Num_of_Delayed_Payment : Total count of delayed payments

Changed_Credit_Limit : Percentage change in the customer's credit limit

Num_Credit_Inquiries : Number of credit inquiries made

Credit_Mix : Type of credit mix (Good/Standard/Poor)

Outstanding_Debt : Unpaid debt amount

Credit_Utilization_Ratio : Percentage of credit used vs limit

Credit_History_Age : Total age of customer’s credit history

Payment_of_Min_Amount : Whether customer pays only minimum due (Yes/No)

Total_EMI_per_month : Sum of all monthly EMIs

Amount_invested_monthly : Amount invested every month

Payment_Behaviour : Customer’s payment behaviour pattern

Monthly_Balance : Average remaining monthly balance

Credit_Score : Target variable — Good, Standard, or Poor

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique() # Display the number of unique values for each column

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
numeric_cols = [
    'Annual_Income','Monthly_Inhand_Salary','Outstanding_Debt',
    'Credit_Utilization_Ratio','Total_EMI_per_month',
    'Amount_invested_monthly','Monthly_Balance'
]

for col in numeric_cols:
    # Strip formatting characters before converting to numeric
    df[col] = (
        df[col].astype(str)                 # Convert column to string type
        .str.replace(',', '', regex=False)  # Remove commas (thousands separators)
        .str.replace('_', '', regex=False)  # Remove stray underscores
        .str.replace('₹', '', regex=False)  # Remove '₹' currency symbol
        .str.replace('$', '', regex=False)  # Remove '$' literally (as a regex, '$' is an end-of-string anchor)
    )
    # Convert the cleaned strings to numeric (float); anything unparseable, including a lone '-', becomes NaN.
    # The minus sign is deliberately kept: replacing '-' with '0' would silently turn negative values positive.
    df[col] = pd.to_numeric(df[col], errors='coerce')

In [None]:
# Converting Age column to numeric (non-numeric -> NaN)
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')

In [None]:
# Convert credit history age from text such as "15 Years 4 Months" to total months.
# In this dataset 'Credit_History_Age' was already numeric (float64); parsing numeric
# values as strings would turn every value into NaN, so numbers are returned unchanged.
import re

def convert_credit_age(value):
    if isinstance(value, (int, float, np.integer, np.floating)):
        return value # Already numeric (NaN included): keep as is
    try:
        numbers = re.findall(r'\d+', str(value)) # Extract the year and month numbers from the string
        years, months = int(numbers[0]), int(numbers[1])
        return years * 12 + months # Convert years to months and add the remaining months
    except (IndexError, ValueError):
        return np.nan # Return NaN if conversion fails

df['Credit_History_Age'] = df['Credit_History_Age'].apply(convert_credit_age)

In [None]:
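# Fixing loan type string formatting
loan_missing = df['Type_of_Loan'].isna() # Remember missing entries; astype(str) would turn them into the text 'nan'
df['Type_of_Loan'] = df['Type_of_Loan'].astype(str) # Convert column to string type
df['Type_of_Loan'] = df['Type_of_Loan'].str.replace(r',?\s+and\s+', ', ', regex=True) # Replace the word 'and' (not the letters inside other words) with ', '
df['Type_of_Loan'] = df['Type_of_Loan'].str.replace('_', ' ', regex=False) # Replace underscores with spaces for readability
df.loc[loan_missing, 'Type_of_Loan'] = np.nan # Restore missing entries so they can be imputed later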

In [None]:
# Fixing categorical columns formatting
cat_cols = [
    'Occupation','Credit_Mix','Payment_of_Min_Amount',
    'Payment_Behaviour','Credit_Score'
]

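for col in cat_cols:
    cleaned = (
        df[col].astype(str) # Convert the column to string type
        .str.strip() # Remove leading/trailing whitespace
        .str.lower() # Convert all characters to lowercase
        .str.replace(' ', '_', regex=False) ) # Replace spaces with underscores for consistent naming
    # Keep originally missing entries as NaN (astype(str) would store them as the text 'nan')
    df[col] = cleaned.where(df[col].notna())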

In [None]:
# Filling missing values: numeric columns → median, categorical columns → mode
# Results are assigned back to the column; fillna(inplace=True) on df[col] may act on a copy and leave df unchanged.

for col in df.columns:                                    # Iterate through each column in the DataFrame
    if df[col].isnull().all():                            # Skip columns with no observed values (no median or mode exists)
        continue
    if pd.api.types.is_numeric_dtype(df[col]):            # Check if the column is numerical
        df[col] = df[col].fillna(df[col].median())        # Fill missing numerical values with the column's median
    else:                                                 # Otherwise treat the column as categorical
        df[col] = df[col].fillna(df[col].mode()[0])       # Fill missing categorical values with the column's mode (most frequent value)

In [None]:
# Removing duplicate rows
df = df.drop_duplicates()

In [None]:
# Resetting the index after cleaning
df.reset_index(drop=True, inplace=True)

In [None]:
df.head() # Display the first 5 rows of the DataFrame

In [None]:
# Check missing values (should be all zeros)
df.isnull().sum()

In [None]:
# Check duplicate rows (should be 0)
df.duplicated().sum()

In [None]:
# Check data types (should be clean: int/float/object)
df.dtypes

In [None]:
df.head() #Dataset First Look

In [None]:
# Check unique values of categorical columns (clean format)
df[['Occupation','Credit_Mix','Payment_of_Min_Amount','Payment_Behaviour','Credit_Score']].nunique()

In [None]:
#Check Credit_History_Age NA count (should be 0)
df['Credit_History_Age'].isnull().sum()

### What all manipulations have you done and insights you found?


During the data wrangling process, several cleaning and transformation steps were applied to make the dataset ready for analysis. First, all numeric columns were cleaned by removing commas, currency symbols, and invalid characters. These columns were then successfully converted into proper numerical format. The Age column and multiple financial columns also required numeric conversion, which ensured accurate calculations during analysis.

The “Credit_History_Age” column was passed through a custom function meant to convert text such as “15 Years 5 Months” into total months. As noted in the code comments, the column was already stored as a number (float64) in this dataset, so no string parsing was actually needed. Several categorical columns such as Occupation, Credit_Mix, Payment_of_Min_Amount, Payment_Behaviour, and Credit_Score had inconsistent formatting (extra spaces, uppercase/lowercase mismatch, mixed words). These were standardized by stripping spaces, converting to lowercase, and replacing internal spaces with underscores.
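
As a worked example of the two transformations described above, the sketch below converts credit-history text to months and standardizes a category label. The function names `credit_age_to_months` and `standardize_label` are hypothetical, the sample values are made up, and the regular expression assumes the text contains a year count followed by a month count.

```python
import re
import numpy as np

def credit_age_to_months(value):
    """Convert '15 Years 5 Months' style text to total months; pass numbers through."""
    if isinstance(value, (int, float, np.integer, np.floating)):
        return float(value)  # Already numeric (NaN stays NaN)
    match = re.search(r"(\d+)\s*Years?\D*(\d+)\s*Months?", str(value), flags=re.IGNORECASE)
    if match is None:
        return np.nan  # Text that does not follow the expected pattern
    years, months = int(match.group(1)), int(match.group(2))
    return float(years * 12 + months)

def standardize_label(text):
    """Strip spaces, lowercase, and replace inner spaces with underscores."""
    return str(text).strip().lower().replace(" ", "_")

# Made-up sample values for illustration
for raw in ["15 Years 5 Months", "22 Years and 1 Months", 185.0, "NA"]:
    print(repr(raw), "->", credit_age_to_months(raw))

print(standardize_label("  High spent Small value payments "))
```

Returning numeric input unchanged is a deliberate choice here: it keeps the function safe to run on a column that has already been converted.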

Missing values in numerical columns were filled using the median, while categorical columns were filled using the mode. Duplicate rows were also removed. After wrangling, the dataset is consistent and ready for further Exploratory Data Analysis (EDA). One key insight was that multiple columns contained formatting issues and hidden special characters, which had to be handled before numeric conversion. Another observation concerns Credit_History_Age: applying a string parser to an already-numeric column turns every value into NaN, so the column's data type should be checked before any conversion.
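
The median/mode imputation described above can be sketched as a small function that returns a new DataFrame instead of modifying columns in place. This is an illustrative sketch: `impute_median_mode` and the `toy` frame are hypothetical names and data, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def impute_median_mode(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric NaNs filled by the median and other NaNs by the mode."""
    result = frame.copy()
    for col in result.columns:
        if result[col].isna().all():
            continue  # No observed values, so no median or mode to use
        if is_numeric_dtype(result[col]):
            result[col] = result[col].fillna(result[col].median())
        else:
            result[col] = result[col].fillna(result[col].mode().iloc[0])
    return result

# Made-up data for illustration only
toy = pd.DataFrame({
    "Outstanding_Debt": [100.0, np.nan, 300.0, 500.0],
    "Credit_Mix": ["good", "standard", None, "standard"],
})

filled = impute_median_mode(toy)
print(filled)
```

The median is used for numeric columns because it is less sensitive to extreme values than the mean, which matters for skewed financial variables.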

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(8,5)) # figure size
sns.countplot(x='Credit_Score', data=df, palette='viridis') # Create a count plot for Credit_Score
plt.title('Credit Score Distribution') # Set the title of the plot
plt.xlabel('Credit Score') # Set the label for the x-axis
plt.ylabel('Count') # Set the label for the y-axis
plt.show() # Show plot

##### 1. Why did you pick the specific chart?


To understand how customers are distributed across credit score categories (good, standard, poor). This is the target variable and forms the base of our analysis.

##### 2. What is/are the insight(s) found from the chart?



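The three categories are not equally represented: one category clearly dominates the distribution, which shows whether most customers have poor, standard, or good credit behavior and indicates that the target variable is imbalanced.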

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



Helps identify whether the platform deals with more risky customers (poor score) or financially stable customers (good score). Useful for loan eligibility rules.
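
To put numbers next to the count plot, the share of each credit score class can be computed directly. The sketch below is an illustration only: `class_shares` is a hypothetical helper and `toy_scores` is made-up data, so the printed shares say nothing about the real dataset.

```python
import pandas as pd

def class_shares(labels: pd.Series) -> pd.DataFrame:
    """Return the count and percentage share of each category, largest first."""
    counts = labels.value_counts()
    table = counts.to_frame("count")
    table["share_pct"] = (counts / counts.sum() * 100).round(1)
    return table

# Made-up labels for illustration only
toy_scores = pd.Series(
    ["standard", "poor", "standard", "good", "standard",
     "poor", "standard", "good", "standard", "poor"]
)

table = class_shares(toy_scores)
print(table)
```

In the notebook this could be called as `class_shares(df['Credit_Score'])` to state exactly how dominant the largest class is.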

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Create a count plot for Occupation vs Credit Score
plt.figure(figsize=(12,6))
sns.countplot(x='Occupation', hue='Credit_Score', data=df)
plt.title('Occupation vs Credit Score')
plt.xticks(rotation=45)
plt.xlabel('Occupation')
plt.ylabel('count')
plt.show()

##### 1. Why did you pick the specific chart?

To see which occupations have better or worse credit scores.

##### 2. What is/are the insight(s) found from the chart?

Some job roles show higher poor score counts, meaning unstable financial behavior.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Banks can assign occupation-based risk levels to reduce loan defaults.
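
Raw counts in a grouped count plot depend on how many customers each occupation has, so comparing occupations is easier with row percentages. The following is a minimal sketch using `pd.crosstab` with `normalize='index'`; the helper name `score_mix_by_group` and the `toy` data are assumptions for illustration.

```python
import pandas as pd

def score_mix_by_group(frame: pd.DataFrame, group_col: str,
                       score_col: str = "Credit_Score") -> pd.DataFrame:
    """Percentage of each credit score class within every group (rows sum to 100)."""
    mix = pd.crosstab(frame[group_col], frame[score_col], normalize="index")
    return (mix * 100).round(1)

# Made-up data for illustration only
toy = pd.DataFrame({
    "Occupation": ["doctor"] * 4 + ["engineer"] * 2,
    "Credit_Score": ["good", "good", "poor", "standard", "poor", "poor"],
})

mix = score_mix_by_group(toy, "Occupation")
print(mix)
```

Sorting the result by the `poor` column, for example `mix.sort_values('poor', ascending=False)`, would list the occupations with the highest share of poor scores first.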

#### Chart - 3

In [None]:
# Chart - 3 visualization code

# Create a histogram with KDE for Annual_Income distribution
plt.figure(figsize=(10,5))
sns.histplot(df['Annual_Income'], kde=True)
plt.title('Annual Income Distribution')
plt.xlabel('Annual_Income')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

To understand income levels across customers.

##### 2. What is/are the insight(s) found from the chart?

Income is skewed; the majority of users fall in the mid-income range.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Loan limits and credit card eligibility depend on income categories
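
The skew seen in the histogram can be quantified with the sample skewness, and a log transform is one common way to reduce it before plotting. The sketch below is illustrative: `describe_skew` is a hypothetical helper, `toy_income` is made-up data, and `np.log1p` assumes non-negative values.

```python
import numpy as np
import pandas as pd

def describe_skew(values: pd.Series) -> dict:
    """Compare mean, median, and skewness before and after a log1p transform."""
    clean = values.dropna()
    return {
        "mean": clean.mean(),
        "median": clean.median(),
        "skew": clean.skew(),                   # > 0 means a long right tail
        "skew_log1p": np.log1p(clean).skew(),   # skewness after log(1 + x)
    }

# Made-up incomes with one very large value, for illustration only
toy_income = pd.Series([20_000, 25_000, 30_000, 32_000, 35_000, 40_000, 45_000, 400_000])

stats = describe_skew(toy_income)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

A mean well above the median together with a positive skewness value supports the "right-skewed" reading of the chart.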

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Create a box plot to compare Monthly_Inhand_Salary across different Credit_Score categories
plt.figure(figsize=(8,5))
sns.boxplot(x='Credit_Score', y='Monthly_Inhand_Salary', data=df)
plt.title('Salary vs Credit Score')
plt.xlabel('Credit_Score')
plt.ylabel('Monthly_Inhand_Salary')
plt.show()

##### 1. Why did you pick the specific chart?

Boxplot helps compare salary differences across credit score classes.

##### 2. What is/are the insight(s) found from the chart?

Higher salary customers are mostly in the good credit score category.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

High-salary customers are safer for loan approvals.
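
The box plot comparison can be backed by a small summary table of salary per credit score class. This is a sketch under stated assumptions: `numeric_summary_by_score` is a hypothetical helper and the `toy` values are invented for illustration.

```python
import pandas as pd

def numeric_summary_by_score(frame: pd.DataFrame, value_col: str,
                             score_col: str = "Credit_Score") -> pd.DataFrame:
    """Count, median, and mean of a numeric column within each credit score class."""
    return frame.groupby(score_col)[value_col].agg(["count", "median", "mean"]).round(2)

# Made-up data for illustration only
toy = pd.DataFrame({
    "Credit_Score": ["good", "good", "standard", "standard", "poor", "poor"],
    "Monthly_Inhand_Salary": [6000.0, 8000.0, 4000.0, 5000.0, 2000.0, 3000.0],
})

summary = numeric_summary_by_score(toy, "Monthly_Inhand_Salary")
print(summary)
```

The same helper applies to the other numeric-versus-score charts in this section, for example with `Num_of_Loan`, `Outstanding_Debt`, or `Total_EMI_per_month` as `value_col`.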

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# This chart visualizes the distribution of the number of bank accounts held by customers.
plt.figure(figsize=(8,5))
sns.histplot(df['Num_Bank_Accounts'], kde=True)
plt.title('Number of Bank Accounts Distribution')
plt.xlabel('Num_Bank_Accounts')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

To understand how many bank accounts customers typically hold.

##### 2. What is/are the insight(s) found from the chart?

Most users have 2–4 accounts

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Multiple accounts show financial diversification and stability.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# The number of loans across different credit score categories.
plt.figure(figsize=(10,5))
sns.boxplot(x='Credit_Score', y='Num_of_Loan', data=df)
plt.title('Loans Count vs Credit Score')
plt.xlabel('Credit_Score')
plt.ylabel('Num_of_Loan')
plt.show()

##### 1. Why did you pick the specific chart?

To check whether a higher number of loans is associated with a poor credit score.

##### 2. What is/are the insight(s) found from the chart?

Poor score customers generally have higher loan counts.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

High loan overload customers are high-risk.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# This chart visualizes the distribution of various types of loans taken by customers.
plt.figure(figsize=(12,6))
sns.countplot(y='Type_of_Loan', data=df)
plt.title('Type of Loan Distribution')
plt.xlabel('Count')
plt.ylabel('Type of Loan')
plt.show()

##### 1. Why did you pick the specific chart?

To see which loan types are most common.

##### 2. What is/are the insight(s) found from the chart?

Personal loans and auto loans are most frequent.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps decide which products to promote.
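
If a `Type_of_Loan` cell lists several loans for one customer (an assumption about the data suggested by the comma and "and" handling in the wrangling step), counting whole cells mixes combinations together. The sketch below splits each cell into individual loan types before counting; `count_individual_loans` and `toy_loans` are hypothetical.

```python
import pandas as pd

def count_individual_loans(loan_lists: pd.Series) -> pd.Series:
    """Count each loan type separately when a cell lists several loans."""
    individual = (
        loan_lists.dropna()
        .str.replace(r"\band\b", ",", regex=True)  # Treat the word "and" as a separator
        .str.split(",")
        .explode()                                  # One row per listed loan
        .str.strip()
    )
    # Drop empty pieces and the text 'nan' that astype(str) produces for missing values
    individual = individual[~individual.isin(["", "nan"])]
    return individual.value_counts()

# Made-up values for illustration only
toy_loans = pd.Series([
    "Auto Loan, Personal Loan, and Home Equity Loan",
    "Personal Loan",
    "Auto Loan and Personal Loan",
    None,
])

loan_counts = count_individual_loans(toy_loans)
print(loan_counts)
```

The resulting series can be plotted with `loan_counts.plot(kind='barh')` to get one bar per loan type rather than one bar per combination.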

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# This chart visualizes the distribution of the delay from the due date for payments.
plt.figure(figsize=(10,5))
sns.histplot(df['Delay_from_due_date'], kde=True)
plt.title('Delay from Due Date Distribution')
plt.xlabel('Delay from Due Date')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

To observe repayment discipline

##### 2. What is/are the insight(s) found from the chart?

Most customers delay within 0–10 days.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

More delays = higher default risk.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# This chart compares the number of delayed payments against credit scores.
plt.figure(figsize=(8,5))
sns.boxplot(x='Credit_Score', y='Num_of_Delayed_Payment', data=df)
plt.title('Delayed Payments vs Credit Score')
plt.xlabel('Credit_Score')
plt.ylabel('Num_of_Delayed_Payment')
plt.show()

##### 1. Why did you pick the specific chart?

To see if delayed payments affect credit score.

##### 2. What is/are the insight(s) found from the chart?

Poor score group has highest delayed payments.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Delayed payments predict default probability.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# This chart visualizes the distribution of credit mix categories (Good, Standard, Poor).
plt.figure(figsize=(8,5))
sns.countplot(x='Credit_Mix', data=df)
plt.title('Credit Mix Distribution')
plt.xlabel('Credit_Mix')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

Shows how many customers have good/standard/poor credit mix.

##### 2. What is/are the insight(s) found from the chart?

Most customers fall under standard mix

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Credit mix helps assess customer financial discipline.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# This chart visualizes the distribution of the credit utilization ratio.
plt.figure(figsize=(10,5))
sns.histplot(df['Credit_Utilization_Ratio'], kde=True)
plt.title('Credit Utilization Ratio Distribution')
plt.xlabel('Credit_Utilization_Ratio')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

Utilization ratio strongly affects credit score

##### 2. What is/are the insight(s) found from the chart?

Customers with >50% utilization show risky patterns

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

High utilization customers are less eligible for new loans
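
A histogram alone does not show how utilization relates to credit score, so the 50% threshold mentioned above can be checked per class. The following sketch is illustrative only: `high_utilization_share` is a hypothetical helper, the 50% cut-off is taken from the text as a parameter, and the `toy` values are made up.

```python
import pandas as pd

def high_utilization_share(frame: pd.DataFrame, threshold: float = 50.0,
                           ratio_col: str = "Credit_Utilization_Ratio",
                           score_col: str = "Credit_Score") -> pd.Series:
    """Percentage of customers above the utilization threshold in each score class."""
    is_high = frame[ratio_col] > threshold  # Missing ratios compare as False
    return (is_high.groupby(frame[score_col]).mean() * 100).round(1)

# Made-up data for illustration only
toy = pd.DataFrame({
    "Credit_Score": ["good", "good", "poor", "poor", "poor", "standard"],
    "Credit_Utilization_Ratio": [25.0, 30.0, 55.0, 60.0, 40.0, 51.0],
})

shares = high_utilization_share(toy)
print(shares)
```

If the shares turn out to be similar across classes on the real data, the threshold is not separating risky customers and the insight should be revised.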

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# This chart visualizes the relationship between outstanding debt and credit score.
plt.figure(figsize=(8,5))
sns.boxplot(x='Credit_Score', y='Outstanding_Debt', data=df)
plt.title('Debt vs Credit Score')
plt.xlabel('Credit_Score')
plt.ylabel('Outstanding_Debt')
plt.show()

##### 1. Why did you pick the specific chart?

Debt amount heavily influences credit score

##### 2. What is/are the insight(s) found from the chart?

Poor score users have significantly higher debt.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Debt levels tell which customers are more likely to default.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# This chart visualizes the distribution of customers' monthly balance.
plt.figure(figsize=(10,5))
sns.histplot(df['Monthly_Balance'], kde=True)
plt.title('Monthly Balance Distribution')
plt.xlabel('Monthly_Balance')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

To see how much money customers keep as leftover monthly.

##### 2. What is/are the insight(s) found from the chart?

Low balance = higher risk category.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps credit companies identify financially stressed customers.

#### Chart - 14

In [None]:
# Chart - 14 visualization code
# This chart visualizes the distribution of age against credit scores.
plt.figure(figsize=(8,5))
sns.boxplot(x='Credit_Score', y='Age', data=df)
plt.title('Age vs Credit Score')
plt.xlabel('Credit_Score')
plt.ylabel('Age')
plt.show()

##### 1. Why did you pick the specific chart?

To understand how age group impacts customer credit score and financial maturity.

##### 2. What is/are the insight(s) found from the chart?

Middle-aged customers generally have better credit scores compared to younger customers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Age-based risk segmentation can improve loan approval accuracy.
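
Age-based segmentation can be sketched with `pd.cut`, which assigns each customer to an age band that can then be compared across credit score classes. The cut points and labels below are hypothetical choices for illustration, as are the helper `add_age_band` and the `toy` data.

```python
import pandas as pd

def add_age_band(frame: pd.DataFrame, age_col: str = "Age") -> pd.DataFrame:
    """Return a copy with an 'Age_Band' column built from hypothetical cut points."""
    bins = [0, 25, 35, 45, 55, 120]                  # Right-inclusive edges: (0, 25], (25, 35], ...
    labels = ["<=25", "26-35", "36-45", "46-55", "56+"]
    result = frame.copy()
    result["Age_Band"] = pd.cut(result[age_col], bins=bins, labels=labels)
    return result

# Made-up data for illustration only
toy = pd.DataFrame({
    "Age": [22, 25, 30, 40, 50, 60],
    "Credit_Score": ["poor", "poor", "standard", "good", "good", "standard"],
})

banded = add_age_band(toy)
# Share of each credit score class within every age band (rows sum to 100)
mix = (pd.crosstab(banded["Age_Band"], banded["Credit_Score"], normalize="index") * 100).round(1)
print(mix)
```

Ages outside the bin range (for example 0 or implausibly large values) are left as missing in `Age_Band`, which also makes data-entry errors in `Age` easy to spot.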

#### Chart - 15

In [None]:
# Chart - 15 visualization code
# This chart visualizes the relationship between total EMI per month and credit score.
plt.figure(figsize=(8,5))
sns.boxplot(x='Credit_Score', y='Total_EMI_per_month', data=df)
plt.title('EMI vs Credit Score')
plt.xlabel('Credit_Score')
plt.ylabel('Total_EMI_per_month')
plt.show()

##### 1. Why did you pick the specific chart?

To evaluate repayment burden across credit score categories.

##### 2. What is/are the insight(s) found from the chart?

Higher EMI burden is observed among poor credit score customers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

High EMI customers should be offered lower exposure loans.

#### Chart - 16

In [None]:
# Chart - 16 visualization code
# This chart visualizes the distribution of interest rates.
plt.figure(figsize=(8,5))
sns.histplot(df['Interest_Rate'], kde=True)
plt.title('Interest Rate Distribution')
plt.xlabel('Interest_Rate')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

To understand interest rate exposure across customers.

##### 2. What is/are the insight(s) found from the chart?

Most customers fall within mid-range interest rates.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Interest rate adjustment can be aligned with customer risk.

#### Chart - 17

In [None]:
# Chart - 17 visualization code
# This chart visualizes how minimum payment behavior affects credit scores.
plt.figure(figsize=(8,5))
sns.countplot(x='Payment_of_Min_Amount', hue='Credit_Score', data=df)
plt.title('Min Payment Behavior vs Credit Score')
plt.xlabel('Payment_of_Min_Amount')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

Minimum payment behavior reflects repayment seriousness.

##### 2. What is/are the insight(s) found from the chart?

Customers who consistently pay minimum amount show poorer credit scores.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Minimum payment behavior can be an early warning indicator.

#### Chart - 18

In [None]:
# Chart - 18 visualization code
# This chart visualizes the relationship between credit utilization ratio and monthly balance, colored by credit score.
plt.figure(figsize=(8,5))
sns.scatterplot(
    x='Credit_Utilization_Ratio',
    y='Monthly_Balance',
    hue='Credit_Score',
    data=df)
plt.title('Monthly Balance vs Credit Utilization Ratio')
plt.xlabel('Credit_Utilization_Ratio')
plt.ylabel('Monthly_Balance')
plt.show()

##### 1. Why did you pick the specific chart?

To understand how credit utilization affects monthly savings behavior of customers.

##### 2. What is/are the insight(s) found from the chart?

Customers with high credit utilization generally have lower monthly balance, and many of them fall under poor credit score category.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Customers showing high utilization and low balance can be flagged as high-risk and monitored closely before offering additional credit
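
One possible way to turn "high utilization and low balance" into a flag is to use quantile cut-offs. The sketch below is only an illustration of that idea: the helper `flag_high_risk`, the column `High_Risk_Flag`, and the 75th/25th percentile thresholds are all assumptions, not rules from the project or from Paisa Bazaar.

```python
import pandas as pd

def flag_high_risk(frame: pd.DataFrame, util_q: float = 0.75,
                   balance_q: float = 0.25) -> pd.DataFrame:
    """Flag rows with utilization in the top quartile and balance in the bottom quartile."""
    util_cut = frame["Credit_Utilization_Ratio"].quantile(util_q)
    balance_cut = frame["Monthly_Balance"].quantile(balance_q)
    flagged = (
        (frame["Credit_Utilization_Ratio"] >= util_cut)
        & (frame["Monthly_Balance"] <= balance_cut)
    )
    return frame.assign(High_Risk_Flag=flagged)

# Made-up data for illustration only
toy = pd.DataFrame({
    "Credit_Utilization_Ratio": [20.0, 25.0, 30.0, 35.0, 45.0],
    "Monthly_Balance": [500.0, 400.0, 300.0, 200.0, 100.0],
})

flagged = flag_high_risk(toy)
print(flagged)
```

Quantile-based cut-offs adapt to the data at hand; fixed business thresholds could be substituted if the lender already has them.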

#### Chart - 19

In [None]:
# Chart - 19 visualization code
# This chart visualizes the relationship between the number of delayed payments and outstanding debt, colored by credit score.
plt.figure(figsize=(8,5))
sns.scatterplot(
    x='Num_of_Delayed_Payment',
    y='Outstanding_Debt',
    hue='Credit_Score',
    data=df)
plt.title('Delayed Payments vs Outstanding Debt')
plt.xlabel('Num_of_Delayed_Payment')
plt.ylabel('Outstanding_Debt')
plt.show()

##### 1. Why did you pick the specific chart?

To analyze whether customers with higher debt tend to delay payments more frequently.

##### 2. What is/are the insight(s) found from the chart?

Customers having high outstanding debt usually show a higher number of delayed payments and are mostly associated with poor credit scores.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This combination acts as a strong risk indicator and can be used to design early intervention strategies to reduce defaults.

#### Chart - 20 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# This chart visualizes the correlation matrix of numerical features using a heatmap.
# Correlation Heatmap
plt.figure(figsize=(14,8))
numeric_df = df.select_dtypes(include=['int64','float64'])
sns.heatmap(numeric_df.corr(), cmap='coolwarm', annot=False)
plt.title("Correlation Heatmap of Numerical Features")
plt.show()


##### 1. Why did you pick the specific chart?

The correlation heatmap helps visualize the strength of relationships between all numerical variables at once. It provides a clear overview of which features move together, which are independent, and which may influence each other. This chart is useful for identifying patterns, multicollinearity, and understanding which variables may impact credit score-related behavior.

##### 2. What is/are the insight(s) found from the chart?

Some variables show positive correlations, such as Outstanding Debt ↔ Total EMI per month.

Some variables show negative correlations, such as Credit Utilization Ratio ↔ Monthly Balance.

Most financial variables have moderate-to-low correlation with one another, suggesting that each captures a distinct aspect of customer behavior.

No extremely high correlations were found, meaning multicollinearity is not a major concern.
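Because the heatmap is drawn with `annot=False`, the multicollinearity claim is easier to verify by listing the strongest pairs explicitly. The following is a minimal sketch under stated assumptions: `top_correlated_pairs` is a hypothetical helper, the 0.8 default cutoff is a common rule of thumb and not a value from this project, and the column name `Total_EMI_per_month` and the toy data are assumed for the example.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(data: pd.DataFrame, threshold: float = 0.8) -> pd.Series:
    """List feature pairs whose absolute Pearson correlation meets the threshold."""
    corr = data.select_dtypes(include='number').corr()
    # Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    # NaN entries from the masked triangle fail this comparison and are dropped
    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Toy data for illustration only (not taken from the project dataset)
toy = pd.DataFrame({
    'Outstanding_Debt': [100, 200, 300, 400, 500],
    'Total_EMI_per_month': [10, 20, 30, 40, 50],
    'Monthly_Balance': [500, 300, 400, 100, 200],
})
pairs = top_correlated_pairs(toy, threshold=0.9)
print(pairs)
```

If `top_correlated_pairs(numeric_df)` returns an empty Series on the real data, that supports the statement that no extremely high correlations are present. Any pair it does return would be a candidate for dropping or combining before modeling.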

#### Chart - 21 - Pair Plot

In [None]:
# Pair Plot visualization code
selected_columns = ['Annual_Income', 'Monthly_Inhand_Salary',
                    'Num_Bank_Accounts', 'Num_of_Loan',
                    'Outstanding_Debt', 'Monthly_Balance']

sns.pairplot(df[selected_columns], diag_kind='kde')
plt.suptitle("Pair Plot of Key Numerical Variables", y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

The pair plot is used to analyze the pairwise relationships between multiple numerical features. It visually shows how each variable relates to others through scatter plots and distributions. This helps identify trends, clusters, and potential patterns in customer financial behavior.

##### 2. What is/are the insight(s) found from the chart?

Some variables show clear linear patterns (e.g., income vs monthly balance).

Higher outstanding debt is loosely associated with lower monthly balance.

Distributions reveal that financial variables are moderately skewed.

The absence of strong clusters suggests a diverse customer base with varied financial habits.
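The skewness observation from the KDE diagonals can be quantified with the sample skewness of each column. This is a minimal sketch: `skew_report` is a hypothetical helper and the toy data is made up for illustration. As a commonly used rule of thumb (an assumption here, not a result of this analysis), absolute values between roughly 0.5 and 1 are read as moderate skew and values above 1 as strong skew.

```python
import pandas as pd

def skew_report(data: pd.DataFrame, columns: list) -> pd.Series:
    """Sample skewness per column, sorted by absolute value (largest first)."""
    skew = data[columns].skew()
    return skew.sort_values(key=lambda s: s.abs(), ascending=False)

# Toy data for illustration only (not taken from the project dataset)
toy = pd.DataFrame({
    'Annual_Income': [1.0, 1.0, 1.0, 2.0, 10.0],  # long right tail
    'Num_of_Loan': [1.0, 2.0, 3.0, 4.0, 5.0],     # symmetric
})
report = skew_report(toy, ['Annual_Income', 'Num_of_Loan'])
print(report)
```

On the notebook's data, `skew_report(df, selected_columns)` would show which of the pair plot variables depart most from symmetry and may benefit from a log transform before modeling.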

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Based on the exploratory data analysis, the following recommendations can help the client achieve the business objective effectively. Customers with poor credit scores were observed to have higher outstanding debt, frequent delayed payments, and high credit utilization ratios. Such customers should be categorized as high-risk and offered lower loan limits or secured loan products to minimize default risk. Customers with high credit utilization should be encouraged to optimize their credit usage through credit limit adjustments and financial awareness initiatives. Occupation-based lending policies can be implemented, as certain occupation groups demonstrate higher credit risk patterns. Customers with low monthly balances should be offered shorter loan tenures and controlled exposure to ensure manageable repayment. Automated payment reminders and proactive credit monitoring can improve repayment discipline and reduce delays. Additionally, regular review of customer credit behavior can help the business identify early warning signals and take preventive action before defaults occur.

# **Conclusion**

In this exploratory data analysis, we analyzed customer financial behavior using multiple demographic and credit-related variables. The analysis showed that delayed payments, high outstanding debt, a high credit utilization ratio, and a low monthly balance are all associated with poorer credit scores. Customers with stable income, a good credit mix, and disciplined repayment behavior were found to be financially more reliable. Through detailed visualizations and data wrangling, meaningful patterns and risk indicators were identified. These insights can help the business make informed credit decisions, reduce default risk, and design more effective lending strategies. Overall, this EDA provides a data-driven foundation for improving customer credit assessment and business performance.
