#Demo 2: Scaling Features Using StandardScaler and MinMaxScaler from Scikit-learn


##**Scenario: Loan Eligibility Prediction**

A banking institution wants to develop a machine learning model to predict whether a loan applicant is eligible for a loan. The dataset contains customer financial details, such as income, loan amount, credit score, and debt-to-income ratio. However, these features have different scales:

* Income is in dollars and ranges from tens of thousands to over a hundred thousand (e.g., 30,000 to 150,000).

* Loan Amount ranges from a few thousand to hundreds of thousands.

* Credit Score is typically between 300 and 850.

* Debt-to-Income Ratio is a decimal between 0 and 1.

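Many machine learning models, especially distance-based methods (such as k-nearest neighbors) and models trained with gradient-based optimization, are sensitive to feature scale: features with large numeric ranges can dominate the others. Feature scaling puts the features on a comparable footing and generally improves convergence during optimization.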
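
To make the scale problem concrete, here is a minimal sketch using two hypothetical applicants (the values are invented for illustration and are not taken from the dataset). It measures how much each unscaled feature contributes to the squared Euclidean distance between them.

```python
import numpy as np

# Hypothetical applicants: [Income, LoanAmount, CreditScore, DebtToIncomeRatio]
a = np.array([45000.0, 50000.0, 780.0, 0.20])
b = np.array([46000.0, 50000.0, 400.0, 0.90])

# Squared per-feature differences and the resulting Euclidean distance
squared = (a - b) ** 2
distance = np.sqrt(squared.sum())

# Fraction of the squared distance contributed by each feature
share = squared / squared.sum()

for name, s in zip(["Income", "LoanAmount", "CreditScore", "DebtToIncomeRatio"], share):
    print(f"{name:>18}: {s:.6f}")
```

In this sketch a 1,000-dollar income gap accounts for roughly 87% of the squared distance, while a 380-point credit score gap and a 0.70 gap in debt-to-income ratio contribute far less, even though the latter two are arguably the more meaningful differences.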

##**Objective**
* Apply StandardScaler (Z-score standardization) to center each feature at mean 0 and scale it to unit variance. This changes a feature's location and scale, not the shape of its distribution.

* Apply MinMaxScaler (Min-Max normalization) to rescale each feature linearly to the range 0 to 1, preserving the relative spacing between values.

* Compare the effect of different scaling techniques on data distribution and machine learning performance.
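
The two transformations can be written out by hand: Z-score scaling computes `(x - mean) / std` per column, and Min-Max scaling computes `(x - min) / (max - min)` per column. The sketch below checks those formulas against scikit-learn, using only the Income and CreditScore values from the five rows printed by `df.head()`. Because it is fitted on five rows rather than the full dataset, its numbers will not match the scaled values shown in the notebook output.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Income and CreditScore from the five rows shown by df.head()
X = np.array([
    [45795, 771],
    [30860, 362],
    [133694, 438],
    [149879, 798],
    [140268, 691],
], dtype=float)

# Z-score: subtract the column mean, divide by the population std (ddof=0)
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-Max: shift by the column minimum, divide by the column range
mm_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

z_sklearn = StandardScaler().fit_transform(X)
mm_sklearn = MinMaxScaler().fit_transform(X)

print("Z-score matches StandardScaler:", np.allclose(z_manual, z_sklearn))
print("Min-Max matches MinMaxScaler:  ", np.allclose(mm_manual, mm_sklearn))
```

Note that StandardScaler divides by the population standard deviation (`ddof=0`), which is NumPy's default for `std` but not pandas' default.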

In [2]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

In [3]:
# Load the dataset
df = pd.read_csv("loan_eligibility_dataset.csv")

# Display first few rows to understand the dataset
print("Initial Dataset:\n", df.head())

Initial Dataset:
    CustomerID  Income  LoanAmount  CreditScore  DebtToIncomeRatio LoanApproved
0           1   45795       46606          771               0.27           No
1           2   30860       92313          362               0.65           No
2           3  133694      146699          438               0.00          Yes
3           4  149879       13792          798               0.35           No
4           5  140268      179073          691               0.30          Yes


In [4]:
# Selecting numerical columns for scaling
numerical_features = ["Income", "LoanAmount", "CreditScore", "DebtToIncomeRatio"]

In [5]:
# Extracting only numerical features for scaling
df_numerical = df[numerical_features]

In [7]:
# Initialize StandardScaler
standard_scaler = StandardScaler()

# Apply StandardScaler transformation
df_standard_scaled = standard_scaler.fit_transform(df_numerical)

# Convert back to DataFrame
df_standard_scaled = pd.DataFrame(df_standard_scaled, columns=numerical_features)

In [8]:
# Display scaled data
print("\nStandard Scaled Data (Z-score normalization):\n", df_standard_scaled.head())


Standard Scaled Data (Z-score normalization):
      Income  LoanAmount  CreditScore  DebtToIncomeRatio
0 -1.271665   -0.744736     1.113694          -0.729006
1 -1.698694    0.026258    -1.251561           0.709323
2  1.241592    0.943651    -0.812051          -1.750976
3  1.704363   -1.298248     1.269835          -0.426200
4  1.429560    1.489741     0.651052          -0.615453
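
A quick sanity check after standardizing is to confirm that every column has mean 0 and standard deviation 1. The sketch below does this on synthetic data (randomly generated stand-ins, not the loan dataset) so that it runs on its own. In the notebook, the same two checks could be applied to `df_standard_scaled`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the ranges loosely mimic the scenario
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Income": rng.integers(30_000, 150_000, size=200),
    "CreditScore": rng.integers(300, 850, size=200),
})

scaled = pd.DataFrame(StandardScaler().fit_transform(demo), columns=demo.columns)

# Means should be ~0 (up to floating-point error)
means = scaled.mean()

# Use ddof=0 to match StandardScaler; pandas defaults to ddof=1,
# which would give values slightly above 1
stds = scaled.std(ddof=0)

print(means.round(6))
print(stds.round(6))
```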


In [9]:
# Initialize MinMaxScaler
minmax_scaler = MinMaxScaler()

# Apply MinMaxScaler transformation
df_minmax_scaled = minmax_scaler.fit_transform(df_numerical)

# Convert back to DataFrame
df_minmax_scaled = pd.DataFrame(df_minmax_scaled, columns=numerical_features)

In [10]:
# Display scaled data
print("\nMin-Max Scaled Data (Range 0-1):\n", df_minmax_scaled.head())


Min-Max Scaled Data (Range 0-1):
      Income  LoanAmount  CreditScore  DebtToIncomeRatio
0  0.126152    0.209486     0.861624               0.27
1  0.000764    0.445381     0.107011               0.65
2  0.864117    0.726068     0.247232               0.00
3  1.000000    0.040132     0.911439               0.35
4  0.919310    0.893151     0.714022               0.30
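
A fitted MinMaxScaler stores the per-column minimum and maximum it saw (`data_min_`, `data_max_`), and `inverse_transform` maps scaled values back to the original units. The sketch below illustrates both on the five Income and CreditScore values printed by `df.head()`. It is fitted on those five rows only, so its output differs from the notebook's, which is fitted on the full dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Income and CreditScore from the five rows shown by df.head()
X = np.array([
    [45795, 771],
    [30860, 362],
    [133694, 438],
    [149879, 798],
    [140268, 691],
], dtype=float)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Per-column extremes learned during fit
print("data_min_:", scaler.data_min_)
print("data_max_:", scaler.data_max_)

# Every column now spans exactly [0, 1]
print("scaled min:", X_scaled.min(axis=0), "scaled max:", X_scaled.max(axis=0))

# Recover the original units
restored = scaler.inverse_transform(X_scaled)
print("round trip OK:", np.allclose(restored, X))
```

In the notebook output above, the DebtToIncomeRatio values appear unchanged after Min-Max scaling (0.27, 0.65, 0.00). That is what one would expect if the column already spans exactly 0 to 1 in the full dataset.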


In [11]:
# Save Standard Scaled dataset
df_standard_scaled.to_csv("standard_scaled_loan_data.csv", index=False)

# Save Min-Max Scaled dataset
df_minmax_scaled.to_csv("minmax_scaled_loan_data.csv", index=False)

print("\nScaled datasets saved as 'standard_scaled_loan_data.csv' and 'minmax_scaled_loan_data.csv'")


Scaled datasets saved as 'standard_scaled_loan_data.csv' and 'minmax_scaled_loan_data.csv'
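
The demo fits each scaler on the entire dataset, which is fine for inspecting the transformations. When the scaled features feed a model that is evaluated on held-out data, a common practice is to fit the scaler on the training split only and reuse it on the test split, so that test-set statistics do not leak into preprocessing. The following is a minimal sketch of that pattern on synthetic data; the split size and random seeds are arbitrary assumptions, not part of the original demo.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in features (not the loan dataset)
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(30_000, 150_000, size=300),  # income-like column
    rng.uniform(300, 850, size=300),         # credit-score-like column
])

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print("train means:", X_train_scaled.mean(axis=0).round(6))
print("test means: ", X_test_scaled.mean(axis=0).round(6))
```

The training columns come out with mean 0 by construction, while the test columns are only close to 0, since they are scaled with statistics they did not contribute to.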
