In [38]:
import pandas as pd
# import numpy as np

**Loading the Dataset**  
`pd.read_csv` loads the credit data from `DT-Credit.csv` into a DataFrame.

In [40]:
data = pd.read_csv("DT-Credit.csv")
# data

**Displaying Dataset Overview**  
This cell prints the first few rows, the summary statistics, and the count of missing values for each column.

In [42]:
print("Displaying the first few rows of the dataset:")
print(data.head())

print("---------------------------------------------")
print("Displaying summary statistics for the dataset:")
print(data.describe())

print("---------------------------------------------")
print("Displaying the count of missing values for each column:")
print(data.isnull().sum())

Displaying the first few rows of the dataset:
    Income  Limit  Rating  Cards  Age  Education  Own Student Married Region  \
0   14.891   3606     283      2   34         11   No      No     Yes  South   
1  106.025   6645     483      3   82         15  Yes     Yes     Yes   West   
2  104.593   7075     514      4   71         11   No      No      No   West   
3  148.924   9504     681      3   36         11  Yes      No      No   West   
4   55.882   4897     357      2   68         16   No      No     Yes  South   

   Balance  
0      333  
1      903  
2      580  
3      964  
4      331  
---------------------------------------------
Displaying summary statistics for the dataset:
           Income         Limit      Rating       Cards         Age  \
count  400.000000    400.000000  400.000000  400.000000  400.000000   
mean    45.218885   4735.600000  354.940000    2.957500   55.667500   
std     35.244273   2308.198848  154.724143    1.371275   17.249807   
min     10.354000 

**Encoding Binary Categorical Variables**  
This cell encodes the binary categorical variables `Own`, `Student`, and `Married` by mapping 'Yes' to 1 and 'No' to 0, so that each becomes a numeric 0/1 column. Any value other than 'Yes' or 'No' would be mapped to `NaN`.

In [44]:
data['Own'] = data['Own'].map({'Yes': 1, 'No': 0})
data['Student'] = data['Student'].map({'Yes': 1, 'No': 0})
data['Married'] = data['Married'].map({'Yes': 1, 'No': 0})
# data
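
Because `.map` silently turns unmatched values into `NaN`, it can be worth checking the result after encoding. The sketch below uses a small made-up frame (`toy`, not the credit data) with two deliberately malformed entries to show the behaviour. The names `toy`, `yes_no`, and `unmapped` are illustrative only.

```python
import pandas as pd

# Toy frame standing in for the credit data (an assumption for illustration);
# the lowercase 'yes' and the None entry are deliberately malformed.
toy = pd.DataFrame({"Own": ["Yes", "No", "yes", None]})

yes_no = {"Yes": 1, "No": 0}
encoded = toy["Own"].map(yes_no)

# Values that are not keys of the mapping (including wrong case) become NaN.
unmapped = toy.loc[encoded.isna(), "Own"].tolist()

print(encoded.tolist())
print("Values that were not mapped:", unmapped)
```

If `unmapped` is empty for every encoded column, the mapping covered all values present in the data.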

**One-Hot Encoding for 'Region' Feature**  
Here, `pd.get_dummies` converts the categorical variable `Region` into one binary indicator column per category (`Region_East`, `Region_South`, `Region_West`) and drops the original `Region` column.

In [46]:
data = pd.get_dummies(data, columns=['Region'], prefix='Region')
# data

**Ensuring Consistent Data Types**  
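The newly created region columns are cast to integer type so that they hold 0/1 values like the other encoded columns. Recent pandas versions return boolean columns from `pd.get_dummies` by default, which is why the cast is needed.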

In [48]:
data[['Region_East', 'Region_South', 'Region_West']] = data[['Region_East', 'Region_South', 'Region_West']].astype(int)
# data
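
The one-hot encoding and the integer cast above can also be done in a single call by passing `dtype=int` to `pd.get_dummies`. This is a minimal sketch on a made-up frame (`toy`), not a replacement for the cells above. The toy values are assumptions chosen only to cover all three regions.

```python
import pandas as pd

# Toy frame for illustration only; the real data comes from DT-Credit.csv.
toy = pd.DataFrame({
    "Region": ["South", "West", "East", "West"],
    "Balance": [333, 903, 580, 964],
})

# dtype=int makes the indicator columns integer 0/1 instead of boolean.
encoded = pd.get_dummies(toy, columns=["Region"], prefix="Region", dtype=int)

print(encoded.dtypes)
print(encoded)
```

The resulting columns are `Balance`, `Region_East`, `Region_South`, and `Region_West`, with the indicator columns already stored as integers.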

**Feature Scaling**  
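The `MinMaxScaler` rescales the numeric features (`Income`, `Limit`, `Rating`, `Cards`, `Age`, `Education`) to the range [0, 1]. Each column is scaled independently using its own minimum and maximum. The binary and one-hot columns are already 0/1 and are left unchanged, as is the target `Balance`.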

In [57]:
from sklearn.preprocessing import MinMaxScaler

features_to_scale = ['Income', 'Limit', 'Rating', 'Cards', 'Age', 'Education']
scaler = MinMaxScaler()
data[features_to_scale] = scaler.fit_transform(data[features_to_scale])
data

Unnamed: 0,Income,Limit,Rating,Cards,Age,Education,Own,Student,Married,Balance,Region_East,Region_South,Region_West
0,0.025737,0.210675,0.213723,0.125,0.146667,0.400000,0,0,1,333,0,1,0
1,0.542722,0.443406,0.438695,0.250,0.786667,0.666667,1,1,1,903,0,0,1
2,0.534598,0.476336,0.473566,0.375,0.640000,0.400000,0,0,0,580,0,0,1
3,0.786079,0.662353,0.661417,0.250,0.173333,0.400000,1,0,0,964,0,0,1
4,0.258271,0.309542,0.296963,0.125,0.600000,0.733333,0,0,1,331,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
395,0.009882,0.248507,0.240720,0.250,0.120000,0.533333,0,0,1,560,0,1,0
396,0.017075,0.228442,0.228346,0.500,0.560000,0.800000,0,0,0,480,1,0,0
397,0.269560,0.253944,0.256468,0.500,0.586667,0.466667,1,0,1,138,0,1,0
398,0.155287,0.127891,0.111361,0.000,0.280000,0.533333,0,0,1,0,0,1,0
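
To make the scaling step concrete: with its default settings, `MinMaxScaler` computes `(x - min) / (max - min)` for each column. The sketch below checks this on a small made-up frame (`toy`, an assumption for illustration) by comparing the scaler's output with the formula applied by hand.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy values for illustration only, not taken from the credit data.
toy = pd.DataFrame({"Income": [10.0, 55.0, 100.0], "Cards": [1, 3, 9]})

# Scaling with scikit-learn (default feature_range is (0, 1)).
scaled = MinMaxScaler().fit_transform(toy)

# The same transformation written out per column.
manual = (toy - toy.min()) / (toy.max() - toy.min())

print(manual)
print("Matches MinMaxScaler:", np.allclose(scaled, manual.to_numpy()))
```

In each column the smallest value maps to 0 and the largest to 1, so a single extreme value can compress the rest of the column into a narrow band.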


**Separating Features and Target Variable**  
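The target variable `Balance` is stored in `y`, and all remaining columns form the feature set `X`. The two `print` calls confirm that `X` is a DataFrame and `y` is a Series.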

In [None]:
X = data.drop(columns=['Balance'])
y = data['Balance']
print(type(X))
print(type(y))