**Name:** Muhammad Umer

**Email:** umerhayat282@gmail.com

**Date:** October 24, 2025

____

## **About this Notebook**

This notebook demonstrates and explains seven key feature scaling techniques used in machine learning to normalize or standardize data before model training. Each method is applied practically with clear examples to show how the values change before and after scaling, along with a detailed explanation of when and why each technique should be used. The goal of this notebook is to help understand the impact of different scalers on data distribution and model performance, providing a strong foundation for preprocessing in ML workflows.

## **Table of Contents**

Click any link to jump to that section:

1. [Explore data](#explore-data)
2. [StandardScaler](#1-StandardScaler)
3. [MinMaxScaler](#2-minmaxscaler)
4. [MaxAbsScaler](#3-maxabsscaler)
5. [RobustScaler](#4-robustscaler)
6. [Normalizer](#5-normalizer)
7. [QuantileTransformer](#6-quantiletransformer)
8. [PowerTransformer](#7-powertransformer)
9. [Overall Observations](#overall-observations-on-data-scaling)
10. [Summary](#Summary:)

___

In [32]:
import pandas as pd

df_data = pd.read_csv(r'D:\Ai_machine_learning_deep_learning_air_university_lab_islamabad\data\Data.csv')
df_employee = pd.read_csv(r'D:\Ai_machine_learning_deep_learning_air_university_lab_islamabad\data\Employee.csv')
df_heart_disease = pd.read_csv(r'D:\Ai_machine_learning_deep_learning_air_university_lab_islamabad\data\heart-disease-UCI.csv')
df_nchs = pd.read_csv(r'D:\Ai_machine_learning_deep_learning_air_university_lab_islamabad\data\NCHS.csv')
df_salaries = pd.read_csv(r'D:\Ai_machine_learning_deep_learning_air_university_lab_islamabad\data\Salaries.csv')
df_titanic = pd.read_csv(r'D:\Ai_machine_learning_deep_learning_air_university_lab_islamabad\data\titanic.csv')

## Explore data

Explore each DataFrame with `head()`, `info()`, and `describe()` to understand its structure and identify the numerical columns suitable for scaling.



In [33]:
print("Exploring df_data:")
display(df_data.head())
df_data.info()
display(df_data.describe())

Exploring df_data:


Unnamed: 0,Country,Age,Salary,Purchased
0,France,44.0,72000.0,No
1,Spain,27.0,48000.0,Yes
2,Germany,30.0,54000.0,No
3,Spain,38.0,61000.0,No
4,Germany,40.0,,Yes


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Country    10 non-null     object 
 1   Age        9 non-null      float64
 2   Salary     9 non-null      float64
 3   Purchased  10 non-null     object 
dtypes: float64(2), object(2)
memory usage: 452.0+ bytes


Unnamed: 0,Age,Salary
count,9.0,9.0
mean,38.777778,63777.777778
std,7.693793,12265.579662
min,27.0,48000.0
25%,35.0,54000.0
50%,38.0,61000.0
75%,44.0,72000.0
max,50.0,83000.0


In [34]:
print("\nExploring df_employee:")
display(df_employee.head())
df_employee.info()
display(df_employee.describe())


Exploring df_employee:


Unnamed: 0,Company,Age,Salary,Place,Country,Gender
0,TCS,20.0,,Chennai,India,0
1,Infosys,30.0,,Mumbai,India,0
2,TCS,35.0,2300.0,Calcutta,India,0
3,Infosys,40.0,3000.0,Delhi,India,0
4,TCS,23.0,4000.0,Mumbai,India,0


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 148 entries, 0 to 147
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Company  140 non-null    object 
 1   Age      130 non-null    float64
 2   Salary   124 non-null    float64
 3   Place    134 non-null    object 
 4   Country  148 non-null    object 
 5   Gender   148 non-null    int64  
dtypes: float64(2), int64(1), object(3)
memory usage: 7.1+ KB


Unnamed: 0,Age,Salary,Gender
count,130.0,124.0,148.0
mean,30.484615,5312.467742,0.222973
std,11.09664,2573.764683,0.417654
min,0.0,1089.0,0.0
25%,22.0,3030.0,0.0
50%,32.5,5000.0,0.0
75%,37.75,8000.0,0.0
max,54.0,9876.0,1.0


In [36]:
print("\nExploring df_nchs:")
display(df_nchs.head())
df_nchs.info()
display(df_nchs.describe())


Exploring df_nchs:


Unnamed: 0,Year,113 Cause Name,Cause Name,State,Deaths,Age-adjusted Death Rate
0,1999,"Accidents (unintentional injuries) (V01-X59,Y8...",Unintentional Injuries,Alabama,2313.0,52.2
1,1999,"Accidents (unintentional injuries) (V01-X59,Y8...",Unintentional Injuries,Alaska,294.0,55.9
2,1999,"Accidents (unintentional injuries) (V01-X59,Y8...",Unintentional Injuries,Arizona,2214.0,44.8
3,1999,"Accidents (unintentional injuries) (V01-X59,Y8...",Unintentional Injuries,Arkansas,1287.0,47.6
4,1999,"Accidents (unintentional injuries) (V01-X59,Y8...",Unintentional Injuries,California,9198.0,28.7


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15028 entries, 0 to 15027
Data columns (total 6 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Year                     15028 non-null  int64  
 1   113 Cause Name           15028 non-null  object 
 2   Cause Name               15028 non-null  object 
 3   State                    15028 non-null  object 
 4   Deaths                   15013 non-null  float64
 5   Age-adjusted Death Rate  14917 non-null  float64
dtypes: float64(2), int64(1), object(3)
memory usage: 704.6+ KB


Unnamed: 0,Year,Deaths,Age-adjusted Death Rate
count,15028.0,15013.0,14917.0
mean,2007.0,10232.61,86.526393
std,4.899142,90032.61,190.76495
min,1999.0,10.0,1.3
25%,2003.0,294.0,8.3
50%,2007.0,838.0,18.9
75%,2011.0,2737.0,46.3
max,2015.0,2712630.0,1087.3


In [37]:
print("\nExploring df_salaries:")
display(df_salaries.head())
df_salaries.info()
display(df_salaries.describe())


Exploring df_salaries:


Unnamed: 0,rank,discipline,phd,service,sex,salary
0,Prof,B,56,49,Male,186960
1,Prof,A,12,6,Male,93000
2,Prof,A,23,20,Male,110515
3,Prof,A,40,31,Male,131205
4,Prof,B,20,18,Male,104800


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 78 entries, 0 to 77
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   rank        78 non-null     object
 1   discipline  78 non-null     object
 2   phd         78 non-null     int64 
 3   service     78 non-null     int64 
 4   sex         78 non-null     object
 5   salary      78 non-null     int64 
dtypes: int64(3), object(3)
memory usage: 3.8+ KB


Unnamed: 0,phd,service,salary
count,78.0,78.0,78.0
mean,19.705128,15.051282,108023.782051
std,12.498425,12.139768,28293.661022
min,1.0,0.0,57800.0
25%,10.25,5.25,88612.5
50%,18.5,14.5,104671.0
75%,27.75,20.75,126774.75
max,56.0,51.0,186960.0


In [38]:

print("\nExploring df_titanic:")
display(df_titanic.head())
df_titanic.info()
display(df_titanic.describe())


Exploring df_titanic:


Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


____

## 1. StandardScaler

StandardScaler standardizes features by removing the mean and scaling to unit variance. The formula for standardization is:

$$ z = \frac{x - \mu}{\sigma} $$

Where $x$ is the original feature value, $\mu$ is the mean of the feature, and $\sigma$ is the standard deviation of the feature (scikit-learn uses the population standard deviation, i.e. `ddof=0`). The result has a mean of 0 and a standard deviation of 1. Standardization suits roughly Gaussian features and algorithms that assume centered, unit-variance inputs. It is sensitive to outliers because both $\mu$ and $\sigma$ are pulled by extreme values.



In [39]:
from sklearn.preprocessing import StandardScaler
import numpy as np

# Select 'Age' column from df_data as it has numerical data and some missing values
# Handle missing values by dropping rows for simplicity in this example
df_data_cleaned = df_data.dropna(subset=['Age'])
age_data = df_data_cleaned[['Age']]

# Markdown explanation for original data
print("### Original Data (Before StandardScaler)")
print("\nLet's look at the original 'Age' data before applying StandardScaler.")
print("We will examine its basic statistics like mean, standard deviation, and range.")

# Display original data
display(age_data.head())
print(f"Mean of original 'Age': {age_data['Age'].mean():.2f}")
print(f"Standard deviation of original 'Age': {age_data['Age'].std():.2f}")
print(f"Min of original 'Age': {age_data['Age'].min():.2f}")
print(f"Max of original 'Age': {age_data['Age'].max():.2f}")


# Instantiate StandardScaler
scaler = StandardScaler()

# Fit and transform the data
age_scaled = scaler.fit_transform(age_data)

# Convert the scaled data back to a DataFrame for easier viewing
age_scaled_df = pd.DataFrame(age_scaled, columns=['Age_Scaled'])

# Markdown explanation for scaled data
print("\n### Scaled Data (After StandardScaler)")
print("\nAfter applying StandardScaler, the 'Age' data has been standardized.")
print("StandardScaler removes the mean and scales the data to unit variance.")
print("This means the scaled data should have a mean close to 0 and a standard deviation close to 1.")


# Display scaled data
display(age_scaled_df.head())

# Display mean and standard deviation of scaled data
print(f"Mean of scaled 'Age': {age_scaled_df['Age_Scaled'].mean():.2f}")
print(f"Standard deviation of scaled 'Age': {age_scaled_df['Age_Scaled'].std():.2f}")
print(f"Min of scaled 'Age': {age_scaled_df['Age_Scaled'].min():.2f}")
print(f"Max of scaled 'Age': {age_scaled_df['Age_Scaled'].max():.2f}")


### Original Data (Before StandardScaler)

Let's look at the original 'Age' data before applying StandardScaler.
We will examine its basic statistics like mean, standard deviation, and range.


Unnamed: 0,Age
0,44.0
1,27.0
2,30.0
3,38.0
4,40.0


Mean of original 'Age': 38.78
Standard deviation of original 'Age': 7.69
Min of original 'Age': 27.00
Max of original 'Age': 50.00

### Scaled Data (After StandardScaler)

After applying StandardScaler, the 'Age' data has been standardized.
StandardScaler removes the mean and scales the data to unit variance.
This means the scaled data should have a mean close to 0 and a standard deviation close to 1.


Unnamed: 0,Age_Scaled
0,0.719931
1,-1.623675
2,-1.210098
3,-0.107224
4,0.168495


Mean of scaled 'Age': -0.00
Standard deviation of scaled 'Age': 1.06
Min of scaled 'Age': -1.62
Max of scaled 'Age': 1.55
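
The scaled standard deviation printed above is 1.06 rather than 1.00. The likely explanation is that StandardScaler divides by the population standard deviation (`ddof=0`), while pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`). With 9 rows the two differ by a factor of $\sqrt{9/8} \approx 1.06$. The sketch below reproduces this in plain Python; the nine ages are hypothetical values chosen for illustration, not the contents of `Data.csv`.

```python
import math
import statistics

# Hypothetical ages (illustration only, not read from Data.csv).
ages = [25.0, 31.0, 36.0, 38.0, 40.0, 42.0, 45.0, 49.0, 52.0]

# z = (x - mu) / sigma, with sigma the population standard deviation (ddof=0).
mu = statistics.mean(ages)
sigma = statistics.pstdev(ages)
z_scores = [(x - mu) / sigma for x in ages]

# The population std of the z-scores is 1; the sample std (ddof=1) is larger.
population_std = statistics.pstdev(z_scores)
sample_std = statistics.stdev(z_scores)

print(f"population std: {population_std:.2f}")
print(f"sample std:     {sample_std:.2f}")
print(f"sqrt(n/(n-1)):  {math.sqrt(len(ages) / (len(ages) - 1)):.2f}")
```

To see 1.00 in the notebook itself, one option is to call `.std(ddof=0)` on the scaled column.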


## 2. MinMaxScaler

MinMaxScaler scales and translates each feature individually such that it is in the given range, typically between 0 and 1. The formula for min-max scaling is:

$$ X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}} $$

Where $X$ is the original feature value, $X_{min}$ is the minimum value of the feature, and $X_{max}$ is the maximum value of the feature. This scaler is useful when features need a bounded range or when the data is not Gaussian. It is sensitive to outliers: because $X_{min}$ and $X_{max}$ define the range, a single extreme value compresses all other values into a narrow band.
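
As a hand check of the formula, the sketch below applies min-max scaling in plain Python to the first five `Fare` values shown in the exploration step, using the minimum (0.0) and maximum (512.3292) reported by `describe()`. The helper `min_max_scale` and its `feature_range` argument are illustrative names, not part of scikit-learn.

```python
def min_max_scale(values, data_min, data_max, feature_range=(0.0, 1.0)):
    """Scale values linearly so data_min maps to lo and data_max maps to hi."""
    lo, hi = feature_range
    span = data_max - data_min
    if span == 0:
        raise ValueError("data_min and data_max must differ")
    return [lo + (v - data_min) / span * (hi - lo) for v in values]

# First five fares from df_titanic.head(); min and max from df_titanic.describe().
fares = [7.25, 71.2833, 7.925, 53.1, 8.05]
fare_min, fare_max = 0.0, 512.3292

scaled = min_max_scale(fares, fare_min, fare_max)
for original, value in zip(fares, scaled):
    print(f"{original:>8.4f} -> {value:.6f}")
```

Every one of these fares ends up below 0.15 because the single maximum of 512.33 stretches the range, which is the outlier sensitivity described above.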

In [40]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Select 'Fare' column from df_titanic as it is numerical
# Handle missing values by dropping rows for simplicity in this example
df_titanic_cleaned = df_titanic.dropna(subset=['Fare'])
fare_data = df_titanic_cleaned[['Fare']]

# Markdown explanation for original data
print("### Original Data (Before MinMaxScaler)")
print("\nLet's look at the original 'Fare' data before applying MinMaxScaler.")
print("We will examine its basic statistics like min, max, and range.")

# Display original data
display(fare_data.head())
print(f"Min of original 'Fare': {fare_data['Fare'].min():.2f}")
print(f"Max of original 'Fare': {fare_data['Fare'].max():.2f}")

# Instantiate MinMaxScaler
scaler = MinMaxScaler()

# Fit and transform the data
fare_scaled = scaler.fit_transform(fare_data)

# Convert the scaled data back to a DataFrame for easier viewing
fare_scaled_df = pd.DataFrame(fare_scaled, columns=['Fare_Scaled'])

# Markdown explanation for scaled data
print("\n### Scaled Data (After MinMaxScaler)")
print("\nAfter applying MinMaxScaler, the 'Fare' data has been scaled to a range between 0 and 1.")
print("This means the scaled data should have a minimum value of 0 and a maximum value of 1.")

# Display scaled data
display(fare_scaled_df.head())

# Display min and max of scaled data
print(f"Min of scaled 'Fare': {fare_scaled_df['Fare_Scaled'].min():.2f}")
print(f"Max of scaled 'Fare': {fare_scaled_df['Fare_Scaled'].max():.2f}")

### Original Data (Before MinMaxScaler)

Let's look at the original 'Fare' data before applying MinMaxScaler.
We will examine its basic statistics like min, max, and range.


Unnamed: 0,Fare
0,7.25
1,71.2833
2,7.925
3,53.1
4,8.05


Min of original 'Fare': 0.00
Max of original 'Fare': 512.33

### Scaled Data (After MinMaxScaler)

After applying MinMaxScaler, the 'Fare' data has been scaled to a range between 0 and 1.
This means the scaled data should have a minimum value of 0 and a maximum value of 1.


Unnamed: 0,Fare_Scaled
0,0.014151
1,0.139136
2,0.015469
3,0.103644
4,0.015713


Min of scaled 'Fare': 0.00
Max of scaled 'Fare': 1.00


## 3. MaxAbsScaler

MaxAbsScaler scales each feature by its maximum absolute value. This scaler does not shift or center the data, and thus it does not destroy any sparsity. The formula for max-abs scaling is:

$$ X_{scaled} = \frac{X}{\max(|X|)} $$

Where $X$ is the original feature value and $\max(|X|)$ is the maximum absolute value of the feature. This scaler is particularly useful for scaling data that is already centered at zero or for sparse data. It is sensitive to outliers.
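
A minimal plain-Python sketch of the same arithmetic follows. The first list is a hypothetical signed, mostly-zero feature that shows why sparsity survives (zeros stay zero and signs are kept); the second uses the five salaries shown in the exploration step, where 186960 is also the column maximum. The helper name `max_abs_scale` is illustrative.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the list."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)  # all zeros: nothing to scale
    return [v / max_abs for v in values]

# Hypothetical signed, mostly-zero feature (illustration only).
sparse_feature = [-4.0, 0.0, 2.0, 0.0, 1.0]
scaled_sparse = max_abs_scale(sparse_feature)
print(scaled_sparse)

# First five salaries from df_salaries.head(); 186960 is also the column maximum.
salaries = [186960, 93000, 110515, 131205, 104800]
scaled_salaries = max_abs_scale(salaries)
print([round(v, 6) for v in scaled_salaries])
```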



In [41]:
from sklearn.preprocessing import MaxAbsScaler
import numpy as np

# Select 'salary' column from df_salaries as it contains numerical values
# Handle missing values by dropping rows for simplicity in this example
df_salaries_cleaned = df_salaries.dropna(subset=['salary'])
salary_data = df_salaries_cleaned[['salary']]

# Markdown explanation for original data
print("### Original Data (Before MaxAbsScaler)")
print("\nLet's look at the original 'salary' data from the df_salaries dataset before applying MaxAbsScaler.")
print("We will examine its basic statistics, particularly its minimum and maximum values, and the maximum absolute value.")

# Display original data
display(salary_data.head())
print(f"Min of original 'salary': {salary_data['salary'].min():.2f}")
print(f"Max of original 'salary': {salary_data['salary'].max():.2f}")
print(f"Maximum absolute value of original 'salary': {np.max(np.abs(salary_data['salary'])):.2f}")


# Instantiate MaxAbsScaler
scaler = MaxAbsScaler()

# Fit and transform the data
salary_scaled = scaler.fit_transform(salary_data)

# Convert the scaled data back to a DataFrame for easier viewing
salary_scaled_df = pd.DataFrame(salary_scaled, columns=['Salary_Scaled'])

# Markdown explanation for scaled data
print("\n### Scaled Data (After MaxAbsScaler)")
print("\nAfter applying MaxAbsScaler, the 'salary' data has been scaled by dividing each value by the maximum absolute value of the original data.")
print("This means the scaled data will have its maximum absolute value equal to 1.")


# Display scaled data
display(salary_scaled_df.head())

# Display min and max of scaled data
print(f"Min of scaled 'salary': {salary_scaled_df['Salary_Scaled'].min():.2f}")
print(f"Max of scaled 'salary': {salary_scaled_df['Salary_Scaled'].max():.2f}")
print(f"Maximum absolute value of scaled 'salary': {np.max(np.abs(salary_scaled_df['Salary_Scaled'])):.2f}")

# Markdown observations
print("\n### Observations on MaxAbsScaler")
print("\nMaxAbsScaler scales the data such that the maximum absolute value of each feature is 1.")
print("It is particularly useful for data that is already centered at zero or for sparse data.")
print("Unlike MinMaxScaler, it does not shift the data, so the data remains centered around zero if it was before scaling.")
print("In this case, the original 'salary' data is not centered around zero, but the scaler still ensures the maximum absolute value is 1.")

### Original Data (Before MaxAbsScaler)

Let's look at the original 'salary' data from the df_salaries dataset before applying MaxAbsScaler.
We will examine its basic statistics, particularly its minimum and maximum values, and the maximum absolute value.


Unnamed: 0,salary
0,186960
1,93000
2,110515
3,131205
4,104800


Min of original 'salary': 57800.00
Max of original 'salary': 186960.00
Maximum absolute value of original 'salary': 186960.00

### Scaled Data (After MaxAbsScaler)

After applying MaxAbsScaler, the 'salary' data has been scaled by dividing each value by the maximum absolute value of the original data.
This means the scaled data will have its maximum absolute value equal to 1.


Unnamed: 0,Salary_Scaled
0,1.0
1,0.497433
2,0.591116
3,0.701781
4,0.560548


Min of scaled 'salary': 0.31
Max of scaled 'salary': 1.00
Maximum absolute value of scaled 'salary': 1.00

### Observations on MaxAbsScaler

MaxAbsScaler scales the data such that the maximum absolute value of each feature is 1.
It is particularly useful for data that is already centered at zero or for sparse data.
Unlike MinMaxScaler, it does not shift the data, so the data remains centered around zero if it was before scaling.
In this case, the original 'salary' data is not centered around zero, but the scaler still ensures the maximum absolute value is 1.


## 4. RobustScaler

RobustScaler scales features using statistics that are robust to outliers. It scales data based on the interquartile range (IQR) and centers it around the median. The formula for robust scaling is:

$$ X_{scaled} = \frac{X - \text{median}(X)}{Q_3 - Q_1} $$

Where $X$ is the original feature value, $\text{median}(X)$ is the median of the feature, $Q_1$ is the first quartile (25th percentile), and $Q_3$ is the third quartile (75th percentile). Because the median and the quartiles barely move when a few extreme values are added, this scaler is less affected by outliers than StandardScaler or MinMaxScaler. It is suitable for datasets with outliers.
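
As a hand check of the median-and-IQR arithmetic that scikit-learn's RobustScaler applies by default, the sketch below uses the `Age` statistics that `describe()` reported for `df_employee` (Q1 = 22.0, median = 32.5, Q3 = 37.75) and the five ages shown by `head()`. The helper name `robust_scale` is illustrative.

```python
def robust_scale(values, median, q1, q3):
    """Center on the median and divide by the interquartile range (q3 - q1)."""
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("q1 and q3 must differ")
    return [(v - median) / iqr for v in values]

# Statistics of df_employee['Age'] as reported by describe() in the exploration step.
age_q1, age_median, age_q3 = 22.0, 32.5, 37.75

# First five ages from df_employee.head().
ages = [20.0, 30.0, 35.0, 40.0, 23.0]
scaled_ages = robust_scale(ages, age_median, age_q1, age_q3)
print([round(v, 6) for v in scaled_ages])

# The quartiles themselves land exactly one IQR apart, on either side of 0.
scaled_q1, scaled_q3 = robust_scale([age_q1, age_q3], age_median, age_q1, age_q3)
print(round(scaled_q1, 2), round(scaled_q3, 2))
```

The scaled quartiles are not symmetric around zero here because the median (32.5) sits closer to Q3 than to Q1.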


In [42]:
from sklearn.preprocessing import RobustScaler
import numpy as np

# Select 'Age' column from df_employee as it has numerical data and might contain outliers
# Handle missing values by dropping rows for simplicity
df_employee_cleaned = df_employee.dropna(subset=['Age'])
age_employee_data = df_employee_cleaned[['Age']]

# Markdown explanation for original data
print("### Original Data (Before RobustScaler)")
print("\nLet's look at the original 'Age' data from the df_employee dataset before applying RobustScaler.")
print("We will examine its basic statistics, including the median and quartiles, as RobustScaler is based on these.")

# Display original data
display(age_employee_data.head())

# Print median and quartiles of original data
print(f"Median of original 'Age': {age_employee_data['Age'].median():.2f}")
print(f"25th Percentile (Q1) of original 'Age': {age_employee_data['Age'].quantile(0.25):.2f}")
print(f"75th Percentile (Q3) of original 'Age': {age_employee_data['Age'].quantile(0.75):.2f}")


# Instantiate RobustScaler
scaler = RobustScaler()

# Fit and transform the data
age_employee_scaled = scaler.fit_transform(age_employee_data)

# Convert the scaled data back to a DataFrame for easier viewing
age_employee_scaled_df = pd.DataFrame(age_employee_scaled, columns=['Age_Scaled_Robust'])

# Markdown explanation for scaled data
print("\n### Scaled Data (After RobustScaler)")
print("\nAfter applying RobustScaler, the 'Age' data has been scaled using the median and the Interquartile Range (IQR).")
print("The data is centered around zero (median becomes 0) and scaled based on the IQR.")


# Display scaled data
display(age_employee_scaled_df.head())

# Print median and quartiles of scaled data
print(f"Median of scaled 'Age': {age_employee_scaled_df['Age_Scaled_Robust'].median():.2f}")
print(f"25th Percentile (Q1) of scaled 'Age': {age_employee_scaled_df['Age_Scaled_Robust'].quantile(0.25):.2f}")
print(f"75th Percentile (Q3) of scaled 'Age': {age_employee_scaled_df['Age_Scaled_Robust'].quantile(0.75):.2f}")

# Markdown observations
print("\n### Observations on RobustScaler")
print("\nRobustScaler is less sensitive to outliers because it uses the median and the IQR for scaling.")
print("As expected, the median of the scaled data is 0.")
print("The IQR of the scaled data is 1 (Q3 - Q1 = 0.33 - (-0.67) = 1.00), meaning the middle 50% of the data is scaled to the range [-0.67, 0.33] relative to the median.")

### Original Data (Before RobustScaler)

Let's look at the original 'Age' data from the df_employee dataset before applying RobustScaler.
We will examine its basic statistics, including the median and quartiles, as RobustScaler is based on these.


Unnamed: 0,Age
0,20.0
1,30.0
2,35.0
3,40.0
4,23.0


Median of original 'Age': 32.50
25th Percentile (Q1) of original 'Age': 22.00
75th Percentile (Q3) of original 'Age': 37.75

### Scaled Data (After RobustScaler)

After applying RobustScaler, the 'Age' data has been scaled using the median and the Interquartile Range (IQR).
The data is centered around zero (median becomes 0) and scaled based on the IQR.


Unnamed: 0,Age_Scaled_Robust
0,-0.793651
1,-0.15873
2,0.15873
3,0.47619
4,-0.603175


Median of scaled 'Age': 0.00
25th Percentile (Q1) of scaled 'Age': -0.67
75th Percentile (Q3) of scaled 'Age': 0.33

### Observations on RobustScaler

RobustScaler is less sensitive to outliers because it uses the median and the IQR for scaling.
As expected, the median of the scaled data is 0.
The IQR of the scaled data is 1 (Q3 - Q1 = 0.33 - (-0.67) = 1.00), meaning the middle 50% of the data is scaled to the range [-0.67, 0.33] relative to the median.


## 5. Normalizer

Normalizer scales each sample individually to unit norm. It is used when you want to scale the samples (rows) instead of the features (columns). There are different types of norms that can be used (L1, L2, etc.). The L2 norm (Euclidean distance) is commonly used:

$$ X_{scaled} = \frac{X}{||X||_2} $$

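Where $X$ is the original sample vector and $||X||_2$ is its L2 norm, calculated as $\sqrt{\sum_{i=1}^{n} x_i^2}$. Normalizer is useful when the direction of the data is more important than the magnitude. Because each row is scaled independently, an extreme row does not change how any other row is scaled; within a row, however, a feature with a much larger magnitude dominates the result.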
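
The sketch below works the L2 formula by hand in plain Python, first on a 3-4-5 triangle and then on the first `df_nchs` row shown in the exploration step (Deaths = 2313.0, Age-adjusted Death Rate = 52.2). The helper name `l2_normalize` is illustrative, and leaving an all-zero row unchanged is a choice made for this sketch.

```python
import math

def l2_normalize(row):
    """Scale one sample (row) so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in row))
    if norm == 0:
        return list(row)  # all-zero row: left unchanged to avoid dividing by zero
    return [x / norm for x in row]

# Classic 3-4-5 triangle: the norm is 5.
unit = l2_normalize([3.0, 4.0])
print(unit)

# First row of df_nchs.head(): Deaths = 2313.0, Age-adjusted Death Rate = 52.2.
nchs_row = l2_normalize([2313.0, 52.2])
print([round(x, 6) for x in nchs_row])
```

Because `Deaths` is far larger than the death rate, the normalized row collapses to roughly (1, 0.02). When columns are measured in very different units, row-wise normalization mostly preserves the largest one.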



In [43]:
from sklearn.preprocessing import Normalizer
import numpy as np

# Select 'Deaths' and 'Age-adjusted Death Rate' columns from df_nchs for normalization
# Normalizer scales each sample (row) to unit norm, so each row keeps only the
# relative proportion of its two values; absolute magnitudes are discarded.
# Handle missing values by dropping rows for simplicity
df_nchs_cleaned = df_nchs.dropna(subset=['Deaths', 'Age-adjusted Death Rate'])
death_data = df_nchs_cleaned[['Deaths', 'Age-adjusted Death Rate']]

# Markdown explanation for original data
print("### Original Data (Before Normalizer)")
print("\nLet's look at the original 'Deaths' and 'Age-adjusted Death Rate' data from the df_nchs dataset before applying Normalizer.")
print("Normalizer scales each sample (row) to unit norm. We will look at the values and calculate the L2 norm for a few samples.")

# Display original data
display(death_data.head())

# Calculate and print L2 norm for a few original data samples
print("\nL2 Norm for first 5 original samples:")
for index, row in death_data.head().iterrows():
    l2_norm = np.linalg.norm(row)
    print(f"Sample {index}: {l2_norm:.2f}")


# Instantiate Normalizer (default is L2 norm)
scaler = Normalizer()

# Fit and transform the data
death_scaled = scaler.fit_transform(death_data)

# Convert the scaled data back to a DataFrame for easier viewing
death_scaled_df = pd.DataFrame(death_scaled, columns=['Deaths_Normalized', 'Age-adjusted Death Rate_Normalized'])

# Markdown explanation for scaled data
print("\n### Scaled Data (After Normalizer)")
print("\nAfter applying Normalizer (L2 norm), each sample (row) in the 'Deaths' and 'Age-adjusted Death Rate' data has been scaled so that its L2 norm is 1.")
print("This means the vector formed by the scaled values for each sample has a length of 1.")


# Display scaled data
display(death_scaled_df.head())

# Calculate and print L2 norm for the same scaled data samples
print("\nL2 Norm for first 5 scaled samples:")
for index, row in death_scaled_df.head().iterrows():
    l2_norm = np.linalg.norm(row)
    print(f"Sample {index}: {l2_norm:.2f}")


# Markdown observations
print("\n### Observations on Normalizer")
print("\nNormalizer scales each sample (row) independently to a unit norm (defaulting to L2 norm).")
print("This is different from other scalers which scale features (columns).")
print("As shown by the L2 norm calculations, each scaled sample now has an L2 norm of 1.")
print("This scaler is useful when the direction or angle of the data points is more important than their magnitude, for example, in text classification or clustering where cosine similarity is used.")

### Original Data (Before Normalizer)

Let's look at the original 'Deaths' and 'Age-adjusted Death Rate' data from the df_nchs dataset before applying Normalizer.
Normalizer scales each sample (row) to unit norm. We will look at the values and calculate the L2 norm for a few samples.


Unnamed: 0,Deaths,Age-adjusted Death Rate
0,2313.0,52.2
1,294.0,55.9
2,2214.0,44.8
3,1287.0,47.6
4,9198.0,28.7



L2 Norm for first 5 original samples:
Sample 0: 2313.59
Sample 1: 299.27
Sample 2: 2214.45
Sample 3: 1287.88
Sample 4: 9198.04

### Scaled Data (After Normalizer)

After applying Normalizer (L2 norm), each sample (row) in the 'Deaths' and 'Age-adjusted Death Rate' data has been scaled so that its L2 norm is 1.
This means the vector formed by the scaled values for each sample has a length of 1.


Unnamed: 0,Deaths_Normalized,Age-adjusted Death Rate_Normalized
0,0.999745,0.022562
1,0.9824,0.18679
2,0.999795,0.020231
3,0.999317,0.03696
4,0.999995,0.00312



L2 Norm for first 5 scaled samples:
Sample 0: 1.00
Sample 1: 1.00
Sample 2: 1.00
Sample 3: 1.00
Sample 4: 1.00

### Observations on Normalizer

Normalizer scales each sample (row) independently to a unit norm (defaulting to L2 norm).
This is different from other scalers which scale features (columns).
As shown by the L2 norm calculations, each scaled sample now has an L2 norm of 1.
This scaler is useful when the direction or angle of the data points is more important than their magnitude, for example, in text classification or clustering where cosine similarity is used.


## 6. QuantileTransformer

QuantileTransformer transforms features using quantile information, mapping each feature to a uniform or normal distribution. The transformation is non-linear and non-parametric: it does not assume a particular distribution for the input data. Because it works on ranks rather than raw values, it is robust to outliers and helps with skewed distributions, at the cost of distorting linear relationships and distances within a feature. Two output distributions are available: 'uniform' (the default) and 'normal'.
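
The idea can be sketched in plain Python: replace each value by its empirical quantile (the 'uniform' output), then pass that quantile through the inverse CDF of a standard normal (the 'normal' output). This is a simplified illustration, not scikit-learn's implementation, which interpolates between `n_quantiles` landmarks. The function name `quantile_to_normal` and the clipping constant `eps=1e-7` are assumptions made here; the clipping keeps the inverse CDF finite and appears consistent with the extremes of about -5.20 and 5.20 in this notebook's output.

```python
from statistics import NormalDist

def quantile_to_normal(values, eps=1e-7):
    """Map values to a standard normal via their empirical quantiles.

    Simplified sketch: assumes all values are distinct and uses every
    sample as a quantile landmark.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    result = [0.0] * n
    for rank, i in enumerate(order):
        # Empirical quantile in [0, 1], i.e. the 'uniform' output.
        q = rank / (n - 1)
        # Clip away from 0 and 1 so the inverse CDF stays finite.
        q = min(max(q, eps), 1.0 - eps)
        result[i] = standard_normal.inv_cdf(q)
    return result

# Heavily right-skewed illustrative values (the five-number summary of 'Deaths').
deaths = [838.0, 10.0, 2712630.0, 294.0, 2737.0]
transformed = quantile_to_normal(deaths)
print([round(v, 2) for v in transformed])
```

Only the ordering of the inputs matters: the maximum of 2,712,630 lands at the same place it would if it were 3,000, which is why the transform is insensitive to outliers.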



In [44]:
from sklearn.preprocessing import QuantileTransformer
import pandas as pd
import numpy as np

# Select 'Deaths' column from df_nchs as it has a potentially skewed distribution
# Handle missing values by dropping rows for simplicity
df_nchs_cleaned_quantile = df_nchs.dropna(subset=['Deaths'])
death_data_quantile = df_nchs_cleaned_quantile[['Deaths']]

# Markdown explanation for original data
print("### Original Data (Before QuantileTransformer)")
print("\nLet's look at the original 'Deaths' data from the df_nchs dataset before applying QuantileTransformer.")
print("We will examine its distribution and basic statistics.")

# Display original data
display(death_data_quantile.head())
print(f"Mean of original 'Deaths': {death_data_quantile['Deaths'].mean():.2f}")
print(f"Median of original 'Deaths': {death_data_quantile['Deaths'].median():.2f}")
print(f"Standard deviation of original 'Deaths': {death_data_quantile['Deaths'].std():.2f}")
print(f"Min of original 'Deaths': {death_data_quantile['Deaths'].min():.2f}")
print(f"Max of original 'Deaths': {death_data_quantile['Deaths'].max():.2f}")


# Instantiate QuantileTransformer (using 'normal' output distribution)
scaler = QuantileTransformer(output_distribution='normal', n_quantiles=100)

# Fit and transform the data
death_scaled_quantile = scaler.fit_transform(death_data_quantile)

# Convert the scaled data back to a DataFrame for easier viewing
death_scaled_quantile_df = pd.DataFrame(death_scaled_quantile, columns=['Deaths_Scaled_Quantile'])

# Markdown explanation for scaled data
print("\n### Scaled Data (After QuantileTransformer)")
print("\nAfter applying QuantileTransformer with 'normal' output distribution, the 'Deaths' data has been transformed to follow a normal-like distribution.")
print("The transformation is non-linear and maps the data based on its quantiles.")


# Display scaled data
display(death_scaled_quantile_df.head())

# Display statistics of scaled data
print(f"Mean of scaled 'Deaths': {death_scaled_quantile_df['Deaths_Scaled_Quantile'].mean():.2f}")
print(f"Median of scaled 'Deaths': {death_scaled_quantile_df['Deaths_Scaled_Quantile'].median():.2f}")
print(f"Standard deviation of scaled 'Deaths': {death_scaled_quantile_df['Deaths_Scaled_Quantile'].std():.2f}")
print(f"Min of scaled 'Deaths': {death_scaled_quantile_df['Deaths_Scaled_Quantile'].min():.2f}")
print(f"Max of scaled 'Deaths': {death_scaled_quantile_df['Deaths_Scaled_Quantile'].max():.2f}")


# Markdown observations
print("\n### Observations on QuantileTransformer")
print("\nQuantileTransformer maps the data to a specified distribution ('normal' in this case) by using the quantile information of the original data.")
print("This transformation is non-linear and can significantly change the shape of the distribution, making it more uniform or normal.")
print("It is robust to outliers and can be very effective in handling skewed data.")
print("After applying the transformer with 'normal' output, the scaled data's distribution should resemble a normal distribution, which can be beneficial for algorithms that assume normality.")

### Original Data (Before QuantileTransformer)

Let's look at the original 'Deaths' data from the df_nchs dataset before applying QuantileTransformer.
We will examine its distribution and basic statistics.


|   | Deaths |
|---|--------|
| 0 | 2313.0 |
| 1 | 294.0  |
| 2 | 2214.0 |
| 3 | 1287.0 |
| 4 | 9198.0 |


Mean of original 'Deaths': 10232.61
Median of original 'Deaths': 838.00
Standard deviation of original 'Deaths': 90032.61
Min of original 'Deaths': 10.00
Max of original 'Deaths': 2712630.00

### Scaled Data (After QuantileTransformer)

After applying QuantileTransformer with 'normal' output distribution, the 'Deaths' data has been transformed to follow a normal-like distribution.
The transformation is non-linear and maps the data based on its quantiles.


|   | Deaths_Scaled_Quantile |
|---|------------------------|
| 0 | 0.566865               |
| 1 | -0.668674              |
| 2 | 0.540319               |
| 3 | 0.238787               |
| 4 | 1.133563               |


Mean of scaled 'Deaths': 0.00
Median of scaled 'Deaths': 0.00
Standard deviation of scaled 'Deaths': 0.99
Min of scaled 'Deaths': -5.20
Max of scaled 'Deaths': 5.20

### Observations on QuantileTransformer

QuantileTransformer maps the data to a specified distribution ('normal' in this case) by using the quantile information of the original data.
This transformation is non-linear and can significantly change the shape of the distribution, making it more uniform or normal.
It is robust to outliers and can be very effective in handling skewed data.
After applying the transformer with 'normal' output, the scaled data's distribution should resemble a normal distribution, which can be beneficial for algorithms that assume normality.
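
The minimum and maximum of about ±5.20 reported above are worth a closer look. The following is a minimal sketch on synthetic log-normal data standing in for 'Deaths' (the data, the seed, and the variable names are assumptions for illustration, not part of the NCHS dataset). It suggests two things: the transform removes most of the skew, and scikit-learn clips the most extreme quantiles to finite values near ±5.2 instead of mapping them to infinity, which is consistent with the statistics printed above.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import QuantileTransformer

# Synthetic right-skewed data standing in for 'Deaths' (assumption, not the NCHS data)
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=7.0, sigma=1.5, size=(1000, 1))

qt = QuantileTransformer(output_distribution='normal', n_quantiles=100, random_state=0)
transformed = qt.fit_transform(skewed)

# Skewness close to 0 indicates a roughly symmetric distribution
skew_before = skew(skewed.ravel())
skew_after = skew(transformed.ravel())
print(f"Skewness before: {skew_before:.2f}, after: {skew_after:.2f}")

# The smallest and largest training values sit at the 0th and 100th percentiles,
# which are clipped to finite values rather than mapped to -inf / +inf
print(f"Min: {transformed.min():.2f}, Max: {transformed.max():.2f}")

# A value far beyond the training range is capped at the same upper bound
extreme = qt.transform(np.array([[1e12]]))
print(f"Transformed extreme value: {extreme[0, 0]:.2f}")
```

Because every value beyond the fitted range collapses onto the same bound, the transformer is robust to outliers, but it also discards how extreme those outliers were.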


## 7. PowerTransformer

PowerTransformer applies a parametric, monotonic power transformation to make data more Gaussian-like, which helps stabilize variance and reduce skewness. scikit-learn supports two methods: the Box-Cox transform, which requires strictly positive input, and the Yeo-Johnson transform, which accepts positive, zero, and negative values. In both cases the power parameter (lambda) is estimated from the data by maximum likelihood, and with the default `standardize=True` the output is also rescaled to zero mean and unit variance. PowerTransformer is useful for models that assume normally distributed inputs.
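
To make the difference between the two methods concrete, here is a small sketch on synthetic data (the arrays `positive` and `mixed` and the seed are assumptions chosen for illustration). Box-Cox fits the strictly positive column, raises a `ValueError` on the column that contains non-positive values, and Yeo-Johnson fits that column without complaint.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Synthetic data (assumption): one strictly positive column, one with values <= 0
rng = np.random.default_rng(0)
positive = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))
mixed = positive - 1.0  # shifts roughly half of the values below zero

# Box-Cox works on strictly positive data
box_cox = PowerTransformer(method='box-cox').fit(positive)
print(f"Box-Cox lambda: {box_cox.lambdas_[0]:.3f}")

# Yeo-Johnson also works when zeros or negatives are present
yeo_johnson = PowerTransformer(method='yeo-johnson').fit(mixed)
print(f"Yeo-Johnson lambda: {yeo_johnson.lambdas_[0]:.3f}")

# Box-Cox rejects data that is not strictly positive
try:
    PowerTransformer(method='box-cox').fit(mixed)
    box_cox_failed = False
except ValueError as err:
    box_cox_failed = True
    print(f"Box-Cox failed as expected: {err}")
```

The fitted `lambdas_` attribute holds one estimated power parameter per column; a Box-Cox lambda near 0 corresponds to a transform close to the natural logarithm.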



In [45]:
from sklearn.preprocessing import PowerTransformer
import pandas as pd

# Select the 'Deaths' column from df_nchs, which is strongly right-skewed
# (its mean is far above its median, as identified in previous steps).
# Handle missing values by dropping rows for simplicity.
# 'Deaths' is strictly positive here (min = 10), so 'box-cox' would also work;
# 'yeo-johnson' is used for generality because it also handles zero and negative values.
df_nchs_cleaned = df_nchs.dropna(subset=['Deaths'])
death_data = df_nchs_cleaned[['Deaths']]

# Markdown explanation for original data
print("### Original Data (Before PowerTransformer)")
print("\nLet's look at the original 'Deaths' data from the df_nchs dataset before applying PowerTransformer.")
print("We will examine its distribution and basic statistics, noting potential skewness.")

# Display original data
display(death_data.head())
print(f"Mean of original 'Deaths': {death_data['Deaths'].mean():.2f}")
print(f"Median of original 'Deaths': {death_data['Deaths'].median():.2f}")
print(f"Standard deviation of original 'Deaths': {death_data['Deaths'].std():.2f}")
print(f"Min of original 'Deaths': {death_data['Deaths'].min():.2f}")
print(f"Max of original 'Deaths': {death_data['Deaths'].max():.2f}")


# Instantiate PowerTransformer (using 'yeo-johnson' method)
scaler = PowerTransformer(method='yeo-johnson')

# Fit and transform the data
death_scaled_power = scaler.fit_transform(death_data)

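# Convert the scaled data back to a DataFrame, keeping the original row index
# so the scaled values stay aligned with the rows they came from
death_scaled_power_df = pd.DataFrame(death_scaled_power, columns=['Deaths_Scaled_Power'], index=death_data.index)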

# Markdown explanation for scaled data
print("\n### Scaled Data (After PowerTransformer)")
print("\nAfter applying PowerTransformer (using 'yeo-johnson'), the 'Deaths' data has been transformed to make its distribution more Gaussian-like.")
print("Power transformations aim to reduce skewness and stabilize variance.")


# Display scaled data
display(death_scaled_power_df.head())

# Display statistics of scaled data
print(f"Mean of scaled 'Deaths': {death_scaled_power_df['Deaths_Scaled_Power'].mean():.2f}")
print(f"Median of scaled 'Deaths': {death_scaled_power_df['Deaths_Scaled_Power'].median():.2f}")
print(f"Standard deviation of scaled 'Deaths': {death_scaled_power_df['Deaths_Scaled_Power'].std():.2f}")
print(f"Min of scaled 'Deaths': {death_scaled_power_df['Deaths_Scaled_Power'].min():.2f}")
print(f"Max of scaled 'Deaths': {death_scaled_power_df['Deaths_Scaled_Power'].max():.2f}")


# Markdown observations
print("\n### Observations on PowerTransformer")
print("\nPowerTransformer applies a power function to the data to transform it towards a more Gaussian distribution.")
print("The 'yeo-johnson' method works for both positive and negative data, while 'box-cox' is only for strictly positive data.")
print("After the transformation, the distribution should be less skewed and more symmetric, which can improve the performance of models that assume normality.")
print("Comparing the mean and median before and after can give an indication of how the skewness has been affected (closer values often indicate less skewness).")
print("The standard deviation and range also change as a result of the non-linear transformation.")

### Original Data (Before PowerTransformer)

Let's look at the original 'Deaths' data from the df_nchs dataset before applying PowerTransformer.
We will examine its distribution and basic statistics, noting potential skewness.


|   | Deaths |
|---|--------|
| 0 | 2313.0 |
| 1 | 294.0  |
| 2 | 2214.0 |
| 3 | 1287.0 |
| 4 | 9198.0 |


Mean of original 'Deaths': 10232.61
Median of original 'Deaths': 838.00
Standard deviation of original 'Deaths': 90032.61
Min of original 'Deaths': 10.00
Max of original 'Deaths': 2712630.00

### Scaled Data (After PowerTransformer)

After applying PowerTransformer (using 'yeo-johnson'), the 'Deaths' data has been transformed to make its distribution more Gaussian-like.
Power transformations aim to reduce skewness and stabilize variance.


|   | Deaths_Scaled_Power |
|---|---------------------|
| 0 | 0.528858            |
| 1 | -0.628822           |
| 2 | 0.506504            |
| 3 | 0.221694            |
| 4 | 1.189877            |


Mean of scaled 'Deaths': -0.00
Median of scaled 'Deaths': -0.01
Standard deviation of scaled 'Deaths': 1.00
Min of scaled 'Deaths': -3.00
Max of scaled 'Deaths': 3.18

### Observations on PowerTransformer

PowerTransformer applies a power function to the data to transform it towards a more Gaussian distribution.
The 'yeo-johnson' method works for both positive and negative data, while 'box-cox' is only for strictly positive data.
After the transformation, the distribution should be less skewed and more symmetric, which can improve the performance of models that assume normality.
Comparing the mean and median before and after can give an indication of how the skewness has been affected (closer values often indicate less skewness).
The standard deviation and range also change as a result of the non-linear transformation.
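
Comparing mean and median is only an indirect check on skewness. A more direct check is to compute the skewness itself, as in the sketch below. It uses synthetic log-normal data as a stand-in for 'Deaths' (the data, seed, and variable names are assumptions for illustration) and also shows why the scaled mean and standard deviation come out at about 0 and 1: `standardize=True` is the default.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Synthetic right-skewed data standing in for 'Deaths' (assumption, not the NCHS data)
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=7.0, sigma=1.5, size=(1000, 1))

pt = PowerTransformer(method='yeo-johnson')  # standardize=True by default
transformed = pt.fit_transform(skewed)

# Skewness close to 0 indicates a roughly symmetric distribution
skew_before = skew(skewed.ravel())
skew_after = skew(transformed.ravel())
print(f"Skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
print(f"Estimated lambda: {pt.lambdas_[0]:.3f}")

# Zero mean and unit variance come from the built-in standardization step
mean_after = transformed.mean()
std_after = transformed.std()
print(f"Mean: {mean_after:.2f}, Std: {std_after:.2f}")

# The transform is invertible, so predictions can be mapped back to original units
restored = pt.inverse_transform(transformed)
roundtrip_ok = np.allclose(restored, skewed)
print(f"Round trip recovers the original values: {roundtrip_ok}")
```

Unlike QuantileTransformer, the power transform does not cap extreme values, which is consistent with the asymmetric minimum and maximum reported above.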


___

## Overall Observations on Data Scaling

Data scaling is a crucial preprocessing step in machine learning. It transforms the range or distribution of numerical features to a common scale. This matters because many algorithms are sensitive to feature scale, particularly distance-based methods (such as k-nearest neighbors, k-means, and SVMs) and models trained with gradient descent. Without scaling, features with larger values can disproportionately influence the model's outcome compared to features with smaller values. Tree-based models, by contrast, are largely unaffected by monotonic rescaling.

Throughout this notebook, we have demonstrated seven different data scaling techniques and observed their effects on various datasets. Here's a summary of the key observations:

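- **StandardScaler:** Centers each feature to mean 0 and scales it to unit variance; sensitive to outliers.
- **MinMaxScaler:** Rescales each feature to a fixed range (0 to 1 by default); sensitive to outliers because the range is set by the minimum and maximum.
- **MaxAbsScaler:** Divides each feature by its maximum absolute value, giving values in [-1, 1]; does not shift the data, so it preserves sparsity.
- **RobustScaler:** Centers on the median and scales by the IQR; robust to outliers.
- **Normalizer:** Rescales each row (sample), not each column, to unit norm; useful when direction matters more than magnitude.
- **QuantileTransformer:** Non-linear, quantile-based mapping to a uniform or normal distribution; robust to outliers.
- **PowerTransformer:** Applies power transforms (Yeo-Johnson/Box-Cox) to reduce skewness and make data more Gaussian-like.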

Choose a scaler based on data characteristics and model requirements.
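
The outlier sensitivity noted in the list above can be checked with a toy example. The sketch below uses a made-up column of five small values plus one extreme value (the data and the `spread` dictionary are assumptions for illustration) and measures how much of the scaled range is left for the five ordinary values. Normalizer is left out because it operates on rows, not columns.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    RobustScaler,
    StandardScaler,
)

# Toy feature (assumption): five ordinary values and one extreme outlier
values = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

scalers = {
    'StandardScaler': StandardScaler(),
    'MinMaxScaler': MinMaxScaler(),
    'MaxAbsScaler': MaxAbsScaler(),
    'RobustScaler': RobustScaler(),
}

# For each scaler, record the spread of the five ordinary values after scaling.
# A tiny spread means the outlier has squashed them together.
spread = {}
for name, scaler in scalers.items():
    scaled = scaler.fit_transform(values)
    inliers = scaled[:-1, 0]
    spread[name] = inliers.max() - inliers.min()
    print(f"{name:15s} inlier spread: {spread[name]:.4f}, outlier: {scaled[-1, 0]:.2f}")
```

With the scalers that depend on the mean, minimum, or maximum, the ordinary values end up compressed into a very narrow band, while RobustScaler keeps them well separated because the median and IQR barely move when one extreme value is added.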

___

## Summary:

### Data Analysis Key Findings

*   Six datasets (`Data.csv`, `Employee.csv`, `heart-disease-UCI.csv`, `NCHS.csv`, `Salaries.csv`, and `titanic.csv`) were successfully loaded into pandas DataFrames.
*   Numerical columns were identified across the datasets, including 'Age', 'Salary', 'Deaths', 'Age-adjusted Death Rate', 'phd', 'service', 'salary', 'PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', and 'Fare'. Not all of these are meaningful to scale: 'PassengerId' is an identifier, and 'Survived' and 'Pclass' are categorical codes.
*   Missing values were handled by dropping rows containing NaN in the selected columns before applying the scaling techniques.
*   Markdown explanations and code demonstrations were provided for each of the seven data scaling techniques: StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler, Normalizer, QuantileTransformer, and PowerTransformer.
*   For each scaler, the data was displayed before and after transformation, along with relevant statistics or norms to illustrate the effect of the scaling.
*   Observations were provided for each scaler, explaining how it works, its typical use cases, and its sensitivity to outliers or data distribution.
*   A table of contents with links to each scaler section was added at the beginning of the notebook.
*   An overall summary section was added at the end of the notebook, consolidating the observations on the effects and suitability of each scaler.

### Insights or Next Steps

*   The choice of scaler significantly impacts the transformed data's range, distribution, and sensitivity to outliers; understanding these effects is crucial for selecting the appropriate scaler for a given machine learning task.
*   Further analysis could involve visualizing the distributions of the data before and after scaling to provide a more intuitive understanding of each scaler's impact, especially for non-linear transformations like QuantileTransformer and PowerTransformer.
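
As a starting point for that visualization step, the following sketch draws side-by-side histograms of a skewed feature before and after the two non-linear transforms. It assumes matplotlib is installed and uses synthetic log-normal data in place of 'Deaths' (the data, seed, and panel titles are assumptions for illustration); swapping in `df_nchs[['Deaths']].dropna()` should work the same way.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

# Synthetic right-skewed data standing in for 'Deaths' (assumption, not the NCHS data)
rng = np.random.default_rng(0)
deaths_like = rng.lognormal(mean=7.0, sigma=1.5, size=(2000, 1))

# Original data plus the two non-linear transforms discussed in this notebook
panels = {
    'Original': deaths_like,
    'QuantileTransformer (normal)': QuantileTransformer(
        output_distribution='normal', n_quantiles=100, random_state=0
    ).fit_transform(deaths_like),
    'PowerTransformer (yeo-johnson)': PowerTransformer(
        method='yeo-johnson'
    ).fit_transform(deaths_like),
}

# One histogram per panel, drawn side by side for easy comparison
fig, axes = plt.subplots(1, len(panels), figsize=(15, 4))
for ax, (title, data) in zip(axes, panels.items()):
    ax.hist(data.ravel(), bins=50)
    ax.set_title(title)
    ax.set_xlabel('Value')
    ax.set_ylabel('Count')
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.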

___