Topic 5: **Data Normalization**

Data normalization transforms data to conform to a specified range or distribution. It is often needed because many statistical methods and machine learning algorithms assume, or simply behave better with, features on comparable scales; distance-based and gradient-based methods in particular can be dominated by whichever feature has the largest raw values. Let's explore two common techniques for data normalization:

### 1. Min-Max Scaling

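Min-max scaling linearly rescales each feature to a specified range, typically between 0 and 1, by subtracting the feature's minimum and dividing by its range (maximum minus minimum). Because the transformation is linear, it preserves the shape of the original distribution. This technique is useful when the feature has natural or known bounds and no extreme outliers: a single outlier stretches the range and squeezes the remaining values into a narrow band.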

In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Example data with numerical features
data = pd.DataFrame({'Feature1': [1, 2, 3, 4, 5]})

# Min-max scaling
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data)
data_scaled = pd.DataFrame(data_scaled, columns=data.columns)

print("Original data:")
print(data)
print("\nScaled data using Min-Max Scaling:")
print(data_scaled)

Original data:
   Feature1
0         1
1         2
2         3
3         4
4         5

Scaled data using Min-Max Scaling:
   Feature1
0      0.00
1      0.25
2      0.50
3      0.75
4      1.00
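
The scaled values above follow from the formula `(x - min) / (max - min)`, stretched to the target range. As a rough illustration, here is a minimal pure-Python sketch of that arithmetic. The function name `min_max_scale` and its `feature_range` argument are hypothetical helpers written for this example, not part of scikit-learn, and mapping a constant feature to the lower bound is a choice made for this sketch.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale a list of numbers linearly into feature_range (illustrative helper)."""
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant feature: there is no spread to rescale, so this sketch
        # maps every value to the lower bound of the target range.
        return [lo for _ in values]
    return [lo + (v - v_min) / span * (hi - lo) for v in values]


feature1 = [1, 2, 3, 4, 5]
scaled = min_max_scale(feature1)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Replacing the largest value with an outlier stretches the range,
# so the first four values all land below 0.04.
with_outlier = min_max_scale([1, 2, 3, 4, 100])
print(with_outlier)
```

The second call shows why min-max scaling is sensitive to outliers: the outlier still maps to 1.0, but the other values lose almost all of their relative spacing.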


### 2. Standardization (Z-score Scaling)

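Standardization rescales each feature to have a mean of 0 and a standard deviation of 1 by subtracting the feature's mean and dividing by its standard deviation. Unlike min-max scaling, the result is not confined to a fixed range. This technique is useful when the distribution of the data is approximately Gaussian, or when an algorithm expects features that are centered and on a comparable scale.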

In [4]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Example data with numerical features
data = pd.DataFrame({'Feature1': [1, 2, 3, 4, 5]})

# Z-score scaling
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
data_scaled = pd.DataFrame(data_scaled, columns=data.columns)

print("Original data:")
print(data)
print("\nScaled data using Z-score Scaling:")
print(data_scaled)

Original data:
   Feature1
0         1
1         2
2         3
3         4
4         5

Scaled data using Z-score Scaling:
   Feature1
0 -1.414214
1 -0.707107
2  0.000000
3  0.707107
4  1.414214
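
The extreme values in the output above are ±1.414214 because `StandardScaler` divides by the population standard deviation (divisor `n`), whereas pandas' `.std()` defaults to the sample standard deviation (divisor `n - 1`). The following pure-Python sketch reproduces both variants by hand. The helper `z_scores` and its `ddof` parameter are hypothetical names chosen for this illustration, and returning zeros for a constant feature is a choice made for this sketch.

```python
import math


def z_scores(values, ddof=0):
    """Standardize a list of numbers (illustrative helper).

    ddof=0 divides the squared deviations by n (population std);
    ddof=1 divides by n - 1 (sample std).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    std = math.sqrt(variance)
    if std == 0:
        # Constant feature: every value equals the mean, so return zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


feature1 = [1, 2, 3, 4, 5]

population = z_scores(feature1)      # divisor n, matches the scaled output above
sample = z_scores(feature1, ddof=1)  # divisor n - 1

print([round(z, 6) for z in population])  # [-1.414214, -0.707107, 0.0, 0.707107, 1.414214]
print([round(z, 6) for z in sample])      # [-1.264911, -0.632456, 0.0, 0.632456, 1.264911]
```

The gap between the two conventions shrinks as the number of samples grows, but on a five-row example like this one it is large enough to cause confusion when comparing hand-computed values with library output.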
