# ⚖️ Data Scaling: Standardization vs Normalization vs Min–Max Normalization

Many machine learning algorithms are sensitive to the **scale of features**.  
To ensure fair contribution of each variable, we often apply **scaling techniques**.

## 🔹 Standardization (Z-score Normalization)

Transforms data so that it has **mean = 0** and **standard deviation = 1**.

$z = \frac{x - \mu}{\sigma}$

- $\mu$ → mean of the feature  
- $\sigma$ → standard deviation of the feature  

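**Pros**:
- Output is not bounded, so outliers stay distinguishable instead of being compressed into a fixed range (they still influence $\mu$ and $\sigma$).  
- Useful for algorithms that work best with centered, comparably scaled features (e.g., Logistic Regression, Linear Regression, PCA).  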
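
As a minimal sketch, the formula can be applied by hand with NumPy. The array `x` holds illustrative values, and using the population standard deviation (`ddof=0`) is an assumption about which $\sigma$ is meant.

```python
import numpy as np

# Illustrative feature values (an assumption, not part of the formula above).
x = np.array([2.0, 7.0, 4.0, 9.0, 1.0])

mu = x.mean()            # mean of the feature
sigma = x.std(ddof=0)    # population standard deviation (assumed here)

z = (x - mu) / sigma     # z-score of every value

print(z)
print(z.mean(), z.std())  # approximately 0 and 1
```

The transformed values keep their order and relative spacing; only the location and scale change.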

## 🔹 Normalization (Vector Norm)

Scales data so that the **magnitude of each sample vector** equals 1.

For **L2 normalization**:

$x_{norm} = \frac{x}{\|x\|_2}$

- Ensures each row (sample) has unit length.  
- Often used in **text mining** and with algorithms that compare samples by cosine similarity (e.g., KNN, SVM).  
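
The link to cosine similarity can be sketched with NumPy: once every row has unit length, the dot product of two rows is their cosine similarity. The matrix `X` below is an illustrative assumption.

```python
import numpy as np

# Two illustrative sample vectors (rows); the values are assumptions.
X = np.array([[3.0, 4.0],
              [1.0, 0.0]])

# Euclidean length of each row, kept as a column so it broadcasts over the row.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_norm = X / norms

print(X_norm)                          # first row becomes [0.6, 0.8]
print(np.linalg.norm(X_norm, axis=1))  # every row now has length 1

# For unit-length rows, the dot product equals the cosine similarity.
cosine = X_norm[0] @ X_norm[1]
print(cosine)
```

A row of all zeros has norm 0 and would need separate handling; this sketch does not cover that case.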

## 🔹 Min–Max Normalization

Scales data to a **fixed range** (usually [0,1]):

$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$

- $x_{min}, x_{max}$ → minimum and maximum values of the feature.  

**Pros**:
- Preserves the relative spacing between values, because the transform is linear.  
- Useful for algorithms that benefit from bounded, comparably scaled inputs (e.g., Neural Networks, K-Means).  

**Cons**:
- Sensitive to outliers (they stretch the range).  
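
The outlier sensitivity can be illustrated with a small NumPy sketch. The helper `min_max` and both arrays are hypothetical and exist only for this example.

```python
import numpy as np

def min_max(x):
    """Rescale a 1-D array to [0, 1] using its own minimum and maximum."""
    return (x - x.min()) / (x.max() - x.min())

clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

print(min_max(clean))         # evenly spread over [0, 1]
print(min_max(with_outlier))  # first four values squeezed close to 0
```

A single extreme value sets $x_{max}$, so the remaining values occupy only a small part of the range. A constant feature ($x_{max} = x_{min}$) would divide by zero and is not handled in this sketch.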

---


#### » Create a dataframe with vectors and set the type to float

In [1]:
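import numpy as np
import pandas as pd

# Three example feature vectors, one per column
vector1 = np.array([2, 7, 4, 9, 1])
vector2 = np.array([4, 2, 7, 8, 7])
vector3 = np.array([5, 4, 8, 12, 5])

# Combine the vectors into a dataframe and cast every column to float
df = pd.DataFrame({"V1": vector1, "V2": vector2, "V3": vector3})
df = df.astype(float)
df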

|   | V1 | V2 | V3 |
|---|-----|-----|------|
| 0 | 2.0 | 4.0 | 5.0 |
| 1 | 7.0 | 2.0 | 4.0 |
| 2 | 4.0 | 7.0 | 8.0 |
| 3 | 9.0 | 8.0 | 12.0 |
| 4 | 1.0 | 7.0 | 5.0 |


## Standardization (Z-score Normalization)

In [4]:
from sklearn import preprocessing
# Standardize each column (feature) to zero mean and unit variance
preprocessing.scale(df)

array([[-0.86474714, -0.71269665, -0.61522733],
       [ 0.79822813, -1.60356745, -0.9570203 ],
       [-0.19955703,  0.62360956,  0.41015156],
       [ 1.46341823,  1.06904497,  1.77732341],
       [-1.19734219,  0.62360956, -0.61522733]])
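
The numbers above correspond to dividing by the population standard deviation (`ddof=0`). pandas' `DataFrame.std()` defaults to `ddof=1`, so a hand-rolled `(df - df.mean()) / df.std()` gives slightly different values. A NumPy sketch of the difference, using the same data as the dataframe:

```python
import numpy as np

# Same values as the dataframe above: columns V1, V2, V3.
X = np.array([[2, 4, 5],
              [7, 2, 4],
              [4, 7, 8],
              [9, 8, 12],
              [1, 7, 5]], dtype=float)

centered = X - X.mean(axis=0)

z_population = centered / X.std(axis=0, ddof=0)  # divides by n
z_sample = centered / X.std(axis=0, ddof=1)      # divides by n - 1

print(z_population[0])  # matches the first row printed above
print(z_sample[0])      # slightly smaller in magnitude
```

With only five rows the gap between the two conventions is visible; it shrinks as the number of samples grows.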

## Normalization (Vector Norm)

In [5]:
# Scale each row (sample) to unit L2 norm; this works row-wise, not column-wise
preprocessing.normalize(df)

array([[0.2981424 , 0.59628479, 0.74535599],
       [0.84270097, 0.24077171, 0.48154341],
       [0.35218036, 0.61631563, 0.70436073],
       [0.52941176, 0.47058824, 0.70588235],
       [0.11547005, 0.80829038, 0.57735027]])
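
Unlike the other two techniques, this one works on rows rather than columns. A NumPy sketch that reproduces one row by hand and checks the row lengths, using the same data as the dataframe:

```python
import numpy as np

# Same values as the dataframe above: columns V1, V2, V3.
X = np.array([[2, 4, 5],
              [7, 2, 4],
              [4, 7, 8],
              [9, 8, 12],
              [1, 7, 5]], dtype=float)

# L2 norm of every row; row 3 is [9, 8, 12] with norm sqrt(289) = 17.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / row_norms

print(X_unit[3])                        # [9/17, 8/17, 12/17]
print(np.linalg.norm(X_unit, axis=1))   # all rows have length 1
```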

## Min–Max Normalization

In [6]:
# Rescale each column to the range [10, 20] instead of the default [0, 1]
scaler = preprocessing.MinMaxScaler(feature_range=(10, 20))
scaler.fit_transform(df)

array([[11.25      , 13.33333333, 11.25      ],
       [17.5       , 10.        , 10.        ],
       [13.75      , 18.33333333, 15.        ],
       [20.        , 20.        , 20.        ],
       [10.        , 18.33333333, 11.25      ]])
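
Because the call uses `feature_range=(10, 20)`, the $[0, 1]$ result is stretched and shifted to the target interval: $x'' = x' \cdot (b - a) + a$. A NumPy sketch of that two-step computation on the same data, where the names `a` and `b` are assumptions standing for the range bounds:

```python
import numpy as np

# Same values as the dataframe above: columns V1, V2, V3.
X = np.array([[2, 4, 5],
              [7, 2, 4],
              [4, 7, 8],
              [9, 8, 12],
              [1, 7, 5]], dtype=float)

a, b = 10.0, 20.0  # target range, as in feature_range=(10, 20)

col_min = X.min(axis=0)
col_max = X.max(axis=0)

X01 = (X - col_min) / (col_max - col_min)  # per-column scaling to [0, 1]
X_scaled = X01 * (b - a) + a               # stretch and shift to [a, b]

print(X_scaled[0])  # matches the first row printed above
```

Each column's minimum maps to 10 and its maximum to 20, which is why row 3 (the largest value in every column) becomes `[20, 20, 20]`.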