# Data Normalization

- `StandardScaler` standardizes features by removing the mean and scaling to unit variance.
- `StandardScaler()` treats each column of X individually: after the transform, every column (feature/variable) has mean = 0 and standard deviation = 1.
>- mean: $\mu = \frac{1}{n} \sum_{i=1}^n x_i$
>- stdev: $\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2}$
>- standardization: $z = \frac{x-\mu}{\sigma}$
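
The formulas above can be checked against `StandardScaler` directly. The following is a minimal sketch, assuming the same 3x3 example matrix used in this notebook; the names `mu`, `sigma`, `Z_manual` and `Z_sklearn` are illustrative only. Note that the standard deviation here divides by $n$ (NumPy's default `ddof=0`), matching the formula above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0, 100.0], [4.0, 1.0, 75.0], [2.0, 1.0, 40.0]])

# Column-wise mean and population standard deviation (divide by n, ddof=0)
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# Manual standardization, applied to each column independently
Z_manual = (X - mu) / sigma

# The same transformation via scikit-learn
Z_sklearn = StandardScaler().fit_transform(X)

# Both approaches should agree up to floating-point error
print(np.allclose(Z_manual, Z_sklearn))

# Each standardized column should have mean 0 and standard deviation 1
print(np.allclose(Z_sklearn.mean(axis=0), 0.0))
print(np.allclose(Z_sklearn.std(axis=0), 1.0))
```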

In [3]:
from sklearn.metrics.pairwise import euclidean_distances
import numpy as np

In [4]:
X = np.array([[1.0, 2.0, 100.0], [4.0, 1.0, 75.0], [2.0, 1.0, 40.0]])  # floats, to prevent StandardScaler() warnings
print(X)

# Pairwise Euclidean distances between the rows of the raw data
X_dis = np.around(euclidean_distances(X), 2)
print(X_dis)

[[  1.   2. 100.]
 [  4.   1.  75.]
 [  2.   1.  40.]]
[[ 0.   25.2  60.02]
 [25.2   0.   35.06]
 [60.02 35.06  0.  ]]


In [7]:
from sklearn.preprocessing import StandardScaler

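# Round only for display; distances are computed from the unrounded standardized values
X_norm = StandardScaler().fit_transform(X)
print(np.around(X_norm, 2))

X_dis = np.around(euclidean_distances(X_norm), 2)
print(X_dis)

[[-1.07  1.41  1.15]
 [ 1.34 -0.71  0.14]
 [-0.27 -0.71 -1.29]]
[[0.   3.36 3.33]
 [3.36 0.   2.14]
 [3.33 2.14 0.  ]]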
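
One way to read the two distance matrices is to ask how much each column contributes to a given pairwise distance before and after scaling. The sketch below is an illustration under assumptions, not part of the original notebook: `squared_distance_shares` is a hypothetical helper, and rows 0 and 1 are picked arbitrarily. In the raw data the third column has by far the largest values, so it should account for almost all of the squared distance; after standardization the columns are on a comparable scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0, 100.0], [4.0, 1.0, 75.0], [2.0, 1.0, 40.0]])
X_std = StandardScaler().fit_transform(X)


def squared_distance_shares(A, i, j):
    """Hypothetical helper: fraction of the squared Euclidean distance
    between rows i and j of A that comes from each column."""
    sq = (A[i] - A[j]) ** 2
    return sq / sq.sum()


# Per-column share of the squared distance between rows 0 and 1
raw_shares = squared_distance_shares(X, 0, 1)
std_shares = squared_distance_shares(X_std, 0, 1)

print("raw:         ", np.around(raw_shares, 3))
print("standardized:", np.around(std_shares, 3))
```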
