# Transformation
- smooth the data: remove noise from the data by clustering or regression
- generalize the data: replace low-level values with higher-level concepts (for example, an exact age with an age group)
- normalize the data: min-max, z-score, etc.
- feature engineering: create new features from existing features
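
As a rough illustration of three of the steps listed above, the sketch below uses only NumPy. The toy data (`hours`, `score`), the cut points in `bins`, and the derived `score_per_hour` feature are hypothetical and chosen purely for illustration. A straight-line fit is just one possible regression for smoothing.

```python
import numpy as np

# Hypothetical toy data: hours studied and a noisy exam score.
hours = np.array([1., 2., 3., 4., 5.])
score = np.array([52., 61., 58., 75., 79.])

# Smoothing by regression: replace each noisy score with the value
# predicted by a least-squares straight line.
slope, intercept = np.polyfit(hours, score, deg=1)
smoothed_score = slope * hours + intercept
print(smoothed_score)

# Generalization: map exact scores to coarser, higher-level categories.
labels = np.array(["low", "medium", "high"])
bins = np.array([60., 75.])  # hypothetical cut points
score_level = labels[np.digitize(score, bins)]
print(score_level)

# Feature engineering: derive a new feature from existing ones.
score_per_hour = score / hours
print(score_per_hour)
```

The smoothed scores lie exactly on the fitted line, so the point-to-point noise is gone while the overall trend is kept.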

The examples below use scikit-learn (`sklearn`) and NumPy to normalize the data.
### 1. Min-max
Rescale each feature (column) so that its new range is [0, 1], where $min$ and $max$ are that feature's minimum and maximum:  
$newData = \frac{original - min}{max - min}$

In [4]:
from sklearn import preprocessing
import numpy as np
# initialize the data, each row is a sample, each column is a feature
x = np.array([[0., -3., 1.],
             [3., 1., 2.],
             [0., 1., -1.]])
# fit_transform learns each column's min and max, then rescales the data
min_max_scaler = preprocessing.MinMaxScaler()
minmax_x = min_max_scaler.fit_transform(x)
print(minmax_x)

[[0.         0.         0.66666667]
 [1.         1.         1.        ]
 [0.         1.         0.        ]]
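
To connect the formula to the result, here is a minimal sketch that applies the min-max formula column by column with plain NumPy. It also reuses the same per-column minimum and maximum on a new sample. The `new_sample` values are hypothetical, and the sketch assumes no column is constant (a constant column would make the denominator zero).

```python
import numpy as np

x = np.array([[0., -3., 1.],
              [3., 1., 2.],
              [0., 1., -1.]])

# Per-column (per-feature) minimum and maximum.
col_min = x.min(axis=0)
col_max = x.max(axis=0)

# Apply (original - min) / (max - min) to every column at once.
minmax_manual = (x - col_min) / (col_max - col_min)
print(minmax_manual)

# Hypothetical new sample, scaled with the min and max learned from x.
new_sample = np.array([[6., 0., 0.5]])
scaled_new = (new_sample - col_min) / (col_max - col_min)
print(scaled_new)
```

The first column of `new_sample` scales to 2.0: values outside the range seen in `x` fall outside [0, 1].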


### 2. Z-score
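Standardize each feature (column) to zero mean and unit variance, where $\mu$ and $\sigma$ are that feature's mean and standard deviation:  
$Z = \frac{original - \mu}{\sigma}$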

In [7]:
y = np.array([[0., -3., 1.],
             [3., 1., 2.],
             [0., 1., -1.]])
# scale() standardizes each column using its mean and population standard deviation
z_scale = preprocessing.scale(y)
print(z_scale)

[[-0.70710678 -1.41421356  0.26726124]
 [ 1.41421356  0.70710678  1.06904497]
 [-0.70710678  0.70710678 -1.33630621]]
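
The same numbers can be reproduced from the formula with plain NumPy, as in the minimal sketch below. It assumes the population standard deviation (`ddof=0`, NumPy's default), which is what the printed output above corresponds to. Using the sample standard deviation (`ddof=1`) would give slightly different values.

```python
import numpy as np

y = np.array([[0., -3., 1.],
              [3., 1., 2.],
              [0., 1., -1.]])

# Per-column mean and population standard deviation (ddof=0).
mu = y.mean(axis=0)
sigma = y.std(axis=0)

# Apply (original - mu) / sigma to every column at once.
z_manual = (y - mu) / sigma
print(z_manual)

# Sanity check: each column now has mean ~0 and standard deviation ~1.
print(z_manual.mean(axis=0))
print(z_manual.std(axis=0))
```

When the same scaling has to be applied to new data later, scikit-learn's `StandardScaler` offers this standardization as an object with separate `fit` and `transform` steps.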


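### 3. Decimal scaling (moving the decimal point)
Divide every value by $10^i$, where $i = \lceil \log_{10}(\max |original|) \rceil$ depends on the maximum absolute value in the data, so that the scaled values fall within [-1, 1].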

In [8]:
z = np.array([[0., -30., 1.],
             [3., 12., 2.],
             [0.99, 1., -1.]])
# i = number of places to shift the decimal point, from the largest absolute value
i = np.ceil(np.log10(np.max(np.abs(z))))
scaled_z = z / (10**i)
print(scaled_z)

[[ 0.     -0.3     0.01  ]
 [ 0.03    0.12    0.02  ]
 [ 0.0099  0.01   -0.01  ]]
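
The cell above can be wrapped in a reusable function, sketched below. The name `decimal_scale` is hypothetical. One assumption differs from the cell: the exponent is computed as `floor(log10(max)) + 1` instead of `ceil(log10(max))`. The two agree unless the maximum absolute value is an exact power of ten (for example 100), where `ceil` would leave that value at exactly 1 and the variant here brings it strictly below 1.

```python
import numpy as np

def decimal_scale(values):
    """Divide by a power of 10 so that every |value| ends up below 1.

    Returns the scaled array and the exponent that was used.
    """
    values = np.asarray(values, dtype=float)
    max_abs = np.max(np.abs(values))
    if max_abs == 0:
        # All zeros: nothing to scale.
        return values.copy(), 0
    # floor(log10(max_abs)) + 1 = number of places to shift the decimal point.
    exponent = int(np.floor(np.log10(max_abs))) + 1
    return values / 10.0 ** exponent, exponent

z = np.array([[0., -30., 1.],
              [3., 12., 2.],
              [0.99, 1., -1.]])
scaled_z, exponent = decimal_scale(z)
print(exponent)
print(scaled_z)

# Edge case: the maximum absolute value is an exact power of ten.
scaled_edge, exponent_edge = decimal_scale([100., -30., 5.])
print(exponent_edge)
print(scaled_edge)
```

For the array `z` the function picks exponent 2, the same divisor (100) as the cell above. For the edge case it picks exponent 3, so 100 becomes 0.1 rather than 1.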
