# Standardization, or mean removal and variance scaling

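Standardization rescales each feature so that it resembles standard normally distributed data: Gaussian with zero mean and unit variance.

In practice, the data is centered by removing the mean value of each feature, then scaled by dividing non-constant features by their standard deviation.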
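The centering and scaling steps described above can be written out directly. The following is a minimal NumPy sketch, assuming the population standard deviation (`ddof=0`); the variable names are illustrative only.

```python
import numpy as np

x = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

mean = x.mean(axis=0)      # per-feature (column) mean
std = x.std(axis=0)        # per-feature standard deviation (ddof=0)
std[std == 0.0] = 1.0      # leave constant features unscaled

x_standardized = (x - mean) / std
```

Each column of `x_standardized` then has zero mean and unit variance.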

If a feature has a variance that is orders of magnitude larger than that of the others, it might dominate the objective function and prevent the estimator from learning from the other features as expected.

In [4]:
## scale function

from sklearn import preprocessing
import numpy as np

x_train=np.array([[1.,-1.,2.],
                  [2.,0.,0.,],
                  [0.,1.,-1.]])
x_scaled=preprocessing.scale(x_train)
x_scaled

array([[ 0.        , -1.22474487,  1.33630621],
       [ 1.22474487,  0.        , -0.26726124],
       [-1.22474487,  1.22474487, -1.06904497]])

In [8]:
x_scaled.mean(axis=0)

array([0., 0., 0.])

In [9]:
x_scaled.std(axis=0)

array([1., 1., 1.])

In [13]:
###StandardScaler
scaler=preprocessing.StandardScaler().fit(x_train)
scaler

StandardScaler(copy=True, with_mean=True, with_std=True)

In [15]:
scaler.mean_

array([1.        , 0.        , 0.33333333])

In [19]:
scaler.scale_

array([0.81649658, 0.81649658, 1.24721913])

In [17]:
# Fitting a StandardScaler and then calling transform on the same data
# gives the same result as calling preprocessing.scale on that data.
scaler.transform(x_train)


array([[ 0.        , -1.22474487,  1.33630621],
       [ 1.22474487,  0.        , -0.26726124],
       [-1.22474487,  1.22474487, -1.06904497]])

The scaler instance can then be used on new data to transform it the same way it did on the training set:

In [22]:
x_test=[[-1.,1.,0.]]
scaler.transform(x_test)

array([[-2.44948974,  1.22474487, -0.26726124]])
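The result above can be reproduced by hand from the statistics stored on the fitted scaler. This is a small sketch that reuses the same training and test values and compares the manual computation with `transform`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
x_test = np.array([[-1., 1., 0.]])

scaler = StandardScaler().fit(x_train)

# transform() reuses the mean and scale learned from the training set
manual = (x_test - scaler.mean_) / scaler.scale_
via_scaler = scaler.transform(x_test)
```

Because the test row is scaled with the training statistics, its values are not guaranteed to have zero mean or unit variance.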

## Scaling features to a range

In [24]:
# sklearn.preprocessing.MinMaxScaler
'''Transforms features by scaling each feature to a given range.'''

'Transforms features by scaling each feature to a given range.'

In [40]:
x_train=np.array([[1.,-1.,2.],
                 [2.,0.,0.,],
                 [0.,1.,-1.]])
min_max_scaler=preprocessing.MinMaxScaler()
x_train_minmax=min_max_scaler.fit_transform(x_train)
x_train_minmax

array([[0.5       , 0.        , 1.        ],
       [1.        , 0.5       , 0.33333333],
       [0.        , 1.        , 0.        ]])

In [42]:
print(min_max_scaler.data_max_)

[2. 1. 2.]


In [46]:
print(min_max_scaler.scale_)

[0.5        0.5        0.33333333]


In [47]:
print(min_max_scaler.min_)

[0.         0.5        0.33333333]
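The `scale_` and `min_` values printed above define an affine map of the form `X * scale_ + min_`. The following NumPy sketch recomputes both by hand, assuming the target range is (0, 1); the names `scale` and `offset` are illustrative.

```python
import numpy as np

x = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])
range_min, range_max = 0.0, 1.0    # assumed target range

data_min = x.min(axis=0)
data_max = x.max(axis=0)

scale = (range_max - range_min) / (data_max - data_min)
offset = range_min - data_min * scale

x_minmax = x * scale + offset
```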


In [48]:
# MinMaxScaler takes feature_range=(min, max); the default is (0, 1)
feature_range = (0, 1)

In [52]:
# MaxAbsScaler: divides each feature by its maximum absolute value,
# so the training data ends up in the range [-1, 1]
X_train=np.array([[1.,-1.,2.],
                 [2.,0.,0.],
                 [0.,1.,-1.]])
max_abs_scaler=preprocessing.MaxAbsScaler()
X_train_maxabs=max_abs_scaler.fit_transform(X_train)
X_train_maxabs

array([[ 0.5, -1. ,  1. ],
       [ 1. ,  0. ,  0. ],
       [ 0. ,  1. , -0.5]])

In [54]:
X_test = np.array([[ -3., -1.,  4.]])
X_test_maxabs = max_abs_scaler.transform(X_test)
X_test_maxabs    

array([[-1.5, -1. ,  2. ]])

In [55]:
max_abs_scaler.scale_  

array([2., 1., 2.])

## Scaling sparse data

MaxAbsScaler and maxabs_scale were specifically designed for scaling sparse data.

Note that the scalers accept both Compressed Sparse Rows and Compressed Sparse Columns format (see scipy.sparse.csr_matrix and scipy.sparse.csc_matrix).
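As a hedged illustration of the sparse case, the sketch below scales a small CSR matrix with MaxAbsScaler. The matrix values are made up for the example. Since no centering is applied, zero entries stay zero and the result remains sparse.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

X_sparse = sparse.csr_matrix(np.array([[1., 0., 2.],
                                       [0., 0., -4.],
                                       [2., 1., 0.]]))

# Each column is divided by its maximum absolute value: 2, 1 and 4
X_scaled = MaxAbsScaler().fit_transform(X_sparse)
```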

## Scaling data with outliers

In [56]:
# robust_scale

# RobustScaler
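The cell above only names the two tools. A minimal sketch of RobustScaler on illustrative data with one obvious outlier follows. With its default settings it centers on the median and scales by the interquartile range, so the outlier has little effect on the statistics.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with an obvious outlier (illustrative data)
X = np.array([[1.], [2.], [3.], [4.], [100.]])

robust = RobustScaler().fit(X)
X_robust = robust.transform(X)   # (X - median) / IQR
```

Here the median is 3 and the interquartile range is 2, so the four ordinary points map to values between -1 and 0.5 while the outlier remains far away.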

Centering and scaling the features independently is sometimes not enough, because a downstream model may make further assumptions about the linear independence of the features.

In that case, use sklearn.decomposition.PCA with whiten=True to further remove the linear correlation across features.
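A short sketch of whitening on synthetic data (the data and variable names are assumptions made for this example): two strongly correlated features go in, and the sample covariance of the output is close to the identity matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Two strongly correlated features (synthetic data)
a = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a + rng.normal(scale=0.5, size=200)])

X_white = PCA(whiten=True).fit_transform(X)

# After whitening, the components are uncorrelated with unit variance
cov = np.cov(X_white, rowvar=False)
```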

In [57]:
# sklearn.decomposition.PCA

## sklearn.decomposition.PCA

In [58]:
#  KernelCenterer

# Non-linear transformation

Quantile transforms and power transforms are both monotonic transformations of the features.

Power transforms are a family of parametric transformations that aim to map data from any distribution to as close to a Gaussian distribution as possible.

## Mapping to a Uniform distribution

QuantileTransformer and quantile_transform provide a non-parametric transformation that maps the data to a uniform distribution with values between 0 and 1.


In [60]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
quantile_transformer = preprocessing.QuantileTransformer(random_state=0)
X_train_trans = quantile_transformer.fit_transform(X_train)
X_test_trans = quantile_transformer.transform(X_test)
np.percentile(X_train[:, 0], [0, 25, 50, 75, 100]) 

array([4.3, 5.1, 5.8, 6.5, 7.9])

In [62]:
np.percentile(X_train_trans[:, 0], [0, 25, 50, 75, 100])

array([9.99999998e-08, 2.38738739e-01, 5.09009009e-01, 7.43243243e-01,
       9.99999900e-01])

In [63]:
np.percentile(X_test[:, 0], [0, 25, 50, 75, 100])

array([4.4  , 5.125, 5.75 , 6.175, 7.3  ])

In [65]:
np.percentile(X_test_trans[:, 0], [0, 25, 50, 75, 100])

array([0.01351351, 0.25012513, 0.47972973, 0.6021021 , 0.94144144])

## Mapping to a Gaussian distribution

Power transforms are a family of parametric, monotonic transformations that aim to map data from any distribution to as close to a Gaussian distribution as possible in order to stabilize variance and minimize skewness.

PowerTransformer provides two such transforms: the Yeo-Johnson transform and the Box-Cox transform.
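To make the idea concrete, here is a small NumPy sketch of the Box-Cox formula for a fixed, hand-picked lambda. PowerTransformer estimates lambda from the data by maximum likelihood; this sketch does not, and the helper name `box_cox` is hypothetical.

```python
import numpy as np

def box_cox(x, lmbda):
    """Box-Cox transform of strictly positive data for a fixed lambda."""
    x = np.asarray(x, dtype=float)
    if lmbda == 0:
        return np.log(x)
    return (x ** lmbda - 1.0) / lmbda

x = np.array([1.0, 2.0, 4.0])
log_case = box_cox(x, 0.0)     # reduces to the natural logarithm
sqrt_case = box_cox(x, 0.5)    # equals 2 * (sqrt(x) - 1)
```

Both cases are monotonic in `x`, which is why the ordering of the samples is preserved.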

In [69]:
from sklearn import preprocessing
pt = preprocessing.PowerTransformer(method='box-cox', standardize=False)
X_lognormal = np.random.RandomState(616).lognormal(size=(3, 3))
X_lognormal                                         

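AttributeError: module 'sklearn.preprocessing' has no attribute 'PowerTransformer'

This error comes from the installed scikit-learn release, not from the code: PowerTransformer was added in scikit-learn 0.20, so the cell runs only with that version or later.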

In [68]:
pt.fit_transform(X_lognormal)                   

NameError: name 'pt' is not defined

It is also possible to map data to a normal distribution using QuantileTransformer by setting output_distribution='normal'. Using the earlier example with the iris dataset:

In [70]:
quantile_transformer = preprocessing.QuantileTransformer(
    output_distribution='normal', random_state=0)
X_trans = quantile_transformer.fit_transform(X)
quantile_transformer.quantiles_ 

array([[4.3       , 2.        , 1.        , 0.1       ],
       [4.31491491, 2.02982983, 1.01491491, 0.1       ],
       [4.32982983, 2.05965966, 1.02982983, 0.1       ],
       ...,
       [7.84034034, 4.34034034, 6.84034034, 2.5       ],
       [7.87017017, 4.37017017, 6.87017017, 2.5       ],
       [7.9       , 4.4       , 6.9       , 2.5       ]])

# Normalization

In [71]:
X = [[ 1., -1.,  2.],
     [ 2.,  0.,  0.],
     [ 0.,  1., -1.]]

In [72]:
X_normalized = preprocessing.normalize(X, norm='l2')
X_normalized

array([[ 0.40824829, -0.40824829,  0.81649658],
       [ 1.        ,  0.        ,  0.        ],
       [ 0.        ,  0.70710678, -0.70710678]])

In [73]:
normalizer = preprocessing.Normalizer().fit(X)  # fit does nothing
normalizer

Normalizer(copy=True, norm='l2')

In [74]:
normalizer.transform(X)  

array([[ 0.40824829, -0.40824829,  0.81649658],
       [ 1.        ,  0.        ,  0.        ],
       [ 0.        ,  0.70710678, -0.70710678]])

In [75]:
normalizer.transform([[-1.,  1., 0.]]) 

array([[-0.70710678,  0.70710678,  0.        ]])

normalize and Normalizer accept both dense array-like data and sparse matrices from scipy.sparse as input.
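The L2 case shown above amounts to dividing each sample (row) by its Euclidean norm. A minimal NumPy sketch on the same values, with illustrative variable names:

```python
import numpy as np

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

row_norms = np.linalg.norm(X, axis=1, keepdims=True)  # L2 norm of each sample
X_l2 = X / row_norms
```

Every row of `X_l2` has unit norm. Note that normalization works per sample, whereas the scalers earlier in these notes work per feature.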

# Encoding categorical features

To convert categorical features to integer codes, we can use the OrdinalEncoder. This estimator transforms each categorical feature into one new feature of integers (0 to n_categories - 1):

In [82]:
from sklearn import preprocessing
enc = preprocessing.OrdinalEncoder()
X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
enc.fit(X) 

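AttributeError: module 'sklearn.preprocessing' has no attribute 'OrdinalEncoder'

OrdinalEncoder was added in scikit-learn 0.20. The error above means an older release is installed, and the cell runs after upgrading.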

In [None]:
enc.transform([['female', 'from US', 'uses Safari']])
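The idea of integer coding can be sketched in plain Python, independent of any scikit-learn version. This is not the library's implementation: it assumes the categories of each column are ordered alphabetically, and the helper name `encode` is hypothetical.

```python
X = [['male', 'from US', 'uses Safari'],
     ['female', 'from Europe', 'uses Firefox']]

# One sorted category list per column
categories = [sorted(set(column)) for column in zip(*X)]

def encode(row):
    """Map each value to its index in that column's category list."""
    return [cats.index(value) for cats, value in zip(categories, row)]

codes = encode(['female', 'from US', 'uses Safari'])
```

With these categories, `codes` is `[0, 1, 1]`: each value is replaced by its position in the sorted category list of its column.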

# Discretization

1. K-bins discretization

In [84]:
from sklearn import preprocessing
from sklearn.preprocessing import KBinsDiscretizer
X = np.array([[ -3., 5., 15 ],
               [  0., 6., 14 ],
               [  6., 3., 11 ]])

est = preprocessing.KBinsDiscretizer(n_bins=[3, 2, 2], encode='ordinal').fit(X)

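ImportError: cannot import name 'KBinsDiscretizer' from 'sklearn.preprocessing' (/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/__init__.py)

KBinsDiscretizer was added in scikit-learn 0.20, so this import fails on older releases and succeeds after upgrading.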

2. Feature binarization

It is also common among the text processing community to use binary feature values (probably to simplify the probabilistic reasoning) even if normalized counts (a.k.a. term frequencies) or TF-IDF valued features often perform slightly better in practice.

The utility class Binarizer is meant to be used in the early stages of sklearn.pipeline.Pipeline.

In [85]:
X = [[ 1., -1.,  2.],
     [ 2.,  0.,  0.],
     [ 0.,  1., -1.]]
binarizer = preprocessing.Binarizer().fit(X)  # fit does nothing
binarizer

Binarizer(copy=True, threshold=0.0)

In [86]:
binarizer.transform(X)

array([[1., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.]])

In [87]:
# Raise the threshold: only values strictly greater than 1.1 map to 1
binarizer = preprocessing.Binarizer(threshold=1.1)
binarizer.transform(X)

array([[0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 0.]])

As for the StandardScaler and Normalizer classes, the preprocessing module provides a companion function binarize to be used when the transformer API is not necessary.

# Generating polynomial features

In [88]:
# PolynomialFeatures
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
X = np.arange(6).reshape(3, 2)
X


array([[0, 1],
       [2, 3],
       [4, 5]])

In [89]:
poly = PolynomialFeatures(2)
poly.fit_transform(X)   

array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
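The columns of the output above correspond to the terms 1, a, b, a^2, a*b and b^2 for input features (a, b). A short NumPy sketch that builds the same matrix by hand:

```python
import numpy as np

X = np.arange(6).reshape(3, 2)
a, b = X[:, 0], X[:, 1]

# Degree-2 expansion of (a, b): 1, a, b, a^2, a*b, b^2
X_poly = np.column_stack([np.ones(len(X)), a, b, a ** 2, a * b, b ** 2])
```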

# Custom transformers

In [91]:
# FunctionTransformer
# Build a transformer that applies a log transformation, e.g. for use in a pipeline
import numpy as np
from sklearn.preprocessing import FunctionTransformer
transformer = FunctionTransformer(np.log1p, validate=True)
X = np.array([[0, 1], [2, 3]])
transformer.transform(X)

array([[0.        , 0.69314718],
       [1.09861229, 1.38629436]])
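Since the comment in the cell mentions pipelines, here is a hedged sketch of how such a transformer might be chained with a scaler. The input values are made up for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X = np.array([[0., 1.], [2., 3.], [8., 15.]])

# Apply log1p first, then standardize the log-transformed features
pipe = make_pipeline(FunctionTransformer(np.log1p, validate=True),
                     StandardScaler())
X_out = pipe.fit_transform(X)
```

Wrapping the function in a transformer means the log step is applied consistently whenever the pipeline is fitted or used on new data.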
