# MaxAbsScaler

MaxAbsScaler is a transformer class in the `preprocessing` module of the scikit-learn (sklearn) library. It is intended for sparse datasets.

Scaling that also centers the data cannot be applied to a sparse dataset directly, because centering turns the zeros into nonzero values and distorts the actual structure of the dataset.

Sparse datasets are those in which most entries are zero, often because the value is missing at many places in the dataset. Missing data usually means lower model accuracy.

We can come up with some solutions like:

* Replacing the missing values of an attribute with the mean of that attribute. This works well as long as only a few values are missing, because the mean does not disturb the dataset as a whole. The more missing values are replaced with the mean, the more the standard deviation shrinks, which can hurt accuracy.


* Removing every tuple (row) that has a missing value. This works only when few rows are affected, because removing a large part of the dataset reduces accuracy considerably.


* Using algorithms that handle the sparsity of the dataset internally.
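The first two options above can be sketched with pandas. This is only an illustration under stated assumptions: the small DataFrame `df` and the variable names `imputed` and `dropped` are made up for this example, and missing values are assumed to be stored as `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: NaN marks a missing value.
df = pd.DataFrame({"A": [5.0, np.nan, 9.0, 1.0],
                   "B": [2.0, 4.0, np.nan, 6.0]})

# Option 1: replace each missing value with the mean of its column.
# df.mean() skips NaN by default, so the means come from the observed values.
imputed = df.fillna(df.mean())

# Option 2: drop every row that contains at least one missing value.
dropped = df.dropna()

# Mean imputation shrinks the spread of the column.
std_before = df["A"].std()        # computed over the observed values only
std_after = imputed["A"].std()    # smaller, because the filled value sits at the mean

print(imputed)
print(dropped)
print(std_before, std_after)
```

Here dropping rows discards half of the toy dataset, while imputing keeps every row at the cost of a smaller standard deviation.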



In [8]:
from sklearn.preprocessing import MaxAbsScaler

MaxAbsScaler scales each attribute individually so that the maximum absolute value of each attribute becomes 1.

It does not shift or center the data, so zeros stay zeros and the sparsity of the dataset is preserved.
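As a rough check of this description, the same result can be obtained by dividing each column by its largest absolute value. The sketch below is not taken from the scikit-learn source; the array `X` and the variable names are assumptions chosen for illustration. It also passes a SciPy CSR matrix to show that the output stays sparse.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical example data with positive and negative values.
X = np.array([[ 1.0, -2.0, 0.0],
              [ 0.0,  4.0, 0.0],
              [-5.0,  0.0, 3.0]])

# Manual version: divide each column by its maximum absolute value.
max_abs = np.abs(X).max(axis=0)
X_manual = X / max_abs

# Library version.
X_scaled = MaxAbsScaler().fit_transform(X)
print(np.allclose(X_manual, X_scaled))

# Sparse input: only the stored nonzero entries are rescaled,
# so the number of nonzeros does not change.
X_sparse = sparse.csr_matrix(X)
X_sparse_scaled = MaxAbsScaler().fit_transform(X_sparse)
print(sparse.issparse(X_sparse_scaled), X_sparse.nnz, X_sparse_scaled.nnz)
```

The manual division assumes no column is entirely zero; MaxAbsScaler handles that case itself.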

In [32]:
import pandas as pd
data = pd.DataFrame([[5., 0, 0], [7., 0, 9.], [9., 0, 3.], [0., 0, 41.]], columns=list('ABC'))
data = data.values

You can see that half of the values in the dataset are 0, which makes it a sparse dataset.

In [33]:
scaler = MaxAbsScaler()
scaler.fit_transform(data)

array([[0.55555556, 0.        , 0.        ],
       [0.77777778, 0.        , 0.2195122 ],
       [1.        , 0.        , 0.07317073],
       [0.        , 0.        , 1.        ]])

# Attributes

* scale_ : relative scaling of the data per feature. A feature whose maximum absolute value is 0 gets a scale of 1, which avoids division by zero.

In [42]:
scaler.scale_

array([ 9.,  1., 41.])

* max_abs_ : maximum absolute value per feature

In [43]:
scaler.max_abs_

array([ 9.,  0., 41.])

* n_samples_seen_ : number of samples processed by the estimator

In [44]:
scaler.n_samples_seen_

4
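The counter is reset by each call to `fit` and accumulates across calls to `partial_fit`. The following sketch feeds the same four rows in two batches of two to show this; splitting the data this way and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

data = np.array([[5., 0., 0.],
                 [7., 0., 9.],
                 [9., 0., 3.],
                 [0., 0., 41.]])

scaler = MaxAbsScaler()

# First batch: rows 0 and 1.
scaler.partial_fit(data[:2])
seen_after_first = scaler.n_samples_seen_
max_after_first = scaler.max_abs_.copy()

# Second batch: rows 2 and 3. The statistics are updated, not replaced.
scaler.partial_fit(data[2:])
seen_after_second = scaler.n_samples_seen_
max_after_second = scaler.max_abs_.copy()

print(seen_after_first, max_after_first)
print(seen_after_second, max_after_second)
```

After both batches the attributes match those obtained from a single `fit` on the full array.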

# References

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler

http://www.diva-portal.se/smash/get/diva2:1111045/FULLTEXT01.pdf