# Centering and Scaling

1. Numeric variables are often on different scales and cover different ranges, so they can't be easily compared.
   
2. What's more, variables with large values can dominate those with smaller values in some modeling techniques, for example methods that rely on distances between data points.

3. Centering and scaling is a common preprocessing task that puts numeric variables on a common scale so no single variable will dominate the others.

4. The simplest way to center data is to subtract the mean value from each data point.

5. Subtracting the mean shifts the data so that it is centered around zero: the new mean is zero, while the spread of the values is unchanged.

6. Let's try zero-centering the mtcars dataset, a small set of car-related data.  
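Before turning to the dataset, the idea in items 4 and 5 can be sketched on a handful of numbers. The values below are made up for illustration and are not taken from mtcars; the names `values` and `centered` are likewise only placeholders.

```python
import numpy as np

# Hypothetical toy values, not taken from mtcars
values = np.array([2.0, 4.0, 6.0, 8.0])

# Center the data: subtract the mean (5.0 here) from every data point
centered = values - values.mean()

print(centered)         # [-3. -1.  1.  3.]
print(centered.mean())  # 0.0
```

The distances between the points are unchanged; only their location has shifted so that the mean is zero.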

In [1]:
import numpy as np 
import pandas as pd 
import os 

%matplotlib inline  

In [2]:
os.getcwd()

'/kaggle/working'

In [3]:
os.chdir('/kaggle/')

In [4]:
os.getcwd()

'/kaggle'

In [5]:
os.listdir('/kaggle/')

['src', 'lib', 'input', 'nbdev', 'working']

In [6]:
os.listdir('/kaggle/input/')

['mtcars']

In [7]:
os.listdir('input/mtcars')  


['mtcars.csv']

In [8]:
mtcars = pd.read_csv("/kaggle/input/mtcars/mtcars.csv")   

In [9]:
print(mtcars.head()) 

               model   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  \
0          Mazda RX4  21.0    6  160.0  110  3.90  2.620  16.46   0   1     4   
1      Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875  17.02   0   1     4   
2         Datsun 710  22.8    4  108.0   93  3.85  2.320  18.61   1   1     4   
3     Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215  19.44   1   0     3   
4  Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0     3   

   carb  
0     4  
1     4  
2     1  
3     1  
4     2  


In [10]:
# Set row index to car model:
mtcars.index = mtcars.model

# Drop car name column:
del mtcars["model"]
 


In [11]:
print(mtcars.head())

                    mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  \
model                                                                        
Mazda RX4          21.0    6  160.0  110  3.90  2.620  16.46   0   1     4   
Mazda RX4 Wag      21.0    6  160.0  110  3.90  2.875  17.02   0   1     4   
Datsun 710         22.8    4  108.0   93  3.85  2.320  18.61   1   1     4   
Hornet 4 Drive     21.4    6  258.0  110  3.08  3.215  19.44   1   0     3   
Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0     3   

                   carb  
model                    
Mazda RX4             4  
Mazda RX4 Wag         4  
Datsun 710            1  
Hornet 4 Drive        1  
Hornet Sportabout     2  


In [12]:
# Calculate column means: 

colmean = mtcars.sum()/mtcars.shape[0] 
colmean

mpg      20.090625
cyl       6.187500
disp    230.721875
hp      146.687500
drat      3.596563
wt        3.217250
qsec     17.848750
vs        0.437500
am        0.406250
gear      3.687500
carb      2.812500
dtype: float64
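As a side note, dividing the column sums by the row count should match what pandas' built-in `DataFrame.mean()` returns, provided the data has no missing values (`sum()` skips missing values by default while `shape[0]` counts every row). The small frame below is a hypothetical stand-in for mtcars, used only to sketch the comparison.

```python
import pandas as pd

# Hypothetical two-column frame standing in for mtcars
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})

# Same approach as the cell above: column sums divided by the number of rows
manual_mean = df.sum() / df.shape[0]

# Built-in alternative: column-wise means
builtin_mean = df.mean()

# Largest absolute difference between the two results
print((manual_mean - builtin_mean).abs().max())
```

With missing values present, the two approaches can disagree, which is one reason to prefer the built-in method on messier data.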

Conclusion: 

1. With the column means in hand, we just need to subtract each column's mean from every value in that column to zero-center the data.
   
2. When a Series is subtracted from a DataFrame, pandas by default matches the Series index labels to the DataFrame's column names and applies the operation to every row.

3. So we can simply subtract our column means series from the data set to center it: 
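As a small-scale check of that alignment before applying it to mtcars, here is a sketch on a hypothetical three-row frame. The frame `df`, its labels, and the names `col_means` and `centered` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; row labels play the role of the car models
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]},
    index=["r1", "r2", "r3"],
)

# Series of column means, indexed by column name ("a", "b")
col_means = df.mean()

# The Series index lines up with the DataFrame columns,
# so each column has its own mean subtracted in every row
centered = df - col_means

print(centered)
print(centered.mean())  # column means of the centered data
```

Each column of `centered` now has a mean of zero, while the row labels and column names are carried over unchanged.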
