# Feature Scaling

Feature scaling is a technique for bringing the independent features in a dataset onto a common, fixed range.
It is performed during data pre-processing.

Use Case:

Consider a dataset of 5000 people with three features: Age, Salary, and BHK Apartment.
Each person is one data point, and each data point is labeled as:

Class 1 - YES (a person with the given Age, Salary, and BHK Apartment values can buy the property)
Class 2 - NO (a person with the given Age, Salary, and BHK Apartment values can’t buy the property)

Using this dataset for training, one aims to build a model that can predict
whether a person can buy a property, given their feature values.

Need for Feature Scaling:

The dataset contains 3 features: Age, Salary, and BHK Apartment. Consider a range of 10-60 for Age,
1 Lac-40 Lacs for Salary, and 1-5 for BHK. All these features are independent of each other.
Suppose the centroid of class 1 is [40, 22 Lacs, 3] and the data point to be predicted is [57, 33 Lacs, 2].

Using the Manhattan distance (1 Lac = 100000):

Distance = |40 - 57| + |2200000 - 3300000| + |3 - 2| = 17 + 1100000 + 1 = 1100018
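
The arithmetic above can be checked with a few lines of plain Python. This is only an illustrative sketch: the variable names are made up for the example, and the values are the centroid and data point from the text with 1 Lac taken as 100,000.

```python
# Values taken from the example above; 1 Lac = 100,000.
features = ["Age", "Salary", "BHK"]
centroid = [40, 2_200_000, 3]   # centroid of class 1
point = [57, 3_300_000, 2]      # data point to be predicted

# Per-feature absolute differences, then their sum (the Manhattan distance).
contributions = [abs(c - p) for c, p in zip(centroid, point)]
distance = sum(contributions)

for name, value in zip(features, contributions):
    print(f"{name}: {value} ({value / distance:.4%} of the total)")
print("Manhattan distance:", distance)
```

Printing each feature's share of the total makes it easy to see how little Age and BHK contribute to the distance.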

It can be seen that the Salary feature dominates all the other features when predicting the class of the given data point:
almost the entire distance comes from the salary difference. The features are independent of each other, i.e. a person’s
salary has no relation to their age or to the size of flat they need, so there is no reason for Salary to carry more weight
than the others. A distance-based model would in effect classify on Salary alone, which makes its predictions unreliable.
The simple solution to this problem is feature scaling. Feature scaling algorithms bring Age, Salary, and
BHK into a fixed range, say [-1, 1] or [0, 1], so that no feature can dominate the others.
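
To illustrate the effect, the sketch below rescales the same two points to [0, 1] and recomputes the Manhattan distance. It assumes the nominal ranges quoted in the text (Age 10-60, Salary 1-40 Lacs, BHK 1-5) as each feature's minimum and maximum; in practice a scaler would learn these from the data. The helper `min_max_scale` is a hypothetical name used only for this example.

```python
# Assumed (min, max) per feature, from the ranges quoted in the text:
# Age 10-60, Salary 1-40 Lacs (1 Lac = 100,000), BHK 1-5.
ranges = [(10, 60), (100_000, 4_000_000), (1, 5)]
centroid = [40, 2_200_000, 3]   # centroid of class 1
point = [57, 3_300_000, 2]      # data point to be predicted


def min_max_scale(values, ranges):
    """Map each value to [0, 1] using its feature's (min, max) range."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(values, ranges)]


scaled_centroid = min_max_scale(centroid, ranges)
scaled_point = min_max_scale(point, ranges)

# Manhattan distance on the scaled features.
contributions = [abs(c - p) for c, p in zip(scaled_centroid, scaled_point)]
distance = sum(contributions)

print("Scaled centroid:", scaled_centroid)
print("Scaled point:   ", scaled_point)
print("Per-feature contributions:", contributions)
print("Manhattan distance:", distance)
```

After scaling, every per-feature difference lies between 0 and 1, so Age, Salary, and BHK contribute amounts of comparable size to the distance.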

In [2]:
# perform Feature Scaling

In [37]:
#importing libraries
import pandas as pd
from sklearn import preprocessing


In [44]:
# load the sample data
url = r'https://github.com/vimalstat/ML/blob/main/Apartment_Purchase.csv?raw=true'
df = pd.read_csv(url)
print(df.head())

# count the missing values in each column
print(df.isnull().sum())

# fill missing Age and Salary values with the column mean, then cast to int
df['Age'] = df['Age'].fillna(df['Age'].mean()).astype('int')
df['Salary'] = df['Salary'].fillna(df['Salary'].mean()).astype('int')
df.head(20)


   BHK Apartment   Age   Salary Purchased
0              2  44.0  72000.0        No
1              2  27.0  48000.0       Yes
2              3  30.0  54000.0        No
3              3  38.0  61000.0        No
4              3  40.0      NaN       Yes
BHK Apartment    0
Age              1
Salary           1
Purchased        0
dtype: int64


   BHK Apartment  Age  Salary Purchased
0              2   44   72000        No
1              2   27   48000       Yes
2              3   30   54000        No
3              3   38   61000        No
4              3   40   63777       Yes
5              3   35   58000       Yes
6              2   38   52000        No
7              3   48   79000       Yes
8              3   50   83000        No
9              3   37   67000       Yes


In [36]:
# the features are the BHK Apartment, Age and Salary columns
x = df.iloc[:, 0:3].values
print("\nOriginal data values : \n", x)


Original data values : 
 [[    2    44 72000]
 [    2    27 48000]
 [    3    30 54000]
 [    3    38 61000]
 [    3    40 63777]
 [    3    35 58000]
 [    2    38 52000]
 [    3    48 79000]
 [    3    50 83000]
 [    3    37 67000]]


In [33]:
# scale the features

# min-max scaling: map each column to the range [0, 1]
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
x_after_min_max_scaler = min_max_scaler.fit_transform(x)
print("\nAfter min max Scaling : \n", x_after_min_max_scaler)

# standardisation: give each column zero mean and unit variance
standard_scaler = preprocessing.StandardScaler()
x_after_standardisation = standard_scaler.fit_transform(x)
print("\nAfter Standardisation : \n", x_after_standardisation)


After min max Scaling : 
 [[0.         0.73913043 0.68571429]
 [0.         0.         0.        ]
 [1.         0.13043478 0.17142857]
 [1.         0.47826087 0.37142857]
 [1.         0.56521739 0.45077143]
 [1.         0.34782609 0.28571429]
 [0.         0.47826087 0.11428571]
 [1.         0.91304348 0.88571429]
 [1.         1.         1.        ]
 [1.         0.43478261 0.54285714]]

After Standardisation : 
 [[-1.52752523e+00  7.69734393e-01  7.49480344e-01]
 [-1.52752523e+00 -1.69922498e+00 -1.43817132e+00]
 [ 6.54653671e-01 -1.26352627e+00 -8.91258402e-01]
 [ 6.54653671e-01 -1.01663033e-01 -2.53193334e-01]
 [ 6.54653671e-01  1.88802776e-01 -6.38065068e-05]
 [ 6.54653671e-01 -5.37361746e-01 -5.26649792e-01]
 [-1.52752523e+00 -1.01663033e-01 -1.07356271e+00]
 [ 6.54653671e-01  1.35066601e+00  1.38754541e+00]
 [ 6.54653671e-01  1.64113182e+00  1.75215402e+00]
 [ 6.54653671e-01 -2.46895937e-01  2.93719581e-01]]
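
To make the two transformations concrete, the sketch below recomputes them by hand in plain Python. Min-max scaling maps each value v in a column to (v - min) / (max - min), and standardisation maps it to (v - mean) / std. The feature matrix is copied from the "Original data values" output above. The helper names `min_max` and `standardise` are hypothetical, and the population standard deviation (`pstdev`) is used because scikit-learn documents `StandardScaler` as using the biased estimator.

```python
from statistics import fmean, pstdev

# Feature matrix copied from the "Original data values" output above
# (columns: BHK Apartment, Age, Salary).
x = [
    [2, 44, 72000],
    [2, 27, 48000],
    [3, 30, 54000],
    [3, 38, 61000],
    [3, 40, 63777],
    [3, 35, 58000],
    [2, 38, 52000],
    [3, 48, 79000],
    [3, 50, 83000],
    [3, 37, 67000],
]

columns = list(zip(*x))  # one tuple of values per feature


def min_max(column):
    """Scale a column to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def standardise(column):
    """Scale a column to zero mean and unit variance: (v - mean) / std."""
    mean, std = fmean(column), pstdev(column)  # population standard deviation
    return [(v - mean) / std for v in column]


# Scale column by column, then transpose back to one row per person.
min_max_rows = [list(row) for row in zip(*(min_max(c) for c in columns))]
standardised_rows = [list(row) for row in zip(*(standardise(c) for c in columns))]

print("First row after min-max scaling:", min_max_rows[0])
print("First row after standardisation:", standardised_rows[0])
```

The printed rows can be compared against the first rows of the two scaler outputs above.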
