
# Mean Normalization
In machine learning we use large amounts of data to train our models. Some machine learning algorithms may require normalized data to work correctly. The idea of normalization, also known as feature scaling, is to put all the data on a similar scale, i.e., to make all features take on a similar range of values. For example, we might have a dataset with values between 0 and 5,000. Normalizing it can bring those values into a small common range, such as 0 to 1 (min-max scaling) or, with the mean normalization formula used in this notebook, values centered on 0 with a standard deviation of 1.
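
The 0 to 1 range in the example above is what min-max scaling produces. As a minimal sketch (the helper names `min_max_scale` and `standardize` and the 3 x 2 array are hypothetical, chosen for illustration), the two column-wise rescalings can be compared side by side:

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1]; assumes no column is constant."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

def standardize(X):
    """Shift each column to mean 0 and scale it to standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[0.0, 10.0],
              [2500.0, 20.0],
              [5000.0, 30.0]])

print(min_max_scale(X))
print(standardize(X))
```

Under min-max scaling both columns map to 0, 0.5 and 1, even though their original ranges differ by a factor of 250. Standardization maps both columns to about -1.22, 0 and 1.22, which is not confined to [0, 1] but has mean 0 and standard deviation 1.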

In [2]:
```python
# import NumPy into Python
import numpy as np

# Create a 1000 x 20 ndarray with random integers in the half-open interval [0, 5001).
X = np.random.randint(0, 5001, size=(1000, 20))

# print the shape of X
print("Shape of X is: ", X.shape)
```

```
Shape of X is:  (1000, 20)
```
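
Because `np.random.randint` draws new numbers on every run, the statistics printed later in this notebook differ slightly from run to run. A minimal sketch, assuming reproducible results are wanted, uses NumPy's `Generator` API with a fixed seed (the seed value 42 is arbitrary):

```python
import numpy as np

# A seeded Generator produces the same "random" matrix on every run
rng = np.random.default_rng(42)

# integers(low, high) excludes high by default, so values lie in [0, 5000]
X = rng.integers(0, 5001, size=(1000, 20))

print("Shape of X is: ", X.shape)
```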


In [3]:
```python
# Average of the values in each column of X
ave_cols = np.mean(X, axis=0)

# Standard Deviation of the values in each column of X
std_cols = np.std(X, axis=0)
```

In [4]:
```python
# Print the shape of ave_cols
print(ave_cols.shape)

# Print the shape of std_cols
print(std_cols.shape)
```

```
(20,)
(20,)
```
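
Passing `axis=0` collapses the row dimension, so each of the 20 entries is the statistic of one column. A small sketch on a 3 x 2 array (values chosen for illustration) checks this against a plain Python loop; note that `np.std` uses the population formula (`ddof=0`) by default:

```python
import numpy as np

A = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

col_means = np.mean(A, axis=0)   # one mean per column, shape (2,)
col_stds = np.std(A, axis=0)     # population standard deviation per column

# The same means computed column by column with plain Python
loop_means = [sum(A[:, j]) / A.shape[0] for j in range(A.shape[1])]

print(col_means, loop_means)
print(col_stds)
```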


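**We will perform mean normalization using the following equation:**

$$\text{Norm\_Col}_i = \frac{\text{Col}_i - \text{Average}_i}{\text{Standard Deviation}_i}$$

where the average and standard deviation are those of column *i* (`ave_cols[i]` and `std_cols[i]`). Because it divides by the standard deviation, this form is also known as standardization or z-score normalization.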

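In [5]:
```python
# Mean normalize X. ave_cols and std_cols have shape (20,), so NumPy
# broadcasts them across all 1000 rows of X
X_norm = (X - ave_cols) / std_cols
```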
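
The subtraction `X - ave_cols` works because NumPy broadcasts the shape-(20,) vector across every row of the (1000, 20) matrix. A minimal sketch on a small random matrix (the size 6 x 3 and the seed are arbitrary) confirms that broadcasting gives the same result as an explicit per-column loop, and that every column of the result has mean 0 and standard deviation 1:

```python
import numpy as np

rng = np.random.default_rng(0)
X_small = rng.integers(0, 5001, size=(6, 3)).astype(float)

mu = X_small.mean(axis=0)      # shape (3,)
sigma = X_small.std(axis=0)    # shape (3,)

# Broadcasting: (6, 3) - (3,) -> (6, 3); mu and sigma are reused for every row
Z = (X_small - mu) / sigma

# Equivalent explicit loop over columns
Z_loop = np.empty_like(X_small)
for j in range(X_small.shape[1]):
    Z_loop[:, j] = (X_small[:, j] - mu[j]) / sigma[j]

print(Z.mean(axis=0))  # close to 0 for every column
print(Z.std(axis=0))   # close to 1 for every column
```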

In [6]:
```python
# Print the average of all the values of X_norm
# (it should be 0, up to floating-point rounding error)
print("The average of all the values of X_norm is: ")
print(X_norm.mean())

# Print the average of the minimum value in each column of X_norm
print("The average of minimum values in each column of X_norm is: ")
print(X_norm.min(axis=0).mean())

# Print the average of the maximum value in each column of X_norm
print("The average of maximum values in each column of X_norm is: ")
print(X_norm.max(axis=0).mean())
```

```
The average of all the values of X_norm is: 
-1.3322676295501878e-17
The average of minimum values in each column of X_norm is: 
-1.7332469756282962
The average of maximum values in each column of X_norm is: 
1.7378590284024242
```
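
The averages of the column minima and maxima, about -1.73 and 1.73, are what one would expect for uniformly distributed data. Treating each column as approximately continuous uniform on $[a, b]$ (an approximation to the integers drawn by `np.random.randint`), its mean and standard deviation are

$$\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{\sqrt{12}},$$

so values at the two ends of the range normalize to

$$\frac{a-\mu}{\sigma} = \frac{-(b-a)/2}{(b-a)/\sqrt{12}} = -\frac{\sqrt{12}}{2} = -\sqrt{3} \approx -1.732, \qquad \frac{b-\mu}{\sigma} = \sqrt{3} \approx 1.732.$$

With 1000 samples per column, the sample minimum and maximum land close to $a$ and $b$, which matches the printed averages.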


# Data Separation
After the data has been mean normalized, it is customary in machine learning to split the dataset into three sets:

*   A Training Set
*   A Cross Validation Set
*   A Test Set


The dataset is usually divided such that the Training Set contains 60% of the data, the Cross Validation Set contains 20% of the data, and the Test Set contains 20% of the data.

We now separate X_norm into a Training Set, a Cross Validation Set, and a Test Set. Each set contains rows of X_norm chosen at random, and no row is picked twice. Because the three sets together use every row index exactly once, every row of X_norm ends up in exactly one of the three new sets.

In [7]:
```python
# Create a random permutation of the integers 0 to 4 (the order differs on every run)
np.random.permutation(5)
```

```
array([1, 2, 4, 0, 3])
```
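
Because `np.random.permutation(n)` returns each integer from 0 to n - 1 exactly once, indexing with it can neither pick the same row twice nor skip a row. A quick check of that property, using n = 1000 to match the number of rows in `X_norm`:

```python
import numpy as np

perm = np.random.permutation(1000)

# Sorting a permutation recovers 0, 1, ..., 999, each exactly once
is_complete = np.array_equal(np.sort(perm), np.arange(1000))

# No index appears twice
has_no_duplicates = len(np.unique(perm)) == len(perm)

print(is_complete, has_no_duplicates)  # True True
```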

In [8]:
```python
# Create a rank 1 ndarray that contains a random permutation of the row indices of `X_norm`
row_indices = np.random.permutation(X_norm.shape[0])
```

In [9]:
```python
# Number of rows that make up 60% of the data. X_norm has 1000 rows, so this is 600
sixty = int(len(X_norm) * 0.6)

# Number of rows that make up 80% of the data (800)
eighty = int(len(X_norm) * 0.8)

# Create a Training Set
# row_indices[:sixty] gives the first 600 values of the permutation, e.g., [93 255 976 505 281 292 977,.....]
# Those 600 indices are in random order, because row_indices is a random permutation of the row indices.
# X_norm[row_indices[:sixty], :] then extracts all the rows at those 600 indices
X_train = X_norm[row_indices[:sixty], :]

# Create a Cross Validation Set from the next 20% of the indices
X_crossVal = X_norm[row_indices[sixty:eighty], :]

# Create a Test Set from the remaining 20% of the indices
X_test = X_norm[row_indices[eighty:], :]
```
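
This notebook normalizes before splitting, so the cross validation and test rows also contribute to `ave_cols` and `std_cols`. A common variant, sketched below under the assumption that the raw matrix is split first, computes the statistics from the training rows only and reuses them for the other two sets, so the held-out rows do not influence the scaling. The names `X_raw`, `mu_train`, and `sigma_train` are illustrative, not from the original notebook:

```python
import numpy as np

rng = np.random.default_rng(1)
X_raw = rng.integers(0, 5001, size=(1000, 20)).astype(float)

# Split the raw data first, using a random permutation of row indices
row_indices = rng.permutation(X_raw.shape[0])
sixty = int(len(X_raw) * 0.6)
eighty = int(len(X_raw) * 0.8)
train_raw = X_raw[row_indices[:sixty], :]
crossVal_raw = X_raw[row_indices[sixty:eighty], :]
test_raw = X_raw[row_indices[eighty:], :]

# Fit the normalization statistics on the training rows only
mu_train = train_raw.mean(axis=0)
sigma_train = train_raw.std(axis=0)

# Apply the same training statistics to every set
X_train = (train_raw - mu_train) / sigma_train
X_crossVal = (crossVal_raw - mu_train) / sigma_train
X_test = (test_raw - mu_train) / sigma_train

print(X_train.shape, X_crossVal.shape, X_test.shape)
```

Only the training set is guaranteed to have column means of exactly 0 and standard deviations of exactly 1; the other two sets come out close to those values, but not exactly.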

In [10]:
```python
# Print the shape of X_train
print(X_train.shape)

# Print the shape of X_crossVal
print(X_crossVal.shape)

# Print the shape of X_test
print(X_test.shape)
```

```
(600, 20)
(200, 20)
(200, 20)
```
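
The same three-way split can also be written with `np.split`, which cuts an array at a list of indices along axis 0. A minimal sketch, using a random stand-in for `X_norm` so that the block runs on its own:

```python
import numpy as np

rng = np.random.default_rng(2)
X_norm = rng.standard_normal((1000, 20))   # stand-in for the normalized data

# Shuffle the rows once, then cut at the 60% and 80% marks
shuffled = X_norm[rng.permutation(X_norm.shape[0])]
n = len(shuffled)
X_train, X_crossVal, X_test = np.split(shuffled, [int(n * 0.6), int(n * 0.8)])

print(X_train.shape, X_crossVal.shape, X_test.shape)  # (600, 20) (200, 20) (200, 20)
```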
