# Introduction

Principal component analysis (PCA) is one of the most commonly used dimensionality reduction techniques in the industry. By converting large data sets into smaller ones containing **fewer variables**, it helps with **improving model performance**, visualising complex data sets, and much more.

# In this session
First, to fully appreciate PCA’s usefulness, you will look at a variety of situations, some of which you may have encountered in earlier modules, such as the multicollinearity problem, and see how PCA helps solve them. Then, you will learn the basic definition of PCA, followed by a brief introduction to the linear algebra topics that are crucial for understanding PCA and its building blocks. After this, you will look at two key ideas behind how PCA works: change of basis and variance as information.

# Module Overview
The module is divided into the following main sections:

- Fundamentals of PCA: Here, you will see why PCA is worth learning and go through its essential building blocks before studying the process itself. This section is divided into two sub-sessions:
  - Fundamentals of PCA I
  - Fundamentals of PCA II

- PCA Using Python: Here, you will implement PCA using Python and explore its various applications.

# Prerequisites
This module requires prior knowledge of certain linear algebra concepts, such as matrices and vectors. You will get to know these prerequisites, along with a brief overview of each, as you go through the sessions. You can also learn them from the optional course titled 'Resources Part -1', which contains useful additional content and questions to improve your understanding of these concepts. Here is a checklist of the concepts that you need to know to understand this module:

- Vectors and their properties
- Vector operations (addition, scaling, linear combination and dot product)
- Matrices 
- Matrix operations (matrix multiplication and matrix inverses)

# Problem Statement

Here are a couple of situations where having a lot of features posed problems:

- **The predictive model setup:** Having a lot of correlated features leads to the multicollinearity problem. Iteratively removing features is time-consuming and also leads to some information loss.

- **Data visualisation:** It is not possible to visualise more than two variables at the same time using any 2-D plot. Therefore, finding relationships between the observations in a data set having several variables through visualisation is quite difficult. 
 

PCA helps solve both of the problems mentioned above, as you'll see shortly.

# Application

1. **Dimensionality Reduction** - Having a lot of correlated features leads to the multicollinearity problem; reducing them to a few uncorrelated features addresses it.
2. **Data Visualisation** - Representing many features in a 2-D or 3-D plot is nearly impossible. Reducing, say, 20 features to 2 makes it easy to draw a scatter plot or another visualisation, and this is where PCA comes in handy.
3. **EDA** - Finding correlations among many features is very difficult, and dimensionality reduction makes this exploration easier.
4. **Building Predictive Models** - When building predictive models, you do not want features that are highly correlated. PCA can reduce, say, 200 features to 20 uncorrelated features, leading to:
   - No multicollinearity (a stable and robust model)
   - Faster models
5. **Finding Latent Themes** - PCA can uncover hidden patterns in the data, which is also useful in clustering. For example, it can help categorise movies by genre (comedy, horror).
6. **Noise Reduction** - Discarding the components that capture little variance can filter out noise in the data.

## Dimensionality Reduction: MNIST Example

The MNIST data set is used for recognising handwritten digits.

The inputs are 28 x 28 pixel greyscale images, as shown below:

![1.png](attachment:8dd84402-9a38-4595-a0b9-4e0da925e9ca.png)

If you flatten each image shown above, you get 28 x 28 = 784 pixel values, i.e., 784 variables.
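To make the flattening step concrete, here is a minimal NumPy sketch. It does not load MNIST: an array of zeros stands in for a real 28 x 28 image, and the names `image`, `flat`, `images` and `X` are illustrative only.

```python
import numpy as np

# Stand-in for one MNIST image: a 28 x 28 grid of greyscale pixel values
image = np.zeros((28, 28))

# Flatten the grid row by row into a single vector of 784 pixel values
flat = image.reshape(-1)
print(flat.shape)  # (784,)

# A data set of n images becomes an n x 784 matrix: one row per image
images = np.zeros((5, 28, 28))  # 5 hypothetical images
X = images.reshape(len(images), -1)
print(X.shape)  # (5, 784)
```

Each row of `X` is then one observation with 784 variables, which is the form PCA works on.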

![2.png](attachment:8a37523d-0cd2-413a-a484-cd11e3ee6589.png)

This gives 784 dimensions, which are very difficult to visualise, and this is where PCA comes in. In the image below, the same data has been reduced to just 2 dimensions and plotted as a scatter plot. The colour coding represents the different digits, and you can see that similar items are brought close together.

![3.png](attachment:0fa28f3f-58dc-4b57-8f88-d0270182c981.png)

**Dimensionality Reduction**

![4.png](attachment:3c09da60-135f-4e06-beed-8ee398acd821.png)

In the image above, a data set with N dimensions has been approximated by a smaller data set with 'k' dimensions. In this module, you will learn how this is done. This simple manipulation helps in several ways:

- For data visualisation and EDA
- For creating uncorrelated features that can be input to a prediction model: with a smaller number of uncorrelated features, the modelling process is faster and more stable.
- Finding latent themes in the data: If you have a data set containing the ratings given to different movies by Netflix users, PCA would be able to find latent themes like genre and, consequently, the ratings that users give to a particular genre.
- Noise reduction

# How PCA works?

In simple terms, dimensionality reduction is the exercise of dropping the unnecessary variables, i.e., the ones that add no useful information. Now, this is something that you must have done in the previous modules. In EDA, you dropped columns that had a lot of nulls or duplicate values, and so on. In linear and logistic regression, you dropped columns based on their p-values and VIF scores in the feature elimination step.

 

PCA serves the same goal but works differently: it transforms the data by **creating new features from the old ones**, which makes it easier to decide which features to keep and which to drop.

# Definition 
PCA is a statistical procedure to convert observations of possibly correlated variables to **‘principal components’** such that:

- They are **uncorrelated** with each other.
- They are **linear combinations of the original variables**.
- They help in **capturing maximum information** in the data set.
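The three properties in this definition can be checked numerically. The following is a minimal sketch, assuming NumPy and assuming the principal components are computed from the singular value decomposition (SVD) of the mean-centred data, which is one standard way to obtain them (not necessarily how this module derives them). The data are randomly generated for illustration, with `x2` deliberately built to be strongly correlated with `x1`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 3 variables; x2 is almost a multiple of x1
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Centre each variable so that variance is measured around its mean
Xc = X - X.mean(axis=0)

# Rows of Vt are the principal directions: one weight per original variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Each principal component is a linear combination of the original variables
scores = Xc @ Vt.T

# The components are uncorrelated: off-diagonal covariances are ~0
cov = np.cov(scores, rowvar=False)
print(np.round(cov, 6))

# Variance captured by each component, largest first
print(np.diag(cov))

# Keep only the first k = 2 components (3 dimensions reduced to 2)
reduced = scores[:, :2]
print(reduced.shape)  # (200, 2)
```

The variances come out in decreasing order, so the first component captures the largest share of the variance, and keeping the first k columns of `scores` is one way to carry out the N-to-k reduction described earlier.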
 

This definition introduces some new terms, such as ‘linear combinations’ and ‘capturing maximum information’, which require some knowledge of linear algebra concepts as well as the other building blocks of PCA. The next session begins with a very basic idea: the vectorial representation of data.

# Fundamentals of PCA - I

Here we'll learn about two of the most important building blocks of PCA - **basis** and **change of basis**. But before that, we'll go through a brief refresher on basic linear algebra concepts. 

- Vectors
- Matrices and their Inverse
- Basis vectors
- Change of Basis
- PCA and Change of Basis
## Vectorial Representation of Data
Here's a handy checklist of what you're going to learn in this segment:

- Vectors and their properties
- Vector operations (addition, scaling, linear combination and dot product)
- Matrices 
- Matrix operations (matrix multiplication and matrix inverses)
  
<p>Consider the following data set containing the height and weight of five patients.</p>
<img data-height="366" data-width="655" height="335.2671755725191" src="https://images.upgrad.com/79ddc89e-f599-43be-822e-81cba3a63ec1-PCA image 1.JPG" width="600">

<p>The height and weight information can be represented in the form of a matrix as follows:</p>

<img data-height="258" data-width="269" height="258" src="https://images.upgrad.com/a3389241-1f71-4a39-82de-c5912558af4b-1.png" width="269">

with each row representing a particular patient's data and each column representing one of the original variables. Geometrically, these patients can be represented as shown in the following image.

<p style="text-align: center;"><img data-height="540" data-width="969" height="334.3653250773994" src="https://images.upgrad.com/57483c16-3bec-4a6a-8272-d6c6a41ba691-2.png" width="600"></p>

### Vector Representation

<p>The vector associated with the first patient is given by the values (165, 55). This vector can also be written in the following ways:</p>

<ol><li>As a column, with the values stacked along the rows. This is known as the column-vector representation.
<img alt="Equation" data-latex="\begin{bmatrix} 165\\ 55 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%20165%5C%5C%2055%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></li><li>As the transpose of a row vector. It is the same column vector, written more compactly as the transpose of a row vector.<br><img alt="Equation" data-latex="\begin{bmatrix} 165 &amp; 55 \end{bmatrix}^{T}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%20165%20%26%2055%20%5Cend%7Bbmatrix%7D%5E%7BT%7D" style="vertical-align: middle;display: inline;"><br>[Note: You must have learnt about the transpose in your Python for DS module. If you need to brush up on this topic, take a look at this<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html" target="_blank"> link</a>.]</li><li>In terms of the basis vectors<br>You'll learn about this in detail in later segments. Briefly, the vector (165, 55) can also be written as 165<strong>i</strong> + 55<strong>j</strong>, where <strong>i</strong> and <strong>j</strong> are the unit vectors along the X and Y axes, respectively. These are the basis vectors used to represent all vectors in 2-D space.</li></ol>
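The three representations above can be compared in a short NumPy sketch. The variable names `col`, `row`, `i`, `j` and `v` are hypothetical; NumPy is used only because the notebook already relies on it.

```python
import numpy as np

# Column-vector representation of the first patient: shape (2, 1)
col = np.array([[165], [55]])

# The same vector written as the transpose of a row vector
row = np.array([[165, 55]])
print(np.array_equal(col, row.T))  # True

# Basis-vector representation: 165*i + 55*j, with i and j the unit vectors along X and Y
i = np.array([1, 0])
j = np.array([0, 1])
v = 165 * i + 55 * j
print(v)  # [165  55]
```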

### Vector Representation for n-dimensional data
<p>Each vector contains values&nbsp;representing all the dimensions or variables in the data. For example, if an age variable were also included in the above dataset and the first patient were 22 years old, the vector representing him would be written as (165, 55, 22). Likewise, if the dataset had 10 variables, there would be 10 dimensions in the vector representation, and the same idea extends to n dimensions or variables.</p><p>These vectors have certain properties and operations associated with them, which you will learn in the next segment.</p>

## Vector Operations
<ol><li><strong>Vectors have a direction and magnitude</strong><br>Each vector has a direction and a magnitude associated with it. The direction is given by an arrow starting from the origin and&nbsp;pointing towards the vector's position. The magnitude is the square root of the sum of the squares of all the vector's coordinates.<br><br>For example, the direction of the vector (2, 3)&nbsp;is given by the arrow from (0, 0)&nbsp;to (2, 3). Its magnitude is&nbsp;&nbsp;<img alt="Equation" data-latex="\sqrt {2^2 + 3^2} = \sqrt {13}" src="https://latex.upgrad.com/render?formula=%5Csqrt%20%7B2%5E2%20%2B%203%5E2%7D%20%3D%20%5Csqrt%20%7B13%7D" style="vertical-align: middle;display: inline;">.<br><br>Similarly, for the 3-D vector (2, -3, 4), the direction is given by the arrow from (0, 0, 0)&nbsp;to (2, -3, 4). As in the 2-D case, its magnitude is&nbsp;&nbsp;<img alt="Equation" data-latex="\sqrt {(2)^2 + (-3)^2 + (4)^2} = \sqrt{29}" src="https://latex.upgrad.com/render?formula=%5Csqrt%20%7B%282%29%5E2%20%2B%20%28-3%29%5E2%20%2B%20%284%29%5E2%7D%20%3D%20%5Csqrt%7B29%7D" style="vertical-align: middle;display: inline;">.<br>&nbsp;</li><li><strong>Vector Addition</strong><br>When you add two or more vectors, you add their corresponding values element-wise: 
the first elements of the vectors are added together, then the second elements, and so on.<br>For example, for the two vectors&nbsp;<br><img alt="Equation" data-latex="V_1 = (2, 3)" src="https://latex.upgrad.com/render?formula=V_1%20%3D%20%282%2C%203%29" style="vertical-align: middle;display: inline;">&nbsp;and&nbsp;<img alt="Equation" data-latex="V_2 = (1,2)" src="https://latex.upgrad.com/render?formula=V_2%20%3D%20%281%2C2%29" style="vertical-align: middle;display: inline;">&nbsp;we get&nbsp;<br><img alt="Equation" data-latex="V_1+V_2 = (2+1 , 3+2)=(3,5)" src="https://latex.upgrad.com/render?formula=V_1%2BV_2%20%3D%20%282%2B1%20%2C%203%2B2%29%3D%283%2C5%29" style="vertical-align: middle;display: inline;">.<br><br>In the <strong>i, j</strong> notation introduced earlier, the same addition can be written as&nbsp;<img alt="Equation" data-latex="V_1 + V_2 = (2i+3j)+(i+2j) = (2+1)i + (3+2)j=3i+5j" src="https://latex.upgrad.com/render?formula=V_1%20%2B%20V_2%20%3D%20%282i%2B3j%29%2B%28i%2B2j%29%20%3D%20%282%2B1%29i%20%2B%20%283%2B2%29j%3D3i%2B5j" style="vertical-align: middle;display: inline;"><br>This idea extends to any number of dimensions.&nbsp;<br>&nbsp;</li><li><strong>Scalar Multiplication</strong><br>When you multiply a vector by a scalar (a real number), the magnitude of the vector is scaled by the absolute value of that scalar. The direction stays the same if the scalar is positive and reverses if it is negative.</li></ol>
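The operations above can be reproduced with NumPy arrays, using the same example vectors as the text. This is a minimal sketch; `np.linalg.norm` computes the square root of the sum of squared coordinates, which matches the definition of magnitude given above.

```python
import numpy as np

# Magnitude: square root of the sum of squared coordinates
v = np.array([2, 3])
print(np.linalg.norm(v), np.sqrt(13))  # both equal sqrt(13), about 3.606

w = np.array([2, -3, 4])
print(np.linalg.norm(w), np.sqrt(29))  # both equal sqrt(29), about 5.385

# Addition: corresponding elements are added
V1 = np.array([2, 3])
V2 = np.array([1, 2])
print(V1 + V2)  # [3 5]

# Scalar multiplication: a positive scalar keeps the direction, a negative one reverses it
print(2 * V1)   # [4 6]
print(-1 * V1)  # [-2 -3]

# The magnitude scales by the absolute value of the scalar
print(np.isclose(np.linalg.norm(-3 * V1), 3 * np.linalg.norm(V1)))  # True
```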


<img src="attachment:38a9dfe6-67bb-4a2a-b723-1a76d4c339d0.png" width="600px">

## Matrix Multiplication

The process of matrix multiplication is quite simple: each element of the result is obtained by multiplying a row of the first matrix element-wise with a column of the second matrix and then adding up the products. The one key rule is that when you multiply two matrices, say A and B, the number of columns of A must equal the number of rows of B. Take a look at the following image to see what this looks like visually.

![7.png](attachment:2b81490c-2c8c-4b59-89d0-1c6dfc222032.png)![8.png](attachment:e365eb25-0c6e-4164-a5ed-b7fbd1a10221.png)
 

In this example, the number of columns in the first matrix and the number of rows in the second matrix are both equal to 4, so the multiplication is possible. The resultant matrix has a shape of 5 x 6, i.e., the number of rows of the first matrix by the number of columns of the second.
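The shape rule can be checked directly in NumPy. In this minimal sketch, matrices filled with ones are placeholders for the 5 x 4 and 4 x 6 matrices in the example (their actual values do not matter for the shape check).

```python
import numpy as np

# Placeholder matrices with the example's shapes: (5 x 4) times (4 x 6)
A = np.ones((5, 4))
B = np.ones((4, 6))

# Inner dimensions match (4 == 4), so the product exists;
# its shape is (rows of A) x (columns of B)
C = A @ B
print(C.shape)  # (5, 6)

# Reversing the order gives (4 x 6) times (5 x 4): inner dimensions 6 and 5 differ
try:
    B @ A
except ValueError as err:
    print("Cannot multiply:", err)
```

NumPy raises a `ValueError` for the mismatched order, which is also a reminder that matrix multiplication is not commutative.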

 

The element-wise multiplication followed by addition is also straightforward, as the following example shows.


![8.png](attachment:40a29a2e-c043-4cb7-8081-165f7b2f07d5.png)
 

Matrix Multiplication example: 

![6.png](attachment:94bf9ea6-ca5a-4d3e-a41f-8c02d95f9833.png)
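The row-by-column rule can be spelled out with plain Python loops, which makes the "multiply element-wise, then add" step explicit. This is a teaching sketch only: `matmul_by_hand` is a hypothetical helper, and the 2 x 2 matrices are made-up values, not the ones in the images above. The result is cross-checked against NumPy's `@` operator.

```python
import numpy as np

def matmul_by_hand(A, B):
    """Multiply two matrices given as lists of rows, using the row-by-column rule."""
    n_rows, n_inner = len(A), len(A[0])
    if n_inner != len(B):
        raise ValueError("columns of A must equal rows of B")
    n_cols = len(B[0])
    C = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Multiply row r of A and column c of B element-wise, then add the products
            C[r][c] = sum(A[r][k] * B[k][c] for k in range(n_inner))
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_by_hand(A, B))  # [[19, 22], [43, 50]]

# Cross-check against NumPy's built-in matrix product
print(np.array_equal(np.array(matmul_by_hand(A, B)), np.array(A) @ np.array(B)))  # True
```

For instance, the top-left entry is 1 x 5 + 2 x 7 = 19: the first row of A times the first column of B.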

### Python Example

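```python
import numpy as np

# A is a 2 x 2 matrix: pass a nested list, one inner list per row
A = np.array([[2, 3], [4, 5]])
# B is a 1-D array of length 2, treated here as a column vector
B = np.array([1, 2])

# '@' performs matrix multiplication: each row of A times B, summed
C = A @ B
print(C)  # [ 8 14]
```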