## __Applying Principal Component Analysis (PCA)__ ##
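This walkthrough applies PCA to the breast cancer data set from sklearn: standardize the features, project them onto principal components, and inspect how much variance each component explains.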

## Step 1: Import the Necessary Libraries and Load the Data Set

- Import the **matplotlib.pyplot, pandas,** and **NumPy** libraries, plus the data set loader from **sklearn**
- Load the breast cancer data set from sklearn


In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline

In [None]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

In [None]:
print(cancer['DESCR'])

__Observation:__
- `cancer['DESCR']` prints the data set's description: 569 samples, 30 numeric features, and a binary target (malignant or benign).

## Step 2: Create a DataFrame

- Create a DataFrame from the cancer data set


In [None]:
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])

In [None]:
df.head()

__Observation:__
- The DataFrame holds the feature values, one row per sample, with the feature names as column headers.
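
As a quick sanity check on the DataFrame, the sketch below reloads the data set and confirms its dimensions and class balance. The names `n_samples`, `n_features`, and `class_counts` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])

# Rows are samples, columns are features
n_samples, n_features = df.shape

# Count samples per class; target_names maps each integer label to a class name
labels, counts = np.unique(cancer['target'], return_counts=True)
class_counts = {
    str(cancer['target_names'][label]): int(count)
    for label, count in zip(labels, counts)
}

print(n_samples, n_features)
print(class_counts)
```

The target is kept separate from `df` on purpose: PCA is unsupervised, so only the feature columns are scaled and projected, and the labels are used later just to color the plot.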


## Step 3: Pre-process the Data

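- Split the data into training and testing sets
- Fit a StandardScaler on the training set only, then transform both sets, so that no test statistics leak into the scaling
- Fit a second StandardScaler on the full DataFrame and transform it; the PCA steps below use this fully scaled data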


In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

In [None]:
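# Hold out 20% of the rows; random_state makes the split reproducible
X_train, X_test = train_test_split(df, test_size=0.2, random_state=42)
# Fit the scaler on the training set only, then apply it to both sets
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)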

In [None]:
# Scale the full data set; the PCA cells below work on this version
scaler = StandardScaler()
scaler.fit(df)

In [None]:
scaled_data = scaler.transform(df)
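
PCA is sensitive to feature scale, which is why the data is standardized first. The sketch below is a minimal check, assuming the same data set, that every scaled column ends up with mean 0 and standard deviation 1, and that the result matches the formula z = (x - mean) / std applied by hand. The variable names `col_means`, `col_stds`, and `manual` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
scaled_data = StandardScaler().fit_transform(df)

# Per-column statistics after scaling
col_means = scaled_data.mean(axis=0)
col_stds = scaled_data.std(axis=0)  # population std (ddof=0), which StandardScaler uses

# The same transformation written out by hand
values = df.to_numpy()
manual = (values - values.mean(axis=0)) / values.std(axis=0)

print(np.allclose(col_means, 0.0, atol=1e-8))
print(np.allclose(col_stds, 1.0))
print(np.allclose(manual, scaled_data))
```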

## Step 4: Apply PCA

- Import PCA from sklearn.decomposition
- Fit PCA with all components on the scaled data and transform it
- Inspect the explained variance ratio and its cumulative sum
- Refit PCA with four components and transform the scaled data
- Check the shape of the transformed data


In [None]:
from sklearn.decomposition import PCA

In [None]:
pca = PCA()

In [None]:
pca.fit(scaled_data)

In [None]:
X_new = pca.transform(scaled_data)

In [None]:
scaled_data[:5]

In [None]:
X_new[:5]

In [None]:
pca.explained_variance_ratio_

In [None]:
pca.explained_variance_ratio_.cumsum()
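
The cumulative sum above is what you would typically use to decide how many components to keep. As a hedged sketch, the code below picks the smallest number of components that reaches a variance threshold. The 95% threshold is an assumption chosen for illustration, not a value taken from this notebook. It also shows that passing a float between 0 and 1 as `n_components` asks sklearn to make the same choice.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scaled_data = StandardScaler().fit_transform(load_breast_cancer()['data'])

threshold = 0.95  # assumed target share of variance, for illustration only

# Fit with all components and accumulate the explained variance ratios
pca_full = PCA().fit(scaled_data)
cumulative = pca_full.explained_variance_ratio_.cumsum()

# Index of the first cumulative value that reaches the threshold, plus one
# to turn a zero-based index into a component count
k = int(np.searchsorted(cumulative, threshold) + 1)

# A float n_components lets PCA select the count for that variance share
pca_auto = PCA(n_components=threshold).fit(scaled_data)

print(k, pca_auto.n_components_)
```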

In [None]:
scaled_data.shape

In [None]:
pca = PCA(n_components=4)
pca.fit(scaled_data)
X_new = pca.transform(scaled_data)

In [None]:
X_new.shape

In [None]:
X_new[:5]

## Step 5: Visualize the PCA Results

- Create a scatter plot of the first two principal components, colored by the target variable

In [None]:
plt.figure(figsize=(8, 6))
# Plot the first component against the second, colored by class label
plt.scatter(X_new[:, 0], X_new[:, 1], c=cancer['target'], cmap='plasma')
plt.xlabel('First principal component')
plt.ylabel('Second principal component')

## Step 6: Analyze the PCA Components

- Display the PCA component matrix (one row per component, one column per original feature)
- Display the explained variance of each component


In [None]:
pca.components_
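
The raw `components_` array is hard to read on its own. One possible way to interpret it, sketched below, is to wrap it in a DataFrame labeled with the original feature names and look at the largest absolute weights. The labels `PC1` to `PC4` and the names `loadings`, `row_norms`, and `top_pc1` are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
scaled_data = StandardScaler().fit_transform(cancer['data'])
pca = PCA(n_components=4).fit(scaled_data)

# One row per component, one column per original feature
loadings = pd.DataFrame(
    pca.components_,
    columns=cancer['feature_names'],
    index=['PC1', 'PC2', 'PC3', 'PC4'],
)

# Each component is a unit-length direction in feature space
row_norms = np.linalg.norm(pca.components_, axis=1)

# The five features with the largest absolute weight on the first component
top_pc1 = loadings.loc['PC1'].abs().sort_values(ascending=False).head(5)
print(top_pc1)
```

The sign of a component is arbitrary, so absolute values are used when ranking features by their contribution.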

In [None]:
pca.explained_variance_
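
To connect `explained_variance_` back to the underlying math, the sketch below checks, under the same setup as above, that these values match the largest eigenvalues of the sample covariance matrix of the scaled data. The names `cov`, `eigvals`, and `ratio` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scaled_data = StandardScaler().fit_transform(load_breast_cancer()['data'])
pca = PCA(n_components=4).fit(scaled_data)

# Sample covariance matrix of the features (np.cov divides by n - 1)
cov = np.cov(scaled_data, rowvar=False)

# eigvalsh returns eigenvalues in ascending order; reverse to get largest first
eigvals = np.linalg.eigvalsh(cov)[::-1]

# Share of total variance carried by each of the four kept components
ratio = pca.explained_variance_ / eigvals.sum()

print(np.allclose(eigvals[:4], pca.explained_variance_))
print(np.allclose(ratio, pca.explained_variance_ratio_))
```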