# Week-19 In-class Assignment


#### 1. Take one of the supervised learning models you have built recently and apply at least three dimensionality reduction techniques to it (separately). Be sure to create a short summary of each technique you use. Indicate how each changed the model performance. Reference:
https://machinelearningmastery.com/dimensionality-reduction-algorithms-with-python/


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

### Diabetes Dataset with Logistic Regression Model

In [2]:
diabetes_df = pd.read_csv("A:\launch_code_STL\Final_Homework\week-13\diabetes.csv")
diabetes_df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [3]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = diabetes_df.drop('Outcome', axis=1)
y = diabetes_df['Outcome']

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42, stratify=y)

#Standardize
sc= StandardScaler()
X_train_sc=sc.fit_transform(X_train)
X_test_sc=sc.fit_transform(X_test)

In [4]:
#LogisticRegression

clr = LogisticRegression(random_state=42).fit(X_train_sc, y_train)

#predict
y_predicted = clr.predict(X_test_sc)

#accuracy
clr.score(X_test_sc, y_test)

0.7359307359307359

### Singular Value Decomposition

SVD breaks a matrix down to its component parts. For the diabetes dataset, SVD produced an accuracy score of 0.71

In [12]:
from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=6)

X_train_svd = svd.fit_transform(X_train)
X_test_svd = svd.fit_transform(X_test)

clr = LogisticRegression(random_state=42).fit(X_train_svd, y_train)

clr.score(X_test_svd, y_test)

0.70995670995671

#### Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
    
https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/

### Linear Discriminant Analysis

Linear Discriminant Analysis(LDA) is most often used for multi-class classification problems. LDA reduces the number of input variables in the dataset.



In [13]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(n_components=1)

X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.fit_transform(X_test, y_test)

clr = LogisticRegression(random_state=42).fit(X_train_lda, y_train)

clr.score(X_test_lda, y_test)

0.7489177489177489

LDA on the diabetes dataset - Because the diabetes dataset has 2 classes, the maximum number of components to use is 1. We found that LDA gave an accuracy score of 0.75.

#### Resources:
https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html
    
https://machinelearningmastery.com/linear-discriminant-analysis-for-dimensionality-reduction-in-python/

### Principal Component Analysis

PCA is used to reduce the number of features by finding the related components in the dataset and removing the non-essential components. It projects the high dimensional original data into a lower dimensional subspace.

In [14]:
from sklearn.decomposition import PCA

pca = PCA(n_components = 6, random_state=42)

X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.fit_transform(X_test)

clr = LogisticRegression(random_state = 42).fit(X_train_pca, y_train)

clr.score(X_test_pca, y_test)

0.7489177489177489

PCA on the diabetes dataset - A PCA with 6 components gave an accuracy score of 0.75.
For both PCA and LDA we got the same accuracy score, using same random_state but the n_components for 
LDA is 1 whereas for PCA we use 6.

#### Resources:

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html 
    
https://machinelearningmastery.com/calculate-principal-component-analysis-scratch-python/ 
    
https://www.datacamp.com/community/tutorials/principal-component-analysis-in-python

#### 2. Write a function that will indicate if an inputted IPv4 address is accurate or not. IP addresses are valid if they have 4 values between 0 and 255 (inclusive), punctuated by periods. 

#### Input 1:  2.33.245.5
#### Output 1:  True

#### Input 2:  12.345.67.89
#### Output 2:  False

In [15]:
def IPv4_address(address):
    import numbers
    if address.count(".") !=3:
        return False
    elif address == "":
        return False
    else:
        lst_split_address = address.split(".")
        for k in lst_split_address:
            if k.isnumeric() == False:  # isnumeric returns True if all characters in a string are numeric, else returns False.
                return False
                break
        slice_split_address = [int(num) for num in lst_split_address[0:]]
        m=0
        for u in slice_split_address:
            if u > 255:
                return False
            else:
                m = m + 1
        if m == 4:
            return True
        else:
            return False


In [16]:
IPv4_address('12.256.67.89')

False

In [17]:
IPv4_address('2.66.245.5')

True