### Basic Supervised Learning Algorithms: Logistic Regression
---


In [21]:
## Import relevant libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Scikit-learn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler


%matplotlib inline

np.random.seed(100)

## Logistic Regression
---
<img src='./images/LRNB1.png' width='90%'>
<img src='./images/LRNB2.png' width='90%'>
<img src='./images/LRNB3.png' width='90%'>
<img src='./images/LRNB4.png' width='90%'>
<img src='./images/LRNB5.png' width='90%'>
<img src='./images/LRNB6.png' width='90%'>
<img src='./images/LRNB7.png' width='90%'>

##### Gradient Descent Algorithm
<div class='eqnbox2' >
$$\large
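\theta^{new} \leftarrow \theta^{old} - \alpha \nabla J = \theta^{old} + \alpha X^T(y-p),
$$
where $\alpha$ is the step size (learning rate), $J(\theta)$ is the negative log-likelihood, and $p$ is the vector of fitted probabilities with entries $p_i = \sigma(x_i^T\theta)$, so that $\nabla J = -X^T(y-p)$.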
</div>
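A minimal NumPy sketch of the batch update above, assuming $J$ is the negative log-likelihood, labels are 0/1, and a column of ones is prepended for the intercept. The helper names `sigmoid` and `fit_logistic_gd`, the step size, the iteration count and the toy data are illustrative choices, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, alpha=0.01, n_iter=5000):
    """Hypothetical helper: batch gradient descent on the negative log-likelihood J."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ theta)          # fitted probabilities p_i
        grad = Xb.T @ (p - y)            # grad J = -X^T (y - p)
        theta = theta - alpha * grad     # theta_new = theta_old + alpha X^T (y - p)
    return theta

# Toy data: two overlapping Gaussian blobs (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(1.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

theta = fit_logistic_gd(X, y)
y_hat = (sigmoid(np.column_stack([np.ones(len(X)), X]) @ theta) >= 0.5).astype(int)
print("theta:", theta)
print("training accuracy:", np.mean(y_hat == y))
```

The gradient here is a plain sum over samples, as in the formula, so the step size has to shrink as the number of samples grows; many implementations average the gradient instead.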

The following approach is from the book by Hastie et al., [The Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/), which is freely available for download.

--- 
<img src='./images/LRNB8.png' width='90%'>
Newton's method can be used to find the stationary point, i.e. the point where the gradient of the log-likelihood is zero.
<img src='./images/LRNB9.png' width='90%'>

<img src='./images/LRNB10.png' width='90%'>
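For reference, the standard Newton-Raphson update for the logistic log-likelihood is $\beta^{new} = \beta^{old} + (X^TWX)^{-1}X^T(y-p)$ with $W = \mathrm{diag}\big(p_i(1-p_i)\big)$, also known as iteratively reweighted least squares (IRLS). Below is a minimal NumPy sketch, assuming 0/1 labels and non-separable data (on perfectly separable data the maximum-likelihood coefficients diverge). The helper `fit_logistic_newton` and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iter=25, tol=1e-10):
    """Hypothetical helper: Newton-Raphson (IRLS) for the logistic log-likelihood."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ beta)                    # fitted probabilities
        w = p * (1.0 - p)                         # diagonal of W
        hessian = Xb.T @ (Xb * w[:, None])        # X^T W X
        score = Xb.T @ (y - p)                    # gradient of the log-likelihood
        step = np.linalg.solve(hessian, score)    # (X^T W X)^{-1} X^T (y - p)
        beta = beta + step
        if np.max(np.abs(step)) < tol:            # stop once the update is negligible
            break
    return beta

# Toy, non-separable data (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(1.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

beta = fit_logistic_newton(X, y)
print("beta:", beta)
```

Newton's method usually needs only a handful of iterations on a problem like this, at the cost of solving a linear system with the Hessian $X^TWX$ at every step.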

In [22]:
from sklearn.datasets import load_iris
# The Iris flower dataset can also be loaded directly from scikit-learn
iris = load_iris()
print (iris.data.shape)

(150, 4)


In [27]:
print(iris.target_names)
print(iris.feature_names)
print(iris.DESCR)

['setosa' 'versicolor' 'virginica']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 3

In [29]:
#Shuffle and split
X_iris_train, X_iris_test, y_iris_train, y_iris_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=100)

In [30]:
scaler = MinMaxScaler()
#scaler = StandardScaler()
# Fit the scaler on the training data only, then apply it to both subsets
scaler.fit(X_iris_train)
Xn_train = scaler.transform(X_iris_train)
Xn_test = scaler.transform(X_iris_test)

In [32]:
# Create a logistic regression classifier.
# C is the inverse of the regularization strength, so C=1000 means very weak regularization.
from sklearn.linear_model import LogisticRegression
lrc = LogisticRegression(C=1000, tol=0.0001, max_iter=2000)

#Train
lrc.fit(Xn_train, y_iris_train)

LogisticRegression(C=1000, max_iter=2000)

In [33]:
print ("Train Accuracy: "+ str(100 * lrc.score(Xn_train, y_iris_train)))

print ("Test Accuracy: "+ str(100 * lrc.score(Xn_test, y_iris_test)))

Train Accuracy: 97.14285714285714
Test Accuracy: 100.0


<div class="alert alert-danger">
Note that accuracy is not always the best criterion for evaluating a classifier. Why? Give an example.
</div>
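One possible answer, sketched with made-up labels: when the classes are heavily imbalanced, a classifier that always predicts the majority class gets high accuracy while never detecting a single member of the minority class. The 95/5 split below is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)
# A useless "classifier" that always predicts the majority class
y_pred = np.zeros_like(y_true)

print("Accuracy:", accuracy_score(y_true, y_pred))              # 0.95
print("Recall on positives:", recall_score(y_true, y_pred))     # 0.0
```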

### Performance Evaluation: Precision, Recall and F1-Score
***
#### Confusion Matrix in Binary Classification
<img src="https://au.mathworks.com/matlabcentral/mlc-downloads/downloads/submissions/60900/versions/13/screenshot.png" width="60%" />

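[Source: MathWorks](https://au.mathworks.com/matlabcentral/mlc-downloads/downloads/submissions/60900/versions/13/screenshot.png)<br>
NOTE: Scikit-learn uses a different layout from the figure above: `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, both sorted by label value, so for 0/1 labels the layout is `[[TN, FP], [FN, TP]]`.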
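A small made-up example can confirm scikit-learn's layout for binary labels; the labels below are invented for illustration, and `ravel()` simply unpacks the 2x2 matrix in row order.

```python
from sklearn.metrics import confusion_matrix

# Made-up binary labels (1 = positive class)
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 2]
#  [1 4]]
# Rows = true labels (0, 1), columns = predicted labels (0, 1)
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```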







### Understanding Precision, Recall and the $F_1$-Score as Evaluation Metrics

**Precision**: $P = \frac{TP}{TP+FP}$, i.e. true positives divided by the number of predicted positives. It is the fraction of positive predictions that are actually correct.

**Recall**: $R = \frac{TP}{TP+FN}$, i.e. true positives divided by the number of actual positives. It is the fraction of actual positives that are correctly predicted.

$$F_1\text{-score} = \frac{2PR}{P+R},$$
the harmonic mean of precision and recall.
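A quick check of the three formulas on made-up labels: the hand-computed values should match scikit-learn's `precision_score`, `recall_score` and `f1_score` for the positive class (label 1).

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up binary labels (1 = positive class)
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1]

# Count the relevant confusion-matrix cells directly
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)                           # 4 / 6
recall = tp / (tp + fn)                              # 4 / 5
f1 = 2 * precision * recall / (precision + recall)   # 8 / 11

print(f"by hand:      P={precision:.3f}, R={recall:.3f}, F1={f1:.3f}")
print(f"scikit-learn: P={precision_score(y_true, y_pred):.3f}, "
      f"R={recall_score(y_true, y_pred):.3f}, F1={f1_score(y_true, y_pred):.3f}")
```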

NOTE: 
>- Recall is also called sensitivity and true positive rate (TPR).

>- False positive rate: $\text{FPR} = \frac{FP}{FP+TN} = 1 - \text{specificity}$, where specificity $= \frac{TN}{TN+FP}$.
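The identities in the note can be checked numerically; the labels below are made up for illustration, and the cells come from `confusion_matrix(...).ravel()`, which for 0/1 labels returns TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Made-up binary labels (1 = positive class)
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)            # recall = sensitivity = true positive rate
specificity = tn / (tn + fp)    # true negative rate
fpr = fp / (fp + tn)            # false positive rate
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}, specificity={specificity:.2f}")
# TPR=0.80, FPR=0.40, specificity=0.60
```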


<div style="display:inline;align:right">
<img src="https://miro.medium.com/max/878/1*Ub0nZTXYT8MxLzrz0P7jPA.png" width="80%" /> 
</div>


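**ROC curves** (receiver operating characteristic curves) plot the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold is varied. [Read this](https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc#:~:text=An%20ROC%20curve%20%28receiver%20operating,False%20Positive%20Rate)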
<img src="./images/RoC.png" width="60%" />

[Image Source](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#/media/File:Roc-draft-xkcd-style.svg)
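A sketch of how an ROC curve and its area under the curve (AUC) can be computed with scikit-learn's `roc_curve` and `roc_auc_score`, using the breast-cancer data examined further below. The pipeline, split and `max_iter` value are illustrative choices; each point on the curve corresponds to one threshold applied to the predicted probability of class 1 (benign in this dataset).

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)

# Scaling and logistic regression bundled so the scaler is fit on training data only
clf = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Probability of the positive class (label 1 = benign)
scores = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

plt.plot(fpr, tpr, label=f"Logistic regression (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], "k--", label="Random guess")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate (recall)")
plt.legend()
plt.show()
```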


[Source 1: Wikipedia](https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/Precisionrecall.svg/350px-Precisionrecall.svg.png)

**False positive: Type I error**

**False negative: Type II error**

<div style="display:inline"><img src="http://www.statisticssolutions.com/wp-content/uploads/2017/12/rachnovblog.jpg" width="70%" /> </div>

[Image Source 2](http://www.statisticssolutions.com/wp-content/uploads/2017/12/rachnovblog.jpg)

In [34]:
## Let us look at some other criteria for performance evaluation
from sklearn.metrics import confusion_matrix, classification_report

y_iris_pred = lrc.predict(Xn_test)

conf_matrix = confusion_matrix(y_iris_test, y_iris_pred)
print(conf_matrix)
class_report = classification_report(y_iris_test, y_iris_pred)
print(class_report)

[[16  0  0]
 [ 0 11  0]
 [ 0  0 18]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        18

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



#### Logistic Regression on Breast-Cancer Data (UCI)
<hr>

In [35]:
#Load Breast Cancer Data
from sklearn.datasets import load_breast_cancer
breast_cancer = load_breast_cancer()

In [36]:
print(breast_cancer.DESCR)

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 0 is Mean Radi

In [38]:
np.set_printoptions(precision=2)
X = breast_cancer.data
y = breast_cancer.target
print (np.round(X[5],2))
print(y[355:380])  # A slice of the labels (0 = malignant, 1 = benign)
np.sum(y)  # Number of benign samples (label 1)

[1.24e+01 1.57e+01 8.26e+01 4.77e+02 1.30e-01 1.70e-01 1.60e-01 8.00e-02
 2.10e-01 8.00e-02 3.30e-01 8.90e-01 2.22e+00 2.72e+01 1.00e-02 3.00e-02
 4.00e-02 1.00e-02 2.00e-02 1.00e-02 1.55e+01 2.38e+01 1.03e+02 7.42e+02
 1.80e-01 5.20e-01 5.40e-01 1.70e-01 4.00e-01 1.20e-01]
[1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 0 1 0 0 1 1 1 1 1 0]


357

In [39]:
breast_cancer.target_names

array(['malignant', 'benign'], dtype='<U9')
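Since label 0 is malignant and label 1 is benign, `np.sum(y)` above counts the benign samples. A per-class count makes the imbalance explicit, which is one reason accuracy alone can mislead on this dataset; this sketch reloads the data so it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()
counts = np.bincount(breast_cancer.target)  # counts for label 0, label 1
for name, count in zip(breast_cancer.target_names, counts):
    print(f"{name}: {count} ({count / counts.sum():.1%})")
# malignant: 212 (37.3%)
# benign: 357 (62.7%)
```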

In [40]:
# Split Data into Train and Test subsets:
X_train, X_test, y_train, y_test  = train_test_split(X, y, test_size=0.3, random_state=100)

In [41]:
# Scaling
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train_n = scaler.transform(X_train)
X_test_n = scaler.transform(X_test)

In [42]:
lrc = LogisticRegression(C=1.0, max_iter=1000, tol=0.00001)

In [43]:
lrc.fit(X_train_n, y_train)  # Train the model

LogisticRegression(max_iter=1000, tol=1e-05)

In [45]:
## NOTE: Accuracy is not always a good performance evaluation criterion,
## especially when the class distribution is skewed/unbalanced.
############# CONFUSION MATRIX ##################
# By definition, a confusion matrix C is such that C[i, j] is equal to the
# number of observations known to be in group i but predicted to be in group j.
y_pred = lrc.predict(X_test_n)
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\n Classification Report:\n", classification_report(y_test,
                                                            y_pred,
                                                            target_names=breast_cancer.target_names,
                                                            digits=2))

Confusion Matrix:
 [[ 62   7]
 [  0 102]]

 Classification Report:
               precision    recall  f1-score   support

   malignant       1.00      0.90      0.95        69
      benign       0.94      1.00      0.97       102

    accuracy                           0.96       171
   macro avg       0.97      0.95      0.96       171
weighted avg       0.96      0.96      0.96       171
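As a worked check, the per-class numbers in the report can be recomputed from the confusion matrix printed above: precision divides each diagonal entry by its column sum (predicted counts) and recall divides it by its row sum (true counts).

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true class, columns = predicted class; order: malignant, benign)
cm = np.array([[62, 7],
               [0, 102]])

correct = np.diag(cm)                        # correctly classified per class
precision = correct / cm.sum(axis=0)         # divide by column sums
recall = correct / cm.sum(axis=1)            # divide by row sums
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

for name, p, r, f in zip(["malignant", "benign"], precision, recall, f1):
    print(f"{name:>10}: precision={p:.2f}, recall={r:.2f}, f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The malignant recall of 0.90 corresponds to the 7 of 69 malignant test cases that were predicted as benign, even though overall accuracy is 0.96.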



In [None]:
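# Instantiate PCA
from sklearn.decomposition import PCA

# NOTE: Xbn is not defined elsewhere in this notebook; it is assumed here to be the
# scaled breast-cancer training data with features as rows (30 x n_samples).
Xbn = X_train_n.T

pca = PCA(n_components=5)
pca.fit(Xbn.T)  # Features are the rows, so we transpose to samples x features

Xpt = pca.transform(Xbn.T)

print("Explained Variance: %s" % pca.explained_variance_)
print(pca.components_)
print("Explained Variance Ratio")
print(pca.explained_variance_ratio_)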

### Logistic Regression and Perceptron Learning Algorithm
---
<img src = "https://www.simplilearn.com/ice9/free_resources_article_thumb/symbolic-representation-of-perceptron-learning-rule.jpg" width='80%'/>
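Both algorithms can be written with the same online update $w \leftarrow w + \eta\,(y_i - \hat{y}_i)\,x_i$. The perceptron uses a hard threshold $\hat{y}_i = \mathbb{1}[w^Tx_i \ge 0]$, while stochastic gradient ascent on the logistic log-likelihood uses the probability $\hat{y}_i = \sigma(w^Tx_i)$. A minimal sketch, assuming 0/1 labels; the function `train`, the learning rate, the epoch count and the toy linearly separable data are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, rule, eta=0.1, n_epochs=50, seed=0):
    """Hypothetical helper: online updates w <- w + eta * (y_i - yhat_i) * x_i.

    rule="perceptron": yhat_i = 1 if w.x_i >= 0 else 0 (hard threshold)
    rule="logistic":   yhat_i = sigmoid(w.x_i)       (SGD on the log-likelihood)
    """
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([np.ones(len(X)), X])   # bias term as x_0 = 1
    w = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(Xb)):
            z = Xb[i] @ w
            y_hat = float(z >= 0) if rule == "perceptron" else sigmoid(z)
            w += eta * (y[i] - y_hat) * Xb[i]
    return w

def accuracy(w, X, y):
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.mean((Xb @ w >= 0) == y)

# Toy linearly separable data (illustrative only): class 1 when x1 + x2 > 0
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 2))
X = X[np.abs(X[:, 0] + X[:, 1]) > 0.2]   # keep a margin around the true boundary
y = (X[:, 0] + X[:, 1] > 0).astype(float)

for rule in ("perceptron", "logistic"):
    w = train(X, y, rule)
    print(rule, "weights:", np.round(w, 2), "train accuracy:", accuracy(w, X, y))
```

The perceptron only updates on misclassified points, whereas the logistic update is nonzero for every point, scaled by how far the predicted probability is from the label.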

In [44]:
%%html
<style>
.eqnbox{
    margin:auto;width:500px;padding:20px;
    border: 3px solid green; border-radius:15px;margin-top:20px;margin-bottom:20px;
}
.eqnbox2{
    margin:auto;width:500px;padding:20px;
    border: 1px solid green; border-radius:15px;margin-top:20px;margin-bottom:20px;
}
.eqnbox3{
    margin:auto;width:700px;padding:20px;background-color:#c6d6b4;
    border: 1px solid green; border-radius:15px;margin-top:20px;margin-bottom:20px;
}
</style>

In [47]:
np.linalg.inv(np.array([[35, 44],
                       [44, 56]])) @ np.array([[9],[12]])

array([[-1.],
       [ 1.]])
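The same linear system can be solved with `np.linalg.solve`, which avoids forming the inverse explicitly and is generally cheaper and more numerically stable than `np.linalg.inv(A) @ b`.

```python
import numpy as np

A = np.array([[35.0, 44.0],
              [44.0, 56.0]])
b = np.array([[9.0], [12.0]])

# Solve A x = b directly instead of computing A^{-1} first
x = np.linalg.solve(A, b)
print(x)
# [[-1.]
#  [ 1.]]
```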