# Single Layer Perceptron model 
***
**Data**: Pima Indians Diabetes Database (UCI ML) <br>
**Purpose**: find the optimal hyperparameters for the data and select the model that best identifies patients with diabetes <br>



In [13]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron, LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.metrics import recall_score, f1_score 

df = pd.read_csv('diabetes.csv')


In [14]:
df.describe()

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| count | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 |
| mean | 3.845052 | 120.894531 | 69.105469 | 20.536458 | 79.799479 | 31.992578 | 0.471876 | 33.240885 | 0.348958 |
| std | 3.369578 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.88416 | 0.331329 | 11.760232 | 0.476951 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.078 | 21.0 | 0.0 |
| 25% | 1.0 | 99.0 | 62.0 | 0.0 | 0.0 | 27.3 | 0.24375 | 24.0 | 0.0 |
| 50% | 3.0 | 117.0 | 72.0 | 23.0 | 30.5 | 32.0 | 0.3725 | 29.0 | 0.0 |
| 75% | 6.0 | 140.25 | 80.0 | 32.0 | 127.25 | 36.6 | 0.62625 | 41.0 | 1.0 |
| max | 17.0 | 199.0 | 122.0 | 99.0 | 846.0 | 67.1 | 2.42 | 81.0 | 1.0 |


In [None]:
df['Outcome'].value_counts()     # check the class balance: the classes are imbalanced

Outcome
0    500
1    268
Name: count, dtype: int64
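
The counts above can be turned into class weights. The following is a minimal sketch, not part of the original notebook, of the common "balanced" heuristic `n_samples / (n_classes * count)`, which is the formula scikit-learn documents for `class_weight='balanced'`. The variable names are illustrative assumptions; the counts are copied from the output above.

```python
# Class counts copied from the value_counts() output above.
counts = {0: 500, 1: 268}

n_samples = sum(counts.values())
n_classes = len(counts)

# "Balanced" heuristic: weight_c = n_samples / (n_classes * count_c),
# so the rarer class receives the larger weight.
balanced_weights = {
    label: n_samples / (n_classes * n) for label, n in counts.items()
}

positive_share = counts[1] / n_samples        # fraction of patients with diabetes
imbalance_ratio = counts[0] / counts[1]       # negatives per positive

print(f"positive share:  {positive_share:.3f}")
print(f"imbalance ratio: {imbalance_ratio:.2f}")
print("balanced weights:", balanced_weights)
```

Under this heuristic the positive class is weighted about 1.87 times the negative class, which is close to the hand-picked `{0: 1, 1: 2}` used for the logistic regression further down.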

In [16]:
X = df.drop('Outcome', axis=1)
y = df['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [17]:
# single-layer perceptron

print("Search Grid - Hyperparameters")
slp_params = {
    'max_iter': [1000, 2000],                     # maximum number of iterations
    'tol': [1e-3, 1e-4],                          # stopping tolerance
    'alpha': [0.0001, 0.001, 0.01],               # regularization strength
    'eta0': [0.0001],                             # learning rate
    'penalty': ['l2', 'l1', 'elasticnet'],        # regularization type
    'shuffle': [False],                           # whether to shuffle the training data after each epoch
    'early_stopping': [True, False],              # whether to stop training when the validation score stops improving
    'random_state': [42],
    # 'class_weight': [{0: 1, 1: 2}]              # if the classes are imbalanced, assign class weights
}

slp = Perceptron() 
slp_grid = GridSearchCV(slp, slp_params, cv=5, scoring='accuracy', n_jobs=-1)
slp_grid.fit(X_train_scaled, y_train)

print("\nparameters")
print(slp_grid.best_params_)
print("\nbest cross-validation score:", slp_grid.best_score_)
slp_pred = slp_grid.predict(X_test_scaled)



print("\nconfusion matrix:")
print(confusion_matrix(y_test, slp_pred))
print("\nclassification report:")
print(classification_report(y_test, slp_pred))

Search Grid - Hyperparameters

parameters
{'alpha': 0.01, 'early_stopping': True, 'eta0': 0.0001, 'max_iter': 1000, 'penalty': 'l2', 'random_state': 42, 'shuffle': False, 'tol': 0.001}

best cross-validation score: 0.6886312141809942

confusion matrix:
[[84 15]
 [22 33]]

classification report:
              precision    recall  f1-score   support

           0       0.79      0.85      0.82        99
           1       0.69      0.60      0.64        55

    accuracy                           0.76       154
   macro avg       0.74      0.72      0.73       154
weighted avg       0.75      0.76      0.76       154
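
The grid search above ranks candidates by accuracy, while the stated purpose is to identify patients with diabetes, which is closer to recall for class 1. A possible variant, sketched below under explicit assumptions, selects by `scoring="recall"` and places the scaler inside a `Pipeline` so that each cross-validation fold is scaled using only its own training part. The data is a synthetic stand-in generated with `make_classification` (not `diabetes.csv`), the grid is a reduced illustration, and the names `X_demo`, `y_demo`, `pipe`, and `recall_grid` are hypothetical; the scores it prints are not comparable with the results in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 768 rows, 8 features, roughly 35% positives.
X_demo, y_demo = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=42,
)

# Stratify so both splits keep the same class proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Scaling happens inside the pipeline, so it is refit on every CV training fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("slp", Perceptron(random_state=42)),
])

# Reduced illustrative grid; step-prefixed names address the Perceptron step.
param_grid = {
    "slp__alpha": [0.0001, 0.001, 0.01],
    "slp__penalty": ["l2", "l1", "elasticnet"],
    "slp__class_weight": [None, "balanced"],
}

# Candidates are ranked by recall of the positive class (label 1).
recall_grid = GridSearchCV(pipe, param_grid, cv=5, scoring="recall")
recall_grid.fit(X_tr, y_tr)

print("best parameters:", recall_grid.best_params_)
print("best cross-validated recall:", recall_grid.best_score_)
```

Selecting purely on recall tends to favor models that over-predict the positive class, so it is worth checking accuracy or precision for the chosen configuration as well.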



In [18]:
# apply logistic regression to compare results


lr = LogisticRegression(penalty='l2', solver='liblinear', max_iter=2000, C=10,
                        class_weight={0: 1, 1: 2},   # weight the diabetes class twice as much
                        random_state=42)

lr.fit(X_train_scaled, y_train)

lr_pred = lr.predict(X_test_scaled)

print("confusion matrix:")
print(confusion_matrix(y_test, lr_pred))
print("\nclassification report")
print(classification_report(y_test, lr_pred))

confusion matrix:
[[66 33]
 [15 40]]

classification report
              precision    recall  f1-score   support

           0       0.81      0.67      0.73        99
           1       0.55      0.73      0.62        55

    accuracy                           0.69       154
   macro avg       0.68      0.70      0.68       154
weighted avg       0.72      0.69      0.69       154



## Results

The highest recall for the diabetes class (class 1) was obtained with the hyperparameter settings of the 9th case (*see Perceptron_settings_resuts.xlsx*): <br>
Results for the perceptron in the 9th case: <br>
Recall for class 1: 82%, with accuracy 71% <br>
Results for the perceptron in the 7th case: <br>
Recall for class 1: 53%, with accuracy 79% <br>

Logistic regression results: <br>
Highest recall for class 1 reached 73%, with accuracy 69%
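
For reference, recall and accuracy can be recomputed directly from a confusion matrix. The sketch below uses the two test-set matrices printed earlier in this notebook (the grid-search perceptron and the logistic regression), not the spreadsheet cases cited above. It assumes scikit-learn's layout `[[TN, FP], [FN, TP]]`; the helper name `recall_and_accuracy` is hypothetical.

```python
def recall_and_accuracy(cm):
    """Return (recall for class 1, accuracy) from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    recall_pos = tp / (tp + fn)                     # share of true positives that were found
    accuracy = (tp + tn) / (tn + fp + fn + tp)      # share of all predictions that were correct
    return recall_pos, accuracy


# Confusion matrices copied from the cell outputs above.
perceptron_cm = [[84, 15], [22, 33]]
logreg_cm = [[66, 33], [15, 40]]

slp_recall, slp_accuracy = recall_and_accuracy(perceptron_cm)
lr_recall, lr_accuracy = recall_and_accuracy(logreg_cm)

print(f"perceptron:          recall(class 1) = {slp_recall:.2f}, accuracy = {slp_accuracy:.2f}")
print(f"logistic regression: recall(class 1) = {lr_recall:.2f}, accuracy = {lr_accuracy:.2f}")
```

The logistic regression finds 40 of the 55 diabetic patients in the test set against 33 for the perceptron, at the cost of 33 false positives instead of 15, which is the recall versus accuracy trade-off summarized above.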