## Week 18 in class assignment

### 1. Look up the Adam optimization functions in PyTorch https://pytorch.org/docs/stable/optim.html . 

How does it work? Try at least one other optimization function with the diabetes dataset shown in class. 

How does the model perform with the new optimizer? 

Did it perform better or worse than Adam? Why do you think that is?

Adam optimization - algorithm on top of Base class. Adam optimizer involves a combination of two gradient descent methodologies - momentum (taking into consideration the exponentially weighted average of the gradients) and Root mean square propagation  (taking the exponential moving average).

Similar to the momentum optimizer, Adam makes use of an exponentially decaying average of past gradients. Thus, the direction of parameter updates is calculated in a manner similar to that of the momentum optimizer.

Adam also employs an exponentially decaying average of past squared gradients in order to provide an adaptive learning rate. Thus, the scale of the learning rate for each dimension is calculated in a manner similar to that of the RMSProp optimizer.

Changed the learning rate to 0.05 and increase recall, precision, and accuracy.

Chose Adagrad which performed slightly better than Adam, with the same parameters. I think it is because with Adagrad, each parameter has it's own learning rate that improves performance with sparse gradients.

In [1]:
import pandas as pd
import torch

diabetes_df = pd.read_csv("../../in_class/in_class_assignments/diabetes.csv")
diabetes_df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [2]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = diabetes_df.drop('Outcome', axis=1).values
y = diabetes_df['Outcome'].values

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42, stratify=y)

# Standardize
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)


In [3]:
import torch.nn as nn
import torch.nn.functional as F #where the activation functions are

#create tensors = matrices
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)

y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

In [4]:
#artificial neural network
class ANN_Model(nn.Module):
    def __init__(self, input_features=8,hidden1=20,hidden2=20,out_features=2):
        super().__init__() #super is a computed indirect reference. So, it isolates changes
        # and makes sure that children in the layers of multiple inheritence are calling
        #the right parents
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
        
    def forward(self, x):
        #apply activation functions
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [5]:
torch.manual_seed(42)

#create instance of model
ann = ANN_Model()

In [6]:
#loss function
loss_function = nn.CrossEntropyLoss()

#optimizer
optimizer = torch.optim.Adam(ann.parameters(),lr=0.01) #lr is learning rate - play around with

In [7]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = ann.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1:
        print(f'Epoch number: {epoch} with loss: {loss}')
        
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() 
    optimizer.step() #perform one optimization step each epoch

Epoch number: 1 with loss: 0.647470235824585
Epoch number: 11 with loss: 0.5270779132843018
Epoch number: 21 with loss: 0.4539138376712799
Epoch number: 31 with loss: 0.4234801232814789
Epoch number: 41 with loss: 0.39819812774658203
Epoch number: 51 with loss: 0.3721073269844055
Epoch number: 61 with loss: 0.34377244114875793
Epoch number: 71 with loss: 0.31378453969955444
Epoch number: 81 with loss: 0.28582650423049927
Epoch number: 91 with loss: 0.25994443893432617
Epoch number: 101 with loss: 0.2377132922410965
Epoch number: 111 with loss: 0.21422427892684937
Epoch number: 121 with loss: 0.19071193039417267
Epoch number: 131 with loss: 0.17592686414718628
Epoch number: 141 with loss: 0.15840402245521545
Epoch number: 151 with loss: 0.14416663348674774
Epoch number: 161 with loss: 0.12847739458084106
Epoch number: 171 with loss: 0.11511365324258804
Epoch number: 181 with loss: 0.10320579260587692
Epoch number: 191 with loss: 0.09023238718509674
Epoch number: 201 with loss: 0.0791607

In [8]:
# predictions
y_pred = []

with torch.no_grad():
    for i, data in enumerate(X_test):
        prediction = ann(data)
        y_pred.append(prediction.argmax())
        

In [9]:
from sklearn.metrics import classification_report

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.75      0.81      0.78       150
           1       0.59      0.49      0.54        81

    accuracy                           0.70       231
   macro avg       0.67      0.65      0.66       231
weighted avg       0.69      0.70      0.69       231



Going to work on the learning rate first (leaving Adam there) to see what the results are

In [36]:
#loss function
loss_function = nn.CrossEntropyLoss()

#optimizer
optimizer = torch.optim.Adam(ann.parameters(),lr=0.05) #lr is learning rate - play around with

In [37]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = ann.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() 
    optimizer.step() #perform one optimization step each epoch

In [38]:
# predictions
y_pred = []

with torch.no_grad():
    for i, data in enumerate(X_test):
        prediction = ann(data)
        y_pred.append(prediction.argmax())

In [39]:
from sklearn.metrics import classification_report

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.77      0.82      0.80       150
           1       0.62      0.56      0.59        81

    accuracy                           0.73       231
   macro avg       0.70      0.69      0.69       231
weighted avg       0.72      0.73      0.72       231



In [46]:
#loss function
loss_function = nn.CrossEntropyLoss()

#optimizer
optimizer = torch.optim.Adagrad(ann.parameters(),lr=0.01)



In [47]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = ann.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() 
    optimizer.step() #perform one optimization step each epoch

In [48]:
# predictions
y_pred = []

with torch.no_grad():
    for i, data in enumerate(X_test):
        prediction = ann(data)
        y_pred.append(prediction.argmax())

In [49]:
from sklearn.metrics import classification_report

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.75      0.82      0.79       150
           1       0.60      0.51      0.55        81

    accuracy                           0.71       231
   macro avg       0.68      0.66      0.67       231
weighted avg       0.70      0.71      0.70       231



### 2. Write a function that lists and counts the number of divisors for an input value.

Example 1:

Input: 5

Output: “There are 2 divisors: 1 and 5”

Example 2:

Input: 40

Output: “There are 8 divisors: 1, 2, 4, 5, 8, 10, 20, and 40".

In [103]:
def divisor_count(num):
    divisors = []
    for i in range(1, num + 1):
        if num % i == 0:
            divisors.append(i)
    #convert to string, putting all but the last number into the string so a comma can be added if needed
    str_to_hold = str((divisors)[:-1])[1:-1]
    #if the string is longer than 1 digit, adding a comma at the end
    if len(str_to_hold) > 1:
        str_to_hold = str_to_hold + ','
    #using the converted string, print it followed by "and" and then the last one   
    print(f'There are {len(divisors)} divisors: {str_to_hold} and {divisors[-1]}')
    
    return

In [104]:
divisor_count(5)
divisor_count(40)

There are 2 divisors: 1 and 5
There are 8 divisors: 1, 2, 4, 5, 8, 10, 20, and 40


In [105]:
divisor_count(4)

There are 3 divisors: 1, 2, and 4
