1. Look up the Adam optimization functions in PyTorch https://pytorch.org/docs/stable/optim.html . How does it work? Try at least one other optimization function with the diabetes dataset shown in class. How does the model perform with the new optimizer? Did it perform better or worse than Adam? Why do you think that is?

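Adam is an adaptive learning rate method. Instead of using the raw gradient, it keeps an exponential moving average of the gradient (the first moment) and of the squared gradient (the second moment), corrects both for their bias toward zero early in training, and uses them to compute an individual step size for each parameter.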
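
As a rough illustration of the update described above, here is a minimal pure-Python sketch of one Adam step on a list of scalar parameters. The function name `adam_step` and the toy objective are made up for this example. The defaults for `beta1`, `beta2`, and `eps` match the documented defaults of `torch.optim.Adam`, but this is not PyTorch's implementation (tensors, weight decay, and AMSGrad are left out).

```python
import math

def adam_step(params, grads, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place; t is the 1-based step counter."""
    for i, g in enumerate(grads):
        # Exponential moving averages of the gradient and the squared gradient
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction: both averages start at zero and are too small early on
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Per-parameter step: large recent squared gradients shrink the step
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Toy problem (an assumption for illustration): minimize f(x) = (x - 3)^2
params, m, v = [0.0], [0.0], [0.0]
for t in range(1, 2001):
    grads = [2 * (params[0] - 3.0)]
    adam_step(params, grads, m, v, t, lr=0.05)
print(params[0])  # should end up close to 3
```

On the very first step the bias-corrected ratio is approximately the sign of the gradient, so each parameter moves by about `lr` regardless of how large its gradient is.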

In [1]:
import pandas as pd
import torch

diabetes_df = pd.read_csv('../week_13/diabetes.csv')
diabetes_df.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1


In [2]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = diabetes_df.drop('Outcome', axis=1).values
y = diabetes_df['Outcome'].values

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42, stratify=y)

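# Standardize: fit the scaler on the training set only, then reuse it on the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)  # refitting here would scale train and test inconsistently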

In [3]:
import torch.nn as nn
import torch.nn.functional as F #this has activation functions

# Creating tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)

y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

print(X_train)

tensor([[-0.8514, -0.9801, -0.4048,  ..., -0.6077,  0.3108, -0.7922],
        [ 0.3566,  0.1614,  0.4654,  ..., -0.3021, -0.1164,  0.5610],
        [-0.5494, -0.5045, -0.6223,  ...,  0.3726, -0.7649, -0.7076],
        ...,
        [-0.8514, -0.7582,  0.0303,  ...,  0.7800, -0.7861, -0.2847],
        [ 1.8665, -0.3142,  0.0303,  ..., -0.5695, -1.0194,  0.5610],
        [ 0.0546,  0.7322, -0.6223,  ..., -0.3149, -0.5770,  0.3073]])


In [4]:
class ANN_Model(nn.Module):
    def __init__(self, input_features=8, hidden1=20, hidden2=20, out_features =2):
        super().__init__()
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
    
    def forward(self, x):
        #apply activation functions
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [5]:
torch.manual_seed(42)

#instantiate the model
model = ANN_Model()

In [11]:
# loss function
loss_function = nn.CrossEntropyLoss()

#optimizer
#optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
optimizer = torch.optim.Adadelta(model.parameters(), lr = 1.0)

In [12]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
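    y_pred = model(X_train)  # call the module itself rather than .forward()
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss.item())  # store a plain float so the autograd graph is not kept alive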
    
    if epoch % 10 == 1:
        print(f'Epoch number: {epoch} with loss: {loss.item()}')
    
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() #for backward propagation 
    optimizer.step() #performs one optimization step each epoch

Epoch number: 1 with loss: 0.6276398301124573
Epoch number: 11 with loss: 0.5820643901824951
Epoch number: 21 with loss: 0.5244066119194031
Epoch number: 31 with loss: 0.48099902272224426
Epoch number: 41 with loss: 0.4535696506500244
Epoch number: 51 with loss: 0.43771159648895264
Epoch number: 61 with loss: 0.4287622272968292
Epoch number: 71 with loss: 0.4211755692958832
Epoch number: 81 with loss: 0.4141918122768402
Epoch number: 91 with loss: 0.40700459480285645
Epoch number: 101 with loss: 0.3994758129119873
Epoch number: 111 with loss: 0.3915945887565613
Epoch number: 121 with loss: 0.3838939964771271
Epoch number: 131 with loss: 0.37905967235565186
Epoch number: 141 with loss: 0.37678611278533936
Epoch number: 151 with loss: 0.3752165138721466
Epoch number: 161 with loss: 0.3739779591560364
Epoch number: 171 with loss: 0.3722655475139618
Epoch number: 181 with loss: 0.3705430328845978
Epoch number: 191 with loss: 0.36792322993278503
Epoch number: 201 with loss: 0.36456099152565

In [13]:
#predictions
with torch.no_grad():
    # one forward pass over the whole test set; argmax over the class dimension
    y_pred = model(X_test).argmax(dim=1).tolist()

In [14]:
from sklearn.metrics import accuracy_score
a_score = accuracy_score(y_test, y_pred)
print(a_score)

0.6883116883116883


In [15]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.84      0.64      0.73       100
           1       0.54      0.78      0.64        54

    accuracy                           0.69       154
   macro avg       0.69      0.71      0.68       154
weighted avg       0.74      0.69      0.70       154



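I used the Adadelta optimizer, and it gave a lower accuracy score (about 0.69) than the Adam optimizer, but not by much. Adam probably does better here because it combines the strengths of AdaGrad (per-parameter learning rates) and RMSProp (an exponentially decaying average of squared gradients), and adds a bias-corrected moving average of the gradient itself. With a single train/test split the gap is small enough that part of it may be noise.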
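
For comparison, here is a minimal pure-Python sketch of one Adadelta step. The name `adadelta_step` and the toy objective are made up for illustration. The defaults (`lr=1.0`, `rho=0.9`, `eps=1e-6`) match the documented defaults of `torch.optim.Adadelta`, but this is a simplified sketch, not PyTorch's code. It only shows how the update is formed and is not meant to explain the accuracy gap measured above.

```python
import math

def adadelta_step(params, grads, sq_avg, acc_delta, lr=1.0, rho=0.9, eps=1e-6):
    """Apply one Adadelta update in place to a list of scalar parameters."""
    for i, g in enumerate(grads):
        # Running average of squared gradients
        sq_avg[i] = rho * sq_avg[i] + (1 - rho) * g * g
        # Step size is the ratio of two running RMS values, so no hand-tuned rate is needed
        delta = math.sqrt(acc_delta[i] + eps) / math.sqrt(sq_avg[i] + eps) * g
        # Running average of squared updates, used by the next step
        acc_delta[i] = rho * acc_delta[i] + (1 - rho) * delta * delta
        params[i] -= lr * delta
    return params

# Toy problem (an assumption for illustration): first step on f(x) = (x - 3)^2 from x = 0
x = [0.0]
adadelta_step(x, [2 * (x[0] - 3.0)], [0.0], [0.0])
first_adadelta_step = x[0]
print(first_adadelta_step)
```

Because the running average of squared updates starts at zero, the first Adadelta steps are driven mostly by `eps` and are therefore small; the step size grows as that average accumulates.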

2. Write a function that lists and counts the number of divisors for an input value.

In [29]:
def divisors(val):
    """Print how many divisors a positive integer has, along with the divisors."""
    # Keep every i in 1..val that divides val evenly
    num = [i for i in range(1, val + 1) if val % i == 0]
    count = len(num)
    print(f"There are {count} divisors: {num}")

divisors(5)

There are 2 divisors: [1, 5]
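
The loop above checks every integer up to `val`. As a possible refinement, the sketch below only checks up to the square root and pairs each divisor `i` with `val // i`; it also returns the count and list instead of printing them, so the result can be reused. The name `divisors_sorted` is hypothetical and not part of the original answer.

```python
import math

def divisors_sorted(val):
    """Return (count, sorted list of divisors) for a positive integer."""
    if val < 1:
        raise ValueError("val must be a positive integer")
    small, large = [], []
    for i in range(1, math.isqrt(val) + 1):
        if val % i == 0:
            small.append(i)
            # Avoid adding the square root twice for perfect squares
            if i != val // i:
                large.append(val // i)
    result = small + large[::-1]
    return len(result), result

count, nums = divisors_sorted(36)
print(f"There are {count} divisors: {nums}")
# There are 9 divisors: [1, 2, 3, 4, 6, 9, 12, 18, 36]
```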
