<b>This version of the model:</b>
1. runs the code on the GPU, if one is available
2. adds a ReLU activation function after the linear layers,
    so instead of y = W*x + b, each layer computes y = ReLU(W*x + b). Another option would be y = sigmoid(W*x + b).
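
To make the difference between the plain linear output and the activated outputs concrete, here is a minimal sketch. The layer sizes and the random input are arbitrary choices for illustration, not values from this notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A small linear layer: y = W*x + b (sizes chosen only for illustration)
linear = nn.Linear(in_features=4, out_features=3)
x = torch.randn(2, 4)  # a batch of 2 made-up input vectors

z = linear(x)                   # raw linear output, can be negative
relu_out = torch.relu(z)        # negative values are replaced by 0
sigmoid_out = torch.sigmoid(z)  # every value is squashed into (0, 1)

print(z)
print(relu_out)
print(sigmoid_out)
```

ReLU keeps positive values unchanged and zeroes out the rest, while sigmoid maps every value into the range between 0 and 1.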

Steps:
1. get the dataset and make the DataLoader objects
2. make model
3. train model
4. evaluate model

<b> Prepare Data </b> 

In [16]:
import torch
from torch import nn 

import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor

In [17]:
train_data = datasets.FashionMNIST(root="data", train=True, transform=ToTensor(), download=True)
test_data = datasets.FashionMNIST(root="data", train=False, transform=ToTensor(), download=True)

from torch.utils.data import DataLoader
BATCH_SIZE = 32
train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)
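
A quick way to see what a DataLoader yields is to pull one batch and look at its shape. The sketch below uses a synthetic stand-in dataset (random tensors shaped like FashionMNIST images) so that it runs without downloading anything. The names `fake_images`, `fake_labels`, and `fake_loader` are made up for this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 32

# Synthetic stand-in for the real dataset: 100 random 1x28x28 "images"
# with random labels in 0..9 (an assumption made only for this sketch)
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
fake_loader = DataLoader(
    TensorDataset(fake_images, fake_labels),
    batch_size=BATCH_SIZE,
    shuffle=False,
)

# One batch: images have shape [batch, channels, height, width]
X, y = next(iter(fake_loader))
print(X.shape, y.shape)

# Number of batches = ceil(dataset size / batch size); the last batch may be smaller
print(len(fake_loader))
```

By the same arithmetic, the 60,000 FashionMNIST training images give 60000 / 32 = 1875 batches per epoch.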

<b>Making model </b>

In [18]:
class FashionMNISTModelV1(nn.Module):
    def __init__(self, input_shape:int, hidden_units:int, output_shape:int ):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_shape),
            nn.ReLU()
        )
    def forward(self, x:torch.Tensor):
        return self.layer_stack(x)
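
Before training, it can help to push a dummy batch through the same stack of layers and check the shapes. This is a minimal sketch: it rebuilds the layer stack directly with `nn.Sequential` instead of the class above, and uses 784 inputs (a 28x28 image flattened), 10 hidden units, and 10 outputs as assumed sizes.

```python
import torch
from torch import nn

# Same sequence of layers as the model's layer_stack, built standalone for the sketch
layer_stack = nn.Sequential(
    nn.Flatten(),                                # [N, 1, 28, 28] -> [N, 784]
    nn.Linear(in_features=784, out_features=10),
    nn.ReLU(),
    nn.Linear(in_features=10, out_features=10),
    nn.ReLU(),
)

dummy_batch = torch.rand(32, 1, 28, 28)  # random stand-in for 32 images
outputs = layer_stack(dummy_batch)
print(outputs.shape)  # one row of 10 class scores per image

# Count trainable parameters: (784*10 + 10) + (10*10 + 10)
n_params = sum(p.numel() for p in layer_stack.parameters())
print(n_params)
```

Because the last layer is a ReLU, every output score is non-negative.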

In [19]:
#instantiating model and setting device
device = "cuda" if torch.cuda.is_available() else "cpu"  # PyTorch names the GPU device "cuda"; "gpu" is not a valid device string
model_1 = FashionMNISTModelV1(input_shape=784, hidden_units=10, output_shape=10)
model_1.to(device)

#checking the current device
next(model_1.parameters()).device

device(type='cpu')
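
The device-agnostic pattern can be sketched in isolation as follows. The tiny layer and input here are placeholders for illustration. The key point is that the model and its input tensors must live on the same device, and that PyTorch calls the NVIDIA GPU device `"cuda"`.

```python
import torch
from torch import nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

layer = nn.Linear(4, 2).to(device)  # move the parameters to the chosen device
x = torch.rand(3, 4).to(device)     # inputs must be on the same device as the model

out = layer(x)
print(out.device.type)
```

On a machine without a CUDA-capable GPU this prints `cpu`, matching the output above.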

<b> Training the Model </b>

In [20]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_1.parameters(), lr=0.1)

epochs = 3

model_1.to(device)
for epoch in range(epochs):
    print("training in epoch: ", epoch)
    train_loss = 0

    batch_count = 0
    for X, y in train_dataloader:
        X, y = X.to(device), y.to(device)
        batch_count += 1
        # X = a batch of 32 images
        # y = the corresponding labels for the images
        model_1.train()  # set the model to training mode; some layers (e.g. dropout, batch norm) behave differently in training and evaluation
        #1 forward pass
        y_pred = model_1(X) 
        #2 loss for the batch        
        loss = loss_fn(y_pred, y)
        #3 optimizer zero grad
        optimizer.zero_grad()
        #4 backward()
        loss.backward()  # backpropagation needs the computational graph built during the forward pass, which is why the loss is computed first
        #5 update weights
        optimizer.step()

        if (batch_count % 400 == 0):
            print("finished training with batch ", batch_count)

    print("finished training of epoch ", epoch)


training in epoch:  0
finished training with batch  400
finished training with batch  800
finished training with batch  1200
finished training with batch  1600
finished training of epoch  0
training in epoch:  1
finished training with batch  400
finished training with batch  800
finished training with batch  1200
finished training with batch  1600
finished training of epoch  1
training in epoch:  2
finished training with batch  400
finished training with batch  800
finished training with batch  1200
finished training with batch  1600
finished training of epoch  2
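
The loop above initializes `train_loss` but does not accumulate it. One possible way to report an average training loss per epoch is sketched below. It uses a small synthetic dataset and a one-layer model as stand-ins so it runs quickly, and `epoch_losses` is a name introduced only for this example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-ins for the real data and model (assumptions for this sketch)
loader = DataLoader(
    TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    batch_size=32,
)
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(2):
    model.train()
    train_loss = 0.0
    for X, y in loader:
        loss = loss_fn(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # .item() returns a plain Python float, so no autograd graph is kept alive
        train_loss += loss.item()
    train_loss /= len(loader)  # average over the batches in this epoch
    epoch_losses.append(train_loss)
    print(f"epoch {epoch}: average train loss {train_loss:.4f}")
```

Accumulating `loss.item()` rather than the loss tensor itself avoids holding on to each batch's computational graph.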


<b>Training done, now evaluating the model</b>

In [21]:
from helper_func import eval_model

eval_model(model=model_1, test_dataloader=test_dataloader, loss_fn=loss_fn, device=device)

{'model_name': 'FashionMNISTModelV1',
 'model_acc': 81.9988019169329,
 'model_loss': tensor(0.4906, grad_fn=<DivBackward0>)}
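
The source of `helper_func.eval_model` is not shown here, so the following is only a sketch of what such an evaluation helper might look like, not the helper actually used. The `grad_fn=<DivBackward0>` in the output above indicates that the returned loss tensor still carries autograd history. Running evaluation inside `torch.inference_mode()` and converting the results with `.item()` would avoid that. The function name `evaluate` and the synthetic data are assumptions made for this example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, dataloader, loss_fn, device):
    """Hypothetical helper: average loss and accuracy (%) over a dataloader."""
    model.eval()  # switch layers such as dropout to evaluation behavior
    total_loss, correct, seen = 0.0, 0, 0
    with torch.inference_mode():  # no gradients are tracked inside this block
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            outputs = model(X)
            total_loss += loss_fn(outputs, y).item()
            # predicted class = index of the highest score in each row
            correct += (outputs.argmax(dim=1) == y).sum().item()
            seen += len(y)
    return {
        "model_name": model.__class__.__name__,
        "model_acc": 100 * correct / seen,
        "model_loss": total_loss / len(dataloader),
    }


# Usage on synthetic stand-in data (random images and labels)
torch.manual_seed(0)
loader = DataLoader(
    TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    batch_size=32,
)
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
results = evaluate(model, loader, nn.CrossEntropyLoss(), device="cpu")
print(results)
```

With this approach the loss comes back as a plain float instead of a tensor with a `grad_fn`.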