<a href="https://colab.research.google.com/github/visiont3lab/deep-learning-course/blob/main/colab/NN_structure_Class.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Network Introduction

> Disclaimer: The content of this notebook is based on the following books:
* [Neural Networks from Scratch in Python by Harrison Kinsley & Daniel Kukiela](https://www.goodreads.com/book/show/55927899-neural-networks-from-scratch-in-python)
* [Pytorch Computer Vision Cookbook](https://github.com/PacktPublishing/PyTorch-Computer-Vision-Cookbook)

## Theory: Forward and Backward (Backpropagation) Steps

<img src="https://github.com/visiont3lab/deep-learning-course/blob/main/material/nn.jpg?raw=true" alt="Image NN" width=800>

Forward Step:

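$$z_j = \sum_{i=1}^{p}w_{ij}^{h}x_{i} + b_{j}^h \quad \text{with} \quad j=1,2,\dots,h$$
$$v_{j} = \delta_{j}(z_j)$$
$$\hat{y}_l = \sum_{j=1}^{h}w_{jl}^{o}v_{j} + b_{l}^o \quad \text{with} \quad l=1,2,\dots,n$$

Here $\delta_j$ is the activation function of hidden neuron $j$, $p$ is the number of inputs, $h$ the number of hidden neurons and $n$ the number of outputs.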

Cost Function:

$$ J = \frac{1}{2}\sum_{l=1}^{n}e_{l}^{2} \quad \text{with} \quad e_l=\hat{y}_l-y_l =  \sum_{j=1}^{h}w_{jl}^{o}v_{j} + b_{l}^o - y_l $$

Output Layer Gradient Estimation:

$$\frac{\partial J}{\partial w_{jl}^{o}}=\frac{\partial J}{\partial e_l}\frac{\partial e_l}{\partial \hat{y}_l}\frac{\partial \hat{y}_l}{\partial w_{jl}^{o}} = e_l(1)v_j = v_je_l$$

$$\frac{\partial J}{\partial b_{l}^{o}}=\frac{\partial J}{\partial e_l}\frac{\partial e_l}{\partial \hat{y}_l}\frac{\partial \hat{y}_l}{\partial b_{l}^{o}} = e_l(1)(1) = e_l$$

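Output Layer: Gradient Descent Update Rule:

$$w_{jl}^{o}=w_{jl}^{o}-\eta\frac{\partial J}{\partial w_{jl}^{o}}=w_{jl}^{o}-\eta v_je_l$$
$$b_{l}^{o}=b_{l}^{o}-\eta\frac{\partial J}{\partial b_{l}^{o}}=b_{l}^{o}-\eta e_l$$

Here $\eta$ denotes the learning rate (the variable `l` in the code), not to be confused with the output index $l$.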

Hidden Layer Gradient Estimation:

Each hidden output $v_j$ feeds all $n$ outputs, so $\partial J / \partial v_j$ sums the errors over the output index $l$.

$$\frac{\partial J}{\partial w_{ij}^{h}}=\frac{\partial J}{\partial v_j}\frac{\partial v_j}{\partial z_j}\frac{\partial z_j}{\partial w_{ij}^{h}} = \Big(\sum_{l=1}^{n}e_lw_{jl}^o\Big)\big(\delta_{j}^\prime(z_j)\big)(x_i) = x_i\delta_{j}^\prime(z_j)\sum_{l=1}^{n}e_lw_{jl}^o$$

$$\frac{\partial J}{\partial b_{j}^{h}}=\frac{\partial J}{\partial v_j}\frac{\partial v_j}{\partial z_j}\frac{\partial z_j}{\partial b_{j}^{h}} = \Big(\sum_{l=1}^{n}e_lw_{jl}^o\Big)\big(\delta_{j}^\prime(z_j)\big)(1) = \delta_{j}^\prime(z_j)\sum_{l=1}^{n}e_lw_{jl}^o$$

Hidden Layer: Gradient Descent Update Rule:

$$w_{ij}^{h}=w_{ij}^{h}-\eta\frac{\partial J}{\partial w_{ij}^{h}}=w_{ij}^{h}-\eta x_i\delta_{j}^\prime(z_j)\sum_{l=1}^{n}e_lw_{jl}^o$$
$$b_{j}^{h}=b_{j}^{h}-\eta\frac{\partial J}{\partial b_{j}^{h}}=b_{j}^{h}-\eta\delta_{j}^\prime(z_j)\sum_{l=1}^{n}e_lw_{jl}^o \quad \text{with} \quad \eta=\text{learning rate (the variable `l` in the code)}$$
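As a sanity check, the gradient formulas of this section can be compared against finite differences. The following is only a minimal NumPy sketch under stated assumptions: `tanh` stands in for the activation function, the sizes `p`, `h`, `n` are arbitrary, and every variable and function name is illustrative rather than taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
p, h, n = 3, 4, 2                    # inputs, hidden neurons, outputs (illustrative sizes)
x = rng.normal(size=p)
y = rng.normal(size=n)
W_h = rng.normal(size=(p, h))        # W_h[i, j] plays the role of w_ij^h
b_h = rng.normal(size=h)
W_o = rng.normal(size=(h, n))        # W_o[j, l] plays the role of w_jl^o
b_o = rng.normal(size=n)

def cost():
    """Cost J = 1/2 * sum of squared errors, using the current parameter values."""
    v = np.tanh(x @ W_h + b_h)
    y_hat = v @ W_o + b_o
    return 0.5 * np.sum((y_hat - y) ** 2)

# Analytic gradients from the formulas of this section (tanh assumed as activation)
v = np.tanh(x @ W_h + b_h)
e = v @ W_o + b_o - y                # e_l
grad_W_o = np.outer(v, e)            # v_j * e_l
grad_b_o = e                         # e_l
back = (W_o @ e) * (1.0 - v ** 2)    # activation derivative times sum over l of e_l * w_jl^o
grad_W_h = np.outer(x, back)         # x_i * back_j
grad_b_h = back

def numeric_grad(param, eps=1e-6):
    """Central finite differences of the cost; `param` is perturbed in place and restored."""
    g = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        j_plus = cost()
        param[idx] = old - eps
        j_minus = cost()
        param[idx] = old
        g[idx] = (j_plus - j_minus) / (2 * eps)
    return g

max_err = max(
    np.max(np.abs(grad_W_o - numeric_grad(W_o))),
    np.max(np.abs(grad_b_o - numeric_grad(b_o))),
    np.max(np.abs(grad_W_h - numeric_grad(W_h))),
    np.max(np.abs(grad_b_h - numeric_grad(b_h))),
)
print("largest difference between analytic and numeric gradients:", max_err)
```

If the formulas are implemented consistently, the printed difference should be tiny compared with the size of the gradients themselves.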




## Single Neuron

In [None]:
# Single neuron with 3 inputs
x = [3,5,6]              # inputs
w = [0.4,0.5,0.2]        # weights
b = 3                    # there is one bias per neuron
# Key concept: a neuron sums each input multiplied by its weight, then adds the bias
y = ( x[0]*w[0] + x[1]*w[1] + x[2]*w[2] + b)
y

7.9

## Neural Network: Single hidden layer of Neurons

In [None]:
# Neural network: 3 inputs, 1 hidden layer made of 3 neurons, 1 output
# Inputs
x = [3,5,6]     
# Neuron 1: weights and bias
w1 = [0.4,0.5,0.2]       
b1 = 3                   
# Neuron 2: weights and bias
w2 = [0.1,1.5,0.01]       
b2 = 0.3                 
# Neuron 3: weights and bias
w3 = [0.1,-0.3,-0.8]       
b3 = 1                   

# Output of each neuron
yn1 = ( x[0]*w1[0] + x[1]*w1[1] + x[2]*w1[2] + b1)
yn2 = ( x[0]*w2[0] + x[1]*w2[1] + x[2]*w2[2] + b2)
yn3 = ( x[0]*w3[0] + x[1]*w3[1] + x[2]*w3[2] + b3)

print("Output of each neuron", yn1,yn2,yn3)

Output of each neuron 7.9 8.16 -5.000000000000001


## Neural Network: Dot Product

In [None]:
import numpy as np
x = np.array( [[3,5,6]] )
w = np.array( [[0.4,0.5,0.2],[0.1,0.2,0.3],[0.2,0.1,0.2]] )
b = np.array( [4,4,3])
#v = x[0]*w[0] + x[1]*w[1] + x[2]*w[2]

print(x.shape)
print(w.T.shape)
vdot =  np.dot(x,w.T) + b
print(vdot)

(1, 3)
(3, 3)
[[8.9 7.1 5.3]]


In [None]:
a = [2,3,4]
b = [3,4,5]
c = [2,3,4]
 
for el1,el2,el3 in zip(a,b,c):
    print(el1,el2,el3)



2 3 2
3 4 3
4 5 4


In [None]:
# 6 inputs
# 5 hidden neurons
# 2 output neurons

n_inputs = 6
num_hidden_neurons  = 5
num_output_neurons = 2
parametri = n_inputs*num_hidden_neurons + num_hidden_neurons + num_hidden_neurons*num_output_neurons + num_output_neurons
print("parameters: " , parametri)
# --> 6x5 + 5 + 5x2 + 2 = 47

# np.float was removed in NumPy 1.24: use the builtin float instead
x = np.array( [[3,5,6,4,5,6]] ,dtype=float)
w_h = np.array( [[0.4,0.5,0.2,0.4,0.2,0.4],
               [0.4,0.5,0.2,0.4,0.4,0.4],
               [0.4,0.5,0.2,0.4,0.3,0.4],
               [0.4,0.5,0.2,0.4,0.3,0.4],
               [0.4,0.5,0.2,0.4,0.3,0.4]
               ] )
b_h =  np.array( [[4,5,6,7,8]], dtype=float)
w_o = np.array( [[0.4,0.5,0.2,0.4,0.2],
               [0.4,0.5,0.2,0.4,0.4]
               ] )
b_o=  np.array( [[4,5]], dtype=float)

# Forward
v = np.dot(x,w_h.T) + b_h
print(v)
y = np.dot(v,w_o.T) + b_o
y


parameters:  47
[[13.9 15.9 16.4 17.4 18.4]]


array([[31.43, 36.11]])
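The parameter count above generalizes to any fully connected network: each layer contributes one weight per (input, neuron) pair plus one bias per neuron. A minimal sketch follows; the helper name `count_parameters` and its list-of-sizes argument are illustrative, not part of the notebook.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of units per layer, inputs first,
    e.g. [6, 5, 2] for 6 inputs, 5 hidden neurons and 2 outputs.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weights + biases of this layer
    return total

print(count_parameters([6, 5, 2]))        # 6*5 + 5 + 5*2 + 2 = 47
print(count_parameters([1, 60, 40, 1]))   # 120 + 2440 + 41 = 2601
```

The second call uses the layer sizes of the PyTorch `Net` defined later in this notebook, and agrees with the total reported by `torchsummary` there.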

In [None]:
import numpy as np
x = np.array([3,5,6])
ws = np.array( [[0.4,0.5,0.2],[0.1,1.5,0.01],[0.1,-0.3,-0.8] ] )
bs = np.array( [3,0.3,1] ) 

def dot_product(x,w):
    y = 0
    for xt,wt in zip(x,w):
        y += xt*wt 
    return y

for w,b in zip(ws,bs):
    # Output of a single neuron
    print("Inputs: ", x)
    print("Weights: ", w)
    # dot product: sum of the element-wise multiplication
    y = np.dot(x,w) + b
    print("Result dot product numpy: ", y)
    # same dot product computed with the hand-written function
    y = dot_product(x,w) + b
    print("Result dot product function: ",y)

Inputs:  [3 5 6]
Weights:  [0.4 0.5 0.2]
Result dot product numpy:  7.9
Result dot product function:  7.9
Inputs:  [3 5 6]
Weights:  [0.1  1.5  0.01]
Result dot product numpy:  8.16
Result dot product function:  8.16
Inputs:  [3 5 6]
Weights:  [ 0.1 -0.3 -0.8]
Result dot product numpy:  -5.0
Result dot product function:  -5.000000000000001


## Neural Network: Multi Hidden layer of Neurons

In [None]:
# ------ Inputs: 3 inputs
x = np.array(     [       1,             2,              2.5           ] )

# ------ Hidden layer 1 --> 3 neurons
w_h1 = np.array( [ [0.4,0.5,0.2], [0.1,1.5,0.01], [0.1,-0.3,-0.8]     ] ).T
b_h1 = np.array( [       3      ,       1       ,       0.3           ] )

# ------ Hidden layer 2 --> 2 neurons
w_h2 = np.array( [ [0.2,0.1,0.4],        [0.01,-1.5,0.1]   ] ).T
b_h2 = np.array( [       1      ,             -1           ] )

# ------ Output layer --> 1 neuron
w_h3 = np.array( [ [0.2,0.1 ] ]).T
b_h3 = np.array( [    1       ] )

v_h1 = np.dot(x,w_h1) + b_h1
print("Output hidden layer 1: ", v_h1)
v_h2 = np.dot(v_h1,w_h2) + b_h2
print("Output hidden layer 2: ",v_h2)
y = np.dot(v_h2,w_h3) + b_h3 
print("Output hidden layer 3 or output layer: ", y)

Output hidden layer 1:  [ 4.9    4.125 -2.2  ]
Output hidden layer 2:  [ 1.5125 -7.3585]
Output hidden layer 3 or output layer:  [0.56665]
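The layer-by-layer computation above repeats the same pattern, so it can be written as a loop over (weights, bias) pairs. This is a minimal sketch: the function name `forward` and the `layers` list are illustrative, and no activation function is applied, exactly as in the cell above.

```python
import numpy as np

def forward(x, layers):
    """Propagate x through a list of (weights, bias) pairs without activation."""
    out = x
    for w, b in layers:
        out = np.dot(out, w) + b
    return out

# Same weights and biases as in the cell above
layers = [
    (np.array([[0.4, 0.5, 0.2], [0.1, 1.5, 0.01], [0.1, -0.3, -0.8]]).T, np.array([3, 1, 0.3])),
    (np.array([[0.2, 0.1, 0.4], [0.01, -1.5, 0.1]]).T, np.array([1, -1])),
    (np.array([[0.2, 0.1]]).T, np.array([1])),
]

y = forward(np.array([1, 2, 2.5]), layers)
print(y)   # matches the output layer value printed above, about 0.56665
```

Adding a layer then only requires appending another pair to `layers`, provided the shapes of consecutive layers are compatible.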


## Neural Network: Feed Multiple input : Batch data

In [None]:
def sigmoid(x):
  # Parentheses matter: 1 / (1 + e^-x), so every output lies in (0, 1)
  return 1.0 / (1.0 + np.exp(-x))

# ------ Inputs: batch of 4 samples with 3 inputs each
x = np.array( [ [1,2,2.5], [0.6,3,3],[0.2,3,32] ,[0.2,3,32] ]   )

# ------ Hidden layer 1 --> 3 neurons
w_h1 = np.array( [ [0.4,0.5,0.2], [0.1,1.5,0.01], [0.1,-0.3,-0.8]     ] ).T
b_h1 = np.array( [ [      3      ,       1       ,       0.3         ]  ] )

# ------ Hidden layer 2 --> 2 neurons
w_h2 = np.array( [ [0.2,0.1,0.4],        [0.01,-1.5,0.1]   ] ).T
b_h2 = np.array( [       1      ,             -1           ] )

# ------ Output layer --> 2 neurons
w_h3 = np.array( [ [0.2,0.1 ],[3,4] ]).T
b_h3 = np.array( [    1     ,3  ] )

y_h1 = sigmoid( np.dot(x,w_h1) + b_h1 )
print("Output hidden layer 1: ", y_h1)
y_h2 = sigmoid ( np.dot(y_h1,w_h2) + b_h2 )
print("Output hidden layer 2: ", y_h2)
y = np.dot(y_h2,w_h3) + b_h3 
print("Output hidden layer 3 or output layer: ", y)
print(y.shape)


## Neural Network: Activation function



Activation functions are applied to the output of every neuron. Usually the same activation function is used for
all hidden layers, while the output layer uses a different one, normally linear.

* Step activation function: $$y=\begin{cases} 1 \quad x>0 \\ 0 \quad x\le 0 \end{cases} $$
* Linear activation function: $$y=x$$
* Sigmoid activation function: $$y=\frac{1}{1+e^{-x}}$$
* Hyperbolic tangent activation function: $$y=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$
* Rectified linear (ReLU) activation function: $$y=\begin{cases} x \quad x>0 \\ 0 \quad x\le 0 \end{cases} $$

We use activation functions to capture the nonlinearity of the problem: without them, any stack of layers reduces to a single linear map of the inputs.
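To see why the nonlinearity matters, the sketch below checks numerically that two stacked layers with no activation in between behave exactly like one linear layer, while inserting `tanh` breaks that equivalence. The shapes and random weights are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                        # batch of 4 samples, 3 inputs
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=(1, 5))
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=(1, 2))

# Two layers with no activation in between
two_layers = (x @ W1 + b1) @ W2 + b2

# The same map written as a single linear layer
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b
same_without_activation = np.allclose(two_layers, one_layer)

# With tanh between the layers the single linear layer no longer matches
with_tanh = np.tanh(x @ W1 + b1) @ W2 + b2
same_with_activation = np.allclose(with_tanh, one_layer)

print(same_without_activation, same_with_activation)   # True False
```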

In [None]:
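def step(x):
  # The second argument is the value returned at x == 0 (0 matches the definition above)
  return np.heaviside(x, 0)

def linear(x):
  return x

def sigmoid(x):
  # Parentheses are required: 1 / (1 + e^-x)
  return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
  return np.tanh(x)

def relu(x):
  return np.maximum(0,x)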
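Backpropagation also needs the derivative of each activation function, evaluated at the neuron's pre-activation value. The sketch below compares the closed-form derivatives of sigmoid, tanh and ReLU with central finite differences. The function names `d_sigmoid`, `d_tanh`, `d_relu` and `central_diff` are illustrative, and the sigmoid is assumed to be the standard $1/(1+e^{-x})$ from the list above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2

def d_relu(x):
    return (x > 0).astype(float)

def central_diff(f, x, eps=1e-6):
    """Numerical derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Test points avoid 0, where ReLU is not differentiable
x = np.array([-2.0, -0.5, 0.5, 2.0])
max_errors = {
    "sigmoid": np.max(np.abs(d_sigmoid(x) - central_diff(sigmoid, x))),
    "tanh": np.max(np.abs(d_tanh(x) - central_diff(np.tanh, x))),
    "relu": np.max(np.abs(d_relu(x) - central_diff(relu, x))),
}
print(max_errors)
```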

## Numpy neural network: Example: Sine Approximation 

In [None]:
x = 1*np.linspace(-2,3,6000)
y = np.sin(5*x)

x_vec = x.reshape(-1,1)
y_vec = y.reshape(-1,1)

# Hyperparameters (same values as in the next cell)
epochs = 500
l = 1e-2          # learning rate
n = len(x_vec)    # number of samples, used to average the gradient

# Forward step: set up the network
# -- hidden layer 1
w1 = np.random.randn(1,5)
b1 = np.random.rand(1,5)
# -- hidden layer 2
w2 = np.random.randn(5,3)
b2 = np.random.rand(1,3)
# -- output layer
w3 = np.random.randn(3,1)
b3 = np.random.rand(1,1)

def tanh(x,der=False):
  f = (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))
  if der: 
    df = 1 - f**2
    return df
  return f
act = tanh

for e in range(0,epochs):
    
    # Forward (whole dataset at once)
    v0 = x_vec
    z1 = np.dot(v0,w1) + b1 
    v1 = act(z1)
    z2 = np.dot(v1,w2) + b2
    v2 = act(z2)
    z3 = np.dot(v2,w3) + b3
    v3 = z3

    # Backpropagation (activation derivatives are evaluated at the pre-activations z)
    dz3 = v3 - y_vec  # error
    dz2 = np.multiply( act(z2,True) ,np.dot(dz3 , w3.T))  # dJ/dz2
    dz1 = np.multiply( act(z1,True) ,np.dot(dz2 , w2.T))  # dJ/dz1
    dw3 = np.dot(v2.T,dz3) # dJ/dw3
    db3 = np.sum(dz3,axis=0,keepdims=True) # dJ/db3
    dw2 = np.dot(v1.T,dz2) # dJ/dw2
    db2 = np.sum(dz2,axis=0,keepdims=True) # dJ/db2
    dw1 = np.dot(v0.T,dz1)  # dJ/dw1
    db1 = np.sum(dz1,axis=0,keepdims=True) # dJ/db1

    # Optimization: gradient descent on the gradient averaged over the samples
    w3 = w3 -l*dw3/n
    b3 = b3 -l*db3/n
    w2 = w2 -l*dw2/n
    b2 = b2 -l*db2/n
    w1 = w1 -l*dw1/n
    b1 = b1 -l*db1/n

In [None]:
import numpy as np
from IPython.display import clear_output 
import plotly.graph_objects as go
from sklearn.model_selection import KFold
# https://blogs.cuit.columbia.edu/zp2130/files/2019/01/Derivative_of_Sigmoid_Hyperbolic_Tangent_Functions-1024x768.jpg
 
def get_data():
  x = 1*np.linspace(-2,3,6000)
  y = np.sin(5*x)
  return x,y

def sigmoid(x,der=False):
  f = 1/(1 + np.exp(-x))
  if der: 
    df = f * (1-f)
    return df
  return f

def tanh(x,der=False):
  f = (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))
  if der: 
    df = 1 - f**2
    return df
  return f

def relu(x,der=False):
  f = np.maximum(0,x)
  if der: 
    # Derivative is 1 where x > 0 and 0 elsewhere
    df = (x > 0).astype(float)
    return df
  return f

x_vec, y_vec = get_data()

# Init params
epochs = 500
batch_size = 50
act = tanh
l = 1e-2 # 0.01
w1 = np.random.randn(1,5)
b1 = np.random.rand(1,5)
w2 = np.random.randn(5,4)
b2 = np.random.rand(1,4)
w3 = np.random.randn(4,1)
b3 = np.random.rand(1,1)
#w1 = np.array([[0.3],[0.1],[0.1],[0.1],[0.1],[0.8]], dtype=np.float32).T
#b1 = np.array([[-0.2,0.1,-0.1,0.1,0.2,0.5]], dtype=np.float32)
#w2 = np.array([[0.3,1,0.2,-0.3,0.3,0.4]], dtype=np.float32).T
#b2 = np.array([[0.2]],dtype=np.float32)

dw3 = np.zeros(w3.shape)
db3 = np.zeros(b3.shape)
dw2 = np.zeros(w2.shape)
db2 = np.zeros(b2.shape)
dw1 = np.zeros(w1.shape)
db1 = np.zeros(b1.shape)

for e in range(epochs):  
  #clear_output()

  for j in range(0, 100 ):
    # Prepare Data
    idx = np.random.choice(len(y_vec), batch_size, replace=False)  # avoid shadowing the builtin id
    x = x_vec[idx].reshape(-1,1)
    y = y_vec[idx].reshape(-1,1)

    # Forward
    v0 = x
    z1 = np.dot(v0,w1) + b1 
    v1 = act(z1)
    z2 = np.dot(v1,w2) + b2
    v2 = act(z2)
    z3 = np.dot(v2,w3) + b3
    v3 = z3

    # Backpropagation (activation derivatives are evaluated at the pre-activations z)
    dz3 = v3 - y 
    dz2 = np.multiply( act(z2,True) ,np.dot(dz3 , w3.T)) 
    dz1 = np.multiply( act(z1,True) ,np.dot(dz2 , w2.T)) 

    # Accumulate gradient
    dw3 += np.dot(v2.T,dz3)
    db3 += np.sum(dz3,axis=0,keepdims=True)
    dw2 += np.dot(v1.T,dz2)
    db2 += np.sum(dz2,axis=0,keepdims=True)
    dw1 += np.dot(v0.T,dz1) 
    db1 += np.sum(dz1,axis=0,keepdims=True)

    w3 = w3 - l* dw3 / batch_size
    b3 = b3 - l* db3 / batch_size
    w2 = w2 - l* dw2  / batch_size
    b2 = b2 - l* db2 / batch_size
    w1 = w1 - l* dw1 / batch_size
    b1 = b1 - l* db1 / batch_size
    dw3 = np.zeros(w3.shape)
    db3 = np.zeros(b3.shape)
    dw2 = np.zeros(w2.shape)
    db2 = np.zeros(b2.shape)
    dw1 = np.zeros(w1.shape)
    db1 = np.zeros(b1.shape)

  # Error measure on the full dataset (training samples included)
  v0 = x_vec.reshape(-1,1)
  z1 = np.dot(v0,w1) + b1 
  v1 = act(z1)
  z2 = np.dot(v1,w2) + b2
  v2 = act(z2)
  v3 = np.dot(v2,w3) + b3
  y_hat_vec = v3
  #y_hat_vec = np.dot( act( np.dot( act( np.dot(x_vec.reshape(-1,1),w1) + b1 ) ,w2) + b2 ) , w3) + b3
  y_hat_vec = y_hat_vec.reshape(-1)
  rmse = np.sqrt( np.mean( (y_vec - y_hat_vec )**2 ) ) 
  print("Epochs: " + str(e) + " RMSE: " + str(np.round(rmse,3)))

# Deploy
fig = go.Figure()
fig.add_traces( go.Scatter(x=x_vec, y=y_vec, name="Real"))
fig.add_traces( go.Scatter(x=x_vec, y=y_hat_vec, name="Estimate"))
fig.show()


In [None]:
import matplotlib.pyplot as plt
plt.plot(x_vec, y_vec, label="Real")
plt.plot(x_vec, y_hat_vec, label="Estimate")
plt.legend()


### Extra: Add KFold Batch Splitting

In [None]:
batch_size = 30
x = 1*np.linspace(-2,3,6000)
#x = x.reshape(-1,1)
y = np.sin(5*x)
n_splits = int(len(y)/batch_size)
print("Number of Splits: ", n_splits)
kf = KFold(n_splits=n_splits)
for train_idx, test_idx in kf.split(x,y):
  x_train, x_test = x[train_idx], x[test_idx]
  y_train, y_test = y[train_idx],y[test_idx]


Number of Splits:  200
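`KFold` with its default `shuffle=False` returns each test fold as a contiguous block of `x`, so every batch would cover only a narrow interval of the sine. For mini-batch training a common alternative is to shuffle the indices once per epoch and slice them. The following is a minimal sketch of that idea; the helper name `iterate_minibatches` is an assumption, not something used elsewhere in the notebook.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield (x_batch, y_batch) pairs covering the data once, in random order."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

x = np.linspace(-2, 3, 6000)
y = np.sin(5 * x)
rng = np.random.default_rng(0)

batches = list(iterate_minibatches(x, y, batch_size=30, rng=rng))
print(len(batches), batches[0][0].shape)   # 200 batches of 30 samples each
```

Calling the generator again at the start of every epoch gives a different ordering each time while still visiting every sample exactly once.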


## Pytorch Neural Network: Example: Sine Approximation

### Training

In [None]:
import numpy as np
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, Dataset
from torchsummary import summary
#!pip install torchsummary
from sklearn.model_selection import train_test_split

# Loss function pytorch: https://neptune.ai/blog/pytorch-loss-functions

class Net(nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        self.fc1 = nn.Linear(1,60)
        self.fc2 = nn.Linear(60,40)
        self.fc3 = nn.Linear(40,1)
    def forward(self,x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = self.fc3(x)
        return x

class CustomTensorDataset(Dataset):

    def __init__(self, x,y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        x = self.x[index]
        y = self.y[index]
        return x, y

    def __len__(self):
        return self.x.shape[0]


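def metrics_batch(target, output):
    # Sum of squared errors over the batch, returned as a plain Python float
    # (detach so the metric does not keep the autograd graph alive)
    sse = torch.sum((output.detach() - target) ** 2).item()
    return sse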

def loss_batch(loss_func, xb,yb,yb_h, opt=None):
    # obtain loss
    loss = loss_func(yb_h, yb)
    # obtain performance metric
    metric_b = metrics_batch(yb,yb_h)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), metric_b

def loss_epoch(model, loss_func, dataset_dl, opt,device):
    loss = 0.0
    metric = 0.0
    len_data = len(dataset_dl.dataset)

    # Get batch data
    for xb,yb in dataset_dl:    
        # Send to cuda the data (batch size)
        xb = xb.type(torch.float32).to(device)
        yb = yb.to(device)

        # obtain model output 
        yb_h = model(xb)

        # Loss and Metric Calculation
        loss_b, metric_b = loss_batch(loss_func, xb,yb,yb_h,opt)
        loss += loss_b
        if metric_b is not None:
            metric+=metric_b 
    
    loss /=len_data
    metric /=len_data
    return loss, metric

def train_val(epochs, model, loss_func, opt, train_dl,val_dl,device):
    for epoch in range(epochs):
        model.train()
        train_loss,train_metric = loss_epoch(model, loss_func, train_dl, opt,device)
        model.eval()
        with torch.no_grad():
            val_loss, val_metric = loss_epoch(model, loss_func, val_dl,opt=None,device=device)
        # Despite its name, this is 100 x the mean squared error on the validation set (lower is better)
        accuracy = 100*val_metric
        print("epoch: %d, train_loss: %.6f, val loss: %.6f, accuracy: %.2f" % (epoch,train_loss, val_loss,accuracy))

# Setup GPU Device
device = torch.device("cpu")
if torch.cuda.is_available():
    device = torch.device("cuda:0")

# Load and Preprocess data 
x = 1*np.linspace(-2,3,6000)
y = np.sin(5*x)
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.33, random_state=4)
x_train = torch.from_numpy(x_train).unsqueeze(1).type(torch.float32)
y_train = torch.from_numpy(y_train).unsqueeze(1).type(torch.float32)
x_val = torch.from_numpy(x_val).unsqueeze(1).type(torch.float32)
y_val = torch.from_numpy(y_val).unsqueeze(1).type(torch.float32)

# Transformation
train_ds = CustomTensorDataset(x_train, y_train)
val_ds = CustomTensorDataset(x_val, y_val)

# Create Data loader
train_dl = DataLoader(train_ds, batch_size=50)
val_dl = DataLoader(val_ds, batch_size=50)

# Define Model, Loss , Optimizer
model = Net()
model.to(device)
#print(model)
# By default model is hosted on CPU
print("Model Parameter Device: ", next(model.parameters()).device)
summary(model, input_size=tuple(x_train.shape))
loss_func = nn.MSELoss(reduction="sum") 
opt = optim.Adam(model.parameters(), lr=1e-2)

# Train
num_epochs = 100
train_val(num_epochs,model, loss_func,opt, train_dl, val_dl,device)

# Save the model (only the weights from the last epoch are saved)
path2weigths="./weights.pt"
torch.save(model.state_dict(),path2weigths)

Model Parameter Device:  cpu
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1             [-1, 4020, 60]             120
            Linear-2             [-1, 4020, 40]           2,440
            Linear-3              [-1, 4020, 1]              41
Total params: 2,601
Trainable params: 2,601
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.02
Forward/backward pass size (MB): 3.10
Params size (MB): 0.01
Estimated Total Size (MB): 3.12
----------------------------------------------------------------
epoch: 0, train_loss: 0.513570, val loss: 0.493185, accuracy: 49.32
epoch: 1, train_loss: 0.481905, val loss: 0.471187, accuracy: 47.12
epoch: 2, train_loss: 0.447446, val loss: 0.428971, accuracy: 42.90
epoch: 3, train_loss: 0.365902, val loss: 0.294025, accuracy: 29.40
epoch: 4, train_loss: 0.232585, val loss: 0.230907, accuracy: 23.

### Test

In [None]:
# Data: the range [-5, 6] extends beyond the training range [-2, 3], so the plot also shows extrapolation
x = 1*np.linspace(-5,6,6000)
y = np.sin(5*x)

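# Load the trained model
device = torch.device("cpu")
md = Net()
# map_location lets weights saved on a GPU be loaded on a CPU-only machine
weights = torch.load(path2weigths, map_location=device)
md.load_state_dict(weights)
md = md.to(device)
md.eval()
#print(next(md.parameters()))

# Inference only: no gradients are needed
with torch.no_grad():
    y_hat_vec = md(torch.from_numpy(x).unsqueeze(1).type(torch.float32))
y_hat_vec = y_hat_vec.numpy()
y_hat_vec = y_hat_vec.reshape(-1)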


# Deploy
fig = go.Figure()
fig.add_traces( go.Scatter(x=x, y=y, name="Real"))
fig.add_traces( go.Scatter(x=x, y=y_hat_vec, name="Estimate"))
fig.show()
