# 05 Crash course on Neural Nets

## Skip this lecture if you think you are a Deep Learning master

#### 👉 Let's train a few neural network models in a supervised setting.

#### 👉 In supervised machine learning you have input features and target values, and the goal is to find the *right* mapping from the input features to the target values.

#### 👉 Neural networks are usually highly parametric models that can fit complex patterns between the input features and the target.

#### 👉 The type of neural network we will use is a Multi Layer Perceptron (MLP). MLPs are stacks of linear models, interleaved with *activation functions.*
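
To make "stacks of linear models, interleaved with activation functions" concrete, here is a minimal sketch of an MLP forward pass in plain Python, without PyTorch. The layer sizes, the random initialization and the choice of ReLU as activation are assumptions made for illustration; they are not taken from the models trained later in this notebook.

```python
import math
import random

def linear(x, weights, biases):
    """Affine map: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(x):
    """Activation function (assumed here): keep positives, zero out negatives."""
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_forward(x, layers):
    """Apply each linear layer, with ReLU between layers but not after the last."""
    for i, (weights, biases) in enumerate(layers):
        x = linear(x, weights, biases)
        if i < len(layers) - 1:
            x = relu(x)
    return x

rng = random.Random(0)
layer_sizes = [4, 8, 2]  # 4 state features -> 8 hidden units -> 2 action scores
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

state = [0.01, -0.02, 0.03, 0.04]  # made-up CartPole-like state
logits = mlp_forward(state, layers)
print(logits)  # two raw scores, one per action
```

With a single layer and no activation, `mlp_forward` reduces to a plain linear model, which is the first model trained further down.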

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

## Environment 🌎

In [None]:
import gymnasium as gym
env = gym.make('CartPole-v1')

# 1. Data

These steps generate the train and test data we need to fit the neural networks and to evaluate them.

## 1.1 Download the agent parameters from Google Drive 📩

In [None]:
from src.supervised_ml import download_agent_parameters

path_to_agent_data = download_agent_parameters()
print(f'path_to_agent_data={path_to_agent_data}')

## 1.2 Create `QAgent` object from the parameters (and hyper-parameters) we just downloaded

In [None]:
from src.q_agent import QAgent

agent = QAgent.load_from_disk(env, path=path_to_agent_data)

## 1.3 Let's check it works like a charm

In [None]:
from src.utils import set_seed
set_seed(env, 1234)

from src.loops import evaluate
rewards, steps = evaluate(
    agent, env,
    n_episodes=1000,
    epsilon=0.0 # 100% greedy strategy
)

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

fig, ax = plt.subplots(figsize=(10, 4))
ax.set_title("Rewards")
# draw the histogram on the axes we just created
pd.Series(rewards).plot(kind='hist', bins=100, ax=ax)

plt.show()

## 1.4 Generate train data -->  `train.csv`

In [None]:
from src.supervised_ml import generate_state_action_data
from src.config import DATA_SUPERVISED_ML

n_samples_train = 1000
path_to_train_data = DATA_SUPERVISED_ML / 'train.csv'
env.reset(seed=0)

# we let the agent interact with the environment until we have
# collected enough pairs (state, action)
generate_state_action_data(env, agent,
                           n_samples=n_samples_train,
                           path=path_to_train_data)

## 1.5 Generate test data --> `test.csv`

In [None]:
# test data
n_samples_test = 1000
path_to_test_data = DATA_SUPERVISED_ML / 'test.csv'

# It is very important to use a different seed than the one
# used for the train set
env.reset(seed=1)

generate_state_action_data(env, agent, 
                           n_samples=n_samples_test,
                           path=path_to_test_data)

-----

# 2. Let's train a few neural networks

#### 👉 We will try different neural network architectures to see which one works best for our problem.
#### 👉 Finding the right architecture is not easy. It is one of the hard parts of training neural networks.

## Load `train.csv` and `test.csv` into 🐼

In [None]:
import pandas as pd

train_data = pd.read_csv(path_to_train_data)
test_data = pd.read_csv(path_to_test_data)

## PyTorch datasets

In [None]:
from torch.utils.data import Dataset

class OptimalPolicyDataset(Dataset):

    def __init__(self, X: pd.DataFrame, y: pd.Series):
        self.X = X
        self.y = y

    def __len__(self):
        """
        Returns number of samples in the data
        """
        return len(self.X)

    def __getitem__(self, idx):
        """
        Returns the features and label
        of sample number `idx`
        """
        return self.X.iloc[idx].values, self.y.iloc[idx]

In [None]:
# split features and labels
X_train = train_data[['s0', 's1', 's2', 's3']]
y_train = train_data['action']
X_test = test_data[['s0', 's1', 's2', 's3']]
y_test = test_data['action']

# PyTorch datasets
# NOTE: this import replaces the OptimalPolicyDataset class
# defined in the cell above
from src.supervised_ml import OptimalPolicyDataset
train_dataset = OptimalPolicyDataset(X_train, y_train)
test_dataset = OptimalPolicyDataset(X_test, y_test)

## PyTorch dataloaders

In [None]:
BATCH_SIZE = 64

# PyTorch dataloaders
from torch.utils.data import DataLoader
train_dataloader = DataLoader(train_dataset,
                              batch_size=BATCH_SIZE,
                              shuffle=True)
test_dataloader = DataLoader(test_dataset,
                             batch_size=BATCH_SIZE,
                             shuffle=False)
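
A quick piece of arithmetic on what these dataloaders yield per epoch. `DataLoader` keeps the last partial batch by default (`drop_last=False`), so the number of batches is the sample count divided by the batch size, rounded up. The sketch below assumes each CSV holds exactly the 1000 requested samples; `batch_sizes` is a hypothetical helper written for this illustration.

```python
def batch_sizes(n_samples, batch_size):
    """Sizes of the batches yielded when the last partial batch is kept."""
    n_full, remainder = divmod(n_samples, batch_size)
    return [batch_size] * n_full + ([remainder] if remainder else [])

sizes = batch_sizes(n_samples=1000, batch_size=64)
print(len(sizes), sizes[-1])  # 16 40
```

So one epoch is 15 full batches of 64 samples plus a final batch of 40.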

## Loss function
#### 👉 Cross entropy is a common loss function for classification problems

In [None]:
import torch.nn as nn
criterion = nn.CrossEntropyLoss()
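
To see what this loss measures, here is a minimal plain-Python sketch of the same quantity for one sample: the negative log of the softmax probability given to the correct class. `nn.CrossEntropyLoss` takes raw scores (logits) and, by default, averages this value over the batch. The function names and the example logits are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mean_cross_entropy(batch_logits, targets):
    """Average per-sample loss over a batch."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

print(cross_entropy([0.0, 0.0], target=0))  # log(2): the model is undecided
print(cross_entropy([5.0, 0.0], target=0))  # small: confident and right
print(cross_entropy([5.0, 0.0], target=1))  # large: confident and wrong
```

The loss is low when the model puts a high score on the right action, and it grows quickly when the model is confidently wrong.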

## Model 0: Baseline

In [None]:
train_data['action'].value_counts()

In [None]:
test_data['action'].value_counts()
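
The class counts above give us a baseline: always predict the most frequent action in the train set, and measure how often that is right on the test set. Any model we train should beat this number. Below is a minimal sketch; `majority_class_accuracy` is a hypothetical helper and the label lists are made up, they are not the notebook's data.

```python
from collections import Counter

def majority_class_accuracy(train_labels, test_labels):
    """Accuracy on the test labels of always predicting the top train label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return majority_label, hits / len(test_labels)

# made-up labels, for illustration only
train_labels = [0, 1, 1, 0, 1, 1]
test_labels = [1, 0, 0, 1, 1]

label, accuracy = majority_class_accuracy(train_labels, test_labels)
print(label, accuracy)  # 1 0.6
```

With the real data you could pass `train_data['action'].tolist()` and `test_data['action'].tolist()` instead of the toy lists.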

## Model 1: Linear model

### Model architecture 📐🏗️

In [None]:
import torch
from src.model_factory import get_model, count_parameters

# linear model --> no hidden_layers (hidden_layers = None)
model = get_model(input_dim=4, output_dim=2, hidden_layers=None)

# send the model to GPU if you have one.
# GPUs have a very fast implementation of matrix multiplication,
# which is the key operation to propagate inputs to outputs
# in most neural network architectures
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

print(f'{count_parameters(model):,} parameters')
print(model)

![image](https://github.com/Paulescu/hands-on-rl/blob/main/03_cart_pole/images/linear_model_sml.jpg?raw=true)
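
A quick sanity check on the parameter count printed above. Assuming `get_model` builds plain fully connected layers with biases (an assumption, since its source is not shown here), a layer with `n_in` inputs and `n_out` outputs holds `n_in * n_out + n_out` parameters. `mlp_parameter_count` is a hypothetical helper, not part of `src.model_factory`.

```python
def mlp_parameter_count(input_dim, output_dim, hidden_layers=None):
    """Weights plus biases of a fully connected net (assumed architecture)."""
    sizes = [input_dim] + list(hidden_layers or []) + [output_dim]
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# the four architectures used in this notebook
for hidden in (None, [256], [256, 256], [256, 256, 256]):
    print(hidden, f'{mlp_parameter_count(4, 2, hidden):,}')
```

Under that assumption the linear model has only 10 parameters (4 x 2 weights plus 2 biases), while each extra hidden layer of 256 units adds 65,792 more.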

### Train loop 🏋️

In [None]:
# TensorBoard logger
import time
from src.supervised_ml import get_tensorboard_writer
run_name = f'linear/{str(int(time.time()))}'
tensorboard_writer = get_tensorboard_writer(run_name)

# Adam with its default settings is a reasonable first choice
import torch.optim as optim
optimizer = optim.Adam(model.parameters())

# train_val_loop runs a full pass on the given data
# (either train or test) and logs metrics to tensorboard
from src.supervised_ml import get_train_val_loop
train_val_loop = get_train_val_loop(model,
                                    criterion,
                                    optimizer,
                                    tensorboard_writer)


# call train_val_loop 150 times for training,
# and 150 times for evaluating.
N_EPOCHS = 150
for epoch in range(N_EPOCHS):
    # train
    train_val_loop(is_train=True,
                   dataloader=train_dataloader,
                   epoch=epoch)

    with torch.no_grad():
        # validate
        train_val_loop(is_train=False,
                       dataloader=test_dataloader,
                       epoch=epoch)

    print('----------')

-------

## Model 2: Neural network with 1 hidden layer

### Model architecture 📐🏗️

In [None]:
model = get_model(input_dim=4, output_dim=2, hidden_layers=[256])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

print(f'{count_parameters(model):,} parameters')
print(model)

![image](https://github.com/Paulescu/hands-on-rl/blob/main/03_cart_pole/images/nn_1_hidden_layer_sml.jpg?raw=true)

### Train loop 🏋️

In [None]:
# TensorBoard logger
import time
from src.supervised_ml import get_tensorboard_writer

run_name = f'[256]/{str(int(time.time()))}'
tensorboard_writer = get_tensorboard_writer(run_name)

# Adam with its default settings is a reasonable first choice
import torch.optim as optim
optimizer = optim.Adam(model.parameters())

from src.supervised_ml import get_train_val_loop
train_val_loop = get_train_val_loop(model, criterion, optimizer, tensorboard_writer)

N_EPOCHS = 150

for epoch in range(N_EPOCHS):
    # train
    train_val_loop(is_train=True, dataloader=train_dataloader, epoch=epoch)

    with torch.no_grad():
        # validate
        train_val_loop(is_train=False, dataloader=test_dataloader, epoch=epoch)

    print('----------')

## Model 3: Neural network with 2 hidden layers

### Model architecture 📐🏗️

In [None]:
model = get_model(input_dim=4, output_dim=2, hidden_layers=[256, 256])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

print(f'{count_parameters(model):,} parameters')
print(model)

![image](https://github.com/Paulescu/hands-on-rl/blob/main/03_cart_pole/images/nn_2_hidden_layers_sml.jpg?raw=true)

### Train loop 🏋️

In [None]:
# TensorBoard logger
import time
from src.supervised_ml import get_tensorboard_writer

run_name = f'[256,256]/{str(int(time.time()))}'
tensorboard_writer = get_tensorboard_writer(run_name)

# Adam with its default settings is a reasonable first choice
import torch.optim as optim
optimizer = optim.Adam(model.parameters())

from src.supervised_ml import get_train_val_loop
train_val_loop = get_train_val_loop(model, criterion, optimizer, tensorboard_writer)

N_EPOCHS = 150

for epoch in range(N_EPOCHS):
    # train
    train_val_loop(is_train=True, dataloader=train_dataloader, epoch=epoch)

    with torch.no_grad():
        # validate
        train_val_loop(is_train=False, dataloader=test_dataloader, epoch=epoch)

    print('----------')

-------

## Model 4: Neural network with 3 hidden layers

### Model architecture 📐🏗️

In [None]:
model = get_model(input_dim=4, output_dim=2,
                  hidden_layers=[256, 256, 256])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

print(f'{count_parameters(model):,} parameters')
print(model)

![image](https://github.com/Paulescu/hands-on-rl/blob/main/03_cart_pole/images/nn_3_hidden_layers_sml.jpg?raw=true)

### Train loop 🏋️

In [None]:
# TensorBoard logger
import time
from src.supervised_ml import get_tensorboard_writer

run_name = f'[256,256,256]/{str(int(time.time()))}'
tensorboard_writer = get_tensorboard_writer(run_name)

# Adam with its default settings is a reasonable first choice
import torch.optim as optim
optimizer = optim.Adam(model.parameters())

from src.supervised_ml import get_train_val_loop
train_val_loop = get_train_val_loop(model, criterion, optimizer, tensorboard_writer)

N_EPOCHS = 150

for epoch in range(N_EPOCHS):
    # train
    train_val_loop(is_train=True, dataloader=train_dataloader, epoch=epoch)

    with torch.no_grad():
        # validate
        train_val_loop(is_train=False, dataloader=test_dataloader, epoch=epoch)

    print('----------')

## Launch TensorBoard to visualize the train and validation curves

In [None]:
from src.config import TENSORBOARD_LOG_DIR
%load_ext tensorboard
%tensorboard --logdir $TENSORBOARD_LOG_DIR