<a href="https://colab.research.google.com/github/vlassner/dsml4220_lab4/blob/main/lab4_mlp_w_w_and_b.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 4: Experimenting with a Multi-layer Perceptron (with W&B)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sgeinitz/DSML4220/blob/main/lab4_mlp_w_w_and_b.ipynb)

[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/sgeinitz/DSML4220/blob/main/lab4_mlp_w_w_and_b.ipynb)

In this notebook we'll build go back to our MLP to classify the nationality of a surname (notebook [Misc 2](https://github.com/sgeinitz/DSML4220/blob/main/misc02_mlp_for_surnames.ipynb)), but this time we want to see how a neural network (with all of its various configuration) can be experimented on. By "_experiment_", we are not only talking about possible hyperparameter tuning (similar to ) but also other possible factors that could cause a neural networks performance to vary, such as type of optimizer (SGD, RMSProp, Adam, etc.), batch sizes, dropout rates, etc.

Again, this was a fun dataset as it is very easy to understand. It consisted of 10k observations with:
* y = nationality  
* x = last name

### Lab 4 Assignment/Task
Similar to Lab 2, this lab does not have separate questions. It instead only asks you to change some of the various parameters/configurations/etc. that you want to experiment with, and then run the notebook so that the results appear on [https://wandb.ai](https://wandb.ai).

Note that everything you need is in this notebook, but if you want to see more of what wandb can do, check out [their intro notebook](https://wandb.me/intro).

As before, you'll submit the link to your completed notebook, and your notebook should have links to a wandb reports.

__Note that at the bottom of this notebook there are a few cells that you need to complete. This notebook will not run until you do so.__


********

__Write a few sentences about which hyperparameter/configuration settings had the biggest impact on performance and how?__
The hyperparameters/ configuration settings that had the biggest impact were the dropout rates, the learning rates and the rmsprop optimizer. For the learning rate, the values were too high. I believe smaller values <0.1 would have been better to test. The lower learning rates demonstrated better results than the ones using 0.1. The dropout rate on the other hand had very mixed results where it did good and bad with both the high and low rates. The rmsprop optimizer did the worst conisistly out of the three optimizers. While it did get the best parameters for one iteration at 59% for accuracy, the other two iterations had the worst accuracy validation at 25% and 30%.

*******

URL: https://wandb.ai/DSML4220/Lassner-mlp-Victoria/sweeps/cypxacle?nw=nwuservlassner


## Imports

In [17]:
from argparse import Namespace
import json
import os

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

In [18]:
#!pip install wandb

import wandb
import random

In [19]:
wandb.login()
# or
#!wandb login

True

In [20]:
sweep_config = {
    'method': 'random' # or 'grid' or 'bayes'
}

In [21]:
metric = {
    'name': 'loss',
    'goal': 'minimize'
}

sweep_config['metric'] = metric

## Data Vectorization classes

### The Vocabulary

In [22]:
class Vocabulary(object):
    """Class to process text and extract vocabulary for mapping"""

    def __init__(self, token_to_idx=None, add_unk=True, unk_token="<UNK>"):
        """
        Args:
            token_to_idx (dict): a pre-existing map of tokens to indices
            add_unk (bool): a flag that indicates whether to add the UNK token
            unk_token (str): the UNK token to add into the Vocabulary
        """

        if token_to_idx is None:
            token_to_idx = {}
        self._token_to_idx = token_to_idx

        self._idx_to_token = {idx: token
                              for token, idx in self._token_to_idx.items()}

        self._add_unk = add_unk
        self._unk_token = unk_token

        self.unk_index = -1
        if add_unk:
            self.unk_index = self.add_token(unk_token)


    def to_serializable(self):
        """ returns a dictionary that can be serialized """
        return {'token_to_idx': self._token_to_idx,
                'add_unk': self._add_unk,
                'unk_token': self._unk_token}

    @classmethod
    def from_serializable(cls, contents):
        """ instantiates the Vocabulary from a serialized dictionary """
        return cls(**contents)

    def add_token(self, token):
        """Update mapping dicts based on the token.

        Args:
            token (str): the item to add into the Vocabulary
        Returns:
            index (int): the integer corresponding to the token
        """
        try:
            index = self._token_to_idx[token]
        except KeyError:
            index = len(self._token_to_idx)
            self._token_to_idx[token] = index
            self._idx_to_token[index] = token
        return index

    def add_many(self, tokens):
        """Add a list of tokens into the Vocabulary

        Args:
            tokens (list): a list of string tokens
        Returns:
            indices (list): a list of indices corresponding to the tokens
        """
        return [self.add_token(token) for token in tokens]

    def lookup_token(self, token):
        """Retrieve the index associated with the token
          or the UNK index if token isn't present.

        Args:
            token (str): the token to look up
        Returns:
            index (int): the index corresponding to the token
        Notes:
            `unk_index` needs to be >=0 (having been added into the Vocabulary)
              for the UNK functionality
        """
        if self.unk_index >= 0:
            return self._token_to_idx.get(token, self.unk_index)
        else:
            return self._token_to_idx[token]

    def lookup_index(self, index):
        """Return the token associated with the index

        Args:
            index (int): the index to look up
        Returns:
            token (str): the token corresponding to the index
        Raises:
            KeyError: if the index is not in the Vocabulary
        """
        if index not in self._idx_to_token:
            raise KeyError("the index (%d) is not in the Vocabulary" % index)
        return self._idx_to_token[index]

    def __str__(self):
        return "<Vocabulary(size=%d)>" % len(self)

    def __len__(self):
        return len(self._token_to_idx)

### The Vectorizer

In [23]:
class SurnameVectorizer(object):
    """ The Vectorizer which coordinates the Vocabularies and puts them to use"""
    def __init__(self, surname_vocab, nationality_vocab):
        """
        Args:
            surname_vocab (Vocabulary): maps characters to integers
            nationality_vocab (Vocabulary): maps nationalities to integers
        """
        self.surname_vocab = surname_vocab
        self.nationality_vocab = nationality_vocab

    def vectorize(self, surname):
        """
        Args:
            surname (str): the surname

        Returns:
            one_hot (np.ndarray): a collapsed one-hot encoding
        """
        vocab = self.surname_vocab
        one_hot = np.zeros(len(vocab), dtype=np.float32)
        for token in surname:
            one_hot[vocab.lookup_token(token)] = 1

        return one_hot

    @classmethod
    def from_dataframe(cls, surname_df):
        """Instantiate the vectorizer from the dataset dataframe

        Args:
            surname_df (pandas.DataFrame): the surnames dataset
        Returns:
            an instance of the SurnameVectorizer
        """
        surname_vocab = Vocabulary(unk_token="@")
        nationality_vocab = Vocabulary(add_unk=False)

        for index, row in surname_df.iterrows():
            for letter in row.surname:
                surname_vocab.add_token(letter)
            nationality_vocab.add_token(row.nationality)

        return cls(surname_vocab, nationality_vocab)

    @classmethod
    def from_serializable(cls, contents):
        surname_vocab = Vocabulary.from_serializable(contents['surname_vocab'])
        nationality_vocab =  Vocabulary.from_serializable(contents['nationality_vocab'])
        return cls(surname_vocab=surname_vocab, nationality_vocab=nationality_vocab)

    def to_serializable(self):
        return {'surname_vocab': self.surname_vocab.to_serializable(),
                'nationality_vocab': self.nationality_vocab.to_serializable()}

### The Dataset

In [24]:
class SurnameDataset(Dataset):
    def __init__(self, surname_df, vectorizer):
        """
        Args:
            surname_df (pandas.DataFrame): the dataset
            vectorizer (SurnameVectorizer): vectorizer instatiated from dataset
        """
        self.surname_df = surname_df
        self._vectorizer = vectorizer

        self.train_df = self.surname_df[self.surname_df.split=='train']
        self.train_size = len(self.train_df)

        self.val_df = self.surname_df[self.surname_df.split=='val']
        self.validation_size = len(self.val_df)

        self.test_df = self.surname_df[self.surname_df.split=='test']
        self.test_size = len(self.test_df)

        self._lookup_dict = {'train': (self.train_df, self.train_size),
                             'val': (self.val_df, self.validation_size),
                             'test': (self.test_df, self.test_size)}

        self.set_split('train')

        # Class weights
        class_counts = surname_df.nationality.value_counts().to_dict()
        def sort_key(item):
            return self._vectorizer.nationality_vocab.lookup_token(item[0])
        sorted_counts = sorted(class_counts.items(), key=sort_key)
        frequencies = [count for _, count in sorted_counts]
        self.class_weights = 1.0 / torch.tensor(frequencies, dtype=torch.float32)

    @classmethod
    def load_dataset_and_make_vectorizer(cls, surname_csv):
        """Load dataset and make a new vectorizer from scratch

        Args:
            surname_csv (str): location of the dataset
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        train_surname_df = surname_df[surname_df.split=='train']
        return cls(surname_df, SurnameVectorizer.from_dataframe(train_surname_df))

    @classmethod
    def load_dataset_and_load_vectorizer(cls, surname_csv, vectorizer_filepath):
        """Load dataset and the corresponding vectorizer.
        Used in the case in the vectorizer has been cached for re-use

        Args:
            surname_csv (str): location of the dataset
            vectorizer_filepath (str): location of the saved vectorizer
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(surname_df, vectorizer)

    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        """a static method for loading the vectorizer from file

        Args:
            vectorizer_filepath (str): the location of the serialized vectorizer
        Returns:
            an instance of SurnameVectorizer
        """
        with open(vectorizer_filepath) as fp:
            return SurnameVectorizer.from_serializable(json.load(fp))

    def save_vectorizer(self, vectorizer_filepath):
        """saves the vectorizer to disk using json

        Args:
            vectorizer_filepath (str): the location to save the vectorizer
        """
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)

    def get_vectorizer(self):
        """ returns the vectorizer """
        return self._vectorizer

    def set_split(self, split="train"):
        """ selects the splits in the dataset using a column in the dataframe """
        self._target_split = split
        self._target_df, self._target_size = self._lookup_dict[split]

    def __len__(self):
        return self._target_size

    def __getitem__(self, index):
        """the primary entry point method for PyTorch datasets

        Args:
            index (int): the index to the data point
        Returns:
            a dictionary holding the data point's:
                features (x_surname)
                label (y_nationality)
        """
        row = self._target_df.iloc[index]

        surname_vector = \
            self._vectorizer.vectorize(row.surname)

        nationality_index = \
            self._vectorizer.nationality_vocab.lookup_token(row.nationality)

        return {'x_surname': surname_vector,
                'y_nationality': nationality_index}

    def get_num_batches(self, batch_size):
        """Given a batch size, return the number of batches in the dataset

        Args:
            batch_size (int)
        Returns:
            number of batches in the dataset
        """
        return len(self) // batch_size


def generate_batches(dataset, batch_size, shuffle=True,
                     drop_last=True, device="cpu"):
    """
    A generator function which wraps the PyTorch DataLoader. It will
      ensure each tensor is on the write device location.
    """
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)

    for data_dict in dataloader:
        out_data_dict = {}
        for name, tensor in data_dict.items():
            out_data_dict[name] = data_dict[name].to(device)
        yield out_data_dict

## The Model: SurnameClassifier

In [25]:
class SurnameClassifier(nn.Module):
    """ A 2-layer Multilayer Perceptron for classifying surnames """
    def __init__(self, input_dim, hidden_dim, output_dim, dropout_rate=0.5):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(SurnameClassifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, dropout=False, apply_softmax=False):
        """The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate_vector = F.relu(self.fc1(x_in))
        if dropout:
            intermediate_vector = self.dropout(intermediate_vector)
        prediction_vector = self.fc2(intermediate_vector)

        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)

        return prediction_vector

## Training Routine

### Helper functions

In [26]:
def make_train_state(args):
    return {'stop_early': False,
            'early_stopping_step': 0,
            'early_stopping_best_val': 1e8,
            #'learning_rate': args.learning_rate,
            'epoch_index': 0,
            'train_loss': [],
            'train_acc': [],
            'val_loss': [],
            'val_acc': [],
            'test_loss': -1,
            'test_acc': -1,
            'model_filename': args.model_state_file}

def update_train_state(args, model, train_state):
    """Handle the training state updates.

    Components:
     - Early Stopping: Prevent overfitting.
     - Model Checkpoint: Model is saved if the model is better

    :param args: main arguments
    :param model: model to train
    :param train_state: a dictionary representing the training state values
    :returns:
        a new train_state
    """

    # Save one model at least
    if train_state['epoch_index'] == 0:
        torch.save(model.state_dict(), train_state['model_filename'])
        train_state['stop_early'] = False

    # Save model if performance improved
    elif train_state['epoch_index'] >= 1:
        loss_tm1, loss_t = train_state['val_loss'][-2:]

        # If loss worsened
        if loss_t >= train_state['early_stopping_best_val']:
            # Update step
            train_state['early_stopping_step'] += 1
        # Loss decreased
        else:
            # Save the best model
            if loss_t < train_state['early_stopping_best_val']:
                torch.save(model.state_dict(), train_state['model_filename'])

            # Reset early stopping step
            train_state['early_stopping_step'] = 0

        # Stop early ?
        train_state['stop_early'] = \
            train_state['early_stopping_step'] >= args.early_stopping_criteria

    return train_state

def compute_accuracy(y_pred, y_target):
    _, y_pred_indices = y_pred.max(dim=1)
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return (n_correct, len(y_pred_indices))

Some general utiliites to set a random seed (everywhere a random number generator is used) and to deal with directory paths.

In [27]:
def set_seed_everywhere(seed, cuda):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)

def handle_dirs(dirpath):
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)

### Settings and some prep work

In [28]:
args = Namespace(
    # Data and path information
    surname_csv='https://raw.githubusercontent.com/sgeinitz/DSML4220/main/data/surnames.csv',
    vectorizer_file="vectorizer.json",
    model_state_file="model.pth",
    save_dir="data/surname_mlp",
    early_stopping_criteria=5,
    seed=42,
    # Model hyper parameters, these will all now be taken care of with the wandb sweep
    # hidden_dim=300,
    # Training  hyper parameters
    # num_epochs=100,
    # learning_rate=0.001,
    # batch_size=64,
    # Runtime options
    cuda=True,
    reload_from_files=False,
    expand_filepaths_to_save_dir=True,
)

if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)

    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)

    print("Expanded filepaths: ")
    print("\t{}".format(args.vectorizer_file))
    print("\t{}".format(args.model_state_file))

# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False

args.device = torch.device("cuda" if args.cuda else "cpu")

print("Using CUDA: {}".format(args.cuda))


# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)

# handle dirs
handle_dirs(args.save_dir)

Expanded filepaths: 
	data/surname_mlp/vectorizer.json
	data/surname_mlp/model.pth
Using CUDA: False


### Initializations

In [29]:
if args.reload_from_files:
    # training from a checkpoint
    print("Reloading!")
    dataset = SurnameDataset.load_dataset_and_load_vectorizer(args.surname_csv,
                                                              args.vectorizer_file)
else:
    # create dataset and vectorizer
    print("Creating fresh!")
    dataset = SurnameDataset.load_dataset_and_make_vectorizer(args.surname_csv)
    dataset.save_vectorizer(args.vectorizer_file)

vectorizer = dataset.get_vectorizer()

Creating fresh!


In [30]:

def build_model(hidden_layer_size, dropout):
    classifier = SurnameClassifier(input_dim=len(vectorizer.surname_vocab),
                                   hidden_dim=hidden_layer_size,
                                   output_dim=len(vectorizer.nationality_vocab),
                                   dropout_rate=dropout)
    return classifier.to(device)


def build_optimizer(classifier, optimizer, learning_rate):
    if optimizer == "sgd":
        optimizer = optim.SGD(classifier.parameters(), lr=learning_rate, momentum=0.9)
    elif optimizer == "rmsprop":
        optimizer = optim.RMSprop(classifier.parameters(), lr=learning_rate)
    elif optimizer == "adam":
        optimizer = optim.Adam(classifier.parameters(), lr=learning_rate)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer, mode='min', factor=0.5, patience=1)
    return {'opt': optimizer, 'scheduler': scheduler}


def train_epoch(classifier, optimizer, scheduler, batch_size):
    cumu_loss = 0
    n_correct = 0
    n_total = 0

    # training batches
    dataset.set_split('train')
    batch_generator = generate_batches(dataset, batch_size, device=args.device)
    classifier.train()
    for batch_index, batch_dict in enumerate(batch_generator):

        optimizer.zero_grad()

        # forward step
        y_pred = classifier(batch_dict['x_surname'])
        loss = loss_func(y_pred, batch_dict['y_nationality'])
        cumu_loss += loss.item()

        # backward pass
        loss.backward()
        optimizer.step()

        wandb.log({"train batch loss": loss.item()})
        results = compute_accuracy(y_pred, batch_dict['y_nationality'])
        n_correct += results[0]
        n_total += results[1]
    wandb.log({"accuracy_train": n_correct / n_total})

    train_loss = cumu_loss / (len(dataset) / batch_size)
    cumu_loss = 0
    n_correct = 0
    n_total = 0

    # validation batches
    dataset.set_split('val')
    batch_generator = generate_batches(dataset, batch_size, device=args.device)
    classifier.eval()
    for batch_index, batch_dict in enumerate(batch_generator):

        y_pred = classifier(batch_dict['x_surname'])

        loss = loss_func(y_pred, batch_dict['y_nationality'])
        cumu_loss += loss.item()

        wandb.log({"val batch loss": loss.item()})
        results = compute_accuracy(y_pred, batch_dict['y_nationality'])
        n_correct += results[0]
        n_total += results[1]
    wandb.log({"accuracy_val": n_correct / n_total})


    val_loss = cumu_loss / (len(dataset) / batch_size)
    scheduler.step(val_loss)
    return (train_loss, val_loss)

### Training loop

Note that you need to fill in the `parameters_dict` with the different hyperparameter/configuration settings that you want to try.

In [31]:
loss_func = nn.CrossEntropyLoss(dataset.class_weights)

parameters_dict = {
    'optimizer': {
        'values': ['adam', 'rmsprop', 'sgd']
    },
    'hidden_layer_size': {
        'values': [50, 100, 200, 250]
    },
    'dropout': {
          'values': [0.3, 0.1, 0.5, 0.6]
    },
    'batch_size': {
          'values': [8, 15, 25, 50]
    },
    'learning_rate': {
          'values': [0.001, 0.01, 0.1, 0.0001]
    },

    # try adding another parameter for random seed to see how much it affects results (relative to other hyperparameters)

    'epochs': {    # notice that this does not vary (i.e. will not be swept), so it uses one fixed "value" (not "values")
          'value': 100
    },
}

sweep_config['parameters'] = parameters_dict

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):

        # set ramdom seend using config.seed

        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        train_state = make_train_state(args)

        #loader = build_dataset(config.batch_size)
        classifier = build_model(config.hidden_layer_size, config.dropout)
        optimizer = build_optimizer(classifier, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):

            # train epoch
            avg_loss = train_epoch(classifier, optimizer['opt'], optimizer['scheduler'], batch_size=config.batch_size)
            wandb.log({"train_loss": avg_loss[0], "val_loss": avg_loss[1], "epoch": epoch})


You will need to add your name or a unique identifier to the `project` parameter below.

In [34]:
sweep_id = wandb.sweep(sweep_config, entity="DSML4220", project="Lassner-mlp-Victoria")

wandb.agent(sweep_id, train, count=10)

Create sweep with ID: cypxacle
Sweep URL: https://wandb.ai/DSML4220/Lassner-mlp-Victoria/sweeps/cypxacle


[34m[1mwandb[0m: Agent Starting Run: k0nk9bkt with config:
[34m[1mwandb[0m: 	batch_size: 15
[34m[1mwandb[0m: 	dropout: 0.1
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 200
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


0,1
accuracy_train,▁▂▄▇████████████████████████████████████
accuracy_val,▁▃▃▆▇███████████████████████████████████
epoch,▁▁▁▂▂▂▂▂▂▃▃▃▄▄▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇████
train batch loss,█▆▆▄▃▃▂▂▃▁▂▂▁▁▂▂▁▂▂▁▂▂▁▁▃▄▃▂▂▁▂▂▂▂▁▁▂▂▁▂
train_loss,█▇▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,▂▄▁▄▆▇▂▅▁▅▁▃▂▂█▅▆▃▆▃▃▃▂▁▂▄▅▅▆▃▁▃▂▂▃▄▄▄▂▃
val_loss,▂▁▃▃▇▇▇▇▆▆▇█▆▆▇▆▇▇▇▆▇▇▆▆▆▇▇▆▇▇▇▇▆▇▆▆▆▇▇▆

0,1
accuracy_train,0.80456
accuracy_val,0.59205
epoch,99.0
train batch loss,0.22701
train_loss,0.3811
val batch loss,3.01688
val_loss,2.33974


[34m[1mwandb[0m: Agent Starting Run: 1f4xgrf3 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 250
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: sgd


0,1
accuracy_train,▁▁▁▁▁▄▄▅▆▇██████████████████████████████
accuracy_val,▁▂▂▁▅▇▇█████████████████████████████████
epoch,▁▁▁▁▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇███
train batch loss,▇▆▇█▂▂▄▂▄▄▃▅▁▅▄▂▃▅▅▃▆▅▅▁▃▄▆▃▃▂▅▄▅▃█▂▃▂▂▂
train_loss,███▆▅▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,▂▂▃▃▂▂▂▂▂▃▂▂▂▂▂▂▃▁▅▄▂▃▃▅▂▂▁▄▂▂█▂▂▃▂▃▄▂▄▅
val_loss,█▆▇▇▂▄▃▁▂▁▁▂▁▂▁▁▂▁▂▂▂▂▂▂▁▁▂▁▁▁▁▂▁▁▂▁▁▁▁▂

0,1
accuracy_train,0.58424
accuracy_val,0.50244
epoch,99.0
train batch loss,1.10244
train_loss,1.16754
val batch loss,0.6667
val_loss,2.20591


[34m[1mwandb[0m: Agent Starting Run: 0918oc12 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 200
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: rmsprop


0,1
accuracy_train,▁▂▃▆▇███████████████████████████████████
accuracy_val,▁▇▆▇████████████████████████████████████
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▄▅▅▅▅▅▅▆▆▇▇▇▇▇█████
train batch loss,█▅▆▆▅▃▁▂▂▄▃▅▄▃▁▅▃▄▁▄▁▂▃▄▂▃▂▂▃▂▃▂▄▂▂▆▃▂▅▅
train_loss,█▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,▃▃▃▃▃▃▅▁▂▃▃▃▂▂▁▃▂█▃▁▃▂▂▂▂▁▄▁▂▆▃▁▂▃▁▂▁▃▁▅
val_loss,▂▁▄▄▅▆▇▆▇▇▇▇▇█▆▆▆▇▇▇▆█▇▆▇▇▇▇▆▆▆▇▇▆▇▇█▇▇▇

0,1
accuracy_train,0.79961
accuracy_val,0.59939
epoch,99.0
train batch loss,0.09197
train_loss,0.49047
val batch loss,8.37583
val_loss,2.67762


[34m[1mwandb[0m: Agent Starting Run: naffhdhp with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 250
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	optimizer: adam


0,1
accuracy_train,▁▅▅▆▅▆▆▆▇▇▇█████████████████████████████
accuracy_val,▁▇▇▆▇█▆▇████████████████████████████████
epoch,▁▁▁▁▁▂▂▂▂▃▃▃▄▄▄▄▄▄▄▅▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████
train batch loss,▆▃▅▃▅▆▄▆▆▂▁▂▃█▁▄▅▃▄▄▃▁▁▃▅▄▅▇▅▁▂▃▅▅▁▄▄▅▃▅
train_loss,█▇▆▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,██▆▅▅▃▄▅▄▄▂▇▅▄▃▃▂▅▆▃▅▄▃█▄▁▇▇▆▄▅▇▅▂▃▃▂▅▇▆
val_loss,█▇▆▅▄▂▂▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
accuracy_train,0.51693
accuracy_val,0.49085
epoch,99.0
train batch loss,2.11434
train_loss,1.53674
val batch loss,1.24227
val_loss,1.79311


[34m[1mwandb[0m: Agent Starting Run: y8wzmtns with config:
[34m[1mwandb[0m: 	batch_size: 15
[34m[1mwandb[0m: 	dropout: 0.6
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 100
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: sgd


0,1
accuracy_train,▁▂▃▄▅▇▇█████████████████████████████████
accuracy_val,▁▄▇█████████████████████████████████████
epoch,▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇████
train batch loss,▄▇▅█▃▃▄▄▃▃▂▃▃▁▁▂▁▂▅▂▂▂▄▄▃▂▂▃▄▂▁▃▃▁▂▂▃▃▃▃
train_loss,█▆▅▄▄▃▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,▅▃▁▆▅▄▄▂▅▃▂▃▂▄▂▃▃█▃▃▃▄▄▆▁▄▂█▂▂▃▅▄▂▅▂▃▂▂▄
val_loss,█▄▄▂▃▂▂▂▂▁▁▂▁▂▂▂▁▁▁▂▂▂▁▁▂▂▂▂▂▂▂▁▂▁▂▁▁▁▁▂

0,1
accuracy_train,0.61432
accuracy_val,0.51804
epoch,99.0
train batch loss,0.86729
train_loss,0.98945
val batch loss,1.93524
val_loss,1.79224


[34m[1mwandb[0m: Agent Starting Run: 5euz9u59 with config:
[34m[1mwandb[0m: 	batch_size: 15
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 250
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	optimizer: adam


0,1
accuracy_train,▁▄▄▂▁▁▄▄▄▅▆▆▇▇▇█▇██▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
accuracy_val,▁▆██▇▇▇▇▇▇▇▇█▇▇▇▇▇▇▇█▇██▇▇█▇███▇██▇█▇███
epoch,▁▁▁▁▁▂▂▃▃▃▃▃▄▄▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇██
train batch loss,█▇█▆▇▃▄▆▃▂▂▄▅▃▃▃▃▆▄▁▂▄▄▆▂▃▄▄▃▃▃▂▅▄▃▃▂▅▃▂
train_loss,█▆▆▅▄▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,▇▅▆▅▅▄▄▄▅▄▂▃▂▄▆▅▄▄█▂▃▄▃▃▁▃▁▄▄▃▄▂▅▂▄▂▃▂▃▂
val_loss,█▇▄▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
accuracy_train,0.47813
accuracy_val,0.45076
epoch,99.0
train batch loss,1.19336
train_loss,1.58228
val batch loss,1.51556
val_loss,1.8443


[34m[1mwandb[0m: Agent Starting Run: r1e6sgqh with config:
[34m[1mwandb[0m: 	batch_size: 50
[34m[1mwandb[0m: 	dropout: 0.1
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 50
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: sgd


0,1
accuracy_train,▁▂▃▄▅▇██████████████████████████████████
accuracy_val,▆▁▄▆▇▇▇█▇███████████████████████████████
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▆▇▇▇▇▇███
train batch loss,█▅█▃▂▃▂▃▄▃▂▄▃▂▂▂▂▂▂▂▂▂▃▁▂▃▂▄▃▂▂▂▂▂▁▂▁▃▂▄
train_loss,█▆▅▅▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,▄▄▃▃▅▂▄▃▇▅▃▃▆▄▅▇▁▄▄▅▂▄▂▃▄▅▄▄▂▂▅▁▂▂█▂▅▃▆▁
val_loss,█▄▂▁▂▂▃▃▃▃▃▃▂▃▄▄▂▂▃▃▃▃▃▄▂▄▂▂▄▃▃▃▃▄▁▃▃▃▃▃

0,1
accuracy_train,0.56954
accuracy_val,0.49125
epoch,99.0
train batch loss,0.89907
train_loss,0.97795
val batch loss,1.32619
val_loss,1.96711


[34m[1mwandb[0m: Agent Starting Run: 91e8nron with config:
[34m[1mwandb[0m: 	batch_size: 50
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 200
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: sgd


0,1
accuracy_train,▁▃▄▆▆▆▆▆▇▇▇▇▇▇██████████████████████████
accuracy_val,▁▄▄▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇████████████████████
epoch,▁▁▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇█
train batch loss,█▇▇▆▆▅▆▅▅▅▄▆▄▅▆▂▄▃▃▂▄▁▄▄▅▂▄▁▂▃▂▆▃▃▅▁▃▃▃▃
train_loss,███▇▇▆▆▆▅▅▃▃▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,██▇▆▆▅▇▆▆▆▇▇▄▄▄▅▄▆▁▄▁▄▄▄▄▁▂▃▃▄▂▄▃▄▂▃▅▅▄▂
val_loss,██▇▇▇▆▆▆▅▅▄▄▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
accuracy_train,0.38196
accuracy_val,0.34563
epoch,99.0
train batch loss,1.42118
train_loss,1.84414
val batch loss,2.09274
val_loss,1.9712


[34m[1mwandb[0m: Agent Starting Run: p8q58p9h with config:
[34m[1mwandb[0m: 	batch_size: 25
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 100
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: rmsprop


0,1
accuracy_train,▁▂▄▇████████████████████████████████████
accuracy_val,▁▃▂▃█▅▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆
epoch,▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▆▇▇▇▇▇██
train batch loss,█▃▃▂▂▂▂▂▁▃▂▂▂▂▃▂▄▂▂▂▂▂▂▂▂▂▂▃▂▃▂▁▃▂▂▂▃▂▁▁
train_loss,█▃▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,▂▁▁▁▁▁▂▅▁▄▂▂▂▁▂▆█▁▂▁▂▅▁▂▁▄▂▁▂▂▁▁▂▁█▂▄▁▂▁
val_loss,▁▆▁▂▂▅▄▄▅▆▇▆▄█▅▆▅▆▆▇▅▅▆▆▆▆▅▆▇▆▆▆▅▇▆▇▇▆▆▇

0,1
accuracy_train,0.33746
accuracy_val,0.30646
epoch,99.0
train batch loss,2.93067
train_loss,2.06411
val batch loss,2.68417
val_loss,3.37304


[34m[1mwandb[0m: Agent Starting Run: p4v8k731 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	dropout: 0.6
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	hidden_layer_size: 200
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: rmsprop


0,1
accuracy_train,▁▂▃▅▆█▅▆▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
accuracy_val,▂▁▁▆▃█▄█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇███
train batch loss,▂▂▂▂▃▂▂▁█▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▁▂▂▂▁▂▂▂▂▂▂▂▂▂
train_loss,█▃▃▃▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val batch loss,▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▂▁▁▁▁
val_loss,▂▁▁▃▁▃▃▅▅▅▆▆▆▆▆▆▇▅▆▆▇▇▇█▅▆▅▇▆▆▆▅▇▇▆█▅▇▇▆

0,1
accuracy_train,0.26068
accuracy_val,0.25
epoch,99.0
train batch loss,2.30547
train_loss,2.4638
val batch loss,2.2586
val_loss,3.68087
