<a href="https://colab.research.google.com/github/talitmr/text_classification_with_RNN/blob/main/text_classification_BiLSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

BiLSTM

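The first step is to create the text field and the label field. I use spaCy as the tokenizer and set the tokenizer language to English (en_core_web_sm). Setting include_lengths = True makes each batch also return the sequence lengths, which the model needs later for packing.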

In [None]:
import torch
from torchtext.legacy import data

SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

TEXT = data.Field(tokenize = 'spacy',
                  tokenizer_language = 'en_core_web_sm',
                  include_lengths = True)

LABEL = data.LabelField(dtype = torch.float)

After creating TEXT and LABEL, I load the IMDB dataset, which comes already split into train and test sets. I then call split on the test data to divide it into validation and test sets. This gives three datasets: train, validation and test.

In [None]:
from torchtext.legacy import datasets

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

downloading aclImdb_v1.tar.gz


In [None]:
import random

# random.seed() returns None, so seed first and then pass the RNG state itself
random.seed(SEED)
valid_data, test_data = test_data.split(random_state = random.getstate())
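The idea of a seeded, repeatable split can be sketched in plain Python. This is only an illustration under my own assumptions, not torchtext's implementation: `split_examples` and its `ratio` argument are hypothetical names. The sketch also shows that `random.seed()` returns `None`, so the generator state has to be captured separately with `random.getstate()`.

```python
import random

def split_examples(examples, ratio, state):
    """Shuffle a copy of `examples` under a fixed RNG state, then cut it in two.

    Hypothetical helper for illustration only (not a torchtext function).
    """
    rng = random.Random()
    rng.setstate(state)          # restore the captured state so the shuffle repeats
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(round(ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

SEED = 1234
# random.seed() seeds the global generator in place and returns None.
seed_result = random.seed(SEED)
state = random.getstate()        # this tuple is what identifies the RNG state

first_a, second_a = split_examples(range(10), 0.7, state)
first_b, second_b = split_examples(range(10), 0.7, state)
```

Calling the helper twice with the same state yields the same two parts, which is the property a reproducible validation/test split relies on.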

I use GloVe pretrained word embeddings and limit the vocabulary to the 20,000 most frequent words. I chose GloVe because starting from pretrained vectors instead of random ones should make the model's results more accurate. Since you asked for a hidden dimension of 100, I also chose the 100-dimensional GloVe vectors (glove.6B.100d).


In [None]:
MAX_VOCAB_SIZE = 20_000

TEXT.build_vocab(train_data, 
                 max_size = MAX_VOCAB_SIZE, 
                 vectors = "glove.6B.100d", 
                 unk_init = torch.Tensor.normal_)

LABEL.build_vocab(train_data)


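I used BucketIterator to create the train/validation/test iterators with a batch size of 64. BucketIterator batches examples of similar length together, which keeps padding low. In addition, sort_within_batch = True sorts the examples in each batch by sequence length, which the packed sequences used in the model require.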

In [None]:
BATCH_SIZE = 64

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size = BATCH_SIZE,
    sort_within_batch = True,
    device = device)
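To see why grouping by length reduces padding, here is a rough plain-Python sketch. It is a simplification under my own assumptions: it sorts all lengths globally instead of reproducing BucketIterator's actual bucketing, `padding_tokens` is a hypothetical helper, and the lengths are randomly generated rather than taken from IMDB.

```python
import random

def padding_tokens(lengths, batch_size):
    """Count pad tokens needed when consecutive groups of `batch_size` form batches."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)                      # every sequence is padded to this
        total += sum(longest - n for n in batch)
    return total

rng = random.Random(0)
# Hypothetical review lengths in tokens (not real IMDB statistics).
lengths = [rng.randint(20, 500) for _ in range(640)]

unsorted_padding = padding_tokens(lengths, batch_size=64)
bucketed_padding = padding_tokens(sorted(lengths), batch_size=64)
```

With similar lengths in the same batch, each batch is padded only up to a nearby maximum, so `bucketed_padding` comes out smaller than `unsorted_padding`.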


Now it is time to build the BiLSTM model. The model uses an embedding layer with a padding index, an LSTM layer with bidirectional set to True, and dropout. Because the network is bidirectional, the final hidden state has a forward part and a backward part. In forward I concatenate these two vectors, so the input size of the linear layer is the hidden dimension multiplied by 2.

I pack the embeddings with nn.utils.rnn.pack_padded_sequence so that the LSTM only processes the non-padded elements of each sequence. I then unpack the output sequence with nn.utils.rnn.pad_packed_sequence, which transforms it from a packed sequence back into a tensor.
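The effect of packing can be pictured without PyTorch. The sketch below is my own illustration, not the library code: `packed_batch_sizes` is a hypothetical helper that computes, for each time step, how many sequences are still active, which is the information a packed sequence carries so the LSTM can skip padded positions.

```python
def packed_batch_sizes(lengths):
    """Return, for each time step, how many sequences are still active.

    `lengths` must be sorted in decreasing order, the layout that
    pack_padded_sequence expects by default.
    """
    assert lengths == sorted(lengths, reverse=True)
    return [sum(1 for n in lengths if n > t) for t in range(lengths[0])]

lengths = [5, 3, 2]                          # three sequences in one batch
batch_sizes = packed_batch_sizes(lengths)

padded_steps = max(lengths) * len(lengths)   # steps if padding were processed too
packed_steps = sum(batch_sizes)              # steps actually processed when packed
```

For this toy batch the packed form covers 10 token positions instead of 15, because the padded positions of the two shorter sequences are never visited.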


In [None]:
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx):
        
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        self.rnn = nn.LSTM(embedding_dim, 
                           hidden_dim, 
                           num_layers=n_layers, 
                           bidirectional=bidirectional, 
                           # dropout between LSTM layers only applies with more than one layer
                           dropout=dropout if n_layers > 1 else 0)
        
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.dropout = nn.Dropout(dropout)
        
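    def forward(self, text, text_lengths):
        
        # text: [seq_len, batch_size]
        embedded = self.dropout(self.embedding(text))
        # pack so the LSTM skips the padding; the lengths must be on the CPU
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.to('cpu'))
        packed_output, (hidden, cell) = self.rnn(packed_embedded)
        # unpack back to a padded tensor (not used by the classifier below)
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)
        # concatenate the last layer's final forward and backward hidden states
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
                
        return self.fc(hidden)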
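The indexing hidden[-2] and hidden[-1] relies on how nn.LSTM orders the rows of the final hidden state: layer by layer, with the forward direction before the backward direction within each layer. A small plain-Python sketch of that row layout, where `hidden_index` is a hypothetical helper of mine and not a PyTorch function:

```python
def hidden_index(layer, direction, num_directions=2):
    """Row of h_n holding the final state of `layer` (0-based) and `direction`
    (0 = forward, 1 = backward), following the layout documented for nn.LSTM."""
    return layer * num_directions + direction

n_layers = 2
rows = n_layers * 2                      # first dimension of h_n for a BiLSTM

last_forward = hidden_index(n_layers - 1, 0)
last_backward = hidden_index(n_layers - 1, 1)

# Negative indexing picks the same two rows regardless of n_layers.
same_forward = (rows - 2 == last_forward)
same_backward = (rows - 1 == last_backward)

hidden_dim = 100
fc_in_features = 2 * hidden_dim          # forward and backward states concatenated
```

With two layers the last layer's states sit in rows 2 and 3, which are exactly rows -2 and -1, and the concatenated vector has 200 features.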

Below I set the model's hyperparameters and create the model.

In [None]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 100
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

model = LSTM(INPUT_DIM, 
            EMBEDDING_DIM, 
            HIDDEN_DIM, 
            OUTPUT_DIM, 
            N_LAYERS, 
            BIDIRECTIONAL, 
            DROPOUT, 
            PAD_IDX)



I take the pretrained embeddings from the field's vocabulary and copy them into the embedding layer's weight.data to initialize the weights.

In [None]:
pretrained_embeddings = TEXT.vocab.vectors

print(pretrained_embeddings.shape)

torch.Size([20002, 100])
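The 20,002 rows are the 20,000 most frequent words plus the two special tokens <unk> and <pad>. A toy vocabulary builder in plain Python shows the same counting. It is a sketch under my own assumptions: `build_vocab` and `lookup` are hypothetical helpers, not torchtext's implementation.

```python
from collections import Counter

def build_vocab(tokens, max_size, specials=("<unk>", "<pad>")):
    """Keep the `max_size` most frequent tokens and prepend the special tokens.

    Hypothetical helper for illustration; not the torchtext implementation.
    """
    counts = Counter(tokens)
    itos = list(specials) + [tok for tok, _ in counts.most_common(max_size)]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

tokens = "the film was great the plot was thin the end".split()
itos, stoi = build_vocab(tokens, max_size=3)

def lookup(token):
    # Words outside the vocabulary map to the <unk> index.
    return stoi.get(token, stoi["<unk>"])
```

Here max_size = 3 gives a vocabulary of 5 entries, in the same way that max_size = 20,000 gives 20,002.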


In [None]:
model.embedding.weight.data.copy_(pretrained_embeddings)

tensor([[-0.1117, -0.4966,  0.1631,  ...,  1.2647, -0.2753, -0.1325],
        [-0.8555, -0.7208,  1.3755,  ...,  0.0825, -1.1314,  0.3997],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [ 1.1516,  0.3090,  1.1633,  ..., -0.4184,  0.4529,  0.9140],
        [ 0.6388, -0.7393, -1.0334,  ...,  1.0464,  1.0508,  0.9579],
        [-0.7700,  0.4425,  0.4478,  ...,  0.1349,  1.5542, -0.7513]])

The <unk> and <pad> tokens are not in the pretrained vocabulary, so they were initialized with unk_init (a normal distribution) when the vocabulary was built. Here I set both of their embedding rows to zero explicitly.


In [None]:
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

print(model.embedding.weight.data)

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [ 1.1516,  0.3090,  1.1633,  ..., -0.4184,  0.4529,  0.9140],
        [ 0.6388, -0.7393, -1.0334,  ...,  1.0464,  1.0508,  0.9579],
        [-0.7700,  0.4425,  0.4478,  ...,  0.1349,  1.5542, -0.7513]])


To train the model, I use Adam as the optimizer and binary cross-entropy with logits (BCEWithLogitsLoss) as the loss function. This loss applies the sigmoid internally, so the model outputs raw logits.

In [None]:
import torch.optim as optim

optimizer = optim.Adam(model.parameters())

criterion = nn.BCEWithLogitsLoss()

model = model.to(device)
criterion = criterion.to(device)

In [None]:
def binary_accuracy(preds, y):

    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() 
    acc = correct.sum() / len(correct)
    return acc
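For a single example, the loss and the accuracy can be written out in plain Python. This is a sketch of the underlying formulas as I understand them, not the PyTorch source: the stable form max(x, 0) - x*y + log(1 + exp(-|x|)) is one standard way to compute binary cross-entropy directly on a logit, and the logits and targets below are made up.

```python
import math

def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit, in a numerically stable form."""
    # max(x, 0) - x*y + log(1 + exp(-|x|)) avoids overflow for large |x|.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_accuracy(logits, targets):
    """Fraction of examples whose thresholded sigmoid output matches the target."""
    preds = [1.0 if sigmoid(x) > 0.5 else 0.0 for x in logits]
    return sum(p == y for p, y in zip(preds, targets)) / len(targets)

logits = [2.0, -1.0, 0.5, -3.0]      # made-up model outputs
targets = [1.0, 0.0, 0.0, 0.0]       # made-up labels

mean_loss = sum(bce_with_logits(x, y) for x, y in zip(logits, targets)) / len(logits)
accuracy = binary_accuracy(logits, targets)
```

Three of the four made-up predictions land on the right side of 0.5, so the accuracy is 0.75.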

The training function is written below. For each batch, I call zero_grad to reset the gradients, get the predictions, and compute the loss with BCE and the accuracy with the function defined above. I then backpropagate with loss.backward(), which computes the gradients of the loss with respect to all parameters, and update the parameters with optimizer.step().

I also defined the evaluate function, which evaluates the model on an iterator (validation or test) without computing gradients.


In [None]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        text, text_lengths = batch.text
        
        predictions = model(text, text_lengths).squeeze(1)
        
        loss = criterion(predictions, batch.label)
        
        acc = binary_accuracy(predictions, batch.label)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [None]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            text, text_lengths = batch.text
            
            predictions = model(text, text_lengths).squeeze(1)
            
            loss = criterion(predictions, batch.label)
            
            acc = binary_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
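Both functions return the average of the per-batch values. That equals the average over all examples only when every batch has the same size, and a smaller final batch shifts the result slightly. A small sketch with made-up numbers, where both helper names are hypothetical:

```python
def mean_of_batch_means(batch_accs):
    """What `epoch_acc / len(iterator)` computes: every batch counts equally."""
    return sum(batch_accs) / len(batch_accs)

def example_weighted_mean(batch_accs, batch_sizes):
    """Accuracy over all examples: each batch is weighted by its size."""
    correct = sum(a * n for a, n in zip(batch_accs, batch_sizes))
    return correct / sum(batch_sizes)

# Made-up numbers: two full batches of 64 and a final batch of 8.
batch_accs = [0.75, 0.75, 1.0]
batch_sizes = [64, 64, 8]

per_batch = mean_of_batch_means(batch_accs)
per_example = example_weighted_mean(batch_accs, batch_sizes)
```

In this toy case the small last batch pulls the per-batch average above the per-example one. With hundreds of batches per epoch the difference is usually small.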

In [None]:
import time
# in order to see the time in each epoch. 
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

#TRAINING THE MODEL with 1 epoch



In [None]:
N_EPOCHS = 1

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Epoch: 01 | Epoch Time: 0m 30s
	Train Loss: 0.626 | Train Acc: 64.16%
	 Val. Loss: 0.531 |  Val. Acc: 74.02%


In [None]:
test_loss, test_acc = evaluate(model, test_iterator, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

Test Loss: 0.533 | Test Acc: 73.44%


Weights and Biases

----------------------------------------------

For the experiment below, I followed the Weights and Biases documentation.

In [None]:
%%capture
!pip install wandb --upgrade

In [None]:
import random

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from tqdm.notebook import tqdm
import time
from torchtext.legacy import data
from torchtext.legacy import datasets
import spacy

import wandb
# Ensure deterministic behavior
torch.backends.cudnn.deterministic = True
# hash() of a string changes between Python processes, so use a fixed integer seed
SEED = 1234
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Device configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
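If a seed should be derived from a descriptive string, note that Python salts the built-in hash() of a str per process unless PYTHONHASHSEED is set, so its value is not stable between runs. One possible alternative is a digest-based seed. In the sketch below `seed_from_text` is a hypothetical helper and the choice of SHA-256 is my own assumption:

```python
import hashlib
import random

def seed_from_text(text):
    """Derive a 32-bit seed from a string via SHA-256 (stable across processes)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")   # always in [0, 2**32 - 1]

seed = seed_from_text("setting random seeds")

random.seed(seed)
first_draw = random.random()
random.seed(seed)
second_draw = random.random()
```

The same string always maps to the same seed, so the two draws are identical and the run can be repeated later.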

First, I log in to Weights and Biases.

In [None]:
wandb.login()

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

And then I set the parameters that I want to use.

In [None]:
sweep_config = {
    'method': 'random',
    'parameters': {
        'epochs': {
            'values': [2,5,8]
        },
        'batch_size': {
            'values': [64,128,256]
        },
        'dropout': {
            'values': [0.25, 0.5, 0.75]
        },
        'learning_rate': {
            'values': [0.1,0.01, 0.005]
        },
        'n_hidden': {
            'values': [75,100,150]
        },
        'n_layers': {
            'values': [1,2,3]
        },
    }
}
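With 'method': 'random', each run of the sweep gets one value per hyperparameter. A rough plain-Python picture of that sampling follows. It is my own sketch and not the Weights and Biases sampler: `sample_config` is a hypothetical helper, and independent uniform choice is an assumption.

```python
import random

# Same value lists as the sweep configuration.
search_space = {
    'epochs': [2, 5, 8],
    'batch_size': [64, 128, 256],
    'dropout': [0.25, 0.5, 0.75],
    'learning_rate': [0.1, 0.01, 0.005],
    'n_hidden': [75, 100, 150],
    'n_layers': [1, 2, 3],
}

def sample_config(space, rng):
    """Pick one value per hyperparameter, independently and uniformly."""
    return {name: rng.choice(values) for name, values in space.items()}

grid_size = 1
for values in search_space.values():
    grid_size *= len(values)          # combinations a full grid search would need

rng = random.Random(0)
trial = sample_config(search_space, rng)
```

A full grid over these lists would need 729 runs, which is why a random search that tries only a handful of configurations is a practical choice here.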

I initialized the project below.

In [None]:
wandb.init(project = 'ass3-bonus')

[34m[1mwandb[0m: Currently logged in as: [33mtalyatmr[0m (use `wandb login --relogin` to force relogin)


I registered the sweep, with my hyperparameter values, in the project.

In [None]:
sweep_id = wandb.sweep(sweep_config, project="ass3-bonus")



Create sweep with ID: 57t5d4og
Sweep URL: https://wandb.ai/talyatmr/ass3-bonus/sweeps/57t5d4og


In [None]:
wandb.watch(model)

[<wandb.wandb_torch.TorchGraph at 0x7fbe9a2a9390>]

In the train function below, I adapted my work from the previous part to the wandb structure and added all the steps needed to run the models successfully. I followed these steps:

- adding the default hyperparameters
- choosing the device: if a GPU is available, it is used
- picking the hyperparameters via the config
- passing the hyperparameters to the model
- defining the optimizer and the loss function
- setting up the embeddings
- training the model and getting the train/validation loss and accuracy
- evaluating the model on the test data and getting the test loss and accuracy
- logging the outputs (train/validation/test accuracy and loss values) to wandb
- finally, running the sweep in wandb using the agent


In [None]:
def train():
    config_defaults = {
        'epochs': 5,
        'batch_size': 64,
        'dropout': 0.5,
        'learning_rate': 0.001,  # Adam's default learning rate
        'n_hidden': 100,
        'n_layers': 2,
    }

    

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    wandb.init(config=config_defaults)
    config = wandb.config

    INPUT_DIM = len(TEXT.vocab)
    HIDDEN_DIM = config.n_hidden
    EMBEDDING_DIM = 100
    OUTPUT_DIM = 1
    N_LAYERS = config.n_layers
    BIDIRECTIONAL = True
    DROPOUT = config.dropout
    PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

    model = LSTM(INPUT_DIM, 
                EMBEDDING_DIM, 
                HIDDEN_DIM, 
                OUTPUT_DIM, 
                N_LAYERS, 
                BIDIRECTIONAL, 
                DROPOUT, 
                PAD_IDX)
    
    criterion = nn.BCEWithLogitsLoss()

    model = model.to(device)
    criterion = criterion.to(device)

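    # apply the learning rate chosen by the sweep
    optimizer = optim.Adam(model.parameters(), lr = config.learning_rate)

    # rebuild the iterators so that the batch size chosen by the sweep is applied as well
    train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
        (train_data, valid_data, test_data), 
        batch_size = config.batch_size,
        sort_within_batch = True,
        device = device)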
    
    model.embedding.weight.data.copy_(TEXT.vocab.vectors)
    UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

    model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
    model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

    for epoch in range(config.epochs):
      epoch_loss = 0
      epoch_acc = 0
    
      model.train()
    
      for batch in train_iterator:
        
          optimizer.zero_grad()
        
          text, text_lengths = batch.text
        
          predictions = model(text, text_lengths).squeeze(1)
        
          loss = criterion(predictions, batch.label)
        
          acc = binary_accuracy(predictions, batch.label)
        
          loss.backward()
        
          optimizer.step()
        
          epoch_loss += loss.item()
          epoch_acc += acc.item()
        
      # log once per epoch, after all training batches have been accumulated
      wandb.log({"train_loss": epoch_loss / len(train_iterator),
                 "train_acc": epoch_acc / len(train_iterator)})

      epoch_loss = 0
      epoch_acc = 0
    
      model.eval()
    
      with torch.no_grad():
    
          for batch in valid_iterator:

              text, text_lengths = batch.text
            
              predictions = model(text, text_lengths).squeeze(1)
            
              loss = criterion(predictions, batch.label)
            
              acc = binary_accuracy(predictions, batch.label)

              epoch_loss += loss.item()
              epoch_acc += acc.item()       
              
      # log once per epoch, after all validation batches have been accumulated
      wandb.log({"valid_loss": epoch_loss / len(valid_iterator),
                 "valid_acc": epoch_acc / len(valid_iterator)})

      epoch_loss = 0
      epoch_acc = 0
    
      model.eval()
    
      with torch.no_grad():
    
          for batch in test_iterator:

              text, text_lengths = batch.text
            
              predictions = model(text, text_lengths).squeeze(1)
            
              loss = criterion(predictions, batch.label)
            
              acc = binary_accuracy(predictions, batch.label)

              epoch_loss += loss.item()
              epoch_acc += acc.item()       
              
      # log once per epoch, after all test batches have been accumulated
      wandb.log({"test_loss": epoch_loss / len(test_iterator),
                 "test_acc": epoch_acc / len(test_iterator)})
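A running sum divided by the number of batches only equals the epoch average once the last batch has been added. One common pattern is therefore to accumulate inside the batch loop and log a single averaged value per epoch. The sketch below shows that structure with made-up losses. The `log` function is a stand-in for wandb.log so that it runs without an account, and `run_epochs` is a hypothetical helper.

```python
logged = []

def log(metrics):
    """Stand-in for wandb.log so the sketch runs without a W&B account."""
    logged.append(dict(metrics))

def run_epochs(batch_losses_per_epoch):
    """Log one averaged training loss per epoch instead of one value per batch."""
    for epoch, batch_losses in enumerate(batch_losses_per_epoch, start=1):
        epoch_loss = 0.0
        for loss in batch_losses:
            epoch_loss += loss            # accumulate inside the batch loop
        # Log after the loop, once the sum covers the whole epoch.
        log({"epoch": epoch, "train_loss": epoch_loss / len(batch_losses)})

# Made-up per-batch losses for two epochs.
run_epochs([[0.5, 0.75, 1.0], [0.25, 0.5, 0.75]])
```

Each epoch then contributes exactly one point to the chart, and the curve shows the change from epoch to epoch rather than the partial sums within an epoch.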




In [None]:
wandb.agent(sweep_id, train)

[34m[1mwandb[0m: Agent Starting Run: negb3exd with config:
[34m[1mwandb[0m: 	batch_size: 256
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	n_hidden: 75
[34m[1mwandb[0m: 	n_layers: 2


Run summary:
train_loss,0.28594
_runtime,166.0
_timestamp,1623624950.0
_step,7829.0
train_acc,0.88565
valid_loss,0.3651
valid_acc,0.86509
test_loss,0.36966
test_acc,0.86295


Run history:
train_loss,▁▂▃▄▅▆▇█▁▂▃▄▄▅▆▇▁▂▂▃▄▄▅▆▁▂▂▃▃▄▄▅▁▂▂▃▃▃▄▄
_runtime,▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇█████
_timestamp,▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇█████
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▂▂▃▃▄▅▅▁▂▃▄▄▅▆▇▁▂▃▄▅▆▆▇▁▂▃▄▅▆▇█▂▂▃▄▅▆▇█
valid_loss,▁▂▃▄▄▅▆█▁▂▃▃▄▅▆▆▁▂▂▃▄▅▅▆▁▂▃▄▅▆▇█▁▂▃▃▄▄▅▆
valid_acc,▁▂▃▄▄▅▆▇▁▂▃▄▅▆▇▇▁▂▃▄▅▆▇█▁▂▃▄▅▆▆▇▁▂▃▄▅▆▇█
test_loss,▁▂▃▃▄▅▆▇▁▂▂▃▄▅▅▆▁▂▂▃▃▄▅▆▁▂▃▄▅▆▇█▁▂▂▃▄▄▅▆
test_acc,▁▂▃▄▅▆▆▇▁▂▃▄▅▆▆▇▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█


[34m[1mwandb[0m: Agent Starting Run: 9w4a18fh with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	dropout: 0.25
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.005
[34m[1mwandb[0m: 	n_hidden: 100
[34m[1mwandb[0m: 	n_layers: 1


  "num_layers={}".format(dropout, num_layers))


Run summary:
train_loss,0.45429
_runtime,38.0
_timestamp,1623624992.0
_step,3131.0
train_acc,0.79713
valid_loss,0.40437
valid_acc,0.8244
test_loss,0.41044
test_acc,0.81797


Run history:
train_loss,▁▁▂▂▃▃▃▄▄▅▅▅▆▆▆▇▇▇██▁▁▂▂▂▂▃▃▃▄▄▄▄▅▅▅▅▆▆▆
_runtime,▁▁▁▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇███
_timestamp,▁▁▁▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇███
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▁▁▂▂▂▃▃▃▄▄▄▄▅▅▅▆▆▆▇▁▁▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇██
valid_loss,▁▁▂▂▂▃▃▃▄▄▅▅▅▆▆▆▇▇██▁▁▁▂▂▂▃▃▃▃▄▄▄▄▅▅▅▆▆▆
valid_acc,▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▁▁▂▂▃▃▃▄▄▄▅▅▅▆▆▇▇▇██
test_loss,▁▁▂▂▂▃▃▃▄▄▅▅▅▆▆▆▇▇██▁▁▂▂▂▂▃▃▃▃▄▄▄▅▅▅▅▆▆▆
test_acc,▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▁▂▂▂▃▃▃▄▄▄▅▅▆▆▆▇▇▇██


[34m[1mwandb[0m: Agent Starting Run: azr19mi8 with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	n_hidden: 100
[34m[1mwandb[0m: 	n_layers: 1


  "num_layers={}".format(dropout, num_layers))


Run summary:
train_loss,0.55747
_runtime,38.0
_timestamp,1623625035.0
_step,3131.0
train_acc,0.71401
valid_loss,0.48956
valid_acc,0.77803
test_loss,0.49713
test_acc,0.76982


Run history:
train_loss,▁▁▂▂▃▃▃▄▄▅▅▅▆▆▆▇▇▇██▁▁▂▂▂▃▃▃▄▄▄▅▅▆▆▆▆▇▇▇
_runtime,▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▄▅▅▅▅▅▆▆▆▇▇▇▇▇▇▇▇████
_timestamp,▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▄▅▅▅▅▅▆▆▆▇▇▇▇▇▇▇▇████
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▁▁▂▂▃▃▃▃▄▄▄▅▅▆▆▆▇▇██
valid_loss,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▆▇▇██▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇█
valid_acc,▁▁▂▂▂▃▃▄▄▄▅▅▅▅▆▆▆▇▇▇▁▁▂▂▃▃▃▄▄▄▅▅▅▆▆▇▇▇██
test_loss,▁▁▂▂▂▃▃▃▄▄▅▅▅▆▆▇▇▇██▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▇▇▇█
test_acc,▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇█▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▇▇▇██


[34m[1mwandb[0m: Agent Starting Run: wlp1v4bm with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout: 0.75
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.005
[34m[1mwandb[0m: 	n_hidden: 100
[34m[1mwandb[0m: 	n_layers: 2


Run summary:
train_loss,0.59865
_runtime,74.0
_timestamp,1623625113.0
_step,3131.0
train_acc,0.6772
valid_loss,0.53766
valid_acc,0.75296
test_loss,0.54613
test_acc,0.74647


Run history:
train_loss,▁▁▂▂▂▃▃▄▄▄▅▅▆▆▆▇▇▇██▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇
_runtime,▁▁▁▂▂▂▃▃▃▃▃▄▄▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇▇▇▇▇████
_timestamp,▁▁▁▂▂▂▃▃▃▃▃▄▄▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇▇▇▇▇████
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▁▁▂▂▂▃▃▃▄▄▄▄▅▅▅▆▆▆▇▁▁▂▂▂▃▃▃▄▄▄▅▅▆▆▆▇▇██
valid_loss,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▆▇▇██▁▁▁▂▂▂▃▃▃▃▄▄▄▅▅▅▆▆▆▇
valid_acc,▁▁▂▂▂▂▃▃▃▃▄▄▄▅▅▅▆▆▆▆▁▁▂▂▃▃▃▄▄▄▅▅▅▆▆▇▇▇██
test_loss,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██▁▁▂▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇
test_acc,▁▁▂▂▂▃▃▃▃▄▄▄▄▅▅▅▆▆▆▇▁▂▂▂▃▃▃▄▄▅▅▅▆▆▆▇▇▇██


[34m[1mwandb[0m: Agent Starting Run: xshtq20u with config:
[34m[1mwandb[0m: 	batch_size: 256
[34m[1mwandb[0m: 	dropout: 0.25
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	n_hidden: 75
[34m[1mwandb[0m: 	n_layers: 1


Run summary:
train_loss,0.19343
_runtime,93.0
_timestamp,1623625211.0
_step,7829.0
train_acc,0.92551
valid_loss,0.30167
valid_acc,0.88236
test_loss,0.30218
test_acc,0.88034


Run history:
train_loss,▁▂▃▄▅▆▇█▁▂▃▃▄▅▆▇▁▂▂▃▃▄▄▅▁▂▂▂▃▃▄▄▁▁▂▂▂▃▃▃
_runtime,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████
_timestamp,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▂▂▃▃▄▅▆▁▂▃▄▄▅▆▆▁▂▃▄▅▆▆▇▁▂▃▄▅▆▇█▂▂▃▄▅▆▇█
valid_loss,▁▂▃▄▅▆▇█▁▂▂▃▄▅▆▆▁▂▂▃▄▄▅▆▁▂▂▃▃▄▄▅▁▂▂▃▃▄▄▅
valid_acc,▁▂▃▃▄▅▆▇▁▂▃▄▄▆▇▇▁▂▃▄▅▆▇▇▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█
test_loss,▁▂▃▄▅▆▇█▁▂▃▃▄▅▆▆▁▂▂▃▃▄▅▆▁▂▂▃▃▄▅▅▁▁▂▃▃▄▅▅
test_acc,▁▂▃▄▄▅▆▇▁▂▃▄▅▆▆▇▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█


[34m[1mwandb[0m: Agent Starting Run: tqfg1pxp with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 8
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	n_hidden: 150
[34m[1mwandb[0m: 	n_layers: 3


Run summary:
train_loss,0.20172
_runtime,572.0
_timestamp,1623625788.0
_step,12527.0
train_acc,0.92291
valid_loss,0.25699
valid_acc,0.89708
test_loss,0.26195
test_acc,0.89358


Run history:
train_loss,▂▃▄▇█▁▃▅▆▇▁▂▃▄▅▁▂▃▄▄▁▂▃▃▄▁▂▂▃▄▁▂▂▃▃▁▂▂▃▃
_runtime,▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████
_timestamp,▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▂▃▄▅▁▂▃▄▆▁▃▄▆▇▁▃▄▆▇▁▃▅▆█▁▃▄▆█▁▃▄▆▇▂▃▅▆█
valid_loss,▂▃▅▇█▁▂▄▄▆▁▂▃▄▅▁▂▃▃▄▁▂▂▃▄▁▂▃▃▄▁▂▂▃▄▁▂▂▃▄
valid_acc,▁▂▃▅▅▁▃▄▅▇▂▂▄▆▇▁▃▅▅▇▁▃▄▆█▁▃▅▆▇▁▃▄▆█▂▃▄▆█
test_loss,▁▃▅▆█▁▂▃▄▅▁▂▃▄▅▁▂▂▃▄▁▂▂▃▄▁▂▂▃▄▁▂▃▃▄▁▂▂▃▄
test_acc,▁▂▃▄▅▁▃▄▅▇▁▃▄▆▇▂▃▄▆▇▂▃▅▆█▂▃▅▆█▂▃▅▆█▂▄▅▇█


[34m[1mwandb[0m: Agent Starting Run: efiqlza5 with config:
[34m[1mwandb[0m: 	batch_size: 256
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	n_hidden: 150
[34m[1mwandb[0m: 	n_layers: 1


Run summary:
train_loss,0.63816
_runtime,42.0
_timestamp,1623625835.0
_step,3131.0
train_acc,0.63055
valid_loss,0.65594
valid_acc,0.6271
test_loss,0.65753
test_acc,0.61763


Run history:
train_loss,▁▁▂▂▂▃▃▄▄▄▅▅▆▆▆▇▇▇██▁▁▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇██
_runtime,▁▁▁▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇▇▇▇▇████
_timestamp,▁▁▁▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇▇▇▇▇████
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▁▂▂▂▃▃▃▃▄▄▅▅▅▆▆▆▇▇█▁▁▂▂▂▃▃▃▄▄▄▅▅▆▆▇▇▇██
valid_loss,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▆▇▇██▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇█
valid_acc,▁▁▂▂▂▂▃▃▃▄▄▄▄▅▅▅▆▆▆▆▁▁▂▂▃▃▃▄▄▄▅▅▅▆▆▇▇▇██
test_loss,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██▁▁▂▂▂▃▃▃▄▄▄▅▅▆▆▆▇▇▇█
test_acc,▁▁▂▂▂▃▃▃▃▄▄▄▅▅▅▅▆▆▆▇▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▇▇▇██


[34m[1mwandb[0m: Agent Starting Run: 5lauh47u with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout: 0.25
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	learning_rate: 0.005
[34m[1mwandb[0m: 	n_hidden: 75
[34m[1mwandb[0m: 	n_layers: 3


Run summary:
train_loss,0.23243
_runtime,253.0
_timestamp,1623626093.0
_step,7829.0
train_acc,0.91055
valid_loss,0.28694
valid_acc,0.88594
test_loss,0.29353
test_acc,0.88069


Run history:
train_loss,▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▃▄▅▅▆▁▂▂▂▃▃▄▄▁▁▂▂▃▃▃▄
_runtime,▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇█████
_timestamp,▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇█████
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▂▂▃▃▄▅▅▁▂▃▃▄▄▅▆▁▂▃▃▅▅▆▇▁▂▃▄▅▆▇█▂▂▃▄▅▆▇█
valid_loss,▁▂▃▄▅▅▆█▁▂▃▄▅▆▇█▁▂▂▂▃▄▄▅▁▁▂▃▃▃▄▄▁▂▂▂▃▃▄▄
valid_acc,▁▂▃▃▄▅▅▆▁▂▃▃▄▅▆▆▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█
test_loss,▁▂▃▄▅▆▇▇▁▂▃▄▅▆▇█▁▂▂▃▃▄▄▅▁▂▂▂▃▃▄▄▁▁▂▂▃▃▄▄
test_acc,▁▂▃▃▄▅▆▆▁▂▃▃▄▅▅▆▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█


[34m[1mwandb[0m: Agent Starting Run: ss09x306 with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	dropout: 0.75
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	n_hidden: 100
[34m[1mwandb[0m: 	n_layers: 3


Run summary:
train_loss,0.63905
_runtime,283.0
_timestamp,1623626381.0
_step,7829.0
train_acc,0.63648
valid_loss,0.66187
valid_acc,0.58664
test_loss,0.66336
test_acc,0.58099


Run history:
train_loss,▁▂▃▄▅▆▇▇▁▂▃▄▅▅▆▇▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▆█
_runtime,▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇█████
_timestamp,▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇█████
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▂▂▃▄▅▆▆▁▂▃▄▅▆▇█▁▂▃▃▄▅▆▆▁▂▃▄▄▅▆▇▂▂▃▄▅▆▇█
valid_loss,▁▂▃▄▅▆▇█▁▂▃▃▄▅▆▇▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█
valid_acc,▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▂▃▄▅▆▆▁▂▃▄▅▆▆▇▁▂▃▄▄▅▆▇
test_loss,▁▂▃▄▅▆▇█▁▂▃▄▅▅▆▇▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█
test_acc,▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▃▄▅▆▆▁▂▃▄▅▆▇█▁▂▃▄▅▆▇▇


[34m[1mwandb[0m: Agent Starting Run: 99r7wyit with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout: 0.25
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	n_hidden: 75
[34m[1mwandb[0m: 	n_layers: 1


Run summary:
train_loss,0.50353
_runtime,39.0
_timestamp,1623626424.0
_step,3131.0
train_acc,0.75593
valid_loss,0.39605
valid_acc,0.82632
test_loss,0.40487
test_acc,0.81877


Run history:
train_loss,▁▁▂▂▂▃▃▄▄▄▅▅▆▆▆▇▇▇██▁▁▂▂▂▃▃▃▃▄▄▄▅▅▅▅▆▆▆▆
_runtime,▁▁▁▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇███
_timestamp,▁▁▁▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇███
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train_acc,▁▁▁▂▂▂▃▃▃▃▄▄▄▅▅▅▅▆▆▆▁▁▂▂▂▃▃▃▄▄▄▅▅▆▆▆▇▇██
valid_loss,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▆▇▇██▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▆▆
valid_acc,▁▁▂▂▂▃▃▃▃▄▄▄▅▅▅▆▆▆▇▇▁▁▂▂▃▃▃▄▄▄▅▅▅▆▆▇▇▇██
test_loss,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▆▇▇██▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▆▆
test_acc,▁▁▂▂▂▃▃▃▄▄▄▄▅▅▅▆▆▆▇▇▁▂▂▂▃▃▃▄▄▄▅▅▆▆▆▇▇▇██


[34m[1mwandb[0m: Agent Starting Run: 1z1beenv with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 8
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	n_hidden: 75
[34m[1mwandb[0m: 	n_layers: 2
