# Question 1 (15 Marks)
Build a RNN based seq2seq model which contains the following layers: (i) input layer for character embeddings (ii) one encoder RNN which sequentially encodes the input character sequence (Latin) (iii) one decoder RNN which takes the last state of the encoder as input and produces one output character at a time (Devanagari). 

The code should be flexible such that the dimension of the input character embeddings, the hidden states of the encoders and decoders, the cell (RNN, LSTM, GRU) and the number of layers in the encoder and decoder can be changed.

# Question 2 (10 Marks)

You will now train your model using any one language from the [Dakshina dataset](https://github.com/google-research-datasets/dakshina) (I would suggest pick a language that you can read so that it is easy to analyse the errors). Use the standard train, dev, test set from the folder dakshina_dataset_v1.0/hi/lexicons/ (replace hi by the language of your choice)

Using the sweep feature in wandb find the best hyperparameter configuration. Here are some suggestions but you are free to decide which hyperparameters you want to explore


- input embedding size: 16, 32, 64, 256, ...



- number of encoder layers: 1, 2, 3 



- number of decoder layers: 1, 2, 3 



- hidden layer size: 16, 32, 64, 256, ...



- cell type: RNN, GRU, LSTM



- dropout: 20%, 30% (btw, where will you add dropout? you should read up a bit on this)



- beam search in decoder with different beam sizes: 


Based on your sweep please paste the following plots which are automatically generated by wandb:

- accuracy v/s created plot (I would like to see the number of experiments you ran to get the best configuration). 

- parallel co-ordinates plot

- correlation summary table (to see the correlation of each hyperparameter with the loss/accuracy)

Also write down the hyperparameters and their values that you sweeped over. Smart strategies to reduce the number of runs while still achieving a high accuracy would be appreciated. Write down any unique strategy that you tried for efficiently searching the hyperparameters.

In [2]:
import wandb
import torch
import torch.nn as nn
import torch.optim as optim
import random
from torch.nn.utils.rnn import pad_sequence

In [3]:
# Log in to W&B (Weights and Biases) for experiment tracking
wandb.login(key='acdc26d2fc17a56e83ea3ae6c10e496128dee648')


# ---------- Model Definitions ----------

# Encoder class for the Seq2Seq model
class Encoder(nn.Module):
    def __init__(self, input_dim, embed_dim, hidden_dim, num_layers, cell_type='LSTM', dropout=0.2, bidirectional=False):
        super(Encoder, self).__init__()
        # Embedding layer for input tokens with padding_idx to ignore padding during training
        self.embedding = nn.Embedding(input_dim, embed_dim, padding_idx=0)

        # Select the RNN cell type: RNN, LSTM, or GRU
        rnn_cls = {'RNN': nn.RNN, 'LSTM': nn.LSTM, 'GRU': nn.GRU}[cell_type]

        # Define the RNN layer with given configuration
        self.rnn = rnn_cls(embed_dim, hidden_dim, num_layers, dropout=dropout, batch_first=True, bidirectional=bidirectional)
        self.cell_type = cell_type
        self.bidirectional = bidirectional

    def forward(self, src):
        # Apply embedding on source sequence
        embedded = self.embedding(src)

        # Pass through the RNN and return only the hidden state(s)
        outputs, hidden = self.rnn(embedded)
        return hidden


# Decoder class for the Seq2Seq model
class Decoder(nn.Module):
    def __init__(self, output_dim, embed_dim, hidden_dim, num_layers, cell_type='LSTM', dropout=0.2, bidirectional=False):
        super(Decoder, self).__init__()
        # Embedding layer for output tokens
        self.embedding = nn.Embedding(output_dim, embed_dim, padding_idx=0)

        # Select the RNN cell type
        rnn_cls = {'RNN': nn.RNN, 'LSTM': nn.LSTM, 'GRU': nn.GRU}[cell_type]

        # RNN for decoding
        self.rnn = rnn_cls(embed_dim, hidden_dim, num_layers, dropout=dropout, batch_first=True, bidirectional=bidirectional)

        # Final fully connected layer to project hidden states to vocabulary size
        self.fc_out = nn.Linear(hidden_dim * (2 if bidirectional else 1), output_dim)
        self.cell_type = cell_type
        self.bidirectional = bidirectional

    def forward(self, input, hidden):
        # Add time dimension (B, 1) since we're decoding one step at a time
        input = input.unsqueeze(1)

        # Embed the input token
        embedded = self.embedding(input)

        # Pass through the RNN
        output, hidden = self.rnn(embedded, hidden)

        # Remove time dimension and pass through linear layer
        output = self.fc_out(output.squeeze(1))
        return output, hidden


# Seq2Seq model that combines Encoder and Decoder
class Seq2Seq(nn.Module):
    def __init__(self, input_dim, output_dim, embed_dim, hidden_dim, enc_layers, dec_layers,
                 cell_type='LSTM', dropout=0.2, bidirectional=False):
        super(Seq2Seq, self).__init__()
        # Initialize encoder and decoder
        self.encoder = Encoder(input_dim, embed_dim, hidden_dim, enc_layers, cell_type, dropout, bidirectional)
        self.decoder = Decoder(output_dim, embed_dim, hidden_dim, dec_layers, cell_type, dropout, bidirectional)
        self.cell_type = cell_type

    # Adjust hidden state dimensions if encoder and decoder have different number of layers
    def adjust_hidden_for_decoder(self, encoder_hidden):
        enc_layers = self.encoder.rnn.num_layers
        dec_layers = self.decoder.rnn.num_layers

        def adjust(h):
            if enc_layers == dec_layers:
                return h
            elif enc_layers < dec_layers:
                # Repeat last encoder hidden layer to match decoder layers
                repeat_h = h[-1].unsqueeze(0).repeat(dec_layers - enc_layers, 1, 1)
                return torch.cat([h, repeat_h], dim=0)
            else:
                # Truncate encoder hidden state to match decoder layers
                return h[-dec_layers:]

        # If using LSTM, adjust both hidden state (h) and cell state (c)
        if self.cell_type == 'LSTM':
            h, c = encoder_hidden
            h = adjust(h)
            c = adjust(c)
            return (h, c)
        else:
            h = encoder_hidden
            h = adjust(h)
            return h

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        batch_size, trg_len = trg.size()

        # Initialize tensor to store decoder outputs
        outputs = torch.zeros(batch_size, trg_len, self.decoder.fc_out.out_features, device=src.device)

        # Pass source through encoder
        hidden = self.encoder(src)

        # Adjust encoder hidden states to fit decoder
        decoder_hidden = self.adjust_hidden_for_decoder(hidden)

        # First input to the decoder is the <sos> token
        input = trg[:, 0]

        # Decode each time step
        for t in range(1, trg_len):
            output, decoder_hidden = self.decoder(input, decoder_hidden)
            outputs[:, t] = output

            # Decide whether to use teacher forcing or not
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.argmax(1)  # Get most probable token

            # Next input is ground truth if teacher forcing, else prediction
            input = trg[:, t] if teacher_force else top1

        return outputs



# ---------- Utility Functions ----------

# Build character vocabulary from dataset
def build_vocab(sequences):
    chars = set(ch for seq in sequences for ch in seq)  # Unique characters in dataset
    stoi = {'<pad>': 0, '<sos>': 1, '<eos>': 2, '<unk>': 3}  # Special tokens
    for ch in sorted(chars):
        stoi[ch] = len(stoi)
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

# Encode a sequence using string-to-index mapping
def encode_sequence(seq, stoi):
    return [stoi.get(c, stoi['<unk>']) for c in seq]  # Replace unknown chars with <unk>

# Prepare a batch of input-output sequences for training
def prepare_batch(pairs, inp_stoi, out_stoi, device):
    # Convert each sequence into tensor and add special tokens
    src_seq = [torch.tensor(encode_sequence(src, inp_stoi) + [inp_stoi['<eos>']]) for src, _ in pairs]
    trg_seq = [torch.tensor([out_stoi['<sos>']] + encode_sequence(trg, out_stoi) + [out_stoi['<eos>']]) for _, trg in pairs]

    # Pad sequences to make them the same length
    src_batch = pad_sequence(src_seq, batch_first=True, padding_value=inp_stoi['<pad>'])
    trg_batch = pad_sequence(trg_seq, batch_first=True, padding_value=out_stoi['<pad>'])

    return src_batch.to(device), trg_batch.to(device)

# Read a tab-separated file and return pairs (input, output)
def read_dataset(path):
    with open(path, encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
        return [(l.split('\t')[1], l.split('\t')[0]) for l in lines if '\t' in l]

# Calculate word-level accuracy (entire sequences must match, excluding padding)
def calculate_word_accuracy(preds, targets, ignore_index=0):
    # Get predicted token indices (highest probability)
    preds = preds.argmax(dim=-1)  # Shape: [batch, seq_len]
    
    # Mask to ignore padding tokens
    mask = targets != ignore_index

    # Apply mask to predictions and targets
    preds_masked = preds * mask
    targets_masked = targets * mask

    # Compare entire sequences for exact match
    sequence_correct = (preds_masked == targets_masked).all(dim=1)
    
    # Compute accuracy as percentage of sequences that are fully correct
    word_accuracy = sequence_correct.float().mean().item() * 100

    return word_accuracy

# Evaluate the model on a dataset
def evaluate(model, data, src_vocab, tgt_vocab, device, criterion, batch_size):
    model.eval()
    total_loss = 0
    total_acc = 0

    with torch.no_grad():
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            src, trg = prepare_batch(batch, src_vocab, tgt_vocab, device)

            # Forward pass
            output = model(src, trg)

            # Compute loss ignoring the <sos> token (at position 0)
            loss = criterion(output[:, 1:].reshape(-1, output.shape[-1]), trg[:, 1:].reshape(-1))

            # Compute word-level accuracy
            acc = calculate_word_accuracy(output[:, 1:], trg[:, 1:])

            total_loss += loss.item()
            total_acc += acc

    # Return average loss and accuracy over all batches
    return total_loss / len(data), total_acc / (len(data) // batch_size)


[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mviinod9[0m ([33mviinod9-iitm[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


# Train on train dataset

In [3]:
# ---------- Train Function ----------

def train():
    # Initialize a Weights & Biases (wandb) run with configuration for hyperparameters
    wandb.init(config={
        "embed_dim": 128,                  # Size of embedding vectors
        "hidden_dim": 256,                 # Hidden layer size for encoder/decoder
        "enc_layers": 2,                   # Number of encoder layers
        "dec_layers": 2,                   # Number of decoder layers
        "cell_type": "LSTM",               # Type of RNN cell (LSTM, GRU, RNN)
        "dropout": 0.2,                    # Dropout rate
        "epochs": 10,                      # Number of training epochs
        "batch_size": 64,                  # Batch size
        "bidirectional": False,            # Whether to use bidirectional encoder
        "learning_rate": 0.001,            # Learning rate for optimizer
        "optimizer": "adam",               # Optimizer to use (adam or nadam)
        "teacher_forcing_ratio": 0.5,      # Probability of using teacher forcing
        "beam_width": 1                    # Beam width for decoding (not used during training)
    })

    # Generate a unique name for the current run based on the config
    run_name = (
        f"{wandb.config.cell_type}_embed{wandb.config.embed_dim}_"
        f"hid{wandb.config.hidden_dim}_enc{wandb.config.enc_layers}_"
        f"dec{wandb.config.dec_layers}_drop{wandb.config.dropout}_"
        f"bs{wandb.config.batch_size}_lr{wandb.config.learning_rate}_"
        f"opt-{wandb.config.optimizer}_tf{wandb.config.teacher_forcing_ratio}_"
        f"bi-{wandb.config.bidirectional}_bw{wandb.config.beam_width}"
    )
    wandb.run.name = run_name

    # Get configuration and set device to GPU if available
    config = wandb.config
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load training and validation data
    train_data = read_dataset("/kaggle/input/dakshina-dataset/dakshina_dataset_v1.0/hi/lexicons/hi.translit.sampled.train.tsv")
    dev_data = read_dataset("/kaggle/input/dakshina-dataset/dakshina_dataset_v1.0/hi/lexicons/hi.translit.sampled.dev.tsv")

    # Build vocabularies from training source and target data
    src_vocab, tgt_vocab = build_vocab([src for src, _ in train_data]), build_vocab([tgt for _, tgt in train_data])

    # Initialize Seq2Seq model with specified configuration
    model = Seq2Seq(len(src_vocab[0]), len(tgt_vocab[0]), config.embed_dim, config.hidden_dim,
                    config.enc_layers, config.dec_layers, config.cell_type, config.dropout, config.bidirectional).to(device)

    # Choose optimizer based on config
    if config.optimizer == "adam":
        optimizer = optim.Adam(model.parameters(), lr=config.learning_rate)
    elif config.optimizer == "nadam":
        optimizer = optim.NAdam(model.parameters(), lr=config.learning_rate)
    else:
        raise ValueError("Unsupported optimizer")

    # Use CrossEntropyLoss for training, ignoring the padding index (assumed to be 0)
    criterion = nn.CrossEntropyLoss(ignore_index=0)

    # Start training loop over epochs
    for epoch in range(config.epochs):
        model.train()                          # Set model to training mode
        total_loss = 0                         # Accumulate total loss for the epoch
        total_acc = 0                          # Accumulate total accuracy for the epoch
        random.shuffle(train_data)             # Shuffle training data at start of each epoch

        for i in range(0, len(train_data), config.batch_size):
            batch = train_data[i:i + config.batch_size]   # Get batch of data
            src, trg = prepare_batch(batch, src_vocab[0], tgt_vocab[0], device)  # Prepare tensors

            optimizer.zero_grad()             # Reset gradients
            output = model(src, trg, teacher_forcing_ratio=config.teacher_forcing_ratio)  # Forward pass

            # Compute loss (excluding first token)
            loss = criterion(output[:, 1:].reshape(-1, output.shape[-1]), trg[:, 1:].reshape(-1))
            
            # Compute accuracy of predicted words
            acc = calculate_word_accuracy(output[:, 1:], trg[:, 1:])

            loss.backward()                   # Backpropagation
            optimizer.step()                  # Optimizer update

            total_loss += loss.item()         # Accumulate loss
            total_acc += acc                  # Accumulate accuracy

        # Compute average loss and accuracy for training
        avg_train_loss = total_loss / len(train_data)
        avg_train_acc = total_acc / (len(train_data) // config.batch_size)

        # Evaluate model on validation set
        val_loss, val_acc = evaluate(model, dev_data, src_vocab[0], tgt_vocab[0], device, criterion, config.batch_size)

        # Log metrics to wandb
        wandb.log({
            "Train Loss": avg_train_loss,
            "Train Accuracy": avg_train_acc,
            "Validation Loss": val_loss,
            "Validation Accuracy": val_acc,
            "Epoch": epoch + 1,
            "Learning Rate": config.learning_rate,
            "Teacher Forcing Ratio": config.teacher_forcing_ratio,
            "Optimizer": config.optimizer,
            "Bidirectional": config.bidirectional,
            "Beam Width": config.beam_width
        })

        # Print epoch summary
        print(f"Epoch {epoch + 1}/{config.epochs} | Train Loss: {avg_train_loss:.4f}, Train Acc: {avg_train_acc:.2f}% | Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")

    # Finish wandb run
    wandb.finish()

# ---------- Sweep Setup ----------

# Configuration for hyperparameter sweep using Bayesian optimization
sweep_config = {
    'method': 'bayes',   # Use Bayesian optimization to select next set of hyperparameters
    'metric': {'name': 'Validation Accuracy', 'goal': 'maximize'},  # Target metric

    # Define search space for hyperparameters
    'parameters': {
        'embed_dim': {'values': [32, 64, 256]},
        'hidden_dim': {'values': [64, 128]},
        'enc_layers': {'values': [1,2,3]},
        'dec_layers': {'values': [1,2,3]},
        'cell_type': {'values': ['LSTM','GRU','RNN']},
        'dropout': {'values': [0.2, 0.3]},
        'batch_size': {'values': [32,64]},
        'epochs': {'values': [5,10,15]},
        'bidirectional': {'values': [False]},
        'learning_rate': {'values': [0.001, 0.002, 0.0001]},
        'optimizer': {'values': ['adam', 'nadam']},
        'teacher_forcing_ratio': {'values': [0.2, 0.5, 0.7]},
        'beam_width': {'values': [1, 3, 5]}
    }
}

# Initialize a sweep with the defined configuration under the specified project name
sweep_id = wandb.sweep(sweep_config, project="without_attention_sweep")

# Launch sweep agent to run the `train` function multiple times with different hyperparameter combinations
wandb.agent(sweep_id, function=train, count=50)


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Create sweep with ID: k5x17vt3
Sweep URL: https://wandb.ai/viinod9-iitm/without_attention_sweep/sweeps/k5x17vt3


[34m[1mwandb[0m: Agent Starting Run: ic3zb24c with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 1
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 1
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0547, Train Acc: 0.00% | Val Loss: 0.0502, Val Acc: 0.00%
Epoch 2/10 | Train Loss: 0.0479, Train Acc: 0.00% | Val Loss: 0.0478, Val Acc: 0.00%
Epoch 3/10 | Train Loss: 0.0453, Train Acc: 0.00% | Val Loss: 0.0457, Val Acc: 0.00%
Epoch 4/10 | Train Loss: 0.0431, Train Acc: 0.01% | Val Loss: 0.0443, Val Acc: 0.00%
Epoch 5/10 | Train Loss: 0.0417, Train Acc: 0.02% | Val Loss: 0.0432, Val Acc: 0.00%
Epoch 6/10 | Train Loss: 0.0404, Train Acc: 0.03% | Val Loss: 0.0420, Val Acc: 0.05%
Epoch 7/10 | Train Loss: 0.0392, Train Acc: 0.07% | Val Loss: 0.0406, Val Acc: 0.07%
Epoch 8/10 | Train Loss: 0.0380, Train Acc: 0.12% | Val Loss: 0.0400, Val Acc: 0.07%
Epoch 9/10 | Train Loss: 0.0369, Train Acc: 0.18% | Val Loss: 0.0388, Val Acc: 0.07%
Epoch 10/10 | Train Loss: 0.0358, Train Acc: 0.24% | Val Loss: 0.0379, Val Acc: 0.14%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▁▁▂▂▃▄▆█
Train Loss,█▅▅▄▃▃▂▂▁▁
Validation Accuracy,▁▁▁▁▁▃▅▅▅█
Validation Loss,█▇▅▅▄▃▃▂▂▁

0,1
Beam Width,3
Bidirectional,False
Epoch,10
Learning Rate,0.0001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,0.24004
Train Loss,0.03579
Validation Accuracy,0.13787
Validation Loss,0.03791


[34m[1mwandb[0m: Agent Starting Run: jaxmm57s with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.5


Epoch 1/10 | Train Loss: 0.0437, Train Acc: 0.10% | Val Loss: 0.0347, Val Acc: 0.71%
Epoch 2/10 | Train Loss: 0.0288, Train Acc: 2.63% | Val Loss: 0.0237, Val Acc: 6.02%
Epoch 3/10 | Train Loss: 0.0223, Train Acc: 7.45% | Val Loss: 0.0198, Val Acc: 11.71%
Epoch 4/10 | Train Loss: 0.0192, Train Acc: 12.04% | Val Loss: 0.0174, Val Acc: 16.74%
Epoch 5/10 | Train Loss: 0.0175, Train Acc: 15.16% | Val Loss: 0.0163, Val Acc: 18.22%
Epoch 6/10 | Train Loss: 0.0164, Train Acc: 17.33% | Val Loss: 0.0153, Val Acc: 21.91%
Epoch 7/10 | Train Loss: 0.0155, Train Acc: 19.35% | Val Loss: 0.0149, Val Acc: 23.09%
Epoch 8/10 | Train Loss: 0.0149, Train Acc: 20.93% | Val Loss: 0.0145, Val Acc: 24.65%
Epoch 9/10 | Train Loss: 0.0143, Train Acc: 22.16% | Val Loss: 0.0140, Val Acc: 26.53%
Epoch 10/10 | Train Loss: 0.0138, Train Acc: 23.85% | Val Loss: 0.0140, Val Acc: 26.62%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▅▅▆▇▇██
Train Loss,█▅▃▂▂▂▁▁▁▁
Validation Accuracy,▁▂▄▅▆▇▇▇██
Validation Loss,█▄▃▂▂▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,10
Learning Rate,0.002
Optimizer,adam
Teacher Forcing Ratio,0.5
Train Accuracy,23.84655
Train Loss,0.01382
Validation Accuracy,26.62377
Validation Loss,0.01399


[34m[1mwandb[0m: Agent Starting Run: spmdwhll with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: GRU
[34m[1mwandb[0m: 	dec_layers: 2
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.2


Epoch 1/15 | Train Loss: 0.0515, Train Acc: 0.00% | Val Loss: 0.0505, Val Acc: 0.00%
Epoch 2/15 | Train Loss: 0.0478, Train Acc: 0.00% | Val Loss: 0.0477, Val Acc: 0.00%
Epoch 3/15 | Train Loss: 0.0448, Train Acc: 0.01% | Val Loss: 0.0443, Val Acc: 0.00%
Epoch 4/15 | Train Loss: 0.0418, Train Acc: 0.09% | Val Loss: 0.0409, Val Acc: 0.16%
Epoch 5/15 | Train Loss: 0.0388, Train Acc: 0.21% | Val Loss: 0.0375, Val Acc: 0.18%
Epoch 6/15 | Train Loss: 0.0360, Train Acc: 0.33% | Val Loss: 0.0347, Val Acc: 0.28%
Epoch 7/15 | Train Loss: 0.0337, Train Acc: 0.63% | Val Loss: 0.0324, Val Acc: 0.71%
Epoch 8/15 | Train Loss: 0.0318, Train Acc: 0.94% | Val Loss: 0.0305, Val Acc: 1.17%
Epoch 9/15 | Train Loss: 0.0302, Train Acc: 1.44% | Val Loss: 0.0289, Val Acc: 1.75%
Epoch 10/15 | Train Loss: 0.0287, Train Acc: 2.05% | Val Loss: 0.0274, Val Acc: 2.80%
Epoch 11/15 | Train Loss: 0.0274, Train Acc: 2.71% | Val Loss: 0.0262, Val Acc: 3.47%
Epoch 12/15 | Train Loss: 0.0262, Train Acc: 3.50% | Val Loss: 

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▁▁▁▁▂▂▃▃▄▅▆▇█
Train Loss,█▇▆▆▅▄▄▃▃▂▂▂▁▁▁
Validation Accuracy,▁▁▁▁▁▁▂▂▃▄▄▅▆▇█
Validation Loss,█▇▆▆▅▄▃▃▃▂▂▂▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.0001
Optimizer,nadam
Teacher Forcing Ratio,0.2
Train Accuracy,5.76293
Train Loss,0.02356
Validation Accuracy,6.98529
Validation Loss,0.02234


[34m[1mwandb[0m: Agent Starting Run: fhiwxxty with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: GRU
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.1028, Train Acc: 0.00% | Val Loss: 0.1001, Val Acc: 0.00%
Epoch 2/10 | Train Loss: 0.0962, Train Acc: 0.00% | Val Loss: 0.0954, Val Acc: 0.00%
Epoch 3/10 | Train Loss: 0.0902, Train Acc: 0.01% | Val Loss: 0.0888, Val Acc: 0.00%
Epoch 4/10 | Train Loss: 0.0833, Train Acc: 0.04% | Val Loss: 0.0812, Val Acc: 0.05%
Epoch 5/10 | Train Loss: 0.0770, Train Acc: 0.10% | Val Loss: 0.0765, Val Acc: 0.11%
Epoch 6/10 | Train Loss: 0.0723, Train Acc: 0.23% | Val Loss: 0.0718, Val Acc: 0.21%
Epoch 7/10 | Train Loss: 0.0679, Train Acc: 0.44% | Val Loss: 0.0680, Val Acc: 0.46%
Epoch 8/10 | Train Loss: 0.0642, Train Acc: 0.65% | Val Loss: 0.0641, Val Acc: 0.62%
Epoch 9/10 | Train Loss: 0.0608, Train Acc: 1.00% | Val Loss: 0.0602, Val Acc: 0.90%
Epoch 10/10 | Train Loss: 0.0577, Train Acc: 1.39% | Val Loss: 0.0571, Val Acc: 1.31%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▁▁▂▂▃▄▆█
Train Loss,█▇▆▅▄▃▃▂▁▁
Validation Accuracy,▁▁▁▁▂▂▃▄▆█
Validation Loss,█▇▆▅▄▃▃▂▂▁

0,1
Beam Width,1
Bidirectional,False
Epoch,10
Learning Rate,0.0001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,1.39316
Train Loss,0.05774
Validation Accuracy,1.30974
Validation Loss,0.05708


[34m[1mwandb[0m: Agent Starting Run: s91hn8qq with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: GRU
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/5 | Train Loss: 0.0382, Train Acc: 0.60% | Val Loss: 0.0286, Val Acc: 1.98%
Epoch 2/5 | Train Loss: 0.0228, Train Acc: 5.81% | Val Loss: 0.0206, Val Acc: 10.59%
Epoch 3/5 | Train Loss: 0.0182, Train Acc: 11.77% | Val Loss: 0.0179, Val Acc: 17.21%
Epoch 4/5 | Train Loss: 0.0161, Train Acc: 15.98% | Val Loss: 0.0161, Val Acc: 21.07%
Epoch 5/5 | Train Loss: 0.0147, Train Acc: 18.71% | Val Loss: 0.0155, Val Acc: 22.14%


0,1
Beam Width,▁▁▁▁▁
Epoch,▁▃▅▆█
Learning Rate,▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁
Train Accuracy,▁▃▅▇█
Train Loss,█▃▂▁▁
Validation Accuracy,▁▄▆██
Validation Loss,█▄▂▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,5
Learning Rate,0.002
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,18.71336
Train Loss,0.01475
Validation Accuracy,22.13542
Validation Loss,0.01551


[34m[1mwandb[0m: Agent Starting Run: ncm6qvdt with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0333, Train Acc: 3.16% | Val Loss: 0.0197, Val Acc: 12.20%
Epoch 2/15 | Train Loss: 0.0153, Train Acc: 18.17% | Val Loss: 0.0150, Val Acc: 25.38%
Epoch 3/15 | Train Loss: 0.0120, Train Acc: 26.40% | Val Loss: 0.0134, Val Acc: 28.98%
Epoch 4/15 | Train Loss: 0.0104, Train Acc: 31.39% | Val Loss: 0.0131, Val Acc: 31.50%
Epoch 5/15 | Train Loss: 0.0095, Train Acc: 34.95% | Val Loss: 0.0125, Val Acc: 33.07%
Epoch 6/15 | Train Loss: 0.0087, Train Acc: 37.73% | Val Loss: 0.0118, Val Acc: 34.95%
Epoch 7/15 | Train Loss: 0.0081, Train Acc: 40.39% | Val Loss: 0.0118, Val Acc: 34.68%
Epoch 8/15 | Train Loss: 0.0076, Train Acc: 42.67% | Val Loss: 0.0118, Val Acc: 34.70%
Epoch 9/15 | Train Loss: 0.0074, Train Acc: 44.23% | Val Loss: 0.0115, Val Acc: 35.39%
Epoch 10/15 | Train Loss: 0.0071, Train Acc: 45.64% | Val Loss: 0.0117, Val Acc: 36.57%
Epoch 11/15 | Train Loss: 0.0066, Train Acc: 47.71% | Val Loss: 0.0121, Val Acc: 36.14%
Epoch 12/15 | Train Loss: 0.0064, Train Ac

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▃▄▅▆▆▆▇▇▇▇████
Train Loss,█▃▃▂▂▂▂▁▁▁▁▁▁▁▁
Validation Accuracy,▁▅▆▆▇▇▇▇███████
Validation Loss,█▄▃▃▂▂▂▂▁▁▂▂▂▂▁

0,1
Beam Width,3
Bidirectional,False
Epoch,15
Learning Rate,0.002
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,52.3594
Train Loss,0.00572
Validation Accuracy,36.76471
Validation Loss,0.01107


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: a7hvlbnu with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: GRU
[34m[1mwandb[0m: 	dec_layers: 2
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0498, Train Acc: 0.01% | Val Loss: 0.0473, Val Acc: 0.00%
Epoch 2/10 | Train Loss: 0.0422, Train Acc: 0.04% | Val Loss: 0.0415, Val Acc: 0.05%
Epoch 3/10 | Train Loss: 0.0371, Train Acc: 0.17% | Val Loss: 0.0367, Val Acc: 0.09%
Epoch 4/10 | Train Loss: 0.0328, Train Acc: 0.59% | Val Loss: 0.0329, Val Acc: 0.71%
Epoch 5/10 | Train Loss: 0.0292, Train Acc: 1.46% | Val Loss: 0.0291, Val Acc: 1.72%
Epoch 6/10 | Train Loss: 0.0263, Train Acc: 2.67% | Val Loss: 0.0271, Val Acc: 3.95%
Epoch 7/10 | Train Loss: 0.0241, Train Acc: 4.14% | Val Loss: 0.0244, Val Acc: 5.88%
Epoch 8/10 | Train Loss: 0.0223, Train Acc: 6.03% | Val Loss: 0.0224, Val Acc: 8.40%
Epoch 9/10 | Train Loss: 0.0208, Train Acc: 7.77% | Val Loss: 0.0211, Val Acc: 11.14%
Epoch 10/10 | Train Loss: 0.0195, Train Acc: 9.68% | Val Loss: 0.0204, Val Acc: 12.19%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▁▁▂▃▄▅▇█
Train Loss,█▆▅▄▃▃▂▂▁▁
Validation Accuracy,▁▁▁▁▂▃▄▆▇█
Validation Loss,█▆▅▄▃▃▂▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,10
Learning Rate,0.0001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,9.68441
Train Loss,0.01953
Validation Accuracy,12.19363
Validation Loss,0.02044


[34m[1mwandb[0m: Agent Starting Run: 881k1p94 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: GRU
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0368, Train Acc: 0.62% | Val Loss: 0.0270, Val Acc: 2.99%
Epoch 2/10 | Train Loss: 0.0226, Train Acc: 5.94% | Val Loss: 0.0195, Val Acc: 11.60%
Epoch 3/10 | Train Loss: 0.0185, Train Acc: 11.39% | Val Loss: 0.0178, Val Acc: 18.13%
Epoch 4/10 | Train Loss: 0.0166, Train Acc: 15.03% | Val Loss: 0.0158, Val Acc: 22.42%
Epoch 5/10 | Train Loss: 0.0153, Train Acc: 17.90% | Val Loss: 0.0154, Val Acc: 23.71%
Epoch 6/10 | Train Loss: 0.0144, Train Acc: 19.90% | Val Loss: 0.0154, Val Acc: 26.22%
Epoch 7/10 | Train Loss: 0.0138, Train Acc: 21.38% | Val Loss: 0.0150, Val Acc: 26.01%
Epoch 8/10 | Train Loss: 0.0133, Train Acc: 22.52% | Val Loss: 0.0139, Val Acc: 28.88%
Epoch 9/10 | Train Loss: 0.0130, Train Acc: 23.57% | Val Loss: 0.0134, Val Acc: 28.63%
Epoch 10/10 | Train Loss: 0.0125, Train Acc: 24.25% | Val Loss: 0.0144, Val Acc: 29.19%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▃▄▅▆▇▇▇██
Train Loss,█▄▃▂▂▂▁▁▁▁
Validation Accuracy,▁▃▅▆▇▇▇███
Validation Loss,█▄▃▂▂▂▂▁▁▂

0,1
Beam Width,5
Bidirectional,False
Epoch,10
Learning Rate,0.002
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,24.24839
Train Loss,0.01254
Validation Accuracy,29.18964
Validation Loss,0.01444


[34m[1mwandb[0m: Agent Starting Run: koozt0gy with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: RNN
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0906, Train Acc: 0.00% | Val Loss: 0.0953, Val Acc: 0.00%
Epoch 2/10 | Train Loss: 0.0879, Train Acc: 0.00% | Val Loss: 0.0941, Val Acc: 0.00%
Epoch 3/10 | Train Loss: 0.0876, Train Acc: 0.00% | Val Loss: 0.0957, Val Acc: 0.00%
Epoch 4/10 | Train Loss: 0.0870, Train Acc: 0.00% | Val Loss: 0.0956, Val Acc: 0.00%
Epoch 5/10 | Train Loss: 0.0868, Train Acc: 0.00% | Val Loss: 0.0937, Val Acc: 0.00%
Epoch 6/10 | Train Loss: 0.0867, Train Acc: 0.00% | Val Loss: 0.0936, Val Acc: 0.00%
Epoch 7/10 | Train Loss: 0.0862, Train Acc: 0.00% | Val Loss: 0.0915, Val Acc: 0.00%
Epoch 8/10 | Train Loss: 0.0856, Train Acc: 0.00% | Val Loss: 0.0919, Val Acc: 0.00%
Epoch 9/10 | Train Loss: 0.0854, Train Acc: 0.00% | Val Loss: 0.0927, Val Acc: 0.00%
Epoch 10/10 | Train Loss: 0.0850, Train Acc: 0.01% | Val Loss: 0.0909, Val Acc: 0.00%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▆▃▆▃▃▆▁▆▃█
Train Loss,█▅▄▃▃▃▂▂▁▁
Validation Accuracy,▁▁▁▁▁▁▁▁▁▁
Validation Loss,▇▆██▅▅▂▃▄▁

0,1
Beam Width,5
Bidirectional,False
Epoch,10
Learning Rate,0.002
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,0.00679
Train Loss,0.08505
Validation Accuracy,0
Validation Loss,0.09087


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: y19lb87h with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: RNN
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/5 | Train Loss: 0.0498, Train Acc: 0.00% | Val Loss: 0.0481, Val Acc: 0.00%
Epoch 2/5 | Train Loss: 0.0457, Train Acc: 0.01% | Val Loss: 0.0473, Val Acc: 0.02%
Epoch 3/5 | Train Loss: 0.0439, Train Acc: 0.01% | Val Loss: 0.0460, Val Acc: 0.00%
Epoch 4/5 | Train Loss: 0.0425, Train Acc: 0.00% | Val Loss: 0.0445, Val Acc: 0.00%
Epoch 5/5 | Train Loss: 0.0414, Train Acc: 0.02% | Val Loss: 0.0433, Val Acc: 0.00%


0,1
Beam Width,▁▁▁▁▁
Epoch,▁▃▅▆█
Learning Rate,▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁
Train Accuracy,▁▃▅▃█
Train Loss,█▅▃▂▁
Validation Accuracy,▁█▁▁▁
Validation Loss,█▇▅▃▁

0,1
Beam Width,3
Bidirectional,False
Epoch,5
Learning Rate,0.0001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,0.02038
Train Loss,0.04137
Validation Accuracy,0
Validation Loss,0.04327


[34m[1mwandb[0m: Agent Starting Run: go2oz2w6 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0433, Train Acc: 0.06% | Val Loss: 0.0366, Val Acc: 0.18%
Epoch 2/15 | Train Loss: 0.0287, Train Acc: 2.11% | Val Loss: 0.0247, Val Acc: 6.24%
Epoch 3/15 | Train Loss: 0.0209, Train Acc: 7.86% | Val Loss: 0.0192, Val Acc: 15.20%
Epoch 4/15 | Train Loss: 0.0174, Train Acc: 13.27% | Val Loss: 0.0168, Val Acc: 20.00%
Epoch 5/15 | Train Loss: 0.0153, Train Acc: 17.39% | Val Loss: 0.0155, Val Acc: 24.50%
Epoch 6/15 | Train Loss: 0.0142, Train Acc: 20.20% | Val Loss: 0.0151, Val Acc: 25.35%
Epoch 7/15 | Train Loss: 0.0133, Train Acc: 22.37% | Val Loss: 0.0143, Val Acc: 27.36%
Epoch 8/15 | Train Loss: 0.0125, Train Acc: 24.46% | Val Loss: 0.0135, Val Acc: 28.55%
Epoch 9/15 | Train Loss: 0.0119, Train Acc: 25.83% | Val Loss: 0.0134, Val Acc: 29.16%
Epoch 10/15 | Train Loss: 0.0116, Train Acc: 27.08% | Val Loss: 0.0131, Val Acc: 29.74%
Epoch 11/15 | Train Loss: 0.0112, Train Acc: 28.17% | Val Loss: 0.0133, Val Acc: 31.01%
Epoch 12/15 | Train Loss: 0.0109, Train Acc: 2

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▃▄▅▅▆▆▇▇▇████
Train Loss,█▅▃▃▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▂▄▅▆▆▇▇▇▇█████
Validation Loss,█▅▃▂▂▂▂▁▁▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.002
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,31.79595
Train Loss,0.01007
Validation Accuracy,33.3027
Validation Loss,0.01292


[34m[1mwandb[0m: Agent Starting Run: 2rmlu4ov with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.2


Epoch 1/15 | Train Loss: 0.0415, Train Acc: 0.32% | Val Loss: 0.0287, Val Acc: 2.32%
Epoch 2/15 | Train Loss: 0.0235, Train Acc: 6.45% | Val Loss: 0.0181, Val Acc: 13.60%
Epoch 3/15 | Train Loss: 0.0181, Train Acc: 14.31% | Val Loss: 0.0158, Val Acc: 18.56%
Epoch 4/15 | Train Loss: 0.0158, Train Acc: 19.46% | Val Loss: 0.0143, Val Acc: 24.87%
Epoch 5/15 | Train Loss: 0.0144, Train Acc: 22.94% | Val Loss: 0.0136, Val Acc: 26.21%
Epoch 6/15 | Train Loss: 0.0134, Train Acc: 26.04% | Val Loss: 0.0133, Val Acc: 28.57%
Epoch 7/15 | Train Loss: 0.0126, Train Acc: 28.94% | Val Loss: 0.0128, Val Acc: 28.55%
Epoch 8/15 | Train Loss: 0.0119, Train Acc: 31.48% | Val Loss: 0.0126, Val Acc: 30.84%
Epoch 9/15 | Train Loss: 0.0114, Train Acc: 33.21% | Val Loss: 0.0122, Val Acc: 31.07%
Epoch 10/15 | Train Loss: 0.0109, Train Acc: 35.23% | Val Loss: 0.0121, Val Acc: 31.83%
Epoch 11/15 | Train Loss: 0.0104, Train Acc: 37.22% | Val Loss: 0.0119, Val Acc: 32.25%
Epoch 12/15 | Train Loss: 0.0100, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▂▁▁▁▁▁▁
Validation Accuracy,▁▄▅▆▆▇▇▇▇██████
Validation Loss,█▄▃▂▂▂▁▁▁▁▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.2
Train Accuracy,42.44709
Train Loss,0.00894
Validation Accuracy,33.52482
Validation Loss,0.01169


[34m[1mwandb[0m: Agent Starting Run: 1rajvble with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.5


Epoch 1/10 | Train Loss: 0.0391, Train Acc: 0.67% | Val Loss: 0.0267, Val Acc: 4.14%
Epoch 2/10 | Train Loss: 0.0212, Train Acc: 9.17% | Val Loss: 0.0174, Val Acc: 16.04%
Epoch 3/10 | Train Loss: 0.0162, Train Acc: 18.00% | Val Loss: 0.0151, Val Acc: 21.44%
Epoch 4/10 | Train Loss: 0.0140, Train Acc: 23.48% | Val Loss: 0.0138, Val Acc: 26.52%
Epoch 5/10 | Train Loss: 0.0126, Train Acc: 27.35% | Val Loss: 0.0127, Val Acc: 29.32%
Epoch 6/10 | Train Loss: 0.0115, Train Acc: 30.77% | Val Loss: 0.0124, Val Acc: 30.96%
Epoch 7/10 | Train Loss: 0.0108, Train Acc: 33.09% | Val Loss: 0.0122, Val Acc: 31.99%
Epoch 8/10 | Train Loss: 0.0101, Train Acc: 35.83% | Val Loss: 0.0116, Val Acc: 32.96%
Epoch 9/10 | Train Loss: 0.0097, Train Acc: 37.90% | Val Loss: 0.0119, Val Acc: 34.97%
Epoch 10/10 | Train Loss: 0.0093, Train Acc: 39.55% | Val Loss: 0.0113, Val Acc: 35.71%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▃▄▅▆▆▇▇██
Train Loss,█▄▃▂▂▂▁▁▁▁
Validation Accuracy,▁▄▅▆▇▇▇▇██
Validation Loss,█▄▃▂▂▂▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.5
Train Accuracy,39.55307
Train Loss,0.00931
Validation Accuracy,35.70772
Validation Loss,0.01129


[34m[1mwandb[0m: Agent Starting Run: 5pjp72op with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0389, Train Acc: 0.68% | Val Loss: 0.0262, Val Acc: 3.42%
Epoch 2/15 | Train Loss: 0.0201, Train Acc: 9.26% | Val Loss: 0.0181, Val Acc: 15.33%
Epoch 3/15 | Train Loss: 0.0152, Train Acc: 17.76% | Val Loss: 0.0158, Val Acc: 22.37%
Epoch 4/15 | Train Loss: 0.0131, Train Acc: 23.22% | Val Loss: 0.0144, Val Acc: 25.51%
Epoch 5/15 | Train Loss: 0.0116, Train Acc: 26.92% | Val Loss: 0.0138, Val Acc: 28.74%
Epoch 6/15 | Train Loss: 0.0109, Train Acc: 30.01% | Val Loss: 0.0128, Val Acc: 30.15%
Epoch 7/15 | Train Loss: 0.0099, Train Acc: 32.61% | Val Loss: 0.0124, Val Acc: 32.00%
Epoch 8/15 | Train Loss: 0.0094, Train Acc: 34.82% | Val Loss: 0.0124, Val Acc: 33.03%
Epoch 9/15 | Train Loss: 0.0089, Train Acc: 36.99% | Val Loss: 0.0122, Val Acc: 33.78%
Epoch 10/15 | Train Loss: 0.0084, Train Acc: 38.78% | Val Loss: 0.0126, Val Acc: 33.61%
Epoch 11/15 | Train Loss: 0.0081, Train Acc: 40.62% | Val Loss: 0.0123, Val Acc: 35.28%
Epoch 12/15 | Train Loss: 0.0078, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▄▄▅▆▆▆▇▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▆▆▇▇▇▇▇█████
Validation Loss,█▄▃▂▂▂▁▁▁▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,46.1166
Train Loss,0.00696
Validation Accuracy,36.94853
Validation Loss,0.01246


[34m[1mwandb[0m: Agent Starting Run: hu8fbifn with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0431, Train Acc: 0.16% | Val Loss: 0.0349, Val Acc: 0.71%
Epoch 2/15 | Train Loss: 0.0281, Train Acc: 2.43% | Val Loss: 0.0242, Val Acc: 5.70%
Epoch 3/15 | Train Loss: 0.0212, Train Acc: 7.43% | Val Loss: 0.0199, Val Acc: 12.54%
Epoch 4/15 | Train Loss: 0.0180, Train Acc: 12.16% | Val Loss: 0.0176, Val Acc: 16.80%
Epoch 5/15 | Train Loss: 0.0160, Train Acc: 15.72% | Val Loss: 0.0163, Val Acc: 19.92%
Epoch 6/15 | Train Loss: 0.0147, Train Acc: 18.64% | Val Loss: 0.0157, Val Acc: 22.23%
Epoch 7/15 | Train Loss: 0.0138, Train Acc: 20.55% | Val Loss: 0.0151, Val Acc: 23.94%
Epoch 8/15 | Train Loss: 0.0131, Train Acc: 22.80% | Val Loss: 0.0145, Val Acc: 25.15%
Epoch 9/15 | Train Loss: 0.0124, Train Acc: 24.60% | Val Loss: 0.0138, Val Acc: 27.28%
Epoch 10/15 | Train Loss: 0.0120, Train Acc: 26.13% | Val Loss: 0.0140, Val Acc: 26.85%
Epoch 11/15 | Train Loss: 0.0116, Train Acc: 27.21% | Val Loss: 0.0134, Val Acc: 28.35%
Epoch 12/15 | Train Loss: 0.0112, Train Acc: 2

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▄▅▆▆▆▇▇▇███
Train Loss,█▅▃▃▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▂▄▅▅▆▆▇▇▇▇▇███
Validation Loss,█▅▃▂▂▂▂▂▁▁▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,31.42601
Train Loss,0.01025
Validation Accuracy,31.72488
Validation Loss,0.01289


[34m[1mwandb[0m: Agent Starting Run: vzokclpj with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: GRU
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0326, Train Acc: 2.43% | Val Loss: 0.0218, Val Acc: 9.30%
Epoch 2/15 | Train Loss: 0.0177, Train Acc: 12.83% | Val Loss: 0.0163, Val Acc: 21.25%
Epoch 3/15 | Train Loss: 0.0141, Train Acc: 20.33% | Val Loss: 0.0153, Val Acc: 24.68%
Epoch 4/15 | Train Loss: 0.0124, Train Acc: 24.90% | Val Loss: 0.0140, Val Acc: 28.25%
Epoch 5/15 | Train Loss: 0.0114, Train Acc: 27.92% | Val Loss: 0.0133, Val Acc: 30.51%
Epoch 6/15 | Train Loss: 0.0106, Train Acc: 30.68% | Val Loss: 0.0132, Val Acc: 31.92%
Epoch 7/15 | Train Loss: 0.0100, Train Acc: 32.85% | Val Loss: 0.0135, Val Acc: 32.34%
Epoch 8/15 | Train Loss: 0.0095, Train Acc: 34.49% | Val Loss: 0.0131, Val Acc: 33.23%
Epoch 9/15 | Train Loss: 0.0092, Train Acc: 35.80% | Val Loss: 0.0124, Val Acc: 32.83%
Epoch 10/15 | Train Loss: 0.0088, Train Acc: 37.28% | Val Loss: 0.0130, Val Acc: 34.19%
Epoch 11/15 | Train Loss: 0.0085, Train Acc: 38.82% | Val Loss: 0.0124, Val Acc: 36.08%
Epoch 12/15 | Train Loss: 0.0082, Train Acc

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▃▄▅▅▆▆▇▇▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▄▅▆▆▇▇▇▇▇█████
Validation Loss,█▄▄▃▂▂▂▂▁▂▁▂▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,42.32852
Train Loss,0.00772
Validation Accuracy,35.57751
Validation Loss,0.0124


[34m[1mwandb[0m: Agent Starting Run: i09cegol with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: GRU
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 1
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.2


Epoch 1/15 | Train Loss: 0.0430, Train Acc: 0.15% | Val Loss: 0.0350, Val Acc: 0.30%
Epoch 2/15 | Train Loss: 0.0318, Train Acc: 1.12% | Val Loss: 0.0280, Val Acc: 2.48%
Epoch 3/15 | Train Loss: 0.0274, Train Acc: 2.96% | Val Loss: 0.0251, Val Acc: 3.54%
Epoch 4/15 | Train Loss: 0.0250, Train Acc: 4.63% | Val Loss: 0.0231, Val Acc: 5.60%
Epoch 5/15 | Train Loss: 0.0233, Train Acc: 6.31% | Val Loss: 0.0215, Val Acc: 7.56%
Epoch 6/15 | Train Loss: 0.0221, Train Acc: 7.81% | Val Loss: 0.0205, Val Acc: 9.22%
Epoch 7/15 | Train Loss: 0.0212, Train Acc: 9.01% | Val Loss: 0.0196, Val Acc: 10.55%
Epoch 8/15 | Train Loss: 0.0206, Train Acc: 9.82% | Val Loss: 0.0193, Val Acc: 10.21%
Epoch 9/15 | Train Loss: 0.0201, Train Acc: 10.48% | Val Loss: 0.0189, Val Acc: 11.34%
Epoch 10/15 | Train Loss: 0.0195, Train Acc: 11.70% | Val Loss: 0.0184, Val Acc: 12.37%
Epoch 11/15 | Train Loss: 0.0191, Train Acc: 12.35% | Val Loss: 0.0181, Val Acc: 12.67%
Epoch 12/15 | Train Loss: 0.0187, Train Acc: 13.15% | V

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▂▃▄▅▅▆▆▇▇▇███
Train Loss,█▅▄▃▃▂▂▂▂▁▁▁▁▁▁
Validation Accuracy,▁▂▃▄▄▅▆▆▆▇▇▇███
Validation Loss,█▅▄▃▃▂▂▂▂▁▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.2
Train Accuracy,14.78179
Train Loss,0.01788
Validation Accuracy,14.48376
Validation Loss,0.01754


[34m[1mwandb[0m: Agent Starting Run: y82ufbi5 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.2


Epoch 1/15 | Train Loss: 0.0368, Train Acc: 1.06% | Val Loss: 0.0248, Val Acc: 4.25%
Epoch 2/15 | Train Loss: 0.0219, Train Acc: 7.82% | Val Loss: 0.0180, Val Acc: 12.71%
Epoch 3/15 | Train Loss: 0.0179, Train Acc: 14.64% | Val Loss: 0.0164, Val Acc: 17.50%
Epoch 4/15 | Train Loss: 0.0159, Train Acc: 19.23% | Val Loss: 0.0147, Val Acc: 22.16%
Epoch 5/15 | Train Loss: 0.0146, Train Acc: 23.06% | Val Loss: 0.0142, Val Acc: 22.78%
Epoch 6/15 | Train Loss: 0.0136, Train Acc: 26.03% | Val Loss: 0.0133, Val Acc: 25.81%
Epoch 7/15 | Train Loss: 0.0128, Train Acc: 28.37% | Val Loss: 0.0132, Val Acc: 26.42%
Epoch 8/15 | Train Loss: 0.0121, Train Acc: 30.51% | Val Loss: 0.0125, Val Acc: 29.51%
Epoch 9/15 | Train Loss: 0.0115, Train Acc: 32.79% | Val Loss: 0.0126, Val Acc: 28.45%
Epoch 10/15 | Train Loss: 0.0110, Train Acc: 34.66% | Val Loss: 0.0125, Val Acc: 30.34%
Epoch 11/15 | Train Loss: 0.0106, Train Acc: 36.12% | Val Loss: 0.0121, Val Acc: 30.25%
Epoch 12/15 | Train Loss: 0.0102, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▄▃▃▂▂▂▂▂▁▁▁▁▁▁
Validation Accuracy,▁▃▄▆▆▇▇▇▇██▇███
Validation Loss,█▄▃▂▂▂▂▁▁▁▁▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.2
Train Accuracy,41.77001
Train Loss,0.00919
Validation Accuracy,30.5913
Validation Loss,0.01214


[34m[1mwandb[0m: Agent Starting Run: qswetr5p with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 1
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.5


Epoch 1/5 | Train Loss: 0.0347, Train Acc: 1.70% | Val Loss: 0.0232, Val Acc: 6.56%
Epoch 2/5 | Train Loss: 0.0199, Train Acc: 10.21% | Val Loss: 0.0178, Val Acc: 15.82%
Epoch 3/5 | Train Loss: 0.0160, Train Acc: 18.36% | Val Loss: 0.0158, Val Acc: 21.61%
Epoch 4/5 | Train Loss: 0.0140, Train Acc: 23.47% | Val Loss: 0.0141, Val Acc: 24.98%
Epoch 5/5 | Train Loss: 0.0126, Train Acc: 27.80% | Val Loss: 0.0138, Val Acc: 27.20%


0,1
Beam Width,▁▁▁▁▁
Epoch,▁▃▅▆█
Learning Rate,▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁
Train Accuracy,▁▃▅▇█
Train Loss,█▃▂▁▁
Validation Accuracy,▁▄▆▇█
Validation Loss,█▄▃▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,5
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.5
Train Accuracy,27.80056
Train Loss,0.01259
Validation Accuracy,27.19822
Validation Loss,0.01378


[34m[1mwandb[0m: Agent Starting Run: 1qnffufq with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/5 | Train Loss: 0.0365, Train Acc: 1.24% | Val Loss: 0.0230, Val Acc: 8.41%
Epoch 2/5 | Train Loss: 0.0176, Train Acc: 13.50% | Val Loss: 0.0158, Val Acc: 22.11%
Epoch 3/5 | Train Loss: 0.0135, Train Acc: 22.07% | Val Loss: 0.0134, Val Acc: 27.95%
Epoch 4/5 | Train Loss: 0.0117, Train Acc: 27.28% | Val Loss: 0.0131, Val Acc: 30.75%
Epoch 5/5 | Train Loss: 0.0105, Train Acc: 30.61% | Val Loss: 0.0128, Val Acc: 31.53%


0,1
Beam Width,▁▁▁▁▁
Epoch,▁▃▅▆█
Learning Rate,▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁
Train Accuracy,▁▄▆▇█
Train Loss,█▃▂▁▁
Validation Accuracy,▁▅▇██
Validation Loss,█▃▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,5
Learning Rate,0.002
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,30.61141
Train Loss,0.01049
Validation Accuracy,31.53339
Validation Loss,0.01277


[34m[1mwandb[0m: Agent Starting Run: w25xpubv with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0438, Train Acc: 0.08% | Val Loss: 0.0358, Val Acc: 0.32%
Epoch 2/15 | Train Loss: 0.0292, Train Acc: 1.88% | Val Loss: 0.0255, Val Acc: 4.89%
Epoch 3/15 | Train Loss: 0.0219, Train Acc: 6.80% | Val Loss: 0.0204, Val Acc: 10.95%
Epoch 4/15 | Train Loss: 0.0185, Train Acc: 11.39% | Val Loss: 0.0178, Val Acc: 15.76%
Epoch 5/15 | Train Loss: 0.0165, Train Acc: 14.84% | Val Loss: 0.0171, Val Acc: 19.15%
Epoch 6/15 | Train Loss: 0.0152, Train Acc: 17.63% | Val Loss: 0.0157, Val Acc: 23.01%
Epoch 7/15 | Train Loss: 0.0142, Train Acc: 20.05% | Val Loss: 0.0153, Val Acc: 23.98%
Epoch 8/15 | Train Loss: 0.0135, Train Acc: 21.84% | Val Loss: 0.0149, Val Acc: 26.59%
Epoch 9/15 | Train Loss: 0.0128, Train Acc: 24.04% | Val Loss: 0.0141, Val Acc: 26.72%
Epoch 10/15 | Train Loss: 0.0123, Train Acc: 25.22% | Val Loss: 0.0139, Val Acc: 27.21%
Epoch 11/15 | Train Loss: 0.0119, Train Acc: 26.34% | Val Loss: 0.0134, Val Acc: 28.33%
Epoch 12/15 | Train Loss: 0.0115, Train Acc: 2

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▃▄▄▅▆▆▆▇▇▇███
Train Loss,█▅▃▃▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▂▃▄▅▆▆▇▇▇▇▇███
Validation Loss,█▅▃▂▂▂▂▂▁▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,30.64517
Train Loss,0.01053
Validation Accuracy,31.67126
Validation Loss,0.01318


[34m[1mwandb[0m: Agent Starting Run: 3z03orn4 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.5


Epoch 1/10 | Train Loss: 0.0419, Train Acc: 0.22% | Val Loss: 0.0325, Val Acc: 0.94%
Epoch 2/10 | Train Loss: 0.0286, Train Acc: 2.24% | Val Loss: 0.0243, Val Acc: 5.22%
Epoch 3/10 | Train Loss: 0.0232, Train Acc: 5.90% | Val Loss: 0.0207, Val Acc: 9.28%
Epoch 4/10 | Train Loss: 0.0204, Train Acc: 9.59% | Val Loss: 0.0190, Val Acc: 12.50%
Epoch 5/10 | Train Loss: 0.0186, Train Acc: 12.64% | Val Loss: 0.0175, Val Acc: 16.70%
Epoch 6/10 | Train Loss: 0.0174, Train Acc: 14.95% | Val Loss: 0.0165, Val Acc: 19.39%
Epoch 7/10 | Train Loss: 0.0165, Train Acc: 16.58% | Val Loss: 0.0157, Val Acc: 20.47%
Epoch 8/10 | Train Loss: 0.0158, Train Acc: 18.53% | Val Loss: 0.0153, Val Acc: 21.81%
Epoch 9/10 | Train Loss: 0.0153, Train Acc: 19.69% | Val Loss: 0.0146, Val Acc: 23.61%
Epoch 10/10 | Train Loss: 0.0148, Train Acc: 20.78% | Val Loss: 0.0146, Val Acc: 23.21%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▆▇▇██
Train Loss,█▅▃▂▂▂▁▁▁▁
Validation Accuracy,▁▂▄▅▆▇▇▇██
Validation Loss,█▅▃▃▂▂▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.5
Train Accuracy,20.78351
Train Loss,0.01481
Validation Accuracy,23.20772
Validation Loss,0.01459


[34m[1mwandb[0m: Agent Starting Run: 19jfbc0j with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 2
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.2


Epoch 1/15 | Train Loss: 0.0439, Train Acc: 0.12% | Val Loss: 0.0326, Val Acc: 0.64%
Epoch 2/15 | Train Loss: 0.0266, Train Acc: 4.00% | Val Loss: 0.0203, Val Acc: 7.67%
Epoch 3/15 | Train Loss: 0.0199, Train Acc: 11.63% | Val Loss: 0.0168, Val Acc: 16.58%
Epoch 4/15 | Train Loss: 0.0172, Train Acc: 17.08% | Val Loss: 0.0152, Val Acc: 20.40%
Epoch 5/15 | Train Loss: 0.0156, Train Acc: 21.08% | Val Loss: 0.0142, Val Acc: 23.65%
Epoch 6/15 | Train Loss: 0.0144, Train Acc: 24.58% | Val Loss: 0.0136, Val Acc: 23.65%
Epoch 7/15 | Train Loss: 0.0136, Train Acc: 26.92% | Val Loss: 0.0135, Val Acc: 26.09%
Epoch 8/15 | Train Loss: 0.0128, Train Acc: 29.44% | Val Loss: 0.0129, Val Acc: 28.04%
Epoch 9/15 | Train Loss: 0.0122, Train Acc: 31.60% | Val Loss: 0.0126, Val Acc: 28.62%
Epoch 10/15 | Train Loss: 0.0117, Train Acc: 33.51% | Val Loss: 0.0123, Val Acc: 29.63%
Epoch 11/15 | Train Loss: 0.0112, Train Acc: 35.07% | Val Loss: 0.0122, Val Acc: 30.66%
Epoch 12/15 | Train Loss: 0.0108, Train Acc: 

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▄▃▃▂▂▂▂▂▁▁▁▁▁▁
Validation Accuracy,▁▃▅▅▆▆▇▇▇██████
Validation Loss,█▄▃▂▂▂▂▁▁▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.2
Train Accuracy,40.61677
Train Loss,0.00974
Validation Accuracy,30.93597
Validation Loss,0.01222


[34m[1mwandb[0m: Agent Starting Run: xut59qhc with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0408, Train Acc: 0.38% | Val Loss: 0.0293, Val Acc: 1.91%
Epoch 2/15 | Train Loss: 0.0215, Train Acc: 8.17% | Val Loss: 0.0185, Val Acc: 13.58%
Epoch 3/15 | Train Loss: 0.0154, Train Acc: 17.78% | Val Loss: 0.0160, Val Acc: 22.40%
Epoch 4/15 | Train Loss: 0.0128, Train Acc: 23.88% | Val Loss: 0.0144, Val Acc: 25.61%
Epoch 5/15 | Train Loss: 0.0113, Train Acc: 28.37% | Val Loss: 0.0136, Val Acc: 29.28%
Epoch 6/15 | Train Loss: 0.0103, Train Acc: 31.63% | Val Loss: 0.0127, Val Acc: 29.72%
Epoch 7/15 | Train Loss: 0.0095, Train Acc: 34.75% | Val Loss: 0.0128, Val Acc: 31.98%
Epoch 8/15 | Train Loss: 0.0089, Train Acc: 37.01% | Val Loss: 0.0122, Val Acc: 33.14%
Epoch 9/15 | Train Loss: 0.0084, Train Acc: 39.17% | Val Loss: 0.0116, Val Acc: 35.23%
Epoch 10/15 | Train Loss: 0.0078, Train Acc: 41.43% | Val Loss: 0.0129, Val Acc: 34.95%
Epoch 11/15 | Train Loss: 0.0075, Train Acc: 43.02% | Val Loss: 0.0113, Val Acc: 36.17%
Epoch 12/15 | Train Loss: 0.0071, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▇▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▆▆▇▇▇███████
Validation Loss,█▄▃▂▂▂▂▁▁▂▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,49.27577
Train Loss,0.00634
Validation Accuracy,36.85662
Validation Loss,0.01191


[34m[1mwandb[0m: Agent Starting Run: r8cmnwz3 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/5 | Train Loss: 0.0450, Train Acc: 0.02% | Val Loss: 0.0397, Val Acc: 0.05%
Epoch 2/5 | Train Loss: 0.0295, Train Acc: 2.03% | Val Loss: 0.0237, Val Acc: 7.38%
Epoch 3/5 | Train Loss: 0.0188, Train Acc: 12.03% | Val Loss: 0.0170, Val Acc: 19.71%
Epoch 4/5 | Train Loss: 0.0145, Train Acc: 20.49% | Val Loss: 0.0150, Val Acc: 24.78%
Epoch 5/5 | Train Loss: 0.0125, Train Acc: 25.58% | Val Loss: 0.0143, Val Acc: 29.35%


0,1
Beam Width,▁▁▁▁▁
Epoch,▁▃▅▆█
Learning Rate,▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁
Train Accuracy,▁▂▄▇█
Train Loss,█▅▂▁▁
Validation Accuracy,▁▃▆▇█
Validation Loss,█▄▂▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,5
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,25.57621
Train Loss,0.01249
Validation Accuracy,29.35049
Validation Loss,0.01431


[34m[1mwandb[0m: Agent Starting Run: wb25c5ir with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0431, Train Acc: 0.11% | Val Loss: 0.0342, Val Acc: 0.57%
Epoch 2/15 | Train Loss: 0.0287, Train Acc: 1.96% | Val Loss: 0.0251, Val Acc: 4.69%
Epoch 3/15 | Train Loss: 0.0226, Train Acc: 6.02% | Val Loss: 0.0206, Val Acc: 9.47%
Epoch 4/15 | Train Loss: 0.0193, Train Acc: 9.96% | Val Loss: 0.0183, Val Acc: 14.13%
Epoch 5/15 | Train Loss: 0.0173, Train Acc: 13.03% | Val Loss: 0.0168, Val Acc: 19.30%
Epoch 6/15 | Train Loss: 0.0159, Train Acc: 15.77% | Val Loss: 0.0161, Val Acc: 21.01%
Epoch 7/15 | Train Loss: 0.0151, Train Acc: 17.75% | Val Loss: 0.0157, Val Acc: 22.73%
Epoch 8/15 | Train Loss: 0.0141, Train Acc: 20.02% | Val Loss: 0.0147, Val Acc: 24.53%
Epoch 9/15 | Train Loss: 0.0134, Train Acc: 21.65% | Val Loss: 0.0150, Val Acc: 26.27%
Epoch 10/15 | Train Loss: 0.0129, Train Acc: 22.79% | Val Loss: 0.0142, Val Acc: 27.58%
Epoch 11/15 | Train Loss: 0.0125, Train Acc: 24.17% | Val Loss: 0.0138, Val Acc: 28.95%
Epoch 12/15 | Train Loss: 0.0122, Train Acc: 25.

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▂▃▄▅▅▆▆▇▇▇███
Train Loss,█▅▃▃▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▂▃▄▅▆▆▆▇▇▇▇███
Validation Loss,█▅▄▃▂▂▂▂▂▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,28.09
Train Loss,0.01124
Validation Accuracy,31.18873
Validation Loss,0.01302


[34m[1mwandb[0m: Agent Starting Run: zeg78qwz with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 2
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0428, Train Acc: 0.07% | Val Loss: 0.0357, Val Acc: 0.28%
Epoch 2/15 | Train Loss: 0.0255, Train Acc: 4.46% | Val Loss: 0.0208, Val Acc: 11.12%
Epoch 3/15 | Train Loss: 0.0166, Train Acc: 15.64% | Val Loss: 0.0165, Val Acc: 21.55%
Epoch 4/15 | Train Loss: 0.0134, Train Acc: 23.30% | Val Loss: 0.0150, Val Acc: 26.01%
Epoch 5/15 | Train Loss: 0.0115, Train Acc: 28.25% | Val Loss: 0.0134, Val Acc: 29.25%
Epoch 6/15 | Train Loss: 0.0103, Train Acc: 32.16% | Val Loss: 0.0127, Val Acc: 32.50%
Epoch 7/15 | Train Loss: 0.0094, Train Acc: 35.76% | Val Loss: 0.0125, Val Acc: 34.74%
Epoch 8/15 | Train Loss: 0.0088, Train Acc: 38.32% | Val Loss: 0.0124, Val Acc: 35.00%
Epoch 9/15 | Train Loss: 0.0082, Train Acc: 40.74% | Val Loss: 0.0114, Val Acc: 36.40%
Epoch 10/15 | Train Loss: 0.0076, Train Acc: 43.06% | Val Loss: 0.0114, Val Acc: 36.79%
Epoch 11/15 | Train Loss: 0.0073, Train Acc: 45.27% | Val Loss: 0.0119, Val Acc: 36.76%
Epoch 12/15 | Train Loss: 0.0069, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▅▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▆▆▇▇▇███████
Validation Loss,█▄▂▂▂▁▁▁▁▁▁▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,51.88612
Train Loss,0.006
Validation Accuracy,38.90165
Validation Loss,0.01171


[34m[1mwandb[0m: Agent Starting Run: 5xlzwtf2 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0362, Train Acc: 1.40% | Val Loss: 0.0227, Val Acc: 7.70%
Epoch 2/15 | Train Loss: 0.0173, Train Acc: 13.93% | Val Loss: 0.0159, Val Acc: 21.92%
Epoch 3/15 | Train Loss: 0.0131, Train Acc: 22.72% | Val Loss: 0.0134, Val Acc: 27.71%
Epoch 4/15 | Train Loss: 0.0114, Train Acc: 28.16% | Val Loss: 0.0124, Val Acc: 30.28%
Epoch 5/15 | Train Loss: 0.0103, Train Acc: 31.98% | Val Loss: 0.0119, Val Acc: 32.84%
Epoch 6/15 | Train Loss: 0.0095, Train Acc: 35.00% | Val Loss: 0.0123, Val Acc: 33.03%
Epoch 7/15 | Train Loss: 0.0089, Train Acc: 37.36% | Val Loss: 0.0124, Val Acc: 34.60%
Epoch 8/15 | Train Loss: 0.0084, Train Acc: 39.04% | Val Loss: 0.0119, Val Acc: 35.86%
Epoch 9/15 | Train Loss: 0.0081, Train Acc: 40.86% | Val Loss: 0.0116, Val Acc: 36.09%
Epoch 10/15 | Train Loss: 0.0077, Train Acc: 42.20% | Val Loss: 0.0115, Val Acc: 36.37%
Epoch 11/15 | Train Loss: 0.0075, Train Acc: 43.69% | Val Loss: 0.0117, Val Acc: 36.91%
Epoch 12/15 | Train Loss: 0.0072, Train Acc

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▃▄▅▆▆▆▇▇▇▇████
Train Loss,█▄▃▂▂▂▂▁▁▁▁▁▁▁▁
Validation Accuracy,▁▄▆▆▇▇▇▇███████
Validation Loss,█▄▂▂▁▂▂▁▁▁▁▁▁▁▂

0,1
Beam Width,5
Bidirectional,False
Epoch,15
Learning Rate,0.002
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,48.42412
Train Loss,0.00657
Validation Accuracy,37.77574
Validation Loss,0.0123


[34m[1mwandb[0m: Agent Starting Run: ccsgya94 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0346, Train Acc: 1.71% | Val Loss: 0.0235, Val Acc: 6.92%
Epoch 2/10 | Train Loss: 0.0177, Train Acc: 13.09% | Val Loss: 0.0165, Val Acc: 20.60%
Epoch 3/10 | Train Loss: 0.0136, Train Acc: 21.88% | Val Loss: 0.0156, Val Acc: 25.05%
Epoch 4/10 | Train Loss: 0.0117, Train Acc: 27.11% | Val Loss: 0.0134, Val Acc: 29.17%
Epoch 5/10 | Train Loss: 0.0104, Train Acc: 31.20% | Val Loss: 0.0133, Val Acc: 30.69%
Epoch 6/10 | Train Loss: 0.0094, Train Acc: 34.93% | Val Loss: 0.0127, Val Acc: 31.47%
Epoch 7/10 | Train Loss: 0.0088, Train Acc: 37.76% | Val Loss: 0.0125, Val Acc: 33.19%
Epoch 8/10 | Train Loss: 0.0082, Train Acc: 40.40% | Val Loss: 0.0124, Val Acc: 34.44%
Epoch 9/10 | Train Loss: 0.0077, Train Acc: 42.92% | Val Loss: 0.0120, Val Acc: 34.70%
Epoch 10/10 | Train Loss: 0.0074, Train Acc: 44.58% | Val Loss: 0.0121, Val Acc: 34.81%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▃▄▅▆▆▇▇██
Train Loss,█▄▃▂▂▂▁▁▁▁
Validation Accuracy,▁▄▆▇▇▇████
Validation Loss,█▄▃▂▂▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,44.57675
Train Loss,0.0074
Validation Accuracy,34.81158
Validation Loss,0.01211


[34m[1mwandb[0m: Agent Starting Run: fc0qx4nz with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 1
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.5


Epoch 1/15 | Train Loss: 0.0389, Train Acc: 0.74% | Val Loss: 0.0265, Val Acc: 3.38%
Epoch 2/15 | Train Loss: 0.0217, Train Acc: 8.27% | Val Loss: 0.0180, Val Acc: 14.23%
Epoch 3/15 | Train Loss: 0.0167, Train Acc: 16.92% | Val Loss: 0.0157, Val Acc: 20.32%
Epoch 4/15 | Train Loss: 0.0144, Train Acc: 22.39% | Val Loss: 0.0147, Val Acc: 22.20%
Epoch 5/15 | Train Loss: 0.0129, Train Acc: 26.47% | Val Loss: 0.0130, Val Acc: 27.11%
Epoch 6/15 | Train Loss: 0.0119, Train Acc: 29.78% | Val Loss: 0.0131, Val Acc: 28.19%
Epoch 7/15 | Train Loss: 0.0110, Train Acc: 33.09% | Val Loss: 0.0124, Val Acc: 31.04%
Epoch 8/15 | Train Loss: 0.0103, Train Acc: 35.33% | Val Loss: 0.0123, Val Acc: 31.27%
Epoch 9/15 | Train Loss: 0.0096, Train Acc: 37.89% | Val Loss: 0.0117, Val Acc: 31.01%
Epoch 10/15 | Train Loss: 0.0091, Train Acc: 39.87% | Val Loss: 0.0122, Val Acc: 32.44%
Epoch 11/15 | Train Loss: 0.0088, Train Acc: 41.56% | Val Loss: 0.0119, Val Acc: 33.33%
Epoch 12/15 | Train Loss: 0.0084, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▇▇▇▇███
Train Loss,█▄▃▃▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▅▆▇▇▇▇██████
Validation Loss,█▄▃▂▂▂▁▁▁▁▁▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.5
Train Accuracy,47.81044
Train Loss,0.00739
Validation Accuracy,33.68566
Validation Loss,0.01224


[34m[1mwandb[0m: Agent Starting Run: vuyrbjcu with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0450, Train Acc: 0.02% | Val Loss: 0.0398, Val Acc: 0.11%
Epoch 2/15 | Train Loss: 0.0283, Train Acc: 2.94% | Val Loss: 0.0216, Val Acc: 10.29%
Epoch 3/15 | Train Loss: 0.0170, Train Acc: 15.30% | Val Loss: 0.0168, Val Acc: 20.30%
Epoch 4/15 | Train Loss: 0.0132, Train Acc: 23.75% | Val Loss: 0.0143, Val Acc: 27.37%
Epoch 5/15 | Train Loss: 0.0114, Train Acc: 28.94% | Val Loss: 0.0131, Val Acc: 30.84%
Epoch 6/15 | Train Loss: 0.0102, Train Acc: 32.64% | Val Loss: 0.0122, Val Acc: 33.82%
Epoch 7/15 | Train Loss: 0.0093, Train Acc: 35.83% | Val Loss: 0.0123, Val Acc: 33.59%
Epoch 8/15 | Train Loss: 0.0085, Train Acc: 38.67% | Val Loss: 0.0116, Val Acc: 35.80%
Epoch 9/15 | Train Loss: 0.0081, Train Acc: 41.35% | Val Loss: 0.0114, Val Acc: 35.96%
Epoch 10/15 | Train Loss: 0.0075, Train Acc: 43.70% | Val Loss: 0.0119, Val Acc: 37.34%
Epoch 11/15 | Train Loss: 0.0071, Train Acc: 45.76% | Val Loss: 0.0119, Val Acc: 37.39%
Epoch 12/15 | Train Loss: 0.0066, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▅▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▆▆▇▇▇▇██████
Validation Loss,█▄▂▂▁▁▁▁▁▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,52.82156
Train Loss,0.0057
Validation Accuracy,39.93566
Validation Loss,0.01179


[34m[1mwandb[0m: Agent Starting Run: jrigbc9l with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0422, Train Acc: 0.20% | Val Loss: 0.0320, Val Acc: 1.47%
Epoch 2/15 | Train Loss: 0.0225, Train Acc: 7.41% | Val Loss: 0.0190, Val Acc: 14.15%
Epoch 3/15 | Train Loss: 0.0153, Train Acc: 17.90% | Val Loss: 0.0159, Val Acc: 23.02%
Epoch 4/15 | Train Loss: 0.0128, Train Acc: 24.16% | Val Loss: 0.0143, Val Acc: 27.27%
Epoch 5/15 | Train Loss: 0.0112, Train Acc: 28.67% | Val Loss: 0.0129, Val Acc: 29.27%
Epoch 6/15 | Train Loss: 0.0100, Train Acc: 32.72% | Val Loss: 0.0124, Val Acc: 32.02%
Epoch 7/15 | Train Loss: 0.0094, Train Acc: 35.40% | Val Loss: 0.0118, Val Acc: 32.46%
Epoch 8/15 | Train Loss: 0.0086, Train Acc: 38.22% | Val Loss: 0.0120, Val Acc: 34.05%
Epoch 9/15 | Train Loss: 0.0080, Train Acc: 40.92% | Val Loss: 0.0118, Val Acc: 35.03%
Epoch 10/15 | Train Loss: 0.0075, Train Acc: 43.19% | Val Loss: 0.0120, Val Acc: 35.26%
Epoch 11/15 | Train Loss: 0.0072, Train Acc: 45.40% | Val Loss: 0.0120, Val Acc: 36.08%
Epoch 12/15 | Train Loss: 0.0068, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▇▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▆▆▇▇▇███████
Validation Loss,█▄▃▂▁▁▁▁▁▁▁▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,51.79471
Train Loss,0.0059
Validation Accuracy,35.96814
Validation Loss,0.01167


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: h5e8iel9 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.5


Epoch 1/10 | Train Loss: 0.0462, Train Acc: 0.01% | Val Loss: 0.0412, Val Acc: 0.05%
Epoch 2/10 | Train Loss: 0.0331, Train Acc: 1.07% | Val Loss: 0.0253, Val Acc: 3.71%
Epoch 3/10 | Train Loss: 0.0215, Train Acc: 8.83% | Val Loss: 0.0178, Val Acc: 16.57%
Epoch 4/10 | Train Loss: 0.0167, Train Acc: 17.29% | Val Loss: 0.0151, Val Acc: 23.67%
Epoch 5/10 | Train Loss: 0.0145, Train Acc: 22.20% | Val Loss: 0.0137, Val Acc: 27.00%
Epoch 6/10 | Train Loss: 0.0131, Train Acc: 26.12% | Val Loss: 0.0133, Val Acc: 29.28%
Epoch 7/10 | Train Loss: 0.0121, Train Acc: 28.95% | Val Loss: 0.0125, Val Acc: 31.99%
Epoch 8/10 | Train Loss: 0.0112, Train Acc: 32.18% | Val Loss: 0.0119, Val Acc: 30.82%
Epoch 9/10 | Train Loss: 0.0104, Train Acc: 34.43% | Val Loss: 0.0120, Val Acc: 33.04%
Epoch 10/10 | Train Loss: 0.0099, Train Acc: 36.41% | Val Loss: 0.0111, Val Acc: 34.74%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▃▄▅▆▇▇██
Train Loss,█▅▃▂▂▂▁▁▁▁
Validation Accuracy,▁▂▄▆▆▇▇▇██
Validation Loss,█▄▃▂▂▁▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.5
Train Accuracy,36.4081
Train Loss,0.00994
Validation Accuracy,34.74265
Validation Loss,0.01114


[34m[1mwandb[0m: Agent Starting Run: m1gvkgyj with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 2
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0335, Train Acc: 2.31% | Val Loss: 0.0224, Val Acc: 7.95%
Epoch 2/15 | Train Loss: 0.0175, Train Acc: 13.31% | Val Loss: 0.0174, Val Acc: 21.04%
Epoch 3/15 | Train Loss: 0.0137, Train Acc: 21.29% | Val Loss: 0.0145, Val Acc: 24.70%
Epoch 4/15 | Train Loss: 0.0118, Train Acc: 26.80% | Val Loss: 0.0130, Val Acc: 28.96%
Epoch 5/15 | Train Loss: 0.0107, Train Acc: 30.55% | Val Loss: 0.0135, Val Acc: 30.51%
Epoch 6/15 | Train Loss: 0.0099, Train Acc: 33.72% | Val Loss: 0.0124, Val Acc: 31.78%
Epoch 7/15 | Train Loss: 0.0091, Train Acc: 36.65% | Val Loss: 0.0125, Val Acc: 32.67%
Epoch 8/15 | Train Loss: 0.0086, Train Acc: 38.71% | Val Loss: 0.0121, Val Acc: 32.47%
Epoch 9/15 | Train Loss: 0.0082, Train Acc: 40.66% | Val Loss: 0.0130, Val Acc: 34.21%
Epoch 10/15 | Train Loss: 0.0077, Train Acc: 42.79% | Val Loss: 0.0125, Val Acc: 33.98%
Epoch 11/15 | Train Loss: 0.0074, Train Acc: 44.07% | Val Loss: 0.0118, Val Acc: 35.08%
Epoch 12/15 | Train Loss: 0.0072, Train Acc

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▃▄▅▅▆▆▆▇▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▄▅▆▇▇▇▇███████
Validation Loss,█▅▃▂▂▁▁▁▂▁▁▁▂▂▁

0,1
Beam Width,3
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,49.76635
Train Loss,0.00642
Validation Accuracy,34.48223
Validation Loss,0.01229


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: e5pikdlw with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 2
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0450, Train Acc: 0.00% | Val Loss: 0.0408, Val Acc: 0.05%
Epoch 2/10 | Train Loss: 0.0318, Train Acc: 1.30% | Val Loss: 0.0265, Val Acc: 4.66%
Epoch 3/10 | Train Loss: 0.0200, Train Acc: 10.16% | Val Loss: 0.0191, Val Acc: 15.92%
Epoch 4/10 | Train Loss: 0.0152, Train Acc: 18.82% | Val Loss: 0.0150, Val Acc: 24.51%
Epoch 5/10 | Train Loss: 0.0128, Train Acc: 24.42% | Val Loss: 0.0148, Val Acc: 28.11%
Epoch 6/10 | Train Loss: 0.0115, Train Acc: 28.99% | Val Loss: 0.0139, Val Acc: 30.50%
Epoch 7/10 | Train Loss: 0.0104, Train Acc: 31.75% | Val Loss: 0.0133, Val Acc: 31.65%
Epoch 8/10 | Train Loss: 0.0096, Train Acc: 34.73% | Val Loss: 0.0124, Val Acc: 34.27%
Epoch 9/10 | Train Loss: 0.0090, Train Acc: 37.16% | Val Loss: 0.0117, Val Acc: 35.96%
Epoch 10/10 | Train Loss: 0.0084, Train Acc: 39.54% | Val Loss: 0.0115, Val Acc: 36.38%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▃▄▅▆▇▇██
Train Loss,█▅▃▂▂▂▁▁▁▁
Validation Accuracy,▁▂▄▆▆▇▇███
Validation Loss,█▅▃▂▂▂▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,39.5401
Train Loss,0.00844
Validation Accuracy,36.38174
Validation Loss,0.01151


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: xtfbujvx with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 1
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7




Epoch 1/10 | Train Loss: 0.0429, Train Acc: 0.04% | Val Loss: 0.0357, Val Acc: 0.37%
Epoch 2/10 | Train Loss: 0.0247, Train Acc: 5.13% | Val Loss: 0.0209, Val Acc: 11.21%
Epoch 3/10 | Train Loss: 0.0165, Train Acc: 16.20% | Val Loss: 0.0167, Val Acc: 20.60%
Epoch 4/10 | Train Loss: 0.0133, Train Acc: 23.28% | Val Loss: 0.0153, Val Acc: 25.27%
Epoch 5/10 | Train Loss: 0.0116, Train Acc: 28.04% | Val Loss: 0.0137, Val Acc: 28.16%
Epoch 6/10 | Train Loss: 0.0105, Train Acc: 31.61% | Val Loss: 0.0132, Val Acc: 30.21%
Epoch 7/10 | Train Loss: 0.0098, Train Acc: 34.74% | Val Loss: 0.0133, Val Acc: 31.47%
Epoch 8/10 | Train Loss: 0.0091, Train Acc: 37.88% | Val Loss: 0.0127, Val Acc: 32.37%
Epoch 9/10 | Train Loss: 0.0085, Train Acc: 39.82% | Val Loss: 0.0125, Val Acc: 33.10%
Epoch 10/10 | Train Loss: 0.0080, Train Acc: 41.95% | Val Loss: 0.0124, Val Acc: 33.45%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▄▅▆▆▇▇██
Train Loss,█▄▃▂▂▁▁▁▁▁
Validation Accuracy,▁▃▅▆▇▇████
Validation Loss,█▄▂▂▁▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,41.94746
Train Loss,0.00801
Validation Accuracy,33.44822
Validation Loss,0.01241


[34m[1mwandb[0m: Agent Starting Run: j7xim4eq with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 2
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.2


Epoch 1/15 | Train Loss: 0.0465, Train Acc: 0.01% | Val Loss: 0.0401, Val Acc: 0.05%
Epoch 2/15 | Train Loss: 0.0326, Train Acc: 1.30% | Val Loss: 0.0249, Val Acc: 4.14%
Epoch 3/15 | Train Loss: 0.0225, Train Acc: 8.20% | Val Loss: 0.0187, Val Acc: 12.84%
Epoch 4/15 | Train Loss: 0.0182, Train Acc: 15.43% | Val Loss: 0.0156, Val Acc: 20.30%
Epoch 5/15 | Train Loss: 0.0160, Train Acc: 20.52% | Val Loss: 0.0146, Val Acc: 22.96%
Epoch 6/15 | Train Loss: 0.0145, Train Acc: 24.28% | Val Loss: 0.0140, Val Acc: 25.21%
Epoch 7/15 | Train Loss: 0.0134, Train Acc: 27.46% | Val Loss: 0.0135, Val Acc: 27.31%
Epoch 8/15 | Train Loss: 0.0125, Train Acc: 30.39% | Val Loss: 0.0129, Val Acc: 29.07%
Epoch 9/15 | Train Loss: 0.0117, Train Acc: 33.12% | Val Loss: 0.0129, Val Acc: 29.49%
Epoch 10/15 | Train Loss: 0.0111, Train Acc: 35.09% | Val Loss: 0.0121, Val Acc: 31.33%
Epoch 11/15 | Train Loss: 0.0106, Train Acc: 37.61% | Val Loss: 0.0118, Val Acc: 31.23%
Epoch 12/15 | Train Loss: 0.0101, Train Acc: 3

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▂▃▄▅▅▆▆▇▇▇▇██
Train Loss,█▅▄▃▂▂▂▂▂▁▁▁▁▁▁
Validation Accuracy,▁▂▄▅▆▆▇▇▇██████
Validation Loss,█▄▃▂▂▂▂▁▁▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.2
Train Accuracy,44.62821
Train Loss,0.00881
Validation Accuracy,33.3027
Validation Loss,0.0114


[34m[1mwandb[0m: Agent Starting Run: ckt0dif2 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0418, Train Acc: 0.16% | Val Loss: 0.0341, Val Acc: 0.94%
Epoch 2/15 | Train Loss: 0.0241, Train Acc: 5.87% | Val Loss: 0.0200, Val Acc: 13.18%
Epoch 3/15 | Train Loss: 0.0159, Train Acc: 17.36% | Val Loss: 0.0162, Val Acc: 21.52%
Epoch 4/15 | Train Loss: 0.0128, Train Acc: 24.75% | Val Loss: 0.0145, Val Acc: 27.99%
Epoch 5/15 | Train Loss: 0.0111, Train Acc: 29.97% | Val Loss: 0.0142, Val Acc: 30.13%
Epoch 6/15 | Train Loss: 0.0099, Train Acc: 33.62% | Val Loss: 0.0131, Val Acc: 32.97%
Epoch 7/15 | Train Loss: 0.0091, Train Acc: 36.72% | Val Loss: 0.0127, Val Acc: 34.28%
Epoch 8/15 | Train Loss: 0.0084, Train Acc: 39.60% | Val Loss: 0.0123, Val Acc: 35.34%
Epoch 9/15 | Train Loss: 0.0080, Train Acc: 41.91% | Val Loss: 0.0120, Val Acc: 36.08%
Epoch 10/15 | Train Loss: 0.0075, Train Acc: 44.05% | Val Loss: 0.0114, Val Acc: 35.52%
Epoch 11/15 | Train Loss: 0.0071, Train Acc: 45.99% | Val Loss: 0.0114, Val Acc: 37.53%
Epoch 12/15 | Train Loss: 0.0066, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▇▇▇▇███
Train Loss,█▅▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▆▆▇▇▇▇▇█████
Validation Loss,█▄▃▂▂▂▁▁▁▁▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,53.06427
Train Loss,0.00574
Validation Accuracy,39.49908
Validation Loss,0.01217


[34m[1mwandb[0m: Agent Starting Run: 9tiwvipr with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0443, Train Acc: 0.06% | Val Loss: 0.0355, Val Acc: 0.46%
Epoch 2/10 | Train Loss: 0.0251, Train Acc: 5.17% | Val Loss: 0.0206, Val Acc: 12.78%
Epoch 3/10 | Train Loss: 0.0166, Train Acc: 15.84% | Val Loss: 0.0161, Val Acc: 21.61%
Epoch 4/10 | Train Loss: 0.0134, Train Acc: 22.65% | Val Loss: 0.0149, Val Acc: 26.75%
Epoch 5/10 | Train Loss: 0.0115, Train Acc: 28.02% | Val Loss: 0.0139, Val Acc: 28.98%
Epoch 6/10 | Train Loss: 0.0105, Train Acc: 31.37% | Val Loss: 0.0129, Val Acc: 31.58%
Epoch 7/10 | Train Loss: 0.0096, Train Acc: 34.57% | Val Loss: 0.0125, Val Acc: 33.49%
Epoch 8/10 | Train Loss: 0.0089, Train Acc: 37.58% | Val Loss: 0.0122, Val Acc: 33.82%
Epoch 9/10 | Train Loss: 0.0083, Train Acc: 39.75% | Val Loss: 0.0120, Val Acc: 34.62%
Epoch 10/10 | Train Loss: 0.0078, Train Acc: 42.62% | Val Loss: 0.0122, Val Acc: 35.03%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▄▅▆▆▇▇██
Train Loss,█▄▃▂▂▂▁▁▁▁
Validation Accuracy,▁▃▅▆▇▇████
Validation Loss,█▄▂▂▂▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,42.62475
Train Loss,0.00782
Validation Accuracy,35.0337
Validation Loss,0.01219


[34m[1mwandb[0m: Agent Starting Run: s2xrt35j with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0437, Train Acc: 0.05% | Val Loss: 0.0375, Val Acc: 0.23%
Epoch 2/10 | Train Loss: 0.0272, Train Acc: 3.37% | Val Loss: 0.0212, Val Acc: 10.96%
Epoch 3/10 | Train Loss: 0.0171, Train Acc: 14.62% | Val Loss: 0.0158, Val Acc: 22.10%
Epoch 4/10 | Train Loss: 0.0135, Train Acc: 22.66% | Val Loss: 0.0139, Val Acc: 27.29%
Epoch 5/10 | Train Loss: 0.0117, Train Acc: 27.48% | Val Loss: 0.0132, Val Acc: 30.55%
Epoch 6/10 | Train Loss: 0.0105, Train Acc: 31.27% | Val Loss: 0.0124, Val Acc: 32.31%
Epoch 7/10 | Train Loss: 0.0096, Train Acc: 34.63% | Val Loss: 0.0125, Val Acc: 33.81%
Epoch 8/10 | Train Loss: 0.0088, Train Acc: 37.35% | Val Loss: 0.0120, Val Acc: 34.05%
Epoch 9/10 | Train Loss: 0.0083, Train Acc: 40.06% | Val Loss: 0.0115, Val Acc: 35.30%
Epoch 10/10 | Train Loss: 0.0077, Train Acc: 42.48% | Val Loss: 0.0117, Val Acc: 35.90%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▅▆▆▇▇██
Train Loss,█▅▃▂▂▂▁▁▁▁
Validation Accuracy,▁▃▅▆▇▇████
Validation Loss,█▄▂▂▁▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,42.48209
Train Loss,0.00773
Validation Accuracy,35.8992
Validation Loss,0.0117


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: nypnry2g with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.2


Epoch 1/5 | Train Loss: 0.0475, Train Acc: 0.00% | Val Loss: 0.0432, Val Acc: 0.00%
Epoch 2/5 | Train Loss: 0.0370, Train Acc: 0.35% | Val Loss: 0.0309, Val Acc: 0.99%
Epoch 3/5 | Train Loss: 0.0279, Train Acc: 2.60% | Val Loss: 0.0239, Val Acc: 5.91%
Epoch 4/5 | Train Loss: 0.0232, Train Acc: 6.64% | Val Loss: 0.0200, Val Acc: 10.75%
Epoch 5/5 | Train Loss: 0.0206, Train Acc: 9.90% | Val Loss: 0.0181, Val Acc: 15.73%


0,1
Beam Width,▁▁▁▁▁
Epoch,▁▃▅▆█
Learning Rate,▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁
Train Accuracy,▁▁▃▆█
Train Loss,█▅▃▂▁
Validation Accuracy,▁▁▄▆█
Validation Loss,█▅▃▂▁

0,1
Beam Width,1
Bidirectional,False
Epoch,5
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.2
Train Accuracy,9.89872
Train Loss,0.0206
Validation Accuracy,15.73223
Validation Loss,0.01811


[34m[1mwandb[0m: Agent Starting Run: vowpqdj2 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0453, Train Acc: 0.01% | Val Loss: 0.0407, Val Acc: 0.05%
Epoch 2/15 | Train Loss: 0.0295, Train Acc: 2.28% | Val Loss: 0.0228, Val Acc: 8.43%
Epoch 3/15 | Train Loss: 0.0185, Train Acc: 12.35% | Val Loss: 0.0168, Val Acc: 20.53%
Epoch 4/15 | Train Loss: 0.0144, Train Acc: 20.16% | Val Loss: 0.0145, Val Acc: 26.23%
Epoch 5/15 | Train Loss: 0.0124, Train Acc: 25.36% | Val Loss: 0.0137, Val Acc: 28.66%
Epoch 6/15 | Train Loss: 0.0111, Train Acc: 29.20% | Val Loss: 0.0127, Val Acc: 31.10%
Epoch 7/15 | Train Loss: 0.0102, Train Acc: 32.25% | Val Loss: 0.0123, Val Acc: 34.54%
Epoch 8/15 | Train Loss: 0.0095, Train Acc: 34.70% | Val Loss: 0.0119, Val Acc: 34.57%
Epoch 9/15 | Train Loss: 0.0089, Train Acc: 37.37% | Val Loss: 0.0118, Val Acc: 35.23%
Epoch 10/15 | Train Loss: 0.0084, Train Acc: 39.31% | Val Loss: 0.0117, Val Acc: 36.75%
Epoch 11/15 | Train Loss: 0.0080, Train Acc: 41.16% | Val Loss: 0.0117, Val Acc: 37.55%
Epoch 12/15 | Train Loss: 0.0077, Train Acc: 

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▅▃▂▂▂▂▁▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▆▆▇▇▇▇██████
Validation Loss,█▄▂▂▂▁▁▁▁▁▁▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,47.56196
Train Loss,0.0067
Validation Accuracy,38.65656
Validation Loss,0.01133


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: ezd0aajp with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 2
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.5


Epoch 1/15 | Train Loss: 0.0418, Train Acc: 0.28% | Val Loss: 0.0295, Val Acc: 2.00%
Epoch 2/15 | Train Loss: 0.0227, Train Acc: 7.60% | Val Loss: 0.0189, Val Acc: 13.89%
Epoch 3/15 | Train Loss: 0.0166, Train Acc: 17.61% | Val Loss: 0.0157, Val Acc: 19.66%
Epoch 4/15 | Train Loss: 0.0140, Train Acc: 23.39% | Val Loss: 0.0142, Val Acc: 25.17%
Epoch 5/15 | Train Loss: 0.0124, Train Acc: 28.25% | Val Loss: 0.0128, Val Acc: 28.95%
Epoch 6/15 | Train Loss: 0.0113, Train Acc: 31.96% | Val Loss: 0.0129, Val Acc: 30.65%
Epoch 7/15 | Train Loss: 0.0105, Train Acc: 34.97% | Val Loss: 0.0121, Val Acc: 31.50%
Epoch 8/15 | Train Loss: 0.0096, Train Acc: 37.80% | Val Loss: 0.0119, Val Acc: 33.36%
Epoch 9/15 | Train Loss: 0.0092, Train Acc: 39.99% | Val Loss: 0.0115, Val Acc: 33.96%
Epoch 10/15 | Train Loss: 0.0087, Train Acc: 42.14% | Val Loss: 0.0115, Val Acc: 35.72%
Epoch 11/15 | Train Loss: 0.0081, Train Acc: 44.65% | Val Loss: 0.0116, Val Acc: 34.06%
Epoch 12/15 | Train Loss: 0.0077, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▆▇▇▇████████
Validation Loss,█▄▃▂▂▂▁▁▁▁▁▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.5
Train Accuracy,51.25329
Train Loss,0.00682
Validation Accuracy,34.75031
Validation Loss,0.01155


[34m[1mwandb[0m: Agent Starting Run: ufjyno3n with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 256
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 64
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0450, Train Acc: 0.02% | Val Loss: 0.0388, Val Acc: 0.18%
Epoch 2/15 | Train Loss: 0.0317, Train Acc: 1.04% | Val Loss: 0.0280, Val Acc: 3.45%
Epoch 3/15 | Train Loss: 0.0241, Train Acc: 4.77% | Val Loss: 0.0216, Val Acc: 9.69%
Epoch 4/15 | Train Loss: 0.0200, Train Acc: 9.04% | Val Loss: 0.0190, Val Acc: 13.53%
Epoch 5/15 | Train Loss: 0.0175, Train Acc: 12.88% | Val Loss: 0.0173, Val Acc: 17.00%
Epoch 6/15 | Train Loss: 0.0162, Train Acc: 15.89% | Val Loss: 0.0167, Val Acc: 21.38%
Epoch 7/15 | Train Loss: 0.0150, Train Acc: 18.25% | Val Loss: 0.0156, Val Acc: 23.11%
Epoch 8/15 | Train Loss: 0.0141, Train Acc: 20.40% | Val Loss: 0.0151, Val Acc: 24.60%
Epoch 9/15 | Train Loss: 0.0134, Train Acc: 22.03% | Val Loss: 0.0148, Val Acc: 26.27%
Epoch 10/15 | Train Loss: 0.0128, Train Acc: 23.57% | Val Loss: 0.0143, Val Acc: 27.23%
Epoch 11/15 | Train Loss: 0.0124, Train Acc: 24.84% | Val Loss: 0.0145, Val Acc: 29.04%
Epoch 12/15 | Train Loss: 0.0119, Train Acc: 26.

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▂▃▄▅▅▆▆▇▇▇███
Train Loss,█▅▄▃▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▂▃▄▅▆▆▇▇▇█████
Validation Loss,█▅▃▃▂▂▂▂▂▂▂▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,29.3608
Train Loss,0.01099
Validation Accuracy,30.86703
Validation Loss,0.01303


[34m[1mwandb[0m: Agent Starting Run: eygt6utb with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 2
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0407, Train Acc: 0.24% | Val Loss: 0.0311, Val Acc: 0.92%
Epoch 2/15 | Train Loss: 0.0212, Train Acc: 8.56% | Val Loss: 0.0183, Val Acc: 17.51%
Epoch 3/15 | Train Loss: 0.0145, Train Acc: 20.26% | Val Loss: 0.0149, Val Acc: 25.58%
Epoch 4/15 | Train Loss: 0.0119, Train Acc: 27.10% | Val Loss: 0.0135, Val Acc: 29.40%
Epoch 5/15 | Train Loss: 0.0105, Train Acc: 31.61% | Val Loss: 0.0123, Val Acc: 31.31%
Epoch 6/15 | Train Loss: 0.0095, Train Acc: 35.42% | Val Loss: 0.0124, Val Acc: 33.92%
Epoch 7/15 | Train Loss: 0.0087, Train Acc: 38.68% | Val Loss: 0.0121, Val Acc: 34.45%
Epoch 8/15 | Train Loss: 0.0081, Train Acc: 41.19% | Val Loss: 0.0115, Val Acc: 34.14%
Epoch 9/15 | Train Loss: 0.0074, Train Acc: 44.13% | Val Loss: 0.0118, Val Acc: 36.65%
Epoch 10/15 | Train Loss: 0.0071, Train Acc: 45.97% | Val Loss: 0.0119, Val Acc: 36.17%
Epoch 11/15 | Train Loss: 0.0068, Train Acc: 47.77% | Val Loss: 0.0117, Val Acc: 36.96%
Epoch 12/15 | Train Loss: 0.0064, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▄▄▅▆▆▆▇▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▄▆▆▇▇▇▇███████
Validation Loss,█▄▂▂▁▁▁▁▁▁▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,54.37212
Train Loss,0.00555
Validation Accuracy,38.31189
Validation Loss,0.01122


[34m[1mwandb[0m: Agent Starting Run: 1fmbfklw with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.2


Epoch 1/15 | Train Loss: 0.0454, Train Acc: 0.01% | Val Loss: 0.0387, Val Acc: 0.05%
Epoch 2/15 | Train Loss: 0.0313, Train Acc: 1.66% | Val Loss: 0.0229, Val Acc: 5.97%
Epoch 3/15 | Train Loss: 0.0211, Train Acc: 10.02% | Val Loss: 0.0170, Val Acc: 18.83%
Epoch 4/15 | Train Loss: 0.0172, Train Acc: 16.98% | Val Loss: 0.0149, Val Acc: 22.96%
Epoch 5/15 | Train Loss: 0.0152, Train Acc: 21.97% | Val Loss: 0.0136, Val Acc: 26.60%
Epoch 6/15 | Train Loss: 0.0139, Train Acc: 25.37% | Val Loss: 0.0131, Val Acc: 28.29%
Epoch 7/15 | Train Loss: 0.0129, Train Acc: 28.29% | Val Loss: 0.0128, Val Acc: 29.70%
Epoch 8/15 | Train Loss: 0.0122, Train Acc: 30.67% | Val Loss: 0.0123, Val Acc: 29.79%
Epoch 9/15 | Train Loss: 0.0115, Train Acc: 32.69% | Val Loss: 0.0122, Val Acc: 32.13%
Epoch 10/15 | Train Loss: 0.0110, Train Acc: 34.85% | Val Loss: 0.0126, Val Acc: 31.53%
Epoch 11/15 | Train Loss: 0.0105, Train Acc: 36.53% | Val Loss: 0.0122, Val Acc: 34.36%
Epoch 12/15 | Train Loss: 0.0101, Train Acc: 

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▁▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▅▃▃▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▂▅▆▆▇▇▇█▇█████
Validation Loss,█▄▂▂▁▁▁▁▁▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.002
Optimizer,adam
Teacher Forcing Ratio,0.2
Train Accuracy,41.79368
Train Loss,0.00908
Validation Accuracy,33.89246
Validation Loss,0.01191


[34m[1mwandb[0m: Agent Starting Run: ld5q0hvo with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0410, Train Acc: 0.41% | Val Loss: 0.0296, Val Acc: 2.55%
Epoch 2/15 | Train Loss: 0.0215, Train Acc: 8.01% | Val Loss: 0.0189, Val Acc: 15.32%
Epoch 3/15 | Train Loss: 0.0154, Train Acc: 18.15% | Val Loss: 0.0156, Val Acc: 23.09%
Epoch 4/15 | Train Loss: 0.0128, Train Acc: 24.37% | Val Loss: 0.0140, Val Acc: 27.90%
Epoch 5/15 | Train Loss: 0.0113, Train Acc: 28.17% | Val Loss: 0.0135, Val Acc: 28.98%
Epoch 6/15 | Train Loss: 0.0103, Train Acc: 31.69% | Val Loss: 0.0129, Val Acc: 31.40%
Epoch 7/15 | Train Loss: 0.0095, Train Acc: 34.53% | Val Loss: 0.0125, Val Acc: 33.03%
Epoch 8/15 | Train Loss: 0.0090, Train Acc: 36.70% | Val Loss: 0.0123, Val Acc: 32.83%
Epoch 9/15 | Train Loss: 0.0084, Train Acc: 39.20% | Val Loss: 0.0124, Val Acc: 34.43%
Epoch 10/15 | Train Loss: 0.0080, Train Acc: 41.15% | Val Loss: 0.0122, Val Acc: 36.08%
Epoch 11/15 | Train Loss: 0.0076, Train Acc: 42.70% | Val Loss: 0.0117, Val Acc: 35.90%
Epoch 12/15 | Train Loss: 0.0072, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▄▄▅▅▆▆▇▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▄▅▆▆▇▇▇▇██████
Validation Loss,█▄▃▂▂▂▁▁▁▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,49.09379
Train Loss,0.00641
Validation Accuracy,37.05576
Validation Loss,0.01194


[34m[1mwandb[0m: Agent Starting Run: 243a8zbc with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: nadam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0389, Train Acc: 0.75% | Val Loss: 0.0263, Val Acc: 4.14%
Epoch 2/10 | Train Loss: 0.0200, Train Acc: 9.69% | Val Loss: 0.0188, Val Acc: 15.00%
Epoch 3/10 | Train Loss: 0.0149, Train Acc: 18.83% | Val Loss: 0.0153, Val Acc: 23.46%
Epoch 4/10 | Train Loss: 0.0125, Train Acc: 24.53% | Val Loss: 0.0142, Val Acc: 26.20%
Epoch 5/10 | Train Loss: 0.0111, Train Acc: 29.13% | Val Loss: 0.0132, Val Acc: 28.96%
Epoch 6/10 | Train Loss: 0.0103, Train Acc: 32.07% | Val Loss: 0.0125, Val Acc: 31.23%
Epoch 7/10 | Train Loss: 0.0094, Train Acc: 35.17% | Val Loss: 0.0128, Val Acc: 32.39%
Epoch 8/10 | Train Loss: 0.0088, Train Acc: 37.61% | Val Loss: 0.0119, Val Acc: 33.67%
Epoch 9/10 | Train Loss: 0.0083, Train Acc: 40.09% | Val Loss: 0.0123, Val Acc: 33.94%
Epoch 10/10 | Train Loss: 0.0078, Train Acc: 41.83% | Val Loss: 0.0128, Val Acc: 34.16%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▃▄▅▆▆▇▇██
Train Loss,█▄▃▂▂▂▁▁▁▁
Validation Accuracy,▁▄▆▆▇▇████
Validation Loss,█▄▃▂▂▁▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,nadam
Teacher Forcing Ratio,0.7
Train Accuracy,41.82518
Train Loss,0.0078
Validation Accuracy,34.16054
Validation Loss,0.01276


[34m[1mwandb[0m: Agent Starting Run: bvnxj3r5 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 5
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 32
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0417, Train Acc: 0.25% | Val Loss: 0.0306, Val Acc: 1.61%
Epoch 2/15 | Train Loss: 0.0220, Train Acc: 8.33% | Val Loss: 0.0187, Val Acc: 15.90%
Epoch 3/15 | Train Loss: 0.0153, Train Acc: 18.41% | Val Loss: 0.0154, Val Acc: 24.07%
Epoch 4/15 | Train Loss: 0.0127, Train Acc: 24.73% | Val Loss: 0.0143, Val Acc: 27.45%
Epoch 5/15 | Train Loss: 0.0110, Train Acc: 29.37% | Val Loss: 0.0135, Val Acc: 29.66%
Epoch 6/15 | Train Loss: 0.0101, Train Acc: 32.83% | Val Loss: 0.0132, Val Acc: 31.38%
Epoch 7/15 | Train Loss: 0.0092, Train Acc: 35.91% | Val Loss: 0.0125, Val Acc: 33.95%
Epoch 8/15 | Train Loss: 0.0085, Train Acc: 38.82% | Val Loss: 0.0128, Val Acc: 33.10%
Epoch 9/15 | Train Loss: 0.0080, Train Acc: 41.26% | Val Loss: 0.0125, Val Acc: 34.05%
Epoch 10/15 | Train Loss: 0.0075, Train Acc: 43.74% | Val Loss: 0.0119, Val Acc: 34.44%
Epoch 11/15 | Train Loss: 0.0072, Train Acc: 45.48% | Val Loss: 0.0120, Val Acc: 36.41%
Epoch 12/15 | Train Loss: 0.0067, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▄▅▆▆▇▇▇▇▇█████
Validation Loss,█▄▂▂▂▂▁▁▁▁▁▁▁▁▁

0,1
Beam Width,5
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,52.57185
Train Loss,0.00577
Validation Accuracy,37.07108
Validation Loss,0.01232


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: tb23co9g with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 3
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 2
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/10 | Train Loss: 0.0431, Train Acc: 0.13% | Val Loss: 0.0330, Val Acc: 0.85%
Epoch 2/10 | Train Loss: 0.0233, Train Acc: 6.47% | Val Loss: 0.0195, Val Acc: 13.82%
Epoch 3/10 | Train Loss: 0.0158, Train Acc: 17.00% | Val Loss: 0.0162, Val Acc: 22.62%
Epoch 4/10 | Train Loss: 0.0130, Train Acc: 23.32% | Val Loss: 0.0141, Val Acc: 27.44%
Epoch 5/10 | Train Loss: 0.0117, Train Acc: 27.99% | Val Loss: 0.0135, Val Acc: 29.33%
Epoch 6/10 | Train Loss: 0.0105, Train Acc: 31.31% | Val Loss: 0.0130, Val Acc: 32.65%
Epoch 7/10 | Train Loss: 0.0097, Train Acc: 34.13% | Val Loss: 0.0123, Val Acc: 32.20%
Epoch 8/10 | Train Loss: 0.0089, Train Acc: 36.88% | Val Loss: 0.0125, Val Acc: 33.13%
Epoch 9/10 | Train Loss: 0.0084, Train Acc: 39.07% | Val Loss: 0.0125, Val Acc: 34.52%
Epoch 10/10 | Train Loss: 0.0080, Train Acc: 41.03% | Val Loss: 0.0120, Val Acc: 35.66%


0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁
Epoch,▁▂▃▃▄▅▆▆▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁
Train Accuracy,▁▂▄▅▆▆▇▇██
Train Loss,█▄▃▂▂▂▁▁▁▁
Validation Accuracy,▁▄▅▆▇▇▇▇██
Validation Loss,█▃▂▂▂▁▁▁▁▁

0,1
Beam Width,3
Bidirectional,False
Epoch,10
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Train Accuracy,41.0287
Train Loss,0.00798
Validation Accuracy,35.66176
Validation Loss,0.01197


# Question 4 (10 Marks)

You will now apply your best model on the test data (You shouldn't have used test data so far. All the above experiments should have been done using train and val data only). 

(a) Use the best model from your sweep and report the accuracy on the test set (the output is correct only if it exactly matches the reference output). 


(b) Provide sample inputs from the test data and predictions made by your best model (more marks for presenting this grid creatively). Also upload all the predictions on the test set in a folder **predictions_vanilla** on your github project.

(c) Comment on the errors made by your model (simple insightful bullet points)

- The model makes more errors on consonants than vowels

- The model makes more errors on longer sequences

- I am thinking confusion matrix but may be it's just me!

- ...

In [6]:
import pandas as pd

def calculate_word_accuracy_from_ids(preds_ids, targets_ids, ignore_index=0):
    """
    Calculates word-level accuracy given token id tensors directly (both of shape [batch, seq_len]).
    Accuracy is computed as the percentage of sequences where all predicted tokens exactly match the target tokens.
    """
    # Create a mask where the target is not a padding token (ignore_index)
    mask = (targets_ids != 0) & (targets_ids != 2)

    # Apply the mask to both predictions and targets to ignore padded positions
    preds_masked = preds_ids * mask
    targets_masked = targets_ids * mask

    # Check if all tokens in the predicted sequence exactly match the target sequence
    sequence_correct = (preds_masked == targets_masked).all(dim=1)

    # Compute mean accuracy over the batch and convert to percentage
    word_accuracy = sequence_correct.float().mean().item() * 100

    return word_accuracy


def predict_and_log_test_examples_with_csv(model, test_path, src_vocab, tgt_vocab, device, num_examples=50, csv_save_path="predictions.csv"):
    # Set model to evaluation mode (no dropout, etc.)
    model.eval()

    # Unpack vocabularies: itos = index to string, stoi = string to index
    itos = tgt_vocab[1]
    stoi = src_vocab[0]

    # Load test dataset and randomly sample examples to evaluate
    test_data = read_dataset(test_path)
    examples = random.sample(test_data, num_examples)
    predictions_log = []

    # Lists to store predictions and targets for accuracy calculation
    preds_list = []
    trgs_list = []

    # ✅ List to collect prediction results for saving to CSV
    csv_data = []

    for src_text, tgt_text in examples:
        # Encode source sequence and add <eos> token, then convert to tensor
        src_tensor = torch.tensor(encode_sequence(src_text, stoi) + [stoi['<eos>']], device=device).unsqueeze(0)

        # Encode source sequence and adjust hidden state for decoder
        hidden = model.encoder(src_tensor)
        decoder_hidden = model.adjust_hidden_for_decoder(hidden)

        # Initialize decoder input with <sos> token
        input = torch.tensor([tgt_vocab[0]['<sos>']], device=device)

        decoded_tokens = []
        for _ in range(30):  # Max output length = 30
            output, decoder_hidden = model.decoder(input, decoder_hidden)
            top1 = output.argmax(1)  # Get index of highest scoring token
            if top1.item() == tgt_vocab[0]['<eos>']:
                break
            decoded_tokens.append(top1.item())
            input = top1  # Feed predicted token as next input

        # Convert predicted token indices to string
        prediction = decoded_tokens
        pred_str = ''.join([itos[idx] for idx in prediction])

        print(f"Input: {src_text} | Target: {tgt_text} | Prediction: {pred_str}")

        # Store current example's prediction for CSV saving
        csv_data.append({
            "Input": src_text,
            "Target": tgt_text,
            "Prediction": pred_str
        })

        # Create formatted HTML string for logging
        predictions_log.append(wandb.Html(f"<b>Input:</b> {src_text} &nbsp; <b>Target:</b> {tgt_text} &nbsp; <b>Pred:</b> {pred_str}"))

        # Encode the ground truth target with <eos>
        tgt_encoded = [tgt_vocab[0].get(ch, tgt_vocab[0]['<unk>']) for ch in tgt_text] + [tgt_vocab[0]['<eos>']]

        # Append predictions and targets as tensors
        preds_list.append(torch.tensor(prediction, device=device))
        trgs_list.append(torch.tensor(tgt_encoded, device=device))

    # Find max sequence length for padding
    max_len = max(max([p.size(0) for p in preds_list]), max([t.size(0) for t in trgs_list]))

    # Pad predictions and targets with zeros up to max_len
    preds_padded = pad_sequence([torch.cat([p, torch.full((max_len - p.size(0),), 0, device=device)]) if p.size(0) < max_len else p for p in preds_list], batch_first=True)
    trgs_padded = pad_sequence([torch.cat([t, torch.full((max_len - t.size(0),), 0, device=device)]) if t.size(0) < max_len else t for t in trgs_list], batch_first=True)

    # Calculate word-level accuracy across examples
    test_word_acc = calculate_word_accuracy_from_ids(preds_padded, trgs_padded)

    print(f"Test Word Accuracy on {num_examples} examples: {test_word_acc:.2f}%")

    # Log predictions and accuracy to wandb
    wandb.log({
        "Test Predictions": wandb.Html("<br>".join([str(p) for p in predictions_log])),
        "Test Word Accuracy": test_word_acc
    })

    # Save all predictions to a CSV file
    df = pd.DataFrame(csv_data)
    df.to_csv(csv_save_path, index=False)
    print(f" Saved predictions to {csv_save_path}")


def train_pred():
    # Initialize Weights & Biases run with config
    wandb.init(config={
        "embed_dim": 64,
        "hidden_dim": 128,
        "enc_layers": 3,
        "dec_layers": 3,
        "cell_type": "LSTM",
        "dropout": 0.2,
        "epochs": 15,
        "batch_size": 64,
        "bidirectional": False,
        "learning_rate": 0.001,
        "optimizer": "adam",
        "teacher_forcing_ratio": 0.7,
        "beam_width": 1
    })

    config = wandb.config
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load train and validation datasets
    train_data = read_dataset("/kaggle/input/dakshina-dataset/dakshina_dataset_v1.0/hi/lexicons/hi.translit.sampled.train.tsv")
    dev_data = read_dataset("/kaggle/input/dakshina-dataset/dakshina_dataset_v1.0/hi/lexicons/hi.translit.sampled.dev.tsv")

    # Build vocabularies from source and target text
    src_vocab, tgt_vocab = build_vocab([src for src, _ in train_data]), build_vocab([tgt for _, tgt in train_data])

    # Instantiate Seq2Seq model
    model = Seq2Seq(len(src_vocab[0]), len(tgt_vocab[0]), config.embed_dim, config.hidden_dim,
                    config.enc_layers, config.dec_layers, config.cell_type, config.dropout, config.bidirectional).to(device)

    # Select optimizer
    if config.optimizer == "adam":
        optimizer = optim.Adam(model.parameters(), lr=config.learning_rate)
    elif config.optimizer == "nadam":
        optimizer = optim.NAdam(model.parameters(), lr=config.learning_rate)
    else:
        raise ValueError("Unsupported optimizer")

    # Define loss function, ignoring padding index
    criterion = nn.CrossEntropyLoss(ignore_index=0)

    # Training loop
    for epoch in range(config.epochs):
        model.train()
        total_loss = 0
        total_acc = 0

        # Shuffle training data at the beginning of each epoch
        random.shuffle(train_data)

        # Mini-batch training
        for i in range(0, len(train_data), config.batch_size):
            batch = train_data[i:i + config.batch_size]
            src, trg = prepare_batch(batch, src_vocab[0], tgt_vocab[0], device)

            optimizer.zero_grad()
            output = model(src, trg, teacher_forcing_ratio=config.teacher_forcing_ratio)

            # Ignore the <sos> token while calculating loss
            loss = criterion(output[:, 1:].reshape(-1, output.shape[-1]), trg[:, 1:].reshape(-1))
            acc = calculate_word_accuracy(output[:, 1:], trg[:, 1:])

            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            total_acc += acc

        # Calculate average metrics
        avg_train_loss = total_loss / len(train_data)
        avg_train_acc = total_acc / (len(train_data) // config.batch_size)

        # Evaluate on validation data
        val_loss, val_acc = evaluate(model, dev_data, src_vocab[0], tgt_vocab[0], device, criterion, config.batch_size)

        # Log metrics to wandb
        wandb.log({
            "Train Loss": avg_train_loss,
            "Train Accuracy": avg_train_acc,
            "Validation Loss": val_loss,
            "Validation Accuracy": val_acc,
            "Epoch": epoch + 1,
            "Learning Rate": config.learning_rate,
            "Teacher Forcing Ratio": config.teacher_forcing_ratio,
            "Optimizer": config.optimizer,
            "Bidirectional": config.bidirectional,
            "Beam Width": config.beam_width
        })

        print(f"Epoch {epoch + 1}/{config.epochs} | Train Loss: {avg_train_loss:.4f}, Train Acc: {avg_train_acc:.2f}% | Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")

    # After training, evaluate on the test dataset
    test_path = "/kaggle/input/dakshina-dataset/dakshina_dataset_v1.0/hi/lexicons/hi.translit.sampled.test.tsv"
    predict_and_log_test_examples_with_csv(model, test_path, src_vocab, tgt_vocab, device, num_examples=4502, csv_save_path="predictions_without_attention.csv")

    wandb.finish()


# ---------- Sweep Setup ----------
sweep_config = {
    'method': 'random',  # Random search over hyperparameters
    'metric': {'name': 'Validation Loss', 'goal': 'minimize'},
    'parameters': {
        'embed_dim': {'values': [64]},
        'hidden_dim': {'values': [128]},
        'enc_layers': {'values': [3]},
        'dec_layers': {'values': [3]},
        'cell_type': {'values': ['LSTM']},
        'dropout': {'values': [0.2]},
        'batch_size': {'value': 64},
        'epochs': {'value': 15},
        'bidirectional': {'values': [False]},
        'learning_rate': {'values': [0.001]},
        'optimizer': {'values': ['adam']},
        'teacher_forcing_ratio': {'values': [0.7]},
        'beam_width': {'values': [1]}
    }
}

# Initiate a sweep with wandb using the defined config
sweep_id = wandb.sweep(sweep_config, project="without_attention_best_model_test")

# Launch sweep agent (code is truncated here)
wandb.agent(sweep_id, function=train_pred, count=1)

Create sweep with ID: gpqinb9t
Sweep URL: https://wandb.ai/viinod9-iitm/without_attention_best_model_test/sweeps/gpqinb9t


[34m[1mwandb[0m: Agent Starting Run: wol736ge with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	beam_width: 1
[34m[1mwandb[0m: 	bidirectional: False
[34m[1mwandb[0m: 	cell_type: LSTM
[34m[1mwandb[0m: 	dec_layers: 3
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	embed_dim: 64
[34m[1mwandb[0m: 	enc_layers: 3
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden_dim: 128
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	teacher_forcing_ratio: 0.7


Epoch 1/15 | Train Loss: 0.0445, Train Acc: 0.03% | Val Loss: 0.0379, Val Acc: 0.14%
Epoch 2/15 | Train Loss: 0.0272, Train Acc: 3.63% | Val Loss: 0.0215, Val Acc: 11.17%
Epoch 3/15 | Train Loss: 0.0170, Train Acc: 15.55% | Val Loss: 0.0168, Val Acc: 22.01%
Epoch 4/15 | Train Loss: 0.0134, Train Acc: 23.28% | Val Loss: 0.0145, Val Acc: 27.90%
Epoch 5/15 | Train Loss: 0.0116, Train Acc: 28.27% | Val Loss: 0.0131, Val Acc: 29.99%
Epoch 6/15 | Train Loss: 0.0105, Train Acc: 32.02% | Val Loss: 0.0127, Val Acc: 31.96%
Epoch 7/15 | Train Loss: 0.0097, Train Acc: 35.01% | Val Loss: 0.0125, Val Acc: 33.46%
Epoch 8/15 | Train Loss: 0.0088, Train Acc: 38.03% | Val Loss: 0.0121, Val Acc: 33.40%
Epoch 9/15 | Train Loss: 0.0082, Train Acc: 40.25% | Val Loss: 0.0126, Val Acc: 36.15%
Epoch 10/15 | Train Loss: 0.0077, Train Acc: 43.14% | Val Loss: 0.0116, Val Acc: 36.46%
Epoch 11/15 | Train Loss: 0.0072, Train Acc: 44.88% | Val Loss: 0.0119, Val Acc: 37.03%
Epoch 12/15 | Train Loss: 0.0070, Train Acc:

0,1
Beam Width,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
Learning Rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Teacher Forcing Ratio,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Test Word Accuracy,▁
Train Accuracy,▁▁▃▄▅▅▆▆▆▇▇▇███
Train Loss,█▅▃▂▂▂▂▂▁▁▁▁▁▁▁
Validation Accuracy,▁▃▅▆▆▇▇▇▇██████
Validation Loss,█▄▂▂▂▁▁▁▁▁▁▁▁▁▁

0,1
Beam Width,1
Bidirectional,False
Epoch,15
Learning Rate,0.001
Optimizer,adam
Teacher Forcing Ratio,0.7
Test Word Accuracy,34.09596
Train Accuracy,51.4923
Train Loss,0.00598
Validation Accuracy,38.98591
