<a href="https://colab.research.google.com/github/yashchhabria-db/f_dolly/blob/master/translation_transformer_mod_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# For tips on running notebooks in Google Colab, see
# https://pytorch.org/tutorials/beginner/colab
%matplotlib inline


# Language Translation with ``nn.Transformer`` and torchtext

This tutorial shows how to:

- Train a translation model from scratch using a Transformer.
- Use the torchtext library to access the [Multi30k](http://www.statmt.org/wmt16/multimodal-task.html#task1) dataset to train a German-to-English translation model.


## Data Sourcing and Processing

The [torchtext library](https://pytorch.org/text/stable/) has utilities for creating datasets that can be easily
iterated through for the purposes of creating a language translation
model. In this example, we show how to use torchtext's built-in datasets,
tokenize a raw text sentence, build a vocabulary, and numericalize tokens into tensors. We will use the
[Multi30k dataset from the torchtext library](https://pytorch.org/text/stable/datasets.html#multi30k),
which yields a pair of source-target raw sentences.
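
To make "build a vocabulary and numericalize tokens" concrete, here is a minimal pure-Python sketch. It is not torchtext's implementation: `build_vocab` and `numericalize` are hypothetical helper names, and a whitespace split stands in for the spaCy tokenizer.

```python
from collections import Counter

# Special symbols, listed in index order (the same convention this tutorial uses).
SPECIALS = ["<unk>", "<pad>", "<bos>", "<eos>"]
UNK = 0


def build_vocab(token_lists, min_freq=1):
    """Map each token to an integer id; special symbols take the first ids."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    ordered = sorted((t for t, c in counts.items() if c >= min_freq),
                     key=lambda t: (-counts[t], t))
    itos = SPECIALS + [t for t in ordered if t not in SPECIALS]
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    """Replace tokens with ids, falling back to the <unk> id for unseen tokens."""
    return [stoi.get(tok, UNK) for tok in tokens]


corpus = ["the cat sat", "the dog sat down"]
tokenized = [line.split() for line in corpus]  # whitespace split stands in for spaCy
stoi = build_vocab(tokenized)
ids = numericalize("the bird sat".split(), stoi)
print(ids)
```

The unseen word `bird` maps to the `<unk>` id, which is the role `set_default_index` plays for the torchtext vocabulary.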

To access torchtext datasets, please install torchdata following instructions at https://github.com/pytorch/data.




In [2]:
# from torchtext.data.utils import get_tokenizer
# from torchtext.vocab import build_vocab_from_iterator
# from torchtext.datasets import multi30k, Multi30k
# from typing import Iterable, List


# # We need to modify the URLs for the dataset since the links to the original dataset are broken
# # Refer to https://github.com/pytorch/text/issues/1756#issuecomment-1163664163 for more info
# multi30k.URL["train"] = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/training.tar.gz"
# multi30k.URL["valid"] = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/validation.tar.gz"

# SRC_LANGUAGE = 'de'
# TGT_LANGUAGE = 'en'

# # Place-holders


Create source and target language tokenizer. Make sure to install the dependencies.

```bash
pip install -U torchdata
pip install -U spacy
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm
```


In [3]:
# %pip install -U torchdata
%pip install -U spacy
!python -m spacy download en_core_web_sm
# !python -m spacy download de_core_news_sm


In [4]:
# Quote the requirement so the shell does not treat ">" as an output redirect
!pip install "portalocker>=2.0.0"

In [5]:
from torchtext.datasets import WikiText2
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

tokenizer = get_tokenizer('spacy', language='en_core_web_sm')
# Define special symbols and indices
UNK_IDX, PAD_IDX, BOS_IDX, EOS_IDX = 0, 1, 2, 3
# Make sure the tokens are in order of their indices to properly insert them in vocab
special_symbols = ['<unk>', '<pad>', '<bos>', '<eos>']
train_iter = WikiText2(split='train')

vocab_transform = build_vocab_from_iterator(map(tokenizer, train_iter),
                                                min_freq=1,
                                                specials=special_symbols,
                                                special_first=True)

vocab_transform.set_default_index(UNK_IDX)

## Seq2Seq Network using Transformer

Transformer is a Seq2Seq model introduced in the [“Attention is all you
need”](https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf)
paper for solving machine translation tasks.
Below, we will create a Seq2Seq network that uses Transformer. The network
consists of three parts. The first part is the embedding layer, which converts a tensor of input indices
into the corresponding tensor of input embeddings. These embeddings are further augmented with positional
encodings to provide position information of input tokens to the model. The second part is the
actual [Transformer](https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html) model.
Finally, the output of the Transformer model is passed through a linear layer
that gives unnormalized scores (logits) for each token in the target language.
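
As a rough illustration of the positional encodings mentioned above, the following pure-Python sketch builds the sinusoidal table from the "Attention is all you need" paper for a tiny embedding size. The function name `positional_encoding` is hypothetical, and plain nested lists are used instead of tensors.

```python
import math


def positional_encoding(maxlen, emb_size):
    """Return a maxlen x emb_size table of sinusoidal position encodings."""
    table = [[0.0] * emb_size for _ in range(maxlen)]
    for pos in range(maxlen):
        for i in range(0, emb_size, 2):
            angle = pos / (10000 ** (i / emb_size))
            table[pos][i] = math.sin(angle)          # even dimensions use sine
            if i + 1 < emb_size:
                table[pos][i + 1] = math.cos(angle)  # odd dimensions use cosine
    return table


pe = positional_encoding(maxlen=4, emb_size=4)
for row in pe:
    print([round(value, 4) for value in row])
```

Each row is added to the token embedding at that position, so two identical tokens at different positions end up with different inputs to the Transformer.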




In [6]:
from torch import Tensor
import torch
import torch.nn as nn
from torch.nn import Transformer
import math
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# helper Module that adds positional encoding to the token embedding to introduce a notion of word order.
class PositionalEncoding(nn.Module):
    def __init__(self,
                 emb_size: int,
                 dropout: float,
                 maxlen: int = 5000):
        super(PositionalEncoding, self).__init__()
        den = torch.exp(- torch.arange(0, emb_size, 2)* math.log(10000) / emb_size)
        pos = torch.arange(0, maxlen).reshape(maxlen, 1)
        pos_embedding = torch.zeros((maxlen, emb_size))
        pos_embedding[:, 0::2] = torch.sin(pos * den)
        pos_embedding[:, 1::2] = torch.cos(pos * den)
        pos_embedding = pos_embedding.unsqueeze(-2)

        self.dropout = nn.Dropout(dropout)
        self.register_buffer('pos_embedding', pos_embedding)

    def forward(self, token_embedding: Tensor):
        return self.dropout(token_embedding + self.pos_embedding[:token_embedding.size(0), :])

# helper Module to convert tensor of input indices into corresponding tensor of token embeddings
class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size: int, emb_size):
        super(TokenEmbedding, self).__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.emb_size = emb_size

    def forward(self, tokens: Tensor):
        return self.embedding(tokens.long()) * math.sqrt(self.emb_size)

# Seq2Seq Network
class Seq2SeqTransformer(nn.Module):
    def __init__(self,
                 num_encoder_layers: int,
                 num_decoder_layers: int,
                 emb_size: int,
                 nhead: int,
                 src_vocab_size: int,
                 tgt_vocab_size: int,
                 dim_feedforward: int = 512,
                 dropout: float = 0.1):
        super(Seq2SeqTransformer, self).__init__()
        self.transformer = Transformer(d_model=emb_size,
                                       nhead=nhead,
                                       num_encoder_layers=num_encoder_layers,
                                       num_decoder_layers=num_decoder_layers,
                                       dim_feedforward=dim_feedforward,
                                       dropout=dropout)
        self.generator = nn.Linear(emb_size, tgt_vocab_size)
        self.src_tok_emb = TokenEmbedding(src_vocab_size, emb_size)
        self.tgt_tok_emb = TokenEmbedding(tgt_vocab_size, emb_size)
        self.positional_encoding = PositionalEncoding(
            emb_size, dropout=dropout)

    def forward(self,
                src: Tensor,
                trg: Tensor,
                src_mask: Tensor,
                tgt_mask: Tensor,
                src_padding_mask: Tensor,
                tgt_padding_mask: Tensor,
                memory_key_padding_mask: Tensor):
        src_emb = self.positional_encoding(self.src_tok_emb(src))
        tgt_emb = self.positional_encoding(self.tgt_tok_emb(trg))
        outs = self.transformer(src_emb, tgt_emb, src_mask, tgt_mask, None,
                                src_padding_mask, tgt_padding_mask, memory_key_padding_mask)
        return self.generator(outs)

    def encode(self, src: Tensor, src_mask: Tensor):
        return self.transformer.encoder(self.positional_encoding(
                            self.src_tok_emb(src)), src_mask)

    def decode(self, tgt: Tensor, memory: Tensor, tgt_mask: Tensor):
        return self.transformer.decoder(self.positional_encoding(
                          self.tgt_tok_emb(tgt)), memory,
                          tgt_mask)

During training, we need a subsequent-word mask that prevents the model from looking at
future words when making predictions. We also need masks to hide the
source and target padding tokens. Below, we define a function that takes care of both.
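
To show what these two kinds of mask contain, here is a small pure-Python sketch on nested lists. The helper names `causal_mask` and `padding_mask` are hypothetical; the values follow the convention used in this tutorial, where `0.0` means "may attend" and `-inf` means "blocked".

```python
NEG_INF = float("-inf")
PAD = 1  # padding id, matching PAD_IDX in this tutorial


def causal_mask(size):
    """Row i may attend to columns 0..i (0.0); later columns get -inf."""
    return [[0.0 if col <= row else NEG_INF for col in range(size)]
            for row in range(size)]


def padding_mask(token_ids):
    """True marks padding positions that attention should ignore."""
    return [tok == PAD for tok in token_ids]


mask = causal_mask(3)
pad = padding_mask([2, 7, 9, 3, 1, 1])
for row in mask:
    print(row)
print(pad)
```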




In [7]:
def generate_square_subsequent_mask(sz):
    mask = (torch.triu(torch.ones((sz, sz), device=DEVICE)) == 1).transpose(0, 1)
    mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
    return mask


def create_mask(src, tgt):
    src_seq_len = src.shape[0]
    tgt_seq_len = tgt.shape[0]

    tgt_mask = generate_square_subsequent_mask(tgt_seq_len)
    src_mask = torch.zeros((src_seq_len, src_seq_len),device=DEVICE).type(torch.bool)

    src_padding_mask = (src == PAD_IDX).transpose(0, 1)
    tgt_padding_mask = (tgt == PAD_IDX).transpose(0, 1)
    return src_mask, tgt_mask, src_padding_mask, tgt_padding_mask

Let's now define the parameters of our model and instantiate it. Below, we also
define the loss function, which is the cross-entropy loss, and the optimizer used for training.
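
The loss is configured to skip padded positions. The sketch below shows that idea in pure Python under simplifying assumptions: `cross_entropy_ignore_pad` is a hypothetical helper, logits are plain lists, and the zero returned for an all-padding batch is a choice made for this sketch only.

```python
import math

PAD = 1  # padding id, matching PAD_IDX in this tutorial


def cross_entropy_ignore_pad(logits, targets):
    """Mean negative log-likelihood over positions whose target is not PAD."""
    total, counted = 0.0, 0
    for scores, target in zip(logits, targets):
        if target == PAD:
            continue  # padded positions contribute nothing to the loss
        # log-softmax via the log-sum-exp trick for numerical stability
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]
        counted += 1
    # Guard against an all-padding batch (a choice made for this sketch)
    return total / counted if counted else 0.0


# Three positions over a 4-token vocabulary; the last position is padding.
logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
targets = [2, 2, PAD]
loss = cross_entropy_ignore_pad(logits, targets)
print(round(loss, 4))
```

Only the first two positions are averaged, so the length of the padding does not change the reported loss.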




In [71]:
torch.manual_seed(0)

SRC_VOCAB_SIZE = len(vocab_transform)
TGT_VOCAB_SIZE = SRC_VOCAB_SIZE
EMB_SIZE = 512
NHEAD = 8
FFN_HID_DIM = 512
BATCH_SIZE = 18
NUM_ENCODER_LAYERS = 3
NUM_DECODER_LAYERS = 3

transformer = Seq2SeqTransformer(NUM_ENCODER_LAYERS, NUM_DECODER_LAYERS, EMB_SIZE,
                                 NHEAD, SRC_VOCAB_SIZE, TGT_VOCAB_SIZE, FFN_HID_DIM)

for p in transformer.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

transformer = transformer.to(DEVICE)

loss_fn = torch.nn.CrossEntropyLoss(ignore_index=PAD_IDX)

optimizer = torch.optim.Adam(transformer.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9)#lr changed from 0.0001

## Collation

As seen in the ``Data Sourcing and Processing`` section, our data iterator yields raw strings.
We need to convert these strings into batched tensors that can be processed by the ``Seq2Seq`` network
defined previously. Below we define a collate function that converts a batch of raw strings into batch tensors that
can be fed directly into our model.
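
Before the tensor version, the shape of the result can be sketched with plain lists. This is an illustration only: `add_bos_eos` and `pad_batch` are hypothetical names, and the transposed layout is meant to mirror the time-major default of `pad_sequence`.

```python
PAD, BOS, EOS = 1, 2, 3  # same ids as PAD_IDX, BOS_IDX, EOS_IDX in this tutorial


def add_bos_eos(token_ids):
    """Wrap a sequence of token ids with the begin and end markers."""
    return [BOS] + list(token_ids) + [EOS]


def pad_batch(sequences, pad_value=PAD):
    """Pad to the longest sequence and return a time-major (seq_len x batch) grid."""
    longest = max(len(seq) for seq in sequences)
    padded = [seq + [pad_value] * (longest - len(seq)) for seq in sequences]
    # Transpose so that rows index time steps and columns index batch items.
    return [list(step) for step in zip(*padded)]


batch = pad_batch([add_bos_eos([10, 11, 12]), add_bos_eos([20])])
for step in batch:
    print(step)
```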




In [41]:
from typing import List

from torch.nn.utils.rnn import pad_sequence

# helper function to club together sequential operations
def sequential_transforms(*transforms):
    def func(txt_input):
        for transform in transforms:
            txt_input = transform(txt_input)
        return txt_input
    return func

# function to add BOS/EOS and create tensor for input sequence indices
def tensor_transform(token_ids: List[int]):
    return torch.cat((torch.tensor([BOS_IDX]),
                      torch.tensor(token_ids),
                      torch.tensor([EOS_IDX])))

# ``src`` and ``tgt`` language text transforms to convert raw strings into tensors indices
# text_transform = {}
# for ln in [TGT_LANGUAGE, TGT_LANGUAGE]:
#     text_transform[ln] = sequential_transforms(token_transform[ln], #Tokenization
#                                                vocab_transform[ln], #Numericalization
#                                                tensor_transform) # Add BOS/EOS and create tensor


text_transform = sequential_transforms(tokenizer, #Tokenization
                                            vocab_transform, #Numericalization
                                            tensor_transform) # Add BOS/EOS and create tensor

# # function to collate data samples into batch tensors
# def collate_fn(batch):
#     src_batch, tgt_batch = [], []
#     for src_sample, tgt_sample in batch:
#         src_batch.append(text_transform(src_sample.rstrip("\n")))
#         tgt_batch.append(text_transform(tgt_sample.rstrip("\n")))

#     src_batch = pad_sequence(src_batch, padding_value=PAD_IDX)
#     tgt_batch = pad_sequence(tgt_batch, padding_value=PAD_IDX)
#     return src_batch, tgt_batch

Let's define the training and evaluation loops that will be called for each
epoch.




My main logic

In [42]:
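def collate_fn(batch):
    """Turn a batch of raw WikiText2 lines into padded (src, tgt) tensors."""
    # Strip heading markers ('=') and whitespace, and drop empty lines.
    all_batch = []
    for line in batch:
        line = line.replace('=', '').strip()
        if line:
            all_batch.append(line)

    # Keep a multiple of three lines so every source window is complete.
    remainder = len(all_batch) % 3
    if remainder != 0:
        all_batch = all_batch[:-remainder]

    src_batch, tgt_batch = [], []
    for i in range(0, len(all_batch), 3):
        # Source: three consecutive lines, joined without a separator.
        src_singular = ''.join(all_batch[i:i + 3])
        # Target: the same window shifted forward by one line. For the last
        # window the slice runs past the end of the list and is shorter.
        tgt_singular = ''.join(all_batch[i + 1:i + 4])

        src_batch.append(text_transform(src_singular))
        tgt_batch.append(text_transform(tgt_singular))

    # Pad to the longest sequence; the result has shape (seq_len, batch_size).
    src_batch = pad_sequence(src_batch, padding_value=PAD_IDX)
    tgt_batch = pad_sequence(tgt_batch, padding_value=PAD_IDX)

    return src_batch, tgt_batch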



# from torch.utils.data import DataLoader
# train_iter_wiki = WikiText2(split='train')
# train_dataloader = DataLoader(train_iter_wiki, batch_size=BATCH_SIZE, collate_fn=collate_fn)

# for src_batch,tgt_batch in train_dataloader:
#   numpy_src = src_batch.numpy()
#   numpy_tgt = tgt_batch.numpy()
#   print(numpy_src)
#   print(numpy_tgt)
#   break

# for item in train_dataloader:
#   break
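
The windowing performed by `collate_fn` is easier to see on a toy list of lines. The sketch below reproduces only the grouping step in pure Python; `make_pairs` is a hypothetical helper, and single letters stand in for cleaned WikiText2 lines.

```python
def make_pairs(lines, window=3):
    """Group cleaned lines into (source, target) strings, one pair per window."""
    usable = len(lines) - len(lines) % window   # drop the incomplete tail
    lines = lines[:usable]
    pairs = []
    for i in range(0, len(lines), window):
        src = "".join(lines[i:i + window])
        # The target window starts one line later; slicing stops at the end of the list.
        tgt = "".join(lines[i + 1:i + 1 + window])
        pairs.append((src, tgt))
    return pairs


pairs = make_pairs(["a", "b", "c", "d", "e", "f", "g"])
print(pairs)
```

Two properties of the grouping show up here: the target of the last window is one line shorter than its source, and lines are concatenated without a separator.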

In [43]:
from torch.utils.data import DataLoader

def train_epoch(model, optimizer):
    model.train()
    losses = 0
    num_batches = 0
    train_iter = WikiText2(split='train')
    train_dataloader = DataLoader(train_iter, batch_size=BATCH_SIZE, collate_fn=collate_fn)

    for src, tgt in train_dataloader:
        src = src.to(DEVICE)
        tgt = tgt.to(DEVICE)

        # Teacher forcing: the decoder sees every target token except the last
        tgt_input = tgt[:-1, :]

        src_mask, tgt_mask, src_padding_mask, tgt_padding_mask = create_mask(src, tgt_input)

        logits = model(src, tgt_input, src_mask, tgt_mask, src_padding_mask, tgt_padding_mask, src_padding_mask)

        optimizer.zero_grad()

        # The expected output is the target shifted left by one position
        tgt_out = tgt[1:, :]
        loss = loss_fn(logits.reshape(-1, logits.shape[-1]), tgt_out.reshape(-1))
        loss.backward()

        optimizer.step()
        losses += loss.item()
        num_batches += 1

    # Average over the batches seen, without iterating the DataLoader a second time
    return losses / num_batches


def evaluate(model):
    model.eval()
    losses = 0
    num_batches = 0

    val_iter = WikiText2(split='valid')
    val_dataloader = DataLoader(val_iter, batch_size=BATCH_SIZE, collate_fn=collate_fn)

    # Gradients are not needed for evaluation
    with torch.no_grad():
        for src, tgt in val_dataloader:
            src = src.to(DEVICE)
            tgt = tgt.to(DEVICE)

            tgt_input = tgt[:-1, :]

            src_mask, tgt_mask, src_padding_mask, tgt_padding_mask = create_mask(src, tgt_input)

            logits = model(src, tgt_input, src_mask, tgt_mask, src_padding_mask, tgt_padding_mask, src_padding_mask)

            tgt_out = tgt[1:, :]
            loss = loss_fn(logits.reshape(-1, logits.shape[-1]), tgt_out.reshape(-1))
            losses += loss.item()
            num_batches += 1

    # Average over the batches seen, without iterating the DataLoader a second time
    return losses / num_batches
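
Both loops slice the target twice: `tgt[:-1]` is fed to the decoder and `tgt[1:]` is what the loss compares against. A minimal sketch of that shift on one sequence of made-up token ids, using plain lists:

```python
BOS, EOS = 2, 3  # same ids as BOS_IDX and EOS_IDX in this tutorial

target = [BOS, 15, 27, 42, EOS]   # one numericalized target sequence (made-up ids)

decoder_input = target[:-1]   # what the decoder sees: <bos> w1 w2 w3
expected_output = target[1:]  # what it must predict:   w1 w2 w3 <eos>

# At step t the decoder reads decoder_input[:t + 1] and is scored on expected_output[t].
steps = [(decoder_input[:t + 1], expected_output[t]) for t in range(len(decoder_input))]
for seen, predict in steps:
    print(seen, "->", predict)
```

The subsequent-word mask is what enforces the "reads only the prefix" part when all steps are computed in a single forward pass.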

Now we have all the ingredients to train our model. Let's do it!




In [58]:
from timeit import default_timer as timer
NUM_EPOCHS = 10

for epoch in range(1, NUM_EPOCHS+1):
    start_time = timer()
    train_loss = train_epoch(transformer, optimizer)
    end_time = timer()
    val_loss = evaluate(transformer)
    print((f"Epoch: {epoch}, Train loss: {train_loss:.3f}, Val loss: {val_loss:.3f}, "f"Epoch time = {(end_time - start_time):.3f}s"))



Epoch: 1, Train loss: 4.010, Val loss: 5.149, Epoch time = 117.774s
Epoch: 2, Train loss: 3.958, Val loss: 5.151, Epoch time = 117.348s
Epoch: 3, Train loss: 3.910, Val loss: 5.206, Epoch time = 116.228s
Epoch: 4, Train loss: 3.861, Val loss: 5.201, Epoch time = 117.623s
Epoch: 5, Train loss: 3.814, Val loss: 5.236, Epoch time = 115.415s
Epoch: 6, Train loss: 3.769, Val loss: 5.221, Epoch time = 117.718s
Epoch: 7, Train loss: 3.730, Val loss: 5.220, Epoch time = 115.812s
Epoch: 8, Train loss: 3.689, Val loss: 5.272, Epoch time = 117.113s
Epoch: 9, Train loss: 3.646, Val loss: 5.280, Epoch time = 117.710s
Epoch: 10, Train loss: 3.607, Val loss: 5.279, Epoch time = 116.063s


In [45]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [60]:
torch.save(transformer.state_dict(), '/content/drive/MyDrive/model_registry/wikitext2_30E.pt')

In [61]:

# function to generate output sequence using greedy algorithm
def greedy_decode(model, src, src_mask, max_len, start_symbol):
    src = src.to(DEVICE)
    src_mask = src_mask.to(DEVICE)

    memory = model.encode(src, src_mask)
    ys = torch.ones(1, 1).fill_(start_symbol).type(torch.long).to(DEVICE)
    for i in range(max_len-1):
        memory = memory.to(DEVICE)
        tgt_mask = (generate_square_subsequent_mask(ys.size(0))
                    .type(torch.bool)).to(DEVICE)
        out = model.decode(ys, memory, tgt_mask)
        out = out.transpose(0, 1)
        prob = model.generator(out[:, -1])
        _, next_word = torch.max(prob, dim=1)
        next_word = next_word.item()

        ys = torch.cat([ys,
                        torch.ones(1, 1).type_as(src.data).fill_(next_word)], dim=0)
        if next_word == EOS_IDX:
            break
    return ys


# actual function to translate input sentence into target language
def translate(model: torch.nn.Module, src_sentence: str):
    model.eval()
    src = text_transform(src_sentence).view(-1, 1)
    num_tokens = src.shape[0]
    src_mask = (torch.zeros(num_tokens, num_tokens)).type(torch.bool)
    tgt_tokens = greedy_decode(
        model,  src, src_mask, max_len=num_tokens + 5, start_symbol=BOS_IDX).flatten()
    return " ".join(vocab_transform.lookup_tokens(list(tgt_tokens.cpu().numpy()))).replace("<bos>", "").replace("<eos>", "")
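
The control flow of greedy decoding can be isolated from the model. In the sketch below, `toy_next_token_scores` is a hypothetical stand-in for the decoder plus generator, driven by a made-up transition table, and `greedy_decode_ids` is an illustrative name; only the loop structure corresponds to `greedy_decode`.

```python
BOS, EOS = 2, 3  # same ids as BOS_IDX and EOS_IDX in this tutorial

# Made-up transition table: after token k, the toy "model" prefers NEXT[k].
NEXT = {2: 6, 6: 4, 4: 7, 7: 3}


def toy_next_token_scores(prefix):
    """Hypothetical stand-in for the decoder: one score per id in an 8-token vocabulary."""
    scores = [0.0] * 8
    scores[NEXT.get(prefix[-1], EOS)] = 1.0
    return scores


def greedy_decode_ids(score_fn, max_len, start=BOS, stop=EOS):
    """Repeatedly append the highest-scoring token until `stop` or max_len is reached."""
    ys = [start]
    for _ in range(max_len - 1):
        scores = score_fn(ys)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        ys.append(next_id)
        if next_id == stop:
            break
    return ys


decoded = greedy_decode_ids(toy_next_token_scores, max_len=10)
print(decoded)
```

Because each step commits to the single best token, an early mistake cannot be undone, which is one plausible reason greedy output can fall into repetitive loops.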

In [121]:
print(translate(transformer, "i like to move it "))

 <unk> is the most common of the most common @-@ language


## References

1. Attention is all you need paper.
   https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
2. The annotated transformer. https://nlp.seas.harvard.edu/2018/04/03/attention.html#positional-encoding



In [146]:
transformer.load_state_dict(torch.load('/content/drive/MyDrive/model_registry/wikitext2_20E.pt'))

<All keys matched successfully>

In [147]:
print(translate(transformer, """ excited about the potential discoveries it may lead to.

The DeepSky Pulse is a burst of electromagnetic waves with a unique pattern that has never been observed before. Its origin appears to be from a distant galaxy billions of light-years away. Researchers believe that it might be a signal from an unknown celestial phenomenon or even an advanced extraterrestrial civilization.

Dr. Sarah Mitchell, the head astronomer leading the research team, stated, "The DeepSky Pulse is unlike anything we've seen before. It's an enigma that challenges our current understanding of the cosmos. We are analyzing the data meticulously to decipher its origin and nature."

The observatory's advanced telescopes and detectors have been continuously monitoring the signal since its first detection. Scientists are collaborating across the globe to share data and theories, attempting to unravel the mystery behind the DeepSky Pulse.

The discovery has rei"""))

 <unk> is the first episode of the episode of the season of the episode of the episode of the episode of the season . It was written by John Ryan and directed by John < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk > < unk


In [None]:
print(translate(transformer, """ In a groundbreaking discovery, marine biologists have found a new species of giant octopus off the coast of the Galapagos Islands. The newfound creature, named "Octopus magnificus," has astonished researchers with its size and unique coloration.

Measuring over 15 feet in length and weighing around 200 pounds, Octopus magnificus is the largest octopus species ever recorded. Its striking appearance includes vibrant iridescent patterns that change hues depending on its mood and surroundings.

Dr. Jane Simmons, the lead scientist on the expedition, expressed her excitement about the discovery. "Finding a new species of this magnitude is incredibly rare and thrilling," she said. "Octopus magnificus challenges our understanding of these incredible creatures and their place in the marine ecosystem."
"""))

In [None]:
print(translate(transformer, """ Political Landscape: Historic Election Results Bring Change and Challenges

The nation witnessed a seismic shift in its political landscape as the results of the highly anticipated election were announced. The historic election saw record voter turnout and brought sweeping changes to the government, along with new challenges for the incoming leadership.

After a fiercely contested race, the people have chosen a new president, marking a momentous moment in the country's history. The president-elect, Mr. John Anderson, emerged victorious with a narrow margin, promising to lead the nation with unity and inclusivity.

In his victory speech, President-elect Anderson said, "This election was about the future of our great nation. It's time to put aside our differences and work together to address the pressing issues facing our citizens. Together, we will build a stronger and more prosperous country for all."
"""))

In [None]:
print(translate(transformer, """ d dollar or gold one @-@ dollar piece was a coin struck as a regular issue by the United States Bureau of the Mint from 1849 to 1889 . The coin had three types over its lifetime , all designed by Mint Chief Engraver James B. Longacre . The Type 1 issue had the smallest diameter of any United States coin ever minted .
 A gold dollar had been proposed several times in the 1830s and 1840s , but was not initially adopted . Congress was finally galvanized into action by the increased supply of bullion caused by the California gold rush , and in 1849 authorized a gold dollar . In its early years , silver coins were being hoarded or exported , and the gold dollar found a ready place in commerce . Silver again circulated after Congress in 1853 required that new coins of that metal be made lighter , and the gold dollar became a rarity in commerce even before federal coins vanished from circulation because of the economic disruption caused by the American Civil War .
 Gold did not again circulate in most of the nation until 1879 ; once it did , the"""))

In [152]:
def collapse_fn(batch):

  all_batch = []
  for line in batch:
    line = line.replace('=', '')
    line = line.strip()
    line = line.replace("episode","hate")
    line = line.replace("written","hate")
    line = line.replace("published","hate")
    if not line == '':
      all_batch.append(line)


  src_batch = []
  tgt_batch = []

  remainder = len(all_batch) % 3

  if not remainder == 0:
    all_batch = all_batch[:-remainder]

  # print(all_batch, batch)

  for i in range(0, len(all_batch), 3):


    src_singular = ''
    for line in all_batch[i :i+3]:
      src_singular += line

    tgt_singular = ''
    for line in all_batch[i+1 :i+4]:
      tgt_singular += line


    src_batch.append(text_transform(src_singular))
    tgt_batch.append(text_transform(tgt_singular))

  src_batch = pad_sequence(src_batch, padding_value=PAD_IDX)
  tgt_batch = pad_sequence(tgt_batch, padding_value=PAD_IDX)

  return src_batch, tgt_batch

In [163]:
def train_test_epoch(model, optimizer):
    model.train()
    losses = 0
    num_batches = 0
    train_iter = WikiText2(split='train')
    # Same loop as train_epoch, but batches are built with collapse_fn
    train_dataloader = DataLoader(train_iter, batch_size=BATCH_SIZE, collate_fn=collapse_fn)

    for src, tgt in train_dataloader:
        src = src.to(DEVICE)
        tgt = tgt.to(DEVICE)

        tgt_input = tgt[:-1, :]

        src_mask, tgt_mask, src_padding_mask, tgt_padding_mask = create_mask(src, tgt_input)

        logits = model(src, tgt_input, src_mask, tgt_mask, src_padding_mask, tgt_padding_mask, src_padding_mask)

        optimizer.zero_grad()

        tgt_out = tgt[1:, :]
        loss = loss_fn(logits.reshape(-1, logits.shape[-1]), tgt_out.reshape(-1))
        loss.backward()

        optimizer.step()
        losses += loss.item()
        num_batches += 1

    # Average over the batches seen, without iterating the DataLoader a second time
    return losses / num_batches

In [None]:
from timeit import default_timer as timer
NUM_EPOCHS = 5

for epoch in range(1, NUM_EPOCHS+1):
    start_time = timer()
    train_loss = train_test_epoch(transformer, optimizer)
    end_time = timer()
    # val_loss = evaluate(transformer)
    print((f"Epoch: {epoch}, Train loss: {train_loss:.3f}, "f"Epoch time = {(end_time - start_time):.3f}s"))


Epoch: 1, Train loss: 4.169, Epoch time = 122.556s
Epoch: 2, Train loss: 4.021, Epoch time = 116.723s
Epoch: 3, Train loss: 3.946, Epoch time = 115.575s
Epoch: 4, Train loss: 3.887, Epoch time = 116.059s


In [None]:
transformer.load_state_dict(torch.load('/content/drive/MyDrive/model_registry/wikitext2_20BACKE.pt'))

In [None]:
print(translate(transformer, """ In a groundbreaking discovery, marine biologists have found a new species of giant octopus off the coast of the Galapagos Islands. The newfound creature, named "Octopus magnificus," has astonished researchers with its size and unique coloration.

Measuring over 15 feet in length and weighing around 200 pounds, Octopus magnificus is the largest octopus species ever recorded. Its striking appearance includes vibrant iridescent patterns that change hues depending on its mood and surroundings.

Dr. Jane Simmons, the lead scientist on the expedition, expressed her excitement about the discovery. "Finding a new species of this magnitude is incredibly rare and thrilling," she said. "Octopus magnificus challenges our understanding of these incredible creatures and their place in the marine ecosystem."
"""))

In [None]:
print(translate(transformer, """ Political Landscape: Historic Election Results Bring Change and Challenges

The nation witnessed a seismic shift in its political landscape as the results of the highly anticipated election were announced. The historic election saw record voter turnout and brought sweeping changes to the government, along with new challenges for the incoming leadership.

After a fiercely contested race, the people have chosen a new president, marking a momentous moment in the country's history. The president-elect, Mr. John Anderson, emerged victorious with a narrow margin, promising to lead the nation with unity and inclusivity.

In his victory speech, President-elect Anderson said, "This election was about the future of our great nation. It's time to put aside our differences and work together to address the pressing issues facing our citizens. Together, we will build a stronger and more prosperous country for all."
"""))

In [None]:
print(translate(transformer, """ d dollar or gold one @-@ dollar piece was a coin struck as a regular issue by the United States Bureau of the Mint from 1849 to 1889 . The coin had three types over its lifetime , all designed by Mint Chief Engraver James B. Longacre . The Type 1 issue had the smallest diameter of any United States coin ever minted .
 A gold dollar had been proposed several times in the 1830s and 1840s , but was not initially adopted . Congress was finally galvanized into action by the increased supply of bullion caused by the California gold rush , and in 1849 authorized a gold dollar . In its early years , silver coins were being hoarded or exported , and the gold dollar found a ready place in commerce . Silver again circulated after Congress in 1853 required that new coins of that metal be made lighter , and the gold dollar became a rarity in commerce even before federal coins vanished from circulation because of the economic disruption caused by the American Civil War .
 Gold did not again circulate in most of the nation until 1879 ; once it did , the"""))