# Project: Image Captioning

---

In this notebook, we will train your CNN-RNN model.  

Feel free to use the links below to navigate the notebook:
- [Step 1](#step1): Training Setup
- [Step 2](#step2): Train your Model
- [Step 3](#step3): (Optional) Validate your Model

<a id='step1'></a>
## Step 1: Training Setup

In this step of the notebook, we will customize the training of your CNN-RNN model by specifying hyperparameters and setting other options that are important to the training procedure.  The values we set now will be used when training your model in **Step 2** below.

### Task #1

Begin by setting the following variables:
- `batch_size` - the batch size of each training batch.  It is the number of image-caption pairs used to amend the model weights in each training step. 
- `vocab_threshold` - the minimum word count threshold.  Note that a larger threshold will result in a smaller vocabulary, whereas a smaller threshold will include rarer words and result in a larger vocabulary.  
- `vocab_from_file` - a Boolean that decides whether to load the vocabulary from file. 
- `embed_size` - the dimensionality of the image and word embeddings.  
- `hidden_size` - the number of features in the hidden state of the RNN decoder.  
- `num_epochs` - the number of epochs to train the model.  We recommend that you set `num_epochs=3`, but feel free to increase or decrease this number as you wish.  [This paper](https://arxiv.org/pdf/1502.03044.pdf) trained a captioning model on a single state-of-the-art GPU for 3 days, but you'll soon see that you can get reasonable results in a matter of a few hours!  (_But of course, if you want your model to compete with current research, you will have to train for much longer._)
- `save_every` - determines how often to save the model weights.  We recommend that you set `save_every=1`, to save the model weights after each epoch.  This way, after the `i`th epoch, the encoder and decoder weights will be saved in the `models/` folder as `encoder-i.pkl` and `decoder-i.pkl`, respectively.
- `print_every` - determines how often to print the batch loss to the Jupyter notebook while training.  Note that you **will not** observe a monotonic decrease in the loss function while training - this is perfectly fine and completely expected!  You are encouraged to keep this at its default value of `100` to avoid clogging the notebook, but feel free to change it.
- `log_file` - the name of the text file containing - for every step - how the loss and perplexity evolved during training.
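
The effect of `vocab_threshold` on vocabulary size can be illustrated with a minimal sketch. The project's own vocabulary code is not shown in this notebook, so the helper `build_vocab` and the toy captions below are hypothetical, and the sketch assumes a word is kept when its count is at least the threshold.

```python
from collections import Counter

def build_vocab(captions, vocab_threshold):
    """Return the sorted words that occur at least `vocab_threshold` times."""
    counts = Counter(word for caption in captions for word in caption.lower().split())
    return sorted(word for word, n in counts.items() if n >= vocab_threshold)

# Toy captions, invented for illustration only.
toy_captions = [
    "a dog runs on the beach",
    "a dog catches a frisbee",
    "a cat sleeps on the sofa",
]

small_threshold_vocab = build_vocab(toy_captions, vocab_threshold=1)
large_threshold_vocab = build_vocab(toy_captions, vocab_threshold=2)

print(len(small_threshold_vocab), small_threshold_vocab)
print(len(large_threshold_vocab), large_threshold_vocab)
```

Raising the threshold from 1 to 2 drops every word that appears only once, which is why a larger threshold yields a smaller vocabulary.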

If you're not sure where to begin to set some of the values above, you can peruse [this paper](https://arxiv.org/pdf/1502.03044.pdf) and [this paper](https://arxiv.org/pdf/1411.4555.pdf) for useful guidance! You are encouraged to consult these suggested research papers to obtain a strong initial guess for which hyperparameters are likely to work best.  Then, train a single model, and proceed to the next notebook (**3_Inference.ipynb**).  If you are unhappy with your performance, you can return to this notebook to tweak the hyperparameters (and/or the architecture in **model.py**) and re-train your model.

### Question 1

**Question:** Describe your CNN-RNN architecture in detail.  With this architecture in mind, how did you select the values of the variables in Task 1?  If you consulted a research paper detailing a successful implementation of an image captioning model, please provide the reference.

**Answer:** The network consists of a CNN encoder and an RNN decoder built from LSTM cells. The encoder is based on a pretrained ResNet model whose output is fed into a linear layer to produce an image feature vector of the desired size (the same size as the word embedding).
This feature vector, together with the embedded caption words, is fed into the RNN. A simple single-layer LSTM was used. The final word scores are generated by passing the LSTM output through another linear layer.
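
The architecture described in this answer can be sketched roughly as follows. This is not the code in **model.py**, which is not shown here: `TinyEncoder` and `TinyDecoder` are hypothetical names, a small convolutional stack stands in for the pretrained ResNet, and the way the image feature is prepended to the caption embeddings is an assumption chosen so that the output shape matches the loss call used later in this notebook.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for the ResNet-based encoder: conv features -> linear embedding."""
    def __init__(self, embed_size):
        super().__init__()
        # A tiny conv stack replaces the pretrained ResNet backbone here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.embed = nn.Linear(8, embed_size)

    def forward(self, images):
        features = self.backbone(images).flatten(1)   # (batch, 8)
        return self.embed(features)                   # (batch, embed_size)

class TinyDecoder(nn.Module):
    """Single-layer LSTM decoder with a word embedding and a linear output layer."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Drop the last token so that, with the image feature prepended,
        # there is one score vector per caption position.
        words = self.word_embed(captions[:, :-1])                   # (batch, T-1, embed)
        inputs = torch.cat([features.unsqueeze(1), words], dim=1)   # (batch, T, embed)
        hidden, _ = self.lstm(inputs)                               # (batch, T, hidden)
        return self.fc(hidden)                                      # (batch, T, vocab)

toy_embed, toy_hidden, toy_vocab = 128, 512, 50
toy_encoder = TinyEncoder(toy_embed)
toy_decoder = TinyDecoder(toy_embed, toy_hidden, toy_vocab)

toy_images = torch.randn(4, 3, 224, 224)
toy_captions = torch.randint(0, toy_vocab, (4, 12))
toy_outputs = toy_decoder(toy_encoder(toy_images), toy_captions)

# Same flattening as the training loop: one row of scores per caption token.
toy_loss = nn.CrossEntropyLoss()(toy_outputs.view(-1, toy_vocab), toy_captions.view(-1))
print(toy_outputs.shape, toy_loss.item())
```

With this layout the decoder returns a tensor of shape `(batch, caption_length, vocab_size)`, which is what `criterion(outputs.view(-1, vocab_size), captions.view(-1))` expects.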


### (Optional) Task #2

Note that we have provided a recommended image transform `transform_train` for pre-processing the training images, but you are welcome (and encouraged!) to modify it as you wish.  When modifying this transform, keep in mind that:
- the images in the dataset have varying heights and widths, and 
- if using a pre-trained model, you must perform the corresponding appropriate normalization.
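
The normalization mentioned above is simple per-channel arithmetic: subtract the channel mean and divide by the channel standard deviation. A minimal sketch, using the ImageNet statistics that appear in `transform_train` in this notebook (the helper `normalize_pixel` is hypothetical and works on a single pixel rather than a tensor):

```python
# ImageNet channel statistics, as used in `transform_train` in this notebook.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((value - m) / s for value, m, s in zip(rgb, MEAN, STD))

mid_gray = normalize_pixel((0.5, 0.5, 0.5))
mean_colour = normalize_pixel(MEAN)

print(mid_gray)
print(mean_colour)
```

A pixel equal to the dataset mean maps to zero in every channel, which is the input range the pretrained weights were trained on.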

### Question 2

**Question:** How did you select the transform in `transform_train`?  If you left the transform at its provided value, why do you think that it is a good choice for your CNN architecture?

**Answer:** The original transform was used. `Resize(256)` followed by `RandomCrop(224)` converts the original images, which may have various resolutions and aspect ratios, to 224x224 images that the network can accept. The random horizontal flip is an optional augmentation step that makes the network less sensitive to left-right mirroring. After that, the image is converted to a PyTorch tensor and normalized. The normalization values are the same ones used by all [pretrained torchvision models](https://pytorch.org/docs/stable/torchvision/models.html).
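
To make the resize-then-crop geometry concrete, here is a minimal sketch in plain Python. It only mirrors the arithmetic (smaller edge scaled to 256, then a random 224x224 window); the helper names are hypothetical, the image sizes are made up, and torchvision's own rounding may differ slightly.

```python
import random

def resize_smaller_edge(width, height, size=256):
    """Scale (width, height) so the smaller edge equals `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

def random_crop_box(width, height, crop=224, rng=random):
    """Pick the (left, top, right, bottom) box of a random crop x crop window."""
    left = rng.randint(0, width - crop)
    top = rng.randint(0, height - crop)
    return left, top, left + crop, top + crop

for original in [(640, 480), (500, 375), (427, 640)]:
    resized = resize_smaller_edge(*original)
    box = random_crop_box(*resized)
    print(original, "->", resized, "-> crop", box)
```

Whatever the original resolution or aspect ratio, every image ends up as a 224x224 crop, and the random crop position acts as a mild form of augmentation.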

### Task #3

Next, you will specify a Python list containing the learnable parameters of the model.  For instance, if you decide to make all weights in the decoder trainable, but only want to train the weights in the embedding layer of the encoder, then you should set `params` to something like:
```
params = list(decoder.parameters()) + list(encoder.embed.parameters()) 
```
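
A minimal sketch of the same idea on toy modules follows. The class `ToyEncoder`, its layer sizes, and the explicit freezing of the backbone are assumptions for illustration; whether the project's encoder freezes its pretrained layers is decided in **model.py**, which is not shown here.

```python
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Hypothetical encoder: a 'pretrained' backbone followed by an `embed` layer."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 8)   # stands in for the pretrained CNN
        self.embed = nn.Linear(8, 4)       # the only encoder layer we want to train
        # Freeze the backbone so no gradients are computed for it.
        for p in self.backbone.parameters():
            p.requires_grad_(False)

toy_encoder = ToyEncoder()
toy_decoder = nn.LSTM(input_size=4, hidden_size=6, batch_first=True)

# Only these tensors would be handed to the optimizer.
toy_params = list(toy_decoder.parameters()) + list(toy_encoder.embed.parameters())

n_trainable = sum(p.numel() for p in toy_params)
n_frozen = sum(p.numel() for p in toy_encoder.backbone.parameters())
print(n_trainable, n_frozen)
```

Parameters that are left out of the list are never updated by the optimizer, and setting `requires_grad` to `False` additionally avoids computing their gradients.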

### Question 3

**Question:** How did you select the trainable parameters of your architecture?  Why do you think this is a good choice?

**Answer:** The trainable parameters are all of the decoder weights plus the encoder's final embedding layer (`encoder.embed`); the pretrained ResNet layers are not passed to the optimizer. As for the hyperparameters, `hidden_size` was kept unchanged from its original value in Notebook 1. `vocab_threshold` was set to a very small value to get rid of nonsensical one-word captions; this value could likely be increased to impose a stricter word limit, but there was no noticeable disadvantage in keeping it low. Most generated captions were longer than this threshold anyway. A batch size in the double digits (16-64) seems like a good compromise between learning speed and stability. Two different `embed_size` values were tried, 64 and 128; both performed adequately. The final model used 128. 

In general, the parameter selection could be improved by performing a thorough design of experiments (DOE) and picking the parameter values that give the best model performance. 

### Task #4

Finally, you will select an [optimizer](http://pytorch.org/docs/master/optim.html#torch.optim.Optimizer).

### Question 4

**Question:** How did you select the optimizer used to train your model?

**Answer:** Both the Adam and SGD optimizers were tried, and both performed adequately. The literature on this topic indicates that Adam is more likely to converge faster because it uses adaptive learning rates, so Adam was selected for the final model.
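
What "adaptive learning rate" means can be shown with a scalar sketch of the two update rules. This is a simplified illustration, not the `torch.optim` implementation: the function names are hypothetical, there is a single parameter, and the hyperparameters are the commonly used defaults.

```python
import math

def sgd_step(param, grad, lr=0.001):
    """Plain SGD: the step size scales directly with the gradient."""
    return param - lr * grad

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; `state` holds (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, (m, v, t)

adam_first_steps = []
for grad in (0.01, 1.0, 100.0):
    sgd_delta = abs(sgd_step(0.0, grad))
    adam_param, _ = adam_step(0.0, grad, state=(0.0, 0.0, 0))
    adam_first_steps.append(abs(adam_param))
    print(f"grad={grad:>6}: SGD step={sgd_delta:.6f}, Adam step={abs(adam_param):.6f}")
```

The SGD step grows in proportion to the gradient, while Adam's first step stays close to `lr` regardless of gradient scale, because each update is divided by a running estimate of the gradient magnitude.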

In [1]:
import torch
import torch.nn as nn
from torchvision import transforms
import sys
sys.path.append('/opt/cocoapi/PythonAPI')
from pycocotools.coco import COCO
from data_loader import get_loader
from model import EncoderCNN, DecoderRNN
import math

import nltk
nltk.download('punkt')

## Select appropriate values for the Python variables below.
batch_size = 64          # batch size
vocab_threshold = 3        # minimum word count threshold (previous value: 2)
vocab_from_file = True     # if True, load existing vocab file
embed_size = 128           # dimensionality of image and word embeddings (previous value: 64)
hidden_size = 512          # number of features in hidden state of the RNN decoder

num_epochs = 3             # number of training epochs
save_every = 1             # determines frequency of saving model weights
print_every = 100          # print the batch loss on a new line every print_every steps
log_file = 'training_log.txt'       # name of file with saved training loss and perplexity

#image transform
transform_train = transforms.Compose([ 
    transforms.Resize(256),                          # smaller edge of image resized to 256
    transforms.RandomCrop(224),                      # get 224x224 crop from random location
    transforms.RandomHorizontalFlip(),               # horizontally flip image with probability=0.5
    transforms.ToTensor(),                           # convert the PIL Image to a tensor
    transforms.Normalize((0.485, 0.456, 0.406),      # normalize image for pre-trained model
                         (0.229, 0.224, 0.225))])

# Build data loader.
data_loader = get_loader(transform=transform_train,
                         mode='train',
                         batch_size=batch_size,
                         vocab_threshold=vocab_threshold,
                         vocab_from_file=vocab_from_file)

# The size of the vocabulary.
vocab_size = len(data_loader.dataset.vocab)

# Initialize the encoder and decoder. 
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

# Move models to GPU if CUDA is available. 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder.to(device)
decoder.to(device)

# Define the loss function and move it to the same device as the models.
criterion = nn.CrossEntropyLoss().to(device)

# Specifying the learnable parameters of the model.
params = list(decoder.parameters()) + list(encoder.embed.parameters())

import torch.optim as optim
# Define the optimizer (SGD was also tried; Adam was kept for the final model).
#optimizer = optim.SGD(params, lr=0.001, momentum=0.9)
optimizer = optim.Adam(params, lr=0.001)


# Set the total number of training steps per epoch.
total_step = math.ceil(len(data_loader.dataset.caption_lengths) / data_loader.batch_sampler.batch_size)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Vocabulary successfully loaded from vocab.pkl file!
loading annotations into memory...
Done (t=0.90s)
creating index...



index created!
Obtaining caption lengths...


100%|██████████| 414113/414113 [01:00<00:00, 6804.06it/s]
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.torch/models/resnet50-19c8e357.pth
100%|██████████| 102502400/102502400 [00:04<00:00, 21802650.55it/s]


<a id='step2'></a>
## Step 2: Train your Model

### A Note on Tuning Hyperparameters

To figure out how well your model is doing, you can look at how the training loss and perplexity evolve during training - and for the purposes of this project, you are encouraged to amend the hyperparameters based on this information.  
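
Because the per-batch loss is noisy, it can help to smooth it before judging the trend. The sketch below parses lines in the format this notebook writes to `log_file` and computes a moving average; the helper names `parse_log` and `moving_average` are hypothetical, and the sample lines are copied from this notebook's own training output.

```python
import math
import re

LOG_LINE = re.compile(
    r"Epoch \[(\d+)/(\d+)\], Step \[(\d+)/(\d+)\], Loss: ([\d.]+), Perplexity: ([\d.]+)"
)

def parse_log(lines):
    """Yield (epoch, step, loss, perplexity) for every line that matches the log format."""
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            epoch, _, step, _, loss, perplexity = match.groups()
            yield int(epoch), int(step), float(loss), float(perplexity)

def moving_average(values, window):
    """Average of the last `window` values at each position (shorter at the start)."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

# A few lines copied from this notebook's training output.
sample = [
    "Epoch [1/3], Step [1/6471], Loss: 9.2145, Perplexity: 10041.4191",
    "Epoch [1/3], Step [2/6471], Loss: 9.1330, Perplexity: 9256.1924",
    "Epoch [1/3], Step [3/6471], Loss: 9.0741, Perplexity: 8726.3661",
    "Epoch [1/3], Step [4/6471], Loss: 8.8459, Perplexity: 6945.8032",
]
records = list(parse_log(sample))
losses = [loss for _, _, loss, _ in records]
print(moving_average(losses, window=2))

# Perplexity is exp(loss); small differences come from rounding the logged loss.
for _, step, loss, perplexity in records:
    print(step, round(math.exp(loss), 1), perplexity)
```

To use it on a real run, pass the lines of the log file instead of `sample`, for example `parse_log(open(log_file))`, and choose a window of a few hundred steps.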

However, this will not tell you if your model is overfitting to the training data, and, unfortunately, overfitting is a problem that is commonly encountered when training image captioning models.  

That said, if you would like to go above and beyond in this project, you can read about some approaches to minimizing overfitting in section 4.3.1 of [this paper](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7505636).  In the next (optional) step of this notebook, we provide some guidance for assessing the performance on the validation dataset.

In [2]:
import torch.utils.data as data
import numpy as np
import os
import requests
import time
    
# Open the training log file.
f = open(log_file, 'w')

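# Fetch the keep-alive token from the instance metadata server.
# This only works inside the hosted workspace this notebook was written for.
old_time = time.time()
response = requests.request("GET", 
                            "http://metadata.google.internal/computeMetadata/v1/instance/attributes/keep_alive_token", 
                            headers={"Metadata-Flavor":"Google"})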

for epoch in range(1, num_epochs+1):
    
    for i_step in range(1, total_step+1):
        
        # Ping the workspace about once a minute so the session stays alive during training.
        if time.time() - old_time > 60:
            old_time = time.time()
            requests.request("POST", 
                             "https://nebula.udacity.com/api/v1/remote/keep-alive", 
                             headers={'Authorization': "STAR " + response.text})
            

        # Randomly sample a caption length, and sample indices with that length.
        indices = data_loader.dataset.get_train_indices()
        # Create and assign a batch sampler to retrieve a batch with the sampled indices.
        new_sampler = data.sampler.SubsetRandomSampler(indices=indices)
        data_loader.batch_sampler.sampler = new_sampler
        
        # Obtain the batch.
        images, captions = next(iter(data_loader))

        # Move batch of images and captions to GPU if CUDA is available.
        images = images.to(device)
        captions = captions.to(device)
        
        # Zero the gradients.
        decoder.zero_grad()
        encoder.zero_grad()
        
        # Pass the inputs through the CNN-RNN model.
        features = encoder(images)
        outputs = decoder(features, captions)
        
        # Calculate the batch loss.
        loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
        
        # Backward pass.
        loss.backward()
        
        # Update the trainable parameters using the optimizer.
        optimizer.step()
            
        # Get training statistics.
        stats = 'Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %5.4f' % (epoch, num_epochs, i_step, total_step, loss.item(), np.exp(loss.item()))
        
        # Print training statistics (on same line).
        print('\r' + stats, end="")
        sys.stdout.flush()
        
        # Print training statistics to file.
        f.write(stats + '\n')
        f.flush()
        
        # Print training statistics (on different line).
        if i_step % print_every == 0:
            print('\r' + stats)
            
    # Save the weights.
    if epoch % save_every == 0:
        torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d.pkl' % epoch))
        torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d.pkl' % epoch))

# Close the training log file.
f.close()
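
The loop above draws every batch from captions of a single length, which is why the `caption shape` reported in the output changes from step to step. The real sampler lives in the data loader module, which is not shown here, so the following is only a sketch of how such sampling could work: `sample_same_length_indices` is a hypothetical helper, the toy lengths are invented, and sampling with replacement is an assumption. The last line reproduces the steps-per-epoch arithmetic from the setup cell, using the caption count and batch size reported in this notebook.

```python
import math
import random

def sample_same_length_indices(caption_lengths, batch_size, rng=random):
    """Pick one caption length at random, then sample `batch_size` captions of that length."""
    # Choosing from the raw list makes common lengths proportionally more likely.
    selected_length = rng.choice(caption_lengths)
    candidates = [i for i, n in enumerate(caption_lengths) if n == selected_length]
    # Sample with replacement so rare lengths can still fill a whole batch.
    return [rng.choice(candidates) for _ in range(batch_size)]

# Toy caption lengths, invented for illustration.
toy_lengths = [10, 12, 12, 11, 12, 10, 15, 11, 12, 10]
batch = sample_same_length_indices(toy_lengths, batch_size=4)
print(batch, [toy_lengths[i] for i in batch])

# Steps per epoch: number of captions divided by batch size, rounded up.
steps_per_epoch = math.ceil(414113 / 64)
print(steps_per_epoch)
```

Keeping all captions in a batch at the same length means they can be stacked into one tensor without padding.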

caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [1/6471], Loss: 9.2145, Perplexity: 10041.4191caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2/6471], Loss: 9.1330, Perplexity: 9256.1924caption shape:  torch.Size([64, 21])
Epoch [1/3], Step [3/6471], Loss: 9.0741, Perplexity: 8726.3661caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4/6471], Loss: 8.8459, Perplexity: 6945.8032caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [5/6471], Loss: 8.7291, Perplexity: 6179.8879caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [6/6471], Loss: 8.3350, Perplexity: 4167.2136caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [7/6471], Loss: 7.7513, Perplexity: 2324.5289caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [8/6471], Loss: 7.0267, Perplexity: 1126.2749caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [9/6471], Loss: 6.2766, Perplexity: 531.9973caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [10/6471], Loss: 5.7575, Perplexity: 316.5497

Epoch [1/3], Step [83/6471], Loss: 4.5459, Perplexity: 94.2466caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [84/6471], Loss: 4.6037, Perplexity: 99.8504caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [85/6471], Loss: 4.1817, Perplexity: 65.4792caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [86/6471], Loss: 4.2329, Perplexity: 68.9157caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [87/6471], Loss: 4.1035, Perplexity: 60.5548caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [88/6471], Loss: 4.2191, Perplexity: 67.9731caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [89/6471], Loss: 4.0959, Perplexity: 60.0915caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [90/6471], Loss: 4.1531, Perplexity: 63.6314caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [91/6471], Loss: 4.3506, Perplexity: 77.5268caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [92/6471], Loss: 4.0092, Perplexity: 55.1042caption shape:  torch.Size([64, 14])

Epoch [1/3], Step [247/6471], Loss: 3.8682, Perplexity: 47.8551caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [248/6471], Loss: 3.9821, Perplexity: 53.6312caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [249/6471], Loss: 3.6131, Perplexity: 37.0807caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [250/6471], Loss: 3.5832, Perplexity: 35.9899caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [251/6471], Loss: 3.6645, Perplexity: 39.0377caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [252/6471], Loss: 3.7214, Perplexity: 41.3226caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [253/6471], Loss: 3.6732, Perplexity: 39.3759caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [254/6471], Loss: 3.5470, Perplexity: 34.7104caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [255/6471], Loss: 3.7221, Perplexity: 41.3528caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [256/6471], Loss: 3.5726, Perplexity: 35.6086caption shape:  torch.Size([64, 12])


Epoch [1/3], Step [329/6471], Loss: 3.7200, Perplexity: 41.2626caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [330/6471], Loss: 3.5720, Perplexity: 35.5883caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [331/6471], Loss: 3.3647, Perplexity: 28.9250caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [332/6471], Loss: 3.5761, Perplexity: 35.7339caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [333/6471], Loss: 3.5414, Perplexity: 34.5146caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [334/6471], Loss: 3.5454, Perplexity: 34.6521caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [335/6471], Loss: 3.3633, Perplexity: 28.8832caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [336/6471], Loss: 3.4249, Perplexity: 30.7198caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [337/6471], Loss: 3.4342, Perplexity: 31.0068caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [338/6471], Loss: 3.6018, Perplexity: 36.6642caption shape:  torch.Size([64, 14])


Epoch [1/3], Step [411/6471], Loss: 3.3820, Perplexity: 29.4288caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [412/6471], Loss: 3.2436, Perplexity: 25.6255caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [413/6471], Loss: 3.1898, Perplexity: 24.2841caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [414/6471], Loss: 3.3185, Perplexity: 27.6188caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [415/6471], Loss: 3.3428, Perplexity: 28.2977caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [416/6471], Loss: 3.2327, Perplexity: 25.3474caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [417/6471], Loss: 3.3274, Perplexity: 27.8670caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [418/6471], Loss: 3.2354, Perplexity: 25.4165caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [419/6471], Loss: 3.2425, Perplexity: 25.5986caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [420/6471], Loss: 3.5679, Perplexity: 35.4411caption shape:  torch.Size([64, 20])


Epoch [1/3], Step [493/6471], Loss: 3.2598, Perplexity: 26.0439caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [494/6471], Loss: 3.5732, Perplexity: 35.6317caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [495/6471], Loss: 3.7231, Perplexity: 41.3910caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [496/6471], Loss: 2.9890, Perplexity: 19.8661caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [497/6471], Loss: 3.2536, Perplexity: 25.8831caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [498/6471], Loss: 3.4011, Perplexity: 29.9978caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [499/6471], Loss: 3.0246, Perplexity: 20.5863caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [500/6471], Loss: 3.6465, Perplexity: 38.3395
caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [501/6471], Loss: 3.0649, Perplexity: 21.4320caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [502/6471], Loss: 3.3680, Perplexity: 29.0202caption shape:  torch.Size([64, 11])

Epoch [1/3], Step [575/6471], Loss: 3.3061, Perplexity: 27.2784caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [576/6471], Loss: 3.2091, Perplexity: 24.7571caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [577/6471], Loss: 3.4119, Perplexity: 30.3238caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [578/6471], Loss: 3.0731, Perplexity: 21.6088caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [579/6471], Loss: 3.2793, Perplexity: 26.5581caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [580/6471], Loss: 3.0502, Perplexity: 21.1193caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [581/6471], Loss: 3.2342, Perplexity: 25.3865caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [582/6471], Loss: 3.4726, Perplexity: 32.2189caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [583/6471], Loss: 3.3674, Perplexity: 29.0023caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [584/6471], Loss: 3.3344, Perplexity: 28.0619caption shape:  torch.Size([64, 19])


Epoch [1/3], Step [657/6471], Loss: 3.3092, Perplexity: 27.3626caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [658/6471], Loss: 3.4399, Perplexity: 31.1841caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [659/6471], Loss: 3.4072, Perplexity: 30.1802caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [660/6471], Loss: 3.1722, Perplexity: 23.8590caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [661/6471], Loss: 3.2032, Perplexity: 24.6122caption shape:  torch.Size([64, 18])
Epoch [1/3], Step [662/6471], Loss: 3.5115, Perplexity: 33.4996caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [663/6471], Loss: 3.1589, Perplexity: 23.5458caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [664/6471], Loss: 3.0801, Perplexity: 21.7614caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [665/6471], Loss: 3.3253, Perplexity: 27.8064caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [666/6471], Loss: 3.4356, Perplexity: 31.0506caption shape:  torch.Size([64, 17])


Epoch [1/3], Step [739/6471], Loss: 3.2269, Perplexity: 25.2012caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [740/6471], Loss: 3.0138, Perplexity: 20.3651caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [741/6471], Loss: 3.0597, Perplexity: 21.3221caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [742/6471], Loss: 3.6052, Perplexity: 36.7876caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [743/6471], Loss: 2.9814, Perplexity: 19.7150caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [744/6471], Loss: 3.0580, Perplexity: 21.2849caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [745/6471], Loss: 3.1744, Perplexity: 23.9132caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [746/6471], Loss: 3.0868, Perplexity: 21.9073caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [747/6471], Loss: 3.1553, Perplexity: 23.4612caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [748/6471], Loss: 2.9254, Perplexity: 18.6411caption shape:  torch.Size([64, 14])


Epoch [1/3], Step [821/6471], Loss: 3.0942, Perplexity: 22.0698caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [822/6471], Loss: 3.0286, Perplexity: 20.6681caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [823/6471], Loss: 3.2315, Perplexity: 25.3170caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [824/6471], Loss: 3.0529, Perplexity: 21.1757caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [825/6471], Loss: 3.2788, Perplexity: 26.5428caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [826/6471], Loss: 3.0668, Perplexity: 21.4739caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [827/6471], Loss: 3.2712, Perplexity: 26.3437caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [828/6471], Loss: 3.6725, Perplexity: 39.3485caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [829/6471], Loss: 3.0992, Perplexity: 22.1812caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [830/6471], Loss: 3.0325, Perplexity: 20.7483caption shape:  torch.Size([64, 16])


Epoch [1/3], Step [903/6471], Loss: 2.9949, Perplexity: 19.9831caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [904/6471], Loss: 3.0898, Perplexity: 21.9722caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [905/6471], Loss: 2.8136, Perplexity: 16.6690caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [906/6471], Loss: 3.2995, Perplexity: 27.0998caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [907/6471], Loss: 2.9300, Perplexity: 18.7279caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [908/6471], Loss: 3.1205, Perplexity: 22.6584caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [909/6471], Loss: 3.0898, Perplexity: 21.9716caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [910/6471], Loss: 2.9512, Perplexity: 19.1291caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [911/6471], Loss: 2.9949, Perplexity: 19.9838caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [912/6471], Loss: 2.9356, Perplexity: 18.8327caption shape:  torch.Size([64, 10])


Epoch [1/3], Step [985/6471], Loss: 3.1303, Perplexity: 22.8809caption shape:  torch.Size([64, 9])
Epoch [1/3], Step [986/6471], Loss: 3.2260, Perplexity: 25.1796caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [987/6471], Loss: 2.7994, Perplexity: 16.4349caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [988/6471], Loss: 2.9127, Perplexity: 18.4071caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [989/6471], Loss: 2.9729, Perplexity: 19.5490caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [990/6471], Loss: 2.9403, Perplexity: 18.9217caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [991/6471], Loss: 2.9799, Perplexity: 19.6860caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [992/6471], Loss: 2.7619, Perplexity: 15.8295caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [993/6471], Loss: 2.9655, Perplexity: 19.4047caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [994/6471], Loss: 3.0779, Perplexity: 21.7129caption shape:  torch.Size([64, 12])

Epoch [1/3], Step [1147/6471], Loss: 2.5355, Perplexity: 12.6231caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [1148/6471], Loss: 2.8930, Perplexity: 18.0474caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1149/6471], Loss: 2.7861, Perplexity: 16.2170caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [1150/6471], Loss: 2.6561, Perplexity: 14.2412caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [1151/6471], Loss: 2.9355, Perplexity: 18.8307caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [1152/6471], Loss: 2.8435, Perplexity: 17.1765caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [1153/6471], Loss: 3.0458, Perplexity: 21.0259caption shape:  torch.Size([64, 18])
Epoch [1/3], Step [1154/6471], Loss: 3.2495, Perplexity: 25.7773caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [1155/6471], Loss: 2.9990, Perplexity: 20.0660caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [1156/6471], Loss: 2.7963, Perplexity: 16.3834caption shape:  torch.Size(

Epoch [1/3], Step [1309/6471], Loss: 2.7490, Perplexity: 15.6266caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [1310/6471], Loss: 2.7611, Perplexity: 15.8166caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [1311/6471], Loss: 2.9465, Perplexity: 19.0391caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1312/6471], Loss: 3.0223, Perplexity: 20.5387caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1313/6471], Loss: 2.4858, Perplexity: 12.0109caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1314/6471], Loss: 2.6570, Perplexity: 14.2537caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1315/6471], Loss: 2.6729, Perplexity: 14.4826caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [1316/6471], Loss: 2.9877, Perplexity: 19.8401caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1317/6471], Loss: 2.7079, Perplexity: 14.9979caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1318/6471], Loss: 2.7551, Perplexity: 15.7224caption shape:  torch.Size(

Epoch [1/3], Step [1471/6471], Loss: 3.0119, Perplexity: 20.3252caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1472/6471], Loss: 2.5176, Perplexity: 12.3984caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1473/6471], Loss: 2.7064, Perplexity: 14.9751caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [1474/6471], Loss: 2.9345, Perplexity: 18.8120caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [1475/6471], Loss: 3.2779, Perplexity: 26.5212caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1476/6471], Loss: 2.5575, Perplexity: 12.9033caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [1477/6471], Loss: 2.6932, Perplexity: 14.7796caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1478/6471], Loss: 2.6120, Perplexity: 13.6265caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [1479/6471], Loss: 2.7072, Perplexity: 14.9875caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [1480/6471], Loss: 2.7790, Perplexity: 16.1028caption shape:  torch.Size(

Epoch [1/3], Step [1633/6471], Loss: 2.8118, Perplexity: 16.6399caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1634/6471], Loss: 2.6574, Perplexity: 14.2591caption shape:  torch.Size([64, 18])
Epoch [1/3], Step [1635/6471], Loss: 3.1490, Perplexity: 23.3117caption shape:  torch.Size([64, 23])
Epoch [1/3], Step [1636/6471], Loss: 3.4990, Perplexity: 33.0837caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [1637/6471], Loss: 3.0556, Perplexity: 21.2346caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [1638/6471], Loss: 2.5729, Perplexity: 13.1038caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1639/6471], Loss: 2.7337, Perplexity: 15.3894caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [1640/6471], Loss: 2.6792, Perplexity: 14.5739caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [1641/6471], Loss: 2.9137, Perplexity: 18.4257caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1642/6471], Loss: 2.6010, Perplexity: 13.4773caption shape:  torch.Size(

Epoch [1/3], Step [1795/6471], Loss: 2.6777, Perplexity: 14.5523caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [1796/6471], Loss: 2.9408, Perplexity: 18.9315caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [1797/6471], Loss: 2.8257, Perplexity: 16.8724caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [1798/6471], Loss: 2.8577, Perplexity: 17.4207caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [1799/6471], Loss: 2.4747, Perplexity: 11.8776caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1800/6471], Loss: 2.5772, Perplexity: 13.1604
caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [1801/6471], Loss: 2.7876, Perplexity: 16.2424caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [1802/6471], Loss: 2.9970, Perplexity: 20.0248caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [1803/6471], Loss: 2.4518, Perplexity: 11.6092caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [1804/6471], Loss: 2.5459, Perplexity: 12.7546caption shape:  torch.Size

Epoch [1/3], Step [1957/6471], Loss: 2.4916, Perplexity: 12.0803caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [1958/6471], Loss: 2.5102, Perplexity: 12.3077caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1959/6471], Loss: 2.4438, Perplexity: 11.5165caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1960/6471], Loss: 2.5735, Perplexity: 13.1119caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1961/6471], Loss: 2.5114, Perplexity: 12.3219caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [1962/6471], Loss: 2.5623, Perplexity: 12.9654caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [1963/6471], Loss: 2.7827, Perplexity: 16.1628caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [1964/6471], Loss: 2.4766, Perplexity: 11.9013caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [1965/6471], Loss: 2.4556, Perplexity: 11.6535caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [1966/6471], Loss: 2.6846, Perplexity: 14.6524caption shape:  torch.Size(

Epoch [1/3], Step [2119/6471], Loss: 2.6075, Perplexity: 13.5646caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [2120/6471], Loss: 2.5365, Perplexity: 12.6354caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [2121/6471], Loss: 2.6743, Perplexity: 14.5029caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [2122/6471], Loss: 2.4552, Perplexity: 11.6483caption shape:  torch.Size([64, 28])
Epoch [1/3], Step [2123/6471], Loss: 3.9072, Perplexity: 49.7589caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [2124/6471], Loss: 2.6538, Perplexity: 14.2077caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2125/6471], Loss: 2.5202, Perplexity: 12.4311caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [2126/6471], Loss: 2.7940, Perplexity: 16.3465caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2127/6471], Loss: 2.5313, Perplexity: 12.5697caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2128/6471], Loss: 2.2861, Perplexity: 9.8361caption shape:  torch.Size([

Epoch [1/3], Step [2281/6471], Loss: 2.4132, Perplexity: 11.1701caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2282/6471], Loss: 2.5884, Perplexity: 13.3090caption shape:  torch.Size([64, 18])
Epoch [1/3], Step [2283/6471], Loss: 2.9858, Perplexity: 19.8023caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [2284/6471], Loss: 2.4184, Perplexity: 11.2274caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [2285/6471], Loss: 2.9323, Perplexity: 18.7701caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [2286/6471], Loss: 2.3564, Perplexity: 10.5530caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2287/6471], Loss: 2.5110, Perplexity: 12.3177caption shape:  torch.Size([64, 21])
Epoch [1/3], Step [2288/6471], Loss: 3.2036, Perplexity: 24.6215caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2289/6471], Loss: 2.3551, Perplexity: 10.5391caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2290/6471], Loss: 2.2752, Perplexity: 9.7297caption shape:  torch.Size([

Epoch [1/3], Step [2443/6471], Loss: 2.4768, Perplexity: 11.9034caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [2444/6471], Loss: 2.5500, Perplexity: 12.8068caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [2445/6471], Loss: 2.7809, Perplexity: 16.1329caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [2446/6471], Loss: 2.4562, Perplexity: 11.6603caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [2447/6471], Loss: 2.7630, Perplexity: 15.8471caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [2448/6471], Loss: 2.5125, Perplexity: 12.3352caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [2449/6471], Loss: 2.5842, Perplexity: 13.2532caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2450/6471], Loss: 2.3405, Perplexity: 10.3863caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2451/6471], Loss: 2.3552, Perplexity: 10.5406caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [2452/6471], Loss: 2.4609, Perplexity: 11.7148caption shape:  torch.Size(

Epoch [1/3], Step [2605/6471], Loss: 2.4441, Perplexity: 11.5203caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2606/6471], Loss: 2.4495, Perplexity: 11.5821caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [2607/6471], Loss: 2.4553, Perplexity: 11.6496caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2608/6471], Loss: 2.3105, Perplexity: 10.0794caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2609/6471], Loss: 2.3720, Perplexity: 10.7185caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [2610/6471], Loss: 2.4348, Perplexity: 11.4133caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [2611/6471], Loss: 2.3324, Perplexity: 10.3022caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [2612/6471], Loss: 2.7136, Perplexity: 15.0834caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2613/6471], Loss: 2.4496, Perplexity: 11.5831caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [2614/6471], Loss: 2.4925, Perplexity: 12.0918caption shape:  torch.Size(

Epoch [1/3], Step [2767/6471], Loss: 2.3226, Perplexity: 10.2017caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [2768/6471], Loss: 2.3712, Perplexity: 10.7097caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [2769/6471], Loss: 2.6326, Perplexity: 13.9102caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [2770/6471], Loss: 2.7605, Perplexity: 15.8083caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [2771/6471], Loss: 2.6209, Perplexity: 13.7487caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2772/6471], Loss: 2.1403, Perplexity: 8.5017caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [2773/6471], Loss: 2.2394, Perplexity: 9.3877caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [2774/6471], Loss: 2.6928, Perplexity: 14.7729caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [2775/6471], Loss: 2.5033, Perplexity: 12.2227caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [2776/6471], Loss: 2.4452, Perplexity: 11.5331caption shape:  torch.Size([6

Epoch [1/3], Step [2929/6471], Loss: 2.7223, Perplexity: 15.2153caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2930/6471], Loss: 2.4635, Perplexity: 11.7458caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2931/6471], Loss: 2.4721, Perplexity: 11.8478caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2932/6471], Loss: 2.2784, Perplexity: 9.7613caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2933/6471], Loss: 2.2604, Perplexity: 9.5869caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [2934/6471], Loss: 2.6129, Perplexity: 13.6380caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [2935/6471], Loss: 2.2919, Perplexity: 9.8937caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2936/6471], Loss: 2.0797, Perplexity: 8.0024caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [2937/6471], Loss: 2.4136, Perplexity: 11.1745caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [2938/6471], Loss: 2.3723, Perplexity: 10.7219caption shape:  torch.Size([64,

Epoch [1/3], Step [3091/6471], Loss: 2.2413, Perplexity: 9.4055caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [3092/6471], Loss: 2.5732, Perplexity: 13.1083caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [3093/6471], Loss: 2.7599, Perplexity: 15.7987caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3094/6471], Loss: 2.4087, Perplexity: 11.1194caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3095/6471], Loss: 2.3496, Perplexity: 10.4812caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [3096/6471], Loss: 2.3839, Perplexity: 10.8475caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3097/6471], Loss: 2.1114, Perplexity: 8.2595caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [3098/6471], Loss: 2.4284, Perplexity: 11.3411caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [3099/6471], Loss: 2.4347, Perplexity: 11.4127caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3100/6471], Loss: 2.2159, Perplexity: 9.1698
caption shape:  torch.Size([6

Epoch [1/3], Step [3253/6471], Loss: 2.5811, Perplexity: 13.2112caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3254/6471], Loss: 2.3432, Perplexity: 10.4141caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [3255/6471], Loss: 2.3943, Perplexity: 10.9601caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [3256/6471], Loss: 2.6946, Perplexity: 14.7996caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3257/6471], Loss: 2.2085, Perplexity: 9.1020caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [3258/6471], Loss: 2.5570, Perplexity: 12.8969caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [3259/6471], Loss: 2.4176, Perplexity: 11.2190caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [3260/6471], Loss: 2.1601, Perplexity: 8.6717caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [3261/6471], Loss: 2.2947, Perplexity: 9.9218caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3262/6471], Loss: 2.5079, Perplexity: 12.2786caption shape:  torch.Size([64

Epoch [1/3], Step [3415/6471], Loss: 2.2891, Perplexity: 9.8658caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [3416/6471], Loss: 2.2953, Perplexity: 9.9278caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [3417/6471], Loss: 2.2927, Perplexity: 9.9015caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3418/6471], Loss: 2.2841, Perplexity: 9.8172caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3419/6471], Loss: 2.3144, Perplexity: 10.1193caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [3420/6471], Loss: 2.2283, Perplexity: 9.2844caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3421/6471], Loss: 2.0978, Perplexity: 8.1486caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [3422/6471], Loss: 2.5109, Perplexity: 12.3163caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [3423/6471], Loss: 2.5275, Perplexity: 12.5218caption shape:  torch.Size([64, 23])
Epoch [1/3], Step [3424/6471], Loss: 3.2559, Perplexity: 25.9418caption shape:  torch.Size([64, 1

Epoch [1/3], Step [3577/6471], Loss: 2.4140, Perplexity: 11.1782caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3578/6471], Loss: 2.3293, Perplexity: 10.2712caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [3579/6471], Loss: 2.2597, Perplexity: 9.5803caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3580/6471], Loss: 2.2710, Perplexity: 9.6889caption shape:  torch.Size([64, 22])
Epoch [1/3], Step [3581/6471], Loss: 3.2541, Perplexity: 25.8974caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3582/6471], Loss: 2.3586, Perplexity: 10.5762caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3583/6471], Loss: 2.2042, Perplexity: 9.0631caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [3584/6471], Loss: 2.3908, Perplexity: 10.9218caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [3585/6471], Loss: 2.6257, Perplexity: 13.8139caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [3586/6471], Loss: 2.5412, Perplexity: 12.6952caption shape:  torch.Size([64

Epoch [1/3], Step [3739/6471], Loss: 2.4802, Perplexity: 11.9439caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [3740/6471], Loss: 2.4736, Perplexity: 11.8653caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [3741/6471], Loss: 2.3851, Perplexity: 10.8602caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [3742/6471], Loss: 2.4695, Perplexity: 11.8165caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3743/6471], Loss: 2.4053, Perplexity: 11.0813caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3744/6471], Loss: 2.2686, Perplexity: 9.6659caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3745/6471], Loss: 2.0126, Perplexity: 7.4827caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [3746/6471], Loss: 2.3103, Perplexity: 10.0775caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [3747/6471], Loss: 2.4937, Perplexity: 12.1059caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [3748/6471], Loss: 2.3748, Perplexity: 10.7493caption shape:  torch.Size([6

Epoch [1/3], Step [3901/6471], Loss: 2.4138, Perplexity: 11.1760caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [3902/6471], Loss: 2.6759, Perplexity: 14.5249caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [3903/6471], Loss: 2.4755, Perplexity: 11.8871caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3904/6471], Loss: 2.0648, Perplexity: 7.8840caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [3905/6471], Loss: 2.3419, Perplexity: 10.4005caption shape:  torch.Size([64, 20])
Epoch [1/3], Step [3906/6471], Loss: 3.0594, Perplexity: 21.3151caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [3907/6471], Loss: 2.5808, Perplexity: 13.2081caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3908/6471], Loss: 2.2725, Perplexity: 9.7037caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [3909/6471], Loss: 2.3330, Perplexity: 10.3084caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [3910/6471], Loss: 2.2700, Perplexity: 9.6794caption shape:  torch.Size([64

Epoch [1/3], Step [4063/6471], Loss: 2.2827, Perplexity: 9.8033caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4064/6471], Loss: 2.3074, Perplexity: 10.0481caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4065/6471], Loss: 2.0780, Perplexity: 7.9889caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [4066/6471], Loss: 2.6160, Perplexity: 13.6810caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [4067/6471], Loss: 2.3161, Perplexity: 10.1357caption shape:  torch.Size([64, 18])
Epoch [1/3], Step [4068/6471], Loss: 2.7029, Perplexity: 14.9235caption shape:  torch.Size([64, 18])
Epoch [1/3], Step [4069/6471], Loss: 2.6520, Perplexity: 14.1827caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [4070/6471], Loss: 2.4926, Perplexity: 12.0928caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [4071/6471], Loss: 2.4560, Perplexity: 11.6578caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4072/6471], Loss: 2.2633, Perplexity: 9.6150caption shape:  torch.Size([64

Epoch [1/3], Step [4225/6471], Loss: 2.2057, Perplexity: 9.0761caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4226/6471], Loss: 2.2968, Perplexity: 9.9422caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [4227/6471], Loss: 2.2149, Perplexity: 9.1608caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4228/6471], Loss: 2.4146, Perplexity: 11.1855caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [4229/6471], Loss: 2.1947, Perplexity: 8.9769caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4230/6471], Loss: 2.2975, Perplexity: 9.9496caption shape:  torch.Size([64, 19])
Epoch [1/3], Step [4231/6471], Loss: 2.7226, Perplexity: 15.2194caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4232/6471], Loss: 2.2705, Perplexity: 9.6843caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4233/6471], Loss: 2.3262, Perplexity: 10.2393caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4234/6471], Loss: 2.1047, Perplexity: 8.2046caption shape:  torch.Size([64, 12

Epoch [1/3], Step [4307/6471], Loss: 2.1092, Perplexity: 8.2417caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4308/6471], Loss: 2.1686, Perplexity: 8.7459caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [4309/6471], Loss: 2.2686, Perplexity: 9.6657caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [4310/6471], Loss: 2.6382, Perplexity: 13.9878caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [4311/6471], Loss: 2.4061, Perplexity: 11.0906caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [4312/6471], Loss: 2.3925, Perplexity: 10.9411caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4313/6471], Loss: 2.1356, Perplexity: 8.4621caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4314/6471], Loss: 2.0376, Perplexity: 7.6725caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4315/6471], Loss: 2.2349, Perplexity: 9.3458caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4316/6471], Loss: 2.3187, Perplexity: 10.1623caption shape:  torch.Size([64, 1

Epoch [1/3], Step [4469/6471], Loss: 2.1791, Perplexity: 8.8386caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4470/6471], Loss: 2.2966, Perplexity: 9.9405caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4471/6471], Loss: 2.2399, Perplexity: 9.3920caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4472/6471], Loss: 2.3986, Perplexity: 11.0083caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4473/6471], Loss: 2.2709, Perplexity: 9.6885caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4474/6471], Loss: 2.2679, Perplexity: 9.6593caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4475/6471], Loss: 2.2527, Perplexity: 9.5138caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4476/6471], Loss: 2.1861, Perplexity: 8.9008caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4477/6471], Loss: 2.3291, Perplexity: 10.2692caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4478/6471], Loss: 2.1645, Perplexity: 8.7105caption shape:  torch.Size([64, 16]

Epoch [1/3], Step [4631/6471], Loss: 2.4513, Perplexity: 11.6033caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4632/6471], Loss: 2.1933, Perplexity: 8.9646caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [4633/6471], Loss: 2.4353, Perplexity: 11.4196caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [4634/6471], Loss: 2.1273, Perplexity: 8.3920caption shape:  torch.Size([64, 19])
Epoch [1/3], Step [4635/6471], Loss: 2.6336, Perplexity: 13.9242caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [4636/6471], Loss: 2.4406, Perplexity: 11.4796caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [4637/6471], Loss: 2.3850, Perplexity: 10.8594caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4638/6471], Loss: 2.2325, Perplexity: 9.3233caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4639/6471], Loss: 2.1865, Perplexity: 8.9037caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [4640/6471], Loss: 2.3310, Perplexity: 10.2878caption shape:  torch.Size([64,

Epoch [1/3], Step [4793/6471], Loss: 2.5900, Perplexity: 13.3294caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [4794/6471], Loss: 1.9923, Perplexity: 7.3322caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [4795/6471], Loss: 2.2912, Perplexity: 9.8864caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4796/6471], Loss: 2.5278, Perplexity: 12.5264caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4797/6471], Loss: 2.1759, Perplexity: 8.8099caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4798/6471], Loss: 2.3428, Perplexity: 10.4108caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [4799/6471], Loss: 2.1458, Perplexity: 8.5485caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4800/6471], Loss: 2.1912, Perplexity: 8.9463
caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4801/6471], Loss: 2.3421, Perplexity: 10.4031caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [4802/6471], Loss: 2.6804, Perplexity: 14.5909caption shape:  torch.Size([64,

Epoch [1/3], Step [4956/6471], Loss: 2.4633, Perplexity: 11.7431caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4957/6471], Loss: 2.2291, Perplexity: 9.2913caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [4958/6471], Loss: 2.1865, Perplexity: 8.9037caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [4959/6471], Loss: 2.0877, Perplexity: 8.0664caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4960/6471], Loss: 2.2747, Perplexity: 9.7245caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [4961/6471], Loss: 1.9951, Perplexity: 7.3532caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [4962/6471], Loss: 2.2504, Perplexity: 9.4916caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [4963/6471], Loss: 2.3406, Perplexity: 10.3872caption shape:  torch.Size([64, 22])
Epoch [1/3], Step [4964/6471], Loss: 3.1399, Perplexity: 23.1012caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [4965/6471], Loss: 2.2511, Perplexity: 9.4986caption shape:  torch.Size([64, 14

Epoch [1/3], Step [5119/6471], Loss: 2.2769, Perplexity: 9.7460caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5120/6471], Loss: 2.0241, Perplexity: 7.5692caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [5121/6471], Loss: 2.4772, Perplexity: 11.9082caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [5122/6471], Loss: 2.5067, Perplexity: 12.2650caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [5123/6471], Loss: 2.6790, Perplexity: 14.5710caption shape:  torch.Size([64, 18])
Epoch [1/3], Step [5124/6471], Loss: 2.4495, Perplexity: 11.5825caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5125/6471], Loss: 2.2235, Perplexity: 9.2397caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5126/6471], Loss: 2.3074, Perplexity: 10.0479caption shape:  torch.Size([64, 25])
Epoch [1/3], Step [5127/6471], Loss: 3.0964, Perplexity: 22.1182caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5128/6471], Loss: 2.1945, Perplexity: 8.9755caption shape:  torch.Size([64,

Epoch [1/3], Step [5281/6471], Loss: 2.1295, Perplexity: 8.4107caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5282/6471], Loss: 2.1053, Perplexity: 8.2096caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5283/6471], Loss: 2.1606, Perplexity: 8.6766caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5284/6471], Loss: 2.4460, Perplexity: 11.5426caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5285/6471], Loss: 2.2868, Perplexity: 9.8433caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [5286/6471], Loss: 2.0942, Perplexity: 8.1189caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5287/6471], Loss: 2.2328, Perplexity: 9.3263caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5288/6471], Loss: 2.4650, Perplexity: 11.7637caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [5289/6471], Loss: 2.2646, Perplexity: 9.6276caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5290/6471], Loss: 2.1717, Perplexity: 8.7730caption shape:  torch.Size([64, 11]

Epoch [1/3], Step [5363/6471], Loss: 2.3453, Perplexity: 10.4368caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5364/6471], Loss: 2.1798, Perplexity: 8.8442caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5365/6471], Loss: 2.1857, Perplexity: 8.8965caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5366/6471], Loss: 2.3775, Perplexity: 10.7780caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5367/6471], Loss: 2.1213, Perplexity: 8.3418caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5368/6471], Loss: 2.1400, Perplexity: 8.4992caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5369/6471], Loss: 2.0239, Perplexity: 7.5674caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5370/6471], Loss: 2.3261, Perplexity: 10.2382caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5371/6471], Loss: 2.3105, Perplexity: 10.0791caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5372/6471], Loss: 2.0641, Perplexity: 7.8781caption shape:  torch.Size([64, 1

Epoch [1/3], Step [5445/6471], Loss: 2.1149, Perplexity: 8.2890caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5446/6471], Loss: 1.9149, Perplexity: 6.7866caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5447/6471], Loss: 2.1567, Perplexity: 8.6426caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5448/6471], Loss: 2.5892, Perplexity: 13.3195caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5449/6471], Loss: 2.1298, Perplexity: 8.4134caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5450/6471], Loss: 2.1620, Perplexity: 8.6888caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [5451/6471], Loss: 2.4022, Perplexity: 11.0470caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5452/6471], Loss: 2.4033, Perplexity: 11.0598caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5453/6471], Loss: 2.1809, Perplexity: 8.8542caption shape:  torch.Size([64, 20])
Epoch [1/3], Step [5454/6471], Loss: 2.9386, Perplexity: 18.8894caption shape:  torch.Size([64, 1

Epoch [1/3], Step [5607/6471], Loss: 1.9455, Perplexity: 6.9974caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5608/6471], Loss: 2.2031, Perplexity: 9.0529caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5609/6471], Loss: 2.1418, Perplexity: 8.5144caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5610/6471], Loss: 2.0815, Perplexity: 8.0166caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5611/6471], Loss: 2.1194, Perplexity: 8.3263caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5612/6471], Loss: 2.0891, Perplexity: 8.0773caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5613/6471], Loss: 2.2760, Perplexity: 9.7374caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5614/6471], Loss: 2.1744, Perplexity: 8.7965caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5615/6471], Loss: 2.1180, Perplexity: 8.3142caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5616/6471], Loss: 2.1415, Perplexity: 8.5120caption shape:  torch.Size([64, 12])

Epoch [1/3], Step [5689/6471], Loss: 2.3109, Perplexity: 10.0835caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [5690/6471], Loss: 2.3729, Perplexity: 10.7287caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5691/6471], Loss: 2.0953, Perplexity: 8.1282caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5692/6471], Loss: 2.4127, Perplexity: 11.1645caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5693/6471], Loss: 2.0872, Perplexity: 8.0624caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5694/6471], Loss: 2.2301, Perplexity: 9.3011caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [5695/6471], Loss: 2.1665, Perplexity: 8.7280caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5696/6471], Loss: 2.1705, Perplexity: 8.7628caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5697/6471], Loss: 2.2509, Perplexity: 9.4964caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [5698/6471], Loss: 2.2676, Perplexity: 9.6565caption shape:  torch.Size([64, 11

Epoch [1/3], Step [5771/6471], Loss: 2.0445, Perplexity: 7.7256caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5772/6471], Loss: 2.0355, Perplexity: 7.6559caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [5773/6471], Loss: 2.1386, Perplexity: 8.4872caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5774/6471], Loss: 2.0061, Perplexity: 7.4345caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5775/6471], Loss: 2.1310, Perplexity: 8.4235caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5776/6471], Loss: 2.3236, Perplexity: 10.2124caption shape:  torch.Size([64, 24])
Epoch [1/3], Step [5777/6471], Loss: 3.1248, Perplexity: 22.7546caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [5778/6471], Loss: 2.5160, Perplexity: 12.3784caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5779/6471], Loss: 2.2386, Perplexity: 9.3798caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5780/6471], Loss: 2.0105, Perplexity: 7.4668caption shape:  torch.Size([64, 10

Epoch [1/3], Step [5853/6471], Loss: 2.0933, Perplexity: 8.1120caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5854/6471], Loss: 2.1251, Perplexity: 8.3741caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5855/6471], Loss: 2.1324, Perplexity: 8.4349caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5856/6471], Loss: 2.2267, Perplexity: 9.2689caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5857/6471], Loss: 1.9932, Perplexity: 7.3393caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5858/6471], Loss: 1.9980, Perplexity: 7.3744caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [5859/6471], Loss: 2.2624, Perplexity: 9.6060caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [5860/6471], Loss: 2.4217, Perplexity: 11.2654caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5861/6471], Loss: 2.1896, Perplexity: 8.9315caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5862/6471], Loss: 2.2014, Perplexity: 9.0377caption shape:  torch.Size([64, 16])

Epoch [1/3], Step [5935/6471], Loss: 2.2393, Perplexity: 9.3867caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5936/6471], Loss: 1.9836, Perplexity: 7.2691caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5937/6471], Loss: 2.1460, Perplexity: 8.5503caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [5938/6471], Loss: 2.1480, Perplexity: 8.5681caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [5939/6471], Loss: 2.1441, Perplexity: 8.5341caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5940/6471], Loss: 2.1227, Perplexity: 8.3533caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5941/6471], Loss: 2.2305, Perplexity: 9.3044caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5942/6471], Loss: 2.1391, Perplexity: 8.4919caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [5943/6471], Loss: 2.0573, Perplexity: 7.8249caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [5944/6471], Loss: 2.1267, Perplexity: 8.3870caption shape:  torch.Size([64, 12])

Epoch [1/3], Step [6017/6471], Loss: 2.0353, Perplexity: 7.6543caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6018/6471], Loss: 2.0242, Perplexity: 7.5703caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [6019/6471], Loss: 1.8929, Perplexity: 6.6386caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [6020/6471], Loss: 2.4471, Perplexity: 11.5549caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6021/6471], Loss: 2.1647, Perplexity: 8.7116caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6022/6471], Loss: 2.4293, Perplexity: 11.3506caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6023/6471], Loss: 2.1910, Perplexity: 8.9445caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [6024/6471], Loss: 2.5710, Perplexity: 13.0794caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6025/6471], Loss: 2.1142, Perplexity: 8.2830caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [6026/6471], Loss: 2.1307, Perplexity: 8.4204caption shape:  torch.Size([64, 12

Epoch [1/3], Step [6099/6471], Loss: 2.1384, Perplexity: 8.4859caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6100/6471], Loss: 2.0905, Perplexity: 8.0889
caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6101/6471], Loss: 2.0465, Perplexity: 7.7411caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [6102/6471], Loss: 2.2099, Perplexity: 9.1150caption shape:  torch.Size([64, 23])
Epoch [1/3], Step [6103/6471], Loss: 3.0054, Perplexity: 20.1944caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6104/6471], Loss: 2.1335, Perplexity: 8.4441caption shape:  torch.Size([64, 19])
Epoch [1/3], Step [6105/6471], Loss: 2.8205, Perplexity: 16.7857caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6106/6471], Loss: 2.1886, Perplexity: 8.9231caption shape:  torch.Size([64, 18])
Epoch [1/3], Step [6107/6471], Loss: 2.6089, Perplexity: 13.5841caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6108/6471], Loss: 2.1535, Perplexity: 8.6152caption shape:  torch.Size([64, 1

Epoch [1/3], Step [6181/6471], Loss: 2.2051, Perplexity: 9.0709caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6182/6471], Loss: 2.1446, Perplexity: 8.5383caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [6183/6471], Loss: 2.4522, Perplexity: 11.6144caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6184/6471], Loss: 2.1414, Perplexity: 8.5112caption shape:  torch.Size([64, 10])
Epoch [1/3], Step [6185/6471], Loss: 2.3777, Perplexity: 10.7801caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6186/6471], Loss: 2.0838, Perplexity: 8.0349caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [6187/6471], Loss: 2.0438, Perplexity: 7.7202caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6188/6471], Loss: 2.2597, Perplexity: 9.5802caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6189/6471], Loss: 2.2877, Perplexity: 9.8521caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [6190/6471], Loss: 2.2201, Perplexity: 9.2087caption shape:  torch.Size([64, 12]

Epoch [1/3], Step [6263/6471], Loss: 2.4335, Perplexity: 11.3990caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6264/6471], Loss: 1.9877, Perplexity: 7.2984caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6265/6471], Loss: 2.4635, Perplexity: 11.7455caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6266/6471], Loss: 2.2981, Perplexity: 9.9551caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6267/6471], Loss: 2.2333, Perplexity: 9.3303caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [6268/6471], Loss: 2.2721, Perplexity: 9.7001caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6269/6471], Loss: 2.3219, Perplexity: 10.1948caption shape:  torch.Size([64, 17])
Epoch [1/3], Step [6270/6471], Loss: 2.4868, Perplexity: 12.0228caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [6271/6471], Loss: 2.1713, Perplexity: 8.7698caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6272/6471], Loss: 2.1165, Perplexity: 8.3020caption shape:  torch.Size([64, 1

Epoch [1/3], Step [6345/6471], Loss: 2.1556, Perplexity: 8.6335caption shape:  torch.Size([64, 12])
Epoch [1/3], Step [6346/6471], Loss: 2.2393, Perplexity: 9.3872caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6347/6471], Loss: 2.1590, Perplexity: 8.6625caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [6348/6471], Loss: 1.9742, Perplexity: 7.2011caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [6349/6471], Loss: 2.1699, Perplexity: 8.7572caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6350/6471], Loss: 2.4322, Perplexity: 11.3840caption shape:  torch.Size([64, 16])
Epoch [1/3], Step [6351/6471], Loss: 2.4599, Perplexity: 11.7035caption shape:  torch.Size([64, 22])
Epoch [1/3], Step [6352/6471], Loss: 3.0352, Perplexity: 20.8050caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6353/6471], Loss: 2.1806, Perplexity: 8.8515caption shape:  torch.Size([64, 14])
Epoch [1/3], Step [6354/6471], Loss: 2.2730, Perplexity: 9.7090caption shape:  torch.Size([64, 13

Epoch [1/3], Step [6427/6471], Loss: 2.0728, Perplexity: 7.9471caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [6428/6471], Loss: 2.1005, Perplexity: 8.1701caption shape:  torch.Size([64, 15])
Epoch [1/3], Step [6429/6471], Loss: 2.3076, Perplexity: 10.0505caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [6430/6471], Loss: 2.1670, Perplexity: 8.7319caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [6431/6471], Loss: 1.8785, Perplexity: 6.5437caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6432/6471], Loss: 2.1372, Perplexity: 8.4756caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [6433/6471], Loss: 2.1556, Perplexity: 8.6331caption shape:  torch.Size([64, 11])
Epoch [1/3], Step [6434/6471], Loss: 2.1192, Perplexity: 8.3247caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [6435/6471], Loss: 1.9753, Perplexity: 7.2086caption shape:  torch.Size([64, 13])
Epoch [1/3], Step [6436/6471], Loss: 1.9208, Perplexity: 6.8267caption shape:  torch.Size([64, 15])

Epoch [2/3], Step [121/6471], Loss: 2.1217, Perplexity: 8.3453caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [122/6471], Loss: 2.5671, Perplexity: 13.0275caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [123/6471], Loss: 2.1279, Perplexity: 8.3973caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [124/6471], Loss: 2.3036, Perplexity: 10.0099caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [125/6471], Loss: 2.1402, Perplexity: 8.5010caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [126/6471], Loss: 2.0690, Perplexity: 7.9166caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [127/6471], Loss: 1.9578, Perplexity: 7.0835caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [128/6471], Loss: 2.5521, Perplexity: 12.8335caption shape:  torch.Size([64, 21])
Epoch [2/3], Step [129/6471], Loss: 2.8623, Perplexity: 17.5021caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [130/6471], Loss: 2.0255, Perplexity: 7.5796caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [285/6471], Loss: 2.2571, Perplexity: 9.5552caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [286/6471], Loss: 2.1369, Perplexity: 8.4731caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [287/6471], Loss: 2.3493, Perplexity: 10.4785caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [288/6471], Loss: 2.5221, Perplexity: 12.4550caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [289/6471], Loss: 2.0915, Perplexity: 8.0969caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [290/6471], Loss: 2.0570, Perplexity: 7.8228caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [291/6471], Loss: 2.1184, Perplexity: 8.3176caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [292/6471], Loss: 2.4135, Perplexity: 11.1725caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [293/6471], Loss: 2.4913, Perplexity: 12.0765caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [294/6471], Loss: 2.2224, Perplexity: 9.2293caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [449/6471], Loss: 2.0130, Perplexity: 7.4859caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [450/6471], Loss: 1.9110, Perplexity: 6.7596caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [451/6471], Loss: 1.9393, Perplexity: 6.9537caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [452/6471], Loss: 2.3010, Perplexity: 9.9840caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [453/6471], Loss: 2.0136, Perplexity: 7.4899caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [454/6471], Loss: 2.1060, Perplexity: 8.2155caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [455/6471], Loss: 2.0970, Perplexity: 8.1419caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [456/6471], Loss: 2.0842, Perplexity: 8.0382caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [457/6471], Loss: 2.2313, Perplexity: 9.3120caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [458/6471], Loss: 2.1919, Perplexity: 8.9523caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [613/6471], Loss: 2.0118, Perplexity: 7.4766caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [614/6471], Loss: 1.9765, Perplexity: 7.2173caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [615/6471], Loss: 2.1227, Perplexity: 8.3540caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [616/6471], Loss: 2.3150, Perplexity: 10.1254caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [617/6471], Loss: 2.1430, Perplexity: 8.5254caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [618/6471], Loss: 2.2336, Perplexity: 9.3336caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [619/6471], Loss: 2.3857, Perplexity: 10.8666caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [620/6471], Loss: 2.2384, Perplexity: 9.3779caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [621/6471], Loss: 2.1037, Perplexity: 8.1961caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [622/6471], Loss: 2.0618, Perplexity: 7.8599caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [777/6471], Loss: 2.1221, Perplexity: 8.3489caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [778/6471], Loss: 1.9923, Perplexity: 7.3321caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [779/6471], Loss: 1.9915, Perplexity: 7.3265caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [780/6471], Loss: 2.0926, Perplexity: 8.1059caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [781/6471], Loss: 1.9696, Perplexity: 7.1681caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [782/6471], Loss: 2.1609, Perplexity: 8.6788caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [783/6471], Loss: 2.4059, Perplexity: 11.0883caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [784/6471], Loss: 2.0342, Perplexity: 7.6461caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [785/6471], Loss: 2.0680, Perplexity: 7.9086caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [786/6471], Loss: 2.2249, Perplexity: 9.2528caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [941/6471], Loss: 2.1769, Perplexity: 8.8188caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [942/6471], Loss: 2.2199, Perplexity: 9.2063caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [943/6471], Loss: 2.1375, Perplexity: 8.4785caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [944/6471], Loss: 2.1722, Perplexity: 8.7775caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [945/6471], Loss: 2.1208, Perplexity: 8.3375caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [946/6471], Loss: 2.3083, Perplexity: 10.0568caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [947/6471], Loss: 2.2224, Perplexity: 9.2290caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [948/6471], Loss: 2.2172, Perplexity: 9.1816caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [949/6471], Loss: 2.2156, Perplexity: 9.1668caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [950/6471], Loss: 2.1430, Perplexity: 8.5247caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [1105/6471], Loss: 2.1426, Perplexity: 8.5212caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1106/6471], Loss: 2.2092, Perplexity: 9.1081caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1107/6471], Loss: 2.1520, Perplexity: 8.6023caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1108/6471], Loss: 2.0900, Perplexity: 8.0850caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1109/6471], Loss: 2.2066, Perplexity: 9.0847caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1110/6471], Loss: 2.1537, Perplexity: 8.6169caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [1111/6471], Loss: 2.0051, Perplexity: 7.4272caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1112/6471], Loss: 2.1726, Perplexity: 8.7812caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1113/6471], Loss: 2.1997, Perplexity: 9.0221caption shape:  torch.Size([64, 21])
Epoch [2/3], Step [1114/6471], Loss: 2.9547, Perplexity: 19.1965caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [1187/6471], Loss: 2.0022, Perplexity: 7.4056caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1188/6471], Loss: 2.0028, Perplexity: 7.4097caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1189/6471], Loss: 2.2219, Perplexity: 9.2252caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1190/6471], Loss: 2.0464, Perplexity: 7.7398caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1191/6471], Loss: 1.9383, Perplexity: 6.9470caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [1192/6471], Loss: 2.3095, Perplexity: 10.0692caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1193/6471], Loss: 2.1734, Perplexity: 8.7877caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1194/6471], Loss: 1.8643, Perplexity: 6.4513caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1195/6471], Loss: 2.0086, Perplexity: 7.4531caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1196/6471], Loss: 2.0397, Perplexity: 7.6887caption shape:  torch.Size([64, 15])

Epoch [2/3], Step [1269/6471], Loss: 2.0328, Perplexity: 7.6352caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [1270/6471], Loss: 2.3029, Perplexity: 10.0029caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1271/6471], Loss: 2.2108, Perplexity: 9.1232caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1272/6471], Loss: 2.0763, Perplexity: 7.9749caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1273/6471], Loss: 2.1628, Perplexity: 8.6953caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1274/6471], Loss: 2.1925, Perplexity: 8.9576caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1275/6471], Loss: 1.8547, Perplexity: 6.3901caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1276/6471], Loss: 2.1062, Perplexity: 8.2167caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1277/6471], Loss: 1.9959, Perplexity: 7.3586caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1278/6471], Loss: 1.8197, Perplexity: 6.1703caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [1351/6471], Loss: 2.5455, Perplexity: 12.7493caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1352/6471], Loss: 2.1674, Perplexity: 8.7352caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1353/6471], Loss: 2.0123, Perplexity: 7.4805caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1354/6471], Loss: 2.0849, Perplexity: 8.0441caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1355/6471], Loss: 1.9611, Perplexity: 7.1072caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1356/6471], Loss: 2.3306, Perplexity: 10.2839caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1357/6471], Loss: 2.0161, Perplexity: 7.5090caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1358/6471], Loss: 1.8957, Perplexity: 6.6570caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1359/6471], Loss: 2.0214, Perplexity: 7.5486caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1360/6471], Loss: 1.9447, Perplexity: 6.9918caption shape:  torch.Size([64, 14])

Epoch [2/3], Step [1433/6471], Loss: 1.8976, Perplexity: 6.6701caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1434/6471], Loss: 2.0417, Perplexity: 7.7037caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [1435/6471], Loss: 2.1978, Perplexity: 9.0056caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [1436/6471], Loss: 2.1971, Perplexity: 8.9990caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1437/6471], Loss: 2.1446, Perplexity: 8.5383caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1438/6471], Loss: 2.1359, Perplexity: 8.4647caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1439/6471], Loss: 1.9001, Perplexity: 6.6868caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1440/6471], Loss: 1.9703, Perplexity: 7.1732caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1441/6471], Loss: 2.0593, Perplexity: 7.8404caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1442/6471], Loss: 1.9352, Perplexity: 6.9253caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [1515/6471], Loss: 1.9772, Perplexity: 7.2222caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1516/6471], Loss: 2.1479, Perplexity: 8.5672caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1517/6471], Loss: 2.0213, Perplexity: 7.5480caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1518/6471], Loss: 2.0177, Perplexity: 7.5209caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1519/6471], Loss: 2.1830, Perplexity: 8.8727caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1520/6471], Loss: 2.2826, Perplexity: 9.8023caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [1521/6471], Loss: 2.5913, Perplexity: 13.3471caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [1522/6471], Loss: 2.4815, Perplexity: 11.9596caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1523/6471], Loss: 2.1150, Perplexity: 8.2893caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [1524/6471], Loss: 2.2934, Perplexity: 9.9083caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [1597/6471], Loss: 2.9010, Perplexity: 18.1919caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1598/6471], Loss: 2.1289, Perplexity: 8.4053caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [1599/6471], Loss: 2.5153, Perplexity: 12.3706caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1600/6471], Loss: 2.0555, Perplexity: 7.8109
caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [1601/6471], Loss: 2.6300, Perplexity: 13.8742caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1602/6471], Loss: 2.0878, Perplexity: 8.0671caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1603/6471], Loss: 2.2697, Perplexity: 9.6762caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1604/6471], Loss: 2.1224, Perplexity: 8.3515caption shape:  torch.Size([64, 20])
Epoch [2/3], Step [1605/6471], Loss: 2.8205, Perplexity: 16.7854caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [1606/6471], Loss: 2.2447, Perplexity: 9.4378caption shape:  torch.Size([64, 

Epoch [2/3], Step [1679/6471], Loss: 2.0707, Perplexity: 7.9305caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [1680/6471], Loss: 2.1715, Perplexity: 8.7716caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1681/6471], Loss: 2.1698, Perplexity: 8.7565caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [1682/6471], Loss: 2.1658, Perplexity: 8.7219caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1683/6471], Loss: 2.1067, Perplexity: 8.2212caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [1684/6471], Loss: 2.3797, Perplexity: 10.8020caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [1685/6471], Loss: 2.5245, Perplexity: 12.4851caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1686/6471], Loss: 2.0387, Perplexity: 7.6804caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1687/6471], Loss: 2.1340, Perplexity: 8.4489caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1688/6471], Loss: 1.9945, Perplexity: 7.3488caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [1761/6471], Loss: 1.9232, Perplexity: 6.8429caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1762/6471], Loss: 2.1788, Perplexity: 8.8353caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1763/6471], Loss: 2.2292, Perplexity: 9.2923caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1764/6471], Loss: 2.1644, Perplexity: 8.7094caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1765/6471], Loss: 2.0063, Perplexity: 7.4357caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1766/6471], Loss: 1.9750, Perplexity: 7.2069caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [1767/6471], Loss: 2.2545, Perplexity: 9.5309caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1768/6471], Loss: 1.9739, Perplexity: 7.1988caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1769/6471], Loss: 2.1945, Perplexity: 8.9755caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1770/6471], Loss: 2.0235, Perplexity: 7.5650caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [1843/6471], Loss: 2.2406, Perplexity: 9.3993caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1844/6471], Loss: 2.1200, Perplexity: 8.3308caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1845/6471], Loss: 1.9821, Perplexity: 7.2578caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [1846/6471], Loss: 2.1209, Perplexity: 8.3386caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [1847/6471], Loss: 2.0473, Perplexity: 7.7473caption shape:  torch.Size([64, 19])
Epoch [2/3], Step [1848/6471], Loss: 2.5018, Perplexity: 12.2047caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1849/6471], Loss: 2.0568, Perplexity: 7.8209caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [1850/6471], Loss: 2.5042, Perplexity: 12.2336caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [1851/6471], Loss: 1.9287, Perplexity: 6.8807caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [1852/6471], Loss: 2.0093, Perplexity: 7.4583caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [1925/6471], Loss: 2.3657, Perplexity: 10.6514caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [1926/6471], Loss: 2.5414, Perplexity: 12.6981caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1927/6471], Loss: 2.1008, Perplexity: 8.1729caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1928/6471], Loss: 2.0596, Perplexity: 7.8432caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [1929/6471], Loss: 2.1026, Perplexity: 8.1872caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1930/6471], Loss: 2.2896, Perplexity: 9.8711caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [1931/6471], Loss: 2.1242, Perplexity: 8.3658caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1932/6471], Loss: 2.0613, Perplexity: 7.8562caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1933/6471], Loss: 1.8514, Perplexity: 6.3685caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [1934/6471], Loss: 1.9996, Perplexity: 7.3860caption shape:  torch.Size([64, 16])

Epoch [2/3], Step [2007/6471], Loss: 2.3818, Perplexity: 10.8240caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2008/6471], Loss: 2.2110, Perplexity: 9.1249caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2009/6471], Loss: 2.0814, Perplexity: 8.0155caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2010/6471], Loss: 1.8455, Perplexity: 6.3313caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [2011/6471], Loss: 2.1562, Perplexity: 8.6387caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2012/6471], Loss: 2.1860, Perplexity: 8.8996caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [2013/6471], Loss: 2.3020, Perplexity: 9.9938caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [2014/6471], Loss: 2.3756, Perplexity: 10.7578caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2015/6471], Loss: 2.0812, Perplexity: 8.0138caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [2016/6471], Loss: 2.0657, Perplexity: 7.8910caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [2089/6471], Loss: 2.0508, Perplexity: 7.7742caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2090/6471], Loss: 2.0749, Perplexity: 7.9639caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2091/6471], Loss: 2.2968, Perplexity: 9.9423caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2092/6471], Loss: 1.9250, Perplexity: 6.8551caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2093/6471], Loss: 1.9230, Perplexity: 6.8416caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [2094/6471], Loss: 2.1581, Perplexity: 8.6544caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [2095/6471], Loss: 2.1177, Perplexity: 8.3121caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2096/6471], Loss: 2.1344, Perplexity: 8.4519caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2097/6471], Loss: 1.9610, Perplexity: 7.1062caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2098/6471], Loss: 1.8671, Perplexity: 6.4694caption shape:  torch.Size([64, 10])

Epoch [2/3], Step [2171/6471], Loss: 2.1037, Perplexity: 8.1964caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [2172/6471], Loss: 2.2620, Perplexity: 9.6027caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [2173/6471], Loss: 2.4334, Perplexity: 11.3979caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [2174/6471], Loss: 2.1836, Perplexity: 8.8785caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2175/6471], Loss: 2.1317, Perplexity: 8.4291caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2176/6471], Loss: 2.0932, Perplexity: 8.1106caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2177/6471], Loss: 2.0055, Perplexity: 7.4296caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2178/6471], Loss: 1.9238, Perplexity: 6.8472caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2179/6471], Loss: 2.1243, Perplexity: 8.3667caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2180/6471], Loss: 2.1421, Perplexity: 8.5170caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [2253/6471], Loss: 2.1983, Perplexity: 9.0097caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2254/6471], Loss: 2.0868, Perplexity: 8.0592caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2255/6471], Loss: 1.9188, Perplexity: 6.8128caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2256/6471], Loss: 1.5915, Perplexity: 4.9112caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2257/6471], Loss: 1.9027, Perplexity: 6.7038caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2258/6471], Loss: 2.1083, Perplexity: 8.2342caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2259/6471], Loss: 2.1393, Perplexity: 8.4936caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2260/6471], Loss: 1.9429, Perplexity: 6.9791caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2261/6471], Loss: 2.0953, Perplexity: 8.1282caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2262/6471], Loss: 2.0859, Perplexity: 8.0518caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [2335/6471], Loss: 2.0158, Perplexity: 7.5069caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2336/6471], Loss: 1.9495, Perplexity: 7.0250caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2337/6471], Loss: 1.9674, Perplexity: 7.1522caption shape:  torch.Size([64, 19])
Epoch [2/3], Step [2338/6471], Loss: 2.7003, Perplexity: 14.8845caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2339/6471], Loss: 1.9111, Perplexity: 6.7608caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2340/6471], Loss: 2.0243, Perplexity: 7.5708caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2341/6471], Loss: 2.0689, Perplexity: 7.9164caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2342/6471], Loss: 2.0633, Perplexity: 7.8719caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2343/6471], Loss: 2.1001, Perplexity: 8.1669caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2344/6471], Loss: 1.9636, Perplexity: 7.1247caption shape:  torch.Size([64, 15])

Epoch [2/3], Step [2417/6471], Loss: 2.1462, Perplexity: 8.5520caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2418/6471], Loss: 2.2802, Perplexity: 9.7791caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2419/6471], Loss: 1.9637, Perplexity: 7.1255caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [2420/6471], Loss: 2.4239, Perplexity: 11.2894caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2421/6471], Loss: 2.1779, Perplexity: 8.8276caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2422/6471], Loss: 2.0294, Perplexity: 7.6098caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [2423/6471], Loss: 2.5851, Perplexity: 13.2648caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2424/6471], Loss: 2.0522, Perplexity: 7.7847caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2425/6471], Loss: 2.1270, Perplexity: 8.3895caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2426/6471], Loss: 2.0676, Perplexity: 7.9062caption shape:  torch.Size([64, 15])

Epoch [2/3], Step [2499/6471], Loss: 2.0807, Perplexity: 8.0099caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2500/6471], Loss: 1.9725, Perplexity: 7.1884
caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2501/6471], Loss: 1.9530, Perplexity: 7.0499caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2502/6471], Loss: 2.0142, Perplexity: 7.4949caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2503/6471], Loss: 1.9078, Perplexity: 6.7383caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2504/6471], Loss: 1.9180, Perplexity: 6.8071caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2505/6471], Loss: 2.1458, Perplexity: 8.5492caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2506/6471], Loss: 1.9533, Perplexity: 7.0523caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [2507/6471], Loss: 2.2675, Perplexity: 9.6548caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2508/6471], Loss: 2.1688, Perplexity: 8.7475caption shape:  torch.Size([64, 15])

Epoch [2/3], Step [2581/6471], Loss: 2.1777, Perplexity: 8.8262caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2582/6471], Loss: 2.1697, Perplexity: 8.7554caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2583/6471], Loss: 2.0450, Perplexity: 7.7293caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2584/6471], Loss: 1.9602, Perplexity: 7.1004caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [2585/6471], Loss: 2.2691, Perplexity: 9.6711caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2586/6471], Loss: 2.2432, Perplexity: 9.4236caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2587/6471], Loss: 2.1751, Perplexity: 8.8030caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2588/6471], Loss: 1.9807, Perplexity: 7.2480caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [2589/6471], Loss: 2.3247, Perplexity: 10.2235caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2590/6471], Loss: 2.3022, Perplexity: 9.9960caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [2663/6471], Loss: 2.1266, Perplexity: 8.3866caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2664/6471], Loss: 2.1673, Perplexity: 8.7346caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2665/6471], Loss: 2.1045, Perplexity: 8.2028caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2666/6471], Loss: 1.9503, Perplexity: 7.0310caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2667/6471], Loss: 1.9395, Perplexity: 6.9554caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [2668/6471], Loss: 2.3411, Perplexity: 10.3932caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2669/6471], Loss: 2.2576, Perplexity: 9.5603caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2670/6471], Loss: 2.0420, Perplexity: 7.7062caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2671/6471], Loss: 1.9527, Perplexity: 7.0479caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2672/6471], Loss: 2.1652, Perplexity: 8.7168caption shape:  torch.Size([64, 14])

Epoch [2/3], Step [2745/6471], Loss: 2.2186, Perplexity: 9.1945caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2746/6471], Loss: 2.0384, Perplexity: 7.6784caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2747/6471], Loss: 2.1976, Perplexity: 9.0035caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [2748/6471], Loss: 2.3466, Perplexity: 10.4497caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2749/6471], Loss: 1.9256, Perplexity: 6.8589caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2750/6471], Loss: 2.1003, Perplexity: 8.1685caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2751/6471], Loss: 2.0825, Perplexity: 8.0247caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2752/6471], Loss: 1.8471, Perplexity: 6.3413caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2753/6471], Loss: 1.8507, Perplexity: 6.3640caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2754/6471], Loss: 1.9791, Perplexity: 7.2359caption shape:  torch.Size([64, 15])

Epoch [2/3], Step [2827/6471], Loss: 1.9116, Perplexity: 6.7637caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2828/6471], Loss: 2.0388, Perplexity: 7.6817caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2829/6471], Loss: 1.9308, Perplexity: 6.8948caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2830/6471], Loss: 2.0497, Perplexity: 7.7658caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2831/6471], Loss: 2.0851, Perplexity: 8.0454caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2832/6471], Loss: 2.0129, Perplexity: 7.4848caption shape:  torch.Size([64, 23])
Epoch [2/3], Step [2833/6471], Loss: 2.9710, Perplexity: 19.5111caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2834/6471], Loss: 2.0021, Perplexity: 7.4043caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2835/6471], Loss: 1.9937, Perplexity: 7.3426caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2836/6471], Loss: 1.8991, Perplexity: 6.6801caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [2909/6471], Loss: 1.9945, Perplexity: 7.3487caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2910/6471], Loss: 1.9908, Perplexity: 7.3213caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [2911/6471], Loss: 2.1944, Perplexity: 8.9745caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [2912/6471], Loss: 2.1802, Perplexity: 8.8481caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2913/6471], Loss: 2.1457, Perplexity: 8.5480caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2914/6471], Loss: 2.0266, Perplexity: 7.5883caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2915/6471], Loss: 1.9685, Perplexity: 7.1599caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [2916/6471], Loss: 2.1365, Perplexity: 8.4695caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2917/6471], Loss: 2.1105, Perplexity: 8.2520caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2918/6471], Loss: 2.0624, Perplexity: 7.8651caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [2991/6471], Loss: 2.1370, Perplexity: 8.4737caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [2992/6471], Loss: 2.2547, Perplexity: 9.5326caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [2993/6471], Loss: 2.0584, Perplexity: 7.8332caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [2994/6471], Loss: 2.2284, Perplexity: 9.2852caption shape:  torch.Size([64, 20])
Epoch [2/3], Step [2995/6471], Loss: 2.7664, Perplexity: 15.9017caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [2996/6471], Loss: 2.2214, Perplexity: 9.2200caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2997/6471], Loss: 2.0202, Perplexity: 7.5398caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [2998/6471], Loss: 2.0457, Perplexity: 7.7348caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [2999/6471], Loss: 2.1027, Perplexity: 8.1879caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [3000/6471], Loss: 2.3137, Perplexity: 10.1118
caption shape:  torch.Size([64, 13

Epoch [2/3], Step [3073/6471], Loss: 1.9692, Perplexity: 7.1649caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3074/6471], Loss: 2.0942, Perplexity: 8.1193caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3166/6471], Loss: 2.2630, Perplexity: 9.6117caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [3167/6471], Loss: 2.3619, Perplexity: 10.6110caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3168/6471], Loss: 1.8718, Perplexity: 6.5002caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3169/6471], Loss: 1.9702, Perplexity: 7.1720caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3170/6471], Loss: 2.0257, Perplexity: 7.5814caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3171/6471], Loss: 1.9851, Perplexity: 7.2796caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3172/6471], Loss: 2.0386, Perplexity: 7.6802caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3173/6471], Loss: 2.1345, Perplexity: 8.4528caption shape:  torch.Size([64, 23])

Epoch [2/3], Step [3246/6471], Loss: 2.7631, Perplexity: 15.8484caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [3247/6471], Loss: 2.0821, Perplexity: 8.0210caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3248/6471], Loss: 2.1872, Perplexity: 8.9099caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3249/6471], Loss: 1.8958, Perplexity: 6.6576caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3250/6471], Loss: 2.0577, Perplexity: 7.8276caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3251/6471], Loss: 1.8878, Perplexity: 6.6045caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3252/6471], Loss: 1.9528, Perplexity: 7.0481caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3253/6471], Loss: 2.0071, Perplexity: 7.4420caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3254/6471], Loss: 1.9892, Perplexity: 7.3100caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [3255/6471], Loss: 2.1286, Perplexity: 8.4029caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [3328/6471], Loss: 1.9737, Perplexity: 7.1975caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [3329/6471], Loss: 2.1753, Perplexity: 8.8052caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [3330/6471], Loss: 2.1442, Perplexity: 8.5350caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3331/6471], Loss: 2.2079, Perplexity: 9.0962caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3332/6471], Loss: 1.9026, Perplexity: 6.7030caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [3333/6471], Loss: 2.3910, Perplexity: 10.9239caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [3334/6471], Loss: 1.9387, Perplexity: 6.9498caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3335/6471], Loss: 1.9922, Perplexity: 7.3314caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3336/6471], Loss: 2.0497, Perplexity: 7.7659caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [3337/6471], Loss: 2.3467, Perplexity: 10.4512caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [3410/6471], Loss: 1.9769, Perplexity: 7.2204caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3411/6471], Loss: 2.2455, Perplexity: 9.4453caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [3412/6471], Loss: 2.4300, Perplexity: 11.3586caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3413/6471], Loss: 1.9964, Perplexity: 7.3622caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [3414/6471], Loss: 2.3274, Perplexity: 10.2514caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [3415/6471], Loss: 2.2537, Perplexity: 9.5232caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3416/6471], Loss: 2.0145, Perplexity: 7.4969caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3417/6471], Loss: 2.0887, Perplexity: 8.0747caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3418/6471], Loss: 2.1089, Perplexity: 8.2393caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3419/6471], Loss: 2.0588, Perplexity: 7.8368caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [3492/6471], Loss: 2.2138, Perplexity: 9.1507caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3493/6471], Loss: 1.8717, Perplexity: 6.4991caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3494/6471], Loss: 1.9069, Perplexity: 6.7321caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3495/6471], Loss: 2.0794, Perplexity: 7.9998caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3496/6471], Loss: 2.1399, Perplexity: 8.4986caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3497/6471], Loss: 2.2245, Perplexity: 9.2492caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3498/6471], Loss: 1.9413, Perplexity: 6.9680caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3499/6471], Loss: 2.2052, Perplexity: 9.0717caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3500/6471], Loss: 1.9816, Perplexity: 7.2545
caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3501/6471], Loss: 2.0671, Perplexity: 7.9015caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [3574/6471], Loss: 2.1113, Perplexity: 8.2590caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3575/6471], Loss: 2.0473, Perplexity: 7.7468caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [3576/6471], Loss: 2.4228, Perplexity: 11.2772caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [3577/6471], Loss: 2.3116, Perplexity: 10.0902caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [3578/6471], Loss: 2.1280, Perplexity: 8.3980caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [3579/6471], Loss: 2.3614, Perplexity: 10.6058caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3580/6471], Loss: 2.0625, Perplexity: 7.8656caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3581/6471], Loss: 2.1240, Perplexity: 8.3642caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3582/6471], Loss: 2.0670, Perplexity: 7.9013caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3583/6471], Loss: 2.1898, Perplexity: 8.9337caption shape:  torch.Size([64, 17])

Epoch [2/3], Step [3656/6471], Loss: 1.9904, Perplexity: 7.3183caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3657/6471], Loss: 2.2830, Perplexity: 9.8058caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3658/6471], Loss: 2.2430, Perplexity: 9.4212caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3659/6471], Loss: 1.9453, Perplexity: 6.9960caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3660/6471], Loss: 2.1731, Perplexity: 8.7859caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [3661/6471], Loss: 2.3361, Perplexity: 10.3404caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3662/6471], Loss: 2.0891, Perplexity: 8.0774caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3663/6471], Loss: 2.0783, Perplexity: 7.9910caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3664/6471], Loss: 2.1016, Perplexity: 8.1796caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3665/6471], Loss: 2.0758, Perplexity: 7.9708caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [3738/6471], Loss: 2.0769, Perplexity: 7.9795caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3739/6471], Loss: 2.1924, Perplexity: 8.9563caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3740/6471], Loss: 2.1248, Perplexity: 8.3711caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3741/6471], Loss: 1.9043, Perplexity: 6.7148caption shape:  torch.Size([64, 23])
Epoch [2/3], Step [3742/6471], Loss: 2.8392, Perplexity: 17.1018caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3743/6471], Loss: 2.1477, Perplexity: 8.5652caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3744/6471], Loss: 1.9543, Perplexity: 7.0593caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3745/6471], Loss: 1.8727, Perplexity: 6.5058caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3746/6471], Loss: 1.9083, Perplexity: 6.7419caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3747/6471], Loss: 1.9480, Perplexity: 7.0147caption shape:  torch.Size([64, 14])

Epoch [2/3], Step [3820/6471], Loss: 2.0265, Perplexity: 7.5873caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3821/6471], Loss: 1.8304, Perplexity: 6.2367caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3822/6471], Loss: 1.9750, Perplexity: 7.2068caption shape:  torch.Size([64, 25])
Epoch [2/3], Step [3823/6471], Loss: 3.0890, Perplexity: 21.9551caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3824/6471], Loss: 2.0045, Perplexity: 7.4221caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [3825/6471], Loss: 2.1896, Perplexity: 8.9316caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3826/6471], Loss: 1.9112, Perplexity: 6.7615caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3827/6471], Loss: 1.9634, Perplexity: 7.1235caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3828/6471], Loss: 1.9286, Perplexity: 6.8799caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3829/6471], Loss: 2.0363, Perplexity: 7.6622caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [3902/6471], Loss: 1.9706, Perplexity: 7.1748caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [3903/6471], Loss: 2.4199, Perplexity: 11.2444caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3904/6471], Loss: 2.0941, Perplexity: 8.1178caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3905/6471], Loss: 2.1951, Perplexity: 8.9805caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3906/6471], Loss: 2.0865, Perplexity: 8.0567caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [3907/6471], Loss: 2.1716, Perplexity: 8.7721caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3908/6471], Loss: 2.1198, Perplexity: 8.3296caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3909/6471], Loss: 2.2058, Perplexity: 9.0774caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [3910/6471], Loss: 2.0521, Perplexity: 7.7839caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3911/6471], Loss: 2.0668, Perplexity: 7.8993caption shape:  torch.Size([64, 16])

Epoch [2/3], Step [3984/6471], Loss: 1.9260, Perplexity: 6.8622caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3985/6471], Loss: 1.9025, Perplexity: 6.7029caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [3986/6471], Loss: 2.2140, Perplexity: 9.1524caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [3987/6471], Loss: 2.0842, Perplexity: 8.0383caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [3988/6471], Loss: 2.5756, Perplexity: 13.1390caption shape:  torch.Size([64, 21])
Epoch [2/3], Step [3989/6471], Loss: 2.8458, Perplexity: 17.2145caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3990/6471], Loss: 2.0058, Perplexity: 7.4321caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [3991/6471], Loss: 2.0341, Perplexity: 7.6453caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [3992/6471], Loss: 2.1970, Perplexity: 8.9975caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [3993/6471], Loss: 2.0971, Perplexity: 8.1426caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [4066/6471], Loss: 2.2568, Perplexity: 9.5524caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4067/6471], Loss: 1.9595, Perplexity: 7.0954caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [4068/6471], Loss: 2.4265, Perplexity: 11.3189caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4069/6471], Loss: 2.1939, Perplexity: 8.9700caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4070/6471], Loss: 1.9896, Perplexity: 7.3128caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4071/6471], Loss: 2.0033, Perplexity: 7.4134caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4072/6471], Loss: 2.0349, Perplexity: 7.6515caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4073/6471], Loss: 2.0577, Perplexity: 7.8282caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4074/6471], Loss: 2.0907, Perplexity: 8.0907caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4075/6471], Loss: 2.0100, Perplexity: 7.4634caption shape:  torch.Size([64, 14])

Epoch [2/3], Step [4148/6471], Loss: 1.9489, Perplexity: 7.0210caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4149/6471], Loss: 1.9097, Perplexity: 6.7512caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4150/6471], Loss: 1.9810, Perplexity: 7.2500caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4151/6471], Loss: 1.8393, Perplexity: 6.2919caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4152/6471], Loss: 2.1203, Perplexity: 8.3340caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4153/6471], Loss: 2.0358, Perplexity: 7.6587caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4154/6471], Loss: 1.9952, Perplexity: 7.3537caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4155/6471], Loss: 2.0170, Perplexity: 7.5154caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4156/6471], Loss: 2.1719, Perplexity: 8.7751caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4157/6471], Loss: 1.9861, Perplexity: 7.2872caption shape:  torch.Size([64, 15])


Epoch [2/3], Step [4230/6471], Loss: 2.0375, Perplexity: 7.6716caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4231/6471], Loss: 1.9921, Perplexity: 7.3313caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [4232/6471], Loss: 2.0426, Perplexity: 7.7104caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4233/6471], Loss: 2.3287, Perplexity: 10.2643caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4234/6471], Loss: 2.0121, Perplexity: 7.4791caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4235/6471], Loss: 1.8010, Perplexity: 6.0556caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [4236/6471], Loss: 2.2178, Perplexity: 9.1873caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4237/6471], Loss: 2.1829, Perplexity: 8.8717caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4238/6471], Loss: 2.0618, Perplexity: 7.8599caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [4239/6471], Loss: 2.4028, Perplexity: 11.0546caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [4312/6471], Loss: 1.9855, Perplexity: 7.2829caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4313/6471], Loss: 1.9919, Perplexity: 7.3297caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [4314/6471], Loss: 2.4085, Perplexity: 11.1174caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4315/6471], Loss: 2.1582, Perplexity: 8.6555caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4316/6471], Loss: 1.8887, Perplexity: 6.6105caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4317/6471], Loss: 2.0493, Perplexity: 7.7623caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4318/6471], Loss: 1.9818, Perplexity: 7.2560caption shape:  torch.Size([64, 21])
Epoch [2/3], Step [4319/6471], Loss: 2.7830, Perplexity: 16.1682caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4320/6471], Loss: 1.9067, Perplexity: 6.7307caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [4321/6471], Loss: 2.1172, Perplexity: 8.3076caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [4394/6471], Loss: 2.5352, Perplexity: 12.6185caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4395/6471], Loss: 1.9819, Perplexity: 7.2562caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [4396/6471], Loss: 2.1810, Perplexity: 8.8555caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4397/6471], Loss: 1.9223, Perplexity: 6.8369caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4398/6471], Loss: 2.0912, Perplexity: 8.0944caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4399/6471], Loss: 1.9874, Perplexity: 7.2965caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4400/6471], Loss: 1.9513, Perplexity: 7.0376
caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4401/6471], Loss: 2.0326, Perplexity: 7.6340caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4402/6471], Loss: 1.9007, Perplexity: 6.6903caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4403/6471], Loss: 1.9169, Perplexity: 6.7996caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [4476/6471], Loss: 2.0231, Perplexity: 7.5620caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4477/6471], Loss: 1.9227, Perplexity: 6.8396caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4478/6471], Loss: 2.0273, Perplexity: 7.5939caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4479/6471], Loss: 1.8969, Perplexity: 6.6654caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4480/6471], Loss: 1.9238, Perplexity: 6.8472caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4481/6471], Loss: 2.1294, Perplexity: 8.4095caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4482/6471], Loss: 1.9849, Perplexity: 7.2782caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4483/6471], Loss: 1.9501, Perplexity: 7.0296caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4484/6471], Loss: 1.9021, Perplexity: 6.7001caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [4485/6471], Loss: 2.1673, Perplexity: 8.7349caption shape:  torch.Size([64, 13])


Epoch [2/3], Step [4558/6471], Loss: 2.0837, Perplexity: 8.0344caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [4559/6471], Loss: 2.2300, Perplexity: 9.3002caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4560/6471], Loss: 2.0342, Perplexity: 7.6463caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4561/6471], Loss: 2.0204, Perplexity: 7.5413caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4562/6471], Loss: 2.1247, Perplexity: 8.3703caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4563/6471], Loss: 2.0276, Perplexity: 7.5962caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4564/6471], Loss: 2.1030, Perplexity: 8.1905caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4565/6471], Loss: 1.9757, Perplexity: 7.2114caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4566/6471], Loss: 1.9874, Perplexity: 7.2968caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4567/6471], Loss: 2.0950, Perplexity: 8.1252caption shape:  torch.Size([64, 12])


Epoch [2/3], Step [4640/6471], Loss: 2.6151, Perplexity: 13.6690caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4641/6471], Loss: 2.0485, Perplexity: 7.7564caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4642/6471], Loss: 2.2472, Perplexity: 9.4614caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [4643/6471], Loss: 2.2889, Perplexity: 9.8643caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4644/6471], Loss: 2.0179, Perplexity: 7.5225caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4645/6471], Loss: 2.2974, Perplexity: 9.9478caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4646/6471], Loss: 1.9412, Perplexity: 6.9669caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4647/6471], Loss: 2.0424, Perplexity: 7.7090caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4648/6471], Loss: 1.8047, Perplexity: 6.0784caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [4649/6471], Loss: 2.2044, Perplexity: 9.0650caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [4722/6471], Loss: 2.3981, Perplexity: 11.0026caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4723/6471], Loss: 2.0161, Perplexity: 7.5089caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4724/6471], Loss: 2.1164, Perplexity: 8.3014caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4725/6471], Loss: 1.9116, Perplexity: 6.7640caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [4726/6471], Loss: 2.2995, Perplexity: 9.9694caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4727/6471], Loss: 2.1165, Perplexity: 8.3019caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4728/6471], Loss: 1.9771, Perplexity: 7.2218caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [4729/6471], Loss: 2.1173, Perplexity: 8.3086caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [4730/6471], Loss: 2.1382, Perplexity: 8.4839caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [4731/6471], Loss: 2.2202, Perplexity: 9.2095caption shape:  torch.Size([64, 15])

Epoch [2/3], Step [4804/6471], Loss: 2.1698, Perplexity: 8.7567caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4805/6471], Loss: 1.9378, Perplexity: 6.9436caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4806/6471], Loss: 1.9276, Perplexity: 6.8728caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4807/6471], Loss: 2.0851, Perplexity: 8.0452caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4808/6471], Loss: 2.1020, Perplexity: 8.1827caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4809/6471], Loss: 2.1921, Perplexity: 8.9542caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [4810/6471], Loss: 2.2650, Perplexity: 9.6315caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [4811/6471], Loss: 2.3245, Perplexity: 10.2213caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4812/6471], Loss: 1.9843, Perplexity: 7.2738caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [4813/6471], Loss: 2.0633, Perplexity: 7.8718caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [4886/6471], Loss: 2.0404, Perplexity: 7.6935caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4887/6471], Loss: 1.9057, Perplexity: 6.7243caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4888/6471], Loss: 2.0234, Perplexity: 7.5640caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4889/6471], Loss: 1.9944, Perplexity: 7.3477caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4890/6471], Loss: 2.0928, Perplexity: 8.1074caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [4891/6471], Loss: 2.4903, Perplexity: 12.0649caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4892/6471], Loss: 2.1280, Perplexity: 8.3983caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4893/6471], Loss: 1.9220, Perplexity: 6.8348caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [4894/6471], Loss: 2.0056, Perplexity: 7.4307caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4895/6471], Loss: 2.0496, Perplexity: 7.7646caption shape:  torch.Size([64, 14])

Epoch [2/3], Step [4968/6471], Loss: 2.5767, Perplexity: 13.1534caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4969/6471], Loss: 1.9655, Perplexity: 7.1386caption shape:  torch.Size([64, 20])
Epoch [2/3], Step [4970/6471], Loss: 2.7299, Perplexity: 15.3318caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [4971/6471], Loss: 1.9950, Perplexity: 7.3522caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4972/6471], Loss: 1.8012, Perplexity: 6.0571caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [4973/6471], Loss: 2.0239, Perplexity: 7.5675caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [4974/6471], Loss: 2.1893, Perplexity: 8.9291caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4975/6471], Loss: 1.9414, Perplexity: 6.9688caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4976/6471], Loss: 2.0484, Perplexity: 7.7553caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [4977/6471], Loss: 1.9282, Perplexity: 6.8768caption shape:  torch.Size([64, 13])

Epoch [2/3], Step [5050/6471], Loss: 2.1477, Perplexity: 8.5655caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5051/6471], Loss: 1.9341, Perplexity: 6.9176caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5052/6471], Loss: 1.8677, Perplexity: 6.4736caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5053/6471], Loss: 2.0701, Perplexity: 7.9257caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [5054/6471], Loss: 2.0130, Perplexity: 7.4857caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5055/6471], Loss: 2.0390, Perplexity: 7.6828caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5056/6471], Loss: 2.1179, Perplexity: 8.3140caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5057/6471], Loss: 2.1571, Perplexity: 8.6461caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5058/6471], Loss: 2.0279, Perplexity: 7.5985caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5059/6471], Loss: 2.0349, Perplexity: 7.6516caption shape:  torch.Size([64, 10])


Epoch [2/3], Step [5132/6471], Loss: 2.3211, Perplexity: 10.1873caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5133/6471], Loss: 1.9969, Perplexity: 7.3664caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5134/6471], Loss: 1.9954, Perplexity: 7.3548caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5135/6471], Loss: 2.0925, Perplexity: 8.1053caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5136/6471], Loss: 2.0650, Perplexity: 7.8851caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5137/6471], Loss: 1.8753, Perplexity: 6.5228caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [5138/6471], Loss: 2.1636, Perplexity: 8.7026caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [5139/6471], Loss: 2.5181, Perplexity: 12.4053caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5140/6471], Loss: 1.9563, Perplexity: 7.0731caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5141/6471], Loss: 1.9111, Perplexity: 6.7607caption shape:  torch.Size([64, 17])

Epoch [2/3], Step [5214/6471], Loss: 2.3265, Perplexity: 10.2419caption shape:  torch.Size([64, 26])
Epoch [2/3], Step [5215/6471], Loss: 2.9167, Perplexity: 18.4797caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5216/6471], Loss: 2.0209, Perplexity: 7.5453caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5217/6471], Loss: 1.8255, Perplexity: 6.2059caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5218/6471], Loss: 2.0191, Perplexity: 7.5314caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5219/6471], Loss: 1.9170, Perplexity: 6.8005caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5220/6471], Loss: 2.1193, Perplexity: 8.3257caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5221/6471], Loss: 2.1105, Perplexity: 8.2523caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5222/6471], Loss: 2.0250, Perplexity: 7.5760caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5223/6471], Loss: 1.8559, Perplexity: 6.3976caption shape:  torch.Size([64, 14])

Epoch [2/3], Step [5296/6471], Loss: 1.9072, Perplexity: 6.7345caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [5297/6471], Loss: 2.0708, Perplexity: 7.9313caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5298/6471], Loss: 1.9719, Perplexity: 7.1842caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5299/6471], Loss: 1.9150, Perplexity: 6.7868caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5300/6471], Loss: 2.0888, Perplexity: 8.0749
caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [5301/6471], Loss: 2.2870, Perplexity: 9.8458caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5302/6471], Loss: 2.0201, Perplexity: 7.5387caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5303/6471], Loss: 1.9327, Perplexity: 6.9078caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5304/6471], Loss: 1.9844, Perplexity: 7.2746caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5305/6471], Loss: 1.8724, Perplexity: 6.5036caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [5378/6471], Loss: 2.1726, Perplexity: 8.7808caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5379/6471], Loss: 2.0111, Perplexity: 7.4719caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5380/6471], Loss: 2.1213, Perplexity: 8.3417caption shape:  torch.Size([64, 20])
Epoch [2/3], Step [5381/6471], Loss: 2.7407, Perplexity: 15.4972caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5382/6471], Loss: 1.9778, Perplexity: 7.2265caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5383/6471], Loss: 2.0049, Perplexity: 7.4250caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5384/6471], Loss: 1.9859, Perplexity: 7.2854caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5385/6471], Loss: 2.1850, Perplexity: 8.8903caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [5386/6471], Loss: 2.3401, Perplexity: 10.3819caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5387/6471], Loss: 1.9889, Perplexity: 7.3078caption shape:  torch.Size([64, 10])

Epoch [2/3], Step [5460/6471], Loss: 2.2851, Perplexity: 9.8263caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5461/6471], Loss: 2.0155, Perplexity: 7.5047caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5462/6471], Loss: 1.9989, Perplexity: 7.3810caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [5463/6471], Loss: 2.0921, Perplexity: 8.1015caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5464/6471], Loss: 1.9357, Perplexity: 6.9289caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5465/6471], Loss: 1.9093, Perplexity: 6.7484caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5466/6471], Loss: 1.9949, Perplexity: 7.3516caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [5467/6471], Loss: 2.2112, Perplexity: 9.1271caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5468/6471], Loss: 1.8706, Perplexity: 6.4924caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5469/6471], Loss: 1.9787, Perplexity: 7.2336caption shape:  torch.Size([64, 12])


Epoch [2/3], Step [5542/6471], Loss: 2.0428, Perplexity: 7.7122caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5543/6471], Loss: 2.0294, Perplexity: 7.6095caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5544/6471], Loss: 2.0891, Perplexity: 8.0779caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5545/6471], Loss: 2.0444, Perplexity: 7.7248caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5546/6471], Loss: 2.0613, Perplexity: 7.8559caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [5547/6471], Loss: 2.4901, Perplexity: 12.0628caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5548/6471], Loss: 1.8242, Perplexity: 6.1981caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5549/6471], Loss: 1.9610, Perplexity: 7.1067caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5550/6471], Loss: 2.1197, Perplexity: 8.3285caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [5551/6471], Loss: 2.2368, Perplexity: 9.3636caption shape:  torch.Size([64, 15])

Epoch [2/3], Step [5624/6471], Loss: 2.0932, Perplexity: 8.1109caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5625/6471], Loss: 2.0420, Perplexity: 7.7062caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5626/6471], Loss: 1.9413, Perplexity: 6.9681caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5627/6471], Loss: 2.1024, Perplexity: 8.1857caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5628/6471], Loss: 2.2001, Perplexity: 9.0256caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5629/6471], Loss: 1.9488, Perplexity: 7.0204caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [5630/6471], Loss: 2.4534, Perplexity: 11.6273caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5631/6471], Loss: 1.9065, Perplexity: 6.7295caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5632/6471], Loss: 2.0158, Perplexity: 7.5067caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5633/6471], Loss: 1.9008, Perplexity: 6.6913caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [5706/6471], Loss: 2.1241, Perplexity: 8.3651caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5707/6471], Loss: 1.9607, Perplexity: 7.1041caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5708/6471], Loss: 2.1431, Perplexity: 8.5254caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5709/6471], Loss: 1.9059, Perplexity: 6.7255caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5710/6471], Loss: 1.8890, Perplexity: 6.6130caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5711/6471], Loss: 2.0912, Perplexity: 8.0949caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5712/6471], Loss: 2.2576, Perplexity: 9.5600caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5713/6471], Loss: 1.9252, Perplexity: 6.8563caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5714/6471], Loss: 2.2073, Perplexity: 9.0912caption shape:  torch.Size([64, 20])
Epoch [2/3], Step [5715/6471], Loss: 2.7139, Perplexity: 15.0878caption shape:  torch.Size([64, 16])

Epoch [2/3], Step [5788/6471], Loss: 2.1946, Perplexity: 8.9766caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [5789/6471], Loss: 2.1636, Perplexity: 8.7023caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5790/6471], Loss: 2.1903, Perplexity: 8.9378caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5791/6471], Loss: 1.9193, Perplexity: 6.8163caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5792/6471], Loss: 2.0542, Perplexity: 7.8009caption shape:  torch.Size([64, 36])
Epoch [2/3], Step [5793/6471], Loss: 4.1496, Perplexity: 63.4100caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5794/6471], Loss: 2.0208, Perplexity: 7.5442caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5795/6471], Loss: 1.9931, Perplexity: 7.3383caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5796/6471], Loss: 2.0211, Perplexity: 7.5465caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [5797/6471], Loss: 2.6047, Perplexity: 13.5266caption shape:  torch.Size([64, 12])

Epoch [2/3], Step [5870/6471], Loss: 2.1683, Perplexity: 8.7430caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5871/6471], Loss: 1.8322, Perplexity: 6.2475caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5872/6471], Loss: 2.0526, Perplexity: 7.7884caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5873/6471], Loss: 2.0248, Perplexity: 7.5748caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5874/6471], Loss: 2.0608, Perplexity: 7.8526caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5875/6471], Loss: 1.9548, Perplexity: 7.0626caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5876/6471], Loss: 1.9189, Perplexity: 6.8135caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5877/6471], Loss: 2.0068, Perplexity: 7.4395caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5878/6471], Loss: 1.9663, Perplexity: 7.1444caption shape:  torch.Size([64, 17])
Epoch [2/3], Step [5879/6471], Loss: 2.4059, Perplexity: 11.0882caption shape:  torch.Size([64, 15])

Epoch [2/3], Step [5952/6471], Loss: 2.0363, Perplexity: 7.6624caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [5953/6471], Loss: 2.0785, Perplexity: 7.9921caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5954/6471], Loss: 1.9710, Perplexity: 7.1778caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5955/6471], Loss: 2.1267, Perplexity: 8.3874caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [5956/6471], Loss: 2.0896, Perplexity: 8.0819caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [5957/6471], Loss: 1.9012, Perplexity: 6.6939caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5958/6471], Loss: 1.9905, Perplexity: 7.3189caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [5959/6471], Loss: 1.9860, Perplexity: 7.2864caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5960/6471], Loss: 2.0199, Perplexity: 7.5378caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [5961/6471], Loss: 2.1762, Perplexity: 8.8124caption shape:  torch.Size([64, 12])


Epoch [2/3], Step [6034/6471], Loss: 1.8897, Perplexity: 6.6173caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6035/6471], Loss: 2.0618, Perplexity: 7.8604caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6036/6471], Loss: 2.1100, Perplexity: 8.2479caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6037/6471], Loss: 1.9434, Perplexity: 6.9826caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [6038/6471], Loss: 2.1965, Perplexity: 8.9935caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [6039/6471], Loss: 2.2432, Perplexity: 9.4234caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6040/6471], Loss: 2.0259, Perplexity: 7.5833caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6041/6471], Loss: 1.9556, Perplexity: 7.0684caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [6042/6471], Loss: 2.2394, Perplexity: 9.3879caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6043/6471], Loss: 2.0112, Perplexity: 7.4719caption shape:  torch.Size([64, 15])


Epoch [2/3], Step [6116/6471], Loss: 1.9101, Perplexity: 6.7536caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6117/6471], Loss: 1.8024, Perplexity: 6.0639caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6118/6471], Loss: 2.0541, Perplexity: 7.7996caption shape:  torch.Size([64, 20])
Epoch [2/3], Step [6119/6471], Loss: 2.5413, Perplexity: 12.6961caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6120/6471], Loss: 1.8705, Perplexity: 6.4916caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6121/6471], Loss: 1.8689, Perplexity: 6.4810caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6122/6471], Loss: 1.9412, Perplexity: 6.9670caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [6123/6471], Loss: 2.2750, Perplexity: 9.7280caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6124/6471], Loss: 1.8991, Perplexity: 6.6799caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6125/6471], Loss: 1.7794, Perplexity: 5.9262caption shape:  torch.Size([64, 14])

Epoch [2/3], Step [6198/6471], Loss: 1.9547, Perplexity: 7.0620caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [6199/6471], Loss: 2.4349, Perplexity: 11.4148caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [6200/6471], Loss: 2.2947, Perplexity: 9.9218
caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6201/6471], Loss: 1.9273, Perplexity: 6.8708caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [6202/6471], Loss: 1.7877, Perplexity: 5.9757caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6203/6471], Loss: 2.0156, Perplexity: 7.5054caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6204/6471], Loss: 1.9068, Perplexity: 6.7314caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6205/6471], Loss: 1.8154, Perplexity: 6.1437caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6206/6471], Loss: 1.9977, Perplexity: 7.3724caption shape:  torch.Size([64, 16])
Epoch [2/3], Step [6207/6471], Loss: 2.2229, Perplexity: 9.2342caption shape:  torch.Size([64, 11])

Epoch [2/3], Step [6280/6471], Loss: 2.0739, Perplexity: 7.9558caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6281/6471], Loss: 1.7697, Perplexity: 5.8694caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6282/6471], Loss: 1.9316, Perplexity: 6.9008caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6283/6471], Loss: 1.8037, Perplexity: 6.0719caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [6284/6471], Loss: 1.9410, Perplexity: 6.9659caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6285/6471], Loss: 1.8956, Perplexity: 6.6562caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [6286/6471], Loss: 2.1154, Perplexity: 8.2932caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [6287/6471], Loss: 2.1288, Perplexity: 8.4052caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6288/6471], Loss: 1.9509, Perplexity: 7.0348caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [6289/6471], Loss: 1.9916, Perplexity: 7.3274caption shape:  torch.Size([64, 11])


Epoch [2/3], Step [6362/6471], Loss: 1.8811, Perplexity: 6.5607caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6363/6471], Loss: 1.8886, Perplexity: 6.6099caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6364/6471], Loss: 1.8450, Perplexity: 6.3281caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6365/6471], Loss: 2.0591, Perplexity: 7.8392caption shape:  torch.Size([64, 15])
Epoch [2/3], Step [6366/6471], Loss: 2.2286, Perplexity: 9.2868caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6367/6471], Loss: 1.9920, Perplexity: 7.3303caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6368/6471], Loss: 1.8866, Perplexity: 6.5966caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6369/6471], Loss: 1.9276, Perplexity: 6.8733caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6370/6471], Loss: 1.9920, Perplexity: 7.3305caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6371/6471], Loss: 1.9853, Perplexity: 7.2810caption shape:  torch.Size([64, 11])


Epoch [2/3], Step [6444/6471], Loss: 2.1182, Perplexity: 8.3158caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6445/6471], Loss: 1.9159, Perplexity: 6.7933caption shape:  torch.Size([64, 18])
Epoch [2/3], Step [6446/6471], Loss: 2.5039, Perplexity: 12.2297caption shape:  torch.Size([64, 10])
Epoch [2/3], Step [6447/6471], Loss: 2.0312, Perplexity: 7.6231caption shape:  torch.Size([64, 22])
Epoch [2/3], Step [6448/6471], Loss: 2.6886, Perplexity: 14.7110caption shape:  torch.Size([64, 13])
Epoch [2/3], Step [6449/6471], Loss: 2.0877, Perplexity: 8.0667caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6450/6471], Loss: 1.9760, Perplexity: 7.2141caption shape:  torch.Size([64, 14])
Epoch [2/3], Step [6451/6471], Loss: 1.9461, Perplexity: 7.0012caption shape:  torch.Size([64, 12])
Epoch [2/3], Step [6452/6471], Loss: 2.1674, Perplexity: 8.7357caption shape:  torch.Size([64, 11])
Epoch [2/3], Step [6453/6471], Loss: 1.9733, Perplexity: 7.1943caption shape:  torch.Size([64, 11])

Epoch [3/3], Step [139/6471], Loss: 2.0863, Perplexity: 8.0550caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [140/6471], Loss: 1.9943, Perplexity: 7.3474caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [141/6471], Loss: 2.4337, Perplexity: 11.4005caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [142/6471], Loss: 2.0211, Perplexity: 7.5470caption shape:  torch.Size([64, 20])
Epoch [3/3], Step [143/6471], Loss: 2.5120, Perplexity: 12.3297caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [144/6471], Loss: 1.8634, Perplexity: 6.4458caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [145/6471], Loss: 2.0492, Perplexity: 7.7614caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [146/6471], Loss: 2.2220, Perplexity: 9.2257caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [147/6471], Loss: 1.9299, Perplexity: 6.8888caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [148/6471], Loss: 1.8319, Perplexity: 6.2454caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [222/6471], Loss: 2.0309, Perplexity: 7.6209caption shape:  torch.Size([64, 19])
Epoch [3/3], Step [223/6471], Loss: 2.4555, Perplexity: 11.6525caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [224/6471], Loss: 1.9624, Perplexity: 7.1160caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [225/6471], Loss: 2.5639, Perplexity: 12.9866caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [226/6471], Loss: 1.9985, Perplexity: 7.3779caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [227/6471], Loss: 1.8963, Perplexity: 6.6614caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [228/6471], Loss: 2.0380, Perplexity: 7.6750caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [229/6471], Loss: 2.2363, Perplexity: 9.3587caption shape:  torch.Size([64, 20])
Epoch [3/3], Step [230/6471], Loss: 2.5140, Perplexity: 12.3547caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [231/6471], Loss: 2.0001, Perplexity: 7.3896caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [305/6471], Loss: 2.2478, Perplexity: 9.4666caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [306/6471], Loss: 2.1179, Perplexity: 8.3139caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [307/6471], Loss: 2.1189, Perplexity: 8.3216caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [308/6471], Loss: 1.9705, Perplexity: 7.1744caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [309/6471], Loss: 2.0553, Perplexity: 7.8094caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [310/6471], Loss: 2.4797, Perplexity: 11.9378caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [311/6471], Loss: 2.2388, Perplexity: 9.3816caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [312/6471], Loss: 2.4344, Perplexity: 11.4092caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [313/6471], Loss: 1.9858, Perplexity: 7.2852caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [314/6471], Loss: 2.1222, Perplexity: 8.3497caption shape:  torch.Size([64, 13])

Epoch [3/3], Step [388/6471], Loss: 1.9566, Perplexity: 7.0750caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [389/6471], Loss: 1.8318, Perplexity: 6.2453caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [390/6471], Loss: 1.8900, Perplexity: 6.6195caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [391/6471], Loss: 2.0326, Perplexity: 7.6339caption shape:  torch.Size([64, 21])
Epoch [3/3], Step [392/6471], Loss: 2.6055, Perplexity: 13.5378caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [393/6471], Loss: 2.2210, Perplexity: 9.2163caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [394/6471], Loss: 2.3347, Perplexity: 10.3268caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [395/6471], Loss: 1.9289, Perplexity: 6.8822caption shape:  torch.Size([64, 9])
Epoch [3/3], Step [396/6471], Loss: 2.4683, Perplexity: 11.8020caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [397/6471], Loss: 2.0795, Perplexity: 8.0005caption shape:  torch.Size([64, 16])

Epoch [3/3], Step [471/6471], Loss: 1.8384, Perplexity: 6.2865caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [472/6471], Loss: 2.1176, Perplexity: 8.3110caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [473/6471], Loss: 2.1159, Perplexity: 8.2972caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [474/6471], Loss: 2.0146, Perplexity: 7.4977caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [475/6471], Loss: 1.9763, Perplexity: 7.2158caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [476/6471], Loss: 1.9842, Perplexity: 7.2732caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [477/6471], Loss: 1.9511, Perplexity: 7.0368caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [478/6471], Loss: 2.2109, Perplexity: 9.1238caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [479/6471], Loss: 1.9132, Perplexity: 6.7746caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [480/6471], Loss: 1.8057, Perplexity: 6.0844caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [554/6471], Loss: 1.9738, Perplexity: 7.1977caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [555/6471], Loss: 2.0781, Perplexity: 7.9891caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [556/6471], Loss: 1.9737, Perplexity: 7.1971caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [557/6471], Loss: 1.7589, Perplexity: 5.8058caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [558/6471], Loss: 2.0312, Perplexity: 7.6235caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [559/6471], Loss: 2.0181, Perplexity: 7.5237caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [560/6471], Loss: 2.0788, Perplexity: 7.9953caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [561/6471], Loss: 2.2330, Perplexity: 9.3275caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [562/6471], Loss: 1.9377, Perplexity: 6.9431caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [563/6471], Loss: 2.0685, Perplexity: 7.9130caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [637/6471], Loss: 2.0783, Perplexity: 7.9905caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [638/6471], Loss: 2.0382, Perplexity: 7.6765caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [639/6471], Loss: 2.0004, Perplexity: 7.3920caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [640/6471], Loss: 1.9328, Perplexity: 6.9089caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [641/6471], Loss: 2.2129, Perplexity: 9.1418caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [642/6471], Loss: 2.3140, Perplexity: 10.1151caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [643/6471], Loss: 1.8541, Perplexity: 6.3861caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [644/6471], Loss: 2.0439, Perplexity: 7.7210caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [645/6471], Loss: 2.2137, Perplexity: 9.1498caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [646/6471], Loss: 2.2716, Perplexity: 9.6944caption shape:  torch.Size([64, 21])

Epoch [3/3], Step [720/6471], Loss: 1.9445, Perplexity: 6.9899caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [721/6471], Loss: 2.3345, Perplexity: 10.3246caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [722/6471], Loss: 2.0132, Perplexity: 7.4875caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [723/6471], Loss: 1.9005, Perplexity: 6.6893caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [724/6471], Loss: 2.0332, Perplexity: 7.6381caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [725/6471], Loss: 2.0821, Perplexity: 8.0215caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [726/6471], Loss: 1.9476, Perplexity: 7.0120caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [727/6471], Loss: 1.8524, Perplexity: 6.3750caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [728/6471], Loss: 2.0195, Perplexity: 7.5345caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [729/6471], Loss: 2.2978, Perplexity: 9.9527caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [803/6471], Loss: 2.1260, Perplexity: 8.3810caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [804/6471], Loss: 1.8423, Perplexity: 6.3107caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [805/6471], Loss: 2.1140, Perplexity: 8.2816caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [806/6471], Loss: 2.1003, Perplexity: 8.1690caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [807/6471], Loss: 2.0173, Perplexity: 7.5176caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [808/6471], Loss: 1.9153, Perplexity: 6.7891caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [809/6471], Loss: 1.9393, Perplexity: 6.9540caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [810/6471], Loss: 2.1489, Perplexity: 8.5758caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [811/6471], Loss: 2.3125, Perplexity: 10.1001caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [812/6471], Loss: 2.2341, Perplexity: 9.3381caption shape:  torch.Size([64, 14])

Epoch [3/3], Step [886/6471], Loss: 1.9898, Perplexity: 7.3144caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [887/6471], Loss: 2.0002, Perplexity: 7.3902caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [888/6471], Loss: 1.9161, Perplexity: 6.7943caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [889/6471], Loss: 1.8383, Perplexity: 6.2861caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [890/6471], Loss: 1.9226, Perplexity: 6.8386caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [891/6471], Loss: 2.0758, Perplexity: 7.9706caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [892/6471], Loss: 1.9963, Perplexity: 7.3617caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [893/6471], Loss: 1.9153, Perplexity: 6.7892caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [894/6471], Loss: 1.9138, Perplexity: 6.7788caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [895/6471], Loss: 1.9427, Perplexity: 6.9774caption shape:  torch.Size([64, 15])

Epoch [3/3], Step [969/6471], Loss: 1.7108, Perplexity: 5.5333caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [970/6471], Loss: 2.1753, Perplexity: 8.8048caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [971/6471], Loss: 1.7398, Perplexity: 5.6960caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [972/6471], Loss: 1.9329, Perplexity: 6.9093caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [973/6471], Loss: 2.0779, Perplexity: 7.9878caption shape:  torch.Size([64, 24])
Epoch [3/3], Step [974/6471], Loss: 2.9782, Perplexity: 19.6524caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [975/6471], Loss: 2.0659, Perplexity: 7.8926caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [976/6471], Loss: 2.0129, Perplexity: 7.4846caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [977/6471], Loss: 1.8476, Perplexity: 6.3448caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [978/6471], Loss: 1.8829, Perplexity: 6.5723caption shape:  torch.Size([64, 15])

Epoch [3/3], Step [1133/6471], Loss: 1.9357, Perplexity: 6.9289caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1134/6471], Loss: 1.8384, Perplexity: 6.2866caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [1135/6471], Loss: 2.1672, Perplexity: 8.7341caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1136/6471], Loss: 1.9481, Perplexity: 7.0152caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1137/6471], Loss: 1.9730, Perplexity: 7.1923caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1138/6471], Loss: 1.9608, Perplexity: 7.1051caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [1139/6471], Loss: 2.1582, Perplexity: 8.6556caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [1140/6471], Loss: 2.1746, Perplexity: 8.7991caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1141/6471], Loss: 1.9577, Perplexity: 7.0829caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1142/6471], Loss: 2.0343, Perplexity: 7.6466caption shape:  torch.Size([64, 16])


Epoch [3/3], Step [1215/6471], Loss: 2.1767, Perplexity: 8.8170caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1216/6471], Loss: 1.9321, Perplexity: 6.9038caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1217/6471], Loss: 2.0262, Perplexity: 7.5853caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1218/6471], Loss: 1.8919, Perplexity: 6.6321caption shape:  torch.Size([64, 46])
Epoch [3/3], Step [1219/6471], Loss: 4.4079, Perplexity: 82.0960caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1220/6471], Loss: 1.9897, Perplexity: 7.3132caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1221/6471], Loss: 1.9005, Perplexity: 6.6893caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1222/6471], Loss: 2.0118, Perplexity: 7.4770caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1223/6471], Loss: 1.8655, Perplexity: 6.4594caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1224/6471], Loss: 1.7362, Perplexity: 5.6759caption shape:  torch.Size([64, 11])

Epoch [3/3], Step [1297/6471], Loss: 2.2481, Perplexity: 9.4701caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1298/6471], Loss: 2.1066, Perplexity: 8.2206caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [1299/6471], Loss: 2.0678, Perplexity: 7.9078caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [1300/6471], Loss: 2.0698, Perplexity: 7.9233
caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1301/6471], Loss: 1.9250, Perplexity: 6.8554caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1302/6471], Loss: 1.8457, Perplexity: 6.3324caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1303/6471], Loss: 1.9944, Perplexity: 7.3478caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1304/6471], Loss: 2.1412, Perplexity: 8.5099caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1305/6471], Loss: 1.8505, Perplexity: 6.3630caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1306/6471], Loss: 2.0717, Perplexity: 7.9384caption shape:  torch.Size([64, 11])

Epoch [3/3], Step [1379/6471], Loss: 1.8596, Perplexity: 6.4214caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1380/6471], Loss: 2.0510, Perplexity: 7.7761caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1381/6471], Loss: 2.0495, Perplexity: 7.7640caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1382/6471], Loss: 2.0818, Perplexity: 8.0192caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1383/6471], Loss: 1.9684, Perplexity: 7.1594caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1384/6471], Loss: 1.9725, Perplexity: 7.1884caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1385/6471], Loss: 1.6847, Perplexity: 5.3908caption shape:  torch.Size([64, 21])
Epoch [3/3], Step [1386/6471], Loss: 2.6398, Perplexity: 14.0111caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1387/6471], Loss: 1.9453, Perplexity: 6.9959caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [1388/6471], Loss: 2.1079, Perplexity: 8.2311caption shape:  torch.Size([64, 11])

Epoch [3/3], Step [1461/6471], Loss: 2.2471, Perplexity: 9.4604caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1462/6471], Loss: 1.8086, Perplexity: 6.1021caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1463/6471], Loss: 1.8997, Perplexity: 6.6839caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [1464/6471], Loss: 1.8730, Perplexity: 6.5079caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1465/6471], Loss: 1.9034, Perplexity: 6.7086caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [1466/6471], Loss: 2.3420, Perplexity: 10.4017caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [1467/6471], Loss: 2.2541, Perplexity: 9.5272caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1468/6471], Loss: 1.8568, Perplexity: 6.4033caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1469/6471], Loss: 2.0063, Perplexity: 7.4356caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1470/6471], Loss: 2.0010, Perplexity: 7.3964caption shape:  torch.Size([64, 13])

Epoch [3/3], Step [1543/6471], Loss: 1.9956, Perplexity: 7.3565caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [1544/6471], Loss: 2.1927, Perplexity: 8.9592caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [1545/6471], Loss: 2.2271, Perplexity: 9.2729caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1546/6471], Loss: 2.0195, Perplexity: 7.5342caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1547/6471], Loss: 1.9237, Perplexity: 6.8462caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1548/6471], Loss: 1.9056, Perplexity: 6.7233caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1549/6471], Loss: 1.9787, Perplexity: 7.2336caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [1550/6471], Loss: 2.0083, Perplexity: 7.4505caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1551/6471], Loss: 1.9627, Perplexity: 7.1182caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [1552/6471], Loss: 2.2272, Perplexity: 9.2735caption shape:  torch.Size([64, 12])


Epoch [3/3], Step [1625/6471], Loss: 2.0178, Perplexity: 7.5219caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1626/6471], Loss: 2.0087, Perplexity: 7.4535caption shape:  torch.Size([64, 23])
Epoch [3/3], Step [1627/6471], Loss: 2.9493, Perplexity: 19.0918caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1628/6471], Loss: 1.9754, Perplexity: 7.2098caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1629/6471], Loss: 1.9369, Perplexity: 6.9374caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1630/6471], Loss: 1.8422, Perplexity: 6.3107caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [1631/6471], Loss: 2.0497, Perplexity: 7.7653caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1632/6471], Loss: 1.8999, Perplexity: 6.6849caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [1633/6471], Loss: 2.3095, Perplexity: 10.0699caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [1634/6471], Loss: 2.1577, Perplexity: 8.6514caption shape:  torch.Size([64, 19])

Epoch [3/3], Step [1707/6471], Loss: 1.8900, Perplexity: 6.6194caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1708/6471], Loss: 2.0321, Perplexity: 7.6304caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1709/6471], Loss: 1.9638, Perplexity: 7.1263caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1710/6471], Loss: 1.9115, Perplexity: 6.7633caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1711/6471], Loss: 2.0779, Perplexity: 7.9873caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1712/6471], Loss: 1.9391, Perplexity: 6.9523caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [1713/6471], Loss: 2.2269, Perplexity: 9.2712caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1714/6471], Loss: 2.1057, Perplexity: 8.2129caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1715/6471], Loss: 1.9584, Perplexity: 7.0878caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [1716/6471], Loss: 2.5074, Perplexity: 12.2732caption shape:  torch.Size([64, 27])

Epoch [3/3], Step [1789/6471], Loss: 2.3031, Perplexity: 10.0047caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1790/6471], Loss: 2.0421, Perplexity: 7.7068caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1791/6471], Loss: 1.8215, Perplexity: 6.1809caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1792/6471], Loss: 1.8864, Perplexity: 6.5955caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1793/6471], Loss: 1.8932, Perplexity: 6.6407caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1794/6471], Loss: 1.9835, Perplexity: 7.2683caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1795/6471], Loss: 1.9843, Perplexity: 7.2738caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1796/6471], Loss: 1.8299, Perplexity: 6.2333caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1797/6471], Loss: 1.8006, Perplexity: 6.0533caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1798/6471], Loss: 2.2251, Perplexity: 9.2547caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [1871/6471], Loss: 1.8497, Perplexity: 6.3581caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [1872/6471], Loss: 2.2001, Perplexity: 9.0259caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1873/6471], Loss: 1.7338, Perplexity: 5.6624caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [1874/6471], Loss: 2.4630, Perplexity: 11.7404caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1875/6471], Loss: 1.8922, Perplexity: 6.6337caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [1876/6471], Loss: 2.3379, Perplexity: 10.3590caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1877/6471], Loss: 1.9833, Perplexity: 7.2668caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [1878/6471], Loss: 2.1433, Perplexity: 8.5272caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [1879/6471], Loss: 1.9956, Perplexity: 7.3563caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [1880/6471], Loss: 2.2804, Perplexity: 9.7808caption shape:  torch.Size([64, 14])

Epoch [3/3], Step [1953/6471], Loss: 1.7475, Perplexity: 5.7400caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1954/6471], Loss: 1.8726, Perplexity: 6.5049caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [1955/6471], Loss: 2.2792, Perplexity: 9.7690caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1956/6471], Loss: 1.9832, Perplexity: 7.2660caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1957/6471], Loss: 1.8459, Perplexity: 6.3335caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1958/6471], Loss: 1.8899, Perplexity: 6.6185caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [1959/6471], Loss: 2.3554, Perplexity: 10.5425caption shape:  torch.Size([64, 24])
Epoch [3/3], Step [1960/6471], Loss: 3.0280, Perplexity: 20.6555caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [1961/6471], Loss: 1.8814, Perplexity: 6.5629caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [1962/6471], Loss: 1.8739, Perplexity: 6.5137caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [2035/6471], Loss: 1.9637, Perplexity: 7.1256caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2036/6471], Loss: 1.9405, Perplexity: 6.9623caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2037/6471], Loss: 1.9025, Perplexity: 6.7024caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [2038/6471], Loss: 2.3694, Perplexity: 10.6910caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2039/6471], Loss: 1.9839, Perplexity: 7.2711caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2040/6471], Loss: 1.8614, Perplexity: 6.4326caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2041/6471], Loss: 2.1789, Perplexity: 8.8370caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2042/6471], Loss: 1.9383, Perplexity: 6.9470caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2043/6471], Loss: 2.0535, Perplexity: 7.7954caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2044/6471], Loss: 2.0680, Perplexity: 7.9090caption shape:  torch.Size([64, 13])

Epoch [3/3], Step [2117/6471], Loss: 1.9768, Perplexity: 7.2194caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2118/6471], Loss: 1.9328, Perplexity: 6.9092caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2119/6471], Loss: 2.0430, Perplexity: 7.7138caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [2120/6471], Loss: 2.1438, Perplexity: 8.5319caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2121/6471], Loss: 2.0331, Perplexity: 7.6374caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2122/6471], Loss: 1.9850, Perplexity: 7.2790caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [2123/6471], Loss: 2.1828, Perplexity: 8.8710caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2124/6471], Loss: 2.0106, Perplexity: 7.4680caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2125/6471], Loss: 2.1316, Perplexity: 8.4280caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2126/6471], Loss: 1.9902, Perplexity: 7.3173caption shape:  torch.Size([64, 18])


Epoch [3/3], Step [2199/6471], Loss: 1.9209, Perplexity: 6.8268caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2200/6471], Loss: 1.8726, Perplexity: 6.5049
caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2201/6471], Loss: 1.8583, Perplexity: 6.4130caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2202/6471], Loss: 2.0412, Perplexity: 7.6995caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2203/6471], Loss: 1.8829, Perplexity: 6.5728caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2204/6471], Loss: 1.9781, Perplexity: 7.2291caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [2205/6471], Loss: 2.2473, Perplexity: 9.4620caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [2206/6471], Loss: 2.1289, Perplexity: 8.4056caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2207/6471], Loss: 2.0690, Perplexity: 7.9173caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2208/6471], Loss: 1.8170, Perplexity: 6.1535caption shape:  torch.Size([64, 11])

Epoch [3/3], Step [2281/6471], Loss: 1.9175, Perplexity: 6.8039caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2282/6471], Loss: 1.8541, Perplexity: 6.3858caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2283/6471], Loss: 2.0791, Perplexity: 7.9974caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2284/6471], Loss: 1.9239, Perplexity: 6.8475caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2285/6471], Loss: 2.0339, Perplexity: 7.6436caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2286/6471], Loss: 2.2262, Perplexity: 9.2647caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2287/6471], Loss: 1.7954, Perplexity: 6.0221caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2288/6471], Loss: 2.0616, Perplexity: 7.8586caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2289/6471], Loss: 2.0643, Perplexity: 7.8801caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2290/6471], Loss: 2.0406, Perplexity: 7.6956caption shape:  torch.Size([64, 11])


Epoch [3/3], Step [2363/6471], Loss: 1.8536, Perplexity: 6.3825caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2364/6471], Loss: 2.0635, Perplexity: 7.8737caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2365/6471], Loss: 1.7601, Perplexity: 5.8129caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2366/6471], Loss: 2.0640, Perplexity: 7.8775caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2367/6471], Loss: 1.9784, Perplexity: 7.2315caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [2368/6471], Loss: 2.1788, Perplexity: 8.8354caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2369/6471], Loss: 2.0648, Perplexity: 7.8835caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2370/6471], Loss: 2.0164, Perplexity: 7.5109caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2371/6471], Loss: 1.9367, Perplexity: 6.9359caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2372/6471], Loss: 1.9244, Perplexity: 6.8509caption shape:  torch.Size([64, 15])


Epoch [3/3], Step [2445/6471], Loss: 2.1897, Perplexity: 8.9323caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2446/6471], Loss: 1.9867, Perplexity: 7.2916caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2447/6471], Loss: 1.9172, Perplexity: 6.8020caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [2448/6471], Loss: 1.9641, Perplexity: 7.1285caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2449/6471], Loss: 1.9127, Perplexity: 6.7713caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2450/6471], Loss: 2.0354, Perplexity: 7.6551caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2451/6471], Loss: 2.0817, Perplexity: 8.0182caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2452/6471], Loss: 1.9436, Perplexity: 6.9836caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2453/6471], Loss: 1.7945, Perplexity: 6.0166caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2454/6471], Loss: 1.9538, Perplexity: 7.0556caption shape:  torch.Size([64, 11])


Epoch [3/3], Step [2527/6471], Loss: 1.8463, Perplexity: 6.3360caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2528/6471], Loss: 1.9071, Perplexity: 6.7332caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2529/6471], Loss: 1.7783, Perplexity: 5.9200caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2530/6471], Loss: 1.8319, Perplexity: 6.2459caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [2531/6471], Loss: 2.5426, Perplexity: 12.7130caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2532/6471], Loss: 1.9368, Perplexity: 6.9367caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2533/6471], Loss: 2.0974, Perplexity: 8.1448caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2534/6471], Loss: 1.9733, Perplexity: 7.1942caption shape:  torch.Size([64, 21])
Epoch [3/3], Step [2535/6471], Loss: 2.7540, Perplexity: 15.7047caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2536/6471], Loss: 2.0384, Perplexity: 7.6781caption shape:  torch.Size([64, 16])

Epoch [3/3], Step [2609/6471], Loss: 2.1796, Perplexity: 8.8430caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2610/6471], Loss: 1.9665, Perplexity: 7.1457caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2611/6471], Loss: 2.0733, Perplexity: 7.9512caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2612/6471], Loss: 2.0678, Perplexity: 7.9073caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [2613/6471], Loss: 2.1502, Perplexity: 8.5867caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2614/6471], Loss: 1.8230, Perplexity: 6.1903caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [2615/6471], Loss: 2.3252, Perplexity: 10.2288caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [2616/6471], Loss: 2.3611, Perplexity: 10.6025caption shape:  torch.Size([64, 24])
Epoch [3/3], Step [2617/6471], Loss: 2.8698, Perplexity: 17.6336caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2618/6471], Loss: 1.9035, Perplexity: 6.7096caption shape:  torch.Size([64, 17])

Epoch [3/3], Step [2691/6471], Loss: 2.2779, Perplexity: 9.7560caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2692/6471], Loss: 2.0476, Perplexity: 7.7496caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [2693/6471], Loss: 2.1547, Perplexity: 8.6255caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2694/6471], Loss: 1.6662, Perplexity: 5.2921caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2695/6471], Loss: 2.0135, Perplexity: 7.4892caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2696/6471], Loss: 1.9235, Perplexity: 6.8451caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2697/6471], Loss: 1.8184, Perplexity: 6.1623caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2698/6471], Loss: 1.9451, Perplexity: 6.9945caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2699/6471], Loss: 1.9380, Perplexity: 6.9450caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2700/6471], Loss: 1.8787, Perplexity: 6.5448
caption shape:  torch.Size([64, 13])

Epoch [3/3], Step [2773/6471], Loss: 1.9023, Perplexity: 6.7014caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2774/6471], Loss: 1.9505, Perplexity: 7.0325caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [2775/6471], Loss: 2.4675, Perplexity: 11.7932caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2776/6471], Loss: 2.0377, Perplexity: 7.6728caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2777/6471], Loss: 1.8550, Perplexity: 6.3915caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2778/6471], Loss: 1.8738, Perplexity: 6.5131caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2779/6471], Loss: 1.8916, Perplexity: 6.6300caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2780/6471], Loss: 1.7789, Perplexity: 5.9231caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2781/6471], Loss: 1.9263, Perplexity: 6.8642caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [2782/6471], Loss: 2.2393, Perplexity: 9.3865caption shape:  torch.Size([64, 14])

Epoch [3/3], Step [2855/6471], Loss: 1.9580, Perplexity: 7.0854caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [2856/6471], Loss: 2.3844, Perplexity: 10.8530caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2857/6471], Loss: 1.8892, Perplexity: 6.6144caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [2858/6471], Loss: 2.0594, Perplexity: 7.8411caption shape:  torch.Size([64, 21])
Epoch [3/3], Step [2859/6471], Loss: 2.5348, Perplexity: 12.6140caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [2860/6471], Loss: 2.0025, Perplexity: 7.4078caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2861/6471], Loss: 1.9323, Perplexity: 6.9051caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2862/6471], Loss: 2.0303, Perplexity: 7.6164caption shape:  torch.Size([64, 9])
Epoch [3/3], Step [2863/6471], Loss: 2.2619, Perplexity: 9.6014caption shape:  torch.Size([64, 19])
Epoch [3/3], Step [2864/6471], Loss: 2.6127, Perplexity: 13.6361caption shape:  torch.Size([64, 16])

Epoch [3/3], Step [2937/6471], Loss: 2.1852, Perplexity: 8.8927caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [2938/6471], Loss: 2.5029, Perplexity: 12.2180caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [2939/6471], Loss: 2.3829, Perplexity: 10.8362caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [2940/6471], Loss: 2.0249, Perplexity: 7.5751caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2941/6471], Loss: 1.7950, Perplexity: 6.0197caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2942/6471], Loss: 1.9279, Perplexity: 6.8749caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [2943/6471], Loss: 2.0380, Perplexity: 7.6749caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [2944/6471], Loss: 2.0069, Perplexity: 7.4405caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [2945/6471], Loss: 2.2132, Perplexity: 9.1451caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [2946/6471], Loss: 1.8796, Perplexity: 6.5506caption shape:  torch.Size([64, 13])

Epoch [3/3], Step [3019/6471], Loss: 2.0551, Perplexity: 7.8075caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3020/6471], Loss: 2.0401, Perplexity: 7.6916caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3021/6471], Loss: 1.9382, Perplexity: 6.9459caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [3022/6471], Loss: 2.1641, Perplexity: 8.7071caption shape:  torch.Size([64, 38])
Epoch [3/3], Step [3023/6471], Loss: 3.9205, Perplexity: 50.4273caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3024/6471], Loss: 1.9472, Perplexity: 7.0088caption shape:  torch.Size([64, 24])
Epoch [3/3], Step [3025/6471], Loss: 2.8729, Perplexity: 17.6880caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3026/6471], Loss: 1.9332, Perplexity: 6.9116caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3027/6471], Loss: 1.8607, Perplexity: 6.4284caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3028/6471], Loss: 1.9699, Perplexity: 7.1698caption shape:  torch.Size([64, 16])

Epoch [3/3], Step [3101/6471], Loss: 2.1026, Perplexity: 8.1874caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3102/6471], Loss: 1.8591, Perplexity: 6.4181caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3103/6471], Loss: 2.0474, Perplexity: 7.7477caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3104/6471], Loss: 2.0311, Perplexity: 7.6228caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3105/6471], Loss: 1.9679, Perplexity: 7.1558caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3106/6471], Loss: 2.1650, Perplexity: 8.7143caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3107/6471], Loss: 1.8942, Perplexity: 6.6475caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3108/6471], Loss: 1.8170, Perplexity: 6.1536caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [3109/6471], Loss: 2.4325, Perplexity: 11.3876caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [3110/6471], Loss: 2.1529, Perplexity: 8.6099caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [3183/6471], Loss: 2.0177, Perplexity: 7.5212caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [3184/6471], Loss: 2.0683, Perplexity: 7.9110caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3185/6471], Loss: 2.1892, Perplexity: 8.9279caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3186/6471], Loss: 1.8587, Perplexity: 6.4151caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3187/6471], Loss: 1.9637, Perplexity: 7.1254caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3188/6471], Loss: 2.0728, Perplexity: 7.9468caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [3189/6471], Loss: 2.2970, Perplexity: 9.9447caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3190/6471], Loss: 1.8774, Perplexity: 6.5362caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3191/6471], Loss: 1.8243, Perplexity: 6.1987caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3192/6471], Loss: 2.0664, Perplexity: 7.8962caption shape:  torch.Size([64, 15])


Epoch [3/3], Step [3265/6471], Loss: 2.4410, Perplexity: 11.4840caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3266/6471], Loss: 1.9205, Perplexity: 6.8245caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3267/6471], Loss: 1.8267, Perplexity: 6.2131caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3268/6471], Loss: 1.8742, Perplexity: 6.5155caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3269/6471], Loss: 1.9036, Perplexity: 6.7103caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3270/6471], Loss: 1.9997, Perplexity: 7.3870caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3271/6471], Loss: 2.0140, Perplexity: 7.4934caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [3272/6471], Loss: 1.9745, Perplexity: 7.2031caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3273/6471], Loss: 1.9462, Perplexity: 7.0018caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3274/6471], Loss: 1.8502, Perplexity: 6.3612caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [3347/6471], Loss: 2.6768, Perplexity: 14.5389caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3348/6471], Loss: 1.7652, Perplexity: 5.8426caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3349/6471], Loss: 1.9224, Perplexity: 6.8376caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3350/6471], Loss: 1.7401, Perplexity: 5.6977caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3351/6471], Loss: 1.7449, Perplexity: 5.7254caption shape:  torch.Size([64, 24])
Epoch [3/3], Step [3352/6471], Loss: 2.8097, Perplexity: 16.6048caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [3353/6471], Loss: 2.0173, Perplexity: 7.5182caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3354/6471], Loss: 2.1070, Perplexity: 8.2236caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3355/6471], Loss: 2.1119, Perplexity: 8.2643caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3356/6471], Loss: 2.0678, Perplexity: 7.9075caption shape:  torch.Size([64, 20])

Epoch [3/3], Step [3429/6471], Loss: 2.0808, Perplexity: 8.0108caption shape:  torch.Size([64, 21])
Epoch [3/3], Step [3430/6471], Loss: 2.7045, Perplexity: 14.9471caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3431/6471], Loss: 1.8968, Perplexity: 6.6648caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [3432/6471], Loss: 1.9736, Perplexity: 7.1966caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3433/6471], Loss: 1.8890, Perplexity: 6.6127caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3434/6471], Loss: 1.9018, Perplexity: 6.6983caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [3435/6471], Loss: 2.1604, Perplexity: 8.6743caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3436/6471], Loss: 1.8913, Perplexity: 6.6279caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3437/6471], Loss: 1.9134, Perplexity: 6.7763caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3438/6471], Loss: 1.7780, Perplexity: 5.9180caption shape:  torch.Size([64, 10])

Epoch [3/3], Step [3511/6471], Loss: 2.4917, Perplexity: 12.0813caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3512/6471], Loss: 2.0149, Perplexity: 7.5003caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [3513/6471], Loss: 2.3958, Perplexity: 10.9766caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [3514/6471], Loss: 2.4500, Perplexity: 11.5882caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3515/6471], Loss: 1.8059, Perplexity: 6.0855caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3516/6471], Loss: 2.0011, Perplexity: 7.3972caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3517/6471], Loss: 1.8985, Perplexity: 6.6756caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [3518/6471], Loss: 2.2479, Perplexity: 9.4679caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3519/6471], Loss: 1.8629, Perplexity: 6.4421caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3520/6471], Loss: 1.9792, Perplexity: 7.2373caption shape:  torch.Size([64, 14])

Epoch [3/3], Step [3593/6471], Loss: 2.0172, Perplexity: 7.5175caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3594/6471], Loss: 1.9974, Perplexity: 7.3699caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [3595/6471], Loss: 2.0666, Perplexity: 7.8977caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3596/6471], Loss: 1.9562, Perplexity: 7.0723caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3597/6471], Loss: 1.7639, Perplexity: 5.8353caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3598/6471], Loss: 1.7700, Perplexity: 5.8706caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3599/6471], Loss: 1.8803, Perplexity: 6.5553caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3600/6471], Loss: 1.8973, Perplexity: 6.6680
caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [3601/6471], Loss: 2.2598, Perplexity: 9.5812caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3602/6471], Loss: 1.9508, Perplexity: 7.0344caption shape:  torch.Size([64, 15])

Epoch [3/3], Step [3675/6471], Loss: 2.3218, Perplexity: 10.1942caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3676/6471], Loss: 2.0028, Perplexity: 7.4099caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3677/6471], Loss: 1.7910, Perplexity: 5.9954caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3678/6471], Loss: 1.8866, Perplexity: 6.5966caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [3679/6471], Loss: 2.2432, Perplexity: 9.4236caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3680/6471], Loss: 1.9422, Perplexity: 6.9739caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [3681/6471], Loss: 2.1966, Perplexity: 8.9943caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3682/6471], Loss: 1.9894, Perplexity: 7.3113caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3683/6471], Loss: 1.9180, Perplexity: 6.8074caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3684/6471], Loss: 1.9545, Perplexity: 7.0601caption shape:  torch.Size([64, 13])

Epoch [3/3], Step [3757/6471], Loss: 2.1391, Perplexity: 8.4917caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [3758/6471], Loss: 2.3564, Perplexity: 10.5529caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3759/6471], Loss: 1.9822, Perplexity: 7.2590caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3760/6471], Loss: 1.8027, Perplexity: 6.0661caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3761/6471], Loss: 2.0037, Perplexity: 7.4165caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3762/6471], Loss: 1.8392, Perplexity: 6.2917caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3763/6471], Loss: 2.1214, Perplexity: 8.3428caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [3764/6471], Loss: 2.0315, Perplexity: 7.6255caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [3765/6471], Loss: 1.9864, Perplexity: 7.2889caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3766/6471], Loss: 2.0179, Perplexity: 7.5224caption shape:  torch.Size([64, 10])

Epoch [3/3], Step [3839/6471], Loss: 2.0767, Perplexity: 7.9784caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3840/6471], Loss: 2.0006, Perplexity: 7.3938caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3841/6471], Loss: 1.6764, Perplexity: 5.3465caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [3842/6471], Loss: 2.4924, Perplexity: 12.0903caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [3843/6471], Loss: 1.9839, Perplexity: 7.2711caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [3844/6471], Loss: 2.2766, Perplexity: 9.7432caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [3845/6471], Loss: 2.2147, Perplexity: 9.1588caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3846/6471], Loss: 1.9556, Perplexity: 7.0684caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3847/6471], Loss: 2.0156, Perplexity: 7.5056caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3848/6471], Loss: 1.9540, Perplexity: 7.0566caption shape:  torch.Size([64, 16])

Epoch [3/3], Step [3921/6471], Loss: 2.3090, Perplexity: 10.0639caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3922/6471], Loss: 1.7647, Perplexity: 5.8400caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3923/6471], Loss: 1.9787, Perplexity: 7.2333caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3924/6471], Loss: 1.9471, Perplexity: 7.0087caption shape:  torch.Size([64, 19])
Epoch [3/3], Step [3925/6471], Loss: 2.5483, Perplexity: 12.7859caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [3926/6471], Loss: 1.8602, Perplexity: 6.4249caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [3927/6471], Loss: 2.2541, Perplexity: 9.5267caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [3928/6471], Loss: 2.1860, Perplexity: 8.8994caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [3929/6471], Loss: 1.6659, Perplexity: 5.2906caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [3930/6471], Loss: 1.8220, Perplexity: 6.1843caption shape:  torch.Size([64, 11])

Epoch [3/3], Step [4003/6471], Loss: 1.9516, Perplexity: 7.0399caption shape:  torch.Size([64, 9])
Epoch [3/3], Step [4004/6471], Loss: 2.3774, Perplexity: 10.7763caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4005/6471], Loss: 2.0456, Perplexity: 7.7337caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [4006/6471], Loss: 2.3522, Perplexity: 10.5085caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4007/6471], Loss: 1.9969, Perplexity: 7.3665caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4008/6471], Loss: 1.7262, Perplexity: 5.6194caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4009/6471], Loss: 2.1183, Perplexity: 8.3171caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4010/6471], Loss: 1.7655, Perplexity: 5.8442caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4011/6471], Loss: 2.0309, Perplexity: 7.6207caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [4012/6471], Loss: 2.3007, Perplexity: 9.9812caption shape:  torch.Size([64, 13])

Epoch [3/3], Step [4085/6471], Loss: 1.8684, Perplexity: 6.4780caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4086/6471], Loss: 1.9521, Perplexity: 7.0438caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4087/6471], Loss: 1.9847, Perplexity: 7.2766caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4088/6471], Loss: 1.9550, Perplexity: 7.0636caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4089/6471], Loss: 1.8759, Perplexity: 6.5270caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [4090/6471], Loss: 2.3098, Perplexity: 10.0725caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4091/6471], Loss: 1.8054, Perplexity: 6.0827caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4092/6471], Loss: 2.1213, Perplexity: 8.3421caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4093/6471], Loss: 2.0348, Perplexity: 7.6511caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [4094/6471], Loss: 2.0794, Perplexity: 8.0001caption shape:  torch.Size([64, 11])

Epoch [3/3], Step [4167/6471], Loss: 1.8156, Perplexity: 6.1450caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4168/6471], Loss: 2.1487, Perplexity: 8.5738caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4169/6471], Loss: 1.8922, Perplexity: 6.6336caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4170/6471], Loss: 2.0229, Perplexity: 7.5601caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4171/6471], Loss: 1.8097, Perplexity: 6.1084caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4172/6471], Loss: 1.8883, Perplexity: 6.6084caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4173/6471], Loss: 2.2291, Perplexity: 9.2919caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4174/6471], Loss: 1.7749, Perplexity: 5.8997caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4175/6471], Loss: 1.8922, Perplexity: 6.6343caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4176/6471], Loss: 1.9319, Perplexity: 6.9029caption shape:  torch.Size([64, 13])


Epoch [3/3], Step [4249/6471], Loss: 1.9335, Perplexity: 6.9134caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4250/6471], Loss: 1.8006, Perplexity: 6.0534caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4251/6471], Loss: 1.8802, Perplexity: 6.5547caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4252/6471], Loss: 1.8374, Perplexity: 6.2805caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4253/6471], Loss: 2.0962, Perplexity: 8.1353caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4254/6471], Loss: 1.8442, Perplexity: 6.3227caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4255/6471], Loss: 1.8951, Perplexity: 6.6530caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4256/6471], Loss: 1.8633, Perplexity: 6.4448caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [4257/6471], Loss: 2.3807, Perplexity: 10.8130caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4258/6471], Loss: 1.8955, Perplexity: 6.6560caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [4408/6471], Loss: 2.0139, Perplexity: 7.4928caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [4409/6471], Loss: 2.3332, Perplexity: 10.3104caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4410/6471], Loss: 1.7274, Perplexity: 5.6261caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4411/6471], Loss: 1.9037, Perplexity: 6.7106caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4412/6471], Loss: 2.0066, Perplexity: 7.4378caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4413/6471], Loss: 2.0880, Perplexity: 8.0689caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4414/6471], Loss: 1.9503, Perplexity: 7.0307caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4415/6471], Loss: 1.9255, Perplexity: 6.8583caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [4416/6471], Loss: 2.1369, Perplexity: 8.4731caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4417/6471], Loss: 1.9223, Perplexity: 6.8368caption shape:  torch.Size([64, 16])

Epoch [3/3], Step [4490/6471], Loss: 1.6925, Perplexity: 5.4328caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4491/6471], Loss: 1.9910, Perplexity: 7.3232caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4492/6471], Loss: 1.9283, Perplexity: 6.8780caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [4493/6471], Loss: 2.3848, Perplexity: 10.8567caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4494/6471], Loss: 2.0104, Perplexity: 7.4660caption shape:  torch.Size([64, 21])
Epoch [3/3], Step [4495/6471], Loss: 2.5971, Perplexity: 13.4245caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4496/6471], Loss: 2.0274, Perplexity: 7.5945caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [4497/6471], Loss: 2.0812, Perplexity: 8.0143caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4498/6471], Loss: 1.9142, Perplexity: 6.7812caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4499/6471], Loss: 1.9393, Perplexity: 6.9537caption shape:  torch.Size([64, 14])

Epoch [3/3], Step [4572/6471], Loss: 1.9268, Perplexity: 6.8672caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4573/6471], Loss: 2.1202, Perplexity: 8.3326caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4574/6471], Loss: 1.9311, Perplexity: 6.8968caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4575/6471], Loss: 1.9973, Perplexity: 7.3690caption shape:  torch.Size([64, 19])
Epoch [3/3], Step [4576/6471], Loss: 2.4198, Perplexity: 11.2438caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [4577/6471], Loss: 2.0922, Perplexity: 8.1029caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4578/6471], Loss: 1.8550, Perplexity: 6.3917caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4579/6471], Loss: 1.9446, Perplexity: 6.9911caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [4580/6471], Loss: 2.2613, Perplexity: 9.5951caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4581/6471], Loss: 2.2357, Perplexity: 9.3527caption shape:  torch.Size([64, 11])

Epoch [3/3], Step [4654/6471], Loss: 2.1277, Perplexity: 8.3955caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4655/6471], Loss: 1.9146, Perplexity: 6.7842caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4656/6471], Loss: 1.8404, Perplexity: 6.2990caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4657/6471], Loss: 1.9525, Perplexity: 7.0460caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4658/6471], Loss: 1.9267, Perplexity: 6.8665caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4659/6471], Loss: 1.9175, Perplexity: 6.8039caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4660/6471], Loss: 2.2126, Perplexity: 9.1391caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4661/6471], Loss: 2.1492, Perplexity: 8.5779caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [4662/6471], Loss: 2.2382, Perplexity: 9.3768caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [4663/6471], Loss: 2.3039, Perplexity: 10.0127caption shape:  torch.Size([64, 12])

Epoch [3/3], Step [4736/6471], Loss: 2.1001, Perplexity: 8.1668caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4737/6471], Loss: 2.1251, Perplexity: 8.3734caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4738/6471], Loss: 2.0494, Perplexity: 7.7632caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4739/6471], Loss: 2.1168, Perplexity: 8.3048caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [4740/6471], Loss: 2.2496, Perplexity: 9.4843caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4741/6471], Loss: 1.9308, Perplexity: 6.8947caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4742/6471], Loss: 1.9827, Perplexity: 7.2624caption shape:  torch.Size([64, 26])
Epoch [3/3], Step [4743/6471], Loss: 3.1953, Perplexity: 24.4185caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4744/6471], Loss: 1.9059, Perplexity: 6.7253caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4745/6471], Loss: 1.9684, Perplexity: 7.1595caption shape:  torch.Size([64, 14])

Epoch [3/3], Step [4818/6471], Loss: 1.9337, Perplexity: 6.9148caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4819/6471], Loss: 1.9188, Perplexity: 6.8126caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4820/6471], Loss: 1.8101, Perplexity: 6.1110caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4821/6471], Loss: 1.9614, Perplexity: 7.1093caption shape:  torch.Size([64, 22])
Epoch [3/3], Step [4822/6471], Loss: 2.6276, Perplexity: 13.8400caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4823/6471], Loss: 1.8215, Perplexity: 6.1809caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4824/6471], Loss: 2.0037, Perplexity: 7.4161caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4825/6471], Loss: 1.9071, Perplexity: 6.7334caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4826/6471], Loss: 1.9183, Perplexity: 6.8091caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4827/6471], Loss: 1.8154, Perplexity: 6.1434caption shape:  torch.Size([64, 14])

Epoch [3/3], Step [4900/6471], Loss: 2.1564, Perplexity: 8.6400
caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4901/6471], Loss: 1.8018, Perplexity: 6.0608caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4902/6471], Loss: 1.9610, Perplexity: 7.1066caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4903/6471], Loss: 2.0822, Perplexity: 8.0219caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4904/6471], Loss: 1.8424, Perplexity: 6.3120caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [4905/6471], Loss: 2.3342, Perplexity: 10.3213caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [4906/6471], Loss: 1.9087, Perplexity: 6.7442caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [4907/6471], Loss: 1.8209, Perplexity: 6.1776caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4908/6471], Loss: 1.8544, Perplexity: 6.3876caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4909/6471], Loss: 1.9283, Perplexity: 6.8777caption shape:  torch.Size([64, 13])

Epoch [3/3], Step [4982/6471], Loss: 2.5046, Perplexity: 12.2387caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4983/6471], Loss: 2.0850, Perplexity: 8.0448caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [4984/6471], Loss: 2.2374, Perplexity: 9.3689caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4985/6471], Loss: 1.9513, Perplexity: 7.0377caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [4986/6471], Loss: 2.0461, Perplexity: 7.7376caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4987/6471], Loss: 1.9903, Perplexity: 7.3176caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4988/6471], Loss: 2.0262, Perplexity: 7.5853caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [4989/6471], Loss: 1.8442, Perplexity: 6.3230caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [4990/6471], Loss: 1.8537, Perplexity: 6.3834caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [4991/6471], Loss: 2.1607, Perplexity: 8.6774caption shape:  torch.Size([64, 18])

Epoch [3/3], Step [5064/6471], Loss: 1.9383, Perplexity: 6.9472caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [5065/6471], Loss: 2.0008, Perplexity: 7.3948caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5066/6471], Loss: 1.9958, Perplexity: 7.3583caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5067/6471], Loss: 1.9623, Perplexity: 7.1156caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [5068/6471], Loss: 2.2385, Perplexity: 9.3789caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5069/6471], Loss: 2.0481, Perplexity: 7.7528caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5070/6471], Loss: 2.0273, Perplexity: 7.5932caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5071/6471], Loss: 1.9393, Perplexity: 6.9539caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5072/6471], Loss: 1.7479, Perplexity: 5.7424caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [5073/6471], Loss: 2.2537, Perplexity: 9.5229caption shape:  torch.Size([64, 15])


Epoch [3/3], Step [5146/6471], Loss: 2.0229, Perplexity: 7.5602caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5147/6471], Loss: 2.0411, Perplexity: 7.6992caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5148/6471], Loss: 1.8886, Perplexity: 6.6104caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5149/6471], Loss: 1.9567, Perplexity: 7.0757caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5150/6471], Loss: 1.8514, Perplexity: 6.3689caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5151/6471], Loss: 1.9545, Perplexity: 7.0601caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5152/6471], Loss: 2.0021, Perplexity: 7.4045caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5153/6471], Loss: 1.8550, Perplexity: 6.3918caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5154/6471], Loss: 1.9664, Perplexity: 7.1448caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5155/6471], Loss: 1.8883, Perplexity: 6.6082caption shape:  torch.Size([64, 12])


Epoch [3/3], Step [5228/6471], Loss: 2.0551, Perplexity: 7.8079caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5229/6471], Loss: 1.8598, Perplexity: 6.4224caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5230/6471], Loss: 2.1177, Perplexity: 8.3116caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5231/6471], Loss: 1.9062, Perplexity: 6.7277caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5232/6471], Loss: 1.7780, Perplexity: 5.9180caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5233/6471], Loss: 2.1485, Perplexity: 8.5722caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [5234/6471], Loss: 2.0593, Perplexity: 7.8403caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5235/6471], Loss: 1.8479, Perplexity: 6.3464caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5236/6471], Loss: 1.8515, Perplexity: 6.3691caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [5237/6471], Loss: 2.2751, Perplexity: 9.7289caption shape:  torch.Size([64, 19])


Epoch [3/3], Step [5310/6471], Loss: 1.8629, Perplexity: 6.4422caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [5311/6471], Loss: 2.1496, Perplexity: 8.5812caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [5312/6471], Loss: 2.0611, Perplexity: 7.8549caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5313/6471], Loss: 1.8907, Perplexity: 6.6239caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5314/6471], Loss: 1.9288, Perplexity: 6.8811caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [5315/6471], Loss: 2.1547, Perplexity: 8.6250caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [5316/6471], Loss: 2.2699, Perplexity: 9.6783caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5317/6471], Loss: 2.0522, Perplexity: 7.7849caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5318/6471], Loss: 1.8422, Perplexity: 6.3106caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5319/6471], Loss: 1.9498, Perplexity: 7.0274caption shape:  torch.Size([64, 22])


Epoch [3/3], Step [5392/6471], Loss: 1.9551, Perplexity: 7.0648caption shape:  torch.Size([64, 25])
Epoch [3/3], Step [5393/6471], Loss: 2.6516, Perplexity: 14.1765caption shape:  torch.Size([64, 29])
Epoch [3/3], Step [5394/6471], Loss: 3.1297, Perplexity: 22.8672caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5395/6471], Loss: 2.0119, Perplexity: 7.4773caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5396/6471], Loss: 1.8401, Perplexity: 6.2975caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5397/6471], Loss: 1.7588, Perplexity: 5.8053caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5398/6471], Loss: 2.0381, Perplexity: 7.6761caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [5399/6471], Loss: 1.9351, Perplexity: 6.9249caption shape:  torch.Size([64, 18])
Epoch [3/3], Step [5400/6471], Loss: 2.3465, Perplexity: 10.4488
caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5401/6471], Loss: 2.0511, Perplexity: 7.7767caption shape:  torch.Size([64, 1

Epoch [3/3], Step [5474/6471], Loss: 1.9201, Perplexity: 6.8215caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5475/6471], Loss: 1.8786, Perplexity: 6.5442caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5476/6471], Loss: 2.0545, Perplexity: 7.8026caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [5477/6471], Loss: 2.1769, Perplexity: 8.8186caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5478/6471], Loss: 1.7416, Perplexity: 5.7067caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5479/6471], Loss: 1.9824, Perplexity: 7.2601caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5480/6471], Loss: 1.7996, Perplexity: 6.0472caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5481/6471], Loss: 2.1359, Perplexity: 8.4650caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [5482/6471], Loss: 2.0819, Perplexity: 8.0194caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5483/6471], Loss: 1.8836, Perplexity: 6.5774caption shape:  torch.Size([64, 15])


Epoch [3/3], Step [5556/6471], Loss: 2.1189, Perplexity: 8.3218caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [5557/6471], Loss: 1.9504, Perplexity: 7.0314caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5558/6471], Loss: 1.8824, Perplexity: 6.5693caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [5559/6471], Loss: 1.9361, Perplexity: 6.9316caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5560/6471], Loss: 2.0930, Perplexity: 8.1096caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [5561/6471], Loss: 2.0943, Perplexity: 8.1198caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [5562/6471], Loss: 2.3335, Perplexity: 10.3141caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5563/6471], Loss: 1.8426, Perplexity: 6.3129caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5564/6471], Loss: 1.9663, Perplexity: 7.1445caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5565/6471], Loss: 2.0155, Perplexity: 7.5044caption shape:  torch.Size([64, 13])

Epoch [3/3], Step [5638/6471], Loss: 1.8371, Perplexity: 6.2785caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [5639/6471], Loss: 2.2543, Perplexity: 9.5284caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [5640/6471], Loss: 2.0514, Perplexity: 7.7790caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5641/6471], Loss: 1.8349, Perplexity: 6.2648caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5642/6471], Loss: 1.8409, Perplexity: 6.3024caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [5643/6471], Loss: 2.2119, Perplexity: 9.1333caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5644/6471], Loss: 1.8784, Perplexity: 6.5430caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [5645/6471], Loss: 2.1077, Perplexity: 8.2297caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5646/6471], Loss: 2.0652, Perplexity: 7.8866caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5647/6471], Loss: 2.0887, Perplexity: 8.0747caption shape:  torch.Size([64, 14])


Epoch [3/3], Step [5720/6471], Loss: 1.9994, Perplexity: 7.3843caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5721/6471], Loss: 1.7758, Perplexity: 5.9049caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [5722/6471], Loss: 2.1791, Perplexity: 8.8383caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [5723/6471], Loss: 2.0268, Perplexity: 7.5899caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5724/6471], Loss: 1.8415, Perplexity: 6.3061caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5725/6471], Loss: 1.9668, Perplexity: 7.1479caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5726/6471], Loss: 1.9530, Perplexity: 7.0501caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5727/6471], Loss: 1.9262, Perplexity: 6.8637caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [5728/6471], Loss: 2.2927, Perplexity: 9.9019caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5729/6471], Loss: 1.9045, Perplexity: 6.7162caption shape:  torch.Size([64, 14])


Epoch [3/3], Step [5802/6471], Loss: 2.1566, Perplexity: 8.6420caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5803/6471], Loss: 1.8986, Perplexity: 6.6762caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5804/6471], Loss: 1.7266, Perplexity: 5.6216caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [5805/6471], Loss: 1.8290, Perplexity: 6.2274caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5806/6471], Loss: 1.9502, Perplexity: 7.0302caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5807/6471], Loss: 1.9844, Perplexity: 7.2746caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5808/6471], Loss: 1.8992, Perplexity: 6.6804caption shape:  torch.Size([64, 20])
Epoch [3/3], Step [5809/6471], Loss: 2.6311, Perplexity: 13.8893caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [5810/6471], Loss: 2.2532, Perplexity: 9.5183caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5811/6471], Loss: 1.9673, Perplexity: 7.1515caption shape:  torch.Size([64, 14])

Epoch [3/3], Step [5884/6471], Loss: 2.2334, Perplexity: 9.3312caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5885/6471], Loss: 1.8965, Perplexity: 6.6625caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5886/6471], Loss: 2.0458, Perplexity: 7.7354caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5887/6471], Loss: 1.8589, Perplexity: 6.4170caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5888/6471], Loss: 1.8741, Perplexity: 6.5150caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5889/6471], Loss: 1.9332, Perplexity: 6.9119caption shape:  torch.Size([64, 17])
Epoch [3/3], Step [5890/6471], Loss: 2.3097, Perplexity: 10.0719caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5891/6471], Loss: 1.8284, Perplexity: 6.2236caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5892/6471], Loss: 1.9540, Perplexity: 7.0570caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [5893/6471], Loss: 1.7104, Perplexity: 5.5311caption shape:  torch.Size([64, 11])

Epoch [3/3], Step [5966/6471], Loss: 1.7538, Perplexity: 5.7765caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [5967/6471], Loss: 2.1157, Perplexity: 8.2951caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5968/6471], Loss: 1.8126, Perplexity: 6.1264caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5969/6471], Loss: 1.8624, Perplexity: 6.4394caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5970/6471], Loss: 1.8536, Perplexity: 6.3829caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5971/6471], Loss: 1.9440, Perplexity: 6.9865caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [5972/6471], Loss: 1.9490, Perplexity: 7.0219caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [5973/6471], Loss: 1.9654, Perplexity: 7.1380caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [5974/6471], Loss: 2.0508, Perplexity: 7.7744caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [5975/6471], Loss: 1.9620, Perplexity: 7.1138caption shape:  torch.Size([64, 13])


Epoch [3/3], Step [6048/6471], Loss: 2.0843, Perplexity: 8.0391caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6049/6471], Loss: 1.8534, Perplexity: 6.3812caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6050/6471], Loss: 2.0402, Perplexity: 7.6925caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6051/6471], Loss: 1.9248, Perplexity: 6.8539caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6052/6471], Loss: 1.9504, Perplexity: 7.0314caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6053/6471], Loss: 1.8060, Perplexity: 6.0863caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6054/6471], Loss: 2.0043, Perplexity: 7.4212caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6055/6471], Loss: 1.9577, Perplexity: 7.0829caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6056/6471], Loss: 1.7873, Perplexity: 5.9733caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6057/6471], Loss: 1.8332, Perplexity: 6.2538caption shape:  torch.Size([64, 11])


Epoch [3/3], Step [6130/6471], Loss: 1.8144, Perplexity: 6.1373caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6131/6471], Loss: 1.9749, Perplexity: 7.2057caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6132/6471], Loss: 1.8550, Perplexity: 6.3915caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6133/6471], Loss: 1.8855, Perplexity: 6.5894caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6134/6471], Loss: 1.7186, Perplexity: 5.5768caption shape:  torch.Size([64, 10])
Epoch [3/3], Step [6135/6471], Loss: 2.0875, Perplexity: 8.0644caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [6136/6471], Loss: 2.1373, Perplexity: 8.4765caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6137/6471], Loss: 1.6897, Perplexity: 5.4180caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6138/6471], Loss: 1.9496, Perplexity: 7.0256caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6139/6471], Loss: 2.0174, Perplexity: 7.5185caption shape:  torch.Size([64, 11])


Epoch [3/3], Step [6212/6471], Loss: 2.2907, Perplexity: 9.8814caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6213/6471], Loss: 2.1070, Perplexity: 8.2235caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6214/6471], Loss: 1.8717, Perplexity: 6.4993caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [6215/6471], Loss: 2.1225, Perplexity: 8.3517caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6216/6471], Loss: 1.8889, Perplexity: 6.6120caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6217/6471], Loss: 1.9167, Perplexity: 6.7988caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6218/6471], Loss: 1.7366, Perplexity: 5.6778caption shape:  torch.Size([64, 9])
Epoch [3/3], Step [6219/6471], Loss: 2.3197, Perplexity: 10.1730caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6220/6471], Loss: 1.9355, Perplexity: 6.9278caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6221/6471], Loss: 1.9038, Perplexity: 6.7112caption shape:  torch.Size([64, 13])


Epoch [3/3], Step [6294/6471], Loss: 1.9416, Perplexity: 6.9696caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [6295/6471], Loss: 2.1710, Perplexity: 8.7672caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6296/6471], Loss: 1.7418, Perplexity: 5.7074caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6297/6471], Loss: 1.9416, Perplexity: 6.9698caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6298/6471], Loss: 1.9422, Perplexity: 6.9742caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6299/6471], Loss: 1.9971, Perplexity: 7.3677caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6300/6471], Loss: 1.9511, Perplexity: 7.0362
caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [6301/6471], Loss: 2.1092, Perplexity: 8.2416caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6302/6471], Loss: 1.8828, Perplexity: 6.5721caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6303/6471], Loss: 1.8662, Perplexity: 6.4639caption shape:  torch.Size([64, 17])

Epoch [3/3], Step [6376/6471], Loss: 1.9298, Perplexity: 6.8879caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [6377/6471], Loss: 2.0689, Perplexity: 7.9158caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6378/6471], Loss: 1.8200, Perplexity: 6.1722caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [6379/6471], Loss: 1.9405, Perplexity: 6.9625caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6380/6471], Loss: 1.9330, Perplexity: 6.9105caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6381/6471], Loss: 1.8857, Perplexity: 6.5912caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6382/6471], Loss: 1.9144, Perplexity: 6.7829caption shape:  torch.Size([64, 13])
Epoch [3/3], Step [6383/6471], Loss: 2.0318, Perplexity: 7.6276caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6384/6471], Loss: 1.8964, Perplexity: 6.6621caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6385/6471], Loss: 1.8553, Perplexity: 6.3934caption shape:  torch.Size([64, 16])


Epoch [3/3], Step [6458/6471], Loss: 1.8811, Perplexity: 6.5605caption shape:  torch.Size([64, 16])
Epoch [3/3], Step [6459/6471], Loss: 2.0200, Perplexity: 7.5385caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6460/6471], Loss: 1.9455, Perplexity: 6.9972caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6461/6471], Loss: 1.8205, Perplexity: 6.1749caption shape:  torch.Size([64, 15])
Epoch [3/3], Step [6462/6471], Loss: 1.9368, Perplexity: 6.9364caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6463/6471], Loss: 1.9170, Perplexity: 6.8007caption shape:  torch.Size([64, 11])
Epoch [3/3], Step [6464/6471], Loss: 1.8536, Perplexity: 6.3828caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6465/6471], Loss: 1.8774, Perplexity: 6.5364caption shape:  torch.Size([64, 12])
Epoch [3/3], Step [6466/6471], Loss: 1.8236, Perplexity: 6.1943caption shape:  torch.Size([64, 14])
Epoch [3/3], Step [6467/6471], Loss: 1.9749, Perplexity: 7.2057caption shape:  torch.Size([64, 10])


<a id='step3'></a>
## Step 3: (Optional) Validate your Model

To check for potential overfitting, one approach is to measure performance on a validation set.  If you decide to do this **optional** task, you must first complete all of the steps in the next notebook in the sequence (**3_Inference.ipynb**); as part of that notebook, you will write and test code (specifically, the `sample` method in the `DecoderRNN` class) that uses your RNN decoder to generate captions.  That code will prove very useful here.

If you decide to validate your model, please do not edit the data loader in **data_loader.py**.  Instead, create a new file named **data_loader_val.py** containing the code for obtaining the data loader for the validation data.  You can access:
- the validation images at filepath `'/opt/cocoapi/images/val2014/'`, and
- the validation image caption annotation file at filepath `'/opt/cocoapi/annotations/captions_val2014.json'`.
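
As a starting point for **data_loader_val.py**, the sketch below reads a COCO-style caption annotation file using only the standard library and groups the reference captions by image.  The function name `load_val_references` is hypothetical; the top-level `images` and `annotations` keys follow the COCO caption annotation format.  A full validation data loader would also need to load and transform each image, which is not shown here.

```python
import json
from collections import defaultdict


def load_val_references(annotation_path):
    """Return ({image_id: file_name}, {image_id: [captions]}) from a COCO caption file."""
    with open(annotation_path, "r") as f:
        data = json.load(f)

    # Map each image id to its file name.
    file_names = {img["id"]: img["file_name"] for img in data["images"]}

    # Collect every reference caption for each image.
    references = defaultdict(list)
    for ann in data["annotations"]:
        references[ann["image_id"]].append(ann["caption"].strip())

    return file_names, dict(references)
```

For example, `file_names, references = load_val_references('/opt/cocoapi/annotations/captions_val2014.json')` would give you the file name and the list of human-written captions for each validation image.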

The suggested approach to validating your model involves creating a JSON file such as [this one](https://github.com/cocodataset/cocoapi/blob/master/results/captions_val2014_fakecap_results.json) containing your model's predicted captions for the validation images.  Then, you can write your own script or use one that you [find online](https://github.com/tylin/coco-caption) to calculate the BLEU score of your model.  You can read more about the BLEU score, along with other evaluation metrics (such as METEOR and CIDEr), in section 4.1 of [this paper](https://arxiv.org/pdf/1411.4555.pdf).  For more information about how to use the annotation file, check out the [website](http://cocodataset.org/#download) for the COCO dataset.
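
One possible way to carry this out is sketched below.  `save_results` writes predictions as a list of `{"image_id", "caption"}` records, matching the example results file linked above.  `bleu1` is a simplified, hypothetical stand-in for a full BLEU implementation: it computes only clipped unigram precision with a brevity penalty for a single caption, whereas the standard metric combines 1- to 4-gram precisions over the whole corpus.  For scores you intend to report, an established evaluation script is the safer choice.

```python
import json
import math
from collections import Counter


def save_results(predictions, path):
    """Write {image_id: caption} predictions in the COCO caption results format."""
    results = [{"image_id": int(i), "caption": c} for i, c in predictions.items()]
    with open(path, "w") as f:
        json.dump(results, f)


def bleu1(candidate, references):
    """Simplified BLEU-1: clipped unigram precision times a brevity penalty."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    # Clip each candidate word count by its maximum count in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word]) for word, count in Counter(cand).items())
    precision = clipped / len(cand)

    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((len(r) for r in refs), key=lambda n: (abs(n - len(cand)), n))
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / len(cand))
    return bp * precision
```

After generating one caption per validation image with your `sample` method, you could collect them in a dictionary keyed by image id, pass it to `save_results`, and average `bleu1` over the images as a rough sanity check before running a complete evaluation.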