# Computer Vision Nanodegree

## Project: Image Captioning

---

In this notebook, you will train your model using the files

Feel free to use the links below to navigate the notebook:
- [Step 1](#step1): Training Setup
- [Step 2](#step2): Train your Model!

<a id='step1'></a>
## Step 1: Training Setup

Do NOT change the lines of code that are not preceded by a TODO statement.

While working on this notebook, you are strongly encouraged to keep `transform_train` at the provided value.  However, when training the model in the next notebook in this sequence, you are welcome (and encouraged!) to tinker with the 

Note that the images in the dataset all have variable height and width, and so your tensor must 


In [1]:
import torch
import torch.nn as nn
from torch.autograd import Variable
from torchvision import transforms
from data_loader import get_loader
from model import EncoderCNN, DecoderRNN
import math

def to_var(x, volatile=False):
    """Convert a PyTorch tensor to a Variable, moving it to the GPU if CUDA is available."""
    if torch.cuda.is_available():
        x = x.cuda()
    return Variable(x, volatile=volatile)

## TODO: Select appropriate values for the Python variables below.
batch_size = 128
vocab_threshold = 4
embed_size = 256
hidden_size = 512
num_epochs = 5
save_every = 1

# TODO: Amend the image transform below.
transform_train = transforms.Compose([ 
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(), 
    transforms.ToTensor(), 
    transforms.Normalize((0.485, 0.456, 0.406), 
                         (0.229, 0.224, 0.225))])

# Build data loader.
data_loader = get_loader(transform=transform_train,
                         mode='train',
                         batch_size=batch_size)

# The size of the vocabulary.
vocab_size = len(data_loader.dataset.vocab)

# Initialize the encoder and decoder. 
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

# Move models to GPU if CUDA is available. 
if torch.cuda.is_available():
    encoder.cuda()
    decoder.cuda()

# Define the loss function. 
criterion = nn.CrossEntropyLoss().cuda() if torch.cuda.is_available() else nn.CrossEntropyLoss()

# TODO: Specify the learnable parameters of the model.
params = list(decoder.parameters()) + list(encoder.linear.parameters()) 

# TODO: Define the optimizer.
optimizer = torch.optim.Adam(params=params, lr=0.001)

# Set the total number of training steps per epoch.
total_step = math.ceil(len(data_loader.dataset.caption_lengths) / data_loader.batch_sampler.batch_size)

Vocabulary successfully loaded from vocab.pkl file!
loading annotations into memory...
Done (t=0.51s)
creating index...



index created!
Obtaining caption lengths...


100%|██████████| 414113/414113 [00:41<00:00, 10007.18it/s]


<a id='step2'></a>
## Step 2: Train your Model!
- add options for validation
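The training loop in this step prints both the loss and its perplexity. Since `nn.CrossEntropyLoss` reports the mean negative log-likelihood using the natural logarithm, perplexity is simply the exponential of that loss. A minimal sketch of the relationship, using a few loss values taken from the log in this notebook (`perplexity` is a hypothetical helper name):

```python
import math

def perplexity(cross_entropy_loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(cross_entropy_loss)

# Loss values copied from the training log; the printed perplexities should
# agree with the log up to the rounding of the loss to four decimals.
for loss in (9.2141, 4.9650, 2.0077):
    print('Loss: %.4f, Perplexity: %.4f' % (loss, perplexity(loss)))
```

A perplexity of about 8 can be read as the model being, on average, as uncertain as if it were choosing uniformly among roughly 8 words at each position.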

In [None]:
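# (Optional) TODO: Load pre-trained weights before resuming training.
# `encoder_file` and `decoder_file` must first be set to the names of
# checkpoint files stored in ./models.
import os

encoder.load_state_dict(torch.load(os.path.join('./models', encoder_file)))
decoder.load_state_dict(torch.load(os.path.join('./models', decoder_file)))

# Note: the training loop numbers its checkpoints from epoch 1 on every run,
# so a resumed run overwrites existing files with the same names.
# TODO: clarify the intended behaviour when resuming training.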
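Because the training loop saves weights as `decoder-<epoch>.pkl` and `encoder-<epoch>.pkl` in `./models`, one way to decide where to resume is to look for the highest epoch number already on disk. The sketch below is only an illustration under that naming assumption; `latest_checkpoint_epoch` and `start_epoch` are hypothetical names, not part of the project code.

```python
import os
import re

def latest_checkpoint_epoch(model_dir='./models', prefix='decoder'):
    """Return the highest epoch among files named '<prefix>-<epoch>.pkl', or None."""
    pattern = re.compile(r'^%s-(\d+)\.pkl$' % re.escape(prefix))
    epochs = []
    if os.path.isdir(model_dir):
        for name in os.listdir(model_dir):
            match = pattern.match(name)
            if match:
                epochs.append(int(match.group(1)))
    return max(epochs) if epochs else None

# Fall back to 0 when no checkpoint exists yet (hypothetical usage).
start_epoch = latest_checkpoint_epoch() or 0
print('Resuming after epoch', start_epoch)
```

Offsetting the saved file names by `start_epoch` in a resumed run would keep earlier checkpoints from being overwritten.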

In [2]:
import torch.utils.data as data
import numpy as np
import os

for epoch in range(num_epochs):
    
    for i_step in range(0, total_step):
        
        # Randomly sample a caption length, and sample indices with that length.
        indices = data_loader.dataset.get_train_indices()
        # Create and assign a batch sampler to retrieve a batch with the sampled indices.
        new_sampler = data.sampler.SubsetRandomSampler(indices=indices)
        data_loader.batch_sampler.sampler = new_sampler

        # Obtain the batch.
        for batch in data_loader:
            images, captions = batch[0], batch[1]
            break 
        
        # Convert the batch of images and captions to PyTorch Variables.
        images = to_var(images, volatile=True)
        captions = to_var(captions)
        
        # Zero the gradients.
        decoder.zero_grad()
        encoder.zero_grad()
        
        # Pass the inputs through the CNN-RNN model.
        features = encoder(images)
        outputs = decoder(features, captions)
        
        # Calculate the batch loss.
        loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
        
        # Backward pass.
        loss.backward()
        
        # Update the parameters in the optimizer.
        optimizer.step()
            
        # Print training statistics (loss.item() needs PyTorch 0.4+; it replaces the older loss.data[0]).
        print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %5.4f'
            %(epoch+1, num_epochs, i_step, total_step, loss.item(), np.exp(loss.item())))
            
    # Save the weights.
    if epoch % save_every == 0:
        torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d.pkl' %(epoch+1)))
        torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d.pkl' %(epoch+1)))

Epoch [1/5], Step [0/3236], Loss: 9.2141, Perplexity: 10037.4266
Epoch [1/5], Step [1/3236], Loss: 9.0689, Perplexity: 8681.1117
Epoch [1/5], Step [2/3236], Loss: 8.8996, Perplexity: 7329.0346
Epoch [1/5], Step [3/3236], Loss: 8.5955, Perplexity: 5407.0760
Epoch [1/5], Step [4/3236], Loss: 8.2678, Perplexity: 3896.2536
Epoch [1/5], Step [5/3236], Loss: 7.6695, Perplexity: 2141.9108
Epoch [1/5], Step [6/3236], Loss: 6.9323, Perplexity: 1024.8324
Epoch [1/5], Step [7/3236], Loss: 5.9402, Perplexity: 380.0042
Epoch [1/5], Step [8/3236], Loss: 5.5904, Perplexity: 267.8499
Epoch [1/5], Step [9/3236], Loss: 5.2861, Perplexity: 197.5807
Epoch [1/5], Step [10/3236], Loss: 4.9650, Perplexity: 143.3015
Epoch [1/5], Step [11/3236], Loss: 4.8474, Perplexity: 127.4116
Epoch [1/5], Step [12/3236], Loss: 4.9447, Perplexity: 140.4223
Epoch [1/5], Step [13/3236], Loss: 4.7437, Perplexity: 114.8541
Epoch [1/5], Step [14/3236], Loss: 4.9061, Perplexity: 135.1164
Epoch [1/5], Step [15/3236], Loss: 4.6917, ...

Epoch [1/5], Step [130/3236], Loss: 4.2877, Perplexity: 72.7962
Epoch [1/5], Step [131/3236], Loss: 3.7741, Perplexity: 43.5602
Epoch [1/5], Step [132/3236], Loss: 4.1360, Perplexity: 62.5544
Epoch [1/5], Step [133/3236], Loss: 3.7762, Perplexity: 43.6491
Epoch [1/5], Step [134/3236], Loss: 3.7966, Perplexity: 44.5503
Epoch [1/5], Step [135/3236], Loss: 3.6764, Perplexity: 39.5042
Epoch [1/5], Step [136/3236], Loss: 3.4023, Perplexity: 30.0344
Epoch [1/5], Step [137/3236], Loss: 3.6494, Perplexity: 38.4501
Epoch [1/5], Step [138/3236], Loss: 3.6852, Perplexity: 39.8531
Epoch [1/5], Step [139/3236], Loss: 3.7985, Perplexity: 44.6355
Epoch [1/5], Step [140/3236], Loss: 3.6451, Perplexity: 38.2859
Epoch [1/5], Step [141/3236], Loss: 3.6183, Perplexity: 37.2752
Epoch [1/5], Step [142/3236], Loss: 3.4733, Perplexity: 32.2427
Epoch [1/5], Step [143/3236], Loss: 3.5757, Perplexity: 35.7195
Epoch [1/5], Step [144/3236], Loss: 3.4761, Perplexity: 32.3336
...

Epoch [1/5], Step [259/3236], Loss: 3.2303, Perplexity: 25.2866
Epoch [1/5], Step [260/3236], Loss: 3.7294, Perplexity: 41.6536
Epoch [1/5], Step [261/3236], Loss: 4.0398, Perplexity: 56.8131
Epoch [1/5], Step [262/3236], Loss: 3.3441, Perplexity: 28.3337
Epoch [1/5], Step [263/3236], Loss: 3.3499, Perplexity: 28.4999
Epoch [1/5], Step [264/3236], Loss: 3.6971, Perplexity: 40.3292
Epoch [1/5], Step [265/3236], Loss: 3.9976, Perplexity: 54.4698
Epoch [1/5], Step [266/3236], Loss: 3.2552, Perplexity: 25.9250
Epoch [1/5], Step [267/3236], Loss: 3.2078, Perplexity: 24.7251
Epoch [1/5], Step [268/3236], Loss: 3.4028, Perplexity: 30.0482
Epoch [1/5], Step [269/3236], Loss: 3.3819, Perplexity: 29.4252
Epoch [1/5], Step [270/3236], Loss: 3.2454, Perplexity: 25.6718
Epoch [1/5], Step [271/3236], Loss: 3.5883, Perplexity: 36.1742
Epoch [1/5], Step [272/3236], Loss: 3.2967, Perplexity: 27.0224
Epoch [1/5], Step [273/3236], Loss: 3.6120, Perplexity: 37.0394
...

Epoch [1/5], Step [388/3236], Loss: 3.0887, Perplexity: 21.9484
Epoch [1/5], Step [389/3236], Loss: 3.9378, Perplexity: 51.3037
Epoch [1/5], Step [390/3236], Loss: 3.0207, Perplexity: 20.5067
Epoch [1/5], Step [391/3236], Loss: 3.6041, Perplexity: 36.7468
Epoch [1/5], Step [392/3236], Loss: 3.6262, Perplexity: 37.5682
Epoch [1/5], Step [393/3236], Loss: 3.0826, Perplexity: 21.8156
Epoch [1/5], Step [394/3236], Loss: 3.0587, Perplexity: 21.3003
Epoch [1/5], Step [395/3236], Loss: 3.0145, Perplexity: 20.3786
Epoch [1/5], Step [396/3236], Loss: 3.1673, Perplexity: 23.7428
Epoch [1/5], Step [397/3236], Loss: 3.0048, Perplexity: 20.1818
Epoch [1/5], Step [398/3236], Loss: 3.3477, Perplexity: 28.4385
Epoch [1/5], Step [399/3236], Loss: 3.7250, Perplexity: 41.4713
Epoch [1/5], Step [400/3236], Loss: 3.0737, Perplexity: 21.6221
Epoch [1/5], Step [401/3236], Loss: 3.1113, Perplexity: 22.4508
Epoch [1/5], Step [402/3236], Loss: 3.1290, Perplexity: 22.8502
...

Epoch [1/5], Step [517/3236], Loss: 3.5293, Perplexity: 34.1016
Epoch [1/5], Step [518/3236], Loss: 3.1424, Perplexity: 23.1605
Epoch [1/5], Step [519/3236], Loss: 2.9854, Perplexity: 19.7938
Epoch [1/5], Step [520/3236], Loss: 2.9627, Perplexity: 19.3500
Epoch [1/5], Step [521/3236], Loss: 3.2080, Perplexity: 24.7289
Epoch [1/5], Step [522/3236], Loss: 3.0343, Perplexity: 20.7855
Epoch [1/5], Step [523/3236], Loss: 2.9956, Perplexity: 19.9978
Epoch [1/5], Step [524/3236], Loss: 2.9797, Perplexity: 19.6822
Epoch [1/5], Step [525/3236], Loss: 2.9634, Perplexity: 19.3644
Epoch [1/5], Step [526/3236], Loss: 2.8965, Perplexity: 18.1113
Epoch [1/5], Step [527/3236], Loss: 2.9298, Perplexity: 18.7231
Epoch [1/5], Step [528/3236], Loss: 3.1983, Perplexity: 24.4914
Epoch [1/5], Step [529/3236], Loss: 2.9415, Perplexity: 18.9447
Epoch [1/5], Step [530/3236], Loss: 2.8496, Perplexity: 17.2809
Epoch [1/5], Step [531/3236], Loss: 3.4467, Perplexity: 31.3968
...

Epoch [1/5], Step [646/3236], Loss: 3.6254, Perplexity: 37.5409
Epoch [1/5], Step [647/3236], Loss: 3.2185, Perplexity: 24.9914
Epoch [1/5], Step [648/3236], Loss: 3.0610, Perplexity: 21.3490
Epoch [1/5], Step [649/3236], Loss: 2.9763, Perplexity: 19.6156
Epoch [1/5], Step [650/3236], Loss: 2.8242, Perplexity: 16.8473
Epoch [1/5], Step [651/3236], Loss: 2.8025, Perplexity: 16.4861
Epoch [1/5], Step [652/3236], Loss: 2.9638, Perplexity: 19.3713
Epoch [1/5], Step [653/3236], Loss: 2.8627, Perplexity: 17.5083
Epoch [1/5], Step [654/3236], Loss: 3.0339, Perplexity: 20.7771
Epoch [1/5], Step [655/3236], Loss: 2.8928, Perplexity: 18.0431
Epoch [1/5], Step [656/3236], Loss: 2.9835, Perplexity: 19.7571
Epoch [1/5], Step [657/3236], Loss: 2.9893, Perplexity: 19.8710
Epoch [1/5], Step [658/3236], Loss: 2.8419, Perplexity: 17.1482
Epoch [1/5], Step [659/3236], Loss: 2.9880, Perplexity: 19.8457
Epoch [1/5], Step [660/3236], Loss: 2.8363, Perplexity: 17.0533
...

Epoch [1/5], Step [775/3236], Loss: 3.0453, Perplexity: 21.0170
Epoch [1/5], Step [776/3236], Loss: 2.8817, Perplexity: 17.8447
Epoch [1/5], Step [777/3236], Loss: 2.8078, Perplexity: 16.5735
Epoch [1/5], Step [778/3236], Loss: 2.7317, Perplexity: 15.3589
Epoch [1/5], Step [779/3236], Loss: 2.8323, Perplexity: 16.9850
Epoch [1/5], Step [780/3236], Loss: 2.7895, Perplexity: 16.2723
Epoch [1/5], Step [781/3236], Loss: 2.9615, Perplexity: 19.3278
Epoch [1/5], Step [782/3236], Loss: 2.8582, Perplexity: 17.4303
Epoch [1/5], Step [783/3236], Loss: 2.8080, Perplexity: 16.5767
Epoch [1/5], Step [784/3236], Loss: 2.7097, Perplexity: 15.0242
Epoch [1/5], Step [785/3236], Loss: 2.9198, Perplexity: 18.5367
Epoch [1/5], Step [786/3236], Loss: 2.8724, Perplexity: 17.6798
Epoch [1/5], Step [787/3236], Loss: 2.6820, Perplexity: 14.6141
Epoch [1/5], Step [788/3236], Loss: 2.9656, Perplexity: 19.4066
Epoch [1/5], Step [789/3236], Loss: 2.9494, Perplexity: 19.0952
...

Epoch [1/5], Step [904/3236], Loss: 2.6833, Perplexity: 14.6331
Epoch [1/5], Step [905/3236], Loss: 2.8073, Perplexity: 16.5646
Epoch [1/5], Step [906/3236], Loss: 2.6412, Perplexity: 14.0296
Epoch [1/5], Step [907/3236], Loss: 2.7724, Perplexity: 15.9967
Epoch [1/5], Step [908/3236], Loss: 3.0155, Perplexity: 20.3998
Epoch [1/5], Step [909/3236], Loss: 2.8430, Perplexity: 17.1668
Epoch [1/5], Step [910/3236], Loss: 2.6560, Perplexity: 14.2386
Epoch [1/5], Step [911/3236], Loss: 3.4367, Perplexity: 31.0832
Epoch [1/5], Step [912/3236], Loss: 2.5935, Perplexity: 13.3769
Epoch [1/5], Step [913/3236], Loss: 3.2184, Perplexity: 24.9892
Epoch [1/5], Step [914/3236], Loss: 2.8175, Perplexity: 16.7351
Epoch [1/5], Step [915/3236], Loss: 2.9693, Perplexity: 19.4782
Epoch [1/5], Step [916/3236], Loss: 2.6715, Perplexity: 14.4624
Epoch [1/5], Step [917/3236], Loss: 2.6171, Perplexity: 13.6962
Epoch [1/5], Step [918/3236], Loss: 2.7152, Perplexity: 15.1080
...

Epoch [1/5], Step [1032/3236], Loss: 2.6051, Perplexity: 13.5326
Epoch [1/5], Step [1033/3236], Loss: 2.6948, Perplexity: 14.8023
Epoch [1/5], Step [1034/3236], Loss: 2.5587, Perplexity: 12.9186
Epoch [1/5], Step [1035/3236], Loss: 2.9682, Perplexity: 19.4567
Epoch [1/5], Step [1036/3236], Loss: 2.5195, Perplexity: 12.4225
Epoch [1/5], Step [1037/3236], Loss: 2.5178, Perplexity: 12.4011
Epoch [1/5], Step [1038/3236], Loss: 2.6789, Perplexity: 14.5694
Epoch [1/5], Step [1039/3236], Loss: 2.7445, Perplexity: 15.5571
Epoch [1/5], Step [1040/3236], Loss: 2.5699, Perplexity: 13.0639
Epoch [1/5], Step [1041/3236], Loss: 2.7271, Perplexity: 15.2890
Epoch [1/5], Step [1042/3236], Loss: 2.7150, Perplexity: 15.1047
Epoch [1/5], Step [1043/3236], Loss: 2.3911, Perplexity: 10.9259
Epoch [1/5], Step [1044/3236], Loss: 2.7789, Perplexity: 16.1014
Epoch [1/5], Step [1045/3236], Loss: 2.4716, Perplexity: 11.8409
Epoch [1/5], Step [1046/3236], Loss: 2.5136, Perplexity: 12.3497
...

Epoch [1/5], Step [1159/3236], Loss: 2.8513, Perplexity: 17.3099
Epoch [1/5], Step [1160/3236], Loss: 2.5210, Perplexity: 12.4415
Epoch [1/5], Step [1161/3236], Loss: 2.7417, Perplexity: 15.5131
Epoch [1/5], Step [1162/3236], Loss: 2.4423, Perplexity: 11.4992
Epoch [1/5], Step [1163/3236], Loss: 2.5143, Perplexity: 12.3577
Epoch [1/5], Step [1164/3236], Loss: 2.8475, Perplexity: 17.2446
Epoch [1/5], Step [1165/3236], Loss: 2.4592, Perplexity: 11.6959
Epoch [1/5], Step [1166/3236], Loss: 3.1257, Perplexity: 22.7748
Epoch [1/5], Step [1167/3236], Loss: 2.5889, Perplexity: 13.3149
Epoch [1/5], Step [1168/3236], Loss: 2.4652, Perplexity: 11.7658
Epoch [1/5], Step [1169/3236], Loss: 2.7040, Perplexity: 14.9392
Epoch [1/5], Step [1170/3236], Loss: 2.4325, Perplexity: 11.3875
Epoch [1/5], Step [1171/3236], Loss: 2.8851, Perplexity: 17.9061
Epoch [1/5], Step [1172/3236], Loss: 2.7195, Perplexity: 15.1733
Epoch [1/5], Step [1173/3236], Loss: 2.6235, Perplexity: 13.7840
...

Epoch [1/5], Step [1286/3236], Loss: 2.4878, Perplexity: 12.0350
Epoch [1/5], Step [1287/3236], Loss: 2.8613, Perplexity: 17.4836
Epoch [1/5], Step [1288/3236], Loss: 2.3981, Perplexity: 11.0023
Epoch [1/5], Step [1289/3236], Loss: 2.6805, Perplexity: 14.5917
Epoch [1/5], Step [1290/3236], Loss: 2.5883, Perplexity: 13.3069
Epoch [1/5], Step [1291/3236], Loss: 2.4854, Perplexity: 12.0059
Epoch [1/5], Step [1292/3236], Loss: 2.5856, Perplexity: 13.2715
Epoch [1/5], Step [1293/3236], Loss: 2.6054, Perplexity: 13.5361
Epoch [1/5], Step [1294/3236], Loss: 2.4671, Perplexity: 11.7885
Epoch [1/5], Step [1295/3236], Loss: 2.9947, Perplexity: 19.9804
Epoch [1/5], Step [1296/3236], Loss: 2.6579, Perplexity: 14.2659
Epoch [1/5], Step [1297/3236], Loss: 2.5382, Perplexity: 12.6570
Epoch [1/5], Step [1298/3236], Loss: 2.6468, Perplexity: 14.1087
Epoch [1/5], Step [1299/3236], Loss: 2.6657, Perplexity: 14.3786
Epoch [1/5], Step [1300/3236], Loss: 2.4528, Perplexity: 11.6209
...

Epoch [1/5], Step [1413/3236], Loss: 2.3777, Perplexity: 10.7804
Epoch [1/5], Step [1414/3236], Loss: 2.6212, Perplexity: 13.7517
Epoch [1/5], Step [1415/3236], Loss: 2.4409, Perplexity: 11.4839
Epoch [1/5], Step [1416/3236], Loss: 2.5557, Perplexity: 12.8800
Epoch [1/5], Step [1417/3236], Loss: 2.3463, Perplexity: 10.4473
Epoch [1/5], Step [1418/3236], Loss: 2.5643, Perplexity: 12.9921
Epoch [1/5], Step [1419/3236], Loss: 2.3621, Perplexity: 10.6134
Epoch [1/5], Step [1420/3236], Loss: 2.7172, Perplexity: 15.1374
Epoch [1/5], Step [1421/3236], Loss: 2.4776, Perplexity: 11.9128
Epoch [1/5], Step [1422/3236], Loss: 2.2900, Perplexity: 9.8749
Epoch [1/5], Step [1423/3236], Loss: 2.4245, Perplexity: 11.2967
Epoch [1/5], Step [1424/3236], Loss: 2.4353, Perplexity: 11.4196
Epoch [1/5], Step [1425/3236], Loss: 2.3309, Perplexity: 10.2875
Epoch [1/5], Step [1426/3236], Loss: 2.3979, Perplexity: 11.0004
Epoch [1/5], Step [1427/3236], Loss: 2.4755, Perplexity: 11.8874
...

Epoch [1/5], Step [1540/3236], Loss: 2.2931, Perplexity: 9.9053
Epoch [1/5], Step [1541/3236], Loss: 2.2472, Perplexity: 9.4610
Epoch [1/5], Step [1542/3236], Loss: 2.3877, Perplexity: 10.8879
Epoch [1/5], Step [1543/3236], Loss: 2.3161, Perplexity: 10.1359
Epoch [1/5], Step [1544/3236], Loss: 2.5099, Perplexity: 12.3043
Epoch [1/5], Step [1545/3236], Loss: 2.5653, Perplexity: 13.0045
Epoch [1/5], Step [1546/3236], Loss: 2.2892, Perplexity: 9.8670
Epoch [1/5], Step [1547/3236], Loss: 2.5276, Perplexity: 12.5239
Epoch [1/5], Step [1548/3236], Loss: 2.5185, Perplexity: 12.4099
Epoch [1/5], Step [1549/3236], Loss: 2.4158, Perplexity: 11.1992
Epoch [1/5], Step [1550/3236], Loss: 2.4459, Perplexity: 11.5405
Epoch [1/5], Step [1551/3236], Loss: 2.8189, Perplexity: 16.7583
Epoch [1/5], Step [1552/3236], Loss: 2.6681, Perplexity: 14.4123
Epoch [1/5], Step [1553/3236], Loss: 2.5469, Perplexity: 12.7674
Epoch [1/5], Step [1554/3236], Loss: 2.3729, Perplexity: 10.7288
...

Epoch [1/5], Step [1667/3236], Loss: 2.6315, Perplexity: 13.8952
Epoch [1/5], Step [1668/3236], Loss: 2.4916, Perplexity: 12.0802
Epoch [1/5], Step [1669/3236], Loss: 2.3088, Perplexity: 10.0620
Epoch [1/5], Step [1670/3236], Loss: 2.3024, Perplexity: 9.9984
Epoch [1/5], Step [1671/3236], Loss: 2.4531, Perplexity: 11.6248
Epoch [1/5], Step [1672/3236], Loss: 2.4403, Perplexity: 11.4764
Epoch [1/5], Step [1673/3236], Loss: 2.4949, Perplexity: 12.1211
Epoch [1/5], Step [1674/3236], Loss: 2.2933, Perplexity: 9.9080
Epoch [1/5], Step [1675/3236], Loss: 2.4130, Perplexity: 11.1669
Epoch [1/5], Step [1676/3236], Loss: 2.5163, Perplexity: 12.3832
Epoch [1/5], Step [1677/3236], Loss: 2.3377, Perplexity: 10.3573
Epoch [1/5], Step [1678/3236], Loss: 2.2721, Perplexity: 9.6995
Epoch [1/5], Step [1679/3236], Loss: 2.4515, Perplexity: 11.6062
Epoch [1/5], Step [1680/3236], Loss: 3.1657, Perplexity: 23.7050
Epoch [1/5], Step [1681/3236], Loss: 2.3140, Perplexity: 10.1152
...

Epoch [1/5], Step [1794/3236], Loss: 2.3605, Perplexity: 10.5960
Epoch [1/5], Step [1795/3236], Loss: 2.3516, Perplexity: 10.5024
Epoch [1/5], Step [1796/3236], Loss: 2.4482, Perplexity: 11.5679
Epoch [1/5], Step [1797/3236], Loss: 2.3048, Perplexity: 10.0220
Epoch [1/5], Step [1798/3236], Loss: 2.4718, Perplexity: 11.8433
Epoch [1/5], Step [1799/3236], Loss: 2.5394, Perplexity: 12.6726
Epoch [1/5], Step [1800/3236], Loss: 2.6182, Perplexity: 13.7111
Epoch [1/5], Step [1801/3236], Loss: 2.6375, Perplexity: 13.9776
Epoch [1/5], Step [1802/3236], Loss: 2.5181, Perplexity: 12.4049
Epoch [1/5], Step [1803/3236], Loss: 2.3649, Perplexity: 10.6431
Epoch [1/5], Step [1804/3236], Loss: 2.3367, Perplexity: 10.3471
Epoch [1/5], Step [1805/3236], Loss: 2.3598, Perplexity: 10.5884
Epoch [1/5], Step [1806/3236], Loss: 2.4635, Perplexity: 11.7461
Epoch [1/5], Step [1807/3236], Loss: 2.5600, Perplexity: 12.9362
Epoch [1/5], Step [1808/3236], Loss: 2.6139, Perplexity: 13.6519
...

Epoch [1/5], Step [1921/3236], Loss: 2.3320, Perplexity: 10.2985
Epoch [1/5], Step [1922/3236], Loss: 2.4703, Perplexity: 11.8258
Epoch [1/5], Step [1923/3236], Loss: 2.6701, Perplexity: 14.4418
Epoch [1/5], Step [1924/3236], Loss: 2.5627, Perplexity: 12.9703
Epoch [1/5], Step [1925/3236], Loss: 2.6665, Perplexity: 14.3891
Epoch [1/5], Step [1926/3236], Loss: 2.4976, Perplexity: 12.1539
Epoch [1/5], Step [1927/3236], Loss: 2.3285, Perplexity: 10.2629
Epoch [1/5], Step [1928/3236], Loss: 2.2952, Perplexity: 9.9268
Epoch [1/5], Step [1929/3236], Loss: 3.2318, Perplexity: 25.3247
Epoch [1/5], Step [1930/3236], Loss: 3.2593, Perplexity: 26.0324
Epoch [1/5], Step [1931/3236], Loss: 2.2837, Perplexity: 9.8126
Epoch [1/5], Step [1932/3236], Loss: 2.3055, Perplexity: 10.0287
Epoch [1/5], Step [1933/3236], Loss: 2.3398, Perplexity: 10.3792
Epoch [1/5], Step [1934/3236], Loss: 2.3583, Perplexity: 10.5732
Epoch [1/5], Step [1935/3236], Loss: 2.5009, Perplexity: 12.1940
...

Epoch [1/5], Step [2048/3236], Loss: 2.3151, Perplexity: 10.1255
Epoch [1/5], Step [2049/3236], Loss: 2.2840, Perplexity: 9.8160
Epoch [1/5], Step [2050/3236], Loss: 2.6391, Perplexity: 14.0010
Epoch [1/5], Step [2051/3236], Loss: 2.6231, Perplexity: 13.7777
Epoch [1/5], Step [2052/3236], Loss: 2.1678, Perplexity: 8.7393
Epoch [1/5], Step [2053/3236], Loss: 2.1745, Perplexity: 8.7981
Epoch [1/5], Step [2054/3236], Loss: 2.2771, Perplexity: 9.7486
Epoch [1/5], Step [2055/3236], Loss: 2.3522, Perplexity: 10.5081
Epoch [1/5], Step [2056/3236], Loss: 2.6154, Perplexity: 13.6727
Epoch [1/5], Step [2057/3236], Loss: 2.5630, Perplexity: 12.9743
Epoch [1/5], Step [2058/3236], Loss: 2.7522, Perplexity: 15.6766
Epoch [1/5], Step [2059/3236], Loss: 2.3627, Perplexity: 10.6191
Epoch [1/5], Step [2060/3236], Loss: 2.2530, Perplexity: 9.5166
Epoch [1/5], Step [2061/3236], Loss: 2.4239, Perplexity: 11.2895
Epoch [1/5], Step [2062/3236], Loss: 2.2053, Perplexity: 9.0731
...

Epoch [1/5], Step [2175/3236], Loss: 2.3458, Perplexity: 10.4412
Epoch [1/5], Step [2176/3236], Loss: 2.2329, Perplexity: 9.3267
Epoch [1/5], Step [2177/3236], Loss: 2.4673, Perplexity: 11.7904
Epoch [1/5], Step [2178/3236], Loss: 2.3683, Perplexity: 10.6794
Epoch [1/5], Step [2179/3236], Loss: 2.3365, Perplexity: 10.3448
Epoch [1/5], Step [2180/3236], Loss: 2.8349, Perplexity: 17.0294
Epoch [1/5], Step [2181/3236], Loss: 2.8904, Perplexity: 18.0005
Epoch [1/5], Step [2182/3236], Loss: 2.3941, Perplexity: 10.9582
Epoch [1/5], Step [2183/3236], Loss: 2.6570, Perplexity: 14.2534
Epoch [1/5], Step [2184/3236], Loss: 2.2813, Perplexity: 9.7899
Epoch [1/5], Step [2185/3236], Loss: 2.3151, Perplexity: 10.1259
Epoch [1/5], Step [2186/3236], Loss: 2.4310, Perplexity: 11.3701
Epoch [1/5], Step [2187/3236], Loss: 2.2385, Perplexity: 9.3796
Epoch [1/5], Step [2188/3236], Loss: 2.3112, Perplexity: 10.0862
Epoch [1/5], Step [2189/3236], Loss: 2.4411, Perplexity: 11.4862
...

Epoch [1/5], Step [2302/3236], Loss: 2.3436, Perplexity: 10.4187
Epoch [1/5], Step [2303/3236], Loss: 2.2047, Perplexity: 9.0676
Epoch [1/5], Step [2304/3236], Loss: 2.8525, Perplexity: 17.3312
Epoch [1/5], Step [2305/3236], Loss: 2.2918, Perplexity: 9.8924
Epoch [1/5], Step [2306/3236], Loss: 2.2770, Perplexity: 9.7470
Epoch [1/5], Step [2307/3236], Loss: 2.4025, Perplexity: 11.0507
Epoch [1/5], Step [2308/3236], Loss: 2.1855, Perplexity: 8.8948
Epoch [1/5], Step [2309/3236], Loss: 2.0508, Perplexity: 7.7738
Epoch [1/5], Step [2310/3236], Loss: 2.2638, Perplexity: 9.6194
Epoch [1/5], Step [2311/3236], Loss: 2.5791, Perplexity: 13.1853
Epoch [1/5], Step [2312/3236], Loss: 2.2441, Perplexity: 9.4318
Epoch [1/5], Step [2313/3236], Loss: 2.6186, Perplexity: 13.7166
Epoch [1/5], Step [2314/3236], Loss: 2.4073, Perplexity: 11.1039
Epoch [1/5], Step [2315/3236], Loss: 2.4479, Perplexity: 11.5636
Epoch [1/5], Step [2316/3236], Loss: 2.4084, Perplexity: 11.1165
...

Epoch [1/5], Step [2430/3236], Loss: 2.2586, Perplexity: 9.5693
Epoch [1/5], Step [2431/3236], Loss: 2.1757, Perplexity: 8.8084
Epoch [1/5], Step [2432/3236], Loss: 2.3365, Perplexity: 10.3455
Epoch [1/5], Step [2433/3236], Loss: 2.3582, Perplexity: 10.5723
Epoch [1/5], Step [2434/3236], Loss: 2.2354, Perplexity: 9.3502
Epoch [1/5], Step [2435/3236], Loss: 2.1550, Perplexity: 8.6276
Epoch [1/5], Step [2436/3236], Loss: 2.2746, Perplexity: 9.7242
Epoch [1/5], Step [2437/3236], Loss: 2.3147, Perplexity: 10.1223
Epoch [1/5], Step [2438/3236], Loss: 2.2541, Perplexity: 9.5271
Epoch [1/5], Step [2439/3236], Loss: 2.1358, Perplexity: 8.4640
Epoch [1/5], Step [2440/3236], Loss: 2.1422, Perplexity: 8.5181
Epoch [1/5], Step [2441/3236], Loss: 2.0879, Perplexity: 8.0683
Epoch [1/5], Step [2442/3236], Loss: 2.3647, Perplexity: 10.6404
Epoch [1/5], Step [2443/3236], Loss: 2.2110, Perplexity: 9.1248
Epoch [1/5], Step [2444/3236], Loss: 2.3131, Perplexity: 10.1062
...

Epoch [1/5], Step [2558/3236], Loss: 2.5117, Perplexity: 12.3253
Epoch [1/5], Step [2559/3236], Loss: 2.3125, Perplexity: 10.0994
Epoch [1/5], Step [2560/3236], Loss: 2.3556, Perplexity: 10.5449
Epoch [1/5], Step [2561/3236], Loss: 2.5742, Perplexity: 13.1209
Epoch [1/5], Step [2562/3236], Loss: 2.1950, Perplexity: 8.9797
Epoch [1/5], Step [2563/3236], Loss: 2.0214, Perplexity: 7.5488
Epoch [1/5], Step [2564/3236], Loss: 2.1614, Perplexity: 8.6829
Epoch [1/5], Step [2565/3236], Loss: 2.2328, Perplexity: 9.3255
Epoch [1/5], Step [2566/3236], Loss: 2.1419, Perplexity: 8.5157
Epoch [1/5], Step [2567/3236], Loss: 2.4069, Perplexity: 11.0999
Epoch [1/5], Step [2568/3236], Loss: 2.5104, Perplexity: 12.3101
Epoch [1/5], Step [2569/3236], Loss: 2.2392, Perplexity: 9.3855
Epoch [1/5], Step [2570/3236], Loss: 2.1682, Perplexity: 8.7423
Epoch [1/5], Step [2571/3236], Loss: 2.1974, Perplexity: 9.0014
Epoch [1/5], Step [2572/3236], Loss: 2.3462, Perplexity: 10.4461
...

Epoch [1/5], Step [2686/3236], Loss: 2.1490, Perplexity: 8.5767
Epoch [1/5], Step [2687/3236], Loss: 2.6440, Perplexity: 14.0690
Epoch [1/5], Step [2688/3236], Loss: 2.1667, Perplexity: 8.7298
Epoch [1/5], Step [2689/3236], Loss: 2.2516, Perplexity: 9.5029
Epoch [1/5], Step [2690/3236], Loss: 2.1506, Perplexity: 8.5900
Epoch [1/5], Step [2691/3236], Loss: 2.2124, Perplexity: 9.1374
Epoch [1/5], Step [2692/3236], Loss: 2.2939, Perplexity: 9.9137
Epoch [1/5], Step [2693/3236], Loss: 2.6254, Perplexity: 13.8096
Epoch [1/5], Step [2694/3236], Loss: 2.3034, Perplexity: 10.0079
Epoch [1/5], Step [2695/3236], Loss: 2.5289, Perplexity: 12.5399
Epoch [1/5], Step [2696/3236], Loss: 2.2090, Perplexity: 9.1062
Epoch [1/5], Step [2697/3236], Loss: 2.1504, Perplexity: 8.5883
Epoch [1/5], Step [2698/3236], Loss: 2.2316, Perplexity: 9.3146
Epoch [1/5], Step [2699/3236], Loss: 2.2249, Perplexity: 9.2526
Epoch [1/5], Step [2700/3236], Loss: 2.0377, Perplexity: 7.6733
...

Epoch [1/5], Step [2814/3236], Loss: 2.1444, Perplexity: 8.5365
Epoch [1/5], Step [2815/3236], Loss: 3.0733, Perplexity: 21.6141
Epoch [1/5], Step [2816/3236], Loss: 2.5279, Perplexity: 12.5272
Epoch [1/5], Step [2817/3236], Loss: 2.2181, Perplexity: 9.1896
Epoch [1/5], Step [2818/3236], Loss: 2.2086, Perplexity: 9.1033
Epoch [1/5], Step [2819/3236], Loss: 2.3336, Perplexity: 10.3155
Epoch [1/5], Step [2820/3236], Loss: 2.0585, Perplexity: 7.8342
Epoch [1/5], Step [2821/3236], Loss: 2.2749, Perplexity: 9.7266
Epoch [1/5], Step [2822/3236], Loss: 2.7890, Perplexity: 16.2646
Epoch [1/5], Step [2823/3236], Loss: 2.2555, Perplexity: 9.5396
Epoch [1/5], Step [2824/3236], Loss: 2.5476, Perplexity: 12.7761
Epoch [1/5], Step [2825/3236], Loss: 2.4713, Perplexity: 11.8374
Epoch [1/5], Step [2826/3236], Loss: 2.2255, Perplexity: 9.2577
Epoch [1/5], Step [2827/3236], Loss: 2.1295, Perplexity: 8.4110
Epoch [1/5], Step [2828/3236], Loss: 2.1665, Perplexity: 8.7281
...

Epoch [1/5], Step [2942/3236], Loss: 2.2700, Perplexity: 9.6791
Epoch [1/5], Step [2943/3236], Loss: 2.1456, Perplexity: 8.5472
Epoch [1/5], Step [2944/3236], Loss: 2.3527, Perplexity: 10.5138
Epoch [1/5], Step [2945/3236], Loss: 2.4922, Perplexity: 12.0884
Epoch [1/5], Step [2946/3236], Loss: 2.0448, Perplexity: 7.7274
Epoch [1/5], Step [2947/3236], Loss: 2.1388, Perplexity: 8.4890
Epoch [1/5], Step [2948/3236], Loss: 2.2777, Perplexity: 9.7542
Epoch [1/5], Step [2949/3236], Loss: 2.9751, Perplexity: 19.5916
Epoch [1/5], Step [2950/3236], Loss: 2.2078, Perplexity: 9.0952
Epoch [1/5], Step [2951/3236], Loss: 2.3276, Perplexity: 10.2537
Epoch [1/5], Step [2952/3236], Loss: 2.2959, Perplexity: 9.9332
Epoch [1/5], Step [2953/3236], Loss: 2.0284, Perplexity: 7.6018
Epoch [1/5], Step [2954/3236], Loss: 2.2812, Perplexity: 9.7888
Epoch [1/5], Step [2955/3236], Loss: 2.4417, Perplexity: 11.4924
Epoch [1/5], Step [2956/3236], Loss: 2.4785, Perplexity: 11.9232
...

Epoch [1/5], Step [3070/3236], Loss: 2.2125, Perplexity: 9.1386
Epoch [1/5], Step [3071/3236], Loss: 2.1408, Perplexity: 8.5061
Epoch [1/5], Step [3072/3236], Loss: 2.2712, Perplexity: 9.6906
Epoch [1/5], Step [3073/3236], Loss: 2.2338, Perplexity: 9.3357
Epoch [1/5], Step [3074/3236], Loss: 2.3928, Perplexity: 10.9436
Epoch [1/5], Step [3075/3236], Loss: 2.1465, Perplexity: 8.5551
Epoch [1/5], Step [3076/3236], Loss: 2.1444, Perplexity: 8.5373
Epoch [1/5], Step [3077/3236], Loss: 2.1842, Perplexity: 8.8837
Epoch [1/5], Step [3078/3236], Loss: 2.1293, Perplexity: 8.4091
Epoch [1/5], Step [3079/3236], Loss: 2.1553, Perplexity: 8.6305
Epoch [1/5], Step [3080/3236], Loss: 2.4922, Perplexity: 12.0881
Epoch [1/5], Step [3081/3236], Loss: 2.2961, Perplexity: 9.9353
Epoch [1/5], Step [3082/3236], Loss: 2.2250, Perplexity: 9.2534
Epoch [1/5], Step [3083/3236], Loss: 2.2956, Perplexity: 9.9299
Epoch [1/5], Step [3084/3236], Loss: 2.1863, Perplexity: 8.9026
...

Epoch [1/5], Step [3198/3236], Loss: 2.4107, Perplexity: 11.1417
Epoch [1/5], Step [3199/3236], Loss: 2.0223, Perplexity: 7.5554
Epoch [1/5], Step [3200/3236], Loss: 2.8899, Perplexity: 17.9914
Epoch [1/5], Step [3201/3236], Loss: 2.2431, Perplexity: 9.4229
Epoch [1/5], Step [3202/3236], Loss: 2.3232, Perplexity: 10.2079
Epoch [1/5], Step [3203/3236], Loss: 2.1282, Perplexity: 8.3995
Epoch [1/5], Step [3204/3236], Loss: 2.2923, Perplexity: 9.8981
Epoch [1/5], Step [3205/3236], Loss: 2.1434, Perplexity: 8.5288
Epoch [1/5], Step [3206/3236], Loss: 2.1346, Perplexity: 8.4539
Epoch [1/5], Step [3207/3236], Loss: 2.1903, Perplexity: 8.9377
Epoch [1/5], Step [3208/3236], Loss: 2.0562, Perplexity: 7.8159
Epoch [1/5], Step [3209/3236], Loss: 2.1212, Perplexity: 8.3410
Epoch [1/5], Step [3210/3236], Loss: 2.1750, Perplexity: 8.8026
Epoch [1/5], Step [3211/3236], Loss: 2.0976, Perplexity: 8.1462
Epoch [1/5], Step [3212/3236], Loss: 2.0982, Perplexity: 8.1518
...

Epoch [2/5], Step [93/3236], Loss: 2.1264, Perplexity: 8.3850
Epoch [2/5], Step [94/3236], Loss: 2.2035, Perplexity: 9.0569
Epoch [2/5], Step [95/3236], Loss: 2.3795, Perplexity: 10.7994
Epoch [2/5], Step [96/3236], Loss: 2.6260, Perplexity: 13.8188
Epoch [2/5], Step [97/3236], Loss: 2.1418, Perplexity: 8.5150
Epoch [2/5], Step [98/3236], Loss: 2.2109, Perplexity: 9.1238
Epoch [2/5], Step [99/3236], Loss: 2.2197, Perplexity: 9.2047
Epoch [2/5], Step [100/3236], Loss: 2.7237, Perplexity: 15.2370
Epoch [2/5], Step [101/3236], Loss: 2.2034, Perplexity: 9.0555
Epoch [2/5], Step [102/3236], Loss: 2.0465, Perplexity: 7.7406
Epoch [2/5], Step [103/3236], Loss: 2.2663, Perplexity: 9.6437
Epoch [2/5], Step [104/3236], Loss: 2.3121, Perplexity: 10.0957
Epoch [2/5], Step [105/3236], Loss: 2.4665, Perplexity: 11.7812
Epoch [2/5], Step [106/3236], Loss: 2.4390, Perplexity: 11.4610
Epoch [2/5], Step [107/3236], Loss: 2.0894, Perplexity: 8.0797
Epoch [2/5], Step [108/3236], Loss: 2.1142, Perplexity: 

Epoch [2/5], Step [223/3236], Loss: 2.3209, Perplexity: 10.1845
Epoch [2/5], Step [224/3236], Loss: 2.2140, Perplexity: 9.1526
Epoch [2/5], Step [225/3236], Loss: 2.1580, Perplexity: 8.6540
Epoch [2/5], Step [226/3236], Loss: 2.5118, Perplexity: 12.3274
Epoch [2/5], Step [227/3236], Loss: 2.2793, Perplexity: 9.7698
Epoch [2/5], Step [228/3236], Loss: 2.1492, Perplexity: 8.5776
Epoch [2/5], Step [229/3236], Loss: 2.2330, Perplexity: 9.3276
Epoch [2/5], Step [230/3236], Loss: 2.0537, Perplexity: 7.7969
Epoch [2/5], Step [231/3236], Loss: 2.5037, Perplexity: 12.2281
Epoch [2/5], Step [232/3236], Loss: 2.4684, Perplexity: 11.8030
Epoch [2/5], Step [233/3236], Loss: 2.0112, Perplexity: 7.4721
Epoch [2/5], Step [234/3236], Loss: 2.3942, Perplexity: 10.9590
Epoch [2/5], Step [235/3236], Loss: 2.2288, Perplexity: 9.2889
Epoch [2/5], Step [236/3236], Loss: 2.2377, Perplexity: 9.3722
Epoch [2/5], Step [237/3236], Loss: 2.1727, Perplexity: 8.7820
Epoch [2/5], Step [238/3236], Loss: 2.0894, Perple

Epoch [2/5], Step [353/3236], Loss: 2.1726, Perplexity: 8.7808
Epoch [2/5], Step [354/3236], Loss: 2.5463, Perplexity: 12.7595
Epoch [2/5], Step [355/3236], Loss: 2.2061, Perplexity: 9.0803
Epoch [2/5], Step [356/3236], Loss: 2.3266, Perplexity: 10.2432
Epoch [2/5], Step [357/3236], Loss: 2.0246, Perplexity: 7.5733
Epoch [2/5], Step [358/3236], Loss: 2.1921, Perplexity: 8.9537
Epoch [2/5], Step [359/3236], Loss: 2.1225, Perplexity: 8.3522
Epoch [2/5], Step [360/3236], Loss: 2.4578, Perplexity: 11.6791
Epoch [2/5], Step [361/3236], Loss: 2.3093, Perplexity: 10.0670
Epoch [2/5], Step [362/3236], Loss: 2.2176, Perplexity: 9.1852
Epoch [2/5], Step [363/3236], Loss: 2.0776, Perplexity: 7.9854
Epoch [2/5], Step [364/3236], Loss: 2.0077, Perplexity: 7.4460
Epoch [2/5], Step [365/3236], Loss: 2.0407, Perplexity: 7.6957
Epoch [2/5], Step [366/3236], Loss: 2.1317, Perplexity: 8.4290
Epoch [2/5], Step [367/3236], Loss: 2.8277, Perplexity: 16.9063
Epoch [2/5], Step [368/3236], Loss: 2.2415, Perple

Epoch [2/5], Step [483/3236], Loss: 2.1390, Perplexity: 8.4911
Epoch [2/5], Step [484/3236], Loss: 2.2213, Perplexity: 9.2193
Epoch [2/5], Step [485/3236], Loss: 2.1683, Perplexity: 8.7432
Epoch [2/5], Step [486/3236], Loss: 2.2191, Perplexity: 9.1988
Epoch [2/5], Step [487/3236], Loss: 2.0498, Perplexity: 7.7663
Epoch [2/5], Step [488/3236], Loss: 2.4480, Perplexity: 11.5656
Epoch [2/5], Step [489/3236], Loss: 2.0350, Perplexity: 7.6526
Epoch [2/5], Step [490/3236], Loss: 2.1888, Perplexity: 8.9242
Epoch [2/5], Step [491/3236], Loss: 1.9632, Perplexity: 7.1219
Epoch [2/5], Step [492/3236], Loss: 2.1496, Perplexity: 8.5814
Epoch [2/5], Step [493/3236], Loss: 2.4771, Perplexity: 11.9072
Epoch [2/5], Step [494/3236], Loss: 2.1175, Perplexity: 8.3103
Epoch [2/5], Step [495/3236], Loss: 3.0837, Perplexity: 21.8392
Epoch [2/5], Step [496/3236], Loss: 2.1717, Perplexity: 8.7736
Epoch [2/5], Step [497/3236], Loss: 2.2427, Perplexity: 9.4185
Epoch [2/5], Step [498/3236], Loss: 2.1705, Perplexi

Epoch [2/5], Step [613/3236], Loss: 2.8351, Perplexity: 17.0316
Epoch [2/5], Step [614/3236], Loss: 2.2367, Perplexity: 9.3624
Epoch [2/5], Step [615/3236], Loss: 2.6455, Perplexity: 14.0910
Epoch [2/5], Step [616/3236], Loss: 2.1514, Perplexity: 8.5971
Epoch [2/5], Step [617/3236], Loss: 2.2245, Perplexity: 9.2493
Epoch [2/5], Step [618/3236], Loss: 2.3241, Perplexity: 10.2170
Epoch [2/5], Step [619/3236], Loss: 2.1651, Perplexity: 8.7158
Epoch [2/5], Step [620/3236], Loss: 2.1214, Perplexity: 8.3430
Epoch [2/5], Step [621/3236], Loss: 2.1336, Perplexity: 8.4449
Epoch [2/5], Step [622/3236], Loss: 2.3411, Perplexity: 10.3932
Epoch [2/5], Step [623/3236], Loss: 2.0669, Perplexity: 7.8999
Epoch [2/5], Step [624/3236], Loss: 2.1333, Perplexity: 8.4430
Epoch [2/5], Step [625/3236], Loss: 2.0748, Perplexity: 7.9631
Epoch [2/5], Step [626/3236], Loss: 2.0928, Perplexity: 8.1072
Epoch [2/5], Step [627/3236], Loss: 2.2052, Perplexity: 9.0718
Epoch [2/5], Step [628/3236], Loss: 2.2539, Perplex

Epoch [2/5], Step [743/3236], Loss: 2.1415, Perplexity: 8.5123
Epoch [2/5], Step [744/3236], Loss: 2.0384, Perplexity: 7.6785
Epoch [2/5], Step [745/3236], Loss: 2.3754, Perplexity: 10.7548
Epoch [2/5], Step [746/3236], Loss: 2.0743, Perplexity: 7.9593
Epoch [2/5], Step [747/3236], Loss: 2.0534, Perplexity: 7.7947
Epoch [2/5], Step [748/3236], Loss: 2.5619, Perplexity: 12.9606
Epoch [2/5], Step [749/3236], Loss: 2.0841, Perplexity: 8.0372
Epoch [2/5], Step [750/3236], Loss: 2.3165, Perplexity: 10.1403
Epoch [2/5], Step [751/3236], Loss: 2.0435, Perplexity: 7.7176
Epoch [2/5], Step [752/3236], Loss: 2.4465, Perplexity: 11.5477
Epoch [2/5], Step [753/3236], Loss: 2.2412, Perplexity: 9.4042
Epoch [2/5], Step [754/3236], Loss: 2.4052, Perplexity: 11.0808
Epoch [2/5], Step [755/3236], Loss: 2.0980, Perplexity: 8.1497
Epoch [2/5], Step [756/3236], Loss: 2.3070, Perplexity: 10.0439
Epoch [2/5], Step [757/3236], Loss: 2.1421, Perplexity: 8.5175
Epoch [2/5], Step [758/3236], Loss: 2.1863, Perpl

Epoch [2/5], Step [873/3236], Loss: 2.3375, Perplexity: 10.3552
Epoch [2/5], Step [874/3236], Loss: 1.9957, Perplexity: 7.3576
Epoch [2/5], Step [875/3236], Loss: 2.5124, Perplexity: 12.3339
Epoch [2/5], Step [876/3236], Loss: 2.1806, Perplexity: 8.8515
Epoch [2/5], Step [877/3236], Loss: 2.0736, Perplexity: 7.9531
Epoch [2/5], Step [878/3236], Loss: 2.1956, Perplexity: 8.9856
Epoch [2/5], Step [879/3236], Loss: 2.1876, Perplexity: 8.9137
Epoch [2/5], Step [880/3236], Loss: 2.0232, Perplexity: 7.5626
Epoch [2/5], Step [881/3236], Loss: 2.1253, Perplexity: 8.3757
Epoch [2/5], Step [882/3236], Loss: 2.5193, Perplexity: 12.4205
Epoch [2/5], Step [883/3236], Loss: 2.0220, Perplexity: 7.5537
Epoch [2/5], Step [884/3236], Loss: 2.0389, Perplexity: 7.6823
Epoch [2/5], Step [885/3236], Loss: 2.3282, Perplexity: 10.2597
Epoch [2/5], Step [886/3236], Loss: 2.0901, Perplexity: 8.0856
Epoch [2/5], Step [887/3236], Loss: 2.3225, Perplexity: 10.2007
Epoch [2/5], Step [888/3236], Loss: 2.2256, Perple

Epoch [2/5], Step [1003/3236], Loss: 2.1611, Perplexity: 8.6810
Epoch [2/5], Step [1004/3236], Loss: 2.2111, Perplexity: 9.1261
Epoch [2/5], Step [1005/3236], Loss: 3.9099, Perplexity: 49.8915
Epoch [2/5], Step [1006/3236], Loss: 2.1129, Perplexity: 8.2724
Epoch [2/5], Step [1007/3236], Loss: 2.5413, Perplexity: 12.6964
Epoch [2/5], Step [1008/3236], Loss: 2.0294, Perplexity: 7.6099
Epoch [2/5], Step [1009/3236], Loss: 2.1382, Perplexity: 8.4844
Epoch [2/5], Step [1010/3236], Loss: 2.1933, Perplexity: 8.9648
Epoch [2/5], Step [1011/3236], Loss: 2.9941, Perplexity: 19.9677
Epoch [2/5], Step [1012/3236], Loss: 2.2449, Perplexity: 9.4399
Epoch [2/5], Step [1013/3236], Loss: 2.1007, Perplexity: 8.1715
Epoch [2/5], Step [1014/3236], Loss: 2.1964, Perplexity: 8.9925
Epoch [2/5], Step [1015/3236], Loss: 2.2436, Perplexity: 9.4273
Epoch [2/5], Step [1016/3236], Loss: 2.0099, Perplexity: 7.4627
Epoch [2/5], Step [1017/3236], Loss: 2.1124, Perplexity: 8.2683
Epoch [2/5], Step [1018/3236], Loss: 

Epoch [2/5], Step [1131/3236], Loss: 2.0569, Perplexity: 7.8214
Epoch [2/5], Step [1132/3236], Loss: 1.9678, Perplexity: 7.1550
Epoch [2/5], Step [1133/3236], Loss: 2.0208, Perplexity: 7.5441
Epoch [2/5], Step [1134/3236], Loss: 2.4518, Perplexity: 11.6087
Epoch [2/5], Step [1135/3236], Loss: 1.9689, Perplexity: 7.1629
Epoch [2/5], Step [1136/3236], Loss: 2.3422, Perplexity: 10.4045
Epoch [2/5], Step [1137/3236], Loss: 1.9080, Perplexity: 6.7394
Epoch [2/5], Step [1138/3236], Loss: 1.9976, Perplexity: 7.3712
Epoch [2/5], Step [1139/3236], Loss: 2.2041, Perplexity: 9.0624
Epoch [2/5], Step [1140/3236], Loss: 2.0634, Perplexity: 7.8728
Epoch [2/5], Step [1141/3236], Loss: 1.8826, Perplexity: 6.5702
Epoch [2/5], Step [1142/3236], Loss: 2.1216, Perplexity: 8.3446
Epoch [2/5], Step [1143/3236], Loss: 2.2639, Perplexity: 9.6201
Epoch [2/5], Step [1144/3236], Loss: 2.0652, Perplexity: 7.8868
Epoch [2/5], Step [1145/3236], Loss: 2.1351, Perplexity: 8.4582
Epoch [2/5], Step [1146/3236], Loss: 2

Epoch [2/5], Step [1259/3236], Loss: 1.9879, Perplexity: 7.3004
Epoch [2/5], Step [1260/3236], Loss: 2.5610, Perplexity: 12.9489
Epoch [2/5], Step [1261/3236], Loss: 2.0597, Perplexity: 7.8434
Epoch [2/5], Step [1262/3236], Loss: 2.1192, Perplexity: 8.3248
Epoch [2/5], Step [1263/3236], Loss: 2.7246, Perplexity: 15.2508
Epoch [2/5], Step [1264/3236], Loss: 2.0270, Perplexity: 7.5909
Epoch [2/5], Step [1265/3236], Loss: 2.2163, Perplexity: 9.1734
Epoch [2/5], Step [1266/3236], Loss: 2.0849, Perplexity: 8.0434
Epoch [2/5], Step [1267/3236], Loss: 2.1553, Perplexity: 8.6305
Epoch [2/5], Step [1268/3236], Loss: 2.2107, Perplexity: 9.1216
Epoch [2/5], Step [1269/3236], Loss: 2.2125, Perplexity: 9.1388
Epoch [2/5], Step [1270/3236], Loss: 2.1059, Perplexity: 8.2142
Epoch [2/5], Step [1271/3236], Loss: 2.1099, Perplexity: 8.2474
Epoch [2/5], Step [1272/3236], Loss: 2.0803, Perplexity: 8.0067
Epoch [2/5], Step [1273/3236], Loss: 2.1044, Perplexity: 8.2024
Epoch [2/5], Step [1274/3236], Loss: 2

Epoch [2/5], Step [1387/3236], Loss: 2.3306, Perplexity: 10.2845
Epoch [2/5], Step [1388/3236], Loss: 2.2301, Perplexity: 9.3011
Epoch [2/5], Step [1389/3236], Loss: 1.9182, Perplexity: 6.8086
Epoch [2/5], Step [1390/3236], Loss: 2.5143, Perplexity: 12.3585
Epoch [2/5], Step [1391/3236], Loss: 1.9963, Perplexity: 7.3618
Epoch [2/5], Step [1392/3236], Loss: 2.1696, Perplexity: 8.7545
Epoch [2/5], Step [1393/3236], Loss: 2.1933, Perplexity: 8.9649
Epoch [2/5], Step [1394/3236], Loss: 2.2674, Perplexity: 9.6545
Epoch [2/5], Step [1395/3236], Loss: 2.0120, Perplexity: 7.4782
Epoch [2/5], Step [1396/3236], Loss: 2.0173, Perplexity: 7.5182
Epoch [2/5], Step [1397/3236], Loss: 2.0142, Perplexity: 7.4949
Epoch [2/5], Step [1398/3236], Loss: 2.5619, Perplexity: 12.9608
Epoch [2/5], Step [1399/3236], Loss: 2.9412, Perplexity: 18.9386
Epoch [2/5], Step [1400/3236], Loss: 1.9786, Perplexity: 7.2325
Epoch [2/5], Step [1401/3236], Loss: 2.3744, Perplexity: 10.7443
Epoch [2/5], Step [1402/3236], Loss

Epoch [2/5], Step [1515/3236], Loss: 2.7404, Perplexity: 15.4930
Epoch [2/5], Step [1516/3236], Loss: 2.2617, Perplexity: 9.5992
Epoch [2/5], Step [1517/3236], Loss: 2.0683, Perplexity: 7.9113
Epoch [2/5], Step [1518/3236], Loss: 2.2162, Perplexity: 9.1725
Epoch [2/5], Step [1519/3236], Loss: 2.0039, Perplexity: 7.4180
Epoch [2/5], Step [1520/3236], Loss: 2.4271, Perplexity: 11.3255
Epoch [2/5], Step [1521/3236], Loss: 2.1794, Perplexity: 8.8414
Epoch [2/5], Step [1522/3236], Loss: 2.2480, Perplexity: 9.4685
Epoch [2/5], Step [1523/3236], Loss: 2.7453, Perplexity: 15.5686
Epoch [2/5], Step [1524/3236], Loss: 2.2236, Perplexity: 9.2409
Epoch [2/5], Step [1525/3236], Loss: 2.2514, Perplexity: 9.5015
Epoch [2/5], Step [1526/3236], Loss: 2.1922, Perplexity: 8.9552
Epoch [2/5], Step [1527/3236], Loss: 1.9745, Perplexity: 7.2030
Epoch [2/5], Step [1528/3236], Loss: 2.1364, Perplexity: 8.4685
Epoch [2/5], Step [1529/3236], Loss: 2.4317, Perplexity: 11.3783
Epoch [2/5], Step [1530/3236], Loss:

Epoch [2/5], Step [1643/3236], Loss: 2.2547, Perplexity: 9.5321
Epoch [2/5], Step [1644/3236], Loss: 2.0313, Perplexity: 7.6241
Epoch [2/5], Step [1645/3236], Loss: 1.9529, Perplexity: 7.0491
Epoch [2/5], Step [1646/3236], Loss: 2.0351, Perplexity: 7.6532
Epoch [2/5], Step [1647/3236], Loss: 1.9224, Perplexity: 6.8372
Epoch [2/5], Step [1648/3236], Loss: 2.0355, Perplexity: 7.6562
Epoch [2/5], Step [1649/3236], Loss: 2.1348, Perplexity: 8.4555
Epoch [2/5], Step [1650/3236], Loss: 2.1116, Perplexity: 8.2618
Epoch [2/5], Step [1651/3236], Loss: 2.1358, Perplexity: 8.4638
Epoch [2/5], Step [1652/3236], Loss: 2.0644, Perplexity: 7.8802
Epoch [2/5], Step [1653/3236], Loss: 2.1937, Perplexity: 8.9680
Epoch [2/5], Step [1654/3236], Loss: 2.0390, Perplexity: 7.6827
Epoch [2/5], Step [1655/3236], Loss: 2.0766, Perplexity: 7.9773
Epoch [2/5], Step [1656/3236], Loss: 2.1804, Perplexity: 8.8496
Epoch [2/5], Step [1657/3236], Loss: 1.9813, Perplexity: 7.2522
Epoch [2/5], Step [1658/3236], Loss: 2.1

Epoch [2/5], Step [1771/3236], Loss: 2.0579, Perplexity: 7.8294
Epoch [2/5], Step [1772/3236], Loss: 2.0627, Perplexity: 7.8674
Epoch [2/5], Step [1773/3236], Loss: 2.0802, Perplexity: 8.0057
Epoch [2/5], Step [1774/3236], Loss: 2.1008, Perplexity: 8.1724
Epoch [2/5], Step [1775/3236], Loss: 2.1201, Perplexity: 8.3323
Epoch [2/5], Step [1776/3236], Loss: 2.0777, Perplexity: 7.9858
Epoch [2/5], Step [1777/3236], Loss: 1.9650, Perplexity: 7.1346
Epoch [2/5], Step [1778/3236], Loss: 2.0999, Perplexity: 8.1656
Epoch [2/5], Step [1779/3236], Loss: 2.0326, Perplexity: 7.6338
Epoch [2/5], Step [1780/3236], Loss: 1.8843, Perplexity: 6.5820
Epoch [2/5], Step [1781/3236], Loss: 1.8735, Perplexity: 6.5112
Epoch [2/5], Step [1782/3236], Loss: 2.0836, Perplexity: 8.0331
Epoch [2/5], Step [1783/3236], Loss: 2.0559, Perplexity: 7.8140
Epoch [2/5], Step [1784/3236], Loss: 2.0464, Perplexity: 7.7401
Epoch [2/5], Step [1785/3236], Loss: 2.0729, Perplexity: 7.9475
Epoch [2/5], Step [1786/3236], Loss: 2.0

Epoch [2/5], Step [1899/3236], Loss: 2.3518, Perplexity: 10.5041
Epoch [2/5], Step [1900/3236], Loss: 2.0572, Perplexity: 7.8242
Epoch [2/5], Step [1901/3236], Loss: 2.1553, Perplexity: 8.6309
Epoch [2/5], Step [1902/3236], Loss: 2.1860, Perplexity: 8.8992
Epoch [2/5], Step [1903/3236], Loss: 1.9430, Perplexity: 6.9799
Epoch [2/5], Step [1904/3236], Loss: 2.0307, Perplexity: 7.6193
Epoch [2/5], Step [1905/3236], Loss: 2.2717, Perplexity: 9.6956
Epoch [2/5], Step [1906/3236], Loss: 2.0563, Perplexity: 7.8174
Epoch [2/5], Step [1907/3236], Loss: 1.9865, Perplexity: 7.2901
Epoch [2/5], Step [1908/3236], Loss: 1.9777, Perplexity: 7.2261
Epoch [2/5], Step [1909/3236], Loss: 1.9872, Perplexity: 7.2954
Epoch [2/5], Step [1910/3236], Loss: 2.1266, Perplexity: 8.3867
Epoch [2/5], Step [1911/3236], Loss: 1.9703, Perplexity: 7.1725
Epoch [2/5], Step [1912/3236], Loss: 1.9768, Perplexity: 7.2198
Epoch [2/5], Step [1913/3236], Loss: 1.9814, Perplexity: 7.2531
Epoch [2/5], Step [1914/3236], Loss: 1.

Epoch [2/5], Step [2027/3236], Loss: 2.2670, Perplexity: 9.6506
Epoch [2/5], Step [2028/3236], Loss: 2.1562, Perplexity: 8.6384
Epoch [2/5], Step [2029/3236], Loss: 1.9441, Perplexity: 6.9871
Epoch [2/5], Step [2030/3236], Loss: 1.9215, Perplexity: 6.8315
Epoch [2/5], Step [2031/3236], Loss: 1.9472, Perplexity: 7.0091
Epoch [2/5], Step [2032/3236], Loss: 2.1701, Perplexity: 8.7593
Epoch [2/5], Step [2033/3236], Loss: 2.0521, Perplexity: 7.7844
Epoch [2/5], Step [2034/3236], Loss: 1.9131, Perplexity: 6.7741
Epoch [2/5], Step [2035/3236], Loss: 2.2389, Perplexity: 9.3828
Epoch [2/5], Step [2036/3236], Loss: 2.1597, Perplexity: 8.6684
Epoch [2/5], Step [2037/3236], Loss: 2.0360, Perplexity: 7.6601
Epoch [2/5], Step [2038/3236], Loss: 2.0840, Perplexity: 8.0364
Epoch [2/5], Step [2039/3236], Loss: 2.0095, Perplexity: 7.4599
Epoch [2/5], Step [2040/3236], Loss: 2.2379, Perplexity: 9.3739
Epoch [2/5], Step [2041/3236], Loss: 2.3561, Perplexity: 10.5501
Epoch [2/5], Step [2042/3236], Loss: 2.

Epoch [2/5], Step [2155/3236], Loss: 2.2836, Perplexity: 9.8117
Epoch [2/5], Step [2156/3236], Loss: 2.3926, Perplexity: 10.9420
Epoch [2/5], Step [2157/3236], Loss: 2.0072, Perplexity: 7.4422
Epoch [2/5], Step [2158/3236], Loss: 2.1833, Perplexity: 8.8756
Epoch [2/5], Step [2159/3236], Loss: 2.0988, Perplexity: 8.1562
Epoch [2/5], Step [2160/3236], Loss: 2.1139, Perplexity: 8.2807
Epoch [2/5], Step [2161/3236], Loss: 2.0824, Perplexity: 8.0238
Epoch [2/5], Step [2162/3236], Loss: 2.1662, Perplexity: 8.7249
Epoch [2/5], Step [2163/3236], Loss: 2.1597, Perplexity: 8.6686
Epoch [2/5], Step [2164/3236], Loss: 1.9836, Perplexity: 7.2687
Epoch [2/5], Step [2165/3236], Loss: 2.3696, Perplexity: 10.6934
Epoch [2/5], Step [2166/3236], Loss: 2.0757, Perplexity: 7.9698
Epoch [2/5], Step [2167/3236], Loss: 2.2897, Perplexity: 9.8721
Epoch [2/5], Step [2168/3236], Loss: 2.0315, Perplexity: 7.6257
Epoch [2/5], Step [2169/3236], Loss: 2.2184, Perplexity: 9.1923
Epoch [2/5], Step [2170/3236], Loss: 2

Epoch [2/5], Step [2283/3236], Loss: 1.9078, Perplexity: 6.7379
Epoch [2/5], Step [2284/3236], Loss: 2.1687, Perplexity: 8.7472
Epoch [2/5], Step [2285/3236], Loss: 2.0444, Perplexity: 7.7243
Epoch [2/5], Step [2286/3236], Loss: 1.9780, Perplexity: 7.2286
Epoch [2/5], Step [2287/3236], Loss: 1.9270, Perplexity: 6.8691
Epoch [2/5], Step [2288/3236], Loss: 2.0135, Perplexity: 7.4892
Epoch [2/5], Step [2289/3236], Loss: 1.9584, Perplexity: 7.0878
Epoch [2/5], Step [2290/3236], Loss: 2.0835, Perplexity: 8.0329
Epoch [2/5], Step [2291/3236], Loss: 2.0851, Perplexity: 8.0454
Epoch [2/5], Step [2292/3236], Loss: 2.3636, Perplexity: 10.6288
Epoch [2/5], Step [2293/3236], Loss: 1.9216, Perplexity: 6.8322
Epoch [2/5], Step [2294/3236], Loss: 2.2797, Perplexity: 9.7733
Epoch [2/5], Step [2295/3236], Loss: 1.9851, Perplexity: 7.2801
Epoch [2/5], Step [2296/3236], Loss: 2.2219, Perplexity: 9.2249
Epoch [2/5], Step [2297/3236], Loss: 1.9868, Perplexity: 7.2925
Epoch [2/5], Step [2298/3236], Loss: 2.

Epoch [2/5], Step [2411/3236], Loss: 2.1811, Perplexity: 8.8563
Epoch [2/5], Step [2412/3236], Loss: 2.0309, Perplexity: 7.6206
Epoch [2/5], Step [2413/3236], Loss: 2.1755, Perplexity: 8.8068
Epoch [2/5], Step [2414/3236], Loss: 2.1773, Perplexity: 8.8222
Epoch [2/5], Step [2415/3236], Loss: 2.0290, Perplexity: 7.6068
Epoch [2/5], Step [2416/3236], Loss: 1.9397, Perplexity: 6.9568
Epoch [2/5], Step [2417/3236], Loss: 2.0117, Perplexity: 7.4762
Epoch [2/5], Step [2418/3236], Loss: 1.8243, Perplexity: 6.1986
Epoch [2/5], Step [2419/3236], Loss: 2.1073, Perplexity: 8.2264
Epoch [2/5], Step [2420/3236], Loss: 2.0227, Perplexity: 7.5585
Epoch [2/5], Step [2421/3236], Loss: 2.0177, Perplexity: 7.5212
Epoch [2/5], Step [2422/3236], Loss: 2.2051, Perplexity: 9.0708
Epoch [2/5], Step [2423/3236], Loss: 1.9485, Perplexity: 7.0180
Epoch [2/5], Step [2424/3236], Loss: 3.0348, Perplexity: 20.7977
Epoch [2/5], Step [2425/3236], Loss: 2.0433, Perplexity: 7.7162
Epoch [2/5], Step [2426/3236], Loss: 2.

Epoch [2/5], Step [2539/3236], Loss: 2.0813, Perplexity: 8.0150
Epoch [2/5], Step [2540/3236], Loss: 2.0224, Perplexity: 7.5561
Epoch [2/5], Step [2541/3236], Loss: 2.5299, Perplexity: 12.5522
Epoch [2/5], Step [2542/3236], Loss: 2.0092, Perplexity: 7.4574
Epoch [2/5], Step [2543/3236], Loss: 2.1942, Perplexity: 8.9730
Epoch [2/5], Step [2544/3236], Loss: 1.8992, Perplexity: 6.6807
Epoch [2/5], Step [2545/3236], Loss: 2.0562, Perplexity: 7.8165
Epoch [2/5], Step [2546/3236], Loss: 2.2166, Perplexity: 9.1759
Epoch [2/5], Step [2547/3236], Loss: 2.8224, Perplexity: 16.8180
Epoch [2/5], Step [2548/3236], Loss: 3.2890, Perplexity: 26.8152
Epoch [2/5], Step [2549/3236], Loss: 2.1563, Perplexity: 8.6390
Epoch [2/5], Step [2550/3236], Loss: 1.9669, Perplexity: 7.1485
Epoch [2/5], Step [2551/3236], Loss: 2.2325, Perplexity: 9.3235
Epoch [2/5], Step [2552/3236], Loss: 2.0411, Perplexity: 7.6987
Epoch [2/5], Step [2553/3236], Loss: 2.1095, Perplexity: 8.2443
Epoch [2/5], Step [2554/3236], Loss: 

Epoch [2/5], Step [2667/3236], Loss: 2.0988, Perplexity: 8.1563
Epoch [2/5], Step [2668/3236], Loss: 1.9818, Perplexity: 7.2557
Epoch [2/5], Step [2669/3236], Loss: 2.0163, Perplexity: 7.5102
Epoch [2/5], Step [2670/3236], Loss: 2.1798, Perplexity: 8.8442
Epoch [2/5], Step [2671/3236], Loss: 2.0146, Perplexity: 7.4976
Epoch [2/5], Step [2672/3236], Loss: 2.2516, Perplexity: 9.5033
Epoch [2/5], Step [2673/3236], Loss: 2.5867, Perplexity: 13.2865
Epoch [2/5], Step [2674/3236], Loss: 2.0595, Perplexity: 7.8420
Epoch [2/5], Step [2675/3236], Loss: 2.1906, Perplexity: 8.9405
Epoch [2/5], Step [2676/3236], Loss: 2.0003, Perplexity: 7.3914
Epoch [2/5], Step [2677/3236], Loss: 2.0522, Perplexity: 7.7847
Epoch [2/5], Step [2678/3236], Loss: 1.9862, Perplexity: 7.2874
Epoch [2/5], Step [2679/3236], Loss: 1.9581, Perplexity: 7.0861
Epoch [2/5], Step [2680/3236], Loss: 2.1271, Perplexity: 8.3906
Epoch [2/5], Step [2681/3236], Loss: 2.0085, Perplexity: 7.4522
Epoch [2/5], Step [2682/3236], Loss: 2.

Epoch [2/5], Step [2795/3236], Loss: 1.8890, Perplexity: 6.6128
Epoch [2/5], Step [2796/3236], Loss: 1.9752, Perplexity: 7.2082
Epoch [2/5], Step [2797/3236], Loss: 2.1990, Perplexity: 9.0164
Epoch [2/5], Step [2798/3236], Loss: 1.9922, Perplexity: 7.3319
Epoch [2/5], Step [2799/3236], Loss: 2.1696, Perplexity: 8.7546
Epoch [2/5], Step [2800/3236], Loss: 1.8497, Perplexity: 6.3578
Epoch [2/5], Step [2801/3236], Loss: 2.0671, Perplexity: 7.9015
Epoch [2/5], Step [2802/3236], Loss: 2.1974, Perplexity: 9.0011
Epoch [2/5], Step [2803/3236], Loss: 2.0794, Perplexity: 7.9995
Epoch [2/5], Step [2804/3236], Loss: 2.0304, Perplexity: 7.6168
Epoch [2/5], Step [2805/3236], Loss: 1.9137, Perplexity: 6.7779
Epoch [2/5], Step [2806/3236], Loss: 2.0977, Perplexity: 8.1477
Epoch [2/5], Step [2807/3236], Loss: 2.0770, Perplexity: 7.9804
Epoch [2/5], Step [2808/3236], Loss: 1.9544, Perplexity: 7.0594
Epoch [2/5], Step [2809/3236], Loss: 2.0128, Perplexity: 7.4846
Epoch [2/5], Step [2810/3236], Loss: 1.9

Epoch [2/5], Step [2923/3236], Loss: 2.1435, Perplexity: 8.5296
Epoch [2/5], Step [2924/3236], Loss: 1.8098, Perplexity: 6.1093
Epoch [2/5], Step [2925/3236], Loss: 2.4213, Perplexity: 11.2610
Epoch [2/5], Step [2926/3236], Loss: 1.8667, Perplexity: 6.4671
Epoch [2/5], Step [2927/3236], Loss: 2.0866, Perplexity: 8.0573
Epoch [2/5], Step [2928/3236], Loss: 1.9625, Perplexity: 7.1171
Epoch [2/5], Step [2929/3236], Loss: 1.9589, Perplexity: 7.0914
Epoch [2/5], Step [2930/3236], Loss: 1.9185, Perplexity: 6.8105
Epoch [2/5], Step [2931/3236], Loss: 1.9811, Perplexity: 7.2511
Epoch [2/5], Step [2932/3236], Loss: 2.0364, Perplexity: 7.6632
Epoch [2/5], Step [2933/3236], Loss: 2.0638, Perplexity: 7.8756
Epoch [2/5], Step [2934/3236], Loss: 2.0513, Perplexity: 7.7783
Epoch [2/5], Step [2935/3236], Loss: 2.2559, Perplexity: 9.5436
Epoch [2/5], Step [2936/3236], Loss: 2.3560, Perplexity: 10.5483
Epoch [2/5], Step [2937/3236], Loss: 2.0041, Perplexity: 7.4193
Epoch [2/5], Step [2938/3236], Loss: 2

Epoch [2/5], Step [3051/3236], Loss: 2.1310, Perplexity: 8.4231
Epoch [2/5], Step [3052/3236], Loss: 2.0646, Perplexity: 7.8825
Epoch [2/5], Step [3053/3236], Loss: 2.1381, Perplexity: 8.4836
Epoch [2/5], Step [3054/3236], Loss: 2.0596, Perplexity: 7.8432
Epoch [2/5], Step [3055/3236], Loss: 2.1269, Perplexity: 8.3887
Epoch [2/5], Step [3056/3236], Loss: 1.9809, Perplexity: 7.2489
Epoch [2/5], Step [3057/3236], Loss: 2.0416, Perplexity: 7.7033
Epoch [2/5], Step [3058/3236], Loss: 2.2867, Perplexity: 9.8422
Epoch [2/5], Step [3059/3236], Loss: 1.9569, Perplexity: 7.0772
Epoch [2/5], Step [3060/3236], Loss: 2.0388, Perplexity: 7.6811
Epoch [2/5], Step [3061/3236], Loss: 1.8673, Perplexity: 6.4711
Epoch [2/5], Step [3062/3236], Loss: 1.9813, Perplexity: 7.2520
Epoch [2/5], Step [3063/3236], Loss: 2.6850, Perplexity: 14.6582
Epoch [2/5], Step [3064/3236], Loss: 1.9160, Perplexity: 6.7939
Epoch [2/5], Step [3065/3236], Loss: 2.1048, Perplexity: 8.2058
Epoch [2/5], Step [3066/3236], Loss: 2.

Epoch [2/5], Step [3179/3236], Loss: 1.8304, Perplexity: 6.2365
Epoch [2/5], Step [3180/3236], Loss: 1.9568, Perplexity: 7.0765
Epoch [2/5], Step [3181/3236], Loss: 2.0380, Perplexity: 7.6752
Epoch [2/5], Step [3182/3236], Loss: 2.6880, Perplexity: 14.7021
Epoch [2/5], Step [3183/3236], Loss: 2.0119, Perplexity: 7.4775
Epoch [2/5], Step [3184/3236], Loss: 2.2469, Perplexity: 9.4586
Epoch [2/5], Step [3185/3236], Loss: 1.8783, Perplexity: 6.5421
Epoch [2/5], Step [3186/3236], Loss: 2.0942, Perplexity: 8.1187
Epoch [2/5], Step [3187/3236], Loss: 1.8803, Perplexity: 6.5554
Epoch [2/5], Step [3188/3236], Loss: 2.2529, Perplexity: 9.5150
Epoch [2/5], Step [3189/3236], Loss: 2.0285, Perplexity: 7.6029
Epoch [2/5], Step [3190/3236], Loss: 2.0095, Perplexity: 7.4599
Epoch [2/5], Step [3191/3236], Loss: 1.8700, Perplexity: 6.4885
Epoch [2/5], Step [3192/3236], Loss: 2.2063, Perplexity: 9.0820
Epoch [2/5], Step [3193/3236], Loss: 2.2266, Perplexity: 9.2683
Epoch [2/5], Step [3194/3236], Loss: 2.

Epoch [3/5], Step [74/3236], Loss: 1.9061, Perplexity: 6.7267
Epoch [3/5], Step [75/3236], Loss: 2.0844, Perplexity: 8.0399
Epoch [3/5], Step [76/3236], Loss: 1.9817, Perplexity: 7.2554
Epoch [3/5], Step [77/3236], Loss: 2.1232, Perplexity: 8.3579
Epoch [3/5], Step [78/3236], Loss: 2.1519, Perplexity: 8.6011
Epoch [3/5], Step [79/3236], Loss: 1.9118, Perplexity: 6.7650
Epoch [3/5], Step [80/3236], Loss: 2.1807, Perplexity: 8.8528
Epoch [3/5], Step [81/3236], Loss: 1.9097, Perplexity: 6.7511
Epoch [3/5], Step [82/3236], Loss: 1.9292, Perplexity: 6.8842
Epoch [3/5], Step [83/3236], Loss: 2.0759, Perplexity: 7.9713
Epoch [3/5], Step [84/3236], Loss: 1.9408, Perplexity: 6.9641
Epoch [3/5], Step [85/3236], Loss: 1.9042, Perplexity: 6.7141
Epoch [3/5], Step [86/3236], Loss: 2.3270, Perplexity: 10.2476
Epoch [3/5], Step [87/3236], Loss: 2.0063, Perplexity: 7.4356
Epoch [3/5], Step [88/3236], Loss: 1.9671, Perplexity: 7.1499
Epoch [3/5], Step [89/3236], Loss: 1.9554, Perplexity: 7.0670
Epoch [

Epoch [3/5], Step [205/3236], Loss: 1.9176, Perplexity: 6.8045
Epoch [3/5], Step [206/3236], Loss: 2.0797, Perplexity: 8.0024
Epoch [3/5], Step [207/3236], Loss: 2.0000, Perplexity: 7.3889
Epoch [3/5], Step [208/3236], Loss: 1.8955, Perplexity: 6.6561
Epoch [3/5], Step [209/3236], Loss: 2.0071, Perplexity: 7.4418
Epoch [3/5], Step [210/3236], Loss: 3.1550, Perplexity: 23.4528
Epoch [3/5], Step [211/3236], Loss: 1.9764, Perplexity: 7.2165
Epoch [3/5], Step [212/3236], Loss: 2.0025, Perplexity: 7.4078
Epoch [3/5], Step [213/3236], Loss: 2.0029, Perplexity: 7.4102
Epoch [3/5], Step [214/3236], Loss: 2.0447, Perplexity: 7.7268
Epoch [3/5], Step [215/3236], Loss: 2.2473, Perplexity: 9.4620
Epoch [3/5], Step [216/3236], Loss: 2.3533, Perplexity: 10.5207
Epoch [3/5], Step [217/3236], Loss: 1.8207, Perplexity: 6.1764
Epoch [3/5], Step [218/3236], Loss: 2.1614, Perplexity: 8.6833
Epoch [3/5], Step [219/3236], Loss: 2.3627, Perplexity: 10.6198
Epoch [3/5], Step [220/3236], Loss: 2.0952, Perplexi

Epoch [3/5], Step [335/3236], Loss: 2.0031, Perplexity: 7.4123
Epoch [3/5], Step [336/3236], Loss: 1.9592, Perplexity: 7.0933
Epoch [3/5], Step [337/3236], Loss: 1.8979, Perplexity: 6.6722
Epoch [3/5], Step [338/3236], Loss: 2.1182, Perplexity: 8.3162
Epoch [3/5], Step [339/3236], Loss: 2.2462, Perplexity: 9.4521
Epoch [3/5], Step [340/3236], Loss: 2.1033, Perplexity: 8.1930
Epoch [3/5], Step [341/3236], Loss: 2.2847, Perplexity: 9.8228
Epoch [3/5], Step [342/3236], Loss: 2.0033, Perplexity: 7.4132
Epoch [3/5], Step [343/3236], Loss: 1.9185, Perplexity: 6.8107
Epoch [3/5], Step [344/3236], Loss: 1.8204, Perplexity: 6.1741
Epoch [3/5], Step [345/3236], Loss: 2.0730, Perplexity: 7.9483
Epoch [3/5], Step [346/3236], Loss: 1.9587, Perplexity: 7.0903
Epoch [3/5], Step [347/3236], Loss: 2.2635, Perplexity: 9.6166
Epoch [3/5], Step [348/3236], Loss: 1.8925, Perplexity: 6.6357
Epoch [3/5], Step [349/3236], Loss: 2.4861, Perplexity: 12.0149
Epoch [3/5], Step [350/3236], Loss: 2.1684, Perplexity

Epoch [3/5], Step [465/3236], Loss: 1.9274, Perplexity: 6.8718
Epoch [3/5], Step [466/3236], Loss: 2.0934, Perplexity: 8.1125
Epoch [3/5], Step [467/3236], Loss: 2.0503, Perplexity: 7.7704
Epoch [3/5], Step [468/3236], Loss: 1.9478, Perplexity: 7.0129
Epoch [3/5], Step [469/3236], Loss: 2.0660, Perplexity: 7.8933
Epoch [3/5], Step [470/3236], Loss: 1.8727, Perplexity: 6.5058
Epoch [3/5], Step [471/3236], Loss: 2.1747, Perplexity: 8.7998
Epoch [3/5], Step [472/3236], Loss: 2.3167, Perplexity: 10.1421
Epoch [3/5], Step [473/3236], Loss: 1.9911, Perplexity: 7.3233
Epoch [3/5], Step [474/3236], Loss: 2.0932, Perplexity: 8.1110
Epoch [3/5], Step [475/3236], Loss: 2.0233, Perplexity: 7.5632
Epoch [3/5], Step [476/3236], Loss: 1.8809, Perplexity: 6.5597
Epoch [3/5], Step [477/3236], Loss: 2.3613, Perplexity: 10.6044
Epoch [3/5], Step [478/3236], Loss: 2.2712, Perplexity: 9.6912
Epoch [3/5], Step [479/3236], Loss: 2.0123, Perplexity: 7.4802
Epoch [3/5], Step [480/3236], Loss: 1.9288, Perplexit

Epoch [3/5], Step [595/3236], Loss: 1.9836, Perplexity: 7.2691
Epoch [3/5], Step [596/3236], Loss: 2.0168, Perplexity: 7.5143
Epoch [3/5], Step [597/3236], Loss: 1.9462, Perplexity: 7.0023
Epoch [3/5], Step [598/3236], Loss: 2.1077, Perplexity: 8.2296
Epoch [3/5], Step [599/3236], Loss: 1.9934, Perplexity: 7.3404
Epoch [3/5], Step [600/3236], Loss: 1.9939, Perplexity: 7.3443
Epoch [3/5], Step [601/3236], Loss: 1.8765, Perplexity: 6.5308
Epoch [3/5], Step [602/3236], Loss: 1.9051, Perplexity: 6.7200
Epoch [3/5], Step [603/3236], Loss: 1.9430, Perplexity: 6.9796
Epoch [3/5], Step [604/3236], Loss: 1.9307, Perplexity: 6.8941
Epoch [3/5], Step [605/3236], Loss: 2.0503, Perplexity: 7.7702
Epoch [3/5], Step [606/3236], Loss: 2.0372, Perplexity: 7.6688
Epoch [3/5], Step [607/3236], Loss: 2.0016, Perplexity: 7.4011
Epoch [3/5], Step [608/3236], Loss: 1.9534, Perplexity: 7.0527
Epoch [3/5], Step [609/3236], Loss: 2.2191, Perplexity: 9.1988
Epoch [3/5], Step [610/3236], Loss: 1.9443, Perplexity:

Epoch [3/5], Step [725/3236], Loss: 2.0341, Perplexity: 7.6456
Epoch [3/5], Step [726/3236], Loss: 2.2584, Perplexity: 9.5673
Epoch [3/5], Step [727/3236], Loss: 1.9749, Perplexity: 7.2062
Epoch [3/5], Step [728/3236], Loss: 1.9601, Perplexity: 7.1001
Epoch [3/5], Step [729/3236], Loss: 2.0470, Perplexity: 7.7445
Epoch [3/5], Step [730/3236], Loss: 2.2114, Perplexity: 9.1280
Epoch [3/5], Step [731/3236], Loss: 2.0596, Perplexity: 7.8430
Epoch [3/5], Step [732/3236], Loss: 1.8804, Perplexity: 6.5559
Epoch [3/5], Step [733/3236], Loss: 1.9465, Perplexity: 7.0042
Epoch [3/5], Step [734/3236], Loss: 1.9066, Perplexity: 6.7303
Epoch [3/5], Step [735/3236], Loss: 1.9102, Perplexity: 6.7547
Epoch [3/5], Step [736/3236], Loss: 1.8867, Perplexity: 6.5973
Epoch [3/5], Step [737/3236], Loss: 2.0919, Perplexity: 8.1001
Epoch [3/5], Step [738/3236], Loss: 2.0155, Perplexity: 7.5044
Epoch [3/5], Step [739/3236], Loss: 1.9201, Perplexity: 6.8214
Epoch [3/5], Step [740/3236], Loss: 1.8375, Perplexity:

Epoch [3/5], Step [855/3236], Loss: 2.3323, Perplexity: 10.3016
Epoch [3/5], Step [856/3236], Loss: 2.1200, Perplexity: 8.3315
Epoch [3/5], Step [857/3236], Loss: 1.9908, Perplexity: 7.3213
Epoch [3/5], Step [858/3236], Loss: 1.8574, Perplexity: 6.4068
Epoch [3/5], Step [859/3236], Loss: 1.9506, Perplexity: 7.0329
Epoch [3/5], Step [860/3236], Loss: 2.0232, Perplexity: 7.5622
Epoch [3/5], Step [861/3236], Loss: 2.2204, Perplexity: 9.2106
Epoch [3/5], Step [862/3236], Loss: 1.9975, Perplexity: 7.3709
Epoch [3/5], Step [863/3236], Loss: 1.9424, Perplexity: 6.9751
Epoch [3/5], Step [864/3236], Loss: 1.9972, Perplexity: 7.3686
Epoch [3/5], Step [865/3236], Loss: 2.1133, Perplexity: 8.2756
Epoch [3/5], Step [866/3236], Loss: 1.8603, Perplexity: 6.4258
Epoch [3/5], Step [867/3236], Loss: 2.0224, Perplexity: 7.5563
Epoch [3/5], Step [868/3236], Loss: 2.0031, Perplexity: 7.4122
Epoch [3/5], Step [869/3236], Loss: 1.8564, Perplexity: 6.4006
Epoch [3/5], Step [870/3236], Loss: 1.9227, Perplexity

Epoch [3/5], Step [985/3236], Loss: 2.1791, Perplexity: 8.8387
Epoch [3/5], Step [986/3236], Loss: 1.8078, Perplexity: 6.0967
Epoch [3/5], Step [987/3236], Loss: 1.8901, Perplexity: 6.6200
Epoch [3/5], Step [988/3236], Loss: 2.0749, Perplexity: 7.9638
Epoch [3/5], Step [989/3236], Loss: 1.9305, Perplexity: 6.8933
Epoch [3/5], Step [990/3236], Loss: 1.9228, Perplexity: 6.8401
Epoch [3/5], Step [991/3236], Loss: 1.8733, Perplexity: 6.5098
Epoch [3/5], Step [992/3236], Loss: 2.0654, Perplexity: 7.8886
Epoch [3/5], Step [993/3236], Loss: 1.9133, Perplexity: 6.7752
Epoch [3/5], Step [994/3236], Loss: 1.9968, Perplexity: 7.3654
Epoch [3/5], Step [995/3236], Loss: 2.0646, Perplexity: 7.8820
Epoch [3/5], Step [996/3236], Loss: 1.9866, Perplexity: 7.2909
Epoch [3/5], Step [997/3236], Loss: 1.9584, Perplexity: 7.0878
Epoch [3/5], Step [998/3236], Loss: 2.2864, Perplexity: 9.8390
Epoch [3/5], Step [999/3236], Loss: 1.9309, Perplexity: 6.8955
Epoch [3/5], Step [1000/3236], Loss: 1.9496, Perplexity

Epoch [3/5], Step [1114/3236], Loss: 2.2181, Perplexity: 9.1900
Epoch [3/5], Step [1115/3236], Loss: 2.1166, Perplexity: 8.3027
Epoch [3/5], Step [1116/3236], Loss: 1.9628, Perplexity: 7.1195
Epoch [3/5], Step [1117/3236], Loss: 2.2300, Perplexity: 9.3003
Epoch [3/5], Step [1118/3236], Loss: 2.1005, Perplexity: 8.1700
Epoch [3/5], Step [1119/3236], Loss: 1.9360, Perplexity: 6.9307
Epoch [3/5], Step [1120/3236], Loss: 1.9751, Perplexity: 7.2076
Epoch [3/5], Step [1121/3236], Loss: 2.0727, Perplexity: 7.9466
Epoch [3/5], Step [1122/3236], Loss: 1.9460, Perplexity: 7.0009
Epoch [3/5], Step [1123/3236], Loss: 1.9384, Perplexity: 6.9478
Epoch [3/5], Step [1124/3236], Loss: 2.1803, Perplexity: 8.8488
Epoch [3/5], Step [1125/3236], Loss: 1.9579, Perplexity: 7.0847
Epoch [3/5], Step [1126/3236], Loss: 1.8986, Perplexity: 6.6766
Epoch [3/5], Step [1127/3236], Loss: 2.0866, Perplexity: 8.0573
Epoch [3/5], Step [1128/3236], Loss: 2.0101, Perplexity: 7.4642
Epoch [3/5], Step [1129/3236], Loss: 2.2

Epoch [3/5], Step [1242/3236], Loss: 1.8677, Perplexity: 6.4733
Epoch [3/5], Step [1243/3236], Loss: 1.9831, Perplexity: 7.2654
Epoch [3/5], Step [1244/3236], Loss: 1.8686, Perplexity: 6.4790
Epoch [3/5], Step [1245/3236], Loss: 1.9949, Perplexity: 7.3515
Epoch [3/5], Step [1246/3236], Loss: 1.9572, Perplexity: 7.0795
Epoch [3/5], Step [1247/3236], Loss: 2.0196, Perplexity: 7.5350
Epoch [3/5], Step [1248/3236], Loss: 2.3676, Perplexity: 10.6719
Epoch [3/5], Step [1249/3236], Loss: 1.9518, Perplexity: 7.0410
Epoch [3/5], Step [1250/3236], Loss: 2.3351, Perplexity: 10.3307
Epoch [3/5], Step [1251/3236], Loss: 1.8901, Perplexity: 6.6200
Epoch [3/5], Step [1252/3236], Loss: 1.9699, Perplexity: 7.1703
Epoch [3/5], Step [1253/3236], Loss: 1.8269, Perplexity: 6.2143
Epoch [3/5], Step [1254/3236], Loss: 1.9612, Perplexity: 7.1077
Epoch [3/5], Step [1255/3236], Loss: 1.9016, Perplexity: 6.6966
Epoch [3/5], Step [1256/3236], Loss: 1.9253, Perplexity: 6.8570
Epoch [3/5], Step [1257/3236], Loss: 1

Epoch [3/5], Step [1370/3236], Loss: 2.2033, Perplexity: 9.0548
Epoch [3/5], Step [1371/3236], Loss: 1.8436, Perplexity: 6.3195
Epoch [3/5], Step [1372/3236], Loss: 1.8103, Perplexity: 6.1122
Epoch [3/5], Step [1373/3236], Loss: 2.0293, Perplexity: 7.6088
Epoch [3/5], Step [1374/3236], Loss: 2.3394, Perplexity: 10.3749
Epoch [3/5], Step [1375/3236], Loss: 1.9596, Perplexity: 7.0965
Epoch [3/5], Step [1376/3236], Loss: 1.9208, Perplexity: 6.8266
Epoch [3/5], Step [1377/3236], Loss: 1.8056, Perplexity: 6.0838
Epoch [3/5], Step [1378/3236], Loss: 2.1604, Perplexity: 8.6745
Epoch [3/5], Step [1379/3236], Loss: 1.9529, Perplexity: 7.0489
Epoch [3/5], Step [1380/3236], Loss: 2.0341, Perplexity: 7.6457
Epoch [3/5], Step [1381/3236], Loss: 1.8853, Perplexity: 6.5881
Epoch [3/5], Step [1382/3236], Loss: 2.0025, Perplexity: 7.4079
Epoch [3/5], Step [1383/3236], Loss: 2.0950, Perplexity: 8.1251
Epoch [3/5], Step [1384/3236], Loss: 1.9906, Perplexity: 7.3202
Epoch [3/5], Step [1385/3236], Loss: 1.

Epoch [3/5], Step [1498/3236], Loss: 1.8512, Perplexity: 6.3675
Epoch [3/5], Step [1499/3236], Loss: 2.0880, Perplexity: 8.0689
Epoch [3/5], Step [1500/3236], Loss: 1.9330, Perplexity: 6.9104
Epoch [3/5], Step [1501/3236], Loss: 1.8766, Perplexity: 6.5311
Epoch [3/5], Step [1502/3236], Loss: 1.9348, Perplexity: 6.9225
Epoch [3/5], Step [1503/3236], Loss: 1.8406, Perplexity: 6.3005
Epoch [3/5], Step [1504/3236], Loss: 2.0034, Perplexity: 7.4145
Epoch [3/5], Step [1505/3236], Loss: 1.9520, Perplexity: 7.0431
Epoch [3/5], Step [1506/3236], Loss: 1.8525, Perplexity: 6.3754
Epoch [3/5], Step [1507/3236], Loss: 1.9420, Perplexity: 6.9726
Epoch [3/5], Step [1508/3236], Loss: 1.9720, Perplexity: 7.1849
Epoch [3/5], Step [1509/3236], Loss: 1.9309, Perplexity: 6.8958
Epoch [3/5], Step [1510/3236], Loss: 1.9191, Perplexity: 6.8151
Epoch [3/5], Step [1511/3236], Loss: 2.0223, Perplexity: 7.5559
Epoch [3/5], Step [1512/3236], Loss: 1.9763, Perplexity: 7.2163
Epoch [3/5], Step [1513/3236], Loss: 1.9

Epoch [3/5], Step [1626/3236], Loss: 1.9562, Perplexity: 7.0725
Epoch [3/5], Step [1627/3236], Loss: 1.8139, Perplexity: 6.1346
Epoch [3/5], Step [1628/3236], Loss: 2.0964, Perplexity: 8.1369
Epoch [3/5], Step [1629/3236], Loss: 1.8364, Perplexity: 6.2741
Epoch [3/5], Step [1630/3236], Loss: 2.0928, Perplexity: 8.1076
Epoch [3/5], Step [1631/3236], Loss: 2.8533, Perplexity: 17.3450
Epoch [3/5], Step [1632/3236], Loss: 1.9882, Perplexity: 7.3021
Epoch [3/5], Step [1633/3236], Loss: 1.8341, Perplexity: 6.2597
Epoch [3/5], Step [1634/3236], Loss: 1.9727, Perplexity: 7.1902
Epoch [3/5], Step [1635/3236], Loss: 1.9732, Perplexity: 7.1938
Epoch [3/5], Step [1636/3236], Loss: 2.3004, Perplexity: 9.9785
Epoch [3/5], Step [1637/3236], Loss: 1.9398, Perplexity: 6.9576
Epoch [3/5], Step [1638/3236], Loss: 1.9870, Perplexity: 7.2934
Epoch [3/5], Step [1639/3236], Loss: 2.0703, Perplexity: 7.9269
Epoch [3/5], Step [1640/3236], Loss: 1.8578, Perplexity: 6.4097
Epoch [3/5], Step [1641/3236], Loss: 1.

Epoch [3/5], Step [1754/3236], Loss: 1.9201, Perplexity: 6.8217
Epoch [3/5], Step [1755/3236], Loss: 1.9311, Perplexity: 6.8969
Epoch [3/5], Step [1756/3236], Loss: 1.9015, Perplexity: 6.6959
Epoch [3/5], Step [1757/3236], Loss: 1.7802, Perplexity: 5.9309
Epoch [3/5], Step [1758/3236], Loss: 2.1176, Perplexity: 8.3111
Epoch [3/5], Step [1759/3236], Loss: 1.9595, Perplexity: 7.0959
Epoch [3/5], Step [1760/3236], Loss: 1.9830, Perplexity: 7.2646
Epoch [3/5], Step [1761/3236], Loss: 1.9288, Perplexity: 6.8816
Epoch [3/5], Step [1762/3236], Loss: 2.7996, Perplexity: 16.4376
Epoch [3/5], Step [1763/3236], Loss: 2.0174, Perplexity: 7.5185
Epoch [3/5], Step [1764/3236], Loss: 2.4197, Perplexity: 11.2425
Epoch [3/5], Step [1765/3236], Loss: 1.9244, Perplexity: 6.8510
Epoch [3/5], Step [1766/3236], Loss: 2.0611, Perplexity: 7.8545
Epoch [3/5], Step [1767/3236], Loss: 2.4565, Perplexity: 11.6642
Epoch [3/5], Step [1768/3236], Loss: 1.9068, Perplexity: 6.7313
Epoch [3/5], Step [1769/3236], Loss: 

Epoch [3/5], Step [1882/3236], Loss: 2.1517, Perplexity: 8.5993
Epoch [3/5], Step [1883/3236], Loss: 2.8884, Perplexity: 17.9653
Epoch [3/5], Step [1884/3236], Loss: 1.8245, Perplexity: 6.1997
Epoch [3/5], Step [1885/3236], Loss: 1.9169, Perplexity: 6.7997
Epoch [3/5], Step [1886/3236], Loss: 2.0536, Perplexity: 7.7959
Epoch [3/5], Step [1887/3236], Loss: 2.3032, Perplexity: 10.0063
Epoch [3/5], Step [1888/3236], Loss: 1.9349, Perplexity: 6.9231
Epoch [3/5], Step [1889/3236], Loss: 1.9937, Perplexity: 7.3428
Epoch [3/5], Step [1890/3236], Loss: 2.0040, Perplexity: 7.4187
Epoch [3/5], Step [1891/3236], Loss: 2.1563, Perplexity: 8.6387
Epoch [3/5], Step [1892/3236], Loss: 2.0681, Perplexity: 7.9098
Epoch [3/5], Step [1893/3236], Loss: 1.9215, Perplexity: 6.8314
Epoch [3/5], Step [1894/3236], Loss: 2.3386, Perplexity: 10.3671
Epoch [3/5], Step [1895/3236], Loss: 1.8711, Perplexity: 6.4952
Epoch [3/5], Step [1896/3236], Loss: 1.9241, Perplexity: 6.8489
Epoch [3/5], Step [1897/3236], Loss: 

Epoch [3/5], Step [2010/3236], Loss: 1.8423, Perplexity: 6.3111
Epoch [3/5], Step [2011/3236], Loss: 1.8943, Perplexity: 6.6482
Epoch [3/5], Step [2012/3236], Loss: 1.9584, Perplexity: 7.0879
Epoch [3/5], Step [2013/3236], Loss: 1.9547, Perplexity: 7.0620
Epoch [3/5], Step [2014/3236], Loss: 2.1241, Perplexity: 8.3652
Epoch [3/5], Step [2015/3236], Loss: 1.9604, Perplexity: 7.1020
Epoch [3/5], Step [2016/3236], Loss: 1.9071, Perplexity: 6.7332
Epoch [3/5], Step [2017/3236], Loss: 1.7891, Perplexity: 5.9840
Epoch [3/5], Step [2018/3236], Loss: 1.9737, Perplexity: 7.1974
Epoch [3/5], Step [2019/3236], Loss: 1.8580, Perplexity: 6.4109
Epoch [3/5], Step [2020/3236], Loss: 1.9133, Perplexity: 6.7752
Epoch [3/5], Step [2021/3236], Loss: 1.9235, Perplexity: 6.8449
Epoch [3/5], Step [2022/3236], Loss: 2.1302, Perplexity: 8.4168
Epoch [3/5], Step [2023/3236], Loss: 1.9782, Perplexity: 7.2300
Epoch [3/5], Step [2024/3236], Loss: 1.8968, Perplexity: 6.6647
Epoch [3/5], Step [2025/3236], Loss: 1.8

Epoch [3/5], Step [2138/3236], Loss: 1.9517, Perplexity: 7.0407
Epoch [3/5], Step [2139/3236], Loss: 2.2153, Perplexity: 9.1646
Epoch [3/5], Step [2140/3236], Loss: 2.0962, Perplexity: 8.1349
Epoch [3/5], Step [2141/3236], Loss: 1.8923, Perplexity: 6.6349
Epoch [3/5], Step [2142/3236], Loss: 1.7924, Perplexity: 6.0040
Epoch [3/5], Step [2143/3236], Loss: 1.9051, Perplexity: 6.7203
Epoch [3/5], Step [2144/3236], Loss: 1.9190, Perplexity: 6.8141
Epoch [3/5], Step [2145/3236], Loss: 1.8980, Perplexity: 6.6728
Epoch [3/5], Step [2146/3236], Loss: 2.1973, Perplexity: 9.0002
Epoch [3/5], Step [2147/3236], Loss: 2.3640, Perplexity: 10.6333
Epoch [3/5], Step [2148/3236], Loss: 2.1116, Perplexity: 8.2613
Epoch [3/5], Step [2149/3236], Loss: 1.8510, Perplexity: 6.3665
Epoch [3/5], Step [2150/3236], Loss: 1.8576, Perplexity: 6.4086
Epoch [3/5], Step [2151/3236], Loss: 2.3390, Perplexity: 10.3713
Epoch [3/5], Step [2152/3236], Loss: 1.9387, Perplexity: 6.9497
Epoch [3/5], Step [2153/3236], Loss: 2

Epoch [3/5], Step [2266/3236], Loss: 1.9832, Perplexity: 7.2663
Epoch [3/5], Step [2267/3236], Loss: 2.5422, Perplexity: 12.7072
Epoch [3/5], Step [2268/3236], Loss: 2.3590, Perplexity: 10.5805
Epoch [3/5], Step [2269/3236], Loss: 2.2021, Perplexity: 9.0443
Epoch [3/5], Step [2270/3236], Loss: 1.8836, Perplexity: 6.5769
Epoch [3/5], Step [2271/3236], Loss: 1.8441, Perplexity: 6.3227
Epoch [3/5], Step [2272/3236], Loss: 1.9909, Perplexity: 7.3222
Epoch [3/5], Step [2273/3236], Loss: 2.0328, Perplexity: 7.6353
Epoch [3/5], Step [2274/3236], Loss: 2.0295, Perplexity: 7.6105
Epoch [3/5], Step [2275/3236], Loss: 1.9585, Perplexity: 7.0889
Epoch [3/5], Step [2276/3236], Loss: 2.0835, Perplexity: 8.0328
Epoch [3/5], Step [2277/3236], Loss: 1.9975, Perplexity: 7.3702
Epoch [3/5], Step [2278/3236], Loss: 1.9392, Perplexity: 6.9529
Epoch [3/5], Step [2279/3236], Loss: 1.8402, Perplexity: 6.2975
Epoch [3/5], Step [2280/3236], Loss: 1.9054, Perplexity: 6.7219
Epoch [3/5], Step [2281/3236], Loss: 2

Epoch [3/5], Step [2394/3236], Loss: 1.8784, Perplexity: 6.5427
Epoch [3/5], Step [2395/3236], Loss: 1.9528, Perplexity: 7.0486
Epoch [3/5], Step [2396/3236], Loss: 2.0687, Perplexity: 7.9148
Epoch [3/5], Step [2397/3236], Loss: 2.0464, Perplexity: 7.7399
Epoch [3/5], Step [2398/3236], Loss: 1.8982, Perplexity: 6.6737
Epoch [3/5], Step [2399/3236], Loss: 2.0307, Perplexity: 7.6192
Epoch [3/5], Step [2400/3236], Loss: 1.9249, Perplexity: 6.8544
Epoch [3/5], Step [2401/3236], Loss: 1.8660, Perplexity: 6.4624
Epoch [3/5], Step [2402/3236], Loss: 2.2733, Perplexity: 9.7112
Epoch [3/5], Step [2403/3236], Loss: 1.8726, Perplexity: 6.5055
Epoch [3/5], Step [2404/3236], Loss: 1.9007, Perplexity: 6.6907
Epoch [3/5], Step [2405/3236], Loss: 1.9827, Perplexity: 7.2627
Epoch [3/5], Step [2406/3236], Loss: 1.8701, Perplexity: 6.4888
Epoch [3/5], Step [2407/3236], Loss: 2.2334, Perplexity: 9.3315
Epoch [3/5], Step [2408/3236], Loss: 2.1494, Perplexity: 8.5795
Epoch [3/5], Step [2409/3236], Loss: 1.9

Epoch [3/5], Step [2522/3236], Loss: 1.9248, Perplexity: 6.8537
Epoch [3/5], Step [2523/3236], Loss: 1.9430, Perplexity: 6.9794
Epoch [3/5], Step [2524/3236], Loss: 2.9531, Perplexity: 19.1659
Epoch [3/5], Step [2525/3236], Loss: 2.1569, Perplexity: 8.6444
Epoch [3/5], Step [2526/3236], Loss: 2.0463, Perplexity: 7.7396
Epoch [3/5], Step [2527/3236], Loss: 1.8544, Perplexity: 6.3876
Epoch [3/5], Step [2528/3236], Loss: 2.1691, Perplexity: 8.7502
Epoch [3/5], Step [2529/3236], Loss: 2.0050, Perplexity: 7.4260
Epoch [3/5], Step [2530/3236], Loss: 1.9978, Perplexity: 7.3731
Epoch [3/5], Step [2531/3236], Loss: 1.9760, Perplexity: 7.2139
Epoch [3/5], Step [2532/3236], Loss: 1.8697, Perplexity: 6.4864
Epoch [3/5], Step [2533/3236], Loss: 1.9275, Perplexity: 6.8726
Epoch [3/5], Step [2534/3236], Loss: 2.0223, Perplexity: 7.5561
Epoch [3/5], Step [2535/3236], Loss: 2.1834, Perplexity: 8.8762
Epoch [3/5], Step [2536/3236], Loss: 1.9314, Perplexity: 6.8994
Epoch [3/5], Step [2537/3236], Loss: 1.

Epoch [3/5], Step [2650/3236], Loss: 1.8463, Perplexity: 6.3366
Epoch [3/5], Step [2651/3236], Loss: 1.8824, Perplexity: 6.5691
Epoch [3/5], Step [2652/3236], Loss: 1.9534, Perplexity: 7.0527
Epoch [3/5], Step [2653/3236], Loss: 2.8386, Perplexity: 17.0912
Epoch [3/5], Step [2654/3236], Loss: 2.0514, Perplexity: 7.7788
Epoch [3/5], Step [2655/3236], Loss: 2.0486, Perplexity: 7.7570
Epoch [3/5], Step [2656/3236], Loss: 1.9447, Perplexity: 6.9913
Epoch [3/5], Step [2657/3236], Loss: 1.8841, Perplexity: 6.5803
Epoch [3/5], Step [2658/3236], Loss: 1.9783, Perplexity: 7.2306
Epoch [3/5], Step [2659/3236], Loss: 1.8939, Perplexity: 6.6449
Epoch [3/5], Step [2660/3236], Loss: 1.9656, Perplexity: 7.1390
Epoch [3/5], Step [2661/3236], Loss: 1.7796, Perplexity: 5.9277
Epoch [3/5], Step [2662/3236], Loss: 1.9449, Perplexity: 6.9932
Epoch [3/5], Step [2663/3236], Loss: 2.0788, Perplexity: 7.9949
Epoch [3/5], Step [2664/3236], Loss: 2.0686, Perplexity: 7.9134
Epoch [3/5], Step [2665/3236], Loss: 2.

Epoch [3/5], Step [2778/3236], Loss: 1.8583, Perplexity: 6.4129
Epoch [3/5], Step [2779/3236], Loss: 2.5656, Perplexity: 13.0080
Epoch [3/5], Step [2780/3236], Loss: 2.3290, Perplexity: 10.2672
Epoch [3/5], Step [2781/3236], Loss: 1.9105, Perplexity: 6.7564
Epoch [3/5], Step [2782/3236], Loss: 2.1301, Perplexity: 8.4156
Epoch [3/5], Step [2783/3236], Loss: 2.0272, Perplexity: 7.5926
Epoch [3/5], Step [2784/3236], Loss: 1.9027, Perplexity: 6.7038
Epoch [3/5], Step [2785/3236], Loss: 2.4222, Perplexity: 11.2703
Epoch [3/5], Step [2786/3236], Loss: 2.5572, Perplexity: 12.9003
Epoch [3/5], Step [2787/3236], Loss: 2.5842, Perplexity: 13.2526
Epoch [3/5], Step [2788/3236], Loss: 1.9302, Perplexity: 6.8906
Epoch [3/5], Step [2789/3236], Loss: 2.2639, Perplexity: 9.6208
Epoch [3/5], Step [2790/3236], Loss: 2.0480, Perplexity: 7.7520
Epoch [3/5], Step [2791/3236], Loss: 1.9680, Perplexity: 7.1561
Epoch [3/5], Step [2792/3236], Loss: 1.9348, Perplexity: 6.9228
Epoch [3/5], Step [2793/3236], Loss

Epoch [3/5], Step [2906/3236], Loss: 1.9331, Perplexity: 6.9111
Epoch [3/5], Step [2907/3236], Loss: 1.8864, Perplexity: 6.5958
Epoch [3/5], Step [2908/3236], Loss: 2.0397, Perplexity: 7.6887
Epoch [3/5], Step [2909/3236], Loss: 1.8637, Perplexity: 6.4473
Epoch [3/5], Step [2910/3236], Loss: 1.9030, Perplexity: 6.7060
Epoch [3/5], Step [2911/3236], Loss: 1.7787, Perplexity: 5.9221
Epoch [3/5], Step [2912/3236], Loss: 2.0011, Perplexity: 7.3975
Epoch [3/5], Step [2913/3236], Loss: 1.8016, Perplexity: 6.0592
Epoch [3/5], Step [2914/3236], Loss: 2.2460, Perplexity: 9.4500
Epoch [3/5], Step [2915/3236], Loss: 1.9035, Perplexity: 6.7096
Epoch [3/5], Step [2916/3236], Loss: 2.7238, Perplexity: 15.2381
Epoch [3/5], Step [2917/3236], Loss: 1.7789, Perplexity: 5.9233
Epoch [3/5], Step [2918/3236], Loss: 1.8732, Perplexity: 6.5091
Epoch [3/5], Step [2919/3236], Loss: 1.9252, Perplexity: 6.8568
Epoch [3/5], Step [2920/3236], Loss: 2.0926, Perplexity: 8.1056
Epoch [3/5], Step [2921/3236], Loss: 2.

Epoch [3/5], Step [3034/3236], Loss: 2.1541, Perplexity: 8.6204
Epoch [3/5], Step [3035/3236], Loss: 1.8868, Perplexity: 6.5984
Epoch [3/5], Step [3036/3236], Loss: 3.0939, Perplexity: 22.0623
Epoch [3/5], Step [3037/3236], Loss: 1.9643, Perplexity: 7.1302
Epoch [3/5], Step [3038/3236], Loss: 2.0316, Perplexity: 7.6266
Epoch [3/5], Step [3039/3236], Loss: 1.6934, Perplexity: 5.4380
Epoch [3/5], Step [3040/3236], Loss: 1.9499, Perplexity: 7.0278
Epoch [3/5], Step [3041/3236], Loss: 1.7867, Perplexity: 5.9695
Epoch [3/5], Step [3042/3236], Loss: 1.9665, Perplexity: 7.1459
Epoch [3/5], Step [3043/3236], Loss: 2.0812, Perplexity: 8.0142
Epoch [3/5], Step [3044/3236], Loss: 1.8474, Perplexity: 6.3436
Epoch [3/5], Step [3045/3236], Loss: 2.3436, Perplexity: 10.4192
Epoch [3/5], Step [3046/3236], Loss: 2.1966, Perplexity: 8.9942
Epoch [3/5], Step [3047/3236], Loss: 1.9667, Perplexity: 7.1471
Epoch [3/5], Step [3048/3236], Loss: 1.8617, Perplexity: 6.4346
Epoch [3/5], Step [3049/3236], Loss: 2

Epoch [3/5], Step [3162/3236], Loss: 1.8541, Perplexity: 6.3861
Epoch [3/5], Step [3163/3236], Loss: 1.9560, Perplexity: 7.0711
Epoch [3/5], Step [3164/3236], Loss: 1.8287, Perplexity: 6.2258
Epoch [3/5], Step [3165/3236], Loss: 1.8394, Perplexity: 6.2928
Epoch [3/5], Step [3166/3236], Loss: 1.9073, Perplexity: 6.7351
Epoch [3/5], Step [3167/3236], Loss: 2.1683, Perplexity: 8.7435
Epoch [3/5], Step [3168/3236], Loss: 2.1982, Perplexity: 9.0087
Epoch [3/5], Step [3169/3236], Loss: 1.9357, Perplexity: 6.9292
Epoch [3/5], Step [3170/3236], Loss: 1.8334, Perplexity: 6.2552
Epoch [3/5], Step [3171/3236], Loss: 1.8292, Perplexity: 6.2287
Epoch [3/5], Step [3172/3236], Loss: 1.8165, Perplexity: 6.1502
Epoch [3/5], Step [3173/3236], Loss: 2.1052, Perplexity: 8.2085
Epoch [3/5], Step [3174/3236], Loss: 2.2821, Perplexity: 9.7974
Epoch [3/5], Step [3175/3236], Loss: 1.8447, Perplexity: 6.3260
Epoch [3/5], Step [3176/3236], Loss: 2.0062, Perplexity: 7.4348
Epoch [3/5], Step [3177/3236], Loss: 1.8

Epoch [4/5], Step [56/3236], Loss: 1.8042, Perplexity: 6.0749
Epoch [4/5], Step [57/3236], Loss: 2.1924, Perplexity: 8.9568
Epoch [4/5], Step [58/3236], Loss: 1.9823, Perplexity: 7.2595
Epoch [4/5], Step [59/3236], Loss: 1.8641, Perplexity: 6.4500
Epoch [4/5], Step [60/3236], Loss: 1.8739, Perplexity: 6.5134
Epoch [4/5], Step [61/3236], Loss: 1.8763, Perplexity: 6.5296
Epoch [4/5], Step [62/3236], Loss: 1.9139, Perplexity: 6.7791
Epoch [4/5], Step [63/3236], Loss: 1.9711, Perplexity: 7.1783
Epoch [4/5], Step [64/3236], Loss: 2.1583, Perplexity: 8.6565
Epoch [4/5], Step [65/3236], Loss: 2.0808, Perplexity: 8.0113
Epoch [4/5], Step [66/3236], Loss: 1.8764, Perplexity: 6.5301
Epoch [4/5], Step [67/3236], Loss: 1.9431, Perplexity: 6.9804
Epoch [4/5], Step [68/3236], Loss: 1.8677, Perplexity: 6.4735
Epoch [4/5], Step [69/3236], Loss: 2.0570, Perplexity: 7.8227
Epoch [4/5], Step [70/3236], Loss: 2.1532, Perplexity: 8.6124
Epoch [4/5], Step [71/3236], Loss: 1.8975, Perplexity: 6.6689
Epoch [4

Epoch [4/5], Step [187/3236], Loss: 2.1910, Perplexity: 8.9446
Epoch [4/5], Step [188/3236], Loss: 1.9371, Perplexity: 6.9386
Epoch [4/5], Step [189/3236], Loss: 1.8017, Perplexity: 6.0598
Epoch [4/5], Step [190/3236], Loss: 1.9094, Perplexity: 6.7493
Epoch [4/5], Step [191/3236], Loss: 1.9082, Perplexity: 6.7411
Epoch [4/5], Step [192/3236], Loss: 1.6992, Perplexity: 5.4693
Epoch [4/5], Step [193/3236], Loss: 1.7929, Perplexity: 6.0068
Epoch [4/5], Step [194/3236], Loss: 1.8996, Perplexity: 6.6832
Epoch [4/5], Step [195/3236], Loss: 1.8069, Perplexity: 6.0917
Epoch [4/5], Step [196/3236], Loss: 1.9702, Perplexity: 7.1720
Epoch [4/5], Step [197/3236], Loss: 1.9583, Perplexity: 7.0870
Epoch [4/5], Step [198/3236], Loss: 1.7597, Perplexity: 5.8109
Epoch [4/5], Step [199/3236], Loss: 1.8053, Perplexity: 6.0821
Epoch [4/5], Step [200/3236], Loss: 1.9755, Perplexity: 7.2102
Epoch [4/5], Step [201/3236], Loss: 1.9487, Perplexity: 7.0192
Epoch [4/5], Step [202/3236], Loss: 1.8229, Perplexity:

Epoch [4/5], Step [317/3236], Loss: 1.8686, Perplexity: 6.4789
Epoch [4/5], Step [318/3236], Loss: 1.8895, Perplexity: 6.6162
Epoch [4/5], Step [319/3236], Loss: 1.8873, Perplexity: 6.6017
Epoch [4/5], Step [320/3236], Loss: 1.8965, Perplexity: 6.6625
Epoch [4/5], Step [321/3236], Loss: 2.0905, Perplexity: 8.0890
Epoch [4/5], Step [322/3236], Loss: 2.1215, Perplexity: 8.3433
Epoch [4/5], Step [323/3236], Loss: 2.0831, Perplexity: 8.0295
Epoch [4/5], Step [324/3236], Loss: 2.1374, Perplexity: 8.4777
Epoch [4/5], Step [325/3236], Loss: 1.8849, Perplexity: 6.5856
Epoch [4/5], Step [326/3236], Loss: 2.4007, Perplexity: 11.0310
Epoch [4/5], Step [327/3236], Loss: 1.8748, Perplexity: 6.5196
Epoch [4/5], Step [328/3236], Loss: 1.9082, Perplexity: 6.7408
Epoch [4/5], Step [329/3236], Loss: 2.0036, Perplexity: 7.4158
Epoch [4/5], Step [330/3236], Loss: 1.9560, Perplexity: 7.0710
Epoch [4/5], Step [331/3236], Loss: 1.9143, Perplexity: 6.7823
Epoch [4/5], Step [332/3236], Loss: 2.2565, Perplexity

Epoch [4/5], Step [447/3236], Loss: 1.8405, Perplexity: 6.2998
Epoch [4/5], Step [448/3236], Loss: 1.8879, Perplexity: 6.6055
Epoch [4/5], Step [449/3236], Loss: 1.8954, Perplexity: 6.6554
Epoch [4/5], Step [450/3236], Loss: 1.7990, Perplexity: 6.0435
Epoch [4/5], Step [451/3236], Loss: 2.2464, Perplexity: 9.4539
Epoch [4/5], Step [452/3236], Loss: 1.8801, Perplexity: 6.5540
Epoch [4/5], Step [453/3236], Loss: 1.9319, Perplexity: 6.9024
Epoch [4/5], Step [454/3236], Loss: 1.8429, Perplexity: 6.3149
Epoch [4/5], Step [455/3236], Loss: 1.8395, Perplexity: 6.2931
Epoch [4/5], Step [456/3236], Loss: 2.1147, Perplexity: 8.2868
Epoch [4/5], Step [457/3236], Loss: 1.8876, Perplexity: 6.6033
Epoch [4/5], Step [458/3236], Loss: 1.8043, Perplexity: 6.0759
Epoch [4/5], Step [459/3236], Loss: 1.9024, Perplexity: 6.7021
Epoch [4/5], Step [460/3236], Loss: 2.1333, Perplexity: 8.4425
Epoch [4/5], Step [461/3236], Loss: 2.2983, Perplexity: 9.9577
Epoch [4/5], Step [462/3236], Loss: 1.8157, Perplexity:

Epoch [4/5], Step [577/3236], Loss: 1.9141, Perplexity: 6.7808
Epoch [4/5], Step [578/3236], Loss: 2.6111, Perplexity: 13.6144
Epoch [4/5], Step [579/3236], Loss: 2.0345, Perplexity: 7.6488
Epoch [4/5], Step [580/3236], Loss: 1.9337, Perplexity: 6.9148
Epoch [4/5], Step [581/3236], Loss: 1.9827, Perplexity: 7.2622
Epoch [4/5], Step [582/3236], Loss: 2.1118, Perplexity: 8.2632
Epoch [4/5], Step [583/3236], Loss: 1.8185, Perplexity: 6.1626
Epoch [4/5], Step [584/3236], Loss: 1.9549, Perplexity: 7.0634
Epoch [4/5], Step [585/3236], Loss: 2.0641, Perplexity: 7.8784
Epoch [4/5], Step [586/3236], Loss: 1.9844, Perplexity: 7.2746
Epoch [4/5], Step [587/3236], Loss: 1.8196, Perplexity: 6.1695
Epoch [4/5], Step [588/3236], Loss: 1.9179, Perplexity: 6.8068
Epoch [4/5], Step [589/3236], Loss: 1.8118, Perplexity: 6.1218
Epoch [4/5], Step [590/3236], Loss: 2.0732, Perplexity: 7.9503
Epoch [4/5], Step [591/3236], Loss: 1.9041, Perplexity: 6.7133
Epoch [4/5], Step [592/3236], Loss: 1.7428, Perplexity

Epoch [4/5], Step [707/3236], Loss: 1.9029, Perplexity: 6.7050
Epoch [4/5], Step [708/3236], Loss: 1.8819, Perplexity: 6.5661
Epoch [4/5], Step [709/3236], Loss: 1.9363, Perplexity: 6.9327
Epoch [4/5], Step [710/3236], Loss: 2.0156, Perplexity: 7.5053
Epoch [4/5], Step [711/3236], Loss: 2.1171, Perplexity: 8.3070
Epoch [4/5], Step [712/3236], Loss: 2.0498, Perplexity: 7.7660
Epoch [4/5], Step [713/3236], Loss: 1.8308, Perplexity: 6.2389
Epoch [4/5], Step [714/3236], Loss: 1.7945, Perplexity: 6.0165
Epoch [4/5], Step [715/3236], Loss: 1.8716, Perplexity: 6.4989
Epoch [4/5], Step [716/3236], Loss: 1.8458, Perplexity: 6.3332
Epoch [4/5], Step [717/3236], Loss: 1.8008, Perplexity: 6.0546
Epoch [4/5], Step [718/3236], Loss: 1.8802, Perplexity: 6.5546
Epoch [4/5], Step [719/3236], Loss: 1.8198, Perplexity: 6.1704
Epoch [4/5], Step [720/3236], Loss: 2.0262, Perplexity: 7.5849
Epoch [4/5], Step [721/3236], Loss: 1.9463, Perplexity: 7.0024
Epoch [4/5], Step [722/3236], Loss: 1.9182, Perplexity:

Epoch [4/5], Step [837/3236], Loss: 1.9635, Perplexity: 7.1242
Epoch [4/5], Step [838/3236], Loss: 1.8200, Perplexity: 6.1716
Epoch [4/5], Step [839/3236], Loss: 1.8803, Perplexity: 6.5551
Epoch [4/5], Step [840/3236], Loss: 1.8510, Perplexity: 6.3665
Epoch [4/5], Step [841/3236], Loss: 1.8857, Perplexity: 6.5910
Epoch [4/5], Step [842/3236], Loss: 2.5505, Perplexity: 12.8133
Epoch [4/5], Step [843/3236], Loss: 1.9823, Perplexity: 7.2597
Epoch [4/5], Step [844/3236], Loss: 1.9714, Perplexity: 7.1806
Epoch [4/5], Step [845/3236], Loss: 1.9785, Perplexity: 7.2320
Epoch [4/5], Step [846/3236], Loss: 2.0611, Perplexity: 7.8546
Epoch [4/5], Step [847/3236], Loss: 1.8607, Perplexity: 6.4281
Epoch [4/5], Step [848/3236], Loss: 1.8342, Perplexity: 6.2599
Epoch [4/5], Step [849/3236], Loss: 2.0041, Perplexity: 7.4196
Epoch [4/5], Step [850/3236], Loss: 1.9042, Perplexity: 6.7140
Epoch [4/5], Step [851/3236], Loss: 1.9965, Perplexity: 7.3631
Epoch [4/5], Step [852/3236], Loss: 2.0683, Perplexity

Epoch [4/5], Step [967/3236], Loss: 1.8421, Perplexity: 6.3095
Epoch [4/5], Step [968/3236], Loss: 1.8799, Perplexity: 6.5528
Epoch [4/5], Step [969/3236], Loss: 1.8799, Perplexity: 6.5531
Epoch [4/5], Step [970/3236], Loss: 2.2648, Perplexity: 9.6293
Epoch [4/5], Step [971/3236], Loss: 1.8925, Perplexity: 6.6358
Epoch [4/5], Step [972/3236], Loss: 1.8611, Perplexity: 6.4310
Epoch [4/5], Step [973/3236], Loss: 1.8085, Perplexity: 6.1011
Epoch [4/5], Step [974/3236], Loss: 2.0809, Perplexity: 8.0114
Epoch [4/5], Step [975/3236], Loss: 1.8975, Perplexity: 6.6692
Epoch [4/5], Step [976/3236], Loss: 1.9253, Perplexity: 6.8574
Epoch [4/5], Step [977/3236], Loss: 1.7505, Perplexity: 5.7575
Epoch [4/5], Step [978/3236], Loss: 1.9111, Perplexity: 6.7608
Epoch [4/5], Step [979/3236], Loss: 2.0023, Perplexity: 7.4063
Epoch [4/5], Step [980/3236], Loss: 1.8520, Perplexity: 6.3726
Epoch [4/5], Step [981/3236], Loss: 1.9221, Perplexity: 6.8355
Epoch [4/5], Step [982/3236], Loss: 1.8087, Perplexity:

Epoch [4/5], Step [1096/3236], Loss: 1.8517, Perplexity: 6.3709
Epoch [4/5], Step [1097/3236], Loss: 1.9505, Perplexity: 7.0324
Epoch [4/5], Step [1098/3236], Loss: 1.8635, Perplexity: 6.4462
Epoch [4/5], Step [1099/3236], Loss: 1.9004, Perplexity: 6.6884
Epoch [4/5], Step [1100/3236], Loss: 1.9691, Perplexity: 7.1642
Epoch [4/5], Step [1101/3236], Loss: 1.8300, Perplexity: 6.2337
Epoch [4/5], Step [1102/3236], Loss: 1.9331, Perplexity: 6.9110
Epoch [4/5], Step [1103/3236], Loss: 2.1430, Perplexity: 8.5247
Epoch [4/5], Step [1104/3236], Loss: 2.2099, Perplexity: 9.1150
Epoch [4/5], Step [1105/3236], Loss: 2.2520, Perplexity: 9.5066
Epoch [4/5], Step [1106/3236], Loss: 2.3896, Perplexity: 10.9086
Epoch [4/5], Step [1107/3236], Loss: 1.9396, Perplexity: 6.9559
Epoch [4/5], Step [1108/3236], Loss: 1.9192, Perplexity: 6.8155
Epoch [4/5], Step [1109/3236], Loss: 1.8220, Perplexity: 6.1840
Epoch [4/5], Step [1110/3236], Loss: 1.9215, Perplexity: 6.8313
Epoch [4/5], Step [1111/3236], Loss: 1.

Epoch [4/5], Step [1224/3236], Loss: 1.9163, Perplexity: 6.7958
Epoch [4/5], Step [1225/3236], Loss: 1.8658, Perplexity: 6.4611
Epoch [4/5], Step [1226/3236], Loss: 1.9938, Perplexity: 7.3434
Epoch [4/5], Step [1227/3236], Loss: 1.9249, Perplexity: 6.8545
Epoch [4/5], Step [1228/3236], Loss: 1.8866, Perplexity: 6.5968
Epoch [4/5], Step [1229/3236], Loss: 1.8891, Perplexity: 6.6134
Epoch [4/5], Step [1230/3236], Loss: 1.8996, Perplexity: 6.6830
Epoch [4/5], Step [1231/3236], Loss: 1.8467, Perplexity: 6.3388
Epoch [4/5], Step [1232/3236], Loss: 1.8312, Perplexity: 6.2411
Epoch [4/5], Step [1233/3236], Loss: 1.9956, Perplexity: 7.3568
Epoch [4/5], Step [1234/3236], Loss: 1.8170, Perplexity: 6.1531
Epoch [4/5], Step [1235/3236], Loss: 2.0768, Perplexity: 7.9787
Epoch [4/5], Step [1236/3236], Loss: 1.8796, Perplexity: 6.5509
Epoch [4/5], Step [1237/3236], Loss: 1.8761, Perplexity: 6.5282
Epoch [4/5], Step [1238/3236], Loss: 1.8650, Perplexity: 6.4561
Epoch [4/5], Step [1239/3236], Loss: 2.0

Epoch [4/5], Step [1352/3236], Loss: 1.9173, Perplexity: 6.8027
Epoch [4/5], Step [1353/3236], Loss: 1.8730, Perplexity: 6.5079
Epoch [4/5], Step [1354/3236], Loss: 1.7512, Perplexity: 5.7614
Epoch [4/5], Step [1355/3236], Loss: 1.8020, Perplexity: 6.0617
Epoch [4/5], Step [1356/3236], Loss: 1.8036, Perplexity: 6.0718
Epoch [4/5], Step [1357/3236], Loss: 1.8953, Perplexity: 6.6548
Epoch [4/5], Step [1358/3236], Loss: 1.8750, Perplexity: 6.5210
Epoch [4/5], Step [1359/3236], Loss: 1.7960, Perplexity: 6.0254
Epoch [4/5], Step [1360/3236], Loss: 1.7645, Perplexity: 5.8386
Epoch [4/5], Step [1361/3236], Loss: 1.9077, Perplexity: 6.7373
Epoch [4/5], Step [1362/3236], Loss: 1.8077, Perplexity: 6.0964
Epoch [4/5], Step [1363/3236], Loss: 1.9428, Perplexity: 6.9780
Epoch [4/5], Step [1364/3236], Loss: 2.0873, Perplexity: 8.0631
Epoch [4/5], Step [1365/3236], Loss: 2.1895, Perplexity: 8.9305
Epoch [4/5], Step [1366/3236], Loss: 2.0721, Perplexity: 7.9412
Epoch [4/5], Step [1367/3236], Loss: 1.8

Epoch [4/5], Step [1480/3236], Loss: 1.7520, Perplexity: 5.7663
Epoch [4/5], Step [1481/3236], Loss: 1.9268, Perplexity: 6.8677
Epoch [4/5], Step [1482/3236], Loss: 2.1618, Perplexity: 8.6864
Epoch [4/5], Step [1483/3236], Loss: 1.8967, Perplexity: 6.6638
Epoch [4/5], Step [1484/3236], Loss: 1.7820, Perplexity: 5.9415
Epoch [4/5], Step [1485/3236], Loss: 1.7654, Perplexity: 5.8440
Epoch [4/5], Step [1486/3236], Loss: 1.9686, Perplexity: 7.1606
Epoch [4/5], Step [1487/3236], Loss: 1.9289, Perplexity: 6.8822
Epoch [4/5], Step [1488/3236], Loss: 1.9002, Perplexity: 6.6871
Epoch [4/5], Step [1489/3236], Loss: 1.9224, Perplexity: 6.8375
Epoch [4/5], Step [1490/3236], Loss: 1.9170, Perplexity: 6.8003
Epoch [4/5], Step [1491/3236], Loss: 1.8564, Perplexity: 6.4007
Epoch [4/5], Step [1492/3236], Loss: 1.8934, Perplexity: 6.6419
Epoch [4/5], Step [1493/3236], Loss: 1.7082, Perplexity: 5.5192
Epoch [4/5], Step [1494/3236], Loss: 1.7307, Perplexity: 5.6443
Epoch [4/5], Step [1495/3236], Loss: 2.0

Epoch [4/5], Step [1608/3236], Loss: 1.9639, Perplexity: 7.1270
Epoch [4/5], Step [1609/3236], Loss: 1.8059, Perplexity: 6.0854
Epoch [4/5], Step [1610/3236], Loss: 1.7547, Perplexity: 5.7818
Epoch [4/5], Step [1611/3236], Loss: 1.8987, Perplexity: 6.6775
Epoch [4/5], Step [1612/3236], Loss: 2.2843, Perplexity: 9.8184
Epoch [4/5], Step [1613/3236], Loss: 1.7301, Perplexity: 5.6410
Epoch [4/5], Step [1614/3236], Loss: 1.8170, Perplexity: 6.1532
Epoch [4/5], Step [1615/3236], Loss: 1.8154, Perplexity: 6.1436
Epoch [4/5], Step [1616/3236], Loss: 2.5246, Perplexity: 12.4860
Epoch [4/5], Step [1617/3236], Loss: 1.8418, Perplexity: 6.3076
Epoch [4/5], Step [1618/3236], Loss: 1.8683, Perplexity: 6.4775
Epoch [4/5], Step [1619/3236], Loss: 1.8667, Perplexity: 6.4667
Epoch [4/5], Step [1620/3236], Loss: 2.5513, Perplexity: 12.8233
Epoch [4/5], Step [1621/3236], Loss: 1.8897, Perplexity: 6.6176
Epoch [4/5], Step [1622/3236], Loss: 1.8852, Perplexity: 6.5876
Epoch [4/5], Step [1623/3236], Loss: 1

Epoch [4/5], Step [1736/3236], Loss: 1.8315, Perplexity: 6.2435
Epoch [4/5], Step [1737/3236], Loss: 1.7669, Perplexity: 5.8529
Epoch [4/5], Step [1738/3236], Loss: 1.8236, Perplexity: 6.1938
Epoch [4/5], Step [1739/3236], Loss: 1.8348, Perplexity: 6.2641
Epoch [4/5], Step [1740/3236], Loss: 2.0646, Perplexity: 7.8818
Epoch [4/5], Step [1741/3236], Loss: 1.7515, Perplexity: 5.7631
Epoch [4/5], Step [1742/3236], Loss: 1.8982, Perplexity: 6.6739
Epoch [4/5], Step [1743/3236], Loss: 1.8058, Perplexity: 6.0846
Epoch [4/5], Step [1744/3236], Loss: 2.2375, Perplexity: 9.3698
Epoch [4/5], Step [1745/3236], Loss: 2.1096, Perplexity: 8.2447
Epoch [4/5], Step [1746/3236], Loss: 1.8430, Perplexity: 6.3156
Epoch [4/5], Step [1747/3236], Loss: 2.2806, Perplexity: 9.7826
Epoch [4/5], Step [1748/3236], Loss: 1.7712, Perplexity: 5.8780
Epoch [4/5], Step [1749/3236], Loss: 1.7933, Perplexity: 6.0090
Epoch [4/5], Step [1750/3236], Loss: 1.8580, Perplexity: 6.4109
Epoch [4/5], Step [1751/3236], Loss: 2.1

Epoch [4/5], Step [1864/3236], Loss: 1.8418, Perplexity: 6.3079
Epoch [4/5], Step [1865/3236], Loss: 1.8856, Perplexity: 6.5901
Epoch [4/5], Step [1866/3236], Loss: 2.0213, Perplexity: 7.5478
Epoch [4/5], Step [1867/3236], Loss: 2.5911, Perplexity: 13.3444
Epoch [4/5], Step [1868/3236], Loss: 1.9361, Perplexity: 6.9319
Epoch [4/5], Step [1869/3236], Loss: 2.0267, Perplexity: 7.5888
Epoch [4/5], Step [1870/3236], Loss: 1.8724, Perplexity: 6.5037
Epoch [4/5], Step [1871/3236], Loss: 2.4338, Perplexity: 11.4019
Epoch [4/5], Step [1872/3236], Loss: 1.9153, Perplexity: 6.7892
Epoch [4/5], Step [1873/3236], Loss: 2.3635, Perplexity: 10.6277
Epoch [4/5], Step [1874/3236], Loss: 1.9936, Perplexity: 7.3420
Epoch [4/5], Step [1875/3236], Loss: 1.7345, Perplexity: 5.6662
Epoch [4/5], Step [1876/3236], Loss: 1.9294, Perplexity: 6.8856
Epoch [4/5], Step [1877/3236], Loss: 1.8456, Perplexity: 6.3321
Epoch [4/5], Step [1878/3236], Loss: 1.9701, Perplexity: 7.1712
Epoch [4/5], Step [1879/3236], Loss: 

Epoch [4/5], Step [1992/3236], Loss: 1.7911, Perplexity: 5.9960
Epoch [4/5], Step [1993/3236], Loss: 1.8979, Perplexity: 6.6718
Epoch [4/5], Step [1994/3236], Loss: 1.9055, Perplexity: 6.7229
Epoch [4/5], Step [1995/3236], Loss: 1.9448, Perplexity: 6.9922
Epoch [4/5], Step [1996/3236], Loss: 2.2302, Perplexity: 9.3015
Epoch [4/5], Step [1997/3236], Loss: 1.8963, Perplexity: 6.6613
Epoch [4/5], Step [1998/3236], Loss: 1.8798, Perplexity: 6.5523
Epoch [4/5], Step [1999/3236], Loss: 1.9209, Perplexity: 6.8268
Epoch [4/5], Step [2000/3236], Loss: 1.8806, Perplexity: 6.5573
Epoch [4/5], Step [2001/3236], Loss: 1.9572, Perplexity: 7.0796
Epoch [4/5], Step [2002/3236], Loss: 1.7892, Perplexity: 5.9848
Epoch [4/5], Step [2003/3236], Loss: 1.9193, Perplexity: 6.8161
Epoch [4/5], Step [2004/3236], Loss: 1.7349, Perplexity: 5.6685
Epoch [4/5], Step [2005/3236], Loss: 1.9162, Perplexity: 6.7948
Epoch [4/5], Step [2006/3236], Loss: 1.8132, Perplexity: 6.1301
Epoch [4/5], Step [2007/3236], Loss: 1.9

Epoch [4/5], Step [2120/3236], Loss: 1.8780, Perplexity: 6.5407
Epoch [4/5], Step [2121/3236], Loss: 1.9944, Perplexity: 7.3482
Epoch [4/5], Step [2122/3236], Loss: 2.0466, Perplexity: 7.7415
Epoch [4/5], Step [2123/3236], Loss: 1.7848, Perplexity: 5.9581
Epoch [4/5], Step [2124/3236], Loss: 1.7523, Perplexity: 5.7677
Epoch [4/5], Step [2125/3236], Loss: 1.8580, Perplexity: 6.4106
Epoch [4/5], Step [2126/3236], Loss: 2.4103, Perplexity: 11.1371
Epoch [4/5], Step [2127/3236], Loss: 1.8429, Perplexity: 6.3150
Epoch [4/5], Step [2128/3236], Loss: 1.8683, Perplexity: 6.4770
Epoch [4/5], Step [2129/3236], Loss: 1.8262, Perplexity: 6.2102
Epoch [4/5], Step [2130/3236], Loss: 1.8010, Perplexity: 6.0555
Epoch [4/5], Step [2131/3236], Loss: 1.9799, Perplexity: 7.2419
Epoch [4/5], Step [2132/3236], Loss: 1.8260, Perplexity: 6.2090
Epoch [4/5], Step [2133/3236], Loss: 1.8106, Perplexity: 6.1142
Epoch [4/5], Step [2134/3236], Loss: 1.7532, Perplexity: 5.7728
Epoch [4/5], Step [2135/3236], Loss: 1.

Epoch [4/5], Step [2248/3236], Loss: 1.9824, Perplexity: 7.2604
Epoch [4/5], Step [2249/3236], Loss: 1.9548, Perplexity: 7.0625
Epoch [4/5], Step [2250/3236], Loss: 2.4415, Perplexity: 11.4904
Epoch [4/5], Step [2251/3236], Loss: 1.9853, Perplexity: 7.2812
Epoch [4/5], Step [2252/3236], Loss: 1.8142, Perplexity: 6.1365
Epoch [4/5], Step [2253/3236], Loss: 1.7389, Perplexity: 5.6910
Epoch [4/5], Step [2254/3236], Loss: 1.8642, Perplexity: 6.4508
Epoch [4/5], Step [2255/3236], Loss: 1.8073, Perplexity: 6.0941
Epoch [4/5], Step [2256/3236], Loss: 2.0529, Perplexity: 7.7908
Epoch [4/5], Step [2257/3236], Loss: 1.8650, Perplexity: 6.4557
Epoch [4/5], Step [2258/3236], Loss: 1.7852, Perplexity: 5.9607
Epoch [4/5], Step [2259/3236], Loss: 2.1247, Perplexity: 8.3703
Epoch [4/5], Step [2260/3236], Loss: 1.7484, Perplexity: 5.7451
Epoch [4/5], Step [2261/3236], Loss: 1.8048, Perplexity: 6.0786
Epoch [4/5], Step [2262/3236], Loss: 1.8740, Perplexity: 6.5142
Epoch [4/5], Step [2263/3236], Loss: 1.

Epoch [4/5], Step [2376/3236], Loss: 1.7839, Perplexity: 5.9530
Epoch [4/5], Step [2377/3236], Loss: 1.7709, Perplexity: 5.8764
Epoch [4/5], Step [2378/3236], Loss: 2.3482, Perplexity: 10.4662
Epoch [4/5], Step [2379/3236], Loss: 2.0999, Perplexity: 8.1655
Epoch [4/5], Step [2380/3236], Loss: 2.0653, Perplexity: 7.8873
Epoch [4/5], Step [2381/3236], Loss: 1.8628, Perplexity: 6.4417
Epoch [4/5], Step [2382/3236], Loss: 1.7807, Perplexity: 5.9337
Epoch [4/5], Step [2383/3236], Loss: 1.8405, Perplexity: 6.2994
Epoch [4/5], Step [2384/3236], Loss: 1.9440, Perplexity: 6.9865
Epoch [4/5], Step [2385/3236], Loss: 1.8219, Perplexity: 6.1834
Epoch [4/5], Step [2386/3236], Loss: 1.9188, Perplexity: 6.8126
Epoch [4/5], Step [2387/3236], Loss: 2.0518, Perplexity: 7.7816
Epoch [4/5], Step [2388/3236], Loss: 2.1034, Perplexity: 8.1942
Epoch [4/5], Step [2389/3236], Loss: 1.7611, Perplexity: 5.8189
Epoch [4/5], Step [2390/3236], Loss: 1.8767, Perplexity: 6.5319
Epoch [4/5], Step [2391/3236], Loss: 1.

Epoch [4/5], Step [2504/3236], Loss: 1.8314, Perplexity: 6.2429
Epoch [4/5], Step [2505/3236], Loss: 1.8437, Perplexity: 6.3201
Epoch [4/5], Step [2506/3236], Loss: 2.0628, Perplexity: 7.8678
Epoch [4/5], Step [2507/3236], Loss: 1.8722, Perplexity: 6.5026
Epoch [4/5], Step [2508/3236], Loss: 1.7573, Perplexity: 5.7966
Epoch [4/5], Step [2509/3236], Loss: 1.8355, Perplexity: 6.2681
Epoch [4/5], Step [2510/3236], Loss: 1.9591, Perplexity: 7.0927
Epoch [4/5], Step [2511/3236], Loss: 1.8437, Perplexity: 6.3196
Epoch [4/5], Step [2512/3236], Loss: 1.8468, Perplexity: 6.3397
Epoch [4/5], Step [2513/3236], Loss: 1.9430, Perplexity: 6.9797
Epoch [4/5], Step [2514/3236], Loss: 1.9445, Perplexity: 6.9903
Epoch [4/5], Step [2515/3236], Loss: 1.8490, Perplexity: 6.3532
Epoch [4/5], Step [2516/3236], Loss: 2.1610, Perplexity: 8.6797
Epoch [4/5], Step [2517/3236], Loss: 1.9616, Perplexity: 7.1107
Epoch [4/5], Step [2518/3236], Loss: 1.9081, Perplexity: 6.7404
Epoch [4/5], Step [2519/3236], Loss: 1.9

Epoch [4/5], Step [2632/3236], Loss: 2.8941, Perplexity: 18.0671
Epoch [4/5], Step [2633/3236], Loss: 1.9610, Perplexity: 7.1066
Epoch [4/5], Step [2634/3236], Loss: 1.9589, Perplexity: 7.0913
Epoch [4/5], Step [2635/3236], Loss: 1.8758, Perplexity: 6.5258
Epoch [4/5], Step [2636/3236], Loss: 1.8690, Perplexity: 6.4816
Epoch [4/5], Step [2637/3236], Loss: 1.7704, Perplexity: 5.8731
Epoch [4/5], Step [2638/3236], Loss: 1.7108, Perplexity: 5.5336
Epoch [4/5], Step [2639/3236], Loss: 2.0760, Perplexity: 7.9724
Epoch [4/5], Step [2640/3236], Loss: 1.8125, Perplexity: 6.1255
Epoch [4/5], Step [2641/3236], Loss: 1.8194, Perplexity: 6.1684
Epoch [4/5], Step [2642/3236], Loss: 1.8599, Perplexity: 6.4229
Epoch [4/5], Step [2643/3236], Loss: 1.8107, Perplexity: 6.1146
Epoch [4/5], Step [2644/3236], Loss: 1.9655, Perplexity: 7.1386
Epoch [4/5], Step [2645/3236], Loss: 1.8413, Perplexity: 6.3050
Epoch [4/5], Step [2646/3236], Loss: 1.8409, Perplexity: 6.3024
Epoch [4/5], Step [2647/3236], Loss: 2.

Epoch [4/5], Step [2760/3236], Loss: 1.8941, Perplexity: 6.6468
Epoch [4/5], Step [2761/3236], Loss: 1.7267, Perplexity: 5.6218
Epoch [4/5], Step [2762/3236], Loss: 1.9552, Perplexity: 7.0652
Epoch [4/5], Step [2763/3236], Loss: 2.1880, Perplexity: 8.9176
Epoch [4/5], Step [2764/3236], Loss: 1.8654, Perplexity: 6.4586
Epoch [4/5], Step [2765/3236], Loss: 1.6838, Perplexity: 5.3861
Epoch [4/5], Step [2766/3236], Loss: 1.8427, Perplexity: 6.3135
Epoch [4/5], Step [2767/3236], Loss: 1.8118, Perplexity: 6.1212
Epoch [4/5], Step [2768/3236], Loss: 1.9494, Perplexity: 7.0248
Epoch [4/5], Step [2769/3236], Loss: 1.8365, Perplexity: 6.2746
Epoch [4/5], Step [2770/3236], Loss: 1.8011, Perplexity: 6.0564
Epoch [4/5], Step [2771/3236], Loss: 1.8221, Perplexity: 6.1849
Epoch [4/5], Step [2772/3236], Loss: 1.8687, Perplexity: 6.4797
Epoch [4/5], Step [2773/3236], Loss: 2.0338, Perplexity: 7.6434
Epoch [4/5], Step [2774/3236], Loss: 1.8512, Perplexity: 6.3671
Epoch [4/5], Step [2775/3236], Loss: 2.5

Epoch [4/5], Step [2888/3236], Loss: 2.1646, Perplexity: 8.7113
Epoch [4/5], Step [2889/3236], Loss: 1.8389, Perplexity: 6.2898
Epoch [4/5], Step [2890/3236], Loss: 1.7736, Perplexity: 5.8921
Epoch [4/5], Step [2891/3236], Loss: 1.8372, Perplexity: 6.2792
Epoch [4/5], Step [2892/3236], Loss: 1.8393, Perplexity: 6.2920
Epoch [4/5], Step [2893/3236], Loss: 1.7065, Perplexity: 5.5095
Epoch [4/5], Step [2894/3236], Loss: 1.9320, Perplexity: 6.9036
Epoch [4/5], Step [2895/3236], Loss: 1.7669, Perplexity: 5.8525
Epoch [4/5], Step [2896/3236], Loss: 1.7094, Perplexity: 5.5258
Epoch [4/5], Step [2897/3236], Loss: 2.2084, Perplexity: 9.1014
Epoch [4/5], Step [2898/3236], Loss: 2.0457, Perplexity: 7.7345
Epoch [4/5], Step [2899/3236], Loss: 2.2389, Perplexity: 9.3832
Epoch [4/5], Step [2900/3236], Loss: 1.8169, Perplexity: 6.1530
Epoch [4/5], Step [2901/3236], Loss: 2.2970, Perplexity: 9.9444
Epoch [4/5], Step [2902/3236], Loss: 1.8615, Perplexity: 6.4333
Epoch [4/5], Step [2903/3236], Loss: 1.9

Epoch [4/5], Step [3016/3236], Loss: 1.8523, Perplexity: 6.3746
Epoch [4/5], Step [3017/3236], Loss: 1.8807, Perplexity: 6.5581
Epoch [4/5], Step [3018/3236], Loss: 1.7815, Perplexity: 5.9390
Epoch [4/5], Step [3019/3236], Loss: 2.0422, Perplexity: 7.7077
Epoch [4/5], Step [3020/3236], Loss: 1.8035, Perplexity: 6.0711
Epoch [4/5], Step [3021/3236], Loss: 1.9745, Perplexity: 7.2030
Epoch [4/5], Step [3022/3236], Loss: 2.1235, Perplexity: 8.3602
Epoch [4/5], Step [3023/3236], Loss: 1.8334, Perplexity: 6.2548
Epoch [4/5], Step [3024/3236], Loss: 2.2445, Perplexity: 9.4355
Epoch [4/5], Step [3025/3236], Loss: 1.8799, Perplexity: 6.5531
Epoch [4/5], Step [3026/3236], Loss: 2.0784, Perplexity: 7.9917
Epoch [4/5], Step [3027/3236], Loss: 1.7859, Perplexity: 5.9649
Epoch [4/5], Step [3028/3236], Loss: 1.8190, Perplexity: 6.1655
Epoch [4/5], Step [3029/3236], Loss: 2.0264, Perplexity: 7.5869
Epoch [4/5], Step [3030/3236], Loss: 1.8636, Perplexity: 6.4470
Epoch [4/5], Step [3031/3236], Loss: 1.8

Epoch [4/5], Step [3144/3236], Loss: 1.8897, Perplexity: 6.6175
Epoch [4/5], Step [3145/3236], Loss: 1.8374, Perplexity: 6.2804
Epoch [4/5], Step [3146/3236], Loss: 1.8842, Perplexity: 6.5812
Epoch [4/5], Step [3147/3236], Loss: 1.7889, Perplexity: 5.9829
Epoch [4/5], Step [3148/3236], Loss: 2.0352, Perplexity: 7.6541
Epoch [4/5], Step [3149/3236], Loss: 1.8468, Perplexity: 6.3398
Epoch [4/5], Step [3150/3236], Loss: 1.9216, Perplexity: 6.8319
Epoch [4/5], Step [3151/3236], Loss: 1.8990, Perplexity: 6.6789
Epoch [4/5], Step [3152/3236], Loss: 1.8141, Perplexity: 6.1355
Epoch [4/5], Step [3153/3236], Loss: 1.8120, Perplexity: 6.1227
Epoch [4/5], Step [3154/3236], Loss: 1.7745, Perplexity: 5.8975
Epoch [4/5], Step [3155/3236], Loss: 1.8923, Perplexity: 6.6343
Epoch [4/5], Step [3156/3236], Loss: 1.8801, Perplexity: 6.5542
Epoch [4/5], Step [3157/3236], Loss: 2.1527, Perplexity: 8.6084
Epoch [4/5], Step [3158/3236], Loss: 1.8186, Perplexity: 6.1634
Epoch [4/5], Step [3159/3236], Loss: 1.8

Epoch [5/5], Step [38/3236], Loss: 1.8204, Perplexity: 6.1746
Epoch [5/5], Step [39/3236], Loss: 1.7624, Perplexity: 5.8266
Epoch [5/5], Step [40/3236], Loss: 1.8750, Perplexity: 6.5207
Epoch [5/5], Step [41/3236], Loss: 1.7318, Perplexity: 5.6509
Epoch [5/5], Step [42/3236], Loss: 1.9344, Perplexity: 6.9196
Epoch [5/5], Step [43/3236], Loss: 1.8012, Perplexity: 6.0570
Epoch [5/5], Step [44/3236], Loss: 1.9052, Perplexity: 6.7207
Epoch [5/5], Step [45/3236], Loss: 1.7541, Perplexity: 5.7783
Epoch [5/5], Step [46/3236], Loss: 2.0347, Perplexity: 7.6499
Epoch [5/5], Step [47/3236], Loss: 1.8097, Perplexity: 6.1086
Epoch [5/5], Step [48/3236], Loss: 1.9013, Perplexity: 6.6946
Epoch [5/5], Step [49/3236], Loss: 1.9041, Perplexity: 6.7136
Epoch [5/5], Step [50/3236], Loss: 1.9029, Perplexity: 6.7054
Epoch [5/5], Step [51/3236], Loss: 1.8969, Perplexity: 6.6653
Epoch [5/5], Step [52/3236], Loss: 2.0574, Perplexity: 7.8253
Epoch [5/5], Step [53/3236], Loss: 1.8273, Perplexity: 6.2168
Epoch [5

Epoch [5/5], Step [169/3236], Loss: 1.8251, Perplexity: 6.2032
Epoch [5/5], Step [170/3236], Loss: 1.8897, Perplexity: 6.6171
Epoch [5/5], Step [171/3236], Loss: 1.9605, Perplexity: 7.1032
Epoch [5/5], Step [172/3236], Loss: 1.9759, Perplexity: 7.2129
Epoch [5/5], Step [173/3236], Loss: 1.7996, Perplexity: 6.0470
Epoch [5/5], Step [174/3236], Loss: 1.8922, Perplexity: 6.6336
Epoch [5/5], Step [175/3236], Loss: 1.8039, Perplexity: 6.0734
Epoch [5/5], Step [176/3236], Loss: 2.0280, Perplexity: 7.5987
Epoch [5/5], Step [177/3236], Loss: 1.9576, Perplexity: 7.0824
Epoch [5/5], Step [178/3236], Loss: 1.8638, Perplexity: 6.4480
Epoch [5/5], Step [179/3236], Loss: 1.8387, Perplexity: 6.2885
Epoch [5/5], Step [180/3236], Loss: 1.9621, Perplexity: 7.1144
Epoch [5/5], Step [181/3236], Loss: 2.0001, Perplexity: 7.3898
Epoch [5/5], Step [182/3236], Loss: 1.9533, Perplexity: 7.0519
Epoch [5/5], Step [183/3236], Loss: 1.8785, Perplexity: 6.5440
Epoch [5/5], Step [184/3236], Loss: 2.7638, Perplexity:

Epoch [5/5], Step [299/3236], Loss: 1.8946, Perplexity: 6.6497
Epoch [5/5], Step [300/3236], Loss: 1.9024, Perplexity: 6.7019
Epoch [5/5], Step [301/3236], Loss: 1.9291, Perplexity: 6.8836
Epoch [5/5], Step [302/3236], Loss: 1.7774, Perplexity: 5.9146
Epoch [5/5], Step [303/3236], Loss: 1.8671, Perplexity: 6.4692
Epoch [5/5], Step [304/3236], Loss: 1.7560, Perplexity: 5.7894
Epoch [5/5], Step [305/3236], Loss: 1.8480, Perplexity: 6.3471
Epoch [5/5], Step [306/3236], Loss: 1.9224, Perplexity: 6.8376
Epoch [5/5], Step [307/3236], Loss: 1.8771, Perplexity: 6.5344
Epoch [5/5], Step [308/3236], Loss: 1.8332, Perplexity: 6.2536
Epoch [5/5], Step [309/3236], Loss: 1.8087, Perplexity: 6.1023
Epoch [5/5], Step [310/3236], Loss: 1.8092, Perplexity: 6.1054
Epoch [5/5], Step [311/3236], Loss: 2.0788, Perplexity: 7.9948
Epoch [5/5], Step [312/3236], Loss: 2.1045, Perplexity: 8.2032
Epoch [5/5], Step [313/3236], Loss: 1.7416, Perplexity: 5.7063
Epoch [5/5], Step [314/3236], Loss: 1.7739, Perplexity:

Epoch [5/5], Step [429/3236], Loss: 1.9040, Perplexity: 6.7127
Epoch [5/5], Step [430/3236], Loss: 1.9236, Perplexity: 6.8453
Epoch [5/5], Step [431/3236], Loss: 2.4045, Perplexity: 11.0728
Epoch [5/5], Step [432/3236], Loss: 1.8264, Perplexity: 6.2112
Epoch [5/5], Step [433/3236], Loss: 1.7874, Perplexity: 5.9736
Epoch [5/5], Step [434/3236], Loss: 1.8934, Perplexity: 6.6417
Epoch [5/5], Step [435/3236], Loss: 1.9341, Perplexity: 6.9178
Epoch [5/5], Step [436/3236], Loss: 2.5969, Perplexity: 13.4227
Epoch [5/5], Step [437/3236], Loss: 1.9963, Perplexity: 7.3614
Epoch [5/5], Step [438/3236], Loss: 1.9958, Perplexity: 7.3584
Epoch [5/5], Step [439/3236], Loss: 2.0719, Perplexity: 7.9402
Epoch [5/5], Step [440/3236], Loss: 1.9814, Perplexity: 7.2530
Epoch [5/5], Step [441/3236], Loss: 1.9902, Perplexity: 7.3171
Epoch [5/5], Step [442/3236], Loss: 1.8880, Perplexity: 6.6064
Epoch [5/5], Step [443/3236], Loss: 1.8926, Perplexity: 6.6365
Epoch [5/5], Step [444/3236], Loss: 1.8157, Perplexit

Epoch [5/5], Step [559/3236], Loss: 1.9696, Perplexity: 7.1680
Epoch [5/5], Step [560/3236], Loss: 2.2459, Perplexity: 9.4493
Epoch [5/5], Step [561/3236], Loss: 1.7324, Perplexity: 5.6540
Epoch [5/5], Step [562/3236], Loss: 1.8677, Perplexity: 6.4731
Epoch [5/5], Step [563/3236], Loss: 1.9707, Perplexity: 7.1757
Epoch [5/5], Step [564/3236], Loss: 2.1362, Perplexity: 8.4673
Epoch [5/5], Step [565/3236], Loss: 1.9417, Perplexity: 6.9703
Epoch [5/5], Step [566/3236], Loss: 1.8247, Perplexity: 6.2011
Epoch [5/5], Step [567/3236], Loss: 2.1034, Perplexity: 8.1938
Epoch [5/5], Step [568/3236], Loss: 1.7784, Perplexity: 5.9204
Epoch [5/5], Step [569/3236], Loss: 2.8087, Perplexity: 16.5881
Epoch [5/5], Step [570/3236], Loss: 1.9188, Perplexity: 6.8129
Epoch [5/5], Step [571/3236], Loss: 2.1845, Perplexity: 8.8858
Epoch [5/5], Step [572/3236], Loss: 1.8109, Perplexity: 6.1160
Epoch [5/5], Step [573/3236], Loss: 1.7183, Perplexity: 5.5753
Epoch [5/5], Step [574/3236], Loss: 1.9689, Perplexity

Epoch [5/5], Step [689/3236], Loss: 1.9387, Perplexity: 6.9500
Epoch [5/5], Step [690/3236], Loss: 1.8113, Perplexity: 6.1184
Epoch [5/5], Step [691/3236], Loss: 2.1062, Perplexity: 8.2170
Epoch [5/5], Step [692/3236], Loss: 1.8395, Perplexity: 6.2936
Epoch [5/5], Step [693/3236], Loss: 1.8148, Perplexity: 6.1397
Epoch [5/5], Step [694/3236], Loss: 1.8151, Perplexity: 6.1418
Epoch [5/5], Step [695/3236], Loss: 1.9880, Perplexity: 7.3006
Epoch [5/5], Step [696/3236], Loss: 2.1485, Perplexity: 8.5716
Epoch [5/5], Step [697/3236], Loss: 1.8821, Perplexity: 6.5670
Epoch [5/5], Step [698/3236], Loss: 1.8486, Perplexity: 6.3510
Epoch [5/5], Step [699/3236], Loss: 1.9061, Perplexity: 6.7267
Epoch [5/5], Step [700/3236], Loss: 1.8461, Perplexity: 6.3353
Epoch [5/5], Step [701/3236], Loss: 1.7688, Perplexity: 5.8640
Epoch [5/5], Step [702/3236], Loss: 1.9788, Perplexity: 7.2339
Epoch [5/5], Step [703/3236], Loss: 2.3367, Perplexity: 10.3466
Epoch [5/5], Step [704/3236], Loss: 1.8601, Perplexity

Epoch [5/5], Step [819/3236], Loss: 1.9226, Perplexity: 6.8385
Epoch [5/5], Step [820/3236], Loss: 1.8595, Perplexity: 6.4204
Epoch [5/5], Step [821/3236], Loss: 1.8186, Perplexity: 6.1630
Epoch [5/5], Step [822/3236], Loss: 1.7464, Perplexity: 5.7341
Epoch [5/5], Step [823/3236], Loss: 1.6831, Perplexity: 5.3823
Epoch [5/5], Step [824/3236], Loss: 1.6698, Perplexity: 5.3112
Epoch [5/5], Step [825/3236], Loss: 2.0237, Perplexity: 7.5665
Epoch [5/5], Step [826/3236], Loss: 2.3581, Perplexity: 10.5713
Epoch [5/5], Step [827/3236], Loss: 2.1160, Perplexity: 8.2980
Epoch [5/5], Step [828/3236], Loss: 1.7966, Perplexity: 6.0289
Epoch [5/5], Step [829/3236], Loss: 1.8978, Perplexity: 6.6714
Epoch [5/5], Step [830/3236], Loss: 1.8126, Perplexity: 6.1260
Epoch [5/5], Step [831/3236], Loss: 1.8114, Perplexity: 6.1192
Epoch [5/5], Step [832/3236], Loss: 1.9222, Perplexity: 6.8357
Epoch [5/5], Step [833/3236], Loss: 1.8563, Perplexity: 6.3999
Epoch [5/5], Step [834/3236], Loss: 1.8469, Perplexity

Epoch [5/5], Step [949/3236], Loss: 1.7266, Perplexity: 5.6213
Epoch [5/5], Step [950/3236], Loss: 1.8027, Perplexity: 6.0660
Epoch [5/5], Step [951/3236], Loss: 1.6905, Perplexity: 5.4223
Epoch [5/5], Step [952/3236], Loss: 1.9172, Perplexity: 6.8019
Epoch [5/5], Step [953/3236], Loss: 1.7142, Perplexity: 5.5523
Epoch [5/5], Step [954/3236], Loss: 1.8925, Perplexity: 6.6359
Epoch [5/5], Step [955/3236], Loss: 2.0659, Perplexity: 7.8928
Epoch [5/5], Step [956/3236], Loss: 1.8615, Perplexity: 6.4333
Epoch [5/5], Step [957/3236], Loss: 1.8248, Perplexity: 6.2017
Epoch [5/5], Step [958/3236], Loss: 1.8036, Perplexity: 6.0714
Epoch [5/5], Step [959/3236], Loss: 2.2481, Perplexity: 9.4697
Epoch [5/5], Step [960/3236], Loss: 1.6843, Perplexity: 5.3889
Epoch [5/5], Step [961/3236], Loss: 1.8473, Perplexity: 6.3425
Epoch [5/5], Step [962/3236], Loss: 1.9252, Perplexity: 6.8568
Epoch [5/5], Step [963/3236], Loss: 2.7406, Perplexity: 15.4964
Epoch [5/5], Step [964/3236], Loss: 1.8340, Perplexity

Epoch [5/5], Step [1078/3236], Loss: 1.9739, Perplexity: 7.1984
Epoch [5/5], Step [1079/3236], Loss: 1.8318, Perplexity: 6.2452
Epoch [5/5], Step [1080/3236], Loss: 1.7203, Perplexity: 5.5862
Epoch [5/5], Step [1081/3236], Loss: 1.7748, Perplexity: 5.8992
Epoch [5/5], Step [1082/3236], Loss: 1.7342, Perplexity: 5.6642
Epoch [5/5], Step [1083/3236], Loss: 1.8995, Perplexity: 6.6824
Epoch [5/5], Step [1084/3236], Loss: 1.7523, Perplexity: 5.7679
Epoch [5/5], Step [1085/3236], Loss: 1.8880, Perplexity: 6.6060
Epoch [5/5], Step [1086/3236], Loss: 1.7520, Perplexity: 5.7662
Epoch [5/5], Step [1087/3236], Loss: 1.8476, Perplexity: 6.3446
Epoch [5/5], Step [1088/3236], Loss: 2.0428, Perplexity: 7.7121
Epoch [5/5], Step [1089/3236], Loss: 1.7910, Perplexity: 5.9952
Epoch [5/5], Step [1090/3236], Loss: 1.7725, Perplexity: 5.8856
Epoch [5/5], Step [1091/3236], Loss: 1.8219, Perplexity: 6.1833
Epoch [5/5], Step [1092/3236], Loss: 1.8570, Perplexity: 6.4047
Epoch [5/5], Step [1093/3236], Loss: 1.8

Epoch [5/5], Step [1206/3236], Loss: 1.9094, Perplexity: 6.7492
Epoch [5/5], Step [1207/3236], Loss: 1.8805, Perplexity: 6.5566
Epoch [5/5], Step [1208/3236], Loss: 2.0411, Perplexity: 7.6990
Epoch [5/5], Step [1209/3236], Loss: 1.7358, Perplexity: 5.6737
Epoch [5/5], Step [1210/3236], Loss: 1.7490, Perplexity: 5.7487
Epoch [5/5], Step [1211/3236], Loss: 2.7659, Perplexity: 15.8926
Epoch [5/5], Step [1212/3236], Loss: 1.8394, Perplexity: 6.2930
Epoch [5/5], Step [1213/3236], Loss: 1.7502, Perplexity: 5.7557
Epoch [5/5], Step [1214/3236], Loss: 1.7691, Perplexity: 5.8655
Epoch [5/5], Step [1215/3236], Loss: 1.8484, Perplexity: 6.3497
Epoch [5/5], Step [1216/3236], Loss: 1.8458, Perplexity: 6.3332
Epoch [5/5], Step [1217/3236], Loss: 1.8497, Perplexity: 6.3579
Epoch [5/5], Step [1218/3236], Loss: 1.7835, Perplexity: 5.9507
Epoch [5/5], Step [1219/3236], Loss: 1.7942, Perplexity: 6.0147
Epoch [5/5], Step [1220/3236], Loss: 1.8430, Perplexity: 6.3156
Epoch [5/5], Step [1221/3236], Loss: 1.

Epoch [5/5], Step [1334/3236], Loss: 1.7847, Perplexity: 5.9577
Epoch [5/5], Step [1335/3236], Loss: 1.7668, Perplexity: 5.8521
Epoch [5/5], Step [1336/3236], Loss: 1.9136, Perplexity: 6.7776
Epoch [5/5], Step [1337/3236], Loss: 1.7158, Perplexity: 5.5614
Epoch [5/5], Step [1338/3236], Loss: 1.8619, Perplexity: 6.4358
Epoch [5/5], Step [1339/3236], Loss: 2.7746, Perplexity: 16.0315
Epoch [5/5], Step [1340/3236], Loss: 1.8903, Perplexity: 6.6215
Epoch [5/5], Step [1341/3236], Loss: 2.2077, Perplexity: 9.0947
Epoch [5/5], Step [1342/3236], Loss: 1.7722, Perplexity: 5.8836
Epoch [5/5], Step [1343/3236], Loss: 1.7763, Perplexity: 5.9082
Epoch [5/5], Step [1344/3236], Loss: 1.8125, Perplexity: 6.1255
Epoch [5/5], Step [1345/3236], Loss: 1.8615, Perplexity: 6.4332
Epoch [5/5], Step [1346/3236], Loss: 1.9330, Perplexity: 6.9100
Epoch [5/5], Step [1347/3236], Loss: 1.8152, Perplexity: 6.1424
Epoch [5/5], Step [1348/3236], Loss: 1.7858, Perplexity: 5.9644
Epoch [5/5], Step [1349/3236], Loss: 1.

Epoch [5/5], Step [1462/3236], Loss: 1.7764, Perplexity: 5.9084
Epoch [5/5], Step [1463/3236], Loss: 1.7619, Perplexity: 5.8236
Epoch [5/5], Step [1464/3236], Loss: 1.9774, Perplexity: 7.2240
Epoch [5/5], Step [1465/3236], Loss: 1.7373, Perplexity: 5.6820
Epoch [5/5], Step [1466/3236], Loss: 1.7974, Perplexity: 6.0338
Epoch [5/5], Step [1467/3236], Loss: 2.0068, Perplexity: 7.4395
Epoch [5/5], Step [1468/3236], Loss: 1.9587, Perplexity: 7.0901
Epoch [5/5], Step [1469/3236], Loss: 1.7704, Perplexity: 5.8735
Epoch [5/5], Step [1470/3236], Loss: 1.6588, Perplexity: 5.2531
Epoch [5/5], Step [1471/3236], Loss: 1.7976, Perplexity: 6.0354
Epoch [5/5], Step [1472/3236], Loss: 1.8479, Perplexity: 6.3466
Epoch [5/5], Step [1473/3236], Loss: 2.0908, Perplexity: 8.0916
Epoch [5/5], Step [1474/3236], Loss: 1.7518, Perplexity: 5.7652
Epoch [5/5], Step [1475/3236], Loss: 1.6750, Perplexity: 5.3387
Epoch [5/5], Step [1476/3236], Loss: 2.1387, Perplexity: 8.4882
Epoch [5/5], Step [1477/3236], Loss: 1.8

Epoch [5/5], Step [1590/3236], Loss: 1.7576, Perplexity: 5.7982
Epoch [5/5], Step [1591/3236], Loss: 1.7373, Perplexity: 5.6818
Epoch [5/5], Step [1592/3236], Loss: 1.8548, Perplexity: 6.3904
Epoch [5/5], Step [1593/3236], Loss: 1.7489, Perplexity: 5.7482
Epoch [5/5], Step [1594/3236], Loss: 1.8314, Perplexity: 6.2429
Epoch [5/5], Step [1595/3236], Loss: 1.8593, Perplexity: 6.4193
Epoch [5/5], Step [1596/3236], Loss: 1.9427, Perplexity: 6.9779
Epoch [5/5], Step [1597/3236], Loss: 1.9317, Perplexity: 6.9010
Epoch [5/5], Step [1598/3236], Loss: 1.7679, Perplexity: 5.8587
Epoch [5/5], Step [1599/3236], Loss: 1.8694, Perplexity: 6.4844
Epoch [5/5], Step [1600/3236], Loss: 1.9883, Perplexity: 7.3032
Epoch [5/5], Step [1601/3236], Loss: 1.8348, Perplexity: 6.2640
Epoch [5/5], Step [1602/3236], Loss: 1.7660, Perplexity: 5.8474
Epoch [5/5], Step [1603/3236], Loss: 2.0401, Perplexity: 7.6916
Epoch [5/5], Step [1604/3236], Loss: 1.7161, Perplexity: 5.5627
Epoch [5/5], Step [1605/3236], Loss: 1.8

Epoch [5/5], Step [1718/3236], Loss: 2.4528, Perplexity: 11.6214
Epoch [5/5], Step [1719/3236], Loss: 1.8051, Perplexity: 6.0808
Epoch [5/5], Step [1720/3236], Loss: 1.8106, Perplexity: 6.1140
Epoch [5/5], Step [1721/3236], Loss: 2.4091, Perplexity: 11.1234
Epoch [5/5], Step [1722/3236], Loss: 2.1980, Perplexity: 9.0067
Epoch [5/5], Step [1723/3236], Loss: 1.9892, Perplexity: 7.3095
Epoch [5/5], Step [1724/3236], Loss: 1.7439, Perplexity: 5.7197
Epoch [5/5], Step [1725/3236], Loss: 1.8998, Perplexity: 6.6843
Epoch [5/5], Step [1726/3236], Loss: 1.8905, Perplexity: 6.6228
Epoch [5/5], Step [1727/3236], Loss: 1.7525, Perplexity: 5.7693
Epoch [5/5], Step [1728/3236], Loss: 2.2213, Perplexity: 9.2194
Epoch [5/5], Step [1729/3236], Loss: 2.1137, Perplexity: 8.2788
Epoch [5/5], Step [1730/3236], Loss: 1.9761, Perplexity: 7.2147
Epoch [5/5], Step [1731/3236], Loss: 1.7858, Perplexity: 5.9641
Epoch [5/5], Step [1732/3236], Loss: 1.9557, Perplexity: 7.0690
Epoch [5/5], Step [1733/3236], Loss: 2

Epoch [5/5], Step [1846/3236], Loss: 1.7566, Perplexity: 5.7930
Epoch [5/5], Step [1847/3236], Loss: 1.7187, Perplexity: 5.5773
Epoch [5/5], Step [1848/3236], Loss: 1.7199, Perplexity: 5.5840
Epoch [5/5], Step [1849/3236], Loss: 1.9703, Perplexity: 7.1725
Epoch [5/5], Step [1850/3236], Loss: 1.9527, Perplexity: 7.0475
Epoch [5/5], Step [1851/3236], Loss: 1.8396, Perplexity: 6.2942
Epoch [5/5], Step [1852/3236], Loss: 1.9351, Perplexity: 6.9245
Epoch [5/5], Step [1853/3236], Loss: 1.7791, Perplexity: 5.9246
Epoch [5/5], Step [1854/3236], Loss: 1.7854, Perplexity: 5.9618
Epoch [5/5], Step [1855/3236], Loss: 2.0333, Perplexity: 7.6395
Epoch [5/5], Step [1856/3236], Loss: 1.7508, Perplexity: 5.7594
Epoch [5/5], Step [1857/3236], Loss: 1.9688, Perplexity: 7.1623
Epoch [5/5], Step [1858/3236], Loss: 1.9383, Perplexity: 6.9470
Epoch [5/5], Step [1859/3236], Loss: 1.7746, Perplexity: 5.8982
Epoch [5/5], Step [1860/3236], Loss: 1.7360, Perplexity: 5.6748
Epoch [5/5], Step [1861/3236], Loss: 1.8

Epoch [5/5], Step [1974/3236], Loss: 1.7010, Perplexity: 5.4793
Epoch [5/5], Step [1975/3236], Loss: 1.8139, Perplexity: 6.1343
Epoch [5/5], Step [1976/3236], Loss: 2.4840, Perplexity: 11.9893
Epoch [5/5], Step [1977/3236], Loss: 1.8001, Perplexity: 6.0500
Epoch [5/5], Step [1978/3236], Loss: 1.8756, Perplexity: 6.5247
Epoch [5/5], Step [1979/3236], Loss: 1.7727, Perplexity: 5.8867
Epoch [5/5], Step [1980/3236], Loss: 1.9956, Perplexity: 7.3564
Epoch [5/5], Step [1981/3236], Loss: 2.0020, Perplexity: 7.4038
Epoch [5/5], Step [1982/3236], Loss: 1.7965, Perplexity: 6.0284
Epoch [5/5], Step [1983/3236], Loss: 1.7195, Perplexity: 5.5816
Epoch [5/5], Step [1984/3236], Loss: 1.8793, Perplexity: 6.5492
Epoch [5/5], Step [1985/3236], Loss: 1.8294, Perplexity: 6.2303
Epoch [5/5], Step [1986/3236], Loss: 1.6913, Perplexity: 5.4267
Epoch [5/5], Step [1987/3236], Loss: 1.7280, Perplexity: 5.6295
Epoch [5/5], Step [1988/3236], Loss: 1.8817, Perplexity: 6.5644
Epoch [5/5], Step [1989/3236], Loss: 1.

Epoch [5/5], Step [2102/3236], Loss: 1.9024, Perplexity: 6.7020
Epoch [5/5], Step [2103/3236], Loss: 1.8361, Perplexity: 6.2723
Epoch [5/5], Step [2104/3236], Loss: 1.9007, Perplexity: 6.6904
Epoch [5/5], Step [2105/3236], Loss: 1.8112, Perplexity: 6.1180
Epoch [5/5], Step [2106/3236], Loss: 1.8411, Perplexity: 6.3033
Epoch [5/5], Step [2107/3236], Loss: 1.7824, Perplexity: 5.9440
Epoch [5/5], Step [2108/3236], Loss: 1.7726, Perplexity: 5.8862
Epoch [5/5], Step [2109/3236], Loss: 1.8637, Perplexity: 6.4477
Epoch [5/5], Step [2110/3236], Loss: 1.7269, Perplexity: 5.6231
Epoch [5/5], Step [2111/3236], Loss: 1.6954, Perplexity: 5.4490
Epoch [5/5], Step [2112/3236], Loss: 1.9667, Perplexity: 7.1473
Epoch [5/5], Step [2113/3236], Loss: 1.9082, Perplexity: 6.7413
Epoch [5/5], Step [2114/3236], Loss: 1.9916, Perplexity: 7.3275
Epoch [5/5], Step [2115/3236], Loss: 1.7584, Perplexity: 5.8032
Epoch [5/5], Step [2116/3236], Loss: 1.9557, Perplexity: 7.0690
Epoch [5/5], Step [2117/3236], Loss: 1.8

Epoch [5/5], Step [2230/3236], Loss: 1.9764, Perplexity: 7.2170
Epoch [5/5], Step [2231/3236], Loss: 1.7963, Perplexity: 6.0274
Epoch [5/5], Step [2232/3236], Loss: 1.7381, Perplexity: 5.6865
Epoch [5/5], Step [2233/3236], Loss: 1.6734, Perplexity: 5.3302
Epoch [5/5], Step [2234/3236], Loss: 2.4309, Perplexity: 11.3686
Epoch [5/5], Step [2235/3236], Loss: 1.8610, Perplexity: 6.4302
Epoch [5/5], Step [2236/3236], Loss: 1.8256, Perplexity: 6.2062
Epoch [5/5], Step [2237/3236], Loss: 1.6503, Perplexity: 5.2088
Epoch [5/5], Step [2238/3236], Loss: 1.8653, Perplexity: 6.4582
Epoch [5/5], Step [2239/3236], Loss: 2.1790, Perplexity: 8.8373
Epoch [5/5], Step [2240/3236], Loss: 1.7619, Perplexity: 5.8236
Epoch [5/5], Step [2241/3236], Loss: 2.1267, Perplexity: 8.3874
Epoch [5/5], Step [2242/3236], Loss: 1.8818, Perplexity: 6.5654
Epoch [5/5], Step [2243/3236], Loss: 1.7741, Perplexity: 5.8951
Epoch [5/5], Step [2244/3236], Loss: 1.7989, Perplexity: 6.0433
Epoch [5/5], Step [2245/3236], Loss: 1.

Epoch [5/5], Step [2358/3236], Loss: 3.2579, Perplexity: 25.9938
Epoch [5/5], Step [2359/3236], Loss: 1.7700, Perplexity: 5.8709
Epoch [5/5], Step [2360/3236], Loss: 1.8189, Perplexity: 6.1649
Epoch [5/5], Step [2361/3236], Loss: 1.8706, Perplexity: 6.4924
Epoch [5/5], Step [2362/3236], Loss: 1.8080, Perplexity: 6.0984
Epoch [5/5], Step [2363/3236], Loss: 1.8878, Perplexity: 6.6048
Epoch [5/5], Step [2364/3236], Loss: 1.8893, Perplexity: 6.6149
Epoch [5/5], Step [2365/3236], Loss: 2.0291, Perplexity: 7.6074
Epoch [5/5], Step [2366/3236], Loss: 1.8246, Perplexity: 6.2006
Epoch [5/5], Step [2367/3236], Loss: 1.8257, Perplexity: 6.2069
Epoch [5/5], Step [2368/3236], Loss: 2.0688, Perplexity: 7.9155
Epoch [5/5], Step [2369/3236], Loss: 1.8272, Perplexity: 6.2167
Epoch [5/5], Step [2370/3236], Loss: 1.8825, Perplexity: 6.5698
Epoch [5/5], Step [2371/3236], Loss: 1.8744, Perplexity: 6.5170
Epoch [5/5], Step [2372/3236], Loss: 2.0898, Perplexity: 8.0836
Epoch [5/5], Step [2373/3236], Loss: 1.

Epoch [5/5], Step [2486/3236], Loss: 1.8643, Perplexity: 6.4513
Epoch [5/5], Step [2487/3236], Loss: 1.7759, Perplexity: 5.9055
Epoch [5/5], Step [2488/3236], Loss: 1.8031, Perplexity: 6.0683
Epoch [5/5], Step [2489/3236], Loss: 1.6945, Perplexity: 5.4439
Epoch [5/5], Step [2490/3236], Loss: 2.3497, Perplexity: 10.4829
Epoch [5/5], Step [2491/3236], Loss: 1.7797, Perplexity: 5.9280
Epoch [5/5], Step [2492/3236], Loss: 2.1115, Perplexity: 8.2606
Epoch [5/5], Step [2493/3236], Loss: 1.9247, Perplexity: 6.8533
Epoch [5/5], Step [2494/3236], Loss: 2.2737, Perplexity: 9.7154
Epoch [5/5], Step [2495/3236], Loss: 1.7470, Perplexity: 5.7375
Epoch [5/5], Step [2496/3236], Loss: 1.7593, Perplexity: 5.8085
Epoch [5/5], Step [2497/3236], Loss: 2.0784, Perplexity: 7.9920
Epoch [5/5], Step [2498/3236], Loss: 1.8306, Perplexity: 6.2378
Epoch [5/5], Step [2499/3236], Loss: 1.8839, Perplexity: 6.5793
Epoch [5/5], Step [2500/3236], Loss: 1.8571, Perplexity: 6.4052
Epoch [5/5], Step [2501/3236], Loss: 1.

Epoch [5/5], Step [2614/3236], Loss: 2.1215, Perplexity: 8.3439
Epoch [5/5], Step [2615/3236], Loss: 1.8556, Perplexity: 6.3956
Epoch [5/5], Step [2616/3236], Loss: 1.6734, Perplexity: 5.3303
Epoch [5/5], Step [2617/3236], Loss: 1.7692, Perplexity: 5.8662
Epoch [5/5], Step [2618/3236], Loss: 1.7710, Perplexity: 5.8766
Epoch [5/5], Step [2619/3236], Loss: 1.9465, Perplexity: 7.0041
Epoch [5/5], Step [2620/3236], Loss: 1.8031, Perplexity: 6.0685
Epoch [5/5], Step [2621/3236], Loss: 1.8275, Perplexity: 6.2183
Epoch [5/5], Step [2622/3236], Loss: 1.7424, Perplexity: 5.7113
Epoch [5/5], Step [2623/3236], Loss: 1.8740, Perplexity: 6.5145
Epoch [5/5], Step [2624/3236], Loss: 1.9541, Perplexity: 7.0573
Epoch [5/5], Step [2625/3236], Loss: 1.7659, Perplexity: 5.8470
Epoch [5/5], Step [2626/3236], Loss: 1.9912, Perplexity: 7.3244
Epoch [5/5], Step [2627/3236], Loss: 1.6665, Perplexity: 5.2938
Epoch [5/5], Step [2628/3236], Loss: 1.7932, Perplexity: 6.0088
Epoch [5/5], Step [2629/3236], Loss: 1.8

Epoch [5/5], Step [2742/3236], Loss: 1.8382, Perplexity: 6.2852
Epoch [5/5], Step [2743/3236], Loss: 1.8170, Perplexity: 6.1533
Epoch [5/5], Step [2744/3236], Loss: 1.7520, Perplexity: 5.7661
Epoch [5/5], Step [2745/3236], Loss: 1.8518, Perplexity: 6.3716
Epoch [5/5], Step [2746/3236], Loss: 2.1535, Perplexity: 8.6147
Epoch [5/5], Step [2747/3236], Loss: 1.7235, Perplexity: 5.6044
Epoch [5/5], Step [2748/3236], Loss: 1.8400, Perplexity: 6.2963
Epoch [5/5], Step [2749/3236], Loss: 1.9645, Perplexity: 7.1311
Epoch [5/5], Step [2750/3236], Loss: 1.9305, Perplexity: 6.8931
Epoch [5/5], Step [2751/3236], Loss: 1.7487, Perplexity: 5.7470
Epoch [5/5], Step [2752/3236], Loss: 1.8177, Perplexity: 6.1577
Epoch [5/5], Step [2753/3236], Loss: 1.7290, Perplexity: 5.6353
Epoch [5/5], Step [2754/3236], Loss: 1.9064, Perplexity: 6.7286
Epoch [5/5], Step [2755/3236], Loss: 1.7159, Perplexity: 5.5617
Epoch [5/5], Step [2756/3236], Loss: 2.6558, Perplexity: 14.2367
Epoch [5/5], Step [2757/3236], Loss: 1.

Epoch [5/5], Step [2870/3236], Loss: 1.8572, Perplexity: 6.4059
Epoch [5/5], Step [2871/3236], Loss: 1.8056, Perplexity: 6.0835
Epoch [5/5], Step [2872/3236], Loss: 1.7607, Perplexity: 5.8166
Epoch [5/5], Step [2873/3236], Loss: 1.7773, Perplexity: 5.9137
Epoch [5/5], Step [2874/3236], Loss: 1.9695, Perplexity: 7.1672
Epoch [5/5], Step [2875/3236], Loss: 1.7700, Perplexity: 5.8707
Epoch [5/5], Step [2876/3236], Loss: 1.7596, Perplexity: 5.8098
Epoch [5/5], Step [2877/3236], Loss: 1.8471, Perplexity: 6.3413
Epoch [5/5], Step [2878/3236], Loss: 1.7822, Perplexity: 5.9427
Epoch [5/5], Step [2879/3236], Loss: 1.8232, Perplexity: 6.1916
Epoch [5/5], Step [2880/3236], Loss: 1.6844, Perplexity: 5.3892
Epoch [5/5], Step [2881/3236], Loss: 1.6973, Perplexity: 5.4591
Epoch [5/5], Step [2882/3236], Loss: 1.9192, Perplexity: 6.8158
Epoch [5/5], Step [2883/3236], Loss: 1.6969, Perplexity: 5.4572
Epoch [5/5], Step [2884/3236], Loss: 1.7682, Perplexity: 5.8602
Epoch [5/5], Step [2885/3236], Loss: 1.8

Epoch [5/5], Step [2998/3236], Loss: 1.8521, Perplexity: 6.3734
Epoch [5/5], Step [2999/3236], Loss: 1.7871, Perplexity: 5.9719
Epoch [5/5], Step [3000/3236], Loss: 1.6904, Perplexity: 5.4215
Epoch [5/5], Step [3001/3236], Loss: 1.7972, Perplexity: 6.0325
Epoch [5/5], Step [3002/3236], Loss: 2.2483, Perplexity: 9.4712
Epoch [5/5], Step [3003/3236], Loss: 1.7820, Perplexity: 5.9416
Epoch [5/5], Step [3004/3236], Loss: 1.7311, Perplexity: 5.6471
Epoch [5/5], Step [3005/3236], Loss: 1.7174, Perplexity: 5.5699
Epoch [5/5], Step [3006/3236], Loss: 1.7881, Perplexity: 5.9782
Epoch [5/5], Step [3007/3236], Loss: 1.7860, Perplexity: 5.9654
Epoch [5/5], Step [3008/3236], Loss: 1.7611, Perplexity: 5.8191
Epoch [5/5], Step [3009/3236], Loss: 1.9377, Perplexity: 6.9428
Epoch [5/5], Step [3010/3236], Loss: 1.7232, Perplexity: 5.6026
Epoch [5/5], Step [3011/3236], Loss: 2.4499, Perplexity: 11.5873
Epoch [5/5], Step [3012/3236], Loss: 1.8487, Perplexity: 6.3516
Epoch [5/5], Step [3013/3236], Loss: 1.

Epoch [5/5], Step [3126/3236], Loss: 1.9259, Perplexity: 6.8613
Epoch [5/5], Step [3127/3236], Loss: 1.7614, Perplexity: 5.8207
Epoch [5/5], Step [3128/3236], Loss: 1.7728, Perplexity: 5.8872
Epoch [5/5], Step [3129/3236], Loss: 1.7809, Perplexity: 5.9354
Epoch [5/5], Step [3130/3236], Loss: 1.7660, Perplexity: 5.8474
Epoch [5/5], Step [3131/3236], Loss: 2.4904, Perplexity: 12.0666
Epoch [5/5], Step [3132/3236], Loss: 2.1004, Perplexity: 8.1692
Epoch [5/5], Step [3133/3236], Loss: 1.8316, Perplexity: 6.2439
Epoch [5/5], Step [3134/3236], Loss: 1.7435, Perplexity: 5.7171
Epoch [5/5], Step [3135/3236], Loss: 1.7988, Perplexity: 6.0422
Epoch [5/5], Step [3136/3236], Loss: 1.9956, Perplexity: 7.3563
Epoch [5/5], Step [3137/3236], Loss: 1.7150, Perplexity: 5.5567
Epoch [5/5], Step [3138/3236], Loss: 1.9861, Perplexity: 7.2872
Epoch [5/5], Step [3139/3236], Loss: 1.6798, Perplexity: 5.3646
Epoch [5/5], Step [3140/3236], Loss: 1.7433, Perplexity: 5.7164
Epoch [5/5], Step [3141/3236], Loss: 1.