# Computer Vision Nanodegree

## Project: Image Captioning

---

In this notebook, you will train your CNN-RNN model.  

You are welcome and encouraged to try out many different architectures and hyperparameters when searching for a good model.

This does have the potential to make the project quite messy!  Before submitting your project, make sure that you clean up:
- the code you write in this notebook.  The notebook should describe how to train a single CNN-RNN architecture, corresponding to your final choice of hyperparameters.  You should structure the notebook so that the reviewer can replicate your results by running the code in this notebook.  
- the output of the code cell in **Step 2**.  The output should show the output obtained when training the model from scratch.

This notebook **will be graded**.  

Feel free to use the links below to navigate the notebook:
- [Step 1](#step1): Training Setup
- [Step 2](#step2): Train your Model
- [Step 3](#step3): (Optional) Validate your Model

<a id='step1'></a>
## Step 1: Training Setup

In this step of the notebook, you will customize the training of your CNN-RNN model by specifying hyperparameters and setting other options that are important to the training procedure.  The values you set now will be used when training your model in **Step 2** below.

You should only amend blocks of code that are preceded by a `TODO` statement.  **Any code blocks that are not preceded by a `TODO` statement should not be modified**.

### Task #1

Begin by setting the following variables:
- `batch_size` - the batch size of each training batch.  It is the number of image-caption pairs used to amend the model weights in each training step. 
- `vocab_threshold` - the minimum word count threshold.  Note that a larger threshold will result in a smaller vocabulary, whereas a smaller threshold will include rarer words and result in a larger vocabulary.  
- `vocab_from_file` - a Boolean that decides whether to load the vocabulary from file. 
- `embed_size` - the dimensionality of the image and word embeddings.  
- `hidden_size` - the number of features in the hidden state of the RNN decoder.  
- `num_epochs` - the number of epochs to train the model.  We recommend that you set `num_epochs=3`, but feel free to increase or decrease this number as you wish.  [This paper](https://arxiv.org/pdf/1502.03044.pdf) trained a captioning model on a single state-of-the-art GPU for 3 days, but you'll soon see that you can get reasonable results in a matter of a few hours!  (_But of course, if you want your model to compete with current research, you will have to train for much longer._)
- `save_every` - determines how often to save the model weights.  We recommend that you set `save_every=1`, to save the model weights after each epoch.  This way, after the `i`th epoch, the encoder and decoder weights will be saved in the `models/` folder as `encoder-i.pkl` and `decoder-i.pkl`, respectively.
- `print_every` - determines how often to print the batch loss to the Jupyter notebook while training.  Note that you **will not** observe a monotonic decrease in the loss function while training - this is perfectly fine and completely expected!  You are encouraged to keep this at its default value of `100` to avoid clogging the notebook, but feel free to change it.
- `log_file` - the name of the text file containing - for every step - how the loss and perplexity evolved during training.
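To make the effect of `vocab_threshold` concrete, here is a minimal sketch of count-based vocabulary filtering. It is an illustration only: the helper `build_vocab` and the toy captions are made up for this example, and it tokenizes with `str.split` rather than the tokenizer and special tokens used by the project's own vocabulary code.

```python
from collections import Counter

def build_vocab(captions, vocab_threshold):
    """Return the sorted words that occur at least `vocab_threshold` times."""
    counts = Counter(word for caption in captions for word in caption.lower().split())
    return sorted(word for word, n in counts.items() if n >= vocab_threshold)

# Toy captions, invented for illustration (not taken from COCO).
toy_captions = [
    "a dog runs on the grass",
    "a dog sits on a sofa",
    "a cat sits on the grass",
]

small_vocab = build_vocab(toy_captions, vocab_threshold=3)  # keeps only frequent words
large_vocab = build_vocab(toy_captions, vocab_threshold=1)  # keeps every word
print(len(small_vocab), len(large_vocab))
```

Raising the threshold shrinks the vocabulary, which in turn shrinks the final linear layer of the decoder.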

If you're not sure where to begin to set some of the values above, you can peruse [this paper](https://arxiv.org/pdf/1502.03044.pdf) and [this paper](https://arxiv.org/pdf/1411.4555.pdf) for useful guidance!  **To avoid spending too long on this notebook**, you are encouraged to consult these suggested research papers to obtain a strong initial guess for which hyperparameters are likely to work best.  Then, train a single model, and proceed to the next notebook (**3_Inference.ipynb**).  If you are unhappy with your performance, you can return to this notebook to tweak the hyperparameters (and/or the architecture in **model.py**) and re-train your model.

### Question 1

**Question:** Describe your CNN-RNN architecture in detail.  With this architecture in mind, how did you select the values of the variables in Task 1?  If you consulted a research paper detailing a successful implementation of an image captioning model, please provide the reference.

**Answer:** I consulted the paper [Show and Tell: A Neural Image Caption Generator](https://arxiv.org/pdf/1411.4555.pdf).

My CNN-RNN architecture consists of a CNN encoder and an RNN decoder.

`Input data`: Each batch contains images and their corresponding captions.

`CNN Encoder`: The CNN encoder uses the ResNet-50 architecture with the final layer removed, so it acts as a feature extractor for the image. The extracted features are flattened and passed through a linear layer whose output size is `embed_size`, which matches the size of the word embeddings.

`RNN Decoder`: 
The RNN decoder takes two inputs:
- the embedded image feature vector (from the CNN encoder)
- the corresponding captions

The captions are passed through a word embedding layer and then combined with the image features.
The combined sequence is passed through an LSTM layer, then a dropout layer, and finally a linear layer whose output size equals the size of the vocabulary. This gives a score for each word in the vocabulary.

`Training`: During the training phase, the output of the decoder is compared against the input captions, and backpropagation is used to minimize the loss.
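The decoder and loss described above can be sketched roughly as follows. This is not the `DecoderRNN` from **model.py**: the class `SketchDecoder`, its layer names, the dropout probability, and the tiny sizes are assumptions chosen to keep the example small. It also assumes the image feature is prepended to the embedded caption with its last token dropped, which is one common way to "combine" the two inputs.

```python
import torch
import torch.nn as nn

class SketchDecoder(nn.Module):
    """Hypothetical stand-in for DecoderRNN; the real class lives in model.py."""
    def __init__(self, embed_size, hidden_size, vocab_size, drop_prob=0.5):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(drop_prob)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Drop the last token so that, with the image feature prepended,
        # the sequence length equals the caption length.
        embeddings = self.word_embed(captions[:, :-1])
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(self.dropout(hidden))  # scores over the vocabulary

batch, caption_len, vocab_size = 4, 7, 50
decoder_sketch = SketchDecoder(embed_size=16, hidden_size=32, vocab_size=vocab_size)
features = torch.randn(batch, 16)                               # stands in for encoder output
captions = torch.randint(0, vocab_size, (batch, caption_len))  # random token ids

outputs = decoder_sketch(features, captions)
loss = nn.CrossEntropyLoss()(outputs.view(-1, vocab_size), captions.view(-1))
print(outputs.shape, loss.item())
```

Flattening the scores to `(batch * caption_len, vocab_size)` lets `CrossEntropyLoss` treat every time step as one classification over the vocabulary, which mirrors the loss call in the training cell of this notebook.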

Here are the variable values I used.

- `batch_size` - 30. I also tried batch sizes of 10 and 20, but those runs took much longer than the runs with a batch size of 30. Perplexity also seemed to decrease faster in the initial steps with a batch size of 30.
- `vocab_threshold` - 5. Following the referenced paper (https://arxiv.org/pdf/1411.4555.pdf), I kept only words that appear at least 5 times. This gave a reasonable vocabulary without a lot of noisy, rare tokens.
- `vocab_from_file` - True, since I had already saved the vocabulary to file in a previous step.
- `embed_size` - 512. I based this mainly on the referenced paper. I also briefly tried 256 during training; 512 seemed to converge a little better.
- `hidden_size` - 512. I based this mainly on the referenced paper.



### (Optional) Task #2

Note that we have provided a recommended image transform `transform_train` for pre-processing the training images, but you are welcome (and encouraged!) to modify it as you wish.  When modifying this transform, keep in mind that:
- the images in the dataset have varying heights and widths, and 
- if using a pre-trained model, you must perform the corresponding appropriate normalization.
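The two points above come down to simple arithmetic, sketched here in plain Python. The mean and standard deviation are the values used in `transform_train`; the helper names `normalize_pixel` and `resized_size` are invented for this illustration, and the exact rounding used by the library's resize may differ from `round`.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb_uint8):
    """Map one 8-bit RGB pixel to the value range the pre-trained network expects."""
    scaled = [c / 255.0 for c in rgb_uint8]  # same range change as ToTensor: [0, 255] -> [0, 1]
    return [(s - m) / d for s, m, d in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]

def resized_size(width, height, smaller_edge=256):
    """Approximate size after resizing so the smaller edge equals `smaller_edge`."""
    scale = smaller_edge / min(width, height)
    return round(width * scale), round(height * scale)

# A pixel close to the dataset mean ends up close to zero in every channel.
print(normalize_pixel((124, 116, 104)))
# A 640x480 image keeps its aspect ratio; a 224x224 crop then fits inside it.
print(resized_size(640, 480))
```

Because every image is first resized to a smaller edge of 256 and then cropped to 224x224, images of any original size end up as tensors of the same shape.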

### Question 2

**Question:** How did you select the transform in `transform_train`?  If you left the transform at its provided value, why do you think that it is a good choice for your CNN architecture?

**Answer:**  I left the transform as provided. It is a good choice for my CNN architecture because it resizes the image, takes a random crop, randomly flips the image horizontally, and normalizes it. Normalizing the image values makes them uniform across the training set and matches what the pre-trained model expects.

These transformations cover most of the variations we are likely to find in the real world when unseen images are passed to the network during prediction.



### Task #3

Next, you will specify a Python list containing the learnable parameters of the model.  For instance, if you decide to make all weights in the decoder trainable, but only want to train the weights in the embedding layer of the encoder, then you should set `params` to something like:
```
params = list(decoder.parameters()) + list(encoder.embed.parameters()) 
```
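As a rough check of what ends up in `params`, the sketch below builds a tiny stand-in model and counts trainable versus frozen weights. `SketchEncoder`, its `backbone`, and the LSTM standing in for the decoder are hypothetical; only the attribute name `embed` follows the example above.

```python
import torch.nn as nn

class SketchEncoder(nn.Module):
    """Hypothetical stand-in: a frozen 'backbone' followed by a trainable `embed` layer."""
    def __init__(self, embed_size):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # freeze the (notionally pre-trained) layers
        self.embed = nn.Linear(8, embed_size)

sketch_encoder = SketchEncoder(embed_size=16)
sketch_decoder = nn.LSTM(16, 32, batch_first=True)  # stands in for the decoder

# Same pattern as above: all decoder weights plus only the encoder's embedding layer.
params = list(sketch_decoder.parameters()) + list(sketch_encoder.embed.parameters())

n_trainable = sum(p.numel() for p in params)
n_frozen = sum(p.numel() for p in sketch_encoder.backbone.parameters())
print(n_trainable, n_frozen)
```

Only the tensors in `params` are handed to the optimizer, so the frozen backbone keeps its weights no matter how long training runs.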

### Question 3

**Question:** How did you select the trainable parameters of your architecture?  Why do you think this is a good choice?

**Answer:** 
Since most of the layers in the CNN encoder are pre-trained, I did not want to retrain all of their weights.
In the encoder, I only train the weights of the embedding layer (`encoder.embed`).

For the RNN decoder, I wanted to train all the weights, so I included `decoder.parameters()`.
As a result, the weights in every layer of the decoder are trained to generate captions for the images.


### Task #4

Finally, you will select an [optimizer](http://pytorch.org/docs/master/optim.html#torch.optim.Optimizer).

### Question 4

**Question:** How did you select the optimizer used to train your model?

**Answer:** 
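I used the Adam optimizer with `learning_rate=0.001`. Based on my research, Adam seems well suited to most CNN and RNN problems. It adapts the effective step size of each parameter using running estimates of the gradient's first and second moments, so it usually needs less manual learning-rate tuning. Adam does not by itself regularize the model; the dropout layer in the decoder is what helps against overfitting.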
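For intuition, here is a minimal sketch of a single Adam update on one scalar parameter, written in plain Python. The function `adam_step` is an illustration rather than the library's implementation; the default values shown (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the documented defaults of `torch.optim.Adam`.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, grad=2.0, m=m, v=v, t=1)
print(w)
```

On the first step the update has magnitude close to `lr` regardless of the size of the gradient, which is one reason the same learning rate tends to work across layers with very different gradient scales.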

In [2]:
import torch
import torch.nn as nn
from torchvision import transforms
import sys
sys.path.append('/opt/cocoapi/PythonAPI')
!pip install nltk
import nltk
nltk.download('punkt')
from pycocotools.coco import COCO
from data_loader import get_loader
from model import EncoderCNN, DecoderRNN
import math


## TODO #1: Select appropriate values for the Python variables below.
batch_size = 30          # batch size
vocab_threshold = 5        # minimum word count threshold
vocab_from_file = True    # if True, load existing vocab file
embed_size = 512           # dimensionality of image and word embeddings
hidden_size = 512          # number of features in hidden state of the RNN decoder
num_epochs = 3             # number of training epochs
save_every = 1             # determines frequency of saving model weights
print_every = 100          # determines window for printing average loss
log_file = 'training_log.txt'       # name of file with saved training loss and perplexity


# (Optional) TODO #2: Amend the image transform below.
transform_train = transforms.Compose([ 
    transforms.Resize(256),                          # smaller edge of image resized to 256
    transforms.RandomCrop(224),                      # get 224x224 crop from random location
    transforms.RandomHorizontalFlip(),               # horizontally flip image with probability=0.5
    transforms.ToTensor(),                           # convert the PIL Image to a tensor
    transforms.Normalize((0.485, 0.456, 0.406),      # normalize image for pre-trained model
                         (0.229, 0.224, 0.225))])

# Build data loader.
data_loader = get_loader(transform=transform_train,
                         mode='train',
                         batch_size=batch_size,
                         vocab_threshold=vocab_threshold,
                         vocab_from_file=vocab_from_file)

# The size of the vocabulary.
vocab_size = len(data_loader.dataset.vocab)

# Initialize the encoder and decoder. 
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

# Move models to GPU if CUDA is available. 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder.to(device)
decoder.to(device)

# Define the loss function. 
criterion = nn.CrossEntropyLoss().cuda() if torch.cuda.is_available() else nn.CrossEntropyLoss()

# TODO #3: Specify the learnable parameters of the model.
params = list(decoder.parameters()) + list(encoder.embed.parameters()) 

# TODO #4: Define the optimizer.
learning_rate=0.001
optimizer = torch.optim.Adam(params, lr=learning_rate)


# Set the total number of training steps per epoch.
total_step = math.ceil(len(data_loader.dataset.caption_lengths) / data_loader.batch_sampler.batch_size)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Vocabulary successfully loaded from vocab.pkl file!
loading annotations into memory...


  0%|          | 878/414113 [00:00<01:32, 4486.28it/s]

Done (t=0.89s)
creating index...
index created!
Obtaining caption lengths...


100%|██████████| 414113/414113 [01:28<00:00, 4669.08it/s]


<a id='step2'></a>
## Step 2: Train your Model

Once you have executed the code cell in **Step 1**, the training procedure below should run without issue.  

It is completely fine to leave the code cell below as-is without modifications to train your model.  However, if you would like to modify the code used to train the model below, you must ensure that your changes are easily parsed by your reviewer.  In other words, make sure to provide appropriate comments to describe how your code works!  

You may find it useful to load saved weights to resume training.  In that case, note the names of the files containing the encoder and decoder weights that you'd like to load (`encoder_file` and `decoder_file`).  Then you can load the weights by using the lines below:

```python
import os

# Load pre-trained weights before resuming training.
encoder.load_state_dict(torch.load(os.path.join('./models', encoder_file)))
decoder.load_state_dict(torch.load(os.path.join('./models', decoder_file)))
```

While trying out parameters, make sure to take extensive notes and record the settings that you used in your various training runs.  In particular, you don't want to encounter a situation where you've trained a model for several hours but can't remember what settings you used :).
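One lightweight way to keep such notes is to append the settings of every run to a small log file. The sketch below is only a suggestion: the helper `record_run`, the file name `caption_runs.jsonl`, and the note text are all made up for this example, and the file is written to the system temp directory so that the example does not touch the project folder.

```python
import json
import os
import tempfile
import time

def record_run(path, settings, note=""):
    """Append one JSON line describing a training run and return the entry."""
    entry = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "settings": settings,
        "note": note,
    }
    with open(path, "a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry

# Hypothetical log location; in practice a file next to the notebook may be handier.
run_log = os.path.join(tempfile.gettempdir(), "caption_runs.jsonl")

entry = record_run(
    run_log,
    {"batch_size": 30, "vocab_threshold": 5, "embed_size": 512, "hidden_size": 512, "num_epochs": 3},
    note="example entry",
)
print(entry["settings"])
```

Each run adds one line, so the file can later be read back line by line to compare settings across runs.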

### A Note on Tuning Hyperparameters

To figure out how well your model is doing, you can look at how the training loss and perplexity evolve during training - and for the purposes of this project, you are encouraged to amend the hyperparameters based on this information.  

However, this will not tell you if your model is overfitting to the training data, and, unfortunately, overfitting is a problem that is commonly encountered when training image captioning models.  

For this project, you need not worry about overfitting. **This project does not have strict requirements regarding the performance of your model**, and you just need to demonstrate that your model has learned **_something_** when you generate captions on the test data.  For now, we strongly encourage you to train your model for the suggested 3 epochs without worrying about performance; then, you should immediately transition to the next notebook in the sequence (**3_Inference.ipynb**) to see how your model performs on the test data.  If your model needs to be changed, you can come back to this notebook, amend hyperparameters (if necessary), and re-train the model.

That said, if you would like to go above and beyond in this project, you can read about some approaches to minimizing overfitting in section 4.3.1 of [this paper](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7505636).  In the next (optional) step of this notebook, we provide some guidance for assessing the performance on the validation dataset.
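To follow how loss and perplexity evolve, it can help to read the values back from the log. The sketch below parses lines in the format written by the training cell of this notebook and checks that perplexity is simply the exponential of the loss. The helpers `parse_stats` and `mean_loss` are illustrative names; the two sample lines are copied from the training output of this notebook.

```python
import math
import re

STATS_PATTERN = re.compile(r"Step \[(\d+)/\d+\], Loss: ([\d.]+), Perplexity: ([\d.]+)")

def parse_stats(line):
    """Return (step, loss, perplexity) for one log line, or None if it does not match."""
    match = STATS_PATTERN.search(line)
    if match is None:
        return None
    step, loss, perplexity = match.groups()
    return int(step), float(loss), float(perplexity)

def mean_loss(lines):
    """Average the loss over all parseable lines, to smooth out batch-to-batch noise."""
    losses = [parsed[1] for parsed in map(parse_stats, lines) if parsed is not None]
    return sum(losses) / len(losses)

sample_lines = [
    "Epoch [1/3], Step [100/13804], Loss: 4.1025, Perplexity: 60.4904, time_taken_in_seconds: 91",
    "Epoch [1/3], Step [200/13804], Loss: 3.7123, Perplexity: 40.9498, time_taken_in_seconds: 89",
]

step, loss, perplexity = parse_stats(sample_lines[0])
print(step, loss, perplexity, math.exp(loss))  # exp(loss) matches the logged perplexity
print(mean_loss(sample_lines))
```

Because the per-batch loss is noisy, averaging over a window of steps usually gives a clearer picture of the trend than individual lines.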

In [3]:
print ('total_step',total_step)

total_step 13804
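The step count printed above follows directly from the number of captions and the batch size. The short calculation below reproduces it and adds a rough time estimate; the figure of 85 seconds per 100 steps is an assumption read off the training log further down, so the estimate is only approximate.

```python
import math

num_captions = 414113  # from the progress bar in the Step 1 output
batch_size = 30
num_epochs = 3

total_step = math.ceil(num_captions / batch_size)
print(total_step, total_step * num_epochs)  # steps per epoch, steps overall

seconds_per_100_steps = 85  # assumption: roughly what the training log shows
est_hours_per_epoch = total_step / 100 * seconds_per_100_steps / 3600
print(round(est_hours_per_epoch, 2))
```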


In [4]:
print ('num_epochs',num_epochs)

num_epochs 3


In [5]:

import torch.utils.data as data
import numpy as np
import os
import requests
import time

# Open the training log file.
f = open(log_file, 'w')

old_time = time.time()
response = requests.request("GET", 
                            "http://metadata.google.internal/computeMetadata/v1/instance/attributes/keep_alive_token", 
                            headers={"Metadata-Flavor":"Google"})

for epoch in range(1, num_epochs+1):
    
    start = None
    
    for i_step in range(1, total_step+1):
        
        if not start:
            start = time.time()
        
        if time.time() - old_time > 60:
            old_time = time.time()
            requests.request("POST", 
                             "https://nebula.udacity.com/api/v1/remote/keep-alive", 
                             headers={'Authorization': "STAR " + response.text})
        
        # Randomly sample a caption length, and sample indices with that length.
        indices = data_loader.dataset.get_train_indices()
        # Create and assign a batch sampler to retrieve a batch with the sampled indices.
        new_sampler = data.sampler.SubsetRandomSampler(indices=indices)
        data_loader.batch_sampler.sampler = new_sampler
        
        # Obtain the batch.
        images, captions = next(iter(data_loader))

        # Move batch of images and captions to GPU if CUDA is available.
        images = images.to(device)
        captions = captions.to(device)
        
        # Zero the gradients.
        decoder.zero_grad()
        encoder.zero_grad()
        
        # Pass the inputs through the CNN-RNN model.
        features = encoder(images)
        outputs = decoder(features, captions)
        
        # Calculate the batch loss.
        loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
        
        # Backward pass.
        loss.backward()
        
        # Update the parameters in the optimizer.
        optimizer.step()
        
        end = time.time()
        
        time_taken_in_seconds = int((end - start))
        
                    
        # Get training statistics.
        stats = 'Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %5.4f, time_taken_in_seconds: %d'% (epoch, num_epochs, i_step, total_step, loss.item(), np.exp(loss.item()),time_taken_in_seconds )
        

            
        # Print training statistics (on same line).
        print('\r' + stats, end="")
        sys.stdout.flush()
        
        # Print training statistics to file.
        f.write(stats + '\n')
        f.flush()
        
        # Print training statistics (on different line).
        if i_step % print_every == 0:
            start = end
            print('\r' + stats)
            
    # Save the weights.
    if epoch % save_every == 0:
        torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d-outof-%d.pkl' % (epoch,num_epochs)))
        torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d-outof-%d.pkl' % (epoch,num_epochs)))

# Close the training log file.
f.close()

Epoch [1/3], Step [100/13804], Loss: 4.1025, Perplexity: 60.4904, time_taken_in_seconds: 91
Epoch [1/3], Step [200/13804], Loss: 3.7123, Perplexity: 40.9498, time_taken_in_seconds: 89
Epoch [1/3], Step [300/13804], Loss: 4.0174, Perplexity: 55.5588, time_taken_in_seconds: 88
Epoch [1/3], Step [400/13804], Loss: 3.3462, Perplexity: 28.3949, time_taken_in_seconds: 87
Epoch [1/3], Step [500/13804], Loss: 3.1668, Perplexity: 23.7317, time_taken_in_seconds: 87
Epoch [1/3], Step [600/13804], Loss: 3.5910, Perplexity: 36.2699, time_taken_in_seconds: 86
Epoch [1/3], Step [700/13804], Loss: 3.3673, Perplexity: 29.0005, time_taken_in_seconds: 86
Epoch [1/3], Step [800/13804], Loss: 3.0468, Perplexity: 21.0474, time_taken_in_seconds: 862
Epoch [1/3], Step [900/13804], Loss: 3.1203, Perplexity: 22.6529, time_taken_in_seconds: 86
Epoch [1/3], Step [1000/13804], Loss: 3.1672, Perplexity: 23.7411, time_taken_in_seconds: 86
Epoch [1/3], Step [1200/13804], Loss: 3.3465, Perplexity: 28.4019, time_taken_

Epoch [2/3], Step [3900/13804], Loss: 2.5040, Perplexity: 12.2313, time_taken_in_seconds: 76
Epoch [2/3], Step [4000/13804], Loss: 2.1594, Perplexity: 8.6657, time_taken_in_seconds: 765
Epoch [2/3], Step [4100/13804], Loss: 2.3326, Perplexity: 10.3049, time_taken_in_seconds: 77
Epoch [2/3], Step [4200/13804], Loss: 2.6869, Perplexity: 14.6864, time_taken_in_seconds: 76
Epoch [2/3], Step [4300/13804], Loss: 2.3046, Perplexity: 10.0202, time_taken_in_seconds: 77
Epoch [2/3], Step [4400/13804], Loss: 2.9569, Perplexity: 19.2382, time_taken_in_seconds: 77
Epoch [2/3], Step [4500/13804], Loss: 2.6642, Perplexity: 14.3564, time_taken_in_seconds: 76
Epoch [2/3], Step [4600/13804], Loss: 2.2084, Perplexity: 9.1014, time_taken_in_seconds: 765
Epoch [2/3], Step [4700/13804], Loss: 2.3340, Perplexity: 10.3190, time_taken_in_seconds: 76
Epoch [2/3], Step [4800/13804], Loss: 2.4157, Perplexity: 11.1980, time_taken_in_seconds: 76
Epoch [2/3], Step [4900/13804], Loss: 2.4502, Perplexity: 11.5907, tim

Epoch [3/3], Step [7600/13804], Loss: 2.6337, Perplexity: 13.9247, time_taken_in_seconds: 81
Epoch [3/3], Step [7700/13804], Loss: 2.3622, Perplexity: 10.6140, time_taken_in_seconds: 81
Epoch [3/3], Step [7800/13804], Loss: 2.3933, Perplexity: 10.9495, time_taken_in_seconds: 81
Epoch [3/3], Step [7900/13804], Loss: 2.3970, Perplexity: 10.9904, time_taken_in_seconds: 81
Epoch [3/3], Step [8000/13804], Loss: 2.2663, Perplexity: 9.6438, time_taken_in_seconds: 832
Epoch [3/3], Step [8100/13804], Loss: 2.3180, Perplexity: 10.1549, time_taken_in_seconds: 83
Epoch [3/3], Step [8200/13804], Loss: 2.6428, Perplexity: 14.0530, time_taken_in_seconds: 82
Epoch [3/3], Step [8300/13804], Loss: 2.1051, Perplexity: 8.2080, time_taken_in_seconds: 821
Epoch [3/3], Step [8400/13804], Loss: 2.5224, Perplexity: 12.4582, time_taken_in_seconds: 82
Epoch [3/3], Step [8500/13804], Loss: 2.4335, Perplexity: 11.3992, time_taken_in_seconds: 83
Epoch [3/3], Step [8600/13804], Loss: 2.4222, Perplexity: 11.2706, tim

In [5]:
#torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d-outof-%d.pkl' % (epoch,num_epochs)))
#torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d-outof-%d.pkl' % (epoch,num_epochs)))

<a id='step3'></a>
## Step 3: (Optional) Validate your Model

To assess potential overfitting, one approach is to assess performance on a validation set.  If you decide to do this **optional** task, you are required to first complete all of the steps in the next notebook in the sequence (**3_Inference.ipynb**); as part of that notebook, you will write and test code (specifically, the `sample` method in the `DecoderRNN` class) that uses your RNN decoder to generate captions.  That code will prove incredibly useful here. 

If you decide to validate your model, please do not edit the data loader in **data_loader.py**.  Instead, create a new file named **data_loader_val.py** containing the code for obtaining the data loader for the validation data.  You can access:
- the validation images at filepath `'/opt/cocoapi/images/train2014/'`, and
- the validation image caption annotation file at filepath `'/opt/cocoapi/annotations/captions_val2014.json'`.

The suggested approach to validating your model involves creating a json file such as [this one](https://github.com/cocodataset/cocoapi/blob/master/results/captions_val2014_fakecap_results.json) containing your model's predicted captions for the validation images.  Then, you can write your own script or use one that you [find online](https://github.com/tylin/coco-caption) to calculate the BLEU score of your model.  You can read more about the BLEU score, along with other evaluation metrics (such as METEOR and CIDEr) in section 4.1 of [this paper](https://arxiv.org/pdf/1411.4555.pdf).  For more information about how to use the annotation file, check out the [website](http://cocodataset.org/#download) for the COCO dataset.
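A results file of that kind is a JSON list with one `image_id` and one `caption` per image. The sketch below writes such a file; the helper `write_results`, the image ids, the captions, and the output file name are all made up for illustration, and in practice the captions would come from the `sample` method of your decoder.

```python
import json
import os
import tempfile

def write_results(predictions, path):
    """Write {image_id: caption} predictions as a COCO-style caption results file."""
    results = [
        {"image_id": int(image_id), "caption": caption}
        for image_id, caption in sorted(predictions.items())
    ]
    with open(path, "w") as fh:
        json.dump(results, fh)
    return results

# Hypothetical predictions; real ones would be generated for the validation images.
predictions = {
    42: "a man riding a bike down a street",
    7: "a cat sitting on a couch",
}

results_path = os.path.join(tempfile.gettempdir(), "captions_val2014_mymodel_results.json")
results = write_results(predictions, results_path)
print(results[0])
```

An evaluation script can then load this file together with the annotation file to score the predicted captions.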

In [None]:
# (Optional) TODO: Validate your model.

# For reference: a few different trial runs

In [None]:
# A few different trial runs

batch_size = 30          # batch size
vocab_threshold = 5        # minimum word count threshold
vocab_from_file = True    # if True, load existing vocab file
embed_size = 256           # dimensionality of image and word embeddings
hidden_size = 512          # number of features in hidden state of the RNN decoder
num_epochs = 1             # number of training epochs
save_every = 1             # determines frequency of saving model weights
print_every = 100          # determines window for printing average loss
log_file = 'training_log.txt' 

stopped after 40 minutes

Epoch [1/1], Step [100/13804], Loss: 4.3455, Perplexity: 77.1335, time_taken_in_seconds: 77
Epoch [1/1], Step [200/13804], Loss: 3.8539, Perplexity: 47.1757, time_taken_in_seconds: 777
Epoch [1/1], Step [300/13804], Loss: 3.9285, Perplexity: 50.8294, time_taken_in_seconds: 77
Epoch [1/1], Step [400/13804], Loss: 3.4706, Perplexity: 32.1555, time_taken_in_seconds: 77
Epoch [1/1], Step [500/13804], Loss: 3.5356, Perplexity: 34.3140, time_taken_in_seconds: 77
Epoch [1/1], Step [600/13804], Loss: 3.4750, Perplexity: 32.2964, time_taken_in_seconds: 76
Epoch [1/1], Step [700/13804], Loss: 3.2260, Perplexity: 25.1798, time_taken_in_seconds: 76
Epoch [1/1], Step [800/13804], Loss: 3.4285, Perplexity: 30.8289, time_taken_in_seconds: 76
Epoch [1/1], Step [900/13804], Loss: 3.0650, Perplexity: 21.4352, time_taken_in_seconds: 76
Epoch [1/1], Step [1100/13804], Loss: 3.4655, Perplexity: 31.9930, time_taken_in_seconds: 759
Epoch [1/1], Step [1200/13804], Loss: 3.1675, Perplexity: 23.7482, time_taken_in_seconds: 753
Epoch [1/1], Step [1300/13804], Loss: 3.1990, Perplexity: 24.5080, time_taken_in_seconds: 75
Epoch [1/1], Step [1400/13804], Loss: 3.2491, Perplexity: 25.7671, time_taken_in_seconds: 75
Epoch [1/1], Step [1500/13804], Loss: 3.5756, Perplexity: 35.7153, time_taken_in_seconds: 75
Epoch [1/1], Step [1600/13804], Loss: 2.8813, Perplexity: 17.8377, time_taken_in_seconds: 75
Epoch [1/1], Step [1700/13804], Loss: 3.2494, Perplexity: 25.7749, time_taken_in_seconds: 75
Epoch [1/1], Step [1800/13804], Loss: 3.3399, Perplexity: 28.2162, time_taken_in_seconds: 74
Epoch [1/1], Step [1900/13804], Loss: 2.6590, Perplexity: 14.2826, time_taken_in_seconds: 74
Epoch [1/1], Step [2000/13804], Loss: 2.5738, Perplexity: 13.1161, time_taken_in_seconds: 74
Epoch [1/1], Step [2100/13804], Loss: 2.9586, Perplexity: 19.2718, time_taken_in_seconds: 74
Epoch [1/1], Step [2200/13804], Loss: 2.8758, Perplexity: 17.7390, time_taken_in_seconds: 74
Epoch [1/1], Step [2300/13804], Loss: 3.0937, Perplexity: 22.0577, time_taken_in_seconds: 749
Epoch [1/1], Step [2400/13804], Loss: 2.7107, Perplexity: 15.0397, time_taken_in_seconds: 75
Epoch [1/1], Step [2500/13804], Loss: 2.6428, Perplexity: 14.0525, time_taken_in_seconds: 74
Epoch [1/1], Step [2600/13804], Loss: 2.3986, Perplexity: 11.0072, time_taken_in_seconds: 74
Epoch [1/1], Step [2700/13804], Loss: 2.5816, Perplexity: 13.2178, time_taken_in_seconds: 74
Epoch [1/1], Step [2800/13804], Loss: 3.0131, Perplexity: 20.3498, time_taken_in_seconds: 74
Epoch [1/1], Step [2900/13804], Loss: 2.8821, Perplexity: 17.8518, time_taken_in_seconds: 74
Epoch [1/1], Step [3000/13804], Loss: 2.6632, Perplexity: 14.3420, time_taken_in_seconds: 74
Epoch [1/1], Step [3100/13804], Loss: 2.7100, Perplexity: 15.0291, time_taken_in_seconds: 74
Epoch [1/1], Step [3200/13804], Loss: 2.8831, Perplexity: 17.8701, time_taken_in_seconds: 74
Epoch [1/1], Step [3300/13804], Loss: 2.7507, Perplexity: 15.6542, time_taken_in_seconds: 74
Epoch [1/1], Step [3400/13804], Loss: 2.8747, Perplexity: 17.7201, time_taken_in_seconds: 73
Epoch [1/1], Step [3489/13804], Loss: 2.8942, Perplexity: 18.0693, time_taken_in_seconds: 65

In [None]:
# A few different trial runs
batch_size = 30          # batch size
vocab_threshold = 5        # minimum word count threshold
vocab_from_file = True    # if True, load existing vocab file
embed_size = 512           # dimensionality of image and word embeddings
hidden_size = 512          # number of features in hidden state of the RNN decoder
num_epochs = 1             # number of training epochs
save_every = 1             # determines frequency of saving model weights
print_every = 100          # determines window for printing average loss
log_file = 'training_log.txt'       # name of file with saved training loss and perplexity


Epoch [1/1], Step [100/13804], Loss: 4.2236, Perplexity: 68.2786, time_taken_in_seconds: 73
Epoch [1/1], Step [200/13804], Loss: 3.7635, Perplexity: 43.0992, time_taken_in_seconds: 743
Epoch [1/1], Step [300/13804], Loss: 3.3348, Perplexity: 28.0730, time_taken_in_seconds: 749
Epoch [1/1], Step [400/13804], Loss: 3.8786, Perplexity: 48.3560, time_taken_in_seconds: 73
Epoch [1/1], Step [500/13804], Loss: 3.0804, Perplexity: 21.7682, time_taken_in_seconds: 73
Epoch [1/1], Step [600/13804], Loss: 3.4437, Perplexity: 31.3030, time_taken_in_seconds: 73
Epoch [1/1], Step [700/13804], Loss: 3.1230, Perplexity: 22.7136, time_taken_in_seconds: 73
Epoch [1/1], Step [800/13804], Loss: 3.4453, Perplexity: 31.3534, time_taken_in_seconds: 73
Epoch [1/1], Step [900/13804], Loss: 3.5138, Perplexity: 33.5740, time_taken_in_seconds: 73
Epoch [1/1], Step [1000/13804], Loss: 2.8462, Perplexity: 17.2222, time_taken_in_seconds: 73
Epoch [1/1], Step [1100/13804], Loss: 2.9218, Perplexity: 18.5747, time_taken_in_seconds: 73
Epoch [1/1], Step [1200/13804], Loss: 3.0687, Perplexity: 21.5144, time_taken_in_seconds: 73
Epoch [1/1], Step [1300/13804], Loss: 3.0089, Perplexity: 20.2647, time_taken_in_seconds: 73
Epoch [1/1], Step [1400/13804], Loss: 3.0618, Perplexity: 21.3651, time_taken_in_seconds: 73
Epoch [1/1], Step [1500/13804], Loss: 2.7041, Perplexity: 14.9410, time_taken_in_seconds: 73
Epoch [1/1], Step [1600/13804], Loss: 3.0829, Perplexity: 21.8213, time_taken_in_seconds: 73
Epoch [1/1], Step [1700/13804], Loss: 2.5687, Perplexity: 13.0485, time_taken_in_seconds: 73
Epoch [1/1], Step [1800/13804], Loss: 2.9133, Perplexity: 18.4180, time_taken_in_seconds: 72
Epoch [1/1], Step [1900/13804], Loss: 3.2442, Perplexity: 25.6403, time_taken_in_seconds: 73
Epoch [1/1], Step [2000/13804], Loss: 2.8113, Perplexity: 16.6322, time_taken_in_seconds: 73
Epoch [1/1], Step [2100/13804], Loss: 2.8890, Perplexity: 17.9755, time_taken_in_seconds: 73
Epoch [1/1], Step [2200/13804], Loss: 2.8850, Perplexity: 17.9027, time_taken_in_seconds: 72
Epoch [1/1], Step [2300/13804], Loss: 2.7604, Perplexity: 15.8054, time_taken_in_seconds: 73
Epoch [1/1], Step [2400/13804], Loss: 3.0634, Perplexity: 21.4010, time_taken_in_seconds: 73
Epoch [1/1], Step [2500/13804], Loss: 2.5524, Perplexity: 12.8373, time_taken_in_seconds: 73
Epoch [1/1], Step [2600/13804], Loss: 3.4624, Perplexity: 31.8927, time_taken_in_seconds: 738
Epoch [1/1], Step [2700/13804], Loss: 2.8275, Perplexity: 16.9037, time_taken_in_seconds: 73
Epoch [1/1], Step [2800/13804], Loss: 2.5101, Perplexity: 12.3067, time_taken_in_seconds: 73
Epoch [1/1], Step [2900/13804], Loss: 2.7343, Perplexity: 15.3992, time_taken_in_seconds: 73
Epoch [1/1], Step [3000/13804], Loss: 2.7504, Perplexity: 15.6484, time_taken_in_seconds: 73
Epoch [1/1], Step [3100/13804], Loss: 2.6280, Perplexity: 13.8462, time_taken_in_seconds: 73
Epoch [1/1], Step [3200/13804], Loss: 2.7795, Perplexity: 16.1102, time_taken_in_seconds: 73
Epoch [1/1], Step [3300/13804], Loss: 3.3635, Perplexity: 28.8895, time_taken_in_seconds: 73
Epoch [1/1], Step [3400/13804], Loss: 3.5626, Perplexity: 35.2556, time_taken_in_seconds: 73
Epoch [1/1], Step [3500/13804], Loss: 2.7526, Perplexity: 15.6829, time_taken_in_seconds: 730
Epoch [1/1], Step [3600/13804], Loss: 2.5638, Perplexity: 12.9854, time_taken_in_seconds: 73
Epoch [1/1], Step [3700/13804], Loss: 3.2702, Perplexity: 26.3173, time_taken_in_seconds: 73
Epoch [1/1], Step [3800/13804], Loss: 2.5308, Perplexity: 12.5636, time_taken_in_seconds: 73
Epoch [1/1], Step [3900/13804], Loss: 2.6257, Perplexity: 13.8144, time_taken_in_seconds: 73
Epoch [1/1], Step [4000/13804], Loss: 2.4088, Perplexity: 11.1203, time_taken_in_seconds: 73
Epoch [1/1], Step [4100/13804], Loss: 2.5113, Perplexity: 12.3209, time_taken_in_seconds: 72
Epoch [1/1], Step [4200/13804], Loss: 3.0009, Perplexity: 20.1031, time_taken_in_seconds: 73
Epoch [1/1], Step [4300/13804], Loss: 2.5180, Perplexity: 12.4032, time_taken_in_seconds: 73
Epoch [1/1], Step [4400/13804], Loss: 2.4226, Perplexity: 11.2753, time_taken_in_seconds: 73
Epoch [1/1], Step [4500/13804], Loss: 3.0350, Perplexity: 20.8000, time_taken_in_seconds: 72
Epoch [1/1], Step [4529/13804], Loss: 2.6623, Perplexity: 14.3286, time_taken_in_seconds: 21


In [None]:
# hyperparameters for a single training epoch
batch_size = 30                 # number of image-caption pairs in each training batch
vocab_threshold = 5             # minimum word count threshold
vocab_from_file = True          # if True, load existing vocab file
embed_size = 512                # dimensionality of image and word embeddings
hidden_size = 512               # number of features in hidden state of the RNN decoder
num_epochs = 1                  # number of training epochs
save_every = 1                  # determines frequency of saving model weights
print_every = 100               # determines window for printing average loss
log_file = 'training_log.txt'   # name of file with saved training loss and perplexity
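
The batch size set above determines how many steps make up one epoch; the log output reports 13804 steps per epoch. As a rough sketch, assuming the step count is the number of training captions divided by `batch_size` and rounded up, the relationship can be checked in both directions. The functions `steps_per_epoch` and `caption_count_bounds` are hypothetical names used only for this illustration.

```python
import math

def steps_per_epoch(num_captions, batch_size):
    """Batches needed to cover num_captions once (the last batch may be partial)."""
    return math.ceil(num_captions / batch_size)

def caption_count_bounds(total_steps, batch_size):
    """Smallest and largest caption counts that yield exactly total_steps batches."""
    return (total_steps - 1) * batch_size + 1, total_steps * batch_size

# Values taken from the cell above (batch_size) and the log output (13804 steps).
low, high = caption_count_bounds(total_steps=13804, batch_size=30)
print(low, high)
```

Under that assumption, doubling `batch_size` would roughly halve the number of steps per epoch, which is worth keeping in mind when comparing logs from runs with different settings.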


Epoch [1/1], Step [1/13804], Loss: 3.5664, Perplexity: 35.3901, time_taken_in_seconds: 0
Epoch [1/1], Step [2/13804], Loss: 3.6818, Perplexity: 39.7190, time_taken_in_seconds: 1
Epoch [1/1], Step [3/13804], Loss: 3.4363, Perplexity: 31.0722, time_taken_in_seconds: 2
Epoch [1/1], Step [4/13804], Loss: 4.0916, Perplexity: 59.8342, time_taken_in_seconds: 3
Epoch [1/1], Step [5/13804], Loss: 3.9951, Perplexity: 54.3324, time_taken_in_seconds: 4
Epoch [1/1], Step [6/13804], Loss: 3.3453, Perplexity: 28.3681, time_taken_in_seconds: 4
Epoch [1/1], Step [7/13804], Loss: 3.7065, Perplexity: 40.7095, time_taken_in_seconds: 5
Epoch [1/1], Step [8/13804], Loss: 3.9249, Perplexity: 50.6465, time_taken_in_seconds: 6
Epoch [1/1], Step [9/13804], Loss: 3.6714, Perplexity: 39.3050, time_taken_in_seconds: 7
Epoch [1/1], Step [10/13804], Loss: 3.7028, Perplexity: 40.5619, time_taken_in_seconds: 8
Epoch [1/1], Step [11/13804], Loss: 3.4906, Perplexity: 32.8062, time_taken_in_seconds: 9
Epoch [1/1], Step [12/13804], Loss: 3.2818, Perplexity: 26.6231, time_taken_in_seconds: 10
Epoch [1/1], Step [13/13804], Loss: 3.5055, Perplexity: 33.2977, time_taken_in_seconds: 11
Epoch [1/1], Step [14/13804], Loss: 3.7909, Perplexity: 44.2955, time_taken_in_seconds: 11
Epoch [1/1], Step [15/13804], Loss: 3.5648, Perplexity: 35.3337, time_taken_in_seconds: 12
Epoch [1/1], Step [16/13804], Loss: 3.5079, Perplexity: 33.3774, time_taken_in_seconds: 13
Epoch [1/1], Step [17/13804], Loss: 3.5395, Perplexity: 34.4496, time_taken_in_seconds: 14
Epoch [1/1], Step [18/13804], Loss: 3.6545, Perplexity: 38.6475, time_taken_in_seconds: 15
Epoch [1/1], Step [19/13804], Loss: 4.2545, Perplexity: 70.4250, time_taken_in_seconds: 16
Epoch [1/1], Step [20/13804], Loss: 3.8253, Perplexity: 45.8468, time_taken_in_seconds: 17
Epoch [1/1], Step [21/13804], Loss: 3.4213, Perplexity: 30.6093, time_taken_in_seconds: 18
Epoch [1/1], Step [22/13804], Loss: 4.1212, Perplexity: 61.6326, time_taken_in_seconds: 18
Epoch [1/1], Step [23/13804], Loss: 3.6700, Perplexity: 39.2509, time_taken_in_seconds: 19
Epoch [1/1], Step [24/13804], Loss: 3.6878, Perplexity: 39.9574, time_taken_in_seconds: 20
Epoch [1/1], Step [25/13804], Loss: 3.7459, Perplexity: 42.3451, time_taken_in_seconds: 21
Epoch [1/1], Step [26/13804], Loss: 3.7772, Perplexity: 43.6922, time_taken_in_seconds: 22
Epoch [1/1], Step [27/13804], Loss: 3.6756, Perplexity: 39.4733, time_taken_in_seconds: 23
Epoch [1/1], Step [28/13804], Loss: 3.7729, Perplexity: 43.5058, time_taken_in_seconds: 24
Epoch [1/1], Step [29/13804], Loss: 3.4709, Perplexity: 32.1659, time_taken_in_seconds: 25
Epoch [1/1], Step [30/13804], Loss: 3.2425, Perplexity: 25.5982, time_taken_in_seconds: 25
Epoch [1/1], Step [31/13804], Loss: 3.7112, Perplexity: 40.9019, time_taken_in_seconds: 26
Epoch [1/1], Step [32/13804], Loss: 3.4605, Perplexity: 31.8314, time_taken_in_seconds: 27
Epoch [1/1], Step [33/13804], Loss: 3.3831, Perplexity: 29.4616, time_taken_in_seconds: 28
Epoch [1/1], Step [34/13804], Loss: 3.3054, Perplexity: 27.2581, time_taken_in_seconds: 29
Epoch [1/1], Step [35/13804], Loss: 3.3167, Perplexity: 27.5685, time_taken_in_seconds: 30
Epoch [1/1], Step [36/13804], Loss: 3.1887, Perplexity: 24.2575, time_taken_in_seconds: 31
Epoch [1/1], Step [37/13804], Loss: 3.7256, Perplexity: 41.4973, time_taken_in_seconds: 32
Epoch [1/1], Step [38/13804], Loss: 3.4181, Perplexity: 30.5114, time_taken_in_seconds: 32
Epoch [1/1], Step [39/13804], Loss: 3.4130, Perplexity: 30.3554, time_taken_in_seconds: 33
Epoch [1/1], Step [40/13804], Loss: 3.6867, Perplexity: 39.9146, time_taken_in_seconds: 34
Epoch [1/1], Step [41/13804], Loss: 3.4947, Perplexity: 32.9401, time_taken_in_seconds: 35
Epoch [1/1], Step [42/13804], Loss: 3.4808, Perplexity: 32.4855, time_taken_in_seconds: 36
Epoch [1/1], Step [43/13804], Loss: 3.5836, Perplexity: 36.0046, time_taken_in_seconds: 37
Epoch [1/1], Step [44/13804], Loss: 3.3019, Perplexity: 27.1642, time_taken_in_seconds: 38
Epoch [1/1], Step [45/13804], Loss: 3.7417, Perplexity: 42.1711, time_taken_in_seconds: 39
Epoch [1/1], Step [46/13804], Loss: 4.2758, Perplexity: 71.9350, time_taken_in_seconds: 39
Epoch [1/1], Step [47/13804], Loss: 3.6749, Perplexity: 39.4448, time_taken_in_seconds: 40
Epoch [1/1], Step [48/13804], Loss: 3.5867, Perplexity: 36.1145, time_taken_in_seconds: 41
Epoch [1/1], Step [49/13804], Loss: 3.7916, Perplexity: 44.3255, time_taken_in_seconds: 42
Epoch [1/1], Step [50/13804], Loss: 3.5649, Perplexity: 35.3343, time_taken_in_seconds: 43
Epoch [1/1], Step [51/13804], Loss: 3.1455, Perplexity: 23.2306, time_taken_in_seconds: 44
Epoch [1/1], Step [52/13804], Loss: 3.6260, Perplexity: 37.5632, time_taken_in_seconds: 45
Epoch [1/1], Step [53/13804], Loss: 3.6347, Perplexity: 37.8913, time_taken_in_seconds: 46
Epoch [1/1], Step [54/13804], Loss: 3.8683, Perplexity: 47.8629, time_taken_in_seconds: 46
Epoch [1/1], Step [55/13804], Loss: 3.3017, Perplexity: 27.1586, time_taken_in_seconds: 47
Epoch [1/1], Step [56/13804], Loss: 3.6262, Perplexity: 37.5699, time_taken_in_seconds: 48
Epoch [1/1], Step [57/13804], Loss: 3.6771, Perplexity: 39.5297, time_taken_in_seconds: 49
Epoch [1/1], Step [58/13804], Loss: 3.7118, Perplexity: 40.9257, time_taken_in_seconds: 50
Epoch [1/1], Step [59/13804], Loss: 3.6432, Perplexity: 38.2148, time_taken_in_seconds: 51
Epoch [1/1], Step [60/13804], Loss: 3.9807, Perplexity: 53.5542, time_taken_in_seconds: 52
Epoch [1/1], Step [61/13804], Loss: 3.5239, Perplexity: 33.9167, time_taken_in_seconds: 53
Epoch [1/1], Step [62/13804], Loss: 3.7021, Perplexity: 40.5327, time_taken_in_seconds: 53
Epoch [1/1], Step [63/13804], Loss: 3.7214, Perplexity: 41.3231, time_taken_in_seconds: 54
Epoch [1/1], Step [64/13804], Loss: 3.5982, Perplexity: 36.5337, time_taken_in_seconds: 55
Epoch [1/1], Step [65/13804], Loss: 3.5815, Perplexity: 35.9266, time_taken_in_seconds: 56
Epoch [1/1], Step [66/13804], Loss: 3.9075, Perplexity: 49.7765, time_taken_in_seconds: 57
Epoch [1/1], Step [67/13804], Loss: 3.7569, Perplexity: 42.8158, time_taken_in_seconds: 58
Epoch [1/1], Step [68/13804], Loss: 3.7874, Perplexity: 44.1393, time_taken_in_seconds: 59
Epoch [1/1], Step [69/13804], Loss: 3.4409, Perplexity: 31.2142, time_taken_in_seconds: 60
Epoch [1/1], Step [70/13804], Loss: 3.6066, Perplexity: 36.8389, time_taken_in_seconds: 61
Epoch [1/1], Step [71/13804], Loss: 3.4672, Perplexity: 32.0476, time_taken_in_seconds: 61
Epoch [1/1], Step [72/13804], Loss: 3.6912, Perplexity: 40.0912, time_taken_in_seconds: 62
Epoch [1/1], Step [73/13804], Loss: 3.4549, Perplexity: 31.6543, time_taken_in_seconds: 63
Epoch [1/1], Step [74/13804], Loss: 3.7910, Perplexity: 44.3013, time_taken_in_seconds: 64
Epoch [1/1], Step [75/13804], Loss: 3.6239, Perplexity: 37.4825, time_taken_in_seconds: 65
Epoch [1/1], Step [76/13804], Loss: 3.4501, Perplexity: 31.5041, time_taken_in_seconds: 66
Epoch [1/1], Step [77/13804], Loss: 3.0218, Perplexity: 20.5284, time_taken_in_seconds: 67
Epoch [1/1], Step [78/13804], Loss: 3.5546, Perplexity: 34.9746, time_taken_in_seconds: 68
Epoch [1/1], Step [79/13804], Loss: 3.5081, Perplexity: 33.3843, time_taken_in_seconds: 68
Epoch [1/1], Step [80/13804], Loss: 3.6176, Perplexity: 37.2495, time_taken_in_seconds: 69
Epoch [1/1], Step [81/13804], Loss: 3.3992, Perplexity: 29.9410, time_taken_in_seconds: 70
Epoch [1/1], Step [82/13804], Loss: 3.3026, Perplexity: 27.1820, time_taken_in_seconds: 71
Epoch [1/1], Step [83/13804], Loss: 3.1725, Perplexity: 23.8680, time_taken_in_seconds: 72
Epoch [1/1], Step [84/13804], Loss: 3.6088, Perplexity: 36.9215, time_taken_in_seconds: 73
Epoch [1/1], Step [85/13804], Loss: 3.3718, Perplexity: 29.1310, time_taken_in_seconds: 74
Epoch [1/1], Step [86/13804], Loss: 3.1035, Perplexity: 22.2747, time_taken_in_seconds: 75
Epoch [1/1], Step [87/13804], Loss: 3.7010, Perplexity: 40.4874, time_taken_in_seconds: 76
Epoch [1/1], Step [88/13804], Loss: 3.6150, Perplexity: 37.1513, time_taken_in_seconds: 76
Epoch [1/1], Step [89/13804], Loss: 3.7616, Perplexity: 43.0190, time_taken_in_seconds: 77
Epoch [1/1], Step [90/13804], Loss: 3.8545, Perplexity: 47.2031, time_taken_in_seconds: 78
Epoch [1/1], Step [91/13804], Loss: 4.3726, Perplexity: 79.2489, time_taken_in_seconds: 79
Epoch [1/1], Step [92/13804], Loss: 3.7147, Perplexity: 41.0478, time_taken_in_seconds: 80
Epoch [1/1], Step [93/13804], Loss: 3.5542, Perplexity: 34.9608, time_taken_in_seconds: 81
Epoch [1/1], Step [94/13804], Loss: 3.3868, Perplexity: 29.5704, time_taken_in_seconds: 82
Epoch [1/1], Step [95/13804], Loss: 3.9226, Perplexity: 50.5319, time_taken_in_seconds: 83
Epoch [1/1], Step [96/13804], Loss: 3.7553, Perplexity: 42.7457, time_taken_in_seconds: 84
Epoch [1/1], Step [97/13804], Loss: 3.6698, Perplexity: 39.2431, time_taken_in_seconds: 84
Epoch [1/1], Step [98/13804], Loss: 3.6490, Perplexity: 38.4376, time_taken_in_seconds: 85
Epoch [1/1], Step [99/13804], Loss: 3.8446, Perplexity: 46.7419, time_taken_in_seconds: 86
Epoch [1/1], Step [100/13804], Loss: 3.3599, Perplexity: 28.7870, time_taken_in_seconds: 87
Epoch [1/1], Step [101/13804], Loss: 3.5152, Perplexity: 33.6225, time_taken_in_seconds: 0
Epoch [1/1], Step [102/13804], Loss: 3.3270, Perplexity: 27.8551, time_taken_in_seconds: 1
Epoch [1/1], Step [103/13804], Loss: 4.4390, Perplexity: 84.6932, time_taken_in_seconds: 2
Epoch [1/1], Step [104/13804], Loss: 3.3035, Perplexity: 27.2068, time_taken_in_seconds: 3
Epoch [1/1], Step [105/13804], Loss: 3.4502, Perplexity: 31.5059, time_taken_in_seconds: 4
Epoch [1/1], Step [106/13804], Loss: 4.1936, Perplexity: 66.2610, time_taken_in_seconds: 5
Epoch [1/1], Step [107/13804], Loss: 3.4603, Perplexity: 31.8275, time_taken_in_seconds: 6
Epoch [1/1], Step [108/13804], Loss: 3.2639, Perplexity: 26.1514, time_taken_in_seconds: 7
Epoch [1/1], Step [109/13804], Loss: 3.5134, Perplexity: 33.5622, time_taken_in_seconds: 7
Epoch [1/1], Step [110/13804], Loss: 3.6583, Perplexity: 38.7945, time_taken_in_seconds: 8
Epoch [1/1], Step [111/13804], Loss: 3.5861, Perplexity: 36.0939, time_taken_in_seconds: 9
Epoch [1/1], Step [112/13804], Loss: 3.8256, Perplexity: 45.8608, time_taken_in_seconds: 10
Epoch [1/1], Step [113/13804], Loss: 3.4880, Perplexity: 32.7196, time_taken_in_seconds: 11
Epoch [1/1], Step [114/13804], Loss: 3.5973, Perplexity: 36.4999, time_taken_in_seconds: 12
Epoch [1/1], Step [115/13804], Loss: 3.6136, Perplexity: 37.0988, time_taken_in_seconds: 13
Epoch [1/1], Step [116/13804], Loss: 3.4569, Perplexity: 31.7200, time_taken_in_seconds: 14
Epoch [1/1], Step [117/13804], Loss: 3.7481, Perplexity: 42.4412, time_taken_in_seconds: 15
Epoch [1/1], Step [118/13804], Loss: 3.3956, Perplexity: 29.8319, time_taken_in_seconds: 15
Epoch [1/1], Step [119/13804], Loss: 4.0438, Perplexity: 57.0436, time_taken_in_seconds: 16
Epoch [1/1], Step [120/13804], Loss: 3.5303, Perplexity: 34.1359, time_taken_in_seconds: 17
Epoch [1/1], Step [121/13804], Loss: 3.4272, Perplexity: 30.7901, time_taken_in_seconds: 18
Epoch [1/1], Step [122/13804], Loss: 3.6474, Perplexity: 38.3740, time_taken_in_seconds: 19
Epoch [1/1], Step [123/13804], Loss: 3.4427, Perplexity: 31.2710, time_taken_in_seconds: 20
Epoch [1/1], Step [124/13804], Loss: 3.6770, Perplexity: 39.5261, time_taken_in_seconds: 21
Epoch [1/1], Step [125/13804], Loss: 3.5928, Perplexity: 36.3365, time_taken_in_seconds: 22
Epoch [1/1], Step [126/13804], Loss: 3.4110, Perplexity: 30.2960, time_taken_in_seconds: 22
Epoch [1/1], Step [127/13804], Loss: 3.4434, Perplexity: 31.2936, time_taken_in_seconds: 23
Epoch [1/1], Step [128/13804], Loss: 3.3589, Perplexity: 28.7576, time_taken_in_seconds: 24
Epoch [1/1], Step [129/13804], Loss: 3.9697, Perplexity: 52.9706, time_taken_in_seconds: 25
Epoch [1/1], Step [130/13804], Loss: 3.6344, Perplexity: 37.8790, time_taken_in_seconds: 26
Epoch [1/1], Step [131/13804], Loss: 3.5675, Perplexity: 35.4297, time_taken_in_seconds: 27
Epoch [1/1], Step [132/13804], Loss: 3.3852, Perplexity: 29.5236, time_taken_in_seconds: 28
Epoch [1/1], Step [133/13804], Loss: 3.6482, Perplexity: 38.4053, time_taken_in_seconds: 29
Epoch [1/1], Step [134/13804], Loss: 3.4260, Perplexity: 30.7545, time_taken_in_seconds: 29
Epoch [1/1], Step [135/13804], Loss: 3.5728, Perplexity: 35.6156, time_taken_in_seconds: 30
Epoch [1/1], Step [136/13804], Loss: 3.7386, Perplexity: 42.0394, time_taken_in_seconds: 31
Epoch [1/1], Step [137/13804], Loss: 3.4749, Perplexity: 32.2936, time_taken_in_seconds: 32
Epoch [1/1], Step [138/13804], Loss: 3.4826, Perplexity: 32.5446, time_taken_in_seconds: 33
Epoch [1/1], Step [139/13804], Loss: 3.5680, Perplexity: 35.4464, time_taken_in_seconds: 34
Epoch [1/1], Step [140/13804], Loss: 3.5905, Perplexity: 36.2538, time_taken_in_seconds: 35
Epoch [1/1], Step [141/13804], Loss: 3.5875, Perplexity: 36.1444, time_taken_in_seconds: 36
Epoch [1/1], Step [142/13804], Loss: 3.2613, Perplexity: 26.0827, time_taken_in_seconds: 37
Epoch [1/1], Step [143/13804], Loss: 3.2681, Perplexity: 26.2617, time_taken_in_seconds: 38
Epoch [1/1], Step [144/13804], Loss: 3.2325, Perplexity: 25.3437, time_taken_in_seconds: 39
Epoch [1/1], Step [145/13804], Loss: 3.9395, Perplexity: 51.3945, time_taken_in_seconds: 39
Epoch [1/1], Step [146/13804], Loss: 3.6202, Perplexity: 37.3455, time_taken_in_seconds: 40
Epoch [1/1], Step [147/13804], Loss: 3.7896, Perplexity: 44.2378, time_taken_in_seconds: 41
Epoch [1/1], Step [148/13804], Loss: 3.6775, Perplexity: 39.5459, time_taken_in_seconds: 42
Epoch [1/1], Step [149/13804], Loss: 3.7846, Perplexity: 44.0201, time_taken_in_seconds: 43
Epoch [1/1], Step [150/13804], Loss: 3.0276, Perplexity: 20.6468, time_taken_in_seconds: 44
Epoch [1/1], Step [151/13804], Loss: 3.4418, Perplexity: 31.2416, time_taken_in_seconds: 45
Epoch [1/1], Step [152/13804], Loss: 4.1433, Perplexity: 63.0101, time_taken_in_seconds: 46
Epoch [1/1], Step [153/13804], Loss: 3.3157, Perplexity: 27.5407, time_taken_in_seconds: 46
Epoch [1/1], Step [154/13804], Loss: 3.4935, Perplexity: 32.9021, time_taken_in_seconds: 47
Epoch [1/1], Step [155/13804], Loss: 3.6418, Perplexity: 38.1596, time_taken_in_seconds: 48
Epoch [1/1], Step [156/13804], Loss: 3.5797, Perplexity: 35.8626, time_taken_in_seconds: 49
Epoch [1/1], Step [157/13804], Loss: 3.6347, Perplexity: 37.8902, time_taken_in_seconds: 50
Epoch [1/1], Step [158/13804], Loss: 3.3160, Perplexity: 27.5506, time_taken_in_seconds: 51
Epoch [1/1], Step [159/13804], Loss: 3.3506, Perplexity: 28.5211, time_taken_in_seconds: 52
Epoch [1/1], Step [160/13804], Loss: 3.1091, Perplexity: 22.4008, time_taken_in_seconds: 53
Epoch [1/1], Step [161/13804], Loss: 3.8085, Perplexity: 45.0815, time_taken_in_seconds: 53
Epoch [1/1], Step [162/13804], Loss: 3.4540, Perplexity: 31.6273, time_taken_in_seconds: 54
Epoch [1/1], Step [163/13804], Loss: 3.5984, Perplexity: 36.5387, time_taken_in_seconds: 55
Epoch [1/1], Step [164/13804], Loss: 3.3385, Perplexity: 28.1768, time_taken_in_seconds: 56
Epoch [1/1], Step [165/13804], Loss: 2.9960, Perplexity: 20.0048, time_taken_in_seconds: 57
Epoch [1/1], Step [166/13804], Loss: 3.5312, Perplexity: 34.1650, time_taken_in_seconds: 58
Epoch [1/1], Step [167/13804], Loss: 3.7852, Perplexity: 44.0441, time_taken_in_seconds: 59
Epoch [1/1], Step [168/13804], Loss: 3.4087, Perplexity: 30.2249, time_taken_in_seconds: 59
Epoch [1/1], Step [169/13804], Loss: 3.3797, Perplexity: 29.3631, time_taken_in_seconds: 60
Epoch [1/1], Step [170/13804], Loss: 3.5712, Perplexity: 35.5603, time_taken_in_seconds: 61
Epoch [1/1], Step [171/13804], Loss: 3.9628, Perplexity: 52.6022, time_taken_in_seconds: 62
Epoch [1/1], Step [172/13804], Loss: 3.7031, Perplexity: 40.5741, time_taken_in_seconds: 63
Epoch [1/1], Step [173/13804], Loss: 3.4454, Perplexity: 31.3558, time_taken_in_seconds: 64
Epoch [1/1], Step [174/13804], Loss: 3.5233, Perplexity: 33.8955, time_taken_in_seconds: 65
Epoch [1/1], Step [175/13804], Loss: 3.7280, Perplexity: 41.5940, time_taken_in_seconds: 66
Epoch [1/1], Step [176/13804], Loss: 3.1245, Perplexity: 22.7485, time_taken_in_seconds: 66
Epoch [1/1], Step [177/13804], Loss: 3.6023, Perplexity: 36.6809, time_taken_in_seconds: 67
Epoch [1/1], Step [178/13804], Loss: 3.3104, Perplexity: 27.3949, time_taken_in_seconds: 68
Epoch [1/1], Step [179/13804], Loss: 3.4934, Perplexity: 32.8977, time_taken_in_seconds: 69
Epoch [1/1], Step [180/13804], Loss: 3.9053, Perplexity: 49.6671, time_taken_in_seconds: 70
Epoch [1/1], Step [181/13804], Loss: 3.1509, Perplexity: 23.3575, time_taken_in_seconds: 71
Epoch [1/1], Step [182/13804], Loss: 3.4083, Perplexity: 30.2130, time_taken_in_seconds: 72
Epoch [1/1], Step [183/13804], Loss: 3.6746, Perplexity: 39.4339, time_taken_in_seconds: 73
Epoch [1/1], Step [184/13804], Loss: 3.3916, Perplexity: 29.7123, time_taken_in_seconds: 73
Epoch [1/1], Step [185/13804], Loss: 3.4389, Perplexity: 31.1520, time_taken_in_seconds: 74
Epoch [1/1], Step [186/13804], Loss: 4.3419, Perplexity: 76.8523, time_taken_in_seconds: 75
Epoch [1/1], Step [187/13804], Loss: 3.3641, Perplexity: 28.9074, time_taken_in_seconds: 76
Epoch [1/1], Step [188/13804], Loss: 4.1483, Perplexity: 63.3293, time_taken_in_seconds: 77
Epoch [1/1], Step [189/13804], Loss: 3.2253, Perplexity: 25.1605, time_taken_in_seconds: 78
Epoch [1/1], Step [190/13804], Loss: 3.4262, Perplexity: 30.7609, time_taken_in_seconds: 79
Epoch [1/1], Step [191/13804], Loss: 3.5505, Perplexity: 34.8306, time_taken_in_seconds: 80
Epoch [1/1], Step [192/13804], Loss: 3.6926, Perplexity: 40.1505, time_taken_in_seconds: 80
Epoch [1/1], Step [193/13804], Loss: 3.0498, Perplexity: 21.1112, time_taken_in_seconds: 81
Epoch [1/1], Step [194/13804], Loss: 3.1167, Perplexity: 22.5714, time_taken_in_seconds: 82
Epoch [1/1], Step [195/13804], Loss: 3.1289, Perplexity: 22.8480, time_taken_in_seconds: 83
Epoch [1/1], Step [196/13804], Loss: 3.2923, Perplexity: 26.9039, time_taken_in_seconds: 84
Epoch [1/1], Step [197/13804], Loss: 3.6066, Perplexity: 36.8424, time_taken_in_seconds: 85
Epoch [1/1], Step [198/13804], Loss: 3.3995, Perplexity: 29.9484, time_taken_in_seconds: 86
Epoch [1/1], Step [199/13804], Loss: 3.5569, Perplexity: 35.0534, time_taken_in_seconds: 87
Epoch [1/1], Step [200/13804], Loss: 3.4220, Perplexity: 30.6305, time_taken_in_seconds: 87
Epoch [1/1], Step [201/13804], Loss: 3.2191, Perplexity: 25.0068, time_taken_in_seconds: 0
Epoch [1/1], Step [202/13804], Loss: 3.7367, Perplexity: 41.9574, time_taken_in_seconds: 1
Epoch [1/1], Step [203/13804], Loss: 3.4300, Perplexity: 30.8778, time_taken_in_seconds: 2
Epoch [1/1], Step [204/13804], Loss: 3.4066, Perplexity: 30.1628, time_taken_in_seconds: 3
Epoch [1/1], Step [205/13804], Loss: 3.5831, Perplexity: 35.9846, time_taken_in_seconds: 4
Epoch [1/1], Step [206/13804], Loss: 3.3647, Perplexity: 28.9258, time_taken_in_seconds: 5
Epoch [1/1], Step [207/13804], Loss: 3.9586, Perplexity: 52.3862, time_taken_in_seconds: 6
Epoch [1/1], Step [208/13804], Loss: 3.7741, Perplexity: 43.5596, time_taken_in_seconds: 7
Epoch [1/1], Step [209/13804], Loss: 3.3182, Perplexity: 27.6111, time_taken_in_seconds: 8
Epoch [1/1], Step [210/13804], Loss: 3.4543, Perplexity: 31.6356, time_taken_in_seconds: 8
Epoch [1/1], Step [211/13804], Loss: 3.2484, Perplexity: 25.7502, time_taken_in_seconds: 9
Epoch [1/1], Step [212/13804], Loss: 3.4556, Perplexity: 31.6761, time_taken_in_seconds: 10
Epoch [1/1], Step [213/13804], Loss: 3.8084, Perplexity: 45.0765, time_taken_in_seconds: 11
Epoch [1/1], Step [214/13804], Loss: 3.4926, Perplexity: 32.8723, time_taken_in_seconds: 12
Epoch [1/1], Step [215/13804], Loss: 4.4735, Perplexity: 87.6603, time_taken_in_seconds: 13
Epoch [1/1], Step [216/13804], Loss: 3.5052, Perplexity: 33.2882, time_taken_in_seconds: 14
Epoch [1/1], Step [217/13804], Loss: 3.3219, Perplexity: 27.7119, time_taken_in_seconds: 15
Epoch [1/1], Step [218/13804], Loss: 4.6034, Perplexity: 99.8211, time_taken_in_seconds: 16
Epoch [1/1], Step [219/13804], Loss: 3.2224, Perplexity: 25.0874, time_taken_in_seconds: 16
Epoch [1/1], Step [220/13804], Loss: 3.7331, Perplexity: 41.8083, time_taken_in_seconds: 17
Epoch [1/1], Step [221/13804], Loss: 3.2776, Perplexity: 26.5109, time_taken_in_seconds: 18
Epoch [1/1], Step [222/13804], Loss: 3.4448, Perplexity: 31.3371, time_taken_in_seconds: 19
Epoch [1/1], Step [223/13804], Loss: 3.3589, Perplexity: 28.7563, time_taken_in_seconds: 20
Epoch [1/1], Step [224/13804], Loss: 3.3738, Perplexity: 29.1899, time_taken_in_seconds: 21
Epoch [1/1], Step [225/13804], Loss: 3.5645, Perplexity: 35.3235, time_taken_in_seconds: 22
Epoch [1/1], Step [226/13804], Loss: 3.7006, Perplexity: 40.4722, time_taken_in_seconds: 23
Epoch [1/1], Step [227/13804], Loss: 3.1352, Perplexity: 22.9928, time_taken_in_seconds: 23
Epoch [1/1], Step [228/13804], Loss: 3.2583, Perplexity: 26.0045, time_taken_in_seconds: 24
Epoch [1/1], Step [229/13804], Loss: 3.4118, Perplexity: 30.3208, time_taken_in_seconds: 25
Epoch [1/1], Step [230/13804], Loss: 3.4529, Perplexity: 31.5907, time_taken_in_seconds: 26
Epoch [1/1], Step [231/13804], Loss: 3.1313, Perplexity: 22.9046, time_taken_in_seconds: 27
Epoch [1/1], Step [232/13804], Loss: 3.3444, Perplexity: 28.3437, time_taken_in_seconds: 28
Epoch [1/1], Step [233/13804], Loss: 3.3689, Perplexity: 29.0460, time_taken_in_seconds: 29
Epoch [1/1], Step [234/13804], Loss: 3.5930, Perplexity: 36.3429, time_taken_in_seconds: 30
Epoch [1/1], Step [235/13804], Loss: 3.3750, Perplexity: 29.2256, time_taken_in_seconds: 30
Epoch [1/1], Step [236/13804], Loss: 3.2237, Perplexity: 25.1211, time_taken_in_seconds: 31
Epoch [1/1], Step [237/13804], Loss: 3.3499, Perplexity: 28.5010, time_taken_in_seconds: 32
Epoch [1/1], Step [238/13804], Loss: 3.5021, Perplexity: 33.1848, time_taken_in_seconds: 33
Epoch [1/1], Step [239/13804], Loss: 3.6872, Perplexity: 39.9314, time_taken_in_seconds: 34
Epoch [1/1], Step [240/13804], Loss: 3.7037, Perplexity: 40.5967, time_taken_in_seconds: 35
Epoch [1/1], Step [241/13804], Loss: 3.5730, Perplexity: 35.6237, time_taken_in_seconds: 36
Epoch [1/1], Step [242/13804], Loss: 3.5394, Perplexity: 34.4478, time_taken_in_seconds: 37
Epoch [1/1], Step [243/13804], Loss: 3.1762, Perplexity: 23.9567, time_taken_in_seconds: 37
Epoch [1/1], Step [244/13804], Loss: 3.3437, Perplexity: 28.3232, time_taken_in_seconds: 38
Epoch [1/1], Step [245/13804], Loss: 3.6612, Perplexity: 38.9085, time_taken_in_seconds: 39
Epoch [1/1], Step [246/13804], Loss: 3.6089, Perplexity: 36.9252, time_taken_in_seconds: 40
Epoch [1/1], Step [247/13804], Loss: 3.1695, Perplexity: 23.7954, time_taken_in_seconds: 41
Epoch [1/1], Step [248/13804], Loss: 3.0381, Perplexity: 20.8661, time_taken_in_seconds: 42
Epoch [1/1], Step [249/13804], Loss: 3.2963, Perplexity: 27.0130, time_taken_in_seconds: 43
Epoch [1/1], Step [250/13804], Loss: 3.2360, Perplexity: 25.4318, time_taken_in_seconds: 44
Epoch [1/1], Step [251/13804], Loss: 3.6003, Perplexity: 36.6075, time_taken_in_seconds: 44
Epoch [1/1], Step [252/13804], Loss: 3.3003, Perplexity: 27.1219, time_taken_in_seconds: 45
Epoch [1/1], Step [253/13804], Loss: 2.9618, Perplexity: 19.3319, time_taken_in_seconds: 46
Epoch [1/1], Step [254/13804], Loss: 3.3856, Perplexity: 29.5350, time_taken_in_seconds: 47
Epoch [1/1], Step [255/13804], Loss: 3.7224, Perplexity: 41.3639, time_taken_in_seconds: 48
Epoch [1/1], Step [256/13804], Loss: 3.7992, Perplexity: 44.6665, time_taken_in_seconds: 49
Epoch [1/1], Step [257/13804], Loss: 3.1324, Perplexity: 22.9300, time_taken_in_seconds: 50
Epoch [1/1], Step [258/13804], Loss: 3.1327, Perplexity: 22.9368, time_taken_in_seconds: 51
Epoch [1/1], Step [259/13804], Loss: 3.0051, Perplexity: 20.1874, time_taken_in_seconds: 51
Epoch [1/1], Step [260/13804], Loss: 3.5485, Perplexity: 34.7627, time_taken_in_seconds: 52
Epoch [1/1], Step [261/13804], Loss: 3.2358, Perplexity: 25.4276, time_taken_in_seconds: 53
Epoch [1/1], Step [262/13804], Loss: 3.3963, Perplexity: 29.8546, time_taken_in_seconds: 54
Epoch [1/1], Step [263/13804], Loss: 3.2909, Perplexity: 26.8683, time_taken_in_seconds: 55
Epoch [1/1], Step [264/13804], Loss: 3.5837, Perplexity: 36.0078, time_taken_in_seconds: 56
Epoch [1/1], Step [265/13804], Loss: 3.4184, Perplexity: 30.5213, time_taken_in_seconds: 57
Epoch [1/1], Step [266/13804], Loss: 3.2190, Perplexity: 25.0036, time_taken_in_seconds: 57
Epoch [1/1], Step [267/13804], Loss: 3.6781, Perplexity: 39.5698, time_taken_in_seconds: 58
Epoch [1/1], Step [268/13804], Loss: 3.2075, Perplexity: 24.7179, time_taken_in_seconds: 59
Epoch [1/1], Step [269/13804], Loss: 3.5022, Perplexity: 33.1873, time_taken_in_seconds: 60
Epoch [1/1], Step [270/13804], Loss: 3.1752, Perplexity: 23.9313, time_taken_in_seconds: 61
Epoch [1/1], Step [271/13804], Loss: 3.4375, Perplexity: 31.1099, time_taken_in_seconds: 62
Epoch [1/1], Step [272/13804], Loss: 3.9002, Perplexity: 49.4107, time_taken_in_seconds: 63
Epoch [1/1], Step [273/13804], Loss: 3.3228, Perplexity: 27.7379, time_taken_in_seconds: 64
Epoch [1/1], Step [274/13804], Loss: 3.2037, Perplexity: 24.6240, time_taken_in_seconds: 64
Epoch [1/1], Step [275/13804], Loss: 3.1209, Perplexity: 22.6678, time_taken_in_seconds: 65
Epoch [1/1], Step [276/13804], Loss: 3.5447, Perplexity: 34.6278, time_taken_in_seconds: 66
Epoch [1/1], Step [277/13804], Loss: 3.4605, Perplexity: 31.8320, time_taken_in_seconds: 67
Epoch [1/1], Step [278/13804], Loss: 3.3108, Perplexity: 27.4058, time_taken_in_seconds: 68
Epoch [1/1], Step [279/13804], Loss: 3.0019, Perplexity: 20.1247, time_taken_in_seconds: 69
Epoch [1/1], Step [280/13804], Loss: 3.6222, Perplexity: 37.4188, time_taken_in_seconds: 70
Epoch [1/1], Step [281/13804], Loss: 3.2233, Perplexity: 25.1107, time_taken_in_seconds: 71
Epoch [1/1], Step [282/13804], Loss: 3.3679, Perplexity: 29.0174, time_taken_in_seconds: 72
Epoch [1/1], Step [283/13804], Loss: 3.0889, Perplexity: 21.9537, time_taken_in_seconds: 72
Epoch [1/1], Step [284/13804], Loss: 3.1093, Perplexity: 22.4064, time_taken_in_seconds: 73
Epoch [1/1], Step [285/13804], Loss: 3.0290, Perplexity: 20.6767, time_taken_in_seconds: 74
Epoch [1/1], Step [286/13804], Loss: 3.0620, Perplexity: 21.3705, time_taken_in_seconds: 75
Epoch [1/1], Step [287/13804], Loss: 3.5570, Perplexity: 35.0573, time_taken_in_seconds: 76
Epoch [1/1], Step [288/13804], Loss: 3.6515, Perplexity: 38.5325, time_taken_in_seconds: 77
Epoch [1/1], Step [289/13804], Loss: 3.1556, Perplexity: 23.4682, time_taken_in_seconds: 78
Epoch [1/1], Step [290/13804], Loss: 3.1496, Perplexity: 23.3265, time_taken_in_seconds: 79
Epoch [1/1], Step [291/13804], Loss: 3.6933, Perplexity: 40.1791, time_taken_in_seconds: 79
Epoch [1/1], Step [292/13804], Loss: 3.5591, Perplexity: 35.1312, time_taken_in_seconds: 80
Epoch [1/1], Step [293/13804], Loss: 3.0355, Perplexity: 20.8124, time_taken_in_seconds: 81
Epoch [1/1], Step [294/13804], Loss: 3.6509, Perplexity: 38.5107, time_taken_in_seconds: 82
Epoch [1/1], Step [295/13804], Loss: 3.6798, Perplexity: 39.6384, time_taken_in_seconds: 83
Epoch [1/1], Step [296/13804], Loss: 3.5329, Perplexity: 34.2216, time_taken_in_seconds: 84
Epoch [1/1], Step [297/13804], Loss: 3.3632, Perplexity: 28.8824, time_taken_in_seconds: 85
Epoch [1/1], Step [298/13804], Loss: 3.3142, Perplexity: 27.5004, time_taken_in_seconds: 86
Epoch [1/1], Step [299/13804], Loss: 3.8410, Perplexity: 46.5734, time_taken_in_seconds: 86
Epoch [1/1], Step [300/13804], Loss: 3.1643, Perplexity: 23.6713, time_taken_in_seconds: 87
Epoch [1/1], Step [301/13804], Loss: 3.5232, Perplexity: 33.8920, time_taken_in_seconds: 0
Epoch [1/1], Step [302/13804], Loss: 3.1361, Perplexity: 23.0140, time_taken_in_seconds: 1
Epoch [1/1], Step [303/13804], Loss: 3.3361, Perplexity: 28.1105, time_taken_in_seconds: 2
Epoch [1/1], Step [304/13804], Loss: 3.1876, Perplexity: 24.2312, time_taken_in_seconds: 3
Epoch [1/1], Step [305/13804], Loss: 3.3749, Perplexity: 29.2224, time_taken_in_seconds: 4
Epoch [1/1], Step [306/13804], Loss: 3.6587, Perplexity: 38.8097, time_taken_in_seconds: 5
Epoch [1/1], Step [307/13804], Loss: 3.4129, Perplexity: 30.3542, time_taken_in_seconds: 6
Epoch [1/1], Step [308/13804], Loss: 3.5315, Perplexity: 34.1760, time_taken_in_seconds: 6
Epoch [1/1], Step [309/13804], Loss: 3.4448, Perplexity: 31.3363, time_taken_in_seconds: 7
Epoch [1/1], Step [310/13804], Loss: 3.2800, Perplexity: 26.5766, time_taken_in_seconds: 8
Epoch [1/1], Step [311/13804], Loss: 3.2272, Perplexity: 25.2097, time_taken_in_seconds: 9
Epoch [1/1], Step [312/13804], Loss: 4.1245, Perplexity: 61.8359, time_taken_in_seconds: 10
Epoch [1/1], Step [313/13804], Loss: 3.6028, Perplexity: 36.7026, time_taken_in_seconds: 11
Epoch [1/1], Step [314/13804], Loss: 3.2313, Perplexity: 25.3120, time_taken_in_seconds: 12
Epoch [1/1], Step [315/13804], Loss: 3.3641, Perplexity: 28.9077, time_taken_in_seconds: 12
Epoch [1/1], Step [316/13804], Loss: 3.3173, Perplexity: 27.5868, time_taken_in_seconds: 13
Epoch [1/1], Step [317/13804], Loss: 2.9926, Perplexity: 19.9377, time_taken_in_seconds: 14
Epoch [1/1], Step [318/13804], Loss: 3.1859, Perplexity: 24.1880, time_taken_in_seconds: 15
Epoch [1/1], Step [319/13804], Loss: 3.2809, Perplexity: 26.5999, time_taken_in_seconds: 16
Epoch [1/1], Step [320/13804], Loss: 3.3549, Perplexity: 28.6430, time_taken_in_seconds: 17
Epoch [1/1], Step [321/13804], Loss: 3.0412, Perplexity: 20.9312, time_taken_in_seconds: 18
Epoch [1/1], Step [322/13804], Loss: 3.6150, Perplexity: 37.1507, time_taken_in_seconds: 19
Epoch [1/1], Step [323/13804], Loss: 3.1993, Perplexity: 24.5152, time_taken_in_seconds: 19
Epoch [1/1], Step [324/13804], Loss: 4.1024, Perplexity: 60.4827, time_taken_in_seconds: 20
Epoch [1/1], Step [325/13804], Loss: 3.3668, Perplexity: 28.9868, time_taken_in_seconds: 21
Epoch [1/1], Step [326/13804], Loss: 3.7004, Perplexity: 40.4644, time_taken_in_seconds: 22
Epoch [1/1], Step [327/13804], Loss: 2.9378, Perplexity: 18.8742, time_taken_in_seconds: 23
Epoch [1/1], Step [328/13804], Loss: 3.2056, Perplexity: 24.6695, time_taken_in_seconds: 24
Epoch [1/1], Step [329/13804], Loss: 3.3913, Perplexity: 29.7043, time_taken_in_seconds: 25
Epoch [1/1], Step [330/13804], Loss: 3.6587, Perplexity: 38.8111, time_taken_in_seconds: 25
Epoch [1/1], Step [331/13804], Loss: 3.1668, Perplexity: 23.7321, time_taken_in_seconds: 26
Epoch [1/1], Step [332/13804], Loss: 3.3984, Perplexity: 29.9158, time_taken_in_seconds: 27
Epoch [1/1], Step [333/13804], Loss: 3.3557, Perplexity: 28.6662, time_taken_in_seconds: 28
Epoch [1/1], Step [334/13804], Loss: 3.2882, Perplexity: 26.7952, time_taken_in_seconds: 29
Epoch [1/1], Step [335/13804], Loss: 3.5367, Perplexity: 34.3541, time_taken_in_seconds: 30
Epoch [1/1], Step [336/13804], Loss: 3.2174, Perplexity: 24.9631, time_taken_in_seconds: 31
Epoch [1/1], Step [337/13804], Loss: 3.2713, Perplexity: 26.3460, time_taken_in_seconds: 32
Epoch [1/1], Step [338/13804], Loss: 3.1426, Perplexity: 23.1629, time_taken_in_seconds: 32
Epoch [1/1], Step [339/13804], Loss: 3.2009, Perplexity: 24.5535, time_taken_in_seconds: 33
Epoch [1/1], Step [340/13804], Loss: 3.1680, Perplexity: 23.7589, time_taken_in_seconds: 34
Epoch [1/1], Step [341/13804], Loss: 3.8076, Perplexity: 45.0407, time_taken_in_seconds: 35
Epoch [1/1], Step [342/13804], Loss: 3.2980, Perplexity: 27.0572, time_taken_in_seconds: 36
Epoch [1/1], Step [343/13804], Loss: 3.1953, Perplexity: 24.4174, time_taken_in_seconds: 37
Epoch [1/1], Step [344/13804], Loss: 3.6146, Perplexity: 37.1381, time_taken_in_seconds: 38
Epoch [1/1], Step [345/13804], Loss: 3.4354, Perplexity: 31.0437, time_taken_in_seconds: 39
Epoch [1/1], Step [346/13804], Loss: 3.4850, Perplexity: 32.6234, time_taken_in_seconds: 40
Epoch [1/1], Step [347/13804], Loss: 2.9773, Perplexity: 19.6347, time_taken_in_seconds: 41
Epoch [1/1], Step [348/13804], Loss: 3.4085, Perplexity: 30.2187, time_taken_in_seconds: 41
Epoch [1/1], Step [349/13804], Loss: 3.3461, Perplexity: 28.3905, time_taken_in_seconds: 42
Epoch [1/1], Step [350/13804], Loss: 3.2044, Perplexity: 24.6414, time_taken_in_seconds: 43
Epoch [1/1], Step [351/13804], Loss: 4.0087, Perplexity: 55.0763, time_taken_in_seconds: 44
Epoch [1/1], Step [352/13804], Loss: 3.5809, Perplexity: 35.9059, time_taken_in_seconds: 45
Epoch [1/1], Step [353/13804], Loss: 3.3725, Perplexity: 29.1499, time_taken_in_seconds: 46
Epoch [1/1], Step [354/13804], Loss: 3.2847, Perplexity: 26.7013, time_taken_in_seconds: 47
Epoch [1/1], Step [355/13804], Loss: 3.7388, Perplexity: 42.0488, time_taken_in_seconds: 48
Epoch [1/1], Step [356/13804], Loss: 3.2348, Perplexity: 25.4020, time_taken_in_seconds: 48
Epoch [1/1], Step [357/13804], Loss: 3.5073, Perplexity: 33.3592, time_taken_in_seconds: 49
Epoch [1/1], Step [358/13804], Loss: 3.5864, Perplexity: 36.1052, time_taken_in_seconds: 50
Epoch [1/1], Step [359/13804], Loss: 3.8000, Perplexity: 44.7014, time_taken_in_seconds: 51
Epoch [1/1], Step [360/13804], Loss: 3.0503, Perplexity: 21.1224, time_taken_in_seconds: 52
Epoch [1/1], Step [361/13804], Loss: 3.6145, Perplexity: 37.1335, time_taken_in_seconds: 53
Epoch [1/1], Step [362/13804], Loss: 3.4329, Perplexity: 30.9667, time_taken_in_seconds: 54
Epoch [1/1], Step [363/13804], Loss: 3.5660, Perplexity: 35.3756, time_taken_in_seconds: 55
Epoch [1/1], Step [364/13804], Loss: 3.4248, Perplexity: 30.7171, time_taken_in_seconds: 55
Epoch [1/1], Step [365/13804], Loss: 4.3325, Perplexity: 76.1379, time_taken_in_seconds: 56
Epoch [1/1], Step [366/13804], Loss: 3.4472, Perplexity: 31.4122, time_taken_in_seconds: 57
Epoch [1/1], Step [367/13804], Loss: 3.4507, Perplexity: 31.5223, time_taken_in_seconds: 58
Epoch [1/1], Step [368/13804], Loss: 3.3005, Perplexity: 27.1258, time_taken_in_seconds: 59
Epoch [1/1], Step [369/13804], Loss: 3.2376, Perplexity: 25.4721, time_taken_in_seconds: 60
Epoch [1/1], Step [370/13804], Loss: 3.6662, Perplexity: 39.1022, time_taken_in_seconds: 61
Epoch [1/1], Step [371/13804], Loss: 3.2532, Perplexity: 25.8739, time_taken_in_seconds: 61
Epoch [1/1], Step [372/13804], Loss: 3.3873, Perplexity: 29.5852, time_taken_in_seconds: 62
Epoch [1/1], Step [373/13804], Loss: 3.3423, Perplexity: 28.2848, time_taken_in_seconds: 63
Epoch [1/1], Step [374/13804], Loss: 3.0956, Perplexity: 22.1010, time_taken_in_seconds: 64
Epoch [1/1], Step [375/13804], Loss: 3.1887, Perplexity: 24.2563, time_taken_in_seconds: 65
Epoch [1/1], Step [376/13804], Loss: 3.7467, Perplexity: 42.3819, time_taken_in_seconds: 66
Epoch [1/1], Step [377/13804], Loss: 3.3089, Perplexity: 27.3558, time_taken_in_seconds: 67
Epoch [1/1], Step [378/13804], Loss: 3.0116, Perplexity: 20.3199, time_taken_in_seconds: 68
Epoch [1/1], Step [379/13804], Loss: 3.3363, Perplexity: 28.1147, time_taken_in_seconds: 68
Epoch [1/1], Step [380/13804], Loss: 3.6946, Perplexity: 40.2308, time_taken_in_seconds: 69
Epoch [1/1], Step [381/13804], Loss: 3.2952, Perplexity: 26.9839, time_taken_in_seconds: 70
Epoch [1/1], Step [382/13804], Loss: 3.5193, Perplexity: 33.7595, time_taken_in_seconds: 71
Epoch [1/1], Step [383/13804], Loss: 4.3260, Perplexity: 75.6442, time_taken_in_seconds: 72
Epoch [1/1], Step [384/13804], Loss: 3.3704, Perplexity: 29.0899, time_taken_in_seconds: 73
Epoch [1/1], Step [385/13804], Loss: 3.5162, Perplexity: 33.6557, time_taken_in_seconds: 74
Epoch [1/1], Step [386/13804], Loss: 3.4537, Perplexity: 31.6182, time_taken_in_seconds: 75
Epoch [1/1], Step [387/13804], Loss: 3.3187, Perplexity: 27.6238, time_taken_in_seconds: 75
Epoch [1/1], Step [388/13804], Loss: 3.1022, Perplexity: 22.2478, time_taken_in_seconds: 76
Epoch [1/1], Step [389/13804], Loss: 3.1242, Perplexity: 22.7416, time_taken_in_seconds: 77
Epoch [1/1], Step [390/13804], Loss: 3.4323, Perplexity: 30.9470, time_taken_in_seconds: 78
Epoch [1/1], Step [391/13804], Loss: 3.7773, Perplexity: 43.6996, time_taken_in_seconds: 79
Epoch [1/1], Step [392/13804], Loss: 3.0266, Perplexity: 20.6280, time_taken_in_seconds: 80
Epoch [1/1], Step [393/13804], Loss: 3.3535, Perplexity: 28.6040, time_taken_in_seconds: 81
Epoch [1/1], Step [394/13804], Loss: 3.3342, Perplexity: 28.0571, time_taken_in_seconds: 82
Epoch [1/1], Step [395/13804], Loss: 3.0753, Perplexity: 21.6563, time_taken_in_seconds: 82
Epoch [1/1], Step [396/13804], Loss: 3.4730, Perplexity: 32.2332, time_taken_in_seconds: 83
Epoch [1/1], Step [397/13804], Loss: 3.3156, Perplexity: 27.5393, time_taken_in_seconds: 84
Epoch [1/1], Step [398/13804], Loss: 3.3745, Perplexity: 29.2106, time_taken_in_seconds: 85
Epoch [1/1], Step [399/13804], Loss: 4.1142, Perplexity: 61.2028, time_taken_in_seconds: 86
Epoch [1/1], Step [400/13804], Loss: 3.3633, Perplexity: 28.8843, time_taken_in_seconds: 87
Epoch [1/1], Step [401/13804], Loss: 3.0239, Perplexity: 20.5724, time_taken_in_seconds: 0
Epoch [1/1], Step [402/13804], Loss: 3.5260, Perplexity: 33.9885, time_taken_in_seconds: 1
Epoch [1/1], Step [403/13804], Loss: 3.1851, Perplexity: 24.1698, time_taken_in_seconds: 2
Epoch [1/1], Step [404/13804], Loss: 3.2287, Perplexity: 25.2468, time_taken_in_seconds: 3
Epoch [1/1], Step [405/13804], Loss: 3.4322, Perplexity: 30.9434, time_taken_in_seconds: 4
Epoch [1/1], Step [406/13804], Loss: 3.6345, Perplexity: 37.8830, time_taken_in_seconds: 5
Epoch [1/1], Step [407/13804], Loss: 3.1549, Perplexity: 23.4499, time_taken_in_seconds: 6
Epoch [1/1], Step [408/13804], Loss: 4.5367, Perplexity: 93.3850, time_taken_in_seconds: 6
Epoch [1/1], Step [409/13804], Loss: 3.4531, Perplexity: 31.5986, time_taken_in_seconds: 7
Epoch [1/1], Step [410/13804], Loss: 3.3459, Perplexity: 28.3859, time_taken_in_seconds: 8
Epoch [1/1], Step [411/13804], Loss: 3.0316, Perplexity: 20.7304, time_taken_in_seconds: 9
Epoch [1/1], Step [412/13804], Loss: 3.5452, Perplexity: 34.6459, time_taken_in_seconds: 10
Epoch [1/1], Step [413/13804], Loss: 3.1578, Perplexity: 23.5182, time_taken_in_seconds: 11
Epoch [1/1], Step [414/13804], Loss: 3.3878, Perplexity: 29.6001, time_taken_in_seconds: 12
Epoch [1/1], Step [415/13804], Loss: 3.1395, Perplexity: 23.0932, time_taken_in_seconds: 13
Epoch [1/1], Step [416/13804], Loss: 2.9477, Perplexity: 19.0613, time_taken_in_seconds: 14
Epoch [1/1], Step [417/13804], Loss: 3.1581, Perplexity: 23.5263, time_taken_in_seconds: 14
Epoch [1/1], Step [418/13804], Loss: 3.1315, Perplexity: 22.9086, time_taken_in_seconds: 15
Epoch [1/1], Step [419/13804], Loss: 3.2840, Perplexity: 26.6828, time_taken_in_seconds: 16
Epoch [1/1], Step [420/13804], Loss: 3.7414, Perplexity: 42.1587, time_taken_in_seconds: 17
Epoch [1/1], Step [421/13804], Loss: 3.5633, Perplexity: 35.2811, time_taken_in_seconds: 18
Epoch [1/1], Step [422/13804], Loss: 3.1549, Perplexity: 23.4507, time_taken_in_seconds: 19
Epoch [1/1], Step [423/13804], Loss: 3.1778, Perplexity: 23.9944, time_taken_in_seconds: 20
Epoch [1/1], Step [424/13804], Loss: 3.4237, Perplexity: 30.6828, time_taken_in_seconds: 20
Epoch [1/1], Step [425/13804], Loss: 3.3355, Perplexity: 28.0918, time_taken_in_seconds: 21
Epoch [1/1], Step [426/13804], Loss: 3.0410, Perplexity: 20.9260, time_taken_in_seconds: 22
Epoch [1/1], Step [427/13804], Loss: 3.2242, Perplexity: 25.1338, time_taken_in_seconds: 23
Epoch [1/1], Step [428/13804], Loss: 3.2624, Perplexity: 26.1119, time_taken_in_seconds: 24
Epoch [1/1], Step [429/13804], Loss: 3.4858, Perplexity: 32.6470, time_taken_in_seconds: 25
Epoch [1/1], Step [430/13804], Loss: 3.2997, Perplexity: 27.1057, time_taken_in_seconds: 26
Epoch [1/1], Step [431/13804], Loss: 3.3780, Perplexity: 29.3108, time_taken_in_seconds: 27
Epoch [1/1], Step [432/13804], Loss: 3.3862, Perplexity: 29.5543, time_taken_in_seconds: 27
Epoch [1/1], Step [433/13804], Loss: 3.8326, Perplexity: 46.1832, time_taken_in_seconds: 28
Epoch [1/1], Step [434/13804], Loss: 3.4001, Perplexity: 29.9681, time_taken_in_seconds: 29
Epoch [1/1], Step [435/13804], Loss: 3.2784, Perplexity: 26.5341, time_taken_in_seconds: 30
Epoch [1/1], Step [436/13804], Loss: 3.5538, Perplexity: 34.9441, time_taken_in_seconds: 31
Epoch [1/1], Step [437/13804], Loss: 3.0452, Perplexity: 21.0146, time_taken_in_seconds: 32
Epoch [1/1], Step [438/13804], Loss: 3.2586, Perplexity: 26.0125, time_taken_in_seconds: 33
Epoch [1/1], Step [439/13804], Loss: 3.4232, Perplexity: 30.6683, time_taken_in_seconds: 33
Epoch [1/1], Step [440/13804], Loss: 3.3622, Perplexity: 28.8522, time_taken_in_seconds: 34
Epoch [1/1], Step [441/13804], Loss: 3.2871, Perplexity: 26.7664, time_taken_in_seconds: 35
Epoch [1/1], Step [442/13804], Loss: 3.3896, Perplexity: 29.6550, time_taken_in_seconds: 36
Epoch [1/1], Step [443/13804], Loss: 2.9559, Perplexity: 19.2194, time_taken_in_seconds: 37
Epoch [1/1], Step [444/13804], Loss: 3.3293, Perplexity: 27.9194, time_taken_in_seconds: 38
Epoch [1/1], Step [445/13804], Loss: 3.3669, Perplexity: 28.9886, time_taken_in_seconds: 39
Epoch [1/1], Step [446/13804], Loss: 3.6356, Perplexity: 37.9250, time_taken_in_seconds: 39
Epoch [1/1], Step [447/13804], Loss: 3.1748, Perplexity: 23.9232, time_taken_in_seconds: 40
Epoch [1/1], Step [448/13804], Loss: 3.3584, Perplexity: 28.7429, time_taken_in_seconds: 41
Epoch [1/1], Step [449/13804], Loss: 3.3494, Perplexity: 28.4871, time_taken_in_seconds: 42
Epoch [1/1], Step [450/13804], Loss: 3.2794, Perplexity: 26.5600, time_taken_in_seconds: 43
Epoch [1/1], Step [451/13804], Loss: 3.0906, Perplexity: 21.9912, time_taken_in_seconds: 44
Epoch [1/1], Step [452/13804], Loss: 3.3187, Perplexity: 27.6239, time_taken_in_seconds: 45
Epoch [1/1], Step [453/13804], Loss: 3.3737, Perplexity: 29.1866, time_taken_in_seconds: 46
Epoch [1/1], Step [454/13804], Loss: 3.3052, Perplexity: 27.2530, time_taken_in_seconds: 46
Epoch [1/1], Step [455/13804], Loss: 3.3397, Perplexity: 28.2095, time_taken_in_seconds: 47
Epoch [1/1], Step [456/13804], Loss: 3.0184, Perplexity: 20.4592, time_taken_in_seconds: 48
Epoch [1/1], Step [457/13804], Loss: 3.1650, Perplexity: 23.6884, time_taken_in_seconds: 49
Epoch [1/1], Step [458/13804], Loss: 3.8649, Perplexity: 47.6975, time_taken_in_seconds: 50
Epoch [1/1], Step [459/13804], Loss: 3.1924, Perplexity: 24.3460, time_taken_in_seconds: 51
Epoch [1/1], Step [460/13804], Loss: 3.1990, Perplexity: 24.5088, time_taken_in_seconds: 52
Epoch [1/1], Step [461/13804], Loss: 3.4480, Perplexity: 31.4382, time_taken_in_seconds: 53
Epoch [1/1], Step [462/13804], Loss: 3.2241, Perplexity: 25.1301, time_taken_in_seconds: 53
Epoch [1/1], Step [463/13804], Loss: 3.6361, Perplexity: 37.9438, time_taken_in_seconds: 54
Epoch [1/1], Step [464/13804], Loss: 3.9581, Perplexity: 52.3584, time_taken_in_seconds: 55
Epoch [1/1], Step [465/13804], Loss: 2.9440, Perplexity: 18.9916, time_taken_in_seconds: 56
Epoch [1/1], Step [466/13804], Loss: 3.3087, Perplexity: 27.3491, time_taken_in_seconds: 57
Epoch [1/1], Step [467/13804], Loss: 3.5275, Perplexity: 34.0371, time_taken_in_seconds: 58
Epoch [1/1], Step [468/13804], Loss: 3.2398, Perplexity: 25.5274, time_taken_in_seconds: 59
Epoch [1/1], Step [469/13804], Loss: 3.3912, Perplexity: 29.7011, time_taken_in_seconds: 59
Epoch [1/1], Step [470/13804], Loss: 3.9435, Perplexity: 51.6007, time_taken_in_seconds: 60
Epoch [1/1], Step [471/13804], Loss: 3.3684, Perplexity: 29.0310, time_taken_in_seconds: 61
Epoch [1/1], Step [472/13804], Loss: 2.8988, Perplexity: 18.1528, time_taken_in_seconds: 62
Epoch [1/1], Step [473/13804], Loss: 2.9809, Perplexity: 19.7049, time_taken_in_seconds: 63
Epoch [1/1], Step [474/13804], Loss: 3.5716, Perplexity: 35.5735, time_taken_in_seconds: 64
Epoch [1/1], Step [475/13804], Loss: 3.5448, Perplexity: 34.6329, time_taken_in_seconds: 65
Epoch [1/1], Step [476/13804], Loss: 3.6277, Perplexity: 37.6252, time_taken_in_seconds: 66
Epoch [1/1], Step [477/13804], Loss: 3.2329, Perplexity: 25.3530, time_taken_in_seconds: 66
Epoch [1/1], Step [478/13804], Loss: 3.2276, Perplexity: 25.2188, time_taken_in_seconds: 67
Epoch [1/1], Step [479/13804], Loss: 3.2568, Perplexity: 25.9656, time_taken_in_seconds: 68
Epoch [1/1], Step [480/13804], Loss: 3.2744, Perplexity: 26.4282, time_taken_in_seconds: 69
Epoch [1/1], Step [481/13804], Loss: 2.9362, Perplexity: 18.8436, time_taken_in_seconds: 70
Epoch [1/1], Step [482/13804], Loss: 3.0554, Perplexity: 21.2294, time_taken_in_seconds: 71
Epoch [1/1], Step [483/13804], Loss: 3.5562, Perplexity: 35.0309, time_taken_in_seconds: 72
Epoch [1/1], Step [484/13804], Loss: 3.5498, Perplexity: 34.8079, time_taken_in_seconds: 73
Epoch [1/1], Step [485/13804], Loss: 3.3635, Perplexity: 28.8895, time_taken_in_seconds: 74
Epoch [1/1], Step [486/13804], Loss: 3.1720, Perplexity: 23.8541, time_taken_in_seconds: 74
Epoch [1/1], Step [487/13804], Loss: 3.3179, Perplexity: 27.6020, time_taken_in_seconds: 75
Epoch [1/1], Step [488/13804], Loss: 3.1287, Perplexity: 22.8446, time_taken_in_seconds: 76
Epoch [1/1], Step [489/13804], Loss: 3.2384, Perplexity: 25.4935, time_taken_in_seconds: 77
Epoch [1/1], Step [490/13804], Loss: 2.7808, Perplexity: 16.1312, time_taken_in_seconds: 78
Epoch [1/1], Step [491/13804], Loss: 3.3997, Perplexity: 29.9537, time_taken_in_seconds: 79
Epoch [1/1], Step [492/13804], Loss: 3.5192, Perplexity: 33.7585, time_taken_in_seconds: 80
Epoch [1/1], Step [493/13804], Loss: 3.5100, Perplexity: 33.4487, time_taken_in_seconds: 80
Epoch [1/1], Step [494/13804], Loss: 3.2196, Perplexity: 25.0174, time_taken_in_seconds: 81
Epoch [1/1], Step [495/13804], Loss: 3.8101, Perplexity: 45.1542, time_taken_in_seconds: 82
Epoch [1/1], Step [496/13804], Loss: 3.4444, Perplexity: 31.3230, time_taken_in_seconds: 83
Epoch [1/1], Step [497/13804], Loss: 3.1920, Perplexity: 24.3370, time_taken_in_seconds: 84
Epoch [1/1], Step [498/13804], Loss: 3.2742, Perplexity: 26.4229, time_taken_in_seconds: 85
Epoch [1/1], Step [499/13804], Loss: 3.3727, Perplexity: 29.1571, time_taken_in_seconds: 86
Epoch [1/1], Step [500/13804], Loss: 3.6796, Perplexity: 39.6301, time_taken_in_seconds: 87
Epoch [1/1], Step [501/13804], Loss: 3.2745, Perplexity: 26.4289, time_taken_in_seconds: 0
Epoch [1/1], Step [502/13804], Loss: 3.3157, Perplexity: 27.5430, time_taken_in_seconds: 1
Epoch [1/1], Step [503/13804], Loss: 3.6242, Perplexity: 37.4934, time_taken_in_seconds: 2
Epoch [1/1], Step [504/13804], Loss: 3.1297, Perplexity: 22.8668, time_taken_in_seconds: 3
Epoch [1/1], Step [505/13804], Loss: 3.0655, Perplexity: 21.4455, time_taken_in_seconds: 4
Epoch [1/1], Step [506/13804], Loss: 3.2531, Perplexity: 25.8709, time_taken_in_seconds: 5
Epoch [1/1], Step [507/13804], Loss: 3.2166, Perplexity: 24.9430, time_taken_in_seconds: 6
Epoch [1/1], Step [508/13804], Loss: 3.3002, Perplexity: 27.1176, time_taken_in_seconds: 6
Epoch [1/1], Step [509/13804], Loss: 3.3582, Perplexity: 28.7376, time_taken_in_seconds: 7
Epoch [1/1], Step [510/13804], Loss: 3.1917, Perplexity: 24.3286, time_taken_in_seconds: 8
Epoch [1/1], Step [511/13804], Loss: 3.0551, Perplexity: 21.2235, time_taken_in_seconds: 9
Epoch [1/1], Step [512/13804], Loss: 3.5276, Perplexity: 34.0435, time_taken_in_seconds: 10
Epoch [1/1], Step [513/13804], Loss: 3.1836, Perplexity: 24.1328, time_taken_in_seconds: 11
Epoch [1/1], Step [514/13804], Loss: 3.2056, Perplexity: 24.6691, time_taken_in_seconds: 12
Epoch [1/1], Step [515/13804], Loss: 3.3068, Perplexity: 27.2971, time_taken_in_seconds: 13
Epoch [1/1], Step [516/13804], Loss: 3.2224, Perplexity: 25.0892, time_taken_in_seconds: 13
Epoch [1/1], Step [517/13804], Loss: 3.5263, Perplexity: 33.9969, time_taken_in_seconds: 14
Epoch [1/1], Step [518/13804], Loss: 3.1832, Perplexity: 24.1239, time_taken_in_seconds: 15
Epoch [1/1], Step [519/13804], Loss: 3.2975, Perplexity: 27.0459, time_taken_in_seconds: 16
Epoch [1/1], Step [520/13804], Loss: 3.2828, Perplexity: 26.6511, time_taken_in_seconds: 17
Epoch [1/1], Step [521/13804], Loss: 3.2071, Perplexity: 24.7083, time_taken_in_seconds: 18
Epoch [1/1], Step [522/13804], Loss: 3.4783, Perplexity: 32.4042, time_taken_in_seconds: 19
Epoch [1/1], Step [523/13804], Loss: 3.4582, Perplexity: 31.7606, time_taken_in_seconds: 19
Epoch [1/1], Step [524/13804], Loss: 3.3913, Perplexity: 29.7056, time_taken_in_seconds: 20
Epoch [1/1], Step [525/13804], Loss: 3.2414, Perplexity: 25.5697, time_taken_in_seconds: 21
Epoch [1/1], Step [526/13804], Loss: 3.3316, Perplexity: 27.9824, time_taken_in_seconds: 22
Epoch [1/1], Step [527/13804], Loss: 2.7858, Perplexity: 16.2125, time_taken_in_seconds: 23
Epoch [1/1], Step [528/13804], Loss: 2.9911, Perplexity: 19.9072, time_taken_in_seconds: 24
Epoch [1/1], Step [529/13804], Loss: 3.2476, Perplexity: 25.7280, time_taken_in_seconds: 25
Epoch [1/1], Step [530/13804], Loss: 3.1169, Perplexity: 22.5771, time_taken_in_seconds: 26
Epoch [1/1], Step [531/13804], Loss: 3.3555, Perplexity: 28.6598, time_taken_in_seconds: 26
Epoch [1/1], Step [532/13804], Loss: 3.4769, Perplexity: 32.3596, time_taken_in_seconds: 27
Epoch [1/1], Step [533/13804], Loss: 3.2159, Perplexity: 24.9258, time_taken_in_seconds: 28
Epoch [1/1], Step [534/13804], Loss: 3.0743, Perplexity: 21.6356, time_taken_in_seconds: 29
Epoch [1/1], Step [535/13804], Loss: 3.1245, Perplexity: 22.7488, time_taken_in_seconds: 30
Epoch [1/1], Step [536/13804], Loss: 3.5889, Perplexity: 36.1926, time_taken_in_seconds: 31
Epoch [1/1], Step [537/13804], Loss: 3.1848, Perplexity: 24.1621, time_taken_in_seconds: 32
Epoch [1/1], Step [538/13804], Loss: 3.2459, Perplexity: 25.6836, time_taken_in_seconds: 33
Epoch [1/1], Step [539/13804], Loss: 3.0652, Perplexity: 21.4397, time_taken_in_seconds: 33
Epoch [1/1], Step [540/13804], Loss: 3.1424, Perplexity: 23.1597, time_taken_in_seconds: 34
Epoch [1/1], Step [541/13804], Loss: 3.1626, Perplexity: 23.6330, time_taken_in_seconds: 35
Epoch [1/1], Step [542/13804], Loss: 2.9502, Perplexity: 19.1106, time_taken_in_seconds: 36
Epoch [1/1], Step [543/13804], Loss: 2.9981, Perplexity: 20.0477, time_taken_in_seconds: 37
Epoch [1/1], Step [544/13804], Loss: 3.1242, Perplexity: 22.7418, time_taken_in_seconds: 38
Epoch [1/1], Step [545/13804], Loss: 3.7993, Perplexity: 44.6712, time_taken_in_seconds: 39
Epoch [1/1], Step [546/13804], Loss: 4.6703, Perplexity: 106.7336, time_taken_in_seconds: 39
Epoch [1/1], Step [547/13804], Loss: 3.2624, Perplexity: 26.1129, time_taken_in_seconds: 40
Epoch [1/1], Step [548/13804], Loss: 3.3054, Perplexity: 27.2595, time_taken_in_seconds: 41
Epoch [1/1], Step [549/13804], Loss: 3.3243, Perplexity: 27.7809, time_taken_in_seconds: 42
Epoch [1/1], Step [550/13804], Loss: 2.8776, Perplexity: 17.7719, time_taken_in_seconds: 43
Epoch [1/1], Step [551/13804], Loss: 3.3082, Perplexity: 27.3364, time_taken_in_seconds: 44
Epoch [1/1], Step [552/13804], Loss: 2.9688, Perplexity: 19.4679, time_taken_in_seconds: 45
Epoch [1/1], Step [553/13804], Loss: 3.0985, Perplexity: 22.1650, time_taken_in_seconds: 46
Epoch [1/1], Step [554/13804], Loss: 3.0105, Perplexity: 20.2968, time_taken_in_seconds: 47
Epoch [1/1], Step [555/13804], Loss: 3.0002, Perplexity: 20.0889, time_taken_in_seconds: 47
Epoch [1/1], Step [556/13804], Loss: 3.2879, Perplexity: 26.7852, time_taken_in_seconds: 48
Epoch [1/1], Step [557/13804], Loss: 3.4599, Perplexity: 31.8123, time_taken_in_seconds: 49
Epoch [1/1], Step [558/13804], Loss: 3.3346, Perplexity: 28.0667, time_taken_in_seconds: 50
Epoch [1/1], Step [559/13804], Loss: 2.7936, Perplexity: 16.3393, time_taken_in_seconds: 51
Epoch [1/1], Step [560/13804], Loss: 3.2940, Perplexity: 26.9506, time_taken_in_seconds: 52
Epoch [1/1], Step [561/13804], Loss: 3.1153, Perplexity: 22.5410, time_taken_in_seconds: 53
Epoch [1/1], Step [562/13804], Loss: 3.5267, Perplexity: 34.0100, time_taken_in_seconds: 53
Epoch [1/1], Step [563/13804], Loss: 3.2250, Perplexity: 25.1534, time_taken_in_seconds: 54
Epoch [1/1], Step [564/13804], Loss: 3.5594, Perplexity: 35.1426, time_taken_in_seconds: 55
Epoch [1/1], Step [565/13804], Loss: 3.5398, Perplexity: 34.4594, time_taken_in_seconds: 56
Epoch [1/1], Step [566/13804], Loss: 3.2442, Perplexity: 25.6412, time_taken_in_seconds: 57
Epoch [1/1], Step [567/13804], Loss: 3.8991, Perplexity: 49.3561, time_taken_in_seconds: 58
Epoch [1/1], Step [568/13804], Loss: 3.4652, Perplexity: 31.9820, time_taken_in_seconds: 59
Epoch [1/1], Step [569/13804], Loss: 3.1647, Perplexity: 23.6813, time_taken_in_seconds: 59
Epoch [1/1], Step [570/13804], Loss: 3.1191, Perplexity: 22.6251, time_taken_in_seconds: 60
Epoch [1/1], Step [571/13804], Loss: 3.5472, Perplexity: 34.7161, time_taken_in_seconds: 61
Epoch [1/1], Step [572/13804], Loss: 3.0532, Perplexity: 21.1832, time_taken_in_seconds: 62
Epoch [1/1], Step [573/13804], Loss: 3.1310, Perplexity: 22.8975, time_taken_in_seconds: 63
Epoch [1/1], Step [574/13804], Loss: 2.8691, Perplexity: 17.6213, time_taken_in_seconds: 64
Epoch [1/1], Step [575/13804], Loss: 3.0144, Perplexity: 20.3776, time_taken_in_seconds: 65
Epoch [1/1], Step [576/13804], Loss: 3.9876, Perplexity: 53.9274, time_taken_in_seconds: 66
Epoch [1/1], Step [577/13804], Loss: 2.8518, Perplexity: 17.3192, time_taken_in_seconds: 66
Epoch [1/1], Step [578/13804], Loss: 3.2030, Perplexity: 24.6052, time_taken_in_seconds: 67
Epoch [1/1], Step [579/13804], Loss: 2.7425, Perplexity: 15.5252, time_taken_in_seconds: 68
Epoch [1/1], Step [580/13804], Loss: 3.5917, Perplexity: 36.2962, time_taken_in_seconds: 69
Epoch [1/1], Step [581/13804], Loss: 3.5194, Perplexity: 33.7646, time_taken_in_seconds: 70
Epoch [1/1], Step [582/13804], Loss: 2.8009, Perplexity: 16.4594, time_taken_in_seconds: 71
Epoch [1/1], Step [583/13804], Loss: 3.2274, Perplexity: 25.2141, time_taken_in_seconds: 72
Epoch [1/1], Step [584/13804], Loss: 3.0434, Perplexity: 20.9773, time_taken_in_seconds: 73
Epoch [1/1], Step [585/13804], Loss: 3.7488, Perplexity: 42.4692, time_taken_in_seconds: 73
Epoch [1/1], Step [586/13804], Loss: 3.5762, Perplexity: 35.7373, time_taken_in_seconds: 74
Epoch [1/1], Step [587/13804], Loss: 3.2577, Perplexity: 25.9907, time_taken_in_seconds: 75
Epoch [1/1], Step [588/13804], Loss: 3.1083, Perplexity: 22.3838, time_taken_in_seconds: 76
Epoch [1/1], Step [589/13804], Loss: 3.3132, Perplexity: 27.4733, time_taken_in_seconds: 77
Epoch [1/1], Step [590/13804], Loss: 3.2112, Perplexity: 24.8087, time_taken_in_seconds: 78
Epoch [1/1], Step [591/13804], Loss: 3.7034, Perplexity: 40.5856, time_taken_in_seconds: 79
Epoch [1/1], Step [592/13804], Loss: 3.4558, Perplexity: 31.6832, time_taken_in_seconds: 80
Epoch [1/1], Step [593/13804], Loss: 3.5594, Perplexity: 35.1413, time_taken_in_seconds: 80
Epoch [1/1], Step [594/13804], Loss: 3.1794, Perplexity: 24.0325, time_taken_in_seconds: 81
Epoch [1/1], Step [595/13804], Loss: 3.1884, Perplexity: 24.2502, time_taken_in_seconds: 82
Epoch [1/1], Step [596/13804], Loss: 3.7099, Perplexity: 40.8482, time_taken_in_seconds: 83
Epoch [1/1], Step [597/13804], Loss: 3.2467, Perplexity: 25.7064, time_taken_in_seconds: 84
Epoch [1/1], Step [598/13804], Loss: 3.2127, Perplexity: 24.8452, time_taken_in_seconds: 85
Epoch [1/1], Step [599/13804], Loss: 2.9439, Perplexity: 18.9904, time_taken_in_seconds: 86
Epoch [1/1], Step [600/13804], Loss: 3.6539, Perplexity: 38.6251, time_taken_in_seconds: 86
Epoch [1/1], Step [601/13804], Loss: 3.2121, Perplexity: 24.8323, time_taken_in_seconds: 0
Epoch [1/1], Step [602/13804], Loss: 3.1646, Perplexity: 23.6803, time_taken_in_seconds: 1
Epoch [1/1], Step [603/13804], Loss: 3.2453, Perplexity: 25.6704, time_taken_in_seconds: 2
Epoch [1/1], Step [604/13804], Loss: 2.9105, Perplexity: 18.3655, time_taken_in_seconds: 3
Epoch [1/1], Step [605/13804], Loss: 2.9801, Perplexity: 19.6890, time_taken_in_seconds: 4
Epoch [1/1], Step [606/13804], Loss: 3.5926, Perplexity: 36.3274, time_taken_in_seconds: 5
Epoch [1/1], Step [607/13804], Loss: 3.1213, Perplexity: 22.6748, time_taken_in_seconds: 6
Epoch [1/1], Step [608/13804], Loss: 3.1354, Perplexity: 22.9984, time_taken_in_seconds: 6
Epoch [1/1], Step [609/13804], Loss: 3.3654, Perplexity: 28.9462, time_taken_in_seconds: 7
Epoch [1/1], Step [610/13804], Loss: 2.8381, Perplexity: 17.0828, time_taken_in_seconds: 8
Epoch [1/1], Step [611/13804], Loss: 3.2028, Perplexity: 24.6013, time_taken_in_seconds: 9
Epoch [1/1], Step [612/13804], Loss: 2.9419, Perplexity: 18.9517, time_taken_in_seconds: 10
Epoch [1/1], Step [613/13804], Loss: 3.1780, Perplexity: 23.9979, time_taken_in_seconds: 11
Epoch [1/1], Step [614/13804], Loss: 2.9873, Perplexity: 19.8311, time_taken_in_seconds: 12
Epoch [1/1], Step [615/13804], Loss: 3.2513, Perplexity: 25.8238, time_taken_in_seconds: 13
Epoch [1/1], Step [616/13804], Loss: 3.2916, Perplexity: 26.8867, time_taken_in_seconds: 13
Epoch [1/1], Step [617/13804], Loss: 3.2802, Perplexity: 26.5818, time_taken_in_seconds: 14
Epoch [1/1], Step [618/13804], Loss: 3.2213, Perplexity: 25.0601, time_taken_in_seconds: 15
Epoch [1/1], Step [619/13804], Loss: 3.2488, Perplexity: 25.7594, time_taken_in_seconds: 16
Epoch [1/1], Step [620/13804], Loss: 3.5111, Perplexity: 33.4858, time_taken_in_seconds: 17
Epoch [1/1], Step [621/13804], Loss: 3.1151, Perplexity: 22.5362, time_taken_in_seconds: 18
Epoch [1/1], Step [622/13804], Loss: 3.1407, Perplexity: 23.1191, time_taken_in_seconds: 19
Epoch [1/1], Step [623/13804], Loss: 3.0361, Perplexity: 20.8247, time_taken_in_seconds: 20
Epoch [1/1], Step [624/13804], Loss: 3.1421, Perplexity: 23.1518, time_taken_in_seconds: 21
Epoch [1/1], Step [625/13804], Loss: 3.3353, Perplexity: 28.0861, time_taken_in_seconds: 21
Epoch [1/1], Step [626/13804], Loss: 2.8779, Perplexity: 17.7764, time_taken_in_seconds: 22
Epoch [1/1], Step [627/13804], Loss: 3.1864, Perplexity: 24.2017, time_taken_in_seconds: 23
Epoch [1/1], Step [628/13804], Loss: 3.2610, Perplexity: 26.0750, time_taken_in_seconds: 24
Epoch [1/1], Step [629/13804], Loss: 2.8501, Perplexity: 17.2902, time_taken_in_seconds: 25
Epoch [1/1], Step [630/13804], Loss: 3.4645, Perplexity: 31.9601, time_taken_in_seconds: 26
Epoch [1/1], Step [631/13804], Loss: 3.2133, Perplexity: 24.8599, time_taken_in_seconds: 27
Epoch [1/1], Step [632/13804], Loss: 3.0016, Perplexity: 20.1180, time_taken_in_seconds: 27
Epoch [1/1], Step [633/13804], Loss: 3.3404, Perplexity: 28.2308, time_taken_in_seconds: 28
Epoch [1/1], Step [634/13804], Loss: 3.4228, Perplexity: 30.6539, time_taken_in_seconds: 29
Epoch [1/1], Step [635/13804], Loss: 3.3201, Perplexity: 27.6632, time_taken_in_seconds: 30
Epoch [1/1], Step [636/13804], Loss: 3.4658, Perplexity: 32.0035, time_taken_in_seconds: 31
Epoch [1/1], Step [637/13804], Loss: 3.3454, Perplexity: 28.3706, time_taken_in_seconds: 32
Epoch [1/1], Step [638/13804], Loss: 3.0386, Perplexity: 20.8758, time_taken_in_seconds: 33
Epoch [1/1], Step [639/13804], Loss: 3.2857, Perplexity: 26.7269, time_taken_in_seconds: 34
Epoch [1/1], Step [640/13804], Loss: 3.1169, Perplexity: 22.5773, time_taken_in_seconds: 34
Epoch [1/1], Step [641/13804], Loss: 3.2575, Perplexity: 25.9840, time_taken_in_seconds: 35
Epoch [1/1], Step [642/13804], Loss: 3.1695, Perplexity: 23.7956, time_taken_in_seconds: 36
Epoch [1/1], Step [643/13804], Loss: 3.2684, Perplexity: 26.2698, time_taken_in_seconds: 37
Epoch [1/1], Step [644/13804], Loss: 3.5056, Perplexity: 33.3003, time_taken_in_seconds: 38
Epoch [1/1], Step [645/13804], Loss: 2.8938, Perplexity: 18.0624, time_taken_in_seconds: 39
Epoch [1/1], Step [646/13804], Loss: 3.0239, Perplexity: 20.5721, time_taken_in_seconds: 40
Epoch [1/1], Step [647/13804], Loss: 3.0361, Perplexity: 20.8243, time_taken_in_seconds: 40
Epoch [1/1], Step [648/13804], Loss: 3.0038, Perplexity: 20.1616, time_taken_in_seconds: 41
Epoch [1/1], Step [649/13804], Loss: 3.1149, Perplexity: 22.5307, time_taken_in_seconds: 42
Epoch [1/1], Step [650/13804], Loss: 3.5571, Perplexity: 35.0625, time_taken_in_seconds: 43
Epoch [1/1], Step [651/13804], Loss: 3.0795, Perplexity: 21.7465, time_taken_in_seconds: 44
Epoch [1/1], Step [652/13804], Loss: 2.9167, Perplexity: 18.4807, time_taken_in_seconds: 45
Epoch [1/1], Step [653/13804], Loss: 3.1568, Perplexity: 23.4962, time_taken_in_seconds: 46
Epoch [1/1], Step [654/13804], Loss: 2.9400, Perplexity: 18.9164, time_taken_in_seconds: 47
Epoch [1/1], Step [655/13804], Loss: 2.9015, Perplexity: 18.2015, time_taken_in_seconds: 47
Epoch [1/1], Step [656/13804], Loss: 3.0191, Perplexity: 20.4731, time_taken_in_seconds: 48
Epoch [1/1], Step [657/13804], Loss: 3.0760, Perplexity: 21.6719, time_taken_in_seconds: 49
Epoch [1/1], Step [658/13804], Loss: 3.2490, Perplexity: 25.7656, time_taken_in_seconds: 50
Epoch [1/1], Step [659/13804], Loss: 3.2331, Perplexity: 25.3577, time_taken_in_seconds: 51
Epoch [1/1], Step [660/13804], Loss: 3.6852, Perplexity: 39.8524, time_taken_in_seconds: 52
Epoch [1/1], Step [661/13804], Loss: 3.1586, Perplexity: 23.5380, time_taken_in_seconds: 52
Epoch [1/1], Step [662/13804], Loss: 3.1447, Perplexity: 23.2131, time_taken_in_seconds: 53
Epoch [1/1], Step [663/13804], Loss: 3.0282, Perplexity: 20.6603, time_taken_in_seconds: 54
Epoch [1/1], Step [664/13804], Loss: 3.2617, Perplexity: 26.0950, time_taken_in_seconds: 55
Epoch [1/1], Step [665/13804], Loss: 2.8197, Perplexity: 16.7713, time_taken_in_seconds: 56
Epoch [1/1], Step [666/13804], Loss: 3.2249, Perplexity: 25.1501, time_taken_in_seconds: 57
Epoch [1/1], Step [667/13804], Loss: 3.2601, Perplexity: 26.0515, time_taken_in_seconds: 58
Epoch [1/1], Step [668/13804], Loss: 3.2250, Perplexity: 25.1545, time_taken_in_seconds: 59
Epoch [1/1], Step [669/13804], Loss: 3.0953, Perplexity: 22.0940, time_taken_in_seconds: 59
Epoch [1/1], Step [670/13804], Loss: 2.9640, Perplexity: 19.3750, time_taken_in_seconds: 60
Epoch [1/1], Step [671/13804], Loss: 3.7415, Perplexity: 42.1628, time_taken_in_seconds: 61
Epoch [1/1], Step [672/13804], Loss: 2.9336, Perplexity: 18.7943, time_taken_in_seconds: 62
Epoch [1/1], Step [673/13804], Loss: 3.0008, Perplexity: 20.1017, time_taken_in_seconds: 63
Epoch [1/1], Step [674/13804], Loss: 3.3161, Perplexity: 27.5531, time_taken_in_seconds: 64
Epoch [1/1], Step [675/13804], Loss: 3.6058, Perplexity: 36.8118, time_taken_in_seconds: 65
Epoch [1/1], Step [676/13804], Loss: 3.3173, Perplexity: 27.5855, time_taken_in_seconds: 65
Epoch [1/1], Step [677/13804], Loss: 3.3571, Perplexity: 28.7070, time_taken_in_seconds: 66
Epoch [1/1], Step [678/13804], Loss: 3.2711, Perplexity: 26.3403, time_taken_in_seconds: 67
Epoch [1/1], Step [679/13804], Loss: 3.3236, Perplexity: 27.7607, time_taken_in_seconds: 68
Epoch [1/1], Step [680/13804], Loss: 3.1649, Perplexity: 23.6871, time_taken_in_seconds: 69
Epoch [1/1], Step [681/13804], Loss: 3.1940, Perplexity: 24.3849, time_taken_in_seconds: 70
Epoch [1/1], Step [682/13804], Loss: 3.6235, Perplexity: 37.4685, time_taken_in_seconds: 71
Epoch [1/1], Step [683/13804], Loss: 3.2484, Perplexity: 25.7500, time_taken_in_seconds: 71
Epoch [1/1], Step [684/13804], Loss: 3.3218, Perplexity: 27.7097, time_taken_in_seconds: 72
Epoch [1/1], Step [685/13804], Loss: 3.3096, Perplexity: 27.3749, time_taken_in_seconds: 73
Epoch [1/1], Step [686/13804], Loss: 3.2923, Perplexity: 26.9049, time_taken_in_seconds: 74
Epoch [1/1], Step [687/13804], Loss: 2.9503, Perplexity: 19.1111, time_taken_in_seconds: 75
Epoch [1/1], Step [688/13804], Loss: 3.2010, Perplexity: 24.5583, time_taken_in_seconds: 76
Epoch [1/1], Step [689/13804], Loss: 2.9630, Perplexity: 19.3564, time_taken_in_seconds: 77
Epoch [1/1], Step [690/13804], Loss: 3.3715, Perplexity: 29.1229, time_taken_in_seconds: 77
Epoch [1/1], Step [691/13804], Loss: 2.9586, Perplexity: 19.2708, time_taken_in_seconds: 78
Epoch [1/1], Step [692/13804], Loss: 4.4330, Perplexity: 84.1834, time_taken_in_seconds: 79
Epoch [1/1], Step [693/13804], Loss: 4.0070, Perplexity: 54.9831, time_taken_in_seconds: 80
Epoch [1/1], Step [694/13804], Loss: 3.0062, Perplexity: 20.2110, time_taken_in_seconds: 81
Epoch [1/1], Step [695/13804], Loss: 3.0856, Perplexity: 21.8797, time_taken_in_seconds: 82
Epoch [1/1], Step [696/13804], Loss: 3.3980, Perplexity: 29.9054, time_taken_in_seconds: 83
Epoch [1/1], Step [697/13804], Loss: 3.2680, Perplexity: 26.2590, time_taken_in_seconds: 84
Epoch [1/1], Step [698/13804], Loss: 3.6488, Perplexity: 38.4282, time_taken_in_seconds: 85
Epoch [1/1], Step [699/13804], Loss: 3.0371, Perplexity: 20.8440, time_taken_in_seconds: 85
Epoch [1/1], Step [700/13804], Loss: 3.0500, Perplexity: 21.1144, time_taken_in_seconds: 86
Epoch [1/1], Step [701/13804], Loss: 3.4024, Perplexity: 30.0357, time_taken_in_seconds: 0
Epoch [1/1], Step [702/13804], Loss: 3.2973, Perplexity: 27.0398, time_taken_in_seconds: 1
Epoch [1/1], Step [703/13804], Loss: 3.3807, Perplexity: 29.3919, time_taken_in_seconds: 2
Epoch [1/1], Step [704/13804], Loss: 3.2413, Perplexity: 25.5671, time_taken_in_seconds: 3
Epoch [1/1], Step [705/13804], Loss: 2.8597, Perplexity: 17.4570, time_taken_in_seconds: 4
Epoch [1/1], Step [706/13804], Loss: 2.9765, Perplexity: 19.6197, time_taken_in_seconds: 5
Epoch [1/1], Step [707/13804], Loss: 2.9975, Perplexity: 20.0360, time_taken_in_seconds: 6
Epoch [1/1], Step [708/13804], Loss: 3.6850, Perplexity: 39.8469, time_taken_in_seconds: 6
Epoch [1/1], Step [709/13804], Loss: 3.4258, Perplexity: 30.7470, time_taken_in_seconds: 7
Epoch [1/1], Step [710/13804], Loss: 2.8079, Perplexity: 16.5759, time_taken_in_seconds: 8
Epoch [1/1], Step [711/13804], Loss: 3.2800, Perplexity: 26.5753, time_taken_in_seconds: 9
Epoch [1/1], Step [712/13804], Loss: 3.0266, Perplexity: 20.6275, time_taken_in_seconds: 10
Epoch [1/1], Step [713/13804], Loss: 3.3312, Perplexity: 27.9710, time_taken_in_seconds: 11
Epoch [1/1], Step [714/13804], Loss: 2.9651, Perplexity: 19.3964, time_taken_in_seconds: 12
Epoch [1/1], Step [715/13804], Loss: 3.4691, Perplexity: 32.1081, time_taken_in_seconds: 12
Epoch [1/1], Step [716/13804], Loss: 3.1954, Perplexity: 24.4211, time_taken_in_seconds: 13
Epoch [1/1], Step [717/13804], Loss: 3.1748, Perplexity: 23.9218, time_taken_in_seconds: 14
Epoch [1/1], Step [718/13804], Loss: 3.0648, Perplexity: 21.4293, time_taken_in_seconds: 15
Epoch [1/1], Step [719/13804], Loss: 3.4001, Perplexity: 29.9685, time_taken_in_seconds: 16
Epoch [1/1], Step [720/13804], Loss: 3.0871, Perplexity: 21.9134, time_taken_in_seconds: 17
Epoch [1/1], Step [721/13804], Loss: 3.7817, Perplexity: 43.8911, time_taken_in_seconds: 18
Epoch [1/1], Step [722/13804], Loss: 3.1755, Perplexity: 23.9389, time_taken_in_seconds: 18
Epoch [1/1], Step [723/13804], Loss: 3.4216, Perplexity: 30.6197, time_taken_in_seconds: 19
Epoch [1/1], Step [724/13804], Loss: 2.9729, Perplexity: 19.5485, time_taken_in_seconds: 20
Epoch [1/1], Step [725/13804], Loss: 3.0631, Perplexity: 21.3934, time_taken_in_seconds: 21
Epoch [1/1], Step [726/13804], Loss: 3.1884, Perplexity: 24.2499, time_taken_in_seconds: 22
Epoch [1/1], Step [727/13804], Loss: 3.2188, Perplexity: 24.9989, time_taken_in_seconds: 23
Epoch [1/1], Step [728/13804], Loss: 3.0322, Perplexity: 20.7421, time_taken_in_seconds: 24
Epoch [1/1], Step [729/13804], Loss: 3.2305, Perplexity: 25.2918, time_taken_in_seconds: 24
Epoch [1/1], Step [730/13804], Loss: 2.5881, Perplexity: 13.3044, time_taken_in_seconds: 25
Epoch [1/1], Step [731/13804], Loss: 4.0647, Perplexity: 58.2489, time_taken_in_seconds: 26
Epoch [1/1], Step [732/13804], Loss: 3.2828, Perplexity: 26.6508, time_taken_in_seconds: 27
Epoch [1/1], Step [733/13804], Loss: 3.7692, Perplexity: 43.3435, time_taken_in_seconds: 28
Epoch [1/1], Step [734/13804], Loss: 2.9049, Perplexity: 18.2631, time_taken_in_seconds: 29
Epoch [1/1], Step [735/13804], Loss: 3.0225, Perplexity: 20.5425, time_taken_in_seconds: 30
Epoch [1/1], Step [736/13804], Loss: 3.2390, Perplexity: 25.5074, time_taken_in_seconds: 30
Epoch [1/1], Step [737/13804], Loss: 2.8297, Perplexity: 16.9401, time_taken_in_seconds: 31
Epoch [1/1], Step [738/13804], Loss: 3.0996, Perplexity: 22.1886, time_taken_in_seconds: 32
Epoch [1/1], Step [739/13804], Loss: 3.0067, Perplexity: 20.2214, time_taken_in_seconds: 33
Epoch [1/1], Step [740/13804], Loss: 3.1798, Perplexity: 24.0415, time_taken_in_seconds: 34
Epoch [1/1], Step [741/13804], Loss: 3.1937, Perplexity: 24.3787, time_taken_in_seconds: 35
Epoch [1/1], Step [742/13804], Loss: 3.3265, Perplexity: 27.8410, time_taken_in_seconds: 36
Epoch [1/1], Step [743/13804], Loss: 2.7575, Perplexity: 15.7602, time_taken_in_seconds: 36
Epoch [1/1], Step [744/13804], Loss: 3.3031, Perplexity: 27.1960, time_taken_in_seconds: 37
Epoch [1/1], Step [745/13804], Loss: 3.3201, Perplexity: 27.6631, time_taken_in_seconds: 38
Epoch [1/1], Step [746/13804], Loss: 3.0008, Perplexity: 20.1025, time_taken_in_seconds: 39
Epoch [1/1], Step [747/13804], Loss: 3.2779, Perplexity: 26.5207, time_taken_in_seconds: 40
Epoch [1/1], Step [748/13804], Loss: 2.7215, Perplexity: 15.2036, time_taken_in_seconds: 41
Epoch [1/1], Step [749/13804], Loss: 3.0922, Perplexity: 22.0260, time_taken_in_seconds: 42
Epoch [1/1], Step [750/13804], Loss: 2.9849, Perplexity: 19.7845, time_taken_in_seconds: 43
Epoch [1/1], Step [751/13804], Loss: 3.1183, Perplexity: 22.6080, time_taken_in_seconds: 43
Epoch [1/1], Step [752/13804], Loss: 3.2440, Perplexity: 25.6369, time_taken_in_seconds: 44
Epoch [1/1], Step [753/13804], Loss: 3.0623, Perplexity: 21.3765, time_taken_in_seconds: 45
Epoch [1/1], Step [754/13804], Loss: 2.9396, Perplexity: 18.9075, time_taken_in_seconds: 46
Epoch [1/1], Step [755/13804], Loss: 3.3527, Perplexity: 28.5785, time_taken_in_seconds: 47
Epoch [1/1], Step [756/13804], Loss: 2.7519, Perplexity: 15.6718, time_taken_in_seconds: 48
Epoch [1/1], Step [757/13804], Loss: 3.0676, Perplexity: 21.4909, time_taken_in_seconds: 49
Epoch [1/1], Step [758/13804], Loss: 3.3553, Perplexity: 28.6547, time_taken_in_seconds: 49
Epoch [1/1], Step [759/13804], Loss: 2.9912, Perplexity: 19.9100, time_taken_in_seconds: 50
Epoch [1/1], Step [760/13804], Loss: 2.9880, Perplexity: 19.8453, time_taken_in_seconds: 51
Epoch [1/1], Step [761/13804], Loss: 3.2748, Perplexity: 26.4387, time_taken_in_seconds: 52
Epoch [1/1], Step [762/13804], Loss: 3.1240, Perplexity: 22.7375, time_taken_in_seconds: 53
Epoch [1/1], Step [763/13804], Loss: 3.3091, Perplexity: 27.3614, time_taken_in_seconds: 54
Epoch [1/1], Step [764/13804], Loss: 2.9490, Perplexity: 19.0870, time_taken_in_seconds: 55
Epoch [1/1], Step [765/13804], Loss: 3.1402, Perplexity: 23.1096, time_taken_in_seconds: 56
Epoch [1/1], Step [766/13804], Loss: 3.2310, Perplexity: 25.3043, time_taken_in_seconds: 56
Epoch [1/1], Step [767/13804], Loss: 3.2224, Perplexity: 25.0894, time_taken_in_seconds: 57
Epoch [1/1], Step [768/13804], Loss: 3.3152, Perplexity: 27.5278, time_taken_in_seconds: 58
Epoch [1/1], Step [769/13804], Loss: 2.9697, Perplexity: 19.4865, time_taken_in_seconds: 59
Epoch [1/1], Step [770/13804], Loss: 3.0337, Perplexity: 20.7732, time_taken_in_seconds: 60
Epoch [1/1], Step [771/13804], Loss: 3.2055, Perplexity: 24.6666, time_taken_in_seconds: 61
Epoch [1/1], Step [772/13804], Loss: 3.3530, Perplexity: 28.5879, time_taken_in_seconds: 62
Epoch [1/1], Step [773/13804], Loss: 2.9455, Perplexity: 19.0194, time_taken_in_seconds: 62
Epoch [1/1], Step [774/13804], Loss: 2.7186, Perplexity: 15.1591, time_taken_in_seconds: 63
Epoch [1/1], Step [775/13804], Loss: 2.8258, Perplexity: 16.8750, time_taken_in_seconds: 64
Epoch [1/1], Step [776/13804], Loss: 4.2719, Perplexity: 71.6599, time_taken_in_seconds: 65
Epoch [1/1], Step [777/13804], Loss: 2.8919, Perplexity: 18.0267, time_taken_in_seconds: 66
Epoch [1/1], Step [778/13804], Loss: 4.1445, Perplexity: 63.0874, time_taken_in_seconds: 67
Epoch [1/1], Step [779/13804], Loss: 3.1375, Perplexity: 23.0463, time_taken_in_seconds: 68
Epoch [1/1], Step [780/13804], Loss: 3.1481, Perplexity: 23.2918, time_taken_in_seconds: 69
Epoch [1/1], Step [781/13804], Loss: 3.2863, Perplexity: 26.7432, time_taken_in_seconds: 69
Epoch [1/1], Step [782/13804], Loss: 2.9192, Perplexity: 18.5257, time_taken_in_seconds: 70
Epoch [1/1], Step [783/13804], Loss: 3.0660, Perplexity: 21.4560, time_taken_in_seconds: 71
Epoch [1/1], Step [784/13804], Loss: 2.8402, Perplexity: 17.1191, time_taken_in_seconds: 72
Epoch [1/1], Step [785/13804], Loss: 2.9712, Perplexity: 19.5157, time_taken_in_seconds: 73
Epoch [1/1], Step [786/13804], Loss: 3.3537, Perplexity: 28.6074, time_taken_in_seconds: 74
Epoch [1/1], Step [787/13804], Loss: 3.0772, Perplexity: 21.6979, time_taken_in_seconds: 75
Epoch [1/1], Step [788/13804], Loss: 2.9612, Perplexity: 19.3216, time_taken_in_seconds: 75
Epoch [1/1], Step [789/13804], Loss: 3.3069, Perplexity: 27.3010, time_taken_in_seconds: 76
Epoch [1/1], Step [790/13804], Loss: 2.9162, Perplexity: 18.4708, time_taken_in_seconds: 77
Epoch [1/1], Step [791/13804], Loss: 3.0018, Perplexity: 20.1222, time_taken_in_seconds: 78
Epoch [1/1], Step [792/13804], Loss: 3.1647, Perplexity: 23.6811, time_taken_in_seconds: 79
Epoch [1/1], Step [793/13804], Loss: 2.7709, Perplexity: 15.9735, time_taken_in_seconds: 80
Epoch [1/1], Step [794/13804], Loss: 3.4068, Perplexity: 30.1699, time_taken_in_seconds: 81
Epoch [1/1], Step [795/13804], Loss: 3.4083, Perplexity: 30.2133, time_taken_in_seconds: 81
Epoch [1/1], Step [796/13804], Loss: 2.9570, Perplexity: 19.2402, time_taken_in_seconds: 82
Epoch [1/1], Step [797/13804], Loss: 3.4388, Perplexity: 31.1483, time_taken_in_seconds: 83
Epoch [1/1], Step [798/13804], Loss: 3.3301, Perplexity: 27.9408, time_taken_in_seconds: 84
Epoch [1/1], Step [799/13804], Loss: 2.9895, Perplexity: 19.8749, time_taken_in_seconds: 85
Epoch [1/1], Step [800/13804], Loss: 2.9774, Perplexity: 19.6362, time_taken_in_seconds: 86
Epoch [1/1], Step [801/13804], Loss: 3.3643, Perplexity: 28.9147, time_taken_in_seconds: 0
Epoch [1/1], Step [802/13804], Loss: 3.2162, Perplexity: 24.9335, time_taken_in_seconds: 1
Epoch [1/1], Step [803/13804], Loss: 3.7099, Perplexity: 40.8496, time_taken_in_seconds: 2
Epoch [1/1], Step [804/13804], Loss: 3.1433, Perplexity: 23.1806, time_taken_in_seconds: 3
Epoch [1/1], Step [805/13804], Loss: 3.1286, Perplexity: 22.8429, time_taken_in_seconds: 4
Epoch [1/1], Step [806/13804], Loss: 3.2782, Perplexity: 26.5267, time_taken_in_seconds: 5
Epoch [1/1], Step [807/13804], Loss: 3.0982, Perplexity: 22.1587, time_taken_in_seconds: 5
Epoch [1/1], Step [808/13804], Loss: 3.5109, Perplexity: 33.4770, time_taken_in_seconds: 6
Epoch [1/1], Step [809/13804], Loss: 3.6682, Perplexity: 39.1795, time_taken_in_seconds: 7
Epoch [1/1], Step [810/13804], Loss: 3.3130, Perplexity: 27.4665, time_taken_in_seconds: 8
Epoch [1/1], Step [811/13804], Loss: 2.9269, Perplexity: 18.6695, time_taken_in_seconds: 9
Epoch [1/1], Step [812/13804], Loss: 3.5945, Perplexity: 36.3978, time_taken_in_seconds: 10
Epoch [1/1], Step [813/13804], Loss: 3.0798, Perplexity: 21.7539, time_taken_in_seconds: 11
Epoch [1/1], Step [814/13804], Loss: 2.9782, Perplexity: 19.6515, time_taken_in_seconds: 12
Epoch [1/1], Step [815/13804], Loss: 3.7496, Perplexity: 42.5055, time_taken_in_seconds: 12
Epoch [1/1], Step [816/13804], Loss: 3.1543, Perplexity: 23.4360, time_taken_in_seconds: 13
Epoch [1/1], Step [817/13804], Loss: 3.3337, Perplexity: 28.0417, time_taken_in_seconds: 14
Epoch [1/1], Step [818/13804], Loss: 3.4017, Perplexity: 30.0159, time_taken_in_seconds: 15
Epoch [1/1], Step [819/13804], Loss: 3.3919, Perplexity: 29.7236, time_taken_in_seconds: 16
Epoch [1/1], Step [820/13804], Loss: 3.1251, Perplexity: 22.7614, time_taken_in_seconds: 17
Epoch [1/1], Step [821/13804], Loss: 3.3257, Perplexity: 27.8179, time_taken_in_seconds: 18
Epoch [1/1], Step [822/13804], Loss: 2.8753, Perplexity: 17.7304, time_taken_in_seconds: 18
Epoch [1/1], Step [823/13804], Loss: 3.4686, Perplexity: 32.0919, time_taken_in_seconds: 19
Epoch [1/1], Step [824/13804], Loss: 3.2976, Perplexity: 27.0489, time_taken_in_seconds: 20
Epoch [1/1], Step [825/13804], Loss: 3.1029, Perplexity: 22.2616, time_taken_in_seconds: 21
Epoch [1/1], Step [826/13804], Loss: 2.9592, Perplexity: 19.2834, time_taken_in_seconds: 22
Epoch [1/1], Step [827/13804], Loss: 2.9911, Perplexity: 19.9071, time_taken_in_seconds: 23
Epoch [1/1], Step [828/13804], Loss: 3.2087, Perplexity: 24.7457, time_taken_in_seconds: 24
Epoch [1/1], Step [829/13804], Loss: 3.0757, Perplexity: 21.6641, time_taken_in_seconds: 24
Epoch [1/1], Step [830/13804], Loss: 3.3377, Perplexity: 28.1547, time_taken_in_seconds: 25
Epoch [1/1], Step [831/13804], Loss: 3.3917, Perplexity: 29.7165, time_taken_in_seconds: 26
Epoch [1/1], Step [832/13804], Loss: 3.3946, Perplexity: 29.8041, time_taken_in_seconds: 27
Epoch [1/1], Step [833/13804], Loss: 3.3341, Perplexity: 28.0518, time_taken_in_seconds: 28
Epoch [1/1], Step [834/13804], Loss: 2.8168, Perplexity: 16.7224, time_taken_in_seconds: 29
Epoch [1/1], Step [835/13804], Loss: 3.0797, Perplexity: 21.7517, time_taken_in_seconds: 30
Epoch [1/1], Step [836/13804], Loss: 3.0887, Perplexity: 21.9485, time_taken_in_seconds: 31
Epoch [1/1], Step [837/13804], Loss: 2.9245, Perplexity: 18.6255, time_taken_in_seconds: 32
Epoch [1/1], Step [838/13804], Loss: 2.9289, Perplexity: 18.7071, time_taken_in_seconds: 32
Epoch [1/1], Step [839/13804], Loss: 3.1815, Perplexity: 24.0830, time_taken_in_seconds: 33
Epoch [1/1], Step [840/13804], Loss: 3.4182, Perplexity: 30.5159, time_taken_in_seconds: 34
Epoch [1/1], Step [841/13804], Loss: 3.1055, Perplexity: 22.3198, time_taken_in_seconds: 35
Epoch [1/1], Step [842/13804], Loss: 3.1294, Perplexity: 22.8598, time_taken_in_seconds: 36
Epoch [1/1], Step [843/13804], Loss: 3.1546, Perplexity: 23.4437, time_taken_in_seconds: 37
Epoch [1/1], Step [844/13804], Loss: 2.8147, Perplexity: 16.6876, time_taken_in_seconds: 38
Epoch [1/1], Step [845/13804], Loss: 3.2252, Perplexity: 25.1589, time_taken_in_seconds: 38
Epoch [1/1], Step [846/13804], Loss: 3.4038, Perplexity: 30.0793, time_taken_in_seconds: 39
Epoch [1/1], Step [847/13804], Loss: 3.0829, Perplexity: 21.8227, time_taken_in_seconds: 40
Epoch [1/1], Step [848/13804], Loss: 3.0119, Perplexity: 20.3269, time_taken_in_seconds: 41
Epoch [1/1], Step [849/13804], Loss: 3.0663, Perplexity: 21.4626, time_taken_in_seconds: 42
Epoch [1/1], Step [850/13804], Loss: 3.1707, Perplexity: 23.8248, time_taken_in_seconds: 43
Epoch [1/1], Step [851/13804], Loss: 3.2613, Perplexity: 26.0833, time_taken_in_seconds: 44
Epoch [1/1], Step [852/13804], Loss: 3.1109, Perplexity: 22.4412, time_taken_in_seconds: 44
Epoch [1/1], Step [853/13804], Loss: 3.4844, Perplexity: 32.6030, time_taken_in_seconds: 45
Epoch [1/1], Step [854/13804], Loss: 3.3414, Perplexity: 28.2597, time_taken_in_seconds: 46
Epoch [1/1], Step [855/13804], Loss: 3.0851, Perplexity: 21.8695, time_taken_in_seconds: 47
Epoch [1/1], Step [856/13804], Loss: 3.2150, Perplexity: 24.9025, time_taken_in_seconds: 48
Epoch [1/1], Step [857/13804], Loss: 3.3931, Perplexity: 29.7567, time_taken_in_seconds: 49
Epoch [1/1], Step [858/13804], Loss: 2.9188, Perplexity: 18.5186, time_taken_in_seconds: 50
Epoch [1/1], Step [859/13804], Loss: 4.0179, Perplexity: 55.5845, time_taken_in_seconds: 51
Epoch [1/1], Step [860/13804], Loss: 3.5670, Perplexity: 35.4106, time_taken_in_seconds: 51
Epoch [1/1], Step [861/13804], Loss: 2.9034, Perplexity: 18.2361, time_taken_in_seconds: 52
Epoch [1/1], Step [862/13804], Loss: 3.1498, Perplexity: 23.3318, time_taken_in_seconds: 53
Epoch [1/1], Step [863/13804], Loss: 2.9874, Perplexity: 19.8350, time_taken_in_seconds: 54
Epoch [1/1], Step [864/13804], Loss: 2.8664, Perplexity: 17.5737, time_taken_in_seconds: 55
Epoch [1/1], Step [865/13804], Loss: 3.3085, Perplexity: 27.3439, time_taken_in_seconds: 56
Epoch [1/1], Step [866/13804], Loss: 2.9285, Perplexity: 18.6987, time_taken_in_seconds: 57
Epoch [1/1], Step [867/13804], Loss: 3.0948, Perplexity: 22.0835, time_taken_in_seconds: 57
Epoch [1/1], Step [868/13804], Loss: 3.5089, Perplexity: 33.4118, time_taken_in_seconds: 58
Epoch [1/1], Step [869/13804], Loss: 3.1322, Perplexity: 22.9252, time_taken_in_seconds: 59
Epoch [1/1], Step [870/13804], Loss: 2.8915, Perplexity: 18.0205, time_taken_in_seconds: 60
Epoch [1/1], Step [871/13804], Loss: 3.6605, Perplexity: 38.8790, time_taken_in_seconds: 61
Epoch [1/1], Step [872/13804], Loss: 3.8038, Perplexity: 44.8732, time_taken_in_seconds: 62
Epoch [1/1], Step [873/13804], Loss: 3.2457, Perplexity: 25.6806, time_taken_in_seconds: 62
Epoch [1/1], Step [874/13804], Loss: 3.2102, Perplexity: 24.7832, time_taken_in_seconds: 63
Epoch [1/1], Step [875/13804], Loss: 3.2638, Perplexity: 26.1487, time_taken_in_seconds: 64
Epoch [1/1], Step [876/13804], Loss: 3.0736, Perplexity: 21.6191, time_taken_in_seconds: 65
Epoch [1/1], Step [877/13804], Loss: 3.0113, Perplexity: 20.3134, time_taken_in_seconds: 66
Epoch [1/1], Step [878/13804], Loss: 3.0881, Perplexity: 21.9349, time_taken_in_seconds: 67
Epoch [1/1], Step [879/13804], Loss: 3.0628, Perplexity: 21.3864, time_taken_in_seconds: 68
Epoch [1/1], Step [880/13804], Loss: 2.9957, Perplexity: 19.9988, time_taken_in_seconds: 68
Epoch [1/1], Step [881/13804], Loss: 3.0871, Perplexity: 21.9132, time_taken_in_seconds: 69
Epoch [1/1], Step [882/13804], Loss: 3.1709, Perplexity: 23.8298, time_taken_in_seconds: 70
Epoch [1/1], Step [883/13804], Loss: 3.4091, Perplexity: 30.2370, time_taken_in_seconds: 71
Epoch [1/1], Step [884/13804], Loss: 3.2248, Perplexity: 25.1494, time_taken_in_seconds: 72
Epoch [1/1], Step [885/13804], Loss: 3.0952, Perplexity: 22.0927, time_taken_in_seconds: 73
Epoch [1/1], Step [886/13804], Loss: 3.0275, Perplexity: 20.6459, time_taken_in_seconds: 74
Epoch [1/1], Step [887/13804], Loss: 3.1340, Perplexity: 22.9645, time_taken_in_seconds: 74
Epoch [1/1], Step [888/13804], Loss: 3.1641, Perplexity: 23.6677, time_taken_in_seconds: 75
Epoch [1/1], Step [889/13804], Loss: 2.9502, Perplexity: 19.1088, time_taken_in_seconds: 76
Epoch [1/1], Step [890/13804], Loss: 3.2465, Perplexity: 25.6995, time_taken_in_seconds: 77
Epoch [1/1], Step [891/13804], Loss: 3.1786, Perplexity: 24.0126, time_taken_in_seconds: 78
Epoch [1/1], Step [892/13804], Loss: 2.7975, Perplexity: 16.4040, time_taken_in_seconds: 79
Epoch [1/1], Step [893/13804], Loss: 2.6503, Perplexity: 14.1587, time_taken_in_seconds: 80
Epoch [1/1], Step [894/13804], Loss: 2.7962, Perplexity: 16.3823, time_taken_in_seconds: 81
Epoch [1/1], Step [895/13804], Loss: 2.9644, Perplexity: 19.3825, time_taken_in_seconds: 81
Epoch [1/1], Step [896/13804], Loss: 3.2643, Perplexity: 26.1612, time_taken_in_seconds: 82
Epoch [1/1], Step [897/13804], Loss: 3.1175, Perplexity: 22.5909, time_taken_in_seconds: 83
Epoch [1/1], Step [898/13804], Loss: 2.9482, Perplexity: 19.0718, time_taken_in_seconds: 84
Epoch [1/1], Step [899/13804], Loss: 3.1859, Perplexity: 24.1891, time_taken_in_seconds: 85
Epoch [1/1], Step [900/13804], Loss: 3.2972, Perplexity: 27.0364, time_taken_in_seconds: 86
Epoch [1/1], Step [901/13804], Loss: 2.9147, Perplexity: 18.4424, time_taken_in_seconds: 0
Epoch [1/1], Step [902/13804], Loss: 3.5510, Perplexity: 34.8494, time_taken_in_seconds: 1
Epoch [1/1], Step [903/13804], Loss: 3.2704, Perplexity: 26.3231, time_taken_in_seconds: 2
Epoch [1/1], Step [904/13804], Loss: 3.2199, Perplexity: 25.0245, time_taken_in_seconds: 3
Epoch [1/1], Step [905/13804], Loss: 3.1422, Perplexity: 23.1537, time_taken_in_seconds: 4
Epoch [1/1], Step [906/13804], Loss: 3.3953, Perplexity: 29.8238, time_taken_in_seconds: 5
Epoch [1/1], Step [907/13804], Loss: 3.0566, Perplexity: 21.2550, time_taken_in_seconds: 6
Epoch [1/1], Step [908/13804], Loss: 2.6710, Perplexity: 14.4540, time_taken_in_seconds: 7
Epoch [1/1], Step [909/13804], Loss: 3.8547, Perplexity: 47.2125, time_taken_in_seconds: 7
Epoch [1/1], Step [910/13804], Loss: 2.8468, Perplexity: 17.2328, time_taken_in_seconds: 8
Epoch [1/1], Step [911/13804], Loss: 3.4740, Perplexity: 32.2644, time_taken_in_seconds: 9
Epoch [1/1], Step [912/13804], Loss: 2.9925, Perplexity: 19.9349, time_taken_in_seconds: 10
Epoch [1/1], Step [913/13804], Loss: 3.2867, Perplexity: 26.7539, time_taken_in_seconds: 11
Epoch [1/1], Step [914/13804], Loss: 3.0044, Perplexity: 20.1740, time_taken_in_seconds: 12
Epoch [1/1], Step [915/13804], Loss: 2.7214, Perplexity: 15.2015, time_taken_in_seconds: 13
Epoch [1/1], Step [916/13804], Loss: 2.9570, Perplexity: 19.2394, time_taken_in_seconds: 14
Epoch [1/1], Step [917/13804], Loss: 3.2829, Perplexity: 26.6529, time_taken_in_seconds: 14
Epoch [1/1], Step [918/13804], Loss: 3.1894, Perplexity: 24.2749, time_taken_in_seconds: 15
Epoch [1/1], Step [919/13804], Loss: 3.1021, Perplexity: 22.2457, time_taken_in_seconds: 16
Epoch [1/1], Step [920/13804], Loss: 3.3300, Perplexity: 27.9374, time_taken_in_seconds: 17
Epoch [1/1], Step [921/13804], Loss: 3.1677, Perplexity: 23.7527, time_taken_in_seconds: 18
Epoch [1/1], Step [922/13804], Loss: 3.6083, Perplexity: 36.9023, time_taken_in_seconds: 19
Epoch [1/1], Step [923/13804], Loss: 3.0947, Perplexity: 22.0797, time_taken_in_seconds: 19
Epoch [1/1], Step [924/13804], Loss: 3.0951, Perplexity: 22.0893, time_taken_in_seconds: 20
Epoch [1/1], Step [925/13804], Loss: 3.2116, Perplexity: 24.8176, time_taken_in_seconds: 21
Epoch [1/1], Step [926/13804], Loss: 3.4408, Perplexity: 31.2129, time_taken_in_seconds: 22
Epoch [1/1], Step [927/13804], Loss: 3.4185, Perplexity: 30.5236, time_taken_in_seconds: 23
Epoch [1/1], Step [928/13804], Loss: 2.8743, Perplexity: 17.7124, time_taken_in_seconds: 24
Epoch [1/1], Step [929/13804], Loss: 3.3477, Perplexity: 28.4372, time_taken_in_seconds: 25
Epoch [1/1], Step [930/13804], Loss: 3.1625, Perplexity: 23.6289, time_taken_in_seconds: 26
Epoch [1/1], Step [931/13804], Loss: 3.0386, Perplexity: 20.8765, time_taken_in_seconds: 26
Epoch [1/1], Step [932/13804], Loss: 2.8943, Perplexity: 18.0716, time_taken_in_seconds: 27
Epoch [1/1], Step [933/13804], Loss: 3.2430, Perplexity: 25.6107, time_taken_in_seconds: 28
Epoch [1/1], Step [934/13804], Loss: 3.4855, Perplexity: 32.6375, time_taken_in_seconds: 29
Epoch [1/1], Step [935/13804], Loss: 2.8265, Perplexity: 16.8867, time_taken_in_seconds: 30
Epoch [1/1], Step [936/13804], Loss: 2.8847, Perplexity: 17.8987, time_taken_in_seconds: 31
Epoch [1/1], Step [937/13804], Loss: 3.0762, Perplexity: 21.6754, time_taken_in_seconds: 32
Epoch [1/1], Step [938/13804], Loss: 3.5894, Perplexity: 36.2120, time_taken_in_seconds: 32
Epoch [1/1], Step [939/13804], Loss: 3.0536, Perplexity: 21.1924, time_taken_in_seconds: 33
Epoch [1/1], Step [940/13804], Loss: 3.3211, Perplexity: 27.6897, time_taken_in_seconds: 34
Epoch [1/1], Step [941/13804], Loss: 3.0594, Perplexity: 21.3143, time_taken_in_seconds: 35
Epoch [1/1], Step [942/13804], Loss: 3.2881, Perplexity: 26.7924, time_taken_in_seconds: 36
Epoch [1/1], Step [943/13804], Loss: 3.2403, Perplexity: 25.5420, time_taken_in_seconds: 37
Epoch [1/1], Step [944/13804], Loss: 2.7832, Perplexity: 16.1701, time_taken_in_seconds: 38
Epoch [1/1], Step [945/13804], Loss: 3.0618, Perplexity: 21.3670, time_taken_in_seconds: 38
Epoch [1/1], Step [946/13804], Loss: 3.1264, Perplexity: 22.7915, time_taken_in_seconds: 39
Epoch [1/1], Step [947/13804], Loss: 3.4415, Perplexity: 31.2329, time_taken_in_seconds: 40
Epoch [1/1], Step [948/13804], Loss: 3.8830, Perplexity: 48.5705, time_taken_in_seconds: 41
Epoch [1/1], Step [949/13804], Loss: 3.2475, Perplexity: 25.7269, time_taken_in_seconds: 42
Epoch [1/1], Step [950/13804], Loss: 3.4605, Perplexity: 31.8343, time_taken_in_seconds: 43
Epoch [1/1], Step [951/13804], Loss: 3.2402, Perplexity: 25.5376, time_taken_in_seconds: 44
Epoch [1/1], Step [952/13804], Loss: 3.1033, Perplexity: 22.2719, time_taken_in_seconds: 44
Epoch [1/1], Step [953/13804], Loss: 2.8041, Perplexity: 16.5119, time_taken_in_seconds: 45
Epoch [1/1], Step [954/13804], Loss: 3.1855, Perplexity: 24.1803, time_taken_in_seconds: 46
Epoch [1/1], Step [955/13804], Loss: 3.1272, Perplexity: 22.8101, time_taken_in_seconds: 47
Epoch [1/1], Step [956/13804], Loss: 3.2154, Perplexity: 24.9124, time_taken_in_seconds: 48
Epoch [1/1], Step [957/13804], Loss: 2.8936, Perplexity: 18.0580, time_taken_in_seconds: 49
Epoch [1/1], Step [958/13804], Loss: 3.2503, Perplexity: 25.7979, time_taken_in_seconds: 50
Epoch [1/1], Step [959/13804], Loss: 2.8931, Perplexity: 18.0495, time_taken_in_seconds: 50
Epoch [1/1], Step [960/13804], Loss: 2.7843, Perplexity: 16.1881, time_taken_in_seconds: 51
Epoch [1/1], Step [961/13804], Loss: 2.7613, Perplexity: 15.8207, time_taken_in_seconds: 52
Epoch [1/1], Step [962/13804], Loss: 2.9355, Perplexity: 18.8317, time_taken_in_seconds: 53
Epoch [1/1], Step [963/13804], Loss: 3.2419, Perplexity: 25.5816, time_taken_in_seconds: 54
Epoch [1/1], Step [964/13804], Loss: 3.0857, Perplexity: 21.8821, time_taken_in_seconds: 55
Epoch [1/1], Step [965/13804], Loss: 3.3262, Perplexity: 27.8331, time_taken_in_seconds: 56
Epoch [1/1], Step [966/13804], Loss: 2.9591, Perplexity: 19.2804, time_taken_in_seconds: 56
Epoch [1/1], Step [967/13804], Loss: 3.1311, Perplexity: 22.8987, time_taken_in_seconds: 57
Epoch [1/1], Step [968/13804], Loss: 3.0588, Perplexity: 21.3028, time_taken_in_seconds: 58
Epoch [1/1], Step [969/13804], Loss: 3.5010, Perplexity: 33.1481, time_taken_in_seconds: 59
Epoch [1/1], Step [970/13804], Loss: 3.4393, Perplexity: 31.1639, time_taken_in_seconds: 60
Epoch [1/1], Step [971/13804], Loss: 2.9343, Perplexity: 18.8089, time_taken_in_seconds: 61
Epoch [1/1], Step [972/13804], Loss: 3.5302, Perplexity: 34.1323, time_taken_in_seconds: 62
Epoch [1/1], Step [973/13804], Loss: 3.3081, Perplexity: 27.3322, time_taken_in_seconds: 63
Epoch [1/1], Step [974/13804], Loss: 3.2542, Perplexity: 25.8997, time_taken_in_seconds: 63
Epoch [1/1], Step [975/13804], Loss: 3.0265, Perplexity: 20.6245, time_taken_in_seconds: 64
Epoch [1/1], Step [976/13804], Loss: 3.4781, Perplexity: 32.3987, time_taken_in_seconds: 65
Epoch [1/1], Step [977/13804], Loss: 2.9912, Perplexity: 19.9099, time_taken_in_seconds: 66
Epoch [1/1], Step [978/13804], Loss: 2.9053, Perplexity: 18.2705, time_taken_in_seconds: 67
Epoch [1/1], Step [979/13804], Loss: 2.9829, Perplexity: 19.7446, time_taken_in_seconds: 68
Epoch [1/1], Step [980/13804], Loss: 3.3947, Perplexity: 29.8048, time_taken_in_seconds: 69
Epoch [1/1], Step [981/13804], Loss: 3.2881, Perplexity: 26.7926, time_taken_in_seconds: 69
Epoch [1/1], Step [982/13804], Loss: 2.8528, Perplexity: 17.3354, time_taken_in_seconds: 70
Epoch [1/1], Step [983/13804], Loss: 3.2524, Perplexity: 25.8519, time_taken_in_seconds: 71
Epoch [1/1], Step [984/13804], Loss: 3.8015, Perplexity: 44.7689, time_taken_in_seconds: 72
Epoch [1/1], Step [985/13804], Loss: 3.0287, Perplexity: 20.6713, time_taken_in_seconds: 73
Epoch [1/1], Step [986/13804], Loss: 3.3804, Perplexity: 29.3814, time_taken_in_seconds: 74
Epoch [1/1], Step [987/13804], Loss: 3.1384, Perplexity: 23.0681, time_taken_in_seconds: 75
Epoch [1/1], Step [988/13804], Loss: 2.7189, Perplexity: 15.1638, time_taken_in_seconds: 75
Epoch [1/1], Step [989/13804], Loss: 3.8753, Perplexity: 48.1954, time_taken_in_seconds: 76
Epoch [1/1], Step [990/13804], Loss: 3.1871, Perplexity: 24.2171, time_taken_in_seconds: 77
Epoch [1/1], Step [991/13804], Loss: 3.1361, Perplexity: 23.0131, time_taken_in_seconds: 78
Epoch [1/1], Step [992/13804], Loss: 2.8532, Perplexity: 17.3431, time_taken_in_seconds: 79
Epoch [1/1], Step [993/13804], Loss: 3.2774, Perplexity: 26.5060, time_taken_in_seconds: 80
Epoch [1/1], Step [994/13804], Loss: 3.1445, Perplexity: 23.2088, time_taken_in_seconds: 81
Epoch [1/1], Step [995/13804], Loss: 3.0648, Perplexity: 21.4307, time_taken_in_seconds: 81
Epoch [1/1], Step [996/13804], Loss: 3.3082, Perplexity: 27.3356, time_taken_in_seconds: 82
Epoch [1/1], Step [997/13804], Loss: 2.8808, Perplexity: 17.8287, time_taken_in_seconds: 83
Epoch [1/1], Step [998/13804], Loss: 3.3900, Perplexity: 29.6666, time_taken_in_seconds: 84
Epoch [1/1], Step [999/13804], Loss: 2.7554, Perplexity: 15.7272, time_taken_in_seconds: 85
Epoch [1/1], Step [1000/13804], Loss: 2.7951, Perplexity: 16.3642, time_taken_in_seconds: 86
Epoch [1/1], Step [1001/13804], Loss: 2.8012, Perplexity: 16.4651, time_taken_in_seconds: 0
Epoch [1/1], Step [1002/13804], Loss: 3.6034, Perplexity: 36.7242, time_taken_in_seconds: 1
Epoch [1/1], Step [1003/13804], Loss: 2.9892, Perplexity: 19.8696, time_taken_in_seconds: 2
Epoch [1/1], Step [1004/13804], Loss: 3.6856, Perplexity: 39.8708, time_taken_in_seconds: 3
Epoch [1/1], Step [1005/13804], Loss: 3.0630, Perplexity: 21.3913, time_taken_in_seconds: 4
Epoch [1/1], Step [1006/13804], Loss: 3.3600, Perplexity: 28.7883, time_taken_in_seconds: 5
Epoch [1/1], Step [1007/13804], Loss: 2.8595, Perplexity: 17.4524, time_taken_in_seconds: 6
Epoch [1/1], Step [1008/13804], Loss: 3.0798, Perplexity: 21.7548, time_taken_in_seconds: 6
Epoch [1/1], Step [1009/13804], Loss: 2.8623, Perplexity: 17.5022, time_taken_in_seconds: 7
Epoch [1/1], Step [1010/13804], Loss: 3.1231, Perplexity: 22.7157, time_taken_in_seconds: 8
Epoch [1/1], Step [1011/13804], Loss: 3.1773, Perplexity: 23.9817, time_taken_in_seconds: 9
Epoch [1/1], Step [1012/13804], Loss: 3.1444, Perplexity: 23.2069, time_taken_in_seconds: 10
Epoch [1/1], Step [1013/13804], Loss: 2.8682, Perplexity: 17.6054, time_taken_in_seconds: 11
Epoch [1/1], Step [1014/13804], Loss: 2.9936, Perplexity: 19.9581, time_taken_in_seconds: 12
Epoch [1/1], Step [1015/13804], Loss: 3.0537, Perplexity: 21.1943, time_taken_in_seconds: 12
Epoch [1/1], Step [1016/13804], Loss: 3.1505, Perplexity: 23.3486, time_taken_in_seconds: 13
Epoch [1/1], Step [1017/13804], Loss: 3.0777, Perplexity: 21.7079, time_taken_in_seconds: 14
Epoch [1/1], Step [1018/13804], Loss: 2.8482, Perplexity: 17.2561, time_taken_in_seconds: 15
Epoch [1/1], Step [1019/13804], Loss: 3.0336, Perplexity: 20.7726, time_taken_in_seconds: 16
Epoch [1/1], Step [1020/13804], Loss: 3.0295, Perplexity: 20.6863, time_taken_in_seconds: 17
Epoch [1/1], Step [1021/13804], Loss: 3.1612, Perplexity: 23.5982, time_taken_in_seconds: 18
Epoch [1/1], Step [1022/13804], Loss: 3.1133, Perplexity: 22.4950, time_taken_in_seconds: 18
Epoch [1/1], Step [1023/13804], Loss: 2.7929, Perplexity: 16.3279, time_taken_in_seconds: 19
Epoch [1/1], Step [1024/13804], Loss: 3.3971, Perplexity: 29.8766, time_taken_in_seconds: 20
Epoch [1/1], Step [1025/13804], Loss: 2.9247, Perplexity: 18.6287, time_taken_in_seconds: 21
Epoch [1/1], Step [1026/13804], Loss: 2.9511, Perplexity: 19.1269, time_taken_in_seconds: 22
Epoch [1/1], Step [1027/13804], Loss: 3.0709, Perplexity: 21.5605, time_taken_in_seconds: 23
Epoch [1/1], Step [1028/13804], Loss: 3.1390, Perplexity: 23.0799, time_taken_in_seconds: 24
Epoch [1/1], Step [1029/13804], Loss: 3.4756, Perplexity: 32.3175, time_taken_in_seconds: 24
Epoch [1/1], Step [1030/13804], Loss: 2.9733, Perplexity: 19.5561, time_taken_in_seconds: 25
Epoch [1/1], Step [1031/13804], Loss: 3.3599, Perplexity: 28.7865, time_taken_in_seconds: 26
Epoch [1/1], Step [1032/13804], Loss: 2.8707, Perplexity: 17.6487, time_taken_in_seconds: 27
Epoch [1/1], Step [1033/13804], Loss: 2.9424, Perplexity: 18.9612, time_taken_in_seconds: 28
Epoch [1/1], Step [1034/13804], Loss: 3.3458, Perplexity: 28.3821, time_taken_in_seconds: 29
Epoch [1/1], Step [1035/13804], Loss: 3.0258, Perplexity: 20.6115, time_taken_in_seconds: 30
Epoch [1/1], Step [1036/13804], Loss: 2.9995, Perplexity: 20.0753, time_taken_in_seconds: 30
Epoch [1/1], Step [1037/13804], Loss: 3.7794, Perplexity: 43.7890, time_taken_in_seconds: 31
Epoch [1/1], Step [1038/13804], Loss: 3.3995, Perplexity: 29.9491, time_taken_in_seconds: 32
Epoch [1/1], Step [1039/13804], Loss: 3.5817, Perplexity: 35.9357, time_taken_in_seconds: 33
Epoch [1/1], Step [1040/13804], Loss: 3.1371, Perplexity: 23.0380, time_taken_in_seconds: 34
Epoch [1/1], Step [1041/13804], Loss: 3.0754, Perplexity: 21.6586, time_taken_in_seconds: 35
Epoch [1/1], Step [1042/13804], Loss: 3.2205, Perplexity: 25.0396, time_taken_in_seconds: 36
Epoch [1/1], Step [1043/13804], Loss: 3.0001, Perplexity: 20.0881, time_taken_in_seconds: 37
Epoch [1/1], Step [1044/13804], Loss: 2.9965, Perplexity: 20.0156, time_taken_in_seconds: 37
Epoch [1/1], Step [1045/13804], Loss: 2.9422, Perplexity: 18.9580, time_taken_in_seconds: 38
Epoch [1/1], Step [1046/13804], Loss: 3.0826, Perplexity: 21.8153, time_taken_in_seconds: 39
Epoch [1/1], Step [1047/13804], Loss: 3.0389, Perplexity: 20.8824, time_taken_in_seconds: 40
Epoch [1/1], Step [1048/13804], Loss: 3.1492, Perplexity: 23.3165, time_taken_in_seconds: 41
Epoch [1/1], Step [1049/13804], Loss: 3.1134, Perplexity: 22.4969, time_taken_in_seconds: 42
Epoch [1/1], Step [1050/13804], Loss: 2.8960, Perplexity: 18.1022, time_taken_in_seconds: 43
Epoch [1/1], Step [1051/13804], Loss: 2.9773, Perplexity: 19.6346, time_taken_in_seconds: 43
Epoch [1/1], Step [1052/13804], Loss: 3.1824, Perplexity: 24.1051, time_taken_in_seconds: 44
Epoch [1/1], Step [1053/13804], Loss: 2.9932, Perplexity: 19.9497, time_taken_in_seconds: 45
Epoch [1/1], Step [1054/13804], Loss: 3.5422, Perplexity: 34.5418, time_taken_in_seconds: 46
Epoch [1/1], Step [1055/13804], Loss: 3.4768, Perplexity: 32.3558, time_taken_in_seconds: 47
Epoch [1/1], Step [1056/13804], Loss: 2.7087, Perplexity: 15.0101, time_taken_in_seconds: 48
Epoch [1/1], Step [1057/13804], Loss: 3.1354, Perplexity: 22.9985, time_taken_in_seconds: 49
Epoch [1/1], Step [1058/13804], Loss: 3.0330, Perplexity: 20.7602, time_taken_in_seconds: 49
Epoch [1/1], Step [1059/13804], Loss: 2.9100, Perplexity: 18.3567, time_taken_in_seconds: 50
Epoch [1/1], Step [1060/13804], Loss: 3.1567, Perplexity: 23.4931, time_taken_in_seconds: 51
Epoch [1/1], Step [1061/13804], Loss: 3.0179, Perplexity: 20.4486, time_taken_in_seconds: 52
Epoch [1/1], Step [1062/13804], Loss: 2.7272, Perplexity: 15.2893, time_taken_in_seconds: 53
Epoch [1/1], Step [1063/13804], Loss: 3.3827, Perplexity: 29.4513, time_taken_in_seconds: 54
Epoch [1/1], Step [1064/13804], Loss: 3.1777, Perplexity: 23.9927, time_taken_in_seconds: 55
Epoch [1/1], Step [1065/13804], Loss: 2.7583, Perplexity: 15.7731, time_taken_in_seconds: 55
Epoch [1/1], Step [1066/13804], Loss: 2.9200, Perplexity: 18.5408, time_taken_in_seconds: 56
Epoch [1/1], Step [1067/13804], Loss: 2.8924, Perplexity: 18.0369, time_taken_in_seconds: 57
Epoch [1/1], Step [1068/13804], Loss: 2.7018, Perplexity: 14.9062, time_taken_in_seconds: 58
Epoch [1/1], Step [1069/13804], Loss: 3.1684, Perplexity: 23.7689, time_taken_in_seconds: 59
Epoch [1/1], Step [1070/13804], Loss: 3.2164, Perplexity: 24.9378, time_taken_in_seconds: 60
Epoch [1/1], Step [1071/13804], Loss: 3.1197, Perplexity: 22.6388, time_taken_in_seconds: 61
Epoch [1/1], Step [1072/13804], Loss: 3.2401, Perplexity: 25.5375, time_taken_in_seconds: 61
Epoch [1/1], Step [1073/13804], Loss: 3.1285, Perplexity: 22.8396, time_taken_in_seconds: 62
Epoch [1/1], Step [1074/13804], Loss: 3.3354, Perplexity: 28.0894, time_taken_in_seconds: 63
Epoch [1/1], Step [1075/13804], Loss: 2.9857, Perplexity: 19.7999, time_taken_in_seconds: 64
Epoch [1/1], Step [1076/13804], Loss: 3.0068, Perplexity: 20.2216, time_taken_in_seconds: 65
Epoch [1/1], Step [1077/13804], Loss: 3.6594, Perplexity: 38.8378, time_taken_in_seconds: 66
Epoch [1/1], Step [1078/13804], Loss: 2.8500, Perplexity: 17.2883, time_taken_in_seconds: 66
Epoch [1/1], Step [1079/13804], Loss: 2.8286, Perplexity: 16.9226, time_taken_in_seconds: 67
Epoch [1/1], Step [1080/13804], Loss: 3.2896, Perplexity: 26.8312, time_taken_in_seconds: 68
Epoch [1/1], Step [1081/13804], Loss: 2.8896, Perplexity: 17.9864, time_taken_in_seconds: 69
Epoch [1/1], Step [1082/13804], Loss: 3.0226, Perplexity: 20.5448, time_taken_in_seconds: 70
Epoch [1/1], Step [1083/13804], Loss: 3.1545, Perplexity: 23.4410, time_taken_in_seconds: 71
Epoch [1/1], Step [1084/13804], Loss: 3.1409, Perplexity: 23.1237, time_taken_in_seconds: 72
Epoch [1/1], Step [1085/13804], Loss: 2.9876, Perplexity: 19.8385, time_taken_in_seconds: 72
Epoch [1/1], Step [1086/13804], Loss: 3.1195, Perplexity: 22.6355, time_taken_in_seconds: 73
Epoch [1/1], Step [1087/13804], Loss: 2.9837, Perplexity: 19.7598, time_taken_in_seconds: 74
Epoch [1/1], Step [1088/13804], Loss: 3.0733, Perplexity: 21.6136, time_taken_in_seconds: 75
Epoch [1/1], Step [1089/13804], Loss: 3.0569, Perplexity: 21.2610, time_taken_in_seconds: 76
Epoch [1/1], Step [1090/13804], Loss: 2.9741, Perplexity: 19.5729, time_taken_in_seconds: 77
Epoch [1/1], Step [1091/13804], Loss: 3.1353, Perplexity: 22.9965, time_taken_in_seconds: 78
Epoch [1/1], Step [1092/13804], Loss: 3.2958, Perplexity: 26.9991, time_taken_in_seconds: 78
Epoch [1/1], Step [1093/13804], Loss: 2.9557, Perplexity: 19.2158, time_taken_in_seconds: 79
Epoch [1/1], Step [1094/13804], Loss: 3.1410, Perplexity: 23.1276, time_taken_in_seconds: 80
Epoch [1/1], Step [1095/13804], Loss: 3.4420, Perplexity: 31.2496, time_taken_in_seconds: 81
Epoch [1/1], Step [1096/13804], Loss: 3.0220, Perplexity: 20.5332, time_taken_in_seconds: 82
Epoch [1/1], Step [1097/13804], Loss: 3.1089, Perplexity: 22.3972, time_taken_in_seconds: 83
Epoch [1/1], Step [1098/13804], Loss: 3.0831, Perplexity: 21.8260, time_taken_in_seconds: 84
Epoch [1/1], Step [1099/13804], Loss: 3.0296, Perplexity: 20.6899, time_taken_in_seconds: 84
Epoch [1/1], Step [1100/13804], Loss: 3.1551, Perplexity: 23.4564, time_taken_in_seconds: 85
Epoch [1/1], Step [1101/13804], Loss: 3.1300, Perplexity: 22.8732, time_taken_in_seconds: 0
Epoch [1/1], Step [1102/13804], Loss: 2.7504, Perplexity: 15.6495, time_taken_in_seconds: 1
Epoch [1/1], Step [1103/13804], Loss: 3.3374, Perplexity: 28.1456, time_taken_in_seconds: 2
Epoch [1/1], Step [1104/13804], Loss: 3.2048, Perplexity: 24.6495, time_taken_in_seconds: 3
Epoch [1/1], Step [1105/13804], Loss: 2.7103, Perplexity: 15.0335, time_taken_in_seconds: 4
Epoch [1/1], Step [1106/13804], Loss: 3.3235, Perplexity: 27.7583, time_taken_in_seconds: 5
Epoch [1/1], Step [1107/13804], Loss: 2.9412, Perplexity: 18.9379, time_taken_in_seconds: 6
Epoch [1/1], Step [1108/13804], Loss: 3.1675, Perplexity: 23.7488, time_taken_in_seconds: 6
Epoch [1/1], Step [1109/13804], Loss: 3.4088, Perplexity: 30.2288, time_taken_in_seconds: 7
Epoch [1/1], Step [1110/13804], Loss: 3.0226, Perplexity: 20.5439, time_taken_in_seconds: 8
Epoch [1/1], Step [1111/13804], Loss: 3.4684, Perplexity: 32.0864, time_taken_in_seconds: 9
Epoch [1/1], Step [1112/13804], Loss: 3.5436, Perplexity: 34.5899, time_taken_in_seconds: 10
Epoch [1/1], Step [1113/13804], Loss: 3.1330, Perplexity: 22.9427, time_taken_in_seconds: 11
Epoch [1/1], Step [1114/13804], Loss: 3.1811, Perplexity: 24.0726, time_taken_in_seconds: 12
Epoch [1/1], Step [1115/13804], Loss: 3.3065, Perplexity: 27.2898, time_taken_in_seconds: 13
Epoch [1/1], Step [1116/13804], Loss: 3.0723, Perplexity: 21.5917, time_taken_in_seconds: 13
Epoch [1/1], Step [1117/13804], Loss: 2.7656, Perplexity: 15.8887, time_taken_in_seconds: 14
Epoch [1/1], Step [1118/13804], Loss: 2.6859, Perplexity: 14.6715, time_taken_in_seconds: 15
Epoch [1/1], Step [1119/13804], Loss: 3.1273, Perplexity: 22.8114, time_taken_in_seconds: 16
Epoch [1/1], Step [1120/13804], Loss: 3.1730, Perplexity: 23.8798, time_taken_in_seconds: 17
Epoch [1/1], Step [1121/13804], Loss: 3.2063, Perplexity: 24.6868, time_taken_in_seconds: 18
Epoch [1/1], Step [1122/13804], Loss: 4.1267, Perplexity: 61.9736, time_taken_in_seconds: 19
Epoch [1/1], Step [1123/13804], Loss: 2.8829, Perplexity: 17.8658, time_taken_in_seconds: 19
Epoch [1/1], Step [1124/13804], Loss: 3.1604, Perplexity: 23.5795, time_taken_in_seconds: 20
Epoch [1/1], Step [1125/13804], Loss: 2.9041, Perplexity: 18.2487, time_taken_in_seconds: 21
Epoch [1/1], Step [1126/13804], Loss: 2.9808, Perplexity: 19.7036, time_taken_in_seconds: 22
Epoch [1/1], Step [1127/13804], Loss: 3.0011, Perplexity: 20.1083, time_taken_in_seconds: 23
Epoch [1/1], Step [1128/13804], Loss: 3.1278, Perplexity: 22.8231, time_taken_in_seconds: 24
Epoch [1/1], Step [1129/13804], Loss: 3.5775, Perplexity: 35.7855, time_taken_in_seconds: 25
Epoch [1/1], Step [1130/13804], Loss: 2.8964, Perplexity: 18.1085, time_taken_in_seconds: 25
Epoch [1/1], Step [1131/13804], Loss: 2.9060, Perplexity: 18.2833, time_taken_in_seconds: 26
Epoch [1/1], Step [1132/13804], Loss: 3.0308, Perplexity: 20.7145, time_taken_in_seconds: 27
Epoch [1/1], Step [1133/13804], Loss: 3.1644, Perplexity: 23.6738, time_taken_in_seconds: 28
Epoch [1/1], Step [1134/13804], Loss: 3.0631, Perplexity: 21.3941, time_taken_in_seconds: 29
Epoch [1/1], Step [1135/13804], Loss: 3.0466, Perplexity: 21.0435, time_taken_in_seconds: 30
Epoch [1/1], Step [1136/13804], Loss: 2.8315, Perplexity: 16.9705, time_taken_in_seconds: 30
Epoch [1/1], Step [1137/13804], Loss: 3.1318, Perplexity: 22.9144, time_taken_in_seconds: 31
Epoch [1/1], Step [1138/13804], Loss: 3.7531, Perplexity: 42.6543, time_taken_in_seconds: 32
Epoch [1/1], Step [1139/13804], Loss: 3.6197, Perplexity: 37.3252, time_taken_in_seconds: 33
Epoch [1/1], Step [1140/13804], Loss: 2.7466, Perplexity: 15.5889, time_taken_in_seconds: 34
Epoch [1/1], Step [1141/13804], Loss: 2.9100, Perplexity: 18.3573, time_taken_in_seconds: 35
Epoch [1/1], Step [1142/13804], Loss: 2.8727, Perplexity: 17.6848, time_taken_in_seconds: 36
Epoch [1/1], Step [1143/13804], Loss: 3.2460, Perplexity: 25.6865, time_taken_in_seconds: 36
Epoch [1/1], Step [1144/13804], Loss: 3.5171, Perplexity: 33.6881, time_taken_in_seconds: 37
Epoch [1/1], Step [1145/13804], Loss: 3.1563, Perplexity: 23.4841, time_taken_in_seconds: 38
Epoch [1/1], Step [1146/13804], Loss: 2.8584, Perplexity: 17.4329, time_taken_in_seconds: 39
Epoch [1/1], Step [1147/13804], Loss: 2.8850, Perplexity: 17.9040, time_taken_in_seconds: 40
Epoch [1/1], Step [1148/13804], Loss: 3.2930, Perplexity: 26.9235, time_taken_in_seconds: 41
Epoch [1/1], Step [1149/13804], Loss: 2.9037, Perplexity: 18.2417, time_taken_in_seconds: 42
Epoch [1/1], Step [1150/13804], Loss: 3.6186, Perplexity: 37.2868, time_taken_in_seconds: 42
Epoch [1/1], Step [1151/13804], Loss: 3.0836, Perplexity: 21.8379, time_taken_in_seconds: 43
Epoch [1/1], Step [1152/13804], Loss: 3.0817, Perplexity: 21.7955, time_taken_in_seconds: 44
Epoch [1/1], Step [1153/13804], Loss: 2.9986, Perplexity: 20.0582, time_taken_in_seconds: 45
Epoch [1/1], Step [1154/13804], Loss: 2.5558, Perplexity: 12.8820, time_taken_in_seconds: 46
Epoch [1/1], Step [1155/13804], Loss: 3.1980, Perplexity: 24.4847, time_taken_in_seconds: 47
Epoch [1/1], Step [1156/13804], Loss: 3.3249, Perplexity: 27.7952, time_taken_in_seconds: 47
Epoch [1/1], Step [1157/13804], Loss: 3.5678, Perplexity: 35.4370, time_taken_in_seconds: 48
Epoch [1/1], Step [1158/13804], Loss: 3.1883, Perplexity: 24.2476, time_taken_in_seconds: 49
Epoch [1/1], Step [1159/13804], Loss: 2.7907, Perplexity: 16.2920, time_taken_in_seconds: 50
Epoch [1/1], Step [1160/13804], Loss: 2.8658, Perplexity: 17.5628, time_taken_in_seconds: 51
Epoch [1/1], Step [1161/13804], Loss: 3.4201, Perplexity: 30.5739, time_taken_in_seconds: 52
Epoch [1/1], Step [1162/13804], Loss: 2.9175, Perplexity: 18.4955, time_taken_in_seconds: 53
Epoch [1/1], Step [1163/13804], Loss: 3.1033, Perplexity: 22.2716, time_taken_in_seconds: 53
Epoch [1/1], Step [1164/13804], Loss: 3.2151, Perplexity: 24.9052, time_taken_in_seconds: 54
Epoch [1/1], Step [1165/13804], Loss: 2.9757, Perplexity: 19.6030, time_taken_in_seconds: 55
Epoch [1/1], Step [1166/13804], Loss: 3.2670, Perplexity: 26.2329, time_taken_in_seconds: 56
Epoch [1/1], Step [1167/13804], Loss: 3.1922, Perplexity: 24.3430, time_taken_in_seconds: 57
Epoch [1/1], Step [1168/13804], Loss: 3.2628, Perplexity: 26.1233, time_taken_in_seconds: 58
Epoch [1/1], Step [1169/13804], Loss: 3.0734, Perplexity: 21.6142, time_taken_in_seconds: 59
Epoch [1/1], Step [1170/13804], Loss: 3.1823, Perplexity: 24.1024, time_taken_in_seconds: 59
Epoch [1/1], Step [1171/13804], Loss: 2.7728, Perplexity: 16.0032, time_taken_in_seconds: 60
Epoch [1/1], Step [1172/13804], Loss: 3.0009, Perplexity: 20.1038, time_taken_in_seconds: 61
Epoch [1/1], Step [1173/13804], Loss: 2.9028, Perplexity: 18.2243, time_taken_in_seconds: 62
Epoch [1/1], Step [1174/13804], Loss: 3.1038, Perplexity: 22.2830, time_taken_in_seconds: 63
Epoch [1/1], Step [1175/13804], Loss: 3.1224, Perplexity: 22.6998, time_taken_in_seconds: 64
Epoch [1/1], Step [1176/13804], Loss: 2.9384, Perplexity: 18.8855, time_taken_in_seconds: 65
Epoch [1/1], Step [1177/13804], Loss: 3.1988, Perplexity: 24.5034, time_taken_in_seconds: 65
Epoch [1/1], Step [1178/13804], Loss: 3.1754, Perplexity: 23.9370, time_taken_in_seconds: 66
Epoch [1/1], Step [1179/13804], Loss: 3.1785, Perplexity: 24.0112, time_taken_in_seconds: 67
Epoch [1/1], Step [1180/13804], Loss: 3.3361, Perplexity: 28.1101, time_taken_in_seconds: 68
Epoch [1/1], Step [1181/13804], Loss: 3.0620, Perplexity: 21.3701, time_taken_in_seconds: 69
Epoch [1/1], Step [1182/13804], Loss: 2.8712, Perplexity: 17.6589, time_taken_in_seconds: 70
Epoch [1/1], Step [1183/13804], Loss: 2.8776, Perplexity: 17.7721, time_taken_in_seconds: 70
Epoch [1/1], Step [1184/13804], Loss: 3.1845, Perplexity: 24.1553, time_taken_in_seconds: 71
Epoch [1/1], Step [1185/13804], Loss: 3.2129, Perplexity: 24.8502, time_taken_in_seconds: 72
Epoch [1/1], Step [1186/13804], Loss: 3.0530, Perplexity: 21.1787, time_taken_in_seconds: 73
Epoch [1/1], Step [1187/13804], Loss: 3.2367, Perplexity: 25.4500, time_taken_in_seconds: 74
Epoch [1/1], Step [1188/13804], Loss: 3.5036, Perplexity: 33.2365, time_taken_in_seconds: 75
Epoch [1/1], Step [1189/13804], Loss: 2.9891, Perplexity: 19.8686, time_taken_in_seconds: 76
Epoch [1/1], Step [1190/13804], Loss: 3.0389, Perplexity: 20.8819, time_taken_in_seconds: 77
Epoch [1/1], Step [1191/13804], Loss: 3.0471, Perplexity: 21.0534, time_taken_in_seconds: 78
Epoch [1/1], Step [1192/13804], Loss: 2.8433, Perplexity: 17.1732, time_taken_in_seconds: 78
Epoch [1/1], Step [1193/13804], Loss: 3.2717, Perplexity: 26.3563, time_taken_in_seconds: 79
Epoch [1/1], Step [1194/13804], Loss: 3.1243, Perplexity: 22.7432, time_taken_in_seconds: 80
Epoch [1/1], Step [1195/13804], Loss: 3.3179, Perplexity: 27.6015, time_taken_in_seconds: 81
Epoch [1/1], Step [1196/13804], Loss: 2.9157, Perplexity: 18.4620, time_taken_in_seconds: 82
Epoch [1/1], Step [1197/13804], Loss: 3.0074, Perplexity: 20.2337, time_taken_in_seconds: 83
Epoch [1/1], Step [1198/13804], Loss: 3.2228, Perplexity: 25.0983, time_taken_in_seconds: 84
Epoch [1/1], Step [1199/13804], Loss: 2.9400, Perplexity: 18.9163, time_taken_in_seconds: 84
Epoch [1/1], Step [1200/13804], Loss: 2.9070, Perplexity: 18.3024, time_taken_in_seconds: 85
Epoch [1/1], Step [1201/13804], Loss: 2.9309, Perplexity: 18.7450, time_taken_in_seconds: 0
Epoch [1/1], Step [1202/13804], Loss: 3.5777, Perplexity: 35.7907, time_taken_in_seconds: 1
Epoch [1/1], Step [1203/13804], Loss: 3.6237, Perplexity: 37.4778, time_taken_in_seconds: 2
Epoch [1/1], Step [1204/13804], Loss: 3.5446, Perplexity: 34.6254, time_taken_in_seconds: 3
Epoch [1/1], Step [1205/13804], Loss: 3.5007, Perplexity: 33.1383, time_taken_in_seconds: 4
Epoch [1/1], Step [1206/13804], Loss: 2.8794, Perplexity: 17.8030, time_taken_in_seconds: 5
Epoch [1/1], Step [1207/13804], Loss: 3.2425, Perplexity: 25.5984, time_taken_in_seconds: 5
Epoch [1/1], Step [1208/13804], Loss: 3.0584, Perplexity: 21.2928, time_taken_in_seconds: 6
Epoch [1/1], Step [1209/13804], Loss: 3.2275, Perplexity: 25.2159, time_taken_in_seconds: 7
Epoch [1/1], Step [1210/13804], Loss: 3.0158, Perplexity: 20.4057, time_taken_in_seconds: 8
Epoch [1/1], Step [1211/13804], Loss: 2.8256, Perplexity: 16.8706, time_taken_in_seconds: 9
Epoch [1/1], Step [1212/13804], Loss: 3.1670, Perplexity: 23.7358, time_taken_in_seconds: 10
Epoch [1/1], Step [1213/13804], Loss: 3.3374, Perplexity: 28.1460, time_taken_in_seconds: 10
Epoch [1/1], Step [1214/13804], Loss: 3.1256, Perplexity: 22.7740, time_taken_in_seconds: 11
Epoch [1/1], Step [1215/13804], Loss: 2.9978, Perplexity: 20.0416, time_taken_in_seconds: 12
Epoch [1/1], Step [1216/13804], Loss: 3.0416, Perplexity: 20.9394, time_taken_in_seconds: 13
Epoch [1/1], Step [1217/13804], Loss: 2.8089, Perplexity: 16.5909, time_taken_in_seconds: 14
Epoch [1/1], Step [1218/13804], Loss: 3.3831, Perplexity: 29.4614, time_taken_in_seconds: 15
Epoch [1/1], Step [1219/13804], Loss: 2.9558, Perplexity: 19.2169, time_taken_in_seconds: 16
Epoch [1/1], Step [1220/13804], Loss: 3.1773, Perplexity: 23.9829, time_taken_in_seconds: 16
Epoch [1/1], Step [1221/13804], Loss: 2.8995, Perplexity: 18.1645, time_taken_in_seconds: 17
Epoch [1/1], Step [1222/13804], Loss: 2.8642, Perplexity: 17.5343, time_taken_in_seconds: 18
Epoch [1/1], Step [1223/13804], Loss: 3.2166, Perplexity: 24.9425, time_taken_in_seconds: 19
Epoch [1/1], Step [1224/13804], Loss: 2.8703, Perplexity: 17.6426, time_taken_in_seconds: 20
Epoch [1/1], Step [1225/13804], Loss: 3.0306, Perplexity: 20.7090, time_taken_in_seconds: 21
Epoch [1/1], Step [1226/13804], Loss: 3.2242, Perplexity: 25.1336, time_taken_in_seconds: 21
Epoch [1/1], Step [1227/13804], Loss: 2.9342, Perplexity: 18.8069, time_taken_in_seconds: 22
Epoch [1/1], Step [1228/13804], Loss: 3.2268, Perplexity: 25.1994, time_taken_in_seconds: 23
Epoch [1/1], Step [1229/13804], Loss: 3.3716, Perplexity: 29.1261, time_taken_in_seconds: 24
Epoch [1/1], Step [1230/13804], Loss: 3.0066, Perplexity: 20.2192, time_taken_in_seconds: 25
Epoch [1/1], Step [1231/13804], Loss: 3.3852, Perplexity: 29.5248, time_taken_in_seconds: 26
Epoch [1/1], Step [1232/13804], Loss: 2.8724, Perplexity: 17.6803, time_taken_in_seconds: 27
Epoch [1/1], Step [1233/13804], Loss: 3.0420, Perplexity: 20.9477, time_taken_in_seconds: 27
Epoch [1/1], Step [1234/13804], Loss: 3.2698, Perplexity: 26.3048, time_taken_in_seconds: 28
Epoch [1/1], Step [1235/13804], Loss: 2.9132, Perplexity: 18.4160, time_taken_in_seconds: 29
Epoch [1/1], Step [1236/13804], Loss: 3.5058, Perplexity: 33.3089, time_taken_in_seconds: 30
Epoch [1/1], Step [1237/13804], Loss: 2.9119, Perplexity: 18.3917, time_taken_in_seconds: 31
Epoch [1/1], Step [1238/13804], Loss: 3.7912, Perplexity: 44.3110, time_taken_in_seconds: 32
Epoch [1/1], Step [1239/13804], Loss: 3.1963, Perplexity: 24.4412, time_taken_in_seconds: 33
Epoch [1/1], Step [1240/13804], Loss: 2.9564, Perplexity: 19.2291, time_taken_in_seconds: 33
Epoch [1/1], Step [1241/13804], Loss: 3.0958, Perplexity: 22.1057, time_taken_in_seconds: 34
Epoch [1/1], Step [1242/13804], Loss: 3.0037, Perplexity: 20.1608, time_taken_in_seconds: 35
Epoch [1/1], Step [1243/13804], Loss: 2.9958, Perplexity: 20.0017, time_taken_in_seconds: 36
Epoch [1/1], Step [1244/13804], Loss: 2.7527, Perplexity: 15.6849, time_taken_in_seconds: 37
Epoch [1/1], Step [1245/13804], Loss: 2.8565, Perplexity: 17.4003, time_taken_in_seconds: 38
Epoch [1/1], Step [1246/13804], Loss: 3.3332, Perplexity: 28.0268, time_taken_in_seconds: 39
Epoch [1/1], Step [1247/13804], Loss: 3.7376, Perplexity: 41.9974, time_taken_in_seconds: 39
Epoch [1/1], Step [1248/13804], Loss: 2.9772, Perplexity: 19.6328, time_taken_in_seconds: 40
Epoch [1/1], Step [1249/13804], Loss: 2.9851, Perplexity: 19.7893, time_taken_in_seconds: 41
Epoch [1/1], Step [1250/13804], Loss: 3.1762, Perplexity: 23.9559, time_taken_in_seconds: 42
Epoch [1/1], Step [1251/13804], Loss: 3.0821, Perplexity: 21.8044, time_taken_in_seconds: 43
Epoch [1/1], Step [1252/13804], Loss: 3.4435, Perplexity: 31.2974, time_taken_in_seconds: 44
Epoch [1/1], Step [1253/13804], Loss: 3.4369, Perplexity: 31.0895, time_taken_in_seconds: 45
Epoch [1/1], Step [1254/13804], Loss: 3.0223, Perplexity: 20.5387, time_taken_in_seconds: 45
Epoch [1/1], Step [1255/13804], Loss: 3.1362, Perplexity: 23.0173, time_taken_in_seconds: 46
Epoch [1/1], Step [1256/13804], Loss: 3.0648, Perplexity: 21.4296, time_taken_in_seconds: 47
Epoch [1/1], Step [1257/13804], Loss: 3.4430, Perplexity: 31.2821, time_taken_in_seconds: 48
Epoch [1/1], Step [1258/13804], Loss: 3.2277, Perplexity: 25.2210, time_taken_in_seconds: 49
Epoch [1/1], Step [1259/13804], Loss: 2.8082, Perplexity: 16.5805, time_taken_in_seconds: 50
Epoch [1/1], Step [1260/13804], Loss: 2.7804, Perplexity: 16.1252, time_taken_in_seconds: 51
Epoch [1/1], Step [1261/13804], Loss: 2.9406, Perplexity: 18.9266, time_taken_in_seconds: 52
Epoch [1/1], Step [1262/13804], Loss: 3.6401, Perplexity: 38.0966, time_taken_in_seconds: 52
Epoch [1/1], Step [1263/13804], Loss: 3.0127, Perplexity: 20.3430, time_taken_in_seconds: 53
Epoch [1/1], Step [1264/13804], Loss: 2.8387, Perplexity: 17.0938, time_taken_in_seconds: 54
Epoch [1/1], Step [1265/13804], Loss: 2.9402, Perplexity: 18.9204, time_taken_in_seconds: 55
Epoch [1/1], Step [1266/13804], Loss: 3.0982, Perplexity: 22.1590, time_taken_in_seconds: 56
Epoch [1/1], Step [1267/13804], Loss: 3.3828, Perplexity: 29.4524, time_taken_in_seconds: 57
Epoch [1/1], Step [1268/13804], Loss: 3.4058, Perplexity: 30.1370, time_taken_in_seconds: 57
Epoch [1/1], Step [1269/13804], Loss: 3.1617, Perplexity: 23.6102, time_taken_in_seconds: 58
Epoch [1/1], Step [1270/13804], Loss: 3.8017, Perplexity: 44.7776, time_taken_in_seconds: 59
Epoch [1/1], Step [1271/13804], Loss: 3.5941, Perplexity: 36.3814, time_taken_in_seconds: 60
Epoch [1/1], Step [1272/13804], Loss: 2.9878, Perplexity: 19.8415, time_taken_in_seconds: 61
Epoch [1/1], Step [1273/13804], Loss: 3.6327, Perplexity: 37.8149, time_taken_in_seconds: 62
Epoch [1/1], Step [1274/13804], Loss: 3.0139, Perplexity: 20.3658, time_taken_in_seconds: 63
Epoch [1/1], Step [1275/13804], Loss: 3.3759, Perplexity: 29.2514, time_taken_in_seconds: 63
Epoch [1/1], Step [1276/13804], Loss: 3.4435, Perplexity: 31.2977, time_taken_in_seconds: 64
Epoch [1/1], Step [1277/13804], Loss: 3.0123, Perplexity: 20.3340, time_taken_in_seconds: 65
Epoch [1/1], Step [1278/13804], Loss: 2.6607, Perplexity: 14.3070, time_taken_in_seconds: 66
Epoch [1/1], Step [1279/13804], Loss: 2.7844, Perplexity: 16.1897, time_taken_in_seconds: 67
Epoch [1/1], Step [1280/13804], Loss: 3.9200, Perplexity: 50.3996, time_taken_in_seconds: 68
Epoch [1/1], Step [1281/13804], Loss: 3.1611, Perplexity: 23.5962, time_taken_in_seconds: 69
Epoch [1/1], Step [1282/13804], Loss: 3.4710, Perplexity: 32.1686, time_taken_in_seconds: 70
Epoch [1/1], Step [1283/13804], Loss: 3.3455, Perplexity: 28.3750, time_taken_in_seconds: 70
Epoch [1/1], Step [1284/13804], Loss: 3.0310, Perplexity: 20.7189, time_taken_in_seconds: 71
Epoch [1/1], Step [1285/13804], Loss: 2.9062, Perplexity: 18.2876, time_taken_in_seconds: 72
Epoch [1/1], Step [1286/13804], Loss: 3.2217, Perplexity: 25.0717, time_taken_in_seconds: 73
Epoch [1/1], Step [1287/13804], Loss: 3.1233, Perplexity: 22.7216, time_taken_in_seconds: 74
Epoch [1/1], Step [1288/13804], Loss: 2.8548, Perplexity: 17.3702, time_taken_in_seconds: 75
Epoch [1/1], Step [1289/13804], Loss: 3.1964, Perplexity: 24.4434, time_taken_in_seconds: 75
Epoch [1/1], Step [1290/13804], Loss: 2.7705, Perplexity: 15.9661, time_taken_in_seconds: 76
Epoch [1/1], Step [1291/13804], Loss: 2.9120, Perplexity: 18.3943, time_taken_in_seconds: 77
Epoch [1/1], Step [1292/13804], Loss: 3.5646, Perplexity: 35.3251, time_taken_in_seconds: 78
Epoch [1/1], Step [1293/13804], Loss: 2.8586, Perplexity: 17.4376, time_taken_in_seconds: 79
Epoch [1/1], Step [1294/13804], Loss: 2.8725, Perplexity: 17.6806, time_taken_in_seconds: 80
Epoch [1/1], Step [1295/13804], Loss: 2.7547, Perplexity: 15.7156, time_taken_in_seconds: 81
Epoch [1/1], Step [1296/13804], Loss: 2.7519, Perplexity: 15.6719, time_taken_in_seconds: 82
Epoch [1/1], Step [1297/13804], Loss: 2.9783, Perplexity: 19.6548, time_taken_in_seconds: 82
Epoch [1/1], Step [1298/13804], Loss: 2.8423, Perplexity: 17.1549, time_taken_in_seconds: 83
Epoch [1/1], Step [1299/13804], Loss: 3.0675, Perplexity: 21.4872, time_taken_in_seconds: 84
Epoch [1/1], Step [1300/13804], Loss: 3.0228, Perplexity: 20.5487, time_taken_in_seconds: 85
Epoch [1/1], Step [1301/13804], Loss: 3.2570, Perplexity: 25.9711, time_taken_in_seconds: 0
Epoch [1/1], Step [1302/13804], Loss: 2.9205, Perplexity: 18.5511, time_taken_in_seconds: 1
Epoch [1/1], Step [1303/13804], Loss: 3.2018, Perplexity: 24.5755, time_taken_in_seconds: 2
Epoch [1/1], Step [1304/13804], Loss: 2.9511, Perplexity: 19.1262, time_taken_in_seconds: 3
Epoch [1/1], Step [1305/13804], Loss: 2.9077, Perplexity: 18.3149, time_taken_in_seconds: 4
Epoch [1/1], Step [1306/13804], Loss: 3.2829, Perplexity: 26.6537, time_taken_in_seconds: 5
Epoch [1/1], Step [1307/13804], Loss: 3.3810, Perplexity: 29.3990, time_taken_in_seconds: 5
Epoch [1/1], Step [1308/13804], Loss: 2.8123, Perplexity: 16.6478, time_taken_in_seconds: 6
Epoch [1/1], Step [1309/13804], Loss: 3.0580, Perplexity: 21.2844, time_taken_in_seconds: 7
Epoch [1/1], Step [1310/13804], Loss: 2.7695, Perplexity: 15.9502, time_taken_in_seconds: 8
Epoch [1/1], Step [1311/13804], Loss: 3.0044, Perplexity: 20.1742, time_taken_in_seconds: 9
Epoch [1/1], Step [1312/13804], Loss: 3.1388, Perplexity: 23.0756, time_taken_in_seconds: 10
Epoch [1/1], Step [1313/13804], Loss: 4.2244, Perplexity: 68.3361, time_taken_in_seconds: 11
Epoch [1/1], Step [1314/13804], Loss: 3.0448, Perplexity: 21.0052, time_taken_in_seconds: 11
Epoch [1/1], Step [1315/13804], Loss: 3.5766, Perplexity: 35.7523, time_taken_in_seconds: 12
Epoch [1/1], Step [1316/13804], Loss: 3.3195, Perplexity: 27.6465, time_taken_in_seconds: 13
Epoch [1/1], Step [1317/13804], Loss: 2.8360, Perplexity: 17.0473, time_taken_in_seconds: 14
Epoch [1/1], Step [1318/13804], Loss: 3.1586, Perplexity: 23.5365, time_taken_in_seconds: 15
Epoch [1/1], Step [1319/13804], Loss: 3.0363, Perplexity: 20.8276, time_taken_in_seconds: 16
Epoch [1/1], Step [1320/13804], Loss: 3.2486, Perplexity: 25.7539, time_taken_in_seconds: 17
Epoch [1/1], Step [1321/13804], Loss: 2.8948, Perplexity: 18.0799, time_taken_in_seconds: 17
Epoch [1/1], Step [1322/13804], Loss: 2.8361, Perplexity: 17.0485, time_taken_in_seconds: 18
Epoch [1/1], Step [1323/13804], Loss: 2.9812, Perplexity: 19.7117, time_taken_in_seconds: 19
Epoch [1/1], Step [1324/13804], Loss: 3.2818, Perplexity: 26.6230, time_taken_in_seconds: 20
Epoch [1/1], Step [1325/13804], Loss: 3.0667, Perplexity: 21.4708, time_taken_in_seconds: 21
Epoch [1/1], Step [1326/13804], Loss: 2.9418, Perplexity: 18.9503, time_taken_in_seconds: 22
Epoch [1/1], Step [1327/13804], Loss: 3.0478, Perplexity: 21.0694, time_taken_in_seconds: 23
Epoch [1/1], Step [1328/13804], Loss: 2.8293, Perplexity: 16.9329, time_taken_in_seconds: 24
Epoch [1/1], Step [1329/13804], Loss: 2.9884, Perplexity: 19.8546, time_taken_in_seconds: 25
Epoch [1/1], Step [1330/13804], Loss: 3.2540, Perplexity: 25.8932, time_taken_in_seconds: 25
Epoch [1/1], Step [1331/13804], Loss: 3.0594, Perplexity: 21.3150, time_taken_in_seconds: 26
Epoch [1/1], Step [1332/13804], Loss: 2.7833, Perplexity: 16.1721, time_taken_in_seconds: 27
Epoch [1/1], Step [1333/13804], Loss: 2.8614, Perplexity: 17.4862, time_taken_in_seconds: 28
Epoch [1/1], Step [1334/13804], Loss: 3.3689, Perplexity: 29.0456, time_taken_in_seconds: 29
Epoch [1/1], Step [1335/13804], Loss: 3.2036, Perplexity: 24.6211, time_taken_in_seconds: 30
Epoch [1/1], Step [1336/13804], Loss: 3.1067, Perplexity: 22.3462, time_taken_in_seconds: 30
Epoch [1/1], Step [1337/13804], Loss: 2.7985, Perplexity: 16.4192, time_taken_in_seconds: 31
Epoch [1/1], Step [1338/13804], Loss: 3.0275, Perplexity: 20.6452, time_taken_in_seconds: 32
Epoch [1/1], Step [1339/13804], Loss: 2.9715, Perplexity: 19.5214, time_taken_in_seconds: 33
Epoch [1/1], Step [1340/13804], Loss: 2.9202, Perplexity: 18.5445, time_taken_in_seconds: 34
Epoch [1/1], Step [1341/13804], Loss: 3.4391, Perplexity: 31.1600, time_taken_in_seconds: 35
Epoch [1/1], Step [1342/13804], Loss: 3.1767, Perplexity: 23.9666, time_taken_in_seconds: 36
Epoch [1/1], Step [1343/13804], Loss: 3.1329, Perplexity: 22.9413, time_taken_in_seconds: 36
Epoch [1/1], Step [1344/13804], Loss: 3.1142, Perplexity: 22.5156, time_taken_in_seconds: 37
Epoch [1/1], Step [1345/13804], Loss: 3.5816, Perplexity: 35.9324, time_taken_in_seconds: 38
Epoch [1/1], Step [1346/13804], Loss: 3.1832, Perplexity: 24.1229, time_taken_in_seconds: 39
Epoch [1/1], Step [1347/13804], Loss: 2.9032, Perplexity: 18.2330, time_taken_in_seconds: 40
Epoch [1/1], Step [1348/13804], Loss: 2.8955, Perplexity: 18.0924, time_taken_in_seconds: 41
Epoch [1/1], Step [1349/13804], Loss: 2.8145, Perplexity: 16.6846, time_taken_in_seconds: 42
Epoch [1/1], Step [1350/13804], Loss: 3.1474, Perplexity: 23.2744, time_taken_in_seconds: 42
Epoch [1/1], Step [1351/13804], Loss: 2.7931, Perplexity: 16.3313, time_taken_in_seconds: 43
Epoch [1/1], Step [1352/13804], Loss: 2.8698, Perplexity: 17.6340, time_taken_in_seconds: 44
Epoch [1/1], Step [1353/13804], Loss: 2.9191, Perplexity: 18.5240, time_taken_in_seconds: 45
Epoch [1/1], Step [1354/13804], Loss: 2.8171, Perplexity: 16.7287, time_taken_in_seconds: 46
Epoch [1/1], Step [1355/13804], Loss: 2.8755, Perplexity: 17.7344, time_taken_in_seconds: 47
Epoch [1/1], Step [1356/13804], Loss: 2.9609, Perplexity: 19.3159, time_taken_in_seconds: 48
Epoch [1/1], Step [1357/13804], Loss: 3.1120, Perplexity: 22.4659, time_taken_in_seconds: 48
Epoch [1/1], Step [1358/13804], Loss: 2.9604, Perplexity: 19.3061, time_taken_in_seconds: 49
Epoch [1/1], Step [1359/13804], Loss: 3.1233, Perplexity: 22.7223, time_taken_in_seconds: 50
Epoch [1/1], Step [1360/13804], Loss: 3.1051, Perplexity: 22.3111, time_taken_in_seconds: 51
Epoch [1/1], Step [1361/13804], Loss: 3.1942, Perplexity: 24.3899, time_taken_in_seconds: 52
Epoch [1/1], Step [1362/13804], Loss: 3.4064, Perplexity: 30.1556, time_taken_in_seconds: 53
Epoch [1/1], Step [1363/13804], Loss: 3.1551, Perplexity: 23.4559, time_taken_in_seconds: 54
Epoch [1/1], Step [1364/13804], Loss: 2.7985, Perplexity: 16.4203, time_taken_in_seconds: 54
Epoch [1/1], Step [1365/13804], Loss: 3.2597, Perplexity: 26.0419, time_taken_in_seconds: 55
Epoch [1/1], Step [1366/13804], Loss: 3.2595, Perplexity: 26.0377, time_taken_in_seconds: 56
Epoch [1/1], Step [1367/13804], Loss: 3.0304, Perplexity: 20.7061, time_taken_in_seconds: 57
Epoch [1/1], Step [1368/13804], Loss: 3.0623, Perplexity: 21.3763, time_taken_in_seconds: 58
Epoch [1/1], Step [1369/13804], Loss: 2.8458, Perplexity: 17.2145, time_taken_in_seconds: 59
Epoch [1/1], Step [1370/13804], Loss: 3.0705, Perplexity: 21.5521, time_taken_in_seconds: 59
Epoch [1/1], Step [1371/13804], Loss: 2.8768, Perplexity: 17.7575, time_taken_in_seconds: 60
Epoch [1/1], Step [1372/13804], Loss: 2.8680, Perplexity: 17.6026, time_taken_in_seconds: 61
Epoch [1/1], Step [1373/13804], Loss: 2.8627, Perplexity: 17.5092, time_taken_in_seconds: 62
Epoch [1/1], Step [1374/13804], Loss: 2.9201, Perplexity: 18.5423, time_taken_in_seconds: 63
Epoch [1/1], Step [1375/13804], Loss: 2.9235, Perplexity: 18.6056, time_taken_in_seconds: 64
Epoch [1/1], Step [1376/13804], Loss: 3.2691, Perplexity: 26.2876, time_taken_in_seconds: 65
Epoch [1/1], Step [1377/13804], Loss: 3.0131, Perplexity: 20.3502, time_taken_in_seconds: 65
Epoch [1/1], Step [1378/13804], Loss: 3.0880, Perplexity: 21.9323, time_taken_in_seconds: 66
Epoch [1/1], Step [1379/13804], Loss: 2.5398, Perplexity: 12.6767, time_taken_in_seconds: 67
Epoch [1/1], Step [1380/13804], Loss: 3.3059, Perplexity: 27.2723, time_taken_in_seconds: 68
Epoch [1/1], Step [1381/13804], Loss: 3.5580, Perplexity: 35.0913, time_taken_in_seconds: 69
Epoch [1/1], Step [1382/13804], Loss: 2.9961, Perplexity: 20.0072, time_taken_in_seconds: 70
Epoch [1/1], Step [1383/13804], Loss: 2.9574, Perplexity: 19.2473, time_taken_in_seconds: 70
Epoch [1/1], Step [1384/13804], Loss: 2.7645, Perplexity: 15.8705, time_taken_in_seconds: 71
Epoch [1/1], Step [1385/13804], Loss: 3.2738, Perplexity: 26.4102, time_taken_in_seconds: 72
Epoch [1/1], Step [1386/13804], Loss: 2.8983, Perplexity: 18.1424, time_taken_in_seconds: 73
Epoch [1/1], Step [1387/13804], Loss: 3.3790, Perplexity: 29.3402, time_taken_in_seconds: 74
Epoch [1/1], Step [1388/13804], Loss: 2.5817, Perplexity: 13.2196, time_taken_in_seconds: 75
Epoch [1/1], Step [1389/13804], Loss: 2.8886, Perplexity: 17.9677, time_taken_in_seconds: 76
Epoch [1/1], Step [1390/13804], Loss: 3.2037, Perplexity: 24.6243, time_taken_in_seconds: 76
Epoch [1/1], Step [1391/13804], Loss: 3.0823, Perplexity: 21.8091, time_taken_in_seconds: 77
Epoch [1/1], Step [1392/13804], Loss: 3.1907, Perplexity: 24.3060, time_taken_in_seconds: 78
Epoch [1/1], Step [1393/13804], Loss: 3.0168, Perplexity: 20.4267, time_taken_in_seconds: 79
Epoch [1/1], Step [1394/13804], Loss: 3.1270, Perplexity: 22.8059, time_taken_in_seconds: 80
Epoch [1/1], Step [1395/13804], Loss: 3.2192, Perplexity: 25.0090, time_taken_in_seconds: 81
Epoch [1/1], Step [1396/13804], Loss: 2.9903, Perplexity: 19.8915, time_taken_in_seconds: 82
Epoch [1/1], Step [1397/13804], Loss: 3.5378, Perplexity: 34.3903, time_taken_in_seconds: 83
Epoch [1/1], Step [1398/13804], Loss: 2.6293, Perplexity: 13.8647, time_taken_in_seconds: 83
Epoch [1/1], Step [1399/13804], Loss: 3.0868, Perplexity: 21.9058, time_taken_in_seconds: 84
Epoch [1/1], Step [1400/13804], Loss: 2.7222, Perplexity: 15.2139, time_taken_in_seconds: 85
Epoch [1/1], Step [1401/13804], Loss: 3.1014, Perplexity: 22.2297, time_taken_in_seconds: 0
Epoch [1/1], Step [1402/13804], Loss: 2.7225, Perplexity: 15.2181, time_taken_in_seconds: 1
Epoch [1/1], Step [1403/13804], Loss: 3.2386, Perplexity: 25.4991, time_taken_in_seconds: 2
Epoch [1/1], Step [1404/13804], Loss: 2.9646, Perplexity: 19.3862, time_taken_in_seconds: 3
Epoch [1/1], Step [1405/13804], Loss: 2.9040, Perplexity: 18.2468, time_taken_in_seconds: 4
Epoch [1/1], Step [1406/13804], Loss: 2.8587, Perplexity: 17.4385, time_taken_in_seconds: 5
Epoch [1/1], Step [1407/13804], Loss: 2.8187, Perplexity: 16.7551, time_taken_in_seconds: 5
Epoch [1/1], Step [1408/13804], Loss: 2.6971, Perplexity: 14.8371, time_taken_in_seconds: 6
Epoch [1/1], Step [1409/13804], Loss: 3.0942, Perplexity: 22.0685, time_taken_in_seconds: 7
Epoch [1/1], Step [1410/13804], Loss: 2.9955, Perplexity: 19.9959, time_taken_in_seconds: 8
Epoch [1/1], Step [1411/13804], Loss: 3.4717, Perplexity: 32.1927, time_taken_in_seconds: 9
Epoch [1/1], Step [1412/13804], Loss: 2.8441, Perplexity: 17.1854, time_taken_in_seconds: 10
Epoch [1/1], Step [1413/13804], Loss: 3.1610, Perplexity: 23.5936, time_taken_in_seconds: 11
Epoch [1/1], Step [1414/13804], Loss: 2.6905, Perplexity: 14.7388, time_taken_in_seconds: 11
Epoch [1/1], Step [1415/13804], Loss: 3.7101, Perplexity: 40.8562, time_taken_in_seconds: 12
Epoch [1/1], Step [1416/13804], Loss: 2.7317, Perplexity: 15.3596, time_taken_in_seconds: 13
Epoch [1/1], Step [1417/13804], Loss: 2.9714, Perplexity: 19.5201, time_taken_in_seconds: 14
Epoch [1/1], Step [1418/13804], Loss: 2.8393, Perplexity: 17.1040, time_taken_in_seconds: 15
Epoch [1/1], Step [1419/13804], Loss: 2.7998, Perplexity: 16.4418, time_taken_in_seconds: 16
Epoch [1/1], Step [1420/13804], Loss: 3.1906, Perplexity: 24.3039, time_taken_in_seconds: 16
Epoch [1/1], Step [1421/13804], Loss: 2.4965, Perplexity: 12.1404, time_taken_in_seconds: 17
Epoch [1/1], Step [1422/13804], Loss: 2.6808, Perplexity: 14.5974, time_taken_in_seconds: 18
Epoch [1/1], Step [1423/13804], Loss: 2.9403, Perplexity: 18.9215, time_taken_in_seconds: 19
Epoch [1/1], Step [1424/13804], Loss: 2.8669, Perplexity: 17.5832, time_taken_in_seconds: 20
Epoch [1/1], Step [1425/13804], Loss: 3.5061, Perplexity: 33.3185, time_taken_in_seconds: 21
Epoch [1/1], Step [1426/13804], Loss: 3.2565, Perplexity: 25.9576, time_taken_in_seconds: 22
Epoch [1/1], Step [1427/13804], Loss: 3.0566, Perplexity: 21.2558, time_taken_in_seconds: 22
Epoch [1/1], Step [1428/13804], Loss: 3.2051, Perplexity: 24.6573, time_taken_in_seconds: 23
Epoch [1/1], Step [1429/13804], Loss: 2.7860, Perplexity: 16.2154, time_taken_in_seconds: 24
Epoch [1/1], Step [1430/13804], Loss: 2.8550, Perplexity: 17.3746, time_taken_in_seconds: 25
Epoch [1/1], Step [1431/13804], Loss: 3.0913, Perplexity: 22.0061, time_taken_in_seconds: 26
Epoch [1/1], Step [1432/13804], Loss: 3.1671, Perplexity: 23.7393, time_taken_in_seconds: 27
Epoch [1/1], Step [1433/13804], Loss: 2.7078, Perplexity: 14.9965, time_taken_in_seconds: 28
Epoch [1/1], Step [1434/13804], Loss: 3.4931, Perplexity: 32.8869, time_taken_in_seconds: 28
Epoch [1/1], Step [1435/13804], Loss: 3.3543, Perplexity: 28.6268, time_taken_in_seconds: 29
Epoch [1/1], Step [1436/13804], Loss: 3.1206, Perplexity: 22.6592, time_taken_in_seconds: 30
Epoch [1/1], Step [1437/13804], Loss: 3.0241, Perplexity: 20.5762, time_taken_in_seconds: 31
Epoch [1/1], Step [1438/13804], Loss: 3.0284, Perplexity: 20.6642, time_taken_in_seconds: 32
Epoch [1/1], Step [1439/13804], Loss: 2.9135, Perplexity: 18.4219, time_taken_in_seconds: 33
Epoch [1/1], Step [1440/13804], Loss: 3.5241, Perplexity: 33.9233, time_taken_in_seconds: 33
Epoch [1/1], Step [1441/13804], Loss: 3.2824, Perplexity: 26.6395, time_taken_in_seconds: 34
Epoch [1/1], Step [1442/13804], Loss: 3.2552, Perplexity: 25.9258, time_taken_in_seconds: 35
Epoch [1/1], Step [1443/13804], Loss: 3.3250, Perplexity: 27.7987, time_taken_in_seconds: 36
Epoch [1/1], Step [1444/13804], Loss: 3.2133, Perplexity: 24.8614, time_taken_in_seconds: 37
Epoch [1/1], Step [1445/13804], Loss: 2.7883, Perplexity: 16.2541, time_taken_in_seconds: 38
Epoch [1/1], Step [1446/13804], Loss: 3.0642, Perplexity: 21.4167, time_taken_in_seconds: 38
Epoch [1/1], Step [1447/13804], Loss: 3.0902, Perplexity: 21.9821, time_taken_in_seconds: 39
Epoch [1/1], Step [1448/13804], Loss: 3.1110, Perplexity: 22.4445, time_taken_in_seconds: 40
Epoch [1/1], Step [1449/13804], Loss: 3.1714, Perplexity: 23.8417, time_taken_in_seconds: 41
Epoch [1/1], Step [1450/13804], Loss: 2.6194, Perplexity: 13.7271, time_taken_in_seconds: 42
Epoch [1/1], Step [1451/13804], Loss: 3.0885, Perplexity: 21.9433, time_taken_in_seconds: 43
Epoch [1/1], Step [1452/13804], Loss: 3.2064, Perplexity: 24.6908, time_taken_in_seconds: 43
Epoch [1/1], Step [1453/13804], Loss: 3.1332, Perplexity: 22.9474, time_taken_in_seconds: 44
Epoch [1/1], Step [1454/13804], Loss: 3.1772, Perplexity: 23.9792, time_taken_in_seconds: 45
Epoch [1/1], Step [1455/13804], Loss: 3.1896, Perplexity: 24.2784, time_taken_in_seconds: 46
Epoch [1/1], Step [1456/13804], Loss: 3.8435, Perplexity: 46.6881, time_taken_in_seconds: 47
Epoch [1/1], Step [1457/13804], Loss: 3.1277, Perplexity: 22.8213, time_taken_in_seconds: 48
Epoch [1/1], Step [1458/13804], Loss: 2.7872, Perplexity: 16.2362, time_taken_in_seconds: 49
Epoch [1/1], Step [1459/13804], Loss: 3.1561, Perplexity: 23.4778, time_taken_in_seconds: 49
Epoch [1/1], Step [1460/13804], Loss: 3.0025, Perplexity: 20.1359, time_taken_in_seconds: 50
Epoch [1/1], Step [1461/13804], Loss: 3.0099, Perplexity: 20.2861, time_taken_in_seconds: 51
Epoch [1/1], Step [1462/13804], Loss: 3.3402, Perplexity: 28.2256, time_taken_in_seconds: 52
Epoch [1/1], Step [1463/13804], Loss: 3.2680, Perplexity: 26.2598, time_taken_in_seconds: 53
Epoch [1/1], Step [1464/13804], Loss: 2.6275, Perplexity: 13.8391, time_taken_in_seconds: 54
Epoch [1/1], Step [1465/13804], Loss: 2.9589, Perplexity: 19.2765, time_taken_in_seconds: 55
Epoch [1/1], Step [1466/13804], Loss: 3.0271, Perplexity: 20.6371, time_taken_in_seconds: 55
Epoch [1/1], Step [1467/13804], Loss: 2.6501, Perplexity: 14.1551, time_taken_in_seconds: 56
Epoch [1/1], Step [1468/13804], Loss: 3.0732, Perplexity: 21.6107, time_taken_in_seconds: 57
Epoch [1/1], Step [1469/13804], Loss: 2.8753, Perplexity: 17.7309, time_taken_in_seconds: 58
Epoch [1/1], Step [1470/13804], Loss: 2.8373, Perplexity: 17.0698, time_taken_in_seconds: 59
Epoch [1/1], Step [1471/13804], Loss: 2.8551, Perplexity: 17.3762, time_taken_in_seconds: 60
Epoch [1/1], Step [1472/13804], Loss: 2.9771, Perplexity: 19.6301, time_taken_in_seconds: 61
Epoch [1/1], Step [1473/13804], Loss: 3.2636, Perplexity: 26.1435, time_taken_in_seconds: 62
Epoch [1/1], Step [1474/13804], Loss: 2.8339, Perplexity: 17.0121, time_taken_in_seconds: 62
Epoch [1/1], Step [1475/13804], Loss: 2.9973, Perplexity: 20.0314, time_taken_in_seconds: 63
Epoch [1/1], Step [1476/13804], Loss: 3.5697, Perplexity: 35.5073, time_taken_in_seconds: 64
Epoch [1/1], Step [1477/13804], Loss: 3.5017, Perplexity: 33.1726, time_taken_in_seconds: 65
Epoch [1/1], Step [1478/13804], Loss: 2.8103, Perplexity: 16.6155, time_taken_in_seconds: 66
Epoch [1/1], Step [1479/13804], Loss: 2.7563, Perplexity: 15.7412, time_taken_in_seconds: 67
Epoch [1/1], Step [1480/13804], Loss: 2.9404, Perplexity: 18.9231, time_taken_in_seconds: 67
Epoch [1/1], Step [1481/13804], Loss: 2.9988, Perplexity: 20.0606, time_taken_in_seconds: 68
Epoch [1/1], Step [1482/13804], Loss: 2.7744, Perplexity: 16.0291, time_taken_in_seconds: 69
Epoch [1/1], Step [1483/13804], Loss: 3.4375, Perplexity: 31.1096, time_taken_in_seconds: 70
Epoch [1/1], Step [1484/13804], Loss: 2.7693, Perplexity: 15.9481, time_taken_in_seconds: 71
Epoch [1/1], Step [1485/13804], Loss: 2.9647, Perplexity: 19.3882, time_taken_in_seconds: 72
Epoch [1/1], Step [1486/13804], Loss: 2.9759, Perplexity: 19.6077, time_taken_in_seconds: 73
Epoch [1/1], Step [1487/13804], Loss: 2.9722, Perplexity: 19.5344, time_taken_in_seconds: 73
Epoch [1/1], Step [1488/13804], Loss: 2.7863, Perplexity: 16.2211, time_taken_in_seconds: 74
Epoch [1/1], Step [1489/13804], Loss: 3.1254, Perplexity: 22.7686, time_taken_in_seconds: 75
Epoch [1/1], Step [1490/13804], Loss: 2.7456, Perplexity: 15.5744, time_taken_in_seconds: 76
Epoch [1/1], Step [1491/13804], Loss: 2.6464, Perplexity: 14.1030, time_taken_in_seconds: 77
Epoch [1/1], Step [1492/13804], Loss: 2.7536, Perplexity: 15.6992, time_taken_in_seconds: 78
Epoch [1/1], Step [1493/13804], Loss: 3.2117, Perplexity: 24.8205, time_taken_in_seconds: 78
Epoch [1/1], Step [1494/13804], Loss: 2.8976, Perplexity: 18.1315, time_taken_in_seconds: 79
Epoch [1/1], Step [1495/13804], Loss: 2.9300, Perplexity: 18.7280, time_taken_in_seconds: 80
Epoch [1/1], Step [1496/13804], Loss: 3.2893, Perplexity: 26.8244, time_taken_in_seconds: 81
Epoch [1/1], Step [1497/13804], Loss: 3.1398, Perplexity: 23.1003, time_taken_in_seconds: 82
Epoch [1/1], Step [1498/13804], Loss: 2.9809, Perplexity: 19.7046, time_taken_in_seconds: 83
Epoch [1/1], Step [1499/13804], Loss: 2.9238, Perplexity: 18.6126, time_taken_in_seconds: 83
Epoch [1/1], Step [1500/13804], Loss: 3.0868, Perplexity: 21.9070, time_taken_in_seconds: 84
Epoch [1/1], Step [1501/13804], Loss: 3.5576, Perplexity: 35.0792, time_taken_in_seconds: 0
Epoch [1/1], Step [1502/13804], Loss: 3.0223, Perplexity: 20.5390, time_taken_in_seconds: 1
Epoch [1/1], Step [1503/13804], Loss: 3.2078, Perplexity: 24.7254, time_taken_in_seconds: 2
Epoch [1/1], Step [1504/13804], Loss: 3.1788, Perplexity: 24.0172, time_taken_in_seconds: 3
Epoch [1/1], Step [1505/13804], Loss: 3.3766, Perplexity: 29.2713, time_taken_in_seconds: 4
Epoch [1/1], Step [1506/13804], Loss: 3.0934, Perplexity: 22.0527, time_taken_in_seconds: 5
Epoch [1/1], Step [1507/13804], Loss: 2.6956, Perplexity: 14.8149, time_taken_in_seconds: 5
Epoch [1/1], Step [1508/13804], Loss: 3.0425, Perplexity: 20.9568, time_taken_in_seconds: 6
Epoch [1/1], Step [1509/13804], Loss: 2.7878, Perplexity: 16.2455, time_taken_in_seconds: 7
Epoch [1/1], Step [1510/13804], Loss: 3.4017, Perplexity: 30.0164, time_taken_in_seconds: 8
Epoch [1/1], Step [1511/13804], Loss: 2.5507, Perplexity: 12.8158, time_taken_in_seconds: 9
Epoch [1/1], Step [1512/13804], Loss: 2.9317, Perplexity: 18.7589, time_taken_in_seconds: 10
Epoch [1/1], Step [1513/13804], Loss: 2.3878, Perplexity: 10.8891, time_taken_in_seconds: 11
Epoch [1/1], Step [1514/13804], Loss: 2.9311, Perplexity: 18.7481, time_taken_in_seconds: 11
Epoch [1/1], Step [1515/13804], Loss: 3.1470, Perplexity: 23.2668, time_taken_in_seconds: 12
Epoch [1/1], Step [1516/13804], Loss: 2.9068, Perplexity: 18.2974, time_taken_in_seconds: 13
Epoch [1/1], Step [1517/13804], Loss: 2.8207, Perplexity: 16.7887, time_taken_in_seconds: 14
Epoch [1/1], Step [1518/13804], Loss: 3.1518, Perplexity: 23.3790, time_taken_in_seconds: 15
Epoch [1/1], Step [1519/13804], Loss: 2.9285, Perplexity: 18.6995, time_taken_in_seconds: 16
Epoch [1/1], Step [1520/13804], Loss: 3.2486, Perplexity: 25.7530, time_taken_in_seconds: 16
Epoch [1/1], Step [1521/13804], Loss: 2.7120, Perplexity: 15.0596, time_taken_in_seconds: 17
Epoch [1/1], Step [1522/13804], Loss: 2.8756, Perplexity: 17.7354, time_taken_in_seconds: 18
Epoch [1/1], Step [1523/13804], Loss: 2.9750, Perplexity: 19.5893, time_taken_in_seconds: 19
Epoch [1/1], Step [1524/13804], Loss: 2.7499, Perplexity: 15.6413, time_taken_in_seconds: 20
Epoch [1/1], Step [1525/13804], Loss: 2.7901, Perplexity: 16.2819, time_taken_in_seconds: 21
Epoch [1/1], Step [1526/13804], Loss: 3.0519, Perplexity: 21.1562, time_taken_in_seconds: 21
Epoch [1/1], Step [1527/13804], Loss: 2.8514, Perplexity: 17.3112, time_taken_in_seconds: 22
Epoch [1/1], Step [1528/13804], Loss: 3.0194, Perplexity: 20.4797, time_taken_in_seconds: 23
Epoch [1/1], Step [1529/13804], Loss: 2.9607, Perplexity: 19.3114, time_taken_in_seconds: 24
Epoch [1/1], Step [1530/13804], Loss: 3.3472, Perplexity: 28.4241, time_taken_in_seconds: 25
Epoch [1/1], Step [1531/13804], Loss: 3.2935, Perplexity: 26.9357, time_taken_in_seconds: 26
Epoch [1/1], Step [1532/13804], Loss: 2.7075, Perplexity: 14.9916, time_taken_in_seconds: 27
Epoch [1/1], Step [1533/13804], Loss: 3.2775, Perplexity: 26.5086, time_taken_in_seconds: 27
Epoch [1/1], Step [1534/13804], Loss: 3.1359, Perplexity: 23.0085, time_taken_in_seconds: 28
Epoch [1/1], Step [1535/13804], Loss: 3.2653, Perplexity: 26.1883, time_taken_in_seconds: 29
Epoch [1/1], Step [1536/13804], Loss: 3.0107, Perplexity: 20.3020, time_taken_in_seconds: 30
Epoch [1/1], Step [1537/13804], Loss: 2.7614, Perplexity: 15.8222, time_taken_in_seconds: 31
Epoch [1/1], Step [1538/13804], Loss: 2.9415, Perplexity: 18.9441, time_taken_in_seconds: 32
Epoch [1/1], Step [1539/13804], Loss: 2.8518, Perplexity: 17.3192, time_taken_in_seconds: 33
Epoch [1/1], Step [1540/13804], Loss: 2.8497, Perplexity: 17.2832, time_taken_in_seconds: 34
Epoch [1/1], Step [1541/13804], Loss: 3.2368, Perplexity: 25.4519, time_taken_in_seconds: 34
Epoch [1/1], Step [1542/13804], Loss: 2.9959, Perplexity: 20.0032, time_taken_in_seconds: 35
Epoch [1/1], Step [1543/13804], Loss: 3.6055, Perplexity: 36.8012, time_taken_in_seconds: 36
Epoch [1/1], Step [1544/13804], Loss: 3.1357, Perplexity: 23.0048, time_taken_in_seconds: 37
Epoch [1/1], Step [1545/13804], Loss: 2.8074, Perplexity: 16.5672, time_taken_in_seconds: 38
Epoch [1/1], Step [1546/13804], Loss: 3.1839, Perplexity: 24.1413, time_taken_in_seconds: 39
Epoch [1/1], Step [1547/13804], Loss: 2.9760, Perplexity: 19.6097, time_taken_in_seconds: 40
Epoch [1/1], Step [1548/13804], Loss: 3.0974, Perplexity: 22.1413, time_taken_in_seconds: 41
Epoch [1/1], Step [1549/13804], Loss: 2.7685, Perplexity: 15.9342, time_taken_in_seconds: 41
Epoch [1/1], Step [1550/13804], Loss: 2.9475, Perplexity: 19.0575, time_taken_in_seconds: 42
Epoch [1/1], Step [1551/13804], Loss: 2.6223, Perplexity: 13.7670, time_taken_in_seconds: 43
Epoch [1/1], Step [1552/13804], Loss: 3.3980, Perplexity: 29.9030, time_taken_in_seconds: 44
Epoch [1/1], Step [1553/13804], Loss: 2.8486, Perplexity: 17.2638, time_taken_in_seconds: 45
Epoch [1/1], Step [1554/13804], Loss: 2.9763, Perplexity: 19.6157, time_taken_in_seconds: 46
Epoch [1/1], Step [1555/13804], Loss: 3.3413, Perplexity: 28.2571, time_taken_in_seconds: 46
Epoch [1/1], Step [1556/13804], Loss: 3.3860, Perplexity: 29.5464, time_taken_in_seconds: 47
Epoch [1/1], Step [1557/13804], Loss: 2.8872, Perplexity: 17.9430, time_taken_in_seconds: 48
Epoch [1/1], Step [1558/13804], Loss: 3.0298, Perplexity: 20.6939, time_taken_in_seconds: 49
Epoch [1/1], Step [1559/13804], Loss: 2.9469, Perplexity: 19.0469, time_taken_in_seconds: 50
Epoch [1/1], Step [1560/13804], Loss: 3.5752, Perplexity: 35.7017, time_taken_in_seconds: 51
Epoch [1/1], Step [1561/13804], Loss: 3.1262, Perplexity: 22.7882, time_taken_in_seconds: 52
Epoch [1/1], Step [1562/13804], Loss: 3.0050, Perplexity: 20.1869, time_taken_in_seconds: 52
Epoch [1/1], Step [1563/13804], Loss: 2.7305, Perplexity: 15.3399, time_taken_in_seconds: 53
Epoch [1/1], Step [1564/13804], Loss: 3.0346, Perplexity: 20.7924, time_taken_in_seconds: 54
Epoch [1/1], Step [1565/13804], Loss: 3.0391, Perplexity: 20.8857, time_taken_in_seconds: 55
Epoch [1/1], Step [1566/13804], Loss: 2.9563, Perplexity: 19.2269, time_taken_in_seconds: 56
Epoch [1/1], Step [1567/13804], Loss: 3.2137, Perplexity: 24.8697, time_taken_in_seconds: 57
Epoch [1/1], Step [1568/13804], Loss: 3.1686, Perplexity: 23.7748, time_taken_in_seconds: 57
Epoch [1/1], Step [1569/13804], Loss: 2.8020, Perplexity: 16.4779, time_taken_in_seconds: 58
Epoch [1/1], Step [1570/13804], Loss: 3.2365, Perplexity: 25.4458, time_taken_in_seconds: 59
Epoch [1/1], Step [1571/13804], Loss: 3.0384, Perplexity: 20.8709, time_taken_in_seconds: 60
Epoch [1/1], Step [1572/13804], Loss: 2.8850, Perplexity: 17.9031, time_taken_in_seconds: 61
Epoch [1/1], Step [1573/13804], Loss: 2.9161, Perplexity: 18.4687, time_taken_in_seconds: 62
Epoch [1/1], Step [1574/13804], Loss: 2.7319, Perplexity: 15.3620, time_taken_in_seconds: 63
Epoch [1/1], Step [1575/13804], Loss: 3.1766, Perplexity: 23.9661, time_taken_in_seconds: 63
Epoch [1/1], Step [1576/13804], Loss: 2.7980, Perplexity: 16.4118, time_taken_in_seconds: 64
Epoch [1/1], Step [1577/13804], Loss: 3.0418, Perplexity: 20.9435, time_taken_in_seconds: 65
Epoch [1/1], Step [1578/13804], Loss: 3.0235, Perplexity: 20.5636, time_taken_in_seconds: 66
Epoch [1/1], Step [1579/13804], Loss: 2.8758, Perplexity: 17.7392, time_taken_in_seconds: 67
Epoch [1/1], Step [1580/13804], Loss: 2.8081, Perplexity: 16.5782, time_taken_in_seconds: 68
Epoch [1/1], Step [1581/13804], Loss: 2.9900, Perplexity: 19.8861, time_taken_in_seconds: 68
Epoch [1/1], Step [1582/13804], Loss: 2.5587, Perplexity: 12.9196, time_taken_in_seconds: 69
Epoch [1/1], Step [1583/13804], Loss: 2.8667, Perplexity: 17.5797, time_taken_in_seconds: 70
Epoch [1/1], Step [1584/13804], Loss: 2.7164, Perplexity: 15.1258, time_taken_in_seconds: 71
Epoch [1/1], Step [1585/13804], Loss: 2.6964, Perplexity: 14.8270, time_taken_in_seconds: 72
Epoch [1/1], Step [1586/13804], Loss: 3.3221, Perplexity: 27.7174, time_taken_in_seconds: 73
Epoch [1/1], Step [1587/13804], Loss: 3.3877, Perplexity: 29.5983, time_taken_in_seconds: 74
Epoch [1/1], Step [1588/13804], Loss: 3.1592, Perplexity: 23.5515, time_taken_in_seconds: 74
Epoch [1/1], Step [1589/13804], Loss: 3.3196, Perplexity: 27.6506, time_taken_in_seconds: 75
Epoch [1/1], Step [1590/13804], Loss: 2.9772, Perplexity: 19.6327, time_taken_in_seconds: 76
Epoch [1/1], Step [1591/13804], Loss: 3.0717, Perplexity: 21.5790, time_taken_in_seconds: 77
Epoch [1/1], Step [1592/13804], Loss: 2.9862, Perplexity: 19.8102, time_taken_in_seconds: 78
Epoch [1/1], Step [1593/13804], Loss: 2.9401, Perplexity: 18.9171, time_taken_in_seconds: 79
Epoch [1/1], Step [1594/13804], Loss: 3.0131, Perplexity: 20.3509, time_taken_in_seconds: 79
Epoch [1/1], Step [1595/13804], Loss: 2.9015, Perplexity: 18.2014, time_taken_in_seconds: 80
Epoch [1/1], Step [1596/13804], Loss: 2.7819, Perplexity: 16.1494, time_taken_in_seconds: 81
Epoch [1/1], Step [1597/13804], Loss: 3.2624, Perplexity: 26.1118, time_taken_in_seconds: 82
Epoch [1/1], Step [1598/13804], Loss: 3.0826, Perplexity: 21.8144, time_taken_in_seconds: 83
Epoch [1/1], Step [1599/13804], Loss: 2.8921, Perplexity: 18.0311, time_taken_in_seconds: 84
Epoch [1/1], Step [1600/13804], Loss: 4.6387, Perplexity: 103.4071, time_taken_in_seconds: 85
Epoch [1/1], Step [1601/13804], Loss: 2.8574, Perplexity: 17.4160, time_taken_in_seconds: 0
Epoch [1/1], Step [1602/13804], Loss: 2.7154, Perplexity: 15.1108, time_taken_in_seconds: 1
Epoch [1/1], Step [1603/13804], Loss: 2.8221, Perplexity: 16.8126, time_taken_in_seconds: 2
Epoch [1/1], Step [1604/13804], Loss: 3.1339, Perplexity: 22.9630, time_taken_in_seconds: 3
Epoch [1/1], Step [1605/13804], Loss: 2.9187, Perplexity: 18.5181, time_taken_in_seconds: 4
Epoch [1/1], Step [1606/13804], Loss: 3.0293, Perplexity: 20.6831, time_taken_in_seconds: 5
Epoch [1/1], Step [1607/13804], Loss: 3.0266, Perplexity: 20.6278, time_taken_in_seconds: 5
Epoch [1/1], Step [1608/13804], Loss: 3.4587, Perplexity: 31.7758, time_taken_in_seconds: 6
Epoch [1/1], Step [1609/13804], Loss: 2.9792, Perplexity: 19.6728, time_taken_in_seconds: 7
Epoch [1/1], Step [1610/13804], Loss: 2.7272, Perplexity: 15.2898, time_taken_in_seconds: 8
Epoch [1/1], Step [1611/13804], Loss: 2.8355, Perplexity: 17.0383, time_taken_in_seconds: 9
Epoch [1/1], Step [1612/13804], Loss: 2.9023, Perplexity: 18.2163, time_taken_in_seconds: 10
Epoch [1/1], Step [1613/13804], Loss: 2.8991, Perplexity: 18.1573, time_taken_in_seconds: 11
Epoch [1/1], Step [1614/13804], Loss: 3.0306, Perplexity: 20.7101, time_taken_in_seconds: 12
Epoch [1/1], Step [1615/13804], Loss: 2.9106, Perplexity: 18.3679, time_taken_in_seconds: 12
Epoch [1/1], Step [1616/13804], Loss: 2.9216, Perplexity: 18.5708, time_taken_in_seconds: 13
Epoch [1/1], Step [1617/13804], Loss: 2.9072, Perplexity: 18.3057, time_taken_in_seconds: 14
Epoch [1/1], Step [1618/13804], Loss: 2.9648, Perplexity: 19.3915, time_taken_in_seconds: 15
Epoch [1/1], Step [1619/13804], Loss: 3.0497, Perplexity: 21.1080, time_taken_in_seconds: 16
Epoch [1/1], Step [1620/13804], Loss: 2.8638, Perplexity: 17.5283, time_taken_in_seconds: 17
Epoch [1/1], Step [1621/13804], Loss: 2.8004, Perplexity: 16.4505, time_taken_in_seconds: 18
Epoch [1/1], Step [1622/13804], Loss: 2.6615, Perplexity: 14.3176, time_taken_in_seconds: 18
Epoch [1/1], Step [1623/13804], Loss: 2.9457, Perplexity: 19.0240, time_taken_in_seconds: 19
Epoch [1/1], Step [1624/13804], Loss: 3.4309, Perplexity: 30.9052, time_taken_in_seconds: 20
Epoch [1/1], Step [1625/13804], Loss: 3.4044, Perplexity: 30.0951, time_taken_in_seconds: 21
Epoch [1/1], Step [1626/13804], Loss: 3.4753, Perplexity: 32.3064, time_taken_in_seconds: 22
Epoch [1/1], Step [1627/13804], Loss: 2.6661, Perplexity: 14.3835, time_taken_in_seconds: 23
Epoch [1/1], Step [1628/13804], Loss: 3.6156, Perplexity: 37.1750, time_taken_in_seconds: 24
Epoch [1/1], Step [1629/13804], Loss: 2.9071, Perplexity: 18.3035, time_taken_in_seconds: 24
Epoch [1/1], Step [1630/13804], Loss: 3.0164, Perplexity: 20.4170, time_taken_in_seconds: 25
Epoch [1/1], Step [1631/13804], Loss: 2.8821, Perplexity: 17.8516, time_taken_in_seconds: 26
Epoch [1/1], Step [1632/13804], Loss: 2.8151, Perplexity: 16.6942, time_taken_in_seconds: 27
Epoch [1/1], Step [1633/13804], Loss: 2.6504, Perplexity: 14.1603, time_taken_in_seconds: 28
Epoch [1/1], Step [1634/13804], Loss: 2.9017, Perplexity: 18.2060, time_taken_in_seconds: 29
Epoch [1/1], Step [1635/13804], Loss: 2.9886, Perplexity: 19.8586, time_taken_in_seconds: 29
Epoch [1/1], Step [1636/13804], Loss: 3.0221, Perplexity: 20.5336, time_taken_in_seconds: 30
Epoch [1/1], Step [1637/13804], Loss: 3.3604, Perplexity: 28.7997, time_taken_in_seconds: 31
Epoch [1/1], Step [1638/13804], Loss: 2.8837, Perplexity: 17.8803, time_taken_in_seconds: 32
Epoch [1/1], Step [1639/13804], Loss: 3.4337, Perplexity: 30.9918, time_taken_in_seconds: 33
Epoch [1/1], Step [1640/13804], Loss: 3.2431, Perplexity: 25.6121, time_taken_in_seconds: 34
Epoch [1/1], Step [1641/13804], Loss: 3.5721, Perplexity: 35.5898, time_taken_in_seconds: 35
Epoch [1/1], Step [1642/13804], Loss: 3.0575, Perplexity: 21.2752, time_taken_in_seconds: 35
Epoch [1/1], Step [1643/13804], Loss: 2.8341, Perplexity: 17.0152, time_taken_in_seconds: 36
Epoch [1/1], Step [1644/13804], Loss: 3.2549, Perplexity: 25.9168, time_taken_in_seconds: 37
Epoch [1/1], Step [1645/13804], Loss: 3.1978, Perplexity: 24.4779, time_taken_in_seconds: 38
Epoch [1/1], Step [1646/13804], Loss: 3.0842, Perplexity: 21.8490, time_taken_in_seconds: 39
Epoch [1/1], Step [1647/13804], Loss: 2.8244, Perplexity: 16.8509, time_taken_in_seconds: 40
Epoch [1/1], Step [1648/13804], Loss: 2.8976, Perplexity: 18.1307, time_taken_in_seconds: 40
Epoch [1/1], Step [1649/13804], Loss: 3.1017, Perplexity: 22.2353, time_taken_in_seconds: 41
Epoch [1/1], Step [1650/13804], Loss: 2.8927, Perplexity: 18.0427, time_taken_in_seconds: 42
Epoch [1/1], Step [1651/13804], Loss: 3.2947, Perplexity: 26.9701, time_taken_in_seconds: 43
Epoch [1/1], Step [1652/13804], Loss: 2.8539, Perplexity: 17.3547, time_taken_in_seconds: 44
Epoch [1/1], Step [1653/13804], Loss: 2.7398, Perplexity: 15.4835, time_taken_in_seconds: 45
Epoch [1/1], Step [1654/13804], Loss: 3.2832, Perplexity: 26.6611, time_taken_in_seconds: 46
Epoch [1/1], Step [1655/13804], Loss: 2.5979, Perplexity: 13.4359, time_taken_in_seconds: 46
Epoch [1/1], Step [1656/13804], Loss: 3.3131, Perplexity: 27.4710, time_taken_in_seconds: 47
Epoch [1/1], Step [1657/13804], Loss: 3.1583, Perplexity: 23.5308, time_taken_in_seconds: 48
Epoch [1/1], Step [1658/13804], Loss: 3.1475, Perplexity: 23.2777, time_taken_in_seconds: 49
Epoch [1/1], Step [1659/13804], Loss: 2.9549, Perplexity: 19.1990, time_taken_in_seconds: 50
Epoch [1/1], Step [1660/13804], Loss: 2.7094, Perplexity: 15.0207, time_taken_in_seconds: 51
Epoch [1/1], Step [1661/13804], Loss: 2.9042, Perplexity: 18.2504, time_taken_in_seconds: 52
Epoch [1/1], Step [1662/13804], Loss: 3.8117, Perplexity: 45.2284, time_taken_in_seconds: 52
Epoch [1/1], Step [1663/13804], Loss: 2.9353, Perplexity: 18.8276, time_taken_in_seconds: 53
Epoch [1/1], Step [1664/13804], Loss: 2.8578, Perplexity: 17.4224, time_taken_in_seconds: 54
Epoch [1/1], Step [1665/13804], Loss: 3.1644, Perplexity: 23.6740, time_taken_in_seconds: 55
Epoch [1/1], Step [1666/13804], Loss: 3.0093, Perplexity: 20.2731, time_taken_in_seconds: 56
Epoch [1/1], Step [1667/13804], Loss: 3.2576, Perplexity: 25.9863, time_taken_in_seconds: 57
Epoch [1/1], Step [1668/13804], Loss: 3.3565, Perplexity: 28.6898, time_taken_in_seconds: 58
Epoch [1/1], Step [1669/13804], Loss: 3.2938, Perplexity: 26.9457, time_taken_in_seconds: 58
Epoch [1/1], Step [1670/13804], Loss: 2.7664, Perplexity: 15.9011, time_taken_in_seconds: 59
Epoch [1/1], Step [1671/13804], Loss: 3.2568, Perplexity: 25.9654, time_taken_in_seconds: 60
Epoch [1/1], Step [1672/13804], Loss: 2.9253, Perplexity: 18.6396, time_taken_in_seconds: 61
Epoch [1/1], Step [1673/13804], Loss: 3.0350, Perplexity: 20.8017, time_taken_in_seconds: 62
Epoch [1/1], Step [1674/13804], Loss: 3.4085, Perplexity: 30.2199, time_taken_in_seconds: 63
Epoch [1/1], Step [1675/13804], Loss: 2.7164, Perplexity: 15.1264, time_taken_in_seconds: 63
Epoch [1/1], Step [1676/13804], Loss: 3.5667, Perplexity: 35.3980, time_taken_in_seconds: 64
Epoch [1/1], Step [1677/13804], Loss: 2.9259, Perplexity: 18.6508, time_taken_in_seconds: 65
Epoch [1/1], Step [1678/13804], Loss: 3.0298, Perplexity: 20.6933, time_taken_in_seconds: 66
Epoch [1/1], Step [1679/13804], Loss: 2.8966, Perplexity: 18.1129, time_taken_in_seconds: 67
Epoch [1/1], Step [1680/13804], Loss: 2.8361, Perplexity: 17.0490, time_taken_in_seconds: 68
Epoch [1/1], Step [1681/13804], Loss: 2.9023, Perplexity: 18.2153, time_taken_in_seconds: 69
Epoch [1/1], Step [1682/13804], Loss: 2.9272, Perplexity: 18.6758, time_taken_in_seconds: 70
Epoch [1/1], Step [1683/13804], Loss: 3.0193, Perplexity: 20.4772, time_taken_in_seconds: 71
Epoch [1/1], Step [1684/13804], Loss: 2.7094, Perplexity: 15.0198, time_taken_in_seconds: 71
Epoch [1/1], Step [1685/13804], Loss: 3.1079, Perplexity: 22.3750, time_taken_in_seconds: 72
Epoch [1/1], Step [1686/13804], Loss: 3.0448, Perplexity: 21.0068, time_taken_in_seconds: 73
Epoch [1/1], Step [1687/13804], Loss: 3.1389, Perplexity: 23.0773, time_taken_in_seconds: 74
Epoch [1/1], Step [1688/13804], Loss: 2.8210, Perplexity: 16.7930, time_taken_in_seconds: 75
Epoch [1/1], Step [1689/13804], Loss: 3.0867, Perplexity: 21.9042, time_taken_in_seconds: 76
Epoch [1/1], Step [1690/13804], Loss: 3.1457, Perplexity: 23.2349, time_taken_in_seconds: 77
Epoch [1/1], Step [1691/13804], Loss: 2.9721, Perplexity: 19.5336, time_taken_in_seconds: 77
Epoch [1/1], Step [1692/13804], Loss: 2.9311, Perplexity: 18.7485, time_taken_in_seconds: 78
Epoch [1/1], Step [1693/13804], Loss: 2.8872, Perplexity: 17.9431, time_taken_in_seconds: 79
Epoch [1/1], Step [1694/13804], Loss: 2.7121, Perplexity: 15.0607, time_taken_in_seconds: 80
Epoch [1/1], Step [1695/13804], Loss: 3.3750, Perplexity: 29.2228, time_taken_in_seconds: 81
Epoch [1/1], Step [1696/13804], Loss: 4.2097, Perplexity: 67.3361, time_taken_in_seconds: 82
Epoch [1/1], Step [1697/13804], Loss: 3.0956, Perplexity: 22.1011, time_taken_in_seconds: 82
Epoch [1/1], Step [1698/13804], Loss: 2.9214, Perplexity: 18.5665, time_taken_in_seconds: 83
Epoch [1/1], Step [1699/13804], Loss: 2.8767, Perplexity: 17.7554, time_taken_in_seconds: 84
Epoch [1/1], Step [1700/13804], Loss: 3.0824, Perplexity: 21.8109, time_taken_in_seconds: 85
Epoch [1/1], Step [1701/13804], Loss: 3.2853, Perplexity: 26.7172, time_taken_in_seconds: 0
Epoch [1/1], Step [1702/13804], Loss: 2.9613, Perplexity: 19.3237, time_taken_in_seconds: 1
Epoch [1/1], Step [1703/13804], Loss: 2.8162, Perplexity: 16.7138, time_taken_in_seconds: 2
Epoch [1/1], Step [1704/13804], Loss: 2.9053, Perplexity: 18.2703, time_taken_in_seconds: 3
Epoch [1/1], Step [1705/13804], Loss: 2.9584, Perplexity: 19.2665, time_taken_in_seconds: 4
Epoch [1/1], Step [1706/13804], Loss: 2.9078, Perplexity: 18.3169, time_taken_in_seconds: 5
Epoch [1/1], Step [1707/13804], Loss: 2.9241, Perplexity: 18.6174, time_taken_in_seconds: 5
Epoch [1/1], Step [1708/13804], Loss: 3.0832, Perplexity: 21.8291, time_taken_in_seconds: 6
Epoch [1/1], Step [1709/13804], Loss: 2.8383, Perplexity: 17.0870, time_taken_in_seconds: 7
Epoch [1/1], Step [1710/13804], Loss: 3.0756, Perplexity: 21.6635, time_taken_in_seconds: 8
Epoch [1/1], Step [1711/13804], Loss: 3.3330, Perplexity: 28.0224, time_taken_in_seconds: 9
Epoch [1/1], Step [1712/13804], Loss: 2.6294, Perplexity: 13.8653, time_taken_in_seconds: 10
Epoch [1/1], Step [1713/13804], Loss: 2.9266, Perplexity: 18.6642, time_taken_in_seconds: 11
Epoch [1/1], Step [1714/13804], Loss: 3.4852, Perplexity: 32.6300, time_taken_in_seconds: 11
Epoch [1/1], Step [1715/13804], Loss: 3.0875, Perplexity: 21.9224, time_taken_in_seconds: 12
Epoch [1/1], Step [1716/13804], Loss: 3.2035, Perplexity: 24.6181, time_taken_in_seconds: 13
Epoch [1/1], Step [1717/13804], Loss: 3.2372, Perplexity: 25.4618, time_taken_in_seconds: 14
Epoch [1/1], Step [1718/13804], Loss: 3.0191, Perplexity: 20.4730, time_taken_in_seconds: 15
Epoch [1/1], Step [1719/13804], Loss: 3.0934, Perplexity: 22.0527, time_taken_in_seconds: 16
Epoch [1/1], Step [1720/13804], Loss: 3.1323, Perplexity: 22.9269, time_taken_in_seconds: 16
Epoch [1/1], Step [1721/13804], Loss: 2.9744, Perplexity: 19.5782, time_taken_in_seconds: 17
Epoch [1/1], Step [1722/13804], Loss: 2.7985, Perplexity: 16.4198, time_taken_in_seconds: 18
Epoch [1/1], Step [1723/13804], Loss: 3.7301, Perplexity: 41.6812, time_taken_in_seconds: 19
Epoch [1/1], Step [1724/13804], Loss: 2.9970, Perplexity: 20.0257, time_taken_in_seconds: 20
Epoch [1/1], Step [1725/13804], Loss: 3.4515, Perplexity: 31.5466, time_taken_in_seconds: 21
Epoch [1/1], Step [1726/13804], Loss: 2.9602, Perplexity: 19.3009, time_taken_in_seconds: 22
Epoch [1/1], Step [1727/13804], Loss: 2.9271, Perplexity: 18.6732, time_taken_in_seconds: 22
Epoch [1/1], Step [1728/13804], Loss: 2.9320, Perplexity: 18.7651, time_taken_in_seconds: 23
Epoch [1/1], Step [1729/13804], Loss: 3.1404, Perplexity: 23.1126, time_taken_in_seconds: 24
Epoch [1/1], Step [1730/13804], Loss: 2.9711, Perplexity: 19.5131, time_taken_in_seconds: 25
Epoch [1/1], Step [1731/13804], Loss: 3.2112, Perplexity: 24.8100, time_taken_in_seconds: 26
Epoch [1/1], Step [1732/13804], Loss: 3.3551, Perplexity: 28.6498, time_taken_in_seconds: 27
Epoch [1/1], Step [1733/13804], Loss: 3.2063, Perplexity: 24.6867, time_taken_in_seconds: 28
Epoch [1/1], Step [1734/13804], Loss: 2.8356, Perplexity: 17.0414, time_taken_in_seconds: 28
Epoch [1/1], Step [1735/13804], Loss: 2.8199, Perplexity: 16.7755, time_taken_in_seconds: 29
Epoch [1/1], Step [1736/13804], Loss: 5.0762, Perplexity: 160.1686, time_taken_in_seconds: 30
Epoch [1/1], Step [1737/13804], Loss: 3.0412, Perplexity: 20.9310, time_taken_in_seconds: 31
Epoch [1/1], Step [1738/13804], Loss: 3.2238, Perplexity: 25.1234, time_taken_in_seconds: 32
Epoch [1/1], Step [1739/13804], Loss: 3.8041, Perplexity: 44.8844, time_taken_in_seconds: 33
Epoch [1/1], Step [1740/13804], Loss: 2.9509, Perplexity: 19.1224, time_taken_in_seconds: 34
Epoch [1/1], Step [1741/13804], Loss: 3.1246, Perplexity: 22.7514, time_taken_in_seconds: 34
Epoch [1/1], Step [1742/13804], Loss: 3.0678, Perplexity: 21.4955, time_taken_in_seconds: 35
Epoch [1/1], Step [1743/13804], Loss: 3.0640, Perplexity: 21.4120, time_taken_in_seconds: 36
Epoch [1/1], Step [1744/13804], Loss: 3.1665, Perplexity: 23.7245, time_taken_in_seconds: 37
Epoch [1/1], Step [1745/13804], Loss: 2.9345, Perplexity: 18.8118, time_taken_in_seconds: 38
Epoch [1/1], Step [1746/13804], Loss: 3.3754, Perplexity: 29.2347, time_taken_in_seconds: 39
Epoch [1/1], Step [1747/13804], Loss: 2.8384, Perplexity: 17.0884, time_taken_in_seconds: 39
Epoch [1/1], Step [1748/13804], Loss: 3.4459, Perplexity: 31.3727, time_taken_in_seconds: 40
Epoch [1/1], Step [1749/13804], Loss: 3.0746, Perplexity: 21.6410, time_taken_in_seconds: 41
Epoch [1/1], Step [1750/13804], Loss: 3.1113, Perplexity: 22.4504, time_taken_in_seconds: 42
Epoch [1/1], Step [1751/13804], Loss: 3.3840, Perplexity: 29.4874, time_taken_in_seconds: 43
Epoch [1/1], Step [1752/13804], Loss: 3.1691, Perplexity: 23.7852, time_taken_in_seconds: 44
Epoch [1/1], Step [1753/13804], Loss: 3.0742, Perplexity: 21.6329, time_taken_in_seconds: 45
Epoch [1/1], Step [1754/13804], Loss: 3.5809, Perplexity: 35.9057, time_taken_in_seconds: 46
Epoch [1/1], Step [1755/13804], Loss: 2.7674, Perplexity: 15.9174, time_taken_in_seconds: 47
Epoch [1/1], Step [1756/13804], Loss: 2.9832, Perplexity: 19.7502, time_taken_in_seconds: 47
Epoch [1/1], Step [1757/13804], Loss: 2.8734, Perplexity: 17.6963, time_taken_in_seconds: 48
Epoch [1/1], Step [1758/13804], Loss: 3.3795, Perplexity: 29.3546, time_taken_in_seconds: 49
Epoch [1/1], Step [1759/13804], Loss: 2.8035, Perplexity: 16.5025, time_taken_in_seconds: 50
Epoch [1/1], Step [1760/13804], Loss: 3.1159, Perplexity: 22.5546, time_taken_in_seconds: 51
Epoch [1/1], Step [1761/13804], Loss: 2.8029, Perplexity: 16.4922, time_taken_in_seconds: 52
Epoch [1/1], Step [1762/13804], Loss: 2.9571, Perplexity: 19.2424, time_taken_in_seconds: 52
Epoch [1/1], Step [1763/13804], Loss: 3.0265, Perplexity: 20.6252, time_taken_in_seconds: 53
Epoch [1/1], Step [1764/13804], Loss: 3.2909, Perplexity: 26.8664, time_taken_in_seconds: 54
Epoch [1/1], Step [1765/13804], Loss: 2.8933, Perplexity: 18.0526, time_taken_in_seconds: 55
Epoch [1/1], Step [1766/13804], Loss: 3.1150, Perplexity: 22.5328, time_taken_in_seconds: 56
Epoch [1/1], Step [1767/13804], Loss: 3.1400, Perplexity: 23.1050, time_taken_in_seconds: 57
Epoch [1/1], Step [1768/13804], Loss: 3.2044, Perplexity: 24.6415, time_taken_in_seconds: 58
Epoch [1/1], Step [1769/13804], Loss: 2.8257, Perplexity: 16.8726, time_taken_in_seconds: 58
Epoch [1/1], Step [1770/13804], Loss: 2.8009, Perplexity: 16.4597, time_taken_in_seconds: 59
Epoch [1/1], Step [1771/13804], Loss: 2.7762, Perplexity: 16.0585, time_taken_in_seconds: 60
Epoch [1/1], Step [1772/13804], Loss: 3.0446, Perplexity: 21.0011, time_taken_in_seconds: 61
Epoch [1/1], Step [1773/13804], Loss: 3.0279, Perplexity: 20.6540, time_taken_in_seconds: 62
Epoch [1/1], Step [1774/13804], Loss: 3.0243, Perplexity: 20.5799, time_taken_in_seconds: 63
Epoch [1/1], Step [1775/13804], Loss: 3.3045, Perplexity: 27.2359, time_taken_in_seconds: 63
Epoch [1/1], Step [1776/13804], Loss: 2.9069, Perplexity: 18.3002, time_taken_in_seconds: 64
Epoch [1/1], Step [1777/13804], Loss: 2.7097, Perplexity: 15.0250, time_taken_in_seconds: 65
Epoch [1/1], Step [1778/13804], Loss: 2.7645, Perplexity: 15.8707, time_taken_in_seconds: 66
Epoch [1/1], Step [1779/13804], Loss: 2.8762, Perplexity: 17.7474, time_taken_in_seconds: 67
Epoch [1/1], Step [1780/13804], Loss: 3.3481, Perplexity: 28.4486, time_taken_in_seconds: 68
Epoch [1/1], Step [1781/13804], Loss: 3.0554, Perplexity: 21.2307, time_taken_in_seconds: 69
Epoch [1/1], Step [1782/13804], Loss: 3.0592, Perplexity: 21.3100, time_taken_in_seconds: 69
Epoch [1/1], Step [1783/13804], Loss: 3.1854, Perplexity: 24.1781, time_taken_in_seconds: 70
Epoch [1/1], Step [1784/13804], Loss: 3.1247, Perplexity: 22.7521, time_taken_in_seconds: 71
Epoch [1/1], Step [1785/13804], Loss: 2.7245, Perplexity: 15.2485, time_taken_in_seconds: 72
Epoch [1/1], Step [1786/13804], Loss: 3.1292, Perplexity: 22.8563, time_taken_in_seconds: 73
Epoch [1/1], Step [1787/13804], Loss: 2.5977, Perplexity: 13.4327, time_taken_in_seconds: 74
Epoch [1/1], Step [1788/13804], Loss: 3.3049, Perplexity: 27.2463, time_taken_in_seconds: 75
Epoch [1/1], Step [1789/13804], Loss: 3.3311, Perplexity: 27.9685, time_taken_in_seconds: 75
Epoch [1/1], Step [1790/13804], Loss: 2.9706, Perplexity: 19.5042, time_taken_in_seconds: 76
Epoch [1/1], Step [1791/13804], Loss: 2.8753, Perplexity: 17.7312, time_taken_in_seconds: 77
Epoch [1/1], Step [1792/13804], Loss: 2.9636, Perplexity: 19.3684, time_taken_in_seconds: 78
Epoch [1/1], Step [1793/13804], Loss: 3.5064, Perplexity: 33.3292, time_taken_in_seconds: 79
Epoch [1/1], Step [1794/13804], Loss: 3.2036, Perplexity: 24.6202, time_taken_in_seconds: 80
Epoch [1/1], Step [1795/13804], Loss: 2.9608, Perplexity: 19.3137, time_taken_in_seconds: 80
Epoch [1/1], Step [1796/13804], Loss: 2.9927, Perplexity: 19.9397, time_taken_in_seconds: 81
Epoch [1/1], Step [1797/13804], Loss: 2.9564, Perplexity: 19.2278, time_taken_in_seconds: 82
Epoch [1/1], Step [1798/13804], Loss: 3.1106, Perplexity: 22.4353, time_taken_in_seconds: 83
Epoch [1/1], Step [1799/13804], Loss: 2.9794, Perplexity: 19.6752, time_taken_in_seconds: 84
Epoch [1/1], Step [1800/13804], Loss: 3.0121, Perplexity: 20.3299, time_taken_in_seconds: 85
Epoch [1/1], Step [1801/13804], Loss: 3.1534, Perplexity: 23.4148, time_taken_in_seconds: 0
Epoch [1/1], Step [1802/13804], Loss: 3.2204, Perplexity: 25.0384, time_taken_in_seconds: 1
Epoch [1/1], Step [1803/13804], Loss: 2.8987, Perplexity: 18.1512, time_taken_in_seconds: 2
Epoch [1/1], Step [1804/13804], Loss: 3.0488, Perplexity: 21.0893, time_taken_in_seconds: 3
Epoch [1/1], Step [1805/13804], Loss: 3.0418, Perplexity: 20.9423, time_taken_in_seconds: 4
Epoch [1/1], Step [1806/13804], Loss: 3.2357, Perplexity: 25.4230, time_taken_in_seconds: 5
Epoch [1/1], Step [1807/13804], Loss: 2.6794, Perplexity: 14.5759, time_taken_in_seconds: 5
Epoch [1/1], Step [1808/13804], Loss: 3.0339, Perplexity: 20.7789, time_taken_in_seconds: 6
Epoch [1/1], Step [1809/13804], Loss: 2.8407, Perplexity: 17.1270, time_taken_in_seconds: 7
Epoch [1/1], Step [1810/13804], Loss: 3.0615, Perplexity: 21.3606, time_taken_in_seconds: 8
Epoch [1/1], Step [1811/13804], Loss: 2.8110, Perplexity: 16.6264, time_taken_in_seconds: 9
Epoch [1/1], Step [1812/13804], Loss: 2.5405, Perplexity: 12.6860, time_taken_in_seconds: 10
Epoch [1/1], Step [1813/13804], Loss: 3.0504, Perplexity: 21.1243, time_taken_in_seconds: 11
Epoch [1/1], Step [1814/13804], Loss: 3.0365, Perplexity: 20.8325, time_taken_in_seconds: 11
Epoch [1/1], Step [1815/13804], Loss: 2.9743, Perplexity: 19.5750, time_taken_in_seconds: 12
Epoch [1/1], Step [1816/13804], Loss: 2.8764, Perplexity: 17.7508, time_taken_in_seconds: 13
Epoch [1/1], Step [1817/13804], Loss: 2.9854, Perplexity: 19.7945, time_taken_in_seconds: 14
Epoch [1/1], Step [1818/13804], Loss: 2.9151, Perplexity: 18.4506, time_taken_in_seconds: 15
Epoch [1/1], Step [1819/13804], Loss: 3.1781, Perplexity: 24.0022, time_taken_in_seconds: 16
Epoch [1/1], Step [1820/13804], Loss: 2.9892, Perplexity: 19.8697, time_taken_in_seconds: 16
Epoch [1/1], Step [1821/13804], Loss: 2.8222, Perplexity: 16.8142, time_taken_in_seconds: 17
Epoch [1/1], Step [1822/13804], Loss: 2.9710, Perplexity: 19.5115, time_taken_in_seconds: 18
Epoch [1/1], Step [1823/13804], Loss: 3.0043, Perplexity: 20.1722, time_taken_in_seconds: 19
Epoch [1/1], Step [1824/13804], Loss: 2.9440, Perplexity: 18.9914, time_taken_in_seconds: 20
Epoch [1/1], Step [1825/13804], Loss: 3.0100, Perplexity: 20.2872, time_taken_in_seconds: 21
Epoch [1/1], Step [1826/13804], Loss: 3.2306, Perplexity: 25.2956, time_taken_in_seconds: 22
Epoch [1/1], Step [1827/13804], Loss: 2.7840, Perplexity: 16.1835, time_taken_in_seconds: 23
Epoch [1/1], Step [1828/13804], Loss: 2.7552, Perplexity: 15.7235, time_taken_in_seconds: 23
Epoch [1/1], Step [1829/13804], Loss: 2.7626, Perplexity: 15.8415, time_taken_in_seconds: 24
Epoch [1/1], Step [1830/13804], Loss: 3.1401, Perplexity: 23.1056, time_taken_in_seconds: 25
Epoch [1/1], Step [1831/13804], Loss: 2.8449, Perplexity: 17.2003, time_taken_in_seconds: 26
Epoch [1/1], Step [1832/13804], Loss: 3.1207, Perplexity: 22.6632, time_taken_in_seconds: 27
Epoch [1/1], Step [1833/13804], Loss: 3.0237, Perplexity: 20.5667, time_taken_in_seconds: 28
Epoch [1/1], Step [1834/13804], Loss: 3.0941, Perplexity: 22.0667, time_taken_in_seconds: 28
Epoch [1/1], Step [1835/13804], Loss: 3.0591, Perplexity: 21.3076, time_taken_in_seconds: 29
Epoch [1/1], Step [1836/13804], Loss: 3.4825, Perplexity: 32.5421, time_taken_in_seconds: 30
Epoch [1/1], Step [1837/13804], Loss: 3.1327, Perplexity: 22.9350, time_taken_in_seconds: 31
Epoch [1/1], Step [1838/13804], Loss: 3.6767, Perplexity: 39.5151, time_taken_in_seconds: 32
Epoch [1/1], Step [1839/13804], Loss: 3.4888, Perplexity: 32.7458, time_taken_in_seconds: 33
Epoch [1/1], Step [1840/13804], Loss: 2.7842, Perplexity: 16.1868, time_taken_in_seconds: 33
Epoch [1/1], Step [1841/13804], Loss: 2.9091, Perplexity: 18.3411, time_taken_in_seconds: 34
Epoch [1/1], Step [1842/13804], Loss: 3.3766, Perplexity: 29.2714, time_taken_in_seconds: 35
Epoch [1/1], Step [1843/13804], Loss: 3.4857, Perplexity: 32.6438, time_taken_in_seconds: 36
Epoch [1/1], Step [1844/13804], Loss: 2.7486, Perplexity: 15.6204, time_taken_in_seconds: 37
Epoch [1/1], Step [1845/13804], Loss: 2.7334, Perplexity: 15.3847, time_taken_in_seconds: 38
Epoch [1/1], Step [1846/13804], Loss: 2.9514, Perplexity: 19.1331, time_taken_in_seconds: 39
Epoch [1/1], Step [1847/13804], Loss: 3.0124, Perplexity: 20.3363, time_taken_in_seconds: 39
Epoch [1/1], Step [1848/13804], Loss: 3.5956, Perplexity: 36.4375, time_taken_in_seconds: 40
Epoch [1/1], Step [1849/13804], Loss: 2.9652, Perplexity: 19.3984, time_taken_in_seconds: 41
Epoch [1/1], Step [1850/13804], Loss: 3.0585, Perplexity: 21.2964, time_taken_in_seconds: 42
Epoch [1/1], Step [1851/13804], Loss: 2.9992, Perplexity: 20.0705, time_taken_in_seconds: 43
Epoch [1/1], Step [1852/13804], Loss: 2.7171, Perplexity: 15.1356, time_taken_in_seconds: 44
Epoch [1/1], Step [1853/13804], Loss: 2.8258, Perplexity: 16.8738, time_taken_in_seconds: 44
Epoch [1/1], Step [1854/13804], Loss: 3.1327, Perplexity: 22.9351, time_taken_in_seconds: 45
Epoch [1/1], Step [1855/13804], Loss: 3.2154, Perplexity: 24.9134, time_taken_in_seconds: 46
Epoch [1/1], Step [1856/13804], Loss: 2.7517, Perplexity: 15.6696, time_taken_in_seconds: 47
Epoch [1/1], Step [1857/13804], Loss: 2.8457, Perplexity: 17.2135, time_taken_in_seconds: 48
Epoch [1/1], Step [1858/13804], Loss: 3.1218, Perplexity: 22.6879, time_taken_in_seconds: 49
Epoch [1/1], Step [1859/13804], Loss: 3.3375, Perplexity: 28.1499, time_taken_in_seconds: 49
Epoch [1/1], Step [1860/13804], Loss: 2.9222, Perplexity: 18.5818, time_taken_in_seconds: 50
Epoch [1/1], Step [1861/13804], Loss: 2.5896, Perplexity: 13.3239, time_taken_in_seconds: 51
Epoch [1/1], Step [1862/13804], Loss: 2.9693, Perplexity: 19.4791, time_taken_in_seconds: 52
Epoch [1/1], Step [1863/13804], Loss: 2.8797, Perplexity: 17.8089, time_taken_in_seconds: 53
Epoch [1/1], Step [1864/13804], Loss: 2.9059, Perplexity: 18.2818, time_taken_in_seconds: 54
Epoch [1/1], Step [1865/13804], Loss: 2.7462, Perplexity: 15.5839, time_taken_in_seconds: 55
Epoch [1/1], Step [1866/13804], Loss: 2.9290, Perplexity: 18.7095, time_taken_in_seconds: 55
Epoch [1/1], Step [1867/13804], Loss: 3.2376, Perplexity: 25.4721, time_taken_in_seconds: 56
Epoch [1/1], Step [1868/13804], Loss: 3.0967, Perplexity: 22.1244, time_taken_in_seconds: 57
Epoch [1/1], Step [1869/13804], Loss: 3.0438, Perplexity: 20.9840, time_taken_in_seconds: 58
Epoch [1/1], Step [1870/13804], Loss: 3.2282, Perplexity: 25.2351, time_taken_in_seconds: 59
Epoch [1/1], Step [1871/13804], Loss: 3.1379, Perplexity: 23.0550, time_taken_in_seconds: 60
Epoch [1/1], Step [1872/13804], Loss: 2.6780, Perplexity: 14.5559, time_taken_in_seconds: 60
Epoch [1/1], Step [1873/13804], Loss: 3.5965, Perplexity: 36.4688, time_taken_in_seconds: 61
Epoch [1/1], Step [1874/13804], Loss: 3.2439, Perplexity: 25.6344, time_taken_in_seconds: 62
Epoch [1/1], Step [1875/13804], Loss: 3.0057, Perplexity: 20.2006, time_taken_in_seconds: 63
Epoch [1/1], Step [1876/13804], Loss: 3.2669, Perplexity: 26.2298, time_taken_in_seconds: 64
Epoch [1/1], Step [1877/13804], Loss: 2.7220, Perplexity: 15.2111, time_taken_in_seconds: 65
Epoch [1/1], Step [1878/13804], Loss: 3.6969, Perplexity: 40.3212, time_taken_in_seconds: 65
Epoch [1/1], Step [1879/13804], Loss: 3.4656, Perplexity: 31.9966, time_taken_in_seconds: 66
Epoch [1/1], Step [1880/13804], Loss: 3.1166, Perplexity: 22.5697, time_taken_in_seconds: 67
Epoch [1/1], Step [1881/13804], Loss: 3.2006, Perplexity: 24.5484, time_taken_in_seconds: 68
Epoch [1/1], Step [1882/13804], Loss: 2.9225, Perplexity: 18.5870, time_taken_in_seconds: 69
Epoch [1/1], Step [1883/13804], Loss: 3.0943, Perplexity: 22.0715, time_taken_in_seconds: 70
Epoch [1/1], Step [1884/13804], Loss: 2.7771, Perplexity: 16.0726, time_taken_in_seconds: 71
Epoch [1/1], Step [1885/13804], Loss: 2.9477, Perplexity: 19.0621, time_taken_in_seconds: 71
Epoch [1/1], Step [1886/13804], Loss: 2.7992, Perplexity: 16.4315, time_taken_in_seconds: 72
Epoch [1/1], Step [1887/13804], Loss: 2.7894, Perplexity: 16.2711, time_taken_in_seconds: 73
Epoch [1/1], Step [1888/13804], Loss: 2.9377, Perplexity: 18.8717, time_taken_in_seconds: 74
Epoch [1/1], Step [1889/13804], Loss: 2.9013, Perplexity: 18.1976, time_taken_in_seconds: 75
Epoch [1/1], Step [1890/13804], Loss: 2.9622, Perplexity: 19.3399, time_taken_in_seconds: 76
Epoch [1/1], Step [1891/13804], Loss: 3.0327, Perplexity: 20.7533, time_taken_in_seconds: 76
Epoch [1/1], Step [1892/13804], Loss: 2.9027, Perplexity: 18.2227, time_taken_in_seconds: 77
Epoch [1/1], Step [1893/13804], Loss: 3.0049, Perplexity: 20.1843, time_taken_in_seconds: 78
Epoch [1/1], Step [1894/13804], Loss: 2.8749, Perplexity: 17.7234, time_taken_in_seconds: 79
Epoch [1/1], Step [1895/13804], Loss: 2.2726, Perplexity: 9.7049, time_taken_in_seconds: 80
Epoch [1/1], Step [1896/13804], Loss: 3.0717, Perplexity: 21.5780, time_taken_in_seconds: 81
Epoch [1/1], Step [1897/13804], Loss: 3.2051, Perplexity: 24.6573, time_taken_in_seconds: 82
Epoch [1/1], Step [1898/13804], Loss: 3.0390, Perplexity: 20.8840, time_taken_in_seconds: 83
Epoch [1/1], Step [1899/13804], Loss: 3.1532, Perplexity: 23.4117, time_taken_in_seconds: 84
Epoch [1/1], Step [1900/13804], Loss: 3.0748, Perplexity: 21.6450, time_taken_in_seconds: 84
Epoch [1/1], Step [1901/13804], Loss: 3.2517, Perplexity: 25.8336, time_taken_in_seconds: 0
Epoch [1/1], Step [1902/13804], Loss: 2.5897, Perplexity: 13.3257, time_taken_in_seconds: 1
Epoch [1/1], Step [1903/13804], Loss: 2.7999, Perplexity: 16.4436, time_taken_in_seconds: 2
Epoch [1/1], Step [1904/13804], Loss: 3.0732, Perplexity: 21.6119, time_taken_in_seconds: 3
Epoch [1/1], Step [1905/13804], Loss: 3.1452, Perplexity: 23.2244, time_taken_in_seconds: 4
Epoch [1/1], Step [1906/13804], Loss: 2.9798, Perplexity: 19.6843, time_taken_in_seconds: 5
Epoch [1/1], Step [1907/13804], Loss: 3.7165, Perplexity: 41.1222, time_taken_in_seconds: 5
Epoch [1/1], Step [1908/13804], Loss: 2.9247, Perplexity: 18.6279, time_taken_in_seconds: 6
Epoch [1/1], Step [1909/13804], Loss: 3.4429, Perplexity: 31.2778, time_taken_in_seconds: 7
Epoch [1/1], Step [1910/13804], Loss: 3.0050, Perplexity: 20.1856, time_taken_in_seconds: 8
Epoch [1/1], Step [1911/13804], Loss: 2.7512, Perplexity: 15.6613, time_taken_in_seconds: 9
Epoch [1/1], Step [1912/13804], Loss: 3.1419, Perplexity: 23.1468, time_taken_in_seconds: 10
Epoch [1/1], Step [1913/13804], Loss: 2.9489, Perplexity: 19.0859, time_taken_in_seconds: 11
Epoch [1/1], Step [1914/13804], Loss: 2.7661, Perplexity: 15.8970, time_taken_in_seconds: 11
Epoch [1/1], Step [1915/13804], Loss: 2.7519, Perplexity: 15.6726, time_taken_in_seconds: 12
Epoch [1/1], Step [1916/13804], Loss: 2.9387, Perplexity: 18.8913, time_taken_in_seconds: 13
Epoch [1/1], Step [1917/13804], Loss: 2.9615, Perplexity: 19.3263, time_taken_in_seconds: 14
Epoch [1/1], Step [1918/13804], Loss: 2.7736, Perplexity: 16.0156, time_taken_in_seconds: 15
Epoch [1/1], Step [1919/13804], Loss: 2.9060, Perplexity: 18.2837, time_taken_in_seconds: 16
Epoch [1/1], Step [1920/13804], Loss: 3.0227, Perplexity: 20.5462, time_taken_in_seconds: 17
Epoch [1/1], Step [1921/13804], Loss: 2.9041, Perplexity: 18.2486, time_taken_in_seconds: 17
Epoch [1/1], Step [1922/13804], Loss: 2.8298, Perplexity: 16.9419, time_taken_in_seconds: 18
Epoch [1/1], Step [1923/13804], Loss: 2.8939, Perplexity: 18.0644, time_taken_in_seconds: 19
Epoch [1/1], Step [1924/13804], Loss: 2.6946, Perplexity: 14.7998, time_taken_in_seconds: 20
Epoch [1/1], Step [1925/13804], Loss: 2.7182, Perplexity: 15.1524, time_taken_in_seconds: 21
Epoch [1/1], Step [1926/13804], Loss: 2.9773, Perplexity: 19.6354, time_taken_in_seconds: 22
Epoch [1/1], Step [1927/13804], Loss: 2.9796, Perplexity: 19.6801, time_taken_in_seconds: 22
Epoch [1/1], Step [1928/13804], Loss: 2.9433, Perplexity: 18.9790, time_taken_in_seconds: 23
Epoch [1/1], Step [1929/13804], Loss: 2.7378, Perplexity: 15.4535, time_taken_in_seconds: 24
Epoch [1/1], Step [1930/13804], Loss: 2.8231, Perplexity: 16.8287, time_taken_in_seconds: 25
Epoch [1/1], Step [1931/13804], Loss: 2.6703, Perplexity: 14.4438, time_taken_in_seconds: 26
Epoch [1/1], Step [1932/13804], Loss: 2.7029, Perplexity: 14.9226, time_taken_in_seconds: 27
Epoch [1/1], Step [1933/13804], Loss: 2.8465, Perplexity: 17.2277, time_taken_in_seconds: 28
Epoch [1/1], Step [1934/13804], Loss: 3.2012, Perplexity: 24.5608, time_taken_in_seconds: 28
Epoch [1/1], Step [1935/13804], Loss: 2.9515, Perplexity: 19.1346, time_taken_in_seconds: 29
Epoch [1/1], Step [1936/13804], Loss: 3.3587, Perplexity: 28.7508, time_taken_in_seconds: 30
Epoch [1/1], Step [1937/13804], Loss: 3.1151, Perplexity: 22.5367, time_taken_in_seconds: 31
Epoch [1/1], Step [1938/13804], Loss: 2.8414, Perplexity: 17.1390, time_taken_in_seconds: 32
Epoch [1/1], Step [1939/13804], Loss: 3.8162, Perplexity: 45.4317, time_taken_in_seconds: 33
Epoch [1/1], Step [1940/13804], Loss: 2.8974, Perplexity: 18.1262, time_taken_in_seconds: 34
Epoch [1/1], Step [1941/13804], Loss: 3.2692, Perplexity: 26.2906, time_taken_in_seconds: 34
Epoch [1/1], Step [1942/13804], Loss: 2.9438, Perplexity: 18.9875, time_taken_in_seconds: 35
Epoch [1/1], Step [1943/13804], Loss: 2.9286, Perplexity: 18.7020, time_taken_in_seconds: 36
Epoch [1/1], Step [1944/13804], Loss: 2.9908, Perplexity: 19.9019, time_taken_in_seconds: 37
Epoch [1/1], Step [1945/13804], Loss: 3.0193, Perplexity: 20.4763, time_taken_in_seconds: 38
Epoch [1/1], Step [1946/13804], Loss: 2.9811, Perplexity: 19.7090, time_taken_in_seconds: 39
Epoch [1/1], Step [1947/13804], Loss: 2.9679, Perplexity: 19.4509, time_taken_in_seconds: 39
Epoch [1/1], Step [1948/13804], Loss: 2.8382, Perplexity: 17.0848, time_taken_in_seconds: 40
Epoch [1/1], Step [1949/13804], Loss: 2.7304, Perplexity: 15.3385, time_taken_in_seconds: 41
Epoch [1/1], Step [1950/13804], Loss: 2.8271, Perplexity: 16.8971, time_taken_in_seconds: 42
Epoch [1/1], Step [1951/13804], Loss: 3.5921, Perplexity: 36.3104, time_taken_in_seconds: 43
Epoch [1/1], Step [1952/13804], Loss: 2.9650, Perplexity: 19.3947, time_taken_in_seconds: 44
Epoch [1/1], Step [1953/13804], Loss: 2.9250, Perplexity: 18.6351, time_taken_in_seconds: 45
Epoch [1/1], Step [1954/13804], Loss: 3.1851, Perplexity: 24.1691, time_taken_in_seconds: 45
Epoch [1/1], Step [1955/13804], Loss: 2.7360, Perplexity: 15.4253, time_taken_in_seconds: 46
Epoch [1/1], Step [1956/13804], Loss: 2.6158, Perplexity: 13.6786, time_taken_in_seconds: 47
Epoch [1/1], Step [1957/13804], Loss: 2.7725, Perplexity: 15.9994, time_taken_in_seconds: 48
Epoch [1/1], Step [1958/13804], Loss: 2.9907, Perplexity: 19.9003, time_taken_in_seconds: 49
Epoch [1/1], Step [1959/13804], Loss: 2.8299, Perplexity: 16.9438, time_taken_in_seconds: 50
Epoch [1/1], Step [1960/13804], Loss: 3.0620, Perplexity: 21.3705, time_taken_in_seconds: 51
Epoch [1/1], Step [1961/13804], Loss: 2.9532, Perplexity: 19.1676, time_taken_in_seconds: 51
Epoch [1/1], Step [1962/13804], Loss: 2.7803, Perplexity: 16.1232, time_taken_in_seconds: 52
Epoch [1/1], Step [1963/13804], Loss: 2.8245, Perplexity: 16.8517, time_taken_in_seconds: 53
Epoch [1/1], Step [1964/13804], Loss: 3.0659, Perplexity: 21.4540, time_taken_in_seconds: 54
Epoch [1/1], Step [1965/13804], Loss: 3.1262, Perplexity: 22.7867, time_taken_in_seconds: 55
Epoch [1/1], Step [1966/13804], Loss: 2.6822, Perplexity: 14.6175, time_taken_in_seconds: 56
Epoch [1/1], Step [1967/13804], Loss: 2.6619, Perplexity: 14.3233, time_taken_in_seconds: 57
Epoch [1/1], Step [1968/13804], Loss: 2.8596, Perplexity: 17.4544, time_taken_in_seconds: 58
Epoch [1/1], Step [1969/13804], Loss: 3.2303, Perplexity: 25.2870, time_taken_in_seconds: 58
Epoch [1/1], Step [1970/13804], Loss: 2.7189, Perplexity: 15.1638, time_taken_in_seconds: 59
Epoch [1/1], Step [1971/13804], Loss: 2.6899, Perplexity: 14.7298, time_taken_in_seconds: 60
Epoch [1/1], Step [1972/13804], Loss: 2.8939, Perplexity: 18.0630, time_taken_in_seconds: 61
Epoch [1/1], Step [1973/13804], Loss: 3.1621, Perplexity: 23.6198, time_taken_in_seconds: 62
Epoch [1/1], Step [1974/13804], Loss: 3.2165, Perplexity: 24.9405, time_taken_in_seconds: 63
Epoch [1/1], Step [1975/13804], Loss: 2.6443, Perplexity: 14.0737, time_taken_in_seconds: 63
Epoch [1/1], Step [1976/13804], Loss: 3.1544, Perplexity: 23.4387, time_taken_in_seconds: 64
Epoch [1/1], Step [1977/13804], Loss: 2.8124, Perplexity: 16.6498, time_taken_in_seconds: 65
Epoch [1/1], Step [1978/13804], Loss: 2.9260, Perplexity: 18.6538, time_taken_in_seconds: 66
Epoch [1/1], Step [1979/13804], Loss: 2.8513, Perplexity: 17.3094, time_taken_in_seconds: 67
Epoch [1/1], Step [1980/13804], Loss: 3.1607, Perplexity: 23.5868, time_taken_in_seconds: 68
Epoch [1/1], Step [1981/13804], Loss: 2.8489, Perplexity: 17.2683, time_taken_in_seconds: 69
Epoch [1/1], Step [1982/13804], Loss: 2.6083, Perplexity: 13.5756, time_taken_in_seconds: 69
Epoch [1/1], Step [1983/13804], Loss: 3.0405, Perplexity: 20.9157, time_taken_in_seconds: 70
Epoch [1/1], Step [1984/13804], Loss: 2.9930, Perplexity: 19.9461, time_taken_in_seconds: 71
Epoch [1/1], Step [1985/13804], Loss: 2.9526, Perplexity: 19.1550, time_taken_in_seconds: 72
Epoch [1/1], Step [1986/13804], Loss: 2.8924, Perplexity: 18.0358, time_taken_in_seconds: 73
Epoch [1/1], Step [1987/13804], Loss: 2.9892, Perplexity: 19.8693, time_taken_in_seconds: 74
Epoch [1/1], Step [1988/13804], Loss: 2.9466, Perplexity: 19.0412, time_taken_in_seconds: 74
Epoch [1/1], Step [1989/13804], Loss: 2.8697, Perplexity: 17.6322, time_taken_in_seconds: 75
Epoch [1/1], Step [1990/13804], Loss: 2.8268, Perplexity: 16.8918, time_taken_in_seconds: 76
Epoch [1/1], Step [1991/13804], Loss: 2.7371, Perplexity: 15.4416, time_taken_in_seconds: 77
Epoch [1/1], Step [1992/13804], Loss: 3.2232, Perplexity: 25.1089, time_taken_in_seconds: 78
Epoch [1/1], Step [1993/13804], Loss: 2.8872, Perplexity: 17.9436, time_taken_in_seconds: 79
Epoch [1/1], Step [1994/13804], Loss: 3.1798, Perplexity: 24.0427, time_taken_in_seconds: 80
Epoch [1/1], Step [1995/13804], Loss: 2.9534, Perplexity: 19.1705, time_taken_in_seconds: 80
Epoch [1/1], Step [1996/13804], Loss: 2.6647, Perplexity: 14.3643, time_taken_in_seconds: 81
Epoch [1/1], Step [1997/13804], Loss: 3.0646, Perplexity: 21.4260, time_taken_in_seconds: 82
Epoch [1/1], Step [1998/13804], Loss: 2.8809, Perplexity: 17.8301, time_taken_in_seconds: 83
Epoch [1/1], Step [1999/13804], Loss: 2.7494, Perplexity: 15.6336, time_taken_in_seconds: 84
Epoch [1/1], Step [2000/13804], Loss: 3.4271, Perplexity: 30.7887, time_taken_in_seconds: 85
Epoch [1/1], Step [2001/13804], Loss: 2.5331, Perplexity: 12.5929, time_taken_in_seconds: 0
Epoch [1/1], Step [2002/13804], Loss: 2.7519, Perplexity: 15.6730, time_taken_in_seconds: 1
Epoch [1/1], Step [2003/13804], Loss: 2.5789, Perplexity: 13.1821, time_taken_in_seconds: 2
Epoch [1/1], Step [2004/13804], Loss: 2.8101, Perplexity: 16.6116, time_taken_in_seconds: 3
Epoch [1/1], Step [2005/13804], Loss: 2.9438, Perplexity: 18.9886, time_taken_in_seconds: 4
Epoch [1/1], Step [2006/13804], Loss: 3.8137, Perplexity: 45.3166, time_taken_in_seconds: 5
Epoch [1/1], Step [2007/13804], Loss: 2.6152, Perplexity: 13.6702, time_taken_in_seconds: 5
Epoch [1/1], Step [2008/13804], Loss: 3.5500, Perplexity: 34.8148, time_taken_in_seconds: 6
Epoch [1/1], Step [2009/13804], Loss: 2.9346, Perplexity: 18.8145, time_taken_in_seconds: 7
Epoch [1/1], Step [2010/13804], Loss: 3.3106, Perplexity: 27.4003, time_taken_in_seconds: 8
Epoch [1/1], Step [2011/13804], Loss: 3.0534, Perplexity: 21.1874, time_taken_in_seconds: 9
Epoch [1/1], Step [2012/13804], Loss: 3.2961, Perplexity: 27.0058, time_taken_in_seconds: 10
Epoch [1/1], Step [2013/13804], Loss: 2.7806, Perplexity: 16.1294, time_taken_in_seconds: 11
Epoch [1/1], Step [2014/13804], Loss: 2.9007, Perplexity: 18.1868, time_taken_in_seconds: 11
Epoch [1/1], Step [2015/13804], Loss: 3.0052, Perplexity: 20.1900, time_taken_in_seconds: 12
Epoch [1/1], Step [2016/13804], Loss: 3.0789, Perplexity: 21.7339, time_taken_in_seconds: 13
Epoch [1/1], Step [2017/13804], Loss: 2.7246, Perplexity: 15.2496, time_taken_in_seconds: 14
Epoch [1/1], Step [2018/13804], Loss: 3.2427, Perplexity: 25.6021, time_taken_in_seconds: 15
Epoch [1/1], Step [2019/13804], Loss: 3.0460, Perplexity: 21.0302, time_taken_in_seconds: 16
Epoch [1/1], Step [2020/13804], Loss: 2.8670, Perplexity: 17.5837, time_taken_in_seconds: 16
Epoch [1/1], Step [2021/13804], Loss: 3.0958, Perplexity: 22.1046, time_taken_in_seconds: 17
Epoch [1/1], Step [2022/13804], Loss: 3.3166, Perplexity: 27.5659, time_taken_in_seconds: 18
Epoch [1/1], Step [2023/13804], Loss: 3.4438, Perplexity: 31.3050, time_taken_in_seconds: 19
Epoch [1/1], Step [2024/13804], Loss: 3.3402, Perplexity: 28.2255, time_taken_in_seconds: 20
Epoch [1/1], Step [2025/13804], Loss: 2.8338, Perplexity: 17.0099, time_taken_in_seconds: 21
Epoch [1/1], Step [2026/13804], Loss: 2.9313, Perplexity: 18.7517, time_taken_in_seconds: 22
Epoch [1/1], Step [2027/13804], Loss: 2.8702, Perplexity: 17.6410, time_taken_in_seconds: 22
Epoch [1/1], Step [2028/13804], Loss: 2.9182, Perplexity: 18.5082, time_taken_in_seconds: 23
Epoch [1/1], Step [2029/13804], Loss: 2.9618, Perplexity: 19.3326, time_taken_in_seconds: 24
Epoch [1/1], Step [2030/13804], Loss: 2.8172, Perplexity: 16.7294, time_taken_in_seconds: 25
Epoch [1/1], Step [2031/13804], Loss: 3.2830, Perplexity: 26.6543, time_taken_in_seconds: 26
Epoch [1/1], Step [2032/13804], Loss: 3.1166, Perplexity: 22.5699, time_taken_in_seconds: 27
Epoch [1/1], Step [2033/13804], Loss: 2.9143, Perplexity: 18.4355, time_taken_in_seconds: 28
Epoch [1/1], Step [2034/13804], Loss: 3.1631, Perplexity: 23.6445, time_taken_in_seconds: 28
Epoch [1/1], Step [2035/13804], Loss: 2.9373, Perplexity: 18.8642, time_taken_in_seconds: 29
Epoch [1/1], Step [2036/13804], Loss: 3.1459, Perplexity: 23.2404, time_taken_in_seconds: 30
Epoch [1/1], Step [2037/13804], Loss: 3.2540, Perplexity: 25.8925, time_taken_in_seconds: 31
Epoch [1/1], Step [2038/13804], Loss: 2.8439, Perplexity: 17.1829, time_taken_in_seconds: 32
Epoch [1/1], Step [2039/13804], Loss: 2.8605, Perplexity: 17.4698, time_taken_in_seconds: 33
Epoch [1/1], Step [2040/13804], Loss: 3.3706, Perplexity: 29.0969, time_taken_in_seconds: 34
Epoch [1/1], Step [2041/13804], Loss: 2.6546, Perplexity: 14.2188, time_taken_in_seconds: 35
Epoch [1/1], Step [2042/13804], Loss: 3.3613, Perplexity: 28.8258, time_taken_in_seconds: 35
Epoch [1/1], Step [2043/13804], Loss: 2.9479, Perplexity: 19.0666, time_taken_in_seconds: 36
Epoch [1/1], Step [2044/13804], Loss: 2.9285, Perplexity: 18.7001, time_taken_in_seconds: 37
Epoch [1/1], Step [2045/13804], Loss: 3.0344, Perplexity: 20.7889, time_taken_in_seconds: 38
Epoch [1/1], Step [2046/13804], Loss: 3.0032, Perplexity: 20.1507, time_taken_in_seconds: 39
Epoch [1/1], Step [2047/13804], Loss: 2.6358, Perplexity: 13.9549, time_taken_in_seconds: 40
Epoch [1/1], Step [2048/13804], Loss: 3.0253, Perplexity: 20.5994, time_taken_in_seconds: 40
Epoch [1/1], Step [2049/13804], Loss: 2.7441, Perplexity: 15.5507, time_taken_in_seconds: 41
Epoch [1/1], Step [2050/13804], Loss: 2.6709, Perplexity: 14.4531, time_taken_in_seconds: 42
Epoch [1/1], Step [2051/13804], Loss: 3.2057, Perplexity: 24.6716, time_taken_in_seconds: 43
Epoch [1/1], Step [2052/13804], Loss: 3.0644, Perplexity: 21.4220, time_taken_in_seconds: 44
Epoch [1/1], Step [2053/13804], Loss: 2.5617, Perplexity: 12.9584, time_taken_in_seconds: 45
Epoch [1/1], Step [2054/13804], Loss: 4.0771, Perplexity: 58.9749, time_taken_in_seconds: 46
Epoch [1/1], Step [2055/13804], Loss: 3.0059, Perplexity: 20.2051, time_taken_in_seconds: 46
Epoch [1/1], Step [2056/13804], Loss: 2.7828, Perplexity: 16.1647, time_taken_in_seconds: 47
Epoch [1/1], Step [2057/13804], Loss: 2.7481, Perplexity: 15.6123, time_taken_in_seconds: 48
Epoch [1/1], Step [2058/13804], Loss: 2.5595, Perplexity: 12.9291, time_taken_in_seconds: 49
Epoch [1/1], Step [2059/13804], Loss: 2.8173, Perplexity: 16.7323, time_taken_in_seconds: 50
Epoch [1/1], Step [2060/13804], Loss: 3.1727, Perplexity: 23.8720, time_taken_in_seconds: 51
Epoch [1/1], Step [2061/13804], Loss: 3.2802, Perplexity: 26.5812, time_taken_in_seconds: 51
Epoch [1/1], Step [2062/13804], Loss: 2.5288, Perplexity: 12.5383, time_taken_in_seconds: 52
Epoch [1/1], Step [2063/13804], Loss: 2.6347, Perplexity: 13.9388, time_taken_in_seconds: 53
Epoch [1/1], Step [2064/13804], Loss: 2.6889, Perplexity: 14.7151, time_taken_in_seconds: 54
Epoch [1/1], Step [2065/13804], Loss: 2.9160, Perplexity: 18.4667, time_taken_in_seconds: 55
Epoch [1/1], Step [2066/13804], Loss: 3.3660, Perplexity: 28.9636, time_taken_in_seconds: 56
Epoch [1/1], Step [2067/13804], Loss: 3.0045, Perplexity: 20.1756, time_taken_in_seconds: 56
Epoch [1/1], Step [2068/13804], Loss: 2.7630, Perplexity: 15.8470, time_taken_in_seconds: 57
Epoch [1/1], Step [2069/13804], Loss: 3.0389, Perplexity: 20.8814, time_taken_in_seconds: 58
Epoch [1/1], Step [2070/13804], Loss: 2.9870, Perplexity: 19.8255, time_taken_in_seconds: 59
Epoch [1/1], Step [2071/13804], Loss: 2.6378, Perplexity: 13.9822, time_taken_in_seconds: 60
Epoch [1/1], Step [2072/13804], Loss: 3.0071, Perplexity: 20.2287, time_taken_in_seconds: 61
Epoch [1/1], Step [2073/13804], Loss: 2.8858, Perplexity: 17.9183, time_taken_in_seconds: 62
Epoch [1/1], Step [2074/13804], Loss: 3.1522, Perplexity: 23.3875, time_taken_in_seconds: 62
Epoch [1/1], Step [2075/13804], Loss: 3.2001, Perplexity: 24.5355, time_taken_in_seconds: 63
Epoch [1/1], Step [2076/13804], Loss: 3.0405, Perplexity: 20.9154, time_taken_in_seconds: 64
Epoch [1/1], Step [2077/13804], Loss: 3.0405, Perplexity: 20.9150, time_taken_in_seconds: 65
Epoch [1/1], Step [2078/13804], Loss: 2.7930, Perplexity: 16.3302, time_taken_in_seconds: 66
Epoch [1/1], Step [2079/13804], Loss: 3.0283, Perplexity: 20.6613, time_taken_in_seconds: 67
Epoch [1/1], Step [2080/13804], Loss: 2.7408, Perplexity: 15.4997, time_taken_in_seconds: 68
Epoch [1/1], Step [2081/13804], Loss: 3.4394, Perplexity: 31.1698, time_taken_in_seconds: 68
Epoch [1/1], Step [2082/13804], Loss: 2.7573, Perplexity: 15.7570, time_taken_in_seconds: 69
Epoch [1/1], Step [2083/13804], Loss: 3.1470, Perplexity: 23.2654, time_taken_in_seconds: 70
Epoch [1/1], Step [2084/13804], Loss: 4.9144, Perplexity: 136.2396, time_taken_in_seconds: 71
Epoch [1/1], Step [2085/13804], Loss: 3.7039, Perplexity: 40.6037, time_taken_in_seconds: 72
Epoch [1/1], Step [2086/13804], Loss: 3.2552, Perplexity: 25.9251, time_taken_in_seconds: 73
Epoch [1/1], Step [2087/13804], Loss: 2.8119, Perplexity: 16.6417, time_taken_in_seconds: 73
Epoch [1/1], Step [2088/13804], Loss: 3.4302, Perplexity: 30.8833, time_taken_in_seconds: 74
Epoch [1/1], Step [2089/13804], Loss: 2.5219, Perplexity: 12.4516, time_taken_in_seconds: 75
Epoch [1/1], Step [2090/13804], Loss: 3.0969, Perplexity: 22.1302, time_taken_in_seconds: 76
Epoch [1/1], Step [2091/13804], Loss: 3.1737, Perplexity: 23.8961, time_taken_in_seconds: 77
Epoch [1/1], Step [2092/13804], Loss: 3.1315, Perplexity: 22.9078, time_taken_in_seconds: 78
Epoch [1/1], Step [2093/13804], Loss: 2.9280, Perplexity: 18.6893, time_taken_in_seconds: 79
Epoch [1/1], Step [2094/13804], Loss: 3.4327, Perplexity: 30.9605, time_taken_in_seconds: 79
Epoch [1/1], Step [2095/13804], Loss: 3.0861, Perplexity: 21.8922, time_taken_in_seconds: 80
Epoch [1/1], Step [2096/13804], Loss: 2.8333, Perplexity: 17.0013, time_taken_in_seconds: 81
Epoch [1/1], Step [2097/13804], Loss: 3.0116, Perplexity: 20.3204, time_taken_in_seconds: 82
Epoch [1/1], Step [2098/13804], Loss: 2.8370, Perplexity: 17.0644, time_taken_in_seconds: 83
Epoch [1/1], Step [2099/13804], Loss: 2.9061, Perplexity: 18.2855, time_taken_in_seconds: 84
Epoch [1/1], Step [2100/13804], Loss: 2.6511, Perplexity: 14.1691, time_taken_in_seconds: 84
Epoch [1/1], Step [2101/13804], Loss: 2.9595, Perplexity: 19.2881, time_taken_in_seconds: 0
Epoch [1/1], Step [2102/13804], Loss: 3.0918, Perplexity: 22.0158, time_taken_in_seconds: 1
Epoch [1/1], Step [2103/13804], Loss: 3.5357, Perplexity: 34.3186, time_taken_in_seconds: 2
Epoch [1/1], Step [2104/13804], Loss: 2.8722, Perplexity: 17.6751, time_taken_in_seconds: 3
Epoch [1/1], Step [2105/13804], Loss: 3.0756, Perplexity: 21.6624, time_taken_in_seconds: 4
Epoch [1/1], Step [2106/13804], Loss: 2.7570, Perplexity: 15.7530, time_taken_in_seconds: 5
Epoch [1/1], Step [2107/13804], Loss: 3.7846, Perplexity: 44.0194, time_taken_in_seconds: 5
Epoch [1/1], Step [2108/13804], Loss: 2.9729, Perplexity: 19.5482, time_taken_in_seconds: 7
Epoch [1/1], Step [2109/13804], Loss: 2.9057, Perplexity: 18.2781, time_taken_in_seconds: 7
Epoch [1/1], Step [2110/13804], Loss: 3.0647, Perplexity: 21.4284, time_taken_in_seconds: 8
Epoch [1/1], Step [2111/13804], Loss: 2.9777, Perplexity: 19.6433, time_taken_in_seconds: 9
Epoch [1/1], Step [2112/13804], Loss: 2.8683, Perplexity: 17.6063, time_taken_in_seconds: 10
Epoch [1/1], Step [2113/13804], Loss: 3.3667, Perplexity: 28.9835, time_taken_in_seconds: 11
Epoch [1/1], Step [2114/13804], Loss: 3.1731, Perplexity: 23.8818, time_taken_in_seconds: 12
Epoch [1/1], Step [2115/13804], Loss: 2.6969, Perplexity: 14.8338, time_taken_in_seconds: 13
Epoch [1/1], Step [2116/13804], Loss: 2.9466, Perplexity: 19.0417, time_taken_in_seconds: 13
Epoch [1/1], Step [2117/13804], Loss: 2.7136, Perplexity: 15.0838, time_taken_in_seconds: 14
Epoch [1/1], Step [2118/13804], Loss: 2.8497, Perplexity: 17.2818, time_taken_in_seconds: 15
Epoch [1/1], Step [2119/13804], Loss: 2.8118, Perplexity: 16.6393, time_taken_in_seconds: 16
Epoch [1/1], Step [2120/13804], Loss: 2.7470, Perplexity: 15.5960, time_taken_in_seconds: 17
Epoch [1/1], Step [2121/13804], Loss: 3.3077, Perplexity: 27.3231, time_taken_in_seconds: 18
Epoch [1/1], Step [2122/13804], Loss: 2.9944, Perplexity: 19.9733, time_taken_in_seconds: 18
Epoch [1/1], Step [2123/13804], Loss: 2.9161, Perplexity: 18.4698, time_taken_in_seconds: 19
Epoch [1/1], Step [2124/13804], Loss: 3.1235, Perplexity: 22.7248, time_taken_in_seconds: 20
Epoch [1/1], Step [2125/13804], Loss: 2.8650, Perplexity: 17.5493, time_taken_in_seconds: 21
Epoch [1/1], Step [2126/13804], Loss: 2.7686, Perplexity: 15.9369, time_taken_in_seconds: 22
Epoch [1/1], Step [2127/13804], Loss: 2.8659, Perplexity: 17.5647, time_taken_in_seconds: 23
Epoch [1/1], Step [2128/13804], Loss: 2.8951, Perplexity: 18.0846, time_taken_in_seconds: 23
Epoch [1/1], Step [2129/13804], Loss: 3.1835, Perplexity: 24.1313, time_taken_in_seconds: 24
Epoch [1/1], Step [2130/13804], Loss: 2.7396, Perplexity: 15.4807, time_taken_in_seconds: 25
Epoch [1/1], Step [2131/13804], Loss: 2.7150, Perplexity: 15.1043, time_taken_in_seconds: 26
Epoch [1/1], Step [2132/13804], Loss: 3.0064, Perplexity: 20.2147, time_taken_in_seconds: 27
Epoch [1/1], Step [2133/13804], Loss: 3.2864, Perplexity: 26.7466, time_taken_in_seconds: 28
Epoch [1/1], Step [2134/13804], Loss: 2.8445, Perplexity: 17.1922, time_taken_in_seconds: 29
Epoch [1/1], Step [2135/13804], Loss: 2.7809, Perplexity: 16.1342, time_taken_in_seconds: 29
Epoch [1/1], Step [2136/13804], Loss: 2.9143, Perplexity: 18.4360, time_taken_in_seconds: 30
Epoch [1/1], Step [2137/13804], Loss: 3.3288, Perplexity: 27.9052, time_taken_in_seconds: 31
Epoch [1/1], Step [2138/13804], Loss: 2.5713, Perplexity: 13.0824, time_taken_in_seconds: 32
Epoch [1/1], Step [2139/13804], Loss: 2.9684, Perplexity: 19.4617, time_taken_in_seconds: 33
Epoch [1/1], Step [2140/13804], Loss: 2.9874, Perplexity: 19.8333, time_taken_in_seconds: 34
Epoch [1/1], Step [2141/13804], Loss: 3.2357, Perplexity: 25.4244, time_taken_in_seconds: 34
Epoch [1/1], Step [2142/13804], Loss: 3.0003, Perplexity: 20.0912, time_taken_in_seconds: 35
Epoch [1/1], Step [2143/13804], Loss: 3.4215, Perplexity: 30.6145, time_taken_in_seconds: 36
Epoch [1/1], Step [2144/13804], Loss: 3.4317, Perplexity: 30.9295, time_taken_in_seconds: 37
Epoch [1/1], Step [2145/13804], Loss: 3.6992, Perplexity: 40.4157, time_taken_in_seconds: 38
Epoch [1/1], Step [2146/13804], Loss: 2.8513, Perplexity: 17.3108, time_taken_in_seconds: 39
Epoch [1/1], Step [2147/13804], Loss: 2.7389, Perplexity: 15.4694, time_taken_in_seconds: 40
Epoch [1/1], Step [2148/13804], Loss: 2.8577, Perplexity: 17.4217, time_taken_in_seconds: 40
Epoch [1/1], Step [2149/13804], Loss: 2.8086, Perplexity: 16.5865, time_taken_in_seconds: 41
Epoch [1/1], Step [2150/13804], Loss: 2.6221, Perplexity: 13.7645, time_taken_in_seconds: 42
Epoch [1/1], Step [2151/13804], Loss: 3.2352, Perplexity: 25.4122, time_taken_in_seconds: 43
Epoch [1/1], Step [2152/13804], Loss: 2.9802, Perplexity: 19.6927, time_taken_in_seconds: 44
Epoch [1/1], Step [2153/13804], Loss: 3.0081, Perplexity: 20.2492, time_taken_in_seconds: 45
Epoch [1/1], Step [2154/13804], Loss: 2.5379, Perplexity: 12.6528, time_taken_in_seconds: 45
Epoch [1/1], Step [2155/13804], Loss: 3.0391, Perplexity: 20.8859, time_taken_in_seconds: 46
Epoch [1/1], Step [2156/13804], Loss: 3.4906, Perplexity: 32.8043, time_taken_in_seconds: 47
Epoch [1/1], Step [2157/13804], Loss: 2.8958, Perplexity: 18.0980, time_taken_in_seconds: 48
Epoch [1/1], Step [2158/13804], Loss: 2.5396, Perplexity: 12.6746, time_taken_in_seconds: 49
Epoch [1/1], Step [2159/13804], Loss: 2.7874, Perplexity: 16.2392, time_taken_in_seconds: 50
Epoch [1/1], Step [2160/13804], Loss: 2.6055, Perplexity: 13.5385, time_taken_in_seconds: 51
Epoch [1/1], Step [2161/13804], Loss: 2.6879, Perplexity: 14.7004, time_taken_in_seconds: 51
Epoch [1/1], Step [2162/13804], Loss: 2.8403, Perplexity: 17.1203, time_taken_in_seconds: 52
Epoch [1/1], Step [2163/13804], Loss: 3.1101, Perplexity: 22.4227, time_taken_in_seconds: 53
Epoch [1/1], Step [2164/13804], Loss: 3.1314, Perplexity: 22.9071, time_taken_in_seconds: 54
Epoch [1/1], Step [2165/13804], Loss: 3.2895, Perplexity: 26.8300, time_taken_in_seconds: 55
Epoch [1/1], Step [2166/13804], Loss: 2.9193, Perplexity: 18.5274, time_taken_in_seconds: 56
Epoch [1/1], Step [2167/13804], Loss: 2.8529, Perplexity: 17.3382, time_taken_in_seconds: 56
Epoch [1/1], Step [2168/13804], Loss: 2.9230, Perplexity: 18.5961, time_taken_in_seconds: 57
Epoch [1/1], Step [2169/13804], Loss: 2.8377, Perplexity: 17.0769, time_taken_in_seconds: 58
Epoch [1/1], Step [2170/13804], Loss: 2.8379, Perplexity: 17.0791, time_taken_in_seconds: 59
Epoch [1/1], Step [2171/13804], Loss: 2.9508, Perplexity: 19.1208, time_taken_in_seconds: 60
Epoch [1/1], Step [2172/13804], Loss: 3.0979, Perplexity: 22.1512, time_taken_in_seconds: 61
Epoch [1/1], Step [2173/13804], Loss: 3.3470, Perplexity: 28.4167, time_taken_in_seconds: 61
Epoch [1/1], Step [2174/13804], Loss: 2.8293, Perplexity: 16.9333, time_taken_in_seconds: 62
Epoch [1/1], Step [2175/13804], Loss: 2.8213, Perplexity: 16.7993, time_taken_in_seconds: 63
Epoch [1/1], Step [2176/13804], Loss: 2.6611, Perplexity: 14.3123, time_taken_in_seconds: 64
Epoch [1/1], Step [2177/13804], Loss: 2.6231, Perplexity: 13.7785, time_taken_in_seconds: 65
Epoch [1/1], Step [2178/13804], Loss: 2.7815, Perplexity: 16.1433, time_taken_in_seconds: 66
Epoch [1/1], Step [2179/13804], Loss: 3.1464, Perplexity: 23.2513, time_taken_in_seconds: 67
Epoch [1/1], Step [2180/13804], Loss: 3.4414, Perplexity: 31.2306, time_taken_in_seconds: 68
Epoch [1/1], Step [2181/13804], Loss: 3.1436, Perplexity: 23.1863, time_taken_in_seconds: 68
Epoch [1/1], Step [2182/13804], Loss: 3.1435, Perplexity: 23.1838, time_taken_in_seconds: 69
Epoch [1/1], Step [2183/13804], Loss: 3.0281, Perplexity: 20.6577, time_taken_in_seconds: 70
Epoch [1/1], Step [2184/13804], Loss: 2.8687, Perplexity: 17.6143, time_taken_in_seconds: 71
Epoch [1/1], Step [2185/13804], Loss: 2.5249, Perplexity: 12.4895, time_taken_in_seconds: 72
Epoch [1/1], Step [2186/13804], Loss: 2.9589, Perplexity: 19.2772, time_taken_in_seconds: 73
Epoch [1/1], Step [2187/13804], Loss: 2.8426, Perplexity: 17.1596, time_taken_in_seconds: 73
Epoch [1/1], Step [2188/13804], Loss: 2.9680, Perplexity: 19.4534, time_taken_in_seconds: 74
Epoch [1/1], Step [2189/13804], Loss: 2.7941, Perplexity: 16.3479, time_taken_in_seconds: 75
Epoch [1/1], Step [2190/13804], Loss: 2.9514, Perplexity: 19.1332, time_taken_in_seconds: 76
Epoch [1/1], Step [2191/13804], Loss: 2.8348, Perplexity: 17.0269, time_taken_in_seconds: 77
Epoch [1/1], Step [2192/13804], Loss: 3.0603, Perplexity: 21.3345, time_taken_in_seconds: 78
Epoch [1/1], Step [2193/13804], Loss: 2.4868, Perplexity: 12.0223, time_taken_in_seconds: 79
Epoch [1/1], Step [2194/13804], Loss: 2.7972, Perplexity: 16.3990, time_taken_in_seconds: 79
Epoch [1/1], Step [2195/13804], Loss: 2.8293, Perplexity: 16.9332, time_taken_in_seconds: 80
Epoch [1/1], Step [2196/13804], Loss: 3.5186, Perplexity: 33.7372, time_taken_in_seconds: 81
Epoch [1/1], Step [2197/13804], Loss: 2.8524, Perplexity: 17.3290, time_taken_in_seconds: 82
Epoch [1/1], Step [2198/13804], Loss: 2.8955, Perplexity: 18.0923, time_taken_in_seconds: 83
Epoch [1/1], Step [2199/13804], Loss: 2.6614, Perplexity: 14.3156, time_taken_in_seconds: 84
Epoch [1/1], Step [2200/13804], Loss: 3.5289, Perplexity: 34.0849, time_taken_in_seconds: 84
Epoch [1/1], Step [2201/13804], Loss: 2.7964, Perplexity: 16.3854, time_taken_in_seconds: 0
Epoch [1/1], Step [2202/13804], Loss: 2.9132, Perplexity: 18.4161, time_taken_in_seconds: 1
Epoch [1/1], Step [2203/13804], Loss: 3.0085, Perplexity: 20.2576, time_taken_in_seconds: 2
Epoch [1/1], Step [2204/13804], Loss: 2.8603, Perplexity: 17.4676, time_taken_in_seconds: 3
Epoch [1/1], Step [2205/13804], Loss: 3.3064, Perplexity: 27.2869, time_taken_in_seconds: 4
Epoch [1/1], Step [2206/13804], Loss: 2.6813, Perplexity: 14.6042, time_taken_in_seconds: 5
Epoch [1/1], Step [2207/13804], Loss: 3.0678, Perplexity: 21.4938, time_taken_in_seconds: 5
Epoch [1/1], Step [2208/13804], Loss: 2.8541, Perplexity: 17.3588, time_taken_in_seconds: 6
Epoch [1/1], Step [2209/13804], Loss: 5.2885, Perplexity: 198.0453, time_taken_in_seconds: 7
Epoch [1/1], Step [2210/13804], Loss: 2.9607, Perplexity: 19.3116, time_taken_in_seconds: 8
Epoch [1/1], Step [2211/13804], Loss: 3.4775, Perplexity: 32.3794, time_taken_in_seconds: 9
Epoch [1/1], Step [2212/13804], Loss: 3.0008, Perplexity: 20.1024, time_taken_in_seconds: 10
Epoch [1/1], Step [2213/13804], Loss: 2.7878, Perplexity: 16.2450, time_taken_in_seconds: 10
Epoch [1/1], Step [2214/13804], Loss: 3.1136, Perplexity: 22.5018, time_taken_in_seconds: 11
Epoch [1/1], Step [2215/13804], Loss: 3.0189, Perplexity: 20.4678, time_taken_in_seconds: 12
Epoch [1/1], Step [2216/13804], Loss: 2.7442, Perplexity: 15.5523, time_taken_in_seconds: 13
Epoch [1/1], Step [2217/13804], Loss: 2.9570, Perplexity: 19.2392, time_taken_in_seconds: 14
Epoch [1/1], Step [2218/13804], Loss: 3.0664, Perplexity: 21.4640, time_taken_in_seconds: 15
Epoch [1/1], Step [2219/13804], Loss: 3.4067, Perplexity: 30.1650, time_taken_in_seconds: 15
Epoch [1/1], Step [2220/13804], Loss: 3.0426, Perplexity: 20.9601, time_taken_in_seconds: 16
Epoch [1/1], Step [2221/13804], Loss: 3.0139, Perplexity: 20.3659, time_taken_in_seconds: 17
Epoch [1/1], Step [2222/13804], Loss: 2.8380, Perplexity: 17.0814, time_taken_in_seconds: 18
Epoch [1/1], Step [2223/13804], Loss: 2.7493, Perplexity: 15.6324, time_taken_in_seconds: 19
Epoch [1/1], Step [2224/13804], Loss: 2.6581, Perplexity: 14.2686, time_taken_in_seconds: 20
Epoch [1/1], Step [2225/13804], Loss: 3.1507, Perplexity: 23.3523, time_taken_in_seconds: 21
Epoch [1/1], Step [2226/13804], Loss: 3.0548, Perplexity: 21.2174, time_taken_in_seconds: 21
Epoch [1/1], Step [2227/13804], Loss: 2.9257, Perplexity: 18.6465, time_taken_in_seconds: 22
Epoch [1/1], Step [2228/13804], Loss: 2.7484, Perplexity: 15.6180, time_taken_in_seconds: 23
Epoch [1/1], Step [2229/13804], Loss: 2.6677, Perplexity: 14.4061, time_taken_in_seconds: 24
Epoch [1/1], Step [2230/13804], Loss: 2.8805, Perplexity: 17.8231, time_taken_in_seconds: 25
Epoch [1/1], Step [2231/13804], Loss: 3.3401, Perplexity: 28.2213, time_taken_in_seconds: 26
Epoch [1/1], Step [2232/13804], Loss: 2.5264, Perplexity: 12.5081, time_taken_in_seconds: 26
Epoch [1/1], Step [2233/13804], Loss: 3.1291, Perplexity: 22.8535, time_taken_in_seconds: 27
Epoch [1/1], Step [2234/13804], Loss: 3.1798, Perplexity: 24.0422, time_taken_in_seconds: 28
Epoch [1/1], Step [2235/13804], Loss: 2.5285, Perplexity: 12.5349, time_taken_in_seconds: 29
Epoch [1/1], Step [2236/13804], Loss: 3.3156, Perplexity: 27.5396, time_taken_in_seconds: 30
Epoch [1/1], Step [2237/13804], Loss: 2.7709, Perplexity: 15.9727, time_taken_in_seconds: 31
Epoch [1/1], Step [2238/13804], Loss: 2.9405, Perplexity: 18.9254, time_taken_in_seconds: 32
Epoch [1/1], Step [2239/13804], Loss: 2.8521, Perplexity: 17.3245, time_taken_in_seconds: 32
Epoch [1/1], Step [2240/13804], Loss: 2.7469, Perplexity: 15.5949, time_taken_in_seconds: 33
Epoch [1/1], Step [2241/13804], Loss: 2.7934, Perplexity: 16.3368, time_taken_in_seconds: 34
Epoch [1/1], Step [2242/13804], Loss: 3.0515, Perplexity: 21.1466, time_taken_in_seconds: 35
Epoch [1/1], Step [2243/13804], Loss: 2.8206, Perplexity: 16.7870, time_taken_in_seconds: 36
Epoch [1/1], Step [2244/13804], Loss: 3.2935, Perplexity: 26.9377, time_taken_in_seconds: 37
Epoch [1/1], Step [2245/13804], Loss: 2.9134, Perplexity: 18.4191, time_taken_in_seconds: 37
Epoch [1/1], Step [2246/13804], Loss: 2.9991, Perplexity: 20.0673, time_taken_in_seconds: 38
Epoch [1/1], Step [2247/13804], Loss: 2.7243, Perplexity: 15.2463, time_taken_in_seconds: 39
Epoch [1/1], Step [2248/13804], Loss: 3.2591, Perplexity: 26.0269, time_taken_in_seconds: 40
Epoch [1/1], Step [2249/13804], Loss: 2.7818, Perplexity: 16.1477, time_taken_in_seconds: 41
Epoch [1/1], Step [2250/13804], Loss: 2.5676, Perplexity: 13.0340, time_taken_in_seconds: 42
Epoch [1/1], Step [2251/13804], Loss: 3.1753, Perplexity: 23.9350, time_taken_in_seconds: 43
Epoch [1/1], Step [2252/13804], Loss: 2.5970, Perplexity: 13.4239, time_taken_in_seconds: 44
Epoch [1/1], Step [2253/13804], Loss: 3.3751, Perplexity: 29.2265, time_taken_in_seconds: 45
Epoch [1/1], Step [2254/13804], Loss: 2.7183, Perplexity: 15.1541, time_taken_in_seconds: 45
Epoch [1/1], Step [2255/13804], Loss: 3.0395, Perplexity: 20.8954, time_taken_in_seconds: 46
Epoch [1/1], Step [2256/13804], Loss: 2.7307, Perplexity: 15.3435, time_taken_in_seconds: 47
Epoch [1/1], Step [2257/13804], Loss: 2.9276, Perplexity: 18.6828, time_taken_in_seconds: 48
Epoch [1/1], Step [2258/13804], Loss: 2.9332, Perplexity: 18.7869, time_taken_in_seconds: 49
Epoch [1/1], Step [2259/13804], Loss: 2.6078, Perplexity: 13.5690, time_taken_in_seconds: 50
Epoch [1/1], Step [2260/13804], Loss: 2.8317, Perplexity: 16.9751, time_taken_in_seconds: 50
Epoch [1/1], Step [2261/13804], Loss: 2.8866, Perplexity: 17.9326, time_taken_in_seconds: 51
Epoch [1/1], Step [2262/13804], Loss: 2.8574, Perplexity: 17.4160, time_taken_in_seconds: 52
Epoch [1/1], Step [2263/13804], Loss: 3.1672, Perplexity: 23.7412, time_taken_in_seconds: 53
Epoch [1/1], Step [2264/13804], Loss: 2.7775, Perplexity: 16.0787, time_taken_in_seconds: 54
Epoch [1/1], Step [2265/13804], Loss: 2.7144, Perplexity: 15.0963, time_taken_in_seconds: 55
Epoch [1/1], Step [2266/13804], Loss: 3.2079, Perplexity: 24.7267, time_taken_in_seconds: 56
Epoch [1/1], Step [2267/13804], Loss: 3.4613, Perplexity: 31.8569, time_taken_in_seconds: 56
Epoch [1/1], Step [2268/13804], Loss: 3.0817, Perplexity: 21.7959, time_taken_in_seconds: 57
Epoch [1/1], Step [2269/13804], Loss: 2.8241, Perplexity: 16.8465, time_taken_in_seconds: 58
Epoch [1/1], Step [2270/13804], Loss: 2.9536, Perplexity: 19.1756, time_taken_in_seconds: 59
Epoch [1/1], Step [2271/13804], Loss: 3.0839, Perplexity: 21.8424, time_taken_in_seconds: 60
Epoch [1/1], Step [2272/13804], Loss: 3.1028, Perplexity: 22.2592, time_taken_in_seconds: 61
Epoch [1/1], Step [2273/13804], Loss: 2.9316, Perplexity: 18.7583, time_taken_in_seconds: 61
Epoch [1/1], Step [2274/13804], Loss: 3.0274, Perplexity: 20.6441, time_taken_in_seconds: 62
Epoch [1/1], Step [2275/13804], Loss: 2.8022, Perplexity: 16.4812, time_taken_in_seconds: 63
Epoch [1/1], Step [2276/13804], Loss: 2.9408, Perplexity: 18.9303, time_taken_in_seconds: 64
Epoch [1/1], Step [2277/13804], Loss: 2.8473, Perplexity: 17.2416, time_taken_in_seconds: 65
Epoch [1/1], Step [2278/13804], Loss: 3.0915, Perplexity: 22.0090, time_taken_in_seconds: 66
Epoch [1/1], Step [2279/13804], Loss: 3.1818, Perplexity: 24.0892, time_taken_in_seconds: 66
Epoch [1/1], Step [2280/13804], Loss: 2.9475, Perplexity: 19.0576, time_taken_in_seconds: 67
Epoch [1/1], Step [2281/13804], Loss: 3.2877, Perplexity: 26.7806, time_taken_in_seconds: 68
Epoch [1/1], Step [2282/13804], Loss: 2.7562, Perplexity: 15.7394, time_taken_in_seconds: 69
Epoch [1/1], Step [2283/13804], Loss: 2.6749, Perplexity: 14.5104, time_taken_in_seconds: 70
Epoch [1/1], Step [2284/13804], Loss: 2.7998, Perplexity: 16.4417, time_taken_in_seconds: 71
Epoch [1/1], Step [2285/13804], Loss: 3.2847, Perplexity: 26.7012, time_taken_in_seconds: 71
Epoch [1/1], Step [2286/13804], Loss: 3.1786, Perplexity: 24.0123, time_taken_in_seconds: 72
Epoch [1/1], Step [2287/13804], Loss: 3.0247, Perplexity: 20.5875, time_taken_in_seconds: 73
Epoch [1/1], Step [2288/13804], Loss: 3.1410, Perplexity: 23.1262, time_taken_in_seconds: 74
Epoch [1/1], Step [2289/13804], Loss: 3.0955, Perplexity: 22.0981, time_taken_in_seconds: 75
Epoch [1/1], Step [2290/13804], Loss: 2.7520, Perplexity: 15.6733, time_taken_in_seconds: 76
Epoch [1/1], Step [2291/13804], Loss: 2.6140, Perplexity: 13.6539, time_taken_in_seconds: 77
Epoch [1/1], Step [2292/13804], Loss: 2.8608, Perplexity: 17.4756, time_taken_in_seconds: 77
Epoch [1/1], Step [2293/13804], Loss: 3.5046, Perplexity: 33.2671, time_taken_in_seconds: 78
Epoch [1/1], Step [2294/13804], Loss: 2.8744, Perplexity: 17.7149, time_taken_in_seconds: 79
Epoch [1/1], Step [2295/13804], Loss: 3.3044, Perplexity: 27.2323, time_taken_in_seconds: 80
Epoch [1/1], Step [2296/13804], Loss: 2.6011, Perplexity: 13.4788, time_taken_in_seconds: 81
Epoch [1/1], Step [2297/13804], Loss: 2.9724, Perplexity: 19.5393, time_taken_in_seconds: 82
Epoch [1/1], Step [2298/13804], Loss: 2.9303, Perplexity: 18.7328, time_taken_in_seconds: 82
Epoch [1/1], Step [2299/13804], Loss: 3.2093, Perplexity: 24.7610, time_taken_in_seconds: 83
Epoch [1/1], Step [2300/13804], Loss: 2.7439, Perplexity: 15.5476, time_taken_in_seconds: 84
Epoch [1/1], Step [2301/13804], Loss: 3.0195, Perplexity: 20.4816, time_taken_in_seconds: 0
Epoch [1/1], Step [2302/13804], Loss: 2.6881, Perplexity: 14.7039, time_taken_in_seconds: 1
Epoch [1/1], Step [2303/13804], Loss: 2.7535, Perplexity: 15.6981, time_taken_in_seconds: 2
Epoch [1/1], Step [2304/13804], Loss: 2.7512, Perplexity: 15.6607, time_taken_in_seconds: 3
Epoch [1/1], Step [2305/13804], Loss: 2.7661, Perplexity: 15.8958, time_taken_in_seconds: 4
Epoch [1/1], Step [2306/13804], Loss: 3.1723, Perplexity: 23.8630, time_taken_in_seconds: 5
Epoch [1/1], Step [2307/13804], Loss: 2.9391, Perplexity: 18.8987, time_taken_in_seconds: 5
Epoch [1/1], Step [2308/13804], Loss: 2.8139, Perplexity: 16.6744, time_taken_in_seconds: 6
Epoch [1/1], Step [2309/13804], Loss: 3.2111, Perplexity: 24.8062, time_taken_in_seconds: 7
Epoch [1/1], Step [2310/13804], Loss: 2.8885, Perplexity: 17.9655, time_taken_in_seconds: 8
Epoch [1/1], Step [2311/13804], Loss: 3.5467, Perplexity: 34.6976, time_taken_in_seconds: 9
Epoch [1/1], Step [2312/13804], Loss: 2.9769, Perplexity: 19.6266, time_taken_in_seconds: 10
Epoch [1/1], Step [2313/13804], Loss: 3.2852, Perplexity: 26.7155, time_taken_in_seconds: 10
Epoch [1/1], Step [2314/13804], Loss: 3.2100, Perplexity: 24.7803, time_taken_in_seconds: 11
Epoch [1/1], Step [2315/13804], Loss: 2.7501, Perplexity: 15.6448, time_taken_in_seconds: 12
Epoch [1/1], Step [2316/13804], Loss: 2.9676, Perplexity: 19.4446, time_taken_in_seconds: 13
Epoch [1/1], Step [2317/13804], Loss: 2.8740, Perplexity: 17.7079, time_taken_in_seconds: 14
Epoch [1/1], Step [2318/13804], Loss: 3.1617, Perplexity: 23.6107, time_taken_in_seconds: 15
Epoch [1/1], Step [2319/13804], Loss: 2.8412, Perplexity: 17.1369, time_taken_in_seconds: 16
Epoch [1/1], Step [2320/13804], Loss: 2.8442, Perplexity: 17.1878, time_taken_in_seconds: 16
Epoch [1/1], Step [2321/13804], Loss: 3.1148, Perplexity: 22.5282, time_taken_in_seconds: 17
Epoch [1/1], Step [2322/13804], Loss: 3.1059, Perplexity: 22.3301, time_taken_in_seconds: 18
Epoch [1/1], Step [2323/13804], Loss: 3.2268, Perplexity: 25.1977, time_taken_in_seconds: 19
Epoch [1/1], Step [2324/13804], Loss: 2.8371, Perplexity: 17.0670, time_taken_in_seconds: 20
Epoch [1/1], Step [2325/13804], Loss: 2.7489, Perplexity: 15.6247, time_taken_in_seconds: 21
Epoch [1/1], Step [2326/13804], Loss: 2.9634, Perplexity: 19.3638, time_taken_in_seconds: 22
Epoch [1/1], Step [2327/13804], Loss: 2.7301, Perplexity: 15.3340, time_taken_in_seconds: 23
Epoch [1/1], Step [2328/13804], Loss: 3.2352, Perplexity: 25.4114, time_taken_in_seconds: 23
Epoch [1/1], Step [2329/13804], Loss: 3.6639, Perplexity: 39.0138, time_taken_in_seconds: 24
Epoch [1/1], Step [2330/13804], Loss: 3.5214, Perplexity: 33.8312, time_taken_in_seconds: 25
Epoch [1/1], Step [2331/13804], Loss: 3.4310, Perplexity: 30.9072, time_taken_in_seconds: 26
Epoch [1/1], Step [2332/13804], Loss: 2.7732, Perplexity: 16.0097, time_taken_in_seconds: 27
Epoch [1/1], Step [2333/13804], Loss: 2.9171, Perplexity: 18.4879, time_taken_in_seconds: 28
Epoch [1/1], Step [2334/13804], Loss: 2.7745, Perplexity: 16.0300, time_taken_in_seconds: 28
Epoch [1/1], Step [2335/13804], Loss: 3.1810, Perplexity: 24.0702, time_taken_in_seconds: 29
Epoch [1/1], Step [2336/13804], Loss: 2.6024, Perplexity: 13.4956, time_taken_in_seconds: 30
Epoch [1/1], Step [2337/13804], Loss: 2.8572, Perplexity: 17.4123, time_taken_in_seconds: 31
Epoch [1/1], Step [2338/13804], Loss: 2.6778, Perplexity: 14.5530, time_taken_in_seconds: 32
Epoch [1/1], Step [2339/13804], Loss: 2.9027, Perplexity: 18.2232, time_taken_in_seconds: 33
Epoch [1/1], Step [2340/13804], Loss: 2.8658, Perplexity: 17.5638, time_taken_in_seconds: 33
Epoch [1/1], Step [2341/13804], Loss: 3.2949, Perplexity: 26.9745, time_taken_in_seconds: 34
Epoch [1/1], Step [2342/13804], Loss: 2.9299, Perplexity: 18.7249, time_taken_in_seconds: 35
Epoch [1/1], Step [2343/13804], Loss: 2.9622, Perplexity: 19.3403, time_taken_in_seconds: 36
Epoch [1/1], Step [2344/13804], Loss: 2.9007, Perplexity: 18.1868, time_taken_in_seconds: 37
Epoch [1/1], Step [2345/13804], Loss: 3.0247, Perplexity: 20.5886, time_taken_in_seconds: 38
Epoch [1/1], Step [2346/13804], Loss: 3.2606, Perplexity: 26.0658, time_taken_in_seconds: 38
Epoch [1/1], Step [2347/13804], Loss: 2.8983, Perplexity: 18.1434, time_taken_in_seconds: 39
Epoch [1/1], Step [2348/13804], Loss: 2.9961, Perplexity: 20.0071, time_taken_in_seconds: 40
Epoch [1/1], Step [2349/13804], Loss: 2.8525, Perplexity: 17.3314, time_taken_in_seconds: 41
Epoch [1/1], Step [2350/13804], Loss: 2.7115, Perplexity: 15.0524, time_taken_in_seconds: 42
Epoch [1/1], Step [2351/13804], Loss: 3.1442, Perplexity: 23.2008, time_taken_in_seconds: 43
Epoch [1/1], Step [2352/13804], Loss: 3.0718, Perplexity: 21.5811, time_taken_in_seconds: 44
Epoch [1/1], Step [2353/13804], Loss: 3.1283, Perplexity: 22.8343, time_taken_in_seconds: 44
Epoch [1/1], Step [2354/13804], Loss: 2.8315, Perplexity: 16.9707, time_taken_in_seconds: 45
Epoch [1/1], Step [2355/13804], Loss: 2.9168, Perplexity: 18.4818, time_taken_in_seconds: 46
Epoch [1/1], Step [2356/13804], Loss: 2.7400, Perplexity: 15.4873, time_taken_in_seconds: 47
Epoch [1/1], Step [2357/13804], Loss: 2.5512, Perplexity: 12.8223, time_taken_in_seconds: 48
Epoch [1/1], Step [2358/13804], Loss: 2.7820, Perplexity: 16.1521, time_taken_in_seconds: 49
Epoch [1/1], Step [2359/13804], Loss: 2.9926, Perplexity: 19.9377, time_taken_in_seconds: 49
Epoch [1/1], Step [2360/13804], Loss: 2.7365, Perplexity: 15.4330, time_taken_in_seconds: 50
Epoch [1/1], Step [2361/13804], Loss: 2.8491, Perplexity: 17.2728, time_taken_in_seconds: 51
Epoch [1/1], Step [2362/13804], Loss: 3.2258, Perplexity: 25.1738, time_taken_in_seconds: 52
Epoch [1/1], Step [2363/13804], Loss: 3.3088, Perplexity: 27.3524, time_taken_in_seconds: 53
Epoch [1/1], Step [2364/13804], Loss: 3.2779, Perplexity: 26.5211, time_taken_in_seconds: 54
Epoch [1/1], Step [2365/13804], Loss: 2.8348, Perplexity: 17.0276, time_taken_in_seconds: 54
Epoch [1/1], Step [2366/13804], Loss: 2.9025, Perplexity: 18.2193, time_taken_in_seconds: 55
Epoch [1/1], Step [2367/13804], Loss: 2.8980, Perplexity: 18.1378, time_taken_in_seconds: 56
Epoch [1/1], Step [2368/13804], Loss: 3.3189, Perplexity: 27.6305, time_taken_in_seconds: 57
Epoch [1/1], Step [2369/13804], Loss: 2.5512, Perplexity: 12.8222, time_taken_in_seconds: 58
Epoch [1/1], Step [2370/13804], Loss: 2.9105, Perplexity: 18.3654, time_taken_in_seconds: 59
Epoch [1/1], Step [2371/13804], Loss: 2.8133, Perplexity: 16.6647, time_taken_in_seconds: 59
Epoch [1/1], Step [2372/13804], Loss: 2.5489, Perplexity: 12.7933, time_taken_in_seconds: 60
Epoch [1/1], Step [2373/13804], Loss: 2.6978, Perplexity: 14.8464, time_taken_in_seconds: 61
Epoch [1/1], Step [2374/13804], Loss: 3.0764, Perplexity: 21.6799, time_taken_in_seconds: 62
Epoch [1/1], Step [2375/13804], Loss: 3.0425, Perplexity: 20.9569, time_taken_in_seconds: 63
Epoch [1/1], Step [2376/13804], Loss: 2.8011, Perplexity: 16.4621, time_taken_in_seconds: 64
Epoch [1/1], Step [2377/13804], Loss: 2.7670, Perplexity: 15.9113, time_taken_in_seconds: 64
Epoch [1/1], Step [2378/13804], Loss: 3.1249, Perplexity: 22.7570, time_taken_in_seconds: 65
Epoch [1/1], Step [2379/13804], Loss: 3.6753, Perplexity: 39.4610, time_taken_in_seconds: 66
Epoch [1/1], Step [2380/13804], Loss: 3.0596, Perplexity: 21.3183, time_taken_in_seconds: 67
Epoch [1/1], Step [2381/13804], Loss: 3.0484, Perplexity: 21.0809, time_taken_in_seconds: 68
Epoch [1/1], Step [2382/13804], Loss: 2.8155, Perplexity: 16.7017, time_taken_in_seconds: 69
Epoch [1/1], Step [2383/13804], Loss: 2.4508, Perplexity: 11.5981, time_taken_in_seconds: 70
Epoch [1/1], Step [2384/13804], Loss: 2.8990, Perplexity: 18.1552, time_taken_in_seconds: 70
Epoch [1/1], Step [2385/13804], Loss: 2.9054, Perplexity: 18.2729, time_taken_in_seconds: 71
Epoch [1/1], Step [2386/13804], Loss: 3.1215, Perplexity: 22.6797, time_taken_in_seconds: 72
Epoch [1/1], Step [2387/13804], Loss: 2.7786, Perplexity: 16.0967, time_taken_in_seconds: 73
Epoch [1/1], Step [2388/13804], Loss: 3.2575, Perplexity: 25.9847, time_taken_in_seconds: 74
Epoch [1/1], Step [2389/13804], Loss: 3.0401, Perplexity: 20.9078, time_taken_in_seconds: 75
Epoch [1/1], Step [2390/13804], Loss: 2.6440, Perplexity: 14.0692, time_taken_in_seconds: 75
Epoch [1/1], Step [2391/13804], Loss: 2.9741, Perplexity: 19.5722, time_taken_in_seconds: 76
Epoch [1/1], Step [2392/13804], Loss: 2.9757, Perplexity: 19.6031, time_taken_in_seconds: 77
Epoch [1/1], Step [2393/13804], Loss: 3.0943, Perplexity: 22.0716, time_taken_in_seconds: 78
Epoch [1/1], Step [2394/13804], Loss: 2.7394, Perplexity: 15.4783, time_taken_in_seconds: 79
Epoch [1/1], Step [2395/13804], Loss: 2.6409, Perplexity: 14.0265, time_taken_in_seconds: 80
Epoch [1/1], Step [2396/13804], Loss: 3.1241, Perplexity: 22.7386, time_taken_in_seconds: 81
Epoch [1/1], Step [2397/13804], Loss: 2.9220, Perplexity: 18.5789, time_taken_in_seconds: 82
Epoch [1/1], Step [2398/13804], Loss: 3.0828, Perplexity: 21.8188, time_taken_in_seconds: 82
Epoch [1/1], Step [2399/13804], Loss: 3.1494, Perplexity: 23.3227, time_taken_in_seconds: 83
Epoch [1/1], Step [2400/13804], Loss: 3.0962, Perplexity: 22.1127, time_taken_in_seconds: 84
Epoch [1/1], Step [2401/13804], Loss: 3.4585, Perplexity: 31.7688, time_taken_in_seconds: 0
Epoch [1/1], Step [2402/13804], Loss: 2.8311, Perplexity: 16.9639, time_taken_in_seconds: 1
Epoch [1/1], Step [2403/13804], Loss: 2.7117, Perplexity: 15.0546, time_taken_in_seconds: 2
Epoch [1/1], Step [2404/13804], Loss: 2.6575, Perplexity: 14.2611, time_taken_in_seconds: 3
Epoch [1/1], Step [2405/13804], Loss: 2.9684, Perplexity: 19.4602, time_taken_in_seconds: 4
Epoch [1/1], Step [2406/13804], Loss: 2.9554, Perplexity: 19.2098, time_taken_in_seconds: 5
Epoch [1/1], Step [2407/13804], Loss: 2.7359, Perplexity: 15.4237, time_taken_in_seconds: 5
Epoch [1/1], Step [2408/13804], Loss: 2.9436, Perplexity: 18.9841, time_taken_in_seconds: 6
Epoch [1/1], Step [2409/13804], Loss: 2.5791, Perplexity: 13.1853, time_taken_in_seconds: 7
Epoch [1/1], Step [2410/13804], Loss: 2.7675, Perplexity: 15.9192, time_taken_in_seconds: 8
Epoch [1/1], Step [2411/13804], Loss: 3.0729, Perplexity: 21.6044, time_taken_in_seconds: 9
Epoch [1/1], Step [2412/13804], Loss: 2.8758, Perplexity: 17.7392, time_taken_in_seconds: 10
Epoch [1/1], Step [2413/13804], Loss: 3.0892, Perplexity: 21.9593, time_taken_in_seconds: 10
Epoch [1/1], Step [2414/13804], Loss: 2.9968, Perplexity: 20.0213, time_taken_in_seconds: 11
Epoch [1/1], Step [2415/13804], Loss: 2.9618, Perplexity: 19.3330, time_taken_in_seconds: 12
Epoch [1/1], Step [2416/13804], Loss: 2.9918, Perplexity: 19.9209, time_taken_in_seconds: 13
Epoch [1/1], Step [2417/13804], Loss: 2.9587, Perplexity: 19.2728, time_taken_in_seconds: 14
Epoch [1/1], Step [2418/13804], Loss: 2.9868, Perplexity: 19.8226, time_taken_in_seconds: 15
Epoch [1/1], Step [2419/13804], Loss: 3.0152, Perplexity: 20.3939, time_taken_in_seconds: 15
Epoch [1/1], Step [2420/13804], Loss: 2.8877, Perplexity: 17.9523, time_taken_in_seconds: 16
Epoch [1/1], Step [2421/13804], Loss: 2.9527, Perplexity: 19.1569, time_taken_in_seconds: 17
Epoch [1/1], Step [2422/13804], Loss: 2.8971, Perplexity: 18.1207, time_taken_in_seconds: 18
Epoch [1/1], Step [2423/13804], Loss: 2.7278, Perplexity: 15.2986, time_taken_in_seconds: 19
Epoch [1/1], Step [2424/13804], Loss: 3.1241, Perplexity: 22.7405, time_taken_in_seconds: 20
Epoch [1/1], Step [2425/13804], Loss: 2.7004, Perplexity: 14.8862, time_taken_in_seconds: 20
Epoch [1/1], Step [2426/13804], Loss: 2.9317, Perplexity: 18.7598, time_taken_in_seconds: 21
Epoch [1/1], Step [2427/13804], Loss: 3.1929, Perplexity: 24.3579, time_taken_in_seconds: 22
Epoch [1/1], Step [2428/13804], Loss: 3.0476, Perplexity: 21.0650, time_taken_in_seconds: 23
Epoch [1/1], Step [2429/13804], Loss: 2.6114, Perplexity: 13.6183, time_taken_in_seconds: 24
Epoch [1/1], Step [2430/13804], Loss: 2.6546, Perplexity: 14.2192, time_taken_in_seconds: 25
Epoch [1/1], Step [2431/13804], Loss: 2.6039, Perplexity: 13.5158, time_taken_in_seconds: 25
Epoch [1/1], Step [2432/13804], Loss: 2.6692, Perplexity: 14.4278, time_taken_in_seconds: 26
Epoch [1/1], Step [2433/13804], Loss: 2.5816, Perplexity: 13.2184, time_taken_in_seconds: 27
Epoch [1/1], Step [2434/13804], Loss: 2.8188, Perplexity: 16.7567, time_taken_in_seconds: 28
Epoch [1/1], Step [2435/13804], Loss: 2.9412, Perplexity: 18.9380, time_taken_in_seconds: 29
Epoch [1/1], Step [2436/13804], Loss: 2.8464, Perplexity: 17.2259, time_taken_in_seconds: 30
Epoch [1/1], Step [2437/13804], Loss: 2.9308, Perplexity: 18.7424, time_taken_in_seconds: 30
Epoch [1/1], Step [2438/13804], Loss: 2.4501, Perplexity: 11.5901, time_taken_in_seconds: 31
Epoch [1/1], Step [2439/13804], Loss: 2.9732, Perplexity: 19.5552, time_taken_in_seconds: 32
Epoch [1/1], Step [2440/13804], Loss: 2.7930, Perplexity: 16.3299, time_taken_in_seconds: 33
Epoch [1/1], Step [2441/13804], Loss: 3.1131, Perplexity: 22.4902, time_taken_in_seconds: 34
Epoch [1/1], Step [2442/13804], Loss: 2.7726, Perplexity: 16.0009, time_taken_in_seconds: 35
Epoch [1/1], Step [2443/13804], Loss: 2.7844, Perplexity: 16.1898, time_taken_in_seconds: 35
Epoch [1/1], Step [2444/13804], Loss: 2.7880, Perplexity: 16.2484, time_taken_in_seconds: 36
Epoch [1/1], Step [2445/13804], Loss: 3.2232, Perplexity: 25.1072, time_taken_in_seconds: 37
Epoch [1/1], Step [2446/13804], Loss: 2.5784, Perplexity: 13.1758, time_taken_in_seconds: 38
Epoch [1/1], Step [2447/13804], Loss: 3.0734, Perplexity: 21.6163, time_taken_in_seconds: 39
Epoch [1/1], Step [2448/13804], Loss: 2.5036, Perplexity: 12.2268, time_taken_in_seconds: 40
Epoch [1/1], Step [2449/13804], Loss: 2.5814, Perplexity: 13.2153, time_taken_in_seconds: 40
Epoch [1/1], Step [2450/13804], Loss: 2.8736, Perplexity: 17.7015, time_taken_in_seconds: 41
Epoch [1/1], Step [2451/13804], Loss: 2.5902, Perplexity: 13.3321, time_taken_in_seconds: 42
Epoch [1/1], Step [2452/13804], Loss: 2.7407, Perplexity: 15.4976, time_taken_in_seconds: 43
Epoch [1/1], Step [2453/13804], Loss: 2.9725, Perplexity: 19.5415, time_taken_in_seconds: 44
Epoch [1/1], Step [2454/13804], Loss: 2.8412, Perplexity: 17.1356, time_taken_in_seconds: 45
Epoch [1/1], Step [2455/13804], Loss: 2.6637, Perplexity: 14.3491, time_taken_in_seconds: 45
Epoch [1/1], Step [2456/13804], Loss: 2.6751, Perplexity: 14.5140, time_taken_in_seconds: 46
Epoch [1/1], Step [2457/13804], Loss: 3.2164, Perplexity: 24.9387, time_taken_in_seconds: 47
Epoch [1/1], Step [2458/13804], Loss: 2.8563, Perplexity: 17.3962, time_taken_in_seconds: 48
Epoch [1/1], Step [2459/13804], Loss: 3.1779, Perplexity: 23.9973, time_taken_in_seconds: 49
Epoch [1/1], Step [2460/13804], Loss: 3.4746, Perplexity: 32.2840, time_taken_in_seconds: 50
Epoch [1/1], Step [2461/13804], Loss: 2.7249, Perplexity: 15.2553, time_taken_in_seconds: 50
Epoch [1/1], Step [2462/13804], Loss: 2.8193, Perplexity: 16.7645, time_taken_in_seconds: 51
Epoch [1/1], Step [2463/13804], Loss: 2.8705, Perplexity: 17.6454, time_taken_in_seconds: 52
Epoch [1/1], Step [2464/13804], Loss: 2.6068, Perplexity: 13.5558, time_taken_in_seconds: 53
Epoch [1/1], Step [2465/13804], Loss: 2.7075, Perplexity: 14.9924, time_taken_in_seconds: 54
Epoch [1/1], Step [2466/13804], Loss: 3.0232, Perplexity: 20.5562, time_taken_in_seconds: 55
Epoch [1/1], Step [2467/13804], Loss: 2.8425, Perplexity: 17.1581, time_taken_in_seconds: 56
Epoch [1/1], Step [2468/13804], Loss: 2.8401, Perplexity: 17.1170, time_taken_in_seconds: 56
Epoch [1/1], Step [2469/13804], Loss: 2.8887, Perplexity: 17.9708, time_taken_in_seconds: 57
Epoch [1/1], Step [2470/13804], Loss: 2.6201, Perplexity: 13.7377, time_taken_in_seconds: 58
Epoch [1/1], Step [2471/13804], Loss: 3.2711, Perplexity: 26.3402, time_taken_in_seconds: 59
Epoch [1/1], Step [2472/13804], Loss: 3.0457, Perplexity: 21.0239, time_taken_in_seconds: 60
Epoch [1/1], Step [2473/13804], Loss: 2.9751, Perplexity: 19.5922, time_taken_in_seconds: 61
Epoch [1/1], Step [2474/13804], Loss: 3.6831, Perplexity: 39.7678, time_taken_in_seconds: 61
Epoch [1/1], Step [2475/13804], Loss: 2.8997, Perplexity: 18.1690, time_taken_in_seconds: 62
Epoch [1/1], Step [2476/13804], Loss: 2.9038, Perplexity: 18.2426, time_taken_in_seconds: 63
Epoch [1/1], Step [2477/13804], Loss: 2.7408, Perplexity: 15.4998, time_taken_in_seconds: 64
Epoch [1/1], Step [2478/13804], Loss: 3.1352, Perplexity: 22.9941, time_taken_in_seconds: 65
Epoch [1/1], Step [2479/13804], Loss: 2.8800, Perplexity: 17.8139, time_taken_in_seconds: 66
Epoch [1/1], Step [2480/13804], Loss: 3.1005, Perplexity: 22.2097, time_taken_in_seconds: 67
Epoch [1/1], Step [2481/13804], Loss: 2.9067, Perplexity: 18.2954, time_taken_in_seconds: 67
Epoch [1/1], Step [2482/13804], Loss: 3.3782, Perplexity: 29.3193, time_taken_in_seconds: 68
Epoch [1/1], Step [2483/13804], Loss: 2.6620, Perplexity: 14.3245, time_taken_in_seconds: 69
Epoch [1/1], Step [2484/13804], Loss: 3.0906, Perplexity: 21.9913, time_taken_in_seconds: 70
Epoch [1/1], Step [2485/13804], Loss: 2.9728, Perplexity: 19.5461, time_taken_in_seconds: 71
Epoch [1/1], Step [2486/13804], Loss: 3.5671, Perplexity: 35.4147, time_taken_in_seconds: 72
Epoch [1/1], Step [2487/13804], Loss: 2.8270, Perplexity: 16.8946, time_taken_in_seconds: 72
Epoch [1/1], Step [2488/13804], Loss: 2.9487, Perplexity: 19.0808, time_taken_in_seconds: 73
Epoch [1/1], Step [2489/13804], Loss: 4.2727, Perplexity: 71.7160, time_taken_in_seconds: 74
Epoch [1/1], Step [2490/13804], Loss: 3.2174, Perplexity: 24.9635, time_taken_in_seconds: 75
Epoch [1/1], Step [2491/13804], Loss: 3.0115, Perplexity: 20.3175, time_taken_in_seconds: 76
Epoch [1/1], Step [2492/13804], Loss: 3.1722, Perplexity: 23.8592, time_taken_in_seconds: 77
Epoch [1/1], Step [2493/13804], Loss: 2.9796, Perplexity: 19.6797, time_taken_in_seconds: 77
Epoch [1/1], Step [2494/13804], Loss: 3.6321, Perplexity: 37.7934, time_taken_in_seconds: 78
Epoch [1/1], Step [2495/13804], Loss: 3.0000, Perplexity: 20.0846, time_taken_in_seconds: 79
Epoch [1/1], Step [2496/13804], Loss: 3.0336, Perplexity: 20.7727, time_taken_in_seconds: 80
Epoch [1/1], Step [2497/13804], Loss: 2.8236, Perplexity: 16.8374, time_taken_in_seconds: 81
Epoch [1/1], Step [2498/13804], Loss: 2.8515, Perplexity: 17.3145, time_taken_in_seconds: 82
Epoch [1/1], Step [2499/13804], Loss: 2.8346, Perplexity: 17.0233, time_taken_in_seconds: 82
Epoch [1/1], Step [2500/13804], Loss: 3.0447, Perplexity: 21.0046, time_taken_in_seconds: 83
Epoch [1/1], Step [2501/13804], Loss: 3.8214, Perplexity: 45.6665, time_taken_in_seconds: 0
Epoch [1/1], Step [2502/13804], Loss: 2.9311, Perplexity: 18.7488, time_taken_in_seconds: 1
Epoch [1/1], Step [2503/13804], Loss: 2.9332, Perplexity: 18.7874, time_taken_in_seconds: 2
Epoch [1/1], Step [2504/13804], Loss: 2.9802, Perplexity: 19.6921, time_taken_in_seconds: 3
Epoch [1/1], Step [2505/13804], Loss: 2.8846, Perplexity: 17.8970, time_taken_in_seconds: 4
Epoch [1/1], Step [2506/13804], Loss: 3.0892, Perplexity: 21.9598, time_taken_in_seconds: 5
Epoch [1/1], Step [2507/13804], Loss: 3.4175, Perplexity: 30.4946, time_taken_in_seconds: 5
Epoch [1/1], Step [2508/13804], Loss: 2.9559, Perplexity: 19.2192, time_taken_in_seconds: 6
Epoch [1/1], Step [2509/13804], Loss: 2.5968, Perplexity: 13.4206, time_taken_in_seconds: 7
Epoch [1/1], Step [2510/13804], Loss: 2.9450, Perplexity: 19.0098, time_taken_in_seconds: 8
Epoch [1/1], Step [2511/13804], Loss: 3.0348, Perplexity: 20.7961, time_taken_in_seconds: 9
Epoch [1/1], Step [2512/13804], Loss: 2.7300, Perplexity: 15.3326, time_taken_in_seconds: 10
Epoch [1/1], Step [2513/13804], Loss: 3.0155, Perplexity: 20.3992, time_taken_in_seconds: 10
Epoch [1/1], Step [2514/13804], Loss: 2.7553, Perplexity: 15.7262, time_taken_in_seconds: 11
Epoch [1/1], Step [2515/13804], Loss: 2.7840, Perplexity: 16.1835, time_taken_in_seconds: 12
Epoch [1/1], Step [2516/13804], Loss: 3.2442, Perplexity: 25.6423, time_taken_in_seconds: 13
Epoch [1/1], Step [2517/13804], Loss: 2.6402, Perplexity: 14.0164, time_taken_in_seconds: 14
Epoch [1/1], Step [2518/13804], Loss: 2.9553, Perplexity: 19.2072, time_taken_in_seconds: 15
Epoch [1/1], Step [2519/13804], Loss: 2.8086, Perplexity: 16.5868, time_taken_in_seconds: 16
Epoch [1/1], Step [2520/13804], Loss: 2.7330, Perplexity: 15.3788, time_taken_in_seconds: 16
Epoch [1/1], Step [2521/13804], Loss: 2.6753, Perplexity: 14.5164, time_taken_in_seconds: 17
Epoch [1/1], Step [2522/13804], Loss: 2.7136, Perplexity: 15.0834, time_taken_in_seconds: 18
Epoch [1/1], Step [2523/13804], Loss: 2.7169, Perplexity: 15.1337, time_taken_in_seconds: 19
Epoch [1/1], Step [2524/13804], Loss: 2.8154, Perplexity: 16.7004, time_taken_in_seconds: 20
Epoch [1/1], Step [2525/13804], Loss: 2.5046, Perplexity: 12.2392, time_taken_in_seconds: 21
Epoch [1/1], Step [2526/13804], Loss: 2.9176, Perplexity: 18.4961, time_taken_in_seconds: 21
Epoch [1/1], Step [2527/13804], Loss: 3.1347, Perplexity: 22.9809, time_taken_in_seconds: 22
Epoch [1/1], Step [2528/13804], Loss: 2.8969, Perplexity: 18.1185, time_taken_in_seconds: 23
Epoch [1/1], Step [2529/13804], Loss: 2.8663, Perplexity: 17.5725, time_taken_in_seconds: 24
Epoch [1/1], Step [2530/13804], Loss: 2.7409, Perplexity: 15.5004, time_taken_in_seconds: 25
Epoch [1/1], Step [2531/13804], Loss: 3.1856, Perplexity: 24.1821, time_taken_in_seconds: 26
Epoch [1/1], Step [2532/13804], Loss: 2.7631, Perplexity: 15.8486, time_taken_in_seconds: 26
Epoch [1/1], Step [2533/13804], Loss: 3.0728, Perplexity: 21.6023, time_taken_in_seconds: 27
Epoch [1/1], Step [2534/13804], Loss: 3.0259, Perplexity: 20.6117, time_taken_in_seconds: 28
Epoch [1/1], Step [2535/13804], Loss: 3.0633, Perplexity: 21.3978, time_taken_in_seconds: 29
Epoch [1/1], Step [2536/13804], Loss: 2.8354, Perplexity: 17.0365, time_taken_in_seconds: 30
Epoch [1/1], Step [2537/13804], Loss: 2.7219, Perplexity: 15.2095, time_taken_in_seconds: 31
Epoch [1/1], Step [2538/13804], Loss: 2.6927, Perplexity: 14.7719, time_taken_in_seconds: 32
Epoch [1/1], Step [2539/13804], Loss: 3.1888, Perplexity: 24.2587, time_taken_in_seconds: 32
Epoch [1/1], Step [2540/13804], Loss: 2.5916, Perplexity: 13.3508, time_taken_in_seconds: 33
Epoch [1/1], Step [2541/13804], Loss: 2.9751, Perplexity: 19.5912, time_taken_in_seconds: 34
Epoch [1/1], Step [2542/13804], Loss: 2.9251, Perplexity: 18.6366, time_taken_in_seconds: 35
Epoch [1/1], Step [2543/13804], Loss: 2.8792, Perplexity: 17.7993, time_taken_in_seconds: 36
Epoch [1/1], Step [2544/13804], Loss: 2.7159, Perplexity: 15.1175, time_taken_in_seconds: 37
Epoch [1/1], Step [2545/13804], Loss: 3.2873, Perplexity: 26.7713, time_taken_in_seconds: 38
Epoch [1/1], Step [2546/13804], Loss: 2.9477, Perplexity: 19.0622, time_taken_in_seconds: 38
Epoch [1/1], Step [2547/13804], Loss: 3.2290, Perplexity: 25.2534, time_taken_in_seconds: 39
Epoch [1/1], Step [2548/13804], Loss: 3.4550, Perplexity: 31.6593, time_taken_in_seconds: 40
Epoch [1/1], Step [2549/13804], Loss: 2.8942, Perplexity: 18.0697, time_taken_in_seconds: 41
Epoch [1/1], Step [2550/13804], Loss: 2.7710, Perplexity: 15.9738, time_taken_in_seconds: 42
Epoch [1/1], Step [2551/13804], Loss: 2.9294, Perplexity: 18.7165, time_taken_in_seconds: 43
Epoch [1/1], Step [2552/13804], Loss: 3.2174, Perplexity: 24.9621, time_taken_in_seconds: 43
Epoch [1/1], Step [2553/13804], Loss: 2.7182, Perplexity: 15.1524, time_taken_in_seconds: 44
Epoch [1/1], Step [2554/13804], Loss: 2.7790, Perplexity: 16.1031, time_taken_in_seconds: 45
Epoch [1/1], Step [2555/13804], Loss: 2.6732, Perplexity: 14.4866, time_taken_in_seconds: 46
Epoch [1/1], Step [2556/13804], Loss: 2.9401, Perplexity: 18.9185, time_taken_in_seconds: 47
Epoch [1/1], Step [2557/13804], Loss: 3.1169, Perplexity: 22.5765, time_taken_in_seconds: 48
Epoch [1/1], Step [2558/13804], Loss: 3.2011, Perplexity: 24.5598, time_taken_in_seconds: 48
Epoch [1/1], Step [2559/13804], Loss: 2.6820, Perplexity: 14.6137, time_taken_in_seconds: 49
Epoch [1/1], Step [2560/13804], Loss: 3.0748, Perplexity: 21.6445, time_taken_in_seconds: 50
Epoch [1/1], Step [2561/13804], Loss: 2.8734, Perplexity: 17.6975, time_taken_in_seconds: 51
Epoch [1/1], Step [2562/13804], Loss: 2.6949, Perplexity: 14.8039, time_taken_in_seconds: 52
Epoch [1/1], Step [2563/13804], Loss: 2.9006, Perplexity: 18.1846, time_taken_in_seconds: 53
Epoch [1/1], Step [2564/13804], Loss: 2.8370, Perplexity: 17.0641, time_taken_in_seconds: 53
Epoch [1/1], Step [2565/13804], Loss: 2.5717, Perplexity: 13.0886, time_taken_in_seconds: 54
Epoch [1/1], Step [2566/13804], Loss: 3.0674, Perplexity: 21.4859, time_taken_in_seconds: 55
Epoch [1/1], Step [2567/13804], Loss: 2.5492, Perplexity: 12.7969, time_taken_in_seconds: 56
Epoch [1/1], Step [2568/13804], Loss: 3.1517, Perplexity: 23.3757, time_taken_in_seconds: 57
Epoch [1/1], Step [2569/13804], Loss: 3.4208, Perplexity: 30.5954, time_taken_in_seconds: 58
Epoch [1/1], Step [2570/13804], Loss: 2.9874, Perplexity: 19.8343, time_taken_in_seconds: 58
Epoch [1/1], Step [2571/13804], Loss: 2.6144, Perplexity: 13.6593, time_taken_in_seconds: 59
Epoch [1/1], Step [2572/13804], Loss: 2.5177, Perplexity: 12.4004, time_taken_in_seconds: 60
Epoch [1/1], Step [2573/13804], Loss: 2.8928, Perplexity: 18.0445, time_taken_in_seconds: 61
Epoch [1/1], Step [2574/13804], Loss: 2.9045, Perplexity: 18.2569, time_taken_in_seconds: 62
Epoch [1/1], Step [2575/13804], Loss: 3.2153, Perplexity: 24.9107, time_taken_in_seconds: 63
Epoch [1/1], Step [2576/13804], Loss: 2.6505, Perplexity: 14.1610, time_taken_in_seconds: 64
Epoch [1/1], Step [2577/13804], Loss: 2.7115, Perplexity: 15.0521, time_taken_in_seconds: 64
Epoch [1/1], Step [2578/13804], Loss: 3.5818, Perplexity: 35.9366, time_taken_in_seconds: 65
Epoch [1/1], Step [2579/13804], Loss: 2.8939, Perplexity: 18.0637, time_taken_in_seconds: 66
Epoch [1/1], Step [2580/13804], Loss: 2.5364, Perplexity: 12.6345, time_taken_in_seconds: 67
Epoch [1/1], Step [2581/13804], Loss: 2.5833, Perplexity: 13.2411, time_taken_in_seconds: 68
Epoch [1/1], Step [2582/13804], Loss: 3.4611, Perplexity: 31.8535, time_taken_in_seconds: 69
Epoch [1/1], Step [2583/13804], Loss: 2.9995, Perplexity: 20.0751, time_taken_in_seconds: 69
Epoch [1/1], Step [2584/13804], Loss: 3.1584, Perplexity: 23.5318, time_taken_in_seconds: 70
Epoch [1/1], Step [2585/13804], Loss: 2.9340, Perplexity: 18.8028, time_taken_in_seconds: 71
Epoch [1/1], Step [2586/13804], Loss: 2.7462, Perplexity: 15.5838, time_taken_in_seconds: 72
Epoch [1/1], Step [2587/13804], Loss: 2.5889, Perplexity: 13.3155, time_taken_in_seconds: 73
Epoch [1/1], Step [2588/13804], Loss: 2.9244, Perplexity: 18.6228, time_taken_in_seconds: 74
Epoch [1/1], Step [2589/13804], Loss: 2.6857, Perplexity: 14.6686, time_taken_in_seconds: 74
Epoch [1/1], Step [2590/13804], Loss: 2.9443, Perplexity: 18.9979, time_taken_in_seconds: 75
Epoch [1/1], Step [2591/13804], Loss: 2.7170, Perplexity: 15.1347, time_taken_in_seconds: 76
Epoch [1/1], Step [2592/13804], Loss: 2.7104, Perplexity: 15.0350, time_taken_in_seconds: 77
Epoch [1/1], Step [2593/13804], Loss: 2.7476, Perplexity: 15.6048, time_taken_in_seconds: 78
Epoch [1/1], Step [2594/13804], Loss: 2.8509, Perplexity: 17.3034, time_taken_in_seconds: 79
Epoch [1/1], Step [2595/13804], Loss: 2.8430, Perplexity: 17.1670, time_taken_in_seconds: 79
Epoch [1/1], Step [2596/13804], Loss: 2.8916, Perplexity: 18.0225, time_taken_in_seconds: 80
Epoch [1/1], Step [2597/13804], Loss: 2.9186, Perplexity: 18.5148, time_taken_in_seconds: 81
Epoch [1/1], Step [2598/13804], Loss: 3.0634, Perplexity: 21.3993, time_taken_in_seconds: 82
Epoch [1/1], Step [2599/13804], Loss: 2.6785, Perplexity: 14.5633, time_taken_in_seconds: 83
Epoch [1/1], Step [2600/13804], Loss: 2.7697, Perplexity: 15.9537, time_taken_in_seconds: 84
Epoch [1/1], Step [2601/13804], Loss: 2.7805, Perplexity: 16.1269, time_taken_in_seconds: 0
Epoch [1/1], Step [2602/13804], Loss: 3.0032, Perplexity: 20.1503, time_taken_in_seconds: 1
Epoch [1/1], Step [2603/13804], Loss: 2.6525, Perplexity: 14.1892, time_taken_in_seconds: 2
Epoch [1/1], Step [2604/13804], Loss: 2.4557, Perplexity: 11.6546, time_taken_in_seconds: 3
Epoch [1/1], Step [2605/13804], Loss: 2.7933, Perplexity: 16.3351, time_taken_in_seconds: 4
Epoch [1/1], Step [2606/13804], Loss: 2.7622, Perplexity: 15.8342, time_taken_in_seconds: 5
Epoch [1/1], Step [2607/13804], Loss: 3.0312, Perplexity: 20.7217, time_taken_in_seconds: 5
Epoch [1/1], Step [2608/13804], Loss: 3.0725, Perplexity: 21.5948, time_taken_in_seconds: 6
Epoch [1/1], Step [2609/13804], Loss: 2.7908, Perplexity: 16.2938, time_taken_in_seconds: 7
Epoch [1/1], Step [2610/13804], Loss: 4.1233, Perplexity: 61.7627, time_taken_in_seconds: 8
Epoch [1/1], Step [2611/13804], Loss: 2.7537, Perplexity: 15.7010, time_taken_in_seconds: 9
Epoch [1/1], Step [2612/13804], Loss: 2.7380, Perplexity: 15.4566, time_taken_in_seconds: 10
Epoch [1/1], Step [2613/13804], Loss: 2.9712, Perplexity: 19.5156, time_taken_in_seconds: 11
Epoch [1/1], Step [2614/13804], Loss: 3.0492, Perplexity: 21.0978, time_taken_in_seconds: 11
Epoch [1/1], Step [2615/13804], Loss: 3.2645, Perplexity: 26.1671, time_taken_in_seconds: 12
Epoch [1/1], Step [2616/13804], Loss: 3.0546, Perplexity: 21.2117, time_taken_in_seconds: 13
Epoch [1/1], Step [2617/13804], Loss: 3.2149, Perplexity: 24.9017, time_taken_in_seconds: 14
Epoch [1/1], Step [2618/13804], Loss: 3.2185, Perplexity: 24.9903, time_taken_in_seconds: 15
Epoch [1/1], Step [2619/13804], Loss: 3.2128, Perplexity: 24.8493, time_taken_in_seconds: 16
Epoch [1/1], Step [2620/13804], Loss: 2.6719, Perplexity: 14.4670, time_taken_in_seconds: 16
Epoch [1/1], Step [2621/13804], Loss: 3.1791, Perplexity: 24.0253, time_taken_in_seconds: 17
Epoch [1/1], Step [2622/13804], Loss: 2.7343, Perplexity: 15.3984, time_taken_in_seconds: 18
Epoch [1/1], Step [2623/13804], Loss: 3.0822, Perplexity: 21.8072, time_taken_in_seconds: 19
Epoch [1/1], Step [2624/13804], Loss: 2.7658, Perplexity: 15.8925, time_taken_in_seconds: 20
Epoch [1/1], Step [2625/13804], Loss: 3.0458, Perplexity: 21.0263, time_taken_in_seconds: 21
Epoch [1/1], Step [2626/13804], Loss: 3.1573, Perplexity: 23.5070, time_taken_in_seconds: 22
Epoch [1/1], Step [2627/13804], Loss: 3.0811, Perplexity: 21.7822, time_taken_in_seconds: 22
Epoch [1/1], Step [2628/13804], Loss: 2.8053, Perplexity: 16.5314, time_taken_in_seconds: 23
Epoch [1/1], Step [2629/13804], Loss: 3.0146, Perplexity: 20.3811, time_taken_in_seconds: 24
Epoch [1/1], Step [2630/13804], Loss: 2.8351, Perplexity: 17.0328, time_taken_in_seconds: 25
Epoch [1/1], Step [2631/13804], Loss: 3.0556, Perplexity: 21.2348, time_taken_in_seconds: 26
Epoch [1/1], Step [2632/13804], Loss: 2.9166, Perplexity: 18.4782, time_taken_in_seconds: 27
Epoch [1/1], Step [2633/13804], Loss: 2.9870, Perplexity: 19.8255, time_taken_in_seconds: 27
Epoch [1/1], Step [2634/13804], Loss: 2.6992, Perplexity: 14.8676, time_taken_in_seconds: 28
Epoch [1/1], Step [2635/13804], Loss: 2.8250, Perplexity: 16.8617, time_taken_in_seconds: 29
Epoch [1/1], Step [2636/13804], Loss: 2.7776, Perplexity: 16.0806, time_taken_in_seconds: 30
Epoch [1/1], Step [2637/13804], Loss: 3.3125, Perplexity: 27.4532, time_taken_in_seconds: 31
Epoch [1/1], Step [2638/13804], Loss: 2.8061, Perplexity: 16.5451, time_taken_in_seconds: 32
Epoch [1/1], Step [2639/13804], Loss: 2.9600, Perplexity: 19.2986, time_taken_in_seconds: 32
Epoch [1/1], Step [2640/13804], Loss: 3.2010, Perplexity: 24.5564, time_taken_in_seconds: 33
Epoch [1/1], Step [2641/13804], Loss: 3.2042, Perplexity: 24.6356, time_taken_in_seconds: 34
Epoch [1/1], Step [2642/13804], Loss: 2.6821, Perplexity: 14.6163, time_taken_in_seconds: 35
Epoch [1/1], Step [2643/13804], Loss: 3.1953, Perplexity: 24.4166, time_taken_in_seconds: 36
Epoch [1/1], Step [2644/13804], Loss: 2.7690, Perplexity: 15.9420, time_taken_in_seconds: 37
Epoch [1/1], Step [2645/13804], Loss: 2.9514, Perplexity: 19.1329, time_taken_in_seconds: 37
Epoch [1/1], Step [2646/13804], Loss: 2.7325, Perplexity: 15.3710, time_taken_in_seconds: 38
Epoch [1/1], Step [2647/13804], Loss: 2.9920, Perplexity: 19.9253, time_taken_in_seconds: 39
Epoch [1/1], Step [2648/13804], Loss: 2.6589, Perplexity: 14.2809, time_taken_in_seconds: 40
Epoch [1/1], Step [2649/13804], Loss: 3.1986, Perplexity: 24.4984, time_taken_in_seconds: 41
Epoch [1/1], Step [2650/13804], Loss: 2.5160, Perplexity: 12.3793, time_taken_in_seconds: 42
Epoch [1/1], Step [2651/13804], Loss: 2.6724, Perplexity: 14.4750, time_taken_in_seconds: 42
Epoch [1/1], Step [2652/13804], Loss: 2.9729, Perplexity: 19.5477, time_taken_in_seconds: 43
Epoch [1/1], Step [2653/13804], Loss: 2.8354, Perplexity: 17.0377, time_taken_in_seconds: 44
Epoch [1/1], Step [2654/13804], Loss: 2.9143, Perplexity: 18.4367, time_taken_in_seconds: 45
Epoch [1/1], Step [2655/13804], Loss: 2.6041, Perplexity: 13.5191, time_taken_in_seconds: 46
Epoch [1/1], Step [2656/13804], Loss: 3.5172, Perplexity: 33.6887, time_taken_in_seconds: 47
Epoch [1/1], Step [2657/13804], Loss: 3.0111, Perplexity: 20.3095, time_taken_in_seconds: 47
Epoch [1/1], Step [2658/13804], Loss: 2.6569, Perplexity: 14.2522, time_taken_in_seconds: 48
Epoch [1/1], Step [2659/13804], Loss: 3.0918, Perplexity: 22.0158, time_taken_in_seconds: 49
Epoch [1/1], Step [2660/13804], Loss: 2.6132, Perplexity: 13.6423, time_taken_in_seconds: 50
Epoch [1/1], Step [2661/13804], Loss: 3.2722, Perplexity: 26.3685, time_taken_in_seconds: 51
Epoch [1/1], Step [2662/13804], Loss: 2.9016, Perplexity: 18.2030, time_taken_in_seconds: 52
Epoch [1/1], Step [2663/13804], Loss: 3.1262, Perplexity: 22.7870, time_taken_in_seconds: 52
Epoch [1/1], Step [2664/13804], Loss: 2.8004, Perplexity: 16.4508, time_taken_in_seconds: 53
Epoch [1/1], Step [2665/13804], Loss: 2.8883, Perplexity: 17.9622, time_taken_in_seconds: 54
Epoch [1/1], Step [2666/13804], Loss: 3.5467, Perplexity: 34.6970, time_taken_in_seconds: 55
Epoch [1/1], Step [2667/13804], Loss: 2.8491, Perplexity: 17.2721, time_taken_in_seconds: 56
Epoch [1/1], Step [2668/13804], Loss: 2.7035, Perplexity: 14.9319, time_taken_in_seconds: 57
Epoch [1/1], Step [2669/13804], Loss: 2.7550, Perplexity: 15.7207, time_taken_in_seconds: 58
Epoch [1/1], Step [2670/13804], Loss: 4.2266, Perplexity: 68.4868, time_taken_in_seconds: 58
Epoch [1/1], Step [2671/13804], Loss: 2.7066, Perplexity: 14.9789, time_taken_in_seconds: 59
Epoch [1/1], Step [2672/13804], Loss: 2.6612, Perplexity: 14.3135, time_taken_in_seconds: 60
Epoch [1/1], Step [2673/13804], Loss: 2.9723, Perplexity: 19.5362, time_taken_in_seconds: 61
Epoch [1/1], Step [2674/13804], Loss: 3.4102, Perplexity: 30.2706, time_taken_in_seconds: 62
Epoch [1/1], Step [2675/13804], Loss: 2.7298, Perplexity: 15.3293, time_taken_in_seconds: 63
Epoch [1/1], Step [2676/13804], Loss: 3.0889, Perplexity: 21.9523, time_taken_in_seconds: 63
Epoch [1/1], Step [2677/13804], Loss: 2.9255, Perplexity: 18.6441, time_taken_in_seconds: 64
Epoch [1/1], Step [2678/13804], Loss: 2.5414, Perplexity: 12.6970, time_taken_in_seconds: 65
Epoch [1/1], Step [2679/13804], Loss: 2.7781, Perplexity: 16.0877, time_taken_in_seconds: 66
Epoch [1/1], Step [2680/13804], Loss: 2.5551, Perplexity: 12.8726, time_taken_in_seconds: 67
Epoch [1/1], Step [2681/13804], Loss: 2.5416, Perplexity: 12.6994, time_taken_in_seconds: 68
Epoch [1/1], Step [2682/13804], Loss: 3.4608, Perplexity: 31.8429, time_taken_in_seconds: 69
Epoch [1/1], Step [2683/13804], Loss: 2.5822, Perplexity: 13.2263, time_taken_in_seconds: 69
Epoch [1/1], Step [2684/13804], Loss: 3.1852, Perplexity: 24.1711, time_taken_in_seconds: 70
Epoch [1/1], Step [2685/13804], Loss: 2.8125, Perplexity: 16.6521, time_taken_in_seconds: 71
Epoch [1/1], Step [2686/13804], Loss: 2.4882, Perplexity: 12.0397, time_taken_in_seconds: 72
Epoch [1/1], Step [2687/13804], Loss: 2.8791, Perplexity: 17.7987, time_taken_in_seconds: 73
Epoch [1/1], Step [2688/13804], Loss: 3.0221, Perplexity: 20.5335, time_taken_in_seconds: 74
Epoch [1/1], Step [2689/13804], Loss: 2.6471, Perplexity: 14.1132, time_taken_in_seconds: 75
Epoch [1/1], Step [2690/13804], Loss: 3.0611, Perplexity: 21.3507, time_taken_in_seconds: 75
Epoch [1/1], Step [2691/13804], Loss: 3.0739, Perplexity: 21.6261, time_taken_in_seconds: 76
Epoch [1/1], Step [2692/13804], Loss: 2.5512, Perplexity: 12.8219, time_taken_in_seconds: 77
Epoch [1/1], Step [2693/13804], Loss: 2.7854, Perplexity: 16.2056, time_taken_in_seconds: 78
Epoch [1/1], Step [2694/13804], Loss: 3.6380, Perplexity: 38.0173, time_taken_in_seconds: 79
Epoch [1/1], Step [2695/13804], Loss: 2.7755, Perplexity: 16.0470, time_taken_in_seconds: 80
Epoch [1/1], Step [2696/13804], Loss: 2.7428, Perplexity: 15.5298, time_taken_in_seconds: 80
Epoch [1/1], Step [2697/13804], Loss: 3.0886, Perplexity: 21.9464, time_taken_in_seconds: 81
Epoch [1/1], Step [2698/13804], Loss: 2.8640, Perplexity: 17.5309, time_taken_in_seconds: 82
Epoch [1/1], Step [2699/13804], Loss: 2.9293, Perplexity: 18.7142, time_taken_in_seconds: 83
Epoch [1/1], Step [2700/13804], Loss: 2.7969, Perplexity: 16.3937, time_taken_in_seconds: 84
Epoch [1/1], Step [2701/13804], Loss: 3.1079, Perplexity: 22.3747, time_taken_in_seconds: 0
Epoch [1/1], Step [2702/13804], Loss: 2.6050, Perplexity: 13.5309, time_taken_in_seconds: 1
Epoch [1/1], Step [2703/13804], Loss: 2.5557, Perplexity: 12.8798, time_taken_in_seconds: 2
Epoch [1/1], Step [2704/13804], Loss: 2.6288, Perplexity: 13.8576, time_taken_in_seconds: 3
Epoch [1/1], Step [2705/13804], Loss: 2.4802, Perplexity: 11.9438, time_taken_in_seconds: 4
Epoch [1/1], Step [2706/13804], Loss: 2.6931, Perplexity: 14.7776, time_taken_in_seconds: 5
Epoch [1/1], Step [2707/13804], Loss: 2.9025, Perplexity: 18.2196, time_taken_in_seconds: 5
Epoch [1/1], Step [2708/13804], Loss: 3.0636, Perplexity: 21.4035, time_taken_in_seconds: 6
Epoch [1/1], Step [2709/13804], Loss: 2.9574, Perplexity: 19.2470, time_taken_in_seconds: 7
Epoch [1/1], Step [2710/13804], Loss: 3.2518, Perplexity: 25.8379, time_taken_in_seconds: 8
Epoch [1/1], Step [2711/13804], Loss: 2.8579, Perplexity: 17.4246, time_taken_in_seconds: 9
Epoch [1/1], Step [2712/13804], Loss: 3.3707, Perplexity: 29.0982, time_taken_in_seconds: 10
Epoch [1/1], Step [2713/13804], Loss: 2.8617, Perplexity: 17.4914, time_taken_in_seconds: 10
Epoch [1/1], Step [2714/13804], Loss: 2.9198, Perplexity: 18.5374, time_taken_in_seconds: 11
Epoch [1/1], Step [2715/13804], Loss: 3.0093, Perplexity: 20.2727, time_taken_in_seconds: 12
Epoch [1/1], Step [2716/13804], Loss: 2.8164, Perplexity: 16.7168, time_taken_in_seconds: 13
Epoch [1/1], Step [2717/13804], Loss: 2.7743, Perplexity: 16.0276, time_taken_in_seconds: 14
Epoch [1/1], Step [2718/13804], Loss: 2.8968, Perplexity: 18.1154, time_taken_in_seconds: 15
Epoch [1/1], Step [2719/13804], Loss: 2.7995, Perplexity: 16.4370, time_taken_in_seconds: 15
Epoch [1/1], Step [2720/13804], Loss: 2.9034, Perplexity: 18.2355, time_taken_in_seconds: 16
Epoch [1/1], Step [2721/13804], Loss: 2.9349, Perplexity: 18.8189, time_taken_in_seconds: 17
Epoch [1/1], Step [2722/13804], Loss: 2.7974, Perplexity: 16.4025, time_taken_in_seconds: 18
Epoch [1/1], Step [2723/13804], Loss: 2.7057, Perplexity: 14.9647, time_taken_in_seconds: 19
Epoch [1/1], Step [2724/13804], Loss: 2.5828, Perplexity: 13.2337, time_taken_in_seconds: 20
Epoch [1/1], Step [2725/13804], Loss: 2.9413, Perplexity: 18.9399, time_taken_in_seconds: 20
Epoch [1/1], Step [2726/13804], Loss: 2.6780, Perplexity: 14.5564, time_taken_in_seconds: 21
Epoch [1/1], Step [2727/13804], Loss: 2.8889, Perplexity: 17.9742, time_taken_in_seconds: 22
Epoch [1/1], Step [2728/13804], Loss: 2.9924, Perplexity: 19.9343, time_taken_in_seconds: 23
Epoch [1/1], Step [2729/13804], Loss: 3.5520, Perplexity: 34.8830, time_taken_in_seconds: 24
Epoch [1/1], Step [2730/13804], Loss: 2.9053, Perplexity: 18.2700, time_taken_in_seconds: 25
Epoch [1/1], Step [2731/13804], Loss: 3.5197, Perplexity: 33.7730, time_taken_in_seconds: 25
Epoch [1/1], Step [2732/13804], Loss: 3.1237, Perplexity: 22.7304, time_taken_in_seconds: 26
Epoch [1/1], Step [2733/13804], Loss: 2.5791, Perplexity: 13.1856, time_taken_in_seconds: 27
Epoch [1/1], Step [2734/13804], Loss: 2.9965, Perplexity: 20.0159, time_taken_in_seconds: 28
Epoch [1/1], Step [2735/13804], Loss: 3.0417, Perplexity: 20.9418, time_taken_in_seconds: 29
Epoch [1/1], Step [2736/13804], Loss: 3.0311, Perplexity: 20.7201, time_taken_in_seconds: 30
Epoch [1/1], Step [2737/13804], Loss: 2.8387, Perplexity: 17.0935, time_taken_in_seconds: 30
Epoch [1/1], Step [2738/13804], Loss: 2.8753, Perplexity: 17.7309, time_taken_in_seconds: 31
Epoch [1/1], Step [2739/13804], Loss: 2.9455, Perplexity: 19.0199, time_taken_in_seconds: 32
Epoch [1/1], Step [2740/13804], Loss: 2.6666, Perplexity: 14.3909, time_taken_in_seconds: 33
Epoch [1/1], Step [2741/13804], Loss: 2.8165, Perplexity: 16.7177, time_taken_in_seconds: 34
Epoch [1/1], Step [2742/13804], Loss: 2.6827, Perplexity: 14.6242, time_taken_in_seconds: 34
Epoch [1/1], Step [2743/13804], Loss: 2.9565, Perplexity: 19.2312, time_taken_in_seconds: 35
Epoch [1/1], Step [2744/13804], Loss: 2.5219, Perplexity: 12.4521, time_taken_in_seconds: 36
Epoch [1/1], Step [2745/13804], Loss: 2.9219, Perplexity: 18.5767, time_taken_in_seconds: 37
Epoch [1/1], Step [2746/13804], Loss: 3.4624, Perplexity: 31.8939, time_taken_in_seconds: 38
Epoch [1/1], Step [2747/13804], Loss: 2.8950, Perplexity: 18.0837, time_taken_in_seconds: 39
Epoch [1/1], Step [2748/13804], Loss: 2.6226, Perplexity: 13.7721, time_taken_in_seconds: 40
Epoch [1/1], Step [2749/13804], Loss: 2.7780, Perplexity: 16.0872, time_taken_in_seconds: 40
Epoch [1/1], Step [2750/13804], Loss: 2.8978, Perplexity: 18.1341, time_taken_in_seconds: 41
Epoch [1/1], Step [2751/13804], Loss: 3.1114, Perplexity: 22.4526, time_taken_in_seconds: 42
Epoch [1/1], Step [2752/13804], Loss: 2.8124, Perplexity: 16.6506, time_taken_in_seconds: 43
Epoch [1/1], Step [2753/13804], Loss: 2.9065, Perplexity: 18.2930, time_taken_in_seconds: 44
Epoch [1/1], Step [2754/13804], Loss: 3.1805, Perplexity: 24.0588, time_taken_in_seconds: 45
Epoch [1/1], Step [2755/13804], Loss: 2.7729, Perplexity: 16.0049, time_taken_in_seconds: 46
Epoch [1/1], Step [2756/13804], Loss: 2.8899, Perplexity: 17.9922, time_taken_in_seconds: 47
Epoch [1/1], Step [2757/13804], Loss: 2.9280, Perplexity: 18.6910, time_taken_in_seconds: 47
Epoch [1/1], Step [2758/13804], Loss: 2.7279, Perplexity: 15.3015, time_taken_in_seconds: 48
Epoch [1/1], Step [2759/13804], Loss: 2.8156, Perplexity: 16.7024, time_taken_in_seconds: 49
Epoch [1/1], Step [2760/13804], Loss: 2.6391, Perplexity: 14.0007, time_taken_in_seconds: 50
Epoch [1/1], Step [2761/13804], Loss: 2.8288, Perplexity: 16.9259, time_taken_in_seconds: 51
Epoch [1/1], Step [2762/13804], Loss: 2.9384, Perplexity: 18.8851, time_taken_in_seconds: 51
Epoch [1/1], Step [2763/13804], Loss: 3.2833, Perplexity: 26.6643, time_taken_in_seconds: 52
Epoch [1/1], Step [2764/13804], Loss: 2.5953, Perplexity: 13.4008, time_taken_in_seconds: 53
Epoch [1/1], Step [2765/13804], Loss: 2.6861, Perplexity: 14.6741, time_taken_in_seconds: 54
Epoch [1/1], Step [2766/13804], Loss: 2.6960, Perplexity: 14.8199, time_taken_in_seconds: 55
Epoch [1/1], Step [2767/13804], Loss: 3.0196, Perplexity: 20.4822, time_taken_in_seconds: 56
Epoch [1/1], Step [2768/13804], Loss: 2.9725, Perplexity: 19.5403, time_taken_in_seconds: 57
Epoch [1/1], Step [2769/13804], Loss: 2.9855, Perplexity: 19.7966, time_taken_in_seconds: 57
Epoch [1/1], Step [2770/13804], Loss: 2.9854, Perplexity: 19.7940, time_taken_in_seconds: 58
Epoch [1/1], Step [2771/13804], Loss: 2.6248, Perplexity: 13.8012, time_taken_in_seconds: 59
Epoch [1/1], Step [2772/13804], Loss: 2.5340, Perplexity: 12.6040, time_taken_in_seconds: 60
Epoch [1/1], Step [2773/13804], Loss: 2.7151, Perplexity: 15.1068, time_taken_in_seconds: 61
Epoch [1/1], Step [2774/13804], Loss: 2.9774, Perplexity: 19.6360, time_taken_in_seconds: 62
Epoch [1/1], Step [2775/13804], Loss: 2.8173, Perplexity: 16.7322, time_taken_in_seconds: 62
Epoch [1/1], Step [2776/13804], Loss: 3.1057, Perplexity: 22.3259, time_taken_in_seconds: 63
Epoch [1/1], Step [2777/13804], Loss: 3.3541, Perplexity: 28.6204, time_taken_in_seconds: 64
Epoch [1/1], Step [2778/13804], Loss: 3.0479, Perplexity: 21.0715, time_taken_in_seconds: 65
Epoch [1/1], Step [2779/13804], Loss: 2.9182, Perplexity: 18.5076, time_taken_in_seconds: 66
Epoch [1/1], Step [2780/13804], Loss: 2.8896, Perplexity: 17.9858, time_taken_in_seconds: 67
Epoch [1/1], Step [2781/13804], Loss: 2.7527, Perplexity: 15.6852, time_taken_in_seconds: 67
Epoch [1/1], Step [2782/13804], Loss: 3.1226, Perplexity: 22.7058, time_taken_in_seconds: 68
Epoch [1/1], Step [2783/13804], Loss: 3.0384, Perplexity: 20.8717, time_taken_in_seconds: 69
Epoch [1/1], Step [2784/13804], Loss: 2.8450, Perplexity: 17.2013, time_taken_in_seconds: 70
Epoch [1/1], Step [2785/13804], Loss: 2.7993, Perplexity: 16.4333, time_taken_in_seconds: 71
Epoch [1/1], Step [2786/13804], Loss: 2.7902, Perplexity: 16.2841, time_taken_in_seconds: 72
Epoch [1/1], Step [2787/13804], Loss: 3.2449, Perplexity: 25.6587, time_taken_in_seconds: 72
Epoch [1/1], Step [2788/13804], Loss: 2.6095, Perplexity: 13.5918, time_taken_in_seconds: 73
Epoch [1/1], Step [2789/13804], Loss: 3.2189, Perplexity: 24.9995, time_taken_in_seconds: 74
Epoch [1/1], Step [2790/13804], Loss: 3.3944, Perplexity: 29.7956, time_taken_in_seconds: 75
Epoch [1/1], Step [2791/13804], Loss: 2.7443, Perplexity: 15.5544, time_taken_in_seconds: 76
Epoch [1/1], Step [2792/13804], Loss: 2.9535, Perplexity: 19.1725, time_taken_in_seconds: 77
Epoch [1/1], Step [2793/13804], Loss: 2.9928, Perplexity: 19.9414, time_taken_in_seconds: 77
Epoch [1/1], Step [2794/13804], Loss: 2.6518, Perplexity: 14.1800, time_taken_in_seconds: 78
Epoch [1/1], Step [2795/13804], Loss: 2.6594, Perplexity: 14.2884, time_taken_in_seconds: 79
Epoch [1/1], Step [2796/13804], Loss: 2.5478, Perplexity: 12.7795, time_taken_in_seconds: 80
Epoch [1/1], Step [2797/13804], Loss: 2.8880, Perplexity: 17.9566, time_taken_in_seconds: 81
Epoch [1/1], Step [2798/13804], Loss: 2.9844, Perplexity: 19.7740, time_taken_in_seconds: 82
Epoch [1/1], Step [2799/13804], Loss: 2.6766, Perplexity: 14.5355, time_taken_in_seconds: 82
Epoch [1/1], Step [2800/13804], Loss: 2.7608, Perplexity: 15.8122, time_taken_in_seconds: 83
Epoch [1/1], Step [2801/13804], Loss: 2.9524, Perplexity: 19.1523, time_taken_in_seconds: 0
Epoch [1/1], Step [2802/13804], Loss: 2.6770, Perplexity: 14.5416, time_taken_in_seconds: 1
Epoch [1/1], Step [2803/13804], Loss: 2.9059, Perplexity: 18.2811, time_taken_in_seconds: 2
Epoch [1/1], Step [2804/13804], Loss: 3.0234, Perplexity: 20.5609, time_taken_in_seconds: 3
Epoch [1/1], Step [2805/13804], Loss: 2.6790, Perplexity: 14.5700, time_taken_in_seconds: 4
Epoch [1/1], Step [2806/13804], Loss: 2.3905, Perplexity: 10.9194, time_taken_in_seconds: 5
Epoch [1/1], Step [2807/13804], Loss: 2.9215, Perplexity: 18.5690, time_taken_in_seconds: 5
Epoch [1/1], Step [2808/13804], Loss: 2.6859, Perplexity: 14.6709, time_taken_in_seconds: 6
Epoch [1/1], Step [2809/13804], Loss: 2.9238, Perplexity: 18.6115, time_taken_in_seconds: 7
Epoch [1/1], Step [2810/13804], Loss: 2.5608, Perplexity: 12.9461, time_taken_in_seconds: 8
Epoch [1/1], Step [2811/13804], Loss: 2.7612, Perplexity: 15.8184, time_taken_in_seconds: 9
Epoch [1/1], Step [2812/13804], Loss: 2.7071, Perplexity: 14.9854, time_taken_in_seconds: 10
Epoch [1/1], Step [2813/13804], Loss: 3.2319, Perplexity: 25.3265, time_taken_in_seconds: 10
Epoch [1/1], Step [2814/13804], Loss: 2.8183, Perplexity: 16.7486, time_taken_in_seconds: 11
Epoch [1/1], Step [2815/13804], Loss: 3.0450, Perplexity: 21.0103, time_taken_in_seconds: 12
Epoch [1/1], Step [2816/13804], Loss: 2.6259, Perplexity: 13.8176, time_taken_in_seconds: 13
Epoch [1/1], Step [2817/13804], Loss: 2.7769, Perplexity: 16.0683, time_taken_in_seconds: 14
Epoch [1/1], Step [2818/13804], Loss: 2.8306, Perplexity: 16.9557, time_taken_in_seconds: 15
Epoch [1/1], Step [2819/13804], Loss: 2.9660, Perplexity: 19.4147, time_taken_in_seconds: 15
Epoch [1/1], Step [2820/13804], Loss: 2.7508, Perplexity: 15.6546, time_taken_in_seconds: 16
Epoch [1/1], Step [2821/13804], Loss: 3.1441, Perplexity: 23.1985, time_taken_in_seconds: 17
Epoch [1/1], Step [2822/13804], Loss: 2.4107, Perplexity: 11.1419, time_taken_in_seconds: 18
Epoch [1/1], Step [2823/13804], Loss: 2.9320, Perplexity: 18.7657, time_taken_in_seconds: 19
Epoch [1/1], Step [2824/13804], Loss: 2.8272, Perplexity: 16.8973, time_taken_in_seconds: 20
Epoch [1/1], Step [2825/13804], Loss: 3.0576, Perplexity: 21.2774, time_taken_in_seconds: 21
Epoch [1/1], Step [2826/13804], Loss: 3.3832, Perplexity: 29.4647, time_taken_in_seconds: 22
Epoch [1/1], Step [2827/13804], Loss: 2.7662, Perplexity: 15.8976, time_taken_in_seconds: 22
Epoch [1/1], Step [2828/13804], Loss: 2.5688, Perplexity: 13.0500, time_taken_in_seconds: 23
Epoch [1/1], Step [2829/13804], Loss: 2.9623, Perplexity: 19.3417, time_taken_in_seconds: 24
Epoch [1/1], Step [2830/13804], Loss: 2.6074, Perplexity: 13.5633, time_taken_in_seconds: 25
Epoch [1/1], Step [2831/13804], Loss: 3.1998, Perplexity: 24.5284, time_taken_in_seconds: 26
Epoch [1/1], Step [2832/13804], Loss: 2.7268, Perplexity: 15.2840, time_taken_in_seconds: 26
Epoch [1/1], Step [2833/13804], Loss: 2.8719, Perplexity: 17.6710, time_taken_in_seconds: 27
Epoch [1/1], Step [2834/13804], Loss: 2.8341, Perplexity: 17.0146, time_taken_in_seconds: 28
Epoch [1/1], Step [2835/13804], Loss: 3.3630, Perplexity: 28.8760, time_taken_in_seconds: 29
Epoch [1/1], Step [2836/13804], Loss: 2.9429, Perplexity: 18.9707, time_taken_in_seconds: 30
Epoch [1/1], Step [2837/13804], Loss: 2.5499, Perplexity: 12.8052, time_taken_in_seconds: 31
Epoch [1/1], Step [2838/13804], Loss: 2.8948, Perplexity: 18.0807, time_taken_in_seconds: 32
Epoch [1/1], Step [2839/13804], Loss: 2.5658, Perplexity: 13.0115, time_taken_in_seconds: 32
Epoch [1/1], Step [2840/13804], Loss: 2.7873, Perplexity: 16.2377, time_taken_in_seconds: 33
Epoch [1/1], Step [2841/13804], Loss: 2.7985, Perplexity: 16.4200, time_taken_in_seconds: 34
Epoch [1/1], Step [2842/13804], Loss: 2.8677, Perplexity: 17.5960, time_taken_in_seconds: 35
Epoch [1/1], Step [2843/13804], Loss: 2.7470, Perplexity: 15.5956, time_taken_in_seconds: 36
Epoch [1/1], Step [2844/13804], Loss: 2.5808, Perplexity: 13.2081, time_taken_in_seconds: 37
Epoch [1/1], Step [2845/13804], Loss: 3.0240, Perplexity: 20.5737, time_taken_in_seconds: 37
Epoch [1/1], Step [2846/13804], Loss: 2.9470, Perplexity: 19.0496, time_taken_in_seconds: 38
Epoch [1/1], Step [2847/13804], Loss: 2.7062, Perplexity: 14.9725, time_taken_in_seconds: 39
Epoch [1/1], Step [2848/13804], Loss: 2.6782, Perplexity: 14.5592, time_taken_in_seconds: 40
Epoch [1/1], Step [2849/13804], Loss: 3.0648, Perplexity: 21.4292, time_taken_in_seconds: 41
Epoch [1/1], Step [2850/13804], Loss: 2.4034, Perplexity: 11.0607, time_taken_in_seconds: 42
Epoch [1/1], Step [2851/13804], Loss: 2.6013, Perplexity: 13.4818, time_taken_in_seconds: 42
Epoch [1/1], Step [2852/13804], Loss: 3.0771, Perplexity: 21.6958, time_taken_in_seconds: 43
Epoch [1/1], Step [2853/13804], Loss: 2.6621, Perplexity: 14.3263, time_taken_in_seconds: 44
Epoch [1/1], Step [2854/13804], Loss: 2.8301, Perplexity: 16.9464, time_taken_in_seconds: 45
Epoch [1/1], Step [2855/13804], Loss: 2.3580, Perplexity: 10.5700, time_taken_in_seconds: 46
Epoch [1/1], Step [2856/13804], Loss: 3.2884, Perplexity: 26.8000, time_taken_in_seconds: 47
Epoch [1/1], Step [2857/13804], Loss: 2.3940, Perplexity: 10.9569, time_taken_in_seconds: 48
Epoch [1/1], Step [2858/13804], Loss: 2.7561, Perplexity: 15.7390, time_taken_in_seconds: 48
Epoch [1/1], Step [2859/13804], Loss: 3.3174, Perplexity: 27.5886, time_taken_in_seconds: 49
Epoch [1/1], Step [2860/13804], Loss: 2.9324, Perplexity: 18.7726, time_taken_in_seconds: 50
Epoch [1/1], Step [2861/13804], Loss: 2.9063, Perplexity: 18.2891, time_taken_in_seconds: 51
Epoch [1/1], Step [2862/13804], Loss: 2.8118, Perplexity: 16.6393, time_taken_in_seconds: 52
Epoch [1/1], Step [2863/13804], Loss: 2.7889, Perplexity: 16.2630, time_taken_in_seconds: 53
Epoch [1/1], Step [2864/13804], Loss: 2.8759, Perplexity: 17.7406, time_taken_in_seconds: 54
Epoch [1/1], Step [2865/13804], Loss: 3.0052, Perplexity: 20.1897, time_taken_in_seconds: 54
Epoch [1/1], Step [2866/13804], Loss: 3.1045, Perplexity: 22.2973, time_taken_in_seconds: 55
Epoch [1/1], Step [2867/13804], Loss: 2.6905, Perplexity: 14.7397, time_taken_in_seconds: 56
Epoch [1/1], Step [2868/13804], Loss: 2.8888, Perplexity: 17.9709, time_taken_in_seconds: 57
Epoch [1/1], Step [2869/13804], Loss: 2.7181, Perplexity: 15.1519, time_taken_in_seconds: 58
Epoch [1/1], Step [2870/13804], Loss: 2.5665, Perplexity: 13.0203, time_taken_in_seconds: 59
Epoch [1/1], Step [2871/13804], Loss: 2.9965, Perplexity: 20.0157, time_taken_in_seconds: 59
Epoch [1/1], Step [2872/13804], Loss: 2.7753, Perplexity: 16.0435, time_taken_in_seconds: 60
Epoch [1/1], Step [2873/13804], Loss: 2.9108, Perplexity: 18.3711, time_taken_in_seconds: 61
Epoch [1/1], Step [2874/13804], Loss: 2.8534, Perplexity: 17.3470, time_taken_in_seconds: 62
Epoch [1/1], Step [2875/13804], Loss: 2.6721, Perplexity: 14.4702, time_taken_in_seconds: 63
Epoch [1/1], Step [2876/13804], Loss: 2.8879, Perplexity: 17.9562, time_taken_in_seconds: 64
Epoch [1/1], Step [2877/13804], Loss: 2.5273, Perplexity: 12.5199, time_taken_in_seconds: 64
Epoch [1/1], Step [2878/13804], Loss: 3.1969, Perplexity: 24.4557, time_taken_in_seconds: 65
Epoch [1/1], Step [2879/13804], Loss: 2.5624, Perplexity: 12.9666, time_taken_in_seconds: 66
Epoch [1/1], Step [2880/13804], Loss: 2.8848, Perplexity: 17.8994, time_taken_in_seconds: 67
Epoch [1/1], Step [2881/13804], Loss: 2.8409, Perplexity: 17.1304, time_taken_in_seconds: 68
Epoch [1/1], Step [2882/13804], Loss: 3.0714, Perplexity: 21.5713, time_taken_in_seconds: 69
Epoch [1/1], Step [2883/13804], Loss: 2.5947, Perplexity: 13.3922, time_taken_in_seconds: 69
Epoch [1/1], Step [2884/13804], Loss: 3.0862, Perplexity: 21.8936, time_taken_in_seconds: 70
Epoch [1/1], Step [2885/13804], Loss: 2.7341, Perplexity: 15.3951, time_taken_in_seconds: 71
Epoch [1/1], Step [2886/13804], Loss: 2.7072, Perplexity: 14.9873, time_taken_in_seconds: 72
Epoch [1/1], Step [2887/13804], Loss: 3.4186, Perplexity: 30.5261, time_taken_in_seconds: 73
Epoch [1/1], Step [2888/13804], Loss: 3.1352, Perplexity: 22.9943, time_taken_in_seconds: 74
Epoch [1/1], Step [2889/13804], Loss: 2.8041, Perplexity: 16.5126, time_taken_in_seconds: 74
Epoch [1/1], Step [2890/13804], Loss: 3.2162, Perplexity: 24.9327, time_taken_in_seconds: 75
Epoch [1/1], Step [2891/13804], Loss: 2.9003, Perplexity: 18.1795, time_taken_in_seconds: 76
Epoch [1/1], Step [2892/13804], Loss: 2.7879, Perplexity: 16.2466, time_taken_in_seconds: 77
Epoch [1/1], Step [2893/13804], Loss: 2.6719, Perplexity: 14.4677, time_taken_in_seconds: 78
Epoch [1/1], Step [2894/13804], Loss: 2.5207, Perplexity: 12.4371, time_taken_in_seconds: 79
Epoch [1/1], Step [2895/13804], Loss: 2.5312, Perplexity: 12.5690, time_taken_in_seconds: 80
Epoch [1/1], Step [2896/13804], Loss: 2.8502, Perplexity: 17.2911, time_taken_in_seconds: 80
Epoch [1/1], Step [2897/13804], Loss: 2.7885, Perplexity: 16.2559, time_taken_in_seconds: 81
Epoch [1/1], Step [2898/13804], Loss: 2.6281, Perplexity: 13.8470, time_taken_in_seconds: 82
Epoch [1/1], Step [2899/13804], Loss: 4.4449, Perplexity: 85.1915, time_taken_in_seconds: 83
Epoch [1/1], Step [2900/13804], Loss: 2.8746, Perplexity: 17.7183, time_taken_in_seconds: 84
Epoch [1/1], Step [2901/13804], Loss: 2.5337, Perplexity: 12.5995, time_taken_in_seconds: 0
Epoch [1/1], Step [2902/13804], Loss: 2.8764, Perplexity: 17.7502, time_taken_in_seconds: 1
Epoch [1/1], Step [2903/13804], Loss: 2.8942, Perplexity: 18.0696, time_taken_in_seconds: 2
Epoch [1/1], Step [2904/13804], Loss: 2.9751, Perplexity: 19.5908, time_taken_in_seconds: 3
Epoch [1/1], Step [2905/13804], Loss: 3.5456, Perplexity: 34.6613, time_taken_in_seconds: 4
Epoch [1/1], Step [2906/13804], Loss: 2.6073, Perplexity: 13.5623, time_taken_in_seconds: 5
Epoch [1/1], Step [2907/13804], Loss: 2.8944, Perplexity: 18.0723, time_taken_in_seconds: 5
Epoch [1/1], Step [2908/13804], Loss: 2.8699, Perplexity: 17.6357, time_taken_in_seconds: 6
Epoch [1/1], Step [2909/13804], Loss: 3.3254, Perplexity: 27.8100, time_taken_in_seconds: 7
Epoch [1/1], Step [2910/13804], Loss: 2.8077, Perplexity: 16.5709, time_taken_in_seconds: 8
Epoch [1/1], Step [2911/13804], Loss: 3.4122, Perplexity: 30.3320, time_taken_in_seconds: 9
Epoch [1/1], Step [2912/13804], Loss: 2.8215, Perplexity: 16.8020, time_taken_in_seconds: 10
Epoch [1/1], Step [2913/13804], Loss: 2.7006, Perplexity: 14.8891, time_taken_in_seconds: 10
Epoch [1/1], Step [2914/13804], Loss: 2.8207, Perplexity: 16.7889, time_taken_in_seconds: 11
Epoch [1/1], Step [2915/13804], Loss: 2.9806, Perplexity: 19.6993, time_taken_in_seconds: 12
Epoch [1/1], Step [2916/13804], Loss: 3.5140, Perplexity: 33.5822, time_taken_in_seconds: 13
Epoch [1/1], Step [2917/13804], Loss: 3.0980, Perplexity: 22.1529, time_taken_in_seconds: 14
Epoch [1/1], Step [2918/13804], Loss: 3.0783, Perplexity: 21.7204, time_taken_in_seconds: 15
Epoch [1/1], Step [2919/13804], Loss: 3.1784, Perplexity: 24.0073, time_taken_in_seconds: 15
Epoch [1/1], Step [2920/13804], Loss: 2.6339, Perplexity: 13.9276, time_taken_in_seconds: 16
Epoch [1/1], Step [2921/13804], Loss: 2.4829, Perplexity: 11.9765, time_taken_in_seconds: 17
Epoch [1/1], Step [2922/13804], Loss: 2.9528, Perplexity: 19.1586, time_taken_in_seconds: 18
Epoch [1/1], Step [2923/13804], Loss: 3.3393, Perplexity: 28.1996, time_taken_in_seconds: 19
Epoch [1/1], Step [2924/13804], Loss: 2.8055, Perplexity: 16.5348, time_taken_in_seconds: 20
Epoch [1/1], Step [2925/13804], Loss: 2.6740, Perplexity: 14.4976, time_taken_in_seconds: 21
Epoch [1/1], Step [2926/13804], Loss: 2.5014, Perplexity: 12.2000, time_taken_in_seconds: 21
Epoch [1/1], Step [2927/13804], Loss: 3.1606, Perplexity: 23.5853, time_taken_in_seconds: 22
Epoch [1/1], Step [2928/13804], Loss: 2.7577, Perplexity: 15.7628, time_taken_in_seconds: 23
Epoch [1/1], Step [2929/13804], Loss: 2.7686, Perplexity: 15.9360, time_taken_in_seconds: 24
Epoch [1/1], Step [2930/13804], Loss: 2.8766, Perplexity: 17.7537, time_taken_in_seconds: 25
Epoch [1/1], Step [2931/13804], Loss: 3.1835, Perplexity: 24.1312, time_taken_in_seconds: 26
Epoch [1/1], Step [2932/13804], Loss: 2.7712, Perplexity: 15.9778, time_taken_in_seconds: 26
Epoch [1/1], Step [2933/13804], Loss: 2.6601, Perplexity: 14.2975, time_taken_in_seconds: 27
Epoch [1/1], Step [2934/13804], Loss: 2.6995, Perplexity: 14.8728, time_taken_in_seconds: 28
Epoch [1/1], Step [2935/13804], Loss: 3.0798, Perplexity: 21.7547, time_taken_in_seconds: 29
Epoch [1/1], Step [2936/13804], Loss: 3.0014, Perplexity: 20.1132, time_taken_in_seconds: 30
Epoch [1/1], Step [2937/13804], Loss: 2.7058, Perplexity: 14.9655, time_taken_in_seconds: 31
Epoch [1/1], Step [2938/13804], Loss: 2.8601, Perplexity: 17.4640, time_taken_in_seconds: 31
Epoch [1/1], Step [2939/13804], Loss: 2.8536, Perplexity: 17.3498, time_taken_in_seconds: 32
Epoch [1/1], Step [2940/13804], Loss: 3.5286, Perplexity: 34.0775, time_taken_in_seconds: 33
Epoch [1/1], Step [2941/13804], Loss: 2.7204, Perplexity: 15.1863, time_taken_in_seconds: 34
Epoch [1/1], Step [2942/13804], Loss: 2.7131, Perplexity: 15.0757, time_taken_in_seconds: 35
Epoch [1/1], Step [2943/13804], Loss: 2.9179, Perplexity: 18.5029, time_taken_in_seconds: 36
Epoch [1/1], Step [2944/13804], Loss: 2.9193, Perplexity: 18.5275, time_taken_in_seconds: 36
Epoch [1/1], Step [2945/13804], Loss: 3.0343, Perplexity: 20.7856, time_taken_in_seconds: 37
Epoch [1/1], Step [2946/13804], Loss: 2.9180, Perplexity: 18.5037, time_taken_in_seconds: 38
Epoch [1/1], Step [2947/13804], Loss: 2.7048, Perplexity: 14.9520, time_taken_in_seconds: 39
Epoch [1/1], Step [2948/13804], Loss: 2.6139, Perplexity: 13.6525, time_taken_in_seconds: 40
Epoch [1/1], Step [2949/13804], Loss: 2.8028, Perplexity: 16.4911, time_taken_in_seconds: 41
Epoch [1/1], Step [2950/13804], Loss: 2.8580, Perplexity: 17.4268, time_taken_in_seconds: 41
Epoch [1/1], Step [2951/13804], Loss: 2.8482, Perplexity: 17.2561, time_taken_in_seconds: 42
Epoch [1/1], Step [2952/13804], Loss: 2.8773, Perplexity: 17.7659, time_taken_in_seconds: 43
Epoch [1/1], Step [2953/13804], Loss: 2.6832, Perplexity: 14.6321, time_taken_in_seconds: 44
Epoch [1/1], Step [2954/13804], Loss: 2.8103, Perplexity: 16.6153, time_taken_in_seconds: 45
Epoch [1/1], Step [2955/13804], Loss: 3.2654, Perplexity: 26.1912, time_taken_in_seconds: 46
Epoch [1/1], Step [2956/13804], Loss: 3.1957, Perplexity: 24.4280, time_taken_in_seconds: 46
Epoch [1/1], Step [2957/13804], Loss: 2.8931, Perplexity: 18.0497, time_taken_in_seconds: 47
Epoch [1/1], Step [2958/13804], Loss: 2.7285, Perplexity: 15.3099, time_taken_in_seconds: 48
Epoch [1/1], Step [2959/13804], Loss: 3.2028, Perplexity: 24.6019, time_taken_in_seconds: 49
Epoch [1/1], Step [2960/13804], Loss: 2.9349, Perplexity: 18.8196, time_taken_in_seconds: 50
Epoch [1/1], Step [2961/13804], Loss: 3.4073, Perplexity: 30.1841, time_taken_in_seconds: 51
Epoch [1/1], Step [2962/13804], Loss: 2.9389, Perplexity: 18.8955, time_taken_in_seconds: 52
Epoch [1/1], Step [2963/13804], Loss: 2.6845, Perplexity: 14.6507, time_taken_in_seconds: 52
Epoch [1/1], Step [2964/13804], Loss: 4.1423, Perplexity: 62.9493, time_taken_in_seconds: 53
Epoch [1/1], Step [2965/13804], Loss: 2.7945, Perplexity: 16.3551, time_taken_in_seconds: 54
Epoch [1/1], Step [2966/13804], Loss: 2.6014, Perplexity: 13.4819, time_taken_in_seconds: 55
Epoch [1/1], Step [2967/13804], Loss: 3.2454, Perplexity: 25.6710, time_taken_in_seconds: 56
Epoch [1/1], Step [2968/13804], Loss: 2.7317, Perplexity: 15.3594, time_taken_in_seconds: 57
Epoch [1/1], Step [2969/13804], Loss: 2.8252, Perplexity: 16.8642, time_taken_in_seconds: 58
Epoch [1/1], Step [2970/13804], Loss: 3.3098, Perplexity: 27.3807, time_taken_in_seconds: 59
Epoch [1/1], Step [2971/13804], Loss: 3.0980, Perplexity: 22.1546, time_taken_in_seconds: 59
Epoch [1/1], Step [2972/13804], Loss: 2.9364, Perplexity: 18.8478, time_taken_in_seconds: 60
Epoch [1/1], Step [2973/13804], Loss: 2.9404, Perplexity: 18.9230, time_taken_in_seconds: 61
Epoch [1/1], Step [2974/13804], Loss: 2.7199, Perplexity: 15.1792, time_taken_in_seconds: 62
Epoch [1/1], Step [2975/13804], Loss: 3.0845, Perplexity: 21.8570, time_taken_in_seconds: 63
Epoch [1/1], Step [2976/13804], Loss: 2.5134, Perplexity: 12.3471, time_taken_in_seconds: 64
Epoch [1/1], Step [2977/13804], Loss: 2.4326, Perplexity: 11.3883, time_taken_in_seconds: 64
Epoch [1/1], Step [2978/13804], Loss: 3.3488, Perplexity: 28.4690, time_taken_in_seconds: 65
Epoch [1/1], Step [2979/13804], Loss: 2.5722, Perplexity: 13.0951, time_taken_in_seconds: 66
Epoch [1/1], Step [2980/13804], Loss: 2.7677, Perplexity: 15.9224, time_taken_in_seconds: 67
Epoch [1/1], Step [2981/13804], Loss: 2.5006, Perplexity: 12.1895, time_taken_in_seconds: 68
Epoch [1/1], Step [2982/13804], Loss: 2.8531, Perplexity: 17.3406, time_taken_in_seconds: 69
Epoch [1/1], Step [2983/13804], Loss: 2.9174, Perplexity: 18.4936, time_taken_in_seconds: 69
Epoch [1/1], Step [2984/13804], Loss: 3.5812, Perplexity: 35.9149, time_taken_in_seconds: 70
Epoch [1/1], Step [2985/13804], Loss: 2.8418, Perplexity: 17.1461, time_taken_in_seconds: 71
Epoch [1/1], Step [2986/13804], Loss: 2.5772, Perplexity: 13.1603, time_taken_in_seconds: 72
Epoch [1/1], Step [2987/13804], Loss: 2.8027, Perplexity: 16.4894, time_taken_in_seconds: 73
Epoch [1/1], Step [2988/13804], Loss: 2.5604, Perplexity: 12.9415, time_taken_in_seconds: 74
Epoch [1/1], Step [2989/13804], Loss: 3.3098, Perplexity: 27.3806, time_taken_in_seconds: 74
Epoch [1/1], Step [2990/13804], Loss: 3.5095, Perplexity: 33.4315, time_taken_in_seconds: 75
Epoch [1/1], Step [2991/13804], Loss: 2.5857, Perplexity: 13.2724, time_taken_in_seconds: 76
Epoch [1/1], Step [2992/13804], Loss: 2.9918, Perplexity: 19.9222, time_taken_in_seconds: 77
Epoch [1/1], Step [2993/13804], Loss: 2.8716, Perplexity: 17.6657, time_taken_in_seconds: 78
Epoch [1/1], Step [2994/13804], Loss: 2.8423, Perplexity: 17.1555, time_taken_in_seconds: 79
Epoch [1/1], Step [2995/13804], Loss: 2.8508, Perplexity: 17.3016, time_taken_in_seconds: 79
Epoch [1/1], Step [2996/13804], Loss: 2.8133, Perplexity: 16.6654, time_taken_in_seconds: 80
Epoch [1/1], Step [2997/13804], Loss: 2.6392, Perplexity: 14.0018, time_taken_in_seconds: 81
Epoch [1/1], Step [2998/13804], Loss: 2.8732, Perplexity: 17.6934, time_taken_in_seconds: 82
Epoch [1/1], Step [2999/13804], Loss: 2.8321, Perplexity: 16.9805, time_taken_in_seconds: 83
Epoch [1/1], Step [3000/13804], Loss: 2.9147, Perplexity: 18.4428, time_taken_in_seconds: 84
Epoch [1/1], Step [3001/13804], Loss: 3.3348, Perplexity: 28.0738, time_taken_in_seconds: 0
Epoch [1/1], Step [3002/13804], Loss: 2.4173, Perplexity: 11.2154, time_taken_in_seconds: 1
Epoch [1/1], Step [3003/13804], Loss: 2.6191, Perplexity: 13.7228, time_taken_in_seconds: 2
Epoch [1/1], Step [3004/13804], Loss: 3.3529, Perplexity: 28.5869, time_taken_in_seconds: 3
Epoch [1/1], Step [3005/13804], Loss: 3.3856, Perplexity: 29.5360, time_taken_in_seconds: 4
Epoch [1/1], Step [3006/13804], Loss: 2.4846, Perplexity: 11.9965, time_taken_in_seconds: 5
Epoch [1/1], Step [3007/13804], Loss: 2.8620, Perplexity: 17.4957, time_taken_in_seconds: 5
Epoch [1/1], Step [3008/13804], Loss: 2.9616, Perplexity: 19.3293, time_taken_in_seconds: 6
Epoch [1/1], Step [3009/13804], Loss: 3.1752, Perplexity: 23.9314, time_taken_in_seconds: 7
Epoch [1/1], Step [3010/13804], Loss: 3.4605, Perplexity: 31.8336, time_taken_in_seconds: 8
Epoch [1/1], Step [3011/13804], Loss: 2.3613, Perplexity: 10.6051, time_taken_in_seconds: 9
Epoch [1/1], Step [3012/13804], Loss: 2.7267, Perplexity: 15.2823, time_taken_in_seconds: 10
Epoch [1/1], Step [3013/13804], Loss: 2.7005, Perplexity: 14.8879, time_taken_in_seconds: 10
Epoch [1/1], Step [3014/13804], Loss: 2.7954, Perplexity: 16.3689, time_taken_in_seconds: 11
Epoch [1/1], Step [3015/13804], Loss: 2.8302, Perplexity: 16.9488, time_taken_in_seconds: 12
Epoch [1/1], Step [3016/13804], Loss: 2.9717, Perplexity: 19.5260, time_taken_in_seconds: 13
Epoch [1/1], Step [3017/13804], Loss: 3.1139, Perplexity: 22.5083, time_taken_in_seconds: 14
Epoch [1/1], Step [3018/13804], Loss: 2.7861, Perplexity: 16.2169, time_taken_in_seconds: 15
Epoch [1/1], Step [3019/13804], Loss: 2.3376, Perplexity: 10.3567, time_taken_in_seconds: 16
Epoch [1/1], Step [3020/13804], Loss: 2.5550, Perplexity: 12.8710, time_taken_in_seconds: 16
Epoch [1/1], Step [3021/13804], Loss: 2.9043, Perplexity: 18.2524, time_taken_in_seconds: 17
Epoch [1/1], Step [3022/13804], Loss: 3.2016, Perplexity: 24.5722, time_taken_in_seconds: 18
Epoch [1/1], Step [3023/13804], Loss: 2.6804, Perplexity: 14.5914, time_taken_in_seconds: 19
Epoch [1/1], Step [3024/13804], Loss: 3.1044, Perplexity: 22.2959, time_taken_in_seconds: 20
Epoch [1/1], Step [3025/13804], Loss: 3.2010, Perplexity: 24.5583, time_taken_in_seconds: 21
Epoch [1/1], Step [3026/13804], Loss: 2.7956, Perplexity: 16.3718, time_taken_in_seconds: 21
Epoch [1/1], Step [3027/13804], Loss: 2.6728, Perplexity: 14.4800, time_taken_in_seconds: 22
Epoch [1/1], Step [3028/13804], Loss: 3.1624, Perplexity: 23.6269, time_taken_in_seconds: 23
Epoch [1/1], Step [3029/13804], Loss: 2.8838, Perplexity: 17.8813, time_taken_in_seconds: 24
Epoch [1/1], Step [3030/13804], Loss: 2.7322, Perplexity: 15.3662, time_taken_in_seconds: 25
Epoch [1/1], Step [3031/13804], Loss: 3.1044, Perplexity: 22.2968, time_taken_in_seconds: 26
Epoch [1/1], Step [3032/13804], Loss: 3.0057, Perplexity: 20.2000, time_taken_in_seconds: 26
Epoch [1/1], Step [3033/13804], Loss: 2.7500, Perplexity: 15.6427, time_taken_in_seconds: 27
Epoch [1/1], Step [3034/13804], Loss: 2.7979, Perplexity: 16.4108, time_taken_in_seconds: 28
Epoch [1/1], Step [3035/13804], Loss: 3.2017, Perplexity: 24.5748, time_taken_in_seconds: 29
Epoch [1/1], Step [3036/13804], Loss: 2.6235, Perplexity: 13.7845, time_taken_in_seconds: 30
Epoch [1/1], Step [3037/13804], Loss: 2.6974, Perplexity: 14.8404, time_taken_in_seconds: 31
Epoch [1/1], Step [3038/13804], Loss: 3.1036, Perplexity: 22.2780, time_taken_in_seconds: 32
Epoch [1/1], Step [3039/13804], Loss: 2.6400, Perplexity: 14.0136, time_taken_in_seconds: 32
Epoch [1/1], Step [3040/13804], Loss: 2.9218, Perplexity: 18.5754, time_taken_in_seconds: 33
Epoch [1/1], Step [3041/13804], Loss: 2.9565, Perplexity: 19.2310, time_taken_in_seconds: 34
Epoch [1/1], Step [3042/13804], Loss: 2.8762, Perplexity: 17.7465, time_taken_in_seconds: 35
Epoch [1/1], Step [3043/13804], Loss: 2.7904, Perplexity: 16.2877, time_taken_in_seconds: 36
Epoch [1/1], Step [3044/13804], Loss: 2.6210, Perplexity: 13.7492, time_taken_in_seconds: 37
Epoch [1/1], Step [3045/13804], Loss: 2.5239, Perplexity: 12.4776, time_taken_in_seconds: 38
Epoch [1/1], Step [3046/13804], Loss: 2.7593, Perplexity: 15.7885, time_taken_in_seconds: 38
Epoch [1/1], Step [3047/13804], Loss: 2.7670, Perplexity: 15.9105, time_taken_in_seconds: 39
Epoch [1/1], Step [3048/13804], Loss: 3.0723, Perplexity: 21.5907, time_taken_in_seconds: 40
Epoch [1/1], Step [3049/13804], Loss: 2.9540, Perplexity: 19.1825, time_taken_in_seconds: 41
Epoch [1/1], Step [3050/13804], Loss: 3.1221, Perplexity: 22.6946, time_taken_in_seconds: 42
Epoch [1/1], Step [3051/13804], Loss: 3.5377, Perplexity: 34.3877, time_taken_in_seconds: 43
Epoch [1/1], Step [3052/13804], Loss: 3.6220, Perplexity: 37.4114, time_taken_in_seconds: 43
Epoch [1/1], Step [3053/13804], Loss: 3.1570, Perplexity: 23.5001, time_taken_in_seconds: 44
Epoch [1/1], Step [3054/13804], Loss: 3.0514, Perplexity: 21.1448, time_taken_in_seconds: 45
Epoch [1/1], Step [3055/13804], Loss: 3.0081, Perplexity: 20.2487, time_taken_in_seconds: 46
Epoch [1/1], Step [3056/13804], Loss: 2.6558, Perplexity: 14.2360, time_taken_in_seconds: 47
Epoch [1/1], Step [3057/13804], Loss: 3.0554, Perplexity: 21.2302, time_taken_in_seconds: 48
Epoch [1/1], Step [3058/13804], Loss: 2.6164, Perplexity: 13.6865, time_taken_in_seconds: 48
Epoch [1/1], Step [3059/13804], Loss: 2.5512, Perplexity: 12.8227, time_taken_in_seconds: 49
Epoch [1/1], Step [3060/13804], Loss: 2.8645, Perplexity: 17.5394, time_taken_in_seconds: 50
Epoch [1/1], Step [3061/13804], Loss: 2.9771, Perplexity: 19.6317, time_taken_in_seconds: 51
Epoch [1/1], Step [3062/13804], Loss: 2.7016, Perplexity: 14.9043, time_taken_in_seconds: 52
Epoch [1/1], Step [3063/13804], Loss: 2.9736, Perplexity: 19.5616, time_taken_in_seconds: 53
Epoch [1/1], Step [3064/13804], Loss: 2.6177, Perplexity: 13.7043, time_taken_in_seconds: 53
Epoch [1/1], Step [3065/13804], Loss: 2.8381, Perplexity: 17.0833, time_taken_in_seconds: 54
Epoch [1/1], Step [3066/13804], Loss: 2.9116, Perplexity: 18.3854, time_taken_in_seconds: 55
Epoch [1/1], Step [3067/13804], Loss: 2.5758, Perplexity: 13.1417, time_taken_in_seconds: 56
Epoch [1/1], Step [3068/13804], Loss: 2.7149, Perplexity: 15.1027, time_taken_in_seconds: 57
Epoch [1/1], Step [3069/13804], Loss: 2.3581, Perplexity: 10.5713, time_taken_in_seconds: 58
Epoch [1/1], Step [3070/13804], Loss: 2.7037, Perplexity: 14.9352, time_taken_in_seconds: 59
Epoch [1/1], Step [3071/13804], Loss: 2.6924, Perplexity: 14.7671, time_taken_in_seconds: 59
Epoch [1/1], Step [3072/13804], Loss: 2.8484, Perplexity: 17.2607, time_taken_in_seconds: 60
Epoch [1/1], Step [3073/13804], Loss: 4.1931, Perplexity: 66.2282, time_taken_in_seconds: 61
Epoch [1/1], Step [3074/13804], Loss: 2.7260, Perplexity: 15.2718, time_taken_in_seconds: 62
Epoch [1/1], Step [3075/13804], Loss: 2.6772, Perplexity: 14.5443, time_taken_in_seconds: 63
Epoch [1/1], Step [3076/13804], Loss: 2.4887, Perplexity: 12.0461, time_taken_in_seconds: 64
Epoch [1/1], Step [3077/13804], Loss: 2.8620, Perplexity: 17.4958, time_taken_in_seconds: 64
Epoch [1/1], Step [3078/13804], Loss: 2.9138, Perplexity: 18.4265, time_taken_in_seconds: 65
Epoch [1/1], Step [3079/13804], Loss: 3.0066, Perplexity: 20.2180, time_taken_in_seconds: 66
Epoch [1/1], Step [3080/13804], Loss: 2.8913, Perplexity: 18.0162, time_taken_in_seconds: 67
Epoch [1/1], Step [3081/13804], Loss: 3.3603, Perplexity: 28.7976, time_taken_in_seconds: 68
Epoch [1/1], Step [3082/13804], Loss: 2.9639, Perplexity: 19.3733, time_taken_in_seconds: 69
Epoch [1/1], Step [3083/13804], Loss: 2.7089, Perplexity: 15.0132, time_taken_in_seconds: 69
Epoch [1/1], Step [3084/13804], Loss: 2.6797, Perplexity: 14.5803, time_taken_in_seconds: 70
Epoch [1/1], Step [3085/13804], Loss: 2.6583, Perplexity: 14.2723, time_taken_in_seconds: 71
Epoch [1/1], Step [3086/13804], Loss: 2.8077, Perplexity: 16.5721, time_taken_in_seconds: 72
Epoch [1/1], Step [3087/13804], Loss: 2.8134, Perplexity: 16.6662, time_taken_in_seconds: 73
Epoch [1/1], Step [3088/13804], Loss: 3.3977, Perplexity: 29.8960, time_taken_in_seconds: 74
Epoch [1/1], Step [3089/13804], Loss: 3.0550, Perplexity: 21.2208, time_taken_in_seconds: 74
Epoch [1/1], Step [3090/13804], Loss: 2.4976, Perplexity: 12.1533, time_taken_in_seconds: 75
Epoch [1/1], Step [3091/13804], Loss: 2.8142, Perplexity: 16.6790, time_taken_in_seconds: 76
Epoch [1/1], Step [3092/13804], Loss: 2.6628, Perplexity: 14.3368, time_taken_in_seconds: 77
Epoch [1/1], Step [3093/13804], Loss: 2.7930, Perplexity: 16.3305, time_taken_in_seconds: 78
Epoch [1/1], Step [3094/13804], Loss: 2.9922, Perplexity: 19.9289, time_taken_in_seconds: 79
Epoch [1/1], Step [3095/13804], Loss: 2.8441, Perplexity: 17.1855, time_taken_in_seconds: 79
Epoch [1/1], Step [3096/13804], Loss: 3.0236, Perplexity: 20.5658, time_taken_in_seconds: 80
Epoch [1/1], Step [3097/13804], Loss: 2.4351, Perplexity: 11.4164, time_taken_in_seconds: 81
Epoch [1/1], Step [3098/13804], Loss: 2.7823, Perplexity: 16.1557, time_taken_in_seconds: 82
Epoch [1/1], Step [3099/13804], Loss: 2.7209, Perplexity: 15.1933, time_taken_in_seconds: 83
Epoch [1/1], Step [3100/13804], Loss: 2.7261, Perplexity: 15.2735, time_taken_in_seconds: 84
Epoch [1/1], Step [3101/13804], Loss: 3.2579, Perplexity: 25.9948, time_taken_in_seconds: 0
Epoch [1/1], Step [3102/13804], Loss: 2.6870, Perplexity: 14.6875, time_taken_in_seconds: 1
Epoch [1/1], Step [3103/13804], Loss: 2.9323, Perplexity: 18.7704, time_taken_in_seconds: 2
Epoch [1/1], Step [3104/13804], Loss: 3.0311, Perplexity: 20.7208, time_taken_in_seconds: 3
Epoch [1/1], Step [3105/13804], Loss: 2.5586, Perplexity: 12.9179, time_taken_in_seconds: 4
Epoch [1/1], Step [3106/13804], Loss: 3.2788, Perplexity: 26.5441, time_taken_in_seconds: 5
Epoch [1/1], Step [3107/13804], Loss: 2.9452, Perplexity: 19.0151, time_taken_in_seconds: 5
Epoch [1/1], Step [3108/13804], Loss: 3.0761, Perplexity: 21.6740, time_taken_in_seconds: 6
Epoch [1/1], Step [3109/13804], Loss: 2.7561, Perplexity: 15.7379, time_taken_in_seconds: 7
Epoch [1/1], Step [3110/13804], Loss: 3.4758, Perplexity: 32.3241, time_taken_in_seconds: 8
Epoch [1/1], Step [3111/13804], Loss: 2.9804, Perplexity: 19.6967, time_taken_in_seconds: 9
Epoch [1/1], Step [3112/13804], Loss: 2.8364, Perplexity: 17.0537, time_taken_in_seconds: 10
Epoch [1/1], Step [3113/13804], Loss: 2.5750, Perplexity: 13.1316, time_taken_in_seconds: 11
Epoch [1/1], Step [3114/13804], Loss: 2.7251, Perplexity: 15.2573, time_taken_in_seconds: 11
Epoch [1/1], Step [3115/13804], Loss: 3.0033, Perplexity: 20.1522, time_taken_in_seconds: 12
Epoch [1/1], Step [3116/13804], Loss: 2.3673, Perplexity: 10.6688, time_taken_in_seconds: 13
Epoch [1/1], Step [3117/13804], Loss: 3.0284, Perplexity: 20.6638, time_taken_in_seconds: 14
Epoch [1/1], Step [3118/13804], Loss: 3.0802, Perplexity: 21.7620, time_taken_in_seconds: 15
Epoch [1/1], Step [3119/13804], Loss: 3.0293, Perplexity: 20.6833, time_taken_in_seconds: 16
Epoch [1/1], Step [3120/13804], Loss: 2.7031, Perplexity: 14.9260, time_taken_in_seconds: 17
Epoch [1/1], Step [3121/13804], Loss: 2.9740, Perplexity: 19.5692, time_taken_in_seconds: 17
Epoch [1/1], Step [3122/13804], Loss: 3.0175, Perplexity: 20.4411, time_taken_in_seconds: 18
Epoch [1/1], Step [3123/13804], Loss: 2.7855, Perplexity: 16.2075, time_taken_in_seconds: 19
Epoch [1/1], Step [3124/13804], Loss: 2.9588, Perplexity: 19.2755, time_taken_in_seconds: 20
Epoch [1/1], Step [3125/13804], Loss: 2.6452, Perplexity: 14.0862, time_taken_in_seconds: 21
Epoch [1/1], Step [3126/13804], Loss: 2.7184, Perplexity: 15.1554, time_taken_in_seconds: 21
Epoch [1/1], Step [3127/13804], Loss: 2.5341, Perplexity: 12.6054, time_taken_in_seconds: 22
Epoch [1/1], Step [3128/13804], Loss: 2.5835, Perplexity: 13.2433, time_taken_in_seconds: 23
Epoch [1/1], Step [3129/13804], Loss: 2.9212, Perplexity: 18.5641, time_taken_in_seconds: 24
Epoch [1/1], Step [3130/13804], Loss: 2.4367, Perplexity: 11.4356, time_taken_in_seconds: 25
Epoch [1/1], Step [3131/13804], Loss: 2.7917, Perplexity: 16.3095, time_taken_in_seconds: 26
Epoch [1/1], Step [3132/13804], Loss: 2.7572, Perplexity: 15.7549, time_taken_in_seconds: 27
Epoch [1/1], Step [3133/13804], Loss: 2.7988, Perplexity: 16.4257, time_taken_in_seconds: 27
Epoch [1/1], Step [3134/13804], Loss: 2.6777, Perplexity: 14.5520, time_taken_in_seconds: 28
Epoch [1/1], Step [3135/13804], Loss: 2.9182, Perplexity: 18.5089, time_taken_in_seconds: 29
Epoch [1/1], Step [3136/13804], Loss: 2.9085, Perplexity: 18.3292, time_taken_in_seconds: 30
Epoch [1/1], Step [3137/13804], Loss: 3.5316, Perplexity: 34.1799, time_taken_in_seconds: 31
Epoch [1/1], Step [3138/13804], Loss: 2.9877, Perplexity: 19.8403, time_taken_in_seconds: 32
Epoch [1/1], Step [3139/13804], Loss: 2.5630, Perplexity: 12.9745, time_taken_in_seconds: 32
Epoch [1/1], Step [3140/13804], Loss: 2.7989, Perplexity: 16.4259, time_taken_in_seconds: 33
Epoch [1/1], Step [3141/13804], Loss: 2.9199, Perplexity: 18.5387, time_taken_in_seconds: 34
Epoch [1/1], Step [3142/13804], Loss: 2.9638, Perplexity: 19.3711, time_taken_in_seconds: 35
Epoch [1/1], Step [3143/13804], Loss: 2.7748, Perplexity: 16.0347, time_taken_in_seconds: 36
Epoch [1/1], Step [3144/13804], Loss: 3.1452, Perplexity: 23.2246, time_taken_in_seconds: 37
Epoch [1/1], Step [3145/13804], Loss: 3.0241, Perplexity: 20.5749, time_taken_in_seconds: 37
Epoch [1/1], Step [3146/13804], Loss: 2.7360, Perplexity: 15.4258, time_taken_in_seconds: 38
Epoch [1/1], Step [3147/13804], Loss: 2.9935, Perplexity: 19.9559, time_taken_in_seconds: 39
Epoch [1/1], Step [3148/13804], Loss: 3.0278, Perplexity: 20.6514, time_taken_in_seconds: 40
Epoch [1/1], Step [3149/13804], Loss: 2.4884, Perplexity: 12.0415, time_taken_in_seconds: 41
Epoch [1/1], Step [3150/13804], Loss: 2.7398, Perplexity: 15.4847, time_taken_in_seconds: 42
Epoch [1/1], Step [3151/13804], Loss: 2.5016, Perplexity: 12.2023, time_taken_in_seconds: 43
Epoch [1/1], Step [3152/13804], Loss: 2.5940, Perplexity: 13.3836, time_taken_in_seconds: 43
Epoch [1/1], Step [3153/13804], Loss: 4.8969, Perplexity: 133.8774, time_taken_in_seconds: 44
Epoch [1/1], Step [3154/13804], Loss: 2.5450, Perplexity: 12.7434, time_taken_in_seconds: 45
Epoch [1/1], Step [3155/13804], Loss: 3.3767, Perplexity: 29.2736, time_taken_in_seconds: 46
Epoch [1/1], Step [3156/13804], Loss: 2.9830, Perplexity: 19.7473, time_taken_in_seconds: 47
Epoch [1/1], Step [3157/13804], Loss: 2.8376, Perplexity: 17.0739, time_taken_in_seconds: 48
Epoch [1/1], Step [3158/13804], Loss: 2.9975, Perplexity: 20.0359, time_taken_in_seconds: 48
Epoch [1/1], Step [3159/13804], Loss: 2.4916, Perplexity: 12.0810, time_taken_in_seconds: 49
Epoch [1/1], Step [3160/13804], Loss: 2.7506, Perplexity: 15.6522, time_taken_in_seconds: 50
Epoch [1/1], Step [3161/13804], Loss: 2.6736, Perplexity: 14.4917, time_taken_in_seconds: 51
Epoch [1/1], Step [3162/13804], Loss: 2.6856, Perplexity: 14.6674, time_taken_in_seconds: 52
Epoch [1/1], Step [3163/13804], Loss: 2.8445, Perplexity: 17.1923, time_taken_in_seconds: 53
Epoch [1/1], Step [3164/13804], Loss: 2.5455, Perplexity: 12.7491, time_taken_in_seconds: 53
Epoch [1/1], Step [3165/13804], Loss: 2.7882, Perplexity: 16.2520, time_taken_in_seconds: 54
Epoch [1/1], Step [3166/13804], Loss: 2.9199, Perplexity: 18.5403, time_taken_in_seconds: 55
Epoch [1/1], Step [3167/13804], Loss: 3.0438, Perplexity: 20.9838, time_taken_in_seconds: 56
Epoch [1/1], Step [3168/13804], Loss: 2.5718, Perplexity: 13.0893, time_taken_in_seconds: 57
Epoch [1/1], Step [3169/13804], Loss: 3.1749, Perplexity: 23.9251, time_taken_in_seconds: 58
Epoch [1/1], Step [3170/13804], Loss: 2.5158, Perplexity: 12.3764, time_taken_in_seconds: 58
Epoch [1/1], Step [3171/13804], Loss: 2.9317, Perplexity: 18.7592, time_taken_in_seconds: 59
Epoch [1/1], Step [3172/13804], Loss: 2.8924, Perplexity: 18.0363, time_taken_in_seconds: 60
Epoch [1/1], Step [3173/13804], Loss: 3.4353, Perplexity: 31.0400, time_taken_in_seconds: 61
Epoch [1/1], Step [3174/13804], Loss: 2.6306, Perplexity: 13.8814, time_taken_in_seconds: 62
Epoch [1/1], Step [3175/13804], Loss: 2.8083, Perplexity: 16.5810, time_taken_in_seconds: 63
Epoch [1/1], Step [3176/13804], Loss: 2.7385, Perplexity: 15.4643, time_taken_in_seconds: 63
Epoch [1/1], Step [3177/13804], Loss: 2.8618, Perplexity: 17.4933, time_taken_in_seconds: 64
Epoch [1/1], Step [3178/13804], Loss: 2.8338, Perplexity: 17.0100, time_taken_in_seconds: 65
Epoch [1/1], Step [3179/13804], Loss: 2.8372, Perplexity: 17.0676, time_taken_in_seconds: 66
Epoch [1/1], Step [3180/13804], Loss: 2.7958, Perplexity: 16.3751, time_taken_in_seconds: 67
Epoch [1/1], Step [3181/13804], Loss: 2.7771, Perplexity: 16.0722, time_taken_in_seconds: 68
Epoch [1/1], Step [3182/13804], Loss: 2.5915, Perplexity: 13.3498, time_taken_in_seconds: 68
Epoch [1/1], Step [3183/13804], Loss: 2.8409, Perplexity: 17.1306, time_taken_in_seconds: 69
Epoch [1/1], Step [3184/13804], Loss: 2.7107, Perplexity: 15.0392, time_taken_in_seconds: 70
Epoch [1/1], Step [3185/13804], Loss: 2.8774, Perplexity: 17.7676, time_taken_in_seconds: 71
Epoch [1/1], Step [3186/13804], Loss: 2.8573, Perplexity: 17.4141, time_taken_in_seconds: 72
Epoch [1/1], Step [3187/13804], Loss: 2.7071, Perplexity: 14.9850, time_taken_in_seconds: 73
Epoch [1/1], Step [3188/13804], Loss: 2.9199, Perplexity: 18.5392, time_taken_in_seconds: 74
Epoch [1/1], Step [3189/13804], Loss: 2.5159, Perplexity: 12.3777, time_taken_in_seconds: 74
Epoch [1/1], Step [3190/13804], Loss: 2.5547, Perplexity: 12.8680, time_taken_in_seconds: 75
Epoch [1/1], Step [3191/13804], Loss: 2.3961, Perplexity: 10.9807, time_taken_in_seconds: 76
Epoch [1/1], Step [3192/13804], Loss: 2.8659, Perplexity: 17.5640, time_taken_in_seconds: 77
Epoch [1/1], Step [3193/13804], Loss: 2.8222, Perplexity: 16.8138, time_taken_in_seconds: 78
Epoch [1/1], Step [3194/13804], Loss: 2.9375, Perplexity: 18.8694, time_taken_in_seconds: 79
Epoch [1/1], Step [3195/13804], Loss: 2.8268, Perplexity: 16.8914, time_taken_in_seconds: 79
Epoch [1/1], Step [3196/13804], Loss: 3.0676, Perplexity: 21.4903, time_taken_in_seconds: 80
Epoch [1/1], Step [3197/13804], Loss: 2.3503, Perplexity: 10.4890, time_taken_in_seconds: 81
Epoch [1/1], Step [3198/13804], Loss: 3.3658, Perplexity: 28.9557, time_taken_in_seconds: 82
Epoch [1/1], Step [3199/13804], Loss: 2.8023, Perplexity: 16.4825, time_taken_in_seconds: 83
Epoch [1/1], Step [3200/13804], Loss: 2.9027, Perplexity: 18.2239, time_taken_in_seconds: 83
Epoch [1/1], Step [3201/13804], Loss: 2.8410, Perplexity: 17.1335, time_taken_in_seconds: 0
Epoch [1/1], Step [3202/13804], Loss: 2.5471, Perplexity: 12.7706, time_taken_in_seconds: 1
Epoch [1/1], Step [3203/13804], Loss: 3.0496, Perplexity: 21.1077, time_taken_in_seconds: 2
Epoch [1/1], Step [3204/13804], Loss: 2.6043, Perplexity: 13.5218, time_taken_in_seconds: 3
Epoch [1/1], Step [3205/13804], Loss: 2.7719, Perplexity: 15.9889, time_taken_in_seconds: 4
Epoch [1/1], Step [3206/13804], Loss: 2.9106, Perplexity: 18.3685, time_taken_in_seconds: 5
Epoch [1/1], Step [3207/13804], Loss: 2.7111, Perplexity: 15.0456, time_taken_in_seconds: 5
Epoch [1/1], Step [3208/13804], Loss: 2.7439, Perplexity: 15.5482, time_taken_in_seconds: 6
Epoch [1/1], Step [3209/13804], Loss: 2.8679, Perplexity: 17.6001, time_taken_in_seconds: 7
Epoch [1/1], Step [3210/13804], Loss: 2.6987, Perplexity: 14.8606, time_taken_in_seconds: 8
Epoch [1/1], Step [3211/13804], Loss: 2.4502, Perplexity: 11.5909, time_taken_in_seconds: 9
Epoch [1/1], Step [3212/13804], Loss: 3.3150, Perplexity: 27.5229, time_taken_in_seconds: 9
Epoch [1/1], Step [3213/13804], Loss: 2.7455, Perplexity: 15.5724, time_taken_in_seconds: 10
Epoch [1/1], Step [3214/13804], Loss: 2.6783, Perplexity: 14.5607, time_taken_in_seconds: 11
Epoch [1/1], Step [3215/13804], Loss: 3.6863, Perplexity: 39.8981, time_taken_in_seconds: 12
Epoch [1/1], Step [3216/13804], Loss: 2.9207, Perplexity: 18.5543, time_taken_in_seconds: 13
Epoch [1/1], Step [3217/13804], Loss: 2.7419, Perplexity: 15.5158, time_taken_in_seconds: 14
Epoch [1/1], Step [3218/13804], Loss: 3.0494, Perplexity: 21.1020, time_taken_in_seconds: 14
Epoch [1/1], Step [3219/13804], Loss: 2.7840, Perplexity: 16.1840, time_taken_in_seconds: 15
Epoch [1/1], Step [3220/13804], Loss: 2.9616, Perplexity: 19.3296, time_taken_in_seconds: 16
Epoch [1/1], Step [3221/13804], Loss: 2.7103, Perplexity: 15.0336, time_taken_in_seconds: 17
Epoch [1/1], Step [3222/13804], Loss: 2.6772, Perplexity: 14.5436, time_taken_in_seconds: 18
Epoch [1/1], Step [3223/13804], Loss: 2.9795, Perplexity: 19.6785, time_taken_in_seconds: 19
Epoch [1/1], Step [3224/13804], Loss: 3.0595, Perplexity: 21.3176, time_taken_in_seconds: 19
Epoch [1/1], Step [3225/13804], Loss: 2.5555, Perplexity: 12.8779, time_taken_in_seconds: 20
Epoch [1/1], Step [3226/13804], Loss: 2.9587, Perplexity: 19.2729, time_taken_in_seconds: 21
Epoch [1/1], Step [3227/13804], Loss: 2.8952, Perplexity: 18.0874, time_taken_in_seconds: 22
Epoch [1/1], Step [3228/13804], Loss: 2.7952, Perplexity: 16.3652, time_taken_in_seconds: 23
Epoch [1/1], Step [3229/13804], Loss: 2.9208, Perplexity: 18.5566, time_taken_in_seconds: 24
Epoch [1/1], Step [3230/13804], Loss: 3.1066, Perplexity: 22.3446, time_taken_in_seconds: 24
Epoch [1/1], Step [3231/13804], Loss: 2.7524, Perplexity: 15.6799, time_taken_in_seconds: 25
Epoch [1/1], Step [3232/13804], Loss: 2.9413, Perplexity: 18.9400, time_taken_in_seconds: 26
Epoch [1/1], Step [3233/13804], Loss: 2.8477, Perplexity: 17.2475, time_taken_in_seconds: 27
Epoch [1/1], Step [3234/13804], Loss: 2.7948, Perplexity: 16.3597, time_taken_in_seconds: 28
Epoch [1/1], Step [3235/13804], Loss: 3.0697, Perplexity: 21.5351, time_taken_in_seconds: 29
Epoch [1/1], Step [3236/13804], Loss: 2.6502, Perplexity: 14.1565, time_taken_in_seconds: 29
Epoch [1/1], Step [3237/13804], Loss: 2.8313, Perplexity: 16.9683, time_taken_in_seconds: 30
Epoch [1/1], Step [3238/13804], Loss: 2.7712, Perplexity: 15.9784, time_taken_in_seconds: 31
Epoch [1/1], Step [3239/13804], Loss: 3.0283, Perplexity: 20.6630, time_taken_in_seconds: 32
Epoch [1/1], Step [3240/13804], Loss: 2.6025, Perplexity: 13.4976, time_taken_in_seconds: 33
Epoch [1/1], Step [3241/13804], Loss: 2.8597, Perplexity: 17.4565, time_taken_in_seconds: 34
Epoch [1/1], Step [3242/13804], Loss: 2.8035, Perplexity: 16.5027, time_taken_in_seconds: 34
Epoch [1/1], Step [3243/13804], Loss: 2.6899, Perplexity: 14.7305, time_taken_in_seconds: 35
Epoch [1/1], Step [3244/13804], Loss: 2.8456, Perplexity: 17.2121, time_taken_in_seconds: 36
Epoch [1/1], Step [3245/13804], Loss: 3.0287, Perplexity: 20.6710, time_taken_in_seconds: 37
Epoch [1/1], Step [3246/13804], Loss: 2.6715, Perplexity: 14.4615, time_taken_in_seconds: 38
Epoch [1/1], Step [3247/13804], Loss: 2.6534, Perplexity: 14.2016, time_taken_in_seconds: 39
Epoch [1/1], Step [3248/13804], Loss: 2.9662, Perplexity: 19.4173, time_taken_in_seconds: 39
Epoch [1/1], Step [3249/13804], Loss: 2.5295, Perplexity: 12.5478, time_taken_in_seconds: 40
Epoch [1/1], Step [3250/13804], Loss: 2.7340, Perplexity: 15.3946, time_taken_in_seconds: 41
Epoch [1/1], Step [3251/13804], Loss: 2.6165, Perplexity: 13.6877, time_taken_in_seconds: 42
Epoch [1/1], Step [3252/13804], Loss: 2.7431, Perplexity: 15.5350, time_taken_in_seconds: 43
Epoch [1/1], Step [3253/13804], Loss: 2.5664, Perplexity: 13.0194, time_taken_in_seconds: 44
Epoch [1/1], Step [3254/13804], Loss: 2.4706, Perplexity: 11.8295, time_taken_in_seconds: 44
Epoch [1/1], Step [3255/13804], Loss: 2.9255, Perplexity: 18.6427, time_taken_in_seconds: 45
Epoch [1/1], Step [3256/13804], Loss: 2.8759, Perplexity: 17.7414, time_taken_in_seconds: 46
Epoch [1/1], Step [3257/13804], Loss: 2.7736, Perplexity: 16.0162, time_taken_in_seconds: 47
Epoch [1/1], Step [3258/13804], Loss: 2.9884, Perplexity: 19.8539, time_taken_in_seconds: 48
Epoch [1/1], Step [3259/13804], Loss: 2.5465, Perplexity: 12.7629, time_taken_in_seconds: 49
Epoch [1/1], Step [3260/13804], Loss: 2.8449, Perplexity: 17.1993, time_taken_in_seconds: 50
Epoch [1/1], Step [3261/13804], Loss: 3.1319, Perplexity: 22.9174, time_taken_in_seconds: 50
Epoch [1/1], Step [3262/13804], Loss: 3.2361, Perplexity: 25.4346, time_taken_in_seconds: 51
Epoch [1/1], Step [3263/13804], Loss: 3.0700, Perplexity: 21.5418, time_taken_in_seconds: 52
Epoch [1/1], Step [3264/13804], Loss: 2.5228, Perplexity: 12.4637, time_taken_in_seconds: 53
Epoch [1/1], Step [3265/13804], Loss: 2.8882, Perplexity: 17.9602, time_taken_in_seconds: 54
Epoch [1/1], Step [3266/13804], Loss: 2.7751, Perplexity: 16.0403, time_taken_in_seconds: 55
Epoch [1/1], Step [3267/13804], Loss: 2.6372, Perplexity: 13.9736, time_taken_in_seconds: 55
Epoch [1/1], Step [3268/13804], Loss: 2.7409, Perplexity: 15.5006, time_taken_in_seconds: 56
Epoch [1/1], Step [3269/13804], Loss: 2.6931, Perplexity: 14.7767, time_taken_in_seconds: 57
Epoch [1/1], Step [3270/13804], Loss: 2.9864, Perplexity: 19.8135, time_taken_in_seconds: 58
Epoch [1/1], Step [3271/13804], Loss: 3.0804, Perplexity: 21.7667, time_taken_in_seconds: 59
Epoch [1/1], Step [3272/13804], Loss: 3.3342, Perplexity: 28.0556, time_taken_in_seconds: 60
Epoch [1/1], Step [3273/13804], Loss: 3.0422, Perplexity: 20.9502, time_taken_in_seconds: 60
Epoch [1/1], Step [3274/13804], Loss: 2.7977, Perplexity: 16.4073, time_taken_in_seconds: 61
Epoch [1/1], Step [3275/13804], Loss: 2.6653, Perplexity: 14.3719, time_taken_in_seconds: 62
Epoch [1/1], Step [3276/13804], Loss: 3.3464, Perplexity: 28.4006, time_taken_in_seconds: 63
Epoch [1/1], Step [3277/13804], Loss: 2.8041, Perplexity: 16.5116, time_taken_in_seconds: 64
Epoch [1/1], Step [3278/13804], Loss: 2.9670, Perplexity: 19.4340, time_taken_in_seconds: 65
Epoch [1/1], Step [3279/13804], Loss: 2.9843, Perplexity: 19.7722, time_taken_in_seconds: 66
Epoch [1/1], Step [3280/13804], Loss: 2.9675, Perplexity: 19.4435, time_taken_in_seconds: 66
Epoch [1/1], Step [3281/13804], Loss: 2.8867, Perplexity: 17.9340, time_taken_in_seconds: 67
Epoch [1/1], Step [3282/13804], Loss: 2.9700, Perplexity: 19.4917, time_taken_in_seconds: 68
Epoch [1/1], Step [3283/13804], Loss: 2.8583, Perplexity: 17.4326, time_taken_in_seconds: 69
Epoch [1/1], Step [3284/13804], Loss: 2.6762, Perplexity: 14.5305, time_taken_in_seconds: 70
Epoch [1/1], Step [3285/13804], Loss: 2.9974, Perplexity: 20.0338, time_taken_in_seconds: 70
Epoch [1/1], Step [3286/13804], Loss: 2.7926, Perplexity: 16.3238, time_taken_in_seconds: 71
Epoch [1/1], Step [3287/13804], Loss: 3.0866, Perplexity: 21.9015, time_taken_in_seconds: 72
Epoch [1/1], Step [3288/13804], Loss: 2.9168, Perplexity: 18.4812, time_taken_in_seconds: 73
Epoch [1/1], Step [3289/13804], Loss: 2.8035, Perplexity: 16.5027, time_taken_in_seconds: 74
Epoch [1/1], Step [3290/13804], Loss: 3.2298, Perplexity: 25.2756, time_taken_in_seconds: 75
Epoch [1/1], Step [3291/13804], Loss: 2.8051, Perplexity: 16.5286, time_taken_in_seconds: 75
Epoch [1/1], Step [3292/13804], Loss: 3.0878, Perplexity: 21.9289, time_taken_in_seconds: 76
Epoch [1/1], Step [3293/13804], Loss: 2.7047, Perplexity: 14.9493, time_taken_in_seconds: 77
Epoch [1/1], Step [3294/13804], Loss: 2.3506, Perplexity: 10.4919, time_taken_in_seconds: 78
Epoch [1/1], Step [3295/13804], Loss: 2.6776, Perplexity: 14.5504, time_taken_in_seconds: 79
Epoch [1/1], Step [3296/13804], Loss: 2.7576, Perplexity: 15.7621, time_taken_in_seconds: 80
Epoch [1/1], Step [3297/13804], Loss: 2.7472, Perplexity: 15.5993, time_taken_in_seconds: 80
Epoch [1/1], Step [3298/13804], Loss: 2.5618, Perplexity: 12.9592, time_taken_in_seconds: 81
Epoch [1/1], Step [3299/13804], Loss: 2.9429, Perplexity: 18.9706, time_taken_in_seconds: 82
Epoch [1/1], Step [3300/13804], Loss: 2.7086, Perplexity: 15.0080, time_taken_in_seconds: 83
Epoch [1/1], Step [3301/13804], Loss: 2.8419, Perplexity: 17.1485, time_taken_in_seconds: 0
Epoch [1/1], Step [3302/13804], Loss: 2.7012, Perplexity: 14.8973, time_taken_in_seconds: 1
Epoch [1/1], Step [3303/13804], Loss: 3.0448, Perplexity: 21.0057, time_taken_in_seconds: 2
Epoch [1/1], Step [3304/13804], Loss: 2.9081, Perplexity: 18.3211, time_taken_in_seconds: 3
Epoch [1/1], Step [3305/13804], Loss: 2.9901, Perplexity: 19.8881, time_taken_in_seconds: 4
Epoch [1/1], Step [3306/13804], Loss: 2.7561, Perplexity: 15.7380, time_taken_in_seconds: 4
Epoch [1/1], Step [3307/13804], Loss: 3.3575, Perplexity: 28.7162, time_taken_in_seconds: 5
Epoch [1/1], Step [3308/13804], Loss: 2.6559, Perplexity: 14.2374, time_taken_in_seconds: 6
Epoch [1/1], Step [3309/13804], Loss: 2.5943, Perplexity: 13.3867, time_taken_in_seconds: 7
Epoch [1/1], Step [3310/13804], Loss: 2.5837, Perplexity: 13.2463, time_taken_in_seconds: 8
Epoch [1/1], Step [3311/13804], Loss: 2.8891, Perplexity: 17.9768, time_taken_in_seconds: 9
Epoch [1/1], Step [3312/13804], Loss: 2.8501, Perplexity: 17.2889, time_taken_in_seconds: 9
Epoch [1/1], Step [3313/13804], Loss: 2.5266, Perplexity: 12.5109, time_taken_in_seconds: 10
Epoch [1/1], Step [3314/13804], Loss: 2.6822, Perplexity: 14.6167, time_taken_in_seconds: 11
Epoch [1/1], Step [3315/13804], Loss: 3.5205, Perplexity: 33.8027, time_taken_in_seconds: 12
Epoch [1/1], Step [3316/13804], Loss: 2.6590, Perplexity: 14.2821, time_taken_in_seconds: 13
Epoch [1/1], Step [3317/13804], Loss: 2.6038, Perplexity: 13.5148, time_taken_in_seconds: 14
Epoch [1/1], Step [3318/13804], Loss: 2.9283, Perplexity: 18.6953, time_taken_in_seconds: 14
Epoch [1/1], Step [3319/13804], Loss: 2.4771, Perplexity: 11.9065, time_taken_in_seconds: 15
Epoch [1/1], Step [3320/13804], Loss: 3.1427, Perplexity: 23.1674, time_taken_in_seconds: 16
Epoch [1/1], Step [3321/13804], Loss: 2.7814, Perplexity: 16.1419, time_taken_in_seconds: 17
Epoch [1/1], Step [3322/13804], Loss: 3.0519, Perplexity: 21.1554, time_taken_in_seconds: 18
Epoch [1/1], Step [3323/13804], Loss: 2.8316, Perplexity: 16.9732, time_taken_in_seconds: 19
Epoch [1/1], Step [3324/13804], Loss: 2.9702, Perplexity: 19.4956, time_taken_in_seconds: 19
Epoch [1/1], Step [3325/13804], Loss: 2.6267, Perplexity: 13.8282, time_taken_in_seconds: 20
Epoch [1/1], Step [3326/13804], Loss: 3.0395, Perplexity: 20.8958, time_taken_in_seconds: 21
Epoch [1/1], Step [3327/13804], Loss: 2.8487, Perplexity: 17.2647, time_taken_in_seconds: 22
Epoch [1/1], Step [3328/13804], Loss: 2.9816, Perplexity: 19.7194, time_taken_in_seconds: 23
Epoch [1/1], Step [3329/13804], Loss: 2.8278, Perplexity: 16.9086, time_taken_in_seconds: 24
Epoch [1/1], Step [3330/13804], Loss: 2.7432, Perplexity: 15.5361, time_taken_in_seconds: 25
Epoch [1/1], Step [3331/13804], Loss: 2.9179, Perplexity: 18.5026, time_taken_in_seconds: 26
Epoch [1/1], Step [3332/13804], Loss: 3.0308, Perplexity: 20.7136, time_taken_in_seconds: 27
Epoch [1/1], Step [3333/13804], Loss: 2.9345, Perplexity: 18.8122, time_taken_in_seconds: 27
Epoch [1/1], Step [3334/13804], Loss: 2.5930, Perplexity: 13.3696, time_taken_in_seconds: 28
Epoch [1/1], Step [3335/13804], Loss: 2.5443, Perplexity: 12.7339, time_taken_in_seconds: 29
Epoch [1/1], Step [3336/13804], Loss: 2.8009, Perplexity: 16.4594, time_taken_in_seconds: 30
Epoch [1/1], Step [3337/13804], Loss: 3.0535, Perplexity: 21.1893, time_taken_in_seconds: 31
Epoch [1/1], Step [3338/13804], Loss: 2.6682, Perplexity: 14.4145, time_taken_in_seconds: 32
Epoch [1/1], Step [3339/13804], Loss: 3.0066, Perplexity: 20.2188, time_taken_in_seconds: 32
Epoch [1/1], Step [3340/13804], Loss: 3.1294, Perplexity: 22.8598, time_taken_in_seconds: 33
Epoch [1/1], Step [3341/13804], Loss: 2.7476, Perplexity: 15.6047, time_taken_in_seconds: 34
Epoch [1/1], Step [3342/13804], Loss: 2.8181, Perplexity: 16.7447, time_taken_in_seconds: 35
Epoch [1/1], Step [3343/13804], Loss: 3.3948, Perplexity: 29.8092, time_taken_in_seconds: 36
Epoch [1/1], Step [3344/13804], Loss: 2.8391, Perplexity: 17.1004, time_taken_in_seconds: 37
Epoch [1/1], Step [3345/13804], Loss: 2.5402, Perplexity: 12.6823, time_taken_in_seconds: 37
Epoch [1/1], Step [3346/13804], Loss: 3.1362, Perplexity: 23.0166, time_taken_in_seconds: 38
Epoch [1/1], Step [3347/13804], Loss: 2.5633, Perplexity: 12.9787, time_taken_in_seconds: 39
Epoch [1/1], Step [3348/13804], Loss: 2.7633, Perplexity: 15.8527, time_taken_in_seconds: 40
Epoch [1/1], Step [3349/13804], Loss: 2.5335, Perplexity: 12.5974, time_taken_in_seconds: 41
Epoch [1/1], Step [3350/13804], Loss: 2.6019, Perplexity: 13.4892, time_taken_in_seconds: 41
Epoch [1/1], Step [3351/13804], Loss: 2.8401, Perplexity: 17.1168, time_taken_in_seconds: 42
Epoch [1/1], Step [3352/13804], Loss: 3.0137, Perplexity: 20.3621, time_taken_in_seconds: 43
Epoch [1/1], Step [3353/13804], Loss: 2.8090, Perplexity: 16.5932, time_taken_in_seconds: 44
Epoch [1/1], Step [3354/13804], Loss: 2.9697, Perplexity: 19.4861, time_taken_in_seconds: 45
Epoch [1/1], Step [3355/13804], Loss: 2.8554, Perplexity: 17.3819, time_taken_in_seconds: 46
Epoch [1/1], Step [3356/13804], Loss: 2.8145, Perplexity: 16.6852, time_taken_in_seconds: 47
Epoch [1/1], Step [3357/13804], Loss: 2.8532, Perplexity: 17.3436, time_taken_in_seconds: 47
Epoch [1/1], Step [3358/13804], Loss: 2.6334, Perplexity: 13.9215, time_taken_in_seconds: 48
Epoch [1/1], Step [3359/13804], Loss: 2.6632, Perplexity: 14.3416, time_taken_in_seconds: 49
Epoch [1/1], Step [3360/13804], Loss: 3.0589, Perplexity: 21.3047, time_taken_in_seconds: 50
Epoch [1/1], Step [3361/13804], Loss: 3.0598, Perplexity: 21.3231, time_taken_in_seconds: 51
Epoch [1/1], Step [3362/13804], Loss: 2.9485, Perplexity: 19.0766, time_taken_in_seconds: 52
Epoch [1/1], Step [3363/13804], Loss: 2.8126, Perplexity: 16.6533, time_taken_in_seconds: 52
Epoch [1/1], Step [3364/13804], Loss: 2.6413, Perplexity: 14.0311, time_taken_in_seconds: 53
Epoch [1/1], Step [3365/13804], Loss: 2.8284, Perplexity: 16.9187, time_taken_in_seconds: 54
Epoch [1/1], Step [3366/13804], Loss: 2.7055, Perplexity: 14.9621, time_taken_in_seconds: 55
Epoch [1/1], Step [3367/13804], Loss: 2.8677, Perplexity: 17.5963, time_taken_in_seconds: 56
Epoch [1/1], Step [3368/13804], Loss: 2.8440, Perplexity: 17.1837, time_taken_in_seconds: 56
Epoch [1/1], Step [3369/13804], Loss: 2.8286, Perplexity: 16.9216, time_taken_in_seconds: 57
Epoch [1/1], Step [3370/13804], Loss: 2.8556, Perplexity: 17.3842, time_taken_in_seconds: 58
Epoch [1/1], Step [3371/13804], Loss: 2.7485, Perplexity: 15.6192, time_taken_in_seconds: 59
Epoch [1/1], Step [3372/13804], Loss: 2.7296, Perplexity: 15.3274, time_taken_in_seconds: 60
Epoch [1/1], Step [3373/13804], Loss: 3.1275, Perplexity: 22.8161, time_taken_in_seconds: 61
Epoch [1/1], Step [3374/13804], Loss: 2.8929, Perplexity: 18.0464, time_taken_in_seconds: 62
Epoch [1/1], Step [3375/13804], Loss: 2.8753, Perplexity: 17.7315, time_taken_in_seconds: 62
Epoch [1/1], Step [3376/13804], Loss: 2.8899, Perplexity: 17.9913, time_taken_in_seconds: 63
Epoch [1/1], Step [3377/13804], Loss: 3.0335, Perplexity: 20.7690, time_taken_in_seconds: 64
Epoch [1/1], Step [3378/13804], Loss: 3.3671, Perplexity: 28.9935, time_taken_in_seconds: 65
Epoch [1/1], Step [3379/13804], Loss: 2.6545, Perplexity: 14.2175, time_taken_in_seconds: 66
Epoch [1/1], Step [3380/13804], Loss: 3.4471, Perplexity: 31.4084, time_taken_in_seconds: 67
Epoch [1/1], Step [3381/13804], Loss: 3.8503, Perplexity: 47.0072, time_taken_in_seconds: 67
Epoch [1/1], Step [3382/13804], Loss: 2.7055, Perplexity: 14.9614, time_taken_in_seconds: 68
Epoch [1/1], Step [3383/13804], Loss: 2.6539, Perplexity: 14.2087, time_taken_in_seconds: 69
Epoch [1/1], Step [3384/13804], Loss: 2.6868, Perplexity: 14.6853, time_taken_in_seconds: 70
Epoch [1/1], Step [3385/13804], Loss: 2.9495, Perplexity: 19.0967, time_taken_in_seconds: 71
Epoch [1/1], Step [3386/13804], Loss: 3.1114, Perplexity: 22.4533, time_taken_in_seconds: 71
Epoch [1/1], Step [3387/13804], Loss: 2.8978, Perplexity: 18.1333, time_taken_in_seconds: 72
Epoch [1/1], Step [3388/13804], Loss: 3.2576, Perplexity: 25.9860, time_taken_in_seconds: 73
Epoch [1/1], Step [3389/13804], Loss: 2.9468, Perplexity: 19.0442, time_taken_in_seconds: 74
Epoch [1/1], Step [3390/13804], Loss: 2.9443, Perplexity: 18.9979, time_taken_in_seconds: 75
Epoch [1/1], Step [3391/13804], Loss: 2.5376, Perplexity: 12.6494, time_taken_in_seconds: 76
Epoch [1/1], Step [3392/13804], Loss: 2.8038, Perplexity: 16.5080, time_taken_in_seconds: 76
Epoch [1/1], Step [3393/13804], Loss: 2.8313, Perplexity: 16.9673, time_taken_in_seconds: 77
Epoch [1/1], Step [3394/13804], Loss: 2.8939, Perplexity: 18.0635, time_taken_in_seconds: 78
Epoch [1/1], Step [3395/13804], Loss: 2.6527, Perplexity: 14.1922, time_taken_in_seconds: 79
Epoch [1/1], Step [3396/13804], Loss: 2.7647, Perplexity: 15.8746, time_taken_in_seconds: 80
Epoch [1/1], Step [3397/13804], Loss: 2.5447, Perplexity: 12.7397, time_taken_in_seconds: 81
Epoch [1/1], Step [3398/13804], Loss: 2.4631, Perplexity: 11.7411, time_taken_in_seconds: 81
Epoch [1/1], Step [3399/13804], Loss: 2.4167, Perplexity: 11.2091, time_taken_in_seconds: 82
Epoch [1/1], Step [3400/13804], Loss: 2.9148, Perplexity: 18.4449, time_taken_in_seconds: 83
Epoch [1/1], Step [3401/13804], Loss: 3.0846, Perplexity: 21.8583, time_taken_in_seconds: 0
Epoch [1/1], Step [3402/13804], Loss: 2.7954, Perplexity: 16.3686, time_taken_in_seconds: 1
Epoch [1/1], Step [3403/13804], Loss: 2.7783, Perplexity: 16.0922, time_taken_in_seconds: 2
Epoch [1/1], Step [3404/13804], Loss: 2.6340, Perplexity: 13.9298, time_taken_in_seconds: 3
Epoch [1/1], Step [3405/13804], Loss: 2.8028, Perplexity: 16.4915, time_taken_in_seconds: 4
Epoch [1/1], Step [3406/13804], Loss: 3.1536, Perplexity: 23.4198, time_taken_in_seconds: 5
Epoch [1/1], Step [3407/13804], Loss: 2.5422, Perplexity: 12.7073, time_taken_in_seconds: 6
Epoch [1/1], Step [3408/13804], Loss: 2.5851, Perplexity: 13.2651, time_taken_in_seconds: 6
Epoch [1/1], Step [3409/13804], Loss: 2.8150, Perplexity: 16.6925, time_taken_in_seconds: 7
Epoch [1/1], Step [3410/13804], Loss: 2.6656, Perplexity: 14.3764, time_taken_in_seconds: 8
Epoch [1/1], Step [3411/13804], Loss: 2.7295, Perplexity: 15.3251, time_taken_in_seconds: 9
Epoch [1/1], Step [3412/13804], Loss: 2.9508, Perplexity: 19.1212, time_taken_in_seconds: 10
Epoch [1/1], Step [3413/13804], Loss: 2.4466, Perplexity: 11.5485, time_taken_in_seconds: 11
Epoch [1/1], Step [3414/13804], Loss: 2.6243, Perplexity: 13.7955, time_taken_in_seconds: 11
Epoch [1/1], Step [3415/13804], Loss: 2.9588, Perplexity: 19.2749, time_taken_in_seconds: 12
Epoch [1/1], Step [3416/13804], Loss: 3.0475, Perplexity: 21.0628, time_taken_in_seconds: 13
Epoch [1/1], Step [3417/13804], Loss: 2.7893, Perplexity: 16.2694, time_taken_in_seconds: 14
Epoch [1/1], Step [3418/13804], Loss: 3.4801, Perplexity: 32.4642, time_taken_in_seconds: 15
Epoch [1/1], Step [3419/13804], Loss: 2.6847, Perplexity: 14.6545, time_taken_in_seconds: 16
Epoch [1/1], Step [3420/13804], Loss: 2.7139, Perplexity: 15.0884, time_taken_in_seconds: 16
Epoch [1/1], Step [3421/13804], Loss: 2.8287, Perplexity: 16.9240, time_taken_in_seconds: 17
Epoch [1/1], Step [3422/13804], Loss: 2.6011, Perplexity: 13.4789, time_taken_in_seconds: 18
Epoch [1/1], Step [3423/13804], Loss: 3.0301, Perplexity: 20.6991, time_taken_in_seconds: 19
Epoch [1/1], Step [3424/13804], Loss: 2.8994, Perplexity: 18.1627, time_taken_in_seconds: 20
Epoch [1/1], Step [3425/13804], Loss: 2.8155, Perplexity: 16.7017, time_taken_in_seconds: 21
Epoch [1/1], Step [3426/13804], Loss: 2.4867, Perplexity: 12.0219, time_taken_in_seconds: 21
Epoch [1/1], Step [3427/13804], Loss: 3.1353, Perplexity: 22.9965, time_taken_in_seconds: 22
Epoch [1/1], Step [3428/13804], Loss: 2.6976, Perplexity: 14.8445, time_taken_in_seconds: 23
Epoch [1/1], Step [3429/13804], Loss: 3.4230, Perplexity: 30.6601, time_taken_in_seconds: 24
Epoch [1/1], Step [3430/13804], Loss: 2.9858, Perplexity: 19.8015, time_taken_in_seconds: 25
Epoch [1/1], Step [3431/13804], Loss: 2.8706, Perplexity: 17.6483, time_taken_in_seconds: 26
Epoch [1/1], Step [3432/13804], Loss: 3.0426, Perplexity: 20.9590, time_taken_in_seconds: 26
Epoch [1/1], Step [3433/13804], Loss: 2.5965, Perplexity: 13.4173, time_taken_in_seconds: 27
Epoch [1/1], Step [3434/13804], Loss: 2.3674, Perplexity: 10.6691, time_taken_in_seconds: 28
Epoch [1/1], Step [3435/13804], Loss: 2.4921, Perplexity: 12.0861, time_taken_in_seconds: 29
Epoch [1/1], Step [3436/13804], Loss: 2.5690, Perplexity: 13.0527, time_taken_in_seconds: 30
Epoch [1/1], Step [3437/13804], Loss: 3.1001, Perplexity: 22.2001, time_taken_in_seconds: 31
Epoch [1/1], Step [3438/13804], Loss: 2.8950, Perplexity: 18.0831, time_taken_in_seconds: 31
Epoch [1/1], Step [3439/13804], Loss: 2.8266, Perplexity: 16.8875, time_taken_in_seconds: 32
Epoch [1/1], Step [3440/13804], Loss: 2.8410, Perplexity: 17.1327, time_taken_in_seconds: 33
Epoch [1/1], Step [3441/13804], Loss: 2.8007, Perplexity: 16.4562, time_taken_in_seconds: 34
Epoch [1/1], Step [3442/13804], Loss: 2.8944, Perplexity: 18.0734, time_taken_in_seconds: 35
Epoch [1/1], Step [3443/13804], Loss: 3.0172, Perplexity: 20.4333, time_taken_in_seconds: 36
Epoch [1/1], Step [3444/13804], Loss: 3.1932, Perplexity: 24.3661, time_taken_in_seconds: 37
Epoch [1/1], Step [3445/13804], Loss: 2.5560, Perplexity: 12.8839, time_taken_in_seconds: 37
Epoch [1/1], Step [3446/13804], Loss: 2.5707, Perplexity: 13.0744, time_taken_in_seconds: 38
Epoch [1/1], Step [3447/13804], Loss: 2.9760, Perplexity: 19.6086, time_taken_in_seconds: 39
Epoch [1/1], Step [3448/13804], Loss: 2.6022, Perplexity: 13.4937, time_taken_in_seconds: 40
Epoch [1/1], Step [3449/13804], Loss: 2.8897, Perplexity: 17.9884, time_taken_in_seconds: 41
Epoch [1/1], Step [3450/13804], Loss: 2.9139, Perplexity: 18.4281, time_taken_in_seconds: 42
Epoch [1/1], Step [3451/13804], Loss: 2.8510, Perplexity: 17.3047, time_taken_in_seconds: 42
Epoch [1/1], Step [3452/13804], Loss: 3.1284, Perplexity: 22.8368, time_taken_in_seconds: 43
Epoch [1/1], Step [3453/13804], Loss: 2.7013, Perplexity: 14.8989, time_taken_in_seconds: 44
Epoch [1/1], Step [3454/13804], Loss: 2.6159, Perplexity: 13.6794, time_taken_in_seconds: 45
Epoch [1/1], Step [3455/13804], Loss: 2.8143, Perplexity: 16.6821, time_taken_in_seconds: 46
Epoch [1/1], Step [3456/13804], Loss: 2.7358, Perplexity: 15.4217, time_taken_in_seconds: 47
Epoch [1/1], Step [3457/13804], Loss: 2.7149, Perplexity: 15.1033, time_taken_in_seconds: 47
Epoch [1/1], Step [3458/13804], Loss: 3.1524, Perplexity: 23.3932, time_taken_in_seconds: 48
Epoch [1/1], Step [3459/13804], Loss: 2.6830, Perplexity: 14.6287, time_taken_in_seconds: 49
Epoch [1/1], Step [3460/13804], Loss: 2.7101, Perplexity: 15.0302, time_taken_in_seconds: 50
Epoch [1/1], Step [3461/13804], Loss: 2.9098, Perplexity: 18.3531, time_taken_in_seconds: 51
Epoch [1/1], Step [3462/13804], Loss: 2.6313, Perplexity: 13.8914, time_taken_in_seconds: 52
Epoch [1/1], Step [3463/13804], Loss: 2.8989, Perplexity: 18.1544, time_taken_in_seconds: 52
Epoch [1/1], Step [3464/13804], Loss: 3.0919, Perplexity: 22.0181, time_taken_in_seconds: 53
Epoch [1/1], Step [3465/13804], Loss: 2.9649, Perplexity: 19.3936, time_taken_in_seconds: 54
Epoch [1/1], Step [3466/13804], Loss: 2.6711, Perplexity: 14.4561, time_taken_in_seconds: 55
Epoch [1/1], Step [3467/13804], Loss: 3.0758, Perplexity: 21.6675, time_taken_in_seconds: 56
Epoch [1/1], Step [3468/13804], Loss: 3.1799, Perplexity: 24.0454, time_taken_in_seconds: 57
Epoch [1/1], Step [3469/13804], Loss: 2.7033, Perplexity: 14.9295, time_taken_in_seconds: 57
Epoch [1/1], Step [3470/13804], Loss: 2.9436, Perplexity: 18.9836, time_taken_in_seconds: 58
Epoch [1/1], Step [3471/13804], Loss: 2.5159, Perplexity: 12.3772, time_taken_in_seconds: 59
Epoch [1/1], Step [3472/13804], Loss: 2.9705, Perplexity: 19.5014, time_taken_in_seconds: 60
Epoch [1/1], Step [3473/13804], Loss: 2.5260, Perplexity: 12.5032, time_taken_in_seconds: 61
Epoch [1/1], Step [3474/13804], Loss: 2.7105, Perplexity: 15.0369, time_taken_in_seconds: 62
Epoch [1/1], Step [3475/13804], Loss: 2.6092, Perplexity: 13.5881, time_taken_in_seconds: 63
Epoch [1/1], Step [3476/13804], Loss: 2.9465, Perplexity: 19.0390, time_taken_in_seconds: 64
Epoch [1/1], Step [3477/13804], Loss: 2.4425, Perplexity: 11.5015, time_taken_in_seconds: 64
Epoch [1/1], Step [3478/13804], Loss: 2.4608, Perplexity: 11.7136, time_taken_in_seconds: 65
Epoch [1/1], Step [3479/13804], Loss: 2.5037, Perplexity: 12.2274, time_taken_in_seconds: 66
Epoch [1/1], Step [3480/13804], Loss: 2.3961, Perplexity: 10.9798, time_taken_in_seconds: 67
Epoch [1/1], Step [3481/13804], Loss: 2.9578, Perplexity: 19.2553, time_taken_in_seconds: 68
Epoch [1/1], Step [3482/13804], Loss: 2.9943, Perplexity: 19.9719, time_taken_in_seconds: 69
Epoch [1/1], Step [3483/13804], Loss: 2.7307, Perplexity: 15.3434, time_taken_in_seconds: 69
Epoch [1/1], Step [3484/13804], Loss: 2.6493, Perplexity: 14.1446, time_taken_in_seconds: 70
Epoch [1/1], Step [3485/13804], Loss: 2.4031, Perplexity: 11.0575, time_taken_in_seconds: 71
Epoch [1/1], Step [3486/13804], Loss: 2.7091, Perplexity: 15.0154, time_taken_in_seconds: 72
Epoch [1/1], Step [3487/13804], Loss: 3.1903, Perplexity: 24.2963, time_taken_in_seconds: 73
Epoch [1/1], Step [3488/13804], Loss: 2.8590, Perplexity: 17.4436, time_taken_in_seconds: 74
Epoch [1/1], Step [3489/13804], Loss: 2.9374, Perplexity: 18.8660, time_taken_in_seconds: 74
Epoch [1/1], Step [3490/13804], Loss: 2.8235, Perplexity: 16.8363, time_taken_in_seconds: 75
Epoch [1/1], Step [3491/13804], Loss: 2.9282, Perplexity: 18.6932, time_taken_in_seconds: 76
Epoch [1/1], Step [3492/13804], Loss: 3.0213, Perplexity: 20.5181, time_taken_in_seconds: 77
Epoch [1/1], Step [3493/13804], Loss: 2.7747, Perplexity: 16.0343, time_taken_in_seconds: 78
Epoch [1/1], Step [3494/13804], Loss: 3.1512, Perplexity: 23.3646, time_taken_in_seconds: 79
Epoch [1/1], Step [3495/13804], Loss: 2.9453, Perplexity: 19.0170, time_taken_in_seconds: 79
Epoch [1/1], Step [3496/13804], Loss: 3.8204, Perplexity: 45.6224, time_taken_in_seconds: 80
Epoch [1/1], Step [3497/13804], Loss: 2.4797, Perplexity: 11.9379, time_taken_in_seconds: 81
Epoch [1/1], Step [3498/13804], Loss: 2.8412, Perplexity: 17.1364, time_taken_in_seconds: 82
Epoch [1/1], Step [3499/13804], Loss: 2.5872, Perplexity: 13.2923, time_taken_in_seconds: 83
Epoch [1/1], Step [3500/13804], Loss: 3.1077, Perplexity: 22.3699, time_taken_in_seconds: 84
Epoch [1/1], Step [3501/13804], Loss: 2.7015, Perplexity: 14.9018, time_taken_in_seconds: 0
Epoch [1/1], Step [3502/13804], Loss: 2.7544, Perplexity: 15.7122, time_taken_in_seconds: 1
Epoch [1/1], Step [3503/13804], Loss: 3.0809, Perplexity: 21.7774, time_taken_in_seconds: 2
Epoch [1/1], Step [3504/13804], Loss: 2.5026, Perplexity: 12.2142, time_taken_in_seconds: 3
Epoch [1/1], Step [3505/13804], Loss: 2.6943, Perplexity: 14.7949, time_taken_in_seconds: 4
Epoch [1/1], Step [3506/13804], Loss: 2.8363, Perplexity: 17.0533, time_taken_in_seconds: 5
Epoch [1/1], Step [3507/13804], Loss: 2.6578, Perplexity: 14.2643, time_taken_in_seconds: 5
Epoch [1/1], Step [3508/13804], Loss: 2.6264, Perplexity: 13.8241, time_taken_in_seconds: 6
Epoch [1/1], Step [3509/13804], Loss: 2.4381, Perplexity: 11.4518, time_taken_in_seconds: 7
Epoch [1/1], Step [3510/13804], Loss: 2.7375, Perplexity: 15.4484, time_taken_in_seconds: 8
Epoch [1/1], Step [3511/13804], Loss: 3.2120, Perplexity: 24.8281, time_taken_in_seconds: 9
Epoch [1/1], Step [3512/13804], Loss: 2.4764, Perplexity: 11.8982, time_taken_in_seconds: 9
Epoch [1/1], Step [3513/13804], Loss: 2.5229, Perplexity: 12.4646, time_taken_in_seconds: 10
Epoch [1/1], Step [3514/13804], Loss: 2.6411, Perplexity: 14.0286, time_taken_in_seconds: 11
Epoch [1/1], Step [3515/13804], Loss: 2.6256, Perplexity: 13.8132, time_taken_in_seconds: 12
Epoch [1/1], Step [3516/13804], Loss: 3.0302, Perplexity: 20.7017, time_taken_in_seconds: 13
Epoch [1/1], Step [3517/13804], Loss: 2.9039, Perplexity: 18.2443, time_taken_in_seconds: 14
Epoch [1/1], Step [3518/13804], Loss: 2.3602, Perplexity: 10.5928, time_taken_in_seconds: 14
Epoch [1/1], Step [3519/13804], Loss: 2.8328, Perplexity: 16.9934, time_taken_in_seconds: 15
Epoch [1/1], Step [3520/13804], Loss: 2.9195, Perplexity: 18.5327, time_taken_in_seconds: 16
Epoch [1/1], Step [3521/13804], Loss: 2.7165, Perplexity: 15.1273, time_taken_in_seconds: 17
Epoch [1/1], Step [3522/13804], Loss: 2.7126, Perplexity: 15.0689, time_taken_in_seconds: 18
Epoch [1/1], Step [3523/13804], Loss: 2.7993, Perplexity: 16.4326, time_taken_in_seconds: 18
Epoch [1/1], Step [3524/13804], Loss: 2.9556, Perplexity: 19.2127, time_taken_in_seconds: 19
Epoch [1/1], Step [3525/13804], Loss: 2.7345, Perplexity: 15.4013, time_taken_in_seconds: 20
Epoch [1/1], Step [3526/13804], Loss: 2.9143, Perplexity: 18.4360, time_taken_in_seconds: 21
Epoch [1/1], Step [3527/13804], Loss: 2.9506, Perplexity: 19.1172, time_taken_in_seconds: 22
Epoch [1/1], Step [3528/13804], Loss: 2.9297, Perplexity: 18.7227, time_taken_in_seconds: 23
Epoch [1/1], Step [3529/13804], Loss: 3.3431, Perplexity: 28.3074, time_taken_in_seconds: 23
Epoch [1/1], Step [3530/13804], Loss: 2.7125, Perplexity: 15.0674, time_taken_in_seconds: 24
Epoch [1/1], Step [3531/13804], Loss: 2.7812, Perplexity: 16.1388, time_taken_in_seconds: 25
Epoch [1/1], Step [3532/13804], Loss: 3.2981, Perplexity: 27.0606, time_taken_in_seconds: 26
Epoch [1/1], Step [3533/13804], Loss: 3.9893, Perplexity: 54.0159, time_taken_in_seconds: 27
Epoch [1/1], Step [3534/13804], Loss: 3.2010, Perplexity: 24.5567, time_taken_in_seconds: 28
Epoch [1/1], Step [3535/13804], Loss: 2.6907, Perplexity: 14.7416, time_taken_in_seconds: 29
Epoch [1/1], Step [3536/13804], Loss: 3.7343, Perplexity: 41.8571, time_taken_in_seconds: 29
Epoch [1/1], Step [3537/13804], Loss: 2.7568, Perplexity: 15.7491, time_taken_in_seconds: 30
Epoch [1/1], Step [3538/13804], Loss: 2.8986, Perplexity: 18.1489, time_taken_in_seconds: 31
Epoch [1/1], Step [3539/13804], Loss: 3.0124, Perplexity: 20.3352, time_taken_in_seconds: 32
Epoch [1/1], Step [3540/13804], Loss: 3.0556, Perplexity: 21.2344, time_taken_in_seconds: 33
Epoch [1/1], Step [3541/13804], Loss: 3.2225, Perplexity: 25.0904, time_taken_in_seconds: 34
Epoch [1/1], Step [3542/13804], Loss: 2.6909, Perplexity: 14.7451, time_taken_in_seconds: 34
Epoch [1/1], Step [3543/13804], Loss: 2.6771, Perplexity: 14.5435, time_taken_in_seconds: 35
Epoch [1/1], Step [3544/13804], Loss: 2.5902, Perplexity: 13.3319, time_taken_in_seconds: 36
Epoch [1/1], Step [3545/13804], Loss: 4.9835, Perplexity: 145.9869, time_taken_in_seconds: 37
Epoch [1/1], Step [3546/13804], Loss: 2.8891, Perplexity: 17.9766, time_taken_in_seconds: 38
Epoch [1/1], Step [3547/13804], Loss: 2.7923, Perplexity: 16.3185, time_taken_in_seconds: 39
Epoch [1/1], Step [3548/13804], Loss: 2.7913, Perplexity: 16.3027, time_taken_in_seconds: 40
Epoch [1/1], Step [3549/13804], Loss: 3.4414, Perplexity: 31.2317, time_taken_in_seconds: 40
Epoch [1/1], Step [3550/13804], Loss: 2.9048, Perplexity: 18.2615, time_taken_in_seconds: 41
Epoch [1/1], Step [3551/13804], Loss: 3.2213, Perplexity: 25.0617, time_taken_in_seconds: 42
Epoch [1/1], Step [3552/13804], Loss: 2.8992, Perplexity: 18.1603, time_taken_in_seconds: 43
Epoch [1/1], Step [3553/13804], Loss: 2.5215, Perplexity: 12.4468, time_taken_in_seconds: 44
Epoch [1/1], Step [3554/13804], Loss: 3.1145, Perplexity: 22.5226, time_taken_in_seconds: 45
Epoch [1/1], Step [3555/13804], Loss: 2.6967, Perplexity: 14.8314, time_taken_in_seconds: 45
Epoch [1/1], Step [3556/13804], Loss: 2.8670, Perplexity: 17.5847, time_taken_in_seconds: 46
Epoch [1/1], Step [3557/13804], Loss: 2.6603, Perplexity: 14.3009, time_taken_in_seconds: 47
Epoch [1/1], Step [3558/13804], Loss: 2.4944, Perplexity: 12.1150, time_taken_in_seconds: 48
Epoch [1/1], Step [3559/13804], Loss: 3.6492, Perplexity: 38.4424, time_taken_in_seconds: 49
Epoch [1/1], Step [3560/13804], Loss: 2.6760, Perplexity: 14.5266, time_taken_in_seconds: 50
Epoch [1/1], Step [3561/13804], Loss: 2.9015, Perplexity: 18.2016, time_taken_in_seconds: 51
Epoch [1/1], Step [3562/13804], Loss: 2.7109, Perplexity: 15.0421, time_taken_in_seconds: 51
Epoch [1/1], Step [3563/13804], Loss: 3.1498, Perplexity: 23.3315, time_taken_in_seconds: 52
Epoch [1/1], Step [3564/13804], Loss: 2.8704, Perplexity: 17.6448, time_taken_in_seconds: 53
Epoch [1/1], Step [3565/13804], Loss: 2.9879, Perplexity: 19.8437, time_taken_in_seconds: 54
Epoch [1/1], Step [3566/13804], Loss: 2.9322, Perplexity: 18.7694, time_taken_in_seconds: 55
Epoch [1/1], Step [3567/13804], Loss: 2.8588, Perplexity: 17.4401, time_taken_in_seconds: 55
Epoch [1/1], Step [3568/13804], Loss: 3.1200, Perplexity: 22.6455, time_taken_in_seconds: 56
Epoch [1/1], Step [3569/13804], Loss: 2.9132, Perplexity: 18.4152, time_taken_in_seconds: 57
Epoch [1/1], Step [3570/13804], Loss: 2.9374, Perplexity: 18.8664, time_taken_in_seconds: 58
Epoch [1/1], Step [3571/13804], Loss: 2.9147, Perplexity: 18.4431, time_taken_in_seconds: 59
Epoch [1/1], Step [3572/13804], Loss: 2.9024, Perplexity: 18.2169, time_taken_in_seconds: 60
Epoch [1/1], Step [3573/13804], Loss: 3.0304, Perplexity: 20.7051, time_taken_in_seconds: 60
Epoch [1/1], Step [3574/13804], Loss: 2.6121, Perplexity: 13.6278, time_taken_in_seconds: 61
Epoch [1/1], Step [3575/13804], Loss: 2.9371, Perplexity: 18.8603, time_taken_in_seconds: 62
Epoch [1/1], Step [3576/13804], Loss: 2.7476, Perplexity: 15.6047, time_taken_in_seconds: 63
Epoch [1/1], Step [3577/13804], Loss: 2.8677, Perplexity: 17.5962, time_taken_in_seconds: 64
Epoch [1/1], Step [3578/13804], Loss: 2.6890, Perplexity: 14.7169, time_taken_in_seconds: 65
Epoch [1/1], Step [3579/13804], Loss: 3.0563, Perplexity: 21.2486, time_taken_in_seconds: 65
Epoch [1/1], Step [3580/13804], Loss: 2.5096, Perplexity: 12.3000, time_taken_in_seconds: 66
Epoch [1/1], Step [3581/13804], Loss: 3.0222, Perplexity: 20.5368, time_taken_in_seconds: 67
Epoch [1/1], Step [3582/13804], Loss: 3.1756, Perplexity: 23.9411, time_taken_in_seconds: 68
Epoch [1/1], Step [3583/13804], Loss: 2.7365, Perplexity: 15.4333, time_taken_in_seconds: 69
Epoch [1/1], Step [3584/13804], Loss: 3.2096, Perplexity: 24.7685, time_taken_in_seconds: 70
Epoch [1/1], Step [3585/13804], Loss: 3.0104, Perplexity: 20.2958, time_taken_in_seconds: 70
Epoch [1/1], Step [3586/13804], Loss: 2.8209, Perplexity: 16.7912, time_taken_in_seconds: 71
Epoch [1/1], Step [3587/13804], Loss: 2.5933, Perplexity: 13.3740, time_taken_in_seconds: 72
Epoch [1/1], Step [3588/13804], Loss: 2.6474, Perplexity: 14.1170, time_taken_in_seconds: 73
Epoch [1/1], Step [3589/13804], Loss: 2.8466, Perplexity: 17.2300, time_taken_in_seconds: 74
Epoch [1/1], Step [3590/13804], Loss: 2.7113, Perplexity: 15.0492, time_taken_in_seconds: 75
Epoch [1/1], Step [3591/13804], Loss: 2.7346, Perplexity: 15.4029, time_taken_in_seconds: 75
Epoch [1/1], Step [3592/13804], Loss: 2.8353, Perplexity: 17.0359, time_taken_in_seconds: 76
Epoch [1/1], Step [3593/13804], Loss: 2.9620, Perplexity: 19.3375, time_taken_in_seconds: 77
Epoch [1/1], Step [3594/13804], Loss: 2.6633, Perplexity: 14.3429, time_taken_in_seconds: 78
Epoch [1/1], Step [3595/13804], Loss: 2.7728, Perplexity: 16.0033, time_taken_in_seconds: 79
Epoch [1/1], Step [3596/13804], Loss: 2.9140, Perplexity: 18.4302, time_taken_in_seconds: 80
Epoch [1/1], Step [3597/13804], Loss: 2.7727, Perplexity: 16.0018, time_taken_in_seconds: 80
Epoch [1/1], Step [3598/13804], Loss: 2.6952, Perplexity: 14.8085, time_taken_in_seconds: 81
Epoch [1/1], Step [3599/13804], Loss: 2.7652, Perplexity: 15.8818, time_taken_in_seconds: 82
Epoch [1/1], Step [3600/13804], Loss: 3.2010, Perplexity: 24.5575, time_taken_in_seconds: 83
Epoch [1/1], Step [3601/13804], Loss: 2.8754, Perplexity: 17.7319, time_taken_in_seconds: 0
Epoch [1/1], Step [3602/13804], Loss: 2.8727, Perplexity: 17.6853, time_taken_in_seconds: 1
Epoch [1/1], Step [3603/13804], Loss: 2.8505, Perplexity: 17.2971, time_taken_in_seconds: 2
Epoch [1/1], Step [3604/13804], Loss: 3.3335, Perplexity: 28.0376, time_taken_in_seconds: 3
Epoch [1/1], Step [3605/13804], Loss: 2.7806, Perplexity: 16.1281, time_taken_in_seconds: 4
Epoch [1/1], Step [3606/13804], Loss: 2.5544, Perplexity: 12.8634, time_taken_in_seconds: 4
Epoch [1/1], Step [3607/13804], Loss: 2.8549, Perplexity: 17.3725, time_taken_in_seconds: 5
Epoch [1/1], Step [3608/13804], Loss: 2.6335, Perplexity: 13.9230, time_taken_in_seconds: 6
Epoch [1/1], Step [3609/13804], Loss: 2.7015, Perplexity: 14.9027, time_taken_in_seconds: 7
Epoch [1/1], Step [3610/13804], Loss: 2.7129, Perplexity: 15.0735, time_taken_in_seconds: 8
Epoch [1/1], Step [3611/13804], Loss: 3.1023, Perplexity: 22.2497, time_taken_in_seconds: 9
Epoch [1/1], Step [3612/13804], Loss: 2.7591, Perplexity: 15.7854, time_taken_in_seconds: 9
Epoch [1/1], Step [3613/13804], Loss: 2.6284, Perplexity: 13.8510, time_taken_in_seconds: 10
Epoch [1/1], Step [3614/13804], Loss: 2.3849, Perplexity: 10.8578, time_taken_in_seconds: 11
Epoch [1/1], Step [3615/13804], Loss: 2.6728, Perplexity: 14.4797, time_taken_in_seconds: 12
Epoch [1/1], Step [3616/13804], Loss: 2.6507, Perplexity: 14.1642, time_taken_in_seconds: 13
Epoch [1/1], Step [3617/13804], Loss: 3.1996, Perplexity: 24.5216, time_taken_in_seconds: 14
Epoch [1/1], Step [3618/13804], Loss: 2.6654, Perplexity: 14.3743, time_taken_in_seconds: 14
Epoch [1/1], Step [3619/13804], Loss: 3.0527, Perplexity: 21.1723, time_taken_in_seconds: 16
Epoch [1/1], Step [3620/13804], Loss: 2.7665, Perplexity: 15.9031, time_taken_in_seconds: 16
Epoch [1/1], Step [3621/13804], Loss: 2.7241, Perplexity: 15.2432, time_taken_in_seconds: 17
Epoch [1/1], Step [3622/13804], Loss: 3.5576, Perplexity: 35.0804, time_taken_in_seconds: 18
Epoch [1/1], Step [3623/13804], Loss: 3.3500, Perplexity: 28.5027, time_taken_in_seconds: 19
Epoch [1/1], Step [3624/13804], Loss: 3.5135, Perplexity: 33.5660, time_taken_in_seconds: 20
Epoch [1/1], Step [3625/13804], Loss: 2.8415, Perplexity: 17.1415, time_taken_in_seconds: 20
Epoch [1/1], Step [3626/13804], Loss: 2.6659, Perplexity: 14.3811, time_taken_in_seconds: 21
Epoch [1/1], Step [3627/13804], Loss: 2.5509, Perplexity: 12.8190, time_taken_in_seconds: 22
Epoch [1/1], Step [3628/13804], Loss: 3.1619, Perplexity: 23.6165, time_taken_in_seconds: 23
Epoch [1/1], Step [3629/13804], Loss: 2.4856, Perplexity: 12.0083, time_taken_in_seconds: 24
Epoch [1/1], Step [3630/13804], Loss: 2.4889, Perplexity: 12.0478, time_taken_in_seconds: 25
Epoch [1/1], Step [3631/13804], Loss: 2.7331, Perplexity: 15.3801, time_taken_in_seconds: 25
Epoch [1/1], Step [3632/13804], Loss: 2.8251, Perplexity: 16.8634, time_taken_in_seconds: 26
Epoch [1/1], Step [3633/13804], Loss: 2.6168, Perplexity: 13.6918, time_taken_in_seconds: 27
Epoch [1/1], Step [3634/13804], Loss: 2.8122, Perplexity: 16.6458, time_taken_in_seconds: 28
Epoch [1/1], Step [3635/13804], Loss: 3.0786, Perplexity: 21.7282, time_taken_in_seconds: 29
Epoch [1/1], Step [3636/13804], Loss: 2.4844, Perplexity: 11.9938, time_taken_in_seconds: 30
Epoch [1/1], Step [3637/13804], Loss: 2.7164, Perplexity: 15.1254, time_taken_in_seconds: 30
Epoch [1/1], Step [3638/13804], Loss: 2.8729, Perplexity: 17.6879, time_taken_in_seconds: 31
Epoch [1/1], Step [3639/13804], Loss: 2.2542, Perplexity: 9.5278, time_taken_in_seconds: 32
Epoch [1/1], Step [3640/13804], Loss: 3.0453, Perplexity: 21.0168, time_taken_in_seconds: 33
Epoch [1/1], Step [3641/13804], Loss: 2.6250, Perplexity: 13.8049, time_taken_in_seconds: 34
Epoch [1/1], Step [3642/13804], Loss: 2.6328, Perplexity: 13.9121, time_taken_in_seconds: 35
Epoch [1/1], Step [3643/13804], Loss: 2.8008, Perplexity: 16.4571, time_taken_in_seconds: 35
Epoch [1/1], Step [3644/13804], Loss: 2.9810, Perplexity: 19.7074, time_taken_in_seconds: 36
Epoch [1/1], Step [3645/13804], Loss: 2.7369, Perplexity: 15.4388, time_taken_in_seconds: 37
Epoch [1/1], Step [3646/13804], Loss: 2.3508, Perplexity: 10.4938, time_taken_in_seconds: 38
Epoch [1/1], Step [3647/13804], Loss: 2.7486, Perplexity: 15.6209, time_taken_in_seconds: 39
Epoch [1/1], Step [3648/13804], Loss: 2.9628, Perplexity: 19.3513, time_taken_in_seconds: 40
Epoch [1/1], Step [3649/13804], Loss: 2.9677, Perplexity: 19.4464, time_taken_in_seconds: 40
Epoch [1/1], Step [3650/13804], Loss: 2.5429, Perplexity: 12.7161, time_taken_in_seconds: 41
Epoch [1/1], Step [3651/13804], Loss: 2.7097, Perplexity: 15.0252, time_taken_in_seconds: 42
Epoch [1/1], Step [3652/13804], Loss: 2.7384, Perplexity: 15.4617, time_taken_in_seconds: 43
Epoch [1/1], Step [3653/13804], Loss: 3.0206, Perplexity: 20.5027, time_taken_in_seconds: 44
Epoch [1/1], Step [3654/13804], Loss: 3.0308, Perplexity: 20.7135, time_taken_in_seconds: 44
Epoch [1/1], Step [3655/13804], Loss: 2.5060, Perplexity: 12.2555, time_taken_in_seconds: 45
Epoch [1/1], Step [3656/13804], Loss: 2.8831, Perplexity: 17.8702, time_taken_in_seconds: 46
Epoch [1/1], Step [3657/13804], Loss: 2.3828, Perplexity: 10.8353, time_taken_in_seconds: 47
Epoch [1/1], Step [3658/13804], Loss: 2.8674, Perplexity: 17.5906, time_taken_in_seconds: 48
Epoch [1/1], Step [3659/13804], Loss: 2.9300, Perplexity: 18.7278, time_taken_in_seconds: 49
Epoch [1/1], Step [3660/13804], Loss: 2.6445, Perplexity: 14.0767, time_taken_in_seconds: 49
Epoch [1/1], Step [3661/13804], Loss: 2.9640, Perplexity: 19.3760, time_taken_in_seconds: 50
Epoch [1/1], Step [3662/13804], Loss: 2.8065, Perplexity: 16.5521, time_taken_in_seconds: 51
Epoch [1/1], Step [3663/13804], Loss: 2.7899, Perplexity: 16.2795, time_taken_in_seconds: 52
Epoch [1/1], Step [3664/13804], Loss: 2.9067, Perplexity: 18.2958, time_taken_in_seconds: 53
Epoch [1/1], Step [3665/13804], Loss: 2.8001, Perplexity: 16.4455, time_taken_in_seconds: 54
Epoch [1/1], Step [3666/13804], Loss: 3.2859, Perplexity: 26.7335, time_taken_in_seconds: 54
Epoch [1/1], Step [3667/13804], Loss: 3.5039, Perplexity: 33.2463, time_taken_in_seconds: 55
Epoch [1/1], Step [3668/13804], Loss: 2.9689, Perplexity: 19.4706, time_taken_in_seconds: 56
Epoch [1/1], Step [3669/13804], Loss: 2.8433, Perplexity: 17.1731, time_taken_in_seconds: 57
Epoch [1/1], Step [3670/13804], Loss: 2.8181, Perplexity: 16.7455, time_taken_in_seconds: 58
Epoch [1/1], Step [3671/13804], Loss: 2.9460, Perplexity: 19.0302, time_taken_in_seconds: 59
Epoch [1/1], Step [3672/13804], Loss: 2.6432, Perplexity: 14.0577, time_taken_in_seconds: 59
Epoch [1/1], Step [3673/13804], Loss: 2.7429, Perplexity: 15.5316, time_taken_in_seconds: 60
Epoch [1/1], Step [3674/13804], Loss: 2.7704, Perplexity: 15.9654, time_taken_in_seconds: 61
Epoch [1/1], Step [3675/13804], Loss: 3.1620, Perplexity: 23.6169, time_taken_in_seconds: 62
Epoch [1/1], Step [3676/13804], Loss: 3.3112, Perplexity: 27.4187, time_taken_in_seconds: 63
Epoch [1/1], Step [3677/13804], Loss: 2.8145, Perplexity: 16.6848, time_taken_in_seconds: 64
Epoch [1/1], Step [3678/13804], Loss: 3.0779, Perplexity: 21.7120, time_taken_in_seconds: 64
Epoch [1/1], Step [3679/13804], Loss: 2.6342, Perplexity: 13.9318, time_taken_in_seconds: 65
Epoch [1/1], Step [3680/13804], Loss: 2.5851, Perplexity: 13.2647, time_taken_in_seconds: 66
Epoch [1/1], Step [3681/13804], Loss: 2.8363, Perplexity: 17.0531, time_taken_in_seconds: 67
Epoch [1/1], Step [3682/13804], Loss: 2.6638, Perplexity: 14.3514, time_taken_in_seconds: 68
Epoch [1/1], Step [3683/13804], Loss: 2.8959, Perplexity: 18.1001, time_taken_in_seconds: 69
Epoch [1/1], Step [3684/13804], Loss: 2.5516, Perplexity: 12.8276, time_taken_in_seconds: 69
Epoch [1/1], Step [3685/13804], Loss: 2.7429, Perplexity: 15.5317, time_taken_in_seconds: 70
Epoch [1/1], Step [3686/13804], Loss: 3.0311, Perplexity: 20.7196, time_taken_in_seconds: 71
Epoch [1/1], Step [3687/13804], Loss: 2.5261, Perplexity: 12.5050, time_taken_in_seconds: 72
Epoch [1/1], Step [3688/13804], Loss: 2.8288, Perplexity: 16.9258, time_taken_in_seconds: 73
Epoch [1/1], Step [3689/13804], Loss: 2.5617, Perplexity: 12.9581, time_taken_in_seconds: 73
Epoch [1/1], Step [3690/13804], Loss: 2.3430, Perplexity: 10.4125, time_taken_in_seconds: 74
Epoch [1/1], Step [3691/13804], Loss: 2.8065, Perplexity: 16.5517, time_taken_in_seconds: 75
Epoch [1/1], Step [3692/13804], Loss: 2.7036, Perplexity: 14.9329, time_taken_in_seconds: 76
Epoch [1/1], Step [3693/13804], Loss: 2.9916, Perplexity: 19.9178, time_taken_in_seconds: 77
Epoch [1/1], Step [3694/13804], Loss: 3.2061, Perplexity: 24.6828, time_taken_in_seconds: 78
Epoch [1/1], Step [3695/13804], Loss: 2.7287, Perplexity: 15.3132, time_taken_in_seconds: 79
Epoch [1/1], Step [3696/13804], Loss: 2.7907, Perplexity: 16.2932, time_taken_in_seconds: 79
Epoch [1/1], Step [3697/13804], Loss: 2.7940, Perplexity: 16.3464, time_taken_in_seconds: 80
Epoch [1/1], Step [3698/13804], Loss: 2.5010, Perplexity: 12.1950, time_taken_in_seconds: 81
Epoch [1/1], Step [3699/13804], Loss: 3.2399, Perplexity: 25.5317, time_taken_in_seconds: 82
Epoch [1/1], Step [3700/13804], Loss: 2.6488, Perplexity: 14.1367, time_taken_in_seconds: 83
Epoch [1/1], Step [3701/13804], Loss: 3.0698, Perplexity: 21.5380, time_taken_in_seconds: 0
Epoch [1/1], Step [3702/13804], Loss: 3.1158, Perplexity: 22.5511, time_taken_in_seconds: 1
Epoch [1/1], Step [3703/13804], Loss: 2.6782, Perplexity: 14.5593, time_taken_in_seconds: 2
Epoch [1/1], Step [3704/13804], Loss: 2.7114, Perplexity: 15.0505, time_taken_in_seconds: 3
Epoch [1/1], Step [3705/13804], Loss: 2.6143, Perplexity: 13.6580, time_taken_in_seconds: 4
Epoch [1/1], Step [3706/13804], Loss: 3.0628, Perplexity: 21.3868, time_taken_in_seconds: 4
Epoch [1/1], Step [3707/13804], Loss: 2.6867, Perplexity: 14.6827, time_taken_in_seconds: 5
Epoch [1/1], Step [3708/13804], Loss: 2.7454, Perplexity: 15.5704, time_taken_in_seconds: 6
Epoch [1/1], Step [3709/13804], Loss: 2.7019, Perplexity: 14.9075, time_taken_in_seconds: 7
Epoch [1/1], Step [3710/13804], Loss: 2.8598, Perplexity: 17.4579, time_taken_in_seconds: 8
Epoch [1/1], Step [3711/13804], Loss: 2.8487, Perplexity: 17.2645, time_taken_in_seconds: 9
Epoch [1/1], Step [3712/13804], Loss: 2.6727, Perplexity: 14.4786, time_taken_in_seconds: 9
Epoch [1/1], Step [3713/13804], Loss: 2.4657, Perplexity: 11.7716, time_taken_in_seconds: 10
Epoch [1/1], Step [3714/13804], Loss: 2.5779, Perplexity: 13.1696, time_taken_in_seconds: 11
Epoch [1/1], Step [3715/13804], Loss: 2.6089, Perplexity: 13.5842, time_taken_in_seconds: 12
Epoch [1/1], Step [3716/13804], Loss: 2.6814, Perplexity: 14.6051, time_taken_in_seconds: 13
Epoch [1/1], Step [3717/13804], Loss: 2.7961, Perplexity: 16.3806, time_taken_in_seconds: 14
Epoch [1/1], Step [3718/13804], Loss: 2.9027, Perplexity: 18.2227, time_taken_in_seconds: 14
Epoch [1/1], Step [3719/13804], Loss: 3.1668, Perplexity: 23.7323, time_taken_in_seconds: 15
Epoch [1/1], Step [3720/13804], Loss: 2.5826, Perplexity: 13.2309, time_taken_in_seconds: 16
Epoch [1/1], Step [3721/13804], Loss: 2.9246, Perplexity: 18.6267, time_taken_in_seconds: 17
Epoch [1/1], Step [3722/13804], Loss: 2.8405, Perplexity: 17.1238, time_taken_in_seconds: 18
Epoch [1/1], Step [3723/13804], Loss: 2.6582, Perplexity: 14.2707, time_taken_in_seconds: 18
Epoch [1/1], Step [3724/13804], Loss: 2.5859, Perplexity: 13.2748, time_taken_in_seconds: 19
Epoch [1/1], Step [3725/13804], Loss: 2.5233, Perplexity: 12.4691, time_taken_in_seconds: 20
Epoch [1/1], Step [3726/13804], Loss: 2.8748, Perplexity: 17.7212, time_taken_in_seconds: 21
Epoch [1/1], Step [3727/13804], Loss: 3.1478, Perplexity: 23.2850, time_taken_in_seconds: 22
Epoch [1/1], Step [3728/13804], Loss: 2.6491, Perplexity: 14.1411, time_taken_in_seconds: 23
Epoch [1/1], Step [3729/13804], Loss: 3.1428, Perplexity: 23.1690, time_taken_in_seconds: 23
Epoch [1/1], Step [3730/13804], Loss: 2.9327, Perplexity: 18.7780, time_taken_in_seconds: 24
Epoch [1/1], Step [3731/13804], Loss: 2.9083, Perplexity: 18.3257, time_taken_in_seconds: 25
Epoch [1/1], Step [3732/13804], Loss: 2.7907, Perplexity: 16.2919, time_taken_in_seconds: 26
Epoch [1/1], Step [3733/13804], Loss: 2.7229, Perplexity: 15.2246, time_taken_in_seconds: 27
Epoch [1/1], Step [3734/13804], Loss: 2.8624, Perplexity: 17.5035, time_taken_in_seconds: 28
Epoch [1/1], Step [3735/13804], Loss: 3.1019, Perplexity: 22.2403, time_taken_in_seconds: 28
Epoch [1/1], Step [3736/13804], Loss: 3.1403, Perplexity: 23.1118, time_taken_in_seconds: 29
Epoch [1/1], Step [3737/13804], Loss: 2.6707, Perplexity: 14.4498, time_taken_in_seconds: 30
Epoch [1/1], Step [3738/13804], Loss: 2.8449, Perplexity: 17.1996, time_taken_in_seconds: 31
Epoch [1/1], Step [3739/13804], Loss: 3.0247, Perplexity: 20.5882, time_taken_in_seconds: 32
Epoch [1/1], Step [3740/13804], Loss: 2.6219, Perplexity: 13.7622, time_taken_in_seconds: 33
Epoch [1/1], Step [3741/13804], Loss: 2.8216, Perplexity: 16.8033, time_taken_in_seconds: 33
Epoch [1/1], Step [3742/13804], Loss: 2.6351, Perplexity: 13.9441, time_taken_in_seconds: 34
Epoch [1/1], Step [3743/13804], Loss: 2.6476, Perplexity: 14.1195, time_taken_in_seconds: 35
Epoch [1/1], Step [3744/13804], Loss: 2.9709, Perplexity: 19.5085, time_taken_in_seconds: 36
Epoch [1/1], Step [3745/13804], Loss: 2.5805, Perplexity: 13.2041, time_taken_in_seconds: 37
Epoch [1/1], Step [3746/13804], Loss: 2.8371, Perplexity: 17.0657, time_taken_in_seconds: 38
Epoch [1/1], Step [3747/13804], Loss: 2.7618, Perplexity: 15.8282, time_taken_in_seconds: 38
Epoch [1/1], Step [3748/13804], Loss: 2.6017, Perplexity: 13.4860, time_taken_in_seconds: 39
Epoch [1/1], Step [3749/13804], Loss: 2.7181, Perplexity: 15.1510, time_taken_in_seconds: 40
Epoch [1/1], Step [3750/13804], Loss: 2.7682, Perplexity: 15.9296, time_taken_in_seconds: 41
Epoch [1/1], Step [3751/13804], Loss: 2.8464, Perplexity: 17.2259, time_taken_in_seconds: 42
Epoch [1/1], Step [3752/13804], Loss: 2.8336, Perplexity: 17.0063, time_taken_in_seconds: 43
Epoch [1/1], Step [3753/13804], Loss: 2.9618, Perplexity: 19.3330, time_taken_in_seconds: 43
Epoch [1/1], Step [3754/13804], Loss: 3.0574, Perplexity: 21.2726, time_taken_in_seconds: 44
Epoch [1/1], Step [3755/13804], Loss: 2.5556, Perplexity: 12.8784, time_taken_in_seconds: 45
Epoch [1/1], Step [3756/13804], Loss: 2.8538, Perplexity: 17.3540, time_taken_in_seconds: 46
Epoch [1/1], Step [3757/13804], Loss: 2.7732, Perplexity: 16.0093, time_taken_in_seconds: 47
Epoch [1/1], Step [3758/13804], Loss: 2.8365, Perplexity: 17.0561, time_taken_in_seconds: 48
Epoch [1/1], Step [3759/13804], Loss: 2.3921, Perplexity: 10.9369, time_taken_in_seconds: 48
Epoch [1/1], Step [3760/13804], Loss: 2.9397, Perplexity: 18.9099, time_taken_in_seconds: 49
Epoch [1/1], Step [3761/13804], Loss: 2.6520, Perplexity: 14.1828, time_taken_in_seconds: 50
Epoch [1/1], Step [3762/13804], Loss: 2.8009, Perplexity: 16.4588, time_taken_in_seconds: 51
Epoch [1/1], Step [3763/13804], Loss: 3.0130, Perplexity: 20.3475, time_taken_in_seconds: 52
Epoch [1/1], Step [3764/13804], Loss: 2.9805, Perplexity: 19.6974, time_taken_in_seconds: 52
Epoch [1/1], Step [3765/13804], Loss: 2.6449, Perplexity: 14.0819, time_taken_in_seconds: 54
Epoch [1/1], Step [3766/13804], Loss: 2.4856, Perplexity: 12.0079, time_taken_in_seconds: 54
Epoch [1/1], Step [3767/13804], Loss: 3.0168, Perplexity: 20.4256, time_taken_in_seconds: 55
Epoch [1/1], Step [3768/13804], Loss: 3.1209, Perplexity: 22.6659, time_taken_in_seconds: 56
Epoch [1/1], Step [3769/13804], Loss: 2.6053, Perplexity: 13.5347, time_taken_in_seconds: 57
Epoch [1/1], Step [3770/13804], Loss: 2.7454, Perplexity: 15.5713, time_taken_in_seconds: 58
Epoch [1/1], Step [3771/13804], Loss: 2.6823, Perplexity: 14.6185, time_taken_in_seconds: 59
Epoch [1/1], Step [3772/13804], Loss: 2.9347, Perplexity: 18.8162, time_taken_in_seconds: 59
Epoch [1/1], Step [3773/13804], Loss: 2.8053, Perplexity: 16.5328, time_taken_in_seconds: 60
Epoch [1/1], Step [3774/13804], Loss: 2.5523, Perplexity: 12.8371, time_taken_in_seconds: 61
Epoch [1/1], Step [3775/13804], Loss: 3.0359, Perplexity: 20.8191, time_taken_in_seconds: 62
Epoch [1/1], Step [3776/13804], Loss: 2.9478, Perplexity: 19.0638, time_taken_in_seconds: 63
Epoch [1/1], Step [3777/13804], Loss: 2.6187, Perplexity: 13.7181, time_taken_in_seconds: 64
Epoch [1/1], Step [3778/13804], Loss: 3.1751, Perplexity: 23.9300, time_taken_in_seconds: 64
Epoch [1/1], Step [3779/13804], Loss: 3.2876, Perplexity: 26.7773, time_taken_in_seconds: 65
Epoch [1/1], Step [3780/13804], Loss: 2.7699, Perplexity: 15.9577, time_taken_in_seconds: 66
Epoch [1/1], Step [3781/13804], Loss: 2.8722, Perplexity: 17.6756, time_taken_in_seconds: 67
Epoch [1/1], Step [3782/13804], Loss: 2.6266, Perplexity: 13.8263, time_taken_in_seconds: 68
Epoch [1/1], Step [3783/13804], Loss: 2.7683, Perplexity: 15.9309, time_taken_in_seconds: 68
Epoch [1/1], Step [3784/13804], Loss: 3.1328, Perplexity: 22.9388, time_taken_in_seconds: 69
Epoch [1/1], Step [3785/13804], Loss: 3.0202, Perplexity: 20.4947, time_taken_in_seconds: 70
Epoch [1/1], Step [3786/13804], Loss: 2.4910, Perplexity: 12.0735, time_taken_in_seconds: 71
Epoch [1/1], Step [3787/13804], Loss: 3.1267, Perplexity: 22.7979, time_taken_in_seconds: 72
Epoch [1/1], Step [3788/13804], Loss: 2.7448, Perplexity: 15.5610, time_taken_in_seconds: 73
Epoch [1/1], Step [3789/13804], Loss: 3.0900, Perplexity: 21.9776, time_taken_in_seconds: 73
Epoch [1/1], Step [3790/13804], Loss: 2.6994, Perplexity: 14.8712, time_taken_in_seconds: 74
Epoch [1/1], Step [3791/13804], Loss: 2.4221, Perplexity: 11.2700, time_taken_in_seconds: 75
Epoch [1/1], Step [3792/13804], Loss: 2.4857, Perplexity: 12.0091, time_taken_in_seconds: 76
Epoch [1/1], Step [3793/13804], Loss: 2.7465, Perplexity: 15.5885, time_taken_in_seconds: 77
Epoch [1/1], Step [3794/13804], Loss: 2.2630, Perplexity: 9.6120, time_taken_in_seconds: 78
Epoch [1/1], Step [3795/13804], Loss: 2.7314, Perplexity: 15.3544, time_taken_in_seconds: 78
Epoch [1/1], Step [3796/13804], Loss: 2.7228, Perplexity: 15.2235, time_taken_in_seconds: 79
Epoch [1/1], Step [3797/13804], Loss: 2.4833, Perplexity: 11.9811, time_taken_in_seconds: 80
Epoch [1/1], Step [3798/13804], Loss: 2.8024, Perplexity: 16.4845, time_taken_in_seconds: 81
Epoch [1/1], Step [3799/13804], Loss: 2.6515, Perplexity: 14.1752, time_taken_in_seconds: 82
Epoch [1/1], Step [3800/13804], Loss: 2.9912, Perplexity: 19.9092, time_taken_in_seconds: 83
Epoch [1/1], Step [3801/13804], Loss: 2.4137, Perplexity: 11.1757, time_taken_in_seconds: 0
Epoch [1/1], Step [3802/13804], Loss: 2.7223, Perplexity: 15.2148, time_taken_in_seconds: 1
Epoch [1/1], Step [3803/13804], Loss: 2.7238, Perplexity: 15.2374, time_taken_in_seconds: 2
Epoch [1/1], Step [3804/13804], Loss: 3.0795, Perplexity: 21.7475, time_taken_in_seconds: 3
Epoch [1/1], Step [3805/13804], Loss: 2.5512, Perplexity: 12.8222, time_taken_in_seconds: 4
Epoch [1/1], Step [3806/13804], Loss: 2.4745, Perplexity: 11.8755, time_taken_in_seconds: 4
Epoch [1/1], Step [3807/13804], Loss: 2.8565, Perplexity: 17.4001, time_taken_in_seconds: 5
Epoch [1/1], Step [3808/13804], Loss: 2.6242, Perplexity: 13.7934, time_taken_in_seconds: 6
Epoch [1/1], Step [3809/13804], Loss: 2.6183, Perplexity: 13.7130, time_taken_in_seconds: 7
Epoch [1/1], Step [3810/13804], Loss: 3.1369, Perplexity: 23.0332, time_taken_in_seconds: 8
Epoch [1/1], Step [3811/13804], Loss: 2.7692, Perplexity: 15.9465, time_taken_in_seconds: 9
Epoch [1/1], Step [3812/13804], Loss: 2.9764, Perplexity: 19.6171, time_taken_in_seconds: 10
Epoch [1/1], Step [3813/13804], Loss: 2.7880, Perplexity: 16.2478, time_taken_in_seconds: 10
Epoch [1/1], Step [3814/13804], Loss: 2.9798, Perplexity: 19.6837, time_taken_in_seconds: 11
Epoch [1/1], Step [3815/13804], Loss: 2.7964, Perplexity: 16.3850, time_taken_in_seconds: 12
Epoch [1/1], Step [3816/13804], Loss: 2.8471, Perplexity: 17.2369, time_taken_in_seconds: 13
Epoch [1/1], Step [3817/13804], Loss: 2.7153, Perplexity: 15.1096, time_taken_in_seconds: 14
Epoch [1/1], Step [3818/13804], Loss: 2.7688, Perplexity: 15.9387, time_taken_in_seconds: 15
Epoch [1/1], Step [3819/13804], Loss: 2.6311, Perplexity: 13.8884, time_taken_in_seconds: 15
Epoch [1/1], Step [3820/13804], Loss: 3.0420, Perplexity: 20.9474, time_taken_in_seconds: 16
Epoch [1/1], Step [3821/13804], Loss: 2.8459, Perplexity: 17.2173, time_taken_in_seconds: 17
Epoch [1/1], Step [3822/13804], Loss: 2.9186, Perplexity: 18.5158, time_taken_in_seconds: 18
Epoch [1/1], Step [3823/13804], Loss: 2.8299, Perplexity: 16.9444, time_taken_in_seconds: 19
Epoch [1/1], Step [3824/13804], Loss: 2.7538, Perplexity: 15.7015, time_taken_in_seconds: 20
Epoch [1/1], Step [3825/13804], Loss: 2.8197, Perplexity: 16.7711, time_taken_in_seconds: 20
Epoch [1/1], Step [3826/13804], Loss: 3.2092, Perplexity: 24.7596, time_taken_in_seconds: 21
Epoch [1/1], Step [3827/13804], Loss: 2.5820, Perplexity: 13.2241, time_taken_in_seconds: 22
Epoch [1/1], Step [3828/13804], Loss: 2.7770, Perplexity: 16.0714, time_taken_in_seconds: 23
Epoch [1/1], Step [3829/13804], Loss: 2.8788, Perplexity: 17.7932, time_taken_in_seconds: 24
Epoch [1/1], Step [3830/13804], Loss: 2.7548, Perplexity: 15.7172, time_taken_in_seconds: 25
Epoch [1/1], Step [3831/13804], Loss: 2.6822, Perplexity: 14.6177, time_taken_in_seconds: 25
Epoch [1/1], Step [3832/13804], Loss: 2.9351, Perplexity: 18.8239, time_taken_in_seconds: 26
Epoch [1/1], Step [3833/13804], Loss: 2.6625, Perplexity: 14.3317, time_taken_in_seconds: 27
Epoch [1/1], Step [3834/13804], Loss: 3.0682, Perplexity: 21.5028, time_taken_in_seconds: 28
Epoch [1/1], Step [3835/13804], Loss: 2.7906, Perplexity: 16.2913, time_taken_in_seconds: 29
Epoch [1/1], Step [3836/13804], Loss: 2.9825, Perplexity: 19.7364, time_taken_in_seconds: 30
Epoch [1/1], Step [3837/13804], Loss: 2.3516, Perplexity: 10.5026, time_taken_in_seconds: 31
Epoch [1/1], Step [3838/13804], Loss: 2.9355, Perplexity: 18.8306, time_taken_in_seconds: 31
Epoch [1/1], Step [3839/13804], Loss: 2.6829, Perplexity: 14.6273, time_taken_in_seconds: 32
Epoch [1/1], Step [3840/13804], Loss: 2.7458, Perplexity: 15.5768, time_taken_in_seconds: 33
Epoch [1/1], Step [3841/13804], Loss: 2.3014, Perplexity: 9.9884, time_taken_in_seconds: 34
Epoch [1/1], Step [3842/13804], Loss: 3.0091, Perplexity: 20.2699, time_taken_in_seconds: 35
Epoch [1/1], Step [3843/13804], Loss: 2.8345, Perplexity: 17.0222, time_taken_in_seconds: 36
Epoch [1/1], Step [3844/13804], Loss: 2.7478, Perplexity: 15.6088, time_taken_in_seconds: 36
Epoch [1/1], Step [3845/13804], Loss: 2.5831, Perplexity: 13.2387, time_taken_in_seconds: 37
Epoch [1/1], Step [3846/13804], Loss: 2.7707, Perplexity: 15.9698, time_taken_in_seconds: 38
Epoch [1/1], Step [3847/13804], Loss: 3.0434, Perplexity: 20.9762, time_taken_in_seconds: 39
Epoch [1/1], Step [3848/13804], Loss: 2.9089, Perplexity: 18.3358, time_taken_in_seconds: 40
Epoch [1/1], Step [3849/13804], Loss: 2.8251, Perplexity: 16.8621, time_taken_in_seconds: 41
Epoch [1/1], Step [3850/13804], Loss: 2.8515, Perplexity: 17.3146, time_taken_in_seconds: 41
Epoch [1/1], Step [3851/13804], Loss: 2.3978, Perplexity: 10.9993, time_taken_in_seconds: 42
Epoch [1/1], Step [3852/13804], Loss: 2.6069, Perplexity: 13.5572, time_taken_in_seconds: 43
Epoch [1/1], Step [3853/13804], Loss: 2.7753, Perplexity: 16.0429, time_taken_in_seconds: 44
Epoch [1/1], Step [3854/13804], Loss: 2.5887, Perplexity: 13.3120, time_taken_in_seconds: 45
Epoch [1/1], Step [3855/13804], Loss: 2.5943, Perplexity: 13.3878, time_taken_in_seconds: 46
Epoch [1/1], Step [3856/13804], Loss: 3.0895, Perplexity: 21.9657, time_taken_in_seconds: 46
Epoch [1/1], Step [3857/13804], Loss: 2.7248, Perplexity: 15.2538, time_taken_in_seconds: 47
Epoch [1/1], Step [3858/13804], Loss: 2.6106, Perplexity: 13.6068, time_taken_in_seconds: 48
Epoch [1/1], Step [3859/13804], Loss: 2.8960, Perplexity: 18.1009, time_taken_in_seconds: 49
Epoch [1/1], Step [3860/13804], Loss: 2.4696, Perplexity: 11.8180, time_taken_in_seconds: 50
Epoch [1/1], Step [3861/13804], Loss: 2.8090, Perplexity: 16.5940, time_taken_in_seconds: 51
Epoch [1/1], Step [3862/13804], Loss: 2.4852, Perplexity: 12.0034, time_taken_in_seconds: 52
Epoch [1/1], Step [3863/13804], Loss: 2.4666, Perplexity: 11.7828, time_taken_in_seconds: 52
Epoch [1/1], Step [3864/13804], Loss: 2.6770, Perplexity: 14.5413, time_taken_in_seconds: 53
Epoch [1/1], Step [3865/13804], Loss: 2.8175, Perplexity: 16.7346, time_taken_in_seconds: 54
Epoch [1/1], Step [3866/13804], Loss: 2.6392, Perplexity: 14.0020, time_taken_in_seconds: 55
Epoch [1/1], Step [3867/13804], Loss: 2.7474, Perplexity: 15.6018, time_taken_in_seconds: 56
Epoch [1/1], Step [3868/13804], Loss: 2.6677, Perplexity: 14.4065, time_taken_in_seconds: 57
Epoch [1/1], Step [3869/13804], Loss: 2.6688, Perplexity: 14.4231, time_taken_in_seconds: 57
Epoch [1/1], Step [3870/13804], Loss: 2.7819, Perplexity: 16.1501, time_taken_in_seconds: 58
Epoch [1/1], Step [3871/13804], Loss: 2.3478, Perplexity: 10.4627, time_taken_in_seconds: 59
Epoch [1/1], Step [3872/13804], Loss: 2.6669, Perplexity: 14.3947, time_taken_in_seconds: 60
Epoch [1/1], Step [3873/13804], Loss: 2.3560, Perplexity: 10.5490, time_taken_in_seconds: 61
Epoch [1/1], Step [3874/13804], Loss: 2.6457, Perplexity: 14.0934, time_taken_in_seconds: 62
Epoch [1/1], Step [3875/13804], Loss: 2.7415, Perplexity: 15.5099, time_taken_in_seconds: 62
Epoch [1/1], Step [3876/13804], Loss: 2.7180, Perplexity: 15.1497, time_taken_in_seconds: 63
Epoch [1/1], Step [3877/13804], Loss: 2.8424, Perplexity: 17.1565, time_taken_in_seconds: 64
Epoch [1/1], Step [3878/13804], Loss: 2.8873, Perplexity: 17.9440, time_taken_in_seconds: 65
Epoch [1/1], Step [3879/13804], Loss: 2.8807, Perplexity: 17.8262, time_taken_in_seconds: 66
Epoch [1/1], Step [3880/13804], Loss: 2.7811, Perplexity: 16.1366, time_taken_in_seconds: 67
Epoch [1/1], Step [3881/13804], Loss: 2.6867, Perplexity: 14.6834, time_taken_in_seconds: 67
Epoch [1/1], Step [3882/13804], Loss: 2.6669, Perplexity: 14.3946, time_taken_in_seconds: 68
Epoch [1/1], Step [3883/13804], Loss: 2.8273, Perplexity: 16.9005, time_taken_in_seconds: 69
Epoch [1/1], Step [3884/13804], Loss: 2.4580, Perplexity: 11.6813, time_taken_in_seconds: 70
Epoch [1/1], Step [3885/13804], Loss: 2.6803, Perplexity: 14.5894, time_taken_in_seconds: 71
Epoch [1/1], Step [3886/13804], Loss: 2.5975, Perplexity: 13.4303, time_taken_in_seconds: 72
Epoch [1/1], Step [3887/13804], Loss: 2.8582, Perplexity: 17.4304, time_taken_in_seconds: 72
Epoch [1/1], Step [3888/13804], Loss: 3.3220, Perplexity: 27.7162, time_taken_in_seconds: 73
Epoch [1/1], Step [3889/13804], Loss: 2.9348, Perplexity: 18.8184, time_taken_in_seconds: 74
Epoch [1/1], Step [3890/13804], Loss: 2.7131, Perplexity: 15.0754, time_taken_in_seconds: 75
Epoch [1/1], Step [3891/13804], Loss: 2.6930, Perplexity: 14.7756, time_taken_in_seconds: 76
Epoch [1/1], Step [3892/13804], Loss: 2.7687, Perplexity: 15.9374, time_taken_in_seconds: 77
Epoch [1/1], Step [3893/13804], Loss: 2.5521, Perplexity: 12.8339, time_taken_in_seconds: 77
Epoch [1/1], Step [3894/13804], Loss: 2.6331, Perplexity: 13.9164, time_taken_in_seconds: 78
Epoch [1/1], Step [3895/13804], Loss: 2.6524, Perplexity: 14.1879, time_taken_in_seconds: 79
Epoch [1/1], Step [3896/13804], Loss: 2.5975, Perplexity: 13.4304, time_taken_in_seconds: 80
Epoch [1/1], Step [3897/13804], Loss: 2.6640, Perplexity: 14.3529, time_taken_in_seconds: 81
Epoch [1/1], Step [3898/13804], Loss: 2.5799, Perplexity: 13.1952, time_taken_in_seconds: 82
Epoch [1/1], Step [3899/13804], Loss: 2.6687, Perplexity: 14.4217, time_taken_in_seconds: 82
Epoch [1/1], Step [3900/13804], Loss: 2.8215, Perplexity: 16.8013, time_taken_in_seconds: 83
Epoch [1/1], Step [3901/13804], Loss: 2.7180, Perplexity: 15.1504, time_taken_in_seconds: 0
Epoch [1/1], Step [3902/13804], Loss: 2.6233, Perplexity: 13.7805, time_taken_in_seconds: 1
Epoch [1/1], Step [3903/13804], Loss: 3.1196, Perplexity: 22.6380, time_taken_in_seconds: 2
Epoch [1/1], Step [3904/13804], Loss: 2.4841, Perplexity: 11.9903, time_taken_in_seconds: 3
Epoch [1/1], Step [3905/13804], Loss: 2.7173, Perplexity: 15.1395, time_taken_in_seconds: 4
Epoch [1/1], Step [3906/13804], Loss: 2.9175, Perplexity: 18.4946, time_taken_in_seconds: 4
Epoch [1/1], Step [3907/13804], Loss: 2.6255, Perplexity: 13.8114, time_taken_in_seconds: 5
Epoch [1/1], Step [3908/13804], Loss: 2.8225, Perplexity: 16.8194, time_taken_in_seconds: 6
Epoch [1/1], Step [3909/13804], Loss: 3.9674, Perplexity: 52.8480, time_taken_in_seconds: 7
Epoch [1/1], Step [3910/13804], Loss: 3.1585, Perplexity: 23.5358, time_taken_in_seconds: 8
Epoch [1/1], Step [3911/13804], Loss: 2.7952, Perplexity: 16.3660, time_taken_in_seconds: 9
Epoch [1/1], Step [3912/13804], Loss: 3.5285, Perplexity: 34.0739, time_taken_in_seconds: 10
Epoch [1/1], Step [3913/13804], Loss: 2.6738, Perplexity: 14.4955, time_taken_in_seconds: 11
Epoch [1/1], Step [3914/13804], Loss: 3.0806, Perplexity: 21.7713, time_taken_in_seconds: 11
Epoch [1/1], Step [3915/13804], Loss: 2.4443, Perplexity: 11.5228, time_taken_in_seconds: 12
Epoch [1/1], Step [3916/13804], Loss: 3.0366, Perplexity: 20.8346, time_taken_in_seconds: 13
Epoch [1/1], Step [3917/13804], Loss: 2.6192, Perplexity: 13.7250, time_taken_in_seconds: 14
Epoch [1/1], Step [3918/13804], Loss: 2.4641, Perplexity: 11.7529, time_taken_in_seconds: 15
Epoch [1/1], Step [3919/13804], Loss: 2.5298, Perplexity: 12.5505, time_taken_in_seconds: 16
Epoch [1/1], Step [3920/13804], Loss: 2.7669, Perplexity: 15.9085, time_taken_in_seconds: 16
Epoch [1/1], Step [3921/13804], Loss: 2.7970, Perplexity: 16.3951, time_taken_in_seconds: 17
Epoch [1/1], Step [3922/13804], Loss: 2.8040, Perplexity: 16.5098, time_taken_in_seconds: 18
Epoch [1/1], Step [3923/13804], Loss: 3.1030, Perplexity: 22.2648, time_taken_in_seconds: 19
Epoch [1/1], Step [3924/13804], Loss: 2.8064, Perplexity: 16.5509, time_taken_in_seconds: 20
Epoch [1/1], Step [3925/13804], Loss: 2.9179, Perplexity: 18.5016, time_taken_in_seconds: 21
Epoch [1/1], Step [3926/13804], Loss: 3.0461, Perplexity: 21.0333, time_taken_in_seconds: 21
Epoch [1/1], Step [3927/13804], Loss: 2.6660, Perplexity: 14.3822, time_taken_in_seconds: 22
Epoch [1/1], Step [3928/13804], Loss: 2.9982, Perplexity: 20.0489, time_taken_in_seconds: 23
Epoch [1/1], Step [3929/13804], Loss: 2.3415, Perplexity: 10.3970, time_taken_in_seconds: 24
Epoch [1/1], Step [3930/13804], Loss: 2.6200, Perplexity: 13.7353, time_taken_in_seconds: 25
Epoch [1/1], Step [3931/13804], Loss: 2.8951, Perplexity: 18.0852, time_taken_in_seconds: 26
Epoch [1/1], Step [3932/13804], Loss: 2.9747, Perplexity: 19.5837, time_taken_in_seconds: 26
Epoch [1/1], Step [3933/13804], Loss: 3.0926, Perplexity: 22.0346, time_taken_in_seconds: 27
Epoch [1/1], Step [3934/13804], Loss: 2.8165, Perplexity: 16.7179, time_taken_in_seconds: 28
Epoch [1/1], Step [3935/13804], Loss: 2.7192, Perplexity: 15.1682, time_taken_in_seconds: 29
Epoch [1/1], Step [3936/13804], Loss: 2.7113, Perplexity: 15.0488, time_taken_in_seconds: 30
Epoch [1/1], Step [3937/13804], Loss: 2.4883, Perplexity: 12.0405, time_taken_in_seconds: 31
Epoch [1/1], Step [3938/13804], Loss: 2.5947, Perplexity: 13.3922, time_taken_in_seconds: 32
Epoch [1/1], Step [3939/13804], Loss: 2.6044, Perplexity: 13.5226, time_taken_in_seconds: 32
Epoch [1/1], Step [3940/13804], Loss: 2.6052, Perplexity: 13.5338, time_taken_in_seconds: 33
Epoch [1/1], Step [3941/13804], Loss: 2.7955, Perplexity: 16.3709, time_taken_in_seconds: 34
Epoch [1/1], Step [3942/13804], Loss: 2.8006, Perplexity: 16.4547, time_taken_in_seconds: 35
Epoch [1/1], Step [3943/13804], Loss: 2.9031, Perplexity: 18.2312, time_taken_in_seconds: 36
Epoch [1/1], Step [3944/13804], Loss: 2.4141, Perplexity: 11.1801, time_taken_in_seconds: 36
Epoch [1/1], Step [3945/13804], Loss: 2.5393, Perplexity: 12.6709, time_taken_in_seconds: 37
Epoch [1/1], Step [3946/13804], Loss: 2.7567, Perplexity: 15.7471, time_taken_in_seconds: 38
Epoch [1/1], Step [3947/13804], Loss: 2.6168, Perplexity: 13.6924, time_taken_in_seconds: 39
Epoch [1/1], Step [3948/13804], Loss: 2.9925, Perplexity: 19.9353, time_taken_in_seconds: 40
Epoch [1/1], Step [3949/13804], Loss: 2.9563, Perplexity: 19.2276, time_taken_in_seconds: 41
Epoch [1/1], Step [3950/13804], Loss: 2.4585, Perplexity: 11.6876, time_taken_in_seconds: 42
Epoch [1/1], Step [3951/13804], Loss: 2.9557, Perplexity: 19.2148, time_taken_in_seconds: 42
Epoch [1/1], Step [3952/13804], Loss: 2.4499, Perplexity: 11.5874, time_taken_in_seconds: 43
Epoch [1/1], Step [3953/13804], Loss: 2.5623, Perplexity: 12.9657, time_taken_in_seconds: 44
Epoch [1/1], Step [3954/13804], Loss: 3.0285, Perplexity: 20.6672, time_taken_in_seconds: 45
Epoch [1/1], Step [3955/13804], Loss: 2.6776, Perplexity: 14.5499, time_taken_in_seconds: 46
Epoch [1/1], Step [3956/13804], Loss: 3.0395, Perplexity: 20.8951, time_taken_in_seconds: 47
Epoch [1/1], Step [3957/13804], Loss: 2.8535, Perplexity: 17.3485, time_taken_in_seconds: 47
Epoch [1/1], Step [3958/13804], Loss: 2.7525, Perplexity: 15.6820, time_taken_in_seconds: 48
Epoch [1/1], Step [3959/13804], Loss: 2.7550, Perplexity: 15.7215, time_taken_in_seconds: 49
Epoch [1/1], Step [3960/13804], Loss: 2.8057, Perplexity: 16.5384, time_taken_in_seconds: 50
Epoch [1/1], Step [3961/13804], Loss: 2.5342, Perplexity: 12.6061, time_taken_in_seconds: 51
Epoch [1/1], Step [3962/13804], Loss: 2.9994, Perplexity: 20.0736, time_taken_in_seconds: 52
Epoch [1/1], Step [3963/13804], Loss: 2.8252, Perplexity: 16.8651, time_taken_in_seconds: 52
Epoch [1/1], Step [3964/13804], Loss: 2.7212, Perplexity: 15.1987, time_taken_in_seconds: 53
Epoch [1/1], Step [3965/13804], Loss: 2.8177, Perplexity: 16.7382, time_taken_in_seconds: 54
Epoch [1/1], Step [3966/13804], Loss: 2.9236, Perplexity: 18.6075, time_taken_in_seconds: 55
Epoch [1/1], Step [3967/13804], Loss: 2.4028, Perplexity: 11.0537, time_taken_in_seconds: 56
Epoch [1/1], Step [3968/13804], Loss: 2.8787, Perplexity: 17.7920, time_taken_in_seconds: 57
Epoch [1/1], Step [3969/13804], Loss: 2.4918, Perplexity: 12.0831, time_taken_in_seconds: 57
Epoch [1/1], Step [3970/13804], Loss: 2.6799, Perplexity: 14.5837, time_taken_in_seconds: 58
Epoch [1/1], Step [3971/13804], Loss: 2.8068, Perplexity: 16.5575, time_taken_in_seconds: 59
Epoch [1/1], Step [3972/13804], Loss: 3.0518, Perplexity: 21.1543, time_taken_in_seconds: 60
Epoch [1/1], Step [3973/13804], Loss: 2.4239, Perplexity: 11.2893, time_taken_in_seconds: 61
Epoch [1/1], Step [3974/13804], Loss: 3.2784, Perplexity: 26.5322, time_taken_in_seconds: 62
Epoch [1/1], Step [3975/13804], Loss: 2.6833, Perplexity: 14.6330, time_taken_in_seconds: 62
Epoch [1/1], Step [3976/13804], Loss: 2.6244, Perplexity: 13.7957, time_taken_in_seconds: 63
Epoch [1/1], Step [3977/13804], Loss: 2.9608, Perplexity: 19.3143, time_taken_in_seconds: 64
Epoch [1/1], Step [3978/13804], Loss: 2.9062, Perplexity: 18.2864, time_taken_in_seconds: 65
Epoch [1/1], Step [3979/13804], Loss: 2.9349, Perplexity: 18.8201, time_taken_in_seconds: 66
Epoch [1/1], Step [3980/13804], Loss: 2.9135, Perplexity: 18.4210, time_taken_in_seconds: 67
Epoch [1/1], Step [3981/13804], Loss: 3.0585, Perplexity: 21.2954, time_taken_in_seconds: 68
Epoch [1/1], Step [3982/13804], Loss: 2.8982, Perplexity: 18.1410, time_taken_in_seconds: 69
Epoch [1/1], Step [3983/13804], Loss: 2.9331, Perplexity: 18.7863, time_taken_in_seconds: 69
Epoch [1/1], Step [3984/13804], Loss: 2.7954, Perplexity: 16.3697, time_taken_in_seconds: 70
Epoch [1/1], Step [3985/13804], Loss: 2.5865, Perplexity: 13.2830, time_taken_in_seconds: 71
Epoch [1/1], Step [3986/13804], Loss: 2.8291, Perplexity: 16.9308, time_taken_in_seconds: 72
Epoch [1/1], Step [3987/13804], Loss: 2.6828, Perplexity: 14.6264, time_taken_in_seconds: 73
Epoch [1/1], Step [3988/13804], Loss: 2.9096, Perplexity: 18.3492, time_taken_in_seconds: 74
Epoch [1/1], Step [3989/13804], Loss: 2.7093, Perplexity: 15.0183, time_taken_in_seconds: 74
Epoch [1/1], Step [3990/13804], Loss: 3.1701, Perplexity: 23.8104, time_taken_in_seconds: 75
Epoch [1/1], Step [3991/13804], Loss: 2.7349, Perplexity: 15.4088, time_taken_in_seconds: 76
Epoch [1/1], Step [3992/13804], Loss: 2.5475, Perplexity: 12.7754, time_taken_in_seconds: 77
Epoch [1/1], Step [3993/13804], Loss: 3.2320, Perplexity: 25.3294, time_taken_in_seconds: 78
Epoch [1/1], Step [3994/13804], Loss: 2.6316, Perplexity: 13.8962, time_taken_in_seconds: 79
Epoch [1/1], Step [3995/13804], Loss: 2.6124, Perplexity: 13.6314, time_taken_in_seconds: 79
Epoch [1/1], Step [3996/13804], Loss: 2.8366, Perplexity: 17.0582, time_taken_in_seconds: 80
Epoch [1/1], Step [3997/13804], Loss: 2.8075, Perplexity: 16.5681, time_taken_in_seconds: 81
Epoch [1/1], Step [3998/13804], Loss: 2.6237, Perplexity: 13.7860, time_taken_in_seconds: 82
Epoch [1/1], Step [3999/13804], Loss: 2.6875, Perplexity: 14.6953, time_taken_in_seconds: 83
Epoch [1/1], Step [4000/13804], Loss: 2.9162, Perplexity: 18.4708, time_taken_in_seconds: 84
Epoch [1/1], Step [4001/13804], Loss: 2.7719, Perplexity: 15.9887, time_taken_in_seconds: 0
Epoch [1/1], Step [4002/13804], Loss: 2.5499, Perplexity: 12.8060, time_taken_in_seconds: 1
Epoch [1/1], Step [4003/13804], Loss: 2.5203, Perplexity: 12.4329, time_taken_in_seconds: 2
Epoch [1/1], Step [4004/13804], Loss: 2.6957, Perplexity: 14.8160, time_taken_in_seconds: 3
Epoch [1/1], Step [4005/13804], Loss: 2.3304, Perplexity: 10.2821, time_taken_in_seconds: 4
Epoch [1/1], Step [4006/13804], Loss: 3.1681, Perplexity: 23.7630, time_taken_in_seconds: 5
Epoch [1/1], Step [4007/13804], Loss: 3.5815, Perplexity: 35.9289, time_taken_in_seconds: 5
Epoch [1/1], Step [4008/13804], Loss: 2.3803, Perplexity: 10.8076, time_taken_in_seconds: 6
Epoch [1/1], Step [4009/13804], Loss: 2.5923, Perplexity: 13.3611, time_taken_in_seconds: 7
Epoch [1/1], Step [4010/13804], Loss: 2.5583, Perplexity: 12.9132, time_taken_in_seconds: 8
Epoch [1/1], Step [4011/13804], Loss: 3.2356, Perplexity: 25.4228, time_taken_in_seconds: 9
Epoch [1/1], Step [4012/13804], Loss: 2.9066, Perplexity: 18.2951, time_taken_in_seconds: 10
Epoch [1/1], Step [4013/13804], Loss: 2.6362, Perplexity: 13.9598, time_taken_in_seconds: 10
Epoch [1/1], Step [4014/13804], Loss: 2.6968, Perplexity: 14.8323, time_taken_in_seconds: 11
Epoch [1/1], Step [4015/13804], Loss: 2.6900, Perplexity: 14.7322, time_taken_in_seconds: 12
Epoch [1/1], Step [4016/13804], Loss: 3.1603, Perplexity: 23.5774, time_taken_in_seconds: 13
Epoch [1/1], Step [4017/13804], Loss: 2.8474, Perplexity: 17.2432, time_taken_in_seconds: 14
Epoch [1/1], Step [4018/13804], Loss: 2.9103, Perplexity: 18.3626, time_taken_in_seconds: 15
Epoch [1/1], Step [4019/13804], Loss: 3.0629, Perplexity: 21.3903, time_taken_in_seconds: 15
Epoch [1/1], Step [4020/13804], Loss: 3.0044, Perplexity: 20.1731, time_taken_in_seconds: 16
Epoch [1/1], Step [4021/13804], Loss: 2.7183, Perplexity: 15.1539, time_taken_in_seconds: 17
Epoch [1/1], Step [4022/13804], Loss: 2.9309, Perplexity: 18.7444, time_taken_in_seconds: 18
Epoch [1/1], Step [4023/13804], Loss: 2.8995, Perplexity: 18.1642, time_taken_in_seconds: 19
Epoch [1/1], Step [4024/13804], Loss: 2.5129, Perplexity: 12.3409, time_taken_in_seconds: 20
Epoch [1/1], Step [4025/13804], Loss: 2.5156, Perplexity: 12.3737, time_taken_in_seconds: 20
Epoch [1/1], Step [4026/13804], Loss: 2.4999, Perplexity: 12.1818, time_taken_in_seconds: 21
Epoch [1/1], Step [4027/13804], Loss: 3.0798, Perplexity: 21.7545, time_taken_in_seconds: 22
Epoch [1/1], Step [4028/13804], Loss: 2.8616, Perplexity: 17.4902, time_taken_in_seconds: 23
Epoch [1/1], Step [4029/13804], Loss: 2.6135, Perplexity: 13.6471, time_taken_in_seconds: 24
Epoch [1/1], Step [4030/13804], Loss: 2.7524, Perplexity: 15.6806, time_taken_in_seconds: 24
Epoch [1/1], Step [4031/13804], Loss: 2.8127, Perplexity: 16.6556, time_taken_in_seconds: 25
Epoch [1/1], Step [4032/13804], Loss: 2.6871, Perplexity: 14.6884, time_taken_in_seconds: 26
Epoch [1/1], Step [4033/13804], Loss: 3.0591, Perplexity: 21.3090, time_taken_in_seconds: 27
Epoch [1/1], Step [4034/13804], Loss: 3.2719, Perplexity: 26.3623, time_taken_in_seconds: 28
Epoch [1/1], Step [4035/13804], Loss: 2.8966, Perplexity: 18.1118, time_taken_in_seconds: 29
Epoch [1/1], Step [4036/13804], Loss: 3.1639, Perplexity: 23.6629, time_taken_in_seconds: 29
Epoch [1/1], Step [4037/13804], Loss: 3.0201, Perplexity: 20.4935, time_taken_in_seconds: 30
Epoch [1/1], Step [4038/13804], Loss: 2.5895, Perplexity: 13.3233, time_taken_in_seconds: 31
Epoch [1/1], Step [4039/13804], Loss: 2.5924, Perplexity: 13.3613, time_taken_in_seconds: 32
Epoch [1/1], Step [4040/13804], Loss: 2.6201, Perplexity: 13.7373, time_taken_in_seconds: 33
Epoch [1/1], Step [4041/13804], Loss: 2.6532, Perplexity: 14.1989, time_taken_in_seconds: 34
Epoch [1/1], Step [4042/13804], Loss: 2.6638, Perplexity: 14.3508, time_taken_in_seconds: 34
Epoch [1/1], Step [4043/13804], Loss: 2.6755, Perplexity: 14.5192, time_taken_in_seconds: 35
Epoch [1/1], Step [4044/13804], Loss: 2.5610, Perplexity: 12.9491, time_taken_in_seconds: 36
Epoch [1/1], Step [4045/13804], Loss: 2.7503, Perplexity: 15.6476, time_taken_in_seconds: 37
Epoch [1/1], Step [4046/13804], Loss: 2.8362, Perplexity: 17.0514, time_taken_in_seconds: 38
Epoch [1/1], Step [4047/13804], Loss: 2.4911, Perplexity: 12.0750, time_taken_in_seconds: 39
Epoch [1/1], Step [4048/13804], Loss: 2.8810, Perplexity: 17.8329, time_taken_in_seconds: 39
Epoch [1/1], Step [4049/13804], Loss: 2.8395, Perplexity: 17.1073, time_taken_in_seconds: 40
Epoch [1/1], Step [4050/13804], Loss: 2.7508, Perplexity: 15.6553, time_taken_in_seconds: 41
Epoch [1/1], Step [4051/13804], Loss: 2.8492, Perplexity: 17.2745, time_taken_in_seconds: 42
Epoch [1/1], Step [4052/13804], Loss: 2.6355, Perplexity: 13.9498, time_taken_in_seconds: 43
Epoch [1/1], Step [4053/13804], Loss: 2.6031, Perplexity: 13.5061, time_taken_in_seconds: 44
Epoch [1/1], Step [4054/13804], Loss: 2.9664, Perplexity: 19.4214, time_taken_in_seconds: 45
Epoch [1/1], Step [4055/13804], Loss: 2.9532, Perplexity: 19.1671, time_taken_in_seconds: 45
Epoch [1/1], Step [4056/13804], Loss: 2.5416, Perplexity: 12.7005, time_taken_in_seconds: 46
Epoch [1/1], Step [4057/13804], Loss: 2.3593, Perplexity: 10.5840, time_taken_in_seconds: 47
Epoch [1/1], Step [4058/13804], Loss: 2.7230, Perplexity: 15.2258, time_taken_in_seconds: 48
Epoch [1/1], Step [4059/13804], Loss: 2.7652, Perplexity: 15.8821, time_taken_in_seconds: 49
Epoch [1/1], Step [4060/13804], Loss: 3.0278, Perplexity: 20.6517, time_taken_in_seconds: 49
Epoch [1/1], Step [4061/13804], Loss: 2.7603, Perplexity: 15.8040, time_taken_in_seconds: 50
Epoch [1/1], Step [4062/13804], Loss: 2.7372, Perplexity: 15.4443, time_taken_in_seconds: 51
Epoch [1/1], Step [4063/13804], Loss: 2.8041, Perplexity: 16.5120, time_taken_in_seconds: 52
Epoch [1/1], Step [4064/13804], Loss: 2.6502, Perplexity: 14.1566, time_taken_in_seconds: 53
Epoch [1/1], Step [4065/13804], Loss: 2.6591, Perplexity: 14.2835, time_taken_in_seconds: 54
Epoch [1/1], Step [4066/13804], Loss: 2.6451, Perplexity: 14.0854, time_taken_in_seconds: 54
Epoch [1/1], Step [4067/13804], Loss: 3.0565, Perplexity: 21.2533, time_taken_in_seconds: 55
Epoch [1/1], Step [4068/13804], Loss: 2.8093, Perplexity: 16.5984, time_taken_in_seconds: 56
Epoch [1/1], Step [4069/13804], Loss: 2.7612, Perplexity: 15.8186, time_taken_in_seconds: 57
Epoch [1/1], Step [4070/13804], Loss: 2.7589, Perplexity: 15.7824, time_taken_in_seconds: 58
Epoch [1/1], Step [4071/13804], Loss: 2.3472, Perplexity: 10.4567, time_taken_in_seconds: 58
Epoch [1/1], Step [4072/13804], Loss: 2.6448, Perplexity: 14.0803, time_taken_in_seconds: 59
Epoch [1/1], Step [4073/13804], Loss: 2.8400, Perplexity: 17.1163, time_taken_in_seconds: 60
Epoch [1/1], Step [4074/13804], Loss: 2.8601, Perplexity: 17.4638, time_taken_in_seconds: 61
Epoch [1/1], Step [4075/13804], Loss: 2.7804, Perplexity: 16.1255, time_taken_in_seconds: 62
Epoch [1/1], Step [4076/13804], Loss: 2.7208, Perplexity: 15.1927, time_taken_in_seconds: 63
Epoch [1/1], Step [4077/13804], Loss: 2.7693, Perplexity: 15.9482, time_taken_in_seconds: 64
Epoch [1/1], Step [4078/13804], Loss: 2.5892, Perplexity: 13.3195, time_taken_in_seconds: 64
Epoch [1/1], Step [4079/13804], Loss: 2.5836, Perplexity: 13.2442, time_taken_in_seconds: 65
Epoch [1/1], Step [4080/13804], Loss: 2.6379, Perplexity: 13.9840, time_taken_in_seconds: 66
Epoch [1/1], Step [4081/13804], Loss: 2.9388, Perplexity: 18.8935, time_taken_in_seconds: 67
Epoch [1/1], Step [4082/13804], Loss: 2.5785, Perplexity: 13.1771, time_taken_in_seconds: 68
Epoch [1/1], Step [4083/13804], Loss: 2.7929, Perplexity: 16.3278, time_taken_in_seconds: 69
Epoch [1/1], Step [4084/13804], Loss: 2.6081, Perplexity: 13.5726, time_taken_in_seconds: 69
Epoch [1/1], Step [4085/13804], Loss: 2.6138, Perplexity: 13.6512, time_taken_in_seconds: 70
Epoch [1/1], Step [4086/13804], Loss: 2.7717, Perplexity: 15.9850, time_taken_in_seconds: 71
Epoch [1/1], Step [4087/13804], Loss: 2.5690, Perplexity: 13.0532, time_taken_in_seconds: 72
Epoch [1/1], Step [4088/13804], Loss: 2.4559, Perplexity: 11.6570, time_taken_in_seconds: 73
Epoch [1/1], Step [4089/13804], Loss: 2.7676, Perplexity: 15.9200, time_taken_in_seconds: 74
Epoch [1/1], Step [4090/13804], Loss: 2.4496, Perplexity: 11.5835, time_taken_in_seconds: 74
Epoch [1/1], Step [4091/13804], Loss: 2.3878, Perplexity: 10.8896, time_taken_in_seconds: 75
Epoch [1/1], Step [4092/13804], Loss: 2.7091, Perplexity: 15.0158, time_taken_in_seconds: 76
Epoch [1/1], Step [4093/13804], Loss: 3.0598, Perplexity: 21.3231, time_taken_in_seconds: 77
Epoch [1/1], Step [4094/13804], Loss: 2.7688, Perplexity: 15.9391, time_taken_in_seconds: 78
Epoch [1/1], Step [4095/13804], Loss: 2.7790, Perplexity: 16.1037, time_taken_in_seconds: 79
Epoch [1/1], Step [4096/13804], Loss: 2.6509, Perplexity: 14.1667, time_taken_in_seconds: 79
Epoch [1/1], Step [4097/13804], Loss: 2.8396, Perplexity: 17.1095, time_taken_in_seconds: 80
Epoch [1/1], Step [4098/13804], Loss: 2.3169, Perplexity: 10.1446, time_taken_in_seconds: 81
Epoch [1/1], Step [4099/13804], Loss: 2.2486, Perplexity: 9.4748, time_taken_in_seconds: 82
Epoch [1/1], Step [4100/13804], Loss: 2.6527, Perplexity: 14.1923, time_taken_in_seconds: 83
Epoch [1/1], Step [4101/13804], Loss: 2.6451, Perplexity: 14.0845, time_taken_in_seconds: 0
Epoch [1/1], Step [4102/13804], Loss: 3.1077, Perplexity: 22.3686, time_taken_in_seconds: 1
Epoch [1/1], Step [4103/13804], Loss: 2.7668, Perplexity: 15.9081, time_taken_in_seconds: 2
Epoch [1/1], Step [4104/13804], Loss: 2.8220, Perplexity: 16.8113, time_taken_in_seconds: 3
Epoch [1/1], Step [4105/13804], Loss: 3.0320, Perplexity: 20.7381, time_taken_in_seconds: 4
Epoch [1/1], Step [4106/13804], Loss: 2.9612, Perplexity: 19.3211, time_taken_in_seconds: 4
Epoch [1/1], Step [4107/13804], Loss: 3.1640, Perplexity: 23.6653, time_taken_in_seconds: 5
Epoch [1/1], Step [4108/13804], Loss: 2.8113, Perplexity: 16.6318, time_taken_in_seconds: 6
Epoch [1/1], Step [4109/13804], Loss: 2.7631, Perplexity: 15.8489, time_taken_in_seconds: 7
Epoch [1/1], Step [4110/13804], Loss: 2.5489, Perplexity: 12.7929, time_taken_in_seconds: 8
Epoch [1/1], Step [4111/13804], Loss: 2.6751, Perplexity: 14.5143, time_taken_in_seconds: 9
Epoch [1/1], Step [4112/13804], Loss: 2.7667, Perplexity: 15.9067, time_taken_in_seconds: 10
Epoch [1/1], Step [4113/13804], Loss: 2.5636, Perplexity: 12.9828, time_taken_in_seconds: 10
Epoch [1/1], Step [4114/13804], Loss: 2.4442, Perplexity: 11.5210, time_taken_in_seconds: 11
Epoch [1/1], Step [4115/13804], Loss: 2.7579, Perplexity: 15.7673, time_taken_in_seconds: 12
Epoch [1/1], Step [4116/13804], Loss: 2.9841, Perplexity: 19.7686, time_taken_in_seconds: 13
Epoch [1/1], Step [4117/13804], Loss: 2.6995, Perplexity: 14.8730, time_taken_in_seconds: 14
Epoch [1/1], Step [4118/13804], Loss: 2.9894, Perplexity: 19.8745, time_taken_in_seconds: 15
Epoch [1/1], Step [4119/13804], Loss: 3.1275, Perplexity: 22.8171, time_taken_in_seconds: 15
Epoch [1/1], Step [4120/13804], Loss: 2.9533, Perplexity: 19.1683, time_taken_in_seconds: 16
Epoch [1/1], Step [4121/13804], Loss: 2.8481, Perplexity: 17.2548, time_taken_in_seconds: 17
Epoch [1/1], Step [4122/13804], Loss: 2.5864, Perplexity: 13.2817, time_taken_in_seconds: 18
Epoch [1/1], Step [4123/13804], Loss: 2.5528, Perplexity: 12.8425, time_taken_in_seconds: 19
Epoch [1/1], Step [4124/13804], Loss: 2.3812, Perplexity: 10.8175, time_taken_in_seconds: 20
Epoch [1/1], Step [4125/13804], Loss: 2.8576, Perplexity: 17.4191, time_taken_in_seconds: 21
Epoch [1/1], Step [4126/13804], Loss: 2.9307, Perplexity: 18.7404, time_taken_in_seconds: 21
Epoch [1/1], Step [4127/13804], Loss: 3.0920, Perplexity: 22.0216, time_taken_in_seconds: 22
Epoch [1/1], Step [4128/13804], Loss: 2.9684, Perplexity: 19.4603, time_taken_in_seconds: 23
Epoch [1/1], Step [4129/13804], Loss: 2.5744, Perplexity: 13.1229, time_taken_in_seconds: 24
Epoch [1/1], Step [4130/13804], Loss: 2.6133, Perplexity: 13.6439, time_taken_in_seconds: 25
Epoch [1/1], Step [4131/13804], Loss: 2.5318, Perplexity: 12.5760, time_taken_in_seconds: 26
Epoch [1/1], Step [4132/13804], Loss: 2.6712, Perplexity: 14.4576, time_taken_in_seconds: 26
Epoch [1/1], Step [4133/13804], Loss: 2.4380, Perplexity: 11.4502, time_taken_in_seconds: 27
Epoch [1/1], Step [4134/13804], Loss: 2.7266, Perplexity: 15.2806, time_taken_in_seconds: 28
Epoch [1/1], Step [4135/13804], Loss: 2.5128, Perplexity: 12.3391, time_taken_in_seconds: 29
Epoch [1/1], Step [4136/13804], Loss: 2.4756, Perplexity: 11.8886, time_taken_in_seconds: 30
Epoch [1/1], Step [4137/13804], Loss: 3.2084, Perplexity: 24.7385, time_taken_in_seconds: 31
Epoch [1/1], Step [4138/13804], Loss: 2.6888, Perplexity: 14.7136, time_taken_in_seconds: 31
Epoch [1/1], Step [4139/13804], Loss: 3.9773, Perplexity: 53.3744, time_taken_in_seconds: 32
Epoch [1/1], Step [4140/13804], Loss: 2.4728, Perplexity: 11.8551, time_taken_in_seconds: 33
Epoch [1/1], Step [4141/13804], Loss: 2.9331, Perplexity: 18.7854, time_taken_in_seconds: 34
Epoch [1/1], Step [4142/13804], Loss: 2.4200, Perplexity: 11.2463, time_taken_in_seconds: 35
Epoch [1/1], Step [4143/13804], Loss: 2.6068, Perplexity: 13.5555, time_taken_in_seconds: 36
Epoch [1/1], Step [4144/13804], Loss: 2.8467, Perplexity: 17.2306, time_taken_in_seconds: 36
Epoch [1/1], Step [4145/13804], Loss: 2.6528, Perplexity: 14.1941, time_taken_in_seconds: 37
Epoch [1/1], Step [4146/13804], Loss: 2.6394, Perplexity: 14.0052, time_taken_in_seconds: 38
Epoch [1/1], Step [4147/13804], Loss: 2.8897, Perplexity: 17.9882, time_taken_in_seconds: 39
Epoch [1/1], Step [4148/13804], Loss: 2.7018, Perplexity: 14.9065, time_taken_in_seconds: 40
Epoch [1/1], Step [4149/13804], Loss: 3.0625, Perplexity: 21.3818, time_taken_in_seconds: 41
Epoch [1/1], Step [4150/13804], Loss: 2.6609, Perplexity: 14.3087, time_taken_in_seconds: 41
Epoch [1/1], Step [4151/13804], Loss: 2.8527, Perplexity: 17.3346, time_taken_in_seconds: 42
Epoch [1/1], Step [4152/13804], Loss: 3.6092, Perplexity: 36.9364, time_taken_in_seconds: 43
Epoch [1/1], Step [4153/13804], Loss: 2.5230, Perplexity: 12.4656, time_taken_in_seconds: 44
Epoch [1/1], Step [4154/13804], Loss: 2.5278, Perplexity: 12.5262, time_taken_in_seconds: 45
Epoch [1/1], Step [4155/13804], Loss: 2.7542, Perplexity: 15.7092, time_taken_in_seconds: 46
Epoch [1/1], Step [4156/13804], Loss: 2.6720, Perplexity: 14.4682, time_taken_in_seconds: 46
Epoch [1/1], Step [4157/13804], Loss: 2.6643, Perplexity: 14.3580, time_taken_in_seconds: 47
Epoch [1/1], Step [4158/13804], Loss: 2.7870, Perplexity: 16.2316, time_taken_in_seconds: 48
Epoch [1/1], Step [4159/13804], Loss: 2.6017, Perplexity: 13.4862, time_taken_in_seconds: 49
Epoch [1/1], Step [4160/13804], Loss: 2.8537, Perplexity: 17.3521, time_taken_in_seconds: 50
Epoch [1/1], Step [4161/13804], Loss: 2.8202, Perplexity: 16.7797, time_taken_in_seconds: 51
Epoch [1/1], Step [4162/13804], Loss: 2.8560, Perplexity: 17.3910, time_taken_in_seconds: 51
Epoch [1/1], Step [4163/13804], Loss: 2.6422, Perplexity: 14.0435, time_taken_in_seconds: 52
Epoch [1/1], Step [4164/13804], Loss: 2.6581, Perplexity: 14.2687, time_taken_in_seconds: 53
Epoch [1/1], Step [4165/13804], Loss: 2.7479, Perplexity: 15.6098, time_taken_in_seconds: 54
Epoch [1/1], Step [4166/13804], Loss: 2.9458, Perplexity: 19.0265, time_taken_in_seconds: 55
Epoch [1/1], Step [4167/13804], Loss: 2.4579, Perplexity: 11.6797, time_taken_in_seconds: 56
Epoch [1/1], Step [4168/13804], Loss: 2.7945, Perplexity: 16.3541, time_taken_in_seconds: 56
Epoch [1/1], Step [4169/13804], Loss: 2.9499, Perplexity: 19.1033, time_taken_in_seconds: 57
Epoch [1/1], Step [4170/13804], Loss: 2.8430, Perplexity: 17.1679, time_taken_in_seconds: 58
Epoch [1/1], Step [4171/13804], Loss: 2.7113, Perplexity: 15.0495, time_taken_in_seconds: 59
Epoch [1/1], Step [4172/13804], Loss: 2.7108, Perplexity: 15.0410, time_taken_in_seconds: 60
Epoch [1/1], Step [4173/13804], Loss: 3.0301, Perplexity: 20.6988, time_taken_in_seconds: 61
Epoch [1/1], Step [4174/13804], Loss: 2.7257, Perplexity: 15.2674, time_taken_in_seconds: 61
Epoch [1/1], Step [4175/13804], Loss: 2.7886, Perplexity: 16.2576, time_taken_in_seconds: 62
Epoch [1/1], Step [4176/13804], Loss: 2.5097, Perplexity: 12.3012, time_taken_in_seconds: 63
Epoch [1/1], Step [4177/13804], Loss: 3.0957, Perplexity: 22.1027, time_taken_in_seconds: 64
Epoch [1/1], Step [4178/13804], Loss: 3.0392, Perplexity: 20.8881, time_taken_in_seconds: 65
Epoch [1/1], Step [4179/13804], Loss: 2.5946, Perplexity: 13.3910, time_taken_in_seconds: 66
Epoch [1/1], Step [4180/13804], Loss: 2.7983, Perplexity: 16.4167, time_taken_in_seconds: 66
Epoch [1/1], Step [4181/13804], Loss: 2.6637, Perplexity: 14.3498, time_taken_in_seconds: 67
Epoch [1/1], Step [4182/13804], Loss: 2.9599, Perplexity: 19.2963, time_taken_in_seconds: 68
Epoch [1/1], Step [4183/13804], Loss: 2.7439, Perplexity: 15.5473, time_taken_in_seconds: 69
Epoch [1/1], Step [4184/13804], Loss: 2.6677, Perplexity: 14.4072, time_taken_in_seconds: 70
Epoch [1/1], Step [4185/13804], Loss: 2.5776, Perplexity: 13.1652, time_taken_in_seconds: 71
Epoch [1/1], Step [4186/13804], Loss: 2.6862, Perplexity: 14.6755, time_taken_in_seconds: 72
Epoch [1/1], Step [4187/13804], Loss: 2.6255, Perplexity: 13.8114, time_taken_in_seconds: 72
Epoch [1/1], Step [4188/13804], Loss: 2.7721, Perplexity: 15.9915, time_taken_in_seconds: 73
Epoch [1/1], Step [4189/13804], Loss: 2.8958, Perplexity: 18.0984, time_taken_in_seconds: 74
Epoch [1/1], Step [4190/13804], Loss: 2.6528, Perplexity: 14.1942, time_taken_in_seconds: 75
Epoch [1/1], Step [4191/13804], Loss: 2.8877, Perplexity: 17.9528, time_taken_in_seconds: 76
Epoch [1/1], Step [4192/13804], Loss: 3.8273, Perplexity: 45.9377, time_taken_in_seconds: 77
Epoch [1/1], Step [4193/13804], Loss: 2.8112, Perplexity: 16.6295, time_taken_in_seconds: 77
Epoch [1/1], Step [4194/13804], Loss: 2.6792, Perplexity: 14.5735, time_taken_in_seconds: 78
Epoch [1/1], Step [4195/13804], Loss: 2.8681, Perplexity: 17.6034, time_taken_in_seconds: 79
Epoch [1/1], Step [4196/13804], Loss: 2.7368, Perplexity: 15.4372, time_taken_in_seconds: 80
Epoch [1/1], Step [4197/13804], Loss: 2.6296, Perplexity: 13.8679, time_taken_in_seconds: 81
Epoch [1/1], Step [4198/13804], Loss: 2.3287, Perplexity: 10.2643, time_taken_in_seconds: 82
Epoch [1/1], Step [4199/13804], Loss: 2.8760, Perplexity: 17.7430, time_taken_in_seconds: 83
Epoch [1/1], Step [4200/13804], Loss: 2.8653, Perplexity: 17.5549, time_taken_in_seconds: 83
Epoch [1/1], Step [4201/13804], Loss: 2.7037, Perplexity: 14.9352, time_taken_in_seconds: 0
Epoch [1/1], Step [4202/13804], Loss: 3.2983, Perplexity: 27.0676, time_taken_in_seconds: 1
Epoch [1/1], Step [4203/13804], Loss: 2.9472, Perplexity: 19.0529, time_taken_in_seconds: 2
Epoch [1/1], Step [4204/13804], Loss: 2.7268, Perplexity: 15.2842, time_taken_in_seconds: 3
Epoch [1/1], Step [4205/13804], Loss: 2.9184, Perplexity: 18.5116, time_taken_in_seconds: 4
Epoch [1/1], Step [4206/13804], Loss: 3.0988, Perplexity: 22.1723, time_taken_in_seconds: 4
Epoch [1/1], Step [4207/13804], Loss: 2.6209, Perplexity: 13.7481, time_taken_in_seconds: 5
Epoch [1/1], Step [4208/13804], Loss: 2.9820, Perplexity: 19.7277, time_taken_in_seconds: 6
Epoch [1/1], Step [4209/13804], Loss: 2.6237, Perplexity: 13.7860, time_taken_in_seconds: 7
Epoch [1/1], Step [4210/13804], Loss: 2.7879, Perplexity: 16.2468, time_taken_in_seconds: 8
Epoch [1/1], Step [4211/13804], Loss: 2.7986, Perplexity: 16.4215, time_taken_in_seconds: 9
Epoch [1/1], Step [4212/13804], Loss: 2.5547, Perplexity: 12.8677, time_taken_in_seconds: 9
Epoch [1/1], Step [4213/13804], Loss: 2.8185, Perplexity: 16.7523, time_taken_in_seconds: 10
Epoch [1/1], Step [4214/13804], Loss: 2.5524, Perplexity: 12.8385, time_taken_in_seconds: 11
Epoch [1/1], Step [4215/13804], Loss: 2.6414, Perplexity: 14.0328, time_taken_in_seconds: 12
Epoch [1/1], Step [4216/13804], Loss: 2.8787, Perplexity: 17.7907, time_taken_in_seconds: 13
Epoch [1/1], Step [4217/13804], Loss: 2.9159, Perplexity: 18.4648, time_taken_in_seconds: 14
Epoch [1/1], Step [4218/13804], Loss: 2.5252, Perplexity: 12.4937, time_taken_in_seconds: 14
Epoch [1/1], Step [4219/13804], Loss: 2.6118, Perplexity: 13.6233, time_taken_in_seconds: 15
Epoch [1/1], Step [4220/13804], Loss: 2.8878, Perplexity: 17.9542, time_taken_in_seconds: 16
Epoch [1/1], Step [4221/13804], Loss: 2.6958, Perplexity: 14.8168, time_taken_in_seconds: 17
Epoch [1/1], Step [4222/13804], Loss: 2.9811, Perplexity: 19.7089, time_taken_in_seconds: 18
Epoch [1/1], Step [4223/13804], Loss: 2.6188, Perplexity: 13.7189, time_taken_in_seconds: 19
Epoch [1/1], Step [4224/13804], Loss: 2.7928, Perplexity: 16.3268, time_taken_in_seconds: 19
Epoch [1/1], Step [4225/13804], Loss: 2.7833, Perplexity: 16.1730, time_taken_in_seconds: 20
Epoch [1/1], Step [4226/13804], Loss: 2.5051, Perplexity: 12.2450, time_taken_in_seconds: 21
Epoch [1/1], Step [4227/13804], Loss: 2.8931, Perplexity: 18.0485, time_taken_in_seconds: 22
Epoch [1/1], Step [4228/13804], Loss: 3.0463, Perplexity: 21.0378, time_taken_in_seconds: 23
Epoch [1/1], Step [4229/13804], Loss: 2.7608, Perplexity: 15.8131, time_taken_in_seconds: 23
Epoch [1/1], Step [4230/13804], Loss: 2.6830, Perplexity: 14.6288, time_taken_in_seconds: 24
Epoch [1/1], Step [4231/13804], Loss: 2.7389, Perplexity: 15.4694, time_taken_in_seconds: 25
Epoch [1/1], Step [4232/13804], Loss: 2.8200, Perplexity: 16.7769, time_taken_in_seconds: 26
Epoch [1/1], Step [4233/13804], Loss: 2.7376, Perplexity: 15.4492, time_taken_in_seconds: 27
Epoch [1/1], Step [4234/13804], Loss: 3.0181, Perplexity: 20.4514, time_taken_in_seconds: 28
Epoch [1/1], Step [4235/13804], Loss: 2.8404, Perplexity: 17.1222, time_taken_in_seconds: 28
Epoch [1/1], Step [4236/13804], Loss: 2.6734, Perplexity: 14.4889, time_taken_in_seconds: 29
Epoch [1/1], Step [4237/13804], Loss: 2.7564, Perplexity: 15.7426, time_taken_in_seconds: 30
Epoch [1/1], Step [4238/13804], Loss: 2.6794, Perplexity: 14.5767, time_taken_in_seconds: 31
Epoch [1/1], Step [4239/13804], Loss: 2.5161, Perplexity: 12.3807, time_taken_in_seconds: 32
Epoch [1/1], Step [4240/13804], Loss: 2.9573, Perplexity: 19.2457, time_taken_in_seconds: 32
Epoch [1/1], Step [4241/13804], Loss: 3.0042, Perplexity: 20.1703, time_taken_in_seconds: 33
Epoch [1/1], Step [4242/13804], Loss: 3.1180, Perplexity: 22.6011, time_taken_in_seconds: 34
Epoch [1/1], Step [4243/13804], Loss: 2.6748, Perplexity: 14.5099, time_taken_in_seconds: 35
Epoch [1/1], Step [4244/13804], Loss: 3.0223, Perplexity: 20.5378, time_taken_in_seconds: 36
Epoch [1/1], Step [4245/13804], Loss: 2.6994, Perplexity: 14.8710, time_taken_in_seconds: 37
Epoch [1/1], Step [4246/13804], Loss: 2.7868, Perplexity: 16.2286, time_taken_in_seconds: 37
Epoch [1/1], Step [4247/13804], Loss: 3.2342, Perplexity: 25.3858, time_taken_in_seconds: 38
Epoch [1/1], Step [4248/13804], Loss: 2.3038, Perplexity: 10.0126, time_taken_in_seconds: 39
Epoch [1/1], Step [4249/13804], Loss: 2.9452, Perplexity: 19.0139, time_taken_in_seconds: 40
Epoch [1/1], Step [4250/13804], Loss: 3.0301, Perplexity: 20.6983, time_taken_in_seconds: 41
Epoch [1/1], Step [4251/13804], Loss: 2.9006, Perplexity: 18.1846, time_taken_in_seconds: 42
Epoch [1/1], Step [4252/13804], Loss: 2.7991, Perplexity: 16.4297, time_taken_in_seconds: 42
Epoch [1/1], Step [4253/13804], Loss: 2.6255, Perplexity: 13.8108, time_taken_in_seconds: 43
Epoch [1/1], Step [4254/13804], Loss: 2.8537, Perplexity: 17.3516, time_taken_in_seconds: 44
Epoch [1/1], Step [4255/13804], Loss: 2.8835, Perplexity: 17.8774, time_taken_in_seconds: 45
Epoch [1/1], Step [4256/13804], Loss: 2.7673, Perplexity: 15.9154, time_taken_in_seconds: 46
Epoch [1/1], Step [4257/13804], Loss: 3.0796, Perplexity: 21.7495, time_taken_in_seconds: 46
Epoch [1/1], Step [4258/13804], Loss: 2.7592, Perplexity: 15.7867, time_taken_in_seconds: 47
Epoch [1/1], Step [4259/13804], Loss: 2.7996, Perplexity: 16.4377, time_taken_in_seconds: 48
Epoch [1/1], Step [4260/13804], Loss: 2.8352, Perplexity: 17.0343, time_taken_in_seconds: 49
Epoch [1/1], Step [4261/13804], Loss: 2.5276, Perplexity: 12.5232, time_taken_in_seconds: 50
Epoch [1/1], Step [4262/13804], Loss: 3.5076, Perplexity: 33.3686, time_taken_in_seconds: 51
Epoch [1/1], Step [4263/13804], Loss: 2.6968, Perplexity: 14.8322, time_taken_in_seconds: 51
Epoch [1/1], Step [4264/13804], Loss: 2.6322, Perplexity: 13.9038, time_taken_in_seconds: 52
Epoch [1/1], Step [4265/13804], Loss: 3.2916, Perplexity: 26.8848, time_taken_in_seconds: 53
Epoch [1/1], Step [4266/13804], Loss: 2.8052, Perplexity: 16.5296, time_taken_in_seconds: 54
Epoch [1/1], Step [4267/13804], Loss: 2.6408, Perplexity: 14.0246, time_taken_in_seconds: 55
Epoch [1/1], Step [4268/13804], Loss: 3.1509, Perplexity: 23.3567, time_taken_in_seconds: 56
Epoch [1/1], Step [4269/13804], Loss: 2.8339, Perplexity: 17.0124, time_taken_in_seconds: 56
Epoch [1/1], Step [4270/13804], Loss: 2.6944, Perplexity: 14.7964, time_taken_in_seconds: 58
Epoch [1/1], Step [4271/13804], Loss: 3.8845, Perplexity: 48.6407, time_taken_in_seconds: 58
Epoch [1/1], Step [4272/13804], Loss: 2.7121, Perplexity: 15.0603, time_taken_in_seconds: 59
Epoch [1/1], Step [4273/13804], Loss: 2.7192, Perplexity: 15.1687, time_taken_in_seconds: 60
Epoch [1/1], Step [4274/13804], Loss: 3.7268, Perplexity: 41.5456, time_taken_in_seconds: 61
Epoch [1/1], Step [4275/13804], Loss: 3.0800, Perplexity: 21.7573, time_taken_in_seconds: 62
Epoch [1/1], Step [4276/13804], Loss: 2.5394, Perplexity: 12.6725, time_taken_in_seconds: 63
Epoch [1/1], Step [4277/13804], Loss: 2.8040, Perplexity: 16.5104, time_taken_in_seconds: 63
Epoch [1/1], Step [4278/13804], Loss: 2.9632, Perplexity: 19.3602, time_taken_in_seconds: 64
Epoch [1/1], Step [4279/13804], Loss: 2.5987, Perplexity: 13.4469, time_taken_in_seconds: 65
Epoch [1/1], Step [4280/13804], Loss: 2.5757, Perplexity: 13.1411, time_taken_in_seconds: 66
Epoch [1/1], Step [4281/13804], Loss: 2.6978, Perplexity: 14.8473, time_taken_in_seconds: 67
Epoch [1/1], Step [4282/13804], Loss: 2.8879, Perplexity: 17.9558, time_taken_in_seconds: 67
Epoch [1/1], Step [4283/13804], Loss: 2.9128, Perplexity: 18.4084, time_taken_in_seconds: 68
Epoch [1/1], Step [4284/13804], Loss: 2.5855, Perplexity: 13.2699, time_taken_in_seconds: 69
Epoch [1/1], Step [4285/13804], Loss: 2.4495, Perplexity: 11.5823, time_taken_in_seconds: 70
Epoch [1/1], Step [4286/13804], Loss: 3.1315, Perplexity: 22.9076, time_taken_in_seconds: 71
Epoch [1/1], Step [4287/13804], Loss: 2.9954, Perplexity: 19.9937, time_taken_in_seconds: 72
Epoch [1/1], Step [4288/13804], Loss: 2.9240, Perplexity: 18.6149, time_taken_in_seconds: 72
Epoch [1/1], Step [4289/13804], Loss: 3.1740, Perplexity: 23.9021, time_taken_in_seconds: 73
Epoch [1/1], Step [4290/13804], Loss: 2.4646, Perplexity: 11.7587, time_taken_in_seconds: 74
Epoch [1/1], Step [4291/13804], Loss: 2.3255, Perplexity: 10.2316, time_taken_in_seconds: 75
Epoch [1/1], Step [4292/13804], Loss: 2.7761, Perplexity: 16.0569, time_taken_in_seconds: 76
Epoch [1/1], Step [4293/13804], Loss: 2.9481, Perplexity: 19.0695, time_taken_in_seconds: 77
Epoch [1/1], Step [4294/13804], Loss: 2.9189, Perplexity: 18.5215, time_taken_in_seconds: 77
Epoch [1/1], Step [4295/13804], Loss: 2.7873, Perplexity: 16.2363, time_taken_in_seconds: 78
Epoch [1/1], Step [4296/13804], Loss: 2.6195, Perplexity: 13.7289, time_taken_in_seconds: 79
Epoch [1/1], Step [4297/13804], Loss: 2.6248, Perplexity: 13.8016, time_taken_in_seconds: 80
Epoch [1/1], Step [4298/13804], Loss: 2.6826, Perplexity: 14.6224, time_taken_in_seconds: 81
Epoch [1/1], Step [4299/13804], Loss: 2.4790, Perplexity: 11.9289, time_taken_in_seconds: 82
Epoch [1/1], Step [4300/13804], Loss: 2.6626, Perplexity: 14.3329, time_taken_in_seconds: 82
Epoch [1/1], Step [4301/13804], Loss: 2.5480, Perplexity: 12.7820, time_taken_in_seconds: 0
Epoch [1/1], Step [4302/13804], Loss: 3.5097, Perplexity: 33.4382, time_taken_in_seconds: 1
Epoch [1/1], Step [4303/13804], Loss: 3.2215, Perplexity: 25.0654, time_taken_in_seconds: 2
Epoch [1/1], Step [4304/13804], Loss: 2.9922, Perplexity: 19.9301, time_taken_in_seconds: 3
Epoch [1/1], Step [4305/13804], Loss: 3.9526, Perplexity: 52.0692, time_taken_in_seconds: 4
Epoch [1/1], Step [4306/13804], Loss: 2.3665, Perplexity: 10.6597, time_taken_in_seconds: 4
Epoch [1/1], Step [4307/13804], Loss: 2.6115, Perplexity: 13.6193, time_taken_in_seconds: 5
Epoch [1/1], Step [4308/13804], Loss: 3.1061, Perplexity: 22.3335, time_taken_in_seconds: 6
Epoch [1/1], Step [4309/13804], Loss: 2.8013, Perplexity: 16.4663, time_taken_in_seconds: 7
Epoch [1/1], Step [4310/13804], Loss: 3.7540, Perplexity: 42.6903, time_taken_in_seconds: 8
Epoch [1/1], Step [4311/13804], Loss: 2.6422, Perplexity: 14.0437, time_taken_in_seconds: 9
Epoch [1/1], Step [4312/13804], Loss: 2.8075, Perplexity: 16.5686, time_taken_in_seconds: 9
Epoch [1/1], Step [4313/13804], Loss: 2.6603, Perplexity: 14.3007, time_taken_in_seconds: 10
Epoch [1/1], Step [4314/13804], Loss: 2.8935, Perplexity: 18.0571, time_taken_in_seconds: 11
Epoch [1/1], Step [4315/13804], Loss: 2.8180, Perplexity: 16.7429, time_taken_in_seconds: 12
Epoch [1/1], Step [4316/13804], Loss: 2.5764, Perplexity: 13.1498, time_taken_in_seconds: 13
Epoch [1/1], Step [4317/13804], Loss: 3.5659, Perplexity: 35.3708, time_taken_in_seconds: 14
Epoch [1/1], Step [4318/13804], Loss: 2.4730, Perplexity: 11.8581, time_taken_in_seconds: 14
Epoch [1/1], Step [4319/13804], Loss: 2.4354, Perplexity: 11.4208, time_taken_in_seconds: 15
Epoch [1/1], Step [4320/13804], Loss: 2.7123, Perplexity: 15.0639, time_taken_in_seconds: 16
Epoch [1/1], Step [4321/13804], Loss: 3.1329, Perplexity: 22.9400, time_taken_in_seconds: 17
Epoch [1/1], Step [4322/13804], Loss: 2.7096, Perplexity: 15.0240, time_taken_in_seconds: 18
Epoch [1/1], Step [4323/13804], Loss: 2.6904, Perplexity: 14.7382, time_taken_in_seconds: 19
Epoch [1/1], Step [4324/13804], Loss: 2.5079, Perplexity: 12.2788, time_taken_in_seconds: 19
Epoch [1/1], Step [4325/13804], Loss: 2.5334, Perplexity: 12.5963, time_taken_in_seconds: 20
Epoch [1/1], Step [4326/13804], Loss: 3.4214, Perplexity: 30.6134, time_taken_in_seconds: 21
Epoch [1/1], Step [4327/13804], Loss: 2.8153, Perplexity: 16.6976, time_taken_in_seconds: 22
Epoch [1/1], Step [4328/13804], Loss: 2.5964, Perplexity: 13.4159, time_taken_in_seconds: 23
Epoch [1/1], Step [4329/13804], Loss: 2.7223, Perplexity: 15.2158, time_taken_in_seconds: 23
Epoch [1/1], Step [4330/13804], Loss: 2.5194, Perplexity: 12.4217, time_taken_in_seconds: 24
Epoch [1/1], Step [4331/13804], Loss: 2.8472, Perplexity: 17.2397, time_taken_in_seconds: 25
Epoch [1/1], Step [4332/13804], Loss: 2.8411, Perplexity: 17.1353, time_taken_in_seconds: 26
Epoch [1/1], Step [4333/13804], Loss: 2.9720, Perplexity: 19.5316, time_taken_in_seconds: 27
Epoch [1/1], Step [4334/13804], Loss: 2.7818, Perplexity: 16.1480, time_taken_in_seconds: 28
Epoch [1/1], Step [4335/13804], Loss: 2.3894, Perplexity: 10.9072, time_taken_in_seconds: 28
Epoch [1/1], Step [4336/13804], Loss: 2.9259, Perplexity: 18.6503, time_taken_in_seconds: 29
Epoch [1/1], Step [4337/13804], Loss: 3.4110, Perplexity: 30.2956, time_taken_in_seconds: 30
Epoch [1/1], Step [4338/13804], Loss: 2.7414, Perplexity: 15.5086, time_taken_in_seconds: 31
Epoch [1/1], Step [4339/13804], Loss: 2.6317, Perplexity: 13.8977, time_taken_in_seconds: 32
Epoch [1/1], Step [4340/13804], Loss: 2.8010, Perplexity: 16.4604, time_taken_in_seconds: 33
Epoch [1/1], Step [4341/13804], Loss: 3.1372, Perplexity: 23.0391, time_taken_in_seconds: 33
Epoch [1/1], Step [4342/13804], Loss: 2.6223, Perplexity: 13.7676, time_taken_in_seconds: 34
Epoch [1/1], Step [4343/13804], Loss: 2.5657, Perplexity: 13.0092, time_taken_in_seconds: 35
Epoch [1/1], Step [4344/13804], Loss: 2.6439, Perplexity: 14.0685, time_taken_in_seconds: 36
Epoch [1/1], Step [4345/13804], Loss: 2.6153, Perplexity: 13.6707, time_taken_in_seconds: 37
Epoch [1/1], Step [4346/13804], Loss: 2.7547, Perplexity: 15.7167, time_taken_in_seconds: 38
Epoch [1/1], Step [4347/13804], Loss: 2.3517, Perplexity: 10.5029, time_taken_in_seconds: 39
Epoch [1/1], Step [4348/13804], Loss: 3.0077, Perplexity: 20.2409, time_taken_in_seconds: 39
Epoch [1/1], Step [4349/13804], Loss: 2.7931, Perplexity: 16.3321, time_taken_in_seconds: 40
Epoch [1/1], Step [4350/13804], Loss: 2.8396, Perplexity: 17.1093, time_taken_in_seconds: 41
Epoch [1/1], Step [4351/13804], Loss: 2.6409, Perplexity: 14.0263, time_taken_in_seconds: 42
Epoch [1/1], Step [4352/13804], Loss: 2.4713, Perplexity: 11.8379, time_taken_in_seconds: 43
Epoch [1/1], Step [4353/13804], Loss: 2.8488, Perplexity: 17.2667, time_taken_in_seconds: 44
Epoch [1/1], Step [4354/13804], Loss: 2.8742, Perplexity: 17.7116, time_taken_in_seconds: 44
Epoch [1/1], Step [4355/13804], Loss: 2.7664, Perplexity: 15.9009, time_taken_in_seconds: 45
Epoch [1/1], Step [4356/13804], Loss: 2.7513, Perplexity: 15.6633, time_taken_in_seconds: 46
Epoch [1/1], Step [4357/13804], Loss: 2.5182, Perplexity: 12.4057, time_taken_in_seconds: 47
Epoch [1/1], Step [4358/13804], Loss: 2.3486, Perplexity: 10.4711, time_taken_in_seconds: 48
Epoch [1/1], Step [4359/13804], Loss: 2.8587, Perplexity: 17.4383, time_taken_in_seconds: 49
Epoch [1/1], Step [4360/13804], Loss: 2.5494, Perplexity: 12.7990, time_taken_in_seconds: 49
Epoch [1/1], Step [4361/13804], Loss: 2.6883, Perplexity: 14.7073, time_taken_in_seconds: 50
Epoch [1/1], Step [4362/13804], Loss: 2.3337, Perplexity: 10.3159, time_taken_in_seconds: 51
Epoch [1/1], Step [4363/13804], Loss: 2.5491, Perplexity: 12.7953, time_taken_in_seconds: 52
Epoch [1/1], Step [4364/13804], Loss: 2.7052, Perplexity: 14.9573, time_taken_in_seconds: 53
Epoch [1/1], Step [4365/13804], Loss: 2.9327, Perplexity: 18.7784, time_taken_in_seconds: 54
Epoch [1/1], Step [4366/13804], Loss: 2.3720, Perplexity: 10.7184, time_taken_in_seconds: 54
Epoch [1/1], Step [4367/13804], Loss: 3.2706, Perplexity: 26.3268, time_taken_in_seconds: 55
Epoch [1/1], Step [4368/13804], Loss: 2.3674, Perplexity: 10.6695, time_taken_in_seconds: 56
Epoch [1/1], Step [4369/13804], Loss: 2.9141, Perplexity: 18.4320, time_taken_in_seconds: 57
Epoch [1/1], Step [4370/13804], Loss: 3.2747, Perplexity: 26.4365, time_taken_in_seconds: 58
Epoch [1/1], Step [4371/13804], Loss: 2.7893, Perplexity: 16.2700, time_taken_in_seconds: 59
Epoch [1/1], Step [4372/13804], Loss: 2.5127, Perplexity: 12.3380, time_taken_in_seconds: 59
Epoch [1/1], Step [4373/13804], Loss: 2.7183, Perplexity: 15.1547, time_taken_in_seconds: 60
Epoch [1/1], Step [4374/13804], Loss: 2.7520, Perplexity: 15.6738, time_taken_in_seconds: 61
Epoch [1/1], Step [4375/13804], Loss: 2.7056, Perplexity: 14.9632, time_taken_in_seconds: 62
Epoch [1/1], Step [4376/13804], Loss: 2.8049, Perplexity: 16.5262, time_taken_in_seconds: 63
Epoch [1/1], Step [4377/13804], Loss: 2.7536, Perplexity: 15.6998, time_taken_in_seconds: 63
Epoch [1/1], Step [4378/13804], Loss: 2.6798, Perplexity: 14.5816, time_taken_in_seconds: 64
Epoch [1/1], Step [4379/13804], Loss: 3.0585, Perplexity: 21.2949, time_taken_in_seconds: 65
Epoch [1/1], Step [4380/13804], Loss: 2.9017, Perplexity: 18.2042, time_taken_in_seconds: 66
Epoch [1/1], Step [4381/13804], Loss: 3.1725, Perplexity: 23.8674, time_taken_in_seconds: 67
Epoch [1/1], Step [4382/13804], Loss: 3.1479, Perplexity: 23.2874, time_taken_in_seconds: 68
Epoch [1/1], Step [4383/13804], Loss: 2.5134, Perplexity: 12.3465, time_taken_in_seconds: 68
Epoch [1/1], Step [4384/13804], Loss: 2.3484, Perplexity: 10.4689, time_taken_in_seconds: 69
Epoch [1/1], Step [4385/13804], Loss: 2.4379, Perplexity: 11.4494, time_taken_in_seconds: 70
Epoch [1/1], Step [4386/13804], Loss: 3.0791, Perplexity: 21.7387, time_taken_in_seconds: 71
Epoch [1/1], Step [4387/13804], Loss: 2.7160, Perplexity: 15.1202, time_taken_in_seconds: 72
Epoch [1/1], Step [4388/13804], Loss: 2.9627, Perplexity: 19.3499, time_taken_in_seconds: 73
Epoch [1/1], Step [4389/13804], Loss: 3.0160, Perplexity: 20.4102, time_taken_in_seconds: 73
Epoch [1/1], Step [4390/13804], Loss: 2.9608, Perplexity: 19.3131, time_taken_in_seconds: 74
Epoch [1/1], Step [4391/13804], Loss: 2.7267, Perplexity: 15.2817, time_taken_in_seconds: 75
Epoch [1/1], Step [4392/13804], Loss: 2.4999, Perplexity: 12.1819, time_taken_in_seconds: 76
Epoch [1/1], Step [4393/13804], Loss: 2.5324, Perplexity: 12.5834, time_taken_in_seconds: 77
Epoch [1/1], Step [4394/13804], Loss: 3.1196, Perplexity: 22.6376, time_taken_in_seconds: 78
Epoch [1/1], Step [4395/13804], Loss: 2.6891, Perplexity: 14.7179, time_taken_in_seconds: 78
Epoch [1/1], Step [4396/13804], Loss: 2.5807, Perplexity: 13.2064, time_taken_in_seconds: 79
Epoch [1/1], Step [4397/13804], Loss: 2.9527, Perplexity: 19.1576, time_taken_in_seconds: 80
Epoch [1/1], Step [4398/13804], Loss: 2.9500, Perplexity: 19.1054, time_taken_in_seconds: 81
Epoch [1/1], Step [4399/13804], Loss: 2.6975, Perplexity: 14.8427, time_taken_in_seconds: 82
Epoch [1/1], Step [4400/13804], Loss: 2.3167, Perplexity: 10.1418, time_taken_in_seconds: 83
Epoch [1/1], Step [4401/13804], Loss: 2.8274, Perplexity: 16.9019, time_taken_in_seconds: 0
Epoch [1/1], Step [4402/13804], Loss: 2.7384, Perplexity: 15.4621, time_taken_in_seconds: 1
Epoch [1/1], Step [4403/13804], Loss: 2.8480, Perplexity: 17.2536, time_taken_in_seconds: 2
Epoch [1/1], Step [4404/13804], Loss: 2.4136, Perplexity: 11.1745, time_taken_in_seconds: 3
Epoch [1/1], Step [4405/13804], Loss: 3.2255, Perplexity: 25.1662, time_taken_in_seconds: 4
Epoch [1/1], Step [4406/13804], Loss: 2.5165, Perplexity: 12.3846, time_taken_in_seconds: 4
Epoch [1/1], Step [4407/13804], Loss: 2.6480, Perplexity: 14.1263, time_taken_in_seconds: 5
Epoch [1/1], Step [4408/13804], Loss: 2.9778, Perplexity: 19.6442, time_taken_in_seconds: 6
Epoch [1/1], Step [4409/13804], Loss: 2.9705, Perplexity: 19.5009, time_taken_in_seconds: 7
Epoch [1/1], Step [4410/13804], Loss: 2.8912, Perplexity: 18.0157, time_taken_in_seconds: 8
Epoch [1/1], Step [4411/13804], Loss: 2.6284, Perplexity: 13.8519, time_taken_in_seconds: 9
Epoch [1/1], Step [4412/13804], Loss: 2.8209, Perplexity: 16.7915, time_taken_in_seconds: 9
Epoch [1/1], Step [4413/13804], Loss: 2.6621, Perplexity: 14.3257, time_taken_in_seconds: 10
Epoch [1/1], Step [4414/13804], Loss: 2.8903, Perplexity: 17.9995, time_taken_in_seconds: 11
Epoch [1/1], Step [4415/13804], Loss: 2.7474, Perplexity: 15.6025, time_taken_in_seconds: 12
Epoch [1/1], Step [4416/13804], Loss: 2.6768, Perplexity: 14.5390, time_taken_in_seconds: 13
Epoch [1/1], Step [4417/13804], Loss: 2.5562, Perplexity: 12.8864, time_taken_in_seconds: 14
Epoch [1/1], Step [4418/13804], Loss: 2.6681, Perplexity: 14.4125, time_taken_in_seconds: 15
Epoch [1/1], Step [4419/13804], Loss: 2.5941, Perplexity: 13.3849, time_taken_in_seconds: 15
Epoch [1/1], Step [4420/13804], Loss: 2.9509, Perplexity: 19.1230, time_taken_in_seconds: 16
Epoch [1/1], Step [4421/13804], Loss: 2.5766, Perplexity: 13.1517, time_taken_in_seconds: 17
Epoch [1/1], Step [4422/13804], Loss: 2.7161, Perplexity: 15.1215, time_taken_in_seconds: 18
Epoch [1/1], Step [4423/13804], Loss: 2.6806, Perplexity: 14.5937, time_taken_in_seconds: 19
Epoch [1/1], Step [4424/13804], Loss: 2.4036, Perplexity: 11.0632, time_taken_in_seconds: 20
Epoch [1/1], Step [4425/13804], Loss: 2.7522, Perplexity: 15.6775, time_taken_in_seconds: 20
Epoch [1/1], Step [4426/13804], Loss: 2.5120, Perplexity: 12.3293, time_taken_in_seconds: 21
Epoch [1/1], Step [4427/13804], Loss: 2.7678, Perplexity: 15.9240, time_taken_in_seconds: 22
Epoch [1/1], Step [4428/13804], Loss: 2.6836, Perplexity: 14.6376, time_taken_in_seconds: 23
Epoch [1/1], Step [4429/13804], Loss: 2.8364, Perplexity: 17.0548, time_taken_in_seconds: 24
Epoch [1/1], Step [4430/13804], Loss: 2.5464, Perplexity: 12.7612, time_taken_in_seconds: 25
Epoch [1/1], Step [4431/13804], Loss: 3.0355, Perplexity: 20.8108, time_taken_in_seconds: 25
Epoch [1/1], Step [4432/13804], Loss: 2.6390, Perplexity: 13.9995, time_taken_in_seconds: 26
Epoch [1/1], Step [4433/13804], Loss: 2.7035, Perplexity: 14.9316, time_taken_in_seconds: 27
Epoch [1/1], Step [4434/13804], Loss: 2.3274, Perplexity: 10.2508, time_taken_in_seconds: 28
Epoch [1/1], Step [4435/13804], Loss: 2.8495, Perplexity: 17.2798, time_taken_in_seconds: 29
Epoch [1/1], Step [4436/13804], Loss: 2.5795, Perplexity: 13.1905, time_taken_in_seconds: 30
Epoch [1/1], Step [4437/13804], Loss: 2.7307, Perplexity: 15.3443, time_taken_in_seconds: 30
Epoch [1/1], Step [4438/13804], Loss: 2.6377, Perplexity: 13.9815, time_taken_in_seconds: 31
Epoch [1/1], Step [4439/13804], Loss: 2.7449, Perplexity: 15.5628, time_taken_in_seconds: 32
Epoch [1/1], Step [4440/13804], Loss: 2.8006, Perplexity: 16.4543, time_taken_in_seconds: 33
Epoch [1/1], Step [4441/13804], Loss: 2.6238, Perplexity: 13.7875, time_taken_in_seconds: 34
Epoch [1/1], Step [4442/13804], Loss: 3.0523, Perplexity: 21.1650, time_taken_in_seconds: 35
Epoch [1/1], Step [4443/13804], Loss: 2.6238, Perplexity: 13.7874, time_taken_in_seconds: 35
Epoch [1/1], Step [4444/13804], Loss: 2.8709, Perplexity: 17.6529, time_taken_in_seconds: 36
Epoch [1/1], Step [4445/13804], Loss: 3.0591, Perplexity: 21.3080, time_taken_in_seconds: 37
Epoch [1/1], Step [4446/13804], Loss: 2.8352, Perplexity: 17.0344, time_taken_in_seconds: 38
Epoch [1/1], Step [4447/13804], Loss: 3.1711, Perplexity: 23.8335, time_taken_in_seconds: 39
Epoch [1/1], Step [4448/13804], Loss: 2.8816, Perplexity: 17.8419, time_taken_in_seconds: 39
Epoch [1/1], Step [4449/13804], Loss: 2.5642, Perplexity: 12.9908, time_taken_in_seconds: 40
Epoch [1/1], Step [4450/13804], Loss: 2.7500, Perplexity: 15.6431, time_taken_in_seconds: 41
Epoch [1/1], Step [4451/13804], Loss: 2.9833, Perplexity: 19.7525, time_taken_in_seconds: 42
Epoch [1/1], Step [4452/13804], Loss: 2.4675, Perplexity: 11.7927, time_taken_in_seconds: 43
Epoch [1/1], Step [4453/13804], Loss: 2.7097, Perplexity: 15.0248, time_taken_in_seconds: 44
Epoch [1/1], Step [4454/13804], Loss: 2.6728, Perplexity: 14.4810, time_taken_in_seconds: 44
Epoch [1/1], Step [4455/13804], Loss: 3.1141, Perplexity: 22.5141, time_taken_in_seconds: 45
Epoch [1/1], Step [4456/13804], Loss: 3.4891, Perplexity: 32.7563, time_taken_in_seconds: 46
Epoch [1/1], Step [4457/13804], Loss: 2.8134, Perplexity: 16.6663, time_taken_in_seconds: 47
Epoch [1/1], Step [4458/13804], Loss: 3.0542, Perplexity: 21.2032, time_taken_in_seconds: 48
Epoch [1/1], Step [4459/13804], Loss: 2.6138, Perplexity: 13.6505, time_taken_in_seconds: 49
Epoch [1/1], Step [4460/13804], Loss: 2.2547, Perplexity: 9.5320, time_taken_in_seconds: 49
Epoch [1/1], Step [4461/13804], Loss: 2.4377, Perplexity: 11.4472, time_taken_in_seconds: 50
Epoch [1/1], Step [4462/13804], Loss: 2.7192, Perplexity: 15.1689, time_taken_in_seconds: 51
Epoch [1/1], Step [4463/13804], Loss: 2.8675, Perplexity: 17.5928, time_taken_in_seconds: 52
Epoch [1/1], Step [4464/13804], Loss: 2.8429, Perplexity: 17.1651, time_taken_in_seconds: 53
Epoch [1/1], Step [4465/13804], Loss: 2.8150, Perplexity: 16.6930, time_taken_in_seconds: 54
Epoch [1/1], Step [4466/13804], Loss: 2.7464, Perplexity: 15.5859, time_taken_in_seconds: 54
Epoch [1/1], Step [4467/13804], Loss: 2.7266, Perplexity: 15.2802, time_taken_in_seconds: 55
Epoch [1/1], Step [4468/13804], Loss: 2.5807, Perplexity: 13.2061, time_taken_in_seconds: 56
Epoch [1/1], Step [4469/13804], Loss: 2.7679, Perplexity: 15.9252, time_taken_in_seconds: 57
Epoch [1/1], Step [4470/13804], Loss: 2.9497, Perplexity: 19.1008, time_taken_in_seconds: 58
Epoch [1/1], Step [4471/13804], Loss: 2.7158, Perplexity: 15.1167, time_taken_in_seconds: 59
Epoch [1/1], Step [4472/13804], Loss: 2.9436, Perplexity: 18.9836, time_taken_in_seconds: 60
Epoch [1/1], Step [4473/13804], Loss: 2.9769, Perplexity: 19.6264, time_taken_in_seconds: 60
Epoch [1/1], Step [4474/13804], Loss: 2.6750, Perplexity: 14.5129, time_taken_in_seconds: 61
Epoch [1/1], Step [4475/13804], Loss: 2.8178, Perplexity: 16.7403, time_taken_in_seconds: 62
Epoch [1/1], Step [4476/13804], Loss: 2.9542, Perplexity: 19.1865, time_taken_in_seconds: 63
Epoch [1/1], Step [4477/13804], Loss: 2.4706, Perplexity: 11.8291, time_taken_in_seconds: 64
Epoch [1/1], Step [4478/13804], Loss: 2.7156, Perplexity: 15.1136, time_taken_in_seconds: 65
Epoch [1/1], Step [4479/13804], Loss: 2.6919, Perplexity: 14.7599, time_taken_in_seconds: 65
Epoch [1/1], Step [4480/13804], Loss: 2.7974, Perplexity: 16.4018, time_taken_in_seconds: 66
Epoch [1/1], Step [4481/13804], Loss: 2.9518, Perplexity: 19.1403, time_taken_in_seconds: 67
Epoch [1/1], Step [4482/13804], Loss: 2.5908, Perplexity: 13.3410, time_taken_in_seconds: 68
Epoch [1/1], Step [4483/13804], Loss: 2.8252, Perplexity: 16.8649, time_taken_in_seconds: 69
Epoch [1/1], Step [4484/13804], Loss: 2.8379, Perplexity: 17.0792, time_taken_in_seconds: 70
Epoch [1/1], Step [4485/13804], Loss: 3.2899, Perplexity: 26.8390, time_taken_in_seconds: 70
Epoch [1/1], Step [4486/13804], Loss: 2.7599, Perplexity: 15.7984, time_taken_in_seconds: 71
Epoch [1/1], Step [4487/13804], Loss: 2.6636, Perplexity: 14.3473, time_taken_in_seconds: 72
Epoch [1/1], Step [4488/13804], Loss: 2.4902, Perplexity: 12.0643, time_taken_in_seconds: 73
Epoch [1/1], Step [4489/13804], Loss: 2.4621, Perplexity: 11.7300, time_taken_in_seconds: 74
Epoch [1/1], Step [4490/13804], Loss: 2.5412, Perplexity: 12.6955, time_taken_in_seconds: 75
Epoch [1/1], Step [4491/13804], Loss: 2.7797, Perplexity: 16.1144, time_taken_in_seconds: 76
Epoch [1/1], Step [4492/13804], Loss: 2.8963, Perplexity: 18.1067, time_taken_in_seconds: 76
Epoch [1/1], Step [4493/13804], Loss: 3.4765, Perplexity: 32.3464, time_taken_in_seconds: 77
Epoch [1/1], Step [4494/13804], Loss: 2.8110, Perplexity: 16.6266, time_taken_in_seconds: 78
Epoch [1/1], Step [4495/13804], Loss: 3.5333, Perplexity: 34.2375, time_taken_in_seconds: 79
Epoch [1/1], Step [4496/13804], Loss: 3.3319, Perplexity: 27.9912, time_taken_in_seconds: 80
Epoch [1/1], Step [4497/13804], Loss: 2.7368, Perplexity: 15.4371, time_taken_in_seconds: 81
Epoch [1/1], Step [4498/13804], Loss: 2.9233, Perplexity: 18.6020, time_taken_in_seconds: 81
Epoch [1/1], Step [4499/13804], Loss: 2.3162, Perplexity: 10.1374, time_taken_in_seconds: 82
Epoch [1/1], Step [4500/13804], Loss: 2.8238, Perplexity: 16.8408, time_taken_in_seconds: 83
Epoch [1/1], Step [4501/13804], Loss: 2.6757, Perplexity: 14.5230, time_taken_in_seconds: 0
Epoch [1/1], Step [4502/13804], Loss: 2.8244, Perplexity: 16.8511, time_taken_in_seconds: 1
Epoch [1/1], Step [4503/13804], Loss: 3.0408, Perplexity: 20.9223, time_taken_in_seconds: 2
Epoch [1/1], Step [4504/13804], Loss: 2.5142, Perplexity: 12.3562, time_taken_in_seconds: 3
Epoch [1/1], Step [4505/13804], Loss: 2.9642, Perplexity: 19.3784, time_taken_in_seconds: 4
Epoch [1/1], Step [4506/13804], Loss: 2.7615, Perplexity: 15.8241, time_taken_in_seconds: 5
Epoch [1/1], Step [4507/13804], Loss: 2.9683, Perplexity: 19.4591, time_taken_in_seconds: 5
Epoch [1/1], Step [4508/13804], Loss: 2.7811, Perplexity: 16.1369, time_taken_in_seconds: 6
Epoch [1/1], Step [4509/13804], Loss: 2.6434, Perplexity: 14.0613, time_taken_in_seconds: 7
Epoch [1/1], Step [4510/13804], Loss: 2.7257, Perplexity: 15.2673, time_taken_in_seconds: 8
Epoch [1/1], Step [4511/13804], Loss: 2.7621, Perplexity: 15.8323, time_taken_in_seconds: 9
Epoch [1/1], Step [4512/13804], Loss: 2.8740, Perplexity: 17.7072, time_taken_in_seconds: 10
Epoch [1/1], Step [4513/13804], Loss: 2.5310, Perplexity: 12.5660, time_taken_in_seconds: 10
Epoch [1/1], Step [4514/13804], Loss: 2.9434, Perplexity: 18.9794, time_taken_in_seconds: 11
Epoch [1/1], Step [4515/13804], Loss: 2.6638, Perplexity: 14.3501, time_taken_in_seconds: 12
Epoch [1/1], Step [4516/13804], Loss: 2.5040, Perplexity: 12.2308, time_taken_in_seconds: 13
Epoch [1/1], Step [4517/13804], Loss: 2.9897, Perplexity: 19.8793, time_taken_in_seconds: 14
Epoch [1/1], Step [4518/13804], Loss: 2.8876, Perplexity: 17.9509, time_taken_in_seconds: 15
Epoch [1/1], Step [4519/13804], Loss: 2.3535, Perplexity: 10.5224, time_taken_in_seconds: 15
Epoch [1/1], Step [4520/13804], Loss: 2.6928, Perplexity: 14.7724, time_taken_in_seconds: 16
Epoch [1/1], Step [4521/13804], Loss: 2.7650, Perplexity: 15.8790, time_taken_in_seconds: 17
Epoch [1/1], Step [4522/13804], Loss: 2.6676, Perplexity: 14.4058, time_taken_in_seconds: 18
Epoch [1/1], Step [4523/13804], Loss: 2.8790, Perplexity: 17.7957, time_taken_in_seconds: 19
Epoch [1/1], Step [4524/13804], Loss: 2.6005, Perplexity: 13.4700, time_taken_in_seconds: 20
Epoch [1/1], Step [4525/13804], Loss: 2.4492, Perplexity: 11.5785, time_taken_in_seconds: 20
Epoch [1/1], Step [4526/13804], Loss: 2.7629, Perplexity: 15.8455, time_taken_in_seconds: 21
Epoch [1/1], Step [4527/13804], Loss: 2.3850, Perplexity: 10.8586, time_taken_in_seconds: 22
Epoch [1/1], Step [4528/13804], Loss: 3.2424, Perplexity: 25.5941, time_taken_in_seconds: 23
Epoch [1/1], Step [4529/13804], Loss: 2.9474, Perplexity: 19.0559, time_taken_in_seconds: 24
Epoch [1/1], Step [4530/13804], Loss: 2.8040, Perplexity: 16.5106, time_taken_in_seconds: 24
Epoch [1/1], Step [4531/13804], Loss: 2.6073, Perplexity: 13.5621, time_taken_in_seconds: 25
Epoch [1/1], Step [4532/13804], Loss: 2.8516, Perplexity: 17.3161, time_taken_in_seconds: 26
Epoch [1/1], Step [4533/13804], Loss: 2.5525, Perplexity: 12.8391, time_taken_in_seconds: 27
Epoch [1/1], Step [4534/13804], Loss: 2.6509, Perplexity: 14.1661, time_taken_in_seconds: 28
Epoch [1/1], Step [4535/13804], Loss: 2.8723, Perplexity: 17.6780, time_taken_in_seconds: 29
Epoch [1/1], Step [4536/13804], Loss: 2.2978, Perplexity: 9.9519, time_taken_in_seconds: 29
Epoch [1/1], Step [4537/13804], Loss: 2.6980, Perplexity: 14.8495, time_taken_in_seconds: 30
Epoch [1/1], Step [4538/13804], Loss: 2.9424, Perplexity: 18.9614, time_taken_in_seconds: 31
Epoch [1/1], Step [4539/13804], Loss: 2.7664, Perplexity: 15.9015, time_taken_in_seconds: 32
Epoch [1/1], Step [4540/13804], Loss: 2.5727, Perplexity: 13.1014, time_taken_in_seconds: 33
Epoch [1/1], Step [4541/13804], Loss: 2.7985, Perplexity: 16.4204, time_taken_in_seconds: 33
Epoch [1/1], Step [4542/13804], Loss: 2.5872, Perplexity: 13.2931, time_taken_in_seconds: 34
Epoch [1/1], Step [4543/13804], Loss: 2.3789, Perplexity: 10.7930, time_taken_in_seconds: 35
Epoch [1/1], Step [4544/13804], Loss: 2.8318, Perplexity: 16.9762, time_taken_in_seconds: 36
Epoch [1/1], Step [4545/13804], Loss: 3.2639, Perplexity: 26.1502, time_taken_in_seconds: 37
Epoch [1/1], Step [4546/13804], Loss: 2.7734, Perplexity: 16.0131, time_taken_in_seconds: 38
Epoch [1/1], Step [4547/13804], Loss: 2.5609, Perplexity: 12.9470, time_taken_in_seconds: 38
Epoch [1/1], Step [4548/13804], Loss: 3.0085, Perplexity: 20.2570, time_taken_in_seconds: 39
Epoch [1/1], Step [4549/13804], Loss: 3.1390, Perplexity: 23.0799, time_taken_in_seconds: 40
Epoch [1/1], Step [4550/13804], Loss: 2.8555, Perplexity: 17.3828, time_taken_in_seconds: 41
Epoch [1/1], Step [4551/13804], Loss: 2.9486, Perplexity: 19.0796, time_taken_in_seconds: 42
Epoch [1/1], Step [4552/13804], Loss: 2.8985, Perplexity: 18.1465, time_taken_in_seconds: 43
Epoch [1/1], Step [4553/13804], Loss: 3.1109, Perplexity: 22.4407, time_taken_in_seconds: 43
Epoch [1/1], Step [4554/13804], Loss: 2.6770, Perplexity: 14.5420, time_taken_in_seconds: 44
Epoch [1/1], Step [4555/13804], Loss: 2.6357, Perplexity: 13.9535, time_taken_in_seconds: 45
Epoch [1/1], Step [4556/13804], Loss: 2.7547, Perplexity: 15.7170, time_taken_in_seconds: 46
Epoch [1/1], Step [4557/13804], Loss: 2.6701, Perplexity: 14.4407, time_taken_in_seconds: 47
Epoch [1/1], Step [4558/13804], Loss: 2.7489, Perplexity: 15.6255, time_taken_in_seconds: 48
Epoch [1/1], Step [4559/13804], Loss: 2.7914, Perplexity: 16.3040, time_taken_in_seconds: 48
Epoch [1/1], Step [4560/13804], Loss: 2.8296, Perplexity: 16.9380, time_taken_in_seconds: 49
Epoch [1/1], Step [4561/13804], Loss: 2.7581, Perplexity: 15.7691, time_taken_in_seconds: 50
Epoch [1/1], Step [4562/13804], Loss: 2.7657, Perplexity: 15.8906, time_taken_in_seconds: 51
Epoch [1/1], Step [4563/13804], Loss: 2.6912, Perplexity: 14.7490, time_taken_in_seconds: 52
Epoch [1/1], Step [4564/13804], Loss: 3.0408, Perplexity: 20.9226, time_taken_in_seconds: 53
Epoch [1/1], Step [4565/13804], Loss: 2.9286, Perplexity: 18.7011, time_taken_in_seconds: 54
Epoch [1/1], Step [4566/13804], Loss: 2.8217, Perplexity: 16.8058, time_taken_in_seconds: 54
Epoch [1/1], Step [4567/13804], Loss: 2.6972, Perplexity: 14.8380, time_taken_in_seconds: 55
Epoch [1/1], Step [4568/13804], Loss: 2.6992, Perplexity: 14.8685, time_taken_in_seconds: 56
Epoch [1/1], Step [4569/13804], Loss: 2.4244, Perplexity: 11.2953, time_taken_in_seconds: 57
Epoch [1/1], Step [4570/13804], Loss: 2.6973, Perplexity: 14.8390, time_taken_in_seconds: 58
Epoch [1/1], Step [4571/13804], Loss: 2.5132, Perplexity: 12.3448, time_taken_in_seconds: 59
Epoch [1/1], Step [4572/13804], Loss: 2.6784, Perplexity: 14.5612, time_taken_in_seconds: 59
Epoch [1/1], Step [4573/13804], Loss: 2.7685, Perplexity: 15.9352, time_taken_in_seconds: 60
Epoch [1/1], Step [4574/13804], Loss: 2.7013, Perplexity: 14.8986, time_taken_in_seconds: 61
Epoch [1/1], Step [4575/13804], Loss: 2.9168, Perplexity: 18.4814, time_taken_in_seconds: 62
Epoch [1/1], Step [4576/13804], Loss: 3.1247, Perplexity: 22.7531, time_taken_in_seconds: 63
Epoch [1/1], Step [4577/13804], Loss: 2.4429, Perplexity: 11.5066, time_taken_in_seconds: 64
Epoch [1/1], Step [4578/13804], Loss: 2.5767, Perplexity: 13.1536, time_taken_in_seconds: 64
Epoch [1/1], Step [4579/13804], Loss: 2.7936, Perplexity: 16.3394, time_taken_in_seconds: 65
Epoch [1/1], Step [4580/13804], Loss: 2.6093, Perplexity: 13.5895, time_taken_in_seconds: 66
Epoch [1/1], Step [4581/13804], Loss: 2.7687, Perplexity: 15.9382, time_taken_in_seconds: 67
Epoch [1/1], Step [4582/13804], Loss: 2.7371, Perplexity: 15.4415, time_taken_in_seconds: 68
Epoch [1/1], Step [4583/13804], Loss: 2.9820, Perplexity: 19.7263, time_taken_in_seconds: 68
Epoch [1/1], Step [4584/13804], Loss: 2.5560, Perplexity: 12.8845, time_taken_in_seconds: 69
Epoch [1/1], Step [4585/13804], Loss: 2.6532, Perplexity: 14.1991, time_taken_in_seconds: 70
Epoch [1/1], Step [4586/13804], Loss: 2.4725, Perplexity: 11.8519, time_taken_in_seconds: 71
Epoch [1/1], Step [4587/13804], Loss: 2.3644, Perplexity: 10.6375, time_taken_in_seconds: 72
Epoch [1/1], Step [4588/13804], Loss: 2.6529, Perplexity: 14.1956, time_taken_in_seconds: 73
Epoch [1/1], Step [4589/13804], Loss: 2.6738, Perplexity: 14.4957, time_taken_in_seconds: 73
Epoch [1/1], Step [4590/13804], Loss: 2.4388, Perplexity: 11.4592, time_taken_in_seconds: 74
Epoch [1/1], Step [4591/13804], Loss: 2.9116, Perplexity: 18.3866, time_taken_in_seconds: 75
Epoch [1/1], Step [4592/13804], Loss: 2.6932, Perplexity: 14.7789, time_taken_in_seconds: 76
Epoch [1/1], Step [4593/13804], Loss: 3.0818, Perplexity: 21.7974, time_taken_in_seconds: 77
Epoch [1/1], Step [4594/13804], Loss: 2.5141, Perplexity: 12.3560, time_taken_in_seconds: 77
Epoch [1/1], Step [4595/13804], Loss: 2.8547, Perplexity: 17.3700, time_taken_in_seconds: 78
Epoch [1/1], Step [4596/13804], Loss: 3.2191, Perplexity: 25.0059, time_taken_in_seconds: 79
Epoch [1/1], Step [4597/13804], Loss: 2.6959, Perplexity: 14.8189, time_taken_in_seconds: 80
Epoch [1/1], Step [4598/13804], Loss: 3.0062, Perplexity: 20.2109, time_taken_in_seconds: 81
Epoch [1/1], Step [4599/13804], Loss: 2.7591, Perplexity: 15.7859, time_taken_in_seconds: 82
Epoch [1/1], Step [4600/13804], Loss: 3.0456, Perplexity: 21.0223, time_taken_in_seconds: 82
Epoch [1/1], Step [4601/13804], Loss: 2.9723, Perplexity: 19.5363, time_taken_in_seconds: 0
Epoch [1/1], Step [4602/13804], Loss: 3.1969, Perplexity: 24.4572, time_taken_in_seconds: 1
Epoch [1/1], Step [4603/13804], Loss: 2.7925, Perplexity: 16.3225, time_taken_in_seconds: 2
Epoch [1/1], Step [4604/13804], Loss: 2.7548, Perplexity: 15.7185, time_taken_in_seconds: 3
Epoch [1/1], Step [4605/13804], Loss: 3.1443, Perplexity: 23.2035, time_taken_in_seconds: 4
Epoch [1/1], Step [4606/13804], Loss: 2.9281, Perplexity: 18.6917, time_taken_in_seconds: 4
Epoch [1/1], Step [4607/13804], Loss: 2.5820, Perplexity: 13.2241, time_taken_in_seconds: 5
Epoch [1/1], Step [4608/13804], Loss: 3.2522, Perplexity: 25.8474, time_taken_in_seconds: 6
Epoch [1/1], Step [4609/13804], Loss: 2.6116, Perplexity: 13.6207, time_taken_in_seconds: 7
Epoch [1/1], Step [4610/13804], Loss: 2.5689, Perplexity: 13.0510, time_taken_in_seconds: 8
Epoch [1/1], Step [4611/13804], Loss: 2.6632, Perplexity: 14.3416, time_taken_in_seconds: 9
Epoch [1/1], Step [4612/13804], Loss: 2.7833, Perplexity: 16.1724, time_taken_in_seconds: 9
Epoch [1/1], Step [4613/13804], Loss: 2.7019, Perplexity: 14.9082, time_taken_in_seconds: 10
Epoch [1/1], Step [4614/13804], Loss: 2.3942, Perplexity: 10.9598, time_taken_in_seconds: 11
Epoch [1/1], Step [4615/13804], Loss: 2.7948, Perplexity: 16.3599, time_taken_in_seconds: 12
Epoch [1/1], Step [4616/13804], Loss: 3.9616, Perplexity: 52.5395, time_taken_in_seconds: 13
Epoch [1/1], Step [4617/13804], Loss: 3.2091, Perplexity: 24.7569, time_taken_in_seconds: 14
Epoch [1/1], Step [4618/13804], Loss: 2.7845, Perplexity: 16.1920, time_taken_in_seconds: 14
Epoch [1/1], Step [4619/13804], Loss: 2.5598, Perplexity: 12.9326, time_taken_in_seconds: 15
Epoch [1/1], Step [4620/13804], Loss: 2.5813, Perplexity: 13.2145, time_taken_in_seconds: 16
Epoch [1/1], Step [4621/13804], Loss: 2.5077, Perplexity: 12.2770, time_taken_in_seconds: 17
Epoch [1/1], Step [4622/13804], Loss: 2.6826, Perplexity: 14.6234, time_taken_in_seconds: 18
Epoch [1/1], Step [4623/13804], Loss: 2.5356, Perplexity: 12.6241, time_taken_in_seconds: 19
Epoch [1/1], Step [4624/13804], Loss: 2.4680, Perplexity: 11.7993, time_taken_in_seconds: 19
Epoch [1/1], Step [4625/13804], Loss: 3.9908, Perplexity: 54.0961, time_taken_in_seconds: 20
Epoch [1/1], Step [4626/13804], Loss: 2.7708, Perplexity: 15.9721, time_taken_in_seconds: 21
Epoch [1/1], Step [4627/13804], Loss: 2.7734, Perplexity: 16.0128, time_taken_in_seconds: 22
Epoch [1/1], Step [4628/13804], Loss: 2.7421, Perplexity: 15.5196, time_taken_in_seconds: 23
Epoch [1/1], Step [4629/13804], Loss: 2.6369, Perplexity: 13.9699, time_taken_in_seconds: 23
Epoch [1/1], Step [4630/13804], Loss: 2.7041, Perplexity: 14.9405, time_taken_in_seconds: 24
Epoch [1/1], Step [4631/13804], Loss: 2.5286, Perplexity: 12.5363, time_taken_in_seconds: 25
Epoch [1/1], Step [4632/13804], Loss: 3.3029, Perplexity: 27.1920, time_taken_in_seconds: 26
Epoch [1/1], Step [4633/13804], Loss: 3.1558, Perplexity: 23.4724, time_taken_in_seconds: 27
Epoch [1/1], Step [4634/13804], Loss: 3.1143, Perplexity: 22.5171, time_taken_in_seconds: 28
Epoch [1/1], Step [4635/13804], Loss: 2.7974, Perplexity: 16.4018, time_taken_in_seconds: 29
Epoch [1/1], Step [4636/13804], Loss: 2.9173, Perplexity: 18.4919, time_taken_in_seconds: 29
Epoch [1/1], Step [4637/13804], Loss: 2.4905, Perplexity: 12.0678, time_taken_in_seconds: 30
Epoch [1/1], Step [4638/13804], Loss: 2.6757, Perplexity: 14.5222, time_taken_in_seconds: 31
Epoch [1/1], Step [4639/13804], Loss: 2.7437, Perplexity: 15.5439, time_taken_in_seconds: 32
Epoch [1/1], Step [4640/13804], Loss: 3.5254, Perplexity: 33.9664, time_taken_in_seconds: 33
Epoch [1/1], Step [4641/13804], Loss: 2.7697, Perplexity: 15.9531, time_taken_in_seconds: 33
Epoch [1/1], Step [4642/13804], Loss: 2.7708, Perplexity: 15.9716, time_taken_in_seconds: 34
Epoch [1/1], Step [4643/13804], Loss: 2.3127, Perplexity: 10.1017, time_taken_in_seconds: 35
Epoch [1/1], Step [4644/13804], Loss: 2.8087, Perplexity: 16.5885, time_taken_in_seconds: 36
Epoch [1/1], Step [4645/13804], Loss: 2.6007, Perplexity: 13.4734, time_taken_in_seconds: 37
Epoch [1/1], Step [4646/13804], Loss: 2.9261, Perplexity: 18.6542, time_taken_in_seconds: 38
Epoch [1/1], Step [4647/13804], Loss: 2.7472, Perplexity: 15.5984, time_taken_in_seconds: 38
Epoch [1/1], Step [4648/13804], Loss: 2.8803, Perplexity: 17.8198, time_taken_in_seconds: 39
Epoch [1/1], Step [4649/13804], Loss: 2.7358, Perplexity: 15.4221, time_taken_in_seconds: 40
Epoch [1/1], Step [4650/13804], Loss: 2.7185, Perplexity: 15.1569, time_taken_in_seconds: 41
Epoch [1/1], Step [4651/13804], Loss: 2.4278, Perplexity: 11.3339, time_taken_in_seconds: 42
Epoch [1/1], Step [4652/13804], Loss: 2.5572, Perplexity: 12.9002, time_taken_in_seconds: 43
Epoch [1/1], Step [4653/13804], Loss: 2.6132, Perplexity: 13.6426, time_taken_in_seconds: 43
Epoch [1/1], Step [4654/13804], Loss: 2.8506, Perplexity: 17.2985, time_taken_in_seconds: 44
Epoch [1/1], Step [4655/13804], Loss: 2.9519, Perplexity: 19.1427, time_taken_in_seconds: 45
Epoch [1/1], Step [4656/13804], Loss: 2.6030, Perplexity: 13.5038, time_taken_in_seconds: 46
Epoch [1/1], Step [4657/13804], Loss: 2.5126, Perplexity: 12.3375, time_taken_in_seconds: 47
Epoch [1/1], Step [4658/13804], Loss: 2.7684, Perplexity: 15.9337, time_taken_in_seconds: 47
Epoch [1/1], Step [4659/13804], Loss: 2.5600, Perplexity: 12.9354, time_taken_in_seconds: 48
Epoch [1/1], Step [4660/13804], Loss: 2.5768, Perplexity: 13.1544, time_taken_in_seconds: 49
Epoch [1/1], Step [4661/13804], Loss: 2.8791, Perplexity: 17.7981, time_taken_in_seconds: 50
Epoch [1/1], Step [4662/13804], Loss: 2.8398, Perplexity: 17.1130, time_taken_in_seconds: 51
Epoch [1/1], Step [4663/13804], Loss: 3.0427, Perplexity: 20.9623, time_taken_in_seconds: 52
Epoch [1/1], Step [4664/13804], Loss: 2.5451, Perplexity: 12.7446, time_taken_in_seconds: 52
Epoch [1/1], Step [4665/13804], Loss: 2.5282, Perplexity: 12.5305, time_taken_in_seconds: 53
Epoch [1/1], Step [4666/13804], Loss: 3.0443, Perplexity: 20.9943, time_taken_in_seconds: 54
Epoch [1/1], Step [4667/13804], Loss: 3.1796, Perplexity: 24.0360, time_taken_in_seconds: 55
Epoch [1/1], Step [4668/13804], Loss: 2.7923, Perplexity: 16.3181, time_taken_in_seconds: 56
Epoch [1/1], Step [4669/13804], Loss: 3.0408, Perplexity: 20.9228, time_taken_in_seconds: 56
Epoch [1/1], Step [4670/13804], Loss: 2.7381, Perplexity: 15.4583, time_taken_in_seconds: 57
Epoch [1/1], Step [4671/13804], Loss: 2.9303, Perplexity: 18.7337, time_taken_in_seconds: 58
Epoch [1/1], Step [4672/13804], Loss: 2.4024, Perplexity: 11.0499, time_taken_in_seconds: 59
Epoch [1/1], Step [4673/13804], Loss: 2.4547, Perplexity: 11.6432, time_taken_in_seconds: 60
Epoch [1/1], Step [4674/13804], Loss: 2.5395, Perplexity: 12.6728, time_taken_in_seconds: 61
Epoch [1/1], Step [4675/13804], Loss: 2.8286, Perplexity: 16.9220, time_taken_in_seconds: 61
Epoch [1/1], Step [4676/13804], Loss: 2.5547, Perplexity: 12.8673, time_taken_in_seconds: 62
Epoch [1/1], Step [4677/13804], Loss: 2.9793, Perplexity: 19.6748, time_taken_in_seconds: 63
Epoch [1/1], Step [4678/13804], Loss: 2.7253, Perplexity: 15.2609, time_taken_in_seconds: 64
Epoch [1/1], Step [4679/13804], Loss: 2.6400, Perplexity: 14.0134, time_taken_in_seconds: 65
Epoch [1/1], Step [4680/13804], Loss: 2.8126, Perplexity: 16.6526, time_taken_in_seconds: 66
Epoch [1/1], Step [4681/13804], Loss: 2.5837, Perplexity: 13.2460, time_taken_in_seconds: 66
Epoch [1/1], Step [4682/13804], Loss: 2.3414, Perplexity: 10.3959, time_taken_in_seconds: 67
Epoch [1/1], Step [4683/13804], Loss: 2.6305, Perplexity: 13.8809, time_taken_in_seconds: 68
Epoch [1/1], Step [4684/13804], Loss: 2.8890, Perplexity: 17.9760, time_taken_in_seconds: 69
Epoch [1/1], Step [4685/13804], Loss: 2.5239, Perplexity: 12.4774, time_taken_in_seconds: 70
Epoch [1/1], Step [4686/13804], Loss: 3.5462, Perplexity: 34.6826, time_taken_in_seconds: 71
Epoch [1/1], Step [4687/13804], Loss: 2.4982, Perplexity: 12.1609, time_taken_in_seconds: 71
Epoch [1/1], Step [4688/13804], Loss: 2.7189, Perplexity: 15.1642, time_taken_in_seconds: 72
Epoch [1/1], Step [4689/13804], Loss: 2.5736, Perplexity: 13.1135, time_taken_in_seconds: 73
Epoch [1/1], Step [4690/13804], Loss: 2.4019, Perplexity: 11.0436, time_taken_in_seconds: 74
Epoch [1/1], Step [4691/13804], Loss: 2.5712, Perplexity: 13.0821, time_taken_in_seconds: 75
Epoch [1/1], Step [4692/13804], Loss: 3.1059, Perplexity: 22.3300, time_taken_in_seconds: 76
Epoch [1/1], Step [4693/13804], Loss: 2.7036, Perplexity: 14.9332, time_taken_in_seconds: 76
Epoch [1/1], Step [4694/13804], Loss: 2.4259, Perplexity: 11.3127, time_taken_in_seconds: 77
Epoch [1/1], Step [4695/13804], Loss: 2.7231, Perplexity: 15.2268, time_taken_in_seconds: 78
Epoch [1/1], Step [4696/13804], Loss: 2.6103, Perplexity: 13.6036, time_taken_in_seconds: 79
Epoch [1/1], Step [4697/13804], Loss: 2.4964, Perplexity: 12.1385, time_taken_in_seconds: 80
Epoch [1/1], Step [4698/13804], Loss: 2.6912, Perplexity: 14.7492, time_taken_in_seconds: 80
Epoch [1/1], Step [4699/13804], Loss: 2.5173, Perplexity: 12.3954, time_taken_in_seconds: 81
Epoch [1/1], Step [4700/13804], Loss: 2.5709, Perplexity: 13.0779, time_taken_in_seconds: 82
Epoch [1/1], Step [4701/13804], Loss: 2.7194, Perplexity: 15.1718, time_taken_in_seconds: 0
Epoch [1/1], Step [4702/13804], Loss: 2.8040, Perplexity: 16.5110, time_taken_in_seconds: 1
Epoch [1/1], Step [4703/13804], Loss: 2.5572, Perplexity: 12.9001, time_taken_in_seconds: 2
Epoch [1/1], Step [4704/13804], Loss: 2.4852, Perplexity: 12.0031, time_taken_in_seconds: 3
Epoch [1/1], Step [4705/13804], Loss: 2.7585, Perplexity: 15.7764, time_taken_in_seconds: 4
Epoch [1/1], Step [4706/13804], Loss: 3.2204, Perplexity: 25.0391, time_taken_in_seconds: 5
Epoch [1/1], Step [4707/13804], Loss: 2.6483, Perplexity: 14.1299, time_taken_in_seconds: 6
Epoch [1/1], Step [4708/13804], Loss: 2.7023, Perplexity: 14.9140, time_taken_in_seconds: 6
Epoch [1/1], Step [4709/13804], Loss: 3.1031, Perplexity: 22.2677, time_taken_in_seconds: 7
Epoch [1/1], Step [4710/13804], Loss: 2.8602, Perplexity: 17.4650, time_taken_in_seconds: 8
Epoch [1/1], Step [4711/13804], Loss: 2.7187, Perplexity: 15.1600, time_taken_in_seconds: 9
Epoch [1/1], Step [4712/13804], Loss: 2.8066, Perplexity: 16.5541, time_taken_in_seconds: 10
Epoch [1/1], Step [4713/13804], Loss: 2.2836, Perplexity: 9.8122, time_taken_in_seconds: 11
Epoch [1/1], Step [4714/13804], Loss: 3.2444, Perplexity: 25.6469, time_taken_in_seconds: 11
Epoch [1/1], Step [4715/13804], Loss: 2.5678, Perplexity: 13.0376, time_taken_in_seconds: 12
Epoch [1/1], Step [4716/13804], Loss: 2.5432, Perplexity: 12.7205, time_taken_in_seconds: 13
Epoch [1/1], Step [4717/13804], Loss: 3.4430, Perplexity: 31.2805, time_taken_in_seconds: 14
Epoch [1/1], Step [4718/13804], Loss: 3.2443, Perplexity: 25.6431, time_taken_in_seconds: 15
Epoch [1/1], Step [4719/13804], Loss: 2.6636, Perplexity: 14.3480, time_taken_in_seconds: 16
Epoch [1/1], Step [4720/13804], Loss: 2.8485, Perplexity: 17.2621, time_taken_in_seconds: 16
Epoch [1/1], Step [4721/13804], Loss: 2.8153, Perplexity: 16.6980, time_taken_in_seconds: 17
Epoch [1/1], Step [4722/13804], Loss: 2.6363, Perplexity: 13.9608, time_taken_in_seconds: 18
Epoch [1/1], Step [4723/13804], Loss: 2.6843, Perplexity: 14.6478, time_taken_in_seconds: 19
Epoch [1/1], Step [4724/13804], Loss: 3.0887, Perplexity: 21.9484, time_taken_in_seconds: 20
Epoch [1/1], Step [4725/13804], Loss: 2.6686, Perplexity: 14.4193, time_taken_in_seconds: 20
Epoch [1/1], Step [4726/13804], Loss: 2.6977, Perplexity: 14.8458, time_taken_in_seconds: 21
Epoch [1/1], Step [4727/13804], Loss: 2.4582, Perplexity: 11.6836, time_taken_in_seconds: 22
Epoch [1/1], Step [4728/13804], Loss: 2.3223, Perplexity: 10.1992, time_taken_in_seconds: 23
Epoch [1/1], Step [4729/13804], Loss: 2.5405, Perplexity: 12.6864, time_taken_in_seconds: 24
Epoch [1/1], Step [4730/13804], Loss: 2.6866, Perplexity: 14.6823, time_taken_in_seconds: 25
Epoch [1/1], Step [4731/13804], Loss: 2.8762, Perplexity: 17.7463, time_taken_in_seconds: 25
Epoch [1/1], Step [4732/13804], Loss: 2.7244, Perplexity: 15.2478, time_taken_in_seconds: 26
Epoch [1/1], Step [4733/13804], Loss: 2.9906, Perplexity: 19.8970, time_taken_in_seconds: 27
Epoch [1/1], Step [4734/13804], Loss: 2.9452, Perplexity: 19.0138, time_taken_in_seconds: 28
Epoch [1/1], Step [4735/13804], Loss: 2.7126, Perplexity: 15.0688, time_taken_in_seconds: 29
Epoch [1/1], Step [4736/13804], Loss: 2.9148, Perplexity: 18.4448, time_taken_in_seconds: 30
Epoch [1/1], Step [4737/13804], Loss: 2.8269, Perplexity: 16.8930, time_taken_in_seconds: 30
Epoch [1/1], Step [4738/13804], Loss: 2.9204, Perplexity: 18.5483, time_taken_in_seconds: 31
Epoch [1/1], Step [4739/13804], Loss: 2.6727, Perplexity: 14.4792, time_taken_in_seconds: 32
Epoch [1/1], Step [4740/13804], Loss: 2.5371, Perplexity: 12.6424, time_taken_in_seconds: 33
Epoch [1/1], Step [4741/13804], Loss: 2.5943, Perplexity: 13.3877, time_taken_in_seconds: 34
Epoch [1/1], Step [4742/13804], Loss: 2.6471, Perplexity: 14.1130, time_taken_in_seconds: 35
Epoch [1/1], Step [4743/13804], Loss: 2.6179, Perplexity: 13.7063, time_taken_in_seconds: 35
Epoch [1/1], Step [4744/13804], Loss: 2.5378, Perplexity: 12.6518, time_taken_in_seconds: 36
Epoch [1/1], Step [4745/13804], Loss: 2.6222, Perplexity: 13.7659, time_taken_in_seconds: 37
Epoch [1/1], Step [4746/13804], Loss: 2.7680, Perplexity: 15.9275, time_taken_in_seconds: 38
Epoch [1/1], Step [4747/13804], Loss: 2.7434, Perplexity: 15.5397, time_taken_in_seconds: 39
Epoch [1/1], Step [4748/13804], Loss: 2.5860, Perplexity: 13.2769, time_taken_in_seconds: 40
Epoch [1/1], Step [4749/13804], Loss: 2.7750, Perplexity: 16.0390, time_taken_in_seconds: 40
Epoch [1/1], Step [4750/13804], Loss: 2.6738, Perplexity: 14.4945, time_taken_in_seconds: 41
Epoch [1/1], Step [4751/13804], Loss: 2.7193, Perplexity: 15.1697, time_taken_in_seconds: 42
Epoch [1/1], Step [4752/13804], Loss: 2.8940, Perplexity: 18.0652, time_taken_in_seconds: 43
Epoch [1/1], Step [4753/13804], Loss: 2.4816, Perplexity: 11.9602, time_taken_in_seconds: 44
Epoch [1/1], Step [4754/13804], Loss: 2.8600, Perplexity: 17.4614, time_taken_in_seconds: 44
Epoch [1/1], Step [4755/13804], Loss: 2.4101, Perplexity: 11.1349, time_taken_in_seconds: 45
Epoch [1/1], Step [4756/13804], Loss: 3.0526, Perplexity: 21.1706, time_taken_in_seconds: 46
Epoch [1/1], Step [4757/13804], Loss: 3.7587, Perplexity: 42.8944, time_taken_in_seconds: 47
Epoch [1/1], Step [4758/13804], Loss: 2.9094, Perplexity: 18.3453, time_taken_in_seconds: 48
Epoch [1/1], Step [4759/13804], Loss: 2.6571, Perplexity: 14.2552, time_taken_in_seconds: 49
Epoch [1/1], Step [4760/13804], Loss: 2.4180, Perplexity: 11.2235, time_taken_in_seconds: 49
Epoch [1/1], Step [4761/13804], Loss: 2.9104, Perplexity: 18.3647, time_taken_in_seconds: 50
Epoch [1/1], Step [4762/13804], Loss: 2.6826, Perplexity: 14.6224, time_taken_in_seconds: 51
Epoch [1/1], Step [4763/13804], Loss: 2.5461, Perplexity: 12.7577, time_taken_in_seconds: 52
Epoch [1/1], Step [4764/13804], Loss: 2.5605, Perplexity: 12.9420, time_taken_in_seconds: 53
Epoch [1/1], Step [4765/13804], Loss: 2.6538, Perplexity: 14.2084, time_taken_in_seconds: 54
Epoch [1/1], Step [4766/13804], Loss: 2.6014, Perplexity: 13.4830, time_taken_in_seconds: 54
Epoch [1/1], Step [4767/13804], Loss: 2.9600, Perplexity: 19.2971, time_taken_in_seconds: 55
Epoch [1/1], Step [4768/13804], Loss: 2.4163, Perplexity: 11.2043, time_taken_in_seconds: 56
Epoch [1/1], Step [4769/13804], Loss: 2.6864, Perplexity: 14.6792, time_taken_in_seconds: 57
Epoch [1/1], Step [4770/13804], Loss: 2.5243, Perplexity: 12.4828, time_taken_in_seconds: 58
Epoch [1/1], Step [4771/13804], Loss: 2.5533, Perplexity: 12.8490, time_taken_in_seconds: 58
Epoch [1/1], Step [4772/13804], Loss: 2.6395, Perplexity: 14.0068, time_taken_in_seconds: 59
Epoch [1/1], Step [4773/13804], Loss: 3.1039, Perplexity: 22.2856, time_taken_in_seconds: 60
Epoch [1/1], Step [4774/13804], Loss: 2.8417, Perplexity: 17.1455, time_taken_in_seconds: 61
Epoch [1/1], Step [4775/13804], Loss: 2.5342, Perplexity: 12.6067, time_taken_in_seconds: 62
Epoch [1/1], Step [4776/13804], Loss: 2.5442, Perplexity: 12.7332, time_taken_in_seconds: 63
Epoch [1/1], Step [4777/13804], Loss: 2.7905, Perplexity: 16.2894, time_taken_in_seconds: 63
Epoch [1/1], Step [4778/13804], Loss: 2.5848, Perplexity: 13.2610, time_taken_in_seconds: 64
Epoch [1/1], Step [4779/13804], Loss: 2.5071, Perplexity: 12.2692, time_taken_in_seconds: 65
Epoch [1/1], Step [4780/13804], Loss: 2.8807, Perplexity: 17.8274, time_taken_in_seconds: 66
Epoch [1/1], Step [4781/13804], Loss: 2.6866, Perplexity: 14.6822, time_taken_in_seconds: 67
Epoch [1/1], Step [4782/13804], Loss: 2.4925, Perplexity: 12.0916, time_taken_in_seconds: 68
Epoch [1/1], Step [4783/13804], Loss: 2.7017, Perplexity: 14.9049, time_taken_in_seconds: 69
Epoch [1/1], Step [4784/13804], Loss: 2.6779, Perplexity: 14.5550, time_taken_in_seconds: 69
Epoch [1/1], Step [4785/13804], Loss: 2.7015, Perplexity: 14.9025, time_taken_in_seconds: 70
Epoch [1/1], Step [4786/13804], Loss: 2.8244, Perplexity: 16.8508, time_taken_in_seconds: 71
Epoch [1/1], Step [4787/13804], Loss: 2.5282, Perplexity: 12.5310, time_taken_in_seconds: 72
Epoch [1/1], Step [4788/13804], Loss: 2.9993, Perplexity: 20.0712, time_taken_in_seconds: 73
Epoch [1/1], Step [4789/13804], Loss: 2.6756, Perplexity: 14.5217, time_taken_in_seconds: 74
Epoch [1/1], Step [4790/13804], Loss: 3.1236, Perplexity: 22.7282, time_taken_in_seconds: 74
Epoch [1/1], Step [4791/13804], Loss: 2.6677, Perplexity: 14.4062, time_taken_in_seconds: 75
Epoch [1/1], Step [4792/13804], Loss: 2.3513, Perplexity: 10.4993, time_taken_in_seconds: 76
Epoch [1/1], Step [4793/13804], Loss: 2.9180, Perplexity: 18.5050, time_taken_in_seconds: 77
Epoch [1/1], Step [4794/13804], Loss: 2.7781, Perplexity: 16.0886, time_taken_in_seconds: 78
Epoch [1/1], Step [4795/13804], Loss: 2.5685, Perplexity: 13.0457, time_taken_in_seconds: 78
Epoch [1/1], Step [4796/13804], Loss: 2.5642, Perplexity: 12.9904, time_taken_in_seconds: 79
Epoch [1/1], Step [4797/13804], Loss: 2.8300, Perplexity: 16.9452, time_taken_in_seconds: 80
Epoch [1/1], Step [4798/13804], Loss: 2.4542, Perplexity: 11.6369, time_taken_in_seconds: 81
Epoch [1/1], Step [4799/13804], Loss: 2.8432, Perplexity: 17.1702, time_taken_in_seconds: 82
Epoch [1/1], Step [4800/13804], Loss: 2.7605, Perplexity: 15.8080, time_taken_in_seconds: 83
Epoch [1/1], Step [4801/13804], Loss: 2.3325, Perplexity: 10.3039, time_taken_in_seconds: 0
Epoch [1/1], Step [4802/13804], Loss: 2.8799, Perplexity: 17.8121, time_taken_in_seconds: 1
Epoch [1/1], Step [4803/13804], Loss: 3.5995, Perplexity: 36.5799, time_taken_in_seconds: 2
Epoch [1/1], Step [4804/13804], Loss: 3.1065, Perplexity: 22.3436, time_taken_in_seconds: 3
Epoch [1/1], Step [4805/13804], Loss: 2.3947, Perplexity: 10.9654, time_taken_in_seconds: 4
Epoch [1/1], Step [4806/13804], Loss: 2.4082, Perplexity: 11.1145, time_taken_in_seconds: 4
Epoch [1/1], Step [4807/13804], Loss: 2.6888, Perplexity: 14.7147, time_taken_in_seconds: 5
Epoch [1/1], Step [4808/13804], Loss: 2.6188, Perplexity: 13.7196, time_taken_in_seconds: 6
Epoch [1/1], Step [4809/13804], Loss: 2.6379, Perplexity: 13.9832, time_taken_in_seconds: 7
Epoch [1/1], Step [4810/13804], Loss: 2.7531, Perplexity: 15.6907, time_taken_in_seconds: 8
Epoch [1/1], Step [4811/13804], Loss: 2.6223, Perplexity: 13.7668, time_taken_in_seconds: 9
Epoch [1/1], Step [4812/13804], Loss: 2.6272, Perplexity: 13.8345, time_taken_in_seconds: 9
Epoch [1/1], Step [4813/13804], Loss: 2.5836, Perplexity: 13.2454, time_taken_in_seconds: 10
Epoch [1/1], Step [4814/13804], Loss: 2.4838, Perplexity: 11.9867, time_taken_in_seconds: 11
Epoch [1/1], Step [4815/13804], Loss: 2.7874, Perplexity: 16.2390, time_taken_in_seconds: 12
Epoch [1/1], Step [4816/13804], Loss: 2.9772, Perplexity: 19.6326, time_taken_in_seconds: 13
Epoch [1/1], Step [4817/13804], Loss: 2.6304, Perplexity: 13.8796, time_taken_in_seconds: 14
Epoch [1/1], Step [4818/13804], Loss: 2.5484, Perplexity: 12.7865, time_taken_in_seconds: 14
Epoch [1/1], Step [4819/13804], Loss: 2.8340, Perplexity: 17.0133, time_taken_in_seconds: 15
Epoch [1/1], Step [4820/13804], Loss: 2.6148, Perplexity: 13.6651, time_taken_in_seconds: 16
Epoch [1/1], Step [4821/13804], Loss: 2.3388, Perplexity: 10.3687, time_taken_in_seconds: 17
Epoch [1/1], Step [4822/13804], Loss: 2.3656, Perplexity: 10.6506, time_taken_in_seconds: 18
Epoch [1/1], Step [4823/13804], Loss: 2.3463, Perplexity: 10.4473, time_taken_in_seconds: 18
Epoch [1/1], Step [4824/13804], Loss: 2.9386, Perplexity: 18.8903, time_taken_in_seconds: 19
Epoch [1/1], Step [4825/13804], Loss: 2.7005, Perplexity: 14.8878, time_taken_in_seconds: 20
Epoch [1/1], Step [4826/13804], Loss: 2.6559, Perplexity: 14.2380, time_taken_in_seconds: 21
Epoch [1/1], Step [4827/13804], Loss: 2.5465, Perplexity: 12.7624, time_taken_in_seconds: 22
Epoch [1/1], Step [4828/13804], Loss: 2.5732, Perplexity: 13.1080, time_taken_in_seconds: 23
Epoch [1/1], Step [4829/13804], Loss: 3.0063, Perplexity: 20.2122, time_taken_in_seconds: 23
Epoch [1/1], Step [4830/13804], Loss: 2.5797, Perplexity: 13.1933, time_taken_in_seconds: 24
Epoch [1/1], Step [4831/13804], Loss: 2.3309, Perplexity: 10.2869, time_taken_in_seconds: 25
Epoch [1/1], Step [4832/13804], Loss: 2.6568, Perplexity: 14.2510, time_taken_in_seconds: 26
Epoch [1/1], Step [4833/13804], Loss: 2.6015, Perplexity: 13.4834, time_taken_in_seconds: 27
Epoch [1/1], Step [4834/13804], Loss: 2.3484, Perplexity: 10.4689, time_taken_in_seconds: 27
Epoch [1/1], Step [4835/13804], Loss: 3.0931, Perplexity: 22.0461, time_taken_in_seconds: 28
Epoch [1/1], Step [4836/13804], Loss: 3.0284, Perplexity: 20.6635, time_taken_in_seconds: 29
Epoch [1/1], Step [4837/13804], Loss: 2.7110, Perplexity: 15.0437, time_taken_in_seconds: 30
Epoch [1/1], Step [4838/13804], Loss: 2.3880, Perplexity: 10.8921, time_taken_in_seconds: 31
Epoch [1/1], Step [4839/13804], Loss: 2.8593, Perplexity: 17.4492, time_taken_in_seconds: 32
Epoch [1/1], Step [4840/13804], Loss: 2.5198, Perplexity: 12.4263, time_taken_in_seconds: 32
Epoch [1/1], Step [4841/13804], Loss: 2.4656, Perplexity: 11.7710, time_taken_in_seconds: 33
Epoch [1/1], Step [4842/13804], Loss: 3.0611, Perplexity: 21.3517, time_taken_in_seconds: 34
Epoch [1/1], Step [4843/13804], Loss: 3.0439, Perplexity: 20.9860, time_taken_in_seconds: 35
Epoch [1/1], Step [4844/13804], Loss: 2.6702, Perplexity: 14.4431, time_taken_in_seconds: 36
Epoch [1/1], Step [4845/13804], Loss: 2.8260, Perplexity: 16.8780, time_taken_in_seconds: 36
Epoch [1/1], Step [4846/13804], Loss: 2.6619, Perplexity: 14.3236, time_taken_in_seconds: 37
Epoch [1/1], Step [4847/13804], Loss: 2.5203, Perplexity: 12.4319, time_taken_in_seconds: 38
Epoch [1/1], Step [4848/13804], Loss: 3.0569, Perplexity: 21.2621, time_taken_in_seconds: 39
Epoch [1/1], Step [4849/13804], Loss: 2.6318, Perplexity: 13.8983, time_taken_in_seconds: 40
Epoch [1/1], Step [4850/13804], Loss: 2.8952, Perplexity: 18.0880, time_taken_in_seconds: 41
Epoch [1/1], Step [4851/13804], Loss: 2.7903, Perplexity: 16.2865, time_taken_in_seconds: 41
Epoch [1/1], Step [4852/13804], Loss: 2.9544, Perplexity: 19.1906, time_taken_in_seconds: 42
Epoch [1/1], Step [4853/13804], Loss: 2.8267, Perplexity: 16.8898, time_taken_in_seconds: 43
Epoch [1/1], Step [4854/13804], Loss: 2.4731, Perplexity: 11.8589, time_taken_in_seconds: 44
Epoch [1/1], Step [4855/13804], Loss: 2.7922, Perplexity: 16.3174, time_taken_in_seconds: 45
Epoch [1/1], Step [4856/13804], Loss: 2.5968, Perplexity: 13.4206, time_taken_in_seconds: 46
Epoch [1/1], Step [4857/13804], Loss: 2.8936, Perplexity: 18.0573, time_taken_in_seconds: 47
Epoch [1/1], Step [4858/13804], Loss: 2.6053, Perplexity: 13.5350, time_taken_in_seconds: 47
Epoch [1/1], Step [4859/13804], Loss: 2.5049, Perplexity: 12.2428, time_taken_in_seconds: 48
Epoch [1/1], Step [4860/13804], Loss: 2.7356, Perplexity: 15.4183, time_taken_in_seconds: 49
Epoch [1/1], Step [4861/13804], Loss: 2.7599, Perplexity: 15.7984, time_taken_in_seconds: 50
Epoch [1/1], Step [4862/13804], Loss: 2.8605, Perplexity: 17.4705, time_taken_in_seconds: 51
Epoch [1/1], Step [4863/13804], Loss: 3.0727, Perplexity: 21.6001, time_taken_in_seconds: 52
Epoch [1/1], Step [4864/13804], Loss: 2.6157, Perplexity: 13.6762, time_taken_in_seconds: 52
Epoch [1/1], Step [4865/13804], Loss: 2.7864, Perplexity: 16.2229, time_taken_in_seconds: 53
Epoch [1/1], Step [4866/13804], Loss: 2.7093, Perplexity: 15.0186, time_taken_in_seconds: 54
Epoch [1/1], Step [4867/13804], Loss: 2.7589, Perplexity: 15.7824, time_taken_in_seconds: 55
Epoch [1/1], Step [4868/13804], Loss: 2.6821, Perplexity: 14.6152, time_taken_in_seconds: 56
Epoch [1/1], Step [4869/13804], Loss: 3.5861, Perplexity: 36.0947, time_taken_in_seconds: 56
Epoch [1/1], Step [4870/13804], Loss: 2.7271, Perplexity: 15.2883, time_taken_in_seconds: 57
Epoch [1/1], Step [4871/13804], Loss: 2.6899, Perplexity: 14.7298, time_taken_in_seconds: 58
Epoch [1/1], Step [4872/13804], Loss: 2.5127, Perplexity: 12.3384, time_taken_in_seconds: 59
Epoch [1/1], Step [4873/13804], Loss: 2.8606, Perplexity: 17.4722, time_taken_in_seconds: 60
Epoch [1/1], Step [4874/13804], Loss: 2.8156, Perplexity: 16.7027, time_taken_in_seconds: 61
Epoch [1/1], Step [4875/13804], Loss: 2.5409, Perplexity: 12.6916, time_taken_in_seconds: 61
Epoch [1/1], Step [4876/13804], Loss: 2.5885, Perplexity: 13.3100, time_taken_in_seconds: 62
Epoch [1/1], Step [4877/13804], Loss: 2.8898, Perplexity: 17.9892, time_taken_in_seconds: 63
Epoch [1/1], Step [4878/13804], Loss: 2.6006, Perplexity: 13.4718, time_taken_in_seconds: 64
Epoch [1/1], Step [4879/13804], Loss: 2.9231, Perplexity: 18.5994, time_taken_in_seconds: 65
Epoch [1/1], Step [4880/13804], Loss: 2.5906, Perplexity: 13.3371, time_taken_in_seconds: 66
Epoch [1/1], Step [4881/13804], Loss: 2.6659, Perplexity: 14.3802, time_taken_in_seconds: 66
Epoch [1/1], Step [4882/13804], Loss: 3.0668, Perplexity: 21.4732, time_taken_in_seconds: 67
Epoch [1/1], Step [4883/13804], Loss: 2.6026, Perplexity: 13.4991, time_taken_in_seconds: 68
Epoch [1/1], Step [4884/13804], Loss: 2.7465, Perplexity: 15.5873, time_taken_in_seconds: 69
Epoch [1/1], Step [4885/13804], Loss: 3.2236, Perplexity: 25.1175, time_taken_in_seconds: 70
Epoch [1/1], Step [4886/13804], Loss: 2.6350, Perplexity: 13.9439, time_taken_in_seconds: 71
Epoch [1/1], Step [4887/13804], Loss: 2.6355, Perplexity: 13.9501, time_taken_in_seconds: 71
Epoch [1/1], Step [4888/13804], Loss: 2.4964, Perplexity: 12.1386, time_taken_in_seconds: 72
Epoch [1/1], Step [4889/13804], Loss: 2.7613, Perplexity: 15.8197, time_taken_in_seconds: 73
Epoch [1/1], Step [4890/13804], Loss: 2.4571, Perplexity: 11.6711, time_taken_in_seconds: 74
Epoch [1/1], Step [4891/13804], Loss: 2.7259, Perplexity: 15.2709, time_taken_in_seconds: 75
Epoch [1/1], Step [4892/13804], Loss: 2.3177, Perplexity: 10.1524, time_taken_in_seconds: 76
Epoch [1/1], Step [4893/13804], Loss: 2.6893, Perplexity: 14.7216, time_taken_in_seconds: 76
Epoch [1/1], Step [4894/13804], Loss: 3.6336, Perplexity: 37.8503, time_taken_in_seconds: 77
Epoch [1/1], Step [4895/13804], Loss: 2.3101, Perplexity: 10.0758, time_taken_in_seconds: 78
Epoch [1/1], Step [4896/13804], Loss: 2.7427, Perplexity: 15.5294, time_taken_in_seconds: 79
Epoch [1/1], Step [4897/13804], Loss: 2.6132, Perplexity: 13.6421, time_taken_in_seconds: 80
Epoch [1/1], Step [4898/13804], Loss: 2.4992, Perplexity: 12.1729, time_taken_in_seconds: 81
Epoch [1/1], Step [4899/13804], Loss: 2.3684, Perplexity: 10.6801, time_taken_in_seconds: 81
Epoch [1/1], Step [4900/13804], Loss: 2.4663, Perplexity: 11.7791, time_taken_in_seconds: 82
Epoch [1/1], Step [4901/13804], Loss: 2.8469, Perplexity: 17.2350, time_taken_in_seconds: 0
Epoch [1/1], Step [4902/13804], Loss: 2.7849, Perplexity: 16.1989, time_taken_in_seconds: 1
Epoch [1/1], Step [4903/13804], Loss: 3.3090, Perplexity: 27.3580, time_taken_in_seconds: 2
Epoch [1/1], Step [4904/13804], Loss: 2.7567, Perplexity: 15.7484, time_taken_in_seconds: 3
Epoch [1/1], Step [4905/13804], Loss: 2.7222, Perplexity: 15.2141, time_taken_in_seconds: 4
Epoch [1/1], Step [4906/13804], Loss: 2.7485, Perplexity: 15.6196, time_taken_in_seconds: 4
Epoch [1/1], Step [4907/13804], Loss: 2.9526, Perplexity: 19.1559, time_taken_in_seconds: 5
Epoch [1/1], Step [4908/13804], Loss: 2.6959, Perplexity: 14.8186, time_taken_in_seconds: 6
Epoch [1/1], Step [4909/13804], Loss: 2.5623, Perplexity: 12.9658, time_taken_in_seconds: 7
Epoch [1/1], Step [4910/13804], Loss: 2.8997, Perplexity: 18.1681, time_taken_in_seconds: 8
Epoch [1/1], Step [4911/13804], Loss: 2.9517, Perplexity: 19.1392, time_taken_in_seconds: 9
Epoch [1/1], Step [4912/13804], Loss: 2.3668, Perplexity: 10.6629, time_taken_in_seconds: 9
Epoch [1/1], Step [4913/13804], Loss: 2.9537, Perplexity: 19.1771, time_taken_in_seconds: 10
Epoch [1/1], Step [4914/13804], Loss: 2.3398, Perplexity: 10.3787, time_taken_in_seconds: 11
Epoch [1/1], Step [4915/13804], Loss: 2.7659, Perplexity: 15.8929, time_taken_in_seconds: 12
Epoch [1/1], Step [4916/13804], Loss: 2.6201, Perplexity: 13.7366, time_taken_in_seconds: 13
Epoch [1/1], Step [4917/13804], Loss: 2.6200, Perplexity: 13.7361, time_taken_in_seconds: 13
Epoch [1/1], Step [4918/13804], Loss: 2.7419, Perplexity: 15.5170, time_taken_in_seconds: 14
Epoch [1/1], Step [4919/13804], Loss: 2.8843, Perplexity: 17.8918, time_taken_in_seconds: 15
Epoch [1/1], Step [4920/13804], Loss: 2.9580, Perplexity: 19.2600, time_taken_in_seconds: 16
Epoch [1/1], Step [4921/13804], Loss: 3.0117, Perplexity: 20.3213, time_taken_in_seconds: 17
Epoch [1/1], Step [4922/13804], Loss: 2.6647, Perplexity: 14.3641, time_taken_in_seconds: 18
Epoch [1/1], Step [4923/13804], Loss: 2.7920, Perplexity: 16.3138, time_taken_in_seconds: 18
Epoch [1/1], Step [4924/13804], Loss: 2.8049, Perplexity: 16.5249, time_taken_in_seconds: 19
Epoch [1/1], Step [4925/13804], Loss: 2.5159, Perplexity: 12.3783, time_taken_in_seconds: 20
Epoch [1/1], Step [4926/13804], Loss: 2.8596, Perplexity: 17.4545, time_taken_in_seconds: 21
Epoch [1/1], Step [4927/13804], Loss: 2.4293, Perplexity: 11.3508, time_taken_in_seconds: 22
Epoch [1/1], Step [4928/13804], Loss: 2.7458, Perplexity: 15.5778, time_taken_in_seconds: 23
Epoch [1/1], Step [4929/13804], Loss: 3.2256, Perplexity: 25.1686, time_taken_in_seconds: 24
Epoch [1/1], Step [4930/13804], Loss: 2.3896, Perplexity: 10.9086, time_taken_in_seconds: 24
Epoch [1/1], Step [4931/13804], Loss: 2.6367, Perplexity: 13.9672, time_taken_in_seconds: 25
Epoch [1/1], Step [4932/13804], Loss: 2.8101, Perplexity: 16.6117, time_taken_in_seconds: 26
Epoch [1/1], Step [4933/13804], Loss: 2.9586, Perplexity: 19.2706, time_taken_in_seconds: 27
Epoch [1/1], Step [4934/13804], Loss: 3.0281, Perplexity: 20.6581, time_taken_in_seconds: 28
Epoch [1/1], Step [4935/13804], Loss: 2.7848, Perplexity: 16.1961, time_taken_in_seconds: 29
Epoch [1/1], Step [4936/13804], Loss: 2.6878, Perplexity: 14.6989, time_taken_in_seconds: 29
Epoch [1/1], Step [4937/13804], Loss: 2.8523, Perplexity: 17.3283, time_taken_in_seconds: 30
Epoch [1/1], Step [4938/13804], Loss: 2.4062, Perplexity: 11.0916, time_taken_in_seconds: 31
Epoch [1/1], Step [4939/13804], Loss: 2.4996, Perplexity: 12.1775, time_taken_in_seconds: 32
Epoch [1/1], Step [4940/13804], Loss: 2.6935, Perplexity: 14.7829, time_taken_in_seconds: 33
Epoch [1/1], Step [4941/13804], Loss: 2.4804, Perplexity: 11.9455, time_taken_in_seconds: 33
Epoch [1/1], Step [4942/13804], Loss: 2.7943, Perplexity: 16.3518, time_taken_in_seconds: 34
Epoch [1/1], Step [4943/13804], Loss: 2.9419, Perplexity: 18.9524, time_taken_in_seconds: 35
Epoch [1/1], Step [4944/13804], Loss: 3.6387, Perplexity: 38.0437, time_taken_in_seconds: 36
Epoch [1/1], Step [4945/13804], Loss: 2.3994, Perplexity: 11.0169, time_taken_in_seconds: 37
Epoch [1/1], Step [4946/13804], Loss: 2.6979, Perplexity: 14.8485, time_taken_in_seconds: 38
Epoch [1/1], Step [4947/13804], Loss: 2.7268, Perplexity: 15.2841, time_taken_in_seconds: 38
Epoch [1/1], Step [4948/13804], Loss: 2.9323, Perplexity: 18.7714, time_taken_in_seconds: 39
Epoch [1/1], Step [4949/13804], Loss: 2.6250, Perplexity: 13.8042, time_taken_in_seconds: 40
Epoch [1/1], Step [4950/13804], Loss: 2.5173, Perplexity: 12.3947, time_taken_in_seconds: 41
Epoch [1/1], Step [4951/13804], Loss: 4.1024, Perplexity: 60.4845, time_taken_in_seconds: 42
Epoch [1/1], Step [4952/13804], Loss: 2.7400, Perplexity: 15.4876, time_taken_in_seconds: 43
Epoch [1/1], Step [4953/13804], Loss: 2.7227, Perplexity: 15.2211, time_taken_in_seconds: 43
Epoch [1/1], Step [4954/13804], Loss: 2.7564, Perplexity: 15.7432, time_taken_in_seconds: 44
Epoch [1/1], Step [4955/13804], Loss: 2.9098, Perplexity: 18.3535, time_taken_in_seconds: 45
Epoch [1/1], Step [4956/13804], Loss: 3.5537, Perplexity: 34.9424, time_taken_in_seconds: 46
Epoch [1/1], Step [4957/13804], Loss: 2.8933, Perplexity: 18.0530, time_taken_in_seconds: 47
Epoch [1/1], Step [4958/13804], Loss: 2.6823, Perplexity: 14.6189, time_taken_in_seconds: 48
Epoch [1/1], Step [4959/13804], Loss: 2.9761, Perplexity: 19.6106, time_taken_in_seconds: 48
Epoch [1/1], Step [4960/13804], Loss: 2.7419, Perplexity: 15.5158, time_taken_in_seconds: 49
Epoch [1/1], Step [4961/13804], Loss: 2.7735, Perplexity: 16.0147, time_taken_in_seconds: 50
Epoch [1/1], Step [4962/13804], Loss: 2.5975, Perplexity: 13.4299, time_taken_in_seconds: 51
Epoch [1/1], Step [4963/13804], Loss: 2.3727, Perplexity: 10.7268, time_taken_in_seconds: 52
Epoch [1/1], Step [4964/13804], Loss: 2.7497, Perplexity: 15.6375, time_taken_in_seconds: 53
Epoch [1/1], Step [4965/13804], Loss: 2.6434, Perplexity: 14.0612, time_taken_in_seconds: 53
Epoch [1/1], Step [4966/13804], Loss: 2.8537, Perplexity: 17.3517, time_taken_in_seconds: 54
Epoch [1/1], Step [4967/13804], Loss: 2.4100, Perplexity: 11.1336, time_taken_in_seconds: 55
Epoch [1/1], Step [4968/13804], Loss: 2.3809, Perplexity: 10.8143, time_taken_in_seconds: 56
Epoch [1/1], Step [4969/13804], Loss: 3.2682, Perplexity: 26.2637, time_taken_in_seconds: 57
Epoch [1/1], Step [4970/13804], Loss: 2.7824, Perplexity: 16.1573, time_taken_in_seconds: 58
Epoch [1/1], Step [4971/13804], Loss: 2.7199, Perplexity: 15.1781, time_taken_in_seconds: 58
Epoch [1/1], Step [4972/13804], Loss: 2.7033, Perplexity: 14.9282, time_taken_in_seconds: 59
Epoch [1/1], Step [4973/13804], Loss: 2.9544, Perplexity: 19.1894, time_taken_in_seconds: 60
Epoch [1/1], Step [4974/13804], Loss: 3.6097, Perplexity: 36.9553, time_taken_in_seconds: 61
Epoch [1/1], Step [4975/13804], Loss: 2.4721, Perplexity: 11.8469, time_taken_in_seconds: 62
Epoch [1/1], Step [4976/13804], Loss: 2.6652, Perplexity: 14.3713, time_taken_in_seconds: 63
Epoch [1/1], Step [4977/13804], Loss: 2.6810, Perplexity: 14.5993, time_taken_in_seconds: 63
Epoch [1/1], Step [4978/13804], Loss: 2.8733, Perplexity: 17.6945, time_taken_in_seconds: 64
Epoch [1/1], Step [4979/13804], Loss: 2.8534, Perplexity: 17.3475, time_taken_in_seconds: 65
Epoch [1/1], Step [4980/13804], Loss: 2.7354, Perplexity: 15.4163, time_taken_in_seconds: 66
Epoch [1/1], Step [4981/13804], Loss: 2.4338, Perplexity: 11.4018, time_taken_in_seconds: 67
Epoch [1/1], Step [4982/13804], Loss: 2.7175, Perplexity: 15.1424, time_taken_in_seconds: 68
Epoch [1/1], Step [4983/13804], Loss: 2.7729, Perplexity: 16.0046, time_taken_in_seconds: 68
Epoch [1/1], Step [4984/13804], Loss: 2.5591, Perplexity: 12.9245, time_taken_in_seconds: 69
Epoch [1/1], Step [4985/13804], Loss: 2.5482, Perplexity: 12.7842, time_taken_in_seconds: 70
Epoch [1/1], Step [4986/13804], Loss: 2.6386, Perplexity: 13.9934, time_taken_in_seconds: 71
Epoch [1/1], Step [4987/13804], Loss: 3.1590, Perplexity: 23.5480, time_taken_in_seconds: 72
Epoch [1/1], Step [4988/13804], Loss: 3.1520, Perplexity: 23.3816, time_taken_in_seconds: 73
Epoch [1/1], Step [4989/13804], Loss: 2.8925, Perplexity: 18.0382, time_taken_in_seconds: 73
Epoch [1/1], Step [4990/13804], Loss: 2.6405, Perplexity: 14.0198, time_taken_in_seconds: 74
Epoch [1/1], Step [4991/13804], Loss: 2.7719, Perplexity: 15.9897, time_taken_in_seconds: 75
Epoch [1/1], Step [4992/13804], Loss: 2.5200, Perplexity: 12.4281, time_taken_in_seconds: 76
Epoch [1/1], Step [4993/13804], Loss: 2.7582, Perplexity: 15.7719, time_taken_in_seconds: 77
Epoch [1/1], Step [4994/13804], Loss: 2.7328, Perplexity: 15.3753, time_taken_in_seconds: 77
Epoch [1/1], Step [4995/13804], Loss: 2.7046, Perplexity: 14.9483, time_taken_in_seconds: 78
Epoch [1/1], Step [4996/13804], Loss: 2.5321, Perplexity: 12.5795, time_taken_in_seconds: 79
Epoch [1/1], Step [4997/13804], Loss: 2.4390, Perplexity: 11.4611, time_taken_in_seconds: 80
Epoch [1/1], Step [4998/13804], Loss: 3.6969, Perplexity: 40.3216, time_taken_in_seconds: 81
Epoch [1/1], Step [4999/13804], Loss: 2.4215, Perplexity: 11.2626, time_taken_in_seconds: 82
Epoch [1/1], Step [5000/13804], Loss: 2.6850, Perplexity: 14.6584, time_taken_in_seconds: 83
Epoch [1/1], Step [5001/13804], Loss: 2.5017, Perplexity: 12.2034, time_taken_in_seconds: 0
Epoch [1/1], Step [5002/13804], Loss: 2.6752, Perplexity: 14.5145, time_taken_in_seconds: 1
Epoch [1/1], Step [5003/13804], Loss: 2.8974, Perplexity: 18.1268, time_taken_in_seconds: 2
Epoch [1/1], Step [5004/13804], Loss: 2.4640, Perplexity: 11.7514, time_taken_in_seconds: 3
Epoch [1/1], Step [5005/13804], Loss: 2.9310, Perplexity: 18.7458, time_taken_in_seconds: 4
Epoch [1/1], Step [5006/13804], Loss: 2.6342, Perplexity: 13.9323, time_taken_in_seconds: 4
Epoch [1/1], Step [5007/13804], Loss: 2.7860, Perplexity: 16.2154, time_taken_in_seconds: 5
Epoch [1/1], Step [5008/13804], Loss: 2.6135, Perplexity: 13.6471, time_taken_in_seconds: 6
Epoch [1/1], Step [5009/13804], Loss: 2.7694, Perplexity: 15.9497, time_taken_in_seconds: 7
Epoch [1/1], Step [5010/13804], Loss: 2.8044, Perplexity: 16.5174, time_taken_in_seconds: 8
Epoch [1/1], Step [5011/13804], Loss: 2.7423, Perplexity: 15.5222, time_taken_in_seconds: 9
Epoch [1/1], Step [5012/13804], Loss: 3.0195, Perplexity: 20.4817, time_taken_in_seconds: 9
Epoch [1/1], Step [5013/13804], Loss: 2.8764, Perplexity: 17.7501, time_taken_in_seconds: 10
Epoch [1/1], Step [5014/13804], Loss: 2.6332, Perplexity: 13.9178, time_taken_in_seconds: 11
Epoch [1/1], Step [5015/13804], Loss: 2.7317, Perplexity: 15.3595, time_taken_in_seconds: 12
Epoch [1/1], Step [5016/13804], Loss: 2.7601, Perplexity: 15.8019, time_taken_in_seconds: 13
Epoch [1/1], Step [5017/13804], Loss: 2.7482, Perplexity: 15.6143, time_taken_in_seconds: 14
Epoch [1/1], Step [5018/13804], Loss: 2.7586, Perplexity: 15.7773, time_taken_in_seconds: 14
Epoch [1/1], Step [5019/13804], Loss: 3.2922, Perplexity: 26.9010, time_taken_in_seconds: 15
Epoch [1/1], Step [5020/13804], Loss: 2.8391, Perplexity: 17.0997, time_taken_in_seconds: 16
Epoch [1/1], Step [5021/13804], Loss: 2.3918, Perplexity: 10.9336, time_taken_in_seconds: 17
Epoch [1/1], Step [5022/13804], Loss: 2.5732, Perplexity: 13.1082, time_taken_in_seconds: 18
Epoch [1/1], Step [5023/13804], Loss: 2.7223, Perplexity: 15.2146, time_taken_in_seconds: 18
Epoch [1/1], Step [5024/13804], Loss: 2.4313, Perplexity: 11.3737, time_taken_in_seconds: 19
Epoch [1/1], Step [5025/13804], Loss: 2.7933, Perplexity: 16.3351, time_taken_in_seconds: 20
Epoch [1/1], Step [5026/13804], Loss: 2.3139, Perplexity: 10.1136, time_taken_in_seconds: 21
Epoch [1/1], Step [5027/13804], Loss: 2.4958, Perplexity: 12.1312, time_taken_in_seconds: 22
Epoch [1/1], Step [5028/13804], Loss: 2.3077, Perplexity: 10.0514, time_taken_in_seconds: 23
Epoch [1/1], Step [5029/13804], Loss: 3.0910, Perplexity: 21.9982, time_taken_in_seconds: 23
Epoch [1/1], Step [5030/13804], Loss: 2.4383, Perplexity: 11.4541, time_taken_in_seconds: 24
Epoch [1/1], Step [5031/13804], Loss: 3.0820, Perplexity: 21.8016, time_taken_in_seconds: 25
Epoch [1/1], Step [5032/13804], Loss: 3.2208, Perplexity: 25.0488, time_taken_in_seconds: 26
Epoch [1/1], Step [5033/13804], Loss: 2.6516, Perplexity: 14.1766, time_taken_in_seconds: 27
Epoch [1/1], Step [5034/13804], Loss: 2.3889, Perplexity: 10.9010, time_taken_in_seconds: 27
Epoch [1/1], Step [5035/13804], Loss: 2.6163, Perplexity: 13.6847, time_taken_in_seconds: 28
Epoch [1/1], Step [5036/13804], Loss: 2.7398, Perplexity: 15.4833, time_taken_in_seconds: 29
Epoch [1/1], Step [5037/13804], Loss: 2.7038, Perplexity: 14.9358, time_taken_in_seconds: 30
Epoch [1/1], Step [5038/13804], Loss: 2.7653, Perplexity: 15.8835, time_taken_in_seconds: 31
Epoch [1/1], Step [5039/13804], Loss: 2.7819, Perplexity: 16.1493, time_taken_in_seconds: 32
Epoch [1/1], Step [5040/13804], Loss: 2.7883, Perplexity: 16.2541, time_taken_in_seconds: 32
Epoch [1/1], Step [5041/13804], Loss: 2.5367, Perplexity: 12.6382, time_taken_in_seconds: 33
Epoch [1/1], Step [5042/13804], Loss: 2.6421, Perplexity: 14.0432, time_taken_in_seconds: 34
Epoch [1/1], Step [5043/13804], Loss: 2.6535, Perplexity: 14.2040, time_taken_in_seconds: 35
Epoch [1/1], Step [5044/13804], Loss: 2.8741, Perplexity: 17.7091, time_taken_in_seconds: 36
Epoch [1/1], Step [5045/13804], Loss: 3.0889, Perplexity: 21.9525, time_taken_in_seconds: 36
Epoch [1/1], Step [5046/13804], Loss: 3.2314, Perplexity: 25.3150, time_taken_in_seconds: 37
Epoch [1/1], Step [5047/13804], Loss: 2.4414, Perplexity: 11.4891, time_taken_in_seconds: 38
Epoch [1/1], Step [5048/13804], Loss: 2.7062, Perplexity: 14.9721, time_taken_in_seconds: 39
Epoch [1/1], Step [5049/13804], Loss: 2.9689, Perplexity: 19.4706, time_taken_in_seconds: 40
Epoch [1/1], Step [5050/13804], Loss: 2.4242, Perplexity: 11.2936, time_taken_in_seconds: 41
Epoch [1/1], Step [5051/13804], Loss: 2.8398, Perplexity: 17.1122, time_taken_in_seconds: 41
Epoch [1/1], Step [5052/13804], Loss: 2.5205, Perplexity: 12.4348, time_taken_in_seconds: 42
Epoch [1/1], Step [5053/13804], Loss: 2.9501, Perplexity: 19.1069, time_taken_in_seconds: 43
Epoch [1/1], Step [5054/13804], Loss: 2.8126, Perplexity: 16.6536, time_taken_in_seconds: 44
Epoch [1/1], Step [5055/13804], Loss: 2.9811, Perplexity: 19.7095, time_taken_in_seconds: 45
Epoch [1/1], Step [5056/13804], Loss: 2.7205, Perplexity: 15.1876, time_taken_in_seconds: 46
Epoch [1/1], Step [5057/13804], Loss: 2.8350, Perplexity: 17.0311, time_taken_in_seconds: 46
Epoch [1/1], Step [5058/13804], Loss: 2.4440, Perplexity: 11.5185, time_taken_in_seconds: 47
Epoch [1/1], Step [5059/13804], Loss: 2.9365, Perplexity: 18.8491, time_taken_in_seconds: 48
Epoch [1/1], Step [5060/13804], Loss: 2.7997, Perplexity: 16.4396, time_taken_in_seconds: 49
Epoch [1/1], Step [5061/13804], Loss: 2.8702, Perplexity: 17.6405, time_taken_in_seconds: 50
Epoch [1/1], Step [5062/13804], Loss: 2.8643, Perplexity: 17.5373, time_taken_in_seconds: 51
Epoch [1/1], Step [5063/13804], Loss: 2.5420, Perplexity: 12.7056, time_taken_in_seconds: 51
Epoch [1/1], Step [5064/13804], Loss: 2.6060, Perplexity: 13.5449, time_taken_in_seconds: 52
Epoch [1/1], Step [5065/13804], Loss: 2.7622, Perplexity: 15.8344, time_taken_in_seconds: 53
Epoch [1/1], Step [5066/13804], Loss: 3.2206, Perplexity: 25.0422, time_taken_in_seconds: 54
Epoch [1/1], Step [5067/13804], Loss: 2.9016, Perplexity: 18.2034, time_taken_in_seconds: 55
Epoch [1/1], Step [5068/13804], Loss: 2.5118, Perplexity: 12.3268, time_taken_in_seconds: 56
Epoch [1/1], Step [5069/13804], Loss: 2.6041, Perplexity: 13.5188, time_taken_in_seconds: 56
Epoch [1/1], Step [5070/13804], Loss: 2.6934, Perplexity: 14.7811, time_taken_in_seconds: 57
Epoch [1/1], Step [5071/13804], Loss: 2.6306, Perplexity: 13.8827, time_taken_in_seconds: 58
Epoch [1/1], Step [5072/13804], Loss: 2.7533, Perplexity: 15.6947, time_taken_in_seconds: 59
Epoch [1/1], Step [5073/13804], Loss: 2.7189, Perplexity: 15.1639, time_taken_in_seconds: 60
Epoch [1/1], Step [5074/13804], Loss: 2.5218, Perplexity: 12.4511, time_taken_in_seconds: 61
Epoch [1/1], Step [5075/13804], Loss: 2.7357, Perplexity: 15.4199, time_taken_in_seconds: 62
Epoch [1/1], Step [5076/13804], Loss: 2.7329, Perplexity: 15.3778, time_taken_in_seconds: 62
Epoch [1/1], Step [5077/13804], Loss: 2.8337, Perplexity: 17.0089, time_taken_in_seconds: 63
Epoch [1/1], Step [5078/13804], Loss: 2.5016, Perplexity: 12.2018, time_taken_in_seconds: 64
Epoch [1/1], Step [5079/13804], Loss: 3.2119, Perplexity: 24.8269, time_taken_in_seconds: 65
Epoch [1/1], Step [5080/13804], Loss: 2.4281, Perplexity: 11.3373, time_taken_in_seconds: 66
Epoch [1/1], Step [5081/13804], Loss: 2.4606, Perplexity: 11.7114, time_taken_in_seconds: 67
Epoch [1/1], Step [5082/13804], Loss: 2.7947, Perplexity: 16.3574, time_taken_in_seconds: 67
Epoch [1/1], Step [5083/13804], Loss: 3.0795, Perplexity: 21.7471, time_taken_in_seconds: 68
Epoch [1/1], Step [5084/13804], Loss: 2.7204, Perplexity: 15.1862, time_taken_in_seconds: 69
Epoch [1/1], Step [5085/13804], Loss: 2.8228, Perplexity: 16.8233, time_taken_in_seconds: 70
Epoch [1/1], Step [5086/13804], Loss: 2.4279, Perplexity: 11.3348, time_taken_in_seconds: 71
Epoch [1/1], Step [5087/13804], Loss: 2.6717, Perplexity: 14.4648, time_taken_in_seconds: 72
Epoch [1/1], Step [5088/13804], Loss: 2.4828, Perplexity: 11.9746, time_taken_in_seconds: 72
Epoch [1/1], Step [5089/13804], Loss: 3.0081, Perplexity: 20.2487, time_taken_in_seconds: 73
Epoch [1/1], Step [5090/13804], Loss: 2.9776, Perplexity: 19.6399, time_taken_in_seconds: 74
Epoch [1/1], Step [5091/13804], Loss: 3.0494, Perplexity: 21.1030, time_taken_in_seconds: 75
Epoch [1/1], Step [5092/13804], Loss: 2.7268, Perplexity: 15.2833, time_taken_in_seconds: 76
Epoch [1/1], Step [5093/13804], Loss: 2.7597, Perplexity: 15.7955, time_taken_in_seconds: 77
Epoch [1/1], Step [5094/13804], Loss: 2.3480, Perplexity: 10.4648, time_taken_in_seconds: 77
Epoch [1/1], Step [5095/13804], Loss: 2.8962, Perplexity: 18.1061, time_taken_in_seconds: 78
Epoch [1/1], Step [5096/13804], Loss: 2.8189, Perplexity: 16.7581, time_taken_in_seconds: 79
Epoch [1/1], Step [5097/13804], Loss: 2.6292, Perplexity: 13.8621, time_taken_in_seconds: 80
Epoch [1/1], Step [5098/13804], Loss: 2.8343, Perplexity: 17.0192, time_taken_in_seconds: 81
Epoch [1/1], Step [5099/13804], Loss: 3.9460, Perplexity: 51.7269, time_taken_in_seconds: 81
Epoch [1/1], Step [5100/13804], Loss: 2.5216, Perplexity: 12.4480, time_taken_in_seconds: 82
Epoch [1/1], Step [5101/13804], Loss: 3.0523, Perplexity: 21.1631, time_taken_in_seconds: 0
Epoch [1/1], Step [5102/13804], Loss: 2.7461, Perplexity: 15.5814, time_taken_in_seconds: 1
Epoch [1/1], Step [5103/13804], Loss: 3.1297, Perplexity: 22.8666, time_taken_in_seconds: 2
Epoch [1/1], Step [5104/13804], Loss: 2.4453, Perplexity: 11.5346, time_taken_in_seconds: 3
Epoch [1/1], Step [5105/13804], Loss: 2.6189, Perplexity: 13.7211, time_taken_in_seconds: 4
Epoch [1/1], Step [5106/13804], Loss: 3.2004, Perplexity: 24.5422, time_taken_in_seconds: 4
Epoch [1/1], Step [5107/13804], Loss: 2.7673, Perplexity: 15.9159, time_taken_in_seconds: 5
Epoch [1/1], Step [5108/13804], Loss: 3.1333, Perplexity: 22.9490, time_taken_in_seconds: 6
Epoch [1/1], Step [5109/13804], Loss: 3.2601, Perplexity: 26.0532, time_taken_in_seconds: 7
Epoch [1/1], Step [5110/13804], Loss: 2.5727, Perplexity: 13.1011, time_taken_in_seconds: 8
Epoch [1/1], Step [5111/13804], Loss: 2.6980, Perplexity: 14.8495, time_taken_in_seconds: 9
Epoch [1/1], Step [5112/13804], Loss: 2.7776, Perplexity: 16.0801, time_taken_in_seconds: 9
Epoch [1/1], Step [5113/13804], Loss: 2.5814, Perplexity: 13.2151, time_taken_in_seconds: 10
Epoch [1/1], Step [5114/13804], Loss: 2.7023, Perplexity: 14.9139, time_taken_in_seconds: 11
Epoch [1/1], Step [5115/13804], Loss: 2.9496, Perplexity: 19.0991, time_taken_in_seconds: 12
Epoch [1/1], Step [5116/13804], Loss: 2.7815, Perplexity: 16.1426, time_taken_in_seconds: 13
Epoch [1/1], Step [5117/13804], Loss: 2.5056, Perplexity: 12.2512, time_taken_in_seconds: 14
Epoch [1/1], Step [5118/13804], Loss: 2.7777, Perplexity: 16.0821, time_taken_in_seconds: 14
Epoch [1/1], Step [5119/13804], Loss: 2.5730, Perplexity: 13.1049, time_taken_in_seconds: 15
Epoch [1/1], Step [5120/13804], Loss: 2.6651, Perplexity: 14.3689, time_taken_in_seconds: 16
Epoch [1/1], Step [5121/13804], Loss: 2.5737, Perplexity: 13.1145, time_taken_in_seconds: 17
Epoch [1/1], Step [5122/13804], Loss: 3.1492, Perplexity: 23.3166, time_taken_in_seconds: 18
Epoch [1/1], Step [5123/13804], Loss: 2.5479, Perplexity: 12.7806, time_taken_in_seconds: 19
Epoch [1/1], Step [5124/13804], Loss: 2.5907, Perplexity: 13.3390, time_taken_in_seconds: 19
Epoch [1/1], Step [5125/13804], Loss: 2.9774, Perplexity: 19.6375, time_taken_in_seconds: 20
Epoch [1/1], Step [5126/13804], Loss: 2.7395, Perplexity: 15.4785, time_taken_in_seconds: 21
Epoch [1/1], Step [5127/13804], Loss: 2.6501, Perplexity: 14.1552, time_taken_in_seconds: 22
Epoch [1/1], Step [5128/13804], Loss: 2.8764, Perplexity: 17.7498, time_taken_in_seconds: 23
Epoch [1/1], Step [5129/13804], Loss: 2.9701, Perplexity: 19.4930, time_taken_in_seconds: 24
Epoch [1/1], Step [5130/13804], Loss: 2.7779, Perplexity: 16.0849, time_taken_in_seconds: 24
Epoch [1/1], Step [5131/13804], Loss: 2.5690, Perplexity: 13.0530, time_taken_in_seconds: 25
Epoch [1/1], Step [5132/13804], Loss: 2.4774, Perplexity: 11.9099, time_taken_in_seconds: 26
Epoch [1/1], Step [5133/13804], Loss: 3.0399, Perplexity: 20.9025, time_taken_in_seconds: 27
Epoch [1/1], Step [5134/13804], Loss: 2.5717, Perplexity: 13.0884, time_taken_in_seconds: 28
Epoch [1/1], Step [5135/13804], Loss: 2.4702, Perplexity: 11.8244, time_taken_in_seconds: 29
Epoch [1/1], Step [5136/13804], Loss: 3.0332, Perplexity: 20.7640, time_taken_in_seconds: 29
Epoch [1/1], Step [5137/13804], Loss: 3.0140, Perplexity: 20.3683, time_taken_in_seconds: 30
Epoch [1/1], Step [5138/13804], Loss: 2.9340, Perplexity: 18.8022, time_taken_in_seconds: 31
Epoch [1/1], Step [5139/13804], Loss: 2.5986, Perplexity: 13.4445, time_taken_in_seconds: 32
Epoch [1/1], Step [5140/13804], Loss: 2.9529, Perplexity: 19.1612, time_taken_in_seconds: 33
Epoch [1/1], Step [5141/13804], Loss: 2.9181, Perplexity: 18.5056, time_taken_in_seconds: 33
Epoch [1/1], Step [5142/13804], Loss: 3.1754, Perplexity: 23.9373, time_taken_in_seconds: 34
Epoch [1/1], Step [5143/13804], Loss: 2.7240, Perplexity: 15.2411, time_taken_in_seconds: 35
Epoch [1/1], Step [5144/13804], Loss: 2.9043, Perplexity: 18.2525, time_taken_in_seconds: 36
Epoch [1/1], Step [5145/13804], Loss: 2.8101, Perplexity: 16.6114, time_taken_in_seconds: 37
Epoch [1/1], Step [5146/13804], Loss: 2.6370, Perplexity: 13.9710, time_taken_in_seconds: 38
Epoch [1/1], Step [5147/13804], Loss: 2.6950, Perplexity: 14.8049, time_taken_in_seconds: 39
Epoch [1/1], Step [5148/13804], Loss: 2.6835, Perplexity: 14.6356, time_taken_in_seconds: 40
Epoch [1/1], Step [5149/13804], Loss: 2.6103, Perplexity: 13.6034, time_taken_in_seconds: 40
Epoch [1/1], Step [5150/13804], Loss: 3.0112, Perplexity: 20.3118, time_taken_in_seconds: 41
Epoch [1/1], Step [5151/13804], Loss: 2.7178, Perplexity: 15.1471, time_taken_in_seconds: 42
Epoch [1/1], Step [5152/13804], Loss: 3.1123, Perplexity: 22.4725, time_taken_in_seconds: 43
Epoch [1/1], Step [5153/13804], Loss: 2.7929, Perplexity: 16.3278, time_taken_in_seconds: 44
Epoch [1/1], Step [5154/13804], Loss: 2.7085, Perplexity: 15.0073, time_taken_in_seconds: 44
Epoch [1/1], Step [5155/13804], Loss: 3.1493, Perplexity: 23.3199, time_taken_in_seconds: 45
Epoch [1/1], Step [5156/13804], Loss: 2.6135, Perplexity: 13.6462, time_taken_in_seconds: 46
Epoch [1/1], Step [5157/13804], Loss: 2.9841, Perplexity: 19.7693, time_taken_in_seconds: 47
Epoch [1/1], Step [5158/13804], Loss: 2.6157, Perplexity: 13.6766, time_taken_in_seconds: 48
Epoch [1/1], Step [5159/13804], Loss: 3.1089, Perplexity: 22.3969, time_taken_in_seconds: 49
Epoch [1/1], Step [5160/13804], Loss: 3.0701, Perplexity: 21.5446, time_taken_in_seconds: 49
Epoch [1/1], Step [5161/13804], Loss: 2.8320, Perplexity: 16.9802, time_taken_in_seconds: 50
Epoch [1/1], Step [5162/13804], Loss: 3.3649, Perplexity: 28.9320, time_taken_in_seconds: 51
Epoch [1/1], Step [5163/13804], Loss: 2.6424, Perplexity: 14.0474, time_taken_in_seconds: 52
Epoch [1/1], Step [5164/13804], Loss: 2.6561, Perplexity: 14.2411, time_taken_in_seconds: 53
Epoch [1/1], Step [5165/13804], Loss: 2.6724, Perplexity: 14.4753, time_taken_in_seconds: 54
Epoch [1/1], Step [5166/13804], Loss: 2.8885, Perplexity: 17.9658, time_taken_in_seconds: 54
Epoch [1/1], Step [5167/13804], Loss: 2.9787, Perplexity: 19.6625, time_taken_in_seconds: 55
Epoch [1/1], Step [5168/13804], Loss: 2.6777, Perplexity: 14.5513, time_taken_in_seconds: 56
Epoch [1/1], Step [5169/13804], Loss: 2.5771, Perplexity: 13.1586, time_taken_in_seconds: 57
Epoch [1/1], Step [5170/13804], Loss: 2.6716, Perplexity: 14.4626, time_taken_in_seconds: 58
Epoch [1/1], Step [5171/13804], Loss: 2.4310, Perplexity: 11.3700, time_taken_in_seconds: 59
Epoch [1/1], Step [5172/13804], Loss: 2.7419, Perplexity: 15.5165, time_taken_in_seconds: 59
Epoch [1/1], Step [5173/13804], Loss: 3.4933, Perplexity: 32.8934, time_taken_in_seconds: 60
Epoch [1/1], Step [5174/13804], Loss: 2.2767, Perplexity: 9.7447, time_taken_in_seconds: 61
Epoch [1/1], Step [5175/13804], Loss: 2.6954, Perplexity: 14.8120, time_taken_in_seconds: 62
Epoch [1/1], Step [5176/13804], Loss: 2.9368, Perplexity: 18.8553, time_taken_in_seconds: 63
Epoch [1/1], Step [5177/13804], Loss: 2.5681, Perplexity: 13.0413, time_taken_in_seconds: 63
Epoch [1/1], Step [5178/13804], Loss: 2.5475, Perplexity: 12.7749, time_taken_in_seconds: 64
Epoch [1/1], Step [5179/13804], Loss: 2.7471, Perplexity: 15.5968, time_taken_in_seconds: 65
Epoch [1/1], Step [5180/13804], Loss: 2.8158, Perplexity: 16.7060, time_taken_in_seconds: 66
Epoch [1/1], Step [5181/13804], Loss: 2.6918, Perplexity: 14.7582, time_taken_in_seconds: 67
Epoch [1/1], Step [5182/13804], Loss: 3.2887, Perplexity: 26.8068, time_taken_in_seconds: 68
Epoch [1/1], Step [5183/13804], Loss: 2.6074, Perplexity: 13.5635, time_taken_in_seconds: 68
Epoch [1/1], Step [5184/13804], Loss: 2.9046, Perplexity: 18.2584, time_taken_in_seconds: 69
Epoch [1/1], Step [5185/13804], Loss: 2.5887, Perplexity: 13.3120, time_taken_in_seconds: 70
Epoch [1/1], Step [5186/13804], Loss: 2.8699, Perplexity: 17.6361, time_taken_in_seconds: 71
Epoch [1/1], Step [5187/13804], Loss: 2.7436, Perplexity: 15.5433, time_taken_in_seconds: 72
Epoch [1/1], Step [5188/13804], Loss: 3.0102, Perplexity: 20.2923, time_taken_in_seconds: 73
Epoch [1/1], Step [5189/13804], Loss: 2.6520, Perplexity: 14.1829, time_taken_in_seconds: 73
Epoch [1/1], Step [5190/13804], Loss: 2.6894, Perplexity: 14.7225, time_taken_in_seconds: 74
Epoch [1/1], Step [5191/13804], Loss: 2.6989, Perplexity: 14.8638, time_taken_in_seconds: 75
Epoch [1/1], Step [5192/13804], Loss: 3.0456, Perplexity: 21.0225, time_taken_in_seconds: 76
Epoch [1/1], Step [5193/13804], Loss: 2.5849, Perplexity: 13.2621, time_taken_in_seconds: 77
Epoch [1/1], Step [5194/13804], Loss: 2.5420, Perplexity: 12.7051, time_taken_in_seconds: 78
Epoch [1/1], Step [5195/13804], Loss: 2.9472, Perplexity: 19.0523, time_taken_in_seconds: 78
Epoch [1/1], Step [5196/13804], Loss: 3.1395, Perplexity: 23.0925, time_taken_in_seconds: 79
Epoch [1/1], Step [5197/13804], Loss: 2.6235, Perplexity: 13.7836, time_taken_in_seconds: 80
Epoch [1/1], Step [5198/13804], Loss: 3.0795, Perplexity: 21.7485, time_taken_in_seconds: 81
Epoch [1/1], Step [5199/13804], Loss: 2.4165, Perplexity: 11.2063, time_taken_in_seconds: 82
Epoch [1/1], Step [5200/13804], Loss: 3.0927, Perplexity: 22.0363, time_taken_in_seconds: 83
Epoch [1/1], Step [5201/13804], Loss: 2.7376, Perplexity: 15.4506, time_taken_in_seconds: 0
Epoch [1/1], Step [5202/13804], Loss: 2.7262, Perplexity: 15.2744, time_taken_in_seconds: 1
Epoch [1/1], Step [5203/13804], Loss: 2.7687, Perplexity: 15.9374, time_taken_in_seconds: 2
Epoch [1/1], Step [5204/13804], Loss: 2.6641, Perplexity: 14.3554, time_taken_in_seconds: 3
Epoch [1/1], Step [5205/13804], Loss: 2.6197, Perplexity: 13.7309, time_taken_in_seconds: 4
Epoch [1/1], Step [5206/13804], Loss: 2.4281, Perplexity: 11.3373, time_taken_in_seconds: 4
Epoch [1/1], Step [5207/13804], Loss: 2.7122, Perplexity: 15.0618, time_taken_in_seconds: 5
Epoch [1/1], Step [5208/13804], Loss: 2.5436, Perplexity: 12.7253, time_taken_in_seconds: 6
Epoch [1/1], Step [5209/13804], Loss: 2.6345, Perplexity: 13.9360, time_taken_in_seconds: 7
Epoch [1/1], Step [5210/13804], Loss: 2.6566, Perplexity: 14.2481, time_taken_in_seconds: 8
Epoch [1/1], Step [5211/13804], Loss: 2.8027, Perplexity: 16.4895, time_taken_in_seconds: 9
Epoch [1/1], Step [5212/13804], Loss: 2.8814, Perplexity: 17.8398, time_taken_in_seconds: 9
Epoch [1/1], Step [5213/13804], Loss: 2.6256, Perplexity: 13.8129, time_taken_in_seconds: 10
Epoch [1/1], Step [5214/13804], Loss: 2.5748, Perplexity: 13.1281, time_taken_in_seconds: 11
Epoch [1/1], Step [5215/13804], Loss: 2.5040, Perplexity: 12.2314, time_taken_in_seconds: 12
Epoch [1/1], Step [5216/13804], Loss: 2.8785, Perplexity: 17.7880, time_taken_in_seconds: 13
Epoch [1/1], Step [5217/13804], Loss: 2.6423, Perplexity: 14.0452, time_taken_in_seconds: 14
Epoch [1/1], Step [5218/13804], Loss: 2.8828, Perplexity: 17.8650, time_taken_in_seconds: 15
Epoch [1/1], Step [5219/13804], Loss: 2.4845, Perplexity: 11.9946, time_taken_in_seconds: 15
Epoch [1/1], Step [5220/13804], Loss: 2.8235, Perplexity: 16.8356, time_taken_in_seconds: 16
Epoch [1/1], Step [5221/13804], Loss: 2.6796, Perplexity: 14.5798, time_taken_in_seconds: 17
Epoch [1/1], Step [5222/13804], Loss: 2.5450, Perplexity: 12.7436, time_taken_in_seconds: 18
Epoch [1/1], Step [5223/13804], Loss: 4.0073, Perplexity: 54.9971, time_taken_in_seconds: 19
Epoch [1/1], Step [5224/13804], Loss: 2.4980, Perplexity: 12.1576, time_taken_in_seconds: 20
Epoch [1/1], Step [5225/13804], Loss: 2.9008, Perplexity: 18.1895, time_taken_in_seconds: 20
Epoch [1/1], Step [5226/13804], Loss: 3.0549, Perplexity: 21.2200, time_taken_in_seconds: 21
Epoch [1/1], Step [5227/13804], Loss: 2.0506, Perplexity: 7.7728, time_taken_in_seconds: 22
Epoch [1/1], Step [5228/13804], Loss: 2.8440, Perplexity: 17.1841, time_taken_in_seconds: 23
Epoch [1/1], Step [5229/13804], Loss: 2.4506, Perplexity: 11.5955, time_taken_in_seconds: 24
Epoch [1/1], Step [5230/13804], Loss: 3.1751, Perplexity: 23.9287, time_taken_in_seconds: 25
Epoch [1/1], Step [5231/13804], Loss: 2.5243, Perplexity: 12.4828, time_taken_in_seconds: 25
Epoch [1/1], Step [5232/13804], Loss: 3.0833, Perplexity: 21.8309, time_taken_in_seconds: 26
Epoch [1/1], Step [5233/13804], Loss: 2.7399, Perplexity: 15.4852, time_taken_in_seconds: 27
Epoch [1/1], Step [5234/13804], Loss: 2.6013, Perplexity: 13.4808, time_taken_in_seconds: 28
Epoch [1/1], Step [5235/13804], Loss: 2.7053, Perplexity: 14.9588, time_taken_in_seconds: 29
Epoch [1/1], Step [5236/13804], Loss: 2.2060, Perplexity: 9.0797, time_taken_in_seconds: 29
Epoch [1/1], Step [5237/13804], Loss: 2.8697, Perplexity: 17.6312, time_taken_in_seconds: 30
Epoch [1/1], Step [5238/13804], Loss: 2.7761, Perplexity: 16.0561, time_taken_in_seconds: 31
Epoch [1/1], Step [5239/13804], Loss: 2.7321, Perplexity: 15.3648, time_taken_in_seconds: 32
Epoch [1/1], Step [5240/13804], Loss: 3.0205, Perplexity: 20.5006, time_taken_in_seconds: 33
Epoch [1/1], Step [5241/13804], Loss: 2.7978, Perplexity: 16.4089, time_taken_in_seconds: 34
Epoch [1/1], Step [5242/13804], Loss: 2.6988, Perplexity: 14.8620, time_taken_in_seconds: 34
Epoch [1/1], Step [5243/13804], Loss: 2.8769, Perplexity: 17.7598, time_taken_in_seconds: 35
Epoch [1/1], Step [5244/13804], Loss: 2.7255, Perplexity: 15.2647, time_taken_in_seconds: 36
Epoch [1/1], Step [5245/13804], Loss: 2.6264, Perplexity: 13.8238, time_taken_in_seconds: 37
Epoch [1/1], Step [5246/13804], Loss: 2.7597, Perplexity: 15.7950, time_taken_in_seconds: 38
Epoch [1/1], Step [5247/13804], Loss: 3.0881, Perplexity: 21.9350, time_taken_in_seconds: 39
Epoch [1/1], Step [5248/13804], Loss: 2.5232, Perplexity: 12.4684, time_taken_in_seconds: 39
Epoch [1/1], Step [5249/13804], Loss: 2.9433, Perplexity: 18.9791, time_taken_in_seconds: 40
Epoch [1/1], Step [5250/13804], Loss: 2.5968, Perplexity: 13.4201, time_taken_in_seconds: 41
Epoch [1/1], Step [5251/13804], Loss: 2.9985, Perplexity: 20.0552, time_taken_in_seconds: 42
Epoch [1/1], Step [5252/13804], Loss: 2.6239, Perplexity: 13.7891, time_taken_in_seconds: 43
Epoch [1/1], Step [5253/13804], Loss: 2.9286, Perplexity: 18.7011, time_taken_in_seconds: 44
Epoch [1/1], Step [5254/13804], Loss: 2.6940, Perplexity: 14.7904, time_taken_in_seconds: 44
Epoch [1/1], Step [5255/13804], Loss: 2.9274, Perplexity: 18.6782, time_taken_in_seconds: 45
Epoch [1/1], Step [5256/13804], Loss: 2.8195, Perplexity: 16.7681, time_taken_in_seconds: 46
Epoch [1/1], Step [5257/13804], Loss: 2.6374, Perplexity: 13.9771, time_taken_in_seconds: 47
Epoch [1/1], Step [5258/13804], Loss: 2.7449, Perplexity: 15.5630, time_taken_in_seconds: 48
Epoch [1/1], Step [5259/13804], Loss: 4.6268, Perplexity: 102.1877, time_taken_in_seconds: 48
Epoch [1/1], Step [5260/13804], Loss: 2.8817, Perplexity: 17.8439, time_taken_in_seconds: 49
Epoch [1/1], Step [5261/13804], Loss: 2.7850, Perplexity: 16.1991, time_taken_in_seconds: 50
Epoch [1/1], Step [5262/13804], Loss: 2.6080, Perplexity: 13.5718, time_taken_in_seconds: 51
Epoch [1/1], Step [5263/13804], Loss: 4.0151, Perplexity: 55.4308, time_taken_in_seconds: 52
Epoch [1/1], Step [5264/13804], Loss: 2.3006, Perplexity: 9.9803, time_taken_in_seconds: 53
Epoch [1/1], Step [5265/13804], Loss: 2.8045, Perplexity: 16.5184, time_taken_in_seconds: 53
Epoch [1/1], Step [5266/13804], Loss: 2.9348, Perplexity: 18.8183, time_taken_in_seconds: 54
Epoch [1/1], Step [5267/13804], Loss: 2.8211, Perplexity: 16.7947, time_taken_in_seconds: 55
Epoch [1/1], Step [5268/13804], Loss: 2.6183, Perplexity: 13.7123, time_taken_in_seconds: 56
Epoch [1/1], Step [5269/13804], Loss: 2.7951, Perplexity: 16.3636, time_taken_in_seconds: 57
Epoch [1/1], Step [5270/13804], Loss: 2.8729, Perplexity: 17.6877, time_taken_in_seconds: 58
Epoch [1/1], Step [5271/13804], Loss: 2.6565, Perplexity: 14.2468, time_taken_in_seconds: 58
Epoch [1/1], Step [5272/13804], Loss: 2.7505, Perplexity: 15.6499, time_taken_in_seconds: 59
Epoch [1/1], Step [5273/13804], Loss: 2.8218, Perplexity: 16.8076, time_taken_in_seconds: 60
Epoch [1/1], Step [5274/13804], Loss: 2.9373, Perplexity: 18.8654, time_taken_in_seconds: 61
Epoch [1/1], Step [5275/13804], Loss: 2.7846, Perplexity: 16.1939, time_taken_in_seconds: 62
Epoch [1/1], Step [5276/13804], Loss: 2.8642, Perplexity: 17.5342, time_taken_in_seconds: 62
Epoch [1/1], Step [5277/13804], Loss: 3.1954, Perplexity: 24.4210, time_taken_in_seconds: 63
Epoch [1/1], Step [5278/13804], Loss: 3.5976, Perplexity: 36.5121, time_taken_in_seconds: 64
Epoch [1/1], Step [5279/13804], Loss: 2.9325, Perplexity: 18.7754, time_taken_in_seconds: 65
Epoch [1/1], Step [5280/13804], Loss: 3.0920, Perplexity: 22.0212, time_taken_in_seconds: 66
Epoch [1/1], Step [5281/13804], Loss: 2.8071, Perplexity: 16.5613, time_taken_in_seconds: 67
Epoch [1/1], Step [5282/13804], Loss: 2.9838, Perplexity: 19.7637, time_taken_in_seconds: 67
Epoch [1/1], Step [5283/13804], Loss: 2.3481, Perplexity: 10.4660, time_taken_in_seconds: 68
Epoch [1/1], Step [5284/13804], Loss: 2.9006, Perplexity: 18.1857, time_taken_in_seconds: 69
Epoch [1/1], Step [5285/13804], Loss: 3.3265, Perplexity: 27.8416, time_taken_in_seconds: 70
Epoch [1/1], Step [5286/13804], Loss: 2.9856, Perplexity: 19.7986, time_taken_in_seconds: 71
Epoch [1/1], Step [5287/13804], Loss: 2.4126, Perplexity: 11.1626, time_taken_in_seconds: 72
Epoch [1/1], Step [5288/13804], Loss: 2.7682, Perplexity: 15.9299, time_taken_in_seconds: 72
Epoch [1/1], Step [5289/13804], Loss: 2.6439, Perplexity: 14.0681, time_taken_in_seconds: 73
Epoch [1/1], Step [5290/13804], Loss: 2.8072, Perplexity: 16.5640, time_taken_in_seconds: 74
Epoch [1/1], Step [5291/13804], Loss: 2.6881, Perplexity: 14.7043, time_taken_in_seconds: 75
Epoch [1/1], Step [5292/13804], Loss: 2.5406, Perplexity: 12.6874, time_taken_in_seconds: 76
Epoch [1/1], Step [5293/13804], Loss: 2.5981, Perplexity: 13.4385, time_taken_in_seconds: 77
Epoch [1/1], Step [5294/13804], Loss: 2.4807, Perplexity: 11.9498, time_taken_in_seconds: 78
Epoch [1/1], Step [5295/13804], Loss: 2.3692, Perplexity: 10.6890, time_taken_in_seconds: 79
Epoch [1/1], Step [5296/13804], Loss: 2.8645, Perplexity: 17.5396, time_taken_in_seconds: 79
Epoch [1/1], Step [5297/13804], Loss: 2.8145, Perplexity: 16.6842, time_taken_in_seconds: 80
Epoch [1/1], Step [5298/13804], Loss: 2.6909, Perplexity: 14.7454, time_taken_in_seconds: 81
Epoch [1/1], Step [5299/13804], Loss: 2.6640, Perplexity: 14.3540, time_taken_in_seconds: 82
Epoch [1/1], Step [5300/13804], Loss: 2.6553, Perplexity: 14.2299, time_taken_in_seconds: 83
Epoch [1/1], Step [5301/13804], Loss: 3.1565, Perplexity: 23.4872, time_taken_in_seconds: 0
Epoch [1/1], Step [5302/13804], Loss: 2.4633, Perplexity: 11.7436, time_taken_in_seconds: 1
Epoch [1/1], Step [5303/13804], Loss: 3.5753, Perplexity: 35.7058, time_taken_in_seconds: 2
Epoch [1/1], Step [5304/13804], Loss: 2.5722, Perplexity: 13.0950, time_taken_in_seconds: 3
Epoch [1/1], Step [5305/13804], Loss: 3.0412, Perplexity: 20.9294, time_taken_in_seconds: 4
Epoch [1/1], Step [5306/13804], Loss: 2.2987, Perplexity: 9.9612, time_taken_in_seconds: 4
Epoch [1/1], Step [5307/13804], Loss: 2.5305, Perplexity: 12.5594, time_taken_in_seconds: 5
Epoch [1/1], Step [5308/13804], Loss: 2.7187, Perplexity: 15.1599, time_taken_in_seconds: 6
Epoch [1/1], Step [5309/13804], Loss: 2.4209, Perplexity: 11.2554, time_taken_in_seconds: 7
Epoch [1/1], Step [5310/13804], Loss: 2.8575, Perplexity: 17.4172, time_taken_in_seconds: 8
Epoch [1/1], Step [5311/13804], Loss: 2.4877, Perplexity: 12.0337, time_taken_in_seconds: 9
Epoch [1/1], Step [5312/13804], Loss: 3.0654, Perplexity: 21.4439, time_taken_in_seconds: 10
Epoch [1/1], Step [5313/13804], Loss: 2.9984, Perplexity: 20.0538, time_taken_in_seconds: 10
Epoch [1/1], Step [5314/13804], Loss: 2.5149, Perplexity: 12.3654, time_taken_in_seconds: 11
Epoch [1/1], Step [5315/13804], Loss: 2.6709, Perplexity: 14.4534, time_taken_in_seconds: 12
Epoch [1/1], Step [5316/13804], Loss: 2.5548, Perplexity: 12.8689, time_taken_in_seconds: 13
Epoch [1/1], Step [5317/13804], Loss: 2.7153, Perplexity: 15.1088, time_taken_in_seconds: 14
Epoch [1/1], Step [5318/13804], Loss: 2.4868, Perplexity: 12.0229, time_taken_in_seconds: 14
Epoch [1/1], Step [5319/13804], Loss: 2.5557, Perplexity: 12.8803, time_taken_in_seconds: 15
Epoch [1/1], Step [5320/13804], Loss: 2.5451, Perplexity: 12.7447, time_taken_in_seconds: 16
Epoch [1/1], Step [5321/13804], Loss: 2.9523, Perplexity: 19.1496, time_taken_in_seconds: 17
Epoch [1/1], Step [5322/13804], Loss: 2.5930, Perplexity: 13.3693, time_taken_in_seconds: 18
Epoch [1/1], Step [5323/13804], Loss: 2.6239, Perplexity: 13.7897, time_taken_in_seconds: 19
Epoch [1/1], Step [5324/13804], Loss: 2.6221, Perplexity: 13.7647, time_taken_in_seconds: 19
Epoch [1/1], Step [5325/13804], Loss: 2.5500, Perplexity: 12.8065, time_taken_in_seconds: 20
Epoch [1/1], Step [5326/13804], Loss: 2.6518, Perplexity: 14.1794, time_taken_in_seconds: 21
Epoch [1/1], Step [5327/13804], Loss: 2.9858, Perplexity: 19.8031, time_taken_in_seconds: 22
Epoch [1/1], Step [5328/13804], Loss: 2.6968, Perplexity: 14.8319, time_taken_in_seconds: 23
Epoch [1/1], Step [5329/13804], Loss: 2.6983, Perplexity: 14.8549, time_taken_in_seconds: 23
Epoch [1/1], Step [5330/13804], Loss: 3.0028, Perplexity: 20.1426, time_taken_in_seconds: 24
Epoch [1/1], Step [5331/13804], Loss: 2.5230, Perplexity: 12.4665, time_taken_in_seconds: 25
Epoch [1/1], Step [5332/13804], Loss: 2.5430, Perplexity: 12.7173, time_taken_in_seconds: 26
Epoch [1/1], Step [5333/13804], Loss: 2.7822, Perplexity: 16.1549, time_taken_in_seconds: 27
Epoch [1/1], Step [5334/13804], Loss: 3.0531, Perplexity: 21.1810, time_taken_in_seconds: 27
Epoch [1/1], Step [5335/13804], Loss: 2.5114, Perplexity: 12.3216, time_taken_in_seconds: 28
Epoch [1/1], Step [5336/13804], Loss: 2.4517, Perplexity: 11.6078, time_taken_in_seconds: 29
Epoch [1/1], Step [5337/13804], Loss: 3.1668, Perplexity: 23.7305, time_taken_in_seconds: 30
Epoch [1/1], Step [5338/13804], Loss: 2.7265, Perplexity: 15.2789, time_taken_in_seconds: 31
Epoch [1/1], Step [5339/13804], Loss: 3.0010, Perplexity: 20.1049, time_taken_in_seconds: 32
Epoch [1/1], Step [5340/13804], Loss: 2.6014, Perplexity: 13.4826, time_taken_in_seconds: 32
Epoch [1/1], Step [5341/13804], Loss: 2.7976, Perplexity: 16.4051, time_taken_in_seconds: 33
Epoch [1/1], Step [5342/13804], Loss: 2.7261, Perplexity: 15.2734, time_taken_in_seconds: 34
Epoch [1/1], Step [5343/13804], Loss: 2.6222, Perplexity: 13.7658, time_taken_in_seconds: 35
Epoch [1/1], Step [5344/13804], Loss: 2.7287, Perplexity: 15.3122, time_taken_in_seconds: 36
Epoch [1/1], Step [5345/13804], Loss: 2.4294, Perplexity: 11.3516, time_taken_in_seconds: 36
Epoch [1/1], Step [5346/13804], Loss: 2.7434, Perplexity: 15.5396, time_taken_in_seconds: 37
Epoch [1/1], Step [5347/13804], Loss: 2.9197, Perplexity: 18.5366, time_taken_in_seconds: 38
Epoch [1/1], Step [5348/13804], Loss: 2.4879, Perplexity: 12.0357, time_taken_in_seconds: 39
Epoch [1/1], Step [5349/13804], Loss: 2.9400, Perplexity: 18.9156, time_taken_in_seconds: 40
Epoch [1/1], Step [5350/13804], Loss: 2.5702, Perplexity: 13.0685, time_taken_in_seconds: 41
Epoch [1/1], Step [5351/13804], Loss: 2.7210, Perplexity: 15.1960, time_taken_in_seconds: 41
Epoch [1/1], Step [5352/13804], Loss: 3.2013, Perplexity: 24.5646, time_taken_in_seconds: 42
Epoch [1/1], Step [5353/13804], Loss: 2.6573, Perplexity: 14.2572, time_taken_in_seconds: 43
Epoch [1/1], Step [5354/13804], Loss: 2.5216, Perplexity: 12.4484, time_taken_in_seconds: 44
Epoch [1/1], Step [5355/13804], Loss: 2.6239, Perplexity: 13.7899, time_taken_in_seconds: 45
Epoch [1/1], Step [5356/13804], Loss: 2.5969, Perplexity: 13.4223, time_taken_in_seconds: 45
Epoch [1/1], Step [5357/13804], Loss: 2.8750, Perplexity: 17.7249, time_taken_in_seconds: 46
Epoch [1/1], Step [5358/13804], Loss: 2.5731, Perplexity: 13.1060, time_taken_in_seconds: 47
Epoch [1/1], Step [5359/13804], Loss: 3.0573, Perplexity: 21.2707, time_taken_in_seconds: 48
Epoch [1/1], Step [5360/13804], Loss: 2.6484, Perplexity: 14.1310, time_taken_in_seconds: 49
Epoch [1/1], Step [5361/13804], Loss: 3.3528, Perplexity: 28.5826, time_taken_in_seconds: 50
Epoch [1/1], Step [5362/13804], Loss: 2.7560, Perplexity: 15.7365, time_taken_in_seconds: 50
Epoch [1/1], Step [5363/13804], Loss: 2.9303, Perplexity: 18.7342, time_taken_in_seconds: 51
Epoch [1/1], Step [5364/13804], Loss: 2.8632, Perplexity: 17.5175, time_taken_in_seconds: 52
Epoch [1/1], Step [5365/13804], Loss: 2.5335, Perplexity: 12.5972, time_taken_in_seconds: 53
Epoch [1/1], Step [5366/13804], Loss: 2.4870, Perplexity: 12.0249, time_taken_in_seconds: 54
Epoch [1/1], Step [5367/13804], Loss: 2.8167, Perplexity: 16.7223, time_taken_in_seconds: 55
Epoch [1/1], Step [5368/13804], Loss: 2.6022, Perplexity: 13.4940, time_taken_in_seconds: 56
Epoch [1/1], Step [5369/13804], Loss: 2.7889, Perplexity: 16.2629, time_taken_in_seconds: 56
Epoch [1/1], Step [5370/13804], Loss: 2.9302, Perplexity: 18.7315, time_taken_in_seconds: 57
Epoch [1/1], Step [5371/13804], Loss: 2.6838, Perplexity: 14.6405, time_taken_in_seconds: 58
Epoch [1/1], Step [5372/13804], Loss: 2.7321, Perplexity: 15.3651, time_taken_in_seconds: 59
Epoch [1/1], Step [5373/13804], Loss: 2.6436, Perplexity: 14.0633, time_taken_in_seconds: 60
Epoch [1/1], Step [5374/13804], Loss: 2.7273, Perplexity: 15.2910, time_taken_in_seconds: 60
Epoch [1/1], Step [5375/13804], Loss: 2.5392, Perplexity: 12.6691, time_taken_in_seconds: 61
Epoch [1/1], Step [5376/13804], Loss: 2.6107, Perplexity: 13.6086, time_taken_in_seconds: 62
Epoch [1/1], Step [5377/13804], Loss: 2.8302, Perplexity: 16.9493, time_taken_in_seconds: 63
Epoch [1/1], Step [5378/13804], Loss: 2.9723, Perplexity: 19.5365, time_taken_in_seconds: 64
Epoch [1/1], Step [5379/13804], Loss: 2.8191, Perplexity: 16.7615, time_taken_in_seconds: 64
Epoch [1/1], Step [5380/13804], Loss: 2.5809, Perplexity: 13.2093, time_taken_in_seconds: 65
Epoch [1/1], Step [5381/13804], Loss: 2.8166, Perplexity: 16.7191, time_taken_in_seconds: 66
Epoch [1/1], Step [5382/13804], Loss: 2.7076, Perplexity: 14.9939, time_taken_in_seconds: 67
Epoch [1/1], Step [5383/13804], Loss: 2.5409, Perplexity: 12.6906, time_taken_in_seconds: 68
Epoch [1/1], Step [5384/13804], Loss: 2.4031, Perplexity: 11.0573, time_taken_in_seconds: 69
Epoch [1/1], Step [5385/13804], Loss: 2.9283, Perplexity: 18.6951, time_taken_in_seconds: 69
Epoch [1/1], Step [5386/13804], Loss: 3.1866, Perplexity: 24.2065, time_taken_in_seconds: 70
Epoch [1/1], Step [5387/13804], Loss: 2.3921, Perplexity: 10.9368, time_taken_in_seconds: 71
Epoch [1/1], Step [5388/13804], Loss: 2.8801, Perplexity: 17.8167, time_taken_in_seconds: 72
Epoch [1/1], Step [5389/13804], Loss: 2.8035, Perplexity: 16.5015, time_taken_in_seconds: 73
Epoch [1/1], Step [5390/13804], Loss: 2.5027, Perplexity: 12.2149, time_taken_in_seconds: 74
Epoch [1/1], Step [5391/13804], Loss: 2.2736, Perplexity: 9.7146, time_taken_in_seconds: 74
Epoch [1/1], Step [5392/13804], Loss: 2.9599, Perplexity: 19.2961, time_taken_in_seconds: 75
Epoch [1/1], Step [5393/13804], Loss: 2.5333, Perplexity: 12.5951, time_taken_in_seconds: 76
Epoch [1/1], Step [5394/13804], Loss: 3.2207, Perplexity: 25.0447, time_taken_in_seconds: 77
Epoch [1/1], Step [5395/13804], Loss: 2.2240, Perplexity: 9.2444, time_taken_in_seconds: 78
Epoch [1/1], Step [5396/13804], Loss: 2.9352, Perplexity: 18.8259, time_taken_in_seconds: 78
Epoch [1/1], Step [5397/13804], Loss: 3.2281, Perplexity: 25.2323, time_taken_in_seconds: 79
Epoch [1/1], Step [5398/13804], Loss: 2.8423, Perplexity: 17.1558, time_taken_in_seconds: 80
Epoch [1/1], Step [5399/13804], Loss: 2.5966, Perplexity: 13.4184, time_taken_in_seconds: 81
Epoch [1/1], Step [5400/13804], Loss: 2.7444, Perplexity: 15.5557, time_taken_in_seconds: 82
Epoch [1/1], Step [5401/13804], Loss: 2.5472, Perplexity: 12.7716, time_taken_in_seconds: 0
Epoch [1/1], Step [5402/13804], Loss: 3.0016, Perplexity: 20.1176, time_taken_in_seconds: 1
Epoch [1/1], Step [5403/13804], Loss: 2.9362, Perplexity: 18.8447, time_taken_in_seconds: 2
Epoch [1/1], Step [5404/13804], Loss: 2.8048, Perplexity: 16.5242, time_taken_in_seconds: 3
Epoch [1/1], Step [5405/13804], Loss: 2.6208, Perplexity: 13.7463, time_taken_in_seconds: 4
Epoch [1/1], Step [5406/13804], Loss: 2.7742, Perplexity: 16.0265, time_taken_in_seconds: 4
Epoch [1/1], Step [5407/13804], Loss: 2.7860, Perplexity: 16.2158, time_taken_in_seconds: 5
Epoch [1/1], Step [5408/13804], Loss: 2.5937, Perplexity: 13.3794, time_taken_in_seconds: 6
Epoch [1/1], Step [5409/13804], Loss: 3.2071, Perplexity: 24.7085, time_taken_in_seconds: 7
Epoch [1/1], Step [5410/13804], Loss: 2.6108, Perplexity: 13.6103, time_taken_in_seconds: 8
Epoch [1/1], Step [5411/13804], Loss: 2.8448, Perplexity: 17.1974, time_taken_in_seconds: 9
Epoch [1/1], Step [5412/13804], Loss: 2.8744, Perplexity: 17.7140, time_taken_in_seconds: 9
Epoch [1/1], Step [5413/13804], Loss: 2.7067, Perplexity: 14.9791, time_taken_in_seconds: 10
Epoch [1/1], Step [5414/13804], Loss: 2.8653, Perplexity: 17.5551, time_taken_in_seconds: 11
Epoch [1/1], Step [5415/13804], Loss: 2.8079, Perplexity: 16.5754, time_taken_in_seconds: 12
Epoch [1/1], Step [5416/13804], Loss: 2.6854, Perplexity: 14.6643, time_taken_in_seconds: 13
Epoch [1/1], Step [5417/13804], Loss: 3.1759, Perplexity: 23.9485, time_taken_in_seconds: 14
Epoch [1/1], Step [5418/13804], Loss: 2.8410, Perplexity: 17.1333, time_taken_in_seconds: 14
Epoch [1/1], Step [5419/13804], Loss: 3.7218, Perplexity: 41.3368, time_taken_in_seconds: 15
Epoch [1/1], Step [5420/13804], Loss: 2.8626, Perplexity: 17.5077, time_taken_in_seconds: 16
Epoch [1/1], Step [5421/13804], Loss: 2.6744, Perplexity: 14.5041, time_taken_in_seconds: 17
Epoch [1/1], Step [5422/13804], Loss: 2.8486, Perplexity: 17.2630, time_taken_in_seconds: 18
Epoch [1/1], Step [5423/13804], Loss: 2.9206, Perplexity: 18.5522, time_taken_in_seconds: 19
Epoch [1/1], Step [5424/13804], Loss: 3.1354, Perplexity: 22.9971, time_taken_in_seconds: 19
Epoch [1/1], Step [5425/13804], Loss: 2.3843, Perplexity: 10.8512, time_taken_in_seconds: 20
Epoch [1/1], Step [5426/13804], Loss: 2.8351, Perplexity: 17.0323, time_taken_in_seconds: 21
Epoch [1/1], Step [5427/13804], Loss: 2.5899, Perplexity: 13.3289, time_taken_in_seconds: 22
Epoch [1/1], Step [5428/13804], Loss: 2.5589, Perplexity: 12.9219, time_taken_in_seconds: 23
Epoch [1/1], Step [5429/13804], Loss: 3.1175, Perplexity: 22.5896, time_taken_in_seconds: 24
Epoch [1/1], Step [5430/13804], Loss: 2.6137, Perplexity: 13.6496, time_taken_in_seconds: 24
Epoch [1/1], Step [5431/13804], Loss: 2.6951, Perplexity: 14.8075, time_taken_in_seconds: 25
Epoch [1/1], Step [5432/13804], Loss: 3.0737, Perplexity: 21.6224, time_taken_in_seconds: 26
Epoch [1/1], Step [5433/13804], Loss: 3.2506, Perplexity: 25.8057, time_taken_in_seconds: 27
Epoch [1/1], Step [5434/13804], Loss: 2.6185, Perplexity: 13.7147, time_taken_in_seconds: 28
Epoch [1/1], Step [5435/13804], Loss: 2.8422, Perplexity: 17.1537, time_taken_in_seconds: 28
Epoch [1/1], Step [5436/13804], Loss: 2.5127, Perplexity: 12.3385, time_taken_in_seconds: 29
Epoch [1/1], Step [5437/13804], Loss: 2.5291, Perplexity: 12.5417, time_taken_in_seconds: 30
Epoch [1/1], Step [5438/13804], Loss: 2.9159, Perplexity: 18.4660, time_taken_in_seconds: 31
Epoch [1/1], Step [5439/13804], Loss: 3.5449, Perplexity: 34.6374, time_taken_in_seconds: 32
Epoch [1/1], Step [5440/13804], Loss: 3.0082, Perplexity: 20.2503, time_taken_in_seconds: 33
Epoch [1/1], Step [5441/13804], Loss: 3.0453, Perplexity: 21.0161, time_taken_in_seconds: 34
Epoch [1/1], Step [5442/13804], Loss: 2.4540, Perplexity: 11.6351, time_taken_in_seconds: 34
Epoch [1/1], Step [5443/13804], Loss: 2.6993, Perplexity: 14.8698, time_taken_in_seconds: 35
Epoch [1/1], Step [5444/13804], Loss: 2.6405, Perplexity: 14.0203, time_taken_in_seconds: 36
Epoch [1/1], Step [5445/13804], Loss: 3.7808, Perplexity: 43.8513, time_taken_in_seconds: 37
Epoch [1/1], Step [5446/13804], Loss: 2.8117, Perplexity: 16.6379, time_taken_in_seconds: 38
Epoch [1/1], Step [5447/13804], Loss: 2.7733, Perplexity: 16.0118, time_taken_in_seconds: 39
Epoch [1/1], Step [5448/13804], Loss: 2.7265, Perplexity: 15.2787, time_taken_in_seconds: 39
Epoch [1/1], Step [5449/13804], Loss: 2.6370, Perplexity: 13.9708, time_taken_in_seconds: 40
Epoch [1/1], Step [5450/13804], Loss: 3.0165, Perplexity: 20.4207, time_taken_in_seconds: 41
Epoch [1/1], Step [5451/13804], Loss: 2.5067, Perplexity: 12.2641, time_taken_in_seconds: 42
Epoch [1/1], Step [5452/13804], Loss: 2.4993, Perplexity: 12.1738, time_taken_in_seconds: 43
Epoch [1/1], Step [5453/13804], Loss: 2.4406, Perplexity: 11.4799, time_taken_in_seconds: 44
Epoch [1/1], Step [5454/13804], Loss: 2.6641, Perplexity: 14.3549, time_taken_in_seconds: 44
Epoch [1/1], Step [5455/13804], Loss: 2.8826, Perplexity: 17.8613, time_taken_in_seconds: 45
Epoch [1/1], Step [5456/13804], Loss: 2.4292, Perplexity: 11.3493, time_taken_in_seconds: 46
Epoch [1/1], Step [5457/13804], Loss: 2.7363, Perplexity: 15.4303, time_taken_in_seconds: 47
Epoch [1/1], Step [5458/13804], Loss: 2.9826, Perplexity: 19.7398, time_taken_in_seconds: 48
Epoch [1/1], Step [5459/13804], Loss: 2.7149, Perplexity: 15.1025, time_taken_in_seconds: 48
Epoch [1/1], Step [5460/13804], Loss: 2.5590, Perplexity: 12.9224, time_taken_in_seconds: 49
Epoch [1/1], Step [5461/13804], Loss: 2.8578, Perplexity: 17.4237, time_taken_in_seconds: 50
Epoch [1/1], Step [5462/13804], Loss: 2.7886, Perplexity: 16.2586, time_taken_in_seconds: 51
Epoch [1/1], Step [5463/13804], Loss: 2.7474, Perplexity: 15.6015, time_taken_in_seconds: 52
Epoch [1/1], Step [5464/13804], Loss: 2.4110, Perplexity: 11.1448, time_taken_in_seconds: 53
Epoch [1/1], Step [5465/13804], Loss: 2.7709, Perplexity: 15.9736, time_taken_in_seconds: 53
Epoch [1/1], Step [5466/13804], Loss: 2.2176, Perplexity: 9.1854, time_taken_in_seconds: 54
Epoch [1/1], Step [5467/13804], Loss: 2.5818, Perplexity: 13.2203, time_taken_in_seconds: 55
Epoch [1/1], Step [5468/13804], Loss: 2.5842, Perplexity: 13.2527, time_taken_in_seconds: 56
Epoch [1/1], Step [5469/13804], Loss: 2.8549, Perplexity: 17.3719, time_taken_in_seconds: 57
Epoch [1/1], Step [5470/13804], Loss: 2.9825, Perplexity: 19.7372, time_taken_in_seconds: 58
Epoch [1/1], Step [5471/13804], Loss: 2.7576, Perplexity: 15.7612, time_taken_in_seconds: 58
Epoch [1/1], Step [5472/13804], Loss: 2.8842, Perplexity: 17.8891, time_taken_in_seconds: 59
Epoch [1/1], Step [5473/13804], Loss: 3.2586, Perplexity: 26.0138, time_taken_in_seconds: 60
Epoch [1/1], Step [5474/13804], Loss: 2.6087, Perplexity: 13.5814, time_taken_in_seconds: 61
Epoch [1/1], Step [5475/13804], Loss: 2.5868, Perplexity: 13.2875, time_taken_in_seconds: 62
Epoch [1/1], Step [5476/13804], Loss: 2.7056, Perplexity: 14.9633, time_taken_in_seconds: 62
Epoch [1/1], Step [5477/13804], Loss: 2.5729, Perplexity: 13.1039, time_taken_in_seconds: 63
Epoch [1/1], Step [5478/13804], Loss: 2.3891, Perplexity: 10.9035, time_taken_in_seconds: 64
Epoch [1/1], Step [5479/13804], Loss: 2.4362, Perplexity: 11.4299, time_taken_in_seconds: 65
Epoch [1/1], Step [5480/13804], Loss: 2.2663, Perplexity: 9.6434, time_taken_in_seconds: 66
Epoch [1/1], Step [5481/13804], Loss: 2.7059, Perplexity: 14.9679, time_taken_in_seconds: 66
Epoch [1/1], Step [5482/13804], Loss: 2.7589, Perplexity: 15.7828, time_taken_in_seconds: 67
Epoch [1/1], Step [5483/13804], Loss: 2.8318, Perplexity: 16.9754, time_taken_in_seconds: 68
Epoch [1/1], Step [5484/13804], Loss: 2.5428, Perplexity: 12.7150, time_taken_in_seconds: 69
Epoch [1/1], Step [5485/13804], Loss: 2.6212, Perplexity: 13.7528, time_taken_in_seconds: 70
Epoch [1/1], Step [5486/13804], Loss: 2.5798, Perplexity: 13.1945, time_taken_in_seconds: 71
Epoch [1/1], Step [5487/13804], Loss: 2.9511, Perplexity: 19.1278, time_taken_in_seconds: 71
Epoch [1/1], Step [5488/13804], Loss: 2.7786, Perplexity: 16.0969, time_taken_in_seconds: 72
Epoch [1/1], Step [5489/13804], Loss: 2.9408, Perplexity: 18.9306, time_taken_in_seconds: 73
Epoch [1/1], Step [5490/13804], Loss: 2.6269, Perplexity: 13.8315, time_taken_in_seconds: 74
Epoch [1/1], Step [5491/13804], Loss: 2.8364, Perplexity: 17.0544, time_taken_in_seconds: 75
Epoch [1/1], Step [5492/13804], Loss: 2.4529, Perplexity: 11.6220, time_taken_in_seconds: 76
Epoch [1/1], Step [5493/13804], Loss: 2.7276, Perplexity: 15.2967, time_taken_in_seconds: 76
Epoch [1/1], Step [5494/13804], Loss: 3.0275, Perplexity: 20.6466, time_taken_in_seconds: 77
Epoch [1/1], Step [5495/13804], Loss: 2.4889, Perplexity: 12.0486, time_taken_in_seconds: 78
Epoch [1/1], Step [5496/13804], Loss: 2.3991, Perplexity: 11.0128, time_taken_in_seconds: 79
Epoch [1/1], Step [5497/13804], Loss: 2.7330, Perplexity: 15.3793, time_taken_in_seconds: 80
Epoch [1/1], Step [5498/13804], Loss: 2.7732, Perplexity: 16.0092, time_taken_in_seconds: 81
Epoch [1/1], Step [5499/13804], Loss: 2.6712, Perplexity: 14.4577, time_taken_in_seconds: 81
Epoch [1/1], Step [5500/13804], Loss: 2.9130, Perplexity: 18.4119, time_taken_in_seconds: 82
Epoch [1/1], Step [5501/13804], Loss: 2.6776, Perplexity: 14.5496, time_taken_in_seconds: 0
Epoch [1/1], Step [5502/13804], Loss: 3.1670, Perplexity: 23.7357, time_taken_in_seconds: 1
Epoch [1/1], Step [5503/13804], Loss: 2.5978, Perplexity: 13.4348, time_taken_in_seconds: 2
Epoch [1/1], Step [5504/13804], Loss: 2.4445, Perplexity: 11.5243, time_taken_in_seconds: 3
Epoch [1/1], Step [5505/13804], Loss: 2.6741, Perplexity: 14.4990, time_taken_in_seconds: 4
Epoch [1/1], Step [5506/13804], Loss: 2.7023, Perplexity: 14.9135, time_taken_in_seconds: 4
Epoch [1/1], Step [5507/13804], Loss: 2.6652, Perplexity: 14.3715, time_taken_in_seconds: 5
Epoch [1/1], Step [5508/13804], Loss: 2.9846, Perplexity: 19.7784, time_taken_in_seconds: 6
Epoch [1/1], Step [5509/13804], Loss: 2.8823, Perplexity: 17.8556, time_taken_in_seconds: 7
Epoch [1/1], Step [5510/13804], Loss: 2.8369, Perplexity: 17.0626, time_taken_in_seconds: 8
Epoch [1/1], Step [5511/13804], Loss: 2.8115, Perplexity: 16.6345, time_taken_in_seconds: 9
Epoch [1/1], Step [5512/13804], Loss: 2.8084, Perplexity: 16.5833, time_taken_in_seconds: 10
Epoch [1/1], Step [5513/13804], Loss: 2.4499, Perplexity: 11.5875, time_taken_in_seconds: 10
Epoch [1/1], Step [5514/13804], Loss: 2.7079, Perplexity: 14.9975, time_taken_in_seconds: 11
Epoch [1/1], Step [5515/13804], Loss: 2.5242, Perplexity: 12.4804, time_taken_in_seconds: 12
Epoch [1/1], Step [5516/13804], Loss: 2.3609, Perplexity: 10.6001, time_taken_in_seconds: 13
Epoch [1/1], Step [5517/13804], Loss: 2.8854, Perplexity: 17.9100, time_taken_in_seconds: 14
Epoch [1/1], Step [5518/13804], Loss: 2.4901, Perplexity: 12.0626, time_taken_in_seconds: 15
Epoch [1/1], Step [5519/13804], Loss: 2.6436, Perplexity: 14.0636, time_taken_in_seconds: 15
Epoch [1/1], Step [5520/13804], Loss: 2.6510, Perplexity: 14.1683, time_taken_in_seconds: 16
Epoch [1/1], Step [5521/13804], Loss: 2.7060, Perplexity: 14.9698, time_taken_in_seconds: 17
Epoch [1/1], Step [5522/13804], Loss: 2.5764, Perplexity: 13.1494, time_taken_in_seconds: 18
Epoch [1/1], Step [5523/13804], Loss: 2.4591, Perplexity: 11.6940, time_taken_in_seconds: 19
Epoch [1/1], Step [5524/13804], Loss: 2.4655, Perplexity: 11.7695, time_taken_in_seconds: 19
Epoch [1/1], Step [5525/13804], Loss: 2.7366, Perplexity: 15.4345, time_taken_in_seconds: 20
Epoch [1/1], Step [5526/13804], Loss: 2.6780, Perplexity: 14.5565, time_taken_in_seconds: 21
Epoch [1/1], Step [5527/13804], Loss: 2.8709, Perplexity: 17.6537, time_taken_in_seconds: 22
Epoch [1/1], Step [5528/13804], Loss: 2.2831, Perplexity: 9.8074, time_taken_in_seconds: 23
Epoch [1/1], Step [5529/13804], Loss: 2.4930, Perplexity: 12.0977, time_taken_in_seconds: 24
Epoch [1/1], Step [5530/13804], Loss: 2.8141, Perplexity: 16.6783, time_taken_in_seconds: 24
Epoch [1/1], Step [5531/13804], Loss: 2.4574, Perplexity: 11.6747, time_taken_in_seconds: 25
Epoch [1/1], Step [5532/13804], Loss: 2.3291, Perplexity: 10.2685, time_taken_in_seconds: 26
Epoch [1/1], Step [5533/13804], Loss: 2.7608, Perplexity: 15.8118, time_taken_in_seconds: 27
Epoch [1/1], Step [5534/13804], Loss: 2.7826, Perplexity: 16.1615, time_taken_in_seconds: 28
Epoch [1/1], Step [5535/13804], Loss: 2.6285, Perplexity: 13.8530, time_taken_in_seconds: 29
Epoch [1/1], Step [5536/13804], Loss: 2.7812, Perplexity: 16.1380, time_taken_in_seconds: 29
Epoch [1/1], Step [5537/13804], Loss: 2.6418, Perplexity: 14.0382, time_taken_in_seconds: 30
Epoch [1/1], Step [5538/13804], Loss: 2.4344, Perplexity: 11.4086, time_taken_in_seconds: 31
Epoch [1/1], Step [5539/13804], Loss: 2.8177, Perplexity: 16.7381, time_taken_in_seconds: 32
Epoch [1/1], Step [5540/13804], Loss: 2.5824, Perplexity: 13.2285, time_taken_in_seconds: 33
Epoch [1/1], Step [5541/13804], Loss: 2.8796, Perplexity: 17.8067, time_taken_in_seconds: 34
Epoch [1/1], Step [5542/13804], Loss: 2.4572, Perplexity: 11.6725, time_taken_in_seconds: 34
Epoch [1/1], Step [5543/13804], Loss: 2.5445, Perplexity: 12.7367, time_taken_in_seconds: 35
Epoch [1/1], Step [5544/13804], Loss: 2.7727, Perplexity: 16.0010, time_taken_in_seconds: 36
Epoch [1/1], Step [5545/13804], Loss: 2.6888, Perplexity: 14.7143, time_taken_in_seconds: 37
Epoch [1/1], Step [5546/13804], Loss: 2.3185, Perplexity: 10.1600, time_taken_in_seconds: 38
Epoch [1/1], Step [5547/13804], Loss: 2.7038, Perplexity: 14.9370, time_taken_in_seconds: 38
Epoch [1/1], Step [5548/13804], Loss: 2.6476, Perplexity: 14.1196, time_taken_in_seconds: 39
Epoch [1/1], Step [5549/13804], Loss: 2.5968, Perplexity: 13.4213, time_taken_in_seconds: 40
Epoch [1/1], Step [5550/13804], Loss: 2.4051, Perplexity: 11.0798, time_taken_in_seconds: 41
Epoch [1/1], Step [5551/13804], Loss: 2.8104, Perplexity: 16.6168, time_taken_in_seconds: 42
Epoch [1/1], Step [5552/13804], Loss: 2.8022, Perplexity: 16.4801, time_taken_in_seconds: 43
Epoch [1/1], Step [5553/13804], Loss: 2.9407, Perplexity: 18.9292, time_taken_in_seconds: 43
Epoch [1/1], Step [5554/13804], Loss: 2.7390, Perplexity: 15.4717, time_taken_in_seconds: 44
Epoch [1/1], Step [5555/13804], Loss: 2.7235, Perplexity: 15.2341, time_taken_in_seconds: 45
Epoch [1/1], Step [5556/13804], Loss: 2.8417, Perplexity: 17.1447, time_taken_in_seconds: 46
Epoch [1/1], Step [5557/13804], Loss: 2.4553, Perplexity: 11.6497, time_taken_in_seconds: 47
Epoch [1/1], Step [5558/13804], Loss: 2.6709, Perplexity: 14.4533, time_taken_in_seconds: 48
Epoch [1/1], Step [5559/13804], Loss: 2.6005, Perplexity: 13.4704, time_taken_in_seconds: 48
Epoch [1/1], Step [5560/13804], Loss: 2.6401, Perplexity: 14.0150, time_taken_in_seconds: 49
Epoch [1/1], Step [5561/13804], Loss: 2.3340, Perplexity: 10.3190, time_taken_in_seconds: 50
Epoch [1/1], Step [5562/13804], Loss: 2.6557, Perplexity: 14.2350, time_taken_in_seconds: 51
Epoch [1/1], Step [5563/13804], Loss: 2.9928, Perplexity: 19.9408, time_taken_in_seconds: 52
Epoch [1/1], Step [5564/13804], Loss: 2.9501, Perplexity: 19.1075, time_taken_in_seconds: 52
Epoch [1/1], Step [5565/13804], Loss: 3.0106, Perplexity: 20.3005, time_taken_in_seconds: 53
Epoch [1/1], Step [5566/13804], Loss: 2.2931, Perplexity: 9.9053, time_taken_in_seconds: 54
Epoch [1/1], Step [5567/13804], Loss: 2.6434, Perplexity: 14.0611, time_taken_in_seconds: 55
Epoch [1/1], Step [5568/13804], Loss: 2.4238, Perplexity: 11.2882, time_taken_in_seconds: 56
Epoch [1/1], Step [5569/13804], Loss: 2.3839, Perplexity: 10.8466, time_taken_in_seconds: 57
Epoch [1/1], Step [5570/13804], Loss: 2.7497, Perplexity: 15.6376, time_taken_in_seconds: 57
Epoch [1/1], Step [5571/13804], Loss: 2.3085, Perplexity: 10.0596, time_taken_in_seconds: 58
Epoch [1/1], Step [5572/13804], Loss: 3.1955, Perplexity: 24.4225, time_taken_in_seconds: 59
Epoch [1/1], Step [5573/13804], Loss: 2.7013, Perplexity: 14.8997, time_taken_in_seconds: 60
Epoch [1/1], Step [5574/13804], Loss: 2.4895, Perplexity: 12.0558, time_taken_in_seconds: 61
Epoch [1/1], Step [5575/13804], Loss: 2.5352, Perplexity: 12.6195, time_taken_in_seconds: 62
Epoch [1/1], Step [5576/13804], Loss: 2.8196, Perplexity: 16.7699, time_taken_in_seconds: 62
Epoch [1/1], Step [5577/13804], Loss: 2.6156, Perplexity: 13.6747, time_taken_in_seconds: 63
Epoch [1/1], Step [5578/13804], Loss: 2.4983, Perplexity: 12.1614, time_taken_in_seconds: 64
Epoch [1/1], Step [5579/13804], Loss: 3.4163, Perplexity: 30.4556, time_taken_in_seconds: 65
Epoch [1/1], Step [5580/13804], Loss: 2.6477, Perplexity: 14.1219, time_taken_in_seconds: 66
Epoch [1/1], Step [5581/13804], Loss: 2.7687, Perplexity: 15.9377, time_taken_in_seconds: 67
Epoch [1/1], Step [5582/13804], Loss: 2.9012, Perplexity: 18.1954, time_taken_in_seconds: 67
Epoch [1/1], Step [5583/13804], Loss: 2.8371, Perplexity: 17.0665, time_taken_in_seconds: 69
Epoch [1/1], Step [5584/13804], Loss: 2.7615, Perplexity: 15.8228, time_taken_in_seconds: 69
Epoch [1/1], Step [5585/13804], Loss: 2.8041, Perplexity: 16.5122, time_taken_in_seconds: 70
Epoch [1/1], Step [5586/13804], Loss: 2.7950, Perplexity: 16.3630, time_taken_in_seconds: 71
Epoch [1/1], Step [5587/13804], Loss: 2.4088, Perplexity: 11.1206, time_taken_in_seconds: 72
Epoch [1/1], Step [5588/13804], Loss: 2.3186, Perplexity: 10.1612, time_taken_in_seconds: 73
Epoch [1/1], Step [5589/13804], Loss: 2.6022, Perplexity: 13.4934, time_taken_in_seconds: 73
Epoch [1/1], Step [5590/13804], Loss: 2.7149, Perplexity: 15.1028, time_taken_in_seconds: 74
Epoch [1/1], Step [5591/13804], Loss: 2.4344, Perplexity: 11.4088, time_taken_in_seconds: 75
Epoch [1/1], Step [5592/13804], Loss: 2.6354, Perplexity: 13.9495, time_taken_in_seconds: 76
Epoch [1/1], Step [5593/13804], Loss: 2.6915, Perplexity: 14.7542, time_taken_in_seconds: 77
Epoch [1/1], Step [5594/13804], Loss: 2.6158, Perplexity: 13.6780, time_taken_in_seconds: 78
Epoch [1/1], Step [5595/13804], Loss: 3.1904, Perplexity: 24.2985, time_taken_in_seconds: 78
Epoch [1/1], Step [5596/13804], Loss: 2.7382, Perplexity: 15.4595, time_taken_in_seconds: 79
Epoch [1/1], Step [5597/13804], Loss: 2.8551, Perplexity: 17.3759, time_taken_in_seconds: 80
Epoch [1/1], Step [5598/13804], Loss: 2.8817, Perplexity: 17.8442, time_taken_in_seconds: 81
Epoch [1/1], Step [5599/13804], Loss: 2.4709, Perplexity: 11.8332, time_taken_in_seconds: 82
Epoch [1/1], Step [5600/13804], Loss: 2.7949, Perplexity: 16.3610, time_taken_in_seconds: 83
Epoch [1/1], Step [5601/13804], Loss: 2.5213, Perplexity: 12.4449, time_taken_in_seconds: 0
Epoch [1/1], Step [5602/13804], Loss: 2.8018, Perplexity: 16.4743, time_taken_in_seconds: 1
Epoch [1/1], Step [5603/13804], Loss: 2.7720, Perplexity: 15.9905, time_taken_in_seconds: 2
Epoch [1/1], Step [5604/13804], Loss: 2.8788, Perplexity: 17.7936, time_taken_in_seconds: 3
Epoch [1/1], Step [5605/13804], Loss: 2.5814, Perplexity: 13.2154, time_taken_in_seconds: 4
Epoch [1/1], Step [5606/13804], Loss: 2.1574, Perplexity: 8.6486, time_taken_in_seconds: 4
Epoch [1/1], Step [5607/13804], Loss: 2.8589, Perplexity: 17.4415, time_taken_in_seconds: 5
Epoch [1/1], Step [5608/13804], Loss: 3.0751, Perplexity: 21.6520, time_taken_in_seconds: 6
Epoch [1/1], Step [5609/13804], Loss: 3.0341, Perplexity: 20.7824, time_taken_in_seconds: 7
Epoch [1/1], Step [5610/13804], Loss: 2.7339, Perplexity: 15.3933, time_taken_in_seconds: 8
Epoch [1/1], Step [5611/13804], Loss: 2.3227, Perplexity: 10.2034, time_taken_in_seconds: 9
Epoch [1/1], Step [5612/13804], Loss: 2.7716, Perplexity: 15.9835, time_taken_in_seconds: 9
Epoch [1/1], Step [5613/13804], Loss: 3.4685, Perplexity: 32.0898, time_taken_in_seconds: 10
Epoch [1/1], Step [5614/13804], Loss: 2.9750, Perplexity: 19.5896, time_taken_in_seconds: 11
Epoch [1/1], Step [5615/13804], Loss: 2.6236, Perplexity: 13.7848, time_taken_in_seconds: 12
Epoch [1/1], Step [5616/13804], Loss: 2.7000, Perplexity: 14.8792, time_taken_in_seconds: 13
Epoch [1/1], Step [5617/13804], Loss: 2.6149, Perplexity: 13.6662, time_taken_in_seconds: 14
Epoch [1/1], Step [5618/13804], Loss: 2.7109, Perplexity: 15.0421, time_taken_in_seconds: 14
Epoch [1/1], Step [5619/13804], Loss: 2.5971, Perplexity: 13.4253, time_taken_in_seconds: 15
Epoch [1/1], Step [5620/13804], Loss: 3.2910, Perplexity: 26.8695, time_taken_in_seconds: 16
Epoch [1/1], Step [5621/13804], Loss: 2.7340, Perplexity: 15.3944, time_taken_in_seconds: 17
Epoch [1/1], Step [5622/13804], Loss: 2.9470, Perplexity: 19.0493, time_taken_in_seconds: 18
Epoch [1/1], Step [5623/13804], Loss: 3.0152, Perplexity: 20.3932, time_taken_in_seconds: 19
Epoch [1/1], Step [5624/13804], Loss: 2.7120, Perplexity: 15.0589, time_taken_in_seconds: 19
Epoch [1/1], Step [5625/13804], Loss: 2.5273, Perplexity: 12.5201, time_taken_in_seconds: 20
Epoch [1/1], Step [5626/13804], Loss: 2.6758, Perplexity: 14.5238, time_taken_in_seconds: 21
Epoch [1/1], Step [5627/13804], Loss: 3.5952, Perplexity: 36.4227, time_taken_in_seconds: 22
Epoch [1/1], Step [5628/13804], Loss: 2.9972, Perplexity: 20.0295, time_taken_in_seconds: 23
Epoch [1/1], Step [5629/13804], Loss: 2.7876, Perplexity: 16.2415, time_taken_in_seconds: 24
Epoch [1/1], Step [5630/13804], Loss: 2.5876, Perplexity: 13.2981, time_taken_in_seconds: 24
Epoch [1/1], Step [5631/13804], Loss: 2.4549, Perplexity: 11.6449, time_taken_in_seconds: 25
Epoch [1/1], Step [5632/13804], Loss: 2.2365, Perplexity: 9.3601, time_taken_in_seconds: 26
Epoch [1/1], Step [5633/13804], Loss: 2.5888, Perplexity: 13.3134, time_taken_in_seconds: 27
Epoch [1/1], Step [5634/13804], Loss: 2.5522, Perplexity: 12.8358, time_taken_in_seconds: 28
Epoch [1/1], Step [5635/13804], Loss: 3.4558, Perplexity: 31.6835, time_taken_in_seconds: 29
Epoch [1/1], Step [5636/13804], Loss: 2.2721, Perplexity: 9.6998, time_taken_in_seconds: 29
Epoch [1/1], Step [5637/13804], Loss: 2.8255, Perplexity: 16.8687, time_taken_in_seconds: 30
Epoch [1/1], Step [5638/13804], Loss: 2.5781, Perplexity: 13.1719, time_taken_in_seconds: 31
Epoch [1/1], Step [5639/13804], Loss: 2.6626, Perplexity: 14.3331, time_taken_in_seconds: 32
Epoch [1/1], Step [5640/13804], Loss: 2.7920, Perplexity: 16.3143, time_taken_in_seconds: 33
Epoch [1/1], Step [5641/13804], Loss: 2.5630, Perplexity: 12.9747, time_taken_in_seconds: 33
Epoch [1/1], Step [5642/13804], Loss: 2.8191, Perplexity: 16.7618, time_taken_in_seconds: 34
Epoch [1/1], Step [5643/13804], Loss: 2.5114, Perplexity: 12.3224, time_taken_in_seconds: 35
Epoch [1/1], Step [5644/13804], Loss: 2.5360, Perplexity: 12.6296, time_taken_in_seconds: 36
Epoch [1/1], Step [5645/13804], Loss: 2.3830, Perplexity: 10.8373, time_taken_in_seconds: 37
Epoch [1/1], Step [5646/13804], Loss: 2.5075, Perplexity: 12.2738, time_taken_in_seconds: 38
Epoch [1/1], Step [5647/13804], Loss: 2.3919, Perplexity: 10.9343, time_taken_in_seconds: 38
Epoch [1/1], Step [5648/13804], Loss: 2.8812, Perplexity: 17.8363, time_taken_in_seconds: 39
Epoch [1/1], Step [5649/13804], Loss: 2.7513, Perplexity: 15.6631, time_taken_in_seconds: 40
Epoch [1/1], Step [5650/13804], Loss: 2.2729, Perplexity: 9.7078, time_taken_in_seconds: 41
Epoch [1/1], Step [5651/13804], Loss: 2.7596, Perplexity: 15.7933, time_taken_in_seconds: 42
Epoch [1/1], Step [5652/13804], Loss: 2.3352, Perplexity: 10.3311, time_taken_in_seconds: 43
Epoch [1/1], Step [5653/13804], Loss: 2.7430, Perplexity: 15.5338, time_taken_in_seconds: 44
Epoch [1/1], Step [5654/13804], Loss: 2.5316, Perplexity: 12.5741, time_taken_in_seconds: 44
Epoch [1/1], Step [5655/13804], Loss: 2.6055, Perplexity: 13.5380, time_taken_in_seconds: 45
Epoch [1/1], Step [5656/13804], Loss: 2.7697, Perplexity: 15.9534, time_taken_in_seconds: 46
Epoch [1/1], Step [5657/13804], Loss: 2.3972, Perplexity: 10.9919, time_taken_in_seconds: 47
Epoch [1/1], Step [5658/13804], Loss: 3.2063, Perplexity: 24.6874, time_taken_in_seconds: 48
Epoch [1/1], Step [5659/13804], Loss: 2.3007, Perplexity: 9.9809, time_taken_in_seconds: 49
Epoch [1/1], Step [5660/13804], Loss: 3.0424, Perplexity: 20.9562, time_taken_in_seconds: 50
Epoch [1/1], Step [5661/13804], Loss: 3.8503, Perplexity: 47.0054, time_taken_in_seconds: 50
Epoch [1/1], Step [5662/13804], Loss: 2.7933, Perplexity: 16.3343, time_taken_in_seconds: 51
Epoch [1/1], Step [5663/13804], Loss: 2.8925, Perplexity: 18.0383, time_taken_in_seconds: 52
Epoch [1/1], Step [5664/13804], Loss: 2.7144, Perplexity: 15.0948, time_taken_in_seconds: 53
Epoch [1/1], Step [5665/13804], Loss: 2.8720, Perplexity: 17.6723, time_taken_in_seconds: 54
Epoch [1/1], Step [5666/13804], Loss: 2.8401, Perplexity: 17.1177, time_taken_in_seconds: 54
Epoch [1/1], Step [5667/13804], Loss: 2.5076, Perplexity: 12.2752, time_taken_in_seconds: 55
Epoch [1/1], Step [5668/13804], Loss: 2.6335, Perplexity: 13.9225, time_taken_in_seconds: 56
Epoch [1/1], Step [5669/13804], Loss: 2.5094, Perplexity: 12.2981, time_taken_in_seconds: 57
Epoch [1/1], Step [5670/13804], Loss: 2.8158, Perplexity: 16.7068, time_taken_in_seconds: 58
Epoch [1/1], Step [5671/13804], Loss: 2.6176, Perplexity: 13.7033, time_taken_in_seconds: 59
Epoch [1/1], Step [5672/13804], Loss: 2.7512, Perplexity: 15.6617, time_taken_in_seconds: 59
Epoch [1/1], Step [5673/13804], Loss: 2.8011, Perplexity: 16.4620, time_taken_in_seconds: 60
Epoch [1/1], Step [5674/13804], Loss: 2.5773, Perplexity: 13.1618, time_taken_in_seconds: 61
Epoch [1/1], Step [5675/13804], Loss: 2.6089, Perplexity: 13.5845, time_taken_in_seconds: 62
Epoch [1/1], Step [5676/13804], Loss: 2.4012, Perplexity: 11.0359, time_taken_in_seconds: 63
Epoch [1/1], Step [5677/13804], Loss: 3.5633, Perplexity: 35.2811, time_taken_in_seconds: 64
Epoch [1/1], Step [5678/13804], Loss: 2.6242, Perplexity: 13.7937, time_taken_in_seconds: 64
Epoch [1/1], Step [5679/13804], Loss: 2.7225, Perplexity: 15.2176, time_taken_in_seconds: 65
Epoch [1/1], Step [5680/13804], Loss: 2.6818, Perplexity: 14.6110, time_taken_in_seconds: 66
Epoch [1/1], Step [5681/13804], Loss: 2.5502, Perplexity: 12.8094, time_taken_in_seconds: 67
Epoch [1/1], Step [5682/13804], Loss: 2.6997, Perplexity: 14.8750, time_taken_in_seconds: 68
Epoch [1/1], Step [5683/13804], Loss: 2.5887, Perplexity: 13.3119, time_taken_in_seconds: 69
Epoch [1/1], Step [5684/13804], Loss: 2.6402, Perplexity: 14.0165, time_taken_in_seconds: 69
Epoch [1/1], Step [5685/13804], Loss: 2.4977, Perplexity: 12.1547, time_taken_in_seconds: 70
Epoch [1/1], Step [5686/13804], Loss: 2.7632, Perplexity: 15.8498, time_taken_in_seconds: 71
Epoch [1/1], Step [5687/13804], Loss: 3.0566, Perplexity: 21.2549, time_taken_in_seconds: 72
Epoch [1/1], Step [5688/13804], Loss: 2.7637, Perplexity: 15.8582, time_taken_in_seconds: 73
Epoch [1/1], Step [5689/13804], Loss: 2.9063, Perplexity: 18.2893, time_taken_in_seconds: 73
Epoch [1/1], Step [5690/13804], Loss: 3.1493, Perplexity: 23.3194, time_taken_in_seconds: 74
Epoch [1/1], Step [5691/13804], Loss: 2.5800, Perplexity: 13.1977, time_taken_in_seconds: 75
Epoch [1/1], Step [5692/13804], Loss: 3.0036, Perplexity: 20.1571, time_taken_in_seconds: 76
Epoch [1/1], Step [5693/13804], Loss: 2.9901, Perplexity: 19.8869, time_taken_in_seconds: 77
Epoch [1/1], Step [5694/13804], Loss: 2.7023, Perplexity: 14.9135, time_taken_in_seconds: 78
Epoch [1/1], Step [5695/13804], Loss: 2.6381, Perplexity: 13.9861, time_taken_in_seconds: 78
Epoch [1/1], Step [5696/13804], Loss: 2.3777, Perplexity: 10.7796, time_taken_in_seconds: 79
Epoch [1/1], Step [5697/13804], Loss: 2.7966, Perplexity: 16.3896, time_taken_in_seconds: 80
Epoch [1/1], Step [5698/13804], Loss: 2.4072, Perplexity: 11.1030, time_taken_in_seconds: 81
Epoch [1/1], Step [5699/13804], Loss: 3.2928, Perplexity: 26.9178, time_taken_in_seconds: 82
Epoch [1/1], Step [5700/13804], Loss: 2.4833, Perplexity: 11.9803, time_taken_in_seconds: 83
Epoch [1/1], Step [5701/13804], Loss: 2.6103, Perplexity: 13.6027, time_taken_in_seconds: 0
Epoch [1/1], Step [5702/13804], Loss: 2.4972, Perplexity: 12.1480, time_taken_in_seconds: 1
Epoch [1/1], Step [5703/13804], Loss: 2.5277, Perplexity: 12.5241, time_taken_in_seconds: 2
Epoch [1/1], Step [5704/13804], Loss: 2.4754, Perplexity: 11.8859, time_taken_in_seconds: 3
Epoch [1/1], Step [5705/13804], Loss: 2.9866, Perplexity: 19.8192, time_taken_in_seconds: 4
Epoch [1/1], Step [5706/13804], Loss: 2.4584, Perplexity: 11.6864, time_taken_in_seconds: 4
Epoch [1/1], Step [5707/13804], Loss: 2.9567, Perplexity: 19.2339, time_taken_in_seconds: 5
Epoch [1/1], Step [5708/13804], Loss: 2.8654, Perplexity: 17.5566, time_taken_in_seconds: 6
Epoch [1/1], Step [5709/13804], Loss: 2.6967, Perplexity: 14.8303, time_taken_in_seconds: 7
Epoch [1/1], Step [5710/13804], Loss: 2.5420, Perplexity: 12.7054, time_taken_in_seconds: 8
Epoch [1/1], Step [5711/13804], Loss: 2.7623, Perplexity: 15.8361, time_taken_in_seconds: 9
Epoch [1/1], Step [5712/13804], Loss: 2.3083, Perplexity: 10.0571, time_taken_in_seconds: 9
Epoch [1/1], Step [5713/13804], Loss: 2.5366, Perplexity: 12.6365, time_taken_in_seconds: 10
Epoch [1/1], Step [5714/13804], Loss: 3.2034, Perplexity: 24.6168, time_taken_in_seconds: 11
Epoch [1/1], Step [5715/13804], Loss: 2.9872, Perplexity: 19.8308, time_taken_in_seconds: 12
Epoch [1/1], Step [5716/13804], Loss: 2.3454, Perplexity: 10.4374, time_taken_in_seconds: 13
Epoch [1/1], Step [5717/13804], Loss: 3.0441, Perplexity: 20.9904, time_taken_in_seconds: 14
Epoch [1/1], Step [5718/13804], Loss: 2.5283, Perplexity: 12.5327, time_taken_in_seconds: 14
Epoch [1/1], Step [5719/13804], Loss: 2.5849, Perplexity: 13.2614, time_taken_in_seconds: 15
Epoch [1/1], Step [5720/13804], Loss: 2.8301, Perplexity: 16.9468, time_taken_in_seconds: 16
Epoch [1/1], Step [5721/13804], Loss: 2.5273, Perplexity: 12.5195, time_taken_in_seconds: 17
Epoch [1/1], Step [5722/13804], Loss: 2.6276, Perplexity: 13.8399, time_taken_in_seconds: 18
Epoch [1/1], Step [5723/13804], Loss: 3.1087, Perplexity: 22.3917, time_taken_in_seconds: 18
Epoch [1/1], Step [5724/13804], Loss: 2.5781, Perplexity: 13.1725, time_taken_in_seconds: 19
Epoch [1/1], Step [5725/13804], Loss: 2.4901, Perplexity: 12.0630, time_taken_in_seconds: 20
Epoch [1/1], Step [5726/13804], Loss: 2.3975, Perplexity: 10.9957, time_taken_in_seconds: 21
Epoch [1/1], Step [5727/13804], Loss: 2.5122, Perplexity: 12.3321, time_taken_in_seconds: 22
Epoch [1/1], Step [5728/13804], Loss: 2.6751, Perplexity: 14.5134, time_taken_in_seconds: 23
Epoch [1/1], Step [5729/13804], Loss: 2.8746, Perplexity: 17.7185, time_taken_in_seconds: 24
Epoch [1/1], Step [5730/13804], Loss: 2.3824, Perplexity: 10.8308, time_taken_in_seconds: 24
Epoch [1/1], Step [5731/13804], Loss: 2.6092, Perplexity: 13.5878, time_taken_in_seconds: 25
Epoch [1/1], Step [5732/13804], Loss: 2.4892, Perplexity: 12.0519, time_taken_in_seconds: 26
Epoch [1/1], Step [5733/13804], Loss: 2.6462, Perplexity: 14.1008, time_taken_in_seconds: 27
Epoch [1/1], Step [5734/13804], Loss: 2.1165, Perplexity: 8.3017, time_taken_in_seconds: 28
Epoch [1/1], Step [5735/13804], Loss: 3.8445, Perplexity: 46.7373, time_taken_in_seconds: 29
Epoch [1/1], Step [5736/13804], Loss: 2.9735, Perplexity: 19.5612, time_taken_in_seconds: 29
Epoch [1/1], Step [5737/13804], Loss: 2.4416, Perplexity: 11.4918, time_taken_in_seconds: 30
Epoch [1/1], Step [5738/13804], Loss: 2.8053, Perplexity: 16.5315, time_taken_in_seconds: 31
Epoch [1/1], Step [5739/13804], Loss: 2.7804, Perplexity: 16.1257, time_taken_in_seconds: 32
Epoch [1/1], Step [5740/13804], Loss: 2.8335, Perplexity: 17.0042, time_taken_in_seconds: 33
Epoch [1/1], Step [5741/13804], Loss: 2.7289, Perplexity: 15.3156, time_taken_in_seconds: 34
Epoch [1/1], Step [5742/13804], Loss: 2.9135, Perplexity: 18.4211, time_taken_in_seconds: 34
Epoch [1/1], Step [5743/13804], Loss: 2.2983, Perplexity: 9.9568, time_taken_in_seconds: 35
Epoch [1/1], Step [5744/13804], Loss: 2.6149, Perplexity: 13.6653, time_taken_in_seconds: 36
Epoch [1/1], Step [5745/13804], Loss: 2.5482, Perplexity: 12.7839, time_taken_in_seconds: 37
Epoch [1/1], Step [5746/13804], Loss: 2.7830, Perplexity: 16.1680, time_taken_in_seconds: 38
Epoch [1/1], Step [5747/13804], Loss: 2.6954, Perplexity: 14.8112, time_taken_in_seconds: 39
Epoch [1/1], Step [5748/13804], Loss: 2.8074, Perplexity: 16.5676, time_taken_in_seconds: 39
Epoch [1/1], Step [5749/13804], Loss: 2.4766, Perplexity: 11.9005, time_taken_in_seconds: 40
Epoch [1/1], Step [5750/13804], Loss: 2.5593, Perplexity: 12.9268, time_taken_in_seconds: 41
Epoch [1/1], Step [5751/13804], Loss: 2.5933, Perplexity: 13.3741, time_taken_in_seconds: 42
Epoch [1/1], Step [5752/13804], Loss: 2.4717, Perplexity: 11.8424, time_taken_in_seconds: 43
Epoch [1/1], Step [5753/13804], Loss: 2.8594, Perplexity: 17.4516, time_taken_in_seconds: 43
Epoch [1/1], Step [5754/13804], Loss: 2.2743, Perplexity: 9.7211, time_taken_in_seconds: 44
Epoch [1/1], Step [5755/13804], Loss: 2.4043, Perplexity: 11.0710, time_taken_in_seconds: 45
Epoch [1/1], Step [5756/13804], Loss: 2.8848, Perplexity: 17.9001, time_taken_in_seconds: 46
Epoch [1/1], Step [5757/13804], Loss: 2.5387, Perplexity: 12.6629, time_taken_in_seconds: 47
Epoch [1/1], Step [5758/13804], Loss: 2.7107, Perplexity: 15.0401, time_taken_in_seconds: 48
Epoch [1/1], Step [5759/13804], Loss: 2.5470, Perplexity: 12.7686, time_taken_in_seconds: 48
Epoch [1/1], Step [5760/13804], Loss: 2.7102, Perplexity: 15.0321, time_taken_in_seconds: 49
Epoch [1/1], Step [5761/13804], Loss: 2.4366, Perplexity: 11.4341, time_taken_in_seconds: 50
Epoch [1/1], Step [5762/13804], Loss: 2.6359, Perplexity: 13.9558, time_taken_in_seconds: 51
Epoch [1/1], Step [5763/13804], Loss: 2.7787, Perplexity: 16.0973, time_taken_in_seconds: 52
Epoch [1/1], Step [5764/13804], Loss: 2.7469, Perplexity: 15.5943, time_taken_in_seconds: 53
Epoch [1/1], Step [5765/13804], Loss: 2.3092, Perplexity: 10.0665, time_taken_in_seconds: 53
Epoch [1/1], Step [5766/13804], Loss: 2.8246, Perplexity: 16.8534, time_taken_in_seconds: 54
Epoch [1/1], Step [5767/13804], Loss: 2.4489, Perplexity: 11.5755, time_taken_in_seconds: 55
Epoch [1/1], Step [5768/13804], Loss: 2.4772, Perplexity: 11.9079, time_taken_in_seconds: 56
Epoch [1/1], Step [5769/13804], Loss: 2.6212, Perplexity: 13.7524, time_taken_in_seconds: 57
Epoch [1/1], Step [5770/13804], Loss: 2.4935, Perplexity: 12.1038, time_taken_in_seconds: 58
Epoch [1/1], Step [5771/13804], Loss: 2.7502, Perplexity: 15.6464, time_taken_in_seconds: 58
Epoch [1/1], Step [5772/13804], Loss: 2.8777, Perplexity: 17.7737, time_taken_in_seconds: 59
Epoch [1/1], Step [5773/13804], Loss: 2.5822, Perplexity: 13.2269, time_taken_in_seconds: 60
Epoch [1/1], Step [5774/13804], Loss: 2.4566, Perplexity: 11.6649, time_taken_in_seconds: 61
Epoch [1/1], Step [5775/13804], Loss: 2.6612, Perplexity: 14.3130, time_taken_in_seconds: 62
Epoch [1/1], Step [5776/13804], Loss: 2.3495, Perplexity: 10.4802, time_taken_in_seconds: 63
Epoch [1/1], Step [5777/13804], Loss: 2.6908, Perplexity: 14.7435, time_taken_in_seconds: 63
Epoch [1/1], Step [5778/13804], Loss: 2.6714, Perplexity: 14.4598, time_taken_in_seconds: 64
Epoch [1/1], Step [5779/13804], Loss: 2.6758, Perplexity: 14.5238, time_taken_in_seconds: 65
Epoch [1/1], Step [5780/13804], Loss: 2.7450, Perplexity: 15.5641, time_taken_in_seconds: 66
Epoch [1/1], Step [5781/13804], Loss: 3.1139, Perplexity: 22.5087, time_taken_in_seconds: 67
Epoch [1/1], Step [5782/13804], Loss: 2.5362, Perplexity: 12.6317, time_taken_in_seconds: 67
Epoch [1/1], Step [5783/13804], Loss: 2.8617, Perplexity: 17.4917, time_taken_in_seconds: 68
Epoch [1/1], Step [5784/13804], Loss: 2.7394, Perplexity: 15.4777, time_taken_in_seconds: 69
Epoch [1/1], Step [5785/13804], Loss: 2.7165, Perplexity: 15.1273, time_taken_in_seconds: 70
Epoch [1/1], Step [5786/13804], Loss: 2.6731, Perplexity: 14.4842, time_taken_in_seconds: 71
Epoch [1/1], Step [5787/13804], Loss: 2.4493, Perplexity: 11.5801, time_taken_in_seconds: 72
Epoch [1/1], Step [5788/13804], Loss: 2.6917, Perplexity: 14.7561, time_taken_in_seconds: 72
Epoch [1/1], Step [5789/13804], Loss: 2.6635, Perplexity: 14.3458, time_taken_in_seconds: 73
Epoch [1/1], Step [5790/13804], Loss: 2.4611, Perplexity: 11.7178, time_taken_in_seconds: 74
Epoch [1/1], Step [5791/13804], Loss: 2.7216, Perplexity: 15.2044, time_taken_in_seconds: 75
Epoch [1/1], Step [5792/13804], Loss: 2.5825, Perplexity: 13.2296, time_taken_in_seconds: 76
Epoch [1/1], Step [5793/13804], Loss: 2.7655, Perplexity: 15.8868, time_taken_in_seconds: 77
Epoch [1/1], Step [5794/13804], Loss: 2.8736, Perplexity: 17.7013, time_taken_in_seconds: 77
Epoch [1/1], Step [5795/13804], Loss: 2.7471, Perplexity: 15.5975, time_taken_in_seconds: 78
Epoch [1/1], Step [5796/13804], Loss: 2.7021, Perplexity: 14.9115, time_taken_in_seconds: 79
Epoch [1/1], Step [5797/13804], Loss: 2.4121, Perplexity: 11.1570, time_taken_in_seconds: 80
Epoch [1/1], Step [5798/13804], Loss: 3.1378, Perplexity: 23.0529, time_taken_in_seconds: 81
Epoch [1/1], Step [5799/13804], Loss: 2.4885, Perplexity: 12.0427, time_taken_in_seconds: 81
Epoch [1/1], Step [5800/13804], Loss: 2.5779, Perplexity: 13.1696, time_taken_in_seconds: 82
Epoch [1/1], Step [5801/13804], Loss: 2.3563, Perplexity: 10.5524, time_taken_in_seconds: 0
Epoch [1/1], Step [5802/13804], Loss: 2.5477, Perplexity: 12.7771, time_taken_in_seconds: 1
Epoch [1/1], Step [5803/13804], Loss: 2.5449, Perplexity: 12.7425, time_taken_in_seconds: 2
Epoch [1/1], Step [5804/13804], Loss: 2.6852, Perplexity: 14.6608, time_taken_in_seconds: 3
Epoch [1/1], Step [5805/13804], Loss: 2.4497, Perplexity: 11.5843, time_taken_in_seconds: 4
Epoch [1/1], Step [5806/13804], Loss: 2.5255, Perplexity: 12.4969, time_taken_in_seconds: 5
Epoch [1/1], Step [5807/13804], Loss: 2.4682, Perplexity: 11.8016, time_taken_in_seconds: 5
Epoch [1/1], Step [5808/13804], Loss: 2.7964, Perplexity: 16.3852, time_taken_in_seconds: 6
Epoch [1/1], Step [5809/13804], Loss: 2.6107, Perplexity: 13.6086, time_taken_in_seconds: 7
Epoch [1/1], Step [5810/13804], Loss: 2.4400, Perplexity: 11.4730, time_taken_in_seconds: 8
Epoch [1/1], Step [5811/13804], Loss: 2.6094, Perplexity: 13.5911, time_taken_in_seconds: 9
Epoch [1/1], Step [5812/13804], Loss: 2.9116, Perplexity: 18.3857, time_taken_in_seconds: 10
Epoch [1/1], Step [5813/13804], Loss: 2.4857, Perplexity: 12.0090, time_taken_in_seconds: 10
Epoch [1/1], Step [5814/13804], Loss: 2.3107, Perplexity: 10.0819, time_taken_in_seconds: 11
Epoch [1/1], Step [5815/13804], Loss: 2.5855, Perplexity: 13.2705, time_taken_in_seconds: 12
Epoch [1/1], Step [5816/13804], Loss: 2.4643, Perplexity: 11.7555, time_taken_in_seconds: 13
Epoch [1/1], Step [5817/13804], Loss: 2.6268, Perplexity: 13.8299, time_taken_in_seconds: 14
Epoch [1/1], Step [5818/13804], Loss: 2.7944, Perplexity: 16.3529, time_taken_in_seconds: 15
Epoch [1/1], Step [5819/13804], Loss: 2.1631, Perplexity: 8.6978, time_taken_in_seconds: 15
Epoch [1/1], Step [5820/13804], Loss: 2.7889, Perplexity: 16.2625, time_taken_in_seconds: 16
Epoch [1/1], Step [5821/13804], Loss: 2.7443, Perplexity: 15.5539, time_taken_in_seconds: 17
Epoch [1/1], Step [5822/13804], Loss: 3.2048, Perplexity: 24.6517, time_taken_in_seconds: 18
Epoch [1/1], Step [5823/13804], Loss: 2.4583, Perplexity: 11.6852, time_taken_in_seconds: 19
Epoch [1/1], Step [5824/13804], Loss: 2.7539, Perplexity: 15.7032, time_taken_in_seconds: 20
Epoch [1/1], Step [5825/13804], Loss: 2.5837, Perplexity: 13.2458, time_taken_in_seconds: 20
Epoch [1/1], Step [5826/13804], Loss: 2.5629, Perplexity: 12.9733, time_taken_in_seconds: 21
Epoch [1/1], Step [5827/13804], Loss: 2.6585, Perplexity: 14.2751, time_taken_in_seconds: 22
Epoch [1/1], Step [5828/13804], Loss: 2.7478, Perplexity: 15.6076, time_taken_in_seconds: 23
Epoch [1/1], Step [5829/13804], Loss: 2.7517, Perplexity: 15.6688, time_taken_in_seconds: 24
Epoch [1/1], Step [5830/13804], Loss: 2.5848, Perplexity: 13.2602, time_taken_in_seconds: 24
Epoch [1/1], Step [5831/13804], Loss: 2.7273, Perplexity: 15.2921, time_taken_in_seconds: 25
Epoch [1/1], Step [5832/13804], Loss: 2.2354, Perplexity: 9.3503, time_taken_in_seconds: 26
Epoch [1/1], Step [5833/13804], Loss: 2.3327, Perplexity: 10.3053, time_taken_in_seconds: 27
Epoch [1/1], Step [5834/13804], Loss: 2.9510, Perplexity: 19.1242, time_taken_in_seconds: 28
Epoch [1/1], Step [5835/13804], Loss: 2.6762, Perplexity: 14.5293, time_taken_in_seconds: 29
Epoch [1/1], Step [5836/13804], Loss: 2.9231, Perplexity: 18.5982, time_taken_in_seconds: 29
Epoch [1/1], Step [5837/13804], Loss: 2.8147, Perplexity: 16.6882, time_taken_in_seconds: 30
Epoch [1/1], Step [5838/13804], Loss: 2.6122, Perplexity: 13.6294, time_taken_in_seconds: 31
Epoch [1/1], Step [5839/13804], Loss: 2.8169, Perplexity: 16.7253, time_taken_in_seconds: 32
Epoch [1/1], Step [5840/13804], Loss: 2.4456, Perplexity: 11.5376, time_taken_in_seconds: 33
Epoch [1/1], Step [5841/13804], Loss: 2.8488, Perplexity: 17.2665, time_taken_in_seconds: 34
Epoch [1/1], Step [5842/13804], Loss: 2.5228, Perplexity: 12.4637, time_taken_in_seconds: 34
Epoch [1/1], Step [5843/13804], Loss: 2.8142, Perplexity: 16.6794, time_taken_in_seconds: 35
Epoch [1/1], Step [5844/13804], Loss: 3.1112, Perplexity: 22.4484, time_taken_in_seconds: 36
Epoch [1/1], Step [5845/13804], Loss: 2.6704, Perplexity: 14.4460, time_taken_in_seconds: 37
Epoch [1/1], Step [5846/13804], Loss: 2.2136, Perplexity: 9.1486, time_taken_in_seconds: 38
Epoch [1/1], Step [5847/13804], Loss: 2.4711, Perplexity: 11.8351, time_taken_in_seconds: 39
Epoch [1/1], Step [5848/13804], Loss: 2.7978, Perplexity: 16.4090, time_taken_in_seconds: 39
Epoch [1/1], Step [5849/13804], Loss: 3.0254, Perplexity: 20.6027, time_taken_in_seconds: 40
Epoch [1/1], Step [5850/13804], Loss: 2.7807, Perplexity: 16.1295, time_taken_in_seconds: 41
Epoch [1/1], Step [5851/13804], Loss: 2.7152, Perplexity: 15.1074, time_taken_in_seconds: 42
Epoch [1/1], Step [5852/13804], Loss: 2.5371, Perplexity: 12.6430, time_taken_in_seconds: 43
Epoch [1/1], Step [5853/13804], Loss: 2.8136, Perplexity: 16.6706, time_taken_in_seconds: 44
Epoch [1/1], Step [5854/13804], Loss: 2.9453, Perplexity: 19.0169, time_taken_in_seconds: 44
Epoch [1/1], Step [5855/13804], Loss: 2.5009, Perplexity: 12.1940, time_taken_in_seconds: 45
Epoch [1/1], Step [5856/13804], Loss: 2.7043, Perplexity: 14.9443, time_taken_in_seconds: 46
Epoch [1/1], Step [5857/13804], Loss: 3.3063, Perplexity: 27.2843, time_taken_in_seconds: 47
Epoch [1/1], Step [5858/13804], Loss: 2.5384, Perplexity: 12.6595, time_taken_in_seconds: 48
Epoch [1/1], Step [5859/13804], Loss: 2.4708, Perplexity: 11.8314, time_taken_in_seconds: 49
Epoch [1/1], Step [5860/13804], Loss: 2.6091, Perplexity: 13.5873, time_taken_in_seconds: 49
Epoch [1/1], Step [5861/13804], Loss: 2.5315, Perplexity: 12.5729, time_taken_in_seconds: 50
Epoch [1/1], Step [5862/13804], Loss: 2.8554, Perplexity: 17.3816, time_taken_in_seconds: 51
Epoch [1/1], Step [5863/13804], Loss: 2.5968, Perplexity: 13.4209, time_taken_in_seconds: 52
Epoch [1/1], Step [5864/13804], Loss: 2.6107, Perplexity: 13.6082, time_taken_in_seconds: 53
Epoch [1/1], Step [5865/13804], Loss: 2.5641, Perplexity: 12.9884, time_taken_in_seconds: 53
Epoch [1/1], Step [5866/13804], Loss: 2.7344, Perplexity: 15.4009, time_taken_in_seconds: 54
Epoch [1/1], Step [5867/13804], Loss: 2.6766, Perplexity: 14.5352, time_taken_in_seconds: 55
Epoch [1/1], Step [5868/13804], Loss: 2.9495, Perplexity: 19.0959, time_taken_in_seconds: 56
Epoch [1/1], Step [5869/13804], Loss: 2.8914, Perplexity: 18.0194, time_taken_in_seconds: 57
Epoch [1/1], Step [5870/13804], Loss: 2.5283, Perplexity: 12.5319, time_taken_in_seconds: 58
Epoch [1/1], Step [5871/13804], Loss: 2.3859, Perplexity: 10.8693, time_taken_in_seconds: 58
Epoch [1/1], Step [5872/13804], Loss: 2.7629, Perplexity: 15.8450, time_taken_in_seconds: 59
Epoch [1/1], Step [5873/13804], Loss: 2.6966, Perplexity: 14.8289, time_taken_in_seconds: 60
Epoch [1/1], Step [5874/13804], Loss: 2.6377, Perplexity: 13.9808, time_taken_in_seconds: 61
Epoch [1/1], Step [5875/13804], Loss: 2.7453, Perplexity: 15.5689, time_taken_in_seconds: 62
Epoch [1/1], Step [5876/13804], Loss: 2.4269, Perplexity: 11.3233, time_taken_in_seconds: 63
Epoch [1/1], Step [5877/13804], Loss: 2.3766, Perplexity: 10.7677, time_taken_in_seconds: 64
Epoch [1/1], Step [5878/13804], Loss: 2.5085, Perplexity: 12.2862, time_taken_in_seconds: 64
Epoch [1/1], Step [5879/13804], Loss: 3.3642, Perplexity: 28.9100, time_taken_in_seconds: 65
Epoch [1/1], Step [5880/13804], Loss: 2.6371, Perplexity: 13.9731, time_taken_in_seconds: 66
Epoch [1/1], Step [5881/13804], Loss: 2.8258, Perplexity: 16.8743, time_taken_in_seconds: 67
Epoch [1/1], Step [5882/13804], Loss: 2.8233, Perplexity: 16.8320, time_taken_in_seconds: 68
Epoch [1/1], Step [5883/13804], Loss: 2.7485, Perplexity: 15.6184, time_taken_in_seconds: 69
Epoch [1/1], Step [5884/13804], Loss: 2.4497, Perplexity: 11.5846, time_taken_in_seconds: 69
Epoch [1/1], Step [5885/13804], Loss: 2.4463, Perplexity: 11.5455, time_taken_in_seconds: 70
Epoch [1/1], Step [5886/13804], Loss: 2.6646, Perplexity: 14.3620, time_taken_in_seconds: 71
Epoch [1/1], Step [5887/13804], Loss: 3.1544, Perplexity: 23.4390, time_taken_in_seconds: 72
Epoch [1/1], Step [5888/13804], Loss: 2.2447, Perplexity: 9.4380, time_taken_in_seconds: 73
Epoch [1/1], Step [5889/13804], Loss: 2.6178, Perplexity: 13.7049, time_taken_in_seconds: 74
Epoch [1/1], Step [5890/13804], Loss: 2.7259, Perplexity: 15.2708, time_taken_in_seconds: 74
Epoch [1/1], Step [5891/13804], Loss: 2.6507, Perplexity: 14.1645, time_taken_in_seconds: 75
Epoch [1/1], Step [5892/13804], Loss: 2.5984, Perplexity: 13.4426, time_taken_in_seconds: 76
Epoch [1/1], Step [5893/13804], Loss: 2.5227, Perplexity: 12.4622, time_taken_in_seconds: 77
Epoch [1/1], Step [5894/13804], Loss: 2.6775, Perplexity: 14.5487, time_taken_in_seconds: 78
Epoch [1/1], Step [5895/13804], Loss: 2.7569, Perplexity: 15.7508, time_taken_in_seconds: 78
Epoch [1/1], Step [5896/13804], Loss: 2.7856, Perplexity: 16.2103, time_taken_in_seconds: 79
Epoch [1/1], Step [5897/13804], Loss: 2.4774, Perplexity: 11.9099, time_taken_in_seconds: 80
Epoch [1/1], Step [5898/13804], Loss: 2.4076, Perplexity: 11.1076, time_taken_in_seconds: 81
Epoch [1/1], Step [5899/13804], Loss: 2.3904, Perplexity: 10.9178, time_taken_in_seconds: 82
Epoch [1/1], Step [5900/13804], Loss: 2.2790, Perplexity: 9.7667, time_taken_in_seconds: 83
Epoch [1/1], Step [5901/13804], Loss: 3.2512, Perplexity: 25.8201, time_taken_in_seconds: 0
Epoch [1/1], Step [5902/13804], Loss: 2.7813, Perplexity: 16.1393, time_taken_in_seconds: 1
Epoch [1/1], Step [5903/13804], Loss: 2.7827, Perplexity: 16.1623, time_taken_in_seconds: 2
Epoch [1/1], Step [5904/13804], Loss: 2.4548, Perplexity: 11.6446, time_taken_in_seconds: 3
Epoch [1/1], Step [5905/13804], Loss: 2.6615, Perplexity: 14.3174, time_taken_in_seconds: 4
Epoch [1/1], Step [5906/13804], Loss: 2.4727, Perplexity: 11.8547, time_taken_in_seconds: 4
Epoch [1/1], Step [5907/13804], Loss: 2.7247, Perplexity: 15.2513, time_taken_in_seconds: 5
Epoch [1/1], Step [5908/13804], Loss: 2.6626, Perplexity: 14.3337, time_taken_in_seconds: 6
Epoch [1/1], Step [5909/13804], Loss: 2.7162, Perplexity: 15.1234, time_taken_in_seconds: 7
Epoch [1/1], Step [5910/13804], Loss: 2.5695, Perplexity: 13.0593, time_taken_in_seconds: 8
Epoch [1/1], Step [5911/13804], Loss: 2.5232, Perplexity: 12.4678, time_taken_in_seconds: 9
Epoch [1/1], Step [5912/13804], Loss: 2.8746, Perplexity: 17.7181, time_taken_in_seconds: 9
Epoch [1/1], Step [5913/13804], Loss: 2.5901, Perplexity: 13.3309, time_taken_in_seconds: 10
Epoch [1/1], Step [5914/13804], Loss: 2.7455, Perplexity: 15.5724, time_taken_in_seconds: 11
Epoch [1/1], Step [5915/13804], Loss: 2.6637, Perplexity: 14.3488, time_taken_in_seconds: 12
Epoch [1/1], Step [5916/13804], Loss: 2.4729, Perplexity: 11.8574, time_taken_in_seconds: 13
Epoch [1/1], Step [5917/13804], Loss: 2.6300, Perplexity: 13.8733, time_taken_in_seconds: 14
Epoch [1/1], Step [5918/13804], Loss: 2.6666, Perplexity: 14.3905, time_taken_in_seconds: 14
Epoch [1/1], Step [5919/13804], Loss: 2.4337, Perplexity: 11.4011, time_taken_in_seconds: 15
Epoch [1/1], Step [5920/13804], Loss: 3.4264, Perplexity: 30.7663, time_taken_in_seconds: 16
Epoch [1/1], Step [5921/13804], Loss: 2.6250, Perplexity: 13.8049, time_taken_in_seconds: 17
Epoch [1/1], Step [5922/13804], Loss: 2.4690, Perplexity: 11.8105, time_taken_in_seconds: 18
Epoch [1/1], Step [5923/13804], Loss: 2.8193, Perplexity: 16.7653, time_taken_in_seconds: 19
Epoch [1/1], Step [5924/13804], Loss: 2.9928, Perplexity: 19.9409, time_taken_in_seconds: 19
Epoch [1/1], Step [5925/13804], Loss: 2.7253, Perplexity: 15.2616, time_taken_in_seconds: 20
Epoch [1/1], Step [5926/13804], Loss: 2.6490, Perplexity: 14.1406, time_taken_in_seconds: 21
Epoch [1/1], Step [5927/13804], Loss: 2.2729, Perplexity: 9.7077, time_taken_in_seconds: 22
Epoch [1/1], Step [5928/13804], Loss: 3.2942, Perplexity: 26.9550, time_taken_in_seconds: 23
Epoch [1/1], Step [5929/13804], Loss: 2.6245, Perplexity: 13.7982, time_taken_in_seconds: 24
Epoch [1/1], Step [5930/13804], Loss: 2.3558, Perplexity: 10.5471, time_taken_in_seconds: 24
Epoch [1/1], Step [5931/13804], Loss: 2.3195, Perplexity: 10.1708, time_taken_in_seconds: 25
Epoch [1/1], Step [5932/13804], Loss: 2.6863, Perplexity: 14.6769, time_taken_in_seconds: 26
Epoch [1/1], Step [5933/13804], Loss: 3.6267, Perplexity: 37.5880, time_taken_in_seconds: 27
Epoch [1/1], Step [5934/13804], Loss: 2.6111, Perplexity: 13.6144, time_taken_in_seconds: 28
Epoch [1/1], Step [5935/13804], Loss: 2.5810, Perplexity: 13.2097, time_taken_in_seconds: 28
Epoch [1/1], Step [5936/13804], Loss: 2.7399, Perplexity: 15.4851, time_taken_in_seconds: 29
Epoch [1/1], Step [5937/13804], Loss: 2.7061, Perplexity: 14.9708, time_taken_in_seconds: 30
Epoch [1/1], Step [5938/13804], Loss: 2.5809, Perplexity: 13.2089, time_taken_in_seconds: 31
Epoch [1/1], Step [5939/13804], Loss: 2.5936, Perplexity: 13.3777, time_taken_in_seconds: 32
Epoch [1/1], Step [5940/13804], Loss: 2.7019, Perplexity: 14.9085, time_taken_in_seconds: 33
Epoch [1/1], Step [5941/13804], Loss: 2.9602, Perplexity: 19.3011, time_taken_in_seconds: 33
Epoch [1/1], Step [5942/13804], Loss: 2.3315, Perplexity: 10.2929, time_taken_in_seconds: 34
Epoch [1/1], Step [5943/13804], Loss: 2.6240, Perplexity: 13.7913, time_taken_in_seconds: 35
Epoch [1/1], Step [5944/13804], Loss: 2.3153, Perplexity: 10.1276, time_taken_in_seconds: 36
Epoch [1/1], Step [5945/13804], Loss: 2.5506, Perplexity: 12.8144, time_taken_in_seconds: 37
Epoch [1/1], Step [5946/13804], Loss: 2.5664, Perplexity: 13.0189, time_taken_in_seconds: 37
Epoch [1/1], Step [5947/13804], Loss: 2.6713, Perplexity: 14.4591, time_taken_in_seconds: 38
Epoch [1/1], Step [5948/13804], Loss: 2.3559, Perplexity: 10.5473, time_taken_in_seconds: 39
Epoch [1/1], Step [5949/13804], Loss: 3.0602, Perplexity: 21.3327, time_taken_in_seconds: 40
Epoch [1/1], Step [5950/13804], Loss: 2.7695, Perplexity: 15.9502, time_taken_in_seconds: 41
Epoch [1/1], Step [5951/13804], Loss: 2.7655, Perplexity: 15.8866, time_taken_in_seconds: 42
Epoch [1/1], Step [5952/13804], Loss: 3.0426, Perplexity: 20.9594, time_taken_in_seconds: 43
Epoch [1/1], Step [5953/13804], Loss: 2.5313, Perplexity: 12.5693, time_taken_in_seconds: 43
Epoch [1/1], Step [5954/13804], Loss: 2.5930, Perplexity: 13.3696, time_taken_in_seconds: 44
Epoch [1/1], Step [5955/13804], Loss: 2.2648, Perplexity: 9.6292, time_taken_in_seconds: 45
Epoch [1/1], Step [5956/13804], Loss: 3.5241, Perplexity: 33.9221, time_taken_in_seconds: 46
Epoch [1/1], Step [5957/13804], Loss: 2.7070, Perplexity: 14.9837, time_taken_in_seconds: 47
Epoch [1/1], Step [5958/13804], Loss: 2.4424, Perplexity: 11.5009, time_taken_in_seconds: 48
Epoch [1/1], Step [5959/13804], Loss: 2.6193, Perplexity: 13.7257, time_taken_in_seconds: 48
Epoch [1/1], Step [5960/13804], Loss: 2.7408, Perplexity: 15.4987, time_taken_in_seconds: 49
Epoch [1/1], Step [5961/13804], Loss: 2.3700, Perplexity: 10.6976, time_taken_in_seconds: 50
Epoch [1/1], Step [5962/13804], Loss: 2.4269, Perplexity: 11.3239, time_taken_in_seconds: 51
Epoch [1/1], Step [5963/13804], Loss: 2.5769, Perplexity: 13.1561, time_taken_in_seconds: 52
Epoch [1/1], Step [5964/13804], Loss: 2.5918, Perplexity: 13.3538, time_taken_in_seconds: 52
Epoch [1/1], Step [5965/13804], Loss: 2.5742, Perplexity: 13.1214, time_taken_in_seconds: 53
Epoch [1/1], Step [5966/13804], Loss: 2.2886, Perplexity: 9.8614, time_taken_in_seconds: 54
Epoch [1/1], Step [5967/13804], Loss: 3.0275, Perplexity: 20.6458, time_taken_in_seconds: 55
Epoch [1/1], Step [5968/13804], Loss: 2.6944, Perplexity: 14.7965, time_taken_in_seconds: 56
Epoch [1/1], Step [5969/13804], Loss: 2.9547, Perplexity: 19.1964, time_taken_in_seconds: 57
Epoch [1/1], Step [5970/13804], Loss: 2.4782, Perplexity: 11.9197, time_taken_in_seconds: 57
Epoch [1/1], Step [5971/13804], Loss: 2.4263, Perplexity: 11.3173, time_taken_in_seconds: 58
Epoch [1/1], Step [5972/13804], Loss: 2.4601, Perplexity: 11.7057, time_taken_in_seconds: 59
Epoch [1/1], Step [5973/13804], Loss: 2.9096, Perplexity: 18.3492, time_taken_in_seconds: 60
Epoch [1/1], Step [5974/13804], Loss: 2.5746, Perplexity: 13.1256, time_taken_in_seconds: 61
Epoch [1/1], Step [5975/13804], Loss: 2.5384, Perplexity: 12.6590, time_taken_in_seconds: 62
Epoch [1/1], Step [5976/13804], Loss: 2.6807, Perplexity: 14.5955, time_taken_in_seconds: 62
Epoch [1/1], Step [5977/13804], Loss: 2.6393, Perplexity: 14.0039, time_taken_in_seconds: 63
Epoch [1/1], Step [5978/13804], Loss: 2.7796, Perplexity: 16.1122, time_taken_in_seconds: 64
Epoch [1/1], Step [5979/13804], Loss: 2.5803, Perplexity: 13.2016, time_taken_in_seconds: 65
Epoch [1/1], Step [5980/13804], Loss: 2.2943, Perplexity: 9.9174, time_taken_in_seconds: 66
Epoch [1/1], Step [5981/13804], Loss: 2.6254, Perplexity: 13.8101, time_taken_in_seconds: 66
Epoch [1/1], Step [5982/13804], Loss: 2.7558, Perplexity: 15.7337, time_taken_in_seconds: 67
Epoch [1/1], Step [5983/13804], Loss: 2.6630, Perplexity: 14.3397, time_taken_in_seconds: 68
Epoch [1/1], Step [5984/13804], Loss: 2.7353, Perplexity: 15.4138, time_taken_in_seconds: 69
Epoch [1/1], Step [5985/13804], Loss: 2.4817, Perplexity: 11.9617, time_taken_in_seconds: 70
Epoch [1/1], Step [5986/13804], Loss: 2.7014, Perplexity: 14.9008, time_taken_in_seconds: 71
Epoch [1/1], Step [5987/13804], Loss: 3.7058, Perplexity: 40.6807, time_taken_in_seconds: 71
Epoch [1/1], Step [5988/13804], Loss: 2.5766, Perplexity: 13.1520, time_taken_in_seconds: 72
Epoch [1/1], Step [5989/13804], Loss: 2.6972, Perplexity: 14.8378, time_taken_in_seconds: 73
Epoch [1/1], Step [5990/13804], Loss: 2.8937, Perplexity: 18.0606, time_taken_in_seconds: 74
Epoch [1/1], Step [5991/13804], Loss: 2.7581, Perplexity: 15.7697, time_taken_in_seconds: 75
Epoch [1/1], Step [5992/13804], Loss: 2.7783, Perplexity: 16.0912, time_taken_in_seconds: 75
Epoch [1/1], Step [5993/13804], Loss: 2.7623, Perplexity: 15.8356, time_taken_in_seconds: 76
Epoch [1/1], Step [5994/13804], Loss: 2.5659, Perplexity: 13.0118, time_taken_in_seconds: 77
Epoch [1/1], Step [5995/13804], Loss: 2.6168, Perplexity: 13.6912, time_taken_in_seconds: 78
Epoch [1/1], Step [5996/13804], Loss: 2.3428, Perplexity: 10.4099, time_taken_in_seconds: 79
Epoch [1/1], Step [5997/13804], Loss: 2.4165, Perplexity: 11.2061, time_taken_in_seconds: 80
Epoch [1/1], Step [5998/13804], Loss: 2.9930, Perplexity: 19.9451, time_taken_in_seconds: 80
Epoch [1/1], Step [5999/13804], Loss: 2.7871, Perplexity: 16.2342, time_taken_in_seconds: 81
Epoch [1/1], Step [6000/13804], Loss: 2.8133, Perplexity: 16.6640, time_taken_in_seconds: 82
Epoch [1/1], Step [6001/13804], Loss: 2.8915, Perplexity: 18.0207, time_taken_in_seconds: 0
Epoch [1/1], Step [6002/13804], Loss: 2.4962, Perplexity: 12.1368, time_taken_in_seconds: 1
Epoch [1/1], Step [6003/13804], Loss: 2.7730, Perplexity: 16.0071, time_taken_in_seconds: 2
Epoch [1/1], Step [6004/13804], Loss: 2.7956, Perplexity: 16.3722, time_taken_in_seconds: 3
Epoch [1/1], Step [6005/13804], Loss: 2.7192, Perplexity: 15.1681, time_taken_in_seconds: 4
Epoch [1/1], Step [6006/13804], Loss: 2.3571, Perplexity: 10.5598, time_taken_in_seconds: 4
Epoch [1/1], Step [6007/13804], Loss: 2.5704, Perplexity: 13.0708, time_taken_in_seconds: 5
Epoch [1/1], Step [6008/13804], Loss: 2.8387, Perplexity: 17.0943, time_taken_in_seconds: 6
Epoch [1/1], Step [6009/13804], Loss: 2.6742, Perplexity: 14.5006, time_taken_in_seconds: 7
Epoch [1/1], Step [6010/13804], Loss: 2.2191, Perplexity: 9.1991, time_taken_in_seconds: 8
Epoch [1/1], Step [6011/13804], Loss: 2.8228, Perplexity: 16.8236, time_taken_in_seconds: 9
Epoch [1/1], Step [6012/13804], Loss: 2.8307, Perplexity: 16.9577, time_taken_in_seconds: 9
Epoch [1/1], Step [6013/13804], Loss: 3.0976, Perplexity: 22.1450, time_taken_in_seconds: 10
Epoch [1/1], Step [6014/13804], Loss: 2.7791, Perplexity: 16.1045, time_taken_in_seconds: 11
Epoch [1/1], Step [6015/13804], Loss: 2.5646, Perplexity: 12.9957, time_taken_in_seconds: 12
Epoch [1/1], Step [6016/13804], Loss: 2.6485, Perplexity: 14.1329, time_taken_in_seconds: 13
Epoch [1/1], Step [6017/13804], Loss: 2.4215, Perplexity: 11.2623, time_taken_in_seconds: 13
Epoch [1/1], Step [6018/13804], Loss: 2.4133, Perplexity: 11.1713, time_taken_in_seconds: 14
Epoch [1/1], Step [6019/13804], Loss: 2.4677, Perplexity: 11.7949, time_taken_in_seconds: 15
Epoch [1/1], Step [6020/13804], Loss: 2.7230, Perplexity: 15.2259, time_taken_in_seconds: 16
Epoch [1/1], Step [6021/13804], Loss: 2.2510, Perplexity: 9.4968, time_taken_in_seconds: 17
Epoch [1/1], Step [6022/13804], Loss: 2.4479, Perplexity: 11.5645, time_taken_in_seconds: 18
Epoch [1/1], Step [6023/13804], Loss: 2.6280, Perplexity: 13.8461, time_taken_in_seconds: 19
Epoch [1/1], Step [6024/13804], Loss: 3.0810, Perplexity: 21.7792, time_taken_in_seconds: 19
Epoch [1/1], Step [6025/13804], Loss: 3.0224, Perplexity: 20.5403, time_taken_in_seconds: 20
Epoch [1/1], Step [6026/13804], Loss: 2.2302, Perplexity: 9.3019, time_taken_in_seconds: 21
Epoch [1/1], Step [6027/13804], Loss: 2.5589, Perplexity: 12.9222, time_taken_in_seconds: 22
Epoch [1/1], Step [6028/13804], Loss: 2.7237, Perplexity: 15.2362, time_taken_in_seconds: 23
Epoch [1/1], Step [6029/13804], Loss: 2.3833, Perplexity: 10.8405, time_taken_in_seconds: 24
Epoch [1/1], Step [6030/13804], Loss: 2.6532, Perplexity: 14.1997, time_taken_in_seconds: 24
Epoch [1/1], Step [6031/13804], Loss: 2.6200, Perplexity: 13.7356, time_taken_in_seconds: 25
Epoch [1/1], Step [6032/13804], Loss: 3.2539, Perplexity: 25.8922, time_taken_in_seconds: 26
Epoch [1/1], Step [6033/13804], Loss: 2.6838, Perplexity: 14.6399, time_taken_in_seconds: 27
Epoch [1/1], Step [6034/13804], Loss: 2.6223, Perplexity: 13.7673, time_taken_in_seconds: 28
Epoch [1/1], Step [6035/13804], Loss: 2.7703, Perplexity: 15.9633, time_taken_in_seconds: 28
Epoch [1/1], Step [6036/13804], Loss: 2.8462, Perplexity: 17.2230, time_taken_in_seconds: 29
Epoch [1/1], Step [6037/13804], Loss: 2.6697, Perplexity: 14.4357, time_taken_in_seconds: 30
Epoch [1/1], Step [6038/13804], Loss: 2.7013, Perplexity: 14.8985, time_taken_in_seconds: 31
Epoch [1/1], Step [6039/13804], Loss: 2.7083, Perplexity: 15.0041, time_taken_in_seconds: 32
Epoch [1/1], Step [6040/13804], Loss: 2.6681, Perplexity: 14.4130, time_taken_in_seconds: 33
Epoch [1/1], Step [6041/13804], Loss: 2.8709, Perplexity: 17.6525, time_taken_in_seconds: 33
Epoch [1/1], Step [6042/13804], Loss: 2.5403, Perplexity: 12.6837, time_taken_in_seconds: 34
Epoch [1/1], Step [6043/13804], Loss: 2.7660, Perplexity: 15.8950, time_taken_in_seconds: 35
Epoch [1/1], Step [6044/13804], Loss: 2.6019, Perplexity: 13.4890, time_taken_in_seconds: 36
Epoch [1/1], Step [6045/13804], Loss: 2.5016, Perplexity: 12.2018, time_taken_in_seconds: 37
Epoch [1/1], Step [6046/13804], Loss: 2.8178, Perplexity: 16.7393, time_taken_in_seconds: 37
Epoch [1/1], Step [6047/13804], Loss: 3.0701, Perplexity: 21.5448, time_taken_in_seconds: 38
Epoch [1/1], Step [6048/13804], Loss: 2.8290, Perplexity: 16.9290, time_taken_in_seconds: 39
Epoch [1/1], Step [6049/13804], Loss: 2.5319, Perplexity: 12.5770, time_taken_in_seconds: 40
Epoch [1/1], Step [6050/13804], Loss: 3.0455, Perplexity: 21.0206, time_taken_in_seconds: 41
Epoch [1/1], Step [6051/13804], Loss: 3.2537, Perplexity: 25.8857, time_taken_in_seconds: 42
Epoch [1/1], Step [6052/13804], Loss: 2.8009, Perplexity: 16.4591, time_taken_in_seconds: 42
Epoch [1/1], Step [6053/13804], Loss: 2.8248, Perplexity: 16.8576, time_taken_in_seconds: 43
Epoch [1/1], Step [6054/13804], Loss: 2.2390, Perplexity: 9.3843, time_taken_in_seconds: 44
Epoch [1/1], Step [6055/13804], Loss: 2.8695, Perplexity: 17.6280, time_taken_in_seconds: 45
Epoch [1/1], Step [6056/13804], Loss: 2.7853, Perplexity: 16.2048, time_taken_in_seconds: 46
Epoch [1/1], Step [6057/13804], Loss: 2.8119, Perplexity: 16.6421, time_taken_in_seconds: 47
Epoch [1/1], Step [6058/13804], Loss: 2.3909, Perplexity: 10.9233, time_taken_in_seconds: 47
Epoch [1/1], Step [6059/13804], Loss: 2.5557, Perplexity: 12.8808, time_taken_in_seconds: 48
Epoch [1/1], Step [6060/13804], Loss: 2.4923, Perplexity: 12.0891, time_taken_in_seconds: 49
Epoch [1/1], Step [6061/13804], Loss: 2.5556, Perplexity: 12.8787, time_taken_in_seconds: 50
Epoch [1/1], Step [6062/13804], Loss: 2.7229, Perplexity: 15.2244, time_taken_in_seconds: 51
Epoch [1/1], Step [6063/13804], Loss: 2.9375, Perplexity: 18.8689, time_taken_in_seconds: 51
Epoch [1/1], Step [6064/13804], Loss: 2.5756, Perplexity: 13.1395, time_taken_in_seconds: 52
Epoch [1/1], Step [6065/13804], Loss: 2.7211, Perplexity: 15.1972, time_taken_in_seconds: 53
Epoch [1/1], Step [6066/13804], Loss: 2.4210, Perplexity: 11.2576, time_taken_in_seconds: 54
Epoch [1/1], Step [6067/13804], Loss: 2.8628, Perplexity: 17.5103, time_taken_in_seconds: 55
Epoch [1/1], Step [6068/13804], Loss: 3.0117, Perplexity: 20.3219, time_taken_in_seconds: 55
Epoch [1/1], Step [6069/13804], Loss: 2.8197, Perplexity: 16.7716, time_taken_in_seconds: 56
Epoch [1/1], Step [6070/13804], Loss: 2.3474, Perplexity: 10.4582, time_taken_in_seconds: 57
Epoch [1/1], Step [6071/13804], Loss: 2.8219, Perplexity: 16.8095, time_taken_in_seconds: 58
Epoch [1/1], Step [6072/13804], Loss: 2.7606, Perplexity: 15.8094, time_taken_in_seconds: 59
Epoch [1/1], Step [6073/13804], Loss: 2.4910, Perplexity: 12.0733, time_taken_in_seconds: 60
Epoch [1/1], Step [6074/13804], Loss: 2.6922, Perplexity: 14.7639, time_taken_in_seconds: 60
Epoch [1/1], Step [6075/13804], Loss: 2.5041, Perplexity: 12.2330, time_taken_in_seconds: 61
Epoch [1/1], Step [6076/13804], Loss: 2.4862, Perplexity: 12.0150, time_taken_in_seconds: 62
Epoch [1/1], Step [6077/13804], Loss: 2.9524, Perplexity: 19.1514, time_taken_in_seconds: 63
Epoch [1/1], Step [6078/13804], Loss: 2.5147, Perplexity: 12.3629, time_taken_in_seconds: 64
Epoch [1/1], Step [6079/13804], Loss: 2.7057, Perplexity: 14.9653, time_taken_in_seconds: 65
Epoch [1/1], Step [6080/13804], Loss: 2.3888, Perplexity: 10.9003, time_taken_in_seconds: 65
Epoch [1/1], Step [6081/13804], Loss: 2.7895, Perplexity: 16.2730, time_taken_in_seconds: 66
Epoch [1/1], Step [6082/13804], Loss: 2.7961, Perplexity: 16.3809, time_taken_in_seconds: 67
Epoch [1/1], Step [6083/13804], Loss: 2.7988, Perplexity: 16.4248, time_taken_in_seconds: 68
Epoch [1/1], Step [6084/13804], Loss: 2.5683, Perplexity: 13.0443, time_taken_in_seconds: 69
Epoch [1/1], Step [6085/13804], Loss: 2.7795, Perplexity: 16.1117, time_taken_in_seconds: 69
Epoch [1/1], Step [6086/13804], Loss: 2.4719, Perplexity: 11.8448, time_taken_in_seconds: 70
Epoch [1/1], Step [6087/13804], Loss: 3.1962, Perplexity: 24.4398, time_taken_in_seconds: 71
Epoch [1/1], Step [6088/13804], Loss: 2.7825, Perplexity: 16.1588, time_taken_in_seconds: 72
Epoch [1/1], Step [6089/13804], Loss: 2.6836, Perplexity: 14.6382, time_taken_in_seconds: 73
Epoch [1/1], Step [6090/13804], Loss: 2.9228, Perplexity: 18.5933, time_taken_in_seconds: 74
Epoch [1/1], Step [6091/13804], Loss: 3.0064, Perplexity: 20.2142, time_taken_in_seconds: 74
Epoch [1/1], Step [6092/13804], Loss: 2.6543, Perplexity: 14.2156, time_taken_in_seconds: 75
Epoch [1/1], Step [6093/13804], Loss: 3.0657, Perplexity: 21.4498, time_taken_in_seconds: 76
Epoch [1/1], Step [6094/13804], Loss: 2.6875, Perplexity: 14.6954, time_taken_in_seconds: 77
Epoch [1/1], Step [6095/13804], Loss: 2.6329, Perplexity: 13.9147, time_taken_in_seconds: 78
Epoch [1/1], Step [6096/13804], Loss: 2.7306, Perplexity: 15.3421, time_taken_in_seconds: 79
Epoch [1/1], Step [6097/13804], Loss: 3.5086, Perplexity: 33.4017, time_taken_in_seconds: 80
Epoch [1/1], Step [6098/13804], Loss: 2.8400, Perplexity: 17.1155, time_taken_in_seconds: 80
Epoch [1/1], Step [6099/13804], Loss: 2.7912, Perplexity: 16.2999, time_taken_in_seconds: 81
Epoch [1/1], Step [6100/13804], Loss: 2.1620, Perplexity: 8.6883, time_taken_in_seconds: 82
Epoch [1/1], Step [6101/13804], Loss: 2.8789, Perplexity: 17.7947, time_taken_in_seconds: 0
Epoch [1/1], Step [6102/13804], Loss: 2.7713, Perplexity: 15.9793, time_taken_in_seconds: 1
Epoch [1/1], Step [6103/13804], Loss: 3.0617, Perplexity: 21.3647, time_taken_in_seconds: 2
Epoch [1/1], Step [6104/13804], Loss: 2.6566, Perplexity: 14.2476, time_taken_in_seconds: 3
Epoch [1/1], Step [6105/13804], Loss: 2.5954, Perplexity: 13.4020, time_taken_in_seconds: 4
Epoch [1/1], Step [6106/13804], Loss: 2.7514, Perplexity: 15.6644, time_taken_in_seconds: 4
Epoch [1/1], Step [6107/13804], Loss: 2.6974, Perplexity: 14.8405, time_taken_in_seconds: 5
Epoch [1/1], Step [6108/13804], Loss: 3.1162, Perplexity: 22.5597, time_taken_in_seconds: 6
Epoch [1/1], Step [6109/13804], Loss: 2.9433, Perplexity: 18.9783, time_taken_in_seconds: 7
Epoch [1/1], Step [6110/13804], Loss: 3.0799, Perplexity: 21.7570, time_taken_in_seconds: 8
Epoch [1/1], Step [6111/13804], Loss: 2.7947, Perplexity: 16.3573, time_taken_in_seconds: 9
Epoch [1/1], Step [6112/13804], Loss: 2.3800, Perplexity: 10.8050, time_taken_in_seconds: 9
Epoch [1/1], Step [6113/13804], Loss: 2.9290, Perplexity: 18.7085, time_taken_in_seconds: 10
Epoch [1/1], Step [6114/13804], Loss: 2.6934, Perplexity: 14.7813, time_taken_in_seconds: 11
Epoch [1/1], Step [6115/13804], Loss: 2.7292, Perplexity: 15.3211, time_taken_in_seconds: 12
Epoch [1/1], Step [6116/13804], Loss: 2.5926, Perplexity: 13.3644, time_taken_in_seconds: 13
Epoch [1/1], Step [6117/13804], Loss: 2.5289, Perplexity: 12.5394, time_taken_in_seconds: 14
Epoch [1/1], Step [6118/13804], Loss: 2.6043, Perplexity: 13.5223, time_taken_in_seconds: 14
Epoch [1/1], Step [6119/13804], Loss: 2.2157, Perplexity: 9.1680, time_taken_in_seconds: 15
Epoch [1/1], Step [6120/13804], Loss: 3.6246, Perplexity: 37.5092, time_taken_in_seconds: 16
Epoch [1/1], Step [6121/13804], Loss: 2.6423, Perplexity: 14.0450, time_taken_in_seconds: 17
Epoch [1/1], Step [6122/13804], Loss: 2.3015, Perplexity: 9.9891, time_taken_in_seconds: 18
Epoch [1/1], Step [6123/13804], Loss: 2.5015, Perplexity: 12.2010, time_taken_in_seconds: 19
Epoch [1/1], Step [6124/13804], Loss: 2.7575, Perplexity: 15.7608, time_taken_in_seconds: 19
Epoch [1/1], Step [6125/13804], Loss: 2.7031, Perplexity: 14.9253, time_taken_in_seconds: 20
Epoch [1/1], Step [6126/13804], Loss: 2.7671, Perplexity: 15.9130, time_taken_in_seconds: 21
Epoch [1/1], Step [6127/13804], Loss: 2.9287, Perplexity: 18.7033, time_taken_in_seconds: 22
Epoch [1/1], Step [6128/13804], Loss: 2.7609, Perplexity: 15.8134, time_taken_in_seconds: 23
Epoch [1/1], Step [6129/13804], Loss: 2.4089, Perplexity: 11.1213, time_taken_in_seconds: 24
Epoch [1/1], Step [6130/13804], Loss: 3.1558, Perplexity: 23.4707, time_taken_in_seconds: 24
Epoch [1/1], Step [6131/13804], Loss: 3.0059, Perplexity: 20.2048, time_taken_in_seconds: 25
Epoch [1/1], Step [6132/13804], Loss: 2.7697, Perplexity: 15.9532, time_taken_in_seconds: 26
Epoch [1/1], Step [6133/13804], Loss: 2.4925, Perplexity: 12.0911, time_taken_in_seconds: 27
Epoch [1/1], Step [6134/13804], Loss: 2.5544, Perplexity: 12.8633, time_taken_in_seconds: 28
Epoch [1/1], Step [6135/13804], Loss: 3.2701, Perplexity: 26.3137, time_taken_in_seconds: 29
Epoch [1/1], Step [6136/13804], Loss: 2.7413, Perplexity: 15.5069, time_taken_in_seconds: 29
Epoch [1/1], Step [6137/13804], Loss: 2.8443, Perplexity: 17.1891, time_taken_in_seconds: 30
Epoch [1/1], Step [6138/13804], Loss: 2.7519, Perplexity: 15.6716, time_taken_in_seconds: 31
Epoch [1/1], Step [6139/13804], Loss: 2.8005, Perplexity: 16.4527, time_taken_in_seconds: 32
Epoch [1/1], Step [6140/13804], Loss: 2.4548, Perplexity: 11.6437, time_taken_in_seconds: 33
Epoch [1/1], Step [6141/13804], Loss: 2.4264, Perplexity: 11.3184, time_taken_in_seconds: 34
Epoch [1/1], Step [6142/13804], Loss: 2.5985, Perplexity: 13.4430, time_taken_in_seconds: 34
Epoch [1/1], Step [6143/13804], Loss: 2.6980, Perplexity: 14.8502, time_taken_in_seconds: 35
Epoch [1/1], Step [6144/13804], Loss: 2.6585, Perplexity: 14.2752, time_taken_in_seconds: 36
Epoch [1/1], Step [6145/13804], Loss: 2.7011, Perplexity: 14.8958, time_taken_in_seconds: 37
Epoch [1/1], Step [6146/13804], Loss: 2.8288, Perplexity: 16.9258, time_taken_in_seconds: 38
Epoch [1/1], Step [6147/13804], Loss: 2.3939, Perplexity: 10.9564, time_taken_in_seconds: 39
Epoch [1/1], Step [6148/13804], Loss: 2.4485, Perplexity: 11.5704, time_taken_in_seconds: 39
Epoch [1/1], Step [6149/13804], Loss: 2.6674, Perplexity: 14.4030, time_taken_in_seconds: 40
Epoch [1/1], Step [6150/13804], Loss: 2.6641, Perplexity: 14.3553, time_taken_in_seconds: 41
Epoch [1/1], Step [6151/13804], Loss: 2.4120, Perplexity: 11.1568, time_taken_in_seconds: 42
Epoch [1/1], Step [6152/13804], Loss: 2.2798, Perplexity: 9.7747, time_taken_in_seconds: 43
Epoch [1/1], Step [6153/13804], Loss: 2.8338, Perplexity: 17.0099, time_taken_in_seconds: 43
Epoch [1/1], Step [6154/13804], Loss: 2.4496, Perplexity: 11.5836, time_taken_in_seconds: 44
Epoch [1/1], Step [6155/13804], Loss: 2.5108, Perplexity: 12.3148, time_taken_in_seconds: 45
Epoch [1/1], Step [6156/13804], Loss: 2.6004, Perplexity: 13.4687, time_taken_in_seconds: 46
Epoch [1/1], Step [6157/13804], Loss: 2.6829, Perplexity: 14.6269, time_taken_in_seconds: 47
Epoch [1/1], Step [6158/13804], Loss: 2.3983, Perplexity: 11.0047, time_taken_in_seconds: 48
Epoch [1/1], Step [6159/13804], Loss: 2.5233, Perplexity: 12.4698, time_taken_in_seconds: 48
Epoch [1/1], Step [6160/13804], Loss: 2.9804, Perplexity: 19.6963, time_taken_in_seconds: 49
Epoch [1/1], Step [6161/13804], Loss: 2.6231, Perplexity: 13.7782, time_taken_in_seconds: 50
Epoch [1/1], Step [6162/13804], Loss: 2.8654, Perplexity: 17.5569, time_taken_in_seconds: 51
Epoch [1/1], Step [6163/13804], Loss: 2.8966, Perplexity: 18.1133, time_taken_in_seconds: 52
Epoch [1/1], Step [6164/13804], Loss: 2.8867, Perplexity: 17.9348, time_taken_in_seconds: 53
Epoch [1/1], Step [6165/13804], Loss: 2.9615, Perplexity: 19.3268, time_taken_in_seconds: 53
Epoch [1/1], Step [6166/13804], Loss: 2.6707, Perplexity: 14.4503, time_taken_in_seconds: 54
Epoch [1/1], Step [6167/13804], Loss: 2.4525, Perplexity: 11.6176, time_taken_in_seconds: 55
Epoch [1/1], Step [6168/13804], Loss: 3.0068, Perplexity: 20.2223, time_taken_in_seconds: 56
Epoch [1/1], Step [6169/13804], Loss: 2.5093, Perplexity: 12.2966, time_taken_in_seconds: 57
Epoch [1/1], Step [6170/13804], Loss: 2.3761, Perplexity: 10.7626, time_taken_in_seconds: 58
Epoch [1/1], Step [6171/13804], Loss: 2.8303, Perplexity: 16.9512, time_taken_in_seconds: 59
Epoch [1/1], Step [6172/13804], Loss: 2.2518, Perplexity: 9.5047, time_taken_in_seconds: 59
Epoch [1/1], Step [6173/13804], Loss: 3.0176, Perplexity: 20.4426, time_taken_in_seconds: 60
Epoch [1/1], Step [6174/13804], Loss: 2.5013, Perplexity: 12.1980, time_taken_in_seconds: 61
Epoch [1/1], Step [6175/13804], Loss: 2.7888, Perplexity: 16.2609, time_taken_in_seconds: 62
Epoch [1/1], Step [6176/13804], Loss: 2.8092, Perplexity: 16.5959, time_taken_in_seconds: 63
Epoch [1/1], Step [6177/13804], Loss: 3.1416, Perplexity: 23.1418, time_taken_in_seconds: 64
Epoch [1/1], Step [6178/13804], Loss: 2.6249, Perplexity: 13.8036, time_taken_in_seconds: 64
Epoch [1/1], Step [6179/13804], Loss: 2.8010, Perplexity: 16.4615, time_taken_in_seconds: 65
Epoch [1/1], Step [6180/13804], Loss: 2.5866, Perplexity: 13.2846, time_taken_in_seconds: 66
Epoch [1/1], Step [6181/13804], Loss: 2.8374, Perplexity: 17.0705, time_taken_in_seconds: 67
Epoch [1/1], Step [6182/13804], Loss: 2.9220, Perplexity: 18.5784, time_taken_in_seconds: 68
Epoch [1/1], Step [6183/13804], Loss: 2.9032, Perplexity: 18.2333, time_taken_in_seconds: 69
Epoch [1/1], Step [6184/13804], Loss: 2.3757, Perplexity: 10.7581, time_taken_in_seconds: 69
Epoch [1/1], Step [6185/13804], Loss: 4.2234, Perplexity: 68.2635, time_taken_in_seconds: 70
Epoch [1/1], Step [6186/13804], Loss: 2.8827, Perplexity: 17.8628, time_taken_in_seconds: 71
Epoch [1/1], Step [6187/13804], Loss: 2.4344, Perplexity: 11.4086, time_taken_in_seconds: 72
Epoch [1/1], Step [6188/13804], Loss: 2.5886, Perplexity: 13.3111, time_taken_in_seconds: 73
Epoch [1/1], Step [6189/13804], Loss: 3.3208, Perplexity: 27.6831, time_taken_in_seconds: 74
Epoch [1/1], Step [6190/13804], Loss: 2.9679, Perplexity: 19.4506, time_taken_in_seconds: 74
Epoch [1/1], Step [6191/13804], Loss: 2.5887, Perplexity: 13.3128, time_taken_in_seconds: 75
Epoch [1/1], Step [6192/13804], Loss: 2.8851, Perplexity: 17.9052, time_taken_in_seconds: 76
Epoch [1/1], Step [6193/13804], Loss: 2.7344, Perplexity: 15.4010, time_taken_in_seconds: 77
Epoch [1/1], Step [6194/13804], Loss: 2.6878, Perplexity: 14.6988, time_taken_in_seconds: 78
Epoch [1/1], Step [6195/13804], Loss: 3.4884, Perplexity: 32.7333, time_taken_in_seconds: 79
Epoch [1/1], Step [6196/13804], Loss: 2.4259, Perplexity: 11.3124, time_taken_in_seconds: 79
Epoch [1/1], Step [6197/13804], Loss: 3.1269, Perplexity: 22.8028, time_taken_in_seconds: 80
Epoch [1/1], Step [6198/13804], Loss: 2.7941, Perplexity: 16.3476, time_taken_in_seconds: 81
Epoch [1/1], Step [6199/13804], Loss: 2.5065, Perplexity: 12.2621, time_taken_in_seconds: 82
Epoch [1/1], Step [6200/13804], Loss: 2.5781, Perplexity: 13.1724, time_taken_in_seconds: 83
Epoch [1/1], Step [6201/13804], Loss: 2.8138, Perplexity: 16.6728, time_taken_in_seconds: 0
Epoch [1/1], Step [6202/13804], Loss: 2.6287, Perplexity: 13.8559, time_taken_in_seconds: 1
Epoch [1/1], Step [6203/13804], Loss: 2.5186, Perplexity: 12.4109, time_taken_in_seconds: 2
Epoch [1/1], Step [6204/13804], Loss: 2.8686, Perplexity: 17.6129, time_taken_in_seconds: 3
Epoch [1/1], Step [6205/13804], Loss: 2.7157, Perplexity: 15.1153, time_taken_in_seconds: 4
Epoch [1/1], Step [6206/13804], Loss: 2.8655, Perplexity: 17.5579, time_taken_in_seconds: 5
Epoch [1/1], Step [6207/13804], Loss: 2.7923, Perplexity: 16.3186, time_taken_in_seconds: 5
Epoch [1/1], Step [6208/13804], Loss: 2.5551, Perplexity: 12.8723, time_taken_in_seconds: 6
Epoch [1/1], Step [6209/13804], Loss: 2.6778, Perplexity: 14.5532, time_taken_in_seconds: 7
Epoch [1/1], Step [6210/13804], Loss: 2.7218, Perplexity: 15.2069, time_taken_in_seconds: 8
Epoch [1/1], Step [6211/13804], Loss: 2.8554, Perplexity: 17.3822, time_taken_in_seconds: 9
Epoch [1/1], Step [6212/13804], Loss: 3.2645, Perplexity: 26.1675, time_taken_in_seconds: 10
Epoch [1/1], Step [6213/13804], Loss: 2.6810, Perplexity: 14.5996, time_taken_in_seconds: 10
Epoch [1/1], Step [6214/13804], Loss: 2.3361, Perplexity: 10.3413, time_taken_in_seconds: 11
Epoch [1/1], Step [6215/13804], Loss: 2.9305, Perplexity: 18.7368, time_taken_in_seconds: 12
Epoch [1/1], Step [6216/13804], Loss: 2.6071, Perplexity: 13.5592, time_taken_in_seconds: 13
Epoch [1/1], Step [6217/13804], Loss: 2.6330, Perplexity: 13.9148, time_taken_in_seconds: 14
Epoch [1/1], Step [6218/13804], Loss: 2.6549, Perplexity: 14.2236, time_taken_in_seconds: 15
Epoch [1/1], Step [6219/13804], Loss: 2.7969, Perplexity: 16.3943, time_taken_in_seconds: 15
Epoch [1/1], Step [6220/13804], Loss: 2.2838, Perplexity: 9.8144, time_taken_in_seconds: 16
Epoch [1/1], Step [6221/13804], Loss: 2.5258, Perplexity: 12.5005, time_taken_in_seconds: 17
Epoch [1/1], Step [6222/13804], Loss: 2.6081, Perplexity: 13.5738, time_taken_in_seconds: 18
Epoch [1/1], Step [6223/13804], Loss: 2.2098, Perplexity: 9.1140, time_taken_in_seconds: 19
Epoch [1/1], Step [6224/13804], Loss: 2.6397, Perplexity: 14.0097, time_taken_in_seconds: 20
Epoch [1/1], Step [6225/13804], Loss: 2.7111, Perplexity: 15.0456, time_taken_in_seconds: 20
Epoch [1/1], Step [6226/13804], Loss: 2.6591, Perplexity: 14.2831, time_taken_in_seconds: 21
Epoch [1/1], Step [6227/13804], Loss: 2.5922, Perplexity: 13.3598, time_taken_in_seconds: 22
Epoch [1/1], Step [6228/13804], Loss: 2.8050, Perplexity: 16.5275, time_taken_in_seconds: 23
Epoch [1/1], Step [6229/13804], Loss: 2.5806, Perplexity: 13.2048, time_taken_in_seconds: 24
Epoch [1/1], Step [6230/13804], Loss: 2.4951, Perplexity: 12.1230, time_taken_in_seconds: 25
Epoch [1/1], Step [6231/13804], Loss: 2.8150, Perplexity: 16.6932, time_taken_in_seconds: 25
Epoch [1/1], Step [6232/13804], Loss: 2.9952, Perplexity: 19.9895, time_taken_in_seconds: 26
Epoch [1/1], Step [6233/13804], Loss: 2.9219, Perplexity: 18.5770, time_taken_in_seconds: 27
Epoch [1/1], Step [6234/13804], Loss: 2.4832, Perplexity: 11.9799, time_taken_in_seconds: 28
Epoch [1/1], Step [6235/13804], Loss: 3.0939, Perplexity: 22.0620, time_taken_in_seconds: 29
Epoch [1/1], Step [6236/13804], Loss: 2.9274, Perplexity: 18.6785, time_taken_in_seconds: 30
Epoch [1/1], Step [6237/13804], Loss: 2.8811, Perplexity: 17.8336, time_taken_in_seconds: 30
Epoch [1/1], Step [6238/13804], Loss: 2.5802, Perplexity: 13.2003, time_taken_in_seconds: 31
Epoch [1/1], Step [6239/13804], Loss: 2.6518, Perplexity: 14.1800, time_taken_in_seconds: 32
Epoch [1/1], Step [6240/13804], Loss: 2.3770, Perplexity: 10.7725, time_taken_in_seconds: 33
Epoch [1/1], Step [6241/13804], Loss: 2.6822, Perplexity: 14.6173, time_taken_in_seconds: 34
Epoch [1/1], Step [6242/13804], Loss: 2.6057, Perplexity: 13.5412, time_taken_in_seconds: 35
Epoch [1/1], Step [6243/13804], Loss: 2.7004, Perplexity: 14.8851, time_taken_in_seconds: 36
Epoch [1/1], Step [6244/13804], Loss: 3.9758, Perplexity: 53.2938, time_taken_in_seconds: 36
Epoch [1/1], Step [6245/13804], Loss: 2.7743, Perplexity: 16.0274, time_taken_in_seconds: 37
Epoch [1/1], Step [6246/13804], Loss: 3.3583, Perplexity: 28.7396, time_taken_in_seconds: 38
Epoch [1/1], Step [6247/13804], Loss: 2.3810, Perplexity: 10.8152, time_taken_in_seconds: 39
Epoch [1/1], Step [6248/13804], Loss: 2.6159, Perplexity: 13.6795, time_taken_in_seconds: 40
Epoch [1/1], Step [6249/13804], Loss: 3.0075, Perplexity: 20.2374, time_taken_in_seconds: 40
Epoch [1/1], Step [6250/13804], Loss: 2.6597, Perplexity: 14.2919, time_taken_in_seconds: 41
Epoch [1/1], Step [6251/13804], Loss: 2.8200, Perplexity: 16.7772, time_taken_in_seconds: 42
Epoch [1/1], Step [6252/13804], Loss: 2.8496, Perplexity: 17.2815, time_taken_in_seconds: 43
Epoch [1/1], Step [6253/13804], Loss: 2.5156, Perplexity: 12.3740, time_taken_in_seconds: 44
Epoch [1/1], Step [6254/13804], Loss: 2.6091, Perplexity: 13.5874, time_taken_in_seconds: 45
Epoch [1/1], Step [6255/13804], Loss: 2.6833, Perplexity: 14.6328, time_taken_in_seconds: 45
Epoch [1/1], Step [6256/13804], Loss: 2.7306, Perplexity: 15.3416, time_taken_in_seconds: 46
Epoch [1/1], Step [6257/13804], Loss: 2.3971, Perplexity: 10.9912, time_taken_in_seconds: 47
Epoch [1/1], Step [6258/13804], Loss: 2.6934, Perplexity: 14.7812, time_taken_in_seconds: 48
Epoch [1/1], Step [6259/13804], Loss: 2.8525, Perplexity: 17.3314, time_taken_in_seconds: 49
Epoch [1/1], Step [6260/13804], Loss: 3.0200, Perplexity: 20.4914, time_taken_in_seconds: 50
Epoch [1/1], Step [6261/13804], Loss: 2.6597, Perplexity: 14.2924, time_taken_in_seconds: 50
Epoch [1/1], Step [6262/13804], Loss: 3.7650, Perplexity: 43.1640, time_taken_in_seconds: 51
Epoch [1/1], Step [6263/13804], Loss: 2.5618, Perplexity: 12.9593, time_taken_in_seconds: 52
Epoch [1/1], Step [6264/13804], Loss: 2.5334, Perplexity: 12.5963, time_taken_in_seconds: 53
Epoch [1/1], Step [6265/13804], Loss: 3.1829, Perplexity: 24.1164, time_taken_in_seconds: 54
Epoch [1/1], Step [6266/13804], Loss: 2.8144, Perplexity: 16.6834, time_taken_in_seconds: 55
Epoch [1/1], Step [6267/13804], Loss: 2.7489, Perplexity: 15.6252, time_taken_in_seconds: 55
Epoch [1/1], Step [6268/13804], Loss: 2.4249, Perplexity: 11.3015, time_taken_in_seconds: 56
Epoch [1/1], Step [6269/13804], Loss: 2.2867, Perplexity: 9.8426, time_taken_in_seconds: 57
Epoch [1/1], Step [6270/13804], Loss: 3.2320, Perplexity: 25.3301, time_taken_in_seconds: 58
Epoch [1/1], Step [6271/13804], Loss: 2.8109, Perplexity: 16.6254, time_taken_in_seconds: 59
Epoch [1/1], Step [6272/13804], Loss: 2.4374, Perplexity: 11.4435, time_taken_in_seconds: 60
Epoch [1/1], Step [6273/13804], Loss: 2.7410, Perplexity: 15.5032, time_taken_in_seconds: 60
Epoch [1/1], Step [6274/13804], Loss: 2.4334, Perplexity: 11.3971, time_taken_in_seconds: 61
Epoch [1/1], Step [6275/13804], Loss: 2.5174, Perplexity: 12.3966, time_taken_in_seconds: 62
Epoch [1/1], Step [6276/13804], Loss: 2.5378, Perplexity: 12.6520, time_taken_in_seconds: 63
Epoch [1/1], Step [6277/13804], Loss: 2.4983, Perplexity: 12.1622, time_taken_in_seconds: 64
Epoch [1/1], Step [6278/13804], Loss: 3.6379, Perplexity: 38.0103, time_taken_in_seconds: 65
Epoch [1/1], Step [6279/13804], Loss: 2.4628, Perplexity: 11.7379, time_taken_in_seconds: 65
Epoch [1/1], Step [6280/13804], Loss: 2.4396, Perplexity: 11.4684, time_taken_in_seconds: 66
Epoch [1/1], Step [6281/13804], Loss: 2.8729, Perplexity: 17.6879, time_taken_in_seconds: 67
Epoch [1/1], Step [6282/13804], Loss: 2.5272, Perplexity: 12.5180, time_taken_in_seconds: 68
Epoch [1/1], Step [6283/13804], Loss: 2.6128, Perplexity: 13.6370, time_taken_in_seconds: 69
Epoch [1/1], Step [6284/13804], Loss: 2.5052, Perplexity: 12.2463, time_taken_in_seconds: 69
Epoch [1/1], Step [6285/13804], Loss: 2.9886, Perplexity: 19.8575, time_taken_in_seconds: 70
Epoch [1/1], Step [6286/13804], Loss: 2.6962, Perplexity: 14.8234, time_taken_in_seconds: 71
Epoch [1/1], Step [6287/13804], Loss: 2.4680, Perplexity: 11.7985, time_taken_in_seconds: 72
Epoch [1/1], Step [6288/13804], Loss: 2.3250, Perplexity: 10.2265, time_taken_in_seconds: 73
Epoch [1/1], Step [6289/13804], Loss: 2.6877, Perplexity: 14.6979, time_taken_in_seconds: 74
Epoch [1/1], Step [6290/13804], Loss: 2.3380, Perplexity: 10.3602, time_taken_in_seconds: 74
Epoch [1/1], Step [6291/13804], Loss: 2.4136, Perplexity: 11.1737, time_taken_in_seconds: 75
Epoch [1/1], Step [6292/13804], Loss: 2.7299, Perplexity: 15.3311, time_taken_in_seconds: 76
Epoch [1/1], Step [6293/13804], Loss: 3.4827, Perplexity: 32.5487, time_taken_in_seconds: 77
Epoch [1/1], Step [6294/13804], Loss: 2.4457, Perplexity: 11.5387, time_taken_in_seconds: 78
Epoch [1/1], Step [6295/13804], Loss: 3.3966, Perplexity: 29.8613, time_taken_in_seconds: 79
Epoch [1/1], Step [6296/13804], Loss: 2.5346, Perplexity: 12.6117, time_taken_in_seconds: 79
Epoch [1/1], Step [6297/13804], Loss: 2.6884, Perplexity: 14.7088, time_taken_in_seconds: 80
Epoch [1/1], Step [6298/13804], Loss: 2.1353, Perplexity: 8.4599, time_taken_in_seconds: 81
Epoch [1/1], Step [6299/13804], Loss: 2.6542, Perplexity: 14.2132, time_taken_in_seconds: 82
Epoch [1/1], Step [6300/13804], Loss: 3.0895, Perplexity: 21.9666, time_taken_in_seconds: 83
Epoch [1/1], Step [6301/13804], Loss: 2.6288, Perplexity: 13.8573, time_taken_in_seconds: 0
Epoch [1/1], Step [6302/13804], Loss: 2.4736, Perplexity: 11.8646, time_taken_in_seconds: 1
Epoch [1/1], Step [6303/13804], Loss: 2.7784, Perplexity: 16.0937, time_taken_in_seconds: 2
Epoch [1/1], Step [6304/13804], Loss: 2.5582, Perplexity: 12.9130, time_taken_in_seconds: 3
Epoch [1/1], Step [6305/13804], Loss: 2.4355, Perplexity: 11.4212, time_taken_in_seconds: 4
Epoch [1/1], Step [6306/13804], Loss: 2.7154, Perplexity: 15.1112, time_taken_in_seconds: 4
Epoch [1/1], Step [6307/13804], Loss: 2.6160, Perplexity: 13.6812, time_taken_in_seconds: 5
Epoch [1/1], Step [6308/13804], Loss: 2.5285, Perplexity: 12.5346, time_taken_in_seconds: 6
Epoch [1/1], Step [6309/13804], Loss: 2.7191, Perplexity: 15.1661, time_taken_in_seconds: 7
Epoch [1/1], Step [6310/13804], Loss: 2.7902, Perplexity: 16.2848, time_taken_in_seconds: 8
Epoch [1/1], Step [6311/13804], Loss: 2.8348, Perplexity: 17.0267, time_taken_in_seconds: 9
Epoch [1/1], Step [6312/13804], Loss: 2.5798, Perplexity: 13.1946, time_taken_in_seconds: 10
Epoch [1/1], Step [6313/13804], Loss: 2.7068, Perplexity: 14.9812, time_taken_in_seconds: 10
Epoch [1/1], Step [6314/13804], Loss: 3.3365, Perplexity: 28.1211, time_taken_in_seconds: 11
Epoch [1/1], Step [6315/13804], Loss: 2.5909, Perplexity: 13.3414, time_taken_in_seconds: 12
Epoch [1/1], Step [6316/13804], Loss: 2.5619, Perplexity: 12.9603, time_taken_in_seconds: 13
Epoch [1/1], Step [6317/13804], Loss: 2.4374, Perplexity: 11.4427, time_taken_in_seconds: 14
Epoch [1/1], Step [6318/13804], Loss: 2.4332, Perplexity: 11.3954, time_taken_in_seconds: 15
Epoch [1/1], Step [6319/13804], Loss: 2.7905, Perplexity: 16.2885, time_taken_in_seconds: 15
Epoch [1/1], Step [6320/13804], Loss: 3.0060, Perplexity: 20.2063, time_taken_in_seconds: 16
Epoch [1/1], Step [6321/13804], Loss: 3.0655, Perplexity: 21.4445, time_taken_in_seconds: 17
Epoch [1/1], Step [6322/13804], Loss: 2.6526, Perplexity: 14.1902, time_taken_in_seconds: 18
Epoch [1/1], Step [6323/13804], Loss: 2.4024, Perplexity: 11.0495, time_taken_in_seconds: 19
Epoch [1/1], Step [6324/13804], Loss: 3.1099, Perplexity: 22.4179, time_taken_in_seconds: 20
Epoch [1/1], Step [6325/13804], Loss: 2.5764, Perplexity: 13.1504, time_taken_in_seconds: 20
Epoch [1/1], Step [6326/13804], Loss: 2.9960, Perplexity: 20.0049, time_taken_in_seconds: 21
Epoch [1/1], Step [6327/13804], Loss: 2.9229, Perplexity: 18.5947, time_taken_in_seconds: 22
Epoch [1/1], Step [6328/13804], Loss: 2.6552, Perplexity: 14.2273, time_taken_in_seconds: 23
Epoch [1/1], Step [6329/13804], Loss: 2.4017, Perplexity: 11.0420, time_taken_in_seconds: 24
Epoch [1/1], Step [6330/13804], Loss: 2.4868, Perplexity: 12.0230, time_taken_in_seconds: 25
Epoch [1/1], Step [6331/13804], Loss: 2.5386, Perplexity: 12.6623, time_taken_in_seconds: 25
Epoch [1/1], Step [6332/13804], Loss: 2.7533, Perplexity: 15.6945, time_taken_in_seconds: 26
Epoch [1/1], Step [6333/13804], Loss: 2.5215, Perplexity: 12.4473, time_taken_in_seconds: 27
Epoch [1/1], Step [6334/13804], Loss: 2.6301, Perplexity: 13.8747, time_taken_in_seconds: 28
Epoch [1/1], Step [6335/13804], Loss: 2.6743, Perplexity: 14.5018, time_taken_in_seconds: 29
Epoch [1/1], Step [6336/13804], Loss: 2.7935, Perplexity: 16.3384, time_taken_in_seconds: 30
Epoch [1/1], Step [6337/13804], Loss: 2.7553, Perplexity: 15.7258, time_taken_in_seconds: 30
Epoch [1/1], Step [6338/13804], Loss: 2.9611, Perplexity: 19.3185, time_taken_in_seconds: 31
Epoch [1/1], Step [6339/13804], Loss: 3.0043, Perplexity: 20.1726, time_taken_in_seconds: 32
Epoch [1/1], Step [6340/13804], Loss: 2.5098, Perplexity: 12.3030, time_taken_in_seconds: 33
Epoch [1/1], Step [6341/13804], Loss: 2.9693, Perplexity: 19.4788, time_taken_in_seconds: 34
Epoch [1/1], Step [6342/13804], Loss: 2.5540, Perplexity: 12.8591, time_taken_in_seconds: 34
Epoch [1/1], Step [6343/13804], Loss: 2.6151, Perplexity: 13.6691, time_taken_in_seconds: 35
Epoch [1/1], Step [6344/13804], Loss: 2.2607, Perplexity: 9.5897, time_taken_in_seconds: 36
Epoch [1/1], Step [6345/13804], Loss: 2.6282, Perplexity: 13.8482, time_taken_in_seconds: 37
Epoch [1/1], Step [6346/13804], Loss: 2.5163, Perplexity: 12.3828, time_taken_in_seconds: 38
Epoch [1/1], Step [6347/13804], Loss: 2.4019, Perplexity: 11.0437, time_taken_in_seconds: 39
Epoch [1/1], Step [6348/13804], Loss: 2.7805, Perplexity: 16.1271, time_taken_in_seconds: 39
Epoch [1/1], Step [6349/13804], Loss: 2.9625, Perplexity: 19.3457, time_taken_in_seconds: 40
Epoch [1/1], Step [6350/13804], Loss: 2.5425, Perplexity: 12.7114, time_taken_in_seconds: 41
Epoch [1/1], Step [6351/13804], Loss: 2.7200, Perplexity: 15.1806, time_taken_in_seconds: 42
Epoch [1/1], Step [6352/13804], Loss: 2.9568, Perplexity: 19.2354, time_taken_in_seconds: 43
Epoch [1/1], Step [6353/13804], Loss: 3.0856, Perplexity: 21.8797, time_taken_in_seconds: 44
Epoch [1/1], Step [6354/13804], Loss: 2.4296, Perplexity: 11.3542, time_taken_in_seconds: 44
Epoch [1/1], Step [6355/13804], Loss: 2.9727, Perplexity: 19.5440, time_taken_in_seconds: 45
Epoch [1/1], Step [6356/13804], Loss: 2.7271, Perplexity: 15.2884, time_taken_in_seconds: 46
Epoch [1/1], Step [6357/13804], Loss: 2.4743, Perplexity: 11.8736, time_taken_in_seconds: 47
Epoch [1/1], Step [6358/13804], Loss: 2.7183, Perplexity: 15.1544, time_taken_in_seconds: 48
Epoch [1/1], Step [6359/13804], Loss: 2.8002, Perplexity: 16.4485, time_taken_in_seconds: 49
Epoch [1/1], Step [6360/13804], Loss: 2.7034, Perplexity: 14.9300, time_taken_in_seconds: 49
Epoch [1/1], Step [6361/13804], Loss: 2.5426, Perplexity: 12.7122, time_taken_in_seconds: 50
Epoch [1/1], Step [6362/13804], Loss: 2.4160, Perplexity: 11.2012, time_taken_in_seconds: 51
Epoch [1/1], Step [6363/13804], Loss: 2.9619, Perplexity: 19.3339, time_taken_in_seconds: 52
Epoch [1/1], Step [6364/13804], Loss: 2.4492, Perplexity: 11.5786, time_taken_in_seconds: 53
Epoch [1/1], Step [6365/13804], Loss: 2.6196, Perplexity: 13.7300, time_taken_in_seconds: 53
Epoch [1/1], Step [6366/13804], Loss: 2.5148, Perplexity: 12.3647, time_taken_in_seconds: 54
Epoch [1/1], Step [6367/13804], Loss: 2.5497, Perplexity: 12.8029, time_taken_in_seconds: 55
Epoch [1/1], Step [6368/13804], Loss: 2.5150, Perplexity: 12.3661, time_taken_in_seconds: 56
Epoch [1/1], Step [6369/13804], Loss: 3.0159, Perplexity: 20.4072, time_taken_in_seconds: 57
Epoch [1/1], Step [6370/13804], Loss: 2.5841, Perplexity: 13.2507, time_taken_in_seconds: 58
Epoch [1/1], Step [6371/13804], Loss: 3.5262, Perplexity: 33.9947, time_taken_in_seconds: 58
Epoch [1/1], Step [6372/13804], Loss: 2.5041, Perplexity: 12.2330, time_taken_in_seconds: 59
Epoch [1/1], Step [6373/13804], Loss: 2.6009, Perplexity: 13.4759, time_taken_in_seconds: 60
Epoch [1/1], Step [6374/13804], Loss: 2.8003, Perplexity: 16.4495, time_taken_in_seconds: 61
Epoch [1/1], Step [6375/13804], Loss: 2.6014, Perplexity: 13.4823, time_taken_in_seconds: 62
Epoch [1/1], Step [6376/13804], Loss: 2.5980, Perplexity: 13.4375, time_taken_in_seconds: 62
Epoch [1/1], Step [6377/13804], Loss: 2.8516, Perplexity: 17.3156, time_taken_in_seconds: 63
Epoch [1/1], Step [6378/13804], Loss: 3.0836, Perplexity: 21.8358, time_taken_in_seconds: 64
Epoch [1/1], Step [6379/13804], Loss: 3.0724, Perplexity: 21.5929, time_taken_in_seconds: 65
Epoch [1/1], Step [6380/13804], Loss: 2.8903, Perplexity: 17.9983, time_taken_in_seconds: 66
Epoch [1/1], Step [6381/13804], Loss: 2.7235, Perplexity: 15.2334, time_taken_in_seconds: 67
Epoch [1/1], Step [6382/13804], Loss: 2.8652, Perplexity: 17.5525, time_taken_in_seconds: 67
Epoch [1/1], Step [6383/13804], Loss: 2.3930, Perplexity: 10.9464, time_taken_in_seconds: 68
Epoch [1/1], Step [6384/13804], Loss: 2.4418, Perplexity: 11.4933, time_taken_in_seconds: 69
Epoch [1/1], Step [6385/13804], Loss: 2.7042, Perplexity: 14.9430, time_taken_in_seconds: 70
Epoch [1/1], Step [6386/13804], Loss: 2.8518, Perplexity: 17.3187, time_taken_in_seconds: 71
Epoch [1/1], Step [6387/13804], Loss: 2.5008, Perplexity: 12.1923, time_taken_in_seconds: 72
Epoch [1/1], Step [6388/13804], Loss: 2.5185, Perplexity: 12.4101, time_taken_in_seconds: 73
Epoch [1/1], Step [6389/13804], Loss: 2.7707, Perplexity: 15.9706, time_taken_in_seconds: 73
Epoch [1/1], Step [6390/13804], Loss: 2.3541, Perplexity: 10.5290, time_taken_in_seconds: 74
Epoch [1/1], Step [6391/13804], Loss: 2.5161, Perplexity: 12.3808, time_taken_in_seconds: 75
Epoch [1/1], Step [6392/13804], Loss: 2.3948, Perplexity: 10.9656, time_taken_in_seconds: 76
Epoch [1/1], Step [6393/13804], Loss: 2.8225, Perplexity: 16.8186, time_taken_in_seconds: 77
Epoch [1/1], Step [6394/13804], Loss: 2.6469, Perplexity: 14.1099, time_taken_in_seconds: 78
Epoch [1/1], Step [6395/13804], Loss: 2.5011, Perplexity: 12.1954, time_taken_in_seconds: 78
Epoch [1/1], Step [6396/13804], Loss: 3.0450, Perplexity: 21.0111, time_taken_in_seconds: 79
Epoch [1/1], Step [6397/13804], Loss: 2.6976, Perplexity: 14.8447, time_taken_in_seconds: 80
Epoch [1/1], Step [6398/13804], Loss: 2.6501, Perplexity: 14.1556, time_taken_in_seconds: 81
Epoch [1/1], Step [6399/13804], Loss: 2.8277, Perplexity: 16.9058, time_taken_in_seconds: 82
Epoch [1/1], Step [6400/13804], Loss: 2.6800, Perplexity: 14.5847, time_taken_in_seconds: 83
Epoch [1/1], Step [6401/13804], Loss: 3.6033, Perplexity: 36.7176, time_taken_in_seconds: 0
Epoch [1/1], Step [6402/13804], Loss: 2.3749, Perplexity: 10.7502, time_taken_in_seconds: 1
Epoch [1/1], Step [6403/13804], Loss: 2.8821, Perplexity: 17.8515, time_taken_in_seconds: 2
Epoch [1/1], Step [6404/13804], Loss: 2.6217, Perplexity: 13.7588, time_taken_in_seconds: 3
Epoch [1/1], Step [6405/13804], Loss: 2.6417, Perplexity: 14.0375, time_taken_in_seconds: 4
Epoch [1/1], Step [6406/13804], Loss: 3.0488, Perplexity: 21.0904, time_taken_in_seconds: 5
Epoch [1/1], Step [6407/13804], Loss: 2.7966, Perplexity: 16.3895, time_taken_in_seconds: 5
Epoch [1/1], Step [6408/13804], Loss: 2.7514, Perplexity: 15.6642, time_taken_in_seconds: 6
Epoch [1/1], Step [6409/13804], Loss: 3.2873, Perplexity: 26.7694, time_taken_in_seconds: 7
Epoch [1/1], Step [6410/13804], Loss: 2.7191, Perplexity: 15.1666, time_taken_in_seconds: 8
Epoch [1/1], Step [6411/13804], Loss: 2.5649, Perplexity: 12.9994, time_taken_in_seconds: 9
Epoch [1/1], Step [6412/13804], Loss: 2.5705, Perplexity: 13.0719, time_taken_in_seconds: 9
Epoch [1/1], Step [6413/13804], Loss: 2.9502, Perplexity: 19.1088, time_taken_in_seconds: 10
Epoch [1/1], Step [6414/13804], Loss: 2.7217, Perplexity: 15.2057, time_taken_in_seconds: 11
Epoch [1/1], Step [6415/13804], Loss: 2.8138, Perplexity: 16.6733, time_taken_in_seconds: 12
Epoch [1/1], Step [6416/13804], Loss: 2.8654, Perplexity: 17.5564, time_taken_in_seconds: 13
Epoch [1/1], Step [6417/13804], Loss: 2.5080, Perplexity: 12.2806, time_taken_in_seconds: 14
Epoch [1/1], Step [6418/13804], Loss: 2.8342, Perplexity: 17.0171, time_taken_in_seconds: 14
Epoch [1/1], Step [6419/13804], Loss: 2.7576, Perplexity: 15.7620, time_taken_in_seconds: 15
Epoch [1/1], Step [6420/13804], Loss: 2.6546, Perplexity: 14.2189, time_taken_in_seconds: 16
Epoch [1/1], Step [6421/13804], Loss: 2.7777, Perplexity: 16.0813, time_taken_in_seconds: 17
Epoch [1/1], Step [6422/13804], Loss: 2.6297, Perplexity: 13.8695, time_taken_in_seconds: 18
Epoch [1/1], Step [6423/13804], Loss: 2.8566, Perplexity: 17.4025, time_taken_in_seconds: 18
Epoch [1/1], Step [6424/13804], Loss: 2.7449, Perplexity: 15.5628, time_taken_in_seconds: 19
Epoch [1/1], Step [6425/13804], Loss: 2.5780, Perplexity: 13.1708, time_taken_in_seconds: 20
Epoch [1/1], Step [6426/13804], Loss: 2.7090, Perplexity: 15.0139, time_taken_in_seconds: 21
Epoch [1/1], Step [6427/13804], Loss: 2.6233, Perplexity: 13.7815, time_taken_in_seconds: 22
Epoch [1/1], Step [6428/13804], Loss: 3.2953, Perplexity: 26.9863, time_taken_in_seconds: 23
Epoch [1/1], Step [6429/13804], Loss: 2.5153, Perplexity: 12.3701, time_taken_in_seconds: 23
Epoch [1/1], Step [6430/13804], Loss: 2.7259, Perplexity: 15.2708, time_taken_in_seconds: 24
Epoch [1/1], Step [6431/13804], Loss: 2.5953, Perplexity: 13.4002, time_taken_in_seconds: 25
Epoch [1/1], Step [6432/13804], Loss: 2.7475, Perplexity: 15.6033, time_taken_in_seconds: 26
Epoch [1/1], Step [6433/13804], Loss: 2.6064, Perplexity: 13.5500, time_taken_in_seconds: 27
Epoch [1/1], Step [6434/13804], Loss: 2.9475, Perplexity: 19.0574, time_taken_in_seconds: 28
Epoch [1/1], Step [6435/13804], Loss: 2.5000, Perplexity: 12.1824, time_taken_in_seconds: 28
Epoch [1/1], Step [6436/13804], Loss: 2.3675, Perplexity: 10.6706, time_taken_in_seconds: 29
Epoch [1/1], Step [6437/13804], Loss: 2.6026, Perplexity: 13.4986, time_taken_in_seconds: 30
Epoch [1/1], Step [6438/13804], Loss: 2.9217, Perplexity: 18.5723, time_taken_in_seconds: 31
Epoch [1/1], Step [6439/13804], Loss: 2.5040, Perplexity: 12.2314, time_taken_in_seconds: 32
Epoch [1/1], Step [6440/13804], Loss: 2.8647, Perplexity: 17.5430, time_taken_in_seconds: 32
Epoch [1/1], Step [6441/13804], Loss: 2.8562, Perplexity: 17.3948, time_taken_in_seconds: 33
Epoch [1/1], Step [6442/13804], Loss: 3.0908, Perplexity: 21.9939, time_taken_in_seconds: 34
Epoch [1/1], Step [6443/13804], Loss: 2.6935, Perplexity: 14.7834, time_taken_in_seconds: 35
Epoch [1/1], Step [6444/13804], Loss: 2.8737, Perplexity: 17.7030, time_taken_in_seconds: 36
Epoch [1/1], Step [6445/13804], Loss: 2.6445, Perplexity: 14.0760, time_taken_in_seconds: 36
Epoch [1/1], Step [6446/13804], Loss: 2.7331, Perplexity: 15.3807, time_taken_in_seconds: 37
Epoch [1/1], Step [6447/13804], Loss: 2.7374, Perplexity: 15.4464, time_taken_in_seconds: 38
Epoch [1/1], Step [6448/13804], Loss: 2.5857, Perplexity: 13.2729, time_taken_in_seconds: 39
Epoch [1/1], Step [6449/13804], Loss: 2.5616, Perplexity: 12.9571, time_taken_in_seconds: 40
Epoch [1/1], Step [6450/13804], Loss: 2.9159, Perplexity: 18.4657, time_taken_in_seconds: 41
Epoch [1/1], Step [6451/13804], Loss: 2.4555, Perplexity: 11.6522, time_taken_in_seconds: 41
Epoch [1/1], Step [6452/13804], Loss: 2.5537, Perplexity: 12.8551, time_taken_in_seconds: 42
Epoch [1/1], Step [6453/13804], Loss: 2.7653, Perplexity: 15.8838, time_taken_in_seconds: 43
Epoch [1/1], Step [6454/13804], Loss: 3.0466, Perplexity: 21.0447, time_taken_in_seconds: 44
Epoch [1/1], Step [6455/13804], Loss: 2.7039, Perplexity: 14.9382, time_taken_in_seconds: 45
Epoch [1/1], Step [6456/13804], Loss: 2.6517, Perplexity: 14.1775, time_taken_in_seconds: 45
Epoch [1/1], Step [6457/13804], Loss: 3.0357, Perplexity: 20.8147, time_taken_in_seconds: 46
Epoch [1/1], Step [6458/13804], Loss: 2.7656, Perplexity: 15.8885, time_taken_in_seconds: 47
Epoch [1/1], Step [6459/13804], Loss: 2.4799, Perplexity: 11.9398, time_taken_in_seconds: 48
Epoch [1/1], Step [6460/13804], Loss: 2.6863, Perplexity: 14.6771, time_taken_in_seconds: 49
Epoch [1/1], Step [6461/13804], Loss: 2.9248, Perplexity: 18.6306, time_taken_in_seconds: 50
Epoch [1/1], Step [6462/13804], Loss: 2.4881, Perplexity: 12.0382, time_taken_in_seconds: 51
Epoch [1/1], Step [6463/13804], Loss: 2.6904, Perplexity: 14.7382, time_taken_in_seconds: 51
Epoch [1/1], Step [6464/13804], Loss: 2.6935, Perplexity: 14.7837, time_taken_in_seconds: 52
Epoch [1/1], Step [6465/13804], Loss: 2.3594, Perplexity: 10.5851, time_taken_in_seconds: 53
Epoch [1/1], Step [6466/13804], Loss: 2.3889, Perplexity: 10.9010, time_taken_in_seconds: 54
Epoch [1/1], Step [6467/13804], Loss: 2.6748, Perplexity: 14.5089, time_taken_in_seconds: 55
Epoch [1/1], Step [6468/13804], Loss: 2.4278, Perplexity: 11.3342, time_taken_in_seconds: 56
Epoch [1/1], Step [6469/13804], Loss: 2.6887, Perplexity: 14.7121, time_taken_in_seconds: 56
Epoch [1/1], Step [6470/13804], Loss: 2.5887, Perplexity: 13.3127, time_taken_in_seconds: 57
Epoch [1/1], Step [6471/13804], Loss: 2.7746, Perplexity: 16.0318, time_taken_in_seconds: 58
Epoch [1/1], Step [6472/13804], Loss: 2.6013, Perplexity: 13.4811, time_taken_in_seconds: 59
Epoch [1/1], Step [6473/13804], Loss: 2.8888, Perplexity: 17.9723, time_taken_in_seconds: 60
Epoch [1/1], Step [6474/13804], Loss: 2.1379, Perplexity: 8.4814, time_taken_in_seconds: 60
Epoch [1/1], Step [6475/13804], Loss: 2.3714, Perplexity: 10.7123, time_taken_in_seconds: 61
Epoch [1/1], Step [6476/13804], Loss: 2.6404, Perplexity: 14.0186, time_taken_in_seconds: 62
Epoch [1/1], Step [6477/13804], Loss: 2.5284, Perplexity: 12.5329, time_taken_in_seconds: 63
Epoch [1/1], Step [6478/13804], Loss: 2.7902, Perplexity: 16.2847, time_taken_in_seconds: 64
Epoch [1/1], Step [6479/13804], Loss: 2.9790, Perplexity: 19.6679, time_taken_in_seconds: 65
Epoch [1/1], Step [6480/13804], Loss: 2.3900, Perplexity: 10.9130, time_taken_in_seconds: 65
Epoch [1/1], Step [6481/13804], Loss: 3.4981, Perplexity: 33.0540, time_taken_in_seconds: 66
Epoch [1/1], Step [6482/13804], Loss: 3.0468, Perplexity: 21.0489, time_taken_in_seconds: 67
Epoch [1/1], Step [6483/13804], Loss: 2.7259, Perplexity: 15.2705, time_taken_in_seconds: 68
Epoch [1/1], Step [6484/13804], Loss: 3.2591, Perplexity: 26.0263, time_taken_in_seconds: 69
Epoch [1/1], Step [6485/13804], Loss: 2.5367, Perplexity: 12.6382, time_taken_in_seconds: 69
Epoch [1/1], Step [6486/13804], Loss: 2.3665, Perplexity: 10.6600, time_taken_in_seconds: 70
Epoch [1/1], Step [6487/13804], Loss: 2.6084, Perplexity: 13.5774, time_taken_in_seconds: 71
Epoch [1/1], Step [6488/13804], Loss: 2.6954, Perplexity: 14.8113, time_taken_in_seconds: 72
Epoch [1/1], Step [6489/13804], Loss: 2.6435, Perplexity: 14.0629, time_taken_in_seconds: 73
Epoch [1/1], Step [6490/13804], Loss: 3.0412, Perplexity: 20.9293, time_taken_in_seconds: 73
Epoch [1/1], Step [6491/13804], Loss: 2.4820, Perplexity: 11.9655, time_taken_in_seconds: 74
Epoch [1/1], Step [6492/13804], Loss: 2.8169, Perplexity: 16.7256, time_taken_in_seconds: 75
Epoch [1/1], Step [6493/13804], Loss: 2.6137, Perplexity: 13.6497, time_taken_in_seconds: 76
Epoch [1/1], Step [6494/13804], Loss: 2.8264, Perplexity: 16.8841, time_taken_in_seconds: 77
Epoch [1/1], Step [6495/13804], Loss: 2.5394, Perplexity: 12.6725, time_taken_in_seconds: 78
Epoch [1/1], Step [6496/13804], Loss: 2.8987, Perplexity: 18.1512, time_taken_in_seconds: 78
Epoch [1/1], Step [6497/13804], Loss: 2.8162, Perplexity: 16.7127, time_taken_in_seconds: 79
Epoch [1/1], Step [6498/13804], Loss: 3.2272, Perplexity: 25.2091, time_taken_in_seconds: 80
Epoch [1/1], Step [6499/13804], Loss: 2.6388, Perplexity: 13.9970, time_taken_in_seconds: 81
Epoch [1/1], Step [6500/13804], Loss: 2.7424, Perplexity: 15.5238, time_taken_in_seconds: 82
Epoch [1/1], Step [6501/13804], Loss: 2.7520, Perplexity: 15.6734, time_taken_in_seconds: 0
Epoch [1/1], Step [6502/13804], Loss: 3.1350, Perplexity: 22.9888, time_taken_in_seconds: 1
Epoch [1/1], Step [6503/13804], Loss: 2.6600, Perplexity: 14.2963, time_taken_in_seconds: 2
Epoch [1/1], Step [6504/13804], Loss: 3.0655, Perplexity: 21.4461, time_taken_in_seconds: 3
Epoch [1/1], Step [6505/13804], Loss: 2.4074, Perplexity: 11.1048, time_taken_in_seconds: 4
Epoch [1/1], Step [6506/13804], Loss: 2.3351, Perplexity: 10.3310, time_taken_in_seconds: 4
Epoch [1/1], Step [6507/13804], Loss: 2.1891, Perplexity: 8.9274, time_taken_in_seconds: 5
Epoch [1/1], Step [6508/13804], Loss: 2.8285, Perplexity: 16.9195, time_taken_in_seconds: 6
Epoch [1/1], Step [6509/13804], Loss: 2.7619, Perplexity: 15.8305, time_taken_in_seconds: 7
Epoch [1/1], Step [6510/13804], Loss: 2.5802, Perplexity: 13.1993, time_taken_in_seconds: 8
Epoch [1/1], Step [6511/13804], Loss: 2.6291, Perplexity: 13.8617, time_taken_in_seconds: 9
Epoch [1/1], Step [6512/13804], Loss: 3.6386, Perplexity: 38.0379, time_taken_in_seconds: 9
Epoch [1/1], Step [6513/13804], Loss: 2.6273, Perplexity: 13.8367, time_taken_in_seconds: 10
Epoch [1/1], Step [6514/13804], Loss: 2.7079, Perplexity: 14.9977, time_taken_in_seconds: 11
Epoch [1/1], Step [6515/13804], Loss: 2.8873, Perplexity: 17.9443, time_taken_in_seconds: 12
Epoch [1/1], Step [6516/13804], Loss: 2.7747, Perplexity: 16.0337, time_taken_in_seconds: 13
Epoch [1/1], Step [6517/13804], Loss: 2.4235, Perplexity: 11.2851, time_taken_in_seconds: 14
Epoch [1/1], Step [6518/13804], Loss: 2.8985, Perplexity: 18.1462, time_taken_in_seconds: 14
Epoch [1/1], Step [6519/13804], Loss: 2.4111, Perplexity: 11.1459, time_taken_in_seconds: 15
Epoch [1/1], Step [6520/13804], Loss: 2.8076, Perplexity: 16.5701, time_taken_in_seconds: 16
Epoch [1/1], Step [6521/13804], Loss: 2.4018, Perplexity: 11.0432, time_taken_in_seconds: 17
Epoch [1/1], Step [6522/13804], Loss: 2.4820, Perplexity: 11.9652, time_taken_in_seconds: 18
Epoch [1/1], Step [6523/13804], Loss: 2.8377, Perplexity: 17.0761, time_taken_in_seconds: 18
Epoch [1/1], Step [6524/13804], Loss: 2.4934, Perplexity: 12.1028, time_taken_in_seconds: 19
Epoch [1/1], Step [6525/13804], Loss: 2.4453, Perplexity: 11.5345, time_taken_in_seconds: 20
Epoch [1/1], Step [6526/13804], Loss: 2.6889, Perplexity: 14.7151, time_taken_in_seconds: 21
Epoch [1/1], Step [6527/13804], Loss: 2.9275, Perplexity: 18.6812, time_taken_in_seconds: 22
Epoch [1/1], Step [6528/13804], Loss: 2.5752, Perplexity: 13.1333, time_taken_in_seconds: 23
Epoch [1/1], Step [6529/13804], Loss: 2.9271, Perplexity: 18.6732, time_taken_in_seconds: 23
Epoch [1/1], Step [6530/13804], Loss: 3.0442, Perplexity: 20.9925, time_taken_in_seconds: 24
Epoch [1/1], Step [6531/13804], Loss: 3.3470, Perplexity: 28.4182, time_taken_in_seconds: 25
Epoch [1/1], Step [6532/13804], Loss: 2.5644, Perplexity: 12.9922, time_taken_in_seconds: 26
Epoch [1/1], Step [6533/13804], Loss: 2.9419, Perplexity: 18.9510, time_taken_in_seconds: 27
Epoch [1/1], Step [6534/13804], Loss: 2.4915, Perplexity: 12.0789, time_taken_in_seconds: 28
Epoch [1/1], Step [6535/13804], Loss: 2.7170, Perplexity: 15.1348, time_taken_in_seconds: 29
Epoch [1/1], Step [6536/13804], Loss: 2.5653, Perplexity: 13.0048, time_taken_in_seconds: 29
Epoch [1/1], Step [6537/13804], Loss: 2.9420, Perplexity: 18.9537, time_taken_in_seconds: 30
Epoch [1/1], Step [6538/13804], Loss: 2.8425, Perplexity: 17.1595, time_taken_in_seconds: 31
Epoch [1/1], Step [6539/13804], Loss: 2.4241, Perplexity: 11.2917, time_taken_in_seconds: 32
Epoch [1/1], Step [6540/13804], Loss: 3.5067, Perplexity: 33.3393, time_taken_in_seconds: 33
Epoch [1/1], Step [6541/13804], Loss: 2.1927, Perplexity: 8.9594, time_taken_in_seconds: 33
Epoch [1/1], Step [6542/13804], Loss: 2.7289, Perplexity: 15.3161, time_taken_in_seconds: 34
Epoch [1/1], Step [6543/13804], Loss: 2.8555, Perplexity: 17.3828, time_taken_in_seconds: 35
Epoch [1/1], Step [6544/13804], Loss: 2.6155, Perplexity: 13.6738, time_taken_in_seconds: 36
Epoch [1/1], Step [6545/13804], Loss: 2.5439, Perplexity: 12.7290, time_taken_in_seconds: 37
Epoch [1/1], Step [6546/13804], Loss: 2.4321, Perplexity: 11.3823, time_taken_in_seconds: 38
Epoch [1/1], Step [6547/13804], Loss: 2.4982, Perplexity: 12.1605, time_taken_in_seconds: 38
Epoch [1/1], Step [6548/13804], Loss: 2.5022, Perplexity: 12.2093, time_taken_in_seconds: 39
Epoch [1/1], Step [6549/13804], Loss: 2.5668, Perplexity: 13.0241, time_taken_in_seconds: 40
Epoch [1/1], Step [6550/13804], Loss: 2.2442, Perplexity: 9.4328, time_taken_in_seconds: 41
Epoch [1/1], Step [6551/13804], Loss: 2.5403, Perplexity: 12.6835, time_taken_in_seconds: 42
Epoch [1/1], Step [6552/13804], Loss: 2.6514, Perplexity: 14.1736, time_taken_in_seconds: 42
Epoch [1/1], Step [6553/13804], Loss: 2.4104, Perplexity: 11.1380, time_taken_in_seconds: 43
Epoch [1/1], Step [6554/13804], Loss: 2.5862, Perplexity: 13.2798, time_taken_in_seconds: 44
Epoch [1/1], Step [6555/13804], Loss: 2.9525, Perplexity: 19.1543, time_taken_in_seconds: 45
Epoch [1/1], Step [6556/13804], Loss: 2.5927, Perplexity: 13.3653, time_taken_in_seconds: 46
Epoch [1/1], Step [6557/13804], Loss: 2.9904, Perplexity: 19.8933, time_taken_in_seconds: 46
Epoch [1/1], Step [6558/13804], Loss: 2.7956, Perplexity: 16.3725, time_taken_in_seconds: 47
Epoch [1/1], Step [6559/13804], Loss: 2.7432, Perplexity: 15.5366, time_taken_in_seconds: 48
Epoch [1/1], Step [6560/13804], Loss: 2.8109, Perplexity: 16.6254, time_taken_in_seconds: 49
Epoch [1/1], Step [6561/13804], Loss: 2.7141, Perplexity: 15.0910, time_taken_in_seconds: 50
Epoch [1/1], Step [6562/13804], Loss: 2.4686, Perplexity: 11.8057, time_taken_in_seconds: 51
Epoch [1/1], Step [6563/13804], Loss: 2.7302, Perplexity: 15.3363, time_taken_in_seconds: 51
Epoch [1/1], Step [6564/13804], Loss: 2.5968, Perplexity: 13.4213, time_taken_in_seconds: 52
Epoch [1/1], Step [6565/13804], Loss: 2.7344, Perplexity: 15.3999, time_taken_in_seconds: 53
Epoch [1/1], Step [6566/13804], Loss: 3.4002, Perplexity: 29.9698, time_taken_in_seconds: 54
Epoch [1/1], Step [6567/13804], Loss: 2.8740, Perplexity: 17.7084, time_taken_in_seconds: 55
Epoch [1/1], Step [6568/13804], Loss: 2.4737, Perplexity: 11.8661, time_taken_in_seconds: 55
Epoch [1/1], Step [6569/13804], Loss: 3.2083, Perplexity: 24.7375, time_taken_in_seconds: 56
Epoch [1/1], Step [6570/13804], Loss: 2.4680, Perplexity: 11.7990, time_taken_in_seconds: 57
Epoch [1/1], Step [6571/13804], Loss: 2.4064, Perplexity: 11.0945, time_taken_in_seconds: 58
Epoch [1/1], Step [6572/13804], Loss: 2.4039, Perplexity: 11.0658, time_taken_in_seconds: 59
Epoch [1/1], Step [6573/13804], Loss: 3.1519, Perplexity: 23.3795, time_taken_in_seconds: 60
Epoch [1/1], Step [6574/13804], Loss: 2.4396, Perplexity: 11.4680, time_taken_in_seconds: 60
Epoch [1/1], Step [6575/13804], Loss: 2.4587, Perplexity: 11.6900, time_taken_in_seconds: 61
Epoch [1/1], Step [6576/13804], Loss: 2.9805, Perplexity: 19.6980, time_taken_in_seconds: 62
Epoch [1/1], Step [6577/13804], Loss: 3.2172, Perplexity: 24.9569, time_taken_in_seconds: 63
Epoch [1/1], Step [6578/13804], Loss: 2.3273, Perplexity: 10.2502, time_taken_in_seconds: 64
Epoch [1/1], Step [6579/13804], Loss: 2.7347, Perplexity: 15.4057, time_taken_in_seconds: 65
Epoch [1/1], Step [6580/13804], Loss: 2.1866, Perplexity: 8.9048, time_taken_in_seconds: 65
Epoch [1/1], Step [6581/13804], Loss: 2.9105, Perplexity: 18.3656, time_taken_in_seconds: 66
Epoch [1/1], Step [6582/13804], Loss: 2.9813, Perplexity: 19.7132, time_taken_in_seconds: 67
Epoch [1/1], Step [6583/13804], Loss: 3.0839, Perplexity: 21.8435, time_taken_in_seconds: 68
Epoch [1/1], Step [6584/13804], Loss: 2.6541, Perplexity: 14.2121, time_taken_in_seconds: 69
Epoch [1/1], Step [6585/13804], Loss: 2.3027, Perplexity: 10.0014, time_taken_in_seconds: 69
Epoch [1/1], Step [6586/13804], Loss: 2.7700, Perplexity: 15.9585, time_taken_in_seconds: 70
Epoch [1/1], Step [6587/13804], Loss: 2.1734, Perplexity: 8.7881, time_taken_in_seconds: 71
Epoch [1/1], Step [6588/13804], Loss: 2.8180, Perplexity: 16.7430, time_taken_in_seconds: 72
Epoch [1/1], Step [6589/13804], Loss: 2.7126, Perplexity: 15.0683, time_taken_in_seconds: 73
Epoch [1/1], Step [6590/13804], Loss: 2.1635, Perplexity: 8.7012, time_taken_in_seconds: 74
Epoch [1/1], Step [6591/13804], Loss: 2.6527, Perplexity: 14.1926, time_taken_in_seconds: 74
Epoch [1/1], Step [6592/13804], Loss: 2.3446, Perplexity: 10.4293, time_taken_in_seconds: 75
Epoch [1/1], Step [6593/13804], Loss: 2.4146, Perplexity: 11.1851, time_taken_in_seconds: 76
Epoch [1/1], Step [6594/13804], Loss: 2.6204, Perplexity: 13.7412, time_taken_in_seconds: 77
Epoch [1/1], Step [6595/13804], Loss: 2.8408, Perplexity: 17.1291, time_taken_in_seconds: 78
Epoch [1/1], Step [6596/13804], Loss: 2.5165, Perplexity: 12.3857, time_taken_in_seconds: 79
Epoch [1/1], Step [6597/13804], Loss: 2.5371, Perplexity: 12.6424, time_taken_in_seconds: 79
Epoch [1/1], Step [6598/13804], Loss: 2.6236, Perplexity: 13.7848, time_taken_in_seconds: 80
Epoch [1/1], Step [6599/13804], Loss: 2.5062, Perplexity: 12.2583, time_taken_in_seconds: 81
Epoch [1/1], Step [6600/13804], Loss: 2.6106, Perplexity: 13.6066, time_taken_in_seconds: 82
Epoch [1/1], Step [6601/13804], Loss: 2.5347, Perplexity: 12.6122, time_taken_in_seconds: 0
Epoch [1/1], Step [6602/13804], Loss: 2.6657, Perplexity: 14.3777, time_taken_in_seconds: 1
Epoch [1/1], Step [6603/13804], Loss: 2.5137, Perplexity: 12.3501, time_taken_in_seconds: 2
Epoch [1/1], Step [6604/13804], Loss: 2.5463, Perplexity: 12.7594, time_taken_in_seconds: 3
Epoch [1/1], Step [6605/13804], Loss: 3.3264, Perplexity: 27.8369, time_taken_in_seconds: 4
Epoch [1/1], Step [6606/13804], Loss: 2.5163, Perplexity: 12.3821, time_taken_in_seconds: 5
Epoch [1/1], Step [6607/13804], Loss: 2.9378, Perplexity: 18.8745, time_taken_in_seconds: 6
Epoch [1/1], Step [6608/13804], Loss: 2.5046, Perplexity: 12.2384, time_taken_in_seconds: 6
Epoch [1/1], Step [6609/13804], Loss: 2.6748, Perplexity: 14.5093, time_taken_in_seconds: 7
Epoch [1/1], Step [6610/13804], Loss: 2.5355, Perplexity: 12.6226, time_taken_in_seconds: 8
Epoch [1/1], Step [6611/13804], Loss: 2.3970, Perplexity: 10.9901, time_taken_in_seconds: 9
Epoch [1/1], Step [6612/13804], Loss: 2.8179, Perplexity: 16.7420, time_taken_in_seconds: 10
Epoch [1/1], Step [6613/13804], Loss: 2.5789, Perplexity: 13.1825, time_taken_in_seconds: 10
Epoch [1/1], Step [6614/13804], Loss: 2.9346, Perplexity: 18.8135, time_taken_in_seconds: 11
Epoch [1/1], Step [6615/13804], Loss: 2.6528, Perplexity: 14.1943, time_taken_in_seconds: 12
Epoch [1/1], Step [6616/13804], Loss: 2.7443, Perplexity: 15.5534, time_taken_in_seconds: 13
Epoch [1/1], Step [6617/13804], Loss: 2.5982, Perplexity: 13.4396, time_taken_in_seconds: 14
Epoch [1/1], Step [6618/13804], Loss: 2.5126, Perplexity: 12.3369, time_taken_in_seconds: 15
Epoch [1/1], Step [6619/13804], Loss: 2.8162, Perplexity: 16.7130, time_taken_in_seconds: 15
Epoch [1/1], Step [6620/13804], Loss: 2.9894, Perplexity: 19.8741, time_taken_in_seconds: 16
Epoch [1/1], Step [6621/13804], Loss: 2.5733, Perplexity: 13.1085, time_taken_in_seconds: 17
Epoch [1/1], Step [6622/13804], Loss: 2.5379, Perplexity: 12.6526, time_taken_in_seconds: 18
Epoch [1/1], Step [6623/13804], Loss: 2.8013, Perplexity: 16.4655, time_taken_in_seconds: 19
Epoch [1/1], Step [6624/13804], Loss: 2.5437, Perplexity: 12.7271, time_taken_in_seconds: 19
Epoch [1/1], Step [6625/13804], Loss: 2.4441, Perplexity: 11.5205, time_taken_in_seconds: 20
Epoch [1/1], Step [6626/13804], Loss: 2.6158, Perplexity: 13.6775, time_taken_in_seconds: 21
Epoch [1/1], Step [6627/13804], Loss: 2.4120, Perplexity: 11.1563, time_taken_in_seconds: 22
Epoch [1/1], Step [6628/13804], Loss: 2.8053, Perplexity: 16.5324, time_taken_in_seconds: 23
Epoch [1/1], Step [6629/13804], Loss: 2.6820, Perplexity: 14.6140, time_taken_in_seconds: 24
Epoch [1/1], Step [6630/13804], Loss: 2.4818, Perplexity: 11.9629, time_taken_in_seconds: 24
Epoch [1/1], Step [6631/13804], Loss: 2.7534, Perplexity: 15.6962, time_taken_in_seconds: 25
Epoch [1/1], Step [6632/13804], Loss: 2.4629, Perplexity: 11.7385, time_taken_in_seconds: 26
Epoch [1/1], Step [6633/13804], Loss: 2.6276, Perplexity: 13.8410, time_taken_in_seconds: 27
Epoch [1/1], Step [6634/13804], Loss: 2.8632, Perplexity: 17.5180, time_taken_in_seconds: 28
Epoch [1/1], Step [6635/13804], Loss: 2.7265, Perplexity: 15.2788, time_taken_in_seconds: 28
Epoch [1/1], Step [6636/13804], Loss: 2.6029, Perplexity: 13.5034, time_taken_in_seconds: 29
Epoch [1/1], Step [6637/13804], Loss: 2.0619, Perplexity: 7.8609, time_taken_in_seconds: 30
Epoch [1/1], Step [6638/13804], Loss: 2.6557, Perplexity: 14.2343, time_taken_in_seconds: 31
Epoch [1/1], Step [6639/13804], Loss: 2.4484, Perplexity: 11.5694, time_taken_in_seconds: 32
Epoch [1/1], Step [6640/13804], Loss: 2.3646, Perplexity: 10.6397, time_taken_in_seconds: 33
Epoch [1/1], Step [6641/13804], Loss: 2.7713, Perplexity: 15.9800, time_taken_in_seconds: 33
Epoch [1/1], Step [6642/13804], Loss: 2.7242, Perplexity: 15.2439, time_taken_in_seconds: 34
Epoch [1/1], Step [6643/13804], Loss: 2.8639, Perplexity: 17.5299, time_taken_in_seconds: 35
Epoch [1/1], Step [6644/13804], Loss: 2.3565, Perplexity: 10.5537, time_taken_in_seconds: 36
Epoch [1/1], Step [6645/13804], Loss: 2.6565, Perplexity: 14.2470, time_taken_in_seconds: 37
Epoch [1/1], Step [6646/13804], Loss: 2.4841, Perplexity: 11.9903, time_taken_in_seconds: 37
Epoch [1/1], Step [6647/13804], Loss: 2.3634, Perplexity: 10.6271, time_taken_in_seconds: 38
Epoch [1/1], Step [6648/13804], Loss: 2.5105, Perplexity: 12.3110, time_taken_in_seconds: 39
Epoch [1/1], Step [6649/13804], Loss: 2.5466, Perplexity: 12.7641, time_taken_in_seconds: 40
Epoch [1/1], Step [6650/13804], Loss: 2.7507, Perplexity: 15.6531, time_taken_in_seconds: 41
Epoch [1/1], Step [6651/13804], Loss: 2.6377, Perplexity: 13.9816, time_taken_in_seconds: 41
Epoch [1/1], Step [6652/13804], Loss: 2.3479, Perplexity: 10.4631, time_taken_in_seconds: 42
Epoch [1/1], Step [6653/13804], Loss: 2.6973, Perplexity: 14.8392, time_taken_in_seconds: 43
Epoch [1/1], Step [6654/13804], Loss: 2.2163, Perplexity: 9.1729, time_taken_in_seconds: 44
Epoch [1/1], Step [6655/13804], Loss: 3.0658, Perplexity: 21.4514, time_taken_in_seconds: 45
Epoch [1/1], Step [6656/13804], Loss: 2.5183, Perplexity: 12.4075, time_taken_in_seconds: 46
Epoch [1/1], Step [6657/13804], Loss: 2.6916, Perplexity: 14.7558, time_taken_in_seconds: 46
Epoch [1/1], Step [6658/13804], Loss: 2.5665, Perplexity: 13.0197, time_taken_in_seconds: 47
Epoch [1/1], Step [6659/13804], Loss: 2.8393, Perplexity: 17.1033, time_taken_in_seconds: 48
Epoch [1/1], Step [6660/13804], Loss: 2.2504, Perplexity: 9.4914, time_taken_in_seconds: 49
Epoch [1/1], Step [6661/13804], Loss: 2.4923, Perplexity: 12.0896, time_taken_in_seconds: 50
Epoch [1/1], Step [6662/13804], Loss: 2.4631, Perplexity: 11.7413, time_taken_in_seconds: 50
Epoch [1/1], Step [6663/13804], Loss: 2.7672, Perplexity: 15.9142, time_taken_in_seconds: 51
Epoch [1/1], Step [6664/13804], Loss: 2.6594, Perplexity: 14.2882, time_taken_in_seconds: 52
Epoch [1/1], Step [6665/13804], Loss: 2.5195, Perplexity: 12.4225, time_taken_in_seconds: 53
Epoch [1/1], Step [6666/13804], Loss: 2.4482, Perplexity: 11.5669, time_taken_in_seconds: 54
Epoch [1/1], Step [6667/13804], Loss: 2.6327, Perplexity: 13.9119, time_taken_in_seconds: 55
Epoch [1/1], Step [6668/13804], Loss: 2.2679, Perplexity: 9.6590, time_taken_in_seconds: 55
Epoch [1/1], Step [6669/13804], Loss: 2.6879, Perplexity: 14.7005, time_taken_in_seconds: 56
Epoch [1/1], Step [6670/13804], Loss: 2.6144, Perplexity: 13.6587, time_taken_in_seconds: 57
Epoch [1/1], Step [6671/13804], Loss: 2.8666, Perplexity: 17.5776, time_taken_in_seconds: 58
Epoch [1/1], Step [6672/13804], Loss: 2.8752, Perplexity: 17.7285, time_taken_in_seconds: 59
Epoch [1/1], Step [6673/13804], Loss: 2.5667, Perplexity: 13.0227, time_taken_in_seconds: 59
Epoch [1/1], Step [6674/13804], Loss: 2.7391, Perplexity: 15.4727, time_taken_in_seconds: 60
Epoch [1/1], Step [6675/13804], Loss: 2.6433, Perplexity: 14.0598, time_taken_in_seconds: 61
Epoch [1/1], Step [6676/13804], Loss: 2.5949, Perplexity: 13.3953, time_taken_in_seconds: 62
Epoch [1/1], Step [6677/13804], Loss: 2.7563, Perplexity: 15.7408, time_taken_in_seconds: 63
Epoch [1/1], Step [6678/13804], Loss: 2.8632, Perplexity: 17.5184, time_taken_in_seconds: 64
Epoch [1/1], Step [6679/13804], Loss: 2.9501, Perplexity: 19.1082, time_taken_in_seconds: 65
Epoch [1/1], Step [6680/13804], Loss: 2.6324, Perplexity: 13.9077, time_taken_in_seconds: 65
Epoch [1/1], Step [6681/13804], Loss: 2.9113, Perplexity: 18.3810, time_taken_in_seconds: 66
Epoch [1/1], Step [6682/13804], Loss: 2.5350, Perplexity: 12.6164, time_taken_in_seconds: 67
Epoch [1/1], Step [6683/13804], Loss: 2.4072, Perplexity: 11.1032, time_taken_in_seconds: 68
Epoch [1/1], Step [6684/13804], Loss: 2.4489, Perplexity: 11.5760, time_taken_in_seconds: 69
Epoch [1/1], Step [6685/13804], Loss: 2.7073, Perplexity: 14.9885, time_taken_in_seconds: 69
Epoch [1/1], Step [6686/13804], Loss: 3.0607, Perplexity: 21.3417, time_taken_in_seconds: 70
Epoch [1/1], Step [6687/13804], Loss: 2.9888, Perplexity: 19.8613, time_taken_in_seconds: 71
Epoch [1/1], Step [6688/13804], Loss: 2.9880, Perplexity: 19.8466, time_taken_in_seconds: 72
Epoch [1/1], Step [6689/13804], Loss: 2.5870, Perplexity: 13.2899, time_taken_in_seconds: 73
Epoch [1/1], Step [6690/13804], Loss: 2.7105, Perplexity: 15.0375, time_taken_in_seconds: 74
Epoch [1/1], Step [6691/13804], Loss: 2.7125, Perplexity: 15.0663, time_taken_in_seconds: 74
Epoch [1/1], Step [6692/13804], Loss: 3.1283, Perplexity: 22.8349, time_taken_in_seconds: 75
Epoch [1/1], Step [6693/13804], Loss: 2.5967, Perplexity: 13.4191, time_taken_in_seconds: 76
Epoch [1/1], Step [6694/13804], Loss: 2.4263, Perplexity: 11.3173, time_taken_in_seconds: 77
Epoch [1/1], Step [6695/13804], Loss: 3.5826, Perplexity: 35.9673, time_taken_in_seconds: 78
Epoch [1/1], Step [6696/13804], Loss: 2.5635, Perplexity: 12.9805, time_taken_in_seconds: 79
Epoch [1/1], Step [6697/13804], Loss: 2.7798, Perplexity: 16.1164, time_taken_in_seconds: 79
Epoch [1/1], Step [6698/13804], Loss: 3.0654, Perplexity: 21.4427, time_taken_in_seconds: 80
Epoch [1/1], Step [6699/13804], Loss: 2.6772, Perplexity: 14.5437, time_taken_in_seconds: 81
Epoch [1/1], Step [6700/13804], Loss: 2.6688, Perplexity: 14.4230, time_taken_in_seconds: 82
Epoch [1/1], Step [6701/13804], Loss: 3.7068, Perplexity: 40.7243, time_taken_in_seconds: 0
Epoch [1/1], Step [6702/13804], Loss: 2.5730, Perplexity: 13.1049, time_taken_in_seconds: 1
Epoch [1/1], Step [6703/13804], Loss: 2.5512, Perplexity: 12.8229, time_taken_in_seconds: 2
Epoch [1/1], Step [6704/13804], Loss: 2.7198, Perplexity: 15.1776, time_taken_in_seconds: 3
Epoch [1/1], Step [6705/13804], Loss: 2.8502, Perplexity: 17.2911, time_taken_in_seconds: 4
Epoch [1/1], Step [6706/13804], Loss: 4.7102, Perplexity: 111.0771, time_taken_in_seconds: 4
Epoch [1/1], Step [6707/13804], Loss: 2.7432, Perplexity: 15.5359, time_taken_in_seconds: 5
Epoch [1/1], Step [6708/13804], Loss: 2.7177, Perplexity: 15.1450, time_taken_in_seconds: 6
Epoch [1/1], Step [6709/13804], Loss: 2.6893, Perplexity: 14.7218, time_taken_in_seconds: 7
Epoch [1/1], Step [6710/13804], Loss: 2.2862, Perplexity: 9.8371, time_taken_in_seconds: 8
Epoch [1/1], Step [6711/13804], Loss: 2.9164, Perplexity: 18.4750, time_taken_in_seconds: 9
Epoch [1/1], Step [6712/13804], Loss: 2.6079, Perplexity: 13.5710, time_taken_in_seconds: 9
Epoch [1/1], Step [6713/13804], Loss: 2.7228, Perplexity: 15.2231, time_taken_in_seconds: 10
Epoch [1/1], Step [6714/13804], Loss: 2.6993, Perplexity: 14.8699, time_taken_in_seconds: 11
Epoch [1/1], Step [6715/13804], Loss: 2.9870, Perplexity: 19.8256, time_taken_in_seconds: 12
Epoch [1/1], Step [6716/13804], Loss: 2.4343, Perplexity: 11.4073, time_taken_in_seconds: 13
Epoch [1/1], Step [6717/13804], Loss: 2.4486, Perplexity: 11.5718, time_taken_in_seconds: 14
Epoch [1/1], Step [6718/13804], Loss: 2.5748, Perplexity: 13.1290, time_taken_in_seconds: 14
Epoch [1/1], Step [6719/13804], Loss: 2.4121, Perplexity: 11.1569, time_taken_in_seconds: 15
Epoch [1/1], Step [6720/13804], Loss: 2.6839, Perplexity: 14.6416, time_taken_in_seconds: 16
Epoch [1/1], Step [6721/13804], Loss: 2.4790, Perplexity: 11.9296, time_taken_in_seconds: 17
Epoch [1/1], Step [6722/13804], Loss: 2.3250, Perplexity: 10.2264, time_taken_in_seconds: 18
Epoch [1/1], Step [6723/13804], Loss: 2.6720, Perplexity: 14.4696, time_taken_in_seconds: 18
Epoch [1/1], Step [6724/13804], Loss: 2.8259, Perplexity: 16.8754, time_taken_in_seconds: 19
Epoch [1/1], Step [6725/13804], Loss: 2.6530, Perplexity: 14.1971, time_taken_in_seconds: 20
Epoch [1/1], Step [6726/13804], Loss: 2.5164, Perplexity: 12.3840, time_taken_in_seconds: 21
Epoch [1/1], Step [6727/13804], Loss: 2.6461, Perplexity: 14.0993, time_taken_in_seconds: 22
Epoch [1/1], Step [6728/13804], Loss: 2.4983, Perplexity: 12.1622, time_taken_in_seconds: 22
Epoch [1/1], Step [6729/13804], Loss: 2.8556, Perplexity: 17.3843, time_taken_in_seconds: 23
Epoch [1/1], Step [6730/13804], Loss: 2.6686, Perplexity: 14.4201, time_taken_in_seconds: 24
Epoch [1/1], Step [6731/13804], Loss: 2.6886, Perplexity: 14.7113, time_taken_in_seconds: 25
Epoch [1/1], Step [6732/13804], Loss: 2.5615, Perplexity: 12.9548, time_taken_in_seconds: 26
Epoch [1/1], Step [6733/13804], Loss: 3.0516, Perplexity: 21.1500, time_taken_in_seconds: 27
Epoch [1/1], Step [6734/13804], Loss: 2.7463, Perplexity: 15.5851, time_taken_in_seconds: 27
Epoch [1/1], Step [6735/13804], Loss: 2.5372, Perplexity: 12.6441, time_taken_in_seconds: 28
Epoch [1/1], Step [6736/13804], Loss: 2.8550, Perplexity: 17.3738, time_taken_in_seconds: 29
Epoch [1/1], Step [6737/13804], Loss: 2.6580, Perplexity: 14.2677, time_taken_in_seconds: 30
Epoch [1/1], Step [6738/13804], Loss: 2.4828, Perplexity: 11.9746, time_taken_in_seconds: 31
Epoch [1/1], Step [6739/13804], Loss: 2.6518, Perplexity: 14.1794, time_taken_in_seconds: 32
Epoch [1/1], Step [6740/13804], Loss: 2.6976, Perplexity: 14.8437, time_taken_in_seconds: 32
Epoch [1/1], Step [6741/13804], Loss: 2.7514, Perplexity: 15.6641, time_taken_in_seconds: 33
Epoch [1/1], Step [6742/13804], Loss: 2.5095, Perplexity: 12.2983, time_taken_in_seconds: 34
Epoch [1/1], Step [6743/13804], Loss: 2.7083, Perplexity: 15.0031, time_taken_in_seconds: 35
Epoch [1/1], Step [6744/13804], Loss: 2.6973, Perplexity: 14.8390, time_taken_in_seconds: 36
Epoch [1/1], Step [6745/13804], Loss: 3.1114, Perplexity: 22.4517, time_taken_in_seconds: 36
Epoch [1/1], Step [6746/13804], Loss: 2.2399, Perplexity: 9.3922, time_taken_in_seconds: 37
Epoch [1/1], Step [6747/13804], Loss: 3.7834, Perplexity: 43.9657, time_taken_in_seconds: 38
Epoch [1/1], Step [6748/13804], Loss: 2.8582, Perplexity: 17.4305, time_taken_in_seconds: 39
Epoch [1/1], Step [6749/13804], Loss: 2.3657, Perplexity: 10.6511, time_taken_in_seconds: 40
Epoch [1/1], Step [6750/13804], Loss: 2.4499, Perplexity: 11.5874, time_taken_in_seconds: 41
Epoch [1/1], Step [6751/13804], Loss: 2.5440, Perplexity: 12.7300, time_taken_in_seconds: 42
Epoch [1/1], Step [6752/13804], Loss: 2.1156, Perplexity: 8.2943, time_taken_in_seconds: 42
Epoch [1/1], Step [6753/13804], Loss: 2.9641, Perplexity: 19.3766, time_taken_in_seconds: 43
Epoch [1/1], Step [6754/13804], Loss: 2.3478, Perplexity: 10.4628, time_taken_in_seconds: 44
Epoch [1/1], Step [6755/13804], Loss: 3.1439, Perplexity: 23.1952, time_taken_in_seconds: 45
Epoch [1/1], Step [6756/13804], Loss: 2.3809, Perplexity: 10.8141, time_taken_in_seconds: 46
Epoch [1/1], Step [6757/13804], Loss: 2.4780, Perplexity: 11.9176, time_taken_in_seconds: 47
Epoch [1/1], Step [6758/13804], Loss: 2.9714, Perplexity: 19.5189, time_taken_in_seconds: 47
Epoch [1/1], Step [6759/13804], Loss: 2.6049, Perplexity: 13.5299, time_taken_in_seconds: 48
Epoch [1/1], Step [6760/13804], Loss: 2.6531, Perplexity: 14.1973, time_taken_in_seconds: 49
Epoch [1/1], Step [6761/13804], Loss: 2.6356, Perplexity: 13.9517, time_taken_in_seconds: 50
Epoch [1/1], Step [6762/13804], Loss: 2.8266, Perplexity: 16.8888, time_taken_in_seconds: 51
Epoch [1/1], Step [6763/13804], Loss: 2.9900, Perplexity: 19.8865, time_taken_in_seconds: 52
Epoch [1/1], Step [6764/13804], Loss: 2.8269, Perplexity: 16.8927, time_taken_in_seconds: 52
Epoch [1/1], Step [6765/13804], Loss: 2.9776, Perplexity: 19.6412, time_taken_in_seconds: 53
Epoch [1/1], Step [6766/13804], Loss: 2.3379, Perplexity: 10.3591, time_taken_in_seconds: 54
Epoch [1/1], Step [6767/13804], Loss: 3.1473, Perplexity: 23.2737, time_taken_in_seconds: 55
Epoch [1/1], Step [6768/13804], Loss: 2.6397, Perplexity: 14.0092, time_taken_in_seconds: 56
Epoch [1/1], Step [6769/13804], Loss: 2.3682, Perplexity: 10.6782, time_taken_in_seconds: 57
Epoch [1/1], Step [6770/13804], Loss: 2.6640, Perplexity: 14.3536, time_taken_in_seconds: 57
Epoch [1/1], Step [6771/13804], Loss: 2.7934, Perplexity: 16.3360, time_taken_in_seconds: 58
Epoch [1/1], Step [6772/13804], Loss: 2.7000, Perplexity: 14.8794, time_taken_in_seconds: 59
Epoch [1/1], Step [6773/13804], Loss: 2.6039, Perplexity: 13.5169, time_taken_in_seconds: 60
Epoch [1/1], Step [6774/13804], Loss: 2.6493, Perplexity: 14.1438, time_taken_in_seconds: 61
Epoch [1/1], Step [6775/13804], Loss: 2.8266, Perplexity: 16.8887, time_taken_in_seconds: 61
Epoch [1/1], Step [6776/13804], Loss: 3.0959, Perplexity: 22.1071, time_taken_in_seconds: 62
Epoch [1/1], Step [6777/13804], Loss: 2.5799, Perplexity: 13.1961, time_taken_in_seconds: 63
Epoch [1/1], Step [6778/13804], Loss: 2.8923, Perplexity: 18.0343, time_taken_in_seconds: 64
Epoch [1/1], Step [6779/13804], Loss: 2.8285, Perplexity: 16.9202, time_taken_in_seconds: 65
Epoch [1/1], Step [6780/13804], Loss: 2.8002, Perplexity: 16.4481, time_taken_in_seconds: 66
Epoch [1/1], Step [6781/13804], Loss: 2.5462, Perplexity: 12.7583, time_taken_in_seconds: 66
Epoch [1/1], Step [6782/13804], Loss: 3.4082, Perplexity: 30.2116, time_taken_in_seconds: 67
Epoch [1/1], Step [6783/13804], Loss: 2.4908, Perplexity: 12.0708, time_taken_in_seconds: 68
Epoch [1/1], Step [6784/13804], Loss: 2.7790, Perplexity: 16.1037, time_taken_in_seconds: 69
Epoch [1/1], Step [6785/13804], Loss: 2.7501, Perplexity: 15.6440, time_taken_in_seconds: 70
Epoch [1/1], Step [6786/13804], Loss: 2.6046, Perplexity: 13.5253, time_taken_in_seconds: 70
Epoch [1/1], Step [6787/13804], Loss: 2.5985, Perplexity: 13.4431, time_taken_in_seconds: 71
Epoch [1/1], Step [6788/13804], Loss: 2.3883, Perplexity: 10.8953, time_taken_in_seconds: 72
Epoch [1/1], Step [6789/13804], Loss: 2.7379, Perplexity: 15.4542, time_taken_in_seconds: 73
Epoch [1/1], Step [6790/13804], Loss: 2.9061, Perplexity: 18.2850, time_taken_in_seconds: 74
Epoch [1/1], Step [6791/13804], Loss: 2.5388, Perplexity: 12.6644, time_taken_in_seconds: 75
Epoch [1/1], Step [6792/13804], Loss: 3.6811, Perplexity: 39.6883, time_taken_in_seconds: 75
Epoch [1/1], Step [6793/13804], Loss: 2.6400, Perplexity: 14.0134, time_taken_in_seconds: 76
Epoch [1/1], Step [6794/13804], Loss: 2.5409, Perplexity: 12.6912, time_taken_in_seconds: 77
Epoch [1/1], Step [6795/13804], Loss: 2.7848, Perplexity: 16.1966, time_taken_in_seconds: 78
Epoch [1/1], Step [6796/13804], Loss: 2.6408, Perplexity: 14.0249, time_taken_in_seconds: 79
Epoch [1/1], Step [6797/13804], Loss: 2.6488, Perplexity: 14.1366, time_taken_in_seconds: 80
Epoch [1/1], Step [6798/13804], Loss: 2.3131, Perplexity: 10.1053, time_taken_in_seconds: 80
Epoch [1/1], Step [6799/13804], Loss: 2.8220, Perplexity: 16.8107, time_taken_in_seconds: 81
Epoch [1/1], Step [6800/13804], Loss: 2.5171, Perplexity: 12.3924, time_taken_in_seconds: 82
Epoch [1/1], Step [6801/13804], Loss: 2.6138, Perplexity: 13.6505, time_taken_in_seconds: 0
Epoch [1/1], Step [6802/13804], Loss: 2.5003, Perplexity: 12.1859, time_taken_in_seconds: 1
Epoch [1/1], Step [6803/13804], Loss: 2.8994, Perplexity: 18.1637, time_taken_in_seconds: 2
Epoch [1/1], Step [6804/13804], Loss: 2.4746, Perplexity: 11.8771, time_taken_in_seconds: 3
Epoch [1/1], Step [6805/13804], Loss: 2.5215, Perplexity: 12.4471, time_taken_in_seconds: 4
Epoch [1/1], Step [6806/13804], Loss: 2.5368, Perplexity: 12.6391, time_taken_in_seconds: 4
Epoch [1/1], Step [6807/13804], Loss: 2.5985, Perplexity: 13.4440, time_taken_in_seconds: 5
Epoch [1/1], Step [6808/13804], Loss: 3.2839, Perplexity: 26.6801, time_taken_in_seconds: 6
Epoch [1/1], Step [6809/13804], Loss: 2.4048, Perplexity: 11.0759, time_taken_in_seconds: 7
Epoch [1/1], Step [6810/13804], Loss: 2.5258, Perplexity: 12.5008, time_taken_in_seconds: 8
Epoch [1/1], Step [6811/13804], Loss: 2.1903, Perplexity: 8.9379, time_taken_in_seconds: 9
Epoch [1/1], Step [6812/13804], Loss: 2.7182, Perplexity: 15.1523, time_taken_in_seconds: 9
Epoch [1/1], Step [6813/13804], Loss: 2.6767, Perplexity: 14.5373, time_taken_in_seconds: 10
Epoch [1/1], Step [6814/13804], Loss: 2.5642, Perplexity: 12.9899, time_taken_in_seconds: 11
Epoch [1/1], Step [6815/13804], Loss: 2.8148, Perplexity: 16.6897, time_taken_in_seconds: 12
Epoch [1/1], Step [6816/13804], Loss: 2.5980, Perplexity: 13.4368, time_taken_in_seconds: 13
Epoch [1/1], Step [6817/13804], Loss: 2.4286, Perplexity: 11.3433, time_taken_in_seconds: 14
Epoch [1/1], Step [6818/13804], Loss: 2.6594, Perplexity: 14.2877, time_taken_in_seconds: 14
Epoch [1/1], Step [6819/13804], Loss: 2.5666, Perplexity: 13.0211, time_taken_in_seconds: 15
Epoch [1/1], Step [6820/13804], Loss: 2.9767, Perplexity: 19.6225, time_taken_in_seconds: 16
Epoch [1/1], Step [6821/13804], Loss: 2.7145, Perplexity: 15.0974, time_taken_in_seconds: 17
Epoch [1/1], Step [6822/13804], Loss: 2.9675, Perplexity: 19.4436, time_taken_in_seconds: 18
Epoch [1/1], Step [6823/13804], Loss: 2.4790, Perplexity: 11.9296, time_taken_in_seconds: 18
Epoch [1/1], Step [6824/13804], Loss: 2.8619, Perplexity: 17.4948, time_taken_in_seconds: 19
Epoch [1/1], Step [6825/13804], Loss: 2.8447, Perplexity: 17.1967, time_taken_in_seconds: 20
Epoch [1/1], Step [6826/13804], Loss: 2.6224, Perplexity: 13.7683, time_taken_in_seconds: 21
Epoch [1/1], Step [6827/13804], Loss: 2.5389, Perplexity: 12.6663, time_taken_in_seconds: 22
Epoch [1/1], Step [6828/13804], Loss: 2.4563, Perplexity: 11.6611, time_taken_in_seconds: 23
Epoch [1/1], Step [6829/13804], Loss: 2.5828, Perplexity: 13.2346, time_taken_in_seconds: 24
Epoch [1/1], Step [6830/13804], Loss: 2.9470, Perplexity: 19.0480, time_taken_in_seconds: 24
Epoch [1/1], Step [6831/13804], Loss: 2.5375, Perplexity: 12.6475, time_taken_in_seconds: 25
Epoch [1/1], Step [6832/13804], Loss: 2.5612, Perplexity: 12.9509, time_taken_in_seconds: 26
Epoch [1/1], Step [6833/13804], Loss: 2.4738, Perplexity: 11.8673, time_taken_in_seconds: 27
Epoch [1/1], Step [6834/13804], Loss: 3.0511, Perplexity: 21.1395, time_taken_in_seconds: 28
Epoch [1/1], Step [6835/13804], Loss: 2.6240, Perplexity: 13.7902, time_taken_in_seconds: 29
Epoch [1/1], Step [6836/13804], Loss: 3.4699, Perplexity: 32.1335, time_taken_in_seconds: 29
Epoch [1/1], Step [6837/13804], Loss: 2.6044, Perplexity: 13.5226, time_taken_in_seconds: 30
Epoch [1/1], Step [6838/13804], Loss: 2.5582, Perplexity: 12.9126, time_taken_in_seconds: 31
Epoch [1/1], Step [6839/13804], Loss: 2.8807, Perplexity: 17.8276, time_taken_in_seconds: 32
Epoch [1/1], Step [6840/13804], Loss: 2.3354, Perplexity: 10.3332, time_taken_in_seconds: 33
Epoch [1/1], Step [6841/13804], Loss: 2.4637, Perplexity: 11.7487, time_taken_in_seconds: 33
Epoch [1/1], Step [6842/13804], Loss: 2.4174, Perplexity: 11.2167, time_taken_in_seconds: 34
Epoch [1/1], Step [6843/13804], Loss: 2.5060, Perplexity: 12.2559, time_taken_in_seconds: 35
Epoch [1/1], Step [6844/13804], Loss: 2.7857, Perplexity: 16.2112, time_taken_in_seconds: 36
Epoch [1/1], Step [6845/13804], Loss: 2.1556, Perplexity: 8.6334, time_taken_in_seconds: 37
Epoch [1/1], Step [6846/13804], Loss: 2.6200, Perplexity: 13.7355, time_taken_in_seconds: 38
Epoch [1/1], Step [6847/13804], Loss: 2.5592, Perplexity: 12.9256, time_taken_in_seconds: 38
Epoch [1/1], Step [6848/13804], Loss: 2.7886, Perplexity: 16.2579, time_taken_in_seconds: 39
Epoch [1/1], Step [6849/13804], Loss: 2.8682, Perplexity: 17.6058, time_taken_in_seconds: 40
Epoch [1/1], Step [6850/13804], Loss: 2.7833, Perplexity: 16.1718, time_taken_in_seconds: 41
Epoch [1/1], Step [6851/13804], Loss: 3.0728, Perplexity: 21.6028, time_taken_in_seconds: 42
Epoch [1/1], Step [6852/13804], Loss: 2.5809, Perplexity: 13.2088, time_taken_in_seconds: 43
Epoch [1/1], Step [6853/13804], Loss: 2.6357, Perplexity: 13.9531, time_taken_in_seconds: 43
Epoch [1/1], Step [6854/13804], Loss: 2.5881, Perplexity: 13.3044, time_taken_in_seconds: 44
Epoch [1/1], Step [6855/13804], Loss: 2.5751, Perplexity: 13.1320, time_taken_in_seconds: 45
Epoch [1/1], Step [6856/13804], Loss: 2.2218, Perplexity: 9.2241, time_taken_in_seconds: 46
Epoch [1/1], Step [6857/13804], Loss: 2.7399, Perplexity: 15.4854, time_taken_in_seconds: 47
Epoch [1/1], Step [6858/13804], Loss: 2.7801, Perplexity: 16.1200, time_taken_in_seconds: 48
Epoch [1/1], Step [6859/13804], Loss: 2.6603, Perplexity: 14.3012, time_taken_in_seconds: 48
Epoch [1/1], Step [6860/13804], Loss: 2.4016, Perplexity: 11.0407, time_taken_in_seconds: 49
Epoch [1/1], Step [6861/13804], Loss: 2.5783, Perplexity: 13.1748, time_taken_in_seconds: 50
Epoch [1/1], Step [6862/13804], Loss: 2.5516, Perplexity: 12.8270, time_taken_in_seconds: 51
Epoch [1/1], Step [6863/13804], Loss: 2.5541, Perplexity: 12.8592, time_taken_in_seconds: 52
Epoch [1/1], Step [6864/13804], Loss: 2.4311, Perplexity: 11.3719, time_taken_in_seconds: 52
Epoch [1/1], Step [6865/13804], Loss: 2.7177, Perplexity: 15.1456, time_taken_in_seconds: 53
Epoch [1/1], Step [6866/13804], Loss: 2.5032, Perplexity: 12.2210, time_taken_in_seconds: 54
Epoch [1/1], Step [6867/13804], Loss: 2.5003, Perplexity: 12.1865, time_taken_in_seconds: 55
Epoch [1/1], Step [6868/13804], Loss: 2.5929, Perplexity: 13.3683, time_taken_in_seconds: 56
Epoch [1/1], Step [6869/13804], Loss: 3.0651, Perplexity: 21.4365, time_taken_in_seconds: 57
Epoch [1/1], Step [6870/13804], Loss: 2.5842, Perplexity: 13.2525, time_taken_in_seconds: 57
Epoch [1/1], Step [6871/13804], Loss: 2.7398, Perplexity: 15.4838, time_taken_in_seconds: 58
Epoch [1/1], Step [6872/13804], Loss: 2.8609, Perplexity: 17.4778, time_taken_in_seconds: 59
Epoch [1/1], Step [6873/13804], Loss: 2.6308, Perplexity: 13.8850, time_taken_in_seconds: 60
Epoch [1/1], Step [6874/13804], Loss: 2.5320, Perplexity: 12.5785, time_taken_in_seconds: 61
Epoch [1/1], Step [6875/13804], Loss: 2.6486, Perplexity: 14.1339, time_taken_in_seconds: 62
Epoch [1/1], Step [6876/13804], Loss: 3.4997, Perplexity: 33.1051, time_taken_in_seconds: 62
Epoch [1/1], Step [6877/13804], Loss: 2.6639, Perplexity: 14.3526, time_taken_in_seconds: 63
Epoch [1/1], Step [6878/13804], Loss: 2.5784, Perplexity: 13.1755, time_taken_in_seconds: 64
Epoch [1/1], Step [6879/13804], Loss: 2.4155, Perplexity: 11.1953, time_taken_in_seconds: 65
Epoch [1/1], Step [6880/13804], Loss: 2.7382, Perplexity: 15.4597, time_taken_in_seconds: 66
Epoch [1/1], Step [6881/13804], Loss: 2.5694, Perplexity: 13.0580, time_taken_in_seconds: 66
Epoch [1/1], Step [6882/13804], Loss: 2.3391, Perplexity: 10.3722, time_taken_in_seconds: 67
Epoch [1/1], Step [6883/13804], Loss: 2.6316, Perplexity: 13.8962, time_taken_in_seconds: 68
Epoch [1/1], Step [6884/13804], Loss: 3.3349, Perplexity: 28.0769, time_taken_in_seconds: 69
Epoch [1/1], Step [6885/13804], Loss: 2.6363, Perplexity: 13.9610, time_taken_in_seconds: 70
Epoch [1/1], Step [6886/13804], Loss: 2.6271, Perplexity: 13.8335, time_taken_in_seconds: 71
Epoch [1/1], Step [6887/13804], Loss: 3.0898, Perplexity: 21.9722, time_taken_in_seconds: 71
Epoch [1/1], Step [6888/13804], Loss: 3.0692, Perplexity: 21.5245, time_taken_in_seconds: 72
Epoch [1/1], Step [6889/13804], Loss: 3.5236, Perplexity: 33.9078, time_taken_in_seconds: 73
Epoch [1/1], Step [6890/13804], Loss: 2.2143, Perplexity: 9.1546, time_taken_in_seconds: 74
Epoch [1/1], Step [6891/13804], Loss: 2.5966, Perplexity: 13.4180, time_taken_in_seconds: 75
Epoch [1/1], Step [6892/13804], Loss: 2.5482, Perplexity: 12.7841, time_taken_in_seconds: 76
Epoch [1/1], Step [6893/13804], Loss: 3.0678, Perplexity: 21.4956, time_taken_in_seconds: 76
Epoch [1/1], Step [6894/13804], Loss: 2.6155, Perplexity: 13.6738, time_taken_in_seconds: 77
Epoch [1/1], Step [6895/13804], Loss: 2.6463, Perplexity: 14.1013, time_taken_in_seconds: 78
Epoch [1/1], Step [6896/13804], Loss: 3.0559, Perplexity: 21.2402, time_taken_in_seconds: 79
Epoch [1/1], Step [6897/13804], Loss: 2.7796, Perplexity: 16.1119, time_taken_in_seconds: 80
Epoch [1/1], Step [6898/13804], Loss: 2.7250, Perplexity: 15.2563, time_taken_in_seconds: 81
Epoch [1/1], Step [6899/13804], Loss: 2.8405, Perplexity: 17.1241, time_taken_in_seconds: 82
Epoch [1/1], Step [6900/13804], Loss: 2.8332, Perplexity: 16.9991, time_taken_in_seconds: 83
Epoch [1/1], Step [6901/13804], Loss: 2.9083, Perplexity: 18.3248, time_taken_in_seconds: 0
Epoch [1/1], Step [6902/13804], Loss: 2.7091, Perplexity: 15.0152, time_taken_in_seconds: 1
Epoch [1/1], Step [6903/13804], Loss: 2.4858, Perplexity: 12.0106, time_taken_in_seconds: 2
Epoch [1/1], Step [6904/13804], Loss: 2.5650, Perplexity: 13.0009, time_taken_in_seconds: 3
Epoch [1/1], Step [6905/13804], Loss: 3.0283, Perplexity: 20.6626, time_taken_in_seconds: 4
Epoch [1/1], Step [6906/13804], Loss: 2.7380, Perplexity: 15.4554, time_taken_in_seconds: 4
Epoch [1/1], Step [6907/13804], Loss: 2.6227, Perplexity: 13.7732, time_taken_in_seconds: 5
Epoch [1/1], Step [6908/13804], Loss: 2.4118, Perplexity: 11.1537, time_taken_in_seconds: 6
Epoch [1/1], Step [6909/13804], Loss: 2.5278, Perplexity: 12.5260, time_taken_in_seconds: 7
Epoch [1/1], Step [6910/13804], Loss: 2.5025, Perplexity: 12.2125, time_taken_in_seconds: 8
Epoch [1/1], Step [6911/13804], Loss: 2.5243, Perplexity: 12.4816, time_taken_in_seconds: 9
Epoch [1/1], Step [6912/13804], Loss: 2.7641, Perplexity: 15.8655, time_taken_in_seconds: 9
Epoch [1/1], Step [6913/13804], Loss: 2.4741, Perplexity: 11.8709, time_taken_in_seconds: 10
Epoch [1/1], Step [6914/13804], Loss: 2.6299, Perplexity: 13.8725, time_taken_in_seconds: 11
Epoch [1/1], Step [6915/13804], Loss: 2.5558, Perplexity: 12.8811, time_taken_in_seconds: 12
Epoch [1/1], Step [6916/13804], Loss: 2.5578, Perplexity: 12.9080, time_taken_in_seconds: 13
Epoch [1/1], Step [6917/13804], Loss: 2.5460, Perplexity: 12.7562, time_taken_in_seconds: 13
Epoch [1/1], Step [6918/13804], Loss: 2.5735, Perplexity: 13.1118, time_taken_in_seconds: 14
Epoch [1/1], Step [6919/13804], Loss: 3.1135, Perplexity: 22.4994, time_taken_in_seconds: 15
Epoch [1/1], Step [6920/13804], Loss: 2.3954, Perplexity: 10.9727, time_taken_in_seconds: 16
Epoch [1/1], Step [6921/13804], Loss: 2.4630, Perplexity: 11.7400, time_taken_in_seconds: 17
Epoch [1/1], Step [6922/13804], Loss: 2.7941, Perplexity: 16.3476, time_taken_in_seconds: 18
Epoch [1/1], Step [6923/13804], Loss: 2.9935, Perplexity: 19.9544, time_taken_in_seconds: 18
Epoch [1/1], Step [6924/13804], Loss: 2.9942, Perplexity: 19.9686, time_taken_in_seconds: 19
Epoch [1/1], Step [6925/13804], Loss: 2.8440, Perplexity: 17.1840, time_taken_in_seconds: 20
Epoch [1/1], Step [6926/13804], Loss: 3.1533, Perplexity: 23.4127, time_taken_in_seconds: 21
Epoch [1/1], Step [6927/13804], Loss: 2.5193, Perplexity: 12.4199, time_taken_in_seconds: 22
Epoch [1/1], Step [6928/13804], Loss: 2.7046, Perplexity: 14.9484, time_taken_in_seconds: 23
Epoch [1/1], Step [6929/13804], Loss: 2.8727, Perplexity: 17.6843, time_taken_in_seconds: 23
Epoch [1/1], Step [6930/13804], Loss: 2.5723, Perplexity: 13.0964, time_taken_in_seconds: 24
Epoch [1/1], Step [6931/13804], Loss: 2.3389, Perplexity: 10.3700, time_taken_in_seconds: 25
Epoch [1/1], Step [6932/13804], Loss: 2.5728, Perplexity: 13.1020, time_taken_in_seconds: 26
Epoch [1/1], Step [6933/13804], Loss: 2.8013, Perplexity: 16.4665, time_taken_in_seconds: 27
Epoch [1/1], Step [6934/13804], Loss: 2.4791, Perplexity: 11.9301, time_taken_in_seconds: 27
Epoch [1/1], Step [6935/13804], Loss: 2.4922, Perplexity: 12.0879, time_taken_in_seconds: 28
Epoch [1/1], Step [6936/13804], Loss: 2.8129, Perplexity: 16.6590, time_taken_in_seconds: 29
Epoch [1/1], Step [6937/13804], Loss: 4.0436, Perplexity: 57.0310, time_taken_in_seconds: 30
Epoch [1/1], Step [6938/13804], Loss: 2.6422, Perplexity: 14.0441, time_taken_in_seconds: 31
Epoch [1/1], Step [6939/13804], Loss: 2.4280, Perplexity: 11.3367, time_taken_in_seconds: 32
Epoch [1/1], Step [6940/13804], Loss: 2.6289, Perplexity: 13.8592, time_taken_in_seconds: 32
Epoch [1/1], Step [6941/13804], Loss: 2.7899, Perplexity: 16.2798, time_taken_in_seconds: 33
Epoch [1/1], Step [6942/13804], Loss: 3.4415, Perplexity: 31.2344, time_taken_in_seconds: 34
Epoch [1/1], Step [6943/13804], Loss: 2.9215, Perplexity: 18.5694, time_taken_in_seconds: 35
Epoch [1/1], Step [6944/13804], Loss: 3.1031, Perplexity: 22.2662, time_taken_in_seconds: 36
Epoch [1/1], Step [6945/13804], Loss: 2.4903, Perplexity: 12.0651, time_taken_in_seconds: 37
Epoch [1/1], Step [6946/13804], Loss: 2.6958, Perplexity: 14.8177, time_taken_in_seconds: 37
Epoch [1/1], Step [6947/13804], Loss: 2.5239, Perplexity: 12.4774, time_taken_in_seconds: 38
Epoch [1/1], Step [6948/13804], Loss: 2.8578, Perplexity: 17.4239, time_taken_in_seconds: 39
Epoch [1/1], Step [6949/13804], Loss: 2.6064, Perplexity: 13.5503, time_taken_in_seconds: 40
Epoch [1/1], Step [6950/13804], Loss: 2.8837, Perplexity: 17.8802, time_taken_in_seconds: 41
Epoch [1/1], Step [6951/13804], Loss: 2.6926, Perplexity: 14.7698, time_taken_in_seconds: 41
Epoch [1/1], Step [6952/13804], Loss: 2.6143, Perplexity: 13.6574, time_taken_in_seconds: 42
Epoch [1/1], Step [6953/13804], Loss: 2.3729, Perplexity: 10.7279, time_taken_in_seconds: 43
Epoch [1/1], Step [6954/13804], Loss: 2.7667, Perplexity: 15.9056, time_taken_in_seconds: 44
Epoch [1/1], Step [6955/13804], Loss: 2.2114, Perplexity: 9.1289, time_taken_in_seconds: 45
Epoch [1/1], Step [6956/13804], Loss: 2.4296, Perplexity: 11.3547, time_taken_in_seconds: 46
Epoch [1/1], Step [6957/13804], Loss: 2.5964, Perplexity: 13.4157, time_taken_in_seconds: 46
Epoch [1/1], Step [6958/13804], Loss: 2.6570, Perplexity: 14.2537, time_taken_in_seconds: 47
Epoch [1/1], Step [6959/13804], Loss: 2.7621, Perplexity: 15.8335, time_taken_in_seconds: 48
Epoch [1/1], Step [6960/13804], Loss: 3.1363, Perplexity: 23.0192, time_taken_in_seconds: 49
Epoch [1/1], Step [6961/13804], Loss: 3.1276, Perplexity: 22.8183, time_taken_in_seconds: 50
Epoch [1/1], Step [6962/13804], Loss: 2.2363, Perplexity: 9.3587, time_taken_in_seconds: 50
Epoch [1/1], Step [6963/13804], Loss: 2.6241, Perplexity: 13.7922, time_taken_in_seconds: 51
Epoch [1/1], Step [6964/13804], Loss: 2.8002, Perplexity: 16.4475, time_taken_in_seconds: 52
Epoch [1/1], Step [6965/13804], Loss: 2.8955, Perplexity: 18.0924, time_taken_in_seconds: 53
Epoch [1/1], Step [6966/13804], Loss: 2.6664, Perplexity: 14.3877, time_taken_in_seconds: 54
Epoch [1/1], Step [6967/13804], Loss: 2.8541, Perplexity: 17.3587, time_taken_in_seconds: 54
Epoch [1/1], Step [6968/13804], Loss: 2.3905, Perplexity: 10.9187, time_taken_in_seconds: 55
Epoch [1/1], Step [6969/13804], Loss: 2.5416, Perplexity: 12.6995, time_taken_in_seconds: 56
Epoch [1/1], Step [6970/13804], Loss: 2.5108, Perplexity: 12.3144, time_taken_in_seconds: 57
Epoch [1/1], Step [6971/13804], Loss: 2.5991, Perplexity: 13.4521, time_taken_in_seconds: 58
Epoch [1/1], Step [6972/13804], Loss: 2.5068, Perplexity: 12.2651, time_taken_in_seconds: 59
Epoch [1/1], Step [6973/13804], Loss: 2.4848, Perplexity: 11.9982, time_taken_in_seconds: 60
Epoch [1/1], Step [6974/13804], Loss: 2.7137, Perplexity: 15.0849, time_taken_in_seconds: 60
Epoch [1/1], Step [6975/13804], Loss: 2.8663, Perplexity: 17.5719, time_taken_in_seconds: 61
Epoch [1/1], Step [6976/13804], Loss: 3.1437, Perplexity: 23.1904, time_taken_in_seconds: 62
Epoch [1/1], Step [6977/13804], Loss: 2.3293, Perplexity: 10.2710, time_taken_in_seconds: 63
Epoch [1/1], Step [6978/13804], Loss: 2.3779, Perplexity: 10.7817, time_taken_in_seconds: 64
Epoch [1/1], Step [6979/13804], Loss: 2.7286, Perplexity: 15.3111, time_taken_in_seconds: 64
Epoch [1/1], Step [6980/13804], Loss: 2.5912, Perplexity: 13.3454, time_taken_in_seconds: 65
Epoch [1/1], Step [6981/13804], Loss: 2.8200, Perplexity: 16.7764, time_taken_in_seconds: 66
Epoch [1/1], Step [6982/13804], Loss: 2.6280, Perplexity: 13.8454, time_taken_in_seconds: 67
Epoch [1/1], Step [6983/13804], Loss: 2.6438, Perplexity: 14.0669, time_taken_in_seconds: 68
Epoch [1/1], Step [6984/13804], Loss: 2.2770, Perplexity: 9.7476, time_taken_in_seconds: 69
Epoch [1/1], Step [6985/13804], Loss: 2.9749, Perplexity: 19.5870, time_taken_in_seconds: 69
Epoch [1/1], Step [6986/13804], Loss: 2.7222, Perplexity: 15.2143, time_taken_in_seconds: 70
Epoch [1/1], Step [6987/13804], Loss: 2.6608, Perplexity: 14.3082, time_taken_in_seconds: 71
Epoch [1/1], Step [6988/13804], Loss: 2.3458, Perplexity: 10.4418, time_taken_in_seconds: 72
Epoch [1/1], Step [6989/13804], Loss: 3.0326, Perplexity: 20.7514, time_taken_in_seconds: 73
Epoch [1/1], Step [6990/13804], Loss: 2.7638, Perplexity: 15.8599, time_taken_in_seconds: 73
Epoch [1/1], Step [6991/13804], Loss: 2.5086, Perplexity: 12.2881, time_taken_in_seconds: 74
Epoch [1/1], Step [6992/13804], Loss: 2.3919, Perplexity: 10.9340, time_taken_in_seconds: 75
Epoch [1/1], Step [6993/13804], Loss: 2.7379, Perplexity: 15.4547, time_taken_in_seconds: 76
Epoch [1/1], Step [6994/13804], Loss: 2.7129, Perplexity: 15.0722, time_taken_in_seconds: 77
Epoch [1/1], Step [6995/13804], Loss: 2.5170, Perplexity: 12.3914, time_taken_in_seconds: 78
Epoch [1/1], Step [6996/13804], Loss: 2.4091, Perplexity: 11.1238, time_taken_in_seconds: 78
Epoch [1/1], Step [6997/13804], Loss: 2.5432, Perplexity: 12.7203, time_taken_in_seconds: 79
Epoch [1/1], Step [6998/13804], Loss: 2.7592, Perplexity: 15.7866, time_taken_in_seconds: 80
Epoch [1/1], Step [6999/13804], Loss: 2.5730, Perplexity: 13.1054, time_taken_in_seconds: 81
Epoch [1/1], Step [7000/13804], Loss: 2.5991, Perplexity: 13.4510, time_taken_in_seconds: 82
Epoch [1/1], Step [7001/13804], Loss: 2.8645, Perplexity: 17.5409, time_taken_in_seconds: 0
Epoch [1/1], Step [7002/13804], Loss: 2.6623, Perplexity: 14.3295, time_taken_in_seconds: 1
Epoch [1/1], Step [7003/13804], Loss: 2.4285, Perplexity: 11.3421, time_taken_in_seconds: 2
Epoch [1/1], Step [7004/13804], Loss: 2.9567, Perplexity: 19.2335, time_taken_in_seconds: 3
Epoch [1/1], Step [7005/13804], Loss: 2.9001, Perplexity: 18.1755, time_taken_in_seconds: 4
Epoch [1/1], Step [7006/13804], Loss: 2.8543, Perplexity: 17.3630, time_taken_in_seconds: 4
Epoch [1/1], Step [7007/13804], Loss: 2.7972, Perplexity: 16.3991, time_taken_in_seconds: 5
Epoch [1/1], Step [7008/13804], Loss: 2.2662, Perplexity: 9.6430, time_taken_in_seconds: 6
Epoch [1/1], Step [7009/13804], Loss: 2.5958, Perplexity: 13.4078, time_taken_in_seconds: 7
Epoch [1/1], Step [7010/13804], Loss: 2.7628, Perplexity: 15.8440, time_taken_in_seconds: 8
Epoch [1/1], Step [7011/13804], Loss: 2.9164, Perplexity: 18.4749, time_taken_in_seconds: 9
Epoch [1/1], Step [7012/13804], Loss: 3.0501, Perplexity: 21.1165, time_taken_in_seconds: 9
Epoch [1/1], Step [7013/13804], Loss: 2.5224, Perplexity: 12.4587, time_taken_in_seconds: 10
Epoch [1/1], Step [7014/13804], Loss: 2.7202, Perplexity: 15.1836, time_taken_in_seconds: 11
Epoch [1/1], Step [7015/13804], Loss: 2.7763, Perplexity: 16.0599, time_taken_in_seconds: 12
Epoch [1/1], Step [7016/13804], Loss: 2.9849, Perplexity: 19.7843, time_taken_in_seconds: 13
Epoch [1/1], Step [7017/13804], Loss: 3.1239, Perplexity: 22.7347, time_taken_in_seconds: 13
Epoch [1/1], Step [7018/13804], Loss: 2.6290, Perplexity: 13.8600, time_taken_in_seconds: 14
Epoch [1/1], Step [7019/13804], Loss: 2.6541, Perplexity: 14.2128, time_taken_in_seconds: 15
Epoch [1/1], Step [7020/13804], Loss: 2.8308, Perplexity: 16.9590, time_taken_in_seconds: 16
Epoch [1/1], Step [7021/13804], Loss: 2.5523, Perplexity: 12.8369, time_taken_in_seconds: 17
Epoch [1/1], Step [7022/13804], Loss: 2.8848, Perplexity: 17.9008, time_taken_in_seconds: 18
Epoch [1/1], Step [7023/13804], Loss: 2.6260, Perplexity: 13.8185, time_taken_in_seconds: 18
Epoch [1/1], Step [7024/13804], Loss: 2.4520, Perplexity: 11.6118, time_taken_in_seconds: 19
Epoch [1/1], Step [7025/13804], Loss: 2.8275, Perplexity: 16.9025, time_taken_in_seconds: 20
Epoch [1/1], Step [7026/13804], Loss: 2.3251, Perplexity: 10.2277, time_taken_in_seconds: 21
Epoch [1/1], Step [7027/13804], Loss: 3.7077, Perplexity: 40.7613, time_taken_in_seconds: 22
Epoch [1/1], Step [7028/13804], Loss: 2.6829, Perplexity: 14.6279, time_taken_in_seconds: 23
Epoch [1/1], Step [7029/13804], Loss: 2.3526, Perplexity: 10.5133, time_taken_in_seconds: 23
Epoch [1/1], Step [7030/13804], Loss: 2.6658, Perplexity: 14.3787, time_taken_in_seconds: 24
Epoch [1/1], Step [7031/13804], Loss: 2.5074, Perplexity: 12.2730, time_taken_in_seconds: 25
Epoch [1/1], Step [7032/13804], Loss: 2.5483, Perplexity: 12.7858, time_taken_in_seconds: 26
Epoch [1/1], Step [7033/13804], Loss: 2.3405, Perplexity: 10.3860, time_taken_in_seconds: 27
Epoch [1/1], Step [7034/13804], Loss: 2.4627, Perplexity: 11.7367, time_taken_in_seconds: 27
Epoch [1/1], Step [7035/13804], Loss: 2.4717, Perplexity: 11.8427, time_taken_in_seconds: 28
Epoch [1/1], Step [7036/13804], Loss: 2.7853, Perplexity: 16.2039, time_taken_in_seconds: 29
Epoch [1/1], Step [7037/13804], Loss: 2.8462, Perplexity: 17.2218, time_taken_in_seconds: 30
Epoch [1/1], Step [7038/13804], Loss: 2.8865, Perplexity: 17.9304, time_taken_in_seconds: 31
Epoch [1/1], Step [7039/13804], Loss: 2.5455, Perplexity: 12.7500, time_taken_in_seconds: 32
Epoch [1/1], Step [7040/13804], Loss: 2.5730, Perplexity: 13.1051, time_taken_in_seconds: 32
Epoch [1/1], Step [7041/13804], Loss: 2.2323, Perplexity: 9.3210, time_taken_in_seconds: 33
Epoch [1/1], Step [7042/13804], Loss: 3.0361, Perplexity: 20.8246, time_taken_in_seconds: 34
Epoch [1/1], Step [7043/13804], Loss: 2.5435, Perplexity: 12.7237, time_taken_in_seconds: 35
Epoch [1/1], Step [7044/13804], Loss: 2.5585, Perplexity: 12.9160, time_taken_in_seconds: 36
Epoch [1/1], Step [7045/13804], Loss: 3.0301, Perplexity: 20.6999, time_taken_in_seconds: 37
Epoch [1/1], Step [7046/13804], Loss: 2.3259, Perplexity: 10.2364, time_taken_in_seconds: 38
Epoch [1/1], Step [7047/13804], Loss: 2.7131, Perplexity: 15.0752, time_taken_in_seconds: 38
Epoch [1/1], Step [7048/13804], Loss: 2.8406, Perplexity: 17.1268, time_taken_in_seconds: 39
Epoch [1/1], Step [7049/13804], Loss: 2.2875, Perplexity: 9.8502, time_taken_in_seconds: 40
Epoch [1/1], Step [7050/13804], Loss: 3.3318, Perplexity: 27.9876, time_taken_in_seconds: 41
Epoch [1/1], Step [7051/13804], Loss: 2.3048, Perplexity: 10.0219, time_taken_in_seconds: 42
Epoch [1/1], Step [7052/13804], Loss: 2.7058, Perplexity: 14.9658, time_taken_in_seconds: 42
Epoch [1/1], Step [7053/13804], Loss: 2.3882, Perplexity: 10.8938, time_taken_in_seconds: 43
Epoch [1/1], Step [7054/13804], Loss: 2.5864, Perplexity: 13.2815, time_taken_in_seconds: 44
Epoch [1/1], Step [7055/13804], Loss: 2.3635, Perplexity: 10.6285, time_taken_in_seconds: 45
Epoch [1/1], Step [7056/13804], Loss: 2.6563, Perplexity: 14.2430, time_taken_in_seconds: 46
Epoch [1/1], Step [7057/13804], Loss: 2.6557, Perplexity: 14.2343, time_taken_in_seconds: 47
Epoch [1/1], Step [7058/13804], Loss: 2.5293, Perplexity: 12.5444, time_taken_in_seconds: 47
Epoch [1/1], Step [7059/13804], Loss: 2.8900, Perplexity: 17.9928, time_taken_in_seconds: 48
Epoch [1/1], Step [7060/13804], Loss: 3.6465, Perplexity: 38.3387, time_taken_in_seconds: 49
Epoch [1/1], Step [7061/13804], Loss: 2.5414, Perplexity: 12.6969, time_taken_in_seconds: 50
Epoch [1/1], Step [7062/13804], Loss: 2.5259, Perplexity: 12.5022, time_taken_in_seconds: 51
Epoch [1/1], Step [7063/13804], Loss: 2.4314, Perplexity: 11.3743, time_taken_in_seconds: 51
Epoch [1/1], Step [7064/13804], Loss: 2.3538, Perplexity: 10.5259, time_taken_in_seconds: 52
Epoch [1/1], Step [7065/13804], Loss: 2.5824, Perplexity: 13.2283, time_taken_in_seconds: 53
Epoch [1/1], Step [7066/13804], Loss: 2.5360, Perplexity: 12.6292, time_taken_in_seconds: 54
Epoch [1/1], Step [7067/13804], Loss: 2.3182, Perplexity: 10.1579, time_taken_in_seconds: 55
Epoch [1/1], Step [7068/13804], Loss: 2.9080, Perplexity: 18.3198, time_taken_in_seconds: 56
Epoch [1/1], Step [7069/13804], Loss: 2.5939, Perplexity: 13.3818, time_taken_in_seconds: 56
Epoch [1/1], Step [7070/13804], Loss: 2.4574, Perplexity: 11.6744, time_taken_in_seconds: 57
Epoch [1/1], Step [7071/13804], Loss: 2.8520, Perplexity: 17.3228, time_taken_in_seconds: 58
Epoch [1/1], Step [7072/13804], Loss: 2.5774, Perplexity: 13.1628, time_taken_in_seconds: 59
Epoch [1/1], Step [7073/13804], Loss: 2.8916, Perplexity: 18.0230, time_taken_in_seconds: 60
Epoch [1/1], Step [7074/13804], Loss: 2.7386, Perplexity: 15.4647, time_taken_in_seconds: 60
Epoch [1/1], Step [7075/13804], Loss: 2.6861, Perplexity: 14.6743, time_taken_in_seconds: 61
Epoch [1/1], Step [7076/13804], Loss: 3.0522, Perplexity: 21.1627, time_taken_in_seconds: 62
Epoch [1/1], Step [7077/13804], Loss: 2.7045, Perplexity: 14.9465, time_taken_in_seconds: 63
Epoch [1/1], Step [7078/13804], Loss: 2.6707, Perplexity: 14.4497, time_taken_in_seconds: 64
Epoch [1/1], Step [7079/13804], Loss: 2.7820, Perplexity: 16.1518, time_taken_in_seconds: 64
Epoch [1/1], Step [7080/13804], Loss: 3.3693, Perplexity: 29.0582, time_taken_in_seconds: 65
Epoch [1/1], Step [7081/13804], Loss: 3.1890, Perplexity: 24.2652, time_taken_in_seconds: 66
Epoch [1/1], Step [7082/13804], Loss: 2.3052, Perplexity: 10.0263, time_taken_in_seconds: 67
Epoch [1/1], Step [7083/13804], Loss: 2.7463, Perplexity: 15.5854, time_taken_in_seconds: 68
Epoch [1/1], Step [7084/13804], Loss: 2.7390, Perplexity: 15.4710, time_taken_in_seconds: 69
Epoch [1/1], Step [7085/13804], Loss: 2.5959, Perplexity: 13.4090, time_taken_in_seconds: 69
Epoch [1/1], Step [7086/13804], Loss: 2.4961, Perplexity: 12.1356, time_taken_in_seconds: 70
Epoch [1/1], Step [7087/13804], Loss: 2.6465, Perplexity: 14.1052, time_taken_in_seconds: 71
Epoch [1/1], Step [7088/13804], Loss: 2.9664, Perplexity: 19.4228, time_taken_in_seconds: 72
Epoch [1/1], Step [7089/13804], Loss: 3.5614, Perplexity: 35.2140, time_taken_in_seconds: 73
Epoch [1/1], Step [7090/13804], Loss: 2.9791, Perplexity: 19.6702, time_taken_in_seconds: 74
Epoch [1/1], Step [7091/13804], Loss: 3.1147, Perplexity: 22.5276, time_taken_in_seconds: 74
Epoch [1/1], Step [7092/13804], Loss: 2.7183, Perplexity: 15.1543, time_taken_in_seconds: 75
Epoch [1/1], Step [7093/13804], Loss: 3.1027, Perplexity: 22.2575, time_taken_in_seconds: 76
Epoch [1/1], Step [7094/13804], Loss: 2.6430, Perplexity: 14.0552, time_taken_in_seconds: 77
Epoch [1/1], Step [7095/13804], Loss: 2.4092, Perplexity: 11.1255, time_taken_in_seconds: 78
Epoch [1/1], Step [7096/13804], Loss: 2.5430, Perplexity: 12.7178, time_taken_in_seconds: 78
Epoch [1/1], Step [7097/13804], Loss: 2.4118, Perplexity: 11.1537, time_taken_in_seconds: 79
Epoch [1/1], Step [7098/13804], Loss: 2.6158, Perplexity: 13.6777, time_taken_in_seconds: 80
Epoch [1/1], Step [7099/13804], Loss: 2.3953, Perplexity: 10.9711, time_taken_in_seconds: 81
Epoch [1/1], Step [7100/13804], Loss: 2.8192, Perplexity: 16.7627, time_taken_in_seconds: 82
Epoch [1/1], Step [7101/13804], Loss: 2.3827, Perplexity: 10.8339, time_taken_in_seconds: 0
Epoch [1/1], Step [7102/13804], Loss: 2.7795, Perplexity: 16.1107, time_taken_in_seconds: 1
Epoch [1/1], Step [7103/13804], Loss: 2.8288, Perplexity: 16.9246, time_taken_in_seconds: 2
Epoch [1/1], Step [7104/13804], Loss: 2.8887, Perplexity: 17.9704, time_taken_in_seconds: 3
Epoch [1/1], Step [7105/13804], Loss: 2.6197, Perplexity: 13.7315, time_taken_in_seconds: 4
Epoch [1/1], Step [7106/13804], Loss: 2.6426, Perplexity: 14.0496, time_taken_in_seconds: 4
Epoch [1/1], Step [7107/13804], Loss: 3.2488, Perplexity: 25.7604, time_taken_in_seconds: 5
Epoch [1/1], Step [7108/13804], Loss: 2.6117, Perplexity: 13.6222, time_taken_in_seconds: 6
Epoch [1/1], Step [7109/13804], Loss: 2.8797, Perplexity: 17.8081, time_taken_in_seconds: 7
Epoch [1/1], Step [7110/13804], Loss: 2.7462, Perplexity: 15.5832, time_taken_in_seconds: 8
Epoch [1/1], Step [7111/13804], Loss: 2.4153, Perplexity: 11.1932, time_taken_in_seconds: 9
Epoch [1/1], Step [7112/13804], Loss: 2.4623, Perplexity: 11.7321, time_taken_in_seconds: 9
Epoch [1/1], Step [7113/13804], Loss: 2.5718, Perplexity: 13.0889, time_taken_in_seconds: 10
Epoch [1/1], Step [7114/13804], Loss: 2.5097, Perplexity: 12.3017, time_taken_in_seconds: 11
Epoch [1/1], Step [7115/13804], Loss: 2.5266, Perplexity: 12.5104, time_taken_in_seconds: 12
Epoch [1/1], Step [7116/13804], Loss: 2.8331, Perplexity: 16.9988, time_taken_in_seconds: 13
Epoch [1/1], Step [7117/13804], Loss: 2.7445, Perplexity: 15.5575, time_taken_in_seconds: 14
Epoch [1/1], Step [7118/13804], Loss: 2.5231, Perplexity: 12.4672, time_taken_in_seconds: 14
Epoch [1/1], Step [7119/13804], Loss: 2.5010, Perplexity: 12.1952, time_taken_in_seconds: 15
Epoch [1/1], Step [7120/13804], Loss: 2.6036, Perplexity: 13.5126, time_taken_in_seconds: 16
Epoch [1/1], Step [7121/13804], Loss: 2.6262, Perplexity: 13.8212, time_taken_in_seconds: 17
Epoch [1/1], Step [7122/13804], Loss: 2.7988, Perplexity: 16.4247, time_taken_in_seconds: 18
Epoch [1/1], Step [7123/13804], Loss: 2.6403, Perplexity: 14.0181, time_taken_in_seconds: 19
Epoch [1/1], Step [7124/13804], Loss: 2.6674, Perplexity: 14.4031, time_taken_in_seconds: 19
Epoch [1/1], Step [7125/13804], Loss: 2.5129, Perplexity: 12.3409, time_taken_in_seconds: 20
Epoch [1/1], Step [7126/13804], Loss: 2.6909, Perplexity: 14.7449, time_taken_in_seconds: 21
Epoch [1/1], Step [7127/13804], Loss: 2.3920, Perplexity: 10.9358, time_taken_in_seconds: 22
Epoch [1/1], Step [7128/13804], Loss: 3.1686, Perplexity: 23.7736, time_taken_in_seconds: 23
Epoch [1/1], Step [7129/13804], Loss: 2.4956, Perplexity: 12.1287, time_taken_in_seconds: 24
Epoch [1/1], Step [7130/13804], Loss: 2.5981, Perplexity: 13.4380, time_taken_in_seconds: 24
Epoch [1/1], Step [7131/13804], Loss: 2.3862, Perplexity: 10.8716, time_taken_in_seconds: 25
Epoch [1/1], Step [7132/13804], Loss: 2.6692, Perplexity: 14.4279, time_taken_in_seconds: 26
Epoch [1/1], Step [7133/13804], Loss: 2.6563, Perplexity: 14.2430, time_taken_in_seconds: 27
Epoch [1/1], Step [7134/13804], Loss: 2.6383, Perplexity: 13.9889, time_taken_in_seconds: 28
Epoch [1/1], Step [7135/13804], Loss: 2.2690, Perplexity: 9.6699, time_taken_in_seconds: 28
Epoch [1/1], Step [7136/13804], Loss: 2.8797, Perplexity: 17.8093, time_taken_in_seconds: 29
Epoch [1/1], Step [7137/13804], Loss: 2.9296, Perplexity: 18.7209, time_taken_in_seconds: 30
Epoch [1/1], Step [7138/13804], Loss: 2.8723, Perplexity: 17.6774, time_taken_in_seconds: 31
Epoch [1/1], Step [7139/13804], Loss: 2.3927, Perplexity: 10.9425, time_taken_in_seconds: 32
Epoch [1/1], Step [7140/13804], Loss: 2.8251, Perplexity: 16.8619, time_taken_in_seconds: 32
Epoch [1/1], Step [7141/13804], Loss: 2.3346, Perplexity: 10.3256, time_taken_in_seconds: 33
Epoch [1/1], Step [7142/13804], Loss: 2.7573, Perplexity: 15.7579, time_taken_in_seconds: 34
Epoch [1/1], Step [7143/13804], Loss: 2.3012, Perplexity: 9.9859, time_taken_in_seconds: 35
Epoch [1/1], Step [7144/13804], Loss: 2.8788, Perplexity: 17.7928, time_taken_in_seconds: 36
Epoch [1/1], Step [7145/13804], Loss: 2.6512, Perplexity: 14.1713, time_taken_in_seconds: 37
Epoch [1/1], Step [7146/13804], Loss: 2.7748, Perplexity: 16.0361, time_taken_in_seconds: 37
Epoch [1/1], Step [7147/13804], Loss: 2.3209, Perplexity: 10.1847, time_taken_in_seconds: 38
Epoch [1/1], Step [7148/13804], Loss: 2.3623, Perplexity: 10.6149, time_taken_in_seconds: 39
Epoch [1/1], Step [7149/13804], Loss: 2.9156, Perplexity: 18.4604, time_taken_in_seconds: 40
Epoch [1/1], Step [7150/13804], Loss: 2.9630, Perplexity: 19.3558, time_taken_in_seconds: 41
Epoch [1/1], Step [7151/13804], Loss: 2.6895, Perplexity: 14.7240, time_taken_in_seconds: 41
Epoch [1/1], Step [7152/13804], Loss: 2.8426, Perplexity: 17.1607, time_taken_in_seconds: 42
Epoch [1/1], Step [7153/13804], Loss: 2.7857, Perplexity: 16.2118, time_taken_in_seconds: 43
Epoch [1/1], Step [7154/13804], Loss: 2.7203, Perplexity: 15.1852, time_taken_in_seconds: 44
Epoch [1/1], Step [7155/13804], Loss: 2.8497, Perplexity: 17.2830, time_taken_in_seconds: 45
Epoch [1/1], Step [7156/13804], Loss: 2.5059, Perplexity: 12.2546, time_taken_in_seconds: 45
Epoch [1/1], Step [7157/13804], Loss: 2.8414, Perplexity: 17.1399, time_taken_in_seconds: 46
Epoch [1/1], Step [7158/13804], Loss: 2.6181, Perplexity: 13.7091, time_taken_in_seconds: 47
Epoch [1/1], Step [7159/13804], Loss: 2.5581, Perplexity: 12.9111, time_taken_in_seconds: 48
Epoch [1/1], Step [7160/13804], Loss: 2.3160, Perplexity: 10.1350, time_taken_in_seconds: 49
Epoch [1/1], Step [7161/13804], Loss: 2.0955, Perplexity: 8.1296, time_taken_in_seconds: 50
Epoch [1/1], Step [7162/13804], Loss: 2.6004, Perplexity: 13.4692, time_taken_in_seconds: 50
Epoch [1/1], Step [7163/13804], Loss: 2.8228, Perplexity: 16.8246, time_taken_in_seconds: 51
Epoch [1/1], Step [7164/13804], Loss: 2.4150, Perplexity: 11.1897, time_taken_in_seconds: 52
Epoch [1/1], Step [7165/13804], Loss: 2.9688, Perplexity: 19.4677, time_taken_in_seconds: 53
Epoch [1/1], Step [7166/13804], Loss: 2.6229, Perplexity: 13.7756, time_taken_in_seconds: 54
Epoch [1/1], Step [7167/13804], Loss: 3.1167, Perplexity: 22.5709, time_taken_in_seconds: 55
Epoch [1/1], Step [7168/13804], Loss: 2.7448, Perplexity: 15.5614, time_taken_in_seconds: 55
Epoch [1/1], Step [7169/13804], Loss: 2.2541, Perplexity: 9.5270, time_taken_in_seconds: 56
Epoch [1/1], Step [7170/13804], Loss: 3.1335, Perplexity: 22.9538, time_taken_in_seconds: 57
Epoch [1/1], Step [7171/13804], Loss: 2.6027, Perplexity: 13.4995, time_taken_in_seconds: 58
Epoch [1/1], Step [7172/13804], Loss: 3.1470, Perplexity: 23.2651, time_taken_in_seconds: 59
Epoch [1/1], Step [7173/13804], Loss: 2.5070, Perplexity: 12.2678, time_taken_in_seconds: 59
Epoch [1/1], Step [7174/13804], Loss: 2.2724, Perplexity: 9.7023, time_taken_in_seconds: 60
Epoch [1/1], Step [7175/13804], Loss: 2.8445, Perplexity: 17.1928, time_taken_in_seconds: 61
Epoch [1/1], Step [7176/13804], Loss: 2.3999, Perplexity: 11.0217, time_taken_in_seconds: 62
Epoch [1/1], Step [7177/13804], Loss: 2.6111, Perplexity: 13.6137, time_taken_in_seconds: 63
Epoch [1/1], Step [7178/13804], Loss: 3.0257, Perplexity: 20.6092, time_taken_in_seconds: 64
Epoch [1/1], Step [7179/13804], Loss: 2.8574, Perplexity: 17.4158, time_taken_in_seconds: 64
Epoch [1/1], Step [7180/13804], Loss: 2.4188, Perplexity: 11.2327, time_taken_in_seconds: 65
Epoch [1/1], Step [7181/13804], Loss: 2.8188, Perplexity: 16.7571, time_taken_in_seconds: 66
Epoch [1/1], Step [7182/13804], Loss: 2.6603, Perplexity: 14.3011, time_taken_in_seconds: 67
Epoch [1/1], Step [7183/13804], Loss: 2.6384, Perplexity: 13.9909, time_taken_in_seconds: 68
Epoch [1/1], Step [7184/13804], Loss: 2.4164, Perplexity: 11.2054, time_taken_in_seconds: 68
Epoch [1/1], Step [7185/13804], Loss: 2.6323, Perplexity: 13.9055, time_taken_in_seconds: 69
Epoch [1/1], Step [7186/13804], Loss: 2.6882, Perplexity: 14.7050, time_taken_in_seconds: 70
Epoch [1/1], Step [7187/13804], Loss: 2.8452, Perplexity: 17.2057, time_taken_in_seconds: 71
Epoch [1/1], Step [7188/13804], Loss: 2.7634, Perplexity: 15.8536, time_taken_in_seconds: 72
Epoch [1/1], Step [7189/13804], Loss: 3.0203, Perplexity: 20.4967, time_taken_in_seconds: 73
Epoch [1/1], Step [7190/13804], Loss: 2.4286, Perplexity: 11.3426, time_taken_in_seconds: 73
Epoch [1/1], Step [7191/13804], Loss: 2.5766, Perplexity: 13.1526, time_taken_in_seconds: 74
Epoch [1/1], Step [7192/13804], Loss: 2.4112, Perplexity: 11.1468, time_taken_in_seconds: 75
Epoch [1/1], Step [7193/13804], Loss: 2.6748, Perplexity: 14.5101, time_taken_in_seconds: 76
Epoch [1/1], Step [7194/13804], Loss: 2.4137, Perplexity: 11.1754, time_taken_in_seconds: 77
Epoch [1/1], Step [7195/13804], Loss: 2.8560, Perplexity: 17.3912, time_taken_in_seconds: 78
Epoch [1/1], Step [7196/13804], Loss: 2.9988, Perplexity: 20.0616, time_taken_in_seconds: 78
Epoch [1/1], Step [7197/13804], Loss: 2.5041, Perplexity: 12.2326, time_taken_in_seconds: 79
Epoch [1/1], Step [7198/13804], Loss: 2.3482, Perplexity: 10.4665, time_taken_in_seconds: 80
Epoch [1/1], Step [7199/13804], Loss: 2.9743, Perplexity: 19.5753, time_taken_in_seconds: 81
Epoch [1/1], Step [7200/13804], Loss: 2.9970, Perplexity: 20.0260, time_taken_in_seconds: 82
Epoch [1/1], Step [7201/13804], Loss: 2.4946, Perplexity: 12.1174, time_taken_in_seconds: 0
Epoch [1/1], Step [7202/13804], Loss: 2.7002, Perplexity: 14.8832, time_taken_in_seconds: 1
Epoch [1/1], Step [7203/13804], Loss: 2.3535, Perplexity: 10.5228, time_taken_in_seconds: 2
Epoch [1/1], Step [7204/13804], Loss: 3.1930, Perplexity: 24.3605, time_taken_in_seconds: 3
Epoch [1/1], Step [7205/13804], Loss: 2.5649, Perplexity: 12.9998, time_taken_in_seconds: 4
Epoch [1/1], Step [7206/13804], Loss: 2.5704, Perplexity: 13.0715, time_taken_in_seconds: 4
Epoch [1/1], Step [7207/13804], Loss: 2.6381, Perplexity: 13.9861, time_taken_in_seconds: 5
Epoch [1/1], Step [7208/13804], Loss: 2.5580, Perplexity: 12.9098, time_taken_in_seconds: 6
Epoch [1/1], Step [7209/13804], Loss: 2.5472, Perplexity: 12.7719, time_taken_in_seconds: 7
Epoch [1/1], Step [7210/13804], Loss: 1.9572, Perplexity: 7.0795, time_taken_in_seconds: 8
Epoch [1/1], Step [7211/13804], Loss: 3.3532, Perplexity: 28.5928, time_taken_in_seconds: 8
Epoch [1/1], Step [7212/13804], Loss: 2.4785, Perplexity: 11.9239, time_taken_in_seconds: 9
Epoch [1/1], Step [7213/13804], Loss: 2.2650, Perplexity: 9.6307, time_taken_in_seconds: 10
Epoch [1/1], Step [7214/13804], Loss: 2.8003, Perplexity: 16.4488, time_taken_in_seconds: 11
Epoch [1/1], Step [7215/13804], Loss: 2.5632, Perplexity: 12.9775, time_taken_in_seconds: 12
Epoch [1/1], Step [7216/13804], Loss: 2.6973, Perplexity: 14.8401, time_taken_in_seconds: 13
Epoch [1/1], Step [7217/13804], Loss: 2.6841, Perplexity: 14.6444, time_taken_in_seconds: 13
Epoch [1/1], Step [7218/13804], Loss: 2.3536, Perplexity: 10.5237, time_taken_in_seconds: 14
Epoch [1/1], Step [7219/13804], Loss: 2.4966, Perplexity: 12.1414, time_taken_in_seconds: 15
Epoch [1/1], Step [7220/13804], Loss: 2.4965, Perplexity: 12.1401, time_taken_in_seconds: 16
Epoch [1/1], Step [7221/13804], Loss: 3.0048, Perplexity: 20.1826, time_taken_in_seconds: 17
Epoch [1/1], Step [7222/13804], Loss: 3.4330, Perplexity: 30.9703, time_taken_in_seconds: 17
Epoch [1/1], Step [7223/13804], Loss: 3.0251, Perplexity: 20.5958, time_taken_in_seconds: 18
Epoch [1/1], Step [7224/13804], Loss: 2.6432, Perplexity: 14.0578, time_taken_in_seconds: 19
Epoch [1/1], Step [7225/13804], Loss: 2.6133, Perplexity: 13.6445, time_taken_in_seconds: 20
Epoch [1/1], Step [7226/13804], Loss: 2.3553, Perplexity: 10.5409, time_taken_in_seconds: 21
Epoch [1/1], Step [7227/13804], Loss: 2.8683, Perplexity: 17.6071, time_taken_in_seconds: 22
Epoch [1/1], Step [7228/13804], Loss: 2.4326, Perplexity: 11.3886, time_taken_in_seconds: 22
Epoch [1/1], Step [7229/13804], Loss: 2.4073, Perplexity: 11.1034, time_taken_in_seconds: 23
Epoch [1/1], Step [7230/13804], Loss: 2.6301, Perplexity: 13.8758, time_taken_in_seconds: 24
Epoch [1/1], Step [7231/13804], Loss: 2.9140, Perplexity: 18.4295, time_taken_in_seconds: 25
Epoch [1/1], Step [7232/13804], Loss: 2.8684, Perplexity: 17.6091, time_taken_in_seconds: 26
Epoch [1/1], Step [7233/13804], Loss: 2.7487, Perplexity: 15.6224, time_taken_in_seconds: 27
Epoch [1/1], Step [7234/13804], Loss: 2.7748, Perplexity: 16.0348, time_taken_in_seconds: 27
Epoch [1/1], Step [7235/13804], Loss: 3.1825, Perplexity: 24.1079, time_taken_in_seconds: 28
Epoch [1/1], Step [7236/13804], Loss: 2.4419, Perplexity: 11.4948, time_taken_in_seconds: 29
Epoch [1/1], Step [7237/13804], Loss: 2.7817, Perplexity: 16.1459, time_taken_in_seconds: 30
Epoch [1/1], Step [7238/13804], Loss: 2.5751, Perplexity: 13.1330, time_taken_in_seconds: 31
Epoch [1/1], Step [7239/13804], Loss: 2.7411, Perplexity: 15.5042, time_taken_in_seconds: 32
Epoch [1/1], Step [7240/13804], Loss: 2.8796, Perplexity: 17.8073, time_taken_in_seconds: 32
Epoch [1/1], Step [7241/13804], Loss: 2.4077, Perplexity: 11.1082, time_taken_in_seconds: 33
Epoch [1/1], Step [7242/13804], Loss: 2.6878, Perplexity: 14.6991, time_taken_in_seconds: 34
Epoch [1/1], Step [7243/13804], Loss: 2.8693, Perplexity: 17.6255, time_taken_in_seconds: 35
Epoch [1/1], Step [7244/13804], Loss: 2.6090, Perplexity: 13.5855, time_taken_in_seconds: 36
Epoch [1/1], Step [7245/13804], Loss: 2.7557, Perplexity: 15.7323, time_taken_in_seconds: 36
Epoch [1/1], Step [7246/13804], Loss: 2.9110, Perplexity: 18.3749, time_taken_in_seconds: 37
Epoch [1/1], Step [7247/13804], Loss: 2.4810, Perplexity: 11.9531, time_taken_in_seconds: 38
Epoch [1/1], Step [7248/13804], Loss: 2.5900, Perplexity: 13.3295, time_taken_in_seconds: 39
Epoch [1/1], Step [7249/13804], Loss: 2.3929, Perplexity: 10.9455, time_taken_in_seconds: 40
Epoch [1/1], Step [7250/13804], Loss: 3.3790, Perplexity: 29.3421, time_taken_in_seconds: 41
Epoch [1/1], Step [7251/13804], Loss: 2.4313, Perplexity: 11.3742, time_taken_in_seconds: 41
Epoch [1/1], Step [7252/13804], Loss: 2.6301, Perplexity: 13.8747, time_taken_in_seconds: 42
Epoch [1/1], Step [7253/13804], Loss: 2.6128, Perplexity: 13.6365, time_taken_in_seconds: 43
Epoch [1/1], Step [7254/13804], Loss: 2.7116, Perplexity: 15.0530, time_taken_in_seconds: 44
Epoch [1/1], Step [7255/13804], Loss: 2.6942, Perplexity: 14.7934, time_taken_in_seconds: 45
Epoch [1/1], Step [7256/13804], Loss: 3.0108, Perplexity: 20.3029, time_taken_in_seconds: 46
Epoch [1/1], Step [7257/13804], Loss: 2.4179, Perplexity: 11.2220, time_taken_in_seconds: 46
Epoch [1/1], Step [7258/13804], Loss: 2.8891, Perplexity: 17.9766, time_taken_in_seconds: 47
Epoch [1/1], Step [7259/13804], Loss: 2.8500, Perplexity: 17.2873, time_taken_in_seconds: 48
Epoch [1/1], Step [7260/13804], Loss: 2.7073, Perplexity: 14.9888, time_taken_in_seconds: 49
Epoch [1/1], Step [7261/13804], Loss: 2.8163, Perplexity: 16.7147, time_taken_in_seconds: 50
Epoch [1/1], Step [7262/13804], Loss: 2.9787, Perplexity: 19.6615, time_taken_in_seconds: 50
Epoch [1/1], Step [7263/13804], Loss: 2.6594, Perplexity: 14.2877, time_taken_in_seconds: 51
Epoch [1/1], Step [7264/13804], Loss: 2.6571, Perplexity: 14.2542, time_taken_in_seconds: 53
Epoch [1/1], Step [7265/13804], Loss: 2.5687, Perplexity: 13.0493, time_taken_in_seconds: 53
Epoch [1/1], Step [7266/13804], Loss: 2.9372, Perplexity: 18.8628, time_taken_in_seconds: 54
Epoch [1/1], Step [7267/13804], Loss: 2.5928, Perplexity: 13.3671, time_taken_in_seconds: 55
Epoch [1/1], Step [7268/13804], Loss: 2.8074, Perplexity: 16.5666, time_taken_in_seconds: 56
Epoch [1/1], Step [7269/13804], Loss: 2.9053, Perplexity: 18.2700, time_taken_in_seconds: 57
Epoch [1/1], Step [7270/13804], Loss: 2.5187, Perplexity: 12.4123, time_taken_in_seconds: 58
Epoch [1/1], Step [7271/13804], Loss: 2.8427, Perplexity: 17.1616, time_taken_in_seconds: 58
Epoch [1/1], Step [7272/13804], Loss: 2.5971, Perplexity: 13.4243, time_taken_in_seconds: 59
Epoch [1/1], Step [7273/13804], Loss: 2.6607, Perplexity: 14.3069, time_taken_in_seconds: 60
Epoch [1/1], Step [7274/13804], Loss: 2.4297, Perplexity: 11.3550, time_taken_in_seconds: 61
Epoch [1/1], Step [7275/13804], Loss: 2.6795, Perplexity: 14.5781, time_taken_in_seconds: 62
Epoch [1/1], Step [7276/13804], Loss: 3.3297, Perplexity: 27.9293, time_taken_in_seconds: 62
Epoch [1/1], Step [7277/13804], Loss: 2.8998, Perplexity: 18.1696, time_taken_in_seconds: 63
Epoch [1/1], Step [7278/13804], Loss: 2.6510, Perplexity: 14.1687, time_taken_in_seconds: 64
Epoch [1/1], Step [7279/13804], Loss: 2.6646, Perplexity: 14.3621, time_taken_in_seconds: 65
Epoch [1/1], Step [7280/13804], Loss: 2.6259, Perplexity: 13.8171, time_taken_in_seconds: 66
Epoch [1/1], Step [7281/13804], Loss: 2.3771, Perplexity: 10.7739, time_taken_in_seconds: 67
Epoch [1/1], Step [7282/13804], Loss: 2.7946, Perplexity: 16.3553, time_taken_in_seconds: 67
Epoch [1/1], Step [7283/13804], Loss: 2.6580, Perplexity: 14.2684, time_taken_in_seconds: 68
Epoch [1/1], Step [7284/13804], Loss: 2.7666, Perplexity: 15.9048, time_taken_in_seconds: 69
Epoch [1/1], Step [7285/13804], Loss: 3.3334, Perplexity: 28.0335, time_taken_in_seconds: 70
Epoch [1/1], Step [7286/13804], Loss: 2.7251, Perplexity: 15.2587, time_taken_in_seconds: 71
Epoch [1/1], Step [7287/13804], Loss: 2.5679, Perplexity: 13.0390, time_taken_in_seconds: 72
Epoch [1/1], Step [7288/13804], Loss: 2.6873, Perplexity: 14.6912, time_taken_in_seconds: 72
Epoch [1/1], Step [7289/13804], Loss: 2.5620, Perplexity: 12.9614, time_taken_in_seconds: 73
Epoch [1/1], Step [7290/13804], Loss: 2.5255, Perplexity: 12.4976, time_taken_in_seconds: 74
Epoch [1/1], Step [7291/13804], Loss: 2.4426, Perplexity: 11.5030, time_taken_in_seconds: 75
Epoch [1/1], Step [7292/13804], Loss: 2.6024, Perplexity: 13.4959, time_taken_in_seconds: 76
Epoch [1/1], Step [7293/13804], Loss: 2.8748, Perplexity: 17.7216, time_taken_in_seconds: 76
Epoch [1/1], Step [7294/13804], Loss: 2.7139, Perplexity: 15.0878, time_taken_in_seconds: 77
Epoch [1/1], Step [7295/13804], Loss: 2.7963, Perplexity: 16.3842, time_taken_in_seconds: 78
Epoch [1/1], Step [7296/13804], Loss: 2.3079, Perplexity: 10.0533, time_taken_in_seconds: 79
Epoch [1/1], Step [7297/13804], Loss: 2.7309, Perplexity: 15.3464, time_taken_in_seconds: 80
Epoch [1/1], Step [7298/13804], Loss: 2.5512, Perplexity: 12.8230, time_taken_in_seconds: 80
Epoch [1/1], Step [7299/13804], Loss: 2.6640, Perplexity: 14.3542, time_taken_in_seconds: 81
Epoch [1/1], Step [7300/13804], Loss: 2.7082, Perplexity: 15.0022, time_taken_in_seconds: 82
Epoch [1/1], Step [7301/13804], Loss: 2.4669, Perplexity: 11.7854, time_taken_in_seconds: 0
Epoch [1/1], Step [7302/13804], Loss: 2.6963, Perplexity: 14.8255, time_taken_in_seconds: 1
Epoch [1/1], Step [7303/13804], Loss: 2.5328, Perplexity: 12.5893, time_taken_in_seconds: 2
Epoch [1/1], Step [7304/13804], Loss: 2.4769, Perplexity: 11.9043, time_taken_in_seconds: 3
Epoch [1/1], Step [7305/13804], Loss: 2.9755, Perplexity: 19.5992, time_taken_in_seconds: 4
Epoch [1/1], Step [7306/13804], Loss: 2.8223, Perplexity: 16.8147, time_taken_in_seconds: 4
Epoch [1/1], Step [7307/13804], Loss: 2.5952, Perplexity: 13.3987, time_taken_in_seconds: 5
Epoch [1/1], Step [7308/13804], Loss: 2.8852, Perplexity: 17.9068, time_taken_in_seconds: 6
Epoch [1/1], Step [7309/13804], Loss: 2.4256, Perplexity: 11.3090, time_taken_in_seconds: 7
Epoch [1/1], Step [7310/13804], Loss: 2.1070, Perplexity: 8.2238, time_taken_in_seconds: 8
Epoch [1/1], Step [7311/13804], Loss: 2.4298, Perplexity: 11.3571, time_taken_in_seconds: 8
Epoch [1/1], Step [7312/13804], Loss: 2.6603, Perplexity: 14.2999, time_taken_in_seconds: 9
Epoch [1/1], Step [7313/13804], Loss: 2.6560, Perplexity: 14.2397, time_taken_in_seconds: 10
Epoch [1/1], Step [7314/13804], Loss: 2.3916, Perplexity: 10.9309, time_taken_in_seconds: 11
Epoch [1/1], Step [7315/13804], Loss: 2.5895, Perplexity: 13.3230, time_taken_in_seconds: 12
Epoch [1/1], Step [7316/13804], Loss: 2.7225, Perplexity: 15.2185, time_taken_in_seconds: 13
Epoch [1/1], Step [7317/13804], Loss: 2.4829, Perplexity: 11.9764, time_taken_in_seconds: 13
Epoch [1/1], Step [7318/13804], Loss: 2.7477, Perplexity: 15.6064, time_taken_in_seconds: 14
Epoch [1/1], Step [7319/13804], Loss: 2.7393, Perplexity: 15.4767, time_taken_in_seconds: 15
Epoch [1/1], Step [7320/13804], Loss: 2.3824, Perplexity: 10.8311, time_taken_in_seconds: 16
Epoch [1/1], Step [7321/13804], Loss: 2.8350, Perplexity: 17.0299, time_taken_in_seconds: 17
Epoch [1/1], Step [7322/13804], Loss: 2.9448, Perplexity: 19.0073, time_taken_in_seconds: 17
Epoch [1/1], Step [7323/13804], Loss: 2.7274, Perplexity: 15.2931, time_taken_in_seconds: 18
Epoch [1/1], Step [7324/13804], Loss: 2.5564, Perplexity: 12.8896, time_taken_in_seconds: 19
Epoch [1/1], Step [7325/13804], Loss: 2.7105, Perplexity: 15.0368, time_taken_in_seconds: 20
Epoch [1/1], Step [7326/13804], Loss: 2.7069, Perplexity: 14.9829, time_taken_in_seconds: 21
Epoch [1/1], Step [7327/13804], Loss: 2.8252, Perplexity: 16.8637, time_taken_in_seconds: 21
Epoch [1/1], Step [7328/13804], Loss: 2.4827, Perplexity: 11.9738, time_taken_in_seconds: 22
Epoch [1/1], Step [7329/13804], Loss: 3.3758, Perplexity: 29.2472, time_taken_in_seconds: 23
Epoch [1/1], Step [7330/13804], Loss: 2.6027, Perplexity: 13.5000, time_taken_in_seconds: 24
Epoch [1/1], Step [7331/13804], Loss: 2.6612, Perplexity: 14.3130, time_taken_in_seconds: 25
Epoch [1/1], Step [7332/13804], Loss: 2.7944, Perplexity: 16.3534, time_taken_in_seconds: 26
Epoch [1/1], Step [7333/13804], Loss: 2.6420, Perplexity: 14.0409, time_taken_in_seconds: 26
Epoch [1/1], Step [7334/13804], Loss: 2.4726, Perplexity: 11.8528, time_taken_in_seconds: 27
Epoch [1/1], Step [7335/13804], Loss: 2.4514, Perplexity: 11.6044, time_taken_in_seconds: 28
Epoch [1/1], Step [7336/13804], Loss: 2.8635, Perplexity: 17.5228, time_taken_in_seconds: 29
Epoch [1/1], Step [7337/13804], Loss: 2.3759, Perplexity: 10.7602, time_taken_in_seconds: 30
Epoch [1/1], Step [7338/13804], Loss: 2.4577, Perplexity: 11.6780, time_taken_in_seconds: 31
Epoch [1/1], Step [7339/13804], Loss: 2.4761, Perplexity: 11.8943, time_taken_in_seconds: 32
Epoch [1/1], Step [7340/13804], Loss: 2.3170, Perplexity: 10.1450, time_taken_in_seconds: 32
Epoch [1/1], Step [7341/13804], Loss: 3.5233, Perplexity: 33.8973, time_taken_in_seconds: 33
Epoch [1/1], Step [7342/13804], Loss: 2.4984, Perplexity: 12.1629, time_taken_in_seconds: 34
Epoch [1/1], Step [7343/13804], Loss: 2.5805, Perplexity: 13.2042, time_taken_in_seconds: 35
Epoch [1/1], Step [7344/13804], Loss: 4.3239, Perplexity: 75.4844, time_taken_in_seconds: 36
Epoch [1/1], Step [7345/13804], Loss: 2.6642, Perplexity: 14.3567, time_taken_in_seconds: 37
Epoch [1/1], Step [7346/13804], Loss: 2.5542, Perplexity: 12.8610, time_taken_in_seconds: 37
Epoch [1/1], Step [7347/13804], Loss: 2.8481, Perplexity: 17.2542, time_taken_in_seconds: 38
Epoch [1/1], Step [7348/13804], Loss: 2.5284, Perplexity: 12.5339, time_taken_in_seconds: 39
Epoch [1/1], Step [7349/13804], Loss: 2.4348, Perplexity: 11.4137, time_taken_in_seconds: 40
Epoch [1/1], Step [7350/13804], Loss: 2.4787, Perplexity: 11.9263, time_taken_in_seconds: 41
Epoch [1/1], Step [7351/13804], Loss: 2.6469, Perplexity: 14.1097, time_taken_in_seconds: 41
Epoch [1/1], Step [7352/13804], Loss: 2.4741, Perplexity: 11.8707, time_taken_in_seconds: 42
Epoch [1/1], Step [7353/13804], Loss: 2.3164, Perplexity: 10.1386, time_taken_in_seconds: 43
Epoch [1/1], Step [7354/13804], Loss: 2.5051, Perplexity: 12.2443, time_taken_in_seconds: 44
Epoch [1/1], Step [7355/13804], Loss: 3.4395, Perplexity: 31.1700, time_taken_in_seconds: 45
Epoch [1/1], Step [7356/13804], Loss: 2.9093, Perplexity: 18.3447, time_taken_in_seconds: 46
Epoch [1/1], Step [7357/13804], Loss: 3.1297, Perplexity: 22.8665, time_taken_in_seconds: 46
Epoch [1/1], Step [7358/13804], Loss: 2.4058, Perplexity: 11.0874, time_taken_in_seconds: 47
Epoch [1/1], Step [7359/13804], Loss: 3.3458, Perplexity: 28.3839, time_taken_in_seconds: 48
Epoch [1/1], Step [7360/13804], Loss: 2.6351, Perplexity: 13.9446, time_taken_in_seconds: 49
Epoch [1/1], Step [7361/13804], Loss: 2.5105, Perplexity: 12.3109, time_taken_in_seconds: 50
Epoch [1/1], Step [7362/13804], Loss: 2.6074, Perplexity: 13.5639, time_taken_in_seconds: 50
Epoch [1/1], Step [7363/13804], Loss: 2.4461, Perplexity: 11.5433, time_taken_in_seconds: 51
Epoch [1/1], Step [7364/13804], Loss: 2.7651, Perplexity: 15.8802, time_taken_in_seconds: 52
Epoch [1/1], Step [7365/13804], Loss: 2.3092, Perplexity: 10.0664, time_taken_in_seconds: 53
Epoch [1/1], Step [7366/13804], Loss: 2.6008, Perplexity: 13.4744, time_taken_in_seconds: 54
Epoch [1/1], Step [7367/13804], Loss: 2.7101, Perplexity: 15.0304, time_taken_in_seconds: 54
Epoch [1/1], Step [7368/13804], Loss: 2.6573, Perplexity: 14.2582, time_taken_in_seconds: 55
Epoch [1/1], Step [7369/13804], Loss: 3.1044, Perplexity: 22.2968, time_taken_in_seconds: 56
Epoch [1/1], Step [7370/13804], Loss: 2.3394, Perplexity: 10.3746, time_taken_in_seconds: 57
Epoch [1/1], Step [7371/13804], Loss: 2.9886, Perplexity: 19.8580, time_taken_in_seconds: 58
Epoch [1/1], Step [7372/13804], Loss: 2.7903, Perplexity: 16.2855, time_taken_in_seconds: 59
Epoch [1/1], Step [7373/13804], Loss: 2.5454, Perplexity: 12.7483, time_taken_in_seconds: 59
Epoch [1/1], Step [7374/13804], Loss: 2.6325, Perplexity: 13.9089, time_taken_in_seconds: 60
Epoch [1/1], Step [7375/13804], Loss: 2.9443, Perplexity: 18.9965, time_taken_in_seconds: 61
Epoch [1/1], Step [7376/13804], Loss: 2.6264, Perplexity: 13.8239, time_taken_in_seconds: 62
Epoch [1/1], Step [7377/13804], Loss: 2.6280, Perplexity: 13.8458, time_taken_in_seconds: 63
Epoch [1/1], Step [7378/13804], Loss: 2.9418, Perplexity: 18.9506, time_taken_in_seconds: 63
Epoch [1/1], Step [7379/13804], Loss: 2.6000, Perplexity: 13.4639, time_taken_in_seconds: 64
Epoch [1/1], Step [7380/13804], Loss: 2.7256, Perplexity: 15.2662, time_taken_in_seconds: 65
Epoch [1/1], Step [7381/13804], Loss: 2.9202, Perplexity: 18.5457, time_taken_in_seconds: 66
Epoch [1/1], Step [7382/13804], Loss: 2.9558, Perplexity: 19.2172, time_taken_in_seconds: 67
Epoch [1/1], Step [7383/13804], Loss: 2.6932, Perplexity: 14.7787, time_taken_in_seconds: 68
Epoch [1/1], Step [7384/13804], Loss: 2.4294, Perplexity: 11.3522, time_taken_in_seconds: 68
Epoch [1/1], Step [7385/13804], Loss: 2.4103, Perplexity: 11.1371, time_taken_in_seconds: 69
Epoch [1/1], Step [7386/13804], Loss: 3.3798, Perplexity: 29.3635, time_taken_in_seconds: 70
Epoch [1/1], Step [7387/13804], Loss: 2.7047, Perplexity: 14.9496, time_taken_in_seconds: 71
Epoch [1/1], Step [7388/13804], Loss: 3.0347, Perplexity: 20.7953, time_taken_in_seconds: 72
Epoch [1/1], Step [7389/13804], Loss: 2.4917, Perplexity: 12.0823, time_taken_in_seconds: 72
Epoch [1/1], Step [7390/13804], Loss: 2.5786, Perplexity: 13.1783, time_taken_in_seconds: 73
Epoch [1/1], Step [7391/13804], Loss: 2.8569, Perplexity: 17.4067, time_taken_in_seconds: 74
Epoch [1/1], Step [7392/13804], Loss: 2.4140, Perplexity: 11.1784, time_taken_in_seconds: 75
Epoch [1/1], Step [7393/13804], Loss: 2.6585, Perplexity: 14.2743, time_taken_in_seconds: 76
Epoch [1/1], Step [7394/13804], Loss: 2.7323, Perplexity: 15.3686, time_taken_in_seconds: 77
Epoch [1/1], Step [7395/13804], Loss: 2.3669, Perplexity: 10.6644, time_taken_in_seconds: 77
Epoch [1/1], Step [7396/13804], Loss: 3.1908, Perplexity: 24.3084, time_taken_in_seconds: 78
Epoch [1/1], Step [7397/13804], Loss: 2.5387, Perplexity: 12.6630, time_taken_in_seconds: 79
Epoch [1/1], Step [7398/13804], Loss: 3.2698, Perplexity: 26.3053, time_taken_in_seconds: 80
Epoch [1/1], Step [7399/13804], Loss: 2.7302, Perplexity: 15.3366, time_taken_in_seconds: 81
Epoch [1/1], Step [7400/13804], Loss: 2.3955, Perplexity: 10.9733, time_taken_in_seconds: 81
Epoch [1/1], Step [7401/13804], Loss: 2.6871, Perplexity: 14.6891, time_taken_in_seconds: 0
Epoch [1/1], Step [7402/13804], Loss: 2.7810, Perplexity: 16.1347, time_taken_in_seconds: 1
Epoch [1/1], Step [7403/13804], Loss: 2.6654, Perplexity: 14.3733, time_taken_in_seconds: 2
Epoch [1/1], Step [7404/13804], Loss: 2.6903, Perplexity: 14.7356, time_taken_in_seconds: 3
Epoch [1/1], Step [7405/13804], Loss: 2.6263, Perplexity: 13.8229, time_taken_in_seconds: 4
Epoch [1/1], Step [7406/13804], Loss: 3.2275, Perplexity: 25.2169, time_taken_in_seconds: 4
Epoch [1/1], Step [7407/13804], Loss: 2.6293, Perplexity: 13.8635, time_taken_in_seconds: 5
Epoch [1/1], Step [7408/13804], Loss: 2.5587, Perplexity: 12.9187, time_taken_in_seconds: 6
Epoch [1/1], Step [7409/13804], Loss: 2.3412, Perplexity: 10.3933, time_taken_in_seconds: 7
Epoch [1/1], Step [7410/13804], Loss: 2.4233, Perplexity: 11.2835, time_taken_in_seconds: 8
Epoch [1/1], Step [7411/13804], Loss: 2.4613, Perplexity: 11.7200, time_taken_in_seconds: 9
Epoch [1/1], Step [7412/13804], Loss: 3.0913, Perplexity: 22.0052, time_taken_in_seconds: 10
Epoch [1/1], Step [7413/13804], Loss: 2.8839, Perplexity: 17.8838, time_taken_in_seconds: 10
Epoch [1/1], Step [7414/13804], Loss: 2.3731, Perplexity: 10.7305, time_taken_in_seconds: 11
Epoch [1/1], Step [7415/13804], Loss: 2.7275, Perplexity: 15.2939, time_taken_in_seconds: 12
Epoch [1/1], Step [7416/13804], Loss: 2.6065, Perplexity: 13.5514, time_taken_in_seconds: 13
Epoch [1/1], Step [7417/13804], Loss: 2.5667, Perplexity: 13.0231, time_taken_in_seconds: 14
Epoch [1/1], Step [7418/13804], Loss: 2.6834, Perplexity: 14.6349, time_taken_in_seconds: 15
Epoch [1/1], Step [7419/13804], Loss: 2.3293, Perplexity: 10.2705, time_taken_in_seconds: 15
Epoch [1/1], Step [7420/13804], Loss: 2.6137, Perplexity: 13.6492, time_taken_in_seconds: 16
Epoch [1/1], Step [7421/13804], Loss: 2.3507, Perplexity: 10.4928, time_taken_in_seconds: 17
Epoch [1/1], Step [7422/13804], Loss: 2.7071, Perplexity: 14.9850, time_taken_in_seconds: 18
Epoch [1/1], Step [7423/13804], Loss: 2.6636, Perplexity: 14.3479, time_taken_in_seconds: 19
Epoch [1/1], Step [7424/13804], Loss: 2.5308, Perplexity: 12.5635, time_taken_in_seconds: 19
Epoch [1/1], Step [7425/13804], Loss: 2.8723, Perplexity: 17.6784, time_taken_in_seconds: 20
Epoch [1/1], Step [7426/13804], Loss: 3.0395, Perplexity: 20.8958, time_taken_in_seconds: 21
Epoch [1/1], Step [7427/13804], Loss: 2.2990, Perplexity: 9.9645, time_taken_in_seconds: 22
Epoch [1/1], Step [7428/13804], Loss: 2.5358, Perplexity: 12.6261, time_taken_in_seconds: 23
Epoch [1/1], Step [7429/13804], Loss: 2.5922, Perplexity: 13.3590, time_taken_in_seconds: 23
Epoch [1/1], Step [7430/13804], Loss: 2.8666, Perplexity: 17.5774, time_taken_in_seconds: 24
Epoch [1/1], Step [7431/13804], Loss: 2.8899, Perplexity: 17.9915, time_taken_in_seconds: 25
Epoch [1/1], Step [7432/13804], Loss: 2.6765, Perplexity: 14.5343, time_taken_in_seconds: 26
Epoch [1/1], Step [7433/13804], Loss: 2.4771, Perplexity: 11.9069, time_taken_in_seconds: 27
Epoch [1/1], Step [7434/13804], Loss: 2.7104, Perplexity: 15.0354, time_taken_in_seconds: 28
Epoch [1/1], Step [7435/13804], Loss: 2.6035, Perplexity: 13.5110, time_taken_in_seconds: 28
Epoch [1/1], Step [7436/13804], Loss: 2.7021, Perplexity: 14.9116, time_taken_in_seconds: 29
Epoch [1/1], Step [7437/13804], Loss: 2.4932, Perplexity: 12.1000, time_taken_in_seconds: 30
Epoch [1/1], Step [7438/13804], Loss: 2.4210, Perplexity: 11.2576, time_taken_in_seconds: 31
Epoch [1/1], Step [7439/13804], Loss: 2.6648, Perplexity: 14.3648, time_taken_in_seconds: 32
Epoch [1/1], Step [7440/13804], Loss: 2.6706, Perplexity: 14.4490, time_taken_in_seconds: 32
Epoch [1/1], Step [7441/13804], Loss: 2.5580, Perplexity: 12.9097, time_taken_in_seconds: 33
Epoch [1/1], Step [7442/13804], Loss: 2.7028, Perplexity: 14.9211, time_taken_in_seconds: 34
Epoch [1/1], Step [7443/13804], Loss: 2.6055, Perplexity: 13.5386, time_taken_in_seconds: 35
Epoch [1/1], Step [7444/13804], Loss: 2.5953, Perplexity: 13.4003, time_taken_in_seconds: 36
Epoch [1/1], Step [7445/13804], Loss: 2.6898, Perplexity: 14.7285, time_taken_in_seconds: 37
Epoch [1/1], Step [7446/13804], Loss: 3.3527, Perplexity: 28.5785, time_taken_in_seconds: 37
Epoch [1/1], Step [7447/13804], Loss: 2.4832, Perplexity: 11.9790, time_taken_in_seconds: 38
Epoch [1/1], Step [7448/13804], Loss: 3.1658, Perplexity: 23.7080, time_taken_in_seconds: 39
Epoch [1/1], Step [7449/13804], Loss: 2.2448, Perplexity: 9.4384, time_taken_in_seconds: 40
Epoch [1/1], Step [7450/13804], Loss: 2.6288, Perplexity: 13.8576, time_taken_in_seconds: 41
Epoch [1/1], Step [7451/13804], Loss: 2.5820, Perplexity: 13.2234, time_taken_in_seconds: 41
Epoch [1/1], Step [7452/13804], Loss: 2.3279, Perplexity: 10.2566, time_taken_in_seconds: 42
Epoch [1/1], Step [7453/13804], Loss: 3.0416, Perplexity: 20.9382, time_taken_in_seconds: 43
Epoch [1/1], Step [7454/13804], Loss: 2.6338, Perplexity: 13.9263, time_taken_in_seconds: 44
Epoch [1/1], Step [7455/13804], Loss: 2.4905, Perplexity: 12.0668, time_taken_in_seconds: 45
Epoch [1/1], Step [7456/13804], Loss: 2.3241, Perplexity: 10.2178, time_taken_in_seconds: 46
Epoch [1/1], Step [7457/13804], Loss: 2.6563, Perplexity: 14.2435, time_taken_in_seconds: 46
Epoch [1/1], Step [7458/13804], Loss: 2.7053, Perplexity: 14.9584, time_taken_in_seconds: 47
Epoch [1/1], Step [7459/13804], Loss: 2.7004, Perplexity: 14.8855, time_taken_in_seconds: 48
Epoch [1/1], Step [7460/13804], Loss: 2.5107, Perplexity: 12.3136, time_taken_in_seconds: 49
Epoch [1/1], Step [7461/13804], Loss: 2.2916, Perplexity: 9.8908, time_taken_in_seconds: 50
Epoch [1/1], Step [7462/13804], Loss: 2.9966, Perplexity: 20.0181, time_taken_in_seconds: 51
Epoch [1/1], Step [7463/13804], Loss: 2.5252, Perplexity: 12.4939, time_taken_in_seconds: 51
Epoch [1/1], Step [7464/13804], Loss: 2.8429, Perplexity: 17.1647, time_taken_in_seconds: 52
Epoch [1/1], Step [7465/13804], Loss: 2.7595, Perplexity: 15.7920, time_taken_in_seconds: 53
Epoch [1/1], Step [7466/13804], Loss: 2.8219, Perplexity: 16.8083, time_taken_in_seconds: 54
Epoch [1/1], Step [7467/13804], Loss: 2.5546, Perplexity: 12.8666, time_taken_in_seconds: 55
Epoch [1/1], Step [7468/13804], Loss: 2.8855, Perplexity: 17.9119, time_taken_in_seconds: 56
Epoch [1/1], Step [7469/13804], Loss: 2.3507, Perplexity: 10.4926, time_taken_in_seconds: 56
Epoch [1/1], Step [7470/13804], Loss: 2.6855, Perplexity: 14.6650, time_taken_in_seconds: 57
Epoch [1/1], Step [7471/13804], Loss: 2.5249, Perplexity: 12.4893, time_taken_in_seconds: 58
Epoch [1/1], Step [7472/13804], Loss: 2.5347, Perplexity: 12.6123, time_taken_in_seconds: 59
Epoch [1/1], Step [7473/13804], Loss: 2.8160, Perplexity: 16.7093, time_taken_in_seconds: 60
Epoch [1/1], Step [7474/13804], Loss: 2.5550, Perplexity: 12.8709, time_taken_in_seconds: 61
Epoch [1/1], Step [7475/13804], Loss: 2.8395, Perplexity: 17.1071, time_taken_in_seconds: 61
Epoch [1/1], Step [7476/13804], Loss: 2.3503, Perplexity: 10.4886, time_taken_in_seconds: 62
Epoch [1/1], Step [7477/13804], Loss: 2.3567, Perplexity: 10.5556, time_taken_in_seconds: 63
Epoch [1/1], Step [7478/13804], Loss: 2.6159, Perplexity: 13.6793, time_taken_in_seconds: 64
Epoch [1/1], Step [7479/13804], Loss: 2.6426, Perplexity: 14.0492, time_taken_in_seconds: 65
Epoch [1/1], Step [7480/13804], Loss: 2.4215, Perplexity: 11.2627, time_taken_in_seconds: 65
Epoch [1/1], Step [7481/13804], Loss: 2.9039, Perplexity: 18.2458, time_taken_in_seconds: 66
Epoch [1/1], Step [7482/13804], Loss: 2.5333, Perplexity: 12.5950, time_taken_in_seconds: 67
Epoch [1/1], Step [7483/13804], Loss: 2.8957, Perplexity: 18.0963, time_taken_in_seconds: 68
Epoch [1/1], Step [7484/13804], Loss: 2.6378, Perplexity: 13.9824, time_taken_in_seconds: 69
Epoch [1/1], Step [7485/13804], Loss: 2.4185, Perplexity: 11.2295, time_taken_in_seconds: 70
Epoch [1/1], Step [7486/13804], Loss: 2.5734, Perplexity: 13.1108, time_taken_in_seconds: 71
Epoch [1/1], Step [7487/13804], Loss: 2.5005, Perplexity: 12.1884, time_taken_in_seconds: 71
Epoch [1/1], Step [7488/13804], Loss: 2.5469, Perplexity: 12.7678, time_taken_in_seconds: 72
Epoch [1/1], Step [7489/13804], Loss: 2.5743, Perplexity: 13.1221, time_taken_in_seconds: 73
Epoch [1/1], Step [7490/13804], Loss: 3.0667, Perplexity: 21.4708, time_taken_in_seconds: 74
Epoch [1/1], Step [7491/13804], Loss: 2.6283, Perplexity: 13.8497, time_taken_in_seconds: 75
Epoch [1/1], Step [7492/13804], Loss: 2.5285, Perplexity: 12.5343, time_taken_in_seconds: 76
Epoch [1/1], Step [7493/13804], Loss: 2.7420, Perplexity: 15.5177, time_taken_in_seconds: 76
Epoch [1/1], Step [7494/13804], Loss: 2.6317, Perplexity: 13.8976, time_taken_in_seconds: 77
Epoch [1/1], Step [7495/13804], Loss: 3.0656, Perplexity: 21.4477, time_taken_in_seconds: 78
Epoch [1/1], Step [7496/13804], Loss: 2.6539, Perplexity: 14.2091, time_taken_in_seconds: 79
Epoch [1/1], Step [7497/13804], Loss: 2.7296, Perplexity: 15.3267, time_taken_in_seconds: 80
Epoch [1/1], Step [7498/13804], Loss: 2.9295, Perplexity: 18.7174, time_taken_in_seconds: 80
Epoch [1/1], Step [7499/13804], Loss: 2.7731, Perplexity: 16.0084, time_taken_in_seconds: 81
Epoch [1/1], Step [7500/13804], Loss: 3.0696, Perplexity: 21.5334, time_taken_in_seconds: 82
Epoch [1/1], Step [7501/13804], Loss: 2.7416, Perplexity: 15.5117, time_taken_in_seconds: 0
Epoch [1/1], Step [7502/13804], Loss: 2.7858, Perplexity: 16.2124, time_taken_in_seconds: 1
Epoch [1/1], Step [7503/13804], Loss: 2.6591, Perplexity: 14.2836, time_taken_in_seconds: 2
Epoch [1/1], Step [7504/13804], Loss: 2.7652, Perplexity: 15.8828, time_taken_in_seconds: 3
Epoch [1/1], Step [7505/13804], Loss: 2.3660, Perplexity: 10.6551, time_taken_in_seconds: 4
Epoch [1/1], Step [7506/13804], Loss: 2.5347, Perplexity: 12.6127, time_taken_in_seconds: 4
Epoch [1/1], Step [7507/13804], Loss: 2.5919, Perplexity: 13.3547, time_taken_in_seconds: 5
Epoch [1/1], Step [7508/13804], Loss: 2.3663, Perplexity: 10.6575, time_taken_in_seconds: 6
Epoch [1/1], Step [7509/13804], Loss: 2.6980, Perplexity: 14.8505, time_taken_in_seconds: 7
Epoch [1/1], Step [7510/13804], Loss: 2.8844, Perplexity: 17.8936, time_taken_in_seconds: 8
Epoch [1/1], Step [7511/13804], Loss: 2.8974, Perplexity: 18.1277, time_taken_in_seconds: 9
Epoch [1/1], Step [7512/13804], Loss: 2.2836, Perplexity: 9.8116, time_taken_in_seconds: 9
Epoch [1/1], Step [7513/13804], Loss: 2.8642, Perplexity: 17.5355, time_taken_in_seconds: 10
Epoch [1/1], Step [7514/13804], Loss: 2.4203, Perplexity: 11.2495, time_taken_in_seconds: 11
Epoch [1/1], Step [7515/13804], Loss: 2.8820, Perplexity: 17.8501, time_taken_in_seconds: 12
Epoch [1/1], Step [7516/13804], Loss: 2.6920, Perplexity: 14.7610, time_taken_in_seconds: 13
Epoch [1/1], Step [7517/13804], Loss: 2.4129, Perplexity: 11.1661, time_taken_in_seconds: 13
Epoch [1/1], Step [7518/13804], Loss: 2.4090, Perplexity: 11.1224, time_taken_in_seconds: 14
Epoch [1/1], Step [7519/13804], Loss: 2.9410, Perplexity: 18.9339, time_taken_in_seconds: 15
Epoch [1/1], Step [7520/13804], Loss: 2.6007, Perplexity: 13.4736, time_taken_in_seconds: 16
Epoch [1/1], Step [7521/13804], Loss: 2.5575, Perplexity: 12.9032, time_taken_in_seconds: 17
Epoch [1/1], Step [7522/13804], Loss: 2.4217, Perplexity: 11.2652, time_taken_in_seconds: 17
Epoch [1/1], Step [7523/13804], Loss: 2.6736, Perplexity: 14.4924, time_taken_in_seconds: 18
Epoch [1/1], Step [7524/13804], Loss: 2.6715, Perplexity: 14.4614, time_taken_in_seconds: 19
Epoch [1/1], Step [7525/13804], Loss: 2.7853, Perplexity: 16.2043, time_taken_in_seconds: 20
Epoch [1/1], Step [7526/13804], Loss: 2.7304, Perplexity: 15.3391, time_taken_in_seconds: 21
Epoch [1/1], Step [7527/13804], Loss: 2.4060, Perplexity: 11.0894, time_taken_in_seconds: 22
Epoch [1/1], Step [7528/13804], Loss: 2.7727, Perplexity: 16.0022, time_taken_in_seconds: 22
Epoch [1/1], Step [7529/13804], Loss: 2.6602, Perplexity: 14.2988, time_taken_in_seconds: 23
Epoch [1/1], Step [7530/13804], Loss: 2.5505, Perplexity: 12.8134, time_taken_in_seconds: 24
Epoch [1/1], Step [7531/13804], Loss: 2.7912, Perplexity: 16.3011, time_taken_in_seconds: 25
Epoch [1/1], Step [7532/13804], Loss: 2.7766, Perplexity: 16.0646, time_taken_in_seconds: 26
Epoch [1/1], Step [7533/13804], Loss: 2.8619, Perplexity: 17.4947, time_taken_in_seconds: 26
Epoch [1/1], Step [7534/13804], Loss: 2.4783, Perplexity: 11.9213, time_taken_in_seconds: 27
Epoch [1/1], Step [7535/13804], Loss: 3.1347, Perplexity: 22.9817, time_taken_in_seconds: 28
Epoch [1/1], Step [7536/13804], Loss: 2.2084, Perplexity: 9.1010, time_taken_in_seconds: 29
Epoch [1/1], Step [7537/13804], Loss: 2.2221, Perplexity: 9.2262, time_taken_in_seconds: 30
Epoch [1/1], Step [7538/13804], Loss: 2.5081, Perplexity: 12.2812, time_taken_in_seconds: 31
Epoch [1/1], Step [7539/13804], Loss: 2.5230, Perplexity: 12.4664, time_taken_in_seconds: 31
Epoch [1/1], Step [7540/13804], Loss: 2.4537, Perplexity: 11.6314, time_taken_in_seconds: 32
Epoch [1/1], Step [7541/13804], Loss: 2.4385, Perplexity: 11.4553, time_taken_in_seconds: 33
Epoch [1/1], Step [7542/13804], Loss: 2.5829, Perplexity: 13.2352, time_taken_in_seconds: 34
Epoch [1/1], Step [7543/13804], Loss: 2.7073, Perplexity: 14.9893, time_taken_in_seconds: 35
Epoch [1/1], Step [7544/13804], Loss: 2.5557, Perplexity: 12.8800, time_taken_in_seconds: 35
Epoch [1/1], Step [7545/13804], Loss: 2.6662, Perplexity: 14.3853, time_taken_in_seconds: 36
Epoch [1/1], Step [7546/13804], Loss: 2.9739, Perplexity: 19.5675, time_taken_in_seconds: 37
Epoch [1/1], Step [7547/13804], Loss: 2.5154, Perplexity: 12.3721, time_taken_in_seconds: 38
Epoch [1/1], Step [7548/13804], Loss: 2.3948, Perplexity: 10.9664, time_taken_in_seconds: 39
Epoch [1/1], Step [7549/13804], Loss: 2.6068, Perplexity: 13.5550, time_taken_in_seconds: 39
Epoch [1/1], Step [7550/13804], Loss: 3.5365, Perplexity: 34.3481, time_taken_in_seconds: 40
Epoch [1/1], Step [7551/13804], Loss: 2.6367, Perplexity: 13.9672, time_taken_in_seconds: 41
Epoch [1/1], Step [7552/13804], Loss: 2.5459, Perplexity: 12.7545, time_taken_in_seconds: 42
Epoch [1/1], Step [7553/13804], Loss: 2.5462, Perplexity: 12.7582, time_taken_in_seconds: 43
Epoch [1/1], Step [7554/13804], Loss: 2.6540, Perplexity: 14.2115, time_taken_in_seconds: 44
Epoch [1/1], Step [7555/13804], Loss: 2.4639, Perplexity: 11.7503, time_taken_in_seconds: 44
Epoch [1/1], Step [7556/13804], Loss: 2.4236, Perplexity: 11.2869, time_taken_in_seconds: 45
Epoch [1/1], Step [7557/13804], Loss: 2.9200, Perplexity: 18.5416, time_taken_in_seconds: 46
Epoch [1/1], Step [7558/13804], Loss: 3.1243, Perplexity: 22.7437, time_taken_in_seconds: 47
Epoch [1/1], Step [7559/13804], Loss: 2.5328, Perplexity: 12.5882, time_taken_in_seconds: 48
Epoch [1/1], Step [7560/13804], Loss: 2.4691, Perplexity: 11.8115, time_taken_in_seconds: 49
Epoch [1/1], Step [7561/13804], Loss: 2.4684, Perplexity: 11.8031, time_taken_in_seconds: 50
Epoch [1/1], Step [7562/13804], Loss: 2.3177, Perplexity: 10.1526, time_taken_in_seconds: 50
Epoch [1/1], Step [7563/13804], Loss: 2.5257, Perplexity: 12.4995, time_taken_in_seconds: 51
Epoch [1/1], Step [7564/13804], Loss: 2.8063, Perplexity: 16.5485, time_taken_in_seconds: 52
Epoch [1/1], Step [7565/13804], Loss: 2.5712, Perplexity: 13.0819, time_taken_in_seconds: 53
Epoch [1/1], Step [7566/13804], Loss: 2.2254, Perplexity: 9.2567, time_taken_in_seconds: 54
Epoch [1/1], Step [7567/13804], Loss: 2.7950, Perplexity: 16.3619, time_taken_in_seconds: 54
Epoch [1/1], Step [7568/13804], Loss: 2.6288, Perplexity: 13.8571, time_taken_in_seconds: 55
Epoch [1/1], Step [7569/13804], Loss: 2.7338, Perplexity: 15.3909, time_taken_in_seconds: 56
Epoch [1/1], Step [7570/13804], Loss: 2.6251, Perplexity: 13.8056, time_taken_in_seconds: 57
Epoch [1/1], Step [7571/13804], Loss: 2.6299, Perplexity: 13.8720, time_taken_in_seconds: 58
Epoch [1/1], Step [7572/13804], Loss: 2.7437, Perplexity: 15.5441, time_taken_in_seconds: 58
Epoch [1/1], Step [7573/13804], Loss: 2.6028, Perplexity: 13.5015, time_taken_in_seconds: 59
Epoch [1/1], Step [7574/13804], Loss: 2.6541, Perplexity: 14.2125, time_taken_in_seconds: 60
Epoch [1/1], Step [7575/13804], Loss: 2.5196, Perplexity: 12.4239, time_taken_in_seconds: 61
Epoch [1/1], Step [7576/13804], Loss: 3.1111, Perplexity: 22.4460, time_taken_in_seconds: 62
Epoch [1/1], Step [7577/13804], Loss: 2.5380, Perplexity: 12.6541, time_taken_in_seconds: 63
Epoch [1/1], Step [7578/13804], Loss: 2.2412, Perplexity: 9.4043, time_taken_in_seconds: 63
Epoch [1/1], Step [7579/13804], Loss: 2.7682, Perplexity: 15.9307, time_taken_in_seconds: 64
Epoch [1/1], Step [7580/13804], Loss: 2.4310, Perplexity: 11.3701, time_taken_in_seconds: 65
Epoch [1/1], Step [7581/13804], Loss: 2.4966, Perplexity: 12.1410, time_taken_in_seconds: 66
Epoch [1/1], Step [7582/13804], Loss: 2.4878, Perplexity: 12.0343, time_taken_in_seconds: 67
Epoch [1/1], Step [7583/13804], Loss: 2.3779, Perplexity: 10.7823, time_taken_in_seconds: 67
Epoch [1/1], Step [7584/13804], Loss: 3.2779, Perplexity: 26.5204, time_taken_in_seconds: 68
Epoch [1/1], Step [7585/13804], Loss: 3.5173, Perplexity: 33.6923, time_taken_in_seconds: 69
Epoch [1/1], Step [7586/13804], Loss: 2.6291, Perplexity: 13.8607, time_taken_in_seconds: 70
Epoch [1/1], Step [7587/13804], Loss: 2.4873, Perplexity: 12.0293, time_taken_in_seconds: 71
Epoch [1/1], Step [7588/13804], Loss: 2.6836, Perplexity: 14.6370, time_taken_in_seconds: 72
Epoch [1/1], Step [7589/13804], Loss: 2.5843, Perplexity: 13.2542, time_taken_in_seconds: 72
Epoch [1/1], Step [7590/13804], Loss: 2.6286, Perplexity: 13.8548, time_taken_in_seconds: 73
Epoch [1/1], Step [7591/13804], Loss: 2.6791, Perplexity: 14.5725, time_taken_in_seconds: 74
Epoch [1/1], Step [7592/13804], Loss: 2.6811, Perplexity: 14.6005, time_taken_in_seconds: 75
Epoch [1/1], Step [7593/13804], Loss: 2.8052, Perplexity: 16.5306, time_taken_in_seconds: 76
Epoch [1/1], Step [7594/13804], Loss: 3.0257, Perplexity: 20.6077, time_taken_in_seconds: 76
Epoch [1/1], Step [7595/13804], Loss: 2.4056, Perplexity: 11.0850, time_taken_in_seconds: 77
Epoch [1/1], Step [7596/13804], Loss: 2.3978, Perplexity: 10.9985, time_taken_in_seconds: 78
Epoch [1/1], Step [7597/13804], Loss: 2.3119, Perplexity: 10.0940, time_taken_in_seconds: 79
Epoch [1/1], Step [7598/13804], Loss: 2.6087, Perplexity: 13.5808, time_taken_in_seconds: 80
Epoch [1/1], Step [7599/13804], Loss: 2.3178, Perplexity: 10.1528, time_taken_in_seconds: 81
Epoch [1/1], Step [7600/13804], Loss: 2.8181, Perplexity: 16.7444, time_taken_in_seconds: 81
Epoch [1/1], Step [7601/13804], Loss: 2.7299, Perplexity: 15.3317, time_taken_in_seconds: 0
Epoch [1/1], Step [7602/13804], Loss: 2.7021, Perplexity: 14.9112, time_taken_in_seconds: 1
Epoch [1/1], Step [7603/13804], Loss: 2.8041, Perplexity: 16.5114, time_taken_in_seconds: 2
Epoch [1/1], Step [7604/13804], Loss: 2.5877, Perplexity: 13.2997, time_taken_in_seconds: 3
Epoch [1/1], Step [7605/13804], Loss: 2.5682, Perplexity: 13.0428, time_taken_in_seconds: 4
Epoch [1/1], Step [7606/13804], Loss: 2.6354, Perplexity: 13.9492, time_taken_in_seconds: 4
Epoch [1/1], Step [7607/13804], Loss: 2.5404, Perplexity: 12.6851, time_taken_in_seconds: 5
Epoch [1/1], Step [7608/13804], Loss: 2.7567, Perplexity: 15.7480, time_taken_in_seconds: 6
Epoch [1/1], Step [7609/13804], Loss: 2.5805, Perplexity: 13.2040, time_taken_in_seconds: 7
Epoch [1/1], Step [7610/13804], Loss: 2.5014, Perplexity: 12.1992, time_taken_in_seconds: 8
Epoch [1/1], Step [7611/13804], Loss: 2.4506, Perplexity: 11.5949, time_taken_in_seconds: 8
Epoch [1/1], Step [7612/13804], Loss: 2.4061, Perplexity: 11.0908, time_taken_in_seconds: 9
Epoch [1/1], Step [7613/13804], Loss: 2.3960, Perplexity: 10.9789, time_taken_in_seconds: 10
Epoch [1/1], Step [7614/13804], Loss: 2.8603, Perplexity: 17.4675, time_taken_in_seconds: 11
Epoch [1/1], Step [7615/13804], Loss: 3.0802, Perplexity: 21.7635, time_taken_in_seconds: 12
Epoch [1/1], Step [7616/13804], Loss: 2.4462, Perplexity: 11.5444, time_taken_in_seconds: 13
Epoch [1/1], Step [7617/13804], Loss: 2.9451, Perplexity: 19.0122, time_taken_in_seconds: 13
Epoch [1/1], Step [7618/13804], Loss: 2.6931, Perplexity: 14.7771, time_taken_in_seconds: 14
Epoch [1/1], Step [7619/13804], Loss: 3.0076, Perplexity: 20.2396, time_taken_in_seconds: 15
Epoch [1/1], Step [7620/13804], Loss: 2.8984, Perplexity: 18.1458, time_taken_in_seconds: 16
Epoch [1/1], Step [7621/13804], Loss: 2.9306, Perplexity: 18.7384, time_taken_in_seconds: 17
Epoch [1/1], Step [7622/13804], Loss: 2.3259, Perplexity: 10.2363, time_taken_in_seconds: 17
Epoch [1/1], Step [7623/13804], Loss: 2.5080, Perplexity: 12.2807, time_taken_in_seconds: 18
Epoch [1/1], Step [7624/13804], Loss: 2.3843, Perplexity: 10.8519, time_taken_in_seconds: 19
Epoch [1/1], Step [7625/13804], Loss: 2.3801, Perplexity: 10.8064, time_taken_in_seconds: 20
Epoch [1/1], Step [7626/13804], Loss: 2.8290, Perplexity: 16.9278, time_taken_in_seconds: 21
Epoch [1/1], Step [7627/13804], Loss: 3.0201, Perplexity: 20.4929, time_taken_in_seconds: 22
Epoch [1/1], Step [7628/13804], Loss: 2.5852, Perplexity: 13.2660, time_taken_in_seconds: 22
Epoch [1/1], Step [7629/13804], Loss: 3.5478, Perplexity: 34.7363, time_taken_in_seconds: 23
Epoch [1/1], Step [7630/13804], Loss: 2.4309, Perplexity: 11.3686, time_taken_in_seconds: 24
Epoch [1/1], Step [7631/13804], Loss: 2.5528, Perplexity: 12.8436, time_taken_in_seconds: 25
Epoch [1/1], Step [7632/13804], Loss: 2.6734, Perplexity: 14.4893, time_taken_in_seconds: 26
Epoch [1/1], Step [7633/13804], Loss: 2.6100, Perplexity: 13.5994, time_taken_in_seconds: 27
Epoch [1/1], Step [7634/13804], Loss: 2.3313, Perplexity: 10.2909, time_taken_in_seconds: 27
Epoch [1/1], Step [7635/13804], Loss: 2.6239, Perplexity: 13.7891, time_taken_in_seconds: 28
Epoch [1/1], Step [7636/13804], Loss: 3.0588, Perplexity: 21.3017, time_taken_in_seconds: 29
Epoch [1/1], Step [7637/13804], Loss: 2.4181, Perplexity: 11.2242, time_taken_in_seconds: 30
Epoch [1/1], Step [7638/13804], Loss: 2.4785, Perplexity: 11.9231, time_taken_in_seconds: 31
Epoch [1/1], Step [7639/13804], Loss: 2.9054, Perplexity: 18.2717, time_taken_in_seconds: 32
Epoch [1/1], Step [7640/13804], Loss: 2.7496, Perplexity: 15.6359, time_taken_in_seconds: 32
Epoch [1/1], Step [7641/13804], Loss: 2.4611, Perplexity: 11.7182, time_taken_in_seconds: 33
Epoch [1/1], Step [7642/13804], Loss: 2.7743, Perplexity: 16.0280, time_taken_in_seconds: 34
Epoch [1/1], Step [7643/13804], Loss: 2.4452, Perplexity: 11.5327, time_taken_in_seconds: 35
Epoch [1/1], Step [7644/13804], Loss: 2.8065, Perplexity: 16.5516, time_taken_in_seconds: 36
Epoch [1/1], Step [7645/13804], Loss: 2.4785, Perplexity: 11.9229, time_taken_in_seconds: 36
Epoch [1/1], Step [7646/13804], Loss: 2.6832, Perplexity: 14.6324, time_taken_in_seconds: 37
Epoch [1/1], Step [7647/13804], Loss: 2.5854, Perplexity: 13.2685, time_taken_in_seconds: 38
Epoch [1/1], Step [7648/13804], Loss: 2.3186, Perplexity: 10.1618, time_taken_in_seconds: 39
Epoch [1/1], Step [7649/13804], Loss: 2.6448, Perplexity: 14.0812, time_taken_in_seconds: 40
Epoch [1/1], Step [7650/13804], Loss: 2.6715, Perplexity: 14.4613, time_taken_in_seconds: 41
Epoch [1/1], Step [7651/13804], Loss: 2.3235, Perplexity: 10.2114, time_taken_in_seconds: 41
Epoch [1/1], Step [7652/13804], Loss: 2.8174, Perplexity: 16.7338, time_taken_in_seconds: 42
Epoch [1/1], Step [7653/13804], Loss: 2.6874, Perplexity: 14.6928, time_taken_in_seconds: 43
Epoch [1/1], Step [7654/13804], Loss: 2.6537, Perplexity: 14.2069, time_taken_in_seconds: 44
Epoch [1/1], Step [7655/13804], Loss: 2.5729, Perplexity: 13.1040, time_taken_in_seconds: 45
Epoch [1/1], Step [7656/13804], Loss: 2.7702, Perplexity: 15.9613, time_taken_in_seconds: 45
Epoch [1/1], Step [7657/13804], Loss: 2.5706, Perplexity: 13.0731, time_taken_in_seconds: 46
Epoch [1/1], Step [7658/13804], Loss: 2.9102, Perplexity: 18.3611, time_taken_in_seconds: 47
Epoch [1/1], Step [7659/13804], Loss: 2.9476, Perplexity: 19.0607, time_taken_in_seconds: 48
Epoch [1/1], Step [7660/13804], Loss: 2.8561, Perplexity: 17.3942, time_taken_in_seconds: 49
Epoch [1/1], Step [7661/13804], Loss: 2.8983, Perplexity: 18.1435, time_taken_in_seconds: 50
Epoch [1/1], Step [7662/13804], Loss: 2.4765, Perplexity: 11.9001, time_taken_in_seconds: 50
Epoch [1/1], Step [7663/13804], Loss: 2.4239, Perplexity: 11.2895, time_taken_in_seconds: 51
Epoch [1/1], Step [7664/13804], Loss: 2.6612, Perplexity: 14.3137, time_taken_in_seconds: 52
Epoch [1/1], Step [7665/13804], Loss: 2.4005, Perplexity: 11.0291, time_taken_in_seconds: 53
Epoch [1/1], Step [7666/13804], Loss: 2.6000, Perplexity: 13.4639, time_taken_in_seconds: 54
Epoch [1/1], Step [7667/13804], Loss: 2.5846, Perplexity: 13.2581, time_taken_in_seconds: 55
Epoch [1/1], Step [7668/13804], Loss: 2.5856, Perplexity: 13.2712, time_taken_in_seconds: 55
Epoch [1/1], Step [7669/13804], Loss: 2.7338, Perplexity: 15.3919, time_taken_in_seconds: 56
Epoch [1/1], Step [7670/13804], Loss: 2.7029, Perplexity: 14.9224, time_taken_in_seconds: 57
Epoch [1/1], Step [7671/13804], Loss: 2.6922, Perplexity: 14.7638, time_taken_in_seconds: 58
Epoch [1/1], Step [7672/13804], Loss: 2.5838, Perplexity: 13.2471, time_taken_in_seconds: 59
Epoch [1/1], Step [7673/13804], Loss: 2.9152, Perplexity: 18.4522, time_taken_in_seconds: 59
Epoch [1/1], Step [7674/13804], Loss: 2.6155, Perplexity: 13.6743, time_taken_in_seconds: 60
Epoch [1/1], Step [7675/13804], Loss: 2.5895, Perplexity: 13.3229, time_taken_in_seconds: 61
Epoch [1/1], Step [7676/13804], Loss: 2.4760, Perplexity: 11.8937, time_taken_in_seconds: 62
Epoch [1/1], Step [7677/13804], Loss: 2.3609, Perplexity: 10.6004, time_taken_in_seconds: 63
Epoch [1/1], Step [7678/13804], Loss: 2.7333, Perplexity: 15.3834, time_taken_in_seconds: 64
Epoch [1/1], Step [7679/13804], Loss: 2.4488, Perplexity: 11.5744, time_taken_in_seconds: 64
Epoch [1/1], Step [7680/13804], Loss: 2.4310, Perplexity: 11.3701, time_taken_in_seconds: 65
Epoch [1/1], Step [7681/13804], Loss: 2.8902, Perplexity: 17.9964, time_taken_in_seconds: 66
Epoch [1/1], Step [7682/13804], Loss: 2.7214, Perplexity: 15.2017, time_taken_in_seconds: 67
Epoch [1/1], Step [7683/13804], Loss: 2.5881, Perplexity: 13.3044, time_taken_in_seconds: 68
Epoch [1/1], Step [7684/13804], Loss: 2.7736, Perplexity: 16.0161, time_taken_in_seconds: 68
Epoch [1/1], Step [7685/13804], Loss: 3.0119, Perplexity: 20.3267, time_taken_in_seconds: 69
Epoch [1/1], Step [7686/13804], Loss: 2.7499, Perplexity: 15.6415, time_taken_in_seconds: 70
Epoch [1/1], Step [7687/13804], Loss: 2.5188, Perplexity: 12.4141, time_taken_in_seconds: 71
Epoch [1/1], Step [7688/13804], Loss: 2.5129, Perplexity: 12.3403, time_taken_in_seconds: 72
Epoch [1/1], Step [7689/13804], Loss: 2.3387, Perplexity: 10.3676, time_taken_in_seconds: 73
Epoch [1/1], Step [7690/13804], Loss: 2.6551, Perplexity: 14.2261, time_taken_in_seconds: 73
Epoch [1/1], Step [7691/13804], Loss: 2.6024, Perplexity: 13.4956, time_taken_in_seconds: 74
Epoch [1/1], Step [7692/13804], Loss: 2.7553, Perplexity: 15.7256, time_taken_in_seconds: 75
Epoch [1/1], Step [7693/13804], Loss: 2.3964, Perplexity: 10.9834, time_taken_in_seconds: 76
Epoch [1/1], Step [7694/13804], Loss: 2.8644, Perplexity: 17.5379, time_taken_in_seconds: 77
Epoch [1/1], Step [7695/13804], Loss: 2.7527, Perplexity: 15.6845, time_taken_in_seconds: 78
Epoch [1/1], Step [7696/13804], Loss: 2.3959, Perplexity: 10.9780, time_taken_in_seconds: 78
Epoch [1/1], Step [7697/13804], Loss: 2.7008, Perplexity: 14.8919, time_taken_in_seconds: 79
Epoch [1/1], Step [7698/13804], Loss: 2.4188, Perplexity: 11.2328, time_taken_in_seconds: 80
Epoch [1/1], Step [7699/13804], Loss: 2.3857, Perplexity: 10.8667, time_taken_in_seconds: 81
Epoch [1/1], Step [7700/13804], Loss: 2.4312, Perplexity: 11.3729, time_taken_in_seconds: 82
Epoch [1/1], Step [7701/13804], Loss: 2.4295, Perplexity: 11.3528, time_taken_in_seconds: 0
Epoch [1/1], Step [7702/13804], Loss: 2.6458, Perplexity: 14.0941, time_taken_in_seconds: 1
Epoch [1/1], Step [7703/13804], Loss: 2.5968, Perplexity: 13.4204, time_taken_in_seconds: 2
Epoch [1/1], Step [7704/13804], Loss: 2.8871, Perplexity: 17.9420, time_taken_in_seconds: 3
Epoch [1/1], Step [7705/13804], Loss: 2.7546, Perplexity: 15.7150, time_taken_in_seconds: 4
Epoch [1/1], Step [7706/13804], Loss: 2.5736, Perplexity: 13.1123, time_taken_in_seconds: 5
Epoch [1/1], Step [7707/13804], Loss: 2.6885, Perplexity: 14.7103, time_taken_in_seconds: 5
Epoch [1/1], Step [7708/13804], Loss: 2.5338, Perplexity: 12.6009, time_taken_in_seconds: 6
Epoch [1/1], Step [7709/13804], Loss: 2.7117, Perplexity: 15.0553, time_taken_in_seconds: 7
Epoch [1/1], Step [7710/13804], Loss: 2.3628, Perplexity: 10.6209, time_taken_in_seconds: 8
Epoch [1/1], Step [7711/13804], Loss: 2.5516, Perplexity: 12.8273, time_taken_in_seconds: 9
Epoch [1/1], Step [7712/13804], Loss: 2.4512, Perplexity: 11.6021, time_taken_in_seconds: 10
Epoch [1/1], Step [7713/13804], Loss: 2.8071, Perplexity: 16.5626, time_taken_in_seconds: 10
Epoch [1/1], Step [7714/13804], Loss: 2.3189, Perplexity: 10.1644, time_taken_in_seconds: 11
Epoch [1/1], Step [7715/13804], Loss: 2.4076, Perplexity: 11.1074, time_taken_in_seconds: 12
Epoch [1/1], Step [7716/13804], Loss: 2.4494, Perplexity: 11.5814, time_taken_in_seconds: 13
Epoch [1/1], Step [7717/13804], Loss: 2.7888, Perplexity: 16.2615, time_taken_in_seconds: 14
Epoch [1/1], Step [7718/13804], Loss: 2.7187, Perplexity: 15.1604, time_taken_in_seconds: 14
Epoch [1/1], Step [7719/13804], Loss: 2.4619, Perplexity: 11.7273, time_taken_in_seconds: 15
Epoch [1/1], Step [7720/13804], Loss: 2.9144, Perplexity: 18.4374, time_taken_in_seconds: 16
Epoch [1/1], Step [7721/13804], Loss: 2.3427, Perplexity: 10.4091, time_taken_in_seconds: 17
Epoch [1/1], Step [7722/13804], Loss: 2.4927, Perplexity: 12.0939, time_taken_in_seconds: 18
Epoch [1/1], Step [7723/13804], Loss: 2.5590, Perplexity: 12.9226, time_taken_in_seconds: 19
Epoch [1/1], Step [7724/13804], Loss: 2.5964, Perplexity: 13.4158, time_taken_in_seconds: 19
Epoch [1/1], Step [7725/13804], Loss: 2.3280, Perplexity: 10.2573, time_taken_in_seconds: 20
Epoch [1/1], Step [7726/13804], Loss: 2.9465, Perplexity: 19.0396, time_taken_in_seconds: 21
Epoch [1/1], Step [7727/13804], Loss: 2.7108, Perplexity: 15.0414, time_taken_in_seconds: 22
Epoch [1/1], Step [7728/13804], Loss: 2.4195, Perplexity: 11.2405, time_taken_in_seconds: 23
Epoch [1/1], Step [7729/13804], Loss: 2.6394, Perplexity: 14.0048, time_taken_in_seconds: 24
Epoch [1/1], Step [7730/13804], Loss: 2.2964, Perplexity: 9.9387, time_taken_in_seconds: 24
Epoch [1/1], Step [7731/13804], Loss: 2.4329, Perplexity: 11.3914, time_taken_in_seconds: 25
Epoch [1/1], Step [7732/13804], Loss: 2.7987, Perplexity: 16.4240, time_taken_in_seconds: 26
Epoch [1/1], Step [7733/13804], Loss: 2.8304, Perplexity: 16.9519, time_taken_in_seconds: 27
Epoch [1/1], Step [7734/13804], Loss: 2.4669, Perplexity: 11.7853, time_taken_in_seconds: 28
Epoch [1/1], Step [7735/13804], Loss: 2.7683, Perplexity: 15.9314, time_taken_in_seconds: 28
Epoch [1/1], Step [7736/13804], Loss: 2.4165, Perplexity: 11.2067, time_taken_in_seconds: 29
Epoch [1/1], Step [7737/13804], Loss: 2.5461, Perplexity: 12.7576, time_taken_in_seconds: 30
Epoch [1/1], Step [7738/13804], Loss: 2.5460, Perplexity: 12.7559, time_taken_in_seconds: 31
Epoch [1/1], Step [7739/13804], Loss: 2.6930, Perplexity: 14.7762, time_taken_in_seconds: 32
Epoch [1/1], Step [7740/13804], Loss: 2.6367, Perplexity: 13.9670, time_taken_in_seconds: 33
Epoch [1/1], Step [7741/13804], Loss: 2.7573, Perplexity: 15.7579, time_taken_in_seconds: 33
Epoch [1/1], Step [7742/13804], Loss: 2.6777, Perplexity: 14.5517, time_taken_in_seconds: 34
Epoch [1/1], Step [7743/13804], Loss: 2.9911, Perplexity: 19.9083, time_taken_in_seconds: 35
Epoch [1/1], Step [7744/13804], Loss: 3.3194, Perplexity: 27.6448, time_taken_in_seconds: 36
Epoch [1/1], Step [7745/13804], Loss: 2.4291, Perplexity: 11.3481, time_taken_in_seconds: 37
Epoch [1/1], Step [7746/13804], Loss: 2.5753, Perplexity: 13.1350, time_taken_in_seconds: 37
Epoch [1/1], Step [7747/13804], Loss: 2.2338, Perplexity: 9.3350, time_taken_in_seconds: 38
Epoch [1/1], Step [7748/13804], Loss: 3.2131, Perplexity: 24.8568, time_taken_in_seconds: 39
Epoch [1/1], Step [7749/13804], Loss: 2.8656, Perplexity: 17.5603, time_taken_in_seconds: 40
Epoch [1/1], Step [7750/13804], Loss: 2.5758, Perplexity: 13.1412, time_taken_in_seconds: 41
Epoch [1/1], Step [7751/13804], Loss: 2.9260, Perplexity: 18.6521, time_taken_in_seconds: 42
Epoch [1/1], Step [7752/13804], Loss: 2.6052, Perplexity: 13.5345, time_taken_in_seconds: 42
Epoch [1/1], Step [7753/13804], Loss: 2.6826, Perplexity: 14.6225, time_taken_in_seconds: 43
Epoch [1/1], Step [7754/13804], Loss: 2.7724, Perplexity: 15.9967, time_taken_in_seconds: 44
Epoch [1/1], Step [7755/13804], Loss: 2.6427, Perplexity: 14.0516, time_taken_in_seconds: 45
Epoch [1/1], Step [7756/13804], Loss: 2.7620, Perplexity: 15.8314, time_taken_in_seconds: 46
Epoch [1/1], Step [7757/13804], Loss: 2.7976, Perplexity: 16.4057, time_taken_in_seconds: 46
Epoch [1/1], Step [7758/13804], Loss: 3.0223, Perplexity: 20.5395, time_taken_in_seconds: 47
Epoch [1/1], Step [7759/13804], Loss: 3.1405, Perplexity: 23.1158, time_taken_in_seconds: 48
Epoch [1/1], Step [7760/13804], Loss: 2.4306, Perplexity: 11.3662, time_taken_in_seconds: 49
Epoch [1/1], Step [7761/13804], Loss: 2.2705, Perplexity: 9.6841, time_taken_in_seconds: 50
Epoch [1/1], Step [7762/13804], Loss: 2.6453, Perplexity: 14.0876, time_taken_in_seconds: 51
Epoch [1/1], Step [7763/13804], Loss: 2.6403, Perplexity: 14.0174, time_taken_in_seconds: 51
Epoch [1/1], Step [7764/13804], Loss: 2.7113, Perplexity: 15.0491, time_taken_in_seconds: 52
Epoch [1/1], Step [7765/13804], Loss: 2.4911, Perplexity: 12.0746, time_taken_in_seconds: 53
Epoch [1/1], Step [7766/13804], Loss: 2.7363, Perplexity: 15.4305, time_taken_in_seconds: 54
Epoch [1/1], Step [7767/13804], Loss: 2.5271, Perplexity: 12.5167, time_taken_in_seconds: 55
Epoch [1/1], Step [7768/13804], Loss: 3.0526, Perplexity: 21.1696, time_taken_in_seconds: 55
Epoch [1/1], Step [7769/13804], Loss: 2.3727, Perplexity: 10.7268, time_taken_in_seconds: 56
Epoch [1/1], Step [7770/13804], Loss: 2.6360, Perplexity: 13.9577, time_taken_in_seconds: 57
Epoch [1/1], Step [7771/13804], Loss: 2.4349, Perplexity: 11.4149, time_taken_in_seconds: 58
Epoch [1/1], Step [7772/13804], Loss: 3.3162, Perplexity: 27.5559, time_taken_in_seconds: 59
Epoch [1/1], Step [7773/13804], Loss: 2.6576, Perplexity: 14.2617, time_taken_in_seconds: 60
Epoch [1/1], Step [7774/13804], Loss: 2.4425, Perplexity: 11.5021, time_taken_in_seconds: 60
Epoch [1/1], Step [7775/13804], Loss: 2.9704, Perplexity: 19.4993, time_taken_in_seconds: 61
Epoch [1/1], Step [7776/13804], Loss: 2.4616, Perplexity: 11.7236, time_taken_in_seconds: 62
Epoch [1/1], Step [7777/13804], Loss: 2.7020, Perplexity: 14.9097, time_taken_in_seconds: 63
Epoch [1/1], Step [7778/13804], Loss: 2.8366, Perplexity: 17.0578, time_taken_in_seconds: 64
Epoch [1/1], Step [7779/13804], Loss: 2.3463, Perplexity: 10.4472, time_taken_in_seconds: 65
Epoch [1/1], Step [7780/13804], Loss: 2.3454, Perplexity: 10.4379, time_taken_in_seconds: 66
Epoch [1/1], Step [7781/13804], Loss: 2.5279, Perplexity: 12.5272, time_taken_in_seconds: 66
Epoch [1/1], Step [7782/13804], Loss: 2.3738, Perplexity: 10.7379, time_taken_in_seconds: 67
Epoch [1/1], Step [7783/13804], Loss: 2.3552, Perplexity: 10.5402, time_taken_in_seconds: 68
Epoch [1/1], Step [7784/13804], Loss: 2.5293, Perplexity: 12.5448, time_taken_in_seconds: 69
Epoch [1/1], Step [7785/13804], Loss: 2.8660, Perplexity: 17.5660, time_taken_in_seconds: 70
Epoch [1/1], Step [7786/13804], Loss: 2.8214, Perplexity: 16.8003, time_taken_in_seconds: 70
Epoch [1/1], Step [7787/13804], Loss: 2.3177, Perplexity: 10.1528, time_taken_in_seconds: 71
Epoch [1/1], Step [7788/13804], Loss: 2.4522, Perplexity: 11.6143, time_taken_in_seconds: 72
Epoch [1/1], Step [7789/13804], Loss: 2.7007, Perplexity: 14.8906, time_taken_in_seconds: 73
Epoch [1/1], Step [7790/13804], Loss: 2.8884, Perplexity: 17.9638, time_taken_in_seconds: 74
Epoch [1/1], Step [7791/13804], Loss: 2.3033, Perplexity: 10.0069, time_taken_in_seconds: 75
Epoch [1/1], Step [7792/13804], Loss: 2.3780, Perplexity: 10.7838, time_taken_in_seconds: 75
Epoch [1/1], Step [7793/13804], Loss: 2.6293, Perplexity: 13.8647, time_taken_in_seconds: 76
Epoch [1/1], Step [7794/13804], Loss: 3.1727, Perplexity: 23.8723, time_taken_in_seconds: 77
Epoch [1/1], Step [7795/13804], Loss: 2.7044, Perplexity: 14.9457, time_taken_in_seconds: 78
Epoch [1/1], Step [7796/13804], Loss: 2.5764, Perplexity: 13.1493, time_taken_in_seconds: 79
Epoch [1/1], Step [7797/13804], Loss: 2.4857, Perplexity: 12.0090, time_taken_in_seconds: 80
Epoch [1/1], Step [7798/13804], Loss: 2.7906, Perplexity: 16.2907, time_taken_in_seconds: 80
Epoch [1/1], Step [7799/13804], Loss: 3.0034, Perplexity: 20.1532, time_taken_in_seconds: 81
Epoch [1/1], Step [7800/13804], Loss: 2.6189, Perplexity: 13.7212, time_taken_in_seconds: 82
Epoch [1/1], Step [7801/13804], Loss: 2.6647, Perplexity: 14.3638, time_taken_in_seconds: 0
Epoch [1/1], Step [7802/13804], Loss: 2.4775, Perplexity: 11.9112, time_taken_in_seconds: 1
Epoch [1/1], Step [7803/13804], Loss: 3.3156, Perplexity: 27.5399, time_taken_in_seconds: 2
Epoch [1/1], Step [7804/13804], Loss: 2.8736, Perplexity: 17.7000, time_taken_in_seconds: 3
Epoch [1/1], Step [7805/13804], Loss: 2.6378, Perplexity: 13.9830, time_taken_in_seconds: 4
Epoch [1/1], Step [7806/13804], Loss: 2.2882, Perplexity: 9.8567, time_taken_in_seconds: 5
Epoch [1/1], Step [7807/13804], Loss: 2.6306, Perplexity: 13.8817, time_taken_in_seconds: 5
Epoch [1/1], Step [7808/13804], Loss: 2.5036, Perplexity: 12.2269, time_taken_in_seconds: 6
Epoch [1/1], Step [7809/13804], Loss: 2.6265, Perplexity: 13.8255, time_taken_in_seconds: 7
Epoch [1/1], Step [7810/13804], Loss: 2.4562, Perplexity: 11.6604, time_taken_in_seconds: 8
Epoch [1/1], Step [7811/13804], Loss: 2.2073, Perplexity: 9.0911, time_taken_in_seconds: 9
Epoch [1/1], Step [7812/13804], Loss: 2.9243, Perplexity: 18.6217, time_taken_in_seconds: 9
Epoch [1/1], Step [7813/13804], Loss: 2.6303, Perplexity: 13.8783, time_taken_in_seconds: 10
Epoch [1/1], Step [7814/13804], Loss: 2.5912, Perplexity: 13.3463, time_taken_in_seconds: 11
Epoch [1/1], Step [7815/13804], Loss: 2.6177, Perplexity: 13.7043, time_taken_in_seconds: 12
Epoch [1/1], Step [7816/13804], Loss: 2.5567, Perplexity: 12.8932, time_taken_in_seconds: 13
Epoch [1/1], Step [7817/13804], Loss: 3.0015, Perplexity: 20.1161, time_taken_in_seconds: 14
Epoch [1/1], Step [7818/13804], Loss: 2.7271, Perplexity: 15.2888, time_taken_in_seconds: 14
Epoch [1/1], Step [7819/13804], Loss: 2.4757, Perplexity: 11.8905, time_taken_in_seconds: 15
Epoch [1/1], Step [7820/13804], Loss: 2.5380, Perplexity: 12.6547, time_taken_in_seconds: 16
Epoch [1/1], Step [7821/13804], Loss: 2.7757, Perplexity: 16.0504, time_taken_in_seconds: 17
Epoch [1/1], Step [7822/13804], Loss: 2.7421, Perplexity: 15.5201, time_taken_in_seconds: 18
Epoch [1/1], Step [7823/13804], Loss: 3.1464, Perplexity: 23.2530, time_taken_in_seconds: 19
Epoch [1/1], Step [7824/13804], Loss: 2.5227, Perplexity: 12.4625, time_taken_in_seconds: 19
Epoch [1/1], Step [7825/13804], Loss: 2.7839, Perplexity: 16.1822, time_taken_in_seconds: 20
Epoch [1/1], Step [7826/13804], Loss: 2.8118, Perplexity: 16.6400, time_taken_in_seconds: 21
Epoch [1/1], Step [7827/13804], Loss: 3.1106, Perplexity: 22.4342, time_taken_in_seconds: 22
Epoch [1/1], Step [7828/13804], Loss: 2.8309, Perplexity: 16.9607, time_taken_in_seconds: 23
Epoch [1/1], Step [7829/13804], Loss: 2.8770, Perplexity: 17.7606, time_taken_in_seconds: 24
Epoch [1/1], Step [7830/13804], Loss: 2.5390, Perplexity: 12.6675, time_taken_in_seconds: 24
Epoch [1/1], Step [7831/13804], Loss: 2.7506, Perplexity: 15.6516, time_taken_in_seconds: 25
Epoch [1/1], Step [7832/13804], Loss: 2.6431, Perplexity: 14.0569, time_taken_in_seconds: 26
Epoch [1/1], Step [7833/13804], Loss: 3.3075, Perplexity: 27.3167, time_taken_in_seconds: 27
Epoch [1/1], Step [7834/13804], Loss: 2.6432, Perplexity: 14.0579, time_taken_in_seconds: 28
Epoch [1/1], Step [7835/13804], Loss: 2.5577, Perplexity: 12.9062, time_taken_in_seconds: 28
Epoch [1/1], Step [7836/13804], Loss: 2.6153, Perplexity: 13.6707, time_taken_in_seconds: 29
Epoch [1/1], Step [7837/13804], Loss: 2.5583, Perplexity: 12.9137, time_taken_in_seconds: 30
Epoch [1/1], Step [7838/13804], Loss: 2.1794, Perplexity: 8.8413, time_taken_in_seconds: 31
Epoch [1/1], Step [7839/13804], Loss: 2.7178, Perplexity: 15.1474, time_taken_in_seconds: 32
Epoch [1/1], Step [7840/13804], Loss: 2.9631, Perplexity: 19.3588, time_taken_in_seconds: 33
Epoch [1/1], Step [7841/13804], Loss: 2.9731, Perplexity: 19.5530, time_taken_in_seconds: 33
Epoch [1/1], Step [7842/13804], Loss: 2.6697, Perplexity: 14.4350, time_taken_in_seconds: 34
Epoch [1/1], Step [7843/13804], Loss: 2.7003, Perplexity: 14.8845, time_taken_in_seconds: 35
Epoch [1/1], Step [7844/13804], Loss: 2.3391, Perplexity: 10.3716, time_taken_in_seconds: 36
Epoch [1/1], Step [7845/13804], Loss: 2.3578, Perplexity: 10.5681, time_taken_in_seconds: 37
Epoch [1/1], Step [7846/13804], Loss: 3.5256, Perplexity: 33.9736, time_taken_in_seconds: 38
Epoch [1/1], Step [7847/13804], Loss: 2.3961, Perplexity: 10.9805, time_taken_in_seconds: 39
Epoch [1/1], Step [7848/13804], Loss: 2.4438, Perplexity: 11.5171, time_taken_in_seconds: 39
Epoch [1/1], Step [7849/13804], Loss: 3.0769, Perplexity: 21.6919, time_taken_in_seconds: 40
Epoch [1/1], Step [7850/13804], Loss: 2.9324, Perplexity: 18.7733, time_taken_in_seconds: 41
Epoch [1/1], Step [7851/13804], Loss: 2.6533, Perplexity: 14.2007, time_taken_in_seconds: 42
Epoch [1/1], Step [7852/13804], Loss: 2.8910, Perplexity: 18.0107, time_taken_in_seconds: 43
Epoch [1/1], Step [7853/13804], Loss: 2.7024, Perplexity: 14.9150, time_taken_in_seconds: 44
Epoch [1/1], Step [7854/13804], Loss: 2.6785, Perplexity: 14.5630, time_taken_in_seconds: 45
Epoch [1/1], Step [7855/13804], Loss: 2.2107, Perplexity: 9.1221, time_taken_in_seconds: 45
Epoch [1/1], Step [7856/13804], Loss: 2.9236, Perplexity: 18.6084, time_taken_in_seconds: 46
Epoch [1/1], Step [7857/13804], Loss: 2.6332, Perplexity: 13.9179, time_taken_in_seconds: 47
Epoch [1/1], Step [7858/13804], Loss: 2.4375, Perplexity: 11.4448, time_taken_in_seconds: 48
Epoch [1/1], Step [7859/13804], Loss: 2.9267, Perplexity: 18.6656, time_taken_in_seconds: 49
Epoch [1/1], Step [7860/13804], Loss: 2.5479, Perplexity: 12.7803, time_taken_in_seconds: 49
Epoch [1/1], Step [7861/13804], Loss: 2.4875, Perplexity: 12.0310, time_taken_in_seconds: 50
Epoch [1/1], Step [7862/13804], Loss: 2.6282, Perplexity: 13.8487, time_taken_in_seconds: 51
Epoch [1/1], Step [7863/13804], Loss: 2.6649, Perplexity: 14.3669, time_taken_in_seconds: 52
Epoch [1/1], Step [7864/13804], Loss: 2.9009, Perplexity: 18.1906, time_taken_in_seconds: 53
Epoch [1/1], Step [7865/13804], Loss: 2.3074, Perplexity: 10.0482, time_taken_in_seconds: 54
Epoch [1/1], Step [7866/13804], Loss: 2.5809, Perplexity: 13.2089, time_taken_in_seconds: 54
Epoch [1/1], Step [7867/13804], Loss: 2.2154, Perplexity: 9.1648, time_taken_in_seconds: 55
Epoch [1/1], Step [7868/13804], Loss: 2.6786, Perplexity: 14.5647, time_taken_in_seconds: 56
Epoch [1/1], Step [7869/13804], Loss: 3.2876, Perplexity: 26.7796, time_taken_in_seconds: 57
Epoch [1/1], Step [7870/13804], Loss: 2.3707, Perplexity: 10.7054, time_taken_in_seconds: 58
Epoch [1/1], Step [7871/13804], Loss: 2.3093, Perplexity: 10.0677, time_taken_in_seconds: 58
Epoch [1/1], Step [7872/13804], Loss: 2.7191, Perplexity: 15.1661, time_taken_in_seconds: 59
Epoch [1/1], Step [7873/13804], Loss: 2.4546, Perplexity: 11.6415, time_taken_in_seconds: 60
Epoch [1/1], Step [7874/13804], Loss: 2.8656, Perplexity: 17.5589, time_taken_in_seconds: 61
Epoch [1/1], Step [7875/13804], Loss: 2.6224, Perplexity: 13.7684, time_taken_in_seconds: 62
Epoch [1/1], Step [7876/13804], Loss: 2.6401, Perplexity: 14.0152, time_taken_in_seconds: 63
Epoch [1/1], Step [7877/13804], Loss: 2.3589, Perplexity: 10.5797, time_taken_in_seconds: 63
Epoch [1/1], Step [7878/13804], Loss: 2.4366, Perplexity: 11.4336, time_taken_in_seconds: 64
Epoch [1/1], Step [7879/13804], Loss: 2.7605, Perplexity: 15.8083, time_taken_in_seconds: 65
Epoch [1/1], Step [7880/13804], Loss: 3.2008, Perplexity: 24.5512, time_taken_in_seconds: 66
Epoch [1/1], Step [7881/13804], Loss: 2.7381, Perplexity: 15.4576, time_taken_in_seconds: 67
Epoch [1/1], Step [7882/13804], Loss: 2.8384, Perplexity: 17.0885, time_taken_in_seconds: 68
Epoch [1/1], Step [7883/13804], Loss: 2.5828, Perplexity: 13.2343, time_taken_in_seconds: 68
Epoch [1/1], Step [7884/13804], Loss: 2.5619, Perplexity: 12.9604, time_taken_in_seconds: 69
Epoch [1/1], Step [7885/13804], Loss: 2.5662, Perplexity: 13.0163, time_taken_in_seconds: 70
Epoch [1/1], Step [7886/13804], Loss: 2.4865, Perplexity: 12.0197, time_taken_in_seconds: 71
Epoch [1/1], Step [7887/13804], Loss: 2.2755, Perplexity: 9.7328, time_taken_in_seconds: 72
Epoch [1/1], Step [7888/13804], Loss: 2.8840, Perplexity: 17.8851, time_taken_in_seconds: 72
Epoch [1/1], Step [7889/13804], Loss: 2.7588, Perplexity: 15.7808, time_taken_in_seconds: 73
Epoch [1/1], Step [7890/13804], Loss: 2.7615, Perplexity: 15.8228, time_taken_in_seconds: 74
Epoch [1/1], Step [7891/13804], Loss: 2.6923, Perplexity: 14.7660, time_taken_in_seconds: 75
Epoch [1/1], Step [7892/13804], Loss: 2.1899, Perplexity: 8.9340, time_taken_in_seconds: 76
Epoch [1/1], Step [7893/13804], Loss: 2.7751, Perplexity: 16.0399, time_taken_in_seconds: 76
Epoch [1/1], Step [7894/13804], Loss: 2.2734, Perplexity: 9.7127, time_taken_in_seconds: 77
Epoch [1/1], Step [7895/13804], Loss: 2.2756, Perplexity: 9.7336, time_taken_in_seconds: 78
Epoch [1/1], Step [7896/13804], Loss: 2.8613, Perplexity: 17.4834, time_taken_in_seconds: 79
Epoch [1/1], Step [7897/13804], Loss: 2.5996, Perplexity: 13.4586, time_taken_in_seconds: 80
Epoch [1/1], Step [7898/13804], Loss: 2.6328, Perplexity: 13.9126, time_taken_in_seconds: 81
Epoch [1/1], Step [7899/13804], Loss: 2.9656, Perplexity: 19.4059, time_taken_in_seconds: 81
Epoch [1/1], Step [7900/13804], Loss: 2.5425, Perplexity: 12.7116, time_taken_in_seconds: 82
Epoch [1/1], Step [7901/13804], Loss: 2.7541, Perplexity: 15.7069, time_taken_in_seconds: 0
Epoch [1/1], Step [7902/13804], Loss: 3.0061, Perplexity: 20.2092, time_taken_in_seconds: 1
Epoch [1/1], Step [7903/13804], Loss: 2.4975, Perplexity: 12.1523, time_taken_in_seconds: 2
Epoch [1/1], Step [7904/13804], Loss: 2.6948, Perplexity: 14.8025, time_taken_in_seconds: 3
Epoch [1/1], Step [7905/13804], Loss: 2.6203, Perplexity: 13.7401, time_taken_in_seconds: 4
Epoch [1/1], Step [7906/13804], Loss: 2.6627, Perplexity: 14.3355, time_taken_in_seconds: 4
Epoch [1/1], Step [7907/13804], Loss: 2.7068, Perplexity: 14.9817, time_taken_in_seconds: 5
Epoch [1/1], Step [7908/13804], Loss: 2.3618, Perplexity: 10.6098, time_taken_in_seconds: 6
Epoch [1/1], Step [7909/13804], Loss: 2.3568, Perplexity: 10.5575, time_taken_in_seconds: 7
Epoch [1/1], Step [7910/13804], Loss: 2.7146, Perplexity: 15.0985, time_taken_in_seconds: 8
Epoch [1/1], Step [7911/13804], Loss: 2.4401, Perplexity: 11.4741, time_taken_in_seconds: 9
Epoch [1/1], Step [7912/13804], Loss: 2.4844, Perplexity: 11.9939, time_taken_in_seconds: 9
Epoch [1/1], Step [7913/13804], Loss: 2.3669, Perplexity: 10.6644, time_taken_in_seconds: 10
Epoch [1/1], Step [7914/13804], Loss: 2.7197, Perplexity: 15.1754, time_taken_in_seconds: 11
Epoch [1/1], Step [7915/13804], Loss: 2.5222, Perplexity: 12.4554, time_taken_in_seconds: 12
Epoch [1/1], Step [7916/13804], Loss: 2.1828, Perplexity: 8.8711, time_taken_in_seconds: 13
Epoch [1/1], Step [7917/13804], Loss: 2.4526, Perplexity: 11.6182, time_taken_in_seconds: 13
Epoch [1/1], Step [7918/13804], Loss: 2.7529, Perplexity: 15.6880, time_taken_in_seconds: 14
Epoch [1/1], Step [7919/13804], Loss: 2.4587, Perplexity: 11.6900, time_taken_in_seconds: 15
Epoch [1/1], Step [7920/13804], Loss: 2.4764, Perplexity: 11.8983, time_taken_in_seconds: 16
Epoch [1/1], Step [7921/13804], Loss: 2.5765, Perplexity: 13.1517, time_taken_in_seconds: 17
Epoch [1/1], Step [7922/13804], Loss: 2.3796, Perplexity: 10.8009, time_taken_in_seconds: 18
Epoch [1/1], Step [7923/13804], Loss: 2.7384, Perplexity: 15.4626, time_taken_in_seconds: 18
Epoch [1/1], Step [7924/13804], Loss: 2.5423, Perplexity: 12.7085, time_taken_in_seconds: 19
Epoch [1/1], Step [7925/13804], Loss: 2.5451, Perplexity: 12.7450, time_taken_in_seconds: 20
Epoch [1/1], Step [7926/13804], Loss: 2.5202, Perplexity: 12.4316, time_taken_in_seconds: 21
Epoch [1/1], Step [7927/13804], Loss: 2.7307, Perplexity: 15.3432, time_taken_in_seconds: 22
Epoch [1/1], Step [7928/13804], Loss: 2.5966, Perplexity: 13.4178, time_taken_in_seconds: 23
Epoch [1/1], Step [7929/13804], Loss: 2.2907, Perplexity: 9.8814, time_taken_in_seconds: 24
Epoch [1/1], Step [7930/13804], Loss: 2.5960, Perplexity: 13.4097, time_taken_in_seconds: 24
Epoch [1/1], Step [7931/13804], Loss: 2.5481, Perplexity: 12.7832, time_taken_in_seconds: 25
Epoch [1/1], Step [7932/13804], Loss: 3.0365, Perplexity: 20.8318, time_taken_in_seconds: 26
Epoch [1/1], Step [7933/13804], Loss: 2.6121, Perplexity: 13.6283, time_taken_in_seconds: 27
Epoch [1/1], Step [7934/13804], Loss: 2.8757, Perplexity: 17.7384, time_taken_in_seconds: 28
Epoch [1/1], Step [7935/13804], Loss: 2.8346, Perplexity: 17.0232, time_taken_in_seconds: 28
Epoch [1/1], Step [7936/13804], Loss: 2.7441, Perplexity: 15.5505, time_taken_in_seconds: 29
Epoch [1/1], Step [7937/13804], Loss: 2.7866, Perplexity: 16.2257, time_taken_in_seconds: 30
Epoch [1/1], Step [7938/13804], Loss: 2.6021, Perplexity: 13.4926, time_taken_in_seconds: 31
Epoch [1/1], Step [7939/13804], Loss: 2.8220, Perplexity: 16.8109, time_taken_in_seconds: 32
Epoch [1/1], Step [7940/13804], Loss: 2.4171, Perplexity: 11.2128, time_taken_in_seconds: 33
Epoch [1/1], Step [7941/13804], Loss: 3.1356, Perplexity: 23.0033, time_taken_in_seconds: 33
Epoch [1/1], Step [7942/13804], Loss: 2.4341, Perplexity: 11.4058, time_taken_in_seconds: 34
Epoch [1/1], Step [7943/13804], Loss: 2.5270, Perplexity: 12.5164, time_taken_in_seconds: 35
Epoch [1/1], Step [7944/13804], Loss: 2.5293, Perplexity: 12.5446, time_taken_in_seconds: 36
Epoch [1/1], Step [7945/13804], Loss: 2.5301, Perplexity: 12.5547, time_taken_in_seconds: 37
Epoch [1/1], Step [7946/13804], Loss: 2.2886, Perplexity: 9.8607, time_taken_in_seconds: 38
Epoch [1/1], Step [7947/13804], Loss: 2.3833, Perplexity: 10.8407, time_taken_in_seconds: 38
Epoch [1/1], Step [7948/13804], Loss: 2.6361, Perplexity: 13.9585, time_taken_in_seconds: 39
Epoch [1/1], Step [7949/13804], Loss: 2.9256, Perplexity: 18.6457, time_taken_in_seconds: 40
Epoch [1/1], Step [7950/13804], Loss: 2.4174, Perplexity: 11.2167, time_taken_in_seconds: 41
Epoch [1/1], Step [7951/13804], Loss: 3.0042, Perplexity: 20.1699, time_taken_in_seconds: 42
Epoch [1/1], Step [7952/13804], Loss: 2.6401, Perplexity: 14.0143, time_taken_in_seconds: 42
Epoch [1/1], Step [7953/13804], Loss: 2.8917, Perplexity: 18.0240, time_taken_in_seconds: 43
Epoch [1/1], Step [7954/13804], Loss: 2.5363, Perplexity: 12.6327, time_taken_in_seconds: 44
Epoch [1/1], Step [7955/13804], Loss: 2.5991, Perplexity: 13.4512, time_taken_in_seconds: 45
Epoch [1/1], Step [7956/13804], Loss: 2.4167, Perplexity: 11.2087, time_taken_in_seconds: 46
Epoch [1/1], Step [7957/13804], Loss: 2.4837, Perplexity: 11.9852, time_taken_in_seconds: 47
Epoch [1/1], Step [7958/13804], Loss: 3.0125, Perplexity: 20.3391, time_taken_in_seconds: 47
Epoch [1/1], Step [7959/13804], Loss: 2.4871, Perplexity: 12.0267, time_taken_in_seconds: 48
Epoch [1/1], Step [7960/13804], Loss: 2.8349, Perplexity: 17.0279, time_taken_in_seconds: 49
Epoch [1/1], Step [7961/13804], Loss: 2.3570, Perplexity: 10.5594, time_taken_in_seconds: 50
Epoch [1/1], Step [7962/13804], Loss: 2.9162, Perplexity: 18.4707, time_taken_in_seconds: 51
Epoch [1/1], Step [7963/13804], Loss: 3.2615, Perplexity: 26.0883, time_taken_in_seconds: 52
Epoch [1/1], Step [7964/13804], Loss: 2.4630, Perplexity: 11.7395, time_taken_in_seconds: 52
Epoch [1/1], Step [7965/13804], Loss: 2.7078, Perplexity: 14.9958, time_taken_in_seconds: 53
Epoch [1/1], Step [7966/13804], Loss: 2.5990, Perplexity: 13.4505, time_taken_in_seconds: 54
Epoch [1/1], Step [7967/13804], Loss: 2.5639, Perplexity: 12.9860, time_taken_in_seconds: 55
Epoch [1/1], Step [7968/13804], Loss: 2.6199, Perplexity: 13.7349, time_taken_in_seconds: 56
Epoch [1/1], Step [7969/13804], Loss: 2.4408, Perplexity: 11.4817, time_taken_in_seconds: 57
Epoch [1/1], Step [7970/13804], Loss: 2.4633, Perplexity: 11.7438, time_taken_in_seconds: 57
Epoch [1/1], Step [7971/13804], Loss: 2.6156, Perplexity: 13.6752, time_taken_in_seconds: 58
Epoch [1/1], Step [7972/13804], Loss: 2.6482, Perplexity: 14.1279, time_taken_in_seconds: 59
Epoch [1/1], Step [7973/13804], Loss: 2.6089, Perplexity: 13.5839, time_taken_in_seconds: 60
Epoch [1/1], Step [7974/13804], Loss: 2.5229, Perplexity: 12.4643, time_taken_in_seconds: 61
Epoch [1/1], Step [7975/13804], Loss: 2.4830, Perplexity: 11.9772, time_taken_in_seconds: 61
Epoch [1/1], Step [7976/13804], Loss: 2.4453, Perplexity: 11.5342, time_taken_in_seconds: 62
Epoch [1/1], Step [7977/13804], Loss: 2.5751, Perplexity: 13.1330, time_taken_in_seconds: 63
Epoch [1/1], Step [7978/13804], Loss: 2.4669, Perplexity: 11.7859, time_taken_in_seconds: 64
Epoch [1/1], Step [7979/13804], Loss: 2.4239, Perplexity: 11.2901, time_taken_in_seconds: 65
Epoch [1/1], Step [7980/13804], Loss: 2.8369, Perplexity: 17.0631, time_taken_in_seconds: 66
Epoch [1/1], Step [7981/13804], Loss: 2.7904, Perplexity: 16.2877, time_taken_in_seconds: 66
Epoch [1/1], Step [7982/13804], Loss: 2.3275, Perplexity: 10.2524, time_taken_in_seconds: 67
Epoch [1/1], Step [7983/13804], Loss: 2.9344, Perplexity: 18.8093, time_taken_in_seconds: 68
Epoch [1/1], Step [7984/13804], Loss: 2.4099, Perplexity: 11.1329, time_taken_in_seconds: 69
Epoch [1/1], Step [7985/13804], Loss: 2.7208, Perplexity: 15.1922, time_taken_in_seconds: 70
Epoch [1/1], Step [7986/13804], Loss: 2.6699, Perplexity: 14.4386, time_taken_in_seconds: 71
Epoch [1/1], Step [7987/13804], Loss: 3.0908, Perplexity: 21.9954, time_taken_in_seconds: 71
Epoch [1/1], Step [7988/13804], Loss: 2.3389, Perplexity: 10.3696, time_taken_in_seconds: 72
Epoch [1/1], Step [7989/13804], Loss: 2.4559, Perplexity: 11.6565, time_taken_in_seconds: 73
Epoch [1/1], Step [7990/13804], Loss: 2.4191, Perplexity: 11.2353, time_taken_in_seconds: 74
Epoch [1/1], Step [7991/13804], Loss: 3.0824, Perplexity: 21.8114, time_taken_in_seconds: 75
Epoch [1/1], Step [7992/13804], Loss: 2.6008, Perplexity: 13.4739, time_taken_in_seconds: 75
Epoch [1/1], Step [7993/13804], Loss: 2.6302, Perplexity: 13.8768, time_taken_in_seconds: 76
Epoch [1/1], Step [7994/13804], Loss: 2.7602, Perplexity: 15.8026, time_taken_in_seconds: 77
Epoch [1/1], Step [7995/13804], Loss: 2.3844, Perplexity: 10.8529, time_taken_in_seconds: 78
Epoch [1/1], Step [7996/13804], Loss: 2.7838, Perplexity: 16.1800, time_taken_in_seconds: 79
Epoch [1/1], Step [7997/13804], Loss: 3.2071, Perplexity: 24.7070, time_taken_in_seconds: 80
Epoch [1/1], Step [7998/13804], Loss: 2.5853, Perplexity: 13.2677, time_taken_in_seconds: 81
Epoch [1/1], Step [7999/13804], Loss: 2.6401, Perplexity: 14.0150, time_taken_in_seconds: 81
Epoch [1/1], Step [8000/13804], Loss: 2.9103, Perplexity: 18.3626, time_taken_in_seconds: 82
Epoch [1/1], Step [8001/13804], Loss: 2.8179, Perplexity: 16.7412, time_taken_in_seconds: 0
Epoch [1/1], Step [8002/13804], Loss: 2.6085, Perplexity: 13.5787, time_taken_in_seconds: 1
Epoch [1/1], Step [8003/13804], Loss: 2.8294, Perplexity: 16.9358, time_taken_in_seconds: 2
Epoch [1/1], Step [8004/13804], Loss: 3.2387, Perplexity: 25.5014, time_taken_in_seconds: 3
Epoch [1/1], Step [8005/13804], Loss: 2.3843, Perplexity: 10.8517, time_taken_in_seconds: 4
Epoch [1/1], Step [8006/13804], Loss: 2.7713, Perplexity: 15.9801, time_taken_in_seconds: 4
Epoch [1/1], Step [8007/13804], Loss: 2.9621, Perplexity: 19.3387, time_taken_in_seconds: 5
Epoch [1/1], Step [8008/13804], Loss: 2.6325, Perplexity: 13.9082, time_taken_in_seconds: 6
Epoch [1/1], Step [8009/13804], Loss: 2.5169, Perplexity: 12.3896, time_taken_in_seconds: 7
Epoch [1/1], Step [8010/13804], Loss: 2.5048, Perplexity: 12.2416, time_taken_in_seconds: 8
Epoch [1/1], Step [8011/13804], Loss: 2.3864, Perplexity: 10.8741, time_taken_in_seconds: 9
Epoch [1/1], Step [8012/13804], Loss: 2.5842, Perplexity: 13.2532, time_taken_in_seconds: 9
Epoch [1/1], Step [8013/13804], Loss: 2.6282, Perplexity: 13.8485, time_taken_in_seconds: 10
Epoch [1/1], Step [8014/13804], Loss: 2.4803, Perplexity: 11.9454, time_taken_in_seconds: 11
Epoch [1/1], Step [8015/13804], Loss: 2.4303, Perplexity: 11.3626, time_taken_in_seconds: 12
Epoch [1/1], Step [8016/13804], Loss: 2.6838, Perplexity: 14.6412, time_taken_in_seconds: 13
Epoch [1/1], Step [8017/13804], Loss: 2.2780, Perplexity: 9.7574, time_taken_in_seconds: 13
Epoch [1/1], Step [8018/13804], Loss: 2.8139, Perplexity: 16.6755, time_taken_in_seconds: 14
Epoch [1/1], Step [8019/13804], Loss: 2.4752, Perplexity: 11.8840, time_taken_in_seconds: 15
Epoch [1/1], Step [8020/13804], Loss: 2.7368, Perplexity: 15.4371, time_taken_in_seconds: 16
Epoch [1/1], Step [8021/13804], Loss: 2.5234, Perplexity: 12.4712, time_taken_in_seconds: 17
Epoch [1/1], Step [8022/13804], Loss: 2.5123, Perplexity: 12.3336, time_taken_in_seconds: 17
Epoch [1/1], Step [8023/13804], Loss: 2.6510, Perplexity: 14.1677, time_taken_in_seconds: 18
Epoch [1/1], Step [8024/13804], Loss: 2.6273, Perplexity: 13.8364, time_taken_in_seconds: 19
Epoch [1/1], Step [8025/13804], Loss: 2.9133, Perplexity: 18.4169, time_taken_in_seconds: 20
Epoch [1/1], Step [8026/13804], Loss: 2.5079, Perplexity: 12.2796, time_taken_in_seconds: 21
Epoch [1/1], Step [8027/13804], Loss: 2.4472, Perplexity: 11.5558, time_taken_in_seconds: 21
Epoch [1/1], Step [8028/13804], Loss: 2.4258, Perplexity: 11.3108, time_taken_in_seconds: 22
Epoch [1/1], Step [8029/13804], Loss: 3.1496, Perplexity: 23.3269, time_taken_in_seconds: 23
Epoch [1/1], Step [8030/13804], Loss: 2.7271, Perplexity: 15.2878, time_taken_in_seconds: 24
Epoch [1/1], Step [8031/13804], Loss: 2.5110, Perplexity: 12.3168, time_taken_in_seconds: 25
Epoch [1/1], Step [8032/13804], Loss: 4.4558, Perplexity: 86.1246, time_taken_in_seconds: 25
Epoch [1/1], Step [8033/13804], Loss: 2.4729, Perplexity: 11.8573, time_taken_in_seconds: 26
Epoch [1/1], Step [8034/13804], Loss: 2.8977, Perplexity: 18.1322, time_taken_in_seconds: 27
Epoch [1/1], Step [8035/13804], Loss: 2.2407, Perplexity: 9.3997, time_taken_in_seconds: 28
Epoch [1/1], Step [8036/13804], Loss: 2.6806, Perplexity: 14.5932, time_taken_in_seconds: 29
Epoch [1/1], Step [8037/13804], Loss: 2.6675, Perplexity: 14.4041, time_taken_in_seconds: 30
Epoch [1/1], Step [8038/13804], Loss: 2.6351, Perplexity: 13.9447, time_taken_in_seconds: 30
Epoch [1/1], Step [8039/13804], Loss: 2.3761, Perplexity: 10.7627, time_taken_in_seconds: 31
Epoch [1/1], Step [8040/13804], Loss: 2.6406, Perplexity: 14.0222, time_taken_in_seconds: 32
Epoch [1/1], Step [8041/13804], Loss: 2.7438, Perplexity: 15.5458, time_taken_in_seconds: 33
Epoch [1/1], Step [8042/13804], Loss: 2.6165, Perplexity: 13.6873, time_taken_in_seconds: 34
Epoch [1/1], Step [8043/13804], Loss: 2.5579, Perplexity: 12.9086, time_taken_in_seconds: 35
Epoch [1/1], Step [8044/13804], Loss: 3.2393, Perplexity: 25.5147, time_taken_in_seconds: 35
Epoch [1/1], Step [8045/13804], Loss: 2.7055, Perplexity: 14.9625, time_taken_in_seconds: 36
Epoch [1/1], Step [8046/13804], Loss: 3.0963, Perplexity: 22.1168, time_taken_in_seconds: 37
Epoch [1/1], Step [8047/13804], Loss: 2.5758, Perplexity: 13.1416, time_taken_in_seconds: 38
Epoch [1/1], Step [8048/13804], Loss: 2.7238, Perplexity: 15.2388, time_taken_in_seconds: 39
Epoch [1/1], Step [8049/13804], Loss: 2.3879, Perplexity: 10.8906, time_taken_in_seconds: 39
Epoch [1/1], Step [8050/13804], Loss: 2.6915, Perplexity: 14.7543, time_taken_in_seconds: 40
Epoch [1/1], Step [8051/13804], Loss: 2.2213, Perplexity: 9.2189, time_taken_in_seconds: 41
Epoch [1/1], Step [8052/13804], Loss: 2.6654, Perplexity: 14.3744, time_taken_in_seconds: 42
Epoch [1/1], Step [8053/13804], Loss: 2.8787, Perplexity: 17.7908, time_taken_in_seconds: 43
Epoch [1/1], Step [8054/13804], Loss: 2.9008, Perplexity: 18.1879, time_taken_in_seconds: 43
Epoch [1/1], Step [8055/13804], Loss: 2.5189, Perplexity: 12.4151, time_taken_in_seconds: 44
Epoch [1/1], Step [8056/13804], Loss: 2.2655, Perplexity: 9.6361, time_taken_in_seconds: 45
Epoch [1/1], Step [8057/13804], Loss: 2.5949, Perplexity: 13.3958, time_taken_in_seconds: 46
Epoch [1/1], Step [8058/13804], Loss: 2.7110, Perplexity: 15.0444, time_taken_in_seconds: 47
Epoch [1/1], Step [8059/13804], Loss: 2.6207, Perplexity: 13.7460, time_taken_in_seconds: 47
Epoch [1/1], Step [8060/13804], Loss: 2.4357, Perplexity: 11.4242, time_taken_in_seconds: 48
Epoch [1/1], Step [8061/13804], Loss: 2.5642, Perplexity: 12.9904, time_taken_in_seconds: 49
Epoch [1/1], Step [8062/13804], Loss: 2.5501, Perplexity: 12.8081, time_taken_in_seconds: 50
Epoch [1/1], Step [8063/13804], Loss: 2.5235, Perplexity: 12.4721, time_taken_in_seconds: 51
Epoch [1/1], Step [8064/13804], Loss: 2.5429, Perplexity: 12.7168, time_taken_in_seconds: 52
Epoch [1/1], Step [8065/13804], Loss: 3.3719, Perplexity: 29.1324, time_taken_in_seconds: 52
Epoch [1/1], Step [8066/13804], Loss: 2.4498, Perplexity: 11.5859, time_taken_in_seconds: 53
Epoch [1/1], Step [8067/13804], Loss: 2.9863, Perplexity: 19.8131, time_taken_in_seconds: 54
Epoch [1/1], Step [8068/13804], Loss: 2.7120, Perplexity: 15.0589, time_taken_in_seconds: 55
Epoch [1/1], Step [8069/13804], Loss: 2.5452, Perplexity: 12.7461, time_taken_in_seconds: 56
Epoch [1/1], Step [8070/13804], Loss: 2.6005, Perplexity: 13.4699, time_taken_in_seconds: 56
Epoch [1/1], Step [8071/13804], Loss: 2.6939, Perplexity: 14.7895, time_taken_in_seconds: 58
Epoch [1/1], Step [8072/13804], Loss: 2.7228, Perplexity: 15.2230, time_taken_in_seconds: 58
Epoch [1/1], Step [8073/13804], Loss: 2.7956, Perplexity: 16.3731, time_taken_in_seconds: 59
Epoch [1/1], Step [8074/13804], Loss: 2.3984, Perplexity: 11.0053, time_taken_in_seconds: 60
Epoch [1/1], Step [8075/13804], Loss: 2.4189, Perplexity: 11.2338, time_taken_in_seconds: 61
Epoch [1/1], Step [8076/13804], Loss: 2.7357, Perplexity: 15.4207, time_taken_in_seconds: 62
Epoch [1/1], Step [8077/13804], Loss: 2.8553, Perplexity: 17.3795, time_taken_in_seconds: 62
Epoch [1/1], Step [8078/13804], Loss: 2.4662, Perplexity: 11.7773, time_taken_in_seconds: 63
Epoch [1/1], Step [8079/13804], Loss: 2.5599, Perplexity: 12.9351, time_taken_in_seconds: 64
Epoch [1/1], Step [8080/13804], Loss: 3.1797, Perplexity: 24.0396, time_taken_in_seconds: 65
Epoch [1/1], Step [8081/13804], Loss: 3.1928, Perplexity: 24.3572, time_taken_in_seconds: 66
Epoch [1/1], Step [8082/13804], Loss: 2.5651, Perplexity: 13.0026, time_taken_in_seconds: 67
Epoch [1/1], Step [8083/13804], Loss: 2.7902, Perplexity: 16.2838, time_taken_in_seconds: 67
Epoch [1/1], Step [8084/13804], Loss: 2.6510, Perplexity: 14.1682, time_taken_in_seconds: 68
Epoch [1/1], Step [8085/13804], Loss: 2.8865, Perplexity: 17.9307, time_taken_in_seconds: 69
Epoch [1/1], Step [8086/13804], Loss: 2.4746, Perplexity: 11.8772, time_taken_in_seconds: 70
Epoch [1/1], Step [8087/13804], Loss: 2.6070, Perplexity: 13.5578, time_taken_in_seconds: 71
Epoch [1/1], Step [8088/13804], Loss: 3.2945, Perplexity: 26.9651, time_taken_in_seconds: 72
Epoch [1/1], Step [8089/13804], Loss: 2.3633, Perplexity: 10.6258, time_taken_in_seconds: 72
Epoch [1/1], Step [8090/13804], Loss: 2.5198, Perplexity: 12.4266, time_taken_in_seconds: 73
Epoch [1/1], Step [8091/13804], Loss: 2.4982, Perplexity: 12.1610, time_taken_in_seconds: 74
Epoch [1/1], Step [8092/13804], Loss: 2.6634, Perplexity: 14.3455, time_taken_in_seconds: 75
Epoch [1/1], Step [8093/13804], Loss: 2.4801, Perplexity: 11.9427, time_taken_in_seconds: 76
Epoch [1/1], Step [8094/13804], Loss: 2.7669, Perplexity: 15.9092, time_taken_in_seconds: 76
Epoch [1/1], Step [8095/13804], Loss: 2.5021, Perplexity: 12.2082, time_taken_in_seconds: 77
Epoch [1/1], Step [8096/13804], Loss: 2.7111, Perplexity: 15.0457, time_taken_in_seconds: 78
Epoch [1/1], Step [8097/13804], Loss: 2.4413, Perplexity: 11.4876, time_taken_in_seconds: 79
Epoch [1/1], Step [8098/13804], Loss: 2.4679, Perplexity: 11.7979, time_taken_in_seconds: 80
Epoch [1/1], Step [8099/13804], Loss: 2.7730, Perplexity: 16.0073, time_taken_in_seconds: 80
Epoch [1/1], Step [8100/13804], Loss: 2.4684, Perplexity: 11.8040, time_taken_in_seconds: 81
Epoch [1/1], Step [8101/13804], Loss: 2.8482, Perplexity: 17.2559, time_taken_in_seconds: 0
Epoch [1/1], Step [8102/13804], Loss: 2.5627, Perplexity: 12.9706, time_taken_in_seconds: 1
Epoch [1/1], Step [8103/13804], Loss: 2.7614, Perplexity: 15.8219, time_taken_in_seconds: 2
Epoch [1/1], Step [8104/13804], Loss: 2.7116, Perplexity: 15.0536, time_taken_in_seconds: 3
Epoch [1/1], Step [8105/13804], Loss: 2.2931, Perplexity: 9.9056, time_taken_in_seconds: 4
Epoch [1/1], Step [8106/13804], Loss: 2.5465, Perplexity: 12.7627, time_taken_in_seconds: 4
Epoch [1/1], Step [8107/13804], Loss: 2.2864, Perplexity: 9.8394, time_taken_in_seconds: 5
Epoch [1/1], Step [8108/13804], Loss: 2.6969, Perplexity: 14.8335, time_taken_in_seconds: 6
Epoch [1/1], Step [8109/13804], Loss: 2.8829, Perplexity: 17.8659, time_taken_in_seconds: 7
Epoch [1/1], Step [8110/13804], Loss: 2.5315, Perplexity: 12.5727, time_taken_in_seconds: 8
Epoch [1/1], Step [8111/13804], Loss: 2.6767, Perplexity: 14.5370, time_taken_in_seconds: 9
Epoch [1/1], Step [8112/13804], Loss: 2.8376, Perplexity: 17.0749, time_taken_in_seconds: 9
Epoch [1/1], Step [8113/13804], Loss: 2.4373, Perplexity: 11.4424, time_taken_in_seconds: 10
Epoch [1/1], Step [8114/13804], Loss: 2.9662, Perplexity: 19.4174, time_taken_in_seconds: 11
Epoch [1/1], Step [8115/13804], Loss: 2.5917, Perplexity: 13.3530, time_taken_in_seconds: 12
Epoch [1/1], Step [8116/13804], Loss: 2.5657, Perplexity: 13.0098, time_taken_in_seconds: 13
Epoch [1/1], Step [8117/13804], Loss: 2.4052, Perplexity: 11.0812, time_taken_in_seconds: 13
Epoch [1/1], Step [8118/13804], Loss: 2.3471, Perplexity: 10.4551, time_taken_in_seconds: 14
Epoch [1/1], Step [8119/13804], Loss: 2.5096, Perplexity: 12.3003, time_taken_in_seconds: 15
Epoch [1/1], Step [8120/13804], Loss: 2.8730, Perplexity: 17.6908, time_taken_in_seconds: 16
Epoch [1/1], Step [8121/13804], Loss: 2.5796, Perplexity: 13.1915, time_taken_in_seconds: 17
Epoch [1/1], Step [8122/13804], Loss: 2.7253, Perplexity: 15.2611, time_taken_in_seconds: 18
Epoch [1/1], Step [8123/13804], Loss: 2.6988, Perplexity: 14.8617, time_taken_in_seconds: 18
Epoch [1/1], Step [8124/13804], Loss: 2.4682, Perplexity: 11.8009, time_taken_in_seconds: 19
Epoch [1/1], Step [8125/13804], Loss: 2.8440, Perplexity: 17.1848, time_taken_in_seconds: 20
Epoch [1/1], Step [8126/13804], Loss: 3.0125, Perplexity: 20.3383, time_taken_in_seconds: 21
Epoch [1/1], Step [8127/13804], Loss: 2.1079, Perplexity: 8.2313, time_taken_in_seconds: 22
Epoch [1/1], Step [8128/13804], Loss: 2.5167, Perplexity: 12.3874, time_taken_in_seconds: 23
Epoch [1/1], Step [8129/13804], Loss: 2.3922, Perplexity: 10.9373, time_taken_in_seconds: 23
Epoch [1/1], Step [8130/13804], Loss: 2.7335, Perplexity: 15.3865, time_taken_in_seconds: 24
Epoch [1/1], Step [8131/13804], Loss: 2.7586, Perplexity: 15.7778, time_taken_in_seconds: 25
Epoch [1/1], Step [8132/13804], Loss: 2.9772, Perplexity: 19.6326, time_taken_in_seconds: 26
Epoch [1/1], Step [8133/13804], Loss: 2.6592, Perplexity: 14.2847, time_taken_in_seconds: 27
Epoch [1/1], Step [8134/13804], Loss: 2.8827, Perplexity: 17.8623, time_taken_in_seconds: 27
Epoch [1/1], Step [8135/13804], Loss: 2.4252, Perplexity: 11.3044, time_taken_in_seconds: 28
Epoch [1/1], Step [8136/13804], Loss: 2.5212, Perplexity: 12.4431, time_taken_in_seconds: 29
Epoch [1/1], Step [8137/13804], Loss: 2.4844, Perplexity: 11.9934, time_taken_in_seconds: 30
Epoch [1/1], Step [8138/13804], Loss: 3.0878, Perplexity: 21.9282, time_taken_in_seconds: 31
Epoch [1/1], Step [8139/13804], Loss: 2.7784, Perplexity: 16.0929, time_taken_in_seconds: 32
Epoch [1/1], Step [8140/13804], Loss: 3.2634, Perplexity: 26.1395, time_taken_in_seconds: 32
Epoch [1/1], Step [8141/13804], Loss: 2.3317, Perplexity: 10.2950, time_taken_in_seconds: 33
Epoch [1/1], Step [8142/13804], Loss: 2.4134, Perplexity: 11.1722, time_taken_in_seconds: 34
Epoch [1/1], Step [8143/13804], Loss: 2.4349, Perplexity: 11.4141, time_taken_in_seconds: 35
Epoch [1/1], Step [8144/13804], Loss: 2.5348, Perplexity: 12.6144, time_taken_in_seconds: 36
Epoch [1/1], Step [8145/13804], Loss: 2.5947, Perplexity: 13.3924, time_taken_in_seconds: 37
Epoch [1/1], Step [8146/13804], Loss: 2.8361, Perplexity: 17.0491, time_taken_in_seconds: 38
Epoch [1/1], Step [8147/13804], Loss: 2.5724, Perplexity: 13.0978, time_taken_in_seconds: 38
Epoch [1/1], Step [8148/13804], Loss: 2.5232, Perplexity: 12.4690, time_taken_in_seconds: 39
Epoch [1/1], Step [8149/13804], Loss: 2.4381, Perplexity: 11.4507, time_taken_in_seconds: 40
Epoch [1/1], Step [8150/13804], Loss: 2.2281, Perplexity: 9.2820, time_taken_in_seconds: 41
Epoch [1/1], Step [8151/13804], Loss: 2.6225, Perplexity: 13.7702, time_taken_in_seconds: 42
Epoch [1/1], Step [8152/13804], Loss: 2.7100, Perplexity: 15.0300, time_taken_in_seconds: 42
Epoch [1/1], Step [8153/13804], Loss: 2.5225, Perplexity: 12.4591, time_taken_in_seconds: 43
Epoch [1/1], Step [8154/13804], Loss: 2.4648, Perplexity: 11.7608, time_taken_in_seconds: 44
Epoch [1/1], Step [8155/13804], Loss: 3.1217, Perplexity: 22.6848, time_taken_in_seconds: 45
Epoch [1/1], Step [8156/13804], Loss: 2.5738, Perplexity: 13.1157, time_taken_in_seconds: 46
Epoch [1/1], Step [8157/13804], Loss: 2.6369, Perplexity: 13.9701, time_taken_in_seconds: 47
Epoch [1/1], Step [8158/13804], Loss: 2.4576, Perplexity: 11.6767, time_taken_in_seconds: 47
Epoch [1/1], Step [8159/13804], Loss: 2.5654, Perplexity: 13.0056, time_taken_in_seconds: 48
Epoch [1/1], Step [8160/13804], Loss: 2.5500, Perplexity: 12.8068, time_taken_in_seconds: 49
Epoch [1/1], Step [8161/13804], Loss: 2.7434, Perplexity: 15.5393, time_taken_in_seconds: 50
Epoch [1/1], Step [8162/13804], Loss: 2.5776, Perplexity: 13.1659, time_taken_in_seconds: 51
Epoch [1/1], Step [8163/13804], Loss: 2.5623, Perplexity: 12.9654, time_taken_in_seconds: 52
Epoch [1/1], Step [8164/13804], Loss: 2.6653, Perplexity: 14.3724, time_taken_in_seconds: 52
Epoch [1/1], Step [8165/13804], Loss: 2.8317, Perplexity: 16.9739, time_taken_in_seconds: 53
Epoch [1/1], Step [8166/13804], Loss: 2.4869, Perplexity: 12.0236, time_taken_in_seconds: 54
Epoch [1/1], Step [8167/13804], Loss: 2.4656, Perplexity: 11.7705, time_taken_in_seconds: 55
Epoch [1/1], Step [8168/13804], Loss: 3.0606, Perplexity: 21.3414, time_taken_in_seconds: 56
Epoch [1/1], Step [8169/13804], Loss: 2.5709, Perplexity: 13.0782, time_taken_in_seconds: 56
Epoch [1/1], Step [8170/13804], Loss: 2.2608, Perplexity: 9.5912, time_taken_in_seconds: 57
Epoch [1/1], Step [8171/13804], Loss: 2.5886, Perplexity: 13.3108, time_taken_in_seconds: 58
Epoch [1/1], Step [8172/13804], Loss: 2.5597, Perplexity: 12.9324, time_taken_in_seconds: 59
Epoch [1/1], Step [8173/13804], Loss: 2.3860, Perplexity: 10.8696, time_taken_in_seconds: 60
Epoch [1/1], Step [8174/13804], Loss: 3.0888, Perplexity: 21.9505, time_taken_in_seconds: 61
Epoch [1/1], Step [8175/13804], Loss: 2.7170, Perplexity: 15.1355, time_taken_in_seconds: 61
Epoch [1/1], Step [8176/13804], Loss: 2.9882, Perplexity: 19.8498, time_taken_in_seconds: 62
Epoch [1/1], Step [8177/13804], Loss: 2.3686, Perplexity: 10.6820, time_taken_in_seconds: 63
Epoch [1/1], Step [8178/13804], Loss: 2.6284, Perplexity: 13.8515, time_taken_in_seconds: 64
Epoch [1/1], Step [8179/13804], Loss: 2.7761, Perplexity: 16.0569, time_taken_in_seconds: 65
Epoch [1/1], Step [8180/13804], Loss: 2.4861, Perplexity: 12.0144, time_taken_in_seconds: 65
Epoch [1/1], Step [8181/13804], Loss: 2.8476, Perplexity: 17.2465, time_taken_in_seconds: 66
Epoch [1/1], Step [8182/13804], Loss: 2.3566, Perplexity: 10.5547, time_taken_in_seconds: 67
Epoch [1/1], Step [8183/13804], Loss: 3.9490, Perplexity: 51.8826, time_taken_in_seconds: 68
Epoch [1/1], Step [8184/13804], Loss: 3.1189, Perplexity: 22.6205, time_taken_in_seconds: 69
Epoch [1/1], Step [8185/13804], Loss: 2.5933, Perplexity: 13.3736, time_taken_in_seconds: 70
Epoch [1/1], Step [8186/13804], Loss: 2.6189, Perplexity: 13.7205, time_taken_in_seconds: 70
Epoch [1/1], Step [8187/13804], Loss: 2.8109, Perplexity: 16.6249, time_taken_in_seconds: 71
Epoch [1/1], Step [8188/13804], Loss: 3.0060, Perplexity: 20.2065, time_taken_in_seconds: 72
Epoch [1/1], Step [8189/13804], Loss: 2.7713, Perplexity: 15.9791, time_taken_in_seconds: 73
Epoch [1/1], Step [8190/13804], Loss: 2.5007, Perplexity: 12.1916, time_taken_in_seconds: 74
Epoch [1/1], Step [8191/13804], Loss: 2.7329, Perplexity: 15.3772, time_taken_in_seconds: 74
Epoch [1/1], Step [8192/13804], Loss: 2.5936, Perplexity: 13.3785, time_taken_in_seconds: 75
Epoch [1/1], Step [8193/13804], Loss: 2.2849, Perplexity: 9.8249, time_taken_in_seconds: 76
Epoch [1/1], Step [8194/13804], Loss: 2.6587, Perplexity: 14.2784, time_taken_in_seconds: 77
Epoch [1/1], Step [8195/13804], Loss: 2.8963, Perplexity: 18.1066, time_taken_in_seconds: 78
Epoch [1/1], Step [8196/13804], Loss: 2.7649, Perplexity: 15.8778, time_taken_in_seconds: 78
Epoch [1/1], Step [8197/13804], Loss: 2.8802, Perplexity: 17.8183, time_taken_in_seconds: 79
Epoch [1/1], Step [8198/13804], Loss: 2.1618, Perplexity: 8.6864, time_taken_in_seconds: 80
Epoch [1/1], Step [8199/13804], Loss: 2.0807, Perplexity: 8.0100, time_taken_in_seconds: 81
Epoch [1/1], Step [8200/13804], Loss: 2.5898, Perplexity: 13.3267, time_taken_in_seconds: 82
Epoch [1/1], Step [8201/13804], Loss: 2.4967, Perplexity: 12.1418, time_taken_in_seconds: 0
Epoch [1/1], Step [8202/13804], Loss: 2.2742, Perplexity: 9.7203, time_taken_in_seconds: 1
Epoch [1/1], Step [8203/13804], Loss: 2.7168, Perplexity: 15.1320, time_taken_in_seconds: 2
Epoch [1/1], Step [8204/13804], Loss: 2.7287, Perplexity: 15.3125, time_taken_in_seconds: 3
Epoch [1/1], Step [8205/13804], Loss: 2.7166, Perplexity: 15.1283, time_taken_in_seconds: 4
Epoch [1/1], Step [8206/13804], Loss: 2.5963, Perplexity: 13.4140, time_taken_in_seconds: 4
Epoch [1/1], Step [8207/13804], Loss: 2.8165, Perplexity: 16.7190, time_taken_in_seconds: 5
Epoch [1/1], Step [8208/13804], Loss: 2.4846, Perplexity: 11.9969, time_taken_in_seconds: 6
Epoch [1/1], Step [8209/13804], Loss: 2.6207, Perplexity: 13.7450, time_taken_in_seconds: 7
Epoch [1/1], Step [8210/13804], Loss: 2.6131, Perplexity: 13.6416, time_taken_in_seconds: 8
Epoch [1/1], Step [8211/13804], Loss: 2.6868, Perplexity: 14.6846, time_taken_in_seconds: 8
Epoch [1/1], Step [8212/13804], Loss: 2.4263, Perplexity: 11.3166, time_taken_in_seconds: 9
Epoch [1/1], Step [8213/13804], Loss: 2.8911, Perplexity: 18.0124, time_taken_in_seconds: 10
Epoch [1/1], Step [8214/13804], Loss: 2.8439, Perplexity: 17.1825, time_taken_in_seconds: 11
Epoch [1/1], Step [8215/13804], Loss: 2.6791, Perplexity: 14.5714, time_taken_in_seconds: 12
Epoch [1/1], Step [8216/13804], Loss: 2.2925, Perplexity: 9.8994, time_taken_in_seconds: 13
Epoch [1/1], Step [8217/13804], Loss: 2.6608, Perplexity: 14.3074, time_taken_in_seconds: 13
Epoch [1/1], Step [8218/13804], Loss: 2.4979, Perplexity: 12.1572, time_taken_in_seconds: 14
Epoch [1/1], Step [8219/13804], Loss: 2.7839, Perplexity: 16.1825, time_taken_in_seconds: 15
Epoch [1/1], Step [8220/13804], Loss: 2.3914, Perplexity: 10.9291, time_taken_in_seconds: 16
Epoch [1/1], Step [8221/13804], Loss: 2.4649, Perplexity: 11.7623, time_taken_in_seconds: 17
Epoch [1/1], Step [8222/13804], Loss: 2.7410, Perplexity: 15.5019, time_taken_in_seconds: 18
Epoch [1/1], Step [8223/13804], Loss: 2.2345, Perplexity: 9.3420, time_taken_in_seconds: 18
Epoch [1/1], Step [8224/13804], Loss: 2.4418, Perplexity: 11.4940, time_taken_in_seconds: 19
Epoch [1/1], Step [8225/13804], Loss: 2.4136, Perplexity: 11.1746, time_taken_in_seconds: 20
Epoch [1/1], Step [8226/13804], Loss: 3.6974, Perplexity: 40.3442, time_taken_in_seconds: 21
Epoch [1/1], Step [8227/13804], Loss: 2.6206, Perplexity: 13.7433, time_taken_in_seconds: 22
Epoch [1/1], Step [8228/13804], Loss: 2.4920, Perplexity: 12.0858, time_taken_in_seconds: 23
Epoch [1/1], Step [8229/13804], Loss: 2.4870, Perplexity: 12.0254, time_taken_in_seconds: 23
Epoch [1/1], Step [8230/13804], Loss: 2.3579, Perplexity: 10.5683, time_taken_in_seconds: 24
Epoch [1/1], Step [8231/13804], Loss: 2.4417, Perplexity: 11.4920, time_taken_in_seconds: 25
Epoch [1/1], Step [8232/13804], Loss: 2.4677, Perplexity: 11.7954, time_taken_in_seconds: 26
Epoch [1/1], Step [8233/13804], Loss: 2.6463, Perplexity: 14.1016, time_taken_in_seconds: 27
Epoch [1/1], Step [8234/13804], Loss: 2.4048, Perplexity: 11.0757, time_taken_in_seconds: 28
Epoch [1/1], Step [8235/13804], Loss: 2.5094, Perplexity: 12.2974, time_taken_in_seconds: 28
Epoch [1/1], Step [8236/13804], Loss: 2.8866, Perplexity: 17.9330, time_taken_in_seconds: 29
Epoch [1/1], Step [8237/13804], Loss: 2.6294, Perplexity: 13.8654, time_taken_in_seconds: 30
Epoch [1/1], Step [8238/13804], Loss: 2.7201, Perplexity: 15.1823, time_taken_in_seconds: 31
Epoch [1/1], Step [8239/13804], Loss: 2.9893, Perplexity: 19.8710, time_taken_in_seconds: 32
Epoch [1/1], Step [8240/13804], Loss: 2.7375, Perplexity: 15.4480, time_taken_in_seconds: 33
Epoch [1/1], Step [8241/13804], Loss: 3.2066, Perplexity: 24.6944, time_taken_in_seconds: 33
Epoch [1/1], Step [8242/13804], Loss: 2.3580, Perplexity: 10.5696, time_taken_in_seconds: 34
Epoch [1/1], Step [8243/13804], Loss: 2.4413, Perplexity: 11.4885, time_taken_in_seconds: 35
Epoch [1/1], Step [8244/13804], Loss: 2.8178, Perplexity: 16.7397, time_taken_in_seconds: 36
Epoch [1/1], Step [8245/13804], Loss: 2.3070, Perplexity: 10.0439, time_taken_in_seconds: 37
Epoch [1/1], Step [8246/13804], Loss: 2.5557, Perplexity: 12.8797, time_taken_in_seconds: 37
Epoch [1/1], Step [8247/13804], Loss: 2.7873, Perplexity: 16.2372, time_taken_in_seconds: 38
Epoch [1/1], Step [8248/13804], Loss: 2.4931, Perplexity: 12.0982, time_taken_in_seconds: 39
Epoch [1/1], Step [8249/13804], Loss: 2.8652, Perplexity: 17.5524, time_taken_in_seconds: 40
Epoch [1/1], Step [8250/13804], Loss: 2.6556, Perplexity: 14.2330, time_taken_in_seconds: 41
Epoch [1/1], Step [8251/13804], Loss: 2.7982, Perplexity: 16.4153, time_taken_in_seconds: 42
Epoch [1/1], Step [8252/13804], Loss: 2.5927, Perplexity: 13.3654, time_taken_in_seconds: 42
Epoch [1/1], Step [8253/13804], Loss: 2.8182, Perplexity: 16.7462, time_taken_in_seconds: 43
Epoch [1/1], Step [8254/13804], Loss: 2.5461, Perplexity: 12.7577, time_taken_in_seconds: 44
Epoch [1/1], Step [8255/13804], Loss: 2.8688, Perplexity: 17.6155, time_taken_in_seconds: 45
Epoch [1/1], Step [8256/13804], Loss: 2.5530, Perplexity: 12.8459, time_taken_in_seconds: 46
Epoch [1/1], Step [8257/13804], Loss: 2.3553, Perplexity: 10.5410, time_taken_in_seconds: 46
Epoch [1/1], Step [8258/13804], Loss: 2.2044, Perplexity: 9.0652, time_taken_in_seconds: 47
Epoch [1/1], Step [8259/13804], Loss: 2.6377, Perplexity: 13.9806, time_taken_in_seconds: 48
Epoch [1/1], Step [8260/13804], Loss: 2.5262, Perplexity: 12.5059, time_taken_in_seconds: 49
Epoch [1/1], Step [8261/13804], Loss: 2.6137, Perplexity: 13.6491, time_taken_in_seconds: 50
Epoch [1/1], Step [8262/13804], Loss: 2.8086, Perplexity: 16.5859, time_taken_in_seconds: 51
Epoch [1/1], Step [8263/13804], Loss: 2.4534, Perplexity: 11.6277, time_taken_in_seconds: 51
Epoch [1/1], Step [8264/13804], Loss: 2.5325, Perplexity: 12.5852, time_taken_in_seconds: 52
Epoch [1/1], Step [8265/13804], Loss: 2.7847, Perplexity: 16.1952, time_taken_in_seconds: 53
Epoch [1/1], Step [8266/13804], Loss: 2.6089, Perplexity: 13.5836, time_taken_in_seconds: 54
Epoch [1/1], Step [8267/13804], Loss: 2.6174, Perplexity: 13.7002, time_taken_in_seconds: 55
Epoch [1/1], Step [8268/13804], Loss: 2.4680, Perplexity: 11.7984, time_taken_in_seconds: 55
Epoch [1/1], Step [8269/13804], Loss: 2.7219, Perplexity: 15.2091, time_taken_in_seconds: 56
Epoch [1/1], Step [8270/13804], Loss: 2.4215, Perplexity: 11.2632, time_taken_in_seconds: 57
Epoch [1/1], Step [8271/13804], Loss: 2.2245, Perplexity: 9.2488, time_taken_in_seconds: 58
Epoch [1/1], Step [8272/13804], Loss: 2.6460, Perplexity: 14.0977, time_taken_in_seconds: 59
Epoch [1/1], Step [8273/13804], Loss: 2.6651, Perplexity: 14.3690, time_taken_in_seconds: 59
Epoch [1/1], Step [8274/13804], Loss: 2.5430, Perplexity: 12.7182, time_taken_in_seconds: 60
Epoch [1/1], Step [8275/13804], Loss: 2.6620, Perplexity: 14.3256, time_taken_in_seconds: 61
Epoch [1/1], Step [8276/13804], Loss: 2.3529, Perplexity: 10.5161, time_taken_in_seconds: 62
Epoch [1/1], Step [8277/13804], Loss: 2.3812, Perplexity: 10.8180, time_taken_in_seconds: 63
Epoch [1/1], Step [8278/13804], Loss: 3.1939, Perplexity: 24.3835, time_taken_in_seconds: 64
Epoch [1/1], Step [8279/13804], Loss: 2.6532, Perplexity: 14.1987, time_taken_in_seconds: 64
Epoch [1/1], Step [8280/13804], Loss: 2.3488, Perplexity: 10.4727, time_taken_in_seconds: 65
Epoch [1/1], Step [8281/13804], Loss: 2.4107, Perplexity: 11.1413, time_taken_in_seconds: 66
Epoch [1/1], Step [8282/13804], Loss: 2.2739, Perplexity: 9.7176, time_taken_in_seconds: 67
Epoch [1/1], Step [8283/13804], Loss: 2.5678, Perplexity: 13.0369, time_taken_in_seconds: 68
Epoch [1/1], Step [8284/13804], Loss: 3.2761, Perplexity: 26.4731, time_taken_in_seconds: 68
Epoch [1/1], Step [8285/13804], Loss: 3.0277, Perplexity: 20.6488, time_taken_in_seconds: 69
Epoch [1/1], Step [8286/13804], Loss: 2.6356, Perplexity: 13.9517, time_taken_in_seconds: 70
Epoch [1/1], Step [8287/13804], Loss: 2.5886, Perplexity: 13.3113, time_taken_in_seconds: 71
Epoch [1/1], Step [8288/13804], Loss: 2.4786, Perplexity: 11.9246, time_taken_in_seconds: 72
Epoch [1/1], Step [8289/13804], Loss: 2.7050, Perplexity: 14.9544, time_taken_in_seconds: 73
Epoch [1/1], Step [8290/13804], Loss: 2.3043, Perplexity: 10.0175, time_taken_in_seconds: 73
Epoch [1/1], Step [8291/13804], Loss: 2.1382, Perplexity: 8.4845, time_taken_in_seconds: 74
Epoch [1/1], Step [8292/13804], Loss: 3.4503, Perplexity: 31.5105, time_taken_in_seconds: 75
Epoch [1/1], Step [8293/13804], Loss: 2.8799, Perplexity: 17.8130, time_taken_in_seconds: 76
Epoch [1/1], Step [8294/13804], Loss: 3.1707, Perplexity: 23.8233, time_taken_in_seconds: 77
Epoch [1/1], Step [8295/13804], Loss: 2.8552, Perplexity: 17.3782, time_taken_in_seconds: 78
Epoch [1/1], Step [8296/13804], Loss: 2.5221, Perplexity: 12.4547, time_taken_in_seconds: 78
Epoch [1/1], Step [8297/13804], Loss: 2.5537, Perplexity: 12.8547, time_taken_in_seconds: 79
Epoch [1/1], Step [8298/13804], Loss: 2.9010, Perplexity: 18.1927, time_taken_in_seconds: 80
Epoch [1/1], Step [8299/13804], Loss: 2.6597, Perplexity: 14.2913, time_taken_in_seconds: 81
Epoch [1/1], Step [8300/13804], Loss: 2.6448, Perplexity: 14.0807, time_taken_in_seconds: 82
Epoch [1/1], Step [8301/13804], Loss: 2.3656, Perplexity: 10.6506, time_taken_in_seconds: 0
Epoch [1/1], Step [8302/13804], Loss: 2.9976, Perplexity: 20.0373, time_taken_in_seconds: 1
Epoch [1/1], Step [8303/13804], Loss: 2.8904, Perplexity: 18.0010, time_taken_in_seconds: 2
Epoch [1/1], Step [8304/13804], Loss: 2.3912, Perplexity: 10.9267, time_taken_in_seconds: 3
Epoch [1/1], Step [8305/13804], Loss: 2.9457, Perplexity: 19.0235, time_taken_in_seconds: 4
Epoch [1/1], Step [8306/13804], Loss: 2.6022, Perplexity: 13.4934, time_taken_in_seconds: 4
Epoch [1/1], Step [8307/13804], Loss: 2.5762, Perplexity: 13.1466, time_taken_in_seconds: 5
Epoch [1/1], Step [8308/13804], Loss: 2.3092, Perplexity: 10.0668, time_taken_in_seconds: 6
Epoch [1/1], Step [8309/13804], Loss: 2.6338, Perplexity: 13.9271, time_taken_in_seconds: 7
Epoch [1/1], Step [8310/13804], Loss: 2.7198, Perplexity: 15.1769, time_taken_in_seconds: 8
Epoch [1/1], Step [8311/13804], Loss: 3.0480, Perplexity: 21.0740, time_taken_in_seconds: 8
Epoch [1/1], Step [8312/13804], Loss: 2.6080, Perplexity: 13.5715, time_taken_in_seconds: 9
Epoch [1/1], Step [8313/13804], Loss: 2.2462, Perplexity: 9.4515, time_taken_in_seconds: 10
Epoch [1/1], Step [8314/13804], Loss: 2.5570, Perplexity: 12.8974, time_taken_in_seconds: 11
Epoch [1/1], Step [8315/13804], Loss: 2.7451, Perplexity: 15.5659, time_taken_in_seconds: 12
Epoch [1/1], Step [8316/13804], Loss: 2.6462, Perplexity: 14.1000, time_taken_in_seconds: 13
Epoch [1/1], Step [8317/13804], Loss: 2.6552, Perplexity: 14.2272, time_taken_in_seconds: 13
Epoch [1/1], Step [8318/13804], Loss: 2.7488, Perplexity: 15.6232, time_taken_in_seconds: 14
Epoch [1/1], Step [8319/13804], Loss: 2.7337, Perplexity: 15.3899, time_taken_in_seconds: 15
Epoch [1/1], Step [8320/13804], Loss: 2.7652, Perplexity: 15.8826, time_taken_in_seconds: 16
Epoch [1/1], Step [8321/13804], Loss: 2.7749, Perplexity: 16.0374, time_taken_in_seconds: 17
Epoch [1/1], Step [8322/13804], Loss: 2.5989, Perplexity: 13.4487, time_taken_in_seconds: 18
Epoch [1/1], Step [8323/13804], Loss: 2.2711, Perplexity: 9.6898, time_taken_in_seconds: 18
Epoch [1/1], Step [8324/13804], Loss: 2.4857, Perplexity: 12.0092, time_taken_in_seconds: 19
Epoch [1/1], Step [8325/13804], Loss: 2.4530, Perplexity: 11.6232, time_taken_in_seconds: 20
Epoch [1/1], Step [8326/13804], Loss: 2.5268, Perplexity: 12.5130, time_taken_in_seconds: 21
Epoch [1/1], Step [8327/13804], Loss: 2.7002, Perplexity: 14.8824, time_taken_in_seconds: 22
Epoch [1/1], Step [8328/13804], Loss: 2.6624, Perplexity: 14.3312, time_taken_in_seconds: 22
Epoch [1/1], Step [8329/13804], Loss: 2.5502, Perplexity: 12.8091, time_taken_in_seconds: 23
Epoch [1/1], Step [8330/13804], Loss: 2.6116, Perplexity: 13.6209, time_taken_in_seconds: 24
Epoch [1/1], Step [8331/13804], Loss: 2.4605, Perplexity: 11.7106, time_taken_in_seconds: 25
Epoch [1/1], Step [8332/13804], Loss: 2.5782, Perplexity: 13.1733, time_taken_in_seconds: 26
Epoch [1/1], Step [8333/13804], Loss: 2.6106, Perplexity: 13.6069, time_taken_in_seconds: 27
Epoch [1/1], Step [8334/13804], Loss: 2.5320, Perplexity: 12.5785, time_taken_in_seconds: 27
Epoch [1/1], Step [8335/13804], Loss: 2.4529, Perplexity: 11.6218, time_taken_in_seconds: 28
Epoch [1/1], Step [8336/13804], Loss: 2.9966, Perplexity: 20.0182, time_taken_in_seconds: 29
Epoch [1/1], Step [8337/13804], Loss: 2.8470, Perplexity: 17.2360, time_taken_in_seconds: 30
Epoch [1/1], Step [8338/13804], Loss: 2.6024, Perplexity: 13.4961, time_taken_in_seconds: 31
Epoch [1/1], Step [8339/13804], Loss: 3.0767, Perplexity: 21.6865, time_taken_in_seconds: 31
Epoch [1/1], Step [8340/13804], Loss: 2.6472, Perplexity: 14.1147, time_taken_in_seconds: 32
Epoch [1/1], Step [8341/13804], Loss: 2.6170, Perplexity: 13.6950, time_taken_in_seconds: 33
Epoch [1/1], Step [8342/13804], Loss: 2.7254, Perplexity: 15.2632, time_taken_in_seconds: 34
Epoch [1/1], Step [8343/13804], Loss: 2.7378, Perplexity: 15.4522, time_taken_in_seconds: 35
Epoch [1/1], Step [8344/13804], Loss: 2.4261, Perplexity: 11.3150, time_taken_in_seconds: 36
Epoch [1/1], Step [8345/13804], Loss: 2.4106, Perplexity: 11.1406, time_taken_in_seconds: 36
Epoch [1/1], Step [8346/13804], Loss: 2.4131, Perplexity: 11.1690, time_taken_in_seconds: 37
Epoch [1/1], Step [8347/13804], Loss: 2.8350, Perplexity: 17.0304, time_taken_in_seconds: 38
Epoch [1/1], Step [8348/13804], Loss: 2.4429, Perplexity: 11.5058, time_taken_in_seconds: 39
Epoch [1/1], Step [8349/13804], Loss: 2.5510, Perplexity: 12.8201, time_taken_in_seconds: 40
Epoch [1/1], Step [8350/13804], Loss: 2.5376, Perplexity: 12.6489, time_taken_in_seconds: 40
Epoch [1/1], Step [8351/13804], Loss: 2.7463, Perplexity: 15.5848, time_taken_in_seconds: 41
Epoch [1/1], Step [8352/13804], Loss: 2.9096, Perplexity: 18.3488, time_taken_in_seconds: 42
Epoch [1/1], Step [8353/13804], Loss: 2.5955, Perplexity: 13.4034, time_taken_in_seconds: 43
Epoch [1/1], Step [8354/13804], Loss: 2.5048, Perplexity: 12.2409, time_taken_in_seconds: 44
Epoch [1/1], Step [8355/13804], Loss: 2.5856, Perplexity: 13.2709, time_taken_in_seconds: 45
Epoch [1/1], Step [8356/13804], Loss: 3.1900, Perplexity: 24.2876, time_taken_in_seconds: 45
Epoch [1/1], Step [8357/13804], Loss: 2.6886, Perplexity: 14.7107, time_taken_in_seconds: 46
Epoch [1/1], Step [8358/13804], Loss: 2.4908, Perplexity: 12.0712, time_taken_in_seconds: 47
Epoch [1/1], Step [8359/13804], Loss: 2.7653, Perplexity: 15.8832, time_taken_in_seconds: 48
Epoch [1/1], Step [8360/13804], Loss: 2.2630, Perplexity: 9.6119, time_taken_in_seconds: 49
Epoch [1/1], Step [8361/13804], Loss: 2.4006, Perplexity: 11.0296, time_taken_in_seconds: 50
Epoch [1/1], Step [8362/13804], Loss: 2.6232, Perplexity: 13.7793, time_taken_in_seconds: 50
Epoch [1/1], Step [8363/13804], Loss: 2.6742, Perplexity: 14.5005, time_taken_in_seconds: 51
Epoch [1/1], Step [8364/13804], Loss: 2.6897, Perplexity: 14.7265, time_taken_in_seconds: 52
Epoch [1/1], Step [8365/13804], Loss: 2.4015, Perplexity: 11.0396, time_taken_in_seconds: 53
Epoch [1/1], Step [8366/13804], Loss: 2.5068, Perplexity: 12.2652, time_taken_in_seconds: 54
Epoch [1/1], Step [8367/13804], Loss: 2.5485, Perplexity: 12.7875, time_taken_in_seconds: 55
Epoch [1/1], Step [8368/13804], Loss: 3.0483, Perplexity: 21.0786, time_taken_in_seconds: 55
Epoch [1/1], Step [8369/13804], Loss: 2.4984, Perplexity: 12.1631, time_taken_in_seconds: 56
Epoch [1/1], Step [8370/13804], Loss: 2.8777, Perplexity: 17.7730, time_taken_in_seconds: 57
Epoch [1/1], Step [8371/13804], Loss: 2.5518, Perplexity: 12.8299, time_taken_in_seconds: 58
Epoch [1/1], Step [8372/13804], Loss: 2.4231, Perplexity: 11.2809, time_taken_in_seconds: 59
Epoch [1/1], Step [8373/13804], Loss: 2.5314, Perplexity: 12.5707, time_taken_in_seconds: 59
Epoch [1/1], Step [8374/13804], Loss: 3.0312, Perplexity: 20.7230, time_taken_in_seconds: 60
Epoch [1/1], Step [8375/13804], Loss: 2.5787, Perplexity: 13.1802, time_taken_in_seconds: 61
Epoch [1/1], Step [8376/13804], Loss: 2.9657, Perplexity: 19.4089, time_taken_in_seconds: 62
Epoch [1/1], Step [8377/13804], Loss: 2.9187, Perplexity: 18.5175, time_taken_in_seconds: 63
Epoch [1/1], Step [8378/13804], Loss: 2.5313, Perplexity: 12.5697, time_taken_in_seconds: 64
Epoch [1/1], Step [8379/13804], Loss: 2.5157, Perplexity: 12.3747, time_taken_in_seconds: 64
Epoch [1/1], Step [8380/13804], Loss: 2.6812, Perplexity: 14.6022, time_taken_in_seconds: 65
Epoch [1/1], Step [8381/13804], Loss: 2.2567, Perplexity: 9.5517, time_taken_in_seconds: 66
Epoch [1/1], Step [8382/13804], Loss: 2.3695, Perplexity: 10.6921, time_taken_in_seconds: 67
Epoch [1/1], Step [8383/13804], Loss: 2.5418, Perplexity: 12.7031, time_taken_in_seconds: 68
Epoch [1/1], Step [8384/13804], Loss: 2.7077, Perplexity: 14.9944, time_taken_in_seconds: 68
Epoch [1/1], Step [8385/13804], Loss: 2.7809, Perplexity: 16.1331, time_taken_in_seconds: 69
Epoch [1/1], Step [8386/13804], Loss: 3.0015, Perplexity: 20.1154, time_taken_in_seconds: 70
Epoch [1/1], Step [8387/13804], Loss: 2.2671, Perplexity: 9.6515, time_taken_in_seconds: 71
Epoch [1/1], Step [8388/13804], Loss: 2.6541, Perplexity: 14.2126, time_taken_in_seconds: 72
Epoch [1/1], Step [8389/13804], Loss: 2.5641, Perplexity: 12.9884, time_taken_in_seconds: 73
Epoch [1/1], Step [8390/13804], Loss: 2.2217, Perplexity: 9.2232, time_taken_in_seconds: 73
Epoch [1/1], Step [8391/13804], Loss: 2.5228, Perplexity: 12.4638, time_taken_in_seconds: 74
Epoch [1/1], Step [8392/13804], Loss: 2.4808, Perplexity: 11.9509, time_taken_in_seconds: 75
Epoch [1/1], Step [8393/13804], Loss: 2.3433, Perplexity: 10.4160, time_taken_in_seconds: 76
Epoch [1/1], Step [8394/13804], Loss: 2.3679, Perplexity: 10.6748, time_taken_in_seconds: 77
Epoch [1/1], Step [8395/13804], Loss: 2.5391, Perplexity: 12.6682, time_taken_in_seconds: 77
Epoch [1/1], Step [8396/13804], Loss: 2.4473, Perplexity: 11.5565, time_taken_in_seconds: 78
Epoch [1/1], Step [8397/13804], Loss: 2.6955, Perplexity: 14.8133, time_taken_in_seconds: 79
Epoch [1/1], Step [8398/13804], Loss: 2.2291, Perplexity: 9.2913, time_taken_in_seconds: 80
Epoch [1/1], Step [8399/13804], Loss: 2.7249, Perplexity: 15.2542, time_taken_in_seconds: 81
Epoch [1/1], Step [8400/13804], Loss: 2.5514, Perplexity: 12.8247, time_taken_in_seconds: 82
Epoch [1/1], Step [8401/13804], Loss: 2.3426, Perplexity: 10.4088, time_taken_in_seconds: 0
Epoch [1/1], Step [8402/13804], Loss: 2.8036, Perplexity: 16.5045, time_taken_in_seconds: 1
Epoch [1/1], Step [8403/13804], Loss: 2.3779, Perplexity: 10.7827, time_taken_in_seconds: 2
Epoch [1/1], Step [8404/13804], Loss: 2.6420, Perplexity: 14.0406, time_taken_in_seconds: 3
Epoch [1/1], Step [8405/13804], Loss: 2.6636, Perplexity: 14.3484, time_taken_in_seconds: 4
Epoch [1/1], Step [8406/13804], Loss: 2.4327, Perplexity: 11.3893, time_taken_in_seconds: 4
Epoch [1/1], Step [8407/13804], Loss: 2.4147, Perplexity: 11.1859, time_taken_in_seconds: 5
Epoch [1/1], Step [8408/13804], Loss: 2.4037, Perplexity: 11.0640, time_taken_in_seconds: 6
Epoch [1/1], Step [8409/13804], Loss: 2.6508, Perplexity: 14.1657, time_taken_in_seconds: 7
Epoch [1/1], Step [8410/13804], Loss: 2.5177, Perplexity: 12.4002, time_taken_in_seconds: 8
Epoch [1/1], Step [8411/13804], Loss: 2.7434, Perplexity: 15.5394, time_taken_in_seconds: 8
Epoch [1/1], Step [8412/13804], Loss: 2.4996, Perplexity: 12.1782, time_taken_in_seconds: 9
Epoch [1/1], Step [8413/13804], Loss: 2.3497, Perplexity: 10.4819, time_taken_in_seconds: 10
Epoch [1/1], Step [8414/13804], Loss: 2.5818, Perplexity: 13.2210, time_taken_in_seconds: 11
Epoch [1/1], Step [8415/13804], Loss: 2.3374, Perplexity: 10.3539, time_taken_in_seconds: 12
Epoch [1/1], Step [8416/13804], Loss: 2.5678, Perplexity: 13.0373, time_taken_in_seconds: 13
Epoch [1/1], Step [8417/13804], Loss: 2.6594, Perplexity: 14.2880, time_taken_in_seconds: 13
Epoch [1/1], Step [8418/13804], Loss: 3.0513, Perplexity: 21.1429, time_taken_in_seconds: 14
Epoch [1/1], Step [8419/13804], Loss: 2.2545, Perplexity: 9.5303, time_taken_in_seconds: 15
Epoch [1/1], Step [8420/13804], Loss: 2.7007, Perplexity: 14.8901, time_taken_in_seconds: 16
Epoch [1/1], Step [8421/13804], Loss: 2.5293, Perplexity: 12.5449, time_taken_in_seconds: 17
Epoch [1/1], Step [8422/13804], Loss: 2.5192, Perplexity: 12.4181, time_taken_in_seconds: 17
Epoch [1/1], Step [8423/13804], Loss: 2.9401, Perplexity: 18.9173, time_taken_in_seconds: 18
Epoch [1/1], Step [8424/13804], Loss: 2.6655, Perplexity: 14.3758, time_taken_in_seconds: 19
Epoch [1/1], Step [8425/13804], Loss: 2.3556, Perplexity: 10.5446, time_taken_in_seconds: 20
Epoch [1/1], Step [8426/13804], Loss: 2.9727, Perplexity: 19.5444, time_taken_in_seconds: 21
Epoch [1/1], Step [8427/13804], Loss: 2.7617, Perplexity: 15.8265, time_taken_in_seconds: 22
Epoch [1/1], Step [8428/13804], Loss: 2.8744, Perplexity: 17.7139, time_taken_in_seconds: 22
Epoch [1/1], Step [8429/13804], Loss: 2.6960, Perplexity: 14.8201, time_taken_in_seconds: 23
Epoch [1/1], Step [8430/13804], Loss: 2.5825, Perplexity: 13.2299, time_taken_in_seconds: 24
Epoch [1/1], Step [8431/13804], Loss: 2.8069, Perplexity: 16.5581, time_taken_in_seconds: 25
Epoch [1/1], Step [8432/13804], Loss: 2.5745, Perplexity: 13.1241, time_taken_in_seconds: 26
Epoch [1/1], Step [8433/13804], Loss: 2.8699, Perplexity: 17.6356, time_taken_in_seconds: 26
Epoch [1/1], Step [8434/13804], Loss: 2.4279, Perplexity: 11.3348, time_taken_in_seconds: 27
Epoch [1/1], Step [8435/13804], Loss: 2.4554, Perplexity: 11.6512, time_taken_in_seconds: 28
Epoch [1/1], Step [8436/13804], Loss: 2.3648, Perplexity: 10.6419, time_taken_in_seconds: 29
Epoch [1/1], Step [8437/13804], Loss: 2.4784, Perplexity: 11.9222, time_taken_in_seconds: 30
Epoch [1/1], Step [8438/13804], Loss: 3.2867, Perplexity: 26.7534, time_taken_in_seconds: 31
Epoch [1/1], Step [8439/13804], Loss: 2.4431, Perplexity: 11.5091, time_taken_in_seconds: 32
Epoch [1/1], Step [8440/13804], Loss: 2.4684, Perplexity: 11.8041, time_taken_in_seconds: 33
Epoch [1/1], Step [8441/13804], Loss: 2.3710, Perplexity: 10.7086, time_taken_in_seconds: 33
Epoch [1/1], Step [8442/13804], Loss: 2.7619, Perplexity: 15.8299, time_taken_in_seconds: 34
Epoch [1/1], Step [8443/13804], Loss: 2.8180, Perplexity: 16.7434, time_taken_in_seconds: 35
Epoch [1/1], Step [8444/13804], Loss: 2.4974, Perplexity: 12.1514, time_taken_in_seconds: 36
Epoch [1/1], Step [8445/13804], Loss: 2.9727, Perplexity: 19.5455, time_taken_in_seconds: 37
Epoch [1/1], Step [8446/13804], Loss: 2.8527, Perplexity: 17.3340, time_taken_in_seconds: 37
Epoch [1/1], Step [8447/13804], Loss: 2.6121, Perplexity: 13.6270, time_taken_in_seconds: 38
Epoch [1/1], Step [8448/13804], Loss: 2.9516, Perplexity: 19.1374, time_taken_in_seconds: 39
Epoch [1/1], Step [8449/13804], Loss: 2.5752, Perplexity: 13.1343, time_taken_in_seconds: 40
Epoch [1/1], Step [8450/13804], Loss: 2.2743, Perplexity: 9.7210, time_taken_in_seconds: 41
Epoch [1/1], Step [8451/13804], Loss: 2.4792, Perplexity: 11.9316, time_taken_in_seconds: 42
Epoch [1/1], Step [8452/13804], Loss: 2.7365, Perplexity: 15.4325, time_taken_in_seconds: 42
Epoch [1/1], Step [8453/13804], Loss: 2.5477, Perplexity: 12.7776, time_taken_in_seconds: 43
Epoch [1/1], Step [8454/13804], Loss: 2.6175, Perplexity: 13.7017, time_taken_in_seconds: 44
Epoch [1/1], Step [8455/13804], Loss: 2.6663, Perplexity: 14.3873, time_taken_in_seconds: 45
Epoch [1/1], Step [8456/13804], Loss: 2.2701, Perplexity: 9.6802, time_taken_in_seconds: 46
Epoch [1/1], Step [8457/13804], Loss: 2.6855, Perplexity: 14.6658, time_taken_in_seconds: 46
Epoch [1/1], Step [8458/13804], Loss: 2.6678, Perplexity: 14.4083, time_taken_in_seconds: 47
Epoch [1/1], Step [8459/13804], Loss: 2.3787, Perplexity: 10.7912, time_taken_in_seconds: 48
Epoch [1/1], Step [8460/13804], Loss: 2.4728, Perplexity: 11.8550, time_taken_in_seconds: 49
Epoch [1/1], Step [8461/13804], Loss: 3.1026, Perplexity: 22.2554, time_taken_in_seconds: 50
Epoch [1/1], Step [8462/13804], Loss: 2.6249, Perplexity: 13.8038, time_taken_in_seconds: 51
Epoch [1/1], Step [8463/13804], Loss: 2.7728, Perplexity: 16.0030, time_taken_in_seconds: 51
Epoch [1/1], Step [8464/13804], Loss: 2.2137, Perplexity: 9.1496, time_taken_in_seconds: 52
Epoch [1/1], Step [8465/13804], Loss: 2.8948, Perplexity: 18.0801, time_taken_in_seconds: 53
Epoch [1/1], Step [8466/13804], Loss: 2.6721, Perplexity: 14.4699, time_taken_in_seconds: 54
Epoch [1/1], Step [8467/13804], Loss: 2.5339, Perplexity: 12.6026, time_taken_in_seconds: 55
Epoch [1/1], Step [8468/13804], Loss: 3.0768, Perplexity: 21.6882, time_taken_in_seconds: 55
Epoch [1/1], Step [8469/13804], Loss: 2.3981, Perplexity: 11.0026, time_taken_in_seconds: 56
Epoch [1/1], Step [8470/13804], Loss: 2.6138, Perplexity: 13.6505, time_taken_in_seconds: 57
Epoch [1/1], Step [8471/13804], Loss: 2.6031, Perplexity: 13.5050, time_taken_in_seconds: 58
Epoch [1/1], Step [8472/13804], Loss: 2.4800, Perplexity: 11.9418, time_taken_in_seconds: 59
Epoch [1/1], Step [8473/13804], Loss: 2.6927, Perplexity: 14.7717, time_taken_in_seconds: 59
Epoch [1/1], Step [8474/13804], Loss: 2.5818, Perplexity: 13.2213, time_taken_in_seconds: 60
Epoch [1/1], Step [8475/13804], Loss: 2.5220, Perplexity: 12.4540, time_taken_in_seconds: 61
Epoch [1/1], Step [8476/13804], Loss: 2.5312, Perplexity: 12.5685, time_taken_in_seconds: 62
Epoch [1/1], Step [8477/13804], Loss: 2.7118, Perplexity: 15.0571, time_taken_in_seconds: 63
Epoch [1/1], Step [8478/13804], Loss: 2.8240, Perplexity: 16.8448, time_taken_in_seconds: 64
Epoch [1/1], Step [8479/13804], Loss: 2.7033, Perplexity: 14.9288, time_taken_in_seconds: 64
Epoch [1/1], Step [8480/13804], Loss: 2.7573, Perplexity: 15.7572, time_taken_in_seconds: 65
Epoch [1/1], Step [8481/13804], Loss: 2.4169, Perplexity: 11.2107, time_taken_in_seconds: 66
Epoch [1/1], Step [8482/13804], Loss: 2.6759, Perplexity: 14.5249, time_taken_in_seconds: 67
Epoch [1/1], Step [8483/13804], Loss: 2.3550, Perplexity: 10.5378, time_taken_in_seconds: 68
Epoch [1/1], Step [8484/13804], Loss: 2.4486, Perplexity: 11.5717, time_taken_in_seconds: 68
Epoch [1/1], Step [8485/13804], Loss: 3.2265, Perplexity: 25.1922, time_taken_in_seconds: 69
Epoch [1/1], Step [8486/13804], Loss: 2.2754, Perplexity: 9.7323, time_taken_in_seconds: 70
Epoch [1/1], Step [8487/13804], Loss: 2.9177, Perplexity: 18.4981, time_taken_in_seconds: 71
Epoch [1/1], Step [8488/13804], Loss: 2.8319, Perplexity: 16.9768, time_taken_in_seconds: 72
Epoch [1/1], Step [8489/13804], Loss: 2.3725, Perplexity: 10.7240, time_taken_in_seconds: 73
Epoch [1/1], Step [8490/13804], Loss: 2.6282, Perplexity: 13.8482, time_taken_in_seconds: 73
Epoch [1/1], Step [8491/13804], Loss: 2.5462, Perplexity: 12.7580, time_taken_in_seconds: 74
Epoch [1/1], Step [8492/13804], Loss: 2.4903, Perplexity: 12.0646, time_taken_in_seconds: 75
Epoch [1/1], Step [8493/13804], Loss: 2.7986, Perplexity: 16.4209, time_taken_in_seconds: 76
Epoch [1/1], Step [8494/13804], Loss: 2.6928, Perplexity: 14.7734, time_taken_in_seconds: 77
Epoch [1/1], Step [8495/13804], Loss: 2.4547, Perplexity: 11.6434, time_taken_in_seconds: 77
Epoch [1/1], Step [8496/13804], Loss: 2.5062, Perplexity: 12.2581, time_taken_in_seconds: 78
Epoch [1/1], Step [8497/13804], Loss: 2.7098, Perplexity: 15.0265, time_taken_in_seconds: 79
Epoch [1/1], Step [8498/13804], Loss: 3.2867, Perplexity: 26.7539, time_taken_in_seconds: 80
Epoch [1/1], Step [8499/13804], Loss: 2.7381, Perplexity: 15.4581, time_taken_in_seconds: 81
Epoch [1/1], Step [8500/13804], Loss: 2.3215, Perplexity: 10.1907, time_taken_in_seconds: 81
Epoch [1/1], Step [8501/13804], Loss: 2.3435, Perplexity: 10.4181, time_taken_in_seconds: 0
Epoch [1/1], Step [8502/13804], Loss: 2.5710, Perplexity: 13.0788, time_taken_in_seconds: 1
Epoch [1/1], Step [8503/13804], Loss: 2.8415, Perplexity: 17.1415, time_taken_in_seconds: 2
Epoch [1/1], Step [8504/13804], Loss: 2.6726, Perplexity: 14.4770, time_taken_in_seconds: 3
Epoch [1/1], Step [8505/13804], Loss: 2.2581, Perplexity: 9.5647, time_taken_in_seconds: 4
Epoch [1/1], Step [8506/13804], Loss: 2.4968, Perplexity: 12.1436, time_taken_in_seconds: 4
Epoch [1/1], Step [8507/13804], Loss: 2.3603, Perplexity: 10.5937, time_taken_in_seconds: 5
Epoch [1/1], Step [8508/13804], Loss: 2.6857, Perplexity: 14.6692, time_taken_in_seconds: 6
Epoch [1/1], Step [8509/13804], Loss: 2.8423, Perplexity: 17.1559, time_taken_in_seconds: 7
Epoch [1/1], Step [8510/13804], Loss: 2.4922, Perplexity: 12.0875, time_taken_in_seconds: 8
Epoch [1/1], Step [8511/13804], Loss: 2.5683, Perplexity: 13.0431, time_taken_in_seconds: 9
Epoch [1/1], Step [8512/13804], Loss: 2.4076, Perplexity: 11.1068, time_taken_in_seconds: 9
Epoch [1/1], Step [8513/13804], Loss: 2.5555, Perplexity: 12.8780, time_taken_in_seconds: 10
Epoch [1/1], Step [8514/13804], Loss: 2.3887, Perplexity: 10.8998, time_taken_in_seconds: 11
Epoch [1/1], Step [8515/13804], Loss: 2.7062, Perplexity: 14.9729, time_taken_in_seconds: 12
Epoch [1/1], Step [8516/13804], Loss: 2.7805, Perplexity: 16.1267, time_taken_in_seconds: 13
Epoch [1/1], Step [8517/13804], Loss: 3.2478, Perplexity: 25.7336, time_taken_in_seconds: 14
Epoch [1/1], Step [8518/13804], Loss: 2.5920, Perplexity: 13.3558, time_taken_in_seconds: 14
Epoch [1/1], Step [8519/13804], Loss: 2.8329, Perplexity: 16.9941, time_taken_in_seconds: 15
Epoch [1/1], Step [8520/13804], Loss: 2.5919, Perplexity: 13.3555, time_taken_in_seconds: 16
Epoch [1/1], Step [8521/13804], Loss: 2.6191, Perplexity: 13.7234, time_taken_in_seconds: 17
Epoch [1/1], Step [8522/13804], Loss: 2.5559, Perplexity: 12.8833, time_taken_in_seconds: 18
Epoch [1/1], Step [8523/13804], Loss: 2.9029, Perplexity: 18.2275, time_taken_in_seconds: 19
Epoch [1/1], Step [8524/13804], Loss: 2.7296, Perplexity: 15.3274, time_taken_in_seconds: 19
Epoch [1/1], Step [8525/13804], Loss: 2.9975, Perplexity: 20.0347, time_taken_in_seconds: 20
Epoch [1/1], Step [8526/13804], Loss: 2.7997, Perplexity: 16.4405, time_taken_in_seconds: 21
Epoch [1/1], Step [8527/13804], Loss: 3.1400, Perplexity: 23.1033, time_taken_in_seconds: 22
Epoch [1/1], Step [8528/13804], Loss: 2.4906, Perplexity: 12.0681, time_taken_in_seconds: 23
Epoch [1/1], Step [8529/13804], Loss: 2.3326, Perplexity: 10.3043, time_taken_in_seconds: 23
Epoch [1/1], Step [8530/13804], Loss: 3.1026, Perplexity: 22.2554, time_taken_in_seconds: 24
Epoch [1/1], Step [8531/13804], Loss: 2.3389, Perplexity: 10.3701, time_taken_in_seconds: 25
Epoch [1/1], Step [8532/13804], Loss: 2.4780, Perplexity: 11.9176, time_taken_in_seconds: 26
Epoch [1/1], Step [8533/13804], Loss: 2.3542, Perplexity: 10.5299, time_taken_in_seconds: 27
Epoch [1/1], Step [8534/13804], Loss: 3.0355, Perplexity: 20.8106, time_taken_in_seconds: 28
Epoch [1/1], Step [8535/13804], Loss: 2.4246, Perplexity: 11.2977, time_taken_in_seconds: 28
Epoch [1/1], Step [8536/13804], Loss: 2.2996, Perplexity: 9.9703, time_taken_in_seconds: 29
Epoch [1/1], Step [8537/13804], Loss: 2.5800, Perplexity: 13.1977, time_taken_in_seconds: 30
Epoch [1/1], Step [8538/13804], Loss: 2.4980, Perplexity: 12.1579, time_taken_in_seconds: 31
Epoch [1/1], Step [8539/13804], Loss: 2.6143, Perplexity: 13.6578, time_taken_in_seconds: 32
Epoch [1/1], Step [8540/13804], Loss: 2.6772, Perplexity: 14.5448, time_taken_in_seconds: 33
Epoch [1/1], Step [8541/13804], Loss: 2.8457, Perplexity: 17.2136, time_taken_in_seconds: 33
Epoch [1/1], Step [8542/13804], Loss: 2.8796, Perplexity: 17.8073, time_taken_in_seconds: 34
Epoch [1/1], Step [8543/13804], Loss: 2.6759, Perplexity: 14.5253, time_taken_in_seconds: 35
Epoch [1/1], Step [8544/13804], Loss: 2.4293, Perplexity: 11.3513, time_taken_in_seconds: 36
Epoch [1/1], Step [8545/13804], Loss: 2.6923, Perplexity: 14.7659, time_taken_in_seconds: 37
Epoch [1/1], Step [8546/13804], Loss: 2.7325, Perplexity: 15.3714, time_taken_in_seconds: 38
Epoch [1/1], Step [8547/13804], Loss: 2.6637, Perplexity: 14.3497, time_taken_in_seconds: 38
Epoch [1/1], Step [8548/13804], Loss: 2.2680, Perplexity: 9.6603, time_taken_in_seconds: 39
Epoch [1/1], Step [8549/13804], Loss: 2.4475, Perplexity: 11.5599, time_taken_in_seconds: 40
Epoch [1/1], Step [8550/13804], Loss: 2.6667, Perplexity: 14.3917, time_taken_in_seconds: 41
Epoch [1/1], Step [8551/13804], Loss: 2.6626, Perplexity: 14.3331, time_taken_in_seconds: 42
Epoch [1/1], Step [8552/13804], Loss: 2.4088, Perplexity: 11.1207, time_taken_in_seconds: 42
Epoch [1/1], Step [8553/13804], Loss: 2.4294, Perplexity: 11.3519, time_taken_in_seconds: 43
Epoch [1/1], Step [8554/13804], Loss: 2.6148, Perplexity: 13.6645, time_taken_in_seconds: 44
Epoch [1/1], Step [8555/13804], Loss: 2.6154, Perplexity: 13.6729, time_taken_in_seconds: 45
Epoch [1/1], Step [8556/13804], Loss: 2.4502, Perplexity: 11.5901, time_taken_in_seconds: 46
Epoch [1/1], Step [8557/13804], Loss: 2.7834, Perplexity: 16.1745, time_taken_in_seconds: 47
Epoch [1/1], Step [8558/13804], Loss: 2.4957, Perplexity: 12.1299, time_taken_in_seconds: 47
Epoch [1/1], Step [8559/13804], Loss: 2.6370, Perplexity: 13.9716, time_taken_in_seconds: 48
Epoch [1/1], Step [8560/13804], Loss: 2.7785, Perplexity: 16.0954, time_taken_in_seconds: 49
Epoch [1/1], Step [8561/13804], Loss: 2.7448, Perplexity: 15.5613, time_taken_in_seconds: 50
Epoch [1/1], Step [8562/13804], Loss: 2.2963, Perplexity: 9.9376, time_taken_in_seconds: 51
Epoch [1/1], Step [8563/13804], Loss: 2.5239, Perplexity: 12.4766, time_taken_in_seconds: 52
Epoch [1/1], Step [8564/13804], Loss: 2.8380, Perplexity: 17.0819, time_taken_in_seconds: 52
Epoch [1/1], Step [8565/13804], Loss: 2.7790, Perplexity: 16.1028, time_taken_in_seconds: 53
Epoch [1/1], Step [8566/13804], Loss: 2.5130, Perplexity: 12.3425, time_taken_in_seconds: 54
Epoch [1/1], Step [8567/13804], Loss: 2.6125, Perplexity: 13.6331, time_taken_in_seconds: 55
Epoch [1/1], Step [8568/13804], Loss: 2.7533, Perplexity: 15.6942, time_taken_in_seconds: 56
Epoch [1/1], Step [8569/13804], Loss: 2.4854, Perplexity: 12.0056, time_taken_in_seconds: 56
Epoch [1/1], Step [8570/13804], Loss: 2.5366, Perplexity: 12.6367, time_taken_in_seconds: 57
Epoch [1/1], Step [8571/13804], Loss: 3.2307, Perplexity: 25.2983, time_taken_in_seconds: 58
Epoch [1/1], Step [8572/13804], Loss: 2.9402, Perplexity: 18.9188, time_taken_in_seconds: 59
Epoch [1/1], Step [8573/13804], Loss: 2.6603, Perplexity: 14.3002, time_taken_in_seconds: 60
Epoch [1/1], Step [8574/13804], Loss: 2.5836, Perplexity: 13.2451, time_taken_in_seconds: 61
Epoch [1/1], Step [8575/13804], Loss: 2.0910, Perplexity: 8.0929, time_taken_in_seconds: 61
Epoch [1/1], Step [8576/13804], Loss: 2.7783, Perplexity: 16.0920, time_taken_in_seconds: 62
Epoch [1/1], Step [8577/13804], Loss: 2.8914, Perplexity: 18.0189, time_taken_in_seconds: 63
Epoch [1/1], Step [8578/13804], Loss: 2.5892, Perplexity: 13.3192, time_taken_in_seconds: 64
Epoch [1/1], Step [8579/13804], Loss: 2.5281, Perplexity: 12.5297, time_taken_in_seconds: 65
Epoch [1/1], Step [8580/13804], Loss: 2.8052, Perplexity: 16.5305, time_taken_in_seconds: 66
Epoch [1/1], Step [8581/13804], Loss: 3.2833, Perplexity: 26.6645, time_taken_in_seconds: 66
Epoch [1/1], Step [8582/13804], Loss: 2.4809, Perplexity: 11.9520, time_taken_in_seconds: 67
Epoch [1/1], Step [8583/13804], Loss: 2.6664, Perplexity: 14.3881, time_taken_in_seconds: 68
Epoch [1/1], Step [8584/13804], Loss: 2.5261, Perplexity: 12.5046, time_taken_in_seconds: 69
Epoch [1/1], Step [8585/13804], Loss: 2.6506, Perplexity: 14.1624, time_taken_in_seconds: 70
Epoch [1/1], Step [8586/13804], Loss: 2.5791, Perplexity: 13.1848, time_taken_in_seconds: 71
Epoch [1/1], Step [8587/13804], Loss: 2.5087, Perplexity: 12.2884, time_taken_in_seconds: 72
Epoch [1/1], Step [8588/13804], Loss: 2.4562, Perplexity: 11.6608, time_taken_in_seconds: 73
Epoch [1/1], Step [8589/13804], Loss: 2.6067, Perplexity: 13.5547, time_taken_in_seconds: 73
Epoch [1/1], Step [8590/13804], Loss: 2.5160, Perplexity: 12.3786, time_taken_in_seconds: 74
Epoch [1/1], Step [8591/13804], Loss: 2.7145, Perplexity: 15.0969, time_taken_in_seconds: 75
Epoch [1/1], Step [8592/13804], Loss: 2.8592, Perplexity: 17.4469, time_taken_in_seconds: 76
Epoch [1/1], Step [8593/13804], Loss: 2.3540, Perplexity: 10.5274, time_taken_in_seconds: 77
Epoch [1/1], Step [8594/13804], Loss: 2.2881, Perplexity: 9.8561, time_taken_in_seconds: 77
Epoch [1/1], Step [8595/13804], Loss: 2.6670, Perplexity: 14.3963, time_taken_in_seconds: 78
Epoch [1/1], Step [8596/13804], Loss: 2.4255, Perplexity: 11.3084, time_taken_in_seconds: 79
Epoch [1/1], Step [8597/13804], Loss: 2.5050, Perplexity: 12.2439, time_taken_in_seconds: 80
Epoch [1/1], Step [8598/13804], Loss: 2.6500, Perplexity: 14.1541, time_taken_in_seconds: 81
Epoch [1/1], Step [8599/13804], Loss: 3.0890, Perplexity: 21.9544, time_taken_in_seconds: 82
Epoch [1/1], Step [8600/13804], Loss: 2.3868, Perplexity: 10.8785, time_taken_in_seconds: 82
Epoch [1/1], Step [8601/13804], Loss: 2.7443, Perplexity: 15.5544, time_taken_in_seconds: 0
Epoch [1/1], Step [8602/13804], Loss: 2.4883, Perplexity: 12.0405, time_taken_in_seconds: 1
Epoch [1/1], Step [8603/13804], Loss: 2.7161, Perplexity: 15.1219, time_taken_in_seconds: 2
Epoch [1/1], Step [8604/13804], Loss: 2.7200, Perplexity: 15.1811, time_taken_in_seconds: 3
Epoch [1/1], Step [8605/13804], Loss: 2.4639, Perplexity: 11.7500, time_taken_in_seconds: 4
Epoch [1/1], Step [8606/13804], Loss: 2.6371, Perplexity: 13.9729, time_taken_in_seconds: 4
Epoch [1/1], Step [8607/13804], Loss: 2.5089, Perplexity: 12.2908, time_taken_in_seconds: 5
Epoch [1/1], Step [8608/13804], Loss: 2.4918, Perplexity: 12.0828, time_taken_in_seconds: 6
Epoch [1/1], Step [8609/13804], Loss: 2.5595, Perplexity: 12.9287, time_taken_in_seconds: 7
Epoch [1/1], Step [8610/13804], Loss: 2.7426, Perplexity: 15.5275, time_taken_in_seconds: 8
Epoch [1/1], Step [8611/13804], Loss: 2.7343, Perplexity: 15.3990, time_taken_in_seconds: 9
Epoch [1/1], Step [8612/13804], Loss: 2.4930, Perplexity: 12.0971, time_taken_in_seconds: 9
Epoch [1/1], Step [8613/13804], Loss: 2.9510, Perplexity: 19.1254, time_taken_in_seconds: 10
Epoch [1/1], Step [8614/13804], Loss: 2.4608, Perplexity: 11.7144, time_taken_in_seconds: 11
Epoch [1/1], Step [8615/13804], Loss: 3.1836, Perplexity: 24.1344, time_taken_in_seconds: 12
Epoch [1/1], Step [8616/13804], Loss: 2.8034, Perplexity: 16.5009, time_taken_in_seconds: 13
Epoch [1/1], Step [8617/13804], Loss: 2.6057, Perplexity: 13.5413, time_taken_in_seconds: 14
Epoch [1/1], Step [8618/13804], Loss: 2.6264, Perplexity: 13.8243, time_taken_in_seconds: 14
Epoch [1/1], Step [8619/13804], Loss: 2.6670, Perplexity: 14.3962, time_taken_in_seconds: 15
Epoch [1/1], Step [8620/13804], Loss: 2.5794, Perplexity: 13.1895, time_taken_in_seconds: 16
Epoch [1/1], Step [8621/13804], Loss: 2.8766, Perplexity: 17.7533, time_taken_in_seconds: 17
Epoch [1/1], Step [8622/13804], Loss: 2.3875, Perplexity: 10.8857, time_taken_in_seconds: 18
Epoch [1/1], Step [8623/13804], Loss: 2.4608, Perplexity: 11.7146, time_taken_in_seconds: 18
Epoch [1/1], Step [8624/13804], Loss: 2.3560, Perplexity: 10.5485, time_taken_in_seconds: 19
Epoch [1/1], Step [8625/13804], Loss: 2.6009, Perplexity: 13.4755, time_taken_in_seconds: 20
Epoch [1/1], Step [8626/13804], Loss: 2.4335, Perplexity: 11.3992, time_taken_in_seconds: 21
Epoch [1/1], Step [8627/13804], Loss: 2.9713, Perplexity: 19.5177, time_taken_in_seconds: 22
Epoch [1/1], Step [8628/13804], Loss: 2.3312, Perplexity: 10.2900, time_taken_in_seconds: 23
Epoch [1/1], Step [8629/13804], Loss: 2.5172, Perplexity: 12.3933, time_taken_in_seconds: 23
Epoch [1/1], Step [8630/13804], Loss: 2.2473, Perplexity: 9.4624, time_taken_in_seconds: 24
Epoch [1/1], Step [8631/13804], Loss: 2.8103, Perplexity: 16.6148, time_taken_in_seconds: 25
Epoch [1/1], Step [8632/13804], Loss: 2.7127, Perplexity: 15.0699, time_taken_in_seconds: 26
Epoch [1/1], Step [8633/13804], Loss: 2.8052, Perplexity: 16.5309, time_taken_in_seconds: 27
Epoch [1/1], Step [8634/13804], Loss: 2.9267, Perplexity: 18.6654, time_taken_in_seconds: 28
Epoch [1/1], Step [8635/13804], Loss: 2.3805, Perplexity: 10.8098, time_taken_in_seconds: 28
Epoch [1/1], Step [8636/13804], Loss: 2.4009, Perplexity: 11.0334, time_taken_in_seconds: 29
Epoch [1/1], Step [8637/13804], Loss: 2.2114, Perplexity: 9.1288, time_taken_in_seconds: 30
Epoch [1/1], Step [8638/13804], Loss: 2.8668, Perplexity: 17.5810, time_taken_in_seconds: 31
Epoch [1/1], Step [8639/13804], Loss: 2.6691, Perplexity: 14.4276, time_taken_in_seconds: 32
Epoch [1/1], Step [8640/13804], Loss: 2.6504, Perplexity: 14.1595, time_taken_in_seconds: 32
Epoch [1/1], Step [8641/13804], Loss: 2.6443, Perplexity: 14.0739, time_taken_in_seconds: 33
Epoch [1/1], Step [8642/13804], Loss: 2.4661, Perplexity: 11.7767, time_taken_in_seconds: 34
Epoch [1/1], Step [8643/13804], Loss: 2.3307, Perplexity: 10.2853, time_taken_in_seconds: 35
Epoch [1/1], Step [8644/13804], Loss: 2.7811, Perplexity: 16.1372, time_taken_in_seconds: 36
Epoch [1/1], Step [8645/13804], Loss: 2.5797, Perplexity: 13.1933, time_taken_in_seconds: 37
Epoch [1/1], Step [8646/13804], Loss: 2.4114, Perplexity: 11.1499, time_taken_in_seconds: 37
Epoch [1/1], Step [8647/13804], Loss: 2.4071, Perplexity: 11.1014, time_taken_in_seconds: 38
Epoch [1/1], Step [8648/13804], Loss: 2.6163, Perplexity: 13.6857, time_taken_in_seconds: 39
Epoch [1/1], Step [8649/13804], Loss: 3.2113, Perplexity: 24.8125, time_taken_in_seconds: 40
Epoch [1/1], Step [8650/13804], Loss: 2.6166, Perplexity: 13.6889, time_taken_in_seconds: 41
Epoch [1/1], Step [8651/13804], Loss: 2.6059, Perplexity: 13.5437, time_taken_in_seconds: 42
Epoch [1/1], Step [8652/13804], Loss: 2.4435, Perplexity: 11.5129, time_taken_in_seconds: 42
Epoch [1/1], Step [8653/13804], Loss: 2.8405, Perplexity: 17.1243, time_taken_in_seconds: 43
Epoch [1/1], Step [8654/13804], Loss: 2.7053, Perplexity: 14.9582, time_taken_in_seconds: 44
Epoch [1/1], Step [8655/13804], Loss: 3.6140, Perplexity: 37.1139, time_taken_in_seconds: 45
Epoch [1/1], Step [8656/13804], Loss: 2.9894, Perplexity: 19.8744, time_taken_in_seconds: 46
Epoch [1/1], Step [8657/13804], Loss: 2.4592, Perplexity: 11.6958, time_taken_in_seconds: 46
Epoch [1/1], Step [8658/13804], Loss: 2.4820, Perplexity: 11.9650, time_taken_in_seconds: 47
Epoch [1/1], Step [8659/13804], Loss: 2.6847, Perplexity: 14.6539, time_taken_in_seconds: 48
Epoch [1/1], Step [8660/13804], Loss: 2.2020, Perplexity: 9.0427, time_taken_in_seconds: 49
Epoch [1/1], Step [8661/13804], Loss: 2.5422, Perplexity: 12.7082, time_taken_in_seconds: 50
Epoch [1/1], Step [8662/13804], Loss: 2.5340, Perplexity: 12.6036, time_taken_in_seconds: 51
Epoch [1/1], Step [8663/13804], Loss: 2.8391, Perplexity: 17.1005, time_taken_in_seconds: 52
Epoch [1/1], Step [8664/13804], Loss: 2.6594, Perplexity: 14.2871, time_taken_in_seconds: 52
Epoch [1/1], Step [8665/13804], Loss: 3.1147, Perplexity: 22.5258, time_taken_in_seconds: 53
Epoch [1/1], Step [8666/13804], Loss: 2.5480, Perplexity: 12.7818, time_taken_in_seconds: 54
Epoch [1/1], Step [8667/13804], Loss: 2.4076, Perplexity: 11.1075, time_taken_in_seconds: 55
Epoch [1/1], Step [8668/13804], Loss: 2.3334, Perplexity: 10.3125, time_taken_in_seconds: 56
Epoch [1/1], Step [8669/13804], Loss: 2.3226, Perplexity: 10.2020, time_taken_in_seconds: 56
Epoch [1/1], Step [8670/13804], Loss: 2.5373, Perplexity: 12.6455, time_taken_in_seconds: 57
Epoch [1/1], Step [8671/13804], Loss: 2.5000, Perplexity: 12.1826, time_taken_in_seconds: 58
Epoch [1/1], Step [8672/13804], Loss: 2.7067, Perplexity: 14.9803, time_taken_in_seconds: 59
Epoch [1/1], Step [8673/13804], Loss: 2.4432, Perplexity: 11.5101, time_taken_in_seconds: 60
Epoch [1/1], Step [8674/13804], Loss: 3.1456, Perplexity: 23.2338, time_taken_in_seconds: 61
Epoch [1/1], Step [8675/13804], Loss: 2.5281, Perplexity: 12.5293, time_taken_in_seconds: 61
Epoch [1/1], Step [8676/13804], Loss: 2.4221, Perplexity: 11.2693, time_taken_in_seconds: 62
Epoch [1/1], Step [8677/13804], Loss: 2.7649, Perplexity: 15.8782, time_taken_in_seconds: 63
Epoch [1/1], Step [8678/13804], Loss: 2.1196, Perplexity: 8.3281, time_taken_in_seconds: 64
Epoch [1/1], Step [8679/13804], Loss: 2.5759, Perplexity: 13.1433, time_taken_in_seconds: 65
Epoch [1/1], Step [8680/13804], Loss: 3.5022, Perplexity: 33.1898, time_taken_in_seconds: 66
Epoch [1/1], Step [8681/13804], Loss: 2.9485, Perplexity: 19.0777, time_taken_in_seconds: 66
Epoch [1/1], Step [8682/13804], Loss: 2.5108, Perplexity: 12.3149, time_taken_in_seconds: 67
Epoch [1/1], Step [8683/13804], Loss: 2.6080, Perplexity: 13.5721, time_taken_in_seconds: 68
Epoch [1/1], Step [8684/13804], Loss: 2.5355, Perplexity: 12.6225, time_taken_in_seconds: 69
Epoch [1/1], Step [8685/13804], Loss: 3.0772, Perplexity: 21.6975, time_taken_in_seconds: 70
Epoch [1/1], Step [8686/13804], Loss: 2.6001, Perplexity: 13.4646, time_taken_in_seconds: 71
Epoch [1/1], Step [8687/13804], Loss: 2.5042, Perplexity: 12.2337, time_taken_in_seconds: 71
Epoch [1/1], Step [8688/13804], Loss: 2.1487, Perplexity: 8.5741, time_taken_in_seconds: 72
Epoch [1/1], Step [8689/13804], Loss: 2.7699, Perplexity: 15.9576, time_taken_in_seconds: 73
Epoch [1/1], Step [8690/13804], Loss: 2.4181, Perplexity: 11.2246, time_taken_in_seconds: 74
Epoch [1/1], Step [8691/13804], Loss: 2.5547, Perplexity: 12.8677, time_taken_in_seconds: 75
Epoch [1/1], Step [8692/13804], Loss: 2.3685, Perplexity: 10.6812, time_taken_in_seconds: 76
Epoch [1/1], Step [8693/13804], Loss: 2.4959, Perplexity: 12.1332, time_taken_in_seconds: 76
Epoch [1/1], Step [8694/13804], Loss: 2.6841, Perplexity: 14.6447, time_taken_in_seconds: 77
Epoch [1/1], Step [8695/13804], Loss: 3.1098, Perplexity: 22.4164, time_taken_in_seconds: 78
Epoch [1/1], Step [8696/13804], Loss: 2.4709, Perplexity: 11.8334, time_taken_in_seconds: 79
Epoch [1/1], Step [8697/13804], Loss: 2.4907, Perplexity: 12.0702, time_taken_in_seconds: 80
Epoch [1/1], Step [8698/13804], Loss: 2.9706, Perplexity: 19.5029, time_taken_in_seconds: 81
Epoch [1/1], Step [8699/13804], Loss: 2.5339, Perplexity: 12.6030, time_taken_in_seconds: 81
Epoch [1/1], Step [8700/13804], Loss: 2.4477, Perplexity: 11.5613, time_taken_in_seconds: 82
Epoch [1/1], Step [8701/13804], Loss: 2.5403, Perplexity: 12.6839, time_taken_in_seconds: 0
Epoch [1/1], Step [8702/13804], Loss: 2.4571, Perplexity: 11.6715, time_taken_in_seconds: 1
Epoch [1/1], Step [8703/13804], Loss: 2.9316, Perplexity: 18.7574, time_taken_in_seconds: 2
Epoch [1/1], Step [8704/13804], Loss: 2.8981, Perplexity: 18.1401, time_taken_in_seconds: 3
Epoch [1/1], Step [8705/13804], Loss: 2.4843, Perplexity: 11.9924, time_taken_in_seconds: 4
Epoch [1/1], Step [8706/13804], Loss: 2.6217, Perplexity: 13.7585, time_taken_in_seconds: 4
Epoch [1/1], Step [8707/13804], Loss: 2.4406, Perplexity: 11.4802, time_taken_in_seconds: 5
Epoch [1/1], Step [8708/13804], Loss: 2.1714, Perplexity: 8.7707, time_taken_in_seconds: 6
Epoch [1/1], Step [8709/13804], Loss: 2.9667, Perplexity: 19.4277, time_taken_in_seconds: 7
Epoch [1/1], Step [8710/13804], Loss: 2.8555, Perplexity: 17.3824, time_taken_in_seconds: 8
Epoch [1/1], Step [8711/13804], Loss: 2.5224, Perplexity: 12.4586, time_taken_in_seconds: 9
Epoch [1/1], Step [8712/13804], Loss: 2.9774, Perplexity: 19.6362, time_taken_in_seconds: 9
Epoch [1/1], Step [8713/13804], Loss: 2.4410, Perplexity: 11.4847, time_taken_in_seconds: 10
Epoch [1/1], Step [8714/13804], Loss: 2.6591, Perplexity: 14.2838, time_taken_in_seconds: 11
Epoch [1/1], Step [8715/13804], Loss: 3.1765, Perplexity: 23.9638, time_taken_in_seconds: 12
Epoch [1/1], Step [8716/13804], Loss: 2.5710, Perplexity: 13.0787, time_taken_in_seconds: 13
Epoch [1/1], Step [8717/13804], Loss: 2.6811, Perplexity: 14.6011, time_taken_in_seconds: 14
Epoch [1/1], Step [8718/13804], Loss: 2.6368, Perplexity: 13.9691, time_taken_in_seconds: 14
Epoch [1/1], Step [8719/13804], Loss: 2.2414, Perplexity: 9.4062, time_taken_in_seconds: 15
Epoch [1/1], Step [8720/13804], Loss: 3.0976, Perplexity: 22.1439, time_taken_in_seconds: 16
Epoch [1/1], Step [8721/13804], Loss: 2.5677, Perplexity: 13.0353, time_taken_in_seconds: 17
Epoch [1/1], Step [8722/13804], Loss: 2.8078, Perplexity: 16.5731, time_taken_in_seconds: 18
Epoch [1/1], Step [8723/13804], Loss: 2.1875, Perplexity: 8.9130, time_taken_in_seconds: 19
Epoch [1/1], Step [8724/13804], Loss: 2.3293, Perplexity: 10.2712, time_taken_in_seconds: 19
Epoch [1/1], Step [8725/13804], Loss: 2.4214, Perplexity: 11.2617, time_taken_in_seconds: 20
Epoch [1/1], Step [8726/13804], Loss: 2.6174, Perplexity: 13.6999, time_taken_in_seconds: 21
Epoch [1/1], Step [8727/13804], Loss: 2.5852, Perplexity: 13.2659, time_taken_in_seconds: 22
Epoch [1/1], Step [8728/13804], Loss: 2.6736, Perplexity: 14.4926, time_taken_in_seconds: 23
Epoch [1/1], Step [8729/13804], Loss: 2.4173, Perplexity: 11.2156, time_taken_in_seconds: 24
Epoch [1/1], Step [8730/13804], Loss: 2.7226, Perplexity: 15.2199, time_taken_in_seconds: 24
Epoch [1/1], Step [8731/13804], Loss: 2.7211, Perplexity: 15.1977, time_taken_in_seconds: 25
Epoch [1/1], Step [8732/13804], Loss: 2.7653, Perplexity: 15.8843, time_taken_in_seconds: 26
Epoch [1/1], Step [8733/13804], Loss: 2.3127, Perplexity: 10.1013, time_taken_in_seconds: 27
Epoch [1/1], Step [8734/13804], Loss: 2.4174, Perplexity: 11.2171, time_taken_in_seconds: 28
Epoch [1/1], Step [8735/13804], Loss: 2.7198, Perplexity: 15.1780, time_taken_in_seconds: 29
Epoch [1/1], Step [8736/13804], Loss: 2.5728, Perplexity: 13.1019, time_taken_in_seconds: 30
Epoch [1/1], Step [8737/13804], Loss: 2.9657, Perplexity: 19.4088, time_taken_in_seconds: 30
Epoch [1/1], Step [8738/13804], Loss: 2.7230, Perplexity: 15.2260, time_taken_in_seconds: 31
Epoch [1/1], Step [8739/13804], Loss: 2.9228, Perplexity: 18.5926, time_taken_in_seconds: 32
Epoch [1/1], Step [8740/13804], Loss: 2.3592, Perplexity: 10.5828, time_taken_in_seconds: 33
Epoch [1/1], Step [8741/13804], Loss: 2.4209, Perplexity: 11.2557, time_taken_in_seconds: 34
Epoch [1/1], Step [8742/13804], Loss: 2.5132, Perplexity: 12.3441, time_taken_in_seconds: 35
Epoch [1/1], Step [8743/13804], Loss: 2.6037, Perplexity: 13.5132, time_taken_in_seconds: 35
Epoch [1/1], Step [8744/13804], Loss: 2.6936, Perplexity: 14.7849, time_taken_in_seconds: 36
Epoch [1/1], Step [8745/13804], Loss: 2.6502, Perplexity: 14.1563, time_taken_in_seconds: 37
Epoch [1/1], Step [8746/13804], Loss: 2.7848, Perplexity: 16.1960, time_taken_in_seconds: 38
Epoch [1/1], Step [8747/13804], Loss: 2.4159, Perplexity: 11.1994, time_taken_in_seconds: 39
Epoch [1/1], Step [8748/13804], Loss: 2.8615, Perplexity: 17.4881, time_taken_in_seconds: 40
Epoch [1/1], Step [8749/13804], Loss: 2.4059, Perplexity: 11.0888, time_taken_in_seconds: 40
Epoch [1/1], Step [8750/13804], Loss: 2.5947, Perplexity: 13.3922, time_taken_in_seconds: 41
Epoch [1/1], Step [8751/13804], Loss: 2.6397, Perplexity: 14.0088, time_taken_in_seconds: 42
Epoch [1/1], Step [8752/13804], Loss: 2.9142, Perplexity: 18.4335, time_taken_in_seconds: 43
Epoch [1/1], Step [8753/13804], Loss: 2.4634, Perplexity: 11.7449, time_taken_in_seconds: 44
Epoch [1/1], Step [8754/13804], Loss: 2.6023, Perplexity: 13.4948, time_taken_in_seconds: 45
Epoch [1/1], Step [8755/13804], Loss: 2.7592, Perplexity: 15.7871, time_taken_in_seconds: 45
Epoch [1/1], Step [8756/13804], Loss: 2.5842, Perplexity: 13.2522, time_taken_in_seconds: 46
Epoch [1/1], Step [8757/13804], Loss: 2.5530, Perplexity: 12.8456, time_taken_in_seconds: 47
Epoch [1/1], Step [8758/13804], Loss: 3.4352, Perplexity: 31.0367, time_taken_in_seconds: 48
Epoch [1/1], Step [8759/13804], Loss: 2.5621, Perplexity: 12.9631, time_taken_in_seconds: 49
Epoch [1/1], Step [8760/13804], Loss: 2.9655, Perplexity: 19.4041, time_taken_in_seconds: 49
Epoch [1/1], Step [8761/13804], Loss: 2.4328, Perplexity: 11.3905, time_taken_in_seconds: 50
Epoch [1/1], Step [8762/13804], Loss: 2.6021, Perplexity: 13.4924, time_taken_in_seconds: 51
Epoch [1/1], Step [8763/13804], Loss: 3.5288, Perplexity: 34.0822, time_taken_in_seconds: 52
Epoch [1/1], Step [8764/13804], Loss: 2.2643, Perplexity: 9.6249, time_taken_in_seconds: 53
Epoch [1/1], Step [8765/13804], Loss: 2.7995, Perplexity: 16.4371, time_taken_in_seconds: 54
Epoch [1/1], Step [8766/13804], Loss: 2.3993, Perplexity: 11.0155, time_taken_in_seconds: 54
Epoch [1/1], Step [8767/13804], Loss: 2.5642, Perplexity: 12.9909, time_taken_in_seconds: 55
Epoch [1/1], Step [8768/13804], Loss: 2.7239, Perplexity: 15.2393, time_taken_in_seconds: 56
Epoch [1/1], Step [8769/13804], Loss: 2.4180, Perplexity: 11.2230, time_taken_in_seconds: 57
Epoch [1/1], Step [8770/13804], Loss: 2.6733, Perplexity: 14.4876, time_taken_in_seconds: 58
Epoch [1/1], Step [8771/13804], Loss: 2.7157, Perplexity: 15.1149, time_taken_in_seconds: 58
Epoch [1/1], Step [8772/13804], Loss: 2.9898, Perplexity: 19.8819, time_taken_in_seconds: 59
Epoch [1/1], Step [8773/13804], Loss: 2.6233, Perplexity: 13.7816, time_taken_in_seconds: 60
Epoch [1/1], Step [8774/13804], Loss: 2.4875, Perplexity: 12.0315, time_taken_in_seconds: 61
Epoch [1/1], Step [8775/13804], Loss: 3.1517, Perplexity: 23.3760, time_taken_in_seconds: 62
Epoch [1/1], Step [8776/13804], Loss: 2.5878, Perplexity: 13.3003, time_taken_in_seconds: 63
Epoch [1/1], Step [8777/13804], Loss: 2.5642, Perplexity: 12.9904, time_taken_in_seconds: 63
Epoch [1/1], Step [8778/13804], Loss: 2.4914, Perplexity: 12.0786, time_taken_in_seconds: 64
Epoch [1/1], Step [8779/13804], Loss: 2.6200, Perplexity: 13.7358, time_taken_in_seconds: 65
Epoch [1/1], Step [8780/13804], Loss: 2.4388, Perplexity: 11.4596, time_taken_in_seconds: 66
Epoch [1/1], Step [8781/13804], Loss: 2.5654, Perplexity: 13.0056, time_taken_in_seconds: 67
Epoch [1/1], Step [8782/13804], Loss: 2.8127, Perplexity: 16.6554, time_taken_in_seconds: 67
Epoch [1/1], Step [8783/13804], Loss: 2.2251, Perplexity: 9.2548, time_taken_in_seconds: 68
Epoch [1/1], Step [8784/13804], Loss: 2.3118, Perplexity: 10.0922, time_taken_in_seconds: 69
Epoch [1/1], Step [8785/13804], Loss: 2.6584, Perplexity: 14.2735, time_taken_in_seconds: 70
Epoch [1/1], Step [8786/13804], Loss: 2.6905, Perplexity: 14.7397, time_taken_in_seconds: 71
Epoch [1/1], Step [8787/13804], Loss: 2.6720, Perplexity: 14.4686, time_taken_in_seconds: 72
Epoch [1/1], Step [8788/13804], Loss: 2.5202, Perplexity: 12.4311, time_taken_in_seconds: 72
Epoch [1/1], Step [8789/13804], Loss: 2.3659, Perplexity: 10.6535, time_taken_in_seconds: 73
Epoch [1/1], Step [8790/13804], Loss: 2.5752, Perplexity: 13.1339, time_taken_in_seconds: 74
Epoch [1/1], Step [8791/13804], Loss: 2.4115, Perplexity: 11.1505, time_taken_in_seconds: 75
Epoch [1/1], Step [8792/13804], Loss: 2.7988, Perplexity: 16.4244, time_taken_in_seconds: 76
Epoch [1/1], Step [8793/13804], Loss: 2.6329, Perplexity: 13.9145, time_taken_in_seconds: 76
Epoch [1/1], Step [8794/13804], Loss: 2.7661, Perplexity: 15.8970, time_taken_in_seconds: 77
Epoch [1/1], Step [8795/13804], Loss: 2.3916, Perplexity: 10.9314, time_taken_in_seconds: 78
Epoch [1/1], Step [8796/13804], Loss: 2.4003, Perplexity: 11.0266, time_taken_in_seconds: 79
Epoch [1/1], Step [8797/13804], Loss: 2.6676, Perplexity: 14.4053, time_taken_in_seconds: 80
Epoch [1/1], Step [8798/13804], Loss: 2.4701, Perplexity: 11.8236, time_taken_in_seconds: 81
Epoch [1/1], Step [8799/13804], Loss: 2.6267, Perplexity: 13.8279, time_taken_in_seconds: 81
Epoch [1/1], Step [8800/13804], Loss: 2.7173, Perplexity: 15.1394, time_taken_in_seconds: 82
Epoch [1/1], Step [8801/13804], Loss: 3.0462, Perplexity: 21.0349, time_taken_in_seconds: 0
Epoch [1/1], Step [8802/13804], Loss: 2.3101, Perplexity: 10.0758, time_taken_in_seconds: 1
Epoch [1/1], Step [8803/13804], Loss: 2.5770, Perplexity: 13.1582, time_taken_in_seconds: 2
Epoch [1/1], Step [8804/13804], Loss: 3.0848, Perplexity: 21.8622, time_taken_in_seconds: 3
Epoch [1/1], Step [8805/13804], Loss: 2.5863, Perplexity: 13.2810, time_taken_in_seconds: 4
Epoch [1/1], Step [8806/13804], Loss: 2.7724, Perplexity: 15.9970, time_taken_in_seconds: 5
Epoch [1/1], Step [8807/13804], Loss: 2.2842, Perplexity: 9.8182, time_taken_in_seconds: 5
Epoch [1/1], Step [8808/13804], Loss: 2.4317, Perplexity: 11.3781, time_taken_in_seconds: 6
Epoch [1/1], Step [8809/13804], Loss: 2.6151, Perplexity: 13.6680, time_taken_in_seconds: 7
Epoch [1/1], Step [8810/13804], Loss: 2.4577, Perplexity: 11.6781, time_taken_in_seconds: 8
Epoch [1/1], Step [8811/13804], Loss: 2.8217, Perplexity: 16.8054, time_taken_in_seconds: 9
Epoch [1/1], Step [8812/13804], Loss: 2.6005, Perplexity: 13.4708, time_taken_in_seconds: 10
Epoch [1/1], Step [8813/13804], Loss: 3.0033, Perplexity: 20.1529, time_taken_in_seconds: 10
Epoch [1/1], Step [8814/13804], Loss: 2.4719, Perplexity: 11.8447, time_taken_in_seconds: 11
Epoch [1/1], Step [8815/13804], Loss: 3.1490, Perplexity: 23.3136, time_taken_in_seconds: 12
Epoch [1/1], Step [8816/13804], Loss: 3.2470, Perplexity: 25.7130, time_taken_in_seconds: 13
Epoch [1/1], Step [8817/13804], Loss: 2.3874, Perplexity: 10.8848, time_taken_in_seconds: 14
Epoch [1/1], Step [8818/13804], Loss: 2.6913, Perplexity: 14.7502, time_taken_in_seconds: 14
Epoch [1/1], Step [8819/13804], Loss: 2.6827, Perplexity: 14.6252, time_taken_in_seconds: 15
Epoch [1/1], Step [8820/13804], Loss: 3.0413, Perplexity: 20.9324, time_taken_in_seconds: 16
Epoch [1/1], Step [8821/13804], Loss: 2.9822, Perplexity: 19.7304, time_taken_in_seconds: 17
Epoch [1/1], Step [8822/13804], Loss: 2.7622, Perplexity: 15.8339, time_taken_in_seconds: 18
Epoch [1/1], Step [8823/13804], Loss: 2.9600, Perplexity: 19.2974, time_taken_in_seconds: 19
Epoch [1/1], Step [8824/13804], Loss: 2.6464, Perplexity: 14.1028, time_taken_in_seconds: 19
Epoch [1/1], Step [8825/13804], Loss: 2.6925, Perplexity: 14.7678, time_taken_in_seconds: 20
Epoch [1/1], Step [8826/13804], Loss: 2.7870, Perplexity: 16.2328, time_taken_in_seconds: 21
Epoch [1/1], Step [8827/13804], Loss: 2.5366, Perplexity: 12.6367, time_taken_in_seconds: 22
Epoch [1/1], Step [8828/13804], Loss: 3.0962, Perplexity: 22.1129, time_taken_in_seconds: 23
Epoch [1/1], Step [8829/13804], Loss: 2.3127, Perplexity: 10.1018, time_taken_in_seconds: 24
Epoch [1/1], Step [8830/13804], Loss: 2.3414, Perplexity: 10.3953, time_taken_in_seconds: 24
Epoch [1/1], Step [8831/13804], Loss: 2.7494, Perplexity: 15.6333, time_taken_in_seconds: 25
Epoch [1/1], Step [8832/13804], Loss: 2.4751, Perplexity: 11.8830, time_taken_in_seconds: 26
Epoch [1/1], Step [8833/13804], Loss: 2.5820, Perplexity: 13.2240, time_taken_in_seconds: 27
Epoch [1/1], Step [8834/13804], Loss: 2.6303, Perplexity: 13.8786, time_taken_in_seconds: 28
Epoch [1/1], Step [8835/13804], Loss: 2.9392, Perplexity: 18.9011, time_taken_in_seconds: 28
Epoch [1/1], Step [8836/13804], Loss: 2.4265, Perplexity: 11.3198, time_taken_in_seconds: 29
Epoch [1/1], Step [8837/13804], Loss: 2.7756, Perplexity: 16.0484, time_taken_in_seconds: 30
Epoch [1/1], Step [8838/13804], Loss: 2.4785, Perplexity: 11.9229, time_taken_in_seconds: 31
Epoch [1/1], Step [8839/13804], Loss: 2.3554, Perplexity: 10.5424, time_taken_in_seconds: 32
Epoch [1/1], Step [8840/13804], Loss: 2.5931, Perplexity: 13.3712, time_taken_in_seconds: 33
Epoch [1/1], Step [8841/13804], Loss: 2.7886, Perplexity: 16.2581, time_taken_in_seconds: 33
Epoch [1/1], Step [8842/13804], Loss: 2.9840, Perplexity: 19.7665, time_taken_in_seconds: 34
Epoch [1/1], Step [8843/13804], Loss: 2.4612, Perplexity: 11.7191, time_taken_in_seconds: 35
Epoch [1/1], Step [8844/13804], Loss: 2.4736, Perplexity: 11.8650, time_taken_in_seconds: 36
Epoch [1/1], Step [8845/13804], Loss: 2.5868, Perplexity: 13.2868, time_taken_in_seconds: 37
Epoch [1/1], Step [8846/13804], Loss: 2.6026, Perplexity: 13.4995, time_taken_in_seconds: 38
Epoch [1/1], Step [8847/13804], Loss: 3.0231, Perplexity: 20.5550, time_taken_in_seconds: 38
Epoch [1/1], Step [8848/13804], Loss: 2.5797, Perplexity: 13.1933, time_taken_in_seconds: 39
Epoch [1/1], Step [8849/13804], Loss: 2.2275, Perplexity: 9.2768, time_taken_in_seconds: 40
Epoch [1/1], Step [8850/13804], Loss: 2.5889, Perplexity: 13.3154, time_taken_in_seconds: 41
Epoch [1/1], Step [8851/13804], Loss: 2.3833, Perplexity: 10.8402, time_taken_in_seconds: 42
Epoch [1/1], Step [8852/13804], Loss: 2.2192, Perplexity: 9.1997, time_taken_in_seconds: 42
Epoch [1/1], Step [8853/13804], Loss: 2.4580, Perplexity: 11.6810, time_taken_in_seconds: 43
Epoch [1/1], Step [8854/13804], Loss: 2.7459, Perplexity: 15.5791, time_taken_in_seconds: 44
Epoch [1/1], Step [8855/13804], Loss: 2.5781, Perplexity: 13.1716, time_taken_in_seconds: 45
Epoch [1/1], Step [8856/13804], Loss: 2.7264, Perplexity: 15.2779, time_taken_in_seconds: 46
Epoch [1/1], Step [8857/13804], Loss: 2.5091, Perplexity: 12.2934, time_taken_in_seconds: 47
Epoch [1/1], Step [8858/13804], Loss: 2.2100, Perplexity: 9.1155, time_taken_in_seconds: 47
Epoch [1/1], Step [8859/13804], Loss: 2.0581, Perplexity: 7.8310, time_taken_in_seconds: 48
Epoch [1/1], Step [8860/13804], Loss: 2.1310, Perplexity: 8.4234, time_taken_in_seconds: 49
Epoch [1/1], Step [8861/13804], Loss: 2.4344, Perplexity: 11.4093, time_taken_in_seconds: 50
Epoch [1/1], Step [8862/13804], Loss: 2.8163, Perplexity: 16.7155, time_taken_in_seconds: 51
Epoch [1/1], Step [8863/13804], Loss: 2.5618, Perplexity: 12.9585, time_taken_in_seconds: 52
Epoch [1/1], Step [8864/13804], Loss: 3.1860, Perplexity: 24.1918, time_taken_in_seconds: 52
Epoch [1/1], Step [8865/13804], Loss: 2.2939, Perplexity: 9.9136, time_taken_in_seconds: 53
Epoch [1/1], Step [8866/13804], Loss: 2.4069, Perplexity: 11.0994, time_taken_in_seconds: 54
Epoch [1/1], Step [8867/13804], Loss: 2.3859, Perplexity: 10.8693, time_taken_in_seconds: 55
Epoch [1/1], Step [8868/13804], Loss: 2.7594, Perplexity: 15.7909, time_taken_in_seconds: 56
Epoch [1/1], Step [8869/13804], Loss: 2.8103, Perplexity: 16.6144, time_taken_in_seconds: 56
Epoch [1/1], Step [8870/13804], Loss: 2.6189, Perplexity: 13.7211, time_taken_in_seconds: 57
Epoch [1/1], Step [8871/13804], Loss: 2.3881, Perplexity: 10.8931, time_taken_in_seconds: 58
Epoch [1/1], Step [8872/13804], Loss: 2.2647, Perplexity: 9.6283, time_taken_in_seconds: 59
Epoch [1/1], Step [8873/13804], Loss: 2.8143, Perplexity: 16.6811, time_taken_in_seconds: 60
Epoch [1/1], Step [8874/13804], Loss: 2.6887, Perplexity: 14.7132, time_taken_in_seconds: 61
Epoch [1/1], Step [8875/13804], Loss: 2.6801, Perplexity: 14.5861, time_taken_in_seconds: 61
Epoch [1/1], Step [8876/13804], Loss: 2.3473, Perplexity: 10.4569, time_taken_in_seconds: 62
Epoch [1/1], Step [8877/13804], Loss: 2.7842, Perplexity: 16.1867, time_taken_in_seconds: 63
Epoch [1/1], Step [8878/13804], Loss: 2.2886, Perplexity: 9.8615, time_taken_in_seconds: 64
Epoch [1/1], Step [8879/13804], Loss: 2.6870, Perplexity: 14.6869, time_taken_in_seconds: 65
Epoch [1/1], Step [8880/13804], Loss: 2.5426, Perplexity: 12.7130, time_taken_in_seconds: 66
Epoch [1/1], Step [8881/13804], Loss: 2.4750, Perplexity: 11.8814, time_taken_in_seconds: 67
Epoch [1/1], Step [8882/13804], Loss: 2.4280, Perplexity: 11.3358, time_taken_in_seconds: 68
Epoch [1/1], Step [8883/13804], Loss: 2.3956, Perplexity: 10.9745, time_taken_in_seconds: 68
Epoch [1/1], Step [8884/13804], Loss: 2.4977, Perplexity: 12.1545, time_taken_in_seconds: 69
Epoch [1/1], Step [8885/13804], Loss: 2.6680, Perplexity: 14.4118, time_taken_in_seconds: 70
Epoch [1/1], Step [8886/13804], Loss: 2.7440, Perplexity: 15.5496, time_taken_in_seconds: 71
Epoch [1/1], Step [8887/13804], Loss: 2.3396, Perplexity: 10.3774, time_taken_in_seconds: 72
Epoch [1/1], Step [8888/13804], Loss: 2.7243, Perplexity: 15.2462, time_taken_in_seconds: 72
Epoch [1/1], Step [8889/13804], Loss: 2.8090, Perplexity: 16.5934, time_taken_in_seconds: 73
Epoch [1/1], Step [8890/13804], Loss: 2.5850, Perplexity: 13.2631, time_taken_in_seconds: 74
Epoch [1/1], Step [8891/13804], Loss: 2.2976, Perplexity: 9.9498, time_taken_in_seconds: 75
Epoch [1/1], Step [8892/13804], Loss: 2.4416, Perplexity: 11.4912, time_taken_in_seconds: 76
Epoch [1/1], Step [8893/13804], Loss: 2.6651, Perplexity: 14.3687, time_taken_in_seconds: 77
Epoch [1/1], Step [8894/13804], Loss: 2.7076, Perplexity: 14.9927, time_taken_in_seconds: 77
Epoch [1/1], Step [8895/13804], Loss: 2.8465, Perplexity: 17.2273, time_taken_in_seconds: 78
Epoch [1/1], Step [8896/13804], Loss: 2.6539, Perplexity: 14.2097, time_taken_in_seconds: 79
Epoch [1/1], Step [8897/13804], Loss: 2.7486, Perplexity: 15.6211, time_taken_in_seconds: 80
Epoch [1/1], Step [8898/13804], Loss: 2.8175, Perplexity: 16.7357, time_taken_in_seconds: 81
Epoch [1/1], Step [8899/13804], Loss: 2.9035, Perplexity: 18.2374, time_taken_in_seconds: 82
Epoch [1/1], Step [8900/13804], Loss: 2.4680, Perplexity: 11.7990, time_taken_in_seconds: 82
Epoch [1/1], Step [8901/13804], Loss: 2.3122, Perplexity: 10.0970, time_taken_in_seconds: 0
Epoch [1/1], Step [8902/13804], Loss: 2.5789, Perplexity: 13.1825, time_taken_in_seconds: 1
Epoch [1/1], Step [8903/13804], Loss: 2.1539, Perplexity: 8.6181, time_taken_in_seconds: 2
Epoch [1/1], Step [8904/13804], Loss: 2.5945, Perplexity: 13.3893, time_taken_in_seconds: 3
Epoch [1/1], Step [8905/13804], Loss: 2.4668, Perplexity: 11.7850, time_taken_in_seconds: 4
Epoch [1/1], Step [8906/13804], Loss: 2.5682, Perplexity: 13.0422, time_taken_in_seconds: 4
Epoch [1/1], Step [8907/13804], Loss: 2.3374, Perplexity: 10.3541, time_taken_in_seconds: 5
Epoch [1/1], Step [8908/13804], Loss: 3.4974, Perplexity: 33.0310, time_taken_in_seconds: 6
Epoch [1/1], Step [8909/13804], Loss: 2.8510, Perplexity: 17.3045, time_taken_in_seconds: 7
Epoch [1/1], Step [8910/13804], Loss: 2.3272, Perplexity: 10.2491, time_taken_in_seconds: 8
Epoch [1/1], Step [8911/13804], Loss: 3.1840, Perplexity: 24.1436, time_taken_in_seconds: 9
Epoch [1/1], Step [8912/13804], Loss: 2.5580, Perplexity: 12.9100, time_taken_in_seconds: 9
Epoch [1/1], Step [8913/13804], Loss: 2.5979, Perplexity: 13.4355, time_taken_in_seconds: 10
Epoch [1/1], Step [8914/13804], Loss: 2.5027, Perplexity: 12.2160, time_taken_in_seconds: 11
Epoch [1/1], Step [8915/13804], Loss: 2.6869, Perplexity: 14.6854, time_taken_in_seconds: 12
Epoch [1/1], Step [8916/13804], Loss: 3.1034, Perplexity: 22.2742, time_taken_in_seconds: 13
Epoch [1/1], Step [8917/13804], Loss: 2.6760, Perplexity: 14.5275, time_taken_in_seconds: 13
Epoch [1/1], Step [8918/13804], Loss: 2.8330, Perplexity: 16.9960, time_taken_in_seconds: 14
Epoch [1/1], Step [8919/13804], Loss: 2.5277, Perplexity: 12.5247, time_taken_in_seconds: 15
Epoch [1/1], Step [8920/13804], Loss: 2.8129, Perplexity: 16.6578, time_taken_in_seconds: 16
Epoch [1/1], Step [8921/13804], Loss: 2.7602, Perplexity: 15.8033, time_taken_in_seconds: 17
Epoch [1/1], Step [8922/13804], Loss: 2.5803, Perplexity: 13.2018, time_taken_in_seconds: 18
Epoch [1/1], Step [8923/13804], Loss: 2.5292, Perplexity: 12.5441, time_taken_in_seconds: 18
Epoch [1/1], Step [8924/13804], Loss: 2.7524, Perplexity: 15.6808, time_taken_in_seconds: 19
Epoch [1/1], Step [8925/13804], Loss: 2.8632, Perplexity: 17.5176, time_taken_in_seconds: 20
Epoch [1/1], Step [8926/13804], Loss: 3.3664, Perplexity: 28.9726, time_taken_in_seconds: 21
Epoch [1/1], Step [8927/13804], Loss: 2.5818, Perplexity: 13.2213, time_taken_in_seconds: 22
Epoch [1/1], Step [8928/13804], Loss: 2.8938, Perplexity: 18.0624, time_taken_in_seconds: 22
Epoch [1/1], Step [8929/13804], Loss: 2.4438, Perplexity: 11.5172, time_taken_in_seconds: 23
Epoch [1/1], Step [8930/13804], Loss: 2.8763, Perplexity: 17.7477, time_taken_in_seconds: 24
Epoch [1/1], Step [8931/13804], Loss: 3.0226, Perplexity: 20.5437, time_taken_in_seconds: 25
Epoch [1/1], Step [8932/13804], Loss: 2.8371, Perplexity: 17.0666, time_taken_in_seconds: 26
Epoch [1/1], Step [8933/13804], Loss: 2.4952, Perplexity: 12.1238, time_taken_in_seconds: 27
Epoch [1/1], Step [8934/13804], Loss: 2.6081, Perplexity: 13.5736, time_taken_in_seconds: 27
Epoch [1/1], Step [8935/13804], Loss: 2.9644, Perplexity: 19.3833, time_taken_in_seconds: 28
Epoch [1/1], Step [8936/13804], Loss: 2.7452, Perplexity: 15.5674, time_taken_in_seconds: 29
Epoch [1/1], Step [8937/13804], Loss: 2.6944, Perplexity: 14.7970, time_taken_in_seconds: 30
Epoch [1/1], Step [8938/13804], Loss: 2.4671, Perplexity: 11.7886, time_taken_in_seconds: 31
Epoch [1/1], Step [8939/13804], Loss: 2.6987, Perplexity: 14.8608, time_taken_in_seconds: 32
Epoch [1/1], Step [8940/13804], Loss: 2.2474, Perplexity: 9.4627, time_taken_in_seconds: 32
Epoch [1/1], Step [8941/13804], Loss: 3.4083, Perplexity: 30.2133, time_taken_in_seconds: 33
Epoch [1/1], Step [8942/13804], Loss: 2.6123, Perplexity: 13.6303, time_taken_in_seconds: 34
Epoch [1/1], Step [8943/13804], Loss: 2.7356, Perplexity: 15.4194, time_taken_in_seconds: 35
Epoch [1/1], Step [8944/13804], Loss: 2.3426, Perplexity: 10.4082, time_taken_in_seconds: 36
Epoch [1/1], Step [8945/13804], Loss: 2.7607, Perplexity: 15.8107, time_taken_in_seconds: 36
Epoch [1/1], Step [8946/13804], Loss: 2.8730, Perplexity: 17.6900, time_taken_in_seconds: 37
Epoch [1/1], Step [8947/13804], Loss: 2.8701, Perplexity: 17.6387, time_taken_in_seconds: 38
Epoch [1/1], Step [8948/13804], Loss: 2.4711, Perplexity: 11.8353, time_taken_in_seconds: 39
Epoch [1/1], Step [8949/13804], Loss: 2.3534, Perplexity: 10.5212, time_taken_in_seconds: 40
Epoch [1/1], Step [8950/13804], Loss: 2.8956, Perplexity: 18.0949, time_taken_in_seconds: 41
Epoch [1/1], Step [8951/13804], Loss: 3.2342, Perplexity: 25.3865, time_taken_in_seconds: 42
Epoch [1/1], Step [8952/13804], Loss: 2.3220, Perplexity: 10.1958, time_taken_in_seconds: 43
Epoch [1/1], Step [8953/13804], Loss: 2.4276, Perplexity: 11.3319, time_taken_in_seconds: 43
Epoch [1/1], Step [8954/13804], Loss: 2.4506, Perplexity: 11.5949, time_taken_in_seconds: 44
Epoch [1/1], Step [8955/13804], Loss: 2.6224, Perplexity: 13.7687, time_taken_in_seconds: 45
Epoch [1/1], Step [8956/13804], Loss: 2.6760, Perplexity: 14.5263, time_taken_in_seconds: 46
Epoch [1/1], Step [8957/13804], Loss: 2.5262, Perplexity: 12.5062, time_taken_in_seconds: 47
Epoch [1/1], Step [8958/13804], Loss: 2.6979, Perplexity: 14.8484, time_taken_in_seconds: 48
Epoch [1/1], Step [8959/13804], Loss: 2.7058, Perplexity: 14.9665, time_taken_in_seconds: 48
Epoch [1/1], Step [8960/13804], Loss: 2.4378, Perplexity: 11.4473, time_taken_in_seconds: 49
Epoch [1/1], Step [8961/13804], Loss: 2.6801, Perplexity: 14.5871, time_taken_in_seconds: 50
Epoch [1/1], Step [8962/13804], Loss: 2.6524, Perplexity: 14.1876, time_taken_in_seconds: 51
Epoch [1/1], Step [8963/13804], Loss: 2.9213, Perplexity: 18.5660, time_taken_in_seconds: 52
Epoch [1/1], Step [8964/13804], Loss: 2.2938, Perplexity: 9.9128, time_taken_in_seconds: 52
Epoch [1/1], Step [8965/13804], Loss: 2.6811, Perplexity: 14.6007, time_taken_in_seconds: 53
Epoch [1/1], Step [8966/13804], Loss: 2.5996, Perplexity: 13.4582, time_taken_in_seconds: 54
Epoch [1/1], Step [8967/13804], Loss: 2.5236, Perplexity: 12.4733, time_taken_in_seconds: 55
Epoch [1/1], Step [8968/13804], Loss: 2.3980, Perplexity: 11.0009, time_taken_in_seconds: 56
Epoch [1/1], Step [8969/13804], Loss: 2.6127, Perplexity: 13.6355, time_taken_in_seconds: 57
Epoch [1/1], Step [8970/13804], Loss: 2.4812, Perplexity: 11.9551, time_taken_in_seconds: 57
Epoch [1/1], Step [8971/13804], Loss: 2.7700, Perplexity: 15.9586, time_taken_in_seconds: 58
Epoch [1/1], Step [8972/13804], Loss: 2.9667, Perplexity: 19.4271, time_taken_in_seconds: 59
Epoch [1/1], Step [8973/13804], Loss: 2.4917, Perplexity: 12.0820, time_taken_in_seconds: 60
Epoch [1/1], Step [8974/13804], Loss: 2.6512, Perplexity: 14.1715, time_taken_in_seconds: 61
Epoch [1/1], Step [8975/13804], Loss: 3.5656, Perplexity: 35.3618, time_taken_in_seconds: 62
Epoch [1/1], Step [8976/13804], Loss: 2.6337, Perplexity: 13.9248, time_taken_in_seconds: 62
Epoch [1/1], Step [8977/13804], Loss: 2.7556, Perplexity: 15.7308, time_taken_in_seconds: 63
Epoch [1/1], Step [8978/13804], Loss: 2.5180, Perplexity: 12.4036, time_taken_in_seconds: 64
Epoch [1/1], Step [8979/13804], Loss: 2.4725, Perplexity: 11.8517, time_taken_in_seconds: 65
Epoch [1/1], Step [8980/13804], Loss: 2.5081, Perplexity: 12.2816, time_taken_in_seconds: 66
Epoch [1/1], Step [8981/13804], Loss: 2.3559, Perplexity: 10.5474, time_taken_in_seconds: 67
Epoch [1/1], Step [8982/13804], Loss: 2.7171, Perplexity: 15.1369, time_taken_in_seconds: 67
Epoch [1/1], Step [8983/13804], Loss: 2.6831, Perplexity: 14.6297, time_taken_in_seconds: 68
Epoch [1/1], Step [8984/13804], Loss: 3.1277, Perplexity: 22.8216, time_taken_in_seconds: 69
Epoch [1/1], Step [8985/13804], Loss: 2.5201, Perplexity: 12.4302, time_taken_in_seconds: 70
Epoch [1/1], Step [8986/13804], Loss: 3.4450, Perplexity: 31.3428, time_taken_in_seconds: 71
Epoch [1/1], Step [8987/13804], Loss: 3.1864, Perplexity: 24.2016, time_taken_in_seconds: 72
Epoch [1/1], Step [8988/13804], Loss: 2.4820, Perplexity: 11.9654, time_taken_in_seconds: 72
Epoch [1/1], Step [8989/13804], Loss: 2.4081, Perplexity: 11.1125, time_taken_in_seconds: 73
Epoch [1/1], Step [8990/13804], Loss: 2.6636, Perplexity: 14.3477, time_taken_in_seconds: 74
Epoch [1/1], Step [8991/13804], Loss: 2.4337, Perplexity: 11.4011, time_taken_in_seconds: 75
Epoch [1/1], Step [8992/13804], Loss: 2.7992, Perplexity: 16.4309, time_taken_in_seconds: 76
Epoch [1/1], Step [8993/13804], Loss: 2.4528, Perplexity: 11.6213, time_taken_in_seconds: 77
Epoch [1/1], Step [8994/13804], Loss: 2.4879, Perplexity: 12.0355, time_taken_in_seconds: 77
Epoch [1/1], Step [8995/13804], Loss: 2.6558, Perplexity: 14.2357, time_taken_in_seconds: 78
Epoch [1/1], Step [8996/13804], Loss: 2.5092, Perplexity: 12.2945, time_taken_in_seconds: 79
Epoch [1/1], Step [8997/13804], Loss: 2.9308, Perplexity: 18.7434, time_taken_in_seconds: 80
Epoch [1/1], Step [8998/13804], Loss: 2.3361, Perplexity: 10.3409, time_taken_in_seconds: 81
Epoch [1/1], Step [8999/13804], Loss: 2.6940, Perplexity: 14.7909, time_taken_in_seconds: 81
Epoch [1/1], Step [9000/13804], Loss: 2.3924, Perplexity: 10.9402, time_taken_in_seconds: 82
Epoch [1/1], Step [9001/13804], Loss: 2.4187, Perplexity: 11.2316, time_taken_in_seconds: 0
Epoch [1/1], Step [9002/13804], Loss: 2.5686, Perplexity: 13.0474, time_taken_in_seconds: 1
Epoch [1/1], Step [9003/13804], Loss: 2.9240, Perplexity: 18.6149, time_taken_in_seconds: 2
Epoch [1/1], Step [9004/13804], Loss: 2.5482, Perplexity: 12.7846, time_taken_in_seconds: 3
Epoch [1/1], Step [9005/13804], Loss: 2.6355, Perplexity: 13.9504, time_taken_in_seconds: 4
Epoch [1/1], Step [9006/13804], Loss: 2.6877, Perplexity: 14.6976, time_taken_in_seconds: 4
Epoch [1/1], Step [9007/13804], Loss: 2.7098, Perplexity: 15.0270, time_taken_in_seconds: 5
Epoch [1/1], Step [9008/13804], Loss: 2.6042, Perplexity: 13.5208, time_taken_in_seconds: 6
Epoch [1/1], Step [9009/13804], Loss: 2.9589, Perplexity: 19.2764, time_taken_in_seconds: 7
Epoch [1/1], Step [9010/13804], Loss: 2.4581, Perplexity: 11.6825, time_taken_in_seconds: 8
Epoch [1/1], Step [9011/13804], Loss: 2.9248, Perplexity: 18.6296, time_taken_in_seconds: 9
Epoch [1/1], Step [9012/13804], Loss: 2.6646, Perplexity: 14.3625, time_taken_in_seconds: 9
Epoch [1/1], Step [9013/13804], Loss: 2.5149, Perplexity: 12.3649, time_taken_in_seconds: 10
Epoch [1/1], Step [9014/13804], Loss: 2.5475, Perplexity: 12.7752, time_taken_in_seconds: 11
Epoch [1/1], Step [9015/13804], Loss: 2.4152, Perplexity: 11.1922, time_taken_in_seconds: 12
Epoch [1/1], Step [9016/13804], Loss: 2.4814, Perplexity: 11.9577, time_taken_in_seconds: 13
Epoch [1/1], Step [9017/13804], Loss: 2.6171, Perplexity: 13.6965, time_taken_in_seconds: 13
Epoch [1/1], Step [9018/13804], Loss: 2.7379, Perplexity: 15.4540, time_taken_in_seconds: 14
Epoch [1/1], Step [9019/13804], Loss: 2.9020, Perplexity: 18.2105, time_taken_in_seconds: 15
Epoch [1/1], Step [9020/13804], Loss: 2.4999, Perplexity: 12.1811, time_taken_in_seconds: 16
Epoch [1/1], Step [9021/13804], Loss: 2.6196, Perplexity: 13.7303, time_taken_in_seconds: 17
Epoch [1/1], Step [9022/13804], Loss: 2.4806, Perplexity: 11.9482, time_taken_in_seconds: 18
Epoch [1/1], Step [9023/13804], Loss: 2.7808, Perplexity: 16.1321, time_taken_in_seconds: 18
Epoch [1/1], Step [9024/13804], Loss: 2.3276, Perplexity: 10.2530, time_taken_in_seconds: 19
Epoch [1/1], Step [9025/13804], Loss: 2.7398, Perplexity: 15.4837, time_taken_in_seconds: 20
Epoch [1/1], Step [9026/13804], Loss: 2.6272, Perplexity: 13.8353, time_taken_in_seconds: 21
Epoch [1/1], Step [9027/13804], Loss: 2.6307, Perplexity: 13.8830, time_taken_in_seconds: 22
Epoch [1/1], Step [9028/13804], Loss: 2.6464, Perplexity: 14.1027, time_taken_in_seconds: 23
Epoch [1/1], Step [9029/13804], Loss: 2.7852, Perplexity: 16.2026, time_taken_in_seconds: 24
Epoch [1/1], Step [9030/13804], Loss: 2.5691, Perplexity: 13.0536, time_taken_in_seconds: 24
Epoch [1/1], Step [9031/13804], Loss: 2.4004, Perplexity: 11.0274, time_taken_in_seconds: 25
Epoch [1/1], Step [9032/13804], Loss: 2.2269, Perplexity: 9.2715, time_taken_in_seconds: 26
Epoch [1/1], Step [9033/13804], Loss: 2.4316, Perplexity: 11.3766, time_taken_in_seconds: 27
Epoch [1/1], Step [9034/13804], Loss: 2.3921, Perplexity: 10.9362, time_taken_in_seconds: 28
Epoch [1/1], Step [9035/13804], Loss: 2.3484, Perplexity: 10.4687, time_taken_in_seconds: 29
Epoch [1/1], Step [9036/13804], Loss: 2.4679, Perplexity: 11.7974, time_taken_in_seconds: 29
Epoch [1/1], Step [9037/13804], Loss: 2.5515, Perplexity: 12.8269, time_taken_in_seconds: 30
Epoch [1/1], Step [9038/13804], Loss: 2.4958, Perplexity: 12.1316, time_taken_in_seconds: 31
Epoch [1/1], Step [9039/13804], Loss: 2.4088, Perplexity: 11.1206, time_taken_in_seconds: 32
Epoch [1/1], Step [9040/13804], Loss: 2.3994, Perplexity: 11.0169, time_taken_in_seconds: 33
Epoch [1/1], Step [9041/13804], Loss: 2.4917, Perplexity: 12.0816, time_taken_in_seconds: 34
Epoch [1/1], Step [9042/13804], Loss: 2.2560, Perplexity: 9.5445, time_taken_in_seconds: 34
Epoch [1/1], Step [9043/13804], Loss: 2.5322, Perplexity: 12.5815, time_taken_in_seconds: 35
Epoch [1/1], Step [9044/13804], Loss: 2.4560, Perplexity: 11.6586, time_taken_in_seconds: 36
Epoch [1/1], Step [9045/13804], Loss: 2.6923, Perplexity: 14.7650, time_taken_in_seconds: 37
Epoch [1/1], Step [9046/13804], Loss: 2.8033, Perplexity: 16.4989, time_taken_in_seconds: 38
Epoch [1/1], Step [9047/13804], Loss: 2.5531, Perplexity: 12.8470, time_taken_in_seconds: 38
Epoch [1/1], Step [9048/13804], Loss: 2.5851, Perplexity: 13.2653, time_taken_in_seconds: 39
Epoch [1/1], Step [9049/13804], Loss: 2.6656, Perplexity: 14.3771, time_taken_in_seconds: 40
Epoch [1/1], Step [9050/13804], Loss: 2.4124, Perplexity: 11.1606, time_taken_in_seconds: 41
Epoch [1/1], Step [9051/13804], Loss: 2.5432, Perplexity: 12.7199, time_taken_in_seconds: 42
Epoch [1/1], Step [9052/13804], Loss: 2.5845, Perplexity: 13.2567, time_taken_in_seconds: 43
Epoch [1/1], Step [9053/13804], Loss: 2.4723, Perplexity: 11.8501, time_taken_in_seconds: 43
Epoch [1/1], Step [9054/13804], Loss: 2.5699, Perplexity: 13.0643, time_taken_in_seconds: 44
Epoch [1/1], Step [9055/13804], Loss: 2.6766, Perplexity: 14.5359, time_taken_in_seconds: 45
Epoch [1/1], Step [9056/13804], Loss: 2.4459, Perplexity: 11.5405, time_taken_in_seconds: 46
Epoch [1/1], Step [9057/13804], Loss: 2.5282, Perplexity: 12.5308, time_taken_in_seconds: 47
Epoch [1/1], Step [9058/13804], Loss: 2.4168, Perplexity: 11.2099, time_taken_in_seconds: 47
Epoch [1/1], Step [9059/13804], Loss: 2.4695, Perplexity: 11.8168, time_taken_in_seconds: 48
Epoch [1/1], Step [9060/13804], Loss: 2.4879, Perplexity: 12.0360, time_taken_in_seconds: 49
Epoch [1/1], Step [9061/13804], Loss: 2.5089, Perplexity: 12.2911, time_taken_in_seconds: 50
Epoch [1/1], Step [9062/13804], Loss: 2.4420, Perplexity: 11.4962, time_taken_in_seconds: 51
Epoch [1/1], Step [9063/13804], Loss: 2.7333, Perplexity: 15.3837, time_taken_in_seconds: 52
Epoch [1/1], Step [9064/13804], Loss: 2.3796, Perplexity: 10.8001, time_taken_in_seconds: 52
Epoch [1/1], Step [9065/13804], Loss: 2.8390, Perplexity: 17.0982, time_taken_in_seconds: 53
Epoch [1/1], Step [9066/13804], Loss: 2.4528, Perplexity: 11.6213, time_taken_in_seconds: 54
Epoch [1/1], Step [9067/13804], Loss: 2.5423, Perplexity: 12.7084, time_taken_in_seconds: 55
Epoch [1/1], Step [9068/13804], Loss: 2.5079, Perplexity: 12.2793, time_taken_in_seconds: 56
Epoch [1/1], Step [9069/13804], Loss: 2.5726, Perplexity: 13.0996, time_taken_in_seconds: 57
Epoch [1/1], Step [9070/13804], Loss: 2.6051, Perplexity: 13.5322, time_taken_in_seconds: 57
Epoch [1/1], Step [9071/13804], Loss: 3.2159, Perplexity: 24.9254, time_taken_in_seconds: 58
Epoch [1/1], Step [9072/13804], Loss: 2.6288, Perplexity: 13.8568, time_taken_in_seconds: 59
Epoch [1/1], Step [9073/13804], Loss: 2.8151, Perplexity: 16.6941, time_taken_in_seconds: 60
Epoch [1/1], Step [9074/13804], Loss: 2.8752, Perplexity: 17.7282, time_taken_in_seconds: 61
Epoch [1/1], Step [9075/13804], Loss: 2.4366, Perplexity: 11.4336, time_taken_in_seconds: 62
Epoch [1/1], Step [9076/13804], Loss: 2.4728, Perplexity: 11.8560, time_taken_in_seconds: 62
Epoch [1/1], Step [9077/13804], Loss: 2.4550, Perplexity: 11.6459, time_taken_in_seconds: 63
Epoch [1/1], Step [9078/13804], Loss: 2.6109, Perplexity: 13.6111, time_taken_in_seconds: 64
Epoch [1/1], Step [9079/13804], Loss: 2.7915, Perplexity: 16.3060, time_taken_in_seconds: 65
Epoch [1/1], Step [9080/13804], Loss: 2.8653, Perplexity: 17.5537, time_taken_in_seconds: 66
Epoch [1/1], Step [9081/13804], Loss: 2.7226, Perplexity: 15.2194, time_taken_in_seconds: 66
Epoch [1/1], Step [9082/13804], Loss: 2.7669, Perplexity: 15.9093, time_taken_in_seconds: 67
Epoch [1/1], Step [9083/13804], Loss: 3.0565, Perplexity: 21.2540, time_taken_in_seconds: 68
Epoch [1/1], Step [9084/13804], Loss: 2.3205, Perplexity: 10.1806, time_taken_in_seconds: 69
Epoch [1/1], Step [9085/13804], Loss: 2.5240, Perplexity: 12.4786, time_taken_in_seconds: 70
Epoch [1/1], Step [9086/13804], Loss: 2.4468, Perplexity: 11.5517, time_taken_in_seconds: 71
Epoch [1/1], Step [9087/13804], Loss: 2.5414, Perplexity: 12.6977, time_taken_in_seconds: 71
Epoch [1/1], Step [9088/13804], Loss: 2.6344, Perplexity: 13.9344, time_taken_in_seconds: 72
Epoch [1/1], Step [9089/13804], Loss: 2.1110, Perplexity: 8.2569, time_taken_in_seconds: 73
Epoch [1/1], Step [9090/13804], Loss: 2.9869, Perplexity: 19.8234, time_taken_in_seconds: 74
Epoch [1/1], Step [9091/13804], Loss: 2.5569, Perplexity: 12.8954, time_taken_in_seconds: 75
Epoch [1/1], Step [9092/13804], Loss: 2.3501, Perplexity: 10.4864, time_taken_in_seconds: 76
Epoch [1/1], Step [9093/13804], Loss: 2.5857, Perplexity: 13.2725, time_taken_in_seconds: 76
Epoch [1/1], Step [9094/13804], Loss: 2.8110, Perplexity: 16.6261, time_taken_in_seconds: 77
Epoch [1/1], Step [9095/13804], Loss: 2.3145, Perplexity: 10.1202, time_taken_in_seconds: 78
Epoch [1/1], Step [9096/13804], Loss: 3.1070, Perplexity: 22.3532, time_taken_in_seconds: 79
Epoch [1/1], Step [9097/13804], Loss: 2.4711, Perplexity: 11.8353, time_taken_in_seconds: 80
Epoch [1/1], Step [9098/13804], Loss: 2.3998, Perplexity: 11.0208, time_taken_in_seconds: 81
Epoch [1/1], Step [9099/13804], Loss: 2.5971, Perplexity: 13.4242, time_taken_in_seconds: 82
Epoch [1/1], Step [9100/13804], Loss: 3.1793, Perplexity: 24.0309, time_taken_in_seconds: 82
Epoch [1/1], Step [9101/13804], Loss: 2.5349, Perplexity: 12.6151, time_taken_in_seconds: 0
Epoch [1/1], Step [9102/13804], Loss: 2.5937, Perplexity: 13.3797, time_taken_in_seconds: 1
Epoch [1/1], Step [9103/13804], Loss: 2.5104, Perplexity: 12.3103, time_taken_in_seconds: 2
Epoch [1/1], Step [9104/13804], Loss: 2.2390, Perplexity: 9.3835, time_taken_in_seconds: 3
Epoch [1/1], Step [9105/13804], Loss: 2.4703, Perplexity: 11.8264, time_taken_in_seconds: 4
Epoch [1/1], Step [9106/13804], Loss: 2.4315, Perplexity: 11.3755, time_taken_in_seconds: 4
Epoch [1/1], Step [9107/13804], Loss: 2.4621, Perplexity: 11.7291, time_taken_in_seconds: 5
Epoch [1/1], Step [9108/13804], Loss: 2.4090, Perplexity: 11.1224, time_taken_in_seconds: 6
Epoch [1/1], Step [9109/13804], Loss: 2.3874, Perplexity: 10.8851, time_taken_in_seconds: 7
Epoch [1/1], Step [9110/13804], Loss: 2.4534, Perplexity: 11.6283, time_taken_in_seconds: 8
Epoch [1/1], Step [9111/13804], Loss: 2.7418, Perplexity: 15.5145, time_taken_in_seconds: 9
Epoch [1/1], Step [9112/13804], Loss: 3.2212, Perplexity: 25.0580, time_taken_in_seconds: 9
Epoch [1/1], Step [9113/13804], Loss: 2.6235, Perplexity: 13.7841, time_taken_in_seconds: 10
Epoch [1/1], Step [9114/13804], Loss: 2.4919, Perplexity: 12.0848, time_taken_in_seconds: 11
Epoch [1/1], Step [9115/13804], Loss: 2.4161, Perplexity: 11.2021, time_taken_in_seconds: 12
Epoch [1/1], Step [9116/13804], Loss: 3.1301, Perplexity: 22.8772, time_taken_in_seconds: 13
Epoch [1/1], Step [9117/13804], Loss: 3.0064, Perplexity: 20.2152, time_taken_in_seconds: 14
Epoch [1/1], Step [9118/13804], Loss: 2.7571, Perplexity: 15.7538, time_taken_in_seconds: 14
Epoch [1/1], Step [9119/13804], Loss: 2.4734, Perplexity: 11.8633, time_taken_in_seconds: 15
Epoch [1/1], Step [9120/13804], Loss: 2.6009, Perplexity: 13.4763, time_taken_in_seconds: 16
Epoch [1/1], Step [9121/13804], Loss: 2.4921, Perplexity: 12.0864, time_taken_in_seconds: 17
Epoch [1/1], Step [9122/13804], Loss: 2.6559, Perplexity: 14.2377, time_taken_in_seconds: 18
Epoch [1/1], Step [9123/13804], Loss: 2.6290, Perplexity: 13.8597, time_taken_in_seconds: 19
Epoch [1/1], Step [9124/13804], Loss: 2.9332, Perplexity: 18.7870, time_taken_in_seconds: 19
Epoch [1/1], Step [9125/13804], Loss: 2.3816, Perplexity: 10.8217, time_taken_in_seconds: 20
Epoch [1/1], Step [9126/13804], Loss: 3.0876, Perplexity: 21.9253, time_taken_in_seconds: 21
Epoch [1/1], Step [9127/13804], Loss: 2.7897, Perplexity: 16.2760, time_taken_in_seconds: 22
Epoch [1/1], Step [9128/13804], Loss: 2.5418, Perplexity: 12.7025, time_taken_in_seconds: 23
Epoch [1/1], Step [9129/13804], Loss: 2.4544, Perplexity: 11.6393, time_taken_in_seconds: 24
Epoch [1/1], Step [9130/13804], Loss: 2.7837, Perplexity: 16.1789, time_taken_in_seconds: 24
Epoch [1/1], Step [9131/13804], Loss: 2.7046, Perplexity: 14.9487, time_taken_in_seconds: 25
Epoch [1/1], Step [9132/13804], Loss: 2.5349, Perplexity: 12.6150, time_taken_in_seconds: 26
Epoch [1/1], Step [9133/13804], Loss: 2.9380, Perplexity: 18.8772, time_taken_in_seconds: 27
Epoch [1/1], Step [9134/13804], Loss: 2.5503, Perplexity: 12.8111, time_taken_in_seconds: 28
Epoch [1/1], Step [9135/13804], Loss: 2.9690, Perplexity: 19.4733, time_taken_in_seconds: 29
Epoch [1/1], Step [9136/13804], Loss: 3.0647, Perplexity: 21.4276, time_taken_in_seconds: 29
Epoch [1/1], Step [9137/13804], Loss: 2.8066, Perplexity: 16.5527, time_taken_in_seconds: 30
Epoch [1/1], Step [9138/13804], Loss: 2.6958, Perplexity: 14.8176, time_taken_in_seconds: 31
Epoch [1/1], Step [9139/13804], Loss: 2.5744, Perplexity: 13.1239, time_taken_in_seconds: 32
Epoch [1/1], Step [9140/13804], Loss: 3.0530, Perplexity: 21.1789, time_taken_in_seconds: 33
Epoch [1/1], Step [9141/13804], Loss: 2.5575, Perplexity: 12.9034, time_taken_in_seconds: 34
Epoch [1/1], Step [9142/13804], Loss: 2.3924, Perplexity: 10.9395, time_taken_in_seconds: 34
Epoch [1/1], Step [9143/13804], Loss: 2.5068, Perplexity: 12.2655, time_taken_in_seconds: 35
Epoch [1/1], Step [9144/13804], Loss: 2.6361, Perplexity: 13.9592, time_taken_in_seconds: 36
Epoch [1/1], Step [9145/13804], Loss: 2.4889, Perplexity: 12.0481, time_taken_in_seconds: 37
Epoch [1/1], Step [9146/13804], Loss: 2.6422, Perplexity: 14.0439, time_taken_in_seconds: 38
Epoch [1/1], Step [9147/13804], Loss: 2.5776, Perplexity: 13.1651, time_taken_in_seconds: 38
Epoch [1/1], Step [9148/13804], Loss: 2.5391, Perplexity: 12.6689, time_taken_in_seconds: 39
Epoch [1/1], Step [9149/13804], Loss: 2.6133, Perplexity: 13.6439, time_taken_in_seconds: 40
Epoch [1/1], Step [9150/13804], Loss: 2.9885, Perplexity: 19.8562, time_taken_in_seconds: 41
Epoch [1/1], Step [9151/13804], Loss: 2.6202, Perplexity: 13.7385, time_taken_in_seconds: 42
Epoch [1/1], Step [9152/13804], Loss: 2.5732, Perplexity: 13.1072, time_taken_in_seconds: 43
Epoch [1/1], Step [9153/13804], Loss: 2.3389, Perplexity: 10.3701, time_taken_in_seconds: 43
Epoch [1/1], Step [9154/13804], Loss: 2.4736, Perplexity: 11.8650, time_taken_in_seconds: 44
Epoch [1/1], Step [9155/13804], Loss: 2.6915, Perplexity: 14.7540, time_taken_in_seconds: 45
Epoch [1/1], Step [9156/13804], Loss: 2.7504, Perplexity: 15.6492, time_taken_in_seconds: 46
Epoch [1/1], Step [9157/13804], Loss: 2.3574, Perplexity: 10.5636, time_taken_in_seconds: 47
Epoch [1/1], Step [9158/13804], Loss: 2.9309, Perplexity: 18.7437, time_taken_in_seconds: 48
Epoch [1/1], Step [9159/13804], Loss: 2.7473, Perplexity: 15.6006, time_taken_in_seconds: 48
Epoch [1/1], Step [9160/13804], Loss: 2.4258, Perplexity: 11.3115, time_taken_in_seconds: 49
Epoch [1/1], Step [9161/13804], Loss: 2.3608, Perplexity: 10.5997, time_taken_in_seconds: 50
Epoch [1/1], Step [9162/13804], Loss: 2.6625, Perplexity: 14.3319, time_taken_in_seconds: 51
Epoch [1/1], Step [9163/13804], Loss: 2.6584, Perplexity: 14.2734, time_taken_in_seconds: 52
Epoch [1/1], Step [9164/13804], Loss: 2.6285, Perplexity: 13.8536, time_taken_in_seconds: 52
Epoch [1/1], Step [9165/13804], Loss: 2.7833, Perplexity: 16.1718, time_taken_in_seconds: 53
Epoch [1/1], Step [9166/13804], Loss: 2.3765, Perplexity: 10.7673, time_taken_in_seconds: 54
Epoch [1/1], Step [9167/13804], Loss: 2.5412, Perplexity: 12.6947, time_taken_in_seconds: 55
Epoch [1/1], Step [9168/13804], Loss: 2.4624, Perplexity: 11.7335, time_taken_in_seconds: 56
Epoch [1/1], Step [9169/13804], Loss: 2.8598, Perplexity: 17.4585, time_taken_in_seconds: 57
Epoch [1/1], Step [9170/13804], Loss: 2.4888, Perplexity: 12.0466, time_taken_in_seconds: 58
Epoch [1/1], Step [9171/13804], Loss: 2.8815, Perplexity: 17.8410, time_taken_in_seconds: 59
Epoch [1/1], Step [9172/13804], Loss: 3.0463, Perplexity: 21.0381, time_taken_in_seconds: 59
Epoch [1/1], Step [9173/13804], Loss: 2.6125, Perplexity: 13.6333, time_taken_in_seconds: 60
Epoch [1/1], Step [9174/13804], Loss: 2.7747, Perplexity: 16.0336, time_taken_in_seconds: 61
Epoch [1/1], Step [9175/13804], Loss: 2.5350, Perplexity: 12.6161, time_taken_in_seconds: 62
Epoch [1/1], Step [9176/13804], Loss: 5.1121, Perplexity: 166.0244, time_taken_in_seconds: 63
Epoch [1/1], Step [9177/13804], Loss: 2.3962, Perplexity: 10.9817, time_taken_in_seconds: 64
Epoch [1/1], Step [9178/13804], Loss: 2.7096, Perplexity: 15.0233, time_taken_in_seconds: 64
Epoch [1/1], Step [9179/13804], Loss: 2.7193, Perplexity: 15.1695, time_taken_in_seconds: 65
Epoch [1/1], Step [9180/13804], Loss: 2.3312, Perplexity: 10.2900, time_taken_in_seconds: 66
Epoch [1/1], Step [9181/13804], Loss: 3.3597, Perplexity: 28.7807, time_taken_in_seconds: 67
Epoch [1/1], Step [9182/13804], Loss: 2.5485, Perplexity: 12.7879, time_taken_in_seconds: 68
Epoch [1/1], Step [9183/13804], Loss: 2.8632, Perplexity: 17.5175, time_taken_in_seconds: 68
Epoch [1/1], Step [9184/13804], Loss: 2.2839, Perplexity: 9.8145, time_taken_in_seconds: 69
Epoch [1/1], Step [9185/13804], Loss: 2.3107, Perplexity: 10.0819, time_taken_in_seconds: 70
Epoch [1/1], Step [9186/13804], Loss: 2.7379, Perplexity: 15.4547, time_taken_in_seconds: 71
Epoch [1/1], Step [9187/13804], Loss: 2.4727, Perplexity: 11.8543, time_taken_in_seconds: 72
Epoch [1/1], Step [9188/13804], Loss: 2.4817, Perplexity: 11.9611, time_taken_in_seconds: 73
Epoch [1/1], Step [9189/13804], Loss: 2.4900, Perplexity: 12.0614, time_taken_in_seconds: 73
Epoch [1/1], Step [9190/13804], Loss: 2.3387, Perplexity: 10.3681, time_taken_in_seconds: 74
Epoch [1/1], Step [9191/13804], Loss: 2.4226, Perplexity: 11.2747, time_taken_in_seconds: 75
Epoch [1/1], Step [9192/13804], Loss: 2.5936, Perplexity: 13.3775, time_taken_in_seconds: 76
Epoch [1/1], Step [9193/13804], Loss: 2.1620, Perplexity: 8.6883, time_taken_in_seconds: 77
Epoch [1/1], Step [9194/13804], Loss: 2.6508, Perplexity: 14.1654, time_taken_in_seconds: 78
Epoch [1/1], Step [9195/13804], Loss: 2.7799, Perplexity: 16.1182, time_taken_in_seconds: 78
Epoch [1/1], Step [9196/13804], Loss: 2.7231, Perplexity: 15.2270, time_taken_in_seconds: 79
Epoch [1/1], Step [9197/13804], Loss: 2.3063, Perplexity: 10.0376, time_taken_in_seconds: 80
Epoch [1/1], Step [9198/13804], Loss: 2.6811, Perplexity: 14.6011, time_taken_in_seconds: 81
Epoch [1/1], Step [9199/13804], Loss: 2.9963, Perplexity: 20.0104, time_taken_in_seconds: 82
Epoch [1/1], Step [9200/13804], Loss: 2.4304, Perplexity: 11.3632, time_taken_in_seconds: 83
Epoch [1/1], Step [9201/13804], Loss: 2.7446, Perplexity: 15.5580, time_taken_in_seconds: 0
Epoch [1/1], Step [9202/13804], Loss: 2.5267, Perplexity: 12.5119, time_taken_in_seconds: 1
Epoch [1/1], Step [9203/13804], Loss: 2.8837, Perplexity: 17.8806, time_taken_in_seconds: 2
Epoch [1/1], Step [9204/13804], Loss: 2.7604, Perplexity: 15.8065, time_taken_in_seconds: 3
Epoch [1/1], Step [9205/13804], Loss: 2.7180, Perplexity: 15.1503, time_taken_in_seconds: 4
Epoch [1/1], Step [9206/13804], Loss: 2.6774, Perplexity: 14.5475, time_taken_in_seconds: 4
Epoch [1/1], Step [9207/13804], Loss: 3.2454, Perplexity: 25.6726, time_taken_in_seconds: 5
Epoch [1/1], Step [9208/13804], Loss: 2.9198, Perplexity: 18.5382, time_taken_in_seconds: 6
Epoch [1/1], Step [9209/13804], Loss: 2.4545, Perplexity: 11.6410, time_taken_in_seconds: 7
Epoch [1/1], Step [9210/13804], Loss: 2.7240, Perplexity: 15.2415, time_taken_in_seconds: 8
Epoch [1/1], Step [9211/13804], Loss: 2.3821, Perplexity: 10.8277, time_taken_in_seconds: 9
Epoch [1/1], Step [9212/13804], Loss: 2.3258, Perplexity: 10.2348, time_taken_in_seconds: 9
Epoch [1/1], Step [9213/13804], Loss: 2.9053, Perplexity: 18.2701, time_taken_in_seconds: 10
Epoch [1/1], Step [9214/13804], Loss: 2.6068, Perplexity: 13.5549, time_taken_in_seconds: 11
Epoch [1/1], Step [9215/13804], Loss: 2.1230, Perplexity: 8.3563, time_taken_in_seconds: 12
Epoch [1/1], Step [9216/13804], Loss: 2.4784, Perplexity: 11.9226, time_taken_in_seconds: 13
Epoch [1/1], Step [9217/13804], Loss: 2.3661, Perplexity: 10.6555, time_taken_in_seconds: 14
Epoch [1/1], Step [9218/13804], Loss: 2.8094, Perplexity: 16.6006, time_taken_in_seconds: 14
Epoch [1/1], Step [9219/13804], Loss: 2.3197, Perplexity: 10.1729, time_taken_in_seconds: 15
Epoch [1/1], Step [9220/13804], Loss: 2.5208, Perplexity: 12.4389, time_taken_in_seconds: 16
Epoch [1/1], Step [9221/13804], Loss: 2.8467, Perplexity: 17.2304, time_taken_in_seconds: 17
Epoch [1/1], Step [9222/13804], Loss: 2.7071, Perplexity: 14.9863, time_taken_in_seconds: 18
Epoch [1/1], Step [9223/13804], Loss: 2.5516, Perplexity: 12.8278, time_taken_in_seconds: 18
Epoch [1/1], Step [9224/13804], Loss: 2.4009, Perplexity: 11.0331, time_taken_in_seconds: 19
Epoch [1/1], Step [9225/13804], Loss: 4.5010, Perplexity: 90.1102, time_taken_in_seconds: 20
Epoch [1/1], Step [9226/13804], Loss: 2.6894, Perplexity: 14.7232, time_taken_in_seconds: 21
Epoch [1/1], Step [9227/13804], Loss: 3.0802, Perplexity: 21.7625, time_taken_in_seconds: 22
Epoch [1/1], Step [9228/13804], Loss: 2.4655, Perplexity: 11.7693, time_taken_in_seconds: 23
Epoch [1/1], Step [9229/13804], Loss: 2.9410, Perplexity: 18.9350, time_taken_in_seconds: 23
Epoch [1/1], Step [9230/13804], Loss: 2.6803, Perplexity: 14.5894, time_taken_in_seconds: 24
Epoch [1/1], Step [9231/13804], Loss: 2.5482, Perplexity: 12.7836, time_taken_in_seconds: 25
Epoch [1/1], Step [9232/13804], Loss: 3.2108, Perplexity: 24.8001, time_taken_in_seconds: 26
Epoch [1/1], Step [9233/13804], Loss: 2.5365, Perplexity: 12.6358, time_taken_in_seconds: 27
Epoch [1/1], Step [9234/13804], Loss: 2.6798, Perplexity: 14.5819, time_taken_in_seconds: 28
Epoch [1/1], Step [9235/13804], Loss: 2.3715, Perplexity: 10.7130, time_taken_in_seconds: 28
Epoch [1/1], Step [9236/13804], Loss: 2.5891, Perplexity: 13.3177, time_taken_in_seconds: 29
Epoch [1/1], Step [9237/13804], Loss: 2.6624, Perplexity: 14.3309, time_taken_in_seconds: 30
Epoch [1/1], Step [9238/13804], Loss: 2.7121, Perplexity: 15.0607, time_taken_in_seconds: 31
Epoch [1/1], Step [9239/13804], Loss: 2.6254, Perplexity: 13.8103, time_taken_in_seconds: 32
Epoch [1/1], Step [9240/13804], Loss: 2.4979, Perplexity: 12.1570, time_taken_in_seconds: 33
Epoch [1/1], Step [9241/13804], Loss: 2.6720, Perplexity: 14.4694, time_taken_in_seconds: 33
Epoch [1/1], Step [9242/13804], Loss: 2.4232, Perplexity: 11.2824, time_taken_in_seconds: 34
Epoch [1/1], Step [9243/13804], Loss: 2.4580, Perplexity: 11.6809, time_taken_in_seconds: 35
Epoch [1/1], Step [9244/13804], Loss: 2.5746, Perplexity: 13.1264, time_taken_in_seconds: 36
Epoch [1/1], Step [9245/13804], Loss: 2.6161, Perplexity: 13.6828, time_taken_in_seconds: 37
Epoch [1/1], Step [9246/13804], Loss: 2.6285, Perplexity: 13.8530, time_taken_in_seconds: 38
Epoch [1/1], Step [9247/13804], Loss: 2.6711, Perplexity: 14.4552, time_taken_in_seconds: 39
Epoch [1/1], Step [9248/13804], Loss: 2.8288, Perplexity: 16.9256, time_taken_in_seconds: 39
Epoch [1/1], Step [9249/13804], Loss: 2.6030, Perplexity: 13.5039, time_taken_in_seconds: 40
Epoch [1/1], Step [9250/13804], Loss: 2.4414, Perplexity: 11.4886, time_taken_in_seconds: 41
Epoch [1/1], Step [9251/13804], Loss: 2.4993, Perplexity: 12.1746, time_taken_in_seconds: 42
Epoch [1/1], Step [9252/13804], Loss: 2.6823, Perplexity: 14.6190, time_taken_in_seconds: 43
Epoch [1/1], Step [9253/13804], Loss: 2.6359, Perplexity: 13.9555, time_taken_in_seconds: 44
Epoch [1/1], Step [9254/13804], Loss: 2.3678, Perplexity: 10.6743, time_taken_in_seconds: 44
Epoch [1/1], Step [9255/13804], Loss: 2.5648, Perplexity: 12.9980, time_taken_in_seconds: 45
Epoch [1/1], Step [9256/13804], Loss: 2.5755, Perplexity: 13.1378, time_taken_in_seconds: 46
Epoch [1/1], Step [9257/13804], Loss: 2.5002, Perplexity: 12.1853, time_taken_in_seconds: 47
Epoch [1/1], Step [9258/13804], Loss: 2.7152, Perplexity: 15.1078, time_taken_in_seconds: 48
Epoch [1/1], Step [9259/13804], Loss: 2.4562, Perplexity: 11.6600, time_taken_in_seconds: 48
Epoch [1/1], Step [9260/13804], Loss: 2.5249, Perplexity: 12.4903, time_taken_in_seconds: 49
Epoch [1/1], Step [9261/13804], Loss: 3.0399, Perplexity: 20.9022, time_taken_in_seconds: 50
Epoch [1/1], Step [9262/13804], Loss: 2.7841, Perplexity: 16.1860, time_taken_in_seconds: 51
Epoch [1/1], Step [9263/13804], Loss: 2.3954, Perplexity: 10.9722, time_taken_in_seconds: 52
Epoch [1/1], Step [9264/13804], Loss: 2.5159, Perplexity: 12.3780, time_taken_in_seconds: 53
Epoch [1/1], Step [9265/13804], Loss: 3.1645, Perplexity: 23.6769, time_taken_in_seconds: 53
Epoch [1/1], Step [9266/13804], Loss: 2.7836, Perplexity: 16.1773, time_taken_in_seconds: 54
Epoch [1/1], Step [9267/13804], Loss: 2.5508, Perplexity: 12.8174, time_taken_in_seconds: 55
Epoch [1/1], Step [9268/13804], Loss: 2.5611, Perplexity: 12.9505, time_taken_in_seconds: 56
Epoch [1/1], Step [9269/13804], Loss: 3.0225, Perplexity: 20.5424, time_taken_in_seconds: 57
Epoch [1/1], Step [9270/13804], Loss: 2.7299, Perplexity: 15.3314, time_taken_in_seconds: 57
Epoch [1/1], Step [9271/13804], Loss: 2.6959, Perplexity: 14.8190, time_taken_in_seconds: 58
Epoch [1/1], Step [9272/13804], Loss: 2.6294, Perplexity: 13.8658, time_taken_in_seconds: 59
Epoch [1/1], Step [9273/13804], Loss: 2.6614, Perplexity: 14.3165, time_taken_in_seconds: 60
Epoch [1/1], Step [9274/13804], Loss: 2.7375, Perplexity: 15.4478, time_taken_in_seconds: 61
Epoch [1/1], Step [9275/13804], Loss: 2.6323, Perplexity: 13.9056, time_taken_in_seconds: 62
Epoch [1/1], Step [9276/13804], Loss: 2.1557, Perplexity: 8.6336, time_taken_in_seconds: 62
Epoch [1/1], Step [9277/13804], Loss: 2.5628, Perplexity: 12.9716, time_taken_in_seconds: 63
Epoch [1/1], Step [9278/13804], Loss: 2.1957, Perplexity: 8.9867, time_taken_in_seconds: 64
Epoch [1/1], Step [9279/13804], Loss: 3.2721, Perplexity: 26.3678, time_taken_in_seconds: 65
Epoch [1/1], Step [9280/13804], Loss: 2.4089, Perplexity: 11.1223, time_taken_in_seconds: 66
Epoch [1/1], Step [9281/13804], Loss: 2.5367, Perplexity: 12.6381, time_taken_in_seconds: 67
Epoch [1/1], Step [9282/13804], Loss: 2.5064, Perplexity: 12.2613, time_taken_in_seconds: 67
Epoch [1/1], Step [9283/13804], Loss: 2.4940, Perplexity: 12.1093, time_taken_in_seconds: 68
Epoch [1/1], Step [9284/13804], Loss: 2.2447, Perplexity: 9.4372, time_taken_in_seconds: 69
Epoch [1/1], Step [9285/13804], Loss: 2.7565, Perplexity: 15.7452, time_taken_in_seconds: 70
Epoch [1/1], Step [9286/13804], Loss: 2.3198, Perplexity: 10.1738, time_taken_in_seconds: 71
Epoch [1/1], Step [9287/13804], Loss: 2.7852, Perplexity: 16.2031, time_taken_in_seconds: 71
Epoch [1/1], Step [9288/13804], Loss: 2.8800, Perplexity: 17.8151, time_taken_in_seconds: 72
Epoch [1/1], Step [9289/13804], Loss: 2.9080, Perplexity: 18.3204, time_taken_in_seconds: 73
Epoch [1/1], Step [9290/13804], Loss: 2.4518, Perplexity: 11.6087, time_taken_in_seconds: 74
Epoch [1/1], Step [9291/13804], Loss: 2.6263, Perplexity: 13.8228, time_taken_in_seconds: 75
Epoch [1/1], Step [9292/13804], Loss: 2.6638, Perplexity: 14.3514, time_taken_in_seconds: 76
Epoch [1/1], Step [9293/13804], Loss: 2.4152, Perplexity: 11.1920, time_taken_in_seconds: 76
Epoch [1/1], Step [9294/13804], Loss: 2.7437, Perplexity: 15.5446, time_taken_in_seconds: 77
Epoch [1/1], Step [9295/13804], Loss: 2.3197, Perplexity: 10.1723, time_taken_in_seconds: 78
Epoch [1/1], Step [9296/13804], Loss: 2.1880, Perplexity: 8.9175, time_taken_in_seconds: 79
Epoch [1/1], Step [9297/13804], Loss: 2.7340, Perplexity: 15.3946, time_taken_in_seconds: 80
Epoch [1/1], Step [9298/13804], Loss: 2.6519, Perplexity: 14.1809, time_taken_in_seconds: 80
Epoch [1/1], Step [9299/13804], Loss: 2.9037, Perplexity: 18.2415, time_taken_in_seconds: 81
Epoch [1/1], Step [9300/13804], Loss: 2.4966, Perplexity: 12.1418, time_taken_in_seconds: 82
Epoch [1/1], Step [9301/13804], Loss: 2.3909, Perplexity: 10.9236, time_taken_in_seconds: 0
Epoch [1/1], Step [9302/13804], Loss: 2.6416, Perplexity: 14.0353, time_taken_in_seconds: 1
Epoch [1/1], Step [9303/13804], Loss: 2.8107, Perplexity: 16.6214, time_taken_in_seconds: 2
Epoch [1/1], Step [9304/13804], Loss: 3.0025, Perplexity: 20.1349, time_taken_in_seconds: 3
Epoch [1/1], Step [9305/13804], Loss: 2.6900, Perplexity: 14.7314, time_taken_in_seconds: 4
Epoch [1/1], Step [9306/13804], Loss: 2.7989, Perplexity: 16.4269, time_taken_in_seconds: 4
Epoch [1/1], Step [9307/13804], Loss: 2.5088, Perplexity: 12.2906, time_taken_in_seconds: 5
Epoch [1/1], Step [9308/13804], Loss: 2.7850, Perplexity: 16.2002, time_taken_in_seconds: 6
Epoch [1/1], Step [9309/13804], Loss: 2.6955, Perplexity: 14.8135, time_taken_in_seconds: 7
Epoch [1/1], Step [9310/13804], Loss: 2.6480, Perplexity: 14.1261, time_taken_in_seconds: 8
Epoch [1/1], Step [9311/13804], Loss: 3.0995, Perplexity: 22.1863, time_taken_in_seconds: 9
Epoch [1/1], Step [9312/13804], Loss: 2.6815, Perplexity: 14.6071, time_taken_in_seconds: 9
Epoch [1/1], Step [9313/13804], Loss: 2.2622, Perplexity: 9.6042, time_taken_in_seconds: 10
Epoch [1/1], Step [9314/13804], Loss: 2.2940, Perplexity: 9.9142, time_taken_in_seconds: 11
Epoch [1/1], Step [9315/13804], Loss: 2.4720, Perplexity: 11.8461, time_taken_in_seconds: 12
Epoch [1/1], Step [9316/13804], Loss: 2.3798, Perplexity: 10.8025, time_taken_in_seconds: 13
Epoch [1/1], Step [9317/13804], Loss: 2.5347, Perplexity: 12.6130, time_taken_in_seconds: 14
Epoch [1/1], Step [9318/13804], Loss: 2.5762, Perplexity: 13.1469, time_taken_in_seconds: 14
Epoch [1/1], Step [9319/13804], Loss: 2.5993, Perplexity: 13.4548, time_taken_in_seconds: 15
Epoch [1/1], Step [9320/13804], Loss: 2.6452, Perplexity: 14.0862, time_taken_in_seconds: 16
Epoch [1/1], Step [9321/13804], Loss: 2.6237, Perplexity: 13.7872, time_taken_in_seconds: 17
Epoch [1/1], Step [9322/13804], Loss: 2.4087, Perplexity: 11.1200, time_taken_in_seconds: 18
Epoch [1/1], Step [9323/13804], Loss: 2.7735, Perplexity: 16.0148, time_taken_in_seconds: 19
Epoch [1/1], Step [9324/13804], Loss: 2.2266, Perplexity: 9.2683, time_taken_in_seconds: 19
Epoch [1/1], Step [9325/13804], Loss: 2.3887, Perplexity: 10.8990, time_taken_in_seconds: 20
Epoch [1/1], Step [9326/13804], Loss: 2.3700, Perplexity: 10.6971, time_taken_in_seconds: 21
Epoch [1/1], Step [9327/13804], Loss: 2.4789, Perplexity: 11.9276, time_taken_in_seconds: 22
Epoch [1/1], Step [9328/13804], Loss: 2.4102, Perplexity: 11.1362, time_taken_in_seconds: 23
Epoch [1/1], Step [9329/13804], Loss: 2.7759, Perplexity: 16.0538, time_taken_in_seconds: 23
Epoch [1/1], Step [9330/13804], Loss: 2.8292, Perplexity: 16.9319, time_taken_in_seconds: 24
Epoch [1/1], Step [9331/13804], Loss: 2.4674, Perplexity: 11.7919, time_taken_in_seconds: 25
Epoch [1/1], Step [9332/13804], Loss: 2.8431, Perplexity: 17.1689, time_taken_in_seconds: 26
Epoch [1/1], Step [9333/13804], Loss: 2.4373, Perplexity: 11.4420, time_taken_in_seconds: 27
Epoch [1/1], Step [9334/13804], Loss: 2.7271, Perplexity: 15.2883, time_taken_in_seconds: 28
Epoch [1/1], Step [9335/13804], Loss: 2.8507, Perplexity: 17.2994, time_taken_in_seconds: 28
Epoch [1/1], Step [9336/13804], Loss: 2.7248, Perplexity: 15.2530, time_taken_in_seconds: 29
Epoch [1/1], Step [9337/13804], Loss: 2.4191, Perplexity: 11.2360, time_taken_in_seconds: 30
Epoch [1/1], Step [9338/13804], Loss: 2.7813, Perplexity: 16.1398, time_taken_in_seconds: 31
Epoch [1/1], Step [9339/13804], Loss: 2.5359, Perplexity: 12.6284, time_taken_in_seconds: 32
Epoch [1/1], Step [9340/13804], Loss: 2.4579, Perplexity: 11.6806, time_taken_in_seconds: 32
Epoch [1/1], Step [9341/13804], Loss: 2.6477, Perplexity: 14.1216, time_taken_in_seconds: 33
Epoch [1/1], Step [9342/13804], Loss: 2.4545, Perplexity: 11.6410, time_taken_in_seconds: 34
Epoch [1/1], Step [9343/13804], Loss: 2.1870, Perplexity: 8.9085, time_taken_in_seconds: 35
Epoch [1/1], Step [9344/13804], Loss: 2.6210, Perplexity: 13.7499, time_taken_in_seconds: 36
Epoch [1/1], Step [9345/13804], Loss: 2.7985, Perplexity: 16.4196, time_taken_in_seconds: 36
Epoch [1/1], Step [9346/13804], Loss: 3.3564, Perplexity: 28.6862, time_taken_in_seconds: 37
Epoch [1/1], Step [9347/13804], Loss: 2.8362, Perplexity: 17.0507, time_taken_in_seconds: 38
Epoch [1/1], Step [9348/13804], Loss: 2.2561, Perplexity: 9.5455, time_taken_in_seconds: 39
Epoch [1/1], Step [9349/13804], Loss: 2.1425, Perplexity: 8.5206, time_taken_in_seconds: 40
Epoch [1/1], Step [9350/13804], Loss: 2.6926, Perplexity: 14.7701, time_taken_in_seconds: 41
Epoch [1/1], Step [9351/13804], Loss: 2.3667, Perplexity: 10.6625, time_taken_in_seconds: 41
Epoch [1/1], Step [9352/13804], Loss: 2.3248, Perplexity: 10.2249, time_taken_in_seconds: 42
Epoch [1/1], Step [9353/13804], Loss: 2.3085, Perplexity: 10.0595, time_taken_in_seconds: 43
Epoch [1/1], Step [9354/13804], Loss: 2.5230, Perplexity: 12.4655, time_taken_in_seconds: 44
Epoch [1/1], Step [9355/13804], Loss: 2.5877, Perplexity: 13.2991, time_taken_in_seconds: 45
Epoch [1/1], Step [9356/13804], Loss: 3.4042, Perplexity: 30.0908, time_taken_in_seconds: 45
Epoch [1/1], Step [9357/13804], Loss: 2.6101, Perplexity: 13.6011, time_taken_in_seconds: 46
Epoch [1/1], Step [9358/13804], Loss: 2.7763, Perplexity: 16.0600, time_taken_in_seconds: 47
Epoch [1/1], Step [9359/13804], Loss: 2.4739, Perplexity: 11.8688, time_taken_in_seconds: 48
Epoch [1/1], Step [9360/13804], Loss: 2.5279, Perplexity: 12.5277, time_taken_in_seconds: 49
Epoch [1/1], Step [9361/13804], Loss: 2.5860, Perplexity: 13.2771, time_taken_in_seconds: 49
Epoch [1/1], Step [9362/13804], Loss: 2.4992, Perplexity: 12.1732, time_taken_in_seconds: 50
Epoch [1/1], Step [9363/13804], Loss: 2.2072, Perplexity: 9.0898, time_taken_in_seconds: 51
Epoch [1/1], Step [9364/13804], Loss: 2.4609, Perplexity: 11.7148, time_taken_in_seconds: 52
Epoch [1/1], Step [9365/13804], Loss: 2.3615, Perplexity: 10.6064, time_taken_in_seconds: 53
Epoch [1/1], Step [9366/13804], Loss: 2.4957, Perplexity: 12.1307, time_taken_in_seconds: 53
Epoch [1/1], Step [9367/13804], Loss: 2.5793, Perplexity: 13.1880, time_taken_in_seconds: 54
Epoch [1/1], Step [9368/13804], Loss: 2.8497, Perplexity: 17.2834, time_taken_in_seconds: 55
Epoch [1/1], Step [9369/13804], Loss: 2.3118, Perplexity: 10.0923, time_taken_in_seconds: 56
Epoch [1/1], Step [9370/13804], Loss: 2.2548, Perplexity: 9.5333, time_taken_in_seconds: 57
Epoch [1/1], Step [9371/13804], Loss: 2.5059, Perplexity: 12.2551, time_taken_in_seconds: 58
Epoch [1/1], Step [9372/13804], Loss: 2.4718, Perplexity: 11.8435, time_taken_in_seconds: 58
Epoch [1/1], Step [9373/13804], Loss: 2.7830, Perplexity: 16.1675, time_taken_in_seconds: 59
Epoch [1/1], Step [9374/13804], Loss: 2.9624, Perplexity: 19.3441, time_taken_in_seconds: 60
Epoch [1/1], Step [9375/13804], Loss: 3.0393, Perplexity: 20.8912, time_taken_in_seconds: 61
Epoch [1/1], Step [9376/13804], Loss: 2.7098, Perplexity: 15.0258, time_taken_in_seconds: 62
Epoch [1/1], Step [9377/13804], Loss: 2.5649, Perplexity: 12.9989, time_taken_in_seconds: 62
Epoch [1/1], Step [9378/13804], Loss: 3.1487, Perplexity: 23.3068, time_taken_in_seconds: 63
Epoch [1/1], Step [9379/13804], Loss: 2.7803, Perplexity: 16.1234, time_taken_in_seconds: 64
Epoch [1/1], Step [9380/13804], Loss: 2.3280, Perplexity: 10.2578, time_taken_in_seconds: 65
Epoch [1/1], Step [9381/13804], Loss: 2.9208, Perplexity: 18.5564, time_taken_in_seconds: 66
Epoch [1/1], Step [9382/13804], Loss: 2.6249, Perplexity: 13.8032, time_taken_in_seconds: 66
Epoch [1/1], Step [9383/13804], Loss: 2.6525, Perplexity: 14.1896, time_taken_in_seconds: 67
Epoch [1/1], Step [9384/13804], Loss: 2.3006, Perplexity: 9.9799, time_taken_in_seconds: 68
Epoch [1/1], Step [9385/13804], Loss: 2.4771, Perplexity: 11.9064, time_taken_in_seconds: 69
Epoch [1/1], Step [9386/13804], Loss: 2.6794, Perplexity: 14.5766, time_taken_in_seconds: 70
Epoch [1/1], Step [9387/13804], Loss: 2.7005, Perplexity: 14.8872, time_taken_in_seconds: 70
Epoch [1/1], Step [9388/13804], Loss: 2.3741, Perplexity: 10.7410, time_taken_in_seconds: 71
Epoch [1/1], Step [9389/13804], Loss: 2.3926, Perplexity: 10.9424, time_taken_in_seconds: 72
Epoch [1/1], Step [9390/13804], Loss: 2.5899, Perplexity: 13.3282, time_taken_in_seconds: 73
Epoch [1/1], Step [9391/13804], Loss: 2.6202, Perplexity: 13.7387, time_taken_in_seconds: 74
Epoch [1/1], Step [9392/13804], Loss: 2.3589, Perplexity: 10.5797, time_taken_in_seconds: 75
Epoch [1/1], Step [9393/13804], Loss: 2.4487, Perplexity: 11.5736, time_taken_in_seconds: 76
Epoch [1/1], Step [9394/13804], Loss: 2.3964, Perplexity: 10.9834, time_taken_in_seconds: 76
Epoch [1/1], Step [9395/13804], Loss: 2.3281, Perplexity: 10.2585, time_taken_in_seconds: 77
Epoch [1/1], Step [9396/13804], Loss: 3.3774, Perplexity: 29.2960, time_taken_in_seconds: 78
Epoch [1/1], Step [9397/13804], Loss: 2.5524, Perplexity: 12.8380, time_taken_in_seconds: 79
Epoch [1/1], Step [9398/13804], Loss: 2.7077, Perplexity: 14.9948, time_taken_in_seconds: 80
Epoch [1/1], Step [9399/13804], Loss: 2.6791, Perplexity: 14.5721, time_taken_in_seconds: 80
Epoch [1/1], Step [9400/13804], Loss: 2.6665, Perplexity: 14.3902, time_taken_in_seconds: 81
Epoch [1/1], Step [9401/13804], Loss: 2.7014, Perplexity: 14.9009, time_taken_in_seconds: 0
Epoch [1/1], Step [9402/13804], Loss: 2.3082, Perplexity: 10.0568, time_taken_in_seconds: 1
Epoch [1/1], Step [9403/13804], Loss: 2.4251, Perplexity: 11.3032, time_taken_in_seconds: 2
Epoch [1/1], Step [9404/13804], Loss: 2.8629, Perplexity: 17.5115, time_taken_in_seconds: 3
Epoch [1/1], Step [9405/13804], Loss: 2.7402, Perplexity: 15.4906, time_taken_in_seconds: 4
Epoch [1/1], Step [9406/13804], Loss: 2.2109, Perplexity: 9.1240, time_taken_in_seconds: 4
Epoch [1/1], Step [9407/13804], Loss: 2.5065, Perplexity: 12.2616, time_taken_in_seconds: 5
Epoch [1/1], Step [9408/13804], Loss: 3.2628, Perplexity: 26.1231, time_taken_in_seconds: 6
Epoch [1/1], Step [9409/13804], Loss: 2.7373, Perplexity: 15.4446, time_taken_in_seconds: 7
Epoch [1/1], Step [9410/13804], Loss: 2.4100, Perplexity: 11.1344, time_taken_in_seconds: 8
Epoch [1/1], Step [9411/13804], Loss: 2.5537, Perplexity: 12.8549, time_taken_in_seconds: 9
Epoch [1/1], Step [9412/13804], Loss: 2.5602, Perplexity: 12.9383, time_taken_in_seconds: 9
Epoch [1/1], Step [9413/13804], Loss: 2.4652, Perplexity: 11.7660, time_taken_in_seconds: 10
Epoch [1/1], Step [9414/13804], Loss: 2.4963, Perplexity: 12.1374, time_taken_in_seconds: 11
Epoch [1/1], Step [9415/13804], Loss: 2.5165, Perplexity: 12.3854, time_taken_in_seconds: 12
Epoch [1/1], Step [9416/13804], Loss: 2.4989, Perplexity: 12.1690, time_taken_in_seconds: 13
Epoch [1/1], Step [9417/13804], Loss: 2.4502, Perplexity: 11.5911, time_taken_in_seconds: 13
Epoch [1/1], Step [9418/13804], Loss: 2.8090, Perplexity: 16.5928, time_taken_in_seconds: 14
Epoch [1/1], Step [9419/13804], Loss: 2.4509, Perplexity: 11.5986, time_taken_in_seconds: 15
Epoch [1/1], Step [9420/13804], Loss: 2.1860, Perplexity: 8.8993, time_taken_in_seconds: 16
Epoch [1/1], Step [9421/13804], Loss: 2.3721, Perplexity: 10.7196, time_taken_in_seconds: 17
Epoch [1/1], Step [9422/13804], Loss: 3.0245, Perplexity: 20.5834, time_taken_in_seconds: 18
Epoch [1/1], Step [9423/13804], Loss: 2.4196, Perplexity: 11.2410, time_taken_in_seconds: 18
Epoch [1/1], Step [9424/13804], Loss: 2.8231, Perplexity: 16.8293, time_taken_in_seconds: 19
Epoch [1/1], Step [9425/13804], Loss: 3.2646, Perplexity: 26.1699, time_taken_in_seconds: 20
Epoch [1/1], Step [9426/13804], Loss: 2.4620, Perplexity: 11.7278, time_taken_in_seconds: 21
Epoch [1/1], Step [9427/13804], Loss: 2.8416, Perplexity: 17.1431, time_taken_in_seconds: 22
Epoch [1/1], Step [9428/13804], Loss: 2.6100, Perplexity: 13.5996, time_taken_in_seconds: 22
Epoch [1/1], Step [9429/13804], Loss: 2.5617, Perplexity: 12.9574, time_taken_in_seconds: 23
Epoch [1/1], Step [9430/13804], Loss: 2.7079, Perplexity: 14.9980, time_taken_in_seconds: 24
Epoch [1/1], Step [9431/13804], Loss: 2.7885, Perplexity: 16.2559, time_taken_in_seconds: 25
Epoch [1/1], Step [9432/13804], Loss: 2.6039, Perplexity: 13.5159, time_taken_in_seconds: 26
Epoch [1/1], Step [9433/13804], Loss: 2.8492, Perplexity: 17.2733, time_taken_in_seconds: 27
Epoch [1/1], Step [9434/13804], Loss: 2.9385, Perplexity: 18.8881, time_taken_in_seconds: 27
Epoch [1/1], Step [9435/13804], Loss: 2.8174, Perplexity: 16.7336, time_taken_in_seconds: 28
Epoch [1/1], Step [9436/13804], Loss: 2.7231, Perplexity: 15.2271, time_taken_in_seconds: 29
Epoch [1/1], Step [9437/13804], Loss: 2.1483, Perplexity: 8.5703, time_taken_in_seconds: 30
Epoch [1/1], Step [9438/13804], Loss: 2.3786, Perplexity: 10.7897, time_taken_in_seconds: 31
Epoch [1/1], Step [9439/13804], Loss: 2.9338, Perplexity: 18.7995, time_taken_in_seconds: 31
Epoch [1/1], Step [9440/13804], Loss: 2.1566, Perplexity: 8.6413, time_taken_in_seconds: 32
Epoch [1/1], Step [9441/13804], Loss: 2.4201, Perplexity: 11.2467, time_taken_in_seconds: 33
Epoch [1/1], Step [9442/13804], Loss: 2.9930, Perplexity: 19.9452, time_taken_in_seconds: 34
Epoch [1/1], Step [9443/13804], Loss: 2.5820, Perplexity: 13.2234, time_taken_in_seconds: 35
Epoch [1/1], Step [9444/13804], Loss: 2.4065, Perplexity: 11.0952, time_taken_in_seconds: 36
Epoch [1/1], Step [9445/13804], Loss: 2.7314, Perplexity: 15.3544, time_taken_in_seconds: 36
Epoch [1/1], Step [9446/13804], Loss: 2.5399, Perplexity: 12.6786, time_taken_in_seconds: 37
Epoch [1/1], Step [9447/13804], Loss: 2.9124, Perplexity: 18.4002, time_taken_in_seconds: 38
Epoch [1/1], Step [9448/13804], Loss: 2.6755, Perplexity: 14.5194, time_taken_in_seconds: 39
Epoch [1/1], Step [9449/13804], Loss: 2.1046, Perplexity: 8.2035, time_taken_in_seconds: 40
Epoch [1/1], Step [9450/13804], Loss: 2.4279, Perplexity: 11.3346, time_taken_in_seconds: 40
Epoch [1/1], Step [9451/13804], Loss: 2.3409, Perplexity: 10.3910, time_taken_in_seconds: 41
Epoch [1/1], Step [9452/13804], Loss: 2.5394, Perplexity: 12.6721, time_taken_in_seconds: 42
Epoch [1/1], Step [9453/13804], Loss: 2.2646, Perplexity: 9.6268, time_taken_in_seconds: 43
Epoch [1/1], Step [9454/13804], Loss: 2.9225, Perplexity: 18.5883, time_taken_in_seconds: 44
Epoch [1/1], Step [9455/13804], Loss: 2.8434, Perplexity: 17.1735, time_taken_in_seconds: 45
Epoch [1/1], Step [9456/13804], Loss: 2.6873, Perplexity: 14.6914, time_taken_in_seconds: 45
Epoch [1/1], Step [9457/13804], Loss: 2.6153, Perplexity: 13.6716, time_taken_in_seconds: 46
Epoch [1/1], Step [9458/13804], Loss: 2.6343, Perplexity: 13.9330, time_taken_in_seconds: 47
Epoch [1/1], Step [9459/13804], Loss: 3.1101, Perplexity: 22.4227, time_taken_in_seconds: 48
Epoch [1/1], Step [9460/13804], Loss: 2.3379, Perplexity: 10.3600, time_taken_in_seconds: 49
Epoch [1/1], Step [9461/13804], Loss: 2.3619, Perplexity: 10.6115, time_taken_in_seconds: 49
Epoch [1/1], Step [9462/13804], Loss: 2.7913, Perplexity: 16.3024, time_taken_in_seconds: 50
Epoch [1/1], Step [9463/13804], Loss: 3.0473, Perplexity: 21.0585, time_taken_in_seconds: 51
Epoch [1/1], Step [9464/13804], Loss: 2.7683, Perplexity: 15.9310, time_taken_in_seconds: 52
Epoch [1/1], Step [9465/13804], Loss: 2.5624, Perplexity: 12.9663, time_taken_in_seconds: 53
Epoch [1/1], Step [9466/13804], Loss: 2.4463, Perplexity: 11.5453, time_taken_in_seconds: 54
Epoch [1/1], Step [9467/13804], Loss: 2.4922, Perplexity: 12.0880, time_taken_in_seconds: 55
Epoch [1/1], Step [9468/13804], Loss: 3.2982, Perplexity: 27.0643, time_taken_in_seconds: 55
Epoch [1/1], Step [9469/13804], Loss: 2.3882, Perplexity: 10.8939, time_taken_in_seconds: 56
Epoch [1/1], Step [9470/13804], Loss: 2.3525, Perplexity: 10.5114, time_taken_in_seconds: 57
Epoch [1/1], Step [9471/13804], Loss: 2.2893, Perplexity: 9.8684, time_taken_in_seconds: 58
Epoch [1/1], Step [9472/13804], Loss: 2.6643, Perplexity: 14.3585, time_taken_in_seconds: 59
Epoch [1/1], Step [9473/13804], Loss: 2.5889, Perplexity: 13.3155, time_taken_in_seconds: 60
Epoch [1/1], Step [9474/13804], Loss: 2.6437, Perplexity: 14.0656, time_taken_in_seconds: 60
Epoch [1/1], Step [9475/13804], Loss: 2.9804, Perplexity: 19.6958, time_taken_in_seconds: 61
Epoch [1/1], Step [9476/13804], Loss: 2.9363, Perplexity: 18.8468, time_taken_in_seconds: 62
Epoch [1/1], Step [9477/13804], Loss: 2.5477, Perplexity: 12.7780, time_taken_in_seconds: 63
Epoch [1/1], Step [9478/13804], Loss: 2.3993, Perplexity: 11.0154, time_taken_in_seconds: 64
Epoch [1/1], Step [9479/13804], Loss: 2.3028, Perplexity: 10.0018, time_taken_in_seconds: 64
Epoch [1/1], Step [9480/13804], Loss: 2.5692, Perplexity: 13.0557, time_taken_in_seconds: 65
Epoch [1/1], Step [9481/13804], Loss: 2.5771, Perplexity: 13.1588, time_taken_in_seconds: 66
Epoch [1/1], Step [9482/13804], Loss: 2.7388, Perplexity: 15.4682, time_taken_in_seconds: 67
Epoch [1/1], Step [9483/13804], Loss: 2.3963, Perplexity: 10.9830, time_taken_in_seconds: 68
Epoch [1/1], Step [9484/13804], Loss: 2.4442, Perplexity: 11.5214, time_taken_in_seconds: 69
Epoch [1/1], Step [9485/13804], Loss: 3.5614, Perplexity: 35.2128, time_taken_in_seconds: 69
Epoch [1/1], Step [9486/13804], Loss: 2.3366, Perplexity: 10.3460, time_taken_in_seconds: 70
Epoch [1/1], Step [9487/13804], Loss: 2.6667, Perplexity: 14.3930, time_taken_in_seconds: 71
Epoch [1/1], Step [9488/13804], Loss: 2.4701, Perplexity: 11.8241, time_taken_in_seconds: 72
Epoch [1/1], Step [9489/13804], Loss: 2.3496, Perplexity: 10.4815, time_taken_in_seconds: 73
Epoch [1/1], Step [9490/13804], Loss: 2.5736, Perplexity: 13.1128, time_taken_in_seconds: 74
Epoch [1/1], Step [9491/13804], Loss: 2.5662, Perplexity: 13.0159, time_taken_in_seconds: 74
Epoch [1/1], Step [9492/13804], Loss: 2.9412, Perplexity: 18.9393, time_taken_in_seconds: 75
Epoch [1/1], Step [9493/13804], Loss: 2.3586, Perplexity: 10.5766, time_taken_in_seconds: 76
Epoch [1/1], Step [9494/13804], Loss: 2.4628, Perplexity: 11.7376, time_taken_in_seconds: 77
Epoch [1/1], Step [9495/13804], Loss: 2.9564, Perplexity: 19.2289, time_taken_in_seconds: 78
Epoch [1/1], Step [9496/13804], Loss: 2.1591, Perplexity: 8.6636, time_taken_in_seconds: 78
Epoch [1/1], Step [9497/13804], Loss: 2.9889, Perplexity: 19.8642, time_taken_in_seconds: 79
Epoch [1/1], Step [9498/13804], Loss: 3.0530, Perplexity: 21.1780, time_taken_in_seconds: 80
Epoch [1/1], Step [9499/13804], Loss: 2.7015, Perplexity: 14.9014, time_taken_in_seconds: 81
Epoch [1/1], Step [9500/13804], Loss: 2.4192, Perplexity: 11.2373, time_taken_in_seconds: 82
Epoch [1/1], Step [9501/13804], Loss: 2.6199, Perplexity: 13.7340, time_taken_in_seconds: 0
Epoch [1/1], Step [9502/13804], Loss: 2.4230, Perplexity: 11.2792, time_taken_in_seconds: 1
Epoch [1/1], Step [9503/13804], Loss: 2.8790, Perplexity: 17.7964, time_taken_in_seconds: 2
Epoch [1/1], Step [9504/13804], Loss: 3.3354, Perplexity: 28.0906, time_taken_in_seconds: 3
Epoch [1/1], Step [9505/13804], Loss: 2.3713, Perplexity: 10.7117, time_taken_in_seconds: 4
Epoch [1/1], Step [9506/13804], Loss: 2.3097, Perplexity: 10.0709, time_taken_in_seconds: 4
Epoch [1/1], Step [9507/13804], Loss: 2.3215, Perplexity: 10.1913, time_taken_in_seconds: 5
Epoch [1/1], Step [9508/13804], Loss: 2.5720, Perplexity: 13.0913, time_taken_in_seconds: 6
Epoch [1/1], Step [9509/13804], Loss: 2.7432, Perplexity: 15.5361, time_taken_in_seconds: 7
Epoch [1/1], Step [9510/13804], Loss: 2.3142, Perplexity: 10.1167, time_taken_in_seconds: 8
Epoch [1/1], Step [9511/13804], Loss: 3.2821, Perplexity: 26.6313, time_taken_in_seconds: 9
Epoch [1/1], Step [9512/13804], Loss: 2.4363, Perplexity: 11.4309, time_taken_in_seconds: 9
Epoch [1/1], Step [9513/13804], Loss: 2.8837, Perplexity: 17.8799, time_taken_in_seconds: 10
Epoch [1/1], Step [9514/13804], Loss: 2.2871, Perplexity: 9.8462, time_taken_in_seconds: 11
Epoch [1/1], Step [9515/13804], Loss: 3.0680, Perplexity: 21.4994, time_taken_in_seconds: 12
Epoch [1/1], Step [9516/13804], Loss: 2.8000, Perplexity: 16.4452, time_taken_in_seconds: 13
Epoch [1/1], Step [9517/13804], Loss: 2.5777, Perplexity: 13.1666, time_taken_in_seconds: 13
Epoch [1/1], Step [9518/13804], Loss: 2.4200, Perplexity: 11.2455, time_taken_in_seconds: 14
Epoch [1/1], Step [9519/13804], Loss: 2.2727, Perplexity: 9.7056, time_taken_in_seconds: 15
Epoch [1/1], Step [9520/13804], Loss: 2.2881, Perplexity: 9.8563, time_taken_in_seconds: 16
Epoch [1/1], Step [9521/13804], Loss: 3.0476, Perplexity: 21.0646, time_taken_in_seconds: 17
Epoch [1/1], Step [9522/13804], Loss: 2.5333, Perplexity: 12.5949, time_taken_in_seconds: 18
Epoch [1/1], Step [9523/13804], Loss: 2.2275, Perplexity: 9.2768, time_taken_in_seconds: 18
Epoch [1/1], Step [9524/13804], Loss: 2.3736, Perplexity: 10.7358, time_taken_in_seconds: 19
Epoch [1/1], Step [9525/13804], Loss: 2.2887, Perplexity: 9.8616, time_taken_in_seconds: 20
Epoch [1/1], Step [9526/13804], Loss: 2.3338, Perplexity: 10.3166, time_taken_in_seconds: 21
Epoch [1/1], Step [9527/13804], Loss: 2.5961, Perplexity: 13.4110, time_taken_in_seconds: 22
Epoch [1/1], Step [9528/13804], Loss: 2.4600, Perplexity: 11.7044, time_taken_in_seconds: 22
Epoch [1/1], Step [9529/13804], Loss: 2.6165, Perplexity: 13.6883, time_taken_in_seconds: 23
Epoch [1/1], Step [9530/13804], Loss: 2.3879, Perplexity: 10.8901, time_taken_in_seconds: 24
Epoch [1/1], Step [9531/13804], Loss: 2.6146, Perplexity: 13.6616, time_taken_in_seconds: 25
Epoch [1/1], Step [9532/13804], Loss: 2.8419, Perplexity: 17.1476, time_taken_in_seconds: 26
Epoch [1/1], Step [9533/13804], Loss: 2.6486, Perplexity: 14.1348, time_taken_in_seconds: 27
Epoch [1/1], Step [9534/13804], Loss: 2.6437, Perplexity: 14.0658, time_taken_in_seconds: 27
Epoch [1/1], Step [9535/13804], Loss: 2.7590, Perplexity: 15.7840, time_taken_in_seconds: 28
Epoch [1/1], Step [9536/13804], Loss: 2.5303, Perplexity: 12.5568, time_taken_in_seconds: 29
Epoch [1/1], Step [9537/13804], Loss: 3.0226, Perplexity: 20.5443, time_taken_in_seconds: 30
Epoch [1/1], Step [9538/13804], Loss: 2.6727, Perplexity: 14.4791, time_taken_in_seconds: 31
Epoch [1/1], Step [9539/13804], Loss: 2.3793, Perplexity: 10.7976, time_taken_in_seconds: 32
Epoch [1/1], Step [9540/13804], Loss: 2.7083, Perplexity: 15.0035, time_taken_in_seconds: 33
Epoch [1/1], Step [9541/13804], Loss: 3.2635, Perplexity: 26.1408, time_taken_in_seconds: 33
Epoch [1/1], Step [9542/13804], Loss: 2.2718, Perplexity: 9.6965, time_taken_in_seconds: 34
Epoch [1/1], Step [9543/13804], Loss: 2.6266, Perplexity: 13.8270, time_taken_in_seconds: 35
Epoch [1/1], Step [9544/13804], Loss: 2.3924, Perplexity: 10.9394, time_taken_in_seconds: 36
Epoch [1/1], Step [9545/13804], Loss: 2.6185, Perplexity: 13.7156, time_taken_in_seconds: 37
Epoch [1/1], Step [9546/13804], Loss: 2.2195, Perplexity: 9.2024, time_taken_in_seconds: 37
Epoch [1/1], Step [9547/13804], Loss: 2.5807, Perplexity: 13.2065, time_taken_in_seconds: 38
Epoch [1/1], Step [9548/13804], Loss: 2.4258, Perplexity: 11.3108, time_taken_in_seconds: 39
Epoch [1/1], Step [9549/13804], Loss: 3.1006, Perplexity: 22.2119, time_taken_in_seconds: 40
Epoch [1/1], Step [9550/13804], Loss: 2.8626, Perplexity: 17.5071, time_taken_in_seconds: 41
Epoch [1/1], Step [9551/13804], Loss: 3.1814, Perplexity: 24.0812, time_taken_in_seconds: 42
Epoch [1/1], Step [9552/13804], Loss: 2.5792, Perplexity: 13.1870, time_taken_in_seconds: 42
Epoch [1/1], Step [9553/13804], Loss: 2.6288, Perplexity: 13.8566, time_taken_in_seconds: 43
Epoch [1/1], Step [9554/13804], Loss: 2.4721, Perplexity: 11.8472, time_taken_in_seconds: 44
Epoch [1/1], Step [9555/13804], Loss: 3.4346, Perplexity: 31.0194, time_taken_in_seconds: 45
Epoch [1/1], Step [9556/13804], Loss: 2.6895, Perplexity: 14.7239, time_taken_in_seconds: 46
Epoch [1/1], Step [9557/13804], Loss: 3.0109, Perplexity: 20.3055, time_taken_in_seconds: 47
Epoch [1/1], Step [9558/13804], Loss: 3.1397, Perplexity: 23.0964, time_taken_in_seconds: 47
Epoch [1/1], Step [9559/13804], Loss: 2.8795, Perplexity: 17.8052, time_taken_in_seconds: 48
Epoch [1/1], Step [9560/13804], Loss: 2.8202, Perplexity: 16.7800, time_taken_in_seconds: 49
Epoch [1/1], Step [9561/13804], Loss: 2.3083, Perplexity: 10.0575, time_taken_in_seconds: 50
Epoch [1/1], Step [9562/13804], Loss: 2.3416, Perplexity: 10.3980, time_taken_in_seconds: 51
Epoch [1/1], Step [9563/13804], Loss: 2.2260, Perplexity: 9.2632, time_taken_in_seconds: 51
Epoch [1/1], Step [9564/13804], Loss: 2.6084, Perplexity: 13.5777, time_taken_in_seconds: 52
Epoch [1/1], Step [9565/13804], Loss: 2.6246, Perplexity: 13.7985, time_taken_in_seconds: 53
Epoch [1/1], Step [9566/13804], Loss: 2.5238, Perplexity: 12.4758, time_taken_in_seconds: 54
Epoch [1/1], Step [9567/13804], Loss: 2.6527, Perplexity: 14.1918, time_taken_in_seconds: 55
Epoch [1/1], Step [9568/13804], Loss: 2.7276, Perplexity: 15.2964, time_taken_in_seconds: 56
Epoch [1/1], Step [9569/13804], Loss: 2.4888, Perplexity: 12.0463, time_taken_in_seconds: 56
Epoch [1/1], Step [9570/13804], Loss: 3.1833, Perplexity: 24.1251, time_taken_in_seconds: 57
Epoch [1/1], Step [9571/13804], Loss: 2.5291, Perplexity: 12.5426, time_taken_in_seconds: 58
Epoch [1/1], Step [9572/13804], Loss: 2.4188, Perplexity: 11.2323, time_taken_in_seconds: 59
Epoch [1/1], Step [9573/13804], Loss: 3.1791, Perplexity: 24.0250, time_taken_in_seconds: 60
Epoch [1/1], Step [9574/13804], Loss: 2.6930, Perplexity: 14.7755, time_taken_in_seconds: 61
Epoch [1/1], Step [9575/13804], Loss: 2.3324, Perplexity: 10.3027, time_taken_in_seconds: 61
Epoch [1/1], Step [9576/13804], Loss: 2.5617, Perplexity: 12.9581, time_taken_in_seconds: 62
Epoch [1/1], Step [9577/13804], Loss: 2.6942, Perplexity: 14.7943, time_taken_in_seconds: 63
Epoch [1/1], Step [9578/13804], Loss: 2.9741, Perplexity: 19.5728, time_taken_in_seconds: 64
Epoch [1/1], Step [9579/13804], Loss: 2.4301, Perplexity: 11.3598, time_taken_in_seconds: 65
Epoch [1/1], Step [9580/13804], Loss: 2.5329, Perplexity: 12.5897, time_taken_in_seconds: 65
Epoch [1/1], Step [9581/13804], Loss: 2.7704, Perplexity: 15.9658, time_taken_in_seconds: 66
Epoch [1/1], Step [9582/13804], Loss: 2.3359, Perplexity: 10.3385, time_taken_in_seconds: 67
Epoch [1/1], Step [9583/13804], Loss: 2.6440, Perplexity: 14.0693, time_taken_in_seconds: 68
Epoch [1/1], Step [9584/13804], Loss: 2.8272, Perplexity: 16.8978, time_taken_in_seconds: 69
Epoch [1/1], Step [9585/13804], Loss: 2.7192, Perplexity: 15.1684, time_taken_in_seconds: 69
Epoch [1/1], Step [9586/13804], Loss: 3.4182, Perplexity: 30.5142, time_taken_in_seconds: 70
Epoch [1/1], Step [9587/13804], Loss: 2.3926, Perplexity: 10.9422, time_taken_in_seconds: 71
Epoch [1/1], Step [9588/13804], Loss: 2.4551, Perplexity: 11.6473, time_taken_in_seconds: 72
Epoch [1/1], Step [9589/13804], Loss: 2.5550, Perplexity: 12.8717, time_taken_in_seconds: 73
Epoch [1/1], Step [9590/13804], Loss: 2.3218, Perplexity: 10.1938, time_taken_in_seconds: 74
Epoch [1/1], Step [9591/13804], Loss: 2.8046, Perplexity: 16.5207, time_taken_in_seconds: 74
Epoch [1/1], Step [9592/13804], Loss: 2.6827, Perplexity: 14.6249, time_taken_in_seconds: 75
Epoch [1/1], Step [9593/13804], Loss: 2.5547, Perplexity: 12.8675, time_taken_in_seconds: 76
Epoch [1/1], Step [9594/13804], Loss: 2.5653, Perplexity: 13.0052, time_taken_in_seconds: 77
Epoch [1/1], Step [9595/13804], Loss: 2.4789, Perplexity: 11.9282, time_taken_in_seconds: 78
Epoch [1/1], Step [9596/13804], Loss: 2.4802, Perplexity: 11.9442, time_taken_in_seconds: 79
Epoch [1/1], Step [9597/13804], Loss: 2.8176, Perplexity: 16.7372, time_taken_in_seconds: 79
Epoch [1/1], Step [9598/13804], Loss: 2.6265, Perplexity: 13.8259, time_taken_in_seconds: 80
Epoch [1/1], Step [9599/13804], Loss: 2.1902, Perplexity: 8.9372, time_taken_in_seconds: 81
Epoch [1/1], Step [9600/13804], Loss: 2.2620, Perplexity: 9.6024, time_taken_in_seconds: 82
Epoch [1/1], Step [9601/13804], Loss: 2.3076, Perplexity: 10.0499, time_taken_in_seconds: 0
Epoch [1/1], Step [9602/13804], Loss: 2.8646, Perplexity: 17.5424, time_taken_in_seconds: 1
Epoch [1/1], Step [9603/13804], Loss: 2.4567, Perplexity: 11.6660, time_taken_in_seconds: 2
Epoch [1/1], Step [9604/13804], Loss: 2.3632, Perplexity: 10.6247, time_taken_in_seconds: 3
Epoch [1/1], Step [9605/13804], Loss: 2.3172, Perplexity: 10.1473, time_taken_in_seconds: 4
Epoch [1/1], Step [9606/13804], Loss: 2.5985, Perplexity: 13.4436, time_taken_in_seconds: 4
Epoch [1/1], Step [9607/13804], Loss: 2.6797, Perplexity: 14.5807, time_taken_in_seconds: 5
Epoch [1/1], Step [9608/13804], Loss: 2.6821, Perplexity: 14.6155, time_taken_in_seconds: 6
Epoch [1/1], Step [9609/13804], Loss: 2.0790, Perplexity: 7.9962, time_taken_in_seconds: 7
Epoch [1/1], Step [9610/13804], Loss: 2.5630, Perplexity: 12.9752, time_taken_in_seconds: 8
Epoch [1/1], Step [9611/13804], Loss: 2.5395, Perplexity: 12.6733, time_taken_in_seconds: 9
Epoch [1/1], Step [9612/13804], Loss: 2.3008, Perplexity: 9.9818, time_taken_in_seconds: 10
Epoch [1/1], Step [9613/13804], Loss: 2.5696, Perplexity: 13.0611, time_taken_in_seconds: 10
Epoch [1/1], Step [9614/13804], Loss: 2.2030, Perplexity: 9.0520, time_taken_in_seconds: 11
Epoch [1/1], Step [9615/13804], Loss: 2.3459, Perplexity: 10.4427, time_taken_in_seconds: 12
Epoch [1/1], Step [9616/13804], Loss: 2.5113, Perplexity: 12.3212, time_taken_in_seconds: 13
Epoch [1/1], Step [9617/13804], Loss: 2.4680, Perplexity: 11.7993, time_taken_in_seconds: 14
Epoch [1/1], Step [9618/13804], Loss: 2.3237, Perplexity: 10.2134, time_taken_in_seconds: 14
Epoch [1/1], Step [9619/13804], Loss: 3.0928, Perplexity: 22.0394, time_taken_in_seconds: 15
Epoch [1/1], Step [9620/13804], Loss: 2.6516, Perplexity: 14.1767, time_taken_in_seconds: 16
Epoch [1/1], Step [9621/13804], Loss: 2.3520, Perplexity: 10.5065, time_taken_in_seconds: 17
Epoch [1/1], Step [9622/13804], Loss: 2.4460, Perplexity: 11.5421, time_taken_in_seconds: 18
Epoch [1/1], Step [9623/13804], Loss: 2.8966, Perplexity: 18.1125, time_taken_in_seconds: 19
Epoch [1/1], Step [9624/13804], Loss: 2.5678, Perplexity: 13.0371, time_taken_in_seconds: 19
Epoch [1/1], Step [9625/13804], Loss: 2.9153, Perplexity: 18.4541, time_taken_in_seconds: 20
Epoch [1/1], Step [9626/13804], Loss: 2.4628, Perplexity: 11.7378, time_taken_in_seconds: 21
Epoch [1/1], Step [9627/13804], Loss: 2.5058, Perplexity: 12.2535, time_taken_in_seconds: 22
Epoch [1/1], Step [9628/13804], Loss: 2.5749, Perplexity: 13.1305, time_taken_in_seconds: 23
Epoch [1/1], Step [9629/13804], Loss: 2.5812, Perplexity: 13.2133, time_taken_in_seconds: 23
Epoch [1/1], Step [9630/13804], Loss: 3.1843, Perplexity: 24.1515, time_taken_in_seconds: 24
Epoch [1/1], Step [9631/13804], Loss: 3.0772, Perplexity: 21.6974, time_taken_in_seconds: 25
Epoch [1/1], Step [9632/13804], Loss: 2.4917, Perplexity: 12.0816, time_taken_in_seconds: 26
Epoch [1/1], Step [9633/13804], Loss: 2.7386, Perplexity: 15.4656, time_taken_in_seconds: 27
Epoch [1/1], Step [9634/13804], Loss: 2.2758, Perplexity: 9.7355, time_taken_in_seconds: 28
Epoch [1/1], Step [9635/13804], Loss: 2.4729, Perplexity: 11.8571, time_taken_in_seconds: 28
Epoch [1/1], Step [9636/13804], Loss: 2.9388, Perplexity: 18.8937, time_taken_in_seconds: 29
Epoch [1/1], Step [9637/13804], Loss: 2.5839, Perplexity: 13.2493, time_taken_in_seconds: 30
Epoch [1/1], Step [9638/13804], Loss: 2.4724, Perplexity: 11.8506, time_taken_in_seconds: 31
Epoch [1/1], Step [9639/13804], Loss: 2.6369, Perplexity: 13.9693, time_taken_in_seconds: 32
Epoch [1/1], Step [9640/13804], Loss: 2.3104, Perplexity: 10.0783, time_taken_in_seconds: 32
Epoch [1/1], Step [9641/13804], Loss: 2.4607, Perplexity: 11.7133, time_taken_in_seconds: 33
Epoch [1/1], Step [9642/13804], Loss: 2.7647, Perplexity: 15.8746, time_taken_in_seconds: 34
Epoch [1/1], Step [9643/13804], Loss: 2.6322, Perplexity: 13.9046, time_taken_in_seconds: 35
Epoch [1/1], Step [9644/13804], Loss: 2.6204, Perplexity: 13.7411, time_taken_in_seconds: 36
Epoch [1/1], Step [9645/13804], Loss: 2.7091, Perplexity: 15.0153, time_taken_in_seconds: 36
Epoch [1/1], Step [9646/13804], Loss: 2.9447, Perplexity: 19.0042, time_taken_in_seconds: 37
Epoch [1/1], Step [9647/13804], Loss: 2.5524, Perplexity: 12.8375, time_taken_in_seconds: 38
Epoch [1/1], Step [9648/13804], Loss: 2.7618, Perplexity: 15.8282, time_taken_in_seconds: 39
Epoch [1/1], Step [9649/13804], Loss: 2.5754, Perplexity: 13.1364, time_taken_in_seconds: 40
Epoch [1/1], Step [9650/13804], Loss: 2.7786, Perplexity: 16.0967, time_taken_in_seconds: 41
Epoch [1/1], Step [9651/13804], Loss: 2.2608, Perplexity: 9.5910, time_taken_in_seconds: 41
Epoch [1/1], Step [9652/13804], Loss: 2.5396, Perplexity: 12.6745, time_taken_in_seconds: 42
Epoch [1/1], Step [9653/13804], Loss: 2.8605, Perplexity: 17.4707, time_taken_in_seconds: 43
Epoch [1/1], Step [9654/13804], Loss: 2.5554, Perplexity: 12.8770, time_taken_in_seconds: 44
Epoch [1/1], Step [9655/13804], Loss: 2.3254, Perplexity: 10.2306, time_taken_in_seconds: 45
Epoch [1/1], Step [9656/13804], Loss: 2.4574, Perplexity: 11.6741, time_taken_in_seconds: 45
Epoch [1/1], Step [9657/13804], Loss: 2.5684, Perplexity: 13.0449, time_taken_in_seconds: 46
Epoch [1/1], Step [9658/13804], Loss: 2.3061, Perplexity: 10.0348, time_taken_in_seconds: 47
Epoch [1/1], Step [9659/13804], Loss: 2.5000, Perplexity: 12.1830, time_taken_in_seconds: 48
Epoch [1/1], Step [9660/13804], Loss: 2.4215, Perplexity: 11.2629, time_taken_in_seconds: 49
Epoch [1/1], Step [9661/13804], Loss: 2.7589, Perplexity: 15.7822, time_taken_in_seconds: 50
Epoch [1/1], Step [9662/13804], Loss: 3.0542, Perplexity: 21.2040, time_taken_in_seconds: 50
Epoch [1/1], Step [9663/13804], Loss: 2.6838, Perplexity: 14.6405, time_taken_in_seconds: 51
Epoch [1/1], Step [9664/13804], Loss: 2.6533, Perplexity: 14.2009, time_taken_in_seconds: 52
Epoch [1/1], Step [9665/13804], Loss: 2.5463, Perplexity: 12.7600, time_taken_in_seconds: 53
Epoch [1/1], Step [9666/13804], Loss: 2.6382, Perplexity: 13.9877, time_taken_in_seconds: 54
Epoch [1/1], Step [9667/13804], Loss: 2.3672, Perplexity: 10.6671, time_taken_in_seconds: 54
Epoch [1/1], Step [9668/13804], Loss: 2.7705, Perplexity: 15.9659, time_taken_in_seconds: 55
Epoch [1/1], Step [9669/13804], Loss: 2.7724, Perplexity: 15.9973, time_taken_in_seconds: 56
Epoch [1/1], Step [9670/13804], Loss: 2.5216, Perplexity: 12.4480, time_taken_in_seconds: 57
Epoch [1/1], Step [9671/13804], Loss: 2.7311, Perplexity: 15.3503, time_taken_in_seconds: 58
Epoch [1/1], Step [9672/13804], Loss: 2.1588, Perplexity: 8.6606, time_taken_in_seconds: 59
Epoch [1/1], Step [9673/13804], Loss: 2.7646, Perplexity: 15.8729, time_taken_in_seconds: 59
Epoch [1/1], Step [9674/13804], Loss: 2.9989, Perplexity: 20.0639, time_taken_in_seconds: 60
Epoch [1/1], Step [9675/13804], Loss: 2.4083, Perplexity: 11.1152, time_taken_in_seconds: 61
Epoch [1/1], Step [9676/13804], Loss: 2.5936, Perplexity: 13.3778, time_taken_in_seconds: 62
Epoch [1/1], Step [9677/13804], Loss: 2.5192, Perplexity: 12.4184, time_taken_in_seconds: 63
Epoch [1/1], Step [9678/13804], Loss: 2.3865, Perplexity: 10.8755, time_taken_in_seconds: 63
Epoch [1/1], Step [9679/13804], Loss: 2.6092, Perplexity: 13.5878, time_taken_in_seconds: 64
Epoch [1/1], Step [9680/13804], Loss: 2.5806, Perplexity: 13.2053, time_taken_in_seconds: 65
Epoch [1/1], Step [9681/13804], Loss: 2.4055, Perplexity: 11.0843, time_taken_in_seconds: 66
Epoch [1/1], Step [9682/13804], Loss: 2.5089, Perplexity: 12.2913, time_taken_in_seconds: 67
Epoch [1/1], Step [9683/13804], Loss: 2.2974, Perplexity: 9.9483, time_taken_in_seconds: 68
Epoch [1/1], Step [9684/13804], Loss: 2.4681, Perplexity: 11.7996, time_taken_in_seconds: 69
Epoch [1/1], Step [9685/13804], Loss: 2.7487, Perplexity: 15.6228, time_taken_in_seconds: 69
Epoch [1/1], Step [9686/13804], Loss: 2.5645, Perplexity: 12.9946, time_taken_in_seconds: 70
Epoch [1/1], Step [9687/13804], Loss: 2.6190, Perplexity: 13.7225, time_taken_in_seconds: 71
Epoch [1/1], Step [9688/13804], Loss: 2.5143, Perplexity: 12.3580, time_taken_in_seconds: 72
Epoch [1/1], Step [9689/13804], Loss: 2.4714, Perplexity: 11.8386, time_taken_in_seconds: 73
Epoch [1/1], Step [9690/13804], Loss: 2.6806, Perplexity: 14.5943, time_taken_in_seconds: 73
Epoch [1/1], Step [9691/13804], Loss: 3.1241, Perplexity: 22.7398, time_taken_in_seconds: 74
Epoch [1/1], Step [9692/13804], Loss: 2.3312, Perplexity: 10.2905, time_taken_in_seconds: 75
Epoch [1/1], Step [9693/13804], Loss: 2.4080, Perplexity: 11.1113, time_taken_in_seconds: 76
Epoch [1/1], Step [9694/13804], Loss: 2.2223, Perplexity: 9.2287, time_taken_in_seconds: 77
Epoch [1/1], Step [9695/13804], Loss: 2.3586, Perplexity: 10.5764, time_taken_in_seconds: 78
Epoch [1/1], Step [9696/13804], Loss: 2.5657, Perplexity: 13.0100, time_taken_in_seconds: 78
Epoch [1/1], Step [9697/13804], Loss: 2.6428, Perplexity: 14.0528, time_taken_in_seconds: 79
Epoch [1/1], Step [9698/13804], Loss: 2.4231, Perplexity: 11.2811, time_taken_in_seconds: 80
Epoch [1/1], Step [9699/13804], Loss: 2.3463, Perplexity: 10.4470, time_taken_in_seconds: 81
Epoch [1/1], Step [9700/13804], Loss: 2.3224, Perplexity: 10.2004, time_taken_in_seconds: 82
Epoch [1/1], Step [9701/13804], Loss: 2.3650, Perplexity: 10.6439, time_taken_in_seconds: 0
Epoch [1/1], Step [9702/13804], Loss: 2.3690, Perplexity: 10.6865, time_taken_in_seconds: 1
Epoch [1/1], Step [9703/13804], Loss: 2.4502, Perplexity: 11.5902, time_taken_in_seconds: 2
Epoch [1/1], Step [9704/13804], Loss: 2.4761, Perplexity: 11.8943, time_taken_in_seconds: 3
Epoch [1/1], Step [9705/13804], Loss: 2.6114, Perplexity: 13.6187, time_taken_in_seconds: 4
Epoch [1/1], Step [9706/13804], Loss: 2.4953, Perplexity: 12.1252, time_taken_in_seconds: 4
Epoch [1/1], Step [9707/13804], Loss: 2.4892, Perplexity: 12.0521, time_taken_in_seconds: 5
Epoch [1/1], Step [9708/13804], Loss: 2.1911, Perplexity: 8.9449, time_taken_in_seconds: 6
Epoch [1/1], Step [9709/13804], Loss: 3.0727, Perplexity: 21.5992, time_taken_in_seconds: 7
Epoch [1/1], Step [9710/13804], Loss: 2.9857, Perplexity: 19.8007, time_taken_in_seconds: 8
Epoch [1/1], Step [9711/13804], Loss: 2.4057, Perplexity: 11.0864, time_taken_in_seconds: 9
Epoch [1/1], Step [9712/13804], Loss: 2.2273, Perplexity: 9.2748, time_taken_in_seconds: 9
Epoch [1/1], Step [9713/13804], Loss: 2.8995, Perplexity: 18.1651, time_taken_in_seconds: 10
Epoch [1/1], Step [9714/13804], Loss: 2.3534, Perplexity: 10.5218, time_taken_in_seconds: 11
Epoch [1/1], Step [9715/13804], Loss: 2.5948, Perplexity: 13.3945, time_taken_in_seconds: 12
Epoch [1/1], Step [9716/13804], Loss: 2.9252, Perplexity: 18.6371, time_taken_in_seconds: 13
Epoch [1/1], Step [9717/13804], Loss: 2.8205, Perplexity: 16.7850, time_taken_in_seconds: 14
Epoch [1/1], Step [9718/13804], Loss: 2.2940, Perplexity: 9.9141, time_taken_in_seconds: 14
Epoch [1/1], Step [9719/13804], Loss: 2.7654, Perplexity: 15.8854, time_taken_in_seconds: 15
Epoch [1/1], Step [9720/13804], Loss: 2.6114, Perplexity: 13.6185, time_taken_in_seconds: 16
Epoch [1/1], Step [9721/13804], Loss: 2.5856, Perplexity: 13.2712, time_taken_in_seconds: 17
Epoch [1/1], Step [9722/13804], Loss: 2.6086, Perplexity: 13.5795, time_taken_in_seconds: 18
Epoch [1/1], Step [9723/13804], Loss: 3.1224, Perplexity: 22.7016, time_taken_in_seconds: 19
Epoch [1/1], Step [9724/13804], Loss: 3.1146, Perplexity: 22.5242, time_taken_in_seconds: 19
Epoch [1/1], Step [9725/13804], Loss: 2.6212, Perplexity: 13.7521, time_taken_in_seconds: 20
Epoch [1/1], Step [9726/13804], Loss: 2.8700, Perplexity: 17.6372, time_taken_in_seconds: 21
Epoch [1/1], Step [9727/13804], Loss: 2.1724, Perplexity: 8.7794, time_taken_in_seconds: 22
Epoch [1/1], Step [9728/13804], Loss: 2.5866, Perplexity: 13.2841, time_taken_in_seconds: 23
Epoch [1/1], Step [9729/13804], Loss: 3.3072, Perplexity: 27.3073, time_taken_in_seconds: 23
Epoch [1/1], Step [9730/13804], Loss: 3.0114, Perplexity: 20.3166, time_taken_in_seconds: 24
Epoch [1/1], Step [9731/13804], Loss: 2.5731, Perplexity: 13.1067, time_taken_in_seconds: 25
Epoch [1/1], Step [9732/13804], Loss: 3.1052, Perplexity: 22.3127, time_taken_in_seconds: 26
Epoch [1/1], Step [9733/13804], Loss: 2.4944, Perplexity: 12.1146, time_taken_in_seconds: 27
Epoch [1/1], Step [9734/13804], Loss: 2.3447, Perplexity: 10.4300, time_taken_in_seconds: 28
Epoch [1/1], Step [9735/13804], Loss: 2.8728, Perplexity: 17.6859, time_taken_in_seconds: 28
Epoch [1/1], Step [9736/13804], Loss: 2.6393, Perplexity: 14.0036, time_taken_in_seconds: 29
Epoch [1/1], Step [9737/13804], Loss: 2.6081, Perplexity: 13.5726, time_taken_in_seconds: 30
Epoch [1/1], Step [9738/13804], Loss: 2.4175, Perplexity: 11.2183, time_taken_in_seconds: 31
Epoch [1/1], Step [9739/13804], Loss: 2.4821, Perplexity: 11.9668, time_taken_in_seconds: 32
Epoch [1/1], Step [9740/13804], Loss: 2.0896, Perplexity: 8.0820, time_taken_in_seconds: 33
Epoch [1/1], Step [9741/13804], Loss: 2.2396, Perplexity: 9.3898, time_taken_in_seconds: 33
Epoch [1/1], Step [9742/13804], Loss: 2.6372, Perplexity: 13.9743, time_taken_in_seconds: 34
Epoch [1/1], Step [9743/13804], Loss: 2.6189, Perplexity: 13.7200, time_taken_in_seconds: 35
Epoch [1/1], Step [9744/13804], Loss: 2.5058, Perplexity: 12.2530, time_taken_in_seconds: 36
Epoch [1/1], Step [9745/13804], Loss: 2.7382, Perplexity: 15.4587, time_taken_in_seconds: 37
Epoch [1/1], Step [9746/13804], Loss: 2.7593, Perplexity: 15.7887, time_taken_in_seconds: 38
Epoch [1/1], Step [9747/13804], Loss: 2.3035, Perplexity: 10.0089, time_taken_in_seconds: 38
Epoch [1/1], Step [9748/13804], Loss: 3.1860, Perplexity: 24.1908, time_taken_in_seconds: 39
Epoch [1/1], Step [9749/13804], Loss: 2.4168, Perplexity: 11.2100, time_taken_in_seconds: 40
Epoch [1/1], Step [9750/13804], Loss: 2.6132, Perplexity: 13.6430, time_taken_in_seconds: 41
Epoch [1/1], Step [9751/13804], Loss: 2.4536, Perplexity: 11.6307, time_taken_in_seconds: 42
Epoch [1/1], Step [9752/13804], Loss: 2.7711, Perplexity: 15.9758, time_taken_in_seconds: 43
Epoch [1/1], Step [9753/13804], Loss: 2.4313, Perplexity: 11.3736, time_taken_in_seconds: 43
Epoch [1/1], Step [9754/13804], Loss: 2.6164, Perplexity: 13.6866, time_taken_in_seconds: 44
Epoch [1/1], Step [9755/13804], Loss: 2.4892, Perplexity: 12.0521, time_taken_in_seconds: 45
Epoch [1/1], Step [9756/13804], Loss: 2.7450, Perplexity: 15.5642, time_taken_in_seconds: 46
Epoch [1/1], Step [9757/13804], Loss: 2.5285, Perplexity: 12.5348, time_taken_in_seconds: 47
Epoch [1/1], Step [9758/13804], Loss: 2.4622, Perplexity: 11.7308, time_taken_in_seconds: 48
Epoch [1/1], Step [9759/13804], Loss: 2.3780, Perplexity: 10.7836, time_taken_in_seconds: 49
Epoch [1/1], Step [9760/13804], Loss: 2.4197, Perplexity: 11.2426, time_taken_in_seconds: 49
Epoch [1/1], Step [9761/13804], Loss: 2.5149, Perplexity: 12.3650, time_taken_in_seconds: 50
Epoch [1/1], Step [9762/13804], Loss: 2.7274, Perplexity: 15.2935, time_taken_in_seconds: 51
Epoch [1/1], Step [9763/13804], Loss: 2.4927, Perplexity: 12.0935, time_taken_in_seconds: 52
Epoch [1/1], Step [9764/13804], Loss: 2.5272, Perplexity: 12.5187, time_taken_in_seconds: 53
Epoch [1/1], Step [9765/13804], Loss: 2.5874, Perplexity: 13.2952, time_taken_in_seconds: 53
Epoch [1/1], Step [9766/13804], Loss: 2.7518, Perplexity: 15.6701, time_taken_in_seconds: 54
Epoch [1/1], Step [9767/13804], Loss: 2.7768, Perplexity: 16.0669, time_taken_in_seconds: 55
Epoch [1/1], Step [9768/13804], Loss: 2.6181, Perplexity: 13.7093, time_taken_in_seconds: 56
Epoch [1/1], Step [9769/13804], Loss: 2.5981, Perplexity: 13.4382, time_taken_in_seconds: 57
Epoch [1/1], Step [9770/13804], Loss: 2.3245, Perplexity: 10.2212, time_taken_in_seconds: 58
Epoch [1/1], Step [9771/13804], Loss: 2.4758, Perplexity: 11.8916, time_taken_in_seconds: 58
Epoch [1/1], Step [9772/13804], Loss: 2.7493, Perplexity: 15.6320, time_taken_in_seconds: 59
Epoch [1/1], Step [9773/13804], Loss: 2.4508, Perplexity: 11.5982, time_taken_in_seconds: 60
Epoch [1/1], Step [9774/13804], Loss: 2.5573, Perplexity: 12.9007, time_taken_in_seconds: 61
Epoch [1/1], Step [9775/13804], Loss: 2.4760, Perplexity: 11.8935, time_taken_in_seconds: 62
Epoch [1/1], Step [9776/13804], Loss: 2.2028, Perplexity: 9.0500, time_taken_in_seconds: 62
Epoch [1/1], Step [9777/13804], Loss: 2.6877, Perplexity: 14.6978, time_taken_in_seconds: 63
Epoch [1/1], Step [9778/13804], Loss: 3.1499, Perplexity: 23.3333, time_taken_in_seconds: 64
Epoch [1/1], Step [9779/13804], Loss: 2.6969, Perplexity: 14.8343, time_taken_in_seconds: 65
Epoch [1/1], Step [9780/13804], Loss: 2.9055, Perplexity: 18.2745, time_taken_in_seconds: 66
Epoch [1/1], Step [9781/13804], Loss: 2.3492, Perplexity: 10.4769, time_taken_in_seconds: 67
Epoch [1/1], Step [9782/13804], Loss: 2.8833, Perplexity: 17.8730, time_taken_in_seconds: 67
Epoch [1/1], Step [9783/13804], Loss: 2.4299, Perplexity: 11.3580, time_taken_in_seconds: 68
Epoch [1/1], Step [9784/13804], Loss: 2.8634, Perplexity: 17.5212, time_taken_in_seconds: 69
Epoch [1/1], Step [9785/13804], Loss: 2.2247, Perplexity: 9.2507, time_taken_in_seconds: 70
Epoch [1/1], Step [9786/13804], Loss: 2.6021, Perplexity: 13.4914, time_taken_in_seconds: 71
Epoch [1/1], Step [9787/13804], Loss: 2.2570, Perplexity: 9.5544, time_taken_in_seconds: 71
Epoch [1/1], Step [9788/13804], Loss: 2.8791, Perplexity: 17.7980, time_taken_in_seconds: 72
Epoch [1/1], Step [9789/13804], Loss: 2.4665, Perplexity: 11.7809, time_taken_in_seconds: 73
Epoch [1/1], Step [9790/13804], Loss: 2.5517, Perplexity: 12.8290, time_taken_in_seconds: 74
Epoch [1/1], Step [9791/13804], Loss: 2.6031, Perplexity: 13.5055, time_taken_in_seconds: 75
Epoch [1/1], Step [9792/13804], Loss: 2.5576, Perplexity: 12.9048, time_taken_in_seconds: 76
Epoch [1/1], Step [9793/13804], Loss: 2.3229, Perplexity: 10.2051, time_taken_in_seconds: 76
Epoch [1/1], Step [9794/13804], Loss: 2.6284, Perplexity: 13.8520, time_taken_in_seconds: 77
Epoch [1/1], Step [9795/13804], Loss: 2.4631, Perplexity: 11.7417, time_taken_in_seconds: 78
Epoch [1/1], Step [9796/13804], Loss: 2.1834, Perplexity: 8.8760, time_taken_in_seconds: 79
Epoch [1/1], Step [9797/13804], Loss: 2.6930, Perplexity: 14.7764, time_taken_in_seconds: 80
Epoch [1/1], Step [9798/13804], Loss: 2.5891, Perplexity: 13.3182, time_taken_in_seconds: 80
Epoch [1/1], Step [9799/13804], Loss: 2.8256, Perplexity: 16.8711, time_taken_in_seconds: 81
Epoch [1/1], Step [9800/13804], Loss: 2.6295, Perplexity: 13.8665, time_taken_in_seconds: 82
Epoch [1/1], Step [9801/13804], Loss: 2.5203, Perplexity: 12.4319, time_taken_in_seconds: 0
Epoch [1/1], Step [9802/13804], Loss: 2.7253, Perplexity: 15.2609, time_taken_in_seconds: 1
Epoch [1/1], Step [9803/13804], Loss: 2.4716, Perplexity: 11.8408, time_taken_in_seconds: 2
Epoch [1/1], Step [9804/13804], Loss: 2.5828, Perplexity: 13.2339, time_taken_in_seconds: 3
Epoch [1/1], Step [9805/13804], Loss: 2.4275, Perplexity: 11.3303, time_taken_in_seconds: 4
Epoch [1/1], Step [9806/13804], Loss: 2.7538, Perplexity: 15.7016, time_taken_in_seconds: 4
Epoch [1/1], Step [9807/13804], Loss: 2.8573, Perplexity: 17.4136, time_taken_in_seconds: 5
Epoch [1/1], Step [9808/13804], Loss: 2.6388, Perplexity: 13.9969, time_taken_in_seconds: 6
Epoch [1/1], Step [9809/13804], Loss: 2.5752, Perplexity: 13.1341, time_taken_in_seconds: 7
Epoch [1/1], Step [9810/13804], Loss: 2.7105, Perplexity: 15.0372, time_taken_in_seconds: 8
Epoch [1/1], Step [9811/13804], Loss: 2.6018, Perplexity: 13.4884, time_taken_in_seconds: 8
Epoch [1/1], Step [9812/13804], Loss: 2.2553, Perplexity: 9.5382, time_taken_in_seconds: 9
Epoch [1/1], Step [9813/13804], Loss: 2.4004, Perplexity: 11.0280, time_taken_in_seconds: 10
Epoch [1/1], Step [9814/13804], Loss: 2.4789, Perplexity: 11.9280, time_taken_in_seconds: 11
Epoch [1/1], Step [9815/13804], Loss: 2.5316, Perplexity: 12.5736, time_taken_in_seconds: 12
Epoch [1/1], Step [9816/13804], Loss: 2.7777, Perplexity: 16.0820, time_taken_in_seconds: 12
Epoch [1/1], Step [9817/13804], Loss: 2.8678, Perplexity: 17.5989, time_taken_in_seconds: 13
Epoch [1/1], Step [9818/13804], Loss: 2.4669, Perplexity: 11.7858, time_taken_in_seconds: 14
Epoch [1/1], Step [9819/13804], Loss: 2.7872, Perplexity: 16.2347, time_taken_in_seconds: 15
Epoch [1/1], Step [9820/13804], Loss: 2.5421, Perplexity: 12.7061, time_taken_in_seconds: 16
Epoch [1/1], Step [9821/13804], Loss: 2.7804, Perplexity: 16.1253, time_taken_in_seconds: 17
Epoch [1/1], Step [9822/13804], Loss: 2.4118, Perplexity: 11.1540, time_taken_in_seconds: 17
Epoch [1/1], Step [9823/13804], Loss: 2.4424, Perplexity: 11.5012, time_taken_in_seconds: 18
Epoch [1/1], Step [9824/13804], Loss: 2.5924, Perplexity: 13.3617, time_taken_in_seconds: 19
Epoch [1/1], Step [9825/13804], Loss: 2.5520, Perplexity: 12.8329, time_taken_in_seconds: 20
Epoch [1/1], Step [9826/13804], Loss: 2.8458, Perplexity: 17.2145, time_taken_in_seconds: 21
Epoch [1/1], Step [9827/13804], Loss: 2.4262, Perplexity: 11.3159, time_taken_in_seconds: 22
Epoch [1/1], Step [9828/13804], Loss: 2.5968, Perplexity: 13.4209, time_taken_in_seconds: 22
Epoch [1/1], Step [9829/13804], Loss: 2.7652, Perplexity: 15.8822, time_taken_in_seconds: 23
Epoch [1/1], Step [9830/13804], Loss: 2.4438, Perplexity: 11.5172, time_taken_in_seconds: 24
Epoch [1/1], Step [9831/13804], Loss: 2.4153, Perplexity: 11.1930, time_taken_in_seconds: 25
Epoch [1/1], Step [9832/13804], Loss: 2.5187, Perplexity: 12.4122, time_taken_in_seconds: 26
Epoch [1/1], Step [9833/13804], Loss: 2.6043, Perplexity: 13.5222, time_taken_in_seconds: 27
Epoch [1/1], Step [9834/13804], Loss: 2.4683, Perplexity: 11.8022, time_taken_in_seconds: 27
Epoch [1/1], Step [9835/13804], Loss: 3.2394, Perplexity: 25.5194, time_taken_in_seconds: 28
Epoch [1/1], Step [9836/13804], Loss: 2.2932, Perplexity: 9.9063, time_taken_in_seconds: 29
Epoch [1/1], Step [9837/13804], Loss: 3.0134, Perplexity: 20.3564, time_taken_in_seconds: 30
Epoch [1/1], Step [9838/13804], Loss: 3.3398, Perplexity: 28.2141, time_taken_in_seconds: 31
Epoch [1/1], Step [9839/13804], Loss: 3.2500, Perplexity: 25.7916, time_taken_in_seconds: 31
Epoch [1/1], Step [9840/13804], Loss: 2.1489, Perplexity: 8.5751, time_taken_in_seconds: 32
Epoch [1/1], Step [9841/13804], Loss: 2.4279, Perplexity: 11.3348, time_taken_in_seconds: 33
Epoch [1/1], Step [9842/13804], Loss: 2.2608, Perplexity: 9.5909, time_taken_in_seconds: 34
Epoch [1/1], Step [9843/13804], Loss: 2.4451, Perplexity: 11.5319, time_taken_in_seconds: 35
Epoch [1/1], Step [9844/13804], Loss: 2.7250, Perplexity: 15.2561, time_taken_in_seconds: 35
Epoch [1/1], Step [9845/13804], Loss: 2.7632, Perplexity: 15.8511, time_taken_in_seconds: 36
Epoch [1/1], Step [9846/13804], Loss: 2.9190, Perplexity: 18.5230, time_taken_in_seconds: 37
Epoch [1/1], Step [9847/13804], Loss: 2.5748, Perplexity: 13.1289, time_taken_in_seconds: 38
Epoch [1/1], Step [9848/13804], Loss: 2.3569, Perplexity: 10.5577, time_taken_in_seconds: 39
Epoch [1/1], Step [9849/13804], Loss: 2.6565, Perplexity: 14.2461, time_taken_in_seconds: 40
Epoch [1/1], Step [9850/13804], Loss: 2.5629, Perplexity: 12.9733, time_taken_in_seconds: 40
Epoch [1/1], Step [9851/13804], Loss: 2.8368, Perplexity: 17.0613, time_taken_in_seconds: 41
Epoch [1/1], Step [9852/13804], Loss: 2.3139, Perplexity: 10.1140, time_taken_in_seconds: 42
Epoch [1/1], Step [9853/13804], Loss: 2.3591, Perplexity: 10.5813, time_taken_in_seconds: 43
Epoch [1/1], Step [9854/13804], Loss: 2.6133, Perplexity: 13.6447, time_taken_in_seconds: 44
Epoch [1/1], Step [9855/13804], Loss: 2.6435, Perplexity: 14.0626, time_taken_in_seconds: 44
Epoch [1/1], Step [9856/13804], Loss: 2.5779, Perplexity: 13.1694, time_taken_in_seconds: 45
Epoch [1/1], Step [9857/13804], Loss: 2.0317, Perplexity: 7.6271, time_taken_in_seconds: 46
Epoch [1/1], Step [9858/13804], Loss: 2.2206, Perplexity: 9.2132, time_taken_in_seconds: 47
Epoch [1/1], Step [9859/13804], Loss: 2.8413, Perplexity: 17.1376, time_taken_in_seconds: 48
Epoch [1/1], Step [9860/13804], Loss: 2.6542, Perplexity: 14.2136, time_taken_in_seconds: 49
Epoch [1/1], Step [9861/13804], Loss: 2.6971, Perplexity: 14.8372, time_taken_in_seconds: 49
Epoch [1/1], Step [9862/13804], Loss: 2.4160, Perplexity: 11.2010, time_taken_in_seconds: 50
Epoch [1/1], Step [9863/13804], Loss: 2.5044, Perplexity: 12.2357, time_taken_in_seconds: 51
Epoch [1/1], Step [9864/13804], Loss: 2.5560, Perplexity: 12.8836, time_taken_in_seconds: 52
Epoch [1/1], Step [9865/13804], Loss: 2.3086, Perplexity: 10.0599, time_taken_in_seconds: 53
Epoch [1/1], Step [9866/13804], Loss: 2.6351, Perplexity: 13.9446, time_taken_in_seconds: 53
Epoch [1/1], Step [9867/13804], Loss: 2.4282, Perplexity: 11.3386, time_taken_in_seconds: 54
Epoch [1/1], Step [9868/13804], Loss: 2.5882, Perplexity: 13.3053, time_taken_in_seconds: 55
Epoch [1/1], Step [9869/13804], Loss: 2.1222, Perplexity: 8.3496, time_taken_in_seconds: 56
Epoch [1/1], Step [9870/13804], Loss: 2.9213, Perplexity: 18.5654, time_taken_in_seconds: 57
Epoch [1/1], Step [9871/13804], Loss: 2.5357, Perplexity: 12.6256, time_taken_in_seconds: 57
Epoch [1/1], Step [9872/13804], Loss: 2.7047, Perplexity: 14.9495, time_taken_in_seconds: 58
Epoch [1/1], Step [9873/13804], Loss: 2.8147, Perplexity: 16.6888, time_taken_in_seconds: 59
Epoch [1/1], Step [9874/13804], Loss: 2.6307, Perplexity: 13.8832, time_taken_in_seconds: 60
Epoch [1/1], Step [9875/13804], Loss: 2.7683, Perplexity: 15.9312, time_taken_in_seconds: 61
Epoch [1/1], Step [9876/13804], Loss: 2.4673, Perplexity: 11.7905, time_taken_in_seconds: 61
Epoch [1/1], Step [9877/13804], Loss: 2.7797, Perplexity: 16.1135, time_taken_in_seconds: 62
Epoch [1/1], Step [9878/13804], Loss: 2.4028, Perplexity: 11.0544, time_taken_in_seconds: 63
Epoch [1/1], Step [9879/13804], Loss: 2.6242, Perplexity: 13.7933, time_taken_in_seconds: 64
Epoch [1/1], Step [9880/13804], Loss: 2.6053, Perplexity: 13.5352, time_taken_in_seconds: 65
Epoch [1/1], Step [9881/13804], Loss: 2.4014, Perplexity: 11.0383, time_taken_in_seconds: 66
Epoch [1/1], Step [9882/13804], Loss: 2.5389, Perplexity: 12.6652, time_taken_in_seconds: 66
Epoch [1/1], Step [9883/13804], Loss: 2.3928, Perplexity: 10.9443, time_taken_in_seconds: 67
Epoch [1/1], Step [9884/13804], Loss: 3.2290, Perplexity: 25.2532, time_taken_in_seconds: 68
Epoch [1/1], Step [9885/13804], Loss: 2.7741, Perplexity: 16.0238, time_taken_in_seconds: 69
Epoch [1/1], Step [9886/13804], Loss: 2.7282, Perplexity: 15.3054, time_taken_in_seconds: 70
Epoch [1/1], Step [9887/13804], Loss: 2.7715, Perplexity: 15.9831, time_taken_in_seconds: 70
Epoch [1/1], Step [9888/13804], Loss: 2.5476, Perplexity: 12.7761, time_taken_in_seconds: 71
Epoch [1/1], Step [9889/13804], Loss: 2.6740, Perplexity: 14.4974, time_taken_in_seconds: 72
Epoch [1/1], Step [9890/13804], Loss: 2.5836, Perplexity: 13.2444, time_taken_in_seconds: 73
Epoch [1/1], Step [9891/13804], Loss: 2.7183, Perplexity: 15.1541, time_taken_in_seconds: 74
Epoch [1/1], Step [9892/13804], Loss: 2.5584, Perplexity: 12.9153, time_taken_in_seconds: 74
Epoch [1/1], Step [9893/13804], Loss: 2.9005, Perplexity: 18.1828, time_taken_in_seconds: 75
Epoch [1/1], Step [9894/13804], Loss: 2.9090, Perplexity: 18.3376, time_taken_in_seconds: 76
Epoch [1/1], Step [9895/13804], Loss: 2.4276, Perplexity: 11.3320, time_taken_in_seconds: 77
Epoch [1/1], Step [9896/13804], Loss: 2.6926, Perplexity: 14.7696, time_taken_in_seconds: 78
Epoch [1/1], Step [9897/13804], Loss: 2.3036, Perplexity: 10.0098, time_taken_in_seconds: 79
Epoch [1/1], Step [9898/13804], Loss: 2.4439, Perplexity: 11.5176, time_taken_in_seconds: 79
Epoch [1/1], Step [9899/13804], Loss: 2.3407, Perplexity: 10.3884, time_taken_in_seconds: 80
Epoch [1/1], Step [9900/13804], Loss: 2.6374, Perplexity: 13.9772, time_taken_in_seconds: 81
Epoch [1/1], Step [9901/13804], Loss: 2.8421, Perplexity: 17.1524, time_taken_in_seconds: 0
Epoch [1/1], Step [9902/13804], Loss: 2.6329, Perplexity: 13.9134, time_taken_in_seconds: 1
Epoch [1/1], Step [9903/13804], Loss: 2.6992, Perplexity: 14.8673, time_taken_in_seconds: 2
Epoch [1/1], Step [9904/13804], Loss: 2.7722, Perplexity: 15.9939, time_taken_in_seconds: 3
Epoch [1/1], Step [9905/13804], Loss: 2.5184, Perplexity: 12.4083, time_taken_in_seconds: 4
Epoch [1/1], Step [9906/13804], Loss: 2.6224, Perplexity: 13.7693, time_taken_in_seconds: 5
Epoch [1/1], Step [9907/13804], Loss: 2.8699, Perplexity: 17.6345, time_taken_in_seconds: 6
Epoch [1/1], Step [9908/13804], Loss: 2.3439, Perplexity: 10.4220, time_taken_in_seconds: 6
Epoch [1/1], Step [9909/13804], Loss: 2.7384, Perplexity: 15.4625, time_taken_in_seconds: 7
Epoch [1/1], Step [9910/13804], Loss: 2.5587, Perplexity: 12.9186, time_taken_in_seconds: 8
Epoch [1/1], Step [9911/13804], Loss: 2.4124, Perplexity: 11.1602, time_taken_in_seconds: 9
Epoch [1/1], Step [9912/13804], Loss: 2.5630, Perplexity: 12.9743, time_taken_in_seconds: 10
Epoch [1/1], Step [9913/13804], Loss: 2.6828, Perplexity: 14.6265, time_taken_in_seconds: 10
Epoch [1/1], Step [9914/13804], Loss: 2.5289, Perplexity: 12.5395, time_taken_in_seconds: 11
Epoch [1/1], Step [9915/13804], Loss: 2.4332, Perplexity: 11.3957, time_taken_in_seconds: 12
Epoch [1/1], Step [9916/13804], Loss: 2.5191, Perplexity: 12.4177, time_taken_in_seconds: 13
Epoch [1/1], Step [9917/13804], Loss: 2.7533, Perplexity: 15.6950, time_taken_in_seconds: 14
Epoch [1/1], Step [9918/13804], Loss: 2.4080, Perplexity: 11.1122, time_taken_in_seconds: 14
Epoch [1/1], Step [9919/13804], Loss: 3.0250, Perplexity: 20.5934, time_taken_in_seconds: 15
Epoch [1/1], Step [9920/13804], Loss: 2.4902, Perplexity: 12.0633, time_taken_in_seconds: 16
Epoch [1/1], Step [9921/13804], Loss: 2.4752, Perplexity: 11.8837, time_taken_in_seconds: 17
Epoch [1/1], Step [9922/13804], Loss: 2.5441, Perplexity: 12.7312, time_taken_in_seconds: 18
Epoch [1/1], Step [9923/13804], Loss: 3.1796, Perplexity: 24.0360, time_taken_in_seconds: 19
Epoch [1/1], Step [9924/13804], Loss: 2.7596, Perplexity: 15.7934, time_taken_in_seconds: 19
Epoch [1/1], Step [9925/13804], Loss: 2.2408, Perplexity: 9.4008, time_taken_in_seconds: 20
Epoch [1/1], Step [9926/13804], Loss: 2.5977, Perplexity: 13.4323, time_taken_in_seconds: 21
Epoch [1/1], Step [9927/13804], Loss: 2.4450, Perplexity: 11.5310, time_taken_in_seconds: 22
Epoch [1/1], Step [9928/13804], Loss: 2.3571, Perplexity: 10.5599, time_taken_in_seconds: 23
Epoch [1/1], Step [9929/13804], Loss: 2.6492, Perplexity: 14.1434, time_taken_in_seconds: 24
Epoch [1/1], Step [9930/13804], Loss: 2.5723, Perplexity: 13.0960, time_taken_in_seconds: 24
Epoch [1/1], Step [9931/13804], Loss: 2.5079, Perplexity: 12.2788, time_taken_in_seconds: 25
Epoch [1/1], Step [9932/13804], Loss: 2.3663, Perplexity: 10.6583, time_taken_in_seconds: 26
Epoch [1/1], Step [9933/13804], Loss: 2.3764, Perplexity: 10.7657, time_taken_in_seconds: 27
Epoch [1/1], Step [9934/13804], Loss: 2.4149, Perplexity: 11.1888, time_taken_in_seconds: 28
Epoch [1/1], Step [9935/13804], Loss: 2.6707, Perplexity: 14.4494, time_taken_in_seconds: 28
Epoch [1/1], Step [9936/13804], Loss: 2.9834, Perplexity: 19.7539, time_taken_in_seconds: 29
Epoch [1/1], Step [9937/13804], Loss: 2.7232, Perplexity: 15.2285, time_taken_in_seconds: 30
Epoch [1/1], Step [9938/13804], Loss: 2.5431, Perplexity: 12.7188, time_taken_in_seconds: 31
Epoch [1/1], Step [9939/13804], Loss: 2.4114, Perplexity: 11.1497, time_taken_in_seconds: 32
Epoch [1/1], Step [9940/13804], Loss: 2.4282, Perplexity: 11.3390, time_taken_in_seconds: 33
Epoch [1/1], Step [9941/13804], Loss: 2.5581, Perplexity: 12.9114, time_taken_in_seconds: 33
Epoch [1/1], Step [9942/13804], Loss: 2.1440, Perplexity: 8.5332, time_taken_in_seconds: 34
Epoch [1/1], Step [9943/13804], Loss: 2.3176, Perplexity: 10.1515, time_taken_in_seconds: 35
Epoch [1/1], Step [9944/13804], Loss: 2.0989, Perplexity: 8.1569, time_taken_in_seconds: 36
Epoch [1/1], Step [9945/13804], Loss: 2.5215, Perplexity: 12.4476, time_taken_in_seconds: 37
Epoch [1/1], Step [9946/13804], Loss: 2.7924, Perplexity: 16.3195, time_taken_in_seconds: 37
Epoch [1/1], Step [9947/13804], Loss: 2.7759, Perplexity: 16.0529, time_taken_in_seconds: 38
Epoch [1/1], Step [9948/13804], Loss: 2.5103, Perplexity: 12.3082, time_taken_in_seconds: 39
Epoch [1/1], Step [9949/13804], Loss: 2.6843, Perplexity: 14.6483, time_taken_in_seconds: 40
Epoch [1/1], Step [9950/13804], Loss: 2.5296, Perplexity: 12.5481, time_taken_in_seconds: 41
Epoch [1/1], Step [9951/13804], Loss: 2.3780, Perplexity: 10.7838, time_taken_in_seconds: 42
Epoch [1/1], Step [9952/13804], Loss: 2.9521, Perplexity: 19.1459, time_taken_in_seconds: 42
Epoch [1/1], Step [9953/13804], Loss: 2.5919, Perplexity: 13.3557, time_taken_in_seconds: 43
Epoch [1/1], Step [9954/13804], Loss: 2.4107, Perplexity: 11.1421, time_taken_in_seconds: 44
Epoch [1/1], Step [9955/13804], Loss: 2.9435, Perplexity: 18.9830, time_taken_in_seconds: 45
Epoch [1/1], Step [9956/13804], Loss: 2.4671, Perplexity: 11.7879, time_taken_in_seconds: 46
Epoch [1/1], Step [9957/13804], Loss: 2.6081, Perplexity: 13.5735, time_taken_in_seconds: 46
Epoch [1/1], Step [9958/13804], Loss: 2.4250, Perplexity: 11.3024, time_taken_in_seconds: 47
Epoch [1/1], Step [9959/13804], Loss: 2.4193, Perplexity: 11.2378, time_taken_in_seconds: 48
Epoch [1/1], Step [9960/13804], Loss: 2.7260, Perplexity: 15.2714, time_taken_in_seconds: 49
Epoch [1/1], Step [9961/13804], Loss: 2.6177, Perplexity: 13.7046, time_taken_in_seconds: 50
Epoch [1/1], Step [9962/13804], Loss: 2.5605, Perplexity: 12.9428, time_taken_in_seconds: 51
Epoch [1/1], Step [9963/13804], Loss: 2.5146, Perplexity: 12.3619, time_taken_in_seconds: 51
Epoch [1/1], Step [9964/13804], Loss: 2.2722, Perplexity: 9.7011, time_taken_in_seconds: 52
Epoch [1/1], Step [9965/13804], Loss: 2.5465, Perplexity: 12.7626, time_taken_in_seconds: 53
Epoch [1/1], Step [9966/13804], Loss: 2.6961, Perplexity: 14.8219, time_taken_in_seconds: 54
Epoch [1/1], Step [9967/13804], Loss: 2.4824, Perplexity: 11.9697, time_taken_in_seconds: 55
Epoch [1/1], Step [9968/13804], Loss: 2.6070, Perplexity: 13.5583, time_taken_in_seconds: 55
Epoch [1/1], Step [9969/13804], Loss: 2.2929, Perplexity: 9.9031, time_taken_in_seconds: 56
Epoch [1/1], Step [9970/13804], Loss: 2.4740, Perplexity: 11.8703, time_taken_in_seconds: 57
Epoch [1/1], Step [9971/13804], Loss: 2.3534, Perplexity: 10.5208, time_taken_in_seconds: 58
Epoch [1/1], Step [9972/13804], Loss: 2.5109, Perplexity: 12.3159, time_taken_in_seconds: 59
Epoch [1/1], Step [9973/13804], Loss: 2.4481, Perplexity: 11.5667, time_taken_in_seconds: 59
Epoch [1/1], Step [9974/13804], Loss: 2.4075, Perplexity: 11.1060, time_taken_in_seconds: 60
Epoch [1/1], Step [9975/13804], Loss: 2.7055, Perplexity: 14.9617, time_taken_in_seconds: 61
Epoch [1/1], Step [9976/13804], Loss: 2.8858, Perplexity: 17.9171, time_taken_in_seconds: 62
Epoch [1/1], Step [9977/13804], Loss: 2.8915, Perplexity: 18.0206, time_taken_in_seconds: 63
Epoch [1/1], Step [9978/13804], Loss: 2.5249, Perplexity: 12.4898, time_taken_in_seconds: 64
Epoch [1/1], Step [9979/13804], Loss: 2.4591, Perplexity: 11.6946, time_taken_in_seconds: 65
Epoch [1/1], Step [9980/13804], Loss: 2.8870, Perplexity: 17.9397, time_taken_in_seconds: 65
Epoch [1/1], Step [9981/13804], Loss: 2.7745, Perplexity: 16.0298, time_taken_in_seconds: 66
Epoch [1/1], Step [9982/13804], Loss: 2.6372, Perplexity: 13.9738, time_taken_in_seconds: 67
Epoch [1/1], Step [9983/13804], Loss: 2.4900, Perplexity: 12.0612, time_taken_in_seconds: 68
Epoch [1/1], Step [9984/13804], Loss: 2.5509, Perplexity: 12.8184, time_taken_in_seconds: 69
Epoch [1/1], Step [9985/13804], Loss: 2.4769, Perplexity: 11.9045, time_taken_in_seconds: 70
Epoch [1/1], Step [9986/13804], Loss: 3.0980, Perplexity: 22.1542, time_taken_in_seconds: 70
Epoch [1/1], Step [9987/13804], Loss: 2.6006, Perplexity: 13.4725, time_taken_in_seconds: 71
Epoch [1/1], Step [9988/13804], Loss: 2.5293, Perplexity: 12.5449, time_taken_in_seconds: 72
Epoch [1/1], Step [9989/13804], Loss: 2.6576, Perplexity: 14.2626, time_taken_in_seconds: 73
Epoch [1/1], Step [9990/13804], Loss: 2.5710, Perplexity: 13.0793, time_taken_in_seconds: 74
Epoch [1/1], Step [9991/13804], Loss: 2.3363, Perplexity: 10.3432, time_taken_in_seconds: 74
Epoch [1/1], Step [9992/13804], Loss: 2.1449, Perplexity: 8.5412, time_taken_in_seconds: 75
Epoch [1/1], Step [9993/13804], Loss: 2.5088, Perplexity: 12.2899, time_taken_in_seconds: 76
Epoch [1/1], Step [9994/13804], Loss: 2.5908, Perplexity: 13.3401, time_taken_in_seconds: 77
Epoch [1/1], Step [9995/13804], Loss: 2.5833, Perplexity: 13.2405, time_taken_in_seconds: 78
Epoch [1/1], Step [9996/13804], Loss: 3.3021, Perplexity: 27.1699, time_taken_in_seconds: 79
Epoch [1/1], Step [9997/13804], Loss: 2.5501, Perplexity: 12.8082, time_taken_in_seconds: 79
Epoch [1/1], Step [9998/13804], Loss: 2.6120, Perplexity: 13.6268, time_taken_in_seconds: 80
Epoch [1/1], Step [9999/13804], Loss: 2.5011, Perplexity: 12.1962, time_taken_in_seconds: 81
Epoch [1/1], Step [10000/13804], Loss: 2.5815, Perplexity: 13.2165, time_taken_in_seconds: 82
Epoch [1/1], Step [10001/13804], Loss: 2.8573, Perplexity: 17.4148, time_taken_in_seconds: 0
Epoch [1/1], Step [10002/13804], Loss: 2.6872, Perplexity: 14.6908, time_taken_in_seconds: 1
Epoch [1/1], Step [10003/13804], Loss: 2.7498, Perplexity: 15.6402, time_taken_in_seconds: 2
Epoch [1/1], Step [10004/13804], Loss: 2.4278, Perplexity: 11.3339, time_taken_in_seconds: 3
Epoch [1/1], Step [10005/13804], Loss: 2.7882, Perplexity: 16.2515, time_taken_in_seconds: 4
Epoch [1/1], Step [10006/13804], Loss: 2.7748, Perplexity: 16.0361, time_taken_in_seconds: 4
Epoch [1/1], Step [10007/13804], Loss: 2.8495, Perplexity: 17.2792, time_taken_in_seconds: 5
Epoch [1/1], Step [10008/13804], Loss: 2.4492, Perplexity: 11.5790, time_taken_in_seconds: 6
Epoch [1/1], Step [10009/13804], Loss: 2.4730, Perplexity: 11.8582, time_taken_in_seconds: 7
Epoch [1/1], Step [10010/13804], Loss: 2.6813, Perplexity: 14.6043, time_taken_in_seconds: 8
Epoch [1/1], Step [10011/13804], Loss: 2.4330, Perplexity: 11.3933, time_taken_in_seconds: 8
Epoch [1/1], Step [10012/13804], Loss: 2.5241, Perplexity: 12.4798, time_taken_in_seconds: 9
Epoch [1/1], Step [10013/13804], Loss: 2.1519, Perplexity: 8.6015, time_taken_in_seconds: 10
Epoch [1/1], Step [10014/13804], Loss: 2.6720, Perplexity: 14.4695, time_taken_in_seconds: 11
Epoch [1/1], Step [10015/13804], Loss: 2.2317, Perplexity: 9.3158, time_taken_in_seconds: 12
Epoch [1/1], Step [10016/13804], Loss: 2.5861, Perplexity: 13.2774, time_taken_in_seconds: 13
Epoch [1/1], Step [10017/13804], Loss: 2.6973, Perplexity: 14.8402, time_taken_in_seconds: 13
Epoch [1/1], Step [10018/13804], Loss: 2.4439, Perplexity: 11.5179, time_taken_in_seconds: 14
Epoch [1/1], Step [10019/13804], Loss: 2.3378, Perplexity: 10.3579, time_taken_in_seconds: 15
Epoch [1/1], Step [10020/13804], Loss: 2.7424, Perplexity: 15.5238, time_taken_in_seconds: 16
Epoch [1/1], Step [10021/13804], Loss: 2.6447, Perplexity: 14.0787, time_taken_in_seconds: 17
Epoch [1/1], Step [10022/13804], Loss: 2.1661, Perplexity: 8.7239, time_taken_in_seconds: 17
Epoch [1/1], Step [10023/13804], Loss: 2.8375, Perplexity: 17.0725, time_taken_in_seconds: 18
Epoch [1/1], Step [10024/13804], Loss: 2.7506, Perplexity: 15.6525, time_taken_in_seconds: 19
Epoch [1/1], Step [10025/13804], Loss: 2.3527, Perplexity: 10.5138, time_taken_in_seconds: 20
Epoch [1/1], Step [10026/13804], Loss: 2.6102, Perplexity: 13.6011, time_taken_in_seconds: 21
Epoch [1/1], Step [10027/13804], Loss: 2.7008, Perplexity: 14.8910, time_taken_in_seconds: 22
Epoch [1/1], Step [10028/13804], Loss: 2.5347, Perplexity: 12.6122, time_taken_in_seconds: 22
Epoch [1/1], Step [10029/13804], Loss: 2.4226, Perplexity: 11.2747, time_taken_in_seconds: 23
Epoch [1/1], Step [10030/13804], Loss: 3.3416, Perplexity: 28.2637, time_taken_in_seconds: 24
Epoch [1/1], Step [10031/13804], Loss: 2.5763, Perplexity: 13.1484, time_taken_in_seconds: 25
Epoch [1/1], Step [10032/13804], Loss: 2.3703, Perplexity: 10.7003, time_taken_in_seconds: 26
Epoch [1/1], Step [10033/13804], Loss: 2.3018, Perplexity: 9.9923, time_taken_in_seconds: 26
Epoch [1/1], Step [10034/13804], Loss: 2.3558, Perplexity: 10.5469, time_taken_in_seconds: 27
Epoch [1/1], Step [10035/13804], Loss: 2.6812, Perplexity: 14.6025, time_taken_in_seconds: 28
Epoch [1/1], Step [10036/13804], Loss: 2.4295, Perplexity: 11.3529, time_taken_in_seconds: 29
Epoch [1/1], Step [10037/13804], Loss: 2.6069, Perplexity: 13.5576, time_taken_in_seconds: 30
Epoch [1/1], Step [10038/13804], Loss: 2.5389, Perplexity: 12.6652, time_taken_in_seconds: 31
Epoch [1/1], Step [10039/13804], Loss: 2.8860, Perplexity: 17.9208, time_taken_in_seconds: 31
Epoch [1/1], Step [10040/13804], Loss: 2.4029, Perplexity: 11.0549, time_taken_in_seconds: 32
Epoch [1/1], Step [10041/13804], Loss: 2.7302, Perplexity: 15.3356, time_taken_in_seconds: 33
Epoch [1/1], Step [10042/13804], Loss: 2.6797, Perplexity: 14.5814, time_taken_in_seconds: 34
Epoch [1/1], Step [10043/13804], Loss: 2.4127, Perplexity: 11.1643, time_taken_in_seconds: 35
Epoch [1/1], Step [10044/13804], Loss: 2.7574, Perplexity: 15.7589, time_taken_in_seconds: 35
Epoch [1/1], Step [10045/13804], Loss: 2.4207, Perplexity: 11.2543, time_taken_in_seconds: 36
Epoch [1/1], Step [10046/13804], Loss: 2.9081, Perplexity: 18.3225, time_taken_in_seconds: 37
Epoch [1/1], Step [10047/13804], Loss: 2.5142, Perplexity: 12.3564, time_taken_in_seconds: 38
Epoch [1/1], Step [10048/13804], Loss: 2.5434, Perplexity: 12.7231, time_taken_in_seconds: 39
Epoch [1/1], Step [10049/13804], Loss: 2.6825, Perplexity: 14.6215, time_taken_in_seconds: 40
Epoch [1/1], Step [10050/13804], Loss: 2.3623, Perplexity: 10.6151, time_taken_in_seconds: 40
Epoch [1/1], Step [10051/13804], Loss: 2.3957, Perplexity: 10.9758, time_taken_in_seconds: 41
Epoch [1/1], Step [10052/13804], Loss: 2.6873, Perplexity: 14.6915, time_taken_in_seconds: 42
Epoch [1/1], Step [10053/13804], Loss: 2.6585, Perplexity: 14.2748, time_taken_in_seconds: 43
Epoch [1/1], Step [10054/13804], Loss: 2.6832, Perplexity: 14.6325, time_taken_in_seconds: 44
Epoch [1/1], Step [10055/13804], Loss: 2.2828, Perplexity: 9.8046, time_taken_in_seconds: 45
Epoch [1/1], Step [10056/13804], Loss: 2.4805, Perplexity: 11.9470, time_taken_in_seconds: 45
Epoch [1/1], Step [10057/13804], Loss: 2.6261, Perplexity: 13.8193, time_taken_in_seconds: 46
Epoch [1/1], Step [10058/13804], Loss: 2.7031, Perplexity: 14.9261, time_taken_in_seconds: 47
Epoch [1/1], Step [10059/13804], Loss: 2.9471, Perplexity: 19.0499, time_taken_in_seconds: 48
Epoch [1/1], Step [10060/13804], Loss: 2.3574, Perplexity: 10.5637, time_taken_in_seconds: 49
Epoch [1/1], Step [10061/13804], Loss: 2.2237, Perplexity: 9.2416, time_taken_in_seconds: 49
Epoch [1/1], Step [10062/13804], Loss: 2.4993, Perplexity: 12.1735, time_taken_in_seconds: 50
Epoch [1/1], Step [10063/13804], Loss: 2.6216, Perplexity: 13.7571, time_taken_in_seconds: 51
Epoch [1/1], Step [10064/13804], Loss: 2.6592, Perplexity: 14.2848, time_taken_in_seconds: 52
Epoch [1/1], Step [10065/13804], Loss: 2.7580, Perplexity: 15.7690, time_taken_in_seconds: 53
Epoch [1/1], Step [10066/13804], Loss: 2.5403, Perplexity: 12.6834, time_taken_in_seconds: 54
Epoch [1/1], Step [10067/13804], Loss: 2.4507, Perplexity: 11.5968, time_taken_in_seconds: 54
Epoch [1/1], Step [10068/13804], Loss: 2.0350, Perplexity: 7.6520, time_taken_in_seconds: 55
Epoch [1/1], Step [10069/13804], Loss: 2.5749, Perplexity: 13.1294, time_taken_in_seconds: 56
Epoch [1/1], Step [10070/13804], Loss: 2.7981, Perplexity: 16.4129, time_taken_in_seconds: 57
Epoch [1/1], Step [10071/13804], Loss: 2.7066, Perplexity: 14.9777, time_taken_in_seconds: 58
Epoch [1/1], Step [10072/13804], Loss: 2.6344, Perplexity: 13.9352, time_taken_in_seconds: 58
Epoch [1/1], Step [10073/13804], Loss: 2.6321, Perplexity: 13.9027, time_taken_in_seconds: 59
Epoch [1/1], Step [10074/13804], Loss: 2.5665, Perplexity: 13.0207, time_taken_in_seconds: 60
Epoch [1/1], Step [10075/13804], Loss: 2.3942, Perplexity: 10.9596, time_taken_in_seconds: 61
Epoch [1/1], Step [10076/13804], Loss: 2.6389, Perplexity: 13.9972, time_taken_in_seconds: 62
Epoch [1/1], Step [10077/13804], Loss: 2.3849, Perplexity: 10.8579, time_taken_in_seconds: 63
Epoch [1/1], Step [10078/13804], Loss: 2.3381, Perplexity: 10.3616, time_taken_in_seconds: 63
Epoch [1/1], Step [10079/13804], Loss: 2.5464, Perplexity: 12.7608, time_taken_in_seconds: 64
Epoch [1/1], Step [10080/13804], Loss: 2.5905, Perplexity: 13.3363, time_taken_in_seconds: 65
Epoch [1/1], Step [10081/13804], Loss: 2.6497, Perplexity: 14.1499, time_taken_in_seconds: 66
Epoch [1/1], Step [10082/13804], Loss: 2.5698, Perplexity: 13.0637, time_taken_in_seconds: 67
Epoch [1/1], Step [10083/13804], Loss: 2.5800, Perplexity: 13.1972, time_taken_in_seconds: 67
Epoch [1/1], Step [10084/13804], Loss: 2.5618, Perplexity: 12.9592, time_taken_in_seconds: 68
Epoch [1/1], Step [10085/13804], Loss: 2.3768, Perplexity: 10.7703, time_taken_in_seconds: 69
Epoch [1/1], Step [10086/13804], Loss: 2.4012, Perplexity: 11.0367, time_taken_in_seconds: 70
Epoch [1/1], Step [10087/13804], Loss: 2.7963, Perplexity: 16.3845, time_taken_in_seconds: 71
Epoch [1/1], Step [10088/13804], Loss: 2.7555, Perplexity: 15.7295, time_taken_in_seconds: 72
Epoch [1/1], Step [10089/13804], Loss: 2.7103, Perplexity: 15.0341, time_taken_in_seconds: 72
Epoch [1/1], Step [10090/13804], Loss: 2.6219, Perplexity: 13.7620, time_taken_in_seconds: 73
Epoch [1/1], Step [10091/13804], Loss: 2.6277, Perplexity: 13.8415, time_taken_in_seconds: 74
Epoch [1/1], Step [10092/13804], Loss: 2.4244, Perplexity: 11.2955, time_taken_in_seconds: 75
Epoch [1/1], Step [10093/13804], Loss: 2.4212, Perplexity: 11.2597, time_taken_in_seconds: 76
Epoch [1/1], Step [10094/13804], Loss: 2.5622, Perplexity: 12.9639, time_taken_in_seconds: 76
Epoch [1/1], Step [10095/13804], Loss: 2.5074, Perplexity: 12.2725, time_taken_in_seconds: 77
Epoch [1/1], Step [10096/13804], Loss: 2.4905, Perplexity: 12.0668, time_taken_in_seconds: 78
Epoch [1/1], Step [10097/13804], Loss: 2.5377, Perplexity: 12.6505, time_taken_in_seconds: 79
Epoch [1/1], Step [10098/13804], Loss: 2.4464, Perplexity: 11.5469, time_taken_in_seconds: 80
Epoch [1/1], Step [10099/13804], Loss: 2.6127, Perplexity: 13.6364, time_taken_in_seconds: 81
Epoch [1/1], Step [10100/13804], Loss: 2.2555, Perplexity: 9.5398, time_taken_in_seconds: 81
Epoch [1/1], Step [10101/13804], Loss: 2.4423, Perplexity: 11.4997, time_taken_in_seconds: 0
Epoch [1/1], Step [10102/13804], Loss: 2.6842, Perplexity: 14.6459, time_taken_in_seconds: 1
Epoch [1/1], Step [10103/13804], Loss: 2.7605, Perplexity: 15.8075, time_taken_in_seconds: 2
Epoch [1/1], Step [10104/13804], Loss: 2.8072, Perplexity: 16.5639, time_taken_in_seconds: 3
Epoch [1/1], Step [10105/13804], Loss: 3.0866, Perplexity: 21.9021, time_taken_in_seconds: 4
Epoch [1/1], Step [10106/13804], Loss: 2.2051, Perplexity: 9.0715, time_taken_in_seconds: 4
Epoch [1/1], Step [10107/13804], Loss: 2.5927, Perplexity: 13.3663, time_taken_in_seconds: 5
Epoch [1/1], Step [10108/13804], Loss: 2.8553, Perplexity: 17.3800, time_taken_in_seconds: 6
Epoch [1/1], Step [10109/13804], Loss: 2.2197, Perplexity: 9.2046, time_taken_in_seconds: 7
Epoch [1/1], Step [10110/13804], Loss: 2.7096, Perplexity: 15.0228, time_taken_in_seconds: 8
Epoch [1/1], Step [10111/13804], Loss: 2.4253, Perplexity: 11.3061, time_taken_in_seconds: 9
Epoch [1/1], Step [10112/13804], Loss: 2.6708, Perplexity: 14.4516, time_taken_in_seconds: 9
Epoch [1/1], Step [10113/13804], Loss: 2.6297, Perplexity: 13.8702, time_taken_in_seconds: 10
Epoch [1/1], Step [10114/13804], Loss: 2.5728, Perplexity: 13.1028, time_taken_in_seconds: 11
Epoch [1/1], Step [10115/13804], Loss: 2.5366, Perplexity: 12.6369, time_taken_in_seconds: 12
Epoch [1/1], Step [10116/13804], Loss: 2.4250, Perplexity: 11.3022, time_taken_in_seconds: 13
Epoch [1/1], Step [10117/13804], Loss: 2.8718, Perplexity: 17.6693, time_taken_in_seconds: 13
Epoch [1/1], Step [10118/13804], Loss: 2.6560, Perplexity: 14.2388, time_taken_in_seconds: 14
Epoch [1/1], Step [10119/13804], Loss: 2.1855, Perplexity: 8.8950, time_taken_in_seconds: 15
Epoch [1/1], Step [10120/13804], Loss: 2.6482, Perplexity: 14.1286, time_taken_in_seconds: 16
Epoch [1/1], Step [10121/13804], Loss: 2.5285, Perplexity: 12.5341, time_taken_in_seconds: 17
Epoch [1/1], Step [10122/13804], Loss: 2.5979, Perplexity: 13.4352, time_taken_in_seconds: 17
Epoch [1/1], Step [10123/13804], Loss: 2.4394, Perplexity: 11.4657, time_taken_in_seconds: 18
Epoch [1/1], Step [10124/13804], Loss: 3.0905, Perplexity: 21.9881, time_taken_in_seconds: 19
Epoch [1/1], Step [10125/13804], Loss: 2.3724, Perplexity: 10.7230, time_taken_in_seconds: 20
Epoch [1/1], Step [10126/13804], Loss: 2.7593, Perplexity: 15.7881, time_taken_in_seconds: 21
Epoch [1/1], Step [10127/13804], Loss: 2.6186, Perplexity: 13.7169, time_taken_in_seconds: 22
Epoch [1/1], Step [10128/13804], Loss: 2.3338, Perplexity: 10.3166, time_taken_in_seconds: 23
Epoch [1/1], Step [10129/13804], Loss: 2.4674, Perplexity: 11.7922, time_taken_in_seconds: 23
Epoch [1/1], Step [10130/13804], Loss: 3.0296, Perplexity: 20.6898, time_taken_in_seconds: 24
Epoch [1/1], Step [10131/13804], Loss: 2.8802, Perplexity: 17.8187, time_taken_in_seconds: 25
Epoch [1/1], Step [10132/13804], Loss: 2.3519, Perplexity: 10.5053, time_taken_in_seconds: 26
Epoch [1/1], Step [10133/13804], Loss: 2.4993, Perplexity: 12.1745, time_taken_in_seconds: 27
Epoch [1/1], Step [10134/13804], Loss: 2.5129, Perplexity: 12.3405, time_taken_in_seconds: 27
Epoch [1/1], Step [10135/13804], Loss: 2.4283, Perplexity: 11.3398, time_taken_in_seconds: 28
Epoch [1/1], Step [10136/13804], Loss: 2.5620, Perplexity: 12.9613, time_taken_in_seconds: 29
Epoch [1/1], Step [10137/13804], Loss: 2.6318, Perplexity: 13.8993, time_taken_in_seconds: 30
Epoch [1/1], Step [10138/13804], Loss: 2.7781, Perplexity: 16.0881, time_taken_in_seconds: 31
Epoch [1/1], Step [10139/13804], Loss: 2.4534, Perplexity: 11.6282, time_taken_in_seconds: 32
Epoch [1/1], Step [10140/13804], Loss: 2.5545, Perplexity: 12.8646, time_taken_in_seconds: 32
Epoch [1/1], Step [10141/13804], Loss: 2.2256, Perplexity: 9.2594, time_taken_in_seconds: 33
Epoch [1/1], Step [10142/13804], Loss: 2.7420, Perplexity: 15.5177, time_taken_in_seconds: 34
Epoch [1/1], Step [10143/13804], Loss: 2.4608, Perplexity: 11.7145, time_taken_in_seconds: 35
Epoch [1/1], Step [10144/13804], Loss: 2.8202, Perplexity: 16.7809, time_taken_in_seconds: 36
Epoch [1/1], Step [10145/13804], Loss: 2.7790, Perplexity: 16.1037, time_taken_in_seconds: 37
Epoch [1/1], Step [10146/13804], Loss: 2.3688, Perplexity: 10.6843, time_taken_in_seconds: 37
Epoch [1/1], Step [10147/13804], Loss: 2.3352, Perplexity: 10.3316, time_taken_in_seconds: 38
Epoch [1/1], Step [10148/13804], Loss: 2.4690, Perplexity: 11.8104, time_taken_in_seconds: 39
Epoch [1/1], Step [10149/13804], Loss: 2.8514, Perplexity: 17.3124, time_taken_in_seconds: 40
Epoch [1/1], Step [10150/13804], Loss: 2.5248, Perplexity: 12.4879, time_taken_in_seconds: 41
Epoch [1/1], Step [10151/13804], Loss: 2.5954, Perplexity: 13.4016, time_taken_in_seconds: 41
Epoch [1/1], Step [10152/13804], Loss: 2.4324, Perplexity: 11.3861, time_taken_in_seconds: 42
Epoch [1/1], Step [10153/13804], Loss: 2.4118, Perplexity: 11.1537, time_taken_in_seconds: 43
Epoch [1/1], Step [10154/13804], Loss: 2.7989, Perplexity: 16.4271, time_taken_in_seconds: 44
Epoch [1/1], Step [10155/13804], Loss: 2.7411, Perplexity: 15.5043, time_taken_in_seconds: 45
Epoch [1/1], Step [10156/13804], Loss: 3.1724, Perplexity: 23.8645, time_taken_in_seconds: 45
Epoch [1/1], Step [10157/13804], Loss: 2.4464, Perplexity: 11.5471, time_taken_in_seconds: 46
Epoch [1/1], Step [10158/13804], Loss: 2.4351, Perplexity: 11.4173, time_taken_in_seconds: 47
Epoch [1/1], Step [10159/13804], Loss: 2.6071, Perplexity: 13.5594, time_taken_in_seconds: 48
Epoch [1/1], Step [10160/13804], Loss: 2.4540, Perplexity: 11.6349, time_taken_in_seconds: 49
Epoch [1/1], Step [10161/13804], Loss: 2.6299, Perplexity: 13.8719, time_taken_in_seconds: 50
Epoch [1/1], Step [10162/13804], Loss: 2.5166, Perplexity: 12.3860, time_taken_in_seconds: 50
Epoch [1/1], Step [10163/13804], Loss: 2.7043, Perplexity: 14.9442, time_taken_in_seconds: 51
Epoch [1/1], Step [10164/13804], Loss: 2.5966, Perplexity: 13.4182, time_taken_in_seconds: 52
Epoch [1/1], Step [10165/13804], Loss: 2.5141, Perplexity: 12.3554, time_taken_in_seconds: 53
Epoch [1/1], Step [10166/13804], Loss: 2.4858, Perplexity: 12.0102, time_taken_in_seconds: 54
Epoch [1/1], Step [10167/13804], Loss: 2.6536, Perplexity: 14.2056, time_taken_in_seconds: 54
Epoch [1/1], Step [10168/13804], Loss: 2.3036, Perplexity: 10.0098, time_taken_in_seconds: 55
Epoch [1/1], Step [10169/13804], Loss: 2.5810, Perplexity: 13.2107, time_taken_in_seconds: 56
Epoch [1/1], Step [10170/13804], Loss: 2.7213, Perplexity: 15.1997, time_taken_in_seconds: 57
Epoch [1/1], Step [10171/13804], Loss: 2.6279, Perplexity: 13.8441, time_taken_in_seconds: 58
Epoch [1/1], Step [10172/13804], Loss: 2.5618, Perplexity: 12.9591, time_taken_in_seconds: 58
Epoch [1/1], Step [10173/13804], Loss: 2.2759, Perplexity: 9.7366, time_taken_in_seconds: 59
Epoch [1/1], Step [10174/13804], Loss: 3.1427, Perplexity: 23.1675, time_taken_in_seconds: 60
Epoch [1/1], Step [10175/13804], Loss: 3.0844, Perplexity: 21.8539, time_taken_in_seconds: 61
Epoch [1/1], Step [10176/13804], Loss: 2.1917, Perplexity: 8.9506, time_taken_in_seconds: 62
Epoch [1/1], Step [10177/13804], Loss: 2.7730, Perplexity: 16.0061, time_taken_in_seconds: 63
Epoch [1/1], Step [10178/13804], Loss: 3.0880, Perplexity: 21.9336, time_taken_in_seconds: 63
Epoch [1/1], Step [10179/13804], Loss: 2.5066, Perplexity: 12.2637, time_taken_in_seconds: 64
Epoch [1/1], Step [10180/13804], Loss: 2.3792, Perplexity: 10.7966, time_taken_in_seconds: 65
Epoch [1/1], Step [10181/13804], Loss: 2.2478, Perplexity: 9.4665, time_taken_in_seconds: 66
Epoch [1/1], Step [10182/13804], Loss: 2.4980, Perplexity: 12.1586, time_taken_in_seconds: 67
Epoch [1/1], Step [10183/13804], Loss: 2.9217, Perplexity: 18.5731, time_taken_in_seconds: 68
Epoch [1/1], Step [10184/13804], Loss: 2.5999, Perplexity: 13.4621, time_taken_in_seconds: 68
Epoch [1/1], Step [10185/13804], Loss: 2.2797, Perplexity: 9.7734, time_taken_in_seconds: 69
Epoch [1/1], Step [10186/13804], Loss: 2.3445, Perplexity: 10.4276, time_taken_in_seconds: 70
Epoch [1/1], Step [10187/13804], Loss: 2.8730, Perplexity: 17.6892, time_taken_in_seconds: 71
Epoch [1/1], Step [10188/13804], Loss: 3.1011, Perplexity: 22.2227, time_taken_in_seconds: 72
Epoch [1/1], Step [10189/13804], Loss: 2.5719, Perplexity: 13.0903, time_taken_in_seconds: 72
Epoch [1/1], Step [10190/13804], Loss: 2.4143, Perplexity: 11.1817, time_taken_in_seconds: 73
Epoch [1/1], Step [10191/13804], Loss: 2.2473, Perplexity: 9.4623, time_taken_in_seconds: 74
Epoch [1/1], Step [10192/13804], Loss: 2.1735, Perplexity: 8.7887, time_taken_in_seconds: 75
Epoch [1/1], Step [10193/13804], Loss: 2.5471, Perplexity: 12.7696, time_taken_in_seconds: 76
Epoch [1/1], Step [10194/13804], Loss: 2.5645, Perplexity: 12.9936, time_taken_in_seconds: 77
Epoch [1/1], Step [10195/13804], Loss: 2.5869, Perplexity: 13.2880, time_taken_in_seconds: 77
Epoch [1/1], Step [10196/13804], Loss: 2.2619, Perplexity: 9.6010, time_taken_in_seconds: 78
Epoch [1/1], Step [10197/13804], Loss: 2.5882, Perplexity: 13.3052, time_taken_in_seconds: 79
Epoch [1/1], Step [10198/13804], Loss: 2.4349, Perplexity: 11.4152, time_taken_in_seconds: 80
Epoch [1/1], Step [10199/13804], Loss: 2.5558, Perplexity: 12.8821, time_taken_in_seconds: 81
Epoch [1/1], Step [10200/13804], Loss: 2.4383, Perplexity: 11.4536, time_taken_in_seconds: 82
Epoch [1/1], Step [10201/13804], Loss: 2.1261, Perplexity: 8.3823, time_taken_in_seconds: 0
Epoch [1/1], Step [10202/13804], Loss: 2.5330, Perplexity: 12.5906, time_taken_in_seconds: 1
Epoch [1/1], Step [10203/13804], Loss: 2.6658, Perplexity: 14.3788, time_taken_in_seconds: 2
Epoch [1/1], Step [10204/13804], Loss: 2.6376, Perplexity: 13.9794, time_taken_in_seconds: 3
Epoch [1/1], Step [10205/13804], Loss: 2.2291, Perplexity: 9.2914, time_taken_in_seconds: 4
Epoch [1/1], Step [10206/13804], Loss: 2.9550, Perplexity: 19.2009, time_taken_in_seconds: 4
Epoch [1/1], Step [10207/13804], Loss: 2.5276, Perplexity: 12.5240, time_taken_in_seconds: 5
Epoch [1/1], Step [10208/13804], Loss: 2.4503, Perplexity: 11.5921, time_taken_in_seconds: 6
Epoch [1/1], Step [10209/13804], Loss: 2.3426, Perplexity: 10.4088, time_taken_in_seconds: 7
Epoch [1/1], Step [10210/13804], Loss: 2.5378, Perplexity: 12.6512, time_taken_in_seconds: 8
Epoch [1/1], Step [10211/13804], Loss: 2.6225, Perplexity: 13.7699, time_taken_in_seconds: 8
Epoch [1/1], Step [10212/13804], Loss: 2.3127, Perplexity: 10.1014, time_taken_in_seconds: 9
Epoch [1/1], Step [10213/13804], Loss: 2.4196, Perplexity: 11.2417, time_taken_in_seconds: 10
Epoch [1/1], Step [10214/13804], Loss: 2.2058, Perplexity: 9.0775, time_taken_in_seconds: 11
Epoch [1/1], Step [10215/13804], Loss: 2.5989, Perplexity: 13.4485, time_taken_in_seconds: 12
Epoch [1/1], Step [10216/13804], Loss: 2.3269, Perplexity: 10.2463, time_taken_in_seconds: 13
Epoch [1/1], Step [10217/13804], Loss: 2.4351, Perplexity: 11.4173, time_taken_in_seconds: 14
Epoch [1/1], Step [10218/13804], Loss: 2.5470, Perplexity: 12.7693, time_taken_in_seconds: 14
Epoch [1/1], Step [10219/13804], Loss: 3.0715, Perplexity: 21.5734, time_taken_in_seconds: 15
Epoch [1/1], Step [10220/13804], Loss: 2.6994, Perplexity: 14.8712, time_taken_in_seconds: 16
Epoch [1/1], Step [10221/13804], Loss: 2.7214, Perplexity: 15.2019, time_taken_in_seconds: 17
Epoch [1/1], Step [10222/13804], Loss: 2.4651, Perplexity: 11.7649, time_taken_in_seconds: 18
Epoch [1/1], Step [10223/13804], Loss: 3.5248, Perplexity: 33.9469, time_taken_in_seconds: 18
Epoch [1/1], Step [10224/13804], Loss: 2.4502, Perplexity: 11.5901, time_taken_in_seconds: 19
Epoch [1/1], Step [10225/13804], Loss: 2.4993, Perplexity: 12.1744, time_taken_in_seconds: 20
Epoch [1/1], Step [10226/13804], Loss: 2.5281, Perplexity: 12.5295, time_taken_in_seconds: 21
Epoch [1/1], Step [10227/13804], Loss: 2.2970, Perplexity: 9.9442, time_taken_in_seconds: 22
Epoch [1/1], Step [10228/13804], Loss: 2.4644, Perplexity: 11.7564, time_taken_in_seconds: 22
Epoch [1/1], Step [10229/13804], Loss: 2.8704, Perplexity: 17.6446, time_taken_in_seconds: 23
Epoch [1/1], Step [10230/13804], Loss: 2.5642, Perplexity: 12.9902, time_taken_in_seconds: 24
Epoch [1/1], Step [10231/13804], Loss: 2.4856, Perplexity: 12.0083, time_taken_in_seconds: 25
Epoch [1/1], Step [10232/13804], Loss: 2.4135, Perplexity: 11.1735, time_taken_in_seconds: 26
Epoch [1/1], Step [10233/13804], Loss: 2.6469, Perplexity: 14.1100, time_taken_in_seconds: 27
Epoch [1/1], Step [10234/13804], Loss: 2.5755, Perplexity: 13.1385, time_taken_in_seconds: 27
Epoch [1/1], Step [10235/13804], Loss: 2.7533, Perplexity: 15.6945, time_taken_in_seconds: 28
Epoch [1/1], Step [10236/13804], Loss: 2.3900, Perplexity: 10.9137, time_taken_in_seconds: 29
Epoch [1/1], Step [10237/13804], Loss: 2.4952, Perplexity: 12.1245, time_taken_in_seconds: 30
Epoch [1/1], Step [10238/13804], Loss: 2.3544, Perplexity: 10.5314, time_taken_in_seconds: 31
Epoch [1/1], Step [10239/13804], Loss: 2.5333, Perplexity: 12.5949, time_taken_in_seconds: 31
Epoch [1/1], Step [10240/13804], Loss: 2.3599, Perplexity: 10.5896, time_taken_in_seconds: 32
Epoch [1/1], Step [10241/13804], Loss: 3.0474, Perplexity: 21.0596, time_taken_in_seconds: 33
Epoch [1/1], Step [10242/13804], Loss: 3.5921, Perplexity: 36.3103, time_taken_in_seconds: 34
Epoch [1/1], Step [10243/13804], Loss: 2.4512, Perplexity: 11.6019, time_taken_in_seconds: 35
Epoch [1/1], Step [10244/13804], Loss: 2.6593, Perplexity: 14.2856, time_taken_in_seconds: 36
Epoch [1/1], Step [10245/13804], Loss: 2.4924, Perplexity: 12.0909, time_taken_in_seconds: 36
Epoch [1/1], Step [10246/13804], Loss: 2.7397, Perplexity: 15.4828, time_taken_in_seconds: 37
Epoch [1/1], Step [10247/13804], Loss: 2.5730, Perplexity: 13.1052, time_taken_in_seconds: 38
Epoch [1/1], Step [10248/13804], Loss: 2.5036, Perplexity: 12.2258, time_taken_in_seconds: 39
Epoch [1/1], Step [10249/13804], Loss: 2.6096, Perplexity: 13.5933, time_taken_in_seconds: 40
Epoch [1/1], Step [10250/13804], Loss: 2.7358, Perplexity: 15.4225, time_taken_in_seconds: 40
Epoch [1/1], Step [10251/13804], Loss: 2.3874, Perplexity: 10.8850, time_taken_in_seconds: 41
Epoch [1/1], Step [10252/13804], Loss: 2.4923, Perplexity: 12.0893, time_taken_in_seconds: 42
Epoch [1/1], Step [10253/13804], Loss: 2.6166, Perplexity: 13.6887, time_taken_in_seconds: 43
Epoch [1/1], Step [10254/13804], Loss: 2.7680, Perplexity: 15.9275, time_taken_in_seconds: 44
Epoch [1/1], Step [10255/13804], Loss: 2.5323, Perplexity: 12.5826, time_taken_in_seconds: 45
Epoch [1/1], Step [10256/13804], Loss: 2.1859, Perplexity: 8.8982, time_taken_in_seconds: 45
Epoch [1/1], Step [10257/13804], Loss: 2.9421, Perplexity: 18.9559, time_taken_in_seconds: 46
Epoch [1/1], Step [10258/13804], Loss: 3.3229, Perplexity: 27.7400, time_taken_in_seconds: 47
Epoch [1/1], Step [10259/13804], Loss: 3.1199, Perplexity: 22.6437, time_taken_in_seconds: 48
Epoch [1/1], Step [10260/13804], Loss: 2.8086, Perplexity: 16.5868, time_taken_in_seconds: 49
Epoch [1/1], Step [10261/13804], Loss: 2.5141, Perplexity: 12.3554, time_taken_in_seconds: 49
Epoch [1/1], Step [10262/13804], Loss: 2.3732, Perplexity: 10.7317, time_taken_in_seconds: 50
Epoch [1/1], Step [10263/13804], Loss: 2.5791, Perplexity: 13.1854, time_taken_in_seconds: 51
Epoch [1/1], Step [10264/13804], Loss: 2.5807, Perplexity: 13.2059, time_taken_in_seconds: 52
Epoch [1/1], Step [10265/13804], Loss: 2.5098, Perplexity: 12.3020, time_taken_in_seconds: 53
Epoch [1/1], Step [10266/13804], Loss: 2.7998, Perplexity: 16.4417, time_taken_in_seconds: 54
Epoch [1/1], Step [10267/13804], Loss: 2.9213, Perplexity: 18.5651, time_taken_in_seconds: 54
Epoch [1/1], Step [10268/13804], Loss: 2.3512, Perplexity: 10.4977, time_taken_in_seconds: 55
Epoch [1/1], Step [10269/13804], Loss: 2.3251, Perplexity: 10.2275, time_taken_in_seconds: 56
Epoch [1/1], Step [10270/13804], Loss: 2.4266, Perplexity: 11.3200, time_taken_in_seconds: 57
Epoch [1/1], Step [10271/13804], Loss: 2.3615, Perplexity: 10.6069, time_taken_in_seconds: 58
Epoch [1/1], Step [10272/13804], Loss: 2.5667, Perplexity: 13.0227, time_taken_in_seconds: 58
Epoch [1/1], Step [10273/13804], Loss: 2.4101, Perplexity: 11.1353, time_taken_in_seconds: 59
Epoch [1/1], Step [10274/13804], Loss: 2.7531, Perplexity: 15.6910, time_taken_in_seconds: 60
Epoch [1/1], Step [10275/13804], Loss: 2.6882, Perplexity: 14.7057, time_taken_in_seconds: 61
Epoch [1/1], Step [10276/13804], Loss: 2.6385, Perplexity: 13.9922, time_taken_in_seconds: 62
Epoch [1/1], Step [10277/13804], Loss: 2.6017, Perplexity: 13.4863, time_taken_in_seconds: 63
Epoch [1/1], Step [10278/13804], Loss: 2.6143, Perplexity: 13.6576, time_taken_in_seconds: 64
Epoch [1/1], Step [10279/13804], Loss: 3.2202, Perplexity: 25.0329, time_taken_in_seconds: 64
Epoch [1/1], Step [10280/13804], Loss: 2.5747, Perplexity: 13.1273, time_taken_in_seconds: 65
Epoch [1/1], Step [10281/13804], Loss: 2.2524, Perplexity: 9.5103, time_taken_in_seconds: 66
Epoch [1/1], Step [10282/13804], Loss: 2.5782, Perplexity: 13.1729, time_taken_in_seconds: 67
Epoch [1/1], Step [10283/13804], Loss: 2.3782, Perplexity: 10.7858, time_taken_in_seconds: 68
Epoch [1/1], Step [10284/13804], Loss: 2.6554, Perplexity: 14.2305, time_taken_in_seconds: 68
Epoch [1/1], Step [10285/13804], Loss: 2.3062, Perplexity: 10.0366, time_taken_in_seconds: 69
Epoch [1/1], Step [10286/13804], Loss: 2.2570, Perplexity: 9.5541, time_taken_in_seconds: 70
Epoch [1/1], Step [10287/13804], Loss: 2.5975, Perplexity: 13.4300, time_taken_in_seconds: 71
Epoch [1/1], Step [10288/13804], Loss: 2.3413, Perplexity: 10.3950, time_taken_in_seconds: 72
Epoch [1/1], Step [10289/13804], Loss: 3.0307, Perplexity: 20.7116, time_taken_in_seconds: 73
Epoch [1/1], Step [10290/13804], Loss: 2.8438, Perplexity: 17.1806, time_taken_in_seconds: 73
Epoch [1/1], Step [10291/13804], Loss: 2.5123, Perplexity: 12.3329, time_taken_in_seconds: 74
Epoch [1/1], Step [10292/13804], Loss: 2.7219, Perplexity: 15.2097, time_taken_in_seconds: 75
Epoch [1/1], Step [10293/13804], Loss: 2.2159, Perplexity: 9.1693, time_taken_in_seconds: 76
Epoch [1/1], Step [10294/13804], Loss: 2.3737, Perplexity: 10.7371, time_taken_in_seconds: 77
Epoch [1/1], Step [10295/13804], Loss: 2.4031, Perplexity: 11.0570, time_taken_in_seconds: 78
Epoch [1/1], Step [10296/13804], Loss: 2.3290, Perplexity: 10.2672, time_taken_in_seconds: 78
Epoch [1/1], Step [10297/13804], Loss: 2.5246, Perplexity: 12.4855, time_taken_in_seconds: 79
Epoch [1/1], Step [10298/13804], Loss: 2.6280, Perplexity: 13.8456, time_taken_in_seconds: 80
Epoch [1/1], Step [10299/13804], Loss: 2.4749, Perplexity: 11.8803, time_taken_in_seconds: 81
Epoch [1/1], Step [10300/13804], Loss: 2.4018, Perplexity: 11.0434, time_taken_in_seconds: 82
Epoch [1/1], Step [10301/13804], Loss: 2.5890, Perplexity: 13.3159, time_taken_in_seconds: 0
Epoch [1/1], Step [10302/13804], Loss: 2.4664, Perplexity: 11.7802, time_taken_in_seconds: 1
Epoch [1/1], Step [10303/13804], Loss: 2.7862, Perplexity: 16.2194, time_taken_in_seconds: 2
Epoch [1/1], Step [10304/13804], Loss: 3.0488, Perplexity: 21.0906, time_taken_in_seconds: 3
Epoch [1/1], Step [10305/13804], Loss: 2.5005, Perplexity: 12.1882, time_taken_in_seconds: 4
Epoch [1/1], Step [10306/13804], Loss: 2.1947, Perplexity: 8.9772, time_taken_in_seconds: 4
Epoch [1/1], Step [10307/13804], Loss: 2.6641, Perplexity: 14.3551, time_taken_in_seconds: 5
Epoch [1/1], Step [10308/13804], Loss: 2.8876, Perplexity: 17.9506, time_taken_in_seconds: 6
Epoch [1/1], Step [10309/13804], Loss: 2.7194, Perplexity: 15.1713, time_taken_in_seconds: 7
Epoch [1/1], Step [10310/13804], Loss: 2.5852, Perplexity: 13.2661, time_taken_in_seconds: 8
Epoch [1/1], Step [10311/13804], Loss: 3.1549, Perplexity: 23.4516, time_taken_in_seconds: 9
Epoch [1/1], Step [10312/13804], Loss: 2.7188, Perplexity: 15.1616, time_taken_in_seconds: 9
Epoch [1/1], Step [10313/13804], Loss: 2.7028, Perplexity: 14.9220, time_taken_in_seconds: 10
Epoch [1/1], Step [10314/13804], Loss: 2.5542, Perplexity: 12.8607, time_taken_in_seconds: 11
Epoch [1/1], Step [10315/13804], Loss: 2.6352, Perplexity: 13.9466, time_taken_in_seconds: 12
Epoch [1/1], Step [10316/13804], Loss: 2.2692, Perplexity: 9.6718, time_taken_in_seconds: 13
Epoch [1/1], Step [10317/13804], Loss: 2.4934, Perplexity: 12.1026, time_taken_in_seconds: 13
Epoch [1/1], Step [10318/13804], Loss: 2.5921, Perplexity: 13.3578, time_taken_in_seconds: 14
Epoch [1/1], Step [10319/13804], Loss: 2.5755, Perplexity: 13.1375, time_taken_in_seconds: 15
Epoch [1/1], Step [10320/13804], Loss: 2.9708, Perplexity: 19.5084, time_taken_in_seconds: 16
Epoch [1/1], Step [10321/13804], Loss: 2.1082, Perplexity: 8.2332, time_taken_in_seconds: 17
Epoch [1/1], Step [10322/13804], Loss: 2.6637, Perplexity: 14.3488, time_taken_in_seconds: 18
Epoch [1/1], Step [10323/13804], Loss: 2.8359, Perplexity: 17.0460, time_taken_in_seconds: 18
Epoch [1/1], Step [10324/13804], Loss: 2.8052, Perplexity: 16.5307, time_taken_in_seconds: 19
Epoch [1/1], Step [10325/13804], Loss: 2.7220, Perplexity: 15.2112, time_taken_in_seconds: 20
Epoch [1/1], Step [10326/13804], Loss: 2.3463, Perplexity: 10.4469, time_taken_in_seconds: 21
Epoch [1/1], Step [10327/13804], Loss: 2.7161, Perplexity: 15.1209, time_taken_in_seconds: 22
Epoch [1/1], Step [10328/13804], Loss: 2.7270, Perplexity: 15.2869, time_taken_in_seconds: 22
Epoch [1/1], Step [10329/13804], Loss: 2.5323, Perplexity: 12.5826, time_taken_in_seconds: 23
Epoch [1/1], Step [10330/13804], Loss: 2.5341, Perplexity: 12.6051, time_taken_in_seconds: 24
Epoch [1/1], Step [10331/13804], Loss: 2.6425, Perplexity: 14.0483, time_taken_in_seconds: 25
Epoch [1/1], Step [10332/13804], Loss: 2.4818, Perplexity: 11.9624, time_taken_in_seconds: 26
Epoch [1/1], Step [10333/13804], Loss: 2.5068, Perplexity: 12.2658, time_taken_in_seconds: 27
Epoch [1/1], Step [10334/13804], Loss: 2.7988, Perplexity: 16.4252, time_taken_in_seconds: 27
Epoch [1/1], Step [10335/13804], Loss: 2.5476, Perplexity: 12.7770, time_taken_in_seconds: 28
Epoch [1/1], Step [10336/13804], Loss: 2.8030, Perplexity: 16.4936, time_taken_in_seconds: 29
Epoch [1/1], Step [10337/13804], Loss: 2.4706, Perplexity: 11.8294, time_taken_in_seconds: 30
Epoch [1/1], Step [10338/13804], Loss: 2.4219, Perplexity: 11.2677, time_taken_in_seconds: 31
Epoch [1/1], Step [10339/13804], Loss: 3.0385, Perplexity: 20.8740, time_taken_in_seconds: 32
Epoch [1/1], Step [10340/13804], Loss: 2.4973, Perplexity: 12.1500, time_taken_in_seconds: 32
Epoch [1/1], Step [10341/13804], Loss: 2.7690, Perplexity: 15.9433, time_taken_in_seconds: 33
Epoch [1/1], Step [10342/13804], Loss: 2.4307, Perplexity: 11.3672, time_taken_in_seconds: 34
Epoch [1/1], Step [10343/13804], Loss: 2.5120, Perplexity: 12.3296, time_taken_in_seconds: 35
Epoch [1/1], Step [10344/13804], Loss: 2.4493, Perplexity: 11.5805, time_taken_in_seconds: 36
Epoch [1/1], Step [10345/13804], Loss: 2.6273, Perplexity: 13.8366, time_taken_in_seconds: 36
Epoch [1/1], Step [10346/13804], Loss: 2.8188, Perplexity: 16.7570, time_taken_in_seconds: 37
Epoch [1/1], Step [10347/13804], Loss: 2.3569, Perplexity: 10.5586, time_taken_in_seconds: 38
Epoch [1/1], Step [10348/13804], Loss: 2.5647, Perplexity: 12.9964, time_taken_in_seconds: 39
Epoch [1/1], Step [10349/13804], Loss: 2.6346, Perplexity: 13.9374, time_taken_in_seconds: 40
Epoch [1/1], Step [10350/13804], Loss: 2.6520, Perplexity: 14.1823, time_taken_in_seconds: 41
Epoch [1/1], Step [10351/13804], Loss: 3.1695, Perplexity: 23.7950, time_taken_in_seconds: 42
Epoch [1/1], Step [10352/13804], Loss: 2.5836, Perplexity: 13.2445, time_taken_in_seconds: 42
Epoch [1/1], Step [10353/13804], Loss: 2.5123, Perplexity: 12.3329, time_taken_in_seconds: 43
Epoch [1/1], Step [10354/13804], Loss: 2.4742, Perplexity: 11.8726, time_taken_in_seconds: 44
Epoch [1/1], Step [10355/13804], Loss: 3.0555, Perplexity: 21.2315, time_taken_in_seconds: 45
Epoch [1/1], Step [10356/13804], Loss: 2.4972, Perplexity: 12.1483, time_taken_in_seconds: 46
Epoch [1/1], Step [10357/13804], Loss: 2.8532, Perplexity: 17.3434, time_taken_in_seconds: 46
Epoch [1/1], Step [10358/13804], Loss: 2.6372, Perplexity: 13.9743, time_taken_in_seconds: 47
Epoch [1/1], Step [10359/13804], Loss: 2.4409, Perplexity: 11.4829, time_taken_in_seconds: 48
Epoch [1/1], Step [10360/13804], Loss: 2.5543, Perplexity: 12.8622, time_taken_in_seconds: 49
Epoch [1/1], Step [10361/13804], Loss: 2.6815, Perplexity: 14.6076, time_taken_in_seconds: 50
Epoch [1/1], Step [10362/13804], Loss: 2.5634, Perplexity: 12.9801, time_taken_in_seconds: 51
Epoch [1/1], Step [10363/13804], Loss: 2.1740, Perplexity: 8.7933, time_taken_in_seconds: 51
Epoch [1/1], Step [10364/13804], Loss: 2.2554, Perplexity: 9.5386, time_taken_in_seconds: 52
Epoch [1/1], Step [10365/13804], Loss: 2.3887, Perplexity: 10.8996, time_taken_in_seconds: 53
Epoch [1/1], Step [10366/13804], Loss: 2.3920, Perplexity: 10.9356, time_taken_in_seconds: 54
Epoch [1/1], Step [10367/13804], Loss: 2.6251, Perplexity: 13.8065, time_taken_in_seconds: 55
Epoch [1/1], Step [10368/13804], Loss: 2.7088, Perplexity: 15.0115, time_taken_in_seconds: 55
Epoch [1/1], Step [10369/13804], Loss: 2.6417, Perplexity: 14.0375, time_taken_in_seconds: 56
Epoch [1/1], Step [10370/13804], Loss: 2.2341, Perplexity: 9.3379, time_taken_in_seconds: 57
Epoch [1/1], Step [10371/13804], Loss: 2.6329, Perplexity: 13.9142, time_taken_in_seconds: 58
Epoch [1/1], Step [10372/13804], Loss: 2.5579, Perplexity: 12.9085, time_taken_in_seconds: 59
Epoch [1/1], Step [10373/13804], Loss: 2.3229, Perplexity: 10.2052, time_taken_in_seconds: 60
Epoch [1/1], Step [10374/13804], Loss: 2.5230, Perplexity: 12.4662, time_taken_in_seconds: 60
Epoch [1/1], Step [10375/13804], Loss: 2.3661, Perplexity: 10.6556, time_taken_in_seconds: 61
Epoch [1/1], Step [10376/13804], Loss: 2.2514, Perplexity: 9.5010, time_taken_in_seconds: 62
Epoch [1/1], Step [10377/13804], Loss: 3.8218, Perplexity: 45.6875, time_taken_in_seconds: 63
Epoch [1/1], Step [10378/13804], Loss: 2.6313, Perplexity: 13.8918, time_taken_in_seconds: 64
Epoch [1/1], Step [10379/13804], Loss: 2.5434, Perplexity: 12.7232, time_taken_in_seconds: 65
Epoch [1/1], Step [10380/13804], Loss: 2.2085, Perplexity: 9.1025, time_taken_in_seconds: 65
Epoch [1/1], Step [10381/13804], Loss: 2.4986, Perplexity: 12.1659, time_taken_in_seconds: 66
Epoch [1/1], Step [10382/13804], Loss: 2.9681, Perplexity: 19.4550, time_taken_in_seconds: 67
Epoch [1/1], Step [10383/13804], Loss: 2.5676, Perplexity: 13.0339, time_taken_in_seconds: 68
Epoch [1/1], Step [10384/13804], Loss: 2.4536, Perplexity: 11.6296, time_taken_in_seconds: 69
Epoch [1/1], Step [10385/13804], Loss: 3.1362, Perplexity: 23.0173, time_taken_in_seconds: 69
Epoch [1/1], Step [10386/13804], Loss: 2.9976, Perplexity: 20.0380, time_taken_in_seconds: 70
Epoch [1/1], Step [10387/13804], Loss: 2.2449, Perplexity: 9.4399, time_taken_in_seconds: 71
Epoch [1/1], Step [10388/13804], Loss: 2.6775, Perplexity: 14.5487, time_taken_in_seconds: 72
Epoch [1/1], Step [10389/13804], Loss: 2.6620, Perplexity: 14.3254, time_taken_in_seconds: 73
Epoch [1/1], Step [10390/13804], Loss: 2.8814, Perplexity: 17.8383, time_taken_in_seconds: 74
Epoch [1/1], Step [10391/13804], Loss: 2.7927, Perplexity: 16.3243, time_taken_in_seconds: 74
Epoch [1/1], Step [10392/13804], Loss: 2.8364, Perplexity: 17.0542, time_taken_in_seconds: 75
Epoch [1/1], Step [10393/13804], Loss: 2.5557, Perplexity: 12.8804, time_taken_in_seconds: 76
Epoch [1/1], Step [10394/13804], Loss: 2.6981, Perplexity: 14.8519, time_taken_in_seconds: 77
Epoch [1/1], Step [10395/13804], Loss: 2.8147, Perplexity: 16.6886, time_taken_in_seconds: 78
Epoch [1/1], Step [10396/13804], Loss: 3.4229, Perplexity: 30.6578, time_taken_in_seconds: 79
Epoch [1/1], Step [10397/13804], Loss: 2.7622, Perplexity: 15.8345, time_taken_in_seconds: 79
Epoch [1/1], Step [10398/13804], Loss: 2.5550, Perplexity: 12.8710, time_taken_in_seconds: 80
Epoch [1/1], Step [10399/13804], Loss: 2.7067, Perplexity: 14.9795, time_taken_in_seconds: 81
Epoch [1/1], Step [10400/13804], Loss: 2.6733, Perplexity: 14.4878, time_taken_in_seconds: 82
Epoch [1/1], Step [10401/13804], Loss: 2.7695, Perplexity: 15.9502, time_taken_in_seconds: 0
Epoch [1/1], Step [10402/13804], Loss: 2.7047, Perplexity: 14.9505, time_taken_in_seconds: 1
Epoch [1/1], Step [10403/13804], Loss: 2.9139, Perplexity: 18.4286, time_taken_in_seconds: 2
Epoch [1/1], Step [10404/13804], Loss: 2.4449, Perplexity: 11.5292, time_taken_in_seconds: 3
Epoch [1/1], Step [10405/13804], Loss: 2.4549, Perplexity: 11.6450, time_taken_in_seconds: 4
Epoch [1/1], Step [10406/13804], Loss: 2.5481, Perplexity: 12.7832, time_taken_in_seconds: 4
Epoch [1/1], Step [10407/13804], Loss: 2.4404, Perplexity: 11.4773, time_taken_in_seconds: 5
Epoch [1/1], Step [10408/13804], Loss: 2.5282, Perplexity: 12.5314, time_taken_in_seconds: 6
Epoch [1/1], Step [10409/13804], Loss: 2.7283, Perplexity: 15.3073, time_taken_in_seconds: 7
Epoch [1/1], Step [10410/13804], Loss: 2.4510, Perplexity: 11.6001, time_taken_in_seconds: 8
Epoch [1/1], Step [10411/13804], Loss: 2.2423, Perplexity: 9.4146, time_taken_in_seconds: 9
Epoch [1/1], Step [10412/13804], Loss: 2.1952, Perplexity: 8.9820, time_taken_in_seconds: 9
Epoch [1/1], Step [10413/13804], Loss: 2.4944, Perplexity: 12.1145, time_taken_in_seconds: 10
Epoch [1/1], Step [10414/13804], Loss: 2.7301, Perplexity: 15.3348, time_taken_in_seconds: 11
Epoch [1/1], Step [10415/13804], Loss: 2.8528, Perplexity: 17.3366, time_taken_in_seconds: 12
Epoch [1/1], Step [10416/13804], Loss: 2.5875, Perplexity: 13.2971, time_taken_in_seconds: 13
Epoch [1/1], Step [10417/13804], Loss: 2.4491, Perplexity: 11.5783, time_taken_in_seconds: 13
Epoch [1/1], Step [10418/13804], Loss: 2.3149, Perplexity: 10.1243, time_taken_in_seconds: 14
Epoch [1/1], Step [10419/13804], Loss: 2.4507, Perplexity: 11.5966, time_taken_in_seconds: 15
Epoch [1/1], Step [10420/13804], Loss: 2.7829, Perplexity: 16.1663, time_taken_in_seconds: 16
Epoch [1/1], Step [10421/13804], Loss: 2.6019, Perplexity: 13.4887, time_taken_in_seconds: 17
Epoch [1/1], Step [10422/13804], Loss: 2.4761, Perplexity: 11.8943, time_taken_in_seconds: 18
Epoch [1/1], Step [10423/13804], Loss: 2.8630, Perplexity: 17.5140, time_taken_in_seconds: 19
Epoch [1/1], Step [10424/13804], Loss: 2.4334, Perplexity: 11.3979, time_taken_in_seconds: 19
Epoch [1/1], Step [10425/13804], Loss: 2.3436, Perplexity: 10.4192, time_taken_in_seconds: 20
Epoch [1/1], Step [10426/13804], Loss: 2.2762, Perplexity: 9.7392, time_taken_in_seconds: 21
Epoch [1/1], Step [10427/13804], Loss: 2.8378, Perplexity: 17.0780, time_taken_in_seconds: 22
Epoch [1/1], Step [10428/13804], Loss: 2.4908, Perplexity: 12.0705, time_taken_in_seconds: 23
Epoch [1/1], Step [10429/13804], Loss: 2.6463, Perplexity: 14.1012, time_taken_in_seconds: 23
Epoch [1/1], Step [10430/13804], Loss: 2.5602, Perplexity: 12.9389, time_taken_in_seconds: 24
Epoch [1/1], Step [10431/13804], Loss: 2.4639, Perplexity: 11.7506, time_taken_in_seconds: 25
Epoch [1/1], Step [10432/13804], Loss: 2.5513, Perplexity: 12.8244, time_taken_in_seconds: 26
Epoch [1/1], Step [10433/13804], Loss: 2.7028, Perplexity: 14.9214, time_taken_in_seconds: 27
Epoch [1/1], Step [10434/13804], Loss: 3.3480, Perplexity: 28.4472, time_taken_in_seconds: 28
Epoch [1/1], Step [10435/13804], Loss: 2.6101, Perplexity: 13.6003, time_taken_in_seconds: 28
Epoch [1/1], Step [10436/13804], Loss: 2.3762, Perplexity: 10.7638, time_taken_in_seconds: 29
Epoch [1/1], Step [10437/13804], Loss: 2.5519, Perplexity: 12.8320, time_taken_in_seconds: 30
Epoch [1/1], Step [10438/13804], Loss: 2.4967, Perplexity: 12.1418, time_taken_in_seconds: 31
Epoch [1/1], Step [10439/13804], Loss: 2.6763, Perplexity: 14.5311, time_taken_in_seconds: 32
Epoch [1/1], Step [10440/13804], Loss: 2.7066, Perplexity: 14.9783, time_taken_in_seconds: 32
Epoch [1/1], Step [10441/13804], Loss: 2.3128, Perplexity: 10.1030, time_taken_in_seconds: 33
Epoch [1/1], Step [10442/13804], Loss: 2.5895, Perplexity: 13.3226, time_taken_in_seconds: 34
Epoch [1/1], Step [10443/13804], Loss: 2.4587, Perplexity: 11.6901, time_taken_in_seconds: 35
Epoch [1/1], Step [10444/13804], Loss: 2.4240, Perplexity: 11.2909, time_taken_in_seconds: 36
Epoch [1/1], Step [10445/13804], Loss: 3.0239, Perplexity: 20.5706, time_taken_in_seconds: 37
Epoch [1/1], Step [10446/13804], Loss: 2.3246, Perplexity: 10.2226, time_taken_in_seconds: 37
Epoch [1/1], Step [10447/13804], Loss: 2.1980, Perplexity: 9.0067, time_taken_in_seconds: 38
Epoch [1/1], Step [10448/13804], Loss: 2.5130, Perplexity: 12.3419, time_taken_in_seconds: 39
Epoch [1/1], Step [10449/13804], Loss: 2.4197, Perplexity: 11.2426, time_taken_in_seconds: 40
Epoch [1/1], Step [10450/13804], Loss: 2.9578, Perplexity: 19.2551, time_taken_in_seconds: 41
Epoch [1/1], Step [10451/13804], Loss: 2.6807, Perplexity: 14.5948, time_taken_in_seconds: 41
Epoch [1/1], Step [10452/13804], Loss: 2.5123, Perplexity: 12.3327, time_taken_in_seconds: 42
Epoch [1/1], Step [10453/13804], Loss: 2.7966, Perplexity: 16.3891, time_taken_in_seconds: 43
Epoch [1/1], Step [10454/13804], Loss: 2.4348, Perplexity: 11.4138, time_taken_in_seconds: 44
Epoch [1/1], Step [10455/13804], Loss: 2.2672, Perplexity: 9.6521, time_taken_in_seconds: 45
Epoch [1/1], Step [10456/13804], Loss: 2.4133, Perplexity: 11.1702, time_taken_in_seconds: 45
Epoch [1/1], Step [10457/13804], Loss: 2.8058, Perplexity: 16.5410, time_taken_in_seconds: 46
Epoch [1/1], Step [10458/13804], Loss: 2.5049, Perplexity: 12.2424, time_taken_in_seconds: 47
Epoch [1/1], Step [10459/13804], Loss: 2.6350, Perplexity: 13.9436, time_taken_in_seconds: 48
Epoch [1/1], Step [10460/13804], Loss: 2.7375, Perplexity: 15.4489, time_taken_in_seconds: 49
Epoch [1/1], Step [10461/13804], Loss: 2.4105, Perplexity: 11.1394, time_taken_in_seconds: 50
Epoch [1/1], Step [10462/13804], Loss: 2.2101, Perplexity: 9.1170, time_taken_in_seconds: 50
Epoch [1/1], Step [10463/13804], Loss: 2.2939, Perplexity: 9.9136, time_taken_in_seconds: 51
Epoch [1/1], Step [10464/13804], Loss: 2.7051, Perplexity: 14.9563, time_taken_in_seconds: 52
Epoch [1/1], Step [10465/13804], Loss: 2.2977, Perplexity: 9.9510, time_taken_in_seconds: 53
Epoch [1/1], Step [10466/13804], Loss: 2.5894, Perplexity: 13.3213, time_taken_in_seconds: 54
Epoch [1/1], Step [10467/13804], Loss: 2.5197, Perplexity: 12.4244, time_taken_in_seconds: 54
Epoch [1/1], Step [10468/13804], Loss: 2.7140, Perplexity: 15.0898, time_taken_in_seconds: 55
Epoch [1/1], Step [10469/13804], Loss: 2.5530, Perplexity: 12.8461, time_taken_in_seconds: 56
Epoch [1/1], Step [10470/13804], Loss: 2.4131, Perplexity: 11.1680, time_taken_in_seconds: 57
Epoch [1/1], Step [10471/13804], Loss: 2.6827, Perplexity: 14.6251, time_taken_in_seconds: 58
Epoch [1/1], Step [10472/13804], Loss: 2.3112, Perplexity: 10.0862, time_taken_in_seconds: 58
Epoch [1/1], Step [10473/13804], Loss: 2.6182, Perplexity: 13.7104, time_taken_in_seconds: 59
Epoch [1/1], Step [10474/13804], Loss: 2.7347, Perplexity: 15.4048, time_taken_in_seconds: 60
Epoch [1/1], Step [10475/13804], Loss: 2.6635, Perplexity: 14.3470, time_taken_in_seconds: 61
Epoch [1/1], Step [10476/13804], Loss: 2.3695, Perplexity: 10.6917, time_taken_in_seconds: 62
Epoch [1/1], Step [10477/13804], Loss: 2.2623, Perplexity: 9.6051, time_taken_in_seconds: 63
Epoch [1/1], Step [10478/13804], Loss: 2.5580, Perplexity: 12.9100, time_taken_in_seconds: 63
Epoch [1/1], Step [10479/13804], Loss: 2.5243, Perplexity: 12.4825, time_taken_in_seconds: 64
Epoch [1/1], Step [10480/13804], Loss: 2.6697, Perplexity: 14.4355, time_taken_in_seconds: 65
Epoch [1/1], Step [10481/13804], Loss: 2.6069, Perplexity: 13.5572, time_taken_in_seconds: 66
Epoch [1/1], Step [10482/13804], Loss: 2.4636, Perplexity: 11.7468, time_taken_in_seconds: 67
Epoch [1/1], Step [10483/13804], Loss: 2.5223, Perplexity: 12.4568, time_taken_in_seconds: 67
Epoch [1/1], Step [10484/13804], Loss: 2.6683, Perplexity: 14.4153, time_taken_in_seconds: 68
Epoch [1/1], Step [10485/13804], Loss: 2.6096, Perplexity: 13.5933, time_taken_in_seconds: 69
Epoch [1/1], Step [10486/13804], Loss: 2.5957, Perplexity: 13.4063, time_taken_in_seconds: 70
Epoch [1/1], Step [10487/13804], Loss: 2.5240, Perplexity: 12.4781, time_taken_in_seconds: 71
Epoch [1/1], Step [10488/13804], Loss: 2.2723, Perplexity: 9.7017, time_taken_in_seconds: 71
Epoch [1/1], Step [10489/13804], Loss: 2.5197, Perplexity: 12.4250, time_taken_in_seconds: 72
Epoch [1/1], Step [10490/13804], Loss: 2.8682, Perplexity: 17.6046, time_taken_in_seconds: 73
Epoch [1/1], Step [10491/13804], Loss: 3.2928, Perplexity: 26.9173, time_taken_in_seconds: 74
Epoch [1/1], Step [10492/13804], Loss: 2.6288, Perplexity: 13.8575, time_taken_in_seconds: 75
Epoch [1/1], Step [10493/13804], Loss: 2.3992, Perplexity: 11.0148, time_taken_in_seconds: 76
Epoch [1/1], Step [10494/13804], Loss: 3.4047, Perplexity: 30.1043, time_taken_in_seconds: 77
Epoch [1/1], Step [10495/13804], Loss: 2.3655, Perplexity: 10.6498, time_taken_in_seconds: 77
Epoch [1/1], Step [10496/13804], Loss: 2.4246, Perplexity: 11.2972, time_taken_in_seconds: 78
Epoch [1/1], Step [10497/13804], Loss: 2.6197, Perplexity: 13.7317, time_taken_in_seconds: 79
Epoch [1/1], Step [10498/13804], Loss: 2.4084, Perplexity: 11.1165, time_taken_in_seconds: 80
Epoch [1/1], Step [10499/13804], Loss: 2.2638, Perplexity: 9.6191, time_taken_in_seconds: 81
Epoch [1/1], Step [10500/13804], Loss: 3.7237, Perplexity: 41.4181, time_taken_in_seconds: 81
Epoch [1/1], Step [10501/13804], Loss: 2.5408, Perplexity: 12.6898, time_taken_in_seconds: 0
Epoch [1/1], Step [10502/13804], Loss: 2.4439, Perplexity: 11.5183, time_taken_in_seconds: 1
Epoch [1/1], Step [10503/13804], Loss: 2.7043, Perplexity: 14.9441, time_taken_in_seconds: 2
Epoch [1/1], Step [10504/13804], Loss: 2.7904, Perplexity: 16.2873, time_taken_in_seconds: 3
Epoch [1/1], Step [10505/13804], Loss: 2.3264, Perplexity: 10.2415, time_taken_in_seconds: 4
Epoch [1/1], Step [10506/13804], Loss: 2.7123, Perplexity: 15.0643, time_taken_in_seconds: 4
Epoch [1/1], Step [10507/13804], Loss: 2.5023, Perplexity: 12.2108, time_taken_in_seconds: 5
Epoch [1/1], Step [10508/13804], Loss: 2.4766, Perplexity: 11.9001, time_taken_in_seconds: 6
Epoch [1/1], Step [10509/13804], Loss: 2.2803, Perplexity: 9.7797, time_taken_in_seconds: 7
Epoch [1/1], Step [10510/13804], Loss: 2.4173, Perplexity: 11.2150, time_taken_in_seconds: 8
Epoch [1/1], Step [10511/13804], Loss: 2.2504, Perplexity: 9.4915, time_taken_in_seconds: 8
Epoch [1/1], Step [10512/13804], Loss: 3.0015, Perplexity: 20.1159, time_taken_in_seconds: 9
Epoch [1/1], Step [10513/13804], Loss: 2.5853, Perplexity: 13.2673, time_taken_in_seconds: 10
Epoch [1/1], Step [10514/13804], Loss: 2.2651, Perplexity: 9.6322, time_taken_in_seconds: 11
Epoch [1/1], Step [10515/13804], Loss: 2.5797, Perplexity: 13.1937, time_taken_in_seconds: 12
Epoch [1/1], Step [10516/13804], Loss: 2.4411, Perplexity: 11.4858, time_taken_in_seconds: 13
Epoch [1/1], Step [10517/13804], Loss: 2.5978, Perplexity: 13.4345, time_taken_in_seconds: 13
Epoch [1/1], Step [10518/13804], Loss: 2.5881, Perplexity: 13.3040, time_taken_in_seconds: 14
Epoch [1/1], Step [10519/13804], Loss: 2.5560, Perplexity: 12.8842, time_taken_in_seconds: 15
Epoch [1/1], Step [10520/13804], Loss: 3.1587, Perplexity: 23.5402, time_taken_in_seconds: 16
Epoch [1/1], Step [10521/13804], Loss: 2.3806, Perplexity: 10.8111, time_taken_in_seconds: 17
Epoch [1/1], Step [10522/13804], Loss: 2.2584, Perplexity: 9.5680, time_taken_in_seconds: 18
Epoch [1/1], Step [10523/13804], Loss: 2.2041, Perplexity: 9.0621, time_taken_in_seconds: 18
Epoch [1/1], Step [10524/13804], Loss: 2.4654, Perplexity: 11.7683, time_taken_in_seconds: 19
Epoch [1/1], Step [10525/13804], Loss: 2.4623, Perplexity: 11.7312, time_taken_in_seconds: 20
Epoch [1/1], Step [10526/13804], Loss: 2.1043, Perplexity: 8.2016, time_taken_in_seconds: 21
Epoch [1/1], Step [10527/13804], Loss: 2.4772, Perplexity: 11.9077, time_taken_in_seconds: 22
Epoch [1/1], Step [10528/13804], Loss: 2.7298, Perplexity: 15.3299, time_taken_in_seconds: 22
Epoch [1/1], Step [10529/13804], Loss: 2.8560, Perplexity: 17.3913, time_taken_in_seconds: 23
Epoch [1/1], Step [10530/13804], Loss: 2.4021, Perplexity: 11.0459, time_taken_in_seconds: 24
Epoch [1/1], Step [10531/13804], Loss: 2.7041, Perplexity: 14.9406, time_taken_in_seconds: 25
Epoch [1/1], Step [10532/13804], Loss: 2.3871, Perplexity: 10.8821, time_taken_in_seconds: 26
Epoch [1/1], Step [10533/13804], Loss: 2.5120, Perplexity: 12.3297, time_taken_in_seconds: 27
Epoch [1/1], Step [10534/13804], Loss: 2.5434, Perplexity: 12.7234, time_taken_in_seconds: 27
Epoch [1/1], Step [10535/13804], Loss: 2.4335, Perplexity: 11.3985, time_taken_in_seconds: 28
Epoch [1/1], Step [10536/13804], Loss: 2.5797, Perplexity: 13.1933, time_taken_in_seconds: 29
Epoch [1/1], Step [10537/13804], Loss: 2.8529, Perplexity: 17.3374, time_taken_in_seconds: 30
Epoch [1/1], Step [10538/13804], Loss: 2.8245, Perplexity: 16.8532, time_taken_in_seconds: 31
Epoch [1/1], Step [10539/13804], Loss: 2.6058, Perplexity: 13.5420, time_taken_in_seconds: 31
Epoch [1/1], Step [10540/13804], Loss: 2.3528, Perplexity: 10.5154, time_taken_in_seconds: 32
Epoch [1/1], Step [10541/13804], Loss: 2.4498, Perplexity: 11.5860, time_taken_in_seconds: 33
Epoch [1/1], Step [10542/13804], Loss: 2.6224, Perplexity: 13.7682, time_taken_in_seconds: 34
Epoch [1/1], Step [10543/13804], Loss: 2.5017, Perplexity: 12.2035, time_taken_in_seconds: 35
Epoch [1/1], Step [10544/13804], Loss: 2.4111, Perplexity: 11.1458, time_taken_in_seconds: 36
Epoch [1/1], Step [10545/13804], Loss: 2.4628, Perplexity: 11.7374, time_taken_in_seconds: 36
Epoch [1/1], Step [10546/13804], Loss: 2.6811, Perplexity: 14.6008, time_taken_in_seconds: 37
Epoch [1/1], Step [10547/13804], Loss: 2.6435, Perplexity: 14.0619, time_taken_in_seconds: 38
Epoch [1/1], Step [10548/13804], Loss: 2.5522, Perplexity: 12.8347, time_taken_in_seconds: 39
Epoch [1/1], Step [10549/13804], Loss: 2.8112, Perplexity: 16.6305, time_taken_in_seconds: 40
Epoch [1/1], Step [10550/13804], Loss: 2.2408, Perplexity: 9.4012, time_taken_in_seconds: 40
Epoch [1/1], Step [10551/13804], Loss: 2.6589, Perplexity: 14.2804, time_taken_in_seconds: 41
Epoch [1/1], Step [10552/13804], Loss: 2.5893, Perplexity: 13.3210, time_taken_in_seconds: 42
Epoch [1/1], Step [10553/13804], Loss: 2.5395, Perplexity: 12.6732, time_taken_in_seconds: 43
Epoch [1/1], Step [10554/13804], Loss: 2.4391, Perplexity: 11.4627, time_taken_in_seconds: 44
Epoch [1/1], Step [10555/13804], Loss: 2.9774, Perplexity: 19.6364, time_taken_in_seconds: 45
Epoch [1/1], Step [10556/13804], Loss: 2.4258, Perplexity: 11.3109, time_taken_in_seconds: 45
Epoch [1/1], Step [10557/13804], Loss: 2.4263, Perplexity: 11.3169, time_taken_in_seconds: 46
Epoch [1/1], Step [10558/13804], Loss: 2.3879, Perplexity: 10.8902, time_taken_in_seconds: 47
Epoch [1/1], Step [10559/13804], Loss: 2.2184, Perplexity: 9.1924, time_taken_in_seconds: 48
Epoch [1/1], Step [10560/13804], Loss: 2.6971, Perplexity: 14.8363, time_taken_in_seconds: 49
Epoch [1/1], Step [10561/13804], Loss: 2.3649, Perplexity: 10.6434, time_taken_in_seconds: 50
Epoch [1/1], Step [10562/13804], Loss: 2.9598, Perplexity: 19.2948, time_taken_in_seconds: 50
Epoch [1/1], Step [10563/13804], Loss: 2.5663, Perplexity: 13.0171, time_taken_in_seconds: 51
Epoch [1/1], Step [10564/13804], Loss: 2.6070, Perplexity: 13.5584, time_taken_in_seconds: 52
Epoch [1/1], Step [10565/13804], Loss: 3.2306, Perplexity: 25.2953, time_taken_in_seconds: 53
Epoch [1/1], Step [10566/13804], Loss: 3.0578, Perplexity: 21.2799, time_taken_in_seconds: 54
Epoch [1/1], Step [10567/13804], Loss: 2.7718, Perplexity: 15.9881, time_taken_in_seconds: 55
Epoch [1/1], Step [10568/13804], Loss: 2.6116, Perplexity: 13.6208, time_taken_in_seconds: 55
Epoch [1/1], Step [10569/13804], Loss: 2.6186, Perplexity: 13.7163, time_taken_in_seconds: 56
Epoch [1/1], Step [10570/13804], Loss: 2.5838, Perplexity: 13.2477, time_taken_in_seconds: 57
Epoch [1/1], Step [10571/13804], Loss: 2.5846, Perplexity: 13.2583, time_taken_in_seconds: 58
Epoch [1/1], Step [10572/13804], Loss: 2.7122, Perplexity: 15.0623, time_taken_in_seconds: 59
Epoch [1/1], Step [10573/13804], Loss: 2.5576, Perplexity: 12.9055, time_taken_in_seconds: 59
Epoch [1/1], Step [10574/13804], Loss: 2.6188, Perplexity: 13.7189, time_taken_in_seconds: 60
Epoch [1/1], Step [10575/13804], Loss: 2.4515, Perplexity: 11.6057, time_taken_in_seconds: 61
Epoch [1/1], Step [10576/13804], Loss: 2.2882, Perplexity: 9.8569, time_taken_in_seconds: 62
Epoch [1/1], Step [10577/13804], Loss: 2.5450, Perplexity: 12.7431, time_taken_in_seconds: 63
Epoch [1/1], Step [10578/13804], Loss: 2.1825, Perplexity: 8.8686, time_taken_in_seconds: 64
Epoch [1/1], Step [10579/13804], Loss: 2.2968, Perplexity: 9.9423, time_taken_in_seconds: 64
Epoch [1/1], Step [10580/13804], Loss: 2.6942, Perplexity: 14.7933, time_taken_in_seconds: 65
Epoch [1/1], Step [10581/13804], Loss: 2.4119, Perplexity: 11.1550, time_taken_in_seconds: 66
Epoch [1/1], Step [10582/13804], Loss: 2.6867, Perplexity: 14.6829, time_taken_in_seconds: 67
Epoch [1/1], Step [10583/13804], Loss: 2.4389, Perplexity: 11.4609, time_taken_in_seconds: 68
Epoch [1/1], Step [10584/13804], Loss: 2.4783, Perplexity: 11.9214, time_taken_in_seconds: 68
Epoch [1/1], Step [10585/13804], Loss: 2.3571, Perplexity: 10.5605, time_taken_in_seconds: 69
Epoch [1/1], Step [10586/13804], Loss: 2.5562, Perplexity: 12.8866, time_taken_in_seconds: 70
Epoch [1/1], Step [10587/13804], Loss: 2.6121, Perplexity: 13.6278, time_taken_in_seconds: 71
Epoch [1/1], Step [10588/13804], Loss: 2.2891, Perplexity: 9.8658, time_taken_in_seconds: 72
Epoch [1/1], Step [10589/13804], Loss: 2.6033, Perplexity: 13.5082, time_taken_in_seconds: 73
Epoch [1/1], Step [10590/13804], Loss: 2.5081, Perplexity: 12.2819, time_taken_in_seconds: 73
Epoch [1/1], Step [10591/13804], Loss: 2.2239, Perplexity: 9.2431, time_taken_in_seconds: 74
Epoch [1/1], Step [10592/13804], Loss: 2.3749, Perplexity: 10.7504, time_taken_in_seconds: 75
Epoch [1/1], Step [10593/13804], Loss: 2.9589, Perplexity: 19.2768, time_taken_in_seconds: 76
Epoch [1/1], Step [10594/13804], Loss: 2.4812, Perplexity: 11.9558, time_taken_in_seconds: 77
Epoch [1/1], Step [10595/13804], Loss: 2.7106, Perplexity: 15.0383, time_taken_in_seconds: 77
Epoch [1/1], Step [10596/13804], Loss: 2.5817, Perplexity: 13.2200, time_taken_in_seconds: 78
Epoch [1/1], Step [10597/13804], Loss: 2.8869, Perplexity: 17.9382, time_taken_in_seconds: 79
Epoch [1/1], Step [10598/13804], Loss: 2.4867, Perplexity: 12.0213, time_taken_in_seconds: 80
Epoch [1/1], Step [10599/13804], Loss: 2.8208, Perplexity: 16.7895, time_taken_in_seconds: 81
Epoch [1/1], Step [10600/13804], Loss: 2.2558, Perplexity: 9.5427, time_taken_in_seconds: 81
Epoch [1/1], Step [10601/13804], Loss: 3.0366, Perplexity: 20.8346, time_taken_in_seconds: 0
Epoch [1/1], Step [10602/13804], Loss: 2.6910, Perplexity: 14.7458, time_taken_in_seconds: 1
Epoch [1/1], Step [10603/13804], Loss: 2.3591, Perplexity: 10.5810, time_taken_in_seconds: 2
Epoch [1/1], Step [10604/13804], Loss: 2.6781, Perplexity: 14.5571, time_taken_in_seconds: 3
Epoch [1/1], Step [10605/13804], Loss: 2.3746, Perplexity: 10.7465, time_taken_in_seconds: 4
Epoch [1/1], Step [10606/13804], Loss: 2.8970, Perplexity: 18.1190, time_taken_in_seconds: 4
Epoch [1/1], Step [10607/13804], Loss: 2.6511, Perplexity: 14.1696, time_taken_in_seconds: 5
Epoch [1/1], Step [10608/13804], Loss: 2.4503, Perplexity: 11.5918, time_taken_in_seconds: 6
Epoch [1/1], Step [10609/13804], Loss: 2.5598, Perplexity: 12.9328, time_taken_in_seconds: 7
Epoch [1/1], Step [10610/13804], Loss: 2.8990, Perplexity: 18.1558, time_taken_in_seconds: 8
Epoch [1/1], Step [10611/13804], Loss: 2.6719, Perplexity: 14.4680, time_taken_in_seconds: 8
Epoch [1/1], Step [10612/13804], Loss: 2.5079, Perplexity: 12.2791, time_taken_in_seconds: 9
Epoch [1/1], Step [10613/13804], Loss: 2.3261, Perplexity: 10.2374, time_taken_in_seconds: 10
Epoch [1/1], Step [10614/13804], Loss: 2.7275, Perplexity: 15.2949, time_taken_in_seconds: 11
Epoch [1/1], Step [10615/13804], Loss: 2.6577, Perplexity: 14.2631, time_taken_in_seconds: 12
Epoch [1/1], Step [10616/13804], Loss: 2.4745, Perplexity: 11.8755, time_taken_in_seconds: 12
Epoch [1/1], Step [10617/13804], Loss: 2.4939, Perplexity: 12.1079, time_taken_in_seconds: 13
Epoch [1/1], Step [10618/13804], Loss: 2.5724, Perplexity: 13.0979, time_taken_in_seconds: 14
Epoch [1/1], Step [10619/13804], Loss: 2.5919, Perplexity: 13.3548, time_taken_in_seconds: 15
Epoch [1/1], Step [10620/13804], Loss: 2.6681, Perplexity: 14.4124, time_taken_in_seconds: 16
Epoch [1/1], Step [10621/13804], Loss: 2.4036, Perplexity: 11.0626, time_taken_in_seconds: 16
Epoch [1/1], Step [10622/13804], Loss: 3.5163, Perplexity: 33.6598, time_taken_in_seconds: 17
Epoch [1/1], Step [10623/13804], Loss: 2.3340, Perplexity: 10.3188, time_taken_in_seconds: 18
Epoch [1/1], Step [10624/13804], Loss: 2.4726, Perplexity: 11.8535, time_taken_in_seconds: 19
Epoch [1/1], Step [10625/13804], Loss: 2.8696, Perplexity: 17.6305, time_taken_in_seconds: 20
Epoch [1/1], Step [10626/13804], Loss: 2.5590, Perplexity: 12.9231, time_taken_in_seconds: 21
Epoch [1/1], Step [10627/13804], Loss: 2.5789, Perplexity: 13.1828, time_taken_in_seconds: 21
Epoch [1/1], Step [10628/13804], Loss: 2.5822, Perplexity: 13.2267, time_taken_in_seconds: 22
Epoch [1/1], Step [10629/13804], Loss: 2.4603, Perplexity: 11.7087, time_taken_in_seconds: 23
Epoch [1/1], Step [10630/13804], Loss: 2.4472, Perplexity: 11.5555, time_taken_in_seconds: 24
Epoch [1/1], Step [10631/13804], Loss: 2.5407, Perplexity: 12.6881, time_taken_in_seconds: 25
Epoch [1/1], Step [10632/13804], Loss: 2.6112, Perplexity: 13.6150, time_taken_in_seconds: 25
Epoch [1/1], Step [10633/13804], Loss: 2.4090, Perplexity: 11.1227, time_taken_in_seconds: 26
Epoch [1/1], Step [10634/13804], Loss: 2.5666, Perplexity: 13.0216, time_taken_in_seconds: 27
Epoch [1/1], Step [10635/13804], Loss: 2.9771, Perplexity: 19.6298, time_taken_in_seconds: 28
Epoch [1/1], Step [10636/13804], Loss: 2.3709, Perplexity: 10.7074, time_taken_in_seconds: 29
Epoch [1/1], Step [10637/13804], Loss: 2.4677, Perplexity: 11.7952, time_taken_in_seconds: 29
Epoch [1/1], Step [10638/13804], Loss: 2.3231, Perplexity: 10.2071, time_taken_in_seconds: 30
Epoch [1/1], Step [10639/13804], Loss: 2.4077, Perplexity: 11.1085, time_taken_in_seconds: 31
Epoch [1/1], Step [10640/13804], Loss: 2.6595, Perplexity: 14.2896, time_taken_in_seconds: 32
Epoch [1/1], Step [10641/13804], Loss: 2.4585, Perplexity: 11.6872, time_taken_in_seconds: 33
Epoch [1/1], Step [10642/13804], Loss: 2.3984, Perplexity: 11.0053, time_taken_in_seconds: 34
Epoch [1/1], Step [10643/13804], Loss: 2.7879, Perplexity: 16.2470, time_taken_in_seconds: 35
Epoch [1/1], Step [10644/13804], Loss: 2.4319, Perplexity: 11.3807, time_taken_in_seconds: 35
Epoch [1/1], Step [10645/13804], Loss: 2.6961, Perplexity: 14.8222, time_taken_in_seconds: 36
Epoch [1/1], Step [10646/13804], Loss: 2.4788, Perplexity: 11.9267, time_taken_in_seconds: 37
Epoch [1/1], Step [10647/13804], Loss: 2.3320, Perplexity: 10.2989, time_taken_in_seconds: 38
Epoch [1/1], Step [10648/13804], Loss: 2.8637, Perplexity: 17.5256, time_taken_in_seconds: 39
Epoch [1/1], Step [10649/13804], Loss: 2.4403, Perplexity: 11.4770, time_taken_in_seconds: 40
Epoch [1/1], Step [10650/13804], Loss: 3.5092, Perplexity: 33.4216, time_taken_in_seconds: 40
Epoch [1/1], Step [10651/13804], Loss: 2.4683, Perplexity: 11.8022, time_taken_in_seconds: 41
Epoch [1/1], Step [10652/13804], Loss: 2.2722, Perplexity: 9.7004, time_taken_in_seconds: 42
Epoch [1/1], Step [10653/13804], Loss: 2.9625, Perplexity: 19.3470, time_taken_in_seconds: 43
Epoch [1/1], Step [10654/13804], Loss: 2.9716, Perplexity: 19.5234, time_taken_in_seconds: 44
Epoch [1/1], Step [10655/13804], Loss: 2.7900, Perplexity: 16.2805, time_taken_in_seconds: 44
Epoch [1/1], Step [10656/13804], Loss: 2.5785, Perplexity: 13.1767, time_taken_in_seconds: 45
Epoch [1/1], Step [10657/13804], Loss: 3.0139, Perplexity: 20.3675, time_taken_in_seconds: 46
Epoch [1/1], Step [10658/13804], Loss: 2.7221, Perplexity: 15.2127, time_taken_in_seconds: 47
Epoch [1/1], Step [10659/13804], Loss: 2.6952, Perplexity: 14.8078, time_taken_in_seconds: 48
Epoch [1/1], Step [10660/13804], Loss: 2.4689, Perplexity: 11.8092, time_taken_in_seconds: 49
Epoch [1/1], Step [10661/13804], Loss: 2.3538, Perplexity: 10.5250, time_taken_in_seconds: 49
Epoch [1/1], Step [10662/13804], Loss: 2.5501, Perplexity: 12.8082, time_taken_in_seconds: 50
Epoch [1/1], Step [10663/13804], Loss: 2.4147, Perplexity: 11.1867, time_taken_in_seconds: 51
Epoch [1/1], Step [10664/13804], Loss: 2.6804, Perplexity: 14.5904, time_taken_in_seconds: 52
Epoch [1/1], Step [10665/13804], Loss: 2.5651, Perplexity: 13.0014, time_taken_in_seconds: 53
Epoch [1/1], Step [10666/13804], Loss: 2.5690, Perplexity: 13.0525, time_taken_in_seconds: 53
Epoch [1/1], Step [10667/13804], Loss: 2.4344, Perplexity: 11.4095, time_taken_in_seconds: 54
Epoch [1/1], Step [10668/13804], Loss: 2.1425, Perplexity: 8.5204, time_taken_in_seconds: 55
Epoch [1/1], Step [10669/13804], Loss: 2.2683, Perplexity: 9.6628, time_taken_in_seconds: 56
Epoch [1/1], Step [10670/13804], Loss: 2.4312, Perplexity: 11.3730, time_taken_in_seconds: 57
Epoch [1/1], Step [10671/13804], Loss: 2.7079, Perplexity: 14.9976, time_taken_in_seconds: 57
Epoch [1/1], Step [10672/13804], Loss: 2.4093, Perplexity: 11.1258, time_taken_in_seconds: 58
Epoch [1/1], Step [10673/13804], Loss: 2.5288, Perplexity: 12.5380, time_taken_in_seconds: 59
Epoch [1/1], Step [10674/13804], Loss: 2.3286, Perplexity: 10.2631, time_taken_in_seconds: 60
Epoch [1/1], Step [10675/13804], Loss: 2.7036, Perplexity: 14.9332, time_taken_in_seconds: 61
Epoch [1/1], Step [10676/13804], Loss: 2.5732, Perplexity: 13.1080, time_taken_in_seconds: 61
Epoch [1/1], Step [10677/13804], Loss: 2.3212, Perplexity: 10.1874, time_taken_in_seconds: 62
Epoch [1/1], Step [10678/13804], Loss: 2.8411, Perplexity: 17.1339, time_taken_in_seconds: 63
Epoch [1/1], Step [10679/13804], Loss: 2.5508, Perplexity: 12.8173, time_taken_in_seconds: 64
Epoch [1/1], Step [10680/13804], Loss: 2.7798, Perplexity: 16.1161, time_taken_in_seconds: 65
Epoch [1/1], Step [10681/13804], Loss: 2.8889, Perplexity: 17.9736, time_taken_in_seconds: 65
Epoch [1/1], Step [10682/13804], Loss: 2.9133, Perplexity: 18.4168, time_taken_in_seconds: 66
Epoch [1/1], Step [10683/13804], Loss: 3.1014, Perplexity: 22.2280, time_taken_in_seconds: 67
Epoch [1/1], Step [10684/13804], Loss: 2.3375, Perplexity: 10.3553, time_taken_in_seconds: 68
Epoch [1/1], Step [10685/13804], Loss: 2.7704, Perplexity: 15.9658, time_taken_in_seconds: 69
Epoch [1/1], Step [10686/13804], Loss: 2.7055, Perplexity: 14.9615, time_taken_in_seconds: 70
Epoch [1/1], Step [10687/13804], Loss: 2.4070, Perplexity: 11.1002, time_taken_in_seconds: 70
Epoch [1/1], Step [10688/13804], Loss: 2.9328, Perplexity: 18.7804, time_taken_in_seconds: 71
Epoch [1/1], Step [10689/13804], Loss: 2.4885, Perplexity: 12.0428, time_taken_in_seconds: 72
Epoch [1/1], Step [10690/13804], Loss: 2.5913, Perplexity: 13.3465, time_taken_in_seconds: 73
Epoch [1/1], Step [10691/13804], Loss: 2.7664, Perplexity: 15.9017, time_taken_in_seconds: 74
Epoch [1/1], Step [10692/13804], Loss: 2.5410, Perplexity: 12.6922, time_taken_in_seconds: 75
Epoch [1/1], Step [10693/13804], Loss: 2.9049, Perplexity: 18.2641, time_taken_in_seconds: 75
Epoch [1/1], Step [10694/13804], Loss: 3.0555, Perplexity: 21.2312, time_taken_in_seconds: 76
Epoch [1/1], Step [10695/13804], Loss: 2.3286, Perplexity: 10.2636, time_taken_in_seconds: 77
Epoch [1/1], Step [10696/13804], Loss: 2.3423, Perplexity: 10.4054, time_taken_in_seconds: 78
Epoch [1/1], Step [10697/13804], Loss: 2.3838, Perplexity: 10.8463, time_taken_in_seconds: 79
Epoch [1/1], Step [10698/13804], Loss: 2.6406, Perplexity: 14.0212, time_taken_in_seconds: 79
Epoch [1/1], Step [10699/13804], Loss: 2.3664, Perplexity: 10.6591, time_taken_in_seconds: 80
Epoch [1/1], Step [10700/13804], Loss: 2.3616, Perplexity: 10.6080, time_taken_in_seconds: 81
Epoch [1/1], Step [10701/13804], Loss: 2.6021, Perplexity: 13.4918, time_taken_in_seconds: 0
Epoch [1/1], Step [10702/13804], Loss: 2.8052, Perplexity: 16.5303, time_taken_in_seconds: 1
Epoch [1/1], Step [10703/13804], Loss: 2.5776, Perplexity: 13.1652, time_taken_in_seconds: 2
Epoch [1/1], Step [10704/13804], Loss: 2.7993, Perplexity: 16.4333, time_taken_in_seconds: 3
Epoch [1/1], Step [10705/13804], Loss: 2.4974, Perplexity: 12.1509, time_taken_in_seconds: 4
Epoch [1/1], Step [10706/13804], Loss: 2.6027, Perplexity: 13.4999, time_taken_in_seconds: 4
Epoch [1/1], Step [10707/13804], Loss: 3.4321, Perplexity: 30.9428, time_taken_in_seconds: 5
Epoch [1/1], Step [10708/13804], Loss: 2.7327, Perplexity: 15.3749, time_taken_in_seconds: 6
Epoch [1/1], Step [10709/13804], Loss: 2.6965, Perplexity: 14.8270, time_taken_in_seconds: 7
Epoch [1/1], Step [10710/13804], Loss: 2.9112, Perplexity: 18.3787, time_taken_in_seconds: 8
Epoch [1/1], Step [10711/13804], Loss: 2.8529, Perplexity: 17.3377, time_taken_in_seconds: 8
Epoch [1/1], Step [10712/13804], Loss: 2.6307, Perplexity: 13.8832, time_taken_in_seconds: 9
Epoch [1/1], Step [10713/13804], Loss: 2.6588, Perplexity: 14.2793, time_taken_in_seconds: 10
Epoch [1/1], Step [10714/13804], Loss: 2.4803, Perplexity: 11.9445, time_taken_in_seconds: 11
Epoch [1/1], Step [10715/13804], Loss: 2.5729, Perplexity: 13.1040, time_taken_in_seconds: 12
Epoch [1/1], Step [10716/13804], Loss: 2.5203, Perplexity: 12.4318, time_taken_in_seconds: 13
Epoch [1/1], Step [10717/13804], Loss: 2.4726, Perplexity: 11.8536, time_taken_in_seconds: 13
Epoch [1/1], Step [10718/13804], Loss: 2.3753, Perplexity: 10.7540, time_taken_in_seconds: 14
Epoch [1/1], Step [10719/13804], Loss: 2.5893, Perplexity: 13.3203, time_taken_in_seconds: 15
Epoch [1/1], Step [10720/13804], Loss: 2.5912, Perplexity: 13.3455, time_taken_in_seconds: 16
Epoch [1/1], Step [10721/13804], Loss: 2.3515, Perplexity: 10.5013, time_taken_in_seconds: 17
Epoch [1/1], Step [10722/13804], Loss: 2.4363, Perplexity: 11.4304, time_taken_in_seconds: 18
Epoch [1/1], Step [10723/13804], Loss: 2.6527, Perplexity: 14.1918, time_taken_in_seconds: 18
Epoch [1/1], Step [10724/13804], Loss: 2.5837, Perplexity: 13.2466, time_taken_in_seconds: 19
Epoch [1/1], Step [10725/13804], Loss: 2.7118, Perplexity: 15.0568, time_taken_in_seconds: 20
Epoch [1/1], Step [10726/13804], Loss: 2.5491, Perplexity: 12.7958, time_taken_in_seconds: 21
Epoch [1/1], Step [10727/13804], Loss: 2.8525, Perplexity: 17.3317, time_taken_in_seconds: 22
Epoch [1/1], Step [10728/13804], Loss: 4.5573, Perplexity: 95.3287, time_taken_in_seconds: 22
Epoch [1/1], Step [10729/13804], Loss: 2.6579, Perplexity: 14.2661, time_taken_in_seconds: 23
Epoch [1/1], Step [10730/13804], Loss: 3.1638, Perplexity: 23.6600, time_taken_in_seconds: 24
Epoch [1/1], Step [10731/13804], Loss: 2.6262, Perplexity: 13.8210, time_taken_in_seconds: 25
Epoch [1/1], Step [10732/13804], Loss: 2.4361, Perplexity: 11.4288, time_taken_in_seconds: 26
Epoch [1/1], Step [10733/13804], Loss: 2.3676, Perplexity: 10.6721, time_taken_in_seconds: 27
Epoch [1/1], Step [10734/13804], Loss: 2.4544, Perplexity: 11.6394, time_taken_in_seconds: 27
Epoch [1/1], Step [10735/13804], Loss: 2.6787, Perplexity: 14.5657, time_taken_in_seconds: 28
Epoch [1/1], Step [10736/13804], Loss: 2.4691, Perplexity: 11.8116, time_taken_in_seconds: 29
Epoch [1/1], Step [10737/13804], Loss: 2.5386, Perplexity: 12.6624, time_taken_in_seconds: 30
Epoch [1/1], Step [10738/13804], Loss: 3.0157, Perplexity: 20.4030, time_taken_in_seconds: 31
Epoch [1/1], Step [10739/13804], Loss: 2.6724, Perplexity: 14.4753, time_taken_in_seconds: 31
Epoch [1/1], Step [10740/13804], Loss: 2.3268, Perplexity: 10.2454, time_taken_in_seconds: 32
Epoch [1/1], Step [10741/13804], Loss: 2.4262, Perplexity: 11.3155, time_taken_in_seconds: 33
Epoch [1/1], Step [10742/13804], Loss: 2.4951, Perplexity: 12.1230, time_taken_in_seconds: 34
Epoch [1/1], Step [10743/13804], Loss: 2.9798, Perplexity: 19.6833, time_taken_in_seconds: 35
Epoch [1/1], Step [10744/13804], Loss: 2.2909, Perplexity: 9.8841, time_taken_in_seconds: 36
Epoch [1/1], Step [10745/13804], Loss: 2.4266, Perplexity: 11.3198, time_taken_in_seconds: 36
Epoch [1/1], Step [10746/13804], Loss: 2.6955, Perplexity: 14.8135, time_taken_in_seconds: 37
Epoch [1/1], Step [10747/13804], Loss: 2.3408, Perplexity: 10.3893, time_taken_in_seconds: 38
Epoch [1/1], Step [10748/13804], Loss: 2.5802, Perplexity: 13.1994, time_taken_in_seconds: 39
Epoch [1/1], Step [10749/13804], Loss: 2.7546, Perplexity: 15.7153, time_taken_in_seconds: 40
Epoch [1/1], Step [10750/13804], Loss: 2.8795, Perplexity: 17.8057, time_taken_in_seconds: 40
Epoch [1/1], Step [10751/13804], Loss: 2.6261, Perplexity: 13.8199, time_taken_in_seconds: 41
Epoch [1/1], Step [10752/13804], Loss: 2.5612, Perplexity: 12.9510, time_taken_in_seconds: 42
Epoch [1/1], Step [10753/13804], Loss: 2.1049, Perplexity: 8.2064, time_taken_in_seconds: 43
Epoch [1/1], Step [10754/13804], Loss: 2.7009, Perplexity: 14.8929, time_taken_in_seconds: 44
Epoch [1/1], Step [10755/13804], Loss: 3.0344, Perplexity: 20.7878, time_taken_in_seconds: 45
Epoch [1/1], Step [10756/13804], Loss: 2.3880, Perplexity: 10.8913, time_taken_in_seconds: 45
Epoch [1/1], Step [10757/13804], Loss: 2.7452, Perplexity: 15.5683, time_taken_in_seconds: 46
Epoch [1/1], Step [10758/13804], Loss: 2.9733, Perplexity: 19.5569, time_taken_in_seconds: 47
Epoch [1/1], Step [10759/13804], Loss: 2.5011, Perplexity: 12.1961, time_taken_in_seconds: 48
Epoch [1/1], Step [10760/13804], Loss: 2.4896, Perplexity: 12.0559, time_taken_in_seconds: 49
Epoch [1/1], Step [10761/13804], Loss: 2.6854, Perplexity: 14.6648, time_taken_in_seconds: 49
Epoch [1/1], Step [10762/13804], Loss: 2.4897, Perplexity: 12.0574, time_taken_in_seconds: 50
Epoch [1/1], Step [10763/13804], Loss: 2.3297, Perplexity: 10.2747, time_taken_in_seconds: 51
Epoch [1/1], Step [10764/13804], Loss: 3.4523, Perplexity: 31.5733, time_taken_in_seconds: 52
Epoch [1/1], Step [10765/13804], Loss: 2.8590, Perplexity: 17.4447, time_taken_in_seconds: 53
Epoch [1/1], Step [10766/13804], Loss: 2.3625, Perplexity: 10.6178, time_taken_in_seconds: 54
Epoch [1/1], Step [10767/13804], Loss: 2.5841, Perplexity: 13.2520, time_taken_in_seconds: 54
Epoch [1/1], Step [10768/13804], Loss: 2.3933, Perplexity: 10.9500, time_taken_in_seconds: 55
Epoch [1/1], Step [10769/13804], Loss: 2.2827, Perplexity: 9.8034, time_taken_in_seconds: 56
Epoch [1/1], Step [10770/13804], Loss: 2.5013, Perplexity: 12.1985, time_taken_in_seconds: 57
Epoch [1/1], Step [10771/13804], Loss: 2.1274, Perplexity: 8.3932, time_taken_in_seconds: 58
Epoch [1/1], Step [10772/13804], Loss: 2.6991, Perplexity: 14.8665, time_taken_in_seconds: 58
Epoch [1/1], Step [10773/13804], Loss: 2.6846, Perplexity: 14.6518, time_taken_in_seconds: 59
Epoch [1/1], Step [10774/13804], Loss: 2.5107, Perplexity: 12.3138, time_taken_in_seconds: 60
Epoch [1/1], Step [10775/13804], Loss: 2.4453, Perplexity: 11.5335, time_taken_in_seconds: 61
Epoch [1/1], Step [10776/13804], Loss: 2.3785, Perplexity: 10.7886, time_taken_in_seconds: 62
Epoch [1/1], Step [10777/13804], Loss: 2.8523, Perplexity: 17.3269, time_taken_in_seconds: 63
Epoch [1/1], Step [10778/13804], Loss: 2.6482, Perplexity: 14.1285, time_taken_in_seconds: 63
Epoch [1/1], Step [10779/13804], Loss: 2.8923, Perplexity: 18.0353, time_taken_in_seconds: 64
Epoch [1/1], Step [10780/13804], Loss: 2.2841, Perplexity: 9.8167, time_taken_in_seconds: 65
Epoch [1/1], Step [10781/13804], Loss: 2.5465, Perplexity: 12.7624, time_taken_in_seconds: 66
Epoch [1/1], Step [10782/13804], Loss: 2.6081, Perplexity: 13.5730, time_taken_in_seconds: 67
Epoch [1/1], Step [10783/13804], Loss: 2.2890, Perplexity: 9.8652, time_taken_in_seconds: 67
Epoch [1/1], Step [10784/13804], Loss: 2.3948, Perplexity: 10.9655, time_taken_in_seconds: 68
Epoch [1/1], Step [10785/13804], Loss: 2.5104, Perplexity: 12.3096, time_taken_in_seconds: 69
Epoch [1/1], Step [10786/13804], Loss: 2.3800, Perplexity: 10.8052, time_taken_in_seconds: 70
Epoch [1/1], Step [10787/13804], Loss: 2.3529, Perplexity: 10.5160, time_taken_in_seconds: 71
Epoch [1/1], Step [10788/13804], Loss: 2.5087, Perplexity: 12.2885, time_taken_in_seconds: 72
Epoch [1/1], Step [10789/13804], Loss: 2.3368, Perplexity: 10.3482, time_taken_in_seconds: 73
Epoch [1/1], Step [10790/13804], Loss: 2.5482, Perplexity: 12.7845, time_taken_in_seconds: 73
Epoch [1/1], Step [10791/13804], Loss: 3.2232, Perplexity: 25.1072, time_taken_in_seconds: 74
Epoch [1/1], Step [10792/13804], Loss: 2.2221, Perplexity: 9.2271, time_taken_in_seconds: 75
Epoch [1/1], Step [10793/13804], Loss: 2.3875, Perplexity: 10.8866, time_taken_in_seconds: 76
Epoch [1/1], Step [10794/13804], Loss: 2.6947, Perplexity: 14.8004, time_taken_in_seconds: 77
Epoch [1/1], Step [10795/13804], Loss: 2.5693, Perplexity: 13.0566, time_taken_in_seconds: 78
Epoch [1/1], Step [10796/13804], Loss: 2.4543, Perplexity: 11.6378, time_taken_in_seconds: 78
Epoch [1/1], Step [10797/13804], Loss: 2.5628, Perplexity: 12.9720, time_taken_in_seconds: 79
Epoch [1/1], Step [10798/13804], Loss: 2.5199, Perplexity: 12.4275, time_taken_in_seconds: 80
Epoch [1/1], Step [10799/13804], Loss: 2.5199, Perplexity: 12.4269, time_taken_in_seconds: 81
Epoch [1/1], Step [10800/13804], Loss: 2.6271, Perplexity: 13.8330, time_taken_in_seconds: 82
Epoch [1/1], Step [10801/13804], Loss: 2.4447, Perplexity: 11.5274, time_taken_in_seconds: 0
Epoch [1/1], Step [10802/13804], Loss: 2.4938, Perplexity: 12.1075, time_taken_in_seconds: 1
Epoch [1/1], Step [10803/13804], Loss: 2.8203, Perplexity: 16.7827, time_taken_in_seconds: 2
Epoch [1/1], Step [10804/13804], Loss: 2.9004, Perplexity: 18.1822, time_taken_in_seconds: 3
Epoch [1/1], Step [10805/13804], Loss: 2.3039, Perplexity: 10.0128, time_taken_in_seconds: 4
Epoch [1/1], Step [10806/13804], Loss: 2.4480, Perplexity: 11.5655, time_taken_in_seconds: 4
Epoch [1/1], Step [10807/13804], Loss: 2.5087, Perplexity: 12.2890, time_taken_in_seconds: 5
Epoch [1/1], Step [10808/13804], Loss: 2.4876, Perplexity: 12.0318, time_taken_in_seconds: 6
Epoch [1/1], Step [10809/13804], Loss: 2.5371, Perplexity: 12.6427, time_taken_in_seconds: 7
Epoch [1/1], Step [10810/13804], Loss: 2.3890, Perplexity: 10.9022, time_taken_in_seconds: 8
Epoch [1/1], Step [10811/13804], Loss: 2.2939, Perplexity: 9.9131, time_taken_in_seconds: 8
Epoch [1/1], Step [10812/13804], Loss: 2.5978, Perplexity: 13.4342, time_taken_in_seconds: 9
Epoch [1/1], Step [10813/13804], Loss: 2.6862, Perplexity: 14.6762, time_taken_in_seconds: 10
Epoch [1/1], Step [10814/13804], Loss: 2.2670, Perplexity: 9.6500, time_taken_in_seconds: 11
Epoch [1/1], Step [10815/13804], Loss: 2.2068, Perplexity: 9.0865, time_taken_in_seconds: 12
Epoch [1/1], Step [10816/13804], Loss: 2.5129, Perplexity: 12.3404, time_taken_in_seconds: 13
Epoch [1/1], Step [10817/13804], Loss: 2.6268, Perplexity: 13.8300, time_taken_in_seconds: 13
Epoch [1/1], Step [10818/13804], Loss: 3.0638, Perplexity: 21.4094, time_taken_in_seconds: 14
Epoch [1/1], Step [10819/13804], Loss: 2.7125, Perplexity: 15.0667, time_taken_in_seconds: 15
Epoch [1/1], Step [10820/13804], Loss: 2.4336, Perplexity: 11.3995, time_taken_in_seconds: 16
Epoch [1/1], Step [10821/13804], Loss: 2.7571, Perplexity: 15.7545, time_taken_in_seconds: 17
Epoch [1/1], Step [10822/13804], Loss: 2.5610, Perplexity: 12.9490, time_taken_in_seconds: 17
Epoch [1/1], Step [10823/13804], Loss: 2.8524, Perplexity: 17.3296, time_taken_in_seconds: 18
Epoch [1/1], Step [10824/13804], Loss: 2.3477, Perplexity: 10.4619, time_taken_in_seconds: 19
Epoch [1/1], Step [10825/13804], Loss: 2.3296, Perplexity: 10.2743, time_taken_in_seconds: 20
Epoch [1/1], Step [10826/13804], Loss: 2.6332, Perplexity: 13.9176, time_taken_in_seconds: 21
Epoch [1/1], Step [10827/13804], Loss: 2.7072, Perplexity: 14.9877, time_taken_in_seconds: 22
Epoch [1/1], Step [10828/13804], Loss: 2.3890, Perplexity: 10.9029, time_taken_in_seconds: 22
Epoch [1/1], Step [10829/13804], Loss: 2.4616, Perplexity: 11.7240, time_taken_in_seconds: 23
Epoch [1/1], Step [10830/13804], Loss: 2.5093, Perplexity: 12.2966, time_taken_in_seconds: 24
Epoch [1/1], Step [10831/13804], Loss: 2.6544, Perplexity: 14.2162, time_taken_in_seconds: 25
Epoch [1/1], Step [10832/13804], Loss: 2.3322, Perplexity: 10.3001, time_taken_in_seconds: 26
Epoch [1/1], Step [10833/13804], Loss: 2.5166, Perplexity: 12.3868, time_taken_in_seconds: 26
Epoch [1/1], Step [10834/13804], Loss: 2.6940, Perplexity: 14.7907, time_taken_in_seconds: 27
Epoch [1/1], Step [10835/13804], Loss: 2.5965, Perplexity: 13.4170, time_taken_in_seconds: 28
Epoch [1/1], Step [10836/13804], Loss: 2.5975, Perplexity: 13.4300, time_taken_in_seconds: 29
Epoch [1/1], Step [10837/13804], Loss: 2.2570, Perplexity: 9.5544, time_taken_in_seconds: 30
Epoch [1/1], Step [10838/13804], Loss: 2.4201, Perplexity: 11.2474, time_taken_in_seconds: 30
Epoch [1/1], Step [10839/13804], Loss: 3.4337, Perplexity: 30.9903, time_taken_in_seconds: 31
Epoch [1/1], Step [10840/13804], Loss: 2.6580, Perplexity: 14.2677, time_taken_in_seconds: 32
Epoch [1/1], Step [10841/13804], Loss: 2.6930, Perplexity: 14.7757, time_taken_in_seconds: 33
Epoch [1/1], Step [10842/13804], Loss: 2.5272, Perplexity: 12.5186, time_taken_in_seconds: 34
Epoch [1/1], Step [10843/13804], Loss: 3.3121, Perplexity: 27.4429, time_taken_in_seconds: 35
Epoch [1/1], Step [10844/13804], Loss: 2.3506, Perplexity: 10.4915, time_taken_in_seconds: 35
Epoch [1/1], Step [10845/13804], Loss: 3.3446, Perplexity: 28.3483, time_taken_in_seconds: 36
Epoch [1/1], Step [10846/13804], Loss: 2.6620, Perplexity: 14.3249, time_taken_in_seconds: 37
Epoch [1/1], Step [10847/13804], Loss: 2.3131, Perplexity: 10.1062, time_taken_in_seconds: 38
Epoch [1/1], Step [10848/13804], Loss: 2.6145, Perplexity: 13.6610, time_taken_in_seconds: 39
Epoch [1/1], Step [10849/13804], Loss: 2.4080, Perplexity: 11.1118, time_taken_in_seconds: 39
Epoch [1/1], Step [10850/13804], Loss: 2.7954, Perplexity: 16.3699, time_taken_in_seconds: 40
Epoch [1/1], Step [10851/13804], Loss: 2.3078, Perplexity: 10.0520, time_taken_in_seconds: 41
Epoch [1/1], Step [10852/13804], Loss: 2.6668, Perplexity: 14.3937, time_taken_in_seconds: 42
Epoch [1/1], Step [10853/13804], Loss: 2.5927, Perplexity: 13.3660, time_taken_in_seconds: 43
Epoch [1/1], Step [10854/13804], Loss: 2.6246, Perplexity: 13.7990, time_taken_in_seconds: 43
Epoch [1/1], Step [10855/13804], Loss: 2.4399, Perplexity: 11.4715, time_taken_in_seconds: 44
Epoch [1/1], Step [10856/13804], Loss: 2.2783, Perplexity: 9.7605, time_taken_in_seconds: 45
Epoch [1/1], Step [10857/13804], Loss: 2.4489, Perplexity: 11.5751, time_taken_in_seconds: 46
Epoch [1/1], Step [10858/13804], Loss: 2.6100, Perplexity: 13.5985, time_taken_in_seconds: 47
Epoch [1/1], Step [10859/13804], Loss: 2.3373, Perplexity: 10.3531, time_taken_in_seconds: 48
Epoch [1/1], Step [10860/13804], Loss: 2.6445, Perplexity: 14.0764, time_taken_in_seconds: 48
Epoch [1/1], Step [10861/13804], Loss: 2.9461, Perplexity: 19.0318, time_taken_in_seconds: 49
Epoch [1/1], Step [10862/13804], Loss: 3.0035, Perplexity: 20.1569, time_taken_in_seconds: 50
Epoch [1/1], Step [10863/13804], Loss: 2.6312, Perplexity: 13.8901, time_taken_in_seconds: 51
Epoch [1/1], Step [10864/13804], Loss: 2.6391, Perplexity: 14.0004, time_taken_in_seconds: 52
Epoch [1/1], Step [10865/13804], Loss: 2.9470, Perplexity: 19.0490, time_taken_in_seconds: 53
Epoch [1/1], Step [10866/13804], Loss: 2.3366, Perplexity: 10.3458, time_taken_in_seconds: 53
Epoch [1/1], Step [10867/13804], Loss: 2.2198, Perplexity: 9.2052, time_taken_in_seconds: 54
Epoch [1/1], Step [10868/13804], Loss: 2.7187, Perplexity: 15.1604, time_taken_in_seconds: 55
Epoch [1/1], Step [10869/13804], Loss: 2.5179, Perplexity: 12.4029, time_taken_in_seconds: 56
Epoch [1/1], Step [10870/13804], Loss: 2.3584, Perplexity: 10.5741, time_taken_in_seconds: 57
Epoch [1/1], Step [10871/13804], Loss: 2.7561, Perplexity: 15.7380, time_taken_in_seconds: 57
Epoch [1/1], Step [10872/13804], Loss: 2.3865, Perplexity: 10.8751, time_taken_in_seconds: 58
Epoch [1/1], Step [10873/13804], Loss: 2.9410, Perplexity: 18.9350, time_taken_in_seconds: 59
Epoch [1/1], Step [10874/13804], Loss: 2.4745, Perplexity: 11.8754, time_taken_in_seconds: 60
Epoch [1/1], Step [10875/13804], Loss: 2.2663, Perplexity: 9.6434, time_taken_in_seconds: 61
Epoch [1/1], Step [10876/13804], Loss: 2.4565, Perplexity: 11.6642, time_taken_in_seconds: 62
Epoch [1/1], Step [10877/13804], Loss: 2.7045, Perplexity: 14.9467, time_taken_in_seconds: 62
Epoch [1/1], Step [10878/13804], Loss: 2.3401, Perplexity: 10.3828, time_taken_in_seconds: 63
Epoch [1/1], Step [10879/13804], Loss: 2.4507, Perplexity: 11.5961, time_taken_in_seconds: 64
Epoch [1/1], Step [10880/13804], Loss: 2.9802, Perplexity: 19.6923, time_taken_in_seconds: 65
Epoch [1/1], Step [10881/13804], Loss: 3.5585, Perplexity: 35.1117, time_taken_in_seconds: 66
Epoch [1/1], Step [10882/13804], Loss: 2.4153, Perplexity: 11.1935, time_taken_in_seconds: 66
Epoch [1/1], Step [10883/13804], Loss: 2.4385, Perplexity: 11.4557, time_taken_in_seconds: 67
Epoch [1/1], Step [10884/13804], Loss: 2.1156, Perplexity: 8.2942, time_taken_in_seconds: 68
Epoch [1/1], Step [10885/13804], Loss: 2.6864, Perplexity: 14.6782, time_taken_in_seconds: 69
Epoch [1/1], Step [10886/13804], Loss: 2.5223, Perplexity: 12.4570, time_taken_in_seconds: 70
Epoch [1/1], Step [10887/13804], Loss: 2.3019, Perplexity: 9.9932, time_taken_in_seconds: 71
Epoch [1/1], Step [10888/13804], Loss: 2.9830, Perplexity: 19.7463, time_taken_in_seconds: 71
Epoch [1/1], Step [10889/13804], Loss: 2.7938, Perplexity: 16.3425, time_taken_in_seconds: 72
Epoch [1/1], Step [10890/13804], Loss: 2.5021, Perplexity: 12.2076, time_taken_in_seconds: 73
Epoch [1/1], Step [10891/13804], Loss: 2.6416, Perplexity: 14.0355, time_taken_in_seconds: 74
Epoch [1/1], Step [10892/13804], Loss: 2.5542, Perplexity: 12.8605, time_taken_in_seconds: 75
Epoch [1/1], Step [10893/13804], Loss: 2.3617, Perplexity: 10.6086, time_taken_in_seconds: 75
Epoch [1/1], Step [10894/13804], Loss: 2.2307, Perplexity: 9.3061, time_taken_in_seconds: 76
Epoch [1/1], Step [10895/13804], Loss: 2.5986, Perplexity: 13.4455, time_taken_in_seconds: 77
Epoch [1/1], Step [10896/13804], Loss: 2.6587, Perplexity: 14.2780, time_taken_in_seconds: 78
Epoch [1/1], Step [10897/13804], Loss: 2.3145, Perplexity: 10.1203, time_taken_in_seconds: 79
Epoch [1/1], Step [10898/13804], Loss: 3.4105, Perplexity: 30.2799, time_taken_in_seconds: 80
Epoch [1/1], Step [10899/13804], Loss: 2.3262, Perplexity: 10.2391, time_taken_in_seconds: 80
Epoch [1/1], Step [10900/13804], Loss: 2.5864, Perplexity: 13.2820, time_taken_in_seconds: 81
Epoch [1/1], Step [10901/13804], Loss: 2.2785, Perplexity: 9.7621, time_taken_in_seconds: 0
Epoch [1/1], Step [10902/13804], Loss: 2.2442, Perplexity: 9.4324, time_taken_in_seconds: 1
Epoch [1/1], Step [10903/13804], Loss: 2.4756, Perplexity: 11.8885, time_taken_in_seconds: 2
Epoch [1/1], Step [10904/13804], Loss: 2.5045, Perplexity: 12.2378, time_taken_in_seconds: 3
Epoch [1/1], Step [10905/13804], Loss: 2.9288, Perplexity: 18.7059, time_taken_in_seconds: 4
Epoch [1/1], Step [10906/13804], Loss: 2.2039, Perplexity: 9.0602, time_taken_in_seconds: 4
Epoch [1/1], Step [10907/13804], Loss: 2.6543, Perplexity: 14.2148, time_taken_in_seconds: 5
Epoch [1/1], Step [10908/13804], Loss: 2.5526, Perplexity: 12.8406, time_taken_in_seconds: 6
Epoch [1/1], Step [10909/13804], Loss: 2.6738, Perplexity: 14.4954, time_taken_in_seconds: 7
Epoch [1/1], Step [10910/13804], Loss: 2.6529, Perplexity: 14.1956, time_taken_in_seconds: 8
Epoch [1/1], Step [10911/13804], Loss: 2.5572, Perplexity: 12.8991, time_taken_in_seconds: 8
Epoch [1/1], Step [10912/13804], Loss: 2.5517, Perplexity: 12.8289, time_taken_in_seconds: 9
Epoch [1/1], Step [10913/13804], Loss: 2.3128, Perplexity: 10.1030, time_taken_in_seconds: 10
Epoch [1/1], Step [10914/13804], Loss: 2.7626, Perplexity: 15.8407, time_taken_in_seconds: 11
Epoch [1/1], Step [10915/13804], Loss: 3.0651, Perplexity: 21.4360, time_taken_in_seconds: 12
Epoch [1/1], Step [10916/13804], Loss: 2.4659, Perplexity: 11.7743, time_taken_in_seconds: 13
Epoch [1/1], Step [10917/13804], Loss: 2.3085, Perplexity: 10.0596, time_taken_in_seconds: 13
Epoch [1/1], Step [10918/13804], Loss: 2.6518, Perplexity: 14.1791, time_taken_in_seconds: 14
Epoch [1/1], Step [10919/13804], Loss: 2.0727, Perplexity: 7.9462, time_taken_in_seconds: 15
Epoch [1/1], Step [10920/13804], Loss: 2.6796, Perplexity: 14.5794, time_taken_in_seconds: 16
Epoch [1/1], Step [10921/13804], Loss: 2.1836, Perplexity: 8.8779, time_taken_in_seconds: 17
Epoch [1/1], Step [10922/13804], Loss: 2.5550, Perplexity: 12.8709, time_taken_in_seconds: 17
Epoch [1/1], Step [10923/13804], Loss: 2.6023, Perplexity: 13.4948, time_taken_in_seconds: 18
Epoch [1/1], Step [10924/13804], Loss: 2.4365, Perplexity: 11.4333, time_taken_in_seconds: 19
Epoch [1/1], Step [10925/13804], Loss: 2.3391, Perplexity: 10.3724, time_taken_in_seconds: 20
Epoch [1/1], Step [10926/13804], Loss: 2.6481, Perplexity: 14.1267, time_taken_in_seconds: 21
Epoch [1/1], Step [10927/13804], Loss: 2.4118, Perplexity: 11.1542, time_taken_in_seconds: 21
Epoch [1/1], Step [10928/13804], Loss: 2.3776, Perplexity: 10.7788, time_taken_in_seconds: 22
Epoch [1/1], Step [10929/13804], Loss: 2.5018, Perplexity: 12.2043, time_taken_in_seconds: 23
Epoch [1/1], Step [10930/13804], Loss: 2.3918, Perplexity: 10.9327, time_taken_in_seconds: 24
Epoch [1/1], Step [10931/13804], Loss: 2.5691, Perplexity: 13.0543, time_taken_in_seconds: 25
Epoch [1/1], Step [10932/13804], Loss: 2.8694, Perplexity: 17.6263, time_taken_in_seconds: 26
Epoch [1/1], Step [10933/13804], Loss: 2.2353, Perplexity: 9.3493, time_taken_in_seconds: 26
Epoch [1/1], Step [10934/13804], Loss: 2.4877, Perplexity: 12.0338, time_taken_in_seconds: 27
Epoch [1/1], Step [10935/13804], Loss: 2.3506, Perplexity: 10.4921, time_taken_in_seconds: 28
Epoch [1/1], Step [10936/13804], Loss: 2.8259, Perplexity: 16.8756, time_taken_in_seconds: 29
Epoch [1/1], Step [10937/13804], Loss: 2.7658, Perplexity: 15.8911, time_taken_in_seconds: 30
Epoch [1/1], Step [10938/13804], Loss: 2.5823, Perplexity: 13.2278, time_taken_in_seconds: 31
Epoch [1/1], Step [10939/13804], Loss: 2.7098, Perplexity: 15.0264, time_taken_in_seconds: 31
Epoch [1/1], Step [10940/13804], Loss: 2.5759, Perplexity: 13.1432, time_taken_in_seconds: 32
Epoch [1/1], Step [10941/13804], Loss: 2.6049, Perplexity: 13.5302, time_taken_in_seconds: 33
Epoch [1/1], Step [10942/13804], Loss: 2.3844, Perplexity: 10.8526, time_taken_in_seconds: 34
Epoch [1/1], Step [10943/13804], Loss: 2.3524, Perplexity: 10.5106, time_taken_in_seconds: 35
Epoch [1/1], Step [10944/13804], Loss: 2.5392, Perplexity: 12.6694, time_taken_in_seconds: 36
Epoch [1/1], Step [10945/13804], Loss: 2.5293, Perplexity: 12.5453, time_taken_in_seconds: 36
Epoch [1/1], Step [10946/13804], Loss: 2.7206, Perplexity: 15.1893, time_taken_in_seconds: 37
Epoch [1/1], Step [10947/13804], Loss: 2.5969, Perplexity: 13.4217, time_taken_in_seconds: 38
Epoch [1/1], Step [10948/13804], Loss: 2.3171, Perplexity: 10.1466, time_taken_in_seconds: 39
Epoch [1/1], Step [10949/13804], Loss: 2.8099, Perplexity: 16.6084, time_taken_in_seconds: 40
Epoch [1/1], Step [10950/13804], Loss: 2.6349, Perplexity: 13.9417, time_taken_in_seconds: 40
Epoch [1/1], Step [10951/13804], Loss: 2.5508, Perplexity: 12.8168, time_taken_in_seconds: 41
Epoch [1/1], Step [10952/13804], Loss: 2.4705, Perplexity: 11.8280, time_taken_in_seconds: 42
Epoch [1/1], Step [10953/13804], Loss: 2.6258, Perplexity: 13.8162, time_taken_in_seconds: 43
Epoch [1/1], Step [10954/13804], Loss: 2.4760, Perplexity: 11.8935, time_taken_in_seconds: 44
Epoch [1/1], Step [10955/13804], Loss: 2.6286, Perplexity: 13.8547, time_taken_in_seconds: 44
Epoch [1/1], Step [10956/13804], Loss: 2.3979, Perplexity: 11.0002, time_taken_in_seconds: 45
Epoch [1/1], Step [10957/13804], Loss: 2.5561, Perplexity: 12.8859, time_taken_in_seconds: 46
Epoch [1/1], Step [10958/13804], Loss: 2.4116, Perplexity: 11.1516, time_taken_in_seconds: 47
Epoch [1/1], Step [10959/13804], Loss: 2.6655, Perplexity: 14.3744, time_taken_in_seconds: 48
Epoch [1/1], Step [10960/13804], Loss: 2.4104, Perplexity: 11.1379, time_taken_in_seconds: 49
Epoch [1/1], Step [10961/13804], Loss: 2.4285, Perplexity: 11.3421, time_taken_in_seconds: 49
Epoch [1/1], Step [10962/13804], Loss: 2.5044, Perplexity: 12.2359, time_taken_in_seconds: 50
Epoch [1/1], Step [10963/13804], Loss: 2.4957, Perplexity: 12.1306, time_taken_in_seconds: 51
Epoch [1/1], Step [10964/13804], Loss: 2.3996, Perplexity: 11.0183, time_taken_in_seconds: 52
Epoch [1/1], Step [10965/13804], Loss: 2.2566, Perplexity: 9.5508, time_taken_in_seconds: 53
Epoch [1/1], Step [10966/13804], Loss: 2.6827, Perplexity: 14.6242, time_taken_in_seconds: 53
Epoch [1/1], Step [10967/13804], Loss: 2.4655, Perplexity: 11.7695, time_taken_in_seconds: 54
Epoch [1/1], Step [10968/13804], Loss: 2.4638, Perplexity: 11.7499, time_taken_in_seconds: 55
Epoch [1/1], Step [10969/13804], Loss: 2.6458, Perplexity: 14.0943, time_taken_in_seconds: 56
Epoch [1/1], Step [10970/13804], Loss: 2.8491, Perplexity: 17.2725, time_taken_in_seconds: 57
Epoch [1/1], Step [10971/13804], Loss: 2.5486, Perplexity: 12.7896, time_taken_in_seconds: 58
Epoch [1/1], Step [10972/13804], Loss: 2.5120, Perplexity: 12.3299, time_taken_in_seconds: 58
Epoch [1/1], Step [10973/13804], Loss: 2.6269, Perplexity: 13.8308, time_taken_in_seconds: 59
Epoch [1/1], Step [10974/13804], Loss: 2.6866, Perplexity: 14.6818, time_taken_in_seconds: 60
Epoch [1/1], Step [10975/13804], Loss: 2.6520, Perplexity: 14.1829, time_taken_in_seconds: 61
Epoch [1/1], Step [10976/13804], Loss: 2.2834, Perplexity: 9.8104, time_taken_in_seconds: 62
Epoch [1/1], Step [10977/13804], Loss: 2.4679, Perplexity: 11.7979, time_taken_in_seconds: 62
Epoch [1/1], Step [10978/13804], Loss: 2.4959, Perplexity: 12.1321, time_taken_in_seconds: 63
Epoch [1/1], Step [10979/13804], Loss: 2.0181, Perplexity: 7.5239, time_taken_in_seconds: 64
Epoch [1/1], Step [10980/13804], Loss: 2.2606, Perplexity: 9.5890, time_taken_in_seconds: 65
Epoch [1/1], Step [10981/13804], Loss: 3.0245, Perplexity: 20.5843, time_taken_in_seconds: 66
Epoch [1/1], Step [10982/13804], Loss: 2.7204, Perplexity: 15.1863, time_taken_in_seconds: 67
Epoch [1/1], Step [10983/13804], Loss: 2.3891, Perplexity: 10.9037, time_taken_in_seconds: 67
Epoch [1/1], Step [10984/13804], Loss: 2.4865, Perplexity: 12.0196, time_taken_in_seconds: 68
Epoch [1/1], Step [10985/13804], Loss: 3.4470, Perplexity: 31.4064, time_taken_in_seconds: 69
Epoch [1/1], Step [10986/13804], Loss: 2.6813, Perplexity: 14.6040, time_taken_in_seconds: 70
Epoch [1/1], Step [10987/13804], Loss: 2.1695, Perplexity: 8.7538, time_taken_in_seconds: 71
Epoch [1/1], Step [10988/13804], Loss: 2.7136, Perplexity: 15.0838, time_taken_in_seconds: 72
Epoch [1/1], Step [10989/13804], Loss: 2.7166, Perplexity: 15.1289, time_taken_in_seconds: 72
Epoch [1/1], Step [10990/13804], Loss: 2.9321, Perplexity: 18.7678, time_taken_in_seconds: 73
Epoch [1/1], Step [10991/13804], Loss: 2.8908, Perplexity: 18.0085, time_taken_in_seconds: 74
Epoch [1/1], Step [10992/13804], Loss: 2.3941, Perplexity: 10.9579, time_taken_in_seconds: 75
Epoch [1/1], Step [10993/13804], Loss: 2.2963, Perplexity: 9.9375, time_taken_in_seconds: 76
Epoch [1/1], Step [10994/13804], Loss: 2.6220, Perplexity: 13.7628, time_taken_in_seconds: 76
Epoch [1/1], Step [10995/13804], Loss: 2.4077, Perplexity: 11.1087, time_taken_in_seconds: 77
Epoch [1/1], Step [10996/13804], Loss: 2.4558, Perplexity: 11.6559, time_taken_in_seconds: 78
Epoch [1/1], Step [10997/13804], Loss: 2.6617, Perplexity: 14.3209, time_taken_in_seconds: 79
Epoch [1/1], Step [10998/13804], Loss: 2.7368, Perplexity: 15.4375, time_taken_in_seconds: 80
Epoch [1/1], Step [10999/13804], Loss: 2.6881, Perplexity: 14.7034, time_taken_in_seconds: 80
Epoch [1/1], Step [11000/13804], Loss: 2.4482, Perplexity: 11.5681, time_taken_in_seconds: 81
Epoch [1/1], Step [11001/13804], Loss: 2.5301, Perplexity: 12.5550, time_taken_in_seconds: 0
Epoch [1/1], Step [11002/13804], Loss: 2.5107, Perplexity: 12.3133, time_taken_in_seconds: 1
Epoch [1/1], Step [11003/13804], Loss: 2.4796, Perplexity: 11.9362, time_taken_in_seconds: 2
Epoch [1/1], Step [11004/13804], Loss: 2.5252, Perplexity: 12.4928, time_taken_in_seconds: 3
Epoch [1/1], Step [11005/13804], Loss: 2.7676, Perplexity: 15.9200, time_taken_in_seconds: 4
Epoch [1/1], Step [11006/13804], Loss: 2.3772, Perplexity: 10.7751, time_taken_in_seconds: 4
Epoch [1/1], Step [11007/13804], Loss: 2.4385, Perplexity: 11.4555, time_taken_in_seconds: 5
Epoch [1/1], Step [11008/13804], Loss: 2.4170, Perplexity: 11.2122, time_taken_in_seconds: 6
Epoch [1/1], Step [11009/13804], Loss: 2.6116, Perplexity: 13.6214, time_taken_in_seconds: 7
Epoch [1/1], Step [11010/13804], Loss: 2.3978, Perplexity: 10.9989, time_taken_in_seconds: 8
Epoch [1/1], Step [11011/13804], Loss: 2.3971, Perplexity: 10.9908, time_taken_in_seconds: 9
Epoch [1/1], Step [11012/13804], Loss: 2.3414, Perplexity: 10.3962, time_taken_in_seconds: 10
Epoch [1/1], Step [11013/13804], Loss: 2.5785, Perplexity: 13.1770, time_taken_in_seconds: 10
Epoch [1/1], Step [11014/13804], Loss: 2.4783, Perplexity: 11.9211, time_taken_in_seconds: 11
Epoch [1/1], Step [11015/13804], Loss: 2.6571, Perplexity: 14.2549, time_taken_in_seconds: 12
Epoch [1/1], Step [11016/13804], Loss: 2.4027, Perplexity: 11.0531, time_taken_in_seconds: 13
Epoch [1/1], Step [11017/13804], Loss: 2.6397, Perplexity: 14.0089, time_taken_in_seconds: 14
Epoch [1/1], Step [11018/13804], Loss: 2.7791, Perplexity: 16.1053, time_taken_in_seconds: 14
Epoch [1/1], Step [11019/13804], Loss: 2.3579, Perplexity: 10.5692, time_taken_in_seconds: 15
Epoch [1/1], Step [11020/13804], Loss: 2.4743, Perplexity: 11.8736, time_taken_in_seconds: 16
Epoch [1/1], Step [11021/13804], Loss: 2.3778, Perplexity: 10.7811, time_taken_in_seconds: 17
Epoch [1/1], Step [11022/13804], Loss: 2.9944, Perplexity: 19.9738, time_taken_in_seconds: 18
Epoch [1/1], Step [11023/13804], Loss: 2.4583, Perplexity: 11.6852, time_taken_in_seconds: 19
Epoch [1/1], Step [11024/13804], Loss: 2.8050, Perplexity: 16.5275, time_taken_in_seconds: 19
Epoch [1/1], Step [11025/13804], Loss: 2.3637, Perplexity: 10.6298, time_taken_in_seconds: 20
Epoch [1/1], Step [11026/13804], Loss: 2.3374, Perplexity: 10.3546, time_taken_in_seconds: 21
Epoch [1/1], Step [11027/13804], Loss: 3.1573, Perplexity: 23.5064, time_taken_in_seconds: 22
Epoch [1/1], Step [11028/13804], Loss: 2.2642, Perplexity: 9.6232, time_taken_in_seconds: 23
Epoch [1/1], Step [11029/13804], Loss: 2.2525, Perplexity: 9.5118, time_taken_in_seconds: 23
Epoch [1/1], Step [11030/13804], Loss: 2.7085, Perplexity: 15.0061, time_taken_in_seconds: 24
Epoch [1/1], Step [11031/13804], Loss: 2.3713, Perplexity: 10.7109, time_taken_in_seconds: 25
Epoch [1/1], Step [11032/13804], Loss: 2.4974, Perplexity: 12.1508, time_taken_in_seconds: 26
Epoch [1/1], Step [11033/13804], Loss: 2.7839, Perplexity: 16.1823, time_taken_in_seconds: 27
Epoch [1/1], Step [11034/13804], Loss: 2.4289, Perplexity: 11.3468, time_taken_in_seconds: 28
Epoch [1/1], Step [11035/13804], Loss: 2.2979, Perplexity: 9.9536, time_taken_in_seconds: 28
Epoch [1/1], Step [11036/13804], Loss: 2.9053, Perplexity: 18.2703, time_taken_in_seconds: 29
Epoch [1/1], Step [11037/13804], Loss: 2.7707, Perplexity: 15.9703, time_taken_in_seconds: 30
Epoch [1/1], Step [11038/13804], Loss: 2.6602, Perplexity: 14.2992, time_taken_in_seconds: 31
Epoch [1/1], Step [11039/13804], Loss: 2.3215, Perplexity: 10.1907, time_taken_in_seconds: 32
Epoch [1/1], Step [11040/13804], Loss: 2.4965, Perplexity: 12.1395, time_taken_in_seconds: 33
Epoch [1/1], Step [11041/13804], Loss: 2.4971, Perplexity: 12.1469, time_taken_in_seconds: 33
Epoch [1/1], Step [11042/13804], Loss: 2.4492, Perplexity: 11.5796, time_taken_in_seconds: 34
Epoch [1/1], Step [11043/13804], Loss: 2.5644, Perplexity: 12.9927, time_taken_in_seconds: 35
Epoch [1/1], Step [11044/13804], Loss: 2.5579, Perplexity: 12.9090, time_taken_in_seconds: 36
Epoch [1/1], Step [11045/13804], Loss: 2.4685, Perplexity: 11.8044, time_taken_in_seconds: 37
Epoch [1/1], Step [11046/13804], Loss: 2.5135, Perplexity: 12.3482, time_taken_in_seconds: 37
Epoch [1/1], Step [11047/13804], Loss: 2.3820, Perplexity: 10.8267, time_taken_in_seconds: 38
Epoch [1/1], Step [11048/13804], Loss: 2.7156, Perplexity: 15.1137, time_taken_in_seconds: 39
Epoch [1/1], Step [11049/13804], Loss: 2.9422, Perplexity: 18.9575, time_taken_in_seconds: 40
Epoch [1/1], Step [11050/13804], Loss: 2.4594, Perplexity: 11.6980, time_taken_in_seconds: 41
Epoch [1/1], Step [11051/13804], Loss: 2.4731, Perplexity: 11.8591, time_taken_in_seconds: 41
Epoch [1/1], Step [11052/13804], Loss: 2.5727, Perplexity: 13.1005, time_taken_in_seconds: 42
Epoch [1/1], Step [11053/13804], Loss: 2.9031, Perplexity: 18.2305, time_taken_in_seconds: 43
Epoch [1/1], Step [11054/13804], Loss: 2.8081, Perplexity: 16.5783, time_taken_in_seconds: 44
Epoch [1/1], Step [11055/13804], Loss: 2.3876, Perplexity: 10.8869, time_taken_in_seconds: 45
Epoch [1/1], Step [11056/13804], Loss: 2.4801, Perplexity: 11.9429, time_taken_in_seconds: 46
Epoch [1/1], Step [11057/13804], Loss: 2.4351, Perplexity: 11.4168, time_taken_in_seconds: 46
Epoch [1/1], Step [11058/13804], Loss: 2.4793, Perplexity: 11.9332, time_taken_in_seconds: 47
Epoch [1/1], Step [11059/13804], Loss: 2.3282, Perplexity: 10.2591, time_taken_in_seconds: 48
Epoch [1/1], Step [11060/13804], Loss: 2.4208, Perplexity: 11.2550, time_taken_in_seconds: 49
Epoch [1/1], Step [11061/13804], Loss: 2.5939, Perplexity: 13.3821, time_taken_in_seconds: 50
Epoch [1/1], Step [11062/13804], Loss: 2.9797, Perplexity: 19.6816, time_taken_in_seconds: 51
Epoch [1/1], Step [11063/13804], Loss: 2.4659, Perplexity: 11.7744, time_taken_in_seconds: 51
Epoch [1/1], Step [11064/13804], Loss: 2.9034, Perplexity: 18.2356, time_taken_in_seconds: 52
Epoch [1/1], Step [11065/13804], Loss: 2.6206, Perplexity: 13.7440, time_taken_in_seconds: 53
Epoch [1/1], Step [11066/13804], Loss: 3.0794, Perplexity: 21.7461, time_taken_in_seconds: 54
Epoch [1/1], Step [11067/13804], Loss: 2.7152, Perplexity: 15.1082, time_taken_in_seconds: 55
Epoch [1/1], Step [11068/13804], Loss: 2.4323, Perplexity: 11.3845, time_taken_in_seconds: 55
Epoch [1/1], Step [11069/13804], Loss: 2.6269, Perplexity: 13.8303, time_taken_in_seconds: 56
Epoch [1/1], Step [11070/13804], Loss: 2.2946, Perplexity: 9.9204, time_taken_in_seconds: 57
Epoch [1/1], Step [11071/13804], Loss: 2.6128, Perplexity: 13.6377, time_taken_in_seconds: 58
Epoch [1/1], Step [11072/13804], Loss: 2.3853, Perplexity: 10.8624, time_taken_in_seconds: 59
Epoch [1/1], Step [11073/13804], Loss: 2.2681, Perplexity: 9.6606, time_taken_in_seconds: 60
Epoch [1/1], Step [11074/13804], Loss: 2.9224, Perplexity: 18.5858, time_taken_in_seconds: 60
Epoch [1/1], Step [11075/13804], Loss: 2.3077, Perplexity: 10.0510, time_taken_in_seconds: 61
Epoch [1/1], Step [11076/13804], Loss: 2.6779, Perplexity: 14.5539, time_taken_in_seconds: 62
Epoch [1/1], Step [11077/13804], Loss: 2.3834, Perplexity: 10.8412, time_taken_in_seconds: 63
Epoch [1/1], Step [11078/13804], Loss: 2.4364, Perplexity: 11.4322, time_taken_in_seconds: 64
Epoch [1/1], Step [11079/13804], Loss: 2.4798, Perplexity: 11.9386, time_taken_in_seconds: 65
Epoch [1/1], Step [11080/13804], Loss: 2.5097, Perplexity: 12.3011, time_taken_in_seconds: 65
Epoch [1/1], Step [11081/13804], Loss: 2.8030, Perplexity: 16.4936, time_taken_in_seconds: 66
Epoch [1/1], Step [11082/13804], Loss: 2.6074, Perplexity: 13.5633, time_taken_in_seconds: 67
Epoch [1/1], Step [11083/13804], Loss: 2.3776, Perplexity: 10.7785, time_taken_in_seconds: 68
Epoch [1/1], Step [11084/13804], Loss: 2.7763, Perplexity: 16.0600, time_taken_in_seconds: 69
Epoch [1/1], Step [11085/13804], Loss: 2.4304, Perplexity: 11.3629, time_taken_in_seconds: 70
Epoch [1/1], Step [11086/13804], Loss: 2.5013, Perplexity: 12.1981, time_taken_in_seconds: 70
Epoch [1/1], Step [11087/13804], Loss: 2.6115, Perplexity: 13.6188, time_taken_in_seconds: 71
Epoch [1/1], Step [11088/13804], Loss: 2.5302, Perplexity: 12.5564, time_taken_in_seconds: 72
Epoch [1/1], Step [11089/13804], Loss: 2.7812, Perplexity: 16.1382, time_taken_in_seconds: 73
Epoch [1/1], Step [11090/13804], Loss: 2.5574, Perplexity: 12.9023, time_taken_in_seconds: 74
Epoch [1/1], Step [11091/13804], Loss: 2.8609, Perplexity: 17.4780, time_taken_in_seconds: 75
Epoch [1/1], Step [11092/13804], Loss: 2.4617, Perplexity: 11.7246, time_taken_in_seconds: 75
Epoch [1/1], Step [11093/13804], Loss: 2.6991, Perplexity: 14.8657, time_taken_in_seconds: 76
Epoch [1/1], Step [11094/13804], Loss: 2.3835, Perplexity: 10.8428, time_taken_in_seconds: 77
Epoch [1/1], Step [11095/13804], Loss: 2.6408, Perplexity: 14.0246, time_taken_in_seconds: 78
Epoch [1/1], Step [11096/13804], Loss: 2.7126, Perplexity: 15.0679, time_taken_in_seconds: 79
Epoch [1/1], Step [11097/13804], Loss: 2.2685, Perplexity: 9.6649, time_taken_in_seconds: 80
Epoch [1/1], Step [11098/13804], Loss: 2.9510, Perplexity: 19.1252, time_taken_in_seconds: 80
Epoch [1/1], Step [11099/13804], Loss: 2.3432, Perplexity: 10.4142, time_taken_in_seconds: 81
Epoch [1/1], Step [11100/13804], Loss: 2.8261, Perplexity: 16.8794, time_taken_in_seconds: 82
Epoch [1/1], Step [11101/13804], Loss: 2.6568, Perplexity: 14.2510, time_taken_in_seconds: 0
Epoch [1/1], Step [11102/13804], Loss: 2.4617, Perplexity: 11.7244, time_taken_in_seconds: 1
Epoch [1/1], Step [11103/13804], Loss: 2.5234, Perplexity: 12.4712, time_taken_in_seconds: 2
Epoch [1/1], Step [11104/13804], Loss: 2.4561, Perplexity: 11.6594, time_taken_in_seconds: 3
Epoch [1/1], Step [11105/13804], Loss: 2.6315, Perplexity: 13.8943, time_taken_in_seconds: 4
Epoch [1/1], Step [11106/13804], Loss: 2.4293, Perplexity: 11.3514, time_taken_in_seconds: 4
Epoch [1/1], Step [11107/13804], Loss: 2.9552, Perplexity: 19.2064, time_taken_in_seconds: 5
Epoch [1/1], Step [11108/13804], Loss: 2.3039, Perplexity: 10.0128, time_taken_in_seconds: 6
Epoch [1/1], Step [11109/13804], Loss: 2.7596, Perplexity: 15.7930, time_taken_in_seconds: 7
Epoch [1/1], Step [11110/13804], Loss: 2.7590, Perplexity: 15.7844, time_taken_in_seconds: 8
Epoch [1/1], Step [11111/13804], Loss: 2.4747, Perplexity: 11.8776, time_taken_in_seconds: 8
Epoch [1/1], Step [11112/13804], Loss: 2.5169, Perplexity: 12.3905, time_taken_in_seconds: 9
Epoch [1/1], Step [11113/13804], Loss: 2.2128, Perplexity: 9.1414, time_taken_in_seconds: 10
Epoch [1/1], Step [11114/13804], Loss: 2.4303, Perplexity: 11.3620, time_taken_in_seconds: 11
Epoch [1/1], Step [11115/13804], Loss: 2.4833, Perplexity: 11.9810, time_taken_in_seconds: 12
Epoch [1/1], Step [11116/13804], Loss: 2.5102, Perplexity: 12.3077, time_taken_in_seconds: 13
Epoch [1/1], Step [11117/13804], Loss: 2.7761, Perplexity: 16.0560, time_taken_in_seconds: 13
Epoch [1/1], Step [11118/13804], Loss: 2.6965, Perplexity: 14.8272, time_taken_in_seconds: 14
Epoch [1/1], Step [11119/13804], Loss: 2.5111, Perplexity: 12.3184, time_taken_in_seconds: 15
Epoch [1/1], Step [11120/13804], Loss: 2.3606, Perplexity: 10.5972, time_taken_in_seconds: 16
Epoch [1/1], Step [11121/13804], Loss: 2.3694, Perplexity: 10.6910, time_taken_in_seconds: 17
Epoch [1/1], Step [11122/13804], Loss: 3.0913, Perplexity: 22.0057, time_taken_in_seconds: 17
Epoch [1/1], Step [11123/13804], Loss: 2.3526, Perplexity: 10.5125, time_taken_in_seconds: 18
Epoch [1/1], Step [11124/13804], Loss: 2.7130, Perplexity: 15.0749, time_taken_in_seconds: 19
Epoch [1/1], Step [11125/13804], Loss: 2.5526, Perplexity: 12.8399, time_taken_in_seconds: 20
Epoch [1/1], Step [11126/13804], Loss: 2.6082, Perplexity: 13.5749, time_taken_in_seconds: 21
Epoch [1/1], Step [11127/13804], Loss: 2.7090, Perplexity: 15.0141, time_taken_in_seconds: 22
Epoch [1/1], Step [11128/13804], Loss: 2.8256, Perplexity: 16.8715, time_taken_in_seconds: 22
Epoch [1/1], Step [11129/13804], Loss: 2.5007, Perplexity: 12.1908, time_taken_in_seconds: 23
Epoch [1/1], Step [11130/13804], Loss: 3.2864, Perplexity: 26.7460, time_taken_in_seconds: 24
Epoch [1/1], Step [11131/13804], Loss: 2.6426, Perplexity: 14.0491, time_taken_in_seconds: 25
Epoch [1/1], Step [11132/13804], Loss: 2.6240, Perplexity: 13.7909, time_taken_in_seconds: 26
Epoch [1/1], Step [11133/13804], Loss: 2.4755, Perplexity: 11.8880, time_taken_in_seconds: 26
Epoch [1/1], Step [11134/13804], Loss: 2.7912, Perplexity: 16.2998, time_taken_in_seconds: 27
Epoch [1/1], Step [11135/13804], Loss: 2.1400, Perplexity: 8.4992, time_taken_in_seconds: 28
Epoch [1/1], Step [11136/13804], Loss: 2.3400, Perplexity: 10.3814, time_taken_in_seconds: 29
Epoch [1/1], Step [11137/13804], Loss: 2.3857, Perplexity: 10.8672, time_taken_in_seconds: 30
Epoch [1/1], Step [11138/13804], Loss: 3.2346, Perplexity: 25.3962, time_taken_in_seconds: 30
Epoch [1/1], Step [11139/13804], Loss: 2.7166, Perplexity: 15.1281, time_taken_in_seconds: 31
Epoch [1/1], Step [11140/13804], Loss: 2.3640, Perplexity: 10.6334, time_taken_in_seconds: 32
Epoch [1/1], Step [11141/13804], Loss: 2.5150, Perplexity: 12.3670, time_taken_in_seconds: 33
Epoch [1/1], Step [11142/13804], Loss: 2.2692, Perplexity: 9.6717, time_taken_in_seconds: 34
Epoch [1/1], Step [11143/13804], Loss: 2.9476, Perplexity: 19.0592, time_taken_in_seconds: 34
Epoch [1/1], Step [11144/13804], Loss: 2.4532, Perplexity: 11.6256, time_taken_in_seconds: 35
Epoch [1/1], Step [11145/13804], Loss: 2.3397, Perplexity: 10.3786, time_taken_in_seconds: 36
Epoch [1/1], Step [11146/13804], Loss: 2.5033, Perplexity: 12.2233, time_taken_in_seconds: 37
Epoch [1/1], Step [11147/13804], Loss: 2.4012, Perplexity: 11.0366, time_taken_in_seconds: 38
Epoch [1/1], Step [11148/13804], Loss: 2.8047, Perplexity: 16.5218, time_taken_in_seconds: 39
Epoch [1/1], Step [11149/13804], Loss: 2.4402, Perplexity: 11.4754, time_taken_in_seconds: 39
Epoch [1/1], Step [11150/13804], Loss: 2.4819, Perplexity: 11.9638, time_taken_in_seconds: 40
Epoch [1/1], Step [11151/13804], Loss: 2.5010, Perplexity: 12.1945, time_taken_in_seconds: 41
Epoch [1/1], Step [11152/13804], Loss: 2.3910, Perplexity: 10.9239, time_taken_in_seconds: 42
Epoch [1/1], Step [11153/13804], Loss: 2.6257, Perplexity: 13.8147, time_taken_in_seconds: 43
Epoch [1/1], Step [11154/13804], Loss: 2.6046, Perplexity: 13.5252, time_taken_in_seconds: 44
Epoch [1/1], Step [11155/13804], Loss: 2.5041, Perplexity: 12.2323, time_taken_in_seconds: 44
Epoch [1/1], Step [11156/13804], Loss: 2.5545, Perplexity: 12.8652, time_taken_in_seconds: 45
Epoch [1/1], Step [11157/13804], Loss: 2.9761, Perplexity: 19.6121, time_taken_in_seconds: 46
Epoch [1/1], Step [11158/13804], Loss: 2.3527, Perplexity: 10.5135, time_taken_in_seconds: 47
Epoch [1/1], Step [11159/13804], Loss: 2.5787, Perplexity: 13.1801, time_taken_in_seconds: 48
Epoch [1/1], Step [11160/13804], Loss: 2.6155, Perplexity: 13.6739, time_taken_in_seconds: 49
Epoch [1/1], Step [11161/13804], Loss: 2.5464, Perplexity: 12.7607, time_taken_in_seconds: 49
Epoch [1/1], Step [11162/13804], Loss: 2.6682, Perplexity: 14.4140, time_taken_in_seconds: 50
Epoch [1/1], Step [11163/13804], Loss: 2.6411, Perplexity: 14.0290, time_taken_in_seconds: 51
Epoch [1/1], Step [11164/13804], Loss: 2.8852, Perplexity: 17.9067, time_taken_in_seconds: 52
Epoch [1/1], Step [11165/13804], Loss: 2.2275, Perplexity: 9.2764, time_taken_in_seconds: 53
Epoch [1/1], Step [11166/13804], Loss: 2.7915, Perplexity: 16.3060, time_taken_in_seconds: 54
Epoch [1/1], Step [11167/13804], Loss: 2.3583, Perplexity: 10.5732, time_taken_in_seconds: 54
Epoch [1/1], Step [11168/13804], Loss: 4.8769, Perplexity: 131.2295, time_taken_in_seconds: 55
Epoch [1/1], Step [11169/13804], Loss: 2.3626, Perplexity: 10.6185, time_taken_in_seconds: 56
Epoch [1/1], Step [11170/13804], Loss: 2.4629, Perplexity: 11.7382, time_taken_in_seconds: 57
Epoch [1/1], Step [11171/13804], Loss: 2.4666, Perplexity: 11.7824, time_taken_in_seconds: 58
Epoch [1/1], Step [11172/13804], Loss: 3.1026, Perplexity: 22.2554, time_taken_in_seconds: 59
Epoch [1/1], Step [11173/13804], Loss: 2.1997, Perplexity: 9.0220, time_taken_in_seconds: 59
Epoch [1/1], Step [11174/13804], Loss: 2.5984, Perplexity: 13.4416, time_taken_in_seconds: 60
Epoch [1/1], Step [11175/13804], Loss: 2.8469, Perplexity: 17.2338, time_taken_in_seconds: 61
Epoch [1/1], Step [11176/13804], Loss: 2.9643, Perplexity: 19.3808, time_taken_in_seconds: 62
Epoch [1/1], Step [11177/13804], Loss: 2.4149, Perplexity: 11.1882, time_taken_in_seconds: 63
Epoch [1/1], Step [11178/13804], Loss: 2.3906, Perplexity: 10.9200, time_taken_in_seconds: 63
Epoch [1/1], Step [11179/13804], Loss: 2.9895, Perplexity: 19.8755, time_taken_in_seconds: 64
Epoch [1/1], Step [11180/13804], Loss: 2.4075, Perplexity: 11.1066, time_taken_in_seconds: 65
Epoch [1/1], Step [11181/13804], Loss: 2.5792, Perplexity: 13.1869, time_taken_in_seconds: 66
Epoch [1/1], Step [11182/13804], Loss: 2.7499, Perplexity: 15.6415, time_taken_in_seconds: 67
Epoch [1/1], Step [11183/13804], Loss: 2.8490, Perplexity: 17.2698, time_taken_in_seconds: 68
Epoch [1/1], Step [11184/13804], Loss: 2.7144, Perplexity: 15.0952, time_taken_in_seconds: 68
Epoch [1/1], Step [11185/13804], Loss: 3.1625, Perplexity: 23.6284, time_taken_in_seconds: 69
Epoch [1/1], Step [11186/13804], Loss: 2.6093, Perplexity: 13.5895, time_taken_in_seconds: 70
Epoch [1/1], Step [11187/13804], Loss: 3.4232, Perplexity: 30.6662, time_taken_in_seconds: 71
Epoch [1/1], Step [11188/13804], Loss: 2.6262, Perplexity: 13.8213, time_taken_in_seconds: 72
Epoch [1/1], Step [11189/13804], Loss: 2.5534, Perplexity: 12.8505, time_taken_in_seconds: 72
Epoch [1/1], Step [11190/13804], Loss: 2.3338, Perplexity: 10.3167, time_taken_in_seconds: 73
Epoch [1/1], Step [11191/13804], Loss: 2.1556, Perplexity: 8.6329, time_taken_in_seconds: 74
Epoch [1/1], Step [11192/13804], Loss: 2.3284, Perplexity: 10.2618, time_taken_in_seconds: 75
Epoch [1/1], Step [11193/13804], Loss: 2.7616, Perplexity: 15.8250, time_taken_in_seconds: 76
Epoch [1/1], Step [11194/13804], Loss: 2.3824, Perplexity: 10.8314, time_taken_in_seconds: 76
Epoch [1/1], Step [11195/13804], Loss: 2.4995, Perplexity: 12.1767, time_taken_in_seconds: 77
Epoch [1/1], Step [11196/13804], Loss: 2.6090, Perplexity: 13.5855, time_taken_in_seconds: 78
Epoch [1/1], Step [11197/13804], Loss: 2.7519, Perplexity: 15.6723, time_taken_in_seconds: 79
Epoch [1/1], Step [11198/13804], Loss: 2.7118, Perplexity: 15.0565, time_taken_in_seconds: 80
Epoch [1/1], Step [11199/13804], Loss: 2.5119, Perplexity: 12.3278, time_taken_in_seconds: 81
Epoch [1/1], Step [11200/13804], Loss: 2.2480, Perplexity: 9.4692, time_taken_in_seconds: 81
Epoch [1/1], Step [11201/13804], Loss: 2.7328, Perplexity: 15.3766, time_taken_in_seconds: 0
Epoch [1/1], Step [11202/13804], Loss: 2.5392, Perplexity: 12.6694, time_taken_in_seconds: 1
Epoch [1/1], Step [11203/13804], Loss: 2.5833, Perplexity: 13.2406, time_taken_in_seconds: 2
Epoch [1/1], Step [11204/13804], Loss: 2.5896, Perplexity: 13.3249, time_taken_in_seconds: 3
Epoch [1/1], Step [11205/13804], Loss: 2.4537, Perplexity: 11.6310, time_taken_in_seconds: 4
Epoch [1/1], Step [11206/13804], Loss: 2.5513, Perplexity: 12.8234, time_taken_in_seconds: 4
Epoch [1/1], Step [11207/13804], Loss: 2.5191, Perplexity: 12.4178, time_taken_in_seconds: 5
Epoch [1/1], Step [11208/13804], Loss: 2.1380, Perplexity: 8.4824, time_taken_in_seconds: 6
Epoch [1/1], Step [11209/13804], Loss: 2.8086, Perplexity: 16.5871, time_taken_in_seconds: 7
Epoch [1/1], Step [11210/13804], Loss: 2.2708, Perplexity: 9.6872, time_taken_in_seconds: 8
Epoch [1/1], Step [11211/13804], Loss: 2.1435, Perplexity: 8.5290, time_taken_in_seconds: 8
Epoch [1/1], Step [11212/13804], Loss: 2.4386, Perplexity: 11.4574, time_taken_in_seconds: 9
Epoch [1/1], Step [11213/13804], Loss: 2.3249, Perplexity: 10.2256, time_taken_in_seconds: 10
Epoch [1/1], Step [11214/13804], Loss: 2.7106, Perplexity: 15.0389, time_taken_in_seconds: 11
Epoch [1/1], Step [11215/13804], Loss: 2.4590, Perplexity: 11.6927, time_taken_in_seconds: 12
Epoch [1/1], Step [11216/13804], Loss: 2.2854, Perplexity: 9.8301, time_taken_in_seconds: 13
Epoch [1/1], Step [11217/13804], Loss: 3.3651, Perplexity: 28.9360, time_taken_in_seconds: 13
Epoch [1/1], Step [11218/13804], Loss: 2.5485, Perplexity: 12.7874, time_taken_in_seconds: 14
Epoch [1/1], Step [11219/13804], Loss: 2.8201, Perplexity: 16.7790, time_taken_in_seconds: 15
Epoch [1/1], Step [11220/13804], Loss: 2.7507, Perplexity: 15.6533, time_taken_in_seconds: 16
Epoch [1/1], Step [11221/13804], Loss: 2.6913, Perplexity: 14.7511, time_taken_in_seconds: 17
Epoch [1/1], Step [11222/13804], Loss: 2.6649, Perplexity: 14.3661, time_taken_in_seconds: 17
Epoch [1/1], Step [11223/13804], Loss: 3.1016, Perplexity: 22.2341, time_taken_in_seconds: 18
Epoch [1/1], Step [11224/13804], Loss: 2.6465, Perplexity: 14.1050, time_taken_in_seconds: 19
Epoch [1/1], Step [11225/13804], Loss: 2.3713, Perplexity: 10.7115, time_taken_in_seconds: 20
Epoch [1/1], Step [11226/13804], Loss: 2.5066, Perplexity: 12.2633, time_taken_in_seconds: 21
Epoch [1/1], Step [11227/13804], Loss: 3.0790, Perplexity: 21.7372, time_taken_in_seconds: 22
Epoch [1/1], Step [11228/13804], Loss: 2.2745, Perplexity: 9.7235, time_taken_in_seconds: 22
Epoch [1/1], Step [11229/13804], Loss: 2.5835, Perplexity: 13.2432, time_taken_in_seconds: 23
Epoch [1/1], Step [11230/13804], Loss: 2.5952, Perplexity: 13.3991, time_taken_in_seconds: 24
Epoch [1/1], Step [11231/13804], Loss: 2.6016, Perplexity: 13.4850, time_taken_in_seconds: 25
Epoch [1/1], Step [11232/13804], Loss: 2.0707, Perplexity: 7.9305, time_taken_in_seconds: 26
Epoch [1/1], Step [11233/13804], Loss: 2.3791, Perplexity: 10.7950, time_taken_in_seconds: 27
Epoch [1/1], Step [11234/13804], Loss: 2.5000, Perplexity: 12.1824, time_taken_in_seconds: 28
Epoch [1/1], Step [11235/13804], Loss: 2.6024, Perplexity: 13.4962, time_taken_in_seconds: 28
Epoch [1/1], Step [11236/13804], Loss: 2.4568, Perplexity: 11.6672, time_taken_in_seconds: 29
Epoch [1/1], Step [11237/13804], Loss: 2.8198, Perplexity: 16.7737, time_taken_in_seconds: 30
Epoch [1/1], Step [11238/13804], Loss: 2.7851, Perplexity: 16.2020, time_taken_in_seconds: 31
Epoch [1/1], Step [11239/13804], Loss: 2.2479, Perplexity: 9.4677, time_taken_in_seconds: 32
Epoch [1/1], Step [11240/13804], Loss: 2.7380, Perplexity: 15.4553, time_taken_in_seconds: 32
Epoch [1/1], Step [11241/13804], Loss: 2.7751, Perplexity: 16.0408, time_taken_in_seconds: 33
Epoch [1/1], Step [11242/13804], Loss: 2.3164, Perplexity: 10.1389, time_taken_in_seconds: 34
Epoch [1/1], Step [11243/13804], Loss: 2.6509, Perplexity: 14.1675, time_taken_in_seconds: 35
Epoch [1/1], Step [11244/13804], Loss: 2.2005, Perplexity: 9.0297, time_taken_in_seconds: 36
Epoch [1/1], Step [11245/13804], Loss: 2.3791, Perplexity: 10.7948, time_taken_in_seconds: 36
Epoch [1/1], Step [11246/13804], Loss: 2.3947, Perplexity: 10.9653, time_taken_in_seconds: 37
Epoch [1/1], Step [11247/13804], Loss: 2.4403, Perplexity: 11.4768, time_taken_in_seconds: 38
Epoch [1/1], Step [11248/13804], Loss: 3.0675, Perplexity: 21.4878, time_taken_in_seconds: 39
Epoch [1/1], Step [11249/13804], Loss: 2.6991, Perplexity: 14.8663, time_taken_in_seconds: 40
Epoch [1/1], Step [11250/13804], Loss: 2.3357, Perplexity: 10.3368, time_taken_in_seconds: 41
Epoch [1/1], Step [11251/13804], Loss: 2.5792, Perplexity: 13.1871, time_taken_in_seconds: 41
Epoch [1/1], Step [11252/13804], Loss: 2.8954, Perplexity: 18.0908, time_taken_in_seconds: 42
Epoch [1/1], Step [11253/13804], Loss: 2.4522, Perplexity: 11.6139, time_taken_in_seconds: 43
Epoch [1/1], Step [11254/13804], Loss: 2.2866, Perplexity: 9.8411, time_taken_in_seconds: 44
Epoch [1/1], Step [11255/13804], Loss: 2.3881, Perplexity: 10.8930, time_taken_in_seconds: 45
Epoch [1/1], Step [11256/13804], Loss: 2.5194, Perplexity: 12.4209, time_taken_in_seconds: 45
Epoch [1/1], Step [11257/13804], Loss: 2.2630, Perplexity: 9.6116, time_taken_in_seconds: 46
Epoch [1/1], Step [11258/13804], Loss: 2.6503, Perplexity: 14.1581, time_taken_in_seconds: 47
Epoch [1/1], Step [11259/13804], Loss: 2.4195, Perplexity: 11.2400, time_taken_in_seconds: 48
Epoch [1/1], Step [11260/13804], Loss: 2.8644, Perplexity: 17.5377, time_taken_in_seconds: 49
Epoch [1/1], Step [11261/13804], Loss: 2.3540, Perplexity: 10.5273, time_taken_in_seconds: 50
Epoch [1/1], Step [11262/13804], Loss: 2.7125, Perplexity: 15.0671, time_taken_in_seconds: 50
Epoch [1/1], Step [11263/13804], Loss: 2.6261, Perplexity: 13.8198, time_taken_in_seconds: 51
Epoch [1/1], Step [11264/13804], Loss: 2.3241, Perplexity: 10.2171, time_taken_in_seconds: 52
Epoch [1/1], Step [11265/13804], Loss: 3.0097, Perplexity: 20.2822, time_taken_in_seconds: 53
Epoch [1/1], Step [11266/13804], Loss: 3.0776, Perplexity: 21.7057, time_taken_in_seconds: 54
Epoch [1/1], Step [11267/13804], Loss: 2.6901, Perplexity: 14.7335, time_taken_in_seconds: 54
Epoch [1/1], Step [11268/13804], Loss: 2.8697, Perplexity: 17.6309, time_taken_in_seconds: 55
Epoch [1/1], Step [11269/13804], Loss: 2.4751, Perplexity: 11.8826, time_taken_in_seconds: 56
Epoch [1/1], Step [11270/13804], Loss: 2.4788, Perplexity: 11.9268, time_taken_in_seconds: 57
Epoch [1/1], Step [11271/13804], Loss: 2.2025, Perplexity: 9.0474, time_taken_in_seconds: 58
Epoch [1/1], Step [11272/13804], Loss: 3.1056, Perplexity: 22.3220, time_taken_in_seconds: 59
Epoch [1/1], Step [11273/13804], Loss: 2.8639, Perplexity: 17.5303, time_taken_in_seconds: 59
Epoch [1/1], Step [11274/13804], Loss: 2.6922, Perplexity: 14.7648, time_taken_in_seconds: 60
Epoch [1/1], Step [11275/13804], Loss: 2.6976, Perplexity: 14.8439, time_taken_in_seconds: 61
Epoch [1/1], Step [11276/13804], Loss: 2.7194, Perplexity: 15.1717, time_taken_in_seconds: 62
Epoch [1/1], Step [11277/13804], Loss: 2.6933, Perplexity: 14.7809, time_taken_in_seconds: 63
Epoch [1/1], Step [11278/13804], Loss: 2.5233, Perplexity: 12.4691, time_taken_in_seconds: 63
Epoch [1/1], Step [11279/13804], Loss: 2.5766, Perplexity: 13.1524, time_taken_in_seconds: 64
Epoch [1/1], Step [11280/13804], Loss: 2.6308, Perplexity: 13.8856, time_taken_in_seconds: 65
Epoch [1/1], Step [11281/13804], Loss: 2.6329, Perplexity: 13.9138, time_taken_in_seconds: 66
Epoch [1/1], Step [11282/13804], Loss: 4.9140, Perplexity: 136.1868, time_taken_in_seconds: 67
Epoch [1/1], Step [11283/13804], Loss: 2.7053, Perplexity: 14.9587, time_taken_in_seconds: 67
Epoch [1/1], Step [11284/13804], Loss: 2.5695, Perplexity: 13.0595, time_taken_in_seconds: 68
Epoch [1/1], Step [11285/13804], Loss: 2.4643, Perplexity: 11.7558, time_taken_in_seconds: 69
Epoch [1/1], Step [11286/13804], Loss: 2.6932, Perplexity: 14.7794, time_taken_in_seconds: 70
Epoch [1/1], Step [11287/13804], Loss: 2.3804, Perplexity: 10.8089, time_taken_in_seconds: 71
Epoch [1/1], Step [11288/13804], Loss: 3.3017, Perplexity: 27.1601, time_taken_in_seconds: 71
Epoch [1/1], Step [11289/13804], Loss: 2.3481, Perplexity: 10.4658, time_taken_in_seconds: 72
Epoch [1/1], Step [11290/13804], Loss: 2.7382, Perplexity: 15.4586, time_taken_in_seconds: 73
Epoch [1/1], Step [11291/13804], Loss: 2.4770, Perplexity: 11.9055, time_taken_in_seconds: 74
Epoch [1/1], Step [11292/13804], Loss: 2.6205, Perplexity: 13.7422, time_taken_in_seconds: 75
Epoch [1/1], Step [11293/13804], Loss: 2.6659, Perplexity: 14.3812, time_taken_in_seconds: 76
Epoch [1/1], Step [11294/13804], Loss: 2.7697, Perplexity: 15.9536, time_taken_in_seconds: 76
Epoch [1/1], Step [11295/13804], Loss: 2.4877, Perplexity: 12.0332, time_taken_in_seconds: 77
Epoch [1/1], Step [11296/13804], Loss: 2.3325, Perplexity: 10.3035, time_taken_in_seconds: 78
Epoch [1/1], Step [11297/13804], Loss: 2.4908, Perplexity: 12.0712, time_taken_in_seconds: 79
Epoch [1/1], Step [11298/13804], Loss: 3.0491, Perplexity: 21.0955, time_taken_in_seconds: 80
Epoch [1/1], Step [11299/13804], Loss: 2.7885, Perplexity: 16.2573, time_taken_in_seconds: 80
Epoch [1/1], Step [11300/13804], Loss: 2.7137, Perplexity: 15.0844, time_taken_in_seconds: 81
Epoch [1/1], Step [11301/13804], Loss: 2.2505, Perplexity: 9.4928, time_taken_in_seconds: 0
Epoch [1/1], Step [11302/13804], Loss: 2.5355, Perplexity: 12.6225, time_taken_in_seconds: 1
Epoch [1/1], Step [11303/13804], Loss: 2.2115, Perplexity: 9.1291, time_taken_in_seconds: 2
Epoch [1/1], Step [11304/13804], Loss: 2.6589, Perplexity: 14.2806, time_taken_in_seconds: 3
Epoch [1/1], Step [11305/13804], Loss: 2.5797, Perplexity: 13.1928, time_taken_in_seconds: 4
Epoch [1/1], Step [11306/13804], Loss: 2.8174, Perplexity: 16.7331, time_taken_in_seconds: 5
Epoch [1/1], Step [11307/13804], Loss: 2.6568, Perplexity: 14.2509, time_taken_in_seconds: 5
Epoch [1/1], Step [11308/13804], Loss: 2.1314, Perplexity: 8.4270, time_taken_in_seconds: 6
Epoch [1/1], Step [11309/13804], Loss: 2.6296, Perplexity: 13.8680, time_taken_in_seconds: 7
Epoch [1/1], Step [11310/13804], Loss: 2.5560, Perplexity: 12.8838, time_taken_in_seconds: 8
Epoch [1/1], Step [11311/13804], Loss: 2.5075, Perplexity: 12.2746, time_taken_in_seconds: 9
Epoch [1/1], Step [11312/13804], Loss: 2.4132, Perplexity: 11.1694, time_taken_in_seconds: 9
Epoch [1/1], Step [11313/13804], Loss: 2.4188, Perplexity: 11.2328, time_taken_in_seconds: 10
Epoch [1/1], Step [11314/13804], Loss: 2.5872, Perplexity: 13.2928, time_taken_in_seconds: 11
Epoch [1/1], Step [11315/13804], Loss: 2.3751, Perplexity: 10.7517, time_taken_in_seconds: 12
Epoch [1/1], Step [11316/13804], Loss: 2.5144, Perplexity: 12.3595, time_taken_in_seconds: 13
Epoch [1/1], Step [11317/13804], Loss: 2.7669, Perplexity: 15.9095, time_taken_in_seconds: 13
Epoch [1/1], Step [11318/13804], Loss: 2.7358, Perplexity: 15.4217, time_taken_in_seconds: 14
Epoch [1/1], Step [11319/13804], Loss: 2.5935, Perplexity: 13.3763, time_taken_in_seconds: 15
Epoch [1/1], Step [11320/13804], Loss: 2.3921, Perplexity: 10.9365, time_taken_in_seconds: 16
Epoch [1/1], Step [11321/13804], Loss: 2.6616, Perplexity: 14.3197, time_taken_in_seconds: 17
Epoch [1/1], Step [11322/13804], Loss: 2.4252, Perplexity: 11.3050, time_taken_in_seconds: 17
Epoch [1/1], Step [11323/13804], Loss: 2.9751, Perplexity: 19.5912, time_taken_in_seconds: 18
Epoch [1/1], Step [11324/13804], Loss: 2.4620, Perplexity: 11.7283, time_taken_in_seconds: 19
Epoch [1/1], Step [11325/13804], Loss: 2.5094, Perplexity: 12.2972, time_taken_in_seconds: 20
Epoch [1/1], Step [11326/13804], Loss: 2.9392, Perplexity: 18.9007, time_taken_in_seconds: 21
Epoch [1/1], Step [11327/13804], Loss: 2.1321, Perplexity: 8.4323, time_taken_in_seconds: 22
Epoch [1/1], Step [11328/13804], Loss: 2.5032, Perplexity: 12.2214, time_taken_in_seconds: 22
Epoch [1/1], Step [11329/13804], Loss: 2.5038, Perplexity: 12.2291, time_taken_in_seconds: 23
Epoch [1/1], Step [11330/13804], Loss: 2.1152, Perplexity: 8.2916, time_taken_in_seconds: 24
Epoch [1/1], Step [11331/13804], Loss: 2.6145, Perplexity: 13.6610, time_taken_in_seconds: 25
Epoch [1/1], Step [11332/13804], Loss: 2.8759, Perplexity: 17.7409, time_taken_in_seconds: 26
Epoch [1/1], Step [11333/13804], Loss: 2.7447, Perplexity: 15.5597, time_taken_in_seconds: 26
Epoch [1/1], Step [11334/13804], Loss: 2.7411, Perplexity: 15.5033, time_taken_in_seconds: 27
Epoch [1/1], Step [11335/13804], Loss: 2.5574, Perplexity: 12.9029, time_taken_in_seconds: 28
Epoch [1/1], Step [11336/13804], Loss: 3.3795, Perplexity: 29.3557, time_taken_in_seconds: 29
Epoch [1/1], Step [11337/13804], Loss: 2.6004, Perplexity: 13.4692, time_taken_in_seconds: 30
Epoch [1/1], Step [11338/13804], Loss: 2.9665, Perplexity: 19.4236, time_taken_in_seconds: 30
Epoch [1/1], Step [11339/13804], Loss: 2.3722, Perplexity: 10.7211, time_taken_in_seconds: 31
Epoch [1/1], Step [11340/13804], Loss: 2.2607, Perplexity: 9.5901, time_taken_in_seconds: 32
Epoch [1/1], Step [11341/13804], Loss: 2.6601, Perplexity: 14.2974, time_taken_in_seconds: 33
Epoch [1/1], Step [11342/13804], Loss: 2.4366, Perplexity: 11.4341, time_taken_in_seconds: 34
Epoch [1/1], Step [11343/13804], Loss: 2.1426, Perplexity: 8.5215, time_taken_in_seconds: 34
Epoch [1/1], Step [11344/13804], Loss: 2.9420, Perplexity: 18.9540, time_taken_in_seconds: 35
Epoch [1/1], Step [11345/13804], Loss: 2.5032, Perplexity: 12.2209, time_taken_in_seconds: 36
Epoch [1/1], Step [11346/13804], Loss: 2.1127, Perplexity: 8.2709, time_taken_in_seconds: 37
Epoch [1/1], Step [11347/13804], Loss: 3.4647, Perplexity: 31.9658, time_taken_in_seconds: 38
Epoch [1/1], Step [11348/13804], Loss: 2.5443, Perplexity: 12.7348, time_taken_in_seconds: 38
Epoch [1/1], Step [11349/13804], Loss: 3.1053, Perplexity: 22.3157, time_taken_in_seconds: 39
Epoch [1/1], Step [11350/13804], Loss: 2.3098, Perplexity: 10.0726, time_taken_in_seconds: 40
Epoch [1/1], Step [11351/13804], Loss: 2.9683, Perplexity: 19.4588, time_taken_in_seconds: 41
Epoch [1/1], Step [11352/13804], Loss: 2.5505, Perplexity: 12.8130, time_taken_in_seconds: 42
Epoch [1/1], Step [11353/13804], Loss: 2.7028, Perplexity: 14.9219, time_taken_in_seconds: 43
Epoch [1/1], Step [11354/13804], Loss: 2.5501, Perplexity: 12.8078, time_taken_in_seconds: 43
Epoch [1/1], Step [11355/13804], Loss: 2.5213, Perplexity: 12.4448, time_taken_in_seconds: 44
Epoch [1/1], Step [11356/13804], Loss: 2.4465, Perplexity: 11.5483, time_taken_in_seconds: 45
Epoch [1/1], Step [11357/13804], Loss: 2.5773, Perplexity: 13.1610, time_taken_in_seconds: 46
Epoch [1/1], Step [11358/13804], Loss: 2.4077, Perplexity: 11.1088, time_taken_in_seconds: 47
Epoch [1/1], Step [11359/13804], Loss: 2.5643, Perplexity: 12.9913, time_taken_in_seconds: 47
Epoch [1/1], Step [11360/13804], Loss: 2.2337, Perplexity: 9.3342, time_taken_in_seconds: 48
Epoch [1/1], Step [11361/13804], Loss: 2.2583, Perplexity: 9.5664, time_taken_in_seconds: 49
Epoch [1/1], Step [11362/13804], Loss: 2.5119, Perplexity: 12.3287, time_taken_in_seconds: 50
Epoch [1/1], Step [11363/13804], Loss: 3.0885, Perplexity: 21.9447, time_taken_in_seconds: 51
Epoch [1/1], Step [11364/13804], Loss: 2.4579, Perplexity: 11.6800, time_taken_in_seconds: 51
Epoch [1/1], Step [11365/13804], Loss: 2.5530, Perplexity: 12.8457, time_taken_in_seconds: 52
Epoch [1/1], Step [11366/13804], Loss: 2.6690, Perplexity: 14.4258, time_taken_in_seconds: 53
Epoch [1/1], Step [11367/13804], Loss: 2.8092, Perplexity: 16.5969, time_taken_in_seconds: 54
Epoch [1/1], Step [11368/13804], Loss: 2.6617, Perplexity: 14.3203, time_taken_in_seconds: 55
Epoch [1/1], Step [11369/13804], Loss: 2.6859, Perplexity: 14.6719, time_taken_in_seconds: 55
Epoch [1/1], Step [11370/13804], Loss: 2.3612, Perplexity: 10.6037, time_taken_in_seconds: 56
Epoch [1/1], Step [11371/13804], Loss: 2.7300, Perplexity: 15.3336, time_taken_in_seconds: 57
Epoch [1/1], Step [11372/13804], Loss: 2.3065, Perplexity: 10.0388, time_taken_in_seconds: 58
Epoch [1/1], Step [11373/13804], Loss: 2.6299, Perplexity: 13.8724, time_taken_in_seconds: 59
Epoch [1/1], Step [11374/13804], Loss: 2.4621, Perplexity: 11.7297, time_taken_in_seconds: 59
Epoch [1/1], Step [11375/13804], Loss: 2.5263, Perplexity: 12.5077, time_taken_in_seconds: 60
Epoch [1/1], Step [11376/13804], Loss: 2.8643, Perplexity: 17.5359, time_taken_in_seconds: 61
Epoch [1/1], Step [11377/13804], Loss: 2.2496, Perplexity: 9.4839, time_taken_in_seconds: 62
Epoch [1/1], Step [11378/13804], Loss: 2.4577, Perplexity: 11.6777, time_taken_in_seconds: 63
Epoch [1/1], Step [11379/13804], Loss: 2.4202, Perplexity: 11.2486, time_taken_in_seconds: 63
Epoch [1/1], Step [11380/13804], Loss: 2.3163, Perplexity: 10.1380, time_taken_in_seconds: 64
Epoch [1/1], Step [11381/13804], Loss: 2.6370, Perplexity: 13.9711, time_taken_in_seconds: 65
Epoch [1/1], Step [11382/13804], Loss: 2.7942, Perplexity: 16.3489, time_taken_in_seconds: 66
Epoch [1/1], Step [11383/13804], Loss: 2.9675, Perplexity: 19.4432, time_taken_in_seconds: 67
Epoch [1/1], Step [11384/13804], Loss: 2.6400, Perplexity: 14.0133, time_taken_in_seconds: 68
Epoch [1/1], Step [11385/13804], Loss: 2.5474, Perplexity: 12.7742, time_taken_in_seconds: 68
Epoch [1/1], Step [11386/13804], Loss: 2.3462, Perplexity: 10.4454, time_taken_in_seconds: 69
Epoch [1/1], Step [11387/13804], Loss: 2.3157, Perplexity: 10.1322, time_taken_in_seconds: 70
Epoch [1/1], Step [11388/13804], Loss: 2.5038, Perplexity: 12.2286, time_taken_in_seconds: 71
Epoch [1/1], Step [11389/13804], Loss: 2.4845, Perplexity: 11.9950, time_taken_in_seconds: 72
Epoch [1/1], Step [11390/13804], Loss: 2.6904, Perplexity: 14.7378, time_taken_in_seconds: 73
Epoch [1/1], Step [11391/13804], Loss: 2.3090, Perplexity: 10.0646, time_taken_in_seconds: 73
Epoch [1/1], Step [11392/13804], Loss: 2.4428, Perplexity: 11.5054, time_taken_in_seconds: 74
Epoch [1/1], Step [11393/13804], Loss: 2.5544, Perplexity: 12.8634, time_taken_in_seconds: 75
Epoch [1/1], Step [11394/13804], Loss: 2.6847, Perplexity: 14.6543, time_taken_in_seconds: 76
Epoch [1/1], Step [11395/13804], Loss: 3.1578, Perplexity: 23.5192, time_taken_in_seconds: 77
Epoch [1/1], Step [11396/13804], Loss: 2.4232, Perplexity: 11.2814, time_taken_in_seconds: 77
Epoch [1/1], Step [11397/13804], Loss: 2.5519, Perplexity: 12.8310, time_taken_in_seconds: 78
Epoch [1/1], Step [11398/13804], Loss: 2.3645, Perplexity: 10.6384, time_taken_in_seconds: 79
Epoch [1/1], Step [11399/13804], Loss: 2.8352, Perplexity: 17.0330, time_taken_in_seconds: 80
Epoch [1/1], Step [11400/13804], Loss: 2.3631, Perplexity: 10.6233, time_taken_in_seconds: 81
Epoch [1/1], Step [11401/13804], Loss: 2.6130, Perplexity: 13.6401, time_taken_in_seconds: 0
Epoch [1/1], Step [11402/13804], Loss: 2.4643, Perplexity: 11.7556, time_taken_in_seconds: 1
Epoch [1/1], Step [11403/13804], Loss: 2.5198, Perplexity: 12.4264, time_taken_in_seconds: 2
Epoch [1/1], Step [11404/13804], Loss: 2.8715, Perplexity: 17.6635, time_taken_in_seconds: 3
Epoch [1/1], Step [11405/13804], Loss: 2.4916, Perplexity: 12.0806, time_taken_in_seconds: 4
Epoch [1/1], Step [11406/13804], Loss: 2.7203, Perplexity: 15.1843, time_taken_in_seconds: 4
Epoch [1/1], Step [11407/13804], Loss: 2.4839, Perplexity: 11.9878, time_taken_in_seconds: 5
Epoch [1/1], Step [11408/13804], Loss: 2.4720, Perplexity: 11.8465, time_taken_in_seconds: 6
Epoch [1/1], Step [11409/13804], Loss: 2.5546, Perplexity: 12.8668, time_taken_in_seconds: 7
Epoch [1/1], Step [11410/13804], Loss: 2.3023, Perplexity: 9.9968, time_taken_in_seconds: 8
Epoch [1/1], Step [11411/13804], Loss: 2.2673, Perplexity: 9.6531, time_taken_in_seconds: 8
Epoch [1/1], Step [11412/13804], Loss: 2.3174, Perplexity: 10.1492, time_taken_in_seconds: 9
Epoch [1/1], Step [11413/13804], Loss: 2.4126, Perplexity: 11.1635, time_taken_in_seconds: 10
Epoch [1/1], Step [11414/13804], Loss: 2.5505, Perplexity: 12.8138, time_taken_in_seconds: 11
Epoch [1/1], Step [11415/13804], Loss: 2.6803, Perplexity: 14.5893, time_taken_in_seconds: 12
Epoch [1/1], Step [11416/13804], Loss: 2.3522, Perplexity: 10.5082, time_taken_in_seconds: 12
Epoch [1/1], Step [11417/13804], Loss: 2.4750, Perplexity: 11.8821, time_taken_in_seconds: 13
Epoch [1/1], Step [11418/13804], Loss: 3.0168, Perplexity: 20.4260, time_taken_in_seconds: 14
Epoch [1/1], Step [11419/13804], Loss: 2.1598, Perplexity: 8.6695, time_taken_in_seconds: 15
Epoch [1/1], Step [11420/13804], Loss: 2.2608, Perplexity: 9.5910, time_taken_in_seconds: 16
Epoch [1/1], Step [11421/13804], Loss: 2.4581, Perplexity: 11.6821, time_taken_in_seconds: 17
Epoch [1/1], Step [11422/13804], Loss: 2.7079, Perplexity: 14.9984, time_taken_in_seconds: 17
Epoch [1/1], Step [11423/13804], Loss: 2.8814, Perplexity: 17.8391, time_taken_in_seconds: 18
Epoch [1/1], Step [11424/13804], Loss: 2.3493, Perplexity: 10.4778, time_taken_in_seconds: 19
Epoch [1/1], Step [11425/13804], Loss: 2.7204, Perplexity: 15.1864, time_taken_in_seconds: 20
Epoch [1/1], Step [11426/13804], Loss: 2.7357, Perplexity: 15.4208, time_taken_in_seconds: 21
Epoch [1/1], Step [11427/13804], Loss: 2.7331, Perplexity: 15.3811, time_taken_in_seconds: 21
Epoch [1/1], Step [11428/13804], Loss: 2.5965, Perplexity: 13.4162, time_taken_in_seconds: 22
Epoch [1/1], Step [11429/13804], Loss: 2.4407, Perplexity: 11.4812, time_taken_in_seconds: 23
Epoch [1/1], Step [11430/13804], Loss: 2.4416, Perplexity: 11.4909, time_taken_in_seconds: 24
Epoch [1/1], Step [11431/13804], Loss: 2.3385, Perplexity: 10.3655, time_taken_in_seconds: 25
Epoch [1/1], Step [11432/13804], Loss: 2.4438, Perplexity: 11.5170, time_taken_in_seconds: 25
Epoch [1/1], Step [11433/13804], Loss: 2.4071, Perplexity: 11.1023, time_taken_in_seconds: 26
Epoch [1/1], Step [11434/13804], Loss: 2.2408, Perplexity: 9.4004, time_taken_in_seconds: 27
Epoch [1/1], Step [11435/13804], Loss: 2.5912, Perplexity: 13.3454, time_taken_in_seconds: 28
Epoch [1/1], Step [11436/13804], Loss: 2.6087, Perplexity: 13.5808, time_taken_in_seconds: 29
Epoch [1/1], Step [11437/13804], Loss: 2.8689, Perplexity: 17.6172, time_taken_in_seconds: 29
Epoch [1/1], Step [11438/13804], Loss: 2.6546, Perplexity: 14.2188, time_taken_in_seconds: 30
Epoch [1/1], Step [11439/13804], Loss: 2.9806, Perplexity: 19.6992, time_taken_in_seconds: 31
Epoch [1/1], Step [11440/13804], Loss: 2.3963, Perplexity: 10.9828, time_taken_in_seconds: 32
Epoch [1/1], Step [11441/13804], Loss: 2.7569, Perplexity: 15.7511, time_taken_in_seconds: 33
Epoch [1/1], Step [11442/13804], Loss: 2.3677, Perplexity: 10.6733, time_taken_in_seconds: 34
Epoch [1/1], Step [11443/13804], Loss: 2.7180, Perplexity: 15.1501, time_taken_in_seconds: 34
Epoch [1/1], Step [11444/13804], Loss: 2.9153, Perplexity: 18.4547, time_taken_in_seconds: 35
Epoch [1/1], Step [11445/13804], Loss: 2.9522, Perplexity: 19.1480, time_taken_in_seconds: 36
Epoch [1/1], Step [11446/13804], Loss: 2.5268, Perplexity: 12.5139, time_taken_in_seconds: 37
Epoch [1/1], Step [11447/13804], Loss: 2.6228, Perplexity: 13.7743, time_taken_in_seconds: 38
Epoch [1/1], Step [11448/13804], Loss: 2.4376, Perplexity: 11.4450, time_taken_in_seconds: 38
Epoch [1/1], Step [11449/13804], Loss: 2.4026, Perplexity: 11.0521, time_taken_in_seconds: 39
Epoch [1/1], Step [11450/13804], Loss: 3.1676, Perplexity: 23.7493, time_taken_in_seconds: 40
Epoch [1/1], Step [11451/13804], Loss: 2.5591, Perplexity: 12.9243, time_taken_in_seconds: 41
Epoch [1/1], Step [11452/13804], Loss: 2.6842, Perplexity: 14.6471, time_taken_in_seconds: 42
Epoch [1/1], Step [11453/13804], Loss: 2.2961, Perplexity: 9.9358, time_taken_in_seconds: 42
Epoch [1/1], Step [11454/13804], Loss: 3.3498, Perplexity: 28.4979, time_taken_in_seconds: 43
Epoch [1/1], Step [11455/13804], Loss: 2.1320, Perplexity: 8.4316, time_taken_in_seconds: 44
Epoch [1/1], Step [11456/13804], Loss: 2.6850, Perplexity: 14.6579, time_taken_in_seconds: 45
Epoch [1/1], Step [11457/13804], Loss: 2.3352, Perplexity: 10.3319, time_taken_in_seconds: 46
Epoch [1/1], Step [11458/13804], Loss: 2.6809, Perplexity: 14.5983, time_taken_in_seconds: 47
Epoch [1/1], Step [11459/13804], Loss: 2.4104, Perplexity: 11.1387, time_taken_in_seconds: 47
Epoch [1/1], Step [11460/13804], Loss: 2.5544, Perplexity: 12.8636, time_taken_in_seconds: 48
Epoch [1/1], Step [11461/13804], Loss: 2.4585, Perplexity: 11.6868, time_taken_in_seconds: 49
Epoch [1/1], Step [11462/13804], Loss: 2.7988, Perplexity: 16.4247, time_taken_in_seconds: 50
Epoch [1/1], Step [11463/13804], Loss: 2.4045, Perplexity: 11.0731, time_taken_in_seconds: 51
Epoch [1/1], Step [11464/13804], Loss: 2.4890, Perplexity: 12.0496, time_taken_in_seconds: 51
Epoch [1/1], Step [11465/13804], Loss: 2.6103, Perplexity: 13.6027, time_taken_in_seconds: 52
Epoch [1/1], Step [11466/13804], Loss: 2.7831, Perplexity: 16.1696, time_taken_in_seconds: 53
Epoch [1/1], Step [11467/13804], Loss: 2.3957, Perplexity: 10.9761, time_taken_in_seconds: 54
Epoch [1/1], Step [11468/13804], Loss: 2.6366, Perplexity: 13.9651, time_taken_in_seconds: 55
Epoch [1/1], Step [11469/13804], Loss: 2.5819, Perplexity: 13.2229, time_taken_in_seconds: 56
Epoch [1/1], Step [11470/13804], Loss: 2.2810, Perplexity: 9.7864, time_taken_in_seconds: 56
Epoch [1/1], Step [11471/13804], Loss: 2.6852, Perplexity: 14.6609, time_taken_in_seconds: 57
Epoch [1/1], Step [11472/13804], Loss: 2.3866, Perplexity: 10.8760, time_taken_in_seconds: 58
Epoch [1/1], Step [11473/13804], Loss: 2.4706, Perplexity: 11.8301, time_taken_in_seconds: 59
Epoch [1/1], Step [11474/13804], Loss: 2.4987, Perplexity: 12.1667, time_taken_in_seconds: 60
Epoch [1/1], Step [11475/13804], Loss: 2.7440, Perplexity: 15.5489, time_taken_in_seconds: 60
Epoch [1/1], Step [11476/13804], Loss: 2.6543, Perplexity: 14.2156, time_taken_in_seconds: 61
Epoch [1/1], Step [11477/13804], Loss: 2.4150, Perplexity: 11.1901, time_taken_in_seconds: 62
Epoch [1/1], Step [11478/13804], Loss: 2.3248, Perplexity: 10.2249, time_taken_in_seconds: 63
Epoch [1/1], Step [11479/13804], Loss: 2.6166, Perplexity: 13.6887, time_taken_in_seconds: 64
Epoch [1/1], Step [11480/13804], Loss: 3.0439, Perplexity: 20.9863, time_taken_in_seconds: 64
Epoch [1/1], Step [11481/13804], Loss: 2.3964, Perplexity: 10.9834, time_taken_in_seconds: 65
Epoch [1/1], Step [11482/13804], Loss: 2.8569, Perplexity: 17.4076, time_taken_in_seconds: 66
Epoch [1/1], Step [11483/13804], Loss: 3.1172, Perplexity: 22.5834, time_taken_in_seconds: 67
Epoch [1/1], Step [11484/13804], Loss: 2.5076, Perplexity: 12.2754, time_taken_in_seconds: 68
Epoch [1/1], Step [11485/13804], Loss: 2.7127, Perplexity: 15.0697, time_taken_in_seconds: 68
Epoch [1/1], Step [11486/13804], Loss: 2.2110, Perplexity: 9.1247, time_taken_in_seconds: 69
Epoch [1/1], Step [11487/13804], Loss: 2.5498, Perplexity: 12.8045, time_taken_in_seconds: 70
Epoch [1/1], Step [11488/13804], Loss: 2.6950, Perplexity: 14.8057, time_taken_in_seconds: 71
Epoch [1/1], Step [11489/13804], Loss: 2.4134, Perplexity: 11.1720, time_taken_in_seconds: 72
Epoch [1/1], Step [11490/13804], Loss: 2.2732, Perplexity: 9.7108, time_taken_in_seconds: 72
Epoch [1/1], Step [11491/13804], Loss: 2.3029, Perplexity: 10.0032, time_taken_in_seconds: 73
Epoch [1/1], Step [11492/13804], Loss: 2.4867, Perplexity: 12.0220, time_taken_in_seconds: 74
Epoch [1/1], Step [11493/13804], Loss: 2.2799, Perplexity: 9.7762, time_taken_in_seconds: 75
Epoch [1/1], Step [11494/13804], Loss: 2.5033, Perplexity: 12.2227, time_taken_in_seconds: 76
Epoch [1/1], Step [11495/13804], Loss: 2.2973, Perplexity: 9.9470, time_taken_in_seconds: 76
Epoch [1/1], Step [11496/13804], Loss: 2.4795, Perplexity: 11.9357, time_taken_in_seconds: 77
Epoch [1/1], Step [11497/13804], Loss: 2.4940, Perplexity: 12.1098, time_taken_in_seconds: 78
Epoch [1/1], Step [11498/13804], Loss: 2.1341, Perplexity: 8.4496, time_taken_in_seconds: 79
Epoch [1/1], Step [11499/13804], Loss: 2.8505, Perplexity: 17.2969, time_taken_in_seconds: 80
Epoch [1/1], Step [11500/13804], Loss: 2.6260, Perplexity: 13.8182, time_taken_in_seconds: 81
Epoch [1/1], Step [11501/13804], Loss: 2.6540, Perplexity: 14.2109, time_taken_in_seconds: 0
Epoch [1/1], Step [11502/13804], Loss: 3.2049, Perplexity: 24.6526, time_taken_in_seconds: 1
Epoch [1/1], Step [11503/13804], Loss: 2.6919, Perplexity: 14.7604, time_taken_in_seconds: 2
Epoch [1/1], Step [11504/13804], Loss: 2.1350, Perplexity: 8.4568, time_taken_in_seconds: 3
Epoch [1/1], Step [11505/13804], Loss: 2.7198, Perplexity: 15.1779, time_taken_in_seconds: 4
Epoch [1/1], Step [11506/13804], Loss: 2.4254, Perplexity: 11.3063, time_taken_in_seconds: 4
Epoch [1/1], Step [11507/13804], Loss: 2.8123, Perplexity: 16.6475, time_taken_in_seconds: 5
Epoch [1/1], Step [11508/13804], Loss: 2.6970, Perplexity: 14.8349, time_taken_in_seconds: 6
Epoch [1/1], Step [11509/13804], Loss: 2.2509, Perplexity: 9.4962, time_taken_in_seconds: 7
Epoch [1/1], Step [11510/13804], Loss: 2.1120, Perplexity: 8.2644, time_taken_in_seconds: 8
Epoch [1/1], Step [11511/13804], Loss: 2.5551, Perplexity: 12.8720, time_taken_in_seconds: 8
Epoch [1/1], Step [11512/13804], Loss: 2.4027, Perplexity: 11.0533, time_taken_in_seconds: 9
Epoch [1/1], Step [11513/13804], Loss: 2.7587, Perplexity: 15.7793, time_taken_in_seconds: 10
Epoch [1/1], Step [11514/13804], Loss: 2.7972, Perplexity: 16.3993, time_taken_in_seconds: 11
Epoch [1/1], Step [11515/13804], Loss: 2.6914, Perplexity: 14.7522, time_taken_in_seconds: 12
Epoch [1/1], Step [11516/13804], Loss: 2.7014, Perplexity: 14.9008, time_taken_in_seconds: 12
Epoch [1/1], Step [11517/13804], Loss: 2.3955, Perplexity: 10.9738, time_taken_in_seconds: 13
Epoch [1/1], Step [11518/13804], Loss: 2.5052, Perplexity: 12.2460, time_taken_in_seconds: 14
Epoch [1/1], Step [11519/13804], Loss: 2.6811, Perplexity: 14.6009, time_taken_in_seconds: 15
Epoch [1/1], Step [11520/13804], Loss: 2.4255, Perplexity: 11.3078, time_taken_in_seconds: 16
Epoch [1/1], Step [11521/13804], Loss: 2.3436, Perplexity: 10.4187, time_taken_in_seconds: 16
Epoch [1/1], Step [11522/13804], Loss: 2.4132, Perplexity: 11.1698, time_taken_in_seconds: 17
Epoch [1/1], Step [11523/13804], Loss: 2.7210, Perplexity: 15.1958, time_taken_in_seconds: 18
Epoch [1/1], Step [11524/13804], Loss: 2.5583, Perplexity: 12.9135, time_taken_in_seconds: 19
Epoch [1/1], Step [11525/13804], Loss: 2.7822, Perplexity: 16.1545, time_taken_in_seconds: 20
Epoch [1/1], Step [11526/13804], Loss: 2.4798, Perplexity: 11.9386, time_taken_in_seconds: 20
Epoch [1/1], Step [11527/13804], Loss: 2.5353, Perplexity: 12.6206, time_taken_in_seconds: 21
Epoch [1/1], Step [11528/13804], Loss: 2.6886, Perplexity: 14.7106, time_taken_in_seconds: 22
Epoch [1/1], Step [11529/13804], Loss: 2.7281, Perplexity: 15.3032, time_taken_in_seconds: 23
Epoch [1/1], Step [11530/13804], Loss: 2.6382, Perplexity: 13.9883, time_taken_in_seconds: 24
Epoch [1/1], Step [11531/13804], Loss: 2.4665, Perplexity: 11.7806, time_taken_in_seconds: 25
Epoch [1/1], Step [11532/13804], Loss: 2.1907, Perplexity: 8.9417, time_taken_in_seconds: 26
Epoch [1/1], Step [11533/13804], Loss: 2.8078, Perplexity: 16.5730, time_taken_in_seconds: 26
Epoch [1/1], Step [11534/13804], Loss: 2.6190, Perplexity: 13.7220, time_taken_in_seconds: 27
Epoch [1/1], Step [11535/13804], Loss: 2.4125, Perplexity: 11.1621, time_taken_in_seconds: 28
Epoch [1/1], Step [11536/13804], Loss: 2.9038, Perplexity: 18.2440, time_taken_in_seconds: 29
Epoch [1/1], Step [11537/13804], Loss: 2.2048, Perplexity: 9.0680, time_taken_in_seconds: 30
Epoch [1/1], Step [11538/13804], Loss: 2.9445, Perplexity: 19.0014, time_taken_in_seconds: 30
Epoch [1/1], Step [11539/13804], Loss: 2.4372, Perplexity: 11.4415, time_taken_in_seconds: 31
Epoch [1/1], Step [11540/13804], Loss: 2.5324, Perplexity: 12.5839, time_taken_in_seconds: 32
Epoch [1/1], Step [11541/13804], Loss: 2.6295, Perplexity: 13.8672, time_taken_in_seconds: 33
Epoch [1/1], Step [11542/13804], Loss: 2.3918, Perplexity: 10.9331, time_taken_in_seconds: 34
Epoch [1/1], Step [11543/13804], Loss: 2.8356, Perplexity: 17.0401, time_taken_in_seconds: 34
Epoch [1/1], Step [11544/13804], Loss: 2.6885, Perplexity: 14.7099, time_taken_in_seconds: 35
Epoch [1/1], Step [11545/13804], Loss: 2.8872, Perplexity: 17.9426, time_taken_in_seconds: 36
Epoch [1/1], Step [11546/13804], Loss: 2.7988, Perplexity: 16.4247, time_taken_in_seconds: 37
Epoch [1/1], Step [11547/13804], Loss: 2.7039, Perplexity: 14.9375, time_taken_in_seconds: 38
Epoch [1/1], Step [11548/13804], Loss: 2.5679, Perplexity: 13.0390, time_taken_in_seconds: 39
Epoch [1/1], Step [11549/13804], Loss: 2.4315, Perplexity: 11.3758, time_taken_in_seconds: 39
Epoch [1/1], Step [11550/13804], Loss: 2.5003, Perplexity: 12.1860, time_taken_in_seconds: 40
Epoch [1/1], Step [11551/13804], Loss: 2.5923, Perplexity: 13.3600, time_taken_in_seconds: 41
Epoch [1/1], Step [11552/13804], Loss: 2.3906, Perplexity: 10.9203, time_taken_in_seconds: 42
Epoch [1/1], Step [11553/13804], Loss: 2.7566, Perplexity: 15.7468, time_taken_in_seconds: 43
Epoch [1/1], Step [11554/13804], Loss: 2.5405, Perplexity: 12.6862, time_taken_in_seconds: 43
Epoch [1/1], Step [11555/13804], Loss: 2.5293, Perplexity: 12.5443, time_taken_in_seconds: 44
Epoch [1/1], Step [11556/13804], Loss: 2.2044, Perplexity: 9.0648, time_taken_in_seconds: 45
Epoch [1/1], Step [11557/13804], Loss: 3.8488, Perplexity: 46.9359, time_taken_in_seconds: 46
Epoch [1/1], Step [11558/13804], Loss: 2.6483, Perplexity: 14.1301, time_taken_in_seconds: 47
Epoch [1/1], Step [11559/13804], Loss: 2.5362, Perplexity: 12.6320, time_taken_in_seconds: 47
Epoch [1/1], Step [11560/13804], Loss: 2.3806, Perplexity: 10.8112, time_taken_in_seconds: 48
Epoch [1/1], Step [11561/13804], Loss: 2.3705, Perplexity: 10.7028, time_taken_in_seconds: 49
Epoch [1/1], Step [11562/13804], Loss: 2.5943, Perplexity: 13.3873, time_taken_in_seconds: 50
Epoch [1/1], Step [11563/13804], Loss: 2.7930, Perplexity: 16.3295, time_taken_in_seconds: 51
Epoch [1/1], Step [11564/13804], Loss: 3.0876, Perplexity: 21.9239, time_taken_in_seconds: 52
Epoch [1/1], Step [11565/13804], Loss: 2.3231, Perplexity: 10.2071, time_taken_in_seconds: 52
Epoch [1/1], Step [11566/13804], Loss: 2.6942, Perplexity: 14.7930, time_taken_in_seconds: 53
Epoch [1/1], Step [11567/13804], Loss: 2.6456, Perplexity: 14.0924, time_taken_in_seconds: 54
Epoch [1/1], Step [11568/13804], Loss: 2.6442, Perplexity: 14.0721, time_taken_in_seconds: 55
Epoch [1/1], Step [11569/13804], Loss: 2.4228, Perplexity: 11.2773, time_taken_in_seconds: 56
Epoch [1/1], Step [11570/13804], Loss: 2.3572, Perplexity: 10.5611, time_taken_in_seconds: 56
Epoch [1/1], Step [11571/13804], Loss: 2.3435, Perplexity: 10.4178, time_taken_in_seconds: 57
Epoch [1/1], Step [11572/13804], Loss: 2.3574, Perplexity: 10.5630, time_taken_in_seconds: 58
Epoch [1/1], Step [11573/13804], Loss: 2.7938, Perplexity: 16.3423, time_taken_in_seconds: 59
Epoch [1/1], Step [11574/13804], Loss: 2.2373, Perplexity: 9.3678, time_taken_in_seconds: 60
Epoch [1/1], Step [11575/13804], Loss: 2.7379, Perplexity: 15.4542, time_taken_in_seconds: 60
Epoch [1/1], Step [11576/13804], Loss: 2.5365, Perplexity: 12.6358, time_taken_in_seconds: 61
Epoch [1/1], Step [11577/13804], Loss: 2.2879, Perplexity: 9.8543, time_taken_in_seconds: 62
Epoch [1/1], Step [11578/13804], Loss: 2.5447, Perplexity: 12.7389, time_taken_in_seconds: 63
Epoch [1/1], Step [11579/13804], Loss: 2.6028, Perplexity: 13.5009, time_taken_in_seconds: 64
Epoch [1/1], Step [11580/13804], Loss: 2.6372, Perplexity: 13.9738, time_taken_in_seconds: 65
Epoch [1/1], Step [11581/13804], Loss: 2.3863, Perplexity: 10.8732, time_taken_in_seconds: 65
Epoch [1/1], Step [11582/13804], Loss: 2.4256, Perplexity: 11.3094, time_taken_in_seconds: 66
Epoch [1/1], Step [11583/13804], Loss: 2.6819, Perplexity: 14.6135, time_taken_in_seconds: 67
Epoch [1/1], Step [11584/13804], Loss: 2.7989, Perplexity: 16.4272, time_taken_in_seconds: 68
Epoch [1/1], Step [11585/13804], Loss: 2.2686, Perplexity: 9.6660, time_taken_in_seconds: 69
Epoch [1/1], Step [11586/13804], Loss: 2.8583, Perplexity: 17.4324, time_taken_in_seconds: 69
Epoch [1/1], Step [11587/13804], Loss: 3.6661, Perplexity: 39.0976, time_taken_in_seconds: 70
Epoch [1/1], Step [11588/13804], Loss: 2.6267, Perplexity: 13.8279, time_taken_in_seconds: 71
Epoch [1/1], Step [11589/13804], Loss: 2.4901, Perplexity: 12.0621, time_taken_in_seconds: 72
Epoch [1/1], Step [11590/13804], Loss: 2.1955, Perplexity: 8.9844, time_taken_in_seconds: 73
Epoch [1/1], Step [11591/13804], Loss: 2.6838, Perplexity: 14.6402, time_taken_in_seconds: 73
Epoch [1/1], Step [11592/13804], Loss: 2.5711, Perplexity: 13.0802, time_taken_in_seconds: 74
Epoch [1/1], Step [11593/13804], Loss: 2.6164, Perplexity: 13.6867, time_taken_in_seconds: 75
Epoch [1/1], Step [11594/13804], Loss: 2.5367, Perplexity: 12.6379, time_taken_in_seconds: 76
Epoch [1/1], Step [11595/13804], Loss: 2.9411, Perplexity: 18.9370, time_taken_in_seconds: 77
Epoch [1/1], Step [11596/13804], Loss: 2.7053, Perplexity: 14.9592, time_taken_in_seconds: 77
Epoch [1/1], Step [11597/13804], Loss: 3.2749, Perplexity: 26.4395, time_taken_in_seconds: 78
Epoch [1/1], Step [11598/13804], Loss: 2.4115, Perplexity: 11.1509, time_taken_in_seconds: 79
Epoch [1/1], Step [11599/13804], Loss: 2.3607, Perplexity: 10.5980, time_taken_in_seconds: 80
Epoch [1/1], Step [11600/13804], Loss: 2.9516, Perplexity: 19.1374, time_taken_in_seconds: 81
Epoch [1/1], Step [11601/13804], Loss: 2.4614, Perplexity: 11.7214, time_taken_in_seconds: 0
Epoch [1/1], Step [11602/13804], Loss: 2.4686, Perplexity: 11.8065, time_taken_in_seconds: 1
Epoch [1/1], Step [11603/13804], Loss: 2.6911, Perplexity: 14.7474, time_taken_in_seconds: 2
Epoch [1/1], Step [11604/13804], Loss: 2.4808, Perplexity: 11.9507, time_taken_in_seconds: 3
Epoch [1/1], Step [11605/13804], Loss: 2.4056, Perplexity: 11.0851, time_taken_in_seconds: 4
Epoch [1/1], Step [11606/13804], Loss: 2.4229, Perplexity: 11.2783, time_taken_in_seconds: 5
Epoch [1/1], Step [11607/13804], Loss: 2.4300, Perplexity: 11.3593, time_taken_in_seconds: 5
Epoch [1/1], Step [11608/13804], Loss: 4.3940, Perplexity: 80.9649, time_taken_in_seconds: 6
Epoch [1/1], Step [11609/13804], Loss: 2.3290, Perplexity: 10.2675, time_taken_in_seconds: 7
Epoch [1/1], Step [11610/13804], Loss: 2.6656, Perplexity: 14.3761, time_taken_in_seconds: 8
Epoch [1/1], Step [11611/13804], Loss: 3.1290, Perplexity: 22.8522, time_taken_in_seconds: 9
Epoch [1/1], Step [11612/13804], Loss: 2.4216, Perplexity: 11.2641, time_taken_in_seconds: 9
Epoch [1/1], Step [11613/13804], Loss: 2.2383, Perplexity: 9.3778, time_taken_in_seconds: 10
Epoch [1/1], Step [11614/13804], Loss: 2.3943, Perplexity: 10.9607, time_taken_in_seconds: 11
Epoch [1/1], Step [11615/13804], Loss: 2.6143, Perplexity: 13.6573, time_taken_in_seconds: 12
Epoch [1/1], Step [11616/13804], Loss: 2.3439, Perplexity: 10.4217, time_taken_in_seconds: 13
Epoch [1/1], Step [11617/13804], Loss: 3.2259, Perplexity: 25.1757, time_taken_in_seconds: 13
Epoch [1/1], Step [11618/13804], Loss: 2.2670, Perplexity: 9.6501, time_taken_in_seconds: 14
Epoch [1/1], Step [11619/13804], Loss: 2.4404, Perplexity: 11.4774, time_taken_in_seconds: 15
Epoch [1/1], Step [11620/13804], Loss: 2.4963, Perplexity: 12.1376, time_taken_in_seconds: 16
Epoch [1/1], Step [11621/13804], Loss: 2.4813, Perplexity: 11.9572, time_taken_in_seconds: 17
Epoch [1/1], Step [11622/13804], Loss: 2.4056, Perplexity: 11.0847, time_taken_in_seconds: 18
Epoch [1/1], Step [11623/13804], Loss: 2.4379, Perplexity: 11.4485, time_taken_in_seconds: 18
Epoch [1/1], Step [11624/13804], Loss: 2.7278, Perplexity: 15.2993, time_taken_in_seconds: 19
Epoch [1/1], Step [11625/13804], Loss: 2.4658, Perplexity: 11.7724, time_taken_in_seconds: 20
Epoch [1/1], Step [11626/13804], Loss: 2.6944, Perplexity: 14.7968, time_taken_in_seconds: 21
Epoch [1/1], Step [11627/13804], Loss: 2.5612, Perplexity: 12.9509, time_taken_in_seconds: 22
Epoch [1/1], Step [11628/13804], Loss: 2.7786, Perplexity: 16.0958, time_taken_in_seconds: 22
Epoch [1/1], Step [11629/13804], Loss: 2.4468, Perplexity: 11.5511, time_taken_in_seconds: 23
Epoch [1/1], Step [11630/13804], Loss: 2.8038, Perplexity: 16.5075, time_taken_in_seconds: 24
Epoch [1/1], Step [11631/13804], Loss: 2.2368, Perplexity: 9.3629, time_taken_in_seconds: 25
Epoch [1/1], Step [11632/13804], Loss: 2.5633, Perplexity: 12.9783, time_taken_in_seconds: 26
Epoch [1/1], Step [11633/13804], Loss: 2.4769, Perplexity: 11.9040, time_taken_in_seconds: 26
Epoch [1/1], Step [11634/13804], Loss: 2.4181, Perplexity: 11.2249, time_taken_in_seconds: 27
Epoch [1/1], Step [11635/13804], Loss: 2.3534, Perplexity: 10.5210, time_taken_in_seconds: 28
Epoch [1/1], Step [11636/13804], Loss: 2.9838, Perplexity: 19.7623, time_taken_in_seconds: 29
Epoch [1/1], Step [11637/13804], Loss: 2.8072, Perplexity: 16.5628, time_taken_in_seconds: 30
Epoch [1/1], Step [11638/13804], Loss: 2.2304, Perplexity: 9.3033, time_taken_in_seconds: 31
Epoch [1/1], Step [11639/13804], Loss: 2.2235, Perplexity: 9.2394, time_taken_in_seconds: 31
Epoch [1/1], Step [11640/13804], Loss: 2.4999, Perplexity: 12.1812, time_taken_in_seconds: 32
Epoch [1/1], Step [11641/13804], Loss: 2.3817, Perplexity: 10.8236, time_taken_in_seconds: 33
Epoch [1/1], Step [11642/13804], Loss: 2.5100, Perplexity: 12.3053, time_taken_in_seconds: 34
Epoch [1/1], Step [11643/13804], Loss: 2.5628, Perplexity: 12.9726, time_taken_in_seconds: 35
Epoch [1/1], Step [11644/13804], Loss: 2.6603, Perplexity: 14.3003, time_taken_in_seconds: 35
Epoch [1/1], Step [11645/13804], Loss: 2.5630, Perplexity: 12.9745, time_taken_in_seconds: 36
Epoch [1/1], Step [11646/13804], Loss: 2.6036, Perplexity: 13.5124, time_taken_in_seconds: 37
Epoch [1/1], Step [11647/13804], Loss: 2.5256, Perplexity: 12.4985, time_taken_in_seconds: 38
Epoch [1/1], Step [11648/13804], Loss: 2.4298, Perplexity: 11.3562, time_taken_in_seconds: 39
Epoch [1/1], Step [11649/13804], Loss: 2.7440, Perplexity: 15.5485, time_taken_in_seconds: 40
Epoch [1/1], Step [11650/13804], Loss: 2.4432, Perplexity: 11.5098, time_taken_in_seconds: 40
Epoch [1/1], Step [11651/13804], Loss: 2.6306, Perplexity: 13.8824, time_taken_in_seconds: 41
Epoch [1/1], Step [11652/13804], Loss: 2.4998, Perplexity: 12.1797, time_taken_in_seconds: 42
Epoch [1/1], Step [11653/13804], Loss: 2.4570, Perplexity: 11.6693, time_taken_in_seconds: 43
Epoch [1/1], Step [11654/13804], Loss: 2.3080, Perplexity: 10.0543, time_taken_in_seconds: 44
Epoch [1/1], Step [11655/13804], Loss: 2.5795, Perplexity: 13.1899, time_taken_in_seconds: 44
Epoch [1/1], Step [11656/13804], Loss: 2.4454, Perplexity: 11.5346, time_taken_in_seconds: 45
Epoch [1/1], Step [11657/13804], Loss: 2.4949, Perplexity: 12.1203, time_taken_in_seconds: 46
Epoch [1/1], Step [11658/13804], Loss: 2.8149, Perplexity: 16.6916, time_taken_in_seconds: 47
Epoch [1/1], Step [11659/13804], Loss: 2.2599, Perplexity: 9.5820, time_taken_in_seconds: 48
Epoch [1/1], Step [11660/13804], Loss: 2.3301, Perplexity: 10.2785, time_taken_in_seconds: 49
Epoch [1/1], Step [11661/13804], Loss: 2.9557, Perplexity: 19.2161, time_taken_in_seconds: 49
Epoch [1/1], Step [11662/13804], Loss: 2.4484, Perplexity: 11.5703, time_taken_in_seconds: 50
Epoch [1/1], Step [11663/13804], Loss: 2.7995, Perplexity: 16.4363, time_taken_in_seconds: 51
Epoch [1/1], Step [11664/13804], Loss: 2.7238, Perplexity: 15.2383, time_taken_in_seconds: 52
Epoch [1/1], Step [11665/13804], Loss: 2.7211, Perplexity: 15.1963, time_taken_in_seconds: 53
Epoch [1/1], Step [11666/13804], Loss: 2.2240, Perplexity: 9.2444, time_taken_in_seconds: 53
Epoch [1/1], Step [11667/13804], Loss: 2.4415, Perplexity: 11.4907, time_taken_in_seconds: 54
Epoch [1/1], Step [11668/13804], Loss: 2.5024, Perplexity: 12.2114, time_taken_in_seconds: 55
Epoch [1/1], Step [11669/13804], Loss: 2.4960, Perplexity: 12.1339, time_taken_in_seconds: 56
Epoch [1/1], Step [11670/13804], Loss: 2.3034, Perplexity: 10.0085, time_taken_in_seconds: 57
Epoch [1/1], Step [11671/13804], Loss: 2.9068, Perplexity: 18.2985, time_taken_in_seconds: 57
Epoch [1/1], Step [11672/13804], Loss: 2.3848, Perplexity: 10.8568, time_taken_in_seconds: 58
Epoch [1/1], Step [11673/13804], Loss: 3.4989, Perplexity: 33.0804, time_taken_in_seconds: 59
Epoch [1/1], Step [11674/13804], Loss: 2.6387, Perplexity: 13.9943, time_taken_in_seconds: 60
Epoch [1/1], Step [11675/13804], Loss: 2.5709, Perplexity: 13.0780, time_taken_in_seconds: 61
Epoch [1/1], Step [11676/13804], Loss: 2.5449, Perplexity: 12.7420, time_taken_in_seconds: 61
Epoch [1/1], Step [11677/13804], Loss: 2.5477, Perplexity: 12.7771, time_taken_in_seconds: 62
Epoch [1/1], Step [11678/13804], Loss: 2.6926, Perplexity: 14.7700, time_taken_in_seconds: 63
Epoch [1/1], Step [11679/13804], Loss: 3.2575, Perplexity: 25.9848, time_taken_in_seconds: 64
Epoch [1/1], Step [11680/13804], Loss: 2.3925, Perplexity: 10.9408, time_taken_in_seconds: 65
Epoch [1/1], Step [11681/13804], Loss: 2.3001, Perplexity: 9.9748, time_taken_in_seconds: 66
Epoch [1/1], Step [11682/13804], Loss: 2.9132, Perplexity: 18.4163, time_taken_in_seconds: 67
Epoch [1/1], Step [11683/13804], Loss: 2.7346, Perplexity: 15.4034, time_taken_in_seconds: 68
Epoch [1/1], Step [11684/13804], Loss: 2.3976, Perplexity: 10.9964, time_taken_in_seconds: 68
Epoch [1/1], Step [11685/13804], Loss: 2.2420, Perplexity: 9.4122, time_taken_in_seconds: 69
Epoch [1/1], Step [11686/13804], Loss: 2.5137, Perplexity: 12.3509, time_taken_in_seconds: 70
Epoch [1/1], Step [11687/13804], Loss: 2.4151, Perplexity: 11.1907, time_taken_in_seconds: 71
Epoch [1/1], Step [11688/13804], Loss: 2.6109, Perplexity: 13.6107, time_taken_in_seconds: 72
Epoch [1/1], Step [11689/13804], Loss: 2.6418, Perplexity: 14.0391, time_taken_in_seconds: 72
Epoch [1/1], Step [11690/13804], Loss: 2.4256, Perplexity: 11.3089, time_taken_in_seconds: 73
Epoch [1/1], Step [11691/13804], Loss: 2.7972, Perplexity: 16.3983, time_taken_in_seconds: 74
Epoch [1/1], Step [11692/13804], Loss: 2.5297, Perplexity: 12.5501, time_taken_in_seconds: 75
Epoch [1/1], Step [11693/13804], Loss: 2.4647, Perplexity: 11.7604, time_taken_in_seconds: 76
Epoch [1/1], Step [11694/13804], Loss: 2.6351, Perplexity: 13.9444, time_taken_in_seconds: 76
Epoch [1/1], Step [11695/13804], Loss: 2.4810, Perplexity: 11.9534, time_taken_in_seconds: 77
Epoch [1/1], Step [11696/13804], Loss: 2.3683, Perplexity: 10.6788, time_taken_in_seconds: 78
Epoch [1/1], Step [11697/13804], Loss: 2.9158, Perplexity: 18.4633, time_taken_in_seconds: 79
Epoch [1/1], Step [11698/13804], Loss: 2.5160, Perplexity: 12.3792, time_taken_in_seconds: 80
Epoch [1/1], Step [11699/13804], Loss: 2.4970, Perplexity: 12.1459, time_taken_in_seconds: 81
Epoch [1/1], Step [11700/13804], Loss: 2.4489, Perplexity: 11.5757, time_taken_in_seconds: 81
Epoch [1/1], Step [11701/13804], Loss: 2.2341, Perplexity: 9.3377, time_taken_in_seconds: 0
Epoch [1/1], Step [11702/13804], Loss: 2.4138, Perplexity: 11.1762, time_taken_in_seconds: 1
Epoch [1/1], Step [11703/13804], Loss: 2.5673, Perplexity: 13.0306, time_taken_in_seconds: 2
Epoch [1/1], Step [11704/13804], Loss: 2.6737, Perplexity: 14.4933, time_taken_in_seconds: 3
Epoch [1/1], Step [11705/13804], Loss: 2.9384, Perplexity: 18.8860, time_taken_in_seconds: 4
Epoch [1/1], Step [11706/13804], Loss: 2.3938, Perplexity: 10.9553, time_taken_in_seconds: 4
Epoch [1/1], Step [11707/13804], Loss: 2.8053, Perplexity: 16.5318, time_taken_in_seconds: 5
Epoch [1/1], Step [11708/13804], Loss: 2.7910, Perplexity: 16.2971, time_taken_in_seconds: 6
Epoch [1/1], Step [11709/13804], Loss: 2.7110, Perplexity: 15.0437, time_taken_in_seconds: 7
Epoch [1/1], Step [11710/13804], Loss: 2.7280, Perplexity: 15.3020, time_taken_in_seconds: 8
Epoch [1/1], Step [11711/13804], Loss: 2.4596, Perplexity: 11.7005, time_taken_in_seconds: 8
Epoch [1/1], Step [11712/13804], Loss: 2.6850, Perplexity: 14.6584, time_taken_in_seconds: 9
Epoch [1/1], Step [11713/13804], Loss: 3.0446, Perplexity: 21.0019, time_taken_in_seconds: 10
Epoch [1/1], Step [11714/13804], Loss: 2.9901, Perplexity: 19.8873, time_taken_in_seconds: 11
Epoch [1/1], Step [11715/13804], Loss: 2.4418, Perplexity: 11.4936, time_taken_in_seconds: 12
Epoch [1/1], Step [11716/13804], Loss: 2.5662, Perplexity: 13.0167, time_taken_in_seconds: 12
Epoch [1/1], Step [11717/13804], Loss: 2.5269, Perplexity: 12.5145, time_taken_in_seconds: 13
Epoch [1/1], Step [11718/13804], Loss: 2.5796, Perplexity: 13.1918, time_taken_in_seconds: 14
Epoch [1/1], Step [11719/13804], Loss: 2.3317, Perplexity: 10.2951, time_taken_in_seconds: 15
Epoch [1/1], Step [11720/13804], Loss: 2.4842, Perplexity: 11.9912, time_taken_in_seconds: 16
Epoch [1/1], Step [11721/13804], Loss: 2.2504, Perplexity: 9.4915, time_taken_in_seconds: 16
Epoch [1/1], Step [11722/13804], Loss: 2.6461, Perplexity: 14.0984, time_taken_in_seconds: 17
Epoch [1/1], Step [11723/13804], Loss: 2.9687, Perplexity: 19.4673, time_taken_in_seconds: 18
Epoch [1/1], Step [11724/13804], Loss: 2.2282, Perplexity: 9.2832, time_taken_in_seconds: 19
Epoch [1/1], Step [11725/13804], Loss: 2.5183, Perplexity: 12.4073, time_taken_in_seconds: 20
Epoch [1/1], Step [11726/13804], Loss: 2.3442, Perplexity: 10.4250, time_taken_in_seconds: 20
Epoch [1/1], Step [11727/13804], Loss: 2.7381, Perplexity: 15.4573, time_taken_in_seconds: 21
Epoch [1/1], Step [11728/13804], Loss: 2.7944, Perplexity: 16.3528, time_taken_in_seconds: 22
Epoch [1/1], Step [11729/13804], Loss: 2.4300, Perplexity: 11.3587, time_taken_in_seconds: 23
Epoch [1/1], Step [11730/13804], Loss: 2.4585, Perplexity: 11.6878, time_taken_in_seconds: 24
Epoch [1/1], Step [11731/13804], Loss: 2.6636, Perplexity: 14.3477, time_taken_in_seconds: 25
Epoch [1/1], Step [11732/13804], Loss: 2.4166, Perplexity: 11.2077, time_taken_in_seconds: 25
Epoch [1/1], Step [11733/13804], Loss: 2.2962, Perplexity: 9.9367, time_taken_in_seconds: 26
Epoch [1/1], Step [11734/13804], Loss: 2.4177, Perplexity: 11.2205, time_taken_in_seconds: 27
Epoch [1/1], Step [11735/13804], Loss: 2.7773, Perplexity: 16.0752, time_taken_in_seconds: 28
Epoch [1/1], Step [11736/13804], Loss: 2.3706, Perplexity: 10.7033, time_taken_in_seconds: 29
Epoch [1/1], Step [11737/13804], Loss: 3.4668, Perplexity: 32.0350, time_taken_in_seconds: 29
Epoch [1/1], Step [11738/13804], Loss: 2.4653, Perplexity: 11.7672, time_taken_in_seconds: 30
Epoch [1/1], Step [11739/13804], Loss: 2.4332, Perplexity: 11.3952, time_taken_in_seconds: 31
Epoch [1/1], Step [11740/13804], Loss: 2.4057, Perplexity: 11.0858, time_taken_in_seconds: 32
Epoch [1/1], Step [11741/13804], Loss: 2.5188, Perplexity: 12.4141, time_taken_in_seconds: 33
Epoch [1/1], Step [11742/13804], Loss: 2.5517, Perplexity: 12.8284, time_taken_in_seconds: 33
Epoch [1/1], Step [11743/13804], Loss: 2.1932, Perplexity: 8.9635, time_taken_in_seconds: 34
Epoch [1/1], Step [11744/13804], Loss: 2.6951, Perplexity: 14.8070, time_taken_in_seconds: 35
Epoch [1/1], Step [11745/13804], Loss: 2.4737, Perplexity: 11.8665, time_taken_in_seconds: 36
Epoch [1/1], Step [11746/13804], Loss: 2.4843, Perplexity: 11.9931, time_taken_in_seconds: 37
Epoch [1/1], Step [11747/13804], Loss: 2.7168, Perplexity: 15.1315, time_taken_in_seconds: 37
Epoch [1/1], Step [11748/13804], Loss: 2.4161, Perplexity: 11.2018, time_taken_in_seconds: 38
Epoch [1/1], Step [11749/13804], Loss: 2.4321, Perplexity: 11.3826, time_taken_in_seconds: 39
Epoch [1/1], Step [11750/13804], Loss: 2.1077, Perplexity: 8.2294, time_taken_in_seconds: 40
Epoch [1/1], Step [11751/13804], Loss: 2.8378, Perplexity: 17.0788, time_taken_in_seconds: 41
Epoch [1/1], Step [11752/13804], Loss: 2.5057, Perplexity: 12.2515, time_taken_in_seconds: 41
Epoch [1/1], Step [11753/13804], Loss: 2.2912, Perplexity: 9.8867, time_taken_in_seconds: 42
Epoch [1/1], Step [11754/13804], Loss: 2.5195, Perplexity: 12.4226, time_taken_in_seconds: 43
Epoch [1/1], Step [11755/13804], Loss: 2.3932, Perplexity: 10.9485, time_taken_in_seconds: 44
Epoch [1/1], Step [11756/13804], Loss: 2.4081, Perplexity: 11.1124, time_taken_in_seconds: 45
Epoch [1/1], Step [11757/13804], Loss: 2.7294, Perplexity: 15.3238, time_taken_in_seconds: 46
Epoch [1/1], Step [11758/13804], Loss: 2.4471, Perplexity: 11.5547, time_taken_in_seconds: 47
Epoch [1/1], Step [11759/13804], Loss: 2.4226, Perplexity: 11.2750, time_taken_in_seconds: 47
Epoch [1/1], Step [11760/13804], Loss: 2.5123, Perplexity: 12.3331, time_taken_in_seconds: 48
Epoch [1/1], Step [11761/13804], Loss: 2.5649, Perplexity: 12.9996, time_taken_in_seconds: 49
Epoch [1/1], Step [11762/13804], Loss: 2.7848, Perplexity: 16.1967, time_taken_in_seconds: 50
Epoch [1/1], Step [11763/13804], Loss: 2.4899, Perplexity: 12.0606, time_taken_in_seconds: 51
Epoch [1/1], Step [11764/13804], Loss: 2.7254, Perplexity: 15.2618, time_taken_in_seconds: 51
Epoch [1/1], Step [11765/13804], Loss: 2.9575, Perplexity: 19.2501, time_taken_in_seconds: 52
Epoch [1/1], Step [11766/13804], Loss: 2.1383, Perplexity: 8.4848, time_taken_in_seconds: 53
Epoch [1/1], Step [11767/13804], Loss: 3.1219, Perplexity: 22.6892, time_taken_in_seconds: 54
Epoch [1/1], Step [11768/13804], Loss: 2.4824, Perplexity: 11.9694, time_taken_in_seconds: 55
Epoch [1/1], Step [11769/13804], Loss: 2.7519, Perplexity: 15.6719, time_taken_in_seconds: 55
Epoch [1/1], Step [11770/13804], Loss: 2.9417, Perplexity: 18.9475, time_taken_in_seconds: 56
Epoch [1/1], Step [11771/13804], Loss: 2.3949, Perplexity: 10.9674, time_taken_in_seconds: 57
Epoch [1/1], Step [11772/13804], Loss: 2.6031, Perplexity: 13.5061, time_taken_in_seconds: 58
Epoch [1/1], Step [11773/13804], Loss: 2.6300, Perplexity: 13.8731, time_taken_in_seconds: 59
Epoch [1/1], Step [11774/13804], Loss: 2.4532, Perplexity: 11.6250, time_taken_in_seconds: 59
Epoch [1/1], Step [11775/13804], Loss: 2.9552, Perplexity: 19.2046, time_taken_in_seconds: 60
Epoch [1/1], Step [11776/13804], Loss: 2.4563, Perplexity: 11.6610, time_taken_in_seconds: 61
Epoch [1/1], Step [11777/13804], Loss: 2.7156, Perplexity: 15.1144, time_taken_in_seconds: 62
Epoch [1/1], Step [11778/13804], Loss: 2.8360, Perplexity: 17.0478, time_taken_in_seconds: 63
Epoch [1/1], Step [11779/13804], Loss: 2.5411, Perplexity: 12.6936, time_taken_in_seconds: 64
Epoch [1/1], Step [11780/13804], Loss: 2.4927, Perplexity: 12.0942, time_taken_in_seconds: 64
Epoch [1/1], Step [11781/13804], Loss: 2.6770, Perplexity: 14.5416, time_taken_in_seconds: 65
Epoch [1/1], Step [11782/13804], Loss: 2.3918, Perplexity: 10.9334, time_taken_in_seconds: 66
Epoch [1/1], Step [11783/13804], Loss: 2.1674, Perplexity: 8.7355, time_taken_in_seconds: 67
Epoch [1/1], Step [11784/13804], Loss: 2.4816, Perplexity: 11.9601, time_taken_in_seconds: 68
Epoch [1/1], Step [11785/13804], Loss: 3.1026, Perplexity: 22.2549, time_taken_in_seconds: 68
Epoch [1/1], Step [11786/13804], Loss: 2.3845, Perplexity: 10.8540, time_taken_in_seconds: 69
Epoch [1/1], Step [11787/13804], Loss: 3.7847, Perplexity: 44.0240, time_taken_in_seconds: 70
Epoch [1/1], Step [11788/13804], Loss: 2.6229, Perplexity: 13.7759, time_taken_in_seconds: 71
Epoch [1/1], Step [11789/13804], Loss: 2.5058, Perplexity: 12.2535, time_taken_in_seconds: 72
Epoch [1/1], Step [11790/13804], Loss: 2.6507, Perplexity: 14.1642, time_taken_in_seconds: 72
Epoch [1/1], Step [11791/13804], Loss: 2.4822, Perplexity: 11.9670, time_taken_in_seconds: 73
Epoch [1/1], Step [11792/13804], Loss: 2.4956, Perplexity: 12.1285, time_taken_in_seconds: 74
Epoch [1/1], Step [11793/13804], Loss: 2.7629, Perplexity: 15.8453, time_taken_in_seconds: 75
Epoch [1/1], Step [11794/13804], Loss: 2.8147, Perplexity: 16.6890, time_taken_in_seconds: 76
Epoch [1/1], Step [11795/13804], Loss: 2.2582, Perplexity: 9.5657, time_taken_in_seconds: 76
Epoch [1/1], Step [11796/13804], Loss: 2.4440, Perplexity: 11.5185, time_taken_in_seconds: 77
Epoch [1/1], Step [11797/13804], Loss: 2.5987, Perplexity: 13.4460, time_taken_in_seconds: 78
Epoch [1/1], Step [11798/13804], Loss: 2.2516, Perplexity: 9.5029, time_taken_in_seconds: 79
Epoch [1/1], Step [11799/13804], Loss: 2.2363, Perplexity: 9.3587, time_taken_in_seconds: 80
Epoch [1/1], Step [11800/13804], Loss: 2.7960, Perplexity: 16.3784, time_taken_in_seconds: 80
Epoch [1/1], Step [11801/13804], Loss: 2.6188, Perplexity: 13.7198, time_taken_in_seconds: 0
Epoch [1/1], Step [11802/13804], Loss: 2.2073, Perplexity: 9.0914, time_taken_in_seconds: 1
Epoch [1/1], Step [11803/13804], Loss: 2.3511, Perplexity: 10.4966, time_taken_in_seconds: 2
Epoch [1/1], Step [11804/13804], Loss: 2.7537, Perplexity: 15.7011, time_taken_in_seconds: 3
Epoch [1/1], Step [11805/13804], Loss: 2.6384, Perplexity: 13.9911, time_taken_in_seconds: 4
Epoch [1/1], Step [11806/13804], Loss: 2.2642, Perplexity: 9.6237, time_taken_in_seconds: 4
Epoch [1/1], Step [11807/13804], Loss: 2.4824, Perplexity: 11.9701, time_taken_in_seconds: 5
Epoch [1/1], Step [11808/13804], Loss: 2.3603, Perplexity: 10.5945, time_taken_in_seconds: 6
Epoch [1/1], Step [11809/13804], Loss: 2.4721, Perplexity: 11.8477, time_taken_in_seconds: 7
Epoch [1/1], Step [11810/13804], Loss: 2.3529, Perplexity: 10.5156, time_taken_in_seconds: 8
Epoch [1/1], Step [11811/13804], Loss: 2.4956, Perplexity: 12.1290, time_taken_in_seconds: 8
Epoch [1/1], Step [11812/13804], Loss: 2.4730, Perplexity: 11.8583, time_taken_in_seconds: 9
Epoch [1/1], Step [11813/13804], Loss: 2.4561, Perplexity: 11.6598, time_taken_in_seconds: 10
Epoch [1/1], Step [11814/13804], Loss: 2.7416, Perplexity: 15.5111, time_taken_in_seconds: 11
Epoch [1/1], Step [11815/13804], Loss: 2.5724, Perplexity: 13.0978, time_taken_in_seconds: 12
Epoch [1/1], Step [11816/13804], Loss: 2.5782, Perplexity: 13.1735, time_taken_in_seconds: 12
Epoch [1/1], Step [11817/13804], Loss: 2.9202, Perplexity: 18.5449, time_taken_in_seconds: 13
Epoch [1/1], Step [11818/13804], Loss: 2.5338, Perplexity: 12.6019, time_taken_in_seconds: 14
Epoch [1/1], Step [11819/13804], Loss: 2.5802, Perplexity: 13.1999, time_taken_in_seconds: 15
Epoch [1/1], Step [11820/13804], Loss: 2.3163, Perplexity: 10.1385, time_taken_in_seconds: 16
Epoch [1/1], Step [11821/13804], Loss: 2.3302, Perplexity: 10.2803, time_taken_in_seconds: 17
Epoch [1/1], Step [11822/13804], Loss: 2.4013, Perplexity: 11.0370, time_taken_in_seconds: 17
Epoch [1/1], Step [11823/13804], Loss: 3.0226, Perplexity: 20.5449, time_taken_in_seconds: 18
Epoch [1/1], Step [11824/13804], Loss: 2.5257, Perplexity: 12.4994, time_taken_in_seconds: 19
Epoch [1/1], Step [11825/13804], Loss: 2.5369, Perplexity: 12.6405, time_taken_in_seconds: 20
Epoch [1/1], Step [11826/13804], Loss: 2.2972, Perplexity: 9.9460, time_taken_in_seconds: 21
Epoch [1/1], Step [11827/13804], Loss: 2.6879, Perplexity: 14.7011, time_taken_in_seconds: 22
Epoch [1/1], Step [11828/13804], Loss: 2.4539, Perplexity: 11.6340, time_taken_in_seconds: 23
Epoch [1/1], Step [11829/13804], Loss: 2.6046, Perplexity: 13.5257, time_taken_in_seconds: 23
Epoch [1/1], Step [11830/13804], Loss: 2.7559, Perplexity: 15.7355, time_taken_in_seconds: 24
Epoch [1/1], Step [11831/13804], Loss: 3.1095, Perplexity: 22.4110, time_taken_in_seconds: 25
Epoch [1/1], Step [11832/13804], Loss: 2.9642, Perplexity: 19.3799, time_taken_in_seconds: 26
Epoch [1/1], Step [11833/13804], Loss: 3.1505, Perplexity: 23.3484, time_taken_in_seconds: 27
Epoch [1/1], Step [11834/13804], Loss: 2.2785, Perplexity: 9.7621, time_taken_in_seconds: 27
Epoch [1/1], Step [11835/13804], Loss: 2.7198, Perplexity: 15.1769, time_taken_in_seconds: 28
Epoch [1/1], Step [11836/13804], Loss: 2.4556, Perplexity: 11.6533, time_taken_in_seconds: 29
Epoch [1/1], Step [11837/13804], Loss: 2.3768, Perplexity: 10.7703, time_taken_in_seconds: 30
Epoch [1/1], Step [11838/13804], Loss: 2.7089, Perplexity: 15.0132, time_taken_in_seconds: 31
Epoch [1/1], Step [11839/13804], Loss: 2.9701, Perplexity: 19.4933, time_taken_in_seconds: 32
Epoch [1/1], Step [11840/13804], Loss: 3.0786, Perplexity: 21.7279, time_taken_in_seconds: 32
Epoch [1/1], Step [11841/13804], Loss: 2.4499, Perplexity: 11.5876, time_taken_in_seconds: 33
Epoch [1/1], Step [11842/13804], Loss: 2.4839, Perplexity: 11.9880, time_taken_in_seconds: 34
Epoch [1/1], Step [11843/13804], Loss: 2.4256, Perplexity: 11.3087, time_taken_in_seconds: 35
Epoch [1/1], Step [11844/13804], Loss: 2.6332, Perplexity: 13.9178, time_taken_in_seconds: 36
Epoch [1/1], Step [11845/13804], Loss: 2.6980, Perplexity: 14.8496, time_taken_in_seconds: 36
Epoch [1/1], Step [11846/13804], Loss: 2.8983, Perplexity: 18.1428, time_taken_in_seconds: 37
Epoch [1/1], Step [11847/13804], Loss: 2.2056, Perplexity: 9.0759, time_taken_in_seconds: 38
Epoch [1/1], Step [11848/13804], Loss: 2.5513, Perplexity: 12.8237, time_taken_in_seconds: 39
Epoch [1/1], Step [11849/13804], Loss: 2.2434, Perplexity: 9.4255, time_taken_in_seconds: 40
Epoch [1/1], Step [11850/13804], Loss: 3.0707, Perplexity: 21.5571, time_taken_in_seconds: 40
Epoch [1/1], Step [11851/13804], Loss: 2.3719, Perplexity: 10.7175, time_taken_in_seconds: 41
Epoch [1/1], Step [11852/13804], Loss: 2.7581, Perplexity: 15.7694, time_taken_in_seconds: 42
Epoch [1/1], Step [11853/13804], Loss: 2.7200, Perplexity: 15.1802, time_taken_in_seconds: 43
Epoch [1/1], Step [11854/13804], Loss: 2.4499, Perplexity: 11.5875, time_taken_in_seconds: 44
Epoch [1/1], Step [11855/13804], Loss: 2.7890, Perplexity: 16.2645, time_taken_in_seconds: 44
Epoch [1/1], Step [11856/13804], Loss: 3.3406, Perplexity: 28.2359, time_taken_in_seconds: 45
Epoch [1/1], Step [11857/13804], Loss: 2.3642, Perplexity: 10.6354, time_taken_in_seconds: 46
Epoch [1/1], Step [11858/13804], Loss: 2.3965, Perplexity: 10.9843, time_taken_in_seconds: 47
Epoch [1/1], Step [11859/13804], Loss: 2.6809, Perplexity: 14.5983, time_taken_in_seconds: 48
Epoch [1/1], Step [11860/13804], Loss: 2.6022, Perplexity: 13.4940, time_taken_in_seconds: 49
Epoch [1/1], Step [11861/13804], Loss: 2.3282, Perplexity: 10.2595, time_taken_in_seconds: 49
Epoch [1/1], Step [11862/13804], Loss: 2.7938, Perplexity: 16.3432, time_taken_in_seconds: 50
Epoch [1/1], Step [11863/13804], Loss: 2.5727, Perplexity: 13.1008, time_taken_in_seconds: 51
Epoch [1/1], Step [11864/13804], Loss: 2.8219, Perplexity: 16.8088, time_taken_in_seconds: 52
Epoch [1/1], Step [11865/13804], Loss: 2.6704, Perplexity: 14.4452, time_taken_in_seconds: 53
Epoch [1/1], Step [11866/13804], Loss: 2.4498, Perplexity: 11.5856, time_taken_in_seconds: 53
Epoch [1/1], Step [11867/13804], Loss: 3.8055, Perplexity: 44.9498, time_taken_in_seconds: 54
Epoch [1/1], Step [11868/13804], Loss: 2.2755, Perplexity: 9.7328, time_taken_in_seconds: 55
Epoch [1/1], Step [11869/13804], Loss: 2.4986, Perplexity: 12.1652, time_taken_in_seconds: 56
Epoch [1/1], Step [11870/13804], Loss: 2.6435, Perplexity: 14.0627, time_taken_in_seconds: 57
Epoch [1/1], Step [11871/13804], Loss: 2.2096, Perplexity: 9.1125, time_taken_in_seconds: 58
Epoch [1/1], Step [11872/13804], Loss: 2.3154, Perplexity: 10.1290, time_taken_in_seconds: 58
Epoch [1/1], Step [11873/13804], Loss: 2.6158, Perplexity: 13.6787, time_taken_in_seconds: 59
Epoch [1/1], Step [11874/13804], Loss: 2.9638, Perplexity: 19.3719, time_taken_in_seconds: 60
Epoch [1/1], Step [11875/13804], Loss: 2.6170, Perplexity: 13.6952, time_taken_in_seconds: 61
Epoch [1/1], Step [11876/13804], Loss: 2.6900, Perplexity: 14.7316, time_taken_in_seconds: 62
Epoch [1/1], Step [11877/13804], Loss: 2.4270, Perplexity: 11.3245, time_taken_in_seconds: 62
Epoch [1/1], Step [11878/13804], Loss: 2.2323, Perplexity: 9.3216, time_taken_in_seconds: 63
Epoch [1/1], Step [11879/13804], Loss: 3.0196, Perplexity: 20.4833, time_taken_in_seconds: 64
Epoch [1/1], Step [11880/13804], Loss: 2.3991, Perplexity: 11.0137, time_taken_in_seconds: 65
Epoch [1/1], Step [11881/13804], Loss: 2.4624, Perplexity: 11.7330, time_taken_in_seconds: 66
Epoch [1/1], Step [11882/13804], Loss: 2.6164, Perplexity: 13.6860, time_taken_in_seconds: 66
Epoch [1/1], Step [11883/13804], Loss: 2.5250, Perplexity: 12.4904, time_taken_in_seconds: 67
Epoch [1/1], Step [11884/13804], Loss: 2.6556, Perplexity: 14.2341, time_taken_in_seconds: 68
Epoch [1/1], Step [11885/13804], Loss: 2.3480, Perplexity: 10.4646, time_taken_in_seconds: 69
Epoch [1/1], Step [11886/13804], Loss: 2.5434, Perplexity: 12.7227, time_taken_in_seconds: 70
Epoch [1/1], Step [11887/13804], Loss: 2.3390, Perplexity: 10.3704, time_taken_in_seconds: 70
Epoch [1/1], Step [11888/13804], Loss: 2.5568, Perplexity: 12.8949, time_taken_in_seconds: 71
Epoch [1/1], Step [11889/13804], Loss: 2.1079, Perplexity: 8.2310, time_taken_in_seconds: 72
Epoch [1/1], Step [11890/13804], Loss: 2.5008, Perplexity: 12.1921, time_taken_in_seconds: 73
Epoch [1/1], Step [11891/13804], Loss: 2.4222, Perplexity: 11.2703, time_taken_in_seconds: 74
Epoch [1/1], Step [11892/13804], Loss: 2.4172, Perplexity: 11.2149, time_taken_in_seconds: 74
Epoch [1/1], Step [11893/13804], Loss: 2.7005, Perplexity: 14.8867, time_taken_in_seconds: 75
Epoch [1/1], Step [11894/13804], Loss: 2.3586, Perplexity: 10.5765, time_taken_in_seconds: 76
Epoch [1/1], Step [11895/13804], Loss: 2.5338, Perplexity: 12.6018, time_taken_in_seconds: 77
Epoch [1/1], Step [11896/13804], Loss: 2.5396, Perplexity: 12.6745, time_taken_in_seconds: 78
Epoch [1/1], Step [11897/13804], Loss: 2.4388, Perplexity: 11.4591, time_taken_in_seconds: 79
Epoch [1/1], Step [11898/13804], Loss: 2.8284, Perplexity: 16.9175, time_taken_in_seconds: 79
Epoch [1/1], Step [11899/13804], Loss: 2.2663, Perplexity: 9.6433, time_taken_in_seconds: 80
Epoch [1/1], Step [11900/13804], Loss: 2.5859, Perplexity: 13.2756, time_taken_in_seconds: 81
Epoch [1/1], Step [11901/13804], Loss: 2.5268, Perplexity: 12.5136, time_taken_in_seconds: 1
Epoch [1/1], Step [11902/13804], Loss: 2.4266, Perplexity: 11.3201, time_taken_in_seconds: 1
Epoch [1/1], Step [11903/13804], Loss: 2.7733, Perplexity: 16.0119, time_taken_in_seconds: 2
Epoch [1/1], Step [11904/13804], Loss: 2.3932, Perplexity: 10.9487, time_taken_in_seconds: 3
Epoch [1/1], Step [11905/13804], Loss: 2.6619, Perplexity: 14.3232, time_taken_in_seconds: 4
Epoch [1/1], Step [11906/13804], Loss: 2.7002, Perplexity: 14.8823, time_taken_in_seconds: 5
Epoch [1/1], Step [11907/13804], Loss: 2.3822, Perplexity: 10.8287, time_taken_in_seconds: 5
Epoch [1/1], Step [11908/13804], Loss: 2.9747, Perplexity: 19.5841, time_taken_in_seconds: 6
Epoch [1/1], Step [11909/13804], Loss: 2.8421, Perplexity: 17.1520, time_taken_in_seconds: 7
Epoch [1/1], Step [11910/13804], Loss: 2.6705, Perplexity: 14.4465, time_taken_in_seconds: 8
Epoch [1/1], Step [11911/13804], Loss: 2.6320, Perplexity: 13.9013, time_taken_in_seconds: 9
Epoch [1/1], Step [11912/13804], Loss: 2.6030, Perplexity: 13.5038, time_taken_in_seconds: 9
Epoch [1/1], Step [11913/13804], Loss: 2.5572, Perplexity: 12.8994, time_taken_in_seconds: 10
Epoch [1/1], Step [11914/13804], Loss: 2.7998, Perplexity: 16.4418, time_taken_in_seconds: 11
Epoch [1/1], Step [11915/13804], Loss: 2.3789, Perplexity: 10.7931, time_taken_in_seconds: 12
Epoch [1/1], Step [11916/13804], Loss: 2.6476, Perplexity: 14.1197, time_taken_in_seconds: 13
Epoch [1/1], Step [11917/13804], Loss: 2.6297, Perplexity: 13.8700, time_taken_in_seconds: 14
Epoch [1/1], Step [11918/13804], Loss: 2.2983, Perplexity: 9.9577, time_taken_in_seconds: 14
Epoch [1/1], Step [11919/13804], Loss: 2.6201, Perplexity: 13.7371, time_taken_in_seconds: 15
Epoch [1/1], Step [11920/13804], Loss: 2.6981, Perplexity: 14.8521, time_taken_in_seconds: 16
Epoch [1/1], Step [11921/13804], Loss: 2.7522, Perplexity: 15.6772, time_taken_in_seconds: 17
Epoch [1/1], Step [11922/13804], Loss: 2.7650, Perplexity: 15.8785, time_taken_in_seconds: 18
Epoch [1/1], Step [11923/13804], Loss: 2.4588, Perplexity: 11.6903, time_taken_in_seconds: 18
Epoch [1/1], Step [11924/13804], Loss: 3.1916, Perplexity: 24.3264, time_taken_in_seconds: 19
Epoch [1/1], Step [11925/13804], Loss: 2.5023, Perplexity: 12.2108, time_taken_in_seconds: 20
Epoch [1/1], Step [11926/13804], Loss: 2.3947, Perplexity: 10.9651, time_taken_in_seconds: 21
Epoch [1/1], Step [11927/13804], Loss: 2.3989, Perplexity: 11.0112, time_taken_in_seconds: 22
Epoch [1/1], Step [11928/13804], Loss: 2.2920, Perplexity: 9.8950, time_taken_in_seconds: 22
Epoch [1/1], Step [11929/13804], Loss: 2.1923, Perplexity: 8.9562, time_taken_in_seconds: 23
Epoch [1/1], Step [11930/13804], Loss: 2.7922, Perplexity: 16.3161, time_taken_in_seconds: 24
Epoch [1/1], Step [11931/13804], Loss: 2.0799, Perplexity: 8.0040, time_taken_in_seconds: 25
Epoch [1/1], Step [11932/13804], Loss: 2.6847, Perplexity: 14.6543, time_taken_in_seconds: 26
Epoch [1/1], Step [11933/13804], Loss: 2.3681, Perplexity: 10.6770, time_taken_in_seconds: 27
Epoch [1/1], Step [11934/13804], Loss: 2.3453, Perplexity: 10.4369, time_taken_in_seconds: 27
Epoch [1/1], Step [11935/13804], Loss: 2.6279, Perplexity: 13.8445, time_taken_in_seconds: 28
Epoch [1/1], Step [11936/13804], Loss: 2.3140, Perplexity: 10.1147, time_taken_in_seconds: 29
Epoch [1/1], Step [11937/13804], Loss: 2.3146, Perplexity: 10.1209, time_taken_in_seconds: 30
Epoch [1/1], Step [11938/13804], Loss: 2.2959, Perplexity: 9.9334, time_taken_in_seconds: 31
Epoch [1/1], Step [11939/13804], Loss: 2.6947, Perplexity: 14.8009, time_taken_in_seconds: 31
Epoch [1/1], Step [11940/13804], Loss: 2.5979, Perplexity: 13.4349, time_taken_in_seconds: 32
Epoch [1/1], Step [11941/13804], Loss: 2.5051, Perplexity: 12.2453, time_taken_in_seconds: 33
Epoch [1/1], Step [11942/13804], Loss: 2.8533, Perplexity: 17.3454, time_taken_in_seconds: 34
Epoch [1/1], Step [11943/13804], Loss: 2.3476, Perplexity: 10.4610, time_taken_in_seconds: 35
Epoch [1/1], Step [11944/13804], Loss: 2.5868, Perplexity: 13.2872, time_taken_in_seconds: 35
Epoch [1/1], Step [11945/13804], Loss: 2.7939, Perplexity: 16.3453, time_taken_in_seconds: 36
Epoch [1/1], Step [11946/13804], Loss: 2.4330, Perplexity: 11.3924, time_taken_in_seconds: 37
Epoch [1/1], Step [11947/13804], Loss: 2.6624, Perplexity: 14.3310, time_taken_in_seconds: 38
Epoch [1/1], Step [11948/13804], Loss: 2.2946, Perplexity: 9.9203, time_taken_in_seconds: 39
Epoch [1/1], Step [11949/13804], Loss: 2.4732, Perplexity: 11.8601, time_taken_in_seconds: 40
Epoch [1/1], Step [11950/13804], Loss: 2.3737, Perplexity: 10.7374, time_taken_in_seconds: 40
Epoch [1/1], Step [11951/13804], Loss: 2.4341, Perplexity: 11.4051, time_taken_in_seconds: 41
Epoch [1/1], Step [11952/13804], Loss: 2.4835, Perplexity: 11.9835, time_taken_in_seconds: 42
Epoch [1/1], Step [11953/13804], Loss: 2.5135, Perplexity: 12.3485, time_taken_in_seconds: 43
Epoch [1/1], Step [11954/13804], Loss: 2.5480, Perplexity: 12.7819, time_taken_in_seconds: 44
Epoch [1/1], Step [11955/13804], Loss: 2.4519, Perplexity: 11.6108, time_taken_in_seconds: 44
Epoch [1/1], Step [11956/13804], Loss: 2.2945, Perplexity: 9.9198, time_taken_in_seconds: 45
Epoch [1/1], Step [11957/13804], Loss: 2.9832, Perplexity: 19.7507, time_taken_in_seconds: 46
Epoch [1/1], Step [11958/13804], Loss: 3.3428, Perplexity: 28.2991, time_taken_in_seconds: 47
Epoch [1/1], Step [11959/13804], Loss: 2.5105, Perplexity: 12.3107, time_taken_in_seconds: 48
Epoch [1/1], Step [11960/13804], Loss: 2.7202, Perplexity: 15.1833, time_taken_in_seconds: 48
Epoch [1/1], Step [11961/13804], Loss: 2.5377, Perplexity: 12.6506, time_taken_in_seconds: 49
Epoch [1/1], Step [11962/13804], Loss: 2.7625, Perplexity: 15.8392, time_taken_in_seconds: 50
Epoch [1/1], Step [11963/13804], Loss: 2.9420, Perplexity: 18.9536, time_taken_in_seconds: 51
Epoch [1/1], Step [11964/13804], Loss: 2.6399, Perplexity: 14.0121, time_taken_in_seconds: 52
Epoch [1/1], Step [11965/13804], Loss: 2.5762, Perplexity: 13.1474, time_taken_in_seconds: 53
Epoch [1/1], Step [11966/13804], Loss: 2.4359, Perplexity: 11.4256, time_taken_in_seconds: 53
Epoch [1/1], Step [11967/13804], Loss: 2.4344, Perplexity: 11.4085, time_taken_in_seconds: 54
Epoch [1/1], Step [11968/13804], Loss: 2.3447, Perplexity: 10.4298, time_taken_in_seconds: 55
Epoch [1/1], Step [11969/13804], Loss: 2.1105, Perplexity: 8.2525, time_taken_in_seconds: 56
Epoch [1/1], Step [11970/13804], Loss: 2.2957, Perplexity: 9.9318, time_taken_in_seconds: 57
Epoch [1/1], Step [11971/13804], Loss: 2.7688, Perplexity: 15.9389, time_taken_in_seconds: 57
Epoch [1/1], Step [11972/13804], Loss: 2.9660, Perplexity: 19.4148, time_taken_in_seconds: 58
Epoch [1/1], Step [11973/13804], Loss: 2.2621, Perplexity: 9.6036, time_taken_in_seconds: 59
Epoch [1/1], Step [11974/13804], Loss: 2.3754, Perplexity: 10.7549, time_taken_in_seconds: 60
Epoch [1/1], Step [11975/13804], Loss: 2.8390, Perplexity: 17.0984, time_taken_in_seconds: 61
Epoch [1/1], Step [11976/13804], Loss: 3.0801, Perplexity: 21.7599, time_taken_in_seconds: 62
Epoch [1/1], Step [11977/13804], Loss: 2.6703, Perplexity: 14.4436, time_taken_in_seconds: 63
Epoch [1/1], Step [11978/13804], Loss: 2.6333, Perplexity: 13.9203, time_taken_in_seconds: 63
Epoch [1/1], Step [11979/13804], Loss: 2.4811, Perplexity: 11.9546, time_taken_in_seconds: 64
Epoch [1/1], Step [11980/13804], Loss: 2.2338, Perplexity: 9.3357, time_taken_in_seconds: 65
Epoch [1/1], Step [11981/13804], Loss: 2.4854, Perplexity: 12.0057, time_taken_in_seconds: 66
Epoch [1/1], Step [11982/13804], Loss: 2.2185, Perplexity: 9.1938, time_taken_in_seconds: 67
Epoch [1/1], Step [11983/13804], Loss: 2.3339, Perplexity: 10.3179, time_taken_in_seconds: 67
Epoch [1/1], Step [11984/13804], Loss: 2.4521, Perplexity: 11.6126, time_taken_in_seconds: 68
Epoch [1/1], Step [11985/13804], Loss: 2.5373, Perplexity: 12.6459, time_taken_in_seconds: 69
Epoch [1/1], Step [11986/13804], Loss: 2.6415, Perplexity: 14.0342, time_taken_in_seconds: 70
Epoch [1/1], Step [11987/13804], Loss: 2.3154, Perplexity: 10.1286, time_taken_in_seconds: 71
Epoch [1/1], Step [11988/13804], Loss: 2.2855, Perplexity: 9.8306, time_taken_in_seconds: 72
Epoch [1/1], Step [11989/13804], Loss: 2.5699, Perplexity: 13.0648, time_taken_in_seconds: 72
Epoch [1/1], Step [11990/13804], Loss: 2.4037, Perplexity: 11.0642, time_taken_in_seconds: 73
Epoch [1/1], Step [11991/13804], Loss: 2.8780, Perplexity: 17.7781, time_taken_in_seconds: 74
Epoch [1/1], Step [11992/13804], Loss: 2.1769, Perplexity: 8.8192, time_taken_in_seconds: 75
Epoch [1/1], Step [11993/13804], Loss: 2.6715, Perplexity: 14.4615, time_taken_in_seconds: 76
Epoch [1/1], Step [11994/13804], Loss: 2.3674, Perplexity: 10.6691, time_taken_in_seconds: 76
Epoch [1/1], Step [11995/13804], Loss: 2.8688, Perplexity: 17.6166, time_taken_in_seconds: 77
Epoch [1/1], Step [11996/13804], Loss: 2.5567, Perplexity: 12.8938, time_taken_in_seconds: 78
Epoch [1/1], Step [11997/13804], Loss: 3.1844, Perplexity: 24.1532, time_taken_in_seconds: 79
Epoch [1/1], Step [11998/13804], Loss: 2.4639, Perplexity: 11.7511, time_taken_in_seconds: 80
Epoch [1/1], Step [11999/13804], Loss: 2.3537, Perplexity: 10.5242, time_taken_in_seconds: 81
Epoch [1/1], Step [12000/13804], Loss: 2.8147, Perplexity: 16.6878, time_taken_in_seconds: 81
Epoch [1/1], Step [12001/13804], Loss: 2.5361, Perplexity: 12.6303, time_taken_in_seconds: 0
Epoch [1/1], Step [12002/13804], Loss: 2.5030, Perplexity: 12.2189, time_taken_in_seconds: 1
Epoch [1/1], Step [12003/13804], Loss: 2.3595, Perplexity: 10.5855, time_taken_in_seconds: 2
Epoch [1/1], Step [12004/13804], Loss: 2.6366, Perplexity: 13.9650, time_taken_in_seconds: 3
Epoch [1/1], Step [12005/13804], Loss: 2.4237, Perplexity: 11.2873, time_taken_in_seconds: 4
Epoch [1/1], Step [12006/13804], Loss: 2.9333, Perplexity: 18.7901, time_taken_in_seconds: 4
Epoch [1/1], Step [12007/13804], Loss: 3.0873, Perplexity: 21.9188, time_taken_in_seconds: 5
Epoch [1/1], Step [12008/13804], Loss: 2.5702, Perplexity: 13.0685, time_taken_in_seconds: 6
Epoch [1/1], Step [12009/13804], Loss: 2.6139, Perplexity: 13.6521, time_taken_in_seconds: 7
Epoch [1/1], Step [12010/13804], Loss: 2.4367, Perplexity: 11.4357, time_taken_in_seconds: 8
Epoch [1/1], Step [12011/13804], Loss: 2.6435, Perplexity: 14.0623, time_taken_in_seconds: 8
Epoch [1/1], Step [12012/13804], Loss: 3.3738, Perplexity: 29.1885, time_taken_in_seconds: 9
Epoch [1/1], Step [12013/13804], Loss: 2.4983, Perplexity: 12.1624, time_taken_in_seconds: 10
Epoch [1/1], Step [12014/13804], Loss: 2.5694, Perplexity: 13.0579, time_taken_in_seconds: 11
Epoch [1/1], Step [12015/13804], Loss: 2.3015, Perplexity: 9.9891, time_taken_in_seconds: 12
Epoch [1/1], Step [12016/13804], Loss: 2.7044, Perplexity: 14.9456, time_taken_in_seconds: 13
Epoch [1/1], Step [12017/13804], Loss: 2.3007, Perplexity: 9.9811, time_taken_in_seconds: 13
Epoch [1/1], Step [12018/13804], Loss: 2.6474, Perplexity: 14.1177, time_taken_in_seconds: 14
Epoch [1/1], Step [12019/13804], Loss: 2.3245, Perplexity: 10.2220, time_taken_in_seconds: 15
Epoch [1/1], Step [12020/13804], Loss: 2.2979, Perplexity: 9.9531, time_taken_in_seconds: 16
Epoch [1/1], Step [12021/13804], Loss: 2.3434, Perplexity: 10.4166, time_taken_in_seconds: 17
Epoch [1/1], Step [12022/13804], Loss: 2.2393, Perplexity: 9.3864, time_taken_in_seconds: 17
Epoch [1/1], Step [12023/13804], Loss: 3.1368, Perplexity: 23.0305, time_taken_in_seconds: 18
Epoch [1/1], Step [12024/13804], Loss: 2.7933, Perplexity: 16.3345, time_taken_in_seconds: 19
Epoch [1/1], Step [12025/13804], Loss: 2.3912, Perplexity: 10.9270, time_taken_in_seconds: 20
Epoch [1/1], Step [12026/13804], Loss: 2.7230, Perplexity: 15.2257, time_taken_in_seconds: 21
Epoch [1/1], Step [12027/13804], Loss: 2.4850, Perplexity: 12.0008, time_taken_in_seconds: 22
Epoch [1/1], Step [12028/13804], Loss: 2.9311, Perplexity: 18.7487, time_taken_in_seconds: 22
Epoch [1/1], Step [12029/13804], Loss: 2.6926, Perplexity: 14.7704, time_taken_in_seconds: 23
Epoch [1/1], Step [12030/13804], Loss: 2.1024, Perplexity: 8.1857, time_taken_in_seconds: 24
Epoch [1/1], Step [12031/13804], Loss: 2.4305, Perplexity: 11.3640, time_taken_in_seconds: 25
Epoch [1/1], Step [12032/13804], Loss: 2.6531, Perplexity: 14.1984, time_taken_in_seconds: 26
Epoch [1/1], Step [12033/13804], Loss: 2.5676, Perplexity: 13.0348, time_taken_in_seconds: 26
Epoch [1/1], Step [12034/13804], Loss: 2.8593, Perplexity: 17.4485, time_taken_in_seconds: 27
Epoch [1/1], Step [12035/13804], Loss: 2.3885, Perplexity: 10.8970, time_taken_in_seconds: 28
Epoch [1/1], Step [12036/13804], Loss: 2.4330, Perplexity: 11.3931, time_taken_in_seconds: 29
Epoch [1/1], Step [12037/13804], Loss: 2.2677, Perplexity: 9.6570, time_taken_in_seconds: 30
Epoch [1/1], Step [12038/13804], Loss: 2.6171, Perplexity: 13.6960, time_taken_in_seconds: 30
Epoch [1/1], Step [12039/13804], Loss: 2.5294, Perplexity: 12.5460, time_taken_in_seconds: 31
Epoch [1/1], Step [12040/13804], Loss: 2.8233, Perplexity: 16.8316, time_taken_in_seconds: 32
Epoch [1/1], Step [12041/13804], Loss: 2.2557, Perplexity: 9.5423, time_taken_in_seconds: 33
Epoch [1/1], Step [12042/13804], Loss: 2.4623, Perplexity: 11.7320, time_taken_in_seconds: 34
Epoch [1/1], Step [12043/13804], Loss: 2.3732, Perplexity: 10.7319, time_taken_in_seconds: 35
Epoch [1/1], Step [12044/13804], Loss: 2.1612, Perplexity: 8.6814, time_taken_in_seconds: 35
Epoch [1/1], Step [12045/13804], Loss: 2.2252, Perplexity: 9.2550, time_taken_in_seconds: 36
Epoch [1/1], Step [12046/13804], Loss: 3.0414, Perplexity: 20.9343, time_taken_in_seconds: 37
Epoch [1/1], Step [12047/13804], Loss: 2.8414, Perplexity: 17.1404, time_taken_in_seconds: 38
Epoch [1/1], Step [12048/13804], Loss: 2.1012, Perplexity: 8.1762, time_taken_in_seconds: 39
Epoch [1/1], Step [12049/13804], Loss: 2.3553, Perplexity: 10.5413, time_taken_in_seconds: 40
Epoch [1/1], Step [12050/13804], Loss: 2.4659, Perplexity: 11.7738, time_taken_in_seconds: 40
Epoch [1/1], Step [12051/13804], Loss: 2.4884, Perplexity: 12.0419, time_taken_in_seconds: 41
Epoch [1/1], Step [12052/13804], Loss: 2.5001, Perplexity: 12.1838, time_taken_in_seconds: 42
Epoch [1/1], Step [12053/13804], Loss: 2.7644, Perplexity: 15.8687, time_taken_in_seconds: 43
Epoch [1/1], Step [12054/13804], Loss: 2.6167, Perplexity: 13.6903, time_taken_in_seconds: 44
Epoch [1/1], Step [12055/13804], Loss: 2.3469, Perplexity: 10.4536, time_taken_in_seconds: 44
Epoch [1/1], Step [12056/13804], Loss: 2.5751, Perplexity: 13.1324, time_taken_in_seconds: 45
Epoch [1/1], Step [12057/13804], Loss: 2.4488, Perplexity: 11.5744, time_taken_in_seconds: 46
Epoch [1/1], Step [12058/13804], Loss: 2.7341, Perplexity: 15.3955, time_taken_in_seconds: 47
Epoch [1/1], Step [12059/13804], Loss: 2.6443, Perplexity: 14.0730, time_taken_in_seconds: 48
Epoch [1/1], Step [12060/13804], Loss: 2.6592, Perplexity: 14.2842, time_taken_in_seconds: 48
Epoch [1/1], Step [12061/13804], Loss: 2.5450, Perplexity: 12.7435, time_taken_in_seconds: 49
Epoch [1/1], Step [12062/13804], Loss: 2.5917, Perplexity: 13.3528, time_taken_in_seconds: 50
Epoch [1/1], Step [12063/13804], Loss: 2.8049, Perplexity: 16.5254, time_taken_in_seconds: 51
Epoch [1/1], Step [12064/13804], Loss: 2.5684, Perplexity: 13.0456, time_taken_in_seconds: 52
Epoch [1/1], Step [12065/13804], Loss: 2.8429, Perplexity: 17.1655, time_taken_in_seconds: 53
Epoch [1/1], Step [12066/13804], Loss: 2.3765, Perplexity: 10.7666, time_taken_in_seconds: 53
Epoch [1/1], Step [12067/13804], Loss: 2.3243, Perplexity: 10.2197, time_taken_in_seconds: 54
Epoch [1/1], Step [12068/13804], Loss: 2.1817, Perplexity: 8.8613, time_taken_in_seconds: 55
Epoch [1/1], Step [12069/13804], Loss: 2.4285, Perplexity: 11.3417, time_taken_in_seconds: 56
Epoch [1/1], Step [12070/13804], Loss: 3.0062, Perplexity: 20.2100, time_taken_in_seconds: 57
Epoch [1/1], Step [12071/13804], Loss: 3.1985, Perplexity: 24.4960, time_taken_in_seconds: 57
Epoch [1/1], Step [12072/13804], Loss: 2.6604, Perplexity: 14.3025, time_taken_in_seconds: 58
Epoch [1/1], Step [12073/13804], Loss: 2.6909, Perplexity: 14.7454, time_taken_in_seconds: 59
Epoch [1/1], Step [12074/13804], Loss: 2.0923, Perplexity: 8.1038, time_taken_in_seconds: 60
Epoch [1/1], Step [12075/13804], Loss: 2.4550, Perplexity: 11.6462, time_taken_in_seconds: 61
Epoch [1/1], Step [12076/13804], Loss: 3.0410, Perplexity: 20.9253, time_taken_in_seconds: 61
Epoch [1/1], Step [12077/13804], Loss: 2.2571, Perplexity: 9.5555, time_taken_in_seconds: 62
Epoch [1/1], Step [12078/13804], Loss: 2.8384, Perplexity: 17.0891, time_taken_in_seconds: 63
Epoch [1/1], Step [12079/13804], Loss: 2.9306, Perplexity: 18.7385, time_taken_in_seconds: 64
Epoch [1/1], Step [12080/13804], Loss: 2.6023, Perplexity: 13.4947, time_taken_in_seconds: 65
Epoch [1/1], Step [12081/13804], Loss: 2.4200, Perplexity: 11.2459, time_taken_in_seconds: 66
Epoch [1/1], Step [12082/13804], Loss: 2.4880, Perplexity: 12.0372, time_taken_in_seconds: 66
Epoch [1/1], Step [12083/13804], Loss: 2.5720, Perplexity: 13.0924, time_taken_in_seconds: 67
Epoch [1/1], Step [12084/13804], Loss: 2.3521, Perplexity: 10.5080, time_taken_in_seconds: 68
Epoch [1/1], Step [12085/13804], Loss: 2.7063, Perplexity: 14.9742, time_taken_in_seconds: 69
Epoch [1/1], Step [12086/13804], Loss: 2.2446, Perplexity: 9.4370, time_taken_in_seconds: 70
Epoch [1/1], Step [12087/13804], Loss: 2.3612, Perplexity: 10.6036, time_taken_in_seconds: 70
Epoch [1/1], Step [12088/13804], Loss: 2.1956, Perplexity: 8.9858, time_taken_in_seconds: 71
Epoch [1/1], Step [12089/13804], Loss: 2.1473, Perplexity: 8.5614, time_taken_in_seconds: 72
Epoch [1/1], Step [12090/13804], Loss: 2.5987, Perplexity: 13.4464, time_taken_in_seconds: 73
Epoch [1/1], Step [12091/13804], Loss: 2.7279, Perplexity: 15.3007, time_taken_in_seconds: 74
Epoch [1/1], Step [12092/13804], Loss: 2.9369, Perplexity: 18.8578, time_taken_in_seconds: 74
Epoch [1/1], Step [12093/13804], Loss: 2.5420, Perplexity: 12.7050, time_taken_in_seconds: 75
Epoch [1/1], Step [12094/13804], Loss: 2.8628, Perplexity: 17.5113, time_taken_in_seconds: 76
Epoch [1/1], Step [12095/13804], Loss: 2.5894, Perplexity: 13.3211, time_taken_in_seconds: 77
Epoch [1/1], Step [12096/13804], Loss: 2.5312, Perplexity: 12.5689, time_taken_in_seconds: 78
Epoch [1/1], Step [12097/13804], Loss: 2.2605, Perplexity: 9.5882, time_taken_in_seconds: 78
Epoch [1/1], Step [12098/13804], Loss: 2.5410, Perplexity: 12.6922, time_taken_in_seconds: 79
Epoch [1/1], Step [12099/13804], Loss: 2.1881, Perplexity: 8.9183, time_taken_in_seconds: 80
Epoch [1/1], Step [12100/13804], Loss: 2.4805, Perplexity: 11.9474, time_taken_in_seconds: 81
Epoch [1/1], Step [12101/13804], Loss: 2.3209, Perplexity: 10.1845, time_taken_in_seconds: 0
Epoch [1/1], Step [12102/13804], Loss: 2.6640, Perplexity: 14.3539, time_taken_in_seconds: 1
Epoch [1/1], Step [12103/13804], Loss: 2.5903, Perplexity: 13.3340, time_taken_in_seconds: 2
Epoch [1/1], Step [12104/13804], Loss: 2.1606, Perplexity: 8.6761, time_taken_in_seconds: 3
Epoch [1/1], Step [12105/13804], Loss: 2.3492, Perplexity: 10.4777, time_taken_in_seconds: 4
Epoch [1/1], Step [12106/13804], Loss: 2.2625, Perplexity: 9.6066, time_taken_in_seconds: 4
Epoch [1/1], Step [12107/13804], Loss: 2.2787, Perplexity: 9.7642, time_taken_in_seconds: 5
Epoch [1/1], Step [12108/13804], Loss: 2.7023, Perplexity: 14.9143, time_taken_in_seconds: 6
Epoch [1/1], Step [12109/13804], Loss: 2.6921, Perplexity: 14.7632, time_taken_in_seconds: 7
Epoch [1/1], Step [12110/13804], Loss: 2.2243, Perplexity: 9.2467, time_taken_in_seconds: 8
Epoch [1/1], Step [12111/13804], Loss: 2.6473, Perplexity: 14.1160, time_taken_in_seconds: 8
Epoch [1/1], Step [12112/13804], Loss: 3.6122, Perplexity: 37.0461, time_taken_in_seconds: 9
Epoch [1/1], Step [12113/13804], Loss: 2.5560, Perplexity: 12.8844, time_taken_in_seconds: 10
Epoch [1/1], Step [12114/13804], Loss: 2.4338, Perplexity: 11.4017, time_taken_in_seconds: 11
Epoch [1/1], Step [12115/13804], Loss: 3.6943, Perplexity: 40.2178, time_taken_in_seconds: 12
Epoch [1/1], Step [12116/13804], Loss: 2.5537, Perplexity: 12.8544, time_taken_in_seconds: 13
Epoch [1/1], Step [12117/13804], Loss: 2.3806, Perplexity: 10.8111, time_taken_in_seconds: 13
Epoch [1/1], Step [12118/13804], Loss: 2.4297, Perplexity: 11.3556, time_taken_in_seconds: 14
Epoch [1/1], Step [12119/13804], Loss: 2.3540, Perplexity: 10.5275, time_taken_in_seconds: 15
Epoch [1/1], Step [12120/13804], Loss: 2.6087, Perplexity: 13.5809, time_taken_in_seconds: 16
Epoch [1/1], Step [12121/13804], Loss: 2.4750, Perplexity: 11.8819, time_taken_in_seconds: 17
Epoch [1/1], Step [12122/13804], Loss: 2.0553, Perplexity: 7.8090, time_taken_in_seconds: 17
Epoch [1/1], Step [12123/13804], Loss: 3.4953, Perplexity: 32.9591, time_taken_in_seconds: 19
Epoch [1/1], Step [12124/13804], Loss: 2.5340, Perplexity: 12.6043, time_taken_in_seconds: 20
Epoch [1/1], Step [12125/13804], Loss: 2.6812, Perplexity: 14.6028, time_taken_in_seconds: 20
Epoch [1/1], Step [12126/13804], Loss: 2.3078, Perplexity: 10.0520, time_taken_in_seconds: 21
Epoch [1/1], Step [12127/13804], Loss: 2.4170, Perplexity: 11.2120, time_taken_in_seconds: 22
Epoch [1/1], Step [12128/13804], Loss: 2.9301, Perplexity: 18.7300, time_taken_in_seconds: 23
Epoch [1/1], Step [12129/13804], Loss: 2.6962, Perplexity: 14.8237, time_taken_in_seconds: 24
Epoch [1/1], Step [12130/13804], Loss: 2.6077, Perplexity: 13.5684, time_taken_in_seconds: 24
Epoch [1/1], Step [12131/13804], Loss: 3.4668, Perplexity: 32.0350, time_taken_in_seconds: 25
Epoch [1/1], Step [12132/13804], Loss: 2.5596, Perplexity: 12.9303, time_taken_in_seconds: 26
Epoch [1/1], Step [12133/13804], Loss: 2.9600, Perplexity: 19.2973, time_taken_in_seconds: 27
Epoch [1/1], Step [12134/13804], Loss: 2.6443, Perplexity: 14.0730, time_taken_in_seconds: 28
Epoch [1/1], Step [12135/13804], Loss: 2.3144, Perplexity: 10.1185, time_taken_in_seconds: 29
Epoch [1/1], Step [12136/13804], Loss: 2.5456, Perplexity: 12.7504, time_taken_in_seconds: 29
Epoch [1/1], Step [12137/13804], Loss: 2.4466, Perplexity: 11.5496, time_taken_in_seconds: 30
Epoch [1/1], Step [12138/13804], Loss: 2.7341, Perplexity: 15.3959, time_taken_in_seconds: 31
Epoch [1/1], Step [12139/13804], Loss: 2.7175, Perplexity: 15.1428, time_taken_in_seconds: 32
Epoch [1/1], Step [12140/13804], Loss: 2.2349, Perplexity: 9.3460, time_taken_in_seconds: 33
Epoch [1/1], Step [12141/13804], Loss: 2.1373, Perplexity: 8.4762, time_taken_in_seconds: 33
Epoch [1/1], Step [12142/13804], Loss: 2.4148, Perplexity: 11.1871, time_taken_in_seconds: 34
Epoch [1/1], Step [12143/13804], Loss: 2.2551, Perplexity: 9.5363, time_taken_in_seconds: 35
Epoch [1/1], Step [12144/13804], Loss: 2.4980, Perplexity: 12.1580, time_taken_in_seconds: 36
Epoch [1/1], Step [12145/13804], Loss: 2.5250, Perplexity: 12.4913, time_taken_in_seconds: 37
Epoch [1/1], Step [12146/13804], Loss: 2.2846, Perplexity: 9.8215, time_taken_in_seconds: 37
Epoch [1/1], Step [12147/13804], Loss: 2.5752, Perplexity: 13.1333, time_taken_in_seconds: 38
Epoch [1/1], Step [12148/13804], Loss: 2.3108, Perplexity: 10.0829, time_taken_in_seconds: 39
Epoch [1/1], Step [12149/13804], Loss: 2.3424, Perplexity: 10.4059, time_taken_in_seconds: 40
Epoch [1/1], Step [12150/13804], Loss: 2.4138, Perplexity: 11.1763, time_taken_in_seconds: 41
Epoch [1/1], Step [12151/13804], Loss: 2.5711, Perplexity: 13.0799, time_taken_in_seconds: 42
Epoch [1/1], Step [12152/13804], Loss: 2.5065, Perplexity: 12.2618, time_taken_in_seconds: 42
Epoch [1/1], Step [12153/13804], Loss: 2.1581, Perplexity: 8.6548, time_taken_in_seconds: 43
Epoch [1/1], Step [12154/13804], Loss: 2.3285, Perplexity: 10.2622, time_taken_in_seconds: 44
Epoch [1/1], Step [12155/13804], Loss: 2.3521, Perplexity: 10.5075, time_taken_in_seconds: 45
Epoch [1/1], Step [12156/13804], Loss: 2.2035, Perplexity: 9.0567, time_taken_in_seconds: 46
Epoch [1/1], Step [12157/13804], Loss: 3.1998, Perplexity: 24.5281, time_taken_in_seconds: 46
Epoch [1/1], Step [12158/13804], Loss: 2.4986, Perplexity: 12.1654, time_taken_in_seconds: 47
Epoch [1/1], Step [12159/13804], Loss: 2.5465, Perplexity: 12.7627, time_taken_in_seconds: 48
Epoch [1/1], Step [12160/13804], Loss: 2.4453, Perplexity: 11.5343, time_taken_in_seconds: 49
Epoch [1/1], Step [12161/13804], Loss: 2.4950, Perplexity: 12.1215, time_taken_in_seconds: 50
Epoch [1/1], Step [12162/13804], Loss: 2.6431, Perplexity: 14.0573, time_taken_in_seconds: 51
Epoch [1/1], Step [12163/13804], Loss: 2.4726, Perplexity: 11.8537, time_taken_in_seconds: 51
Epoch [1/1], Step [12164/13804], Loss: 2.5174, Perplexity: 12.3963, time_taken_in_seconds: 52
Epoch [1/1], Step [12165/13804], Loss: 2.5749, Perplexity: 13.1305, time_taken_in_seconds: 53
Epoch [1/1], Step [12166/13804], Loss: 2.7607, Perplexity: 15.8102, time_taken_in_seconds: 54
Epoch [1/1], Step [12167/13804], Loss: 2.2299, Perplexity: 9.2992, time_taken_in_seconds: 55
Epoch [1/1], Step [12168/13804], Loss: 2.3546, Perplexity: 10.5336, time_taken_in_seconds: 55
Epoch [1/1], Step [12169/13804], Loss: 2.4960, Perplexity: 12.1342, time_taken_in_seconds: 56
Epoch [1/1], Step [12170/13804], Loss: 2.5938, Perplexity: 13.3808, time_taken_in_seconds: 57
Epoch [1/1], Step [12171/13804], Loss: 2.6848, Perplexity: 14.6554, time_taken_in_seconds: 58
Epoch [1/1], Step [12172/13804], Loss: 2.4775, Perplexity: 11.9119, time_taken_in_seconds: 59
Epoch [1/1], Step [12173/13804], Loss: 2.5728, Perplexity: 13.1024, time_taken_in_seconds: 59
Epoch [1/1], Step [12174/13804], Loss: 2.3649, Perplexity: 10.6428, time_taken_in_seconds: 60
Epoch [1/1], Step [12175/13804], Loss: 2.5829, Perplexity: 13.2361, time_taken_in_seconds: 61
Epoch [1/1], Step [12176/13804], Loss: 2.6047, Perplexity: 13.5276, time_taken_in_seconds: 62
Epoch [1/1], Step [12177/13804], Loss: 2.8216, Perplexity: 16.8042, time_taken_in_seconds: 63
Epoch [1/1], Step [12178/13804], Loss: 2.3921, Perplexity: 10.9367, time_taken_in_seconds: 64
Epoch [1/1], Step [12179/13804], Loss: 2.4070, Perplexity: 11.1007, time_taken_in_seconds: 64
Epoch [1/1], Step [12180/13804], Loss: 2.2844, Perplexity: 9.8196, time_taken_in_seconds: 65
Epoch [1/1], Step [12181/13804], Loss: 2.6148, Perplexity: 13.6641, time_taken_in_seconds: 66
Epoch [1/1], Step [12182/13804], Loss: 2.3332, Perplexity: 10.3107, time_taken_in_seconds: 67
Epoch [1/1], Step [12183/13804], Loss: 2.8457, Perplexity: 17.2143, time_taken_in_seconds: 68
Epoch [1/1], Step [12184/13804], Loss: 2.6391, Perplexity: 14.0001, time_taken_in_seconds: 68
Epoch [1/1], Step [12185/13804], Loss: 2.2514, Perplexity: 9.5012, time_taken_in_seconds: 69
Epoch [1/1], Step [12186/13804], Loss: 2.7425, Perplexity: 15.5252, time_taken_in_seconds: 70
Epoch [1/1], Step [12187/13804], Loss: 2.3600, Perplexity: 10.5906, time_taken_in_seconds: 71
Epoch [1/1], Step [12188/13804], Loss: 2.3197, Perplexity: 10.1725, time_taken_in_seconds: 72
Epoch [1/1], Step [12189/13804], Loss: 2.3747, Perplexity: 10.7474, time_taken_in_seconds: 73
Epoch [1/1], Step [12190/13804], Loss: 2.5167, Perplexity: 12.3878, time_taken_in_seconds: 73
Epoch [1/1], Step [12191/13804], Loss: 2.9509, Perplexity: 19.1240, time_taken_in_seconds: 74
Epoch [1/1], Step [12192/13804], Loss: 2.8492, Perplexity: 17.2735, time_taken_in_seconds: 75
Epoch [1/1], Step [12193/13804], Loss: 2.7466, Perplexity: 15.5898, time_taken_in_seconds: 76
Epoch [1/1], Step [12194/13804], Loss: 2.6358, Perplexity: 13.9547, time_taken_in_seconds: 77
Epoch [1/1], Step [12195/13804], Loss: 2.6119, Perplexity: 13.6248, time_taken_in_seconds: 77
Epoch [1/1], Step [12196/13804], Loss: 2.8388, Perplexity: 17.0944, time_taken_in_seconds: 78
Epoch [1/1], Step [12197/13804], Loss: 2.3017, Perplexity: 9.9907, time_taken_in_seconds: 79
Epoch [1/1], Step [12198/13804], Loss: 2.5880, Perplexity: 13.3032, time_taken_in_seconds: 80
Epoch [1/1], Step [12199/13804], Loss: 2.3088, Perplexity: 10.0619, time_taken_in_seconds: 81
Epoch [1/1], Step [12200/13804], Loss: 2.8386, Perplexity: 17.0919, time_taken_in_seconds: 82
Epoch [1/1], Step [12201/13804], Loss: 2.2039, Perplexity: 9.0600, time_taken_in_seconds: 0
Epoch [1/1], Step [12202/13804], Loss: 2.4118, Perplexity: 11.1539, time_taken_in_seconds: 1
Epoch [1/1], Step [12203/13804], Loss: 2.4215, Perplexity: 11.2627, time_taken_in_seconds: 2
Epoch [1/1], Step [12204/13804], Loss: 2.2657, Perplexity: 9.6380, time_taken_in_seconds: 3
Epoch [1/1], Step [12205/13804], Loss: 2.6622, Perplexity: 14.3284, time_taken_in_seconds: 4
Epoch [1/1], Step [12206/13804], Loss: 2.1571, Perplexity: 8.6459, time_taken_in_seconds: 4
Epoch [1/1], Step [12207/13804], Loss: 2.6584, Perplexity: 14.2737, time_taken_in_seconds: 5
Epoch [1/1], Step [12208/13804], Loss: 2.4832, Perplexity: 11.9793, time_taken_in_seconds: 6
Epoch [1/1], Step [12209/13804], Loss: 2.2971, Perplexity: 9.9450, time_taken_in_seconds: 7
Epoch [1/1], Step [12210/13804], Loss: 2.5426, Perplexity: 12.7131, time_taken_in_seconds: 8
Epoch [1/1], Step [12211/13804], Loss: 2.4456, Perplexity: 11.5372, time_taken_in_seconds: 8
Epoch [1/1], Step [12212/13804], Loss: 2.3980, Perplexity: 11.0008, time_taken_in_seconds: 9
Epoch [1/1], Step [12213/13804], Loss: 2.7051, Perplexity: 14.9560, time_taken_in_seconds: 10
Epoch [1/1], Step [12214/13804], Loss: 2.1411, Perplexity: 8.5086, time_taken_in_seconds: 11
Epoch [1/1], Step [12215/13804], Loss: 2.0684, Perplexity: 7.9123, time_taken_in_seconds: 12
Epoch [1/1], Step [12216/13804], Loss: 2.4466, Perplexity: 11.5493, time_taken_in_seconds: 13
Epoch [1/1], Step [12217/13804], Loss: 2.5284, Perplexity: 12.5335, time_taken_in_seconds: 13
Epoch [1/1], Step [12218/13804], Loss: 2.4084, Perplexity: 11.1167, time_taken_in_seconds: 14
Epoch [1/1], Step [12219/13804], Loss: 2.7039, Perplexity: 14.9374, time_taken_in_seconds: 15
Epoch [1/1], Step [12220/13804], Loss: 2.6786, Perplexity: 14.5642, time_taken_in_seconds: 16
Epoch [1/1], Step [12221/13804], Loss: 2.3646, Perplexity: 10.6393, time_taken_in_seconds: 17
Epoch [1/1], Step [12222/13804], Loss: 2.5595, Perplexity: 12.9294, time_taken_in_seconds: 17
Epoch [1/1], Step [12223/13804], Loss: 2.3535, Perplexity: 10.5218, time_taken_in_seconds: 18
Epoch [1/1], Step [12224/13804], Loss: 2.5894, Perplexity: 13.3218, time_taken_in_seconds: 19
Epoch [1/1], Step [12225/13804], Loss: 2.5547, Perplexity: 12.8680, time_taken_in_seconds: 20
Epoch [1/1], Step [12226/13804], Loss: 2.7320, Perplexity: 15.3635, time_taken_in_seconds: 21
Epoch [1/1], Step [12227/13804], Loss: 2.2510, Perplexity: 9.4976, time_taken_in_seconds: 21
Epoch [1/1], Step [12228/13804], Loss: 2.5410, Perplexity: 12.6929, time_taken_in_seconds: 22
Epoch [1/1], Step [12229/13804], Loss: 2.8042, Perplexity: 16.5131, time_taken_in_seconds: 23
Epoch [1/1], Step [12230/13804], Loss: 2.2143, Perplexity: 9.1547, time_taken_in_seconds: 24
Epoch [1/1], Step [12231/13804], Loss: 2.3403, Perplexity: 10.3839, time_taken_in_seconds: 25
Epoch [1/1], Step [12232/13804], Loss: 2.7450, Perplexity: 15.5653, time_taken_in_seconds: 26
Epoch [1/1], Step [12233/13804], Loss: 2.0930, Perplexity: 8.1091, time_taken_in_seconds: 26
Epoch [1/1], Step [12234/13804], Loss: 2.5341, Perplexity: 12.6051, time_taken_in_seconds: 27
Epoch [1/1], Step [12235/13804], Loss: 2.4277, Perplexity: 11.3326, time_taken_in_seconds: 28
Epoch [1/1], Step [12236/13804], Loss: 2.8276, Perplexity: 16.9049, time_taken_in_seconds: 29
Epoch [1/1], Step [12237/13804], Loss: 2.4385, Perplexity: 11.4555, time_taken_in_seconds: 30
Epoch [1/1], Step [12238/13804], Loss: 2.2543, Perplexity: 9.5286, time_taken_in_seconds: 30
Epoch [1/1], Step [12239/13804], Loss: 3.2386, Perplexity: 25.4975, time_taken_in_seconds: 31
Epoch [1/1], Step [12240/13804], Loss: 2.9255, Perplexity: 18.6443, time_taken_in_seconds: 32
Epoch [1/1], Step [12241/13804], Loss: 2.9443, Perplexity: 18.9970, time_taken_in_seconds: 33
Epoch [1/1], Step [12242/13804], Loss: 2.7262, Perplexity: 15.2743, time_taken_in_seconds: 34
Epoch [1/1], Step [12243/13804], Loss: 2.7741, Perplexity: 16.0234, time_taken_in_seconds: 34
Epoch [1/1], Step [12244/13804], Loss: 2.4255, Perplexity: 11.3075, time_taken_in_seconds: 35
Epoch [1/1], Step [12245/13804], Loss: 2.2482, Perplexity: 9.4706, time_taken_in_seconds: 36
Epoch [1/1], Step [12246/13804], Loss: 2.6666, Perplexity: 14.3902, time_taken_in_seconds: 37
Epoch [1/1], Step [12247/13804], Loss: 2.5922, Perplexity: 13.3597, time_taken_in_seconds: 38
Epoch [1/1], Step [12248/13804], Loss: 2.8563, Perplexity: 17.3978, time_taken_in_seconds: 39
Epoch [1/1], Step [12249/13804], Loss: 2.7612, Perplexity: 15.8189, time_taken_in_seconds: 39
Epoch [1/1], Step [12250/13804], Loss: 2.4146, Perplexity: 11.1852, time_taken_in_seconds: 40
Epoch [1/1], Step [12251/13804], Loss: 2.6960, Perplexity: 14.8205, time_taken_in_seconds: 41
Epoch [1/1], Step [12252/13804], Loss: 2.4471, Perplexity: 11.5550, time_taken_in_seconds: 42
Epoch [1/1], Step [12253/13804], Loss: 2.4308, Perplexity: 11.3681, time_taken_in_seconds: 43
Epoch [1/1], Step [12254/13804], Loss: 2.3579, Perplexity: 10.5691, time_taken_in_seconds: 43
Epoch [1/1], Step [12255/13804], Loss: 2.2103, Perplexity: 9.1180, time_taken_in_seconds: 44
Epoch [1/1], Step [12256/13804], Loss: 2.3703, Perplexity: 10.7008, time_taken_in_seconds: 45
Epoch [1/1], Step [12257/13804], Loss: 2.3482, Perplexity: 10.4664, time_taken_in_seconds: 46
Epoch [1/1], Step [12258/13804], Loss: 2.6713, Perplexity: 14.4583, time_taken_in_seconds: 47
Epoch [1/1], Step [12259/13804], Loss: 2.7412, Perplexity: 15.5051, time_taken_in_seconds: 47
Epoch [1/1], Step [12260/13804], Loss: 2.4565, Perplexity: 11.6635, time_taken_in_seconds: 48
Epoch [1/1], Step [12261/13804], Loss: 2.1608, Perplexity: 8.6780, time_taken_in_seconds: 49
Epoch [1/1], Step [12262/13804], Loss: 2.7511, Perplexity: 15.6600, time_taken_in_seconds: 50
Epoch [1/1], Step [12263/13804], Loss: 2.8453, Perplexity: 17.2062, time_taken_in_seconds: 51
Epoch [1/1], Step [12264/13804], Loss: 2.2569, Perplexity: 9.5532, time_taken_in_seconds: 51
Epoch [1/1], Step [12265/13804], Loss: 2.4725, Perplexity: 11.8520, time_taken_in_seconds: 52
Epoch [1/1], Step [12266/13804], Loss: 2.2327, Perplexity: 9.3246, time_taken_in_seconds: 53
Epoch [1/1], Step [12267/13804], Loss: 2.4483, Perplexity: 11.5685, time_taken_in_seconds: 54
Epoch [1/1], Step [12268/13804], Loss: 2.5140, Perplexity: 12.3546, time_taken_in_seconds: 55
Epoch [1/1], Step [12269/13804], Loss: 2.4728, Perplexity: 11.8554, time_taken_in_seconds: 56
Epoch [1/1], Step [12270/13804], Loss: 2.5864, Perplexity: 13.2823, time_taken_in_seconds: 57
Epoch [1/1], Step [12271/13804], Loss: 2.4568, Perplexity: 11.6677, time_taken_in_seconds: 57
Epoch [1/1], Step [12272/13804], Loss: 2.5608, Perplexity: 12.9461, time_taken_in_seconds: 58
Epoch [1/1], Step [12273/13804], Loss: 2.5853, Perplexity: 13.2675, time_taken_in_seconds: 59
Epoch [1/1], Step [12274/13804], Loss: 2.5331, Perplexity: 12.5922, time_taken_in_seconds: 60
Epoch [1/1], Step [12275/13804], Loss: 2.4354, Perplexity: 11.4203, time_taken_in_seconds: 61
Epoch [1/1], Step [12276/13804], Loss: 2.7631, Perplexity: 15.8486, time_taken_in_seconds: 61
Epoch [1/1], Step [12277/13804], Loss: 2.2773, Perplexity: 9.7501, time_taken_in_seconds: 62
Epoch [1/1], Step [12278/13804], Loss: 2.5801, Perplexity: 13.1989, time_taken_in_seconds: 63
Epoch [1/1], Step [12279/13804], Loss: 2.5629, Perplexity: 12.9734, time_taken_in_seconds: 64
Epoch [1/1], Step [12280/13804], Loss: 2.7209, Perplexity: 15.1942, time_taken_in_seconds: 65
Epoch [1/1], Step [12281/13804], Loss: 2.8095, Perplexity: 16.6011, time_taken_in_seconds: 65
Epoch [1/1], Step [12282/13804], Loss: 2.8376, Perplexity: 17.0755, time_taken_in_seconds: 66
Epoch [1/1], Step [12283/13804], Loss: 2.3903, Perplexity: 10.9173, time_taken_in_seconds: 67
Epoch [1/1], Step [12284/13804], Loss: 2.4863, Perplexity: 12.0170, time_taken_in_seconds: 68
Epoch [1/1], Step [12285/13804], Loss: 2.5367, Perplexity: 12.6376, time_taken_in_seconds: 69
Epoch [1/1], Step [12286/13804], Loss: 2.9839, Perplexity: 19.7638, time_taken_in_seconds: 70
Epoch [1/1], Step [12287/13804], Loss: 2.6436, Perplexity: 14.0635, time_taken_in_seconds: 70
Epoch [1/1], Step [12288/13804], Loss: 2.6206, Perplexity: 13.7435, time_taken_in_seconds: 71
Epoch [1/1], Step [12289/13804], Loss: 2.8984, Perplexity: 18.1443, time_taken_in_seconds: 72
Epoch [1/1], Step [12290/13804], Loss: 2.4823, Perplexity: 11.9688, time_taken_in_seconds: 73
Epoch [1/1], Step [12291/13804], Loss: 2.4080, Perplexity: 11.1112, time_taken_in_seconds: 74
Epoch [1/1], Step [12292/13804], Loss: 2.5180, Perplexity: 12.4040, time_taken_in_seconds: 74
Epoch [1/1], Step [12293/13804], Loss: 2.4646, Perplexity: 11.7583, time_taken_in_seconds: 75
Epoch [1/1], Step [12294/13804], Loss: 2.7708, Perplexity: 15.9720, time_taken_in_seconds: 76
Epoch [1/1], Step [12295/13804], Loss: 2.5928, Perplexity: 13.3666, time_taken_in_seconds: 77
Epoch [1/1], Step [12296/13804], Loss: 2.7008, Perplexity: 14.8916, time_taken_in_seconds: 78
Epoch [1/1], Step [12297/13804], Loss: 2.7399, Perplexity: 15.4854, time_taken_in_seconds: 79
Epoch [1/1], Step [12298/13804], Loss: 2.5311, Perplexity: 12.5672, time_taken_in_seconds: 79
Epoch [1/1], Step [12299/13804], Loss: 2.7144, Perplexity: 15.0952, time_taken_in_seconds: 80
Epoch [1/1], Step [12300/13804], Loss: 2.5046, Perplexity: 12.2388, time_taken_in_seconds: 81
Epoch [1/1], Step [12301/13804], Loss: 2.7901, Perplexity: 16.2827, time_taken_in_seconds: 0
Epoch [1/1], Step [12302/13804], Loss: 2.3923, Perplexity: 10.9384, time_taken_in_seconds: 1
Epoch [1/1], Step [12303/13804], Loss: 2.6102, Perplexity: 13.6022, time_taken_in_seconds: 2
Epoch [1/1], Step [12304/13804], Loss: 2.6271, Perplexity: 13.8337, time_taken_in_seconds: 3
Epoch [1/1], Step [12305/13804], Loss: 2.5381, Perplexity: 12.6554, time_taken_in_seconds: 4
Epoch [1/1], Step [12306/13804], Loss: 2.5109, Perplexity: 12.3157, time_taken_in_seconds: 4
Epoch [1/1], Step [12307/13804], Loss: 2.5473, Perplexity: 12.7731, time_taken_in_seconds: 5
Epoch [1/1], Step [12308/13804], Loss: 2.5532, Perplexity: 12.8481, time_taken_in_seconds: 6
Epoch [1/1], Step [12309/13804], Loss: 2.4516, Perplexity: 11.6071, time_taken_in_seconds: 7
Epoch [1/1], Step [12310/13804], Loss: 2.0752, Perplexity: 7.9665, time_taken_in_seconds: 8
Epoch [1/1], Step [12311/13804], Loss: 2.6325, Perplexity: 13.9088, time_taken_in_seconds: 8
Epoch [1/1], Step [12312/13804], Loss: 2.9075, Perplexity: 18.3101, time_taken_in_seconds: 9
Epoch [1/1], Step [12313/13804], Loss: 2.5653, Perplexity: 13.0041, time_taken_in_seconds: 10
Epoch [1/1], Step [12314/13804], Loss: 2.0503, Perplexity: 7.7700, time_taken_in_seconds: 11
Epoch [1/1], Step [12315/13804], Loss: 2.5440, Perplexity: 12.7303, time_taken_in_seconds: 12
Epoch [1/1], Step [12316/13804], Loss: 2.4675, Perplexity: 11.7930, time_taken_in_seconds: 13
Epoch [1/1], Step [12317/13804], Loss: 3.7791, Perplexity: 43.7785, time_taken_in_seconds: 13
Epoch [1/1], Step [12318/13804], Loss: 2.6801, Perplexity: 14.5866, time_taken_in_seconds: 14
Epoch [1/1], Step [12319/13804], Loss: 2.6217, Perplexity: 13.7584, time_taken_in_seconds: 15
Epoch [1/1], Step [12320/13804], Loss: 2.7618, Perplexity: 15.8280, time_taken_in_seconds: 16
Epoch [1/1], Step [12321/13804], Loss: 2.2207, Perplexity: 9.2142, time_taken_in_seconds: 17
Epoch [1/1], Step [12322/13804], Loss: 3.0133, Perplexity: 20.3551, time_taken_in_seconds: 17
Epoch [1/1], Step [12323/13804], Loss: 2.5261, Perplexity: 12.5043, time_taken_in_seconds: 18
Epoch [1/1], Step [12324/13804], Loss: 2.3799, Perplexity: 10.8042, time_taken_in_seconds: 19
Epoch [1/1], Step [12325/13804], Loss: 2.4614, Perplexity: 11.7217, time_taken_in_seconds: 20
Epoch [1/1], Step [12326/13804], Loss: 2.7446, Perplexity: 15.5581, time_taken_in_seconds: 21
Epoch [1/1], Step [12327/13804], Loss: 2.6530, Perplexity: 14.1971, time_taken_in_seconds: 21
Epoch [1/1], Step [12328/13804], Loss: 2.7929, Perplexity: 16.3280, time_taken_in_seconds: 22
Epoch [1/1], Step [12329/13804], Loss: 2.7646, Perplexity: 15.8719, time_taken_in_seconds: 23
Epoch [1/1], Step [12330/13804], Loss: 2.5801, Perplexity: 13.1990, time_taken_in_seconds: 24
Epoch [1/1], Step [12331/13804], Loss: 2.6063, Perplexity: 13.5486, time_taken_in_seconds: 25
Epoch [1/1], Step [12332/13804], Loss: 2.4917, Perplexity: 12.0821, time_taken_in_seconds: 25
Epoch [1/1], Step [12333/13804], Loss: 2.6463, Perplexity: 14.1020, time_taken_in_seconds: 26
Epoch [1/1], Step [12334/13804], Loss: 2.6042, Perplexity: 13.5202, time_taken_in_seconds: 27
Epoch [1/1], Step [12335/13804], Loss: 2.3571, Perplexity: 10.5603, time_taken_in_seconds: 28
Epoch [1/1], Step [12336/13804], Loss: 2.6886, Perplexity: 14.7111, time_taken_in_seconds: 29
Epoch [1/1], Step [12337/13804], Loss: 2.5640, Perplexity: 12.9870, time_taken_in_seconds: 29
Epoch [1/1], Step [12338/13804], Loss: 3.1720, Perplexity: 23.8559, time_taken_in_seconds: 30
Epoch [1/1], Step [12339/13804], Loss: 2.4372, Perplexity: 11.4406, time_taken_in_seconds: 31
Epoch [1/1], Step [12340/13804], Loss: 2.4813, Perplexity: 11.9564, time_taken_in_seconds: 32
Epoch [1/1], Step [12341/13804], Loss: 2.7262, Perplexity: 15.2754, time_taken_in_seconds: 33
Epoch [1/1], Step [12342/13804], Loss: 2.8693, Perplexity: 17.6251, time_taken_in_seconds: 34
Epoch [1/1], Step [12343/13804], Loss: 2.5511, Perplexity: 12.8214, time_taken_in_seconds: 34
Epoch [1/1], Step [12344/13804], Loss: 2.7827, Perplexity: 16.1625, time_taken_in_seconds: 35
Epoch [1/1], Step [12345/13804], Loss: 2.3297, Perplexity: 10.2752, time_taken_in_seconds: 36
Epoch [1/1], Step [12346/13804], Loss: 2.6803, Perplexity: 14.5898, time_taken_in_seconds: 37
Epoch [1/1], Step [12347/13804], Loss: 2.6197, Perplexity: 13.7320, time_taken_in_seconds: 38
Epoch [1/1], Step [12348/13804], Loss: 2.2622, Perplexity: 9.6040, time_taken_in_seconds: 39
Epoch [1/1], Step [12349/13804], Loss: 2.7528, Perplexity: 15.6861, time_taken_in_seconds: 39
Epoch [1/1], Step [12350/13804], Loss: 2.4087, Perplexity: 11.1199, time_taken_in_seconds: 40
Epoch [1/1], Step [12351/13804], Loss: 2.6909, Perplexity: 14.7451, time_taken_in_seconds: 41
Epoch [1/1], Step [12352/13804], Loss: 2.7334, Perplexity: 15.3848, time_taken_in_seconds: 42
Epoch [1/1], Step [12353/13804], Loss: 2.5301, Perplexity: 12.5549, time_taken_in_seconds: 43
Epoch [1/1], Step [12354/13804], Loss: 2.3506, Perplexity: 10.4924, time_taken_in_seconds: 43
Epoch [1/1], Step [12355/13804], Loss: 2.4420, Perplexity: 11.4955, time_taken_in_seconds: 44
Epoch [1/1], Step [12356/13804], Loss: 2.4722, Perplexity: 11.8479, time_taken_in_seconds: 45
Epoch [1/1], Step [12357/13804], Loss: 2.6481, Perplexity: 14.1268, time_taken_in_seconds: 46
Epoch [1/1], Step [12358/13804], Loss: 2.8084, Perplexity: 16.5831, time_taken_in_seconds: 47
Epoch [1/1], Step [12359/13804], Loss: 2.3717, Perplexity: 10.7156, time_taken_in_seconds: 48
Epoch [1/1], Step [12360/13804], Loss: 2.3393, Perplexity: 10.3740, time_taken_in_seconds: 48
Epoch [1/1], Step [12361/13804], Loss: 2.4963, Perplexity: 12.1374, time_taken_in_seconds: 49
Epoch [1/1], Step [12362/13804], Loss: 2.3566, Perplexity: 10.5549, time_taken_in_seconds: 50
Epoch [1/1], Step [12363/13804], Loss: 2.4639, Perplexity: 11.7507, time_taken_in_seconds: 51
Epoch [1/1], Step [12364/13804], Loss: 2.8887, Perplexity: 17.9704, time_taken_in_seconds: 52
Epoch [1/1], Step [12365/13804], Loss: 2.4307, Perplexity: 11.3671, time_taken_in_seconds: 52
Epoch [1/1], Step [12366/13804], Loss: 2.6951, Perplexity: 14.8073, time_taken_in_seconds: 53
Epoch [1/1], Step [12367/13804], Loss: 2.3383, Perplexity: 10.3633, time_taken_in_seconds: 54
Epoch [1/1], Step [12368/13804], Loss: 2.4355, Perplexity: 11.4215, time_taken_in_seconds: 55
Epoch [1/1], Step [12369/13804], Loss: 2.3912, Perplexity: 10.9264, time_taken_in_seconds: 56
Epoch [1/1], Step [12370/13804], Loss: 2.3295, Perplexity: 10.2728, time_taken_in_seconds: 56
Epoch [1/1], Step [12371/13804], Loss: 2.7006, Perplexity: 14.8893, time_taken_in_seconds: 57
Epoch [1/1], Step [12372/13804], Loss: 2.6388, Perplexity: 13.9960, time_taken_in_seconds: 58
Epoch [1/1], Step [12373/13804], Loss: 2.6772, Perplexity: 14.5440, time_taken_in_seconds: 59
Epoch [1/1], Step [12374/13804], Loss: 2.4977, Perplexity: 12.1542, time_taken_in_seconds: 60
Epoch [1/1], Step [12375/13804], Loss: 3.3812, Perplexity: 29.4074, time_taken_in_seconds: 60
Epoch [1/1], Step [12376/13804], Loss: 3.0009, Perplexity: 20.1041, time_taken_in_seconds: 61
Epoch [1/1], Step [12377/13804], Loss: 2.5009, Perplexity: 12.1931, time_taken_in_seconds: 62
Epoch [1/1], Step [12378/13804], Loss: 2.5802, Perplexity: 13.1995, time_taken_in_seconds: 63
Epoch [1/1], Step [12379/13804], Loss: 2.6779, Perplexity: 14.5538, time_taken_in_seconds: 64
Epoch [1/1], Step [12380/13804], Loss: 2.8680, Perplexity: 17.6013, time_taken_in_seconds: 65
Epoch [1/1], Step [12381/13804], Loss: 2.5158, Perplexity: 12.3761, time_taken_in_seconds: 65
Epoch [1/1], Step [12382/13804], Loss: 2.3098, Perplexity: 10.0723, time_taken_in_seconds: 66
Epoch [1/1], Step [12383/13804], Loss: 2.2813, Perplexity: 9.7896, time_taken_in_seconds: 67
Epoch [1/1], Step [12384/13804], Loss: 2.7717, Perplexity: 15.9854, time_taken_in_seconds: 68
Epoch [1/1], Step [12385/13804], Loss: 2.8722, Perplexity: 17.6750, time_taken_in_seconds: 69
Epoch [1/1], Step [12386/13804], Loss: 2.3563, Perplexity: 10.5515, time_taken_in_seconds: 69
Epoch [1/1], Step [12387/13804], Loss: 2.6374, Perplexity: 13.9767, time_taken_in_seconds: 70
Epoch [1/1], Step [12388/13804], Loss: 2.4892, Perplexity: 12.0515, time_taken_in_seconds: 71
Epoch [1/1], Step [12389/13804], Loss: 2.4007, Perplexity: 11.0313, time_taken_in_seconds: 72
Epoch [1/1], Step [12390/13804], Loss: 2.2950, Perplexity: 9.9245, time_taken_in_seconds: 73
Epoch [1/1], Step [12391/13804], Loss: 2.3256, Perplexity: 10.2324, time_taken_in_seconds: 73
Epoch [1/1], Step [12392/13804], Loss: 3.4608, Perplexity: 31.8438, time_taken_in_seconds: 74
Epoch [1/1], Step [12393/13804], Loss: 2.5955, Perplexity: 13.4028, time_taken_in_seconds: 75
Epoch [1/1], Step [12394/13804], Loss: 2.6095, Perplexity: 13.5922, time_taken_in_seconds: 76
Epoch [1/1], Step [12395/13804], Loss: 2.6192, Perplexity: 13.7245, time_taken_in_seconds: 77
Epoch [1/1], Step [12396/13804], Loss: 2.4688, Perplexity: 11.8085, time_taken_in_seconds: 77
Epoch [1/1], Step [12397/13804], Loss: 2.3770, Perplexity: 10.7725, time_taken_in_seconds: 78
Epoch [1/1], Step [12398/13804], Loss: 2.5936, Perplexity: 13.3782, time_taken_in_seconds: 79
Epoch [1/1], Step [12399/13804], Loss: 2.4572, Perplexity: 11.6717, time_taken_in_seconds: 80
Epoch [1/1], Step [12400/13804], Loss: 2.5991, Perplexity: 13.4514, time_taken_in_seconds: 81
Epoch [1/1], Step [12401/13804], Loss: 2.7132, Perplexity: 15.0771, time_taken_in_seconds: 0
Epoch [1/1], Step [12402/13804], Loss: 2.4623, Perplexity: 11.7318, time_taken_in_seconds: 1
Epoch [1/1], Step [12403/13804], Loss: 2.5569, Perplexity: 12.8959, time_taken_in_seconds: 2
Epoch [1/1], Step [12404/13804], Loss: 2.5106, Perplexity: 12.3122, time_taken_in_seconds: 3
Epoch [1/1], Step [12405/13804], Loss: 2.1752, Perplexity: 8.8035, time_taken_in_seconds: 4
Epoch [1/1], Step [12406/13804], Loss: 2.7520, Perplexity: 15.6743, time_taken_in_seconds: 4
Epoch [1/1], Step [12407/13804], Loss: 2.9071, Perplexity: 18.3039, time_taken_in_seconds: 5
Epoch [1/1], Step [12408/13804], Loss: 2.4820, Perplexity: 11.9649, time_taken_in_seconds: 6
Epoch [1/1], Step [12409/13804], Loss: 2.5940, Perplexity: 13.3827, time_taken_in_seconds: 7
Epoch [1/1], Step [12410/13804], Loss: 2.5914, Perplexity: 13.3479, time_taken_in_seconds: 8
Epoch [1/1], Step [12411/13804], Loss: 2.2540, Perplexity: 9.5259, time_taken_in_seconds: 8
Epoch [1/1], Step [12412/13804], Loss: 2.4200, Perplexity: 11.2458, time_taken_in_seconds: 9
Epoch [1/1], Step [12413/13804], Loss: 2.5928, Perplexity: 13.3672, time_taken_in_seconds: 10
Epoch [1/1], Step [12414/13804], Loss: 2.7395, Perplexity: 15.4785, time_taken_in_seconds: 11
Epoch [1/1], Step [12415/13804], Loss: 2.7732, Perplexity: 16.0096, time_taken_in_seconds: 12
Epoch [1/1], Step [12416/13804], Loss: 2.5794, Perplexity: 13.1896, time_taken_in_seconds: 12
Epoch [1/1], Step [12417/13804], Loss: 2.1876, Perplexity: 8.9137, time_taken_in_seconds: 13
Epoch [1/1], Step [12418/13804], Loss: 2.6494, Perplexity: 14.1452, time_taken_in_seconds: 14
Epoch [1/1], Step [12419/13804], Loss: 2.3566, Perplexity: 10.5553, time_taken_in_seconds: 15
Epoch [1/1], Step [12420/13804], Loss: 2.4138, Perplexity: 11.1761, time_taken_in_seconds: 16
Epoch [1/1], Step [12421/13804], Loss: 2.6328, Perplexity: 13.9132, time_taken_in_seconds: 17
Epoch [1/1], Step [12422/13804], Loss: 2.3321, Perplexity: 10.2998, time_taken_in_seconds: 18
Epoch [1/1], Step [12423/13804], Loss: 2.6477, Perplexity: 14.1221, time_taken_in_seconds: 18
Epoch [1/1], Step [12424/13804], Loss: 2.7293, Perplexity: 15.3215, time_taken_in_seconds: 19
Epoch [1/1], Step [12425/13804], Loss: 2.3984, Perplexity: 11.0061, time_taken_in_seconds: 20
Epoch [1/1], Step [12426/13804], Loss: 2.4019, Perplexity: 11.0437, time_taken_in_seconds: 21
Epoch [1/1], Step [12427/13804], Loss: 2.6089, Perplexity: 13.5844, time_taken_in_seconds: 22
Epoch [1/1], Step [12428/13804], Loss: 2.2495, Perplexity: 9.4826, time_taken_in_seconds: 22
Epoch [1/1], Step [12429/13804], Loss: 2.5037, Perplexity: 12.2281, time_taken_in_seconds: 23
Epoch [1/1], Step [12430/13804], Loss: 2.9386, Perplexity: 18.8887, time_taken_in_seconds: 24
Epoch [1/1], Step [12431/13804], Loss: 2.5894, Perplexity: 13.3217, time_taken_in_seconds: 25
Epoch [1/1], Step [12432/13804], Loss: 2.4815, Perplexity: 11.9591, time_taken_in_seconds: 26
Epoch [1/1], Step [12433/13804], Loss: 2.3977, Perplexity: 10.9974, time_taken_in_seconds: 27
Epoch [1/1], Step [12434/13804], Loss: 2.3070, Perplexity: 10.0445, time_taken_in_seconds: 27
Epoch [1/1], Step [12435/13804], Loss: 2.4058, Perplexity: 11.0876, time_taken_in_seconds: 28
Epoch [1/1], Step [12436/13804], Loss: 2.5330, Perplexity: 12.5908, time_taken_in_seconds: 29
Epoch [1/1], Step [12437/13804], Loss: 2.8582, Perplexity: 17.4295, time_taken_in_seconds: 30
Epoch [1/1], Step [12438/13804], Loss: 2.7239, Perplexity: 15.2391, time_taken_in_seconds: 31
Epoch [1/1], Step [12439/13804], Loss: 2.3394, Perplexity: 10.3746, time_taken_in_seconds: 31
Epoch [1/1], Step [12440/13804], Loss: 2.6004, Perplexity: 13.4686, time_taken_in_seconds: 32
Epoch [1/1], Step [12441/13804], Loss: 2.0961, Perplexity: 8.1341, time_taken_in_seconds: 33
Epoch [1/1], Step [12442/13804], Loss: 2.3350, Perplexity: 10.3292, time_taken_in_seconds: 34
Epoch [1/1], Step [12443/13804], Loss: 2.6999, Perplexity: 14.8788, time_taken_in_seconds: 35
Epoch [1/1], Step [12444/13804], Loss: 2.7562, Perplexity: 15.7393, time_taken_in_seconds: 36
Epoch [1/1], Step [12445/13804], Loss: 2.5731, Perplexity: 13.1062, time_taken_in_seconds: 36
Epoch [1/1], Step [12446/13804], Loss: 2.5982, Perplexity: 13.4398, time_taken_in_seconds: 37
Epoch [1/1], Step [12447/13804], Loss: 2.3837, Perplexity: 10.8445, time_taken_in_seconds: 38
Epoch [1/1], Step [12448/13804], Loss: 3.2058, Perplexity: 24.6760, time_taken_in_seconds: 39
Epoch [1/1], Step [12449/13804], Loss: 2.3027, Perplexity: 10.0012, time_taken_in_seconds: 40
Epoch [1/1], Step [12450/13804], Loss: 2.4951, Perplexity: 12.1234, time_taken_in_seconds: 40
Epoch [1/1], Step [12451/13804], Loss: 2.5639, Perplexity: 12.9861, time_taken_in_seconds: 41
Epoch [1/1], Step [12452/13804], Loss: 2.1945, Perplexity: 8.9757, time_taken_in_seconds: 42
Epoch [1/1], Step [12453/13804], Loss: 2.5477, Perplexity: 12.7775, time_taken_in_seconds: 43
Epoch [1/1], Step [12454/13804], Loss: 2.6663, Perplexity: 14.3870, time_taken_in_seconds: 44
Epoch [1/1], Step [12455/13804], Loss: 2.4323, Perplexity: 11.3855, time_taken_in_seconds: 44
Epoch [1/1], Step [12456/13804], Loss: 2.6231, Perplexity: 13.7780, time_taken_in_seconds: 45
Epoch [1/1], Step [12457/13804], Loss: 2.3093, Perplexity: 10.0678, time_taken_in_seconds: 46
Epoch [1/1], Step [12458/13804], Loss: 2.6199, Perplexity: 13.7342, time_taken_in_seconds: 47
Epoch [1/1], Step [12459/13804], Loss: 2.3662, Perplexity: 10.6568, time_taken_in_seconds: 48
Epoch [1/1], Step [12460/13804], Loss: 2.5964, Perplexity: 13.4157, time_taken_in_seconds: 49
Epoch [1/1], Step [12461/13804], Loss: 2.2587, Perplexity: 9.5702, time_taken_in_seconds: 49
Epoch [1/1], Step [12462/13804], Loss: 2.4574, Perplexity: 11.6743, time_taken_in_seconds: 50
Epoch [1/1], Step [12463/13804], Loss: 2.5567, Perplexity: 12.8927, time_taken_in_seconds: 51
Epoch [1/1], Step [12464/13804], Loss: 2.8670, Perplexity: 17.5846, time_taken_in_seconds: 52
Epoch [1/1], Step [12465/13804], Loss: 2.2960, Perplexity: 9.9339, time_taken_in_seconds: 53
Epoch [1/1], Step [12466/13804], Loss: 2.3464, Perplexity: 10.4477, time_taken_in_seconds: 53
Epoch [1/1], Step [12467/13804], Loss: 3.2771, Perplexity: 26.4996, time_taken_in_seconds: 54
Epoch [1/1], Step [12468/13804], Loss: 2.2290, Perplexity: 9.2910, time_taken_in_seconds: 55
Epoch [1/1], Step [12469/13804], Loss: 2.3303, Perplexity: 10.2808, time_taken_in_seconds: 56
Epoch [1/1], Step [12470/13804], Loss: 2.6139, Perplexity: 13.6515, time_taken_in_seconds: 57
Epoch [1/1], Step [12471/13804], Loss: 2.4166, Perplexity: 11.2079, time_taken_in_seconds: 58
Epoch [1/1], Step [12472/13804], Loss: 2.6664, Perplexity: 14.3888, time_taken_in_seconds: 58
Epoch [1/1], Step [12473/13804], Loss: 2.4494, Perplexity: 11.5816, time_taken_in_seconds: 59
Epoch [1/1], Step [12474/13804], Loss: 2.3784, Perplexity: 10.7874, time_taken_in_seconds: 60
Epoch [1/1], Step [12475/13804], Loss: 2.7011, Perplexity: 14.8963, time_taken_in_seconds: 61
Epoch [1/1], Step [12476/13804], Loss: 2.7032, Perplexity: 14.9281, time_taken_in_seconds: 62
Epoch [1/1], Step [12477/13804], Loss: 2.6867, Perplexity: 14.6838, time_taken_in_seconds: 62
Epoch [1/1], Step [12478/13804], Loss: 2.3004, Perplexity: 9.9782, time_taken_in_seconds: 63
Epoch [1/1], Step [12479/13804], Loss: 2.4114, Perplexity: 11.1490, time_taken_in_seconds: 64
Epoch [1/1], Step [12480/13804], Loss: 2.4017, Perplexity: 11.0418, time_taken_in_seconds: 65
Epoch [1/1], Step [12481/13804], Loss: 2.6003, Perplexity: 13.4676, time_taken_in_seconds: 66
Epoch [1/1], Step [12482/13804], Loss: 2.5030, Perplexity: 12.2189, time_taken_in_seconds: 66
Epoch [1/1], Step [12483/13804], Loss: 2.3031, Perplexity: 10.0056, time_taken_in_seconds: 67
Epoch [1/1], Step [12484/13804], Loss: 2.7726, Perplexity: 16.0004, time_taken_in_seconds: 68
Epoch [1/1], Step [12485/13804], Loss: 2.0806, Perplexity: 8.0096, time_taken_in_seconds: 69
Epoch [1/1], Step [12486/13804], Loss: 2.5683, Perplexity: 13.0435, time_taken_in_seconds: 70
Epoch [1/1], Step [12487/13804], Loss: 2.5730, Perplexity: 13.1052, time_taken_in_seconds: 71
Epoch [1/1], Step [12488/13804], Loss: 2.3993, Perplexity: 11.0154, time_taken_in_seconds: 71
Epoch [1/1], Step [12489/13804], Loss: 2.5045, Perplexity: 12.2375, time_taken_in_seconds: 72
Epoch [1/1], Step [12490/13804], Loss: 2.7570, Perplexity: 15.7526, time_taken_in_seconds: 73
Epoch [1/1], Step [12491/13804], Loss: 2.4420, Perplexity: 11.4955, time_taken_in_seconds: 74
Epoch [1/1], Step [12492/13804], Loss: 2.4230, Perplexity: 11.2796, time_taken_in_seconds: 75
Epoch [1/1], Step [12493/13804], Loss: 2.6475, Perplexity: 14.1180, time_taken_in_seconds: 76
Epoch [1/1], Step [12494/13804], Loss: 2.5521, Perplexity: 12.8339, time_taken_in_seconds: 76
Epoch [1/1], Step [12495/13804], Loss: 2.4424, Perplexity: 11.5008, time_taken_in_seconds: 77
Epoch [1/1], Step [12496/13804], Loss: 2.4196, Perplexity: 11.2410, time_taken_in_seconds: 78
Epoch [1/1], Step [12497/13804], Loss: 2.4267, Perplexity: 11.3212, time_taken_in_seconds: 79
Epoch [1/1], Step [12498/13804], Loss: 2.4872, Perplexity: 12.0274, time_taken_in_seconds: 80
Epoch [1/1], Step [12499/13804], Loss: 2.3194, Perplexity: 10.1699, time_taken_in_seconds: 80
Epoch [1/1], Step [12500/13804], Loss: 2.7780, Perplexity: 16.0871, time_taken_in_seconds: 81
Epoch [1/1], Step [12501/13804], Loss: 2.1434, Perplexity: 8.5280, time_taken_in_seconds: 0
Epoch [1/1], Step [12502/13804], Loss: 2.3105, Perplexity: 10.0793, time_taken_in_seconds: 1
Epoch [1/1], Step [12503/13804], Loss: 2.5807, Perplexity: 13.2065, time_taken_in_seconds: 2
Epoch [1/1], Step [12504/13804], Loss: 3.0511, Perplexity: 21.1391, time_taken_in_seconds: 3
Epoch [1/1], Step [12505/13804], Loss: 2.6154, Perplexity: 13.6727, time_taken_in_seconds: 4
Epoch [1/1], Step [12506/13804], Loss: 2.8614, Perplexity: 17.4862, time_taken_in_seconds: 4
Epoch [1/1], Step [12507/13804], Loss: 2.5617, Perplexity: 12.9581, time_taken_in_seconds: 5
Epoch [1/1], Step [12508/13804], Loss: 2.4375, Perplexity: 11.4447, time_taken_in_seconds: 6
Epoch [1/1], Step [12509/13804], Loss: 2.3992, Perplexity: 11.0140, time_taken_in_seconds: 7
Epoch [1/1], Step [12510/13804], Loss: 2.4721, Perplexity: 11.8475, time_taken_in_seconds: 8
Epoch [1/1], Step [12511/13804], Loss: 2.6400, Perplexity: 14.0133, time_taken_in_seconds: 8
Epoch [1/1], Step [12512/13804], Loss: 2.9755, Perplexity: 19.5997, time_taken_in_seconds: 9
Epoch [1/1], Step [12513/13804], Loss: 2.5478, Perplexity: 12.7793, time_taken_in_seconds: 10
Epoch [1/1], Step [12514/13804], Loss: 2.7825, Perplexity: 16.1587, time_taken_in_seconds: 11
Epoch [1/1], Step [12515/13804], Loss: 2.4305, Perplexity: 11.3640, time_taken_in_seconds: 12
Epoch [1/1], Step [12516/13804], Loss: 2.7784, Perplexity: 16.0934, time_taken_in_seconds: 13
Epoch [1/1], Step [12517/13804], Loss: 3.9633, Perplexity: 52.6306, time_taken_in_seconds: 13
Epoch [1/1], Step [12518/13804], Loss: 2.6203, Perplexity: 13.7402, time_taken_in_seconds: 14
Epoch [1/1], Step [12519/13804], Loss: 2.5745, Perplexity: 13.1250, time_taken_in_seconds: 15
Epoch [1/1], Step [12520/13804], Loss: 2.5034, Perplexity: 12.2244, time_taken_in_seconds: 16
Epoch [1/1], Step [12521/13804], Loss: 2.5674, Perplexity: 13.0316, time_taken_in_seconds: 17
Epoch [1/1], Step [12522/13804], Loss: 2.5017, Perplexity: 12.2036, time_taken_in_seconds: 18
Epoch [1/1], Step [12523/13804], Loss: 2.6432, Perplexity: 14.0583, time_taken_in_seconds: 18
Epoch [1/1], Step [12524/13804], Loss: 2.5215, Perplexity: 12.4476, time_taken_in_seconds: 19
Epoch [1/1], Step [12525/13804], Loss: 2.5544, Perplexity: 12.8633, time_taken_in_seconds: 20
Epoch [1/1], Step [12526/13804], Loss: 2.5630, Perplexity: 12.9744, time_taken_in_seconds: 21
Epoch [1/1], Step [12527/13804], Loss: 2.5108, Perplexity: 12.3149, time_taken_in_seconds: 22
Epoch [1/1], Step [12528/13804], Loss: 2.2674, Perplexity: 9.6539, time_taken_in_seconds: 22
Epoch [1/1], Step [12529/13804], Loss: 2.5624, Perplexity: 12.9671, time_taken_in_seconds: 23
Epoch [1/1], Step [12530/13804], Loss: 2.4613, Perplexity: 11.7199, time_taken_in_seconds: 24
Epoch [1/1], Step [12531/13804], Loss: 2.5200, Perplexity: 12.4280, time_taken_in_seconds: 25
Epoch [1/1], Step [12532/13804], Loss: 2.4698, Perplexity: 11.8203, time_taken_in_seconds: 26
Epoch [1/1], Step [12533/13804], Loss: 2.4577, Perplexity: 11.6775, time_taken_in_seconds: 27
Epoch [1/1], Step [12534/13804], Loss: 2.3739, Perplexity: 10.7396, time_taken_in_seconds: 27
Epoch [1/1], Step [12535/13804], Loss: 2.3553, Perplexity: 10.5414, time_taken_in_seconds: 28
Epoch [1/1], Step [12536/13804], Loss: 2.3673, Perplexity: 10.6688, time_taken_in_seconds: 29
Epoch [1/1], Step [12537/13804], Loss: 2.7768, Perplexity: 16.0674, time_taken_in_seconds: 30
Epoch [1/1], Step [12538/13804], Loss: 2.5071, Perplexity: 12.2695, time_taken_in_seconds: 31
Epoch [1/1], Step [12539/13804], Loss: 2.2129, Perplexity: 9.1419, time_taken_in_seconds: 31
Epoch [1/1], Step [12540/13804], Loss: 2.3436, Perplexity: 10.4186, time_taken_in_seconds: 32
Epoch [1/1], Step [12541/13804], Loss: 2.5661, Perplexity: 13.0151, time_taken_in_seconds: 33
Epoch [1/1], Step [12542/13804], Loss: 2.5390, Perplexity: 12.6664, time_taken_in_seconds: 34
Epoch [1/1], Step [12543/13804], Loss: 2.7841, Perplexity: 16.1851, time_taken_in_seconds: 35
Epoch [1/1], Step [12544/13804], Loss: 2.9668, Perplexity: 19.4303, time_taken_in_seconds: 35
Epoch [1/1], Step [12545/13804], Loss: 2.4289, Perplexity: 11.3465, time_taken_in_seconds: 36
Epoch [1/1], Step [12546/13804], Loss: 3.4792, Perplexity: 32.4352, time_taken_in_seconds: 37
Epoch [1/1], Step [12547/13804], Loss: 2.2988, Perplexity: 9.9627, time_taken_in_seconds: 38
Epoch [1/1], Step [12548/13804], Loss: 2.5218, Perplexity: 12.4504, time_taken_in_seconds: 39
Epoch [1/1], Step [12549/13804], Loss: 3.4462, Perplexity: 31.3820, time_taken_in_seconds: 40
Epoch [1/1], Step [12550/13804], Loss: 2.3485, Perplexity: 10.4697, time_taken_in_seconds: 40
Epoch [1/1], Step [12551/13804], Loss: 2.4422, Perplexity: 11.4986, time_taken_in_seconds: 41
Epoch [1/1], Step [12552/13804], Loss: 2.7502, Perplexity: 15.6458, time_taken_in_seconds: 42
Epoch [1/1], Step [12553/13804], Loss: 2.7021, Perplexity: 14.9111, time_taken_in_seconds: 43
Epoch [1/1], Step [12554/13804], Loss: 2.8632, Perplexity: 17.5168, time_taken_in_seconds: 44
Epoch [1/1], Step [12555/13804], Loss: 2.5216, Perplexity: 12.4489, time_taken_in_seconds: 44
Epoch [1/1], Step [12556/13804], Loss: 2.5782, Perplexity: 13.1734, time_taken_in_seconds: 45
Epoch [1/1], Step [12557/13804], Loss: 2.6211, Perplexity: 13.7505, time_taken_in_seconds: 46
Epoch [1/1], Step [12558/13804], Loss: 2.4570, Perplexity: 11.6703, time_taken_in_seconds: 47
Epoch [1/1], Step [12559/13804], Loss: 2.6634, Perplexity: 14.3444, time_taken_in_seconds: 48
Epoch [1/1], Step [12560/13804], Loss: 2.4741, Perplexity: 11.8705, time_taken_in_seconds: 48
Epoch [1/1], Step [12561/13804], Loss: 2.4429, Perplexity: 11.5061, time_taken_in_seconds: 49
Epoch [1/1], Step [12562/13804], Loss: 2.2443, Perplexity: 9.4334, time_taken_in_seconds: 50
Epoch [1/1], Step [12563/13804], Loss: 3.0088, Perplexity: 20.2623, time_taken_in_seconds: 51
Epoch [1/1], Step [12564/13804], Loss: 2.3065, Perplexity: 10.0389, time_taken_in_seconds: 52
Epoch [1/1], Step [12565/13804], Loss: 2.2893, Perplexity: 9.8677, time_taken_in_seconds: 52
Epoch [1/1], Step [12566/13804], Loss: 2.6865, Perplexity: 14.6802, time_taken_in_seconds: 54
Epoch [1/1], Step [12567/13804], Loss: 2.7034, Perplexity: 14.9307, time_taken_in_seconds: 54
Epoch [1/1], Step [12568/13804], Loss: 2.7071, Perplexity: 14.9862, time_taken_in_seconds: 55
Epoch [1/1], Step [12569/13804], Loss: 2.4453, Perplexity: 11.5338, time_taken_in_seconds: 56
Epoch [1/1], Step [12570/13804], Loss: 2.4033, Perplexity: 11.0598, time_taken_in_seconds: 57
Epoch [1/1], Step [12571/13804], Loss: 3.4204, Perplexity: 30.5804, time_taken_in_seconds: 58
Epoch [1/1], Step [12572/13804], Loss: 2.9563, Perplexity: 19.2261, time_taken_in_seconds: 58
Epoch [1/1], Step [12573/13804], Loss: 2.2620, Perplexity: 9.6021, time_taken_in_seconds: 59
Epoch [1/1], Step [12574/13804], Loss: 2.6389, Perplexity: 13.9973, time_taken_in_seconds: 60
Epoch [1/1], Step [12575/13804], Loss: 3.2510, Perplexity: 25.8167, time_taken_in_seconds: 61
Epoch [1/1], Step [12576/13804], Loss: 2.4127, Perplexity: 11.1642, time_taken_in_seconds: 62
Epoch [1/1], Step [12577/13804], Loss: 2.7920, Perplexity: 16.3133, time_taken_in_seconds: 62
Epoch [1/1], Step [12578/13804], Loss: 2.4771, Perplexity: 11.9067, time_taken_in_seconds: 63
Epoch [1/1], Step [12579/13804], Loss: 2.6995, Perplexity: 14.8724, time_taken_in_seconds: 64
Epoch [1/1], Step [12580/13804], Loss: 2.7548, Perplexity: 15.7184, time_taken_in_seconds: 65
Epoch [1/1], Step [12581/13804], Loss: 2.7054, Perplexity: 14.9604, time_taken_in_seconds: 66
Epoch [1/1], Step [12582/13804], Loss: 2.5368, Perplexity: 12.6394, time_taken_in_seconds: 67
Epoch [1/1], Step [12583/13804], Loss: 2.8491, Perplexity: 17.2723, time_taken_in_seconds: 67
Epoch [1/1], Step [12584/13804], Loss: 2.6554, Perplexity: 14.2313, time_taken_in_seconds: 68
Epoch [1/1], Step [12585/13804], Loss: 2.3719, Perplexity: 10.7177, time_taken_in_seconds: 69
Epoch [1/1], Step [12586/13804], Loss: 2.2845, Perplexity: 9.8212, time_taken_in_seconds: 70
Epoch [1/1], Step [12587/13804], Loss: 2.5725, Perplexity: 13.0986, time_taken_in_seconds: 71
Epoch [1/1], Step [12588/13804], Loss: 2.3444, Perplexity: 10.4271, time_taken_in_seconds: 71
Epoch [1/1], Step [12589/13804], Loss: 2.4121, Perplexity: 11.1570, time_taken_in_seconds: 72
Epoch [1/1], Step [12590/13804], Loss: 2.8450, Perplexity: 17.2007, time_taken_in_seconds: 73
Epoch [1/1], Step [12591/13804], Loss: 2.4738, Perplexity: 11.8676, time_taken_in_seconds: 74
Epoch [1/1], Step [12592/13804], Loss: 2.4659, Perplexity: 11.7745, time_taken_in_seconds: 75
Epoch [1/1], Step [12593/13804], Loss: 2.4817, Perplexity: 11.9619, time_taken_in_seconds: 75
Epoch [1/1], Step [12594/13804], Loss: 2.6419, Perplexity: 14.0402, time_taken_in_seconds: 76
Epoch [1/1], Step [12595/13804], Loss: 2.6329, Perplexity: 13.9144, time_taken_in_seconds: 77
Epoch [1/1], Step [12596/13804], Loss: 2.7837, Perplexity: 16.1794, time_taken_in_seconds: 78
Epoch [1/1], Step [12597/13804], Loss: 2.8889, Perplexity: 17.9742, time_taken_in_seconds: 79
Epoch [1/1], Step [12598/13804], Loss: 2.5304, Perplexity: 12.5585, time_taken_in_seconds: 79
Epoch [1/1], Step [12599/13804], Loss: 2.5719, Perplexity: 13.0910, time_taken_in_seconds: 80
Epoch [1/1], Step [12600/13804], Loss: 2.5136, Perplexity: 12.3491, time_taken_in_seconds: 81
Epoch [1/1], Step [12601/13804], Loss: 2.1664, Perplexity: 8.7271, time_taken_in_seconds: 0
Epoch [1/1], Step [12602/13804], Loss: 2.8278, Perplexity: 16.9078, time_taken_in_seconds: 1
Epoch [1/1], Step [12603/13804], Loss: 2.7737, Perplexity: 16.0173, time_taken_in_seconds: 2
Epoch [1/1], Step [12604/13804], Loss: 2.2653, Perplexity: 9.6343, time_taken_in_seconds: 3
Epoch [1/1], Step [12605/13804], Loss: 2.4367, Perplexity: 11.4348, time_taken_in_seconds: 4
Epoch [1/1], Step [12606/13804], Loss: 3.0335, Perplexity: 20.7708, time_taken_in_seconds: 4
Epoch [1/1], Step [12607/13804], Loss: 2.3831, Perplexity: 10.8385, time_taken_in_seconds: 5
Epoch [1/1], Step [12608/13804], Loss: 2.7484, Perplexity: 15.6169, time_taken_in_seconds: 6
Epoch [1/1], Step [12609/13804], Loss: 2.4295, Perplexity: 11.3532, time_taken_in_seconds: 7
Epoch [1/1], Step [12610/13804], Loss: 2.5102, Perplexity: 12.3076, time_taken_in_seconds: 8
Epoch [1/1], Step [12611/13804], Loss: 2.3221, Perplexity: 10.1974, time_taken_in_seconds: 8
Epoch [1/1], Step [12612/13804], Loss: 2.4950, Perplexity: 12.1222, time_taken_in_seconds: 9
Epoch [1/1], Step [12613/13804], Loss: 2.4701, Perplexity: 11.8238, time_taken_in_seconds: 10
Epoch [1/1], Step [12614/13804], Loss: 2.4065, Perplexity: 11.0955, time_taken_in_seconds: 11
Epoch [1/1], Step [12615/13804], Loss: 2.3161, Perplexity: 10.1363, time_taken_in_seconds: 12
Epoch [1/1], Step [12616/13804], Loss: 2.6268, Perplexity: 13.8296, time_taken_in_seconds: 12
Epoch [1/1], Step [12617/13804], Loss: 2.4915, Perplexity: 12.0791, time_taken_in_seconds: 13
Epoch [1/1], Step [12618/13804], Loss: 2.4317, Perplexity: 11.3778, time_taken_in_seconds: 14
Epoch [1/1], Step [12619/13804], Loss: 2.6110, Perplexity: 13.6128, time_taken_in_seconds: 15
Epoch [1/1], Step [12620/13804], Loss: 2.5100, Perplexity: 12.3051, time_taken_in_seconds: 16
Epoch [1/1], Step [12621/13804], Loss: 2.2832, Perplexity: 9.8080, time_taken_in_seconds: 17
Epoch [1/1], Step [12622/13804], Loss: 2.5743, Perplexity: 13.1226, time_taken_in_seconds: 17
Epoch [1/1], Step [12623/13804], Loss: 2.3580, Perplexity: 10.5694, time_taken_in_seconds: 18
Epoch [1/1], Step [12624/13804], Loss: 3.4237, Perplexity: 30.6820, time_taken_in_seconds: 19
Epoch [1/1], Step [12625/13804], Loss: 2.7892, Perplexity: 16.2687, time_taken_in_seconds: 20
Epoch [1/1], Step [12626/13804], Loss: 2.5697, Perplexity: 13.0616, time_taken_in_seconds: 21
Epoch [1/1], Step [12627/13804], Loss: 2.8711, Perplexity: 17.6570, time_taken_in_seconds: 21
Epoch [1/1], Step [12628/13804], Loss: 2.5116, Perplexity: 12.3250, time_taken_in_seconds: 22
Epoch [1/1], Step [12629/13804], Loss: 2.1912, Perplexity: 8.9456, time_taken_in_seconds: 23
Epoch [1/1], Step [12630/13804], Loss: 2.8035, Perplexity: 16.5019, time_taken_in_seconds: 24
Epoch [1/1], Step [12631/13804], Loss: 2.4053, Perplexity: 11.0814, time_taken_in_seconds: 25
Epoch [1/1], Step [12632/13804], Loss: 2.7208, Perplexity: 15.1932, time_taken_in_seconds: 25
Epoch [1/1], Step [12633/13804], Loss: 2.6194, Perplexity: 13.7272, time_taken_in_seconds: 26
Epoch [1/1], Step [12634/13804], Loss: 2.3750, Perplexity: 10.7506, time_taken_in_seconds: 27
Epoch [1/1], Step [12635/13804], Loss: 2.5304, Perplexity: 12.5580, time_taken_in_seconds: 28
Epoch [1/1], Step [12636/13804], Loss: 3.0472, Perplexity: 21.0570, time_taken_in_seconds: 29
Epoch [1/1], Step [12637/13804], Loss: 2.5550, Perplexity: 12.8718, time_taken_in_seconds: 29
Epoch [1/1], Step [12638/13804], Loss: 2.4576, Perplexity: 11.6764, time_taken_in_seconds: 30
Epoch [1/1], Step [12639/13804], Loss: 2.2586, Perplexity: 9.5702, time_taken_in_seconds: 31
Epoch [1/1], Step [12640/13804], Loss: 2.2159, Perplexity: 9.1701, time_taken_in_seconds: 32
Epoch [1/1], Step [12641/13804], Loss: 2.6259, Perplexity: 13.8163, time_taken_in_seconds: 33
Epoch [1/1], Step [12642/13804], Loss: 2.3170, Perplexity: 10.1454, time_taken_in_seconds: 34
Epoch [1/1], Step [12643/13804], Loss: 2.3423, Perplexity: 10.4052, time_taken_in_seconds: 34
Epoch [1/1], Step [12644/13804], Loss: 2.8739, Perplexity: 17.7054, time_taken_in_seconds: 35
Epoch [1/1], Step [12645/13804], Loss: 2.4274, Perplexity: 11.3291, time_taken_in_seconds: 36
Epoch [1/1], Step [12646/13804], Loss: 2.4549, Perplexity: 11.6455, time_taken_in_seconds: 37
Epoch [1/1], Step [12647/13804], Loss: 2.6158, Perplexity: 13.6781, time_taken_in_seconds: 38
Epoch [1/1], Step [12648/13804], Loss: 2.6448, Perplexity: 14.0805, time_taken_in_seconds: 39
Epoch [1/1], Step [12649/13804], Loss: 2.4097, Perplexity: 11.1304, time_taken_in_seconds: 39
Epoch [1/1], Step [12650/13804], Loss: 2.4022, Perplexity: 11.0470, time_taken_in_seconds: 40
Epoch [1/1], Step [12651/13804], Loss: 2.7518, Perplexity: 15.6711, time_taken_in_seconds: 41
Epoch [1/1], Step [12652/13804], Loss: 2.5759, Perplexity: 13.1433, time_taken_in_seconds: 42
Epoch [1/1], Step [12653/13804], Loss: 2.3212, Perplexity: 10.1876, time_taken_in_seconds: 43
Epoch [1/1], Step [12654/13804], Loss: 2.4338, Perplexity: 11.4020, time_taken_in_seconds: 43
Epoch [1/1], Step [12655/13804], Loss: 3.0369, Perplexity: 20.8404, time_taken_in_seconds: 44
Epoch [1/1], Step [12656/13804], Loss: 2.7145, Perplexity: 15.0977, time_taken_in_seconds: 45
Epoch [1/1], Step [12657/13804], Loss: 2.5650, Perplexity: 13.0002, time_taken_in_seconds: 46
Epoch [1/1], Step [12658/13804], Loss: 2.1442, Perplexity: 8.5349, time_taken_in_seconds: 47
Epoch [1/1], Step [12659/13804], Loss: 2.7742, Perplexity: 16.0259, time_taken_in_seconds: 47
Epoch [1/1], Step [12660/13804], Loss: 2.4971, Perplexity: 12.1475, time_taken_in_seconds: 48
Epoch [1/1], Step [12661/13804], Loss: 2.4568, Perplexity: 11.6679, time_taken_in_seconds: 49
Epoch [1/1], Step [12662/13804], Loss: 2.7540, Perplexity: 15.7055, time_taken_in_seconds: 50
Epoch [1/1], Step [12663/13804], Loss: 2.4396, Perplexity: 11.4681, time_taken_in_seconds: 51
Epoch [1/1], Step [12664/13804], Loss: 2.3664, Perplexity: 10.6586, time_taken_in_seconds: 52
Epoch [1/1], Step [12665/13804], Loss: 2.3716, Perplexity: 10.7147, time_taken_in_seconds: 52
Epoch [1/1], Step [12666/13804], Loss: 2.4505, Perplexity: 11.5945, time_taken_in_seconds: 53
Epoch [1/1], Step [12667/13804], Loss: 2.6315, Perplexity: 13.8953, time_taken_in_seconds: 54
Epoch [1/1], Step [12668/13804], Loss: 2.3470, Perplexity: 10.4543, time_taken_in_seconds: 55
Epoch [1/1], Step [12669/13804], Loss: 2.5861, Perplexity: 13.2781, time_taken_in_seconds: 56
Epoch [1/1], Step [12670/13804], Loss: 2.4143, Perplexity: 11.1818, time_taken_in_seconds: 56
Epoch [1/1], Step [12671/13804], Loss: 2.6602, Perplexity: 14.2992, time_taken_in_seconds: 57
Epoch [1/1], Step [12672/13804], Loss: 2.7064, Perplexity: 14.9751, time_taken_in_seconds: 58
Epoch [1/1], Step [12673/13804], Loss: 2.3890, Perplexity: 10.9030, time_taken_in_seconds: 59
Epoch [1/1], Step [12674/13804], Loss: 2.6269, Perplexity: 13.8312, time_taken_in_seconds: 60
Epoch [1/1], Step [12675/13804], Loss: 2.5230, Perplexity: 12.4665, time_taken_in_seconds: 60
Epoch [1/1], Step [12676/13804], Loss: 2.4835, Perplexity: 11.9829, time_taken_in_seconds: 61
Epoch [1/1], Step [12677/13804], Loss: 2.4333, Perplexity: 11.3961, time_taken_in_seconds: 62
Epoch [1/1], Step [12678/13804], Loss: 2.2255, Perplexity: 9.2578, time_taken_in_seconds: 63
Epoch [1/1], Step [12679/13804], Loss: 2.4396, Perplexity: 11.4682, time_taken_in_seconds: 64
Epoch [1/1], Step [12680/13804], Loss: 2.4629, Perplexity: 11.7391, time_taken_in_seconds: 64
Epoch [1/1], Step [12681/13804], Loss: 2.4669, Perplexity: 11.7853, time_taken_in_seconds: 65
Epoch [1/1], Step [12682/13804], Loss: 2.3732, Perplexity: 10.7315, time_taken_in_seconds: 66
Epoch [1/1], Step [12683/13804], Loss: 2.3312, Perplexity: 10.2901, time_taken_in_seconds: 67
Epoch [1/1], Step [12684/13804], Loss: 2.3239, Perplexity: 10.2156, time_taken_in_seconds: 68
Epoch [1/1], Step [12685/13804], Loss: 2.7510, Perplexity: 15.6586, time_taken_in_seconds: 69
Epoch [1/1], Step [12686/13804], Loss: 2.5791, Perplexity: 13.1857, time_taken_in_seconds: 69
Epoch [1/1], Step [12687/13804], Loss: 2.7250, Perplexity: 15.2562, time_taken_in_seconds: 70
Epoch [1/1], Step [12688/13804], Loss: 2.3153, Perplexity: 10.1275, time_taken_in_seconds: 71
Epoch [1/1], Step [12689/13804], Loss: 2.5412, Perplexity: 12.6949, time_taken_in_seconds: 72
Epoch [1/1], Step [12690/13804], Loss: 2.1547, Perplexity: 8.6253, time_taken_in_seconds: 73
Epoch [1/1], Step [12691/13804], Loss: 2.7343, Perplexity: 15.3984, time_taken_in_seconds: 73
Epoch [1/1], Step [12692/13804], Loss: 2.1621, Perplexity: 8.6895, time_taken_in_seconds: 74
Epoch [1/1], Step [12693/13804], Loss: 2.6153, Perplexity: 13.6716, time_taken_in_seconds: 75
Epoch [1/1], Step [12694/13804], Loss: 2.2135, Perplexity: 9.1474, time_taken_in_seconds: 76
Epoch [1/1], Step [12695/13804], Loss: 2.9546, Perplexity: 19.1945, time_taken_in_seconds: 77
Epoch [1/1], Step [12696/13804], Loss: 2.2997, Perplexity: 9.9708, time_taken_in_seconds: 77
Epoch [1/1], Step [12697/13804], Loss: 2.5446, Perplexity: 12.7379, time_taken_in_seconds: 78
Epoch [1/1], Step [12698/13804], Loss: 2.2953, Perplexity: 9.9277, time_taken_in_seconds: 79
Epoch [1/1], Step [12699/13804], Loss: 2.3913, Perplexity: 10.9281, time_taken_in_seconds: 80
Epoch [1/1], Step [12700/13804], Loss: 2.4303, Perplexity: 11.3619, time_taken_in_seconds: 81
Epoch [1/1], Step [12701/13804], Loss: 2.7527, Perplexity: 15.6850, time_taken_in_seconds: 0
Epoch [1/1], Step [12702/13804], Loss: 2.4204, Perplexity: 11.2505, time_taken_in_seconds: 1
Epoch [1/1], Step [12703/13804], Loss: 2.9009, Perplexity: 18.1896, time_taken_in_seconds: 2
Epoch [1/1], Step [12704/13804], Loss: 2.4363, Perplexity: 11.4312, time_taken_in_seconds: 3
Epoch [1/1], Step [12705/13804], Loss: 3.0010, Perplexity: 20.1047, time_taken_in_seconds: 4
Epoch [1/1], Step [12706/13804], Loss: 2.7416, Perplexity: 15.5124, time_taken_in_seconds: 4
Epoch [1/1], Step [12707/13804], Loss: 2.7443, Perplexity: 15.5537, time_taken_in_seconds: 5
Epoch [1/1], Step [12708/13804], Loss: 2.4013, Perplexity: 11.0376, time_taken_in_seconds: 6
Epoch [1/1], Step [12709/13804], Loss: 2.8109, Perplexity: 16.6241, time_taken_in_seconds: 7
Epoch [1/1], Step [12710/13804], Loss: 2.6077, Perplexity: 13.5674, time_taken_in_seconds: 8
Epoch [1/1], Step [12711/13804], Loss: 2.3142, Perplexity: 10.1171, time_taken_in_seconds: 9
Epoch [1/1], Step [12712/13804], Loss: 2.4617, Perplexity: 11.7244, time_taken_in_seconds: 9
Epoch [1/1], Step [12713/13804], Loss: 2.8038, Perplexity: 16.5072, time_taken_in_seconds: 10
Epoch [1/1], Step [12714/13804], Loss: 2.5111, Perplexity: 12.3190, time_taken_in_seconds: 11
Epoch [1/1], Step [12715/13804], Loss: 2.6848, Perplexity: 14.6550, time_taken_in_seconds: 12
Epoch [1/1], Step [12716/13804], Loss: 2.2958, Perplexity: 9.9320, time_taken_in_seconds: 13
Epoch [1/1], Step [12717/13804], Loss: 2.4847, Perplexity: 11.9977, time_taken_in_seconds: 14
Epoch [1/1], Step [12718/13804], Loss: 2.7380, Perplexity: 15.4564, time_taken_in_seconds: 14
Epoch [1/1], Step [12719/13804], Loss: 2.9535, Perplexity: 19.1733, time_taken_in_seconds: 15
Epoch [1/1], Step [12720/13804], Loss: 2.5685, Perplexity: 13.0457, time_taken_in_seconds: 16
Epoch [1/1], Step [12721/13804], Loss: 2.6890, Perplexity: 14.7171, time_taken_in_seconds: 17
Epoch [1/1], Step [12722/13804], Loss: 2.3165, Perplexity: 10.1405, time_taken_in_seconds: 18
Epoch [1/1], Step [12723/13804], Loss: 2.4714, Perplexity: 11.8391, time_taken_in_seconds: 18
Epoch [1/1], Step [12724/13804], Loss: 2.8855, Perplexity: 17.9120, time_taken_in_seconds: 19
Epoch [1/1], Step [12725/13804], Loss: 2.3576, Perplexity: 10.5653, time_taken_in_seconds: 20
Epoch [1/1], Step [12726/13804], Loss: 2.3604, Perplexity: 10.5955, time_taken_in_seconds: 21
Epoch [1/1], Step [12727/13804], Loss: 2.6958, Perplexity: 14.8179, time_taken_in_seconds: 22
Epoch [1/1], Step [12728/13804], Loss: 2.3853, Perplexity: 10.8619, time_taken_in_seconds: 23
Epoch [1/1], Step [12729/13804], Loss: 2.6299, Perplexity: 13.8723, time_taken_in_seconds: 23
Epoch [1/1], Step [12730/13804], Loss: 2.1971, Perplexity: 8.9988, time_taken_in_seconds: 24
Epoch [1/1], Step [12731/13804], Loss: 2.6736, Perplexity: 14.4921, time_taken_in_seconds: 25
Epoch [1/1], Step [12732/13804], Loss: 2.2831, Perplexity: 9.8069, time_taken_in_seconds: 26
Epoch [1/1], Step [12733/13804], Loss: 2.8079, Perplexity: 16.5748, time_taken_in_seconds: 27
Epoch [1/1], Step [12734/13804], Loss: 2.6993, Perplexity: 14.8688, time_taken_in_seconds: 27
Epoch [1/1], Step [12735/13804], Loss: 2.5247, Perplexity: 12.4872, time_taken_in_seconds: 28
Epoch [1/1], Step [12736/13804], Loss: 2.3250, Perplexity: 10.2264, time_taken_in_seconds: 29
Epoch [1/1], Step [12737/13804], Loss: 2.4630, Perplexity: 11.7395, time_taken_in_seconds: 30
Epoch [1/1], Step [12738/13804], Loss: 2.5843, Perplexity: 13.2538, time_taken_in_seconds: 31
Epoch [1/1], Step [12739/13804], Loss: 2.2479, Perplexity: 9.4675, time_taken_in_seconds: 32
Epoch [1/1], Step [12740/13804], Loss: 2.6167, Perplexity: 13.6908, time_taken_in_seconds: 32
Epoch [1/1], Step [12741/13804], Loss: 2.4693, Perplexity: 11.8147, time_taken_in_seconds: 33
Epoch [1/1], Step [12742/13804], Loss: 2.9012, Perplexity: 18.1959, time_taken_in_seconds: 34
Epoch [1/1], Step [12743/13804], Loss: 2.6113, Perplexity: 13.6164, time_taken_in_seconds: 35
Epoch [1/1], Step [12744/13804], Loss: 3.6669, Perplexity: 39.1311, time_taken_in_seconds: 36
Epoch [1/1], Step [12745/13804], Loss: 2.6941, Perplexity: 14.7922, time_taken_in_seconds: 37
Epoch [1/1], Step [12746/13804], Loss: 2.5119, Perplexity: 12.3283, time_taken_in_seconds: 37
Epoch [1/1], Step [12747/13804], Loss: 2.3688, Perplexity: 10.6846, time_taken_in_seconds: 38
Epoch [1/1], Step [12748/13804], Loss: 2.5317, Perplexity: 12.5745, time_taken_in_seconds: 39
Epoch [1/1], Step [12749/13804], Loss: 2.4213, Perplexity: 11.2601, time_taken_in_seconds: 40
Epoch [1/1], Step [12750/13804], Loss: 2.5943, Perplexity: 13.3866, time_taken_in_seconds: 41
Epoch [1/1], Step [12751/13804], Loss: 2.3030, Perplexity: 10.0040, time_taken_in_seconds: 41
Epoch [1/1], Step [12752/13804], Loss: 2.4998, Perplexity: 12.1799, time_taken_in_seconds: 42
Epoch [1/1], Step [12753/13804], Loss: 2.6801, Perplexity: 14.5859, time_taken_in_seconds: 43
Epoch [1/1], Step [12754/13804], Loss: 2.4898, Perplexity: 12.0593, time_taken_in_seconds: 44
Epoch [1/1], Step [12755/13804], Loss: 2.7490, Perplexity: 15.6273, time_taken_in_seconds: 45
Epoch [1/1], Step [12756/13804], Loss: 2.5547, Perplexity: 12.8679, time_taken_in_seconds: 45
Epoch [1/1], Step [12757/13804], Loss: 2.3738, Perplexity: 10.7380, time_taken_in_seconds: 46
Epoch [1/1], Step [12758/13804], Loss: 2.4883, Perplexity: 12.0405, time_taken_in_seconds: 47
Epoch [1/1], Step [12759/13804], Loss: 2.7749, Perplexity: 16.0367, time_taken_in_seconds: 48
Epoch [1/1], Step [12760/13804], Loss: 2.3001, Perplexity: 9.9747, time_taken_in_seconds: 49
Epoch [1/1], Step [12761/13804], Loss: 2.8698, Perplexity: 17.6342, time_taken_in_seconds: 50
Epoch [1/1], Step [12762/13804], Loss: 2.6755, Perplexity: 14.5190, time_taken_in_seconds: 50
Epoch [1/1], Step [12763/13804], Loss: 2.5788, Perplexity: 13.1812, time_taken_in_seconds: 51
Epoch [1/1], Step [12764/13804], Loss: 2.4535, Perplexity: 11.6286, time_taken_in_seconds: 52
Epoch [1/1], Step [12765/13804], Loss: 2.4370, Perplexity: 11.4382, time_taken_in_seconds: 53
Epoch [1/1], Step [12766/13804], Loss: 2.4895, Perplexity: 12.0552, time_taken_in_seconds: 54
Epoch [1/1], Step [12767/13804], Loss: 2.4194, Perplexity: 11.2386, time_taken_in_seconds: 54
Epoch [1/1], Step [12768/13804], Loss: 2.1975, Perplexity: 9.0021, time_taken_in_seconds: 55
Epoch [1/1], Step [12769/13804], Loss: 2.1674, Perplexity: 8.7354, time_taken_in_seconds: 56
Epoch [1/1], Step [12770/13804], Loss: 2.4851, Perplexity: 12.0021, time_taken_in_seconds: 57
Epoch [1/1], Step [12771/13804], Loss: 2.6503, Perplexity: 14.1581, time_taken_in_seconds: 58
Epoch [1/1], Step [12772/13804], Loss: 2.3515, Perplexity: 10.5011, time_taken_in_seconds: 59
Epoch [1/1], Step [12773/13804], Loss: 2.9997, Perplexity: 20.0786, time_taken_in_seconds: 59
Epoch [1/1], Step [12774/13804], Loss: 2.2992, Perplexity: 9.9664, time_taken_in_seconds: 60
Epoch [1/1], Step [12775/13804], Loss: 2.6210, Perplexity: 13.7501, time_taken_in_seconds: 61
Epoch [1/1], Step [12776/13804], Loss: 2.8698, Perplexity: 17.6342, time_taken_in_seconds: 62
Epoch [1/1], Step [12777/13804], Loss: 2.6063, Perplexity: 13.5493, time_taken_in_seconds: 63
Epoch [1/1], Step [12778/13804], Loss: 2.5275, Perplexity: 12.5217, time_taken_in_seconds: 63
Epoch [1/1], Step [12779/13804], Loss: 2.8329, Perplexity: 16.9946, time_taken_in_seconds: 64
Epoch [1/1], Step [12780/13804], Loss: 2.5234, Perplexity: 12.4712, time_taken_in_seconds: 65
Epoch [1/1], Step [12781/13804], Loss: 2.5780, Perplexity: 13.1712, time_taken_in_seconds: 66
Epoch [1/1], Step [12782/13804], Loss: 3.1409, Perplexity: 23.1252, time_taken_in_seconds: 67
Epoch [1/1], Step [12783/13804], Loss: 2.6253, Perplexity: 13.8085, time_taken_in_seconds: 68
Epoch [1/1], Step [12784/13804], Loss: 2.4127, Perplexity: 11.1642, time_taken_in_seconds: 68
Epoch [1/1], Step [12785/13804], Loss: 2.3522, Perplexity: 10.5087, time_taken_in_seconds: 69
Epoch [1/1], Step [12786/13804], Loss: 3.3681, Perplexity: 29.0240, time_taken_in_seconds: 70
Epoch [1/1], Step [12787/13804], Loss: 2.3183, Perplexity: 10.1580, time_taken_in_seconds: 71
Epoch [1/1], Step [12788/13804], Loss: 2.1725, Perplexity: 8.7799, time_taken_in_seconds: 72
Epoch [1/1], Step [12789/13804], Loss: 2.1903, Perplexity: 8.9379, time_taken_in_seconds: 73
Epoch [1/1], Step [12790/13804], Loss: 2.2753, Perplexity: 9.7304, time_taken_in_seconds: 73
Epoch [1/1], Step [12791/13804], Loss: 2.4143, Perplexity: 11.1817, time_taken_in_seconds: 74
Epoch [1/1], Step [12792/13804], Loss: 2.4375, Perplexity: 11.4449, time_taken_in_seconds: 75
Epoch [1/1], Step [12793/13804], Loss: 2.2613, Perplexity: 9.5957, time_taken_in_seconds: 76
Epoch [1/1], Step [12794/13804], Loss: 2.6195, Perplexity: 13.7288, time_taken_in_seconds: 77
Epoch [1/1], Step [12795/13804], Loss: 2.5496, Perplexity: 12.8024, time_taken_in_seconds: 78
Epoch [1/1], Step [12796/13804], Loss: 2.4036, Perplexity: 11.0628, time_taken_in_seconds: 78
Epoch [1/1], Step [12797/13804], Loss: 2.5182, Perplexity: 12.4062, time_taken_in_seconds: 79
Epoch [1/1], Step [12798/13804], Loss: 2.4021, Perplexity: 11.0462, time_taken_in_seconds: 80
Epoch [1/1], Step [12799/13804], Loss: 2.4112, Perplexity: 11.1478, time_taken_in_seconds: 81
Epoch [1/1], Step [12800/13804], Loss: 2.1244, Perplexity: 8.3678, time_taken_in_seconds: 82
Epoch [1/1], Step [12801/13804], Loss: 2.2771, Perplexity: 9.7486, time_taken_in_seconds: 0
Epoch [1/1], Step [12802/13804], Loss: 2.4006, Perplexity: 11.0295, time_taken_in_seconds: 1
Epoch [1/1], Step [12803/13804], Loss: 2.1796, Perplexity: 8.8429, time_taken_in_seconds: 2
Epoch [1/1], Step [12804/13804], Loss: 2.1062, Perplexity: 8.2168, time_taken_in_seconds: 3
Epoch [1/1], Step [12805/13804], Loss: 2.4655, Perplexity: 11.7695, time_taken_in_seconds: 4
Epoch [1/1], Step [12806/13804], Loss: 2.3294, Perplexity: 10.2722, time_taken_in_seconds: 4
Epoch [1/1], Step [12807/13804], Loss: 2.5780, Perplexity: 13.1713, time_taken_in_seconds: 5
Epoch [1/1], Step [12808/13804], Loss: 2.2679, Perplexity: 9.6590, time_taken_in_seconds: 6
Epoch [1/1], Step [12809/13804], Loss: 2.4838, Perplexity: 11.9871, time_taken_in_seconds: 7
Epoch [1/1], Step [12810/13804], Loss: 2.7908, Perplexity: 16.2942, time_taken_in_seconds: 8
Epoch [1/1], Step [12811/13804], Loss: 2.6683, Perplexity: 14.4152, time_taken_in_seconds: 8
Epoch [1/1], Step [12812/13804], Loss: 2.6337, Perplexity: 13.9254, time_taken_in_seconds: 9
Epoch [1/1], Step [12813/13804], Loss: 2.5705, Perplexity: 13.0718, time_taken_in_seconds: 10
Epoch [1/1], Step [12814/13804], Loss: 2.6498, Perplexity: 14.1506, time_taken_in_seconds: 11
Epoch [1/1], Step [12815/13804], Loss: 2.7241, Perplexity: 15.2424, time_taken_in_seconds: 12
Epoch [1/1], Step [12816/13804], Loss: 2.4684, Perplexity: 11.8030, time_taken_in_seconds: 13
Epoch [1/1], Step [12817/13804], Loss: 2.4593, Perplexity: 11.6964, time_taken_in_seconds: 13
Epoch [1/1], Step [12818/13804], Loss: 2.6082, Perplexity: 13.5741, time_taken_in_seconds: 14
Epoch [1/1], Step [12819/13804], Loss: 2.5249, Perplexity: 12.4897, time_taken_in_seconds: 15
Epoch [1/1], Step [12820/13804], Loss: 2.5374, Perplexity: 12.6463, time_taken_in_seconds: 16
Epoch [1/1], Step [12821/13804], Loss: 2.5962, Perplexity: 13.4124, time_taken_in_seconds: 17
Epoch [1/1], Step [12822/13804], Loss: 3.1219, Perplexity: 22.6892, time_taken_in_seconds: 17
Epoch [1/1], Step [12823/13804], Loss: 2.6241, Perplexity: 13.7925, time_taken_in_seconds: 18
Epoch [1/1], Step [12824/13804], Loss: 2.1923, Perplexity: 8.9561, time_taken_in_seconds: 19
Epoch [1/1], Step [12825/13804], Loss: 2.3626, Perplexity: 10.6188, time_taken_in_seconds: 20
Epoch [1/1], Step [12826/13804], Loss: 2.5549, Perplexity: 12.8698, time_taken_in_seconds: 21
Epoch [1/1], Step [12827/13804], Loss: 2.0761, Perplexity: 7.9732, time_taken_in_seconds: 22
Epoch [1/1], Step [12828/13804], Loss: 2.0945, Perplexity: 8.1211, time_taken_in_seconds: 22
Epoch [1/1], Step [12829/13804], Loss: 2.4124, Perplexity: 11.1612, time_taken_in_seconds: 23
Epoch [1/1], Step [12830/13804], Loss: 2.3701, Perplexity: 10.6983, time_taken_in_seconds: 24
Epoch [1/1], Step [12831/13804], Loss: 2.4353, Perplexity: 11.4198, time_taken_in_seconds: 25
Epoch [1/1], Step [12832/13804], Loss: 2.5798, Perplexity: 13.1948, time_taken_in_seconds: 26
Epoch [1/1], Step [12833/13804], Loss: 2.4412, Perplexity: 11.4873, time_taken_in_seconds: 26
Epoch [1/1], Step [12834/13804], Loss: 2.2054, Perplexity: 9.0742, time_taken_in_seconds: 27
Epoch [1/1], Step [12835/13804], Loss: 2.6370, Perplexity: 13.9714, time_taken_in_seconds: 28
Epoch [1/1], Step [12836/13804], Loss: 3.1700, Perplexity: 23.8067, time_taken_in_seconds: 29
Epoch [1/1], Step [12837/13804], Loss: 2.5046, Perplexity: 12.2383, time_taken_in_seconds: 30
Epoch [1/1], Step [12838/13804], Loss: 2.7387, Perplexity: 15.4666, time_taken_in_seconds: 31
Epoch [1/1], Step [12839/13804], Loss: 2.4501, Perplexity: 11.5890, time_taken_in_seconds: 31
Epoch [1/1], Step [12840/13804], Loss: 2.2981, Perplexity: 9.9557, time_taken_in_seconds: 32
Epoch [1/1], Step [12841/13804], Loss: 2.2387, Perplexity: 9.3812, time_taken_in_seconds: 33
Epoch [1/1], Step [12842/13804], Loss: 2.5429, Perplexity: 12.7161, time_taken_in_seconds: 34
Epoch [1/1], Step [12843/13804], Loss: 2.6781, Perplexity: 14.5571, time_taken_in_seconds: 35
Epoch [1/1], Step [12844/13804], Loss: 2.0973, Perplexity: 8.1443, time_taken_in_seconds: 35
Epoch [1/1], Step [12845/13804], Loss: 2.3886, Perplexity: 10.8981, time_taken_in_seconds: 36
Epoch [1/1], Step [12846/13804], Loss: 2.4368, Perplexity: 11.4364, time_taken_in_seconds: 37
Epoch [1/1], Step [12847/13804], Loss: 2.4724, Perplexity: 11.8514, time_taken_in_seconds: 38
Epoch [1/1], Step [12848/13804], Loss: 2.4621, Perplexity: 11.7292, time_taken_in_seconds: 39
Epoch [1/1], Step [12849/13804], Loss: 2.4004, Perplexity: 11.0274, time_taken_in_seconds: 39
Epoch [1/1], Step [12850/13804], Loss: 2.5346, Perplexity: 12.6112, time_taken_in_seconds: 40
Epoch [1/1], Step [12851/13804], Loss: 2.6979, Perplexity: 14.8492, time_taken_in_seconds: 41
Epoch [1/1], Step [12852/13804], Loss: 2.9923, Perplexity: 19.9324, time_taken_in_seconds: 42
Epoch [1/1], Step [12853/13804], Loss: 2.3582, Perplexity: 10.5717, time_taken_in_seconds: 43
Epoch [1/1], Step [12854/13804], Loss: 2.0892, Perplexity: 8.0788, time_taken_in_seconds: 44
Epoch [1/1], Step [12855/13804], Loss: 2.4037, Perplexity: 11.0645, time_taken_in_seconds: 44
Epoch [1/1], Step [12856/13804], Loss: 2.3883, Perplexity: 10.8948, time_taken_in_seconds: 45
Epoch [1/1], Step [12857/13804], Loss: 2.6500, Perplexity: 14.1539, time_taken_in_seconds: 46
Epoch [1/1], Step [12858/13804], Loss: 2.2827, Perplexity: 9.8030, time_taken_in_seconds: 47
Epoch [1/1], Step [12859/13804], Loss: 2.3745, Perplexity: 10.7453, time_taken_in_seconds: 48
Epoch [1/1], Step [12860/13804], Loss: 2.6031, Perplexity: 13.5062, time_taken_in_seconds: 48
Epoch [1/1], Step [12861/13804], Loss: 2.3805, Perplexity: 10.8098, time_taken_in_seconds: 49
Epoch [1/1], Step [12862/13804], Loss: 2.4304, Perplexity: 11.3633, time_taken_in_seconds: 50
Epoch [1/1], Step [12863/13804], Loss: 2.1686, Perplexity: 8.7460, time_taken_in_seconds: 51
Epoch [1/1], Step [12864/13804], Loss: 2.9827, Perplexity: 19.7417, time_taken_in_seconds: 52
Epoch [1/1], Step [12865/13804], Loss: 2.6602, Perplexity: 14.2992, time_taken_in_seconds: 53
Epoch [1/1], Step [12866/13804], Loss: 2.3159, Perplexity: 10.1335, time_taken_in_seconds: 53
Epoch [1/1], Step [12867/13804], Loss: 2.2413, Perplexity: 9.4057, time_taken_in_seconds: 54
Epoch [1/1], Step [12868/13804], Loss: 2.3456, Perplexity: 10.4392, time_taken_in_seconds: 55
Epoch [1/1], Step [12869/13804], Loss: 2.3569, Perplexity: 10.5578, time_taken_in_seconds: 56
Epoch [1/1], Step [12870/13804], Loss: 2.7203, Perplexity: 15.1847, time_taken_in_seconds: 57
Epoch [1/1], Step [12871/13804], Loss: 2.6571, Perplexity: 14.2543, time_taken_in_seconds: 57
Epoch [1/1], Step [12872/13804], Loss: 2.3450, Perplexity: 10.4330, time_taken_in_seconds: 58
Epoch [1/1], Step [12873/13804], Loss: 2.5680, Perplexity: 13.0401, time_taken_in_seconds: 59
Epoch [1/1], Step [12874/13804], Loss: 2.4973, Perplexity: 12.1497, time_taken_in_seconds: 60
Epoch [1/1], Step [12875/13804], Loss: 2.8040, Perplexity: 16.5102, time_taken_in_seconds: 61
Epoch [1/1], Step [12876/13804], Loss: 2.3011, Perplexity: 9.9856, time_taken_in_seconds: 62
Epoch [1/1], Step [12877/13804], Loss: 2.8631, Perplexity: 17.5159, time_taken_in_seconds: 62
Epoch [1/1], Step [12878/13804], Loss: 2.4375, Perplexity: 11.4449, time_taken_in_seconds: 63
Epoch [1/1], Step [12879/13804], Loss: 2.2277, Perplexity: 9.2789, time_taken_in_seconds: 64
Epoch [1/1], Step [12880/13804], Loss: 2.3540, Perplexity: 10.5275, time_taken_in_seconds: 65
Epoch [1/1], Step [12881/13804], Loss: 2.7426, Perplexity: 15.5275, time_taken_in_seconds: 66
Epoch [1/1], Step [12882/13804], Loss: 2.4051, Perplexity: 11.0791, time_taken_in_seconds: 66
Epoch [1/1], Step [12883/13804], Loss: 2.8128, Perplexity: 16.6570, time_taken_in_seconds: 67
Epoch [1/1], Step [12884/13804], Loss: 2.7695, Perplexity: 15.9500, time_taken_in_seconds: 68
Epoch [1/1], Step [12885/13804], Loss: 2.3978, Perplexity: 10.9985, time_taken_in_seconds: 69
Epoch [1/1], Step [12886/13804], Loss: 2.2521, Perplexity: 9.5079, time_taken_in_seconds: 70
Epoch [1/1], Step [12887/13804], Loss: 2.4242, Perplexity: 11.2934, time_taken_in_seconds: 71
Epoch [1/1], Step [12888/13804], Loss: 2.4760, Perplexity: 11.8935, time_taken_in_seconds: 71
Epoch [1/1], Step [12889/13804], Loss: 2.6382, Perplexity: 13.9881, time_taken_in_seconds: 72
Epoch [1/1], Step [12890/13804], Loss: 2.6043, Perplexity: 13.5218, time_taken_in_seconds: 73
Epoch [1/1], Step [12891/13804], Loss: 2.4623, Perplexity: 11.7321, time_taken_in_seconds: 74
Epoch [1/1], Step [12892/13804], Loss: 2.4438, Perplexity: 11.5164, time_taken_in_seconds: 75
Epoch [1/1], Step [12893/13804], Loss: 2.7710, Perplexity: 15.9743, time_taken_in_seconds: 75
Epoch [1/1], Step [12894/13804], Loss: 2.3191, Perplexity: 10.1663, time_taken_in_seconds: 76
Epoch [1/1], Step [12895/13804], Loss: 2.5217, Perplexity: 12.4497, time_taken_in_seconds: 77
Epoch [1/1], Step [12896/13804], Loss: 2.3948, Perplexity: 10.9661, time_taken_in_seconds: 78
Epoch [1/1], Step [12897/13804], Loss: 2.6919, Perplexity: 14.7593, time_taken_in_seconds: 79
Epoch [1/1], Step [12898/13804], Loss: 2.7150, Perplexity: 15.1042, time_taken_in_seconds: 80
Epoch [1/1], Step [12899/13804], Loss: 2.5099, Perplexity: 12.3032, time_taken_in_seconds: 80
Epoch [1/1], Step [12900/13804], Loss: 2.5138, Perplexity: 12.3520, time_taken_in_seconds: 81
Epoch [1/1], Step [12901/13804], Loss: 2.1545, Perplexity: 8.6235, time_taken_in_seconds: 0
Epoch [1/1], Step [12902/13804], Loss: 2.0942, Perplexity: 8.1188, time_taken_in_seconds: 1
Epoch [1/1], Step [12903/13804], Loss: 2.7840, Perplexity: 16.1841, time_taken_in_seconds: 2
Epoch [1/1], Step [12904/13804], Loss: 2.1041, Perplexity: 8.1993, time_taken_in_seconds: 3
Epoch [1/1], Step [12905/13804], Loss: 2.9319, Perplexity: 18.7630, time_taken_in_seconds: 4
Epoch [1/1], Step [12906/13804], Loss: 2.7108, Perplexity: 15.0416, time_taken_in_seconds: 4
Epoch [1/1], Step [12907/13804], Loss: 2.5209, Perplexity: 12.4400, time_taken_in_seconds: 5
Epoch [1/1], Step [12908/13804], Loss: 2.7485, Perplexity: 15.6195, time_taken_in_seconds: 6
Epoch [1/1], Step [12909/13804], Loss: 2.3471, Perplexity: 10.4548, time_taken_in_seconds: 7
Epoch [1/1], Step [12910/13804], Loss: 2.5157, Perplexity: 12.3755, time_taken_in_seconds: 8
Epoch [1/1], Step [12911/13804], Loss: 2.2275, Perplexity: 9.2763, time_taken_in_seconds: 8
Epoch [1/1], Step [12912/13804], Loss: 2.5476, Perplexity: 12.7762, time_taken_in_seconds: 9
Epoch [1/1], Step [12913/13804], Loss: 2.2569, Perplexity: 9.5536, time_taken_in_seconds: 10
Epoch [1/1], Step [12914/13804], Loss: 2.2739, Perplexity: 9.7172, time_taken_in_seconds: 11
Epoch [1/1], Step [12915/13804], Loss: 2.6553, Perplexity: 14.2293, time_taken_in_seconds: 12
Epoch [1/1], Step [12916/13804], Loss: 2.4844, Perplexity: 11.9942, time_taken_in_seconds: 12
Epoch [1/1], Step [12917/13804], Loss: 2.3106, Perplexity: 10.0802, time_taken_in_seconds: 13
Epoch [1/1], Step [12918/13804], Loss: 2.3449, Perplexity: 10.4324, time_taken_in_seconds: 14
Epoch [1/1], Step [12919/13804], Loss: 2.2597, Perplexity: 9.5798, time_taken_in_seconds: 15
Epoch [1/1], Step [12920/13804], Loss: 2.5276, Perplexity: 12.5237, time_taken_in_seconds: 16
Epoch [1/1], Step [12921/13804], Loss: 2.4543, Perplexity: 11.6377, time_taken_in_seconds: 17
Epoch [1/1], Step [12922/13804], Loss: 2.5202, Perplexity: 12.4307, time_taken_in_seconds: 17
Epoch [1/1], Step [12923/13804], Loss: 2.8762, Perplexity: 17.7464, time_taken_in_seconds: 18
Epoch [1/1], Step [12924/13804], Loss: 2.6376, Perplexity: 13.9793, time_taken_in_seconds: 19
Epoch [1/1], Step [12925/13804], Loss: 2.4636, Perplexity: 11.7473, time_taken_in_seconds: 20
Epoch [1/1], Step [12926/13804], Loss: 2.5196, Perplexity: 12.4238, time_taken_in_seconds: 21
Epoch [1/1], Step [12927/13804], Loss: 2.6669, Perplexity: 14.3955, time_taken_in_seconds: 21
Epoch [1/1], Step [12928/13804], Loss: 2.3663, Perplexity: 10.6578, time_taken_in_seconds: 22
Epoch [1/1], Step [12929/13804], Loss: 2.6313, Perplexity: 13.8923, time_taken_in_seconds: 23
Epoch [1/1], Step [12930/13804], Loss: 2.2000, Perplexity: 9.0246, time_taken_in_seconds: 24
Epoch [1/1], Step [12931/13804], Loss: 2.3685, Perplexity: 10.6816, time_taken_in_seconds: 25
Epoch [1/1], Step [12932/13804], Loss: 2.7963, Perplexity: 16.3845, time_taken_in_seconds: 26
Epoch [1/1], Step [12933/13804], Loss: 3.0693, Perplexity: 21.5266, time_taken_in_seconds: 26
Epoch [1/1], Step [12934/13804], Loss: 2.2355, Perplexity: 9.3514, time_taken_in_seconds: 27
Epoch [1/1], Step [12935/13804], Loss: 2.5838, Perplexity: 13.2473, time_taken_in_seconds: 28
Epoch [1/1], Step [12936/13804], Loss: 2.4147, Perplexity: 11.1864, time_taken_in_seconds: 29
Epoch [1/1], Step [12937/13804], Loss: 2.3838, Perplexity: 10.8457, time_taken_in_seconds: 30
Epoch [1/1], Step [12938/13804], Loss: 2.7357, Perplexity: 15.4206, time_taken_in_seconds: 31
Epoch [1/1], Step [12939/13804], Loss: 2.4393, Perplexity: 11.4655, time_taken_in_seconds: 31
Epoch [1/1], Step [12940/13804], Loss: 2.4238, Perplexity: 11.2892, time_taken_in_seconds: 32
Epoch [1/1], Step [12941/13804], Loss: 2.3177, Perplexity: 10.1520, time_taken_in_seconds: 33
Epoch [1/1], Step [12942/13804], Loss: 2.3086, Perplexity: 10.0606, time_taken_in_seconds: 34
Epoch [1/1], Step [12943/13804], Loss: 2.5487, Perplexity: 12.7903, time_taken_in_seconds: 35
Epoch [1/1], Step [12944/13804], Loss: 2.4326, Perplexity: 11.3886, time_taken_in_seconds: 35
Epoch [1/1], Step [12945/13804], Loss: 2.7237, Perplexity: 15.2367, time_taken_in_seconds: 36
Epoch [1/1], Step [12946/13804], Loss: 2.7880, Perplexity: 16.2489, time_taken_in_seconds: 37
Epoch [1/1], Step [12947/13804], Loss: 2.6123, Perplexity: 13.6308, time_taken_in_seconds: 38
Epoch [1/1], Step [12948/13804], Loss: 2.2261, Perplexity: 9.2633, time_taken_in_seconds: 39
Epoch [1/1], Step [12949/13804], Loss: 2.6311, Perplexity: 13.8890, time_taken_in_seconds: 39
Epoch [1/1], Step [12950/13804], Loss: 2.4876, Perplexity: 12.0321, time_taken_in_seconds: 40
Epoch [1/1], Step [12951/13804], Loss: 2.7380, Perplexity: 15.4553, time_taken_in_seconds: 41
Epoch [1/1], Step [12952/13804], Loss: 2.5439, Perplexity: 12.7291, time_taken_in_seconds: 42
Epoch [1/1], Step [12953/13804], Loss: 2.3971, Perplexity: 10.9911, time_taken_in_seconds: 43
Epoch [1/1], Step [12954/13804], Loss: 3.6494, Perplexity: 38.4529, time_taken_in_seconds: 44
Epoch [1/1], Step [12955/13804], Loss: 2.4932, Perplexity: 12.1001, time_taken_in_seconds: 44
Epoch [1/1], Step [12956/13804], Loss: 3.3049, Perplexity: 27.2465, time_taken_in_seconds: 45
Epoch [1/1], Step [12957/13804], Loss: 2.7372, Perplexity: 15.4431, time_taken_in_seconds: 46
Epoch [1/1], Step [12958/13804], Loss: 2.6398, Perplexity: 14.0105, time_taken_in_seconds: 47
Epoch [1/1], Step [12959/13804], Loss: 2.5210, Perplexity: 12.4406, time_taken_in_seconds: 48
Epoch [1/1], Step [12960/13804], Loss: 2.8371, Perplexity: 17.0668, time_taken_in_seconds: 48
Epoch [1/1], Step [12961/13804], Loss: 2.3053, Perplexity: 10.0267, time_taken_in_seconds: 49
Epoch [1/1], Step [12962/13804], Loss: 2.5953, Perplexity: 13.4006, time_taken_in_seconds: 50
Epoch [1/1], Step [12963/13804], Loss: 2.4769, Perplexity: 11.9048, time_taken_in_seconds: 51
Epoch [1/1], Step [12964/13804], Loss: 2.7124, Perplexity: 15.0647, time_taken_in_seconds: 52
Epoch [1/1], Step [12965/13804], Loss: 2.5603, Perplexity: 12.9396, time_taken_in_seconds: 52
Epoch [1/1], Step [12966/13804], Loss: 2.4503, Perplexity: 11.5922, time_taken_in_seconds: 53
Epoch [1/1], Step [12967/13804], Loss: 2.3466, Perplexity: 10.4503, time_taken_in_seconds: 54
Epoch [1/1], Step [12968/13804], Loss: 2.3942, Perplexity: 10.9598, time_taken_in_seconds: 55
Epoch [1/1], Step [12969/13804], Loss: 2.4113, Perplexity: 11.1486, time_taken_in_seconds: 56
Epoch [1/1], Step [12970/13804], Loss: 2.7159, Perplexity: 15.1187, time_taken_in_seconds: 56
Epoch [1/1], Step [12971/13804], Loss: 2.4497, Perplexity: 11.5846, time_taken_in_seconds: 57
Epoch [1/1], Step [12972/13804], Loss: 2.2463, Perplexity: 9.4528, time_taken_in_seconds: 58
Epoch [1/1], Step [12973/13804], Loss: 2.6098, Perplexity: 13.5957, time_taken_in_seconds: 59
Epoch [1/1], Step [12974/13804], Loss: 2.6988, Perplexity: 14.8613, time_taken_in_seconds: 60
Epoch [1/1], Step [12975/13804], Loss: 2.4476, Perplexity: 11.5609, time_taken_in_seconds: 61
Epoch [1/1], Step [12976/13804], Loss: 2.5391, Perplexity: 12.6684, time_taken_in_seconds: 61
Epoch [1/1], Step [12977/13804], Loss: 2.3349, Perplexity: 10.3288, time_taken_in_seconds: 62
Epoch [1/1], Step [12978/13804], Loss: 2.5977, Perplexity: 13.4334, time_taken_in_seconds: 63
Epoch [1/1], Step [12979/13804], Loss: 2.3238, Perplexity: 10.2148, time_taken_in_seconds: 64
Epoch [1/1], Step [12980/13804], Loss: 2.5122, Perplexity: 12.3324, time_taken_in_seconds: 65
Epoch [1/1], Step [12981/13804], Loss: 2.3897, Perplexity: 10.9102, time_taken_in_seconds: 65
Epoch [1/1], Step [12982/13804], Loss: 2.6768, Perplexity: 14.5390, time_taken_in_seconds: 66
Epoch [1/1], Step [12983/13804], Loss: 2.4324, Perplexity: 11.3856, time_taken_in_seconds: 67
Epoch [1/1], Step [12984/13804], Loss: 2.8120, Perplexity: 16.6434, time_taken_in_seconds: 68
Epoch [1/1], Step [12985/13804], Loss: 2.5443, Perplexity: 12.7337, time_taken_in_seconds: 69
Epoch [1/1], Step [12986/13804], Loss: 2.6658, Perplexity: 14.3801, time_taken_in_seconds: 69
Epoch [1/1], Step [12987/13804], Loss: 2.5600, Perplexity: 12.9357, time_taken_in_seconds: 70
Epoch [1/1], Step [12988/13804], Loss: 2.5809, Perplexity: 13.2096, time_taken_in_seconds: 71
Epoch [1/1], Step [12989/13804], Loss: 2.2347, Perplexity: 9.3441, time_taken_in_seconds: 72
Epoch [1/1], Step [12990/13804], Loss: 2.2074, Perplexity: 9.0925, time_taken_in_seconds: 73
Epoch [1/1], Step [12991/13804], Loss: 2.3435, Perplexity: 10.4174, time_taken_in_seconds: 73
Epoch [1/1], Step [12992/13804], Loss: 2.9336, Perplexity: 18.7945, time_taken_in_seconds: 74
Epoch [1/1], Step [12993/13804], Loss: 2.5677, Perplexity: 13.0355, time_taken_in_seconds: 75
Epoch [1/1], Step [12994/13804], Loss: 2.5197, Perplexity: 12.4249, time_taken_in_seconds: 76
Epoch [1/1], Step [12995/13804], Loss: 2.2595, Perplexity: 9.5785, time_taken_in_seconds: 77
Epoch [1/1], Step [12996/13804], Loss: 2.6792, Perplexity: 14.5737, time_taken_in_seconds: 78
Epoch [1/1], Step [12997/13804], Loss: 2.2377, Perplexity: 9.3716, time_taken_in_seconds: 78
Epoch [1/1], Step [12998/13804], Loss: 2.3066, Perplexity: 10.0407, time_taken_in_seconds: 79
Epoch [1/1], Step [12999/13804], Loss: 2.5609, Perplexity: 12.9477, time_taken_in_seconds: 80
Epoch [1/1], Step [13000/13804], Loss: 2.5371, Perplexity: 12.6427, time_taken_in_seconds: 81
Epoch [1/1], Step [13001/13804], Loss: 2.3339, Perplexity: 10.3183, time_taken_in_seconds: 0
Epoch [1/1], Step [13002/13804], Loss: 2.6580, Perplexity: 14.2678, time_taken_in_seconds: 1
Epoch [1/1], Step [13003/13804], Loss: 2.2694, Perplexity: 9.6736, time_taken_in_seconds: 2
Epoch [1/1], Step [13004/13804], Loss: 2.5266, Perplexity: 12.5107, time_taken_in_seconds: 3
Epoch [1/1], Step [13005/13804], Loss: 2.5003, Perplexity: 12.1863, time_taken_in_seconds: 4
Epoch [1/1], Step [13006/13804], Loss: 2.5275, Perplexity: 12.5218, time_taken_in_seconds: 4
Epoch [1/1], Step [13007/13804], Loss: 2.3933, Perplexity: 10.9495, time_taken_in_seconds: 5
Epoch [1/1], Step [13008/13804], Loss: 2.5524, Perplexity: 12.8378, time_taken_in_seconds: 6
Epoch [1/1], Step [13009/13804], Loss: 2.4587, Perplexity: 11.6895, time_taken_in_seconds: 7
Epoch [1/1], Step [13010/13804], Loss: 2.7067, Perplexity: 14.9804, time_taken_in_seconds: 8
Epoch [1/1], Step [13011/13804], Loss: 2.4549, Perplexity: 11.6453, time_taken_in_seconds: 9
Epoch [1/1], Step [13012/13804], Loss: 2.3813, Perplexity: 10.8191, time_taken_in_seconds: 9
Epoch [1/1], Step [13013/13804], Loss: 2.5803, Perplexity: 13.2013, time_taken_in_seconds: 10
Epoch [1/1], Step [13014/13804], Loss: 2.6532, Perplexity: 14.1997, time_taken_in_seconds: 11
Epoch [1/1], Step [13015/13804], Loss: 2.7012, Perplexity: 14.8979, time_taken_in_seconds: 12
Epoch [1/1], Step [13016/13804], Loss: 2.4440, Perplexity: 11.5192, time_taken_in_seconds: 13
Epoch [1/1], Step [13017/13804], Loss: 2.6892, Perplexity: 14.7196, time_taken_in_seconds: 13
Epoch [1/1], Step [13018/13804], Loss: 2.4011, Perplexity: 11.0357, time_taken_in_seconds: 14
Epoch [1/1], Step [13019/13804], Loss: 2.4389, Perplexity: 11.4599, time_taken_in_seconds: 15
Epoch [1/1], Step [13020/13804], Loss: 3.2325, Perplexity: 25.3428, time_taken_in_seconds: 16
Epoch [1/1], Step [13021/13804], Loss: 2.7699, Perplexity: 15.9565, time_taken_in_seconds: 17
Epoch [1/1], Step [13022/13804], Loss: 2.8910, Perplexity: 18.0115, time_taken_in_seconds: 18
Epoch [1/1], Step [13023/13804], Loss: 2.4379, Perplexity: 11.4492, time_taken_in_seconds: 18
Epoch [1/1], Step [13024/13804], Loss: 2.7610, Perplexity: 15.8155, time_taken_in_seconds: 19
Epoch [1/1], Step [13025/13804], Loss: 3.3992, Perplexity: 29.9409, time_taken_in_seconds: 20
Epoch [1/1], Step [13026/13804], Loss: 2.6463, Perplexity: 14.1020, time_taken_in_seconds: 21
Epoch [1/1], Step [13027/13804], Loss: 3.1042, Perplexity: 22.2915, time_taken_in_seconds: 22
Epoch [1/1], Step [13028/13804], Loss: 2.6001, Perplexity: 13.4650, time_taken_in_seconds: 22
Epoch [1/1], Step [13029/13804], Loss: 2.5347, Perplexity: 12.6124, time_taken_in_seconds: 23
Epoch [1/1], Step [13030/13804], Loss: 2.7766, Perplexity: 16.0640, time_taken_in_seconds: 24
Epoch [1/1], Step [13031/13804], Loss: 2.5026, Perplexity: 12.2146, time_taken_in_seconds: 25
Epoch [1/1], Step [13032/13804], Loss: 2.3982, Perplexity: 11.0031, time_taken_in_seconds: 26
Epoch [1/1], Step [13033/13804], Loss: 2.5812, Perplexity: 13.2125, time_taken_in_seconds: 27
Epoch [1/1], Step [13034/13804], Loss: 2.5720, Perplexity: 13.0919, time_taken_in_seconds: 27
Epoch [1/1], Step [13035/13804], Loss: 2.6346, Perplexity: 13.9381, time_taken_in_seconds: 28
Epoch [1/1], Step [13036/13804], Loss: 2.9326, Perplexity: 18.7757, time_taken_in_seconds: 29
Epoch [1/1], Step [13037/13804], Loss: 2.5380, Perplexity: 12.6545, time_taken_in_seconds: 30
Epoch [1/1], Step [13038/13804], Loss: 2.2584, Perplexity: 9.5680, time_taken_in_seconds: 31
Epoch [1/1], Step [13039/13804], Loss: 2.4850, Perplexity: 12.0017, time_taken_in_seconds: 31
Epoch [1/1], Step [13040/13804], Loss: 2.4096, Perplexity: 11.1294, time_taken_in_seconds: 32
Epoch [1/1], Step [13041/13804], Loss: 2.5159, Perplexity: 12.3773, time_taken_in_seconds: 33
Epoch [1/1], Step [13042/13804], Loss: 2.6198, Perplexity: 13.7325, time_taken_in_seconds: 34
Epoch [1/1], Step [13043/13804], Loss: 2.4468, Perplexity: 11.5514, time_taken_in_seconds: 35
Epoch [1/1], Step [13044/13804], Loss: 2.6113, Perplexity: 13.6169, time_taken_in_seconds: 35
Epoch [1/1], Step [13045/13804], Loss: 2.5432, Perplexity: 12.7205, time_taken_in_seconds: 36
Epoch [1/1], Step [13046/13804], Loss: 3.2212, Perplexity: 25.0593, time_taken_in_seconds: 37
Epoch [1/1], Step [13047/13804], Loss: 2.6269, Perplexity: 13.8312, time_taken_in_seconds: 38
Epoch [1/1], Step [13048/13804], Loss: 2.3360, Perplexity: 10.3397, time_taken_in_seconds: 39
Epoch [1/1], Step [13049/13804], Loss: 2.1823, Perplexity: 8.8663, time_taken_in_seconds: 40
Epoch [1/1], Step [13050/13804], Loss: 2.2638, Perplexity: 9.6196, time_taken_in_seconds: 40
Epoch [1/1], Step [13051/13804], Loss: 2.3410, Perplexity: 10.3913, time_taken_in_seconds: 41
Epoch [1/1], Step [13052/13804], Loss: 2.5600, Perplexity: 12.9360, time_taken_in_seconds: 42
Epoch [1/1], Step [13053/13804], Loss: 2.5642, Perplexity: 12.9900, time_taken_in_seconds: 43
Epoch [1/1], Step [13054/13804], Loss: 2.5388, Perplexity: 12.6647, time_taken_in_seconds: 44
Epoch [1/1], Step [13055/13804], Loss: 2.6349, Perplexity: 13.9417, time_taken_in_seconds: 44
Epoch [1/1], Step [13056/13804], Loss: 2.3903, Perplexity: 10.9170, time_taken_in_seconds: 45
Epoch [1/1], Step [13057/13804], Loss: 2.4927, Perplexity: 12.0940, time_taken_in_seconds: 46
Epoch [1/1], Step [13058/13804], Loss: 2.4855, Perplexity: 12.0070, time_taken_in_seconds: 47
Epoch [1/1], Step [13059/13804], Loss: 2.3184, Perplexity: 10.1596, time_taken_in_seconds: 48
Epoch [1/1], Step [13060/13804], Loss: 2.6353, Perplexity: 13.9474, time_taken_in_seconds: 48
Epoch [1/1], Step [13061/13804], Loss: 2.7459, Perplexity: 15.5787, time_taken_in_seconds: 49
Epoch [1/1], Step [13062/13804], Loss: 3.0958, Perplexity: 22.1043, time_taken_in_seconds: 50
Epoch [1/1], Step [13063/13804], Loss: 2.4092, Perplexity: 11.1248, time_taken_in_seconds: 51
Epoch [1/1], Step [13064/13804], Loss: 2.5079, Perplexity: 12.2790, time_taken_in_seconds: 52
Epoch [1/1], Step [13065/13804], Loss: 2.3135, Perplexity: 10.1098, time_taken_in_seconds: 52
Epoch [1/1], Step [13066/13804], Loss: 2.2046, Perplexity: 9.0669, time_taken_in_seconds: 53
Epoch [1/1], Step [13067/13804], Loss: 2.4046, Perplexity: 11.0735, time_taken_in_seconds: 54
Epoch [1/1], Step [13068/13804], Loss: 2.6372, Perplexity: 13.9745, time_taken_in_seconds: 55
Epoch [1/1], Step [13069/13804], Loss: 2.3705, Perplexity: 10.7023, time_taken_in_seconds: 56
Epoch [1/1], Step [13070/13804], Loss: 2.5811, Perplexity: 13.2111, time_taken_in_seconds: 57
Epoch [1/1], Step [13071/13804], Loss: 2.3923, Perplexity: 10.9387, time_taken_in_seconds: 57
Epoch [1/1], Step [13072/13804], Loss: 2.7712, Perplexity: 15.9773, time_taken_in_seconds: 58
Epoch [1/1], Step [13073/13804], Loss: 2.5634, Perplexity: 12.9803, time_taken_in_seconds: 59
Epoch [1/1], Step [13074/13804], Loss: 2.8019, Perplexity: 16.4758, time_taken_in_seconds: 60
Epoch [1/1], Step [13075/13804], Loss: 2.5147, Perplexity: 12.3633, time_taken_in_seconds: 61
Epoch [1/1], Step [13076/13804], Loss: 2.4980, Perplexity: 12.1577, time_taken_in_seconds: 61
Epoch [1/1], Step [13077/13804], Loss: 2.8453, Perplexity: 17.2066, time_taken_in_seconds: 62
Epoch [1/1], Step [13078/13804], Loss: 2.3820, Perplexity: 10.8263, time_taken_in_seconds: 63
Epoch [1/1], Step [13079/13804], Loss: 2.5481, Perplexity: 12.7832, time_taken_in_seconds: 64
Epoch [1/1], Step [13080/13804], Loss: 2.3525, Perplexity: 10.5121, time_taken_in_seconds: 65
Epoch [1/1], Step [13081/13804], Loss: 2.5265, Perplexity: 12.5095, time_taken_in_seconds: 65
Epoch [1/1], Step [13082/13804], Loss: 2.9711, Perplexity: 19.5135, time_taken_in_seconds: 66
Epoch [1/1], Step [13083/13804], Loss: 2.5386, Perplexity: 12.6614, time_taken_in_seconds: 67
Epoch [1/1], Step [13084/13804], Loss: 2.3407, Perplexity: 10.3887, time_taken_in_seconds: 68
Epoch [1/1], Step [13085/13804], Loss: 2.5879, Perplexity: 13.3021, time_taken_in_seconds: 69
Epoch [1/1], Step [13086/13804], Loss: 2.8163, Perplexity: 16.7148, time_taken_in_seconds: 70
Epoch [1/1], Step [13087/13804], Loss: 2.3276, Perplexity: 10.2537, time_taken_in_seconds: 70
Epoch [1/1], Step [13088/13804], Loss: 2.6733, Perplexity: 14.4870, time_taken_in_seconds: 71
Epoch [1/1], Step [13089/13804], Loss: 2.3642, Perplexity: 10.6358, time_taken_in_seconds: 72
Epoch [1/1], Step [13090/13804], Loss: 2.8148, Perplexity: 16.6899, time_taken_in_seconds: 73
Epoch [1/1], Step [13091/13804], Loss: 2.5218, Perplexity: 12.4515, time_taken_in_seconds: 74
Epoch [1/1], Step [13092/13804], Loss: 2.6017, Perplexity: 13.4871, time_taken_in_seconds: 75
Epoch [1/1], Step [13093/13804], Loss: 2.4645, Perplexity: 11.7572, time_taken_in_seconds: 75
Epoch [1/1], Step [13094/13804], Loss: 2.4787, Perplexity: 11.9261, time_taken_in_seconds: 76
Epoch [1/1], Step [13095/13804], Loss: 2.3811, Perplexity: 10.8167, time_taken_in_seconds: 77
Epoch [1/1], Step [13096/13804], Loss: 2.6806, Perplexity: 14.5935, time_taken_in_seconds: 78
Epoch [1/1], Step [13097/13804], Loss: 3.1099, Perplexity: 22.4182, time_taken_in_seconds: 79
Epoch [1/1], Step [13098/13804], Loss: 2.6778, Perplexity: 14.5527, time_taken_in_seconds: 79
Epoch [1/1], Step [13099/13804], Loss: 2.3469, Perplexity: 10.4534, time_taken_in_seconds: 80
Epoch [1/1], Step [13100/13804], Loss: 2.6304, Perplexity: 13.8789, time_taken_in_seconds: 81
Epoch [1/1], Step [13101/13804], Loss: 2.3708, Perplexity: 10.7056, time_taken_in_seconds: 0
Epoch [1/1], Step [13102/13804], Loss: 2.3874, Perplexity: 10.8849, time_taken_in_seconds: 1
Epoch [1/1], Step [13103/13804], Loss: 2.5838, Perplexity: 13.2474, time_taken_in_seconds: 2
Epoch [1/1], Step [13104/13804], Loss: 2.4256, Perplexity: 11.3090, time_taken_in_seconds: 3
Epoch [1/1], Step [13105/13804], Loss: 2.5408, Perplexity: 12.6901, time_taken_in_seconds: 4
Epoch [1/1], Step [13106/13804], Loss: 2.6144, Perplexity: 13.6595, time_taken_in_seconds: 4
Epoch [1/1], Step [13107/13804], Loss: 2.5011, Perplexity: 12.1955, time_taken_in_seconds: 5
Epoch [1/1], Step [13108/13804], Loss: 2.3783, Perplexity: 10.7863, time_taken_in_seconds: 6
Epoch [1/1], Step [13109/13804], Loss: 2.4439, Perplexity: 11.5176, time_taken_in_seconds: 7
Epoch [1/1], Step [13110/13804], Loss: 2.5275, Perplexity: 12.5216, time_taken_in_seconds: 8
Epoch [1/1], Step [13111/13804], Loss: 2.5536, Perplexity: 12.8527, time_taken_in_seconds: 8
Epoch [1/1], Step [13112/13804], Loss: 2.5612, Perplexity: 12.9512, time_taken_in_seconds: 9
Epoch [1/1], Step [13113/13804], Loss: 2.4894, Perplexity: 12.0538, time_taken_in_seconds: 10
Epoch [1/1], Step [13114/13804], Loss: 2.5326, Perplexity: 12.5861, time_taken_in_seconds: 11
Epoch [1/1], Step [13115/13804], Loss: 2.4484, Perplexity: 11.5701, time_taken_in_seconds: 12
Epoch [1/1], Step [13116/13804], Loss: 2.7013, Perplexity: 14.8985, time_taken_in_seconds: 13
Epoch [1/1], Step [13117/13804], Loss: 2.4710, Perplexity: 11.8343, time_taken_in_seconds: 13
Epoch [1/1], Step [13118/13804], Loss: 3.6514, Perplexity: 38.5294, time_taken_in_seconds: 14
Epoch [1/1], Step [13119/13804], Loss: 2.4336, Perplexity: 11.4003, time_taken_in_seconds: 15
Epoch [1/1], Step [13120/13804], Loss: 2.4041, Perplexity: 11.0680, time_taken_in_seconds: 16
Epoch [1/1], Step [13121/13804], Loss: 2.5611, Perplexity: 12.9504, time_taken_in_seconds: 17
Epoch [1/1], Step [13122/13804], Loss: 2.9766, Perplexity: 19.6202, time_taken_in_seconds: 18
Epoch [1/1], Step [13123/13804], Loss: 2.3654, Perplexity: 10.6478, time_taken_in_seconds: 18
Epoch [1/1], Step [13124/13804], Loss: 2.5426, Perplexity: 12.7126, time_taken_in_seconds: 19
Epoch [1/1], Step [13125/13804], Loss: 2.4813, Perplexity: 11.9569, time_taken_in_seconds: 20
Epoch [1/1], Step [13126/13804], Loss: 2.5128, Perplexity: 12.3398, time_taken_in_seconds: 21
Epoch [1/1], Step [13127/13804], Loss: 2.7229, Perplexity: 15.2238, time_taken_in_seconds: 22
Epoch [1/1], Step [13128/13804], Loss: 2.7629, Perplexity: 15.8460, time_taken_in_seconds: 22
Epoch [1/1], Step [13129/13804], Loss: 2.4518, Perplexity: 11.6088, time_taken_in_seconds: 23
Epoch [1/1], Step [13130/13804], Loss: 2.4747, Perplexity: 11.8784, time_taken_in_seconds: 24
Epoch [1/1], Step [13131/13804], Loss: 2.4745, Perplexity: 11.8753, time_taken_in_seconds: 25
Epoch [1/1], Step [13132/13804], Loss: 2.5866, Perplexity: 13.2852, time_taken_in_seconds: 26
Epoch [1/1], Step [13133/13804], Loss: 2.5907, Perplexity: 13.3394, time_taken_in_seconds: 27
Epoch [1/1], Step [13134/13804], Loss: 2.2931, Perplexity: 9.9061, time_taken_in_seconds: 27
Epoch [1/1], Step [13135/13804], Loss: 2.5676, Perplexity: 13.0345, time_taken_in_seconds: 28
Epoch [1/1], Step [13136/13804], Loss: 2.8396, Perplexity: 17.1097, time_taken_in_seconds: 29
Epoch [1/1], Step [13137/13804], Loss: 2.7470, Perplexity: 15.5950, time_taken_in_seconds: 30
Epoch [1/1], Step [13138/13804], Loss: 2.5264, Perplexity: 12.5079, time_taken_in_seconds: 31
Epoch [1/1], Step [13139/13804], Loss: 2.7891, Perplexity: 16.2659, time_taken_in_seconds: 31
Epoch [1/1], Step [13140/13804], Loss: 2.5732, Perplexity: 13.1079, time_taken_in_seconds: 32
Epoch [1/1], Step [13141/13804], Loss: 2.5217, Perplexity: 12.4496, time_taken_in_seconds: 33
Epoch [1/1], Step [13142/13804], Loss: 2.4337, Perplexity: 11.4008, time_taken_in_seconds: 34
Epoch [1/1], Step [13143/13804], Loss: 2.5078, Perplexity: 12.2774, time_taken_in_seconds: 35
Epoch [1/1], Step [13144/13804], Loss: 2.5556, Perplexity: 12.8784, time_taken_in_seconds: 36
Epoch [1/1], Step [13145/13804], Loss: 2.6411, Perplexity: 14.0282, time_taken_in_seconds: 36
Epoch [1/1], Step [13146/13804], Loss: 2.2849, Perplexity: 9.8248, time_taken_in_seconds: 37
Epoch [1/1], Step [13147/13804], Loss: 2.7199, Perplexity: 15.1786, time_taken_in_seconds: 38
Epoch [1/1], Step [13148/13804], Loss: 2.9158, Perplexity: 18.4637, time_taken_in_seconds: 39
Epoch [1/1], Step [13149/13804], Loss: 2.7493, Perplexity: 15.6317, time_taken_in_seconds: 40
Epoch [1/1], Step [13150/13804], Loss: 2.9574, Perplexity: 19.2485, time_taken_in_seconds: 40
Epoch [1/1], Step [13151/13804], Loss: 2.3526, Perplexity: 10.5133, time_taken_in_seconds: 41
Epoch [1/1], Step [13152/13804], Loss: 2.4596, Perplexity: 11.6998, time_taken_in_seconds: 42
Epoch [1/1], Step [13153/13804], Loss: 2.3039, Perplexity: 10.0129, time_taken_in_seconds: 43
Epoch [1/1], Step [13154/13804], Loss: 2.3044, Perplexity: 10.0177, time_taken_in_seconds: 44
Epoch [1/1], Step [13155/13804], Loss: 2.4933, Perplexity: 12.1009, time_taken_in_seconds: 44
Epoch [1/1], Step [13156/13804], Loss: 2.6785, Perplexity: 14.5637, time_taken_in_seconds: 45
Epoch [1/1], Step [13157/13804], Loss: 2.2795, Perplexity: 9.7721, time_taken_in_seconds: 46
Epoch [1/1], Step [13158/13804], Loss: 2.3539, Perplexity: 10.5264, time_taken_in_seconds: 47
Epoch [1/1], Step [13159/13804], Loss: 2.2257, Perplexity: 9.2602, time_taken_in_seconds: 48
Epoch [1/1], Step [13160/13804], Loss: 2.3014, Perplexity: 9.9880, time_taken_in_seconds: 49
Epoch [1/1], Step [13161/13804], Loss: 2.5527, Perplexity: 12.8412, time_taken_in_seconds: 50
Epoch [1/1], Step [13162/13804], Loss: 2.6151, Perplexity: 13.6689, time_taken_in_seconds: 50
Epoch [1/1], Step [13163/13804], Loss: 2.3903, Perplexity: 10.9173, time_taken_in_seconds: 51
Epoch [1/1], Step [13164/13804], Loss: 3.1983, Perplexity: 24.4919, time_taken_in_seconds: 52
Epoch [1/1], Step [13165/13804], Loss: 2.6576, Perplexity: 14.2622, time_taken_in_seconds: 53
Epoch [1/1], Step [13166/13804], Loss: 2.6291, Perplexity: 13.8610, time_taken_in_seconds: 54
Epoch [1/1], Step [13167/13804], Loss: 2.2396, Perplexity: 9.3892, time_taken_in_seconds: 55
Epoch [1/1], Step [13168/13804], Loss: 2.2324, Perplexity: 9.3224, time_taken_in_seconds: 55
Epoch [1/1], Step [13169/13804], Loss: 2.4676, Perplexity: 11.7941, time_taken_in_seconds: 56
Epoch [1/1], Step [13170/13804], Loss: 2.3776, Perplexity: 10.7789, time_taken_in_seconds: 57
Epoch [1/1], Step [13171/13804], Loss: 2.2558, Perplexity: 9.5431, time_taken_in_seconds: 58
Epoch [1/1], Step [13172/13804], Loss: 3.3337, Perplexity: 28.0417, time_taken_in_seconds: 59
Epoch [1/1], Step [13173/13804], Loss: 2.7620, Perplexity: 15.8308, time_taken_in_seconds: 59
Epoch [1/1], Step [13174/13804], Loss: 2.6082, Perplexity: 13.5750, time_taken_in_seconds: 60
Epoch [1/1], Step [13175/13804], Loss: 2.6134, Perplexity: 13.6455, time_taken_in_seconds: 61
Epoch [1/1], Step [13176/13804], Loss: 2.1931, Perplexity: 8.9631, time_taken_in_seconds: 62
Epoch [1/1], Step [13177/13804], Loss: 2.6920, Perplexity: 14.7605, time_taken_in_seconds: 63
Epoch [1/1], Step [13178/13804], Loss: 2.5545, Perplexity: 12.8647, time_taken_in_seconds: 63
Epoch [1/1], Step [13179/13804], Loss: 2.8081, Perplexity: 16.5782, time_taken_in_seconds: 64
Epoch [1/1], Step [13180/13804], Loss: 2.2286, Perplexity: 9.2870, time_taken_in_seconds: 65
Epoch [1/1], Step [13181/13804], Loss: 2.8989, Perplexity: 18.1537, time_taken_in_seconds: 66
Epoch [1/1], Step [13182/13804], Loss: 2.5043, Perplexity: 12.2354, time_taken_in_seconds: 67
Epoch [1/1], Step [13183/13804], Loss: 2.5614, Perplexity: 12.9540, time_taken_in_seconds: 68
Epoch [1/1], Step [13184/13804], Loss: 2.5002, Perplexity: 12.1853, time_taken_in_seconds: 68
Epoch [1/1], Step [13185/13804], Loss: 2.4018, Perplexity: 11.0425, time_taken_in_seconds: 69
Epoch [1/1], Step [13186/13804], Loss: 2.6396, Perplexity: 14.0082, time_taken_in_seconds: 70
Epoch [1/1], Step [13187/13804], Loss: 2.5371, Perplexity: 12.6429, time_taken_in_seconds: 71
Epoch [1/1], Step [13188/13804], Loss: 2.9077, Perplexity: 18.3143, time_taken_in_seconds: 72
Epoch [1/1], Step [13189/13804], Loss: 2.5217, Perplexity: 12.4497, time_taken_in_seconds: 72
Epoch [1/1], Step [13190/13804], Loss: 2.6042, Perplexity: 13.5208, time_taken_in_seconds: 73
Epoch [1/1], Step [13191/13804], Loss: 2.7337, Perplexity: 15.3903, time_taken_in_seconds: 74
Epoch [1/1], Step [13192/13804], Loss: 2.2497, Perplexity: 9.4850, time_taken_in_seconds: 75
Epoch [1/1], Step [13193/13804], Loss: 2.3669, Perplexity: 10.6641, time_taken_in_seconds: 76
Epoch [1/1], Step [13194/13804], Loss: 2.3944, Perplexity: 10.9613, time_taken_in_seconds: 76
Epoch [1/1], Step [13195/13804], Loss: 2.4433, Perplexity: 11.5112, time_taken_in_seconds: 77
Epoch [1/1], Step [13196/13804], Loss: 2.4099, Perplexity: 11.1326, time_taken_in_seconds: 78
Epoch [1/1], Step [13197/13804], Loss: 2.5081, Perplexity: 12.2818, time_taken_in_seconds: 79
Epoch [1/1], Step [13198/13804], Loss: 2.8545, Perplexity: 17.3651, time_taken_in_seconds: 80
Epoch [1/1], Step [13199/13804], Loss: 2.4233, Perplexity: 11.2826, time_taken_in_seconds: 81
Epoch [1/1], Step [13200/13804], Loss: 2.4647, Perplexity: 11.7601, time_taken_in_seconds: 81
Epoch [1/1], Step [13201/13804], Loss: 2.7056, Perplexity: 14.9633, time_taken_in_seconds: 0
Epoch [1/1], Step [13202/13804], Loss: 2.2670, Perplexity: 9.6503, time_taken_in_seconds: 1
Epoch [1/1], Step [13203/13804], Loss: 2.6067, Perplexity: 13.5539, time_taken_in_seconds: 2
Epoch [1/1], Step [13204/13804], Loss: 2.2838, Perplexity: 9.8142, time_taken_in_seconds: 3
Epoch [1/1], Step [13205/13804], Loss: 3.4529, Perplexity: 31.5907, time_taken_in_seconds: 4
Epoch [1/1], Step [13206/13804], Loss: 2.4916, Perplexity: 12.0806, time_taken_in_seconds: 4
Epoch [1/1], Step [13207/13804], Loss: 2.4918, Perplexity: 12.0826, time_taken_in_seconds: 5
Epoch [1/1], Step [13208/13804], Loss: 2.3996, Perplexity: 11.0183, time_taken_in_seconds: 6
Epoch [1/1], Step [13209/13804], Loss: 2.8897, Perplexity: 17.9878, time_taken_in_seconds: 7
Epoch [1/1], Step [13210/13804], Loss: 2.4973, Perplexity: 12.1492, time_taken_in_seconds: 8
Epoch [1/1], Step [13211/13804], Loss: 2.9795, Perplexity: 19.6784, time_taken_in_seconds: 8
Epoch [1/1], Step [13212/13804], Loss: 2.7965, Perplexity: 16.3868, time_taken_in_seconds: 9
Epoch [1/1], Step [13213/13804], Loss: 2.3134, Perplexity: 10.1090, time_taken_in_seconds: 10
Epoch [1/1], Step [13214/13804], Loss: 2.6419, Perplexity: 14.0394, time_taken_in_seconds: 11
Epoch [1/1], Step [13215/13804], Loss: 2.4472, Perplexity: 11.5564, time_taken_in_seconds: 12
Epoch [1/1], Step [13216/13804], Loss: 2.5676, Perplexity: 13.0343, time_taken_in_seconds: 13
Epoch [1/1], Step [13217/13804], Loss: 2.7148, Perplexity: 15.1012, time_taken_in_seconds: 13
Epoch [1/1], Step [13218/13804], Loss: 2.3763, Perplexity: 10.7652, time_taken_in_seconds: 14
Epoch [1/1], Step [13219/13804], Loss: 2.2809, Perplexity: 9.7858, time_taken_in_seconds: 15
Epoch [1/1], Step [13220/13804], Loss: 2.4996, Perplexity: 12.1781, time_taken_in_seconds: 16
Epoch [1/1], Step [13221/13804], Loss: 2.3893, Perplexity: 10.9055, time_taken_in_seconds: 17
Epoch [1/1], Step [13222/13804], Loss: 2.8707, Perplexity: 17.6496, time_taken_in_seconds: 18
Epoch [1/1], Step [13223/13804], Loss: 2.4390, Perplexity: 11.4612, time_taken_in_seconds: 18
Epoch [1/1], Step [13224/13804], Loss: 2.2955, Perplexity: 9.9297, time_taken_in_seconds: 19
Epoch [1/1], Step [13225/13804], Loss: 3.7548, Perplexity: 42.7243, time_taken_in_seconds: 20
Epoch [1/1], Step [13226/13804], Loss: 2.7735, Perplexity: 16.0139, time_taken_in_seconds: 21
Epoch [1/1], Step [13227/13804], Loss: 2.7581, Perplexity: 15.7695, time_taken_in_seconds: 22
Epoch [1/1], Step [13228/13804], Loss: 2.5131, Perplexity: 12.3426, time_taken_in_seconds: 22
Epoch [1/1], Step [13229/13804], Loss: 2.3066, Perplexity: 10.0406, time_taken_in_seconds: 23
Epoch [1/1], Step [13230/13804], Loss: 2.6815, Perplexity: 14.6068, time_taken_in_seconds: 24
Epoch [1/1], Step [13231/13804], Loss: 2.1242, Perplexity: 8.3665, time_taken_in_seconds: 25
Epoch [1/1], Step [13232/13804], Loss: 2.4300, Perplexity: 11.3593, time_taken_in_seconds: 26
Epoch [1/1], Step [13233/13804], Loss: 2.5829, Perplexity: 13.2360, time_taken_in_seconds: 27
Epoch [1/1], Step [13234/13804], Loss: 2.6817, Perplexity: 14.6092, time_taken_in_seconds: 27
Epoch [1/1], Step [13235/13804], Loss: 2.4227, Perplexity: 11.2767, time_taken_in_seconds: 28
Epoch [1/1], Step [13236/13804], Loss: 2.7152, Perplexity: 15.1074, time_taken_in_seconds: 29
Epoch [1/1], Step [13237/13804], Loss: 2.3817, Perplexity: 10.8232, time_taken_in_seconds: 30
Epoch [1/1], Step [13238/13804], Loss: 2.4600, Perplexity: 11.7044, time_taken_in_seconds: 31
Epoch [1/1], Step [13239/13804], Loss: 2.6811, Perplexity: 14.6014, time_taken_in_seconds: 31
Epoch [1/1], Step [13240/13804], Loss: 2.4124, Perplexity: 11.1602, time_taken_in_seconds: 32
Epoch [1/1], Step [13241/13804], Loss: 2.7651, Perplexity: 15.8801, time_taken_in_seconds: 33
Epoch [1/1], Step [13242/13804], Loss: 2.4432, Perplexity: 11.5096, time_taken_in_seconds: 34
Epoch [1/1], Step [13243/13804], Loss: 2.9833, Perplexity: 19.7528, time_taken_in_seconds: 35
Epoch [1/1], Step [13244/13804], Loss: 2.5581, Perplexity: 12.9116, time_taken_in_seconds: 35
Epoch [1/1], Step [13245/13804], Loss: 2.3693, Perplexity: 10.6897, time_taken_in_seconds: 36
Epoch [1/1], Step [13246/13804], Loss: 2.3949, Perplexity: 10.9673, time_taken_in_seconds: 37
Epoch [1/1], Step [13247/13804], Loss: 2.3921, Perplexity: 10.9368, time_taken_in_seconds: 38
Epoch [1/1], Step [13248/13804], Loss: 2.7796, Perplexity: 16.1122, time_taken_in_seconds: 39
Epoch [1/1], Step [13249/13804], Loss: 2.6658, Perplexity: 14.3791, time_taken_in_seconds: 40
Epoch [1/1], Step [13250/13804], Loss: 2.4648, Perplexity: 11.7612, time_taken_in_seconds: 40
Epoch [1/1], Step [13251/13804], Loss: 2.1460, Perplexity: 8.5509, time_taken_in_seconds: 41
Epoch [1/1], Step [13252/13804], Loss: 2.2223, Perplexity: 9.2283, time_taken_in_seconds: 42
Epoch [1/1], Step [13253/13804], Loss: 2.3671, Perplexity: 10.6665, time_taken_in_seconds: 43
Epoch [1/1], Step [13254/13804], Loss: 2.6023, Perplexity: 13.4944, time_taken_in_seconds: 44
Epoch [1/1], Step [13255/13804], Loss: 2.6411, Perplexity: 14.0280, time_taken_in_seconds: 44
Epoch [1/1], Step [13256/13804], Loss: 2.4225, Perplexity: 11.2744, time_taken_in_seconds: 45
Epoch [1/1], Step [13257/13804], Loss: 2.4995, Perplexity: 12.1766, time_taken_in_seconds: 46
Epoch [1/1], Step [13258/13804], Loss: 2.3554, Perplexity: 10.5429, time_taken_in_seconds: 47
Epoch [1/1], Step [13259/13804], Loss: 2.5731, Perplexity: 13.1060, time_taken_in_seconds: 48
Epoch [1/1], Step [13260/13804], Loss: 2.3088, Perplexity: 10.0622, time_taken_in_seconds: 49
Epoch [1/1], Step [13261/13804], Loss: 2.3113, Perplexity: 10.0872, time_taken_in_seconds: 49
Epoch [1/1], Step [13262/13804], Loss: 2.6709, Perplexity: 14.4531, time_taken_in_seconds: 50
Epoch [1/1], Step [13263/13804], Loss: 2.5873, Perplexity: 13.2939, time_taken_in_seconds: 51
Epoch [1/1], Step [13264/13804], Loss: 3.3105, Perplexity: 27.3981, time_taken_in_seconds: 52
Epoch [1/1], Step [13265/13804], Loss: 2.4788, Perplexity: 11.9274, time_taken_in_seconds: 53
Epoch [1/1], Step [13266/13804], Loss: 2.4632, Perplexity: 11.7425, time_taken_in_seconds: 53
Epoch [1/1], Step [13267/13804], Loss: 2.9656, Perplexity: 19.4069, time_taken_in_seconds: 54
Epoch [1/1], Step [13268/13804], Loss: 2.7110, Perplexity: 15.0444, time_taken_in_seconds: 55
Epoch [1/1], Step [13269/13804], Loss: 2.2564, Perplexity: 9.5486, time_taken_in_seconds: 56
Epoch [1/1], Step [13270/13804], Loss: 2.7339, Perplexity: 15.3935, time_taken_in_seconds: 57
Epoch [1/1], Step [13271/13804], Loss: 2.6818, Perplexity: 14.6107, time_taken_in_seconds: 58
Epoch [1/1], Step [13272/13804], Loss: 2.6633, Perplexity: 14.3433, time_taken_in_seconds: 58
Epoch [1/1], Step [13273/13804], Loss: 2.2403, Perplexity: 9.3958, time_taken_in_seconds: 59
Epoch [1/1], Step [13274/13804], Loss: 3.0841, Perplexity: 21.8472, time_taken_in_seconds: 60
Epoch [1/1], Step [13275/13804], Loss: 2.3465, Perplexity: 10.4492, time_taken_in_seconds: 61
Epoch [1/1], Step [13276/13804], Loss: 3.2386, Perplexity: 25.4977, time_taken_in_seconds: 62
Epoch [1/1], Step [13277/13804], Loss: 2.4526, Perplexity: 11.6189, time_taken_in_seconds: 62
Epoch [1/1], Step [13278/13804], Loss: 2.5244, Perplexity: 12.4829, time_taken_in_seconds: 63
Epoch [1/1], Step [13279/13804], Loss: 2.9421, Perplexity: 18.9560, time_taken_in_seconds: 64
Epoch [1/1], Step [13280/13804], Loss: 2.5012, Perplexity: 12.1968, time_taken_in_seconds: 65
Epoch [1/1], Step [13281/13804], Loss: 2.6297, Perplexity: 13.8689, time_taken_in_seconds: 66
Epoch [1/1], Step [13282/13804], Loss: 2.4055, Perplexity: 11.0835, time_taken_in_seconds: 67
Epoch [1/1], Step [13283/13804], Loss: 2.6658, Perplexity: 14.3795, time_taken_in_seconds: 67
Epoch [1/1], Step [13284/13804], Loss: 2.4672, Perplexity: 11.7897, time_taken_in_seconds: 68
Epoch [1/1], Step [13285/13804], Loss: 2.6615, Perplexity: 14.3184, time_taken_in_seconds: 69
Epoch [1/1], Step [13286/13804], Loss: 2.4024, Perplexity: 11.0499, time_taken_in_seconds: 70
Epoch [1/1], Step [13287/13804], Loss: 2.3781, Perplexity: 10.7840, time_taken_in_seconds: 71
Epoch [1/1], Step [13288/13804], Loss: 2.3917, Perplexity: 10.9315, time_taken_in_seconds: 72
Epoch [1/1], Step [13289/13804], Loss: 2.4182, Perplexity: 11.2254, time_taken_in_seconds: 72
Epoch [1/1], Step [13290/13804], Loss: 2.5540, Perplexity: 12.8589, time_taken_in_seconds: 73
Epoch [1/1], Step [13291/13804], Loss: 2.8576, Perplexity: 17.4191, time_taken_in_seconds: 74
Epoch [1/1], Step [13292/13804], Loss: 2.6205, Perplexity: 13.7426, time_taken_in_seconds: 75
Epoch [1/1], Step [13293/13804], Loss: 2.3638, Perplexity: 10.6317, time_taken_in_seconds: 76
Epoch [1/1], Step [13294/13804], Loss: 2.0991, Perplexity: 8.1588, time_taken_in_seconds: 76
Epoch [1/1], Step [13295/13804], Loss: 2.5381, Perplexity: 12.6555, time_taken_in_seconds: 77
Epoch [1/1], Step [13296/13804], Loss: 2.2027, Perplexity: 9.0498, time_taken_in_seconds: 78
Epoch [1/1], Step [13297/13804], Loss: 2.1964, Perplexity: 8.9930, time_taken_in_seconds: 79
Epoch [1/1], Step [13298/13804], Loss: 2.8455, Perplexity: 17.2102, time_taken_in_seconds: 80
Epoch [1/1], Step [13299/13804], Loss: 2.1899, Perplexity: 8.9342, time_taken_in_seconds: 81
Epoch [1/1], Step [13300/13804], Loss: 2.1306, Perplexity: 8.4199, time_taken_in_seconds: 81
Epoch [1/1], Step [13301/13804], Loss: 2.2303, Perplexity: 9.3031, time_taken_in_seconds: 0
Epoch [1/1], Step [13302/13804], Loss: 2.2766, Perplexity: 9.7433, time_taken_in_seconds: 1
Epoch [1/1], Step [13303/13804], Loss: 2.3737, Perplexity: 10.7368, time_taken_in_seconds: 2
Epoch [1/1], Step [13304/13804], Loss: 2.8562, Perplexity: 17.3945, time_taken_in_seconds: 3
Epoch [1/1], Step [13305/13804], Loss: 2.7326, Perplexity: 15.3733, time_taken_in_seconds: 4
Epoch [1/1], Step [13306/13804], Loss: 2.5928, Perplexity: 13.3665, time_taken_in_seconds: 5
Epoch [1/1], Step [13307/13804], Loss: 2.2835, Perplexity: 9.8113, time_taken_in_seconds: 5
Epoch [1/1], Step [13308/13804], Loss: 2.6181, Perplexity: 13.7096, time_taken_in_seconds: 6
Epoch [1/1], Step [13309/13804], Loss: 2.6250, Perplexity: 13.8041, time_taken_in_seconds: 7
Epoch [1/1], Step [13310/13804], Loss: 2.7052, Perplexity: 14.9578, time_taken_in_seconds: 8
Epoch [1/1], Step [13311/13804], Loss: 2.2633, Perplexity: 9.6147, time_taken_in_seconds: 9
Epoch [1/1], Step [13312/13804], Loss: 2.3819, Perplexity: 10.8255, time_taken_in_seconds: 9
Epoch [1/1], Step [13313/13804], Loss: 2.2551, Perplexity: 9.5363, time_taken_in_seconds: 10
Epoch [1/1], Step [13314/13804], Loss: 2.3920, Perplexity: 10.9349, time_taken_in_seconds: 11
Epoch [1/1], Step [13315/13804], Loss: 2.2329, Perplexity: 9.3273, time_taken_in_seconds: 12
Epoch [1/1], Step [13316/13804], Loss: 2.4599, Perplexity: 11.7031, time_taken_in_seconds: 13
Epoch [1/1], Step [13317/13804], Loss: 2.6227, Perplexity: 13.7731, time_taken_in_seconds: 14
Epoch [1/1], Step [13318/13804], Loss: 2.4424, Perplexity: 11.5003, time_taken_in_seconds: 14
Epoch [1/1], Step [13319/13804], Loss: 2.4057, Perplexity: 11.0857, time_taken_in_seconds: 15
Epoch [1/1], Step [13320/13804], Loss: 2.4091, Perplexity: 11.1242, time_taken_in_seconds: 16
Epoch [1/1], Step [13321/13804], Loss: 3.1499, Perplexity: 23.3348, time_taken_in_seconds: 17
Epoch [1/1], Step [13322/13804], Loss: 2.5793, Perplexity: 13.1876, time_taken_in_seconds: 18
Epoch [1/1], Step [13323/13804], Loss: 2.1722, Perplexity: 8.7780, time_taken_in_seconds: 18
Epoch [1/1], Step [13324/13804], Loss: 2.4218, Perplexity: 11.2665, time_taken_in_seconds: 19
Epoch [1/1], Step [13325/13804], Loss: 2.4744, Perplexity: 11.8749, time_taken_in_seconds: 20
Epoch [1/1], Step [13326/13804], Loss: 2.4193, Perplexity: 11.2379, time_taken_in_seconds: 21
Epoch [1/1], Step [13327/13804], Loss: 2.3296, Perplexity: 10.2743, time_taken_in_seconds: 22
Epoch [1/1], Step [13328/13804], Loss: 2.3101, Perplexity: 10.0750, time_taken_in_seconds: 22
Epoch [1/1], Step [13329/13804], Loss: 2.5586, Perplexity: 12.9171, time_taken_in_seconds: 23
Epoch [1/1], Step [13330/13804], Loss: 2.5922, Perplexity: 13.3589, time_taken_in_seconds: 24
Epoch [1/1], Step [13331/13804], Loss: 2.2917, Perplexity: 9.8914, time_taken_in_seconds: 25
Epoch [1/1], Step [13332/13804], Loss: 2.3966, Perplexity: 10.9861, time_taken_in_seconds: 26
Epoch [1/1], Step [13333/13804], Loss: 2.6529, Perplexity: 14.1957, time_taken_in_seconds: 26
Epoch [1/1], Step [13334/13804], Loss: 2.6257, Perplexity: 13.8136, time_taken_in_seconds: 27
Epoch [1/1], Step [13335/13804], Loss: 2.3530, Perplexity: 10.5173, time_taken_in_seconds: 28
Epoch [1/1], Step [13336/13804], Loss: 2.5030, Perplexity: 12.2188, time_taken_in_seconds: 29
Epoch [1/1], Step [13337/13804], Loss: 2.6894, Perplexity: 14.7235, time_taken_in_seconds: 30
Epoch [1/1], Step [13338/13804], Loss: 2.4871, Perplexity: 12.0263, time_taken_in_seconds: 31
Epoch [1/1], Step [13339/13804], Loss: 2.5330, Perplexity: 12.5914, time_taken_in_seconds: 31
Epoch [1/1], Step [13340/13804], Loss: 2.9554, Perplexity: 19.2091, time_taken_in_seconds: 32
Epoch [1/1], Step [13341/13804], Loss: 2.5425, Perplexity: 12.7113, time_taken_in_seconds: 33
Epoch [1/1], Step [13342/13804], Loss: 2.3342, Perplexity: 10.3217, time_taken_in_seconds: 34
Epoch [1/1], Step [13343/13804], Loss: 3.0590, Perplexity: 21.3061, time_taken_in_seconds: 35
Epoch [1/1], Step [13344/13804], Loss: 2.4955, Perplexity: 12.1274, time_taken_in_seconds: 35
Epoch [1/1], Step [13345/13804], Loss: 2.5660, Perplexity: 13.0142, time_taken_in_seconds: 36
Epoch [1/1], Step [13346/13804], Loss: 3.1013, Perplexity: 22.2258, time_taken_in_seconds: 37
Epoch [1/1], Step [13347/13804], Loss: 2.5760, Perplexity: 13.1438, time_taken_in_seconds: 38
Epoch [1/1], Step [13348/13804], Loss: 2.2305, Perplexity: 9.3048, time_taken_in_seconds: 39
Epoch [1/1], Step [13349/13804], Loss: 2.3255, Perplexity: 10.2321, time_taken_in_seconds: 40
Epoch [1/1], Step [13350/13804], Loss: 2.6299, Perplexity: 13.8720, time_taken_in_seconds: 40
Epoch [1/1], Step [13351/13804], Loss: 2.2450, Perplexity: 9.4405, time_taken_in_seconds: 41
Epoch [1/1], Step [13352/13804], Loss: 2.3687, Perplexity: 10.6831, time_taken_in_seconds: 42
Epoch [1/1], Step [13353/13804], Loss: 2.5072, Perplexity: 12.2711, time_taken_in_seconds: 43
Epoch [1/1], Step [13354/13804], Loss: 2.4488, Perplexity: 11.5750, time_taken_in_seconds: 44
Epoch [1/1], Step [13355/13804], Loss: 2.5438, Perplexity: 12.7276, time_taken_in_seconds: 44
Epoch [1/1], Step [13356/13804], Loss: 2.5169, Perplexity: 12.3896, time_taken_in_seconds: 45
Epoch [1/1], Step [13357/13804], Loss: 2.1455, Perplexity: 8.5460, time_taken_in_seconds: 46
Epoch [1/1], Step [13358/13804], Loss: 2.1607, Perplexity: 8.6776, time_taken_in_seconds: 47
Epoch [1/1], Step [13359/13804], Loss: 2.0858, Perplexity: 8.0507, time_taken_in_seconds: 48
Epoch [1/1], Step [13360/13804], Loss: 2.4484, Perplexity: 11.5693, time_taken_in_seconds: 49
Epoch [1/1], Step [13361/13804], Loss: 2.6340, Perplexity: 13.9288, time_taken_in_seconds: 49
Epoch [1/1], Step [13362/13804], Loss: 2.6722, Perplexity: 14.4721, time_taken_in_seconds: 50
Epoch [1/1], Step [13363/13804], Loss: 2.7134, Perplexity: 15.0809, time_taken_in_seconds: 51
Epoch [1/1], Step [13364/13804], Loss: 2.6841, Perplexity: 14.6450, time_taken_in_seconds: 52
Epoch [1/1], Step [13365/13804], Loss: 2.5301, Perplexity: 12.5554, time_taken_in_seconds: 53
Epoch [1/1], Step [13366/13804], Loss: 2.4253, Perplexity: 11.3057, time_taken_in_seconds: 53
Epoch [1/1], Step [13367/13804], Loss: 2.2684, Perplexity: 9.6638, time_taken_in_seconds: 54
Epoch [1/1], Step [13368/13804], Loss: 2.4605, Perplexity: 11.7104, time_taken_in_seconds: 55
Epoch [1/1], Step [13369/13804], Loss: 2.3322, Perplexity: 10.3004, time_taken_in_seconds: 56
Epoch [1/1], Step [13370/13804], Loss: 2.3790, Perplexity: 10.7940, time_taken_in_seconds: 57
Epoch [1/1], Step [13371/13804], Loss: 2.6084, Perplexity: 13.5776, time_taken_in_seconds: 57
Epoch [1/1], Step [13372/13804], Loss: 2.5483, Perplexity: 12.7848, time_taken_in_seconds: 58
Epoch [1/1], Step [13373/13804], Loss: 2.7961, Perplexity: 16.3805, time_taken_in_seconds: 59
Epoch [1/1], Step [13374/13804], Loss: 2.1601, Perplexity: 8.6721, time_taken_in_seconds: 60
Epoch [1/1], Step [13375/13804], Loss: 2.7026, Perplexity: 14.9180, time_taken_in_seconds: 61
Epoch [1/1], Step [13376/13804], Loss: 2.5624, Perplexity: 12.9668, time_taken_in_seconds: 62
Epoch [1/1], Step [13377/13804], Loss: 2.7279, Perplexity: 15.3001, time_taken_in_seconds: 62
Epoch [1/1], Step [13378/13804], Loss: 2.4326, Perplexity: 11.3882, time_taken_in_seconds: 63
Epoch [1/1], Step [13379/13804], Loss: 2.3661, Perplexity: 10.6555, time_taken_in_seconds: 64
Epoch [1/1], Step [13380/13804], Loss: 2.4683, Perplexity: 11.8020, time_taken_in_seconds: 65
Epoch [1/1], Step [13381/13804], Loss: 2.5551, Perplexity: 12.8729, time_taken_in_seconds: 66
Epoch [1/1], Step [13382/13804], Loss: 2.6459, Perplexity: 14.0959, time_taken_in_seconds: 67
Epoch [1/1], Step [13383/13804], Loss: 2.4078, Perplexity: 11.1094, time_taken_in_seconds: 68
Epoch [1/1], Step [13384/13804], Loss: 2.4566, Perplexity: 11.6651, time_taken_in_seconds: 68
Epoch [1/1], Step [13385/13804], Loss: 2.4777, Perplexity: 11.9142, time_taken_in_seconds: 69
Epoch [1/1], Step [13386/13804], Loss: 2.5461, Perplexity: 12.7578, time_taken_in_seconds: 70
Epoch [1/1], Step [13387/13804], Loss: 2.5951, Perplexity: 13.3985, time_taken_in_seconds: 71
Epoch [1/1], Step [13388/13804], Loss: 2.2677, Perplexity: 9.6576, time_taken_in_seconds: 72
Epoch [1/1], Step [13389/13804], Loss: 2.5416, Perplexity: 12.7001, time_taken_in_seconds: 72
Epoch [1/1], Step [13390/13804], Loss: 2.5958, Perplexity: 13.4075, time_taken_in_seconds: 73
Epoch [1/1], Step [13391/13804], Loss: 2.4898, Perplexity: 12.0591, time_taken_in_seconds: 74
Epoch [1/1], Step [13392/13804], Loss: 2.2144, Perplexity: 9.1564, time_taken_in_seconds: 75
Epoch [1/1], Step [13393/13804], Loss: 2.5569, Perplexity: 12.8957, time_taken_in_seconds: 76
Epoch [1/1], Step [13394/13804], Loss: 2.8169, Perplexity: 16.7255, time_taken_in_seconds: 77
Epoch [1/1], Step [13395/13804], Loss: 2.2380, Perplexity: 9.3749, time_taken_in_seconds: 77
Epoch [1/1], Step [13396/13804], Loss: 2.4036, Perplexity: 11.0629, time_taken_in_seconds: 78
Epoch [1/1], Step [13397/13804], Loss: 2.9110, Perplexity: 18.3751, time_taken_in_seconds: 79
Epoch [1/1], Step [13398/13804], Loss: 2.5227, Perplexity: 12.4622, time_taken_in_seconds: 80
Epoch [1/1], Step [13399/13804], Loss: 2.3988, Perplexity: 11.0101, time_taken_in_seconds: 81
Epoch [1/1], Step [13400/13804], Loss: 2.6541, Perplexity: 14.2117, time_taken_in_seconds: 81
Epoch [1/1], Step [13401/13804], Loss: 2.7914, Perplexity: 16.3031, time_taken_in_seconds: 0
Epoch [1/1], Step [13402/13804], Loss: 2.5675, Perplexity: 13.0328, time_taken_in_seconds: 1
Epoch [1/1], Step [13403/13804], Loss: 2.3159, Perplexity: 10.1341, time_taken_in_seconds: 2
Epoch [1/1], Step [13404/13804], Loss: 2.3718, Perplexity: 10.7166, time_taken_in_seconds: 3
Epoch [1/1], Step [13405/13804], Loss: 2.6138, Perplexity: 13.6503, time_taken_in_seconds: 4
Epoch [1/1], Step [13406/13804], Loss: 2.4613, Perplexity: 11.7197, time_taken_in_seconds: 4
Epoch [1/1], Step [13407/13804], Loss: 2.8886, Perplexity: 17.9686, time_taken_in_seconds: 5
Epoch [1/1], Step [13408/13804], Loss: 2.4356, Perplexity: 11.4221, time_taken_in_seconds: 6
Epoch [1/1], Step [13409/13804], Loss: 2.5916, Perplexity: 13.3509, time_taken_in_seconds: 7
Epoch [1/1], Step [13410/13804], Loss: 2.3621, Perplexity: 10.6127, time_taken_in_seconds: 8
Epoch [1/1], Step [13411/13804], Loss: 2.5013, Perplexity: 12.1983, time_taken_in_seconds: 9
Epoch [1/1], Step [13412/13804], Loss: 2.4122, Perplexity: 11.1586, time_taken_in_seconds: 9
Epoch [1/1], Step [13413/13804], Loss: 2.2440, Perplexity: 9.4308, time_taken_in_seconds: 10
Epoch [1/1], Step [13414/13804], Loss: 2.3664, Perplexity: 10.6592, time_taken_in_seconds: 11
Epoch [1/1], Step [13415/13804], Loss: 2.1808, Perplexity: 8.8537, time_taken_in_seconds: 12
Epoch [1/1], Step [13416/13804], Loss: 2.5689, Perplexity: 13.0517, time_taken_in_seconds: 13
Epoch [1/1], Step [13417/13804], Loss: 2.9458, Perplexity: 19.0266, time_taken_in_seconds: 13
Epoch [1/1], Step [13418/13804], Loss: 2.3000, Perplexity: 9.9743, time_taken_in_seconds: 14
Epoch [1/1], Step [13419/13804], Loss: 2.5775, Perplexity: 13.1646, time_taken_in_seconds: 15
Epoch [1/1], Step [13420/13804], Loss: 2.3821, Perplexity: 10.8274, time_taken_in_seconds: 16
Epoch [1/1], Step [13421/13804], Loss: 2.8464, Perplexity: 17.2257, time_taken_in_seconds: 17
Epoch [1/1], Step [13422/13804], Loss: 2.3900, Perplexity: 10.9138, time_taken_in_seconds: 18
Epoch [1/1], Step [13423/13804], Loss: 2.2819, Perplexity: 9.7954, time_taken_in_seconds: 18
Epoch [1/1], Step [13424/13804], Loss: 2.3372, Perplexity: 10.3527, time_taken_in_seconds: 19
Epoch [1/1], Step [13425/13804], Loss: 2.5504, Perplexity: 12.8119, time_taken_in_seconds: 20
Epoch [1/1], Step [13426/13804], Loss: 2.9735, Perplexity: 19.5610, time_taken_in_seconds: 21
Epoch [1/1], Step [13427/13804], Loss: 2.3702, Perplexity: 10.6992, time_taken_in_seconds: 22
Epoch [1/1], Step [13428/13804], Loss: 2.5153, Perplexity: 12.3708, time_taken_in_seconds: 22
Epoch [1/1], Step [13429/13804], Loss: 2.4078, Perplexity: 11.1093, time_taken_in_seconds: 23
Epoch [1/1], Step [13430/13804], Loss: 2.2301, Perplexity: 9.3010, time_taken_in_seconds: 24
Epoch [1/1], Step [13431/13804], Loss: 2.2193, Perplexity: 9.2011, time_taken_in_seconds: 25
Epoch [1/1], Step [13432/13804], Loss: 2.4360, Perplexity: 11.4275, time_taken_in_seconds: 26
Epoch [1/1], Step [13433/13804], Loss: 2.6799, Perplexity: 14.5836, time_taken_in_seconds: 26
Epoch [1/1], Step [13434/13804], Loss: 2.5369, Perplexity: 12.6403, time_taken_in_seconds: 27
Epoch [1/1], Step [13435/13804], Loss: 2.2586, Perplexity: 9.5697, time_taken_in_seconds: 28
Epoch [1/1], Step [13436/13804], Loss: 2.5274, Perplexity: 12.5213, time_taken_in_seconds: 29
Epoch [1/1], Step [13437/13804], Loss: 2.6235, Perplexity: 13.7836, time_taken_in_seconds: 30
Epoch [1/1], Step [13438/13804], Loss: 2.8006, Perplexity: 16.4540, time_taken_in_seconds: 31
Epoch [1/1], Step [13439/13804], Loss: 2.3593, Perplexity: 10.5839, time_taken_in_seconds: 31
Epoch [1/1], Step [13440/13804], Loss: 2.4196, Perplexity: 11.2413, time_taken_in_seconds: 32
Epoch [1/1], Step [13441/13804], Loss: 2.2700, Perplexity: 9.6789, time_taken_in_seconds: 33
Epoch [1/1], Step [13442/13804], Loss: 2.9661, Perplexity: 19.4166, time_taken_in_seconds: 34
Epoch [1/1], Step [13443/13804], Loss: 2.4105, Perplexity: 11.1394, time_taken_in_seconds: 35
Epoch [1/1], Step [13444/13804], Loss: 2.6979, Perplexity: 14.8490, time_taken_in_seconds: 36
Epoch [1/1], Step [13445/13804], Loss: 2.4333, Perplexity: 11.3960, time_taken_in_seconds: 36
Epoch [1/1], Step [13446/13804], Loss: 2.3880, Perplexity: 10.8916, time_taken_in_seconds: 37
Epoch [1/1], Step [13447/13804], Loss: 2.2668, Perplexity: 9.6482, time_taken_in_seconds: 38
Epoch [1/1], Step [13448/13804], Loss: 2.2086, Perplexity: 9.1028, time_taken_in_seconds: 39
Epoch [1/1], Step [13449/13804], Loss: 2.2998, Perplexity: 9.9719, time_taken_in_seconds: 40
Epoch [1/1], Step [13450/13804], Loss: 3.3669, Perplexity: 28.9876, time_taken_in_seconds: 40
Epoch [1/1], Step [13451/13804], Loss: 2.7524, Perplexity: 15.6808, time_taken_in_seconds: 41
Epoch [1/1], Step [13452/13804], Loss: 2.7776, Perplexity: 16.0797, time_taken_in_seconds: 42
Epoch [1/1], Step [13453/13804], Loss: 2.4606, Perplexity: 11.7118, time_taken_in_seconds: 43
Epoch [1/1], Step [13454/13804], Loss: 2.5708, Perplexity: 13.0763, time_taken_in_seconds: 44
Epoch [1/1], Step [13455/13804], Loss: 2.3537, Perplexity: 10.5248, time_taken_in_seconds: 45
Epoch [1/1], Step [13456/13804], Loss: 2.6755, Perplexity: 14.5202, time_taken_in_seconds: 46
Epoch [1/1], Step [13457/13804], Loss: 2.3391, Perplexity: 10.3723, time_taken_in_seconds: 46
Epoch [1/1], Step [13458/13804], Loss: 2.3155, Perplexity: 10.1297, time_taken_in_seconds: 47
Epoch [1/1], Step [13459/13804], Loss: 2.3167, Perplexity: 10.1425, time_taken_in_seconds: 48
Epoch [1/1], Step [13460/13804], Loss: 2.2852, Perplexity: 9.8272, time_taken_in_seconds: 49
Epoch [1/1], Step [13461/13804], Loss: 2.5428, Perplexity: 12.7147, time_taken_in_seconds: 50
Epoch [1/1], Step [13462/13804], Loss: 2.4117, Perplexity: 11.1528, time_taken_in_seconds: 51
Epoch [1/1], Step [13463/13804], Loss: 2.5315, Perplexity: 12.5723, time_taken_in_seconds: 51
Epoch [1/1], Step [13464/13804], Loss: 2.6447, Perplexity: 14.0788, time_taken_in_seconds: 52
Epoch [1/1], Step [13465/13804], Loss: 2.3926, Perplexity: 10.9414, time_taken_in_seconds: 53
Epoch [1/1], Step [13466/13804], Loss: 2.7377, Perplexity: 15.4512, time_taken_in_seconds: 54
Epoch [1/1], Step [13467/13804], Loss: 2.6714, Perplexity: 14.4606, time_taken_in_seconds: 55
Epoch [1/1], Step [13468/13804], Loss: 2.3829, Perplexity: 10.8363, time_taken_in_seconds: 55
Epoch [1/1], Step [13469/13804], Loss: 2.6805, Perplexity: 14.5925, time_taken_in_seconds: 56
Epoch [1/1], Step [13470/13804], Loss: 2.6997, Perplexity: 14.8756, time_taken_in_seconds: 57
Epoch [1/1], Step [13471/13804], Loss: 3.1163, Perplexity: 22.5625, time_taken_in_seconds: 58
Epoch [1/1], Step [13472/13804], Loss: 2.8068, Perplexity: 16.5570, time_taken_in_seconds: 59
Epoch [1/1], Step [13473/13804], Loss: 2.7095, Perplexity: 15.0223, time_taken_in_seconds: 60
Epoch [1/1], Step [13474/13804], Loss: 2.8183, Perplexity: 16.7476, time_taken_in_seconds: 60
Epoch [1/1], Step [13475/13804], Loss: 2.3776, Perplexity: 10.7790, time_taken_in_seconds: 61
Epoch [1/1], Step [13476/13804], Loss: 2.9778, Perplexity: 19.6439, time_taken_in_seconds: 62
Epoch [1/1], Step [13477/13804], Loss: 2.2110, Perplexity: 9.1245, time_taken_in_seconds: 63
Epoch [1/1], Step [13478/13804], Loss: 2.5386, Perplexity: 12.6624, time_taken_in_seconds: 64
Epoch [1/1], Step [13479/13804], Loss: 2.2469, Perplexity: 9.4587, time_taken_in_seconds: 64
Epoch [1/1], Step [13480/13804], Loss: 2.5301, Perplexity: 12.5547, time_taken_in_seconds: 65
Epoch [1/1], Step [13481/13804], Loss: 2.5900, Perplexity: 13.3303, time_taken_in_seconds: 66
Epoch [1/1], Step [13482/13804], Loss: 2.5166, Perplexity: 12.3869, time_taken_in_seconds: 67
Epoch [1/1], Step [13483/13804], Loss: 2.2497, Perplexity: 9.4852, time_taken_in_seconds: 68
Epoch [1/1], Step [13484/13804], Loss: 2.5064, Perplexity: 12.2607, time_taken_in_seconds: 68
Epoch [1/1], Step [13485/13804], Loss: 2.2653, Perplexity: 9.6337, time_taken_in_seconds: 69
Epoch [1/1], Step [13486/13804], Loss: 2.6940, Perplexity: 14.7904, time_taken_in_seconds: 70
Epoch [1/1], Step [13487/13804], Loss: 2.6041, Perplexity: 13.5190, time_taken_in_seconds: 71
Epoch [1/1], Step [13488/13804], Loss: 2.5415, Perplexity: 12.6987, time_taken_in_seconds: 72
Epoch [1/1], Step [13489/13804], Loss: 2.6348, Perplexity: 13.9410, time_taken_in_seconds: 73
Epoch [1/1], Step [13490/13804], Loss: 2.3587, Perplexity: 10.5773, time_taken_in_seconds: 73
Epoch [1/1], Step [13491/13804], Loss: 2.6034, Perplexity: 13.5098, time_taken_in_seconds: 74
Epoch [1/1], Step [13492/13804], Loss: 2.5245, Perplexity: 12.4850, time_taken_in_seconds: 75
Epoch [1/1], Step [13493/13804], Loss: 2.3000, Perplexity: 9.9740, time_taken_in_seconds: 76
Epoch [1/1], Step [13494/13804], Loss: 2.3161, Perplexity: 10.1361, time_taken_in_seconds: 77
Epoch [1/1], Step [13495/13804], Loss: 2.5746, Perplexity: 13.1265, time_taken_in_seconds: 77
Epoch [1/1], Step [13496/13804], Loss: 2.2491, Perplexity: 9.4788, time_taken_in_seconds: 78
Epoch [1/1], Step [13497/13804], Loss: 2.2158, Perplexity: 9.1692, time_taken_in_seconds: 79
Epoch [1/1], Step [13498/13804], Loss: 2.7654, Perplexity: 15.8854, time_taken_in_seconds: 80
Epoch [1/1], Step [13499/13804], Loss: 2.2665, Perplexity: 9.6460, time_taken_in_seconds: 81
Epoch [1/1], Step [13500/13804], Loss: 2.2945, Perplexity: 9.9198, time_taken_in_seconds: 81
Epoch [1/1], Step [13501/13804], Loss: 2.6598, Perplexity: 14.2935, time_taken_in_seconds: 0
Epoch [1/1], Step [13502/13804], Loss: 2.2677, Perplexity: 9.6568, time_taken_in_seconds: 1
Epoch [1/1], Step [13503/13804], Loss: 2.5941, Perplexity: 13.3844, time_taken_in_seconds: 2
Epoch [1/1], Step [13504/13804], Loss: 2.8383, Perplexity: 17.0875, time_taken_in_seconds: 3
Epoch [1/1], Step [13505/13804], Loss: 2.7264, Perplexity: 15.2781, time_taken_in_seconds: 4
Epoch [1/1], Step [13506/13804], Loss: 2.0751, Perplexity: 7.9653, time_taken_in_seconds: 4
Epoch [1/1], Step [13507/13804], Loss: 2.6866, Perplexity: 14.6812, time_taken_in_seconds: 5
Epoch [1/1], Step [13508/13804], Loss: 2.8215, Perplexity: 16.8027, time_taken_in_seconds: 6
Epoch [1/1], Step [13509/13804], Loss: 2.5404, Perplexity: 12.6848, time_taken_in_seconds: 7
Epoch [1/1], Step [13510/13804], Loss: 3.1506, Perplexity: 23.3496, time_taken_in_seconds: 8
Epoch [1/1], Step [13511/13804], Loss: 2.2866, Perplexity: 9.8412, time_taken_in_seconds: 8
Epoch [1/1], Step [13512/13804], Loss: 2.6984, Perplexity: 14.8555, time_taken_in_seconds: 9
Epoch [1/1], Step [13513/13804], Loss: 2.2766, Perplexity: 9.7438, time_taken_in_seconds: 10
Epoch [1/1], Step [13514/13804], Loss: 2.5415, Perplexity: 12.6988, time_taken_in_seconds: 11
Epoch [1/1], Step [13515/13804], Loss: 2.4682, Perplexity: 11.8016, time_taken_in_seconds: 12
Epoch [1/1], Step [13516/13804], Loss: 2.3917, Perplexity: 10.9316, time_taken_in_seconds: 13
Epoch [1/1], Step [13517/13804], Loss: 2.4040, Perplexity: 11.0677, time_taken_in_seconds: 13
Epoch [1/1], Step [13518/13804], Loss: 2.4551, Perplexity: 11.6471, time_taken_in_seconds: 14
Epoch [1/1], Step [13519/13804], Loss: 2.3185, Perplexity: 10.1599, time_taken_in_seconds: 15
Epoch [1/1], Step [13520/13804], Loss: 2.5419, Perplexity: 12.7039, time_taken_in_seconds: 16
Epoch [1/1], Step [13521/13804], Loss: 2.4784, Perplexity: 11.9228, time_taken_in_seconds: 17
Epoch [1/1], Step [13522/13804], Loss: 2.7492, Perplexity: 15.6295, time_taken_in_seconds: 17
Epoch [1/1], Step [13523/13804], Loss: 2.5645, Perplexity: 12.9946, time_taken_in_seconds: 18
Epoch [1/1], Step [13524/13804], Loss: 2.5308, Perplexity: 12.5635, time_taken_in_seconds: 19
Epoch [1/1], Step [13525/13804], Loss: 2.2150, Perplexity: 9.1613, time_taken_in_seconds: 20
Epoch [1/1], Step [13526/13804], Loss: 2.3820, Perplexity: 10.8260, time_taken_in_seconds: 21
Epoch [1/1], Step [13527/13804], Loss: 2.4109, Perplexity: 11.1440, time_taken_in_seconds: 22
Epoch [1/1], Step [13528/13804], Loss: 2.6539, Perplexity: 14.2100, time_taken_in_seconds: 22
Epoch [1/1], Step [13529/13804], Loss: 2.3523, Perplexity: 10.5095, time_taken_in_seconds: 23
Epoch [1/1], Step [13530/13804], Loss: 2.1202, Perplexity: 8.3329, time_taken_in_seconds: 24
Epoch [1/1], Step [13531/13804], Loss: 2.1489, Perplexity: 8.5751, time_taken_in_seconds: 25
Epoch [1/1], Step [13532/13804], Loss: 2.7941, Perplexity: 16.3479, time_taken_in_seconds: 26
Epoch [1/1], Step [13533/13804], Loss: 2.5717, Perplexity: 13.0883, time_taken_in_seconds: 27
Epoch [1/1], Step [13534/13804], Loss: 2.3467, Perplexity: 10.4508, time_taken_in_seconds: 27
Epoch [1/1], Step [13535/13804], Loss: 2.3043, Perplexity: 10.0172, time_taken_in_seconds: 28
Epoch [1/1], Step [13536/13804], Loss: 2.3500, Perplexity: 10.4858, time_taken_in_seconds: 29
Epoch [1/1], Step [13537/13804], Loss: 2.6657, Perplexity: 14.3784, time_taken_in_seconds: 30
Epoch [1/1], Step [13538/13804], Loss: 2.6962, Perplexity: 14.8236, time_taken_in_seconds: 31
Epoch [1/1], Step [13539/13804], Loss: 2.5152, Perplexity: 12.3692, time_taken_in_seconds: 31
Epoch [1/1], Step [13540/13804], Loss: 2.7978, Perplexity: 16.4088, time_taken_in_seconds: 32
Epoch [1/1], Step [13541/13804], Loss: 2.3961, Perplexity: 10.9808, time_taken_in_seconds: 33
Epoch [1/1], Step [13542/13804], Loss: 2.5516, Perplexity: 12.8272, time_taken_in_seconds: 34
Epoch [1/1], Step [13543/13804], Loss: 2.3462, Perplexity: 10.4461, time_taken_in_seconds: 35
Epoch [1/1], Step [13544/13804], Loss: 2.9965, Perplexity: 20.0155, time_taken_in_seconds: 35
Epoch [1/1], Step [13545/13804], Loss: 2.2497, Perplexity: 9.4852, time_taken_in_seconds: 36
Epoch [1/1], Step [13546/13804], Loss: 2.5779, Perplexity: 13.1688, time_taken_in_seconds: 37
Epoch [1/1], Step [13547/13804], Loss: 2.5049, Perplexity: 12.2425, time_taken_in_seconds: 38
Epoch [1/1], Step [13548/13804], Loss: 2.5696, Perplexity: 13.0608, time_taken_in_seconds: 39
Epoch [1/1], Step [13549/13804], Loss: 2.2765, Perplexity: 9.7423, time_taken_in_seconds: 40
Epoch [1/1], Step [13550/13804], Loss: 2.3634, Perplexity: 10.6274, time_taken_in_seconds: 40
Epoch [1/1], Step [13551/13804], Loss: 2.7946, Perplexity: 16.3553, time_taken_in_seconds: 41
Epoch [1/1], Step [13552/13804], Loss: 2.8586, Perplexity: 17.4369, time_taken_in_seconds: 42
Epoch [1/1], Step [13553/13804], Loss: 2.8267, Perplexity: 16.8900, time_taken_in_seconds: 43
Epoch [1/1], Step [13554/13804], Loss: 2.4212, Perplexity: 11.2590, time_taken_in_seconds: 44
Epoch [1/1], Step [13555/13804], Loss: 2.6087, Perplexity: 13.5808, time_taken_in_seconds: 44
Epoch [1/1], Step [13556/13804], Loss: 2.3584, Perplexity: 10.5743, time_taken_in_seconds: 45
Epoch [1/1], Step [13557/13804], Loss: 2.6118, Perplexity: 13.6238, time_taken_in_seconds: 46
Epoch [1/1], Step [13558/13804], Loss: 3.1112, Perplexity: 22.4477, time_taken_in_seconds: 47
Epoch [1/1], Step [13559/13804], Loss: 2.6197, Perplexity: 13.7312, time_taken_in_seconds: 48
Epoch [1/1], Step [13560/13804], Loss: 2.4339, Perplexity: 11.4038, time_taken_in_seconds: 49
Epoch [1/1], Step [13561/13804], Loss: 2.3058, Perplexity: 10.0321, time_taken_in_seconds: 49
Epoch [1/1], Step [13562/13804], Loss: 3.1509, Perplexity: 23.3568, time_taken_in_seconds: 50
Epoch [1/1], Step [13563/13804], Loss: 2.7677, Perplexity: 15.9216, time_taken_in_seconds: 51
Epoch [1/1], Step [13564/13804], Loss: 2.0009, Perplexity: 7.3958, time_taken_in_seconds: 52
Epoch [1/1], Step [13565/13804], Loss: 2.3214, Perplexity: 10.1904, time_taken_in_seconds: 53
Epoch [1/1], Step [13566/13804], Loss: 2.8977, Perplexity: 18.1330, time_taken_in_seconds: 53
Epoch [1/1], Step [13567/13804], Loss: 2.5896, Perplexity: 13.3239, time_taken_in_seconds: 54
Epoch [1/1], Step [13568/13804], Loss: 2.9451, Perplexity: 19.0124, time_taken_in_seconds: 55
Epoch [1/1], Step [13569/13804], Loss: 2.1512, Perplexity: 8.5954, time_taken_in_seconds: 56
Epoch [1/1], Step [13570/13804], Loss: 2.1291, Perplexity: 8.4075, time_taken_in_seconds: 57
Epoch [1/1], Step [13571/13804], Loss: 2.5804, Perplexity: 13.2022, time_taken_in_seconds: 58
Epoch [1/1], Step [13572/13804], Loss: 2.6466, Perplexity: 14.1055, time_taken_in_seconds: 58
Epoch [1/1], Step [13573/13804], Loss: 2.8303, Perplexity: 16.9512, time_taken_in_seconds: 59
Epoch [1/1], Step [13574/13804], Loss: 2.4075, Perplexity: 11.1057, time_taken_in_seconds: 60
Epoch [1/1], Step [13575/13804], Loss: 2.6274, Perplexity: 13.8380, time_taken_in_seconds: 61
Epoch [1/1], Step [13576/13804], Loss: 2.5378, Perplexity: 12.6518, time_taken_in_seconds: 62
Epoch [1/1], Step [13577/13804], Loss: 2.6445, Perplexity: 14.0764, time_taken_in_seconds: 62
Epoch [1/1], Step [13578/13804], Loss: 2.2615, Perplexity: 9.5971, time_taken_in_seconds: 63
Epoch [1/1], Step [13579/13804], Loss: 2.2872, Perplexity: 9.8475, time_taken_in_seconds: 64
Epoch [1/1], Step [13580/13804], Loss: 2.4040, Perplexity: 11.0670, time_taken_in_seconds: 65
Epoch [1/1], Step [13581/13804], Loss: 2.4733, Perplexity: 11.8616, time_taken_in_seconds: 66
Epoch [1/1], Step [13582/13804], Loss: 2.4098, Perplexity: 11.1321, time_taken_in_seconds: 66
Epoch [1/1], Step [13583/13804], Loss: 2.4866, Perplexity: 12.0203, time_taken_in_seconds: 67
Epoch [1/1], Step [13584/13804], Loss: 2.2670, Perplexity: 9.6500, time_taken_in_seconds: 68
Epoch [1/1], Step [13585/13804], Loss: 2.5706, Perplexity: 13.0743, time_taken_in_seconds: 69
Epoch [1/1], Step [13586/13804], Loss: 2.4108, Perplexity: 11.1431, time_taken_in_seconds: 70
Epoch [1/1], Step [13587/13804], Loss: 2.5970, Perplexity: 13.4240, time_taken_in_seconds: 70
Epoch [1/1], Step [13588/13804], Loss: 2.6232, Perplexity: 13.7792, time_taken_in_seconds: 71
Epoch [1/1], Step [13589/13804], Loss: 2.6036, Perplexity: 13.5128, time_taken_in_seconds: 72
Epoch [1/1], Step [13590/13804], Loss: 2.1681, Perplexity: 8.7420, time_taken_in_seconds: 73
Epoch [1/1], Step [13591/13804], Loss: 2.4793, Perplexity: 11.9323, time_taken_in_seconds: 74
Epoch [1/1], Step [13592/13804], Loss: 2.6215, Perplexity: 13.7559, time_taken_in_seconds: 75
Epoch [1/1], Step [13593/13804], Loss: 2.6321, Perplexity: 13.9028, time_taken_in_seconds: 75
Epoch [1/1], Step [13594/13804], Loss: 2.1743, Perplexity: 8.7960, time_taken_in_seconds: 76
Epoch [1/1], Step [13595/13804], Loss: 2.6621, Perplexity: 14.3267, time_taken_in_seconds: 77
Epoch [1/1], Step [13596/13804], Loss: 2.2739, Perplexity: 9.7169, time_taken_in_seconds: 78
Epoch [1/1], Step [13597/13804], Loss: 2.7103, Perplexity: 15.0334, time_taken_in_seconds: 79
Epoch [1/1], Step [13598/13804], Loss: 2.6650, Perplexity: 14.3683, time_taken_in_seconds: 79
Epoch [1/1], Step [13599/13804], Loss: 3.3246, Perplexity: 27.7890, time_taken_in_seconds: 80
Epoch [1/1], Step [13600/13804], Loss: 2.5167, Perplexity: 12.3873, time_taken_in_seconds: 81
Epoch [1/1], Step [13601/13804], Loss: 2.6283, Perplexity: 13.8499, time_taken_in_seconds: 1
Epoch [1/1], Step [13602/13804], Loss: 2.6469, Perplexity: 14.1107, time_taken_in_seconds: 1
Epoch [1/1], Step [13603/13804], Loss: 2.4243, Perplexity: 11.2943, time_taken_in_seconds: 2
Epoch [1/1], Step [13604/13804], Loss: 2.4870, Perplexity: 12.0257, time_taken_in_seconds: 3
Epoch [1/1], Step [13605/13804], Loss: 3.1080, Perplexity: 22.3756, time_taken_in_seconds: 4
Epoch [1/1], Step [13606/13804], Loss: 2.4021, Perplexity: 11.0460, time_taken_in_seconds: 5
Epoch [1/1], Step [13607/13804], Loss: 2.8302, Perplexity: 16.9486, time_taken_in_seconds: 5
Epoch [1/1], Step [13608/13804], Loss: 2.2001, Perplexity: 9.0258, time_taken_in_seconds: 6
Epoch [1/1], Step [13609/13804], Loss: 2.6869, Perplexity: 14.6863, time_taken_in_seconds: 7
Epoch [1/1], Step [13610/13804], Loss: 2.1339, Perplexity: 8.4479, time_taken_in_seconds: 8
Epoch [1/1], Step [13611/13804], Loss: 2.5372, Perplexity: 12.6444, time_taken_in_seconds: 9
Epoch [1/1], Step [13612/13804], Loss: 2.7341, Perplexity: 15.3961, time_taken_in_seconds: 9
Epoch [1/1], Step [13613/13804], Loss: 2.5383, Perplexity: 12.6576, time_taken_in_seconds: 10
Epoch [1/1], Step [13614/13804], Loss: 2.2652, Perplexity: 9.6332, time_taken_in_seconds: 11
Epoch [1/1], Step [13615/13804], Loss: 3.2368, Perplexity: 25.4528, time_taken_in_seconds: 12
Epoch [1/1], Step [13616/13804], Loss: 2.3051, Perplexity: 10.0256, time_taken_in_seconds: 13
Epoch [1/1], Step [13617/13804], Loss: 2.3908, Perplexity: 10.9226, time_taken_in_seconds: 14
Epoch [1/1], Step [13618/13804], Loss: 2.6317, Perplexity: 13.8973, time_taken_in_seconds: 14
Epoch [1/1], Step [13619/13804], Loss: 2.6346, Perplexity: 13.9382, time_taken_in_seconds: 15
Epoch [1/1], Step [13620/13804], Loss: 2.5292, Perplexity: 12.5436, time_taken_in_seconds: 16
Epoch [1/1], Step [13621/13804], Loss: 2.6801, Perplexity: 14.5859, time_taken_in_seconds: 17
Epoch [1/1], Step [13622/13804], Loss: 2.4827, Perplexity: 11.9736, time_taken_in_seconds: 18
Epoch [1/1], Step [13623/13804], Loss: 2.5153, Perplexity: 12.3698, time_taken_in_seconds: 18
Epoch [1/1], Step [13624/13804], Loss: 2.4691, Perplexity: 11.8123, time_taken_in_seconds: 19
Epoch [1/1], Step [13625/13804], Loss: 2.5343, Perplexity: 12.6082, time_taken_in_seconds: 20
Epoch [1/1], Step [13626/13804], Loss: 2.3073, Perplexity: 10.0475, time_taken_in_seconds: 21
Epoch [1/1], Step [13627/13804], Loss: 2.3368, Perplexity: 10.3486, time_taken_in_seconds: 22
Epoch [1/1], Step [13628/13804], Loss: 2.6816, Perplexity: 14.6088, time_taken_in_seconds: 23
Epoch [1/1], Step [13629/13804], Loss: 2.6449, Perplexity: 14.0824, time_taken_in_seconds: 23
Epoch [1/1], Step [13630/13804], Loss: 2.3019, Perplexity: 9.9935, time_taken_in_seconds: 24
Epoch [1/1], Step [13631/13804], Loss: 2.7510, Perplexity: 15.6577, time_taken_in_seconds: 25
Epoch [1/1], Step [13632/13804], Loss: 2.5806, Perplexity: 13.2052, time_taken_in_seconds: 26
Epoch [1/1], Step [13633/13804], Loss: 2.3028, Perplexity: 10.0017, time_taken_in_seconds: 27
Epoch [1/1], Step [13634/13804], Loss: 2.7283, Perplexity: 15.3072, time_taken_in_seconds: 27
Epoch [1/1], Step [13635/13804], Loss: 2.6924, Perplexity: 14.7669, time_taken_in_seconds: 28
Epoch [1/1], Step [13636/13804], Loss: 2.6593, Perplexity: 14.2868, time_taken_in_seconds: 29
Epoch [1/1], Step [13637/13804], Loss: 2.5681, Perplexity: 13.0408, time_taken_in_seconds: 30
Epoch [1/1], Step [13638/13804], Loss: 2.6090, Perplexity: 13.5848, time_taken_in_seconds: 31
Epoch [1/1], Step [13639/13804], Loss: 2.3905, Perplexity: 10.9193, time_taken_in_seconds: 31
Epoch [1/1], Step [13640/13804], Loss: 2.3556, Perplexity: 10.5442, time_taken_in_seconds: 32
Epoch [1/1], Step [13641/13804], Loss: 3.0027, Perplexity: 20.1401, time_taken_in_seconds: 33
Epoch [1/1], Step [13642/13804], Loss: 2.6152, Perplexity: 13.6697, time_taken_in_seconds: 34
Epoch [1/1], Step [13643/13804], Loss: 2.2729, Perplexity: 9.7074, time_taken_in_seconds: 35
Epoch [1/1], Step [13644/13804], Loss: 2.8431, Perplexity: 17.1681, time_taken_in_seconds: 36
Epoch [1/1], Step [13645/13804], Loss: 2.3974, Perplexity: 10.9941, time_taken_in_seconds: 36
Epoch [1/1], Step [13646/13804], Loss: 2.7781, Perplexity: 16.0880, time_taken_in_seconds: 37
Epoch [1/1], Step [13647/13804], Loss: 2.5176, Perplexity: 12.3986, time_taken_in_seconds: 38
Epoch [1/1], Step [13648/13804], Loss: 2.7572, Perplexity: 15.7550, time_taken_in_seconds: 39
Epoch [1/1], Step [13649/13804], Loss: 2.5252, Perplexity: 12.4934, time_taken_in_seconds: 40
Epoch [1/1], Step [13650/13804], Loss: 2.5974, Perplexity: 13.4294, time_taken_in_seconds: 40
Epoch [1/1], Step [13651/13804], Loss: 2.5674, Perplexity: 13.0321, time_taken_in_seconds: 41
Epoch [1/1], Step [13652/13804], Loss: 2.5132, Perplexity: 12.3441, time_taken_in_seconds: 42
Epoch [1/1], Step [13653/13804], Loss: 2.6746, Perplexity: 14.5063, time_taken_in_seconds: 43
Epoch [1/1], Step [13654/13804], Loss: 2.3267, Perplexity: 10.2442, time_taken_in_seconds: 44
Epoch [1/1], Step [13655/13804], Loss: 2.6914, Perplexity: 14.7518, time_taken_in_seconds: 45
Epoch [1/1], Step [13656/13804], Loss: 2.3031, Perplexity: 10.0054, time_taken_in_seconds: 45
Epoch [1/1], Step [13657/13804], Loss: 2.2846, Perplexity: 9.8217, time_taken_in_seconds: 46
Epoch [1/1], Step [13658/13804], Loss: 3.1038, Perplexity: 22.2816, time_taken_in_seconds: 47
Epoch [1/1], Step [13659/13804], Loss: 2.6019, Perplexity: 13.4894, time_taken_in_seconds: 48
Epoch [1/1], Step [13660/13804], Loss: 2.5766, Perplexity: 13.1520, time_taken_in_seconds: 49
Epoch [1/1], Step [13661/13804], Loss: 2.2961, Perplexity: 9.9351, time_taken_in_seconds: 49
Epoch [1/1], Step [13662/13804], Loss: 2.2598, Perplexity: 9.5809, time_taken_in_seconds: 50
Epoch [1/1], Step [13663/13804], Loss: 2.1983, Perplexity: 9.0100, time_taken_in_seconds: 51
Epoch [1/1], Step [13664/13804], Loss: 2.3689, Perplexity: 10.6855, time_taken_in_seconds: 52
Epoch [1/1], Step [13665/13804], Loss: 2.3402, Perplexity: 10.3833, time_taken_in_seconds: 53
Epoch [1/1], Step [13666/13804], Loss: 2.6807, Perplexity: 14.5951, time_taken_in_seconds: 54
Epoch [1/1], Step [13667/13804], Loss: 2.3567, Perplexity: 10.5564, time_taken_in_seconds: 54
Epoch [1/1], Step [13668/13804], Loss: 2.4708, Perplexity: 11.8323, time_taken_in_seconds: 55
Epoch [1/1], Step [13669/13804], Loss: 2.3351, Perplexity: 10.3307, time_taken_in_seconds: 56
Epoch [1/1], Step [13670/13804], Loss: 2.2846, Perplexity: 9.8215, time_taken_in_seconds: 57
Epoch [1/1], Step [13671/13804], Loss: 2.2070, Perplexity: 9.0880, time_taken_in_seconds: 58
Epoch [1/1], Step [13672/13804], Loss: 2.8728, Perplexity: 17.6868, time_taken_in_seconds: 58
Epoch [1/1], Step [13673/13804], Loss: 2.2543, Perplexity: 9.5291, time_taken_in_seconds: 59
Epoch [1/1], Step [13674/13804], Loss: 2.3506, Perplexity: 10.4914, time_taken_in_seconds: 60
Epoch [1/1], Step [13675/13804], Loss: 2.5548, Perplexity: 12.8692, time_taken_in_seconds: 61
Epoch [1/1], Step [13676/13804], Loss: 2.5341, Perplexity: 12.6050, time_taken_in_seconds: 62
Epoch [1/1], Step [13677/13804], Loss: 2.2233, Perplexity: 9.2380, time_taken_in_seconds: 63
Epoch [1/1], Step [13678/13804], Loss: 2.4770, Perplexity: 11.9058, time_taken_in_seconds: 64
Epoch [1/1], Step [13679/13804], Loss: 2.7032, Perplexity: 14.9277, time_taken_in_seconds: 64
Epoch [1/1], Step [13680/13804], Loss: 2.3813, Perplexity: 10.8189, time_taken_in_seconds: 65
Epoch [1/1], Step [13681/13804], Loss: 2.5446, Perplexity: 12.7378, time_taken_in_seconds: 66
Epoch [1/1], Step [13682/13804], Loss: 2.4828, Perplexity: 11.9742, time_taken_in_seconds: 67
Epoch [1/1], Step [13683/13804], Loss: 2.3020, Perplexity: 9.9940, time_taken_in_seconds: 68
Epoch [1/1], Step [13684/13804], Loss: 2.3268, Perplexity: 10.2454, time_taken_in_seconds: 68
Epoch [1/1], Step [13685/13804], Loss: 3.5100, Perplexity: 33.4472, time_taken_in_seconds: 69
Epoch [1/1], Step [13686/13804], Loss: 2.5827, Perplexity: 13.2330, time_taken_in_seconds: 70
Epoch [1/1], Step [13687/13804], Loss: 2.3673, Perplexity: 10.6684, time_taken_in_seconds: 71
Epoch [1/1], Step [13688/13804], Loss: 3.1778, Perplexity: 23.9942, time_taken_in_seconds: 72
Epoch [1/1], Step [13689/13804], Loss: 2.1048, Perplexity: 8.2057, time_taken_in_seconds: 73
Epoch [1/1], Step [13690/13804], Loss: 2.4610, Perplexity: 11.7169, time_taken_in_seconds: 73
Epoch [1/1], Step [13691/13804], Loss: 2.5348, Perplexity: 12.6133, time_taken_in_seconds: 74
Epoch [1/1], Step [13692/13804], Loss: 2.5179, Perplexity: 12.4023, time_taken_in_seconds: 75
Epoch [1/1], Step [13693/13804], Loss: 2.7528, Perplexity: 15.6865, time_taken_in_seconds: 76
Epoch [1/1], Step [13694/13804], Loss: 2.1444, Perplexity: 8.5368, time_taken_in_seconds: 77
Epoch [1/1], Step [13695/13804], Loss: 2.5379, Perplexity: 12.6532, time_taken_in_seconds: 77
Epoch [1/1], Step [13696/13804], Loss: 2.4933, Perplexity: 12.1015, time_taken_in_seconds: 78
Epoch [1/1], Step [13697/13804], Loss: 2.6997, Perplexity: 14.8753, time_taken_in_seconds: 79
Epoch [1/1], Step [13698/13804], Loss: 2.6764, Perplexity: 14.5320, time_taken_in_seconds: 80
Epoch [1/1], Step [13699/13804], Loss: 2.4222, Perplexity: 11.2710, time_taken_in_seconds: 81
Epoch [1/1], Step [13700/13804], Loss: 2.6462, Perplexity: 14.0998, time_taken_in_seconds: 82
Epoch [1/1], Step [13701/13804], Loss: 2.7040, Perplexity: 14.9397, time_taken_in_seconds: 0
Epoch [1/1], Step [13702/13804], Loss: 2.1528, Perplexity: 8.6089, time_taken_in_seconds: 1
Epoch [1/1], Step [13703/13804], Loss: 2.0936, Perplexity: 8.1143, time_taken_in_seconds: 2
Epoch [1/1], Step [13704/13804], Loss: 3.0712, Perplexity: 21.5679, time_taken_in_seconds: 3
Epoch [1/1], Step [13705/13804], Loss: 2.4309, Perplexity: 11.3689, time_taken_in_seconds: 4
Epoch [1/1], Step [13706/13804], Loss: 2.2350, Perplexity: 9.3461, time_taken_in_seconds: 4
Epoch [1/1], Step [13707/13804], Loss: 3.1196, Perplexity: 22.6384, time_taken_in_seconds: 5
Epoch [1/1], Step [13708/13804], Loss: 2.3801, Perplexity: 10.8058, time_taken_in_seconds: 6
Epoch [1/1], Step [13709/13804], Loss: 2.5580, Perplexity: 12.9104, time_taken_in_seconds: 7
Epoch [1/1], Step [13710/13804], Loss: 2.8020, Perplexity: 16.4769, time_taken_in_seconds: 8
Epoch [1/1], Step [13711/13804], Loss: 2.4574, Perplexity: 11.6740, time_taken_in_seconds: 9
Epoch [1/1], Step [13712/13804], Loss: 2.3784, Perplexity: 10.7880, time_taken_in_seconds: 9
Epoch [1/1], Step [13713/13804], Loss: 2.2947, Perplexity: 9.9213, time_taken_in_seconds: 10
Epoch [1/1], Step [13714/13804], Loss: 2.4539, Perplexity: 11.6337, time_taken_in_seconds: 11
Epoch [1/1], Step [13715/13804], Loss: 2.2100, Perplexity: 9.1154, time_taken_in_seconds: 12
Epoch [1/1], Step [13716/13804], Loss: 2.4898, Perplexity: 12.0587, time_taken_in_seconds: 13
Epoch [1/1], Step [13717/13804], Loss: 2.4230, Perplexity: 11.2794, time_taken_in_seconds: 13
Epoch [1/1], Step [13718/13804], Loss: 2.4791, Perplexity: 11.9307, time_taken_in_seconds: 14
Epoch [1/1], Step [13719/13804], Loss: 2.6964, Perplexity: 14.8261, time_taken_in_seconds: 15
Epoch [1/1], Step [13720/13804], Loss: 2.7633, Perplexity: 15.8523, time_taken_in_seconds: 16
Epoch [1/1], Step [13721/13804], Loss: 2.4505, Perplexity: 11.5946, time_taken_in_seconds: 17
Epoch [1/1], Step [13722/13804], Loss: 2.3873, Perplexity: 10.8835, time_taken_in_seconds: 18
Epoch [1/1], Step [13723/13804], Loss: 2.5387, Perplexity: 12.6634, time_taken_in_seconds: 18
Epoch [1/1], Step [13724/13804], Loss: 2.7547, Perplexity: 15.7167, time_taken_in_seconds: 19
Epoch [1/1], Step [13725/13804], Loss: 2.4860, Perplexity: 12.0136, time_taken_in_seconds: 20
Epoch [1/1], Step [13726/13804], Loss: 2.3196, Perplexity: 10.1711, time_taken_in_seconds: 21
Epoch [1/1], Step [13727/13804], Loss: 2.1452, Perplexity: 8.5435, time_taken_in_seconds: 22
Epoch [1/1], Step [13728/13804], Loss: 2.3148, Perplexity: 10.1228, time_taken_in_seconds: 22
Epoch [1/1], Step [13729/13804], Loss: 2.7575, Perplexity: 15.7611, time_taken_in_seconds: 23
Epoch [1/1], Step [13730/13804], Loss: 2.4908, Perplexity: 12.0714, time_taken_in_seconds: 24
Epoch [1/1], Step [13731/13804], Loss: 2.4667, Perplexity: 11.7836, time_taken_in_seconds: 25
Epoch [1/1], Step [13732/13804], Loss: 2.7713, Perplexity: 15.9788, time_taken_in_seconds: 26
Epoch [1/1], Step [13733/13804], Loss: 2.8665, Perplexity: 17.5758, time_taken_in_seconds: 26
Epoch [1/1], Step [13734/13804], Loss: 2.2915, Perplexity: 9.8893, time_taken_in_seconds: 27
Epoch [1/1], Step [13735/13804], Loss: 2.3363, Perplexity: 10.3425, time_taken_in_seconds: 28
Epoch [1/1], Step [13736/13804], Loss: 2.5582, Perplexity: 12.9131, time_taken_in_seconds: 29
Epoch [1/1], Step [13737/13804], Loss: 2.7319, Perplexity: 15.3618, time_taken_in_seconds: 30
Epoch [1/1], Step [13738/13804], Loss: 2.3681, Perplexity: 10.6775, time_taken_in_seconds: 30
Epoch [1/1], Step [13739/13804], Loss: 2.9937, Perplexity: 19.9595, time_taken_in_seconds: 31
Epoch [1/1], Step [13740/13804], Loss: 2.7812, Perplexity: 16.1386, time_taken_in_seconds: 32
Epoch [1/1], Step [13741/13804], Loss: 2.4674, Perplexity: 11.7918, time_taken_in_seconds: 33
Epoch [1/1], Step [13742/13804], Loss: 3.1472, Perplexity: 23.2711, time_taken_in_seconds: 34
Epoch [1/1], Step [13743/13804], Loss: 2.4526, Perplexity: 11.6188, time_taken_in_seconds: 35
Epoch [1/1], Step [13744/13804], Loss: 2.5857, Perplexity: 13.2730, time_taken_in_seconds: 35
Epoch [1/1], Step [13745/13804], Loss: 2.6419, Perplexity: 14.0397, time_taken_in_seconds: 36
Epoch [1/1], Step [13746/13804], Loss: 2.3348, Perplexity: 10.3271, time_taken_in_seconds: 37
Epoch [1/1], Step [13747/13804], Loss: 2.8794, Perplexity: 17.8040, time_taken_in_seconds: 38
Epoch [1/1], Step [13748/13804], Loss: 2.3667, Perplexity: 10.6622, time_taken_in_seconds: 39
Epoch [1/1], Step [13749/13804], Loss: 2.6846, Perplexity: 14.6523, time_taken_in_seconds: 40
Epoch [1/1], Step [13750/13804], Loss: 2.2296, Perplexity: 9.2961, time_taken_in_seconds: 40
Epoch [1/1], Step [13751/13804], Loss: 2.2812, Perplexity: 9.7882, time_taken_in_seconds: 41
Epoch [1/1], Step [13752/13804], Loss: 2.9143, Perplexity: 18.4363, time_taken_in_seconds: 42
Epoch [1/1], Step [13753/13804], Loss: 2.1981, Perplexity: 9.0077, time_taken_in_seconds: 43
Epoch [1/1], Step [13754/13804], Loss: 2.3903, Perplexity: 10.9165, time_taken_in_seconds: 44
Epoch [1/1], Step [13755/13804], Loss: 2.5415, Perplexity: 12.6985, time_taken_in_seconds: 44
Epoch [1/1], Step [13756/13804], Loss: 3.4535, Perplexity: 31.6109, time_taken_in_seconds: 45
Epoch [1/1], Step [13757/13804], Loss: 2.1973, Perplexity: 9.0011, time_taken_in_seconds: 46
Epoch [1/1], Step [13758/13804], Loss: 2.4163, Perplexity: 11.2038, time_taken_in_seconds: 47
Epoch [1/1], Step [13759/13804], Loss: 2.2710, Perplexity: 9.6887, time_taken_in_seconds: 48
Epoch [1/1], Step [13760/13804], Loss: 2.7832, Perplexity: 16.1699, time_taken_in_seconds: 49
Epoch [1/1], Step [13761/13804], Loss: 1.9324, Perplexity: 6.9063, time_taken_in_seconds: 49
Epoch [1/1], Step [13762/13804], Loss: 2.3949, Perplexity: 10.9671, time_taken_in_seconds: 50
Epoch [1/1], Step [13763/13804], Loss: 2.7167, Perplexity: 15.1306, time_taken_in_seconds: 51
Epoch [1/1], Step [13764/13804], Loss: 2.8652, Perplexity: 17.5530, time_taken_in_seconds: 52
Epoch [1/1], Step [13765/13804], Loss: 3.0039, Perplexity: 20.1636, time_taken_in_seconds: 53
Epoch [1/1], Step [13766/13804], Loss: 2.5526, Perplexity: 12.8410, time_taken_in_seconds: 53
Epoch [1/1], Step [13767/13804], Loss: 2.7833, Perplexity: 16.1720, time_taken_in_seconds: 54
Epoch [1/1], Step [13768/13804], Loss: 2.6316, Perplexity: 13.8958, time_taken_in_seconds: 55
Epoch [1/1], Step [13769/13804], Loss: 2.3363, Perplexity: 10.3426, time_taken_in_seconds: 56
Epoch [1/1], Step [13770/13804], Loss: 2.5677, Perplexity: 13.0361, time_taken_in_seconds: 57
Epoch [1/1], Step [13771/13804], Loss: 2.9590, Perplexity: 19.2795, time_taken_in_seconds: 57
Epoch [1/1], Step [13772/13804], Loss: 2.8276, Perplexity: 16.9048, time_taken_in_seconds: 58
Epoch [1/1], Step [13773/13804], Loss: 2.5211, Perplexity: 12.4422, time_taken_in_seconds: 59
Epoch [1/1], Step [13774/13804], Loss: 2.3890, Perplexity: 10.9027, time_taken_in_seconds: 60
Epoch [1/1], Step [13775/13804], Loss: 2.4384, Perplexity: 11.4548, time_taken_in_seconds: 61
Epoch [1/1], Step [13776/13804], Loss: 2.3616, Perplexity: 10.6084, time_taken_in_seconds: 61
Epoch [1/1], Step [13777/13804], Loss: 2.4898, Perplexity: 12.0594, time_taken_in_seconds: 62
Epoch [1/1], Step [13778/13804], Loss: 2.3470, Perplexity: 10.4540, time_taken_in_seconds: 63
Epoch [1/1], Step [13779/13804], Loss: 2.4081, Perplexity: 11.1125, time_taken_in_seconds: 64
Epoch [1/1], Step [13780/13804], Loss: 2.3617, Perplexity: 10.6093, time_taken_in_seconds: 65
Epoch [1/1], Step [13781/13804], Loss: 2.4558, Perplexity: 11.6557, time_taken_in_seconds: 66
Epoch [1/1], Step [13782/13804], Loss: 2.4775, Perplexity: 11.9110, time_taken_in_seconds: 66
Epoch [1/1], Step [13783/13804], Loss: 2.3747, Perplexity: 10.7478, time_taken_in_seconds: 67
Epoch [1/1], Step [13784/13804], Loss: 2.7172, Perplexity: 15.1371, time_taken_in_seconds: 68
Epoch [1/1], Step [13785/13804], Loss: 2.4913, Perplexity: 12.0768, time_taken_in_seconds: 69
Epoch [1/1], Step [13786/13804], Loss: 2.5248, Perplexity: 12.4884, time_taken_in_seconds: 70
Epoch [1/1], Step [13787/13804], Loss: 2.1630, Perplexity: 8.6976, time_taken_in_seconds: 70
Epoch [1/1], Step [13788/13804], Loss: 2.1142, Perplexity: 8.2832, time_taken_in_seconds: 71
Epoch [1/1], Step [13789/13804], Loss: 2.3170, Perplexity: 10.1451, time_taken_in_seconds: 72
Epoch [1/1], Step [13790/13804], Loss: 2.9372, Perplexity: 18.8624, time_taken_in_seconds: 73
Epoch [1/1], Step [13791/13804], Loss: 2.5859, Perplexity: 13.2752, time_taken_in_seconds: 74
Epoch [1/1], Step [13792/13804], Loss: 2.2772, Perplexity: 9.7496, time_taken_in_seconds: 74
Epoch [1/1], Step [13793/13804], Loss: 2.4784, Perplexity: 11.9225, time_taken_in_seconds: 75
Epoch [1/1], Step [13794/13804], Loss: 2.6606, Perplexity: 14.3055, time_taken_in_seconds: 76
Epoch [1/1], Step [13795/13804], Loss: 2.5489, Perplexity: 12.7929, time_taken_in_seconds: 77
Epoch [1/1], Step [13796/13804], Loss: 2.2812, Perplexity: 9.7880, time_taken_in_seconds: 78
Epoch [1/1], Step [13797/13804], Loss: 2.6549, Perplexity: 14.2237, time_taken_in_seconds: 78
Epoch [1/1], Step [13798/13804], Loss: 2.1841, Perplexity: 8.8822, time_taken_in_seconds: 79
Epoch [1/1], Step [13799/13804], Loss: 3.3570, Perplexity: 28.7042, time_taken_in_seconds: 80
Epoch [1/1], Step [13800/13804], Loss: 2.8175, Perplexity: 16.7353, time_taken_in_seconds: 81
Epoch [1/1], Step [13801/13804], Loss: 2.6069, Perplexity: 13.5572, time_taken_in_seconds: 0
Epoch [1/1], Step [13802/13804], Loss: 2.5568, Perplexity: 12.8945, time_taken_in_seconds: 1
Epoch [1/1], Step [13803/13804], Loss: 2.8878, Perplexity: 17.9531, time_taken_in_seconds: 2
Epoch [1/1], Step [13804/13804], Loss: 2.5990, Perplexity: 13.4503, time_taken_in_seconds: 3
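
The `Perplexity` column in the log above appears to be the exponential of the `Loss` column. The sketch below checks this against three logged pairs. It assumes the loss is a mean cross-entropy measured in nats (natural log), which the training code outside this output would need to confirm. The helper `perplexity` and the list `logged` are illustrative names, not part of the notebook.

```python
import math


def perplexity(loss: float) -> float:
    """Return exp(loss), assuming loss is a cross-entropy in nats."""
    return math.exp(loss)


# (loss, perplexity) pairs copied from steps 13755, 13756 and 13804 of the log.
logged = [(2.5415, 12.6985), (3.4535, 31.6109), (2.5990, 13.4503)]

for loss, logged_ppl in logged:
    # The logged loss is rounded to four decimals, so small differences
    # between exp(loss) and the logged perplexity are expected.
    print(f"loss={loss:.4f}  exp(loss)={perplexity(loss):.4f}  logged={logged_ppl:.4f}")
```

Because perplexity grows exponentially with loss, a modest jump in loss (for example from about 2.5 to about 3.5) more than doubles the perplexity, which is why the perplexity column looks noisier than the loss column.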
