# Computer Vision Nanodegree

## Project: Image Captioning

---

In this notebook, some template code has already been provided for you ...

Feel free to use the links below to navigate the notebook:
- [Step 1](#step1): Training Setup
- [Step 2](#step2): Train your Model!
- [Step 3](#step3): (Optional) Validate your Model.

<a id='step1'></a>
### Step 1: Training Setup

Do NOT change the lines of code that are not preceded by a TODO statement.

In [1]:
import torch
import torch.nn as nn
from torch.autograd import Variable
from torchvision import transforms
from data_loader import get_loader
from model import EncoderCNN, DecoderRNN
import math

def to_var(x, volatile=False):
    """Converts a PyTorch Tensor to a Variable and moves it to the GPU if CUDA is available."""
    if torch.cuda.is_available():
        x = x.cuda()
    return Variable(x, volatile=volatile)

## TODO: Select appropriate values for the Python variables below.
batch_size = 128
vocab_threshold = 4
embed_size = 256
hidden_size = 512
learning_rate = 0.001
num_epochs = 5
save_every = 1

# TODO: Amend the image transform below.
transform_train = transforms.Compose([ 
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(), 
    transforms.ToTensor(), 
    transforms.Normalize((0.485, 0.456, 0.406), 
                         (0.229, 0.224, 0.225))])

# Build data loader.
data_loader = get_loader(transform=transform_train,
                         mode='train',
                         batch_size=batch_size)

# The size of the vocabulary.
vocab_size = len(data_loader.dataset.vocab)

# Initialize the encoder and decoder. 
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

# Move models to GPU if CUDA is available. 
if torch.cuda.is_available():
    encoder.cuda()
    decoder.cuda()

# Define the loss function. 
criterion = nn.CrossEntropyLoss().cuda() if torch.cuda.is_available() else nn.CrossEntropyLoss()

# TODO: Specify the learnable parameters of the model.
params = list(decoder.parameters()) + list(encoder.linear.parameters()) 

# TODO: Define the optimizer.
optimizer = torch.optim.Adam(params=params, lr=learning_rate)

# Set the total number of training steps per epoch.
total_step = math.ceil(len(data_loader.dataset.caption_lengths) / data_loader.batch_sampler.batch_size)

Vocabulary successfully loaded from vocab.pkl file!
loading annotations into memory...
Done (t=0.59s)
creating index...


  0%|          | 887/414113 [00:00<00:46, 8866.29it/s]

index created!
Obtaining caption lengths ...


100%|██████████| 414113/414113 [00:40<00:00, 10258.13it/s]
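
The `total_step` value can be checked by hand: the progress bar above reports 414113 captions, and the cell sets `batch_size = 128`. A minimal sketch, with both numbers hard-coded as assumptions taken from this particular run:

```python
import math

# Assumptions taken from this run: the caption count comes from the
# progress bar, the batch size from the hyperparameter block.
num_captions = 414113
batch_size = 128

# Round up so that a final, partially filled batch still counts as a step.
total_step = math.ceil(num_captions / batch_size)
print(total_step)  # 3236
```

The same number appears as the step total in the training log in Step 2.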


<a id='step2'></a>
### Step 2: Train your Model!

In [None]:
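# (Optional) TODO: Load pre-trained weights before resuming training.
import os

# encoder_file and decoder_file must first be set to the names of saved weight
# files, such as the 'encoder-%d.pkl' and 'decoder-%d.pkl' files written during training.
encoder.load_state_dict(torch.load(os.path.join('./models', encoder_file)))
decoder.load_state_dict(torch.load(os.path.join('./models', decoder_file)))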

# TODO: Clarify what happens when training resumes,
# e.g. which saved weight files get overwritten.
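
One thing to watch when resuming: the training loop names its weight files by `epoch+1`, and `epoch` restarts at 0 on every run, so a resumed run writes `decoder-1.pkl` and `encoder-1.pkl` again and replaces the earlier files. The sketch below shows one possible way around this. It assumes a hypothetical `start_epoch` offset and a hypothetical helper `checkpoint_name`, neither of which is part of the notebook.

```python
import os

def checkpoint_name(prefix, epoch, start_epoch=0, model_dir='./models'):
    """Hypothetical helper: build a weight-file path whose number continues
    from a previous run instead of restarting at 1."""
    return os.path.join(model_dir, '%s-%d.pkl' % (prefix, start_epoch + epoch + 1))

# Fresh run: the first epoch is saved as decoder-1.pkl.
fresh = checkpoint_name('decoder', epoch=0)

# Resuming after 5 completed epochs: the first new epoch is saved as decoder-6.pkl.
resumed = checkpoint_name('decoder', epoch=0, start_epoch=5)

print(fresh)
print(resumed)
```

Note also that `load_state_dict` restores only the model weights. The Adam optimizer starts again from a fresh state unless its own `state_dict` is saved and reloaded as well.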

In [2]:
import torch.utils.data as data
import numpy as np
import os

for epoch in range(num_epochs):
    
    for i_step in range(0, total_step):
        
        # Randomly sample a caption length, and sample indices with that length.
        indices = data_loader.dataset.get_train_indices()
        # Create and assign a batch sampler to retrieve a batch with the sampled indices.
        new_sampler = data.sampler.SubsetRandomSampler(indices=indices)
        data_loader.batch_sampler.sampler = new_sampler

        # Obtain the batch.
        for batch in data_loader:
            images, captions = batch[0], batch[1]
            break 
        
        # Convert the batch of images and captions to PyTorch Variables.
        images = to_var(images, volatile=True)
        captions = to_var(captions)
        
        # Zero the gradients.
        decoder.zero_grad()
        encoder.zero_grad()
        
        # Pass the inputs through the CNN-RNN model.
        features = encoder(images)
        outputs = decoder(features, captions)
        
        # Calculate the batch loss.
        loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
        
        # Backward pass.
        loss.backward()
        
        # Update the parameters in the optimizer.
        optimizer.step()
            
        # Print training statistics.
        print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %5.4f'
            %(epoch+1, num_epochs, i_step, total_step, loss.data[0], np.exp(loss.data[0]))) 
            
    # Save the weights.
    if epoch % save_every == 0:
        torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d.pkl' %(epoch+1)))
        torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d.pkl' %(epoch+1)))

Epoch [1/5], Step [0/3236], Loss: 9.2039, Perplexity: 9936.0150
Epoch [1/5], Step [1/3236], Loss: 9.0564, Perplexity: 8572.8497
Epoch [1/5], Step [2/3236], Loss: 8.8579, Perplexity: 7029.4075
Epoch [1/5], Step [3/3236], Loss: 8.6047, Perplexity: 5457.0668
Epoch [1/5], Step [4/3236], Loss: 8.1229, Perplexity: 3370.6694
Epoch [1/5], Step [5/3236], Loss: 7.5465, Perplexity: 1894.1898
Epoch [1/5], Step [6/3236], Loss: 7.0369, Perplexity: 1137.8972
Epoch [1/5], Step [7/3236], Loss: 6.0811, Perplexity: 437.5135
Epoch [1/5], Step [8/3236], Loss: 5.5091, Perplexity: 246.9386
Epoch [1/5], Step [9/3236], Loss: 5.2919, Perplexity: 198.7156
Epoch [1/5], Step [10/3236], Loss: 5.0867, Perplexity: 161.8497
Epoch [1/5], Step [11/3236], Loss: 4.8807, Perplexity: 131.7214
Epoch [1/5], Step [12/3236], Loss: 4.7135, Perplexity: 111.4370
Epoch [1/5], Step [13/3236], Loss: 4.7484, Perplexity: 115.4011
Epoch [1/5], Step [14/3236], Loss: 4.8466, Perplexity: 127.3083
Epoch [1/5], Step [15/3236], Loss: 5.1318, 

Process Process-34:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    r = index_queue.get()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
Process Process-33:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 343, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)


KeyboardInterrupt: 
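
The perplexity column in the log above is the exponential of the cross-entropy loss, which is what `np.exp(loss.data[0])` computes in the training loop. A minimal sketch using only the standard library, with loss values copied from the log. Because the logged losses are rounded to four decimals, the recomputed values agree with the logged perplexities only approximately.

```python
import math

# Loss values copied from the first and the last complete log lines above.
losses = [9.2039, 4.8466]

# Perplexity is exp(cross-entropy), with the loss measured in nats.
perplexities = [math.exp(loss) for loss in losses]

for loss, perplexity in zip(losses, perplexities):
    print('Loss: %.4f, Perplexity: %.1f' % (loss, perplexity))
```

As a reference point, a model that guesses uniformly over a vocabulary of V words has a perplexity of exactly V, so lower values mean the model is concentrating probability on fewer candidate words.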

<a id='step3'></a>
### Step 3: (Optional) Validate your Model