## The image captioning task is divided into five sub-tasks

- __Data Preprocessing__ - We use the Flickr 8k dataset, which is small enough to set up and train on a local machine. This step parses the raw caption file and writes `descriptions.txt`, which maps each image ID to its captions. It also writes the training and test caption files and the vocabulary. The code is in `Part1_Data-Preprocess.ipynb`.


- __Extracting Features__ - We encode every Flickr 8k image (train and test) with the pretrained InceptionV3 model and dump the resulting feature vectors into `featuresNew.pkl`. The code is in `Part2_Features-Extract.ipynb`.


- __Training the model__ - The captioning model combines the CNN image features with an LSTM built using the Keras API, and it is trained for 10 epochs. Because the CNN features were already extracted in the previous step, the CNN itself is not updated and only the LSTM model is trained. The code is in `Part3_Training_Notebook.ipynb`.


- __Evaluating the training process__ - We plot the training history, in particular how the loss decreases over the course of training. The code is in `Part4_Plotloss.ipynb`.


- __Generating Captions__ - In the final step, we use the trained LSTM model to generate captions for the test images. The code is in `Part5_Generate_Caption_testSet.ipynb`.
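The Part 5 notebook is not reproduced here. As a rough sketch of the general idea, assume greedy (argmax) decoding, which may differ from what that notebook actually does. Generation starts from `<start>` and repeatedly appends the most probable next word until the model predicts `<end>` or a length cap is reached. `predict_next` is a hypothetical stand-in for the trained CNN + LSTM model. The `max_len` default is an assumed value, not one taken from the notebooks.

```python
from typing import Callable, Dict, List


def greedy_caption(
    predict_next: Callable[[List[str]], Dict[str, float]],
    max_len: int = 20,
) -> str:
    """Greedy decoding: repeatedly append the most probable next word.

    `predict_next` is a hypothetical stand-in for the trained model: given
    the words generated so far, it returns a {word: probability} dict.
    `max_len` is an assumed safety cap, not a value from the notebooks.
    """
    words = ["<start>"]
    for _ in range(max_len):
        probs = predict_next(words)
        next_word = max(probs, key=probs.get)  # argmax over the vocabulary
        if next_word == "<end>":
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the <start> marker


# Toy "model" that simply follows a fixed script (purely illustrative).
SCRIPT = ["a", "dog", "runs", "<end>"]


def toy_model(words: List[str]) -> Dict[str, float]:
    return {SCRIPT[len(words) - 1]: 0.9, "cat": 0.1}


print(greedy_caption(toy_model))
```

Greedy decoding is the simplest choice. Beam search, which keeps several candidate captions at each step, is a common alternative.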

```python
import string
import pickle
```

```python
# Local dataset root (machine-specific; the cells below use relative paths instead)
root_path = 'Z:/Flickr_Data/'
```

```python
# Read a whole text file into a string

def load_doc(filename):
    # open the file as read only; the context manager closes it afterwards
    with open(filename, 'r') as file:
        text = file.read()
    return text
```

```python
# Returns a mapping of Image_ID -> list of Image_Captions

def load_descriptions(doc):
    mapping = dict()
    # process lines
    for line in doc.split('\n'):
        # split line by white space
        tokens = line.split()
        # skip empty or whitespace-only lines (checking tokens rather than line
        # avoids an IndexError on lines that contain only spaces)
        if not tokens:
            continue
        image_id, image_desc = tokens[0], tokens[1:]

        # strip the '#n' caption index: 'xxx.jpg#0' -> 'xxx' -> 'xxx.jpg'
        image_id = image_id.split('.')[0] + '.jpg'

        # convert description tokens back to a single string
        image_desc = ' '.join(image_desc)

        # create the list on the first caption of an image, then store it
        mapping.setdefault(image_id, []).append(image_desc)

    return mapping
```
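Each line of `Flickr8k.token.txt` has the form `<image>.jpg#<n><TAB><caption>`, where `<n>` (0 to 4) indexes the five captions of each image. The compact parser below is an equivalent of `load_descriptions` built on `collections.defaultdict`. It runs on made-up sample lines rather than real dataset content, and shows how stripping the `#n` suffix collects all captions of one image under a single key.

```python
from collections import defaultdict
from typing import Dict, List


def parse_token_lines(doc: str) -> Dict[str, List[str]]:
    """Compact equivalent of load_descriptions, for illustration only."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in doc.splitlines():
        tokens = line.split()
        if not tokens:  # skip blank or whitespace-only lines
            continue
        # 'name.jpg#3' -> 'name' -> 'name.jpg': every caption index shares a key
        image_id = tokens[0].split('.')[0] + '.jpg'
        mapping[image_id].append(' '.join(tokens[1:]))
    return dict(mapping)


# Made-up lines in the '<image>.jpg#<n><TAB><caption>' format (not real data).
sample = (
    "example_1.jpg#0\tA dog runs on the grass .\n"
    "example_1.jpg#1\tA brown dog is running .\n"
    "example_2.jpg#0\tTwo children play .\n"
)
print(parse_token_lines(sample))
```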

```python
# Vocabulary from all the captions

def to_vocabulary(descriptions):
    # build a set of all distinct whitespace-separated tokens
    all_desc = set()
    for desc_list in descriptions.values():
        for d in desc_list:
            all_desc.update(d.split())
    return all_desc
```
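The captions are not lowercased or stripped of punctuation before this step. As a result, `to_vocabulary` counts `A` and `a` as different words and treats tokens such as `.` as words of their own. A tiny made-up example makes this visible. The hypothetical `to_vocabulary_compact` is just the same function written as a set comprehension.

```python
from typing import Dict, List, Set


def to_vocabulary_compact(descriptions: Dict[str, List[str]]) -> Set[str]:
    """Same result as to_vocabulary, written as a set comprehension."""
    return {
        word
        for captions in descriptions.values()
        for caption in captions
        for word in caption.split()
    }


# Made-up captions: 'A'/'a' differ only in case, and '.' is its own token.
toy = {"example_1.jpg": ["A dog runs .", "a dog sits ."]}
print(sorted(to_vocabulary_compact(toy)))
```

Normalizing case and punctuation first would merge such variants, but it would also change the vocabulary size reported below.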

```python
# Save the image IDs with their captions, one '<image_id> <caption>' per line
# (contains both train and test image IDs)

def save_descriptions(descriptions, filename):
    lines = [key + ' ' + desc
             for key, desc_list in descriptions.items()
             for desc in desc_list]
    data = '\n'.join(lines)
    with open(filename, 'w') as file:
        file.write(data)
```
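`save_descriptions` writes one caption per line as `<image_id> <caption>`, so an image with five captions appears on five lines. Image IDs contain no spaces, so a reader can rebuild the mapping by splitting each line once on the first space. Whether the later notebooks read the file this way is an assumption. The hypothetical `load_saved_descriptions` below sketches that round trip using a temporary file.

```python
import os
import tempfile
from collections import defaultdict
from typing import Dict, List


def load_saved_descriptions(filename: str) -> Dict[str, List[str]]:
    """Hypothetical reader for the '<image_id> <caption>' format."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    with open(filename, 'r') as file:
        for line in file:
            parts = line.rstrip('\n').split(' ', 1)  # split on the first space only
            if len(parts) == 2:
                mapping[parts[0]].append(parts[1])
    return dict(mapping)


# Write a tiny file in the same format that save_descriptions produces.
original = {"example_1.jpg": ["A dog runs .", "A dog sits ."]}
lines = [key + ' ' + desc for key, descs in original.items() for desc in descs]
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('\n'.join(lines))
    path = tmp.name

restored = load_saved_descriptions(path)
os.remove(path)
print(restored == original)
```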

```python
filename = 'Flickr_Data/Flickr_TextData/Flickr8k.token.txt'
# load descriptions
doc = load_doc(filename)
# parse descriptions
descriptions = load_descriptions(doc)
print('Loaded: %d ' % len(descriptions))

# summarize vocabulary
vocabulary = to_vocabulary(descriptions)
print('Vocabulary Size: %d' % len(vocabulary))

# save to file
save_descriptions(descriptions, 'save/descriptions.txt')
```

```
Loaded: 8092
Vocabulary Size: 9630
```


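```python
# Add the sequence markers used by generate_captions to the vocabulary
vocab = list(vocabulary)
vocab.append("<start>")
vocab.append("<end>")
print(len(vocab))
# use a context manager so the pickle file is flushed and closed
with open("save/vocab.p", "wb") as f:
    pickle.dump(vocab, f)
```

```
9632
```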
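Converting the pickled `vocab` list into integer IDs is not shown in this section, so the following is an assumption. One common approach, sketched here with the hypothetical names `word_to_idx` and `idx_to_word`, reserves index 0 for padding and numbers the words from 1. Sorting first makes the mapping reproducible. Iterating over a Python `set` of strings can give a different order in each interpreter session, because string hashing is randomized by default.

```python
from typing import Dict, List, Tuple


def build_index(vocab: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    # Index 0 is reserved for padding, so real words start at 1 (assumption).
    words = sorted(set(vocab))
    word_to_idx = {word: i for i, word in enumerate(words, start=1)}
    idx_to_word = {i: word for word, i in word_to_idx.items()}
    return word_to_idx, idx_to_word


word_to_idx, idx_to_word = build_index(["dog", "a", "<start>", "<end>", "runs"])
print(word_to_idx)
```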


```python
# Write '<image_id>\t<start> caption <end>' lines for every image listed in
# dataset_file; relies on the global `descriptions` dict built above.

def generate_captions(out_file, dataset_file):
    with open(dataset_file) as f:
        # one image ID per line; skip blank lines instead of blindly dropping
        # the last entry, which would lose an ID if there is no trailing newline
        img_ids = [line.strip() for line in f if line.strip()]

    start = "<start> "
    end = " <end>"

    with open(out_file, 'w') as imgs_captions:
        for img_id in img_ids:
            for caption in descriptions[img_id]:
                full_caption = start + caption + end
                imgs_captions.write(img_id + "\t" + full_caption + "\n")
```
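Each line of the generated file is `<image_id><TAB><start> caption <end>`. The training notebook presumably reads these pairs back and needs the longest caption length to size the padded LSTM input, but that code is not shown here. The hypothetical helpers `read_caption_pairs` and `max_caption_length` sketch both steps on an in-memory sample instead of the real file.

```python
import io
from typing import Iterable, List, Tuple


def read_caption_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated (image_id, caption) lines into tuples."""
    pairs = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        image_id, caption = line.split('\t', 1)
        pairs.append((image_id, caption))
    return pairs


def max_caption_length(pairs: List[Tuple[str, str]]) -> int:
    # Length in tokens, including the <start> and <end> markers.
    return max(len(caption.split()) for _, caption in pairs)


sample = io.StringIO(
    "example_1.jpg\t<start> A dog runs . <end>\n"
    "example_2.jpg\t<start> Two children play in the park . <end>\n"
)
pairs = read_caption_pairs(sample)
print(pairs[0], max_caption_length(pairs))
```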

```python
# Generating captions (ID, caption) for training - 6000 images

train_imgs_captions = "save/trainCaptions.txt"
train_imgs_id = "features/Flickr_8k.trainImages.txt"

generate_captions(train_imgs_captions, train_imgs_id)
```

```python
# Generating captions (ID, caption) for testing - 1000 images

test_imgs_captions = "save/testCaptions.txt"
test_imgs_id = "features/Flickr_8k.testImages.txt"

generate_captions(test_imgs_captions, test_imgs_id)
```
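The `<start>` and `<end>` markers let each caption be unrolled into next-word prediction examples. How Part 3 actually feeds the LSTM is not shown in this section. This sketch assumes a common scheme that pairs each prefix of the caption with the word that follows it; during training, each prefix would also be combined with the image feature vector. `caption_to_training_pairs` is a hypothetical helper.

```python
from typing import List, Tuple


def caption_to_training_pairs(caption: str) -> List[Tuple[List[str], str]]:
    """Split '<start> ... <end>' into (prefix, next_word) examples."""
    words = caption.split()
    # For position i, the input is words[:i] and the target is words[i].
    return [(words[:i], words[i]) for i in range(1, len(words))]


for prefix, target in caption_to_training_pairs("<start> a dog runs <end>"):
    print(prefix, "->", target)
```

In a Keras pipeline the prefixes would then typically be mapped to integer IDs and padded to the maximum caption length.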