## Sort-of-CLEVR Dataset

The Sort-of-CLEVR dataset is a simplified version of the [CLEVR dataset](https://cs.stanford.edu/people/jcjohns/clevr/). In this version each scene is 2D, and the questions are encoded as fixed-length binary vectors, which avoids natural language processing of the questions.

Similar to the implementation by [kimhc6028](https://github.com/kimhc6028/relational-networks), the dataset is composed of 10,000 images, each associated with 20 questions (10 relational and 10 non-relational). For each image, 6 colored shapes (red, blue, green, orange, yellow, gray; each a square or a circle) are placed randomly in a 128x128 image.

Each question refers to the object of a queried color. The possible questions are as follows.

**Non-relational questions**

1. Is it a circle or a rectangle?
2. Is it on the bottom of the image?
3. Is it on the left of the image?

**Relational questions**

1. The shape of the nearest object?
2. The shape of the farthest object?
3. How many objects have the same shape?
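The three relational questions can be answered directly from object positions and shapes. The sketch below is illustrative only: the `scene` layout, the `relational_answers` helper, and the choice to count the queried object itself in question 3 (so the count falls in the 1 to 6 range of the answer vector) are assumptions, not the generator's actual code.

```python
import math

# Hypothetical scene: one (color, shape, (x, y) center) tuple per object.
scene = [
    ("red", "circle", (20, 30)),
    ("blue", "rectangle", (25, 40)),
    ("green", "circle", (100, 100)),
    ("orange", "rectangle", (60, 10)),
    ("yellow", "circle", (90, 20)),
    ("gray", "rectangle", (10, 110)),
]


def relational_answers(scene, color):
    """Return (nearest shape, farthest shape, same-shape count) for one color."""
    target = next(obj for obj in scene if obj[0] == color)
    others = [obj for obj in scene if obj[0] != color]

    def distance(obj):
        # Euclidean distance between object centers.
        return math.hypot(obj[2][0] - target[2][0], obj[2][1] - target[2][1])

    nearest = min(others, key=distance)
    farthest = max(others, key=distance)
    # Assumption: the queried object is included in the count.
    same_shape = sum(1 for obj in scene if obj[1] == target[1])
    return nearest[1], farthest[1], same_shape


print(relational_answers(scene, "blue"))  # ('circle', 'circle', 3)
```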

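Questions are encoded as a binary vector of size 11, laid out as shown below. Exactly one bit is set in each of the three groups: color, question type, and question number.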

*\[red, blue, green, orange, yellow, gray, relational, non-relational, question 1, question 2, question 3\]*

The answer is a fixed-length one-hot vector whose elements represent:

*\[yes, no, rectangle, circle, 1, 2, 3, 4, 5, 6\]*

**Example**

![](figures/example.png)

So the query "What is the shape of the nearest object to the blue object?" is represented as

\[0,1,0,0,0,0,1,0,1,0,0\]

And the answer to the query could be

\[0,0,1,0,0,0,0,0,0,0\] #rectangle
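The encoding above can be sketched in a few lines of plain Python. The helper names (`encode_question`, `encode_answer`, `decode_answer`) are hypothetical and are not part of `sort_of_clevr_generator`; only the vector layouts are taken from the description above.

```python
# Vocabulary order taken from the two layouts listed above.
COLORS = ["red", "blue", "green", "orange", "yellow", "gray"]
ANSWERS = ["yes", "no", "rectangle", "circle", "1", "2", "3", "4", "5", "6"]


def encode_question(color, relational, question_number):
    """Return the 11-element binary question vector.

    Layout: 6 color bits, 2 question-type bits, 3 question-number bits.
    """
    vec = [0] * 11
    vec[COLORS.index(color)] = 1
    vec[6 if relational else 7] = 1      # index 6: relational, 7: non-relational
    vec[8 + (question_number - 1)] = 1   # question_number is 1, 2 or 3
    return vec


def encode_answer(answer):
    """Return the 10-element one-hot answer vector."""
    vec = [0] * len(ANSWERS)
    vec[ANSWERS.index(str(answer))] = 1
    return vec


def decode_answer(vec):
    """Map a one-hot (or score) vector back to its answer label."""
    return ANSWERS[max(range(len(vec)), key=vec.__getitem__)]


question = encode_question("blue", relational=True, question_number=1)
answer = encode_answer("rectangle")
print(question)  # [0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
print(answer)    # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

Note that the question vector always has three bits set, while the answer vector has exactly one.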

In [1]:
from tqdm import tqdm
from sort_of_clevr_generator import SortOfCLEVRGenerator

# Generate the dataset one sample at a time, with a progress bar
generator = SortOfCLEVRGenerator()
test_dataset = []
for i in tqdm(range(5000), desc='Generating Sort-of-CLEVR Dataset'):
    dataset = generator.generate_dataset()
    test_dataset.append(dataset)


Generating Sort-of-CLEVR Dataset: 100%|██████████| 9800/9800 [00:55<00:00, 177.57it/s]


In [2]:
import pickle
with open("sort-of-clevr.p", 'wb') as f:
    pickle.dump(test_dataset, f)


![](figures/RNEquation.png)
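For readers without access to the figure, the relation network used in the code that follows can be written out as below. This is reconstructed from the code rather than from the image, so the notation is an assumption: $o_i$ and $o_j$ are object feature vectors, $q$ is the question vector, `g_theta` scores each pair, and `f_theta` maps the summed pair scores to the output.

```latex
a = f_{\theta}\left( \sum_{i,j} g_{\theta}\left(o_i, o_j, q\right) \right)
```

The code names both functions with the suffix `theta`, although they are two separate networks with separate parameters.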

In [2]:
import numpy as np
import keras
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Activation, Flatten, Input, Lambda, Concatenate, Add

def slicer(x_loc, y_loc):
    def func(x):
        return x[:,:,x_loc, y_loc]
    return Lambda(func)

def compute_relations(objects, question):
    relations = []
    # objects is the tagged CNN output with shape (batch_size, 24 + 2 (tagging), d, d)
    d = int(objects.shape[2])
    # Pair every grid cell o_i with every grid cell o_j
    for i in range(d*d):
        o_i = slicer(i // d, i % d)(objects)
        for j in range(d*d):
            o_j = slicer(j // d, j % d)(objects)
            # Concatenate is a layer: instantiate it, then call it on the tensors
            relations.append(Concatenate()([o_i, o_j, question]))
    return relations

def g_theta(h_unit=256, layers=4):
    # Create the layers once so that every object pair shares the same weights
    dense_layers = [Dense(h_unit, activation='relu') for _ in range(layers)]
    def f(model):
        for layer in dense_layers:
            model = layer(model)
        return model
    return f

def f_theta(h_unit=256):
    def f(model):
        model = Dense(h_unit)(model)
        model = Activation('relu')(model)
        model = Dense(h_unit)(model)
        model = Activation('relu')(model)
        model = Dropout(0.5)(model)
        model = Dense(29)(model)
        model = Activation('relu')(model)
        return model
    return f

def RelationNetworks(objects, question):
    relations = compute_relations(objects, question)
    # Apply a single shared g_theta to every pair, then sum the results
    g = g_theta()
    relations = [g(r) for r in relations]
    combined_relation = Add()(relations)
    f_out = f_theta()(combined_relation)
    return f_out
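The same computation can be checked outside Keras with a small NumPy sketch. It is not the notebook's model: the function name `relation_network_forward`, the random weights, the omitted biases and dropout, and the sizes used (`d = 5`, hidden size 32) are all assumptions chosen to keep the example small. It builds all `d*d * d*d` pairs by broadcasting instead of a Python loop.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def relation_network_forward(objects, question, g_weights, f_weights):
    """objects: (batch, channels, d, d); question: (batch, q_dim)."""
    batch, channels, d, _ = objects.shape
    n = d * d
    # One row per grid cell: (batch, n, channels).
    cells = objects.reshape(batch, channels, n).transpose(0, 2, 1)
    # Every ordered pair (o_i, o_j) via broadcasting: (batch, n, n, channels).
    o_i = np.broadcast_to(cells[:, :, None, :], (batch, n, n, channels))
    o_j = np.broadcast_to(cells[:, None, :, :], (batch, n, n, channels))
    q = np.broadcast_to(question[:, None, None, :],
                        (batch, n, n, question.shape[1]))
    pairs = np.concatenate([o_i, o_j, q], axis=-1)
    # g: the same weights are applied to every pair.
    h = pairs
    for w in g_weights:
        h = relu(h @ w)
    # Sum over all n * n pairs.
    combined = h.sum(axis=(1, 2))
    # f: maps the summed relation vector to the output.
    out = combined
    for w in f_weights:
        out = relu(out @ w)
    return out


rng = np.random.default_rng(0)
batch, channels, d, q_dim, hidden = 2, 26, 5, 11, 32
objects = rng.normal(size=(batch, channels, d, d))
question = rng.normal(size=(batch, q_dim))
g_weights = [rng.normal(scale=0.1, size=s)
             for s in [(2 * channels + q_dim, hidden), (hidden, hidden)]]
f_weights = [rng.normal(scale=0.1, size=s)
             for s in [(hidden, hidden), (hidden, 29)]]

out = relation_network_forward(objects, question, g_weights, f_weights)
print(out.shape)  # (2, 29)
```

Because the pair outputs are summed, the result does not depend on the order of the grid cells, which is why the coordinate tagging channels are needed to preserve position information.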

In [None]:
def ConvolutionNetworks():
    def conv(model):
        
        return model
    return conv

In [None]:
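def object_tagging(objects):
    # Left unimplemented in this notebook; per compute_relations, tagging adds 2 channels to the CNN output
    raise NotImplementedError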
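One common way to tag objects is to append two coordinate channels to the CNN feature maps, which matches the `24 + 2` channel count mentioned in `compute_relations`. The NumPy sketch below illustrates that idea only. The function name `tag_with_coordinates` and the choice to normalize coordinates to the range [-1, 1] are assumptions, not the notebook's implementation.

```python
import numpy as np


def tag_with_coordinates(feature_maps):
    """Append two coordinate channels to (batch, channels, d, d) feature maps."""
    batch, _, d, _ = feature_maps.shape
    coords = np.linspace(-1.0, 1.0, d)
    # rows[i, j] == coords[i] and cols[i, j] == coords[j]
    rows, cols = np.meshgrid(coords, coords, indexing="ij")
    tags = np.stack([rows, cols])                    # (2, d, d)
    tags = np.broadcast_to(tags, (batch, 2, d, d))   # same tags for every sample
    return np.concatenate([feature_maps, tags], axis=1)


features = np.zeros((4, 24, 5, 5))
tagged = tag_with_coordinates(features)
print(tagged.shape)  # (4, 26, 5, 5)
```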

In [None]:
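# f_out is the tensor returned by RelationNetworks(objects, question)
# Map it to the 10 answer classes and normalize with softmax
pred_out = Dense(10)(f_out)
pred_out = Activation('softmax')(pred_out)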