# Counting Everyday Objects in Everyday Scenes

This notebook shows how to use the Keras port of the counting models presented in Chattopadhyay et al., ["Counting Everyday Objects in Everyday Scenes"](https://arxiv.org/abs/1604.03505).

All the models are implemented according to the authors' guidelines. For implementations with more efficient pipelines that are easier to deploy, see the notebook "Transfer Learning to Count Objects in Custom Datasets".

## Download and compile

In [0]:
!rm -r object_counting keras*

In [0]:
!tar -xf keras.tar.xz

In [0]:
!rm -r weights

In [0]:
!mkdir weights

In [0]:
!wget http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar

In [0]:
!wget http://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar

In [0]:
!wget http://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar

In [0]:
!tar -xf VOCtest_06-Nov-2007.tar

In [0]:
!tar -xf VOCtrainval_06-Nov-2007.tar

In [0]:
!tar -xf VOCtrainval_11-May-2012.tar

In [0]:
import os
os.chdir('/content/object_counting/objects_counter/gt_generation/cython/')
!python setup.py build_ext --inplace

In [0]:
os.chdir('/content/object_counting/objects_counter/utils/cython/')
!python setup.py build_ext --inplace

os.chdir('/content/')

## The different models

The models presented in the paper are the following (results on the PASCAL VOC 2007 dataset):

<br>
<div class="ModelsTable">
    
|Model      |Annotation     |mRMSE |
|-----------|:-------------:|:---------:|
|**Detect** |Bounding Boxes |0.50 ± 0.01|
|**Glance** |Count          |0.50 ± 0.02|
|**Aso-Sub**|Bounding Boxes |0.43 ± 0.01|
|**Seq-Sub**|Bounding Boxes |0.42 ± 0.01|

</div>
<br>
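
For reference, the metric in the table can be written as below. This follows the evaluation code at the end of this notebook, where $N$ is the number of test images, $C$ the number of classes, $\hat{y}_{ic}$ the predicted count and $y_{ic}$ the ground-truth count of class $c$ in image $i$. The second expression is the relative variant that the notebook prints as `m_relRMSE`.

$$
\mathrm{mRMSE} = \frac{1}{C}\sum_{c=1}^{C}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_{ic}-y_{ic}\right)^{2}},
\qquad
\mathrm{m\_relRMSE} = \frac{1}{C}\sum_{c=1}^{C}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\frac{\left(\hat{y}_{ic}-y_{ic}\right)^{2}}{y_{ic}+1}}
$$

Lower is better for both. The relative variant divides each squared error by $y_{ic}+1$, so mistakes on images with many instances weigh less.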

## Models applied to PASCAL VOC dataset

The following sections show how to instantiate the different models and test them on the PASCAL VOC dataset.

Most operations are the same in every block, because the classes follow an object-oriented design built on inheritance and a shared interface.

### Preliminary imports

In [0]:
import os
import time
import numpy as np

from keras import backend as K

# Uncomment to hide the GPU and run on the CPU if your GPU does not have enough memory
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = ""

from object_counting.objects_counter.objects_counter import ObjectsCounter
from object_counting.objects_counter.utils.data_generator import DataGenerator

### Setting variables

In [0]:
base_model = 'resnet152' # available: ['vgg16', 'mobilenet', 'resnet50', 'resnet152', 'inceptionresnet']
classes = ['aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat',
           'chair', 'cow', 'diningtable', 'dog',
           'horse', 'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor']
input_shape = (224,224,3)

### Prediction Interpretation function

In [0]:
def pretty_prediction(classes, y_pred, y_true=None):
    """Print the non-zero counts of each sample.

    With y_true, each line shows "class: prediction ground-truth".
    """
    for i in range(y_pred.shape[0]):
        print("Prediction {}".format(i))
        for c in range(len(classes)):
            if y_true is not None:
                # Show a class if either the prediction or the ground truth is non-zero
                if y_pred[i,c] != 0 or y_true[i,c] != 0:
                    print("\t{}: {} {}".format(classes[c], y_pred[i,c], y_true[i,c]))
            elif y_pred[i,c] != 0:
                print("\t{}: {}".format(classes[c], y_pred[i,c]))

## Model

In [0]:
count_model = 'asosub' # available: ['glance', 'asosub', 'asosubfc', 'seqsub', 'detect']
model_args = {'num_hidden': 2, # only valid for glance and asosub; comment out otherwise
              #'num_bilstms': 2, # only valid for seqsub; comment out otherwise
              'hidden_size': 250,
              'include_relu': False,
              'weights': None}
grid_division = (3,3) # set to (n,n) in case of asosub or seqsub
weights = None

### Load datasets and convert them in the correct format for the model

In [0]:
sets = ['/content/VOCdevkit/VOC2007/ImageSets/Main/train.txt',
        '/content/VOCdevkit/VOC2007/ImageSets/Main/val.txt',
        '/content/VOCdevkit/VOC2007/ImageSets/Main/test.txt',
       ]
setnames = ['train_asosub', 'val_asosub_asosub', 'test_asosub']
images_folder = '/content/VOCdevkit/VOC2007/JPEGImages/'
annotations_folder = '/content/VOCdevkit/VOC2007/Annotations/'
annotations_ext = 'xml'

dataset = DataGenerator(classes,
                        count_model=count_model,
                        base_model=base_model,
                        base_model_weights=weights,
                        input_shape=input_shape,
                        grid_division=grid_division,
                        gt_mode={'mode': 'linear',
                                 'std_ratio': (5, 5)})
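
The `grid_division` argument splits each image into cells, and the subitizing models need a ground-truth count for every cell. The sketch below shows one way such targets can be built, assuming each bounding box contributes a single count that is shared among the cells in proportion to the overlapping area. The helper `cell_counts_from_boxes` is hypothetical and is not part of `object_counting`; the library's own `gt_mode` options may compute the targets differently.

```python
import numpy as np

def cell_counts_from_boxes(boxes, labels, image_size, grid=(3, 3), num_classes=20):
    """Hypothetical helper: spread each box's unit count over the grid cells.

    boxes: list of (xmin, ymin, xmax, ymax) in pixels
    labels: list of class indices, one per box
    image_size: (width, height) in pixels
    Returns an array of shape (rows, cols, num_classes) with fractional counts.
    """
    width, height = image_size
    rows, cols = grid
    cell_w, cell_h = width / cols, height / rows
    counts = np.zeros((rows, cols, num_classes), dtype=np.float64)
    for (xmin, ymin, xmax, ymax), label in zip(boxes, labels):
        box_area = (xmax - xmin) * (ymax - ymin)
        if box_area <= 0:
            continue  # skip degenerate boxes
        for r in range(rows):
            for c in range(cols):
                # Width and height of the intersection between the box and cell (r, c)
                ix = max(0.0, min(xmax, (c + 1) * cell_w) - max(xmin, c * cell_w))
                iy = max(0.0, min(ymax, (r + 1) * cell_h) - max(ymin, r * cell_h))
                counts[r, c, label] += (ix * iy) / box_area
    return counts

# Two boxes on a 300x300 image: a 'car' (index 6) spanning two cells
# and a 'person' (index 14) that lies entirely inside the bottom-right cell.
counts = cell_counts_from_boxes(boxes=[(0, 0, 200, 100), (250, 250, 300, 300)],
                                labels=[6, 14],
                                image_size=(300, 300))
print(counts[:, :, 6])
print(counts[:, :, 14])
```

Summing the cell counts of one class over the whole grid gives back the image-level count, which is the property the evaluation cells rely on when they regroup the cells.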


Load the features for the original non-end-to-end models

In [0]:
##############################
#           NON-E2E
##############################

# If you want to load the data from scratch and extract the features run this cell

dataset.load_dataset(sets=sets,
                     images_folder=images_folder,
                     annotations_folder=annotations_folder,
                     annotations_ext=annotations_ext,
                     setnames=setnames,
                     h5dataset_path=None,
                     h5saving_folder='/content/VOCdevkit/VOC2007/',
                     h5saving_file=None,
                     load_into_memory=False,
                     serialize_cells=False) #set true for asosub



In [0]:
sets = ['/content/VOCdevkit/VOC2012/ImageSets/Main/trainval.txt']
setnames = ['train_asosub_12']
images_folder = '/content/VOCdevkit/VOC2012/JPEGImages/'
annotations_folder = '/content/VOCdevkit/VOC2012/Annotations/'
annotations_ext = 'xml'

dataset_12 = DataGenerator(classes,
                        count_model=count_model,
                        base_model=base_model,
                        base_model_weights=weights,
                        input_shape=input_shape,
                        grid_division=grid_division,
                        gt_mode={'mode': 'linear',
                                 'std_ratio': (5, 5)})   

dataset_12.load_dataset(sets=sets,
                     images_folder=images_folder,
                     annotations_folder=annotations_folder,
                     annotations_ext=annotations_ext,
                     setnames=setnames,
                     h5dataset_path=None,
                     h5saving_folder='/content/VOCdevkit/VOC2007/',
                     h5saving_file=None,
                     load_into_memory=False,
                     serialize_cells=False) #set true for asosub

Load the features for the end-to-end models

In [0]:
##############################
#             E2E            #
##############################

# If you want to load the data from scratch and extract the features run this cell

dataset.load_dataset_noftr(sets=sets,
                           images_folder=images_folder,
                           annotations_folder=annotations_folder,
                           annotations_ext=annotations_ext,
                           setnames=setnames,
                           h5saving_folder='/content/VOCdevkit/VOC2007/',
                           h5saving_file=None,
                           load_into_memory=True)

In [0]:
# If you already have the features extracted and saved in a HDF5 file, run this cell

setnames = ['train_asosub', 'val_asosub_asosub', 'test_asosub', 'train_asosub_12']

X_train_07,y_train_07 = dataset.h5features_to_memory('/content/VOCdevkit/VOC2007/{}.h5'.format(setnames[0]), gt=True)
X_train_12,y_train_12 = dataset.h5features_to_memory('/content/VOCdevkit/VOC2007/{}.h5'.format(setnames[3]), gt=True)
X_val,y_val = dataset.h5features_to_memory('/content/VOCdevkit/VOC2007/{}.h5'.format(setnames[1]), gt=True)
X_test,y_test = dataset.h5features_to_memory('/content/VOCdevkit/VOC2007/{}.h5'.format(setnames[2]), gt=True)

X_train = np.concatenate([X_train_07, X_train_12])
y_train = np.concatenate([y_train_07, y_train_12])

X_train = np.reshape(X_train, (-1, 2048))
X_val = np.reshape(X_val, (-1, 2048))
X_test = np.reshape(X_test, (-1, 2048))
y_train = np.reshape(y_train, (-1, 20))
y_val = np.reshape(y_val, (-1, 20))
y_test = np.reshape(y_test, (-1, 20))
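
The reshapes above turn the loaded arrays into one row per grid cell. A small sketch of the effect, assuming the features are stored as `(images, cells, 2048)` and the targets as `(images, cells, 20)`; the stored layout is an assumption here, so check it by printing the shapes returned by `h5features_to_memory`.

```python
import numpy as np

num_images, cells_per_image, feature_dim, num_classes = 4, 9, 2048, 20

# Dummy arrays standing in for the loaded features and ground truth
X = np.zeros((num_images, cells_per_image, feature_dim), dtype=np.float32)
y = np.zeros((num_images, cells_per_image, num_classes), dtype=np.float32)

# Tag every cell with its running index so the row order can be checked
X[:, :, 0] = np.arange(num_images * cells_per_image).reshape(num_images, cells_per_image)

X_flat = np.reshape(X, (-1, feature_dim))
y_flat = np.reshape(y, (-1, num_classes))

print(X_flat.shape, y_flat.shape)
# Cell k of image i ends up in row i * cells_per_image + k
print(X_flat[:12, 0])
```

Because the cells of one image stay contiguous after flattening, they can later be regrouped with a reshape to `(-1, cells_per_image, num_classes)`.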

In [0]:
print('Training data size: {}'.format(X_train.shape))
print('Validation data size: {}'.format(X_val.shape))
print('Test data size: {}'.format(X_test.shape))
print('Training gt size: {}'.format(y_train.shape))
print('Validation gt size: {}'.format(y_val.shape))
print('Test gt size: {}'.format(y_test.shape))

### Instantiate the model

In [0]:
# The input shape is taken from the training data, skipping the first axis,
# because the model does not need the number of samples in its input shape
K.clear_session()
OC_model = ObjectsCounter(base_model,
                          count_model,
                          classes,
                          model_args,
                          input_shape=X_train.shape[1:],
                          e2e=False)

In [0]:
from keras.optimizers import Adam
from keras.losses import poisson

# OC_model.compile(lr=0.0001)

OC_model.counting_model.model.compile(Adam(lr=0.001), loss='mse')
OC_model.counting_model.model.summary()
weights = None
if weights is not None:
  OC_model.load_weights(weights)

In [0]:
history = OC_model.train(X_train,
                         y_train,
                         batch_size=100,
                         epochs=200,
                         initial_epoch=0,
                         steps_per_epoch=None,
                         val_data=(X_val, y_val),
                         validation_steps=None,
                         lr_schedule=None,
                         checkpoint_folder='weights/')

In [0]:
# Fine-tune the model
OC_model.counting_model.defreeze_layers(0)
OC_model.compile(lr=0.0001)
# weights is None unless a checkpoint path was set above
if weights is not None:
    OC_model.load_weights(weights)

In [0]:
OC_model.train(X_train,
               y_train,
               batch_size=25,
               epochs=50,
               initial_epoch=0,
               steps_per_epoch=None,
               val_data=(X_val,y_val),
               validation_steps=None,
               lr_schedule=None,
               checkpoint_folder='weights/')

### Evaluate model

<br>

In this section, we rebuild the model, load its trained weights, and evaluate it on the test set.

In [0]:
# load weights to the model
K.clear_session()
OC_model = ObjectsCounter(base_model, count_model, classes, model_args, input_shape=X_test.shape[1:], e2e=False)
OC_model.compile()
# Uncomment and point to your own checkpoint to evaluate trained weights
#OC_model.load_weights('weights/Glanceweights_epoch-9895_val_loss-0.3736.h5')

In [0]:
metrics = OC_model.evaluate(X_test, y_test, 25)
print('Loss: {}'.format(metrics))

In [0]:
predictions = OC_model.predict(X_test)
print(predictions.shape)

In [0]:
cells_per_image = grid_division[0] * grid_division[1]  # 9 cells for a (3, 3) grid
y_test_grouped = np.round(np.reshape(y_test, (-1, cells_per_image, len(classes))).sum(axis=1))
print(y_test_grouped.shape)
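
The grouping above sums the fractional per-cell targets of each image and rounds the result to get image-level counts. A toy sketch with two images and two classes, assuming the rows of one image are contiguous as in the flattened arrays used in this notebook:

```python
import numpy as np

grid_division = (3, 3)
cells_per_image = grid_division[0] * grid_division[1]
num_classes = 2

# Per-cell fractional counts, one row per cell, images stacked one after the other
y_cells = np.zeros((2 * cells_per_image, num_classes))
y_cells[0, 0], y_cells[1, 0] = 0.5, 0.5            # image 0, class 0: one object over two cells
y_cells[3:6, 1] = [0.4, 0.35, 0.25]                # image 0, class 1: one object over three cells
y_cells[9 + 4, 0] = 1.0                            # image 1, class 0: one object in a single cell
y_cells[9 + 7, 0], y_cells[9 + 8, 0] = 0.6, 0.4    # image 1, class 0: a second object

# Regroup to (images, cells, classes), sum over the cells, round to whole counts
y_grouped = np.round(np.reshape(y_cells, (-1, cells_per_image, num_classes)).sum(axis=1))
print(y_grouped)
```

The rounding absorbs small floating-point errors left by the fractional targets, so the grouped array holds whole counts.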

In [0]:
RMSE = np.sqrt(np.mean((predictions-y_test_grouped)**2))
print("RMSE: ", RMSE)
mRMSE = np.mean(np.sqrt(np.mean((predictions-y_test_grouped)**2, axis=0)))
print("mRMSE: ", mRMSE)
m_relRMSE = np.mean(np.sqrt(np.mean(((predictions-y_test_grouped)**2)/(y_test_grouped+1), axis=0)))
print("m_relRMSE: ", m_relRMSE)
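
To see what the three metrics above measure, here is a small hand-checkable sketch on made-up counts (four images, two classes). The arrays are illustrative only and are not results from any model.

```python
import numpy as np

# Made-up predicted and ground-truth counts: rows are images, columns are classes
pred = np.array([[1, 0], [2, 1], [0, 0], [3, 2]], dtype=np.float64)
gt = np.array([[1, 0], [1, 1], [0, 1], [3, 0]], dtype=np.float64)

sq_err = (pred - gt) ** 2

# RMSE over all entries at once
rmse = np.sqrt(np.mean(sq_err))
# mRMSE: RMSE per class (over the images), then averaged over the classes
mrmse = np.mean(np.sqrt(np.mean(sq_err, axis=0)))
# Relative variant: each squared error is divided by (ground truth + 1)
m_rel_rmse = np.mean(np.sqrt(np.mean(sq_err / (gt + 1), axis=0)))

print("RMSE:", rmse)
print("mRMSE:", mrmse)
print("m_relRMSE:", m_rel_rmse)
```

In this toy case the single large error on class 1 (predicting 2 where the truth is 0) dominates both mRMSE and the relative variant, because a ground truth of 0 gives a divisor of 1.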

In [0]:
print("The first number is the prediction, the second number is the ground truth.\n")
pretty_prediction(classes, np.round(predictions[:15]), y_test_grouped[:15])