[Object Detection] Evaluation procedure throwing negative values for AP and AR

### System information
- **What is the top-level directory of the model you are using**: models/research/object_detection/
- **Have I written custom code (as opposed to using a stock example script provided in TensorFlow)**: Yes
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: Ubuntu
- **TensorFlow installed from (source or binary)**: Source
- **TensorFlow version (use command below)**: 1.9.0
- **Bazel version (if compiling from source)**:
- **CUDA/cuDNN version**:
- **GPU model and memory**: NVIDIA K80 GPUs
- **Exact command to reproduce**: `python /home/ubuntu/data/tensorflow/models/research/object_detection/metrics/offline_eval_map_corloc.py \
  --eval_dir='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics' \
  --eval_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt' \
  --input_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt'`

### Describe the problem
I fine-tuned the faster_rcnn_resnet101 model from the model zoo, using the train and eval datasets during training. On TensorBoard I monitored the model's mAP and AR. Now that I have the fine-tuned model, I want to evaluate its performance on a test dataset that the model has not seen.

I followed this documentation on offline evaluation, adapted to my own dataset: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/oid_inference_and_evaluation.md. Here are the steps that I followed:

1. Created the TFRecord for the test dataset with the same fields as the TFRecord for the train dataset.
2. Ran inference on this test dataset using the command:
`python /home/ubuntu/data/tensorflow/models/research/object_detection/inference/infer_detections.py \
  --input_tfrecord_paths=$TF_RECORD_FILES \
  --output_tfrecord_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/Predictions/train.record' \
  --inference_graph=$OUTPUT_INFERENCE_GRAPH \
  --discard_image_pixels`
This creates bounding box predictions for the test data.
3. Evaluated the predicted bounding boxes using these commands:
`
echo "
label_map_path: '/home/ubuntu/data/tensorflow/my_workspace/training_demo/annotations/label_map.pbtxt'
tf_record_input_reader: { input_path: '/home/ubuntu/data/tensorflow/my_workspace/training_demo/Predictions/train.record' }
" > /home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt

echo "
metrics_set: 'coco_detection_metrics'
" > /home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt


python /home/ubuntu/data/tensorflow/models/research/object_detection/metrics/offline_eval_map_corloc.py \
  --eval_dir='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics' \
  --eval_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt' \
  --input_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt'`

 
All three steps run without errors; however, I see negative values (-1.000) for some of the AP and AR metrics.

Here is the output of the evaluation on the train, eval and test datasets, which I ran using the commands above:

`
# Evaluation on test data
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.459
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.601
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.543
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.459
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.543
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.627
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.628
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.628
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000


# evaluation on eval.record
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.371
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.521
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.428
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.371
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.458
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.537
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.539
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.539
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

# evaluation on train.record
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.525
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.677
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.619
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.525
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.521
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.614
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.615
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.615
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
`
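
For context on the -1.000 entries, here is a minimal sketch of how a COCO-style summary step could produce them. It assumes the evaluator follows the pycocotools convention, where -1 is a sentinel meaning "no valid entries for this slice" rather than a real score. The function name `summarize` and the sample values are hypothetical and only illustrate the averaging rule.

```python
# Simplified, pure-Python sketch of a COCO-style summary step.
# Assumption: -1 marks "no valid entries", as in the pycocotools convention.

def summarize(values):
    """Average the valid entries; return -1.0 when none are valid."""
    valid = [v for v in values if v > -1]
    if not valid:
        return -1.0
    return sum(valid) / len(valid)

# Hypothetical per-category precision values for two area ranges.
small_bucket = [0.40, 0.50, -1.0]    # one category has no small objects
medium_bucket = [-1.0, -1.0, -1.0]   # no category has any medium objects

print(f"small  = {summarize(small_bucket):.3f}")
print(f"medium = {summarize(medium_bucket):.3f}")
```

Under this reading, -1.000 for `area=medium` and `area=large` would mean the evaluator found no ground-truth boxes in those area ranges, not that the model scored below zero.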

I am not sure why I see -1.000 for the medium and large AP and AR when I have the correct label map, and my dataset contains bounding boxes of small, medium and large sizes.
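
In all three runs the `area=all` values equal the `area=small` values, which would be consistent with every ground-truth box being counted as small. One possible cause, offered as an assumption and not a confirmed diagnosis, is that box areas are computed from normalized [0, 1] coordinates instead of pixel coordinates. The sketch below assumes the usual COCO area thresholds of 32² and 96² squared pixels; the helper `area_bucket` and the 640x480 image size are hypothetical, and boundary handling is simplified.

```python
# Sketch: how box coordinates affect COCO-style area buckets.
# Assumed thresholds (squared pixels): small < 32^2 <= medium < 96^2 <= large.
SMALL_MAX = 32 ** 2    # 1024
MEDIUM_MAX = 96 ** 2   # 9216


def area_bucket(xmin, ymin, xmax, ymax):
    """Return the area bucket name for a box given as corner coordinates."""
    area = max(0.0, xmax - xmin) * max(0.0, ymax - ymin)
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


# Hypothetical box spanning the central half of each side of a 640x480 image.
width, height = 640, 480
normalized = (0.25, 0.25, 0.75, 0.75)
absolute = (
    normalized[0] * width,
    normalized[1] * height,
    normalized[2] * width,
    normalized[3] * height,
)

print(area_bucket(*normalized))  # area is 0.25, far below 32^2
print(area_bucket(*absolute))    # area is 320 * 240 = 76800 squared pixels
```

If the ground-truth boxes in the evaluated TFRecord are stored in normalized form and are not scaled back to pixels before evaluation, every box would land in the small bucket and the medium and large buckets would be empty.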
