[Object Detection] Evaluation procedure throwing negative values for AP and AR #6021

@Manish-rai21bit

System information

  • What is the top-level directory of the model you are using: models/research/object_detection/
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu
  • TensorFlow installed from (source or binary): Source
  • TensorFlow version (use command below): 1.9.0
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory: NVIDIA K80 GPUs
  • Exact command to reproduce:
    python /home/ubuntu/data/tensorflow/models/research/object_detection/metrics/offline_eval_map_corloc.py \
      --eval_dir='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics' \
      --eval_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt' \
      --input_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt'

Describe the problem

I fine-tuned the faster_rcnn_resnet101 model from the model zoo, using my train and eval datasets during training, and monitored mAP and AR in TensorBoard. Now that I have the fine-tuned model, I want to evaluate its performance on a test dataset that the model has not seen.

I followed this documentation on offline evaluation, adapting it to my dataset: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/oid_inference_and_evaluation.md. Here are the steps I followed:

  1. Created the TFRecord for the test dataset, with fields similar to those in the training TFRecord.
  2. Ran inference on the test dataset with:

    ```shell
    python /home/ubuntu/data/tensorflow/models/research/object_detection/inference/infer_detections.py \
      --input_tfrecord_paths=$TF_RECORD_FILES \
      --output_tfrecord_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/Predictions/train.record' \
      --inference_graph=$OUTPUT_INFERENCE_GRAPH \
      --discard_image_pixels
    ```

    This creates bounding box predictions for the test data.
  3. I then evaluated the detected bounding boxes with:

    ```shell
    echo "
    label_map_path: '/home/ubuntu/data/tensorflow/my_workspace/training_demo/annotations/label_map.pbtxt'
    tf_record_input_reader: { input_path: '/home/ubuntu/data/tensorflow/my_workspace/training_demo/Predictions/train.record' }
    " > /home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt

    echo "
    metrics_set: 'coco_detection_metrics'
    " > /home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt

    python /home/ubuntu/data/tensorflow/models/research/object_detection/metrics/offline_eval_map_corloc.py \
      --eval_dir='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics' \
      --eval_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt' \
      --input_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt'
    ```
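The config-writing and evaluation part of step 3 can also be scripted. Below is a minimal Python sketch, assuming the same config contents as the `echo` commands; `write_eval_configs` and `build_eval_command` are hypothetical helpers, not part of the Object Detection API.

```python
import tempfile
from pathlib import Path


def write_eval_configs(out_dir, label_map_path, detections_record,
                       metrics_set="coco_detection_metrics"):
    """Write the input and eval configs that the `echo` commands in step 3 create.

    Hypothetical helper: file names and contents mirror the shell version.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    input_cfg = out_dir / "test_input_config.pbtxt"
    input_cfg.write_text(
        f"label_map_path: '{label_map_path}'\n"
        # Doubled braces produce literal { } in the f-string.
        f"tf_record_input_reader: {{ input_path: '{detections_record}' }}\n"
    )

    eval_cfg = out_dir / "test_eval_config.pbtxt"
    eval_cfg.write_text(f"metrics_set: '{metrics_set}'\n")
    return input_cfg, eval_cfg


def build_eval_command(script_path, out_dir, input_cfg, eval_cfg):
    """Hypothetical helper: argv list for offline_eval_map_corloc.py.

    Each flag is its own list element, so no shell quoting or
    line continuations are involved.
    """
    return [
        "python", str(script_path),
        f"--eval_dir={out_dir}",
        f"--eval_config_path={eval_cfg}",
        f"--input_config_path={input_cfg}",
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inp, ev = write_eval_configs(tmp, "/path/to/label_map.pbtxt",
                                     "/path/to/detections.record")
        print(inp.read_text())
        print(build_eval_command("offline_eval_map_corloc.py", tmp, inp, ev))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the evaluation and raise if the script exits with an error.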

All three steps run without errors; however, some of the AP and AR values come out negative (-1.000).
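For context, pycocotools' `COCOeval` fills its precision and recall arrays with -1 wherever a value cannot be computed (for example, an area range with no ground-truth boxes), and its summary averages only entries greater than -1, printing -1 when none remain. The sketch below is a stdlib-only model of that averaging rule, not the pycocotools code itself; `summarize_metric` is a hypothetical name.

```python
from statistics import mean

UNDEFINED = -1.0  # sentinel for entries that could not be computed


def summarize_metric(values):
    """Average only the defined entries; return -1 if there are none.

    `values` stands in for the precision or recall entries accumulated for
    one (IoU, area range, maxDets) setting. Entries that could not be
    computed, e.g. because the area range has no ground-truth boxes,
    keep the -1 sentinel and are excluded from the mean.
    """
    defined = [v for v in values if v > UNDEFINED]
    return mean(defined) if defined else UNDEFINED


print(summarize_metric([-1.0, -1.0]))       # -1.0: nothing to average
print(summarize_metric([0.5, -1.0, 0.7]))   # averages 0.5 and 0.7 only
```

Under this rule, -1.000 in the summary means "undefined for this setting", not a negative score.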

Here is the evaluation output on the test, eval, and train datasets, produced with the commands above:

```
Evaluation on test data

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.601
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.543
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.627
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

Evaluation on eval.record

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.371
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.521
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.428
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.371
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.458
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.537
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.539
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.539
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

Evaluation on train.record

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.525
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.677
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.619
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.525
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.521
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.614
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.615
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.615
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
```
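To compare runs like the three above, the summary lines can be parsed and the undefined (-1) settings listed. This is a minimal regex-based sketch, assuming the line format shown in the output (it tolerates the extra padding spaces pycocotools normally prints); `parse_summary` and `undefined_metrics` are hypothetical helpers.

```python
import re

# Matches one COCO summary line, e.g.
# "Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000"
SUMMARY_LINE = re.compile(
    r"Average (?P<kind>Precision|Recall)\s+\((?:AP|AR)\)\s+@\[\s*"
    r"IoU=(?P<iou>[\d.:]+)\s*\|\s*area=\s*(?P<area>\w+)\s*\|\s*"
    r"maxDets=\s*(?P<max_dets>\d+)\s*\]\s*=\s*(?P<value>-?\d+\.\d+)"
)


def parse_summary(text):
    """Return (kind, iou, area, max_dets, value) tuples, one per summary line."""
    return [
        (m["kind"], m["iou"], m["area"], int(m["max_dets"]), float(m["value"]))
        for m in SUMMARY_LINE.finditer(text)
    ]


def undefined_metrics(rows):
    """Rows reported as -1, i.e. settings with nothing to average."""
    return [row for row in rows if row[4] == -1.0]


sample = """Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000"""
for row in undefined_metrics(parse_summary(sample)):
    print(row)
```

Applied to the output above, this flags only the medium and large rows in all three runs; the small-area values are identical to the all-area values at maxDets=100.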

I am not sure why the medium- and large-area AP and AR are -1.000, since my label map is correct and my dataset does contain bounding boxes of small, medium, and large sizes.
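For reference, the COCO summary buckets each ground-truth box by its area, using pycocotools' default ranges of 0 to 32², 32² to 96², and above 96² in squared coordinate units. One hypothesis worth checking (an assumption, not verified in this report): the TF Object Detection API's TFRecord format stores boxes normalized to [0, 1], and if those boxes reach the COCO evaluator without being scaled to pixels, every area is below 1 and lands in "small". That would match the small-area values equalling the all-area values in the output. The sketch below shows the bucketing; `box_area`, `area_buckets`, and `to_absolute` are hypothetical helpers.

```python
# Default COCO area ranges (squared units of whatever coordinates the boxes use).
AREA_RANGES = {
    "small": (0.0, 32.0 ** 2),
    "medium": (32.0 ** 2, 96.0 ** 2),
    "large": (96.0 ** 2, 1e5 ** 2),
}


def box_area(ymin, xmin, ymax, xmax):
    """Area of a [ymin, xmin, ymax, xmax] box, clamped at zero."""
    return max(0.0, ymax - ymin) * max(0.0, xmax - xmin)


def area_buckets(area):
    """Names of the area ranges that contain `area` (bounds inclusive)."""
    return [name for name, (lo, hi) in AREA_RANGES.items() if lo <= area <= hi]


def to_absolute(box, image_height, image_width):
    """Hypothetical helper: scale a normalized [ymin, xmin, ymax, xmax] box to pixels."""
    ymin, xmin, ymax, xmax = box
    return (ymin * image_height, xmin * image_width,
            ymax * image_height, xmax * image_width)


normalized_box = (0.1, 0.1, 0.6, 0.6)  # spans half the image in each direction
print(area_buckets(box_area(*normalized_box)))                          # ['small']
print(area_buckets(box_area(*to_absolute(normalized_box, 640, 640))))   # ['large']
```

The same box is "small" in normalized units but "large" once scaled to a 640x640 image, so checking which coordinate units reach the evaluator may explain the empty medium and large buckets.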
