System information
- What is the top-level directory of the model you are using: models/research/object_detection/
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu
- TensorFlow installed from (source or binary): Source
- TensorFlow version (use command below): 1.9.0
- Bazel version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory: NVIDIA K80 GPUs
- Exact command to reproduce:
```
python /home/ubuntu/data/tensorflow/models/research/object_detection/metrics/offline_eval_map_corloc.py \
  --eval_dir='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics' \
  --eval_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt' \
  --input_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt'
```
Describe the problem
I fine-tuned the faster_rcnn_resnet101 model from the model zoo, using the train and evaluation datasets during training. In TensorBoard I monitored the model's performance with the mAP and AR metrics. Now that I have the fine-tuned model, I want to evaluate its performance on a test dataset that the model has not seen.
I followed this documentation on offline evaluation, adapted to my dataset: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/oid_inference_and_evaluation.md. Here are the steps I followed:
- Created the TFRecord for the test dataset with the same kind of fields as the TFRecord for the train dataset.
- Ran inference on this test dataset using the command:
```
python /home/ubuntu/data/tensorflow/models/research/object_detection/inference/infer_detections.py \
  --input_tfrecord_paths=$TF_RECORD_FILES \
  --output_tfrecord_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/Predictions/train.record' \
  --inference_graph=$OUTPUT_INFERENCE_GRAPH \
  --discard_image_pixels
```
This creates bounding box predictions for the test data.
- I then evaluated the detection bounding boxes with:
```
echo "
label_map_path: '/home/ubuntu/data/tensorflow/my_workspace/training_demo/annotations/label_map.pbtxt'
tf_record_input_reader: { input_path: '/home/ubuntu/data/tensorflow/my_workspace/training_demo/Predictions/train.record' }
" > /home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt
echo "
metrics_set: 'coco_detection_metrics'
" > /home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt
python /home/ubuntu/data/tensorflow/models/research/object_detection/metrics/offline_eval_map_corloc.py \
  --eval_dir='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics' \
  --eval_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt' \
  --input_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt'
```
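For anyone reproducing step 3 from Python instead of the shell, here is a minimal sketch (Python 3 assumed). It writes the same two text-format configs as the `echo` commands above and returns an argument list for `offline_eval_map_corloc.py`, which sidesteps shell line-continuation mistakes. The function name `prepare_offline_eval` is hypothetical and not part of the Object Detection API; the config keys are copied from the commands above.

```python
import os
import sys


def prepare_offline_eval(out_dir, label_map_path, detections_record_path,
                         script_path, metrics_set="coco_detection_metrics"):
    """Write the input/eval configs and return the evaluation argv.

    Hypothetical helper mirroring the `echo ... > file` commands above.
    """
    os.makedirs(out_dir, exist_ok=True)
    input_config_path = os.path.join(out_dir, "test_input_config.pbtxt")
    eval_config_path = os.path.join(out_dir, "test_eval_config.pbtxt")

    # Input config: label map plus the TFRecord written by infer_detections.py.
    with open(input_config_path, "w") as f:
        f.write("label_map_path: '%s'\n" % label_map_path)
        f.write("tf_record_input_reader: { input_path: '%s' }\n"
                % detections_record_path)

    # Eval config: which metrics set the evaluator should compute.
    with open(eval_config_path, "w") as f:
        f.write("metrics_set: '%s'\n" % metrics_set)

    # No shell quoting is needed when the list goes straight to subprocess.
    return [
        sys.executable, script_path,
        "--eval_dir=" + out_dir,
        "--eval_config_path=" + eval_config_path,
        "--input_config_path=" + input_config_path,
    ]
```

The returned list can be passed to `subprocess.run(argv, check=True)` so that a failing evaluation raises an exception instead of passing silently.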
All three steps run without errors; however, I see negative values (-1.000) for some of the AP and AR metrics.
Here is the evaluation output on the test, eval, and train datasets, produced with the commands above:
```
Evaluation on test data
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.601
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.543
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.627
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
evaluation on eval.record
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.371
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.521
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.428
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.371
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.458
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.537
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.539
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.539
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
evaluation on train.record
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.525
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.677
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.619
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.525
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.521
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.614
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.615
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.615
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
```
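A -1.000 in this table is not a negative score. The `coco_detection_metrics` set is backed by pycocotools' `COCOeval`, which initializes its precision and recall arrays to -1. A cell keeps that value when a category has no ground-truth boxes in the given area range, and `summarize()` averages only the cells greater than -1, reporting -1 when none remain. The sketch below reproduces just that sentinel rule. The function name and the per-category values are hypothetical, not taken from the runs above.

```python
def summarize_stat(cells):
    """Mean of the valid cells, or -1.0 if there are none.

    Sketch of the COCOeval summarize rule: cells equal to -1 mean
    "no ground truth for this (category, area range)" and are skipped.
    """
    valid = [c for c in cells if c > -1]
    if not valid:
        return -1.0
    return sum(valid) / len(valid)


# Hypothetical per-category recall values for two area ranges.
small_recall = [0.5, 0.25, -1.0]    # third category has no small boxes
medium_recall = [-1.0, -1.0, -1.0]  # no medium ground truth at all

print(summarize_stat(small_recall))   # 0.375
print(summarize_stat(medium_recall))  # -1.0
```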
I am not sure why I see -1.000 for AP and AR, given that I use the correct label map and my dataset does contain small, medium, and large bounding boxes.
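One plausible explanation, which I have not confirmed against the `offline_eval_map_corloc.py` source, is that the boxes stored in the TFRecord are normalized to [0, 1]. If the COCO evaluator computes areas directly from those values, every area is at most 1 and falls below COCO's 32*32-pixel "small" threshold, which would leave medium and large with no ground truth at all. The outputs above are consistent with this: in every run, small AP equals overall AP and small AR equals AR@100. The sketch below compares the area bucket of the same hypothetical box in normalized and pixel coordinates. The helper names and the 600x800 image size are illustrative assumptions, and exact-boundary ties are simplified relative to pycocotools.

```python
# COCO area thresholds in squared pixels (pycocotools COCOeval defaults).
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def box_area(box):
    """Area of a (ymin, xmin, ymax, xmax) box in the box's own units."""
    ymin, xmin, ymax, xmax = box
    return max(0.0, ymax - ymin) * max(0.0, xmax - xmin)


def coco_area_bucket(area):
    """Map an area to COCO's small/medium/large label (ties simplified)."""
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def to_absolute(box, height, width):
    """Scale a normalized [0, 1] box to pixel coordinates (assumed layout)."""
    ymin, xmin, ymax, xmax = box
    return (ymin * height, xmin * width, ymax * height, xmax * width)


# Hypothetical box covering half the height and width of a 600x800 image.
normalized_box = (0.1, 0.1, 0.6, 0.6)
print(coco_area_bucket(box_area(normalized_box)))                         # small
print(coco_area_bucket(box_area(to_absolute(normalized_box, 600, 800))))  # large
```

If this is the cause, the area-specific numbers only become meaningful once the boxes (or precomputed areas) reach the evaluator in pixel units; the overall AP and AR values are unaffected by the area bucketing.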