System information
- What is the top-level directory of the model you are using: ~/tf_1_10_src/tensorflow/tensorflow/models
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04.3 LTS
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): 1.10.1
- Bazel version (if compiling from source): 0.15.2
- CUDA/cuDNN version: CUDA 9.0/cuDNN 7.3
- GPU model and memory: GTX TitanXp
- Exact command to reproduce:
cd ~/tf_1_10_src/tensorflow/tensorflow/models/research; ~/train_object_detection_v1.sh
Describe the problem
Training with object_detection/model_main.py fails immediately with `TypeError: `name` must be string, given: 0`. The error is raised from tf.estimator.EvalSpec inside create_train_and_eval_specs in object_detection/model_lib.py, which passes an integer eval_spec_name (the default names come from range(len(eval_input_fns))). This looks like a bug in the Object Detection API rather than in my configuration; the reproduction script, config, and full log are below.
Source code / logs
Script train_object_detection_v1.sh source:
#!/bin/bash
echo "Object detection script v.1"
echo "Check execution path"
if [[ "$PWD" = "$TFMODELPATH/research" ]]
then
    echo "Current working directory is correct."
    PROJECT_DIR=/media/nikita/LinuxBD4Tb/SSD_PROJECT
    PIPELINE_CONFIG_PATH=$PROJECT_DIR/ssd_inception_v2_coco_nik.config
    MODEL_DIR=$PROJECT_DIR/models/model
    NUM_TRAIN_STEPS=4000000
    SAMPLE_1_OF_N_EVAL_EXAMPLES=1
    echo "PIPELINE_CONFIG_PATH=$PIPELINE_CONFIG_PATH"
    echo "MODEL_DIR=$MODEL_DIR"
    echo "NUM_TRAIN_STEPS=$NUM_TRAIN_STEPS"
    echo "SAMPLE_1_OF_N_EVAL_EXAMPLES=$SAMPLE_1_OF_N_EVAL_EXAMPLES"
    echo "-----------------"
    echo "Start object_detection/model_main.py"
    python object_detection/model_main.py \
        --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
        --model_dir=${MODEL_DIR} \
        --num_train_steps=${NUM_TRAIN_STEPS} \
        --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
        --alsologtostderr
else
    echo "Current working directory must be 'tensorflow/models/research/'."
    echo "Correct path is '$TFMODELPATH/research'"
    echo "Exit."
fi
Directory structure:
./SSD_PROJECT/
├── data
│ ├── mscoco_label_map.pbtxt
│ ├── mscoco_train.record
│ └── mscoco_val.record
├── models
│ └── model
│ ├── eval
│ ├── pipeline.config
│ └── train
└── ssd_inception_v2_coco_nik.config
File ssd_inception_v2_coco_nik.config:
model {
  ssd {
    num_classes: 90
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
        reduce_boxes_in_lowest_layer: true
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_inception_v2'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  batch_size: 32
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/home/nikita/tf_1_10_src/pretrained_InceptionV2_ImageNet_CLS2012/inception_v2.ckpt"
  from_detection_checkpoint: false
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  #num_steps: 300000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "/media/nikita/LinuxBD4Tb/SSD_PROJECT/data/mscoco_train.record"
  }
  label_map_path: "/media/nikita/LinuxBD4Tb/SSD_PROJECT/data/mscoco_label_map.pbtxt"
}
eval_config: {
  num_examples: 5000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "/media/nikita/LinuxBD4Tb/SSD_PROJECT/data/mscoco_val.record"
  }
  label_map_path: "/media/nikita/LinuxBD4Tb/SSD_PROJECT/data/mscoco_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
Error log:
nikita@ubuntulinux:~/tf_1_10_src/tensorflow/tensorflow/models/research$ ~/train_object_detection_v1.sh
Object detection script v.1
Check execution path
Current working directory is correct.
PIPELINE_CONFIG_PATH=/media/nikita/LinuxBD4Tb/SSD_PROJECT/ssd_inception_v2_coco_nik.config
MODEL_DIR=/media/nikita/LinuxBD4Tb/SSD_PROJECT/models/model
NUM_TRAIN_STEPS=4000000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
-----------------
Start object_detection/model_main.py
/home/nikita/tf_1_10_src/tensorflow/tensorflow/models/research/object_detection/utils/visualization_utils.py:27: UserWarning:
This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.
The backend was *originally* set to u'TkAgg' by the following code:
File "object_detection/model_main.py", line 26, in <module>
from object_detection import model_lib
File "/home/nikita/tf_1_10_src/tensorflow/tensorflow/models/research/object_detection/model_lib.py", line 27, in <module>
from object_detection import eval_util
File "/home/nikita/tf_1_10_src/tensorflow/tensorflow/models/research/object_detection/eval_util.py", line 27, in <module>
from object_detection.metrics import coco_evaluation
File "/home/nikita/tf_1_10_src/tensorflow/tensorflow/models/research/object_detection/metrics/coco_evaluation.py", line 20, in <module>
from object_detection.metrics import coco_tools
File "/home/nikita/tf_1_10_src/tensorflow/tensorflow/models/research/object_detection/metrics/coco_tools.py", line 47, in <module>
from pycocotools import coco
File "/home/nikita/tf_1_10_src/tensorflow/tensorflow/models/research/pycocotools/coco.py", line 49, in <module>
import matplotlib.pyplot as plt
File "/usr/local/lib/python2.7/dist-packages/matplotlib/pyplot.py", line 69, in <module>
from matplotlib.backends import pylab_setup
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backends/__init__.py", line 14, in <module>
line for line in traceback.format_stack()
import matplotlib; matplotlib.use('Agg') # pylint: disable=multiple-statements
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0924 15:58:51.491380 140112981780224 tf_logging.py:125] Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0924 15:58:51.491605 140112981780224 tf_logging.py:125] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7f6e49985c80>) includes params argument, but params are not passed to Estimator.
W0924 15:58:51.491858 140112981780224 tf_logging.py:125] Estimator's model_fn (<function model_fn at 0x7f6e49985c80>) includes params argument, but params are not passed to Estimator.
1) exporter_name=Servo_0; eval_spec_name=0(type <type 'int'>)
Traceback (most recent call last):
File "object_detection/model_main.py", line 109, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "object_detection/model_main.py", line 102, in main
eval_on_train_data=False)
File "/home/nikita/tf_1_10_src/tensorflow/tensorflow/models/research/object_detection/model_lib.py", line 659, in create_train_and_eval_specs
exporters=exporter))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 237, in __new__
raise TypeError('`name` must be string, given: {}'.format(name))
TypeError: `name` must be string, given: 0
Debugging object_detection/model_lib.py confirms this. File modifications (around line 646):
eval_specs = []  # line 646
cntr = 0  # add counter
for eval_spec_name, eval_input_fn in zip(eval_spec_names, eval_input_fns):
    cntr += 1  # add counter inc
    exporter_name = '{}_{}'.format(final_exporter_name, eval_spec_name)
    print("{}) exporter_name={}; eval_spec_name={}(type {})".format(cntr, exporter_name, eval_spec_name, type(eval_spec_name)))  # add debugging output
    exporter = tf.estimator.FinalExporter(
        name=exporter_name, serving_input_receiver_fn=predict_input_fn)
    eval_specs.append(
        tf.estimator.EvalSpec(
            name=eval_spec_name,
            input_fn=eval_input_fn,
            steps=None,
            exporters=exporter))
Result:
...
1) exporter_name=Servo_0; eval_spec_name=0(type <type 'int'>)
...
Possible solution (IMHO):
Change in object_detection/model_lib.py:
eval_specs.append(
    tf.estimator.EvalSpec(
        name=eval_spec_name,
        input_fn=eval_input_fn,
        steps=None,
        exporters=exporter))
to:
eval_specs.append(
    tf.estimator.EvalSpec(
        name=str(eval_spec_name),
        input_fn=eval_input_fn,
        steps=None,
        exporters=exporter))
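For reference, the `name` constraint can be reproduced outside the Object Detection code. A minimal sketch (assuming the TF 1.10 tf.estimator.EvalSpec API; eval_input_fn below is just a placeholder and is never actually invoked):

import tensorflow as tf

def eval_input_fn():
    # Placeholder input_fn; EvalSpec only checks that it is callable.
    return tf.data.Dataset.from_tensors(({"x": [0.0]}, [0]))

# An integer name raises the same TypeError as in the log above:
try:
    tf.estimator.EvalSpec(input_fn=eval_input_fn, steps=None, name=0)
except TypeError as e:
    print(e)  # `name` must be string, given: 0

# Casting the name to a string is accepted:
spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, steps=None, name=str(0))
print(spec.name)  # prints: 0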
P.S. I can't understand why object_detection/model_lib.py generates a list of integers at line 644 in the first place:
if eval_spec_names is None:  # line 643
    eval_spec_names = range(len(eval_input_fns))  # creates integers. Line 644
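If the per-input default names are to be kept, a hypothetical alternative to the str() cast above would be to build the default eval_spec_names as strings in the first place, e.g.:

if eval_spec_names is None:  # line 643
    # hypothetical: build string names ('0', '1', ...) instead of integers
    eval_spec_names = [str(i) for i in range(len(eval_input_fns))]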