object detection: transfer learning have errors #2672

@scotthuang1989

System information

  • What is the top-level directory of the model you are using: models/research
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): tf_nightly_gpu-1.5.0.dev20171031-cp35-cp35m-manylinux1_x86_64
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version: CUDA 8.0
  • GPU model and memory: GTX 1060, 6 GB
  • Exact command to reproduce:

python object_detection/train.py --logtostderr --pipeline_config_path=/home/scott/github/models/research/object_detection/samples/configs/ssd_mobilenet_v1_pets.config --train_dir=./ssd_mobile_v1_pets_retrain

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

Describe the problem

When I run the command above, I get the following error (most of the traceback is omitted for brevity):

  File "/home/scott/github/models/research/object_detection/utils/variables_helper.py", line 122, in get_variables_available_in_checkpoint
    ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
  File "/home/scott/anaconda3/envs/tfgpu/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 195, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern), status)
  File "/home/scott/anaconda3/envs/tfgpu/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/scott/github/models/research/object_detection/ssd_mobilenet_v1_coco_11_06_2017/model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
(tfgpu) scott@scott-b250:~/github/models/research$

From the log, it seems the checkpoint file is not in the expected format, but I downloaded it from the model zoo. I also tried another model (faster_rcnn_resnet101_coco_11_06_2017) and got the same error.
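
One possible cause, which this report does not confirm, is that the checkpoint path in the pipeline config names the shard file (model.ckpt.data-00000-of-00001) rather than the checkpoint prefix (model.ckpt). As far as I understand, tf.train.NewCheckpointReader expects the prefix. The sketch below shows one way to recover the prefix from a shard, index, or meta file name. The helper name checkpoint_prefix is hypothetical and not part of TensorFlow or the object detection API.

```python
import re

# Suffixes that appear on the files of a checkpoint saved under a common prefix:
#   <prefix>.data-00000-of-00001, <prefix>.index, <prefix>.meta
_CHECKPOINT_FILE_SUFFIX = re.compile(r"\.(data-\d{5}-of-\d{5}|index|meta)$")


def checkpoint_prefix(path):
    """Hypothetical helper: strip a shard/index/meta suffix, if present.

    Paths that are already a bare prefix are returned unchanged.
    """
    return _CHECKPOINT_FILE_SUFFIX.sub("", path)


if __name__ == "__main__":
    shard = (
        "/home/scott/github/models/research/object_detection/"
        "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt.data-00000-of-00001"
    )
    # Prints the same path, ending in "model.ckpt" instead of the shard name.
    print(checkpoint_prefix(shard))
```

If this is the cause, pointing the config at the path ending in model.ckpt should let the reader find the .index and .data files itself. This is an assumption to check against the config actually used, not a verified fix.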
