Got high loss after restoring parameters from ckpt #2952

Closed
xiaoxTM opened this issue Dec 4, 2017 · 13 comments

@xiaoxTM

xiaoxTM commented Dec 4, 2017

Please go to Stack Overflow for help and support:

http://stackoverflow.com/questions/tagged/tensorflow

Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy:

  1. It must be a bug or a feature request.
  2. The form below must be filled out.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.


System information

  • What is the top-level directory of the model you are using: research
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): custom code for a segmentation task
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 14.04 (64bit)
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.20
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version: CUDA-8 CUDNN-6
  • GPU model and memory: Tesla-P100
  • Exact command to reproduce:
TRAIN_DIR=object_detection/ckpt/ssd_inception
PIPELINE_CONFIG_PATH=object_detection/samples/configs/ssd_inception_v2_coco.config

python3 object_detection/train.py \
            --logtostderr \
            --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
            --train_dir=${TRAIN_DIR}

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the problem

Hi everyone,
I have run into an issue when training (fine-tuning) a TensorFlow Object Detection API model.
I restored parameters from the fine-tuning checkpoint (new checkpoints were saved to another directory, TRAIN_DIR), and after fine-tuning with a large learning rate the loss decreased as I expected. Once it got fairly low (around 2.0), it was time to switch to a smaller learning rate, so I moved the saved checkpoint files from TRAIN_DIR back into the fine-tuning checkpoint directory and removed everything under TRAIN_DIR. When I re-ran the program I expected the loss to still be low (about 2.0), but it came back at 300+. I tried several times and got the same result every time. My guess was that the program was not actually using the saved parameters (or was not restoring them from TRAIN_DIR), but when I printed variables_to_restore it listed all the variable names as expected. Does anybody have any idea, or has anyone run into the same problem?

Best Regards

Source code / logs
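A minimal sketch of how to check what the latest checkpoint in TRAIN_DIR actually contains (TRAIN_DIR matches the command above; adjust the path to wherever the files live now):

# Sketch: inspect a TF 1.x checkpoint and list the variables stored in it.
import tensorflow as tf

ckpt_path = tf.train.latest_checkpoint("object_detection/ckpt/ssd_inception")
print("latest checkpoint:", ckpt_path)

reader = tf.train.NewCheckpointReader(ckpt_path)
var_to_shape = reader.get_variable_to_shape_map()

# Print a few of the variable names and shapes stored in the checkpoint.
for name in sorted(var_to_shape)[:10]:
    print(name, var_to_shape[name])

One thing worth double-checking after moving checkpoint files by hand: tf.train.latest_checkpoint resolves the path through the "checkpoint" index file in that directory, which still records the old file names unless it is updated, and the fine_tune_checkpoint path in the pipeline config also has to point at the files in their new location.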

@idorozenberg

This also happened to me. Please help with this issue; it makes it impossible to work with the SSD meta-model if I can't use transfer learning properly.

@idorozenberg

@drpngx @derekjchow, are you looking into this?

Thank you.

@sarah-zhu

I also had the same issue when training an SSD model.

@drpngx drpngx added the stat:awaiting model gardener Waiting on input from TensorFlow model gardener label Jan 13, 2018
@drpngx
Contributor

drpngx commented Jan 13, 2018

@derekjchow knows best. What optimizer are you using?

@idorozenberg

I tried Adam and regular momentum.

@sarah-zhu

It seems the variables are restored from the pre-trained model, but the program is not using the restored values.
Please help!
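One way to check that (a minimal sketch; the checkpoint path and variable name below are hypothetical examples) is to compare a value read straight from the checkpoint with the value the session holds after the restore:

# Sketch: verify that a restore actually overwrites the in-graph value.
import numpy as np
import tensorflow as tf

ckpt_path = "object_detection/ckpt/ssd_inception/model.ckpt-100000"  # hypothetical
var_name = "FeatureExtractor/InceptionV2/Conv2d_1a_7x7/weights"      # hypothetical

reader = tf.train.NewCheckpointReader(ckpt_path)
ckpt_value = reader.get_tensor(var_name)

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(ckpt_path + ".meta")
    saver.restore(sess, ckpt_path)
    graph_value = sess.run(sess.graph.get_tensor_by_name(var_name + ":0"))
    # If the restored value is really being used, this difference should be ~0.
    print("max abs diff:", np.abs(ckpt_value - graph_value).max())

If the difference is essentially zero, the restore itself works and the problem is more likely in how the training graph is built or in which checkpoint gets picked up.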

@tensorflowbutler tensorflowbutler removed the stat:awaiting model gardener Waiting on input from TensorFlow model gardener label Apr 6, 2018
@dextroza

dextroza commented Apr 16, 2018

Also happened to me. I took the pretrained ssd_mobilenet_v2_coco model and tried to continue training on the COCO dataset. The first few loss values are 300+; I expected less than 10. I did not change anything in ssd_mobilenet_v2_coco_2018_03_29.config.

@JulienSiems

Same here with a Mask RCNN inception V2 model pretrained on a custom data set.

@dcarnino

+1 with Faster RCNN inception resnet v2 atrous coco on a custom data set.

@tT0NG

tT0NG commented Jun 14, 2018

+1 with SSD inception v2 coco on a custom data set (a quick checkpoint sanity check is sketched after this list):

  1. Load the SSD_inception_v2_coco model and train on the custom data up to 100k iterations. The loss decreases from 15 to around 1.2.
  2. Save the trained model to 100k.ckpt.
  3. Simply restore from 100k.ckpt directly, without changing any data, and resume training.
  4. The loss does not resume from the previous value (1.2); it jumps back up to roughly the value of the initial iteration (15).
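A quick sanity check for step 3 (a sketch; 100k.ckpt stands in for the actual checkpoint prefix, e.g. model.ckpt-100000):

# Sketch: confirm the checkpoint being resumed from also carries the step counter
# and the optimizer's slot variables. If only the model weights get restored
# (for example when the checkpoint is fed in through fine_tune_checkpoint in the
# pipeline config), the step counter and optimizer state start over and the
# reported loss can jump back up.
import tensorflow as tf

reader = tf.train.NewCheckpointReader("100k.ckpt")  # use the real checkpoint prefix
names = reader.get_variable_to_shape_map()

print("global_step present:", "global_step" in names)
print("optimizer slot variables:",
      [n for n in names if "Momentum" in n or "Adam" in n or "RMSProp" in n][:5])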

@TitusTom

Same as dcarnino; this has happened on multiple occasions.
+2 with Faster RCNN inception resnet v2 on a custom dataset.

@huangbiubiu

huangbiubiu commented Jul 15, 2018

+1 with SphereFace.
I use the Dataset API and the Adam optimizer with default parameters.
I was not using the models in this repo; I wrote the code myself.

Update:

There is a possible solution for people who have this problem:

Please make sure your labels are the same on every run, especially if you use array indices as labels. This typically goes wrong when set() or os.listdir() is used to build the class collection (see the sketch below).
This question on Stack Overflow might fix your problem.
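A small sketch of the kind of label mapping this refers to (the directory layout is hypothetical, one sub-directory per class):

# Sketch: class indices built from an unordered collection can differ between runs,
# so weights restored from the previous run no longer line up with the labels.
import os

data_dir = "data/train"  # hypothetical layout: one sub-directory per class

# Fragile: set() iteration order (and raw os.listdir() order) can change between runs.
classes_bad = {name: i for i, name in enumerate(set(os.listdir(data_dir)))}

# Stable: sort the names so the class -> index mapping is identical on every run.
classes_good = {name: i for i, name in enumerate(sorted(os.listdir(data_dir)))}
print(classes_good)

Pinning the ordering (or writing the class-to-index map to disk once and reloading it) keeps the labels consistent between the run that produced the checkpoint and the run that restores it.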

@wt-huang

wt-huang commented Nov 3, 2018

Closing as this is resolved

@wt-huang wt-huang closed this as completed Nov 3, 2018