Got high loss after restoring parameters from ckpt #2952

Closed
xiaoxTM opened this issue Dec 4, 2017 · 13 comments

@xiaoxTM

xiaoxTM commented Dec 4, 2017

Please go to Stack Overflow for help and support:

http://stackoverflow.com/questions/tagged/tensorflow

Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy:

  1. It must be a bug or a feature request.
  2. The form below must be filled out.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.


System information

  • What is the top-level directory of the model you are using: research
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): custom code for a segmentation task
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 14.04 (64bit)
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.20
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version: CUDA-8 CUDNN-6
  • GPU model and memory: Tesla-P100
  • Exact command to reproduce:
TRAIN_DIR=object_detection/ckpt/ssd_inception
PIPELINE_CONFIG_PATH=object_detection/samples/configs/ssd_inception_v2_coco.config

python3 object_detection/train.py \
            --logtostderr \
            --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
            --train_dir=${TRAIN_DIR}

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the problem

Hi everyone,
I have run into an issue when training (fine-tuning) a TensorFlow Object Detection API model.
I restored parameters from the fine-tuning checkpoint (new checkpoints were saved to another directory, TRAIN_DIR), and after fine-tuning with a large learning rate the loss decreased as I expected. Once it got fairly low (around 2.0), it was time to switch to a smaller learning rate, so I moved the saved checkpoint files from TRAIN_DIR back into the fine-tuning checkpoint directory and removed everything under TRAIN_DIR. When I re-ran the program I expected the loss to still be low (about 2.0), but it came back at 300+. I tried several times and got the same result every time. My guess was that the program was not actually using the saved parameters (or was not restoring them from TRAIN_DIR), but when I printed variables_to_restore it listed all the variable names as expected. Does anybody have any idea, or has anyone run into the same problem?

Best Regards

Source code / logs
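A minimal sketch of how to check what the latest checkpoint in TRAIN_DIR actually contains (TRAIN_DIR matches the command above; adjust the path to wherever the files live now):

# Sketch: inspect a TF 1.x checkpoint and list the variables stored in it.
import tensorflow as tf

ckpt_path = tf.train.latest_checkpoint("object_detection/ckpt/ssd_inception")
print("latest checkpoint:", ckpt_path)

reader = tf.train.NewCheckpointReader(ckpt_path)
var_to_shape = reader.get_variable_to_shape_map()

# Print a few of the variable names and shapes stored in the checkpoint.
for name in sorted(var_to_shape)[:10]:
    print(name, var_to_shape[name])

One thing worth double-checking after moving checkpoint files by hand: tf.train.latest_checkpoint resolves the path through the "checkpoint" index file in that directory, which still records the old file names unless it is updated, and the fine_tune_checkpoint path in the pipeline config also has to point at the files in their new location.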

@idorozenberg

This also happened to me. Please help with this issue; it makes it impossible to work with the SSD meta-model if I can't use transfer learning properly.

@idorozenberg

@drpngx @derekjchow, are you looking into this?

Thank you.

@sarah-zhu

I also had the same issue when training an SSD model.

@drpngx drpngx added the stat:awaiting model gardener Waiting on input from TensorFlow model gardener label Jan 13, 2018
@drpngx
Contributor

drpngx commented Jan 13, 2018

@derekjchow knows best. What optimizer are you using?

@idorozenberg

I tried Adam and regular momentum.

@sarah-zhu

It seems the variables are restored from the pre-trained model, but the program is not using the restored values.
Please help!
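One way to check that (a minimal sketch; the checkpoint path and variable name below are hypothetical examples) is to compare a value read straight from the checkpoint with the value the session holds after the restore:

# Sketch: verify that a restore actually overwrites the in-graph value.
import numpy as np
import tensorflow as tf

ckpt_path = "object_detection/ckpt/ssd_inception/model.ckpt-100000"  # hypothetical
var_name = "FeatureExtractor/InceptionV2/Conv2d_1a_7x7/weights"      # hypothetical

reader = tf.train.NewCheckpointReader(ckpt_path)
ckpt_value = reader.get_tensor(var_name)

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(ckpt_path + ".meta")
    saver.restore(sess, ckpt_path)
    graph_value = sess.run(sess.graph.get_tensor_by_name(var_name + ":0"))
    # If the restored value is really being used, this difference should be ~0.
    print("max abs diff:", np.abs(ckpt_value - graph_value).max())

If the difference is essentially zero, the restore itself works and the problem is more likely in how the training graph is built or in which checkpoint gets picked up.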

@tensorflowbutler tensorflowbutler removed the stat:awaiting model gardener Waiting on input from TensorFlow model gardener label Apr 6, 2018
@dextroza

dextroza commented Apr 16, 2018

Also happened to me. I took the pretrained ssd_mobilenet_v2_coco model and tried to continue training on the COCO dataset. The first few loss values are 300+; I expected less than 10. I did not change anything in ssd_mobilenet_v2_coco_2018_03_29.config.

@JulienSiems

Same here with a Mask RCNN inception V2 model pretrained on a custom data set.

@dcarnino

+1 with Faster RCNN inception resnet v2 atrous coco on a custom data set.

@tT0NG

tT0NG commented Jun 14, 2018

+1 with SSD inception v2 coco on a custom data set (a quick checkpoint sanity check is sketched after this list):

  1. Load the SSD_inception_v2_coco model and train on the custom data up to 100k iterations. The loss decreases from 15 to around 1.2.
  2. Save the trained model to 100k.ckpt.
  3. Simply restore from 100k.ckpt directly, without changing any data, and resume training.
  4. The loss does not resume from the previous value (1.2); it jumps back up to roughly the value of the initial iteration (15).
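A quick sanity check for step 3 (a sketch; 100k.ckpt stands in for the actual checkpoint prefix, e.g. model.ckpt-100000):

# Sketch: confirm the checkpoint being resumed from also carries the step counter
# and the optimizer's slot variables. If only the model weights get restored
# (for example when the checkpoint is fed in through fine_tune_checkpoint in the
# pipeline config), the step counter and optimizer state start over and the
# reported loss can jump back up.
import tensorflow as tf

reader = tf.train.NewCheckpointReader("100k.ckpt")  # use the real checkpoint prefix
names = reader.get_variable_to_shape_map()

print("global_step present:", "global_step" in names)
print("optimizer slot variables:",
      [n for n in names if "Momentum" in n or "Adam" in n or "RMSProp" in n][:5])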

@TitusTom

Same as dcarnino; this has happened on multiple occasions.
+2 with Faster RCNN inception resnet v2 on a custom dataset.

@huangbiubiu

huangbiubiu commented Jul 15, 2018

+1 with SphereFace.
I use the Dataset API and the Adam optimizer with default parameters.
I was not using the models in this repo; I wrote the code myself.

Update:

There is a possible solution for people who have this problem:

Please make sure your labels are the same on every run, especially if you use array indices as labels. This typically goes wrong when set() or os.listdir() is used to build the class collection (see the sketch below).
This question on Stack Overflow might fix your problem.
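A small sketch of the kind of label mapping this refers to (the directory layout is hypothetical, one sub-directory per class):

# Sketch: class indices built from an unordered collection can differ between runs,
# so weights restored from the previous run no longer line up with the labels.
import os

data_dir = "data/train"  # hypothetical layout: one sub-directory per class

# Fragile: set() iteration order (and raw os.listdir() order) can change between runs.
classes_bad = {name: i for i, name in enumerate(set(os.listdir(data_dir)))}

# Stable: sort the names so the class -> index mapping is identical on every run.
classes_good = {name: i for i, name in enumerate(sorted(os.listdir(data_dir)))}
print(classes_good)

Pinning the ordering (or writing the class-to-index map to disk once and reloading it) keeps the labels consistent between the run that produced the checkpoint and the run that restores it.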

@wt-huang

wt-huang commented Nov 3, 2018

Closing as this is resolved

@wt-huang wt-huang closed this as completed Nov 3, 2018