SSD MobileNet: Cannot finetune from checkpoint #1836
Comments
How did you train the net? Could you tell me? Thanks
I downloaded the pretrained model from http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz and followed the tutorial https://github.com/tensorflow/models/blob/master/object_detection/g3doc/running_locally.md using the Oxford Pets sample config file (I only changed the number of classes and the paths). If I do the same steps with the pretrained Faster R-CNN model, everything works.
BTW, I'm using Python 3.6, so I had to change an
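For reference, the fields typically changed in the Oxford Pets sample config look roughly like the excerpt below. This is a sketch, not the exact config used here; the paths are placeholders, and num_classes: 37 is the standard Oxford-IIIT Pets class count.

```
model {
  ssd {
    num_classes: 37
  }
}
train_config {
  fine_tune_checkpoint: "/path/to/ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
}
train_input_reader {
  tf_record_input_reader {
    input_path: "/path/to/pet_train.record"
  }
  label_map_path: "/path/to/pet_label_map.pbtxt"
}
```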
I also have a similar issue.
python object_detection/train.py --logtostderr --pipeline_config_path=./object_detection/samples/configs/rfcn_resnet101_pets.config --train_dir=object_detection/pet_train
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key FirstStageFeatureExtractor/resnet_v1_101/block3/unit_7/bottleneck_v1/conv1/BatchNorm/gamma/Momentum not found in checkpoint
Caused by op u'save_1/RestoreV2_15', defined at:
NotFoundError (see above for traceback): Key FirstStageFeatureExtractor/resnet_v1_101/block1/unit_1/bottleneck_v1/conv1/BatchNorm/gamma/Momentum not found in checkpoint
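One way to check what the downloaded checkpoint actually contains, and whether the missing Momentum keys were ever saved, is to list its variables. A minimal sketch assuming the TF 1.x API and a placeholder checkpoint path:

```python
import tensorflow as tf  # TF 1.x

ckpt = "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"  # placeholder path

# List every variable stored in the checkpoint and flag optimizer/EMA slots.
for name, shape in tf.train.list_variables(ckpt):
    if "Momentum" in name or "ExponentialMovingAverage" in name:
        print("optimizer/EMA slot:", name, shape)
    else:
        print("model weight:", name, shape)
```

If no */Momentum variables show up, the checkpoint simply does not carry the optimizer slots that the restore is asking for.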
@groakat @is03wlei
@chenyuZha
@is03wlei just remove the old checkpoints in the folder, or indicate a new path of
@chenyuZha
I downloaded the ssd mobilenet tar.gz. My train_dir is empty and I have put the checkpoints in the data folder, but it still gives me the following error:
Hello, I also tested the model ssd_mobilenet_v1_coco_11_06_2017.tar.gz, but it cannot achieve real-time performance, can it?
@chenyuZha can you explain what you mean by initializing the train_dir? I am using the checkpoint files provided in ssd_mobilenet_v1_coco_11_06_2017.tar.gz from the model detection zoo, and my pipeline configuration file correctly points to the location of these checkpoint files, yet it still fails to find all of these feature extractor variables.
Your train script is probably something like:
Do the following:
Your final train script will be:
Yes, it is absolutely bizarre that this solution works.
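The actual commands were lost in this copy, so here is a hedged sketch of what the "fresh train_dir" fix usually looks like; the paths are placeholders:

```bash
# The key point is that --train_dir points at a brand-new, empty directory,
# so the trainer restores from fine_tune_checkpoint instead of a stale
# checkpoint left over from an earlier run with a different model/optimizer.
mkdir -p /tmp/pets_train_fresh
python object_detection/train.py \
  --logtostderr \
  --pipeline_config_path=object_detection/samples/configs/ssd_mobilenet_v1_pets.config \
  --train_dir=/tmp/pets_train_fresh
```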
@chenyuZha I fine-tuned MobileNet for classification and got four files: checkpoint, out_graph-2000.data-00000-of-00001, out_graph-2000.index, out_graph-2000.meta. So which one should I delete?
@gavincmartin That means train_dir should be empty.
@WillLiGitHub The checkpoint file contains the paths of all the ckpts that you have trained (in your case there is only one, because it seems you trained just one ckpt). The other three files are used to export your graph to a .pb, so if you delete any of the three you will not be able to produce the .pb file.
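For reference, exporting the frozen .pb from those three files is typically done with the Object Detection API's export script. A sketch assuming the out_graph-2000 prefix mentioned above and placeholder paths:

```bash
# --trained_checkpoint_prefix takes the common prefix, not an individual file;
# all three out_graph-2000.* files (plus the checkpoint file) must be present.
python object_detection/export_inference_graph.py \
  --input_type image_tensor \
  --pipeline_config_path path/to/pipeline.config \
  --trained_checkpoint_prefix path/to/out_graph-2000 \
  --output_directory path/to/exported_model
```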
@chenyuZha
@chenyuZha removing the train dir does work. But the same error occurs when restoring parameters from the same model: just run it, stop, and run it again without changing anything, and you will see the same error showing up. It is potentially a bug in the implementation.
A similar warning happens in ssd_inception_v2_coco_2017_11_17.
This happens in Faster R-CNN with Inception ResNet V2 as well, and initializing the train log folder does not change anything.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/Conv/BatchNorm/beta/ExponentialMovingAverage] is not available in checkpoint
Also, faster_rcnn_resnet101_coco_2018_01_28 has the same issue...
Thank you Varun. I wonder if reverting commit 93b8168 causes any problem with the current version?
Update: the warnings seem to be indicative of Momentum optimiser parameters, which aren't included in the model zoo checkpoints (I need to verify this; I would be grateful if someone does this too). The checkpoints do contain the weights of the layers (as can be seen by printing or logging the graph operators). Ergo, the warnings are correct and, as @pkulzc suggested, will be deprecated soon (new interfaces expected, I guess).
I am training a few models (Faster R-CNN, SSD, R-FCN) with and without this commit and will only be able to comment on this tomorrow at the earliest. You can follow a similar thread on the commit too: 93b8168
@ccthien Although training has started successfully, the warning still appears. Will that be a problem?
@satendra929 It works without any problem.
@ccthien I tried the 81d7766 commit, but I am getting the same error. Could you please help?
Hi @tarunluthra, these should be warnings only, not errors.
This is the error I'm getting on SSD MobileNet v2, and I have no idea how to fix it. Anyone?
I am facing the same issue; does anyone know how to fix this? #5792
Are you fine-tuning from a checkpoint with a different optimiser? For instance, if you trained 500 steps with Adam and then changed to SGD, you'll have to not load the moving averages in that case.
--
Thank you,
Varun.
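Not the Object Detection API's own restore path, but a generic TF 1.x sketch of what "not loading the moving averages" can look like: restore only the variables that exist in the checkpoint and are not optimizer slots or exponential moving averages. The checkpoint path is a placeholder.

```python
import tensorflow as tf  # TF 1.x

ckpt_path = "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"  # placeholder

# Names actually stored in the checkpoint.
ckpt_vars = {name for name, _ in tf.train.list_variables(ckpt_path)}

# Assumes the model graph has already been built in the default graph.
restore_vars = [
    v for v in tf.global_variables()
    if v.op.name in ckpt_vars
    and "Momentum" not in v.op.name
    and "ExponentialMovingAverage" not in v.op.name
]

saver = tf.train.Saver(var_list=restore_vars)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize everything first
    saver.restore(sess, ckpt_path)               # then overwrite matched weights
```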
Hi Varun, I have not changed the optimiser in the config. I am using the TFOD API with the standard config taken from the model configs.
Could you post all relevant details regarding this error for reproducibility? Further, I don't think this is the same issue as the one being discussed in this thread. Could you open a new issue thread?
--
Thanks, Varun.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
How could one resolve the above error?
Hi,
Hi there,
I have three files in the checkpoint folder of the efficientdet_d1_coco17_tpu-32 model, and now I need to give the path to fine_tune_checkpoint in pipeline.config. Those files are:
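The file list itself is missing from this copy, but for the TF2 model zoo checkpoints fine_tune_checkpoint normally points at the checkpoint prefix rather than any single file. A sketch with a placeholder absolute path; ckpt-0 is the usual prefix shipped in that folder:

```
train_config {
  fine_tune_checkpoint: "/path/to/efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}
```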
Hi,
I am trying to fine-tune SSD MobileNet and I am failing because somehow the variables are not found in the checkpoint even though they are present. Fine-tuning Faster R-CNN on the same dataset works fine, by the way.
...
If I inspect the checkpoint, all these variables seem to exist:
I am using the checkpoint ssd_mobilenet_v1_coco_11_06_2017.tar.gz from the model zoo, TensorFlow 1.2, and the master branch of tensorflow/models.
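One common way to do that inspection (not necessarily the exact commands used here) is TensorFlow's bundled checkpoint inspector:

```bash
# Prints the tensor names, dtypes, and shapes stored in the checkpoint;
# add --all_tensors to dump values as well. The path is a placeholder.
python -m tensorflow.python.tools.inspect_checkpoint \
  --file_name=ssd_mobilenet_v1_coco_11_06_2017/model.ckpt
```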