Fine Tuning failed #53

Closed
anguszxd opened this issue Jun 18, 2016 · 3 comments

Comments

@anguszxd

Hi, everyone! Thank you for reading my issue.
I'm trying to retrain CRFasRNN using other data instead of VOC. I modified TVG_CRFRNN_new_traintest.prototxt, and it works well when I train the model with all parameters randomly initialized. However, when I try to fine-tune from fcn-8s-pascal.caffemodel, it fails.
Here is some log info:
I0618 10:43:15.192327 567 caffe.cpp:128] Finetuning from ./models/fcn-8s-pascal.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537962613
I0618 10:43:16.506494 567 upgrade_proto.cpp:620] Attempting to upgrade input file specified using deprecated V1LayerParameter: ./models/fcn-8s-pascal.caffemodel
I0618 10:43:17.094655 567 upgrade_proto.cpp:628] Successfully upgraded file specified using deprecated V1LayerParameter
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537962613
I0618 10:43:18.547567 567 upgrade_proto.cpp:620] Attempting to upgrade input file specified using deprecated V1LayerParameter: ./models/fcn-8s-pascal.caffemodel
I0618 10:43:19.128402 567 upgrade_proto.cpp:628] Successfully upgraded file specified using deprecated V1LayerParameter
I0618 10:43:19.280107 567 caffe.cpp:211] Starting Optimization
I0618 10:43:19.280191 567 solver.cpp:293] Solving New_DATA_TRAIN
I0618 10:43:19.280205 567 solver.cpp:294] Learning Rate Policy: fixed
*** Aborted at 1466217832 (unix time) try "date -d @1466217832" if you are using GNU date ***
PC: @ 0x7f614c91f528 caffe::SoftmaxWithLossLayer<>::Backward_cpu()
*** SIGSEGV (@0x250db0ac) received by PID 567 (TID 0x7f614d0ba780) from PID 621654188; stack trace: ***
@ 0x7f614b3a9cb0 (unknown)
@ 0x7f614c91f528 caffe::SoftmaxWithLossLayer<>::Backward_cpu()
@ 0x7f614c991dc9 caffe::Net<>::BackwardFromTo()
@ 0x7f614c991ea1 caffe::Net<>::Backward()
@ 0x7f614c8a89e1 caffe::Solver<>::Step()
@ 0x7f614c8a9225 caffe::Solver<>::Solve()
@ 0x408edb train()
@ 0x4068d1 main
@ 0x7f614b394f45 (unknown)
@ 0x406fbd (unknown)
@ 0x0 (unknown)

Does anyone know how to solve this problem? Thanks a lot!

@bittnt
Collaborator

bittnt commented Jun 19, 2016

Which fcn-8s model are you using? If you are using the fcn-8s-pascal.caffemodel from Evan's earlier release, it should work.

From the error message, the problem seems to be due to the softmax loss layer.

@anguszxd
Author

Hi @bittnt! Thank you for your reply.
Although I'm still confused, I have fixed this problem.
I have 8 classes in the training data, so I set the labels to 0-7. That works well when I train without any pre-trained model, but it fails when I fine-tune from fcn-8s-pascal.caffemodel. I checked the prepared data and changed the labels to 1-8, leaving label 0 as a default class. Now fine-tuning works.
As a noob to Caffe, I have to say it is not easy to handle. I will pay closer attention to every detail.
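
The thread does not confirm the root cause, but a crash in SoftmaxWithLossLayer's backward pass is often worth checking against the label values first, since a label outside the range of the final layer's outputs (and not equal to the configured ignore label) would index past the score blob. Below is a minimal sketch of such a sanity check, assuming each label map is available as a nested list of integer class ids. The names `check_labels`, `num_classes`, and `ignore_label` are hypothetical helpers for illustration, not part of Caffe.

```python
from collections import Counter


def check_labels(label_map, num_classes, ignore_label=None):
    """Return a Counter of label values that fall outside [0, num_classes).

    label_map: 2-D list (rows of integer class ids) for one image.
    ignore_label: value the loss layer is configured to skip, if any.
    """
    bad = Counter()
    for row in label_map:
        for value in row:
            # Pixels carrying the ignore label are skipped by the loss, so allow them.
            if ignore_label is not None and value == ignore_label:
                continue
            if not 0 <= value < num_classes:
                bad[value] += 1
    return bad


if __name__ == "__main__":
    # Toy 2x3 label map; 8 is out of range for an 8-way output (valid ids are 0-7).
    toy = [[0, 3, 7], [8, 255, 1]]
    print(check_labels(toy, num_classes=8, ignore_label=255))
```

Running a check like this over every training label image before starting Caffe can rule out out-of-range labels; `num_classes` should match the `num_output` of the last scoring layer in the prototxt.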

@xiewei198908

@anguszxd Hi. When I set the labels to 1-7 and the background to 0, I also ran into a problem:

I1123 13:36:52.870985 2435 caffe.cpp:128] Finetuning from examples/segmentationcrfasrnn/TVG_CRFRNN_COCO_VOC.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537968303
I1123 13:36:59.896049 2435 upgrade_proto.cpp:620] Attempting to upgrade input file specified using deprecated V1LayerParameter: examples/segmentationcrfasrnn/TVG_CRFRNN_COCO_VOC.caffemodel
I1123 13:37:00.105247 2435 upgrade_proto.cpp:628] Successfully upgraded file specified using deprecated V1LayerParameter
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537968303
I1123 13:37:00.702107 2435 upgrade_proto.cpp:620] Attempting to upgrade input file specified using deprecated V1LayerParameter: examples/segmentationcrfasrnn/TVG_CRFRNN_COCO_VOC.caffemodel
I1123 13:37:00.891943 2435 upgrade_proto.cpp:628] Successfully upgraded file specified using deprecated V1LayerParameter
I1123 13:37:00.962447 2435 caffe.cpp:211] Starting Optimization
I1123 13:37:00.962471 2435 solver.cpp:293] Solving CRFRNN-VOC
I1123 13:37:00.962473 2435 solver.cpp:294] Learning Rate Policy: fixed
I1123 13:37:00.964704 2435 solver.cpp:346] Iteration 0, Testing net (#0)
I1123 13:40:36.198457 2435 solver.cpp:414] Test net output #0: accuracy = 0.942565
I1123 13:40:41.618705 2435 solver.cpp:242] Iteration 0, loss = 55547.3
I1123 13:40:41.618749 2435 solver.cpp:258] Train net output #0: accuracy = 0.970304
I1123 13:40:41.618757 2435 solver.cpp:258] Train net output #1: loss = 55547.3 (* 1 = 55547.3 loss)
I1123 13:40:41.618779 2435 solver.cpp:571] Iteration 0, lr = 1e-13
F1123 13:40:41.627219 2435 syncedmem.cpp:58] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7f07bd4dfdaa (unknown)
@ 0x7f07bd4dfce4 (unknown)
@ 0x7f07bd4df6e6 (unknown)
@ 0x7f07bd4e2687 (unknown)
@ 0x7f07bdbd0f71 caffe::SyncedMemory::to_gpu()
@ 0x7f07bdbd02f9 caffe::SyncedMemory::mutable_gpu_data()
@ 0x7f07bdafdd32 caffe::Blob<>::mutable_gpu_data()
@ 0x7f07bdbd820e caffe::SGDSolver<>::ComputeUpdateValue()
@ 0x7f07bdbdc453 caffe::SGDSolver<>::ApplyUpdate()
@ 0x7f07bdbe72be caffe::Solver<>::Step()
@ 0x7f07bdbe79e5 caffe::Solver<>::Solve()
@ 0x408f0b train()
@ 0x406901 main
@ 0x7f07bc9dff45 (unknown)
@ 0x406fed (unknown)
@ (nil) (unknown)
./examples/segmentationcrfasrnn/TVG_CRFRNN.sh: line 7: 2435 Aborted
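
The trace above is a CUDA out-of-memory failure inside SGDSolver::ComputeUpdateValue, raised while a blob is being moved to the GPU at the first weight update, so it looks like a different failure from the SIGSEGV at the top of this thread. As a rough back-of-the-envelope estimate of parameter-related memory, the sketch below multiplies the model size reported in the log by an assumed number of same-sized copies. The copy counts are assumptions about this Caffe version (data and diff per parameter, plus solver buffers), not something the log states, and `param_memory_mib` is a hypothetical helper.

```python
MODEL_BYTES = 537968303  # "total number of bytes read" from the log above


def param_memory_mib(model_bytes, copies_per_param):
    """Estimate memory in MiB if every parameter is stored `copies_per_param` times."""
    return model_bytes * copies_per_param / (1024 ** 2)


if __name__ == "__main__":
    # Assumed copy counts: 1 = weights only, 2 = data + diff,
    # 5 = data + diff + three solver buffers (an assumption, see the lead-in).
    for copies in (1, 2, 5):
        print(f"{copies} copies: {param_memory_mib(MODEL_BYTES, copies):.0f} MiB")
```

This excludes activations, which for full-size images may well dominate. Reducing the input size is a common first thing to try, though nothing in this thread confirms what resolved the error.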
