Fine Tuning failed #53

anguszxd · 2016-06-18T08:10:31Z

Hi, everyone! Thank you for reading my issue.
I'm trying to retrain CRFasRNN using other data instead of VOC. I modified TVG_CRFRNN_new_traintest.prototxt, and it works well when I train the model with all parameters randomly initialized. However, when I try to fine-tune from fcn-8s-pascal.caffemodel, it fails.
Here is some log info:
I0618 10:43:15.192327 567 caffe.cpp:128] Finetuning from ./models/fcn-8s-pascal.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537962613
I0618 10:43:16.506494 567 upgrade_proto.cpp:620] Attempting to upgrade input file specified using deprecated V1LayerParameter: ./models/fcn-8s-pascal.caffemodel
I0618 10:43:17.094655 567 upgrade_proto.cpp:628] Successfully upgraded file specified using deprecated V1LayerParameter
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537962613
I0618 10:43:18.547567 567 upgrade_proto.cpp:620] Attempting to upgrade input file specified using deprecated V1LayerParameter: ./models/fcn-8s-pascal.caffemodel
I0618 10:43:19.128402 567 upgrade_proto.cpp:628] Successfully upgraded file specified using deprecated V1LayerParameter
I0618 10:43:19.280107 567 caffe.cpp:211] Starting Optimization
I0618 10:43:19.280191 567 solver.cpp:293] Solving New_DATA_TRAIN
I0618 10:43:19.280205 567 solver.cpp:294] Learning Rate Policy: fixed
*** Aborted at 1466217832 (unix time) try "date -d @1466217832" if you are using GNU date ***
PC: @ 0x7f614c91f528 caffe::SoftmaxWithLossLayer<>::Backward_cpu()
*** SIGSEGV (@0x250db0ac) received by PID 567 (TID 0x7f614d0ba780) from PID 621654188; stack trace: ***
@ 0x7f614b3a9cb0 (unknown)
@ 0x7f614c91f528 caffe::SoftmaxWithLossLayer<>::Backward_cpu()
@ 0x7f614c991dc9 caffe::Net<>::BackwardFromTo()
@ 0x7f614c991ea1 caffe::Net<>::Backward()
@ 0x7f614c8a89e1 caffe::Solver<>::Step()
@ 0x7f614c8a9225 caffe::Solver<>::Solve()
@ 0x408edb train()
@ 0x4068d1 main
@ 0x7f614b394f45 (unknown)
@ 0x406fbd (unknown)
@ 0x0 (unknown)
Does anyone know how to solve this problem? Thanks a lot!


bittnt · 2016-06-19T09:44:55Z

Which fcn-8s model are you using? If you are using the fcn-8s-pascal.caffemodel from Evan's earlier release, it should work.

From the error message, the problem seems to be due to the softmax loss layer.

anguszxd · 2016-06-21T02:07:25Z

Hi @bittnt! Thank you for your reply.
Although I'm still confused, I have fixed this problem.
I have 8 classes in the training data, so I set the labels to 0-7. That works well when I train without any pre-trained model, but it fails when I fine-tune from fcn-8s-pascal.caffemodel. I checked the prepared data, changed the labels to 1-8, and left label 0 as a default class. Now fine-tuning works.
As a newcomer to Caffe, I have to say it is not easy to handle. I will pay more attention to every detail.
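
For anyone hitting a similar crash in the softmax loss layer, one thing worth ruling out is a label value that falls outside the range of the network's output channels. The thread does not confirm that this was the cause here, so the following is only a minimal sanity-check sketch. It assumes the label maps can be loaded as NumPy integer arrays; `check_labels`, `num_classes`, and `ignore_label` are hypothetical names used for illustration, not part of Caffe or CRFasRNN.

```python
import numpy as np


def check_labels(label_map, num_classes, ignore_label=None):
    """Return the sorted label values that fall outside [0, num_classes).

    A value equal to ignore_label (if given) is not reported.
    """
    values = np.unique(np.asarray(label_map))
    bad = []
    for v in values:
        v = int(v)
        in_range = 0 <= v < num_classes
        ignored = ignore_label is not None and v == ignore_label
        if not in_range and not ignored:
            bad.append(v)
    return bad


# Hypothetical label map: 0 is the default class, 1-8 are object classes,
# so the network is assumed to have 9 output channels.
labels = np.array([[0, 1, 8],
                   [3, 255, 9]])
print(check_labels(labels, num_classes=9, ignore_label=255))
```

Running a check like this over every training label image before starting the solver makes an out-of-range value show up as a readable list instead of a segfault.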

xiewei198908 · 2016-11-23T05:52:43Z

@anguszxd Hi. When I set the labels to 1-7 with background as 0, I also ran into a problem:

I1123 13:36:52.870985 2435 caffe.cpp:128] Finetuning from examples/segmentationcrfasrnn/TVG_CRFRNN_COCO_VOC.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537968303
I1123 13:36:59.896049 2435 upgrade_proto.cpp:620] Attempting to upgrade input file specified using deprecated V1LayerParameter: examples/segmentationcrfasrnn/TVG_CRFRNN_COCO_VOC.caffemodel
I1123 13:37:00.105247 2435 upgrade_proto.cpp:628] Successfully upgraded file specified using deprecated V1LayerParameter
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537968303
I1123 13:37:00.702107 2435 upgrade_proto.cpp:620] Attempting to upgrade input file specified using deprecated V1LayerParameter: examples/segmentationcrfasrnn/TVG_CRFRNN_COCO_VOC.caffemodel
I1123 13:37:00.891943 2435 upgrade_proto.cpp:628] Successfully upgraded file specified using deprecated V1LayerParameter
I1123 13:37:00.962447 2435 caffe.cpp:211] Starting Optimization
I1123 13:37:00.962471 2435 solver.cpp:293] Solving CRFRNN-VOC
I1123 13:37:00.962473 2435 solver.cpp:294] Learning Rate Policy: fixed
I1123 13:37:00.964704 2435 solver.cpp:346] Iteration 0, Testing net (#0)
I1123 13:40:36.198457 2435 solver.cpp:414] Test net output #0: accuracy = 0.942565
I1123 13:40:41.618705 2435 solver.cpp:242] Iteration 0, loss = 55547.3
I1123 13:40:41.618749 2435 solver.cpp:258] Train net output #0: accuracy = 0.970304
I1123 13:40:41.618757 2435 solver.cpp:258] Train net output #1: loss = 55547.3 (* 1 = 55547.3 loss)
I1123 13:40:41.618779 2435 solver.cpp:571] Iteration 0, lr = 1e-13
F1123 13:40:41.627219 2435 syncedmem.cpp:58] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7f07bd4dfdaa (unknown)
@ 0x7f07bd4dfce4 (unknown)
@ 0x7f07bd4df6e6 (unknown)
@ 0x7f07bd4e2687 (unknown)
@ 0x7f07bdbd0f71 caffe::SyncedMemory::to_gpu()
@ 0x7f07bdbd02f9 caffe::SyncedMemory::mutable_gpu_data()
@ 0x7f07bdafdd32 caffe::Blob<>::mutable_gpu_data()
@ 0x7f07bdbd820e caffe::SGDSolver<>::ComputeUpdateValue()
@ 0x7f07bdbdc453 caffe::SGDSolver<>::ApplyUpdate()
@ 0x7f07bdbe72be caffe::Solver<>::Step()
@ 0x7f07bdbe79e5 caffe::Solver<>::Solve()
@ 0x408f0b train()
@ 0x406901 main
@ 0x7f07bc9dff45 (unknown)
@ 0x406fed (unknown)
@ (nil) (unknown)
./examples/segmentationcrfasrnn/TVG_CRFRNN.sh: line 7: 2435 Aborted
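
Note that this trace ends in a CUDA "out of memory" check inside the SGD update, which is a different failure from the segfault at the top of the thread. A rough back-of-envelope sketch of the parameter memory involved, assuming the "total number of bytes read" in the log is mostly float32 weights and that training keeps a gradient buffer and a momentum history buffer of the same size for every parameter. The factor of 3 and the function name are assumptions for illustration; activations and CRF buffers are not counted, so the real footprint is larger.

```python
def estimate_param_memory_mib(weight_bytes, copies=3):
    """Rough parameter memory in MiB.

    copies=3 assumes weights + gradients + momentum history,
    each the same size as the stored weights.
    """
    return weight_bytes * copies / (1024 ** 2)


# "The total number of bytes read" reported in the log above.
caffemodel_bytes = 537968303
print(f"{estimate_param_memory_mib(caffemodel_bytes):.0f} MiB for parameters alone")
```

If the estimate plus activation memory is close to the card's capacity, running out of memory on the first update step would not be surprising.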

anguszxd closed this as completed Jun 21, 2016
