question in Backward code #11

Closed
flymark2010 opened this issue Jun 15, 2017 · 4 comments

Comments

@flymark2010

Hi, thanks for your great work. I have some doubt about the Backward code:

1. template <typename Dtype>
2. void CConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
3.     const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
4.   const Dtype* weightTmp = this->weight_tmp_.cpu_data();
5.   const Dtype* weightMask = this->blobs_[2]->cpu_data();
6.   Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
7.   for (int i = 0; i < top.size(); ++i) {
8.     const Dtype* top_diff = top[i]->cpu_diff();
9.     // Bias gradient, if necessary.
10.     if (this->bias_term_ && this->param_propagate_down_[1]) {
11.       const Dtype* biasMask = this->blobs_[3]->cpu_data();
12.       Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
13.       for (unsigned int k = 0; k < this->blobs_[1]->count(); ++k) {
14.         bias_diff[k] = bias_diff[k]*biasMask[k];
15.       }
16.       for (int n = 0; n < this->num_; ++n) {
17.         this->backward_cpu_bias(bias_diff, top_diff + top[i]->offset(n));
18.       }
19.     }
20.     if (this->param_propagate_down_[0] || propagate_down[i]) {
21.       const Dtype* bottom_data = bottom[i]->cpu_data();
22.       Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
23.       for (unsigned int k = 0; k < this->blobs_[0]->count(); ++k) {
24.         weight_diff[k] = weight_diff[k]*weightMask[k];
25.       }
26.       for (int n = 0; n < this->num_; ++n) {
27.         // gradient w.r.t. weight. Note that we will accumulate diffs.
28.         if (this->param_propagate_down_[0]) {
29.           this->weight_cpu_gemm(bottom_data + bottom[i]->offset(n),
30.               top_diff + top[i]->offset(n), weight_diff);
31.         }
32.         // gradient w.r.t. bottom data, if necessary.
33.         if (propagate_down[i]) {
34.           this->backward_cpu_gemm(top_diff + top[i]->offset(n), weightTmp,
35.               bottom_diff + bottom[i]->offset(n));
36.         }
37.       }
38.     }
39.   }
40. }

To my understanding of Caffe, the diff of the weight blob is always set to 0 before each iteration. That is to say, weight_diff[k] and bias_diff[k] are always 0 before backward_cpu_bias and weight_cpu_gemm are called, so the operations on line 14 and line 24 look redundant. What did you actually intend to do here? Should it be weightTmp instead of weight_diff on line 24?
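
For concreteness, a minimal standalone sketch (not the layer code; the zero-initialized diff mirrors the diff clearing the solver does before each backward pass) of why that multiplication can never change anything:

#include <cstdio>
#include <vector>

int main() {
  // Param diffs are cleared before backward, so they start at 0.
  std::vector<float> weight_diff(4, 0.0f);
  std::vector<float> weight_mask = {1.f, 0.f, 1.f, 0.f};  // hypothetical pruning mask

  // The masking on line 24: multiplying a zero buffer by the mask is a no-op.
  for (size_t k = 0; k < weight_diff.size(); ++k)
    weight_diff[k] *= weight_mask[k];

  // Gradients are only accumulated afterwards (by weight_cpu_gemm in the layer),
  // so the earlier masking never affects the accumulated diff.
  for (size_t k = 0; k < weight_diff.size(); ++k)
    weight_diff[k] += 0.5f;  // stand-in for an accumulated gradient

  for (float d : weight_diff)
    std::printf("%.2f ", d);  // prints 0.50 for every entry, masked or not
  std::printf("\n");
  return 0;
}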

Thanks very much!

@yiwenguo
Owner

You are right, @flymark2010. That might be some testing code that I forgot to comment out. It would be better to comment those lines out for higher efficiency.

@flymark2010
Author

@yiwenguo Ok. Thanks !

@kai-xie

kai-xie commented Jul 20, 2017

@yiwenguo I also have a question about this part. According to your paper, weight_diff[k] and bias_diff[k] are supposed to be updated according to weightMask[k] and biasMask[k]. So would it be right to move lines 13-15 to after line 18, and lines 23-25 to after line 31 (roughly as sketched below)? Or should lines 13-15 and 23-25 simply be removed?
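A rough sketch of that first option for the weight branch (my own reordering, reusing the names from the code above, not tested):

if (this->param_propagate_down_[0] || propagate_down[i]) {
  const Dtype* bottom_data = bottom[i]->cpu_data();
  Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
  for (int n = 0; n < this->num_; ++n) {
    // gradient w.r.t. weight, accumulated first
    if (this->param_propagate_down_[0]) {
      this->weight_cpu_gemm(bottom_data + bottom[i]->offset(n),
          top_diff + top[i]->offset(n), weight_diff);
    }
    // gradient w.r.t. bottom data, if necessary
    if (propagate_down[i]) {
      this->backward_cpu_gemm(top_diff + top[i]->offset(n), weightTmp,
          bottom_diff + bottom[i]->offset(n));
    }
  }
  // mask applied after accumulation (lines 23-25 moved here), so that
  // masked-out weights would receive a zero update
  for (unsigned int k = 0; k < this->blobs_[0]->count(); ++k) {
    weight_diff[k] *= weightMask[k];
  }
}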
Thank you very much!

@dongxiao92

@kai-xie I think if we used the masked diffs (i.e., moved the code as you suggested), the weights and biases that are masked out would never come back to life. So errors are propagated to update those masked parameters as well, to see whether they can be revived, even though that is not strictly correct mathematically.
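
To illustrate the idea with a toy standalone example (my own sketch with made-up numbers, not the actual layer code): the forward pass uses the masked weights, but the update touches every weight, so a pruned entry can still move and possibly be spliced back in later:

#include <vector>

int main() {
  std::vector<float> weight = {0.8f, -0.01f, 0.5f};
  std::vector<float> mask   = {1.f,   0.f,   1.f};   // second weight currently pruned
  std::vector<float> grad   = {0.1f, -0.3f,  0.05f}; // hypothetical gradients
  const float lr = 0.1f;

  // Forward uses weight_tmp[k] = weight[k] * mask[k], so the pruned weight is inactive.
  std::vector<float> weight_tmp(weight.size());
  for (size_t k = 0; k < weight.size(); ++k)
    weight_tmp[k] = weight[k] * mask[k];

  // The update is applied to every entry, masked or not:
  for (size_t k = 0; k < weight.size(); ++k)
    weight[k] -= lr * grad[k];

  // weight[1] moved from -0.01 to 0.02; if its magnitude later crosses the
  // splicing threshold, the mask can be flipped back to 1 and the weight revives.
  return 0;
}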
