question in Backward code #11

Closed
flymark2010 opened this issue Jun 15, 2017 · 4 comments

Comments

@flymark2010

Hi, thanks for your great work. I have some doubt about the Backward code:

1. template <typename Dtype>
2. void CConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
3.     const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
4.   const Dtype* weightTmp = this->weight_tmp_.cpu_data();
5.   const Dtype* weightMask = this->blobs_[2]->cpu_data();
6.   Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
7.   for (int i = 0; i < top.size(); ++i) {
8.     const Dtype* top_diff = top[i]->cpu_diff();
9.     // Bias gradient, if necessary.
10.     if (this->bias_term_ && this->param_propagate_down_[1]) {
11.       const Dtype* biasMask = this->blobs_[3]->cpu_data();
12.       Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
13.       for (unsigned int k = 0; k < this->blobs_[1]->count(); ++k) {
14.         bias_diff[k] = bias_diff[k]*biasMask[k];
15.       }
16.       for (int n = 0; n < this->num_; ++n) {
17.         this->backward_cpu_bias(bias_diff, top_diff + top[i]->offset(n));
18.       }
19.     }
20.     if (this->param_propagate_down_[0] || propagate_down[i]) {
21.       const Dtype* bottom_data = bottom[i]->cpu_data();
22.       Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
23.       for (unsigned int k = 0; k < this->blobs_[0]->count(); ++k) {
24.         weight_diff[k] = weight_diff[k]*weightMask[k];
25.       }
26.       for (int n = 0; n < this->num_; ++n) {
27.         // gradient w.r.t. weight. Note that we will accumulate diffs.
28.         if (this->param_propagate_down_[0]) {
29.           this->weight_cpu_gemm(bottom_data + bottom[i]->offset(n),
30.               top_diff + top[i]->offset(n), weight_diff);
31.         }
32.         // gradient w.r.t. bottom data, if necessary.
33.         if (propagate_down[i]) {
34.           this->backward_cpu_gemm(top_diff + top[i]->offset(n), weightTmp,
35.               bottom_diff + bottom[i]->offset(n));
36.         }
37.       }
38.     }
39.   }
40. }

To my understanding of Caffe, the diff of the weight blob is always set to 0 before each iteration. That is to say, weight_diff[k] and bias_diff[k] are always 0 before backward_cpu_bias and weight_cpu_gemm are called, so the operations on line 14 and line 24 look redundant. What did you actually intend to do here? Should it be weightTmp instead of weight_diff on line 24?
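
For concreteness, a minimal standalone sketch (not the layer code; the zero-initialized diff mirrors the diff clearing the solver does before each backward pass) of why that multiplication can never change anything:

#include <cstdio>
#include <vector>

int main() {
  // Param diffs are cleared before backward, so they start at 0.
  std::vector<float> weight_diff(4, 0.0f);
  std::vector<float> weight_mask = {1.f, 0.f, 1.f, 0.f};  // hypothetical pruning mask

  // The masking on line 24: multiplying a zero buffer by the mask is a no-op.
  for (size_t k = 0; k < weight_diff.size(); ++k)
    weight_diff[k] *= weight_mask[k];

  // Gradients are only accumulated afterwards (by weight_cpu_gemm in the layer),
  // so the earlier masking never affects the accumulated diff.
  for (size_t k = 0; k < weight_diff.size(); ++k)
    weight_diff[k] += 0.5f;  // stand-in for an accumulated gradient

  for (float d : weight_diff)
    std::printf("%.2f ", d);  // prints 0.50 for every entry, masked or not
  std::printf("\n");
  return 0;
}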

Thanks very much!

@yiwenguo
Owner

You are right, @flymark2010. That might be some testing code that I forgot to comment out. It would be better to comment those lines out for higher efficiency.

@flymark2010
Author

@yiwenguo Ok. Thanks !

@kai-xie

kai-xie commented Jul 20, 2017

@yiwenguo I also have a question about this part. According to your paper, weight_diff[k] and bias_diff[k] are supposed to be updated according to weightMask[k] and biasMask[k]. So would it be right to move lines 13-15 to after line 18, and lines 23-25 to after line 31 (roughly as sketched below)? Or should lines 13-15 and 23-25 simply be removed?
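A rough sketch of that first option for the weight branch (my own reordering, reusing the names from the code above, not tested):

if (this->param_propagate_down_[0] || propagate_down[i]) {
  const Dtype* bottom_data = bottom[i]->cpu_data();
  Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
  for (int n = 0; n < this->num_; ++n) {
    // gradient w.r.t. weight, accumulated first
    if (this->param_propagate_down_[0]) {
      this->weight_cpu_gemm(bottom_data + bottom[i]->offset(n),
          top_diff + top[i]->offset(n), weight_diff);
    }
    // gradient w.r.t. bottom data, if necessary
    if (propagate_down[i]) {
      this->backward_cpu_gemm(top_diff + top[i]->offset(n), weightTmp,
          bottom_diff + bottom[i]->offset(n));
    }
  }
  // mask applied after accumulation (lines 23-25 moved here), so that
  // masked-out weights would receive a zero update
  for (unsigned int k = 0; k < this->blobs_[0]->count(); ++k) {
    weight_diff[k] *= weightMask[k];
  }
}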
Thank you very much!

@dongxiao92

@kai-xie I think if we used the masked diffs (i.e., moved the code as you suggested), the weights and biases that are masked out would never come back to life. So errors are propagated to update those masked parameters as well, to see whether they can be revived, even though that is not strictly correct mathematically.
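
To illustrate the idea with a toy standalone example (my own sketch with made-up numbers, not the actual layer code): the forward pass uses the masked weights, but the update touches every weight, so a pruned entry can still move and possibly be spliced back in later:

#include <vector>

int main() {
  std::vector<float> weight = {0.8f, -0.01f, 0.5f};
  std::vector<float> mask   = {1.f,   0.f,   1.f};   // second weight currently pruned
  std::vector<float> grad   = {0.1f, -0.3f,  0.05f}; // hypothetical gradients
  const float lr = 0.1f;

  // Forward uses weight_tmp[k] = weight[k] * mask[k], so the pruned weight is inactive.
  std::vector<float> weight_tmp(weight.size());
  for (size_t k = 0; k < weight.size(); ++k)
    weight_tmp[k] = weight[k] * mask[k];

  // The update is applied to every entry, masked or not:
  for (size_t k = 0; k < weight.size(); ++k)
    weight[k] -= lr * grad[k];

  // weight[1] moved from -0.01 to 0.02; if its magnitude later crosses the
  // splicing threshold, the mask can be flipped back to 1 and the weight revives.
  return 0;
}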
