question about gradient calculation with respect to weight #2

Closed
jay2002 opened this issue Jul 21, 2017 · 4 comments

Comments

@jay2002

jay2002 commented Jul 21, 2017

In Margin Inner Product, the gradient with respect to weight is very simple:

  // Gradient with respect to weight
  if (this->param_propagate_down_[0]) {
    caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, N_, K_, M_, (Dtype)1.,
        top_diff, bottom_data, (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
  }
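
If I read the GEMM correctly (my notation, not taken from the repository): with top_diff of shape M_ × N_ and bottom_data of shape M_ × K_, it just accumulates the standard inner-product gradient

  \frac{\partial L}{\partial W} \mathrel{+}= \Big(\frac{\partial L}{\partial f}\Big)^{\!\top} X,
  \qquad
  \frac{\partial L}{\partial W_{nk}} \mathrel{+}= \sum_{m=1}^{M} \frac{\partial L}{\partial f_{mn}}\, x_{mk},

where f = X W^T is the layer output and the sum runs over the M_ samples in the batch.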

But in the large margin softmax layer, the gradient calculation is much more complex...

Can you please tell me how to simplify the gradient calculation?
I failed to derive it...

@YYuanAnyVision

Same question here. Why is the update of the weight the same for m = 1, 2, 3, 4?

@YYuanAnyVision

Also, since you normalize the weight by overwriting it in place in Forward(), instead of keeping the original weight, as here:

  Dtype* norm_weight = this->blobs_[0]->mutable_cpu_data();
  Dtype temp_norm = (Dtype)0.;
  for (int i = 0; i < N_; i++) {
    temp_norm = caffe_cpu_dot(K_, norm_weight + i * K_, norm_weight + i * K_);
    temp_norm = (Dtype)1. / sqrt(temp_norm);
    caffe_scal(K_, temp_norm, norm_weight + i * K_);
  }

So every time after the parameter update (weight = weight + grad_w), the weight is simply clipped back to the normalized version in the next forward pass?
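
To make that concrete (my own notation, not from the repository), writing \Delta w_i for the solver's update to weight row w_i, the effective update would be

  w_i \;\leftarrow\; \frac{w_i + \Delta w_i}{\lVert w_i + \Delta w_i \rVert_2},

i.e. a step followed by a projection back onto the unit sphere at the next Forward() call.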

@tornadomeet

+1, the backward pass does not seem to correspond to the forward pass.

@wy1iu
Owner

wy1iu commented Jul 31, 2017

It is actually a normalized version of the gradient, which helps the optimization converge more stably. The direction is the same as before; what we do here is simply rescale the gradient (the learning rate can help us decide the scale). A similar idea and intuition also appear in https://arxiv.org/pdf/1707.04822.pdf.

However, if you use the original gradient to do the backprop, you can still make it work and obtain similar results, but it may not be as stable as the normalized one.
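
As a rough sketch of what such a per-row rescaling could look like (not the exact code in this repository; it just reuses the caffe_cpu_dot / caffe_scal helpers quoted above):

  // Sketch only: rescale each row of the accumulated N_ x K_ weight gradient
  // to unit L2 norm, keeping its direction; the learning rate sets the step size.
  Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
  for (int i = 0; i < N_; i++) {
    Dtype sq_norm = caffe_cpu_dot(K_, weight_diff + i * K_, weight_diff + i * K_);
    if (sq_norm > (Dtype)0.) {
      caffe_scal(K_, (Dtype)1. / sqrt(sq_norm), weight_diff + i * K_);
    }
  }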
