Conversation
I used the log_softmax implementation in TensorFlow here and used the gradient function here. I haven't added test cases yet. Any suggestions for the test cases? Can I duplicate the test cases for softmax and update the expected results accordingly? I directly put the …
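For context, the stable formulation referenced above shifts the logits by their per-row max before exponentiating: logSoftmax(x) = (x - max(x)) - log(sum(exp(x - max(x)))). A minimal TypeScript sketch of that formulation (the helper name and the fixed last-axis reduction are illustrative, not this PR's exact code):

```ts
import * as tf from '@tensorflow/tfjs-core';

// Sketch of the numerically stable log-softmax:
//   logSoftmax(x) = (x - max(x)) - log(sum(exp(x - max(x))))
function logSoftmaxSketch(logits: tf.Tensor): tf.Tensor {
  const axis = logits.rank - 1;             // reduce over the last dimension
  const keepDims = true;
  const xMax = logits.max(axis, keepDims);  // shift by the max for stability
  const shifted = logits.sub(xMax);
  return shifted.sub(shifted.exp().sum(axis, keepDims).log());
}

// e.g. logSoftmaxSketch(tf.tensor1d([1, 2, 3])).print();
```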
Branch updated from 59d80c8 to ab5ed1d.
Thank you for the contribution. At a high level, it is preferred to implement the op on the GPU as an independent op, meaning the logic should not rely on other ops, mainly for performance reasons.
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov)
@pyu10055 thanks for the comments. I followed the …
got it, I did not realize our softmax did not have its own GPU implementation. Thanks for the explanation.
Reviewed 3 of 3 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @yhwang and @dsmilkov)
src/ops/softmax.ts, line 102 at r1 (raw file):
  axis = $logits.rank - 1;
}
if (axis !== $logits.rank - 1) {
can you explain why a non-last-dim axis is not supported? thanks.
src/ops/softmax.ts, line 113 at r1 (raw file):
const shifted = logits.sub(xMax);
const value =
    shifted.toFloat().sub(shifted.exp().sum(axis, keepDims).log()) as T;
do we need to cast the shifted to float?
src/ops/softmax.ts, line 115 at r1 (raw file):
    shifted.toFloat().sub(shifted.exp().sum(axis, keepDims).log()) as T;
const gradFunc = (dy: T) => {
do you have a tf implementation of the grad function for reference? thanks.
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @dsmilkov)
src/ops/softmax.ts, line 102 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
can you explain why a non-last-dim axis is not supported? thanks.
Basically, log_softmax and softmax need to be performed against the last dimension. In order to support a non-last-dim axis, we need to do a transpose that swaps the axis to the last position at the beginning and swaps it back at the end. The original softmax in tfjs-core doesn't support that, so I followed that convention and did not support it either. But I do see TF supports it in the Python-level API here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_ops.py#L1687-L1704
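To illustrate the transpose workaround described above, here is a rough TypeScript sketch (not code from this PR; the function name and the inlined last-dimension log-softmax are assumptions):

```ts
import * as tf from '@tensorflow/tfjs-core';

// Sketch: handle a non-last axis by swapping it with the last dimension,
// running the last-dimension log-softmax, then swapping back.
function logSoftmaxAnyAxis(logits: tf.Tensor, axis: number): tf.Tensor {
  const lastDim = logits.rank - 1;
  // Permutation that swaps `axis` with the last dim; it is its own inverse.
  const perm = Array.from({length: logits.rank}, (_, i) => i);
  [perm[axis], perm[lastDim]] = [perm[lastDim], perm[axis]];

  const x = axis === lastDim ? logits : logits.transpose(perm);
  const xMax = x.max(lastDim, true);
  const shifted = x.sub(xMax);
  const result = shifted.sub(shifted.exp().sum(lastDim, true).log());
  return axis === lastDim ? result : result.transpose(perm);
}
```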
src/ops/softmax.ts, line 113 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
do we need to cast the shifted to float?
this also follows the softmax impl above (line 61)
src/ops/softmax.ts, line 115 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
do you have a tf implementation of the grad function for reference? thanks.
here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_grad.py#L247-L262
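For reference, the gradient in the linked _LogSoftmaxGrad can be paraphrased in tfjs-style TypeScript as follows (a sketch assuming `y` is the saved log-softmax output; this is not the exact gradFunc in this PR):

```ts
import * as tf from '@tensorflow/tfjs-core';

// For y = logSoftmax(x): dx = dy - exp(y) * sum(dy, axis, keepDims),
// i.e. dy minus softmax(x) scaled by the summed upstream gradient.
function logSoftmaxGrad(dy: tf.Tensor, y: tf.Tensor, axis: number): tf.Tensor {
  const keepDims = true;
  return dy.sub(y.exp().mul(dy.sum(axis, keepDims)));
}
```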
BTW, I found the tf GPU impl here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/softmax_op_gpu.cu.cc#L135-L191
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @dsmilkov)
This looks great. Ping is right that we should be doing this in a kernel, but it looks like we don't do that for softmax. One small comment about the test, then I will merge :)
Reviewed 1 of 3 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055, @yhwang, and @dsmilkov)
src/ops/softmax.ts, line 102 at r1 (raw file):
Previously, yhwang (Yihong Wang) wrote…
Basically, log_softmax and softmax need to be performed against the last dimension. In order to support a non-last-dim axis, we need to do a transpose that swaps the axis to the last position at the beginning and swaps it back at the end. The original softmax in tfjs-core doesn't support that, so I followed that convention and did not support it either. But I do see TF supports it in the Python-level API here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_ops.py#L1687-L1704
This is what we do for softmax, so this is fine.
src/ops/softmax_test.ts, line 169 at r1 (raw file):
  expect(f).toThrowError();
});
can you add a unit test for the gradient?
Branch updated from ab5ed1d to 1592f2d.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @pyu10055, @nsthorat, and @dsmilkov)
src/ops/softmax_test.ts, line 169 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
can you add a unit test for the gradient?
Thanks for the review. I just added a gradient test case with hard-coded expected results. Any suggestions on adding gradient test cases?
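A sketch of what such a gradient test could look like (not the test added in this PR; it assumes the op is exposed as tf.logSoftmax and uses the repo's expectArraysClose helper). With dy = [1, 1], the expected values follow from dx = dy - softmax(x) * sum(dy): softmax([1, 2]) ≈ [0.268941, 0.731059], so dx ≈ [0.462117, -0.462117].

```ts
import * as tf from '../index';
// In-repo test helper; the exact import path may differ.
import {expectArraysClose} from '../test_util';

it('logSoftmax gradient: 1D (sketch)', () => {
  const x = tf.tensor1d([1, 2]);
  const dy = tf.tensor1d([1, 1]);
  const dx = tf.grad((x: tf.Tensor1D) => tf.logSoftmax(x))(x, dy);

  expect(dx.shape).toEqual(x.shape);
  expectArraysClose(dx.dataSync(), [0.462117, -0.462117]);
});
```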
I saw CI failed on a test case here:
matmul_test.ts:310
It should not be related to this PR.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @pyu10055, @nsthorat, and @dsmilkov)
Although LogSoftmax is log(Softmax), the implementation is not. Add the LogSoftmax op and use the implementation from TensorFlow. Signed-off-by: Yihong Wang <yh.wang@ibm.com>
Branch updated from 1592f2d to e054b61.
@pyu10055 let's wait till the questions are all addressed before merging next time :)
@yhwang go for it!
Although LogSoftmax is log(Softmax), the implementation is not.
Add the LogSoftmax op and use the implementation from TensorFlow.
Fixes issue 684
Signed-off-by: Yihong Wang yh.wang@ibm.com
Description
For repository owners only:
Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY
For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md