This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

Support LogSoftmax op #1342

Merged: 2 commits merged into tensorflow:master on Nov 8, 2018

Conversation

@yhwang (Contributor) commented Oct 25, 2018

Although LogSoftmax is log(Softmax), the implementation is not simply that.
Add the LogSoftmax op and use the implementation from TensorFlow.

Fixes issue #684
Signed-off-by: Yihong Wang yh.wang@ibm.com

Description


For repository owners only:

Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY

For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md



@yhwang (Contributor, Author) commented Oct 25, 2018

I used the log_softmax implementation in TensorFlow here and the gradient function here.

I haven't added test cases yet. Any suggestions for the test cases? Can I duplicate the test cases for softmax and update the expected results accordingly?

I put logSoftmax() directly alongside softmax() in softmax.ts. Should I separate them?

Edit:
I added some test cases. However, all of them cover the log softmax function; there is no test case for the gradient function yet.

@pyu10055 (Collaborator) left a comment


Thank you for the contribution. At a high level, it is preferred to implement the op on the GPU as an independent op, meaning the logic should not rely on other ops, mainly for performance reasons.

Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov)

@yhwang (Contributor, Author) commented Nov 5, 2018

@pyu10055 thanks for the comments. I followed the softmax implementation and came up with this PR. logSoftmax needs the exp().sum().log() chain of operations, the same as softmax in tfjs-core. My guess is that the original softmax implementation doesn't move those calculations into WebGL because of the sum() calculation in the middle, so I didn't try to move everything to WebGL either, but that's just my guess. Also, the LogSoftmax implementation in TensorFlow here seems to have no specialization for GPU. Any suggestions on moving this to WebGL to improve performance?
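
A minimal sketch of the numerically stable formulation described above, built from existing tfjs-core chained ops. The name logSoftmaxSketch is hypothetical and this is not the exact PR code:

```ts
import * as tf from '@tensorflow/tfjs-core';

// Sketch only: numerically stable log-softmax composed from existing
// tfjs-core ops (the actual PR code lives in src/ops/softmax.ts).
function logSoftmaxSketch(logits: tf.Tensor, axis = -1): tf.Tensor {
  if (axis === -1) {
    axis = logits.rank - 1;
  }
  const keepDims = true;
  // Subtract the per-row max so exp() cannot overflow.
  const xMax = logits.max(axis, keepDims);
  const shifted = logits.sub(xMax);
  // log_softmax(x) = (x - max) - log(sum(exp(x - max)))
  return shifted.sub(shifted.exp().sum(axis, keepDims).log());
}

// Example: logSoftmaxSketch(tf.tensor1d([1, 2, 3])) ≈ [-2.4076, -1.4076, -0.4076]
```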

@pyu10055 (Collaborator) left a comment


Got it, I did not realize our softmax does not have its own GPU implementation. Thanks for the explanation.

Reviewed 3 of 3 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @yhwang and @dsmilkov)


src/ops/softmax.ts, line 102 at r1 (raw file):

    axis = $logits.rank - 1;
  }
  if (axis !== $logits.rank - 1) {

can you explain why non-last dim axis is not supported? thanks.


src/ops/softmax.ts, line 113 at r1 (raw file):

    const shifted = logits.sub(xMax);
    const value =
        shifted.toFloat().sub(shifted.exp().sum(axis, keepDims).log()) as T;

do we need to cast the shifted to float?


src/ops/softmax.ts, line 115 at r1 (raw file):

        shifted.toFloat().sub(shifted.exp().sum(axis, keepDims).log()) as T;

    const gradFunc = (dy: T) => {

do you have tf implementation of the grad function for reference? thanks.

@yhwang (Contributor, Author) left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @dsmilkov)


src/ops/softmax.ts, line 102 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

can you explain why non-last dim axis is not supported? thanks.

Basically, log_softmax and softmax need to be performed against the last dimension. In order to support a non-last-dim axis, we need a transpose to swap the axis to the last dimension at the beginning and swap it back at the end. The original softmax in tfjs-core doesn't support that, so I followed that convention and did not support it either. But I do see that TF supports it at the Python-level API here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_ops.py#L1687-L1704
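
For reference, a hypothetical sketch (not part of this PR) of the transpose-swap pattern described above, assuming the tf.logSoftmax op this PR adds:

```ts
import * as tf from '@tensorflow/tfjs-core';

// Hypothetical helper, not part of this PR: support an arbitrary axis by
// swapping it with the last dimension, applying logSoftmax, then swapping back.
function logSoftmaxAnyAxis(logits: tf.Tensor, axis: number): tf.Tensor {
  const lastDim = logits.rank - 1;
  if (axis === -1 || axis === lastDim) {
    return tf.logSoftmax(logits);
  }
  // Permutation that swaps `axis` with the last dimension.
  const perm = Array.from({length: logits.rank}, (_, i) => i);
  [perm[axis], perm[lastDim]] = [perm[lastDim], perm[axis]];
  // A single swap is its own inverse, so the same perm restores the layout.
  return tf.logSoftmax(logits.transpose(perm)).transpose(perm);
}
```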


src/ops/softmax.ts, line 113 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

do we need to cast the shifted to float?

This also follows the softmax implementation above (line 61).


src/ops/softmax.ts, line 115 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

do you have tf implementation of the grad function for reference? thanks.

here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_grad.py#L247-L262
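
That reference amounts to dx = dy - exp(log_softmax(x)) * sum(dy, axis, keepDims). A hedged sketch of it in tfjs-core terms, with illustrative names rather than the exact PR code:

```ts
import * as tf from '@tensorflow/tfjs-core';

// Sketch of the gradient from the TF reference above:
//   dx = dy - exp(log_softmax(x)) * sum(dy, axis, keepDims)
// `value` is the saved forward output (the log-softmax values).
function logSoftmaxGradSketch(
    dy: tf.Tensor, value: tf.Tensor, axis: number): tf.Tensor {
  const keepDims = true;
  const softmax = value.exp();  // exp(log_softmax) recovers the softmax
  return dy.sub(dy.sum(axis, keepDims).mul(softmax));
}
```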

@yhwang (Contributor, Author) left a comment


BTW, I found the TF GPU implementation here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/softmax_op_gpu.cu.cc#L135-L191

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @dsmilkov)

@nsthorat (Contributor) left a comment


This looks great. Ping is right that we should be doing this in a kernel, but it looks like we don't do that for softmax. One small comment about the test, then I will merge :)

Reviewed 1 of 3 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055, @yhwang, and @dsmilkov)


src/ops/softmax.ts, line 102 at r1 (raw file):

Previously, yhwang (Yihong Wang) wrote…

Basically, log_softmax and softmax need to be performed against the last dimension. In order to support a non-last-dim axis, we need a transpose to swap the axis to the last dimension at the beginning and swap it back at the end. The original softmax in tfjs-core doesn't support that, so I followed that convention and did not support it either. But I do see that TF supports it at the Python-level API here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_ops.py#L1687-L1704

This is what we do for softmax, so this is fine.


src/ops/softmax_test.ts, line 169 at r1 (raw file):

    expect(f).toThrowError();
  });

can you add a unit test for the gradient?

@yhwang (Contributor, Author) left a comment


Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @pyu10055, @nsthorat, and @dsmilkov)


src/ops/softmax_test.ts, line 169 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

can you add a unit test for the gradient?

Thanks for the review. I just added a gradient test case with hard-coded expected results. Any suggestions on adding gradient test cases?
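
For context, a sketch of what such a gradient test could look like in the existing Jasmine style. The expected values are hand-computed from dx = dy - softmax(x) * sum(dy) with dy = ones; this is illustrative, not the exact PR test:

```ts
import * as tf from '@tensorflow/tfjs-core';

it('gradient of 1D logSoftmax (illustrative sketch)', () => {
  const x = tf.tensor1d([1, 2, 3]);
  const dy = tf.tensor1d([1, 1, 1]);
  const dx = tf.grad((t: tf.Tensor) => tf.logSoftmax(t))(x, dy);
  // softmax([1, 2, 3]) ≈ [0.0900, 0.2447, 0.6652] and sum(dy) = 3, so
  // dx = dy - 3 * softmax ≈ [0.7299, 0.2658, -0.9957]
  expect(dx.shape).toEqual(x.shape);
  tf.test_util.expectArraysClose(dx.dataSync(), [0.7299, 0.2658, -0.9957]);
});
```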

@yhwang (Contributor, Author) left a comment


I saw CI fail on a test case here:
matmul_test.ts:310
It should not be related to this PR.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @pyu10055, @nsthorat, and @dsmilkov)

Although LogSoftmax is log(Softmax), the implementation is not simply that.
Add the LogSoftmax op and use the implementation from TensorFlow.

Signed-off-by: Yihong Wang <yh.wang@ibm.com>
@pyu10055 pyu10055 merged commit ff27d84 into tensorflow:master Nov 8, 2018
@nsthorat (Contributor) commented Nov 8, 2018

@pyu10055 let's wait until the questions are all addressed before merging next time :)

@yhwang yhwang deleted the add-op-log-softmax branch November 8, 2018 17:32
@yhwang (Contributor, Author) commented Nov 8, 2018

@nsthorat @pyu10055 thanks for the reviews again. One question: may I also work on tfjs-converter to put logSoftmax into the normalization ops?

@nsthorat (Contributor) commented Nov 8, 2018

@yhwang go for it!

Development

Successfully merging this pull request may close these issues.

ValueError: Unsupported Ops in the model before optimization LogSoftmax