This repository was archived by the owner on Sep 17, 2022. It is now read-only.

Share host memory of Tensors with V8. #241

Merged: nkreeger merged 3 commits into master from kreeger-v8-mem on Apr 4, 2019

Conversation

@nkreeger (Contributor) commented Apr 1, 2019

This PR introduces a new change to use V8 memory for Tensor memory allocation.

Previously, Tensor data was allocated off the heap and memcpy'd from host V8 memory into the new Tensor allocation. Since the TF C API provides a callback when TF_Tensor instances are cleaned up, we can share host memory with V8 by simply adding an additional reference count to the underlying JS typed array.



@dsmilkov (Contributor) left a comment


Thanks! Can you run the node benchmarks (cc @caisq) by linking to this build?

Reviewed 3 of 3 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nkreeger, and @nsthorat)


binding/tfjs_backend.cc, line 43 at r1 (raw file):

    fprintf(stderr, "Invalid NapiAutoRef reference passed to V8 cleanup\n");
#endif
    return;

for my own understanding of C++: looks like if debug is enabled, we log an error, and when it is disabled, we are silent. Should we return an error code and fail somewhere in the stack instead?


binding/tfjs_backend.cc, line 160 at r1 (raw file):

  nstatus = auto_ref->Init(env, array_value);
  if (nstatus != napi_ok) {
    delete auto_ref;

when does this condition happen? Should we fail if the Init() fails?


binding/tfjs_backend.cc, line 177 at r1 (raw file):

      TFE_NewTensorHandle(tensor.tensor, tf_status.status);
  if (TF_GetCode(tf_status.status) != TF_OK) {
    delete auto_ref;

same here. When would the status not be ok?

@nkreeger (Contributor, Author) left a comment


I tried - the benchmarks are still very WIP and missing details for bootstrapping the environment.

Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, and @nsthorat)


binding/tfjs_backend.cc, line 43 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

for my own understanding of C++: looks like if debug is enabled, we log an error, and when it is disabled, we are silent. Should we return an error code and fail somewhere in the stack instead?

Normally we do, but we suppress most build warnings/runtime warnings unless in developer-debug mode.

Also, N-API requires a napi_env instance to bubble the exception up. We do that whenever possible, but the static scope of this callback doesn't allow it here.


binding/tfjs_backend.cc, line 160 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

when does this condition happen? Should we fail if the Init() fails?

This happens whenever V8 won't allow a napi_ref instance - it is not normally expected, but we need to check every call into n-api and return as early as possible to bubble up exceptions.


binding/tfjs_backend.cc, line 177 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

same here. When would the status not be ok?

When TensorFlow cannot allocate a Tensor. This is another check-every-call style API.

ENSURE_TF_OK_RETVAL attempts to get a message and return it to the user as a JS exception (same with the ENSURE_NAPI_OK_* methods).

@caisq (Contributor) commented Apr 1, 2019

@nkreeger @dsmilkov I'll be happy to pull this branch, run the benchmarks on it, and compare the results with the ones from master HEAD.

@dsmilkov (Contributor) left a comment


Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @kangyizhang, @nkreeger, and @nsthorat)


binding/tfjs_backend.cc, line 177 at r1 (raw file):

Previously, nkreeger (Nick Kreeger) wrote…

When TensorFlow cannot allocate a Tensor. This is another check-every-call style API.

ENSURE_TF_OK_RETVAL attempts to get a message and return it to the user as a JS exception (same with the ENSURE_NAPI_OK_* methods).

Thanks for explaining!

@dsmilkov (Contributor) commented Apr 1, 2019

@caisq that would be great. LGTM, but let's run the benchmarks before submitting (it is a great opportunity to dogfood our benchmarks :)

@caisq (Contributor) commented Apr 2, 2019

FYI, I ran the benchmarks on this PR; here are the results I got.

predict() time changes:

  • dense-tiny: 2.6 -> 2.0 (1.3x)
  • dense-large: 3.1 -> 2.7 (1.15x)
  • convolutional-1filters: 5.2 -> 5.0 (1.04x)
  • convolutional-32filters: 5.4 -> 5.7 (0.95x)
  • rnn-simpleRNN: 21.4 -> 16.9 (1.27x)
  • rnn-GRU: 83.9 -> 71.1 (1.18x)
  • rnn-LSTM: 79.6 -> 76.1 (1.05x)
  • mobilenet: 31.8 -> 30.7 (1.04x)
  • attention: 108.3 -> 89.7 (1.21x)

fit() time changes:

  • dense-tiny: 6.9 -> 5.2 (1.32x)
  • dense-large: 12.3 -> 12.8 (0.96x)
  • convolutional-1filters: 9.9 -> 11.2 (0.88x)
  • convolutional-32filters: 38.9 -> 40.7 (0.95x)
  • rnn-simpleRNN: 30.6 -> 25.4 (1.20x)
  • rnn-GRU: 88.7 -> 67.9 (1.30x)
  • rnn-LSTM: 79.9 -> 62.3 (1.28x)

So in most cases, there are very nice speedups (up to ~30%).

@nkreeger nkreeger merged commit 8d9962c into master Apr 4, 2019
@nkreeger nkreeger deleted the kreeger-v8-mem branch April 4, 2019 14:43
