This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

allow backend engine to be cached by backend name #1233

Merged
merged 3 commits into from
Aug 27, 2018

Conversation

pyu10055
Collaborator

@pyu10055 pyu10055 commented Aug 18, 2018

Description

This would allow multiple backends to be used in the same session, and it will prevent the tensor ref counts from being cleared when switching between backends.


For repository owners only:

Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY

For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md


This change is Reviewable

…use multiple backend at the same time without causing tensor_refs to be cleared
@pyu10055 pyu10055 requested a review from dsmilkov August 18, 2018 04:35
@nkreeger
Contributor

This seems bad - keeping Tensors in different backends will cause problems when trying to use them in Ops executed by different backends...

@nsthorat
Contributor

Nick is right -- I don't think we can go forward with this change because Tensors fundamentally cannot be shared between backends because the backend owns the mapping between DataId => data. Let's set up a VC at some point to talk about the problem and a better solution :)
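[Editor's note] Nikhil's point can be sketched in plain JavaScript. This is an illustrative toy, not the actual tfjs internals; the names (FakeBackend, register, read) are hypothetical. Each backend owns its own DataId => data map, so a tensor registered on one backend is simply invisible to another:

```javascript
// Hypothetical sketch of the DataId => data ownership problem.
// FakeBackend, register, and read are illustrative names, not tfjs APIs.
class FakeBackend {
  constructor() {
    this.data = new Map(); // DataId => underlying values
  }
  register(dataId, values) {
    this.data.set(dataId, values);
  }
  read(dataId) {
    if (!this.data.has(dataId)) {
      throw new Error('DataId not found on this backend');
    }
    return this.data.get(dataId);
  }
}

const webgl = new FakeBackend();
const cpu = new FakeBackend();

const dataId = {}; // DataIds are opaque object handles
webgl.register(dataId, [4]);

webgl.read(dataId);   // ok: the owning backend resolves the handle
// cpu.read(dataId);  // would throw: the cpu backend has no entry for it
```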

@pyu10055
Collaborator Author

pyu10055 commented Aug 20, 2018

I want to point out that the current approach has the same issue: if you don't manually download all the tensors and re-register them with the new backend, the tensors are effectively split across backends, and worse, you cannot even access them anymore.

For example, if you are on the webgl backend and switch to cpu, we simply wipe out the ref counts in the engine but never clean up the backend. This causes data inconsistency and a memory leak.

One use case for keeping the tensors on the backend and engine is to reuse them when switching back.
For example, I want to execute a set of ops on cpu as a post-processing step for webgl-based inference, but I need to switch back to webgl for the next inference. It would be great if I could reuse all the weight tensors of the model, which live on the webgl backend.
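[Editor's note] The PR title's mechanism can be sketched as a map from backend name to engine, so switching backends restores the previous engine (with its ref counts) instead of resetting it. This is an illustrative plain-JS sketch with hypothetical names (Engine, setBackend, engines), not the actual implementation:

```javascript
// Illustrative sketch: cache one engine per backend name so that
// switching back restores tensor ref counts instead of wiping them.
// Engine, setBackend, and engines are hypothetical names.
class Engine {
  constructor(backendName) {
    this.backendName = backendName;
    this.refCounts = new Map(); // tensor id => ref count
  }
}

const engines = new Map(); // backend name => cached Engine
let currentEngine = null;

function setBackend(name) {
  if (!engines.has(name)) {
    engines.set(name, new Engine(name));
  }
  currentEngine = engines.get(name);
  return currentEngine;
}

const e1 = setBackend('webgl');
e1.refCounts.set('t0', 1);

setBackend('cpu');              // switch away...
const e2 = setBackend('webgl'); // ...and back

// The same engine instance is restored, ref counts intact:
// e2 === e1, and e2.refCounts.get('t0') === 1.
```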

@dsmilkov
Contributor

dsmilkov commented Aug 20, 2018

I think caching the engine is a good idea and doesn't change anything about previous behavior -- backends still can't share tensors. To put what Ping said into code, this change makes the following possible:

// Some work on webgl.
tf.setBackend('webgl');
const squareOnWebGL = tf.square(2);

// Then a bit on cpu.
tf.setBackend('cpu');
const sqrtOnCpu = tf.sqrt(squareOnWebGL.dataSync());

// Then back to webgl
tf.setBackend('webgl');
// NOTE: squareOnWebGL is not lost since we kept the engine.
const addOnWebGL = tf.add(squareOnWebGL, sqrtOnCpu.dataSync());

And a more realistic scenario:

function predict() {
  // Work on webgl, which holds all the variables.
  const result = model.predict();

  // Post-process on cpu, no variables, just ops.
  tf.setBackend('cpu');
  postProcess(result.dataSync());

  tf.setBackend('webgl');
  // When next animation frame fires, all the webgl variables are kept.
}

requestAnimationFrame(() => predict());

@nkreeger
Contributor

Just curious what the use case is here. FWIW, TensorFlow keeps Tensors stored on "devices" and automatically handles copying them over if they need to run on different host memory. I think exposing a way to jump between backends to the end user might be confusing and lead to misuse, bugs, etc.

Contributor

@dsmilkov dsmilkov left a comment


I just talked to Nikhil and we are both OK with this change as a stop-gap. Nikhil or I will later properly implement sharing of tensors across backends (whose user-facing API will be discussed in Ping's doc). At that point there will be only one engine, thus no need to cache, so this stop-gap will go away.

Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov)

@dsmilkov dsmilkov merged commit 59b8ea7 into master Aug 27, 2018
@dsmilkov dsmilkov deleted the backend_engine_cache branch November 20, 2018 16:10
4 participants