Add new method dispose() to tf.layers.Layer and tf.Model #282
Conversation
FEATURE

This allows a layer or a model to release the memory (e.g., WebGL textures) allocated for its weights. Usage example:

```js
const model = tf.sequential();
model.add(tf.layers.dense({units: 10, activation: 'relu', inputShape: [20]}));
model.add(tf.layers.dense({units: 3, activation: 'softmax'}));
model.summary();
model.dispose();  // This disposes the four weights that belong to the model's layers.
```

We use reference counting because layers may be shared among multiple models. Also, a model may be nested under other models. When the reference count of a layer or model decreases to zero, its weights are disposed. After that, the layer or model can no longer be used in `apply`, `predict`, `evaluate`, `fit`, or `save` calls.

Fixes: tensorflow/tfjs#533
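The shared-layer scenario that motivates reference counting can be sketched with minimal, hypothetical classes (the names below are illustrative, not the actual tfjs-layers internals):

```javascript
// Minimal sketch of weight reference counting (illustrative names only).
class Weight {
  constructor(name) {
    this.name = name;
    this.refCount = 0;
    this.disposed = false;
  }
  incRef() { this.refCount++; }
  decRef() {
    this.refCount--;
    if (this.refCount === 0) {
      this.disposed = true;  // In tfjs this is where the WebGL texture would be freed.
    }
  }
}

class Layer {
  constructor(weights) { this.weights = weights; }
  incRef() { this.weights.forEach(w => w.incRef()); }   // Called when attached to a model.
  dispose() { this.weights.forEach(w => w.decRef()); }
}

class Model {
  constructor(layers) {
    this.layers = layers;
    layers.forEach(l => l.incRef());
  }
  dispose() { this.layers.forEach(l => l.dispose()); }
}

const kernel = new Weight('dense/kernel');
const shared = new Layer([kernel]);
const modelA = new Model([shared]);
const modelB = new Model([shared]);  // The same layer is shared by two models.

modelA.dispose();
console.log(kernel.disposed);  // false: modelB still holds a reference.
modelB.dispose();
console.log(kernel.disposed);  // true: the refcount reached zero.
```

Disposing one model leaves the shared weight alive; only when the last referencing model is disposed does the weight's memory actually go away.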
On my phone right now but I think this should be called “dispose” for
consistency with core. dispose() actually does a decref, and I don’t think
JS folks will be comfortable with this terminology.
Thanks for the comment, @nsthorat! I considered the `dispose` name. I think the downside is that it is potentially misleading. For example, when a Layer is shared among multiple Model instances, calling the method may not actually dispose the layer's weights. Similarly, when a sequential model is nested in another sequential model, calling the method will not immediately dispose the weights. This is different from the situation with core tensors and variables, for which calling the `dispose` method is guaranteed to free the memory, IIUC.
So I would prefer to either
- keep the `decRef()` name, with the understanding that this is a pretty niche method that only a small group of hard-core clients of ours will deal with, or
- name it something more consistent with `dispose`, but more truthful, e.g., `tryDispose`.
Let me know what you think.
Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat, @davidsoergel, and @dsmilkov)
dispose() actually does not guarantee a disposal of the underlying memory,
it just decrefs. If you reshape or clone a Tensor, we increase the refcount
to the underlying memory. You have to dispose both the original tensor and
the cloned tensor to actually free up the underlying memory. To me, this is
almost the same thing as what you are doing with model weights / layers.
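A rough model of the clone/dispose refcounting described here (a sketch only; the real bookkeeping lives inside tfjs-core, and these class shapes are illustrative):

```javascript
// Rough model of how a tensor and its clone share one underlying buffer.
class Buffer {
  constructor() { this.refCount = 0; this.freed = false; }
}

class Tensor {
  constructor(buffer) {
    this.buffer = buffer;
    buffer.refCount++;
  }
  clone() {
    // A clone is a new Tensor handle over the same underlying buffer.
    return new Tensor(this.buffer);
  }
  dispose() {
    this.buffer.refCount--;
    if (this.buffer.refCount === 0) {
      this.buffer.freed = true;  // Only now is the memory actually released.
    }
  }
}

const t = new Tensor(new Buffer());
const c = t.clone();
t.dispose();
console.log(t.buffer.freed);  // false: the clone still references the buffer.
c.dispose();
console.log(c.buffer.freed);  // true: both handles have been disposed.
```

So `dispose()` on a single handle is really a decref; the memory is released only when every handle over the buffer has been disposed.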
That sounds a little misleading :p Looking at the doc string at https://js.tensorflow.org/api/0.12.0/#dispose, it doesn't mention the decRef behavior, which I think it should. Maybe let's discuss in person what to do with this method's name.
Sure, I'll be back soon. We don't mention it because it's an implementation detail and users should not have to worry about it. In this case, though, it may actually be important to the user, and we should document it.
After offline discussion, I'm renaming decRef to dispose. Rationale:
- Consistent with the dispose method naming in core. Less scary-looking name.
- The special cases of shared Layers and nested Sequential models are relatively rare.
- Those special cases are now covered in the doc string.
PTAL.
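The doc string's restriction that a disposed model can no longer be used could be guarded roughly as below (illustrative only; `assertNotDisposed` is a hypothetical helper, not necessarily the actual tfjs-layers code):

```javascript
// Sketch of a use-after-dispose guard (hypothetical, not the tfjs-layers source).
class Model {
  constructor() { this.disposed = false; }
  assertNotDisposed() {
    if (this.disposed) {
      throw new Error('Model is already disposed.');
    }
  }
  dispose() {
    this.assertNotDisposed();
    this.disposed = true;  // The weights would be decref'd here.
  }
  predict(x) {
    this.assertNotDisposed();
    return x;  // Placeholder for the real forward pass.
  }
}

const m = new Model();
m.predict(1);   // Fine before disposal.
m.dispose();
try {
  m.predict(1);
} catch (e) {
  console.log(e.message);  // 'Model is already disposed.'
}
```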
Code LGTM -- quick question, where is the actual model.dispose()? Is that container.ts?
@nsthorat To answer your question, it is in container.ts.
Reviewed 2 of 7 files at r1, 5 of 5 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @davidsoergel, and @dsmilkov)
src/engine/container.ts, line 641 at r2 (raw file):
`/** Attempt to dispose a Container's weights. */`
instead of saying "Container" can you say "Model"? This will show better in the docs (Container is an implementation detail).
Also, this first line should just say something like "Disposes a model, freeing up used memory that is not shared with another model." since that's exactly what it's doing. The fact that another model may share weights doesn't reduce the fact that the first model is actually disposed.
Then after that we can talk about the refcounting stuff
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @caisq, @davidsoergel, and @dsmilkov)
src/engine/container.ts, line 641 at r2 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
instead of saying "Container" can you say "Model"? This will show better in the docs (Container is an implementation detail).
Also, this first line should just say something like "Disposes a model, freeing up used memory that is not shared with another model." since that's exactly what it's doing. The fact that another model may share weights doesn't reduce the fact that the first model is actually disposed.
Then after that we can talk about the refcounting stuff
Good point. Done.
Hi, sorry I just saw this. This solution is ok, but adds memory management complexity in layers.
An alternative (a low maintenance approach) would have been to have model.clone() which calls layer.clone() which calls v.clone() for each variable v of a given layer. Then calling dispose() on a model would just call dispose on its layers, which calls dispose() on the variables used by those layers. This way you don't have to do any ref counting, since all of the ref counts are impl detail handled in tfjs-core via the .clone() / dispose() mechanism. This way you also have a symmetry of clone()<--->dispose().
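The proposed clone()/dispose() symmetry, with hypothetical clone() methods at each level (a sketch of the suggestion, not an existing API), might look roughly like this:

```javascript
// Sketch of the proposed clone()/dispose() symmetry (hypothetical API).
class Variable {
  constructor(buffer) { this.buffer = buffer; buffer.refCount++; }
  clone() { return new Variable(this.buffer); }
  dispose() {
    if (--this.buffer.refCount === 0) this.buffer.freed = true;
  }
}

class Layer {
  constructor(variables) { this.variables = variables; }
  clone() { return new Layer(this.variables.map(v => v.clone())); }
  dispose() { this.variables.forEach(v => v.dispose()); }
}

class Model {
  constructor(layers) { this.layers = layers; }
  clone() { return new Model(this.layers.map(l => l.clone())); }
  dispose() { this.layers.forEach(l => l.dispose()); }
}

// Each cloned model owns its own Variable handles; the refcounting is
// an implementation detail hidden behind clone()/dispose().
const buffer = { refCount: 0, freed: false };
const base = new Model([new Layer([new Variable(buffer)])]);
const copy = base.clone();
base.dispose();
console.log(buffer.freed);  // false: `copy` still holds a handle.
copy.dispose();
console.log(buffer.freed);  // true
```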
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @nsthorat, @davidsoergel, and @dsmilkov)
Just spoke to Daniel offline -- as far as I understand, that approach unfortunately won't work because the layer itself is shared: multiple models can update the same weights.