
Optimize tf.tidy() and tf.keep(). #1621

Merged
merged 3 commits into master from keep
Mar 12, 2019

Conversation

@nsthorat (Contributor) commented Mar 12, 2019

We do this by:

  • Removing global tf.keep() tracking. Instead, we add a "kept" bit to tensors and implicitly remove kept tensors from the tracking mechanisms: when we are about to track a tensor in a parent scope, we skip it if its kept bit is set.
  • Adding an integer ID to every scope. track() sets the tensor's scope ID, and when we check whether a tensor belongs to the current scope while tracking in a parent, we simply compare that ID to the current scope ID (see the sketch below).
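
Roughly, the new bookkeeping looks like the following minimal TypeScript sketch; the names (`TrackedTensor`, `ScopeState`, `EngineSketch`) are illustrative assumptions, not the actual tfjs-core internals:

```ts
// Minimal sketch of the kept-bit and scope-ID mechanisms described above.
interface TrackedTensor {
  kept: boolean;    // set by keep(); replaces the old global keep registry
  scopeId: number;  // the scope this tensor was last tracked in
}

interface ScopeState {
  id: number;
  tracked: TrackedTensor[];
}

class EngineSketch {
  private nextScopeId = 0;
  private scopeStack: ScopeState[] = [{id: this.nextScopeId++, tracked: []}];

  private get activeScope(): ScopeState {
    return this.scopeStack[this.scopeStack.length - 1];
  }

  startScope(): void {
    this.scopeStack.push({id: this.nextScopeId++, tracked: []});
  }

  track(t: TrackedTensor): void {
    if (t.kept) return;  // kept tensors are implicitly untracked
    t.scopeId = this.activeScope.id;
    this.activeScope.tracked.push(t);
  }

  keep(t: TrackedTensor): void {
    t.kept = true;  // O(1) bit flip; no global keep set to maintain
  }

  endScope(results: TrackedTensor[]): void {
    const ending = this.scopeStack.pop()!;
    for (const t of ending.tracked) {
      // A cheap integer comparison decides ownership; kept tensors and
      // result tensors are never disposed.
      if (!t.kept && t.scopeId === ending.id && !results.includes(t)) {
        // dispose(t) would release the tensor's memory here
      }
    }
    // Hand surviving results to the parent scope; track() itself skips
    // any tensor that carries the kept bit.
    results.forEach(r => this.track(r));
  }
}
```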

No unit tests were added since we already have pretty serious coverage of memory behavior and this is an internal optimization.
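
For reference, standard `tf.tidy()` / `tf.keep()` usage whose behavior is unchanged by this PR:

```ts
import * as tf from '@tensorflow/tfjs-core';

const {result, saved} = tf.tidy(() => {
  const a = tf.scalar(2);
  const b = a.square();             // disposed when the tidy ends
  const saved = tf.keep(b.add(1));  // kept bit set; survives the tidy
  return {result: b.mul(saved), saved};  // returned tensors also survive
});
```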



@nsthorat changed the title from "Optimize tf.keep()." to "Optimize tf.tidy() and tf.keep()." Mar 12, 2019
@nsthorat requested review from dsmilkov and caisq March 12, 2019 20:35
@dsmilkov (Contributor) left a comment

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @caisq and @dsmilkov)

@nsthorat nsthorat merged commit 3ed9016 into master Mar 12, 2019
@nsthorat nsthorat deleted the keep branch March 12, 2019 20:58
dsmilkov added a commit that referenced this pull request Mar 13, 2019
Improve memory management of tensors during training. Op authors now explicitly save the intermediate tensors needed for the backward pass. This allows the engine to optimally dispose of memory during the forward pass and keep only the tensors needed for the backward pass.

Based on the [layers benchmark](https://github.com/tensorflow/tfjs-layers/tree/v1.0.0/integration_tests/benchmarks), this change along with #1621 led to:

## 2-3x memory reduction
**Before**
![before-mem](https://user-images.githubusercontent.com/2294279/54299829-13b9db80-4592-11e9-97a4-04a95012b5c4.png)

**After**
![after-mem](https://user-images.githubusercontent.com/2294279/54299854-203e3400-4592-11e9-8887-62165d0bfbbb.png)


## 1.5-1.7x improvement in fit() for GRU and LSTM ops.

**Before** 
![before](https://user-images.githubusercontent.com/2294279/54299715-d5242100-4591-11e9-8a4c-b5944e991f57.png)

**After**
![after](https://user-images.githubusercontent.com/2294279/54299724-d9e8d500-4591-11e9-8ea9-e340e10a41ce.png)

 

- When an op author writes the forward pass of an op, they are given a `save` function that lets them save inputs or intermediate tensors for reuse in the backward pass.
- Before this change, the `save` function was a no-op placeholder for when we would decide to optimize disposal of tensors during training.
- However, `save` being a no-op caused a bug for existing users who rely on it (e.g. Magenta.js).
- After this change, the `save` function makes a shallow copy of the tensor and keeps it until the backward pass is done.
- `save` used to take an array of tensors. Now it takes a `NamedTensorMap`, which improves code readability and reduces the chance of off-by-one index bugs (see the sketch after this list).
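
Roughly, the pattern described above, as a hedged sketch; the exact `save` and gradient signatures here are assumptions based on these bullets, not the verbatim tfjs-core API:

```ts
import * as tf from '@tensorflow/tfjs-core';

// Hypothetical shape for illustration; tfjs-core exports its own version.
type NamedTensorMap = {[name: string]: tf.Tensor};

// Forward pass: the op author names exactly what backprop will need.
// Everything else is now safe for the engine to dispose eagerly.
const forward = (x: tf.Tensor, save: (t: NamedTensorMap) => void) => {
  save({x});  // shallow-copied and retained until the backward pass runs
  return x.square();
};

// Backward pass: saved tensors are looked up by name rather than by
// positional index, which is what removes the off-by-one risk.
const backward = (dy: tf.Tensor, saved: NamedTensorMap) =>
    dy.mul(saved.x.mul(2));  // d(x^2)/dx = 2x

// Minimal stand-in for the engine's bookkeeping around one kernel call.
function runKernelSketch(x: tf.Tensor): tf.Tensor {
  const savedTensors: NamedTensorMap = {};
  const y = forward(x, t => Object.assign(savedTensors, t));
  // Later, backprop calls backward(dy, savedTensors), and only then does
  // the engine release the entries in savedTensors.
  return y;
}
```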

Fixes tensorflow/tfjs#1320
