[Fix] Avoid double lookup of tables when using ShadowVariable while the optimizer is updating gradients #262
Conversation
```diff
@@ -101,7 +101,10 @@ def apply_grad_to_update_var(var, grad):
       var._track_optimizer_slots(_slots)
 
       with ops.control_dependencies([grad]):
-        v0 = var.read_value(do_prefetch=not var.params.bp_v2)
+        if isinstance(var, de.shadow_ops.ShadowVariable):
+          v0 = var.read_value(do_prefetch=False)
+        else:
+          v0 = var.read_value(do_prefetch=not var.params.bp_v2)
```
Will it be OK if the lookup happens multiple times in one function call, from different inputs?
Force-pushed from 6c00505 to 7420ea5
```diff
@@ -51,6 +51,14 @@ _TF_DOWNLOAD_CLANG = "TF_DOWNLOAD_CLANG"
 _PYTHON_BIN_PATH = "PYTHON_BIN_PATH"
 
+_DEFAULT_CUDA_COMPUTE_CAPABILITIES = {
+    "11.6": [
```
Maybe update the CUDA compute capability in another commit.
LGTM
Force-pushed from 3587c2d to c75cdbe
Force-pushed from 1d0c53c to 667fec2
Commits: [Fix] Avoid double lookup of tables when using ShadowVariable while the optimizer is updating gradients. [fix] Pass the init_size parameter when creating slot variables for de.Variable.
1. Make the output shape equal to dim when raw_init is a TF initializer. 2. Make the input dim a constant op when using the reshape op, to prevent a fault under tf.function.
Approve again.
Description
Before this fix, in a Keras model, a ShadowVariable would call read_value with do_prefetch twice: once from embedding_lookup in the forward pass and again from apply_grad_to_update_var in the backward pass. Now, if var in apply_grad_to_update_var is a ShadowVariable, its value is read directly, because the prefetching read_value was already performed inside the embedding_lookup function.
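For orientation, here is a minimal sketch of the forward path, assuming the tensorflow_recommenders_addons (tfra) API; embedding_lookup on a ShadowVariable is the call that already performs the prefetching read, which is why the optimizer-side read can skip it:

```python
import tensorflow as tf
import tensorflow_recommenders_addons as tfra

de = tfra.dynamic_embedding

# Dynamic-embedding table and its trainable shadow.
params = de.get_variable("embeddings", key_dtype=tf.int64,
                         value_dtype=tf.float32, dim=8)
shadow = de.shadow_ops.ShadowVariable(params, name="shadow")

ids = tf.constant([1, 2, 3], dtype=tf.int64)
# Forward pass: this lookup prefetches the looked-up values into the shadow
# variable (the first read_value with do_prefetch); before the fix, the
# optimizer then prefetched a second time while applying gradients.
emb = de.shadow_ops.embedding_lookup(shadow, ids)
```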
Also fix passing the init_size parameter when creating slot variables for de.Variable.
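For context, a short sketch of where init_size appears in the tfra API (the slot-creation path that the fix touches lives inside the optimizer wrapper; the capacity value here is a placeholder):

```python
import tensorflow as tf
import tensorflow_recommenders_addons as tfra

de = tfra.dynamic_embedding

# init_size reserves the initial capacity of the underlying hash table.
# Before the fix, this parameter was not forwarded when the optimizer
# created slot variables (e.g. Adam's m/v slots) for a de.Variable.
params = de.get_variable("embeddings",
                         key_dtype=tf.int64,
                         value_dtype=tf.float32,
                         dim=8,
                         init_size=8192)  # placeholder capacity
```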
Also add compatibility with CUDA 11.6.
Also modify the _convert_anything_to_init function:
1. Make the output shape equal to dim when raw_init is a TF initializer.
2. Make the input dim a constant op when using the reshape op, to prevent this fault:
*** tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1 values, but the requested shape has 43
Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.
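To make the reshape fix concrete, here is a hypothetical sketch of the idea behind _convert_anything_to_init (the function name comes from the PR, but this body is an assumption, not the actual implementation):

```python
import tensorflow as tf

def _convert_anything_to_init(raw_init, dim):
  """Hypothetical sketch: normalize raw_init into a value of shape [dim]."""
  if isinstance(raw_init, tf.keras.initializers.Initializer):
    # Fix 1: call the TF initializer with the target shape, so the output
    # shape is already [dim].
    return raw_init(shape=[dim])
  # Fix 2: reshape with a constant shape op; a non-constant dim can be
  # mis-traced under tf.function and trigger the InvalidArgumentError
  # quoted above.
  init = tf.convert_to_tensor(raw_init, dtype=tf.float32)
  return tf.reshape(init, tf.constant([dim], dtype=tf.int64))
```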
Also restrict Bazel build RAM resources to stay within the GitHub CI memory limit.
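A sketch of how such a limit is typically expressed; the flags are standard Bazel, but the exact values used by this repository's CI are assumptions:

```
# .bazelrc (hypothetical values): cap Bazel's scheduling of RAM and jobs so
# builds stay under the GitHub Actions runner memory limit.
build --local_ram_resources=HOST_RAM*.5
build --jobs=2
```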
Type of change
Checklist:
How Has This Been Tested?
A test Python script.
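A hedged sketch of what such a test might look like (assuming the tfra API; the model layout, sizes, and data are placeholders, not the PR's actual script):

```python
import tensorflow as tf
import tensorflow_recommenders_addons as tfra

de = tfra.dynamic_embedding

params = de.get_variable("embeddings", key_dtype=tf.int64,
                         value_dtype=tf.float32, dim=8)
shadow = de.shadow_ops.ShadowVariable(params, name="shadow")
dense = tf.keras.layers.Dense(1)
# Wrap a stock optimizer so it knows how to update dynamic embeddings.
optimizer = de.DynamicEmbeddingOptimizer(tf.keras.optimizers.Adam(1e-3))

@tf.function
def train_step(ids, labels):
  with tf.GradientTape() as tape:
    emb = de.shadow_ops.embedding_lookup(shadow, ids)  # forward lookup
    loss = tf.reduce_mean(tf.square(dense(emb) - labels))
  variables = [shadow] + dense.trainable_variables
  grads = tape.gradient(loss, variables)
  # With the fix, applying gradients no longer triggers a second prefetch
  # read of the table for the ShadowVariable.
  optimizer.apply_gradients(zip(grads, variables))
  return loss

ids = tf.constant([1, 2, 3], dtype=tf.int64)
labels = tf.constant([[1.0], [0.0], [1.0]])
print(train_step(ids, labels).numpy())
```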