for loop in map_fn within tf.function is very slow compared to using while_loop counterpart #40517
Comments
I tried this in Colab with TF 2.2 and the nightly versions and was able to reproduce the issue. Please find the gist here. Thanks!
This looks like a regression - certain ops that the for loop uses don't seem to be optimized properly. We'll look into that. For a temporary workaround, writing the loop like so should avoid the slow ops:
Looks like the workaround only fixes the difference partially - there is still a pair of logical_and/logical_or ops which are slow on GPU and hurt performance [1]. This will be fixed in TF 2.4.
…r when maximum_iterations is set. Fixes #40517. PiperOrigin-RevId: 326056684 Change-Id: I81854d6731a9134b695c704b0f28786091f8239e
* 'master' of github.com:tensorflow/tensorflow: (1480 commits) Strengthen the warning about source code of lambda functions, until the issue is fixed. As found in tensorflow#39832, this bug can lead not just to parse errors but incorrect code as well. Integrate LLVM at llvm/llvm-project@cd209f1a3790 compat: Update forward compatibility horizon to 2020-07-06 Update GraphDef version to 454. Fix a crash on BenchmarkTfLiteModel with delegate Remove the constraint the QConst couldn't be shared Integrate LLVM at llvm/llvm-project@edba2864a7a8 Integrate LLVM at llvm/llvm-project@91c320e9d852 compat: Update forward compatibility horizon to 2020-07-05 Update GraphDef version to 453. [MLIR] Add TF_Log(1+x) to TF_Log1p(x) canonicalization [mlir] Move test to correct folder Revive the use of constant_value workaround, since it seems that logical ops are still considerably slower on GPU. Fixes tensorflow#40517. Partially addresses tensorflow#40708. Remove unit tests from values_test that test functions in distribute_utils. compat: Update forward compatibility horizon to 2020-07-04 Update GraphDef version to 452. [XLA:CPU] Support printing with printf(3) during execution [XLA:CPU] [NFC] Extract a superclass for different ParticipantData inputs to the rendezvous [XLA][LHLO] signal pass failure if one conversion failed Integrate LLVM at llvm/llvm-project@01c4574a129e ...
System information
Describe the current behavior
Code 1 runs much slower than Code 2 (71s vs 11s), but from my understanding of the TensorFlow documentation, Code 1 and Code 2 should be effectively the same (tf.function will turn the for loop into while_loop if the loop uses a Tensor, but apparently that is not the case here). Moreover, Code 2 runs much faster (~1s) on CPU (9750H, 1650Ti); not sure if that is expected, though. Code 3, which uses a for loop and a second for loop in place of map_fn, completed in 164s, the slowest of the three, which is expected.
Update: This behavior is not observed in TF 2.1.0, where Code 1/2/3 complete in a similar amount of time (about 11 seconds).
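For readers without the original snippets, here is a minimal sketch of the two loop styles being compared. It is not the original Code 1 or Code 2; the function names (`sum_for`, `sum_while`, `mapped_for`, `mapped_while`) and the toy workload are made up for illustration, and it assumes TF 2.x with AutoGraph enabled.

```python
import tensorflow as tf


def sum_for(n):
    # Python for loop over a Tensor range; inside tf.function,
    # AutoGraph is expected to convert it into a tf.while_loop.
    total = tf.constant(0)
    for i in tf.range(n):
        total += i
    return total


def sum_while(n):
    # The same computation written explicitly with tf.while_loop.
    def cond(i, total):
        return i < n

    def body(i, total):
        return i + 1, total + i

    _, total = tf.while_loop(cond, body, (tf.constant(0), tf.constant(0)))
    return total


@tf.function
def mapped_for(elems):
    # Hypothetical stand-in for the "for loop in map_fn" pattern.
    return tf.map_fn(sum_for, elems)


@tf.function
def mapped_while(elems):
    # Hypothetical stand-in for the "while_loop in map_fn" pattern.
    return tf.map_fn(sum_while, elems)


elems = tf.constant([3, 4, 5])
result_for = mapped_for(elems).numpy().tolist()
result_while = mapped_while(elems).numpy().tolist()
```

Both functions should return the same values; the issue is only about how long each form takes to run, which can be checked by timing repeated calls after the first (tracing) call.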
Describe the expected behavior
Code 1 and Code 2 should run equally fast.
Standalone code to reproduce the issue
Code 1:
Code 2:
Code 3:
Other info / logs
N/A