Add different pattern to show op in timeline #37074
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with "@googlebot I signed it!" and we'll verify it.
What to do if you already signed the CLA
Individual signers
Corporate signers
ℹ️ Googlers: Go here for more info.
@googlebot I signed it!
CLAs look good, thanks! ℹ️ Googlers: Go here for more info.
@prb12 do you see this pull request?
@xinan-jiang, sorry, it looks like @prb12 no longer works on TF. @qiuminxu do you know if anybody on the performance side can take a look at this PR? Thank you!
Thanks for sending the PR.
Thank you for your review. I have modified the file according to the advice. Could you have a second look? Thank you!
@google-admin @googlebot Could you help me check what's wrong with the CLA check on this PR? The same GitHub account and email address are used on #37813, and that PR passed the CLA check.
Can you run the check again and see if it passes?
@googlebot I signed it!
@qiuminxu Could you help me check what the migration error in the "import/copybara" check is? Thank you!
@yifeif I couldn't see the details of the test error, can someone help take a look?
@mihaimaruseac Can you please take a look at the cla/google test failure? Thanks!
This is a PR from JIZHI, the AI platform in Tencent.
When using the timeline in TensorFlow, we often observe large blanks in the "/job" row, which should show the consecutive execution of operators.
The following are timelines for a Transformer and for the same Transformer using XLA:
Transformer:
XLA:
The timeline is somewhat confusing, and those blanks may lead to the misunderstanding that GPUs are sitting idle.
The reason for this issue is that the "/job" row only shows the scheduling time of each op. As a result, an op's asynchronous kernels may not have started computing by the time the scheduling of the op is over. In the Transformer example, the matmul in the embedding takes a long time and blocks other kernels, which results in the large blank in the middle. For XLA, the fused kernels are scheduled early but have to wait for execution, which results in the large gap between the end times shown in "/job" and in "/stream:all".
Therefore, we propose two new patterns to show the op execution time:
The "gpu" pattern aligns an op with the execution span of all its kernels.
The "all" pattern only changes the ending time of the op to the ending of its last kernel.
We added a new argument, op_time, to the function generate_chrome_trace_format to let the user select how op execution time is shown. The default value is "schedule", which behaves the same as before. The other possible values are "gpu" and "all", as explained above.
The result of the "gpu" pattern is:
Transformer:
XLA:
And the result of the "all" pattern is:
Transformer:
XLA:
Note that the illustrations above show only part of the "all" pattern, since it can produce a large number of parallel rows (many ops are waiting to be executed at the same time).
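To make the three op_time values concrete, here is a minimal sketch of how the displayed span of a single op could be derived from its scheduling span and the spans of its kernels. The helper op_display_span and the (start, end) tuple representation are hypothetical and are not taken from the PR's code. Falling back to the scheduling span for ops without kernels is also an assumption.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_us, end_us); hypothetical representation


def op_display_span(schedule: Span, kernels: List[Span],
                    op_time: str = "schedule") -> Span:
    """Return the span to draw for an op under the chosen op_time pattern."""
    if op_time not in ("schedule", "gpu", "all"):
        raise ValueError("op_time must be 'schedule', 'gpu' or 'all'")
    # Assumption: an op with no recorded kernels keeps its scheduling span.
    if op_time == "schedule" or not kernels:
        return schedule
    first_start = min(start for start, _ in kernels)
    last_end = max(end for _, end in kernels)
    if op_time == "gpu":
        # Align the op with the execution span of all its kernels.
        return (first_start, last_end)
    # "all": keep the scheduling start, move only the end to the last kernel.
    return (schedule[0], last_end)


# An op scheduled during 0..10 us whose kernels run much later.
schedule_span = (0, 10)
kernel_spans = [(50, 60), (70, 90)]
for mode in ("schedule", "gpu", "all"):
    print(mode, op_display_span(schedule_span, kernel_spans, mode))
```

With this PR applied, the pattern would presumably be selected in user code through a call such as timeline.Timeline(run_metadata.step_stats).generate_chrome_trace_format(op_time="gpu"), where run_metadata is the RunMetadata collected from a traced session run.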
Additionally, the kernel name of an XLA fusion kernel is not parsed correctly. Because we do not plan to change the C++ part of the profiler, we re-parse it using the timeline_label attribute provided in RunMetadata.
Thank you for your time on this review.