Parallel shader compilation is broken #7577

Closed
shanumante-sc opened this issue Apr 12, 2023 · 3 comments
Labels: stat:awaiting response, type:bug

Comments

@shanumante-sc

shanumante-sc commented Apr 12, 2023

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): M1 Mac 13.3.1 (22E261)
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Reproducible on any device
  • TensorFlow.js installed from (npm or script link):
  • TensorFlow.js version (use command below): 4.2.0
  • Browser version: Chrome Version 112.0.5615.49
  • Tensorflow.js Converter Version: N/A

Describe the current behavior

  • We use ENGINE_COMPILE_ONLY to speed up initial model loading by not blocking on shader compilation status checks.
  • Since #6913 ("Use VAOs for save+restore of vertexAttribPointer state between different webgl programs"), gl.getAttribLocation is still called from bindVertexProgramAttributeStreams even when the ENGINE_COMPILE_ONLY env flag is set.
  • This call introduces a synchronization point in the graphics pipeline, so we block on shader compilation. A side issue is that TFJS calls gl.getAttribLocation without first checking gl.LINK_STATUS.
  • The screenshot below shows a Chrome trace of a model running with ENGINE_COMPILE_ONLY (Chrome on an M1 Mac).

[Screenshot: Chrome performance trace of a model run with ENGINE_COMPILE_ONLY]
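To illustrate the synchronization point described above, here is a minimal sketch (not the TFJS implementation) of querying an attribute location only once the program is known to have finished linking. It assumes the KHR_parallel_shader_compile extension is available, so completion can be polled without blocking. `GlLike` and `attribLocationIfReady` are hypothetical names introduced for this example; the two numeric constants are the standard values of gl.LINK_STATUS and the extension's COMPLETION_STATUS_KHR.

```typescript
// Minimal structural type so the sketch type-checks without DOM typings.
// A real WebGL context provides both of these methods.
interface GlLike {
  getProgramParameter(program: object, pname: number): unknown;
  getAttribLocation(program: object, name: string): number;
}

const LINK_STATUS = 0x8b82;            // value of gl.LINK_STATUS
const COMPLETION_STATUS_KHR = 0x91b1;  // from KHR_parallel_shader_compile

/**
 * Hypothetical helper: returns the attribute location only if the program
 * has already finished linking. While compilation is still in flight it
 * returns null instead of issuing a query that would stall the pipeline.
 */
function attribLocationIfReady(
    gl: GlLike, program: object, name: string,
    hasParallelCompileExt: boolean): number | null {
  if (hasParallelCompileExt &&
      gl.getProgramParameter(program, COMPLETION_STATUS_KHR) !== true) {
    return null;  // still compiling; try again later
  }
  // Without the extension this query itself waits for the link to finish.
  if (gl.getProgramParameter(program, LINK_STATUS) !== true) {
    throw new Error('Program failed to link');
  }
  return gl.getAttribLocation(program, name);
}
```

A caller would poll this helper (for example once per frame) and set up vertex attributes only after it returns a non-null location.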

Describe the expected behavior

  • TFJS should check ENGINE_COMPILE_ONLY before calling bindVertexProgramAttributeStreams so that it does not stall the GPU pipeline. The VAO can instead be set up in checkCompileCompletion.
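The proposed behavior could be sketched roughly as follows. This is only an illustration of the deferral pattern, not the code that ended up in TFJS: `ProgramRecord`, `DeferredProgramCache`, and `finishPending` are hypothetical names, the `compileOnly` callback stands in for reading the ENGINE_COMPILE_ONLY flag, and `bindAttributes` stands in for bindVertexProgramAttributeStreams.

```typescript
// Hypothetical record for a compiled program.
interface ProgramRecord {
  name: string;
  attribsBound: boolean;
}

// Hypothetical cache that defers attribute/VAO setup in compile-only mode.
class DeferredProgramCache {
  private pending: ProgramRecord[] = [];
  private readonly compileOnly: () => boolean;
  private readonly bindAttributes: (record: ProgramRecord) => void;

  constructor(
      compileOnly: () => boolean,
      bindAttributes: (record: ProgramRecord) => void) {
    this.compileOnly = compileOnly;
    this.bindAttributes = bindAttributes;
  }

  create(name: string): ProgramRecord {
    const record: ProgramRecord = {name, attribsBound: false};
    if (this.compileOnly()) {
      // Compile-only mode: queue the record, make no location queries yet.
      this.pending.push(record);
    } else {
      this.bind(record);
    }
    return record;
  }

  // Plays the role of checkCompileCompletion: finish the deferred setup.
  finishPending(): void {
    for (const record of this.pending) {
      this.bind(record);
    }
    this.pending = [];
  }

  private bind(record: ProgramRecord): void {
    this.bindAttributes(record);
    record.attribsBound = true;
  }
}
```

The design choice here is that program creation never touches attribute state while the flag is set, so all blocking queries are concentrated in the single completion step.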

Standalone code to reproduce the issue

  • Run any network with ENGINE_COMPILE_ONLY set to true, capture a Chrome performance trace, and observe that gl.getAttribLocation is called for every call to createProgram.
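As an alternative to reading a trace, the same observation can be made by counting calls on the WebGL context. The sketch below is a hypothetical instrumentation helper (`CountableGl` and `countGlCalls` are names made up for this example); it wraps the two methods in place and assumes the context object permits overwriting them.

```typescript
// Minimal structural type covering the two methods being counted.
interface CountableGl {
  createProgram(): object | null;
  getAttribLocation(program: object, name: string): number;
}

// Hypothetical helper: wraps both methods on the given context and returns
// a live counter object that is updated on every call.
function countGlCalls(gl: CountableGl) {
  const counts = {createProgram: 0, getAttribLocation: 0};
  const origCreate = gl.createProgram.bind(gl);
  const origGetAttrib = gl.getAttribLocation.bind(gl);

  gl.createProgram = () => {
    counts.createProgram++;
    return origCreate();
  };
  gl.getAttribLocation = (program, name) => {
    counts.getAttribLocation++;
    return origGetAttrib(program, name);
  };
  return counts;
}
```

In TFJS the flag itself is set with `tf.env().set('ENGINE_COMPILE_ONLY', true)` before running the model. If getAttribLocation shows a nonzero count while the flag is set, the behavior described in this issue is being hit.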


@Linchenn
Collaborator

@shanumantesc Thanks for reporting this! I just verified the fix and it should work now.

@gaikwadrahul8
Contributor

Hi, @shanumantesc

I see that PR #7587 was merged to address this issue. If it is resolved on your side, could you please close the issue? Thank you!

@google-ml-butler

Are you satisfied with the resolution of your issue?

@gaikwadrahul8 gaikwadrahul8 self-assigned this Apr 24, 2023

3 participants