Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check for valid buffer and program #23658

Closed
wants to merge 2 commits into from

Conversation

rickg-hcl
Copy link

@rickg-hcl rickg-hcl commented Feb 12, 2025

@kripken kripken requested a review from kainino0x February 21, 2025 23:57
Copy link
Collaborator

@kainino0x kainino0x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, but I'm not sure this change works correctly.

In particular I don't really understand how things could get into a state where this is a problem. For example for prevVB: if the buffer was already deleted before the state was saved, then nothing will be bound, so getParameter(gl.ARRAY_BUFFER) (line 1004) returns null, so prevVB == null. When we restore the state we'll call gl.bindBuffer(gl.ARRAY_BUFFER, null), which is correct - it unbinds the temporary array buffer. We don't want to leave a buffer bound if there wasn't one previously, which I think this PR would do.

Obviously there is some sequence of events that triggers the error you provided in the bug report, I'm just not sure what it could be. If you could provide a repro case or at least a step-by-step of how it happens, that would be really useful to understanding what the correct fix is here. Thanks!

@rickg-hcl
Copy link
Author

I attempted to make a sample, but could not make one that duplicated the situation.
The issue is that somewhere in between gathering the previous program and restoring the previous program, that program was deleted by another thread.
The interaction between the threads is hard to put into a sample, but happens regularly in my application.

@kainino0x ,I can give you credentials (in a private channel) on our sandbox server so you can see the log yourself
https://nomad.myhclsandbox.com:9443/nomad/#/

@kainino0x
Copy link
Collaborator

kainino0x commented Feb 25, 2025

A threading issue [in the middle of this code] shouldn't be possible. WebGL itself is single-threaded, so if you're using multi-threaded GL in your program that's achieved by Emscripten's GL proxying feature. When that's used, all WebGL calls still happen on the same JS thread. All of the code in this function happens at once in a single JS task, which can't be preempted by any other JS code.

It's possible there's a browser bug that causes getParameter to not update immediately when the buffer is deleted? Does this issue happen in more than one browser engine? (chromium / firefox / safari)

It's also possible there's some way GL can get into this state that I'm not aware of. I'm rusty on knowledge of the GL state machine.

You could try capturing the actual sequence of WebGL calls using SpectorJS and that might give some insight into how this happens.

You can feel free to email me at kainino@google.com but I won't be able to take much time to look at it. Hopefully these pointers help you find out what's going on.

@rickg-hcl
Copy link
Author

rickg-hcl commented Feb 25, 2025

I tested on Firefox, and the same Warning occurred.

according to copilot:

The blitOffscreenFramebuffer function in the provided code can cause the thread to pause due to the following reasons:

  1. WebGL Context Loss: If the WebGL context is lost, operations like bindFramebuffer, blitFramebuffer, and other WebGL calls may block or fail, causing the thread to pause.

  2. GPU Synchronization: Operations like blitFramebuffer involve GPU commands that may require synchronization between the CPU and GPU. This synchronization can cause the thread to pause while waiting for the GPU to complete its tasks.

  3. State Changes: The function makes multiple state changes (e.g., enabling/disabling scissor test, binding framebuffers, switching programs, etc.). Each state change can introduce overhead and potential pauses, especially if the GPU needs to flush or synchronize its state.

  4. Resource Binding: Binding resources like textures and buffers can cause the thread to pause if the resources are not readily available or if there is contention for these resources.

I don't think it's the context loss, but it could any of the others.
I will email you credentials and instructions that display this situation readily.

@kainino0x
Copy link
Collaborator

Blocking operations could cause the JS/Wasm VM to stall waiting for the browser, however they can't cause it to yield and allow other VM code to execute because JS tasks are not preemptible. Only explicit pauses (await or completing the task) allow other tasks to run.

(aside: Some of the things Copilot listed would be blocking in JS, others would not be because they'd block in the browser's GPU process instead)

@kainino0x
Copy link
Collaborator

Sorry I didn't get a chance to look at your link. (I think the link isn't working anymore.)

The screenshot you sent me does confirm that you're using Emscripten's proxying (though if I'd looked at the stack in the bug you'd filed, I would have seen this). But that shouldn't really matter; it's still the case that this function can't be interrupted even with proxying.

For example for prevVB: if the buffer was already deleted before the state was saved, then nothing will be bound, so getParameter(gl.ARRAY_BUFFER) (line 1004) returns null, so prevVB == null. When we restore the state we'll call gl.bindBuffer(gl.ARRAY_BUFFER, null), which is correct - it unbinds the temporary array buffer.

I just decided to try a standalone test of this. I realized that I tested the buffer case, but I did not test the program case. It turns out that gl.getParameter(gl.CURRENT_PROGRAM) will return deleted objects, unlike gl.getParameter(gl.ARRAY_BUFFER_BINDING):

BUFFER [object WebGLBuffer]
  bindBuffer(ARRAY_BUFFER, buffer)
  isBuffer(buffer) -> true
  getParameter(ARRAY_BUFFER_BINDING) -> [object WebGLBuffer]
  deleteBuffer(buffer)
  isBuffer(buffer) -> false
  getParameter(ARRAY_BUFFER_BINDING) -> null
PROGRAM [object WebGLProgram]
  useProgram(program)
  isProgram(program) -> true
  getParameter(CURRENT_PROGRAM) -> [object WebGLProgram]
  deleteProgram(program)
  isProgram(program) -> true
  getParameter(CURRENT_PROGRAM) -> [object WebGLProgram]

I did not know about this, but it is the specified behavior:

If a program object is in use as part of current rendering state, it will be flagged for deletion, but it will not be deleted until it is no longer part of current state for any rendering context.

But of course we do take it out of the current state, so then it gets deleted. This seems to make it impossible to fully restore the previous GL state. However, we can at least avoid this error. Hopefully the application is not still relying on the deleted program being usable.

For buffers, I'm not aware of any such logic, however. Can you confirm whether you've actually seen the crash on prevVB, or if you've only seen it on prevProgram?

kainino0x added a commit to kainino0x/emscripten that referenced this pull request Mar 25, 2025
See comment for explanation.

Fixes 23654. See the other PR for additional discussion:
emscripten-core#23658 (comment)
@kainino0x
Copy link
Collaborator

I have a proposed fix for prevProgram: #23980
It's very similar to yours but it also will unbind the program if there was none bound in the first place.

If you're still able to test it, please try this.

Also please let me know about prevBuffer. If there's still an issue there I will have to keep digging....

kainino0x added a commit to kainino0x/emscripten that referenced this pull request Mar 25, 2025
See comment for explanation.

Fixes 23654. See the other PR for additional discussion:
emscripten-core#23658 (comment)
kainino0x added a commit to kainino0x/emscripten that referenced this pull request Mar 25, 2025
See comment for explanation.

Fixes 23654. See the other PR for additional discussion:
emscripten-core#23658 (comment)
@rickg-hcl
Copy link
Author

I was able to try the fix and it does remove the warning for me.
The buffer change in my PR was just because it looked similar, there was never an issue there.
I think this fix solves my issue, Thanks.

@kainino0x
Copy link
Collaborator

Thank you for confirming! I will land that soon, then.
(Apologies for taking over the change, it was just a bit easier!)

@kainino0x kainino0x closed this Mar 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants