Add Delayed Messages API Explainer #1029
09536f0
to
1a64a5e
Compare
Thanks for putting this together!
DelayedMessages/explainer.md
Outdated
In this example, the worker log `message queue wait time + etc: 130.40 ms` indicates the time elapsed from when the main thread initiated the `postMessage` (including its ~130.30 ms serialization block) to when the worker’s `onmessage` handler began execution. This suggests that the message queue wait time is nearly zero, and the delay is primarily caused by serialization on the sender side. However, when the event loop is also busy with other long tasks, it becomes difficult to distinguish these individual sources of delay (serialization, actual queueing, deserialization, and task execution) from other task delays without manual instrumentation.

This API proposes to expose these timings (`serialization`, `deserialization`, `blockedDuration`) explicitly, simplifying the diagnosis of such delays.
Out of curiosity, what might a developer do with this information?
+1. In the examples above, the worker is starved for resources, which seems to limit our mitigations to:
- Improving the performance of the code responsible for long tasks.
- Running more workers to run tasks in parallel.
- Implementing a custom task scheduler to prioritize tasks important to the application.
What additional options am I missing?
Would we use this data to create bugs to go after the performance improvements in mitigation 1) above? If so, would our previous proposal to bring the Long Tasks API to web workers be sufficient to identify postMessage bottlenecks?
https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/LongTasks/explainer.md
> Out of curiosity, what might a developer do with this information?

Developers may use this information to identify the specific causes of message delays between the execution contexts.
> What additional options am I missing?

If serialization/deserialization is the main source of delay, we can use SharedArrayBuffer.

> Would we use this data to create bugs to go after the performance improvements in mitigation 1) above? If so, would our previous proposal to bring the Long Tasks API to web workers be sufficient to identify postMessage bottlenecks?
> https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/LongTasks/explainer.md

I tried to extend the Long Tasks API to support workers, but that API does not provide details about long tasks, such as the sourceURL and sourceCharPosition properties. This was one of the reasons the LoAF API was introduced. Instead of extending the Long Tasks API, its owner recommended proposing a new one. The new API is similar to LoAF, but focuses on message delays rather than frame updates.
> If serialization/de-serialization is the main source of delay, we can use SharedArrayBuffer.

I meant this in a broader sense than this one source of delay. But based on this explainer it seems like this ([de]serialization) is already easy to accurately measure, and if there's a proposed resolution such as the use of SharedArrayBuffer, has that approach been tried already? To what effect?
We don’t need to copy data when using SharedArrayBuffer, which reduces overhead when handling large data. However, it is only available in secure, cross-origin isolated contexts, and I haven’t tried using it yet.
The reason I included serialization/deserialization timing is that in some cases, we don’t know in advance how much data will be sent. Measuring this helps us better understand the cost and performance impact.
If possible, web developers can use SharedArrayBuffer in cross-origin isolated environments. Otherwise, they can reduce the size of the data being transferred or split it into smaller chunks and send it incrementally.
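
To make the two mitigations above concrete, here is a minimal sketch rather than a measured recommendation. The helper names (`chunkArray`, `postInChunks`, `createSharedInts`) and the `{ index, total, chunk }` message shape are hypothetical, and `target` is assumed to be any object with a `postMessage` method, such as a Worker or MessagePort.

```javascript
// Hypothetical helper: split a large array into fixed-size chunks so that
// each postMessage call has a smaller payload to serialize.
function chunkArray(items, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Hypothetical helper: post the chunks one by one and return how many were sent.
// `target` is assumed to expose postMessage (Worker, MessagePort, ...).
function postInChunks(target, items, chunkSize) {
  const chunks = chunkArray(items, chunkSize);
  chunks.forEach((chunk, index) => {
    target.postMessage({ index, total: chunks.length, chunk });
  });
  return chunks.length;
}

// Hypothetical helper: allocate shared memory for `length` 32-bit integers.
// The buffer can be posted to a worker without copying its contents.
function createSharedInts(length) {
  const buffer = new SharedArrayBuffer(length * Int32Array.BYTES_PER_ELEMENT);
  return { buffer, view: new Int32Array(buffer) };
}
```

In a browser, `SharedArrayBuffer` is only defined when the page is cross-origin isolated, so the third helper would need a feature check there. The chunking loop above posts every chunk synchronously; spacing the chunks out over time (for example with a timer between sends) is left out of this sketch.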
DelayedMessages/explainer.md
Outdated
// Create a Web Worker
const worker = new Worker("worker.js");

// Open IndexedDB
We would generally not recommend using IndexedDB this way: the API has first-class support for simultaneous connections from multiple threads or even processes, so this is perhaps not the strongest example to lead with.
This is not a specific IndexedDB situation, but rather a general scenario where the message queue is congested with numerous tasks. I will update this example to exclude any mention of IndexedDB in order to prevent any confusion.
DelayedMessages/explainer.md
Outdated
* It's challenging to intercept all messages, especially those from third-party libraries.
* Accurately measuring internal browser operations like serialization, deserialization, and precise queue waiting time is not feasible from JavaScript.
* It adds boilerplate code and maintenance overhead.
I think this is somewhat flipped. "Maintenance overhead" is the main reason to push back against adding new APIs to the web, as they become a permanent burden on what's already an extremely complex system.
If we could show that there are many websites out there doing manual instrumentation of postMessage and they would all benefit, that would strengthen the case for this proposal. (It might also help us shape the API, as we'd be able to determine what commonalities the various clients all have/require.)
> I think this is somewhat flipped. "Maintenance overhead" is the main reason to push back against adding new APIs to the web, as they become a permanent burden on what's already an extremely complex system.

Okay, I’ll remove that part.

> If we could show that there are many websites out there doing manual instrumentation of postMessage and they would all benefit, that would strengthen the case for this proposal. (it might also help us shape the API as we'd be able to determine what commonalities the various clients all have/require)

Yes, we can look for examples of manual postMessage instrumentation both within Microsoft and across public websites.
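
For context, manual instrumentation of this kind could look something like the sketch below. The names `stampMessage` and `measureDelay` and the `{ sentAt, payload }` envelope are hypothetical and not taken from any particular site. The clock is injectable so the logic can be checked without a real worker; by default it adds `performance.timeOrigin` to `performance.now()` so that timestamps from different contexts are roughly comparable.

```javascript
// Default clock: wall-clock-like milliseconds that can be compared across
// contexts, since performance.now() alone is relative to each context's origin.
const defaultClock = () => performance.timeOrigin + performance.now();

// Hypothetical sender-side helper: wrap the payload with the send timestamp.
// The stamp is taken before postMessage runs, so it precedes serialization.
function stampMessage(payload, now = defaultClock) {
  return { sentAt: now(), payload };
}

// Hypothetical receiver-side helper: elapsed time since the message was stamped.
// This is a single number covering serialization, queueing, and any work done
// before the handler calls it; it cannot separate those components.
function measureDelay(envelope, now = defaultClock) {
  return now() - envelope.sentAt;
}
```

A sender would call `target.postMessage(stampMessage(data))` and the receiver would call `measureDelay(event.data)` at the top of its `onmessage` handler. Because the result is one combined figure, it illustrates the limitation listed in the explainer: the individual sources of delay are not visible from JavaScript alone.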
DelayedMessages/explainer.md
Outdated
### Summary of Problems

Existing performance tools can help detect that messages are delayed, but pinpointing the *exact cause* is difficult. The delay could be due to serialization/deserialization, event handling logic, general browser overhead, or time spent in microtasks. Measuring message queue wait time accurately is also challenging. A dedicated API is needed to accurately measure, attribute, and identify sources of `postMessage` delays, simplifying diagnosis and optimization.
What makes message events unique? Do we need to generalize to include other types of delayed tasks?
Message events share common traits but also differ in the following ways:
- Who is exchanging messages? (e.g., between windows, threads, or frames)
- What boundaries are being crossed? (e.g., thread, window, or network)
- How is the data processed?
So, I worked to generalize the API to better support a broader range of message types.
DelayedMessages/explainer.md
Outdated
### Summary of Problems

Existing performance tools can help detect that messages are delayed, but pinpointing the *exact cause* is difficult. The delay could be due to serialization/deserialization, event handling logic, general browser overhead, or time spent in microtasks. Measuring message queue wait time accurately is also challenging. A dedicated API is needed to accurately measure, attribute, and identify sources of `postMessage` delays, simplifying diagnosis and optimization.
nit: "The delay could be due to...time spent in microtasks."
Does this proposal capture all of the types of delays mentioned above, including microtasks?
> nit: "The delay could be due to...time spent in microtasks."

Sure, I'll update the sentence to:
"The delay could be caused by serialization/deserialization, event handling logic, general browser overhead, or time spent in microtasks."

> Does this proposal capture all of the types of delays mentioned above, including microtasks?

Yes
DelayedMessages/explainer.md
Outdated
#### `PerformanceExecutionContextInfo.name`

Returns the name of the execution context. For workers, this is the name provided during instantiation (e.g., `new Worker("worker.js", { name: "MyWorker" })`). For windows or iframes, it might be empty or derived from `window.name`.
nit: worker names are optional so they might be empty too.
Returns the name of the execution context. For workers, this is the name provided during instantiation (e.g., `new Worker("worker.js", { name: "MyWorker" })`). It might be empty, as the name is optional. For windows or iframes, it might be empty or derived from `window.name`.
* `"service-worker"`
* `"shared-worker"`
* `"window"`
* `"iframe"`
nit: just FYI, there are also message ports to consider, which can also be transferred between execution contexts.
You can find the `channel` type from `PerformanceDelayMessageTiming.messageType`, which uses message ports.
DelayedMessages/explainer.md
Outdated
# Problems

When a developer sends a message using `postMessage` to a web worker or an iframe, they expect it to be processed on the target context in a timely manner. However, `postMessage` can experience significant delays, making it difficult to pinpoint the root cause. These delays might result from synchronous JavaScript executions blocking the main thread or worker thread, an excessive number of messages being sent too quickly, or significant time spent processing the data being transferred.
I believe Chromium processes postMessage tasks using the default priority. Other tasks involving user input and rendering can have higher priority since they are user-visible.
Thanks for the information! I'll revise this sentence to:
While developers expect messages sent via `postMessage` to web workers or iframes to be processed promptly, these tasks typically receive default priority in the browser's task scheduler (e.g. Chromium). As a result, postMessage communication can experience noticeable delays due to lower prioritization compared to user-visible tasks, often compounded by synchronous JavaScript blocking the target thread, a flood of messages overwhelming the message queue, or significant time spent processing the data being transferred, making the root cause challenging to pinpoint.
# References
- [Extending Long Tasks API to Web Workers](https://github.com/joone/MSEdgeExplainers/blob/add_id_src_type/LongTasks/explainer.md)
- https://developer.mozilla.org/en-US/docs/Web/API/PerformanceLongTaskTiming
The event timing API, https://w3c.github.io/event-timing/, might be another resource to draw from since postMessage produces MessageEvents. The API is focused on measuring long input events responsible for UI hangs.
Okay, I’ll include the Event Timing API as a reference.
#### `PerformanceDelayMessageTiming.scripts`

Returns an array of `PerformanceScriptTiming` instances. These represent the long tasks that were executing on the receiver's thread between `sentTime` and `processingStart`, thus contributing to `blockedDuration`. This leverages the same mechanism as the [Long Animation Frames API](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceLongAnimationFrameTiming/scripts).
What is the threshold for scripts to appear in the long tasks array?
It's 50ms, which is the same threshold defined by the Long Tasks API.
### worker.js

In worker.js, the duration of deserialization is estimated by calling `performance.now()` immediately before and after the first access to properties of `event.data` (e.g., `event.data.startTime`), as this access typically triggers the deserialization process.
I'm not an expert in the area, tagging @dandclark as he might have more context.
Do you have any documentation supporting this claim?

> as this access typically triggers the deserialization process.

On a quick read of the spec I see that the deserialization is supposed to happen before the message event is fired, but looking at the Chromium code, the property is actually a getter and the deserialization can happen on first access, just as you said here.
If the behavior is inconsistent between browsers or between data types, that might be a good argument for why an API exposing this particular time is needed. I see that below you listed

> Accurately measuring internal browser operations like serialization, deserialization, and precise queue waiting time is not feasible from JavaScript.

as one of the drawbacks of polyfills. I think it would be helpful to add an example of those unfeasible cases.
> On a quick read of the spec I see that the deserialization is supposed to happen before the message event is fired, but looking at the Chromium code, the property is actually a getter and the deserialization can happen on first access, just as you said here.

It looks like an optimization to delay deserialization until the data is actually accessed, which helps avoid unnecessary work when the message content isn’t used.

> If the behavior is inconsistent between browsers or between data types, that might be a good argument for why an API exposing this particular time is needed. I see that below you listed

Got it, I will mention this in the explainer.

> Accurately measuring internal browser operations like serialization, deserialization, and precise queue waiting time is not feasible from JavaScript.
> as one of the drawbacks of polyfills. I think it would be helpful to add an example of those unfeasible cases.

I believe the current examples (Cases 1, 2, and 3) already illustrate this well.
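
For reference, the first-access measurement discussed in this thread can be sketched as follows. The function name `timeFirstDataAccess` is hypothetical, and the clock is injectable only so the logic can be exercised without a real `MessageEvent`. The result is an estimate: it reflects deserialization cost only in engines that defer deserialization until `event.data` is first read, which is the Chromium behavior described above and is not guaranteed elsewhere.

```javascript
// Hypothetical helper: time the first read of event.data inside a message
// handler. Where deserialization is lazy, this read is what pays for it;
// where it is eager, the measured duration will be close to zero.
function timeFirstDataAccess(event, now = () => performance.now()) {
  const start = now();
  const data = event.data; // first access; may trigger deserialization
  const end = now();
  return { data, duration: end - start };
}
```

Inside a worker this would be called once at the top of the handler, for example `const { data, duration } = timeFirstDataAccess(event);`, and `data` reused afterwards so that `event.data` is not timed a second time.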
Returns an array of `PerformanceScriptTiming` instances. These represent the long tasks that were executing on the receiver's thread between `sentTime` and `processingStart`, thus contributing to `blockedDuration`. This leverages the same mechanism as the [Long Animation Frames API](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceLongAnimationFrameTiming/scripts).

## `PerformanceMessageScriptInfo` Interface
I imagine this data will be used to correlate the call-sites with the un-minified code. @issackjohn from your experience with optional stack trace, is this data enough? I remember some discussion around script-hashes, but I don't know if that would be applicable here.
We use an internal tool that maps minified code back to the original source using the sourceURL and sourceCharPosition properties.
2b30298
to
679ffe6
Compare
- Updated problem description to clarify causes of `postMessage` delays and challenges in identifying root causes.
- Expanded explanation of deserialization timing inconsistencies across browsers.
- Refined summary of problems to emphasize the need for a dedicated API for diagnosing `postMessage` delays.
- Improved description of `PerformanceExecutionContextInfo.name` to clarify optionality for workers and windows/iframes.
- Removed "It adds boilerplate code and maintenance overhead" in manual instrumentation section.
- Added missing reference to the Event Timing API in the references section.
- Updated congested example to remove IndexedDB references.
A few housekeeping items:
…gements' sections
@sfortiner I’ve updated the explainer to include:
Thank you!
The Delayed Messages API allows web developers to identify congested browser contexts or workers and provides details on the end-to-end timing of postMessage events, as well as their related blocking tasks.