Add Delayed Messages API Explainer #1029

Open: wants to merge 4 commits into base: main

@joone joone commented May 10, 2025

The Delayed Messages API allows web developers to identify congested browser contexts or workers and provide details on the end-to-end timing of postMessage events, as well as their related blocking tasks.

@joone joone force-pushed the delayed_messages branch 10 times, most recently from 09536f0 to 1a64a5e Compare May 12, 2025 04:40
@joone joone force-pushed the delayed_messages branch from 1a64a5e to 617e3a4 Compare May 12, 2025 08:24
@evanstade evanstade left a comment

Thanks for putting this together!


In this example, the worker log `message queue wait time + etc: 130.40 ms` indicates the time elapsed from when the main thread initiated the `postMessage` (including its ~130.30 ms serialization block) to when the worker’s `onmessage` handler began execution. This suggests that the message queue wait time is nearly zero, and the delay is primarily caused by serialization on the sender side. However, when the event loop is also busy with other long tasks, it becomes difficult to distinguish these individual sources of delay (serialization, actual queueing, deserialization, and task execution) from other task delays without manual instrumentation.

This API proposes to expose these timings (`serialization`, `deserialization`, `blockedDuration`) explicitly, simplifying the diagnosis of such delays.

Out of curiosity, what might a developer do with this information?

Contributor

@SteveBeckerMSFT SteveBeckerMSFT May 15, 2025

+1. In the examples above, the worker is starved for resources, which seems to limit our mitigations to:

  1. Improving the performance of the code responsible for long tasks.
  2. Running more workers to run tasks in parallel.
  3. Implementing a custom task scheduler to prioritize tasks important to the application.

What additional options am I missing?

Would we use this data to file bugs targeting the performance improvements in mitigation 1 above? If so, would our previous proposal to bring the Long Tasks API to web workers be sufficient to identify postMessage bottlenecks?

https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/LongTasks/explainer.md

Contributor Author

Out of curiosity, what might a developer do with this information?

Developers may use this information to identify the specific causes of message delays between the execution contexts.
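
As a rough illustration of that kind of diagnosis, the sketch below picks the largest contributor to a message's delay. The entry shape is an assumption: it borrows the `serialization`, `deserialization`, and `blockedDuration` names from the explainer text quoted above, and `dominantDelaySource` is a hypothetical helper, not part of the proposal.

```javascript
// Hypothetical helper: given timings shaped like the proposed entry,
// report which phase contributed the most time (in milliseconds).
function dominantDelaySource(entry) {
  const phases = {
    serialization: entry.serialization ?? 0,
    deserialization: entry.deserialization ?? 0,
    blockedDuration: entry.blockedDuration ?? 0,
  };
  // Keep the [name, duration] pair with the largest duration.
  const [name] = Object.entries(phases).reduce((max, cur) =>
    cur[1] > max[1] ? cur : max
  );
  return name;
}

// Illustrative numbers based on the example above: ~130.30 ms of
// serialization out of 130.40 ms total. Attributing the remaining ~0.10 ms
// to blockedDuration (and 0 ms to deserialization) is an assumption.
const example = { serialization: 130.3, deserialization: 0, blockedDuration: 0.1 };
console.log(dominantDelaySource(example)); // "serialization"
```

A result of `serialization` would point toward shrinking or restructuring the payload, while `blockedDuration` would point toward long tasks on the receiving thread.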

Contributor Author

What additional options am I missing?

If serialization/de-serialization is the main source of delay, we can use SharedArrayBuffer.

Would we use this data to file bugs targeting the performance improvements in mitigation 1 above? If so, would our previous proposal to bring the Long Tasks API to web workers be sufficient to identify postMessage bottlenecks?

https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/LongTasks/explainer.md

I tried to extend the Long Tasks API to support workers, but in its current form that API does not provide details about long tasks, such as the sourceURL and sourceCharPosition properties. This was one of the reasons the LoAF API was introduced. Instead of extending the Long Tasks API, its owner recommended proposing a new one. This new API is similar to LoAF, but focuses on message delays rather than frame updates.

If serialization/de-serialization is the main source of delay, we can use SharedArrayBuffer.

I meant this in a broader sense than this one source of delay. But based on this explainer it seems like this ([de]serialization) is already easy to accurately measure, and if there's a proposed resolution such as the use of SharedArrayBuffer, has that approach been tried already? To what effect?

Contributor Author

We don’t need to copy data when using SharedArrayBuffer, which reduces overhead when handling large data. But still, it’s only available in secure, cross-origin isolated contexts. I haven’t tried using SharedArrayBuffer yet.

The reason I included serialization/deserialization timing is that in some cases, we don’t know in advance how much data will be sent. Measuring this helps us better understand the cost and performance impact.

If possible, web developers can use SharedArrayBuffer in cross-origin isolated environments. Otherwise, they can reduce the size of the data being transferred or split it into smaller chunks and send it incrementally.
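
A minimal sketch of the chunking option, assuming the payload is an array that can be split by item count. The helper names and the `{ index, total, chunk }` envelope are hypothetical; `target` can be anything that exposes `postMessage`, such as a `Worker` or a `MessagePort`.

```javascript
// Split an array into slices of at most `chunkSize` items.
function splitIntoChunks(items, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Send each chunk as its own message so that no single postMessage call
// has to serialize the whole payload. The envelope fields are hypothetical.
function sendInChunks(target, items, chunkSize) {
  const chunks = splitIntoChunks(items, chunkSize);
  chunks.forEach((chunk, index) => {
    target.postMessage({ index, total: chunks.length, chunk });
  });
  return chunks.length;
}
```

This sketch sends the chunks back to back for brevity. To actually send incrementally, the sender would yield between chunks (for example, one chunk per task) so that other work can run in between.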

// Create a Web Worker
const worker = new Worker("worker.js");

// Open IndexedDB

We would generally not recommend using IndexedDB this way: the API has first-class support for simultaneous connections from multiple threads or even processes, so this is perhaps not the strongest example to lead with.

Contributor Author

This scenario is not specific to IndexedDB; it is a general case where the message queue is congested with numerous tasks. I will update the example to remove any mention of IndexedDB to prevent confusion.


* It's challenging to intercept all messages, especially those from third-party libraries.
* Accurately measuring internal browser operations like serialization, deserialization, and precise queue waiting time is not feasible from JavaScript.
* It adds boilerplate code and maintenance overhead.

I think this is somewhat flipped. "Maintenance overhead" is the main reason to push back against adding new APIs to the web, as they become a permanent burden on what's already an extremely complex system.

If we could show that there are many websites out there doing manual instrumentation of postMessage and they would all benefit, that would strengthen the case for this proposal. (it might also help us shape the API as we'd be able to determine what commonalities the various clients all have/require)

Contributor Author

I think this is somewhat flipped. "Maintenance overhead" is the main reason to push back against adding new APIs to the web, as they become a permanent burden on what's already an extremely complex system.

Okay, I’ll remove that part.

If we could show that there are many websites out there doing manual instrumentation of postMessage and they would all benefit, that would strengthen the case for this proposal. (it might also help us shape the API as we'd be able to determine what commonalities the various clients all have/require)

Yes, we can look for examples of manual postMessage instrumentation both within Microsoft and across public websites.
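
For reference, manual instrumentation of this kind might look like the sketch below. It is an assumption about what sites do, not a survey result: the sender wraps each payload with a send timestamp and the receiver subtracts it on arrival. `stampMessage` and `elapsedSinceSend` are hypothetical names. The timestamp uses `performance.timeOrigin + performance.now()` because `performance.now()` alone is relative to each context's own time origin.

```javascript
// Default clock: an absolute timestamp that is roughly comparable across
// execution contexts (subject to clock adjustments between them).
const absoluteNow = () => performance.timeOrigin + performance.now();

// Sender side: wrap the payload with the time postMessage was called.
function stampMessage(payload, now = absoluteNow) {
  return { sentTime: now(), payload };
}

// Receiver side: time elapsed from the send call to handler start.
function elapsedSinceSend(envelope, now = absoluteNow) {
  return now() - envelope.sentTime;
}

// Intended usage (not executed here):
//   worker.postMessage(stampMessage(data));
//   self.onmessage = (e) => console.log(elapsedSinceSend(e.data));
```

The single number this produces lumps together serialization, queue wait, and any blocking tasks, which is the limitation the bullet list above describes.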


### Summary of Problems

Existing performance tools can help detect that messages are delayed, but pinpointing the *exact cause* is difficult. The delay could be due to serialization/deserialization, event handling logic, general browser overhead, or time spent in microtasks. Measuring message queue wait time accurately is also challenging. A dedicated API is needed to accurately measure, attribute, and identify sources of `postMessage` delays, simplifying diagnosis and optimization.

Contributor

What makes message events unique? Do we need to generalize to include other types of delayed tasks?

Contributor Author

Message events share common traits but also differ in the following ways:

  • Who is exchanging messages? (e.g., between windows, threads, or frames)
  • What boundaries are being crossed? (e.g., thread, window, or network)
  • How is the data processed?

So, I worked to generalize the API to better support a broader range of message types.


### Summary of Problems

Existing performance tools can help detect that messages are delayed, but pinpointing the *exact cause* is difficult. The delay could be due to serialization/deserialization, event handling logic, general browser overhead, or time spent in microtasks. Measuring message queue wait time accurately is also challenging. A dedicated API is needed to accurately measure, attribute, and identify sources of `postMessage` delays, simplifying diagnosis and optimization.

Contributor

nit: "The delay could be due to...time spent in microtasks."

Does this proposal capture all of the types of delays mentioned above, including microtasks?

Contributor Author

@joone joone May 18, 2025

nit: "The delay could be due to...time spent in microtasks."

Sure, I'll update the sentence to:
"The delay could be caused by serialization/deserialization, event handling logic, general browser overhead, or time spent in microtasks."

Does this proposal capture all of the types of delays mentioned above, including microtasks?

Yes


#### `PerformanceExecutionContextInfo.name`

Returns the name of the execution context. For workers, this is the name provided during instantiation (e.g., `new Worker("worker.js", { name: "MyWorker" })`). For windows or iframes, it might be empty or derived from `window.name`.

Contributor

nit: worker names are optional so they might be empty too.

Contributor Author

Returns the name of the execution context. For workers, this is the name provided during instantiation (e.g., new Worker("worker.js", { name: "MyWorker" })). It might be empty, as the name is optional. For windows or iframes, it might be empty or derived from window.name.

* `"service-worker"`
* `"shared-worker"`
* `"window"`
* `"iframe"`

Contributor

nit: just FYI, there are also message ports to consider, which can also be transferred between execution contexts.

Contributor Author

@joone joone May 18, 2025

The channel type is available from PerformanceDelayMessageTiming.messageType, which also covers messages sent through message ports.


# Problems

When a developer sends a message using `postMessage` to a web worker or an iframe, they expect it to be processed on the target context in a timely manner. However, `postMessage` can experience significant delays, making it difficult to pinpoint the root cause. These delays might result from synchronous JavaScript executions blocking the main thread or worker thread, an excessive number of messages being sent too quickly, or significant time spent processing the data being transferred.

Contributor

I believe Chromium processes post message tasks using the default priority. Other tasks involving user input and rendering can have higher priority since they are user visible.

Contributor Author

Thanks for the information! I'll revise this sentence to:

While developers expect messages sent via postMessage to web workers or iframes to be processed promptly, these tasks typically receive default priority in the browser's task scheduler (e.g. Chromium). As a result, postMessage communication can experience noticeable delays due to lower prioritization compared to user-visible tasks, often compounded by synchronous JavaScript blocking the target thread, a flood of messages overwhelming the message queue, or significant time spent processing the data being transferred, making the root cause challenging to pinpoint.


# References
- [Extending Long Tasks API to Web Workers](https://github.com/joone/MSEdgeExplainers/blob/add_id_src_type/LongTasks/explainer.md)
- https://developer.mozilla.org/en-US/docs/Web/API/PerformanceLongTaskTiming

Contributor

@SteveBeckerMSFT SteveBeckerMSFT May 15, 2025

The event timing API, https://w3c.github.io/event-timing/, might be another resource to draw from since postMessage produces MessageEvents. The API is focused on measuring long input events responsible for UI hangs.

Contributor Author

Okay, I’ll include the Event Timing API as a reference.


#### `PerformanceDelayMessageTiming.scripts`

Returns an array of `PerformanceScriptTiming` instances. These represent the long tasks that were executing on the receiver's thread between `sentTime` and `processingStart`, thus contributing to `blockedDuration`. This leverages the same mechanism as the [Long Animation Frames API](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceLongAnimationFrameTiming/scripts).

Contributor

What is the threshold for scripts to appear in the long tasks array?

Contributor Author

It's 50ms, which is the same threshold defined by the Long Tasks API.
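
To make the threshold concrete, the sketch below applies a 50 ms cutoff to a list of script timings, following the answer above. The objects are hypothetical stand-ins with a numeric `duration` in milliseconds, not real `PerformanceScriptTiming` instances, and the strict comparison is an assumption; whether a script of exactly 50 ms qualifies is left to the specification.

```javascript
// Threshold stated in the reply above (same as the Long Tasks API).
const LONG_TASK_THRESHOLD_MS = 50;

// Return only the scripts whose duration exceeds the threshold.
// `scripts` is assumed to be an array of objects with a numeric `duration`.
function longScripts(scripts, thresholdMs = LONG_TASK_THRESHOLD_MS) {
  return scripts.filter((script) => script.duration > thresholdMs);
}

// Hypothetical timings for illustration only.
const sample = [
  { sourceURL: "a.js", duration: 12 },
  { sourceURL: "b.js", duration: 75 },
  { sourceURL: "c.js", duration: 130 },
];
console.log(longScripts(sample).map((s) => s.sourceURL)); // [ 'b.js', 'c.js' ]
```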


### worker.js

In worker.js, the duration of deserialization is estimated by calling `performance.now()` immediately before and after the first access to properties of event.data (e.g., `event.data.startTime`), as this access typically triggers the deserialization process.

I'm not an expert in the area, tagging @dandclark as he might have more context.

Do you have any documentation supporting this claim?

as this access typically triggers the deserialization process.

On a quick read of the spec, I see that deserialization is supposed to happen before the message event is fired, but looking at the Chromium code, the property is actually a getter and the deserialization can happen on first access, just as you said here.

If the behavior is inconsistent between browsers or between data types, that might be a good argument on why an API exposing this particular time is needed. I see that below you listed

Accurately measuring internal browser operations like serialization, deserialization, and precise queue waiting time is not feasible from JavaScript.

as one of the drawbacks of polyfills. I think it would be helpful to add an example of those infeasible cases.

Contributor Author

On a quick read of the spec, I see that deserialization is supposed to happen before the message event is fired, but looking at the Chromium code, the property is actually a getter and the deserialization can happen on first access, just as you said here.

It looks like an optimization to delay deserialization until the data is actually accessed, which helps avoid unnecessary work when the message content isn’t used.

If the behavior is inconsistent between browsers or between data types, that might be a good argument on why an API exposing this particular time is needed. I see that below you listed

Got it, I will mention this in the explainer.

Accurately measuring internal browser operations like serialization, deserialization, and precise queue waiting time is not feasible from JavaScript.

as one of the drawbacks of polyfills. I think it would be helpful to add an example of those infeasible cases.

I believe the current examples (Cases 1, 2, and 3) already illustrate this well.
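
The measurement technique discussed in this thread can be sketched as follows, assuming deserialization is deferred until `event.data` is first read (the Chromium behavior described above; other browsers may differ). `timeFirstDataAccess` is a hypothetical helper, and `makeLazyEvent` is only a stand-in that imitates a lazy getter with `JSON.parse`, not how a browser deserializes messages.

```javascript
// Time the first read of event.data. Where deserialization is lazy,
// this interval approximates the deserialization cost.
function timeFirstDataAccess(event, now = () => performance.now()) {
  const start = now();
  const data = event.data; // first access may trigger deserialization
  const end = now();
  return { data, firstAccessMs: end - start };
}

// Stand-in for a MessageEvent whose `data` getter decodes on first access.
function makeLazyEvent(serialized) {
  let cached;
  let decoded = false;
  return {
    get data() {
      if (!decoded) {
        cached = JSON.parse(serialized);
        decoded = true;
      }
      return cached;
    },
  };
}

const event = makeLazyEvent('{"startTime": 1}');
const { data, firstAccessMs } = timeFirstDataAccess(event);
console.log(data.startTime, firstAccessMs >= 0);
```

If a browser deserializes eagerly before dispatching the event, the same code measures close to zero, which is one reason the result is hard to interpret without a dedicated API.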


Returns an array of `PerformanceScriptTiming` instances. These represent the long tasks that were executing on the receiver's thread between `sentTime` and `processingStart`, thus contributing to `blockedDuration`. This leverages the same mechanism as the [Long Animation Frames API](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceLongAnimationFrameTiming/scripts).

## `PerformanceMessageScriptInfo` Interface

I imagine this data will be used to correlate the call-sites with the un-minified code. @issackjohn from your experience with optional stack trace, is this data enough? I remember some discussion around script-hashes, but I don't know if that would be applicable here.

Contributor Author

We use an internal tool that maps minified code back to the original source using the sourceURL and sourceCharPosition properties.
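
The internal tool is not described here, so the following is only a sketch of one step such a mapping usually needs: source map lookups are typically keyed by line and column, so a character offset like `sourceCharPosition` first has to be converted. `charPositionToLineColumn` is a hypothetical helper, and treating the offset as a 0-based index into the script text is an assumption.

```javascript
// Convert a 0-based character offset into a 1-based line number and a
// 0-based column, the form most source map lookups expect.
function charPositionToLineColumn(sourceText, charPosition) {
  if (charPosition < 0 || charPosition > sourceText.length) {
    throw new RangeError("charPosition is outside the source text");
  }
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < charPosition; i++) {
    if (sourceText[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: charPosition - lineStart };
}

console.log(charPositionToLineColumn("ab\ncd", 4)); // { line: 2, column: 1 }
```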

@joone joone force-pushed the delayed_messages branch 2 times, most recently from 2b30298 to 679ffe6 Compare May 18, 2025 20:38
- Updated problem description to clarify causes of `postMessage` delays and challenges in identifying root causes.
- Expanded explanation of deserialization timing inconsistencies across browsers.
- Refined summary of problems to emphasize the need for a dedicated API for diagnosing `postMessage` delays.
- Improved description of `PerformanceExecutionContextInfo.name` to clarify optionality for workers and windows/iframes.
- Removed "It adds boilerplate code and maintenance overhead" in manual instrumentation section.
- Added missing reference to the Event Timing API in the references section.
- Updated the congested example to remove IndexedDB references.
@joone joone force-pushed the delayed_messages branch from 679ffe6 to 0ee2031 Compare May 19, 2025 00:49
@sfortiner
Member

A few housekeeping items:

  • Include an edit to the README.md file to include this in the list of explainers
  • Please add a section at the top about how to participate in the discussion related to this explainer, appropriate venue, etc. Look through some of the other explainers for examples of doing this. If the appropriate discussion forum is MSEdgeExplainers, consider adding an issue template at https://github.com/MicrosoftEdge/MSEdgeExplainers/tree/main/.github/ISSUE_TEMPLATE
  • Consider adding a TOC (table of contents) given the length of the explainer
  • Are we aware of other web developers discussing the issue that this proposal is aiming to solve? Other browser vendors weighing into this space? If so, providing some links to those discussions in a User Research or Stakeholder Feedback/Opposition section is helpful for letting others quickly catch up to current thinking.

@joone
Contributor Author

joone commented Jun 7, 2025

@sfortiner I’ve updated the explainer to include:

  • A table of contents
  • A section at the top on how to participate in the discussion
  • A “Related Discussion, Articles, and Browser Issues” section

Thank you!
