Integration with web APIs #82

Open
andreubotella opened this issue Apr 16, 2024 · 13 comments

Comments

@andreubotella
Member

andreubotella commented Apr 16, 2024

I've been looking at how AsyncContext would integrate with various APIs and events in the web platform. These are my conclusions:

Web APIs

These can be grouped in various categories. I've listed APIs shipping in at least two browser engines, with a few single-engine additions that I thought relevant.

Schedulers

For these APIs, their sole purpose is to take a callback and schedule it in the event loop in some way. The callback will run at most once (except for setInterval) after the function returns, when there is no other JS code running on the stack.

Since the default context for these callbacks would be the empty context, they should be called with the context in which the API was called:

  • setTimeout
  • setInterval
  • queueMicrotask
  • requestAnimationFrame
  • HTMLVideoElement's requestVideoFrameCallback method
  • requestIdleCallback
  • scheduler.postTask
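
As a rough illustration of that rule, here is a minimal sketch built on a toy model rather than the real proposal API. ToyVariable, captureContext and toySetTimeout are hypothetical stand-ins, and the task queue is drained by hand so the example stays synchronous:

```javascript
// Toy model of AsyncContext: `current` holds the active variable-to-value mapping.
// All names here are hypothetical stand-ins, not the proposal's API.
let current = new Map();

class ToyVariable {
  run(value, fn) {
    const previous = current;
    current = new Map(previous).set(this, value);
    try {
      return fn();
    } finally {
      current = previous; // always restore the caller's context
    }
  }
  get() {
    return current.get(this);
  }
}

// Capture the active context and return a function that runs callbacks inside it.
function captureContext() {
  const saved = current;
  return (fn, ...args) => {
    const previous = current;
    current = saved;
    try {
      return fn(...args);
    } finally {
      current = previous;
    }
  };
}

// Toy scheduler: remembers the context that was active when it was called.
const taskQueue = [];
function toySetTimeout(callback) {
  const runInCallerContext = captureContext();
  taskQueue.push(() => runInCallerContext(callback));
}

const requestId = new ToyVariable();
const seen = [];

requestId.run("req-1", () => {
  toySetTimeout(() => seen.push(requestId.get()));
});
requestId.run("req-2", () => {
  toySetTimeout(() => seen.push(requestId.get()));
});

// Drain the queue from the empty context, as an event loop would.
while (taskQueue.length > 0) taskQueue.shift()();

console.log(seen); // each callback sees the value from its own scheduling call
```

Without the capture in toySetTimeout, both callbacks would run in the empty context and read undefined.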

Completion callbacks

These APIs take callbacks that are called once, to indicate the completion of an asynchronous operation that they start. These act just like the callbacks passed to .then/.catch in promises, and therefore they should behave the same, being called with the context in which the web API was called:

  • <canvas>'s toBlob method
  • DataTransferItem's getAsString method (drag & drop)
  • Notification.requestPermission
  • BaseAudioContext's decodeAudioData method (web audio API)
  • RTCPeerConnection's createOffer, setLocalDescription, createAnswer, setRemoteDescription and addIceCandidate methods (WebRTC)
  • Geolocation's getCurrentPosition method
  • The legacy filesystem entries API:
    • FileSystemEntry's getParent method
    • FileSystemDirectoryEntry's getFile and getDirectory methods
    • FileSystemFileEntry's file method
    • FileSystemDirectoryReader's readEntries method

Callbacks run as part of an async algorithm

The following APIs call the callback to run user code at most once as part of an asynchronous operation that they start. Therefore, they should probably use the context in which the web API was called:

  • Document's startViewTransition method (CSS view transitions)
  • LockManager's request method (web locks)

Note that these APIs are conceptually similar to this JS code, where the callback would be called with the same context as the call to api:

async function api(callback) {
  const state = await setup();
  await callback(state);
  await finish(state);
}

(See also streams in "others" below.)

Observers

Observers are web API classes that take a callback in their constructor and provide an observe() method to register things to observe. Whenever something relevant happens to the observed things, the callback will be called asynchronously to indicate that those observations have taken place.

You could think of both the construction of an observer, and the time its observe() method is called, as being contexts that could be relevant to a call resulting from that observation. However, for all web API observers, the updates are batched so that the callback is called with an array of observation records, which can go back to multiple calls to observe().

Unlike the case of multiple async originating contexts in events, here it doesn't make sense to go with the latest originating call to observe(), since that context only matters for one of the observations. Therefore, the only context that seems to make sense is the one active when the observer class is constructed.

These are the web APIs which follow this pattern:

  • MutationObserver
  • ResizeObserver
  • IntersectionObserver
  • PerformanceObserver
  • ReportingObserver
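
A minimal sketch of the construction-time rule, again with a toy model: ToyVariable and ToyObserver are hypothetical, and deliver() stands in for the browser flushing a batch of records.

```javascript
// Toy model of AsyncContext; hypothetical names, not the proposal's API.
let current = new Map();

class ToyVariable {
  run(value, fn) {
    const previous = current;
    current = new Map(previous).set(this, value);
    try {
      return fn();
    } finally {
      current = previous;
    }
  }
  get() {
    return current.get(this);
  }
}

class ToyObserver {
  constructor(callback) {
    this.callback = callback;
    this.constructionContext = current; // captured once, at construction
    this.pending = [];
  }
  observe(target) {
    // Simulates the browser later noticing a change to `target`. The context
    // active during observe() is deliberately not recorded.
    this.pending.push({ target });
  }
  deliver() {
    // One batched call covers records that came from several observe() calls.
    const records = this.pending.splice(0);
    const previous = current;
    current = this.constructionContext;
    try {
      this.callback(records);
    } finally {
      current = previous;
    }
  }
}

const label = new ToyVariable();
let result;

const observer = label.run("constructed", () =>
  new ToyObserver((records) => {
    result = { context: label.get(), targets: records.map((r) => r.target) };
  })
);
label.run("first observe", () => observer.observe("a"));
label.run("second observe", () => observer.observe("b"));
observer.deliver();

console.log(result); // the batch is delivered in the construction-time context
```

Because a single callback invocation covers both records, neither observe() context could apply to the whole batch, which is why only the construction-time context is captured here.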

There is also an observer-like class defined in TC39, namely FinalizationRegistry. It differs from web APIs in that the callback is called once for each registered object that has been GC'd, rather than with a list of observations. This would make it possible to call the callback with the context at which register() (the equivalent of observe()) was called, but we chose to have it behave the same as web API observers instead. See #69 for the relevant discussion.

Action registrations

These APIs register a callback or constructor to be invoked whenever some action runs:

  • MediaSession's setActionHandler method
  • Geolocation's watchPosition method
  • RemotePlayback's watchAvailability method
  • Worklets' registration functions (AudioWorkletGlobalScope's registerProcessor, PaintWorkletGlobalScope's registerPaint)

In all of the above cases, the action will be invoked browser-internally, so there is no JS on the stack or any other context that could be used other than that active when the API is called. (In the terms we define below for events, they have no originating context.)

Additionally, custom element classes can be registered with customElements.define, and their constructor and reaction methods are also invoked whenever some action runs. However, these can be called with JS code on the stack (when the element is changed via web APIs), or without it (when it's changed through a user interaction, such as editing). (In event terms, they could optionally have a sync call-time context.)

Others

  • The DOM spec's NodeIterator and TreeWalker. These APIs are created from a document method (document.createNodeIterator() and document.createTreeWalker()), which takes either a function or an object with an acceptNode() method to act as a filter. When the various methods of these classes are called, the filter is invoked to check whether a node should be returned or skipped. The possibilities are to use the creation-time context, or the call-time context of the methods of those classes. This is essentially the same as it.filter(cb) in the iterator helpers proposal, and the behavior should be consistent with whatever we decide there (see Interaction with iterator helpers #75).

  • Streams ({Readable,Writable,Transform}Stream). Constructors for these classes take an object on which methods will be called, and they also take an options bag (two for TransformStream) with an optional size callback field. While the start method of the first argument will only ever be called synchronously in the class's construction (so it's not a concern wrt AsyncContext), the other methods and callbacks are trickier.

    These methods are run as part of an algorithm, but they differ from the APIs listed above in that they're registered by a different API than they're used by (they're registered by the constructor and used by e.g. reader.read()). Furthermore, they can be run with JS code on the stack (e.g. reader.read()), or without it (e.g. when passing a ReadableStream as a request body to fetch()). Although piping could make things more complicated, and more research is needed, the possibilities for these methods seem to be registration-time context and originating context, as with events.

Events

The web platform has many events. Just to give you an idea, according to BCD (which provides browser compatibility data for MDN and caniuse.com), there are 263 different event names which are supported in at least two browser engines. Furthermore, the same event name can have different meanings in different APIs (e.g. the error event on window indicates uncaught script errors, but the error event on XMLHttpRequest indicates a fetch failure), so that's certainly an underestimate of the amount of work needed to figure out all of them. I have not yet analyzed them all, but these are my findings from analyzing a subset.

Background

Every time an event callback is invoked, there is at least one relevant context: the one active at the time the event listener or handler was registered (i.e. when addEventListener was called, or e.g. onclick was set). We will be calling this the registration-time context.

Although most of the time events are fired asynchronously, there are some times when calling an API will cause an event to be fired synchronously. Some examples are calling the abort() method on XMLHttpRequest, which will fire an abort event; or changing the value of location.hash to a non-empty value, which will cause a synchronous navigation which in turn fires the popstate event on window. The context when this synchronous API is called is the sync call-time context, and it is the default that would be used if we don't change the web specs. [1]

In all other cases, the event will fire when there's no other JS code on the stack. Sometimes this is because the event was triggered by the browser (i.e. after a user interaction), in which case the only possible context that matters is the registration-time one. But other times there are APIs that asynchronously cause the event to fire, such as XMLHttpRequest's send() method causing the load event. The contexts in which such APIs are called are async originating contexts.

It seems like whenever there is a sync call-time context in the web platform, there are no async originating contexts that matter. After all, the immediate cause of the event is the synchronous web API call. [2] However, the same type of event could sometimes be fired with a call-time context, sometimes with an async originating context, and sometimes with no originating contexts. An example is the click event, which is usually browser-triggered, but el.click() will fire it synchronously.

Sometimes there are multiple async contexts that could originate an event dispatch. For example, if you have a <video> element, and in quick succession the user hits play, and then some JS code runs video.load() and video.play(), which would be the async originating context (if any) for the load event? But you could define some criterion to sort them: for example, always use the most important of such contexts (which would need to be judged independently for each API), and if there are multiple, the latest.

If we define some such criterion, then for every event there would be at most one originating context, which would be the sync call-time context if there is one, and otherwise the "winning" async originating context. Only if the event was browser-triggered would there be no originating context. And with that, the decision for every event would be a binary one between the registration-time context and the originating context.

The choice of context

This decision is not yet a settled question for every event, but there are two events in particular which we know have specific needs:

  • The unhandledrejection event is an asynchronous event, and its async originating context is that active when the HostPromiseRejectTracker host hook is called. Bloomberg's use case for this proposal needs this originating context to be accessible from the unhandledrejection event listener, although it does not necessarily need to be the active context when the listener is called. For their needs it would be sufficient to expose that context as an AsyncContext.Snapshot property in the event object, for example. (This was previously discussed in Specifying unhandledrejection behavior #16.)
  • The message event on window and MessagePort is an asynchronous event, whose async originating context is that active when window.postMessage() or messagePort.postMessage() was called to enqueue the relevant message. Although this event is meant for communication, it is often used in the wild as a way to schedule tasks, since it has scheduling properties that other scheduling APIs didn't have before scheduler.postTask(). As such, the event listener should be called with the originating context by default (at least when the message is being sent to the same window), so it behaves like other scheduling APIs.

For other events, there are various possibilities. However, it seems clear that if there is an originating context, advanced users of AsyncContext need to be able to access it, and using the registration-time context would make this impossible. [3] At the same time, it seems like the context that non-advanced users would expect is the registration-time one.

The choice between registration-time and originating context actually reflects different use cases for AsyncContext (something that Yoav Weiss discusses in some detail in this blog post, applied to the needs of task attribution). So the best course of action is probably to let listeners opt into the originating context, if one exists. This could be done by having an AsyncContext.Snapshot as a property of the event object, or by using AsyncContext.callingContext() (#77).
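
To make the first option concrete, here is a minimal sketch using a toy model. ToyVariable, captureContext, ToyTarget and the originSnapshot property name are all hypothetical, and a real design would expose an AsyncContext.Snapshot instead of a plain function:

```javascript
// Toy model of AsyncContext; hypothetical names, not the proposal's API.
let current = new Map();

class ToyVariable {
  run(value, fn) {
    const previous = current;
    current = new Map(previous).set(this, value);
    try {
      return fn();
    } finally {
      current = previous;
    }
  }
  get() {
    return current.get(this);
  }
}

// Capture the active context and return a function that runs callbacks inside it.
function captureContext() {
  const saved = current;
  return (fn, ...args) => {
    const previous = current;
    current = saved;
    try {
      return fn(...args);
    } finally {
      current = previous;
    }
  };
}

class ToyTarget {
  constructor() {
    this.listeners = [];
  }
  addListener(listener) {
    // Registration-time context: captured when the listener is added.
    this.listeners.push({ listener, runAtRegistration: captureContext() });
  }
  // Stands in for an API such as send(): the event fires later, from `queue`.
  queueEvent(queue) {
    const originSnapshot = captureContext(); // originating context
    queue.push(() => {
      for (const { listener, runAtRegistration } of this.listeners) {
        runAtRegistration(listener, { originSnapshot });
      }
    });
  }
}

const who = new ToyVariable();
const target = new ToyTarget();
const queue = [];
const log = [];

who.run("registrar", () => {
  target.addListener((event) => {
    log.push(who.get()); // registration-time context by default
    event.originSnapshot(() => log.push(who.get())); // opt into the originating context
  });
});
who.run("sender", () => target.queueEvent(queue));

while (queue.length > 0) queue.shift()();
console.log(log);
```

The listener runs in the registration-time context, and only the code it explicitly passes to originSnapshot sees the originating one.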


cc @annevk @domenic @smaug---- @yoavweiss @shaseley

Footnotes

  1. All event types can be called with a sync call-time context, through the dispatchEvent() method. We will be ignoring this, though, since we're focusing on events fired by web platform APIs.

  2. One example of an API where this wouldn't be the case would be something that enqueues tasks or events, and runs them synchronously at some later point when some API is called. As far as I know, there's nothing like this built into the web platform.

  3. The inverse is not true: if events use the originating context, an event listener could make sure the callback is called with the registration-time context by wrapping it with AsyncContext.Snapshot.wrap.

@andreubotella
Member Author

After some discussion in today's meeting, it was pointed out that when you need the originating context for an event, you usually want to choose which context to use when registering the event. One way of doing this would be to add an option to the addEventListener options bag to opt into using the originating context.

If such an option is passed, what should happen when there is no originating context? Note that many events can be fired with or without an originating context, depending on the specific cause of the event dispatch. Would it make sense to use the empty context? If not, would it make sense to use the registration-time context even though the option was meant to opt out of it?

@Qard

Qard commented Apr 17, 2024

There is always a valid originating context, it's just not necessarily directly synchronously calling the continuation. For example, in all those XMLHttpRequest cases the originating context would be when the send() method was called. Some async stuff happens in the background, but logically a line can be drawn through all that internal behaviour back to having been triggered by the send() call. This is yet another reason why I strongly believe context should flow through the path through internals and only bind around some points at the end of an edge and not the beginning, so everywhere, including internals should always have a valid context.

Async context functions by modelling segments of sync execution on cpu rearranged by task scheduling to resume the state as it was at the point the task was scheduled. This means modelling the behaviour on cpu exactly and not just what it looks like from the dynamic language perspective. I have a bunch of writing I'm working on making shareable soon which goes into details like this of how to do complete context management.

@andreubotella
Member Author

andreubotella commented Apr 17, 2024

> There is always a valid originating context, it's just not necessarily directly synchronously calling the continuation.

That is not the case. In the browser, a user-initiated click event does not have an originating context (other than the empty one), simply because there's no JS code causing the event, synchronously or asynchronously. The same goes for events coming from outside the thread or process, such as worker message events, or signal events in Node.js.

@littledan
Member

I think we should avoid ever using the empty context. It feels like a malformed leak to do so; it would mean that we can't use AsyncContext for things where we need to be able to inherit from somewhere and get a reliable value. I'd prefer falling back to the registration-time context if the originating context is selected (e.g., in an addEventListener option) but then isn't available at runtime.
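
A minimal sketch of that fallback, on a toy model. ToyVariable, captureContext and ToyTarget are hypothetical, and the option name useOriginatingContext is only a placeholder:

```javascript
// Toy model of AsyncContext; hypothetical names, not the proposal's API.
let current = new Map();

class ToyVariable {
  run(value, fn) {
    const previous = current;
    current = new Map(previous).set(this, value);
    try {
      return fn();
    } finally {
      current = previous;
    }
  }
  get() {
    return current.get(this);
  }
}

// Capture the active context and return a function that runs callbacks inside it.
function captureContext() {
  const saved = current;
  return (fn, ...args) => {
    const previous = current;
    current = saved;
    try {
      return fn(...args);
    } finally {
      current = previous;
    }
  };
}

class ToyTarget {
  constructor() {
    this.listeners = [];
  }
  addListener(listener, { useOriginatingContext = false } = {}) {
    this.listeners.push({
      listener,
      useOriginatingContext,
      registration: captureContext(),
    });
  }
  // `origin` is a captured context, or null for a browser-triggered event.
  dispatch(origin) {
    for (const entry of this.listeners) {
      // Fall back to registration time instead of the empty context.
      const runner =
        entry.useOriginatingContext && origin ? origin : entry.registration;
      runner(entry.listener);
    }
  }
}

const who = new ToyVariable();
const button = new ToyTarget();
const log = [];

who.run("registrar", () => {
  button.addListener(() => log.push(who.get()), { useOriginatingContext: true });
});

// Script-triggered: an originating context exists.
const origin = who.run("script", () => captureContext());
button.dispatch(origin);

// User-triggered: no originating context, so registration time is used.
button.dispatch(null);

console.log(log);
```

With this rule the listener never observes the empty context unless it was registered in it.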

@Flarna

Flarna commented Apr 17, 2024

Which context is used at the beginning of the main script? The/An empty context or something else?
If it is an empty context it would be also propagated into any listener registered at that time.
So I think the empty context (in OTel they name it ROOT_CONTEXT) is a thing in general and not something wrong.

@andreubotella
Member Author

andreubotella commented Apr 17, 2024

> Which context is used at the beginning of the main script? The/An empty context or something else? If it is an empty context it would be also propagated into any listener registered at that time. So I think the empty context (in OTel they name it ROOT_CONTEXT) is a thing in general and not something wrong.

When starting an agent (a V8 isolate, a Node.js process...) from scratch, the context there would be the empty context indeed.

But this question is actually related to another topic of discussion related to the web integration that I also wanted to bring up (but forgot to mention in the OP): should cross-document navigations have an associated context?

This would affect the originating context of various events fired during page load, as well as the running context of scripts found during parsing, and the registration-time context for event handler attributes (e.g. <button onclick="something()">).

Cross-origin navigations of any kind can't possibly keep any context from before the navigation, since the AsyncContext state is agent-specific. Same-origin navigations caused by the user (e.g. by clicking a link or navigating via the URL bar) also wouldn't have an originating context, and so would need to have the empty context. But what about same-origin navigations caused by setting location.href? (These could be observed by sending AsyncContext.Variable objects cross-window.) What about setting iframe.src?

@Qard

Qard commented Apr 18, 2024

> That is not the case. In the browser, a user-initiated click event does not have an originating context (other than the empty one), simply because there's no JS code causing the event, synchronously or asynchronously. The same goes for events coming from outside the thread or process, such as worker message events, or signal events in Node.js.

The originating context of a click would be the setting up of a system to track clicks, which would be the page loading in the first place and therefore should be the top-level context.

Events coming from elsewhere originate from whatever initiated the connection to those elsewhere things.

Nothing ever happens without code at some level having been defined to initiate that a thing can happen. That system might be very far away from where the final result is, but every bit of logic can eventually be traced back to the process starting. This is what context is modelling.

Within a browser you would surely want to sandbox process-level context down to page-level context, but that's still a subset that flows out of other systems with connections to external systems set up at page load, so that boundary becomes the "start" of the context management window.

The point I'm trying to make is that context is an expression of application behaviour all the way back to the root, whatever that root may be expressed as, and all things derive from points in that tree. If something would be initiated from "outside" that execution tree then it still has to have some source of when that system became connected which is likely just the root of that context. Having no contextual parent is not valid as nothing can execute without something having started that execution.

@andreubotella
Member Author

> The originating context of a click would be the setting up of a system to track clicks, which would be the page loading in the first place and therefore should be the top-level context

Fair enough. But in the web specs, all of this happens implicitly, and this is not something web developers have any insight into. Furthermore, as @littledan points out above, in cases where the only originating context would be the one in which basic tracking systems are set up at page load, that context is of no use to developers (other than maybe in distinguishing it from other contexts that they have set up).

> Events coming from elsewhere originate from whatever initiated the connection to those elsewhere things.

This raises an interesting question. Windows, workers and MessagePort objects all share the postMessage() API and the message event. However, for windows, this is always same-agent, whereas for workers it's always cross-agent. It seems like what you're proposing would have window message events having the call to postMessage() as their originating context, whereas worker message events would have the worker creation as their originating context.

And this gets really interesting with MessagePort, because it will start same-agent, but it can be transferred across same-origin agents. So if it's transferred to a worker, should the originating context start being the context at which it was transferred off-agent? And then it can be transferred back to the same agent. Also, what if the MessagePorts at both sides of a channel get both transferred to the same agent?

@Qard

Qard commented Apr 18, 2024

> Furthermore, as @littledan points out above, in cases where the only originating context would be the one in which basic tracking systems are set up at page load, that context is of no use to developers

I have some writing which I will be sharing soon which covers this issue and others. Need some time to make it publicly consumable as it's part of internal Datadog docs at the moment.

It describes a useful way of thinking about it, which I will summarize here. Asynchrony is rearranging sync segments of execution based on readiness. There is generally a scheduler managing segments at the barrier between what is actually running in cpu vs what is running in the "runtime" environment. However, there are also often further layers doing their own scheduling of sorts, like connection pools or task queues. Sometimes these things are useful to express but not always. Mechanically though, execution flows through these arrangement systems and so context must also flow through them to ensure context is always present. In many cases there are universally acceptable reductions of that graph, like carving the path around a connection pooling mechanism to represent user intent rather than runtime machinery. This binding of points does not mean that the path through internals has no context, however. It just means that when execution reaches a task that should be bound to a different context, the context it flowed through by call path will no longer be used in that task.

All systems have an internal path to follow and it should be followed because that will be the valid propagation case in every case except the exceptions which bind around patterns like connection pools. So internal path is the default and user intent path is a reduction applied in particular cases where it makes universal sense.

There's also a way to restore internal path as-needed, if you do flow through the internal path and just bind around it, but I'll leave that for when I share the other docs. Basically though we should be considering execution flow at the cpu level as the default expression of how context flows, only carving out specific exceptions where register path is more relevant to the user.

@domenic
Member

domenic commented Apr 19, 2024

> However, for windows, this is always same-agent

This is not correct; cross-origin postMessage() between windows works fine.

@rjgotten

rjgotten commented May 8, 2024

I've got one more example of how this could mesh with a Web API that was passed over:
There's a spec being worked on for signals (i.e. computed and derived observable state) which for now has explicitly left open the matter of async derived signal value computation as to-be-decided.

Having async context available would mean backing store and tracking of dependent observables could flow via async context and signals could track dependencies easily in async functions. Transparent to the API consumer that creates a derived signal.

See also the discussion for async signals:
tc39/proposal-signals#30

@andreubotella
Member Author

andreubotella commented May 8, 2024

> I've got one more example of how this could mesh with a Web API that was passed over: There's a spec being worked on for signals (i.e. computed and derived observable state) which for now has explicitly left open the matter of async derived signal value computation as to-be-decided.
>
> Having async context available would mean backing store and tracking of dependent observables could flow via async context and signals could track dependencies easily in async functions. Transparent to the API consumer that creates a derived signal.
>
> See also the discussion for async signals: tc39/proposal-signals#30

In this issue I meant "web API" meaning "an API defined by the web platform specs", not something defined in TC39. You're right that the integration with signals has to be considered (and @shaylew and @littledan have already started taking a look at it), but since signals is at a lower stage than AsyncContext, that integration could be considered at some later point when signals is farther along, whereas the integration with well-established web APIs needs to be figured out before the AsyncContext proposal advances to stage 2.7.

@andreubotella
Member Author

Since I opened this issue, I noticed a couple of cases that I missed in my analysis:

  • As a result of the conversation in The case for _not_ binding around awaits #83, I started thinking about the resolution-time context of promises returned by web APIs, and I realized that even in the current spec text, the rejection-time context of such promises is observable through the unhandledrejection event. In almost every case, the fulfillment and rejection contexts would be the context in which the web API that created them is called, but cases involving async callbacks need more investigation.

  • Similarly to unhandledrejection, the error event is fired on the window or worker global object when some script execution threw an uncaught exception. The originating context for this event should also be the context in which the script was run. Unlike with the unhandledrejection case, though, the context will be that in which the script started executing, not that in which the exception was thrown:

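    // A context variable whose value follows the async flow.
    const asyncVar = new AsyncContext.Variable();

    const cb = () => {
      // "bar" is only active while this inner callback is on the stack.
      asyncVar.run("bar", () => {
        throw new Error();
      });
    };

    // The timer callback runs in the context that called setTimeout ("foo").
    asyncVar.run("foo", () => {
      setTimeout(cb, 0);
    });

    window.addEventListener(
      "error",
      () => console.log(asyncVar.get()),  // foo
      { useOriginatingContext: true }  // (the exact API is pending bikeshedding)
    );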

    This is because .run() will always restore the previous context after it finishes running, even if the callback threw an exception. Therefore once the exception has been thrown through every JS stack frame, which is the only time that the web specs can get to it, it will not keep the context in which it was originally thrown.

  • One interesting case for the error event is FinalizationRegistry callbacks, because that event is fired if there was some uncaught exception, but the HTML spec text that fires this event doesn't have access to even the context in which the callback is run (let alone the throw context). This is because the TC39 CleanupFinalizationRegistry algorithm switches into and out of the context, without involving the embedder that calls it. We should probably change this to make this context available to the caller somehow.
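
The claim above that .run() restores the previous context even when the callback throws can be checked with a small toy model. ToyVariable is a hypothetical stand-in for AsyncContext.Variable, and the catch block plays the role of the web specs receiving the exception after it has unwound the stack:

```javascript
// Toy model of AsyncContext; hypothetical names, not the proposal's API.
let current = new Map();

class ToyVariable {
  run(value, fn) {
    const previous = current;
    current = new Map(previous).set(this, value);
    try {
      return fn();
    } finally {
      current = previous; // restored on normal return and on throw alike
    }
  }
  get() {
    return current.get(this);
  }
}

const asyncVar = new ToyVariable();
let contextSeenByHandler;

asyncVar.run("foo", () => {
  try {
    asyncVar.run("bar", () => {
      throw new Error("boom");
    });
  } catch (error) {
    // By the time the exception has left the inner run(), "bar" is gone.
    contextSeenByHandler = asyncVar.get();
  }
});

console.log(contextSeenByHandler); // "foo"
```

Exposing the throw-time context instead would require capturing it at the throw site, before any run() frame unwinds.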
