Interaction with iterator helpers #75

Open · andreubotella opened this issue Mar 5, 2024 · 13 comments

@andreubotella
Member

andreubotella commented Mar 5, 2024

In a previous meeting we agreed that, although built-in iterators should not propagate the creation context by default, iterator helpers should probably propagate it:

const var1 = new AsyncContext.Variable();

const iter = Iterator.from([42]);
const callback = (v) => {
  return var1.get();
};
const mapped = var1.run("foo", () => {
  return iter.map(callback);
});

console.log(mapped.next());  // {value: "foo", done: false}

This needs modifications to the (sync and async) iterator helpers spec text.
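For concreteness, here is a rough userland approximation of that behavior (my sketch, not proposed spec text), using the proposal's AsyncContext.Snapshot.wrap to tie a callback to the context that was active when the helper chain was built:

// Hypothetical helper, not part of the proposal: wrap the mapper at creation
// time so later .next() calls run it inside the creation-time snapshot.
function mapInCreationContext(iter, mapper) {
  return iter.map(AsyncContext.Snapshot.wrap(mapper));
}

const iter2 = Iterator.from([42]);
const mapped2 = var1.run("foo", () => mapInCreationContext(iter2, callback));
console.log(mapped2.next());  // {value: "foo", done: false}, whatever context the caller is in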

@Qard

Qard commented Mar 6, 2024

Why is specific propagation for these methods needed? Would the callbacks not just run in whatever context the methods are called in, since it's synchronous?

@jridgewell
Member

jridgewell commented Mar 6, 2024

I kinda go back and forth on this. We decided that user generators should capture their init-time context for all iteration calls – I argued that we shouldn't need to, because the init context is likely to be maintained by the calling code anyways. But since we did, this seems like a normal extension.

@Qard

Qard commented Mar 7, 2024

Iterators aren't async on their own though, they're just an abstraction over something else that is async. Even an async iterator is just an abstraction over a bunch of promises, so it's the promise itself that needs to maintain the context, not the iterator.

It has been my experience with AsyncLocalStorage that it has actually been a good thing that iterators didn't manipulate the context: in most cases, just adopting whatever context next was called in was the correct behavior, and in any exceptions to that it was easy to use AsyncResource.bind(...) to correct it. I think that's generally the better approach.

Context management should model where the asynchrony actually is, and leave it to the user to work around abstractions which are not themselves inherently async, since the correct way to express those can be ambiguous. We should also consider what we learned with promises: taking one path often allows achieving the other through bind(...), but the reverse is often not true. The safest path to carve for context flow always seems to be the innermost one, closest to the actual asynchrony, since shorter edges can be drawn around those parts of the graph to exclude them, whereas once a path is excluded from the graph it's very difficult to add it back.
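For reference, the AsyncResource.bind(...) pattern mentioned above looks roughly like this in today's Node.js (a minimal sketch; the iterator wrapper and store names are mine):

const { AsyncLocalStorage, AsyncResource } = require('node:async_hooks');

const als = new AsyncLocalStorage();

function* gen() {
  yield als.getStore();
}

const iter = als.run('creation', () => {
  const g = gen();
  // Opt in explicitly: bind next() to the creation-time context.
  return { next: AsyncResource.bind(g.next.bind(g)) };
});

als.run('call-site', () => {
  // Without the bind this would log 'call-site'; with it, 'creation'.
  console.log(iter.next().value);
});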

@jridgewell
Member

@Qard Do you have an example where using the calling time context was the correct case?

@littledan
Member

IMO iterator helpers should propagate the context to the functions which are called inside of them, but not anywhere else, and not to any place analogous to what iterators have.

@shicks
Contributor

shicks commented Mar 19, 2024

This seems to be clarifying a question brought up but not clearly resolved in #18 (comment).

Given the following,

const var1 = new AsyncContext.Variable({defaultValue: 0});
function* gen() {
  console.log(`gen1: ${var1.get()}`);
  yield;
  console.log(`gen2: ${var1.get()}`);
  yield;  // a second yield so the map() callback below actually runs
}
const iter = gen();
var1.run(1, () => iter.next());
const mapped = var1.run(2, () => iter.map(() => console.log(`map: ${var1.get()}`)));
var1.run(3, () => mapped.next());

My understanding of #18 is that the decision was made to spec this as logging gen1: 0, gen2: 0, but the question of whether it should be map: 0, map: 2, or map: 3 was not clearly settled.

@andreubotella indicated in that issue (and repeated again in the OP here) that this would log map: 2 because map() would snapshot the context at the time it was called. This seems (to my mind) to be consistent with gen1: 0, gen2: 0 snapshotting the context at the time the generator is initially invoked.

I understand @Qard to be advocating for gen1: 1, gen2: 3, map: 3 by way of not doing any snapshotting here because nothing is actually "async". I slightly disagree with that characterization because I think the relevant distinction (at least logically, I can't speak to spec internals) is more about whether a callback cb passed to a function fn(cb) is run "synchronously" before fn returns a value back to its caller (which may be a promise), or "asynchronously" triggered by some later thing after that initial return. In the case of both iterator helper callbacks and all the code in the body of a (non-async) generator, it's all run at a later time, so (in the spirit of Zalgo) I would think about it as falling more in the async bucket, even though that later time happens to be in the same microtask.

That said, @Qard brought up an interesting point that it's a lot easier to support both behaviors if the built-in behavior is to treat everything as "sync" and not capture the snapshot (and @jridgewell posted an example in the other issues of a wrapper to add the snapshotting). In fact, it's not at all clear to me how one could possibly go about accessing the outer context from the caller of next() if the language itself is forcibly restoring the snapshot at each continuation.

I think either gen1: 0, gen2: 0, map: 2 or else gen1: 1, gen2: 3, map: 3 makes sense and is self-consistent; I would be surprised by any other combination. Most of my sample use cases don't really make sense with generators, but (FWIW) one use case (propagating an AbortSignal) seems moderately relevant and would benefit from retaining the initial snapshot. But I'm also interested in what use cases would prefer the other treatment. Either way, if the outcome of #18 stands and the init-time context is preserved throughout the generator, then I'd like to see iterator helpers also preserve the context from the point at which the helper is called; and vice versa, if the basic untransformed generator is changed to no longer capture its context, then neither should the helpers.
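For example (a hypothetical sketch; readChunks and source are names I made up): a library could store the current request's AbortSignal in a Variable, and a generator created during that request would only keep seeing that signal across later next() calls if the init-time snapshot is restored.

const currentSignal = new AsyncContext.Variable();

function* readChunks(source) {
  while (true) {
    // With init-time capture this sees the AbortSignal from when readChunks()
    // was created; with call-time semantics it sees whoever calls next().
    if (currentSignal.get()?.aborted) return;
    const chunk = source.read();
    if (chunk === null) return;
    yield chunk;
  }
}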

@jridgewell
Member

jridgewell commented Mar 19, 2024

Your understanding is correct. From our call today, I believe we're in agreement that gen1: 0, gen2: 0, map: 2 is an acceptable answer (both user generators and iterator helpers capture and restore init-time context). But @Qard is suggesting that gen1: 1, gen2: 3, map: 3 is also acceptable (and in some cases even the correct choice, but we still need to define that case) because user generators shouldn't capture a context and by extension neither should iterator helpers. I don't think anyone else is arguing for a different result, so it'll be one of those 2.

I agree with @Qard. Generators' yields are not themselves async; it depends on who's calling them. If the generator is sync, I fully expect that the calling code will handle the yields within its own sync context (so there's no need to capture; we'll execute within the same context anyways). And if the generator is async, then I expect the calling code will handle that asynchronicity by using await/.then() (and so the context will be automatically propagated anyways). E.g.:

const v = new AsyncContext.Variable();
function* foo() {
  console.log(`gen: ${v.get()}`);
  yield 1;
  console.log(`gen: ${v.get()}`);
  yield 2;
  console.log(`gen: ${v.get()}`);
}

async function* bar() {
  // this will yield a promise each time `foo` yields/returns a value.
  yield* foo();
}

v.run(1, () => {
  // this will log 3 `gen: 1` messages
  Array.from(foo());

  // So will this.
  for (const i of foo()) {}

  // So will this.
  const it = foo();
  const v1 = it.next();
  const v2 = it.next();
  const v3 = it.next();
});

v.run(2, async () => {
  // this will log 3 `gen: 2` messages
  Array.fromAsync(bar());

  // So will this.
  for await (const i of bar()) {}

  // So will this (parallel calls will queue).
  const it1 = bar();
  const p1 = it1.next();
  const p2 = it1.next();
  const p3 = it1.next();

  // So will this (serial waiting won't queue).
  const it2 = bar();
  const r1 = await it2.next();
  const r2 = await it2.next();
  const r3 = await it2.next();
});

There are cases where letting the iterator escape the context will lead to a different result, as you demonstrate. But why treat this any different than a promise that escapes its context?

const it = v.run(3, () => foo())
// I think this should log `gen: 4`
v.run(4, () => it.next());

const p = v.run(3, () => Promise.resolve());
v.run(4, () => {
  // logs `p: 4`
  p.then(() => console.log(`p: ${v.get()}`));
});

@Qard

Qard commented Mar 19, 2024

The particular example I was thinking of was co, which basically did async/await before async/await existed. I'm not referring to that library specifically though but rather the pattern it presents.

It used a yield point to represent a task to be completed at some point, and the return back into the generator came from that task being resolved. In the case of co that task was a promise, but it is still a common use of generators for a yield to correspond to a value being fed back into the generator later, with the yielded value being some sort of task expression.
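For readers unfamiliar with the pattern, a minimal co-style driver looks roughly like this (sketched from memory, not the co library's actual source, and without error forwarding into the generator):

function run(genFn) {
  const gen = genFn();
  return new Promise((resolve, reject) => {
    function step(value) {
      const result = gen.next(value);
      if (result.done) return resolve(result.value);
      // Each yielded value is a "task" (a promise, for co); resume the
      // generator with its result once it settles.
      Promise.resolve(result.value).then(step, reject);
    }
    step(undefined);
  });
}

// Usage: run(function* () { const data = yield somePromise; return data; });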

Now, from an APM perspective, what if we want to instrument such a generator and represent each task as a span? Okay, we can wrap the code between each external gen.next(...) call, but then if we try to attribute new spans created within the generator to the tasks which came before them, we can't, because the yield-point binding has drawn an edge around the execution we wanted to connect to. There's no way to restore the inner path, whereas there is a way to draw that new inner line after the fact.

Let's look at a code example:

function* workQueue(tasks) {
  const results = []
  for (const task of tasks) {
    results.push(task)
  }

  // Sum the results wrapped in a span to express its timing
  tracer.getCurrentSpan().createChildSpan('aggregate results', () => {
    return results.reduce((m, v) => m + v, 0)
  })
}

function processQueue(tasks) {
  const gen = workQueue(tasks)
  let result = gen.next()
  while (!result.done) {
    // Do some work wrapped in a span to express its timing
    tracer.getCurrentSpan().createChildSpan('task', (span) => {
      gen.next(result.value * 2)
    })
  }
  return result.value
}

processQueue([ 1, 2, 3 ])

In the example above, what we want is for each step to be a child of the previous one, and for the final return to connect back to the tasks which led to the aggregation task it represents. This is impossible if generators are forced to retain the context between yield and return. I expect a common problem in how we think about context management will be that the seemingly most natural approach is to model it as a direct reflection of execution at the language level, when what is often needed is for it to reflect the execution model at the hardware level. We want context to flow in exactly the arrangement the execution takes on-CPU.

Situations like this, though, are where context flow is subjective and different users will have different opinions. That's also why I had previously suggested an instance-scoped bind to allow different store owners to apply their differing opinions, and why, in the diagnostics_channel module in Node.js, I've implemented a system of binding individual stores to channel events with channel.bindStore(store, transform), channel.unbindStore(store) and channel.runStores(event, scope). Incidentally, I've also been meaning to prepare a proposal for diagnostics_channel at the language level to integrate with AsyncContext, so I figure it's relevant to think about how that could impact the use of AsyncContext here.
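For context, the store binding mentioned here looks roughly like this in Node.js today (a small sketch; the channel name and transform are made up):

const dc = require('node:diagnostics_channel');
const { AsyncLocalStorage } = require('node:async_hooks');

const store = new AsyncLocalStorage();
const channel = dc.channel('my-app:request');

// Each store owner decides how a published event maps into its own context.
channel.bindStore(store, (event) => ({ requestId: event.id }));

channel.runStores({ id: 123 }, () => {
  console.log(store.getStore()); // { requestId: 123 }
});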

The point I want people to keep in mind, though, is what was already expressed: it's easy to reduce a graph by drawing new edges within it, but it's very difficult to expand a graph after branches have already been orphaned. I'm not convinced we can actually have a universally acceptable graph shape, which is why I feel it's important that we provide the tools to reshape our own representations of the graph to fit our uses, and that the default path avoids locking users out of the expressions of the graph which they actually need.

@shicks
Contributor

shicks commented Mar 19, 2024

@jridgewell

v.run(2, () => {
  // this will log 3 `gen: 1` messages
  Array.fromAsync(bar());

Did you mean it will log "gen: 2"? I don't see where a "1" would come from here.

There are cases where letting the iterator escape the context will lead to a different result, as you demonstrate. But why treat this any different than a promise that escapes its context?

I think the difference is that in the case of a generator, the code lives in the generator's definition, and so there's maybe some surprise that the context is changing out from under it. I could ask the same thing about the continuations - there's an understanding that in /* context 1 */ await 0; /* context 2 */, the await is giving up control flow, but it will come back at some point and when it does, context 2 will always be the same as context 1. In the same way, /* context 1 */ yield; /* context 2*/ is giving up control flow, and when it comes back it would be somewhat surprising to me if the context has suddenly changed.

Honestly, I don't have a strong opinion either way. The transpilation/polyfill is actually easier if we don't have any special handling for generators. My feeling is that capturing the init snapshot would be less surprising, but I can also understand the flexibility argument winning out.


@Qard

The particular example I was thinking of was co, which basically did async/await before async/await existed. I'm not referring to that library specifically though but rather the pattern it presents.

Thanks for that example - that's helpful.

Now from an APM perspective, what if we want to instrument such form of generator and represent each task as a span? Okay, we can wrap the code between each external gen.next(...) call, but then if we try to attribute new spans created within the generator to tasks which came before it we can't do that because the yield point binding has drawn an edge around the execution we wanted to connect to. There's no way to restore the inner path whereas there is a way to draw that new inner line after the fact.

I'm also looking at using AsyncContext for tracing, though I'm not sure I was imaginative enough to come up with a valid use case for tracing generators at all.

The point I want people to keep in mind though is just what was already expressed: it's easy to reduce a graph by drawing new edges within it, but it's very difficult to expand a graph after branches have already been orphaned. I'm not convinced we actually can have a universally acceptable graph shape, which is why I feel it's important we provide the tools to effectively mutate our own representations of the graph to fit our uses and for the default path it follows to avoid locking users out from the expressions of the graph which they actually need.

I think that's a really good point to keep in mind, and I agree that if we go with the init snapshot, that it would be very hard to use this for the other use case.


Who is arguing strongly for snapshotting? I recall seeing that @jridgewell had pointed out a weird example with Promise.all and async generators that I didn't quite follow, but it seemed like snapshotting resolved some ambiguity there.

@jridgewell
Member

Let's look at a code example:

I do not understand this code example. It seems we're missing a yield and a result = gen.next()? But even if I imagine those in the example, I'm still not sure what the result should be.


Did you mean it will log "gen: 2"? I don't see where a "1" would come from here.

Yes I meant "gen: 2", and I fixed my example.

I could ask the same thing about the continuations - there's an understanding that in /* context 1 */ await 0; /* context 2 */, the await is giving up control flow, but it will come back at some point and when it does, context 2 will always be the same as context 1. In the same way, /* context 1 */ yield; /* context 2*/ is giving up control flow, and when it comes back it would be somewhat surprising to me if the context has suddenly changed.

I think I see generators and async functions as being different. The context before and after an await point is the same because the await automatically resumes the async function immediately, without any input from the calling code. But generators aren't like that: a yield completely pauses until the calling code resumes it. It's the caller that decides the context that execution resumes from.

In my mind, generators aren't just a single function unit, but multiple units combined with prettier syntax. E.g., the transpilation of a generator is a reentrant switch statement that skips to a code location, with each resumption being a separate function call entirely. Each of these calls is its own call-context, inheriting from the parent's.
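A rough sketch of what I mean (simplified by hand, not actual Babel/regenerator output): the generator body becomes a reentrant switch over saved state, and each next() is an ordinary function call made in whatever context the caller has.

// Hand-written lowering of: function* counter() { yield 1; yield 2; }
function counter() {
  let state = 0;
  return {
    next() {
      switch (state) {
        case 0:
          state = 1;
          return { value: 1, done: false }; // the first `yield 1`
        case 1:
          state = 2;
          return { value: 2, done: false }; // the second `yield 2`
        default:
          return { value: undefined, done: true };
      }
    },
  };
}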

The transpilation/polyfill is actually easier if we don't have any special handling for generators

That, and node's AsyncLocalStorage doesn't do anything special with generators either, so we've introduced a divergence that people need to be aware of when switching.

Who is arguing strongly for snapshotting?

@littledan and I think @bakkot from an old TC39 plenary.

I recall seeing that @jridgewell had pointed out a weird example with Promise.all and async generators that I didn't quite follow, but it seemed like snapshotting resolved some ambiguity there.

That was #18 (comment). Async generators prevent parallel execution, queueing the 2nd+ calls behind the promise of the current call.

// assumes a `sleep` helper, e.g. const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
async function* foo(n) {
  for (let i = 0; i < n; i++) {
    await sleep(10);
    yield i;
  }
}

const it = foo(2);

// Notice that I don't await, there are 2 parallel calls to the iterator.
// The second cannot enter the function's body until the first hits the yield.
// So, the async-gen internally queued the call, and the p2 call will (sync) resume
// as soon as we resolve p1 (there's no tick between p1 resolving and p2 resuming,
// it's not actually a promise.then() chain internally).
v.run(5, () => {
  const p1 = it.next();
  const p2 = it.next();
  return Promise.all([p1, p2]);
});

Because of the internal promise queuing, p2's call context will be p1's call context. But, p1's context is just the parent context, which I think is the expected result anyways. The snapshot behavior addresses this because p1's context is forced to be the init-time context, which means p2's context is always the init-time one.

This can be easily resolved with a modification of the AsyncGeneratorEnqueue and AsyncGeneratorDrainQueue AOs.

@shicks
Contributor

shicks commented Mar 20, 2024

This thread is getting a little off-topic - I moved the discussion of basic generator mechanics back to #18 and suggested we re-open that issue. I suggest that the resolution here be dependent on how #18 resolves:

  • if we retain the existing proposal where (async) generators capture/restore the init snapshot across each yield, then it makes sense for iterator helpers to analogously capture snapshots at the call-site
  • if we flip the decision on generators, then this becomes a more interesting question, and I think this one could go either way - it might still be reasonable to capture the context, but I could also see an argument for not doing so

@littledan
Member

I think our current direction is that generators will save and restore the AsyncContext Snapshot across yield, just as they do across await, but this doesn't automatically extend to built-in generators, which includes iterator helpers. There's a significant observable difference: in particular, whether the .next() calls themselves are done within this restored snapshot.

For iterator helpers, my opinion is that we should do an AsyncContext.Snapshot.wrap around any callbacks which are passed into iterator helpers, but not save and restore snapshots around that (so, the inner .next() would run in the same snapshot as the outer .next()).

@shicks
Contributor

shicks commented Apr 10, 2024

I'm not sure I understand the distinction about the callback vs the inner and outer .next(). Can you give a concrete example that shows the specific user-written function that would be running in one or the other context (i.e. where/how this would make a difference)?

As I understand it, nearly every iterator is already going to restore some or other context when it runs next() - if the iterator came from an iterator helper, then it would be the snapshotted context from when the helper was called. If it came from a generator, it would be the init-time context snapshotted when the generator function first ran. The only exceptions (where it might matter what context the inner .next() runs in) would be either

const arr = [null];
Object.defineProperty(arr, 0, {get() { return v.get(); }});
const iter1 = arr[Symbol.iterator]();

const iter2 = Iterator.from({
  [Symbol.iterator]() { return this; },
  next() { return {done: false, value: v.get()}; },
});

IIUC you're saying that the following should pass with iter1 and iter2 (assuming Iterator.from doesn't do any snapshotting):

function check(iter) {
  const i = v.run(3, () => iter.map(x => 2 * x));
  v.run(5, () => {
    assertEquals(10, i.next().value);
  });
}

But that check(function*() { yield v.get(); }()) would fail because the generator would clobber the 5 by restoring the init context. Is that understanding correct?

@jridgewell There was discussion (elsewhere) of a callingContext. I think it would still work here, i.e. that any additional wrapped iterators' contexts shouldn't get in the way of using it to let a generator see its calling context: if one wrote a wrapWithCallingContext function that wraps an iterator to run its inner next in callingContext, then check(wrapWithCallingContext(function*() { yield v.get(); }())) would pass. But I'm not positive, so it's probably worth thinking through to be sure.
