This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Decoupling from spec object graph for nulling out [[Target]] #105

Closed · syg opened this issue May 14, 2019 · 60 comments

Comments

@syg (Collaborator) commented May 14, 2019

Dan has expressed a desire to rephrase the language in the spec around when it's allowed to null out the [[Target]] slot of WeakRefs. In particular, to not tie it to the reachability of the spec object graph.

One off-the-cuff formulation might be a counterfactual:

For a value v in the [[Target]] slot, at any particular point in evaluation, if there does not exist a program that does not use WeakRef or FinalizationGroup APIs on any not-derefed-this-turn WeakRefs that would evaluate to a value w such that SameValue(v, w) is true, then the [[Target]] slot may be replaced with the value empty.
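
A minimal sketch of the kind of program this wording targets (illustrative only; it reads the counterfactual as quantifying over continuations of the current execution):

let ref;
{
  const v = {};              // v occupies the [[Target]] slot of ref
  ref = new WeakRef(v);
}
// Once this turn ends, no continuation of the program can evaluate to a value w with
// SameValue(v, w) being true without calling ref.deref() (a WeakRef API), so under the
// counterfactual wording the [[Target]] slot of ref may be replaced with empty.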

There are trade-offs here. Some of them that I can think of:

Pros:

  • Give engines maximum flexibility. A strong edge in the spec might in fact be unobservable to the mutator and need not prevent finalization.

Cons:

  • It's unclear what spec authors should expect of objects held in internal slots that happen to be unobservable, or worse, variably observable depending on host. This might get us in a weird place: the spec can manipulate things that are also allowed to be finalized. Like, I guess that's technically okay, but it sure feels weird.
  • Counterfactuals in general are very hard to reason about, and kind of (IMO) go against the "explicitly define all the behavior" ethos of web specs.

My current opinion is that loose language here is ultimately not desirable. Thoughts, @littledan and @erights?

@erights (Contributor) commented May 14, 2019

From https://web.archive.org/web/20141214093124/http://wiki.ecmascript.org/doku.php?id=strawman:gc_semantics

The safety constraint is that the garbage collector MUST NOT collect any storage that then becomes needed to continue correct execution of the program. Should weak references be added, then garbage collection decisions become observable. Based on a suggestion from Cameron McCormack [@heycam], we state the safety constraint as follows: So long as operational semantics of the remainder of the program execution includes the possibility that a reference to an object X may be dereferenced, then X MUST NOT be collected. Thus, the garbage collector is allowed to ignore any references that remain present in the semantic state, but which it can ascertain will never be dereferenced in any possible execution. This sets an upper bound on what state MAY be collected. Put another way, if the garbage collector ever reports that X has been collected, such as by nullifying a weak reference to X, if operational semantics of remaining execution requires the traversal of a strong (non-weak) reference to X, then the previous report demonstrates a safety violation.
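
An illustrative reading of that constraint (a sketch, not part of the quoted page):

let wr;
function setup() {
  const x = { payload: 42 };
  wr = new WeakRef(x);
  return () => x.payload;    // the closure retains a strong reference to x
}
const read = setup();
// The remaining execution may still call read(), which must traverse the strong
// reference to x. If wr.deref() ever reported x as collected before that point,
// the report would demonstrate exactly the safety violation described above.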

That said, if we can agree on a stronger and more deterministic spec, that still allows the optimizations that engines already do and cannot be talked out of, then great.

As with concurrency memory models, implementations will only be willing to pay a minor performance cost at most to forego optimizations that would make a mess of the spec. But the spec cannot demand any more than that or the spec becomes an empty fiction. We need to find a creative compromise of the cleanest, most predictable spec that implementors will actually be willing to implement.

When I wrote the text above, apparently in 2011, the specific optimization I knew I needed to allow is dead variable elimination. I believe that is still the case.
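
A sketch of the dead-variable-elimination case (registerSomewhere and doOtherWork are hypothetical helpers):

function f() {
  const big = { data: new Array(1e6).fill(0) };   // stands in for a large allocation
  registerSomewhere(new WeakRef(big));            // hypothetical: only the WeakRef escapes
  // `big` is never read below this point. After dead variable elimination the binding
  // no longer keeps the object alive, so an engine may collect it here even though the
  // spec-level environment record still references it.
  doOtherWork();                                  // hypothetical
}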

@syg (Collaborator, Author) commented May 14, 2019

Agree that we should push for as strong and deterministic a spec as we can so long as people are willing to implement.

Do I take that to mean a preference for continuing to use spec object graph reachability as a proxy for garbage-collectability in the spec?

@erights (Contributor) commented May 14, 2019

Do I take that to mean a preference for continuing to use spec object graph reachability as a proxy for garbage-collectability in the spec?

I would be surprised if people were willing to implement that, for example, because of the desire to eliminate dead variables. What is the minimal additional wiggle room needed for the dead variable elimination that implementations want the freedom to do? Other than dead variable elimination, are there any other implementation optimizations which would become observable violations of a simple reachability spec?

@syg (Collaborator, Author) commented May 14, 2019

Ah, I see. I had created this dichotomy of spec graph reachability vs implementation observability in my head -- of course the naive spec graph reachability is too restrictive, as you point out. But the larger point is that the spectrum has more points on it. We might be able to stick with reachability with the right clever concessions.

@syg (Collaborator, Author) commented May 14, 2019

I have come around and I am convinced that reachability definitely doesn't work for the optimization reasons and more. I am quite daunted at the prospect of defining a notion of observability, however. Let me outline a particular scenario I'm worried about.

Suppose we define observability as heycam recommended by quantification over all possible future executions of the script. Also suppose the spec keeps some object o in an internal list [[ForSpecMachineryOnly]] for its own spec machinery such that that list is always unreachable and unobservable from script. At time t, because script can never dereference o, it is collected and a finalizer is observably run. Should the spec, at time t+1, be able to then get o out of [[ForSpecMachineryOnly]] and do something with it that doesn't affect execution, like asserting that it has certain properties? This feels odd to me and I am not sure how to think about it yet.

Edit: I realize asserting that it has certain properties might trigger proxy traps. Let's say instead that it has certain internal fields.

@erights (Contributor) commented May 15, 2019

Also suppose the spec keeps some object o in an internal list [[ForSpecMachineryOnly]] for its own spec machinery such that that list is always unreachable and unobservable from script.

We are the ones writing the spec. We shouldn't do that. If we do, we should consider the object to be reachable. To do otherwise would be an extraordinary act of hair splitting that would only be justified by an extraordinary reason.

@littledan (Member) commented:

I was thinking about things like dead variable elimination and closure edge cases when this first occurred to me, but perhaps even more worrying is the world of library and embedder specifications, which would suddenly have to think about a whole new level of meaning to their text, as @domenic raised in #18 . Between these, I am skeptical that anyone will (or should) actually implement the current precise language.

@littledan (Member) commented:

The wording in #105 (comment) looks pretty great to me, except I don't understand how "be dereferenced" should be defined.

@syg (Collaborator, Author) commented May 15, 2019

If we do, we should consider the object to be reachable. To do otherwise would be an extraordinary act of hair splitting that would only be justified by an extraordinary reason.

I'm not so sure. This also ties in to Dan's question of how "be dereferenced" should be defined. Let's consider a real example, FunctionDeclarationInstantiation steps 26 and 28.f.i.4. Step 26 initializes parameters, and 28.f.i.4 copies the initial parameter value to a same-named var binding.

For the following snippet of code:

function g(unused) {
  var unused;
  // do stuff but doesn't use unused
}
function f() {
  var o = {};
  gRef = new WeakRef(o);
  // reasonable to empty gRef here
  g(o);
}

Seems reasonable to me to inline g, at which point the compiler easily sees both unused bindings are unused, and optimizes them away. It's also reasonable to allow the implementation to empty out gRef before g is called.

If so, going back to FunctionDeclarationInstantiation, the question I'm grappling with is: if gRef is emptied during the evaluation (i.e. operational semantics), what do steps 26 and 28.f.i.4 now mean when handling an object that is morally observed to be garbage collected via a weak ref? Without weak refs, we could've just said "well, it's not observable anyways". Indeed we could still say that, but now it feels weirder.

In the context of Dan's question of how "dereferenced" should be defined, ISTM it has to be tied to some notion of observability, which is daunting.

Does that make sense?

Edit: typos

Edit 2: This particular example doesn't work because newly created WeakRefs are always kept for one turn, but you get the idea.
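
For reference, a sketch of the one-turn guarantee mentioned in Edit 2 (the proposal's KeepDuringJob behavior):

const o = {};
const ref = new WeakRef(o);
// Creating (or dereferencing) a WeakRef keeps its target alive until the end of the
// current job, so within this turn the deref cannot come back empty:
console.log(ref.deref() === o);   // always true in the creating turn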

@syg (Collaborator, Author) commented May 15, 2019

Regarding #105 (comment), I had an enlightening conversation with Jim Blandy and Jason Orendorff from the SpiderMonkey team. A clean way to address my concern is simply to keep the spec assumption that all objects live forever. From a spec's POV, WeakRefs prescribes when an object is allowed to become "irrelevant", while allowing the specs to continue treating objects as living forever. In implementations, of course, that "irrelevance" prescription is expected to be an upper bound for when something is garbage collectable.

This assumption will end up being viral for all specs that interact with JS. I think it's reasonable, and that HTML at the very least seems like it'd be fine with this.

Here's a write up from Jim: https://gist.github.com/jimblandy/0014dc11233d2d40df922af850b0489a

@erights (Contributor) commented May 15, 2019

Considering all objects to be immortal still allows us to classify each object as "collected" or "retained". Only irrelevant objects can be classified as collected. Only weakrefs to "collected" objects can be nulled.

How does "relevant" relate to the language I quote at #105 (comment) :

So long as operational semantics of the remainder of the program execution includes the possibility that a reference to an object X may be dereferenced [...]

I would say that if the operational semantics of the remaining execution includes such a possibility, then the object is "relevant". Otherwise it is "irrelevant". Of course, we cannot tell where the boundary is precisely. But any object "known to be irrelevant" can be reclassified as "collected".

@syg (Collaborator, Author) commented May 15, 2019

Considering all objects to be immortal still allows us to classify each object as "collected" or "retained". Only irrelevant objects can be classified as collected. Only weakrefs to "collected" objects can be nulled.

Agree.

I would say that if the operational semantics of the remaining execution includes such a possibility, then the object is "relevant".

That's the art of it, yeah. In my example from #105 (comment), do you think that counts as the operational semantics dereferencing o? It really shouldn't be because it's optimizable, but if we take "operational semantics" to mean exactly the steps prescribed by ecma262, then it is a dereference. In an academic setting I've usually had to first define an observability filter on the reduction relation.

@erights (Contributor) commented May 15, 2019

That's the art of it, yeah. In my example from #105 (comment), do you think that counts as the operational semantics dereferencing o?

We need to separate two senses of "dereference" that I had not thought to distinguish.

  • The one I was not thinking of: the line g(o); dereferences the variable name o into a reference to the empty object created by the {} above. It also passes this reference.
  • The g(o) passes the reference, but it treats what is being referenced in a completely opaque manner. Likewise, so does the g function. However, if g did an unused.foo, even though that empty object has no foo property, it would have dereferenced the reference to operate on the object it references.

I think the second notion of dereference above allows the optimizations we cannot prohibit, but is more specifiable, predictable, and intuitive than observability.
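
A small sketch of the two senses distinguished above (names are illustrative):

function passesOnly(ref) {
  return [ref];        // first sense: the reference flows through; the object stays opaque
}
function operatesOn(ref) {
  return ref.foo;      // second sense: the property access observes the object's state
}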

erights closed this as completed May 15, 2019
@erights (Contributor) commented May 15, 2019

closed by mistake

erights reopened this May 15, 2019
@littledan (Member) commented:

I think it's really good to talk all this through. I just wanted to ask, when does this need to be decided by? Is it OK to iterate on these details during Stage 3?

@erights (Contributor) commented May 15, 2019

I think it's really good to talk all this through. I just wanted to ask, when does this need to be decided by? Is it OK to iterate on these details during Stage 3?

That is an important process question for which I do not know the answer. Perhaps the best precedent would be the design and debugging of the memory model for SABs.

That said, I hope this does not need to be precisely settled in detail before stage 3. I hope agreement on the general approach to take should be enough. But maybe not.

@syg (Collaborator, Author) commented May 16, 2019

I agree the exact language should not be a stage 3 blocker, but consensus on the spirit of the language should be obtained from the implementers before stage 3.

@littledan (Member) commented:

@erights I'm trying to understand #105 (comment) . What do you think we could write as specification text to capture this? How does it differ from the "observability" direction discussed above in, well, observability?

Do we agree, in this thread, that references from specification data structures don't necessarily keep things alive for the purposes of WeakRefs and FinalizationGroups? If so, maybe that would be enough to nail down the "spirit" for Stage 3 and respond to #31. I can't tell, though, whether @erights would agree with this.

@erights (Contributor) commented May 16, 2019

Before I answer, I realize I should first ask: What do either of you (@syg or @littledan) mean by "observability"?

What I am looking for is a criterion that would set a lower bound, i.e., one that would let us reason to the conclusion that a particular object at a particular time in the specified (as opposed to implemented) execution sequence cannot be observably collected, i.e., that observing a weakref to that object at that time cannot report that the object is collected. In the previous examples, can we conclude that

function g1(unused) {
  // do stuff but doesn't use unused
}
function f1() {
  const o = {};
  const gRef = new WeakRef(o);
  // reasonable to empty gRef here
  g1(o);
}

but

function g2(used) {
  // a clever enough compiler might have been able to optimize this out:
  used.foo;
}
function f2() {
  const o = {};
  const gRef = new WeakRef(o);
  // UNREASONABLE to empty gRef here
  g2(o);
}

?

(Code above revised to avoid distracting "var"s.)

@littledan (Member) commented:

I don't have a formal definition to answer with. When I used the word "observable", I was thinking about something which would change the "result" of the program, or how it interacts with the rest of the world. But, in particular, I was thinking about omitting WeakRef and FinalizationGroup operations from the definition--those are allowed to change, and that's the point. It's just that, if nothing except for those operations changes in their semantics, it'd be fine to collect the object.

I guess your example program is a little incomplete, since the WeakRef constructor includes KeepDuringJob. But if a turn has passed, I'd say, if the compiler can prove that there will be no side effect, then yes, it's OK if the WeakRef becomes empty.

@erights (Contributor) commented May 16, 2019

Oops. Yes, I should have inserted a turn boundary:

function g1(unused) {
  // do stuff but doesn't use unused
}
async function f1() {
  const o = {};
  const gRef = new WeakRef(o);
  await 1;
  // reasonable to empty gRef here
  g1(o);
}

vs

function g2(used) {
  // a clever enough compiler might have been able to optimize this out:
  used.foo;
}
async function f2() {
  const o = {};
  const gRef = new WeakRef(o);
  await 1;
  // UNREASONABLE to empty gRef here
  g2(o);
}

I'd say, if the compiler can prove that there will be no side effect, then yes, it's OK if the WeakRef becomes empty.

So here's the trickiness. Depending on the contents of the object o aka used, the line used.foo; might very well have had side effects; for example, if .foo were an accessor. So is the lack of an effect an observation that o did not have an accessor named .foo?

If your notion of observability still allows collection immediately after the await 1; above, could you give a similar example where we can clearly conclude that collection immediately after the await 1; would not be allowed? Could you explain your reasoning?

@littledan (Member) commented:

So here's the trickiness. Depending on the contents of the object o aka used, the line used.foo; might very well have had side effects; for example, if .foo were an accessor. So is the lack of an effect an observation that o did not have an accessor named .foo?

Yes, of course, it should not be collected in that case. I was assuming that the compiler could trace that it was {} and also that Object.prototype didn't have anything here. It'd be a very smart compiler.

@syg (Collaborator, Author) commented May 16, 2019

I don't have a particular notion of observability in mind, but was suggesting an alternative framework instead of "dereferenced". If we proceed with a "good" dereference and a "bad" dereference as laid out above, I feel like the task of classifying every access in the spec is not tractable.

Practically I want this notion of observability to be, in spirit, the same as the spirit we used to guide the memory model: ideally, the set of optimizations that apply to programs without weak refs should also be applicable to those with weak refs.

We need to first agree on what the unit of observability is. An evaluation step (i.e. a reduction step in the operational semantics) itself is too restrictive. Since JS (thanks to your vision @erights!) has such a small surface in the core spec, and depends on a host to do effectful things, I think we can work backwards from the JS-host boundary. Every time we call out to the host, the objects that the host can reach must be assumed to be observable (with some kind of escape hatch to allow the host to do optimizations as well). An initial formulation may be:

Let an execution be 0+ evaluation steps. The space of all executions may be thought of as a step and the set of all possible tails, which are themselves executions.

Let a class of steps be called observed steps. For every observed step, let S_pre be the reachable objects before the step and S_post be the reachable objects after the step.

Base case: a step that calls the host is observed.
(Co?)inductive case: A step is observed iff its removal from the execution changes the S_pre and S_post of any observed step in any future execution.

A WeakRef may be emptied out at any step if doing so does not change the S_pre and S_post of any future observed step in any execution.

Edit: Note that the above is very permissive, much more permissive than real world compilers because doing that kind of analysis is definitely intractable. I guess I'm saying it seems easier to me to start with the super permissive thing and work backwards.
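
A sketch of how the base case might classify steps, assuming a Worker-style postMessage host API:

async function f() {
  const o = { note: "escapes to the host" };
  const local = { note: "never escapes" };
  await 0;                // leave the creating turn
  postMessage(o);         // base case: a step that calls out to the host, hence observed;
                          // o is in the reachable set of this observed step
  local;                  // this evaluation never feeds any observed step, so removing it
                          // (and collecting `local`) leaves every observed step unchanged
}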

@erights (Contributor) commented May 21, 2019

Practically I want this notion of observability to be, in spirit, the same as the spirit we used to guide the memory model: ideally, the set of optimizations that apply to programs without weak refs should also be applicable to those with weak refs.

Were this the only ideal, we would have adopted a much more permissive memory model, such as other languages (Java, C++) and architectures have.

For both, there are opposing ideals, for which we seek a compromise that accommodates a best-of-both of these ideals. For both, the opposing ideal is enabling more reliable static reasoning about correctness --- both formal and informal reasoning. @waldemarhorwat's concerns about quantum-like indeterminate junk (I forget the actual terminology) led us to a stronger memory model than any other comparable language, forcing us to avoid certain optimizations.

Here, the comparable issue is reasoning from a program to guarantees about what will not be observably collected. For some X which, by the specified operational state, is reachable from roots, sometimes we wish to allow optimizations which might cause X to be observably collected. However, we need to be clear about when X is reachable from roots in ways that optimizations cannot disrupt, i.e., that guarantee that X cannot be observably collected. I return to the example:

function g2(used) {
  // a clever enough compiler might have been able to optimize this out:
  used.foo;
}
async function f2() {
  const o = {};
  const gRef = new WeakRef(o);
  await 1;
  // UNREASONABLE to empty gRef here
  g2(o);
}

Because the meaning of used.foo depends on the state of used, the execution of this expression should be considered an observation. The above program must be guaranteed not to observe gRef being empty at the commented execution point.

I do not subscribe to driving the relevant observation criteria from external host effects, though I see the elegance of doing so. Were we to allow this, then a subsystem that had been correct when connected directly to the outside world might become incorrect when connected only to an internal mock of that outside world, because the difference in internal weakref behavior causes something else that is externally observable.

@syg (Collaborator, Author) commented May 21, 2019

concerns about quantum-like indeterminate junk (I forget the actual terminology)

We called it "quantum garbage". I know the actual optimization itself as "rematerialization", viz. rematerializing an array read.

For both, there are opposing ideals, for which we seek a compromise that accommodates a best-of-both of these ideals.

Very much so! Just like the memory model had to be strong enough for reasoning and weak enough for CPU/compiler reality, so the weak ref spec must also be strong enough for reasoning and weak enough for GC/compiler reality.

We both want to arrive at that unknown sweet spot. I am approaching it by identifying the weakest guarantee then strengthening, while you are favoring approaching from the other end of the spectrum.

My reasoning is that there is a practical difference between getting implementer consensus for SABs and for WeakRefs. For SABs, the optimizations we wanted verboten weren't written yet. Since SAB was an entirely new API, we weren't asking any engine to check their existing optimizations. This isn't true for WeakRefs though: by saying we don't want to be as permissive as possible, we are literally making the previously unobservable observable. Like you've said, we need to gauge the implementers' appetite for change here, but my feeling is we're in a much more difficult position than with SABs.

Were we to allow this, then a subsystem that had been correct when connected directly to the outside world might become incorrect when connected only to an internal mock of that outside world, because the difference in internal weakref behavior causes something else that is externally observable.

That is a good point. However, #31 has already shown that whatever semantics we come up with here must be applicable by host specs that embed JS to their own references of JS objects. I understand your concern for not driving observability wholesale from the host, but I contend the host must have a say in the observability of a GC thing, and thus the correctness of the program. This point is a real-world one: while all JS engines have a tracing GC of varying sophistication, there is higher variety in how DOM nodes are managed. Firefox has a cycle collector (formally, the dual of a GC). Blink AFAIU now has a tracing GC, "Oilpan". AFAIU WebKit does ref counting and has no CC or GC. I'm skeptical we can come up with a set of "must be observably not collected" rules that is palatable for these very different implementations.

@erights (Contributor) commented May 22, 2019

Hi @syg that's a nice summary of the issues. Regarding:

I'm skeptical we can come up with a set of "must be observably not collected" rules that is palatable for these very different implementations.

We must. Without being able to reason from safe bounds, we cannot soundly use the spec for much. For example, the cross-address-space distributed acyclic gc use, as well as the js-to-wasm use, cannot be sound without reasoning about the absence of "false" collection signals.

@littledan (Member) commented:

@erights, I can understand how it is desirable to have these things more defined, but where do you find a difference in viewpoint from @syg's argument about feasibility?

@erights (Contributor) commented Jun 4, 2019

Let's examine constraints from the other side. If there are never any guarantees about what is not collected, then it is impossible to write a correct program that uses weakrefs to clean something up when "it" won't be further used.

  • Recall that the motivating example is JS code holding onto some kind of handle to resources within a wasm instance. If correct JS code does not drop the handle, as far as the language semantics are concerned, and then later uses it to access that wasm resource, we need to know that the wasm resource has not already been deallocated because of an earlier "false" finalization signal. (A sketch of this pattern follows the list.)
  • The enumerable weakKeyMap and weakValueMap examples are not correct if there are no guarantees about what is not collected.
  • Pretty much every example. We should critically examine those and determine how a "clever enough compiler" together with too-weak spec guarantees might cause them to be incorrect.
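
A sketch of the wasm-handle pattern from the first bullet, using the proposal's FinalizationGroup API (wasmExports and its alloc/free/read exports are hypothetical):

function makeResourceManager(wasmExports) {
  const fg = new FinalizationGroup(iterator => {
    for (const ptr of iterator) wasmExports.free(ptr);  // reclaim the wasm-side resource
  });
  return function makeHandle() {
    const ptr = wasmExports.alloc();
    const handle = { read: () => wasmExports.read(ptr) };
    fg.register(handle, ptr);
    return handle;
  };
}
// If the cleanup callback can run while the program can still call handle.read(), the
// wasm memory is freed "under" a live handle -- exactly the false collection signal that
// a guarantee about what is *not* collected must rule out.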

@tschneidereit (Member) commented:

@erights, I agree with all these constraints. I do, however, think that there is a lot of room for disagreement on what "it is impossible to write a correct program" means :)

I'm quite skeptical that we'll be able to agree on anything that'll restrict implementations' ability to apply future optimizations to which objects can be GC'd.

To make that more concrete, consider the case of Web Workers. A Worker can be collected in its entirety if it can't possibly execute code in the future. That is the case if no channel exists to it or it doesn't have any onmessage handler registered, and its stack is empty. That scenario is obviously trivial.

A more interesting example is a Worker that

  • has a single onmessage event handler registered
  • doesn't have any other event handlers registered
  • doesn't have any promises pending that could be resolved from outside the Worker
  • has an empty stack.

In the absence of WeakRefs, I think the following holds in that scenario:
An implementation is free to collect any objects that content can't reach from within the onmessage event handler.

AFAICT the question under discussion here effectively boils down to what change to this statement the introduction of WeakRefs requires. (Of course there are lots of other scenarios to consider, but I think they don't differ in fundamental ways.)

A simple change would be to the following:
An implementation is free to collect any objects that content can't reach from within the onmessage event handler, excluding through dereferencing weak references.

Of note, this definition allows implementations to introduce optimizations that change whether someWeakRef.deref() stays valid or not.

I'm highly skeptical about our ability to define liveness guarantees that'd constrain an implementation's freedom compared to the above. At the same time, I do think that this actually is a useful definition that allows developers to write programs with guarantees about when a WeakRef will continue to be valid, so perhaps it's sufficient?
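
A sketch of that Worker scenario under the proposed wording (worker-side code; names are illustrative):

let cache = null;
onmessage = e => {
  if (e.data === "make") {
    const obj = { payload: e.data };
    cache = new WeakRef(obj);   // after this handler returns, the stack is empty and the
                                // only remaining path to obj is through a WeakRef
  } else if (cache) {
    // On a later turn this may legitimately log undefined: the wording above lets the
    // implementation collect obj, and better optimizations may change when that happens.
    console.log(cache.deref());
  }
};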

@erights (Contributor) commented Jun 4, 2019

An implementation is free to collect any objects that content can't reach from within the onmessage event handler, excluding through dereferencing weak references.

Do we agree that, according to your text, the implementation is not free to collect used at the commented point in the program?

function g2(used) {
  used.foo;
}
async function f2() {
  const o = {};
  const gRef = new WeakRef(o);
  await 1;
  // UNREASONABLE to empty gRef here
  g2(o);
}
onmessage = f2;

@erights (Contributor) commented Jun 4, 2019

An implementation is free to collect any objects that content can't reach from within the onmessage event handler, excluding through dereferencing weak references.

Also, I'll just be pedantic again and point out that this text doesn't actually state that an implementation is not free to collect except under these conditions. Compare:

An implementation is only free to collect objects that content can't reach from within the onmessage event handler, excluding through dereferencing weak references.

@syg (Collaborator, Author) commented Jun 4, 2019

@erights To be precise, is the intended reason that o in your example is not collectable that it is created with const o = {} instead of const o = { __proto__: null }?

@tschneidereit (Member) commented:

An implementation is only free to collect objects that content can't reach from within the onmessage event handler, excluding through dereferencing weak references.

I agree that this is better.

Do we agree that, according to your text, the implementation is not free to collect used at the commented point in the program?

However, in light of this question I want to change the statement slightly:

An implementation is only free to collect objects that content can't reach by any code that could execute in the future, excluding through dereferencing weak references.

I'm sure that there are better ways to put this. However, with this definition yes, AFAICT an implementation would be free to collect o. (Note: I'm assuming that you mean for the example to not actually contain a strong reference to o as it currently does.)

@syg (Collaborator, Author) commented Jun 4, 2019

To @tschneidereit's point about different tiers earlier, I think it's perfectly reasonable that an implementation will collect o in f2 in some invocations of f2.

Suppose there is nothing on the Object prototype chain to make g2's used.foo observable. Further suppose f2 is run 10,000 times, gets optimized, inlines g2, and the optimizer DCEs used.foo away. For those initial invocations, it should be reasonable for the implementation to collect o.

Then, suppose the host loads a new script that injects a proxy into the Object prototype and that invalidates and throws away the existing JIT code for f2 and g2, because the assumption that used.foo is unobservable has become invalid. For any subsequent invocations of f2, it is now unreasonable for the implementation to collect o.
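
A sketch of the kind of injection described above. The comment says "proxy", but since Object.prototype's own [[Prototype]] is immutable, an accessor property added to Object.prototype shows the same effect of making used.foo observable:

Object.defineProperty(Object.prototype, "foo", {
  configurable: true,
  get() {
    console.log("foo observed on", this);   // used.foo in g2 now has a visible effect
    return undefined;
  }
});
// From this point on, the assumption that used.foo is unobservable no longer holds, the
// optimized code for f2/g2 must be discarded, and collecting o before g2(o) runs would
// again be unreasonable.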

@erights (Contributor) commented Jun 4, 2019

Before answering the above questions, I'd like to ask about a similar example:

function g3(used) {
  used.foo();
}
let x = 0;
async function f3() {
  const o = {foo() {console.log(++x);}};
  const gRef = new WeakRef(o);
  await 1;
  // UNREASONABLE to empty gRef here
  g3(o);
}
onmessage = f3;

Does everyone agree that the implementation is not free to collect o at the execution point marked by the comment?

If so, can you reason from your proposed text to this conclusion?

@littledan (Member) commented:

I don't understand what would prevent collecting o. Could you explain how you would imagine this blocked while permitting inlining?

@ljharb (Member) commented Jun 4, 2019

o is passed into g3, after that comment - it would certainly be a problem if the spec allowed it to be collected there.

@littledan (Member) commented:

This issue is pretty subtle, so @ljharb and I talked it over offline and agreed to the following:

We're really excited about WeakRefs coming to JavaScript; they should be really useful, and they're long overdue. It can be hard to think about garbage collection, so it'll be good if the specification can explain what we can expect from this feature. It's unclear what guarantees can be provided without slowing down existing code. We'd encourage engine developers to think hard about what's possible to nail down. We support this proposal going to Stage 3 with its current specification text leaving "observably referenced" as a little bit open, and on the path towards Stage 4, we should work together on figuring out more precisely what we can guarantee to developers.

@erights (Contributor) commented Jun 4, 2019

I agree. We should go to stage 3 with the current understanding of the two hard constraints, both of which must be met. We can then figure out how to meet them during stage 3.

@brabalan commented Jun 5, 2019

@erights asked me to give my opinion here based on what I know of how the OCaml compiler handles weak references. There are two mechanisms at play there: optimizations and runtime handling.

Regarding optimizations, it's basically a static liveness analysis after program transformation (such as inlining function calls). It is done fairly late in the compilation chain. I don't know how to do a precise enough static analysis when staying at the source code level, so I don't think this approach is viable for the spec if we don't talk about program transformation. Alternatively, we could use a dynamic notion of liveness (something is live if it can be reached from the roots, which include things in the stack, without going through a weak pointer), and say that it is allowed to null any object that is not live. But I'm afraid this amounts to specifying what a correct GC would be (and weakrefs are just a small addition to that), which requires being able to talk about the runtime state of programs.
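
A sketch of that dynamic notion in JS terms (illustrative):

let wr;
{
  const obj = { reachableOnlyWeakly: true };
  wr = new WeakRef(obj);
}
// After this block (and after the creating turn), the only path from the roots to obj
// goes through wr, a weak pointer, so obj is not "live" in the dynamic sense above and
// nulling the weak reference to it would be allowed.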

@syg (Collaborator, Author) commented Jun 5, 2019

My understanding is pretty close to @brabalan's and I think is a good argument for being more lax than not with guarantees.

@syg (Collaborator, Author) commented Jun 5, 2019

@erights I agree that in your f3/g3 example, implementations should not be free to collect o, if the host has an effectful console.log, such as on the web.

Edit: retracted in light of Lars's comment below.

@lars-t-hansen commented:

Before answering the above questions, I'd like to ask about a similar example:

function g3(used) {
  used.foo();
}
let x = 0;
async function f3() {
  const o = {foo() {console.log(++x);}};
  const gRef = new WeakRef(o);
  await 1;
  // UNREASONABLE to empty gRef here
  g3(o);
}
onmessage = f3;

Does everyone agree that the implementation is not free to collect o at the execution point marked by the comment?

If the implementation can ascertain that the WeakRef constructor is the original WeakRef constructor (eg as a result of online profiling) then it can reason that the o passed to the WeakRef constructor is not actually used for anything since gRef is not used for anything (it is local to the function and never referenced), ie all it needs to know is that a call to the original WeakRef constructor with the result ignored is a no-op, for practical purposes (whether the call is made or not is not observable). Thus it might not create the weakref at all. Of course making gRef a global fixes that but maybe it changes the point you were trying to make.

I think the point I'm trying to make here is that it's not just about reachability but also about how reachability is created -- the clause in the prose about "excluding through weak references" ignores how weak references are created (or not).

Another point in the example above is that even if the spec should require the WeakRef to be created in all cases even if it's not observable, the implementation might still be allowed to know enough about how it works that it can create two objects, one to pass to the WeakRef (which will never be seen by anyone) and one to pass to g3. (After which the second o is optimized away and g3 and foo are inlined and the final call of f3 becomes simply console.log(++x).) But now we're requiring storage to be kept alive even though it is clearly unobservable.

The implementation might also move the creation of the o and the weakref until after the await, since the await does not depend on them and they are not visible externally; again changing the calculus of possible optimizations.

@lars-t-hansen commented:

@erights I agree that in your f3/g3 example, implementations should not be free to collect o, if the host has an effectful console.log, such as on the web.

Really, you wouldn't just inline the call to console.log into f3 after reasoning about reachability of o?

@syg (Collaborator, Author) commented Jun 5, 2019

I retract my agreement. :)

@littledan (Member) commented:

Hey, now that we seem to agree (in offline discussion) that reachability of the identity itself constitutes liveness, does anyone want to write this up as a PR against the draft specification?

@syg (Collaborator, Author) commented Jun 9, 2019 via email

@wingo (Collaborator) commented Sep 27, 2019

I believe this is now closeable given that #142 was merged in mid-August.

syg closed this as completed Sep 27, 2019
cpcallen added a commit to cpcallen/proposal-weakrefs that referenced this issue Nov 19, 2019
Removed editor's note about definition of liveness being under discussion, as tc39#115 is now closed, as is tc39#105 (which probably should have been the issue referenced).

Also removes malformed markup.
cpcallen added a commit to cpcallen/proposal-weakrefs that referenced this issue Nov 19, 2019
The referenced PR comment appears to have been resolved along with issue tc39#105.
littledan pushed a commit that referenced this issue Nov 19, 2019
Removed editor's note about definition of liveness being under discussion, as #115 is now closed, as is #105 (which probably should have been the issue referenced).

Also removes malformed markup.
littledan pushed a commit that referenced this issue Nov 19, 2019
The referenced PR comment appears to have been resolved along with issue #105.