Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spec] refactor spec to generate a nonobservable Map instance containing all intrinsics #17

Merged
merged 4 commits into from
Jun 2, 2023

Conversation

ljharb
Copy link
Member

@ljharb ljharb commented Jan 19, 2023

Calling getIntrinsic with no arguments returns a "keys" iterator for this Map.

I'm sure there's lots of spec text massaging that's needed here, but I'm hoping the overall approach can achieve stage 2. Thoughts are welcome.

Closes #11.

…ing all intrinsics

Calling `getIntrinsic` with no arguments returns an "entries" iterator for this Map.
@erights
Copy link

erights commented Jan 20, 2023

What's the easiest way for me to look at this rendered?

@ljharb
Copy link
Member Author

ljharb commented Jan 20, 2023

@zloirock
Copy link

God's functions are evil. One function - one responsibility. It's much better to add a namespace like

namespace Intrinsics {
  get(name: string): any;
  iterate(): Iterator;
}

or even

namespace Intrinsics {
  get(name: string): any;
  @@iterator(): Iterator;
}

@ljharb
Copy link
Member Author

ljharb commented Jan 22, 2023

I don’t think “god function” really is an appropriate label for something with only two responsibilities. Personally i find this preferable to a namespace anyways, but i think it’s critical the getIntrinsic function be global - so two functions would mean two globals, and that’s something to avoid.

Copy link

@erights erights left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my review, I find the ecmarkup unreadable, and will instead jot notes here from reviewing the rendered form at https://raw.githack.com/tc39/proposal-get-intrinsic/enumerability/index.html

Unless otherwise specified each intrinsic object actually corresponds to a set of similar objects, one per realm.

No change suggested to this text, but I am curious: Do we currently have or expect any such exceptions?

A reference such as %name.a.b% means, as if the "b" property of the value of the "a" property of the intrinsic object %name% was accessed prior to any ECMAScript code being evaluated.

In case you want to be precise about the edge cases where one of these properties is an accessor or one of these objects is a proxy: You could simply prohibit these edge cases. The spec must only name an intrinsic with a dotted path if it traverses only the values of data properties, where the hypothetical GET of that value causes no effects. If there is no such dotted path, the intrinsic should get its own standalone name.

Altogether, I'm happy not being that precise and leaving the current text as is. Either choice would be fine.

The well-known intrinsics are listed in Table 1.

The Table 1 that I see in the rendering only has three rows, each of which already appears in the spec. I really strongly appreciate the addition of the new "Directly Accessible to ECMAScript code" column! But I don't understand why the addition of this column is part of this proposal. Is it only a drive-by enhancement to the spec text? (If so, I am happy to see it)

Aside from the addition of this column, I did not spot any other differences in the contents of these three rows from the status quo spec text. If it's there as examples of the naming rule immediately above, it should probably include a dotted path name.

  • An intrinsic name that describes a data property must have no prefix.
  • An intrinsic name that describes an accessor property must have the prefix "get " or "set ".

This is confusingly non-parallel to the (correct) description of language-provided intrinsic names. At least for the data-property case, presumably, the intrinsic name is not describe the data property, but described the intrinsic object which is initially found at the value of that data property.

For the accessor case, you are indeed not referring to the value obtained by a hypothetical GET on the property, but rather of the getter or setter intrinsic function itself. Echoing the consideration above, if there is a host provided intrinsic that got be gotten by a hypothetical GET on that property, and if the hypothetical GET has no side effects, can the host name the resulting intrinsic object with the unqualified property name?

Can you give an example of a host name that follows your recommendations?

Copy link

@erights erights left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetIntrinsicsMap returns a Map object. This Map is not directly observable from ECMAScript code.

Ah, I think I understand what Table 1 was supposed to be. Maybe it is, but there's some bug in the ecmarkup processing that caused it to render badly? What I'm guessing is that this inaccessible Map instance is supposed to appear in the table as inaccessible. But if so, I would expect this section to use a %...% intrinsic name to refer to that map instance. But I don't see such a name. I think I'm still confused.

For each baseIntrinsic of each intrinsic name listed in 6.1.7.4,

This is the first use of baseIntrinsic. It is not defined in this document nor in the status quo 262 spec. But I think I get it. In your example, where the intrinsic-name of an intrinsic is %name.a.b%, there must also be an intrinsic, presumably found in Table 1 as well, named %name%. However, from the algorithm, it seems the value of baseIntrinsic would be the string name "%name%".

I find that confusing. I would expect "Intrinsic" to be a kind of intrinsic, i.e., the intrinsic associated with the base intrinsic name "%name%".

If an intrinsic is reachable by multiple such dotted paths, which should take priority? Or should all be included. You at least need to suppress cycles, like %Foo.prototype.constructor%.

Since you are distinguishing base intrinsics from all intrinsics anyway, where all non-base-intrinsics can be reached by (presumably side-effect free) dotted path name navigation from the base intrinsics, you may consider providing (in the hidden map and the exposed iterator) only these base intrinsics, and only under their base intrinsic name, which should have no ambiguity. Programs (like the Hardened JS shim (aka the SES shim) that need to discover all hidden intrinsics would still get what they need, since they can do that name navigation themselves.

An important invariant that this spec seems to violate but does actually violate: (Aside from the known exceptions of the Date constructor, the Date.now method, Math.random, WeakRef constructor, FinalizationRegistry constructor, and SharedArrayBuffer constructor) Intrinsics have no hidden mutable state. Once frozen all are, they are effectively immutable. The hidden map instance, by virtual of it being a map instance, has hidden mutable state. But the map is encapsulated in the algorithm and cannot be mutated after initialization. But this lack of mutable state is a theorem, rather than an obvious consequence of the spec.

Oops. If the map instance is an intrinsic included in Table 1, then we have a bigger problem:

  1. For each baseIntrinsic of each intrinsic name listed in 6.1.7.4,

Since you made a point of deeming the explanatory objects that are not directly accessible from ECMAScript code to be intrinsics, listed in Table 1 under their intrinsic-name, do you intend for these objects to become accessible to ECMAScript code via this proposal (both by name retrieval and by iteration)? For the two that you list (or at least that I see in the rendering), %AsyncFromSyncIteratorPrototype% and %ForInIteratorPrototype%, no problem. These have no hidden mutable state. Once it and all objects transitively reachable from it by named traversal are frozen, they are effectively immutable.

However, exposing this map instance in this manner would be fatal. Map instances inherently have hidden mutable state. Perhaps this is why you didn't give it a name and AFAICT did not list it in Table 1, so that getIntrinsic would not expose it? If so, then we are still safe, since there's no way to mutate the map after initialization.

But if it is a nameless explanatory non-intrinsic object, why not elevate it to the level of the other spec fiction lists and records? Is this just to economize on the need for spec mechanism to explain the lookup and iteration? As a spec fiction, just use the intrinsics list of records from your algorithm directly. Introducing an inaccessible and non-obviously immutable unnamed map instance is more complexity cost than you're saving on the other end.

Copy link

@erights erights left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Let map be GetIntrinsicsMap().
  2. If MapHas(map) is false, then
    a. Throw a TypeError exception.

GetIntrinsicMap is the internal spec function you are defining. This is an error check that it does indeed return a map, just in case it returns something else. But wouldn't that be a spec violation? IIUC, this should be a spec assert rather than an operational error check.

When the name string succeeds in producing an intrinsic, the name string surrounded by "%" matches the intrinsic notation described in the specification in Well-Known Intrinsic Objects.

Juxtaposing this with HostAddedIntrinsics steps

  • An intrinsic name that describes a data property must have no prefix.
  • An intrinsic name that describes an accessor property must have the prefix "get " or "set ".

It is almost an invariant that the intrinsic names of intrinsics that appear in Table 1 begin with a "%" and intrinsic names provided by the host via HostAddedIntrinsics not begin with a "%". Could we require hosts not to name any of these data properties with a name beginning with '%', or if they do, could we suggest a mangling so that their intrinsic name does not begin with a '%'.

(long tangent you can skip if you don't need it)

The SES-shim lockdown, in order to remain secure when run on later versions of the language, uses a whitelist of known-SES-safe-globals as of that version, which include all the powerless 262 globals. Any global not on our whitelist we assume is a host-provided power, which should remain unperturbed on the realm's global (i.e., the global of the start compartment). We then do name-based navigation from these whitelisted globals to all intrinsics reachable by transitive name+prototype traversal. But this leaves the "hidden intrinsics", intrinsics that JS code with access only to the whitelisted globals can still reach. There are two ways that language changes can introduce such hidden intrinsics: intrinsics reachable by new syntax (such as %AsyncGenerator%) or intrinsics reachable by invocation (new iterator prototypes).

(end tangent)

Using getIntrinsic we would freeze all intrinsics whose intrinsic-name begins with '%' and everything reachable by name and prototype traversal from there. This may include more intrinsics than confined code could have reached, but due to other invariants we've been guarding, should still be safe to freeze. (I don't yet understand what invariants your dotted-path naming provides, but it seems aligned, and might further help.)

We would skip all discovered intrinsics whose name did not begin with a '%', because freezing these hidden host intrinsics may very well break some host-provided API that relies on these not being frozen.

Since the proposal as is effectively gives us this '%' invariant for all practical purposes, we would freeze according to this rule regardless. If it is true for all host objects now, we would treat it as a de-facto invariant. But, as with all such de facto invariants that code relies on, it would be better for it to be de-jure. We have a zillion of those, but we gotta start somewhere ;)

19.3.3 isFinite ( number )

Why is this in this PR? It is not otherwise mentioned. Is it an ecmarkup proximity artifact?

@ljharb
Copy link
Member Author

ljharb commented Jan 30, 2023

@erights that "one per realm" phrasing is not in this proposal, it's in the main spec already, and I thus haven't paged it in.

Getting accessor intrinsics is an important use case, so I do think we need to specify it. For example, I need to be able to get the original Map size accessor function.

That column is added because, without it, this proposal would cause these previously unobservable intrinsics. I only included a few rows to show the insertions/changes rather than replicating the entire table.

Re baseIntrinsic, that's the spec's notation for for-each loops - baseIntrinsic is defined by that line, as the iteration variable.

Regarding cycles, I thought I'd already covered that in the Map creation code, by explicitly not traversing any constructor or prototype or __proto__ properties, but if not, I'd be happy to refine that in stage 2; that's obviously important to address.

The Map instance created here is not an intrinsic, and regardless can never be directly exposed to user code - intrinsics are only things that are listed in the table, or reachable from the items in the table in a new realm, which doesn't include this map instance.

The MapHas check is not checking it's a map - that's already stipulated, by the return type of the AO - it's just meant to be checking if the key is present - but i see i've forgotten the key :-) so i'll fix that.

I would be happy to add normative text to guarantee that no builtin things ever have % in the name, but I don't think that's actually necessary - if that ever happened, the % logic in the spec would need to change for the spec to be coherent.

isFinite is included in the proposal because it's adjacent to a change, and just like git diffs, unchanged lines adjacent to changed lines are included as context clues to help with understanding where in the original document the change occurs.

Copy link

@erights erights left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a matter separate from all my previous review comments:

I do not like the API design that one function does two jobs that are so different from each other, based on the presence or absence of an argument. I also find it weird that
getIntrinsic() returns an iterator of many intrinsics, but the name is singular. I would much prefer a pair of functions, getIntrinsic(name) and getIntrinsics(). In this case, I think introducing two names has a lower cognitive burden than introducing one name.

@ljharb
Copy link
Member Author

ljharb commented Jan 30, 2023

Introducing two names works fine, but it would mean we have to create two globals - and two new intrinsics - rather than one. Is there an alternative you can suggest that only adds a single global function?

@erights
Copy link

erights commented Jan 30, 2023

Normally I would also rather introduce one new global variable name rather than two. But this is based on cognitive burden --- how much more complex does it make the language feel?

For this case, could you state why you'd prefer introducing one, rather than introducing a pair or names that are obviously closely related. What cost are you economizing?

@ljharb
Copy link
Member Author

ljharb commented Jan 30, 2023

I'm worried that trying to add two globals for this proposal may cause the entire thing to get more pushback than it would for having a dual-responsibility function. Obviously if my fear is unfounded, I'd switch to two globals, and either way I'll certainly mention that as a possibility.

@erights
Copy link

erights commented Jan 30, 2023

To answer your question, all other ways I've immediately been able to imagine that use only one function would be even worse than what you propose. So I'll spare you ;)

@erights
Copy link

erights commented Jan 30, 2023

Good! With that clarified, we can take the temperature of the room, find our which design gets less pushback, and go with that.

@bathos
Copy link
Contributor

bathos commented Jan 30, 2023

Could we require hosts not to name any of these data properties with a name beginning with '%' [...]

For the web platform, this would probably be an uncontroversial requirement since it’s already syntactically enforced there in a sense: the Web IDL identifiers that get mapped to property names in the ES binding’s “initial objects” (= host intrinsics) can’t include “%”.

@erights
Copy link

erights commented Jan 30, 2023

For the web platform, this would probably be an uncontroversial: the Web IDL identifiers that get mapped to property names in the ES binding’s “initial objects” (= host intrinsics) already can’t include “%”

@bathos that's great news! Thanks for the clarification. It would still be nice to codify the invariant, so that all hosts know to obey that constraint.

@gibson042
Copy link

We should also think about the proper key for %Intl%.[[FallbackSymbol]].

@bathos
Copy link
Contributor

bathos commented Jan 30, 2023

ECMA-402 also doesn’t currently “declare” the %SegmentIteratorPrototype% and %SegmentsPrototype% well-knowns in the WKIO table (issue). The SplitIntrinsicPath AO relies on that table, so it would throw a TypeError for these.

@gibson042
Copy link

ECMA-402 %Intl% and its [[FallbackSymbol]] are intrinsics (the latter a "hidden intrinsic" not reachable from property access that starts with the global object) in implementations supporting that spec, and therefore must be associated with some key in this proposal, whether treated as host-added or not.

@gibson042
Copy link

ECMA-402 being basically a normative-optional part of ECMA-262, I think it would be best if the two specs cooperated here, with 402 adding to the table in 262 and 262 documenting that (like the handling of toLocaleString methods). And for tc39/ecma402#655 , possibly "demoting" its intrinsics, e.g. %NumberFormat% to %Intl.NumberFormat%.

@ljharb
Copy link
Member Author

ljharb commented Jan 30, 2023

To clarify, the % is strictly intended to remain an editorial part of the spec; it's not to be exposed as part of the API. eg, getIntrinsic('Array.prototype.concat').

@ljharb
Copy link
Member Author

ljharb commented Jan 30, 2023

@gibson042 intrinsic notation doesn't allow for field/slot lookup, so it'd need to be a top-level intrinsic value in the 402 spec, like %IntlFallbackSymbol%, to be valid. It's possible 402 is already not meeting this requirement, but it can and should be fixed.

@gibson042
Copy link

@ljharb Exactly. I'm not proposing that the fallback symbol be looked up using slot notation, just pointing out that the hidden intrinsic exists and needs representation (presumably as a top-level sibling of %Intl% like you suggest).

@rbuckton
Copy link

rbuckton commented Feb 2, 2023

I'm not opposed to using Map, but I'm curious why this chooses to do so. Would it not be sufficient to specify this as just constructing a List of key/value pairs, and letting implementations perform their own optimizations? Or even just referencing the table directly like we do in https://tc39.es/ecma262/#sec-runtime-semantics-unicodematchproperty-p?

@rbuckton
Copy link

rbuckton commented Feb 2, 2023

Would it not be sufficient to specify this as just constructing a List of key/value pairs, and letting implementations perform their own optimizations? Or even just referencing the table directly like we do in https://tc39.es/ecma262/#sec-runtime-semantics-unicodematchproperty-p?

Sorry, I see that this was the prior implementation. That said, I am curious why it would be necessary to depend on a Map just to use its keys() as opposed to producing an iterator over the names in the table.

@ljharb
Copy link
Member Author

ljharb commented Feb 2, 2023

The table isn't sufficient, because it only contains top-level intrinsics, and doesn't contain host-added intrinsics - additionally, the order in the table shouldn't dictate observable ordering. Also, using a Map is already using spec Lists internally :-)

That said, in this PR I extract the MapGet etc logic (which uses lists, in [[MapData]]) into AOs so that I can use it - I could instead just copy-paste the same logic to iterate over Lists. At that point, I'd need to either use a generator, or make a new intrinsic IntrinsicIterator or something.

@gibson042
Copy link

You could also refactor the introduced AOs to operate directly on a List of { [[Key]], [[Value]] } Records rather than a Map with [[MapData]] of that shape.

@@ … @@ GetIntrinsicsMap(
           1. Let _hostIntrinsics_ be HostAddedIntrinsics().
-          1. Let _map_ be ! OrdinaryCreateFromConstructor(%Map%, "%Map.prototype%", « [[MapData]] »).
-          1. Set _map_.[[MapData]] to the list-concatenation of _intrinsics_ and _hostIntrinsics_.
-          1. Return _map_.
+          1. Return the list-concatenation of _intrinsics_ and _hostIntrinsics_.
       </emu-alg>
     </emu-clause>
@@ … @@ <emu-clause id="sec-map-get" type="abstract operation">
         MapGet(
-          _map_: a Map,
+          _entries_: a List of { [[Key]], [[Value]] } Records,
           _key_: an ECMAScript language value,
         ): a Boolean
       </h1>
       <dl class="header">
       </dl>
       <emu-alg>
-        1. Let _entries_ be the List that is _map_.[[MapData]].
         1. For each Record { [[Key]], [[Value]] } _p_ of _entries_, do
           1. If _p_.[[Key]] is not ~empty~ and SameValueZero(_p_.[[Key]], _key_) is *true*, return _p_.[[Value]].
         1. Return *undefined*.
       </emu-alg>
     </emu-clause>

and so on.

@ljharb
Copy link
Member Author

ljharb commented Feb 2, 2023

That's a great point, and I definitely will do that regardless :-)

@ljharb
Copy link
Member Author

ljharb commented Jun 2, 2023

I'm going to merge this, and then put up a PR that moves to Reflect methods.

After that, I'll file an issue to track whether we should be including all intrinsics or only hidden ones.

@ljharb
Copy link
Member Author

ljharb commented Jun 2, 2023

Also, I'll file an issue about using a Map internally vs iterating over a List.

@thanhlucifer

This comment was marked as spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend scope to include discoverability
7 participants