Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editorial: Increase consistency and clarity of string operations #2974

Merged
merged 12 commits into from
Mar 21, 2023

Conversation

gibson042
Copy link
Contributor

@gibson042 gibson042 commented Dec 15, 2022

There was a lot of gratuitous divergence between Encode, Decode, escape, and unescape, and also between those operations and other string operations.

Non-comprehensive improvements:

  • Make better use of ParseText to get Parse Nodes or errors rather than testing syntax first.
  • Align on code {unit,point} whose numeric value is… phrasing rather than code {unit,point} whose value is….
  • Increase the majority of Let _len_ be the length of… over alternate aliases for _len_.
  • Increase the majority of Repeat, while _k_ < _len_ iteration over alternatives like while _k_ ≠ _len_ or embedded If _k_ = _len_, return….
  • Make better use of string-concatenation accepting both strings and code units.

spec.html Outdated
Comment on lines 28984 to 28987
1. Let _hexDigits_ be the substring of _string_ from _k_ + 1 to _k_ + 3.
1. Let _parseResult_ be ParseText(StringToCodePoints(_hexDigits_), |HexDigits[~Sep]|).
1. If _parseResult_ is not a Parse Node, throw a *URIError* exception.
1. Let _B_ be the unsigned 8-bit value corresponding with the MV of _parseResult_.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These re-use the aliases _hexDigits_, _parseResult_, and _B_, so should use Set-to syntax rather than Let-be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider these to be separate aliases in a deeper scope that shadow those in the outer scope, which has the joint benefits of a) matching likely implementation where shadowing is available, and b) allowing reuse of the pattern without gratuitous variation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider these to be separate aliases in a deeper scope that shadow those in the outer scope,

That's reasonable from a programming language point of view, but I don't think there's anything in the spec that supports it. E.g., 5.2 "Algorithm Conventions" doesn't say anything about scopes or shadowing. It could, but my guess is, that would be more bother than it's worth.

Anyway, I'll leave it to the editors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jmdyck; we don't currently define any scoping rules except for Abstract Closures, and it's not worth the complexity of doing so vs just picking different aliases or (my preference here) using Set.

Also, nested scoping would conflict with the current pattern of

1. If condition,
  1. Let _x_ be 1.
1. Else,
  1. Let _x_ be 2.
1. Use _x_.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant "deeper scope" from a logical perspective rather than algorithmic. AFAICT from Algorithm Conventions, Let _x_ be … after a preceding Let _x_ be … is not an error and has the effect of declaring a new alias that shadows the previous one temporally rather than lexically (i.e., "may be referenced in any subsequent steps"), which in the case of this algorithm are indistinguishable anyway because these declarations are in the same loop body (itself also supporting the "temporal shadowing" interpretation because each iteration encounters the same Let steps) and there are no references at decreasing indentation.

As noted above, I think that switching to Set would result in gratuitous variation for identical logic:

               1. Let _hexDigits_ be the substring of _string_ from _k_ + 1 to _k_ + 3.
               1. Let _parseResult_ be ParseText(StringToCodePoints(_hexDigits_), |HexDigits[~Sep]|).
               1. If _parseResult_ is not a Parse Node, throw a *URIError* exception.
               1. Let _B_ be the unsigned 8-bit value corresponding with the MV of _parseResult_.
               1. …
               1. If _n_ = 0, then
                 1. …
               1. Else,
                 1. …
                 1. Repeat, while _j_ < _n_,
                   1. …
-                  1. Let _hexDigits_ be the substring of _string_ from _k_ + 1 to _k_ + 3.
-                  1. Let _parseResult_ be ParseText(StringToCodePoints(_hexDigits_), |HexDigits[~Sep]|).
+                  1. Set _hexDigits_ to the substring of _string_ from _k_ + 1 to _k_ + 3.
+                  1. Set _parseResult_ to ParseText(StringToCodePoints(_hexDigits_), |HexDigits[~Sep]|).
                   1. If _parseResult_ is not a Parse Node, throw a *URIError* exception.
-                  1. Let _B_ be the unsigned 8-bit value corresponding with the MV of _parseResult_.
+                  1. Set _B_ to the unsigned 8-bit value corresponding with the MV of _parseResult_.

If that is indeed the editorial preference, I think I would instead encapsulate it in a micro-operation, e.g.:

-              1. Let _hexDigits_ be the substring of _string_ from _k_ + 1 to _k_ + 3.
-              1. Let _parseResult_ be ParseText(StringToCodePoints(_hexDigits_), |HexDigits[~Sep]|).
-              1. If _parseResult_ is not a Parse Node, throw a *URIError* exception.
-              1. Let _B_ be the unsigned 8-bit value corresponding with the MV of _parseResult_.
+              1. Let _B_ be ReadHexOctet(_string_, _k_ + 1).
+              1. If _B_ is not an integer, throw a *URIError* exception.
               1. …
               1. If _n_ = 0, then
                 1. …
               1. Else,
                 1. …
                 1. Repeat, while _j_ < _n_,
                   1. …
-                  1. Let _hexDigits_ be the substring of _string_ from _k_ + 1 to _k_ + 3.
-                  1. Let _parseResult_ be ParseText(StringToCodePoints(_hexDigits_), |HexDigits[~Sep]|).
-                  1. If _parseResult_ is not a Parse Node, throw a *URIError* exception.
-                  1. Let _B_ be the unsigned 8-bit value corresponding with the MV of _parseResult_.
+                  1. Let _continuationByte_ be ReadHexOctet(_string_, _k_ + 1).
+                  1. If _continuationByte_ is not an integer, throw a *URIError* exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the call today, editors are in agreement that we don't want to do this kind of shadowing. If you want to introduce a new AO here that's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

spec.html Outdated Show resolved Hide resolved
@ljharb ljharb requested a review from jmdyck January 16, 2023 21:26
@michaelficarra michaelficarra added editor call to be discussed in the next editor call and removed editor call to be discussed in the next editor call labels Jan 17, 2023
spec.html Show resolved Hide resolved
spec.html Show resolved Hide resolved
spec.html Show resolved Hide resolved
@gibson042 gibson042 force-pushed the 2022-12-string-op-consistency branch from c0105cf to 870ad44 Compare January 31, 2023 19:08
@ljharb ljharb requested review from jmdyck and a team January 31, 2023 19:26
Copy link
Contributor

@bakkot bakkot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than nits.

spec.html Show resolved Hide resolved
spec.html Show resolved Hide resolved
jmdyck
jmdyck previously requested changes Feb 11, 2023
spec.html Show resolved Hide resolved
spec.html Show resolved Hide resolved
@gibson042
Copy link
Contributor Author

Thanks @jmdyck; good suggestions.

@ljharb ljharb requested a review from jmdyck March 1, 2023 23:13
Copy link
Collaborator

@jmdyck jmdyck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay to me.

Copy link
Contributor

@syg syg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % small comments

spec.html Show resolved Hide resolved
spec.html Show resolved Hide resolved
@michaelficarra michaelficarra added the ready to merge Editors believe this PR needs no further reviews, and is ready to land. label Mar 21, 2023
@ljharb ljharb force-pushed the 2022-12-string-op-consistency branch from 2dc4f8e to 5cf7099 Compare March 21, 2023 17:42
@ljharb ljharb merged commit 5cf7099 into tc39:main Mar 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
editorial change establishes editorial conventions ready to merge Editors believe this PR needs no further reviews, and is ready to land.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants