remaining variable ambiguity #308

Closed
TimothyGu opened this issue May 4, 2017 · 10 comments

Comments

@TimothyGu
Member

The EOF code point is a conceptual code point that signifies the end of a string or code point stream.

remaining references the substring after pointer in the string being processed.

File state: … remaining consists of zero code points

My understanding is that this condition in file state really covers two cases:

  • when c is the last code point of input
  • when c is the EOF code point

Is this understanding correct?

(I know in this case it doesn't really matter, as the case when c is the EOF code point will take a different switch-case above, but I just wanted to make sure remaining is properly understood and spec'd if necessary.)
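To make the two cases concrete, here is a minimal sketch, assuming remaining is simply the substring after pointer and that pointer equal to the length of the input means c is the EOF code point. The helper name `remaining` and the sample input `"ab"` are hypothetical and not part of the specification.

```python
def remaining(s: str, pointer: int) -> str:
    """Return the substring after pointer (hypothetical helper, not spec text)."""
    return s[pointer + 1:]


inp = "ab"

# pointer 0: c is "a",                    remaining is "b"
# pointer 1: c is "b" (last code point),  remaining is ""
# pointer 2: c is the EOF code point,     remaining is ""
cases = [(p, remaining(inp, p)) for p in range(len(inp) + 1)]

for pointer, rest in cases:
    print(pointer, repr(rest))
```

Under this reading, "remaining consists of zero code points" holds both at the last code point and at EOF, which is the ambiguity the question is about.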

@annevk
Member

annevk commented May 4, 2017

Yeah, I think that's correct.

@annevk
Member

annevk commented May 4, 2017

I wouldn't mind figuring out a better way to write this down though, eventually. It's not the best algorithm style.

@rmisev
Member

rmisev commented May 4, 2017

I think we should add a note stating that remaining is accessed (used) only when c isn't the EOF code point.
I checked, and this is consistent with the current specification.

I think such a note, or a similar one, would clarify the definition and be valuable for developers.

@annevk
Member

annevk commented May 4, 2017

Sounds reasonable.

@GPHemsley
Member

GPHemsley commented Jun 4, 2017

I think it's also important to define the behavior of c when pointer is less than 0.

This scenario happens during the transition between "scheme start state" and "scheme state" at the beginning of the string, where pointer is set to -1 for the rest of the run. Though c is not checked during this time, it should be clear what would happen if it were.

Should c just act as if pointer were 0 and point to the first code point? Or should it not reference any code point?

Similarly, if pointer is incremented past the EOF, what does c represent? Does the string end in an infinite number of EOFs? Or just one, with any pointer value after that not referencing any code point?

@rmisev
Member

rmisev commented Jun 4, 2017

All the cases you mention involve accessing c when pointer is out of bounds. The behaviour is described for lists here: https://infra.spec.whatwg.org/#list

An indexing syntax can be used by providing a zero-based index into a list inside square brackets. The index cannot be out-of-bounds, except when used with exists.

I think the same must be defined for strings, with the EOF code point as the only exception (when pointer = string.length). So any attempt to access c when pointer is out of bounds (pointer < 0 or pointer > string.length) must be considered a bug.
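A strict reading of this proposal could be sketched as follows. This is only an illustration under stated assumptions: the function name `code_point_at` and the `EOF` sentinel are hypothetical, and raising `IndexError` is just one way to model "must be considered a bug".

```python
# Sentinel standing in for the conceptual EOF code point (an assumption of
# this sketch; the specification does not prescribe a representation).
EOF = object()


def code_point_at(s: str, pointer: int):
    """Return c for the given pointer, treating out-of-bounds access as a bug."""
    if pointer < 0 or pointer > len(s):
        # Analogous to out-of-bounds indexing into an Infra list.
        raise IndexError(f"pointer {pointer} is out of bounds for length {len(s)}")
    if pointer == len(s):
        # The single permitted exception: one position past the last code point.
        return EOF
    return s[pointer]
```

For example, `code_point_at("ab", 2)` returns the `EOF` sentinel, while `code_point_at("ab", -1)` and `code_point_at("ab", 3)` raise.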

@GPHemsley
Member

I agree that the definitions should be analogous, though I don't know if I would call out-of-bounds a "bug" so much as undefined or similar.

And the existence of remaining further complicates the issue: what if pointer is out of bounds but pointer + 1 isn't? Or vice versa? Does the definition of remaining require that c be defined?

@rmisev
Member

rmisev commented Jun 4, 2017

Well, what do you think about the following explanation:

c is defined and accessed only when pointer is within bounds of the string or points to the EOF code point.

remaining is defined and accessed only when pointer is within bounds of the string.

@GPHemsley
Member

That appears to be exactly what I've implemented since your previous comment:

  • pointer points to the EOF code point if pointer equals the length of string
  • c is undefined if pointer is less than 0 or greater than the length of string
  • remaining is undefined if c is undefined or if pointer points to the EOF code point
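The three rules above could be modelled roughly like this. It is a sketch under assumptions, not the implementation referred to in the comment: the `Cursor` class and the `EOF` and `UNDEFINED` sentinels are hypothetical names, and "undefined" is represented as a value rather than an error.

```python
from dataclasses import dataclass

EOF = object()        # stands in for the conceptual EOF code point
UNDEFINED = object()  # stands in for "not defined for this pointer"


@dataclass
class Cursor:
    string: str
    pointer: int = 0

    @property
    def c(self):
        n = len(self.string)
        if self.pointer < 0 or self.pointer > n:
            return UNDEFINED          # rule 2: out of bounds
        if self.pointer == n:
            return EOF                # rule 1: pointer equals the length
        return self.string[self.pointer]

    @property
    def remaining(self):
        current = self.c
        if current is UNDEFINED or current is EOF:
            return UNDEFINED          # rule 3
        # Substring after pointer; empty when c is the last code point.
        return self.string[self.pointer + 1:]
```

With these rules, `Cursor("ab", 1).remaining` is the empty string, while `Cursor("ab", 2).remaining` and `Cursor("ab", -1).c` are both `UNDEFINED`. Whether the empty tail at the last code point should instead be thought of as a trailing EOF is a modelling choice this sketch does not settle.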

@GPHemsley
Member

To get back to the original question: if c points to the last code point in the string, remaining necessarily consists of only the EOF code point. Once c points to the EOF code point, remaining becomes undefined.
