Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recent <ref> changes do not work for long reference names #153

Closed
domenic opened this issue Aug 6, 2023 · 3 comments · Fixed by #154
Closed

Recent <ref> changes do not work for long reference names #153

domenic opened this issue Aug 6, 2023 · 3 comments · Fixed by #154

Comments

@domenic
Copy link
Member

domenic commented Aug 6, 2023

E.g. in the live https://html.spec.whatwg.org/ now, we have [JSIMPORTATTRIBU] and [CSSVIEWTRANSITI].

No idea why this would be happening... 4cabebb#diff-c1f20712f6803c4501d76807b80642566e51f13847a8c9dcc91d48b366919ece looks pretty reasonable. Maybe something weird about Pascal string semantics...

@domenic
Copy link
Member Author

domenic commented Aug 6, 2023

Claude says:

The issue is that the code is now using the TextContent of the element as the reference name directly, without checking the length.

The TextContent is represented by a rope, which has a default maximum length of 15 characters. So when you assign the TextContent directly to ReferenceName, it gets truncated to 15 characters.

To fix it, you need to convert the rope to a string with no length limit before assigning it to ReferenceName:

-            ReferenceName := Element.TextContent.AsString;
+            ReferenceName := Element.TextContent.ToString;

This will convert the rope to a string with no length limit, preserving the full reference name.

I'll try this tomorrow.

domenic added a commit that referenced this issue Aug 7, 2023
@domenic
Copy link
Member Author

domenic commented Aug 7, 2023

Claude was just hallucinating :(. ToString is not a real method, and there is no default maximum length of ropes. Indeed, ropes are entirely implemented in the Wattsi codebase.

When I tried to upload the ropes.pas file to get a better answer, it just started being dumb, like saying "replace this line of code with [the exact same line of code]".

My best guess as to what happened here is that there are multiple overloads of Append(), and calling Append(aString) will somehow try to treat that as a short inline string of maximum length 15 (I think that's UTF8InlineSize in ropes.pas?). Whereas calling Append(@aString) will trigger the correct overload.

@sideshowbarker
Copy link
Contributor

sideshowbarker commented Aug 8, 2023

My best guess as to what happened here is that there are multiple overloads of Append(), and calling Append(aString) will somehow try to treat that as a short inline string of maximum length 15 (I think that's UTF8InlineSize in ropes.pas?). Whereas calling Append(@aString) will trigger the correct overload.

Yeah. And I opened #155 with patch that attempts to ensure we won’t run into this again (or will catch it when we do).

For the record here, what we got bit with is that the Append() procedure for “Ropes” allows directly passing in a string, but also passing in a pointer to a string:

wattsi/src/lib/ropes.pas

Lines 122 to 126 in 4cabebb

procedure Append(const NewString: PUTF8String); inline;
procedure Append(const NewString: RopeInternals.TInlineString); inline;
procedure Append(const NewString: TUnicodeCodepoint); inline;
procedure Append(const NewString: TUnicodeCodepointArray); inline; // length of data must be <= CodepointsSize (2)
procedure Append(const NewData: Pointer; const NewLength: QWord); inline;

PUTF8String is a pointer to a string; RopeInternals.TInlineString is a string type length-limited to UTF8InlineSize.

The intent of allowing strings to be passed in directly seems to have been just for the case of passing in very short string constants, when the string length is explicitly known not to exceed UTF8InlineSize (which in practice seems to be 15).

But in general as far as I can see, the documented behavior of the language and runtime is that any time you try to use a particular string type with a string whose length exceeds the specified size of that particular string type, the runtime silently truncates the string down to the type’s specified size.

So I guess we really shouldn’t be using Append() with strings directly unless they’re very short string literals — such as Scratch.Append('#refs') — and otherwise if we already have the string in a variable, call Append() with a pointer to the address of that variable: Append(@ReferenceName). And the patch in #155 is an attempt at formalizing that in the code.

domenic added a commit that referenced this issue Aug 8, 2023
sideshowbarker added a commit that referenced this issue Aug 9, 2023
This change reverts 0ad4342 and increases to 255 the length of strings
that can be passed directly to Rope≫Append() without getting truncated.
Otherwise, without this change, any string passed directly to
Rope≫Append() whose length exceeds a particular system-dependent limit
(which in practice seems to be 15) will get silently truncated.

Relates to #153.
sideshowbarker added a commit that referenced this issue Aug 16, 2023
This change adds a Contributing section to the README.md, and cites it
from the error message we emit when a string is passed to Rope≫Append()
that’s larger than the allowed string size.

Relates to #153.
sideshowbarker added a commit that referenced this issue Aug 16, 2023
This change adds a Contributing section to the README.md, and cites it
from the error message we emit when a string is passed to Rope≫Append()
that’s larger than the allowed string size.

Relates to #153.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants