socram8888

Overhauled the script to extract all available revisions for each of the standards, so it is possible to link to a specific one.

The main URL for every Unicode standard now also points to the latest version live on their website.

    str += node.textContent;
  }
- return str;
+ return trimText(str);
Collaborator


Calling trimText here introduces a side effect: a few pages (e.g., UTS37) separate authors with line breaks, and trimText replaces every run of whitespace and line terminators with a single space. The parseEditor function is then unable to split authors, as \n no longer matches anything.
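
To make the side effect concrete, here is a minimal sketch. Both `trimText` and `splitAuthors` below are hypothetical stand-ins written for illustration, based only on the behavior described above; the real `trimText` and `parseEditor` in the script may differ, and the author names are made up.

```javascript
// Hypothetical stand-in for trimText: collapses every run of whitespace,
// including line terminators, into a single space.
function trimText(str) {
  return str.replace(/\s+/g, " ").trim();
}

// Hypothetical stand-in for the author-splitting step of parseEditor:
// expects one author per line.
function splitAuthors(str) {
  return str
    .split("\n")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

const raw = "Jane Doe\nJohn Smith\n"; // made-up authors, one per line

console.log(splitAuthors(raw));           // two separate entries
console.log(splitAuthors(trimText(raw))); // a single merged entry
```

Once the line breaks are gone, the split has nothing left to match, so both names end up in one entry.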

Author

@socram8888 socram8888 Sep 24, 2025


I see. The reason for the change is that some documents have inline coloring via <span>s.

For example, TR35-3 has its version set to:

1<span>.</span><span class="changed">2 (draft 5)</span>

The original code turned this into:

1 . 2 (draft 5)

I will think of a better implementation that works for all cases.
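
One possible direction, sketched under assumptions rather than taken from the PR: concatenate inline text without adding separators, then collapse spaces and tabs within each line while keeping the line breaks. The helpers `trimLines` and `collectText` are hypothetical names, and the DOM subtree is modeled with plain objects so the example runs without a browser.

```javascript
// Hypothetical helper: collapse spaces and tabs within each line, but keep
// line breaks so that a later split on "\n" still works.
function trimLines(str) {
  return str
    .split(/\r?\n/)
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

// Hypothetical model of a DOM subtree as plain objects:
// a node is either { text } or { children: [...] }.
function collectText(node) {
  if (typeof node.text === "string") return node.text;
  // Join children with no separator so that inline <span>s do not
  // introduce extra spaces.
  return node.children.map(collectText).join("");
}

// Models: 1<span>.</span><span class="changed">2 (draft 5)</span>
const version = {
  children: [
    { text: "1" },
    { children: [{ text: "." }] },
    { children: [{ text: "2 (draft 5)" }] },
  ],
};

console.log(trimLines(collectText(version)));
console.log(trimLines("  Jane Doe \n\n  John   Smith ").split("\n"));
```

With this split of responsibilities, the version string stays contiguous and line-separated author lists keep their line breaks.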

Collaborator

tidoust commented Sep 24, 2025

The update drops rawDate at the root level. Now, I realize that SpecRef is somewhat inconsistent there: the date is always set for W3C entries (to the date of the latest version), sometimes for entries with versions in biblio.json, and never for WHATWG entries. Given that Unicode specs are not updated on a continuous basis, I would report the latest date at the root level as well, so that specs that care about dates can display it when they reference the spec.

Author

socram8888 commented Sep 24, 2025

I can add that, no problem, but I'm not sure it's a good idea in general for versioned entries when no particular version is being referenced.

If they want to explicitly state the last version they checked for compatibility, alongside its date, they can now reference a particular version.

For a non-specific version, however, the date would cause the documents referring to it to also change the date any time they're recompiled, even if the writer has not actually checked the newer version to be fully compatible with the documentation.

For example, UTS46-33 made some changes to the processing that were not covered by the WHATWG URL spec at the time, and the spec needed some changes (whatwg/url#836). With the date there, any recompilation of the WHATWG URL document between the release of UTS46-33 and the amending of the WHATWG URL standard would also update the date, incorrectly implying that the UTS46-33 changes had already been taken into account.

IMO if they want to reference a non-specific version with a check date, that date should be stated manually by the writer, as the compilation time will be later than the time they checked it, and the refDate at the root level could be different.

Collaborator

@tidoust tidoust left a comment


I guess the argument goes both ways. That is, without any mention of date, you also imply that the latest version you're going to get when you retrieve the URL was the one taken into account. That's what you get when you choose to reference "the latest version of a spec". With a date, you could at least theoretically speaking spot the fact that the document you're referencing has changed when you re-build your spec.

That said, it seems wrong to use the latest URL along with a date and then, when you click on the link, actually get a version published at a later date. That might explain why the script had been written this way: the URL and date at the root level were aligned, so editors who care about dates had both a way not to worry about versions while editing their spec ("just use the latest one") and an automated pinning mechanism when they published it.

In short, I think you're right: if we're going to use a non-specific URL, dropping the date does seem better.

I'm approving the PR but not merging immediately to leave a bit of time for other possible reviewers to chime in if they feel strongly about the direction here.

@tidoust tidoust requested a review from tobie September 24, 2025 11:11
Owner

tobie commented Sep 25, 2025

I think we should leave the date for the reason @tidoust mentioned here:

> I guess the argument goes both ways. That is, without any mention of date, you also imply that the latest version you're going to get when you retrieve the URL was the one taken into account. That's what you get when you choose to reference "the latest version of a spec". With a date, you could at least theoretically speaking spot the fact that the document you're referencing has changed when you re-build your spec.

@socram8888
Author

Would it make sense to do that outside the extraction script, though? Something like max([c.refdate for c in versions]), so that it's consistent across sources, which is currently not the case, as @tidoust said about W3C entries.
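
As a rough JavaScript equivalent of that expression, here is a minimal sketch. The entry shape is an assumption: `versions` is taken to map a version id to an object with an ISO 8601 `rawDate` string, and `latestRawDate` is a hypothetical helper name. The sample data is made up.

```javascript
// Assumed entry shape: `versions` maps a version id to an object with an
// ISO 8601 `rawDate` string (YYYY-MM-DD). Such strings sort chronologically
// when compared as plain strings.
function latestRawDate(entry) {
  const dates = Object.values(entry.versions || {})
    .map((v) => v.rawDate)
    .filter((d) => typeof d === "string");
  if (dates.length === 0) return null;
  return dates.reduce((max, d) => (d > max ? d : max));
}

// Made-up entry for illustration only.
const example = {
  href: "https://example.org/spec/",
  versions: {
    "1": { rawDate: "2020-01-15" },
    "2": { rawDate: "2023-06-30" },
    "3": {}, // undated versions are ignored
  },
};

console.log(latestRawDate(example));
```

Doing this in one shared place, rather than in each extraction script, is what would make the root-level date consistent across sources.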

@tobie
Copy link
Owner

tobie commented Sep 26, 2025

Possibly. I would argue for doing this in a separate PR, though.

@socram8888
Author

I guess that could be done in https://github.com/tobie/specref/blob/main/lib/bibref.js#L263-L270, e.g., if parent.rawDate is null, check whether the latest version's is not, and copy it.
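
A minimal sketch of that fallback, under assumptions: `inheritRawDate` is a hypothetical helper and is not the code at the linked lines of lib/bibref.js; `parent` and `latest` are assumed to be plain objects with an optional `rawDate` property, and the sample values are made up.

```javascript
// Hypothetical helper; not the actual code in lib/bibref.js.
// Copies rawDate from the latest version onto the parent entry only when the
// parent has none, so entries that already carry a date are left untouched.
function inheritRawDate(parent, latest) {
  if (parent.rawDate == null && latest && latest.rawDate != null) {
    return { ...parent, rawDate: latest.rawDate };
  }
  return parent;
}

const parent = { title: "Example Spec" };  // made-up entry without a date
const latest = { rawDate: "2025-01-01" };  // made-up latest version

console.log(inheritRawDate(parent, latest).rawDate);
console.log(inheritRawDate({ rawDate: "2024-05-05" }, latest).rawDate);
```

Returning a copy rather than mutating `parent` is a design choice of this sketch; the real code path may well update the entry in place.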
