Sorting nodes alphabetically is underspecified #598

Closed
aphillips opened this issue Sep 21, 2018 · 13 comments
Labels
agreed-to-close-during-mtg: i18n group has discussed and resolved to close, typically in telecon
close?: The related issue was closed by the Group but open here
needs-resolution: i18n expects this item to be resolved to their satisfaction.
s:webaudio: https://webaudio.github.io/web-audio-api/
t:char_sort: 5.7 Specifying sort and search functionality
wg:audio: https://www.w3.org/groups/wg/audio

@aphillips
Contributor

aphillips commented Sep 21, 2018

1.24. The MediaStreamAudioSourceNode Interface
https://www.w3.org/TR/2018/CR-webaudio-20180918/#mediastreamaudiosourcenode

This interface represents an audio source from a MediaStream. The track that will be
used as the source of audio and will be output from this node is the first MediaStreamTrack
whose kind attribute has the value "audio", when alphabetically sorting the tracks of
this MediaStream by their id attribute.

Sorting alphabetically seems to be underspecified. Given that MediaStreamTrack id attributes are recommended to be UUIDs and that the "alphabetic" sorting appears to be an arbitrary way to prioritize the item selection, defining the comparison as ASCII case-insensitive might be a good choice. However, since there is no actual restriction on what the values can be, I'd suggest that you replace "alphabetically" with ordering by code point (please note that this means the comparison is case-sensitive).

This is a tracker issue. Only discuss things here if they are i18n WG internal meta-discussions about the issue. Contribute to the actual discussion at the following link:

§ WebAudio/web-audio-api#1772

@aphillips aphillips added pending Issue not yet sent to WG, or raised by tracker tool & needing labels. s:webaudio https://webaudio.github.io/web-audio-api/ labels Sep 21, 2018
@aphillips aphillips removed the pending Issue not yet sent to WG, or raised by tracker tool & needing labels. label Sep 27, 2018
@svgeesus

Thanks for your comment. We discussed it on today's telecon. We agree that "alphabetically" is imprecise and that we should order by Unicode code point. We are not yet sure whether to do case folding, but suspect that this is indeed a good idea.

RFC 4122 section 3 requires that the characters be generated in lower case, while being case-insensitive on input, though some commonly used implementations violate this rule.
https://en.wikipedia.org/wiki/Universally_unique_identifier

So these are 128-bit numbers represented to users (and accepted as input) as hex strings, and thus constrained to US-ASCII.

Given the RFC recommends converting to lowercase on input, and as in practice implementations don't all do that, ASCII case-insensitive (as used by for example CSS) seems like an appropriate method.
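To make the two candidate comparisons concrete, here is a minimal TypeScript sketch. The sample ids and the helper names (`asciiLower`, `compareStrings`) are hypothetical and not taken from any spec. For ASCII-only ids like these, JavaScript's built-in string comparison gives the same result as code point order.

```typescript
// Hypothetical UUID-style ids, one generated in upper case and one in lower case.
const ids: string[] = [
  "a3bb189e-8bf9-3888-9912-ace4e6543002",
  "F47AC10B-58CC-4372-A567-0E02B2C3D479",
];

// Fold only ASCII A-Z to a-z; every other character is left untouched.
function asciiLower(s: string): string {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Plain case-sensitive comparison using the built-in string operators.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Case-sensitive: "F" (0x46) sorts before "a" (0x61).
const caseSensitive = ids.slice().sort(compareStrings);

// ASCII case-insensitive: "f" sorts after "a".
const asciiInsensitive = ids
  .slice()
  .sort((a, b) => compareStrings(asciiLower(a), asciiLower(b)));
```

With these two ids the methods pick different first elements, which is the small bias toward upper-case UUID generators that comes up later in the thread.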

@stpeter

stpeter commented Sep 27, 2018

@svgeesus A small note: the Media Capture and Streams spec says only that "A good practice is to use a UUID [rfc4122], which is 36 characters long in its canonical form" (this is not even a SHOULD), thus the concern about id values that are not UUIDs in the first place (basically it's just a DOMString).

@aphillips
Contributor Author

@svgeesus Thanks for the reply. Adding casefolding adds a level of complexity that you might not want, since it generally brings with it the need to do text normalization or the need to case fold only ASCII characters. My personal suggestion would be to do strict Unicode code point ordering, since the goal of the sort doesn't appear to be to produce any specific order.

@stpeter is right that the id value can be any DOMString, at least technically. But the original text I commented on appears to be providing a deterministic way of choosing arbitrarily when there is more than one audio item. Either ASCII or Unicode case-insensitive comparison requires more plumbing than is needed to do that (there is a tiny bias in favor of implementations that use uppercase in UUIDs, but not enough to make it worth the effort??)

@r12a
Contributor

r12a commented Oct 1, 2018

Chaps, this is not the place to hold this discussion since the Web Audio Working Group doesn't see it. (@svgeesus see the notes in the first comment: you intercepted our tracker issue before we had sent you the comment officially :-).) Addison, when you raise the issue in the target WG issue list could you copy over relevant comments there?

@stpeter

stpeter commented Oct 1, 2018

@r12a Noted!

@aphillips For the comments in the Web Audio WG tracker: if the spec said MUST be a UUID I'd have no concerns.

@r12a r12a added pending Issue not yet sent to WG, or raised by tracker tool & needing labels. needs-attention The i18n WG should urgently review the status of this item. labels Nov 20, 2019
@svgeesus

svgeesus commented Dec 5, 2019

Please note that we clarified the spec, and followed the advice here not to use any case-folding. The definition of sorting has been moved from the section introduction to the definition of the constructor, so the spec now says:

  1. Sort the elements in tracks based on their id attribute using lexicographic ordering on sequences of code unit values.

Accordingly, we closed our issue. Please feel free to re-open it if our changes do not adequately address your concern.
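As a rough illustration of the quoted step, the selection could be sketched in TypeScript as follows. `TrackLike` and `pickAudioTrack` are hypothetical names used only for this example, not part of the Web Audio API, and the sketch assumes that filtering to audio tracks before sorting is equivalent to taking the first audio track after sorting.

```typescript
// Hypothetical stand-in for the parts of MediaStreamTrack that matter here.
interface TrackLike {
  id: string;
  kind: string;
}

// Pick the audio track whose id sorts first by UTF-16 code unit values.
function pickAudioTrack(tracks: TrackLike[]): TrackLike | undefined {
  const audio = tracks.filter((t) => t.kind === "audio");
  // JavaScript's < and > on strings compare sequences of code units.
  audio.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return audio[0];
}

// Illustrative ids; real ids are arbitrary DOMStrings.
const picked = pickAudioTrack([
  { id: "b", kind: "audio" },
  { id: "a", kind: "video" },
  { id: "B", kind: "audio" },
]);
```

Here the track with id "B" is chosen: the video track is ignored, and "B" (0x42) sorts before "b" (0x62) because no case folding is applied.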

@aphillips
Contributor Author

I'll respond and reopen your issue presently. I'm in India today and rushing off to the office just now.

I don't think this works the way you think. The term "lexicographic" means, effectively, "alphabetical order", as opposed to code point order. In addition, I don't think you want code unit order (which is sensitive to the character encoding used) but rather you mean code points (Unicode Scalar Values)? I would tend to say "Sort the elements in tracks based on their id attribute, ordered by code point values" or similar. More in your issue anon.

@aphillips
Contributor Author

Commented at WebAudio/web-audio-api#1772 (comment). Please discuss on that thread.

@r12a r12a added close? The related issue was closed by the Group but open here and removed pending Issue not yet sent to WG, or raised by tracker tool & needing labels. labels Feb 26, 2020
@r12a
Contributor

r12a commented Feb 26, 2020

In the end they chose to sort by code unit, rather than by code point. See https://github.com/WebAudio/web-audio-api/pull/1811/files

@r12a r12a added the needs-resolution i18n expects this item to be resolved to their satisfaction. label Feb 26, 2020
@svgeesus

Yes, we chose to sort by code unit because Web Audio API is a JavaScript API, and JavaScript DOMStrings are in UTF-16 and sorted by code unit, not code point. Since these are opaque identifiers and all we want is a stable sort that works the same on all platforms and all implementations, code unit sort is the simplest and meets our needs. See rtoy/web-audio-api@986bae6
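The difference between the two orderings only shows up outside the range UUIDs use. The following TypeScript sketch (helper names are hypothetical) compares one BMP character with one supplementary-plane character, decoding surrogate pairs by hand so that it does not depend on any particular library version.

```typescript
// Compare by UTF-16 code units, which is what the built-in
// string relational operators do in JavaScript.
function compareByCodeUnit(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Decode a UTF-16 string into code points, pairing surrogates manually.
function toCodePoints(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        out.push((hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000);
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    out.push(hi); // BMP character or unpaired surrogate
  }
  return out;
}

// Compare by code point values, element by element.
function compareByCodePoint(a: string, b: string): number {
  const x = toCodePoints(a);
  const y = toCodePoints(b);
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return x[i] < y[i] ? -1 : 1;
  }
  return x.length - y.length;
}

const bmp = "\uFF5E"; // U+FF5E, a single code unit 0xFF5E
const astral = "\uD83D\uDE00"; // U+1F600, code units 0xD83D 0xDE00

const byCodeUnit = [bmp, astral].sort(compareByCodeUnit);
const byCodePoint = [bmp, astral].sort(compareByCodePoint);
```

By code unit, U+1F600 sorts first because its lead surrogate 0xD83D is less than 0xFF5E; by code point, U+FF5E sorts first. Both results are deterministic, and the two orders can only disagree when ids mix characters in U+E000..U+FFFF with supplementary-plane characters.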

@r12a @aphillips could you confirm that this is ok?

@aphillips
Contributor Author

I'm going to add discussion to the underlying issue. This issue is for tracking that issue (and for discussion about the issue, not of the issue itself). That said, my reply there is going to be:

The choice of sorting on code points or code units depends on what one wants to do with the resulting list or array of items. Because this is a list of opaque IDs and the goal is a fast deterministic sort, it makes all kinds of sense to do code unit sorting. So I'm satisfied with the result of the changes.

@aphillips
Contributor Author

Should be resolved. I kept the "need attention" label for now so we can close this in our next telecon.

@himorin
Contributor

himorin commented Jan 29, 2021

Spec text changed from "alphabetically sorting" of id to "using lexicographic ordering on sequences of code unit values" (PR), and was later updated to remove "lexicographic" and to add a reference to whatwg/infra "code units", making it explicit that the sort uses UTF-16 code units.
Issue resolved, and closing this tracker (agreed during weekly call)

@himorin himorin closed this as completed Jan 29, 2021
@himorin himorin added agreed-to-close-during-mtg i18n group has discussed and resolved to close, typically in telecon and removed needs-attention The i18n WG should urgently review the status of this item. labels Jan 29, 2021
@r12a r12a added the t:char_sort 5.7 Specifying sort and search functionality label Aug 10, 2022
@w3cbot w3cbot added the wg:audio https://www.w3.org/groups/wg/audio label Feb 9, 2024
6 participants