Define string size #74
Comments
I think I agree the latter is better. Nit: should probably be "length" instead of "size", as is traditional for strings.
I think maybe just as with size, we should use […]. We can do the same for sorting, which we also need for byte sequences (see Fetch). I wanted to start working on this but I'm blocking myself on #73 since I don't want to deal with merge conflicts. Suggestions for how to formally define sorting would be welcome, by the way.
Hmm. I guess if people are being very good about distinguishing their string types, then just using "length" and linking to the appropriate definition (with for="") should work OK. The benefit of having explicit separate length concepts (e.g. code-unit length and code-point length) is that it makes you re-emphasize the difference at the point at which you determine the length. But if people are careful with their types that's not necessary.

Formally defining sorting: based on https://en.wikipedia.org/wiki/Sorting_algorithm, the key is to define an ordering relation (x <= y) and then state that the sorted version is the unique permutation of the list such that it's in non-decreasing order. This permutation will not be unique if it's possible to have two things that are equal (x <= y && y <= x) but are not truly identical, e.g. if two strings have identity (which I don't believe they should).

Maybe it'd be best to just define ordering relations, actually, and leave sorting with its English/computer science definition. Once we have well-defined ordering relations it's pretty obvious IMO.
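A non-normative sketch of that approach in TypeScript; everything here (the function name, the code-unit-wise comparison) is an illustrative assumption, not anything Infra defines:

```ts
// Hypothetical ordering relation: returns true when a <= b, comparing
// UTF-16 code units lexicographically.
function codeUnitLessThanOrEqual(a: string, b: string): boolean {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a.charCodeAt(i);
    const y = b.charCodeAt(i);
    if (x !== y) return x < y;
  }
  // All compared code units are equal: the shorter string sorts first.
  return a.length <= b.length;
}

// "Sorted in ascending order" then just means: a permutation of the list
// in which every adjacent pair x, y satisfies the relation.
const sorted = ["b", "a", "ab"].sort((x, y) =>
  codeUnitLessThanOrEqual(x, y) && codeUnitLessThanOrEqual(y, x)
    ? 0
    : codeUnitLessThanOrEqual(x, y)
      ? -1
      : 1
);
// sorted === ["a", "ab", "b"]
```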
For lists we use "size" rather than "length", contradicting ECMAScript. Using size consistently might be easier (since you just need to remember the correct for value). WDYT?
ECMAScript is a bit weird. Most of its data structures (Maps and Sets) use size. I think if arrays were designed today they might use size. Spec "List" type does use length though, it's true. I think it's best to not blur the line between data structures and strings/byte sequences, and using different words (size vs. length) helps enforce this I think. |
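For reference, ECMAScript's own split, runnable in any modern engine:

```ts
// Newer collections expose "size"...
new Map([["a", 1]]).size; // 1
new Set([1, 2, 3]).size;  // 3

// ...while arrays and strings expose "length".
[1, 2].length; // 2
"ab".length;   // 2
```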
Given what List does I think we should also change lists while we can (or maybe accept both for a while). Having one exception is going to be annoying.
I don't understand. What is the exception? What would we change about lists? |
If we align with ECMAScript on size/length except for lists, lists would be an exception. Therefore we'd change them to use length as well.
(Or we consistently use size.) |
I think it's OK for infra to set its own rules here. For data structures, infra uses size. For strings, it uses length. What ECMAScript does can be separate. I don't think we should make an exception for lists and say they use length even though all other data structures use size. |
In particular, for JavaScript strings (see #73) we need something like code-unit length from HTML (and then remove that definition from HTML and use our new concept).
Either we define size such that for a JavaScript string it's the number of code units and for a scalar value string it's the number of scalar values, or size is always the number of code points and we have a separate code-unit size just for JavaScript strings. The latter is probably slightly better since it makes the difference more explicit?
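A rough sketch of the latter option (code points as the default size, code units as the JavaScript-string-specific concept); the helper name is made up for this example:

```ts
// Hypothetical helper: counts code points by iterating the string;
// iteration yields one value per code point, so a surrogate pair counts once.
const codePointSize = (s: string): number => [...s].length;

const s = "a\u{1F600}"; // one BMP character plus one astral character

codePointSize(s); // 2 -- the default "size"
s.length;         // 3 -- the "code-unit size", since the astral character
                  //      occupies a surrogate pair in a JavaScript string
```

For a scalar value string (no lone surrogates), code points and scalar values coincide, so under the first option the scalar-value count would also give 2 here.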