Define string size #74

Closed
annevk opened this issue Mar 17, 2017 · 10 comments

Comments

@annevk
Member

annevk commented Mar 17, 2017

In particular for JavaScript string, see #73, we need something like code-unit length from HTML (and then remove that from HTML and use our new concept).

Either we define size and for JavaScript string it's the number of code units and for scalar value string it's the number of scalar values, or size is always code points and we have code-unit size just for JavaScript strings. The latter is probably slightly better since it makes it more explicit?
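To make the distinction concrete, here is a minimal sketch of the two counts for a JavaScript string. The function names `codeUnitLength` and `codePointLength` are hypothetical, chosen only for illustration, and are not terms the standard defines. The sketch assumes a lone surrogate counts as one code point.

```typescript
// Number of UTF-16 code units; this is what JavaScript's `length` reports.
function codeUnitLength(s: string): number {
  return s.length;
}

// Number of code points: a high surrogate immediately followed by a low
// surrogate counts once; a lone surrogate counts as one code point
// (an assumption of this sketch).
function codePointLength(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const isHighSurrogate = unit >= 0xd800 && unit <= 0xdbff;
    if (isHighSurrogate && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low surrogate of the pair
      }
    }
    count++;
  }
  return count;
}

const sample = "a\uD83D\uDCA9"; // "a" followed by U+1F4A9
const sampleCodeUnits = codeUnitLength(sample); // 3
const sampleCodePoints = codePointLength(sample); // 2
```

For a scalar value string, which by definition contains no surrogates, the code point count and the scalar value count coincide.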

@domenic
Member

domenic commented Mar 17, 2017

I think I agree the latter is better.

Nit: should probably be "length" instead of "size" as is traditional for strings.

@annevk
Member Author

annevk commented Mar 20, 2017

I think maybe, just as with size, we should use the for attribute to distinguish. So we can have for="JavaScript string" (code units), for=string (code points), and for="byte sequence" (bytes).

We can do the same for sorting, which we also need for byte sequences (see Fetch).

I wanted to start working on this but I'm blocking myself on #73 since I don't want to deal with merge conflicts. Suggestions for how to formally define sorting would be welcome by the way.

@domenic
Member

domenic commented Mar 20, 2017

Hmm. I guess if people are being very good about distinguishing their string types, then just using "length" and linking to the appropriate definition (with for="") should work OK. The benefit of having explicit separate length concepts (e.g. code-unit length and code-point length) is that it makes you re-emphasize the difference at the point at which you determine the length. But if people are careful with their types that's not necessary.

Formally defining sorting: based on https://en.wikipedia.org/wiki/Sorting_algorithm, the key is to define an ordering relation (x <= y) and then state that the sorted version is the unique permutation of the list such that it's in non-decreasing order. This permutation will not be unique if it's possible to have two things that are equal (x <= y && y <= x) but are not truly identical, e.g. if two strings have identity (which I don't believe they should).

Maybe it'd be best to just define ordering relations actually, and leave sorting with its English/computer science definition. Once we have well-defined ordering relations it's pretty obvious IMO.
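The ordering-relation approach described above can be sketched as follows. This is only an illustration under stated assumptions: the names `codeUnitLessThanOrEqual`, `byteLessThanOrEqual`, `isSorted`, and `sortBy` are hypothetical, and the relation shown (element-wise comparison, with a proper prefix ordered first) is one plausible choice, not something this thread has settled on.

```typescript
// Ordering relation (a <= b) comparing UTF-16 code units one by one;
// a proper prefix is ordered before the longer string.
function codeUnitLessThanOrEqual(a: string, b: string): boolean {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a.charCodeAt(i);
    const y = b.charCodeAt(i);
    if (x !== y) return x < y;
  }
  return a.length <= b.length;
}

// The same relation for byte sequences, modelled here as Uint8Array.
function byteLessThanOrEqual(a: Uint8Array, b: Uint8Array): boolean {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] < b[i];
  }
  return a.length <= b.length;
}

// A list is sorted under `leq` when every adjacent pair is non-decreasing.
function isSorted<T>(list: T[], leq: (x: T, y: T) => boolean): boolean {
  for (let i = 1; i < list.length; i++) {
    if (!leq(list[i - 1], list[i])) return false;
  }
  return true;
}

// Returns a sorted copy by adapting the relation to a three-way comparator.
function sortBy<T>(list: T[], leq: (x: T, y: T) => boolean): T[] {
  return list
    .slice()
    .sort((x, y) => (leq(x, y) ? (leq(y, x) ? 0 : -1) : 1));
}

const input = ["b", "\uD83D\uDCA9", "\uFF61", "a"];
const sortedByCodeUnit = sortBy(input, codeUnitLessThanOrEqual);
// Order: "a", "b", "\uD83D\uDCA9", "\uFF61"
```

Note that the choice of relation matters: under code unit order U+1F4A9 (leading code unit 0xD83D) comes before U+FF61, whereas under code point order U+FF61 would come first.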

@annevk
Member Author

annevk commented Mar 22, 2017

For lists we use "size" rather than "length", contradicting ECMAScript. Using size consistently might be easier (since you just need to remember the correct for value). WDYT?

@domenic
Member

domenic commented Mar 23, 2017

ECMAScript is a bit weird. Most of its data structures (Maps and Sets) use size. I think if arrays were designed today they might use size. Spec "List" type does use length though, it's true.

I think it's best not to blur the line between data structures and strings/byte sequences, and using different words (size vs. length) helps enforce this.

@annevk
Member Author

annevk commented Mar 23, 2017

Given what List does, I think we should also change lists while we can (or maybe accept both for a while). Having one exception is going to be annoying.

@domenic
Member

domenic commented Mar 23, 2017

I don't understand. What is the exception? What would we change about lists?

@annevk
Member Author

annevk commented Mar 23, 2017

If we align with ECMAScript on size/length except for lists, lists would be an exception. Therefore we'd change it to use length as well.

@annevk
Member Author

annevk commented Mar 23, 2017

(Or we consistently use size.)

@domenic
Member

domenic commented Mar 23, 2017

I think it's OK for infra to set its own rules here. For data structures, infra uses size. For strings, it uses length. What ECMAScript does can be separate.

I don't think we should make an exception for lists and say they use length even though all other data structures use size.
