Define string size #74

annevk · 2017-03-17T10:44:38Z

In particular for JavaScript strings (see #73), we need something like HTML's code-unit length (which we would then remove from HTML in favor of our new concept).

Either we define size and for JavaScript string it's the number of code units and for scalar value string it's the number of scalar values, or size is always code points and we have code-unit size just for JavaScript strings. The latter is probably slightly better since it makes it more explicit?
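To make the difference between the two counts concrete, here is a minimal sketch in TypeScript. The function names `codeUnitLength` and `codePointLength` are hypothetical and chosen only for illustration; they are not proposed spec terms. It assumes a lone surrogate counts as one code point.

```typescript
// Number of UTF-16 code units; this is what JavaScript's `length` reports.
function codeUnitLength(s: string): number {
  return s.length;
}

// Number of code points: a lead surrogate immediately followed by a trail
// surrogate counts once; a lone surrogate counts as one code point.
function codePointLength(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const isLead = unit >= 0xd800 && unit <= 0xdbff;
    if (isLead && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the trail surrogate of a well-formed pair
      }
    }
    count++;
  }
  return count;
}
```

For a string such as "a" followed by U+1F4A9, the two functions disagree: the code point outside the BMP takes two code units but is a single code point.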

domenic · 2017-03-17T18:10:25Z

I think I agree the latter is better.

Nit: should probably be "length" instead of "size" as is traditional for strings.

annevk · 2017-03-20T08:38:03Z

I think maybe, just as with size, we should use for="" to distinguish. So we can have for="JavaScript string" (code units), for=string (code points), and for="byte sequence" (bytes).

We can do the same for sorting, which we also need for byte sequences (see Fetch).

I wanted to start working on this but I'm blocking myself on #73 since I don't want to deal with merge conflicts. Suggestions for how to formally define sorting would be welcome by the way.
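As a rough illustration of what length and ordering could look like for byte sequences, here is a sketch in TypeScript. Both `byteSequenceLength` and `byteLessThan` are hypothetical names, and the ordering shown (lexicographic by byte value, with a proper prefix sorting first) is an assumption for illustration, not something this thread has settled.

```typescript
// Length of a byte sequence: simply the number of bytes it contains.
function byteSequenceLength(bytes: Uint8Array): number {
  return bytes.length;
}

// Assumed ordering: compare byte by byte; at the first difference the smaller
// byte wins. If one sequence is a proper prefix of the other, the shorter one
// is considered less.
function byteLessThan(a: Uint8Array, b: Uint8Array): boolean {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) {
      return a[i] < b[i];
    }
  }
  return a.length < b.length;
}
```

Sorting a list of byte sequences in ascending order would then mean arranging it so that no later item is `byteLessThan` an earlier one.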

domenic · 2017-03-20T20:04:36Z

Hmm. I guess if people are being very good about distinguishing their string types, then just using "length" and linking to the appropriate definition (with for="") should work OK. The benefit of having explicit separate length concepts (e.g. code-unit length and code-point length) is that it makes you re-emphasize the difference at the point at which you determine the length. But if people are careful with their types that's not necessary.

Formally defining sorting: based on https://en.wikipedia.org/wiki/Sorting_algorithm, the key is to define an ordering relation (x <= y) and then state that the sorted version is the unique permutation of the list such that it's in non-decreasing order. This permutation will not be unique if it's possible to have two things that are equal (x <= y && y <= x) but are not truly identical, e.g. if two strings have identity (which I don't believe they should).

Maybe it'd be best to just define ordering relations actually, and leave sorting with its English/computer science definition. Once we have well-defined ordering relations it's pretty obvious IMO.
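A minimal sketch of that approach in TypeScript, assuming the ordering relation is "code unit order" (compare UTF-16 code units left to right, with a proper prefix sorting first). The names `codeUnitLessOrEqual`, `isNonDecreasing`, and `sortByCodeUnits` are hypothetical and used only for illustration.

```typescript
// The ordering relation x <= y, assumed here to be code unit order.
function codeUnitLessOrEqual(x: string, y: string): boolean {
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    const a = x.charCodeAt(i);
    const b = y.charCodeAt(i);
    if (a !== b) {
      return a < b;
    }
  }
  return x.length <= y.length;
}

// "Sorted" means: every adjacent pair satisfies the ordering relation.
function isNonDecreasing(list: readonly string[]): boolean {
  for (let i = 1; i < list.length; i++) {
    if (!codeUnitLessOrEqual(list[i - 1], list[i])) {
      return false;
    }
  }
  return true;
}

// Returns a sorted copy; the comparator is derived purely from the relation.
function sortByCodeUnits(list: readonly string[]): string[] {
  return list.slice().sort((x, y) => {
    if (!codeUnitLessOrEqual(x, y)) return 1;
    return codeUnitLessOrEqual(y, x) ? 0 : -1;
  });
}
```

Because strings here are plain values, x <= y and y <= x together imply the two strings are identical, so the non-decreasing permutation is unique. Note that code unit order and code point order can differ: U+1F4A9 starts with the code unit 0xD83D and therefore sorts before U+FF61 in this sketch, even though its code point is larger.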

annevk · 2017-03-22T17:34:19Z

For lists we use "size" rather than "length", contradicting ECMAScript. Using size consistently might be easier (since you just need to remember the correct for value). WDYT?

domenic · 2017-03-23T02:29:08Z

ECMAScript is a bit weird. Most of its data structures (Maps and Sets) use size. I think if arrays were designed today they might use size. The spec's "List" type does use length though, it's true.

I think it's best not to blur the line between data structures and strings/byte sequences, and using different words (size vs. length) helps enforce this.

annevk · 2017-03-23T06:10:13Z

Given what List does, I think we should also change lists while we can (or maybe accept both for a while). Having one exception is going to be annoying.

domenic · 2017-03-23T06:17:35Z

I don't understand. What is the exception? What would we change about lists?

annevk · 2017-03-23T06:22:26Z

If we align with ECMAScript on size/length except for lists, lists would be an exception. Therefore we'd change it to use length as well.

annevk · 2017-03-23T06:22:47Z

(Or we consistently use size.)

domenic · 2017-03-23T07:08:34Z

I think it's OK for infra to set its own rules here. For data structures, infra uses size. For strings, it uses length. What ECMAScript does can be separate.

I don't think we should make an exception for lists and say they use length even though all other data structures use size.


This was referenced Mar 17, 2017

Tracker for things to move here #6

Closed

Define string sorting by "code unit order" #55

Closed

annevk added a commit that referenced this issue Mar 27, 2017

Define length for byte sequences and strings

80afd6d

Fixes #74.

annevk mentioned this issue Mar 27, 2017

Define length for byte sequences and strings #105

Merged

annevk closed this as completed in #105 Mar 27, 2017

annevk added a commit that referenced this issue Mar 27, 2017

Define length for byte sequences and strings

c484a4f

Fixes #74.
