Object Keys must omit [S]tring marker #28

ghost · 2013-05-29T20:23:08Z

JSON dictates that Keys can only be Strings. Spending some time reading online it looks like non-String keys, while supported by some parsers and platforms, are problomatic and can lead to hard-to-handle issues -- namely because not ALL platforms or parsers support it.

Right now it is entirely possible to write valid UBJSON that uses non-string values as Keys. This breaks the transitivity of UBJSON to JSON, back to UBJSON again.

This proposal is two fold:

Require that keys are STRINGS.
Require that the S markers are omitted from objects when writing out the keys, saving 1 byte per entry.

I win here is that when moving from Draft 8 to Draft 9 when we normalized lengths on the integer numeric types, we grew Strings by 1 byte each. This would be a nice savings by re-reducing the size of objects again.

Using the example from the Type Ref page:
http://ubjson.org/type-reference/

This leads to a 7% reduction in size.

I realize this requires parsers to maintain more state than they currently have, but it shouldn't be more than a toggle indicating if a KEY or VALUE is being parsed; if the toggle is KEY then the marker should be i/I/l/L -- if it isn't, then an exception.

kxepal · 2013-05-29T20:48:19Z

While it's looks oblivious optimization, I feel "explicit is better than implicit" rule worth something.

However, if Draft 10 ships with STC type, UBJSON parser will be already stateful and this optimization is acceptable.

Moreover, this optimization makes more complex to check for valid object key data: previously parser just read single byte and had check it for equality to S value. If key is malformed, you'd got assertion at this point. Now it have to check it against B/i/I/l/L values, but you get an assertion only at value decoding since encoded key may be just a number. Since my thoughts aren't clear at this night, short example:

Sample hash-map data for some generic language:

{"foo": "bar", 4: "baz"}

UBJSON data:

[{]
    [S][i][3][foo][S][i][3][bar]
    [i][4][S][i][3][baz]
[}]

Draft 9 decoder will raise assertion for [i][4] bytes sequence since it's malformed object key.

Draft 10 will count [i][4] as "string key with length 4" and will try to read 4 bytes ahead. It will count [S][i][3][b] as key and will fail on decoding [a] marker raising exception about malformed value - but it is not.

Sure the encoder should validate hash map keys before serialization, but nobody perfect.

0.33(3) for this change.

ghost · 2013-05-29T21:34:38Z

Alex, you have an uncanny ability to think of edge cases faster than anyone I've ever met; I love it :)

I want this simplification (just like I still want #27 ) but the edge case you identified concerns me. We lose the ability to accurately/quickly communicate the exact position of a malformed construct in the stream.

You identified this edge case very quickly, makes me wonder what other edge cases we don't see yet that might bite us in the ass.

rogusdev · 2013-05-30T12:23:39Z

Actually, in the example given, draft 10 should raise an exception on the very second byte: [S], not on [i][4]. When it encounters the [S] byte as the beginning of the key string's length, [S] is not a valid integer type, so that is the exception here, and is in fact the exact position of the malformed data, so everything still works properly, to my mind.

ghost · 2013-05-30T14:20:02Z

@rogusdev True, but changing the example to read:

[{]
    [i][3][foo][S][i][3][bar]
    [i][4][S][i][3][baz]
[}]

still illustrates Alex's point fairly well. Removing the markers does make parsing a bit more sloppy in the face of invalid data unfortunately.

ghost · 2013-05-30T22:41:26Z

Closing this -- after discussing it at length with others it does start to make the spec sloppy.

If we end up solving this with a Typed Group or something akin to that, fine, but for now I would rather OBJECTS continue be well-defined and easily parsed than save a few bytes in the name of structure.

ghost self-assigned this May 29, 2013

ghost closed this as completed May 30, 2013

ghost removed their assignment Aug 15, 2016

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Object Keys must omit [S]tring marker #28

Object Keys must omit [S]tring marker #28

ghost commented May 29, 2013

kxepal commented May 29, 2013

ghost commented May 29, 2013

rogusdev commented May 30, 2013

ghost commented May 30, 2013

ghost commented May 30, 2013

Object Keys must omit [S]tring marker #28

Object Keys must omit [S]tring marker #28

Comments

ghost commented May 29, 2013

kxepal commented May 29, 2013

ghost commented May 29, 2013

rogusdev commented May 30, 2013

ghost commented May 30, 2013

ghost commented May 30, 2013