This repository has been archived by the owner on Oct 10, 2019. It is now read-only.

Support JSON serialisation of BigInt values #24

Closed
claudepache opened this issue Mar 17, 2017 · 84 comments


@claudepache

claudepache commented Mar 17, 2017

The abstract operation SerializeJSONProperty needs to be patched, so that JSON.stringify(1n) returns "1" instead of undefined:

  • If Type(value) is Integer, then return a string representing value, without any suffix.

I don’t think there is anything to do for JSON.parse, though.
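A minimal sketch of the proposed behavior using today's replacer mechanism (the function name is illustrative, not from the proposal). Note that a replacer can only emit a quoted string, whereas the proposed spec text would emit the bare digits:

```javascript
// Approximation of the proposed SerializeJSONProperty patch via a
// replacer. A replacer can only return a quoted string; the actual
// proposal would emit the digits bare, so that JSON.stringify(1n) === "1".
function stringifyWithBigInt(value) {
    return JSON.stringify(value, (_key, v) =>
        typeof v === 'bigint' ? v.toString() : v);
}

stringifyWithBigInt({ n: 1n }); // → '{"n":"1"}' (quoted, unlike the proposal)
```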

@littledan
Member

Do we want that behavior? Maybe if we return undefined then the issue will be more visible, as their code will not work at all, rather than giving a funny path to implicitly coerce an Integer to a Number.

@rauschma

Being able to store 64-bit integers in JSON would be nice: https://dev.twitter.com/overview/api/twitter-ids-json-and-snowflake

Not sure how to best do that, though. One option is to allow Integer literals in JSON.

@littledan
Member

@rwaldron explained to me a while ago that we can't just do that. It'd break the JSON-web, in the sense that if one endpoint starts sending things that are outside the normal JSON grammar that the other endpoint doesn't expect, that could cause things to not parse. For this reason, the Integer proposal doesn't currently add to JSON. A user has to pass in their own serializer function, etc., to get Integer support.

@michaelficarra
Member

I would prefer null (remember, JSON doesn't have undefined) over any value that won't round-trip through JSON.parse as an integer again.

@claudepache
Author

claudepache commented Mar 18, 2017

Do we want that behavior? Maybe if we return undefined then the issue will be more visible, as their code will not work at all, rather than giving a funny path to implicitly coerce an Integer to a Number.

Note that JSON.stringify does already coerce Infinity and NaN into null.

Also, the coercion to Number doesn’t seem implicit to me, at least no more than in the current situation. I mean, doing JSON.parse is already coercing arbitrary-precision decimal numbers into a Number value. (And, to be clear, the arbitrary-precision decimal number in question might be routinely produced by some alien programming language that supports serialisation of 64-bit integers.)

In order to support Integer on the JSON.parse side, a reviver may (and must) be used (just like with Date). (Never mind - it can't work for large integers.) But on the JSON.stringify side, it is difficult to do manually, and there is no compatibility concern in doing it natively.
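The parenthetical above can be demonstrated: a JSON.parse reviver cannot recover large integers, because the digits have already been rounded to a Number before the reviver runs (a small illustration, not from the thread):

```javascript
// By the time the reviver sees the value it is already a lossy Number,
// so converting it to BigInt cannot restore the original digits.
const revived = JSON.parse('9007199254740993', (_key, v) =>
    typeof v === 'number' ? BigInt(v) : v);

revived; // 9007199254740992n — off by one; precision was lost during parsing
```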

@littledan
Member

It would really be great if we could support larger values in JSON, but I don't have a full understanding of compatibility risks. Let's see if we can get input from a broader set of experts here.

@MaxArt2501

@littledan

@rwaldron explained to me a while ago that we can't just do that. It'd break the JSON-web, in the sense that if one endpoint starts sending things that are outside the normal JSON grammar that the other endpoint doesn't expect, that could cause things to not parse.

But since the spec doesn't limit the length of a number representation in JSON, that actually is the "normal JSON grammar". And as @rauschma mentioned, there are already APIs that use 64-bit integers in their JSON responses, and @claudepache noted that common parsing implementations simply lose precision. Any JSON parser that breaks on them is outright a bad (or incomplete/limited) implementation.

About existing APIs, it's not as if Integers just "happen", like a feature that stops working because its support has been removed: to use them, a deliberate change to the code has to be made. Numbers won't get coerced to Integers, so existing APIs won't break anyway.

JSON.stringify never returns undefined unless its argument is undefined or has a toJSON method that returns undefined. It looks like this is such a case, but the problem in returning undefined is that serializing something like {foo:1n} makes the property "foo" just disappear from the result - are we sure that won't break the other endpoint?

If you want to make the problem of using Integers in JSON evident, I'd prefer if it could throw a TypeError or something. But personally I feel there's something odd in extending a language with new types but not giving them support in other native features.

On the other hand, I understand your approach here:

It would really be great if we could support larger values in JSON, but I don't have a full understanding of compatibility risks. Let's see if we can get input from a broader set of experts here.

That's completely reasonable. Can we say that further study on the impact of JSON support has to be done, rather than cut everything short with undefined?

@littledan
Member

Can we say that further study on the impact of JSON support has to be done, rather than cut everything short with undefined?

Yes, that's what this bug is for. The currently proposed semantics are just a starting point.

@bakkot
Contributor

bakkot commented Mar 25, 2017

the problem in returning undefined is that serializing something like {foo:1n} makes the property "foo" just disappear from the result - are we sure that won't break the other endpoint?

Reasonably sure, yes; there are already other values which likewise disappear from serialization. For that to break an endpoint, you'd have to start serializing a value of a type which did not previously exist, and be relying on it appearing in the serialization. But it's already the case that not all values appear in serialization.
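For reference, a quick illustration of the values that already disappear from serialization today:

```javascript
// Functions, symbols, and undefined-valued properties are silently
// dropped by JSON.stringify, so values vanishing is nothing new.
const out = JSON.stringify({ a: 1, f: () => {}, s: Symbol('x'), u: undefined });
out; // → '{"a":1}'
```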

But personally I feel there's something odd in extending a language with new types but not giving them support in other native features.

We already did that with symbols. The fundamental problem is that JSON is a protocol which exists outside of JS, which means we can't extend it to new types. In this particular case, we might be able to get around it using the fact that the JSON spec allows arbitrary-width integers, but it's not something we can avoid in general.


Currently we have an invariant (unless .toJSON is overridden, as is the case with Date): a primitive property either round-trips through JSON.parse(JSON.stringify(_)), or stringifies to undefined. I'm hesitant to break this for primitive types.


Incidentally, "the spec" and "the normal JSON grammar" are kind of ambiguous. RFC 7159, which is the thing I assume we mean, says:

"This specification allows implementations to set limits on the range and precision of numbers accepted."

It further goes on to say that IEEE doubles are probably good enough.

Of course, parsing JSON in practice is already pretty crazy, without broad agreement on corner cases.

@littledan
Member

FWIW in TC39-land, we might be talking about ECMA 404. This document does not describe the interpretation of Numbers.

@rwaldron
Contributor

It would really be great if we could support larger values in JSON, but I don't have a full understanding of compatibility risks. Let's see if we can get input from a broader set of experts here.

This is actually exactly where I landed last week when I was first @-mentioned here, so consider this a "second" in support of @littledan's call for more information.

It further goes on to say that IEEE doubles are probably good enough.

I'd like to piece together a bit of historic information here, as it will be easier to discuss with everything present:

  • JSON was originally defined as a subset of the "ECMAScript Language Specification (Standard ECMA-262 3rd Edition - December 1999)"
  • The above reference provides little guidance with regard to numbers, saying only:
    • "A value can be a string in double quotes, or a number, or true or false or null, or an object or an array"

    • "A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used."

    • And the grammar on the right:

      number
        int
        int frac
        int exp
        int frac exp
      
      int
        digit
        digit1-9 digits 
        - digit
        - digit1-9 digits
      
      frac
        . digits
      
      exp
        e digits
      
      digits
        digit
        digit digits
      
      e
        e
        e+
        e-
        E
        E+
        E-
      

      ...which itself is incomplete.

  • To fill in the blanks, one might reasonably assume that the number portion of JSON is defined in ECMA-262 3rd Edition
    • "The Number type has exactly 18437736874454810627 (that is, 2^64 − 2^53 + 3) values, representing the double-precision 64-bit format IEEE 754 values as specified in the IEEE Standard for Binary Floating-Point Arithmetic" (8.5 on page 24)
  • In "ECMAScript Language Specification (Standard ECMA-262 5.1 Edition - June 2011)", the following appears:

The JSON Data Interchange Format is described in RFC 4627 http://www.ietf.org/rfc/rfc4627.txt. The JSON interchange format used in this specification is exactly that described by RFC 4627 with two exceptions:

  • The top level JSONText production of the ECMAScript JSON grammar may consist of any JSONValue rather than being restricted to being a JSONObject or a JSONArray as specified by RFC 4627.
  • Conforming implementations of JSON.parse and JSON.stringify must support the exact interchange format described in this specification without any deletions or extensions to the format. This differs from RFC 4627 which permits a JSON parser to accept non-JSON forms and extensions.
  • Considering the above normative prose, JSONValue defines valid values

  • In 15.12.1.2 The JSON Syntactic Grammar of ECMA-262 5.1, JSONValue is defined to include JSONNumber, which itself is...

    JSONNumber ::
    -(opt) DecimalIntegerLiteral JSONFraction(opt) ExponentPart(opt)

    JSONFraction ::
    . DecimalDigits

  • Elsewhere, DecimalIntegerLiteral and DecimalDigits are defined...

    DecimalIntegerLiteral ::
    0
    NonZeroDigit DecimalDigits(opt)

    DecimalDigits ::
    DecimalDigit
    DecimalDigits DecimalDigit

    DecimalDigit :: one of
    0 1 2 3 4 5 6 7 8 9

    NonZeroDigit :: one of
    1 2 3 4 5 6 7 8 9

    ...which are subject to the exhaustive semantics defined in 7.8.3 Numeric Literals of ECMA-262 5.1

  • That leads to the 2013 publishing of Ecma-404 (which @littledan has already mentioned above)


Anyway, don't take any of that as argument or evidence in support of any particular point, I just wanted to make sure we all had all of the same information and historical context.

@leobalter
Member

I remember Brendan Eich and other people at TC39 suggesting the max-min strategy in different proposals.

@rwaldron just mentioned several points that would need to change if we want to start writing support for a new type into JSON. And this is only the start.

We also need to identify what a change to the JSON grammar would affect, and that's way beyond what we usually call web reality. It's not limited to browsers, but anything that parses JSON - other languages included: Perl, Python, etc. Call on everyone to check for compatibility and to set some expectations for implementation.

@littledan my suggestion is to roll with casting Integers to null - as mentioned before - and then leave the work on JSON to a separate proposal, which might be bigger than this one.

Remember the max-min: cap the proposal to the minimum necessary for us to land this (that's my own interpretation). Further extensions to Integer support should come separately.


If anyone really wants to pursue this JSON support, I suggest working on a proposal addressing all of the areas where this change will be needed. We'll need tests, we'll need impact research, and more.

Nice-to-have features can require a lot of extra work, which I'd rather avoid for now while we don't even have the basic new type landed as a feature yet.

@ljharb
Member

ljharb commented Mar 28, 2017

I don't think there's anything wrong with omitting a given type of value from JSON, conceptually.

However, I think it would be very unfortunate, unintuitive, and confusing if Integers were silently omitted from JSON. Symbols, functions, regexes, and dates don't have an intuitive or reliable non-string serialization format, so there's no conceptual conflict in omitting them. Integers absolutely do - you just write out the digits. I've often seen non-JS runtimes produce integer values in JSON that are larger than JS itself can support (64-bit tweet IDs, for example); they worked in Java and Ruby but broke when they landed in JS due to truncation - which means the JSON ecosystem already works with numbers that JS itself can't faithfully represent.
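The truncation described above is easy to reproduce (the ID below is hypothetical; any odd integer above 2^53 will do):

```javascript
// A 64-bit ID that survives intact in Java or Ruby loses digits the
// moment JavaScript's JSON.parse turns it into a double.
const id = JSON.parse('{"id": 10765432100123456789}').id;
String(id) === '10765432100123456789'; // false — digits were silently altered
```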

I think this JSON work should be, if not completed, at least relatively mapped out prior to Integers being introduced.

@leobalter
Member

@ljharb I understand, and I agree we should have compatibility with JSON. My suggestion is to ship this first as-is, and then work on a dedicated proposal for JSON support.

@ljharb
Member

ljharb commented Mar 28, 2017

I think it could be dangerous to ship the first part without understanding the shape of the second part.

@littledan
Member

@ljharb From the comments in this thread, it seems like the upgrade to Integers in JSON will be a heavy, breaking change. It might not be possible to migrate the ecosystem to support it at all. I'm not sure what sort of planning we can do ahead of time to mitigate these issues.

@bterlson
Member

@littledan fail fast is the general guidance in such situations, I guess?

@ljharb
Member

ljharb commented Mar 29, 2017

@littledan right, that's why I don't think it's appropriate to defer the JSON question until "later" - I think a definitive answer is needed now.

If integers in JSON were just stored as their number representation - even if JS is unable to parse them accurately - I think the ecosystem would handle it just fine. If you wanted your big integers to be precise, you'd simply need a JSON.parse that supported them.

@bterlson
Member

Creating a situation where re-encoding a JSON document is possibly lossy seems like a recipe for disaster IMO. JSON should support integers with different syntax I think (though I make no judgement on how likely this is to work).

@littledan
Member

@bterlson Good point. Throwing here breaks new ground, but it seems justified for future-proofing. I'll make a patch to do this.

@ljharb
Member

ljharb commented Mar 29, 2017

@bterlson that's already always been the case with JSON - JSON can have duplicate keys, for example, and parsing + restringifying it will collapse them.

@bakkot
Contributor

bakkot commented Mar 29, 2017

@ljharb While that's true (and more relevantly JSON can already contain arbitrarily large integers), we currently do have that JSON.parse(JSON.stringify(JSON.parse(y))) === JSON.parse(y) and JSON.stringify(JSON.parse(JSON.stringify(x))) === JSON.stringify(x) for all x and y (I think). Encoding an Integer as a large number would break that invariant. That's concerning.
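Concretely, if a BigInt were encoded as its bare digits, re-parsing and re-stringifying the output would not give the digits back (a small illustration, not from the thread):

```javascript
// If 9007199254740993n were stringified as the bare digits below,
// a consumer's parse/stringify round-trip would change the document:
const encoded = '9007199254740993';   // hypothetical BigInt output
const reparsed = JSON.parse(encoded); // a lossy Number, not a BigInt
JSON.stringify(reparsed) === encoded; // false — the round-trip is broken
```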

@littledan
Member

I think the two acceptable options here are:

  • Leave the Integer out of the JSON conversion, following Symbol's behavior (which is what the current spec does--implemented by returning undefined from SerializeJSONProperty)
  • Throw an exception for better future-proofing.

I don't think we should do something which loses precision of an Integer over this round-tripping that @bakkot mentions. The whole point of this proposal is to provide precision. Losing precision is somehow worse than having a duplicate key missing--it's just subtly wrong, so the bug is less likely to be caught in simple testing.

@bterlson
Member

@ljharb you are right (and there are at least a handful of such cases) but none that affect the naïve round trip shown by @bakkot. I bet that pattern is executed billions of times a day :)

@ljharb
Member

ljharb commented Mar 29, 2017

I'm starting to really like the concept of allowing conversion (and serialization) below MAX_SAFE_INTEGER, and throwing/omitting otherwise. By far the common case will be to use safe integers, and if they can't easily convert to Numbers and send them over the wire via JSON, then i think the majority of users will never bother using this type.
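A sketch of this idea, assuming (hypothetically) a replacer-style implementation: convert when the value fits in a safe integer, throw otherwise. The names and error text are illustrative, not from the proposal:

```javascript
// Hypothetical "convert if safe, throw otherwise" serialization.
const MAX = BigInt(Number.MAX_SAFE_INTEGER);

function safeIntegerReplacer(_key, v) {
    if (typeof v !== 'bigint') return v;
    if (v > MAX || v < -MAX) {
        throw new TypeError('Integer beyond Number.MAX_SAFE_INTEGER cannot be serialized losslessly');
    }
    return Number(v); // emitted as a bare JSON number
}

JSON.stringify({ id: 123n }, safeIntegerReplacer); // → '{"id":123}'
// JSON.stringify({ id: 2n ** 53n + 1n }, safeIntegerReplacer) would throw
```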

@bterlson
Member

The choice should dovetail with other choices. If Integers are going to be "throwy" for things like possibly lossy math operations then throwy makes sense here too. I don't mind the "do it if you can without losing precision, otherwise throw" approach, though I can imagine this causing service failures when some application generates an ID over MAX_SAFE_INTEGER after many months of operation :-P

FWIW, ignoring doesn't make sense to me in this case. Symbols are clearly not representable in JSON, and ignoring them is probably what you want all the time anyway. Integers seem much less likely to be so.

@bakkot
Contributor

bakkot commented Mar 29, 2017

though I can imagine this causing service failures when some application generates an ID over MAX_SAFE_INTEGER after many months of operation

That's a major concern with serializing only when ≤ 2^53. People will try it with small Integers and get no indication that it will fail for larger ones. I'd expect that to be a major source of bugs.

@ljharb
Member

ljharb commented Mar 29, 2017

I think it's important to note that any implementation supporting integers would in theory (if they serialized to a numeric integer representation, even for large ints) also support parsing them, so the invariant above would still hold.

The difficulty would only be that older unpolyfilled json implementations couldn't parse large ints without losing precision - which is what already happens if you get an int64 from a non-js json source.

@bakkot
Contributor

bakkot commented Mar 29, 2017

Modifying JSON.parse would be a backwards incompatible change: JSON.parse('9007199254740993') currently gives you the number 9007199254740992, not an Integer. Maybe that would be ok, but I'd worry.

@vitaly-t

vitaly-t commented Oct 6, 2019

For those who came here seeking to serialize BigInt into JSON, without caring if it is irreversible, here's a simple but usable hack:

// Does JSON.stringify, with support for BigInt (irreversible)
function toJson(data) {
    if (data !== undefined) {
        return JSON.stringify(data, (_, v) => typeof v === 'bigint' ? `${v}n` : v)
            .replace(/"(-?\d+)n"/g, (_, a) => a);
    }
}

It formats each BigInt as a string, appending n to it, and then un-quotes all such numbers within the string and removes the n, with the help of RegEx.

Two things to keep in mind:

  • If you try to reverse it with JSON.parse, any number that was originally a BigInt will come back as a regular number, losing precision for anything between 2^53 and 2^64 - 1
  • If your data has string values that look like 123n, those will become unquoted numbers. Note, however, that you can easily change the marker to anything you want, like ${^123^} or 123-bigint, and shrink the chance of a conflict to virtually nothing.

But on the whole, this is quite usable wherever the reverse operation is not required - for example, in database code that generates JSON with BigInt to be sent to the server, so no reverse operation is needed.
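The second caveat can be seen directly (the toJson() helper from the comment above is repeated here so the snippet is self-contained):

```javascript
// vitaly-t's hack, repeated for a self-contained demo.
function toJson(data) {
    if (data !== undefined) {
        return JSON.stringify(data, (_, v) => typeof v === 'bigint' ? `${v}n` : v)
            .replace(/"(-?\d+)n"/g, (_, a) => a);
    }
}

toJson({ big: 123n });   // → '{"big":123}' — the intended behavior
// A plain string that happens to match the "123n" pattern is also
// unquoted, silently turning a string into a number:
toJson({ id: '123n' });  // → '{"id":123}'
```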

@saschanaz

saschanaz commented Oct 6, 2019

FYI there is an effort to make it (partly) reversible: https://github.com/tc39/proposal-json-parse-with-source

@vitaly-t

vitaly-t commented Oct 6, 2019

@saschanaz Thanks! In my case I didn't need it to be reversible, so I came up with the simplest solution that I've shared here ;)

@kaizhu256

It formats each BigInt as a string, appending n to it, and then un-quotes all such numbers within the string and removes the n, with the help of RegEx.

The above is not generally applicable, because it's common for JSON data to contain random string ids that have a small chance of looking like a BigInt:

var jsonData = {
    // a random base-36 id could potentially match the bigint
    // signature, e.g. "id": "123456789012n"
    "id": Math.random().toString(36).slice(2),
    "bigint": "1234n"
}

there are honestly surprisingly few JavaScript use cases where you need to revive BigInt values from JSON. Like temporals, bigints are mostly meant for [external] database logic (e.g. wasm-sqlite3).

JavaScript's role is mainly i/o and baton-passing stringified temporals/bigints/etc. between databases, rather than implementing that business logic itself.

@ljharb
Member

ljharb commented Oct 6, 2019

Although I agree with most of your point, JavaScript's role is not mainly i/o or baton-passing; it absolutely includes business logic - I'm not sure how many dozens of times this has to be repeated before you internalize it.

@kaizhu256

wasm-sqlite3 will change that.

It's more cost-effective and maintainable to use SQL queries for data aggregation/sorting/joining/etc. than to do it in JavaScript.

@ljharb
Member

ljharb commented Oct 6, 2019

Whether it will or not (i highly doubt that it will), that's irrelevant to the current state of things - which is that JavaScript is primarily for everything, and minimizing/brushing off use cases merely because you don't have that use case is not productive.

@vitaly-t

vitaly-t commented Oct 6, 2019

@kaizhu256 You are repeating what I already said about the use of the format. Sure, 123n could suddenly appear as random string text, but that does not diminish the value of the approach. And besides, you can further obfuscate it into something like ${^123^} or 123-bigint, which the algorithm easily allows, and then the chances of a matching string drop to virtually nothing.

@ljharb I've just released my PostgreSQL driver, which is in heavy use by the dev community and now supports BigInt, so one can use objects with BigInt directly within JSON queries in code. So go ahead and call it "unproductive" here. People will use it right now, because it is highly usable, regardless of what you are contemplating here for the future of JavaScript.

Because this is a very real, hands-on use case for BigInt in JSON. It is the very reason I published my solution here - prompted by the long list of referenced issues - and the reason I myself came here looking for something like it.

@vitaly-t

vitaly-t commented Oct 6, 2019

As an update, improving on the safety of the work-around suggested earlier...

You can make the work-around safe by counting the replacements and matching that count against the number of BigInt injections, throwing when there is a mismatch:

// Like JSON.stringify, but with safe support for BigInt (irreversible)
function toJson(data) {
    if (data !== undefined) {
        let intCount = 0, repCount = 0;
        const json = JSON.stringify(data, (_, v) => {
            if (typeof v === 'bigint') {
                intCount++;
                return `${v}#bigint`;
            }
            return v;
        });
        const res = json.replace(/"(-?\d+)#bigint"/g, (_, a) => {
            repCount++;
            return a;
        });
        if (repCount > intCount) {
            // You have a string somewhere that looks like "123#bigint";
            throw new Error(`BigInt serialization pattern conflict with a string value.`);
        }
        return res;
    }
}

Above I am using the "123#bigint" pattern from my own code, but you can use any other you like. The important thing is that if the pattern is not unique enough, the issue will be reported.

@apaprocki
Contributor

@vitaly-t see #24 (comment)

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt#Use_within_JSON

The spec specifically allows you to integrate with JSON.stringify() directly by supplying a BigInt.prototype.toJSON in the program.
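That escape hatch (documented on the MDN page linked above) looks like this; note that it produces a quoted string:

```javascript
// Program-supplied BigInt.prototype.toJSON, as described on MDN.
// JSON.stringify consults toJSON, so BigInts serialize as strings.
BigInt.prototype.toJSON = function () { return this.toString(); };

JSON.stringify({ id: 123n }); // → '{"id":"123"}'
```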

@vitaly-t

vitaly-t commented Oct 6, 2019

@apaprocki You are missing the point. I'm well aware of that spec.

The code I provided exists to generate BigInt as an open (unquoted) value within the resulting JSON, which the spec does not permit. Many servers, like PostgreSQL in my example, require that a bigint be an open value, not a double-quoted one.

Example

const input = {
    value: 12345n
};

// Needs to become after serialization:
//=> {"value": 12345}

// And NOT this one:
//=> {"value": "12345"}

@apaprocki
Contributor

Yes, but personally I would explore whether the receiving end with that restriction is open to supporting a quoted representation. There’s no reason why it couldn’t support both, especially with the restrictions of JSON.parse in JS.

@vitaly-t

vitaly-t commented Oct 6, 2019

Are you suggesting that I should pester the PostgreSQL team to support JSON differently? :))))))

They don't care what JavaScript can or cannot do, because the server wasn't developed for JavaScript clients. They use their own JSON parsers that have nothing to do with JavaScript. This is why there are libraries, like mine, that mediate the discrepancies internally.

JavaScript, on the other hand, should be flexible enough to allow at least some level of customization in how it generates or parses JSON. I think that would be more reasonable to pursue. As of now, the lack of provision for serializing BigInt as an open value is frustrating.

@apaprocki
Contributor

Well, looking at the documentation, they already have a double() jsonpath operator for the purpose of returning a double value from either a number or a string - it just seems that no one has asked for or submitted the bigint() equivalent.

@cyberphone

If "full" JSON support is needed, here is such a proposal (read "ES6" as "ECMAScript"):
https://github.com/cyberphone/es6-bigint-json-support

@vitaly-t

vitaly-t commented Oct 6, 2019

@cyberphone That is rad, if you can make it stick! 😄

That is way beyond my ambitions on this matter.

@ljharb
Member

ljharb commented Oct 6, 2019

@vitaly-t I think you’re misunderstanding me; what I’m saying is unproductive is claiming there’s no use case for what you’re talking about. Happy to see people experimenting in this space.

I think https://github.com/tc39/proposal-json-parse-with-source or similar proposals are good directions to explore supporting BigInt in JSON.

@vitaly-t

vitaly-t commented Oct 6, 2019

I’m saying is unproductive is claiming there’s no use case for what you’re talking about

@ljharb From my earlier answer to you, how is that not a use case?

Because this is a very real, hands-on use case for BigInt in JSON. It is the very reason I published my solution here - prompted by the long list of referenced issues - and the reason I myself came here looking for something like it.

I explained why and where it is needed, on a very practical note, and you are saying this is not a practical case? Here's the associated research I had to do for this.

@ljharb
Member

ljharb commented Oct 6, 2019

@vitaly-t again, you misunderstand. It is a practical case, i respect your use case. I was never replying to you before, only to @kaizhu256’s attempt to marginalize your use case.

@vitaly-t

vitaly-t commented Oct 6, 2019

@ljharb Ok, never mind - I think the way you phrased it confused me completely 😄 It was the kind of double negation that tends to be lost on me, as such things often are 😄

@kaizhu256

I acknowledge the marginalization came from me, and apologize for any wrath misdirected at @ljharb rather than me.
