
Support BigInt values #37

Merged 1 commit on Nov 23, 2021

Conversation

@paul-marechal (Member) commented Oct 1, 2021

JSON doesn't constrain numeric values to specific sizes. In most
languages this isn't an issue thanks to proper support for a wide
variety of number precisions.

Unfortunately, JavaScript struggles with integers bigger than
`Number.MAX_SAFE_INTEGER` because it stores all numbers, including
integers, as IEEE 754 doubles. For anything bigger than this max safe
integer value, JavaScript provides the `BigInt` type.

The first issue with this behavior is that when parsing JSON, there is
no way to tell which "number flavor" to expect; the standard APIs
simply don't let us specify it. Another issue is that the default JSON
serializer/deserializer doesn't know how to handle `BigInt` at all.
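Both problems are easy to reproduce (a standalone sketch, not code from the PR; it assumes an ES2020+ runtime for `BigInt` literals):

```typescript
// Standalone illustration (not code from the PR) of both problems.

// 1. Precision loss: JSON.parse reads every number as an IEEE 754
//    double, so integers beyond Number.MAX_SAFE_INTEGER are rounded.
const parsed = JSON.parse('{"end": 9007199254740993}');
console.log(parsed.end);              // 9007199254740992 (silently rounded)
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991

// 2. The default serializer refuses BigInt values outright.
try {
    JSON.stringify({ end: 9007199254740993n });
} catch (error) {
    console.log((error as Error).name); // 'TypeError'
}
```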

This commit fixes those issues as follows:

  1. Use a custom JSON parser, `json-bigint`, that can serialize and
    deserialize `BigInt` values. A quirk is that it only deserializes
    a number into a `BigInt` if the serialized value is bigger than
    JavaScript's max safe integer. This means there is ambiguity
    when deserializing data.
  2. To address the ambiguity, add a TypeScript mapped type
    `Deserialized<T>` that, given a type `T`, replaces all `bigint` or
    `number` fields with `bigint | number`. Note that this is only
    static typing to teach TypeScript about this behavior. Then, when
    receiving messages, one has to define `Normalizer` functions that
    take this ambiguous type as input and return a "normalized" version
    matching the original type `T`. See this as post-processing to make
    sure the received data uses the proper data types in the right
    places.
  3. To help with the normalization, add a `createNormalizer` helper
    function that statically figures out which fields need to be
    normalized for a given type and ensures they are properly
    handled.
  4. Rewrite all the tests to validate this logic using test data as
    produced by the current Trace Server.
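As a sketch of how points 1 to 3 fit together (a minimal standalone illustration; the real `Deserialized<T>` and normalizer machinery in the PR are more involved, and the `Experiment` shape below is hypothetical):

```typescript
// Standalone sketch of the Deserialized<T> / Normalizer idea.
// The Experiment shape below is a simplified, hypothetical example.

// Every bigint or number field may come out of the parser as either
// flavor, so the static type must reflect that ambiguity.
type Deserialized<T> = {
    [K in keyof T]: T[K] extends bigint | number
        ? bigint | number
        : T[K] extends object
            ? Deserialized<T[K]>
            : T[K];
};

interface Experiment {
    name: string;
    start: bigint;    // nanosecond timestamp: must stay exact
    end: bigint;
    nbEvents: number; // small count: plain number is fine
}

// A Normalizer is post-processing that restores the intended types.
const normalizeExperiment = (raw: Deserialized<Experiment>): Experiment => ({
    name: raw.name,
    start: BigInt(raw.start),
    end: BigInt(raw.end),
    nbEvents: Number(raw.nbEvents),
});

const experiment = normalizeExperiment({
    name: 'trace',
    start: 0,               // parser returned a plain number
    end: 9007199254740993n, // parser returned a BigInt
    nbEvents: 42,
});
console.log(typeof experiment.start, typeof experiment.end); // 'bigint' 'bigint'
```

The key design point is that the ambiguity exists only at runtime; `Deserialized<T>` merely teaches the type checker about it so that forgetting to normalize a field becomes a compile error.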

@paul-marechal (Member, Author) commented Oct 1, 2021

Regarding 4), I wish I hadn't had to rewrite everything, but as far as I could tell nothing was actually being tested...

Huge thanks to @PatrickTasse for providing the test data.

Another thing is that implementing the parsing/normalization revealed some TSP messages to be incorrectly defined? If you look at the changes made to the TypeScript interfaces I had to add an optional marker (?) for a bunch of fields. There seems to be a mismatch between the TSP specification and what the Trace Server returns? If I had to guess, I'd say you guys need to do a pass on the fields again and properly specify what's required and what's optional.

@paul-marechal (Member, Author)

Lastly this work is based on #28 so there might be a few common changes.

@paul-marechal force-pushed the mp/serial branch 5 times, most recently from f2b5b51 to 1496d5e on October 1, 2021 17:34
Outdated review threads (resolved) on: src/protocol/rest-client.ts, src/models/xy.ts, src/models/annotation.ts, src/models/timegraph.ts, src/protocol/serialization.ts, src/protocol/tsp-client.test.ts
```ts
const response = await client.fetchExperiment('not-relevant');
const experiment = response.getModel()!;

expect(typeof experiment.end).toEqual('bigint');
```
Contributor:

We're only checking that it's a bigint, but we're not checking that the bigint was created without loss of precision. Or would it be possible to check the actual value against the expected value from the fixture?
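Such a precision check might look like this (a sketch; the fixture value used here is illustrative, not the actual test data):

```typescript
// Hypothetical sketch of a lossless-parse check: compare the parsed
// value against the exact BigInt literal from the fixture, instead of
// only asserting its type. The fixture value here is illustrative.
const FIXTURE_END = 9007199254740993n; // > Number.MAX_SAFE_INTEGER

const assertExactEnd = (actual: unknown): void => {
    if (typeof actual !== 'bigint' || actual !== FIXTURE_END) {
        throw new Error(`expected ${FIXTURE_END}, got ${String(actual)}`);
    }
};

assertExactEnd(9007199254740993n); // passes only if no precision was lost
```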

Member (Author):

Can we keep this as a followup?

Outdated review threads (resolved) on: src/protocol/tsp-client.test.ts, src/protocol/serialization.ts
@paul-marechal (Member, Author)

I just contributed the last remaining improvements to the typings for the `createNormalizer` function.

I also rebased on master.

Signed-off-by: Paul Marechal <paul.marechal@ericsson.com>
Co-authored-by: Patrick Tasse <patrick.tasse@ericsson.com>
@paul-marechal (Member, Author)

I rebased and squashed all the commits. I know this is a big change, but ultimately it does a lot of good for the test suite. What bothers me is that I had to implement a mini serialization framework to handle `BigInt` values in structs on a per-case basis. The root of the issue is that TypeScript has no type reflection: interfaces are erased at runtime. So if we want runtime mechanics based on interfaces, we have to go the way I've done: explicitly create functions for each type, as separate entities.

@paul-marechal (Member, Author)

cc @bhufmann @PatrickTasse please tell me if anything bothers you with this PR.

@bhufmann (Collaborator)

> cc @bhufmann @PatrickTasse please tell me if anything bothers you with this PR.

@PatrickTasse could you please continue with your review?

```ts
/**
 * `true` if `T` must be normalized, `false` otherwise.
 */
export type MustBeNormalized<T> =
```


My apologies if this is not the right place. I was looking through the PR and wonder: how does this type work? I can't seem to understand the concept, especially with the `0` and `1` types being used.

Member (Author):

It's a pain.

  1. I want to check if `Deserialized<T>` is assignable to `T`. On paper, this should only fail when something like `number` gets converted into `bigint | number` in `Deserialized<T>`. If `T` is "safe" then `Deserialized<T>` will be assignable to `T` and everyone is happy.
  2. Let's assume we use `any` in our type `T`: then `Deserialized<T>` will be compatible with `T`, but I wanted this to fail because technically a `number` could slide into it and get deserialized as `bigint | number`.
  3. `unknown` is only assignable to `any` and `unknown`.
  4. `0` is not assignable to `1`.
  5. To fix (2), I replace occurrences of `any` by `0` on one side and by `1` on the other so that fields typed as `any` forcefully conflict with one another.

Doing this ensures that `any` fields get marked as "must be normalized".
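The trick described above can be sketched as follows (a standalone approximation using the common `0 extends 1 & T` test for `any`; the PR's actual `MustBeNormalized` is more elaborate):

```typescript
// Standalone approximation of the trick: detect `any` with the usual
// `0 extends 1 & T` test, then substitute sentinels so any-typed
// fields conflict on purpose.

type Deserialized<T> = {
    [K in keyof T]: T[K] extends bigint | number ? bigint | number : T[K];
};

// `any` is the only type for which `1 & T` stays assignable from `0`.
type IsAny<T> = 0 extends 1 & T ? true : false;

// Replace any-typed fields by a sentinel literal (0 on one side,
// 1 on the other) so the two sides are never assignable to each other.
type ReplaceAny<T, Sentinel> = {
    [K in keyof T]: IsAny<T[K]> extends true ? Sentinel : T[K];
};

type MustBeNormalized<T> =
    ReplaceAny<Deserialized<T>, 0> extends ReplaceAny<T, 1> ? false : true;

// Compile-time checks, written as assignments to literal types:
const safe: MustBeNormalized<{ name: string }> = false;  // nothing to do
const unsafe: MustBeNormalized<{ end: bigint }> = true;  // bigint|number ambiguity
const anyField: MustBeNormalized<{ data: any }> = true;  // any forcefully conflicts
console.log(safe, unsafe, anyField);
```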

@haoadoreorange commented Nov 23, 2021

1/ I see that this PR also uses the custom parser for serialization. Is that necessary? I don't see the ambiguity when serializing `bigint` (correct me if I'm wrong!). The reason I'm asking is that the custom parser might take a perf hit compared to the current implementation (default parser + regex).

2/ Provided that it's necessary, does it call `toJSON` under the hood (I suppose not) and suffer from the same problem that I made PR #41 for?

@paul-marechal (Member, Author) commented Nov 23, 2021

@haoadoreorange

  1. RegExps can be slow, and while it might work, the whole "let's find and replace bigint strings with literal numbers using a RegExp" approach keeps me up at night. I haven't run benchmarks, but I'd be curious to see the perf difference between the two for large TSP messages.
  2. The third-party JSON serializer/deserializer still calls the native implementation on non-bigint values IIUC, so it should call `.toJSON` there.
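For reference, the "default parser + regex" stringify approach under discussion might look something like this (a hedged sketch with an assumed tag format, not the actual implementation from either side of the comparison):

```typescript
// Hypothetical sketch of the "default stringify + regex" approach:
// serialize BigInt values as tagged strings via a replacer, then use
// a regex to strip the quotes so they become JSON number literals.
// The `__bigint__` tag is an assumption, not the PR's actual scheme.

const stringifyWithBigInt = (value: unknown): string =>
    JSON.stringify(value, (_key, v) =>
        typeof v === 'bigint' ? `__bigint__${v.toString()}` : v
    ).replace(/"__bigint__(-?\d+)"/g, '$1');

console.log(stringifyWithBigInt({ end: 9007199254740993n, name: 'trace' }));
// {"end":9007199254740993,"name":"trace"}
```

Note the fragility: a string field whose content happens to match `"__bigint__<digits>"` would be rewritten too, which is presumably the concern about regex rewriting raised above.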

@haoadoreorange commented Nov 23, 2021

> I haven't run benchmarks, but I'd be curious to see what's the perf hit between the two for large TSP messages

I ran a quick benchmark on stringify (default + regex vs json-bigint) here.
TL;DR: there's a perf hit when there's no BigInt in the object, and about the same with BigInt.

EDIT: I ran a couple more for different object sizes and the results are fairly consistent. There's a ~30% perf loss when using the custom stringify from json-bigint for objects with a small number of BigInt values involved. I suggest we leave the stringify part as is; what do you think?

@PatrickTasse (Contributor) left a comment:

Thank you for this awesome contribution!

@bhufmann (Collaborator) left a comment:

Thanks a lot. Looks good to me. I tested it within the theia-trace-extension and it works well. Thanks also for enhancing the unit tests.
