Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose position and/or input of source text? #10

Open
gibson042 opened this issue Feb 6, 2020 · 12 comments
Open

Expose position and/or input of source text? #10

gibson042 opened this issue Feb 6, 2020 · 12 comments

Comments

@gibson042
Copy link
Collaborator

String.prototype.replace passes position and input arguments to replacer functions and the return value from RegExp.prototype.exec has "index" and "input" properties; JSON.parse could behave similarly.

const input = '\n\t"use\\u0020strict"';
let spied;
const parsed = JSON.parse(input, (key, val, context) => (spied = context, val));
parsed === 'use strict';
// → true
spied.source === '"use\\u0020strict"';
// → true
spied.index === 2;
// → true
spied.input === input;
// → true

As noted by @rbuckton, this could be useful for error reporting.

@ljharb
Copy link
Member

ljharb commented Feb 6, 2020

Please do not repeat the regex match object pattern of adding properties to an array :-)

@gibson042
Copy link
Collaborator Author

No, it would be an object. 😅

@michaelficarra
Copy link
Member

I'm concerned about the use of insignificant things such as whitespace, key ordering, etc to smuggle additional information. See my comments from February.

@gibson042
Copy link
Collaborator Author

Such a channel already exists in the form of confusable glyphs/invisible whitespace/excess numeric precision/etc., not to mention extra object members and/or array items (and key ordering in particular is unaffected by this proposal; revivers are already called in source text order). That said, it's true that the bandwidth for such information would be dramatically increased. But is it really appropriate to try protecting against the extremely narrow scenarios in which a potentially malicious reviver is used with JSON.parse (e.g., that Eve doesn't have the ability to perform arbitrary inspection but does have unfettered access to deserialized values)? The fix for that would be straightforward—don't pass through source/input/index:

-JSON.parse(src, untrusted)
+JSON.parse(src, function scrubber(key, val, { keys }) { return call(untrusted, this, key, val, { keys }) })

@michaelficarra
Copy link
Member

@gibson042 I think you misunderstood my concern. I am not worried about an adversarial scenario. I am worried about well-meaning developers deliberately (ab)using these insignificant encoding differences to transmit additional, extra-JSON information. I would prefer that JSON.parse only be able to act on the significant parts of the input.

@gibson042
Copy link
Collaborator Author

This proposal necessarily introduces the ability to differentiate between input like 4 vs. 4.0 and "𝌆" vs. "\uD834\uDF06", which already seems to trip your concern. But beyond that, it currently also exposes whitespace at the non-leaf level (e.g., JSON.parse("0\t", (key, val, {source}) => source) === "0\t"). It would be possible, however, to mitigate that by only providing source text for primitives, reducing the available rope but also the internal consistency. How strong would you say your concerns are?

@michaelficarra
Copy link
Member

I think that strikes a good balance. We don't have to eliminate the possibility, we just have to limit it enough to make it unappealing.

@gibson042
Copy link
Collaborator Author

To clarify, is that support for a) exposing source text with all values but input and index for none, or b) taking the further step of denying the ability to read source text for objects and arrays?

@michaelficarra
Copy link
Member

I think the only loss of (significant) information in JavaScript's interpretation of JSON is in the numbers/strings, right? In which case B would be preferable.

@gibson042
Copy link
Collaborator Author

There was consensus on TC39 Incubator call to expose source text only for primitive values, and in particular to not provide input and index, unless a significant use case is identified.

@bergus
Copy link

bergus commented Dec 4, 2020

index (actually, line number and position) would be very relevant if I use JSON.parse for deserialising values that are encoded as a custom scheme, where the decoding encounters a parse error. This would allow the parser to throw an appropriate human-readable error message, aiding in finding the malformed part of the source file. Not sure if that would count as a significant use case?

Examples:

// parsing JSON-LD
JSON.parse(text, (key, val, context) => {
  if (key == "@type" || key == "@context") {
    try {
      return new URL(val);
    } catch(err) {
      throw Obect.assign(new SyntaxError(err.message + ` at line ${context.line}, character ${context.position}`), err);
    }
  } else {
    return val;
  }
});
// parsing abstract syntax tree
JSON.parse(text, (key, val, context) {
  if (typeof val == "object" && val.type == "TreeNode") {
    if (!Array.isArray(val.children))
      throw new SyntaxError(`Encountered TreeNode without children at line ${context.line}, character ${context.position}`);
    return new TreeNode(val.children);
  }
  return val;
});

@sarahghp
Copy link

Agree that "throwing indicative errors" is a strong use-case for exposing position information. Especially with other data types in the pipeline (Decimal, etc.), establishing a generalized serialization pattern with indicative errors would be a strong base for growing functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants