Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tighten the definition of "column" #5

Closed
tromey opened this issue Sep 7, 2017 · 15 comments
Closed

Tighten the definition of "column" #5

tromey opened this issue Sep 7, 2017 · 15 comments

Comments

@tromey
Copy link

tromey commented Sep 7, 2017

The source map spec doesn't describe the units for columns -- just that they are zero-based. It turns out that Gecko uses UTF-16 code points, and that seems to be a common, but not universal, choice. It would improve the spec if it were explicit about the units.

@littledan
Copy link
Member

The WebAssembly source map integration defines column by byte.

@mitsuhiko
Copy link
Contributor

I would like to revisit this. Firefox today is the only engine that does not use UTF-16 codepoints and apparently switched at one point at random. I would propose to once and for all specify this to mean UTF-16 as this matches what most tools today assume.

@mitsuhiko
Copy link
Contributor

There seems to be some confusion here on what is currently happening so I will try to collect some data until the next meeting:

  • What do browsers report in error.stack for columns? (Chrome, Firefox, Edge, Safari)
  • What do browsers report in the devtools UI for columns?
  • What offsets are generated by source map generators today (webpack, etc.)

@mitsuhiko
Copy link
Contributor

mitsuhiko commented May 8, 2023

So here are some of my findings. Take this file:

<!doctype html>
<meta charset=utf-8>
<title>Column Test</title>
<script>
function fail() {
  throw new Error();
}

function logColumn(c, e) {
  const lines = e.stack.split(/\r?\n/g);
  if (lines[0] === 'Error') {
    lines.shift();
  }
  console.log(c, parseInt(lines[1].match(/:\d+:(\d+)/)[1]));
}

try {
  /* xxxxx */fail();
} catch (e) {
  logColumn("x", e);
}
try {
  /* ☃️☃️☃️☃️☃️ */fail();
} catch (e) {
  logColumn("☃️", e);
}
try {
  /* 🔥🔥🔥🔥🔥 */fail();
} catch (e) {
  logColumn("🔥", e);
}
try {
  /* 👩‍👩‍👧👩‍👩‍👧👩‍👩‍👧👩‍👩‍👧👩‍👩‍👧 */fail();
} catch (e) {
  logColumn("👩‍👩‍👧", e);
}
</script>

The calculated offsets of the throw keyword are the following (utf-8 offset, utf-16 offset, unicode codepoint offset aka utf-32):

/* xxxxx */: 14/14/14
/* ☃️☃️☃️☃️☃️ */: 39/19/19
/* 🔥🔥🔥🔥🔥 */: 29/19/14
/* 👩‍👩‍👧👩‍👩‍👧👩‍👩‍👧👩‍👩‍👧👩‍👩‍👧 */: 99/49/34

What browsers report:

Safari: (UTF-16 + 4 (for fail))

  • x – 18
  • ☃️ – 23
  • 🔥 – 23
  • 👩‍👩‍👧 – 53

Firefox: (Unicode Codepoints / UTF-32)

  • x 14
  • ☃️ 19
  • 🔥 14
  • 👩‍👩‍👧 34

Chrome: (UTF-16)

  • x 14
  • ☃️ 19
  • 🔥 19
  • 👩‍👩‍👧 49

Edge: (UTF-16)

  • x 14
  • ☃️ 19
  • 🔥 19
  • 👩‍👩‍👧 49

Node: (UTF-16)

  • x 14
  • ☃️ 19
  • 🔥 19
  • 👩‍👩‍👧 49

Deno: (UTF-16)

  • x 14
  • ☃️ 19
  • 🔥 19
  • 👩‍👩‍👧 49

Conclusion: all runtimes but Firefox report UTF-16 offsets, Safari seems to disagree as the only browser about which column to report for a failing function call.

@jkrems
Copy link
Contributor

jkrems commented May 8, 2023

Really nice data and +1 for the set of test cases! Since Chrome/Edge/Node/Deno are all V8 under the hood, their consistent behavior is expected. I'm pretty sure the JS engines populate this property without involvement (by default) from embedding runtimes/browsers. It's really:

  • V8 (UTF-16)
  • JavaScriptCore (UTF-16 + "what's a call site" 4)
  • Spidermonkey (UTF-32)

So no two engines actually agree in this case, although V8 and JavaScriptCore disagree on something not related to column offset calculations.

Btw, I'm not sure if it's still maintained but eshost-cli could be a nice way of running these test cases across runtimes: https://github.com/bterlson/eshost-cli. If nothing else, the page with Supported Hosts is a nice potential checklist.

@mitsuhiko
Copy link
Contributor

Given that this is mostly determined by browsers I will skip evaluating what the generators currently output. From my experience at processing a lot of source maps at Sentry, the general consensus is pretty strong for it to be counted in UTF-16 offsets.

@littledan
Copy link
Member

If someone is available to volunteer, it'd be great to see this analysis of generators. The goal is to get in the habit of testing sourcemap generators anyway, so we can file bugs against them and get everyone aligned. Also this is something of an edge case, so I wouldn't be shocked to see someone who we should already file a bug against (as soon as we agree on the definition)--so this testing should be directly useful for that.

@tromey
Copy link
Author

tromey commented May 17, 2023

So here are some of my findings. Take this file:

I haven't worked in this area in a while, so take this with a grain of salt, but it's important to check CSS as well.

@mitsuhiko
Copy link
Contributor

I tried but I have not found a place in the browsers where columns are displayed for CSS source maps, thus I was unable to determine based on observation in the browser how columns are handled. However libsass produces UTF-16 columns:

foo.scss

body {
  /*🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥*/background:/*🔥*/red;
  /*🔥🔥🔥*/color:/*🔥*/blue;
}

foo.css

body{background:red;color:blue}/*# sourceMappingURL=foo.css.map */

foo.css.map

{"version":3,"sourceRoot":"","sources":["foo.scss"],"names":[],"mappings":"AAAA,KAC8B,eAClB","file":"foo.css"}%

Decoded:

  0:0 -> foo.scss:0:0
  0:5 -> foo.scss:1:30
  0:20 -> foo.scss:2:12

If you look at the tokens referenced:

1:30 -> background:/*🔥*/red;
2:12 -> color:/*🔥*/blue;

@tromey
Copy link
Author

tromey commented May 26, 2023

I tried but I have not found a place in the browsers where columns are displayed for CSS source maps, thus I was unable to determine based on observation in the browser how columns are handled.

It's possible you could do it in Firefox by having CSS that uses non-base-plane characters and then provokes a warning later on the same line. Then, look at the warning in the console. IIRC (it's been a while) the devtools will apply source maps in this case.

@mitsuhiko
Copy link
Contributor

To drive this discussion forward, here is my proposal of what should be added to the spec:

The definition for columns in source maps can depend on the format. For JavaScript and CSS based source maps are defined to be in UTF-16 code units analogous to JavaScript string indexes. That means that "A" (LATIN CAPITAL LETTER A) measures 1 code unit, and "🔥" (FIRE) measures 2 code units. Source maps for other formats (for instance web assembly) might diverge from this.

@JannikGM
Copy link

Can this be closed since tc39/source-map-spec#8 was merged?

@mitsuhiko
Copy link
Contributor

Yes. I consider this closed. We can follow up on other formats if the need arises.

@jkup
Copy link
Collaborator

jkup commented Jul 26, 2023

I think we were still waiting to hear back from some folks about WASM? I'm happy to close this an open a separate ticket for that since the above PR has been merged.

@JSMonk
Copy link
Collaborator

JSMonk commented Jul 26, 2023

The PR for the wasm columns definition: tc39/source-map-spec#14

@jkup jkup closed this as completed Jul 27, 2023
nicolo-ribaudo pushed a commit to nicolo-ribaudo/source-map that referenced this issue Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants