Skip to content

feat(napi/parser)!: add range option #11728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 49 commits into from
Jun 24, 2025

Conversation

bacarybruno
Copy link
Contributor

@bacarybruno bacarybruno commented Jun 15, 2025

Heyo team 👋!

I took a shot at #10307 to add loc and range fields. I followed this suggestion #10307 (comment) by making the fields configurable.

I needed to get this information from oxc-parser for different use cases including writing an eslint parser. Is this gets merged I would also need/like to add the a tokens option (to be discussed).

The JSDoc comments are based on https://typescript-eslint.io/packages/typescript-estree/#parsecode-options

P.S: I'm new to Rust and this is my first contrib on this repo, so feel free to suggest improvements. That would be really appreciated!

Copy link
Contributor

graphite-app bot commented Jun 15, 2025

How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

@github-actions github-actions bot added A-parser Area - Parser A-ast Area - AST A-ast-tools Area - AST tools C-enhancement Category - New feature or request labels Jun 15, 2025
@overlookmotel overlookmotel changed the title feat(parser): add loc and range feat(ast/estree): add loc and range Jun 16, 2025
Copy link

codspeed-hq bot commented Jun 16, 2025

CodSpeed Instrumentation Performance Report

Merging #11728 will not alter performance

Comparing bacarybruno:oxc-10307-ast-loc-range (b02d526) with main (d991fed)

Summary

✅ 38 untouched benchmarks

Copy link
Contributor

@overlookmotel overlookmotel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for tackling this. Given that you say you're new to Rust, very impressive that you managed to figure out the codegen as well as everything else!

I've made a few comments below, but the larger problem is that, while the implementation for ranges is good, I don't think loc is going to work correctly.

loc

The problem is UTF-8 vs UTF-16 offsets. Rust works with UTF-8 strings, whereas JS works in UTF-16. So there's an AST pass that happens before serialization which converts all the offsets from UTF-8 to UTF-16:

let mut comments =
convert_utf8_to_utf16(&source_text, &mut program, &mut module_record, &mut errors);

/// Convert spans to UTF-16
pub fn convert_utf8_to_utf16(
source_text: &str,
program: &mut Program,
module_record: &mut ModuleRecord,
errors: &mut [OxcError],
) -> Vec<Comment> {
let span_converter = Utf8ToUtf16::new(source_text);
span_converter.convert_program(program);
// Convert comments
let mut offset_converter = span_converter.converter();
let comments = program
.comments
.iter()
.map(|comment| {
let value = comment.content_span().source_text(source_text).to_string();
let mut span = comment.span;
if let Some(converter) = offset_converter.as_mut() {
converter.convert_span(&mut span);
}
Comment {
r#type: match comment.kind {
CommentKind::Line => String::from("Line"),
CommentKind::Block => String::from("Block"),
},
value,
start: span.start,
end: span.end,
}
})
.collect::<Vec<_>>();
// Convert spans in module record to UTF-16
span_converter.convert_module_record(module_record);
// Convert spans in errors to UTF-16
if let Some(mut converter) = span_converter.converter() {
for error in errors {
for label in &mut error.labels {
converter.convert_offset(&mut label.start);
converter.convert_offset(&mut label.end);
}
}
}
comments
}

During serialization, you're using Rope to convert offsets to line/column. But Rope expects UTF-8 offsets, and by this point we've already converted them to UTF-16.

If the whole source text is ASCII, then there's no difference, but I think you'll find loc is incorrect for source text like this:

[
"🍄🍄🍄",
123
]

Solving this problem is not so easy (especially making it work and performant).

I propose:

  1. Scale back the scope of this PR to only add ranges.
  2. I'll write up on #10307 the steps I think we'd need to take to support loc too, and we can discuss.

Custom serializers

This point applies to range too: range field also needs to be added manually in all the hand-written ESTree impls in serialize directory.

We'll also need to support it in raw transfer, but you can leave that to me - raw transfer is a bit of a minefield.

Ditto we'll need conformance testing for range, but I can handle that.

Tests

In absence of conformance test coverage, could you please add some tests for the range option to https://github.com/oxc-project/oxc/blob/main/napi/parser/test/parse.test.ts.

No need for anything fancy, just 3 tests which check the output with ranges: false, ranges: true, and ranges: undefined.

@bacarybruno bacarybruno changed the title feat(ast/estree): add loc and range feat(ast/estree): add range Jun 16, 2025
@overlookmotel
Copy link
Contributor

I see you're making changes after my feedback, so marking as draft for now. Please let me know when it's ready for review again.

And thank you for your efforts!

@overlookmotel overlookmotel marked this pull request as draft June 17, 2025 13:13
@overlookmotel overlookmotel marked this pull request as draft June 17, 2025 13:13
@bacarybruno bacarybruno force-pushed the oxc-10307-ast-loc-range branch from 1690b28 to d1ecb1c Compare June 17, 2025 23:26
@overlookmotel
Copy link
Contributor

I know this is still a work in progress, so apologies for making comments prematurely. I just wanted to give feedback while I had the chance, as I'm going to be away from keyboard over the weekend.

@bacarybruno
Copy link
Contributor Author

@overlookmotel I think I've gotten closer to what you had in mind. I'll continue when you get back from the weekend. Have a good weekend and thanks again for your help 🙏
I'k keeping the PR in draft in the meantime

@overlookmotel overlookmotel changed the title feat(ast/estree): add range feat(napi/parser)!: add range option Jun 24, 2025
@overlookmotel overlookmotel force-pushed the oxc-10307-ast-loc-range branch from f8e4f58 to afcb64f Compare June 24, 2025 01:06
Copy link
Contributor

@overlookmotel overlookmotel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this doesn't come across as rude, but I'm going away tomorrow for a week, and didn't want to leave your PR hanging any longer, so I've pushed some commits to finish it off. Going to merge it now.

We have some follow-up work to complete after this is merged. I'll make a comment on #10307 in case you want to continue on with this work (please do!)

@overlookmotel overlookmotel marked this pull request as ready for review June 24, 2025 01:10
@overlookmotel overlookmotel merged commit 9a2548a into oxc-project:main Jun 24, 2025
25 checks passed
overlookmotel pushed a commit that referenced this pull request Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST A-ast-tools Area - AST tools A-parser Area - Parser C-enhancement Category - New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants