-
-
Notifications
You must be signed in to change notification settings - Fork 617
feat(napi/parser)!: add range
option
#11728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(napi/parser)!: add range
option
#11728
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. |
loc
and range
CodSpeed Instrumentation Performance ReportMerging #11728 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for tackling this. Given that you say you're new to Rust, very impressive that you managed to figure out the codegen as well as everything else!
I've made a few comments below, but the larger problem is that, while the implementation for ranges
is good, I don't think loc
is going to work correctly.
loc
The problem is UTF-8 vs UTF-16 offsets. Rust works with UTF-8 strings, whereas JS works in UTF-16. So there's an AST pass that happens before serialization which converts all the offsets from UTF-8 to UTF-16:
Lines 91 to 92 in ea6ce9d
let mut comments = | |
convert_utf8_to_utf16(&source_text, &mut program, &mut module_record, &mut errors); |
oxc/crates/oxc_napi/src/lib.rs
Lines 12 to 59 in ea6ce9d
/// Convert spans to UTF-16 | |
pub fn convert_utf8_to_utf16( | |
source_text: &str, | |
program: &mut Program, | |
module_record: &mut ModuleRecord, | |
errors: &mut [OxcError], | |
) -> Vec<Comment> { | |
let span_converter = Utf8ToUtf16::new(source_text); | |
span_converter.convert_program(program); | |
// Convert comments | |
let mut offset_converter = span_converter.converter(); | |
let comments = program | |
.comments | |
.iter() | |
.map(|comment| { | |
let value = comment.content_span().source_text(source_text).to_string(); | |
let mut span = comment.span; | |
if let Some(converter) = offset_converter.as_mut() { | |
converter.convert_span(&mut span); | |
} | |
Comment { | |
r#type: match comment.kind { | |
CommentKind::Line => String::from("Line"), | |
CommentKind::Block => String::from("Block"), | |
}, | |
value, | |
start: span.start, | |
end: span.end, | |
} | |
}) | |
.collect::<Vec<_>>(); | |
// Convert spans in module record to UTF-16 | |
span_converter.convert_module_record(module_record); | |
// Convert spans in errors to UTF-16 | |
if let Some(mut converter) = span_converter.converter() { | |
for error in errors { | |
for label in &mut error.labels { | |
converter.convert_offset(&mut label.start); | |
converter.convert_offset(&mut label.end); | |
} | |
} | |
} | |
comments | |
} |
During serialization, you're using Rope
to convert offsets to line/column. But Rope
expects UTF-8 offsets, and by this point we've already converted them to UTF-16.
If the whole source text is ASCII, then there's no difference, but I think you'll find loc
is incorrect for source text like this:
[
"🍄🍄🍄",
123
]
Solving this problem is not so easy (especially making it work and performant).
I propose:
- Scale back the scope of this PR to only add
range
s. - I'll write up on #10307 the steps I think we'd need to take to support
loc
too, and we can discuss.
Custom serializers
This point applies to range
too: range
field also needs to be added manually in all the hand-written ESTree
impls in serialize
directory.
We'll also need to support it in raw transfer, but you can leave that to me - raw transfer is a bit of a minefield.
Ditto we'll need conformance testing for range
, but I can handle that.
Tests
In absence of conformance test coverage, could you please add some tests for the range
option to https://github.com/oxc-project/oxc/blob/main/napi/parser/test/parse.test.ts.
No need for anything fancy, just 3 tests which check the output with ranges: false
, ranges: true
, and ranges: undefined
.
loc
and range
range
I see you're making changes after my feedback, so marking as draft for now. Please let me know when it's ready for review again. And thank you for your efforts! |
1690b28
to
d1ecb1c
Compare
I know this is still a work in progress, so apologies for making comments prematurely. I just wanted to give feedback while I had the chance, as I'm going to be away from keyboard over the weekend. |
@overlookmotel I think I've gotten closer to what you had in mind. I'll continue when you get back from the weekend. Have a good weekend and thanks again for your help 🙏 |
Restore comment 2
range
range
option
f8e4f58
to
afcb64f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope this doesn't come across as rude, but I'm going away tomorrow for a week, and didn't want to leave your PR hanging any longer, so I've pushed some commits to finish it off. Going to merge it now.
We have some follow-up work to complete after this is merged. I'll make a comment on #10307 in case you want to continue on with this work (please do!)
To continue on this work #11728 Context: #10307 (comment)
Heyo team 👋!
I took a shot at #10307 to add
loc
andrange
fields. I followed this suggestion #10307 (comment) by making the fields configurable.I needed to get this information from oxc-parser for different use cases including writing an eslint parser. Is this gets merged I would also need/like to add the a
tokens
option (to be discussed).The JSDoc comments are based on https://typescript-eslint.io/packages/typescript-estree/#parsecode-options
P.S: I'm new to Rust and this is my first contrib on this repo, so feel free to suggest improvements. That would be really appreciated!