How to preserve spans from the lexer into the parser? #125

kevinbarabash · 2022-04-16T17:40:39Z

I'm struggling to access char spans in my parser when using a two-step lexer+parser approach.

My lexer looks like:

Parser<char, Vec<(Token, Span)>, Error = Simple<char>>

and I started working on a parser that looks like:

Parser<(Token, Span), WithSpan<Expr>, Error = Simple<(Token, Span)>>

In nano_rust.rs, the parser only accepts Token instead of (Token, Span).

I'm going to try doing the same, then use the token spans provided by .map_with_span() in the parser to look up the corresponding char spans in the Vec<(Token, Span)> produced by the lexer.
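The lookup described here could be sketched roughly as follows. This is a minimal standalone illustration that does not use chumsky itself: the `Token` enum and the `char_span` helper are hypothetical names, and the sketch assumes the parser reports spans as half-open ranges of token indices while the lexer output stores half-open ranges of char offsets.

```rust
use std::ops::Range;

// Hypothetical token type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(String),
    Plus,
}

/// Map a span measured in token indices to a span measured in char offsets
/// by looking up the first and last covered tokens in the lexer output.
/// Returns None for an empty or out-of-bounds token span.
fn char_span(
    tokens: &[(Token, Range<usize>)],
    token_span: &Range<usize>,
) -> Option<Range<usize>> {
    if token_span.start >= token_span.end {
        return None; // empty span: there is no token to look up
    }
    let first = tokens.get(token_span.start)?;
    let last = tokens.get(token_span.end - 1)?;
    Some(first.1.start..last.1.end)
}

fn main() {
    // Assumed lexer output for the source text "12 + 345".
    let tokens = vec![
        (Token::Num("12".into()), 0..2),
        (Token::Plus, 3..4),
        (Token::Num("345".into()), 5..8),
    ];

    // A parser span covering all three tokens maps back to chars 0..8.
    assert_eq!(char_span(&tokens, &(0..3)), Some(0..8));
    // A span covering only the last token maps back to chars 5..8.
    assert_eq!(char_span(&tokens, &(2..3)), Some(5..8));
}
```

One drawback of this design is that the lexer output has to stay available for as long as spans need to be resolved.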

I was wondering if this is the recommended way of accessing char spans from the parser when using a two-step parsing approach?


CraftSpider · 2022-04-16T17:47:55Z

Personally, I use a custom Span with two fields.

struct Span {
    source_span: (usize, usize),
    stream_span: (usize, usize),
}

impl chumsky::Span for Span {
    // Return stream span here
}

Then later, you use Stream::from_iter() and construct your spans there from both bits of info.
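To make the idea concrete, here is a rough standalone sketch of building such two-field spans from lexer output. It deliberately does not implement the `chumsky::Span` trait, so it says nothing about how the trait methods would be filled in. The names `DualSpan`, `merge`, and `with_dual_spans` are hypothetical, and the lexer is assumed to yield `(token, char range)` pairs. The resulting pairs are the kind of (token, span) items one would then hand to `Stream::from_iter()`.

```rust
use std::ops::Range;

/// A span that tracks both coordinate systems at once.
#[derive(Debug, Clone, PartialEq)]
struct DualSpan {
    source_span: (usize, usize), // char offsets in the source text
    stream_span: (usize, usize), // token indices in the lexer output
}

impl DualSpan {
    /// Smallest span that covers both `self` and `other`, in both coordinate systems.
    fn merge(&self, other: &DualSpan) -> DualSpan {
        DualSpan {
            source_span: (
                self.source_span.0.min(other.source_span.0),
                self.source_span.1.max(other.source_span.1),
            ),
            stream_span: (
                self.stream_span.0.min(other.stream_span.0),
                self.stream_span.1.max(other.stream_span.1),
            ),
        }
    }
}

/// Pair every token with a DualSpan built from its char range and its index.
fn with_dual_spans<T>(lexed: Vec<(T, Range<usize>)>) -> Vec<(T, DualSpan)> {
    lexed
        .into_iter()
        .enumerate()
        .map(|(i, (tok, range))| {
            let span = DualSpan {
                source_span: (range.start, range.end),
                stream_span: (i, i + 1),
            };
            (tok, span)
        })
        .collect()
}

fn main() {
    // Assumed lexer output for the source text "let x".
    let spans = with_dual_spans(vec![("let", 0..3), ("x", 4..5)]);

    // Merging the two token spans keeps both coordinate systems in sync.
    let merged = spans[0].1.merge(&spans[1].1);
    assert_eq!(merged.source_span, (0, 5));
    assert_eq!(merged.stream_span, (0, 2));
}
```

Keeping both ranges in one value means later stages can report errors in source coordinates without going back to the token list.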

kevinbarabash · 2022-04-16T20:10:38Z

@CraftSpider I like that the custom Span you suggested provides a way to differentiate source spans from (token) stream spans. Is the idea to use this custom Span for the parser and for the lexer just use (usize, usize)?

CraftSpider · 2022-04-16T20:58:01Z

Yes. I lex with logos, so it just uses its range internally, then when I hit chumsky, I convert to this new differentiated span, and that span gets preserved into future steps.

CraftSpider · 2022-04-16T21:00:10Z

If you want more example code, I can grab some snippets from the implementation

zesterer · 2022-04-16T23:29:13Z

> In nano_rust.rs, the parser only accepts Token instead of (Token, Span).

Chumsky supports spans 'natively', so you don't need to mention them in the type of the input token. Although nano_rust's parser takes Token, it still uses the same span internally, which is why the span is preserved until runtime (if you try triggering a runtime error, such as calling a function with the wrong number of arguments, you will see this).

kevinbarabash · 2022-05-14T22:05:56Z

I ended up dropping the separate lexer that was generating a Vec<Token>. I'm working on a parser for a language with syntax very similar to JavaScript (but with more expressions and fewer statements). As part of this, I'd like to support parsing JSX, and I don't think that's possible with a distinct lexer.

kevinbarabash closed this as completed May 14, 2022
