@molyee commented Jul 30, 2025

Rationale for this change

This change makes the SQL parser more flexible by allowing clients to pass a pre-tokenized input stream together with an explicit dialect for parsing.
It lets custom tokenization workflows, such as external tokenizers, dialect-specific preprocessing, or token manipulation, run outside the DataFusion crate without modifying internal parser logic.
The default behavior is unchanged, so the change is fully backward compatible.

What changes are included in this PR?

  • Introduced DFParser::from_dialect_and_tokens for creating a parser from externally provided tokens.
  • Added DFParser::parse_tokens_with_dialect for parsing SQL statements from a given token sequence and dialect.
  • Updated DFParser's existing constructors (new, new_with_dialect) and functions (parse_sql, parse_sql_with_dialect) to delegate internally to the new functions, preserving their original semantics.
  • Existing tests now implicitly exercise the updated code paths using the default tokenizer and dialect.
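The delegation described in the list above can be illustrated with a small self-contained model. Everything here (`Token`, `Dialect`, `GenericDialect`, `tokenize`, `ModelParser`, and the toy "parse" that only splits statements at `;`) is a hypothetical stand-in, not the real sqlparser or DataFusion API; only the function names mirror the PR, and their signatures are assumptions.

```rust
// Hypothetical model of the PR's delegation pattern; NOT the real
// sqlparser/DataFusion types. Only the function names mirror the PR.

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Word(String),
    Number(String),
    Comma,
    SemiColon,
}

/// Hypothetical dialect hook consulted by the default tokenizer.
pub trait Dialect {
    fn is_identifier_part(&self, ch: char) -> bool;
}

pub struct GenericDialect;

impl Dialect for GenericDialect {
    fn is_identifier_part(&self, ch: char) -> bool {
        ch.is_ascii_alphanumeric() || ch == '_'
    }
}

/// Stand-in for the default tokenizer used by the string-based entry points.
pub fn tokenize(sql: &str, dialect: &dyn Dialect) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = sql.chars().peekable();
    while let Some(&ch) = chars.peek() {
        if ch.is_whitespace() {
            chars.next();
        } else if ch == ',' || ch == ';' {
            chars.next();
            tokens.push(if ch == ',' { Token::Comma } else { Token::SemiColon });
        } else if ch.is_ascii_digit() {
            let mut num = String::new();
            while let Some(&c) = chars.peek() {
                if !c.is_ascii_digit() { break; }
                num.push(c);
                chars.next();
            }
            tokens.push(Token::Number(num));
        } else if dialect.is_identifier_part(ch) {
            let mut word = String::new();
            while let Some(&c) = chars.peek() {
                if !dialect.is_identifier_part(c) { break; }
                word.push(c);
                chars.next();
            }
            tokens.push(Token::Word(word));
        } else {
            return Err(format!("unexpected character {ch:?}"));
        }
    }
    Ok(tokens)
}

pub struct ModelParser<'a> {
    #[allow(dead_code)] // a real parser would consult dialect rules here
    dialect: &'a dyn Dialect,
    tokens: Vec<Token>,
}

impl<'a> ModelParser<'a> {
    /// Token-based constructor (analogue of `from_dialect_and_tokens`).
    pub fn from_dialect_and_tokens(dialect: &'a dyn Dialect, tokens: Vec<Token>) -> Self {
        ModelParser { dialect, tokens }
    }

    /// String-based constructor: tokenize by default, then delegate.
    pub fn new_with_dialect(sql: &str, dialect: &'a dyn Dialect) -> Result<Self, String> {
        let tokens = tokenize(sql, dialect)?;
        Ok(Self::from_dialect_and_tokens(dialect, tokens))
    }

    /// Toy "parsing": split the token stream into statements at each `;`.
    pub fn parse_statements(self) -> Vec<Vec<Token>> {
        self.tokens
            .split(|t| *t == Token::SemiColon)
            .filter(|stmt| !stmt.is_empty())
            .map(|stmt| stmt.to_vec())
            .collect()
    }

    /// Token entry point (analogue of `parse_tokens_with_dialect`).
    pub fn parse_tokens_with_dialect(tokens: Vec<Token>, dialect: &'a dyn Dialect) -> Vec<Vec<Token>> {
        Self::from_dialect_and_tokens(dialect, tokens).parse_statements()
    }

    /// String entry point: now a thin wrapper over the token entry point.
    pub fn parse_sql_with_dialect(sql: &str, dialect: &'a dyn Dialect) -> Result<Vec<Vec<Token>>, String> {
        let tokens = tokenize(sql, dialect)?;
        Ok(Self::parse_tokens_with_dialect(tokens, dialect))
    }
}
```

Because the string-based functions become thin wrappers, there is a single parsing path, which is why tests written against the old entry points also exercise the token-based ones.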

Note: The sqlparser dependency remains pinned to a forked version (tarantool/datafusion-sqlparser-rs, branch release-42.0.0) that supports external tokenization workflows via iterator-based tokenizers. The parser itself does not depend on these tokenizers directly.
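As a rough illustration of an iterator-based tokenizer feeding a token-based parser entry point, the sketch below produces tokens lazily through `Iterator` and applies a preprocessing step outside the parser. All names here (`Token`, `IterTokenizer`, the `$name` placeholder syntax, `substitute`) are hypothetical and do not come from the sqlparser fork; the resulting `Vec<Token>` stands for the kind of input a function such as `parse_tokens_with_dialect` would receive.

```rust
// Hypothetical sketch: none of these names come from the sqlparser fork.

/// Hypothetical token type for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Word(String),
    /// `$name`-style placeholder (hypothetical syntax, for illustration).
    Placeholder(String),
    Punct(char),
}

/// Iterator-based tokenizer: each `next()` call scans just enough input
/// to produce one token, so tokens are produced lazily.
pub struct IterTokenizer<'s> {
    chars: std::iter::Peekable<std::str::Chars<'s>>,
}

impl<'s> IterTokenizer<'s> {
    pub fn new(sql: &'s str) -> Self {
        IterTokenizer { chars: sql.chars().peekable() }
    }

    /// Consume a run of identifier characters.
    fn take_word(&mut self) -> String {
        let mut word = String::new();
        while let Some(&c) = self.chars.peek() {
            if !(c.is_alphanumeric() || c == '_') {
                break;
            }
            word.push(c);
            self.chars.next();
        }
        word
    }
}

impl<'s> Iterator for IterTokenizer<'s> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        // Skip whitespace between tokens.
        while self.chars.peek().map_or(false, |c| c.is_whitespace()) {
            self.chars.next();
        }
        // End of input ends the iterator.
        let c = *self.chars.peek()?;
        if c == '$' {
            self.chars.next();
            Some(Token::Placeholder(self.take_word()))
        } else if c.is_alphanumeric() || c == '_' {
            Some(Token::Word(self.take_word()))
        } else {
            // Any other character becomes a single punctuation token.
            self.chars.next();
            Some(Token::Punct(c))
        }
    }
}

/// Preprocessing done outside the parser: replace each bound placeholder
/// with a plain identifier; unbound placeholders are left untouched.
pub fn substitute(tokens: impl Iterator<Item = Token>, bindings: &[(&str, &str)]) -> Vec<Token> {
    tokens
        .map(|t| match t {
            Token::Placeholder(name) => {
                let bound = bindings
                    .iter()
                    .find(|(k, _)| *k == name.as_str())
                    .map(|(_, v)| v.to_string());
                match bound {
                    Some(value) => Token::Word(value),
                    None => Token::Placeholder(name),
                }
            }
            other => other,
        })
        .collect()
}
```

Because the tokenizer is an ordinary `Iterator`, preprocessing composes with standard adapters such as `map` and `filter`, and the parser only ever sees the final collected token sequence.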

Are these changes tested?

No new tests were added. The existing entry points now delegate to the new functions, so the current suite exercises them with the default tokenizer and dialect; all current tests pass, which indicates the refactored parsing flow remains compatible with previous behavior.

Are there any user-facing changes?

Yes, two new opt-in APIs (DFParser::from_dialect_and_tokens and DFParser::parse_tokens_with_dialect) are now exposed in the parser module, enabling advanced usage scenarios with pre-tokenized SQL input. These changes do not affect existing behavior or interfaces and are fully backward compatible.

@molyee changed the title from "Support parsing sql with custom tokenizer" to "Support parsing with prepared tokens" on Jul 31, 2025
@0x501D merged commit 047d7fb into tarantool:release-42.0.0 on Aug 1, 2025
3 checks passed