Cannot use unicode in queries on windows in rust in tree-sitter 0.22.2 #3222

Closed
ahelwer opened this issue Mar 26, 2024 · 0 comments · Fixed by #3223 or tlaplus-community/tree-sitter-tlaplus#111
Problem

On tree-sitter versions 0.21.0 and higher, compiling a query that contains a Unicode character fails on Windows when using the Rust bindings. The same code succeeds on Linux and macOS, and succeeds on all platforms with 0.20.x versions:

use tree_sitter::{Parser, Query, QueryCursor};

fn main() {
    let mut parser = Parser::new();
    parser.set_language(&tree_sitter_test::language()).expect("Error loading grammar");
    let source_code = "op == expr op ≜ expr";
    let tree = parser.parse(source_code, None).unwrap();
    println!("{}", tree.root_node().to_sexp());

    let query = Query::new(&tree_sitter_test::language(), "(def_eq \"≜\" @def_eq)").unwrap();
    let mut cursor = QueryCursor::new();
    for capture in cursor.matches(&query, tree.root_node(), "".as_bytes()) {
        println!("{:?}", capture);
    }
}

Steps to reproduce

  1. Clone this branch containing a minimal tree-sitter grammar and Rust program: https://github.com/ahelwer/tree-sitter-test/tree/windows-unicode
  2. cd into the rust directory and run cargo run

On Windows, this will produce the following error value:

called `Result::unwrap()` on an `Err` value: QueryError { row: 0, column: 9, offset: 9, message: "", kind: NodeType }

On Linux and macOS, it will succeed.
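
For reference, the reported offset of 9 is the first byte after the opening quote of the string literal in the query. The standard-library-only sketch below checks this, assuming the anonymous node in the query is the `≜` character from the parsed source text. The helper names `first_literal_offset` and `encoded_lengths` are hypothetical and are not part of tree-sitter. This only shows that the error location coincides with the start of the non-ASCII literal; it does not identify the root cause.

```rust
/// Byte offset of the first byte after the first `"` in `query`, if any.
/// Hypothetical helper for illustration; not a tree-sitter API.
fn first_literal_offset(query: &str) -> Option<usize> {
    query.find('"').map(|quote_index| quote_index + 1)
}

/// Returns (code point, UTF-8 length in bytes, UTF-16 length in code units).
fn encoded_lengths(ch: char) -> (u32, usize, usize) {
    (ch as u32, ch.len_utf8(), ch.len_utf16())
}

fn main() {
    // Assumption: the literal in the failing query is the '≜' character.
    let query = "(def_eq \"≜\" @def_eq)";

    let offset = first_literal_offset(query).expect("query has a string literal");
    println!("literal starts at byte offset {offset}");

    // The character at that offset is the first one inside the quotes.
    let ch = query[offset..].chars().next().expect("literal is not empty");
    let (code_point, utf8_len, utf16_len) = encoded_lengths(ch);
    println!("U+{code_point:04X}: {utf8_len} UTF-8 bytes, {utf16_len} UTF-16 code unit(s)");
}
```

The character occupies three bytes in UTF-8, so any step of query parsing that handles a single byte or a narrow character type at this position would be a reasonable place to start looking.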

Alternatively, a less minimal cross-platform reproduction is available in this CI run: https://github.com/tlaplus-community/tlauc/actions/runs/8440383121

Expected behavior

I expected to be able to continue using Unicode characters in queries on all supported platforms. This worked on all platforms when generating and consuming grammars with tree-sitter 0.20.x.

Tree-sitter version (tree-sitter --version)

0.22.2 (also reproduces on 0.21.0 and higher)

Operating system/version

N/A
