emacs-tree-sitter

This is an Emacs Lisp binding for tree-sitter, an incremental parsing library.

It aims to be the foundation for a new breed of Emacs packages that understand code structurally. For example:

  • Faster, fine-grained code highlighting.
  • More flexible code folding.
  • Structural editing (like Paredit, or even better) for non-Lisp code.
  • More informative indexing for imenu.

The author of tree-sitter articulated its merits a lot better in this Strange Loop talk.

Pre-requisites

  • Emacs 25.1 or above, built with module support. This can be checked with (functionp 'module-load).
  • Rust toolchain, to build the dynamic module.
  • tree-sitter CLI tool, to build loadable language files from grammar repos.
  • clang, to generate the raw Rust binding for emacs-module.h.
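The pre-requisites above can be checked from inside Emacs. The following is a minimal sketch, not part of this package: the function name my-ts-check-prerequisites is hypothetical, and the executable names "cargo", "tree-sitter" and "clang" are assumptions about how the tools are installed on your PATH.

```elisp
(defun my-ts-check-prerequisites ()
  "Return an alist of (NAME . AVAILABLE-P) for the build pre-requisites.
The executable names are assumptions; adjust them to your setup."
  (list
   ;; Emacs 25.1 or above.
   (cons 'emacs-version (not (version< emacs-version "25.1")))
   ;; Emacs was built with dynamic module support.
   (cons 'module-support (and (functionp 'module-load) t))
   ;; Rust toolchain (cargo), tree-sitter CLI, and clang on PATH.
   (cons 'cargo (and (executable-find "cargo") t))
   (cons 'tree-sitter (and (executable-find "tree-sitter") t))
   (cons 'clang (and (executable-find "clang") t))))
```

Evaluating (my-ts-check-prerequisites) returns one entry per requirement; any entry whose value is nil points at something that is likely missing.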

Building and Installation

  • Clone this repo.
  • Build:
    make build
  • Add this repo's directory to load-path.
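As a sketch of the last step, the following could go in an init file. The path ~/src/emacs-tree-sitter is a hypothetical clone location, not something this package prescribes.

```elisp
;; Hypothetical clone location; replace it with wherever you cloned the repo.
(add-to-list 'load-path (expand-file-name "~/src/emacs-tree-sitter"))
```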

Getting Language Files

This package does not currently bundle any language. The parser for each language must be built as a shared dynamic library with the tree-sitter CLI tool, and is then loaded from that library.

  • Install the tree-sitter CLI tool (if you don't use Node.js, you can download the binary directly from GitHub):
    # For yarn users
    yarn global add tree-sitter-cli
    
    # For npm users
    npm install -g tree-sitter-cli
  • Use the wrapper script bin/ensure-lang to download the grammar from tree-sitter's GitHub org and build the parser:
    # The shared lib will be at ~/.tree-sitter/bin/rust.so (.dll on Windows)
    make ensure/rust
  • Load it in Emacs:
    (require 'tree-sitter)
    (ts-require-language 'rust)

Basic Usage

  • Enable tree-sitter in a major mode:
    (require 'tree-sitter)
    ;;; Assuming ~/.tree-sitter/bin/rust.so was already generated.
    (add-hook 'rust-mode-hook #'tree-sitter-mode)
  • Show the debug view of a buffer's parse tree:
    (require 'tree-sitter-debug)
    (tree-sitter-debug-enable)
  • Customize the language to use for a major mode:
    (add-to-list 'tree-sitter-major-mode-language-alist '(scala-mode . scala))
  • Use the lower-level APIs directly:
    (setq rust (ts-require-language 'rust))
    (setq parser (ts-make-parser))
    (ts-set-language parser rust)
    
    ;;; Parse a simple string.
    (ts-parse-string parser "fn foo() {}")
    
    ;;; Incremental parsing.
    (with-temp-buffer
      (insert-file-contents "src/types.rs")
      (let* ((tree)
             (initial (benchmark-run (setq tree (ts-parse parser #'ts-buffer-input nil))))
             (reparse (benchmark-run (ts-parse parser #'ts-buffer-input tree))))
        ;; Second parse should be much faster than the initial parse, especially as code size grows.
        (message "initial %s" initial)
        (message "reparse %s" reparse)))

APIs

  • The tree-sitter doc is a good read to understand its concepts, and how to use the parsers in general.
  • Functions in this package are named differently, to be more Lisp-idiomatic. The overall parsing flow stays the same.
  • Documentation for individual functions can be viewed with C-h f (describe-function), as usual.
  • A symbol in the C API is actually the ID of a type, so it's called type-id in this package.

Types

  • language, parser, tree, node, cursor: corresponding tree-sitter types, embedded in user-ptr objects.
  • point: a vector in the form of [row column], where row and column are zero-based. This is different from Emacs's concept of "point", which is a single character position in the buffer. Also note that column counts bytes, unlike the built-in function current-column, which counts display columns.
  • range: a vector in the form of [start-point end-point].

These types are understood only by this package. They are not recognized by type-of, but have corresponding type-checking predicates, which are useful for debugging: ts-language-p, ts-tree-p, ts-node-p...
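To illustrate how a point in this sense differs from an Emacs buffer position, here is a minimal sketch that converts one into the other using only built-in functions. The function name my-ts-point-at is hypothetical, and the sketch assumes that "bytes" means bytes in Emacs's internal UTF-8-based buffer representation, as reported by position-bytes.

```elisp
(defun my-ts-point-at (position)
  "Return a vector [ROW COLUMN] for buffer POSITION.
ROW and COLUMN are zero-based, and COLUMN is counted in bytes
\(assumption: bytes of Emacs's internal buffer representation)."
  (save-excursion
    (save-restriction
      ;; Ignore narrowing so rows are counted from the real buffer start.
      (widen)
      (goto-char position)
      (vector (1- (line-number-at-pos))
              (- (position-bytes (point))
                 (position-bytes (line-beginning-position)))))))
```

For example, on a line that starts with "a" followed by "é" (two bytes in the internal representation), the position just after "é" has byte column 3, whereas current-column would report 2:

```elisp
(with-temp-buffer
  (insert "a\u00e9\nbc")   ; "\u00e9" is "é"
  (my-ts-point-at 3))      ; => [0 3]
```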

Functions

  • Language:
    • ts-require-language: like require, for tree-sitter languages.
  • Parser:
    • ts-make-parser: create a new parser.
    • ts-set-language: set a parser's active language.
    • ts-parse-string: parse a string.
    • ts-parse: parse with a text-generating callback.
    • ts-set-included-ranges: set sub-ranges when parsing multi-language text.
  • Tree:
    • ts-root-node: get the tree's root node.
    • ts-edit-tree: prepare a tree for incremental parsing.
    • ts-changed-ranges: compare 2 trees for changes.
    • ts-tree-to-sexp: debug utility.
  • Cursor:
    • ts-make-cursor: obtain a new cursor from either a tree or a node.
    • ts-goto- functions: move to a different node.
    • ts-current- functions: get the current field/node.
  • Node:
    • ts-node- functions: node's properties and predicates.
    • ts-get- functions: get related nodes (parent, siblings, children, descendants).
    • ts-count- functions: count child nodes.
    • ts-mapc-children: loop through child nodes.
    • ts-node-to-sexp: debug utility.
  • Query:
    • ts-make-query: create a new query.
    • ts-make-query-cursor: create a new query cursor.
    • ts-query-matches, ts-query-captures: execute a query, returning matches/captures.
    • ts-set-byte-range, ts-set-point-range: limit query execution to a range.
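The following sketch strings several of the functions listed above into one helper: it creates a parser, sets its language, parses a string, and prints the root node for debugging. The helper name my-ts-sexp-of-string is hypothetical, and the argument order of each ts- call is an assumption based on the usage examples earlier in this README; check C-h f for the exact signatures. It needs the built dynamic module and the language file for the requested language.

```elisp
(defun my-ts-sexp-of-string (lang-symbol source)
  "Parse SOURCE as LANG-SYMBOL and return the root node's S-expression.
Argument orders of the ts- functions are assumed, not verified here."
  ;; Loaded lazily so this definition can be evaluated before the
  ;; dynamic module has been built.
  (require 'tree-sitter)
  (let ((parser (ts-make-parser)))
    (ts-set-language parser (ts-require-language lang-symbol))
    ;; ts-parse-string yields a tree; take its root node and dump it.
    (ts-node-to-sexp (ts-root-node (ts-parse-string parser source)))))
```

With the Rust language file in place, (my-ts-sexp-of-string 'rust "fn foo() {}") should return a textual description of the parse tree.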

Development

  • Testing:
    make test
  • Continuous testing (requires cargo-watch):
    make watch

You can optionally set the EMACS environment variable to run the tests with a different GNU Emacs binary (e.g. EMACS=/snap/bin/emacs make test). Otherwise, the binary found by which emacs is used.

On Windows, use PowerShell to run the corresponding .ps1 scripts: ./bin/build.ps1, ./bin/ensure-lang.ps1 (for example, ./bin/ensure-lang.ps1 rust) and ./bin/test.ps1.

Alternative

Binding through C instead of Rust: https://github.com/karlotness/tree-sitter.el
