Hi all,
I am trying to combine ts_tree_get_changed_ranges and ts_query_cursor_set_byte_range to find all ranges with syntax changes during an edit, to avoid running the language query against the full document for every edit. As far as I can tell, this combination works as expected to capture nodes whose syntax has changed, but not those that have been deleted.
My basic setup is a list of byte deltas marking the positions at which the text transitions to a new query capture ID, but my question doesn't really depend on my specific data structure. Basically, I want to update some mapping of bytes to capture IDs while only re-querying the changed node ranges. I've been using a combination of:
- Manually incrementing/decrementing deltas following an edit range,
- Manually deleting all transition points within a deleted range, and
- Deleting & inserting all node byte ranges using capture.node for all ts_query_cursor_next_capture matches.
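To make the first two steps concrete, here is a minimal sketch of a delta-encoded transition list. It is an illustration under assumptions, not the actual CaptureIdTransitions class: the names Transition, ByteIndexOf, ShiftAfter and DeleteRange are hypothetical, and it rescans from the start of the list instead of using an iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a capture-ID transition list (all names are illustrative).
struct Transition {
    uint32_t Delta;     // Bytes since the previous transition (or since byte 0 for the first).
    uint32_t CaptureId; // Capture ID that applies from this byte onward.
};

using Transitions = std::vector<Transition>;

// Absolute byte offset of the transition at `index` (sum of deltas up to and including it).
uint32_t ByteIndexOf(const Transitions &ts, size_t index) {
    uint32_t byte = 0;
    for (size_t i = 0; i <= index; ++i) byte += ts[i].Delta;
    return byte;
}

// Step 1: shift every transition at or after `after_byte` by `amount` bytes.
// Only the first affected delta changes; later entries move implicitly.
void ShiftAfter(Transitions &ts, uint32_t after_byte, int64_t amount) {
    uint32_t byte = 0;
    for (auto &t : ts) {
        byte += t.Delta;
        if (byte >= after_byte) {
            t.Delta = static_cast<uint32_t>(static_cast<int64_t>(t.Delta) + amount);
            return;
        }
    }
}

// Step 2: remove all transitions in [start, end), leaving later absolute offsets unchanged.
void DeleteRange(Transitions &ts, uint32_t start, uint32_t end) {
    Transitions kept;
    uint32_t byte = 0, last_kept = 0;
    for (const auto &t : ts) {
        byte += t.Delta;
        if (byte >= start && byte < end) continue; // Drop transitions inside the range.
        kept.push_back({byte - last_kept, t.CaptureId}); // Re-encode relative to the last kept one.
        last_kept = byte;
    }
    ts = std::move(kept);
}
```

For a deletion, calling DeleteRange(ts, StartByte, OldEndByte) first and ShiftAfter(ts, OldEndByte, NewEndByte - OldEndByte) second keeps every delta non-negative, because no transition remains inside the removed bytes by the time the shift is applied.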
I keep feeling very close, but I keep finding new edge cases that don't work. For example, let's say I have a valid document with two valid statements:
i = 1100;
j = 2200;
and I make an edit to delete the first statement, producing the document
j = 2200;
If I then call ts_tree_get_changed_ranges on the old and new trees, num_changed_ranges will be 0, since the nodes that are still present have not had their syntactic structure changed.
So my first naive approach was to manually delete the range of query capture transitions (my text editor's basic highlighting data structure) in the deleted range. This works fine for most cases, but let's say I then make an edit to delete bytes 3-6 to change j = 2200; into j =0;.
This document has the same syntactic structure as j = 2200;, so again, ts_tree_get_changed_ranges will return num_changed_ranges == 0. I'm having difficulty finding robust ways of distinguishing between all cases without tree-sitter returning a list of deleted nodes.
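The byte arithmetic of these two edits can be checked with a small sketch. It assumes the two statements are separated by a single newline and that the first deletion removes that newline too. ByteEdit is a hypothetical stand-in for the three byte fields of TSInputEdit, and FirstNumberRange merely approximates the byte range a number node would cover; neither is a tree-sitter API.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for the byte fields of TSInputEdit (point fields omitted).
struct ByteEdit {
    uint32_t StartByte, OldEndByte, NewEndByte;
};

// Apply a pure deletion, i.e. an edit where NewEndByte == StartByte.
std::string ApplyDeletion(std::string text, const ByteEdit &edit) {
    text.erase(edit.StartByte, edit.OldEndByte - edit.StartByte);
    return text;
}

// Half-open byte range of the first run of digits in `text`.
// This approximates the range of the number token, without using a parser.
std::pair<uint32_t, uint32_t> FirstNumberRange(const std::string &text) {
    uint32_t start = 0;
    while (start < text.size() && !std::isdigit(static_cast<unsigned char>(text[start]))) ++start;
    uint32_t end = start;
    while (end < text.size() && std::isdigit(static_cast<unsigned char>(text[end]))) ++end;
    return {start, end};
}
```

With the document "i = 1100;\nj = 2200;", the first edit is {0, 10, 0} and the second, deleting bytes 3 through 6 inclusive, is {3, 7, 3}. The number token moves from [4, 8) to [3, 4) across the second edit, so its transitions must move even though the set of changed ranges is empty.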
Some of this might be a bit unclear: CaptureIdTransitions is an instance of a data structure that manages a vector of capture-ID transitions stored as byte deltas. It keeps track of the cumulative ByteIndex using an iterator, and its Insert and Delete methods handle updates to neighboring deltas as needed.
Here is most of my actual code, to give a better idea of what I'm trying to accomplish:
if (Tree != nullptr) {
    for (const auto &edit : edits) {
        const TSInputEdit ts_edit{.start_byte = edit.StartByte, .old_end_byte = edit.OldEndByte, .new_end_byte = edit.NewEndByte};
        ts_tree_edit(Tree, &ts_edit);
    }
}
auto *old_tree = Tree;
Tree = ts_parser_parse(Parser, Tree, Input);

/* Update capture ID transition points (used for highlighting) based on the query and the edits. */

// Find the minimum range needed to span all nodes whose syntactic structure has changed.
ByteRange changed_range = {UINT32_MAX, 0u};
if (old_tree != nullptr) {
    uint num_changed_ranges;
    const TSRange *changed_ranges = ts_tree_get_changed_ranges(old_tree, Tree, &num_changed_ranges);
    for (uint i = 0; i < num_changed_ranges; ++i) {
        changed_range.Start = std::min(changed_range.Start, changed_ranges[i].start_byte);
        changed_range.End = std::max(changed_range.End, changed_ranges[i].end_byte);
    }
    free((void *)changed_ranges);
}

const bool any_changed_captures = changed_range.Start < changed_range.End;
if (any_changed_captures) {
    ts_query_cursor_set_byte_range(QueryCursor, changed_range.Start, changed_range.End);
}

auto transition_it = CaptureIdTransitions.begin();
// Adjust transitions based on the edited ranges, from last to first.
if (CaptureIdTransitions.size() > 1) {
    for (const auto &edit : reverse_view(ordered_edits)) {
        const uint inc_after_byte = edit.OldEndByte;
        transition_it.MoveTo(inc_after_byte);
        if (!transition_it.IsEnd()) {
            if (transition_it.ByteIndex != inc_after_byte) ++transition_it;
            CaptureIdTransitions.Increment(transition_it, edit.NewEndByte - edit.OldEndByte);
        }
    }
}

// Delete all transitions in deleted ranges? Not right in all cases.
// for (const auto &edit : reverse_view(ordered_edits) | filter([](const auto &edit) { return edit.IsDelete(); })) {
//     CaptureIdTransitions.Delete(transition_it, edit.NewEndByte, edit.OldEndByte);
// }

if (old_tree == nullptr || any_changed_captures) {
    // Either this is the first parse, or the edit(s) affect existing node captures.
    // Execute the query and add all capture transitions.
    ts_query_cursor_exec(QueryCursor, Query, ts_tree_root_node(Tree));
    TSQueryMatch match;
    uint capture_index;
    while (ts_query_cursor_next_capture(QueryCursor, &match, &capture_index)) {
        const TSQueryCapture &capture = match.captures[capture_index];
        // We only store the points at which there is a _transition_ from one style to another.
        // This can happen either at the capture node's beginning or end.
        const TSNode node = capture.node;
        if (ts_node_child_count(node) > 0) continue; // Only highlight terminal nodes.

        // Delete invalidated transitions and insert new ones.
        const auto node_byte_range = ToByteRange(node);
        CaptureIdTransitions.Delete(transition_it, node_byte_range.Start, node_byte_range.End);
        if (*transition_it != capture.index) {
            CaptureIdTransitions.Insert(transition_it, node_byte_range.Start, capture.index);
            if (node_byte_range.End != changed_range.End) {
                CaptureIdTransitions.Insert(transition_it, node_byte_range.End, NoneCaptureId);
            }
        }
    }
}

// Cleanup: Delete all transitions beyond the new text range.
CaptureIdTransitions.Delete(transition_it, ts_node_end_byte(ts_tree_root_node(Tree)), UINT32_MAX);
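The range-spanning step in the code above can be isolated into a small, parser-free sketch. ByteRange here is a local stand-in with the same two fields used above, and CoveringRange and IsEmpty are hypothetical helper names; none of this is part of the tree-sitter API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-in for the ByteRange type used in the code above.
struct ByteRange {
    uint32_t Start, End;
};

// Smallest single range covering all given ranges.
// With no input ranges, the {UINT32_MAX, 0} sentinel is returned unchanged.
ByteRange CoveringRange(const std::vector<ByteRange> &ranges) {
    ByteRange span{UINT32_MAX, 0u};
    for (const auto &r : ranges) {
        span.Start = std::min(span.Start, r.Start);
        span.End = std::max(span.End, r.End);
    }
    return span;
}

// The sentinel has Start > End, so it reads as "nothing to re-query".
bool IsEmpty(const ByteRange &r) { return r.Start >= r.End; }
```

Note that the covering range of two disjoint ranges such as [3, 7) and [12, 20) is [3, 20), so the bytes between them are re-queried as well. When no changed ranges are reported, as in both edits of the example, IsEmpty is true and the query is not run at all.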
I'm probably overcomplicating things and missing something simple since minimal syntax highlighting updates are a primary tree-sitter use case.
Any help or direction would be much appreciated!