two HiFi tree diff algorithms for after source-to-source transformations (quick-fix, refactoring, formatting) in the IDE #2031

jurgenvinju · 2024-09-12T08:59:25Z

Two algorithms are added to form the keystone elements of HiFi (high-fidelity) source-to-source transformation pipelines written in Rascal. Both transform a coupled pair of parse trees <original, rewritten> into a list[TextEdit]. The resulting TextEdits are ready for use in VScode extensions via the LSP features in util::LanguageServer and util::IDEServices.
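
To make the shape of that output concrete, here is a minimal Python sketch (not the Rascal API) of what a list of text edits amounts to: each edit replaces one span of the original text, and applying the edits back to front keeps the earlier offsets valid. The names `Edit` and `apply_edits` are hypothetical and only serve the illustration; the actual TextEdit data type in the Rascal library may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    offset: int       # start position in the original text
    length: int       # number of original characters to replace
    replacement: str  # new text for that span


def apply_edits(text: str, edits: list[Edit]) -> str:
    """Apply non-overlapping edits that are all expressed against the original text."""
    result = text
    # Work from the last edit to the first so that earlier offsets stay valid.
    for e in sorted(edits, key=lambda e: e.offset, reverse=True):
        result = result[:e.offset] + e.replacement + result[e.offset + e.length:]
    return result


original = "if (x) { y = 1; }  // keep me"
edits = [Edit(4, 1, "cond"), Edit(13, 1, "2")]
print(apply_edits(original, edits))  # if (cond) { y = 2; }  // keep me
```

Only the two touched spans change; the whitespace and the trailing comment are never rewritten, which is the point of producing small edits instead of replacing the whole file.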

There are two complementary algorithms here:

* treeDiff: works on a pair of (original, rewritten) parse trees and detects the minimal differences without too much damage to whitespace and source code comments.
* layoutDiff: works on a pair of (original, formatted) parse trees, where original := formatted must be true, and derives edits to whitespace only, while trying to preserve source code comments and case-insensitive literals.

The treeDiff algorithm is a single-pass parse tree recursion that maps sub-tree and sub-list differences to textual differences in a special way. It relies on the semantics of special parse tree non-terminals (literals, lexicals, separators) to ignore certain superfluous changes made in the rewritten tree, focusing only on essential syntactic changes that do change the operational semantics of the source code. As a result, the edited source text retains more of its original layout, that is, syntax which does not affect operational semantics, including indentation and comments. Compared to yielding the rewritten parse tree to a string and replacing the entire file, treeDiff does a lot better in terms of fidelity. TreeDiff especially shines when it comes to changes to concrete separated lists: because it understands the semantics of separated lists exactly, the user does not have to think about separators at all when rewriting parse trees.
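
The core idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Rascal implementation: the `Tree` class, the `"layout"` and `"token"` labels and the `tree_diff` function are invented for illustration, and the special handling of literals, lexicals, separators and lists is left out. The sketch only shows how a parallel recursion skips layout nodes and emits an edit at the smallest differing subtree.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str                 # "layout", "token", or a rule name such as "assign"
    text: str = ""             # only leaves carry text
    kids: list = field(default_factory=list)


def text_of(t: Tree) -> str:
    """Yield the source text covered by a tree."""
    return t.text if not t.kids else "".join(text_of(k) for k in t.kids)


def tree_diff(orig: Tree, new: Tree, offset: int = 0) -> list:
    """Return (offset, length, replacement) edits against the original text."""
    if orig.label == "layout" and new.label == "layout":
        return []  # keep the original whitespace and comments
    if orig.label == new.label and orig.kids and len(orig.kids) == len(new.kids):
        edits = []
        for o, n in zip(orig.kids, new.kids):
            edits += tree_diff(o, n, offset)
            offset += len(text_of(o))  # positions always refer to the original
        return edits
    # Leaves, or nodes with a different shape: replace the whole span if it differs.
    old_text, new_text = text_of(orig), text_of(new)
    return [] if old_text == new_text else [(offset, len(old_text), new_text)]


def tok(s): return Tree("token", s)
def lay(s): return Tree("layout", s)


original = Tree("assign", kids=[tok("x"), lay("  /*c*/ "), tok("="), lay(" "), tok("1")])
rewritten = Tree("assign", kids=[tok("x"), lay(" "), tok("="), lay(" "), tok("2")])
print(tree_diff(original, rewritten))  # [(11, 1, '2')]
```

The rewritten tree lost the comment and the extra spaces, yet the only edit produced replaces `1` by `2`, so the comment in the original text survives.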

The complementary layoutDiff algorithm is very similar, but focuses on other parts of the subtrees: the layout nodes. Also, because original := formatted, there is no need for complex list algorithms as in treeDiff. LayoutDiff can also infer changes for normalizing case-insensitive literals in five modes: as-original, as-rewritten, capitalize, toUppercase and toLowercase. For layoutDiff, the case of the characters in case-insensitive literals is seen as a kind of "layout", as it does not influence the operational semantics of the code, while it does influence the readability and comprehensibility of the code for a human.
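
As a rough illustration of this behaviour, the Python sketch below walks two token streams that agree on everything except layout and literal case. It is a simplification under stated assumptions: the real algorithm recurses over parse trees, the token kinds `"layout"`, `"ci-literal"` and `"other"` are invented here, and the rule "layout that contains anything besides whitespace is a comment and is kept as-is" is only a guess at one possible comment-preservation policy.

```python
def normalize_literal(original: str, formatted: str, mode: str) -> str:
    """Pick the spelling of a case-insensitive literal under one of the five modes."""
    if mode == "as-original":
        return original
    if mode == "as-rewritten":
        return formatted
    if mode == "capitalize":
        return original.capitalize()
    if mode == "toUppercase":
        return original.upper()
    if mode == "toLowercase":
        return original.lower()
    raise ValueError(f"unknown mode: {mode}")


def layout_diff(orig_tokens, fmt_tokens, mode="as-original"):
    """Tokens are (kind, text) pairs; both streams have the same kinds in the same order.

    Returns (offset, length, replacement) edits against the original text.
    """
    edits, offset = [], 0
    for (kind, old), (_, new) in zip(orig_tokens, fmt_tokens):
        if kind == "layout":
            # Assumption: layout holding more than whitespace is a comment; keep it.
            wanted = new if old.strip() == "" else old
        elif kind == "ci-literal":
            wanted = normalize_literal(old, new, mode)
        else:
            wanted = old  # non-layout text is never touched
        if wanted != old:
            edits.append((offset, len(old), wanted))
        offset += len(old)
    return edits


orig = [("ci-literal", "BEGIN"), ("layout", "   "), ("other", "x"),
        ("layout", " /* c */ "), ("ci-literal", "End")]
fmt = [("ci-literal", "begin"), ("layout", "\n  "), ("other", "x"),
       ("layout", "\n"), ("ci-literal", "end")]
print(layout_diff(orig, fmt, mode="toLowercase"))
```

With `mode="as-original"` the same call yields only the whitespace edit, since the literals keep their original spelling.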

For layoutDiff, we expect that formatted code in str form will have to be reparsed with the original grammar to obtain the formatted parse tree. For treeDiff, reparsing is not a good plan, since it would greatly affect the efficiency of the algorithm. We expect treeDiff to be applied to a rewritten parse tree, produced for example with concrete pattern matching and substitution.

TODOs:

* write tests, and fix initial issues triggered by the tests
* test the example in the documentation
* shorten the documentation
* document the TextEdits module
* short-circuit lexical identifiers (do not go deeper)
* write indentation inheritance algorithm


codecov · 2024-09-12T09:03:13Z

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 47%. Comparing base (2c284f1) to head (afbb402).
⚠️ Report is 85 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/org/rascalmpl/types/NonTerminalType.java | 50% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##              main   #2031   +/-   ##
=======================================
  Coverage       47%     47%           
- Complexity    6492    6505   +13     
=======================================
  Files          780     780           
  Lines        64429   64434    +5     
  Branches      9594    9596    +2     
=======================================
+ Hits         30282   30311   +29     
+ Misses       31856   31834   -22     
+ Partials      2291    2289    -2


jurgenvinju · 2024-09-12T09:05:58Z

@tvdstorm @DavyLandman I don't have the energy to finish this now, so I'm parking it here until I do. I wanted you to know it exists, because low-fidelity rewrites are a common issue we have to solve and because refactoring and quick-fixes in VScode are now under our fingertips.

DavyLandman · 2024-09-12T11:25:14Z

Cool stuff 👍


This finishes the complete algorithm for lists for the first time. The algorithm works in these steps:

* Trim equal elements from the head and the tail of both lists.
* Detect common edits to lists with fast list patterns; this is an optional optimization.
* Find the largest common sublist and split both lists in three parts: two different prefixes, two equal middle parts and two different postfixes. Recurse on the prefixes and the postfixes and concatenate their edit lists.
* Finally, we end up with two empty lists or two lists without common elements; we collect the differences of each element position pairwise. Lists that became shorter get an additional edit to cut off the list, while lists that became longer get one additional edit to add the new elements. The new elements inherit indentation from the pre-existing elements.

For these changes additional tests still must be added later.
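
The list steps described above can be sketched on plain Python lists. This is an illustration under assumptions, not the Rascal code: it works on element indices instead of text offsets, ignores separators and indentation inheritance, skips the optional fast-pattern step, and assumes the split is made around the longest contiguous common run. The edit names `"replace"`, `"cut"` and `"add"` are invented for this sketch.

```python
def largest_common_sublist(a, b):
    """Return (start_a, start_b, length) of the longest contiguous common run."""
    best = (0, 0, 0)
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def list_diff(a, b, base=0):
    """Return edits on `a` (indices refer to the original list) that turn it into `b`."""
    # Trim equal elements from the head and the tail of both lists.
    while a and b and a[0] == b[0]:
        a, b, base = a[1:], b[1:], base + 1
    while a and b and a[-1] == b[-1]:
        a, b = a[:-1], b[:-1]
    # Split around the largest common sublist and recurse on both sides.
    ia, ib, n = largest_common_sublist(a, b)
    if n > 0:
        return (list_diff(a[:ia], b[:ib], base)
                + list_diff(a[ia + n:], b[ib + n:], base + ia + n))
    # No common elements are left: compare position by position.
    edits = [("replace", base + i, y) for i, (_, y) in enumerate(zip(a, b))]
    if len(a) > len(b):
        edits.append(("cut", base + len(b), base + len(a)))  # list became shorter
    elif len(b) > len(a):
        edits.append(("add", base + len(a), b[len(a):]))     # list became longer
    return edits


print(list_diff(list("abcde"), list("axcdyz")))
# [('replace', 1, 'x'), ('replace', 4, 'y'), ('add', 5, ['z'])]
```

In the example, the shared head `a` is trimmed first, the common run `c d` splits the remainder, and the two leftover sides are compared pairwise, which produces one extra `add` edit for the element that makes the new list longer.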

jurgenvinju · 2025-08-11T20:37:29Z

The static checker chokes on the new code in ParseTree.rsc:

Error: [ERROR]   RunTests.onlyChangedModulesAreReChecked1: <18024,1205> » Throw /home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/compiler/lang/rascalcore/check/tests/StaticTestingUtils.rsc:168,26: "{error(\"Invalid type: expected node, ADT, or concrete syntax types, found `![]`\",|tmp:///rascal/src/org/rascalmpl/library/ParseTree.rsc|(47720,84,\<972,10\>,\<972,94\>),fixes=[])}"
Error: [ERROR]   RunTests.onlyTouchedModulesAreReChecked1: <15213,521> » Throw /home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/compiler/lang/rascalcore/check/tests/StaticTestingUtils.rsc:168,26: "{error(\"Invalid type: expected node, ADT, or concrete syntax types, found `![]`\",|tmp:///rascal/src/org/rascalmpl/library/ParseTree.rsc|(47720,84,\<972,10\>,\<972,94\>),fixes=[])}"

* report the bug: Spurious static error for char(_) parse Tree and annotation setting #2342
* rewrite the code to work around it


experimental HiFi tree diff algorithm for use with quick-fixes and re…

7b6f519

…factoring commands in the IDE

jurgenvinju requested review from tvdstorm and DavyLandman September 12, 2024 08:59

jurgenvinju added the enhancement label Sep 12, 2024

jurgenvinju self-assigned this Sep 12, 2024

jurgenvinju added 22 commits October 1, 2024 11:32

developing the list diff algorithms with inspiration from the diff tool

374a8a2

Merge branch 'main' into hifi-tree-diff

da7f5a1

made some progress with the list algorithm

c623d2b

minor improvements. this is not finished yet

1525e73

slow progress

3196433

added demo

3f05df4

Merge branch 'main' into hifi-tree-diff

2547a1a

exposed IString.indent to String library module to allow users to reu…

ed091f7

…se indentation in O(1)

slow progress on the diff algorithm

2462eeb

more complex example, and debug prints

8abdbd6

finetunes stuff in indentation learner

4a55110

testing

97eb352

Merge branch 'main' into hifi-tree-diff

b051673

Merge branch 'main' into hifi-tree-diff

27e8536

fixed nasty bug in Type.intersection w.r.t. parameter types

fd6ccbb

started on testing HiFiTreeDiff

1c0a81d

minor improvements

9c64458

fixed bug in list diff

71a1c00

oops

2677795

simplified and repaired equal sublist detection

091b0b9

finding more nested similarity under list elements

ed1ad03

jurgenvinju added 10 commits August 11, 2025 15:20

default indentation size set to 4

a9fb177

added more complex tests for sublist equality

cb7dedf

fixed asFormatted literal option

bd9cc83

added TODO

6f2dceb

added test

37334ab

improved indentation inheritance

5d12f49

refactored private function name

5d60bdf

fixed documentation

f99ee23

fixed minor issue in tutor compiler

5b24d7d

Merge branch 'main' into hifi-tree-diff

4299881

jurgenvinju marked this pull request as ready for review August 11, 2025 19:42

apply suggesting by @toinehartman for improved readability

62689ab

jurgenvinju added 2 commits August 11, 2025 23:02

workaround for #2342

f197f4a

added comment

a893ed1

jurgenvinju changed the title ~~experimental HiFi tree diff algorithm for use with quick-fixes and refactoring commands in the IDE~~ two HiFi tree diff algorithm for use after source-to-source transformations (quick-fix, refactoring, formatting) in the IDE Aug 11, 2025

jurgenvinju changed the title ~~two HiFi tree diff algorithm for use after source-to-source transformations (quick-fix, refactoring, formatting) in the IDE~~ two HiFi tree diff algorithms for after source-to-source transformations (quick-fix, refactoring, formatting) in the IDE Aug 11, 2025

jurgenvinju and others added 10 commits August 12, 2025 14:20

bug in execute text edits solved

d1ff2bb

finetune comma and semicolon separated lists

7bff86d

better defaults for binary expressions

122932f

minor refactoring

407b95b

with layoutDiff in the game, comment preservation is no longer a task…

5d1606d

… of toBox. This then makes the treatment of files with comments much easier, because every box will always have the same predictable amount of children

finetuning the pico demo

c545a74

factored clones in pico formatting demo

9e07b1f

improved docs

93e0b93

Fix tutor & type errors.

b56bb93

Merge branch 'main' into hifi-tree-diff

afbb402

jurgenvinju merged commit 9d96167 into main Aug 14, 2025
8 checks passed

jurgenvinju deleted the hifi-tree-diff branch August 14, 2025 11:37

toinehartman mentioned this pull request Aug 18, 2025

Optionally preserve trailing spaces in layoutDiff #2360

Open
