@jurgenvinju jurgenvinju commented Sep 12, 2024

Two algorithms are added to form the keystone elements of high-fidelity (HiFi) source-to-source transformation pipelines written in Rascal. Both transform a coupled pair of parse trees <original, rewritten> into a list[TextEdit]. The resulting TextEdits are ready for use in VS Code extensions via the LSP features in util::LanguageServer and util::IDEServices.
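
To make the target format concrete, here is a minimal sketch of what "a list of text edits against the original file" amounts to. It is written in Python so it runs standalone; the `Replace` class and `apply_edits` function are hypothetical stand-ins for illustration, not the Rascal TextEdit API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replace:
    """Hypothetical stand-in for one text edit: replace `length` chars at `offset`."""
    offset: int
    length: int
    text: str


def apply_edits(source, edits):
    """Apply non-overlapping edits that are all expressed in ORIGINAL offsets."""
    result = source
    # Work from the end of the text backwards so earlier offsets stay valid.
    for e in sorted(edits, key=lambda e: e.offset, reverse=True):
        result = result[:e.offset] + e.text + result[e.offset + e.length:]
    return result


original = "x = foo(1,  2)  # keep me"
edits = [Replace(4, 3, "bar"), Replace(12, 1, "3")]
print(apply_edits(original, edits))
```

Only the two changed spans are touched; the double space and the trailing comment survive because no edit covers them.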

There are two complementary algorithms here:

  • treeDiff: works on a pair of (original, rewritten) parse trees and detects the minimal differences without doing much damage to whitespace and source code comments.
  • layoutDiff: works on a pair of (original, formatted) parse trees, where original := formatted must be true, and derives edits to whitespace only, while trying to preserve source code comments and case-insensitive literals.

The treeDiff single-pass parse tree recursion maps sub-tree and sub-list differences to textual differences in a special way. It relies on the semantics of special parse tree non-terminals (literals, lexicals, separators) to ignore certain superfluous changes made in the rewritten tree, focusing only on essential syntactic changes that do change the operational semantics of the source code. As a result, the edited source text retains more of its original layout, that is, the syntax which does not affect operational semantics, including indentation and comments. Compared to yielding the rewritten parse tree to a string and replacing the entire file, treeDiff does a lot better in terms of fidelity. TreeDiff especially shines when it comes to changes to concrete separated lists: because it understands the semantics of separated lists exactly, the user does not have to think about separators at all when rewriting parse trees.
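
The core idea of the recursion can be sketched on a toy tree. This is an illustration in Python under strong assumptions, not the Rascal implementation: trees are plain tuples `("tok", text)`, `("layout", text)` or `("node", rule, children)`, children of equal rules are assumed to line up by position, and the names `text_of` and `tree_diff` are hypothetical. List handling and separators are left out entirely.

```python
def text_of(tree):
    """Concatenate all characters below a node (its yield)."""
    if tree[0] in ("tok", "layout"):
        return tree[1]
    return "".join(text_of(child) for child in tree[2])


def tree_diff(old, new, offset=0):
    """Return (offset, length, replacement) edits against the original text."""
    if old[0] == "layout":
        # Layout produced by the rewrite is ignored; the original is kept.
        return []
    if old[0] == "tok":
        if new[0] == "tok" and old[1] == new[1]:
            return []
        return [(offset, len(old[1]), text_of(new))]
    # Interior nodes with the same rule and arity: recurse into the children.
    if new[0] == "node" and old[1] == new[1] and len(old[2]) == len(new[2]):
        edits = []
        for o, n in zip(old[2], new[2]):
            edits += tree_diff(o, n, offset)
            offset += len(text_of(o))
        return edits
    # Otherwise fall back to replacing the whole original subtree.
    return [(offset, len(text_of(old)), text_of(new))]


old = ("node", "add", [("tok", "a"), ("layout", "   /* keep */ "),
                       ("tok", "+"), ("layout", " "), ("tok", "b")])
new = ("node", "add", [("tok", "a"), ("layout", " "),
                       ("tok", "+"), ("layout", " "), ("tok", "c")])
print(tree_diff(old, new))
```

The rewritten tree lost the comment and the extra spaces, yet the only edit produced replaces `b` with `c`, so the original comment and spacing stay in the file.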

The complementary layoutDiff algorithm is very similar, but focuses on other parts of the subtrees: the layout nodes. Also, because original := formatted, there is no need for complex list algorithms as in treeDiff. LayoutDiff can also infer changes for normalizing case-insensitive literals in five modes: as-original, as-rewritten, capitalize, toUppercase and toLowercase. For layoutDiff, the case of the characters in case-insensitive literals is seen as a kind of "layout", as this does not influence the operational semantics of the code, while it does influence the readability and comprehensibility of the code for a human.
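
A rough sketch of this behaviour, again in Python and under stated assumptions: both trees are flattened to equally shaped lists of `(kind, text)` pairs with kind `"tok"`, `"layout"` or `"ci"` (a case-insensitive literal); layout that contains a comment is simply left alone, which is cruder than "trying to preserve" it; and "capitalize" is assumed to mean first letter upper case, the rest lower case. The function names are hypothetical; only the five mode names come from the description above.

```python
def normalize_case(original, formatted, mode):
    """Pick the spelling of a case-insensitive literal for the given mode."""
    choices = {
        "as-original": original,
        "as-rewritten": formatted,
        "capitalize": formatted.capitalize(),  # assumed meaning of this mode
        "toUppercase": formatted.upper(),
        "toLowercase": formatted.lower(),
    }
    return choices[mode]


def layout_diff(original, formatted, mode="as-original"):
    """Return (offset, length, replacement) edits for layout and literal case only."""
    edits = []
    offset = 0
    for (kind, old), (kind_f, new) in zip(original, formatted):
        assert kind == kind_f, "both trees must have the same shape"
        if kind == "layout":
            # Simplification: layout holding a comment is not touched at all.
            if old != new and old.strip() == "":
                edits.append((offset, len(old), new))
        elif kind == "ci":
            wanted = normalize_case(old, new, mode)
            if wanted != old:
                edits.append((offset, len(old), wanted))
        offset += len(old)
    return edits


original = [("ci", "BEGIN"), ("layout", "   "), ("tok", "x"),
            ("layout", " /* why */ "), ("ci", "End")]
formatted = [("ci", "begin"), ("layout", "\n  "), ("tok", "x"),
             ("layout", "\n"), ("ci", "end")]
print(layout_diff(original, formatted, mode="toLowercase"))
```

Because both inputs have the same shape, a single linear walk is enough; no list alignment is needed, which mirrors the remark that layoutDiff can do without the list algorithms of treeDiff.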

To obtain the rewritten parse tree for layoutDiff, we expect that formatted code in str form will have to be reparsed with the original grammar. For treeDiff, reparsing is not a good plan, since it would greatly reduce the efficiency of the algorithm. We expect treeDiff to be applied to a rewritten parse tree that was produced directly, for example by concrete pattern matching and substitution.

TODOs:

  • write tests, and fix initial issues triggered by the tests
  • test the example in the documentation
  • shorten the documentation
  • document the TextEdits module
  • short-circuit lexical identifiers (do not go deeper)
  • write indentation inheritance algorithm

codecov bot commented Sep 12, 2024

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 47%. Comparing base (2c284f1) to head (afbb402).
⚠️ Report is 85 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/org/rascalmpl/types/NonTerminalType.java | 50% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##              main   #2031   +/-   ##
=======================================
  Coverage       47%     47%           
- Complexity    6492    6505   +13     
=======================================
  Files          780     780           
  Lines        64429   64434    +5     
  Branches      9594    9596    +2     
=======================================
+ Hits         30282   30311   +29     
+ Misses       31856   31834   -22     
+ Partials      2291    2289    -2     

@jurgenvinju

@tvdstorm @DavyLandman I don't have the energy to finish this now, so I'm parking it here until I do. I wanted you to know it exists, because low-fidelity rewrites are a common issue we have to solve and because refactorings and quick-fixes in VS Code are now at our fingertips.

@DavyLandman

Cool stuff 👍

This finishes the complete algorithm for lists for the first time. The algorithm works in these steps:
* Trim equal elements from the head and the tail of both lists
* Detect common edits to lists with fast list patterns; this is an optional optimization
* Find the largest common sublist and split both lists into three parts: two different prefixes, two equal middle parts and two different suffixes. Recurse on the prefixes and the suffixes and concatenate their edit lists.
* Finally we end up with two empty lists or two lists without common elements; we collect the differences of each element position pairwise. Lists that became shorter get one additional edit to cut off the list, while lists that became longer get one additional edit to add the new elements. The new elements inherit indentation from the pre-existing elements.
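
The steps above can be sketched roughly as follows. This is a simplified Python illustration, not the Rascal code: it works on plain lists of comparable elements, reports edits by element index instead of source offsets, skips the optional fast-pattern step, and ignores separators and indentation inheritance. The names `largest_common_sublist` and `list_diff` and the edit tuples are hypothetical.

```python
def largest_common_sublist(old, new):
    """Return (start in old, start in new, length) of the largest equal run."""
    best = (0, 0, 0)
    prev = [0] * (len(new) + 1)
    for i in range(1, len(old) + 1):
        cur = [0] * (len(new) + 1)
        for j in range(1, len(new) + 1):
            if old[i - 1] == new[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def list_diff(old, new, base=0):
    """Edits as ("replace", i, x), ("delete", i, j) or ("insert", i, xs)."""
    # Step 1: trim equal elements from the head and the tail.
    while old and new and old[0] == new[0]:
        old, new, base = old[1:], new[1:], base + 1
    while old and new and old[-1] == new[-1]:
        old, new = old[:-1], new[:-1]
    if not old and not new:
        return []
    # Step 3: split around the largest common sublist and recurse on both sides.
    i, j, n = largest_common_sublist(old, new)
    if n > 0:
        return (list_diff(old[:i], new[:j], base)
                + list_diff(old[i + n:], new[j + n:], base + i + n))
    # Step 4: no common elements are left, so every position pair differs.
    edits = [("replace", base + k, b) for k, (_, b) in enumerate(zip(old, new))]
    if len(old) > len(new):    # list became shorter: cut off the rest
        edits.append(("delete", base + len(new), base + len(old)))
    elif len(new) > len(old):  # list became longer: add the new elements
        edits.append(("insert", base + len(old), new[len(old):]))
    return edits


print(list_diff(["a", "b", "c", "d", "e"], ["a", "x", "c", "d", "y", "z"]))
```

In the printed example, the shared `c, d` run anchors the split, so `b` and `e` are each replaced in place and only `z` is inserted at the end.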

For these changes additional tests still must be added later.
@jurgenvinju jurgenvinju marked this pull request as ready for review August 11, 2025 19:42
jurgenvinju commented Aug 11, 2025

The static checker chokes on the new code in ParseTree.rsc:

Error: [ERROR]   RunTests.onlyChangedModulesAreReChecked1: <18024,1205> » Throw /home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/compiler/lang/rascalcore/check/tests/StaticTestingUtils.rsc:168,26: "{error(\"Invalid type: expected node, ADT, or concrete syntax types, found `![]`\",|tmp:///rascal/src/org/rascalmpl/library/ParseTree.rsc|(47720,84,\<972,10\>,\<972,94\>),fixes=[])}"
Error: [ERROR]   RunTests.onlyTouchedModulesAreReChecked1: <15213,521> » Throw /home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/compiler/lang/rascalcore/check/tests/StaticTestingUtils.rsc:168,26: "{error(\"Invalid type: expected node, ADT, or concrete syntax types, found `![]`\",|tmp:///rascal/src/org/rascalmpl/library/ParseTree.rsc|(47720,84,\<972,10\>,\<972,94\>),fixes=[])}"

@jurgenvinju jurgenvinju changed the title from "experimental HiFi tree diff algorithm for use with quick-fixes and refactoring commands in the IDE" to "two HiFi tree diff algorithm for use after source-to-source transformations (quick-fix, refactoring, formatting) in the IDE" Aug 11, 2025
@jurgenvinju jurgenvinju changed the title from "two HiFi tree diff algorithm for use after source-to-source transformations (quick-fix, refactoring, formatting) in the IDE" to "two HiFi tree diff algorithms for after source-to-source transformations (quick-fix, refactoring, formatting) in the IDE" Aug 11, 2025
@jurgenvinju jurgenvinju merged commit 9d96167 into main Aug 14, 2025
8 checks passed
@jurgenvinju jurgenvinju deleted the hifi-tree-diff branch August 14, 2025 11:37