two HiFi tree diff algorithms for after source-to-source transformations (quick-fix, refactoring, formatting) in the IDE #2031
…factoring commands in the IDE
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2031 +/- ##
=======================================
Coverage 47% 47%
- Complexity 6492 6505 +13
=======================================
Files 780 780
Lines 64429 64434 +5
Branches 9594 9596 +2
=======================================
+ Hits 30282 30311 +29
+ Misses 31856 31834 -22
+ Partials 2291 2289 -2
@tvdstorm @DavyLandman I don't have the energy to finish this now, so I'm parking it here until I do. I wanted you to know it exists, because low-fidelity rewrites are a common issue we have to solve and because refactoring and quick-fixes in VS Code are now at our fingertips.
Cool stuff 👍
…se indentation in O(1)
This finishes the complete algorithm for lists for the first time. The algorithm works in these steps:
* Trim equal elements from the head and the tail of both lists.
* Detect common edits to lists with fast list patterns; this is an optional optimization.
* Find the longest common sublist and split both lists into three parts: two different prefixes, two equal middle parts and two different postfixes. Recurse on the prefixes and the postfixes and concatenate their edit lists.
* Finally we end up with two empty lists or two lists without common elements; we collect the differences of each element position pairwise. Lists that became shorter get an additional edit to cut off the list, while lists that became longer get one additional edit to add the new elements. The new elements inherit indentation from the pre-existing elements.
For these changes additional tests still must be added later.
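The steps above can be illustrated with a minimal Python sketch. It is not the Rascal code from this PR: it works on plain lists of strings instead of parse trees, skips the optional fast list patterns, ignores separators and indentation, and uses a hypothetical edit shape `(start, end, replacement)` over indices of the original list. The helper names `longest_common_sublist`, `list_diff` and `apply_edits` are made up for the illustration.

```python
from typing import List, Tuple

# Hypothetical edit shape: replace orig[start:end] with `replacement`.
# The PR itself emits Rascal TextEdits over source locations instead.
Edit = Tuple[int, int, List[str]]


def longest_common_sublist(a: List[str], b: List[str]) -> Tuple[int, int, int]:
    """Return (i, j, n) such that a[i:i+n] == b[j:j+n] with n maximal (n == 0 if none)."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # prev[j]: common run length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def list_diff(orig: List[str], new: List[str], base: int = 0) -> List[Edit]:
    # Step 1: trim equal elements from the head and the tail.
    head = 0
    while head < len(orig) and head < len(new) and orig[head] == new[head]:
        head += 1
    tail = 0
    while (tail < len(orig) - head and tail < len(new) - head
           and orig[len(orig) - 1 - tail] == new[len(new) - 1 - tail]):
        tail += 1
    o = orig[head:len(orig) - tail]
    n = new[head:len(new) - tail]
    base += head

    # Step 2 (optional fast patterns for common edits) is omitted in this sketch.
    if not o and not n:
        return []

    # Step 3: split around the longest common sublist and recurse on both sides.
    i, j, size = longest_common_sublist(o, n)
    if size > 0:
        return (list_diff(o[:i], n[:j], base)
                + list_diff(o[i + size:], n[j + size:], base + i + size))

    # Step 4: no common elements remain; diff positions pairwise.
    common = min(len(o), len(n))
    edits: List[Edit] = [(base + k, base + k + 1, [n[k]]) for k in range(common)]
    if len(o) > common:      # list became shorter: one edit cuts off the rest
        edits.append((base + common, base + len(o), []))
    elif len(n) > common:    # list became longer: one edit adds the new elements
        edits.append((base + common, base + common, n[common:]))
    return edits


def apply_edits(orig: List[str], edits: List[Edit]) -> List[str]:
    """Apply non-overlapping edits back to front so earlier indices stay valid."""
    result = list(orig)
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        result[start:end] = replacement
    return result


orig = ["a", "b", "c", "d", "e"]
new = ["a", "x", "c", "d", "y", "z"]
edits = list_diff(orig, new)
print(edits)  # [(1, 2, ['x']), (4, 5, ['y']), (5, 5, ['z'])]
print(apply_edits(orig, edits) == new)  # True
```

The unchanged middle part `["c", "d"]` never appears in an edit, which is the property that lets the real algorithm leave the layout and comments around unchanged elements alone.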
The static checker chokes on the new code in ParseTree.rsc:
… of toBox. This then makes the treatment of files with comments much easier, because every box will always have the same predictable number of children.
Two algorithms are added to form the keystone elements in hifi source-to-source transformation pipelines written in Rascal. They both transform a coupled pair of parse trees <original, rewritten> to a list[TextEdit]. The resulting TextEdits are ready for use in VS Code extensions via LSP features in util::LanguageServer and util::IDEServices.
There are two complementary algorithms here:
* treeDiff: works on a pair of (original, rewritten) parse trees and detects the minimal differences without too much damage to whitespace and source code comments.
* layoutDiff: works on a pair of (original, formatted) parse trees, where original := formatted must be true, and derives edits to whitespace only, while trying to preserve source code comments and case-insensitive literals.
The treeDiff single-pass parse tree recursion maps sub-tree and sub-list differences to textual differences in a special way. It relies on the semantics of special parse tree non-terminals (literals, lexicals, separators) to ignore certain superfluous changes made in the rewritten tree, focusing only on essential syntactic changes that do change the operational semantics of the source code. As a result the edited source text retains more of its original layout (syntax which does not affect operational semantics), including indentation and comments. Compared to yielding the rewritten parse tree to a string and replacing the entire file, treeDiff preserves the original text with much higher fidelity. treeDiff especially shines when it comes to changes to concrete separated lists: because it understands the semantics of separated lists exactly, the user does not have to think about separators at all when rewriting parse trees.
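To illustrate the fidelity argument, here is a small Python sketch that contrasts one targeted text edit with replacing the whole file. It does not use the Rascal TextEdit type from this PR: the `TextEdit` class (offset, length, replacement) and the `apply_text_edits` helper are hypothetical stand-ins, and the "unparsed" string is hand-written to mimic a rewritten tree yielded without its original layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextEdit:
    """Hypothetical stand-in: replace `length` characters at `offset`."""
    offset: int
    length: int
    replacement: str


def apply_text_edits(source: str, edits: List[TextEdit]) -> str:
    # Apply back to front so that earlier offsets are not shifted by later edits.
    for e in sorted(edits, key=lambda e: e.offset, reverse=True):
        source = source[:e.offset] + e.replacement + source[e.offset + e.length:]
    return source


original = "x = 1 ;   // keep me\ny   =   2 ;\n"

# Assumed scenario: the rewrite only changed the literal 2 into 3.
targeted = apply_text_edits(original, [TextEdit(original.index("2"), 1, "3")])

# Whole-file replacement with a normalized yield loses comment and spacing.
whole_file = "x = 1;\ny = 3;\n"

print(repr(targeted))
print(repr(whole_file))
```

The targeted result keeps the comment and the unusual spacing, while the whole-file replacement loses both; producing edits of the first kind is what treeDiff aims for.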
The complementary layoutDiff algorithm is very similar, but focuses on other parts of the subtrees: the layout nodes. Also, because original := formatted, there is no need for complex list algorithms as in treeDiff. layoutDiff can also infer changes for normalizing case-insensitive literals in five modes: as-original, as-rewritten, capitalize, toUppercase and toLowercase. For layoutDiff, the case of the characters of case-insensitive literals is seen as a kind of "layout", as it does not influence the operational semantics of the code, while it does influence readability and comprehensibility of the code for a human.
To obtain the rewritten parse tree for layoutDiff, we expect that formatted code in str form will have to be reparsed with the original grammar. For treeDiff, reparsing is not a good plan, since it would greatly reduce the efficiency of the algorithm. We expect treeDiff to be applied to a rewritten parse tree, obtained using concrete pattern matching and substitution for example.
TODOs: