Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use xdiff to implement internal diff algorithm #2732

Closed
wants to merge 17 commits into from

Conversation

chrisbra
Copy link
Member

Okay, this is very highly experimental and still work in progress. But it might be good enough to share what I have, because it seems to pass my very basic tests and I do not know how much time I have to work on it in the future.

This PR imports the libxdiff from the git source tree to implement an internal diff parser (this is from git-2.17.0-rc0, plus some slight style changes [added a couple of casts]). This will make creating diffs a lot faster, since we do not need to shell out to parse the output of diff foobar foobar.new.

Currently, it runs the xdiff algorithm on the already written temp files tmp_new and tmp_orig and write the result to tmp_diff. We can probably skip writing temp files for creating diffs at all, but I am currently not sure how to do it. So there is some room for even more performance improvements later.

In addition, by using the git diff included xdiff, we can in the future support several improvements to the diff algorithm, e.g. implement patience and histogram diff algorithm, without much fuzz. Currently this is not supported, but it should be easily possible. Also the new indent heuristics from git which is the default starting with 2.14 should be possible with some kind of a switch.

To use the new algorithm, make sure to :set diffopt+=internal. As said before, this is highly experimental and might crash :(

Also note, I have basically no clue of how to import that part into the Makefile, so I changed src/Makefile manually and it might be cumbersome. But at least, it seems to work :)

BTW: feedback welcome, however, I am not sure how much time in the future I have to work on it. So I leave this here and hopefully this will make it easier to implement an internal diff in Vim.

However, note, I did not yet check which files from git/git source xdiff directory are actually needed. Currently, it will compile a couple of files, which might not (all) be necessary. So it imports a couple of more files than currently needed. But it will enable additional diff options later.

However, once we have this change in place, it should be straight forward to make further adjustments as needed (e.g. do not write temp files at all to speed up diffing even further).

Also this has the further advantage to support unified diffs, when using a custom diffexpression. Currently only ed like style diffs are supported, which kind of sucks and prevents us from simply using :set diffexpr=git diff, because unified style diffs seems to be what most tools nowadays understand and ed like style diffs seem kind of traditional and even unsupported.

Note, there might be a some disadvantages with using libxdiff:

  • the libxdiff algorithm is different from GNU diff. However, I suppose by using xdiff from git,
    we got a lot of testing. (git has imported this library around 2006, so there are at least 10
    years of thoroughly testing).

@codecov-io
Copy link

codecov-io commented Mar 21, 2018

Codecov Report

Merging #2732 into master will decrease coverage by 0.81%.
The diff coverage is 4.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2732      +/-   ##
==========================================
- Coverage      75%   74.19%   -0.82%     
==========================================
  Files          92       98       +6     
  Lines      134149   135619    +1470     
==========================================
+ Hits       100614   100618       +4     
- Misses      33535    35001    +1466
Impacted Files Coverage Δ
src/xdiff/xprepare.c 0% <0%> (ø)
src/xdiff/xdiffi.c 0% <0%> (ø)
src/xdiff/xpatience.c 0% <0%> (ø)
src/xdiff/xemit.c 0% <0%> (ø)
src/xdiff/xhistogram.c 0% <0%> (ø)
src/xdiff/xutils.c 0% <0%> (ø)
src/diff.c 75.89% <37.29%> (-6.58%) ⬇️
src/gui_beval.c 62.23% <0%> (-0.86%) ⬇️
src/if_xcmdsrv.c 83.63% <0%> (-0.36%) ⬇️
src/ex_cmds.c 79.39% <0%> (-0.23%) ⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b571c63...70f99ef. Read the comment docs.

@chrisbra
Copy link
Member Author

I have disabled the libxdiff feature on Windows, because the windows Makefile has not been changed yet to include the xdiff source and there were some issues with mch_stat where I am not sure how to fix them for Windows.

Also I have not yet included any tests, mainly because I am not sure how to test the resulting diff. Perhaps it would be possible to run Vim recursively inside a terminal window in diff mode and snapshot the resulting diff using term_dumpwrite(). But I haven't looked at it how feasible this would be.

@LemonBoy
Copy link
Contributor

Nice, I wrote a small tool using xdiff to have nice/fast diffs and avoid messing with the vim source code.

@chrisbra
Copy link
Member Author

Ha, I like the name ;)
BTW: I have implemented a vim-script solution to translate unified diff output to ed style diffs in vim-diff-enhanced

@chrisbra
Copy link
Member Author

I think it is now in a shape, that is safe to include. I can squash it into a single commit if needed. Once this prooved to be stable enough, it might even allow us to get rid of distributing diff together with Vim.

I added a couple of tests, implemented the missing 'diffopt' values and made it work on Windows. Tests pass except for some slight differences in the term_dump() function. This can be seen here:

https://api.travis-ci.org/v3/job/357269023/log.txt
https://api.travis-ci.org/v3/job/357269024/log.txt

The only difference is this part:

@@ -17,4 +17,4 @@
 | +0#0000e05#a8a8a8255@1|~+0#4040ff13#ffffff0| @33||+1#0000000&| +0#0000e05#a8a8a8255@1|~+0#4040ff13#ffffff0| @33
 | +0#0000e05#a8a8a8255@1|~+0#4040ff13#ffffff0| @33||+1#0000000&| +0#0000e05#a8a8a8255@1|~+0#4040ff13#ffffff0| @33
 |X+3#0000000&|f|i|l|e|1| @12|1|,|1| @11|A|l@1| |X+1&&|f|i|l|e|2| @12|1|,|1| @11|A|l@1
-|:+0&&> @73
+|:+0&&> | @72

which I think is a bug in the term_dump() function and I have already reported at vim-dev. Also there was one ASAN error reported here:
https://api.travis-ci.org/v3/job/357269032/log.txt

Which also looks like a bug in the f_term_dumpwrite function. And finally, it looks like there was one out of memory error on travis:
https://api.travis-ci.org/v3/job/357337067/log.txt

There is still room for improvement though. For one thing one could avoid the whole I/O overhead by making it use without using temporary files. But I have the feeling that is something I cannot easily implement and might take some additional time and effort, which I do not have currently.

Second, the :diffpatch function could make use of the xdiff library. However it looks like currently the xdl_patch() function has been removed from the xdiff library by the git folks. Anyhow, I have the feeling, that the :diffpatch command is actually not much used, since it currently requires the patch program available and that one is not even bundled with Vim. So this command already fails miserably on Windows with a command not found error :(

And finally it might make sense to allow context lines when parsing the unified diff. This would mean, one needs to actually adjust the hunk addresses (lnum_orig, count_orig, lnum_new and count_new) as one parses the file. I have not done so yet, so Vim always expects a unified diff with zero context lines. There won't be an error, just the diff will be rendered wrongly.

And finally here are two screencasts how to make use of the new diff algorithms in Vim that comes with xdiff:

Kudos goes to the git developers for developing those parts.

@nuko8
Copy link

nuko8 commented Mar 23, 2018

On my Mac, the build was done successfully, and the resulting Vim worked fine, just the same way as the screencasts are showing.

I have used my own Lua script to have Vim accept the output of git diff --no-index. I found that the xdiff of this PR gives just the same results that the script does, but I also found that the former gives a far better user experience than the latter: It's free from severe flicker or hiccups such as incompletely-finished-redrawing. In addition to making some nice diff algorithms available to Vim, I think this PR succeeded in showing the advantage of having a builtin xdiff.

As to the screen dump test, I got

Test results:


From test_diffmode.vim:
Found errors in Test_diff_screen():
function RunTheTest[38]..Test_diff_screen[14]..VerifyScreenDump line 14: See dump file difference: call term_dumpdiff("Test_diff_01.dump.failed", "dumps/Test_diff_01.dump")
function RunTheTest[38]..Test_diff_screen[14]..VerifyScreenDump line 16: --- dumps/Test_diff_01.dump       2018-03-23 21:31:49.000000000 +0900
function RunTheTest[38]..Test_diff_screen[14]..VerifyScreenDump line 16: +++ Test_diff_01.dump.failed      2018-03-23 21:41:51.000000000 +0900
function RunTheTest[38]..Test_diff_screen[14]..VerifyScreenDump line 16: @@ -17,4 +17,4 @@
function RunTheTest[38]..Test_diff_screen[14]..VerifyScreenDump line 16:  | +0#0000e05#a8a8a8255@1|~+0#4040ff13#ffffff0| @33||+1#0000000&| +0#0000e05#a8a8a8255@1|~+0#4040ff13#ffffff0| @33
function RunTheTest[38]..Test_diff_screen[14]..VerifyScreenDump line 16:  | +0#0000e05#a8a8a8255@1|~+0#4040ff13#ffffff0| @33||+1#0000000&| +0#0000e05#a8a8a8255@1|~+0#4040ff13#ffffff0| @33
function RunTheTest[38]..Test_diff_screen[14]..VerifyScreenDump line 16:  |X+3#0000000&|f|i|l|e|1| @12|1|,|1| @11|A|l@1| |X+1&&|f|i|l|e|2| @12|1|,|1| @11|A|l@1
function RunTheTest[38]..Test_diff_screen[14]..VerifyScreenDump line 16: -|:+0&&> @73
function RunTheTest[38]..Test_diff_screen[14]..VerifyScreenDump line 16: +|:+0&&> | @72
TEST FAILURE

Also, when I tried to compile the patched source with -std=c89, I got

$ CFLAGS='-std=c89' ./configure && make
...
gcc -c -I. -Iproto -DHAVE_CONFIG_H   -DMACOS_X -DMACOS_X_DARWIN  -std=c89 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1        -o objects/xdiffi.o xdiff/xdiffi.c
xdiff/xdiffi.c:733:8: error: unknown type name 'inline'
static inline int group_next(xdfile_t *xdf, struct xdlgroup *g)
       ^
xdiff/xdiffi.c:733:15: error: expected identifier or '('
static inline int group_next(xdfile_t *xdf, struct xdlgroup *g)
              ^
xdiff/xdiffi.c:749:8: error: unknown type name 'inline'
static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g)
       ^
xdiff/xdiffi.c:749:15: error: expected identifier or '('
static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g)
              ^
4 errors generated.
make: *** [objects/xdiffi.o] Error 1

@chrisbra
Copy link
Member Author

The screen dump diff looks similar to the one above from the CI. I am not sure, but I think the difference is only that for one dump there a 73 spaces while the result of the test was 1 single space followed by 72 spaces. If you load it into Vim using call term_dumpdiff("Test_diff_01.dump.failed", "dumps/Test_diff_01.dump") you'll probably not notice any difference.

regarding the C89 issue, I have seen a similar error on Windows and have therefore disabled the inline and attribute flags using 0c40761. Perhaps those should be done unconditionally? (I didn't do it, because I didn't want to change too much of the upstream source. Or is there a similar compile property available to check against when compiling?

@justinmk
Copy link
Contributor

This PR imports the libxdiff from the git source tree

Is libxdiff not available for linking instead of inlining the entire thing (GPL license) into the Vim tree?

@nuko8
Copy link

nuko8 commented Mar 23, 2018

@chrisbra

If you load it into Vim using call term_dumpdiff("Test_diff_01.dump.failed", "dumps/Test_diff_01.dump") you'll probably not notice any difference.

Indeed, I can't distinguish one from another. For convenience, I'm uploading a screenshot obtained by the suggested command:

screen shot 2018-03-24 at 00 49 00

regarding the C89 issue, I have seen a similar error on Windows and have therefore disabled the inline and attribute flags using 0c40761. Perhaps those should be done unconditionally?

I think that would be the easiest, practical way to go unless avoiding the inline ends up a noticeable deterioration in performance.

That said, I also understand

I didn't want to change too much of the upstream source.

as, IIRC, an option once introduced to git diff was abolished in a few month last year. The development has been so active even these days. Too much deviation from the original source surely makes it harder for us to catch up with improvement done by the upstream in the future...

Or is there a similar compile property available to check against when compiling?

I'm afraid there's not.

@chrisbra
Copy link
Member Author

Is libxdiff not available for linking instead of inlining the entire thing (GPL license) into the Vim tree?

As far as I can see, there is no libxdiff-dev available in Debian. Not even the original version. Also git seems to inline it (using a static library, not that it matters much in practice).

@chrisbra chrisbra force-pushed the internal-diff branch 3 times, most recently from 06c9b5e to 40a8078 Compare March 23, 2018 22:07
@chrisbra
Copy link
Member Author

[disable inline...]

I think that would be the easiest, practical way to go unless avoiding the inline ends up a noticeable deterioration in performance.

I tried and failed miserably.
https://ci.appveyor.com/project/chrisbra/vim-ch0ci/build/job/g1430v7vw08q6y3y
https://ci.appveyor.com/project/chrisbra/vim-ch0ci/build/job/g1430v7vw08q6y3y

duplicate symbol _isprint in:
    objects/xdiffi.o
    objects/xemit.o
duplicate symbol _digittoint in:
    objects/xdiffi.o
    objects/xemit.o
duplicate symbol _isxdigit in:
    objects/xdiffi.o
    objects/xemit.o
duplicate symbol _isdigit in:
    objects/xdiffi.o
    objects/xemit.o

On Mac and on Windows with Mingw. Wow, I did not expect that. So I am rolling it back for now.

@chrisbra
Copy link
Member Author

rebased and added another test for 'diffopt+=iwhite'

@justinmk
Copy link
Contributor

justinmk commented Apr 9, 2018

What are the implications of pasting GPL-licensed code into Vim's source?

@tonymec
Copy link

tonymec commented Apr 9, 2018

@justinmk : from a page or so below :help license:

- According to Richard Stallman the Vim license is GNU GPL compatible.
  A few minor changes have been made since he checked it, but that should not
  make a difference.

- If you link Vim with a library that goes under the GNU GPL, this limits
  further distribution to the GNU GPL.  Also when you didn't actually change
  anything in Vim.

- Once a change is included that goes under the GNU GPL, this forces all
  further changes to also be made under the GNU GPL or a compatible license.

@micbou
Copy link

micbou commented Apr 9, 2018

The library is LGPL-licensed so including the library doesn't require changing the Vim license to GPL. There are other requirements though like mentioning that the library is used and covered by the LGPL (this could be added to the :version output for instance), and adding a copy of the GPL and LGPL to the project.

The Vim source code is compatible with the GNU GPL so it should be
possible to link in the additional libxdiff library which is distributed
under the GNU LGPL license.

Since the xdiff library is distributed under LGPL 2.1 license, add the
license file to the imported source code. Note, that each file has its
own copyright section which also mentions who is the author of the code.

http://www.xmailserver.org/xdiff-lib.html
https://www.gnu.org/copyleft/lesser.html

Note, that this includes the xdiff library as it is distributed by the
git project (https://git-scm.com/), which is heavily modified from the
upstream project (http://www.xmailserver.org/xdiff-lib.html). So it does
not contain the complete upstream source but only the needed parts for
git (which happen to be also enough for Vim).
This makes it possible to either parse an "ed" style diff (which is the
traditional diff) or parse the unified diff.

The unified diff is created by the xdiff library. Note: it is highly
experimental. Note: it only works, if there are zero context lines.

Still this allows a 'diffexpr' to return a unified diff and parse it
correctly, as long as it is in a unified diff with zero context lines.
include Vims auto/config.h  header and redefine UNUSED according to the
HAVE_ATTRIBUTE_UNUSED property.
This is not stricktly required for the xdiff feature but an enhancement
to the test suite in general. By default, whenever a SwapFile Exists
message is shown in the Test suite, the Test will stop waiting
indefinitely until a key is pressed. However this will cause an
indefinite hang on the CI infrastructure which will then eventually time
out.

So avoid this at all by globally setting `:set shortmess+=A`. Since one
test however does check the SwapFileExist warning, it needs to be
adjusted to remove the 'A' flag from the shortmess setting.
@chrisbra
Copy link
Member Author

Thanks for those comments. I added a commit mentioning the license of libxdiff and included the LGPL for further distribution. I leave the decision to mention the license in the :version output to Bram.

I also added a second commit to make the tests work better by disableing the SwapFile Exists warning.

There is still an ASAN failure which looks like term->tl_vterm-> screen gets somehow corrupted causing a segfault. Unfortunately, I can only reproduce this on the CI infrastructure and not locally. I don't actually know how to debug this.

@brammool
Copy link
Contributor

brammool commented Apr 10, 2018 via email

@chdiza
Copy link

chdiza commented Apr 10, 2018

Vim doesn't even put it's own licensing info in the output of :version. I have no idea why we would put anything else's licensing info in there.

@micbou
Copy link

micbou commented Apr 10, 2018

Vim doesn't even put it's own licensing info in the output of :version. I have no idea why we would put anything else's licensing info in there.

I thought the license was mentioned there as a lot of tools are displaying that kind of info when running them with --version on the command line. Anyway, this was just a suggestion of where it could be mentioned. Adding it to :h license should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants