Text missing from HTML diff #80

Open
kwyntes opened this issue Nov 18, 2023 · 0 comments

kwyntes commented Nov 18, 2023

When using the following test HTML files as input...

$ cat old.html
<html>
        <body>
                some <div>text and more</div> text
        </body>
</html>

$ cat new.html
<html>
        <body>
                some <div class='red'>text</div> and more <strong>text</strong>
        </body>
</html>

$ graphtage old.html new.html
<html>
        <body>
                some <̟d̟i̟v̟ ̟c̟l̟a̟s̟s̟=̟"̟r̟e̟d̟"̟>̟t̟e̟x̟t̟<̟/̟d̟i̟v̟>̟
        <̟s̟t̟r̟o̟n̟g̟>̟t̟e̟x̟t̟<̟/̟s̟t̟r̟o̟n̟g̟>̟
        <̶d̶i̶v̶>̶t̶e̶x̶t̶ ̶a̶n̶d̶ ̶m̶o̶r̶e̶<̶/̶d̶i̶v̶>̶
    </body>
</html>


..., as you can see, the text "and more" is missing from the diff generated by graphtage.

I've tried some other diff tools, and none of them processed these two files correctly either (many use the same core algorithm, I suppose). Is there some general issue with processing text that is not enclosed in its own tag? Here, "and more" sits between two elements and has no enclosing tag other than the parent <body>.
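
To make "text not enclosed in tags" concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is only an illustration of how one common tree model stores such text; I am not claiming this is how graphtage parses HTML internally. The helper name text_pieces and the whitespace-stripped OLD/NEW strings are my own, made up for this example.

```python
import xml.etree.ElementTree as ET

# The two test files from above, with indentation and newlines removed
# so that the text values are easy to read.
OLD = "<html><body>some <div>text and more</div> text</body></html>"
NEW = ("<html><body>some <div class='red'>text</div>"
       " and more <strong>text</strong></body></html>")


def text_pieces(markup):
    """Return (tag, text, tail) for <body> and for each direct child of <body>.

    In ElementTree, `text` is the text right after an element's opening tag,
    and `tail` is the text that follows its closing tag, up to the next tag.
    """
    body = ET.fromstring(markup).find("body")
    rows = [(body.tag, body.text, body.tail)]
    rows += [(child.tag, child.text, child.tail) for child in body]
    return rows


if __name__ == "__main__":
    for name, markup in (("old.html", OLD), ("new.html", NEW)):
        print(name)
        for row in text_pieces(markup):
            print("   ", row)
```

In this model, "and more" in new.html is not a child node at all: it is the tail of the <div> element. A diff that compares only tags, attributes, and element text, without also comparing tails, would have nowhere to show it, which would match the missing text in the output above. Whether that is the actual cause here is a guess on my part.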

I have also tried wrapping "and more" in a <p> tag in new.html, which resulted in this mess:

$ graphtage old.html new.html
<html>
        <body>
                some <̟d̟i̟v̟ ̟c̟l̟a̟s̟s̟=̟"̟r̟e̟d̟"̟>̟t̟e̟x̟t̟<̟/̟d̟i̟v̟>̟
        <p̟d̶i̶v̶>t̶e̶x̶t̶ ̶and more</p̟d̶i̶v̶>
        <̟s̟t̟r̟o̟n̟g̟>̟t̟e̟x̟t̟<̟/̟s̟t̟r̟o̟n̟g̟>̟
    </body>
</html>


What's happening?
