Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix issue #115 #123

Merged
merged 4 commits into from
May 25, 2021
Merged

fix issue #115 #123

merged 4 commits into from
May 25, 2021

Conversation

nonara
Copy link
Collaborator

@nonara nonara commented May 22, 2021

Abstract

The following PR takes a holistic approach to mitigating issue #115 by implementing HTML-specific trim functionality, where necessary. Its aim is to provide a more accurate correction for the trim() issue by replicating trim, while explicitly preserving only single leading or trailing non-breaking whitespace, where present.

Background

In addition to the already corrected area (structuredText) a second affected area was found.

The purpose of HTMLElement#removeWhitespace is to remove unnecessary whitespace nodes and otherwise cleanup the text. To do this, it calls trim() on TextNode rawText. Unfortuntately, this is too greedy.

Trim rightly removes repeating spaces, tabs, and line-breaks, which are often simply junk from multi-line HTML formatting. Unfortunately, it also removes legitimate, single, non-breaking space at the beginning or end of the text in a TextNode.

Put more simply, it's overcorrecting in some scenarios. Here are some examples:

<p>Hello <em>world!</em></p>` <!-- Desired: 'Hello world!' Actual: 'Helloworld!` -->
<p>Hello\n\r\r<em>world </em>!</p>` <!-- Desired: 'Helloworld !' Actual: 'Helloworld!` -->
<p>Hello\n\r\t<em>world   </em>!</p>` <!-- Desired: 'Hello world !' Actual: 'Helloworld!` -->

Solution

I've replaced the two calls to trim() with a cached accessor on each TextNode. The accessor performs the same action as trim, with the exception that it will allow a single non-breaking space before and after the text, if one is present.

Associated Issues

#115
crosstype/node-html-markdown#9

@nonara nonara marked this pull request as draft May 22, 2021 22:32
@nonara nonara marked this pull request as ready for review May 22, 2021 23:18
@nonara
Copy link
Collaborator Author

nonara commented May 22, 2021

@taoqf Ready to go. Let me know if you have any questions.

@@ -260,7 +261,7 @@ export default class HTMLElement extends Node {
// Whitespace node, postponed output
currentBlock.prependWhitespace = true;
} else {
let text = node.text;
let text = (<TextNode>node).trimmedText;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reinstates the trim functionality (prior to 3.3.2) with our special trim, which should produce a cleaner output.

@taoqf
Copy link
Owner

taoqf commented May 25, 2021

So many thanks, I don't have much time on this rep recently, so I am going to merge your pr and publish another version now.

@nonara
Copy link
Collaborator Author

nonara commented May 25, 2021

No problem! I know how it is. Thanks for the quick merge!

@nonara nonara deleted the trim-fix branch May 25, 2021 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants