-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix!: TextNode text property is not decoded (fixes #134) #135
Conversation
All set for review @taoqf |
Thanks for your work. based on your pr, I digged it up. please check the code 840ffda if you have time. |
Hi. I appreciate your taking the time! I did find a few issues. Principally, according to spec An appealAs for this proposal, please consider the following:
Because of this, I would classify this as an error, as it's going against established convention setup for nodes as well as its documentation. It also effectively eliminates the purpose for having both properties. Put more simply, why should So, although it's technically a breaking change, it is one which makes the code do what the documentation and convention says. Issue with textContentThe issue with changing If you absolutely do not want to change the Next stepsI would advise removing your recent commit, for now, and I've given you access to this PR in case you need to make changes. I've made some corrections to my PR based on your commit. I've also added notes to make it easier to understand what I did. I should have done that before. I apologize. Please have a look over the comments in the |
@@ -458,7 +458,7 @@ export default class HTMLElement extends Node { | |||
if ((node as TextNode).isWhitespace) { | |||
return; | |||
} | |||
node.rawText = (<TextNode>node).trimmedText; | |||
node.rawText = (<TextNode>node).trimmedRawText; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are setting rawText
, we want to be certain that it uses the raw version
this._rawText = text; | ||
this._trimmedRawText = void 0; | ||
this._trimmedText = void 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reset caches when rawText changed (connected to previous note)
@@ -61,18 +60,50 @@ export default class TextNode extends Node { | |||
* @return {string} text content | |||
*/ | |||
public get text() { | |||
return this.rawText; | |||
return decode(this.rawText); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Standardize to convention and documentation
} | ||
|
||
/** | ||
* Detect if the node contains only white space. | ||
* @return {bool} | ||
* @return {boolean} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This produced an eslint error, so I updated it
textNode.rawText.should.eql(' 123 '); | ||
const textNode = new TextNode(' 123 '); | ||
textNode.rawText = textNode.trimmedRawText; | ||
textNode.rawText.should.eql(' 123 '); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this original test in a previous PR. We're altering it slightly to make sure we're using the proper rawText version of trimmed text. We also add in an   to ensure that it does not decode.
*/ | ||
public get isWhitespace() { | ||
return /^(\s| )*$/.test(this.rawText); | ||
} | ||
|
||
public toString() { | ||
return this.text; | ||
return this.rawText; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Necessary to ensure toString
does not decode (as it did not before)
|
||
const textNode = divNode.firstChild; | ||
textNode.rawText.should.eql(content); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test ultimately did not need to be rewritten, however, I left my change in because it is more complete and easier to read. I also renamed it so that it is easier to understand what's happening. If you'd prefer, it can be changed back.
|
||
const textNode = pNode.firstChild; | ||
textNode.text.should.eql(decodedText); | ||
textNode.rawText.should.eql(encodedText); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It made sense to keep this test under the encode/decode heading, as this is core functionality.
As a secondary suggestion, I recall your saying you don't have much time for this repo anymore. If you'd like help maintaining it, I'd be happy to come on board. I currently maintain several large, critical libraries in the parser/compiler space. Because we're looking at a major version bump, if it made you feel better about it, I could also go through and fix the active issues that are bugs, as well as clean up the tests and update them to use jest. That way we can test against uncompiled source. I also can be sure to maintain (and likely improve) performance time, as I do/did with node-html-markdown, obtaining 3x faster than the leading node compiler. I could also add github actions to automate testing in pull requests, pushes, etc. That can make reviewing PRs a lot easier. Let me know if any if that sounds good, otherwise, just getting this wrapped is fine. |
super(parentNode); | ||
this._rawText = rawText; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change allows us to automatically reset trimmed text caches on update. I added the caching in a previous PR. This change prevents a future bug report with existing code.
Thank you for all your work. |
Should I increase the major version? I have not publish new version yet. |
@taoqf You're very welcome! Thanks for addressing it so quickly! Yes, I would definitely publish as a new major version. I'd release as v4.0.0 and add a release with the following notes: Breaking Changes
|
Addressed
TextNode#Text
now is html-decodedHtmlNode#structuredText
now uses html-decoded textrawText
update (ensures up to date afterHtmlNode#removeWhitespace
is called)Notes
structuredText
should not preserve html entities, if I'm wrong, let me know and I'll adjust.text
instead of the properrawText
to get html entitiesAssociated Issue
crosstype/node-html-markdown#14