Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GFM strikethrough causes nested attention sequences to be considered just text data #84

Closed
rawilder opened this issue Oct 27, 2023 · 3 comments

Comments

@rawilder
Copy link
Contributor

Here is the test to show what I am talking about.

markdown from test ~~foo __*a*__~~ = foo a

assert_eq!(
    to_html_with_options("~~foo __*a*__~~", &Options::gfm())?,
    "<p><del>foo <strong><em>a</em></strong></del></p>",
    "should support emphasis within strong emphasis within strikethrough w/ `*` in `_`"
);

I've already written the code to fix this, but I also refactored portion of the code to help me understand exactly where I was getting it wrong, so I am opening an issue to share findings and discuss. I eventually landed on a reimplementation of the determining of an AttentionSequence's open/close state using something maybe too close to the GFM spec's wording.

The root cause of the above test failing was tildes not being considered punctuation. I added a kind_before_index which copied its sibling (that already considered ascii punctuation as valid, which includes tilde).

After this though, it caused some other tests to break. Fast forward a bit of debugging and I found that my implementation was actually technically correct but Github's UI is able to handle something that their own spec doesn't necessarily support, but was being tested for in this repo I imagine for the exact reason that Github's UI supports it.

markdown from test e ***~~xxx~~***yyy = e xxxyyy
other parsers (my ide, and some live example sites) were showing me: e ***xxx***yyy

assert_eq!(
    to_html_with_options("e ***~~xxx~~***yyy", &Options::gfm())?,
    "<p>e <em><strong><del>xxx</del></strong></em>yyy</p>",
    "interplay"
);

Now I went through my new implementation to see if I could get this test to pass while maintaining the test at the top. I tweaked one of the refactor's new functions, is_right_flanking_delimiter_run, to treat preceding tildes as Other, else the new flow that includes ascii punctuation.

Happy to open the PR if asked, just didn't want to surprise anyone with an out of the blue PR.

@rawilder rawilder changed the title GFM strikethrough causes nested attention sequences to be ignored GFM strikethrough causes nested attention sequences to be considered just text data Oct 27, 2023
@wooorm
Copy link
Owner

wooorm commented Nov 6, 2023

Hey! Thanks for your patience!

I eventually landed on a reimplementation of the determining of an AttentionSequence's open/close state using something maybe too close to the GFM spec's wording.

This might, or might not, be useful here!
Does it pass all the tests?

In JS, here’s how it works: https://github.com/micromark/micromark/blob/929275e2ccdfc8fd54adb1e1da611020600cc951/packages/micromark-core-commonmark/dev/lib/attention.js#L261-L268
I think I implemented it here a bit differently?
Either because I figured out a smarter way.
Or perhaps indeed because I missed a bug—this bug.

Github's UI is able to handle something that their own spec doesn't necessarily support, but was being tested for in this repo I imagine for the exact reason that Github's UI supports it.

Yes, CM and GFM are specs that are super limited.
As in, pretty useless.

markdown-rs and micromark are reverse engineered with tons of things checked against their actual implementation.

So, which way to go here, depends more on how the CM dingus works and GH works here in comments/readmes, then on what their specs say!

@rawilder
Copy link
Contributor Author

With that context then, I'll open the simpler PR which simply addresses the bug rather than the refactor that makes it read more like the spec.

@rawilder
Copy link
Contributor Author

#88

@wooorm wooorm closed this as completed in f2c94af Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants