GFM strikethrough causes nested attention sequences to be considered just text data #84

rawilder · 2023-10-27T07:52:35Z

Here is the test to show what I am talking about.

markdown from test ~~foo __*a*__~~ = ~~foo a~~

assert_eq!(
    to_html_with_options("~~foo __*a*__~~", &Options::gfm())?,
    "<p><del>foo <strong><em>a</em></strong></del></p>",
    "should support emphasis within strong emphasis within strikethrough w/ `*` in `_`"
);

I've already written the code to fix this, but I also refactored portion of the code to help me understand exactly where I was getting it wrong, so I am opening an issue to share findings and discuss. I eventually landed on a reimplementation of the determining of an AttentionSequence's open/close state using something maybe too close to the GFM spec's wording.

The root cause of the above test failing was tildes not being considered punctuation. I added a kind_before_index which copied its sibling (that already considered ascii punctuation as valid, which includes tilde).

After this though, it caused some other tests to break. Fast forward a bit of debugging and I found that my implementation was actually technically correct but Github's UI is able to handle something that their own spec doesn't necessarily support, but was being tested for in this repo I imagine for the exact reason that Github's UI supports it.

markdown from test e ***~~xxx~~***yyy = e ~~xxx~~yyy
other parsers (my ide, and some live example sites) were showing me: e ***~~xxx~~***yyy

assert_eq!(
    to_html_with_options("e ***~~xxx~~***yyy", &Options::gfm())?,
    "<p>e <em><strong><del>xxx</del></strong></em>yyy</p>",
    "interplay"
);

Now I went through my new implementation to see if I could get this test to pass while maintaining the test at the top. I tweaked one of the refactor's new functions, is_right_flanking_delimiter_run, to treat preceding tildes as Other, else the new flow that includes ascii punctuation.

Happy to open the PR if asked, just didn't want to surprise anyone with an out of the blue PR.

The text was updated successfully, but these errors were encountered:

wooorm · 2023-11-06T14:23:32Z

Hey! Thanks for your patience!

I eventually landed on a reimplementation of the determining of an AttentionSequence's open/close state using something maybe too close to the GFM spec's wording.

This might, or might not, be useful here!
Does it pass all the tests?

In JS, here’s how it works: https://github.com/micromark/micromark/blob/929275e2ccdfc8fd54adb1e1da611020600cc951/packages/micromark-core-commonmark/dev/lib/attention.js#L261-L268
I think I implemented it here a bit differently?
Either because I figured out a smarter way.
Or perhaps indeed because I missed a bug—this bug.

Github's UI is able to handle something that their own spec doesn't necessarily support, but was being tested for in this repo I imagine for the exact reason that Github's UI supports it.

Yes, CM and GFM are specs that are super limited.
As in, pretty useless.

markdown-rs and micromark are reverse engineered with tons of things checked against their actual implementation.

So, which way to go here, depends more on how the CM dingus works and GH works here in comments/readmes, then on what their specs say!

rawilder · 2023-11-12T03:12:14Z

With that context then, I'll open the simpler PR which simply addresses the bug rather than the refactor that makes it read more like the spec.

rawilder · 2023-11-12T03:33:20Z

#88

rawilder changed the title ~~GFM strikethrough causes nested attention sequences to be ignored~~ GFM strikethrough causes nested attention sequences to be considered just text data Oct 27, 2023

wooorm closed this as completed in f2c94af Dec 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GFM strikethrough causes nested attention sequences to be considered just text data #84

GFM strikethrough causes nested attention sequences to be considered just text data #84

rawilder commented Oct 27, 2023

wooorm commented Nov 6, 2023

rawilder commented Nov 12, 2023

rawilder commented Nov 12, 2023

GFM strikethrough causes nested attention sequences to be considered just text data #84

GFM strikethrough causes nested attention sequences to be considered just text data #84

Comments

rawilder commented Oct 27, 2023

wooorm commented Nov 6, 2023

rawilder commented Nov 12, 2023

rawilder commented Nov 12, 2023