Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MarkdownFixer] Do not escape username with underscores in code or codeblocks #5785

Conversation

@JuanitoFatas
Copy link
Contributor

JuanitoFatas commented Jan 28, 2020

What type of PR is this? (check all applicable)

  • Refactor

  • Feature

  • Bug Fix

  • Optimization

  • Documentation Update

  • Tests are passing

    bundle exec rspec spec/labor/markdown_fixer_spec.rb spec/requests/api/v0/articles_spec.rb spec/services/articles/creator_spec.rb spec/services/mentions/create_all_spec.rb
  • I‘ve verified it works on development
    starts the server at local, sign in, new post, copy the following to preview and see if things work as expected. You should not see \ in code and codeblock, and underscored username renders correctly.

    Code for testing at development
    @_dev_
    
    ```
    @_dev_
    ```
    
    `@_dev_`
    
    @_dev_
    

Description

The underscored username got escaped in code and code blocks. We probably do not want that. The fix is only to escape underscored username if they are not in code or code blocks.

Other possible approaches:

  • Write regexp that only pull out underscored usernames not in code or code blocks
  • Turn markdown into HTML and use Nokogiri to work on text of ancestors not pre or code

Related Tickets & Documents

Fixes #5739.

Screenshots

Current on dev (2019.01.28, 26e4412)

スクリーンショット 2020-01-28 17 41 26

After this Pull Request (tested against 432885f)

スクリーンショット 2020-02-09 22 49 42

スクリーンショット 2020-02-09 22 36 07

Added to documentation?

  • docs.dev.to
  • readme
  • no documentation needed
  • code comments added

[optional] What gif best describes this PR or how it makes you feel?

@pr-triage pr-triage bot added the PR: unreviewed label Jan 28, 2020
@claassistantio

This comment has been minimized.

Copy link

claassistantio commented Jan 28, 2020

CLA assistant check
All committers have signed the CLA.

@JuanitoFatas JuanitoFatas force-pushed the JuanitoFatas:markdown_fixer/not-escape-for-code-or-codeblocks branch from d486f87 to 6591cb7 Jan 28, 2020
@rhymes rhymes changed the title [MarkdownFixer] Do not escape underscores in code or codeblocks [WIP] [MarkdownFixer] Do not escape underscores in code or codeblocks Jan 28, 2020
@pr-triage pr-triage bot removed the PR: unreviewed label Jan 28, 2020
@JuanitoFatas JuanitoFatas force-pushed the JuanitoFatas:markdown_fixer/not-escape-for-code-or-codeblocks branch from 6591cb7 to 5315367 Jan 28, 2020
@JuanitoFatas JuanitoFatas changed the title [WIP] [MarkdownFixer] Do not escape underscores in code or codeblocks [MarkdownFixer] Do not escape underscores in code or codeblocks Jan 28, 2020
@JuanitoFatas JuanitoFatas requested a review from rhymes Jan 28, 2020
@pr-triage pr-triage bot added the PR: unreviewed label Jan 28, 2020
@JuanitoFatas JuanitoFatas changed the title [MarkdownFixer] Do not escape underscores in code or codeblocks [WIP] [MarkdownFixer] Do not escape username with underscores in code or codeblocks Jan 28, 2020
@pr-triage pr-triage bot removed the PR: unreviewed label Jan 28, 2020
@JuanitoFatas JuanitoFatas changed the title [WIP] [MarkdownFixer] Do not escape username with underscores in code or codeblocks [MarkdownFixer] Do not escape username with underscores in code or codeblocks Jan 29, 2020
@pr-triage pr-triage bot added the PR: unreviewed label Jan 29, 2020
@rhymes rhymes requested review from maestromac and Zhao-Andy and removed request for rhymes Jan 30, 2020
Copy link
Contributor

Zhao-Andy left a comment

This looks good to me, thanks for the fix!

@Zhao-Andy

This comment has been minimized.

Copy link
Contributor

Zhao-Andy commented Feb 3, 2020

I assume this PR is good because all tests are passing now, but @JuanitoFatas let me know if I'm mistaken. :)

@JuanitoFatas JuanitoFatas force-pushed the JuanitoFatas:markdown_fixer/not-escape-for-code-or-codeblocks branch from 5315367 to 477f152 Feb 4, 2020
@JuanitoFatas

This comment has been minimized.

Copy link
Contributor Author

JuanitoFatas commented Feb 4, 2020

@Zhao-Andy

I assume this PR is good because all tests are passing now, but @JuanitoFatas let me know if I'm mistaken. :)

Rebased with master and verified it works

スクリーンショット 2020-02-04 09 27 52

and tests passed 🎉

スクリーンショット 2020-02-04 09 32 22

:shipit:

@mstruve mstruve requested a review from rhymes Feb 6, 2020
@mstruve mstruve requested a review from citizen428 Feb 6, 2020
Copy link
Contributor

rhymes left a comment

Thank you @JuanitoFatas, markdown parsing bugs are always complicated to fix, your solution seems very clever!

I left notes about adding comments to the new code as node traversal and regexp aren't usually super intuitive for future new readers.

ps. don't forget to prefix your branches with your name in the future, you can checkout the contributing guide for details.

app/labor/markdown_fixer.rb Outdated Show resolved Hide resolved
app/labor/markdown_fixer.rb Show resolved Hide resolved
@JuanitoFatas JuanitoFatas force-pushed the JuanitoFatas:markdown_fixer/not-escape-for-code-or-codeblocks branch from 477f152 to 432885f Feb 9, 2020
@JuanitoFatas

This comment has been minimized.

Copy link
Contributor Author

JuanitoFatas commented Feb 9, 2020

@rhymes

Thanks for the review. I‘ve added comments and changed the implementation to a less clever one 😉 . I also pulled the node traverser into a separate file, hope that make it easier to read. Verified this works and updated the screenshot in the Pull Request accordingly 🎉

@@ -47,11 +47,29 @@ def split_tags(markdown)
end

def underscores_in_usernames(markdown)
markdown.gsub(/(@)(_\w+)(_)/, '\1\\\\\2\\\\\3')
return markdown unless markdown.include?("@")

This comment has been minimized.

Copy link
@citizen428

citizen428 Feb 10, 2020

Contributor

This guard clause has to go through all of markdown to determine whether or not to return, so I don't think it saves us much work. If we skip this check we go through the document once, if we keep it we go through the document once + the percentage of text preceding the username.

This comment has been minimized.

Copy link
@JuanitoFatas

JuanitoFatas Feb 10, 2020

Author Contributor

There are number of tests that pass "", front matters, or text that do not have username at all (without @) to here. These make sense in real world. Hence checking for @ here to avoid doing work for irrelevant cases and all tests are happy.

This comment has been minimized.

Copy link
@citizen428

citizen428 Feb 11, 2020

Contributor

I understand the reason for the check, my only slight concern was this:

Case 1: markdowndoesn't contain an @: we're still going to compare every character in the document to @ for the include? check. I'd guess a majority of articles won't have usernames in them.

Case 2. markdown contains an @: we read markdown once until we find the @, then traverse again.

Maybe the guard clause should actually make use of the regular expression you defined, something like

return markdown unless markdown =~ USERNAME_WITH_UNDERSCORE_REGEXP

as that's actually what we care about, since the @ sign does not necessarily indicate a username (in some languages it's used for array concatenation for example).

This comment has been minimized.

Copy link
@JuanitoFatas

JuanitoFatas Feb 13, 2020

Author Contributor

Hey checking against the regexp makes sense. I‘ve updated the code.

app/labor/markdown_fixer.rb Outdated Show resolved Hide resolved
app/labor/markdown_fixer.rb Show resolved Hide resolved
app/labor/markdown_traverser.rb Show resolved Hide resolved
Copy link
Contributor Author

JuanitoFatas left a comment

🏓

@@ -47,11 +47,29 @@ def split_tags(markdown)
end

def underscores_in_usernames(markdown)
markdown.gsub(/(@)(_\w+)(_)/, '\1\\\\\2\\\\\3')
return markdown unless markdown.include?("@")

This comment has been minimized.

Copy link
@JuanitoFatas

JuanitoFatas Feb 10, 2020

Author Contributor

There are number of tests that pass "", front matters, or text that do not have username at all (without @) to here. These make sense in real world. Hence checking for @ here to avoid doing work for irrelevant cases and all tests are happy.

app/labor/markdown_fixer.rb Outdated Show resolved Hide resolved
app/labor/markdown_fixer.rb Show resolved Hide resolved
app/labor/markdown_traverser.rb Show resolved Hide resolved
@JuanitoFatas JuanitoFatas requested a review from citizen428 Feb 10, 2020
Copy link
Contributor

citizen428 left a comment

Left a bunch more comments, but don't let any of them stop you.

It only runs when markdown has underscored usernames.

Skip escaping underscore when content is in code or codeblock. It works by
going through all lines of a markdown content. Find underscored username.
Escape if we are not in the codeblock.

The (?<!) is negative lookbehind which rules out the case
when a underscored username is present in code, e.g., `@_dev_` will not be escaped.
@JuanitoFatas JuanitoFatas force-pushed the JuanitoFatas:markdown_fixer/not-escape-for-code-or-codeblocks branch from 0e92d06 to 2ac92e8 Feb 13, 2020
@rhymes rhymes requested a review from citizen428 Feb 13, 2020
Copy link
Contributor

citizen428 left a comment

This looks good now, thanks for updating the guard clause.

@rhymes rhymes requested a review from Zhao-Andy Feb 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants
You can’t perform that action at this time.