"Hashtags" in code blocks get parsed as hashtags #6


thebaer opened this issue Nov 11, 2018 · 3 comments



thebaer commented Nov 11, 2018

As reported here:


MCSH commented Nov 12, 2018

How about this idea?

Instead of using a regexp to replace hashtags, use a parser that makes separate passes. Like this

It would be something like this:

assign last_codeblock_end = 0

start = 0
for start < len(doc):
    if start < last_codeblock_end:
        ignore everything
    check if this is a new code block
    then if not do rendering
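
The pass-based idea above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the names `hashtagRe`, `linkHashtags`, and `linkOutsideInlineCode`, the simplified hashtag pattern, and the `/tag:` link format are all hypothetical. It treats lines starting with three backticks as fence toggles and assumes the backticks on a line are balanced.

```go
package main

import (
	"regexp"
	"strings"
)

// hashtagRe is a deliberately simple, hypothetical pattern: '#' followed by a
// letter and then letters, digits, or underscores. It does not try to match
// what a dedicated tag-extraction library would accept.
var hashtagRe = regexp.MustCompile(`#(\p{L}[\p{L}\p{N}_]*)`)

// linkHashtags walks the document line by line, tracking whether the current
// line is inside a fenced code block, and links hashtags only outside code.
func linkHashtags(doc string) string {
	var out strings.Builder
	inFence := false
	for _, line := range strings.SplitAfter(doc, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			// A fence line opens or closes a code block; copy it unchanged.
			inFence = !inFence
			out.WriteString(line)
			continue
		}
		if inFence {
			// Inside a fenced block: ignore everything, copy verbatim.
			out.WriteString(line)
			continue
		}
		out.WriteString(linkOutsideInlineCode(line))
	}
	return out.String()
}

// linkOutsideInlineCode splits a line on backticks. Even-indexed segments lie
// outside inline code spans (assuming balanced backticks), so only those are
// rewritten.
func linkOutsideInlineCode(line string) string {
	parts := strings.Split(line, "`")
	for i := range parts {
		if i%2 == 0 {
			// The link format here is an assumption made for illustration.
			parts[i] = hashtagRe.ReplaceAllString(parts[i], `<a href="/tag:${1}">#${1}</a>`)
		}
	}
	return strings.Join(parts, "`")
}
```

The regexp is still used, but only on text that the outer scan has already determined to be outside code, which is the context a plain global replace lacks.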

I don't think it's possible to parse this using regexp, at least not with vanilla regexp. I'm not sure how different Go's regexp is from the regular expressions used in the theory of computation.

Just for reference, this is the code that needs changing:

Hope this helps.


mrvdb commented Nov 30, 2018

How can we move forward here? This is probably less trivial than it looks. Some things which stood out to me:

  • I would have thought md -> HTML conversion was a more or less 'solved problem' by now. What makes our conversion special?
  • the md parser seems a modified version of correct? some details on why would be helpful
  • the extracttags is coming from which seems to be optimized for tweet parsing. Perhaps not the best choice for extracting hashtags in general?

Member Author

thebaer commented Dec 1, 2018

@MCSH Agreed that regex isn't the right solution, and some kind of better parser is needed. Thanks for the suggestion!


  • The trouble is with using regex to find hashtags in plain Markdown and just replacing them with HTML -- that method lacks the context to know instances when it shouldn't replace "hashtags", like inside code blocks
  • Right, I listed the changes made in the forked repo. Mostly, it's to make the parser more strict. The changes are all based on how people were actually using it, e.g. trying to insert special formatting / characters but not wanting them to actually render in some special way.
  • The twitter-text-go library is the best one I've found that works with any language / character set you can throw at it. We can always switch if there's a better library out there, but I doubt any take markdown into consideration like we'd need them to.
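
To make the first point concrete, here is a minimal Go sketch of a context-free regex replacement. The pattern, the `/tag:` link format, and the names `naiveHashtagRe` and `naiveLinkHashtags` are assumptions made for illustration; this is not the project's actual code.

```go
package main

import "regexp"

// naiveHashtagRe is a simplified, hypothetical hashtag pattern.
var naiveHashtagRe = regexp.MustCompile(`#(\p{L}[\p{L}\p{N}_]*)`)

// naiveLinkHashtags replaces every match in the raw Markdown. It has no notion
// of code spans or code blocks, so matches inside them are rewritten too.
func naiveLinkHashtags(markdown string) string {
	return naiveHashtagRe.ReplaceAllString(markdown, `<a href="/tag:${1}">#${1}</a>`)
}
```

Given the input ``Run `#include <stdio.h>` #golang``, this sketch links both `#golang` and the `#include` inside the code span, which is the behavior reported in this issue.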

mrvdb referenced this issue in mrvdb/writefreely Dec 3, 2018
This improves rendering in a number of situations:

- it keeps anchor tags working
- it gives the user some control for not linking, for example in code

hashTags at the beginning of a line without a space won't get linked.

Workaround related to issues #42 and #6 and #33
@thebaer thebaer self-assigned this Jan 14, 2019
@thebaer thebaer added this to the 1.0 milestone Jan 25, 2019
@thebaer thebaer closed this as completed in 32e99d0 Feb 4, 2019