Handle non-ASCII strings in header anchors #591

Merged: 2 commits, Dec 25, 2016
16 changes: 15 additions & 1 deletion ext/redcarpet/html.c
@@ -281,16 +281,20 @@ rndr_header_anchor(struct buf *out, const struct buf *anchor)
     int stripped = 0, inserted = 0;

     for (; i < size; ++i) {
+        // skip html tags
         if (a[i] == '<') {
             while (i < size && a[i] != '>')
                 i++;
+        // skip html entities
         } else if (a[i] == '&') {
             while (i < size && a[i] != ';')
                 i++;
         }
+        // replace non-ascii or invalid characters with dashes
         else if (!isascii(a[i]) || strchr(STRIPPED, a[i])) {
             if (inserted && !stripped)
                 bufputc(out, '-');
+            // and do it only once
             stripped = 1;
         }
         else {
@@ -300,8 +304,18 @@ rndr_header_anchor(struct buf *out, const struct buf *anchor)
         }
     }

-    if (stripped)
+    // replace the last dash if there was anything added
+    if (stripped && inserted)
         out->size--;
+
+    // if anchor found empty, use djb2 hash for it
+    if (!inserted && anchor->size) {
+        unsigned long hash = 5381;
+        for (i = 0; i < size; ++i) {
+            hash = ((hash << 5) + hash) + a[i]; /* h * 33 + c */
+        }
+        bufprintf(out, "part-%lx", hash);
+    }
 }

 static void
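
To make the control flow of the two hunks easier to follow in one place, here is a minimal standalone sketch of the anchor logic. It is not the Redcarpet implementation: the name `header_anchor_sketch`, the plain `char` output buffer (instead of `struct buf`), and the contents of `STRIPPED_SKETCH` are assumptions for illustration, and the final `else` branch (lowercase the byte, reset `stripped`, count it in `inserted`) is a guess, since that part of the function is elided from the diff. `isascii` is replaced by an explicit `> 127` check to stay within standard C.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: illustrative punctuation list; the real STRIPPED is not shown in the diff. */
static const char STRIPPED_SKETCH[] = " -&+$,/:;=?@\"#{}|^~[]`\\*()%.!'";

/*
 * Hypothetical standalone version of the anchor logic.
 * Precondition: `out` holds at least strlen(text) + 22 bytes.
 */
static void header_anchor_sketch(const char *text, char *out, size_t cap)
{
    const unsigned char *a = (const unsigned char *)text;
    size_t size = strlen(text), len = 0, i;
    int stripped = 0, inserted = 0;

    for (i = 0; i < size; ++i) {
        if (a[i] == '<') {                      /* skip html tags */
            while (i < size && a[i] != '>')
                i++;
        } else if (a[i] == '&') {               /* skip html entities */
            while (i < size && a[i] != ';')
                i++;
        } else if (a[i] > 127 || strchr(STRIPPED_SKETCH, a[i])) {
            /* one dash per run of stripped bytes, never a leading dash */
            if (inserted && !stripped)
                out[len++] = '-';
            stripped = 1;
        } else {
            /* assumed behaviour of the elided branch */
            out[len++] = (char)tolower(a[i]);
            stripped = 0;
            inserted++;
        }
    }

    /* drop the trailing dash, but only if a dash can actually be there */
    if (stripped && inserted)
        len--;
    out[len] = '\0';

    /* nothing usable was emitted: fall back to a djb2 hash of the raw bytes */
    if (!inserted && size) {
        unsigned long hash = 5381;
        for (i = 0; i < size; ++i)
            hash = ((hash << 5) + hash) + a[i]; /* h * 33 + c */
        snprintf(out, cap, "part-%lx", hash);
    }
}
```

Under these assumptions, `"Hello, World!"` yields `hello-world`, and a header made only of non-ASCII bytes yields an id starting with `part-`. The `stripped && inserted` guard matters for the second case: without it, the length would be decremented even though no dash was ever written.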
7 changes: 7 additions & 0 deletions test/html_render_test.rb
@@ -252,6 +252,13 @@ def test_non_ascii_removal_in_header_anchors
     assert_equal html, render(markdown, with: [:with_toc_data])
   end

+  def test_utf8_only_header_anchors
+    markdown = "# 見出し"
+    html = "<h1 id=\"part-37870bfa194139f\">見出し</h1>"
+
+    assert_equal html, render(markdown, with: [:with_toc_data])
+  end
+
   def test_escape_entities_removal_from_anchor
     output = render("# Foo's & Bar's", with: [:with_toc_data])
     result = %(<h1 id="foos-bars">Foo&#39;s &amp; Bar&#39;s</h1>)
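
The id expected by `test_utf8_only_header_anchors` has 15 hex digits, so it appears to assume a 64-bit `unsigned long`. The sketch below recomputes the djb2 hash of the UTF-8 bytes of `見出し` with fixed-width accumulators to show how the result depends on that width. The function names `djb2_64` and `djb2_32` are hypothetical and not part of Redcarpet; the byte string is the UTF-8 encoding of the header text from the test.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* djb2 with a 64-bit accumulator (the width of `unsigned long` on LP64 systems). */
static uint64_t djb2_64(const unsigned char *s, size_t n)
{
    uint64_t h = 5381;
    for (size_t i = 0; i < n; ++i)
        h = ((h << 5) + h) + s[i];  /* h * 33 + c */
    return h;
}

/* Same hash with a 32-bit accumulator (the width of `unsigned long` on LLP64 systems). */
static uint32_t djb2_32(const unsigned char *s, size_t n)
{
    uint32_t h = 5381;
    for (size_t i = 0; i < n; ++i)
        h = ((h << 5) + h) + s[i];  /* wraps modulo 2^32 */
    return h;
}

/* UTF-8 bytes of the header text used in the test. */
static const unsigned char MIDASHI[] = "\xE8\xA6\x8B\xE5\x87\xBA\xE3\x81\x97";
#define MIDASHI_LEN (sizeof MIDASHI - 1)  /* exclude the terminating NUL */
```

With a 64-bit accumulator the formatted id is `part-37870bfa194139f`, matching the test. With a 32-bit accumulator it is `part-a194139f`, so the test as written would presumably fail on a platform where `unsigned long` is 32 bits wide.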