XSS via dangerous URI schemes in HtmlRenderer: HTML entity bypass and missing data:/vbscript: filters #675

@dyingman1

Describe the bug

flexmark-java's default SUPPRESSED_LINKS pattern (javascript:.*) can be bypassed in two ways, both resulting in dangerous URLs being rendered as clickable <a href="..."> links in the HTML output:

  1. HTML entity bypass: When a link URL in the source Markdown contains HTML entities (e.g. &#106; for j), isSuppressedLinkPrefix() checks the raw, undecoded URL string, which does not match the javascript:.* pattern. However, resolvedLink.getUrl() returns the decoded URL, which is written to the href attribute, so the rendered link carries a working javascript: URL that runs when clicked. This was previously noted in issue #672 (XSS via HTML entities in javascript: URLs).

  2. Missing scheme coverage: The default blocklist only covers javascript:. The data: and vbscript: URI schemes are not blocked by default, so they can be used as-is without any encoding tricks.

Affected component:

  • HtmlRenderer

To Reproduce

import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.data.MutableDataSet;

public class XssPoC {
    public static void main(String[] args) {
        MutableDataSet options = new MutableDataSet();
        Parser parser = Parser.builder(options).build();
        HtmlRenderer renderer = HtmlRenderer.builder(options).build();

        // Bypass 1: HTML entity encoding of 'j' evades the javascript: filter
        String md1 = "[click me](&#106;avascript:alert(document.domain))";
        Node doc1 = parser.parse(md1);
        System.out.println(renderer.render(doc1));
        // Output: <p><a href="javascript:alert(document.domain)">click me</a></p>

        // Bypass 2: data: URI scheme is not in the default blocklist
        String md2 = "[click me](data:text/html,<script>alert(document.domain)</script>)";
        Node doc2 = parser.parse(md2);
        System.out.println(renderer.render(doc2));
        // Output: <p><a href="data:text/html,&lt;script&gt;alert(document.domain)&lt;/script&gt;">click me</a></p>
        // Browser decodes HTML entities in href attribute value → executes script

        // Bypass 3: vbscript: URI scheme is not in the default blocklist
        String md3 = "[click me](vbscript:msgbox(1))";
        Node doc3 = parser.parse(md3);
        System.out.println(renderer.render(doc3));
        // Output: <p><a href="vbscript:msgbox(1)">click me</a></p>
    }
}

Root cause (Bypass 1)

In CoreNodeRenderer, the suppression check uses node.getUrl(), which returns the raw URL from the parsed Markdown source; HTML entities are not decoded at this point. The pattern javascript:.* therefore does not match &#106;avascript:.... Because the check finds no match, the link is rendered, and resolvedLink.getUrl() (with entities decoded) is written into the href attribute:

// CoreNodeRenderer.java
if (context.isDoNotRenderLinks() || isSuppressedLinkPrefix(node.getUrl(), context)) {
    html.text(text);  // ← check uses raw URL
} else {
    ResolvedLink resolvedLink = context.resolveLink(LinkType.LINK, text, null);
    html.attr("href", resolvedLink.getUrl());  // ← href uses decoded URL
}
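
The mismatch can be reproduced without flexmark at all. The following is a minimal sketch, assuming the suppression check is a full-string regex match (Matcher.matches()) against the default pattern; the class and method names are hypothetical and are not part of flexmark:

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the suppression check; not flexmark code.
final class RawVersusDecodedCheck {
    // Default SUPPRESSED_LINKS value quoted in this report.
    static final Pattern DEFAULT_PATTERN = Pattern.compile("javascript:.*");

    // Assumption: the check requires the whole URL to match the pattern.
    static boolean isSuppressed(String url) {
        return DEFAULT_PATTERN.matcher(url).matches();
    }

    public static void main(String[] args) {
        // What the check sees: the raw URL from the Markdown source.
        String raw = "&#106;avascript:alert(document.domain)";
        // What ends up in href: the same URL with the entity decoded.
        String decoded = "javascript:alert(document.domain)";

        System.out.println("raw suppressed:     " + isSuppressed(raw));     // false
        System.out.println("decoded suppressed: " + isSuppressed(decoded)); // true
    }
}
```

The two strings differ only in how the first character is written, yet only the decoded form is caught, which is why the check and the href need to look at the same representation.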

Root cause (Bypasses 2 & 3)

HtmlRenderer.SUPPRESSED_LINKS defaults to "javascript:.*" only. data: and vbscript: URIs pass the filter without any modification.

// HtmlRenderer.java
final public static DataKey<String> SUPPRESSED_LINKS =
    new DataKey<>("SUPPRESSED_LINKS", "javascript:.*");  // data: and vbscript: not covered
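
As a quick illustration of the coverage gap, the sketch below runs the three schemes from the reproducer against the default pattern string. It assumes a full-string regex match and uses hypothetical class and method names; it does not call into flexmark:

```java
import java.util.regex.Pattern;

// Hypothetical coverage check; not flexmark code.
final class SchemeCoverageCheck {
    // Default SUPPRESSED_LINKS value quoted in this report.
    static final Pattern DEFAULT_PATTERN = Pattern.compile("javascript:.*");

    // Assumption: the check requires the whole URL to match the pattern.
    static boolean isSuppressed(String url) {
        return DEFAULT_PATTERN.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "javascript:alert(document.domain)",
            "data:text/html,<script>alert(document.domain)</script>",
            "vbscript:msgbox(1)"
        };
        // Only the first URL is reported as suppressed.
        for (String url : urls) {
            System.out.println(isSuppressed(url) + "  " + url);
        }
    }
}
```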

Expected behavior

All three inputs should render as plain text (no <a> tag), or the href should be replaced with a safe fallback such as #. The suppression check should operate on the decoded URL, and the default blocklist should cover all commonly dangerous URI schemes.

Expected output for all three inputs:

<p>click me</p>

or

<p><a href="#">click me</a></p>

Resulting Output

Actual output (flexmark 0.64.8, JDK 17):

<!-- Bypass 1 -->
<p><a href="javascript:alert(document.domain)">click me</a></p>

<!-- Bypass 2 -->
<p><a href="data:text/html,&lt;script&gt;alert(document.domain)&lt;/script&gt;">click me</a></p>

<!-- Bypass 3 -->
<p><a href="vbscript:msgbox(1)">click me</a></p>

All three render clickable links with dangerous URIs. A user who clicks any of these in an application that renders user-supplied Markdown can trigger script execution in their browser, depending on how that browser handles the scheme.

Additional context

Tested against com.vladsch.flexmark:flexmark:0.64.8 (current Maven Central release) with default renderer options on JDK 17.

Suggested fix:

  1. Decode HTML entities in the URL before applying isSuppressedLinkPrefix(), so that entity-encoded schemes are detected correctly.
  2. Expand the default SUPPRESSED_LINKS pattern to also cover data: and vbscript:, for example: "(?i)(javascript|data|vbscript):.*".
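
The two steps above could be combined roughly as follows. This is a sketch under stated assumptions, not a patch against flexmark: the class, the decodeNumericEntities helper, and the isSuppressed method are hypothetical, the helper handles only numeric character references (named entities are left untouched), and the check is assumed to be a full-string match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical decode-then-check sketch; not flexmark code.
final class DecodedSuppressionSketch {
    // Expanded pattern proposed in this report.
    static final Pattern SUPPRESSED =
            Pattern.compile("(?i)(javascript|data|vbscript):.*");

    // Decimal (&#106;) and hexadecimal (&#x6A;) numeric character references.
    static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Hypothetical helper: decodes numeric character references only.
    static String decodeNumericEntities(String url) {
        Matcher m = NUMERIC_ENTITY.matcher(url);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // leave invalid references untouched
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Step 1: decode. Step 2: match against the expanded blocklist.
    static boolean isSuppressed(String rawUrl) {
        return SUPPRESSED.matcher(decodeNumericEntities(rawUrl)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSuppressed("&#106;avascript:alert(document.domain)")); // true
        System.out.println(isSuppressed("data:text/html,<script>alert(1)</script>")); // true
        System.out.println(isSuppressed("vbscript:msgbox(1)"));                       // true
        System.out.println(isSuppressed("https://example.com/"));                     // false
    }
}
```

Note that blocking data: outright also suppresses harmless data: links, so a real fix may want that part of the pattern to remain configurable.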

Related: issue #672 (HTML entity bypass was previously reported; this report adds the missing-scheme vectors and provides a consolidated reproducer).
