Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"safe" option #253

Closed
mnapoli opened this issue Jun 27, 2016 · 7 comments
Closed

"safe" option #253

mnapoli opened this issue Jun 27, 2016 · 7 comments
Labels
feedback wanted We need your input! question General questions about the project or usage

Comments

@mnapoli
Copy link
Contributor

mnapoli commented Jun 27, 2016

I think I'm confused with the safe option: if set to true, does that mean:

  • the Markdown input is considered "safe"?
  • the HTML output can be considered "safe"? (i.e. to use with an "unsafe" input)

I think I accidentally misinterpreted that and ended up opening XSS on my application. I'm afraid it might be wrongly interpreted by others. Would it make sense to rename that option (keeping BC of course) to something very explicit?

Cheers!

@colinodell
Copy link
Member

It should be number 2: the resulting HTML output will be safe because any raw HTML in the input will be escaped.

This option defaults to false for backwards-compatibility reasons (and because the spec calls for raw HTML to be preserved). However, I do agree that this option is confusing, so perhaps we should "rename" it. What do you think of this suggestion?

  1. Add a new option allow_raw_html.
    • false - Ensures the HTML is not vulnerable to XSS because raw HTML will be escaped.
    • true - Raw HTML is kept as-is; XSS possible.
    • (Basically the opposite values of safe)
  2. Deprecate safe (raise an E_DEPRECATED noticed if this configuration option is set - minor BC break)
  3. If both allow_raw_html and safe are set, the former takes precedence.

I'm open to any alternate names or other feedback on this.

@colinodell colinodell added the question General questions about the project or usage label Jun 27, 2016
@mnapoli
Copy link
Contributor Author

mnapoli commented Jun 27, 2016

👍 sounds good, but would allow_raw_html = false be the default? I'd say this would be a good move, but it seems to conflict with the CommonMark spec.

I'm going a bit off topic but is there any spec on the behavior when allow_raw_html = false? I.e. I was surprised to see all HTML tags removed, I was expecting them to be escaped instead. I.e. Markdown:

Example: <strong>Foo</strong>

Expected:

<p>Example: &lt;strong&gt;Foo&lt;/strong&gt;</p>

Actual:

<p>Example: Foo</p>

In the documents I'm processing it's common for users to just write code in text (not use backticks), I'm guessing I might not be the only one?

Would it make sense to have 3 behaviors:

  • strip tags
  • escape HTML entities
  • allow raw HTML

or is there a reason not to?

@colinodell
Copy link
Member

I'd say this would be a good move, but it seems to conflict with the CommonMark spec.

I totally agree with you. IMO there are two questions that need to be answered:

  1. Does changing this default violate my BC promise? (see the second paragraph under Versioning)
  2. Does the security benefit outweigh the fact that this parser will become non-compliant with the spec by default?

I'd happily make the proposed change if I was 100% convinced the answer to both questions is "yes", but unfortunately I'm not.

Perhaps the README and documentation should make this configuration highly visible so users are aware of it?

An alternative solution might be raising some kind of E_* error if the configuration option is not set (but using the current default). For example:

if (!isset($options['safe']) && !isset($options['allow_raw_html'])) {
    trigger_error('Failing to set the "allow_raw_html" option could make your site vulnerable to XSS attacks. See http://commonmark.thephpleague.com/configuration#allow_raw_html for more information.', E_USER_NOTICE);
}

That link would describe the option and which value to choose.

While this technically is a BC break, it's not nearly as severe. It also causes the user to explicitly state what functionality they want. Thoughts?

Would it make sense to have 3 behaviors:

Yeah I really like that! What about this: a new configuration option html_input with three possible values:

  • strip or remove
  • escape
  • preserve, allow or keep

(Naming things is hard...)

If you like the E_* suggestion, perhaps we could implement this variation:

if (isset($options['safe'])) {
    trigger_error('The "safe" option has been deprecated, use "html_input" instead. See http://commonmark.thephpleague.com/configuration#html_input for more information.', E_USER_DEPRECATED);
    $options['html_input'] = $options['safe'] ? 'strip' : 'preserve';
} elseif (!isset($options['html_input'])) {
    trigger_error('Failing to set the "html_input" option could make your site vulnerable to XSS attacks. See http://commonmark.thephpleague.com/configuration#html_input for more information.', E_USER_NOTICE);
    $options['html_input'] = 'escape';
}

Thoughts?

@colinodell colinodell added the feedback wanted We need your input! label Jun 27, 2016
@mnapoli
Copy link
Contributor Author

mnapoli commented Jun 28, 2016

I'd happily make the proposed change if I was 100% convinced the answer to both questions is "yes", but unfortunately I'm not.

Agreed, documenting it more visibly might be a good first step. Errors can be a pain to deal with, I'm afraid it would be a burden to some users.

An html_input with strip, escape and allow would be awesome (and very clear). I might have some time this week to have a look, do you have some pointers (e.g. the classes I should look at)?

@colinodell
Copy link
Member

I might have some time this week to have a look

That would be amazing!

do you have some pointers (e.g. the classes I should look at)?

Search the codebase for 'safe' - it only appears in a few places.

There are two renders which are responsible for converting HTML AST nodes into the final output - right now they simply return '' if safe mode is enabled, or the raw content if disabled. You'd probably want to put a switch here to return '', the raw contents, or escaped contents depending on the config value.

I haven't yet thought about the best location to implement the trigger_error code. Perhaps Environment::initializeExtensions() would work? This function is always run exactly once before the configuration options are used. There may be a better location but I'd need time to re-review everything to find it - suggestions are welcome, otherwise I can tackle this later.

BTW it looks like safe is referenced in a few other locations:

  • Link and Image renderers
  • An option on the bin/commonmark CLI command/app
  • A functional test for the above command
  • Default value in Environment on line 481

I apologize for not having the time to provide better, more-coherent information but perhaps later tonight or tomorrow I can do that.

@mnapoli
Copy link
Contributor Author

mnapoli commented Jun 28, 2016

As you said there's also the "safe links" behavior (only allow safe links), and I'm not sure an html_input option would make sense to control those. What do you think? Would it make sense to have a different option for that?

@colinodell
Copy link
Member

Yeah that makes sense to me.

On Tue, Jun 28, 2016 at 3:06 PM Matthieu Napoli notifications@github.com
wrote:

As you said there's also the "safe links" behavior (only allow safe
links), and I'm not sure an html_input option would make sense to control
those. What do you think? Would it make sense to have a different option
for that?


You are receiving this because you commented.

Reply to this email directly, view it on GitHub
#253 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAMVMrtsRVb_RPDv8lFxeAESp2XXtXT0ks5qQXC4gaJpZM4I_hkA
.

mnapoli added a commit to mnapoli/commonmark that referenced this issue Jun 28, 2016
Fixes thephpleague#253. The new `html_input` option allows 3 different behaviors:

- `strip` HTML input
- `allow` HTML input
- `escape` HTML input
mnapoli added a commit to mnapoli/commonmark that referenced this issue Jun 29, 2016
See thephpleague#253 and thephpleague#255. The `safe` option is now deprecated in favor of the `allow_unsafe_links` and `html_input` options.

The `allow_unsafe_links` is `true` by default for BC reasons.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feedback wanted We need your input! question General questions about the project or usage
Projects
None yet
Development

No branches or pull requests

2 participants