Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data: URIs are incorrectly stripped #80

Open
nikclayton opened this Issue Feb 28, 2019 · 14 comments

Comments

Projects
None yet
4 participants
@nikclayton
Copy link

nikclayton commented Feb 28, 2019

Describe the bug

The markup

![Red dot]()

Should show a red dot as an image.

It doesn't. Inspecting the generated HTML from the Markdown shows that the generated img element has no src attribute.

I think, but have not confirmed, that this is because

var imageURLRegex = regexp.MustCompile(`(?i)^https?:\/\/[^ ]*\.(gif|png|jpg|jpeg|image)$`)
does not include data: as a valid protocol.

Expected behavior

img element should be created with the correct src attribute.

@thebaer

This comment has been minimized.

Copy link
Member

thebaer commented Feb 28, 2019

Thanks. I'm not sure that we want to support data URIs in WriteFreely -- at least not for all instances. But I'm open to your input. First, some things to consider:

I know WF doesn't have photo hosting built-in, so this could be an attractive feature. But some admins, especially of multi-user instances, might not be expecting to host large chunks of data inside each post, as supporting data URIs would enable.

Also, as far as I understand it, adding support would make the site vulnerable to XSS attacks by post authors. Again, this is most important for multi-user instances where the admin can't trust user input, and malicious users could have a wider reach with the Reader feature.

Due to these issues, I wouldn't want to support data URIs for everyone, including on Write.as. But one thing we could do, for those that really need this, would be to only allow data URIs on single-user instances. What do you think?

To add support, we'd actually need to chance this func:

writefreely/postrender.go

Lines 157 to 167 in 32e99d0

func getSanitizationPolicy() *bluemonday.Policy {
policy := bluemonday.UGCPolicy()
policy.AllowAttrs("src", "style").OnElements("iframe", "video")
policy.AllowAttrs("frameborder", "width", "height").Matching(bluemonday.Integer).OnElements("iframe")
policy.AllowAttrs("allowfullscreen").OnElements("iframe")
policy.AllowAttrs("controls", "loop", "muted", "autoplay").OnElements("video")
policy.AllowAttrs("target").OnElements("a")
policy.AllowAttrs("style", "class", "id").Globally()
policy.AllowURLSchemes("http", "https", "mailto", "xmpp")
return policy
}
to include this line:

policy.AllowDataURIImages()
@emsenn

This comment has been minimized.

Copy link

emsenn commented Feb 28, 2019

Thanks for posting via Fediverse about this Issue - I want to voice my support for it; I have these sort of
links in my content that I occassionally through a Write.freely instance.

But I agree that this should be something that is clearly - explicitly - disabled by default, for the reasons y'all say. But i think it should be real clear in the documentation that this sort of protection is in place - as I see it (semantically, not necessarily in implementation), limiting what kinds of URIs can be used is a deviation from the "standard," as much as there can be one in this situation, yeah? Like, HTML doesn't normally block (or fail to account for?) data URIs, so if y'all do, you should be explicit about that.

tl;dr: I support parsing data URIs as an off-by-default option.

@thebaer

This comment has been minimized.

Copy link
Member

thebaer commented Feb 28, 2019

I appreciate the input, @emsenn. I agree the documentation is lacking on this subject right now -- if you wouldn't mind, could you create a quick issue for this on the documentation repo so we can make sure it gets addressed?

Otherwise, would you be okay if that optionality is tied to the single-/multi-user config setting, like I mentioned? That is: data URIs are disabled for multi-user instances but enabled for single-user instances?

@emsenn

This comment has been minimized.

Copy link

emsenn commented Feb 28, 2019

I think that could be a sensible way to do it, though if it is as simple as adding the specified line to the specified function, could the solution be as simple as documenting that one could add that line to allow that parsing?

I'm concerned that parsing data uri images is such a niche feature, it's not worth introducing a new deviation between single- and multi-user instances behavior, even if it only affects posters, not readers.

@thebaer

This comment has been minimized.

Copy link
Member

thebaer commented Feb 28, 2019

Yeah, I would say that tying it to that setting only makes sense if we also relax other parts of the allowed HTML policy.

But actually, then that would affect mobility: writers' posts would break if they move from a single-user blog to a multi-user instance. So yeah, this kind of deviation might not make sense. As far as post rendering, it should probably be consistent across configurations, above all else.

@emsenn

This comment has been minimized.

Copy link

emsenn commented Feb 28, 2019

To confirm I understand, you're saying that, in general, you want to stay away from options that affect how the source gets rendered, to preserve easy migration?

And so in the specific, this should not be an option because it'd require converting posts to migrate between an instance with it to one without?

I don't know if I agree with prioritizing easy migration, but the logic seems reasonable. (To be clear, I don't disagree with the priority, I just haven't thought about it.)

Would that mean this issue can be closed just by adding documentation so that users know what the expected behavior is? Seems fair to me.

As a user, I'm already posting to writefreely using an Emacs script, which I suspect could be modified to parse or flag data URIs if I wanted. (I mention only to say that the problem being "unsolved" within your software does not mean the problem is unsolvable, for me. I'm generally just trying to include as much info I as I can think of since this seems a pretty "philosophical"

@thebaer

This comment has been minimized.

Copy link
Member

thebaer commented Feb 28, 2019

To confirm I understand, you're saying that, in general, you want to stay away from options that affect how the source gets rendered, to preserve easy migration?

And so in the specific, this should not be an option because it'd require converting posts to migrate between an instance with it to one without?

Right. Data portability is important, but also: the way that the software renders user input is core to the entire platform.

If we support these minor differences in core functionality, in the same application but with different configurations across different servers, it means users will inevitably end up more confused, need to spend more time reading docs, getting support from instance admins, etc. Even if you're not moving your data, I think the end-user expectation will be that one WF instance behaves, at its core, in the same way as on any other instance.

As a user, I'm already posting to writefreely using an Emacs script, which I suspect could be modified to parse or flag data URIs if I wanted.

That's good to know. I think the best solution will be to get photo hosting built-in (as is planned) so clients can upload photos and just use image URLs instead.


I think we've arrived at a pretty good conclusion, but I'll leave the issue open for a bit longer, in case anyone else has any input (including @nikclayton).

@dpc

This comment has been minimized.

Copy link

dpc commented Feb 28, 2019

Maybe this could be enabled only for subset of data payloads (only base64 encoded png,jpg,gif) and only for certain, small max size.

It is an useful feature to enable small images inside the document itself.

@nikclayton

This comment has been minimized.

Copy link
Author

nikclayton commented Mar 1, 2019

Whitelisting image MIME types was the approach chosen in https://snyk.io/vuln/npm:markdown-it:20160912, FWIW.

@thebaer

This comment has been minimized.

Copy link
Member

thebaer commented Mar 6, 2019

Thanks @nikclayton, good to know. We'll need to see if bluemondays AllowDataURIImages() func takes care of this, or how we can do this otherwise. Anyone should feel free to test this and let us know -- it'll still be a while before I can dig into it.

I am still concerned that supporting this (and publicizing it) will encourage people to encode and embed large images in posts while there's no other built-in way to add media to posts. Again, the problems arise both for admins not expecting large chunks of non-text data and for readers who have to deal with un-cache-able images. Plus, limiting the size of images embedded this way will be difficult, and likely sub-optimal from a UX perspective (e.g. how do I know ahead of time if I'm embedding an image that's too large?).

@dpc

This comment has been minimized.

Copy link

dpc commented Mar 6, 2019

@thebaer I still don't understand why posts are not cacheable. At very least they should use etags, no?

@thebaer

This comment has been minimized.

Copy link
Member

thebaer commented Mar 13, 2019

@dpc I think the chance of a single post being re-viewed by the average reader is pretty low, so client-side caching would have minimal impact. But if we did add it, as things are right now it would just mean that page views aren't counted.

@dpc

This comment has been minimized.

Copy link

dpc commented Mar 14, 2019

@thebaer With etags client does send the request every time, so it seems to me there is possibility to count it. And it seems to me that re-reviewing the same post is actually quite common. Especially the authors tends to re-read their posts multiple time (I do, at least).

@thebaer

This comment has been minimized.

Copy link
Member

thebaer commented Mar 14, 2019

@dpc Ah, good point -- if you don't mind, let's continue this discussion on the forum, since it isn't related to this particular issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.