data: URIs are incorrectly stripped #80

nikclayton · 2019-02-28T09:26:14Z

Describe the bug

The markup

![Red dot](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==)

Should show a red dot as an image.

It doesn't. Inspecting the generated HTML from the Markdown shows that the generated img element has no src attribute.

I think, but have not confirmed, that this is because https://github.com/writeas/writefreely/blob/32e99d00415c6e86a9536d9b824dcdf0b119270d/posts.go#L1368 does not include data: as a valid protocol.

Expected behavior

img element should be created with the correct src attribute.
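The suspected cause can be illustrated with a minimal sketch of a URL scheme allow-list. The `allowedSchemes` map and the `keepSrc` helper are hypothetical names chosen for illustration; the actual list on the linked line of posts.go is not reproduced here.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedSchemes is a hypothetical allow-list, not WriteFreely's actual one.
var allowedSchemes = map[string]bool{
	"http":   true,
	"https":  true,
	"mailto": true,
}

// keepSrc reports whether a URL would survive the allow-list.
// Relative URLs (no scheme) are kept; anything unparseable is dropped.
func keepSrc(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	if u.Scheme == "" {
		return true
	}
	return allowedSchemes[u.Scheme]
}

func main() {
	fmt.Println(keepSrc("https://example.com/dot.png"))
	fmt.Println(keepSrc("data:image/png;base64,iVBORw0KGgo="))
}
```

Under this assumption, a sanitizer that drops attributes with disallowed schemes would leave the img element without a src, which matches the behavior described above.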


thebaer · 2019-02-28T14:11:01Z

Thanks. I'm not sure that we want to support data URIs in WriteFreely -- at least not for all instances. But I'm open to your input. First, some things to consider:

I know WF doesn't have photo hosting built-in, so this could be an attractive feature. But some admins, especially of multi-user instances, might not be expecting to host large chunks of data inside each post, as supporting data URIs would enable.

Also, as far as I understand it, adding support would make the site vulnerable to XSS attacks by post authors. Again, this is most important for multi-user instances where the admin can't trust user input, and malicious users could have a wider reach with the Reader feature.

Due to these issues, I wouldn't want to support data URIs for everyone, including on Write.as. But one thing we could do, for those that really need this, would be to only allow data URIs on single-user instances. What do you think?

To add support, we'd actually need to change this func: https://github.com/writeas/writefreely/blob/32e99d00415c6e86a9536d9b824dcdf0b119270d/postrender.go#L157-L167 to include this line:

policy.AllowDataURIImages()

emsenn · 2019-02-28T16:16:23Z

Thanks for posting via Fediverse about this Issue - I want to voice my support for it; I have these sorts of links in my content that I occasionally publish through a WriteFreely instance.

But I agree that this should be something that is clearly - explicitly - disabled by default, for the reasons y'all say. But I think it should be real clear in the documentation that this sort of protection is in place - as I see it (semantically, not necessarily in implementation), limiting what kinds of URIs can be used is a deviation from the "standard," as much as there can be one in this situation, yeah? Like, HTML doesn't normally block (or fail to account for?) data URIs, so if y'all do, you should be explicit about that.

tl;dr: I support parsing data URIs as an off-by-default option.

thebaer · 2019-02-28T16:37:19Z

I appreciate the input, @emsenn. I agree the documentation is lacking on this subject right now -- if you wouldn't mind, could you create a quick issue for this on the documentation repo so we can make sure it gets addressed?

Otherwise, would you be okay if that optionality is tied to the single-/multi-user config setting, like I mentioned? That is: data URIs are disabled for multi-user instances but enabled for single-user instances?

emsenn · 2019-02-28T16:47:39Z

I think that could be a sensible way to do it, though if it is as simple as adding the specified line to the specified function, could the solution be as simple as documenting that one could add that line to allow that parsing?

I'm concerned that parsing data URI images is such a niche feature, it's not worth introducing a new deviation between single- and multi-user instance behavior, even if it only affects posters, not readers.

thebaer · 2019-02-28T17:02:42Z

Yeah, I would say that tying it to that setting only makes sense if we also relax other parts of the allowed HTML policy.

But actually, then that would affect mobility: writers' posts would break if they move from a single-user blog to a multi-user instance. So yeah, this kind of deviation might not make sense. As far as post rendering, it should probably be consistent across configurations, above all else.

emsenn · 2019-02-28T17:18:07Z

To confirm I understand, you're saying that, in general, you want to stay away from options that affect how the source gets rendered, to preserve easy migration?

And so in the specific, this should not be an option because it'd require converting posts to migrate between an instance with it to one without?

I don't know if I agree with prioritizing easy migration, but the logic seems reasonable. (To be clear, I don't disagree with the priority, I just haven't thought about it.)

Would that mean this issue can be closed just by adding documentation so that users know what the expected behavior is? Seems fair to me.

As a user, I'm already posting to writefreely using an Emacs script, which I suspect could be modified to parse or flag data URIs if I wanted. (I mention only to say that the problem being "unsolved" within your software does not mean the problem is unsolvable, for me. I'm generally just trying to include as much info as I can think of since this seems a pretty "philosophical"

thebaer · 2019-02-28T18:42:43Z

> To confirm I understand, you're saying that, in general, you want to stay away from options that affect how the source gets rendered, to preserve easy migration?
>
> And so in the specific, this should not be an option because it'd require converting posts to migrate between an instance with it to one without?

Right. Data portability is important, but also: the way that the software renders user input is core to the entire platform.

If we support these minor differences in core functionality, in the same application but with different configurations across different servers, it means users will inevitably end up more confused, need to spend more time reading docs, getting support from instance admins, etc. Even if you're not moving your data, I think the end-user expectation will be that one WF instance behaves, at its core, in the same way as on any other instance.

> As a user, I'm already posting to writefreely using an Emacs script, which I suspect could be modified to parse or flag data URIs if I wanted.

That's good to know. I think the best solution will be to get photo hosting built-in (as is planned) so clients can upload photos and just use image URLs instead.

I think we've arrived at a pretty good conclusion, but I'll leave the issue open for a bit longer, in case anyone else has any input (including @nikclayton).

dpc · 2019-02-28T20:32:23Z

Maybe this could be enabled only for a subset of data payloads (only base64-encoded png, jpg, gif) and only up to a certain small max size.

It is a useful feature to enable small images inside the document itself.
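A restriction like this could look roughly like the following sketch, which uses only the Go standard library. The `allowedTypes` list, the `checkDataURIImage` helper, and the 8 KiB cap are assumptions made for illustration; none of them comes from WriteFreely or bluemonday.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// allowedTypes is a hypothetical allow-list of image MIME types.
var allowedTypes = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
	"image/gif":  true,
}

// checkDataURIImage accepts only base64-encoded data: URIs whose declared
// MIME type is on the allow-list and whose decoded size fits within maxBytes.
func checkDataURIImage(uri string, maxBytes int) error {
	if !strings.HasPrefix(uri, "data:") {
		return errors.New("not a data: URI")
	}
	rest := strings.TrimPrefix(uri, "data:")
	comma := strings.Index(rest, ",")
	if comma < 0 {
		return errors.New("missing payload separator")
	}
	meta, payload := rest[:comma], rest[comma+1:]
	if !strings.HasSuffix(meta, ";base64") {
		return errors.New("payload is not base64-encoded")
	}
	mime := strings.ToLower(strings.TrimSuffix(meta, ";base64"))
	if !allowedTypes[mime] {
		return fmt.Errorf("MIME type %q is not allowed", mime)
	}
	// DecodedLen is an upper bound, so oversized input is rejected before decoding.
	if base64.StdEncoding.DecodedLen(len(payload)) > maxBytes {
		return errors.New("image exceeds the size cap")
	}
	if _, err := base64.StdEncoding.DecodeString(payload); err != nil {
		return fmt.Errorf("invalid base64: %w", err)
	}
	return nil
}

func main() {
	const maxBytes = 8 * 1024 // hypothetical cap
	redDot := "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="
	fmt.Println(checkDataURIImage(redDot, maxBytes))
	fmt.Println(checkDataURIImage("data:text/html;base64,PHNjcmlwdD4=", maxBytes))
}
```

This only checks the declared type and the size. It does not confirm that the decoded bytes really are an image, so it is a starting point rather than a complete defense.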

nikclayton · 2019-03-01T07:12:45Z

Whitelisting image MIME types was the approach chosen in https://snyk.io/vuln/npm:markdown-it:20160912, FWIW.

thebaer · 2019-03-06T19:42:39Z

Thanks @nikclayton, good to know. We'll need to see if bluemonday's AllowDataURIImages() func takes care of this, or how we can do this otherwise. Anyone should feel free to test this and let us know -- it'll still be a while before I can dig into it.

I am still concerned that supporting this (and publicizing it) will encourage people to encode and embed large images in posts while there's no other built-in way to add media to posts. Again, the problems arise both for admins not expecting large chunks of non-text data and for readers who have to deal with un-cache-able images. Plus, limiting the size of images embedded this way will be difficult, and likely sub-optimal from a UX perspective (e.g. how do I know ahead of time if I'm embedding an image that's too large?).
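On the question of knowing the size ahead of time: base64 turns every 3 raw bytes into 4 characters, so a client could estimate the embedded size before posting. A minimal sketch, where `dataURILen` is a hypothetical helper and the sample sizes are arbitrary:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// dataURILen returns the length in bytes of a base64 data: URI that embeds
// an image of rawBytes bytes with the given MIME type.
func dataURILen(mime string, rawBytes int) int {
	prefix := "data:" + mime + ";base64,"
	return len(prefix) + base64.StdEncoding.EncodedLen(rawBytes)
}

func main() {
	// 85 bytes is the decoded size implied by the red dot's 116 base64 characters.
	for _, n := range []int{85, 100 * 1024, 1024 * 1024} {
		fmt.Printf("%8d raw bytes -> %8d bytes in the post\n", n, dataURILen("image/png", n))
	}
}
```

The roughly one-third overhead is predictable, so a publishing client could warn before upload. Whether the server should also enforce a limit is a separate decision.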

dpc · 2019-03-06T19:55:40Z

@thebaer I still don't understand why posts are not cacheable. At very least they should use etags, no?

thebaer · 2019-03-13T17:55:48Z

@dpc I think the chance of a single post being re-viewed by the average reader is pretty low, so client-side caching would have minimal impact. But if we did add it, as things are right now it would just mean that page views aren't counted.

dpc · 2019-03-14T05:20:45Z

@thebaer With etags, the client does send the request every time, so it seems to me it's possible to count it. And it seems to me that re-viewing the same post is actually quite common. Authors especially tend to re-read their posts multiple times (I do, at least).
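The point about counting views on conditional requests can be sketched with Go's net/http. The `post` type and `fetch` helper are hypothetical and do not reflect how WriteFreely serves posts. The If-None-Match comparison is deliberately simplified to an exact match.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// post is a hypothetical stand-in for a rendered post.
type post struct {
	etag  string // quoted validator, e.g. derived from a hash of the body
	body  string
	views int64
}

// ServeHTTP counts every request, then answers 304 Not Modified
// when the client's cached copy is still current.
func (p *post) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	atomic.AddInt64(&p.views, 1) // counted even when no body is sent
	w.Header().Set("ETag", p.etag)
	// no-cache asks clients to revalidate with the server before reusing a copy.
	w.Header().Set("Cache-Control", "no-cache")
	// Simplification: real headers may carry several or weak validators.
	if r.Header.Get("If-None-Match") == p.etag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	fmt.Fprint(w, p.body)
}

// fetch issues one in-memory GET and returns the response status code.
func fetch(p *post, ifNoneMatch string) int {
	req := httptest.NewRequest(http.MethodGet, "/hello", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	p.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	p := &post{etag: `"v1"`, body: "<p>Hello</p>"}
	fmt.Println(fetch(p, ""))     // first visit: full response
	fmt.Println(fetch(p, `"v1"`)) // revisit: conditional request
	fmt.Println(atomic.LoadInt64(&p.views))
}
```

In this sketch the revisit still reaches the handler, so the view is counted while the body is not re-sent.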

thebaer · 2019-03-14T12:11:26Z

@dpc Ah, good point -- if you don't mind, let's continue this discussion on the forum, since it isn't related to this particular issue.

dangom · 2020-05-16T17:01:48Z

Hi guys, I was wondering if there are any ideas out there on how to embed images into posts since this issue was opened?

thebaer · 2020-09-02T14:04:39Z

@dangom We've had a small discussion about this on the forum -- hopefully some resources there can help out!

On another note, I'm closing this issue since it turned into more of a discussion, and we haven't made any further progress here. If anyone is interested in continuing this, we'd be happy to look at a pull request that adds support for data: URIs to see if it makes sense in the application.

dangom · 2020-09-03T21:52:41Z

Sure, as long as we had an API to upload photos too, that'd be great. The data: URI would've been just a workaround. This issue was originally opened because OP and I were publishing to writefreely from Emacs, but there was no way to get images uploaded transparently.

thebaer added the enhancement and discussion labels Feb 28, 2019

emsenn mentioned this issue Feb 28, 2019

Document Link Parsing Limitations writefreely/documentation#1

Open

nikclayton mentioned this issue Mar 1, 2019

Convert images to data URIs dangom/writefreely.el#16

Open

thebaer closed this as completed Sep 2, 2020

thebaer mentioned this issue Dec 2, 2021

Data URLs are not rendert in images #512

Closed
