Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer overflow with "--footer-html" and "data" URLs #2280

Closed
metagriffin opened this Issue Apr 3, 2015 · 19 comments

Comments

Projects
None yet
4 participants
@metagriffin
Copy link

metagriffin commented Apr 3, 2015

When invoking wkhtmltopdf with a header/footer with a data-scheme URL, such as:

wkhtmltopdf --footer-html 'data:text/html;base64,PGh0bWw+VGVzdDwvaHRtbD4K' IN OUT

the footer contains randomly appended characters -- see the screenshot:

screenshot

(the base64 decoded text is: <html>Test</html>)

@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Apr 5, 2015

Hmm, --footer-html is supposed to be used with actual files and not a data: URI ... still a bug, though.

@ashkulz ashkulz added the Verified label Apr 5, 2015

@ashkulz ashkulz closed this in c477d24 Apr 6, 2015

@ashkulz ashkulz added Fixed and removed Verified labels Apr 6, 2015

@ashkulz ashkulz added this to the 0.12.3 milestone Apr 6, 2015

@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Apr 6, 2015

Thanks for the catch! Now it will give the error message:

Error: --footer-html should be a URL and not a string containing HTML code.
Exit with code 1, due to unknown error.
@metagriffin

This comment has been minimized.

Copy link
Author

metagriffin commented Apr 6, 2015

this is a very unfortunate solution. use of the "data" scheme is perfectly valid, IMHO, and very powerful. i was hoping to use the --footer-html to be able to control the footer very precisely, but now i need a persistent location for the HTML, ie. a server, which in my particularly context i do not have access to since all content is generated on the fly (and there is no server).

is there any way that you can re-implement this to allow for that?

@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Apr 7, 2015

How would query string parameters be handled in that case if data URIs were allowed?

@metagriffin

This comment has been minimized.

Copy link
Author

metagriffin commented Apr 7, 2015

just like in a browser, they should be ignored, i.e. no special handling needs to be done for query strings. for example, you can try the following in your browser:

data:text/html;base64,PGh0bWw+VGVzdDwvaHRtbD4K?foo=bar
@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Apr 7, 2015

The moment you add a query string, it does not work in Firefox & Chrome ... so I guess they are not allowed for data URIs.

@metagriffin

This comment has been minimized.

Copy link
Author

metagriffin commented Apr 7, 2015

i just confirmed that, sorry! -- i had tried it with FF, first without the query-string, which worked, and then simply added the QS, which also appeared to work because it was just leaving the page unchanged. unfortunately, it does not work if you go directly to it with the QS.

interestingly, if you add a # before the ?, it works in FF (but not chrome).

perhaps it is the additional query strings that are being added automatically that are causing the garbage text?

@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Apr 7, 2015

Not sure I want to investigate it further ... as long as it gets caught beforehand 😄

@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Jul 14, 2015

A preview for the next release 0.12.3-dev-79ff51e is available, which should contain the fix for this issue. Please report back if it is not solved with the above version.

Please note that the above downloads will be removed after the 0.12.3 release.

@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Jan 21, 2016

0.12.3 has been released, which should contain the fix for this issue. Please report back if it is not solved with the above version.

Note that the preview downloads mentioned above have been removed.

@tyuwan

This comment has been minimized.

Copy link

tyuwan commented Apr 4, 2016

Has this issue already been resolved? It still doesn't allows me to use the data-scheme.

0.12.3 still checking for header and footer containing html code.

        if (looksLikeHtmlAndNotAUrl(s.header.htmlUrl)) {
            emit out.error("--header-html should be a URL and not a string containing HTML code.");
            fail();
            return;
        }
@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Apr 5, 2016

@tyuwan: the fix was to disallow data URIs because of the overflow and the inability to use query string parameters (see the above discussion).

@tyuwan

This comment has been minimized.

Copy link

tyuwan commented Apr 5, 2016

@ashkulz: Is there a way to is there any way that you can re-implement to allow data string OR URL? Since I normally generated all the content on the fly.

@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Apr 6, 2016

@tyuwan: not really, there would also be limitations due to maximum command-line length (as wkhtmltopdf is cross-platform). It is simply better to use temporary files which you delete after conversion.

@metagriffin

This comment has been minimized.

Copy link
Author

metagriffin commented Apr 7, 2016

@ashkulz: speaking for myself (and i suspect for @tyuwan as well), i politely disagree with you. yes, you are correct that in the situation where there is a lot of data, then using the command line is not viable. for all other situations (definitely mine and probably @tyuwan's), using the command line is not a problem. furthermore, in my case, i am generating the string in server memory, so needing to create a temporary file is a significant annoyance (i need to write a lot of extra code, ensure correct access, ensure that the file is correctly deleted, more unit tests, etc).

please do us a favor and remove the artificial restriction that you have put in place and fix the buffer overflow keeping us from using your tool the way that best works for us. thanks!

@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Apr 10, 2016

data URIs do not support query strings -- so they cannot be used for the header/footer HTML, because that usage requires query string parameters. The buffer overflow is probably because of incomplete validation in the Qt/WebKit code, which is why additional validation was added in wkhtmltopdf.

So this not an artificial restriction, but a limitation of the spec -- which was also referred to above.

@metagriffin

This comment has been minimized.

Copy link
Author

metagriffin commented Apr 10, 2016

ok, since by definition the content provided in a "data" uri scheme cannot depend on the query strings (since it is static), it would make sense to not include the query strings if the URL uses the "data" scheme. this is (for me) the perfect solution: i can generate static headers/footers and don't need to create temporary files.

so, that would comply with the spec and may address the qt/webkit problem, if that is where the problem was. would that be possible?

@ashkulz

This comment has been minimized.

Copy link
Member

ashkulz commented Apr 11, 2016

That would introduce a strange limitation -- for data URIs only static headers/footers are supported. I'm pretty sure it would come as a surprise to most people who think of using this functionality, so I'm not in favor of implementing it...

madnight added a commit to madnight/docker-wkhtmltopdf-aas that referenced this issue Jul 18, 2016

@pawepaw

This comment has been minimized.

Copy link

pawepaw commented Jan 7, 2019

Isn't it possible to create temp file inside wkhtmltopdf lib as you do for body html? it's a strange limitation that you have to create temp file for footer/header but not for body.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.