This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

Expected usage for local files #2687

Open
NathanHazout opened this issue Nov 25, 2015 · 9 comments

@NathanHazout

What is the expected usage when the files are stored locally on the filesystem?

My HTML contains references to css or images that look like: /MFPSamples/lib/bootstrap/css/bootstrap.css.

When running the command, I get:

Warning: Failed to load file:///MFPSamples/lib/bootstrap/css/bootstrap.css

For some reason it tries to load the files from the root of the computer instead of relative to where I am running the command from.

Don't ask me to modify the source HTML, it is supposed to work online as well.

@PhilterPaper

The HREF says /MFPSamples, which means that the browser expects to find the file under the site root. Note that this is not the server filesystem root! Try it in a browser with a local copy of the files (page and CSS, etc.) and see if it still displays. I suspect that WebKit is reading such an absolute file path as literally being the filesystem address, which probably should be considered a bug (although how it has escaped notice for so long is a puzzle). It certainly seems to be incompatible between local file use in wkHTMLtoPDF and browser usage (and in turn, possibly between web usage and local file usage with a browser). Please report back what you find for the three cases. I suspect that processing the page on the server (not as local files) will work correctly, but it will be interesting to see how both browser+local files and browser+web access work for you. By the way, what version of wkHTMLtoPDF and Qt?
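
To make the resolution rule concrete, here is a minimal sketch using Python's standard `urljoin`, which follows the generic URL reference-resolution rules. The page locations (`/home/me/site`, `localhost:8000`) are made up for illustration, and the idea that WebKit resolves the reference the same way is an assumption, although it matches the warning quoted in the original report.

```python
from urllib.parse import urljoin

# The reference exactly as it appears in the HTML from the report.
REF = "/MFPSamples/lib/bootstrap/css/bootstrap.css"

# Hypothetical locations of the same page: on disk and behind a server.
local_page = "file:///home/me/site/index.html"
served_page = "http://localhost:8000/index.html"

# A leading "/" replaces the whole path of the base URL.
local_result = urljoin(local_page, REF)
served_result = urljoin(served_page, REF)

# A leading "./" is resolved against the directory of the base URL instead.
relative_result = urljoin(local_page, "." + REF)

print(local_result)     # file:///MFPSamples/lib/bootstrap/css/bootstrap.css
print(served_result)    # http://localhost:8000/MFPSamples/lib/bootstrap/css/bootstrap.css
print(relative_result)  # file:///home/me/site/MFPSamples/lib/bootstrap/css/bootstrap.css
```

With a `file://` base there is no host to act as a site root, so the path lands at the filesystem root.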

@NathanHazout
Author

Sure, if I run with a server it all works; if I open the HTML file without a server, it fails with the same error, as expected.

But isn't there a way to specify the root of my system? Am I the first one to need this behavior??

I could in theory run wkHTMLtoPDF on the published site, but because there are hundreds of pages with images, it would be much slower than running it on the local file system.

Regarding versions, my command-line is running wkhtmltopdf 0.12.2.1 (with patched qt).

@PhilterPaper

If trying to display it in a browser with local files (file:/// etc.) gives the same error, then I'd say that wkHTMLtoPDF is behaving the same way as a browser is, and is not at fault. All I ever used was relative paths (no http: ), so I'm not familiar with the way that both browsers (file, not server) and wkHTMLtoPDF are handling absolute file paths (starting with /). Given that the path starts with a /, it then seems reasonable that this would be treated as something in the filesystem root, given that there is no other information available to either the browser or wkHTMLtoPDF as to where in the filesystem this "root" actually is.

I'm not sure it would work to preprocess or edit the HTML files so that a <base> tag is used to specify the stuff in the filesystem that comes before the "absolute" path, as <base> I think is only used for relative paths. Maybe your preprocessor could take all "absolute" paths and prepend something to them? It could even just be "." to make them into relative paths (such as replace href="/ with href="./). Does wkHTMLtoPDF have any command line parameter that is the equivalent of <base>, or of editing absolute paths to prepend a root path? That way you could avoid editing the files -- I didn't see anything with a quick look, but certainly worth checking for before doing a lot of work!
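
The href="/ to href="./ replacement suggested above could be sketched roughly as follows. This is a regex-based illustration, not a full HTML parser: the function name `make_relative` is hypothetical, it only touches `href` and `src` attributes, and it deliberately leaves protocol-relative URLs (`//host/...`) alone.

```python
import re

# Matches href="/..." or src="/..." (single or double quotes),
# but not protocol-relative URLs such as src="//cdn.example.com/x.js".
_ROOT_RELATIVE = re.compile(r'''(\b(?:href|src)\s*=\s*["'])/(?!/)''')


def make_relative(html: str, prefix: str = "./") -> str:
    """Replace the leading "/" of root-relative href/src values with `prefix`."""
    return _ROOT_RELATIVE.sub(lambda m: m.group(1) + prefix, html)


page = '<link href="/MFPSamples/lib/bootstrap/css/bootstrap.css" rel="stylesheet">'
print(make_relative(page))
# <link href="./MFPSamples/lib/bootstrap/css/bootstrap.css" rel="stylesheet">
```

Note that "./" only resolves correctly when the HTML file sits in the directory that plays the role of the site root; a page one level down would need "../" as the prefix instead.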

@samandiriel

I just bumped into this same issue - I'd like to output HTML directly to a file, send it to wkhtmltopdf on my webserver to turn into a PDF, but the relative paths on the JS, image and CSS files all get lost in the shuffle.

I tried symlinking from the HTML file directory to the relative paths (e.g. mapping mytemp/js to /www/mysite/js), but that didn't work, so I don't think it was using the file location as the document root. Symlinking to the filesystem root produced similar non-results.

I was hoping there would be a switch for setting the default directory/site root, but I didn't spy any such in the documentation (I tried the Page Options "--allow ", but that was either ineffective or I'm using it incorrectly).

wkhtmltopdf --version
wkhtmltopdf 0.12.2.1 (with patched qt)

@bgadoury

Adding a dot in front of a path starting with a slash would definitely make it relative when accessing the HTML as a file or from a webserver.

There's no option to change this on the fly for absolute URLs in wkhtmltopdf. The only time I can imagine this would be needed is if you have to fetch the HTML from the local filesystem (versus a website) but you don't have the ability to change the HTML or make a copy of the HTML and change your copy of it.

@sjvdm

sjvdm commented Mar 17, 2016

Same issue experienced here. I am using PDFkit.from_string, but one should be able to set the base URL for local files. I reference most of my files as "./image.jpg", but it expects the file to be at "file:///tmp/image.jpg"

@PhilterPaper

So are you running in /tmp (your cd), with href="./image-name"? It would expect your file to be in (or below) /tmp. Still, it would be very nice when getting files from a local filesystem to be able to specify an override of a base URL/file to be prepended on to any file address. Even more flexible would be the ability to specify different overrides (base URLs) for addresses given as URLs, absolute file paths, or relative file paths. You might want different file paths for each of those.

@sjvdm

sjvdm commented Mar 17, 2016

That's the funny thing - I am not running in /tmp, but say /home/user. I don't know where it gets the /tmp from. I agree - it would be nice to specify a base URL (like weasyprint) where all relative paths would be calculated relative to.
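
For relative references such as "./image.jpg", one workaround that needs no new option is to put an explicit `<base>` tag into the HTML string before it is converted. The sketch below is an illustration only: `with_base` is a hypothetical helper, `/home/user` is the example directory from this comment, and whether wkhtmltopdf honors the tag in a given version should be verified. It does not help with references that start with "/", since those bypass the directory part of the base.

```python
import re
from pathlib import Path

# Opening <head> tag, with or without attributes; does not match <header>.
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def with_base(html: str, base_dir: str) -> str:
    """Insert <base href="file:///.../"> right after the opening <head> tag.

    If the document has no <head>, the tag is placed at the very start.
    """
    base_url = Path(base_dir).resolve().as_uri() + "/"
    tag = f'<base href="{base_url}">'
    match = _HEAD_OPEN.search(html)
    if match is None:
        return tag + html
    return html[:match.end()] + tag + html[match.end():]


html = '<html><head><title>t</title></head><body><img src="./image.jpg"></body></html>'
print(with_base(html, "/home/user"))

# Assumed usage with the Python pdfkit wrapper (not executed here):
#   pdfkit.from_string(with_base(html, "/home/user"), "out.pdf")
```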

@wjdp

wjdp commented Nov 6, 2016

Running into the same issue here. Tried the --allow option and setting a <base> tag: <base href="./">.

My use case: I'm generating a static site, and as part of the build process I create PDF versions of a number of pages. All internal links are absolute; rewriting them as relative is not ideal, as the HTML is shared between the published pages and the PDF generation documents. Running a web server for every build is also a rather taxing addition.

When viewing the HTML files in the browser directly (file://), the absolute links do not work. As @PhilterPaper says, this implies wkhtmltopdf isn't at fault. However, I'd argue that the use case here is significant enough to warrant an option to rewrite absolute URLs to a specific origin.
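
Until such an option exists, a build step can approximate it by rewriting a throwaway copy of each page, so the shared HTML stays untouched. The following is a sketch under assumptions: `rewrite_to_origin` and `write_pdf_copy` are hypothetical names, only `href` and `src` attributes are handled, and the origin is the `file://` URL of the generated site's root directory.

```python
import re
from pathlib import Path

# Root-relative href/src values; protocol-relative "//host/..." is skipped.
_ROOT_RELATIVE = re.compile(r'''(\b(?:href|src)\s*=\s*["'])/(?!/)''')


def rewrite_to_origin(html: str, origin: str) -> str:
    """Point root-relative href/src values at `origin`."""
    prefix = origin.rstrip("/") + "/"
    return _ROOT_RELATIVE.sub(lambda m: m.group(1) + prefix, html)


def write_pdf_copy(page: Path, site_root: Path, out_dir: Path) -> Path:
    """Write a rewritten copy of `page` into `out_dir`; the original is not modified."""
    origin = site_root.resolve().as_uri()
    rewritten = rewrite_to_origin(page.read_text(encoding="utf-8"), origin)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / page.name
    target.write_text(rewritten, encoding="utf-8")
    return target


print(rewrite_to_origin('<a href="/docs/">Docs</a>', "file:///srv/site"))
# <a href="file:///srv/site/docs/">Docs</a>
```

The copy returned by `write_pdf_copy` is what would be handed to wkhtmltopdf. Because the rewritten URLs are absolute, the copy can live anywhere on disk, unlike the "./" approach, which depends on where the page sits.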
