Auto page size based on content size #1627
Comments
|
Why would you prefer a PDF in such a scenario instead of an image generated by |
|
PDF keeps text in vector format. I know wkhtmltoimage+SVG does the same, but SVGs can’t be included in LaTeX without conversion to PDF by other software. This can be done with Inkscape or similar, but it’s still an extra step. |
|
So you can include a PDF into LaTeX, and it keeps the text in vector format? |
|
Yep, that’s the main point of using wkhtmltopdf instead of wkhtmltoimage. |
|
I'm not too comfortable with adding this feature, as it only makes sense for single-page PDFs. Also, what happens if the text generates more than one page? |
|
Well, that single page can be of an arbitrarily large size – I don't see a problem here. Another way of implementing such a feature could be adding PDF as an export format in wkhtmltoimage: The advantage of such an approach is a more coherent API (images don't have page margins, page numbers, etc.). How about this? |
|
As mentioned earlier, to get a PDF of a minimum possible size, I'm doing HTML conversion in two steps. First, I get an SVG using All works fine except one thing: the links defined in the HTML get lost. If I try to convert HTML to PDF in one step using If Is there any way this can be done now? There is a chance I've missed something in the docs. |
|
Not really, it will require some changes -- for which there is no time right now. |
|
OK, no worries. I wish I could help with the development, but C++ and Python are quite unfamiliar to me. It will be absolutely great if you consider implementing something like |
|
+1 One use case for this feature would be printing the generated PDF on a paper-roll printer, which prints everything on one "page". |
|
@ashkulz, it’s great that you’ve shortlisted this feature! Roughly when are you planning to work on it? I’m currently finding more and more difficulties in using The only workaround I see so far is running I could look more into this temporary hack and then share a shell script here, but what if you are willing to patch Not rushing you, just asking :–) Thanks for this awesome library again! |
|
I would very much be interested in this feature :) We print from rollers, so this feature would be perfect. I could probably help out with the development too, once I have my head round the code. |
|
@lukeenglish: patches are always welcome 👍 |
|
How do you currently determine when to break onto a new page if no manual breaks are provided? I assume it's when the content expands over the page height. I am more than happy to try and patch this. |
|
Just saw this in your documentation.... The current page breaking algorithm of WebKit leaves much to be desired. Surely we just need to interrupt the splitting method in a patched WebKit? |
|
I'd start by looking into wkhtmltoimage, as it already does interrupt this splitting (or does it join everything?). Anyway, it doesn't seem difficult to implement! |
|
Yeah cheers for the heads up, that's where I am looking at the moment. |
|
+1 I'd also be interested in achieving this somehow. I'd like to keep the vectors from the PDF but have the content fit on one page, no matter how big. I tried setting a big paper size, but that creates unnecessary file size and white space when opening the file in, for example, Illustrator. |
|
Hey guys, any updates on this issue? |
|
+1 on this. Receipt printers are the main use case in our work. Unfortunately I don't have the chops to create a patch myself. |
|
Hello, sharing any feedback would be awesome. :) |
|
What updates would you want? I'm not working on it, as I said above ... |
|
Well, @lukeenglish and @leonelsr had some ideas to begin with, so I thought that they might share their findings and estimates on whether this feature request is possible/feasible to implement. |
|
+1 For this. I'm thinking I might have to |
|
+1 |
|
That link just eventually loops back to this particular issue! By the way, before trying to produce outsized single PDF pages, someone ought to find out if there really is a semi-/un-documented limit of 200 inches (508cm) for a PDF page dimension. At least you would know there is an upper limit on what you can do. |
|
This thread has been inactive for a long time since my last post. Too bad, since this would be a great feature and it is requested a lot. That's a nice catch, @PhilterPaper; I didn't know that. I worked around this problem by setting all page margins to zero with the margin_top, margin_bottom, margin_left and margin_right parameters, and then setting page_height and page_width to the HTML document size, converting pixels to cm. It's not 100% perfect, but it works well enough in my case, so I thought someone else might be interested in this while we keep waiting for a potential fix to this issue. :) According to http://www.translatorscafe.com/cafe/units-converter/typography/calculator/pixel-(X)-to-centimeter-[cm]/ 1px ~= 0.02645833333333cm. However, I found that when setting the height, it's actually a little bit bigger in the resulting PDF. I got the best results when calculating 1px width = 0.0333333cm and 1px height = 0.04cm. So take the width in px * 0.0333333. And needless to say, none of my documents have exceeded 508cm. Yet. I will continue to follow this thread with interest :) |
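The conversion described above can be sketched in a few lines of bash. This is only an illustration under stated assumptions: it uses the nominal CSS ratio of 96 px per inch (1 px = 25.4/96 mm, the same figure as the 0.0264583 cm quoted above), not the empirically tuned factors, and the function names `px_to_mm` and `page_size_command` are made up for this example. It prints the wkhtmltopdf command instead of running it; the margin and page-size options are the command-line equivalents of the wrapper parameters named in the comment.

```shell
#!/bin/bash
# Convert CSS pixels to millimetres, assuming 96 px per inch (1 in = 25.4 mm).
px_to_mm() {
    LC_ALL=C awk -v px="$1" 'BEGIN { printf "%.2f", px * 25.4 / 96 }'
}

# Print (do not run) a wkhtmltopdf command whose single page matches the
# content size. Arguments: width in px, height in px, input file, output file.
page_size_command() {
    printf 'wkhtmltopdf --margin-top 0 --margin-bottom 0 --margin-left 0 --margin-right 0 --page-width %smm --page-height %smm %s %s\n' \
        "$(px_to_mm "$1")" "$(px_to_mm "$2")" "$3" "$4"
}

# Example: an 800 x 1200 px document.
page_size_command 800 1200 input.html output.pdf
```

As noted in the comment above, the rendered page may come out slightly larger than this nominal conversion predicts, so the factor may need tuning for a given setup.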
|
+1 |
|
+1 |
|
come on guys! very useful feature! |
|
+1 |
|
A workaround |
|
Yes, but i need it based on content size 😕 , without |
Well, my fix to that was to multiply 297 (standard height of A4 documents) by the number of pages in the document. That should give you the height based on your content. |
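A rough sketch of that arithmetic, assuming a two-pass approach: render once as ordinary A4, count the pages, then render again with the computed height. The helper names are hypothetical, and reading the page count from `pdfinfo` is an assumption of this example rather than something stated in the comment.

```shell
#!/bin/bash
# Height in mm of one page tall enough to stack N A4 pages (297 mm each).
stacked_a4_height_mm() {
    echo $(( $1 * 297 ))
}

# Page count of an existing PDF, read from the "Pages:" line of pdfinfo (poppler).
# Not called below, so the example runs without any PDF on disk.
pdf_page_count() {
    pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Example: a document that first rendered as 3 A4 pages.
stacked_a4_height_mm 3
```

The result can then be passed to wkhtmltopdf as `--page-height 891mm`. Since page margins and page breaks add space, the stacked height is an upper bound on the content height, not an exact fit.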
|
My previous solution was to find the page height via JS in px, convert it to millimeters (X = px*0.264583333) and then render the PDF file with But it was a bad idea to find the height on the client side, because fonts are drawn differently... it's different in different browsers... |
|
+1 |
|
+1 |
|
Seems silly that this is still an issue. I'm calling |
|
I agree that an option would be great, but here's my take to avoid rendering the document twice and relying on the calculations shown here. I print my document setting a very long page (like, 5000mm for example), making sure to get a single page, and then I use If you need to have an exact page width, things get more complicated, but it's possible to use |
|
One aspect of unlimited page length is that, while pages are built from the top down, the coordinate system runs from the bottom up. Thus, the y-coordinate of the top of the page needs to be known when you start. It's too bad that Adobe chose to do this (almost every other graphics system has Don't forget that PDF officially limits you to 14400 points, unless you use a UserUnit scale factor. I think a lot of readers are lax in enforcing that rule, but you never know when you'll run into one that is strict about it. |
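To make the 14400-point limit concrete, here is a small sketch that converts a page height in millimetres to points (1 pt = 1/72 inch) and checks it against the limit. The function names are hypothetical. 14400 pt is 200 inches, the same figure mentioned earlier in the thread.

```shell
#!/bin/bash
# Millimetres to PDF points: 72 pt per inch, 25.4 mm per inch.
mm_to_pt() {
    LC_ALL=C awk -v mm="$1" 'BEGIN { printf "%.2f", mm * 72 / 25.4 }'
}

# Succeeds (exit status 0) when the given height in mm is above 14400 pt.
exceeds_pdf_limit() {
    LC_ALL=C awk -v mm="$1" 'BEGIN { exit !(mm * 72 / 25.4 > 14400) }'
}

# Example: compare two candidate page heights against the limit.
for height in 5000 6000; do
    if exceeds_pdf_limit "$height"; then
        echo "$height mm = $(mm_to_pt "$height") pt: over the 14400 pt limit"
    else
        echo "$height mm = $(mm_to_pt "$height") pt: within the limit"
    fi
done
```

By this arithmetic, a 5000mm page stays just under the limit, while anything much beyond about 5080mm would need the scale factor.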
|
Not sure if this is useful, but there was a recent WebKit development related to this issue: according to this Twitter thread WebKit now contains a |
|
By the way, I see a lot of comments stating that the "standard" page size is A4 (595pt x 842pt). Keep in mind that wkhtmltopdf apparently sets this (via |
|
+1 |
|
My frustration made me throw together this dirty script, following @lorenzos's solution, along with this script used for cropping:
#!/bin/bash
# 1. convert html to one long paged pdf (5000mm long):
wkhtmltopdf --enable-local-file-access --page-width 210mm --page-height 5000mm "$1" converted.pdf
# 2. crop the long pdf:
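fname="converted.pdf"
# page width and height in points, as reported by pdfinfo
pagesize=( $(pdfinfo "$fname" | grep "Page size" | cut -d ":" -f2 | \
awk '{ print $1,$3 }') )
# content bounding box (left, bottom, right, top) in points;
# this run also writes converted-crop.pdf as a by-product
bounding=( $(pdfcrop --verbose "$fname" | grep "%%HiResBoundingBox" | \
cut -d":" -f2 ) )
# discard the by-product of the bounding-box run
rm -f "${fname%.pdf}-crop.pdf"
# keep the original left and right whitespace so the page width is preserved
lmarg="${bounding[0]}"
rmarg="$(python3 -c "print('{:.3f}'.format(${pagesize[0]} - ${bounding[2]}))")"
# crop for real, leaving 75 pt above and below the content
pdfcrop --margins "$lmarg 75 $rmarg 75" "$fname" "single_page.pdf"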
# clean up the intermediate pdf file (converted.pdf, the one with all the whitespace ;))
rm "$fname"
The script takes in a single argument, Assumptions:
Open the terminal here and: You should be left with a file called Quick and dirty, but gets it done. Hope this feature gets added. It may be quirky but it's also insanely frustrating :P . |
|
+1 |
|
+1 |
|
One more reason that wkhtmltoimage isn't suitable for converting to PDF is that it doesn't preserve text. The text in the original HTML is converted to paths when the HTML file is converted to SVG, so it can't be selected and copied from the result. |
|
As in @beevabeeva's solution, I found that |
|
any chance that it will be supported anytime soon? |
|
Probably never :-) ... you can try my solution if you want --> https://github.com/Sicos1977/ChromeHtmlToPdf
case PaperFormat.FitPageToContent:
    PreferCSSPageSize = true;
    break; |

It would be nice to be able to set something like --auto-size when calling wkhtmltopdf to generate a PDF with one page of the minimum possible size. This can currently be done using a workaround, and a means of avoiding that hack would be useful.
PDFs with sizes that depend on the content are widely used in LaTeX documents. One can render an HTML page and then embed the resulting PDF as a figure.