Issue with PDF output size #20
Hi, I am not sure I understand everything, but I will try to explain what I can. First, some questions:
You are running two steps:
1st Step - Resize
2nd Step - Scale
On MacOSX, there may be a problem if you are trying to process a PDF file that was just created by another script step, because it may not yet have Spotlight's metadata. That happens when mdls is used, since mdls relies on this metadata, and only if the PDF was created milliseconds before. You can force another method of size detection to make sure this is not the problem, though. Try installing imagemagick or xpdf (from homebrew) and forcing that detection method (e.g. -m i for imagemagick). The next version will have ghostscript detection, so this should not be a problem anymore. Let me know if this helps.
BTW, postscript points are integers. Can you try it using integers for the custom paper size? Or use metric. EDIT3: Does the merged (input) PDF maybe have pages with different paper sizes?
It seems that attachments don’t go through so I will paste here the text of the mail and here is the Dropbox link of the folder where the mentioned files are. Sorry for that.
Dear Gustavo,
First of all nice to meet you and thank you so much for taking your time to assist me personally, I cannot emphasise enough how much this means to me!
On with your questions:
• I’m using pdfScale 2.4.9 (one interesting thing is that the upgrade command failed on me, to the point that I had to manually install the updated version inside the usr/local/bin directory).
• I am sending you several PDFs and I would like to ask you to convert them with -r 'custom mm 232 305' -s 0.985.
• The PDF labelled ‘One’ is a simple PDF outputted from the app Avid Sibelius. To me this resizes and scales perfectly, so no issue with that.
• The PDF labelled ‘Two’ contains ‘One’ plus a few other PDFs exported from Apple Pages and then combined with the Combine Tool in Acrobat CC 2020 (though using Preview or PDF Expert didn’t change the result). Passing these files to pdfScale produces the issue shown in picture ‘Two-CropMedia’, while the output from the Terminal is in text file ‘Two-Terminal’. The resulting file is called ‘Two_scaled’. I have already asked the Acrobat communities what these two Boxes mean and am waiting for an answer.
Normally, also because I am a total beginner in scripting (am a musician, nothing more!), I just pass PDFs that have already been created for quite a while so the issue you describe at the end should not be there.
I installed imagemagick but I am not sure where in the script I should put those extra parameters. If you think this is important could you please point me there?
Thank you so much for all your help, I hope what I wrote is somehow helpful in the investigation.
My only workaround now is to open each file in Preview and export it using the Print tool, which I can do, but it is orders of magnitude slower!
Thank you once more and all the best
Michele Galvagno
Hi, there are no links in your post. Seems like the upgrade is broken on MacOSX because it uses BSD's readlink instead of GNU's. To force imagemagick, just add -m i to the call. You can paste images here, so maybe the screenshots would help me understand the results. Would be nice to have the actual PDFs as well though.
Please try this:
https://www.dropbox.com/sh/es7qtpq3vy96xak/AACsoPUJB1z3o4EJGa2ee0oea?dl=0
Do you see it now?
Otherwise how can I reach you privately? Twitter?
Thanks!
Yep, now I got it.
Wow, thank you!
I read all of this quickly before heading off to bed (midnight here).
Will read this again tomorrow morning with a fresh head and see what I can do!
Thanks for this thorough investigation!
On 30 Mar 2020, 23:34 +0200, Gustavo Arnosti Neves wrote:
Still investigating, but this is what I have so far:
• File Two.pdf does have some weird stuff coming with the /Mediabox definition (from grep).
• Even though this causes an error, the page size seems to be processed accordingly.
• Using imagemagick (or pdfinfo) will solve this problem (add -m i to call).
The verbose run of Two.pdf
$ pdfscale -v -r 'custom mm 232 305' -s 0.985 Two.pdf
pdfscale v2.4.9 - Verbose Execution
Mixed Tasks: Resize & Scale
Dry-Run: FALSE
Input File: Two.pdf
Output File: Two.CUSTOM.SCALED.pdf
Get Page Size: Adaptive Enabled
Method: Grep
/usr/local/bin/pdfscale: line 1497: warning: command substitution: ignored null byte in input
Source Width: 595 postscript-points
Source Height: 842 postscript-points
Print Mode: Print ( auto/empty )
Fit To Page: Enabled (default)
Auto Rotate: PageByPage
Flip Detect: No change needed
Run Resizing: CUSTOM ( 658 x 865 ) pts
New Width: 658 postscript-points
New Height: 865 postscript-points
Scale Factor: 0.985
Vert-Align: CENTER
Hor-Align: CENTER
Translation X: 5.01 = 5.01 + 0.00 (offset)
Translation Y: 6.59 = 6.59 + 0.00 (offset)
Run Scaling: -1 %
Background: No background (default)
Final Status: File created successfully
The error is here:
/usr/local/bin/pdfscale: line 1497: warning: command substitution: ignored null byte in input
But as mentioned the page size is parsed correctly and the execution seems to proceed without problems.
This is what the grep call returns on One.pdf and Two.pdf
One.pdf
$ grep -a -e '/MediaBox' -m 1 ./One.pdf
/MediaBox [0 0 595.000000 842.000000]
Two.pdf
$ grep -a -e '/MediaBox' -m 1 ./Two.pdf
ðV ù(Õ��çKp �a§��uV4L��ò×ç]áÐ�Àxú©AÖ0�àt~îSD?�NT�Äg¢jO�§|®I�O|C|%´�áÑu?k�Óºá�º�òÛ JÀz�È_H/üÛ
<</Contents[1422 0 R 1423 0 R 1424 0 R 1425 0 R 1426 0 R 1427 0 R 1428 0 R 1430 0 R]/CropBox[0 0 595.2756 841.8898]/MediaBox[0 0 595.2756 841.8898]/Parent 1400 0 R/Resources 1437 0 R/Rotate 0/T<</Filter/FlateDecode/First 72/Length 642/N 8/Type/ObjStm>>stream
Those weird chars are what cause the parsing problems.
Maybe I can run it through a pipe with strings or cat to mitigate the problem (e.g.)
$ strings Two.pdf | grep -e '/MediaBox' -m 1
<</Contents[1422 0 R 1423 0 R 1424 0 R 1425 0 R 1426 0 R 1427 0 R 1428 0 R 1430 0 R]/CropBox[0 0 595.2756 841.8898]/MediaBox[0 0 595.2756 841.8898]/Parent 1400 0 R/Resources 1437 0 R/Rotate 0/Type/Page>>
But as mentioned, you can use -m i to solve this as well.
However, this does not seem to be the problem, since the page size is parsed correctly (even with the error).
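A minimal sketch of that mitigation, not the code pdfScale actually uses: instead of piping through strings, `grep -a -o` can print only the matched box, so the binary bytes around it never reach the command substitution. The helper name `box_size` is made up for illustration, and it only sees boxes that are stored uncompressed in the file.

```shell
# box_size NAME FILE
# Hypothetical helper: prints the rounded width and height (in points) of the
# first uncompressed "/NAME [llx lly urx ury]" entry found in FILE.
box_size() {
    # -a: treat the binary PDF as text; -o: print only the match itself.
    # LC_ALL=C: compare bytes, so invalid UTF-8 in the file cannot break matching.
    LC_ALL=C grep -a -o -m 1 "/$1 *\[[^]]*\]" "$2" |
        head -n 1 |
        sed -e 's/.*\[//' -e 's/\].*//' |
        awk '{ printf "%.0f %.0f\n", $3 - $1, $4 - $2 }'
}
```

Called as `box_size MediaBox Two.pdf`, an entry like `/MediaBox[0 0 595.2756 841.8898]` is reported as `595 842`, the same values the verbose run shows.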
Please note how complex the second PDF definition is and how it has a lot more stuff than the other file has. I would guess that these other things are interfering with the result.
Honestly, I am still not 100% sure I understand what the problem is?
It is a bit confusing, but it seems like the proportions of the original file are maintained, right?
The resulting MediaBox size seems to be correct, so pdfScale seems to be working properly, but the CropBox seems to be keeping the original proportion and that is what ends up rendering on screen.
I am not sure why you have a cropbox defined. From what I understand, that is used in pre-press to define a page with a bleed. So they can print it a bit bigger than the actual needed size and then cut the excess later (for a better finishing and no borders).
So maybe you can config the Acrobat merger in order for it to not define a cropbox?
I would try to tinker with the merger options to see if it makes any difference.
Here are some explanations on the PDF boxes:
• https://www.prepressure.com/pdf/basics/page-boxes
• https://wiki.scribus.net/canvas/PDF_Boxes_:_mediabox,_cropbox,_bleedbox,_trimbox,_artbox
• https://www.w3pedia.com/uk/the-pdf-page-boxes-cropbox-bleedbox-trimbox
Anyways, let me know if this helps.
I recommend using Lightshot to create screenshots (copy to memory) and then you can just paste them here (ctrl + V). You can save the image and drag+drop here as well.
While writing this I made a few more tests and got some new info:
Example run
$ pdfscale -m i -v -r 'custom mm 232 305' -s 0.985 Two.pdf
Checking for imagemagick's identify
pdfscale v2.4.9 - Verbose Execution
Mixed Tasks: Resize & Scale
Dry-Run: FALSE
Input File: Two.pdf
Output File: Two.CUSTOM.SCALED.pdf
Get Page Size: Adaptive Disabled
Method: ImageMagick's Identify
Source Width: 595 postscript-points
Source Height: 842 postscript-points
Print Mode: Print ( auto/empty )
Fit To Page: Enabled (default)
Auto Rotate: PageByPage
Flip Detect: No change needed
Run Resizing: CUSTOM ( 658 x 865 ) pts
New Width: 658 postscript-points
New Height: 865 postscript-points
Scale Factor: 0.985
Vert-Align: CENTER
Hor-Align: CENTER
Translation X: 5.01 = 5.01 + 0.00 (offset)
Translation Y: 6.59 = 6.59 + 0.00 (offset)
Run Scaling: -1 %
Background: No background (default)
Final Status: File created successfully
Notes
• Source Width/Height is always correct (even when using grep with the error)
• Target Width/Height is also correct
• PTS ( 658 x 865 ) == MM ( 232 x 305 )
• The resulting /Mediaboxes have the correct size
$ strings Two.CUSTOM.SCALED.pdf | grep -e '/MediaBox'
<</Type/Page/MediaBox [0 0 658 865]
<</Type/Page/MediaBox [0 0 658 865]
<</Type/Page/MediaBox [0 0 658 865]
. . .
• The resulting /Cropboxes are the ones keeping the proportion of the page
• Their upper-right corners are at ( 634.808105 , 865.0 ), so the visible area is about ( 611.6 x 865.0 )
$ strings Two.CUSTOM.SCALED.pdf | grep -e '/CropBox'
/CropBox [23.191925 0 634.808105 865.0]
/CropBox [23.191803 .00003051758 634.808228 865.0]
/CropBox [23.191925 0 634.808105 865.0]
/CropBox [23.191925 0 634.808105 865.0]
/CropBox [23.191925 0 634.808105 865.0]
/CropBox [23.191803 .00003051758 634.808228 865.0]
/CropBox [23.3735352 .00003051758 634.626465 865.0]
. . .
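The numbers in these notes can be double-checked with a little awk. This is only a sketch: the helper names `mm_to_pts` and `rect_size` are made up here, and rounding to whole points is an assumption that happens to match the verbose output above.

```shell
# Convert millimetres to PostScript points (72 points per inch, 25.4 mm per
# inch), rounded to a whole number of points.
mm_to_pts() {
    awk -v mm="$1" 'BEGIN { printf "%.0f\n", mm * 72 / 25.4 }'
}

# Given a PDF rectangle as "llx lly urx ury", print its width, height and
# width/height ratio. The last two numbers of a box are the upper-right
# corner, not the size.
rect_size() {
    echo "$1" | awk '{ w = $3 - $1; h = $4 - $2; printf "%.2f %.2f %.4f\n", w, h, w / h }'
}

mm_to_pts 232                              # prints 658
mm_to_pts 305                              # prints 865
rect_size "23.191925 0 634.808105 865.0"   # prints 611.62 865.00 0.7071
rect_size "0 0 595.2756 841.8898"          # prints 595.28 841.89 0.7071
```

The resized CropBox has the same width/height ratio as the source A4 page, which is consistent with the boxes keeping the original proportion while the MediaBox takes the new size.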
So we at least know where the problem is now, but I still don't know what path I should take to solve this yet.
This post seems to shed some light on the /Cropbox issue and offers a workaround.
https://stackoverflow.com/a/26989410/1273636
Your file also has a /Cropbox defined for EACH page as in the question above.
I will keep researching it.
Cheers!
Gus
Seems like I have a solution to bypass the Cropboxes with the new sizes. I am still researching the best way to implement it though. I am inclined to just add a cli parameter that will redefine all Cropboxes to the same size as the paper (Mediabox). This will be easy to implement and run, but will not be a universal/automatic solution (which would be nice).
Gus
The problem is thinking of all possible outcomes and situations. As mentioned before, this only applies to resizing; for scaling this is all irrelevant.
Possibilities
Options for now
Detecting Cropboxes would be nice, but the complexity grows a lot. Detecting is the first problem, since it may not always work (seems to be exactly the same as Mediabox detection). There is also no clear definition of what default behaviour should be used in each case, since it will always depend on what the user actually wants. Still digging and thinking here.
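For what it is worth, a rough way to check whether a file carries differing Cropboxes is the same grep approach used for the Mediabox. This is a sketch only, not what pdfScale does: the function names are made up, and like the Mediabox grep it only sees boxes that are stored uncompressed, so it can miss some.

```shell
# Print each distinct /CropBox entry found in the file, one per line.
distinct_cropboxes() {
    LC_ALL=C grep -a -o '/CropBox *\[[^]]*\]' "$1" |
        sed 's/CropBox *\[/CropBox [/' |
        sort -u
}

# Count them: 0 means none were found, 2 or more means that at least two
# different boxes are present in the file.
count_distinct_cropboxes() {
    distinct_cropboxes "$1" | awk 'END { print NR }'
}
```

A count above 1 would flag files like the one in this issue, where the Cropboxes differ slightly from page to page.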
Hi, things got a bit hectic yesterday, so I could not finish anything. Anyways, for your specific use case I already have a solution (which will be to reset all cropboxes to the same size as the Mediabox by issuing an execution flag). On top of that I will also add the option It will probably be something like
So it will probably be similar to the regular page size definition.
Thank you so much!
Hi, https://github.com/tavinus/pdfScale/tree/v2.5 Can you please try it and let me know? Things to note
Here is the
So in your case you should just add EDIT
The
I have installed it manually and the version is now correctly 2.5.2. Thank you so much for this!
^ This was running 2.4.9, so it is normal for it not to work. Only 2.5.2 will run the upgrade properly on Macs (even though it will offer the older version, with a warning). Proceeding with the upgrade will downgrade (until I merge with the master branch). I was able to test on a Yosemite VM (which is when I found the problem with curl that was patched in 2.5.2).
Your automator script seems fine for what you need and I can't think of any reason for it not to work with the new version. It would be nice to have batch processing for folders included in pdfScale, but I am not sure I will be able to do it right now. I will probably merge with master today, so everything will be easier to test and the upgrade will not downgrade anymore.
To install using the v2.5 branch you need to adjust the URLs
I believe I had a similar problem resizing PDFs in letter format to A4. Sorry if this is hijacking this thread. Just wanted to give feedback that using
Here is an example of a scientific article in letter format: https://www.hydrol-earth-syst-sci.net/23/303/2019/hess-23-303-2019.pdf
The standard command doesn't yield the desired result. There are some differences in the dimensions of the Media and Crop Box with respect to the original file, but it still shows the pdf in letter format:
Using the
Not hijacking at all. Thanks for the feedback @fabern. From what I tested, most problematic PDFs had different cropbox sizes on different pages (some were very close but still a bit different). If you want the cropbox reset to the SAME size as you are resizing,
Ok, v2.5.3 was merged and released and the v2.5 branch was deleted. Cheers 🍻
I'm using your tool inside of a shell script which gets run by an Automator.
I am not the author of this script, but it used to work up until 2-3 weeks ago; now, whatever I try, the output PDFs are not coming out at the size written inside the script.
For example, the one pasted below comes out 611 x 684 instead of the correct amount.
Could you help me find the issue? Thanks!
EDIT: with help from user VikingOSX from discussions.apple.com, I discovered that the issue presents itself only when the input file is a PDF made by combining different PDFs, for example using Adobe Acrobat.
Your tool will correctly resize the MediaBox of the file but will also apply a CropBox to it that maintains the proportions of the original file.
For example: if I want to convert such a file from A4 to 9x12in, the resulting file available to the user will be 12in high but 8.51in wide, as it will keep the A4 proportions.
Could you please take a look at this issue and tell me how to solve it?
Thank you very much