This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

UTF-8 in header/footer (skipping non-ASCII characters) (Again) #4228

Open
enwood opened this issue Jan 10, 2019 · 9 comments

Comments

@enwood

enwood commented Jan 10, 2019

I'm opening this issue again (formerly Issue 2002) as I'm still seeing this problem with version 0.12.3 (with patched qt) operating under Red Hat Enterprise Linux 7.3.

In my application, the wicked_pdf gem is being used to drive wkhtmltopdf.

When wicked_pdf passes UTF-8 characters to wkhtmltopdf via the command line, they are being dropped by wkhtmltopdf. In this example, I pass the text "Issued/Issué" as footer text. I can see the accented (UTF-8) footer text ("Issué") being passed in the command line:

"***************[\"/apps/webapps/lts/vendor/bundle/ruby/2.2.0/bin/wkhtmltopdf\", \"-q\", 
\"--encoding\", \"UTF-8\", 
\"--margin-top\", \"25\", 
\"--margin-bottom\", \"15\", 
\"--margin-left\", \"0\", 
\"--margin-right\", \"0\", 
\"--header-spacing\", \"6\", 
\"--header-html\", 
\"file:////var/folders/cl/7g4qxvd15jq_fcsvbzwgzxtm0000gn/T/wicked_header_pdf20190110-8066-vu8w2k.html\", 
\"--footer-center\", \"Issued/Issué: 2019-01-10 11:02\\nPage: [page]/[topage]\", 
\"--footer-font-size\", \"8\", 
\"file:////var/folders/cl/7g4qxvd15jq_fcsvbzwgzxtm0000gn/T/wicked_pdf20190110-8066-ef1jne.html\", \"/var/folders/cl/7g4qxvd15jq_fcsvbzwgzxtm0000gn/T/wicked_pdf_generated_file20190110-8066-umsc0g.pdf\"]***************"

but the resulting PDF continues to be missing the "é" in "Issué" when it appears in the rendered footer.

Red Hat Enterprise Linux Server release 7.3 (Maipo)
[apps@vapp02t lts]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

[apps@vapp02t lts]$ bundle exec which wkhtmltopdf
/apps/webapps/lts/vendor/bundle/ruby/2.2.0/bin/wkhtmltopdf

[apps@vapp02t lts]$ bundle exec /apps/webapps/lts/vendor/bundle/ruby/2.2.0/bin/wkhtmltopdf --version
wkhtmltopdf 0.12.3 (with patched qt)

Any guidance would be appreciated!

@PhilterPaper

First, are you absolutely certain that the file you showed (with text containing "é") is UTF-8 and not Latin-1? It's very easy for an editor to slip into Latin-1 etc. mode without you realizing it. Second, if this is being processed by the shell command line, are you absolutely certain that it passes in UTF-8 characters uncorrupted? Often, command shells are single byte (Latin-1, etc.) and may have done something nasty to the text you're passing in.

@enwood
Author

enwood commented Jan 15, 2019

Hi Phil, thanks for your suggestions. Yes, I'm pretty certain everything is passed as UTF-8.

The source file containing the "é" character is a Ruby file, annotated with "# encoding: UTF-8", and edited with the Sublime Text editor. The character is passed to the wicked_pdf gem in a call such as this:

footer_text = "Issued/Issué: " + Time.current.strftime("%Y-%m-%d %H:%M")
cover_pdf = WickedPdf.new.pdf_from_string(
          s,
          :encoding => 'UTF-8',
          :footer => { :center => footer_text }
)

When I check the encoding of footer_text, it reports UTF-8:

Rendered layouts/certificate-header-blank.html (0.2ms)
TitlesController: footer_text encoding is: UTF-8
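
A note on this check: String#encoding only reports the label attached to the string, not whether the bytes actually match it. A minimal sketch of a stricter check follows; the helper name describe_string is hypothetical and not part of wicked_pdf. It reports the label, whether the bytes are valid for that label, and the raw bytes in hex, so a UTF-8 "é" (C3 A9) can be told apart from a Latin-1 "é" (E9) that has merely been labeled UTF-8.

```ruby
# Hypothetical helper (not part of wicked_pdf): summarize what a string
# really contains, beyond the encoding label it carries.
def describe_string(str)
  {
    encoding: str.encoding.name,   # the label only
    valid: str.valid_encoding?,    # do the bytes match the label?
    hex: str.bytes.map { |b| format("%02X", b) }.join(" ")
  }
end

# A genuine UTF-8 string: "é" is the two bytes C3 A9.
utf8 = "Issu\u00E9"

# Latin-1 bytes (single byte E9) wrongly labeled as UTF-8.
mislabeled = "Issu\xE9".b.force_encoding("UTF-8")

puts describe_string(utf8).inspect
puts describe_string(mislabeled).inspect
```

If the second form ever shows up for footer_text, the label would still read UTF-8 while valid would be false.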

The wicked_pdf gem then generates the command line calling wkhtmltopdf:

"***************[\"/apps/webapps/lts/vendor/bundle/ruby/2.2.0/bin/wkhtmltopdf\", \"-q\", 
\"--encoding\", \"UTF-8\", 
\"--footer-center\", \"Issued/Issué: 2019-01-10 11:02\", 
\"file:////var/folders...."
]***************"

and the bash shell in which wkhtmltopdf runs is operating in UTF-8:

[apps@vapp02t lts]$ echo $LANG
en_US.UTF-8

So, I'm not sure what else I can control?
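
One thing that may still be worth checking: the locale of an interactive bash session is not necessarily the locale of the process that actually spawns wkhtmltopdf (for example an application server started by an init script). The sketch below is an assumption about where to look, not a confirmed cause. It resolves the locale that governs character handling using the POSIX precedence LC_ALL, then LC_CTYPE, then LANG, and falls back to "C" when none is set. The helper names effective_ctype and utf8_locale? are hypothetical.

```ruby
# Hypothetical helpers: work out which locale governs character handling
# for a given environment hash (defaults to the current process's ENV).
# POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
def effective_ctype(env = ENV)
  %w[LC_ALL LC_CTYPE LANG].each do |name|
    value = env[name]
    return value unless value.nil? || value.empty?
  end
  "C" # nothing set: the process runs in the C/POSIX locale
end

# True when the effective locale names a UTF-8 codeset
# (matches both "en_US.UTF-8" and "en_US.utf8").
def utf8_locale?(env = ENV)
  !(effective_ctype(env) =~ /\.utf-?8/i).nil?
end

puts "effective LC_CTYPE: #{effective_ctype}"
puts "UTF-8 locale: #{utf8_locale?}"
```

Logging these two values from inside the Rails process in production, rather than from a login shell, would show what wkhtmltopdf inherits.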

Tim

@enwood
Author

enwood commented Jan 15, 2019

And, even if I explicitly reference "é" as "\u00E9", it's still being dropped by wkhtmltopdf 0.12.3.

@PhilterPaper

If an accented character is vanishing, my best guess is that it reaches the engine as an invalid character (corrupted somewhere along the line), or is dropped before it gets there. If it definitely started out as a UTF-8 (two-byte) character, the damage most likely happened in the shell/command-line handling. Per one of the referenced older issues, have you checked the "locale" settings all the way down the line, to make sure UTF-8 is handled as UTF-8 and not treated as Latin-1 or even ASCII? You might want to either insert some printout code in wkHTMLtoPDF or make up a little dummy program to see whether the UTF-8 character reaches wkHTMLtoPDF or is dropped or corrupted somewhere earlier. Beyond that, I'm out of ideas.
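
The "little dummy program" idea could be sketched roughly as below. This is a hypothetical stand-in script, not anything shipped with wkhtmltopdf or wicked_pdf: it renders nothing and only reports the raw bytes of each argument it receives, plus the locale variables it inherited. Temporarily pointing the application at it in place of the real binary is an assumption about how one might wire it in.

```ruby
#!/usr/bin/env ruby
# Hypothetical stand-in for the wkhtmltopdf binary: it produces no PDF and
# only shows, byte by byte, what arrived on the command line.

# Build one report line per argument with its byte count and hex dump.
def dump_args(args)
  args.each_with_index.map do |arg, i|
    hex = arg.bytes.map { |b| format("%02X", b) }.join(" ")
    "argv[#{i}] bytes=#{arg.bytesize} hex=#{hex}"
  end
end

if __FILE__ == $PROGRAM_NAME
  # Report the locale variables this process inherited, then the arguments.
  warn "LANG=#{ENV['LANG'].inspect} LC_ALL=#{ENV['LC_ALL'].inspect} LC_CTYPE=#{ENV['LC_CTYPE'].inspect}"
  dump_args(ARGV).each { |line| warn line }
end
```

An intact UTF-8 "é" shows up as the byte pair C3 A9 in the dump; a lone E9, a 3F ("?"), or no byte at all would point at corruption before the arguments reach the binary.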

@enwood
Author

enwood commented Jan 15, 2019

To complicate matters, I'm finding that the accented character is NOT dropped when wkHTMLtoPDF generates the PDF in my development environment (Mac OS/Ruby 2.2.5). But in production, when running under RHEL 7.3/Ruby 2.2.5, the character is dropped, even with the locale of the shell clearly operating in en_US.UTF-8.

Yes, I think I'm going to have to dig into the receiving end of wkHTMLtoPDF as you suggested, Phil.

If I uncover any reasons for the character being dropped or misinterpreted, I'll post back here.

@ksassnowski

I had the exact same thing happen to me using footer-text. All German umlauts would disappear from the generated PDF.

However, what did work for me was using footer-html with <meta charset="utf-8"> instead of footer-text. @enwood, might be worth a shot? It doesn't really fix the original error, but it might be a workaround.
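
A rough sketch of that workaround, under stated assumptions: the helper name write_footer_html and the inline styling are hypothetical, and only Ruby's standard library is used. It writes the footer text into a small UTF-8 HTML file that declares its charset, so the accented text travels inside a file rather than as a command-line argument. The resulting path would then be handed to wkhtmltopdf's --footer-html option in place of --footer-center.

```ruby
require "tempfile"
require "cgi"

# Hypothetical helper: write the footer text into a UTF-8 HTML file that
# declares its charset. Returns the Tempfile; keep a reference to it until
# the PDF has been generated, since the file is deleted when it is collected.
def write_footer_html(text)
  html = <<-HTML
<!DOCTYPE html>
<html>
  <head><meta charset="utf-8"></head>
  <body style="text-align: center; font-size: 8pt;">#{CGI.escapeHTML(text)}</body>
</html>
  HTML
  file = Tempfile.new(["footer", ".html"], encoding: "UTF-8")
  file.write(html)
  file.close
  file
end

footer = write_footer_html("Issued/Issu\u00E9: 2019-01-10 11:02")
puts footer.path
```

Page-number placeholders such as [page] and [topage] are handled differently in HTML footers and are not covered by this sketch.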

@enwood
Author

enwood commented May 24, 2019 via email

@x1nas

x1nas commented Jul 23, 2019

I had the same problem with Cyrillic symbols.
Try dpkg-reconfigure locales and add the necessary locale (ru_RU.UTF-8 in my case).
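
Note that dpkg-reconfigure is a Debian/Ubuntu tool, so on RHEL the equivalent step would differ. Either way, it may help to confirm that the locale is actually installed on the machine, not just named in LANG. The sketch below is a hypothetical helper that checks a wanted locale against the text printed by `locale -a`; it normalizes spellings because that command often lists "ru_RU.utf8" where the variable says "ru_RU.UTF-8".

```ruby
# Hypothetical helpers: check whether a locale appears in `locale -a` output.

# Fold case and drop hyphens so "ru_RU.UTF-8" and "ru_RU.utf8" compare equal.
def normalize_locale(name)
  name.strip.downcase.delete("-")
end

# available_output is the text printed by `locale -a`, one locale per line.
def locale_installed?(wanted, available_output)
  available = available_output.lines.map { |line| normalize_locale(line) }
  available.include?(normalize_locale(wanted))
end

# Example with canned output; on a real host one might pass `locale -a` output:
#   locale_installed?("en_US.UTF-8", `locale -a`)
sample = "C\nPOSIX\nen_US.utf8\nru_RU.utf8\n"
puts locale_installed?("ru_RU.UTF-8", sample)
```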

@sbj42

sbj42 commented Oct 11, 2022

See also #4777.
