UTF-8 in header/footer (skipping non-ASCII characters) (Again) #4228

enwood · 2019-01-10T22:59:34Z

I'm opening this issue again (formerly Issue 2002) as I'm still seeing this problem with version 0.12.3 (with patched qt) operating under Red Hat Enterprise Linux 7.3.

In my application, the wicked_pdf gem is being used to drive wkhtmltopdf.

When wicked_pdf passes UTF-8 characters to wkhtmltopdf via the command line, they are being dropped by wkhtmltopdf. In this example, I pass the text "Issued/Issué" as footer text. I can see the accented (UTF-8) footer text ("Issué") being passed in the command line:

"***************[\"/apps/webapps/lts/vendor/bundle/ruby/2.2.0/bin/wkhtmltopdf\", \"-q\", 
\"--encoding\", \"UTF-8\", 
\"--margin-top\", \"25\", 
\"--margin-bottom\", \"15\", 
\"--margin-left\", \"0\", 
\"--margin-right\", \"0\", 
\"--header-spacing\", \"6\", 
\"--header-html\", 
\"file:////var/folders/cl/7g4qxvd15jq_fcsvbzwgzxtm0000gn/T/wicked_header_pdf20190110-8066-vu8w2k.html\", 
\"--footer-center\", \"Issued/Issué: 2019-01-10 11:02\\nPage: [page]/[topage]\", 
\"--footer-font-size\", \"8\", 
\"file:////var/folders/cl/7g4qxvd15jq_fcsvbzwgzxtm0000gn/T/wicked_pdf20190110-8066-ef1jne.html\", \"/var/folders/cl/7g4qxvd15jq_fcsvbzwgzxtm0000gn/T/wicked_pdf_generated_file20190110-8066-umsc0g.pdf\"]***************"

but the resulting PDF continues to be missing the "é" in "Issué" when it appears in the rendered footer.

Red Hat Enterprise Linux Server release 7.3 (Maipo)
[apps@vapp02t lts]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

[apps@vapp02t lts]$ bundle exec which wkhtmltopdf
/apps/webapps/lts/vendor/bundle/ruby/2.2.0/bin/wkhtmltopdf

[apps@vapp02t lts]$ bundle exec /apps/webapps/lts/vendor/bundle/ruby/2.2.0/bin/wkhtmltopdf --version
wkhtmltopdf 0.12.3 (with patched qt)

Any guidance would be appreciated!


PhilterPaper · 2019-01-12T03:55:05Z

First, are you absolutely certain that the file you showed (with text containing "é") is UTF-8 and not Latin-1? It's very easy for an editor to slip into Latin-1 etc. mode without you realizing it. Second, if this is being processed by the shell command line, are you absolutely certain that it passes in UTF-8 characters uncorrupted? Often, command shells are single byte (Latin-1, etc.) and may have done something nasty to the text you're passing in.

enwood · 2019-01-15T13:08:04Z

Hi Phil; Thanks for your suggestions. Yes, I'm pretty certain everything is passed as UTF-8.

The source file containing the "é" character is a ruby file, annotated with "# encoding: UTF-8", and edited with the Sublime Text editor. The character is passed to the wicked_pdf gem in a call such as this:

footer_text = "Issued/Issué: " + Time.current.strftime("%Y-%m-%d %H:%M")
cover_pdf = WickedPdf.new.pdf_from_string(s,
          :encoding => 'UTF-8',
          :footer => { :center => footer_text })

When I check the encoding of footer_text, it reports as UTF-8:

Rendered layouts/certificate-header-blank.html (0.2ms)
TitlesController: footer_text encoding is: UTF-8

The wicked_pdf gem then generates the command line calling wkhtmltopdf:

"***************[\"/apps/webapps/lts/vendor/bundle/ruby/2.2.0/bin/wkhtmltopdf\", \"-q\", 
\"--encoding\", \"UTF-8\", 
\"--footer-center\", \"Issued/Issué: 2019-01-10 11:02\", 
\"file:////var/folders...."
]***************"

and the bash shell in which wkhtmltopdf runs is operating in UTF-8:

[apps@vapp02t lts]$ echo $LANG
en_US.UTF-8

So, I'm not sure what else I can control?

Tim

enwood · 2019-01-15T13:16:58Z

And, even if I explicitly reference "é" as "\u00E9", it's still being dropped by wkhtmltopdf 0.12.3.

PhilterPaper · 2019-01-15T14:12:41Z

If an accented character is vanishing, my best guess is that it is reaching the engine as an invalid character (corrupted somewhere along the line), or even just dropped. If it definitely started out as a UTF-8 (two byte) character, most likely it was in the shell/command-line handling that something bad happened. Per one of the referenced older issues, have you checked the "locale" settings all the way down the line, to make sure you're properly handling UTF-8 and not treating it as Latin-1 or even ASCII? You might want to either insert some printout code in wkHTMLtoPDF or make up a little dummy program to see if the UTF-8 character is getting to wkHTMLtoPDF, or it's being dropped or corrupted somewhere earlier. Beyond that, I'm out of ideas.
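The "little dummy program" idea above could be sketched roughly as follows. This is only an illustration, not part of wkhtmltopdf or wicked_pdf: the script name `argv_probe.rb` and the helper `describe_arg` are hypothetical. The assumption is that the script is temporarily invoked in place of the real binary, so it receives exactly the arguments the gem passes and can show whether the "é" arrives as the UTF-8 bytes `c3 a9`.

```ruby
#!/usr/bin/env ruby
# argv_probe.rb (hypothetical name): a stand-in for the real binary that
# reports the raw bytes of every argument it receives.

# Returns the hex bytes of one argument and whether they form valid UTF-8.
def describe_arg(arg)
  bytes = arg.bytes.map { |b| format("%02x", b) }.join(" ")
  valid = arg.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  { bytes: bytes, valid_utf8: valid }
end

if __FILE__ == $PROGRAM_NAME
  ARGV.each_with_index do |arg, i|
    info = describe_arg(arg)
    # Write to stderr so the report does not mix with any expected stdout.
    warn format("argv[%d] valid_utf8=%s bytes=%s", i, info[:valid_utf8], info[:bytes])
  end
end
```

If the footer argument shows `c3 a9` for the "é", the text reached the process intact and the loss happens inside the receiving program. A lone `e9`, a `3f` ("?"), or a missing byte would point at something earlier in the chain.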

enwood · 2019-01-15T14:44:03Z

To complicate matters, I'm finding that the accented character is NOT dropped when wkHTMLtoPDF generates the PDF in my development environment (Mac OS/Ruby 2.2.5). But in production, when running under RHEL 7.3/Ruby 2.2.5, the character is dropped, even with the locale of the shell clearly operating in en_US.UTF-8.

Yes, I think I'm going to have to dig into the receiving end of wkHTMLtoPDF as you suggested, Phil.

If I uncover any reasons for the character being dropped or misinterpreted, I'll post back here.
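One thing that might be worth ruling out, given the difference between the two environments: the locale seen in an interactive shell is not necessarily the locale of the process that actually launches wkhtmltopdf (an app server started by an init script, for example). That is an assumption about this setup, not something confirmed in this thread. A minimal sketch for logging what the Ruby process itself sees, where the helper name `locale_report` is hypothetical:

```ruby
# Collects the locale-related settings visible to the current Ruby process.
# Pass a plain Hash instead of ENV to inspect a different environment.
def locale_report(env = ENV)
  {
    lang: env["LANG"],
    lc_all: env["LC_ALL"],
    lc_ctype: env["LC_CTYPE"],
    # Encoding Ruby assumes for external data such as files and pipes.
    default_external: Encoding.default_external.name,
    # Character map Ruby derived from the process locale at startup.
    locale_charmap: Encoding.locale_charmap
  }
end
```

Calling `puts locale_report.inspect` from inside the running application (rather than from a login shell) would show whether `LANG` is really `en_US.UTF-8` there, or unset or `C`.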

ksassnowski · 2019-05-24T04:41:47Z

I had the exact same thing happen to me using footer-text. All German umlauts would disappear from the generated PDF.

However, what did work for me, was using footer-html with <meta charset="utf-8"> instead of footer-text. @enwood Might be worth a shot? Doesn't really fix the original error but might be a workaround.
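A rough sketch of that workaround, assuming wkhtmltopdf is called with its `--footer-html` option and a footer file written by the application. The helpers `footer_html` and `write_footer_file` are hypothetical, as are the `input.html` and `output.pdf` names. The styling only mimics the 8pt centered footer from the original report.

```ruby
require "cgi"
require "tempfile"

# Builds a small standalone HTML document that declares its own charset,
# so the footer text does not depend on how command-line text is decoded.
def footer_html(text)
  [
    "<!DOCTYPE html>",
    "<html>",
    "<head><meta charset=\"utf-8\"></head>",
    "<body style=\"font-size: 8pt; text-align: center;\">#{CGI.escapeHTML(text)}</body>",
    "</html>"
  ].join("\n")
end

# Writes the footer to a temp file as UTF-8 and returns the Tempfile.
# Keep a reference to it until the PDF has been generated.
def write_footer_file(text)
  file = Tempfile.new(["footer", ".html"], encoding: "UTF-8")
  file.write(footer_html(text))
  file.flush
  file
end

footer = write_footer_file("Issued/Issué: 2019-01-10 11:02")
# Argument list only; input.html and output.pdf are placeholder names.
command = ["wkhtmltopdf", "--footer-html", footer.path, "input.html", "output.pdf"]
```

With this approach only a file path, which is plain ASCII here, travels on the command line. The accented text lives in a file whose encoding is stated explicitly.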

enwood · 2019-05-24T14:35:25Z

Much obliged, Kai! Thanks for the tip. I will give that a shot and report back. Tim


x1nas · 2019-07-23T12:36:49Z

I had the same problem with Cyrillic symbols.
Try dpkg-reconfigure locales and add the necessary locale (ru_RU.UTF-8 in my case).
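Before reconfiguring anything, it may help to check whether the locale is actually installed. `locale -a` lists the installed locales. The sketch below parses its output, and the helper names `normalize_locale` and `locale_available?` are hypothetical. The normalization is needed because the list may spell a name as `en_US.utf8` while `LANG` says `en_US.UTF-8`.

```ruby
# Normalizes spellings such as "en_US.UTF-8" and "en_US.utf8" to one form.
def normalize_locale(name)
  name.strip.downcase.delete("-")
end

# True if `wanted` appears in the newline-separated output of `locale -a`.
def locale_available?(locale_list, wanted)
  locale_list.each_line.map { |line| normalize_locale(line) }
             .include?(normalize_locale(wanted))
end

# Example usage on the affected machine (runs the real `locale -a`):
#   puts locale_available?(`locale -a`, ENV.fetch("LANG", "C"))
```

A `false` result would mean the environment names a locale that the system cannot actually load, which would be consistent with the fix described above.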

sbj42 · 2022-10-11T17:18:24Z
