This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

UTF-8 issues #3108

Closed
xokaido opened this issue Sep 5, 2016 · 10 comments

Comments

@xokaido

xokaido commented Sep 5, 2016

Hello...
I've got wkhtmltopdf 0.12.3 running on CentOS 7.
It converts HTML to PDF, but UTF-8 characters are either missing (empty fields) or displayed as squares.

Thanks

@xokaido
Author

xokaido commented Sep 5, 2016

I forgot to mention that I am running it from the command line. I tried the --encoding option, but it didn't help.

Thanks

@PhilterPaper

Please supply a small HTML file that exhibits this problem, and the complete command line. Are you using unusual fonts (such as ones downloaded with @font-face)? Does it work OK with just the 'serif' and 'sans-serif' font families, or with the default fonts? Are unaccented Latin characters (ASCII) coming out OK, but anything accented is not? Does your HTML display OK in browsers? I've seen accented text actually entered (edited) in Latin-1 rather than UTF-8, and thus fail to convert.
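
One quick way to rule out the Latin-1 case is to check whether the file's bytes are valid UTF-8 before handing it to wkhtmltopdf. A minimal sketch, assuming `iconv` is available on the machine; the helper name `is_utf8` and the file name `page.html` are made up for illustration and do not come from this thread:

```shell
#!/bin/sh
# is_utf8 FILE: succeed only if FILE decodes cleanly as UTF-8.
# (Hypothetical helper name. iconv exits non-zero when it meets a byte
# sequence that is not valid in the source encoding.)
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example use (page.html is a placeholder file name):
#   is_utf8 page.html && echo "valid UTF-8" || echo "not UTF-8 (maybe Latin-1?)"
```

A pass only says the bytes are well-formed UTF-8 (plain ASCII passes too); it says nothing about whether a font with the needed glyphs is installed.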

@xokaido
Author

xokaido commented Sep 5, 2016

Hello...
Thanks for your reply...
This is the page I am trying to render: http://xoks.net/
wkhtmltopdf doesn't output the first line and displays the second line with an error. Everything else seems to be working.
This is the command line I am trying to run:
wkhtmltopdf --encoding utf-8 http://xoks.net/ xoksnet.pdf

Thank you one more time.

@PhilterPaper

I see 4 different alphabets (scripts) on this page. The first, which you say doesn't show up at all, I don't recognize. The following three are Greek, Cyrillic, and Latin alphabets. Although they show up in my browser, I would not be the least surprised if the first alphabet (what is it?) is rare enough that the UTF-8 font(s) used by the first line don't include those characters. I don't think PDF and browsers necessarily use the same font files. You are going to have to find what fonts you're using here (apparently the default ones), what alphabet is used in the first line, and ask someone familiar with this whether those characters should be expected to show up. If not, you may have to embed the font information in the PDF (watch out for copyright issues!), if you find one that works on your PC.

The Greek line might include archaic character(s) that most PDF fonts don't include. Is it a missing glyph or box, or some more serious error reported? It could be useful for you to attach the PDF, so someone can possibly see what is going on.

@xokaido
Author

xokaido commented Sep 6, 2016

The first line is in Georgian, and the second one is ordinary Greek text. Cyrillic letters seem to be working. How do we include custom fonts in the PDF, and how do we tell wkhtmltopdf to look for fonts in a different directory (or what is the default directory it looks in)?

Thank you very much for the help.

@xokaido
Author

xokaido commented Sep 6, 2016

It's not an issue, sorry. I had to install additional fonts on the system.
The problem was that my CentOS 7 system didn't include the correct fonts for these non-Latin characters.

The issue is closed, thanks for the help!
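
For anyone landing here with the same symptom, fontconfig can report whether any installed font claims coverage for a given language. A rough sketch, assuming the fontconfig command-line tools (`fc-list`) are installed; `fonts_for_lang` is a made-up helper name, and `ka` is the language code for Georgian:

```shell
#!/bin/sh
# fonts_for_lang CODE: print the family names of installed fonts that
# declare support for the given language code (e.g. ka, el, he).
# Empty output means fontconfig knows of no font for that language.
# (Hypothetical helper name, not part of fontconfig.)
fonts_for_lang() {
    fc-list ":lang=$1" family | sort -u
}

# Example use:
#   fonts_for_lang ka
```

If the list comes back empty for a script that renders as boxes or blanks, a missing font is the likely cause, which matches what was found here.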

xokaido closed this as completed Sep 6, 2016

@ohadperry

ohadperry commented Jul 17, 2017

@xokaido what did you install exactly?
I'm having the same issue with Hebrew fonts.
[Screenshot attached]

Tried:

  1. <head>
         <meta name="pdfkit-page-size" content="Legal"/>
         <link rel="stylesheet" href="/static/webapp/plugins/manual/pdf.css">
         <meta charset="UTF-8">
     </head>
  2. sudo yum install curl cabextract xorg-x11-font-utils fontconfig
  3. sudo yum install liberation-sans-fonts
  4. https://gist.github.com/drakakisgeo/7591660
     1. sudo yum install dejavu-lgc-sans-fonts

None of these worked.

The output of locale:

[my_user@ip-172-31-34-70 ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

@ohadperry

I've figured it out; the solution is here:
https://stackoverflow.com/a/45145350/1574104
The answer is that the remote server didn't have the right fonts.

I also solved this by copying Arial.ttf from /Library/Fonts on my local Mac to /usr/share/fonts/local on the remote server (I created the local directory myself):

scp -i "$STAGING_CERT_PATH" Arial.ttf root@"$STAGING_IP":/usr/share/fonts/local/

Then I ran fc-cache -v to update the font cache, and it worked.
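
Once the font file is on the server, the copy-and-refresh steps above can be wrapped in a small helper. This is a sketch only: `install_font` is a hypothetical name, and it assumes write access to the target directory (usually root for /usr/share/fonts/local) and that fontconfig's `fc-cache` is installed:

```shell
#!/bin/sh
# install_font FILE DIR: copy a font file into DIR and refresh the
# fontconfig cache for that directory.
# (Hypothetical helper name, not part of fontconfig or wkhtmltopdf.)
install_font() {
    font_file=$1
    font_dir=$2
    [ -f "$font_file" ] || { echo "no such font file: $font_file" >&2; return 1; }
    mkdir -p "$font_dir" || return 1
    cp "$font_file" "$font_dir/" || return 1
    # -f forces a rescan of the directory, -v prints what was scanned
    fc-cache -f -v "$font_dir"
}

# Example use (run as root on the server):
#   install_font Arial.ttf /usr/share/fonts/local
```

Passing the directory to `fc-cache` limits the rescan to the newly added fonts rather than every configured font directory.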

@mboullouz

@ohadperry
Thanks for the hint. On CentOS 6, I copied the fonts from http://thelinuxbox.org/downloads/fonts/msttcorefonts.tar.gz to /usr/share/font/truetype, which solved the same issue.

@ShemerKuznits

@ohadperry Thanks! This solved my issue as well.
