Unicode in Href not working #3900

Closed
ScouterX opened this Issue Apr 30, 2018 · 8 comments

@ScouterX

ScouterX commented Apr 30, 2018

HTML that contains Asian Unicode characters inside an href does not produce proper links in the PDF.
After inspecting the PDF with a hex editor, I got lucky: the URLs are stored in clear text, and the Asian characters in them are broken.
I use Arial Unicode MS as the font.
I hope I have analyzed the PDFs correctly. By comparison, Asian characters elsewhere in the document are rendered correctly.
The zip file contains an HTML file and the resulting PDF.
The HTML works flawlessly in current browsers, although it is not HTML5. Converting the document to HTML5 doesn't improve anything for wkhtmltopdf. I'm sure other character sets have the same problem.

Bug2.zip
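
For anyone who wants to repeat the hex-editor check without a hex editor, here is a minimal Python sketch. It assumes the link targets are stored as uncompressed `/URI (...)` strings in the PDF, which matches the clear-text URLs observed above but does not hold for every PDF producer. The function name `find_link_uris` and the sample bytes are made up for illustration.

```python
import re
from typing import List

# Matches "/URI (...)" literal strings, allowing backslash-escaped characters
# inside the parentheses. Assumes the annotation is not in a compressed stream.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)")


def find_link_uris(pdf_bytes: bytes) -> List[bytes]:
    """Return the raw byte strings of all /URI link targets found in the data."""
    return URI_PATTERN.findall(pdf_bytes)


# Hypothetical fragment of a link annotation with a broken (lossy) URL.
sample = b"<< /Type /Annot /Subtype /Link /A << /S /URI /URI (http://example.com/???) >> >>"
print(find_link_uris(sample))
```

For a real file, something like `find_link_uris(open("output.pdf", "rb").read())` would list the stored targets; runs of `?` where the non-ASCII characters should be point to the problem described here.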

@ashkulz

Member

ashkulz commented Apr 30, 2018

Which version did you try this with? Was it a distribution package or downloaded from the website?

@ScouterX

ScouterX commented Apr 30, 2018

Version 0.12.4.0 for Windows installed via wkhtmltox-0.12.4_msvc2015-win64.exe.

@ashkulz ashkulz added the Verified label May 1, 2018

@ashkulz ashkulz added this to the 0.12.5 milestone May 1, 2018

@ashkulz

Member

ashkulz commented May 1, 2018

Okay, I am able to confirm this. It looks like the link is converted to Latin1 instead of being encoded with toEncoded, so the non-ASCII characters naturally get converted to ?.

Thanks for the test case and description, this should get fixed soon.
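
The difference between the two conversions can be illustrated outside of Qt. The following is a rough Python analogy, not the wkhtmltopdf code: a lossy Latin-1 conversion stands in for the buggy path, and `urllib.parse.quote` stands in for percent-encoding in the spirit of `toEncoded`. The example URL is made up.

```python
from urllib.parse import quote

url = "http://example.com/日本語"

# Lossy path: characters that do not exist in Latin-1 are replaced with "?".
lossy = url.encode("latin-1", errors="replace").decode("latin-1")

# Lossless path: non-ASCII characters become percent-encoded UTF-8 bytes,
# while URL delimiters listed in `safe` are left untouched.
encoded = quote(url, safe=":/?#[]@!$&'()*+,;=%")

print(lossy)    # http://example.com/???
print(encoded)  # http://example.com/%E6%97%A5%E6%9C%AC%E8%AA%9E
```

The percent-encoded form is plain ASCII, so it survives being written into the PDF as a byte string, whereas the Latin-1 form has already lost the original characters.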

@ScouterX

ScouterX commented May 2, 2018

Thank you.

@ScouterX ScouterX closed this May 2, 2018

@ashkulz

Member

ashkulz commented May 2, 2018

Keeping it open till it gets fixed.

@ashkulz ashkulz reopened this May 2, 2018

ashkulz added a commit that referenced this issue May 7, 2018

@ashkulz

Member

ashkulz commented May 9, 2018

@ScouterX: can you try the build and confirm that it works?

@ashkulz ashkulz closed this in a23cca3 May 9, 2018

@ScouterX

ScouterX commented May 9, 2018

In my opinion, you have fixed the culprit. A Japanese font example (attached) was converted to a PDF with working links (in some PDF viewers). I can finally contact the various developers out there to fix the Unicode implementation in their existing programs.
Side note: the fix also helped with Latin fonts in other PDF viewers.
Major improvement.
PDF created with PowerShell 2.0:
.\wkhtmltopdf -L 0 -R 0 --encoding utf-8 --page-width 450 --page-height 2000 -B 15mm "input.html" output.pdf
Seems to be OK.zip

@ashkulz

Member

ashkulz commented May 30, 2018

A release candidate is available which includes the fixes made in wkhtmltopdf/qt@6198829 -- please test and report your findings before the final release.

@ashkulz ashkulz added Fixed and removed Verified labels May 30, 2018
