This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

UTF-8 characters passed as value to external header/footer html files not showing correctly. #2427

Open
newpen opened this issue Jun 24, 2015 · 13 comments

Comments

@newpen

newpen commented Jun 24, 2015

I am using --replace to pass some values (Chinese characters) to my external footer.html (loaded with --footer-html), but the Chinese characters are not displayed correctly. Chinese characters that are hard-coded in footer.html itself (not passed as a variable) print with no problem, and printing Chinese characters with an option such as --header-left also works fine.

My footer.html already has <meta charset="utf-8">, and since characters that are in the file itself print correctly, I don't think the encoding of the file is the problem. Any ideas? Thanks!

@ashkulz
Member

ashkulz commented Jun 24, 2015

Are you on Windows?

@ashkulz
Member

ashkulz commented Jun 24, 2015

Also, without a minimal, reproducible test case as requested in the support page this issue cannot be investigated further.

@newpen
Author

newpen commented Jun 24, 2015

It's Ubuntu 14.04.1 LTS. I'm using phpwkhtmltopdf on apache server. The generated command is like this:

wkhtmltopdf --encoding 'UTF-8' --header-left '中文字' --header-right '[page]/[toPage]' --header-spacing '15' --enable-toc-back-links --margin-top '3cm' --margin-right '0cm' --margin-bottom '2cm' --margin-left '0' --footer-html 'footer.html' --replace 'mytitle' '中文字' cover 'www.mysite.com/cover.php?id=1' toc --disable-dotted-lines 'www.mysite.com/content.php?id=1' '/tmp/tmp_wkhtmlto_pdf_OIK8lh.pdf'

And in footer.html, I have this JS:

<script charset="utf-8">
  function subst() {
    var vars={};
    var x=window.location.search.substring(1).split('&');
    for (var i in x) {var z=x[i].split('=',2);vars[z[0]] = unescape(z[1]);}
    var x=['frompage','topage','page','webpage','section','subsection','subsubsection','bookurl','mytitle'];
    for (var i in x) {
      var y = document.getElementsByClassName(x[i]);
      for (var j=0; j<y.length; ++j) y[j].textContent = vars[x[i]];
    }
  }
</script>

and also an element with the class mytitle to print it out.

The '中文字' in header-left is fine, but the '中文字' in footer.html turns into unreadable characters. If I place '中文字' directly in footer.html, it works fine.

@newpen
Author

newpen commented Jul 2, 2015

I have tried a different JavaScript function (http://stackoverflow.com/questions/12049620/how-to-get-get-variables-value-in-javascript)

  function subst() {
    var vars={};
    var query = document.location
                   .toString()
                   // get the query string
                   .replace(/^.*?\?/, '')
                   // and remove any existing hash string (thanks, @vrijdenker)
                   .replace(/#.*$/, '')
                   .split('&');

    for(var i=0, l=query.length; i<l; i++) {
       var aux = decodeURIComponent(query[i]).split('=');
       vars[aux[0]] = aux[1];
    }
    var x=['frompage','topage','page','webpage','section','subsection','subsubsection'];
    for (var i in x) {
      var y = document.getElementsByClassName(x[i]);
      for (var j=0; j<y.length; ++j) y[j].textContent = vars[x[i]];
    }
  }

Now, if I open footer.html?webpage=中文字 directly, the UTF-8 characters are displayed correctly, but they are still wrong when the file is loaded by wkhtmltopdf. I have already set --encoding 'UTF-8', as can be seen in the command above. So I think something probably goes wrong with the encoding when wkhtmltopdf calls an external footer/header.
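
One detail of the function above that may matter independently of the wkhtmltopdf problem: it calls decodeURIComponent() on the whole key=value pair before splitting on '=', so a value containing an encoded '=' would be cut short. A minimal sketch of a parser that splits first and decodes afterwards, assuming the query string is well-formed percent-encoded UTF-8 (the name parseQuery is illustrative and not part of wkhtmltopdf):

```javascript
// Hypothetical helper: parse a query string such as "?a=1&b=2" into an object.
function parseQuery(search) {
  var vars = {};
  // Drop the leading "?" and any trailing "#hash".
  var query = search.replace(/^\?/, '').replace(/#.*$/, '');
  if (query === '') return vars;
  var pairs = query.split('&');
  for (var i = 0; i < pairs.length; i++) {
    // Split on the first "=" only, while the pair is still encoded.
    var eq = pairs[i].indexOf('=');
    var rawKey = eq === -1 ? pairs[i] : pairs[i].slice(0, eq);
    var rawValue = eq === -1 ? '' : pairs[i].slice(eq + 1);
    // Decode key and value separately, interpreting %XX as UTF-8.
    vars[decodeURIComponent(rawKey)] = decodeURIComponent(rawValue);
  }
  return vars;
}
```

In a footer this would be called as parseQuery(window.location.search). It does not by itself repair values that arrive already garbled.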

@ddeath

ddeath commented Jul 12, 2015

Same here. If a section title contains non-ASCII characters, they are not shown correctly.

Prehľad rozloženia => Prehľad rozloženia

footer.html is similar to what @newpen described.
OS: Debian 8
wkhtml version: 0.12.2.1 (with patched qt)

@ddeath

ddeath commented Jul 15, 2015

@ashkulz Hey guys, my colleague found the catch.
Your documentation uses the JS function unescape(), which is deprecated and does not handle UTF-8 characters well. If you use decodeURIComponent() instead, everything works as expected. This should be fixed in the documentation.

@newpen
Check whether you have the following code in your footer.html:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
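
The difference between unescape() and decodeURIComponent() mentioned above can be seen on a single character. A minimal sketch, using the percent-encoded UTF-8 bytes of '中' as the sample input:

```javascript
// '%E4%B8%AD' is the character '中' encoded as UTF-8 and then percent-encoded.
var encoded = '%E4%B8%AD';

// unescape() turns each %XX into one character with that code point,
// so the three bytes become three separate Latin-1 characters.
var viaUnescape = unescape(encoded);

// decodeURIComponent() reads the %XX sequence as UTF-8 and yields one character.
var viaDecode = decodeURIComponent(encoded);

console.log(viaUnescape.length); // 3
console.log(viaDecode);          // 中
```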

@newpen
Author

newpen commented Jul 15, 2015

@ddeath thanks for the hint. I have changed it to decodeURIComponent() and the meta line is also included in footer.html, but it doesn't work for me.
If I go to the URL footer.html?webpage=中文字, the "中文字" is shown correctly, so the encoding of the page may not be the issue. The problem only exists when the UTF-8 characters are printed in footer.html by wkhtmltopdf.

My simplified command is like:
wkhtmltopdf --encoding 'UTF-8' --header-left '中文字' --margin-top '3cm' --margin-bottom '2cm' --footer-html 'footer.html' --replace 'booktitle' '中文字' toc --disable-dotted-lines 'example.com/mypage?id=1' '0715.pdf'
The header-left text is shown correctly, but the footer is not.

@jobe451

jobe451 commented Sep 30, 2015

I worked around this issue by adding an ugly JavaScript hack to header.html.

I am running wkhtmltopdf on CentOS 7. The header.html is stored as UTF-8 with BOM.

      function subst() {
        var vars={};
        var x=window.location.search.substring(1).split('&');
        for (var i in x) {var z=x[i].split('=',2);vars[z[0]] = decodeUTF8(unescape(z[1]));}
        var x=['frompage','topage','page','webpage','section','subsection','subsubsection','title'];
        for (var i in x) {
          var y = document.getElementsByClassName(x[i]);
          for (var j=0; j<y.length; ++j) y[j].textContent = vars[x[i]];
        }
      }

      // This is an ugly hack for this bug:
      // https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2427
      function decodeUTF8(text) {

        var i=0;
        var replacement = [
          {'dec': 'Á', 'enc': 'Ã?'},
          {'dec': 'Â', 'enc': 'Â'},
          {'dec': 'Ä', 'enc': 'Ä'},
          {'dec': 'É', 'enc': 'É'},
          {'dec': 'Ó', 'enc': 'Ó'},
          {'dec': 'Ô', 'enc': 'Ô'},
          {'dec': 'Ö', 'enc': 'Ö'},
          {'dec': 'Ú', 'enc': 'Ú'},
          {'dec': 'Ü', 'enc': 'Ãœ'},
          {'dec': 'ß', 'enc': 'ß'},
          {'dec': 'á', 'enc': 'á'},
          {'dec': 'â', 'enc': 'â'},
          {'dec': 'ä', 'enc': 'ä'},
          {'dec': 'ç', 'enc': 'ç'},
          {'dec': 'é', 'enc': 'é'},
          {'dec': 'ë', 'enc': 'ë'},
          {'dec': 'î', 'enc': 'î'},
          {'dec': 'ó', 'enc': 'ó'},
          {'dec': 'ô', 'enc': 'ô'},
          {'dec': 'ö', 'enc': 'ö'},
          {'dec': 'ú', 'enc': 'ú'},
          {'dec': 'ü', 'enc': 'ü'},
          {'dec': 'è', 'enc': 'Ä?'},
          {'dec': 'Ê', 'enc': 'Ę'},
          {'dec': 'ê', 'enc': 'Ä™'},
          {'dec': 'Ì', 'enc': 'Äš'},
          {'dec': 'ì', 'enc': 'Ä›'},
          {'dec': 'Ò', 'enc': 'Ň'},
          {'dec': 'ò', 'enc': 'ň'},
          {'dec': 'À', 'enc': 'Å”'},
          {'dec': 'à', 'enc': 'Å•'},
          {'dec': 'Ù', 'enc': 'Å®'},
          {'dec': 'ù', 'enc': 'ů'},
          {'dec': 'Û', 'enc': 'Å°'}
        ];

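        for (; i < replacement.length; i++) {
          // split/join replaces every occurrence; replace() with a string pattern only replaces the first one
          text = text.split(replacement[i].enc).join(replacement[i].dec);
        }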

        return text;
      }

@leew

leew commented Mar 2, 2017

Hi I am having the same issue.

Characters passed in the --replace argument are encoded differently from the other query params for the --header-html.

I am using wkhtmltopdf 0.12.1 (with patched qt) on Ubuntu 16.04:
./wkhtmltopdf --replace "personname" "á" --encoding="UTF-8" --header-html header.html

Query string obtained from the header html:
?page=1&section=Verbeterde%20Rapport&sitepage=1&title=&subsection=%C3%81&frompage=1&subsubsection=&personname=%C3%83%C2%A1&topage=12&doctitle=&sitepages=12&webpage=-&time=18:00&date=01/03/2017

I tried using --title "á" instead of --replace, but in this case the character is dropped as mentioned by other users
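
The personname value in the query string above (%C3%83%C2%A1) looks like UTF-8 that was encoded twice, while subsection (%C3%81) is encoded once. A minimal sketch that reproduces the observed string from 'á', assuming the UTF-8 bytes were misread as Latin-1 characters somewhere along the way (where exactly that happens inside wkhtmltopdf is not established in this thread):

```javascript
var original = 'á';

// Step 1: correct UTF-8 percent-encoding.
var utf8Once = encodeURIComponent(original);   // '%C3%A1'

// Step 2: each byte is misread as one Latin-1 character.
var misread = unescape(utf8Once);              // 'Ã¡' (U+00C3, U+00A1)

// Step 3: the misread text is UTF-8 encoded again.
var utf8Twice = encodeURIComponent(misread);   // '%C3%83%C2%A1'

console.log(utf8Twice === '%C3%83%C2%A1');     // true: matches personname above
console.log(decodeURIComponent('%C3%81'));     // Á: subsection decodes correctly
```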

@leew

leew commented Feb 6, 2018

I have recently found a potential workaround: I URI-encode my UTF-8 characters before passing them to wkhtmltopdf, and then in my header file I decode the query string twice. This seems to get past the issue, but I am still testing it. I will update with a code example once I have confirmed.

Has anyone else tried this approach?
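
The approach described above could look roughly like the following. This is only a sketch of the idea, not the code from the comment: the function names are hypothetical, and the step where wkhtmltopdf percent-encodes the argument again is an assumption that is simulated here with a second encodeURIComponent() call.

```javascript
// Caller side (hypothetical): make the value pure ASCII before passing it to --replace.
function encodeForReplace(value) {
  return encodeURIComponent(value);
}

// ASSUMPTION: wkhtmltopdf percent-encodes the argument once more when it builds
// the header/footer query string. Simulated here; not an actual wkhtmltopdf API.
function simulateQueryValue(arg) {
  return encodeURIComponent(arg);
}

// Header side: decode twice to undo both layers of encoding.
function decodeTwice(raw) {
  return decodeURIComponent(decodeURIComponent(raw));
}

var arg = encodeForReplace('中文字');       // ASCII only, safe from byte misreading
var inQuery = simulateQueryValue(arg);     // every '%' becomes '%25'
console.log(decodeTwice(inQuery));         // 中文字
```

Because the argument contains only ASCII after the first encoding, a Latin-1 versus UTF-8 mix-up cannot alter it. Note that decodeTwice() may throw a URIError on values that were not pre-encoded and contain a literal '%'.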

@Tomsgu
Contributor

Tomsgu commented Aug 22, 2018

Is this still an issue in 0.12.5?

@irbian

irbian commented Feb 15, 2019

As a workaround, I decode the params that come from --replace with:

function decode_utf8(string) {
  // escape() maps each Latin-1 character back to %XX, which decodeURIComponent() then reads as UTF-8
  return decodeURIComponent(escape(string));
}
decode_utf8(decodeURIComponent(param));

I am affected by this problem on 0.12.3; I don't know about 0.12.5.
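
One caveat with the decodeURIComponent(escape(...)) trick is that it throws a URIError when the input was already decoded correctly, which matters if the same function is applied to both --replace values and the built-in params. A minimal sketch of a tolerant variant (the name fixMojibake is illustrative), which falls back to the input when the repair does not apply:

```javascript
// Hypothetical helper: repair UTF-8 text that was misread as Latin-1.
function fixMojibake(text) {
  try {
    // escape() turns characters up to U+00FF into %XX; decodeURIComponent()
    // then reinterprets those bytes as UTF-8.
    return decodeURIComponent(escape(text));
  } catch (e) {
    // Not valid UTF-8 once re-encoded (for example already-correct 'á' or '中'):
    // assume the text was fine and return it unchanged.
    return text;
  }
}

console.log(fixMojibake('\u00c3\u00a1')); // á
console.log(fixMojibake('中文字'));        // 中文字 (unchanged)
```

A correctly decoded string that happens to look like valid UTF-8 bytes (a literal 'Ã¡', for instance) would still be converted, so this is a heuristic rather than a guarantee.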

@sidnas

sidnas commented Aug 30, 2019

I had a similar problem and found this solution:

I pass a base64-encoded string as the --replace parameter, and then in footer.html I decode it with this JavaScript function:

function b64DecodeUnicode(str) {
     return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
         return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
     }).join(''))
}

Hope this helps someone.
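
For completeness, the encoding side of the base64 approach might look like this. It is a sketch under the assumption that the code invoking wkhtmltopdf runs on Node.js (Buffer is a Node global and is not available inside the footer page); toBase64Utf8 is a hypothetical name, and b64DecodeUnicode is repeated from the comment above so the round trip can be checked.

```javascript
// Caller side (assumes Node.js): UTF-8 encode the text, then base64 encode the bytes.
function toBase64Utf8(str) {
  return Buffer.from(str, 'utf8').toString('base64');
}

// Footer side: same logic as the function in the comment above.
function b64DecodeUnicode(str) {
  return decodeURIComponent(Array.prototype.map.call(atob(str), function (c) {
    // Turn each decoded byte into %XX so decodeURIComponent() reads it as UTF-8.
    return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
  }).join(''));
}

console.log(toBase64Utf8('中文字')); // 5Lit5paH5a2X
```

The resulting string would then be passed as the value of --replace. Base64 output can contain '+', '/' and '=', and how those characters survive in the header/footer query string is not covered in this thread, so that part should be tested.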
