XHTML displayed in Chrome:
After converting to PDF:
XHTML contents to reproduce: Arabic issue.zip
I'm using the standalone version of wkhtmltopdf.
Command: wkhtmltopdf name_of_input_file.xhtml name_of_output_file.pdf
Users of other scripts (Sinhalese, for example) have reported problems with composite characters being output separately. See #2764; it might be the same issue, or a similar one.
@PhilterPaper You're right; it seems to be a very similar (if not the same) issue.
It's worth mentioning that when I use non-standard fonts in the XHTML (e.g. Harmattan), the problem disappears and the characters are rendered correctly.
With some scripts, certain characters are defined in Unicode as "combining" marks, which are drawn over another character without otherwise changing either character. With others, it appears that two (or more) characters are supposed to merge into an entirely new single character (presumably with its own Unicode code point). Thinking about it some more, I suspect that your Arabic characters fall into the first category and the Sinhalese ones into the second. The two cases would very likely be handled by quite different code, and so be quite separate problems. In addition, an Arabic letter takes a different form depending on where in the word it appears, which might have some bearing on the problem. It's quite possible that WebKit (or maybe wkhtmltopdf) just wasn't written with such support in mind, only simple Latin script with perhaps some overprinted diacritics?
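To make the distinction above concrete, here is a small sketch using Python's standard `unicodedata` module. The sample characters (ARABIC LETTER BEH and ARABIC FATHA) are chosen purely for illustration and are not taken from the attached XHTML. It shows two things: a combining mark is a separate code point with a non-zero combining class, and Arabic positional forms exist in Unicode only as legacy "presentation form" characters that normalize back to a single base letter. Picking the right positional form at render time is the text shaper's job. This sketch does not reproduce the wkhtmltopdf bug itself; it only illustrates the Unicode concepts being discussed.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, int]]:
    """Return (code point, Unicode name, canonical combining class) per character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"), unicodedata.combining(c))
        for c in text
    ]


# Case 1: a base letter followed by a combining mark.
# Two code points that should be drawn as one cluster; the mark has a
# non-zero canonical combining class.
beh_fatha = "\u0628\u064E"  # ARABIC LETTER BEH + ARABIC FATHA
for row in describe(beh_fatha):
    print(row)

# Case 2: positional (contextual) forms of the same letter.
# Unicode keeps legacy presentation-form characters for each position;
# NFKC normalization maps every one of them back to the base letter.
# Normal text stores only the base letter, and the shaping engine is
# expected to select the isolated/initial/medial/final glyph.
forms = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}
for position, glyph in forms.items():
    base = unicodedata.normalize("NFKC", glyph)
    print(f"{position:8} {unicodedata.name(glyph)} -> {unicodedata.name(base)}")
```

If a PDF shows Arabic letters only in their isolated shapes, that is consistent with the second case (no contextual shaping applied) rather than with a combining-mark problem, though only testing the renderer itself can confirm which applies here.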