Suggestion: Replace non-standard Unicode characters with entities for HTML output #1506
Comments
I'm afraid this is not simple to fix, and knitr is probably not the best place to fix it (it goes back to the evaluate package and base R). There have been long-standing issues like yours:
I understand that the issue is more complex. But I know and have tested that taking a UTF-8 string, converting it to HTML entities before passing to I understand if you consider this an ugly hack or don't like the penalty of increased output size, but I believe this workaround could solve some real user problems (I may be oblivious to other downsides).
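For reference, a minimal sketch of the kind of conversion being discussed, in base R. The function name `to_html_entities` is hypothetical and is not part of knitr or highr. It only replaces each non-ASCII code point with a numeric character reference and leaves ASCII untouched, on the assumption that escaping of `&`, `<` and `>` is handled elsewhere.

```r
# Hypothetical helper (not a knitr/highr function): replace every
# non-ASCII character with a numeric HTML character reference.
to_html_entities <- function(x) {
  vapply(x, function(s) {
    # Code points of the string, after making sure it is UTF-8
    cp <- utf8ToInt(enc2utf8(s))
    # Keep ASCII characters as they are, encode everything else as &#NNNN;
    pieces <- ifelse(cp > 127L,
                     sprintf("&#%d;", cp),
                     intToUtf8(cp, multiple = TRUE))
    paste(pieces, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Usage: block characters such as those in skimr histograms become pure ASCII
to_html_entities("hist: \u2581\u2588")
```

Because the result is plain ASCII, it should not depend on the native locale once the conversion has been done; whether the string is still intact at the point where the conversion runs is a separate question.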
Is the hack going to be something like
For HTML, you can be a little cleaner - I noticed that when using
But thinking some more, it might also make sense to do all the processing you do, and then replace the If that's of interest, I did a minor examination of where exactly the string gets messed up, and it is line 157 in
Breakpoint at line 157:
contents of line 157:
after executing the line
Which also means that I was mistaken and the
Many thanks for the careful debugging! That was amazing. So the actual culprit was line 157. I wonder if it was
OK, so the culprit is
Well actually, the issue is not fully resolved :-) kable now produces correct output, but the UTF-8 is mangled somewhere down the pipeline, which brings us to the original suggestion of converting to HTML entities :-)
I am not really sure if I should open a separate issue for this, but it turns out my problems are unrelated to HTML, as I am using blogdown, which (I just learned) does not knit to HTML output. Instead, it uses knitr to write Markdown, which is transformed to HTML afterwards. Stepping through knitr, all UTF-8 from
Interestingly, if I force Only if I don't use
The UTF-8 is also preserved when using Which IMHO also means that it might be possible to work around the evaluate bug - I mean RStudio is definitely able to do this (in RStudio console on my system,
The encoding issues are always messy. Now I know much more about character encodings than five years ago, but I guess I'll need some substantial time to clean up the relevant code I wrote before. For blogdown, .Rmd is compiled to .md through knitr, and .md is converted to .html through Pandoc. What sounds odd to me is that blogdown calls An object of the class
So, actually I was wrong with my assessment - the UTF-8 only survived that long because I debugged the code and something got evaluated in a different context (I reproduced it several times, but I don't think I completely understand it - it has a very blackmagic-y feel to it). Nevertheless, in the default regime, I am now quite sure that UTF-8 is lost in a call to
Hi, I was about to ask a question on Stack Overflow about a certain encoding issue I am facing, which is specific to Windows and knitr. I found this issue already opened and it sounds very much related. My issue is the difference between chunk output in knitr and the normal R console. I am trying to knit this file to an HTML document on Windows.
(The R code is of course marked as such, but GitHub strips the code section. Please add the markers back to get a running example.) The test.txt file contains just two lines with the following content:
This works fine on Linux, but on Windows the following output is returned for lines 2 and 5: I also found this issue, which deals with the same problem. Is this connected to the issue? Any suggestions on how to solve it, if not? (I don't want to hijack this issue. Let me know if I should open a separate one.) Thanks for any help or suggestions.
@FelixErnst If test.txt is encoded in UTF-8 (you didn't provide the file, so we don't know), the correct way to read it is
@yihui Sorry, I forgot that uploading the files is also an option. Both are UTF-8 encoded. Removing the call to
@FelixErnst Okay, in that case, I guess it is simply impossible to solve the problem. See my first reply above (r-lib/evaluate#59): #1506 (comment) The limit comes from base R, which I cannot modify.
@yihui thanks for the quick answer. I followed the links down to the explanation. I am certainly not grasping every aspect of it, but that is an answer I can live with. Then we hope that Windows 1903 might get a step closer to full UTF-8 support.
Haha. Fingers crossed! 🙏
This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.
On Windows, the locale sometimes messes up Unicode characters in HTML output from knitr. While this can be avoided with a proper locale, for HTML it can also be avoided by using HTML entities. HTML entities could also be somewhat preferable to raw Unicode (not sure about this, really). So my suggestion is to do this transformation by default.
If you agree, I am ready to implement this, but I am not sure if this transformation should be implemented in knitr or by modifying the escape_html function in highr.
This came up while working on an issue in skimr (ropensci/skimr#278) where kable can mess up histograms built from Unicode characters.