utf-8 encoding after yaml.load_file #6
Comments
same problem
This same behavior occurs when using R core functions like
Since a yaml file is encoded in unicode, I would expect strings to be given this encoding. The character string that yaml.load_file returns in my example is utf-8 encoded. But I haven't tried an example yaml in utf-16, so I don't know if setting a bit in every string would be enough.
Ah, I see. I didn't realize that all YAML documents are Unicode, but the YAML specification agrees with you. The specification says that by default, the encoding is UTF-8. For UTF-16, the document must provide a byte-order mark. It looks like LibYAML has an encoding property. I'll add this into the next update.
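To make the byte-order-mark rule above concrete, here is a minimal sketch in base R. The helper name guess_yaml_encoding() is hypothetical and is not part of the yaml package or LibYAML; it only looks at the UTF-8 and UTF-16 marks mentioned in this comment and ignores UTF-32.

```r
# Hypothetical helper: guess a YAML file's encoding from its first bytes.
# Only the byte-order marks for UTF-8 and UTF-16 are checked; anything else
# is treated as UTF-8, the default mentioned in the comment above.
guess_yaml_encoding <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(bom) {
    length(first_bytes) >= length(bom) &&
      all(first_bytes[seq_along(bom)] == bom)
  }
  if (starts_with(as.raw(c(0xef, 0xbb, 0xbf)))) return("UTF-8")     # UTF-8 BOM
  if (starts_with(as.raw(c(0xfe, 0xff))))       return("UTF-16BE")  # big-endian BOM
  if (starts_with(as.raw(c(0xff, 0xfe))))       return("UTF-16LE")  # little-endian BOM
  "UTF-8"  # no BOM found
}

# Usage: a file holding a UTF-16LE BOM followed by the letter "a",
# and a plain file without any BOM.
bom_file <- tempfile(fileext = ".yaml")
writeBin(as.raw(c(0xff, 0xfe, 0x61, 0x00)), bom_file)
plain_file <- tempfile(fileext = ".yaml")
writeLines("a: 1", plain_file)

guess_yaml_encoding(bom_file)
guess_yaml_encoding(plain_file)
```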
As it turns out, R does not support UTF-16 at all in
We just ran into the same problem. It will be nice if you can explicitly mark the encoding of character strings as UTF-8. Thanks! (We probably do not need to worry about UTF-16)
I had forgotten about this issue, unfortunately. I will take a fresh look at it.
Thanks! FWIW, this is our current workaround: rstudio/rmarkdown#421 (Recursively mark the character elements of
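The kind of workaround described above, recursively marking character elements as UTF-8, could look roughly like the sketch below. This is not the code from rstudio/rmarkdown#421; mark_utf8() and the parsed list are hypothetical stand-ins, and the strings are built from raw bytes so the example runs without the yaml package.

```r
# Hypothetical helper (not the rmarkdown implementation): walk a nested list
# and mark every character vector as UTF-8, leaving other types untouched.
mark_utf8 <- function(x) {
  if (is.character(x)) {
    Encoding(x) <- "UTF-8"  # declares the encoding; the bytes are not changed
    return(x)
  }
  if (is.list(x)) {
    # lapply() keeps names but drops other attributes, so restore them
    attrs <- attributes(x)
    x <- lapply(x, mark_utf8)
    attributes(x) <- attrs
  }
  x
}

# Stand-in for a parsed YAML document. rawToChar() yields strings whose
# encoding is "unknown", which is the situation reported in this issue.
s <- rawToChar(as.raw(c(0xc3, 0xa9)))  # the UTF-8 bytes of "é"
parsed <- list(title = s, author = list(name = s, year = 2016L))

marked <- mark_utf8(parsed)
Encoding(marked$title)
Encoding(marked$author$name)
```

Note that pure ASCII strings still report "unknown" after marking, because R never attaches an encoding mark to ASCII-only strings.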
There seem to be two issues here, one with When While I suggest setting
We were bitten by this issue again: rstudio/bookdown#142 Is there a chance that you could fix it? The fix should be fairly simple (mark the input and output strings as UTF-8), and I'm just not familiar with C.
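For the input side of the fix suggested above, base R already has enc2utf8(), which converts a string to UTF-8 and marks the result. The sketch below only illustrates that idea at the R level; the actual fix in the package is in C and may work differently.

```r
# A latin1-marked string: "café" with é stored as the single byte 0xE9.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x) <- "latin1"

# enc2utf8() re-encodes the bytes and marks the result as UTF-8.
y <- enc2utf8(x)
Encoding(y)
charToRaw(y)  # é is now the two bytes c3 a9
```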
not sure when this bug can be fixed: vubiostat/r-yaml#6
…alternative form of chapter_name (due to the bug vubiostat/r-yaml#6, we cannot use R expressions in YAML that contains multibyte characters)
We encountered the same issue as well, although it can be solved as @yihui did in https://github.com/rstudio/bookdown/blob/3ed7fc6bd30e2832948d28298dee5cd546339fc8/R/utils.R#L82 We thought it would be nicer if it were fixed in the package. Thanks.
And bitten by this again rstudio/rmarkdown#841 so yet yet another patch...
Unfortunately I have precious little time to work on this project at present. A pull request would be appreciated.
@viking Okay, actually that is all I need from you. I'll try to find someone to do the work and submit a pull request. Thanks!
@viking Done in #32. Tested on Windows and *nix. In the long run, if you feel it is difficult for you to maintain this package, you may consider finding a new maintainer. It seems you are in a situation similar to that of the tikzDevice package, which I was highly interested in but whose original authors lacked time. The yaml package is critical to the R Markdown world, and I hope you could consider increasing the bus factor so this important project can be carried forward nicely in the future. BTW, I found this article very inspiring: "I gave commit rights to someone I didn't know, I could never have guessed what happened next!"
Thank you.
@viking Any chance you could make a CRAN release soon? I hate bugging you like this, but without the CRAN release, we just keep hearing users report this issue. Here again: http://rmarkdown.rstudio.com/r_notebooks.html#comment-2982649887
I'll get to it soon. Not being funded to do this means that I have other priorities. Please recognize that.
I don't wish to continue this discussion here. I will let you know when the new version is on CRAN.
Yep definitely understood, and much appreciated!
New version is up on CRAN as of about 10 minutes ago.
Upstream changes:

CHANGES IN knitr VERSION 1.15.1
@yihui released this on 23 Nov 2016

NEW FEATURES
- added a new hook function hook_pngquant() that can call pngquant to optimize PNG images (thanks, @slowkow, #1320)

BUG FIXES
- not really a knitr bug, but knit_params() should be better at dealing with multibyte characters now due to the bug fix in the yaml package vubiostat/r-yaml#6

CHANGES IN knitr VERSION 1.15 (v1.15, b08a7bc)
@yihui released this on 10 Nov 2016

NEW FEATURES
- NA values can be displayed using different characters (including empty strings) in kable(); you can set the option knitr.kable.NA, e.g. options(knitr.kable.NA = '') to hide NA values (#1283)
- added a fortran95 engine (thanks, @stefanedwards, #1282)
- added a block2 engine for R Markdown documents as an alternative to the block engine; it should be faster and supports arbitrary Pandoc's Markdown syntax, but it is essentially a hack; note when the output format is LaTeX/PDF, you have to define \let\BeginKnitrBlock\begin \let\EndKnitrBlock\end in the LaTeX preamble
- figure captions specified in the chunk option fig.cap are also applied to HTML widgets (thanks, @byzheng, rstudio/bookdown#118)
- when the chunk option fig.show = 'animate' and ffmpeg.format = 'gif', a GIF animation of the plots in the chunk will be generated for HTML output (https://twitter.com/thomasp85/status/785800003436421120)
- added a width argument to write_bib() so long lines in bib entries can be wrapped
- the inline syntax r#code is also supported besides r code; this can make sure the inline expression is not split when the line is wrapped (thanks, Dave Jarvis)
- provided a global R option knitr.use.cwd so users can choose to evaluate the R code chunks in the current working directory after setting options(knitr.use.cwd = TRUE); the default is to evaluate code in the directory of the input document, unless the knitr option opts_knit$set(root.dir = ...) has been set
- if options(knitr.digits.signif = TRUE), numbers from inline expressions will be formatted using getOption('digits') as the number of significant digits, otherwise (the default behavior) getOption('digits') is treated as the number of decimal places (thanks, @numatt, #1053)
- the chunk option engine.path can also be a list of paths to the engine executables now, e.g., you can set knitr::opts_chunk$set(engine.path = list(python = '/anaconda/bin/python', perl = '/usr/local/bin/perl')), then when a python code chunk is executed, /anaconda/bin/python will be called instead of the system default (rstudio/rmarkdown#812)
- introduced a mechanism to protect text output in the sense that it will not be touched by Pandoc during the conversion from R Markdown to another format; this is primarily for package developers to extend R Markdown; see ?raw_output for details (which also shows new functions extract_raw_output() and restore_raw_output())

MAJOR CHANGES
- the minimal version of R required for knitr is 3.1.0 now (#1269)
- the formatR package is an optional package since the default chunk option tidy = FALSE has been there for a long time; if you use tidy = TRUE, you need to install formatR separately if it is not installed
- :set +m is no longer automatically added to haskell code chunks (#1274)

MINOR CHANGES
- the package option opts_knit$get('stop_on_error') has been removed
- the confusing warning message about knitr::knit2html() when building package vignettes using the knitr::rmarkdown engine without pandoc/pandoc-citeproc has been removed (#1286)
- the default value of the quiet argument of plot_crop() was changed from !opts_knit$get('progress') to TRUE, i.e., by default the messages from cropping images are suppressed

BUG FIXES
- the chunk option cache.vars did not really behave like what was documented (thanks, @simonKTH, #1280)
- asis_output() should not be merged with normal character output when results='hold' (thanks, @kevinushey, #1310)

CHANGES IN knitr VERSION 1.14 (v1.14, b34be0d)
@yihui released this on 12 Aug 2016

NEW FEATURES
- improved caching for Rcpp code chunks: the shared library built from the C++ code will be preserved on disk and reloaded the next time if caching is enabled (chunk option cache = TRUE), so that the exported R functions are still usable in later R code chunks; note this feature requires Rcpp >= 0.12.5.6 (thanks, @jjallaire, #1239)
- added a helper function all_rcpp_labels(), which is simply all_labels(engine == 'Rcpp') and can be used to extract all chunk labels of Rcpp chunks
- added a new engine named sql that uses the DBI package to execute SQL queries, and optionally assign the result to a variable in the knitr session; see http://rmarkdown.rstudio.com/authoring_knitr_engines.html for details (#1241)
- fig.keep now accepts numeric values to index low-level plots to keep (#1265)

BUG FIXES
- fixed #1211: pandoc('foo.md') generates foo_utf8.html instead of foo.html by default
- fixed #1236: include = FALSE for code chunks inside blockquotes did not work (should return > instead of a blank line) (thanks, @fmichonneau)
- fixed #1217: define the command \hlipl for syntax highlighting for Rnw documents (thanks, @conjugateprior)
- fixed #1215: restoring par() settings might fail when the plot window is partitioned, e.g. par(mfrow = c(1, 2)) (thanks, @jrwishart @jmichaelgilbert)
- fixed #1250: in the quiet mode, knit() should not emit the message "processing file ..." when processing child documents (thanks, @KZARCA)

MAJOR CHANGES
- knitr will no longer generate screenshots automatically for HTML widgets if the webshot package or PhantomJS is not installed

MINOR CHANGES
- if dev = 'cairo_pdf', the cairo_pdf device will be used to record plots (previously the pdf device was used) (#1235)
- LaTeX short captions now go up to the first ., : or ; character followed by a space or newline (thanks, @knokknok, #1249)
I've got a file with utf-8 characters. yaml.load_file loads the character strings correctly. But the encoding, as given by Encoding(), returns unknown. Now I use Encoding(...) <-'UTF-8' to set the encoding.
It would be nice if the character strings had the utf-8 encoding bit set.
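The symptom and the manual workaround described above can be reproduced in base R without the yaml package. In this sketch the string is built from raw bytes as a stand-in for what yaml.load_file returned at the time; that substitution is an assumption made so the example is self-contained.

```r
# Build "café" from its UTF-8 bytes. rawToChar() does not attach an encoding
# mark, which mirrors the strings described in this issue.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9)))

before <- Encoding(x)   # "unknown": the bytes are right, the mark is missing
Encoding(x) <- "UTF-8"  # declare the encoding; the bytes are not changed
after <- Encoding(x)    # "UTF-8"

nchar(x, type = "bytes")  # 5 bytes
nchar(x, type = "chars")  # 4 characters once the string is marked as UTF-8
```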