tibble multiplication sign is invalid UTF-8 character #216
Comments
Fixed typo in last sentence: Changed wickham to hadley. |
Thanks. Could you please post the |
Here is the .Rmd file:
title: "Test of tibble in R Markdown"
|
I am not allowed to paste HTML output. For you to see my R output, I attach the PDF output. |
I can confirm this bug. The 'x' is the culprit. Here is a short Rnw with reproducible example:
|
On Windows 10, R 3.3.3, rmarkdown 1.4, tibble 1.3.0.9000 I am unable to reproduce this with either Rmd or Rnw. However, if I use |
@yihui: Is there a way to determine the expected encoding for console output for a knitr or rmarkdown run? Or do we just assume UTF-8? tibble is printing a multiplication sign which requires Unicode and seems to break knitr documents in some cases. |
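One way to picture the fallback being discussed here is the sketch below. It is hedged: `times_symbol()` and `dim_header()` are hypothetical helpers, not tibble functions, and `l10n_info()` only reports the locale of the R session, which is not necessarily the encoding that knitr or pandoc expect for the document.

```r
# Hypothetical helpers, NOT part of tibble: choose a "times" symbol based on
# whether the current R session reports a UTF-8 locale.
times_symbol <- function(utf8 = isTRUE(l10n_info()[["UTF-8"]])) {
  # "\u00d7" is U+00D7 MULTIPLICATION SIGN; "x" is the plain-ASCII fallback
  if (utf8) "\u00d7" else "x"
}

# Build a header line similar in shape to the one tibble prints
dim_header <- function(nrow, ncol, utf8 = isTRUE(l10n_info()[["UTF-8"]])) {
  paste0("# A tibble: ", nrow, " ", times_symbol(utf8), " ", ncol)
}

# Forcing the ASCII branch gives output that is safe in any encoding
dim_header(32, 11, utf8 = FALSE)  # "# A tibble: 32 x 11"
```

Always using the ASCII branch avoids having to guess the output encoding at all.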
The weird thing is that my system is using utf8, and other non-ascii characters seem to do just fine. In the example provided the encoding is declared when loading the inputenc package in the LaTeX header ( |
I received a similar report recently about the multiplication sign: yihui/knitr#1389 but I could not reproduce it on Windows. I guess @thibautjombart's problem is that he didn't tell knitr the encoding was supposed to be UTF-8 (which is the default on *nix but not Windows): I'd recommend that you just use the letter |
@hadley: Okay to revert to plain ASCII |
@yihui nope, my native encoding is utf-8 (I'm on linux). Adding the option hasn't changed the error.
Also note this character is used in the |
For what it's worth, this is what emacs thinks of this character:
Seems like a valid utf8 character to my (naive) eye. |
@krlmlr yeah, it's not worth the hassle. |
- The `print()`, `format()`, and `tbl_sum()` methods are now implemented for class `"tbl"` and not for `"tbl_df"`. This allows subclasses to use tibble's formatting facilities. The formatting of the header can be tweaked by implementing `tbl_sum()` for the subclass.
- New `set_tidy_names()` and `tidy_names()`, a simpler version of `repair_names()` which works unchanged for now (#217).
- Printing now uses `x` again instead of the Unicode multiplication sign, to avoid encoding issues (#216).
- `glimpse()` now properly displays tibbles with foreign characters in column names (#235).
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).
- New `set_tidy_names()` and `tidy_names()`, a simpler version of `repair_names()` which works unchanged for now (#217).
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).
- Printing now uses `x` again instead of the Unicode multiplication sign, to avoid encoding issues (#216).
- String values are now quoted when printing if they contain non-printable characters or quotes (#253).
- The `print()`, `format()`, and `tbl_sum()` methods are now implemented for class `"tbl"` and not for `"tbl_df"`. This allows subclasses to use tibble's formatting facilities. The formatting of the header can be tweaked by implementing `tbl_sum()` for the subclass, which is expected to return a named character vector. The `print.tbl_df()` method is still implemented for compatibility with downstream packages, but only calls `NextMethod()`.
- Own printing routine, not relying on `print.data.frame()` anymore. Now providing `format.tbl_df()` and full support for Unicode characters in names and data, also for `glimpse()` (#235).
- Improve formatting of error messages (#223).
- Using `rlang` instead of `lazyeval` (#225, @lionel-), and `rlang` functions (#244).
- `tribble()` now handles values that have a class (#237, @NikNakk).
- Minor efficiency gains by replacing `any(is.na())` with `anyNA()` (#229, @csgillespie).
- The `microbenchmark` package is now used conditionally (#245).
- `pkgdown` website.
This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary. |
The tibble multiplication sign is an invalid UTF-8 character. Here is a typical example output from
http://readr.tidyverse.org/reference/read_delim.html :
#> # A tibble: 32 × 11
The multiplication sign in read_csv output such as the above is written as an extended-ASCII character, but it should be either plain ASCII or Unicode UTF-8. When the output is read as UTF-8, the character appears as the byte \xD7, and pandoc gives the error message
"Cannot decode byte '\xd7': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream"
This is a problem for pandoc on Windows only. I tried pandoc versions 1.13.1 and 1.18. I mentioned the problem on Statalist and wondered whether it was a problem with the user-written Stata program MarkDoc, which is Stata's equivalent of R Markdown. The author of MarkDoc concluded that read_csv should have avoided the invalid UTF-8 character, and I agree. The Statalist URL is http://www.statalist.org/forums/forum/general-stata-discussion/general/1355554-markdoc-manual-gui?p=1362612#post1362612
What is the rationale for using extended ASCII instead of plain ASCII or UTF-8 for the tibble multiplication sign? Given (1) the compatibility problems with pandoc on Windows and with dependent programs such as Stata's MarkDoc, (2) the lack of any need for extended ASCII, and (3) the obvious, easy fix, I assume this issue was simply overlooked. The problem occurs with R's read_csv(), but in tidyverse/readr#547 hadley closed the bug and suggested that this is a tibble problem instead.
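For readers trying to interpret the `\xd7` in the pandoc error, the following base R sketch shows the two byte representations involved. It assumes R 3.3.0 or later for `validUTF8()`, and it only illustrates the encodings; it does not show where in the knitr/pandoc toolchain the single byte was produced.

```r
# U+00D7 MULTIPLICATION SIGN, written as an escape so this source stays ASCII
times <- "\u00d7"

# UTF-8 encoding: two bytes, c3 97
utf8_bytes <- charToRaw(enc2utf8(times))
print(utf8_bytes)

# Latin-1 encoding: the single byte d7 quoted in the pandoc error message
latin1_bytes <- iconv(times, from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]
print(latin1_bytes)

# A lone d7 byte is the start of a two-byte UTF-8 sequence with no
# continuation byte, so it is rejected by a UTF-8 decoder
validUTF8(rawToChar(latin1_bytes))  # FALSE
validUTF8(rawToChar(utf8_bytes))    # TRUE
```

So the character itself exists in Unicode; the error arises when its Latin-1 byte ends up in a stream that is later decoded as UTF-8.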