Right-to-left languages flip columns #433
I'm not sure what the solution is, but when left-to-right (English) characters get mixed with right-to-left characters (in this case, Hebrew), the column names and column contents no longer match:
Would be happy to help solve this issue, but no clue where to start. Just happened to notice this the other day.
For some reason, the reprex shows up weirdly (maybe another issue?)
library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.3
tibble(a = "<U+05D0>", b = "<U+05D1>")
#> # A tibble: 1 x 2
#>   a        b
#>   <chr>    <chr>
#> 1 <U+05D0> <U+05D1>
Thanks. I think this is the underlying cause:
paste("א", "ב")
#> [1] "א ב"
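A quick way to confirm that only the display is affected, not the stored string, is to inspect the code points. This is a minimal sketch using base R only; the escapes `\u05d0` and `\u05d1` stand in for the two Hebrew letters above so the snippet does not depend on the file encoding.

```r
# Build the same string as above, using escapes for alef (U+05D0) and bet (U+05D1).
x <- paste("\u05d0", "\u05d1")

# Code points in logical (storage) order: alef, space, bet.
# The terminal's bidi algorithm may still draw them right to left.
cps <- utf8ToInt(x)
print(cps)
```

If the code points come back in the order they were pasted, the flip happens at rendering time, which is why the fix has to involve directional control characters rather than reordering the data.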
A solution that works in the R terminal on Linux, but not in reprex or in RStudio (due to weak HTML support for isolates):
# Reference: https://www.w3.org/International/questions/qa-bidi-unicode-controls
fsi <- function(x) paste0("\u2068", x, "\u2069")
paste(fsi("א"), fsi("ב"))
For the reprex issue: Windows and Unicode don't get along well, but it might be worth filing an issue with the reprex repo.
I haven't thought much about Bidi, and it's not even clear to me that there's an obvious "correct" behavior in the presence of Bidi text. (What to do if the column contains some Bidi text and some non-Bidi? What about if a row has both Bidi and non-Bidi?)
@isteves if you want to help, probably the first thing to do is to spec out some of the cases and what the correct behavior should be in each.
@patperry Here's a visual intro to how Bidi text works in Excel. My (Windows 10) laptop is typically set to English, but the Hebrew keyboard was enabled for the gif. Thus, cells default to left-to-right (LTR) but can be forced right-to-left (RTL) if Hebrew characters are used.
The links that @krlmlr shared seem to go into detail about the logic behind this behavior.
Below, I've outlined some examples of how tables should look. In these examples, the column with A/א represents the first column.
Note: I'm not sure if these theoretical cases are currently supported as code. It may be possible to import CSV or XLSX files formatted like this (not sure, haven't tried), but I was not able to create one this way:
tibble(א = "a", ב = "b")
# reprexed version
tibble(<U+05D0> = "a", <U+05D1> = "b")
#> Error: <text>:4:8: unexpected '<'
#> 3:
#> 4: tibble(<
#>           ^
Mixed column names
Here, the order of the columns should be determined by the type (LTR/RTL) of the first "strong" character in the name of the first column. In the first example, typing "A" first should force LTR columns. In the second example, typing "א" first should force RTL columns.
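The "first strong character" rule could be sketched roughly as follows. `first_strong_dir()` is a hypothetical helper, not part of tibble or pillar, and the code-point ranges are a coarse assumption: Hebrew, Arabic, and neighbouring blocks count as RTL, and only ASCII letters count as LTR. A real implementation would consult the Unicode Bidi_Class property instead.

```r
# Hypothetical helper: classify a single string by its first strong character.
# Assumption: coarse code-point ranges stand in for the Unicode Bidi_Class data.
first_strong_dir <- function(x) {
  cps <- utf8ToInt(enc2utf8(x))

  # Hebrew, Arabic, Syriac, etc., plus the Hebrew/Arabic presentation forms.
  is_rtl <- (cps >= 0x0590 & cps <= 0x08FF) |
    (cps >= 0xFB1D & cps <= 0xFDFF) |
    (cps >= 0xFE70 & cps <= 0xFEFF)

  # ASCII letters only; digits, spaces, and punctuation are treated as neutral.
  is_ltr <- (cps >= 0x41 & cps <= 0x5A) | (cps >= 0x61 & cps <= 0x7A)

  strong <- which(is_rtl | is_ltr)
  if (length(strong) == 0) {
    return("neutral")
  }
  if (is_rtl[strong[1]]) "rtl" else "ltr"
}

# Column order would follow the direction of the first column's name.
first_strong_dir("A\u05d0")  # starts with a Latin letter
first_strong_dir("\u05d0A")  # starts with a Hebrew letter
```

Under this sketch, a name that begins with digits or punctuation is classified by the first letter that follows, and a name with no letters at all falls back to "neutral", which a caller would presumably treat as LTR.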
@patperry is this helpful?
The syntax is
# reprexed version
tibble(`<U+05D0>` = "a", `<U+05D1>` = "b")
but this doesn't give the expected results on Linux.
We need to tell the terminal to make up its own mind about each individual cell (using bidi rules for the cell contents), but use LTR to align the cells. The
Trying to be explicit about the cell alignment (using left-to-right isolates) doesn't fix the problem in RStudio -- does it help in the OS X terminal?
# Reference: https://www.w3.org/International/questions/qa-bidi-unicode-controls
fsi <- function(x) paste0("\u2068", x, "\u2069")
ltr <- function(x) paste0("\u2066", x, "\u2069")
cat(ltr(paste(fsi("א"), fsi("ב"))))
Maybe it's worth trying different terminals on OS X?
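For testing in different terminals, the same idea can be extended from a single pair of cells to padded table rows. This is only a sketch: `format_bidi_row()` is a hypothetical helper, not how tibble or pillar actually format output, and whether the isolates are honoured depends entirely on the terminal.

```r
# U+2068 FIRST STRONG ISOLATE, U+2066 LEFT-TO-RIGHT ISOLATE,
# U+2069 POP DIRECTIONAL ISOLATE.
fsi <- function(x) paste0("\u2068", x, "\u2069")
ltr <- function(x) paste0("\u2066", x, "\u2069")

# Hypothetical helper: each cell is isolated so the terminal picks its
# direction from the cell's own contents; padding is added outside the
# isolate, and the whole line is wrapped in an LTR isolate so the cells
# themselves are laid out left to right.
format_bidi_row <- function(cells, widths) {
  padded <- mapply(function(cell, width) {
    pad <- strrep(" ", max(0, width - nchar(cell, type = "width")))
    paste0(fsi(cell), pad)
  }, cells, widths, USE.NAMES = FALSE)
  ltr(paste(padded, collapse = " "))
}

# A header row with LTR names and a data row with Hebrew contents.
rows <- list(c("a", "b"), c("\u05d0", "\u05d1"))
widths <- c(5, 5)
for (row in rows) {
  cat(format_bidi_row(row, widths), "\n", sep = "")
}
```

If the terminal supports isolates, the Hebrew cells should line up under their headers; if it does not, the control characters are typically ignored or drawn as placeholders, which makes this a reasonable probe for comparing terminals.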