Right-to-left languages flip columns #433

Open
isteves opened this issue Jul 18, 2018 · 5 comments

@isteves

commented Jul 18, 2018

I'm not sure what the solution is, but when left-to-right (English) characters get mixed with right-to-left characters (in this case, Hebrew), the column names and column contents no longer match:

tibble(a = "א",
       b = "ב")

Would be happy to help solve this issue, but no clue where to start. Just happened to notice this the other day.

For some reason, the reprex shows up weirdly (maybe another issue?)

library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.3

tibble(a = "<U+05D0>",
       b = "<U+05D1>")
#> # A tibble: 1 x 2
#>   a        b       
#>   <chr>    <chr>   
#> 1 <U+05D0> <U+05D1>

@krlmlr krlmlr added the bug label Jul 21, 2018

@krlmlr

Member

commented Jul 21, 2018

Thanks. I think this is the underlying cause:

paste(
  "א",
  "ב"
)
#> [1] "א ב"

Created on 2018-07-21 by the reprex package (v0.2.0).

A solution that works in the R terminal on Linux, but not in reprex or in RStudio (due to weak HTML support for isolates):

# Reference: https://www.w3.org/International/questions/qa-bidi-unicode-controls
fsi <- function(x) paste0("\u2068", x, "\u2069")

paste(
  fsi("א"),
  fsi("ב")
)
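
To see what fsi() actually adds to a string, the code points can be listed. A minimal sketch (fsi() is repeated so the snippet runs on its own):

```r
fsi <- function(x) paste0("\u2068", x, "\u2069")

# U+2068 is FIRST STRONG ISOLATE, U+2069 is POP DIRECTIONAL ISOLATE.
# Both are invisible formatting characters, but they still count as characters.
wrapped <- fsi("\u05d0")  # "\u05d0" is the Hebrew letter alef
sprintf("U+%04X", utf8ToInt(wrapped))
nchar(wrapped)            # 3 characters, although only one is visible
```

Any width calculation used for column alignment would presumably need to ignore these two extra characters.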

Interestingly, data.frame() gets it right (in the terminal, but the display in RStudio is still wrong). I suspect it's using a technique similar to the "first strong isolate" in the previous example. Need to look at how it's implemented there, and also fix this in the RStudio IDE (if possible).

@patperry @brodieG: Any suggestions?

For reference:

For the reprex issue: Windows and Unicode don't get along well, but it might be worth filing an issue with the reprex repo.

@patperry

Contributor

commented Jul 22, 2018

FWIW, data.frame gets it wrong on my terminal (Mac OS Terminal.app).

I haven't thought much about Bidi, and it's not even clear to me that there's an obvious "correct" behavior in the presence of Bidi text. (What to do if the column contains some Bidi text and some non-Bidi? What about if a row has both Bidi and non-Bidi?)

@isteves if you want to help, probably the first thing to do is to spec out some of the cases and what the correct behavior should be.

@isteves

Author

commented Jul 23, 2018

@patperry Here's a visual intro to how Bidi text works in Excel. My (Windows 10) laptop is typically set to English, but the Hebrew keyboard was enabled for the gif. Thus, cells default to left-to-right (LTR) but can be forced right-to-left (RTL) if Hebrew characters are used.

[Animated GIF: excel-bidi2]

The links that @krlmlr shared seem to go into detail about the logic behind this behavior.

Below, I've outlined some examples of how tables should look. In these examples, the column with A/א represents the first column.

LTR columns

A B
א ב
A12 B34
א12 ב34

RTL columns

ב א
B A
ב34 א12
B34 A12

Note: I'm not sure whether these theoretical cases can currently be expressed in code. It may be possible to import CSV or XLSX files formatted like this (I haven't tried), but I was not able to create one this way:

tibble(א = "a",
        ב = "b")

# reprexed version
tibble(<U+05D0> = "a",
        <U+05D1> = "b")
#> Error: <text>:4:8: unexpected '<'
#> 3: 
#> 4: tibble(<
#>           ^

(data.frame's are no better.)

Mixed column names

Here, the order of the columns should be determined by the type (LTR/RTL) of the first "strong" character in the name of the first column. In the first example, typing "A" first should force LTR columns. In the second example, typing "א" first should force RTL columns.

1 2
2 1
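
The "first strong character" rule described above could be approximated in R roughly like this. This is only a sketch: first_strong_dir() is a hypothetical helper, and the code point ranges are a crude stand-in for the real Unicode bidi character classes (only the Hebrew/Arabic blocks are treated as strong RTL, only ASCII letters as strong LTR, everything else is skipped).

```r
# Hypothetical helper: direction of the first "strong" character in a string.
# Returns "rtl", "ltr", or NA if no strong character is found.
# Assumption: the ranges below only approximate the Unicode bidi classes.
first_strong_dir <- function(x) {
  cp <- utf8ToInt(enc2utf8(x))
  is_rtl <- (cp >= 0x0590 & cp <= 0x08FF) |   # Hebrew, Arabic and neighbors
    (cp >= 0xFB1D & cp <= 0xFDFF) |           # presentation forms A
    (cp >= 0xFE70 & cp <= 0xFEFF)             # presentation forms B
  is_ltr <- (cp >= 0x41 & cp <= 0x5A) |       # A-Z
    (cp >= 0x61 & cp <= 0x7A)                 # a-z
  strong <- which(is_rtl | is_ltr)
  if (length(strong) == 0) return(NA_character_)
  if (is_rtl[strong[1]]) "rtl" else "ltr"
}

first_strong_dir("A12")        # Latin letter comes first
first_strong_dir("\u05d012")   # Hebrew alef comes first
first_strong_dir("12\u05d1")   # digits are skipped, Hebrew bet decides
```

Applied to the name of the first column, the result would decide whether the columns are laid out LTR or RTL.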

@patperry is this helpful?

@krlmlr

Member

commented Jul 23, 2018

The syntax is

# reprexed version
tibble(`<U+05D0>` = "a",
       `<U+05D1>` = "b")

but this doesn't give the expected results on Linux.

We need to tell the terminal to make up its own mind about each individual cell (using bidi rules for the cell contents), but use LTR to align the cells. The fsi() function in my example embeds the contents into "First strong isolate"/"Pop isolate" codepoints; to my understanding this should achieve the desired behavior but apparently doesn't work on all terminals.

@krlmlr

Member

commented Jul 23, 2018

Trying to be explicit about the cell alignment (using left-to-right isolates) doesn't fix the problem in RStudio -- does it help in the OS X terminal?

# Reference: https://www.w3.org/International/questions/qa-bidi-unicode-controls
fsi <- function(x) paste0("\u2068", x, "\u2069")
ltr <- function(x) paste0("\u2066", x, "\u2069")

cat(
  ltr(
    paste(
      fsi("א"),
      fsi("ב")
    )
  )
)

Maybe it's worth trying different terminals on OS X?
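
For trying this on whole rows rather than a single pair, the two helpers could be combined into a small row formatter. This is only a sketch: format_row() is a hypothetical helper, not part of tibble, and whether the output renders correctly still depends on the terminal.

```r
fsi <- function(x) paste0("\u2068", x, "\u2069")
ltr <- function(x) paste0("\u2066", x, "\u2069")

# Hypothetical helper: isolate every cell with "first strong isolate",
# then force the row as a whole to be laid out left-to-right.
format_row <- function(cells) ltr(paste(fsi(cells), collapse = " "))

rows <- list(
  c("a", "b"),              # column names
  c("\u05d0", "\u05d1")     # Hebrew alef and bet as cell contents
)
lines <- vapply(rows, format_row, character(1))
cat(lines, sep = "\n")
```

If the terminal honors the isolates, "a" should sit above alef and "b" above bet.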
