Right-to-left languages flip columns #433

Closed
isteves opened this issue Jul 18, 2018 · 11 comments

@isteves isteves commented Jul 18, 2018

I'm not sure what the solution is, but when left-to-right (English) characters get mixed with right-to-left characters (in this case, Hebrew), the column names and column contents no longer match:

tibble(a = "א",
       b = "ב")

Would be happy to help solve this issue, but no clue where to start. Just happened to notice this the other day.

For some reason, the reprex shows up weirdly (maybe another issue?)

library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.3

tibble(a = "<U+05D0>",
       b = "<U+05D1>")
#> # A tibble: 1 x 2
#>   a        b       
#>   <chr>    <chr>   
#> 1 <U+05D0> <U+05D1>
@krlmlr krlmlr added the bug label Jul 21, 2018
Member

@krlmlr krlmlr commented Jul 21, 2018

Thanks. I think this is the underlying cause:

paste(
  "א",
  "ב"
)
#> [1] "א ב"

Created on 2018-07-21 by the reprex package (v0.2.0).

A solution that works in the R terminal on Linux, but not in reprex or in RStudio (due to weak HTML support for isolates):

# Reference: https://www.w3.org/International/questions/qa-bidi-unicode-controls
fsi <- function(x) paste0("\u2068", x, "\u2069")

paste(
  fsi("א"),
  fsi("ב")
)

Interestingly, data.frame() gets it right (in the terminal, but the display in RStudio is still wrong). I suspect it's using a technique similar to the "first strong isolate" in the previous example. Need to look at how it's implemented there, and also fix this in the RStudio IDE (if possible).

@patperry @brodieG: Any suggestions?

For reference:

For the reprex issue: Windows and Unicode don't get along well, but it might be worth filing an issue with the reprex repo.

Contributor

@patperry patperry commented Jul 22, 2018

FWIW data.frame gets it wrong on my terminal (Mac OS Terminal.app)

I haven't thought much about Bidi, and it's not even clear to me that there's an obvious "correct" behavior in the presence of Bidi text. (What to do if the column contains some Bidi text and some non-Bidi? What about if a row has both Bidi and non-Bidi?)

@isteves if you want to help, probably the first thing to do is to spec out some of the cases and what the correct behavior should be in each.

Author

@isteves isteves commented Jul 23, 2018

@patperry Here's a visual intro to how Bidi text works in Excel. My (Windows 10) laptop is typically set to English, but the Hebrew keyboard was enabled for the gif. Thus, cells default to left-to-right (LTR) but can be forced right-to-left (RTL) if Hebrew characters are used.

[Image: excel-bidi2 (GIF of Bidi text entry in Excel)]

The links that @krlmlr shared seem to go into detail about the logic behind this behavior.

Below, I've outlined some examples of how tables should look. In these examples, the column with A/א represents the first column.

LTR columns

A B
א ב
A12 B34
א12 ב34

RTL columns

ב א
B A
ב34 א12
B34 A12

Note: I'm not sure if these theoretical cases are currently supported in code. It may be possible to import CSV/XLSX files formatted like this (not sure, haven't tried), but I was not able to create one this way:

tibble(א = "a",
        ב = "b")

# reprexed version
tibble(<U+05D0> = "a",
        <U+05D1> = "b")
#> Error: <text>:4:8: unexpected '<'
#> 3: 
#> 4: tibble(<
#>           ^

(data.frames are no better.)

Mixed column names

Here, the order of the columns should be determined by the type (LTR/RTL) of the first "strong" character in the name of the first column. In the first example, typing "A" first should force LTR columns. In the second example, typing "א" first should force RTL columns.

1 2
2 1
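
As a rough sketch of the first-strong rule described above: the helper names `first_strong_dir` and `order_columns` are hypothetical, and the code point ranges are a deliberate simplification of the Unicode Bidi algorithm (only Hebrew and Arabic letters count as RTL, only basic Latin letters count as LTR).

```r
# Hypothetical helper: direction of the first "strong" character in a string.
# Assumption: only Hebrew letters (U+05D0..U+05EA) and Arabic letters
# (U+0620..U+064A) count as RTL, and only A-Z / a-z count as LTR.
first_strong_dir <- function(x) {
  for (cp in utf8ToInt(enc2utf8(x))) {
    if ((cp >= 0x05D0 && cp <= 0x05EA) || (cp >= 0x0620 && cp <= 0x064A)) {
      return("rtl")
    }
    if ((cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A)) {
      return("ltr")
    }
  }
  NA_character_  # only digits, spaces, or punctuation: no strong character
}

# Hypothetical helper: reverse the display order when the first name is RTL.
order_columns <- function(col_names) {
  if (identical(first_strong_dir(col_names[1]), "rtl")) {
    rev(col_names)
  } else {
    col_names
  }
}

order_columns(c("A", "\u05d1"))  # first name is LTR: order kept
order_columns(c("\u05d0", "B"))  # first name is RTL: order reversed
```

Digits are skipped because they are "weak" characters, so a name like "12A" would still resolve to LTR in this sketch.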

@patperry is this helpful?

Member

@krlmlr krlmlr commented Jul 23, 2018

The syntax is

# reprexed version
tibble(`<U+05D0>` = "a",
       `<U+05D1>` = "b")

but this doesn't give the expected results on Linux.

We need to tell the terminal to make up its own mind about each individual cell (using bidi rules for the cell contents), but use LTR to align the cells. The fsi() function in my example embeds the contents into "First strong isolate"/"Pop isolate" codepoints; to my understanding this should achieve the desired behavior but apparently doesn't work on all terminals.

Member

@krlmlr krlmlr commented Jul 23, 2018

Trying to be explicit about the cell alignment (using left-to-right isolates) doesn't fix the problem in RStudio -- does it help in the OS X terminal?

# Reference: https://www.w3.org/International/questions/qa-bidi-unicode-controls
fsi <- function(x) paste0("\u2068", x, "\u2069")
ltr <- function(x) paste0("\u2066", x, "\u2069")

cat(
  ltr(
    paste(
      fsi("א"),
      fsi("ב")
    )
  )
)

Maybe it's worth trying different terminals on OS X?

Member

@krlmlr krlmlr commented Jul 11, 2019

So the following works in RStudio and in reprex:

# Reference: https://www.w3.org/International/questions/qa-bidi-unicode-controls
fsi <- function(...) paste0("\u2068", ..., "\u2069")
lri <- function(...) paste0("\u2066", ..., "\u2069")
rli <- function(...) paste0("\u2067", ..., "\u2069")
lro <- function(...) paste0("\u202d", ..., "\u202c")

aleph <- "א"
bet <- "ב"

# Text output is right-to-left...
paste0(aleph, bet)
#> [1] "אב"

# but we can force the components to be left to right
lro(paste0(
  fsi(aleph, bet),
  " ",
  fsi(bet, aleph)
))
#> [1] "‭⁨אב⁩ ⁨בא⁩‬"

Created on 2019-07-11 by the reprex package (v0.3.0)

RStudio and reprex run in HTML. However, most terminals don't know and don't care about RTL text. In the terminals I've tested, paste0(aleph, bet) will have aleph on the left-hand side of bet, i.e. in the wrong direction. The only exception (on Linux) is mlterm, but that package didn't work well with the bidi control characters.
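
One way to check that the string itself is stored in logical order, independent of how a given terminal draws it, is to look at the code points. A minimal sketch using only base R (the escapes stand in for the literal Hebrew letters used above):

```r
aleph <- "\u05d0"  # same character as the literal aleph above
bet <- "\u05d1"    # same character as the literal bet above

# Code points come back in logical (typing) order: aleph first, then bet,
# regardless of whether the display places aleph on the left or the right.
utf8ToInt(paste0(aleph, bet))
#> [1] 1488 1489
```

If the code points are in the expected order, any left/right discrepancy comes from the rendering layer rather than from `paste0()`.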

Is there a terminal implementation maybe on macOS where this reprex shows output consistent with the HTML output?

Author

@isteves isteves commented Jul 11, 2019

@krlmlr

fsi <- function(...) paste0("\u2068", ..., "\u2069")
lri <- function(...) paste0("\u2066", ..., "\u2069")
rli <- function(...) paste0("\u2067", ..., "\u2069")
lro <- function(...) paste0("\u202d", ..., "\u202c")

aleph <- "א"
bet <- "ב"

# Text output is right-to-left...
paste0(aleph, bet)
#> [1] "אב"

# but we can force the components to be left to right
lro(paste0(
  fsi(aleph, bet),
  " ",
  fsi(bet, aleph)
))
#> [1] "‭\u2068אב\u2069 \u2068בא\u2069‬"

Created on 2019-07-11 by the reprex package (v0.2.1)

(This is what it looks like on macOS.)
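
The `\u2068` and `\u2069` escapes in that output suggest that `print()` fell back to escapes for the isolate characters in this session; the characters themselves should still be in the string. A small sketch to confirm this and to strip the controls again (`strip_bidi` is a hypothetical helper, not part of tibble or pillar):

```r
fsi <- function(...) paste0("\u2068", ..., "\u2069")

x <- fsi("\u05d0", "\u05d1")

# Four code points are expected: FSI, aleph, bet, PDI.
utf8ToInt(x)
#> [1] 8296 1488 1489 8297

# Hypothetical helper: drop the bidi embedding, override, and isolate controls
# (U+202A..U+202E and U+2066..U+2069) from a single string.
strip_bidi <- function(s) {
  cp <- utf8ToInt(s)
  intToUtf8(cp[!(cp %in% c(0x202A:0x202E, 0x2066:0x2069))])
}

identical(utf8ToInt(strip_bidi(x)), utf8ToInt("\u05d0\u05d1"))
#> [1] TRUE
```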

Member

@krlmlr krlmlr commented Apr 16, 2021

@isteves: Can you please try this revised example on a macOS terminal? It works for me in VS Code too:

# Reference: https://www.w3.org/International/questions/qa-bidi-unicode-controls
fsi <- function(...) paste0("\u2068", ..., "\u2069")
lri <- function(...) paste0("\u2066", ..., "\u2069")
rli <- function(...) paste0("\u2067", ..., "\u2069")
lro <- function(...) paste0("\u202d", ..., "\u202c")

aleph <- "א"
bet <- "ב"

# Text output is right-to-left...
paste0(aleph, bet)
#> [1] "אב"

# but we can force the components to be left to right
writeLines(lro(paste0(
  fsi(aleph, bet),
  " ",
  fsi(bet, aleph)
)))
#> ‭⁨אב⁩ ⁨בא⁩‬

Created on 2021-04-16 by the reprex package (v1.0.0)

If this works, I think I can add it to the next pillar release, as an option.

Related: would it be natural for RTL folks to have all columns ordered from right to left? Does Excel offer such an option? Need to figure out how to deal with the header and footer.

@krlmlr krlmlr added this to the 3.1.2 milestone Apr 16, 2021
Author

@isteves isteves commented Apr 19, 2021

I'm getting the same result:

fsi <- function(...) paste0("\u2068", ..., "\u2069")
lri <- function(...) paste0("\u2066", ..., "\u2069")
rli <- function(...) paste0("\u2067", ..., "\u2069")
lro <- function(...) paste0("\u202d", ..., "\u202c")

aleph <- "א"
bet <- "ב"

# Text output is right-to-left...
paste0(aleph, bet)
#> [1] "אב"

# but we can force the components to be left to right
writeLines(lro(paste0(
  fsi(aleph, bet),
  " ",
  fsi(bet, aleph)
)))
#> ‭⁨אב⁩ ⁨בא⁩‬

Created on 2021-04-19 by the reprex package (v0.2.1)

Excel does offer the option of RTL sheets as well as RTL cells. As for what is "natural", I have no idea -- perhaps @adisarid or @yogevherz can help answer that.

@adisarid adisarid commented Apr 19, 2021

Regarding what "feels natural" for RTL sheets: Excel offers a button in the "Page Layout" menu called "Sheet Right-to-Left" (roughly in the middle of the screen in the snapshot); enabling it flips the entire sheet. For example:
Before: [image]

After: [image]

Consider adding this as a setting in the Tools > Global Options menu under "Console -> Display": a checkbox labeled "Parse RTL languages with an RTL layout".
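
A sketch of how such an opt-in setting might behave on the R side, assuming a made-up option name `demo.rtl_layout` (nothing in this thread indicates that such an option exists in pillar, tibble, or RStudio):

```r
# Hypothetical option: when TRUE, columns are shown from right to left.
use_rtl_layout <- function() isTRUE(getOption("demo.rtl_layout", FALSE))

# Reverse the column order for display only; the input data frame is unchanged.
arrange_for_display <- function(df) {
  if (use_rtl_layout()) df[rev(seq_along(df))] else df
}

df <- data.frame(a = 1, b = 2)
options(demo.rtl_layout = TRUE)
names(arrange_for_display(df))
#> [1] "b" "a"
```

This only covers column order; the header and footer question raised earlier in the thread is left open.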

@krlmlr krlmlr removed this from the 3.1.2 milestone Jul 17, 2021
@krlmlr krlmlr added this to the 3.1.3 milestone Jul 17, 2021
Member

@krlmlr krlmlr commented Jul 20, 2021

@krlmlr krlmlr closed this Jul 20, 2021
4 participants