Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UTF-8-related problem on Windows #76

Merged
merged 2 commits into from May 25, 2017

Conversation

yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Mar 19, 2017

On Windows machines where the default locale is not UTF-8, the output of reprex() cannot be rendered properly due to Encoding-related issue.

Problem

An example code is here:

reprex::reprex({
    cat("")
})

This will be corrupted as bellow:

cat("縺・)
#> 縺・

Cause and fix

First, r_file is written out with native encoding here:

 writeLines(the_source, r_file)

Though the_source may be encoded with UTF-8 or the native encoding, depending on whether the source is acquired from the argument or the clipboard, r_file is in the native encoding anyway. So far, so good.

Then, r_file is rendered by rmarkdown::render(), which outputs files with UTF-8, here:

  output_file <- md_file <- reprex_(r_file)

Since output_file and md_file is UTF-8. we have to specify encoding = "UTF-8" here,

  output_lines <- readLines(md_file)

and here:

    if (is.null(outfile)) {
      html_file <- rmarkdown::render(md_file, quiet = TRUE)
    } else {
      html_file <- strip_ext(basename(md_file))
      html_file <- tempfile(pattern = paste0(html_file, "_"), fileext = ".html")
      html_file <-
        rmarkdown::render(md_file, output_file = html_file, quiet = TRUE)
}

Additionally, I wonder we should also add encoding here, but I'm not really sure that the output R script should be always with UTF-8.

 writeLines(output_lines, output_file)

@yutannihilation
Copy link
Member Author

Note that this will conflict with the changes I've made in #75. I will rebase either one after the other one is merged :)

@jennybc
Copy link
Member

jennybc commented Mar 19, 2017

Thanks! I'm on vacation right now. Please feel free to nudge me on one or the other if I forget to come back to this in the next week or so.

@yutannihilation
Copy link
Member Author

Sure. Have a nice vacation!

@yutannihilation
Copy link
Member Author

@jennybc Hi, are you still in vacation? If not, I'm grad if you take a look at my PR :)

@jennybc
Copy link
Member

jennybc commented Apr 2, 2017

I am back, but pushing to get readxl released. Then I will come back to this. Sorry for the delay.

@yutannihilation
Copy link
Member Author

Never mind, I will wait. 👍

@jennybc
Copy link
Member

jennybc commented Apr 2, 2017

I am motivated to fix this because encoding problems actually came up while trying to create reprexes for some issues in readxl 😂.

@yutannihilation
Copy link
Member Author

Oh, it's a good reason :)

But, I guess you can just install my PR (devtools::install_github("jennybc/reprex#76")) temporarily in order to produce UTF-8-related reprexes, and review this later after finishing your job in readxl. Thinking about encoding is a tough job, which will take your time...

@hadley
Copy link
Member

hadley commented May 25, 2017

This looks good to me.

@jennybc jennybc merged commit 40ba2c3 into tidyverse:master May 25, 2017
@yutannihilation yutannihilation deleted the encoding branch May 25, 2017 23:16
@yutannihilation
Copy link
Member Author

Thanks for merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants