Wrong encoding detection #25
Comments
Hmmm, that's a weird one - I can't even figure out what the encoding is supposed to be.
It's |
A few more diagnostics:
library("httr")
library("rvest")
url <- "http://psytests.org"
# No encoding in http request
r <- GET(url)
headers(r)$`Content-Type`
# So default text content from httr is bad
content(r, "text")
# stringi thinks encoding is ISO-8859-1
as.data.frame(stringi::stri_enc_detect(content(r, "raw"))[[1]])
# But it's not
stringi::stri_encode(content(r, "raw"), "ISO-8859-1")
# It's actually cp1251
stringi::stri_encode(content(r, "raw"), "cp1251")
# Which also works when we give it to content
content(r, "text", encoding = "cp1251")
# But not when we give it to rvest::html
rvest::html("http://psytests.org", encoding = "cp1251")
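The decoding behaviour in the diagnostics above can be reproduced without a network request. This is a minimal sketch, assuming only that stringi is installed; the sample word and the variable names (`original`, `raw_bytes`, `wrong`, `right`) are illustrative and do not come from the site in question.

```r
library(stringi)

# Cyrillic sample text, written with \u escapes so the script stays ASCII-only
original <- "\u041f\u0440\u0438\u0432\u0435\u0442"

# Encode to windows-1251 (cp1251) bytes; this stands in for content(r, "raw")
raw_bytes <- stri_encode(original, from = "UTF-8", to = "windows-1251",
                         to_raw = TRUE)[[1]]

# Decoding the bytes as ISO-8859-1 does not fail, it just yields the wrong characters
wrong <- stri_encode(raw_bytes, from = "ISO-8859-1", to = "UTF-8")

# Decoding the same bytes as windows-1251 restores the original text
right <- stri_encode(raw_bytes, from = "windows-1251", to = "UTF-8")

print(wrong == original)
print(right == original)
```

Both single-byte encodings accept every byte in the input, which is one reason a detector can plausibly guess ISO-8859-1 for cp1251 data: nothing in the bytes themselves is invalid under either interpretation.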
rvest::html("http://psytests.org", encoding = "cp1251")
translates to
rvest:::html.response(httr::GET("http://psytests.org"), encoding = "cp1251")
In
Hi.
returns broken symbols, but
works without any additional actions.