I am trying to scrape this web page: http://www3.boj.or.jp/market/jp/stat/of141205.htm
```r
require(rvest)
url = 'http://www3.boj.or.jp/market/jp/stat/of141205.htm'

# bad, returns a string like: I�t�@�[ (12��5�ú���à��)
html(url, encoding='utf-8') %>% html_nodes('title') %>% html_text()
html(url, encoding='SHIFT_JIS') %>% html_nodes('title') %>% html_text()

# good, returns: オファー (12月5日<金>)
html(readLines(url, encoding='utf-8')) %>% html_nodes('title') %>% html_text()
```

What is the difference between `html` and `readLines` in how they deal with encoding?
I don't know why the last example works, but the `encoding` argument is not what makes the difference. The same call works without it:
```r
html(readLines(url)) %>% html_nodes('title') %>% html_text()
[1] "オファー (12月5日<金>)"
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In grepl("^http", x) : input string 5 is invalid in this locale
2: In grepl("^http", x) : input string 7 is invalid in this locale
3: In grepl("^http", x) : input string 10 is invalid in this locale
4: In grepl("^http", x) : input string 11 is invalid in this locale
5: In grepl("^http", x) : input string 12 is invalid in this locale
6: In if (grepl("^http", x)) { ... :
  the condition has length > 1 and only the first element will be used
7: In grepl("<|>", x) : input string 5 is invalid in this locale
8: In grepl("<|>", x) : input string 7 is invalid in this locale
9: In grepl("<|>", x) : input string 10 is invalid in this locale
10: In grepl("<|>", x) : input string 11 is invalid in this locale
11: In grepl("<|>", x) : input string 12 is invalid in this locale
12: In if (grepl("<|>", x)) { ... :
  the condition has length > 1 and only the first element will be used
```
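For what it's worth, the garbled title looks like Shift_JIS bytes being decoded as some other encoding: the ASCII characters that show through (`I`, `t`, `@`, `[`) are the second bytes of the Shift_JIS pairs for "オファー". Below is a minimal base R sketch of converting such bytes explicitly with `iconv`. It is not what rvest does internally. It assumes the platform's `iconv` recognises the encoding name `SHIFT_JIS`, and the variable names are only illustrative.

```r
# Shift_JIS bytes for the first four characters of the page title, "オファー".
# The second byte of each pair is ASCII "I", "t", "@", "[", which matches
# the characters visible in the garbled output.
sjis_bytes <- as.raw(c(0x83, 0x49, 0x83, 0x74, 0x83, 0x40, 0x81, 0x5B))
sjis_text <- rawToChar(sjis_bytes)

# The raw bytes are not valid UTF-8, so declaring them as UTF-8 cannot help.
is_utf8 <- validUTF8(sjis_text)

# Convert explicitly from Shift_JIS to UTF-8.
# Assumption: this platform's iconv accepts the name "SHIFT_JIS".
utf8_text <- iconv(sjis_text, from = "SHIFT_JIS", to = "UTF-8")

print(is_utf8)
print(utf8_text == "\u30aa\u30d5\u30a1\u30fc")  # compare with "オファー"
```

If the conversion succeeds, the same approach could in principle be applied to the lines returned by `readLines(url)` before parsing, though whether that is needed depends on the locale and the rvest version.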
tidyverse/rvest@4eb5a8d