add in english names to jpnprefs dataset #21
@@ -14,6 +14,10 @@ library(tidyverse)
# dplyr # 0.7.6
# tidyr # 0.8.1
# purrr # 0.2.5
# (stringr) # 1.3.1

library(polite) # 0.0.0.9004
uribo
Sep 20, 2018
Owner
Why should we introduce the polite package?
This package is certainly useful, but it has not yet been released on CRAN.
Ryo-N7
Sep 20, 2018
Author
Contributor
yeah, I suppose it's not entirely necessary to use this package for now. We're only scraping from Wikipedia anyways. I use it as part of my workflow but I understand from a package development/maintenance point of view that it's not necessary.
We can just replace it with the regular rvest code instead:
library(rvest)

# Prefecture table (kanji, region, major island)
url <- "https://en.wikipedia.org/wiki/Prefectures_of_Japan"
jpn_pref_raw <- read_html(url) %>%
  html_nodes("table.wikitable:nth-child(49)") %>%
  html_table() %>%
  purrr::flatten_df()

# Population table (prefecture and capital names), read from url2
url2 <- "https://en.wikipedia.org/wiki/List_of_Japanese_prefectures_by_population"
jpn_pref2_raw <- read_html(url2) %>%
  html_nodes("table.wikitable:nth-child(7)") %>%
  html_table() %>%
  purrr::flatten_df()
  purrr::flatten_df()

jpn_pref_df <- jpn_pref_raw %>%
  janitor::clean_names() %>%
uribo
Sep 20, 2018
Owner
I don't see a good reason to use janitor for this step.
Could you switch to dplyr::select(), which selects and renames the variables explicitly?
Ryo-N7
Sep 20, 2018
Author
Contributor
Yeah, sure! I'll do it similar to how you set the names for the Japanese table using set_colnames():
jpn_pref_df <- jpn_pref_raw %>%
  select(2, 4, 5) %>%
  set_colnames(c("kanji", "region_en", "major_island_en")) %>%
  mutate(region_en = region_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"))
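For comparison, the select-and-rename form asked about in the review can do both steps in one call, since dplyr::select() accepts `new_name = old_name` pairs. This is only a minimal sketch on a made-up tibble: the object names ending in `_toy` and the source column names (`Kanji`, `Region`, `Major Island`) are assumptions for illustration, not the actual headers of the scraped Wikipedia table.

```r
library(dplyr)

# Made-up stand-in for the scraped table; the column names are assumptions
jpn_pref_raw_toy <- tibble::tibble(
  Prefecture     = c("Aichi", "Akita"),
  Kanji          = c("愛知県", "秋田県"),
  Region         = c("Chubu", "Tohoku"),
  `Major Island` = c("Honshu", "Honshu")
)

# select() keeps only the named columns and renames them in the same step
jpn_pref_df_toy <- jpn_pref_raw_toy %>%
  select(kanji = Kanji,
         region_en = Region,
         major_island_en = `Major Island`)

names(jpn_pref_df_toy)
```

Selecting by name rather than by position (2, 4, 5) would also fail loudly if the Wikipedia table layout changed, instead of silently picking up the wrong columns.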
  purrr::flatten_df()

jpn_pref2_df <- jpn_pref2_raw %>%
  janitor::clean_names() %>%
uribo
Sep 20, 2018
Owner
Same as above.
Ryo-N7
Sep 20, 2018
Author
Contributor
jpn_pref2_df <- jpn_pref2_raw %>%
  select(3, 2, 4) %>%
  set_colnames(c("kanji", "prefecture_en", "capital_en")) %>%
  mutate(prefecture_en = prefecture_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"),
         capital_en = capital_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"))
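One caveat on the iconv() step: the result of `to = "ASCII//TRANSLIT"` depends on the platform's iconv implementation, so a macron vowel may come back as a plain letter, a substitute character such as "?", or NA. A rough way to check the output before saving the dataset is sketched below. The sample values and the names `region`, `region_ascii` and `needs_review` are made up for illustration.

```r
# Made-up sample values with macrons, written as \u escapes so the
# script parses the same way regardless of the file encoding
region <- c("Ch\u016bbu", "T\u014dhoku", "Kant\u014d")

# Same transliteration call as in the snippet above
region_ascii <- iconv(region, from = "UTF-8", to = "ASCII//TRANSLIT")

# Flag values that are NA or contain anything other than letters,
# spaces, or hyphens (e.g. "?" inserted by some iconv implementations)
needs_review <- is.na(region_ascii) | !grepl("^[A-Za-z -]+$", region_ascii)

data.frame(region, region_ascii, needs_review)
```

If any row is flagged, it may be safer to fix those names by hand than to rely on the transliteration.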
Thanks for your contribution :)
When working with the jpndistrict package I often find myself needing the English names of the prefectures, capitals, regions, etc. For the convenience of other non-Japanese users of this package, I have added the English names for the prefecture, prefecture capital, region, and major island, taken from Wikipedia, to the "jpnprefs" dataset. I have also updated the tests.
Please review the changes at your convenience, thanks!