map_df fails on mapping functions that generate single rows #179
|
Related with this issue: tidyverse/dplyr#1450. |
|
More threads on the general topic of "it can be hard to glue conformable things together row-wise": tidyverse/dplyr#1104, #112. I must admit I haven't really explored the Here's another
library(purrr)
alist <- list(
data.frame(a = 1:2, b = 6:7),
data.frame(a = 3:5, b = 8:10)
)
alist %>%
transpose() %>%
map_df(. %>% map_dbl(mean))
#> Source: local data frame [2 x 2]
#>
#> a b
#> (dbl) (dbl)
#> 1 1.5 6.5
#> 2 4.0 9.0 |
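For readers wondering why the transposed version binds cleanly, here is a base R sketch of the same computation. It assumes every data frame in the list has the same column names. `col_names` and `means` are names made up for this illustration and are not part of purrr.

```r
# Same input as in the comment above
alist <- list(
  data.frame(a = 1:2, b = 6:7),
  data.frame(a = 3:5, b = 8:10)
)

# Assumption: all data frames share the columns of the first one
col_names <- names(alist[[1]])

# For each column name, collect one mean per data frame.
# The result is a named list of equal-length vectors, i.e. columns.
means <- lapply(col_names, function(nm) {
  vapply(alist, function(d) mean(d[[nm]]), numeric(1))
})
names(means) <- col_names

# A named list of equal-length vectors converts directly to a data frame
result <- as.data.frame(means)
print(result)
```

Because each element of `means` is a whole column rather than a single row, no row binding is needed at all.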
|
Coerce a
map_df(alist, ~as.list(colMeans(.x)), .id = "df")
#> Source: local data frame [2 x 3]
#>
#> df a b
#> (chr) (dbl) (dbl)
#> 1 1 5.5 5.5
#> 2 2 25.5 35.5
@jennybc your example does not work for me:
alist %>% transpose() %>% map_df(. %>% map_dbl(mean))
#> Rcpp::exception in 'eval(expr, envir, enclos)':
#> cannot convert object to a data frame
|
|
@artemklevtsov Hmm... I just successfully ran my example again in a clean R session. FWIW I've got |
|
@jennybc I used the stable version. Anyway UPD: with the git version, the same error. This is equivalent to:
alist %>% transpose() %>% map(. %>% map_dbl(mean)) %>% dplyr::bind_rows()
But UPD2: works with |
|
@artemklevtsov Well there still must be some version mismatch between us, because it works for me. I'm not making it up! |
|
The thing is that Adirectional vectors are normally ambiguous as to their nature: they can be either a row vector or a column vector. It looks like the development version of dplyr considers them to be column vectors:
row_vectors <- list(
c(a = 1, b = 2),
c(a = 3, b = 4)
)
col_vectors <- list(
a = c(1, 2),
b = c(3, 4)
)
bind_rows(row_vectors)
#> Error: cannot convert object to a data frame
bind_rows(col_vectors)
#> Source: local data frame [2 x 2]
#>
#> a b
#> (dbl) (dbl)
#> 1 1 3
#> 2 2 4
This is because a named list of vectors is a dataframeable object. I think that behaviour with vectors is a bit weird because
That's a case where it makes sense to use
alist %>% map_df(. %>% dmap(mean))
#> Source: local data frame [2 x 2]
#>
#> a b
#> (dbl) (dbl)
#> 1 1.5 6.5
#> 2 4.0 9.0 |
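The row/column ambiguity described above can be seen without dplyr. A minimal base R sketch, where `v`, `as_row` and `as_col` are illustrative names only:

```r
# A named atomic vector carries no orientation of its own
v <- c(a = 1, b = 2)

# Reading it as a row: each element becomes its own column
as_row <- as.data.frame(as.list(v))

# Reading it as a column: the elements become rows of one column
as_col <- data.frame(value = v)

print(dim(as_row))  # 1 row, 2 columns
print(dim(as_col))  # 2 rows, 1 column
```

Both readings are legitimate, which is why a row-binding function has to pick a convention or ask the caller to convert explicitly, for example with `as.list()`.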
|
@lionel- I think the dmap solution is the one that wins here, although the others are great. But this makes the most sense in terms of generalizing beyond colMeans to other functions. I see the conformability row v. col vector argument. Just seemed like sensible behaviour if a data frame came in, a single row was produced, that it would be a single row of a data frame out. But - I recognize that's making a lot of assumptions that might not be valid, even with map_df. Thanks for a great discussion, all! This was illuminating. |
|
Final note about performance: library(purrr)
library(microbenchmark)
list_dfs <- lapply(1:100, function(...) as.data.frame(replicate(10, runif(1000))))
microbenchmark(
map_df(list_dfs, ~as.list(colMeans(.x))),
map_df(transpose(list_dfs), . %>% map_dbl(mean)),
map_df(list_dfs, . %>% dmap(mean)))
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> map_df(list_dfs, ~as.list(colMeans(.x))) 10.465237 10.879279 12.754523 12.43822 12.870534 21.255816 100 b
#> map_df(transpose(list_dfs), . %>% map_dbl(mean)) 4.604981 4.712693 5.476855 4.79528 5.652353 9.876513 100 a
#> map_df(list_dfs, . %>% dmap(mean)) 20.829029 21.324822 23.918506 21.57097 23.288950 97.466169 100 c
To improve @jennybc's solution:
mean2 <- function(x) sum(x) / length(x)
microbenchmark(
map_df(transpose(list_dfs), . %>% map_dbl(mean)),
map_df(transpose(list_dfs), . %>% map_dbl(mean2)))
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> map_df(transpose(list_dfs), . %>% map_dbl(mean)) 5.268310 5.326473 5.797630 5.384035 5.540218 8.678301 100 b
#> map_df(transpose(list_dfs), . %>% map_dbl(mean2)) 2.214055 2.266207 2.495217 2.317600 2.366828 4.773380 100 a |
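One caveat on `mean2`, sketched under the assumption that the inputs are plain numeric vectors: it skips the method dispatch and argument handling that `mean()` performs, so the two agree on clean data, but `mean2` has no `na.rm` option. The variable names `x`, `x_na` and `agree` are made up for this check.

```r
mean2 <- function(x) sum(x) / length(x)

set.seed(1)
x <- runif(1000)

# On clean numeric input the two functions agree up to floating-point tolerance
agree <- isTRUE(all.equal(mean(x), mean2(x)))
print(agree)

# With a missing value, mean() can drop it on request; mean2() cannot
x_na <- c(x, NA)
print(mean(x_na, na.rm = TRUE))
print(mean2(x_na))  # NA, because sum() propagates the missing value
```

So the speed-up is reasonable for a benchmark like the one above, but the shortcut is not a drop-in replacement when the data may contain missing values.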
|
This is educational! I feel like I still struggle with row binding with @lionel- How would you do this with
x <- dplyr::data_frame(int = 1:3,
let = letters[int],
fac = factor(let),
dbl = int + 0.1)
plyr::ldply(x, class, .id = "var_name")
#> var_name V1
#> 1 int integer
#> 2 let character
#> 3 fac factor
#> 4 dbl numeric |
Hmm, I'd probably use tidyr:
dmap(x, class) %>% tidyr::gather()
#> Source: local data frame [4 x 2]
#>
#> key value
#> (chr) (chr)
#> 1 int integer
#> 2 let character
#> 3 fac factor
#> 4 dbl numeric |
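For comparison, the same two-column summary can be sketched in base R. This is only an alternative illustration, not the approach suggested above. It builds `x` with `data.frame()` rather than `dplyr::data_frame()`, and it takes the first class of each column, which is an assumption that matters for columns with more than one class.

```r
# Rebuild the example data with base R only
x <- data.frame(int = 1:3, let = letters[1:3], stringsAsFactors = FALSE)
x$fac <- factor(x$let)
x$dbl <- x$int + 0.1

# One class string per column; class(col)[1] guards against multi-class columns
classes <- vapply(x, function(col) class(col)[1], character(1))

# Assemble the long-format result: one row per column of x
result <- data.frame(
  var_name = names(classes),
  class = unname(classes),
  stringsAsFactors = FALSE
)
print(result)
```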
|
Is this a good example of where one needs to use dmap, and NOT map_df? I have URLs I want to scrape, which are in a data frame, and when I try to scrape them, map_df fails because of bind_rows:
pacman::p_load("httr","magrittr", "dplyr", "purrr","rvest")
get_page <- function(i=1, pb=NULL){
if (!is.null(pb)) pb$tick()$print()
result = POST(data$URL[[i]])
stop_for_status(result)
A = content(result, as="parsed", encoding = "iso-8859-1")
#Assign output to table$link new col
#data$SCRAPED_NAME[[i]] <-
A %>%
html_nodes(css ="#_brand4 span") %>%
html_text()
i <- i+1
}
data = data.frame(
"URL" = c("https://www.ratebeer.com/beer/8481","https://www.ratebeer.com/beer/3228/"),
"SCRAPED_NAME" = NA, stringsAsFactors = FALSE
)
debug(get_page)
finaldf = map_df(1:(length(data$URL)),get_page)
#> debugging in: .f(.x[[i]], ...)
#> debug at <text>#2: {
#> if (!is.null(pb))
#> pb$tick()$print()
#> result = POST(data$URL[[i]])
#> stop_for_status(result)
#> A = content(result, as = "parsed", encoding = "iso-8859-1")
#> A %>% html_nodes(css = "#_brand4 span") %>% html_text()
#> i <- i + 1
#> }
#> debug at <text>#3: if (!is.null(pb)) pb$tick()$print()
#> debug at <text>#4: result = POST(data$URL[[i]])
#> debug at <text>#5: stop_for_status(result)
#> debug at <text>#6: A = content(result, as = "parsed", encoding = "iso-8859-1")
#> debug at <text>#10: A %>% html_nodes(css = "#_brand4 span") %>% html_text()
#> debug at <text>#13: i <- i + 1
#> exiting from: .f(.x[[i]], ...)
#> debugging in: .f(.x[[i]], ...)
#> debug at <text>#2: {
#> if (!is.null(pb))
#> pb$tick()$print()
#> result = POST(data$URL[[i]])
#> stop_for_status(result)
#> A = content(result, as = "parsed", encoding = "iso-8859-1")
#> A %>% html_nodes(css = "#_brand4 span") %>% html_text()
#> i <- i + 1
#> }
#> debug at <text>#3: if (!is.null(pb)) pb$tick()$print()
#> debug at <text>#4: result = POST(data$URL[[i]])
#> debug at <text>#5: stop_for_status(result)
#> debug at <text>#6: A = content(result, as = "parsed", encoding = "iso-8859-1")
#> debug at <text>#10: A %>% html_nodes(css = "#_brand4 span") %>% html_text()
#> debug at <text>#13: i <- i + 1
#> exiting from: .f(.x[[i]], ...)
#> Error in bind_rows_(x, .id): cannot convert object to a data frame |
|
It's better if you create a minimal reprex rather than a complex example. I think this should now work if you install the dev version of dplyr. |
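One thing worth checking in the scraping example: the last expression in `get_page` is `i <- i + 1`, so the function returns a number rather than the scraped text, and there is nothing row-like to bind. The sketch below shows the general shape of a mapper that returns a one-row data frame. `fake_scrape` is a hypothetical stand-in for the HTTP request and HTML parsing, `get_row` is an illustrative name, and the example.com URLs are placeholders.

```r
# Hypothetical stand-in for the request and parsing step (no network access)
fake_scrape <- function(url) paste0("name-for-", basename(url))

# Placeholder URLs, not the ones from the question
urls <- c("https://example.com/beer/1", "https://example.com/beer/2")

# The mapped function returns a one-row data frame as its last expression
get_row <- function(url) {
  data.frame(
    URL = url,
    SCRAPED_NAME = fake_scrape(url),
    stringsAsFactors = FALSE
  )
}

# Base R row binding; each element is already a data frame, so this is unambiguous
result <- do.call(rbind, lapply(urls, get_row))
print(result)
```

Mapping over the URLs themselves, instead of over an index into a global data frame, also removes the need for the `i` counter.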
|
Sorry for thread-rezzing, but I'd like to add my own explorations, maybe |
If a map function creates a single row of output, map_df will fail in making a data frame. One can force it to work by making a data frame of the transpose of the output, but that seems like an unnecessary PITA.
While there may be a different function to use, this strikes me as behavior that is not quite sensible.
Example:
alist <- list(
data.frame(a=1:10, b=1:10),
data.frame(a=21:30, b=31:40)
)
returns normally
map_df(alist, .f = function(x) x+1)
returns Error: cannot convert object to a data frame
map_df(alist, .f = colMeans)
returns with the proper output
map_df(alist, function(x) data.frame(t(colMeans(x))))
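The workaround in the last line can be written without `t()` by converting the named vector to a list first. A base R sketch of that idea, using `do.call(rbind, ...)` in place of map_df so that it does not depend on any particular purrr or dplyr version. `one_row` and `res` are names invented for the illustration.

```r
alist <- list(
  data.frame(a = 1:10, b = 1:10),
  data.frame(a = 21:30, b = 31:40)
)

# colMeans() returns a named numeric vector; as.list() makes each
# element a separate column, so the result is a one-row data frame
one_row <- function(x) as.data.frame(as.list(colMeans(x)))

# Bind the one-row data frames together
res <- do.call(rbind, lapply(alist, one_row))
print(res)
```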