the demo("bench-merge") cannot be run for various reasons #1487

Closed
knbknb opened this issue Oct 30, 2015 · 2 comments

@knbknb knbknb commented Oct 30, 2015

This command:

demo(package = .packages(all.available = TRUE))  

yields

Demos in package ‘dplyr’:
bench-merge                         Benchmark merging between R and python
bench-rbind                         Benchmark various flavours of rbind
bench-set                           Benchmark set operations on data frames

However, I could not run the "bench-merge" demo as an ordinary user.
(The other two demos work properly, though.)

I was almost able to get bench-merge to run.

I know some Python, so I already had a working pandas module installed.
In R, I first had to install a missing package, microbenchmark. R told me what was needed.
Then I had to create a subdirectory demo/pandas in the package directory, because this directory did not exist.
I also had to issue a setwd("${r_pkg_directory}/dplyr/demo/").
Then I had to clone the git repository to get demo/pandas/bench_merge.py, because this .py file does not get installed by install.packages("dplyr"). Then I copied pandas.py from the cloned repo to ${r_pkg_directory}/dplyr/demo/pandas.
I also installed the development version of dplyr because I hoped that would give me all missing files.
Then I was able to run the demo, but now R segfaults.
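
The manual checks in the steps above can be sketched in R. This is a minimal sketch, not part of dplyr: `check_demo_prereqs` is a hypothetical helper, the package list is taken from the `library()` calls in the demo transcript, and the `pandas` subdirectory name is taken from the `write.csv()` paths the demo uses.

```r
# Report which prerequisites of a package demo are present.
# `check_demo_prereqs` is a hypothetical helper, not a dplyr function.
check_demo_prereqs <- function(pkg = "dplyr",
                               needed = c("data.table", "microbenchmark", "reshape2"),
                               subdir = "pandas") {
  # system.file() returns "" when the package or the directory is not found.
  demo_dir <- system.file("demo", package = pkg)

  # requireNamespace() returns FALSE (without an error) for packages that are not installed.
  installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

  list(
    demo_dir     = demo_dir,
    has_demo_dir = nzchar(demo_dir),
    has_subdir   = nzchar(demo_dir) && dir.exists(file.path(demo_dir, subdir)),
    missing_pkgs = needed[!installed]
  )
}

str(check_demo_prereqs())
```

If `has_subdir` is `FALSE`, the demo's `write.csv(left, "pandas/left.csv", ...)` call has no directory to write into, which matches the need to create demo/pandas by hand.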

I think the easiest workaround would be to change the "description" line from
 Benchmark merging between R and python

to something like

 Benchmark merging between R and python (internal demo, for developers only)

Some info about my computing environment.

packageVersion("dplyr")
[1] ‘0.4.3.9000’
R> getwd()
[1] "/home/knb/code/git/dplyr/demo"
R> demo("bench-merge")


    demo(bench-merge)
    ---- ~~~~~~~~~~~

Type  <Return>   to start : 

R> # Compare base, data table, dplyr and pandas
R> #
R> # To install pandas on OS X:
R> # * brew update && brew install python
R> # * pip install --upgrade setuptools
R> # * pip install --upgrade pip
R> # * pip install pandas
R> 
R> library(dplyr)

R> library(data.table)

R> library(microbenchmark)

R> library(reshape2)

R> set.seed(1014)

R> # Generate sample data ---------------------------------------------------------
R> 
R> random_strings <- function(n, m) {
+   mat <- matrix(sample(letters, m * n, rep = TRUE), ncol = m)
+   apply(mat, 1, paste, collapse = "")
+ }

R> N <- 10000

R> indices  <- random_strings(N, 10)

R> indices2 <- random_strings(N, 10)

R> left <- data.frame(
+   key = rep(indices[1:8000], 10),
+   key2 = rep(indices2[1:8000], 10),
+   value = rnorm(80000)
+ )

R> right <- data.frame(
+   key = indices[2001:10000],
+   key2 = indices2[2001:10000],
+   value2 = rnorm(8000)
+ )

R> write.csv(left, "pandas/left.csv", row.names = FALSE)

R> write.csv(right, "pandas/right.csv", row.names = FALSE)

R> # Equivalent functions for each technique --------------------------------------
R> 
R> base <- list(
+   setup = function(x, y) list(x = x, y = y),
+   
+   left  = function(x, y) base::merge(x, y, all.x = TRUE),
+   right = function(x, y) base::merge(x, y, all.y = TRUE),
+   inner = function(x, y) base::merge(x, y)
+ )

R> data.table <- list(
+   setup = function(x, y) {
+     list(
+       x = data.table(x, key = c("key", "key2")),
+       y = data.table(y, key = c("key", "key2"))
+     )
+   },
+   
+   left  = function(x, y) x[y],
+   right = function(x, y) y[x],
+   inner = function(x, y) merge(x, y, all = FALSE)
+ )

R> dplyr <- list(
+   setup = function(x, y) list(x = x, y = y),
+   
+   left  = function(x, y) left_join(x, y, by = c("key", "key2")),
+   right = function(x, y) NULL,
+   inner = function(x, y) inner_join(x, y, by = c("key", "key2"))
+ )

R> techniques <- list(base = base, data.table = data.table, dplyr = dplyr)

R> # Aggregate results ------------------------------------------------------------
R> 
R> niter <- 10

R> r <- lapply(names(techniques), function(nm) {
+   tech <- techniques[[nm]]
+   df <- tech$setup(left, right)
+   m <- microbenchmark(
+     left = tech$left(df$x, df$y),
+     right = tech$right(df$x, df$y),
+     inner = tech$inner(df$x, df$y),
+     times = niter
+   )
+   
+   means <- tapply(m$time, m$expr, FUN = mean) / 1e9
+   data.frame(type = names(means), mean = means, tech = nm, 
+     row.names = NULL, stringsAsFactors = FALSE)
+ })

 *** caught segfault ***
address 0x2710, cause 'memory not mapped'

Traceback:
 1: .Call("dplyr_left_join_impl", PACKAGE = "dplyr", x, y, by_x,     by_y)
 2: left_join_impl(x, y, by$x, by$y)
 3: left_join.tbl_df(tbl_df(x), y, by = by, copy = copy, ...)
 4: left_join(tbl_df(x), y, by = by, copy = copy, ...)
 5: as.data.frame(left_join(tbl_df(x), y, by = by, copy = copy, ...))
 6: left_join.data.frame(x, y, by = c("key", "key2"))
 7: left_join(x, y, by = c("key", "key2"))
 8: tech$left(df$x, df$y)
 9: microbenchmark(left = tech$left(df$x, df$y), right = tech$right(df$x,     df$y), inner = tech$inner(df$x, df$y), times = niter)
10: FUN(X[[i]], ...)
11: lapply(names(techniques), function(nm) {    tech <- techniques[[nm]]    df <- tech$setup(left, right)    m <- microbenchmark(left = tech$left(df$x, df$y), right = tech$right(df$x,         df$y), inner = tech$inner(df$x, df$y), times = niter)    means <- tapply(m$time, m$expr, FUN = mean)/1e+09    data.frame(type = names(means), mean = means, tech = nm,         row.names = NULL, stringsAsFactors = FALSE)})
12: eval(expr, envir, enclos)
13: eval(ei, envir)
14: withVisible(eval(ei, envir))
15: source(available, echo = echo, max.deparse.length = Inf, keep.source = TRUE,     encoding = encoding)
16: demo("bench-merge")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 2
Warning messages:
1: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
2: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
3: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
4: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
5: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
6: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
7: In left_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
8: In left_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] colorout_1.1-1
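
The "joining factors with different levels" warnings in the transcript are consistent with `data.frame()` converting the string keys to factors, which was the default before R 4.0.0 (the report uses R 3.2.2). The following base-R sketch uses small hypothetical stand-ins for the demo's `left` and `right` tables to show the mismatch in levels, and how character keys avoid it. Whether this has any bearing on the segfault is not established here.

```r
# Small hypothetical stand-ins for the demo's `left` and `right` tables.
keys <- c("aaa", "bbb", "ccc", "ddd")

# stringsAsFactors = TRUE mimics the data.frame() default of R < 4.0.0.
left_f  <- data.frame(key = keys[1:3], value  = 1:3, stringsAsFactors = TRUE)
right_f <- data.frame(key = keys[2:4], value2 = 4:6, stringsAsFactors = TRUE)

# Each key column is a factor, and the two level sets differ.
same_levels <- identical(levels(left_f$key), levels(right_f$key))

# Building the frames with character keys sidesteps the factor coercion.
left_c  <- data.frame(key = keys[1:3], value  = 1:3, stringsAsFactors = FALSE)
right_c <- data.frame(key = keys[2:4], value2 = 4:6, stringsAsFactors = FALSE)

# Inner join on the shared key, using base R only.
merged <- merge(left_c, right_c, by = "key")
print(same_levels)
print(merged)
```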

@hadley hadley added this to the 0.5 milestone Mar 1, 2016

@hadley hadley commented Mar 1, 2016

Fixed in eb45d04

@lock lock bot commented Sep 16, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Sep 16, 2018