Allow matrix and data frame columns #416

Closed
hadley opened this Issue May 31, 2018 · 11 comments

hadley commented May 31, 2018

i.e. relax the length() restriction to be an NROW() restriction
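
A minimal base R sketch of the difference between the two checks, using made-up objects: length() counts every cell of a matrix and the columns of a data frame, whereas NROW() reports the number of rows in all three cases.

```r
# Made-up objects that all have three "rows"
v <- 1:3
m <- matrix(1:6, nrow = 3)
d <- data.frame(a = 1:3, b = 4:6)

# length() disagrees across the three objects ...
lengths_seen <- c(vector = length(v), matrix = length(m), data_frame = length(d))
# ... while NROW() treats a plain vector as a one-column matrix
rows_seen <- c(vector = NROW(v), matrix = NROW(m), data_frame = NROW(d))

lengths_seen  # 3, 6, 2
rows_seen     # 3, 3, 3
```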

krlmlr commented Jun 6, 2018

I've seen tidyverse/dplyr#3630, and I wonder what's the rationale. One advantage I see is that we can support POSIXlt columns (and other related data-frame-like data types). For matrices and data frames, I wonder if as_tibble() could flatten these instead (with a consistent naming scheme) so that the columns remain simple vectors.

brendanf commented Jun 6, 2018

With regard to the rationale, my use case is modeling functions like mlm() and mvabund::manyglm(), which can take a matrix as the LHS of the model formula to represent a multivariate response; the data argument for those functions ideally has the matrix as a column.
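
As a rough illustration of this use case in base R (simulated data; the column names `x`, `Y`, `y1` and `y2` are made up), stats::lm() fits a multivariate model of class "mlm" when the response is a matrix, and that matrix can live as a single column of a plain data frame:

```r
set.seed(1)

# One predictor plus a two-column response stored as a single matrix column
dat <- data.frame(x = 1:10)
dat$Y <- cbind(y1 = dat$x + rnorm(10), y2 = 2 * dat$x + rnorm(10))

# The matrix column is used directly as the LHS of the formula
fit <- lm(Y ~ x, data = dat)

class(fit)      # includes "mlm"
dim(coef(fit))  # one column of coefficients per response
```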

hadley commented Jun 6, 2018

Because I think data frame and matrix columns are an important data structure for reasons that I can't fully articulate yet, but they would be useful in bigrquery (for representing nested, but not repeated, columns) and in data structures for visualisation.

We absolutely do not want to flatten.
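
A small base R sketch of the contrast, with made-up data (`events`, `location`, `lat` and `lon` are illustrative names, not taken from bigrquery): the nested form keeps the record as one column with as many rows as the outer data frame, while flattening spreads it into prefixed top-level columns.

```r
# A nested (but not repeated) record stored as a data frame column
events <- data.frame(id = 1:3)
events$location <- data.frame(
  lat = c(48.1, 51.5, 40.7),
  lon = c(11.6, -0.1, -74.0)
)

ncol(events)           # the whole record is still a single column
NROW(events$location)  # same number of rows as the outer data frame
events$location$lat    # fields are reached through the nested column

# Flattening, by contrast, loses the grouping of the fields
flat <- do.call(data.frame, events)
names(flat)
```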

romainfrancois commented Jun 8, 2018

More from #419, which I'm closing now as a duplicate of this. Sorry for the noise:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(x1 = rep(1:3, times = 3), x2 = 1:9)
df$x3 <- df %>% mutate(x3 = x2)
as_tibble(df)
#> Error: All columns in a tibble must be a 1d vector or a list:
#> * Column `x3` is data.frame

Related to tidyverse/dplyr#3630

Some dplyr code can lead to a tibble with a data frame column, and in that case printing and [ are broken:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(x1 = rep(1:3, times = 3), x2 = 1:9)
df$x3 <- df %>% mutate(x3 = x2)

d <- group_by(df, x1)
# looks ok
str(d)
#> Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  9 obs. of  3 variables:
#>  $ x1: int  1 2 3 1 2 3 1 2 3
#>  $ x2: int  1 2 3 4 5 6 7 8 9
#>  $ x3:'data.frame':  9 obs. of  3 variables:
#>   ..$ x1: int  1 2 3 1 2 3 1 2 3
#>   ..$ x2: int  1 2 3 4 5 6 7 8 9
#>   ..$ x3: int  1 2 3 4 5 6 7 8 9
#>  - attr(*, "groups")=Classes 'tbl_df', 'tbl' and 'data.frame':   3 obs. of  2 variables:
#>   ..$ x1   : int  1 2 3
#>   ..$ .rows:List of 3
#>   .. ..$ : int  1 4 7
#>   .. ..$ : int  2 5 8
#>   .. ..$ : int  3 6 9

# fails
d
#> Error in `[.data.frame`(X[[i]], ...): undefined columns selected

# whereas
as.data.frame(d)[1:3, ]
#>   x1 x2 x3.x1 x3.x2 x3.x3
#> 1  1  1     1     1     1
#> 2  2  2     2     2     2
#> 3  3  3     3     3     3

# that's just weird. `[.grouped_df` is in dplyr
d[1:3, ]
#> # A tibble: 3 x 3
#> # Groups:   x1 [3]
#>      x1    x2 x3                          
#>   <int> <int> <data.frame>                
#> 1     1     1 c(1, 2, 3, 1, 2, 3, 1, 2, 3)
#> 2     2     2 1:9                         
#> 3     3     3 1:9

# but [.tbl_df is in tibble and already does this
class(d) <- c("tbl_df", "tbl", "data.frame")
attr(d,"groups") <- NULL
d[1:3, ]
#> # A tibble: 3 x 3
#>      x1    x2 x3                          
#>   <int> <int> <data.frame>                
#> 1     1     1 c(1, 2, 3, 1, 2, 3, 1, 2, 3)
#> 2     2     2 1:9                         
#> 3     3     3 1:9

# and ... 
d
#> Error in `[.data.frame`(X[[i]], ...): undefined columns selected

romainfrancois added a commit to tidyverse/dplyr that referenced this issue Jun 8, 2018

hadley commented Jun 8, 2018

Ok, looks like we'll also need to make some basic fixes to the printing code.

It's not clear to me how a df-col or matrix-col should print, so for now we just need some placeholder that at least doesn't error.
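
One possible shape for such a placeholder, sketched in base R. The helper name `col_placeholder` and the output format are assumptions made for illustration, not what tibble or pillar actually implement:

```r
# Hypothetical helper: a one-line summary for a column, so that printing
# never has to format the individual cells of a matrix or data frame column
col_placeholder <- function(x) {
  d <- dim(x)
  if (length(d) == 2L) {
    sprintf("<%s [%d x %d]>", class(x)[1L], d[1L], d[2L])
  } else {
    sprintf("<%s [%d]>", class(x)[1L], length(x))
  }
}

col_placeholder(data.frame(a = 1:9, b = 1:9, c = 1:9))
col_placeholder(1:3)
```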

romainfrancois commented Jun 8, 2018

It's not just printing, [ also needs some attention:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(x1 = rep(1:3, times = 3), x2 = 1:9)
df$x3 <- df %>% mutate(x3 = x2)

d <- group_by(df, x1)
str(d)
#> Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  9 obs. of  3 variables:
#>  $ x1: int  1 2 3 1 2 3 1 2 3
#>  $ x2: int  1 2 3 4 5 6 7 8 9
#>  $ x3:'data.frame':  9 obs. of  3 variables:
#>   ..$ x1: int  1 2 3 1 2 3 1 2 3
#>   ..$ x2: int  1 2 3 4 5 6 7 8 9
#>   ..$ x3: int  1 2 3 4 5 6 7 8 9
#>  - attr(*, "groups")=Classes 'tbl_df', 'tbl' and 'data.frame':   3 obs. of  2 variables:
#>   ..$ x1   : int  1 2 3
#>   ..$ .rows:List of 3
#>   .. ..$ : int  1 4 7
#>   .. ..$ : int  2 5 8
#>   .. ..$ : int  3 6 9

d2 <- d[1:2, ]
str(d2)
#> Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  2 obs. of  3 variables:
#>  $ x1: int  1 2
#>  $ x2: int  1 2
#>  $ x3:'data.frame':  9 obs. of  2 variables:
#>   ..$ x1: int  1 2 3 1 2 3 1 2 3
#>   ..$ x2: int  1 2 3 4 5 6 7 8 9
#>  - attr(*, "groups")=Classes 'tbl_df', 'tbl' and 'data.frame':   2 obs. of  2 variables:
#>   ..$ x1   : int  1 2
#>   ..$ .rows:List of 2
#>   .. ..$ : int 1
#>   .. ..$ : int 2

hadley commented Jun 8, 2018

Oh got it - [.tbl_df will need to supply additional missing arguments to [.

hadley commented Jun 8, 2018

Something like this, I think:

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(rlang::missing_arg()), nd - 1L)
    expr <- rlang::expr(x[i, !!!dims, drop = FALSE])
    rlang::eval_bare(expr)
  }
}

subset_ROW(1:10, 4:6)
subset_ROW(array(1:10, c(10)), 4:6)
subset_ROW(array(1:10, c(10, 1)), 4:6)
subset_ROW(array(1:10, c(10, 1, 1)), 4:6)
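
For comparison, roughly the same idea can be sketched without rlang. This variant (the name `subset_rows_base` is made up) passes `TRUE` for each trailing dimension instead of a missing argument, which selects everything along that dimension; it is a swapped-in technique, not the code above.

```r
# Hypothetical base R variant: keep rows `i`, select everything else
subset_rows_base <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    return(x[i])
  }
  # One TRUE per remaining dimension stands in for the empty argument
  other_dims <- rep(list(TRUE), nd - 1L)
  do.call(`[`, c(list(x, i), other_dims, list(drop = FALSE)))
}

subset_rows_base(1:10, 4:6)
dim(subset_rows_base(array(1:10, c(10, 1)), 4:6))
dim(subset_rows_base(array(1:10, c(10, 1, 1)), 4:6))
```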

romainfrancois commented Jun 8, 2018

Nice. I'll probably need to extract the part that generates the expression for internal use in dplyr, e.g. for what is bound to a symbol in the grouped case.

Hopefully multidimensional array columns won't be too much of a thing; matrix and data frame columns are already going to require some grasping.

hadley commented Jun 8, 2018

That's a good point. Rather than being so clever it might be better to do this:

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else if (nd == 2L) {
    x[i, , drop = FALSE]
  } else {
    stop("Only 0, 1, and 2d components are supported", call. = FALSE)
  }
}
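
A sketch of how a helper like this might be applied column by column, so that a data frame column is subset by row and stays in sync with its siblings. It uses base R only; `subset_row` is a local copy of the function above, and the data mirror the earlier reprex with `transform()` standing in for `mutate()`.

```r
# Local copy of the row-subsetting helper, renamed to avoid clashes
subset_row <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else if (nd == 2L) {
    x[i, , drop = FALSE]
  } else {
    stop("Only 0, 1, and 2d components are supported", call. = FALSE)
  }
}

df <- data.frame(x1 = rep(1:3, times = 3), x2 = 1:9)
df$x3 <- transform(df, x3 = x2)  # data frame column, as in the reprex

# Subset every column by row; the nested data frame keeps all its columns
cols <- lapply(df, subset_row, i = 1:2)

vapply(cols, NROW, integer(1))  # every component now has 2 rows
names(cols$x3)                  # inner columns x1, x2, x3 are all retained
```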

romainfrancois added a commit to tidyverse/dplyr that referenced this issue Jun 8, 2018

krlmlr commented Jun 29, 2018

I have a working implementation locally, will push tomorrow. pillar is ready to handle the printing of these tibbles.

@krlmlr krlmlr closed this in c7b49d8 Jun 30, 2018

romainfrancois added a commit to tidyverse/dplyr that referenced this issue Aug 7, 2018

romainfrancois added a commit to tidyverse/dplyr that referenced this issue Aug 7, 2018

romainfrancois added a commit to tidyverse/dplyr that referenced this issue Aug 14, 2018

maxheld83 added a commit to maxheld83/tibble that referenced this issue Aug 23, 2018

romainfrancois added a commit to tidyverse/dplyr that referenced this issue Sep 11, 2018
