Implementation of list() that uses NSE to name components lazily #1290

Closed
jennybc opened this issue Jul 28, 2015 · 5 comments
@jennybc
Member

jennybc commented Jul 28, 2015

Opening at @hadley's suggestion on Twitter. Relevant to a triplicated question on Stack Overflow. Overlaps with the store() proposal by @lionel- over in purrr.

If list components could get named at creation time, that would be handy.

thing_one <- head(iris, 3)
thing_two <- head(iris, 3)

## our current options
thing_list <- list(thing_one = thing_one, thing_two = thing_two)
thing_list <- mget(c("thing_one", "thing_two"))
str(thing_list, max.level = 1)
## List of 2
##  $ thing_one:'data.frame':   3 obs. of  5 variables:
##  $ thing_two:'data.frame':   3 obs. of  5 variables:
## names can be so handy in downstream operations, e.g.
plyr::ldply(thing_list, .id = "came_from")
##   came_from Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 thing_one          5.1         3.5          1.4         0.2  setosa
## 2 thing_one          4.9         3.0          1.4         0.2  setosa
## 3 thing_one          4.7         3.2          1.3         0.2  setosa
## 4 thing_two          5.1         3.5          1.4         0.2  setosa
## 5 thing_two          4.9         3.0          1.4         0.2  setosa
## 6 thing_two          4.7         3.2          1.3         0.2  setosa
## wishful thinking
#thing_list <- list(thing_one, thing_two)
#thing_list <- magical_list_fxn("^thing_")
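For the pattern-based wish on the last line, base R can already get close by combining `ls(pattern = )` with `mget()`. The sketch below is only an illustration: `list_by_pattern()` is a hypothetical name, not a function in dplyr or purrr, and it assumes the objects live in the calling environment.

```r
# Hypothetical helper (not part of any package): collect every object whose
# name matches `pattern` into a named list.
list_by_pattern <- function(pattern, envir = parent.frame()) {
  # ls() returns the matching object names; mget() looks them up and
  # uses those names as the names of the resulting list.
  nms <- ls(envir = envir, pattern = pattern)
  mget(nms, envir = envir)
}

thing_one <- head(iris, 3)
thing_two <- head(iris, 3)

thing_list <- list_by_pattern("^thing_")
str(thing_list, max.level = 1)
```

One caveat of this approach: the pattern matches any object, so once `thing_list` itself exists, a second call with `"^thing_"` would pick it up as well.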
@lionel-
Member

lionel- commented Jul 28, 2015

Indeed, dplyr may well be a better fit than purrr for store() and a list method for mutate_().

@jimhester
Contributor

I think this does the trick?

auto_list <- function(...) {
  lapply(dplyr:::named_dots(...), eval)
}

a <- 5
auto_list(a, 1:10, c = 'hi')

# $a
# [1] 5
# 
# $`1:10`
#  [1]  1  2  3  4  5  6  7  8  9 10
# 
# $c
# [1] "hi"

auto_list(thing_one, thing_two)
# $thing_one
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 
# $thing_two
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
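Since `dplyr:::named_dots()` is an unexported internal, the same naming behaviour can be sketched with base R only. This is an assumption-laden illustration, not dplyr's implementation: `auto_list_base()` is a hypothetical name, and it evaluates the arguments eagerly with `list(...)` rather than lazily.

```r
# Hypothetical base-R variant: name unnamed components after the
# expression that was supplied for them.
auto_list_base <- function(...) {
  # Capture the unevaluated argument expressions; drop the leading `list`.
  exprs <- as.list(substitute(list(...)))[-1L]
  # Evaluate the arguments in the usual way.
  values <- list(...)

  nms <- names(exprs)
  if (is.null(nms)) nms <- rep("", length(exprs))

  # Fill in missing names with the deparsed expression.
  unnamed <- nms == ""
  nms[unnamed] <- vapply(
    exprs[unnamed],
    function(e) paste(deparse(e), collapse = " "),
    character(1)
  )

  names(values) <- nms
  values
}

a <- 5
names(auto_list_base(a, 1:10, c = "hi"))
# [1] "a"    "1:10" "c"
```

Explicit names win, so `c = "hi"` keeps the name `c`; only the unnamed arguments fall back to their deparsed expression.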

@hadley hadley added the feature a feature request or enhancement label Aug 24, 2015
@hadley hadley added this to the 0.5 milestone Aug 24, 2015
@hadley
Member

hadley commented Oct 29, 2015

I think this should be as similar as possible to data_frame(), and hopefully we can make it so that

data_frame <- function(...) {
  as_data_frame(auto_list(...))
}

works. This would improve the design of data_frame and as_data_frame by moving the currently duplicated checks into one place.

But what's a good name? lst()?

@jimhester
Contributor

This provides the same functionality as my previous implementation and uses lazyeval-style NSE.

lst <- function(...) {
  lst_(lazyeval::lazy_dots(...))
}

lst_ <- function(columns) {
  lazyeval::lazy_eval(lazyeval::auto_name(columns))
}

thing_one <- head(iris, 3)
thing_two <- head(iris, 3)

lst(thing_one, thing_two)
#> $thing_one
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 
#> $thing_two
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa

Also seems to work fine as a replacement for data_frame() (with limited testing).

data_frame2 <- function(...) {
  as_data_frame(lst(...))
}

data_frame2(a=1:10, b=letters[1:10])
#> Source: local data frame [10 x 2]
#> 
#>     a b
#> 1   1 a
#> 2   2 b
#> 3   3 c
#> 4   4 d
#> 5   5 e
#> 6   6 f
#> 7   7 g
#> 8   8 h
#> 9   9 i
#> 10 10 j

data_frame2(a=1:10, b="a")
#> Error: Columns are not all same length

@hadley
Member

hadley commented Oct 29, 2015

In the midst of refactoring data_frame_ into two pieces.

@hadley hadley closed this as completed in 3d254d6 Oct 29, 2015
krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016
Make data_frame() and as_data_frame() more consistent. Improve error messages.

Fixes tidyverse#1290
krlmlr pushed a commit to tidyverse/tibble that referenced this issue Mar 22, 2016
- Initial CRAN release

- Extracted from `dplyr` 0.4.3

- Exported functions:
    - `tbl_df()`
    - `as_data_frame()`
    - `data_frame()`, `data_frame_()`
    - `frame_data()`, `tibble()`
    - `glimpse()`
    - `trunc_mat()`, `knit_print.trunc_mat()`
    - `type_sum()`
    - New `lst()` and `lst_()` create lists in the same way that
      `data_frame()` and `data_frame_()` create data frames (tidyverse/dplyr#1290).
      `lst(NULL)` doesn't raise an error (#17, @jennybc), but always
      uses deparsed expression as name (even for `NULL`).
    - New `add_row()` makes it easy to add a new row to data frame
      (tidyverse/dplyr#1021).
    - New `rownames_to_column()` and `column_to_rownames()` (#11, @zhilongjia).
    - New `has_rownames()` and `remove_rownames()` (#44).
    - New `repair_names()` fixes missing and duplicate names (#10, #15,
      @r2evans).
    - New `is_vector_s3()`.

- Features
    - New `as_data_frame.table()` with argument `n` to control name of count
      column (#22, #23).
    - Use `tibble` prefix for options (#13, #36).
    - `glimpse()` now (invisibly) returns its argument (tidyverse/dplyr#1570). It
      is now a generic, the default method dispatches to `str()`
      (tidyverse/dplyr#1325).  The default width is obtained from the
      `tibble.width` option (#35, #56).
    - `as_data_frame()` is now an S3 generic with methods for lists (the old
      `as_data_frame()`), data frames (trivial), matrices (with efficient
      C++ implementation) (tidyverse/dplyr#876), and `NULL` (returns a 0-row
      0-column data frame) (#17, @jennybc).
    - Non-scalar input to `frame_data()` and `tibble()` (including lists)
      creates list-valued columns (#7). These functions return a 0-row but
      n-col data frame if no data is supplied.

- Bug fixes
    - `frame_data()` properly constructs rectangular tables (tidyverse/dplyr#1377,
      @kevinushey).

- Minor modifications
    - Uses `setOldClass(c("tbl_df", "tbl", "data.frame"))` to help with S4
      (tidyverse/dplyr#969).
    - `tbl_df()` automatically generates column names (tidyverse/dplyr#1606).
    - `tbl_df`s gain `$` and `[[` methods that are ~5x faster than the defaults,
      never do partial matching (tidyverse/dplyr#1504), and throw an error if the
      variable does not exist.  `[[.tbl_df()` falls back to regular subsetting
      when used with anything other than a single string (#29).
      `base::getElement()` now works with tibbles (#9).
    - `all_equal()` allows comparing data frames while ignoring row and column order,
      and optionally ignoring minor differences in type (e.g. int vs. double)
      (tidyverse/dplyr#821).  Used by `all.equal()` for tibbles.  (This package
      contains a pure R implementation of `all_equal()`, the `dplyr` code has
      identical behavior but is written in C++ and thus faster.)
    - The internals of `data_frame()` and `as_data_frame()` have been aligned,
      so `as_data_frame()` will now automatically recycle length-1 vectors.
      Both functions give more informative error messages if you are attempting
      to create an invalid data frame.  You can no longer create a data frame
      with duplicated names (tidyverse/dplyr#820).  Both functions now check that
      you don't have any `POSIXlt` columns, and tell you to use `POSIXct` if you
      do (tidyverse/dplyr#813).  `data_frame(NULL)` raises error "must be a 1d
      atomic vector or list".
    - `trunc_mat()` and `print.tbl_df()` are considerably faster if you have
      very wide data frames.  They will now also only list the first 100
      additional variables not already on screen - control this with the new
      `n_extra` parameter to `print()` (tidyverse/dplyr#1161).  The type of list
      columns is printed correctly (tidyverse/dplyr#1379).  The `width` argument is
      used also for 0-row or 0-column data frames (#18).
    - When used in list-columns, S4 objects only print the class name rather
      than the full class hierarchy (#33).
    - Add test that `[.tbl_df()` does not change class (#41, @jennybc).  Improve
      `[.tbl_df()` error message.

- Documentation
    - Update README, with edits (#52, @bhive01) and enhancements (#54,
      @jennybc).
    - `vignette("tibble")` describes the difference between tbl_dfs and
      regular data frames (tidyverse/dplyr#1468).

- Code quality
    - Test using new-style Travis-CI and AppVeyor. Full test coverage (#24,
      #53). Regression tests load known output from file (#49).
    - Renamed `obj_type()` to `obj_sum()`, improvements, better integration with
     `type_sum()`.
    - Internal cleanup.
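The shared checks described in the notes above (recycling length-1 vectors, rejecting duplicated names, rejecting `POSIXlt` columns) can be sketched in base R as a single validation step applied to a named list before it becomes a data frame. This is a rough illustration only: `check_and_recycle()` and its error messages are hypothetical and are not the code or wording used in tibble or dplyr.

```r
# Hypothetical validator: one place for the checks that a list must pass
# before being turned into a data frame.
check_and_recycle <- function(x) {
  stopifnot(is.list(x))
  if (length(x) == 0L) return(x)

  nms <- names(x)
  if (is.null(nms) || any(is.na(nms) | nms == "")) {
    stop("All columns must be named", call. = FALSE)
  }
  if (anyDuplicated(nms) > 0L) {
    stop("Column names must be unique", call. = FALSE)
  }

  # POSIXlt is a list under the hood, so it is rejected up front.
  is_posixlt <- vapply(x, inherits, logical(1), what = "POSIXlt")
  if (any(is_posixlt)) {
    stop("POSIXlt columns are not supported; use POSIXct instead", call. = FALSE)
  }

  # Every column must have either the common length n or length 1.
  lens <- vapply(x, length, integer(1))
  n <- max(lens)
  if (any(lens != n & lens != 1L)) {
    stop("Columns must have length 1 or ", n, call. = FALSE)
  }

  # Recycle length-1 columns to the common length.
  short <- lens == 1L & n != 1L
  x[short] <- lapply(x[short], rep, times = n)
  x
}

cols <- check_and_recycle(list(a = 1:10, b = "a"))
as.data.frame(cols, stringsAsFactors = FALSE)
```

Under this sketch the earlier `data_frame2(a=1:10, b="a")` example would recycle `b` instead of failing, which matches the recycling behaviour the notes describe for the aligned functions.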
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018