Behaviour of features with functions that return argument > length 1 #258

Closed
njtierney opened this issue Aug 19, 2020 · 3 comments
njtierney commented Aug 19, 2020

I've noticed that when passing features a function that outputs something of length > 1, you get a long warning about renaming. I just wanted to check whether this is expected behaviour? It's quite cool that this just works! I can get as many columns as there are outputs from the function. But I wanted to raise it as I hadn't seen it documented.

library(brolgar)
# Passing a vector to `mean` vs `diff` 
# mean
mean(c(1:10))
#> [1] 5.5
# input = vector of length 10
# output = vector of length 1

diff(c(1:10))
#> [1] 1 1 1 1 1 1 1 1 1
# input = vector of length 10
# output = vector of length 9

# so consequently, passing `features` a function that produces a vector
# will give you quite different return formats.

# mean
wages %>%
  features(ln_wages, 
           list(mean = mean))
#> # A tibble: 888 x 2
#>       id  mean
#>    <int> <dbl>
#>  1    31  1.75
#>  2    36  2.33
#>  3    53  1.89
#>  4   122  2.17
#>  5   134  2.48
#>  6   145  1.76
#>  7   155  2.17
#>  8   173  1.93
#>  9   206  2.27
#> 10   207  2.11
#> # … with 878 more rows

# diff
wages %>%
  features(ln_wages, 
           list(diff = diff))
#> New names:
#> * `.?` -> `.?...1`
#> * `.?` -> `.?...2`
#> New names:
#> * `diff_.?...12` -> `diff_.?...13`
#> New names:
#> * `diff_.?...1` -> `diff_.?...2`
#> * `diff_.?...2` -> `diff_.?...3`
#> * `diff_.?...3` -> `diff_.?...4`
#> * `diff_.?...4` -> `diff_.?...5`
#> * `diff_.?...5` -> `diff_.?...6`
#> * ...
#> # A tibble: 888 x 14
#>       id `diff_.?...2` `diff_.?...3` `diff_.?...4` `diff_.?...5` `diff_.?...6`
#>    <int>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
#>  1    31        -0.058         0.036         0.28          0.182        -0.222
#>  2    36        -0.184         0.458         0.317        -0.754         1.11 
#>  3    53        -0.225         1.70         -1.65         -0.03          0.316
#>  4   122         0.806        -1.00         -1.16          1.68         -0.263
#>  5   134         0.095         0.224         0.064        -0.124         0.063
#>  6   145        -0.086         0.12          0.289         0.151        -0.265
#>  7   155         0.302        -0.543         0.627         0.158        -0.593
#>  8   173         0.134         0.283         0.051        -0.011         0.319
#>  9   206         0.269         0.185        NA            NA            NA    
#> 10   207         0.169         0.182        -0.068         0.361        -0.172
#> # … with 878 more rows, and 8 more variables: `diff_.?...7` <dbl>,
#> #   `diff_.?...8` <dbl>, `diff_.?...9` <dbl>, `diff_.?...10` <dbl>,
#> #   `diff_.?...11` <dbl>, `diff_.?...12` <dbl>, diff <dbl>,
#> #   `diff_.?...14` <dbl>

Created on 2020-08-19 by the reprex package (v0.3.0)
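
To make the cause of the `New names:` messages easier to see, here is a minimal base R sketch. It assumes the renaming is triggered by the function returning unnamed values, and that the `NA` entries in the `diff_` columns come from ids with fewer observations. The two series below are made up for illustration and are not taken from `wages`.

```r
# Two made-up series of different lengths (illustrative values only)
short_series <- c(2.03, 2.30, 2.48)
long_series  <- c(1.49, 1.43, 1.47, 1.75, 1.93)

# diff() returns one fewer value than its input, and the result has no names
d_short <- diff(short_series)
d_long  <- diff(long_series)

length(d_short)         # 2
length(d_long)          # 4
is.null(names(d_long))  # TRUE

# range() always returns exactly two values, also without names
is.null(names(range(long_series)))  # TRUE
```

A function whose output length depends on the number of observations would therefore give a different number of values per id, which is consistent with the padded `NA` cells shown above.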

@mitchelloharawild
Member

All values of the vector need names if multiple values will be returned. There shouldn't be this many warnings (one per column is appropriate), but this is how I would use range():

library(brolgar)
# range
wages %>%
  features(ln_wages, 
           list(range = ~ setNames(range(.), c("min", "max"))))
#> # A tibble: 888 x 3
#>       id range_min range_max
#>    <int>     <dbl>     <dbl>
#>  1    31     1.43       2.13
#>  2    36     1.80       2.93
#>  3    53     1.54       3.24
#>  4   122     0.763      2.92
#>  5   134     2.00       2.93
#>  6   145     1.48       2.04
#>  7   155     1.54       2.64
#>  8   173     1.56       2.34
#>  9   206     2.03       2.48
#> 10   207     1.58       2.66
#> # … with 878 more rows

Created on 2020-08-20 by the reprex package (v0.3.0)
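
The same idea can be wrapped in a small reusable function. This is only a sketch: `named_range` is a hypothetical helper, not part of brolgar or fabletools, and it uses base R only.

```r
# Hypothetical helper (not provided by brolgar or fabletools):
# wraps range() so that both returned values carry names.
named_range <- function(x) {
  setNames(range(x), c("min", "max"))
}

# Made-up input values for illustration
x <- c(1.43, 2.13, 1.75)

names(range(x))        # NULL: plain range() is unnamed
names(named_range(x))  # "min" "max"
```

Assuming features accepts a plain function in the same way it accepts the formula above, `list(range = named_range)` should give the same `range_min` and `range_max` columns.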

@njtierney
Author

Neat! Thanks for that! So there isn't currently a way to specify names within features?

Also, if it is helpful, I'd be happy to document this behaviour in fabletools, since I'll be doing it for brolgar as well.

@mitchelloharawild
Member

More documentation is always welcomed.

What do you mean by 'specify names within features'? Do you have a proposed interface improvement?
