feature request: layer_data returns a tibble #3018

Closed
IndrajeetPatil opened this issue Nov 28, 2018 · 17 comments
Comments

@IndrajeetPatil

I often use the layer_ functions from ggplot2 for unit tests. I was wondering whether layer_data could return a tibble, which would make the data much easier to peruse. Currently, the output just overwhelms the console.

library(ggplot2)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang

# plot
p <- ggplot(iris, aes(Species, Sepal.Length)) + geom_point()

# checking data used
ggplot2::layer_data(p)
#>     x   y PANEL group shape colour size fill alpha stroke
#> 1   1 5.1     1     1    19  black  1.5   NA    NA    0.5
#> 2   1 4.9     1     1    19  black  1.5   NA    NA    0.5
#> 3   1 4.7     1     1    19  black  1.5   NA    NA    0.5
#> 4   1 4.6     1     1    19  black  1.5   NA    NA    0.5
#> 5   1 5.0     1     1    19  black  1.5   NA    NA    0.5
#> 6   1 5.4     1     1    19  black  1.5   NA    NA    0.5
#> 7   1 4.6     1     1    19  black  1.5   NA    NA    0.5
#> 8   1 5.0     1     1    19  black  1.5   NA    NA    0.5
#> 9   1 4.4     1     1    19  black  1.5   NA    NA    0.5
#> 10  1 4.9     1     1    19  black  1.5   NA    NA    0.5
#> 11  1 5.4     1     1    19  black  1.5   NA    NA    0.5
#> 12  1 4.8     1     1    19  black  1.5   NA    NA    0.5
#> 13  1 4.8     1     1    19  black  1.5   NA    NA    0.5
#> 14  1 4.3     1     1    19  black  1.5   NA    NA    0.5
#> 15  1 5.8     1     1    19  black  1.5   NA    NA    0.5
#> 16  1 5.7     1     1    19  black  1.5   NA    NA    0.5
#> 17  1 5.4     1     1    19  black  1.5   NA    NA    0.5
#> 18  1 5.1     1     1    19  black  1.5   NA    NA    0.5
#> 19  1 5.7     1     1    19  black  1.5   NA    NA    0.5
#> 20  1 5.1     1     1    19  black  1.5   NA    NA    0.5
#> 21  1 5.4     1     1    19  black  1.5   NA    NA    0.5
#> 22  1 5.1     1     1    19  black  1.5   NA    NA    0.5
#> 23  1 4.6     1     1    19  black  1.5   NA    NA    0.5
#> 24  1 5.1     1     1    19  black  1.5   NA    NA    0.5
#> 25  1 4.8     1     1    19  black  1.5   NA    NA    0.5
#> 26  1 5.0     1     1    19  black  1.5   NA    NA    0.5
#> 27  1 5.0     1     1    19  black  1.5   NA    NA    0.5
#> 28  1 5.2     1     1    19  black  1.5   NA    NA    0.5
#> 29  1 5.2     1     1    19  black  1.5   NA    NA    0.5
#> 30  1 4.7     1     1    19  black  1.5   NA    NA    0.5
#> 31  1 4.8     1     1    19  black  1.5   NA    NA    0.5
#> 32  1 5.4     1     1    19  black  1.5   NA    NA    0.5
#> 33  1 5.2     1     1    19  black  1.5   NA    NA    0.5
#> 34  1 5.5     1     1    19  black  1.5   NA    NA    0.5
#> 35  1 4.9     1     1    19  black  1.5   NA    NA    0.5
#> 36  1 5.0     1     1    19  black  1.5   NA    NA    0.5
#> 37  1 5.5     1     1    19  black  1.5   NA    NA    0.5
#> 38  1 4.9     1     1    19  black  1.5   NA    NA    0.5
#> 39  1 4.4     1     1    19  black  1.5   NA    NA    0.5
#> 40  1 5.1     1     1    19  black  1.5   NA    NA    0.5
#> 41  1 5.0     1     1    19  black  1.5   NA    NA    0.5
#> 42  1 4.5     1     1    19  black  1.5   NA    NA    0.5
#> 43  1 4.4     1     1    19  black  1.5   NA    NA    0.5
#> 44  1 5.0     1     1    19  black  1.5   NA    NA    0.5
#> 45  1 5.1     1     1    19  black  1.5   NA    NA    0.5
#> 46  1 4.8     1     1    19  black  1.5   NA    NA    0.5
#> 47  1 5.1     1     1    19  black  1.5   NA    NA    0.5
#> 48  1 4.6     1     1    19  black  1.5   NA    NA    0.5
#> 49  1 5.3     1     1    19  black  1.5   NA    NA    0.5
#> 50  1 5.0     1     1    19  black  1.5   NA    NA    0.5
#> 51  2 7.0     1     2    19  black  1.5   NA    NA    0.5
#> 52  2 6.4     1     2    19  black  1.5   NA    NA    0.5
#> 53  2 6.9     1     2    19  black  1.5   NA    NA    0.5
#> 54  2 5.5     1     2    19  black  1.5   NA    NA    0.5
#> 55  2 6.5     1     2    19  black  1.5   NA    NA    0.5
#> 56  2 5.7     1     2    19  black  1.5   NA    NA    0.5
#> 57  2 6.3     1     2    19  black  1.5   NA    NA    0.5
#> 58  2 4.9     1     2    19  black  1.5   NA    NA    0.5
#> 59  2 6.6     1     2    19  black  1.5   NA    NA    0.5
#> 60  2 5.2     1     2    19  black  1.5   NA    NA    0.5
#> 61  2 5.0     1     2    19  black  1.5   NA    NA    0.5
#> 62  2 5.9     1     2    19  black  1.5   NA    NA    0.5
#> 63  2 6.0     1     2    19  black  1.5   NA    NA    0.5
#> 64  2 6.1     1     2    19  black  1.5   NA    NA    0.5
#> 65  2 5.6     1     2    19  black  1.5   NA    NA    0.5
#> 66  2 6.7     1     2    19  black  1.5   NA    NA    0.5
#> 67  2 5.6     1     2    19  black  1.5   NA    NA    0.5
#> 68  2 5.8     1     2    19  black  1.5   NA    NA    0.5
#> 69  2 6.2     1     2    19  black  1.5   NA    NA    0.5
#> 70  2 5.6     1     2    19  black  1.5   NA    NA    0.5
#> 71  2 5.9     1     2    19  black  1.5   NA    NA    0.5
#> 72  2 6.1     1     2    19  black  1.5   NA    NA    0.5
#> 73  2 6.3     1     2    19  black  1.5   NA    NA    0.5
#> 74  2 6.1     1     2    19  black  1.5   NA    NA    0.5
#> 75  2 6.4     1     2    19  black  1.5   NA    NA    0.5
#> 76  2 6.6     1     2    19  black  1.5   NA    NA    0.5
#> 77  2 6.8     1     2    19  black  1.5   NA    NA    0.5
#> 78  2 6.7     1     2    19  black  1.5   NA    NA    0.5
#> 79  2 6.0     1     2    19  black  1.5   NA    NA    0.5
#> 80  2 5.7     1     2    19  black  1.5   NA    NA    0.5
#> 81  2 5.5     1     2    19  black  1.5   NA    NA    0.5
#> 82  2 5.5     1     2    19  black  1.5   NA    NA    0.5
#> 83  2 5.8     1     2    19  black  1.5   NA    NA    0.5
#> 84  2 6.0     1     2    19  black  1.5   NA    NA    0.5
#> 85  2 5.4     1     2    19  black  1.5   NA    NA    0.5
#> 86  2 6.0     1     2    19  black  1.5   NA    NA    0.5
#> 87  2 6.7     1     2    19  black  1.5   NA    NA    0.5
#> 88  2 6.3     1     2    19  black  1.5   NA    NA    0.5
#> 89  2 5.6     1     2    19  black  1.5   NA    NA    0.5
#> 90  2 5.5     1     2    19  black  1.5   NA    NA    0.5
#> 91  2 5.5     1     2    19  black  1.5   NA    NA    0.5
#> 92  2 6.1     1     2    19  black  1.5   NA    NA    0.5
#> 93  2 5.8     1     2    19  black  1.5   NA    NA    0.5
#> 94  2 5.0     1     2    19  black  1.5   NA    NA    0.5
#> 95  2 5.6     1     2    19  black  1.5   NA    NA    0.5
#> 96  2 5.7     1     2    19  black  1.5   NA    NA    0.5
#> 97  2 5.7     1     2    19  black  1.5   NA    NA    0.5
#> 98  2 6.2     1     2    19  black  1.5   NA    NA    0.5
#> 99  2 5.1     1     2    19  black  1.5   NA    NA    0.5
#> 100 2 5.7     1     2    19  black  1.5   NA    NA    0.5
#> 101 3 6.3     1     3    19  black  1.5   NA    NA    0.5
#> 102 3 5.8     1     3    19  black  1.5   NA    NA    0.5
#> 103 3 7.1     1     3    19  black  1.5   NA    NA    0.5
#> 104 3 6.3     1     3    19  black  1.5   NA    NA    0.5
#> 105 3 6.5     1     3    19  black  1.5   NA    NA    0.5
#> 106 3 7.6     1     3    19  black  1.5   NA    NA    0.5
#> 107 3 4.9     1     3    19  black  1.5   NA    NA    0.5
#> 108 3 7.3     1     3    19  black  1.5   NA    NA    0.5
#> 109 3 6.7     1     3    19  black  1.5   NA    NA    0.5
#> 110 3 7.2     1     3    19  black  1.5   NA    NA    0.5
#> 111 3 6.5     1     3    19  black  1.5   NA    NA    0.5
#> 112 3 6.4     1     3    19  black  1.5   NA    NA    0.5
#> 113 3 6.8     1     3    19  black  1.5   NA    NA    0.5
#> 114 3 5.7     1     3    19  black  1.5   NA    NA    0.5
#> 115 3 5.8     1     3    19  black  1.5   NA    NA    0.5
#> 116 3 6.4     1     3    19  black  1.5   NA    NA    0.5
#> 117 3 6.5     1     3    19  black  1.5   NA    NA    0.5
#> 118 3 7.7     1     3    19  black  1.5   NA    NA    0.5
#> 119 3 7.7     1     3    19  black  1.5   NA    NA    0.5
#> 120 3 6.0     1     3    19  black  1.5   NA    NA    0.5
#> 121 3 6.9     1     3    19  black  1.5   NA    NA    0.5
#> 122 3 5.6     1     3    19  black  1.5   NA    NA    0.5
#> 123 3 7.7     1     3    19  black  1.5   NA    NA    0.5
#> 124 3 6.3     1     3    19  black  1.5   NA    NA    0.5
#> 125 3 6.7     1     3    19  black  1.5   NA    NA    0.5
#> 126 3 7.2     1     3    19  black  1.5   NA    NA    0.5
#> 127 3 6.2     1     3    19  black  1.5   NA    NA    0.5
#> 128 3 6.1     1     3    19  black  1.5   NA    NA    0.5
#> 129 3 6.4     1     3    19  black  1.5   NA    NA    0.5
#> 130 3 7.2     1     3    19  black  1.5   NA    NA    0.5
#> 131 3 7.4     1     3    19  black  1.5   NA    NA    0.5
#> 132 3 7.9     1     3    19  black  1.5   NA    NA    0.5
#> 133 3 6.4     1     3    19  black  1.5   NA    NA    0.5
#> 134 3 6.3     1     3    19  black  1.5   NA    NA    0.5
#> 135 3 6.1     1     3    19  black  1.5   NA    NA    0.5
#> 136 3 7.7     1     3    19  black  1.5   NA    NA    0.5
#> 137 3 6.3     1     3    19  black  1.5   NA    NA    0.5
#> 138 3 6.4     1     3    19  black  1.5   NA    NA    0.5
#> 139 3 6.0     1     3    19  black  1.5   NA    NA    0.5
#> 140 3 6.9     1     3    19  black  1.5   NA    NA    0.5
#> 141 3 6.7     1     3    19  black  1.5   NA    NA    0.5
#> 142 3 6.9     1     3    19  black  1.5   NA    NA    0.5
#> 143 3 5.8     1     3    19  black  1.5   NA    NA    0.5
#> 144 3 6.8     1     3    19  black  1.5   NA    NA    0.5
#> 145 3 6.7     1     3    19  black  1.5   NA    NA    0.5
#> 146 3 6.7     1     3    19  black  1.5   NA    NA    0.5
#> 147 3 6.3     1     3    19  black  1.5   NA    NA    0.5
#> 148 3 6.5     1     3    19  black  1.5   NA    NA    0.5
#> 149 3 6.2     1     3    19  black  1.5   NA    NA    0.5
#> 150 3 5.9     1     3    19  black  1.5   NA    NA    0.5

Created on 2018-11-28 by the reprex package (v0.2.1)

@batpigandme
Contributor

So basically default to the wrapped output below, or am I missing something?

library(tidyverse)
p <- ggplot(iris, aes(Species, Sepal.Length)) + geom_point()
layer_data <- as_tibble(ggplot2::layer_data(p))
layer_data
#> # A tibble: 150 x 10
#>        x     y PANEL group shape colour  size fill  alpha stroke
#>    <int> <dbl> <fct> <int> <dbl> <chr>  <dbl> <lgl> <lgl>  <dbl>
#>  1     1   5.1 1         1    19 black    1.5 NA    NA       0.5
#>  2     1   4.9 1         1    19 black    1.5 NA    NA       0.5
#>  3     1   4.7 1         1    19 black    1.5 NA    NA       0.5
#>  4     1   4.6 1         1    19 black    1.5 NA    NA       0.5
#>  5     1   5   1         1    19 black    1.5 NA    NA       0.5
#>  6     1   5.4 1         1    19 black    1.5 NA    NA       0.5
#>  7     1   4.6 1         1    19 black    1.5 NA    NA       0.5
#>  8     1   5   1         1    19 black    1.5 NA    NA       0.5
#>  9     1   4.4 1         1    19 black    1.5 NA    NA       0.5
#> 10     1   4.9 1         1    19 black    1.5 NA    NA       0.5
#> # ... with 140 more rows

Created on 2018-11-28 by the reprex package (v0.2.1.9000)

@IndrajeetPatil
Author

Oh, yeah, I can convert the output to a tibble afterward. But I just thought it would be consistent behavior across tidyverse packages if ggplot2 also returned a tibble by default (e.g., broom, readr, purrr all return tibbles).

I usually expect a tibble as an output from tidyverse functions and so found it a bit surprising that this particular function returned a data.frame and raised this issue. You can close this issue if you feel this won't be worth it, but just thought I'd point this out! :)
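Until the default changes, the conversion can be wrapped once and reused in tests. This is only a minimal sketch, assuming the tibble package is installed; `layer_data_tbl()` is a hypothetical name and not part of ggplot2:

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): takes the same plot and layer
# index as layer_data(), but converts the result to a tibble so that
# printing is truncated to the first rows.
layer_data_tbl <- function(plot, i = 1L) {
  tibble::as_tibble(ggplot2::layer_data(plot, i))
}

p <- ggplot(iris, aes(Species, Sepal.Length)) + geom_point()
d <- layer_data_tbl(p)
d
```

The wrapper leaves what ggplot2 computes untouched; only the class of the returned object differs.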

@batpigandme
Contributor

No, it's not necessarily a bad idea, I just genuinely wanted to make sure I wasn't missing something!

@clauswilke
Member

I think it is critical that layer_data() return exactly the data structure that ggplot2 uses internally, since we use it for debugging and testing.

@thomasp85 Would it be possible for your new data_frame() constructor to also set class tbl_df? Would this have performance implications? Any other reasons why this might be a bad idea?

@yutannihilation
Member

I'm not sure how this affects the overall performance of ggplot2, but tibble seems slower than data.frame for subsetting with [, especially if the index is a character. For $ and [[, tibble is faster.

library(tibble)

d <- data.frame(a = 1, b = 2, c = 3)
t <- as_tibble(d)

m <- bench::mark(
  "`[`,  int, data.frame" = d[, 3, drop = FALSE],
  "`[`,  chr, data.frame" = d[, "c", drop = FALSE],
  "`$`,       data.frame" = d$c,
  "`[[`, int, data.frame" = d[[3]],
  "`[[`, chr, data.frame" = d[["c"]],
  "`[`,  int, tibble" = t[, 3],
  "`[`,  chr, tibble" = t[, "c"],
  "`$`,       tibble" = t$c,
  "`[[`, int, tibble" = t[[3]],
  "`[[`, chr, tibble" = t[["c"]],
  check = FALSE
)

library(ggplot2)

autoplot(m) +
  theme(axis.text.y = element_text(hjust = 0))

Created on 2018-11-30 by the reprex package (v0.2.1)

@clauswilke
Member

Wow, [ is up to a thousand times slower for tibble than for data.frame. Do the tibble developers know? @krlmlr

@krlmlr
Member

krlmlr commented Nov 30, 2018

Thanks for letting me know. We have a very expensive check and S3 dispatch which both slow down the code, but this is fairly easy to fix. Will do for the 2.0.0 release.

@clauswilke
Member

Can subsetting for ints be made as fast as data.frame or will it remain ~10x slower?

@yutannihilation
Member

"Can subsetting for ints be made as fast as data.frame or will it remain ~10x slower?"

Let's discuss this on tidyverse/tibble#544 (or on a new issue on tibble's repo).

@krlmlr
Member

krlmlr commented Dec 19, 2018

Subsetting tibbles is now a tad faster on my machine for the use case shown here.

@clauswilke
Member

Thanks, Kirill!

@thomasp85 Can we now use tibbles internally in ggplot2?

@thomasp85
Member

I’m unsure there is any reason to do this. Is the motivation simply that layer_data would return a tibble?

Tibbles and data frames have very different subsetting behaviour, so the change would likely require changes throughout the codebase.
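Two of the subsetting differences can be seen in a few lines. This is a small illustrative sketch, assuming the tibble package is installed; the objects `df`, `tb` and `df2` are made up for the example and are not taken from the ggplot2 codebase:

```r
library(tibble)

df <- data.frame(a = 1:3, b = letters[1:3])
tb <- as_tibble(df)

# Selecting a single column with `[`:
# data.frame drops to a bare vector, tibble stays a tibble.
class(df[, "a"])
class(tb[, "a"])

# `$` with an abbreviated column name:
# data.frame partial-matches "al" to "alpha",
# whereas a tibble would return NULL with a warning.
df2 <- data.frame(alpha = 1:3)
df2$al
```

Any internal code that relies on `df[, "a"]` being a vector, or on partial matching, would need to be rewritten before the class could be switched.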

@clauswilke
Member

clauswilke commented Dec 19, 2018

I think the argument would be that all of tidyverse is moving/has moved to tibbles, so using them internally for ggplot2 would keep things consistent with the rest of the ecosystem. Also, apparently, now they're consistently faster (as the graphic above shows, almost by a factor of 10 for the most common use case, $).

@yutannihilation
Member

One more argument to support using tibble is that we are already using tibble's spec at least. NEWS says:

"Internally, ggplot2 now uses as.data.frame(tibble::as_tibble(x)) to convert a list into a data frame."
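The conversion quoted from NEWS can be tried directly. A small sketch, assuming the tibble package is installed; the input list `x` is made up for illustration:

```r
# Made-up input list; the length-one element is recycled by as_tibble()
x <- list(a = 1:3, b = "z")

# Build the columns with tibble's rules, then strip the tibble classes again
d <- as.data.frame(tibble::as_tibble(x))

class(d)
str(d)
```

The result is a plain data.frame, so tibble's construction rules apply while the object ggplot2 works with keeps base subsetting behaviour.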

@yutannihilation
Member

yutannihilation commented Jan 20, 2019

After a short discussion with @thomasp85 and @hadley, and since #3048 actually seems difficult, I'd conclude that we cannot use tibble internally; it's technically possible, but it requires a lot of work while the gain is relatively small. So, if we agree with this argument by @clauswilke, I will close this issue:

"I think it is critical that layer_data() return exactly the data structure that ggplot2 uses internally, since we use it for debugging and testing."

As a side note, I think it would be good to describe a concrete spec for the internal data.frame. It's coincidentally similar to tibble's spec for now, but it might diverge someday.

@clauswilke
Member

I agree.

@lock

lock bot commented Jul 19, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jul 19, 2019