Slow printing with very wide tibbles #360
Thanks for raising this issue and for the example. We'll investigate the performance of printing and other operations for the upcoming tibble release.
We're doing expensive computations for all columns of a data frame, not only for those that are displayed. We need to revisit that.
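To make the point above concrete, here is a minimal base-R sketch of the alternative: format columns one at a time and stop as soon as the console width is used up, so hidden columns are never formatted at all. `format_visible_columns()` is a hypothetical helper written for illustration; it is not how tibble or pillar actually implement printing.

```r
# Hypothetical helper (not part of tibble or pillar): format only the columns
# that fit in the given width, instead of formatting everything and
# truncating afterwards.
format_visible_columns <- function(df, width = getOption("width"), n = 10L) {
  rows <- utils::head(df, n)
  used <- 0L
  formatted <- list()
  for (name in names(rows)) {
    # Header plus cell values, padded to a common width.
    cells <- format(c(name, format(rows[[name]])))
    col_width <- max(nchar(cells))
    # +1 for the separating space; always keep at least one column.
    if (used + col_width + 1L > width && length(formatted) > 0L) break
    formatted[[name]] <- cells
    used <- used + col_width + 1L
  }
  # Columns we never visited are only reported by name.
  list(shown = formatted, extra = setdiff(names(df), names(formatted)))
}

wide <- as.data.frame(matrix(1:711, nrow = 1))
res <- format_visible_columns(wide, width = 80)
length(res$shown)
head(res$extra)
```

With an 80-character width, only a couple of dozen of the 711 columns are formatted; the cost of formatting no longer grows with the total number of columns.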
I fixed some of the worst problems in pillar in r-lib/pillar#87 (now merged). Install the development version with:

    # install.packages("remotes")
    remotes::install_github("r-lib/pillar")

Still, our implementation of
We should at least replicate the previous heuristic, which is to only look at the first
That's what we do; it's the colored word wrap that now takes most of the time. Fixing this will also fix the OS X test failures (caused by different wrapping behavior for non-breaking spaces on Linux and OS X).
Is it the wrapping of the data columns or of the extra columns?
Wrapping the extra columns currently takes most of the time for wide tibbles.
The example above prints in 0.32s the first time, and in 0.16s from the second time on. This is about as fast as I could make it. |
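The remaining cost discussed in this thread sits in word-wrapping the list of extra (hidden) columns. One way to bound it, sketched here under the assumption that listing every hidden column name is not required, is to wrap only the first few names and summarize the rest. `footer_extra_columns()` and its `max_extra` argument are hypothetical, and base R's `strwrap()` stands in for pillar's colored word wrap.

```r
# Hypothetical helper: build a "... with N more variables" footer, but only
# pass a bounded number of column names to the (expensive) wrapping step.
footer_extra_columns <- function(extra, width = getOption("width"), max_extra = 20L) {
  n <- length(extra)
  if (n == 0L) return(character())
  listed <- utils::head(extra, max_extra)
  text <- paste0("... with ", n, " more variables: ",
                 paste(listed, collapse = ", "))
  if (n > max_extra) {
    # Summarize the names we did not list instead of wrapping them.
    text <- paste0(text, ", and ", n - max_extra, " more")
  }
  # Leave room for the two-character "# " prefix added to every line.
  strwrap(text, width = width - 2L, prefix = "# ")
}

out <- footer_extra_columns(paste0("V", 1:711), width = 80)
cat(out, sep = "\n")
```

Because at most `max_extra` names reach `strwrap()`, the wrapping cost stays roughly constant however wide the tibble is.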
- `enframe(NULL)` now returns the same as `enframe(logical())` (#352).
- `tbl[1, , drop = TRUE]` now behaves identically to data frames (#367).
- The `tibble.width` option is honored again (#369).
- Faster printing of very wide tibbles (#360).
- Update vignette to match changes in 1.4.1 (#368, @bgreenwell).
- Don't rely on `ncol()` for `glimpse()`, only query `nrow()` and `head()`.
- Return input for zero-column data frames.
- Add test for `glimpse()` with unknown rows (#366, @kevinykuo).
- Faster construction and subsetting for tibbles (#353).
- `tribble()` now ignores trailing commas (#342, @LaDilettante).
- Fix error message when accessing columns using a logical index vector (#337, @mundl).
Bug fixes
---------

- Fix OS X builds.
- The `tibble.width` option is honored again (#369).
- `tbl[1, , drop = TRUE]` now behaves identically to data frames (#367).
- Fix error message when accessing columns using a logical index vector (#337, @mundl).
- `glimpse()` returns its input for zero-column data frames.

Features
--------

- `enframe(NULL)` now returns the same as `enframe(logical())` (#352).
- `tribble()` now ignores trailing commas (#342, @LaDilettante).
- Updated vignettes and website documentation.

Performance
-----------

- Faster printing of very wide tibbles (#360).
- Faster construction and subsetting for tibbles (#353).
- Only call `nrow()` and `head()` in `glimpse()`, not `ncol()`.
This issue persists in current versions and is very troublesome. I've had to start converting all my tbls to data frames because of it. I have a very new and powerful machine, and the example above takes
versus
I think it's a serious mistake to close this issue. Are you OK with waiting more than 5 seconds every time you glance at a dataset? Why should looking at a tibble take so much longer than looking at the same data as a data.frame? I think this should be reopened.

── Attaching packages ─────────── tidyverse 1.2.1 ──

@gvfarns I recommend you open a new issue and link to this thread. That fits our workflow better.
This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.
After updating to the most recent version of the package, I noticed that (a) the new console output looks great, and (b) printing is substantially slower for tibbles with ~50 or more columns. Besides being slower, the output stalls between printing the tabular data preview and the list of columns omitted from it.
This reprex uses the gapminder data to build a tibble with 1 row and 711 columns. I exaggerated the number of columns so the problem reproduces on machines with better specs than my middling i5 with 8 GB of RAM.
load packages
make the test data
simple timing
conclusions
Obviously tibble does more work to print its output than data.frame does, but the ~15x increase in time seems much larger than in previous versions, and larger than it should be given the output that actually appears on screen. Unfortunately I don't have time to downgrade tibble or measure timing more rigorously right now, but I'll check later and update.
My only hypothesis is that tibble applies its print processing to all columns, including the hidden ones, before it truncates the output and sends it to the console, but I don't know enough to tell whether that's the case.
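For the more rigorous timing mentioned above, a small harness can print into a discarded buffer several times and report the median, which also smooths out first-call costs such as the 0.32s vs 0.16s difference noted earlier in the thread. `time_print()` is a hypothetical helper, and the 1 × 711 data frame of random numbers is a synthetic stand-in, not the original gapminder reprex.

```r
# Hypothetical helper: median elapsed time (seconds) to print `x`, with the
# printed text captured and discarded so the console does not dominate.
time_print <- function(x, reps = 5L) {
  elapsed <- vapply(seq_len(reps), function(i) {
    system.time(utils::capture.output(print(x)))[["elapsed"]]
  }, numeric(1))
  stats::median(elapsed)
}

# Synthetic wide data: 1 row, 711 numeric columns.
wide_df <- as.data.frame(matrix(runif(711), nrow = 1))
print(time_print(wide_df))

# If tibble is installed, time the same data as a tibble for comparison.
if (requireNamespace("tibble", quietly = TRUE)) {
  print(time_print(tibble::as_tibble(wide_df)))
}
```

Running both timings in the same session with several repetitions gives a fairer data.frame vs tibble comparison than a single `system.time()` call.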
session info