Slow printing with very wide tibbles #360

@ghost

Description

After updating to the most recent version of the package (tibble 1.4.1, per the session info below), I noticed that a) the new console output is great, and b) printing is substantially slower for tibbles with ~50 or more columns. Besides being slower overall, printing visibly hangs between the tabular data preview and the list of columns that did not fit in it.

This reprex uses gapminder data to make a tibble with 1 row and 711 columns. I exaggerated the number of columns so that the slowdown is reproducible on machines with better specs than my middling i5 with 8 GB of RAM.

load packages

library(gapminder)
library(tidyverse)
#-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
# v ggplot2 2.2.1     v purrr   0.2.4
# v tibble  1.4.1     v dplyr   0.7.4
# v tidyr   0.7.2     v stringr 1.2.0
# v readr   1.1.1     v forcats 0.2.0

make the test data

tst_tibble <- gapminder %>%
  # change the year filter to add or subtract columns
  # from the final tibble
  filter(year < 1975) %>%
  unite(loc_yr, continent, country, year) %>%
  select(loc_yr, lifeExp) %>%
  spread(loc_yr, lifeExp)

# same data as a plain data.frame, for comparison
tst_df <- as.data.frame(tst_tibble)

simple timing

system.time(print(tst_tibble))

# user  system elapsed 
# 8.24    0.00    8.28 

system.time(print(tst_df))

# user  system elapsed 
# 0.55    0.00    0.56

conclusions

Obviously tibble does more work to print its output than a plain data.frame, but the ~15x jump in elapsed time (8.28 s vs. 0.56 s) seems much larger than it was in previous versions, and larger than it should be for the output that is actually shown on screen. Unfortunately I don't have time right now to downgrade tibble or to test the timing more rigorously, but I'll check later and update.
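A more rigorous timing could repeat the measurement and take the median rather than rely on a single `system.time()` call. The sketch below is one way to do that in base R. `median_print_time()` is a hypothetical helper, not part of tibble, and `capture.output()` is used here only to keep the console quiet, so the numbers exclude terminal rendering and may come out lower than the figures above.

```r
# Hypothetical helper (not part of tibble): median elapsed time of print(x)
# over several runs, to smooth out one-off noise in a single measurement.
median_print_time <- function(x, times = 5) {
  runs <- vapply(seq_len(times), function(i) {
    # capture.output() still runs the full print method,
    # but collects the text instead of writing it to the console
    system.time(capture.output(print(x)))[["elapsed"]]
  }, numeric(1))
  median(runs)
}

# mtcars is only a stand-in so the snippet runs on its own
baseline <- median_print_time(mtcars)
baseline
```

With the reprex objects loaded, `median_print_time(tst_tibble)` and `median_print_time(tst_df)` would give the comparable pair of numbers.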

My only hypothesis is that tibble is applying its print processing to all the columns, including the hidden ones, before it shrinks the output and sends it to the console, but I don't know enough to figure out whether or not that's the case.
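One way to probe that hypothesis without reading the package internals is to time printing at several widths. If the cost keeps growing with the column count well past the point where columns stop fitting on screen, that would be consistent with the hidden columns being processed too, though it would not prove it. The sketch below assumes the tibble package is installed. `time_print()` and the chosen widths are illustrative, and the random one-row tables only stand in for the gapminder reprex.

```r
# Hypothetical helper: elapsed time to print a one-row table with n_cols columns.
time_print <- function(n_cols, use_tibble = TRUE) {
  wide <- as.data.frame(matrix(runif(n_cols), nrow = 1))
  if (use_tibble) wide <- tibble::as_tibble(wide)
  # capture.output() keeps the console clean while still running print()
  system.time(capture.output(print(wide)))[["elapsed"]]
}

# Widths are arbitrary; 711 matches the column count reported in this issue
widths <- c(10, 50, 100, 200, 400, 711)

timings <- data.frame(
  n_cols     = widths,
  tibble     = vapply(widths, time_print, numeric(1), use_tibble = TRUE),
  data_frame = vapply(widths, time_print, numeric(1), use_tibble = FALSE)
)
timings
```

A single run per width is noisy, so the trend across widths matters more than any one value.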

session info

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.2.0   stringr_1.2.0   dplyr_0.7.4     purrr_0.2.4     readr_1.1.1     tidyr_0.7.2    
[7] tibble_1.4.1    ggplot2_2.2.1   tidyverse_1.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14     cellranger_1.1.0 pillar_1.0.1     compiler_3.4.3   plyr_1.8.4      
 [6] bindr_0.1        tools_3.4.3      lubridate_1.7.1  jsonlite_1.5     nlme_3.1-131    
[11] gtable_0.2.0     lattice_0.20-35  pkgconfig_2.0.1  rlang_0.1.6      psych_1.7.8     
[16] cli_1.0.0        rstudioapi_0.7   yaml_2.1.16      parallel_3.4.3   haven_1.1.0     
[21] bindrcpp_0.2     xml2_1.1.1       httr_1.3.1       hms_0.4.0        grid_3.4.3      
[26] glue_1.2.0       R6_2.2.2         readxl_1.0.0     foreign_0.8-69   modelr_0.1.1    
[31] reshape2_1.4.3   magrittr_1.5     scales_0.5.0     rvest_0.3.2      assertthat_0.2.0
[36] mnormt_1.5-5     colorspace_1.3-2 stringi_1.1.6    lazyeval_0.2.1   munsell_0.4.3   
[41] broom_0.4.3      crayon_1.3.4    
