read_excel progress bar #243

Closed
bbrewington opened this Issue Feb 2, 2017 · 16 comments

7 participants

bbrewington commented Feb 2, 2017

It would be nice to have a progress bar like readr's.

Here's the "show_progress" function from https://github.com/tidyverse/readr/blob/8d6a892b941948c96b6a5aa2ddb5b897492478e3/R/utils.R

show_progress <- function() {
  isTRUE(getOption("readr.show_progress")) && # user has not disabled the progress bar
    interactive() &&                          # an interactive session
    is.null(getOption("knitr.in.progress"))   # not actively knitting a document
}
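Read alongside its comments, the function is meant to return `TRUE` only when all three conditions allow it: the user has not disabled progress, the session is interactive, and no document is being knitted. A minimal base-R sketch of that same gate, with the three inputs exposed as arguments so each case can be checked directly. The option name `myreader.show_progress` and the function `should_show_progress()` are hypothetical, not part of readr or readxl.

```r
# Decide whether a progress indicator should be shown.
# `myreader.show_progress` is a hypothetical option name used for illustration.
should_show_progress <- function(
    enabled = getOption("myreader.show_progress", default = TRUE),
    is_interactive = interactive(),
    knitting = !is.null(getOption("knitr.in.progress"))) {
  # Every condition must allow it: enabled, interactive, and not knitting.
  isTRUE(enabled) && isTRUE(is_interactive) && !isTRUE(knitting)
}

# Interactive session, progress enabled, not knitting:
should_show_progress(enabled = TRUE, is_interactive = TRUE, knitting = FALSE)
# [1] TRUE
```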

jennybc added the feature label Feb 3, 2017

jennybc (Member) commented Feb 3, 2017

I'm curious. With what size sheet are you really feeling this need? xlsx or xls?

jennybc (Member) commented Feb 10, 2017

bbrewington commented Feb 10, 2017

I think it ended up being a corrupt xls file...oops :) Are you thinking it should read fast enough that a progress bar isn't necessary?

jennybc (Member) commented Feb 11, 2017

@bbrewington I just wanted a data point from the real world re: at what size progress starts to seem necessary. The sheets I work with during development are weird but small.

jennybc (Member) commented Mar 5, 2017

We're not going to implement a progress bar at this point. It's not clear the work is justified by the payoff.

jennybc closed this Mar 5, 2017

bluunk commented Sep 28, 2017

One data point: I am using readxl to open files larger than 100 MB. This takes quite a long time, so some feedback in the form of a progress bar would be helpful, just to know whether R is still working or has gotten stuck.

jennybc (Member) commented Sep 28, 2017

I'll reopen this so that I take another look at it for the next release.

jennybc reopened this Sep 28, 2017

jennybc (Member) commented Dec 16, 2018

I have a progress indicator now in a local branch but could use help from actual users. I am interested in:

  • At what size is readxl::read_excel() slow enough that you would even want a progress indicator?
    • overall size of the .xls or .xlsx file; dimensions of a worksheet that is slow to read
  • What is your OS and how much RAM?
  • Do you know of any inconveniently large publicly available xls or xlsx files?

nacnudus (Contributor) commented Dec 16, 2018

This xlsx is the largest I know of: 25 MB overall, with a sheet of 211,933 rows by 10 columns. But it loads in a second or so on my Linux laptop with 16 GB RAM, and I wouldn't want a progress indicator until a read took maybe three seconds.

That site seems to have trouble, so I could email it instead.
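The "maybe three seconds" threshold describes an indicator that stays silent for fast reads and appears only once a read has been running for a while. A minimal base-R sketch of that idea, assuming the reader processes its input in chunks and calls a reporter after each one; `make_delayed_reporter()` is a hypothetical helper, not readxl code, and its `clock` argument exists only so the timing can be faked in a test.

```r
# Build a reporter that stays quiet until `delay` seconds have elapsed.
# `make_delayed_reporter` is a hypothetical helper for illustration.
make_delayed_reporter <- function(total, delay = 3,
                                  clock = function() proc.time()[["elapsed"]]) {
  start <- clock()
  function(done) {
    elapsed <- clock() - start
    if (elapsed < delay) {
      return(invisible(FALSE))  # too early: show nothing
    }
    # Overwrite the same console line with the current progress.
    message(sprintf("\rRead %d of %d chunks (%.0f%%)", done, total,
                    100 * done / total), appendLF = FALSE)
    invisible(TRUE)
  }
}
```

In a read loop, calling `report(i)` after each chunk prints nothing for a file that finishes in under three seconds. The CRAN progress package takes a similar approach with the `show_after` argument of `progress_bar$new()`.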

jennybc (Member) commented Dec 16, 2018

I have a .xls that's 117 MB (many different worksheets) but, as in @nacnudus's experience, the progress bar barely has time to display before read_excel() completes.

scjs commented Dec 17, 2018

The Voice_Master.xlsx spreadsheet here (the first link, "Acoustics") is 73 MB with 16k rows and about 500 columns in one worksheet. It is CC BY-NC-SA 3.0 licensed. Loading it with read_excel() takes about 20 seconds on Windows 7 with 8 GB RAM. I used openxlsx to convert and clean it in this document.

jennybc (Member) commented Dec 17, 2018

Wow @scjs that is a great find. Not only does it take a non-trivial time to read, but it also (probably related) demonstrates the over-messaging problem from #361. It hurts, but that's a great example!

KyleHaynes commented Dec 17, 2018

My Googling skills are not up to scratch (well done @scjs): I spent way too much time this morning trying to find a large, publicly available xls(x) file, with no success :(

My poor old home computer:
Operating system: Windows 10
RAM: 4 GB
Method: readxl::read_excel("C:/temp/Voice_Master.xlsx")
Time: 91.8 seconds

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.5.1   assertthat_0.2.0 cli_1.0.0        tools_3.5.1      pillar_1.3.0    
 [6] tibble_1.4.2     crayon_1.3.4     Rcpp_0.12.19     utf8_1.1.4       fansi_0.3.0     
[11] cellranger_1.1.0 readxl_1.1.0     rlang_0.2.2     

I have 32 GB of RAM at work (Windows 10); let me know if you want me to report that.

@jennybc Would you consider making the progress bar an argument? I like that data.table::fread gives you this flexibility (although it still isn't verbose if the file is read in under 3 seconds, which I don't agree with).

JakeRuss (Contributor) commented Dec 17, 2018

Isn't the cost of inclusion near zero at this point (since you've already written the code)? Just have it turned off by default?

jennybc (Member) commented Dec 17, 2018

> Would you consider making the progress bar an argument?

It will most likely be controlled by an option. But yes, there will be some way to control it. It's possible I won't put it in this release but in the next one. The over-messaging problem might be a higher priority.
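Controlling the indicator through an option is a common R pattern: a function argument whose default is read from an option lets users set the behavior once per session with `options()` or override it for a single call. A minimal sketch with hypothetical names (`read_sheet_sketch()`, `myreader.show_progress`); readxl's eventual interface may differ.

```r
# Hypothetical reader: `progress` defaults to a session-wide option, so users
# can set it once with options() or override it for a single call.
read_sheet_sketch <- function(path,
                              progress = getOption("myreader.show_progress",
                                                   default = interactive())) {
  if (isTRUE(progress)) {
    message("Reading ", path, " ...")
  }
  # Placeholder for the real parsing work; returns what was decided.
  invisible(list(path = path, progress = isTRUE(progress)))
}

# Turn progress off for the whole session ...
options(myreader.show_progress = FALSE)
# ... or force it on for one call.
res <- read_sheet_sketch("big.xlsx", progress = TRUE)
```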

KyleHaynes commented Dec 17, 2018

My work computer:
Operating system: Windows 10
RAM: 32 GB
Method: readxl::read_excel("C:/temp/Voice_Master.xlsx")
Time: 12.45 seconds

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.5.1   tools_3.5.1      pillar_1.3.0     tibble_1.4.2    
[5] crayon_1.3.4     Rcpp_1.0.0       cellranger_1.1.0 readxl_1.1.0    
[9] rlang_0.3.0.1 