drive_ls returning variable number of files #277

wilvancleve · 2019-09-27T20:40:39Z

I have a Team Drive directory with 570 pdf files in it.

I identify the directory in question using a dribble:

pdf_dribble <- drive_ls(pattern="application_pdfs", team_drive=as_id(season$id))

Each time I call drive_ls on that directory, I get a different (and usually wrong) number of files:

application_files = drive_ls(path=as_dribble(pdf_dribble), pattern="pdf", n_max=10000)

Sometimes this will return 100 files, sometimes 500, sometimes 569, but almost never the correct number.

Any thoughts?


jennybc · 2019-09-30T23:00:34Z

Sounds possibly related to #272. I haven't experienced this myself yet and no one's provided, say, a dribble containing these weird results. So I'm currently at a bit of a loss re: tackling this. I'm in a "listening and thinking" phase ... 🤔

Maybe it will happen to me soon.

wilvancleve · 2019-10-04T06:11:06Z

Happy to provide test data. It's a large number of files (~550) and when running drive_ls multiple times with the same inputs I can get different results with successive calls; it appears that some calls get abbreviated results (as though n_max is being ignored and google is returning a subset for brevity's sake).

jessjaco · 2019-10-29T22:26:58Z

It looks like do_paginated_request is giving inconsistent results. If you pause here and run the request a bunch of times, it returns differing numbers of items, for example:

do_paginated_request(request, n_max=Inf, n=function(x) length(x$files),verbose=FALSE) %>% 
  map("files") %>% 
  flatten %>% 
  as_dribble %>% 
  nrow
[1] 1722
do_paginated_request(request, n_max=Inf, n=function(x) length(x$files),verbose=FALSE) %>% 
  map("files") %>% 
  flatten %>% 
  as_dribble %>% 
  nrow
[1] 1022
do_paginated_request(request, n_max=Inf, n=function(x) length(x$files),verbose=FALSE) %>% 
  map("files") %>% 
  flatten %>% 
  as_dribble %>% 
  nrow
[1] 422
do_paginated_request(request, n_max=Inf, n=function(x) length(x$files),verbose=FALSE) %>% 
  map("files") %>% 
  flatten %>% 
  as_dribble %>% 
  nrow
[1] 322

Even worse, it sometimes appears to return duplicate results rather than merely incomplete ones:

do_paginated_request(request, n_max=Inf, n=function(x) length(x$files),verbose=FALSE) %>% 
  map("files") %>% 
  flatten %>% 
  duplicated(.$id) %>% 
  sum
[1] 84
do_paginated_request(request, n_max=Inf, n=function(x) length(x$files),verbose=FALSE) %>% 
  map("files") %>% 
  flatten %>% 
  duplicated(.$id) %>% 
  sum
[1] 0
do_paginated_request(request, n_max=Inf, n=function(x) length(x$files),verbose=FALSE) %>% 
  map("files") %>% 
  flatten %>% 
  duplicated(.$id) %>% 
  sum
[1] 13

I wonder if this can be reproduced using the bare API or if it's something in gargle.

jennybc · 2020-01-14T20:58:25Z

I still have yet to experience this phenomenon or get enough data to truly study it.

But I have formed an untestable hypothesis about the root cause and installed a fix 🤞

Needless to say, please open a new issue if you update to this dev version and still see the phenomenon.

wilvancleve · 2020-01-15T02:02:30Z

Installed via devtools (the master branch) and still seeing a variable number of files returned. Source folder has 280 files. First call returned 256, then a bunch of calls returned 280, then a final call returned 269.

Is there more detailed debug info I can provide?

jennybc · 2020-01-15T03:36:16Z

Is this with a plain vanilla drive_ls()?

Can you do some exploratory analysis / comparison of those different return values?

Can you verify that there are no duplicated file IDs (the id column)?

Is there anything you can say about the 280 - 269 = 11 or 280 - 256 = 24 files that are sometimes missing? Are they being edited, were they recently created, do they all live in one subfolder, etc etc?
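A minimal sketch of that kind of comparison, using small mock data frames in place of real drive_ls() results (the names run_a and run_b and their contents are made up for illustration; a real dribble also carries a drive_resource list-column that may hold more metadata to inspect):

```r
# Mock stand-ins for two drive_ls() results. In practice these would be
# dribbles, which have (at least) `name` and `id` columns.
run_a <- data.frame(
  name = c("a.pdf", "b.pdf", "c.pdf", "d.pdf"),
  id   = c("id1", "id2", "id3", "id4"),
  stringsAsFactors = FALSE
)
run_b <- run_a[c(1, 2, 4), ]  # pretend the second call dropped c.pdf

# 1. Are there duplicated file IDs within a single result?
n_dup_a <- sum(duplicated(run_a$id))
n_dup_b <- sum(duplicated(run_b$id))

# 2. Which files appear in one result but not in the other?
missing_from_b <- run_a[!run_a$id %in% run_b$id, ]
missing_from_a <- run_b[!run_b$id %in% run_a$id, ]

print(c(dups_a = n_dup_a, dups_b = n_dup_b))
print(missing_from_b)
```

Looking at the rows in missing_from_b across several runs should show whether the same files go missing each time or a different set on every call.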

jennybc · 2020-01-15T03:38:03Z

The root problem is that we are working with a paginated result, i.e. the results come in batches. I am now guarding against those pages containing replicated results. But I cannot guard against the inverse problem, which is that some results appear in no page (?).
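To illustrate what guarding against replicated results across pages can look like, here is a rough sketch on mock data. It is not the actual googledrive implementation; the pages object and its contents are hypothetical and only mimic the general shape of a paginated response with a files element.

```r
# Hypothetical pages, each a list with a `files` element.
# Page 2 repeats the file with id "id2".
pages <- list(
  list(files = list(list(id = "id1", name = "a.pdf"),
                    list(id = "id2", name = "b.pdf"))),
  list(files = list(list(id = "id2", name = "b.pdf"),
                    list(id = "id3", name = "c.pdf")))
)

# Flatten all pages into a single list of file records.
all_files <- unlist(lapply(pages, function(p) p$files), recursive = FALSE)

# Keep only the first record seen for each file ID.
ids <- vapply(all_files, function(f) f$id, character(1))
unique_files <- all_files[!duplicated(ids)]

length(all_files)     # number of records before de-duplication
length(unique_files)  # number of records after de-duplication
```

De-duplicating by ID handles repeats, but it cannot recover a file that never appeared in any page.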

I suspect that this is what you are now seeing, because you report no error.

jennybc · 2020-01-15T04:23:44Z

@wilvancleve Never mind, I can see what you see in #288. I have some homework to do, but this may be something to ask about upstream.

spocks · 2020-04-02T16:33:52Z

After weeks of debugging, I realize this issue is causing errors in my code. Is there any plan to fix it in the GitHub or CRAN version?

jennybc · 2020-04-02T20:11:41Z

@spocks drive_ls() is fixed on GitHub?

https://github.com/tidyverse/googledrive/blame/c8108a913d023b6f80ba07f75bdeb4e0b4520094/NEWS.md#L5

which is how this issue got closed above.

spocks · 2020-04-03T01:40:39Z

@jennybc I updated googledrive to the latest GitHub version (1.0.0.9000); however, I still have this issue.

jennybc · 2020-04-03T02:10:57Z

Have you definitely restarted R?

Also, the version number is not definitive for dev versions, i.e. many source states have the same dev version. We generally only bump the dev version when a change is important for another (dev) package.
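Since the version number alone cannot identify the source state, one way to check is to look at the commit recorded at install time. This is a sketch that assumes the package was installed with devtools or remotes, which typically write a RemoteSha field into the installed DESCRIPTION; the helper name installed_sha is made up for illustration.

```r
# Return the recorded source commit for an installed package, or NA when
# no such field exists (e.g. a CRAN install) or the package is missing.
installed_sha <- function(pkg) {
  desc <- suppressWarnings(utils::packageDescription(pkg))
  if (!inherits(desc, "packageDescription")) {
    return(NA_character_)  # package not installed
  }
  sha <- desc[["RemoteSha"]]
  if (is.null(sha)) NA_character_ else sha
}

installed_sha("googledrive")
```

The returned value can then be compared against the commit history of the repository to see whether the fix is included.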

If you've ruled all of that out, please open a new issue.

aleksandereiken · 2021-09-22T11:16:41Z

Dear Jenny,
I still experience the same issue as described above, also with the googledrive dev version 2.0.0.9000 (I have restarted R and confirmed with sessionInfo() that the installed version is the one above).

My code is this:

# Authenticate
googledrive::drive_auth()

# Find IDs of team drive folders
ids_of_team_drives <- googledrive::shared_drive_find()

# Find team drive of interest
team_drive_id <- ids_of_team_drives[which(ids_of_team_drives$name == "my_team_drive"),]$id

# List folders within team drive of interest
folders_within_team_drive <- googledrive::drive_ls(
  googledrive::as_dribble(
    googledrive::as_id(
      team_drive_id
    )
  )
)

# Download tibble with content of "my_sub_folder"
files <- googledrive::drive_ls(
  googledrive::as_dribble(
    googledrive::as_id(
      folders_within_team_drive[which(folders_within_team_drive$name == "my_sub_folder"), ]$id
    )
  )
)

The folder "my_sub_folder" contains 126 JPEG images. However, each time I re-run the code above, the returned tibble has anywhere between 100 and 126 rows.

Thanks for looking into this when you have time. And thank you for an amazing package!!

jennybc · 2021-09-22T16:49:08Z

At this point, I think this is as "fixed" as I can make it. To quote my earlier comment:

The root problem is that we are working with a paginated result, i.e. the results come in batches. I am now guarding against those pages containing replicated results. But I cannot guard against the inverse problem, which is that some results appear in no page (?).

I.e. googledrive is faithfully presenting the results Google gives us, but sometimes those results are incomplete. 😕

My best advice re: what to do about this is in here:

#288

specifically:

#288 (comment)

Now, I realize you are already using drive_ls().

BTW your code is much harder to read (and write!) than it needs to be. This is presumably not the problem, but is still an improvement you might enjoy.

I think the above can be rewritten as:

target_team_drive <- shared_drive_get("my_team_drive")
target_folder <- drive_get("my_sub_folder", shared_drive = target_team_drive)
files <- drive_ls(target_folder)

You might try specifying the corpus in drive_ls(), i.e. drive_ls(target_folder, corpus = "allDrives"). I don't think that should matter and if it helped that would be interesting to me.
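Because any single listing can be incomplete, one possible workaround is to list several times and keep every distinct file ID seen. This is an assumption of this sketch, not something proposed in this thread, and it cannot guarantee completeness. The helper list_union and the fake lister flaky_ls are hypothetical; in real use list_fn might be something like function() drive_ls(target_folder), and real dribbles (which carry a list-column) may need a different row-binding approach than rbind().

```r
# Call a listing function several times and keep every distinct ID seen.
# `list_fn` is a zero-argument function returning a data frame with
# `name` and `id` columns.
list_union <- function(list_fn, times = 3) {
  runs <- lapply(seq_len(times), function(i) list_fn())
  combined <- do.call(rbind, runs)
  combined[!duplicated(combined$id), , drop = FALSE]
}

# A fake, deterministic "flaky" lister for demonstration:
# each call omits a different file.
full <- data.frame(
  name = paste0("f", 1:5, ".jpg"),
  id   = paste0("id", 1:5),
  stringsAsFactors = FALSE
)
call_count <- 0
flaky_ls <- function() {
  call_count <<- call_count + 1
  full[-call_count, ]
}

result <- list_union(flaky_ls, times = 3)
nrow(result)
```

If the row count keeps growing as times increases, the individual listings are still missing files.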

aleksandereiken · 2021-09-22T18:04:47Z

Okay, thanks a lot for your time and comment. I will look into the q clause to see if it can somehow help. Thanks for the code review too! I will test it out and let you know if it helps.

jennybc mentioned this issue Oct 7, 2019

Intermittent failure in all googledrive functions #279

Closed

jennybc mentioned this issue Oct 15, 2019

Error with drive_get() #281

Closed

jennybc closed this as completed in e56b3f5 Jan 14, 2020

jennybc mentioned this issue Jan 15, 2020

drive_find() does not always return exactly the same files #288

Closed

jennybc mentioned this issue Mar 2, 2020

several issues with drive_mkdir() #299

Closed
