TensorFlow GCS access does not work from colab #25463

Closed
rsepassi opened this issue Feb 3, 2019 · 5 comments
Labels
comp:apis Highlevel API related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug

Comments

@rsepassi

rsepassi commented Feb 3, 2019

System information
Using colab.research.google.com

Describe the current behavior

Hangs.

import tensorflow as tf
tf.io.gfile.exists("gs://tfds-data")  # which is a public GCS bucket

Describe the expected behavior

Should not hang.

@rsepassi
Author

rsepassi commented Feb 3, 2019

Linked tensorflow/datasets#36

@craigcitro
Contributor

Yes, this one's a known bug: the problem is that it's attempting an authenticated request and working through a series of timeouts and retries; IIRC, it'll eventually complete after ~10 minutes. (!!!)

Quick workaround: first, do

from google.colab import auth
auth.authenticate_user()

and it'll work.
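For anyone wanting to script the workaround above, here is a minimal sketch. The helper names `authenticate_if_in_colab` and `gcs_exists` are hypothetical, made up for illustration; only `auth.authenticate_user()` and `tf.io.gfile.exists` come from this thread. It assumes `google.colab` is importable only inside a Colab runtime, so the helper returns `False` everywhere else.

```python
def authenticate_if_in_colab():
    """Run Colab's interactive auth flow; return False when not in Colab."""
    try:
        # Assumption: this import only succeeds inside a Colab runtime.
        from google.colab import auth
    except ImportError:
        return False
    auth.authenticate_user()  # interactive: the user completes the auth prompt
    return True


def gcs_exists(path):
    """Check a gs:// path with tf.io.gfile (hypothetical convenience wrapper)."""
    import tensorflow as tf  # imported lazily so the helper above works without TF
    return tf.io.gfile.exists(path)
```

In a Colab cell, call `authenticate_if_in_colab()` first and then `gcs_exists("gs://tfds-data")`; per the comment above, the second call should then return without the long hang.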

This was intended to be fixed upstream, but it looks like that didn't work (at least, it still hangs for me). I'll take a look.

@rsepassi
Author

rsepassi commented Feb 4, 2019

Thanks @craigcitro. Why is it trying to make an authenticated request? In the particular case I'm running into, it's a publicly accessible GCS bucket, so no auth necessary. Where is the "authenticated request" logic happening? I'm guessing it's a TF thing, not a colab thing, right? Should we update it to first check if credentials exist? (if so, make authenticated request, if not don't)

In my particular case, TFDS is storing some files on a public GCS bucket and is trying to load them. The user shouldn't have to know anything about the GCS bucket. Calling auth.authenticate_user() forces the user to go to an OAuth link, copy a verification code, and paste it back in. This isn't something we want to force on users. In the short term TFDS will probably use the GCS HTTP API instead of tf.io.gfile.
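A rough sketch of what going through the HTTP endpoint instead of tf.io.gfile could look like, using only the standard library. This is not the TFDS implementation. The function names are hypothetical, and it assumes the object is publicly readable at `https://storage.googleapis.com/<bucket>/<object>`; a missing object is assumed to answer 404, and any other HTTP error (for example 403 on a bucket that is not public) is re-raised.

```python
import urllib.error
import urllib.parse
import urllib.request

GCS_HTTP_ROOT = "https://storage.googleapis.com"


def gcs_path_to_http_url(gcs_path):
    """Map gs://bucket/object to its anonymous HTTPS URL."""
    prefix = "gs://"
    if not gcs_path.startswith(prefix):
        raise ValueError(f"not a gs:// path: {gcs_path!r}")
    bucket, _, obj = gcs_path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {gcs_path!r}")
    url = f"{GCS_HTTP_ROOT}/{urllib.parse.quote(bucket)}"
    if obj:
        # Keep "/" so the object's pseudo-directories stay readable in the URL.
        url += "/" + urllib.parse.quote(obj, safe="/")
    return url


def public_object_exists(gcs_path, timeout=10.0):
    """Send an unauthenticated HEAD request; True on 200, False on 404."""
    request = urllib.request.Request(gcs_path_to_http_url(gcs_path), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # e.g. 403 when the bucket or object is not public
```

Because no credentials are looked up at all, this path avoids the metadata-server timeouts and needs no user interaction, at the cost of only working for public data.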

@jvishnuvardhan jvishnuvardhan added type:bug Bug comp:apis Highlevel API related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Feb 4, 2019
@craigcitro
Contributor

I've added a $NO_GCE_CHECK environment variable that allows a user to completely sidestep the GCE metadata checks; anyone who's hitting the original issue (slow timeouts + retries attempting to fetch a public GCS resource) should be able to use this to get unblocked.

For Colab in particular, I'm going to add this patch and enable it in our runtimes (where GCE metadata is never available).
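For runtimes that do not set the variable already, a minimal sketch of opting in from Python follows. Two assumptions to flag: the value `"true"` is a guess at what enables the skip (the comment above only names the variable), and the variable is assumed to take effect as long as it is set before the first GCS access. The wrapper name `public_bucket_exists` is hypothetical.

```python
import os

# Assumption: "true" is the value that turns the GCE metadata check off;
# the thread only names the NO_GCE_CHECK variable, not its expected value.
os.environ["NO_GCE_CHECK"] = "true"


def public_bucket_exists(path):
    """Check a public gs:// path after NO_GCE_CHECK has been set."""
    # Imported here so the environment variable is in place before
    # TensorFlow makes its first GCS request.
    import tensorflow as tf
    return tf.io.gfile.exists(path)
```

With this in place, `public_bucket_exists("gs://tfds-data")` should not wait on the metadata timeouts described earlier, provided the installed TensorFlow build includes the patch.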

@rsepassi
Author

rsepassi commented Feb 7, 2019 via email

@dynamicwebpaige dynamicwebpaige removed this from Done in TensorFlow 2.0 Jul 2, 2019
3 participants