Support AWS IAM roles when using TensorFlow file_io #43344

Closed
sheromon opened this issue Sep 18, 2020 · 11 comments
Labels
comp:apis Highlevel API related issues stat:awaiting response Status - Awaiting response from author type:feature Feature requests

Comments

@sheromon

sheromon commented Sep 18, 2020

System information

  • TensorFlow version (you are using): 2.1.0, 2.3.0
  • Are you willing to contribute it (Yes/No): No, sorry, I'm not very comfortable with C++

Describe the feature and the current behavior/state.
I really appreciate the functionality provided by TensorFlow file_io that lets the user treat files stored locally on disk in the same way as files stored in S3. It just works! It's wonderful! However, this functionality doesn't work if your AWS credentials are provided through an AWS IAM role. Unfortunately, the default AWS SDK credentials behavior does not account for this situation, and the maintainers have said that they will not incorporate this feature into their default credentials provider. They did offer a suggested way for people using the AWS SDK to support this feature. Here is the thread where this is discussed: aws/aws-sdk-cpp#150.

Will this change the current api? How?
No

Who will benefit with this feature?
TensorFlow users who are using AWS S3 with credentials provided using AWS IAM roles

Any Other info.
Here's an example from TensorFlow 2.3.0 with the error message, run in a Docker container that uses tensorflow/tensorflow:2.3.0-cpu as the base image. The AWS_PROFILE environment variable is set to use my IAM role. I've confirmed that aws s3 ls s3://my-bucket/my-file.txt works.

>>> from tensorflow.python.lib.io import file_io as tf_file_io
>>> tf_file_io.file_exists('s3://my-bucket/my-file.txt')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 249, in file_exists
    return file_exists_v2(filename)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 267, in file_exists_v2
    _pywrap_file_io.FileExists(compat.as_bytes(path))
tensorflow.python.framework.errors_impl.FailedPreconditionError: AWS Credentials have not been set properly. Unable to access the specified S3 location
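A workaround that needs no TensorFlow changes, sketched here under assumptions rather than tested against this exact setup: let boto3, which does handle role profiles, perform the AssumeRole step, then export the resulting temporary credentials as `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN`, the environment variables that the AWS C++ SDK's default credentials chain consults before the profile files. The helper names and the `FrozenCredentials` tuple are hypothetical, boto3 is assumed to be installed, and the exported values are never refreshed, so they stop working once the role session expires (one hour by default).

```python
import os
from collections import namedtuple
from typing import MutableMapping, Optional

# Hypothetical stand-in with the same fields as botocore's ReadOnlyCredentials.
FrozenCredentials = namedtuple("FrozenCredentials", ["access_key", "secret_key", "token"])


def resolve_role_credentials(profile_name: Optional[str] = None):
    """Let boto3 assume the role configured for `profile_name` (or AWS_PROFILE).

    boto3 is imported lazily so the rest of this module works without it.
    """
    import boto3  # third-party, assumed to be installed

    session = boto3.Session(profile_name=profile_name)
    credentials = session.get_credentials()
    if credentials is None:
        raise RuntimeError(f"boto3 could not resolve credentials for {profile_name!r}")
    # Snapshot of the current temporary key pair plus session token.
    return credentials.get_frozen_credentials()


def export_to_environment(creds, environ: MutableMapping[str, str] = os.environ) -> None:
    """Publish credentials via the variables the AWS C++ SDK's environment provider reads."""
    environ["AWS_ACCESS_KEY_ID"] = creds.access_key
    environ["AWS_SECRET_ACCESS_KEY"] = creds.secret_key
    if creds.token:
        environ["AWS_SESSION_TOKEN"] = creds.token
    else:
        # Long-lived keys carry no token; drop any stale token from an earlier role.
        environ.pop("AWS_SESSION_TOKEN", None)
```

Called as `export_to_environment(resolve_role_credentials())` before the first `s3://` access in the process (TensorFlow's S3 filesystem appears to create its client once and reuse it), the `tf_file_io.file_exists('s3://my-bucket/my-file.txt')` call above should then authenticate with the role's credentials.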
@sheromon sheromon added the type:feature Feature requests label Sep 18, 2020
@amahendrakar amahendrakar added the comp:apis Highlevel API related issues label Sep 21, 2020
@gowthamkpr gowthamkpr added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Sep 22, 2020
@surry
Contributor

surry commented Sep 28, 2020

I was able to follow a suggestion on an AWS C++ SDK issue and get this working locally. However, for our use case, there's an issue in the AWS SDK where the credentials are only read from ~/.aws/config instead of ~/.aws/credentials, so it still doesn't quite work as we would like, even after adding a custom CredentialsProviderChain to TensorFlow. Details here:

aws/aws-sdk-cpp#1330 (comment)
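A quick way to check which file a role profile actually lives in, as a hedged stdlib-only sketch: the two files name sections differently (`[profile NAME]` in `~/.aws/config`, plain `[NAME]` in `~/.aws/credentials`, with `[default]` unprefixed in both), so a `role_arn` placed only in `~/.aws/credentials` would be missed by an SDK that, per the linked comment, reads these settings only from `~/.aws/config`. The function name `locate_profile` is hypothetical, and file contents are passed in as strings to keep the sketch testable.

```python
import configparser
from typing import Dict


def locate_profile(name: str, config_text: str, credentials_text: str) -> Dict[str, Dict[str, str]]:
    """Return the settings found for profile `name`, keyed by which AWS file held them.

    ~/.aws/config uses "[profile NAME]" (except "[default]");
    ~/.aws/credentials uses plain "[NAME]".
    """
    config_section = name if name == "default" else f"profile {name}"
    found: Dict[str, Dict[str, str]] = {}
    for label, text, section in (
        ("config", config_text, config_section),
        ("credentials", credentials_text, name),
    ):
        # interpolation=None so values containing '%' are read literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        if parser.has_section(section):
            found[label] = dict(parser.items(section))
    return found
```

For example, `locate_profile("ml", open(os.path.expanduser("~/.aws/config")).read(), ...)`; if `role_arn` shows up only under the `"credentials"` key, moving that profile into `~/.aws/config` as `[profile ml]` is probably the first thing to try.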

@dgoldenberg-audiomack

Hi, any word on this issue? IMHO, this is rather critical as it's blocking #1252. I imagine lots of folks do/will want to load Parquet and other formats into TensorFlow from AWS S3. Just adding my vote here; thanks.

@mihaimaruseac
Collaborator

We are moving cloud filesystems to SIG IO due to size constraints on the TF wheel package. SIG IO filesystems already provide more support than what we can offer.

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 8, 2021
@sachinprasadhs
Contributor

As per the comment here, if your concern has been addressed, could you please close this issue? Thanks.

We are moving cloud filesystems to SIG IO due to size constraints on the TF wheel package. SIG IO filesystems already provide more support than what we can offer.

@sachinprasadhs sachinprasadhs added the stat:awaiting response Status - Awaiting response from author label Feb 19, 2022
@sheromon
Author

@sachinprasadhs, I'm not sure who your comment is addressing. I've given up on this functionality. m(-_-)m

Perhaps @dgoldenberg-audiomack has what they needed now?

@dgoldenberg-audiomack

Have what? I'm not a contributor. I gave up on the ticket that this one was apparently blocking.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Feb 21, 2022
@sachinprasadhs
Contributor

@sheromon, since you opened this issue, if you no longer have this problem, please go ahead and close it. Thanks!

@sachinprasadhs sachinprasadhs added the stat:awaiting response Status - Awaiting response from author label Feb 22, 2022
@sheromon
Author

@sachinprasadhs, okay, let me check with my collaborators first.

@sheromon
Author

Well, I've found workarounds (although I didn't like any of them more than TensorFlow's file_io module), so I guess we can close this.

@dvaldivia

@sheromon, what was your workaround, if you don't mind me asking?

@sheromon
Author

sheromon commented Apr 7, 2022

@dvaldivia Using the smart_open Python package in some places and custom functions that use boto3 in other places. It's not as convenient, but it does the job.
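For readers landing here, a minimal sketch of the kind of boto3 helper described, assuming boto3 is installed and `AWS_PROFILE` names the role profile (boto3 performs the AssumeRole step itself). `split_s3_uri`, `s3_file_exists` and `make_client` are hypothetical names, not the author's actual code; the client is passed in so the existence check can be exercised without AWS.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def s3_file_exists(uri: str, client) -> bool:
    """Rough stand-in for tf_file_io.file_exists on a single S3 object."""
    bucket, key = split_s3_uri(uri)
    try:
        client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception as err:  # botocore.exceptions.ClientError in practice
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code in ("404", "NoSuchKey"):
            return False
        raise  # 403 and other failures are surfaced rather than reported as "missing"


def make_client():
    import boto3  # assumed installed; honours AWS_PROFILE, including role_arn profiles

    return boto3.Session().client("s3")
```

Usage would look like `s3_file_exists("s3://my-bucket/my-file.txt", make_client())`. Re-raising non-404 errors is a deliberate choice: a permissions problem should not be mistaken for a missing file.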
