TF 2.3 S3 client having permission issues with tf.data.TFRecordDataset() #44564

Closed
shaowei-su opened this issue Nov 3, 2020 · 12 comments
Labels
comp:data tf.data related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower TF 2.3 Issues related to TF 2.3 type:bug Bug

Comments

@shaowei-su

shaowei-su commented Nov 3, 2020

Please make sure that this is a bug. As per our
GitHub Policy,
we only address code/doc bugs, performance issues, feature requests and
build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below): 2.3
  • Python version: 3.6
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:N/A
  • GPU model and memory:

You can collect some of this information using our environment capture
script
You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior
TF 2.3 tf.data.TFRecordDataset() failed to load files from S3.
Side note:
tf_file_io is working properly for the same setup.

>> from tensorflow.python.lib.io import file_io as tf_file_io
>> tf_file_io.file_exists('SOME_S3_FILE')

True

See error log below.
Describe the expected behavior
The same API and environment work well with TF 2.1 and TF 1.15.

Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.

import tensorflow as tf

# 'SOME_S3_PATH' is a placeholder for an s3:// path to a TFRecord file
dataset = tf.data.TFRecordDataset('SOME_S3_PATH')
for raw_record in dataset.take(10):
  print(repr(raw_record))

Other info / logs Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/default_user/.conda/envs/user/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 736, in __next__
    return self.next()
  File "/home/default_user/.conda/envs/user/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 772, in next
    return self._next_internal()
  File "/home/default_user/.conda/envs/user/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 764, in _next_internal
    return structure.from_compatible_tensor_list(self._element_spec, ret)
  File "/home/default_user/.conda/envs/user/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/default_user/.conda/envs/user/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 2105, in execution_mode
    executor_new.wait()
  File "/home/default_user/.conda/envs/user/lib/python3.6/site-packages/tensorflow/python/eager/executor.py", line 67, in wait
    pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.FailedPreconditionError: AWS Credentials have not been set properly. Unable to access the specified S3 location
@shaowei-su shaowei-su added the type:bug Bug label Nov 3, 2020
@shaowei-su shaowei-su changed the title TF 2.3 S3 client having permission issues with AWS TF 2.3 S3 client having permission issues with tf.data.TFRecordDataset() Nov 3, 2020
@Saduf2019 Saduf2019 added TF 2.3 Issues related to TF 2.3 comp:data tf.data related issues labels Nov 4, 2020
@Saduf2019
Contributor

@shaowei-su
Please provide complete standalone code so that we can replicate the issue, or, if possible, share a Colab gist with the reported error.

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Nov 4, 2020
@shaowei-su
Author

shaowei-su commented Nov 4, 2020

Thank you @Saduf2019 for taking a look into this.
Reproducing the error requires:

  • S3 bucket access
  • AWS credential setup for the given S3 bucket

(Unfortunately, my current setup is not publicly accessible or shareable.)
Once it is set up, you can run the following tests:

  1. Smoke test (should pass)
from tensorflow.python.lib.io import file_io as tf_file_io
tf_file_io.file_exists('SOME_S3_TFRECORD_FILE')
  2. TFRecordDataset load (should fail)
dataset = tf.data.TFRecordDataset('SOME_S3_TFRECORD_FILE')
for raw_record in dataset.take(10):
  print(repr(raw_record))

@shaowei-su
Author

Quick update on my end:
This issue seems to be related to the S3 multi-part download that was introduced in TF 2.2.
With that functionality disabled via os.environ['S3_DISABLE_MULTI_PART_DOWNLOAD'] = '1', tf.data.TFRecordDataset() can load data from S3 properly.
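
A minimal sketch of the workaround described above. It assumes the variable should be set before TensorFlow is imported, which is the conservative choice if the S3 filesystem reads it when it is initialized; the thread does not confirm the required ordering. The helper names `disable_s3_multipart_download` and `read_first_records` are hypothetical, and TensorFlow is imported lazily so the flag is in place first.

```python
import os


def disable_s3_multipart_download(env=os.environ):
    """Set the flag mentioned in this thread that turns off S3 multi-part download."""
    env["S3_DISABLE_MULTI_PART_DOWNLOAD"] = "1"
    return env


def read_first_records(path, count=10):
    """Read up to `count` raw records from a TFRecord file.

    TensorFlow is imported inside the function so that the environment
    variable is already set (an assumption about ordering, see above).
    """
    disable_s3_multipart_download()
    import tensorflow as tf  # imported only after the flag is set

    dataset = tf.data.TFRecordDataset(path)
    return [record.numpy() for record in dataset.take(count)]


# Usage (requires TensorFlow and S3 access; the path is the placeholder
# used in this issue):
#   for raw in read_first_records('SOME_S3_TFRECORD_FILE'):
#       print(repr(raw))
```

Note that this only sidesteps the multi-part code path; it does not change which permissions the credentials carry.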

@Saduf2019 Saduf2019 removed the stat:awaiting response Status - Awaiting response from author label Nov 6, 2020
@Saduf2019 Saduf2019 assigned ymodak and unassigned Saduf2019 Nov 6, 2020
@ymodak ymodak assigned aaudiber and unassigned ymodak Nov 13, 2020
@ymodak ymodak added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 13, 2020
@vnghia
Contributor

vnghia commented Dec 15, 2020

@shaowei-su

I am working on this error, but I am not able to reproduce it locally. Could you try tf.io.read_file('SOME_S3_TFRECORD_FILE')? If it still fails for the same reason, could you please tell me the size of that file and check your environment variables (especially AWS_REGION, because S3 access usually fails because of the bucket region)?
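
One way to run the environment check requested above without leaking secrets is sketched below. The list of variable names is an assumption about what is worth inspecting, not an exhaustive account of what TensorFlow or the AWS SDK reads, and `summarize_aws_env` is a hypothetical helper.

```python
import os

# Variables worth inspecting for S3 access (an assumed, non-exhaustive list).
AWS_RELATED_VARS = (
    "AWS_REGION",
    "AWS_DEFAULT_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "S3_ENDPOINT",
    "S3_DISABLE_MULTI_PART_DOWNLOAD",
)

# Values of these variables are never echoed, only whether they are set.
SECRET_VARS = {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"}


def summarize_aws_env(env=os.environ):
    """Return a name -> description mapping that is safe to paste into an issue."""
    summary = {}
    for name in AWS_RELATED_VARS:
        value = env.get(name)
        if value is None:
            summary[name] = "<unset>"
        elif name in SECRET_VARS:
            summary[name] = "<set, hidden>"
        else:
            summary[name] = value
    return summary


for name, value in summarize_aws_env().items():
    print(f"{name}={value}")
```

If every credential variable shows as unset, the credentials are presumably coming from another source, such as an instance role.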

@shaowei-su
Author

Hi @vnvo2409
Yes, read_file failed for the same reason; see the stack trace:

2020-12-16 19:07:50.187041: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at whole_file_read_ops.cc:116 : Failed precondition: AWS Credentials have not been set properly. Unable to access the specified S3 location
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/default_user/.conda/envs/user/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 562, in read_file
    filename, name=name, ctx=_ctx)
  File "/home/default_user/.conda/envs/user/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 600, in read_file_eager_fallback
    attrs=_attrs, ctx=ctx, name=name)
  File "/home/default_user/.conda/envs/user/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.FailedPreconditionError: AWS Credentials have not been set properly. Unable to access the specified S3 location [Op:ReadFile]

In this use case, credentials are not passed in as environment variables; they come from the IAM role attached to the Amazon EC2 instance.

Side note: tf.io.read_file('SOME_S3_TFRECORD_FILE') works fine once multi-part download is disabled with

import os
os.environ['S3_DISABLE_MULTI_PART_DOWNLOAD'] = '1' 

@vnghia
Contributor

vnghia commented Dec 17, 2020

Unfortunately, I don't have access to an EC2 instance right now. Could you please set the environment as follows so that we can see what happens behind the scenes?

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "5"
os.environ["AWS_LOG_LEVEL"] = "trace"

@shaowei-su
Author

The stack trace is quite long, so I pasted it here: https://gist.github.com/shaowei-su/4485e00a7a2d1e78f39275be3e7dd8f1

@vnghia
Contributor

vnghia commented Dec 17, 2020

@shaowei-su
Many thanks for the stack trace. I am unable to work out what is happening. Please wait until the modular filesystem is ready; with that filesystem, the error should be clearer. In addition, please check whether there is any sensitive information in the stack trace and remove it!

It may be related to #43344.

@shaowei-su
Author

@vnvo2409 Thanks for the reminder! I have deleted the stack trace for now.

@vnghia
Contributor

vnghia commented Dec 18, 2020

@shaowei-su
Maybe your problem is related to aws/aws-sdk-cpp#863?

@shaowei-su
Author

@vnvo2409 Yes! I can confirm that adding the GetObjectVersion permission resolves this issue. Thanks for all the help!
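
For readers hitting the same error, the sketch below builds an IAM policy document that grants `s3:GetObjectVersion` alongside `s3:GetObject`. The thread only confirms that adding the GetObjectVersion permission fixed the problem; the rest of the policy, the bucket name `example-bucket`, and the helper `build_read_policy` are assumptions for illustration.

```python
import json


def build_read_policy(bucket):
    """Return an IAM policy document allowing object reads, including versioned reads."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # s3:GetObjectVersion is the permission reported missing in this thread.
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# 'example-bucket' is a hypothetical bucket name.
print(json.dumps(build_read_policy("example-bucket"), indent=2))
```

The printed JSON can be attached to the role the EC2 instance uses, adjusted to whatever other permissions the role already needs.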

