AWS lib is verbose when using S3 #21898

Closed
mbelang opened this issue Aug 27, 2018 · 14 comments

mbelang commented Aug 27, 2018

I'm using the latest version of TensorFlow with AWS S3 as a backend for models, and I keep getting these annoying logs:

[tensorflow-server-pr-14-585f869b5f-2wln8] 2018-08-27 13:51:41.183378: I external/org_tensorflow/tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
[tensorflow-server-pr-14-585f869b5f-2wln8] 2018-08-27 13:51:41.183471: I external/org_tensorflow/tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
[tensorflow-server-pr-14-585f869b5f-2wln8] 2018-08-27 13:51:41.194855: I external/org_tensorflow/tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
[tensorflow-server-pr-14-585f869b5f-2wln8] 2018-08-27 13:51:41.194963: I external/org_tensorflow/tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
[tensorflow-server-pr-14-585f869b5f-2wln8] 2018-08-27 13:51:41.206484: I external/org_tensorflow/tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
[tensorflow-server-pr-14-585f869b5f-2wln8] 2018-08-27 13:51:41.206714: I external/org_tensorflow/tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
[tensorflow-server-pr-14-585f869b5f-2wln8] 2018-08-27 13:51:41.212817: E external/org_tensorflow/tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
[tensorflow-server-pr-14-585f869b5f-2wln8] 2018-08-27 13:51:41.212852: W external/org_tensorflow/tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.

The server is working as expected, though. I set the log level to ERROR only, but the 404 error still occurs more than once per second.

I saw that there is an open issue in tensorflow/serving, but this seems to be a core/third-party issue.

tensorflow/serving#615

Have I written custom code: No
OS Platform and Distribution: Kubernetes 1.8 running on CENTOS
TensorFlow installed from: Docker image tensorflow/serving:1.10.1
TensorFlow version: 1.10.1
Bazel version: ??
CUDA/cuDNN version: N/A
GPU model and memory: N/A
Exact command to reproduce:

# Read the secret key (the value after the colon) from the mounted Kubernetes secret file.
if [ -f /etc/secrets/AWS_SECRET_ACCESS_KEY.secret ]; then
    export AWS_SECRET_ACCESS_KEY=$(echo -e $(cut -d ":" -f 2 <<< $(cat /etc/secrets/AWS_SECRET_ACCESS_KEY.secret)))
fi

tensorflow_model_server --port=8500 --tensorflow_session_parallelism=1 --enable_model_warmup=true --model_config_file=${MODEL_CONFIG}

Example of model_config_file:

model_config_list: {
  config: {
    name: "model1",
    base_path: "s3://tensorflow-models/model1",
    model_platform: "tensorflow"
  },
  config: {
    name: "model2",
    base_path: "s3://tensorflow-models/model2",
    model_platform: "tensorflow"
  }
}

Mobile device: None

Thank you

tensorflowbutler added the stat:awaiting response label Aug 31, 2018
tensorflowbutler (Member) commented:

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce
Mobile device

mbelang (Author) commented Aug 31, 2018

Updated.

tensorflowbutler removed the stat:awaiting response label Sep 1, 2018
mbelang (Author) commented Sep 17, 2018

Yeah, we disabled the logging completely because it was far too verbose, so logging has lost its purpose for us: the "good" logs get buried in false positives.

Thank you.

r-wheeler commented:

Any update on this?

prameshbajra commented:

I am getting this: 😢

2019-03-19 14:14:15.095336: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-03-19 14:14:15.609993: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-03-19 14:14:16.813516: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-03-19 14:14:17.289853: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-03-19 14:14:17.804022: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-03-19 14:14:18.303771: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-03-19 14:14:19.577461: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-03-19 14:14:20.058953: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-03-19 14:14:20.533169: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-03-19 14:14:21.008386: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.

Actually, more than 500 of these. 🤕

Can anybody let me know how to disable this log (only this one, if possible), or suggest a possible workaround?

Thanks.

karlschriek commented:

This issue seems to be quite stale by now, but the problem still persists, at least for me. Has there been any kind of fix yet?


apls777 commented Apr 8, 2019

@prameshbajra As a workaround, I just filter these messages with grep :)

python train.py 2>&1 | grep -v "Connection has been released. Continuing."
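
A sketch extending the same idea to all of the S3 logging lines quoted in this thread (they all come from aws_logging.cc, so a single pattern covers them; adjust the filter to your own output as needed):

# Hypothetical broader filter: drop every line emitted via the S3 platform's aws_logging.cc
python train.py 2>&1 | grep -v "aws_logging.cc"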


ghost commented Apr 19, 2019

#24088 (comment)

drdee (Contributor) commented May 21, 2019

See also tensorflow/serving#789


ghost commented Jun 11, 2019

@ymodak this can be closed since #24088 has been merged.

ymodak (Contributor) commented Jun 11, 2019

Closing this issue since the PR has been merged. Thanks!

ymodak closed this as completed Jun 11, 2019

davidparks21 commented Nov 26, 2019

The solution here is to set the logging level for AWS, taken from the PR that was merged. This worked in 1.14 for me:

export AWS_LOG_LEVEL=3
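
For the TensorFlow Serving setup from the original report, a minimal sketch would be to export the variable in the same script (or Dockerfile ENV) that launches the server; the value 3 is simply the one reported to work in this thread:

# Assuming the launch script from the original report:
export AWS_LOG_LEVEL=3   # quiet the AWS SDK logging inside TensorFlow's S3 filesystem
tensorflow_model_server --port=8500 --tensorflow_session_parallelism=1 --enable_model_warmup=true --model_config_file=${MODEL_CONFIG}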


descampsk commented Dec 19, 2019

I use TF_CPP_MIN_LOG_LEVEL=3 and it works too, but the above solution seems like a better way to do it.
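
For comparison, a sketch of both options. Note that TF_CPP_MIN_LOG_LEVEL=3 filters TensorFlow's own INFO, WARNING, and ERROR output as well, not only the AWS messages, which is why the AWS-specific variable seems preferable:

# AWS-specific: only quiets the AWS SDK logger used by the S3 filesystem (per this thread)
export AWS_LOG_LEVEL=3
# Global alternative: suppresses all TensorFlow C++ log output up to and including errors
export TF_CPP_MIN_LOG_LEVEL=3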

rokopi-byte commented:

Hi,
I have the same problem during training with 1.15. I'm saving logs and checkpoints to an S3 bucket, because I use SageMaker for training and want to use TensorBoard in real time. Are you sure that increasing the log level is the right solution? What I understand from those errors is that TensorFlow is continuously polling the S3 filesystem; is that right? The problem is that S3 charges for these read operations, which can lead to unexpectedly high bills from AWS. Is there any way to stop this polling during training?
