
tf.keras.estimator.estimator_from_model does not respect options set in RunConfig #14776

Closed
droidicus opened this issue Nov 22, 2017 · 17 comments


droidicus commented Nov 22, 2017

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): tf.VERSION = 1.4.0 tf.GIT_VERSION = v1.4.0-rc1-11-g130a514
  • Python version: 2.7.12
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 8.0.61/6.0.21
  • GPU model and memory: NVIDIA Tesla M60 8 GB
  • Exact command to reproduce: See Below

Describe the problem

When using an estimator created by tf.keras.estimator.model_to_estimator() and training with tf.estimator.train_and_evaluate(), gpu_options set in the session_config of the tf.estimator.RunConfig passed to model_to_estimator are not respected. For example, setting per_process_gpu_memory_fraction=0.5 does not reduce the GPU memory allocated to the process; likewise, setting allow_growth=True still allocates all of the GPU memory up front instead of growing allocations on demand.

I also tested this with the canned estimator tf.estimator.DNNRegressor, and the settings were applied as expected when the RunConfig was passed to the estimator.

Below is code to demonstrate this issue.

Source code / logs

Minimal example, runs to completion and trains successfully. But, changing the GPUOptions settings does not cause the GPU memory to be utilized as expected:

import os
import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

# Neither of these GPUOptions are respected
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
#gpu_options = tf.GPUOptions(allow_growth=True)
sess_config = tf.ConfigProto(gpu_options=gpu_options)
run_config = tf.estimator.RunConfig(session_config=sess_config)

inputs = tf.keras.layers.Input(shape=(10,))
outputs = tf.keras.layers.Dense(10)(inputs)
model = tf.keras.models.Model(inputs, outputs)
model.compile(optimizer='sgd', loss='mse')
est_keras = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)

input_name = model.input_names[0]
data = np.random.rand(1000,10).astype(np.float32)
train_input_fn = tf.estimator.inputs.numpy_input_fn({input_name:data}, data, batch_size=10, num_epochs=None, shuffle=False)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=100000)
eval_spec = tf.estimator.EvalSpec(input_fn=train_input_fn, steps=10)
tf.estimator.train_and_evaluate(est_keras, train_spec, eval_spec)
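
For reference, the session_config constructed above corresponds to a ConfigProto that serializes (in protobuf text format) to roughly:

```
gpu_options {
  per_process_gpu_memory_fraction: 0.5
}
```

This is the same proto that appears under the _session_config key when the estimator's RunConfig is dumped later in this thread.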
@droidicus (Author)

@fchollet @yifeif May be related to #14504

@shivaniag (Contributor)

@yifeif could you please take a look.

shivaniag added the stat:awaiting tensorflower and type:bug labels Nov 27, 2017
yifeif (Contributor) commented Dec 1, 2017

@ispirmustafa any idea? The run_config is passed in directly when creating the keras version of the Estimator. Do we need to pass these configurations anywhere else?

@ispirmustafa (Contributor)

This should not be related to the Keras code. It should work, since the session_config is handled within Estimator itself.
@droidicus could you please print est_keras.config and est_keras.config.cluster_spec.as_dict()?

yifeif added the stat:awaiting response label and removed the stat:awaiting tensorflower label Dec 1, 2017
@droidicus (Author)

Sure thing, here is the output:

*********** est_keras.config *************************************
<tensorflow.python.estimator.run_config.RunConfig object at 0x7f7694423fd0>
*********** est_keras.config.cluster_spec.as_dict()  *************
{}
******************************************************************

Code to reproduce (same as the source in the original issue above, but with print statements added after the creation of the Keras estimator) and the full log output are available here: https://gist.github.com/droidicus/146532eacf88ed57538bb41a8fc7da4b

aselle removed the stat:awaiting response label Dec 3, 2017
@tensorflowbutler (Member)

It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@droidicus (Author)

Gentle ping, this is still an issue for me.

shivaniag added the stat:awaiting tensorflower label Jan 9, 2018
@ispirmustafa (Contributor)

Hi @shivaniag,
Estimator sends the given session_config directly to the Session constructor. Could you please assign somebody who is more familiar with tf.Session and its handling of GPU settings?

@ispirmustafa (Contributor)

FYI, I've checked tf.keras.estimator.model_to_estimator: it passes the config properly to tf.estimator.Estimator.
tf.estimator.Estimator then sends that config to the tf.train.SessionManager calls, which use it as a constructor argument to tf.Session.
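
The chain described above can be illustrated with a toy, non-TensorFlow sketch. The `Estimator`, `SessionManager`, and `Session` classes below are stand-ins for the real TensorFlow classes (not their actual implementations); the point is only that a session_config handed in at the top is forwarded unchanged down to the session constructor:

```python
# Toy model of the config plumbing: Estimator -> SessionManager -> Session.
# These classes are hypothetical stand-ins, NOT the real TensorFlow API.

class Session:
    def __init__(self, config=None):
        # The real tf.Session would apply GPU options from this config here.
        self.config = config

class SessionManager:
    def __init__(self, config=None):
        self.config = config

    def create_session(self):
        # Forwards its config to the Session constructor.
        return Session(config=self.config)

class Estimator:
    def __init__(self, run_config):
        self.run_config = run_config

    def train(self):
        # Pulls session_config out of the RunConfig and hands it down.
        manager = SessionManager(config=self.run_config.get("session_config"))
        return manager.create_session()

run_config = {"session_config": {"per_process_gpu_memory_fraction": 0.5}}
session = Estimator(run_config).train()
print(session.config)  # the GPU options survive the whole chain
```

If this plumbing holds in the real code, the bug would have to be in something that creates or uses a session outside this chain, e.g. code relying on a default session.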

@droidicus (Author)

Just as an FYI: while this is still a problem in TF v1.5.0-rc0, we were able to use the following workaround for now. By setting the default session manually, the memory fraction is respected:

import os
import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
sess_config = tf.ConfigProto(gpu_options=gpu_options)
# Manually set the default session instead
tf.Session(config=sess_config).as_default()
#run_config = tf.estimator.RunConfig(session_config=sess_config)

inputs = tf.keras.layers.Input(shape=(10,))
outputs = tf.keras.layers.Dense(10)(inputs)
model = tf.keras.models.Model(inputs, outputs)
model.compile(optimizer='sgd', loss='mse')
est_keras = tf.keras.estimator.model_to_estimator(keras_model=model)#, config=run_config)

input_name = model.input_names[0]
data = np.random.rand(1000,10).astype(np.float32)
train_input_fn = tf.estimator.inputs.numpy_input_fn({input_name:data}, data, batch_size=10, num_epochs=None, shuffle=False)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=100000)
eval_spec = tf.estimator.EvalSpec(input_fn=train_input_fn, steps=10)
tf.estimator.train_and_evaluate(est_keras, train_spec, eval_spec)

@ispirmustafa (Contributor)

@yifeif, @fchollet is there any place within the underlying Keras code that uses the default session while building the graph, train_op, etc.?

tensorflowbutler removed the stat:awaiting tensorflower label Jan 23, 2018
@tensorflowbutler (Member)

A member of the TensorFlow organization has replied after the stat:awaiting tensorflower label was applied.

@tensorflowbutler (Member)

Nagging Assignee: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

yifeif (Contributor) commented Feb 14, 2018

Sorry for the delay. This is how model_to_estimator creates its model_fn. @ispirmustafa is there anything you see with the session that should be done differently? Thanks!

I also tried printing estimator.config.__dict__ and estimator.config.cluster_spec.__dict__ for a canned estimator, a custom estimator, and a Keras-converted estimator, and I'm seeing the same results:

*********** est_keras.config.__dict__ *************************************
{'_save_checkpoints_secs': 600, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.5
}
, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f8e3ebcabd0>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': '/tmp/tmp8jvdNt', '_save_summary_steps': 100}
*********** est_keras.config.cluster_spec.__dict__ *************
{'_cluster_def': , '_cluster_spec': {}}

@tensorflowbutler (Member)

Nagging Assignees @yifeif, @shivaniag, @ispirmustafa: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

yifeif (Contributor) commented Mar 6, 2018

A fix has been submitted internally and should make it to master tomorrow. Thanks!

@droidicus (Author)

Fantastic, thanks!

yifeif closed this as completed in 355fb5e Mar 7, 2018