
tf.keras.estimator.estimator_from_model does not respect options set in RunConfig #14776

Closed
droidicus opened this issue Nov 22, 2017 · 17 comments


droidicus commented Nov 22, 2017

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): tf.VERSION = 1.4.0 tf.GIT_VERSION = v1.4.0-rc1-11-g130a514
  • Python version: 2.7.12
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 8.0.61/6.0.21
  • GPU model and memory: NVIDIA Tesla M60 8 GB
  • Exact command to reproduce: See Below

Describe the problem

When using an estimator created by tf.keras.estimator.model_to_estimator() and training with tf.estimator.train_and_evaluate(), gpu_options set in the session_config of the tf.estimator.RunConfig passed to model_to_estimator are not respected. For example, setting per_process_gpu_memory_fraction=0.5 does not reduce the GPU memory allocated to the process; likewise, setting allow_growth=True still allocates all of the GPU memory up front instead of growing allocations on demand.

I also tested this with the canned estimator tf.estimator.DNNRegressor, and the settings were applied as expected when the RunConfig was passed to the estimator.

Below is code to demonstrate this issue.

Source code / logs

Minimal example, runs to completion and trains successfully. But, changing the GPUOptions settings does not cause the GPU memory to be utilized as expected:

import os
import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

# Neither of these GPUOptions are respected
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
#gpu_options = tf.GPUOptions(allow_growth=True)
sess_config = tf.ConfigProto(gpu_options=gpu_options)
run_config = tf.estimator.RunConfig(session_config=sess_config)

inputs = tf.keras.layers.Input(shape=(10,))
outputs = tf.keras.layers.Dense(10)(inputs)
model = tf.keras.models.Model(inputs, outputs)
model.compile(optimizer='sgd', loss='mse')
est_keras = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)

input_name = model.input_names[0]
data = np.random.rand(1000,10).astype(np.float32)
train_input_fn = tf.estimator.inputs.numpy_input_fn({input_name:data}, data, batch_size=10, num_epochs=None, shuffle=False)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=100000)
eval_spec = tf.estimator.EvalSpec(input_fn=train_input_fn, steps=10)
tf.estimator.train_and_evaluate(est_keras, train_spec, eval_spec)
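
For reference, the session_config constructed above corresponds to a ConfigProto that serializes (in protobuf text format) to roughly:

```
gpu_options {
  per_process_gpu_memory_fraction: 0.5
}
```

This is the same proto that appears under the _session_config key when the estimator's RunConfig is dumped later in this thread.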
@droidicus (Author)

@fchollet @yifeif May be related to #14504

@shivaniag (Contributor)

@yifeif could you please take a look.

shivaniag added the stat:awaiting tensorflower and type:bug labels Nov 27, 2017
yifeif (Contributor) commented Dec 1, 2017

@ispirmustafa any idea? The run_config is passed in directly when creating the keras version of the Estimator. Do we need to pass these configurations anywhere else?

@ispirmustafa (Contributor)

This should not be related to the Keras code. It should work, since the session_config is handled within Estimator itself.
@droidicus could you please print est_keras.config and est_keras.config.cluster_spec.as_dict()?

yifeif added the stat:awaiting response label and removed the stat:awaiting tensorflower label Dec 1, 2017
@droidicus (Author)

Sure thing, here is the output:

*********** est_keras.config *************************************
<tensorflow.python.estimator.run_config.RunConfig object at 0x7f7694423fd0>
*********** est_keras.config.cluster_spec.as_dict()  *************
{}
******************************************************************

Code to reproduce (same as the source in the original issue above, but with print statements added after the creation of the Keras estimator) and the full log output are available here: https://gist.github.com/droidicus/146532eacf88ed57538bb41a8fc7da4b

aselle removed the stat:awaiting response label Dec 3, 2017
@tensorflowbutler (Member)

It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@droidicus (Author)

Gentle ping, this is still an issue for me.

shivaniag added the stat:awaiting tensorflower label Jan 9, 2018
@ispirmustafa (Contributor)

Hi @shivaniag,
Estimator sends the given session_config directly to the Session constructor. Could you please assign somebody who is more familiar with tf.Session and its handling of GPU settings?

@ispirmustafa (Contributor)

FYI, I've checked tf.keras.estimator.model_to_estimator: it passes the config properly to tf.estimator.Estimator.
tf.estimator.Estimator then sends that config to the tf.train.SessionManager calls, which use it as a constructor argument to tf.Session.
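
The chain described above can be illustrated with a toy, non-TensorFlow sketch. The `Estimator`, `SessionManager`, and `Session` classes below are stand-ins for the real TensorFlow classes (not their actual implementations); the point is only that a session_config handed in at the top is forwarded unchanged down to the session constructor:

```python
# Toy model of the config plumbing: Estimator -> SessionManager -> Session.
# These classes are hypothetical stand-ins, NOT the real TensorFlow API.

class Session:
    def __init__(self, config=None):
        # The real tf.Session would apply GPU options from this config here.
        self.config = config

class SessionManager:
    def __init__(self, config=None):
        self.config = config

    def create_session(self):
        # Forwards its config to the Session constructor.
        return Session(config=self.config)

class Estimator:
    def __init__(self, run_config):
        self.run_config = run_config

    def train(self):
        # Pulls session_config out of the RunConfig and hands it down.
        manager = SessionManager(config=self.run_config.get("session_config"))
        return manager.create_session()

run_config = {"session_config": {"per_process_gpu_memory_fraction": 0.5}}
session = Estimator(run_config).train()
print(session.config)  # the GPU options survive the whole chain
```

If this plumbing holds in the real code, the bug would have to be in something that creates or uses a session outside this chain, e.g. code relying on a default session.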

@droidicus (Author)

Just as an FYI: while this is still a problem in TF v1.5.0-rc0, we were able to use the following workaround for now. By setting the default session manually, the memory fraction is respected:

import os
import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
sess_config = tf.ConfigProto(gpu_options=gpu_options)
# Manually set the default session instead
tf.Session(config=sess_config).as_default()
#run_config = tf.estimator.RunConfig(session_config=sess_config)

inputs = tf.keras.layers.Input(shape=(10,))
outputs = tf.keras.layers.Dense(10)(inputs)
model = tf.keras.models.Model(inputs, outputs)
model.compile(optimizer='sgd', loss='mse')
est_keras = tf.keras.estimator.model_to_estimator(keras_model=model)#, config=run_config)

input_name = model.input_names[0]
data = np.random.rand(1000,10).astype(np.float32)
train_input_fn = tf.estimator.inputs.numpy_input_fn({input_name:data}, data, batch_size=10, num_epochs=None, shuffle=False)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=100000)
eval_spec = tf.estimator.EvalSpec(input_fn=train_input_fn, steps=10)
tf.estimator.train_and_evaluate(est_keras, train_spec, eval_spec)

@ispirmustafa (Contributor)

@yifeif, @fchollet is there any place within the underlying Keras code that uses the default session while building the graph, train_op, etc.?

tensorflowbutler removed the stat:awaiting tensorflower label Jan 23, 2018
@tensorflowbutler (Member)

A member of the TensorFlow organization has replied after the stat:awaiting tensorflower label was applied.

@tensorflowbutler (Member)

Nagging Assignee: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

yifeif (Contributor) commented Feb 14, 2018

Sorry for the delay. This is how model_to_estimator creates its model_fn. @ispirmustafa is there anything you see with the session that should be done differently? Thanks!

I also tried printing estimator.config.__dict__ and estimator.config.cluster_spec.__dict__ for a canned estimator, a custom estimator, and a Keras-converted estimator, and I'm seeing the same results:

*********** est_keras.config.__dict__ *************************************
{'_save_checkpoints_secs': 600, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.5
}
, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f8e3ebcabd0>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': '/tmp/tmp8jvdNt', '_save_summary_steps': 100}
*********** est_keras.config.cluster_spec.__dict__ *************
{'_cluster_def': , '_cluster_spec': {}}

@tensorflowbutler (Member)

Nagging Assignees @yifeif, @shivaniag, @ispirmustafa: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

yifeif (Contributor) commented Mar 6, 2018

A fix has been submitted internally and should make it to master tomorrow. Thanks!

@droidicus (Author)

Fantastic, thanks!

yifeif closed this as completed in 355fb5e Mar 7, 2018