contrib.learn.Estimator does not work with multiple GPU #6132

Closed
ashern opened this Issue Dec 6, 2016 · 23 comments

ashern commented Dec 6, 2016

Attempting to assign ops to a GPU within the model_fn passed to an Estimator produces the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device to node 'save/ShardedFilename_2': Could not satisfy explicit device specification '/device:GPU:1' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices: 
Identity: CPU 
ShardedFilename: CPU 
	 [[Node: save/ShardedFilename_2 = ShardedFilename[_device="/device:GPU:1"](save/Const, save/ShardedFilename_2/shard, save/num_shards)]]

This can be reproduced by running the example in examples/learn/multiple_gpu.py.
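
For reference, the failing pattern looks roughly like this (a trimmed sketch in the spirit of that example, not the exact file; the layer sizes, loss, and model_dir are placeholders):

import tensorflow as tf
from tensorflow.contrib import framework, layers, learn

def model_fn(features, targets, mode):
    # Splitting the network across GPUs is what trips the error: the
    # checkpointing ops created for the Estimator end up colocated with
    # GPU-pinned nodes, and ShardedFilename has no GPU kernel.
    with tf.device('/gpu:0'):
        hidden = layers.fully_connected(features, 64)
    with tf.device('/gpu:1'):
        predictions = layers.fully_connected(hidden, 1, activation_fn=None)
        loss = tf.reduce_mean(tf.square(predictions - targets))
    train_op = layers.optimize_loss(
        loss, framework.get_or_create_global_step(),
        learning_rate=0.01, optimizer='SGD')
    return predictions, loss, train_op

estimator = learn.Estimator(model_fn=model_fn, model_dir='/tmp/multi_gpu_repro')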

ashern commented Dec 8, 2016

A quick bump on this - am I making fundamentally incorrect assumptions on how this should work?

I'm not eager to replicate the Estimator's functionality, which is great overall, but it's important for me to get my models working in TF w/ multiple GPUs. I'd be happy to help w/ a PR if some work is needed to get this functioning.

I had also posted on SO here.

Thank you - wonderful library.

martinwicke commented Dec 8, 2016

Thanks for filing this issue. We know about the problem and are in the process of fixing it. Specifically, and as a stopgap, we will allow you to (optionally) create a Saver yourself, so you can control where its ops land. We'll update this thread once that has landed (@ispirmustafa FYI).
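
The general idea (a minimal sketch of the stopgap being described; the hook for handing such a Saver to the Estimator hasn't landed yet) is to build the Saver under a CPU device scope so that checkpoint ops such as ShardedFilename never inherit a GPU placement:

import tensorflow as tf

# Hypothetical illustration: pin the Saver's ops to the CPU at creation time.
with tf.device('/cpu:0'):
    saver = tf.train.Saver(sharded=True)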

ashern commented Dec 8, 2016

Thank you, @martinwicke. I'll keep an eye out.

In the meantime, were I to dig into the source and add allow_soft_placement=True on the session creation, is that likely to work around the problem, or is there another blocking issue (recognizing the can of worms I'm opening w/ that option setting)?

martinwicke commented Dec 8, 2016

I believe if you did that, you should be fine.
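
Concretely, the workaround being discussed amounts to building the training session with a ConfigProto that enables soft placement, roughly (a sketch of the config only, not the actual contrib.learn internals):

import tensorflow as tf

# With soft placement, ops that have no GPU kernel (such as ShardedFilename)
# quietly fall back to the CPU instead of raising InvalidArgumentError.
config = tf.ConfigProto(allow_soft_placement=True)
sess = tf.Session(config=config)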

vvpreetham commented Jan 23, 2017

Hi,

I get the same errors when using the contrib.learn package (Regressors): only the GPU memory is allocated, while GPU utilization stays at zero. I tried allow_soft_placement=True but still get the same errors.

Is this fixed yet?

ispirmustafa commented Jan 24, 2017

We're making allow_soft_placement the default in the Estimator implementation.
That should fix this issue.

martinwicke commented Jan 24, 2017

@vvpreetham it's surprising that allow_soft_placement would not have fixed this. How did you set it?

vvpreetham commented Jan 24, 2017

Also, I am using the estimator as follows:

    estimator.fit(input_fn=lambda: feed_fn(batch_df_train), steps=train_steps)
vvpreetham commented Jan 24, 2017

Also, as an update: the reason I am trying explicit device allocation is that when I run the program without tf.device, I see strange behavior where the Titan X GPU memory is allocated but the GPU processor is 0% used. (I have 3 Titan X GPUs on the same box.)

ashern commented Jan 24, 2017

@vvpreetham I think what you're seeing is expected behavior, and unrelated to Estimator. TensorFlow will automatically claim all available memory on every GPU it sees unless you tell it otherwise, whether or not it is using those GPUs for your model. Take a look at the CUDA_VISIBLE_DEVICES environment variable to specify which GPUs to use, or change the memory settings (see "Using GPUs" in the docs).
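
For example (a sketch; the device index and option values are placeholders):

import os
# Restrict this process to the first GPU; must be set before TensorFlow
# initializes CUDA (i.e. before any session is created).
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import tensorflow as tf
# Alternatively, stop TensorFlow from grabbing all GPU memory up front.
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
sess = tf.Session(config=config)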

vvpreetham commented Jan 24, 2017

@ashern I am guessing your comment is specific to my last comment (memory allocation). How do I ensure that TensorFlow is actually using all the GPUs I have?
Does CUDA_VISIBLE_DEVICES also ensure the GPUs are actually used for compute?

ashern commented Jan 24, 2017

@vvpreetham That's a pretty big topic. Take a look here: https://www.tensorflow.org/how_tos/using_gpu/ - and I think you'll find more help if you post your questions to StackOverflow for a wider audience to answer.

vvpreetham commented Jan 24, 2017

@ashern Thanks. I shall post on Stack Overflow; meanwhile, setting CUDA_VISIBLE_DEVICES still does not get the GPUs to do any compute.

(I have tried everything at the link you specified and still no luck, hence posting here.)

vvpreetham commented Jan 24, 2017

@martinwicke bump on my original question (so that this thread is not lost) :)

vvpreetham commented Jan 31, 2017

The problem seems to be the SparseTensor. Note that the following piece of code works:

categorical_cols = {k: tf.SparseTensor(
        indices=[[i, 0] for i in range(df[k].size)],
        values=df[k].values,
        shape=[df[k].size, 1]) for k in CATEGORICAL_COLUMNS}
for d in ['/gpu:0','/gpu:1','/gpu:2']:
    with tf.device(d):
        # Converts the label column into a constant Tensor.
        label = tf.constant(df[REGRESSOR_LABEL_COLUMN].values)
        # Returns the feature columns and the label.

BUT if I do the following (that is, assign the SparseTensor creation to the devices), it fails:

for d in ['/gpu:0','/gpu:1','/gpu:2']:
    with tf.device(d):
        # Creates a dictionary mapping from each categorical feature column name (k)
        # to the values of that column stored in a tf.SparseTensor.
        categorical_cols = {k: tf.SparseTensor(
                indices=[[i, 0] for i in range(df[k].size)],
                values=df[k].values,
                shape=[df[k].size, 1]) for k in CATEGORICAL_COLUMNS}
        # Converts the label column into a constant Tensor.
        label = tf.constant(df[REGRESSOR_LABEL_COLUMN].values)
        # Returns the feature columns and the label.
martinwicke commented Feb 3, 2017

If you're still getting the same error message (cannot put string tensor on GPU), then you still haven't enabled soft placement (or soft placement does not consider colocation constraints? @vrv do you know?). You cannot put string tensors on GPUs, so that particular tensor has to live on the CPU.

vvpreetham commented Feb 3, 2017

I am still getting the error. I am quite certain that soft placement is enabled, as I have dry-run the code with log.info and breakpoints. I also set the GPU memory fraction and soft placement directly in the session (the GPU memory fraction works, and soft placement works for constants, as stated above). My modified session code is as follows:

import os
import tensorflow as tf

DEFAULT_GPU_FRACTION = 0.9  # placeholder value

def get_session(gpu_fraction=DEFAULT_GPU_FRACTION):
    # OMP_NUM_THREADS comes back from the environment as a string, so convert it.
    num_threads = os.environ.get('OMP_NUM_THREADS')
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_fraction)
    if num_threads:
        return tf.Session(config=tf.ConfigProto(
                    allow_soft_placement=True,
                    log_device_placement=True,
                    gpu_options=gpu_options,
                    intra_op_parallelism_threads=int(num_threads)))
    else:
        return tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                                log_device_placement=True,
                                                gpu_options=gpu_options))

I set up the session as follows:

def main(_):
    with get_session() as sess:
        m, df_test = wide_batch_train()

The problem seems to be the tf.SparseTensor.

ashern commented Mar 27, 2017

I see in the repo that Estimator has been promoted to core from contrib for 1.1 - great news.

Briefly checking in on this issue - will the deployed implementation handle multiple GPU device assignment & soft placement?

Many thanks for the hard work!

martinwicke commented Mar 27, 2017

ashern commented Mar 27, 2017

Excellent. Time to put my homespun solution to rest...

Many thanks, Martin. Appreciated your overview at the summit!

ashern closed this Mar 27, 2017

abnera commented Apr 4, 2017

@martinwicke: any examples of multi-GPU for tf.learn now that it will be in core, besides the one provided in the examples section? The one in the examples section only does model parallelism as opposed to data parallelism: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/multiple_gpu.py

Thanks.

ispirmustafa assigned alextp and unassigned ispirmustafa Apr 5, 2017

alextp commented Apr 5, 2017

I don't think creating sparse string Tensors on the GPU will work since the GPU does not support strings.

It's best to structure your model so the string to int conversion happens on the CPU, and the GPU just processes the dense part of your model.

That said, the linear regression canned estimator will probably see no benefit from running on the GPU (not enough matrix multiplications to offset the data transfer cost).
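
A minimal sketch of that structure (hypothetical names and sizes, not code from this thread): the string handling stays on the CPU, and only the dense math is placed on a GPU.

import tensorflow as tf

def model_slice(string_features):
    # String tensors have no GPU kernels, so keep everything that touches
    # strings on the CPU.
    with tf.device('/cpu:0'):
        ids = tf.string_to_hash_bucket_fast(string_features, num_buckets=1000)
        embeddings = tf.get_variable('embed', shape=[1000, 8])
        dense = tf.nn.embedding_lookup(embeddings, ids)
    # Only the dense computation is pinned to the GPU.
    with tf.device('/gpu:0'):
        logits = tf.contrib.layers.fully_connected(dense, 2, activation_fn=None)
    return logits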
