
Using Dataset api with Estimator in MirroredStrategy, Non-DMA-safe string tensor error #19588

Closed
huangynn opened this issue May 28, 2018 · 8 comments

@huangynn

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS
  • TensorFlow installed from (source or binary): pip install tensorflow-gpu
  • TensorFlow version (use command below): 1.8.0
  • Python version:
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source): 4.8.5
  • CUDA/cuDNN version: 9.0
  • GPU model and memory: GeForce GTX 1080Ti * 4
  • Exact command to reproduce:

Describe the problem

Using multiple GPUs via MirroredStrategy, I get a 'Non-DMA-safe string tensor may not be copied from/to a GPU.' error.

Source code / logs

[screenshots]

@chengmengli06

I'm hitting the same problem @skye when using the object detection APIs.

@skye
Member

skye commented Jun 4, 2018

Can you provide code to repro the problem?

@skye skye assigned mrry and unassigned skye Jun 4, 2018
@mrry mrry assigned rohan100jain and guptapriya and unassigned mrry Jun 4, 2018
@mrry
Contributor

mrry commented Jun 4, 2018

Rohan/Priya: I'm guessing this is what happens when a tf.string tensor goes through prefetch_to_devices(), but unclear whether it should be handled in the client program (e.g. by splitting out strings from the prefetched dataset) or in the FunctionBufferingResource (e.g. by allowing some outputs to be "host memory" only).
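One version of the client-side option mrry mentions (keeping tf.string out of the prefetched dataset) is to re-encode each string as a DMA-safe uint8 tensor inside the pipeline and decode it back on the host when needed. A minimal sketch, using the current tf.io.decode_raw API (TF 1.8 code would use tf.decode_raw) and made-up dataset contents:

```python
import tensorflow as tf

# Re-encode each tf.string element as a uint8 tensor so nothing of dtype
# string reaches a device-prefetch stage. The strings here are illustrative.
ds = tf.data.Dataset.from_tensor_slices(["alpha", "beta"])
ds = ds.map(lambda s: tf.io.decode_raw(s, tf.uint8))  # bytes as uint8

# Back on the host, the original strings are recoverable:
decoded = [bytes(v).decode() for v in ds.as_numpy_iterator()]
print(decoded)  # ['alpha', 'beta']
```

Note that variable-length strings produce ragged uint8 vectors, so this only batches cleanly with padding.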

@huangynn
Author

huangynn commented Jun 5, 2018

```python
def get_inputs(mode, csv_file, batch_size, label_list, preprocess):
    iterator_initializer_hook = IteratorInitializerHook()

    def inputs():
        is_training = mode == estimator.ModeKeys.TRAIN
        ds = tf.data.TextLineDataset(csv_file).skip(1)

        def classification_parse_line(line):
            columns = ['img', 'label']
            img_name, label = tf.decode_csv(
                line,
                record_defaults=[[''], ['']])
            # assume every pic is rgb
            image_decoded = tf.image.decode_png(
                tf.read_file(img_name),
                channels=3)
            image = preprocess(image_decoded)
            """image = image_preprocessing_fn(
                image,
                image_size,
                image_size)"""
            return image, label

        cpu_num = multiprocessing.cpu_count()
        ds = ds.apply(
            tf.contrib.data.map_and_batch(
                classification_parse_line,
                batch_size=batch_size,
                num_parallel_batches=cpu_num))
        ds = ds.prefetch(None)
        iterator = ds.make_initializable_iterator()

        iterator_initializer_hook.iterator_initializer_func = \
            lambda sess: sess.run(iterator.initializer)
        return ds

    return iterator_initializer_hook, inputs


distribution = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(
    model_dir=args.model_dir,
    tf_random_seed=912,
    save_summary_steps=args.save_summary_steps,
    save_checkpoints_steps=args.save_interval_steps,
    keep_checkpoint_max=5 * get_num_replicas(),
    train_distribute=distribution,
    session_config=session_config)

classifier = tf.estimator.Estimator(
    my_model,
    config=config,
    params=params)

for epoch in range(args.num_epochs):
    logger.info('Starting epoch %d / %d' % (epoch + 1, args.num_epochs))
    classifier.train(
        train_ds,
        hooks=[train_ds_hook])
    classifier.evaluate(
        val_ds,
        hooks=[val_ds_hook])
```

@huangynn
Author

huangynn commented Jun 5, 2018

Nothing special, it's just common code with MirroredStrategy.
If I replace MirroredStrategy with OneDeviceStrategy, everything works fine.
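For reference, the working single-device fallback can be sketched as follows. This uses the modern tf.distribute.OneDeviceStrategy; at the time of this issue the class lived under tf.contrib.distribute, and the training values here are made up:

```python
import tensorflow as tf

# Pin variables and computation to one device instead of mirroring: with a
# single device there is no cross-device copy, so string tensors never hit
# the DMA path that MirroredStrategy trips over.
strategy = tf.distribute.OneDeviceStrategy(device="/CPU:0")

with strategy.scope():
    v = tf.Variable(1.0)

@tf.function
def step(x):
    # With one device, strategy.run returns the per-replica result directly.
    return strategy.run(lambda t: t + v, args=(x,))

print(float(step(tf.constant(2.0))))  # 3.0
```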

@tensorflowbutler
Member

Nagging Assignees @rohan100jain, @guptapriya: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@rohan100jain rohan100jain added stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug labels Jun 19, 2018
@rohan100jain
Member

As Derek mentioned, currently we don't have a mechanism for specifying if some outputs should be in host memory and we assume (to a large extent) that they'd be in device memory. Strings can't be in device memory, hence the bug. I shall work on having a dynamic method of identifying which outputs should be allocated on the host / device. Stay tuned for a fix in a bit.

case540 pushed a commit to case540/tensorflow that referenced this issue Jun 27, 2018
FunctionBufferingResource. This allows for types such as strings that are
always in host memory to be returned from the FunctionBufferingResource.

Fixes tensorflow#19588

PiperOrigin-RevId: 202206052
@guptapriya
Contributor

@rohan100jain can this issue be closed now that your fix has been merged?
