Resnet: Default AvgPoolingOp only supports NHWC. #263

Closed
andreas128 opened this issue May 9, 2017 · 13 comments
Labels
examples unrelated unrelated to tensorpack / invalid questions / questions that are not helpful to others

Comments

@andreas128

When running python examples/ResNet/cifar10-resnet.py, I get the following error message: InvalidArgumentError (see above for traceback): Default AvgPoolingOp only supports NHWC.

I tried it with TensorFlow 1.0 and 1.1, and with tags 0.1.8 and 0.1.9. The full error message is as follows:

  0%|                                                                                                           |0/390[00:00<?,?it/s]
E tensorflow/core/common_runtime/executor.cc:594] Executor failed to create kernel. Invalid argument: Default AvgPoolingOp only supports NHWC.
         [[Node: tower0/res2.0/pool/output = AvgPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/cpu:0"](tower0/res1.17/add)]]

Traceback (most recent call last):
[0509 17:26:58 @input_data.py:125] EnqueueThread Exited.
  File "examples/ResNet/cifar10-resnet.py", line 179, in <module>
    SyncMultiGPUTrainer(config).train()
  File "/media/Data/face3d/tensorpack/tensorpack/train/base.py", line 94, in train
    self.main_loop()
  File "/media/Data/face3d/tensorpack/tensorpack/train/base.py", line 164, in main_loop
    self.run_step()  # implemented by subclass
  File "/media/Data/face3d/tensorpack/tensorpack/train/feedfree.py", line 43, in run_step
    self.hooked_sess.run(self.train_op)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 891, in run
    run_metadata=run_metadata)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
    return self._sess.run(*args, **kwargs)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default AvgPoolingOp only supports NHWC.
         [[Node: tower0/res2.0/pool/output = AvgPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/cpu:0"](tower0/res1.17/add)]]

Caused by op u'tower0/res2.0/pool/output', defined at:
  File "examples/ResNet/cifar10-resnet.py", line 179, in <module>
    SyncMultiGPUTrainer(config).train()
  File "/media/Data/face3d/tensorpack/tensorpack/train/base.py", line 93, in train
    self.setup()
  File "/media/Data/face3d/tensorpack/tensorpack/train/base.py", line 108, in setup
    self._setup()   # subclass will setup the graph
  File "/media/Data/face3d/tensorpack/tensorpack/train/multigpu.py", line 132, in _setup
    self.config.tower, lambda: self._get_cost_and_grad()[1])
  File "/media/Data/face3d/tensorpack/tensorpack/train/multigpu.py", line 41, in _multi_tower_grads
    grad_list.append(get_tower_grad_func())
  File "/media/Data/face3d/tensorpack/tensorpack/train/multigpu.py", line 132, in <lambda>
    self.config.tower, lambda: self._get_cost_and_grad()[1])
  File "/media/Data/face3d/tensorpack/tensorpack/train/feedfree.py", line 66, in _get_cost_and_grad
    self.build_train_tower()
  File "/media/Data/face3d/tensorpack/tensorpack/train/feedfree.py", line 35, in build_train_tower
    f()
  File "/media/Data/face3d/tensorpack/tensorpack/train/feedfree.py", line 28, in f
    self.model.build_graph(inputs)
  File "/media/Data/face3d/tensorpack/tensorpack/models/model_desc.py", line 116, in build_graph
    self._build_graph(model_inputs)
  File "examples/ResNet/cifar10-resnet.py", line 83, in _build_graph
    l = residual('res2.0', l, increase_dim=True)
  File "examples/ResNet/cifar10-resnet.py", line 68, in residual
    l = AvgPooling('pool', l, 2)
  File "/media/Data/face3d/tensorpack/tensorpack/models/common.py", line 97, in wrapped_func
    outputs = func(*args, **actual_args)
  File "/media/Data/face3d/tensorpack/tensorpack/models/pool.py", line 64, in AvgPooling
    data_format=data_format)
  File "/media/Data/face3d/tensorpack/tensorpack/models/pool.py", line 28, in _Pooling
    name='output')
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1765, in avg_pool
    name=name)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 50, in _avg_pool
    data_format=data_format, name=name)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/andreas/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Default AvgPoolingOp only supports NHWC.
         [[Node: tower0/res2.0/pool/output = AvgPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/cpu:0"](tower0/res1.17/add)]]
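For reference, the failing node's attributes show how the layout is baked into the op: with data_format="NCHW", ksize and strides are [1, 1, 2, 2], so the 2x2 window sits on the last two axes. The sketch below is a hypothetical helper (pool_window_4d is not tensorpack's API) that expands a 2-D pooling window into the 4-D form each layout expects:

```python
def pool_window_4d(size, data_format="NHWC"):
    """Expand a 2-D pooling window into a 4-D ksize/strides list.

    Hypothetical helper for illustration; `size` is an int or (height, width).
    """
    if isinstance(size, int):
        h, w = size, size
    else:
        h, w = size
    if data_format == "NHWC":
        # (batch, height, width, channel): window only on the spatial axes
        return [1, h, w, 1]
    if data_format == "NCHW":
        # (batch, channel, height, width): spatial axes are the last two
        return [1, 1, h, w]
    raise ValueError("unknown data_format: %r" % (data_format,))


# The failing node used ksize = strides = [1, 1, 2, 2] with data_format="NCHW".
print(pool_window_4d(2, "NCHW"))  # [1, 1, 2, 2]
print(pool_window_4d(2, "NHWC"))  # [1, 2, 2, 1]
```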

@ppwwyyxx
Collaborator

ppwwyyxx commented May 9, 2017

TensorFlow only supports NHWC on CPU.
If you really want to use CPU, you can change data_format to 'NHWC' at this line, and remove the layout transform at this line.
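The layout transform in the example is a transpose of the NHWC input batch with the permutation [0, 3, 1, 2], which moves the channel axis to position 1 (NCHW). A minimal shape-only sketch, assuming a CIFAR-10 batch of 128 images of 32x32x3 (the batch size is arbitrary and the names are illustrative):

```python
# Axis permutations for converting between the two layouts
# (the same lists one would pass to tf.transpose).
NHWC_TO_NCHW = [0, 3, 1, 2]
NCHW_TO_NHWC = [0, 2, 3, 1]


def permute_shape(shape, perm):
    """Return the shape a tensor would have after transposing it by `perm`."""
    return [shape[p] for p in perm]


nhwc = [128, 32, 32, 3]  # assumed CIFAR-10 batch: 128 images of 32x32x3
nchw = permute_shape(nhwc, NHWC_TO_NCHW)
print(nchw)                               # [128, 3, 32, 32]
print(permute_shape(nchw, NCHW_TO_NHWC))  # [128, 32, 32, 3]
```

If the transpose is kept while the layers are switched to NHWC, they will read the width axis (size 32) as the channel axis, so both changes have to be made together.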

@andreas128
Author

Thank you very much, that solved the issue.

@abhijitnathwani

abhijitnathwani commented Nov 29, 2017

Hello,
I faced the same error too, and I solved it with the method suggested by @ppwwyyxx. However, I guess changing the data_format also changed some shapes.

Now I'm getting this error:

ValueError: Dimensions must be equal, but are 3 and 16 for 'tower0/res1.0/add' (op: 'Add') with input shapes: [?,3,32,3], [?,3,32,16].

Please help me with this. I need to train cifar10-resnet.py on CPU, as I don't have a GPU.

@ppwwyyxx
Collaborator

cifar10-resnet.py is written for GPU.
You need to change the relevant symbolic code after changing data_format.

@abhijitnathwani

Can you please help me with the changes, @ppwwyyxx? I've been stuck on this for a long time now and don't know where to make the changes for this to run on CPU. Any help is appreciated!

@ppwwyyxx
Collaborator

If you can understand TensorFlow symbolic code, the changes are easy to identify in build_graph. If not, I suggest you learn basic TensorFlow symbolic programming before using tensorpack.

@abhijitnathwani

@ppwwyyxx I already made the necessary changes in build_graph to bypass the GPU checks. I still don't understand why the

ValueError: Dimensions must be equal, but are 3 and 16 for 'tower0/res1.0/add' (op: 'Add') with input shapes: [?,3,32,3], [?,3,32,16].

error occurs.

Shapes should have nothing to do with GPU/CPU, I guess.

@ppwwyyxx
Collaborator

Shapes should have something to do with data_format, for sure.
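To see why the shapes depend on data_format, here is a shape-only trace of conv0 and the first residual block (res1.0), written in plain Python without TensorFlow. It assumes stride-1 convolutions with 'SAME' padding, and the helper names are hypothetical. Keeping the input transpose while switching Conv2D to NHWC and still reading in_channel from shape[1] is one combination that reproduces the reported shapes exactly; the exact edits that were made are not known.

```python
def channel_axis(data_format):
    """Index of the channel dimension in a 4-D shape."""
    return 3 if data_format == "NHWC" else 1  # NCHW -> 1


def conv_shape(shape, out_channel, data_format):
    """Output shape of a stride-1, 'SAME'-padded Conv2D (assumed)."""
    out = list(shape)
    out[channel_axis(data_format)] = out_channel
    return out


def first_residual_add(image_shape, data_format, transpose_input, in_channel_axis):
    """Shapes of the two operands of 'res1.0/add' (c2 + l).

    image_shape is the NHWC batch from the input pipeline;
    in_channel_axis mimics `in_channel = shape[k]` inside residual().
    """
    shape = list(image_shape)
    if transpose_input:  # the tf.transpose(image, [0, 3, 1, 2]) line
        shape = [shape[i] for i in (0, 3, 1, 2)]
    l = conv_shape(shape, 16, data_format)       # conv0: 16 output channels
    in_channel = l[in_channel_axis]
    c1 = conv_shape(l, in_channel, data_format)  # no increase_dim: out = in
    c2 = conv_shape(c1, in_channel, data_format)
    return c2, l


# Partially converted: transpose kept, NHWC layers, in_channel = shape[1]
print(first_residual_add([None, 32, 32, 3], "NHWC", True, 1))
# ([None, 3, 32, 3], [None, 3, 32, 16])  -> the reported mismatch
# Fully converted: no transpose, NHWC layers, in_channel = shape[3]
print(first_residual_add([None, 32, 32, 3], "NHWC", False, 3))
# ([None, 32, 32, 16], [None, 32, 32, 16])
```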

@ppwwyyxx
Collaborator

diff --git i/examples/ResNet/cifar10-resnet.py w/examples/ResNet/cifar10-resnet.py
index d2cbc6c..764ca75 100755
--- i/examples/ResNet/cifar10-resnet.py
+++ w/examples/ResNet/cifar10-resnet.py
@@ -48,12 +48,10 @@ class Model(ModelDesc):
     def _build_graph(self, inputs):
         image, label = inputs
         image = image / 128.0
-        assert tf.test.is_gpu_available()
-        image = tf.transpose(image, [0, 3, 1, 2])
 
         def residual(name, l, increase_dim=False, first=False):
             shape = l.get_shape().as_list()
-            in_channel = shape[1]
+            in_channel = shape[3]
 
             if increase_dim:
                 out_channel = in_channel * 2
@@ -68,12 +66,12 @@ class Model(ModelDesc):
                 c2 = Conv2D('conv2', c1, out_channel)
                 if increase_dim:
                     l = AvgPooling('pool', l, 2)
-                    l = tf.pad(l, [[0, 0], [in_channel // 2, in_channel // 2], [0, 0], [0, 0]])
+                    l = tf.pad(l, [[0, 0], [0, 0], [0, 0], [in_channel // 2, in_channel // 2]])
 
                 l = c2 + l
                 return l
 
-        with argscope([Conv2D, AvgPooling, BatchNorm, GlobalAvgPooling], data_format='NCHW'), \
+        with argscope([Conv2D, AvgPooling, BatchNorm, GlobalAvgPooling], data_format='NHWC'), \
                 argscope(Conv2D, nl=tf.identity, use_bias=False, kernel_shape=3,
                          W_init=variance_scaling_initializer(mode='FAN_OUT')):
             l = Conv2D('conv0', image, 16, nl=BNReLU)
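The patch moves the channel axis in three places (in_channel, the tf.pad spec and the argscope). A hypothetical refactor, not part of tensorpack or the patch above, derives the axis from data_format once, so that the shortcut's zero-padding always lands on the channel axis:

```python
def channel_axis(data_format):
    """Index of the channel dimension in a 4-D shape."""
    return 3 if data_format == "NHWC" else 1  # NCHW -> 1


def shortcut_paddings(in_channel, data_format):
    """tf.pad paddings that grow the channel axis from in_channel to 2*in_channel."""
    paddings = [[0, 0], [0, 0], [0, 0], [0, 0]]
    paddings[channel_axis(data_format)] = [in_channel // 2, in_channel // 2]
    return paddings


def padded_shape(shape, paddings):
    """Shape after padding; an unknown (None) batch size stays unknown."""
    return [None if d is None else d + before + after
            for d, (before, after) in zip(shape, paddings)]


print(shortcut_paddings(16, "NHWC"))  # [[0, 0], [0, 0], [0, 0], [8, 8]]
print(shortcut_paddings(16, "NCHW"))  # [[0, 0], [8, 8], [0, 0], [0, 0]]
# After the 2x2 AvgPooling, an NHWC shortcut of [None, 16, 16, 16] becomes:
print(padded_shape([None, 16, 16, 16], shortcut_paddings(16, "NHWC")))
# [None, 16, 16, 32]
```

The NHWC result matches the new tf.pad line in the diff and the NCHW result matches the removed one.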

@abhijitnathwani

abhijitnathwani commented Nov 29, 2017

@ppwwyyxx Thanks a ton! You're a savior. Works well now.

@Superlee506

@ppwwyyxx
When I tested the "roi_align" function, I also ran into this problem, even though I was using a GPU.
2018-05-08 17:24:44.425156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
2018-05-08 17:24:44.624317: E tensorflow/core/common_runtime/executor.cc:643] Executor failed to create kernel. Invalid argument: Default AvgPoolingOp only supports NHWC.
         [[Node: roi_align/AvgPool = AvgPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="SAME", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"](roi_align/crop_and_resize/transpose_1)]]

@ppwwyyxx
Collaborator

ppwwyyxx commented May 8, 2018

@Superlee506 When reporting problems, please follow the issue template. Thanks!

@ppwwyyxx ppwwyyxx added the unrelated unrelated to tensorpack / invalid questions / questions that are not helpful to others label May 8, 2018
@Superlee506

@ppwwyyxx Ok
