[TF 2.0] Using Keras custom layers, cannot learn on Colaboratory on TPU. #33890

Closed
july1997 opened this issue Oct 31, 2019 · 3 comments

@july1997

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colaboratory on TPU
  • TensorFlow version (use command below): 2.0.0
  • Python version: Python 3.6.8

Describe the current behavior
A Transformer model built with custom Keras layers cannot be trained on Colaboratory's TPU.
It fails with the error below.

UnimplementedError:  Compilation failure: Asked to propagate a dynamic dimension from hlo %scatter.14694 = f32[8333,256]{1,0} scatter(f32[8333,256]{1,0} %broadcast.14689, s32[320]{0} %reshape.2569, f32[320,256]{1,0} %reshape.14686), update_window_dims={1}, inserted_window_dims={0}, scatter_dims_to_operand_dims={0}, index_vector_dim=1, to_apply=%scatter-combiner.14690, metadata={op_type="UnsortedSegmentSum" op_name="Adam/CrossReplicaSum/input"}@{}@0 to hlo %all-reduce.14699 = f32[8333,256]{1,0} all-reduce(f32[8333,256]{1,0} %scatter.14694), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.14695, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	TPU compilation failed
	 [[{{node tpu_compile_succeeded_assert/_16451521731088977986/_6}}]]
Additional GRPC error information:
{"created":"@1572537805.656904528","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":" Compilation failure: Asked to propagate a dynamic dimension from hlo %scatter.14694 = f32[8333,256]{1,0} scatter(f32[8333,256]{1,0} %broadcast.14689, s32[320]{0} %reshape.2569, f32[320,256]{1,0} %reshape.14686), update_window_dims={1}, inserted_window_dims={0}, scatter_dims_to_operand_dims={0}, index_vector_dim=1, to_apply=%scatter-combiner.14690, metadata={op_type="UnsortedSegmentSum" op_name="Adam/CrossReplicaSum/input"}@{}@0 to hlo %all-reduce.14699 = f32[8333,256]{1,0} all-reduce(f32[8333,256]{1,0} %scatter.14694), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.14695, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.\n\tTPU compilation failed\n\t [[{{node tpu_compile_succeeded_assert/_16451521731088977986/_6}}]]","grpc_status":12} [Op:__inference_distributed_function_41790]

Function call stack:
distributed_function -> distributed_function

The code comes from the tutorial below, which I wanted to run on a TPU:
https://medium.com/tensorflow/a-transformer-chatbot-tutorial-with-tensorflow-2-0-88bf59e66fe2

Describe the expected behavior
The model can be trained on Colaboratory's TPU using custom Keras layers.

Code to reproduce the issue
Here is a notebook that reproduces the problem:
https://colab.research.google.com/github/july1997/transformer_chatbot_tpu/blob/master/transformer_chatbot_tf2_fix_tpu.ipynb
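
For context, the TPU setup in the notebook follows the standard TF 2.0 Colab pattern sketched below. This is a minimal sketch only: `transformer`, `loss_function`, `accuracy`, `create_dataset`, and the hyperparameter names come from the tutorial and are not defined here, and the notebook's actual code may differ.

```python
import os
import tensorflow as tf

# Standard TF 2.0 TPU initialization on Colab.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    # Build and compile the model inside the strategy scope so its variables
    # are placed on the TPU replicas.
    model = transformer(vocab_size=VOCAB_SIZE, num_layers=NUM_LAYERS,
                        units=UNITS, d_model=D_MODEL, num_heads=NUM_HEADS,
                        dropout=DROPOUT)
    model.compile(optimizer=optimizer, loss=loss_function, metrics=[accuracy])

model.fit(create_dataset(questions, answers), epochs=EPOCHS)
```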

Other info / logs
Here are the full logs.

INFO:tensorflow:Initializing the TPU system: 10.8.123.210:8470
INFO:tensorflow:Initializing the TPU system: 10.8.123.210:8470
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
Train on 689 steps
Epoch 1/20
---------------------------------------------------------------------------
UnimplementedError                        Traceback (most recent call last)
<ipython-input-23-234b607b615c> in <module>()
     34   model.compile(optimizer=optimizer, loss=loss_function, metrics=[accuracy])
     35 
---> 36   model.fit(create_dataset(questions, answers), epochs=EPOCHS)

11 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    726         max_queue_size=max_queue_size,
    727         workers=workers,
--> 728         use_multiprocessing=use_multiprocessing)
    729 
    730   def evaluate(self,

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/training_distributed.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, **kwargs)
    683         validation_steps=validation_steps,
    684         validation_freq=validation_freq,
--> 685         steps_name='steps_per_epoch')
    686 
    687   def evaluate(self,

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
    297           else:
    298             actual_inputs = ins()
--> 299           batch_outs = f(actual_inputs)
    300         except errors.OutOfRangeError:
    301           if is_dataset:

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/distribute/distributed_training_utils.py in execution_function(input_fn)
    876       def execution_function(input_fn):
    877         # `numpy` translates Tensors to values in Eager mode.
--> 878         return [out.numpy() for out in distributed_function(input_fn)]
    879     else:
    880       execution_function = distributed_function

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/def_function.py in __call__(self, *args, **kwds)
    455 
    456     tracing_count = self._get_tracing_count()
--> 457     result = self._call(*args, **kwds)
    458     if tracing_count == self._get_tracing_count():
    459       self._call_counter.called_without_tracing()

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/def_function.py in _call(self, *args, **kwds)
    518         # Lifting succeeded, so variables are initialized and we can run the
    519         # stateless function.
--> 520         return self._stateless_fn(*args, **kwds)
    521     else:
    522       canon_args, canon_kwds = \

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/function.py in __call__(self, *args, **kwargs)
   1821     """Calls a graph function specialized to the inputs."""
   1822     graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 1823     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   1824 
   1825   @property

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/function.py in _filtered_call(self, args, kwargs)
   1139          if isinstance(t, (ops.Tensor,
   1140                            resource_variable_ops.BaseResourceVariable))),
-> 1141         self.captured_inputs)
   1142 
   1143   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1222     if executing_eagerly:
   1223       flat_outputs = forward_function.call(
-> 1224           ctx, args, cancellation_manager=cancellation_manager)
   1225     else:
   1226       gradient_name = self._delayed_rewrite_functions.register()

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    509               inputs=args,
    510               attrs=("executor_type", executor_type, "config_proto", config),
--> 511               ctx=ctx)
    512         else:
    513           outputs = execute.execute_with_cancellation(

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

UnimplementedError:  Compilation failure: Asked to propagate a dynamic dimension from hlo %scatter.14694 = f32[8333,256]{1,0} scatter(f32[8333,256]{1,0} %broadcast.14689, s32[320]{0} %reshape.2569, f32[320,256]{1,0} %reshape.14686), update_window_dims={1}, inserted_window_dims={0}, scatter_dims_to_operand_dims={0}, index_vector_dim=1, to_apply=%scatter-combiner.14690, metadata={op_type="UnsortedSegmentSum" op_name="Adam/CrossReplicaSum/input"}@{}@0 to hlo %all-reduce.14699 = f32[8333,256]{1,0} all-reduce(f32[8333,256]{1,0} %scatter.14694), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.14695, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	TPU compilation failed
	 [[{{node tpu_compile_succeeded_assert/_16451521731088977986/_6}}]]
Additional GRPC error information:
{"created":"@1572537805.656904528","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":" Compilation failure: Asked to propagate a dynamic dimension from hlo %scatter.14694 = f32[8333,256]{1,0} scatter(f32[8333,256]{1,0} %broadcast.14689, s32[320]{0} %reshape.2569, f32[320,256]{1,0} %reshape.14686), update_window_dims={1}, inserted_window_dims={0}, scatter_dims_to_operand_dims={0}, index_vector_dim=1, to_apply=%scatter-combiner.14690, metadata={op_type="UnsortedSegmentSum" op_name="Adam/CrossReplicaSum/input"}@{}@0 to hlo %all-reduce.14699 = f32[8333,256]{1,0} all-reduce(f32[8333,256]{1,0} %scatter.14694), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.14695, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.\n\tTPU compilation failed\n\t [[{{node tpu_compile_succeeded_assert/_16451521731088977986/_6}}]]","grpc_status":12} [Op:__inference_distributed_function_41790]

Function call stack:
distributed_function -> distributed_function
@nikochiko
Contributor

This is a known issue: #33517. According to @yunxing (#33517 (comment)), the fix should already be included in the tf-nightly release.
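
For reference, switching a Colab runtime to the nightly build is usually just the cell below (a sketch; restart the runtime after installing, and the exact nightly version string will vary).

```python
# Run in a Colab cell, then restart the runtime so the nightly build is loaded.
!pip install -q tf-nightly

import tensorflow as tf
print(tf.__version__)  # should now report a .dev (nightly) version
```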

@july1997
Author

july1997 commented Nov 1, 2019

Thank you for telling me! I will try the tf-nightly release.

july1997 closed this as completed on Nov 1, 2019
@mirik123

mirik123 commented Oct 8, 2021

I have the same issue with TF 2.6.0:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/context.py in sync_executors(self)
    672     """
    673     if self._context_handle:
--> 674       pywrap_tfe.TFE_ContextSyncExecutors(self._context_handle)
    675     else:
    676       raise ValueError("Context is not initialized.")

UnimplementedError: 9 root error(s) found.
  (0) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[tpu_compile_succeeded_assert/_4226575268796567461/_6/_145]]
  (1) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[tpu_compile_succeeded_assert/_4226575268796567461/_6/_159]]
  (2) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[tpu_compile_succeeded_assert/_4226575268796567461/_6/_215]]
  (3) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[TPUReplicate/_compile/_3223262300518379275/_5/_240]]
  (4) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[NoOp/_279]]
  (5) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_ ... [truncated]

The model is compiled as:
model.compile(loss='mse', metrics=[tf.keras.metrics.MeanSquaredLogarithmicError()], optimizer='adam')
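
A workaround often suggested for "dynamic dimension" TPU compilation failures is to keep every input shape static, for example by batching with drop_remainder=True so XLA never sees a partial batch. Below is a minimal sketch under that assumption, with placeholder data and an assumed pre-built `model`; it may or may not apply to the convolution case above.

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for the real training set.
features = np.random.rand(1000, 32, 32, 3).astype('float32')
labels = np.random.rand(1000, 1).astype('float32')

BATCH_SIZE = 128  # fixed batch size

# Dropping the final partial batch keeps every compiled batch shape identical,
# which avoids the dynamic-dimension path in XLA.
dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(1000)
           .batch(BATCH_SIZE, drop_remainder=True)
           .prefetch(tf.data.AUTOTUNE))

model.fit(dataset, epochs=10)  # `model` is the TPU-compiled Keras model
```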
