[TF 2.0] Using Keras custom layers, cannot learn on Colaboratory on TPU. #33890

Closed
july1997 opened this issue Oct 31, 2019 · 3 comments

@july1997

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colaboratory on TPU
  • TensorFlow version (use command below): 2.0.0
  • Python version: Python 3.6.8

Describe the current behavior
A Transformer model built with custom Keras layers cannot be trained on Colaboratory's TPU.
It fails with the error below.

UnimplementedError:  Compilation failure: Asked to propagate a dynamic dimension from hlo %scatter.14694 = f32[8333,256]{1,0} scatter(f32[8333,256]{1,0} %broadcast.14689, s32[320]{0} %reshape.2569, f32[320,256]{1,0} %reshape.14686), update_window_dims={1}, inserted_window_dims={0}, scatter_dims_to_operand_dims={0}, index_vector_dim=1, to_apply=%scatter-combiner.14690, metadata={op_type="UnsortedSegmentSum" op_name="Adam/CrossReplicaSum/input"}@{}@0 to hlo %all-reduce.14699 = f32[8333,256]{1,0} all-reduce(f32[8333,256]{1,0} %scatter.14694), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.14695, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	TPU compilation failed
	 [[{{node tpu_compile_succeeded_assert/_16451521731088977986/_6}}]]
Additional GRPC error information:
{"created":"@1572537805.656904528","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":" Compilation failure: Asked to propagate a dynamic dimension from hlo %scatter.14694 = f32[8333,256]{1,0} scatter(f32[8333,256]{1,0} %broadcast.14689, s32[320]{0} %reshape.2569, f32[320,256]{1,0} %reshape.14686), update_window_dims={1}, inserted_window_dims={0}, scatter_dims_to_operand_dims={0}, index_vector_dim=1, to_apply=%scatter-combiner.14690, metadata={op_type="UnsortedSegmentSum" op_name="Adam/CrossReplicaSum/input"}@{}@0 to hlo %all-reduce.14699 = f32[8333,256]{1,0} all-reduce(f32[8333,256]{1,0} %scatter.14694), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.14695, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.\n\tTPU compilation failed\n\t [[{{node tpu_compile_succeeded_assert/_16451521731088977986/_6}}]]","grpc_status":12} [Op:__inference_distributed_function_41790]

Function call stack:
distributed_function -> distributed_function

The code comes from the tutorial below, which I wanted to run on a TPU:
https://medium.com/tensorflow/a-transformer-chatbot-tutorial-with-tensorflow-2-0-88bf59e66fe2

Describe the expected behavior
The model can be trained on Colaboratory's TPU using custom Keras layers.

Code to reproduce the issue
Here is a notebook that reproduces the problem:
https://colab.research.google.com/github/july1997/transformer_chatbot_tpu/blob/master/transformer_chatbot_tf2_fix_tpu.ipynb
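
For context, the TPU setup in the notebook follows the standard TF 2.0 Colab pattern sketched below. This is a minimal sketch only: `transformer`, `loss_function`, `accuracy`, `create_dataset`, and the hyperparameter names come from the tutorial and are not defined here, and the notebook's actual code may differ.

```python
import os
import tensorflow as tf

# Standard TF 2.0 TPU initialization on Colab.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    # Build and compile the model inside the strategy scope so its variables
    # are placed on the TPU replicas.
    model = transformer(vocab_size=VOCAB_SIZE, num_layers=NUM_LAYERS,
                        units=UNITS, d_model=D_MODEL, num_heads=NUM_HEADS,
                        dropout=DROPOUT)
    model.compile(optimizer=optimizer, loss=loss_function, metrics=[accuracy])

model.fit(create_dataset(questions, answers), epochs=EPOCHS)
```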

Other info / logs
Here are the full logs.

INFO:tensorflow:Initializing the TPU system: 10.8.123.210:8470
INFO:tensorflow:Initializing the TPU system: 10.8.123.210:8470
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
Train on 689 steps
Epoch 1/20
---------------------------------------------------------------------------
UnimplementedError                        Traceback (most recent call last)
<ipython-input-23-234b607b615c> in <module>()
     34   model.compile(optimizer=optimizer, loss=loss_function, metrics=[accuracy])
     35 
---> 36   model.fit(create_dataset(questions, answers), epochs=EPOCHS)

11 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    726         max_queue_size=max_queue_size,
    727         workers=workers,
--> 728         use_multiprocessing=use_multiprocessing)
    729 
    730   def evaluate(self,

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/training_distributed.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, **kwargs)
    683         validation_steps=validation_steps,
    684         validation_freq=validation_freq,
--> 685         steps_name='steps_per_epoch')
    686 
    687   def evaluate(self,

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
    297           else:
    298             actual_inputs = ins()
--> 299           batch_outs = f(actual_inputs)
    300         except errors.OutOfRangeError:
    301           if is_dataset:

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/distribute/distributed_training_utils.py in execution_function(input_fn)
    876       def execution_function(input_fn):
    877         # `numpy` translates Tensors to values in Eager mode.
--> 878         return [out.numpy() for out in distributed_function(input_fn)]
    879     else:
    880       execution_function = distributed_function

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/def_function.py in __call__(self, *args, **kwds)
    455 
    456     tracing_count = self._get_tracing_count()
--> 457     result = self._call(*args, **kwds)
    458     if tracing_count == self._get_tracing_count():
    459       self._call_counter.called_without_tracing()

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/def_function.py in _call(self, *args, **kwds)
    518         # Lifting succeeded, so variables are initialized and we can run the
    519         # stateless function.
--> 520         return self._stateless_fn(*args, **kwds)
    521     else:
    522       canon_args, canon_kwds = \

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/function.py in __call__(self, *args, **kwargs)
   1821     """Calls a graph function specialized to the inputs."""
   1822     graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 1823     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   1824 
   1825   @property

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/function.py in _filtered_call(self, args, kwargs)
   1139          if isinstance(t, (ops.Tensor,
   1140                            resource_variable_ops.BaseResourceVariable))),
-> 1141         self.captured_inputs)
   1142 
   1143   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1222     if executing_eagerly:
   1223       flat_outputs = forward_function.call(
-> 1224           ctx, args, cancellation_manager=cancellation_manager)
   1225     else:
   1226       gradient_name = self._delayed_rewrite_functions.register()

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    509               inputs=args,
    510               attrs=("executor_type", executor_type, "config_proto", config),
--> 511               ctx=ctx)
    512         else:
    513           outputs = execute.execute_with_cancellation(

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

UnimplementedError:  Compilation failure: Asked to propagate a dynamic dimension from hlo %scatter.14694 = f32[8333,256]{1,0} scatter(f32[8333,256]{1,0} %broadcast.14689, s32[320]{0} %reshape.2569, f32[320,256]{1,0} %reshape.14686), update_window_dims={1}, inserted_window_dims={0}, scatter_dims_to_operand_dims={0}, index_vector_dim=1, to_apply=%scatter-combiner.14690, metadata={op_type="UnsortedSegmentSum" op_name="Adam/CrossReplicaSum/input"}@{}@0 to hlo %all-reduce.14699 = f32[8333,256]{1,0} all-reduce(f32[8333,256]{1,0} %scatter.14694), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.14695, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	TPU compilation failed
	 [[{{node tpu_compile_succeeded_assert/_16451521731088977986/_6}}]]
Additional GRPC error information:
{"created":"@1572537805.656904528","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":" Compilation failure: Asked to propagate a dynamic dimension from hlo %scatter.14694 = f32[8333,256]{1,0} scatter(f32[8333,256]{1,0} %broadcast.14689, s32[320]{0} %reshape.2569, f32[320,256]{1,0} %reshape.14686), update_window_dims={1}, inserted_window_dims={0}, scatter_dims_to_operand_dims={0}, index_vector_dim=1, to_apply=%scatter-combiner.14690, metadata={op_type="UnsortedSegmentSum" op_name="Adam/CrossReplicaSum/input"}@{}@0 to hlo %all-reduce.14699 = f32[8333,256]{1,0} all-reduce(f32[8333,256]{1,0} %scatter.14694), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.14695, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.\n\tTPU compilation failed\n\t [[{{node tpu_compile_succeeded_assert/_16451521731088977986/_6}}]]","grpc_status":12} [Op:__inference_distributed_function_41790]

Function call stack:
distributed_function -> distributed_function
@nikochiko
Contributor

This is a known issue: #33517. According to @yunxing (#33517 (comment)), the fix should already be included in the tf-nightly release.
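
For reference, switching a Colab runtime to the nightly build is usually just the cell below (a sketch; restart the runtime after installing, and the exact nightly version string will vary).

```python
# Run in a Colab cell, then restart the runtime so the nightly build is loaded.
!pip install -q tf-nightly

import tensorflow as tf
print(tf.__version__)  # should now report a .dev (nightly) version
```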

@july1997
Author

july1997 commented Nov 1, 2019

Thank you for telling me! I will try the tf-nightly release.

july1997 closed this as completed on Nov 1, 2019
@mirik123

mirik123 commented Oct 8, 2021

I have the same issue with TF 2.6.0:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/context.py in sync_executors(self)
    672     """
    673     if self._context_handle:
--> 674       pywrap_tfe.TFE_ContextSyncExecutors(self._context_handle)
    675     else:
    676       raise ValueError("Context is not initialized.")

UnimplementedError: 9 root error(s) found.
  (0) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[tpu_compile_succeeded_assert/_4226575268796567461/_6/_145]]
  (1) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[tpu_compile_succeeded_assert/_4226575268796567461/_6/_159]]
  (2) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[tpu_compile_succeeded_assert/_4226575268796567461/_6/_215]]
  (3) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[TPUReplicate/_compile/_3223262300518379275/_5/_240]]
  (4) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_5}}]]
	 [[NoOp/_279]]
  (5) Unimplemented: {{function_node __inference_train_function_94596}} Asked to propagate a dynamic dimension from hlo convolution.31970@{}@2 to hlo %all-reduce.31975 = f32[3,3,<=3,40]{3,2,1,0} all-reduce(f32[3,3,<=3,40]{3,2,1,0} %convolution.31970), replica_groups={{0,1,2,3,4,5,6,7}}, to_apply=%sum.31971, metadata={op_type="CrossReplicaSum" op_name="Adam/CrossReplicaSum"}, which is not implemented.
	 [[{{node TPUReplicate/_compile/_3223262300518379275/_ ... [truncated]

The model is compiled as:
model.compile(loss='mse', metrics=[tf.keras.metrics.MeanSquaredLogarithmicError()], optimizer='adam')
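
A workaround often suggested for "dynamic dimension" TPU compilation failures is to keep every input shape static, for example by batching with drop_remainder=True so XLA never sees a partial batch. Below is a minimal sketch under that assumption, with placeholder data and an assumed pre-built `model`; it may or may not apply to the convolution case above.

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for the real training set.
features = np.random.rand(1000, 32, 32, 3).astype('float32')
labels = np.random.rand(1000, 1).astype('float32')

BATCH_SIZE = 128  # fixed batch size

# Dropping the final partial batch keeps every compiled batch shape identical,
# which avoids the dynamic-dimension path in XLA.
dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(1000)
           .batch(BATCH_SIZE, drop_remainder=True)
           .prefetch(tf.data.AUTOTUNE))

model.fit(dataset, epochs=10)  # `model` is the TPU-compiled Keras model
```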
