Kernel freeze at tf.keras.Sequential.fit() #449

@rafalpotempa

Description

What I did?

Link to Colab: https://colab.research.google.com/drive/1g6BFapSuG0-WCQzxlrDsPKCcmaGemB9f?usp=sharing

Please request access with the email connected to your GitHub account and I'll accept it.
The notebook is related to my graduation project and I don't want the work to go fully public yet.

I created a custom layer whose quantum circuit, built in quantum_circuit(), represents an 8x8 image: 4 readout qubits with two H gates, each readout connected to 16 data qubits by ZZ**(param) gates. (This is an 8x8 extension of what can be found in the MNIST Classification example.)

The image is divided into four 4x4 pieces, each connected to a single readout qubit.
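
The split described above can be sketched in plain Python. This is only an illustration: `readout_index` is a hypothetical helper, and the row-major numbering of the four pieces (top-left = 0, top-right = 1, bottom-left = 2, bottom-right = 3) is an assumption, not necessarily what quantum_circuit() in the notebook does.

```python
from collections import Counter

IMAGE_SIZE = 8   # 8x8 image, one data qubit per pixel
BLOCK_SIZE = 4   # each readout qubit covers one 4x4 piece


def readout_index(row: int, col: int) -> int:
    """Return which of the 4 readout qubits the pixel (row, col) connects to.

    Assumes the pieces are numbered row-major (hypothetical ordering).
    """
    blocks_per_side = IMAGE_SIZE // BLOCK_SIZE
    return (row // BLOCK_SIZE) * blocks_per_side + (col // BLOCK_SIZE)


# Count how many data qubits end up attached to each readout qubit.
pixels_per_readout = Counter(
    readout_index(r, c) for r in range(IMAGE_SIZE) for c in range(IMAGE_SIZE)
)
print(dict(pixels_per_readout))
```

Each readout qubit should come out with exactly 16 attached pixels; any other count would point to a wiring mistake in the circuit builder.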

The data is encoded similarly to the example (an X gate is applied if normalized_color > 0.5).
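
For reference, the thresholding rule can be written without any quantum library. A minimal sketch, assuming pixel values are already normalized to [0, 1]; `x_gate_positions` is a hypothetical helper name and the sample image is made up for illustration.

```python
from typing import List, Sequence, Tuple

THRESHOLD = 0.5  # same cut-off as in the description above


def x_gate_positions(image: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return (row, col) of every pixel whose data qubit would get an X gate."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > THRESHOLD
    ]


# Made-up 2x2 example; a real input would be 8x8.
sample = [
    [0.0, 0.9],
    [0.5, 0.51],
]
positions = x_gate_positions(sample)
print(positions)
```

Note that a pixel exactly equal to 0.5 does not get an X gate, because the comparison is strict.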

I attached a softmax layer directly to the quantum layer for classification, using a tf.keras.Sequential model, since I want to extend it further, up to all 10 digits.

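import tensorflow as tf
import tensorflow_quantum as tfq

# model_circuit and model_readout are defined earlier in the notebook.
qnn_model = tf.keras.Sequential([
    tf.keras.Input(shape=(), dtype=tf.string, name='q_input'),
    tfq.layers.PQC(model_circuit, model_readout, name='quantum'),
    tf.keras.layers.Dense(2, activation=tf.keras.activations.softmax, name='softmax'),
])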
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
quantum (PQC)                (None, 4)                 64        
_________________________________________________________________
softmax (Dense)              (None, 2)                 10        
=================================================================
Total params: 74
Trainable params: 74
Non-trainable params: 0
_________________________________________________________________
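
The parameter counts in the summary are consistent with the architecture described above, as a quick arithmetic check shows. It assumes one trainable symbol per ZZ**(param) gate and a Dense layer with a bias term (the Keras default).

```python
READOUT_QUBITS = 4
DATA_QUBITS_PER_READOUT = 16
CLASSES = 2

# One symbol per ZZ**(param) gate: every readout couples to its 16 data qubits.
pqc_params = READOUT_QUBITS * DATA_QUBITS_PER_READOUT

# Dense layer: a weight per (input, output) pair plus one bias per output.
dense_params = READOUT_QUBITS * CLASSES + CLASSES

total_params = pqc_params + dense_params
print(pqc_params, dense_params, total_params)
```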

I compiled the model and I tried to fit it.

What was expected to happen?

The model should start iterating over the given number of epochs.

What happened?

Epoch 1/10 is displayed, but nothing else happens.

  • The Colab kernel restarts, producing the log that can be found in the Attachments section.
  • In my local WSL2 environment I encountered something I would call 'a kernel freeze'. The cell appeared to be running, but nothing was happening: no CPU or RAM usage. The operation could not be interrupted; only a kernel restart worked.
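
One way to see where a hung cell is stuck is Python's standard faulthandler module, which can dump the stack of every thread after a timeout. A minimal sketch: `arm_hang_watchdog`, `disarm_hang_watchdog`, the log file name and the 60-second timeout are all placeholders, and whether the dump is useful depends on where the native code is blocked. It writes to a file because sys.stderr in a notebook may not be backed by a real file descriptor.

```python
import faulthandler


def arm_hang_watchdog(log_path: str = "hang_traceback.log", timeout_s: float = 60.0):
    """Dump all thread stacks to log_path if still running after timeout_s seconds."""
    log_file = open(log_path, "w")  # must stay open until the watchdog is cancelled
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log_file)
    return log_file


def disarm_hang_watchdog(log_file) -> None:
    """Cancel the pending dump and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()


# Usage: arm before the call that hangs, disarm once it returns.
watchdog_log = arm_hang_watchdog(timeout_s=60.0)
# qnn_model.fit(...)  # the call under investigation goes here
disarm_hang_watchdog(watchdog_log)
```

If the kernel process is killed outright, as in the Colab case, no dump will be written, so this is mainly relevant to the local freeze.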

Environment

tensorflow          2.3.1
tensorflow-quantum  0.4.0

for both:

  • Google Colab
  • Windows Subsystem for Linux 2 (Ubuntu 20.04.1 LTS; Windows 10 Pro, build 20270)

No GPU involved.

What I found out?

When I run the notebook with compressed_image_size = 4, everything works as intended. I've checked my quantum_circuit() and it seems to work as intended for the 8x8 version: it generates a circuit with the desired architecture.
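
A rough size estimate may explain the difference between the 4x4 and 8x8 cases. This is a hypothesis, not something confirmed in this report: it assumes the simulator behind the PQC layer stores a full state vector with 8 bytes per amplitude (complex64), and that the 4x4 variant uses 16 data qubits plus one readout qubit.

```python
BYTES_PER_AMPLITUDE = 8  # assumption: complex64 amplitudes


def state_vector_bytes(num_qubits: int) -> int:
    """Memory needed for a dense state vector over num_qubits qubits."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE


small = state_vector_bytes(16 + 1)   # assumed 4x4 layout: 16 data + 1 readout
large = state_vector_bytes(64 + 4)   # 8x8 layout: 64 data + 4 readout qubits

print(f"4x4: {small / 2**20:.0f} MiB")
print(f"8x8: {large / 2**70:.0f} ZiB")
```

Under these assumptions the 4x4 circuit needs about 1 MiB, while the 68-qubit 8x8 circuit would need about 2 ZiB, far more than any machine can allocate.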

When I tried to track down the error I found that:

data_adapter.py:
enumerate_epochs() yields the correct epoch, but the tf.data.Iterator data_iterator has AttributeErrors like

AttributeError: 'OwnedIterator' object has no attribute '_self_unconditional_checkpoint_dependencies'

in

  • _checkpoint_dependencies
  • _deferred_dependencies

AttributeError: 'OwnedIterator' object has no attribute '_self_name_based_restores'

in

  • _name_based_restores

and:

AttributeError("'OwnedIterator' object has no attribute '_self_unconditional_checkpoint_dependencies'")
AttributeError("'OwnedIterator' object has no attribute '_self_unconditional_dependency_names'")
AttributeError("'OwnedIterator' object has no attribute '_self_update_uid'")

I'm not sure if this is relevant.

Attachments

colab-jupyter.log

Dec 15, 2020, 10:41:32 AM | WARNING | WARNING:root:kernel b6193863-8d44-476f-b8cc-eadbe7129967 restarted
Dec 15, 2020, 10:41:32 AM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.133076: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.133022: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1b91640 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.131837: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2199995000 Hz
Dec 15, 2020, 10:40:56 AM | WARNING | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.125112: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.124271: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (0071d832075f): /proc/driver/nvidia/version does not exist
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.123595: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.109400: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
Dec 15, 2020, 10:40:53 AM | WARNING | 2020-12-15 09:40:53.250994: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Dec 15, 2020, 10:37:53 AM | WARNING | WARNING:root:kernel b6193863-8d44-476f-b8cc-eadbe7129967 restarted
Dec 15, 2020, 10:37:53 AM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.601416: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.601370: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20c3640 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.600345: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2199995000 Hz
Dec 15, 2020, 10:36:24 AM | WARNING | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.593357: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.592695: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (0071d832075f): /proc/driver/nvidia/version does not exist
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.592632: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.531111: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
Dec 15, 2020, 10:36:20 AM | WARNING | 2020-12-15 09:36:20.926549: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Dec 15, 2020, 10:36:01 AM | INFO | Adapting to protocol v5.1 for kernel b6193863-8d44-476f-b8cc-eadbe7129967
Dec 15, 2020, 10:33:42 AM | INFO | Adapting to protocol v5.1 for kernel b6193863-8d44-476f-b8cc-eadbe7129967
Dec 15, 2020, 10:33:41 AM | INFO | Kernel started: b6193863-8d44-476f-b8cc-eadbe7129967
Dec 15, 2020, 10:33:13 AM | INFO | Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Dec 15, 2020, 10:33:13 AM | INFO | http://172.28.0.2:9000/
Dec 15, 2020, 10:33:13 AM | INFO | The Jupyter Notebook is running at:
Dec 15, 2020, 10:33:13 AM | INFO | 0 active kernels
Dec 15, 2020, 10:33:13 AM | INFO | Serving notebooks from local directory: /
Dec 15, 2020, 10:33:13 AM | INFO | google.colab serverextension initialized.
Dec 15, 2020, 10:33:13 AM | INFO | Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
Dec 15, 2020, 10:33:13 AM | WARNING | Config option `delete_to_trash` not recognized by `ColabFileContentsManager`.

Metadata

Labels

kind/bug-report: Something doesn't seem to work
