LSTM is not working with ModelCheckpoint callback #30990

yiyang-yu · 2019-07-24T13:58:22Z

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 10.14.5
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
TensorFlow installed from (source or binary): pip
TensorFlow version (use command below): v2.0.0-beta0-16-g1d91213fe7 2.0.0-beta1
Python version: 3.6.7
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version:
GPU model and memory:

You can collect some of this information using our environment capture
script
You can also obtain the TensorFlow version with: 1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)" 2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior

While training a model with tf.keras.layers.LSTM and having tf.keras.callbacks.ModelCheckpoint in callbacks, model.fit stops with an error message at end of last epoch, and no model weights is saved as ModelCheckpoint should do.

Describe the expected behavior

model.fit should train the model, and model weights should be saved in desired files.

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Here is an example which reproduces this error:

import tensorflow as tf

from tensorflow.python.keras import layers
from tensorflow.python.keras.callbacks import ModelCheckpoint

model = tf.keras.Sequential()
model.add(layers.LSTM(units=64, input_shape=(28, 28), return_sequences=False))
model.add(layers.Dense(10, activation='softmax'))

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
sample, sample_label = x_train[0], y_train[0]

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='sgd',
              metrics=[])

callback = ModelCheckpoint(filepath='saved/',
                           monitor='val_loss',
                           save_weights_only=False,
                           mode='min', save_freq='epoch')

model.fit(x_train, y_train,
          validation_data=(x_test, y_test),
          batch_size=64, epochs=2, callbacks=[callback])

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

Here is error message after model.fit with the precedent code snippet:

W0724 14:56:18.298580 4508739008 deprecation.py:323] From /Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Train on 60000 samples, validate on 10000 samples
Epoch 1/2
59904/60000 [============================>.] - ETA: 0s - loss: 2.2443W0724 14:57:10.469849 4508739008 saved_model.py:733] Skipping full serialization of object <tensorflow.python.keras.layers.recurrent.LSTM object at 0xb286c80b8>, because an error occurred while tracing layer functions. Error message: in converted code:
    relative to /Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras:
    saving/saved_model.py:1143 call_and_return_conditional_losses  *
        return layer_call(inputs, training=training), layer.get_losses_for(inputs)
    layers/recurrent.py:2533 call
        inputs, mask=mask, training=training, initial_state=initial_state)
    layers/recurrent.py:743 call
        zero_output_for_mask=self.zero_output_for_mask)
    backend.py:3806 rnn
        input_time_zero, tuple(initial_states) + tuple(constants))
    layers/recurrent.py:728 step
        output, new_states = self.cell.call(inputs, states, **kwargs)
    TypeError: wrapped_call() takes 1 positional argument but 2 were given
W0724 14:57:10.520410 4508739008 saved_model.py:733] Skipping full serialization of object <tensorflow.python.keras.engine.sequential.Sequential object at 0x10f940518>, because an error occurred while tracing layer functions. Error message: in converted code:
    relative to /Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras:
    saving/saved_model.py:1143 call_and_return_conditional_losses  *
        return layer_call(inputs, training=training), layer.get_losses_for(inputs)
    engine/sequential.py:248 call
        return super(Sequential, self).call(inputs, training=training, mask=mask)
    engine/network.py:753 call
        return self._run_internal_graph(inputs, training=training, mask=mask)
    engine/network.py:895 _run_internal_graph
        output_tensors = layer(computed_tensors, **kwargs)
    layers/recurrent.py:619 __call__
        return super(RNN, self).__call__(inputs, **kwargs)
    engine/base_layer.py:667 __call__
        outputs = call_fn(inputs, *args, **kwargs)
    layers/recurrent.py:2533 call
        inputs, mask=mask, training=training, initial_state=initial_state)
    layers/recurrent.py:743 call
        zero_output_for_mask=self.zero_output_for_mask)
    backend.py:3806 rnn
        input_time_zero, tuple(initial_states) + tuple(constants))
    layers/recurrent.py:728 step
        output, new_states = self.cell.call(inputs, states, **kwargs)
    TypeError: wrapped_call() takes 1 positional argument but 2 were given
2019-07-24 14:57:10.531729: W tensorflow/python/util/util.cc:280] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
60000/60000 [==============================] - 51s 854us/sample - loss: 2.2442 - val_loss: 2.1145
Epoch 2/2
59968/60000 [============================>.] - ETA: 0s - loss: 1.9488Traceback (most recent call last):
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-d29c3be2bf34>", line 1, in <module>
    runfile('/Users/user/Desktop/tf2-rnn-callback-bugcheck/main.py', wdir='/Users/user/Desktop/tf2-rnn-callback-bugcheck')
  File "/Users/user/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-0/191.7479.30/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Users/user/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-0/191.7479.30/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/user/Desktop/tf2-rnn-callback-bugcheck/main.py", line 37, in <module>
    batch_size=64, epochs=2, callbacks=[callback])
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 643, in fit
    use_multiprocessing=use_multiprocessing)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 664, in fit
    steps_name='steps_per_epoch')
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 439, in model_iteration
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 295, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 961, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 1008, in _save_model
    self.model.save(filepath, overwrite=True)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1213, in save
    saving.save_model(self, filepath, overwrite, include_optimizer, save_format)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 106, in save_model
    saved_model.save(model, filepath, overwrite, include_optimizer)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model.py", line 1492, in save
    save_lib.save(model, filepath)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py", line 812, in save
    checkpoint_graph_view)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/saved_model/signature_serialization.py", line 65, in find_function_to_export
    functions = saveable_view.list_functions(saveable_view.root)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py", line 139, in list_functions
    self._serialization_cache)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2249, in _list_functions_for_serialization
    fns = (saved_model.serialize_all_attributes(self, serialization_cache)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model.py", line 723, in serialize_all_attributes
    function_dict['_default_save_signature'] = _default_save_signature(layer)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model.py", line 881, in _default_save_signature
    fn.get_concrete_function()
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 681, in get_concrete_function
    self._initialize(args, kwargs, add_initializers_to=initializer_map)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 359, in _initialize
    *args, **kwds))
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1360, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1648, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1541, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 716, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 309, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/saving_utils.py", line 139, in _wrapped_model
    outputs_list = nest.flatten(model(inputs=inputs))
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 667, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/sequential.py", line 248, in call
    return super(Sequential, self).call(inputs, training=training, mask=mask)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 753, in call
    return self._run_internal_graph(inputs, training=training, mask=mask)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 895, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 619, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 667, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2533, in call
    inputs, mask=mask, training=training, initial_state=initial_state)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 743, in call
    zero_output_for_mask=self.zero_output_for_mask)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3806, in rnn
    input_time_zero, tuple(initial_states) + tuple(constants))
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 728, in step
    output, new_states = self.cell.call(inputs, states, **kwargs)
TypeError: wrapped_call() takes 1 positional argument but 2 were given

The same error occurs on a Linux machine with tf-nightly (2.0.0-dev20190723).

Thanks for help!

The text was updated successfully, but these errors were encountered:

iwonajs · 2019-07-25T02:13:20Z

I am experiencing a similar issue.

print(tf.version.GIT_VERSION, tf.version.VERSION) > 2.2.4-tf

OS: Ubuntu
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic

running on CPUs
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 2
Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz

Data Generated using a class which inherits from Sequence
from tensorflow.python.keras.utils import Sequence

this works fine:
model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[logger])

this causes an error:
callback = ModelCheckpoint(checkpoint_file, verbose=1)
model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[callback])

Error:
Epoch 1/10
7/8 [=========================>....] - ETA: 1s - loss: 1.5245 - acc: 0.2857
Epoch 00001: saving model to /.../checkpoints/weights-improvement-01.hdf5
Traceback (most recent call last):
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/model_base_baseline.py", line 380, in
model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[callback]) #, use_multiprocessing=True, workers=20)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
steps_name='steps_per_epoch')
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 331, in model_iteration
callbacks.on_epoch_end(epoch, epoch_logs)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 311, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 969, in on_epoch_end
self._save_model(epoch=epoch, logs=logs)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 1018, in _save_model
self.model.save(filepath, overwrite=True)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1211, in save
saving.save_model(self, filepath, overwrite, include_optimizer, save_format)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 113, in save_model
model, filepath, overwrite, include_optimizer)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 99, in save_model_to_hdf5
'config': model.get_config()
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 940, in get_config
layer_config = layer.get_config()
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 919, in get_config
raise NotImplementedError
NotImplementedError

Process finished with exit code 1

Please help.
model_base_baseline.txt
nn_data_generator.txt

gadagashwini-zz · 2019-07-25T08:53:48Z

I am able to reproduce the issue on Colab with Tensorflow version 2.0.0.beta1. Please take a look at gist of Colab. Thanks!

jvishnuvardhan · 2019-07-25T17:30:16Z

I could reproduce the issue with !pip install tf-nightly-gpu-2.0-preview==2.0.0.dev20190724. Here is the gist. Thanks

qlzh727 · 2019-08-01T21:14:51Z

Thanks for reporting the issue, let me take a look.

qlzh727 · 2019-08-01T22:57:18Z

I wasn't be able to reproduce issue with latest nightly in tf-nightly-gpu-2.0-preview==2.0.0.dev20190731, somehow the issue was fixed recently. Can u have a try again?

Thanks.

jvishnuvardhan · 2019-08-02T04:24:25Z

I am closing the issue. I can confirm that the issue wasn't reproducible with tf-nightly-gpu-2.0-preview==2.0.0.dev20190731. Here is the gist for your reference. Thanks!

tensorflow-bot · 2019-08-02T04:24:27Z

Are you satisfied with the resolution of your issue?
Yes
No

gadagashwini-zz self-assigned this Jul 25, 2019

gadagashwini-zz added 2.0.0-beta0 comp:keras Keras related issues type:bug Bug labels Jul 25, 2019

gadagashwini-zz assigned jvishnuvardhan and unassigned gadagashwini-zz Jul 25, 2019

jvishnuvardhan assigned qlzh727 and unassigned jvishnuvardhan Jul 25, 2019

jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jul 25, 2019

jvishnuvardhan closed this as completed Aug 2, 2019

lvenugopalan added the TF 2.0 Issues relating to TensorFlow 2.0 label Apr 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LSTM is not working with ModelCheckpoint callback #30990

LSTM is not working with ModelCheckpoint callback #30990

yiyang-yu commented Jul 24, 2019 •

edited

iwonajs commented Jul 25, 2019

gadagashwini-zz commented Jul 25, 2019

jvishnuvardhan commented Jul 25, 2019

qlzh727 commented Aug 1, 2019

qlzh727 commented Aug 1, 2019

jvishnuvardhan commented Aug 2, 2019

tensorflow-bot bot commented Aug 2, 2019

LSTM is not working with ModelCheckpoint callback #30990

LSTM is not working with ModelCheckpoint callback #30990

Comments

yiyang-yu commented Jul 24, 2019 • edited

iwonajs commented Jul 25, 2019

I am experiencing a similar issue.

print(tf.version.GIT_VERSION, tf.version.VERSION) > 2.2.4-tf

OS: Ubuntu Distributor ID: Ubuntu Description: Ubuntu 18.04.2 LTS Release: 18.04 Codename: bionic

running on CPUs Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 20 On-line CPU(s) list: 0-19 Thread(s) per core: 2 Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz

Data Generated using a class which inherits from Sequence from tensorflow.python.keras.utils import Sequence

this works fine: model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[logger])

this causes an error: callback = ModelCheckpoint(checkpoint_file, verbose=1) model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[callback])

gadagashwini-zz commented Jul 25, 2019

jvishnuvardhan commented Jul 25, 2019

qlzh727 commented Aug 1, 2019

qlzh727 commented Aug 1, 2019

jvishnuvardhan commented Aug 2, 2019

tensorflow-bot bot commented Aug 2, 2019

yiyang-yu commented Jul 24, 2019 •

edited

OS: Ubuntu
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic

running on CPUs
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 2
Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz

Data Generated using a class which inherits from Sequence
from tensorflow.python.keras.utils import Sequence

this works fine:
model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[logger])

this causes an error:
callback = ModelCheckpoint(checkpoint_file, verbose=1)
model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[callback])