LSTM is not working with ModelCheckpoint callback #30990

Closed
yiyang-yu opened this issue Jul 24, 2019 · 7 comments
Labels: comp:keras, stat:awaiting tensorflower, TF 2.0, type:bug


@yiyang-yu

yiyang-yu commented Jul 24, 2019

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 10.14.5
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): v2.0.0-beta0-16-g1d91213fe7 2.0.0-beta1
  • Python version: 3.6.7
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:
1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior

When training a model that contains tf.keras.layers.LSTM with tf.keras.callbacks.ModelCheckpoint in callbacks, model.fit stops with an error at the end of the last epoch, and the model weights are not saved as ModelCheckpoint is supposed to do.

Describe the expected behavior

model.fit should train the model, and model weights should be saved in desired files.

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Here is an example which reproduces this error:

import tensorflow as tf

from tensorflow.python.keras import layers
from tensorflow.python.keras.callbacks import ModelCheckpoint

model = tf.keras.Sequential()
model.add(layers.LSTM(units=64, input_shape=(28, 28), return_sequences=False))
model.add(layers.Dense(10, activation='softmax'))

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
sample, sample_label = x_train[0], y_train[0]

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='sgd',
              metrics=[])

callback = ModelCheckpoint(filepath='saved/',
                           monitor='val_loss',
                           save_weights_only=False,
                           mode='min', save_freq='epoch')

model.fit(x_train, y_train,
          validation_data=(x_test, y_test),
          batch_size=64, epochs=2, callbacks=[callback])

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

Here is the error output from model.fit when running the preceding code snippet:

W0724 14:56:18.298580 4508739008 deprecation.py:323] From /Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Train on 60000 samples, validate on 10000 samples
Epoch 1/2
59904/60000 [============================>.] - ETA: 0s - loss: 2.2443W0724 14:57:10.469849 4508739008 saved_model.py:733] Skipping full serialization of object <tensorflow.python.keras.layers.recurrent.LSTM object at 0xb286c80b8>, because an error occurred while tracing layer functions. Error message: in converted code:
    relative to /Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras:
    saving/saved_model.py:1143 call_and_return_conditional_losses  *
        return layer_call(inputs, training=training), layer.get_losses_for(inputs)
    layers/recurrent.py:2533 call
        inputs, mask=mask, training=training, initial_state=initial_state)
    layers/recurrent.py:743 call
        zero_output_for_mask=self.zero_output_for_mask)
    backend.py:3806 rnn
        input_time_zero, tuple(initial_states) + tuple(constants))
    layers/recurrent.py:728 step
        output, new_states = self.cell.call(inputs, states, **kwargs)
    TypeError: wrapped_call() takes 1 positional argument but 2 were given
W0724 14:57:10.520410 4508739008 saved_model.py:733] Skipping full serialization of object <tensorflow.python.keras.engine.sequential.Sequential object at 0x10f940518>, because an error occurred while tracing layer functions. Error message: in converted code:
    relative to /Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras:
    saving/saved_model.py:1143 call_and_return_conditional_losses  *
        return layer_call(inputs, training=training), layer.get_losses_for(inputs)
    engine/sequential.py:248 call
        return super(Sequential, self).call(inputs, training=training, mask=mask)
    engine/network.py:753 call
        return self._run_internal_graph(inputs, training=training, mask=mask)
    engine/network.py:895 _run_internal_graph
        output_tensors = layer(computed_tensors, **kwargs)
    layers/recurrent.py:619 __call__
        return super(RNN, self).__call__(inputs, **kwargs)
    engine/base_layer.py:667 __call__
        outputs = call_fn(inputs, *args, **kwargs)
    layers/recurrent.py:2533 call
        inputs, mask=mask, training=training, initial_state=initial_state)
    layers/recurrent.py:743 call
        zero_output_for_mask=self.zero_output_for_mask)
    backend.py:3806 rnn
        input_time_zero, tuple(initial_states) + tuple(constants))
    layers/recurrent.py:728 step
        output, new_states = self.cell.call(inputs, states, **kwargs)
    TypeError: wrapped_call() takes 1 positional argument but 2 were given
2019-07-24 14:57:10.531729: W tensorflow/python/util/util.cc:280] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
60000/60000 [==============================] - 51s 854us/sample - loss: 2.2442 - val_loss: 2.1145
Epoch 2/2
59968/60000 [============================>.] - ETA: 0s - loss: 1.9488Traceback (most recent call last):
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-d29c3be2bf34>", line 1, in <module>
    runfile('/Users/user/Desktop/tf2-rnn-callback-bugcheck/main.py', wdir='/Users/user/Desktop/tf2-rnn-callback-bugcheck')
  File "/Users/user/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-0/191.7479.30/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Users/user/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-0/191.7479.30/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/user/Desktop/tf2-rnn-callback-bugcheck/main.py", line 37, in <module>
    batch_size=64, epochs=2, callbacks=[callback])
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 643, in fit
    use_multiprocessing=use_multiprocessing)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 664, in fit
    steps_name='steps_per_epoch')
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 439, in model_iteration
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 295, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 961, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 1008, in _save_model
    self.model.save(filepath, overwrite=True)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1213, in save
    saving.save_model(self, filepath, overwrite, include_optimizer, save_format)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 106, in save_model
    saved_model.save(model, filepath, overwrite, include_optimizer)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model.py", line 1492, in save
    save_lib.save(model, filepath)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py", line 812, in save
    checkpoint_graph_view)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/saved_model/signature_serialization.py", line 65, in find_function_to_export
    functions = saveable_view.list_functions(saveable_view.root)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py", line 139, in list_functions
    self._serialization_cache)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2249, in _list_functions_for_serialization
    fns = (saved_model.serialize_all_attributes(self, serialization_cache)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model.py", line 723, in serialize_all_attributes
    function_dict['_default_save_signature'] = _default_save_signature(layer)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model.py", line 881, in _default_save_signature
    fn.get_concrete_function()
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 681, in get_concrete_function
    self._initialize(args, kwargs, add_initializers_to=initializer_map)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 359, in _initialize
    *args, **kwds))
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1360, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1648, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1541, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 716, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 309, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/saving/saving_utils.py", line 139, in _wrapped_model
    outputs_list = nest.flatten(model(inputs=inputs))
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 667, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/sequential.py", line 248, in call
    return super(Sequential, self).call(inputs, training=training, mask=mask)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 753, in call
    return self._run_internal_graph(inputs, training=training, mask=mask)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 895, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 619, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 667, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2533, in call
    inputs, mask=mask, training=training, initial_state=initial_state)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 743, in call
    zero_output_for_mask=self.zero_output_for_mask)
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3806, in rnn
    input_time_zero, tuple(initial_states) + tuple(constants))
  File "/Users/user/anaconda3/envs/tf2beta/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 728, in step
    output, new_states = self.cell.call(inputs, states, **kwargs)
TypeError: wrapped_call() takes 1 positional argument but 2 were given

The same error occurs on a Linux machine with tf-nightly (2.0.0-dev20190723).

Thanks for your help!
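The final `TypeError` names `wrapped_call`, which suggests that while the layer functions were being traced for SavedModel export, `self.cell.call` had been replaced by a wrapper that accepts a single positional argument, whereas `step()` in `recurrent.py` passes two (`inputs, states`). The following is a toy sketch of that failure mode in plain Python. It is not TensorFlow's actual code, and `wrap_call` and `ToyCell` are hypothetical names invented for illustration.

```python
def wrap_call(call_fn):
    """Hypothetical wrapper that forwards only ONE positional argument."""
    def wrapped_call(inputs, **kwargs):
        return call_fn(inputs, **kwargs)
    return wrapped_call


class ToyCell:
    """Stand-in for an RNN cell whose call() needs two positional arguments."""
    def call(self, inputs, states):
        return inputs, states


cell = ToyCell()
# Replace the bound method with the wrapper (assumed to mimic what tracing did).
cell.call = wrap_call(cell.call)

try:
    # Two positional arguments, as in `self.cell.call(inputs, states, **kwargs)`.
    cell.call("x", ["h", "c"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the fault lies in how the save path wraps the cell's `call`, not in the user's model definition.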

@iwonajs

iwonajs commented Jul 25, 2019

I am experiencing a similar issue.

print(tf.version.GIT_VERSION, tf.version.VERSION) > 2.2.4-tf

OS: Ubuntu
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic

running on CPUs
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 2
Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz

The data is generated by a class that inherits from Sequence:
from tensorflow.python.keras.utils import Sequence

this works fine:
model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[logger])

this causes an error:
callback = ModelCheckpoint(checkpoint_file, verbose=1)
model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[callback])

Error:
Epoch 1/10
7/8 [=========================>....] - ETA: 1s - loss: 1.5245 - acc: 0.2857
Epoch 00001: saving model to /.../checkpoints/weights-improvement-01.hdf5
Traceback (most recent call last):
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/model_base_baseline.py", line 380, in
model.fit_generator(generator=gent, validation_data=genv, epochs=10, callbacks=[callback]) #, use_multiprocessing=True, workers=20)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
steps_name='steps_per_epoch')
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 331, in model_iteration
callbacks.on_epoch_end(epoch, epoch_logs)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 311, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 969, in on_epoch_end
self._save_model(epoch=epoch, logs=logs)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 1018, in _save_model
self.model.save(filepath, overwrite=True)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1211, in save
saving.save_model(self, filepath, overwrite, include_optimizer, save_format)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 113, in save_model
model, filepath, overwrite, include_optimizer)
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 99, in save_model_to_hdf5
'config': model.get_config()
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 940, in get_config
layer_config = layer.get_config()
File "/media/iwona/Optane/Project_BugLoc/DeepTracePy/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 919, in get_config
raise NotImplementedError
NotImplementedError

Process finished with exit code 1


Please help.
model_base_baseline.txt
nn_data_generator.txt
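Both tracebacks in this thread fail inside `self.model.save(filepath, overwrite=True)`, which `ModelCheckpoint._save_model` calls when `save_weights_only=False`. A possible way to sidestep full-model serialization is to checkpoint weights only. The sketch below is a minimal illustration under assumptions: it uses small random data in place of the real inputs, a temporary directory, and a `.weights.h5` file name chosen here. It was not tested against the affected builds.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny random stand-in data (assumption: same shapes as the MNIST example).
x = np.random.rand(32, 28, 28).astype("float32")
y = np.random.randint(0, 10, size=(32,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd")

# Hypothetical checkpoint location inside a fresh temporary directory.
ckpt_dir = tempfile.mkdtemp()
ckpt_path = os.path.join(ckpt_dir, "ckpt.weights.h5")

# save_weights_only=True makes the callback call model.save_weights(...)
# instead of model.save(...), so the model config is not serialized.
callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=ckpt_path,
    save_weights_only=True,
)

model.fit(x, y, batch_size=16, epochs=1, callbacks=[callback], verbose=0)
print(os.path.exists(ckpt_path))
```

The trade-off is that restoring requires rebuilding the same architecture in code and then calling `model.load_weights(ckpt_path)`.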

@gadagashwini-zz
Contributor

I am able to reproduce the issue on Colab with TensorFlow version 2.0.0.beta1. Please take a look at the Colab gist. Thanks!

@jvishnuvardhan
Contributor

I could reproduce the issue with !pip install tf-nightly-gpu-2.0-preview==2.0.0.dev20190724. Here is the gist. Thanks.

jvishnuvardhan added the stat:awaiting tensorflower label Jul 25, 2019
@qlzh727
Member

qlzh727 commented Aug 1, 2019

Thanks for reporting the issue, let me take a look.

@qlzh727
Member

qlzh727 commented Aug 1, 2019

I wasn't able to reproduce the issue with the latest nightly, tf-nightly-gpu-2.0-preview==2.0.0.dev20190731; it seems the issue was fixed recently. Can you try again?

Thanks.

@jvishnuvardhan
Contributor

I am closing the issue. I can confirm that the issue wasn't reproducible with tf-nightly-gpu-2.0-preview==2.0.0.dev20190731. Here is the gist for your reference. Thanks!
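For anyone checking whether their install predates the nightly reported as working here (2.0.0.dev20190731), the date embedded in the version string can be compared directly. This is a small sketch, not an official utility: `nightly_date` and `predates_fixed_nightly` are hypothetical helper names, and the regular expression assumes the `devYYYYMMDD` suffix seen in this thread.

```python
import re

# Date of the nightly that maintainers in this thread could no longer break.
FIXED_NIGHTLY_DATE = 20190731


def nightly_date(version):
    """Return the YYYYMMDD integer embedded in a dev version string, or None."""
    match = re.search(r"dev(\d{8})", version)
    return int(match.group(1)) if match else None


def predates_fixed_nightly(version):
    """True only for dev builds dated before the nightly reported as working."""
    date = nightly_date(version)
    return date is not None and date < FIXED_NIGHTLY_DATE


for v in ["2.0.0-dev20190723", "2.0.0.dev20190724", "2.0.0.dev20190731"]:
    print(v, predates_fixed_nightly(v))
```

In practice the string would come from `tf.version.VERSION`. Non-nightly versions such as `2.0.0-beta1` carry no date, so the helper returns `None` for them and makes no claim either way.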


lvenugopalan added the TF 2.0 label Apr 29, 2020