
[TF 2.0 keras] Unable to save and load weights for doubly nested models #27769

Closed · zzh8829 opened this issue Apr 12, 2019 · 16 comments
Labels: comp:keras (Keras related issues), type:bug (Bug)

zzh8829 commented Apr 12, 2019

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): mac
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.0.0
  • Python version: 3.7

Describe the current behavior
load_weights throws an exception on a doubly nested model

Describe the expected behavior
load_weights should work

This problem only happens with two or more levels of nested models that contain non-trainable weights.
The reason is that save_weights and load_weights handle nested models differently:
save_weights -> calls layer.weights for each layer
load_weights -> recursively calls model.weights if a layer is itself a nested Model
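
To make the asymmetry concrete, here is a purely illustrative sketch (not the actual hdf5_format.py code; every name below is made up) of how two enumeration strategies diverge once a nested sub-model that is not last carries non-trainable weights:

# Each entry mimics one nested BNModel as (trainable, non_trainable) weight names.
sub_models = [
    (["m1/conv/kernel", "m1/conv/bias", "m1/bn/gamma", "m1/bn/beta"],
     ["m1/bn/moving_mean", "m1/bn/moving_variance"]),
    (["m2/conv/kernel", "m2/conv/bias", "m2/bn/gamma", "m2/bn/beta"],
     ["m2/bn/moving_mean", "m2/bn/moving_variance"]),
]

def per_layer_order(models):
    # Strategy A: walk each sub-model and emit its trainable weights, then its non-trainable ones.
    return [w for t, nt in models for w in t + nt]

def grouped_order(models):
    # Strategy B: emit every trainable weight of the whole model first, then every non-trainable weight.
    return ([w for t, _ in models for w in t]
            + [w for _, nt in models for w in nt])

# The orderings disagree from index 4 onward, so saved arrays can be matched
# against the wrong variables (and shapes), which is how a transpose can fail.
assert per_layer_order(sub_models) != grouped_order(sub_models)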

Code to reproduce the issue

import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization

shape = (None, None, 3)

def BNModel():
    x = inputs = Input(shape)
    x = Conv2D(3, 1)(x)
    x = BatchNormalization()(x)
    return Model(inputs, x)

x = inner_inputs = Input(shape)
x = BNModel()(x)
x = BNModel()(x)
inner_model = Model(inner_inputs, x)

inputs = Input(shape)
model = Model(inputs, inner_model(inputs))

inner_model.save_weights('test.h5')
inner_model.load_weights('test.h5')  # works fine

model.save_weights('test.h5')
model.load_weights('test.h5')   # Exception: axes don't match array !!!

Other info / logs
This bug is also reported on upstream keras keras-team/keras#11847
Here is a detailed analysis on why this is happening keras-team/keras#11847 (comment)

Full Exception

  File "test.py", line 27, in <module>
    model.load_weights('test.h5')   # Exception: axes don't match array !!!
  File "/usr/local/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1497, in load_weights
    hdf5_format.load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 751, in load_weights_from_hdf5_group
    layer, weight_values, original_keras_version, original_backend)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 377, in preprocess_weights_for_loading
    weights = convert_nested_model(weights)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 365, in convert_nested_model
    original_backend=original_backend))
  File "/usr/local/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 377, in preprocess_weights_for_loading
    weights = convert_nested_model(weights)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 353, in convert_nested_model
    original_backend=original_backend))
  File "/usr/local/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 459, in preprocess_weights_for_loading
    weights[0] = np.transpose(weights[0], (3, 2, 0, 1))
  File "/usr/local/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 598, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 51, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: axes don't match array
zzh8829 commented Apr 12, 2019

This only affects the .h5 format; the TensorFlow checkpoint format works fine.
I guess alternatively we could tell users not to use the h5 format instead of fixing it.

jvishnuvardhan self-assigned this Apr 15, 2019
jvishnuvardhan added the comp:keras (Keras related issues) and type:bug (Bug) labels Apr 15, 2019
jvishnuvardhan added the stat:awaiting tensorflower (Status - Awaiting response from tensorflower) label Apr 15, 2019
abhigyank commented:

@zzh8829 What is the alternative way to save a model/weights? I am having this problem in the .hdf5 format too.

zzh8829 commented May 4, 2019

@abhigyank The alternative is to save to *.tf, which will create TensorFlow checkpoint files instead of hdf5.
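
For example, a minimal sketch based on the reproduction script above (the filename is arbitrary; a non-HDF5 extension such as .tf, or an explicit save_format='tf', selects the checkpoint format):

# Same doubly nested `model` as above; '.tf' (or save_format='tf') writes
# TensorFlow checkpoint files instead of a single HDF5 file.
model.save_weights('weights.tf', save_format='tf')
model.load_weights('weights.tf')   # no "axes don't match array" error here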

bbrito commented Jun 18, 2019

Any news on this issue?

I tried the *.tf format and it works.

veqtor commented Jun 20, 2019

It might seem like .tf saving works, but in my experience the only difference is that it doesn't throw an error; the reloaded weights still don't reproduce the metrics recorded before saving.
Steps to reproduce (sketched in code after this list):
1. Make a model with nested models and set some layers to trainable=False
2. Train for some epochs
3. Save weights
4. Evaluate and save metrics
5. Clear everything
6. Make model
7. Load weights
8. Evaluate
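
A self-contained sketch of those steps, assuming the nested structure from the original report; the data, shapes, and checkpoint name are made up, and freezing one nested sub-model stands in for "set some layers to trainable=False". The point is to compare the metric before saving with the metric after reloading:

import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization

def build_model(shape=(32, 32, 3)):
    # Same doubly nested structure as the original report.
    def bn_block():
        x = inputs = Input(shape)
        x = Conv2D(3, 1)(x)
        x = BatchNormalization()(x)
        return Model(inputs, x)

    x = inner_inputs = Input(shape)
    frozen = bn_block()
    frozen.trainable = False              # "set some layers to trainable=False"
    x = frozen(x)
    x = bn_block()(x)
    inner = Model(inner_inputs, x)

    inputs = Input(shape)
    return Model(inputs, inner(inputs))

x = np.random.rand(8, 32, 32, 3).astype("float32")
y = np.random.rand(8, 32, 32, 3).astype("float32")

model = build_model()
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2, verbose=0)       # train for some epochs
before = model.evaluate(x, y, verbose=0)   # evaluate and save metrics
model.save_weights("ckpt.tf")              # save weights

tf.keras.backend.clear_session()           # clear everything
model = build_model()                      # make model (again)
model.load_weights("ckpt.tf")              # load weights
after = model.evaluate(x, y, verbose=0)    # evaluate
print(before, after)                       # expected to match; the report above says they may not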

k-w-w commented Jun 20, 2019

I am currently submitting a fix for H5.

@veqtor What problem are you seeing with using the TF format?

tensorflow-copybara pushed a commit that referenced this issue Jun 21, 2019
Changed the test to the example from #27769.

PiperOrigin-RevId: 254305891
tensorflowbutler removed the stat:awaiting tensorflower (Status - Awaiting response from tensorflower) label Jun 21, 2019
dbalabka commented Jul 1, 2019

@k-w-w I have tested your fix and it works for me 😃 Thank you a lot!

19giorgosts commented:

@k-w-w How can I use your fix? I have the same problem.

k-w-w commented Aug 5, 2019

@19giorgosts The fix should be in tensorflow-nightly, which you can install using pip install tf-nightly

Lannist commented Aug 7, 2019

It might seem like .tf saving works, but in my experience the only difference is that it doesn't throw an error; the reloaded weights still don't reproduce the metrics recorded before saving.
Steps to reproduce:
1. Make a model with nested models and set some layers to trainable=False
2. Train for some epochs
3. Save weights
4. Evaluate and save metrics
5. Clear everything
6. Make model
7. Load weights
8. Evaluate

I am new to Keras. Can you show me a demo of what you are describing?
Thanks

jvishnuvardhan commented Aug 9, 2019

@Lannist Here is the colab gist to save/load the weights in *.tf format. Here is the gist to save/load the weights in *.h5 format. The only difference between the two gists is the file extension. Thanks!
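
For reference, a minimal sketch along those lines (the gist links are not reproduced here), using the model from the original report; only the extension passed to save_weights/load_weights changes:

# TensorFlow checkpoint format
model.save_weights('test_weights.tf')
model.load_weights('test_weights.tf')

# HDF5 format; works once the fix that is in tf-nightly is installed
model.save_weights('test_weights.h5')
model.load_weights('test_weights.h5')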

I am closing the issue as it was resolved in tf-nightly. Please feel free to reopen if the issue persists. Thanks!

ysyyork commented May 6, 2020

Is this change going to be in TF 1?

aii-guo commented May 13, 2020

Is this change going to be in TF 1?

Have you found a solution? I am using TensorFlow 1.1.4 and hit the same error, but I cannot find a way to fix it.

aii-guo commented May 13, 2020

@19giorgosts The fix should be in tensorflow-nightly, which you can install using pip install tf-nightly

What about TensorFlow 1.1.4 or 1.1.5? I cannot install tensorflow-nightly with pip.

paulaceccon commented Oct 5, 2021

I am currently submitting a fix for H5.

@veqtor What problem are you seeing with using the TF format?

That didn't work for me; using that fix in tf-nightly, I still hit the error with a siamese model such as:

import os
from typing import Optional

import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras import backend as K
from tensorflow.keras import layers
from tensorflow.keras.applications import EfficientNetB0, ResNet50
from tensorflow.keras.optimizers import Adam

def l1_distance(vects) -> float:
    """
    Finds the L1 distance between two vectors.

    Args:
        vects: List containing two tensors of same length.

    Returns:
        Element-wise L1 distance.
    """
    x, y = vects

    return K.abs(x - y)

def create_model(
    target_shape = (224, 224, 3),
    path = None,
) -> Model:
    """
    Creates the siamese model.

    Args:
        target_shape: image dimensions.
        path: path to best weights.

    Returns:
        Siamese model.
    """
    input_1 = layers.Input(shape=target_shape, name="inp_1")
    input_2 = layers.Input(shape=target_shape, name="inp_2")
    # input_1aug = img_augmentation(input_1)
    # input_2aug = img_augmentation(input_2)

    input = layers.Input(shape=target_shape, name="input")
    lambda_1 = layers.Lambda(
        lambda image: tf.keras.applications.resnet.preprocess_input(image),
        name="pre_process",
    )(input)
    base_cnn = ResNet50(
        weights="imagenet",
        input_tensor=lambda_1,
        input_shape=target_shape,
        include_top=False,
    )
    # CONV/FC -> BatchNorm -> ReLu(or other activation) -> Dropout -> CONV/FC ->
    pool = layers.MaxPooling2D(pool_size=(2, 2))(base_cnn.output)
    flatten = layers.Flatten(name="base_output_flatten")(pool)
    dense1 = layers.BatchNormalization(name="dense1_norm")(flatten)
    dense1 = layers.Dense(512, activation="relu", name="dense1")(dense1)
    dense1 = layers.Dropout(0.3, name="dense1_dropout")(dense1)
    dense2 = layers.BatchNormalization(name="dense2_norm")(dense1)
    dense2 = layers.Dense(256, activation="relu", name="dense2")(dense2)
    dense2 = layers.Dropout(0.2, name="dense2_dropout")(dense2)
    output = layers.Dense(256, name="dense_output")(dense2)

    embedding = Model(input, output, name="Embedding")

    trainable = False
    for layer in base_cnn.layers:
        if layer.name == "conv5_block1_out":
            trainable = True
        layer.trainable = trainable

    tower_1 = embedding(input_1)
    tower_2 = embedding(input_2)

    merge_layer = layers.Lambda(l1_distance, name="l1")([tower_1, tower_2])
    normal_layer = tf.keras.layers.BatchNormalization(name="l1_norm")(merge_layer)
    comparison_layer = layers.Dense(
        1,
        activation="sigmoid",
        name="final_layer",
    )(normal_layer)
    siamese = Model(inputs=[input_1, input_2], outputs=comparison_layer)

    if path is not None:
        siamese.load_weights(path)

    return siamese

early_stopping_callback = tf.keras.callbacks.EarlyStopping(
        monitor="loss", patience=5
    )
tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir="/logs", histogram_freq=1
    )
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath="/logs/weights{epoch:04d}.tf", save_weights_only=True, save_freq=1
    )

train_generator = get_train_generator(
        split_path, batch_size=batch_size, input_size=target_shape
    )
steps_per_epoch = len(train_generator)
clr = get_cyclical_lr(2 * steps_per_epoch)
optimizer = Adam(clr)

siamese = create_model(target_shape)

siamese.compile(
        loss=loss(margin=margin),
        optimizer=optimizer,
        metrics=[metrics.accuracy, metrics.precision, metrics.recall, metrics.f1],
    )
siamese.summary()

siamese.fit(
        train_generator,
        validation_data=get_valid_generator(
            split_path, batch_size=batch_size, input_size=target_shape
        ),
        epochs=epochs,
        callbacks=[
            early_stopping_callback,
            tensorboard_callback,
            model_checkpoint_callback,
        ],
        verbose=1,
    )

siamese = create_model(path="/content/weights00000012.h5")
ValueError                                Traceback (most recent call last)
<ipython-input-9-e1ec7f0dd441> in <module>()
----> 1 siamese = create_model(path="/content/weights00000012.h5")

3 frames
<__array_function__ internals> in transpose(*args, **kwargs)

/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     56 
     57     try:
---> 58         return bound(*args, **kwds)
     59     except TypeError:
     60         # A TypeError occurs if the object does have such a method in its

ValueError: axes don't match array
