Error when loading a Subclass model with tf.keras.Sequential blocks inside #41045

Closed
jis478 opened this issue Jul 2, 2020 · 12 comments
Labels
comp:keras Keras related issues stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author TF 2.3 Issues related to TF 2.3 type:bug Bug

Comments


jis478 commented Jul 2, 2020

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): pip install tensorflow-gpu
TensorFlow version: 2.2.0
Python version: 3.7.0
CUDA/cuDNN version: 11.0 / v8.0.1
GPU model and memory: NVIDIA V100
Problem encountered:

Please excuse me if this is not a bug but a gap in my understanding of model saving. I will take it down right away if so.

I've written code that trains a ResNet-50 model using the tf.keras.Model and tf.keras.layers.Layer APIs, shown below. As you can see, this is a very common kind of subclassed model that uses the tf.keras.Sequential() API inside the model to stack residual blocks.

The problem is that I can save the model without any issue using
model.save(MODEL_DIR, save_format='tf'). However, loading the trained model with tf.keras.models.load_model(MODEL_DIR) fails with the error below. I'm wondering whether using the tf.keras.Sequential() API inside a tf.keras.Model class causes this issue. The error seems to indicate that something is missing from the sequential blocks.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-ce50dde7e33e> in <module>
----> 1 tf.keras.models.load_model(MODEL_DIR)

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py in load_model(filepath, custom_objects, compile)
    188     if isinstance(filepath, six.string_types):
    189       loader_impl.parse_saved_model(filepath)
--> 190       return saved_model_load.load(filepath, compile)
    191 
    192   raise IOError(

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in load(path, compile)
    114   # TODO(kathywu): Add saving/loading of optimizer, compiled losses and metrics.
    115   # TODO(kathywu): Add code to load from objects that contain all endpoints
--> 116   model = tf_load.load_internal(path, loader_cls=KerasObjectLoader)
    117 
    118   # pylint: disable=protected-access

/opt/conda/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py in load_internal(export_dir, tags, loader_cls)
    602       loader = loader_cls(object_graph_proto,
    603                           saved_model_proto,
--> 604                           export_dir)
    605       root = loader.get(0)
    606       if isinstance(loader, Loader):

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in __init__(self, *args, **kwargs)
    186     self._models_to_reconstruct = []
    187 
--> 188     super(KerasObjectLoader, self).__init__(*args, **kwargs)
    189 
    190     # Now that the node object has been fully loaded, and the checkpoint has

/opt/conda/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py in __init__(self, object_graph_proto, saved_model_proto, export_dir)
    121       self._concrete_functions[name] = _WrapperFunction(concrete_function)
    122 
--> 123     self._load_all()
    124     self._restore_checkpoint()
    125 

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _load_all(self)
    213 
    214     # Finish setting up layers and models. See function docstring for more info.
--> 215     self._finalize_objects()
    216 
    217   @property

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _finalize_objects(self)
    508 
    509     # Initialize graph networks, now that layer dependencies have been resolved.
--> 510     self._reconstruct_all_models()
    511 
    512   def _unblock_model_reconstruction(self, layer_id, layer):

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _reconstruct_all_models(self)
    539       raise ValueError('Error when loading from SavedModel -- the following '
    540                        'models could not be initialized: {}'
--> 541                        .format(uninitialized_model_names))
    542 
    543   def _reconstruct_model(self, model_id, model, layers):

ValueError: Error when loading from SavedModel -- the following models could not be initialized: ['sequential_72', 'sequential_78', 'sequential_63', 'sequential_75', 'sequential_67', 'sequential_73', 'sequential_79', 'sequential_71', 'sequential_62', 'sequential_68', 'sequential_74', 'sequential_66']
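
The names in the message are Keras's auto-generated layer names, so it can help to pull them out and look at which numbers are present and which are skipped. A minimal sketch, assuming the message keeps the format shown in the traceback above; `uninitialized_models` is a hypothetical helper written for illustration, not part of TensorFlow:

```python
import ast
import re


def uninitialized_models(message):
    """Return the model names listed at the end of the loader's error text."""
    # Assumption: the message ends with a Python-style list literal,
    # as it does in the traceback quoted in this issue.
    match = re.search(r"could not be initialized: (\[.*\])", message)
    if match is None:
        return []
    return ast.literal_eval(match.group(1))


message = (
    "Error when loading from SavedModel -- the following models could not be "
    "initialized: ['sequential_72', 'sequential_78', 'sequential_63', "
    "'sequential_75', 'sequential_67', 'sequential_73', 'sequential_79', "
    "'sequential_71', 'sequential_62', 'sequential_68', 'sequential_74', "
    "'sequential_66']"
)

names = uninitialized_models(message)
# Sort by the numeric suffix so gaps in the numbering are easy to spot.
names_sorted = sorted(names, key=lambda name: int(name.rsplit("_", 1)[1]))
print(len(names_sorted), names_sorted)
```

Sorted this way, the list covers 12 of the `Sequential` objects and skips several suffixes in between, which shows that only some of the model's `Sequential` blocks fail to load.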

Model construction

import tensorflow as tf


class BottleNeck(tf.keras.layers.Layer):
    expansion = 4 

    def __init__(self, in_channels, out_channels, stride=1):
        super(BottleNeck, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters=out_channels, kernel_size=(1, 1), use_bias=False)  
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.pad1 = tf.keras.layers.ZeroPadding2D(padding=(1, 1))
        self.conv2 = tf.keras.layers.Conv2D(filters=out_channels, kernel_size=(3, 3), strides=(stride, stride), use_bias=False) 
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.conv3 = tf.keras.layers.Conv2D(filters=out_channels * BottleNeck.expansion, kernel_size=(1, 1), use_bias=False)
        self.bn3 = tf.keras.layers.BatchNormalization()
        
        self.downsample = tf.keras.Sequential()

        if stride != 1 or in_channels != out_channels * BottleNeck.expansion:            
            self.downsample.add(tf.keras.layers.Conv2D(filters=out_channels * BottleNeck.expansion, kernel_size=(1, 1), strides=(stride, stride), use_bias=False))
            self.downsample.add(tf.keras.layers.BatchNormalization())
  
    def call(self, inputs, training=None):
        out = self.conv1(inputs)
        out = self.bn1(out)
        out = tf.nn.relu(out)

        out = self.pad1(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = tf.nn.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)
        
        down = self.downsample(inputs)
        out += down
        out = tf.nn.relu(out)

        return out

    
class ResNet(tf.keras.Model):
    def __init__(self, dataset, block, num_blocks, num_classes):
        super(ResNet, self).__init__()        
        self.dataset = dataset
        if self.dataset.startswith('cifar'):
            self.in_channels = 64
            self.pad1 = tf.keras.layers.ZeroPadding2D(padding=(1, 1))
            self.conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), use_bias=False)  
            self.bn1 = tf.keras.layers.BatchNormalization()
            self.relu = tf.nn.relu

            self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
            self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
            self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2) 
            self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2) 
            self.avgpool = tf.keras.layers.AveragePooling2D(pool_size=(4, 4))
            self.fc = tf.keras.layers.Dense(num_classes)

    def _make_layer(self, block, out_channels, num_blocks, stride=1, training=None):     
        strides = [stride] + [1] * (num_blocks - 1)
        layers = tf.keras.Sequential()        
        for stride in strides:
            layers.add(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels * block.expansion
            
        return layers

    def call(self, x, training=None):
        if self.dataset == 'cifar10' or self.dataset == 'cifar100':
            
            x = self.pad1(x)
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.relu(x)
            x = self.layer1(x, training=training)
            x = self.layer2(x, training=training)
            x = self.layer3(x, training=training)
            x = self.layer4(x, training=training)
            x = self.avgpool(x)
            x = tf.keras.layers.Flatten()(x)
            x = self.fc(x)

        return x
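
One way to narrow this down, sketched here in plain Python with no TensorFlow dependency, is to replay the constructor logic above and count how many `downsample` blocks never get a layer added. The stage sizes `[3, 4, 6, 3]` are the standard ResNet-50 configuration and are an assumption here, since the issue does not show how `ResNet` was instantiated. `count_downsample_blocks` is a hypothetical helper.

```python
EXPANSION = 4  # mirrors BottleNeck.expansion


def count_downsample_blocks(num_blocks,
                            stage_channels=(64, 128, 256, 512),
                            stage_strides=(1, 2, 2, 2),
                            in_channels=64):
    """Replay ResNet._make_layer and count filled vs. empty downsample blocks."""
    filled = 0
    empty = 0
    for blocks, out_channels, first_stride in zip(
            num_blocks, stage_channels, stage_strides):
        strides = [first_stride] + [1] * (blocks - 1)
        for stride in strides:
            # Same condition as in BottleNeck.__init__: layers are only
            # added to the Sequential when the shapes would not match.
            if stride != 1 or in_channels != out_channels * EXPANSION:
                filled += 1
            else:
                empty += 1
            in_channels = out_channels * EXPANSION
    return filled, empty


# Assumed ResNet-50 stage sizes; not stated in the issue.
filled, empty = count_downsample_blocks([3, 4, 6, 3])
print(filled, empty)
```

Under that assumption, 12 blocks keep an empty `tf.keras.Sequential()`, which equals the number of models named in the error. This suggests, but does not prove, that the empty `Sequential` objects are the ones the loader cannot rebuild. If so, creating `self.downsample` only when it has layers, and adding `inputs` directly otherwise, may be worth trying. Nobody in the thread has confirmed this.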

@jis478 jis478 added the type:others issues not falling in bug, perfromance, support, build and install or feature label Jul 2, 2020
@ravikyram
Contributor

@jis478

Can you please provide a Colab link or a complete code snippet, along with any supporting files, so that we can reproduce the issue in our environment? It helps us localize the issue faster. Thanks!

@ravikyram ravikyram added the stat:awaiting response Status - Awaiting response from author label Jul 3, 2020

jis478 commented Jul 3, 2020

> @jis478
>
> Can you please provide a Colab link or a complete code snippet, along with any supporting files, so that we can reproduce the issue in our environment? It helps us localize the issue faster. Thanks!

Here is the link:
https://colab.research.google.com/drive/1pBpAwl66CJc5wlPVcO-TX7b5F-LCXew4?usp=sharing

The last line, which loads the model, raises the initialization error. Thanks!

@ravikyram
Contributor

I have tried this in Colab with TF 2.2, 2.3-rc0, and the nightly build, and was able to reproduce the issue. Please find the gist here. Thanks!

@ravikyram ravikyram added comp:keras Keras related issues TF 2.2 Issues related to TF 2.2 type:bug Bug and removed type:others issues not falling in bug, perfromance, support, build and install or feature stat:awaiting response Status - Awaiting response from author labels Jul 3, 2020
@ravikyram ravikyram assigned gowthamkpr and unassigned ravikyram Jul 3, 2020

jis478 commented Jul 5, 2020

Any updates? I'm looking forward to the solution. Thank you for the support! :)

@gowthamkpr gowthamkpr assigned k-w-w and unassigned gowthamkpr Jul 13, 2020
@gowthamkpr gowthamkpr added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jul 13, 2020

jis478 commented Jul 21, 2020

Any updates? I'm looking forward to the solution. Thank you for the support! :)

@ravikyram
Contributor

I have tried this in Colab with TF 2.3 and the nightly build (2.4.0-dev20200807), and was able to reproduce the issue. Please find the gist here. Thanks!

@ravikyram ravikyram added TF 2.3 Issues related to TF 2.3 and removed TF 2.2 Issues related to TF 2.2 labels Aug 7, 2020
@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Aug 9, 2020
@saikumarchalla

@jis478 I tried to reproduce the issue in TF 2.5. It seems that loading the saved model has been fixed, but I am now getting a different error. Please check the gist here.

@saikumarchalla saikumarchalla added the stat:awaiting response Status - Awaiting response from author label Jun 17, 2021
@saikumarchalla

@jis478 Please let us know if there is any update on this issue. Thanks!

@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 24, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.


@ai1361720220000

Has anyone solved this problem?
