Error when loading a Subclass model with tf.keras.Sequential blocks inside #41045

jis478 · 2020-07-02T23:37:02Z

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): pip install tensorflow-gpu
TensorFlow version: 2.2.0
Python version: 3.7.0
CUDA/cuDNN version: 11.0 / v8.0.1
GPU model and memory: NVIDIA V100
Problem encountered:

Please excuse me if this is not a bug but a gap in my understanding of model saving. I will take this issue down right away if so.

I've written code to train a ResNet-50 model using the tf.keras.Model and tf.keras.layers.Layer APIs, as shown below. As you can see, this is a very common example of a subclassed model that uses the tf.keras.Sequential() API inside the model to stack residual blocks.

The problem is that I can save the model without any issue using
model.save(MODEL_DIR, save_format='tf'). However, loading the trained model using tf.keras.models.load_model(MODEL_DIR) fails with the error below. I'm wondering whether using the tf.keras.Sequential() API inside a tf.keras.Model class causes this issue. The error seems to indicate that something is missing from the Sequential blocks.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-ce50dde7e33e> in <module>
----> 1 tf.keras.models.load_model(MODEL_DIR)

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py in load_model(filepath, custom_objects, compile)
    188     if isinstance(filepath, six.string_types):
    189       loader_impl.parse_saved_model(filepath)
--> 190       return saved_model_load.load(filepath, compile)
    191 
    192   raise IOError(

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in load(path, compile)
    114   # TODO(kathywu): Add saving/loading of optimizer, compiled losses and metrics.
    115   # TODO(kathywu): Add code to load from objects that contain all endpoints
--> 116   model = tf_load.load_internal(path, loader_cls=KerasObjectLoader)
    117 
    118   # pylint: disable=protected-access

/opt/conda/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py in load_internal(export_dir, tags, loader_cls)
    602       loader = loader_cls(object_graph_proto,
    603                           saved_model_proto,
--> 604                           export_dir)
    605       root = loader.get(0)
    606       if isinstance(loader, Loader):

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in __init__(self, *args, **kwargs)
    186     self._models_to_reconstruct = []
    187 
--> 188     super(KerasObjectLoader, self).__init__(*args, **kwargs)
    189 
    190     # Now that the node object has been fully loaded, and the checkpoint has

/opt/conda/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py in __init__(self, object_graph_proto, saved_model_proto, export_dir)
    121       self._concrete_functions[name] = _WrapperFunction(concrete_function)
    122 
--> 123     self._load_all()
    124     self._restore_checkpoint()
    125 

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _load_all(self)
    213 
    214     # Finish setting up layers and models. See function docstring for more info.
--> 215     self._finalize_objects()
    216 
    217   @property

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _finalize_objects(self)
    508 
    509     # Initialize graph networks, now that layer dependencies have been resolved.
--> 510     self._reconstruct_all_models()
    511 
    512   def _unblock_model_reconstruction(self, layer_id, layer):

/opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _reconstruct_all_models(self)
    539       raise ValueError('Error when loading from SavedModel -- the following '
    540                        'models could not be initialized: {}'
--> 541                        .format(uninitialized_model_names))
    542 
    543   def _reconstruct_model(self, model_id, model, layers):

ValueError: Error when loading from SavedModel -- the following models could not be initialized: ['sequential_72', 'sequential_78', 'sequential_63', 'sequential_75', 'sequential_67', 'sequential_73', 'sequential_79', 'sequential_71', 'sequential_62', 'sequential_68', 'sequential_74', 'sequential_66']

Model construction

import tensorflow as tf


class BottleNeck(tf.keras.layers.Layer):
    expansion = 4 

    def __init__(self, in_channels, out_channels, stride=1):
        super(BottleNeck, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters=out_channels, kernel_size=(1, 1), use_bias=False)  
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.pad1 = tf.keras.layers.ZeroPadding2D(padding=(1, 1))
        self.conv2 = tf.keras.layers.Conv2D(filters=out_channels, kernel_size=(3, 3), strides=(stride, stride), use_bias=False) 
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.conv3 = tf.keras.layers.Conv2D(filters=out_channels * BottleNeck.expansion, kernel_size=(1, 1), use_bias=False)
        self.bn3 = tf.keras.layers.BatchNormalization()
        
        self.downsample = tf.keras.Sequential()

        if stride != 1 or in_channels != out_channels * BottleNeck.expansion:            
            self.downsample.add(tf.keras.layers.Conv2D(filters=out_channels * BottleNeck.expansion, kernel_size=(1, 1), strides=(stride, stride), use_bias=False))
            self.downsample.add(tf.keras.layers.BatchNormalization())
  
    def call(self, inputs, training=None):
        out = self.conv1(inputs)
        out = self.bn1(out)
        out = tf.nn.relu(out)

        out = self.pad1(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = tf.nn.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)
        
        down = self.downsample(inputs)
        out += down
        out = tf.nn.relu(out)

        return out

    
class ResNet(tf.keras.Model):
    def __init__(self, dataset, block, num_blocks, num_classes):
        super(ResNet, self).__init__()        
        self.dataset = dataset
        if self.dataset.startswith('cifar'):
            self.in_channels = 64
            self.pad1 = tf.keras.layers.ZeroPadding2D(padding=(1, 1))
            self.conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), use_bias=False)  
            self.bn1 = tf.keras.layers.BatchNormalization()
            self.relu = tf.nn.relu

            self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
            self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
            self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2) 
            self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2) 
            self.avgpool = tf.keras.layers.AveragePooling2D(pool_size=(4, 4))
            self.fc = tf.keras.layers.Dense(num_classes)

    def _make_layer(self, block, out_channels, num_blocks, stride=1, training=None):     
        strides = [stride] + [1] * (num_blocks - 1)
        layers = tf.keras.Sequential()        
        for stride in strides:
            layers.add(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels * block.expansion
            
        return layers

    def call(self, x, training=None):
        if self.dataset == 'cifar10' or self.dataset == 'cifar100':
            
            x = self.pad1(x)
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.relu(x)
            x = self.layer1(x, training=training)
            x = self.layer2(x, training=training)
            x = self.layer3(x, training=training)
            x = self.layer4(x, training=training)
            x = self.avgpool(x)
            x = tf.keras.layers.Flatten()(x)
            x = self.fc(x)

        return x
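
One observation that may help narrow this down: the error lists 12 Sequential models. The sketch below replays the channel and stride bookkeeping from _make_layer and BottleNeck.__init__ in plain Python (no TensorFlow needed) and counts how many blocks would keep an empty downsample Sequential. It assumes num_blocks = [3, 4, 6, 3], the usual ResNet-50 layout, which the post does not state; count_downsample_blocks is a hypothetical helper name used only for illustration.

```python
EXPANSION = 4  # mirrors BottleNeck.expansion in the post


def count_downsample_blocks(num_blocks, in_channels=64):
    """Return (empty, populated) counts of downsample Sequentials.

    Replays the condition `stride != 1 or in_channels != out_channels * expansion`
    from BottleNeck.__init__ for every block created by _make_layer.
    """
    # (out_channels, stride of the first block) for layer1..layer4 in the post.
    stages = [(64, 1), (128, 2), (256, 2), (512, 2)]
    empty, populated = 0, 0
    for (out_channels, first_stride), n in zip(stages, num_blocks):
        strides = [first_stride] + [1] * (n - 1)
        for stride in strides:
            if stride != 1 or in_channels != out_channels * EXPANSION:
                populated += 1  # Conv2D + BatchNormalization are added
            else:
                empty += 1      # the Sequential stays empty
            in_channels = out_channels * EXPANSION
    return empty, populated


# Assumed ResNet-50 block counts; not stated in the original post.
empty, populated = count_downsample_blocks([3, 4, 6, 3])
print(empty, populated)  # 12 4
```

Under that assumption, the number of empty downsample models equals the number of names in the error message. This is consistent with the empty Sequential being the object the loader cannot initialize, but it does not prove it.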

ravikyram · 2020-07-03T06:54:43Z

@jis478

Can you please provide a Colab link or a complete code snippet, along with any supporting files, so we can reproduce the issue in our environment? It helps us localize the issue faster. Thanks!

jis478 · 2020-07-03T07:58:08Z

Here is the link:
https://colab.research.google.com/drive/1pBpAwl66CJc5wlPVcO-TX7b5F-LCXew4?usp=sharing

The last line, which loads the model, raises the initialization error. Thanks!

ravikyram · 2020-07-03T10:46:02Z

I have tried in Colab with TF versions 2.2, 2.3-rc0, and nightly, and was able to reproduce the issue. Please find the gist here. Thanks!

jis478 · 2020-07-05T23:27:55Z

Any updates? I'm looking forward to the solution. Thank you for the support! :)

jis478 · 2020-07-21T06:17:11Z

Any updates? I'm looking forward to the solution. Thank you for the support! :)

ravikyram · 2020-08-07T10:00:37Z

I have tried in Colab with TF version 2.3 and the nightly version (2.4.0-dev20200807) and was able to reproduce the issue. Please find the gist here. Thanks!

saikumarchalla · 2021-05-31T15:23:27Z

@jis478 I tried to reproduce the issue in TF 2.5. It seems that loading the saved model has been fixed, but I am now getting a different error. Please check the gist here.

saikumarchalla · 2021-06-17T13:44:51Z

@jis478 Please let us know if there is any update on this issue. Thanks!

google-ml-butler · 2021-06-24T14:16:48Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler · 2021-07-03T01:15:55Z

Closing as stale. Please reopen if you'd like to work on this further.

google-ml-butler · 2021-07-03T01:16:22Z

Are you satisfied with the resolution of your issue?

ai1361720220000 · 2021-08-25T07:27:12Z

Has anyone solved this problem?
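
No fix is recorded in this thread. As one thing to try, the sketch below is a variant of the BottleNeck layer from the original post that creates the downsample Sequential only when it actually contains layers and otherwise uses the input as an identity shortcut. It rests on an assumption that the thread does not confirm: that the empty tf.keras.Sequential() instances are what the loader fails to initialize. It has not been checked against the loading error itself.

```python
import tensorflow as tf


class BottleNeck(tf.keras.layers.Layer):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        layers = tf.keras.layers
        self.conv1 = layers.Conv2D(out_channels, (1, 1), use_bias=False)
        self.bn1 = layers.BatchNormalization()
        self.pad1 = layers.ZeroPadding2D(padding=(1, 1))
        self.conv2 = layers.Conv2D(out_channels, (3, 3),
                                   strides=(stride, stride), use_bias=False)
        self.bn2 = layers.BatchNormalization()
        self.conv3 = layers.Conv2D(out_channels * self.expansion, (1, 1),
                                   use_bias=False)
        self.bn3 = layers.BatchNormalization()

        # Assumption: build the projection shortcut only when it has layers,
        # so that no empty Sequential is attached to the block.
        self.downsample = None
        if stride != 1 or in_channels != out_channels * self.expansion:
            self.downsample = tf.keras.Sequential([
                layers.Conv2D(out_channels * self.expansion, (1, 1),
                              strides=(stride, stride), use_bias=False),
                layers.BatchNormalization(),
            ])

    def call(self, inputs, training=None):
        out = tf.nn.relu(self.bn1(self.conv1(inputs), training=training))
        out = self.conv2(self.pad1(out))
        out = tf.nn.relu(self.bn2(out, training=training))
        out = self.bn3(self.conv3(out), training=training)
        # Identity shortcut when no projection is needed.
        if self.downsample is None:
            shortcut = inputs
        else:
            shortcut = self.downsample(inputs, training=training)
        return tf.nn.relu(out + shortcut)


# Shape check on a dummy batch: one identity block, one projection block.
identity_block = BottleNeck(in_channels=256, out_channels=64)
projection_block = BottleNeck(in_channels=256, out_channels=128, stride=2)
x = tf.zeros([1, 8, 8, 256])
identity_out = identity_block(x)
projection_out = projection_block(x)
```

The forward computation is the same as in the original block; only the handling of the shortcut branch changes. Whether this avoids the ValueError on load would still need to be verified on the affected TensorFlow versions.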

jis478 added the type:others issues not falling in bug, perfromance, support, build and install or feature label Jul 2, 2020

google-ml-butler bot assigned ravikyram Jul 2, 2020

ravikyram added the stat:awaiting response Status - Awaiting response from author label Jul 3, 2020

ravikyram added comp:keras Keras related issues TF 2.2 Issues related to TF 2.2 type:bug Bug and removed type:others issues not falling in bug, perfromance, support, build and install or feature stat:awaiting response Status - Awaiting response from author labels Jul 3, 2020

ravikyram assigned gowthamkpr and unassigned ravikyram Jul 3, 2020

gowthamkpr assigned k-w-w and unassigned gowthamkpr Jul 13, 2020

gowthamkpr added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jul 13, 2020

ravikyram added TF 2.3 Issues related to TF 2.3 and removed TF 2.2 Issues related to TF 2.2 labels Aug 7, 2020

tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Aug 9, 2020

saikumarchalla added the stat:awaiting response Status - Awaiting response from author label Jun 17, 2021

google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 24, 2021

google-ml-butler bot closed this as completed Jul 3, 2021
