Keras subclassed model layers' output shape detection (e.g. for summary) #25036

inoryy · 2019-01-18T20:19:30Z

System information

TensorFlow version (you are using): 1.12
Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.

Currently, output shape inference for all layers is essentially disabled if the model is subclassed, which results in a rather vague "multiple" in the model summary. Example:

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs, **kwargs):
        return self.dense(inputs)

model = MyModel()
model.build(input_shape=(None, 1))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                multiple                  2         
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________

After investigating the source code I understand why this happens from a technical point of view, but it is quite frustrating for an end user, especially given that it is quite trivial to infer the shapes manually with the given information. I think there should be a way to force layer.output_shape recalculation if model.build() is called manually. There is of course a risk of mismatching the actual input shapes, but enforcing that would be the responsibility of the .build() caller.

Alternatively, allow a runtime input shape argument (or several) to be passed to model.summary(), which could be used for one-time shape detection and validation.
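As a rough illustration of this second option, the shape detection can be done outside of summary() by tracing the model once on a symbolic input. The helper name summary_with_input_shape is hypothetical and is not part of tf.keras. The sketch assumes that call is static (no input-dependent branching) and takes a single tensor.

```python
import tensorflow as tf


class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.dense(inputs)


def summary_with_input_shape(model, input_shape):
    """Hypothetical helper, not a tf.keras API.

    Traces model.call once on a symbolic input with the given per-sample
    shape (batch dimension excluded) and returns a functional wrapper that
    shares the same layers, so its summary can report concrete shapes.
    """
    inputs = tf.keras.Input(shape=input_shape)
    outputs = model.call(inputs)
    wrapper = tf.keras.Model(inputs=inputs, outputs=outputs)
    wrapper.summary()
    return wrapper


wrapper = summary_with_input_shape(MyModel(), input_shape=(1,))
```

The wrapper only exists for inspection. The original subclassed model is still the object that would be trained, and if the shape passed here does not match the real data, the reported shapes will be wrong.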

Will this change the current api? How?

No, not directly. It could possibly affect internal checks if layer.output_shape is manually overridden. However, that parameter is currently completely unused for subclassed models, so the risk should be low.

Who will benefit with this feature?

End users of subclassed model providers who could be interested in inspecting model topology.

Any Other info.

I am willing to contribute either option with a little bit of guidance on the direction.


jashrathod · 2019-01-21T04:36:21Z

Hi.
Can I help with this issue?

jvishnuvardhan · 2019-01-22T17:05:23Z

@jashrathod0 Please feel free to support any issue here. This is a community helping others in the community. Thanks for coming forward.

aminemosbah · 2019-01-23T17:46:00Z

Hi, great post.
I'm trying to run the balloon detection example from Mask_RCNN and I get stuck with this error:

input that isn't a symbolic tensor

Any suggestions for working around this?

marcown · 2019-03-13T14:44:17Z

Are you working on this? Need help?

same happens for tensorflow 2.0

hadim · 2019-05-28T16:04:24Z

Same issue here on TF 2 nightly.

ziyigogogo · 2019-06-06T09:15:06Z

Same issue here. Is there any way to make a subclassed model able to do things like "model.summary()"?

Ja1r0 · 2019-06-07T04:13:07Z

I have run into the same issue. In the tf.keras API, when you create a model by defining a subclass and implementing the forward pass in the call method, no TF graph is actually built.
The layers in model.layers can't provide the attributes layer.input_shape and layer.output_shape. This is because layer._inbound_nodes is an empty list, and the definition of layer.input_shape (and likewise layer.output_shape) contains this check:

if not self._inbound_nodes:
  raise AttributeError('The layer has never been called '
                       'and thus has no defined input shape.')

When a symbolic tensor is passed to a layer, a Node instance (defined in keras.engine.base_layer.py) is appended to the layer._inbound_nodes list. But when the model is created by subclassing and the forward pass is implemented in the call method, it seems that no symbolic tensors are passed to the individual layers. So layer._inbound_nodes stays an empty list, and the attribute layer.input_shape cannot be read. Here is an example (the model.summary() method also needs layer.input_shape):

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
    
    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()
layers = model.layers

for layer in layers:
    name = layer.name
    input_shape = layer.input_shape
    output_shape = layer.output_shape
    print('%s   input shape: %s, output_shape: %s.\n' % (name, input_shape, output_shape))

And this is the traceback:

Traceback (most recent call last):
  File "draft.py", line 195, in <module>
    test8()
  File "draft.py", line 159, in test8
    input_shape = layer.input_shape
  File "C:\Users\Jairo\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1118, in input_shape
    raise AttributeError('The layer has never been called '
AttributeError: The layer has never been called and thus has no defined input shape.

If the model is created with the functional API or with Sequential, things go well:

import tensorflow as tf

inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

or

import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(32, input_shape=(500,)))
model.add(tf.keras.layers.Dense(32))

But when the model structure is complicated, I prefer to use a subclassed model and define the forward pass in call. It would be very convenient if layer.input_shape and layer.output_shape were available for models created by subclassing.

svobora · 2019-06-20T08:06:59Z

This is a serious issue, rendering subclassing the Model class unusable. Anyone working on this?

fchollet · 2019-06-20T18:05:08Z

You can do all these things (printing input / output shapes) in a Functional or Sequential model because these models are static graphs of layers.

In contrast, a subclassed model is a piece of Python code (a call method). There is no graph of layers here. We cannot know how layers are connected to each other (because that's defined in the body of call, not as an explicit data structure), so we cannot infer input / output shapes.

Please see this detailed explanation of the differences between Functional/Sequential models and imperative models: https://medium.com/tensorflow/what-are-symbolic-and-imperative-apis-in-tensorflow-2-0-dfccecb01021
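To make the point about call being arbitrary Python code concrete, here is a minimal sketch of a subclassed model whose wiring is only decided when it is called. The class name BranchingModel and the use_wide argument are made up for this illustration.

```python
import tensorflow as tf


class BranchingModel(tf.keras.Model):
    """Illustrative model whose layer wiring is decided at call time."""

    def __init__(self):
        super().__init__()
        self.narrow = tf.keras.layers.Dense(1)
        self.wide = tf.keras.layers.Dense(8)

    def call(self, inputs, use_wide=False):
        # Plain Python control flow: which layer runs is not recorded in any
        # data structure that summary() could inspect ahead of time.
        if use_wide:
            return self.wide(inputs)
        return self.narrow(inputs)


model = BranchingModel()
batch = tf.zeros((2, 3))
narrow_out = model(batch)                # only self.narrow is used
wide_out = model(batch, use_wide=True)   # only self.wide is used
print(narrow_out.shape, wide_out.shape)
```

The same model object produces outputs of two different widths, so there is no single output shape that a summary could report for it without actually running call.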

svobora · 2019-06-20T18:32:01Z

@fchollet Thank you for the explanation. It seems that there is a misunderstanding in the purpose of subclassing Model. Every tutorial I've seen uses that as encapsulation of static graphs e.g. as ResNet blocks. However, now I understand that they are meant for dynamic graphs, where each call could evaluate different graph based on input parameters.

zh794390558 · 2019-07-24T12:10:19Z

@fchollet Thanks.

randolf-scholz · 2019-08-19T19:39:48Z

@inoryy and anyone interested, I found a dirty workaround that basically mixes the functional API and model subclassing by using __new__:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten

class MyModel(tf.keras.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(self._inputs, self._outputs)
        self.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
        
    def __new__(cls, input_shape, num_classes, neurons=100, layers=2):
        cls._inputs = Input( input_shape, dtype=tf.float32, name='inputs' )
        x = Flatten()(cls._inputs)
        for k in range(layers):
            x =  Dense(neurons)(x)
        cls._outputs = Dense(num_classes)(x)
        
        return super().__new__(cls)
    
model = MyModel( (28,28), 10 )
model.summary()

This can be useful for example if one wants to write a grid-search wrapper that works with either tensorflow/pytorch/scikit-learn type models.

for HPC in HPCs:
   model = MyModel(**HPC)  # initialize with specific HyperParameterCombination
   model.fit(train_data)
   score = model.evaluate(test_data)

ixez · 2019-10-15T07:54:04Z

I borrowed the solution from here, adding a model method that infers the model shapes explicitly.

The output shape is especially uncertain with a Dense layer, as it depends on the input shape, so it needs to be inferred with a specific input shape.

import tensorflow as tf
from tensorflow.keras.layers import Input

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs, **kwargs):
        return self.dense(inputs)

    def model(self):
        x = Input(shape=(1,))  # shape must be a tuple; (1) is just the int 1
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

MyModel().model().summary()

danmou · 2019-12-05T10:08:52Z

I created a mixin class that solves this problem for static models defined with the subclassing API. It automatically defines input and output shapes and dtypes the first time the model or layer is called. See the gist for a complete example, but here's a demonstration:

class ExampleNetwork(Model):
    def __init__(self) -> None:
        super().__init__()
        self.encoder = Sequential([
            Conv2D(filters=32, kernel_size=3, strides=2, activation='relu'),
            Flatten(),
        ])
        self.concat = Concatenate(axis=-1)
        self.dense = Dense(units=100)

    def call(self, inputs: List[tf.Tensor]) -> tf.Tensor:
        encoded = self.encoder(inputs[0])
        joined = self.concat([encoded] + inputs[1:])
        return self.dense(joined)

model = ExampleNetwork()
first_batch = [tf.zeros((1, 64, 64, 3)), tf.zeros((1, 10))]
model(first_batch)
model.summary()
# Model: "example_network_1"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #
# =================================================================
# sequential_1 (Sequential)    (None, 30752)             896
# _________________________________________________________________
# concatenate (Concatenate)    (None, 30762)             0
# _________________________________________________________________
# dense (Dense)                (None, 100)               3076300
# =================================================================
# Total params: 3,077,196
# Trainable params: 3,077,196
# Non-trainable params: 0
# _________________________________________________________________
print(model.input_shape)
# ListWrapper([TensorShape([None, 64, 64, 3]), TensorShape([None, 10])])

@fchollet and other devs, any chance this (or something similar) could become part of tf.keras.layers.Layer when it's not explicitly initialized as dynamic? I find the subclassing API better suited for complex projects (I assume that's also why it's used by all the built-in Keras layers) and this would make it just as easy to use as the functional API.

JamesMcGuigan · 2020-02-20T22:25:56Z

It took a few attempts to get this right.

The first hack method is simply to run dummy data through the model

model = ClassCNN()
model( tf.random.uniform(shape=(128,28,28,1)) )  # required to call .summary() before .fit()
model.summary()
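A self-contained version of this first approach might look like the sketch below. The helper name build_with_dummy_batch is hypothetical, and the model is the single-Dense example from the top of this issue rather than ClassCNN. Running one sample through the model guarantees that the weights exist, so summary() does not fail. Depending on the TensorFlow version, the output shape column may still read "multiple".

```python
import tensorflow as tf


class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.dense(inputs)


def build_with_dummy_batch(model, sample_shape, dtype=tf.float32):
    """Hypothetical helper: run a single all-zeros sample through the model.

    sample_shape excludes the batch dimension. Calling the model once
    creates the weights of every layer that call actually uses.
    """
    dummy = tf.zeros((1,) + tuple(sample_shape), dtype=dtype)
    return model(dummy)


model = MyModel()
output = build_with_dummy_batch(model, sample_shape=(1,))
model.summary()
```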

However, a more robust solution involves explicitly calling .build(input_shape) on the key layers, and then on the model itself, inside __init__().

Not all the layers need to be defined, as some information can be implied. Define the first "real" layer (Flatten() is not a layer), plus any Conv2D() and Dropout() layers, which might require a little bit of math.
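The "little bit of math" can be written down in plain Python. The sketch below assumes padding='valid' (the Keras default), stride 1 for the convolutions, and a pooling stride equal to the pool size. The function names are made up for this illustration and are not Keras APIs. The resulting values agree with the SequentialCNN summary in the output further down.

```python
def conv2d_valid_output(shape, filters, kernel_size=(3, 3), strides=(1, 1)):
    """Output (height, width, channels) of a Conv2D with padding='valid'."""
    height, width, _ = shape
    out_h = (height - kernel_size[0]) // strides[0] + 1
    out_w = (width - kernel_size[1]) // strides[1] + 1
    return (out_h, out_w, filters)


def maxpool2d_output(shape, pool_size=(2, 2)):
    """Output shape of MaxPooling2D with padding='valid', strides == pool_size."""
    height, width, channels = shape
    return (height // pool_size[0], width // pool_size[1], channels)


shape = (28, 28, 1)
shape = conv2d_valid_output(shape, filters=32)   # (26, 26, 32)
shape = conv2d_valid_output(shape, filters=64)   # (24, 24, 64)
shape = maxpool2d_output(shape)                  # (12, 12, 64)
flattened = shape[0] * shape[1] * shape[2]       # 9216 inputs to the first Dense
print(shape, flattened)
```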

Here are some working examples of Keras syntax, which fix the model.summary() before model.fit() bug:

https://github.com/JamesMcGuigan/kaggle-digit-recognizer/tree/master/src/keras/examples

class ClassCNN(tf.keras.Model):

    def __init__(self, input_shape, output_shape, **kwargs):
        super(ClassCNN, self).__init__()
        self._input_shape  = input_shape   # = (28, 28, 1)
        self._output_shape = output_shape  # = 10

        self.conv1      = Conv2D(32, kernel_size=(3, 3), activation=tf.nn.relu)
        self.conv2      = Conv2D(64, kernel_size=(3, 3), activation=tf.nn.relu)
        self.maxpool    = MaxPooling2D(pool_size=(2, 2))
        self.dropout1   = Dropout(0.25, name='dropout1')
        self.flatten    = Flatten()
        self.dense1     = Dense(128, activation=tf.nn.relu)
        self.dropout2   = Dropout(0.5, name='dropout2')
        self.activation = Dense(self._output_shape, activation=tf.nn.softmax)

        self.conv1.build(     (None,) + input_shape )
        self.conv2.build(     (None,) + tuple(np.subtract(input_shape[:-1],2)) + (32,) )
        self.maxpool.build(   (None,) + tuple(np.subtract(input_shape[:-1],4)) + (64,) )
        self.dropout1.build( tuple(np.floor_divide(np.subtract(input_shape[:-1],4),2)) + (64,) )
        self.dropout2.build( 128 )
        self.build(           (None,) + input_shape)


    def call(self, x, training=False, **kwargs):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.maxpool(x)
        if training:  x = self.dropout1(x)
        x = self.flatten(x)
        x = self.dense1(x)
        if training:  x = self.dropout2(x)
        x = self.activation(x)
        return x

class ClassNN(tf.keras.Model):

    def __init__(self, input_shape, output_shape, **kwargs):
        super(ClassNN, self).__init__()
        self._input_shape  = np.prod(input_shape)  # = (28, 28, 1) = 784
        self._output_shape = output_shape          # = 10

        self.flatten    = tf.keras.layers.Flatten()
        self.dense1     = tf.keras.layers.Dense(128, activation=tf.nn.relu, )
        self.dropout    = tf.keras.layers.Dropout(0.2)
        self.dense2     = tf.keras.layers.Dense(128, activation=tf.nn.softmax)
        self.activation = tf.keras.layers.Dense(self._output_shape, activation=tf.nn.softmax)

        self.dense1.build( self._input_shape)
        self.dropout.build(128)
        self.build( (None, self._input_shape) )


    def call(self, inputs, training=False, **kwargs):
        x = self.flatten(inputs)
        x = self.dense1(x)
        if training: x = self.dropout(x)
        x = self.dense2(x)
        x = self.activation(x)
        return x

def FunctionalCNN(input_shape, output_shape):
    inputs = Input(shape=input_shape)
    x = Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
    x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.25)(x)
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(output_shape, activation='softmax')(x)

    model = Model(inputs, x, name="FunctionalCNN")
    plot_model(model, to_file=os.path.join(os.path.dirname(__file__), "FunctionalCNN.png"))
    return model

def SequentialCNN(input_shape, output_shape):
    model = Sequential()
    model.add( Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape) )
    model.add( Conv2D(64, (3, 3), activation='relu') )
    model.add( MaxPooling2D(pool_size=(2, 2)) )
    model.add( Dropout(0.25) )
    model.add( Flatten() )
    model.add( Dense(128, activation='relu') )
    model.add( Dropout(0.5) )
    model.add( Dense(output_shape, activation='softmax') )

    plot_model(model, to_file=os.path.join(os.path.dirname(__file__), "SequentialCNN.png"))
    return model

main loop

#!/usr/bin/env python3
import multiprocessing
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'  # 0, 1, 2, 3  # Disable TensorFlow logging
os.chdir( os.path.dirname( os.path.abspath(__file__) ) )

import tensorflow as tf
import tensorflow.keras as keras
import time

from src.dataset import DataSet
from src.keras.examples.ClassCNN import ClassCNN
from src.keras.examples.ClassNN import ClassNN
from src.keras.examples.FunctionalCNN import FunctionalCNN
from src.keras.examples.SequentialCNN import SequentialCNN
from src.utils.csv import predict_to_csv

tf.random.set_seed(42)

timer_start = time.time()

dataset = DataSet()
config = {
    "verbose":      False,
    "epochs":       12,
    "batch_size":   128,
    "input_shape":  dataset.input_shape(),
    "output_shape": dataset.output_shape(),
}
print("config", config)

# BUG: ClassCNN accuracy is only 36% compared to 75% for SequentialCNN / FunctionalCNN
# SequentialCNN   validation: | loss: 1.3756675141198293 | accuracy: 0.7430952
# FunctionalCNN   validation: | loss: 1.4285654685610816 | accuracy: 0.7835714
# ClassCNN        validation: | loss: 1.9851970995040167 | accuracy: 0.36214286
# ClassNN         validation: | loss: 2.302224604288737  | accuracy: 0.09059524
models = {
    "SequentialCNN": SequentialCNN(
        input_shape=dataset.input_shape(),
        output_shape=dataset.output_shape()
    ),
    "FunctionalCNN": FunctionalCNN(
        input_shape=dataset.input_shape(),
        output_shape=dataset.output_shape()
    ),
    "ClassCNN": ClassCNN(
        input_shape=dataset.input_shape(),
        output_shape=dataset.output_shape()
    ),
    "ClassNN":  ClassNN(
        input_shape=dataset.input_shape(),
        output_shape=dataset.output_shape()
    )
}


for model_name, model in models.items():
    print(model_name)

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])

    model.summary()

    model.fit(
        dataset.data['train_X'], dataset.data['train_Y'],
        batch_size = config["batch_size"],
        epochs     = config["epochs"],
        verbose    = config["verbose"],
        validation_data = (dataset.data["valid_X"], dataset.data["valid_Y"]),
        use_multiprocessing = True, workers = multiprocessing.cpu_count()
    )

for model_name, model in models.items():
    score = model.evaluate(dataset.data['valid_X'], dataset.data['valid_Y'], verbose=config["verbose"])
    print(model_name.ljust(15), "validation:", '| loss:', score[0], '| accuracy:', score[1])

print("time:", int(time.time() - timer_start), "s")

Output

./src/keras/examples/main.py 
config {'verbose': False, 'epochs': 12, 'batch_size': 128, 'input_shape': (28, 28, 1), 'output_shape': 10}
SequentialCNN
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 9216)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               1179776   
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
FunctionalCNN
Model: "FunctionalCNN"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               1179776   
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
ClassCNN
Model: "class_cnn"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            multiple                  320       
_________________________________________________________________
conv2d_5 (Conv2D)            multiple                  18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 multiple                  0         
_________________________________________________________________
dropout1 (Dropout)           multiple                  0         
_________________________________________________________________
flatten_2 (Flatten)          multiple                  0         
_________________________________________________________________
dense_4 (Dense)              multiple                  1179776   
_________________________________________________________________
dropout2 (Dropout)           multiple                  0         
_________________________________________________________________
dense_5 (Dense)              multiple                  1290      
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
ClassNN
Model: "class_nn"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_3 (Flatten)          multiple                  0         
_________________________________________________________________
dense_6 (Dense)              multiple                  100480    
_________________________________________________________________
dropout_4 (Dropout)          multiple                  0         
_________________________________________________________________
dense_7 (Dense)              multiple                  16512     
_________________________________________________________________
dense_8 (Dense)              multiple                  1290      
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
SequentialCNN   validation: | loss: 1.3733067512512207 | accuracy: 0.739881
FunctionalCNN   validation: | loss: 1.438635626293364 | accuracy: 0.77916664
ClassCNN        validation: | loss: 1.3686876785187494 | accuracy: 0.75238097
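Note that the parameter counts are identical across the three CNN variants even where the output shape reads "multiple", because the weights only depend on each layer's input width and not on how the layers are wired. As a sanity check, the counts can be reproduced by hand. The helper names below are made up for this sketch, and it assumes every layer uses a bias term (the Keras default).

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a Conv2D layer."""
    kernel_h, kernel_w = kernel_size
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a Dense layer."""
    return (in_units + 1) * out_units


cnn_params = [
    conv2d_params((3, 3), in_channels=1, filters=32),    # 320
    conv2d_params((3, 3), in_channels=32, filters=64),   # 18496
    dense_params(12 * 12 * 64, 128),                     # 1179776
    dense_params(128, 10),                               # 1290
]
nn_params = [
    dense_params(28 * 28 * 1, 128),                      # 100480
    dense_params(128, 128),                              # 16512
    dense_params(128, 10),                               # 1290
]
print(sum(cnn_params), sum(nn_params))
```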

b-hakim · 2020-08-01T13:45:07Z

I believe it can be handled this way, which seems neat and clean, doesn't it?
https://stackoverflow.com/a/55236388/871418

jvishnuvardhan self-assigned this Jan 18, 2019

jvishnuvardhan added comp:keras Keras related issues type:bug Bug labels Jan 18, 2019

jvishnuvardhan assigned fchollet and unassigned jvishnuvardhan Jan 18, 2019

jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 18, 2019

jvishnuvardhan assigned tanzhenyu May 28, 2019

fchollet closed this as completed Jun 20, 2019

jvishnuvardhan mentioned this issue Oct 16, 2019

Model has not been built yet #33244

Closed

bhack mentioned this issue Aug 4, 2020

_create_keras_history_helper error ('NoneType' object has no attribute 'op') in 2.3.0 not in 2.2.0 #42026

Closed

tgaddair mentioned this issue Aug 20, 2020

Added collect_activations to extract intermediate output tensors ludwig-ai/ludwig#835

Merged
