Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keras subclassed model layers' output shape detection (e.g. for summary) #25036

Closed
inoryy opened this issue Jan 18, 2019 · 17 comments
Closed

Keras subclassed model layers' output shape detection (e.g. for summary) #25036

inoryy opened this issue Jan 18, 2019 · 17 comments
Assignees
Labels
comp:keras Keras related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug

Comments

@inoryy
Copy link

inoryy commented Jan 18, 2019

System information

  • TensorFlow version (you are using): 1.12
  • Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.

Currently output shapes inference of all layers is essentially disabled if model is subclassed, which results in a rather vague "multiple" in model summary. Example:

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs, **kwargs):
        return self.dense(inputs)

model = MyModel()
model.build(input_shape=(None, 1))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                multiple                  2         
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________

After investigating source code I understand why this happens from a technical point of view, but it is quite frustrating for an end user, especially given that it is quite trivial to infer the shapes manually with given information. I think there should be a way to force layer.output_shape recalculation if model.build() is manually called. There is of course a risk of mismatching actual input shapes, but that would be responsibility of the .build() caller to enforce.

Alternatively, allow runtime input(s) shape argument to be passed to model.summary(), that could be used for one-time shape detection and validation.

Will this change the current api? How?

No, not directly. Could possibly affect internal checks if layer.output_shape is manually overriden. However, currently that parameter is completely unused for subclassed models, so risk should be low.

Who will benefit with this feature?

End users of subclassed model providers who could be interested in inspecting model topology.

Any Other info.

I am willing to contribute either option with a little bit of guidance on the direction.

@jvishnuvardhan jvishnuvardhan self-assigned this Jan 18, 2019
@jvishnuvardhan jvishnuvardhan added comp:keras Keras related issues type:bug Bug labels Jan 18, 2019
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 18, 2019
@jashrathod
Copy link

Hi.
Can I help with this issue?

@jvishnuvardhan
Copy link
Contributor

@jashrathod0 Please feel free to support any issue here. This is a community helping others in the community. Thanks for coming forward.

@aminemosbah
Copy link

hi
great post ,
i m trying to run baloonn detection API Mask_RCNN , and i get stuck with this error

input that isn't a symbolic tensor

any suggession to work this out ?

@marcown
Copy link

marcown commented Mar 13, 2019

Are you working on this? Need help?

same happens for tensorflow 2.0

@hadim
Copy link

hadim commented May 28, 2019

Same issue here on TF 2 nightly.

@ziyigogogo
Copy link

Same issue here, is there anyway to make the sub-class model able to do things like "model.summary()"?

@Ja1r0
Copy link

Ja1r0 commented Jun 7, 2019

I confront the same issue. In tf.keras API, when create a model by define subclass and implement forward pass in method call, actually have not build a TF graph.
The layers in model.layers can't get the attributes layer.input_shape and layer.output_shape. This is because the layer._inbound_nodes is an empty list. And in the definition of layer.input_shape(also layer.output_shape), there is a judgement statement:

if not self._inbound_nodes:
  raise AttributeError('The layer has never been called '
                       'and thus has no defined input shape.')

When pass a symbolic tensor to a layer, a Node class (define in keras.engine.base_layer.py) will append to layer._inbound_nodes list. But when create model by subclass and implement forward pass in call method, it seems that haven't pass symbolic tensors to each layer. So the layer._inbound_nodes is empty list, and then can't get the attribute layer.input_shape . This is an example(In model.summary() method also need to get the layer.input_shape):

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
    
    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()
layers = model.layers

for layer in layers:
    name = layer.name
    input_shape = layer.input_shape
    output_shape = layer.output_shape
    print('%s   input shape: %s, output_shape: %s.\n' % (name, input_shape, output_shape))

And this is the traceback:

Traceback (most recent call last):
  File "draft.py", line 195, in <module>
    test8()
  File "draft.py", line 159, in test8
    input_shape = layer.input_shape
  File "C:\Users\Jairo\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\k
eras\engine\base_layer.py", line 1118, in input_shape
    raise AttributeError('The layer has never been called '
AttributeError: The layer has never been called and thus has no defined input sh
ape.

If create a model by functional API or model.Sequential, things go well:

import tensorflow as tf

inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

or

import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(32, input_shape=(500,)))
model.add(tf.keras.layers.Dense(32))

But when the model structure is complicated, I choose to use subclass model and define forward pass in call. If can get layer.input_shape and layer.output_shape when create model by subclass, it will be very convenient.

@svobora
Copy link
Contributor

svobora commented Jun 20, 2019

This is a serious issue, rendering subclassing the Model class unusable. Anyone working on this?

@fchollet
Copy link
Member

You can do all these things (printing input / output shapes) in a Functional or Sequential model because these models are static graphs of layers.

In contrast, a subclassed model is a piece of Python code (a call method). There is no graph of layers here. We cannot know how layers are connected to each other (because that's defined in the body of call, not as an explicit data structure), so we cannot infer input / output shapes.

Please see this detailed explanation of the differences between Functional/Sequential models and imperative models: https://medium.com/tensorflow/what-are-symbolic-and-imperative-apis-in-tensorflow-2-0-dfccecb01021

@tensorflow-bot
Copy link

Are you satisfied with the resolution of your issue?
Yes
No

@svobora
Copy link
Contributor

svobora commented Jun 20, 2019

@fchollet Thank you for the explanation. It seems that there is a misunderstanding in the purpose of subclassing Model. Every tutorial I've seen uses that as encapsulation of static graphs e.g. as ResNet blocks. However, now I understand that they are meant for dynamic graphs, where each call could evaluate different graph based on input parameters.

@zh794390558
Copy link
Contributor

@fchollet Thanks.

@randolf-scholz
Copy link

@inoryy and anyone interested, I found a dirty workaround that basically mixes functional API and model subclassing by using __new__

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten

class MyModel(tf.keras.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(self._inputs, self._outputs)
        self.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
        
    def __new__(cls, input_shape, num_classes, neurons=100, layers=2):
        cls._inputs = Input( input_shape, dtype=tf.float32, name='inputs' )
        x = Flatten()(cls._inputs)
        for k in range(layers):
            x =  Dense(neurons)(x)
        cls._outputs = Dense(num_classes)(x)
        
        return super().__new__(cls)
    
model = MyModel( (28,28), 10 )
model.summary()

This can be useful for example if one wants to write a grid-search wrapper that works with either tensorflow/pytorch/scikit-learn type models.

for HPC in HPCs:
   model = MyModel(**HPC)  # initialize with specific HyperParameterCombination
   model.fit(train_data)
   score = model.evaluate(test_data)

@ixez
Copy link

ixez commented Oct 15, 2019

I borrow the solution from here by adding a model method to infer the model explicitly.

The output shape is especially uncertain when with Dense layer as it depends on the input shape, so it needs to be inferred with a certain input shape.

import tensorflow as tf
from tensorflow.keras.layers import Input

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs, **kwargs):
        return self.dense(inputs)

    def model(self):
        x = Input(shape=(1))
        return Model(inputs=[x], outputs=self.call(x))

MyModel().model().summary()

@danmou
Copy link

danmou commented Dec 5, 2019

I created a mixin class that solves this problem for static models defined with the subclassing API. It automatically defines input and output shapes and dtypes the first time the model or layer is called. See the gist for a complete example, but here's a demonstration:

class ExampleNetwork(Model):
    def __init__(self) -> None:
        super().__init__()
        self.encoder = Sequential([
            Conv2D(filters=32, kernel_size=3, strides=2, activation='relu'),
            Flatten(),
        ])
        self.concat = Concatenate(axis=-1)
        self.dense = Dense(units=100)

    def call(self, inputs: List[tf.Tensor]) -> tf.Tensor:
        encoded = self.encoder(inputs[0])
        joined = self.concat([encoded] + inputs[1:])
        return self.dense(joined)

model = ExampleNetwork()
first_batch = [tf.zeros((1, 64, 64, 3)), tf.zeros((1, 10))]
model(first_batch)
model.summary()
# Model: "example_network_1"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #
# =================================================================
# sequential_1 (Sequential)    (None, 30752)             896
# _________________________________________________________________
# concatenate (Concatenate)    (None, 30762)             0
# _________________________________________________________________
# dense (Dense)                (None, 100)               3076300
# =================================================================
# Total params: 3,077,196
# Trainable params: 3,077,196
# Non-trainable params: 0
# _________________________________________________________________
print(model.input_shape)
# ListWrapper([TensorShape([None, 64, 64, 3]), TensorShape([None, 10])])

@fchollet and other devs, any chance this (or something similar) could become part of tf.keras.layers.Layer when it's not explicitly initialized as dynamic? I find the subclassing API better suited for complex projects (I assume that's also why it's used by all the built-in Keras layers) and this would make it just as easy to use as the functional API.

@JamesMcGuigan
Copy link

JamesMcGuigan commented Feb 20, 2020

It took a few attempts to get this right.

The first hack method is simply to run dummy data through the model

model = ClassCNN()
model( tf.random.uniform(shape=(128,28,28,1)) )  # required to call .summary() before .fit()
model.summary()

However a more robust solution involves explictly .build(input_size) on the key layers, and then on the model itself inside __init__().

Not all the layers need to be defined, as some information can be implied. Define the first "real" layer (Flatten() is not a layer), plus any Conv2D() and Dropout() layers, which might require a little bit of math.

Here are some working examples of keras syntax, which fixes the model.summary() before model.fit() bug:

class ClassCNN(tf.keras.Model):

    def __init__(self, input_shape, output_shape, **kwargs):
        super(ClassCNN, self).__init__()
        self._input_shape  = input_shape   # = (28, 28, 1)
        self._output_shape = output_shape  # = 10

        self.conv1      = Conv2D(32, kernel_size=(3, 3), activation=tf.nn.relu)
        self.conv2      = Conv2D(64, kernel_size=(3, 3), activation=tf.nn.relu)
        self.maxpool    = MaxPooling2D(pool_size=(2, 2))
        self.dropout1   = Dropout(0.25, name='dropout1')
        self.flatten    = Flatten()
        self.dense1     = Dense(128, activation=tf.nn.relu)
        self.dropout2   = Dropout(0.5, name='dropout2')
        self.activation = Dense(self._output_shape, activation=tf.nn.softmax)

        self.conv1.build(     (None,) + input_shape )
        self.conv2.build(     (None,) + tuple(np.subtract(input_shape[:-1],2)) + (32,) )
        self.maxpool.build(   (None,) + tuple(np.subtract(input_shape[:-1],4)) + (64,) )
        self.dropout1.build( tuple(np.floor_divide(np.subtract(input_shape[:-1],4),2)) + (64,) )
        self.dropout2.build( 128 )
        self.build(           (None,) + input_shape)


    def call(self, x, training=False, **kwargs):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.maxpool(x)
        if training:  x = self.dropout1(x)
        x = self.flatten(x)
        x = self.dense1(x)
        if training:  x = self.dropout2(x)
        x = self.activation(x)
        return x
class ClassNN(tf.keras.Model):

    def __init__(self, input_shape, output_shape, **kwargs):
        super(ClassNN, self).__init__()
        self._input_shape  = np.prod(input_shape)  # = (28, 28, 1) = 784
        self._output_shape = output_shape          # = 10

        self.flatten    = tf.keras.layers.Flatten()
        self.dense1     = tf.keras.layers.Dense(128, activation=tf.nn.relu, )
        self.dropout    = tf.keras.layers.Dropout(0.2)
        self.dense2     = tf.keras.layers.Dense(128, activation=tf.nn.softmax)
        self.activation = tf.keras.layers.Dense(self._output_shape, activation=tf.nn.softmax)

        self.dense1.build( self._input_shape)
        self.dropout.build(128)
        self.build( (None, self._input_shape) )


    def call(self, inputs, training=False, **kwargs):
        x = self.flatten(inputs)
        x = self.dense1(x)
        if training: x = self.dropout(x)
        x = self.dense2(x)
        x = self.activation(x)
        return x
def FunctionalCNN(input_shape, output_shape):
    inputs = Input(shape=input_shape)
    x = Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
    x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.25)(x)
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(output_shape, activation='softmax')(x)

    model = Model(inputs, x, name="FunctionalCNN")
    plot_model(model, to_file=os.path.join(os.path.dirname(__file__), "FunctionalCNN.png"))
    return model
def SequentialCNN(input_shape, output_shape):
    model = Sequential()
    model.add( Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape) )
    model.add( Conv2D(64, (3, 3), activation='relu') )
    model.add( MaxPooling2D(pool_size=(2, 2)) )
    model.add( Dropout(0.25) )
    model.add( Flatten() )
    model.add( Dense(128, activation='relu') )
    model.add( Dropout(0.5) )
    model.add( Dense(output_shape, activation='softmax') )

    plot_model(model, to_file=os.path.join(os.path.dirname(__file__), "SequentialCNN.png"))
    return model

main loop

#!/usr/bin/env python3
import multiprocessing
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'  # 0, 1, 2, 3  # Disable Tensortflow Logging
os.chdir( os.path.dirname( os.path.abspath(__file__) ) )

import tensorflow as tf
import tensorflow.keras as keras
import time

from src.dataset import DataSet
from src.keras.examples.ClassCNN import ClassCNN
from src.keras.examples.ClassNN import ClassNN
from src.keras.examples.FunctionalCNN import FunctionalCNN
from src.keras.examples.SequentialCNN import SequentialCNN
from src.utils.csv import predict_to_csv

tf.random.set_seed(42)

timer_start = time.time()

dataset = DataSet()
config = {
    "verbose":      False,
    "epochs":       12,
    "batch_size":   128,
    "input_shape":  dataset.input_shape(),
    "output_shape": dataset.output_shape(),
}
print("config", config)

# BUG: ClassCNN accuracy is only 36% compared to 75% for SequentialCNN / FunctionalCNN
# SequentialCNN   validation: | loss: 1.3756675141198293 | accuracy: 0.7430952
# FunctionalCNN   validation: | loss: 1.4285654685610816 | accuracy: 0.7835714
# ClassCNN        validation: | loss: 1.9851970995040167 | accuracy: 0.36214286
# ClassNN         validation: | loss: 2.302224604288737  | accuracy: 0.09059524
models = {
    "SequentialCNN": SequentialCNN(
        input_shape=dataset.input_shape(),
        output_shape=dataset.output_shape()
    ),
    "FunctionalCNN": FunctionalCNN(
        input_shape=dataset.input_shape(),
        output_shape=dataset.output_shape()
    ),
    "ClassCNN": ClassCNN(
        input_shape=dataset.input_shape(),
        output_shape=dataset.output_shape()
    ),
    "ClassNN":  ClassNN(
        input_shape=dataset.input_shape(),
        output_shape=dataset.output_shape()
    )
}


for model_name, model in models.items():
    print(model_name)

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])

    model.summary()

    model.fit(
        dataset.data['train_X'], dataset.data['train_Y'],
        batch_size = config["batch_size"],
        epochs     = config["epochs"],
        verbose    = config["verbose"],
        validation_data = (dataset.data["valid_X"], dataset.data["valid_Y"]),
        use_multiprocessing = True, workers = multiprocessing.cpu_count()
    )

for model_name, model in models.items():
    score = model.evaluate(dataset.data['valid_X'], dataset.data['valid_Y'], verbose=config["verbose"])
    print(model_name.ljust(15), "validation:", '| loss:', score[0], '| accuracy:', score[1])

print("time:", int(time.time() - timer_start), "s")

Output

./src/keras/examples/main.py 
config {'verbose': False, 'epochs': 12, 'batch_size': 128, 'input_shape': (28, 28, 1), 'output_shape': 10}
SequentialCNN
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 9216)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               1179776   
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
FunctionalCNN
Model: "FunctionalCNN"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               1179776   
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
ClassCNN
Model: "class_cnn"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            multiple                  320       
_________________________________________________________________
conv2d_5 (Conv2D)            multiple                  18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 multiple                  0         
_________________________________________________________________
dropout1 (Dropout)           multiple                  0         
_________________________________________________________________
flatten_2 (Flatten)          multiple                  0         
_________________________________________________________________
dense_4 (Dense)              multiple                  1179776   
_________________________________________________________________
dropout2 (Dropout)           multiple                  0         
_________________________________________________________________
dense_5 (Dense)              multiple                  1290      
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
ClassNN
Model: "class_nn"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_3 (Flatten)          multiple                  0         
_________________________________________________________________
dense_6 (Dense)              multiple                  100480    
_________________________________________________________________
dropout_4 (Dropout)          multiple                  0         
_________________________________________________________________
dense_7 (Dense)              multiple                  16512     
_________________________________________________________________
dense_8 (Dense)              multiple                  1290      
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
SequentialCNN   validation: | loss: 1.3733067512512207 | accuracy: 0.739881
FunctionalCNN   validation: | loss: 1.438635626293364 | accuracy: 0.77916664
ClassCNN        validation: | loss: 1.3686876785187494 | accuracy: 0.75238097

@b-hakim
Copy link

b-hakim commented Aug 1, 2020

I believe it can be handled this way, which sounds net and clean, isn't it?
https://stackoverflow.com/a/55236388/871418

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:keras Keras related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug
Projects
None yet
Development

No branches or pull requests