New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keras subclassed model layers' output shape detection (e.g. for summary) #25036
Comments
Hi. |
@jashrathod0 Please feel free to support any issue here. This is a community helping others in the community. Thanks for coming forward. |
hi input that isn't a symbolic tensor any suggession to work this out ? |
Are you working on this? Need help? same happens for tensorflow 2.0 |
Same issue here on TF 2 nightly. |
Same issue here, is there anyway to make the sub-class model able to do things like "model.summary()"? |
I confront the same issue. In if not self._inbound_nodes:
raise AttributeError('The layer has never been called '
'and thus has no defined input shape.') When pass a symbolic tensor to a layer, a import tensorflow as tf
class MyModel(tf.keras.Model):
def __init__(self):
super(MyModel, self).__init__()
self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
def call(self, inputs):
x = self.dense1(inputs)
return self.dense2(x)
model = MyModel()
layers = model.layers
for layer in layers:
name = layer.name
input_shape = layer.input_shape
output_shape = layer.output_shape
print('%s input shape: %s, output_shape: %s.\n' % (name, input_shape, output_shape)) And this is the traceback:
If create a model by import tensorflow as tf
inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs) or import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(32, input_shape=(500,)))
model.add(tf.keras.layers.Dense(32)) But when the model structure is complicated, I choose to use subclass model and define forward pass in |
This is a serious issue, rendering subclassing the Model class unusable. Anyone working on this? |
You can do all these things (printing input / output shapes) in a Functional or Sequential model because these models are static graphs of layers. In contrast, a subclassed model is a piece of Python code (a Please see this detailed explanation of the differences between Functional/Sequential models and imperative models: https://medium.com/tensorflow/what-are-symbolic-and-imperative-apis-in-tensorflow-2-0-dfccecb01021 |
@fchollet Thank you for the explanation. It seems that there is a misunderstanding in the purpose of subclassing Model. Every tutorial I've seen uses that as encapsulation of static graphs e.g. as ResNet blocks. However, now I understand that they are meant for dynamic graphs, where each |
@fchollet Thanks. |
@inoryy and anyone interested, I found a dirty workaround that basically mixes functional API and model subclassing by using import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten
class MyModel(tf.keras.Model):
def __init__(self, *args, **kwargs):
super().__init__(self._inputs, self._outputs)
self.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
def __new__(cls, input_shape, num_classes, neurons=100, layers=2):
cls._inputs = Input( input_shape, dtype=tf.float32, name='inputs' )
x = Flatten()(cls._inputs)
for k in range(layers):
x = Dense(neurons)(x)
cls._outputs = Dense(num_classes)(x)
return super().__new__(cls)
model = MyModel( (28,28), 10 )
model.summary() This can be useful for example if one wants to write a grid-search wrapper that works with either tensorflow/pytorch/scikit-learn type models. for HPC in HPCs:
model = MyModel(**HPC) # initialize with specific HyperParameterCombination
model.fit(train_data)
score = model.evaluate(test_data) |
I borrow the solution from here by adding a The output shape is especially uncertain when with import tensorflow as tf
from tensorflow.keras.layers import Input
class MyModel(tf.keras.Model):
def __init__(self):
super().__init__()
self.dense = tf.keras.layers.Dense(1)
def call(self, inputs, **kwargs):
return self.dense(inputs)
def model(self):
x = Input(shape=(1))
return Model(inputs=[x], outputs=self.call(x))
MyModel().model().summary() |
I created a mixin class that solves this problem for static models defined with the subclassing API. It automatically defines input and output shapes and dtypes the first time the model or layer is called. See the gist for a complete example, but here's a demonstration: class ExampleNetwork(Model):
def __init__(self) -> None:
super().__init__()
self.encoder = Sequential([
Conv2D(filters=32, kernel_size=3, strides=2, activation='relu'),
Flatten(),
])
self.concat = Concatenate(axis=-1)
self.dense = Dense(units=100)
def call(self, inputs: List[tf.Tensor]) -> tf.Tensor:
encoded = self.encoder(inputs[0])
joined = self.concat([encoded] + inputs[1:])
return self.dense(joined)
model = ExampleNetwork()
first_batch = [tf.zeros((1, 64, 64, 3)), tf.zeros((1, 10))]
model(first_batch)
model.summary()
# Model: "example_network_1"
# _________________________________________________________________
# Layer (type) Output Shape Param #
# =================================================================
# sequential_1 (Sequential) (None, 30752) 896
# _________________________________________________________________
# concatenate (Concatenate) (None, 30762) 0
# _________________________________________________________________
# dense (Dense) (None, 100) 3076300
# =================================================================
# Total params: 3,077,196
# Trainable params: 3,077,196
# Non-trainable params: 0
# _________________________________________________________________
print(model.input_shape)
# ListWrapper([TensorShape([None, 64, 64, 3]), TensorShape([None, 10])]) @fchollet and other devs, any chance this (or something similar) could become part of |
It took a few attempts to get this right. The first hack method is simply to run dummy data through the model model = ClassCNN()
model( tf.random.uniform(shape=(128,28,28,1)) ) # required to call .summary() before .fit()
model.summary() However a more robust solution involves explictly Not all the layers need to be defined, as some information can be implied. Define the first "real" layer ( Here are some working examples of keras syntax, which fixes the class ClassCNN(tf.keras.Model):
def __init__(self, input_shape, output_shape, **kwargs):
super(ClassCNN, self).__init__()
self._input_shape = input_shape # = (28, 28, 1)
self._output_shape = output_shape # = 10
self.conv1 = Conv2D(32, kernel_size=(3, 3), activation=tf.nn.relu)
self.conv2 = Conv2D(64, kernel_size=(3, 3), activation=tf.nn.relu)
self.maxpool = MaxPooling2D(pool_size=(2, 2))
self.dropout1 = Dropout(0.25, name='dropout1')
self.flatten = Flatten()
self.dense1 = Dense(128, activation=tf.nn.relu)
self.dropout2 = Dropout(0.5, name='dropout2')
self.activation = Dense(self._output_shape, activation=tf.nn.softmax)
self.conv1.build( (None,) + input_shape )
self.conv2.build( (None,) + tuple(np.subtract(input_shape[:-1],2)) + (32,) )
self.maxpool.build( (None,) + tuple(np.subtract(input_shape[:-1],4)) + (64,) )
self.dropout1.build( tuple(np.floor_divide(np.subtract(input_shape[:-1],4),2)) + (64,) )
self.dropout2.build( 128 )
self.build( (None,) + input_shape)
def call(self, x, training=False, **kwargs):
x = self.conv1(x)
x = self.conv2(x)
x = self.maxpool(x)
if training: x = self.dropout1(x)
x = self.flatten(x)
x = self.dense1(x)
if training: x = self.dropout2(x)
x = self.activation(x)
return x class ClassNN(tf.keras.Model):
def __init__(self, input_shape, output_shape, **kwargs):
super(ClassNN, self).__init__()
self._input_shape = np.prod(input_shape) # = (28, 28, 1) = 784
self._output_shape = output_shape # = 10
self.flatten = tf.keras.layers.Flatten()
self.dense1 = tf.keras.layers.Dense(128, activation=tf.nn.relu, )
self.dropout = tf.keras.layers.Dropout(0.2)
self.dense2 = tf.keras.layers.Dense(128, activation=tf.nn.softmax)
self.activation = tf.keras.layers.Dense(self._output_shape, activation=tf.nn.softmax)
self.dense1.build( self._input_shape)
self.dropout.build(128)
self.build( (None, self._input_shape) )
def call(self, inputs, training=False, **kwargs):
x = self.flatten(inputs)
x = self.dense1(x)
if training: x = self.dropout(x)
x = self.dense2(x)
x = self.activation(x)
return x def FunctionalCNN(input_shape, output_shape):
inputs = Input(shape=input_shape)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(output_shape, activation='softmax')(x)
model = Model(inputs, x, name="FunctionalCNN")
plot_model(model, to_file=os.path.join(os.path.dirname(__file__), "FunctionalCNN.png"))
return model def SequentialCNN(input_shape, output_shape):
model = Sequential()
model.add( Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=input_shape) )
model.add( Conv2D(64, (3, 3), activation='relu') )
model.add( MaxPooling2D(pool_size=(2, 2)) )
model.add( Dropout(0.25) )
model.add( Flatten() )
model.add( Dense(128, activation='relu') )
model.add( Dropout(0.5) )
model.add( Dense(output_shape, activation='softmax') )
plot_model(model, to_file=os.path.join(os.path.dirname(__file__), "SequentialCNN.png"))
return model main loop #!/usr/bin/env python3
import multiprocessing
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1' # 0, 1, 2, 3 # Disable Tensortflow Logging
os.chdir( os.path.dirname( os.path.abspath(__file__) ) )
import tensorflow as tf
import tensorflow.keras as keras
import time
from src.dataset import DataSet
from src.keras.examples.ClassCNN import ClassCNN
from src.keras.examples.ClassNN import ClassNN
from src.keras.examples.FunctionalCNN import FunctionalCNN
from src.keras.examples.SequentialCNN import SequentialCNN
from src.utils.csv import predict_to_csv
tf.random.set_seed(42)
timer_start = time.time()
dataset = DataSet()
config = {
"verbose": False,
"epochs": 12,
"batch_size": 128,
"input_shape": dataset.input_shape(),
"output_shape": dataset.output_shape(),
}
print("config", config)
# BUG: ClassCNN accuracy is only 36% compared to 75% for SequentialCNN / FunctionalCNN
# SequentialCNN validation: | loss: 1.3756675141198293 | accuracy: 0.7430952
# FunctionalCNN validation: | loss: 1.4285654685610816 | accuracy: 0.7835714
# ClassCNN validation: | loss: 1.9851970995040167 | accuracy: 0.36214286
# ClassNN validation: | loss: 2.302224604288737 | accuracy: 0.09059524
models = {
"SequentialCNN": SequentialCNN(
input_shape=dataset.input_shape(),
output_shape=dataset.output_shape()
),
"FunctionalCNN": FunctionalCNN(
input_shape=dataset.input_shape(),
output_shape=dataset.output_shape()
),
"ClassCNN": ClassCNN(
input_shape=dataset.input_shape(),
output_shape=dataset.output_shape()
),
"ClassNN": ClassNN(
input_shape=dataset.input_shape(),
output_shape=dataset.output_shape()
)
}
for model_name, model in models.items():
print(model_name)
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
model.summary()
model.fit(
dataset.data['train_X'], dataset.data['train_Y'],
batch_size = config["batch_size"],
epochs = config["epochs"],
verbose = config["verbose"],
validation_data = (dataset.data["valid_X"], dataset.data["valid_Y"]),
use_multiprocessing = True, workers = multiprocessing.cpu_count()
)
for model_name, model in models.items():
score = model.evaluate(dataset.data['valid_X'], dataset.data['valid_Y'], verbose=config["verbose"])
print(model_name.ljust(15), "validation:", '| loss:', score[0], '| accuracy:', score[1])
print("time:", int(time.time() - timer_start), "s") Output ./src/keras/examples/main.py
config {'verbose': False, 'epochs': 12, 'batch_size': 128, 'input_shape': (28, 28, 1), 'output_shape': 10}
SequentialCNN
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_1 (Conv2D) (None, 24, 24, 64) 18496
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 12, 12, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 9216) 0
_________________________________________________________________
dense (Dense) (None, 128) 1179776
_________________________________________________________________
dropout_1 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
FunctionalCNN
Model: "FunctionalCNN"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_3 (Conv2D) (None, 24, 24, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 12, 12, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 9216) 0
_________________________________________________________________
dense_2 (Dense) (None, 128) 1179776
_________________________________________________________________
dropout_3 (Dropout) (None, 128) 0
_________________________________________________________________
dense_3 (Dense) (None, 10) 1290
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
ClassCNN
Model: "class_cnn"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_4 (Conv2D) multiple 320
_________________________________________________________________
conv2d_5 (Conv2D) multiple 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 multiple 0
_________________________________________________________________
dropout1 (Dropout) multiple 0
_________________________________________________________________
flatten_2 (Flatten) multiple 0
_________________________________________________________________
dense_4 (Dense) multiple 1179776
_________________________________________________________________
dropout2 (Dropout) multiple 0
_________________________________________________________________
dense_5 (Dense) multiple 1290
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
ClassNN
Model: "class_nn"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_3 (Flatten) multiple 0
_________________________________________________________________
dense_6 (Dense) multiple 100480
_________________________________________________________________
dropout_4 (Dropout) multiple 0
_________________________________________________________________
dense_7 (Dense) multiple 16512
_________________________________________________________________
dense_8 (Dense) multiple 1290
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
SequentialCNN validation: | loss: 1.3733067512512207 | accuracy: 0.739881
FunctionalCNN validation: | loss: 1.438635626293364 | accuracy: 0.77916664
ClassCNN validation: | loss: 1.3686876785187494 | accuracy: 0.75238097 |
I believe it can be handled this way, which sounds net and clean, isn't it? |
System information
Describe the feature and the current behavior/state.
Currently output shapes inference of all layers is essentially disabled if model is subclassed, which results in a rather vague "multiple" in model summary. Example:
After investigating source code I understand why this happens from a technical point of view, but it is quite frustrating for an end user, especially given that it is quite trivial to infer the shapes manually with given information. I think there should be a way to force
layer.output_shape
recalculation ifmodel.build()
is manually called. There is of course a risk of mismatching actual input shapes, but that would be responsibility of the.build()
caller to enforce.Alternatively, allow runtime input(s) shape argument to be passed to
model.summary()
, that could be used for one-time shape detection and validation.Will this change the current api? How?
No, not directly. Could possibly affect internal checks if
layer.output_shape
is manually overriden. However, currently that parameter is completely unused for subclassed models, so risk should be low.Who will benefit with this feature?
End users of subclassed model providers who could be interested in inspecting model topology.
Any Other info.
I am willing to contribute either option with a little bit of guidance on the direction.
The text was updated successfully, but these errors were encountered: