Model Construction

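Let us review the implementation of the multilayer perceptron with a single hidden layer from Section 3.10 ("Concise Implementation of Multilayer Perceptrons"). We first constructed a Sequential instance and then added two fully connected layers in order. The first layer has an output size of 256, meaning the hidden layer has 256 units; the second layer has an output size of 10, meaning the output layer has 10 units. We also used the Sequential class to construct models in other sections of the previous chapter. Here we introduce another way to construct models, based on the tf.keras.Model class, which makes model construction more flexible.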

4.1.1 build model from block

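tf.keras.Model is a model construction class provided by the tf.keras module; we can subclass it to define the model we want. Below, we subclass tf.keras.Model to construct the multilayer perceptron mentioned at the beginning of this section. The MLP class defined here overrides the __init__ and call methods of tf.keras.Model. __init__ creates the layers that hold the model parameters, and call defines the forward computation, also known as forward propagation.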

import tensorflow as tf
import numpy as np
print(tf.__version__)

2.0.0

class MLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()    # The Flatten layer flattens every dimension except the first (batch_size)
        self.dense1 = tf.keras.layers.Dense(units=256, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(units=10)

    def call(self, inputs):         
        x = self.flatten(inputs)   
        x = self.dense1(x)    
        output = self.dense2(x)     
        return output

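The MLP class above does not need to define a backpropagation function. TensorFlow computes gradients through automatic differentiation, so the backward pass is derived from the forward computation defined in call.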
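As a minimal sketch of how gradients can be obtained without writing any backward code, the example below records a forward computation with tf.GradientTape and then asks the tape for the gradients. The single Dense layer, the input shape, and the sum-of-squares loss are assumptions chosen for illustration; they are not part of the MLP above.

```python
import tensorflow as tf

# A small stand-in layer (hypothetical, for illustration only).
dense = tf.keras.layers.Dense(units=3)
x = tf.random.uniform((2, 4))

with tf.GradientTape() as tape:
    y = dense(x)                    # forward computation, recorded by the tape
    loss = tf.reduce_sum(y ** 2)    # an arbitrary scalar loss

# Gradients of the loss with respect to the layer's kernel and bias.
grads = tape.gradient(loss, dense.trainable_variables)
grad_shapes = [tuple(g.shape) for g in grads]
print(grad_shapes)
```

Each gradient has the same shape as the variable it belongs to, which is what an optimizer needs in order to update the parameters.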

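We can instantiate the MLP class to obtain the model variable net. The code below initializes net and passes in the input data X to perform one forward computation. Here, net(X) invokes the call method defined in the MLP class to carry out the forward computation.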

X = tf.random.uniform((2,20))
net = MLP()
net(X)

<tf.Tensor: id=62, shape=(2, 10), dtype=float32, numpy=
array([[ 0.15637134,  0.14062534, -0.11187253, -0.13151687,  0.12066578,
         0.15376692,  0.03429577,  0.07023033, -0.12030508, -0.38496107],
       [-0.02877349,  0.1088542 , -0.20668823,  0.08241277,  0.06292161,
         0.25310248,  0.04884301,  0.27015388, -0.13183925, -0.23431192]],
      dtype=float32)>

4.1.2 Sequential

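As mentioned above, tf.keras.Model is a general-purpose component. In fact, the Sequential class inherits from tf.keras.Model. When the forward computation of a model simply chains the computations of its layers, the model can be defined in a simpler way. This is exactly the purpose of the Sequential class: it provides the add method for appending layer instances one by one, and the forward computation of the model evaluates these instances in the order in which they were added.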

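We now use the Sequential class to implement the MLP class described earlier and perform one forward computation with the randomly initialized model.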

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dense(10),
])

model(X)

<tf.Tensor: id=117, shape=(2, 10), dtype=float32, numpy=
array([[-0.42563885, -0.11981717,  0.0838763 ,  0.04553887,  0.09710997,
         0.16843301,  0.15290505, -0.00364013, -0.13743742, -0.36868355],
       [-0.37125233, -0.18243487,  0.24916942, -0.04006755,  0.06090571,
         0.05331742,  0.24555533, -0.03183865, -0.10122052, -0.11752242]],
      dtype=float32)>
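To make the idea behind Sequential concrete, here is a simplified sketch of a class that stores layers in the order they are added and chains them in call. The class name MySequential and the attribute blocks are hypothetical; this is not how tf.keras.Sequential is actually implemented, only an illustration of the behavior described above.

```python
import tensorflow as tf

class MySequential(tf.keras.Model):
    """Hypothetical, simplified stand-in for tf.keras.Sequential."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # layers, kept in the order they were added

    def add(self, layer):
        self.blocks.append(layer)

    def call(self, inputs):
        x = inputs
        for layer in self.blocks:  # apply each layer in insertion order
            x = layer(x)
        return x

net = MySequential()
net.add(tf.keras.layers.Dense(4, activation=tf.nn.relu))
net.add(tf.keras.layers.Dense(2))

X = tf.ones((3, 5))
out = net(X)
print(out.shape)
```

The output has one row per input example and one column per unit of the last layer added, just as it would if the two layers were applied by hand one after the other.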

4.1.3 build complex model

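Although the Sequential class makes model construction simpler and does not require defining a call method, subclassing tf.keras.Model directly greatly expands the flexibility of model construction. Below we construct a slightly more complex network, FancyMLP. In this network, we use tf.constant to create a parameter that is not updated during training, that is, a constant parameter. In the forward computation, besides using this constant parameter, we also use tensor functions and Python control flow, and call the same layer multiple times.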

class FancyMLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.rand_weight = tf.constant(
            tf.random.uniform((20,20)))
        self.dense = tf.keras.layers.Dense(units=20, activation=tf.nn.relu)

    def call(self, inputs):         
        x = self.flatten(inputs)
        x = self.dense(x)
        x = tf.nn.relu(tf.matmul(x, self.rand_weight) + 1)
        x = self.dense(x)    # reuse the same Dense layer; both calls share its parameters
        while tf.norm(x) > 1:
            x /= 2
        if tf.norm(x) < 0.8:
            x *= 10
        return tf.reduce_sum(x)

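In this FancyMLP model, we used the constant weight rand_weight (note that it is not a model parameter), performed a matrix multiplication (tf.matmul), and reused the same Dense layer. Let us now test the model's random initialization and forward computation.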

net = FancyMLP()
net(X)

<tf.Tensor: id=220, shape=(), dtype=float32, numpy=24.381481>
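The remark that a constant weight is not a model parameter can be checked by listing a model's trainable variables. The sketch below uses a hypothetical class ConstDemo with the attribute const_weight, a reduced stand-in for FancyMLP rather than the class defined above.

```python
import tensorflow as tf

class ConstDemo(tf.keras.Model):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # A fixed tensor created with tf.constant; it is not a variable.
        self.const_weight = tf.constant(tf.random.uniform((20, 20)))
        self.dense = tf.keras.layers.Dense(units=20)

    def call(self, inputs):
        return self.dense(tf.matmul(inputs, self.const_weight))

demo = ConstDemo()
demo(tf.random.uniform((2, 20)))  # one forward pass so the Dense layer creates its variables

# Only the kernel and bias of the Dense layer are reported as trainable.
shapes = [tuple(v.shape) for v in demo.trainable_variables]
print(shapes)
```

Because the constant never appears among the trainable variables, an optimizer that updates demo.trainable_variables leaves it unchanged.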

Because FancyMLP and the Sequential class are both subclasses of tf.keras.Model, we can nest them.

class NestMLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.net = tf.keras.Sequential()
        self.net.add(tf.keras.layers.Flatten())
        self.net.add(tf.keras.layers.Dense(64, activation=tf.nn.relu))
        self.net.add(tf.keras.layers.Dense(32, activation=tf.nn.relu))
        self.dense = tf.keras.layers.Dense(units=16, activation=tf.nn.relu)

    
    def call(self, inputs):         
        return self.dense(self.net(inputs))

net = tf.keras.Sequential()
net.add(NestMLP())
net.add(tf.keras.layers.Dense(20))
net.add(FancyMLP())

net(X)

<tf.Tensor: id=403, shape=(), dtype=float32, numpy=3.2303767>

Note: Apart from the code, this section is essentially the same as the original book (link to the original book).
