Implement a custom layer that performs layer normalization (we will use this type of layer in Chapter 15):

a. The build() method should define two trainable weights α and β, both of shape input_shape[-1:] and data type tf.float32. α should be initialized with 1s, and β with 0s.

b. The call() method should compute the mean μ and standard deviation σ of each instance’s features. For this, you can use tf.nn.moments(inputs, axes=-1, keepdims=True), which returns the mean μ and the variance σ² of all instances (compute the square root of the variance to get the standard deviation). Then the function should compute and return α ⊗ (X – μ)/(σ + ε) + β, where ⊗ represents itemwise multiplication (*) and ε is a smoothing term (a small constant to avoid division by zero, e.g., 0.001).

c. Ensure that your custom layer produces the same (or very nearly the same) output as the tf.keras.layers.LayerNormalization layer.
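
To make the formula in part (b) concrete, here is a minimal NumPy sketch of the arithmetic on a made-up batch. The array `X` and its values are illustrative assumptions, not part of the exercise; `alpha`, `beta` and `eps` follow the initial values and the example constant given above.

```python
import numpy as np

# Made-up batch: 2 instances with 4 features each (illustrative values only).
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 10.0, 10.0, 10.0]])

alpha = np.ones(X.shape[-1])   # scale, initialized with 1s
beta = np.zeros(X.shape[-1])   # offset, initialized with 0s
eps = 0.001                    # smoothing term suggested in the exercise

# Statistics are taken over the last axis, so every row (instance)
# gets its own mean and standard deviation.
mu = X.mean(axis=-1, keepdims=True)
sigma = np.sqrt(X.var(axis=-1, keepdims=True))

# alpha * (X - mu) / (sigma + eps) + beta, with itemwise multiplication.
normalized = alpha * (X - mu) / (sigma + eps) + beta
print(normalized)
```

The second row has zero variance, so without `eps` the division would be 0/0; with it, the row is simply mapped to zeros.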

In [3]:
import tensorflow as tf

In [66]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.boston_housing.load_data()

In [88]:
class MyLayerNormalization(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.alpha = self.add_weight(
            name='alpha',
            shape=input_shape[-1:],
            dtype=tf.float32,
            initializer='ones'
        )
        self.beta = self.add_weight(
            name='beta',
            shape=input_shape[-1:],
            dtype=tf.float32,
            initializer='zeros'
        )
        super().build(input_shape)

    def call(self, inputs):
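        # Smoothing term (the exercise text suggests 0.001 as an example).
        epsilon = 0.0001
        # Per-instance mean and variance over the feature axis.
        mean, var = tf.nn.moments(inputs, axes=(-1,), keepdims=True)
        # Epsilon is added to the variance before the square root, so this is
        # sqrt(var + epsilon) rather than the (sigma + epsilon) of the exercise.
        stdev = tf.sqrt(var + epsilon)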
        return tf.multiply(self.alpha, (inputs - mean) / stdev) + self.beta
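
The exercise writes the denominator as σ + ε, while the call() method above uses sqrt(σ² + ε). The NumPy sketch below compares the two placements on made-up rows to show when the choice matters. The helper names `norm_eps_outside` and `norm_eps_inside` and the sample data are hypothetical, introduced only for illustration.

```python
import numpy as np

def norm_eps_outside(x, eps):
    """Normalize with (x - mu) / (sigma + eps), as written in the exercise."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(x.var(axis=-1, keepdims=True))
    return (x - mu) / (sigma + eps)

def norm_eps_inside(x, eps):
    """Normalize with (x - mu) / sqrt(var + eps), as in the call() method."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

eps = 0.001
wide = np.array([[0.0, 100.0, 200.0, 300.0]])   # large spread between features
narrow = np.array([[0.00, 0.01, 0.02, 0.03]])   # spread comparable to eps

# Largest elementwise disagreement between the two variants.
gap_wide = np.abs(norm_eps_outside(wide, eps) - norm_eps_inside(wide, eps)).max()
gap_narrow = np.abs(norm_eps_outside(narrow, eps) - norm_eps_inside(narrow, eps)).max()

print("large-variance row:", gap_wide)
print("small-variance row:", gap_narrow)
```

When the feature variance is large relative to ε, the two variants are nearly indistinguishable, so a comparison on such data cannot tell them apart. The gap only becomes visible when the variance is of the same order as ε.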

In [89]:
my_layer = MyLayerNormalization()
keras_layer = tf.keras.layers.LayerNormalization()

In [90]:
tf.reduce_mean(tf.losses.mean_absolute_error(my_layer(x_train), keras_layer(x_train)))

<tf.Tensor: shape=(), dtype=float32, numpy=2.4591845e-08>
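
A mean absolute error averages over every element, so a few larger deviations can be hidden. A stricter check is the largest elementwise difference. The sketch below assumes random made-up inputs and uses a NumPy reference in place of the custom layer so that it is self-contained; it also assumes the Keras layer's default epsilon of 0.001 and its freshly initialized scale and offset (1s and 0s).

```python
import numpy as np
import tensorflow as tf

# Made-up float32 inputs: 8 instances with 13 features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 13)).astype("float32")

# Reference layer with default settings (normalizes over the last axis).
keras_layer = tf.keras.layers.LayerNormalization()
keras_out = np.asarray(keras_layer(tf.constant(x)))

# NumPy reference: epsilon added to the variance inside the square root.
eps = 0.001
mu = x.mean(axis=-1, keepdims=True)
var = x.var(axis=-1, keepdims=True)
reference = (x - mu) / np.sqrt(var + eps)

# Worst-case disagreement over all elements, instead of an average.
max_abs_diff = np.abs(keras_out - reference).max()
print(max_abs_diff)
```

The same `max` reduction can be applied to the outputs of `my_layer` and `keras_layer` from the cells above, as long as both use the same epsilon.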