---

### 🧱 BONUS: Writing Custom TensorFlow Layers and Models

> 🧠 **Why would we override `build()` at all?**
>
> Keras automatically tracks input shapes and will call `build()` for you behind the scenes the first time a layer sees real input data. You don’t usually need to override it unless you’re writing a **custom layer** where the shape of trainable weights depends on the shape of the input. For example, a custom dense layer might not know the input size until it sees it. Overriding `build()` lets you defer weight creation until the input shape is known.
>
> If you're using standard layers like `Dense`, `build()` is handled internally and you won't need to touch it. But for your ViT implementation, you’ll likely need to override `build()` at least once.

In Homework 5, you'll be asked to implement a Vision Transformer from scratch using custom TensorFlow layers. That means you'll need to:
- Create your own subclasses of `tf.keras.layers.Layer`
- Understand how and when to use `__init__()`, `build()`, and `call()`
- Construct models that support `.summary()`

Let’s walk through a toy example: a simple multilayer perceptron (MLP).

#### ✏️ Step 1: Define a Custom Layer with `build()`

In [1]:
import tensorflow as tf

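class ScaledDense(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        # Forward standard Layer arguments (e.g. name) to the base class.
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Weights are created here, once the size of the last input axis is known.
        # Pass `name` by keyword: its positional slot differs across Keras versions.
        self.w = self.add_weight(
            name="weights",
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
        )
        self.b = self.add_weight(name="bias", shape=(self.units,), initializer="zeros")

    def call(self, inputs):
        # Affine transform followed by ReLU.
        return tf.nn.relu(tf.matmul(inputs, self.w) + self.b)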



- `__init__`: stores configuration (like number of units)
- `build`: creates weights **based on the shape of the input**
- `call`: defines the forward computation
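To make the division of labor between these three methods concrete, here is a minimal sketch showing that weights do not exist until the layer first sees input. `LazyLinear` is a hypothetical layer invented for this illustration (it is not part of the homework or of Keras), and the input shape `(2, 8)` is an arbitrary choice.

```python
import tensorflow as tf

class LazyLinear(tf.keras.layers.Layer):
    """Hypothetical layer used only to illustrate deferred weight creation."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units  # configuration only; no weights yet

    def build(self, input_shape):
        # Runs once, on the first call, when the input size is finally known.
        self.w = self.add_weight(
            name="w",
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w)

layer = LazyLinear(4)
built_before = layer.built              # False: build() has not run
n_weights_before = len(layer.weights)   # 0: nothing to track yet

out = layer(tf.zeros((2, 8)))           # first call triggers build((2, 8))

built_after = layer.built               # True
w_shape = tuple(layer.w.shape)          # (8, 4): inferred from the input
print(built_before, n_weights_before, built_after, w_shape)
```

The input dimension (8) never appears in the constructor. It is read from `input_shape[-1]` inside `build()`, which is why the same layer class can be reused at different widths.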

#### 🧪 Step 2: Use It in a Model

We wrap this custom layer into a `tf.keras.Model`.

In [2]:
class SimpleMLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = ScaledDense(16)
        self.dense2 = ScaledDense(2)

    def call(self, x):
        x = self.dense1(x)
        return self.dense2(x)

#### 📏 Step 3: Understand `.build()` and `.summary()`

In [3]:
mlp = SimpleMLP()

# Try to print summary before the model is built
try:
    mlp.summary()
except Exception as e:
    print("Expected error when calling summary before build:", e)

# Now build it with a known input shape
mlp.build(input_shape=(None, 8))
mlp.summary()

Expected error when calling summary before build: This model has not yet been built. Build the model first by calling `build()` or by calling the model on a batch of data.
Model: "simple_mlp"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 scaled_dense (ScaledDense)  multiple                  144       
                                                                 
 scaled_dense_1 (ScaledDens  multiple                  34        
 e)                                                              
                                                                 
Total params: 178 (712.00 Byte)
Trainable params: 178 (712.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________




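Calling `.summary()` on an unbuilt model fails because none of its layers have created weights yet; building the model with a known input shape fixes that. You may need **exactly this kind of debugging** when implementing your Vision Transformer in HW5.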
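The error message above mentions a second way to build a model: calling it on a batch of data. The sketch below tries that route. `TinyMLP` is a hypothetical stand-in that uses the built-in `Dense` layer with the same widths as the example (8 → 16 → 2), so its parameter count should match the summary: 8×16 + 16 = 144 for the first layer and 16×2 + 2 = 34 for the second, 178 in total.

```python
import tensorflow as tf

class TinyMLP(tf.keras.Model):
    """Hypothetical model mirroring the 8 -> 16 -> 2 shape of the example."""

    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(16, activation="relu")
        self.out = tf.keras.layers.Dense(2)

    def call(self, x):
        return self.out(self.hidden(x))

model = TinyMLP()

# A single dummy batch is enough: each layer's build() runs on first use.
dummy_batch = tf.zeros((1, 8))
_ = model(dummy_batch)

n_params = model.count_params()
expected = (8 * 16 + 16) + (16 * 2 + 2)  # weights + biases for each layer
print(n_params, expected)
```

Calling the model on dummy data is often the more convenient option when the input is not a single plain tensor, since it exercises the real `call()` path rather than relying on a shape tuple alone.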

✅ Now you’ve seen how to:
- Build models with custom layers
- Handle shape dependencies with `.build()`
- Produce readable `.summary()` outputs

You’ll use these same ideas in Homework 5, especially for the ViT implementation.