# PyTorch: Control Flow + Weight Sharing

To showcase the power of PyTorch dynamic graphs, we will implement a very strange model: a fully connected ReLU network that, on each forward pass, randomly chooses a number between 1 and 4 and has that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.
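
Reusing the same weights means that a single module applied several times in one forward pass collects gradient contributions from every application. The following is a minimal sketch of that behavior, not part of the model in this section: the one-input, one-output layer `shared` and its hand-set parameters are assumptions chosen only to make the arithmetic easy to check.

```python
import torch

# Hypothetical 1-in, 1-out layer with hand-set parameters (w = 2, b = 0).
shared = torch.nn.Linear(1, 1)
with torch.no_grad():
    shared.weight.fill_(2.0)
    shared.bias.fill_(0.0)

x = torch.tensor([[3.0]])

# Apply the same module twice: y = w * (w * x + b) + b
y = shared(shared(x))
y.sum().backward()

# dy/dw = (w * x + b) + w * x = 6 + 6 = 12
# Both uses of the layer accumulate into the same .grad tensor.
weight_grad = shared.weight.grad.item()
print(weight_grad)
```

Autograd sums the contribution from each use of `shared`, which is why one parameter can safely appear more than once in a dynamically built graph.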

In [4]:
import random
import torch 

class DynamicNet(torch.nn.Module):
    def __init__(self, input_dimension, hidden_dimension, output_dimension):
        """
        In the constructor we construct three nn.Linear instances that we will use 
        in the forward pass.
        """
        super().__init__()
        self.input_linear = torch.nn.Linear(input_dimension, hidden_dimension)
        self.middle_linear = torch.nn.Linear(hidden_dimension, hidden_dimension)
        self.output_linear = torch.nn.Linear(hidden_dimension, output_dimension)
    
    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3 
        and reuse the middle_linear Module that many times to compute hidden layer 
        representations.
        
        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.
        
        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement over Lua
        Torch, where each Module could be used only once.
        """
        
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred
    
batch_size = 64
input_dimension = 1000
hidden_dimension = 100
output_dimension = 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(batch_size, input_dimension)
y = torch.randn(batch_size, output_dimension)

# Construct our model by instantiating the class defined above
model = DynamicNet(input_dimension, hidden_dimension, output_dimension)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for n in range(500):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x)
    
    # Compute and print loss
    loss = criterion(y_pred, y)
    print(n, loss.item())
    
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()