# Amanda: DNN Instrumentation Tool Tutorial

In this notebook, we build an instrumentation tool with the Amanda framework step by step.
This example demonstrates how to implement instrumentation tools with Amanda's APIs and apply them to different DNN models.

First, install Amanda and its dependencies by following the installation instructions in the [README](../../../README.md).


## Prepare a CNN model

We start the example by defining a simple convolutional neural network (CNN) model with the [PyTorch](https://pytorch.org/) machine learning library.

In [1]:
import torch
import torch.nn as nn

class ConvNeuralNet(nn.Module):
    def __init__(self, num_classes):
        super(ConvNeuralNet, self).__init__()
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        
        self.conv_layer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        
        self.fc1 = nn.Linear(1600, 128)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)
    
    def forward(self, x):
        out = self.conv_layer1(x)
        out = self.conv_layer2(out)
        out = self.max_pool1(out)
        
        out = self.conv_layer3(out)
        out = self.conv_layer4(out)
        out = self.max_pool2(out)
                
        out = out.reshape(out.size(0), -1)
        
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        return out

The following lines run the network (forward propagation).
Calling the model invokes the `forward` function of the `ConvNeuralNet` object, which executes the defined operators in order.
Without loss of generality, we use a randomly initialized input sample.

In [2]:
X = torch.rand((1, 3, 32, 32))
model = ConvNeuralNet(num_classes=10)

Y = model(X)
print(Y)

tensor([[ 0.0697, -0.0384,  0.0884, -0.0934, -0.0392,  0.1024,  0.1233,  0.0373,
          0.0444,  0.0481]], grad_fn=<AddmmBackward>)
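
The input size of `fc1` (1600) follows from the layer shapes. As a rough check, the sketch below traces the spatial size of a 32x32 input through the convolution and pooling layers using the standard output-size formula, assuming no padding and the strides given in the model definition. The helper `conv_out` is a hypothetical name introduced only for this illustration.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution or pooling window on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


side = 32                      # input is 3 x 32 x 32
side = conv_out(side, 3)       # conv_layer1 -> 30
side = conv_out(side, 3)       # conv_layer2 -> 28
side = conv_out(side, 2, 2)    # max_pool1   -> 14
side = conv_out(side, 3)       # conv_layer3 -> 12
side = conv_out(side, 3)       # conv_layer4 -> 10
side = conv_out(side, 2, 2)    # max_pool2   -> 5

flattened = 64 * side * side   # conv_layer4 produces 64 channels
print(side, flattened)         # 5 1600
```

A different input resolution would change this product, so `fc1` would have to be resized accordingly.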


## Convolution operator counting tool

The previous code shows a typical scenario of how we define and run a DNN model.
It is common to perform analysis and debugging tasks on such a model.
For example, we may want to get the execution trace of the operators or dump the output tensor of a particular operator.
To begin with, we show an example that counts the occurrences of convolution operators.
Intuitively, this can be done by reading through the source code or by inserting code into the DNN model definition.
A better way is to use the module hook API, which we discuss [later in this notebook](#module-hook).
However, these methods are coupled with the DNN source code and cannot be generalized to other analysis tasks.

To this end, we borrow the concept of instrumentation from program analysis.
Such DNN tasks can be implemented on top of a DNN model instrumentation abstraction.
The example instrumentation tool that counts convolution operators is defined as follows.

In [3]:
import amanda

class CountOPTool(amanda.Tool):
    def __init__(self, op_name: str):
        super().__init__()
        self.counter = 0
        self.op_name = op_name
        self.add_inst_for_op(self.callback)

    # analysis routine, filter conv2d operators
    def callback(self, context: amanda.OpContext):
        op = context.get_op()
        if self.op_name in op.__name__:
            context.insert_before_op(self.counter_op)

    # instrumentation routine: op for counting
    def counter_op(self, *inputs):
        self.counter += 1
        return inputs




> Conceptually, instrumentation consists of two components:
>
> - A mechanism that decides where and what code is inserted
> - The code to execute at insertion points
>  
> These two components are instrumentation and analysis code.
> 
> (from [Pin documentation](https://software.intel.com/sites/landingpage/pintool/docs/98484/Pin/html/index.html))

In the Amanda instrumentation tool above, these two components are implemented as the analysis routine and the instrumentation routine.
In this example, the analysis routine selects the convolution operators and inserts the instrumentation routine as an operator before each of them.
The instrumentation routine is an operator that increments the counter.
With this DNN instrumentation programming model, we can implement much more complex instrumentation tools for different DNN tasks.
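
To make the split between the two routines concrete, here is a framework-free toy sketch. It is not Amanda's implementation: `ToyContext`, `ToyCounter` and `run_instrumented` are hypothetical names, and the "operators" are plain Python functions. It only mirrors the pattern used by the tool above, where a callback inspects each operator and may insert a routine that runs before it.

```python
from typing import Callable, List


class ToyContext:
    """Hypothetical stand-in for an operator context (not amanda.OpContext)."""

    def __init__(self, op: Callable):
        self.op = op
        self.before: List[Callable] = []

    def insert_before_op(self, routine: Callable) -> None:
        self.before.append(routine)


class ToyCounter:
    def __init__(self, op_name: str):
        self.op_name = op_name
        self.counter = 0

    # analysis routine: decide where to insert (substring match on the name)
    def callback(self, context: ToyContext) -> None:
        if self.op_name in context.op.__name__:
            context.insert_before_op(self.counter_op)

    # instrumentation routine: runs before every matching operator
    def counter_op(self, *inputs):
        self.counter += 1
        return inputs


def run_instrumented(ops, x, tool):
    """Run ops in order, letting the tool instrument each one first."""
    for op in ops:
        context = ToyContext(op)
        tool.callback(context)
        inputs = (x,)
        for routine in context.before:
            inputs = routine(*inputs)
        x = op(*inputs)
    return x


def conv2d(x):
    return x * 2


def relu(x):
    return max(x, 0)


tool = ToyCounter("conv2d")
result = run_instrumented([conv2d, relu, conv2d], 3, tool)
print(result, tool.counter)  # 12 2
```

The inserted routine returns its inputs unchanged, so the instrumented run computes the same result as the plain one while the counter records two `conv2d` calls.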

This instrumentation tool is applied to the DNN execution with the `amanda.apply(tool: amanda.Tool)` API.
Every DNN model executed within this context is instrumented by the framework.

In [4]:
tool = CountOPTool("conv2d")

with amanda.apply(tool):
    Y = model(X)
    print(f"Calls of conv2d op: {tool.counter}")

Calls of conv2d op: 4


## Instrument the backward process

Next, we extend the instrumentation concept to the backward pass of DNN execution.
This is the fundamental difference between DNN instrumentation and traditional program instrumentation, where there is only one target program.
In DNN execution, there are two programs to instrument: the forward process and the backward process.

To instrument the backward process, set the `backward` argument of `amanda.Tool.add_inst_for_op()` to `True`. Here we continue by counting the operators in the backward graph.

In [5]:
class CountOPTool(amanda.Tool):
    def __init__(self, op_name: str, backward_op_name: str):
        super().__init__()
        self.counter = 0
        self.backward_counter = 0
        self.op_name = op_name
        self.backward_op_name = backward_op_name
        self.add_inst_for_op(self.callback)
        self.add_inst_for_op(self.backward_callback, backward=True, require_outputs=True)

    # analysis routine, filter conv2d operators
    def callback(self, context: amanda.OpContext):
        op = context.get_op()
        if self.op_name in op.__name__:
            context.insert_before_op(self.counter_op)

    # analysis routine, filter backward operators
    def backward_callback(self, context: amanda.OpContext):
        op = context.get_backward_op()
        if self.backward_op_name in op.__name__:
            context.insert_after_backward_op(self.counter_backward_op)

    # instrumentation routine: op for counting
    def counter_op(self, *inputs):
        self.counter += 1
        return inputs
    
    def counter_backward_op(self, *inputs):
        self.backward_counter += 1
        return inputs

Similarly, we can apply this updated counter tool to the DNN execution.
Note that an explicit backward pass is invoked.

In [6]:
tool = CountOPTool(op_name="conv2d", backward_op_name="Conv")
X = torch.rand((1, 3, 32, 32))
model = ConvNeuralNet(10)

with amanda.tool.apply(tool):
    Y = model(X)
    Y.backward(torch.rand_like(Y))

    print(f"Calls of conv2d op: {tool.counter}, backward op: {tool.backward_counter}")

Calls of conv2d op: 4, backward op: 4


More importantly, operators in the forward and backward passes correspond to each other.
We show a one-to-many case in the [graph mapping part](#forward-and-backward-graph-mapping) of this notebook.

## One tool for all models

With this well-defined instrumentation tool, we can easily locate and count the occurrences of an operator in any model. The tool is decoupled from the original DNN execution and portable across models. Here we show its effect on more DNN models.

ResNet:

In [7]:
from torchvision.models import resnet50

x = torch.rand((1, 3, 227, 227))
model = resnet50()

tool = CountOPTool(op_name="conv2d", backward_op_name="Conv")

with amanda.tool.apply(tool):

    y = model(x)
    y.backward(torch.rand_like(y))
    print(f"Calls of conv2d op: {tool.counter}, backward op: {tool.backward_counter}")


Calls of conv2d op: 53, backward op: 53


For the transformer-based BERT model, we count the number of executions of linear operators.

BERT:

In [8]:
from transformers import BertModel

x = torch.randint(0, 10, (1,8))
model = BertModel.from_pretrained('bert-base-uncased')

tool = CountOPTool(op_name="linear", backward_op_name="Mm")

with amanda.tool.apply(tool):
    y = model(x)
    y[0].backward(torch.rand_like(y[0]))
    print(f"Calls of linear op: {tool.counter}, backward op: {tool.backward_counter}")


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Calls of linear op: 73, backward op: 72


## Instrumentation in TensorFlow graph mode

Next, we showcase the operator counting tool in the graph mode of the TensorFlow machine learning library.
The major difference in graph mode is that the DNN model is first built (compiled) into a static computation graph, whereas operators in eager mode are executed right away.
In fact, this graph building process resembles the just-in-time (JIT) compilation step of program instrumentation.
The analysis routines are invoked during graph building, and the instrumentation routines are inserted into the computation graph.
With this design, the user-level instrumentation interface remains identical.
The operator counting instrumentation tool is defined as follows.

In [9]:
class CountOPTool(amanda.Tool):
    def __init__(self, op_name: str, backward_op_name: str):
        super().__init__()
        self.counter = 0
        self.backward_counter = 0
        self.op_name = op_name
        self.backward_op_name = backward_op_name
        self.add_inst_for_op(self.callback)
        self.add_inst_for_op(self.backward_callback, backward=True, require_outputs=True)

    # analysis routine, filter conv2d operators
    def callback(self, context: amanda.OpContext):
        op = context.get_op()
        if self.op_name in op.name:
            context.insert_before_op(self.counter_op)

    # analysis routine, filter backward operators
    def backward_callback(self, context: amanda.OpContext):
        op = context.get_backward_op()
        if self.backward_op_name in op.name:
            context.insert_after_backward_op(self.counter_backward_op)

    # instrumentation routine: op for counting
    def counter_op(self, *inputs):
        self.counter += 1
        return inputs
    
    def counter_backward_op(self, *inputs):
        self.backward_counter += 1
        return inputs

Similarly, the instrumentation tool is applied to the model forward and backward process with the `amanda.tool.apply()` API.

In [10]:
import tensorflow as tf
from examples.common.tensorflow.model.resnet_50 import ResNet50

tf.logging.set_verbosity(tf.logging.ERROR)

model = ResNet50()
x = tf.random.uniform(shape=[8, 224, 224, 3])

tool = CountOPTool(op_name="Conv2D", backward_op_name="Conv2DBackpropFilter")

with amanda.tool.apply(tool):
    y = model(x)
    with tf.Session() as session:
        session.run(tf.initialize_all_variables())
        g = tf.gradients(y, x)

        session.run(g)

    print(f"Calls of conv2d op: {tool.counter}, backward op: {tool.backward_counter}")

Calls of conv2d op: 106, backward op: 53
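
The build-time versus run-time split described in this section can be illustrated with a small toy, shown below. It is only a sketch under simplifying assumptions: the "graph" is a plain Python list of callables, and `build_graph`, `run_graph` and `stats` are hypothetical names. It does not model TensorFlow's or Amanda's internals. The point is that the analysis step visits each operator once while the graph is built, whereas an inserted routine runs every time the built graph is executed.

```python
stats = {"analysis_calls": 0, "executions": 0}


def counter_op(x):
    # instrumentation routine: runs each time the graph executes this node
    stats["executions"] += 1
    return x


def build_graph(op_names):
    """Build phase: the analysis step visits every operator exactly once."""
    graph = []
    for name in op_names:
        stats["analysis_calls"] += 1
        if "Conv2D" in name:
            graph.append(counter_op)      # inserted before the matching op
        graph.append(lambda x: x + 1)     # stand-in for the real operator
    return graph


def run_graph(graph, x):
    for node in graph:
        x = node(x)
    return x


graph = build_graph(["Conv2D", "Relu", "Conv2D"])
executions_after_build = stats["executions"]   # nothing has executed yet

first = run_graph(graph, 0)
second = run_graph(graph, 0)

print(stats["analysis_calls"], executions_after_build, stats["executions"])  # 3 0 4
```

In this toy, building the graph triggers three analysis calls and no counter updates; the counter only grows when the graph is run, by two per run.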


## Context mapping tool

Between the previous operator counting tools for PyTorch and TensorFlow, the only difference is how the operator name metadata is stored in each context.
In PyTorch, the name is accessed through `op.__name__`,
while in TensorFlow it is accessed through `op.name`.
We can therefore use Amanda's context mapping mechanism to further improve the portability of the operator counting tool.

An instrumentation tool consumes the instrumentation context of the DNN operators and returns an updated context.
Amanda supports dependencies between instrumentation tools, so that a higher-level tool can rely on the context transformed by lower-level tools.
A special mapping tool handles this context mapping by defining mapping rules between namespaces.
The mapping rules for the operator name in the PyTorch and TensorFlow namespaces are defined as follows.

In [11]:
from amanda.tools.mapping import MappingTool

def torch_op_name_rule(context: amanda.OpContext):
    context["op_name"] = context.get_op().__name__
    context["backward_op_name"] = context.get_backward_op().__name__ if context.get_backward_op() is not None else None


def tf_op_name_rule(context: amanda.OpContext):
    context["op_name"] = context.get_op().name if context.get_op() is not None else None
    context["backward_op_name"] = context.get_backward_op().name if context.get_backward_op() is not None else None

mapping_tool = MappingTool(
    rules=[
        ["pytorch", torch_op_name_rule],
        ["tensorflow", tf_op_name_rule],
    ]
)

We update `CountOPTool` to depend on the `MappingTool`, whose rules handle the naming conventions of the different frameworks.
This reflects Amanda's rationale: unify the programming model and interface while offloading case-by-case conversions into reusable rules.
This gives the final version of the operator counting tool.

In [12]:
class CountOPTool(amanda.Tool):
    def __init__(self, op_name: str, backward_op_name: str):
        super().__init__()

        # specify tool dependencies
        self.depends_on(mapping_tool)

        self.counter = 0
        self.backward_counter = 0
        self.op_name = op_name
        self.backward_op_name = backward_op_name
        self.add_inst_for_op(self.callback)
        self.add_inst_for_op(self.backward_callback, backward=True, require_outputs=True)

    # analysis routine, filter conv2d operators
    def callback(self, context: amanda.OpContext):
        if self.op_name in context["op_name"]:
            context.insert_before_op(self.counter_op)

    # analysis routine, filter backward operators
    def backward_callback(self, context: amanda.OpContext):
        if self.backward_op_name in context["backward_op_name"]:
            context.insert_after_backward_op(self.counter_backward_op)

    # instrumentation routine: op for counting
    def counter_op(self, *inputs):
        self.counter += 1
        return inputs
    
    def counter_backward_op(self, *inputs):
        self.backward_counter += 1
        return inputs

PyTorch:

In [13]:
from torchvision.models import resnet50

x = torch.rand((1, 3, 227, 227))
model = resnet50()

tool = CountOPTool(op_name="conv2d", backward_op_name="Conv")

with amanda.tool.apply(tool):

    y = model(x)
    y.backward(torch.rand_like(y))
    print(f"Calls of conv2d op: {tool.counter}, backward op: {tool.backward_counter}")

Calls of conv2d op: 53, backward op: 53


TensorFlow:

In [14]:
import tensorflow as tf
from examples.common.tensorflow.model.resnet_50 import ResNet50

model = ResNet50()
x = tf.random.uniform(shape=[8, 224, 224, 3])

tool = CountOPTool(op_name="Conv2D", backward_op_name="Conv2DBackpropFilter")

with amanda.tool.apply(tool):
    y = model(x)
    with tf.Session() as session:
        session.run(tf.initialize_all_variables())
        g = tf.gradients(y, x)

        session.run(g)
print(tool.counter, tool.backward_counter)

212 159


Note that the tool dependency and context mapping mechanisms are meant to facilitate the modularization and reuse of instrumentation tools.
Unifying the context across different machine learning libraries is just one of their uses.
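
The idea of a lower-level rule normalizing the context before a higher-level tool reads it can be sketched without any framework. The code below is a toy, not how `MappingTool` is implemented: the context is a plain dictionary, `RULES` and `matches` are hypothetical names, and the TensorFlow-style operator name is an illustrative value rather than one taken from a real graph.

```python
from types import SimpleNamespace


def torch_like_rule(context):
    # PyTorch-style operators expose their name as __name__
    context["op_name"] = context["op"].__name__


def tf_like_rule(context):
    # TensorFlow-style operators expose their name as .name
    context["op_name"] = context["op"].name


RULES = {"pytorch": torch_like_rule, "tensorflow": tf_like_rule}


def matches(context, namespace, pattern):
    """Higher-level check that relies only on the unified "op_name" key."""
    RULES[namespace](context)           # the dependency runs first
    return pattern in context["op_name"]


def conv2d(x):
    return x


torch_context = {"op": conv2d}
tf_context = {"op": SimpleNamespace(name="resnet/conv1/Conv2D")}  # illustrative name

print(matches(torch_context, "pytorch", "conv2d"))     # True
print(matches(tf_context, "tensorflow", "Conv2D"))     # True
```

Because `matches` only reads `context["op_name"]`, supporting another library in this toy would mean adding one rule rather than touching the check itself.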

## Extend to complex task: DNN Pruning

Finally, we show how to extend the basic instrumentation tool to a complex DNN optimization task: DNN pruning.
Here we use weight pruning with tensor-wise magnitude pruning.
The DNN weight parameters are pruned statically, based on the magnitude of each value independently.
The following function accepts a weight tensor and returns its pruning mask by selecting the positions with the smallest magnitudes.

In [15]:
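def compute_mask(tensor, sparsity):
    with torch.no_grad():
        # rank the weights by magnitude (absolute value), not by signed value
        flattened_tensor = tensor.abs().reshape(-1)

        # torch.topk requires an integer k
        num_elements_to_prune = int(flattened_tensor.numel() * sparsity)

        _, indices = torch.topk(flattened_tensor, num_elements_to_prune, largest=False)

        # 1 keeps a weight, 0 prunes it, so that weight * mask zeroes the pruned positions
        mask = torch.ones_like(flattened_tensor)
        mask[indices] = 0

        # restore the original weight shape
        mask = mask.view(tensor.size())

    return mask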
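
As a small worked example of magnitude pruning, the following pure-Python sketch computes a mask for a four-element weight list, using the convention that 1 keeps a weight and 0 prunes it (the convention needed when the weight is multiplied by the mask). `magnitude_mask` is a hypothetical helper written only for this illustration; it works on Python lists, not tensors.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude fraction of weights."""
    num_to_prune = int(len(weights) * sparsity)
    # indices ordered by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


weights = [0.5, -0.1, 0.3, -0.8]
mask = magnitude_mask(weights, sparsity=0.5)
kept = [w for w, m in zip(weights, mask) if m]

print(mask)  # [1, 0, 0, 1]
print(kept)  # [0.5, -0.8]
```

With a sparsity of 0.5, the two smallest magnitudes (0.1 and 0.3) are pruned regardless of sign, and the two largest survive.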

With this pruning function, a pruning instrumentation tool is easy to implement.
In the analysis routine, we select the target convolution and linear operators,
compute the pruning mask with the previous function, and inject the mask into the operator context.
A pruning operator, which multiplies the weight tensor by the pruning mask, is also inserted before the operator executes.

In [16]:
class PruningTool(amanda.Tool):
    def __init__(self, sparsity: float):
        super().__init__()
        # fraction of each weight tensor to prune, passed on to compute_mask
        self.sparsity = sparsity
        self.add_inst_for_op(self.callback)

    # analysis routine
    def callback(self, context: amanda.OpContext):
        op = context.get_op()
        if op.__name__ not in ["conv2d", "linear"]:
            return
        weight = context.get_inputs()[1]
        mask = compute_mask(weight, self.sparsity)
        context["mask"] = mask
        context.insert_before_op(self.prune_weight, inputs=[1], mask=mask)

    # instrumentation routine
    def prune_weight(self, weight, mask):
        return torch.mul(weight, mask)

## Problems with PyTorch Module Hook

In the following, we show how the basic module hook API is used and where it falls short.

In [17]:
def add_module_hooks(model, op_name, hook):
    if model.__class__.__name__ == op_name:
        model.register_forward_hook(hook)

    if isinstance(model, nn.Module):
        for child_name, child in model.named_children():
            add_module_hooks(child, op_name, hook)

X = torch.rand((1, 3, 32, 32))
model = ConvNeuralNet(num_classes=10)

add_module_hooks(model, 'ReLU', lambda m,i,o: print("hooking relu"))

Y = model(X)

hooking relu


By recursively traversing the DNN module object and registering a forward hook on the target operator, one can also insert code at a particular operator.

However, this fails when the operator is not defined with the `Module` API.
For example, the following CNN model is nearly identical to the one we used at the beginning; only the way the ReLU activation function is declared has changed.
This time, the module hook fails to provide the desired entry point to the operator.

In [18]:
class ConvNeuralNet(nn.Module):
    def __init__(self, num_classes):
        super(ConvNeuralNet, self).__init__()
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        
        self.conv_layer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        
        self.fc1 = nn.Linear(1600, 128)
        self.fc2 = nn.Linear(128, num_classes)
    
    def forward(self, x):
        out = self.conv_layer1(x)
        out = self.conv_layer2(out)
        out = self.max_pool1(out)
        
        out = self.conv_layer3(out)
        out = self.conv_layer4(out)
        out = self.max_pool2(out)
                
        out = out.reshape(out.size(0), -1)
        
        out = self.fc1(out)
        out = torch.relu(out)
        out = self.fc2(out)
        return out
    
X = torch.rand((1, 3, 32, 32))
model = ConvNeuralNet(num_classes=10)

add_module_hooks(model, 'ReLU', lambda m,i,o: print("hooking relu"))

Y = model(X)

In contrast, the previously defined instrumentation tool still covers this operator regardless of the API used. This is a very common case in real-world networks, especially for operations without parameters, for example the matrix multiplication between the Q and K activation tensors in the attention mechanism.

In [19]:
X = torch.rand((1, 3, 32, 32))
model = ConvNeuralNet(num_classes=10)

tool = CountOPTool(op_name="relu", backward_op_name="Relu")

with amanda.tool.apply(tool):
    Y = model(X)
    Y.backward(torch.rand_like(Y))
    print(f"Calls of relu op: {tool.counter}, backward op: {tool.backward_counter}")

Calls of relu op: 1, backward op: 1


## Forward and backward graph mapping

As mentioned earlier, the major difference of DNN instrumentation is the existence of the backward graph.
One particular corner case is that a single forward operator might invoke multiple backward operators for its gradient propagation.
Here we use the RNN operator as an example.
The whole forward process is fused into one large `rnn` operator, while multiple backward operators are launched following its internal computation logic.

In [27]:
# One to many mapping with rnn example

x = torch.rand(16, 2, 128)
model = torch.nn.RNN(input_size=128, hidden_size=128, num_layers=4, batch_first=False)

tool = CountOPTool(op_name="rnn", backward_op_name="mul")

with amanda.tool.apply(tool):

    y = model(x)
    y[0].backward(torch.rand_like(y[0]))
    print(f"Call of rnn op: {tool.counter}, backward op: {tool.backward_counter}")

Call of rnn op: 1, backward op: 16
