# Understanding torch::nn::Module

In [1]:
#pragma cling add_include_path("../../libtorch/include")
#pragma cling add_include_path("../../libtorch/include/torch/csrc/api/include")
#pragma cling add_library_path("../../libtorch/lib")
#pragma cling load("libtorch")

In [2]:
#include <iostream>
#include <tuple>
#include <string>
#include <vector>
#include <memory>
#include <functional>
#include <type_traits>
#include <torch/torch.h>
#include <torch/script.h>
namespace nn = torch::nn;

* Part I: how to use torch::nn::Module
* Part II: understand it by implementing it

# Part I: how to use torch::nn::Module

## 1 Defining the Neural Network Models

In line with the Python interface, neural networks based on the C++ frontend are composed of reusable building blocks called modules. There is a base module class from which all other modules are derived. In Python, this class is torch.nn.Module and in C++ it is torch::nn::Module. Besides a forward() method that implements the algorithm the module encapsulates, a module usually contains any of three kinds of sub-objects: parameters, buffers and submodules.
    
Parameters and buffers store state in the form of tensors. Parameters record gradients, while buffers do not. Parameters are usually the trainable weights of your neural network. Examples of buffers include means and variances for batch normalization. In order to re-use particular blocks of logic and state, the PyTorch API allows modules to be nested. A nested module is termed a submodule.
    
Parameters, buffers and submodules must be explicitly registered. Once registered, methods like parameters() or buffers() can be used to retrieve a container of all parameters (or buffers) in the entire (nested) module hierarchy. Similarly, methods like to(...) work on the entire module hierarchy; for example, to(torch::kCUDA) moves all parameters and buffers from CPU to CUDA memory.

## 1.1 Defining a Module and Registering Parameters
To put these words into code, let’s consider this simple module written in the Python interface:

~~~
import torch

class Net(torch.nn.Module):
  def __init__(self, N, M):
    super(Net, self).__init__()
    self.W = torch.nn.Parameter(torch.randn(N, M))
    self.b = torch.nn.Parameter(torch.randn(M))

  def forward(self, input):
    return torch.addmm(self.b, input, self.W)
~~~

In C++, it would look like this:

In [3]:
struct NetWithParameter : torch::nn::Module {
  NetWithParameter(int64_t N, int64_t M) {
    // register_parameter stores the tensor in the base class and returns it.
    W = register_parameter("W", torch::randn({N, M}));
    b = register_parameter("b", torch::randn(M));
  }
  torch::Tensor forward(torch::Tensor input) {
    return torch::addmm(b, input, W);
  }
  torch::Tensor W, b;
};

Just like in Python, we define a class called NetWithParameter (for simplicity, a struct instead of a class, so that its constructor and members are public) and derive it from the module base class. Inside the constructor, we create tensors using torch::randn just like we use torch.randn in Python. One interesting difference is how we register the parameters. In Python, we wrap the tensors with the torch.nn.Parameter class, while in C++ we have to pass the tensor through the register_parameter method instead. The reason for this is that the Python API can detect that an attribute is of type torch.nn.Parameter and automatically registers such tensors. In C++, reflection is very limited, so a more traditional (and less magical) approach is provided.

## 1.2 Registering Submodules and Traversing the Module Hierarchy

In the same way we can register parameters, we can also register submodules. In Python, submodules are automatically detected and registered when they are assigned as an attribute of a module:

~~~
class Net(torch.nn.Module):
  def __init__(self, N, M):
    super(Net, self).__init__()
    # Registered as a submodule behind the scenes
    self.linear = torch.nn.Linear(N, M)
    self.another_bias = torch.nn.Parameter(torch.rand(M))

  def forward(self, input):
    return self.linear(input) + self.another_bias
~~~
This allows us, for example, to use the parameters() method to recursively access all parameters in our module hierarchy:
~~~
>>> net = Net(4, 5)
>>> print(list(net.parameters()))
[Parameter containing:
tensor([0.0808, 0.8613, 0.2017, 0.5206, 0.5353], requires_grad=True), Parameter containing:
tensor([[-0.3740, -0.0976, -0.4786, -0.4928],
        [-0.1434,  0.4713,  0.1735, -0.3293],
        [-0.3467, -0.3858,  0.1980,  0.1986],
        [-0.1975,  0.4278, -0.1831, -0.2709],
        [ 0.3730,  0.4307,  0.3236, -0.0629]], requires_grad=True), Parameter containing:
tensor([ 0.2038,  0.4638, -0.2023,  0.1230, -0.0516], requires_grad=True)]
~~~

To register submodules in C++, use the aptly named register_module() method to register a module like torch::nn::Linear:

In [4]:
class Net : public torch::nn::Module {
 public:
  Net(int64_t N, int64_t M)
      : linear(register_module("linear", torch::nn::Linear(N, M))) {
    // Registered under the name "b", which is the key named_parameters() reports.
    another_bias = register_parameter("b", torch::randn(M));
  }

  torch::Tensor forward(torch::Tensor input) {
    return linear(input) + another_bias;
  }

  torch::nn::Linear linear;
  torch::Tensor another_bias;
};

One subtlety about the above code is why the submodule was created in the constructor’s initializer list, while the parameter was created inside the constructor body. There is a good reason for this, which we’ll touch upon in the section on the C++ frontend’s ownership model further below. The end result, however, is that we can recursively access our module tree’s parameters just like in Python. Calling parameters() returns a std::vector<torch::Tensor>, which we can iterate over:

In [5]:
int main() {
  Net net(4, 5);

  // parameters() returns every parameter in the module hierarchy.
  std::vector<torch::Tensor> parameters = net.parameters();
  for (const auto& p : parameters) {
    std::cout << p << std::endl;
  }

  // named_parameters() also reports the (dotted) name of each parameter.
  torch::OrderedDict<std::string, torch::Tensor> ordered_parameter_dict = net.named_parameters();
  for (const auto& pair : ordered_parameter_dict) {
    std::cout << pair.key() << ": " << pair.value() << std::endl;
  }
};

In [6]:
main();

 1.5935
-1.1713
 0.7192
-0.9615
-0.7630
[ CPUFloatType{5} ]
-0.0294  0.1393 -0.1203  0.1185
-0.3770 -0.4968 -0.0325  0.1521
 0.2261  0.2344  0.2131 -0.1853
-0.4308  0.3387 -0.2980 -0.0603
 0.4718 -0.2516 -0.0566 -0.3367
[ CPUFloatType{5,4} ]
 0.0899
 0.2099
 0.2911
-0.3102
 0.4213
[ CPUFloatType{5} ]
b:  1.5935
-1.1713
 0.7192
-0.9615
-0.7630
[ CPUFloatType{5} ]
linear.weight: -0.0294  0.1393 -0.1203  0.1185
-0.3770 -0.4968 -0.0325  0.1521
 0.2261  0.2344  0.2131 -0.1853
-0.4308  0.3387 -0.2980 -0.0603
 0.4718 -0.2516 -0.0566 -0.3367
[ CPUFloatType{5,4} ]
linear.bias:  0.0899
 0.2099
 0.2911
-0.3102
 0.4213
[ CPUFloatType{5} ]


## 1.3 Running the Network in Forward Mode
To execute the network in C++, we simply call the forward() method we defined ourselves:

In [9]:
int main() {
  Net net(4, 5);
  std::cout << net.forward(torch::ones({2, 4})) << std::endl;
};

In [10]:
main();

 1.0008  1.6886 -0.4196 -1.2001 -1.4664
 1.0008  1.6886 -0.4196 -1.2001 -1.4664
[ CPUFloatType{2,5} ]


## 2 Module Ownership

At this point, we know how to define a module in C++, register parameters, register submodules, traverse the module hierarchy via methods like parameters(), and finally run the module’s forward() method. There are many more methods, classes and topics to devour in the C++ API; the official docs have the full menu. The original tutorial goes on to implement a DCGAN model and an end-to-end training pipeline, but before that it briefly covers the ownership model the C++ frontend provides for subclasses of torch::nn::Module, which is what this section summarizes.

**For this discussion, the ownership model refers to the way modules are stored and passed around – which determines who or what owns a particular module instance. In Python, objects are always allocated dynamically (on the heap) and have reference semantics.** This is very easy to work with and straightforward to understand. In fact, in Python, you can largely forget about **where objects live and how they get referenced**, and focus on getting things done.

C++, being a lower-level language, provides more options in this realm. This increases complexity and heavily influences the design and ergonomics of the C++ frontend. In particular, for modules in the C++ frontend, **we have the option of using either value semantics or reference semantics.** The first case is the simplest and was shown in the examples thus far: **module objects are allocated on the stack and, when passed to a function,** can be copied, moved (with std::move), or taken by reference or by pointer:

In [None]:
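struct Net : torch::nn::Module { };

void a(Net net) { }   // takes its own copy (or a moved-from original)
void b(Net& net) { }  // takes a reference
void c(Net* net) { }  // takes a pointer

int main() {
  Net net;
  a(net);             // copy
  b(net);             // by reference
  c(&net);            // by pointer
  a(std::move(net));  // move last: net is left in a valid but unspecified state
}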

For the second case – reference semantics – we can use std::shared_ptr. The advantage of reference semantics is that, like in Python, it reduces the cognitive overhead of thinking about how modules must be passed to functions and how arguments must be declared (assuming you use shared_ptr everywhere).

In [None]:
struct Net : torch::nn::Module {};

void a(std::shared_ptr<Net> net) { }

int main() {
  auto net = std::make_shared<Net>();
  a(net);
}

In our experience, researchers coming from dynamic languages greatly prefer reference semantics over value semantics, even though the latter is more “native” to C++. It is also important to note that torch::nn::Module’s design, in order to stay close to the ergonomics of the Python API, relies on shared ownership. For example, take our earlier (here shortened) definition of Net:

In [None]:
struct Net : torch::nn::Module {
  Net(int64_t N, int64_t M)
    : linear(register_module("linear", torch::nn::Linear(N, M)))
  { }
  torch::nn::Linear linear;
};

**In order to use the linear submodule, we want to store it directly in our class. However, we also want the module base class to know about and have access to this submodule. For this, it must store a reference to this submodule. At this point, we have already arrived at the need for shared ownership: both the torch::nn::Module base class and the concrete Net class require a reference to the submodule. For this reason, the base class stores modules as shared_ptrs, and therefore the concrete class must too.**

But wait! I don’t see any mention of shared_ptr in the above code! Why is that? Well, because std::shared_ptr<MyModule> is a hell of a lot to type. To keep researchers productive, we came up with an elaborate scheme to hide the mention of shared_ptr (**the PIMPL technique**), **a benefit usually reserved for value semantics, while retaining reference semantics.** To understand how this works, we can take a look at a simplified definition of the torch::nn::Linear module in the core library (the full definition is in the PyTorch source):

In [None]:
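// Simplified declaration only; the real LinearImpl lives in namespace torch::nn.
struct LinearImpl : torch::nn::Module {
  LinearImpl(int64_t in, int64_t out);

  torch::Tensor forward(const torch::Tensor& input);

  torch::Tensor weight, bias;
};

// Generates the holder class Linear, a wrapper around std::shared_ptr<LinearImpl>.
TORCH_MODULE(Linear);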

In brief: the module is not called Linear, but LinearImpl. A macro, TORCH_MODULE then defines the actual Linear class. This “generated” class is effectively a wrapper over a std::shared_ptr<LinearImpl>. It is a wrapper instead of a simple typedef so that, among other things, constructors still work as expected, i.e. you can still write torch::nn::Linear(3, 4) instead of std::make_shared<LinearImpl>(3, 4). We call the class created by the macro the module holder. Like with (shared) pointers, you access the underlying object using the arrow operator (like model->forward(...)). The end result is an ownership model that resembles that of the Python API quite closely. **Reference semantics become the default, but without the extra typing of std::shared_ptr or std::make_shared.** For our Net, using the module holder API looks like this:

In [None]:
struct NetImpl : torch::nn::Module {};
TORCH_MODULE(Net);

void a(Net net) { }

int main() {
  Net net;
  a(net);
}

There is one subtle issue that deserves mention here. A default constructed std::shared_ptr is “empty”, i.e. contains a null pointer. What is a default constructed Linear or Net? Well, it’s a tricky choice. We could say it should be an empty (null) std::shared_ptr<LinearImpl>. However, recall that Linear(3, 4) is the same as std::make_shared<LinearImpl>(3, 4). This means that if we had decided that Linear linear; should be a null pointer, then there would be no way to construct a module that does not take any constructor arguments, or defaults all of them. For this reason, in the current API, a default constructed module holder (like Linear()) invokes the default constructor of the underlying module (LinearImpl()). If the underlying module does not have a default constructor, you get a compiler error. To instead construct the empty holder, you can pass nullptr to the constructor of the holder.

In practice, this means you can use submodules either like shown earlier, where the module is registered and constructed in the initializer list:

In [None]:
struct Net : torch::nn::Module {
  Net(int64_t N, int64_t M)
    : linear(register_module("linear", torch::nn::Linear(N, M)))
  { }
  torch::nn::Linear linear;
};

or you can first construct the holder with a null pointer and then assign to it in the constructor (more familiar for Pythonistas):

In [None]:
struct Net : torch::nn::Module {
  Net(int64_t N, int64_t M) {
    linear = register_module("linear", torch::nn::Linear(N, M));
  }
  torch::nn::Linear linear{nullptr}; // construct an empty holder
};

In conclusion: Which ownership model – which semantics – should you use? The C++ frontend’s API best supports the ownership model provided by module holders. The only disadvantage of this mechanism is one extra line of boilerplate below the module declaration. That said, the simplest model is still the value semantics model shown in the introduction to C++ modules. For small, simple scripts, you may get away with it too. But you’ll find sooner or later that, for technical reasons, it is not always supported. For example, the serialization API (torch::save and torch::load) only supports module holders (or plain shared_ptr). As such, the module holder API is the recommended way of defining modules with the C++ frontend, and we will use this API in this tutorial henceforth.

* Conclusion 1: if you define your network as class Net : public torch::nn::Module and work with instances of Net directly, you are using value semantics (wrapping them in std::shared_ptr yourself gives you reference semantics instead).

* Conclusion 2: if you define your network as class NetImpl : public torch::nn::Module followed by TORCH_MODULE(Net);, the generated Net module holder gives you reference semantics.


# Reference

https://pytorch.org/tutorials/advanced/cpp_frontend.html