### Train with NeMo

[Neural Modules (NeMo)](https://nvidia.github.io/NeMo/index.html) is a framework-agnostic toolkit for building AI applications. It currently supports the PyTorch framework.

Training a PyTorch model with NeMo is straightforward. This notebook demonstrates how to train the Asian Barrier Option pricing model with NeMo.

In [None]:
!pip install Cython
!pip install nemo_toolkit[all]==1.0.0rc1

Collecting nemo_toolkit[all]==1.0.0rc1
Collecting omegaconf>=2.0.5
Collecting torchtext==0.8.0
Collecting pangu
Collecting sentencepiece<1.0.0

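Defining the trainable module is similar to defining a PyTorch module, except that it also declares its input and output ports. Note that nemo.backends.pytorch.nm and the port-based API used here belong to the pre-1.0 NeMo releases: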

In [None]:
%%writefile nemo_model.py
import torch.nn as nn
import torch.nn.functional as F
import torch
from nemo.core.neural_types import BatchTag, ChannelTag, NeuralType, AxisType
import nemo

class Net(nemo.backends.pytorch.nm.TrainableNM):
#class Net(nn.Module):
    @staticmethod
    def create_ports():
        input_ports = {"x": NeuralType({0: AxisType(BatchTag),
                                        1: AxisType(ChannelTag, 6)})}
        output_ports = {"y_pred": NeuralType({0: AxisType(BatchTag),
                                              1: AxisType(ChannelTag, 1)})}
        return input_ports, output_ports

    def __init__(self, hidden=512, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.fc1 = nn.Linear(6, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, hidden)
        self.fc4 = nn.Linear(hidden, hidden)
        self.fc5 = nn.Linear(hidden, hidden)
        self.fc6 = nn.Linear(hidden, 1)
        self.register_buffer('norm',
                             torch.tensor([200.0,
                                           198.0,
                                           200.0,
                                           0.4,
                                           0.2,
                                           0.2]))

    def forward(self, x):
        x = x / self.norm
        x = F.elu(self.fc1(x))
        x = F.elu(self.fc2(x))
        x = F.elu(self.fc3(x))
        x = F.elu(self.fc4(x))
        x = F.elu(self.fc5(x))
        return self.fc6(x)

Overwriting nemo_model.py
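
Why the scaling constants are registered as a buffer, rather than stored as a plain attribute or a parameter, can be seen in a small plain-PyTorch sketch. `TinyNet` is a hypothetical stand-in for the NeMo module above (one linear layer instead of six), and the sketch assumes only that `torch` is installed.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for Net: one linear layer plus the same scaling buffer."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(6, 1)
        # A buffer is saved in state_dict and moved by .to()/.cuda(),
        # but it is not a parameter, so the optimizer never updates it.
        self.register_buffer(
            "norm", torch.tensor([200.0, 198.0, 200.0, 0.4, 0.2, 0.2]))

    def forward(self, x):
        # Scale each of the six inputs before the first layer.
        return self.fc(x / self.norm)


model = TinyNet()
param_names = [name for name, _ in model.named_parameters()]
buffer_names = [name for name, _ in model.named_buffers()]
batch = torch.ones(4, 6)          # 4 samples, 6 input features
out = model(batch)

print(param_names)                # the linear layer's weight and bias only
print(buffer_names)               # the scaling constants
print(tuple(out.shape))           # one prediction per sample
```

The same reasoning applies to the full network: the constants travel with the model to the GPU and into checkpoints, while training leaves them untouched.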


The NeMo DataLayer module wraps an ordinary PyTorch Dataset:

In [None]:
%%writefile nemo_datalayer.py
import torch
import nemo
from nemo.core.neural_types import BatchTag, ChannelTag, NeuralType, AxisType


class OptionDataSet(torch.utils.data.Dataset):
    def __init__(self, filename, rank=0, world_size=5):
        tensor = torch.load(filename)
        self.tensor = (tensor[0], tensor[1])
        self.length = len(self.tensor[0]) // world_size
        self.world_size = world_size
        self.rank = rank

    def __getitem__(self, index):
        index = index * self.world_size + self.rank

        return self.tensor[0][index], self.tensor[1][index]

    def __len__(self):
        return self.length

class OptionDataLayer(nemo.backends.pytorch.nm.DataLayerNM):
    @staticmethod
    def create_ports():
        # The data layer takes no inputs. It emits a batch of 6-feature
        # inputs ("x") and the matching ground-truth values ("ground").
        input_ports = {}
        output_ports = {
            "x": NeuralType({0: AxisType(BatchTag),
                             1: AxisType(ChannelTag, 6)}),
            "ground": NeuralType({0: AxisType(BatchTag)})
        }
        return input_ports, output_ports

    def __init__(self, filename, rank=0, world_size=5, **kwargs):
        super().__init__(**kwargs)
        self._dataset = OptionDataSet(filename, rank, world_size)

    def __len__(self):
        return len(self._dataset)

    @property
    def dataset(self):
        return self._dataset

    @property
    def data_iterator(self):
        return None

Overwriting nemo_datalayer.py
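
The way OptionDataSet splits one file across workers can be checked without PyTorch. The sketch below mirrors its index arithmetic in plain Python; `shard_indices` is a hypothetical helper written for this illustration, and the sizes are arbitrary.

```python
def shard_indices(total, rank, world_size):
    """Global indices read by one rank, mirroring OptionDataSet.__getitem__."""
    length = total // world_size          # same value as OptionDataSet.__len__
    return [i * world_size + rank for i in range(length)]


total, world_size = 10, 4
shards = [shard_indices(total, rank, world_size) for rank in range(world_size)]
for rank, shard in enumerate(shards):
    print(rank, shard)

# Collect every index that some rank reads, to check for overlap and gaps.
seen = sorted(i for shard in shards for i in shard)
print(seen)
```

Each rank reads an interleaved, non-overlapping slice. Because the length is rounded down, the last `total % world_size` samples are never read by any rank.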


We define the loss Neural Module as follows. It wraps PyTorch's MSELoss and adds input and output types:

In [None]:
%%writefile nemo_losslayer.py
import nemo
from nemo.core.neural_types import BatchTag, ChannelTag, NeuralType, AxisType
import torch

class MSELoss(nemo.backends.pytorch.nm.LossNM):
    @staticmethod
    def create_ports():
        input_ports = {"y_pred": NeuralType({0: AxisType(BatchTag),
                                             1: AxisType(ChannelTag, 1)}),
                       "ground": NeuralType({0: AxisType(BatchTag)})}
        output_ports = {"loss": NeuralType(None)}
        return input_ports, output_ports

    def __init__(self, **kwargs):
        # Neural Module API specific
        super().__init__(**kwargs)
        # End of Neural Module API specific
        self._loss = torch.nn.MSELoss()

    # You need to implement this function
    def _loss_function(self, **kwargs):
        # Drop the channel axis so y_pred (batch, 1) matches ground (batch,)
        return self._loss(kwargs['y_pred'][:, 0], kwargs['ground'])

Overwriting nemo_losslayer.py
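
The `[:, 0]` indexing in the loss matters because the prediction has shape (batch, 1) while the ground truth has shape (batch,). A minimal sketch in plain PyTorch, with made-up values, shows what happens when the shapes are left mismatched; it assumes only that `torch` is installed.

```python
import warnings

import torch

y_pred = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), like the "y_pred" port
ground = torch.tensor([1.0, 2.0, 4.0])          # shape (3,), like the "ground" port
mse = torch.nn.MSELoss()

# Matching shapes: the element-wise errors are 0, 0, -1, so the mean square is 1/3.
matched = mse(y_pred[:, 0], ground).item()

# Mismatched shapes broadcast to (3, 3) and average over all nine pairs.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")             # PyTorch warns about the size mismatch
    broadcast = mse(y_pred, ground).item()

print(matched, broadcast)
```

The broadcast result compares every prediction with every target, which is not the loss the model should be trained on.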


In [None]:
!pip install nemo-asr # installs NeMo ASR collection
!pip install nemo-nlp # installs NeMo NLP collection
!pip install nemo-tts # installs NeMo TTS collection

Collecting nemo-asr
Collecting num2words
Installing collected packages: num2words, nemo-asr
Successfully installed nemo-asr-0.9.0 num2words-0.5.10
Collecting nemo-nlp
Collecting nemo-tts
Installing collected packages: nemo-tts
Successfully installed nemo-tts-0.9.0


To use Neural Modules, we follow three steps:

1. Create a NeuralModuleFactory and the necessary NeuralModules.
2. Define a Directed Acyclic Graph (DAG) of NeuralModules.
3. Call an "action" such as train.

In [None]:
import nemo
from nemo.core import DeviceType
from nemo_model import Net
from nemo_datalayer import OptionDataLayer
from nemo_losslayer import MSELoss
nf = nemo.core.NeuralModuleFactory()
# Data layer arguments: file name, rank 0, world size 1 (single process)
dl = OptionDataLayer('trn.pth', 0, 1, batch_size=32)

# instantiate necessary neural modules
fx = Net(hidden=512).cuda()
loss = MSELoss()

# describe activation's flow
x, y = dl()
p = fx(x=x)
lss = loss(y_pred=p, ground=y)

# SimpleLossLoggerCallback will print loss values to console.
callback = nemo.core.SimpleLossLoggerCallback(
    tensors=[lss],
    print_func=lambda x: print(f'Train Loss: {str(x[0].item())}'))

# Invoke "train" action
nf.train([lss], callbacks=[callback],
         optimization_params={"num_epochs": 20, "lr": 0.0003},
         optimizer="adam")

ImportError: ignored

NVIDIA Volta and Turing GPUs have Tensor Cores, which perform fast matrix multiplications on values in float16 format. To enable mixed precision in NeMo, set the optimization_level parameter of nemo.core.NeuralModuleFactory to nemo.core.Optimization.mxprO1. For example:

In [None]:
nf = nemo.core.NeuralModuleFactory(optimization_level=nemo.core.Optimization.mxprO1)
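
One reason mixed-precision training relies on loss scaling is that float16 cannot represent very small values. The sketch below illustrates this with the standard library only; the gradient magnitude and the scale factor are hypothetical numbers chosen for the illustration, not values NeMo uses.

```python
import struct


def to_float16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", value))[0]


tiny_grad = 1e-8                    # hypothetical gradient magnitude
scale = 1024.0                      # hypothetical loss-scale factor

lost = to_float16(tiny_grad)                    # underflows to zero in float16
kept = to_float16(tiny_grad * scale) / scale    # survives when scaled first

print(lost, kept)
```

Scaling the loss before the backward pass and unscaling afterwards keeps such values from vanishing, at the cost of a small rounding error.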

For multi-GPU training in NeMo, follow two steps:

1. Set placement to nemo.core.DeviceType.AllGpu in NeuralModuleFactory.
2. Add the local_rank argument to your script and do not set it yourself, because the launcher supplies it: parser.add_argument("--local_rank", default=None, type=int)

In [None]:
%%writefile nemo_dis_train.py
import nemo
from nemo.core import DeviceType
from nemo_model import Net
from nemo_datalayer import OptionDataLayer
from nemo_losslayer import MSELoss
import argparse
import os
parser = argparse.ArgumentParser(description='Asian Barrier Option pricing with NeMo')
parser.add_argument("--local_rank", default=None, type=int)

args = parser.parse_args()

if args.local_rank is not None:
    device = nemo.core.DeviceType.AllGpu
else:
    device = nemo.core.DeviceType.GPU

# The launcher sets WORLD_SIZE; fall back to one process when run directly.
world_size = int(os.environ.get('WORLD_SIZE', 1))
# Use shard 0 when no local rank is given, so the dataset index stays an int.
rank = args.local_rank if args.local_rank is not None else 0

nf = nemo.core.NeuralModuleFactory(backend=nemo.core.Backend.PyTorch,
    local_rank=args.local_rank,
    placement=device,
    optimization_level=nemo.core.Optimization.mxprO1)
dl = OptionDataLayer('trn.pth', rank, world_size, batch_size=32)

# instantiate necessary neural modules
fx = Net(hidden=512).cuda()
loss = MSELoss()

# describe activation's flow
x, y = dl()
p = fx(x=x)
lss = loss(y_pred=p, ground=y)

# SimpleLossLoggerCallback will print loss values to console.
callback = nemo.core.SimpleLossLoggerCallback(
    tensors=[lss],
    print_func=lambda x: print(f'Train Loss: {str(x[0].item())}'))

# Invoke "train" action
nf.train([lss], callbacks=[callback],
         optimization_params={"num_epochs": 20, "lr": 0.0003},
         optimizer="adam")

Overwriting nemo_dis_train.py
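
How a worker can derive its rank and world size from what the launcher provides can be exercised without any GPUs. In the sketch below, `resolve_rank` is a hypothetical helper; it assumes the launcher passes `--local_rank=<n>` on the command line and sets the `WORLD_SIZE` environment variable, and the single-process fallback is an assumption added for this illustration.

```python
import argparse


def resolve_rank(argv, environ):
    """Return (rank, world_size) the way a launched worker could derive them."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", default=None, type=int)
    args = parser.parse_args(argv)
    # Fall back to a single-process setup when no launcher is involved.
    rank = 0 if args.local_rank is None else args.local_rank
    world_size = int(environ.get("WORLD_SIZE", 1))
    return rank, world_size


# Simulated worker 2 of 4, then a plain single-process run.
launched = resolve_rank(["--local_rank=2"], {"WORLD_SIZE": "4"})
standalone = resolve_rank([], {})
print(launched, standalone)
```

Passing the argument list and environment in explicitly keeps the helper testable; a real script would call it with `sys.argv[1:]` and `os.environ`.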


In [None]:
!python -m torch.distributed.launch --nproc_per_node=4 nemo_dis_train.py

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Doing distributed training
Doing distributed training
Doing distributed training
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
m

The [callback API](https://nvidia.github.io/NeMo/tutorials/callbacks.html) makes it easy to set up checkpoints and to evaluate the model on the validation dataset. See the documentation for details.