# Training NNP

In this tutorial, we provide instructions for training a coarse-grained model using torchmd-net.
To train the neural network potential (NNP), first install the `torchmd-net` repo: https://github.com/torchmd/torchmd-net

# Training Data

The arrays containing coordinates and forces for the CA atoms of the proteins described in the papers are available for download in `../Datasets`.


For simplicity, we will follow a reduced example using the smallest protein, Chignolin. First, download the files:

Coordinates:

In [None]:
!wget pub.htmd.org/protein_thermodynamics_data/training_data/chignolin_ca_coords.npy

Next, the delta-forces. Delta-forces are obtained by subtracting the prior forces from the true forces produced during the MD runs. The prior forces are computed from the `Bonded, RepulsionCG, Dihedral` force terms. For details on the functional form of these force terms, see the Supporting Information of the publication. The prior force field used for the computation is provided in `prior_force_firld.yaml`. For a tutorial on how to construct the force field and compute delta-forces, see the torchmd-cg repo: https://github.com/torchmd/torchmd-cg


Here we provide already computed delta-forces:

In [None]:
!wget pub.htmd.org/protein_thermodynamics_data/training_data/chignolin_ca_deltaforces.npy
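
The subtraction described above can be sketched with NumPy. The arrays below are synthetic stand-ins, and the `(n_frames, n_beads, 3)` layout is an assumption about how the downloaded files are organized. Computing real prior forces requires the prior force field and is covered in the torchmd-cg tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumed layout: frames x CA beads x xyz).
n_frames, n_beads = 5, 10
true_forces = rng.normal(size=(n_frames, n_beads, 3))   # forces from the MD runs
prior_forces = rng.normal(size=(n_frames, n_beads, 3))  # forces from the prior terms

# Delta-forces are the part of the forces the network has to learn.
delta_forces = true_forces - prior_forces

# Adding the prior back recovers the original MD forces.
recovered = delta_forces + prior_forces
print(delta_forces.shape, np.allclose(recovered, true_forces))
```

At simulation time the same relation runs in reverse: the network's predicted delta-forces are added to the prior forces.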

# Embeddings

Amino acids are encoded as integers. The embedding dictionary used in this project is:

In [None]:
AA2INT = {'ALA':1,
          'GLY':2,
          'PHE':3,
          'TYR':4,
          'ASP':5,
          'GLU':6,
          'TRP':7,
          'PRO':8,
          'ASN':9,
          'GLN':10,
          'HIS':11,
          'HSE':11,
          'HSD':11,
          'SER':12,
          'THR':13,
          'VAL':14,
          'MET':15,
          'CYS':16,
          'NLE':17,
          'ARG':19,
          'LYS':20,
          'LEU':21,
          'ILE':22
         }

In [None]:
! wget pub.htmd.org/protein_thermodynamics_data/training_data/chignolin_ca_embeddings.npy
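
As a minimal sketch of how such a dictionary turns a sequence into an embedding array, the snippet below maps three-letter residue names to integers. The `residues` list is a made-up example, not the Chignolin sequence, and only the entries of `AA2INT` it needs are repeated so that the block runs on its own.

```python
import numpy as np

# Subset of the AA2INT dictionary shown above.
AA2INT = {'ALA': 1, 'GLY': 2, 'TYR': 4, 'PRO': 8, 'HIS': 11, 'HSE': 11, 'HSD': 11}

# Hypothetical sequence, one entry per CA bead.
residues = ['GLY', 'TYR', 'PRO', 'HSD', 'ALA']

# One integer per bead, in sequence order.
embeddings = np.array([AA2INT[name] for name in residues])
print(embeddings.tolist())  # [2, 4, 8, 11, 1]
```

The three histidine names (`HIS`, `HSE`, `HSD`) share one integer, so the variants receive the same embedding.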

# Training

In the next step, we use the coordinates and delta-forces to train the network.

The SchNet architecture used here learns features through continuous-filter convolutions on a graph neural network and predicts the energy and forces of the system.
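
To make the idea of a continuous-filter convolution concrete, here is a simplified NumPy sketch. It is not the torchmd-net implementation. It expands pairwise distances in Gaussian radial basis functions, maps them to per-pair filters with a weight matrix, and sums the filtered neighbor features within a cutoff. All names (`gaussian_rbf`, `cfconv`, `W`) and the random inputs are illustrative assumptions; the real model uses learned weights and a different basis (`rbf_type: expnorm` in the configuration).

```python
import numpy as np

def gaussian_rbf(dist, centers, width):
    """Expand distances (N, N) into radial basis features (N, N, K)."""
    return np.exp(-((dist[..., None] - centers) ** 2) / (2.0 * width ** 2))

def cfconv(features, pos, W, centers, width, cutoff):
    """One continuous-filter convolution: sum_j features[j] * filter(d_ij)."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    # Neighbors are beads within the cutoff, excluding the bead itself.
    neighbors = (dist < cutoff) & ~np.eye(len(pos), dtype=bool)
    filters = gaussian_rbf(dist, centers, width) @ W      # (N, N, F)
    filters = filters * neighbors[..., None]
    return np.einsum("ijf,jf->if", filters, features)     # (N, F)

rng = np.random.default_rng(0)
n_beads, n_feat, n_rbf = 10, 8, 18
pos = rng.uniform(0.0, 15.0, size=(n_beads, 3))
features = rng.normal(size=(n_beads, n_feat))
W = rng.normal(size=(n_rbf, n_feat))          # random stand-in for learned weights
centers = np.linspace(3.0, 12.0, n_rbf)

out = cfconv(features, pos, W, centers, width=0.5, cutoff=12.0)
# Filters depend only on distances, so translating all beads changes nothing.
shifted = cfconv(features, pos + 5.0, W, centers, width=0.5, cutoff=12.0)
print(out.shape, np.allclose(out, shifted))
```

Because the filters are functions of interatomic distance only, the output is invariant to translating or rotating the whole molecule.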

The parameters in the configuration file `train.yaml` are listed here:

```yaml
activation: tanh
batch_size: 256
inference_batchsize: 256
dataset: Custom
coord_files: "chignolin_ca_coords.npy"
embed_files: "chignolin_ca_embeddings.npy"
force_files: "chignolin_ca_deltaforces.npy"
cutoff_upper: 12.0
cutoff_lower: 3.0
derivative: true
distributed_backend: ddp
early_stopping_patience: 30
embedding_dimension: 128
label:
- forces
lr: 0.0005
lr_factor: 0.8
lr_min: 1.0e-06
lr_patience: 10
lr_warmup_steps: 0
model: graph-network
neighbor_embedding: false
ngpus: -1
num_epochs: 100
num_layers: 4
num_nodes: 1
num_rbf: 18
num_workers: 8
rbf_type: expnorm
save_interval: 2
seed: 1
test_interval: 2
test_ratio: 0.1
trainable_rbf: true
val_ratio: 0.05
weight_decay: 0.0
```
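
For intuition, `val_ratio: 0.05` and `test_ratio: 0.1` leave the remaining 85% of the frames for training. The sketch below reproduces that arithmetic with a seeded shuffle. The frame count is made up, and the rounding and shuffling are assumptions that may differ from what torchmd-net does internally.

```python
import numpy as np

n_frames = 10_000            # hypothetical dataset size
val_ratio, test_ratio = 0.05, 0.1

n_val = int(n_frames * val_ratio)
n_test = int(n_frames * test_ratio)
n_train = n_frames - n_val - n_test

# Seeded shuffle so the split is reproducible (seed: 1 in train.yaml).
idx = np.random.default_rng(1).permutation(n_frames)
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])
print(len(train_idx), len(val_idx), len(test_idx))  # 8500 500 1000
```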

To train on multiple protein datasets, use glob patterns:

```yaml
coord_files: "data/*coords*.npy"
embed_files: "data/*embeddings.npy"
force_files: "data/*deltaforces*.npy"
```
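
Before launching a multi-protein run, it can help to check what the glob patterns match. The sketch below creates empty placeholder files in a temporary directory (the protein names are hypothetical) and verifies that each pattern matches the same number of files. Pairing files by sorted order is an assumption made for illustration, so consistent file naming is the safest choice.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Placeholder files standing in for real per-protein arrays.
    for protein in ("proteinA", "proteinB"):
        for kind in ("coords", "embeddings", "deltaforces"):
            open(os.path.join(data_dir, f"{protein}_ca_{kind}.npy"), "w").close()

    coord_files = sorted(glob.glob(os.path.join(data_dir, "*coords*.npy")))
    embed_files = sorted(glob.glob(os.path.join(data_dir, "*embeddings.npy")))
    force_files = sorted(glob.glob(os.path.join(data_dir, "*deltaforces*.npy")))

    # Every protein needs one file of each kind.
    assert len(coord_files) == len(embed_files) == len(force_files)
    triples = [tuple(os.path.basename(p) for p in t)
               for t in zip(coord_files, embed_files, force_files)]

for triple in triples:
    print(triple)
```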

Now we will go through the options in the configuration file:

* the locations of the training input files are defined by the parameters `coord_files`, `force_files` and `embed_files`
* `log_dir` - output folder
* `lr` - initial value of the learning rate. The learning rate is adjusted by the `torch.optim.lr_scheduler.ReduceLROnPlateau` scheduler with parameters `lr_patience`, `lr_min` and `lr_factor`
* `num_epochs` - number of epochs run during training
* `batch_size` - batch size
* `distributed_backend` - specifies the distributed backend used by PyTorch Lightning. The example configuration above sets `ddp`. `dp` (Data Parallel) is suited to training on multiple GPUs (`gpus`) on 1 machine (`num_nodes`). The options include:
    * Data Parallel (`distributed_backend='dp'`) (multiple GPUs, 1 machine)
    * DistributedDataParallel (`distributed_backend='ddp'`) (multiple GPUs across many machines, Python script based)
    * DistributedDataParallel (`distributed_backend='ddp_spawn'`) (multiple GPUs across many machines, spawn based)
    * DistributedDataParallel 2 (`distributed_backend='ddp2'`) (DP in a machine, DDP across machines)
    * Horovod (`distributed_backend='horovod'`) (multi-machine, multi-GPU, configured at runtime)
* `gpus` - number of GPUs used in training. Specified as the number of required units (e.g. `4`) or a list of CUDA devices (e.g. `[0, 2, 3]`)
* `num_nodes` - number of machines used
* `num_workers` - number of workers in data loader
* `seed` - random seed for the calculation
* `eval_interval` - evaluation interval
* `save_interval` - saving interval
* `progress` - progress bar during batching
* `val_ratio` - fraction of the data used for the validation set
* `test_ratio` - fraction of the data used for the test set
* Finally, SchNet-specific parameters:
    * `num_filters`
    * `num_gaussians`
    * `num_interactions`
    * `max_z`
    * `cutoff`

Training is done with a Python script and can be run with a simple command:

```bash
mkdir train_light
CUDA_VISIBLE_DEVICES=0 python $TORCHMDNET_DIR/torchmdnet/scripts/train.py --conf train.yaml --log-dir train_light
```
where `$TORCHMDNET_DIR` is a placeholder for the path to your `torchmd-net` repo.

Training saves the 8 best epochs. Training progress is logged to a TensorBoard session. The training and validation curves for the full Chignolin dataset are presented here: