## Load the conda environment from our last class

```bash
conda activate atomistic-ml-class
```


## Part 1

## Introduction to machine learning interatomic potentials

In this part, we will learn how to train and use machine learning interatomic potentials (MLIPs) to run molecular dynamics. Specifically, we will use graph neural networks (GNNs) as the framework for our MLIPs, focusing on two lightweight yet expressive architectures: PaiNN and SchNet. You can find the original papers for these architectures here:

- [PaiNN](https://arxiv.org/abs/2102.03150)
- [SchNet](https://pubs.aip.org/aip/jcp/article/148/24/241722/962591/SchNet-A-deep-learning-architecture-for-molecules)

Unlike descriptor-based models, GNNs learn directly from atomic graphs with position and feature information.
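
To make the idea of an atomic graph concrete, here is a minimal sketch of how edges can be defined from positions and a cutoff radius. The helper `build_edges` and the three-atom geometry are made up for illustration, and periodic boundary conditions are ignored; in practice the training package builds neighbour lists for you.

```python
import math


def build_edges(positions, cutoff):
    """Return directed edges (i, j) for every pair of atoms closer than `cutoff`."""
    edges = []
    for i, ri in enumerate(positions):
        for j, rj in enumerate(positions):
            if i != j and math.dist(ri, rj) < cutoff:
                edges.append((i, j))
    return edges


# three atoms in a row, 1.4 Å apart (hypothetical, non-periodic geometry)
positions = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.8, 0.0, 0.0)]
edges = build_edges(positions, cutoff=2.0)
print(edges)
```

With a 2.0 Å cutoff, only nearest neighbours are connected; the two end atoms (2.8 Å apart) share no edge. Increasing the `cutoff` adds edges, which is why it is one of the hyperparameters that affects both accuracy and cost.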

Both architectures can be trained on consumer hardware (e.g., laptops), making them suitable for hands-on learning.

We will be using graph-pes to handle training and running the models. graph-pes is a Python package that provides a simple interface for training and using GNNs for interatomic potentials. You can find the documentation for graph-pes here:

- [graph-pes documentation](https://jla-gardner.github.io/graph-pes/)

Take a look at the training documentation here:

- [graph-pes training documentation](https://jla-gardner.github.io/graph-pes/cli/graph-pes-train/complete-docs.html)


In [None]:
from load_atoms import load_dataset

# load structures and split the data into training, validation and test
structures = load_dataset("../Class-1/structures_filt.xyz")

# alternatively, you can load the C-GAP-17 dataset
# structures = load_dataset("C-GAP-17")


train, val, test = structures.random_split([0.8, 0.1, 0.1], seed=42)

In [2]:
from ase.io import write

write("train.xyz", train)
write("valid.xyz", val)
write("test.xyz", test)

We provide two configuration input files (in `.yaml` format) for `graph-pes`, one for SchNet and one for NequIP. Check out their structure and fill in the missing parts to train your first GNN MLIP! Feel free to experiment with different values for hyperparameters such as the `cutoff`, the number of `radial_features`, and the number of `layers`. Vary the number of training configurations between 10 and 500 to construct learning curves.
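
One possible way to pick the training-set sizes for a learning curve is to space them roughly logarithmically and to nest the subsets, so that each larger set contains the smaller ones. The sketch below is only an illustration: the helper names, the number of points, and `n_total=500` are assumptions, not part of the provided configuration files.

```python
import random


def learning_curve_sizes(n_min, n_max, n_points):
    """Roughly log-spaced, strictly increasing training-set sizes."""
    ratio = (n_max / n_min) ** (1 / (n_points - 1))
    return sorted({round(n_min * ratio**k) for k in range(n_points)})


def nested_subsets(n_total, sizes, seed=42):
    """Nested index subsets: every smaller subset is a prefix of the larger ones."""
    order = list(range(n_total))
    random.Random(seed).shuffle(order)
    return {n: order[:n] for n in sizes}


sizes = learning_curve_sizes(10, 500, 5)
subsets = nested_subsets(n_total=500, sizes=sizes)  # n_total is a placeholder
for n in sizes:
    print(n, subsets[n][:5])
```

Each index list can then be used to select structures from the training split (for example `[train[i] for i in subsets[n]]`) before writing them to a file and training one model per size.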


In [None]:
# train a graph-pes model from the command line
# if training with the C-GAP-17 dataset, set the number of training configurations to ~1000

!graph-pes-train train_SchNet.yaml general/run_id=train-SchNet

[graph-pes INFO]: Started `graph-pes-train` at 2025-05-23 12:05:21.580
Traceback (most recent call last):
  File "/Users/chihebbenmahmoud/source/miniconda3/envs/graphPES/lib/python3.10/site-packages/graph_pes/config/shared.py", line 78, in instantiate_config_from_dict
    dacite.from_dict(
  File "/Users/chihebbenmahmoud/source/miniconda3/envs/graphPES/lib/python3.10/site-packages/dacite/core.py", line 69, in from_dict
    value = _build_value(type_=field_type, data=data[key], config=config)
  File "/Users/chihebbenmahmoud/source/miniconda3/envs/graphPES/lib/python3.10/site-packages/dacite/core.py", line 107, in _build_value
    data = from_dict(data_class=type_, data=data, config=config)
  File "/Users/chihebbenmahmoud/source/miniconda3/envs/graphPES/lib/python3.10/site-packages/dacite/core.py", line 74, in from_dict
    raise WrongTypeError(field_path=field.name, field_type=field_type, value=value)
dacite.exceptions.WrongTypeError: wrong value type for field "data.train" - should be 

Once the models have finished training, you can load each one and evaluate its performance on the training and test sets.
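
Two common ways to summarise performance are the mean absolute error (MAE) and the root-mean-square error (RMSE) between reference and predicted values. The sketch below uses made-up energies purely for illustration; the helper names `mae` and `rmse` are not part of any package used here.

```python
import math


def mae(reference, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def rmse(reference, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


# hypothetical per-atom energies in eV (not real data)
e_ref = [-7.0, -7.5, -8.0]
e_pred = [-7.1, -7.5, -7.8]

print(f"MAE  = {mae(e_ref, e_pred):.3f} eV/atom")
print(f"RMSE = {rmse(e_ref, e_pred):.3f} eV/atom")
```

For forces, the same functions can be applied to the flattened list of all Cartesian force components. Dividing total energies by the number of atoms before comparing makes structures of different sizes comparable.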


In [None]:
# load graph-pes models
from graph_pes.models import load_model
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
best_model = (
    load_model("/path/to/trained/model")  # load the model
    .to(device)  # move to GPU if available
    .eval()  # set to evaluation mode
)

# setup ASE calculator
calculator = best_model.ase_calculator()

# example of getting energies and forces using the calculator
frm = test[0]
frm.calc = calculator
energy_pred = frm.get_potential_energy()
forces_pred = frm.get_forces()

# do the same for all structures in the test set and compute the performance metrics as you did on Day 1
# examine scatter plots of reference energies and forces vs. ML-predicted quantities
...

Now you can repeat the same steps to train a NequIP model (this takes ~3 times longer on CPU but yields a more accurate MLIP). To do so, use the configuration file `train_NequIP.yaml`.


## Part 2


In this section, we will run molecular dynamics (MD) simulations using the MLIPs trained in Part 1 and the Atomic Simulation Environment (ASE).


First, we test the stability of the trained MLIPs by annealing (holding at constant temperature) a few starting configurations at 300 K.
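
The constant-temperature runs in this part use a Berendsen thermostat, which rescales the velocities at every step to nudge the instantaneous temperature towards the target. As a rough sketch of the textbook scaling factor (ASE's own implementation may differ in details, such as limiting how large the factor can become), with the hypothetical helper `berendsen_scale`:

```python
import math


def berendsen_scale(t_current, t_target, timestep, taut):
    """Velocity scaling factor of a Berendsen thermostat for one MD step."""
    return math.sqrt(1.0 + (timestep / taut) * (t_target / t_current - 1.0))


# timestep / taut = 1 / 100, the same ratio as in the MD parameters of this part
too_hot = berendsen_scale(330.0, 300.0, timestep=0.5, taut=50.0)
too_cold = berendsen_scale(270.0, 300.0, timestep=0.5, taut=50.0)
on_target = berendsen_scale(300.0, 300.0, timestep=0.5, taut=50.0)

print(too_hot, too_cold, on_target)
```

A factor below 1 slows the atoms down and a factor above 1 speeds them up. A larger `taut` brings the factor closer to 1, which means weaker coupling to the heat bath.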


In [None]:
# load the dataset from Day 1:
from load_atoms import load_dataset

starting_configs = load_dataset("/path/to/first/day/carbon/dataset")

In [None]:
# load the trained MLIP

from graph_pes.models import load_model
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
best_model = (
    load_model("/path/to/trained/model")  # load the model
    .to(device)  # move to GPU if available
    .eval()  # set to evaluation mode
)

# setup ASE calculator
calculator = best_model.ase_calculator()

# test with a few indices for different densities and starting configurations
initial_frame = starting_configs[0]

# setup calculator
initial_frame.calc = calculator

In [None]:
# let's run MD

import ase
from ase.md.velocitydistribution import (
    MaxwellBoltzmannDistribution,
    ZeroRotation,
    Stationary,
)
from ase.md import MDLogger
from ase.md.nvtberendsen import NVTBerendsen

Tinit = 300  # K

md_params = {
    "timestep": 0.5 * ase.units.fs,  # MD timestep
    "taut": 100 * 0.5 * ase.units.fs,  # thermostat time constant
}
total_md_steps = ...  # make sure to change this

# initialize velocities
MaxwellBoltzmannDistribution(initial_frame, temperature_K=Tinit)
Stationary(initial_frame)
ZeroRotation(initial_frame)

# initialize dynamics object
dyn = NVTBerendsen(initial_frame, temperature_K=Tinit, **md_params)


# write trajectory function
def write_frame():
    # append the current frame; make sure the extension is xyz
    dyn.atoms.write("/path/to/produced/trajectory.xyz", append=True)


dyn.attach(write_frame, interval=100)  # set the frequency of writing to the trajectory file

# setup the logger
dyn.attach(
    MDLogger(
        dyn,  # dynamics object
        initial_frame,  # initial configuration
        "path/to/log",
        peratom=True,
        mode="a",
    ),
    interval=100,  # frequency of writing the log
)

# run the MD
dyn.run(total_md_steps)

Taking the last frame from the previous simulation, perform a melt-quench simulation. The code for this simulation is provided below. Run it and answer questions 3-5.
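
When choosing the number of quench steps, it can help to translate it into a cooling rate. The sketch below assumes a linear temperature ramp applied in stages of 10 MD steps and a 0.5 fs timestep; the step count of 17400 is a made-up example, not a recommended value, and the helper names are hypothetical.

```python
def quench_schedule(t_init, t_fin, total_quench_steps, steps_per_stage=10):
    """Linearly spaced target temperatures, one per stage of the quench."""
    n_stages = total_quench_steps // steps_per_stage
    step = (t_fin - t_init) / (n_stages - 1)
    return [t_init + k * step for k in range(n_stages)]


def cooling_rate_K_per_ps(t_init, t_fin, total_quench_steps, timestep_fs):
    """Average cooling rate of the quench in K/ps."""
    duration_ps = total_quench_steps * timestep_fs / 1000.0
    return (t_init - t_fin) / duration_ps


n_steps = 17400  # hypothetical number of quench steps
schedule = quench_schedule(9000, 300, n_steps)
rate = cooling_rate_K_per_ps(9000, 300, n_steps, timestep_fs=0.5)

print(len(schedule), schedule[0], schedule[-1])
print(f"cooling rate: {rate:.0f} K/ps")
```

Fewer quench steps mean a faster quench, which generally leaves less time for the structure to relax as it cools.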


In [None]:
# let's do a melt-quench
# let's run MD

import ase
from ase.md.velocitydistribution import (
    MaxwellBoltzmannDistribution,
    ZeroRotation,
    Stationary,
)
from ase.md import MDLogger
from ase.md.nvtberendsen import NVTBerendsen

import numpy as np

Tinit = 9000  # K
Tfin = 300  # K

md_params = {
    "timestep": 0.5 * ase.units.fs,  # MD timestep
    "taut": 100 * 0.5 * ase.units.fs,  # thermostat time constant
}

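# starting config: last frame of the previous simulation
from load_atoms import load_dataset

initial_frame = load_dataset("/path/to/file")[-1]
initial_frame.calc = calculator  # attach the MLIP calculator set up earlier in Part 2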


total_melt_steps = ...  # number of melt steps
total_quench_steps = ...  # number of steps to go from Tinit to Tfin
quench_temperatures = np.linspace(Tinit, Tfin, total_quench_steps // 10)

# initialize velocities
MaxwellBoltzmannDistribution(initial_frame, temperature_K=Tinit)
Stationary(initial_frame)
ZeroRotation(initial_frame)

# initialize dynamics object
dyn = NVTBerendsen(initial_frame, temperature_K=Tinit, **md_params)


# write trajectory function
def write_frame():
    # append the current frame; make sure the extension is xyz
    dyn.atoms.write("/path/to/produced/trajectory.xyz", append=True)


dyn.attach(write_frame, interval=100)  # set the frequency of writing to the trajectory file

dyn.attach(
    MDLogger(
        dyn,  # dynamics object
        initial_frame,  # initial configuration
        "path/to/log",
        peratom=True,
        mode="a",
    ),
    interval=100,  # frequency of writing the log
)

# run the melt
dyn.run(total_melt_steps)

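# run the quench: lower the target temperature in stages of 10 MD steps
for t in quench_temperatures:
    dyn.set_temperature(temperature_K=t)  # pass the temperature explicitly in kelvin
    dyn.run(10)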

In [None]:
# perform your analysis on the produced trajectory
...