# Generating Training Data using Metadynamics

Enhanced sampling methods like metadynamics and umbrella sampling allow the generation of more diverse datasets for training machine-learned interatomic potentials (MLIPs). These methods accelerate the exploration of configuration space by overcoming energy barriers more efficiently than traditional molecular dynamics (MD) simulations.

In IPSuite, these enhanced sampling techniques are implemented using the open-source software PLUMED.

Metadynamics relies on the selection of collective variables (CVs) to characterize the relevant configuration space. These CVs can be defined in a PLUMED input file or passed directly as a list of strings via this node.
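
For orientation, the bias that metadynamics accumulates along the CVs can be written compactly. The expression below is the standard well-tempered form from the metadynamics literature, which PLUMED's `METAD` action uses when `BIASFACTOR` is set. The symbols $s$, $W_k$, $\sigma_i$, $\gamma$ and $\Delta T$ are notation introduced here for illustration, not names from IPSuite.

$$
V(s, t) = \sum_{k:\, t_k < t} W_k \exp\left( -\sum_{i} \frac{\left(s_i - s_i(t_k)\right)^2}{2 \sigma_i^2} \right),
\qquad
W_k = W_0 \exp\left( -\frac{V(s(t_k), t_k)}{k_\mathrm{B} \Delta T} \right),
\qquad
\gamma = \frac{T + \Delta T}{T}
$$

Here $s_i$ are the CVs, $\sigma_i$ the Gaussian widths (`SIGMA`), $W_0$ the initial hill height (`HEIGHT`), and $\gamma$ the bias factor (`BIASFACTOR`). A new Gaussian is deposited every `PACE` steps.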

In this example, we construct a system of alanine dipeptide and use metadynamics to bias the torsion angles $\psi$ and $\phi$. The MACE-MP0 model is used for all calculations.

To use this IPSuite node, please install the latest PLUMED version by following the installation [instructions](https://www.plumed.org/doc-v2.7/user-doc/html/_installation.html#installingpython) or with conda: `conda install -c conda-forge py-plumed`.
When installing with `uv add plumed`, you might need to `export CC=gcc` and `export CXX=g++`, followed by `export PLUMED_KERNEL=/.../plumed2/bin/lib/libplumedKernel.so`.

In [1]:
# temporary directory for testing
import os
from pathlib import Path

os.chdir("/ssd/fzills/tmp")

In [2]:
import ipsuite as ips

project = ips.Project()
mace = ips.MACEMPModel()

2025-05-19 14:47:30,961 (DEBUG): Welcome to IPS - the Interatomic Potential Suite!


First, we create a box containing one molecule of alanine dipeptide and optimize its geometry so that the simulation starts from a relaxed structure.

In [3]:
with project.group("System_Creation"):
    mol = ips.Smiles2Atoms(smiles="CNC(=O)[C@H](C)NC(C)=O")
    geoopt = ips.ASEGeoOpt(
        data=mol.frames, model=mace, optimizer="FIRE", run_kwargs={"fmax": 0.05}
    )



Next, we define the thermostat and write the PLUMED input file that the PLUMED calculator will use.

In [4]:
thermostat = ips.LangevinThermostat(
    time_step=0.5,
    temperature=300,
    friction=0.01,
)

FILE = """
FLUSH STRIDE=10000
phi: TORSION ATOMS=8,7,5,3
psi: TORSION ATOMS=7,5,3,2
restraint: METAD ARG=phi,psi SIGMA=0.35,0.35 HEIGHT=1.2 BIASFACTOR=8 \
           PACE=400 FILE=HILLS GRID_MIN=-pi,-pi GRID_MAX=pi,pi
PRINT ARG=phi,psi FILE=COLVAR STRIDE=1
"""
with Path("plumed.dat").open("w") as f:
    f.write(FILE)

When setting values in the PLUMED input, note that lengths are in Angstroms, time is in femtoseconds, and energy is in kJ/mol. Additionally, atom indices (e.g., `phi: TORSION ATOMS=8,7,5,3`) are 1-based in PLUMED: numbering starts at 1 instead of 0.
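
ASE, by contrast, numbers atoms from 0, so indices read from an `ase.Atoms` object have to be shifted by one before they go into the PLUMED input. A minimal sketch of that conversion follows; `plumed_torsion` is a hypothetical helper written for this example and is not part of IPSuite or PLUMED.

```python
def plumed_torsion(label, zero_based_indices):
    """Build a PLUMED TORSION line from four 0-based (ASE-style) atom indices."""
    if len(zero_based_indices) != 4:
        raise ValueError("a torsion angle is defined by exactly four atoms")
    # PLUMED counts atoms from 1, so shift every index by one.
    atoms = ",".join(str(index + 1) for index in zero_based_indices)
    return f"{label}: TORSION ATOMS={atoms}"


# 0-based indices 7, 6, 4, 2 correspond to the phi definition used above.
phi_line = plumed_torsion("phi", [7, 6, 4, 2])
psi_line = plumed_torsion("psi", [6, 4, 2, 1])
print(phi_line)
print(psi_line)
```

Generating the lines this way keeps the off-by-one shift in a single place, which helps when the atom selection comes from an ASE query rather than from manual counting.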

In [5]:
with project.group("METAD"):
    calc = ips.PlumedModel(
        model=mace,
        data=geoopt.frames,
        data_id=-1,
        config="plumed.dat",
        timestep=0.5,
        temperature=300,
    )

    md = ips.ASEMD(
        model=calc,
        data=geoopt.frames,
        thermostat=thermostat,
        steps=4_000_000,
    )
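
As a quick sanity check on these settings, the simulated time and the approximate number of deposited hills follow from simple arithmetic. The sketch below only restates values from the cells above (`time_step=0.5`, `PACE=400`, `STRIDE=1`, `steps=4_000_000`) and assumes the time step is given in femtoseconds; the variable names are illustrative.

```python
time_step_fs = 0.5   # thermostat time step, assumed to be in femtoseconds
pace = 400           # METAD PACE: MD steps between two deposited hills
colvar_stride = 1    # PRINT STRIDE: MD steps between two COLVAR rows
steps = 4_000_000    # number of MD steps requested in ASEMD

hill_interval_fs = pace * time_step_fs       # time between hills
total_time_ns = steps * time_step_fs / 1e6   # 1 ns = 1e6 fs
n_hills = steps // pace                      # approximate number of hills
n_colvar_rows = steps // colvar_stride       # approximate number of COLVAR rows

print(f"hill every {hill_interval_fs} fs")
print(f"total simulated time: {total_time_ns} ns")
print(f"about {n_hills} hills and {n_colvar_rows} COLVAR rows")
```

With `STRIDE=1` the COLVAR file gains one row per MD step, so a larger stride may be worth considering if file size becomes a concern.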

In [6]:
project.build()

2025-05-19 14:47:31,341 - INFO: Saving params.yaml


100%|██████████| 4/4 [00:00<00:00, 230.18it/s]


For troubleshooting purposes, a copy of the PLUMED setup file is saved in the node's working directory. PLUMED results are stored in the `data/` folder, which includes the COLVAR and HILLS files.
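
To inspect the sampled torsion angles, the COLVAR file can be parsed with a few lines of Python. The following is a minimal sketch, assuming the usual PLUMED layout with a `#! FIELDS ...` header followed by whitespace-separated numeric rows. `read_colvar` is a hypothetical helper, and the sample data is synthetic so the block runs on its own; it is not output from the simulation above. The exact location of the real file inside the node's working directory is not shown here.

```python
import tempfile
from pathlib import Path


def read_colvar(path):
    """Parse a PLUMED COLVAR-style file into {field name: list of floats}."""
    fields = None
    rows = []
    with Path(path).open() as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith("#!"):
                tokens = line.split()
                # Header looks like: "#! FIELDS time phi psi"
                if len(tokens) > 2 and tokens[1] == "FIELDS":
                    fields = tokens[2:]
                continue  # skip other "#!" metadata lines
            if line.startswith("#"):
                continue  # skip plain comments
            rows.append([float(value) for value in line.split()])
    if fields is None:
        raise ValueError("no '#! FIELDS' header found")
    return {name: [row[i] for row in rows] for i, name in enumerate(fields)}


# Synthetic sample with made-up numbers, used only to exercise the parser.
sample = """#! FIELDS time phi psi
0.000000 -2.5 2.7
0.500000 -2.4 2.6
1.000000 -2.6 2.8
"""
with tempfile.TemporaryDirectory() as tmp:
    colvar_path = Path(tmp) / "COLVAR"
    colvar_path.write_text(sample)
    colvar = read_colvar(colvar_path)

print(list(colvar))
print(len(colvar["phi"]))
```

For the real run, the same function could be pointed at the COLVAR file in the node's `data/` folder, and the `phi` and `psi` columns plotted against each other to see which regions of the Ramachandran plane were visited.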