# Generating Training Data using Metadynamics

Enhanced sampling methods like metadynamics and umbrella sampling allow the generation of more diverse datasets for training machine-learned interatomic potentials (MLIPs). These methods accelerate the exploration of configuration space by overcoming energy barriers more efficiently than traditional molecular dynamics (MD) simulations.

In IPSuite, these enhanced sampling techniques are implemented using the open-source software PLUMED.

Metadynamics relies on the selection of collective variables (CVs) to characterize the relevant configuration space. These CVs can be defined in a PLUMED input file or passed directly as a list of strings to the `PlumedCalculator` node.
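
For orientation, the bias that metadynamics builds up along the CVs can be written down explicitly. The expression below is the standard well-tempered form, which is the variant PLUMED uses when `BIASFACTOR` is set (as in the setup later in this example). The symbols are introduced here for illustration and do not appear elsewhere in this notebook: $s$ is the vector of CVs, $s_k$ its value when the $k$-th Gaussian was deposited, $\tau$ the deposition interval (`PACE`), $W$ the initial Gaussian height (`HEIGHT`), $\sigma_i$ the Gaussian width for CV $i$ (`SIGMA`), and $\gamma$ the bias factor (`BIASFACTOR`).

$$
V(s, t) = \sum_{k:\, k\tau < t} W \exp\left(-\frac{V(s_k, k\tau)}{k_B \Delta T}\right) \exp\left(-\sum_{i} \frac{(s_i - s_{i,k})^2}{2\sigma_i^2}\right),
\qquad \gamma = \frac{T + \Delta T}{T}
$$

For periodic CVs such as torsion angles, the difference $s_i - s_{i,k}$ is taken with the periodicity of the variable. The first exponential shrinks the height of new Gaussians where bias has already accumulated, so the bias converges smoothly instead of growing without bound.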

In this example, we construct an alanine dipeptide system and use metadynamics to bias the torsion angles $\phi$ and $\psi$. The MACE-MP-0 model is used for all calculations.

To use this IPSuite node, install the latest PLUMED version, either by following the PLUMED installation instructions or with conda: `conda install -c conda-forge py-plumed`.

In [1]:
import mace_models
from ase import units

import ipsuite as ips

project = ips.Project(remove_existing_tree=True)
mace = mace_models.load()

2025-04-03 10:38:18,749 (DEBUG): Welcome to IPS - the Interatomic Potential Suite!



cuequivariance or cuequivariance_torch is not available. Cuequivariance acceleration will be disabled.

        You're using the MACE-MP-0 model. The model is released under the MIT license.
        Note:
        If you are using this model, please cite the relevant paper for the Materials Project,
        any paper associated with the MACE model, and also the following:
        - MACE-Universal by Yuan Chiang, 2023, Hugging Face, Revision e5ebd9b,
            DOI: 10.57967/hf/1202, URL: https://huggingface.co/cyrusyc/mace-universal
        - Matbench Discovery by Janosh Riebesell, Rhys EA Goodall, Philipp Benner, Yuan Chiang,
            Alpha A Lee, Anubhav Jain, Kristin A Persson, 2023, arXiv:2308.14920
        - https://arxiv.org/abs/2401.00096
           


First, we create a box containing one molecule of alanine dipeptide and optimize its geometry so that the simulation starts from a relaxed structure.

In [3]:
with project.group("System_Creation"):
    mol = ips.Smiles2Atoms(smiles="CNC(=O)[C@H](C)NC(C)=O")
    geoopt = ips.ASEGeoOpt(
        data=mol.frames, model=mace, optimizer="FIRE", run_kwargs={"fmax": 0.05}
    )

Next, we define the thermostat and the PLUMED input that the PLUMED calculator will use.

In [None]:
thermostat = ips.LangevinThermostat(
    time_step=0.5 * units.fs,
    temperature=300,
    friction=0.5 / units.fs,
)

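setup = [
    # Flush PLUMED output files to disk every 10000 steps.
    "FLUSH STRIDE=10000",
    # Collective variables: the two backbone torsions (1-based atom indices).
    "phi: TORSION ATOMS=8,7,5,3",
    "psi: TORSION ATOMS=7,5,3,2",
    (
        # Well-tempered metadynamics (BIASFACTOR is set): Gaussians of width
        # SIGMA and initial height HEIGHT are deposited every PACE steps.
        "restraint: METAD ARG=phi,psi "
        "SIGMA=0.35,0.35 HEIGHT=1.2 BIASFACTOR=8 PACE=400 "
        "FILE=data/HILLS GRID_MIN=-pi,-pi GRID_MAX=pi,pi"
    ),
    # Write both CVs to data/COLVAR at every step.
    "PRINT ARG=phi,psi FILE=data/COLVAR STRIDE=1",
]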

When setting values in the PLUMED setup, note that lengths are in Ångström, time is in femtoseconds, and energy is in kJ/mol. Atom indices (e.g., `phi: TORSION ATOMS=8,7,5,3`) follow PLUMED's 1-based convention: numbering starts at 1, not at 0 as in ASE.
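
Because ASE counts atoms from 0 and works in eV, both conventions are easy to mix up when writing the setup by hand. The following is a minimal sketch of two helpers that make the conversions explicit. The names `torsion_line` and `kj_per_mol_to_ev` are hypothetical and not part of IPSuite or PLUMED, and the conversion factor of 96.485 kJ/mol per eV is rounded.

```python
# Hypothetical helpers (not part of IPSuite or PLUMED) for writing PLUMED input.

KJ_PER_MOL_PER_EV = 96.485  # rounded: 1 eV corresponds to about 96.485 kJ/mol


def torsion_line(label, ase_indices):
    """Build a PLUMED TORSION line from four 0-based (ASE-style) atom indices."""
    if len(ase_indices) != 4:
        raise ValueError("a torsion needs exactly four atoms")
    plumed_indices = [i + 1 for i in ase_indices]  # PLUMED counts from 1
    atoms = ",".join(str(i) for i in plumed_indices)
    return f"{label}: TORSION ATOMS={atoms}"


def kj_per_mol_to_ev(energy_kj_per_mol):
    """Convert an energy from kJ/mol (PLUMED setup) to eV (ASE)."""
    return energy_kj_per_mol / KJ_PER_MOL_PER_EV


# ASE indices 7, 6, 4, 2 correspond to PLUMED atoms 8, 7, 5, 3.
phi_line = torsion_line("phi", [7, 6, 4, 2])
print(phi_line)

# The Gaussian height HEIGHT=1.2 kJ/mol expressed in eV.
height_ev = kj_per_mol_to_ev(1.2)
print(f"HEIGHT = {height_ev:.4f} eV")
```

Generating the index list from ASE indices in one place avoids off-by-one errors when the same atoms are also used for analysis on the ASE side.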

In [None]:
with project.group("METAD"):
    calc = ips.PlumedCalculator(
        model=mace,
        data=geoopt.frames,
        data_id=-1,
        input_string=setup,
        timestep=0.5 * units.fs,
        temperature=300,
    )

    md = ips.ASEMD(
        model=calc,
        data=geoopt.frames,
        thermostat=thermostat,
        steps=4_000_000,
    )
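
The parameters above imply a few derived quantities that are worth checking before launching a long run. The sketch below is plain arithmetic on the values used in this notebook; the variable names are illustrative only, the hill count is an estimate that ignores whether a hill is deposited at step 0, and the relation between the bias factor and $\Delta T$ assumes the usual well-tempered definition $\gamma = (T + \Delta T)/T$.

```python
# Values taken from the thermostat, PLUMED setup, and ASEMD node above.
time_step_fs = 0.5
steps = 4_000_000
pace = 400          # PACE: MD steps between Gaussian depositions
bias_factor = 8     # BIASFACTOR
temperature_k = 300
height_kj_mol = 1.2 # HEIGHT

# Simulated time and hill deposition schedule.
total_time_ns = steps * time_step_fs / 1e6   # 1 ns = 1e6 fs
hill_interval_fs = pace * time_step_fs
n_hills = steps // pace                      # approximate number of Gaussians

# Well-tempered quantities, assuming gamma = (T + dT) / T.
delta_t_kelvin = (bias_factor - 1) * temperature_k
gas_constant = 0.0083144626                  # kJ / (mol K)
kt_kj_mol = gas_constant * temperature_k
height_in_kt = height_kj_mol / kt_kj_mol

print(f"total simulated time: {total_time_ns} ns")
print(f"one hill every {hill_interval_fs} fs, about {n_hills} hills in total")
print(f"effective dT: {delta_t_kelvin} K")
print(f"initial hill height: {height_in_kt:.2f} kT")
```

With `STRIDE=1` in the `PRINT` line, the COLVAR file receives one line per MD step, so a run of this length writes about four million lines. Increasing the stride reduces the file size if that resolution is not needed.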

In [6]:
project.build()

2025-04-03 10:38:57,732 - INFO: Saving params.yaml


100%|██████████| 6/6 [00:01<00:00,  3.57it/s]


For troubleshooting purposes, a copy of the PLUMED setup file is saved in the node's working directory. PLUMED results are stored in the `data/` folder, which includes the COLVAR and HILLS files.
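
To inspect the sampled torsion angles, the COLVAR file can be read with a few lines of Python. The following is a minimal sketch, assuming the usual PLUMED layout in which a `#! FIELDS` header line names the columns and other `#!` lines carry metadata. The function name `read_colvar` is hypothetical, and the sample text is synthetic input made up for illustration, not output from this run.

```python
import io


def read_colvar(source):
    """Parse PLUMED COLVAR-style text into a dict {field name: list of floats}.

    `source` is any iterable of lines, e.g. an open file.
    """
    fields = None
    rows = []
    for line in source:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#!"):
            tokens = line.split()
            # The FIELDS header names the data columns; other "#!" lines are skipped.
            if len(tokens) > 1 and tokens[1] == "FIELDS":
                fields = tokens[2:]
            continue
        rows.append([float(value) for value in line.split()])
    if fields is None:
        raise ValueError("no '#! FIELDS' header found")
    return {name: [row[i] for row in rows] for i, name in enumerate(fields)}


# Synthetic example input (illustrative values only).
SAMPLE = """#! FIELDS time phi psi
#! SET min_phi -pi
#! SET max_phi pi
 0.0 -2.5 2.8
 0.5 -2.4 2.7
 1.0 1.0 -0.5
"""

columns = read_colvar(io.StringIO(SAMPLE))
positive_phi = sum(1 for value in columns["phi"] if value > 0)
print(columns["phi"], columns["psi"])
print(f"{positive_phi} of {len(columns['phi'])} frames have phi > 0")
```

For the real file, the same function can be called as `read_colvar(open(path))`, where `path` points to the COLVAR file inside the node's `data/` folder.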