# Demo 2: Defining reproducible and serializable OpenMM simulation parameters
`polymerist` supplies a number of data containers which aim to facilitate [TRUE](https://www.tandfonline.com/doi/full/10.1080/00268976.2020.1742938) workflows (Transparent, Reproducible, Usable, and Extensible, as defined by [MoSDeF](https://mosdef.org/)).

These containers provide you a means to cache high-level thermodynamic, integrator, and checkpoint information related to how to set up a simulation,  
and hand them off to another researcher (possibly yourself in the future!) to reproduce simulations you've run in the past.  

They also distill the many options passed around within the [OpenMM API](https://docs.openmm.org/latest/userguide/application.html) down to a few containers, making it easier to reason about a simulation study at a scientific, rather than technical, level.

In [1]:
import logging
logging.basicConfig(level=logging.INFO)

from pathlib import Path
from polymerist.genutils.fileutils.pathutils import is_empty, assemble_path


OUTPUT_DIR = Path('scratch_MD') # dummy directory for writing without tampering with example inputs
OUTPUT_DIR.mkdir(exist_ok=True)

PARAMS_DIR = OUTPUT_DIR / 'simulation_parameters'
PARAMS_DIR.mkdir(exist_ok=True)

## Simulation parameter containers

### ThermoParameters
These store thermodynamic information about your simulation, including any thermostat and/or barostat you choose in order to realize a particular ensemble

In [17]:
from polymerist.mdtools.openmmtools.parameters import (
    ThermoParameters,
    Thermostat,
    ThermostatParameters,
    Barostat,
    BarostatParameters,
)
from openmm.unit import kelvin, atmosphere, picosecond


thermo_params = ThermoParameters(
    thermostat_params=ThermostatParameters(
        temperature=300*kelvin,
        timescale=1*picosecond**-1, # characteristic timescale for thermostat (e.g. friction coeff for Langevin)
        thermostat=Thermostat.LANGEVIN_MIDDLE
    ),
    barostat_params=BarostatParameters(
        pressure=1*atmosphere,
        update_frequency=25, # number of steps between barostat move attempts
        barostat=Barostat.MONTE_CARLO,
    )
)
print(thermo_params)
print(thermo_params.describe_ensemble()) # ThermoParameters infers what ensemble you are simulating in based on your choices for thermostat and barostat



ThermoParameters(thermostat_params=ThermostatParameters(temperature=300 K, timescale=1 /ps, thermostat=<Thermostat.LANGEVIN_MIDDLE: <class 'openmm.openmm.LangevinMiddleIntegrator'>>), barostat_params=BarostatParameters(pressure=1 atm, temperature=300 K, update_frequency=25, barostat=<Barostat.MONTE_CARLO: <class 'openmm.openmm.MonteCarloBarostat'>>))
NPT (Isothermal-isobaric) ensemble


In the absence of explicit thermodynamic parameters, it is assumed you are attempting to run a simulation in the NVE ensemble...

In [22]:
empty_thermo = ThermoParameters()
print(empty_thermo)
print(empty_thermo.describe_ensemble()) 

ThermoParameters(thermostat_params=None, barostat_params=None)
NVE (Microcanonical) ensemble


... however, `polymerist` also provides a handful of thermostats and barostats which work out of the box and allow you to fix temperature and/or pressure

In [26]:
print('Available Thermostats:')
for thermostat_type in Thermostat:
    print(f'{thermostat_type.name}: {thermostat_type.value}')
    
print('\nAvailable Barostats:')
for barostat_type in Barostat:
    print(f'{barostat_type.name}: {barostat_type.value}')

Available Thermostats:
ANDERSEN: <class 'openmm.openmm.AndersenThermostat'>
BROWNIAN: <class 'openmm.openmm.BrownianIntegrator'>
LANGEVIN: <class 'openmm.openmm.LangevinIntegrator'>
LANGEVIN_MIDDLE: <class 'openmm.openmm.LangevinMiddleIntegrator'>
NOSE_HOOVER: <class 'openmm.openmm.NoseHooverIntegrator'>

Available Barostats:
MONTE_CARLO: <class 'openmm.openmm.MonteCarloBarostat'>
MONTE_CARLO_FLEXIBLE: <class 'openmm.openmm.MonteCarloFlexibleBarostat'>


In [28]:
const_T_thermo = ThermoParameters(
    thermostat_params=ThermostatParameters(
        temperature=310*kelvin,
        thermostat='Andersen' # if you don't feel like importing the Thermostat/Barostat enums explicitly, you can also reference them by name
    )
)
print(const_T_thermo)
print(const_T_thermo.describe_ensemble())
print(const_T_thermo.ensemble == empty_thermo.ensemble)

ThermoParameters(thermostat_params=ThermostatParameters(temperature=310 K, timescale=1 /ps, thermostat=<Thermostat.ANDERSEN: <class 'openmm.openmm.AndersenThermostat'>>), barostat_params=None)
NVT (Canonical) ensemble
False
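
The name-based lookup used above can be illustrated with a plain `Enum`. This is only a minimal sketch of one way such a lookup could work, not `polymerist`'s actual implementation: `ThermostatKind` and `resolve_kind` are hypothetical names, and the case-insensitive matching is an assumption suggested by `'Andersen'` resolving to `ANDERSEN` in the output.

```python
from enum import Enum


class ThermostatKind(Enum):  # hypothetical stand-in, NOT polymerist's Thermostat enum
    ANDERSEN = 'andersen'
    LANGEVIN_MIDDLE = 'langevin_middle'


def resolve_kind(choice):
    """Accept either an enum member or the name of one (matched case-insensitively)."""
    if isinstance(choice, ThermostatKind):
        return choice  # already an enum member, nothing to do
    return ThermostatKind[choice.upper()]  # look the member up by its name


print(resolve_kind('Andersen'))               # prints: ThermostatKind.ANDERSEN
print(resolve_kind(ThermostatKind.ANDERSEN))  # prints: ThermostatKind.ANDERSEN
```

Accepting both forms keeps short scripts terse while still letting the enum serve as the single list of valid choices.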


NOTE that simulations in the NPH ensemble are not supported, as [OpenMM simulations run with a barostat also require a thermostat to achieve correct results](https://docs.openmm.org/development/userguide/application/02_running_sims.html#temperature-coupling) (i.e. you can't have a barostat without a thermostat)

In [30]:
ThermoParameters(
    thermostat_params=None,
    barostat_params=BarostatParameters(
        pressure=1*atmosphere,
        update_frequency=25,
        barostat='MC',
    )
)

NPHEnsembleUnsupported: NPH ensemble not supported; either add a thermostat or remove a barostat from thermodynamic parameters
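
The ensemble rules demonstrated so far (no coupling gives NVE, a thermostat alone gives NVT, both give NPT, and a barostat alone is rejected) can be summarized in a few lines. The sketch below is only an illustration of that decision logic under stated assumptions: `infer_ensemble` and `NPHUnsupportedError` are hypothetical names, and `polymerist`'s own implementation may differ.

```python
from typing import Optional


class NPHUnsupportedError(ValueError):
    """Hypothetical stand-in for the NPHEnsembleUnsupported error shown above."""


def infer_ensemble(thermostat: Optional[str], barostat: Optional[str]) -> str:
    """Name the ensemble implied by which couplings are present (None means absent)."""
    if thermostat is None and barostat is None:
        return 'NVE'  # neither temperature nor pressure is controlled
    if barostat is None:
        return 'NVT'  # thermostat only
    if thermostat is None:
        # a barostat without a thermostat would be NPH, which is rejected
        raise NPHUnsupportedError('add a thermostat or remove the barostat')
    return 'NPT'  # both thermostat and barostat are present


print(infer_ensemble(None, None))                        # prints: NVE
print(infer_ensemble('ANDERSEN', None))                  # prints: NVT
print(infer_ensemble('LANGEVIN_MIDDLE', 'MONTE_CARLO'))  # prints: NPT
```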

ThermoParameters (and for that matter **ALL** parameter sets we show here) can be trivially cached to file...

In [31]:
thermo_params_path = assemble_path(PARAMS_DIR, 'thermo_params', extension='json')
thermo_params.to_file(thermo_params_path)

... and read back from a file just as easily

In [None]:
thermo_params_from_file = ThermoParameters.from_file(thermo_params_path)
print(f'ThermoParameters from file match those written: {thermo_params_from_file == thermo_params}')         # the parameters read from file are equivalent to those written (as we'd hope)...
print(f'ThermoParameters are identical object to those written: {thermo_params_from_file is thermo_params}') # ... but are not literally the same object (just to show I'm not pulling any tricks on you)

ThermoParameters from file match those written: True
ThermoParameters are identical object to those written: False
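
To see why a round trip through a file yields an equal but distinct object, here is a minimal standard-library sketch of the same pattern. It is not how `polymerist` serializes its containers: `ToyThermostatParams` is a hypothetical class, and storing the temperature as a bare float in kelvin sidesteps the unit handling a real implementation would need.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class ToyThermostatParams:  # hypothetical stand-in for a parameter container
    temperature_kelvin: float
    thermostat: str

    def to_file(self, path: Path) -> None:
        """Write all fields to a JSON file."""
        path.write_text(json.dumps(asdict(self), indent=4))

    @classmethod
    def from_file(cls, path: Path) -> 'ToyThermostatParams':
        """Build a new instance from the fields stored in a JSON file."""
        return cls(**json.loads(path.read_text()))


with tempfile.TemporaryDirectory() as tmpdir:
    toy_path = Path(tmpdir) / 'toy_params.json'
    original = ToyThermostatParams(temperature_kelvin=300.0, thermostat='LANGEVIN_MIDDLE')
    original.to_file(toy_path)
    reloaded = ToyThermostatParams.from_file(toy_path)

print(reloaded == original)  # prints: True  (dataclasses compare field by field)
print(reloaded is original)  # prints: False (a fresh object was built from the file)
```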


### IntegratorParameters
These store parameters related to the duration of the total simulation, simulation timestep, and checkpointing interval

In [34]:
from polymerist.mdtools.openmmtools.parameters import IntegratorParameters
from openmm.unit import nanosecond, femtosecond


integ_params = IntegratorParameters(
    time_step=1*femtosecond,    # integrator timestep
    total_time=0.5*nanosecond,  # total simulation time
    num_samples=200,            # number of equally-spaced checkpoint/state data samples to take over the course of the simulation
)
print(integ_params)

# derived parameters
print(integ_params.num_steps)
print(integ_params.report_interval)

IntegratorParameters(time_step=1 fs, total_time=0.5 ns, num_samples=200)
500000
2500
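
The two derived values printed above follow from simple arithmetic on the three inputs. The sketch below reproduces them with plain integers in femtoseconds; the use of integer division is an assumption, since the source does not show how `polymerist` rounds when the inputs do not divide evenly.

```python
# All times expressed in femtoseconds to keep the arithmetic exact
time_step_fs = 1          # 1 fs integrator timestep
total_time_fs = 500_000   # 0.5 ns = 500,000 fs
num_samples = 200

num_steps = total_time_fs // time_step_fs   # total number of integration steps
report_interval = num_steps // num_samples  # steps between consecutive samples

print(num_steps, report_interval)  # prints: 500000 2500
```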


In [35]:
integ_params_path = assemble_path(PARAMS_DIR, 'integ_params', extension='json')
integ_params.to_file(integ_params_path)

In [36]:
integ_params_from_file = IntegratorParameters.from_file(integ_params_path)
print(integ_params_from_file == integ_params)
print(integ_params_from_file is integ_params)

True
False


### ReporterParameters
These govern the format and type of simulation snapshots ([Checkpoint](https://docs.openmm.org/development/api-python/generated/openmm.app.checkpointreporter.CheckpointReporter.html) or [State](https://docs.openmm.org/development/api-python/generated/openmm.openmm.State.html)), as well as which pieces of [state data](https://docs.openmm.org/7.0.0/api-python/generated/simtk.openmm.app.statedatareporter.StateDataReporter.html) (e.g. simulation speed, box volume, etc.) to report periodically

In [37]:
from polymerist.mdtools.openmmtools.reporters import DEFAULT_STATE_DATA_PROPS
from polymerist.mdtools.openmmtools.parameters import ReporterParameters


reporter_params = ReporterParameters(
    report_trajectory=True, # not reporting a trajectory kind of makes your simulation pointless, but the option exists if you need it :P
    traj_ext='dcd',         # output to compressed binary trajectory files (recommended)
    report_checkpoint=True, # also keep checkpoints of OpenMM objects (specific to Context and machine)
    report_state=True,      # saving State is a bit redundant with checkpoints, but is machine-transferrable
    report_state_data=True, # distinct from State (confusingly), refers to quantities which summarize the simulation
    state_data=DEFAULT_STATE_DATA_PROPS, # which particular pieces of state data are recorded on each dump; tune these to taste
)
print(reporter_params)

ReporterParameters(report_checkpoint=True, report_state=True, report_trajectory=True, report_state_data=True, traj_ext='dcd', num_steps=None, state_data={'step': True, 'time': True, 'potentialEnergy': True, 'kineticEnergy': True, 'totalEnergy': True, 'temperature': True, 'volume': True, 'density': True, 'speed': True, 'progress': False, 'remainingTime': False, 'elapsedTime': False}, reporter_paths=None)


In [38]:
reporter_params_path = assemble_path(PARAMS_DIR, 'reporter_params', extension='json')
reporter_params.to_file(reporter_params_path)

In [39]:
reporter_params_from_file = ReporterParameters.from_file(reporter_params_path)
print(reporter_params_from_file == reporter_params)
print(reporter_params_from_file is reporter_params)

True
False


### SimulationParameters
These bundle ThermoParameters, IntegratorParameters, and ReporterParameters together into a single convenient container

In [40]:
from polymerist.mdtools.openmmtools.parameters import SimulationParameters


sim_params = SimulationParameters(
    thermo_params=thermo_params,
    integ_params=integ_params,
    reporter_params=reporter_params,
)
print(sim_params)

SimulationParameters(integ_params=IntegratorParameters(time_step=1 fs, total_time=0.5 ns, num_samples=200), thermo_params=ThermoParameters(thermostat_params=ThermostatParameters(temperature=300 K, timescale=1 /ps, thermostat=<Thermostat.LANGEVIN_MIDDLE: <class 'openmm.openmm.LangevinMiddleIntegrator'>>), barostat_params=BarostatParameters(pressure=1 atm, temperature=300 K, update_frequency=25, barostat=<Barostat.MONTE_CARLO: <class 'openmm.openmm.MonteCarloBarostat'>>)), reporter_params=ReporterParameters(report_checkpoint=True, report_state=True, report_trajectory=True, report_state_data=True, traj_ext='dcd', num_steps=500000, state_data={'step': True, 'time': True, 'potentialEnergy': True, 'kineticEnergy': True, 'totalEnergy': True, 'temperature': True, 'volume': True, 'density': True, 'speed': True, 'progress': False, 'remainingTime': False, 'elapsedTime': False}, reporter_paths=None))


In [41]:
sim_params_path = assemble_path(PARAMS_DIR, 'sim_params', extension='json')
sim_params.to_file(sim_params_path)

In [42]:
sim_params_from_file = SimulationParameters.from_file(sim_params_path)
print(sim_params_from_file == sim_params)
print(sim_params_from_file is sim_params)

True
False


## SimulationPaths and directory-based workflows
Most often, the files associated with a given simulation will coexist in a single directory with common formatting,  
to prevent mix-ups between files from multiple simulations (be it serial simulations or replicates)

The management of such a directory and the many files contained within it is facilitated by `polymerist`'s SimulationPaths.
All contained files will share the selected prefix for consistency, and the individual files within are accessible via the SimulationPaths API:

In [43]:
from polymerist.mdtools.openmmtools.serialization import SimulationPaths


sim_paths = SimulationPaths.from_dir_and_parameters(PARAMS_DIR/'PNIPAAm', prefix='equilibration', sim_params=sim_params)
print(sim_paths.parameters_path)
print(sim_paths.topology_path)
print(sim_paths.system_path)
print(sim_paths.trajectory_path)

scratch_MD/simulation_parameters/PNIPAAm/equilibration_parameters.json
scratch_MD/simulation_parameters/PNIPAAm/equilibration_topology.pdb
scratch_MD/simulation_parameters/PNIPAAm/equilibration_system.xml
scratch_MD/simulation_parameters/PNIPAAm/equilibration_trajectory.dcd
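
The naming pattern visible in this output (a shared directory, then `<prefix>_<label>.<extension>`) can be reproduced with `pathlib` alone. This is only a sketch of the convention as it appears in the printed paths: `prefixed_path` is a hypothetical helper, not part of the SimulationPaths API.

```python
from pathlib import Path


def prefixed_path(directory: Path, prefix: str, label: str, extension: str) -> Path:
    """Hypothetical helper: build '<directory>/<prefix>_<label>.<extension>'."""
    return directory / f'{prefix}_{label}.{extension}'


sim_dir = Path('scratch_MD') / 'simulation_parameters' / 'PNIPAAm'
file_kinds = [('parameters', 'json'), ('topology', 'pdb'), ('system', 'xml'), ('trajectory', 'dcd')]
for label, extension in file_kinds:
    print(prefixed_path(sim_dir, 'equilibration', label, extension))
```

Sharing one prefix per simulation means files from, say, an equilibration run and a later production run can sit side by side in the same directory without colliding.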


As you may have expected based on the above examples, SimulationPaths _itself_ can be saved to file and reloaded later.  
This allows you to easily re-access the API for a given directory in later simulations or scripts.

In [44]:
prev_sim_paths = SimulationPaths.from_file(sim_paths.paths_path) # our Paths, are so path-y, even the Paths have Paths!
prev_sim_params = SimulationParameters.from_file(prev_sim_paths.parameters_path)
print(prev_sim_params)

SimulationParameters(integ_params=IntegratorParameters(time_step=1 fs, total_time=0.5 ns, num_samples=200), thermo_params=ThermoParameters(thermostat_params=ThermostatParameters(temperature=300 K, timescale=1 /ps, thermostat=<Thermostat.LANGEVIN_MIDDLE: <class 'openmm.openmm.LangevinMiddleIntegrator'>>), barostat_params=BarostatParameters(pressure=1 atm, temperature=300 K, update_frequency=25, barostat=<Barostat.MONTE_CARLO: <class 'openmm.openmm.MonteCarloBarostat'>>)), reporter_params=ReporterParameters(report_checkpoint=True, report_state=True, report_trajectory=True, report_state_data=True, traj_ext='dcd', num_steps=500000, state_data={'step': True, 'time': True, 'potentialEnergy': True, 'kineticEnergy': True, 'totalEnergy': True, 'temperature': True, 'volume': True, 'density': True, 'speed': True, 'progress': False, 'remainingTime': False, 'elapsedTime': False}, reporter_paths={'trajectory': PosixPath('scratch_MD/simulation_parameters/PNIPAAm/equilibration_trajectory.dcd'), 'check