# Demo 2: Defining reproducible and serializable OpenMM simulation parameters
`polymerist` supplies a number of data containers that aim to facilitate [TRUE](https://www.tandfonline.com/doi/full/10.1080/00268976.2020.1742938) workflows (Transparent, Reproducible, Usable, and Extensible, as defined by [MoSDeF](https://mosdef.org/)).

These containers provide you a means to cache high-level thermodynamic, integrator, and checkpoint information related to how to set up a simulation,  
and hand them off to another researcher (possibly yourself in the future!) to reproduce simulations you've run in the past.  

They also distill the myriad options passed around within the [OpenMM API](https://docs.openmm.org/latest/userguide/application.html), making it easier to reason about a simulation study at a scientific, rather than technical, level.

In [1]:
import logging
logging.basicConfig(level=logging.INFO)

from pathlib import Path
from polymerist.genutils.fileutils.pathutils import is_empty, assemble_path


OUTPUT_DIR = Path('scratch_MD') # dummy directory for writing without tampering with example inputs
OUTPUT_DIR.mkdir(exist_ok=True)

PARAMS_DIR = OUTPUT_DIR / 'simulation_parameters'
PARAMS_DIR.mkdir(exist_ok=True)

## Simulation parameter containers

### ThermoParameters
For storing thermodynamic parameters of your simulation, including the target ensemble and any associated thermostat or barostat parameters

In [2]:
from polymerist.mdtools.openmmtools.parameters import ThermoParameters
from openmm.unit import kelvin, atmosphere, picosecond


thermo_params = ThermoParameters(
    ensemble='NPT', # options are NVE, NVT, and NPT; all thermostatted ensembles will use a Langevin Thermostat
    temperature=300*kelvin,
    pressure=1*atmosphere,
    friction_coeff=1*picosecond**-1, # required for Langevin Thermostat
    barostat_freq=25, # number of steps between barostat move attempts
)
print(thermo_params)

ThermoParameters(ensemble='NPT', temperature=Quantity(value=300, unit=kelvin), pressure=Quantity(value=1, unit=atmosphere), friction_coeff=Quantity(value=1, unit=/picosecond), barostat_freq=25)


ThermoParameters, and for that matter **ALL** parameter sets we show here, can be trivially cached to file...

In [3]:
thermo_params_path = assemble_path(PARAMS_DIR, 'thermo_params', extension='json')
thermo_params.to_file(thermo_params_path)

... and read back from the file path just as easily

In [4]:
thermo_params_from_file = ThermoParameters.from_file(thermo_params_path)
print(thermo_params_from_file == thermo_params) # the parameters read from file are equivalent to those written (as we'd hope)...
print(thermo_params_from_file is thermo_params) # ... but are not literally the same object (just to show I'm not pulling any tricks on you)

True
False
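
To make the "equal but not identical" behaviour concrete, here is a minimal stdlib-only sketch of the same round-trip pattern. `ToyThermoParams` is a hypothetical stand-in invented for illustration; it is not part of `polymerist`, and the library's actual serialization (which also has to handle unit-bearing quantities) may well work differently.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class ToyThermoParams:
    """Hypothetical stand-in for a parameter container (NOT a polymerist class)."""
    ensemble: str
    temperature_kelvin: float
    pressure_atm: float

    def to_file(self, path: Path) -> None:
        # dump all fields as plain JSON
        path.write_text(json.dumps(asdict(self), indent=4))

    @classmethod
    def from_file(cls, path: Path) -> 'ToyThermoParams':
        # rebuild a brand-new instance from the stored fields
        return cls(**json.loads(path.read_text()))


with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / 'toy_thermo_params.json'
    toy_params = ToyThermoParams(ensemble='NPT', temperature_kelvin=300.0, pressure_atm=1.0)
    toy_params.to_file(toy_path)
    toy_params_from_file = ToyThermoParams.from_file(toy_path)

print(toy_params_from_file == toy_params)  # dataclasses compare field-by-field
print(toy_params_from_file is toy_params)  # a separate object was created on load
```

Dataclass equality compares field values, so any container that reconstructs every field on load should pass the `==` check while still failing the `is` check.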


### IntegratorParameters
For storing parameters related to the duration of the total simulation, simulation timestep, and checkpointing interval

In [5]:
from polymerist.mdtools.openmmtools.parameters import IntegratorParameters
from openmm.unit import nanosecond, femtosecond


integ_params = IntegratorParameters(
    time_step=1*femtosecond,    # integrator timestep
    total_time=0.5*nanosecond,  # total simulation time
    num_samples=200,            # number of equally-spaced checkpoint/state data samples to take over the course of the simulation
)
print(integ_params)

# derived parameters
print(integ_params.num_steps)
print(integ_params.report_interval)

IntegratorParameters(time_step=Quantity(value=1, unit=femtosecond), total_time=Quantity(value=0.5, unit=nanosecond), num_samples=200)
500000
2500
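
The two derived values can be checked by hand. The sketch below assumes they follow from simple division (total time over time step, then steps over samples); this matches the numbers printed above, but how `IntegratorParameters` rounds inputs that do not divide evenly is not shown here.

```python
# work in integer femtoseconds so the arithmetic is exact
FS_PER_NS = 1_000_000

time_step_fs = 1                # 1 fs time step
total_time_fs = FS_PER_NS // 2  # 0.5 ns total simulation time
num_samples = 200

num_steps = total_time_fs // time_step_fs  # integration steps in the whole run
report_interval = num_steps // num_samples # steps between consecutive samples

print(num_steps)
print(report_interval)
```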


In [6]:
integ_params_path = assemble_path(PARAMS_DIR, 'integ_params', extension='json')
integ_params.to_file(integ_params_path)

In [7]:
integ_params_from_file = IntegratorParameters.from_file(integ_params_path)
print(integ_params_from_file == integ_params)
print(integ_params_from_file is integ_params)

True
False


### ReporterParameters
For governing the format and type of simulation snapshots ([Checkpoint](https://docs.openmm.org/development/api-python/generated/openmm.app.checkpointreporter.CheckpointReporter.html) or [State](https://docs.openmm.org/development/api-python/generated/openmm.openmm.State.html)), as well as which pieces of [state data](https://docs.openmm.org/7.0.0/api-python/generated/simtk.openmm.app.statedatareporter.StateDataReporter.html) (e.g. simulation speed, box volume, etc.) to report periodically

In [8]:
from polymerist.mdtools.openmmtools.reporters import DEFAULT_STATE_DATA_PROPS
from polymerist.mdtools.openmmtools.parameters import ReporterParameters


reporter_params = ReporterParameters(
    report_trajectory=True, # not reporting a trajectory kind of makes your simulation pointless, but the option exists if you need it :P
    traj_ext='dcd',         # output to binary DCD trajectory files (recommended)
    report_checkpoint=True, # also keep checkpoints of OpenMM objects (specific to Context and machine)
    report_state=True,      # saving State is a bit redundant with checkpoints, but is machine-transferrable
    report_state_data=True, # distinct from State (confusingly), refers to quantities which summarize the simulation
    state_data=DEFAULT_STATE_DATA_PROPS, # which particular pieces of state data are recorded on each dump; tune these to taste
)
print(reporter_params)

ReporterParameters(report_checkpoint=True, report_state=True, report_trajectory=True, report_state_data=True, traj_ext='dcd', num_steps=None, state_data={'step': True, 'time': True, 'potentialEnergy': True, 'kineticEnergy': True, 'totalEnergy': True, 'temperature': True, 'volume': True, 'density': True, 'speed': True, 'progress': False, 'remainingTime': False, 'elapsedTime': False}, reporter_paths=None)
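
As the printed output shows, `state_data` is a plain dictionary of boolean flags. A custom selection could therefore be built by copying the defaults and toggling entries, as in the sketch below. The dictionary is transcribed from the output above rather than imported, and whether `ReporterParameters` accepts an arbitrary dictionary of this shape is an assumption.

```python
# defaults transcribed from the printed ReporterParameters above
default_state_data = {
    'step': True, 'time': True,
    'potentialEnergy': True, 'kineticEnergy': True, 'totalEnergy': True,
    'temperature': True, 'volume': True, 'density': True, 'speed': True,
    'progress': False, 'remainingTime': False, 'elapsedTime': False,
}

# copy the defaults and switch on two extra properties, leaving the original dict untouched
custom_state_data = {**default_state_data, 'progress': True, 'remainingTime': True}

# list the properties which would be reported under the custom selection
enabled = [name for name, flag in custom_state_data.items() if flag]
print(enabled)
```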


In [9]:
reporter_params_path = assemble_path(PARAMS_DIR, 'reporter_params', extension='json')
reporter_params.to_file(reporter_params_path)

In [10]:
reporter_params_from_file = ReporterParameters.from_file(reporter_params_path)
print(reporter_params_from_file == reporter_params)
print(reporter_params_from_file is reporter_params)

True
False


### SimulationParameters
For bundling together all ThermoParameters, IntegratorParameters, and ReporterParameters into a single convenient container

In [11]:
from polymerist.mdtools.openmmtools.parameters import SimulationParameters


sim_params = SimulationParameters(
    thermo_params=thermo_params,
    integ_params=integ_params,
    reporter_params=reporter_params,
)
print(sim_params)

SimulationParameters(integ_params=IntegratorParameters(time_step=Quantity(value=1, unit=femtosecond), total_time=Quantity(value=0.5, unit=nanosecond), num_samples=200), thermo_params=ThermoParameters(ensemble='NPT', temperature=Quantity(value=300, unit=kelvin), pressure=Quantity(value=1, unit=atmosphere), friction_coeff=Quantity(value=1, unit=/picosecond), barostat_freq=25), reporter_params=ReporterParameters(report_checkpoint=True, report_state=True, report_trajectory=True, report_state_data=True, traj_ext='dcd', num_steps=500000, state_data={'step': True, 'time': True, 'potentialEnergy': True, 'kineticEnergy': True, 'totalEnergy': True, 'temperature': True, 'volume': True, 'density': True, 'speed': True, 'progress': False, 'remainingTime': False, 'elapsedTime': False}, reporter_paths=None))


In [12]:
sim_params_path = assemble_path(PARAMS_DIR, 'sim_params', extension='json')
sim_params.to_file(sim_params_path)

In [13]:
sim_params_from_file = SimulationParameters.from_file(sim_params_path)
print(sim_params_from_file == sim_params)
print(sim_params_from_file is sim_params)

True
False


### EnsembleFactory
Creates the OpenMM objects which realize a simulation specified by the parameter sets above

In [14]:
from polymerist.mdtools.openmmtools.thermo import EnsembleFactory

ens_fac = EnsembleFactory.from_thermo_params(sim_params.thermo_params)
print(ens_fac.forces())
print(ens_fac.integrator(time_step=sim_params.integ_params.time_step))

INFO:polymerist.mdtools.openmmtools.thermo:Created MonteCarloBarostat Force(s) for NPT (Isothermal-isobaric) ensemble
INFO:polymerist.mdtools.openmmtools.thermo:Created LangevinMiddleIntegrator for NPT (Isothermal-isobaric) ensemble


[<openmm.openmm.MonteCarloBarostat; proxy of <Swig Object of type 'OpenMM::MonteCarloBarostat *' at 0x7f97340847b0> >]
<openmm.openmm.LangevinMiddleIntegrator; proxy of <Swig Object of type 'OpenMM::LangevinMiddleIntegrator *' at 0x7f9655cefa50> >
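
One way to picture what the factory decides is a lookup from ensemble name to the OpenMM components it needs. The table below is a hypothetical sketch and not `polymerist`'s implementation: only the NPT row is confirmed by the log output above, while the NVT and NVE rows are assumptions (in particular, the choice of `VerletIntegrator` for NVE is a guess).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsembleRecipe:
    """Hypothetical record of the OpenMM class names an ensemble requires."""
    integrator: str
    forces: tuple


ENSEMBLE_RECIPES = {
    'NVE': EnsembleRecipe('VerletIntegrator', ()),          # ASSUMPTION: no thermostat, no barostat
    'NVT': EnsembleRecipe('LangevinMiddleIntegrator', ()),  # ASSUMPTION: Langevin thermostat only
    'NPT': EnsembleRecipe('LangevinMiddleIntegrator', ('MonteCarloBarostat',)),  # matches the log above
}

recipe = ENSEMBLE_RECIPES['NPT']
print(recipe.integrator)
print(list(recipe.forces))
```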


## SimulationPaths and directory-based workflows
Most often, the files associated with a given simulation will coexist in a single directory with common formatting,  
to prevent mix-ups between files from multiple simulations (be it serial simulations or replicates)

The management of such a directory and the many files contained within it is facilitated by `polymerist`'s SimulationPaths.  
All contained files share the selected prefix for consistency, and the individual files are accessible via the SimulationPaths API:

In [15]:
from polymerist.mdtools.openmmtools.serialization import SimulationPaths


sim_paths = SimulationPaths.from_dir_and_parameters(PARAMS_DIR/'PNIPAAm', prefix='equilibration', sim_params=sim_params)
print(sim_paths.parameters_path)
print(sim_paths.topology_path)
print(sim_paths.system_path)
print(sim_paths.trajectory_path)

scratch_MD/simulation_parameters/PNIPAAm/equilibration_parameters.json
scratch_MD/simulation_parameters/PNIPAAm/equilibration_topology.pdb
scratch_MD/simulation_parameters/PNIPAAm/equilibration_system.xml
scratch_MD/simulation_parameters/PNIPAAm/equilibration_trajectory.dcd
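
The naming convention visible in these paths is `<directory>/<prefix>_<label>.<extension>`. The sketch below reproduces it with `pathlib` alone; `prefixed_path` is a hypothetical helper written for illustration and is not part of the SimulationPaths API.

```python
from pathlib import Path


def prefixed_path(directory: Path, prefix: str, label: str, extension: str) -> Path:
    """Hypothetical helper: build '<directory>/<prefix>_<label>.<extension>'."""
    return directory / f'{prefix}_{label}.{extension}'


sim_dir = Path('scratch_MD') / 'simulation_parameters' / 'PNIPAAm'
file_kinds = [('parameters', 'json'), ('topology', 'pdb'), ('system', 'xml'), ('trajectory', 'dcd')]

for label, extension in file_kinds:
    print(prefixed_path(sim_dir, 'equilibration', label, extension))
```

Sharing one prefix per simulation keeps files from, say, an equilibration run and a later production run distinguishable even if they end up in the same directory.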


As you may have expected based on the above examples, SimulationPaths _itself_ can be saved to file and reloaded later.  
This allows you to easily re-access the API for a given directory in later simulations or scripts.

In [16]:
prev_sim_paths = SimulationPaths.from_file(sim_paths.paths_path) # our Paths, are so path-y, even the Paths have Paths!
prev_sim_params = SimulationParameters.from_file(prev_sim_paths.parameters_path)
print(prev_sim_params)

SimulationParameters(integ_params=IntegratorParameters(time_step=Quantity(value=1, unit=femtosecond), total_time=Quantity(value=0.5, unit=nanosecond), num_samples=200), thermo_params=ThermoParameters(ensemble='NPT', temperature=Quantity(value=300, unit=kelvin), pressure=Quantity(value=1, unit=atmosphere), friction_coeff=Quantity(value=1, unit=/picosecond), barostat_freq=25), reporter_params=ReporterParameters(report_checkpoint=True, report_state=True, report_trajectory=True, report_state_data=True, traj_ext='dcd', num_steps=500000, state_data={'step': True, 'time': True, 'potentialEnergy': True, 'kineticEnergy': True, 'totalEnergy': True, 'temperature': True, 'volume': True, 'density': True, 'speed': True, 'progress': False, 'remainingTime': False, 'elapsedTime': False}, reporter_paths={'trajectory': PosixPath('scratch_MD/simulation_parameters/PNIPAAm/equilibration_trajectory.dcd'), 'checkpoint': PosixPath('scratch_MD/simulation_parameters/PNIPAAm/equilibration_checkpoint.chk'), 'state