# Mapping Molecules

In this notebook we cover different ways of mapping molecules and look at some of the things we can do with the mapped molecules.

Throughout this demonstration we load data from a GROMACS simulation, so we need to define a set of units and a file reader object. For this reason, the imports are changed slightly to keep the code to a minimum.

In [1]:
import mdsuite as mds
import mdsuite.file_io.chemfiles_read
from mdsuite.utils import Units

from zinchub import DataHub

### Loading the data

In this tutorial we use 50 ns simulations of 14 water molecules in a continuum fluid, performed with GROMACS. We use pure atomistic naming as well as ligand naming; the topology files for both are available on DataHub.

In [2]:
water = DataHub(url="https://github.com/zincware/DataHub/tree/main/Water_14_Gromacs")
water.get_file('./')
# Collect the downloaded file paths in a list so they can be indexed below.
file_paths = list(water.file_raw)
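
The cells further down pick files from `file_paths` by position, which depends on the order in which the files are listed. As a minimal sketch of an alternative, the files could be selected by extension instead. The file names and the helper `select_by_suffix` below are hypothetical and only illustrate the idea; the actual names in the DataHub record may differ.

```python
from pathlib import Path


def select_by_suffix(paths, suffix):
    """Return the paths whose file extension equals `suffix`."""
    return [Path(p) for p in paths if Path(p).suffix == suffix]


# Hypothetical file names, used only for illustration.
example_paths = ["water_atomistic.gro", "water_ligand.gro", "water.xtc"]

topologies = select_by_suffix(example_paths, ".gro")
trajectories = select_by_suffix(example_paths, ".xtc")

print([p.name for p in topologies])
print([p.name for p in trajectories])
```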

### Project definition

Here we create the project and define some custom units used by GROMACS.

In [3]:
project = mds.Project("Mapping_Molecules")

gmx_units = Units(
        time=1e-12,
        length=1e-10,
        energy=1.6022e-19,
        NkTV2p=1.6021765e6,
        boltzmann=8.617343e-5,
        temperature=1,
        pressure=100000,
    )

INFO - 2022-01-22 00:57:49,836 - project - Creating new project Mapping_Molecules
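
The values passed to `Units` above appear to be conversion factors to SI: read that way, they correspond to picoseconds, angstroms, electronvolts, eV/K, and bar. That reading is an interpretation, not something stated in this notebook. The short check below uses plain Python and the exact SI values of the elementary charge and the Boltzmann constant to compare against it. `NkTV2p` is left out because its definition is not given here.

```python
# Exact SI values (2019 SI redefinition).
ELEMENTARY_CHARGE = 1.602176634e-19  # joules per electronvolt
BOLTZMANN_SI = 1.380649e-23  # joules per kelvin

# Values copied from the Units definition above.
gmx_values = {
    "time": 1e-12,
    "length": 1e-10,
    "energy": 1.6022e-19,
    "boltzmann": 8.617343e-5,
    "pressure": 100000,
}


def relative_error(value, reference):
    """Relative deviation of `value` from `reference`."""
    return abs(value - reference) / abs(reference)


# Assumption: `energy` is one electronvolt expressed in joules.
energy_error = relative_error(gmx_values["energy"], ELEMENTARY_CHARGE)

# Assumption: `boltzmann` is k_B expressed in eV / K.
boltzmann_ev = BOLTZMANN_SI / ELEMENTARY_CHARGE
boltzmann_error = relative_error(gmx_values["boltzmann"], boltzmann_ev)

# Assumption: the experiment timestep (0.002, used in later cells) is in
# units of `time`, which would make it 2 femtoseconds.
timestep_seconds = 0.002 * gmx_values["time"]

print(f"energy deviates from 1 eV by {energy_error:.1e}")
print(f"boltzmann deviates from k_B in eV/K by {boltzmann_error:.1e}")
print(f"timestep in seconds: {timestep_seconds:.1e}")
```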


### Mapping molecules with SMILES

In this section we take a look at how one can map molecules using SMILES strings.

In [4]:
traj_path = file_paths[2]  # trajectory
topol_path = file_paths[0]  # topology with atomistic (element) names

file_reader = mdsuite.file_io.chemfiles_read.ChemfilesRead(
    traj_file_path=traj_path, topol_file_path=topol_path
)

water_chemical = project.add_experiment(
    name="water_chemical",
    timestep=0.002,
    temperature=300.0,
    units=gmx_units,
    simulation_data=file_reader,
)

INFO - 2022-01-22 00:57:49,901 - experiment - Creating a new experiment!
100%|███████████████████████████████████| 1/1 [00:00<00:00,  1.81it/s]


In [5]:
water_chemical.run.CoordinateWrapper()

water_chemical.run.MolecularMap(
    molecules={"water": {"smiles": "[H]O[H]", "amount": 14, "cutoff": 1.7}}
)

Applying transformation 'Positions' to 'O': 100%|█| 1/1 [00:00<00:00, 
Applying transformation 'Positions' to 'H': 100%|█| 1/1 [00:00<00:00, 
Building molecular graph from configuration for water: 100%|██████████| 42/42 [00:00<00:00, 650.77it/s]
Mapping molecule graphs onto trajectory for water: 100%|█| 2/2 [00:00<
Applying transformation 'Positions' to 'water': 100%|█| 1/1 [00:00<00:
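
To give a feel for what a distance cutoff such as `"cutoff": 1.7` can mean when building a molecular graph, here is a minimal, self-contained sketch. It is not MDSuite's implementation: it ignores periodic boundaries and species, uses made-up coordinates, and simply connects any two particles closer than the cutoff before collecting the connected components as molecules. The function name `build_molecules` is hypothetical.

```python
import itertools
import math


def build_molecules(positions, cutoff):
    """Group particle indices into molecules using a distance cutoff."""
    n_particles = len(positions)

    # Connect every pair of particles separated by at most `cutoff`.
    neighbours = {i: set() for i in range(n_particles)}
    for i, j in itertools.combinations(range(n_particles), 2):
        if math.dist(positions[i], positions[j]) <= cutoff:
            neighbours[i].add(j)
            neighbours[j].add(i)

    # Each connected component of the graph is treated as one molecule.
    molecules, seen = [], set()
    for start in range(n_particles):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        molecules.append(sorted(component))
    return molecules


# Two made-up water molecules (O, H, H), coordinates in angstrom.
positions = [
    (0.00, 0.00, 0.00), (0.96, 0.00, 0.00), (-0.24, 0.93, 0.00),
    (5.00, 5.00, 5.00), (5.96, 5.00, 5.00), (4.76, 5.93, 5.00),
]

molecules = build_molecules(positions, cutoff=1.7)
print(molecules)
```

With these coordinates the particles fall into two groups of three. A cutoff that is too large would merge neighbouring molecules, while one that is too small would split them.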


### Mapping molecules with a reference dict

If your particles do not have chemical names but you still wish to group them into molecules, you can do so with a reference dict.

In this example, we use the ligand naming from GROMACS to construct water molecules.

In [7]:
traj_path = file_paths[2]  # trajectory
topol_path = file_paths[1]  # topology with ligand names (OW, HW1, HW2)

file_reader = mdsuite.file_io.chemfiles_read.ChemfilesRead(
    traj_file_path=traj_path, topol_file_path=topol_path
)

water_ligand = project.add_experiment(
    name="water_ligand",
    timestep=0.002,
    temperature=300.0,
    units=gmx_units,
    simulation_data=file_reader,
)

INFO - 2022-01-22 01:01:58,255 - experiment - Creating a new experiment!
INFO - 2022-01-22 01:02:00,315 - pubchempy - 'PUGREST.NotFound: No CID found that matches the given name'




100%|███████████████████████████████████| 1/1 [00:00<00:00,  2.53it/s]


Keep in mind that, because the particles are not named after elements of the periodic table, important properties such as mass must be filled in manually.

In [8]:
water_ligand.species['OW'].mass = [15.999]
water_ligand.species['HW1'].mass = [1.00784]
water_ligand.species['HW2'].mass = [1.00784]
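
One place where these masses would matter is a mass-weighted position for the mapped molecule. The sketch below computes the total mass and the centre of mass of a single water molecule from the values set above. The coordinates are made up, and whether MDSuite computes molecule positions in exactly this way is an assumption.

```python
# Masses copied from the cell above.
species_mass = {"OW": 15.999, "HW1": 1.00784, "HW2": 1.00784}

# Made-up coordinates for one water molecule, in angstrom.
coordinates = {
    "OW": (0.00, 0.00, 0.00),
    "HW1": (0.96, 0.00, 0.00),
    "HW2": (-0.24, 0.93, 0.00),
}

# Total mass of the molecule.
molecule_mass = sum(species_mass.values())

# Mass-weighted average of the particle positions, one axis at a time.
centre_of_mass = tuple(
    sum(species_mass[name] * coordinates[name][axis] for name in coordinates)
    / molecule_mass
    for axis in range(3)
)

print(f"molecule mass: {molecule_mass:.5f}")
print("centre of mass:", tuple(round(c, 4) for c in centre_of_mass))
```

Because oxygen carries most of the mass, the centre of mass lies close to the oxygen position.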

In [11]:
water_ligand.run.CoordinateWrapper()

water_ligand.run.MolecularMap(
    molecules={"water": {"reference": {"HW1": 1, "OW": 1, "HW2": 1}, "amount": 14, "cutoff": 1.7}}
)

INFO - 2022-01-22 01:03:32,953 - transformations - Positions already exists for OW, skipping transformation


INFO - 2022-01-22 01:03:32,958 - transformations - Positions already exists for HW1, skipping transformation


INFO - 2022-01-22 01:03:32,963 - transformations - Positions already exists for HW2, skipping transformation
Building molecular graph from configuration for water: 100%|██████████| 42/42 [00:00<00:00, 631.77it/s]
Mapping molecule graphs onto trajectory for water: 100%|█| 2/2 [00:00<
Applying transformation 'Positions' to 'water': 100%|█| 1/1 [00:00<00:
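
A reference dict states how many particles of each species make up one molecule. As a small sanity check in plain Python, the expected number of particles per species follows from multiplying those counts by `amount`. The helper name `expected_particle_counts` is hypothetical and not part of MDSuite.

```python
# Values copied from the MolecularMap call above.
reference = {"HW1": 1, "OW": 1, "HW2": 1}
amount = 14


def expected_particle_counts(reference, amount):
    """Particles of each species needed to build `amount` molecules."""
    return {
        species: per_molecule * amount
        for species, per_molecule in reference.items()
    }


expected = expected_particle_counts(reference, amount)
total_particles = sum(expected.values())

print(expected)
print(total_particles)
```

The total of 42 particles is consistent with the `42/42` shown in the progress bar above, assuming that bar counts particles.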


### What information is stored?

So the molecule mapping itself was quick and easy, but what information has been stored along the way?