In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import molsysmt as msm
from molsysmt import puw
import numpy as np
import matplotlib.pyplot as plt





# Getting minimum and maximum distances.

MolSysMT includes a versatile method to calculate distances between points in space, atoms, and/or groups of atoms. Like many other methods of this multitool, `molsysmt.distance()` has an input argument to choose the engine in charge of computing the result. Currently, `molsysmt.distance()` offers two engines: `MolSysMT` and `MDTraj`. Only `MolSysMT` is covered in this guide.

The different options of `molsysmt.distance()` are introduced step by step in the following examples.

## The XYZ molecular system form

The first and simplest case is getting distances from a distribution of points in space. MolSysMT accepts a molecular system form where only spatial coordinates are described, without topological information: the `XYZ` form.

In [4]:
molecular_system = np.zeros([6,3]) * puw.unit('nm')

In [5]:
msm.get_form(molecular_system)

'XYZ'

The `XYZ` form accepts numpy arrays with length units of shape $[n\_frames, n\_atoms, 3]$ or $[n\_atoms, 3]$. Given an array of rank 2, MolSysMT always assumes $n\_frames=1$ and reads the first axis as the number of atoms:

In [6]:
msm.get(molecular_system, n_frames=True, n_atoms=True)

[1, 6]
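The shape convention above can be sketched in plain NumPy. This is only an illustration of the rule, not MolSysMT's own code: `as_frames` is a hypothetical helper, and units are left out for simplicity.

```python
import numpy as np

def as_frames(coordinates):
    """Return coordinates with shape [n_frames, n_atoms, 3] (hypothetical helper)."""
    coordinates = np.asarray(coordinates, dtype=float)
    if coordinates.ndim == 2:
        # A rank-2 array is read as a single frame: [n_atoms, 3] -> [1, n_atoms, 3]
        coordinates = coordinates[np.newaxis, :, :]
    if coordinates.ndim != 3 or coordinates.shape[-1] != 3:
        raise ValueError("expected shape [n_atoms, 3] or [n_frames, n_atoms, 3]")
    return coordinates

single_frame = as_frames(np.zeros([6, 3]))
n_frames, n_atoms, _ = single_frame.shape
print(n_frames, n_atoms)
```

An array that already has three axes, such as `np.zeros([3, 4, 3])`, passes through unchanged.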

Let's create an `XYZ` molecular system with more than one frame. It will help us illustrate the first distance calculations:

In [7]:
# Molecular system with four atoms and three frames.

molecular_system = np.zeros([3,4,3], dtype='float64') * puw.unit('nm')

## First atom
molecular_system[0,0,:] = [0, 2, -1] * puw.unit('nm')
molecular_system[1,0,:] = [1, 2, -1] * puw.unit('nm')
molecular_system[2,0,:] = [0, 2, -1] * puw.unit('nm')

## Second atom
molecular_system[0,1,:] = [-1, 1, 1] * puw.unit('nm')
molecular_system[1,1,:] = [-1, 0, 1] * puw.unit('nm')
molecular_system[2,1,:] = [0, 0, 1] * puw.unit('nm')

## Third atom
molecular_system[0,2,:] = [-2, 0, 1] * puw.unit('nm')
molecular_system[1,2,:] = [-2, 0, 0] * puw.unit('nm')
molecular_system[2,2,:] = [-1, 1, 0] * puw.unit('nm')

## Fourth atom
molecular_system[0,3,:] = [-2, -2, -2] * puw.unit('nm')
molecular_system[1,3,:] = [0, 0, 0] * puw.unit('nm')
molecular_system[2,3,:] = [2, 2, 2] * puw.unit('nm')

In [8]:
molecular_system

| | |
|---|---|
| Magnitude | [[[0.0 2.0 -1.0]  [-1.0 1.0 1.0]  [-2.0 0.0 1.0]  [-2.0 -2.0 -2.0]]  [[1.0 2.0 -1.0]  [-1.0 0.0 1.0]  [-2.0 0.0 0.0]  [0.0 0.0 0.0]]  [[0.0 2.0 -1.0]  [0.0 0.0 1.0]  [-1.0 1.0 0.0]  [2.0 2.0 2.0]]] |
| Units | nanometer |


In [9]:
msm.info(molecular_system)

| form | n_atoms | n_groups | n_components | n_chains | n_molecules | n_entities | n_frames |
|---|---|---|---|---|---|---|---|
| XYZ | 4 | | | | | | 3 |


### Minimum and Maximum distance

Sometimes the minimum or maximum distance between two sets of atoms is needed. Although this could be done with `molsysmt.distance()` and a little coding, MolSysMT includes two methods that make it even easier, called in the cells below as `molsysmt.structure.get_minimum_distances()` and `molsysmt.structure.get_maximum_distances()`. Let's see how they work.

As a first example, let's get the minimum distance between two atom selections:

In [10]:
min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system)

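The result is returned as two numpy arrays: the pair of atoms minimizing the distance for each frame, and the corresponding minimum distance (also for each frame). Note that no selection was passed in this call, and the output below pairs atom 0 with itself at zero distance in every frame.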

In [11]:
min_pairs.shape

(3, 2)

In [12]:
min_pairs

array([[0, 0],
       [0, 0],
       [0, 0]])

In [13]:
min_distances.shape

(3,)

In [14]:
min_distances

| | |
|---|---|
| Magnitude | [0.0 0.0 0.0] |
| Units | nanometer |


In [15]:
print('The minimum distance in frame 2-th is given by atom {}-th and atom {}-th: {}'.format(min_pairs[2,0], min_pairs[2,1], min_distances[2]))

The minimum distance in frame 2-th is given by atom 0-th and atom 0-th: 0.0 nanometer
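To see where the zero distance and the `[0, 0]` pair come from, here is a minimal NumPy sketch of an all-against-all minimum per frame. It is a cross-check written for this guide under the assumption that every atom is compared with every atom, itself included. It is not MolSysMT's implementation, and the coordinates are plain floats in nanometers.

```python
import numpy as np

# Same coordinates as the XYZ system above: [n_frames, n_atoms, 3], in nm.
coords = np.array([
    [[0, 2, -1], [-1, 1, 1], [-2, 0, 1], [-2, -2, -2]],
    [[1, 2, -1], [-1, 0, 1], [-2, 0, 0], [0, 0, 0]],
    [[0, 2, -1], [0, 0, 1], [-1, 1, 0], [2, 2, 2]],
], dtype=float)
n_frames, n_atoms, _ = coords.shape

# Pairwise distance matrix for each frame: shape (n_frames, n_atoms, n_atoms).
dist = np.linalg.norm(coords[:, :, None, :] - coords[:, None, :, :], axis=-1)

def minimum_per_frame(d):
    """Return the (i, j) pair and the value of the smallest entry in each frame."""
    flat = d.reshape(d.shape[0], -1)
    pairs = np.column_stack(np.unravel_index(flat.argmin(axis=1), d.shape[1:]))
    return pairs, flat.min(axis=1)

# Self-pairs included: the diagonal (distance 0) always wins.
pairs_with_self, dmin_with_self = minimum_per_frame(dist)

# Self-pairs excluded: mask the diagonal before taking the minimum.
masked = dist.copy()
diagonal = np.arange(n_atoms)
masked[:, diagonal, diagonal] = np.inf
pairs_no_self, dmin_no_self = minimum_per_frame(masked)

print(pairs_with_self, dmin_with_self)
print(pairs_no_self, dmin_no_self)
```

With the diagonal masked, the minimum becomes the closest pair of distinct atoms in each frame.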


All input arguments described in previous subsections can also be used with `molsysmt.structure.get_minimum_distances()` and `molsysmt.structure.get_maximum_distances()`. Let's see an example:

In [16]:
min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system, selection=[0,1,2], selection_2=[0,1,2],
                                               frame_indices=[0,1], frame_indices_2=[1,2], pairs=True)

Remember that with `pairs=True` the output no longer refers to atom indices but to pair indices. That is why the shape of `min_pairs` is now:

In [17]:
min_pairs.shape

(2,)

While,

In [18]:
min_distances.shape

(2,)

This means that the minimum displacement between consecutive frames was observed for:

In [19]:
for ii in range(2):
    print('Atom {}-th had the minimum displacement of A between frames {}-th and {}-th: {}'.format(min_pairs[ii], ii, ii+1, min_distances[ii]))

Atom 0-th had the minimum displacement of A between frames 0-th and 1-th: 1.0 nanometer
Atom 0-th had the minimum displacement of A between frames 1-th and 2-th: 1.0 nanometer
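The paired calculation above can be checked by hand with NumPy. The sketch below assumes that, with `pairs=True`, the i-th atom of `selection` in the i-th frame of `frame_indices` is compared only with the i-th atom of `selection_2` in the i-th frame of `frame_indices_2`. That reading is taken from the text of this guide, and the code is an illustration rather than the library's implementation.

```python
import numpy as np

# Same coordinates as the XYZ system above: [n_frames, n_atoms, 3], in nm.
coords = np.array([
    [[0, 2, -1], [-1, 1, 1], [-2, 0, 1], [-2, -2, -2]],
    [[1, 2, -1], [-1, 0, 1], [-2, 0, 0], [0, 0, 0]],
    [[0, 2, -1], [0, 0, 1], [-1, 1, 0], [2, 2, 2]],
], dtype=float)

selection = [0, 1, 2]
frames_a = [0, 1]   # plays the role of frame_indices
frames_b = [1, 2]   # plays the role of frame_indices_2

# Each atom is compared only with itself in the paired frame: a displacement.
start = coords[frames_a][:, selection]   # shape (2, 3, 3)
end = coords[frames_b][:, selection]     # shape (2, 3, 3)
displacements = np.linalg.norm(end - start, axis=-1)   # shape (2, 3)

# One winner per frame pair: position in `selection` and its displacement.
min_pairs = displacements.argmin(axis=1)
min_distances = displacements.min(axis=1)
print(min_pairs, min_distances)
```

For this system the sketch gives position 0 and a displacement of 1.0 nm for both frame pairs, in line with the printed output above. Several atoms tie at 1.0 nm, and `argmin` reports the first one.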


There are situations in which we have a list of atoms in `selection`, and the minimum distance to a second set of atoms, `selection_2`, is needed for every single atom of the first set. In this case the first set must not be treated as a single entity (a set) that yields one minimum distance. Let's illustrate this with an example:

In [22]:
min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system, selection=[1,2], frame_indices=[0,1,2],
                                                selection_2=[0,1], as_entity=False, as_entity_2=True)

The output corresponds to the minimum distance from the atom with index 1 of A (`selection`) to any atom of B (`selection_2`), and from the atom with index 2 of A to any atom of B, at every frame:

In [23]:
min_pairs.shape

(3, 2)

In [24]:
min_distances.shape

(3, 2)

In [25]:
selection_2=[0,1]
print('Atom 1-th of A has the minimum distance to B with its atom {}-th in frame 1-th: {}'.format(selection_2[min_pairs[1,0]], min_distances[1,0]))

Atom 1-th of A has the minimum distance to B with its atom 1-th in frame 1-th: 0.0 nanometer


In [26]:
for ii in range(3):
    print('The {}-th is the closest atom of B to atom 1-th of A at frame {}-th with {}'.format(selection_2[min_pairs[ii,0]],ii, min_distances[ii,0]))

The 1-th is the closest atom of B to atom 1-th of A at frame 0-th with 0.0 nanometer
The 1-th is the closest atom of B to atom 1-th of A at frame 1-th with 0.0 nanometer
The 1-th is the closest atom of B to atom 1-th of A at frame 2-th with 0.0 nanometer
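The per-atom behavior can be reproduced with a small NumPy sketch. It assumes that `as_entity=False` means "keep one result per atom of the first set" and that the returned indices are positions within `selection_2`, as the cells above suggest. It is a cross-check, not MolSysMT's code.

```python
import numpy as np

# Same coordinates as the XYZ system above: [n_frames, n_atoms, 3], in nm.
coords = np.array([
    [[0, 2, -1], [-1, 1, 1], [-2, 0, 1], [-2, -2, -2]],
    [[1, 2, -1], [-1, 0, 1], [-2, 0, 0], [0, 0, 0]],
    [[0, 2, -1], [0, 0, 1], [-1, 1, 0], [2, 2, 2]],
], dtype=float)

selection = [1, 2]     # set A: one result per atom
selection_2 = [0, 1]   # set B: treated as a whole

set_a = coords[:, selection, :]     # shape (3, 2, 3)
set_b = coords[:, selection_2, :]   # shape (3, 2, 3)

# dist[f, i, j]: distance between the i-th atom of A and the j-th atom of B in frame f.
dist = np.linalg.norm(set_a[:, :, None, :] - set_b[:, None, :, :], axis=-1)

# Reduce over B only, keeping the A axis: shapes (n_frames, len(selection)).
closest_in_b = dist.argmin(axis=2)   # positions within selection_2
min_distances = dist.min(axis=2)

for frame in range(coords.shape[0]):
    atom_b = selection_2[closest_in_b[frame, 0]]
    print(frame, atom_b, min_distances[frame, 0])
```

Atom 1 belongs to both sets, so its closest atom of B is itself at zero distance, which is what the printed output above shows.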


### Minimum and Maximum distance of atom groups

Sometimes the pair of atom groups with the shortest distance between their geometric centers, or centers of mass, needs to be determined. Let's illustrate this case with a dimeric protein complex:

In [27]:
molecular_system = msm.convert('1TCD', 'molsysmt.MolSys')

In [28]:
msm.info(molecular_system, target='component', selection='molecule_type=="protein"')

| index | n atoms | n groups | chain index | molecule index | molecule type | entity index | entity name |
|---|---|---|---|---|---|---|---|
| 0 | 1906 | 248 | 0 | 0 | protein | 0 | TRIOSEPHOSPHATE ISOMERASE |
| 1 | 1912 | 249 | 1 | 0 | protein | 0 | TRIOSEPHOSPHATE ISOMERASE |


Let's find the closest pair of groups belonging to different components:

In [29]:
atoms_groups_component_0 = msm.get(molecular_system, target='group',
                                   selection='component_index==0', atom_index=True)
atoms_groups_component_1 = msm.get(molecular_system, target='group',
                                   selection='component_index==1', atom_index=True)

In [30]:
min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system,
                                                groups_of_atoms=atoms_groups_component_0,
                                                group_behavior='geometric_center',
                                                groups_of_atoms_2=atoms_groups_component_1,
                                                group_behavior_2='geometric_center')

There is a single frame in our molecular system; that is why the numpy array with the pair of groups has the following shape:

In [31]:
min_pairs.shape

(1, 2)

The indices found in `min_pairs` correspond to the n-th atom group of the first list and the m-th atom group of the second list, respectively:

In [32]:
min_pairs[0]

array([69, 12])
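The idea of comparing groups through their geometric centers can be sketched with NumPy on made-up data. The coordinates and group lists below are toy values invented for illustration, not taken from 1TCD, and `geometric_centers` is a hypothetical helper rather than a MolSysMT function.

```python
import numpy as np

def geometric_centers(coords, groups):
    """Unweighted mean position of each group of atoms, per frame.

    coords: [n_frames, n_atoms, 3]; groups: list of lists of atom indices.
    Returns an array of shape [n_frames, n_groups, 3].
    """
    return np.stack([coords[:, group, :].mean(axis=1) for group in groups], axis=1)

# Toy system: one frame, six atoms (values chosen by hand for this example).
coords = np.array([[
    [0.0, 0.0, 0.0], [2.0, 0.0, 0.0],   # first list, group 0
    [1.0, 2.5, 0.0],                    # first list, group 1
    [1.0, 1.0, 0.0], [1.0, 3.0, 0.0],   # second list, group 0
    [9.0, 9.0, 9.0],                    # second list, group 1
]])
groups_1 = [[0, 1], [2]]
groups_2 = [[3, 4], [5]]

centers_1 = geometric_centers(coords, groups_1)
centers_2 = geometric_centers(coords, groups_2)

# dist[f, n, m]: distance between center n of the first list and center m of the second.
dist = np.linalg.norm(centers_1[:, :, None, :] - centers_2[:, None, :, :], axis=-1)

flat = dist.reshape(dist.shape[0], -1)
min_pairs = np.column_stack(np.unravel_index(flat.argmin(axis=1), dist.shape[1:]))
min_distances = flat.min(axis=1)
print(min_pairs, min_distances)
```

As in the cells above, each row of `min_pairs` holds positions within the two group lists, so they still have to be mapped back to group indices of the full system.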

In [33]:
group_index_in_component_0 = msm.get(molecular_system, target='group',
                                     selection='component_index==0', index=True)[69]
group_index_in_component_1 = msm.get(molecular_system, target='group',
                                     selection='component_index==1', index=True)[12]

In [34]:
msm.info(molecular_system, target='group', indices=[group_index_in_component_0,
                                                    group_index_in_component_1])

| index | id | name | type | n atoms | component index | chain index | molecule index | molecule type | entity index | entity name |
|---|---|---|---|---|---|---|---|---|---|---|
| 69 | 73 | GLY | aminoacid | 4 | 0 | 0 | 0 | protein | 0 | TRIOSEPHOSPHATE ISOMERASE |
| 260 | 15 | CYS | aminoacid | 6 | 1 | 1 | 0 | protein | 0 | TRIOSEPHOSPHATE ISOMERASE |


And the corresponding minimum distance between both residues from the two components is:

In [35]:
min_distances[0]

On the other hand, if the maximum distance is needed, the method to use is `molsysmt.structure.get_maximum_distances()`. Let's show how it works with a short trajectory of the pentalanine peptide.

In [37]:
molecular_system = msm.demo['pentalanine']['traj.h5']
molecular_system = msm.convert(molecular_system, to_form='molsysmt.MolSys')

In [38]:
msm.info(molecular_system)

| form | n_atoms | n_groups | n_components | n_chains | n_molecules | n_entities | n_peptides | n_frames |
|---|---|---|---|---|---|---|---|---|
| molsysmt.MolSys | 62 | 7 | 1 | 1 | 1 | 1 | 1 | 5000 |


The trajectory has 5000 frames. Let's see, for each frame, which pair of residues has the longest distance between their geometric centers:

In [39]:
list_atom_groups = msm.get(molecular_system, target='group', selection='all', atom_index=True)
max_pairs, max_distances = msm.structure.get_maximum_distances(molecular_system, groups_of_atoms=list_atom_groups,
                                                group_behavior='geometric_center')

This time we have 5000 pairs of groups, one for each frame, and 5000 maximum distances:

In [40]:
max_pairs.shape

(5000, 2)

In [41]:
max_pairs[0]

array([0, 6])

In [42]:
max_distances

| | |
|---|---|
| Magnitude | [1.4298022125781427 1.5060004722702336 1.6863101013212674 ...  1.8300611459902283 1.2458154349050328 1.392606254657962] |
| Units | nanometer |
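The per-frame maximum within a single list of groups can be sketched on a synthetic trajectory. The random coordinates and the three groups below are assumptions made for this example, since the pentalanine demo data is not loaded here. The code only illustrates the calculation and does not show how MolSysMT computes it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic trajectory: 5 frames, 8 atoms, split into 3 groups (toy data).
trajectory = rng.normal(size=(5, 8, 3))
groups = [[0, 1, 2], [3, 4], [5, 6, 7]]

# Geometric center of each group in each frame: shape (n_frames, n_groups, 3).
centers = np.stack([trajectory[:, g, :].mean(axis=1) for g in groups], axis=1)

# Center-to-center distances within the same list: shape (n_frames, n_groups, n_groups).
dist = np.linalg.norm(centers[:, :, None, :] - centers[:, None, :, :], axis=-1)

# One pair and one distance per frame. The matrix is symmetric, so argmax
# returns the (i, j) ordering with i < j because it comes first in row-major order.
flat = dist.reshape(dist.shape[0], -1)
max_pairs = np.column_stack(np.unravel_index(flat.argmax(axis=1), dist.shape[1:]))
max_distances = flat.max(axis=1)

print(max_pairs.shape, max_distances.shape)
```

The result has one row of `max_pairs` and one entry of `max_distances` per frame, the same layout as the `(5000, 2)` and 5000-value outputs shown above.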


As a last example of these methods, let's ask: for each frame, which residue of the peptide has the largest displacement between that frame and the next one?

In [43]:
list_atom_groups = msm.get(molecular_system, target='group', selection='all', atom_index=True)
frames=np.arange(msm.get(molecular_system, n_frames=True))
max_group, max_distances = msm.structure.get_maximum_distances(molecular_system,
                                                groups_of_atoms=list_atom_groups,
                                                group_behavior='geometric_center',
                                                groups_of_atoms_2=list_atom_groups,
                                                group_behavior_2='geometric_center',
                                                frame_indices=frames[:-1],
                                                frame_indices_2=frames[1:],
                                                pairs=True)

Since we are using the option `pairs=True`, the output this time is the index of the pair formed by the elements of `groups_of_atoms` and `groups_of_atoms_2` with the maximum distance (a displacement in this case) between each i-th frame and the following (i+1)-th frame:

In [44]:
max_group.shape

(4999,)

In [45]:
max_group

array([2, 0, 6, ..., 6, 6, 6])

The values of these maximum displacements are stored in `max_distances`. This way:

In [46]:
print('The {}-th is the group with the maximum displacement between frames 200-th and 201-th: {}'.format(max_group[200], max_distances[200]))

The 3-th is the group with the maximum displacement between frames 200-th and 201-th: 0.6367210108091421 nanometer
