In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import molsysmt as msm
from molsysmt import puw
import numpy as np
import matplotlib.pyplot as plt



# Get neighbors

With the method `molsysmt.distance()` many questions about a molecular system can be answered. Two of the most common distance-related questions are: which are the n atoms closest to a given one? And which atoms are closer than a given distance threshold? MolSysMT includes a method to answer both: `molsysmt.structure.get_neighbors()`.
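
Both questions can be made concrete with a minimal sketch in plain Python. It does not call MolSysMT: the toy coordinates and the helper names `sorted_by_distance`, `closest_n` and `closer_than` are assumptions introduced only for illustration.

```python
import math

# Toy coordinates in nanometers (hypothetical values, not from the demo system).
coordinates = [
    (0.0, 0.0, 0.0),
    (0.3, 0.0, 0.0),
    (0.0, 0.5, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 2.0),
]


def sorted_by_distance(coordinates, index):
    """Indices of all other points, sorted by increasing distance to point `index`."""
    others = [jj for jj in range(len(coordinates)) if jj != index]
    return sorted(others, key=lambda jj: math.dist(coordinates[index], coordinates[jj]))


def closest_n(coordinates, index, num_neighbors):
    """First question: the `num_neighbors` points closest to point `index`."""
    return sorted_by_distance(coordinates, index)[:num_neighbors]


def closer_than(coordinates, index, threshold):
    """Second question: the points closer than `threshold` to point `index`."""
    return [jj for jj in sorted_by_distance(coordinates, index)
            if math.dist(coordinates[index], coordinates[jj]) < threshold]


print(closest_n(coordinates, 0, 2))      # a fixed number of neighbors
print(closer_than(coordinates, 0, 1.5))  # as many neighbors as fall below 1.5 nm
```

The first function always returns the same number of neighbors, while the second returns a list whose length depends on the point. This difference explains the two output layouts described in the subsections below.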

### First closest neighbor atoms or groups

There are two ways to define the neighbors of an atom: as the n closest atoms, with the option `num_neighbors`, or as the atoms closer than a given distance, with the option `threshold`. Let's show with a simple example how the first option works:

In [4]:
molecular_system = msm.demo['pentalanine']['traj.h5']
molecular_system = msm.convert(molecular_system, to_form='molsysmt.MolSys')

In [5]:
msm.info(molecular_system)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_peptides,n_structures
molsysmt.MolSys,62,7,1,1,1,1,1,5000


We can compute the closest 3 CA atoms to each CA atom of the molecular system:

In [6]:
CA_atoms_list = msm.select(molecular_system, selection='atom_name=="CA"')

In [7]:
neighbors, distances = msm.structure.get_neighbors(molecular_system, selection=CA_atoms_list, num_neighbors=3)

Two objects are returned. The first one is a numpy array with the indices of the 3 neighbor atoms of each atom in `selection`, per frame:

In [8]:
neighbors.shape

(5000, 5, 3)

And the corresponding distances:

In [None]:
distances.shape

This way, the 3 CA atoms closest to the first CA atom at the 2000-th frame are:

In [None]:
print("3 first neighbor CAs of atom {}-th at frame 2000-th".format(CA_atoms_list[0]))
print("------------------------------------------")

for ii in range(3):
    print("{}° neighbor is atom {}-th with distance: {}".format(ii+1, CA_atoms_list[neighbors[2000,0,ii]], distances[2000,0,ii]))
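
The indexing used in the previous cell suggests that, when only `selection` is given, the values stored in `neighbors` are positions within the selection rather than atom indices of the whole system. A minimal sketch of that translation follows. Both lists hold made-up values, they are not output of MolSysMT, and the names are hypothetical.

```python
# Hypothetical atom indices of five CA atoms (illustrative values only).
ca_atom_indices = [8, 18, 28, 38, 48]

# Hypothetical neighbors of a single frame: row ii holds the positions, within
# ca_atom_indices, of the 3 CA atoms closest to the ii-th CA atom.
neighbors_in_selection = [
    [1, 2, 3],
    [0, 2, 3],
    [1, 3, 0],
    [2, 4, 1],
    [3, 2, 1],
]

# Translate positions in the selection into atom indices of the whole system.
neighbors_as_atom_indices = [
    [ca_atom_indices[position] for position in row]
    for row in neighbors_in_selection
]

for ca_index, row in zip(ca_atom_indices, neighbors_as_atom_indices):
    print("CA atom {}-th has neighbors {}".format(ca_index, row))
```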

Let's see now the 4 closest atoms, of any kind, to each CA atom of the molecular system:

In [None]:
neighbors, distances = msm.structure.get_neighbors(molecular_system, selection=CA_atoms_list, selection_2='all', num_neighbors=4)

In [None]:
print("4 first neighbors of atom {}-th at frame 2000-th".format(CA_atoms_list[0]))
print("------------------------------------------")

for ii in range(4):
    print("{}° neighbor is atom {}-th with distance: {}".format(ii+1, neighbors[2000,0,ii], distances[2000,0,ii]))

Notice that, in this case, `msm.structure.get_neighbors()` assumes that it is working with two different sets of atoms, since `selection`$\neq$`selection_2`. That is the reason why this time the first neighbor of each atom is the atom itself.
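
The effect can be reproduced with a small sketch in plain Python. The helper `closest` and its flag `exclude_self` are hypothetical: they only mimic the behavior described above and are not arguments of MolSysMT.

```python
import math

# Hypothetical coordinates in nanometers.
coordinates = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0), (0.0, 0.25, 0.0), (0.4, 0.0, 0.0)]


def closest(coordinates, reference, candidates, num_neighbors, exclude_self):
    """Rank `candidates` by distance to `reference`, optionally skipping the reference itself."""
    pool = [jj for jj in candidates if not (exclude_self and jj == reference)]
    pool.sort(key=lambda jj: math.dist(coordinates[reference], coordinates[jj]))
    return pool[:num_neighbors]


all_atoms = range(len(coordinates))

# A single set of atoms compared with itself: the reference atom is skipped.
same_set = closest(coordinates, 0, all_atoms, 2, exclude_self=True)

# Two sets treated as different: the reference atom, at distance zero, comes first.
two_sets = closest(coordinates, 0, all_atoms, 2, exclude_self=False)

print(same_set, two_sets)
```

When the atom itself is not wanted in the second case, a simple workaround is to ask for one extra neighbor and drop the first column of the output.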

The method `msm.structure.get_neighbors()` was built on top of `msm.distance()`, thus the input arguments are almost the same. If you already had a look at the section about distances between atoms, you are probably wondering whether `msm.structure.get_neighbors()` can also work with groups of atoms. Let's illustrate this case with the following cells:

In [None]:
molecular_system = msm.convert('1TCD', 'molsysmt.MolSys')

In [None]:
atoms_in_residues_chain_0 = msm.get(molecular_system, target='group',
                                    selection="molecule_type=='protein' and chain_index==0",
                                    atom_index=True)
atoms_in_residues_chain_1 = msm.get(molecular_system, target='group',
                                    selection="molecule_type=='protein' and chain_index==1",
                                    atom_index=True)

In [None]:
print('Number of residues in chain 0:', len(atoms_in_residues_chain_0))
print('Number of residues in chain 1:', len(atoms_in_residues_chain_1))

In [None]:
neighbors, distances = msm.structure.get_neighbors(
    molecular_system,
    groups_of_atoms=atoms_in_residues_chain_0,
    group_behavior='geometric_center',
    num_neighbors=8)

In [None]:
print(neighbors.shape)

In [None]:
print("8 first group neighbors of the geometric center of residue 0-th")
print("------------------------------------------")

for ii in range(8):
    print("{}° neighbor is group {}-th with distance: {}".format(ii+1, neighbors[0,0,ii], distances[0,0,ii]))
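
What `group_behavior='geometric_center'` implies can be sketched in plain Python: each group of atoms is reduced to the unweighted mean of its atom positions, and the neighbors are then ranked by the distance between those centers. The coordinates and the helper `geometric_center` are assumptions made for illustration, not MolSysMT code.

```python
import math

# Hypothetical atom coordinates in angstroms.
coordinates = [
    (0.0, 0.0, 0.0), (2.0, 0.0, 0.0),   # atoms of group 0
    (4.0, 0.0, 0.0), (6.0, 0.0, 0.0),   # atoms of group 1
    (0.0, 8.0, 0.0), (2.0, 8.0, 0.0),   # atoms of group 2
]

# One list of atom indices per group, analogous to a list of atoms per residue.
groups_of_atoms = [[0, 1], [2, 3], [4, 5]]


def geometric_center(coordinates, atom_indices):
    """Unweighted mean position of the atoms in `atom_indices`."""
    n_atoms = len(atom_indices)
    return tuple(sum(coordinates[ii][axis] for ii in atom_indices) / n_atoms
                 for axis in range(3))


centers = [geometric_center(coordinates, group) for group in groups_of_atoms]

# Rank the other groups by the distance between centers, taking group 0 as reference.
ranking = sorted(range(1, len(centers)),
                 key=lambda jj: math.dist(centers[0], centers[jj]))

for jj in ranking:
    print("group {}-th is at {} angstroms".format(jj, math.dist(centers[0], centers[jj])))
```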

The list of neighbor groups can also be computed from two molecular systems or two lists of groups:

In [None]:
neighbors, distances = msm.structure.get_neighbors(
    molecular_system,
    groups_of_atoms=atoms_in_residues_chain_0,
    group_behavior='geometric_center',
    groups_of_atoms_2=atoms_in_residues_chain_1,
    group_behavior_2='geometric_center',
    num_neighbors=8)

In [None]:
print("8 first group neighbors from chain 1 of the geometric center of residue 0-th from chain 0")
print("------------------------------------------")

for ii in range(8):
    print("{}° neighbor is group {}-th with distance: {}".format(ii+1, neighbors[0,0,ii], distances[0,0,ii]))

The method `msm.structure.get_neighbors()` can also mix atoms and groups of atoms. Let's, as a last example, get the residues whose geometric centers are closest to a specific atom:

In [None]:
neighbors, distances = msm.structure.get_neighbors(
    molecular_system,
    selection=100,
    groups_of_atoms_2=atoms_in_residues_chain_1,
    group_behavior_2='geometric_center',
    num_neighbors=4)

In [None]:
print("4 closest geometric centers of residues of chain 1 from atom 100-th")
print("-------------------------------------------------------------------")

for ii in range(4):
    print("{}° closest neighbor is group {}-th with distance: {}".format(ii+1, neighbors[0,0,ii], distances[0,0,ii]))

### Closest neighbor atoms or groups below a distance threshold

In addition to the input argument `num_neighbors`, `msm.structure.get_neighbors()` can return the neighbors found below a given distance with the argument `threshold`. Let's get, for the following molecular system, the CA atoms closer than 8 Å to each CA atom:

In [None]:
molecular_system = msm.convert('1TCD', 'molsysmt.MolSys')

In [None]:
CA_atoms = msm.select(molecular_system, selection='atom_name=="CA"')

In [None]:
neighbors, distances = msm.structure.get_neighbors(molecular_system, selection=CA_atoms, threshold='8 angstroms')

In this example, each CA atom has a different number of neighbors. This time the output is not a rank-3 tensor but a matrix whose elements are not numbers but lists of neighbors:

In [None]:
print(neighbors.shape)

In [None]:
print(distances.shape)
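
The reason for this layout can be seen in a small sketch in plain Python, where nested lists stand in for the arrays. The coordinates are hypothetical, and sorting each list by distance is a choice of this sketch, not a statement about how MolSysMT orders its output.

```python
import math

# Four points on a line (hypothetical coordinates in angstroms), single frame.
coordinates = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
threshold = 1.5

neighbors = []
distances = []
for ii, reference in enumerate(coordinates):
    # (distance, index) pairs of every other point below the threshold, sorted by distance.
    hits = sorted(
        (math.dist(reference, other), jj)
        for jj, other in enumerate(coordinates)
        if jj != ii and math.dist(reference, other) < threshold
    )
    neighbors.append([jj for _, jj in hits])
    distances.append([dd for dd, _ in hits])

# The rows have different lengths, so they cannot be stacked in a regular numeric array.
print([len(row) for row in neighbors])
```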

The molecular system has a single frame and 497 CA atoms. Let's see now the number of CA neighbors of the first 10 CA atoms in our list:

In [None]:
for ii in range(10):
    print("The {}° CA has {} CA neighbors.".format(ii+1,len(neighbors[0,ii])))

Let's print out the neighbors of the 20-th CA in the list:

In [None]:
for ii,dd in zip(neighbors[0,20], distances[0,20]):
    print("The {}-th CA is {} away from the 20-th CA".format(ii,dd))

As with the input argument `num_neighbors` in the previous subsection, the neighbors closer than a given threshold can also be computed between groups of atoms, or between atoms and groups of atoms. Let's show an example where the neighbors of the residues of chain 0 in our molecular system are defined as those residues of chain 1 closer than 1.2 nm:

In [None]:
atoms_in_residues_chain_0 = msm.get(molecular_system, target='group',
                                    selection="molecule_type=='protein' and chain_index==0",
                                    atom_index=True)
atoms_in_residues_chain_1 = msm.get(molecular_system, target='group',
                                    selection="molecule_type=='protein' and chain_index==1",
                                    atom_index=True)

In [None]:
neighbors, distances = msm.structure.get_neighbors(
    molecular_system,
    groups_of_atoms=atoms_in_residues_chain_0,
    group_behavior='geometric_center',
    groups_of_atoms_2=atoms_in_residues_chain_1,
    group_behavior_2='geometric_center',
    threshold=1.2*puw.unit('nanometers'))

Let's print out the number of contacts in chain 1 per residue of chain 0, if any:

In [None]:
for ii in range(len(atoms_in_residues_chain_0)):
    n_contacts = len(neighbors[0,ii])
    if n_contacts > 0:
        print('The {}-th residue of chain 0 has {} residue contacts in chain 1'.format(ii,n_contacts))

This information is usually represented as a contact map. If this is what you are looking for, you will probably find the next section more appropriate to your needs.
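
As a bridge between both representations, the following sketch in plain Python turns per-residue neighbor lists into a boolean contact matrix. The input values are hypothetical and only mimic the layout of `neighbors[0]` in the previous cells. The sketch is not the contact map method of MolSysMT.

```python
# Hypothetical neighbor lists of a single frame: entry ii lists the residues of
# chain 1 (by position) in contact with the ii-th residue of chain 0.
neighbors_frame = [[2], [], [0, 1]]
n_residues_chain_1 = 3

# Start with no contacts and switch on one element per neighbor found.
contact_map = [[False] * n_residues_chain_1 for _ in neighbors_frame]
for ii, row in enumerate(neighbors_frame):
    for jj in row:
        contact_map[ii][jj] = True

# Text rendering: one line per residue of chain 0, 'X' marks a contact.
for row in contact_map:
    print(''.join('X' if contact else '.' for contact in row))
```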