In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import molsysmt as msm



# Covalent chains and blocks

##  How to get covalent chains
Let's first load a molecular system to work with in this section:

In [3]:
molecular_system = msm.demo_systems.files['1tcd.mmtf']
molecular_system = msm.convert(molecular_system)

In [4]:
msm.info(molecular_system)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_waters,n_proteins,n_frames
molsysmt.MolSys,3983,662,167,4,166,2,165,1,1


MolSysMT includes a method to get all covalent chains found in the molecular system that match a given sequence of atom names. To illustrate how the method `molsysmt.covalent_chains` works, let's extract all segments of atoms C, N, CA and C covalently bonded in this order (C-N-CA-C):

In [5]:
covalent_chains = msm.covalent_chains(molecular_system, chain=['atom_name=="C"', 'atom_name=="N"',
                                                               'atom_name=="CA"', 'atom_name=="C"'],
                                      selection="component_index==0")

In [6]:
covalent_chains.shape

(247, 4)

The output is a two-dimensional numpy array: the length of the first axis is the number of chains found in the system, and the length of the second axis is 4, since the chain was defined with 4 atoms:

In [7]:
covalent_chains

array([[   2,    9,   10,   11],
       [  11,   16,   17,   18],
       [  18,   25,   26,   27],
       ...,
       [1877, 1884, 1885, 1886],
       [1886, 1889, 1890, 1891],
       [1891, 1896, 1897, 1898]])

Let's check that the atom names in one of the obtained chains are correct:

In [8]:
msm.get(molecular_system, selection=covalent_chains[0], name=True)

array(['C', 'N', 'CA', 'C'], dtype=object)

The atom name specified at each position does not need to be unique: variants can be introduced at any position of the covalent chain. Let's see, for instance, how to get all 4-atom covalent chains where the first three atoms are C-N-CA, in this order, and the fourth atom can be either C or CB:

In [9]:
covalent_chains = msm.covalent_chains(molecular_system, chain=['atom_name=="C"', 'atom_name=="N"',
                                                               'atom_name=="CA"', 'atom_name==["C", "CB"]'],
                                      selection="component_index==0")
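To make the idea of a chain search with variants concrete, here is a minimal pure-Python sketch over a toy bonded graph. The atom names, the bonds and the helper `find_covalent_chains` are hypothetical and only illustrate the concept; this is not how MolSysMT implements `molsysmt.covalent_chains`.

```python
# Toy system: atom names and covalent bonds (hypothetical data, not 1tcd).
atom_names = ['N', 'CA', 'C', 'O', 'CB', 'N', 'CA', 'C', 'O', 'CB']
bonds = [(0, 1), (1, 2), (2, 3), (1, 4),   # first residue
         (2, 5),                            # peptide bond C-N
         (5, 6), (6, 7), (7, 8), (6, 9)]    # second residue


def find_covalent_chains(atom_names, bonds, pattern):
    """Return every path of bonded atoms whose names match `pattern`.

    `pattern` is a list with one set of allowed atom names per position.
    """
    neighbors = {index: set() for index in range(len(atom_names))}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    chains = []

    def extend(path):
        # A path as long as the pattern is a complete chain.
        if len(path) == len(pattern):
            chains.append(list(path))
            return
        allowed = pattern[len(path)]
        # Try every bonded neighbor with an allowed name that is not in the path yet.
        for candidate in sorted(neighbors[path[-1]]):
            if candidate not in path and atom_names[candidate] in allowed:
                extend(path + [candidate])

    for start, name in enumerate(atom_names):
        if name in pattern[0]:
            extend([start])
    return chains


only_c = find_covalent_chains(atom_names, bonds, [{'C'}, {'N'}, {'CA'}, {'C'}])
c_or_cb = find_covalent_chains(atom_names, bonds, [{'C'}, {'N'}, {'CA'}, {'C', 'CB'}])
print(only_c)    # [[2, 5, 6, 7]]
print(c_or_cb)   # [[2, 5, 6, 7], [2, 5, 6, 9]]
```

Allowing CB at the last position adds one chain per C-N-CA segment whose CA carries a CB atom, which is why the second search returns more rows than the first.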

The covalent chains defining the $\phi$, $\psi$, $\omega$ and $\chi_1$ dihedral angles are obtained as follows:

In [10]:
# Covalent chains defining all phi dihedral angles in the molecular system
phi_chains = msm.covalent_chains(molecular_system, chain=['atom_name=="C"', 'atom_name=="N"',
                                                          'atom_name=="CA"', 'atom_name=="C"'])

In [11]:
# Covalent chains defining all psi dihedral angles in the molecular system
psi_chains = msm.covalent_chains(molecular_system, chain=['atom_name=="N"', 'atom_name=="CA"',
                                                          'atom_name=="C"', 'atom_name=="N"'])

In [12]:
# Covalent chains defining all omega dihedral angles in the molecular system
omega_chains = msm.covalent_chains(molecular_system, chain=['atom_name==["CA","CH3"]', 'atom_name=="C"',
                                                            'atom_name=="N"', 'atom_name==["CA", "CH3"]'])

In [13]:
# Covalent chains defining the chi1 dihedral angles whose fourth atom is named CG
# (residues such as CYS, ILE, SER, THR and VAL use a different fourth atom)
chi1_chains = msm.covalent_chains(molecular_system, chain=['atom_name=="N"', 'atom_name=="CA"',
                                                           'atom_name=="CB"', 'atom_name=="CG"'])

## How to get the atom quartets defining the dihedral angles

MolSysMT includes a method to obtain the atom quartets defining all dihedral angles of a given type present in the system. There is no need, then, to remember the atom names defining the angles $\phi$, $\psi$, $\omega$, or any of the $\chi$ angles. Let's see how this method works with one of the demo molecular systems:

In [14]:
molecular_system = msm.demo_systems.files['1tcd.mmtf']
molecular_system = msm.convert(molecular_system)

The quartets defining the angles $\phi$, $\psi$ or $\omega$ over the whole system can be obtained as follows:

In [15]:
phi_chains = msm.covalent_dihedral_quartets(molecular_system, dihedral_angle='phi')

In [16]:
print(phi_chains)

[[   2    9   10   11]
 [  11   16   17   18]
 [  18   25   26   27]
 ...
 [3789 3796 3797 3798]
 [3798 3801 3802 3803]
 [3803 3808 3809 3810]]


The search for these quartets can be limited to a specific selection. Let's see how to get the quartets of the $\psi$ angles in the groups with indices 10 to 15:

In [17]:
psi_chains = msm.covalent_dihedral_quartets(molecular_system, dihedral_angle='psi',
                                            selection='10<=group_index<=15')

In [18]:
print(psi_chains)

[[ 77  78  79  86]
 [ 86  87  88  92]
 [ 92  93  94 100]
 [100 101 102 104]
 [104 105 106 110]]
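One way to picture what a selection does to an array of quartets is to filter its rows with numpy. The sketch below assumes that a quartet is kept only when all four of its atoms belong to the selection; that criterion is an assumption for illustration, and MolSysMT may apply the selection differently. The quartets are the first three $\phi$ rows printed earlier, and the range of selected atom indices is made up.

```python
import numpy as np

# First three phi quartets shown above.
quartets = np.array([[2, 9, 10, 11],
                     [11, 16, 17, 18],
                     [18, 25, 26, 27]])

# Hypothetical selection: atoms with indices 9 to 19.
selected_atoms = np.arange(9, 20)

# Keep the rows whose four atoms are all in the selection.
mask = np.isin(quartets, selected_atoms).all(axis=1)
print(quartets[mask])  # [[11 16 17 18]]
```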


Atom chains defining $\chi$ angles can also be extracted. Let's get, for instance, all $\chi_{5}$ quartets in the system:

In [19]:
chi5_chains = msm.covalent_dihedral_quartets(molecular_system, dihedral_angle='chi5')

ARG is the only amino acid with a $\chi_{5}$ dihedral angle, so the number of $\chi_{5}$ quartets should match the number of ARG residues in our system:

In [20]:
print(chi5_chains.shape[0])

26


In [21]:
n_args = msm.get(molecular_system, target='group', selection='group_name=="ARG"', n_groups=True)
print(n_args)

26


In [22]:
phi_psi_chains = msm.covalent_dihedral_quartets(molecular_system, dihedral_angle='phi-psi')

In [23]:
print(phi_psi_chains.shape)

(990, 4)


In [24]:
msm.get(molecular_system, selection=phi_psi_chains[0], name=True)

array(['C', 'N', 'CA', 'C'], dtype=object)

If all dihedral angles need to be considered, the value 'all' for the input argument `dihedral_angle` returns the atom quartets of every $\phi$, $\psi$, $\omega$, $\chi_{1}$, $\chi_{2}$, $\chi_{3}$, $\chi_{4}$ and $\chi_{5}$ angle:

In [25]:
all_angles_chains = msm.covalent_dihedral_quartets(molecular_system, dihedral_angle='all')

In [26]:
print(all_angles_chains.shape)

(2362, 4)


The following tables summarize the dihedral angle definitions for future reference. The string taken by the input argument `dihedral_angle` is written in parentheses next to the Greek letter naming each angle:

#### $\phi$ (`phi`)

| Residue | Atoms | Zero value | Range (degrees)|
| :---: | :---: | :---: | :---: |
| all but PRO | C-N-CA-C | C cis to C | [-180, 180) |
| PRO | C-N-CA-C | C cis to C | ~-90 |

#### $\psi$ (`psi`)

| Residue | Atoms | Zero value | Range (degrees)|
| :---: | :---: | :---: | :---: |
| all | N-CA-C-N | N cis to N | [-180, 180) |

#### $\omega$ (`omega`)

| Residue | Atoms | Zero value | Range (degrees)|
| :---: | :---: | :---: | :---: |
| all | CA-C-N-CA | CA cis to CA | ~180 |
| all | CH3-C-N-CA | CA cis to CA | ~180 |
| all | CA-C-N-CH3 | CA cis to CA | ~180 |

#### $\chi_{1}$ (`chi1`)

| Residue | Atoms | Zero value | Range (degrees)|
| :---: | :---: | :---: | :---: |
| ARG | N-CA-CB-CG | CG cis to N | [-180, 180) |
| ASN | N-CA-CB-CG | CG cis to N | [-180, 180) |
| ASP | N-CA-CB-CG | CG cis to N | [-180, 180) |
| CYS | N-CA-CB-SG | SG cis to N | [-180, 180) |
| GLN | N-CA-CB-CG | CG cis to N | [-180, 180) |
| GLU | N-CA-CB-CG | CG cis to N | [-180, 180) |
| HIS | N-CA-CB-CG | CG cis to N | [-180, 180) |
| ILE | N-CA-CB-CG1 | CG1 cis to N | [-180, 180) |
| LEU | N-CA-CB-CG | CG cis to N | [-180, 180) |
| LYS | N-CA-CB-CG | CG cis to N | [-180, 180) |
| MET | N-CA-CB-CG | CG cis to N | [-180, 180) |
| PHE | N-CA-CB-CG | CG cis to N | [-180, 180) |
| PRO | N-CA-CB-CG | CG cis to N | CA-CB is part of ring |
| SER | N-CA-CB-OG | OG cis to N | [-180, 180) |
| THR | N-CA-CB-OG1 | OG1 cis to N | [-180, 180) |
| TRP | N-CA-CB-CG | CG cis to N | [-180, 180) |
| TYR | N-CA-CB-CG | CG cis to N | [-180, 180) |
| VAL | N-CA-CB-CG1 | CG1 cis to N | [-180, 180) |


#### $\chi_{2}$ (`chi2`)

| Residue | Atoms | Zero value | Range (degrees)|
| :---: | :---: | :---: | :---: |
| ARG | CA-CB-CG-CD  | CD cis to CA     | [-180, 180) |
| ASN | CA-CB-CG-OD1 | OD1 cis to CA    | [-180, 180) |
| ASP | CA-CB-CG-OD1 | OD1 cis to CA    | [-180, 180) |
| GLN | CA-CB-CG-CD  | CD cis to CA     | [-180, 180) |
| GLU | CA-CB-CG-CD  | CD cis to CA     | [-180, 180) |
| HIS | CA-CB-CG-ND1 | ND1 cis to CA    | [-180, 180) |
| ILE | CA-CB-CG1-CD | CD cis to CA     | [-180, 180) |
| LEU | CA-CB-CG-CD1 | CD1 cis to CA    | [-180, 180) |
| LYS | CA-CB-CG-CD  | CD cis to CA     | [-180, 180) |
| MET | CA-CB-CG-SD  | SD cis to CA     | [-180, 180) |
| PHE | CA-CB-CG-CD1 | CD1 cis to CA    | [-180, 180) |
| PRO | CA-CB-CG-CD  | CD cis to CA     | CB-CG is part of ring |
| TRP | CA-CB-CG-CD1 | CD1 cis to CA    | [-180, 180) |
| TYR | CA-CB-CG-CD1 | CD1 cis to CA    | [-180, 180) |

#### $\chi_{3}$ (`chi3`)

| Residue | Atoms | Zero value | Range (degrees)|
| :---: | :---: | :---: | :---: |
| ARG | CB-CG-CD-NE  | NE cis to CB     | [-180, 180) |
| GLN | CB-CG-CD-OE1 | OE1 cis to CB    | [-180, 180) |
| GLU | CB-CG-CD-OE1 | OE1 cis to CB    | [-180, 180) |
| LYS | CB-CG-CD-CE  | CE cis to CB     | [-180, 180) |
| MET | CB-CG-SD-CE  | CE cis to CB     | [-180, 180) |

#### $\chi_{4}$ (`chi4`)

| Residue | Atoms | Zero value | Range (degrees)|
| :---: | :---: | :---: | :---: |
| ARG | CG-CD-NE-CZ | CZ cis to CG      | [-180, 180) |
| LYS | CG-CD-CE-NZ | NZ cis to CG      | [-180, 180) |

#### $\chi_{5}$ (`chi5`)

| Residue | Atoms | Zero value | Range (degrees)|
| :---: | :---: | :---: | :---: |
| ARG | CD-NE-CZ-NH1 | NH1 cis to CD    | [-180, 180) |
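The "Zero value" and "Range" columns can be checked numerically. The following is a minimal sketch with plain numpy and made-up coordinates; the helper `dihedral_angle` is hypothetical and is not part of MolSysMT. It measures the angle between the projections of the first and third vectors of a quartet onto the plane orthogonal to the central one.

```python
import numpy as np


def dihedral_angle(p0, p1, p2, p3):
    """Dihedral angle, in degrees, defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the first and third vectors onto the plane orthogonal to b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    # Signed angle between both projections around the axis b1.
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))


p0 = np.array([0.0, 1.0, 0.0])
p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([1.0, 0.0, 0.0])

cis = dihedral_angle(p0, p1, p2, np.array([1.0, 1.0, 0.0]))
trans = dihedral_angle(p0, p1, p2, np.array([1.0, -1.0, 0.0]))
quarter = dihedral_angle(p0, p1, p2, np.array([1.0, 0.0, 1.0]))
print(cis, abs(trans), quarter)
```

With the fourth point cis to the first, the angle is 0; with the fourth point trans, its absolute value is 180; the third case gives 90 with the sign convention of this sketch. Note that `np.arctan2` returns values in the closed interval [-180, 180] degrees, so a trans arrangement may come out as 180 instead of -180.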

In a peptide, every dihedral angle is defined by three vectors delimited by four consecutive covalently bonded atoms. The middle vector is the rotation axis: the angle is measured between the projections of the first and third vectors onto the plane orthogonal to it. A change in the dihedral angle therefore changes the relative position of two blocks of atoms: those covalently bonded on either side of the second vector. In other words, removing the bond along the second vector splits the covalently bonded atoms into two sets, and each set moves in unison when the dihedral angle changes. The method `molsysmt.covalent_dihedral_quartets` takes the input argument `with_blocks` to return these atom sets together with the quartets. Let's see how it works with an example:

In [27]:
molecular_system = msm.demo_systems.metenkephalin()

In [28]:
phi_chains, phi_blocks = msm.covalent_dihedral_quartets(molecular_system, dihedral_angle='phi',
                                                        with_blocks=True)

Let's have a look, for instance, at the quartet defining the third $\phi$ angle:

In [29]:
view = msm.view(molecular_system, viewer='NGLView')
selection_quartet = msm.select(molecular_system, selection=phi_chains[2], to_syntaxis='NGLView')
view.clear()
view.add_licorice(color='white')
view.add_ball_and_stick(selection_quartet, color='orange')
view

NGLWidget()

In [30]:
phi_blocks[2]

array([{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36},
       {37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71}],
      dtype=object)

Let's show in red and blue the two blocks of atoms defined by this third $\phi$ dihedral angle:

In [32]:
view = msm.view(molecular_system, viewer='NGLView')
view.clear()
selection_quartet = msm.select(molecular_system, selection=phi_chains[2], to_syntaxis='NGLView')
selection_block_0 = msm.select(molecular_system, selection=list(phi_blocks[2][0]), to_syntaxis='NGLView')
selection_block_1 = msm.select(molecular_system, selection=list(phi_blocks[2][1]), to_syntaxis='NGLView')
view.add_licorice(color='white')
view.add_ball_and_stick(selection_quartet, color='orange')
view.add_ball_and_stick(selection_block_0, color='red')
view.add_ball_and_stick(selection_block_1, color='blue')
view

NGLWidget()
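The two blocks of a quartet can be pictured as the atoms reachable from each end of the central bond once that bond is removed. Below is a minimal sketch on a toy bonded graph; the bonds, the quartet and the helper `blocks_of_quartet` are hypothetical, and the sketch assumes that the central bond is not part of a ring (otherwise both searches would reach the same atoms). It is not MolSysMT's implementation of `with_blocks`.

```python
from collections import deque

# Toy bonded graph (hypothetical): a backbone 0-1-2-3-4-5 with two side atoms.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 6), (4, 7)]
quartet = [1, 2, 3, 4]


def blocks_of_quartet(bonds, quartet):
    """Split the atoms into the two sets separated by the quartet's central bond."""
    central = {quartet[1], quartet[2]}
    neighbors = {}
    for i, j in bonds:
        if {i, j} == central:
            continue  # the central bond is left out of the graph
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)

    def reachable(start):
        # Breadth-first search over the remaining bonds.
        seen = {start}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for other in neighbors.get(atom, ()):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        return seen

    return reachable(quartet[1]), reachable(quartet[2])


block_0, block_1 = blocks_of_quartet(bonds, quartet)
print(sorted(block_0))  # [0, 1, 2, 6]
print(sorted(block_1))  # [3, 4, 5, 7]
```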

##  How to get covalent blocks

In addition to the two methods above, MolSysMT provides a third one, `molsysmt.covalent_blocks`, to obtain the sets of covalently bonded atoms. To illustrate the results given by this method, let's first load a molecular system to work with:

In [33]:
molecular_system = msm.demo_systems.metenkephalin()

With the molecular system as the only input argument, the output is the list of sets of covalently bonded atoms:

In [34]:
blocks = msm.covalent_blocks(molecular_system)

In [35]:
print(blocks)

[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71}]


Used this way, the method offers no new information: the result is just the definition of the components in the system. With the input argument `remove_bonds`, however, the method becomes a more interesting tool. Let's remove a couple of bonds to see the effect:

In [36]:
msm.get(molecular_system, target='atom', selection='atom_name==["C", "N"]', inner_bonded_atoms=True)

array([[19, 21],
       [26, 28],
       [33, 35],
       [53, 55]])

In [37]:
blocks = msm.covalent_blocks(molecular_system, remove_bonds=[[19,21],[33,35]])

In [38]:
print(blocks)

[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
 {32, 33, 34, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}
 {35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71}]


The output can also be returned as a numpy array:

In [39]:
blocks = msm.covalent_blocks(molecular_system, remove_bonds=[[19,21],[33,35]], output_form='array')

In this case an array is returned with the index of the block each atom belongs to (0-based):

In [40]:
print(blocks)

[0 0 0 ... 2 2 2]
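The relation between the two output forms can be sketched with a small union-find over a toy chain of six atoms. The helper `toy_covalent_blocks` and its data are hypothetical; the sketch only illustrates how removing bonds splits a bonded graph into blocks and how a per-atom label array maps back to sets, and it does not reproduce MolSysMT's implementation.

```python
import numpy as np


def toy_covalent_blocks(n_atoms, bonds, remove_bonds=()):
    """Return, for every atom, the 0-based index of its covalent block."""
    removed = {frozenset(bond) for bond in remove_bonds}
    parent = list(range(n_atoms))

    def find(atom):
        # Follow parents up to the root, shortening the path on the way.
        while parent[atom] != atom:
            parent[atom] = parent[parent[atom]]
            atom = parent[atom]
        return atom

    for i, j in bonds:
        if frozenset((i, j)) not in removed:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                # The smallest atom index stays as the root of the merged block.
                parent[max(root_i, root_j)] = min(root_i, root_j)

    roots = np.array([find(atom) for atom in range(n_atoms)])
    # Relabel the roots as consecutive block indices 0, 1, 2, ...
    _, labels = np.unique(roots, return_inverse=True)
    return labels


bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
labels = toy_covalent_blocks(6, bonds, remove_bonds=[[1, 2], [3, 4]])
print(labels)  # [0 0 1 1 2 2]

# From the label array back to a list of sets of atom indices.
blocks = [set(np.flatnonzero(labels == k).tolist()) for k in range(labels.max() + 1)]
print(blocks)  # [{0, 1}, {2, 3}, {4, 5}]
```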
