# Quickstart guide

## Ready in two steps

Install MolSysMT in your conda env:

```bash
conda install -c uibcdf molsysmt
```

and open a Jupyter notebook or an IPython session to import the library:

In [2]:
import molsysmt as msm



## Molecular systems' forms

In MolSysMT's language, the same molecular system can take different forms. Not every form carries the same attributes, but all of them are representations of the same system. For example, a PDB id is a form of a molecular system: it points to a list of atom names, residues, chains, and so on, together with the spatial coordinates of one or more structures. The corresponding pdb file, mmtf file, Trajectory object of mdtraj or pytraj, NGLWidget of nglview, or MolSys object of MolSysMT are other forms of that same molecular system. Let's see this with an example.

Let's define a first molecular system with a PDB id.

In [3]:
molecular_system = '181L'

We can check the form of our molecular system.

In [4]:
form = msm.get_form(molecular_system)
print(f'The molecular system has the "{form}" form')

The molecular system has the "string:pdb_id" form
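
To make the idea of a form label more concrete, here is a toy sketch of how a string could be mapped to a label such as `string:pdb_id`. This is not MolSysMT's implementation: the function `guess_form` and its rules are hypothetical, and the label `file:pdb` is assumed by analogy with `file:mmtf`.

```python
import re

# Hypothetical, simplified rules; MolSysMT's real form detection is richer.
AMINOACIDS_1 = set("ACDEFGHIKLMNPQRSTVWY")

def guess_form(item):
    """Return a form label for a few kinds of input (toy version)."""
    if not isinstance(item, str):
        # Non-string objects are labelled by their type name here.
        return type(item).__name__
    if item.endswith(".mmtf"):
        return "file:mmtf"
    if item.endswith(".pdb"):
        return "file:pdb"
    # PDB ids: one digit followed by three alphanumeric characters.
    if re.fullmatch(r"[0-9][A-Za-z0-9]{3}", item):
        return "string:pdb_id"
    # Strings made only of the 20 standard 1-letter amino acid codes.
    if item and set(item) <= AMINOACIDS_1:
        return "string:aminoacids1"
    return "unknown"

for item in ["181L", "181L.mmtf", "MNIFEMLRIDEGLRLKIYKD"]:
    print(item, "->", guess_form(item))
```

In practice, always rely on `msm.get_form` rather than on rules like these.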


And we can also get a description of the molecular system with the help of `molsysmt.info`:

In [5]:
msm.info(molecular_system)

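| form | n_atoms | n_groups | n_components | n_chains | n_molecules | n_entities | n_waters | n_ions | n_small_molecules | n_proteins | n_structures |
|---|---|---|---|---|---|---|---|---|---|---|---|
| string:pdb_id | 1441 | 302 | 141 | 6 | 141 | 5 | 136 | 2 | 2 | 1 | 1 |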


Let's now convert our molecular system to different forms:

In [6]:
molecular_system = msm.convert(molecular_system, to_form='181L.mmtf')

old_form = form
form = msm.get_form(molecular_system)
print(f'The molecular system was converted from "{old_form}" to "{form}"')

The molecular system was converted from "string:pdb_id" to "file:mmtf"


In [7]:
molecular_system = msm.convert(molecular_system, to_form='openmm.Topology')

old_form = form
form = msm.get_form(molecular_system)
print(f'The molecular system was converted from "{old_form}" to "{form}"')

The molecular system was converted from "file:mmtf" to "openmm.Topology"


In [8]:
molecular_system = msm.convert(molecular_system, to_form='molsysmt.Topology')

old_form = form
form = msm.get_form(molecular_system)
print(f'The molecular system was converted from "{old_form}" to "{form}"')

The molecular system was converted from "openmm.Topology" to "molsysmt.Topology"


In [9]:
molecular_system = msm.convert(molecular_system, selection='molecule_index==0', to_form='string:aminoacids1')

old_form = form
form = msm.get_form(molecular_system)
print(f'The molecular system was converted from "{old_form}" to "{form}"')

The molecular system was converted from "molsysmt.Topology" to "string:aminoacids1"


Finally, we have a molecular system in the form of a 1-letter amino acid code string:

In [10]:
print(molecular_system)

MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAAAINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYK
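
The 1-letter string is a compact encoding of the residue names of the protein. As an illustration only, and independent of how MolSysMT performs the conversion, the mapping from 3-letter residue names to 1-letter codes can be sketched as follows. The helper name `to_one_letter` and the `unknown` placeholder are assumptions made for this example.

```python
# Standard 3-letter to 1-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def to_one_letter(group_names, unknown="X"):
    """Translate a list of 3-letter residue names into a 1-letter string."""
    return "".join(THREE_TO_ONE.get(name.upper(), unknown) for name in group_names)

# First four residues of the sequence printed above.
print(to_one_letter(["MET", "ASN", "ILE", "PHE"]))  # MNIF
```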


```{admonition} Note
:class: note
MolSysMT includes some native forms such as 'molsysmt.MolSys', 'molsysmt.Topology' or 'molsysmt.Structures'.
```

## Elements selection

In [11]:
molecular_system = msm.convert('181L')

In [12]:
msm.info(molecular_system, element='entity')

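| index | name | type | n atoms | n groups | n components | n chains | n molecules |
|---|---|---|---|---|---|---|---|
| 0 | T4 lysozyme | protein | 1289 | 162 | 1 | 1 | 1 |
| 1 | Chloride ion | ion | 2 | 2 | 2 | 2 | 2 |
| 2 | 2-hydroxyethyl disulfide | small molecule | 8 | 1 | 1 | 1 | 1 |
| 3 | Benzene | small molecule | 6 | 1 | 1 | 1 | 1 |
| 4 | water | water | 136 | 136 | 136 | 1 | 136 |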


In [13]:
ions = msm.select(molecular_system, selection='entity_type=="ion"')

In [14]:
print(ions)

[1289 1290]


In [15]:
msm.info(molecular_system, element='atom', selection='@ions')

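| index | id | name | type | group index | group id | group name | group type | component index | chain index | molecule index | molecule type | entity index | entity name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1289 | 1290 | CL | Cl | 162 | 173 | CL | ion | 1 | 1 | 1 | ion | 1 | Chloride ion |
| 1290 | 1291 | CL | Cl | 163 | 178 | CL | ion | 2 | 2 | 2 | ion | 1 | Chloride ion |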


In [16]:
CAs_in_contact = msm.select(molecular_system, selection='atom_name=="CA" within 5.0 angstroms of @ions')

In [17]:
print(CAs_in_contact)

[ 385 1115 1122 1129 1137]


In [18]:
residues_in_contact = msm.get(molecular_system, element='atom', selection=CAs_in_contact, group_index=True)

In [19]:
print(residues_in_contact)

[ 48 141 142 143 144]


In [20]:
residues_in_contact = msm.select(molecular_system, element='group', selection='atom_name=="CA" within 5.0 angstroms of @ions')

In [21]:
print(residues_in_contact)

[ 48 141 142 143 144]
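
A selection such as `within 5.0 angstroms of @ions` is, conceptually, a distance filter. The sketch below shows that idea with plain Python on made-up coordinates. It is not how MolSysMT evaluates the selection, and the function `within` is a hypothetical helper written for this example.

```python
import math

def within(coordinates, candidates, references, threshold):
    """Indices in `candidates` at most `threshold` away from any index in `references`."""
    selected = []
    for i in candidates:
        if any(math.dist(coordinates[i], coordinates[j]) <= threshold
               for j in references):
            selected.append(i)
    return selected

# Made-up coordinates in nanometers (5.0 angstroms = 0.5 nm).
coordinates = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.4, 0.0)]
print(within(coordinates, candidates=[1, 2, 3], references=[0], threshold=0.5))
```

With these toy coordinates, atoms 1 and 3 are selected and atom 2 is left out because it sits 1.0 nm from the reference.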


### You can use your favourite selection syntax

In [22]:
msm.select(molecular_system, selection='name =~ "C[1-4]"', syntax='MDTraj')

array([1291, 1293, 1299, 1300, 1301, 1302])

### You can convert a selection into your favourite selection syntax

In [23]:
msm.select(molecular_system, element='group', selection='molecule_type=="ion"', to_syntax='NGLView')

'173:B 178:C'
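
The string returned above pairs each group id with a chain name. Assuming the `id:chain` token layout seen in that output, the translation can be pictured with a small, hypothetical helper (`to_ngl_selection` is not part of MolSysMT):

```python
def to_ngl_selection(groups):
    """Join (group_id, chain_name) pairs as space-separated 'id:chain' tokens."""
    return " ".join(f"{group_id}:{chain_name}" for group_id, chain_name in groups)

# Group ids and chain names taken from the output above.
print(to_ngl_selection([(173, "B"), (178, "C")]))  # 173:B 178:C
```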

## Getting attributes from elements

In [24]:
msm.get(molecular_system, n_atoms=True)

1441

In [25]:
msm.get(molecular_system, n_structures=True)

1

In [26]:
msm.get(molecular_system, box_volume=True)

| Magnitude | Units |
|---|---|
| [311.5566139349998] | nanometer³ |
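
For reference, the volume of a periodic box follows from its three edge lengths and three angles. The sketch below uses the standard triclinic cell formula. The function `box_volume` is a hypothetical helper, and the cell parameters in the example are an assumption, chosen because they reproduce the volume reported above.

```python
import math

def box_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Volume of a triclinic cell from edge lengths and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)

# Assumed cell parameters in nanometers, with a 120 degree gamma angle.
print(box_volume(6.09, 6.09, 9.70, gamma=120.0))
```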


In [27]:
msm.get(molecular_system, element='atom', selection=[10, 11, 12], atom_name=True, group_name=True)

[array(['C', 'O', 'CB'], dtype=object),
 array(['ASN', 'ASN', 'ASN'], dtype=object)]

In [28]:
msm.get(molecular_system, element='chain', selection='molecule_type=="water"', id=True)

array(['F'], dtype=object)

## Tools

MolSysMT has different categories of tools to work with molecular systems. They can be found in the modules `molsysmt.basic`, `molsysmt.build`, `molsysmt.topology`, `molsysmt.structure`, `molsysmt.pbc`, and others. Let's illustrate with examples how some of these tools work.

```{admonition} Did you know...?
:class: tip, dropdown
*MolSysMT is form agnostic*. All tools work no matter the form of the input molecular system.
```

### Basic

"Basic" tools such as `select`, `get`, `convert`, `add`, or `remove` can be found in the module `molsysmt.basic`. Let's see some examples:

In [29]:
molecular_system = msm.basic.convert('181L', to_form='pdbfixer.PDBFixer')

In [30]:
msm.basic.contains(molecular_system, waters=True, ions=True, small_molecules=True)

True

In [31]:
molecular_system = msm.basic.remove(molecular_system, selection='molecule_type==["water", "ion", "small molecule"]')

In [32]:
msm.basic.get(molecular_system, n_waters=True, n_ions=True, n_small_molecules=True)

[0, 0, 0]

In [33]:
msm.basic.get_form(molecular_system)

'pdbfixer.PDBFixer'

In [34]:
msm.basic.view(molecular_system, viewer='NGLView')

NGLWidget()

### Build

"Build" tools such as `solvate`, `add_missing_hydrogens`, `build_peptide`, `get_atoms_with_alternate_locations`, or `make_bioassembly` can be found in the module `molsysmt.build`. Let's see some examples:

In [None]:
molecular_system = msm.build.build_peptide('AceAlaAlaAlaNme')

In [None]:
molecular_system = msm.structure.center(molecular_system)

In [None]:
msm.get(molecular_system, n_aminoacids=True, n_groups=True)

In [None]:
molecular_system = msm.build.solvate(molecular_system, box_shape='truncated octahedral',
                                     clearance='14.0 angstroms')

In [None]:
msm.build.is_solvated(molecular_system)

In [None]:
molecular_system = msm.pbc.wrap_to_mic(molecular_system)
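
Assuming "mic" stands for the minimum image convention, the idea behind this kind of wrapping is to replace each position by its periodic image closest to a reference point. The sketch below does this for a rectangular box only, which is simpler than the truncated octahedral box used above. The function `wrap_to_minimum_image` is a hypothetical helper and not MolSysMT's implementation.

```python
import math

def wrap_to_minimum_image(position, center, box_lengths):
    """Return the periodic image of `position` closest to `center` (rectangular box)."""
    wrapped = []
    for x, x0, length in zip(position, center, box_lengths):
        # Number of whole box lengths separating x from the image nearest to x0.
        shift = math.floor((x - x0) / length + 0.5)
        wrapped.append(x - shift * length)
    return tuple(wrapped)

# A point near the far edge of a 3 nm cubic box, wrapped around the origin.
print(wrap_to_minimum_image((2.9, -0.2, 1.4), (0.0, 0.0, 0.0), (3.0, 3.0, 3.0)))
```

After wrapping, every coordinate lies within half a box length of the chosen center.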

In [None]:
msm.view(molecular_system, standardize=True, water_as_surface=True)

### Structure

In [35]:
molecular_system = msm.basic.convert('181L', selection='molecule_type=="protein"')

In [36]:
msm.structure.get_distances(molecular_system, selection='atom_index==10', selection_2='atom_index==100')

| Magnitude | Units |
|---|---|
| [[[1.5211279959293365]]] | nanometer |


In [37]:
msm.info(molecular_system, element='atom', selection='group_index==[3,4]')

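| index | id | name | type | group index | group id | group name | group type | component index | chain index | molecule index | molecule type | entity index | entity name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | 25 | N | N | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |
| 25 | 26 | CA | C | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |
| 26 | 27 | C | C | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |
| 27 | 28 | O | O | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |
| 28 | 29 | CB | C | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |
| 29 | 30 | CG | C | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |
| 30 | 31 | CD1 | C | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |
| 31 | 32 | CD2 | C | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |
| 32 | 33 | CE1 | C | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |
| 33 | 34 | CE2 | C | 3 | 4 | PHE | aminoacid | 0 | 0 | 0 | protein | 0 | T4 lysozyme |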


In [38]:
msm.topology.get_dihedral_quartets(molecular_system, dihedral_angle='psi', selection='group_index==[3,4]')

['atom_name=="N"', 'atom_name=="CA"', 'atom_name=="C"', 'atom_name=="N"']


array([], dtype=int64)

In [41]:
msm.topology.get_covalent_chains(molecular_system, chain=['atom_name=="N"', 'atom_name=="CA"', 'atom_name=="C"', 'atom_name=="N"'])

IndexError: positional indexers are out-of-bounds

In [39]:
msm.topology.get_dihedral_quartets(molecular_system, dihedral_angle='psi')

['atom_name=="N"', 'atom_name=="CA"', 'atom_name=="C"', 'atom_name=="N"']


IndexError: positional indexers are out-of-bounds

In [None]:
msm.structure.get_dihedral_angles(molecular_system, dihedral_angle='psi', selection='group_index==[3,4]')
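
A psi angle is the dihedral defined by four consecutive backbone atoms (N, CA, C, and the N of the next residue). As a sketch of the underlying geometry, and not of MolSysMT's implementation, a dihedral angle can be computed from four points as shown below. The function `dihedral_degrees` and the test points are made up for this example.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Four made-up points forming a right-angle twist around the z axis.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```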

In [None]:
msm.structure.get_contacts(molecular_system, selection='atom_name=="CA"', threshold='9 angstroms')
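
Conceptually, a contact map marks every pair of selected atoms whose distance is at or below the threshold. The plain-Python sketch below illustrates that definition on made-up coordinates. The function `contact_map` is hypothetical, and the shape and type of what `get_contacts` actually returns may differ.

```python
import math

def contact_map(coordinates, threshold):
    """Symmetric boolean matrix: True where two points are within `threshold`."""
    n = len(coordinates)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            close = math.dist(coordinates[i], coordinates[j]) <= threshold
            contacts[i][j] = contacts[j][i] = close
    return contacts

# Made-up CA positions in nanometers (9 angstroms = 0.9 nm).
ca_positions = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (1.5, 0.0, 0.0)]
for row in contact_map(ca_positions, threshold=0.9):
    print(row)
```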
