In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import molsysmt as msm



# Convert

The meaning of a molecular system 'form', in the context of MolSysMT, was described previously in the section XXX. MolSysMT provides a method to convert one form into another: `molsysmt.convert()`. This method is the keystone of the library, the hinge on which all other MolSysMT methods and tools turn. It is also the joining piece that connects the pipes of your workflow when you use different Python libraries.

The method `molsysmt.convert()` requires at least two input arguments: the original item, in any form accepted by MolSysMT (see XXX), and the name of the output form:

In [3]:
molecular_system = msm.convert('pdb_id:1TCD', 'molsysmt.MolSys')



The ID code `1TCD` from the Protein Data Bank is converted into a native `molsysmt.MolSys` Python object. At this point you may think that this operation can also be done with the method `molsysmt.load()`, and you are right: `molsysmt.load()` is nothing but an alias of `molsysmt.convert()`. Although redundant, a loading method was included in MolSysMT for the sake of intuitive usability. It could be removed from the library, since `molsysmt.convert()` provides the same functionality.
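To make the idea of a single conversion entry point with a `load` alias more concrete, here is a minimal toy sketch. It is not MolSysMT's implementation: the registry `_CONVERTERS`, the decorator `register`, the simplified `get_form`, and the form name `'list:aminoacids1'` are all hypothetical names introduced only for illustration.

```python
# Toy illustration only: this is NOT how MolSysMT is implemented.
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry mapping (from_form, to_form) to a converter function.
_CONVERTERS: Dict[Tuple[str, str], Callable[[Any], Any]] = {}


def register(from_form: str, to_form: str):
    """Decorator that stores a converter under its (from_form, to_form) pair."""
    def decorator(func):
        _CONVERTERS[(from_form, to_form)] = func
        return func
    return decorator


def get_form(item: Any) -> str:
    """Guess the form of an item (toy rules: only two forms are known)."""
    if isinstance(item, str):
        return 'string:aminoacids1'
    if isinstance(item, list):
        return 'list:aminoacids1'  # hypothetical form name
    raise ValueError(f'Unknown form for item of type {type(item).__name__}')


@register('string:aminoacids1', 'list:aminoacids1')
def _string_to_list(item: str) -> list:
    return list(item)


@register('list:aminoacids1', 'string:aminoacids1')
def _list_to_string(item: list) -> str:
    return ''.join(item)


def convert(item: Any, to_form: str) -> Any:
    """Look up the converter for (form of item, to_form) and apply it."""
    from_form = get_form(item)
    if from_form == to_form:
        return item
    try:
        converter = _CONVERTERS[(from_form, to_form)]
    except KeyError:
        raise NotImplementedError(
            f'{from_form} -> {to_form} is not registered') from None
    return converter(item)


# An alias: both names are bound to the very same function object.
load = convert

print(convert('KPQ', 'list:aminoacids1'))  # ['K', 'P', 'Q']
print(load is convert)                     # True
```

With a design like this, every tool only needs to know the name of the form it wants, and a `load` alias costs nothing because it is just a second name for the same function.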

The following cells illustrate some conversions you can do with `molsysmt.convert()`:

In [4]:
msm.convert('pdb_id:1SUX', '1sux.pdb') # fetching a pdb file to save it locally

'1sux.pdb'

In [5]:
msm.convert('pdb_id:1SUX', '1sux.mmtf') # fetching an mmtf to save it locally

'1sux.mmtf'

In [6]:
pdb_file = msm.demo['TcTIM']['1tcd.pdb']
molecular_system = msm.convert(pdb_file, 'mdtraj.Trajectory') # loading a pdb file as an mdtraj.Trajectory object

In [7]:
seq_aa3 = msm.convert(molecular_system, selection='molecule_type=="protein"', to_form='string:aminoacids3') # converting an mdtraj.Trajectory into a sequence form

In [8]:
seq_aa3

'LysProGlnProIleAlaAlaAlaAsnTrpLysCysAsnGlySerGluSerLeuLeuValProLeuIleGluThrLeuAsnAlaAlaThrPheAspHisAspValGlnCysValValAlaProThrPheLeuHisIleProMetThrLysAlaArgLeuThrAsnProLysPheGlnIleAlaAlaGlnAsnAlaIleThrArgSerGlyAlaPheThrGlyGluValSerLeuGlnIleLeuLysAspTyrGlyIleSerTrpValValLeuGlyHisSerGluArgArgLeuTyrTyrGlyGluThrAsnGluIleValAlaGluLysValAlaGlnAlaCysAlaAlaGlyPheHisValIleValCysValGlyGluThrAsnGluGluArgGluAlaGlyArgThrAlaAlaValValLeuThrGlnLeuAlaAlaValAlaGlnLysLeuSerLysGluAlaTrpSerArgValValIleAlaTyrGluProValTrpAlaIleGlyThrGlyLysValAlaThrProGlnGlnAlaGlnGluValHisGluLeuLeuArgArgTrpValArgSerLysLeuGlyThrAspIleAlaAlaGlnLeuArgIleLeuTyrGlyGlySerValThrAlaLysAsnAlaArgThrLeuTyrGlnMetArgAspIleAsnGlyPheLeuValGlyGlyAlaSerLeuLysProGluPheValGluIleIleGluAlaThrLysSerLysProGlnProIleAlaAlaAlaAsnTrpLysCysAsnGlySerGluSerLeuLeuValProLeuIleGluThrLeuAsnAlaAlaThrPheAspHisAspValGlnCysValValAlaProThrPheLeuHisIleProMetThrLysAlaArgLeuThrAsnProLysPheGlnIleAlaAlaGlnAsnAlaIleThrArgSerGlyAlaPheThrGlyGluValSerLeuGlnIleLeuLysAspTyr
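The `string:aminoacids3` form is a plain string of concatenated three-letter residue codes. As a rough sketch of what such a string encodes, the following standalone helper translates it into one-letter codes. The function name `aminoacids3_to_aminoacids1` is hypothetical and only covers the 20 canonical amino acids; within MolSysMT the same result would be requested with `to_form='string:aminoacids1'`.

```python
# Standard three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    'Ala': 'A', 'Arg': 'R', 'Asn': 'N', 'Asp': 'D', 'Cys': 'C',
    'Gln': 'Q', 'Glu': 'E', 'Gly': 'G', 'His': 'H', 'Ile': 'I',
    'Leu': 'L', 'Lys': 'K', 'Met': 'M', 'Phe': 'F', 'Pro': 'P',
    'Ser': 'S', 'Thr': 'T', 'Trp': 'W', 'Tyr': 'Y', 'Val': 'V',
}


def aminoacids3_to_aminoacids1(seq_aa3: str) -> str:
    """Translate a concatenated three-letter sequence into one-letter codes.

    Hypothetical helper for illustration; it is not part of MolSysMT.
    """
    if len(seq_aa3) % 3 != 0:
        raise ValueError('The sequence length must be a multiple of 3')
    # Cut the string into consecutive chunks of three characters.
    codes = [seq_aa3[i:i + 3] for i in range(0, len(seq_aa3), 3)]
    return ''.join(THREE_TO_ONE[code] for code in codes)


# First ten residues of the sequence shown above.
print(aminoacids3_to_aminoacids1('LysProGlnProIleAlaAlaAlaAsnTrp'))  # KPQPIAAANW
```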

## How to convert just a selection

The conversion can be done over the entire system or over just a part of it. The input argument `selection` works with most MolSysMT methods, including `molsysmt.convert()`. To learn more about how to perform selections, see the section of this documentation entitled "XXX". For now, let's look at some simple selections to see how it works:

In [9]:
pdb_file = msm.demo['TcTIM']['1tcd.pdb']
whole_molecular_system = msm.convert(pdb_file, to_form='openmm.Topology')

In [10]:
msm.info(whole_molecular_system)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_waters,n_proteins,n_structures
openmm.Topology,3983,662,167,4,167,,165,2,0


In [11]:
aa = msm.convert(pdb_file, to_form='string:pdb_text')

In [12]:
msm.get_form(aa)

'string:pdb_text'

In [13]:
molecular_system = msm.convert(pdb_file, to_form='openmm.Topology',
                               selection='molecule_type=="protein"')

In [14]:
msm.info(molecular_system)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_waters,n_proteins,n_structures
openmm.Topology,3983,662,167,4,167,,165,2,0


## How to combine multiple forms into one

Sometimes a molecular system comes from the combination of more than one form. For example, we can have two files, one with the topology and one with the coordinates, to be converted into a single molecular form:

In [15]:
prmtop_file = msm.demo['pentalanine']['pentalanine.prmtop']
inpcrd_file = msm.demo['pentalanine']['pentalanine.inpcrd']
molecular_system = msm.convert([prmtop_file, inpcrd_file], to_form='molsysmt.MolSys')

In [16]:
msm.info(molecular_system)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_waters,n_peptides,n_structures
molsysmt.MolSys,5207,1722,1716,1,1716,2,1715,1,1


## How to convert a form into multiple ones at once

The previous section illustrated how to convert multiple forms into one. Let's now see how to produce more than one output form in a single line:

In [17]:
h5_file = msm.demo['pentalanine']['traj.h5']
topology, structures = msm.convert(h5_file, to_form=['molsysmt.Topology','molsysmt.Structures'])

In [18]:
msm.info(topology)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_peptides,n_structures
molsysmt.Topology,62,7,1,1,1,1,1,


In [19]:
msm.info(structures)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_structures
molsysmt.Structures,62,,,,,,5000


In [20]:
msm.info([topology, structures])

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_peptides,n_structures
"['molsysmt.Topology', 'molsysmt.Structures']",62,7,1,1,1,1,1,5000


Let's now combine both forms into one to check that they were properly converted:

In [21]:
pdb_string = msm.convert([topology, structures], to_form='string:pdb_text', structure_indices=1000)
print(pdb_string)

REMARK   1 CREATED WITH OPENMM 7.7 BY MOLSYSMT 0+untagged.786.gd5f0f85.dirty, 2022-04-20
CRYST1   20.000   20.000   20.000  90.00  90.00  90.00 P 1           1 
HETATM    1  H1  ACE 0   1      -0.543  17.716   0.339  1.00  0.00           H  
HETATM    2  CH3 ACE 0   1       0.128  18.016  -0.466  1.00  0.00           C  
HETATM    3  H2  ACE 0   1       0.702  18.811   0.010  1.00  0.00           H  
HETATM    4  H3  ACE 0   1      -0.534  18.283  -1.290  1.00  0.00           H  
HETATM    5  C   ACE 0   1       1.095  16.881  -0.794  1.00  0.00           C  
HETATM    6  O   ACE 0   1       1.119  16.351  -1.907  1.00  0.00           O  
ATOM      7  N   ALA 0   2       2.030  16.563   0.123  1.00  0.00           N  
ATOM      8  H   ALA 0   2       1.862  16.985   1.025  1.00  0.00           H  
ATOM      9  CA  ALA 0   2       3.294  16.016  -0.068  1.00  0.00           C  
ATOM     10  HA  ALA 0   2       3.448  15.867  -1.137  1.00  0.00           H  
ATOM     11  CB  ALA 0   2   

## Some examples with files

In [22]:
PDB_file = msm.demo['TcTIM']['1tcd.pdb']
system_pdbfixer = msm.convert(PDB_file, to_form='pdbfixer.PDBFixer')
system_parmed = msm.convert(PDB_file, to_form='parmed.Structure')



In [23]:
MOL2_file = msm.demo['caffeine']['caffeine.mol2']
system_openmm = msm.convert(MOL2_file, to_form='openmm.Modeller')
system_mdtraj = msm.convert(MOL2_file, to_form='mdtraj.Trajectory')

In [24]:
MMTF_file = msm.demo['TcTIM']['1tcd.mmtf']
system_aminoacids1_seq = msm.convert(MMTF_file, selection='molecule_type=="protein"', to_form='string:aminoacids1')
system_molsys = msm.convert(MMTF_file, to_form='molsysmt.MolSys')



In [25]:
print('Form of object system_pdbfixer: ', msm.get_form(system_pdbfixer))
print('Form of object system_parmed: ', msm.get_form(system_parmed))
print('Form of object system_openmm: ', msm.get_form(system_openmm))
print('Form of object system_mdtraj: ', msm.get_form(system_mdtraj))
print('Form of object system_aminoacids1_seq: ', msm.get_form(system_aminoacids1_seq))
print('Form of object system_molsys: ', msm.get_form(system_molsys))

Form of object system_pdbfixer:  pdbfixer.PDBFixer
Form of object system_parmed:  parmed.Structure
Form of object system_openmm:  openmm.Modeller
Form of object system_mdtraj:  mdtraj.Trajectory
Form of object system_aminoacids1_seq:  string:aminoacids1
Form of object system_molsys:  molsysmt.MolSys


## Some examples with IDs

In [26]:
molecular_system = msm.convert('pdb_id:1TCD', to_form='mdtraj.Trajectory')

## Conversions implemented in MolSysMT

In [27]:
msm.help.convert(from_form='mdtraj.Trajectory', to_form_type='string')

Unnamed: 0,string:aminoacids1,string:aminoacids3,string:pdb_id,string:pdb_text
mdtraj.Trajectory,True,True,False,False


In [28]:
msm.help.convert(from_form='mdtraj.Trajectory', to_form_type='file', as_rows='to')

Unnamed: 0,mdtraj.Trajectory
file:h5,False
file:inpcrd,False
file:mmtf,False
file:mol2,False
file:pdb,True
file:prmtop,False


In [29]:
from_list=['pytraj.Trajectory','mdanalysis.Universe']
to_list=['mdtraj.Trajectory', 'openmm.Topology']
msm.help.convert(from_form=from_list, to_form=to_list)

Unnamed: 0,mdtraj.Trajectory,openmm.Topology
pytraj.Trajectory,True,False
mdanalysis.Universe,True,False
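A support table like the ones above is essentially a boolean matrix indexed by input and output form. The sketch below shows one way such a matrix could be built and queried in plain Python before attempting a conversion. The set `SUPPORTED` and the functions `conversion_table` and `reachable` are hypothetical; the entries are typed in by hand from the `mdtraj.Trajectory`-to-string table of this section, not queried from MolSysMT.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical set of supported (from_form, to_form) pairs, copied by hand
# from the mdtraj.Trajectory-to-string table of this section.
SUPPORTED: Set[Tuple[str, str]] = {
    ('mdtraj.Trajectory', 'string:aminoacids1'),
    ('mdtraj.Trajectory', 'string:aminoacids3'),
}


def conversion_table(from_forms: Iterable[str],
                     to_forms: Iterable[str],
                     supported: Set[Tuple[str, str]] = SUPPORTED
                     ) -> Dict[str, Dict[str, bool]]:
    """Build a nested dict: table[from_form][to_form] -> True or False."""
    to_forms = list(to_forms)
    return {f: {t: (f, t) in supported for t in to_forms} for f in from_forms}


def reachable(table: Dict[str, Dict[str, bool]], from_form: str) -> List[str]:
    """List the output forms marked as supported for a given input form."""
    return [t for t, ok in table[from_form].items() if ok]


to_forms = ['string:aminoacids1', 'string:aminoacids3',
            'string:pdb_id', 'string:pdb_text']
table = conversion_table(['mdtraj.Trajectory'], to_forms)

print(table['mdtraj.Trajectory']['string:pdb_text'])  # False
print(reachable(table, 'mdtraj.Trajectory'))
# ['string:aminoacids1', 'string:aminoacids3']
```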
