In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import molsysmt as msm


# How to convert a form into another form

The meaning of a molecular system's 'form', in the context of MolSysMT, was described previously in the section XXX. MolSysMT provides a method to convert one form into another: `molsysmt.convert()`. This method is the keystone of the library, the hinge on which all other MolSysMT methods and tools turn. It is also the joint that connects the pipes of your workflow when you use different Python libraries.

The method `molsysmt.convert()` requires at least two input arguments: the pre-existing item, in any form accepted by MolSysMT (see XXX), and the name of the output form:

In [3]:
molecular_system = msm.convert('mmtf:1tcd', 'molsysmt.MolSys')

The entry with ID code `1tcd`, fetched from the Protein Data Bank in MMTF format, is converted into a native `molsysmt.MolSys` Python object. At this point you may think that this operation could also be done with the method `molsysmt.load()`, and you are right: `molsysmt.load()` is nothing but an alias of `molsysmt.convert()`. Although redundant, a loading method was included in MolSysMT for the sake of intuitive usability. It could be removed from the library, since `molsysmt.convert()` provides the same functionality.
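To make the "hinge" idea concrete, here is a minimal sketch of a conversion hub built on a registry of converter functions. It is not MolSysMT's implementation: the names `register`, `_CONVERTERS` and the toy `"list"` form are hypothetical, and the input form is passed explicitly here, whereas `molsysmt.convert()` is called above with only the item and the output form.

```python
# Hypothetical sketch: none of these names come from MolSysMT.
from typing import Any, Callable, Dict, Tuple

# Registry mapping (input form, output form) to a converter function.
_CONVERTERS: Dict[Tuple[str, str], Callable[[Any], Any]] = {}


def register(from_form: str, to_form: str):
    """Decorator that stores a converter under its (from, to) pair."""
    def decorator(func):
        _CONVERTERS[(from_form, to_form)] = func
        return func
    return decorator


@register("aminoacids1:seq", "list")
def _seq1_to_list(item: str) -> list:
    # Toy converter: split a one-letter sequence into a list of residues.
    return list(item)


def convert(item: Any, from_form: str, to_form: str) -> Any:
    """Look up the converter for the requested pair and apply it."""
    try:
        converter = _CONVERTERS[(from_form, to_form)]
    except KeyError:
        raise NotImplementedError(
            f"No conversion from {from_form} to {to_form}"
        ) from None
    return converter(item)


# An alias in the spirit of load(): the same function under a second name.
load = convert

residues = convert("KPQ", "aminoacids1:seq", "list")
```

With this design, supporting a new form only requires registering new converter functions; the single entry point stays unchanged.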

The following cells illustrate some conversions you can do with `molsysmt.convert()`:

In [4]:
msm.convert('pdb:1sux', '1sux.pdb') # fetching a pdb file to save it locally

In [5]:
msm.convert('mmtf:1sux', '1sux.mmtf') # fetching an mmtf to save it locally

In [6]:
molecular_system = msm.convert('1tcd.pdb', 'mdtraj.Trajectory') # loading a pdb file as an mdtraj.Trajectory object

In [7]:
seq_aa3 = msm.convert(molecular_system, 'aminoacids3:seq') # converting the system to a three-letter amino acid sequence

In [8]:
print(seq_aa3)

aminoacids3:LysProGlnProIleAlaAlaAlaAsnTrpLysCysAsnGlySerGluSerLeuLeuValProLeuIleGluThrLeuAsnAlaAlaThrPheAspHisAspValGlnCysValValAlaProThrPheLeuHisIleProMetThrLysAlaArgLeuThrAsnProLysPheGlnIleAlaAlaGlnAsnAlaIleThrArgSerGlyAlaPheThrGlyGluValSerLeuGlnIleLeuLysAspTyrGlyIleSerTrpValValLeuGlyHisSerGluArgArgLeuTyrTyrGlyGluThrAsnGluIleValAlaGluLysValAlaGlnAlaCysAlaAlaGlyPheHisValIleValCysValGlyGluThrAsnGluGluArgGluAlaGlyArgThrAlaAlaValValLeuThrGlnLeuAlaAlaValAlaGlnLysLeuSerLysGluAlaTrpSerArgValValIleAlaTyrGluProValTrpAlaIleGlyThrGlyLysValAlaThrProGlnGlnAlaGlnGluValHisGluLeuLeuArgArgTrpValArgSerLysLeuGlyThrAspIleAlaAlaGlnLeuArgIleLeuTyrGlyGlySerValThrAlaLysAsnAlaArgThrLeuTyrGlnMetArgAspIleAsnGlyPheLeuValGlyGlyAlaSerLeuLysProGluPheValGluIleIleGluAlaThrLysSerLysProGlnProIleAlaAlaAlaAsnTrpLysCysAsnGlySerGluSerLeuLeuValProLeuIleGluThrLeuAsnAlaAlaThrPheAspHisAspValGlnCysValValAlaProThrPheLeuHisIleProMetThrLysAlaArgLeuThrAsnProLysPheGlnIleAlaAlaGlnAsnAlaIleThrArgSerGlyAlaPheThrGlyGluValSerLeuGlnIleL

In [9]:
seq_aa1 = msm.convert(seq_aa3, 'aminoacids1:seq') # converting a sequence into another sequence form

In [10]:
print(seq_aa1)

aminoacids1:KPQPIAAANWKCNGSESLLVPLIETLNAATFDHDVQCVVAPTFLHIPMTKARLTNPKFQIAAQNAITRSGAFTGEVSLQILKDYGISWVVLGHSERRLYYGETNEIVAEKVAQACAAGFHVIVCVGETNEEREAGRTAAVVLTQLAAVAQKLSKEAWSRVVIAYEPVWAIGTGKVATPQQAQEVHELLRRWVRSKLGTDIAAQLRILYGGSVTAKNARTLYQMRDINGFLVGGASLKPEFVEIIEATKSKPQPIAAANWKCNGSESLLVPLIETLNAATFDHDVQCVVAPTFLHIPMTKARLTNPKFQIAAQNAITRSGAFTGEVSLQILKDYGISWVVLGHSERRLYYGETNEIVAEKVAQACAAGFHVIVCVGETNEEREAGRTAAVVLTQLAAVAQKLSKEAWSRVVIAYEPVWAIGTGKVATPQQAQEVHELLRRWVRSKLGTDIAAQLRILYGGSVTAKNARTLYQMRDINGFLVGGASLKPEFVEIIEATKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
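The three-letter to one-letter conversion itself can be sketched in plain Python, independently of MolSysMT. The function name `three_to_one` is hypothetical. Unknown codes are mapped to `X`, a common convention; whether this is how the trailing `X` characters in the output above arise is an assumption.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(sequence: str, unknown: str = "X") -> str:
    """Translate a concatenated three-letter sequence into one-letter codes."""
    # Cut the string into consecutive chunks of three characters.
    codes = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    # Codes missing from the table fall back to the `unknown` symbol.
    return "".join(THREE_TO_ONE.get(code, unknown) for code in codes)


# First eight residues of the sequence printed above.
short = three_to_one("LysProGlnProIleAlaAlaAla")
print(short)  # KPQPIAAA
```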


## How to convert just a selection

The conversion can be applied to the entire system or to a part of it. The input argument `selection` works with most MolSysMT methods, including `molsysmt.convert()`. To learn more about how to perform selections, see the section of this documentation entitled "XXX". For now, let's look at some simple selections to see how it operates:

In [14]:
whole_molecular_system = msm.convert('1tcd.mmtf', to_form='molsysmt.DataFrame')

In [16]:
msm.info(whole_molecular_system)

form,n atoms,n groups,n components,n chains,n molecules,n entities,n waters,n proteins,n frames
molsysmt.DataFrame,3983,662,167,4,166,2,165,1,0


In [17]:
molecular_system = msm.convert('1tcd.mmtf', to_form='molsysmt.DataFrame', selection='molecule.type=="protein"')

In [18]:
msm.info(molecular_system)

form,n atoms,n groups,n components,n chains,n molecules,n entities,n proteins,n frames
molsysmt.DataFrame,3818,497,2,2,1,1,1,0
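The effect of a selection can be pictured as filtering an atom table before the conversion is carried out. This is consistent with the counts above, where the protein selection keeps 3818 of the 3983 atoms and drops the 165 single-atom waters. The sketch below is a toy stand-in with hypothetical field names; it does not parse MolSysMT's selection syntax.

```python
# Toy atom table: one dict per atom. Field names are hypothetical.
atoms = [
    {"index": 0, "name": "N", "molecule_type": "protein"},
    {"index": 1, "name": "CA", "molecule_type": "protein"},
    {"index": 2, "name": "O", "molecule_type": "water"},
]


def select(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Keep only the atoms that belong to protein molecules.
protein_atoms = select(atoms, molecule_type="protein")
print(len(atoms), len(protein_atoms))  # 3 2
```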


## How to combine multiple forms into one

Sometimes a molecular system comes from the combination of more than one form. For example, we can have two files, one with the topology and one with the coordinates, to be converted into a single molecular form:

In [19]:
molecular_system = msm.convert(['pentalanine.prmtop','pentalanine.inpcrd'], to_form='molsysmt.MolSys')

In [20]:
msm.info(molecular_system)

form,n atoms,n groups,n components,n chains,n molecules,n entities,n frames
molsysmt.MolSys,5207,1722,1716,1,1,1,1
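Conceptually, combining a topology item with a coordinates item amounts to pairing the two and checking that they describe the same number of atoms. The sketch below illustrates that idea with hypothetical classes (`Topology`, `Coordinates`, `System`); it makes no claim about how MolSysMT reads `.prmtop` or `.inpcrd` files.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Topology:
    atom_names: List[str]


@dataclass
class Coordinates:
    # One list of (x, y, z) positions per frame.
    frames: List[List[Point]]


@dataclass
class System:
    topology: Topology
    coordinates: Coordinates


def combine(topology: Topology, coordinates: Coordinates) -> System:
    """Pair a topology with coordinates after checking the atom counts."""
    n_atoms = len(topology.atom_names)
    for index, frame in enumerate(coordinates.frames):
        if len(frame) != n_atoms:
            raise ValueError(
                f"Frame {index} has {len(frame)} atoms, expected {n_atoms}"
            )
    return System(topology, coordinates)


toy_topology = Topology(atom_names=["N", "CA"])
toy_coordinates = Coordinates(frames=[[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]])
toy_system = combine(toy_topology, toy_coordinates)
```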


## How to convert a form into multiple ones at once

The previous section illustrated how to convert multiple forms into one. Let's now see how to produce more than one output form in a single line:

In [21]:
dataframe, trajectory = msm.convert('pentalanine.h5', to_form=['molsysmt.DataFrame','molsysmt.Trajectory'])

In [22]:
msm.info(dataframe)

form,n atoms,n groups,n components,n chains,n molecules,n entities,n frames
molsysmt.DataFrame,62,7,1,1,1,1,0


In [23]:
msm.info(trajectory)

form,n atoms,n groups,n components,n chains,n molecules,n entities,n waters,n ions,n cosolutes,n small molecules,n peptides,n proteins,n dnas,n rnas,n frames
molsysmt.Trajectory,62,,,,,,,,,,,,,,5000
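Accepting either a single form name or a list of names can be sketched with a small dispatch function. The names `convert_to` and `converters`, and the toy forms `"upper"` and `"length"`, are hypothetical and only illustrate the calling pattern.

```python
def convert_to(item, to_form, converters):
    """Convert `item` to one form, or to several if `to_form` is a list."""
    if isinstance(to_form, (list, tuple)):
        # One output per requested form, in the order requested.
        return [convert_to(item, form, converters) for form in to_form]
    return converters[to_form](item)


# Toy converters keyed by output form name.
converters = {"upper": str.upper, "length": len}

# A list of forms yields a list of outputs that can be unpacked.
upper, length = convert_to("ace", ["upper", "length"], converters)

# A single form name yields a single output.
single = convert_to("ace", "upper", converters)
```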


Let's now combine both forms into one to check that they were properly converted:

In [24]:
pdb_string = msm.convert([dataframe,trajectory], to_form='.pdb', frame_indices=0)
print(pdb_string)

REMARK   1 CREATED WITH OPENMM 7.4.1 BY MOLSYSMT, 2020-05-16
HETATM    1  H1  ACE A   1       7.249   2.812  -0.651  1.00  0.00           H  
HETATM    2  CH3 ACE A   1       8.184   3.354  -0.797  1.00  0.00           C  
HETATM    3  H2  ACE A   1       8.246   3.843  -1.769  1.00  0.00           H  
HETATM    4  H3  ACE A   1       8.879   2.516  -0.838  1.00  0.00           H  
HETATM    5  C   ACE A   1       8.514   4.229   0.378  1.00  0.00           C  
HETATM    6  O   ACE A   1       9.025   3.617   1.320  1.00  0.00           O  
ATOM      7  N   ALA A   2       8.131   5.505   0.206  1.00  0.00           N  
ATOM      8  H   ALA A   2       7.661   5.747  -0.654  1.00  0.00           H  
ATOM      9  CA  ALA A   2       8.375   6.544   1.175  1.00  0.00           C  
ATOM     10  HA  ALA A   2       8.686   6.060   2.100  1.00  0.00           H  
ATOM     11  CB  ALA A   2       9.619   7.378   0.809  1.00  0.00           C  
ATOM     12  HB1 ALA A   2       9.428   7.802  

## Some examples with files

In [25]:
PDB_file = '1brs.pdb'
system_pdbfixer = msm.convert(PDB_file, to_form='pdbfixer.PDBFixer')
system_parmed = msm.convert(PDB_file, to_form='parmed.Structure')

In [26]:
MOL2_file = 'caffeine.mol2'
system_openmm = msm.convert(MOL2_file, to_form='openmm.Modeller')
system_mdtraj = msm.convert(MOL2_file, to_form='mdtraj.Trajectory')

In [27]:
MMTF_file = '1tcd.mmtf'
system_aminoacids1_seq = msm.convert(MMTF_file, to_form='aminoacids1:seq')
system_molsys = msm.convert(MMTF_file)

In [28]:
print('Form of object system_pdbfixer: ', msm.get(system_pdbfixer, target='system', form=True))
print('Form of object system_parmed: ', msm.get(system_parmed, target='system', form=True))
print('Form of object system_openmm: ', msm.get(system_openmm, target='system', form=True))
print('Form of object system_mdtraj: ', msm.get(system_mdtraj, target='system', form=True))
print('Form of object system_aminoacids1_seq: ', msm.get(system_aminoacids1_seq, target='system', form=True))
print('Form of object system_molsys: ', msm.get(system_molsys, target='system', form=True))

Form of object system_pdbfixer:  pdbfixer.PDBFixer
Form of object system_parmed:  parmed.Structure
Form of object system_openmm:  openmm.Modeller
Form of object system_mdtraj:  mdtraj.Trajectory
Form of object system_aminoacids1_seq:  aminoacids1:seq
Form of object system_molsys:  molsysmt.MolSys


A single file can be converted into more than one form in a single line:

In [30]:
dataframe, trajectory = msm.convert('pentalanine.h5', to_form=['molsysmt.DataFrame','molsysmt.Trajectory'])

When the output file path is only a dot followed by the file extension, the output is a string instead of a written file. Let's see how this works when two forms are combined into a PDB string:

In [31]:
pdb_string = msm.convert([dataframe,trajectory], to_form='.pdb', frame_indices=0)
print(pdb_string)

REMARK   1 CREATED WITH OPENMM 7.4.1 BY MOLSYSMT, 2020-05-16
HETATM    1  H1  ACE A   1       7.249   2.812  -0.651  1.00  0.00           H  
HETATM    2  CH3 ACE A   1       8.184   3.354  -0.797  1.00  0.00           C  
HETATM    3  H2  ACE A   1       8.246   3.843  -1.769  1.00  0.00           H  
HETATM    4  H3  ACE A   1       8.879   2.516  -0.838  1.00  0.00           H  
HETATM    5  C   ACE A   1       8.514   4.229   0.378  1.00  0.00           C  
HETATM    6  O   ACE A   1       9.025   3.617   1.320  1.00  0.00           O  
ATOM      7  N   ALA A   2       8.131   5.505   0.206  1.00  0.00           N  
ATOM      8  H   ALA A   2       7.661   5.747  -0.654  1.00  0.00           H  
ATOM      9  CA  ALA A   2       8.375   6.544   1.175  1.00  0.00           C  
ATOM     10  HA  ALA A   2       8.686   6.060   2.100  1.00  0.00           H  
ATOM     11  CB  ALA A   2       9.619   7.378   0.809  1.00  0.00           C  
ATOM     12  HB1 ALA A   2       9.428   7.802  
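The "bare extension means string output" convention can be sketched as follows. The helper names `is_bare_extension` and `emit` are hypothetical, and the rule used to recognise a bare extension is an assumption made for illustration.

```python
import os
import tempfile


def is_bare_extension(target: str) -> bool:
    """True for targets such as '.pdb': a dot followed only by letters or digits."""
    return target.startswith(".") and target[1:].isalnum()


def emit(text: str, target: str):
    """Return `text` for a bare extension; otherwise write it to `target`."""
    if is_bare_extension(target):
        return text
    with open(target, "w") as handle:
        handle.write(text)
    return None


# A bare extension returns the content as a string.
as_string = emit("REMARK toy\n", ".pdb")

# A real path writes a file and returns nothing.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pdb")
    written = emit("REMARK toy\n", path)
    file_exists = os.path.exists(path)
```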

## Some examples with IDs

In [32]:
molecular_system = msm.convert('pdb:1SUX', to_form='mdtraj.Trajectory')
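
String items such as `'pdb:1SUX'` and `'1tcd.mmtf'` differ in how they name their form: a prefix before a colon in the first case, a file extension in the second. A toy classifier for that distinction is sketched below. The function `classify` and its rules are hypothetical and much simpler than what a real library needs (a Windows path such as `C:\data\1tcd.pdb` would be misread as an ID, for instance).

```python
def classify(item: str):
    """Split a string item into (kind, form, value). Toy rules only."""
    prefix, separator, rest = item.partition(":")
    if separator and prefix and rest:
        # 'pdb:1SUX' -> database prefix plus entry code.
        return ("id", prefix.lower(), rest)
    if "." in item:
        # '1tcd.mmtf' -> file whose extension names the form.
        return ("file", item.rsplit(".", 1)[1].lower(), item)
    return ("unknown", None, item)


kind, form, value = classify("pdb:1SUX")
```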