In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import molsysmt as msm



# Info

*Printing out summary information of a molecular system*



MolSysMT provides a method, `molsysmt.info()`, that prints out a brief overview of a molecular system and its elements. The output of this method can be a `pandas.DataFrame` or a string. Let's load a molecular system to illustrate how it works with some simple examples:

In [3]:
item = msm.demo_systems.files['1tcd.mmtf']
molecular_system = msm.convert(item)

## As a DataFrame

### Summary information on atoms

The method `molsysmt.info()` can be applied to any element of the molecular system. Let's see an example where the summary information is shown for a set of atoms with the input argument `output='dataframe'`:

In [4]:
msm.info(molecular_system, target='atom', indices=[9,10,11,12], output='dataframe')

index,id,name,type,group index,group id,group name,group type,component index,chain index,molecule index,molecule type,entity index,entity name
9,10,N,N,1,5,PRO,aminoacid,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
10,11,CA,C,1,5,PRO,aminoacid,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
11,12,C,C,1,5,PRO,aminoacid,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
12,13,O,O,1,5,PRO,aminoacid,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE


The method can also take a `selection` input argument instead of `indices`:

In [5]:
msm.info(molecular_system, target='atom', selection='group_index==6')

index,id,name,type,group index,group id,group name,group type,component index,chain index,molecule index,molecule type,entity index,entity name
45,46,N,N,6,10,ALA,aminoacid,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
46,47,CA,C,6,10,ALA,aminoacid,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
47,48,C,C,6,10,ALA,aminoacid,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
48,49,O,O,6,10,ALA,aminoacid,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
49,50,CB,C,6,10,ALA,aminoacid,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE


Notice that `output` was omitted in the last call: its default value is 'dataframe'.

### Summary information on groups

Let's see an example where the summary information is shown for a set of groups:

In [6]:
msm.info(molecular_system, target='group', indices=[20,21,22,23])

index,id,name,type,n atoms,component index,chain index,molecule index,molecule type,entity index,entity name
20,24,PRO,aminoacid,7,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
21,25,LEU,aminoacid,8,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
22,26,ILE,aminoacid,8,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
23,27,GLU,aminoacid,9,0,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE


### Summary information on components

Here is an example of how the method `molsysmt.info()` works on components:

In [7]:
msm.info(molecular_system, target='component', selection='molecule_type!="water"')

index,n atoms,n groups,chain index,molecule index,molecule type,entity index,entity name
0,1906,248,0,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
1,1912,249,1,0,protein,0,TRIOSEPHOSPHATE ISOMERASE


### Summary information on chains

The summary information on all chains in the molecular system can be printed out as follows:

In [8]:
msm.info(molecular_system, target='chain')

index,id,name,n atoms,n groups,n components,molecule index,molecule type,entity index,entity name
0,A,A,1906,248,1,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
1,B,B,1912,249,1,0,protein,0,TRIOSEPHOSPHATE ISOMERASE
2,C,A,73,73,73,[ 1 2 3 ... 71 72 73],['water' 'water' 'water' ... 'water' 'water' 'water'],1,water
3,D,B,92,92,92,[ 74 75 76 ... 163 164 165],['water' 'water' 'water' ... 'water' 'water' 'water'],1,water


### Summary information on molecules

The following is an example of how the method works when the targeted element is 'molecule':

In [9]:
msm.info(molecular_system, target='molecule', selection='molecule_type!="water"')

index,name,type,n atoms,n groups,n components,chain index,entity index,entity name
0,TRIOSEPHOSPHATE ISOMERASE,protein,3818,497,2,[0 1],0,TRIOSEPHOSPHATE ISOMERASE


### Summary information on entities

If the targeted element is 'entity', the method prints out the following summary information:

In [10]:
msm.info(molecular_system, target='entity')

index,name,type,n atoms,n groups,n components,n chains,n molecules
0,TRIOSEPHOSPHATE ISOMERASE,protein,3818,497,2,2,1
1,water,water,165,165,165,2,165


### Summary information on a molecular system

Finally, summary information on the whole molecular system can be shown as follows:

In [11]:
msm.info(molecular_system)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_waters,n_proteins,n_frames
molsysmt.MolSys,3983,662,167,4,166,2,165,1,1
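
The entity and system summaries shown above should agree: adding up the per-entity counts ought to reproduce the system totals. The following is a minimal sketch of that cross-check using only the Python standard library. The numbers are copied by hand from the outputs above, and the mapping between the entity column names (`n atoms`) and the system column names (`n_atoms`) is an assumption made for illustration; nothing here calls MolSysMT itself.

```python
import csv
import io

# Entity summary, copied from the output of msm.info(molecular_system, target='entity').
entity_csv = """index,name,type,n atoms,n groups,n components,n chains,n molecules
0,TRIOSEPHOSPHATE ISOMERASE,protein,3818,497,2,2,1
1,water,water,165,165,165,2,165
"""

# System totals, copied from the output of msm.info(molecular_system),
# keyed here by the entity column names so both tables can be compared.
system_counts = {
    'n atoms': 3983,
    'n groups': 662,
    'n components': 167,
    'n chains': 4,
    'n molecules': 166,
}

# Sum each count column over all entities.
rows = list(csv.DictReader(io.StringIO(entity_csv)))
totals = {key: sum(int(row[key]) for row in rows) for key in system_counts}

print(totals == system_counts)  # True
```

This kind of check only works for counts that partition the system, such as atoms, groups, components, chains and molecules in this example.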


In [12]:
topology, trajectory = msm.convert(item, to_form=['molsysmt.Topology','molsysmt.Trajectory'])

In [13]:
msm.info(topology)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_waters,n_proteins,n_frames
molsysmt.Topology,3983,662,167,4,166,2,165,1,


In [14]:
msm.info(trajectory)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_frames
molsysmt.Trajectory,3983,,,,,,1


In [15]:
msm.info([topology, trajectory])

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_waters,n_proteins,n_frames
"['molsysmt.Topology', 'molsysmt.Trajectory']",3983,662,167,4,166,2,165,1,1


## As a string

The method `molsysmt.info()` can also return a string, short or long, with key information to identify the targeted element.

### Summary information on atoms

If we only need to get a short string encoding the main attributes of an atom, the input argument `output` should take the value 'short_string':

In [16]:
msm.info(molecular_system, target='atom', indices=10, output='short_string')

'CA-11@10'

The string is simply the atom name, the atom id and the atom index, with '-' between the name and the id, and '@' between the id and the index. The input argument `indices` also accepts a list of indices:

In [17]:
msm.info(molecular_system, target='atom', indices=[10,11,12,13], output='short_string')

['CA-11@10', 'C-12@11', 'O-13@12', 'CB-14@13']
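
Because the short string follows a fixed `name-id@index` layout, it can be decoded back into its parts. The helper below, `parse_short_string`, is hypothetical and not part of MolSysMT; it is a sketch that assumes ids are never negative, so that the last '-' before the '@' is always the separator.

```python
def parse_short_string(text):
    """Split 'name-id@index' into its parts (hypothetical helper, not a MolSysMT function)."""
    # The index is whatever follows the last '@'.
    head, _, index = text.rpartition('@')
    # The name and the id are separated by the last '-' in the remainder.
    # Assumption: ids are not negative, so that '-' is a separator and not a sign.
    name, _, element_id = head.rpartition('-')
    # The id is kept as a string because some ids (chain ids, for instance) are letters.
    return {'name': name, 'id': element_id, 'index': int(index)}


short_strings = ['CA-11@10', 'C-12@11', 'O-13@12', 'CB-14@13']
parsed = [parse_short_string(s) for s in short_strings]
print(parsed[0])  # {'name': 'CA', 'id': '11', 'index': 10}
```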

The long version of the string includes the short strings of the group, chain and molecule the atom belongs to, separated by the character '/':

In [18]:
msm.info(molecular_system, target='atom', indices=10, output='long_string')

'CA-11@10/PRO-5@1/A-A@0/TRIOSEPHOSPHATE ISOMERASE@0'
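
Since the levels of a long string are joined with '/', the string can be taken apart by splitting on that character. The function `split_long_string` below is a hypothetical helper written for illustration, not a MolSysMT function, and it assumes that no name contains a '/'.

```python
def split_long_string(text, levels=('atom', 'group', 'chain', 'molecule')):
    """Label each '/'-separated part of a long string (hypothetical helper)."""
    # Assumption: names never contain '/', so every '/' is a level separator.
    parts = text.split('/')
    if len(parts) != len(levels):
        raise ValueError(f'expected {len(levels)} parts, found {len(parts)}')
    return dict(zip(levels, parts))


long_string = 'CA-11@10/PRO-5@1/A-A@0/TRIOSEPHOSPHATE ISOMERASE@0'
fields = split_long_string(long_string)
print(fields['group'])  # PRO-5@1
```

For the long string of a group, which starts one level higher, the same sketch can be called with `levels=('group', 'chain', 'molecule')`.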

### Summary information on groups

The short string corresponding to a group is composed of its name, id and index. The characters used as separators are the same as with atoms: '-' between name and id, and '@' between id and index.

In [19]:
msm.info(molecular_system, target='group', indices=0, output='short_string')

'LYS-4@0'

The long version of the string includes the short strings of the chain and molecule the group belongs to:

In [20]:
msm.info(molecular_system, target='group', indices=3, output='long_string')

'PRO-7@3/A-A@0/TRIOSEPHOSPHATE ISOMERASE@0'

### Summary information on components

The short string with the summary information of a component is its index only:

In [21]:
msm.info(molecular_system, target='component', indices=2, output='short_string')

'2'

The long version of the string includes the chain and molecule the component belongs to, with the character '/' as separator:

In [22]:
msm.info(molecular_system, target='component', indices=2, output='long_string')

'2/A-C@2/water@1'

### Summary information on chains

Just like with atoms and groups, the short version of the chain string is made up of the sequence of attributes: name, id and index. The character '-' is in between the chain name and the chain id, and '@' precedes the chain index:

In [23]:
msm.info(molecular_system, target='chain', indices=2, output='short_string')

'A-C@2'

The long version of the string in this case is the same as the short one:

In [24]:
msm.info(molecular_system, target='chain', indices=2, output='long_string')

'A-C@2'

### Summary information on molecules

Molecules have no relevant id attribute; that's why in this case the short string is the molecule name followed by the character '@' and the molecule index:

In [25]:
msm.info(molecular_system, target='molecule', indices=0, output='short_string')

'TRIOSEPHOSPHATE ISOMERASE@0'

As with chains, the short and long strings are equivalent here:

In [26]:
msm.info(molecular_system, target='molecule', indices=0, output='long_string')

'TRIOSEPHOSPHATE ISOMERASE@0'

### Summary information on entities

Entities have only two significant attributes: name and index. The string follows the same convention as before, with the character '@' between the name and the index:

In [27]:
msm.info(molecular_system, target='entity', indices=0, output='short_string')

'TRIOSEPHOSPHATE ISOMERASE@0'

The long string is equal to the short string when the targeted element is an entity:

In [28]:
msm.info(molecular_system, target='entity', indices=0, output='long_string')

'TRIOSEPHOSPHATE ISOMERASE@0'
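
The short-string conventions described in this section can be collected in one place. The function below, `short_string`, is a hypothetical sketch that only mirrors the formats shown in the examples above; it is not how MolSysMT builds these strings internally, and its argument names are assumptions.

```python
def short_string(target, index, name=None, element_id=None):
    """Build a short string following the conventions shown above (illustrative only)."""
    if target in ('atom', 'group', 'chain'):
        # name, id and index: '-' after the name, '@' before the index.
        return f'{name}-{element_id}@{index}'
    if target == 'component':
        # Components are identified by their index alone.
        return str(index)
    if target in ('molecule', 'entity'):
        # No relevant id: name and index separated by '@'.
        return f'{name}@{index}'
    raise ValueError(f'unknown target: {target!r}')


examples = [
    short_string('atom', 10, name='CA', element_id=11),
    short_string('group', 0, name='LYS', element_id=4),
    short_string('component', 2),
    short_string('chain', 2, name='A', element_id='C'),
    short_string('molecule', 0, name='TRIOSEPHOSPHATE ISOMERASE'),
]
print(examples)
```

The values used here are taken from the outputs shown earlier, so each generated string can be compared directly with the corresponding `msm.info()` result.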