In [1]:
# This cell is removed with the tag: "remove-input"
# As such, it will not be shown in documentation

#import warnings
#warnings.filterwarnings('ignore')

(Tutorial_Select)=
# Select
*Selecting elements in a molecular system*

Selecting elements is probably one of the most frequent tasks when working with molecular systems. There are many circumstances in which we need the list of elements that satisfy certain conditions. For instance, we may need to calculate the contact map between the CA atoms of two chains, to remove the solvent atoms, or to know how many 'HIS' residues there are in a protein. All these conditions can be expressed as a query sentence that the elements of the system need to match. Each library for working with molecular systems has its own syntax for writing these query sentences. See, for instance, the examples in the documentation of [MDTraj](https://www.mdtraj.org/1.9.7/atom_selection.html), [PyTraj](https://amber-md.github.io/pytraj/latest/atom_mask_selection.html?highlight=select), [MDAnalysis](https://docs.mdanalysis.org/stable/documentation_pages/selections.html), and [NGLView](http://nglviewer.org/ngl/api/manual/usage/selection-language.html).

The basic module of MolSysMT includes the function {func}`molsysmt.basic.select` to select elements of molecular systems using either a native syntax or any of the other supported syntaxes.

## MolSysMT selection syntax

Although you can use the function {func}`molsysmt.basic.select` with some of your preferred syntaxes (see [User guide > Introduction > Selection syntaxes](../../intro/selection_syntaxes.ipynb)), MolSysMT has its own selection syntax based on the names of the attributes of elements such as atoms, groups, molecules, etc. These are the "words" for element attributes that can be included in the query strings:

<br/>

| Word | Attribute |
|---|---|
| "atom_index" | atom_index |
| "atom_name" | atom_name |
| "atom_id" | atom_id |
| "atom_type" | atom_type |
| "group_index" | group_index |
| "group_name" | group_name |
| "group_id" | group_id |
| "group_type" | group_type |
| "component_index" | component_index |
| "component_name" | component_name |
| "component_id" | component_id |
| "component_type" | component_type |
| "chain_index" | chain_index |
| "chain_name" | chain_name |
| "chain_id" | chain_id |
| "chain_type" | chain_type |
| "molecule_index" | molecule_index |
| "molecule_name" | molecule_name |
| "molecule_id" | molecule_id |
| "molecule_type" | molecule_type |
| "entity_index" | entity_index |
| "entity_name" | entity_name |
| "entity_id" | entity_id |
| "entity_type" | entity_type |
| "occupancy" | occupancy |
| "alternate_location" | alternate_location |
| "b_factor" | b_factor |
| "formal_charge" | formal_charge |
| "partial_charge" | partial_charge |

<br/>

The plural forms of the former words are also accepted. For example:

<br/>

| Word | Equivalent |
|---|---|
| "atom_indices" | "atom_index" |
| "component_names" | "component_name" |
| "occupancies" | "occupancy" |
| "formal_charges" | "formal_charge" |

<br/>

The boolean syntax MolSysMT accepts includes the following words and symbols:

<br/>

| Word | Symbol | Meaning |
|---|---|---|
| "and" | & | and |
| "or" | \| | or |
| "not" | ~ | not |
| "in" | | in |
|  | == | equal |
|  | != | not equal |
|  | < | less than |
|  | <= | less than or equal to |
|  | > | greater than |
|  | >= | greater than or equal to |

<br/>
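To make the table concrete, the following sketch evaluates a few sentences with `pandas.DataFrame.query` on a small, made-up atom table. It assumes that the native syntax follows Pandas query semantics for these operators, as the section on external variables further down suggests. The table `atoms` and its contents are hypothetical and do not represent how MolSysMT stores a molecular system.

```python
import pandas as pd

# Hypothetical toy table: one row per atom (not MolSysMT's internal storage)
atoms = pd.DataFrame({
    "atom_index": [0, 1, 2, 3, 4, 5],
    "atom_id": [1, 2, 3, 4, 5, 6],
    "atom_name": ["N", "CA", "C", "O", "CB", "H"],
    "atom_type": ["N", "C", "C", "O", "C", "H"],
})

# Word form and symbol form of the same condition
word_form = atoms.query('atom_type == "C" and not atom_name == "CA"')
symbol_form = atoms.query('(atom_type == "C") & ~(atom_name == "CA")')

# Membership combined with a numerical comparison
membership = atoms.query('atom_name in ["CA", "CB"] and atom_id < 20')

# Alternative conditions joined with "or"
either = atoms.query('atom_name == "N" or atom_type == "O"')

print(word_form["atom_index"].tolist())
print(membership["atom_index"].tolist())
print(either["atom_index"].tolist())
```

The word and symbol forms select the same rows here, which is the equivalence the table describes.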

MolSysMT includes in its native selection syntax some other query restrictions involving spatial coordinates and bonds:

<br/>

| Words | Meaning |
|---|---|
| "within ... of" | Elements are selected if they are within a certain distance of other elements |
| "not within ... of" | Elements are selected if they are not within a certain distance of other elements |
| "within ... with pbc of" | Elements are selected if they are within a certain distance of other elements, taking periodic boundary conditions into account |
| "within ... without pbc of" | Elements are selected if they are within a certain distance of other elements, ignoring periodic boundary conditions |

<br/>

| Words | Meaning |
|---|---|
| "bonded to" | Elements are selected if they are bonded to other elements |
| "not bonded to" | Elements are selected if they are not bonded to other elements|

<br/>

### Native selection shortcuts

In addition to the former words and symbols, the native MolSysMT selection syntax includes some shortcuts:

<br/>

| Word | Equivalent |
|---|---|
| "index" or "indices" | "*element*_index" where *element* is given by the input argument ``element``|
| "id" or "ids" | "*element*_id" where *element* is given by the input argument ``element``|
| "name" or "names" | "*element*_name" where *element* is given by the input argument ``element``|
| "type" or "types" | "*element*_type" where *element* is given by the input argument ``element``|

<br/>

| Word | Equivalent |
|---|---|
| "backbone" | "atom_name in ['CA', 'N', 'C', 'O']"|
| "hydrogen" or "hydrogens" | "atom_type=='H'"|

<br/>
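One way to picture these shortcuts is as a plain text substitution applied before the sentence is evaluated. The sketch below is only an illustration under that assumption: `expand_shortcuts` is a hypothetical helper, not a MolSysMT function, and the equivalences are copied from the two tables above.

```python
import re

# Equivalences copied from the tables above
FIXED_SHORTCUTS = {
    "backbone": "atom_name in ['CA', 'N', 'C', 'O']",
    "hydrogens": "atom_type=='H'",
    "hydrogen": "atom_type=='H'",
}
ELEMENT_SHORTCUTS = {
    "indices": "index", "index": "index",
    "ids": "id", "id": "id",
    "names": "name", "name": "name",
    "types": "type", "type": "type",
}


def expand_shortcuts(selection, element="atom"):
    """Hypothetical helper: rewrite shortcut words as full attribute words."""

    def replace(match):
        word = match.group(0)
        if word in FIXED_SHORTCUTS:
            # Parentheses keep the expansion a single logical term
            return f"({FIXED_SHORTCUTS[word]})"
        return f"{element}_{ELEMENT_SHORTCUTS[word]}"

    # Longest words first, so "hydrogens" is tried before "hydrogen"
    words = sorted(list(FIXED_SHORTCUTS) + list(ELEMENT_SHORTCUTS), key=len, reverse=True)
    # Match whole words only, so "atom_name" or "@name" are left untouched
    pattern = r"(?<![\w@])(?:" + "|".join(words) + r")(?!\w)"
    return re.sub(pattern, replace, selection)


print(expand_shortcuts("backbone and index<20"))
print(expand_shortcuts('name=="ALA"', element="group"))
```

This simple substitution does not protect words that appear inside quoted values, so it should be read as a mental model rather than as a parser.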

### Your own selection shortcuts

You can customize the selection experience with your own selection shortcuts. Have a look at the section [User guide > Introduction > Configuration options](../../intro/configuration_options.ipynb).

## How this function works

:::{admonition} API documentation
Follow this link for a detailed description of the input arguments, raised errors, and returned objects of this function: {func}`molsysmt.basic.select`.
:::

Let's illustrate how the function {func}`molsysmt.basic.select` works using the MolSysMT native syntax. In the following examples, a list of atoms is obtained by matching selection criteria based on atom attributes:

In [None]:
import molsysmt as msm

In [None]:
molecular_system = msm.convert('1TCD', to_form='molsysmt.MolSys')

In [None]:
# Atoms with name CA or CB and id < 20
msm.select(molecular_system, element='atom', selection='atom_name in ["CA","CB"] and atom_id<20')

```{admonition} Tip
:class: tip
All functions defined in the {ref}`molsysmt.basic <API basic>` module can be invoked also from the main level of the library. Hence, {func}`molsysmt.select` is the same function as {func}`molsysmt.basic.select`.
```

In [None]:
# Heavy atoms
msm.select(molecular_system, 'not atom_type=="H"')

```{admonition} Tip
:class: tip
By default, the input argument `element` takes the value "atom" in {func}`molsysmt.basic.select`.
```

The `selection` argument also accepts lists of indices, which are **always** interpreted as element indices.

In [None]:
msm.select(molecular_system, element='atom', selection=[0,1,2])

## Some examples of atoms selections

Here are some examples of atom selections using the MolSysMT native syntax.

In [None]:
# Atoms of type C not named CA
msm.select(molecular_system, 'atom_type=="C" and not atom_name=="CA"')

In [None]:
# Atoms not named CA, CB or C
msm.select(molecular_system, 'atom_name!=["CA","CB","C"]')

Atoms can be selected using attributes of other elements in the hierarchical organization of the molecular system: 'group', 'component', 'molecule', 'chain', 'entity'. You can find further information about these elements in [User guide > Introduction > Molecular system > Elements](../../intro/molecular_dynamics/elements.ipynb). These are some examples of selection sentences that include criteria other than atom attributes:

In [None]:
# Atoms belonging to molecules of type water.
msm.select(molecular_system, 'molecule_type=="water"')

In [None]:
# Heavy atoms of the group with index 3 belonging to molecules of type protein.
msm.select(molecular_system, 'molecule_type=="protein" and atom_type!="H" and group_index==3')

In [None]:
# Atoms belonging to residues named GLY, ALA or VAL in chain id A.
msm.select(molecular_system, 'group_name==["GLY","ALA","VAL"] and chain_id=="A"') 
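Several of the sentences above compare an attribute with a list using `==` or `!=`. Assuming these sentences follow `pandas.DataFrame.query` semantics, `==` against a list behaves like `in`, and `!=` behaves like `not in`. The sketch below checks this on a hypothetical toy table; it is not a MolSysMT call.

```python
import pandas as pd

# Hypothetical toy table: an ALA residue in chain A and part of a LYS residue in chain B
atoms = pd.DataFrame({
    "atom_name": ["N", "CA", "C", "O", "CB", "N", "CA"],
    "group_name": ["ALA", "ALA", "ALA", "ALA", "ALA", "LYS", "LYS"],
    "chain_id": ["A", "A", "A", "A", "A", "B", "B"],
})

# "==" with a list on the right-hand side works like "in"
eq_list = atoms.query('group_name == ["GLY", "ALA", "VAL"]')
in_list = atoms.query('group_name in ["GLY", "ALA", "VAL"]')

# "!=" with a list on the right-hand side works like "not in"
ne_list = atoms.query('atom_name != ["CA", "CB", "C"]')
not_in_list = atoms.query('atom_name not in ["CA", "CB", "C"]')

print(eq_list.index.tolist() == in_list.index.tolist())
print(ne_list.index.tolist() == not_in_list.index.tolist())
```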

Remember that when `selection` takes a list of integers as its value, the list is **always** interpreted as a list of element indices (atoms in this case):

In [None]:
msm.select(molecular_system, selection=[10,11,12,13]) 

## Some examples of other elements selections

The selection method of MolSysMT can also return the indices of elements other than atoms. Like many functions in MolSysMT, {func}`molsysmt.basic.select()` has an input argument named `element` to choose the kind of element on which the function operates. Let's see some examples:

In [None]:
# Groups with name "ALA"
msm.select(molecular_system,  element='group', selection='group_name=="ALA"')

In [None]:
# Groups containing the atoms with index 34, 44 or 64
msm.select(molecular_system, element='group', selection='atom_index==[34,44,64]')

In [None]:
# Groups belonging to chain id A or C and molecule of type anything but water
msm.select(molecular_system, element='group', selection='chain_id in ["A", "C"] and molecule_type!="water"')

In [None]:
# Molecules of type water
msm.select(molecular_system, 'molecule_type=="water"', element='molecule')

In [None]:
# Chains with molecules of type water
msm.select(molecular_system, 'molecule_type=="water"', element='chain')

In [None]:
# Bonds in group index 5
msm.select(molecular_system, 'group_index==5', element='bond')
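A simple mental model for the group, molecule and chain examples above is: evaluate the sentence on the atoms, then collect the distinct indices of the requested element among the matching atoms. The sketch below follows that assumption with a hypothetical toy table and a hypothetical helper `select_elements`. It does not cover bonds, and it is not how MolSysMT is necessarily implemented.

```python
import pandas as pd

# Hypothetical toy table: each atom row also records the group and chain it belongs to
atoms = pd.DataFrame({
    "atom_index": [0, 1, 2, 3, 4, 5],
    "group_index": [0, 0, 1, 1, 2, 2],
    "group_name": ["ALA", "ALA", "GLY", "GLY", "ALA", "ALA"],
    "chain_index": [0, 0, 0, 0, 1, 1],
    "chain_id": ["A", "A", "A", "A", "B", "B"],
})


def select_elements(table, selection, element="atom"):
    """Hypothetical helper: indices of `element` that own at least one matching atom."""
    matched = table.query(selection)
    return sorted(matched[f"{element}_index"].unique().tolist())


print(select_elements(atoms, "atom_index == [1, 4]", element="group"))
print(select_elements(atoms, 'group_name == "ALA"', element="chain"))
```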

When `selection` takes a list of integers as its value, the list is **always** interpreted as a list of element indices:

In [None]:
msm.select(molecular_system, element='group', selection=[0,1,2,3,4,5,6,7,8,9,10,11])

In [None]:
msm.select(molecular_system, element='molecule', selection=[3900, 3910, 3920])

## Including external variables in the selection sentence

The Pandas `query` method allows the use of external variables in the logical sentence. To include them, variable names have to be preceded by the character '@'. Let's illustrate this with some examples:

In [None]:
# Atoms in groups with indices 10, 11 or 12.
indices=[10,11,12]
msm.select(molecular_system, 'group_index==@indices')

In [None]:
# Atoms named CA, C, O or N in groups with indices 10 to 29.
indices=list(range(10,30))
atoms=["CA", "C", "O", "N"]
msm.select(molecular_system, 'atom_name==@atoms & group_index==@indices')

In [None]:
# Groups with indices equal to 0, 100 or 200
indices=[0,100,200]
msm.select(molecular_system, element='group', selection='group_index==@indices')
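The `@` prefix can be tried in isolation with `pandas.DataFrame.query`, which resolves `@name` against the Python variables visible where the query is called. The sketch below uses a hypothetical toy table rather than a MolSysMT system; the variable names `names` and `indices` are arbitrary.

```python
import pandas as pd

# Hypothetical toy table: three groups with four backbone atoms each
atoms = pd.DataFrame({
    "atom_index": list(range(12)),
    "atom_name": ["N", "CA", "C", "O"] * 3,
    "group_index": [10] * 4 + [11] * 4 + [12] * 4,
})

# External Python variables referenced from the sentence with "@"
names = ["CA", "C"]
indices = [10, 12]

selected = atoms.query("atom_name == @names and group_index == @indices")
print(selected["atom_index"].tolist())
```

Keeping the values in variables avoids building the sentence by string concatenation when the lists are long or computed elsewhere in the workflow.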

## Selecting elements "within a distance of"

A selection of elements within a certain distance of a set of elements can be obtained using the string `within ... of`. Here you can find some examples:

In [None]:
msm.select(molecular_system, 'chain_id=="A" within 0.3 nm of chain_id=="B"')

In [None]:
msm.select(molecular_system, '(atom_name=="N" and chain_id=="A") within 3 angstroms of (atom_type=="O" and molecule_type=="water")')

In [None]:
msm.select(molecular_system, '(atom_name=="CA" and chain_id=="A") within 0.5 nm of (atom_name=="CA" and chain_id=="B")',
          element='group')

The string "not within ... of ..." can also be used:

In [None]:
msm.select(molecular_system, 'chain_id=="A" not within 7.8 nanometers of chain_id=="B"')

And distances can be interpreted with or without periodic boundary conditions:

In [None]:
msm.select(molecular_system, 'chain_id=="A" within 0.3 nm without pbc of chain_id=="B"')

In [None]:
msm.select(molecular_system, 'chain_id=="A" within 0.3 nm with pbc of chain_id=="B"')
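The difference between the two variants can be sketched with NumPy. The helper `within` below is hypothetical and only illustrates the idea under stated assumptions: coordinates in nanometers, an orthorhombic box, the minimum image convention for periodic boundary conditions, and a threshold that is inclusive. MolSysMT's own conventions may differ.

```python
import numpy as np


def within(coordinates, candidates, references, threshold, box=None):
    """Hypothetical helper: candidates at most `threshold` away from any reference atom.

    coordinates: (n_atoms, 3) array in nm.
    box: optional lengths of an orthorhombic box in nm; if given, distances
         use the minimum image convention.
    """
    candidates = np.asarray(candidates)
    # Displacement vectors between every candidate and every reference atom
    delta = coordinates[candidates][:, None, :] - coordinates[references][None, :, :]
    if box is not None:
        # Wrap each displacement to its nearest periodic image
        delta -= box * np.round(delta / box)
    distances = np.linalg.norm(delta, axis=-1)
    return candidates[(distances <= threshold).any(axis=1)].tolist()


# Three atoms along x in a cubic box of side 2 nm (made-up values)
coordinates = np.array([[0.1, 0.0, 0.0], [1.0, 0.0, 0.0], [1.9, 0.0, 0.0]])
box = np.array([2.0, 2.0, 2.0])

without_pbc = within(coordinates, [1, 2], [0], 0.3)
with_pbc = within(coordinates, [1, 2], [0], 0.3, box=box)
print(without_pbc, with_pbc)
```

In this toy case atom 2 is far from atom 0 inside the box but close to its periodic image, so it is selected only when periodic boundary conditions are considered.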

## Selecting atoms "bonded to ..."

Atoms bonded to specific atoms can also be selected with `bonded to`:

In [None]:
msm.select(molecular_system, 'atom_name=="N" bonded to atom_type=="C"')

In [None]:
msm.select(molecular_system, '(atom_type=="O" and chain_id=="A") bonded to (atom_type=="C" and chain_id=="A")')

The string "not bonded to" can also be used:

In [None]:
msm.select(molecular_system, '(all not bonded to atom_type==["H","N","C","O"]) and molecule_type=="protein"')
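Conceptually, `bonded to` filters one set of atoms by the bond list, and `not bonded to` keeps the remainder. The sketch below illustrates this with a hypothetical helper `bonded_to` and a made-up five-atom fragment; the bond list format (pairs of atom indices) is an assumption, not MolSysMT's internal representation.

```python
def bonded_to(bonds, candidates, partners):
    """Hypothetical helper: candidates that share a bond with any atom in `partners`."""
    candidates, partners = set(candidates), set(partners)
    selected = set()
    for i, j in bonds:
        # A bond is undirected, so both orientations are checked
        if i in candidates and j in partners:
            selected.add(i)
        if j in candidates and i in partners:
            selected.add(j)
    return sorted(selected)


# Made-up fragment: 0=N, 1=CA, 2=C, 3=O, 4=H (bonded to N)
atom_type = ["N", "C", "C", "O", "H"]
bonds = [(0, 1), (1, 2), (2, 3), (0, 4)]

carbons = [i for i, t in enumerate(atom_type) if t == "C"]
hydrogens = [i for i, t in enumerate(atom_type) if t == "H"]
all_atoms = list(range(len(atom_type)))

# Analogue of 'atom_type=="N" bonded to atom_type=="C"'
nitrogen_on_carbon = bonded_to(bonds, [0], carbons)

# Analogue of 'all not bonded to atom_type=="H"'
bonded_to_h = bonded_to(bonds, all_atoms, hydrogens)
not_bonded_to_h = [i for i in all_atoms if i not in bonded_to_h]
print(nitrogen_on_carbon, not_bonded_to_h)
```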

Both `within ... of` and `bonded to` can be combined in the same selection sentence:

In [None]:
msm.select(molecular_system, '((atom_name=="N" and chain_id=="A") bonded to atom_type=="C") within 3 angstroms of (atom_type=="O" and molecule_type=="water")')

## Other syntaxes supported

There is no need for the user to learn the native syntax. Other syntaxes, such as the one used by MDTraj, are also supported by MolSysMT:

In [None]:
msm.select(molecular_system, selection='(name =~ "C[A-B]") and (resid 1 to 3)', syntax='MDTraj')

## Translation between different syntaxes

MolSysMT is designed to interact easily with other tools. The main goal of this library is to provide a set of pipes and joins for setting up your workflows while keeping the integration of other tools simple. However, different tools have different selection syntaxes. Learning the selection syntax of MDTraj, ParmEd or NGLView is very useful, since these are tools we all use frequently in our labs, but the rules of each tool are easily forgotten. To let you keep a single selection syntax in your projects, the function {func}`molsysmt.basic.select()` includes the input argument `to_syntax`. Let's look at some examples:

In [None]:
msm.select(molecular_system, selection='group_index==[3,4,5]', to_syntax='NGLView')

In [None]:
msm.select(molecular_system, selection='group_index==[3,4,5]', to_syntax='MDTraj')

When the selection is made over elements other than atoms, the output string can be obtained as a sequence of groups or chains:

In [None]:
msm.select(molecular_system, element='group', selection='group_index==[3,4,5]', to_syntax='NGLView')

In [None]:
msm.select(molecular_system, element='group', selection='group_index==[3,4,5]', to_syntax='MDTraj')

:::{seealso} 
[User guide > Introduction > Molecular System > Attributes](../../intro/molecular_systems/attributes.ipynb)

[User guide > Introduction > Selection syntaxes](../../intro/selection_syntaxes.ipynb)

[User guide > Tools > Basic > Convert](convert.ipynb)

[User guide > Tools > Basic > Info](info.ipynb)
:::