# Template for the User Tutorial


#### Developed at Volkamer Lab, Charité/FU Berlin

Annie Pham

# Reference

### Provide some general information about your method and the source of it

Use this citation style

* Keyword describing resource: Journal (year), volume, pages (link to resource)

Example:

* ChEMBL web services: Nucleic Acids Res. (2015), 43, W612-W620 (https://academic.oup.com/nar/article/43/W1/W612/2467881)

# Introduction


## What are the chosen structures

*These points should refer to the headlines of your theory section below.*

* Names of the structures 
* Function of the structures

## Why they have been chosen

*These points should refer to the headlines of your practical section below.*

* Interesting facts (e.g. some event that happened)

# Theory

## RMSD

The RMSD (root-mean-square deviation) is the square root of the mean squared distance between paired atoms of the superposed structures, given in Angstrom.
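As a minimal sketch, the RMSD of two already superposed coordinate sets can be computed from the squared distances between paired atoms. The function name `rmsd` and the toy coordinates are illustrative assumptions and not part of any library used in this tutorial.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("Both coordinate sets must be non-empty and of equal length")
    # Sum of squared distances between paired atoms
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    # Mean over all atom pairs, then square root
    return math.sqrt(squared / len(coords_a))


# Toy example: two atoms, each displaced by 1 Angstrom along x
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
mobile = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
print(rmsd(reference, mobile))
```

Because the squared distances are averaged before the root is taken, large deviations of a few atoms weigh more heavily than they would in a plain average distance.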

## Coverage

The coverage describes how much of the structures takes part in the alignment, for example as the number of aligned residue pairs.
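Coverage can be quantified in more than one way. The sketch below assumes one possible definition: the number of alignment columns without a gap, also reported as a fraction of the shorter sequence. The function name `coverage` and the short example sequences are hypothetical.

```python
def coverage(gapped_ref, gapped_mob, gap="-"):
    """Return (number of aligned residue pairs, fraction of the shorter sequence)."""
    if len(gapped_ref) != len(gapped_mob):
        raise ValueError("Gapped sequences must have the same length")
    # A column counts as aligned only if neither sequence has a gap there
    aligned = sum(1 for a, b in zip(gapped_ref, gapped_mob) if a != gap and b != gap)
    len_ref = len(gapped_ref) - gapped_ref.count(gap)
    len_mob = len(gapped_mob) - gapped_mob.count(gap)
    shorter = min(len_ref, len_mob)
    fraction = aligned / shorter if shorter else 0.0
    return aligned, fraction


pairs, fraction = coverage("SLRD-PAG", "SL-DPPAG")
print(pairs, fraction)
```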

## Details of the algorithm

### MatchMaker
    
#### Introduction:

MatchMaker superimposes protein structures by first creating pairwise sequence alignments and then fitting the aligned residue pairs.
The standard Needleman-Wunsch and Smith-Waterman algorithms are available for producing global and local sequence alignments, respectively. MatchMaker can identify the best-matching chains based on the alignment scores.
Alignment scores can include residue similarity, secondary structure information, and gap penalties.
Finally, MatchMaker performs a spatial superposition by minimizing the RMSD.
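To illustrate the global alignment step, here is a minimal Needleman-Wunsch sketch. It assumes a simple match/mismatch score and a linear gap penalty; these parameters and the function name `needleman_wunsch` are illustrative choices and do not reproduce MatchMaker's scoring, which can also use residue similarity and secondary structure.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, gapped_a, gapped_b)."""

    def pair_score(a, b):
        return match if a == b else mismatch

    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the dynamic programming matrix
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(seq_a[i - 1], seq_b[j - 1]),
                score[i - 1][j] + gap,  # gap in seq_b
                score[i][j - 1] + gap,  # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(seq_a[i - 1], seq_b[j - 1]):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


total, gapped_a, gapped_b = needleman_wunsch("SLRDPAG", "SLDPAG")
print(total)
print(gapped_a)
print(gapped_b)
```

A local (Smith-Waterman) variant would additionally clamp every cell at zero and start the traceback at the highest-scoring cell instead of the corner.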
    
#### Preparation:

Generate the pairwise sequence alignments, then match the structures, i.e., superimpose them according to those pairwise alignments.

#### Alignment:

Spatially align the group of atoms `mobile` to `reference` by doing an RMSD fit on the `select` atoms.

- A rotation matrix is computed that minimizes the RMSD between the coordinates of `mobile.select_atoms(sel1)` and `reference.select_atoms(sel2)`; before the rotation, `mobile` is translated so that its center of geometry (or center of mass) coincides with that of `reference`.
- All atoms in the `MDAnalysis.core.universe.Universe` that contains `mobile` are shifted and rotated.
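The translate-then-rotate procedure described above can be sketched with NumPy. This example uses the Kabsch algorithm (SVD of the covariance matrix) as an assumption for how the optimal rotation is found; MDAnalysis may use a different algorithm internally. The function names `superpose` and `rmsd` and the toy coordinates are hypothetical.

```python
import numpy as np


def rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays with paired rows."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def superpose(mobile, reference):
    """Return `mobile` coordinates rotated and translated onto `reference`."""
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Translate both sets so their centers of geometry sit at the origin
    ref_center = reference.mean(axis=0)
    mob0 = mobile - mobile.mean(axis=0)
    ref0 = reference - ref_center
    # Kabsch: SVD of the 3x3 covariance matrix gives the optimal rotation
    u, _, vt = np.linalg.svd(mob0.T @ ref0)
    # Correct for a possible reflection so the result is a proper rotation
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered mobile atoms, then move them onto the reference center
    return mob0 @ rotation.T + ref_center


# Toy data: the mobile set is the reference rotated by 90 degrees about z and shifted
reference = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
rot_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
mobile = reference @ rot_z.T + np.array([5.0, -2.0, 1.0])

fitted = superpose(mobile, reference)
print("RMSD before:", rmsd(mobile, reference))
print("RMSD after: ", rmsd(fitted, reference))
```

In a real alignment, the rotation would be computed on the selected atoms only and then applied to every atom of the mobile structure.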
 
#### Analysis:

Compare the RMSD before and after the spatial alignment.


# Preparation

## How to get the structures from the CLI

To get the structures directly from the RCSB, the syntax looks like this:

In [1]:
!structuralalignment --method=YOUR_METHOD NAME_OF_STRUCTURE_1 NAME_OF_STRUCTURE_2

usage: structuralalignment [-h] [--version] [-v]
                           [--method {theseus,mmligner,matchmaker}]
                           [--method-options METHOD_OPTIONS]
                           structures [structures ...]
structuralalignment: error: argument --method: invalid choice: 'YOUR_METHOD' (choose from 'theseus', 'mmligner', 'matchmaker')


To use structures that are saved locally, pass their paths instead:

In [2]:
!structuralalignment --method=YOUR_METHOD PATH_OF_STRUCTURE_1 PATH_OF_STRUCTURE_2

usage: structuralalignment [-h] [--version] [-v]
                           [--method {theseus,mmligner,matchmaker}]
                           [--method-options METHOD_OPTIONS]
                           structures [structures ...]
structuralalignment: error: argument --method: invalid choice: 'YOUR_METHOD' (choose from 'theseus', 'mmligner', 'matchmaker')


## Getting the structures in Python

The method takes atomium models (as returned by `atomium.fetch(...).model`) as input.

If you want to get the structures from the RCSB, you can do the following:

In [3]:
%load_ext autoreload

In [4]:
%autoreload 2

In [5]:
import atomium

structure1 = atomium.fetch("4u3y").model
structure2 = atomium.fetch("4u40").model

# Alignment 

Use your method and explain the steps it takes.

In [129]:
from structuralalignment.superposition.matchmaker import MatchMakerAligner, mda_align

aligner = MatchMakerAligner(alignment_strategy="local")
res = aligner.calculate([structure1, structure2])
res

{'superposed': [<Universe with 5478 atoms>, <Universe with 4985 atoms>],
 'scores': {'rmsd': None},
 'metadata': {'selection': {'reference': '( resid 1 and ( name CA or name CB ) ) or ( resid 2 and ( name CA or name CB ) ) or ( resid 3 and ( name CA or name CB ) ) or ( resid 4 and ( name CA or name CB ) ) or ( resid 5 and ( name CA or name CB ) ) or ( resid 6 and ( name CA or name CB ) ) or ( resid 7 and name CA ) or ( resid 8 and ( name CA or name CB ) ) or ( resid 9 and ( name CA or name CB ) ) or ( resid 10 and ( name CA or name CB ) ) or ( resid 11 and ( name CA or name CB ) ) or ( resid 12 and ( name CA or name CB ) ) or ( resid 13 and ( name CA or name CB ) ) or ( resid 14 and ( name CA or name CB ) ) or ( resid 15 and ( name CA or name CB ) ) or ( resid 16 and name CA ) or ( resid 17 and ( name CA or name CB ) ) or ( resid 18 and name CA ) or ( resid 19 and ( name CA or name CB ) ) or ( resid 20 and ( name CA or name CB ) ) or ( resid 21 and name CA ) or ( resid 22 and ( name CA

In [130]:
ref = res["superposed"][0]
mob = res["superposed"][1]

In [133]:
# Atoms of the reference structure that were used for the fit
refa = ref.select_atoms(res["metadata"]["selection"]["reference"])

<AtomGroup with 1101 atoms>

In [135]:
# Matching atoms of the mobile structure
moba = mob.select_atoms(res["metadata"]["selection"]["mobile"])

In [136]:
print(*moba, sep="\n")

<Atom 2: CA of type C of resname SER, resid 17 and segid A and altLoc >
<Atom 5: CB of type C of resname SER, resid 17 and segid A and altLoc >
<Atom 8: CA of type C of resname LEU, resid 18 and segid A and altLoc >
<Atom 16: CA of type C of resname ARG, resid 19 and segid A and altLoc >
<Atom 19: CB of type C of resname ARG, resid 19 and segid A and altLoc >
<Atom 27: CA of type C of resname ASP, resid 20 and segid A and altLoc >
<Atom 30: CB of type C of resname ASP, resid 20 and segid A and altLoc >
<Atom 35: CA of type C of resname PRO, resid 21 and segid A and altLoc >
<Atom 42: CA of type C of resname ALA, resid 22 and segid A and altLoc >
<Atom 45: CB of type C of resname ALA, resid 22 and segid A and altLoc >
<Atom 47: CA of type C of resname GLY, resid 23 and segid A and altLoc >
<Atom 51: CA of type C of resname ILE, resid 24 and segid A and altLoc >
<Atom 54: CB of type C of resname ILE, resid 24 and segid A and altLoc >
<Atom 59: CA of type C of resname PHE, resid 25 and se

<Atom 4068: CA of type C of resname GLU, resid 225 and segid B and altLoc >
<Atom 4071: CB of type C of resname GLU, resid 225 and segid B and altLoc >
<Atom 4077: CA of type C of resname MET, resid 226 and segid B and altLoc >
<Atom 4080: CB of type C of resname MET, resid 226 and segid B and altLoc >
<Atom 4085: CA of type C of resname ALA, resid 227 and segid B and altLoc >
<Atom 4088: CB of type C of resname ALA, resid 227 and segid B and altLoc >
<Atom 4090: CA of type C of resname GLU, resid 228 and segid B and altLoc >
<Atom 4093: CB of type C of resname GLU, resid 228 and segid B and altLoc >
<Atom 4099: CA of type C of resname GLY, resid 229 and segid B and altLoc >
<Atom 4103: CA of type C of resname ALA, resid 230 and segid B and altLoc >
<Atom 4106: CB of type C of resname ALA, resid 230 and segid B and altLoc >
<Atom 4108: CA of type C of resname PRO, resid 231 and segid B and altLoc >
<Atom 4111: CB of type C of resname PRO, resid 231 and segid B and altLoc >
<Atom 4115: 

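In [121]:
from io import StringIO

from biotite.sequence.io.fasta import FastaFile

# `aln` is not defined in the cells shown here; it is assumed to be the
# biotite sequence alignment of the two structures.
fasta = FastaFile()
fasta["ref"] = aln.get_gapped_sequences()[0]
fasta["mob"] = aln.get_gapped_sequences()[1]

# Write the gapped sequences as FASTA to an in-memory buffer and print them
memfile = StringIO()
fasta.write(memfile)
memfile.seek(0)
print(memfile.getvalue())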

>ref
SLRDPAGIFELVEVVGNGTYGQVYKGRHVKTGQLAAIKVMDVTEDEEEEIKLEINMLKKYSHHRNIATYYGAFIKKSPPG
HDDQLWLVMEFCGAGSITDLVKNTKGNTLKEDWIAYISREILRGLAHLHIHHVIHRDIKGQNVLLTENAEVKLVDFGVSA
QLDRTVGRRNTFIGTPYWMAPEVIACDENPDATYDYRSDLWSCGITAIEMAEGAPPLCDMHPMRALFLIPRNPPPRLKSK
KWSKKFFSFIEGCLVKNYMQRPSTEQLLKHPFIRDQPNERQVRIQLKDHIDRTRKKSLVDIDLSSLRDPAGIFELVEVVG
NGTYGQVYKGRHVKTGQLAAIKVMDVTEDEEEEIKLEINMLKKYSHHRNIATYYGAFIKKS----DDQLWLVMEFCGAGS
ITDLVKNTKGNTLKEDWIAYISREILRGLAHLHIHHVIHRDIKGQNVLLTENAEVKLVDFGVSAQLDRTVGRRNTFIGTP
YWMAPEVIACDENPDATYDYRSDLWSCGITAIEMAEGAPPLCDMHPMRALFLIPRNPPPRLKSKKWSKKFFSFIEGCLVK
NYMQRPSTEQLLKHPFIRDQPNERQVRIQLKDHIDRTRK
>mob
SLRDPAGIFELVEVVGNGTYGQVYKGRHVKTGQLAAIKVMDVTEDEEEEIKLEINMLKKYSHHRNIATYYGAFIKKSPPG
HDDQLWLVMEFCGAGSITDLVKNTKGNTLKEDWIAYISREILRGLAHLHIHHVIHRDIKGQNVLLTENAEVKLVDFGVSA
QLDRTVGRRNTFIGTPYWMAPEVIACDENPDATYDYRSDLWSCGITAIEMAEGAPPLCDMHPMRALFLIPRNPPPRLKSK
KWSKKFFSFIEGCLVKNYMQRPSTEQLLKHPFIRDQPNERQVRIQLKDHIDRTRK-----IDLSSLRDPAGIFELVEVVG
NGTYGQVYKGRHVKTGQLAAIKVMDVTEDEEEEIKLEINMLKKYSHHRNIATYYGAFIK

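In [124]:
from MDAnalysis.analysis.align import fasta2select

# fasta2select reads the alignment from a file, so write the FASTA text to disk first
with open("some.fasta", "w") as f:
    f.write(memfile.getvalue())

# is_aligned=True: the sequences in the file are already aligned (gapped)
d = fasta2select("some.fasta", is_aligned=True)
d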

{'reference': '( resid 1 and ( backbone or name CB ) ) or ( resid 2 and ( backbone or name CB ) ) or ( resid 3 and ( backbone or name CB ) ) or ( resid 4 and ( backbone or name CB ) ) or ( resid 5 and ( backbone or name CB ) ) or ( resid 6 and ( backbone or name CB ) ) or ( resid 7 and backbone ) or ( resid 8 and ( backbone or name CB ) ) or ( resid 9 and ( backbone or name CB ) ) or ( resid 10 and ( backbone or name CB ) ) or ( resid 11 and ( backbone or name CB ) ) or ( resid 12 and ( backbone or name CB ) ) or ( resid 13 and ( backbone or name CB ) ) or ( resid 14 and ( backbone or name CB ) ) or ( resid 15 and ( backbone or name CB ) ) or ( resid 16 and backbone ) or ( resid 17 and ( backbone or name CB ) ) or ( resid 18 and backbone ) or ( resid 19 and ( backbone or name CB ) ) or ( resid 20 and ( backbone or name CB ) ) or ( resid 21 and backbone ) or ( resid 22 and ( backbone or name CB ) ) or ( resid 23 and ( backbone or name CB ) ) or ( resid 24 and ( backbone or name CB ) ) o
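How a gapped alignment turns into selection strings like the ones above can be sketched in plain Python. This is a simplified illustration, not the MDAnalysis implementation. It assumes that residue numbering starts at 1 and is contiguous, and that glycines are matched on the backbone only; the latter is inferred from the output above, where resid 7, 16, 18 and 21 are glycines in the `ref` sequence. The function name `alignment_to_selections` is hypothetical.

```python
def alignment_to_selections(gapped_ref, gapped_mob, gap="-"):
    """Build 'reference' and 'mobile' selection strings from two gapped sequences."""
    if len(gapped_ref) != len(gapped_mob):
        raise ValueError("Gapped sequences must have the same length")
    ref_terms, mob_terms = [], []
    ref_resid = mob_resid = 0  # assumption: residues are numbered 1, 2, 3, ...
    for ref_aa, mob_aa in zip(gapped_ref, gapped_mob):
        # Advance the residue counter of every sequence that has a residue here
        if ref_aa != gap:
            ref_resid += 1
        if mob_aa != gap:
            mob_resid += 1
        # Columns with a gap contribute no atom pairs
        if ref_aa == gap or mob_aa == gap:
            continue
        # Glycine has no CB atom, so use the backbone only if either residue is GLY
        atoms = "backbone" if "G" in (ref_aa, mob_aa) else "( backbone or name CB )"
        ref_terms.append(f"( resid {ref_resid} and {atoms} )")
        mob_terms.append(f"( resid {mob_resid} and {atoms} )")
    return {"reference": " or ".join(ref_terms), "mobile": " or ".join(mob_terms)}


selections = alignment_to_selections("SLG-A", "S-GPA")
print(selections["reference"])
print(selections["mobile"])
```

Using the same atom set on both sides keeps the two selections the same size, which a pairwise RMSD fit requires.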

# Analysis

## NGLview

If you have trouble with NGLview, follow this [troubleshooting guide](https://github.com/SBRG/ssbio/wiki/Troubleshooting#tips-for-nglview).

In [113]:
import nglview as nv
print("nglview version = {}".format(nv.__version__))
# your nglview version should be 1.1.7 or later

view = nv.show_mdanalysis(res["superposed"][0])
view.add_component(res["superposed"][1])
view

nglview version = 2.7.5


NGLWidget()

## Report

* RMSD before and after the alignment
* Coverage
* Which residues are mapped