# Structural Alignment with Theseus


_Developed at Volkamer Lab, Charité/FU Berlin_

Enes Kurnaz

## Reference

### Theseus

Theseus superposes a set of macromolecular structures simultaneously using the method of maximum likelihood (ML) rather than the conventional least-squares criterion. Theseus assumes that the structures are distributed according to a matrix Gaussian distribution and that the eigenvalues of the atomic covariance matrix are hierarchically distributed according to an inverse gamma distribution. This ML superpositioning model produces much more accurate results by essentially downweighting variable regions of the structures and by correcting for correlations among atoms.


* https://theobald.brandeis.edu/theseus/
* Optimal simultaneous superpositioning of multiple structures with missing data.
  Theobald, Douglas L. & Steindel, Philip A. (2012) Bioinformatics 28 (15): 1972-1979 ([Link](http://bioinformatics.oxfordjournals.org/content/28/15/1972.full.pdf+html))
* Accurate structural correlations from maximum likelihood superpositions.
  Theobald, Douglas L. & Wuttke, Deborah S. (2008) PLOS Computational Biology 4(2):e43 ([Link](http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0040043))
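The downweighting idea in the description above can be illustrated with a much simpler scheme than Theseus's full ML model. The sketch below is a toy iteratively reweighted superposition of two structures. Each atom is weighted by the inverse of its current squared deviation, with a floor so that well-fitting atoms do not get unbounded weight. It ignores the matrix Gaussian covariance, the inverse gamma prior and the multi-structure case, so it is not how Theseus is implemented. The function names, the `floor` value and the synthetic coordinates are all hypothetical.

```python
import numpy as np

# Toy illustration only: NOT the Theseus ML algorithm.

def weighted_kabsch(mobile, target, weights):
    """Superpose `mobile` onto `target` (both (N, 3)), minimizing the weighted sum of squared distances."""
    w = weights / weights.sum()
    mu_m = w @ mobile                       # weighted centroid of the mobile structure
    mu_t = w @ target                       # weighted centroid of the target structure
    m, t = mobile - mu_m, target - mu_t
    h = (m * w[:, None]).T @ t              # weighted 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid returning a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return m @ rot.T + mu_t

def reweighted_superposition(mobile, target, n_iter=20, floor=0.5):
    """Hypothetical helper: weight_i = 1 / max(squared deviation_i, floor), refit, repeat."""
    weights = np.ones(len(mobile))
    for _ in range(n_iter):
        fitted = weighted_kabsch(mobile, target, weights)
        sq_dev = np.sum((fitted - target) ** 2, axis=1)
        weights = 1.0 / np.maximum(sq_dev, floor)  # variable atoms get small weights
    return fitted, weights

def rmsd(a, b):
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Synthetic example: a rigid "core" of 20 atoms plus a 5-atom "flexible loop"
rng = np.random.default_rng(0)
target = rng.normal(scale=5.0, size=(25, 3))
angle = np.deg2rad(30.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0,            0.0,           1.0]])
mobile = target @ rot_z.T + np.array([3.0, -2.0, 1.0])
mobile[20:] += np.array([8.0, 0.0, 0.0])    # move the loop atoms away

core = slice(0, 20)
plain_fit = weighted_kabsch(mobile, target, np.ones(len(mobile)))
weighted_fit, final_weights = reweighted_superposition(mobile, target)

core_rmsd_plain = rmsd(plain_fit[core], target[core])
core_rmsd_weighted = rmsd(weighted_fit[core], target[core])
print(f"core RMSD, equal weights: {core_rmsd_plain:.3f} Å")
print(f"core RMSD, reweighted:    {core_rmsd_weighted:.3f} Å")
```

With equal weights, the displaced loop pulls the whole fit away from the core. The reweighted fit gives the loop atoms tiny weights, so the core is superposed almost exactly.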

## Introduction

In this tutorial we use Theseus through a Python wrapper, the `superposer` package.

### Which structures were chosen

#### Names of the structures 
- 2BBM (Drosophila melanogaster)
- 1CFC (Xenopus laevis)

#### About the structures
- Length: 148 residues
- 97% sequence identity (145/148), 99% similarity
- Both contain calmodulin

### Why have they been chosen

- In 2BBM, the two calcium-binding domains are wrapped around a peptide.
- 1CFC has neither calcium nor a bound peptide, and the linker between its two domains is flexible.


([Source](https://proteopedia.org/wiki/index.php/Structural_alignment_tools#Examples))


## Theory

### RMSD

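The RMSD (root-mean-square deviation) measures how far the atoms of two superposed structures are from each other. It is the square root of the mean squared distance between corresponding atoms, $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^2}$, and is reported in Ångström (Å). Because distances are squared before averaging, a few large deviations raise the RMSD more than they would raise a plain average distance.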
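A minimal NumPy sketch of the RMSD, assuming the two coordinate arrays are already superposed and in corresponding order (row `i` of each array is the same atom). The function name `rmsd` is our own and not part of `superposer`.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two (N, 3) arrays of matched atoms, in the input units (Å)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both structures must have the same number of matched atoms")
    sq_dist = np.sum((a - b) ** 2, axis=1)  # squared distance for each atom pair
    return float(np.sqrt(sq_dist.mean()))

# Two atom pairs, 1 Å and 3 Å apart: the mean distance is 2.0 Å,
# but the RMSD is sqrt((1 + 9) / 2) = sqrt(5) ≈ 2.236 Å.
a = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [[1.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
print(rmsd(a, b))
```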

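### Coverage

The coverage describes how much of the structures is included in the alignment, usually as the fraction of residues (or atoms) that were aligned. A low RMSD over only a small part of the structures can be less meaningful than a slightly higher RMSD over the full chain.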
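Tools define coverage slightly differently. This sketch assumes one common convention: the number of aligned residue pairs divided by the length of the shorter structure. The `coverage` helper is hypothetical and not part of `superposer`.

```python
def coverage(n_aligned, length_a, length_b):
    """Fraction of the shorter structure covered by the alignment (one common convention)."""
    shorter = min(length_a, length_b)
    if shorter <= 0:
        raise ValueError("structure lengths must be positive")
    return n_aligned / shorter

# 2BBM and 1CFC both have 148 residues; if all 148 positions are aligned, coverage is 1.0 (100 %).
print(coverage(148, 148, 148))
```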

## Preparation

### Loading the structures in Python

First, download the two proteins and load them into `superposer`. We do that with `Structure` objects and the `Structure.from_pdbid()` class method.

```python
from superposer.api import Structure

structure1 = Structure.from_pdbid("2BBM")
structure2 = Structure.from_pdbid("1CFC")
```

`2BBM` contains a single model, but `1CFC` contains 25 models (an ensemble of conformations)!

```python
len(structure1.models), len(structure2.models)
```

```
(1, 25)
```

We only need one conformation of `1CFC`, so we take its first model, which is an `AtomGroup`, and build a new `Structure` from it:

```python
structure2_onemodel = Structure.from_atomgroup(structure2.models[0])
```

## Alignment 


First we import `TheseusAligner` and create an instance. Then we call its `.calculate()` method with a list of the structures to superpose: `structure1` and the single-model `structure2_onemodel` created above. The method returns a dictionary with the RMSD under `scores` and the Theseus statistics under `metadata`.

```python
from superposer.superposition.theseus import TheseusAligner

aligner = TheseusAligner()
results = aligner.calculate([structure1, structure2_onemodel])
results
```

```
{'scores': {'rmsd': 9.02821},
 'metadata': {'least_squares': 2.60622,
  'maximum_likelihood': 1.70454,
  'log_marginal_likelihood': -2029.66,
  'aic': -2973.42,
  'bic': -3580.94,
  'omnibus_chi_square': 3.82,
  'hierarchical_var_chi_square': 1.36,
  'rotational_translational_covar_chi_square': 3.82,
  'hierarchical_minimum_var': 0.762,
  'hierarchical_minimum_sigma': 0.873,
  'skewness': -0.0,
  'skewness_z': 0.0,
  'kurtosis': -1.11,
  'kurtosis_z': 6.74,
  'data_pts': 888.0,
  'free_params': 457.0,
  'd_p': 1.9,
  'median_structure': 1.0,
  'n_total': 296.0,
  'n_atoms': 148.0,
  'n_structures': 2.0,
  'total_rounds': 28.0,
  'transformation': {1: array([[-0.9958405, -0.0754213, -0.0511206, -0.8419   ],
          [ 0.0871827, -0.9517875, -0.2941084, -0.2643   ],
          [-0.0264739, -0.2973418,  0.954404 , -0.9037   ]]),
   2: array([[ 0.9509416, -0.3091028,  0.012868 , -9.6969   ],
          [ 0.2687773,  0.8048547, -0.5291197, -4.9858   ],
          [ 0.1531955,  0.5066206,  0.8
```
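Each entry of `metadata['transformation']` appears to be a 3×4 matrix: a 3×3 rotation block followed by a translation column. A quick sanity check is that the rotation block is orthonormal with determinant +1. The sketch below copies the matrix for structure 1 from the output above. The hypothetical `apply_transform` helper assumes the convention x' = R·x + t. We have not checked this against the Theseus or `superposer` documentation, and Theseus may instead apply the translation before the rotation.

```python
import numpy as np

# 3x4 matrix for structure 1, copied from the output above: [R | t]
transform_1 = np.array([
    [-0.9958405, -0.0754213, -0.0511206, -0.8419],
    [ 0.0871827, -0.9517875, -0.2941084, -0.2643],
    [-0.0264739, -0.2973418,  0.954404 , -0.9037],
])

rotation = transform_1[:, :3]
translation = transform_1[:, 3]

# A proper rotation satisfies R @ R.T == I and det(R) == +1 (up to rounding of the printed digits)
is_orthonormal = np.allclose(rotation @ rotation.T, np.eye(3), atol=1e-4)
det = np.linalg.det(rotation)

def apply_transform(coords, transform):
    """Apply a 3x4 [R | t] matrix to (N, 3) coordinates.

    ASSUMPTION: x' = R @ x + t. Check the Theseus/superposer convention before relying on this.
    """
    r, t = transform[:, :3], transform[:, 3]
    return np.asarray(coords, dtype=float) @ r.T + t

print(is_orthonormal, round(float(det), 4))
```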


To get the same statistics for the structures without superposing them, as a baseline for comparison, we call `run_theseus_different_no_superposition()`:

```python
aligner.run_theseus_different_no_superposition([structure1, structure2_onemodel])
```

```
{'scores': {'rmsd': 34.88939},
 'metadata': {'least_squares': 10.0717,
  'maximum_likelihood': 6.41057,
  'log_marginal_likelihood': -3381.82,
  'aic': -4325.57,
  'bic': -4933.1,
  'omnibus_chi_square': 7.65,
  'hierarchical_var_chi_square': 5.38,
  'rotational_translational_covar_chi_square': 7.65,
  'hierarchical_minimum_var': 7.21,
  'hierarchical_minimum_sigma': 2.68,
  'skewness': -0.0,
  'skewness_z': 0.0,
  'kurtosis': -0.9,
  'kurtosis_z': 5.46,
  'data_pts': 888.0,
  'free_params': 457.0,
  'd_p': 1.9,
  'median_structure': 1.0,
  'n_total': 296.0,
  'n_atoms': 148.0,
  'n_structures': 2.0,
  'total_rounds': 101.0}}
```
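To compare the two runs side by side, a few of the printed values can be collected into flat dictionaries. In the real output, `rmsd` sits under `scores` and the other values under `metadata`. The sketch below copies the numbers by hand. In practice you would keep the returned dictionary in a variable, for example `no_superposition = aligner.run_theseus_different_no_superposition([structure1, structure2_onemodel])`. The helper `compare_runs` is hypothetical.

```python
def compare_runs(before, after, keys):
    """Return {key: (before, after, relative change)} for the selected keys."""
    table = {}
    for key in keys:
        b, a = before[key], after[key]
        table[key] = (b, a, (a - b) / b)
    return table

# Values copied from the two outputs above (flattened from 'scores' and 'metadata')
no_superposition = {"rmsd": 34.88939, "least_squares": 10.0717, "maximum_likelihood": 6.41057}
superposed = {"rmsd": 9.02821, "least_squares": 2.60622, "maximum_likelihood": 1.70454}

keys = ["rmsd", "least_squares", "maximum_likelihood"]
table = compare_runs(no_superposition, superposed, keys)
for key, (b, a, change) in table.items():
    print(f"{key:>20}: {b:9.3f} -> {a:9.3f} ({change:+.1%})")
```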

## Analysis

### NGLview

If you have trouble with NGLview, follow this [troubleshooting guide](https://github.com/SBRG/ssbio/wiki/Troubleshooting#tips-for-nglview).

```python
import nglview as nv

# Show the first superposed structure, then add the second one to the same view
view = nv.show_mdanalysis(results["superposed"][0].atoms)
view.add_component(results["superposed"][1].atoms)
view
```


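In this case, the result is only moderately good: superposition lowers the RMSD from 34.9 Å to 9.0 Å. A residual RMSD this large is consistent with the conformational difference described in the introduction. In 2BBM the two domains are wrapped around a peptide, while in 1CFC the linker between the domains is flexible, so a single rigid-body superposition cannot fit both domains at once.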