[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/delalamo/af2_conformations/blob/main/notebooks/mutate_msa.ipynb)

# Conformationally selective AlphaFold predictions by mutagenesis

This notebook provides an interface for predicting protein structures with AlphaFold [1]. It simplifies introducing mutations into the query sequence and the multiple sequence alignment (MSA). **Its intended audience is users familiar with Python.** The code borrows heavily from ColabFold [2] and uses the same MMseqs2 API to retrieve sequence alignments and templates [3,4]. The principle outlined in this notebook is introduced and described in ref. [5]. Users of this notebook should cite these publications (listed below).

This notebook implements an approach, described independently by [Sergey Ovchinnikov](https://twitter.com/sokrypton/status/1464748132852547591) and [Richard A. Stein and Hassane S. Mchaourab](https://www.biorxiv.org/content/10.1101/2021.11.29.470469v1) [5], for concealing and/or modifying inter-residue relationships across the MSA by mutagenesis. This causes AlphaFold2 to sample alternative conformations.
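
To make the idea concrete, the sketch below mutates selected query columns in every sequence of a small A3M-formatted alignment. It is an illustration only, not the `util.mutate_msa` function used later in this notebook: the helper name `mutate_a3m_columns_sketch` is hypothetical, positions are assumed to be 1-based, lowercase letters are treated as A3M insertions that do not occupy a query column, and gaps are assumed to stay gaps.

```python
from typing import Dict


def mutate_a3m_columns_sketch(a3m_text: str, mutations: Dict[int, str]) -> str:
    """Replace residues at the given query columns in every aligned sequence.

    Hypothetical helper for illustration; not the notebook's util.mutate_msa.
    Assumes `mutations` maps 1-based query positions to one-letter codes.
    """
    out = []
    for line in a3m_text.splitlines():
        if line.startswith(">") or not line.strip():
            out.append(line)  # headers and blank lines pass through unchanged
            continue
        chars = list(line)
        column = 0  # number of query (match) columns seen so far
        for i, ch in enumerate(chars):
            if ch.islower():
                continue  # A3M insertion: does not occupy a query column
            column += 1
            if column in mutations and ch != "-":
                chars[i] = mutations[column]  # gaps are left as gaps
        out.append("".join(chars))
    return "\n".join(out)


example = ">query\nMKTAY\n>hit1\nMRtTGY\n>hit2\nM-TAF\n"
print(mutate_a3m_columns_sketch(example, {2: "A", 5: "A"}))
```

In this toy alignment, columns 2 and 5 become alanine in every sequence that has a residue there, so any covariation signal involving those columns is hidden from the model.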

Some notes and caveats:
* Currently only the structures of monomers can be predicted.
* Relax is disabled. If you plan on evaluating these structures using an energy function, be sure to minimize them using OpenMM [6] or Rosetta [7] beforehand.
* We removed many of the bells and whistles of other Colab notebooks, including pLDDT-based model ranking, visualization of sequence alignment coverage, and progress bars.

Models can be downloaded either at the end of the run or incrementally while the program is still running. For the latter, click the folder icon on the left sidebar, hover over the file of interest, click the three vertical dots, and select "download".

In [None]:
#@title Set up Colab environment (1 of 2)
%%bash

# get templates
git clone https://github.com/delalamo/af2_conformations.git
    
# get AF2
git clone https://github.com/deepmind/alphafold.git
pip3 install -r ./alphafold/requirements.txt
    
mv alphafold alphafold_
mv alphafold_/alphafold .
rm -r alphafold_
# remove "END" and "ENDMDL" records from written PDBs, otherwise Biopython complains
sed -i "s/pdb_lines.append('END')//" /content/alphafold/common/protein.py
sed -i "s/pdb_lines.append('ENDMDL')//" /content/alphafold/common/protein.py

# download model params (~1 min)
mkdir params
curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar | tar x -C params

# download libraries for interfacing with MMseqs2 API
apt-get -y update
apt-get -y install jq curl zlib1g gawk

# setup conda
wget -qnc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -bfp /usr/local  2>&1 1>/dev/null
rm Miniconda3-latest-Linux-x86_64.sh

# setup template search
conda install -q -y  -c conda-forge -c bioconda kalign3=3.2.2 hhsuite=3.3.0 python=3.7

In [None]:
#@title Set up Colab environment (2 of 2)

from google.colab import files

from af2_conformations.scripts import predict
from af2_conformations.scripts import util
from af2_conformations.scripts import mmseqs2

import random
import os

from absl import logging
logging.set_verbosity(logging.DEBUG)

Once everything has been installed, the code below can be modified and executed.
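
Before editing the mutation list in the cell below, it may help to check which residues a given mutation dictionary would change. The following is a minimal sketch with a hypothetical helper, `describe_mutations`, that is not part of `af2_conformations`. Whether `util.mutate_msa` interprets positions as 0-based or 1-based is not stated here, so the sketch exposes the convention as an explicit `one_based` argument rather than assuming one.

```python
from typing import Dict, List, Tuple


def describe_mutations(
    sequence: str, mutations: Dict[int, str], one_based: bool = True
) -> List[Tuple[int, str, str]]:
    """Return (position, original residue, new residue) for each mutation.

    Hypothetical helper for illustration; the indexing convention is an
    assumption controlled by `one_based`.
    """
    offset = 1 if one_based else 0
    report = []
    for pos in sorted(mutations):
        idx = pos - offset
        if not 0 <= idx < len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
        report.append((pos, sequence[idx], mutations[pos]))
    return report


# Toy sequence, not MCT1
toy_sequence = "MKTAYIAKQR"
toy_muts = {x: "A" for x in [2, 5]}
for pos, old, new in describe_mutations(toy_sequence, toy_muts):
    print(f"{old}{pos}{new}")
```

Printing the original residue next to each position makes an off-by-one error easy to spot before a prediction run is started.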

In [None]:
# In this example we introduce alanine mutations into the sequence of MCT1
# MCT1 exclusively adopts the inward-facing conformation when no templates are used
# Alanine mutations were placed at the extracellular gate
jobname = 'MCT1'
sequence = ("MPPAVGGPVGYTPPDGGWGWAVVIGAFISIGFSYAFPKSITVFFKEIEGIFHATTSEVSWISS"
            "IMLAVMYGGGPISSILVNKYGSRIVMIVGGCLSGCGLIAASFCNTVQQLYVCIGVIGGLGLAF"
            "NLNPALTMIGKYFYKRRPLANGLAMAGSPVFLCTLAPLNQVFFGIFGWRGSFLILGGLLLNCC"
            "VAGALMRPIGPKPTKAGKDKSKASLEKAGKSGVKKDLHDANTDLIGRHPKQEKRSVFQTINQF"
            "LDLTLFTHRGFLLYLSGNVIMFFGLFAPLVFLSSYGKSQHYSSEKSAFLLSILAFVDMVARPS"
            "MGLVANTKPIRPRIQYFFAASVVANGVCHMLAPLSTTYVGFCVYAGFFGFAFGWLSSVLFETL"
            "MDLVGPQRFSSAVGLVTIVECCPVLLGPPLLGRLNDMYGDYKYTYWACGVVLIISGIYLFIGM"
            "GINYRLLAKEQKANEQKKESKEEETSIDVAGKPNEVTKAAESPDQKDTDGGPKEEESPV" )

# The MMSeqs2Runner object submits the amino acid sequence to
# the MMseqs2 server, creates a directory, and populates it with
# data retrieved from the server.
mmseqs2_runner = mmseqs2.MMSeqs2Runner( jobname, sequence )

# Fetch sequences and download data
a3m_lines, _ = mmseqs2_runner.run_job()

# Define the mutations and introduce into the sequence and MSA
muts = { x: "A" for x in [ 41,42,45,46,56,59,60,63,281,282,285,286,403,407 ] }

mutated_msa = util.mutate_msa( a3m_lines, muts )
mutated_seq = util.mutate_msa( sequence, muts )

for n_model in range( 5 ):

  # Specify the name of the output PDB
  outname = f"model_{ n_model }.pdb"

  predict.predict_structure_no_templates( mutated_seq, outname, mutated_msa )

# To download predictions:
!zip -FSr "af2.zip" *".pdb"
files.download( "af2.zip" )

# References:
1. Jumper et al. "Highly accurate protein structure prediction with AlphaFold" Nature (2021)
2. Mirdita et al. "ColabFold - making protein folding accessible to all" bioRxiv (2021)
3. Steinegger & Söding "MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets" Nature Biotechnology (2017)
4. Mirdita et al. "MMseqs2 desktop and local web server app for fast, integrative sequence searches" Bioinformatics (2019)
5. Stein & Mchaourab "Modeling Alternate Conformations with Alphafold2 via Modification of the Multiple Sequence Alignment" bioRxiv (2021)
6. Eastman et al. "OpenMM 7: Rapid development of high performance algorithms for molecular dynamics" PLoS Computational Biology (2017)
7. Koehler-Leman et al. "Macromolecular modeling and design in Rosetta: recent methods and frameworks" Nature Methods (2020)