
Highly accurate protein structure prediction with AlphaFold

HEADER

\setbeamerfont{large}{size=\large}

START [0/2]

LOG

[.] include slide with repo, colab links

[.] train todo

Predictions of side chain $\chi$ angles as well as the final, per-residue accuracy of the structure (pLDDT) are computed with small per-residue networks on the final activations at the end of the network. The estimate of the TM-score (pTM) is obtained from a pairwise error prediction that is computed as a linear projection from the final pair representation. The final loss (that we term the frame-aligned point error (FAPE) (Fig. 3f)) compares the predicted atom positions to the true positions under many different alignments. For each alignment, defined by aligning the predicted frame $(R_k, t_k)$ to the corresponding true frame, we compute the distance of all predicted atom positions $x_i$ from the true atom positions. The resulting $N_{frames} \times N_{atoms}$ distances are penalized with a clamped L1-loss. This creates a strong bias for atoms to be correct relative to the local frame of each residue and hence correct with respect to its side chain interactions, as well as providing the main source of chirality for AlphaFold (Suppl. Methods 1.9.3 and Suppl. Fig. 9).

makes effective use of the unlabelled sequence data and significantly improves the accuracy of the resulting network. Additionally, we randomly mask out or mutate individual residues within the MSA and have a Bidirectional Encoder Representations from Transformers (BERT)-style objective to predict the masked elements of the MSA sequences. This objective encourages the network to learn to interpret phylogenetic and covariation relationships without hardcoding a particular correlation statistic into the features. The BERT objective is trained jointly with the normal PDB structure loss on the same training examples and is not pre-trained, in contrast to recent independent work.

refs

snips

CODE [0/0]

Results

Supplemental figures

Methods

Model training and evaluation

Introduction

Introduction

./imgs/profilepic2.jpg

Willy Rempel
  • HBSc Computer Science
  • BSc Mathematics
  • Research Associate, AISC
  • seeking opportunities in the field

Introduction

Although all of the ideas in the model are doubtlessly clever, the main secret behind AlphaFold 2’s success is the superb deep learning engineering. A close look at the model reveals an architecture with a large amount of small details that seem fundamental for the performance of the network. As we admire the end product, we should not turn a blind eye to the enormous budget, and the large team of full-time, handsomely paid engineers that made it possible. cite:rubieraAlphaFoldHereWhat

This, and many other tricks, are described in exhaustive detail in the Supplementary Information. A reduced subset has been analysed in a brief ablation study, but ultimately, how important are each of the minor details is anybody’s guess. cite:rubieraAlphaFoldHereWhat

\flushright{(above blog post is recommended reading)}

Model Overview cite:jumperHighlyAccurateProtein2021

./imgs/model-overview.png

Data Pipeline

Initial Input: mmCIF or FASTA files

./imgs/mmcif-eg.png

Initial Input: mmCIF or FASTA files

./imgs/fastafiles_2021-07-20.png

Parsing cite:jumperHighlyAccurateProtein2021

  • only certain metadata fields are parsed (mmCIF provides more than FASTA)
  • MSE (selenomethionine) residues are changed to MET

[.] Genetic Search cite:jumperHighlyAccurateProtein2021

For MSAs

  • JackHMMER
    • MGnify: MSA depth 5,000
    • UniRef90: MSA depth 10,000
  • HHBlits
    • Uniclust30 + BFD: MSA depth unlimited
  • MSAs are deduplicated and stacked

flags:
JackHMMER: -N 1 -E 0.0001 --incE 0.0001 --F1 0.0005 --F2 0.00005 --F3 0.0000005
HHBlits: -n 3 -e 0.001 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -maxseq 1000000
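
For concreteness, here is a hypothetical Python wrapper that would invoke a local jackhmmer binary with exactly the flags quoted above; the binary, query, and database paths are placeholders, not the paper's actual pipeline code.

#+begin_src python
# Hypothetical wrapper around a local jackhmmer binary; paths are placeholders.
import subprocess

def run_jackhmmer(query_fasta: str, database: str, out_sto: str) -> None:
    cmd = [
        "jackhmmer",
        "-N", "1",                # a single search iteration
        "-E", "0.0001",           # reporting e-value threshold
        "--incE", "0.0001",       # inclusion e-value threshold
        "--F1", "0.0005", "--F2", "0.00005", "--F3", "0.0000005",  # prefilters
        "-A", out_sto,            # save the alignment in Stockholm format
        query_fasta, database,
    ]
    subprocess.run(cmd, check=True)
#+end_src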

Template Search cite:jumperHighlyAccurateProtein2021

  • UniRef90 MSA from prior search used for PDB70 search using HHSearch.
  • Filter out templates that are:
    • released after the input sequence
    • identical to the input sequence
    • too small
  • At inference use top 4 templates

Training Data cite:jumperHighlyAccurateProtein2021

  • 75:25 self-distillation : known structure (PDB)
  • stochastic filters (next)

Filtering cite:jumperHighlyAccurateProtein2021

  • stochastic filters:
    • Input mmCIFs are restricted to have resolution less than 9 Å. This is not a very restrictive filter and only removes around 0.2% of structures.
    • Longer protein chains are selected with higher probability.
    • Also favour protein chains from smaller clusters. They use 40% sequence identity clusters of the Protein Data Bank, clustered with MMseqs2.
    • Sequences are filtered out when any single amino acid accounts for more than 80% of the input primary sequence. This filter removes about 0.8% of sequences.

MSA block deletion cite:jumperHighlyAccurateProtein2021

  • Block deletion tends to remove similarities (i.e. whole phylogenetic branches) and promote diversity
    • Similar sequences are likely to be adjacent in the MSA
    • MSAs are first grouped by search tool
    • then sorted by each tool's default ordering (usually e-value)
    • contiguous blocks are then deleted

Algorithm 1 MSA Block deletion cite:jumperHighlyAccurateProtein2021

./imgs/algo1_block_deletion.png
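
A minimal NumPy sketch of the procedure above; the block fraction and block count are the hyperparameters of Algorithm 1 (the defaults used here are assumptions), and keeping the query row unconditionally is a safety assumption of this sketch.

#+begin_src python
# Minimal NumPy sketch of contiguous block deletion (cf. Algorithm 1).
import numpy as np

def block_delete_msa(msa: np.ndarray, block_fraction: float = 0.3,
                     num_blocks: int = 5, rng=np.random) -> np.ndarray:
    """msa: [n_seq, n_res] array of sequence rows, sorted as described above."""
    n_seq = msa.shape[0]
    block_size = int(block_fraction * n_seq)
    starts = rng.randint(0, n_seq, size=num_blocks)  # uniform block start rows
    delete = np.zeros(n_seq, dtype=bool)
    for s in starts:
        delete[s:s + block_size] = True              # mark a contiguous block
    delete[0] = False                                # never drop the query row
    return msa[~delete]
#+end_src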

MSA clustering cite:jumperHighlyAccurateProtein2021

  • Similarity clusters are used to randomly select a subset of MSA sequences
    • reduces $N_{seq}$ and hence the computational cost of the attention modules
  • A modified K-means is used, with the input sequence as the first cluster center


Clustering Algorithm:

  1. $N_{clust}$ cluster centers are selected from the MSA
  2. Generate a mask where each position is selected with p = 0.15
  3. Each center is modified at every masked position according to (sketched below):
    1. p = 0.1: replaced with a uniformly sampled random amino acid
    2. p = 0.1: replaced with an amino acid sampled from the MSA profile
    3. p = 0.1: not replaced
    4. p = 0.7: replaced with a special token (masked_msa_token)
  4. Remaining sequences are assigned to the nearest cluster center by Hamming distance
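
A minimal NumPy sketch of the step-3 masking scheme; the token ids and the masked_msa_token value are illustrative, and the profile rows are assumed to sum to 1.

#+begin_src python
# Sketch of the BERT-style MSA masking, applied to an integer-encoded MSA.
import numpy as np

NUM_AA = 20
MASKED_MSA_TOKEN = 21  # illustrative id for the special mask token

def mask_msa(msa, profile, rng=np.random):
    """msa: [n_seq, n_res] integer ids; profile: [n_res, NUM_AA], rows sum to 1."""
    sel = rng.rand(*msa.shape) < 0.15                 # positions hit by the mask
    r = rng.rand(*msa.shape)                          # chooses the replacement rule
    out = msa.copy()
    uniform = rng.randint(0, NUM_AA, size=msa.shape)  # rule 1: random amino acid
    cdf = profile.cumsum(-1)                          # rule 2: sample from profile
    from_profile = (rng.rand(*msa.shape)[..., None] > cdf).sum(-1)
    out[sel & (r < 0.1)] = uniform[sel & (r < 0.1)]
    rule2 = sel & (r >= 0.1) & (r < 0.2)
    out[rule2] = from_profile[rule2]
    # rule 3 (0.2 <= r < 0.3): keep the original residue, nothing to do
    out[sel & (r >= 0.3)] = MASKED_MSA_TOKEN          # rule 4, p = 0.7
    return out, sel                                   # sel marks BERT targets
#+end_src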

Residue cropping cite:jumperHighlyAccurateProtein2021

During training:

  1. for both clamped and unclamped FAPE examples, the crop start index is sampled from a uniform distribution
  2. crops have a fixed size $N_{res}$

[.] Featurization and model inputs cite:jumperHighlyAccurateProtein2021

  • target_feat
    This is a feature of size [Nres, 21] consisting of the “aatype” feature.
  • residue_index
    Positional encoding constant tensor. This is a feature of size [Nres] consisting of the “residue_index” feature.
  • msa_feat
    This is a feature of size [Nclust, Nres, 49] constructed by concatenating “cluster_msa”, “cluster_has_deletion”, “cluster_deletion_value”, “cluster_deletion_mean”, “cluster_profile”. We draw Ncycle×Nensemble random samples from this feature to provide each recycling/ensembling iteration of the network with a different sample (see subsubsection 1.11.2).
  • extra_msa_feat
    This is a feature of size [Nextra_seq, Nres, 25] constructed by concatenating “extra_msa”, “extra_msa_has_deletion”, “extra_msa_deletion_value”. Together with “msa_feat” above we also draw Ncycle × Nensemble random samples from this feature (see subsubsection 1.11.2).

[.] Featurization and model inputs cite:jumperHighlyAccurateProtein2021

  • template_pair_feat
    This is a feature of size [Ntempl, Nres, Nres, 88], consisting of the concatenation of the pair residue features “template_distogram” and “template_unit_vector”, plus several residue features transformed into pair features. The “template_aatype” feature is included via tiling and stacking (done twice, once in each residue direction). Also included are the mask features “template_pseudo_beta_mask” and “template_backbone_frame_mask”, where the pair feature is $f_{ij} = \text{mask}_i \cdot \text{mask}_j$.
  • template_angle_feat
    This is a feature of size [Ntempl, Nres, 51] constructed by concatenating the following features: “template_aatype”, “template_torsion_angles”, “template_alt_torsion_angles”, and “template_torsion_angles_mask”.

Table 1 Input Features (1/2) cite:jumperHighlyAccurateProtein2021

./imgs/table1_inputs_1.png

Table 1 Input Features (2/2) cite:jumperHighlyAccurateProtein2021

./imgs/table1_inputs_2.png

Self-distillation dataset cite:jumperHighlyAccurateProtein2021

  • Build dataset (on unlabeled sequences):
    1. Make an MSA for every cluster in Uniclust30
    2. Remove sequences that appear in another sequence's MSA
    3. Keep sequences with 200 < length < 1024
    4. Remove sequences whose MSA has fewer than 200 alignments
  • For predicted structures:
    • train an ‘undistilled’ model on just the PDB dataset
    • use this model to predict structures for the above set
    • for every residue pair, compute a confidence metric using the KL-divergence between the predicted distance distribution and a reference distribution
    • reference distribution
  • self-distillation training took roughly 2 weeks

AlphaFold Inference

AlphaFold Inference cite:jumperHighlyAccurateProtein2021

  • AlphaFold receives input features derived from:
    • the amino-acid sequence
    • MSA
    • templates (see subsubsection 1.2.9)
  • outputs features:
    • atom coordinates
    • the distogram
    • per-residue confidence scores.
  • Recycling x3
    • initial recycled inputs are zero

Algorithm 2 outlines the main steps (see also Fig 1e and the corresponding description in the main article).

Algorithm 2 Model Inference cite:jumperHighlyAccurateProtein2021

./imgs/algo2_cut_part1.png

Algorithm 2 Model Inference cite:jumperHighlyAccurateProtein2021

./imgs/algo2_cut_part2.png

AlphaFold Training cite:jumperHighlyAccurateProtein2021

./imgs/table4_model_training.png

Model Architecture

Input embeddings cite:jumperHighlyAccurateProtein2021

./imgs/input_embeddings.png

Algorithm 3 Input embeddings cite:jumperHighlyAccurateProtein2021

./imgs/algo3_input_embed.png

Algorithm 4 Relative positional encoding cite:jumperHighlyAccurateProtein2021

./imgs/algo4_pos_encoding.png

Algorithm 5 One-hot encoding with nearest bin cite:jumperHighlyAccurateProtein2021

./imgs/algo5_1hot_bin_encode.png

EvoFormer

EvoFormer: Overview cite:jumperHighlyAccurateProtein2021

./imgs/model-evoformer-main.png

EvoFormer: Overview cite:jumperHighlyAccurateProtein2021

  • cast as a graph inference problem
  • cross-optimization and information flow between MSA representation and pair-wise representation
  • layer normalization

Algorithm 6 EvoFormer stack cite:jumperHighlyAccurateProtein2021

./imgs/algo6_evoformer.png

EvoFormer: Row wise Gated Attention cite:jumperHighlyAccurateProtein2021

./imgs/rowwise-gated-attention.png

Algorithm 7 Row wise Gated Attention cite:jumperHighlyAccurateProtein2021

./imgs/algo7_evo_row_attention.png
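
A rough NumPy sketch of row-wise gated self-attention with pair bias (cf. Algorithm 7). Random weights stand in for the learned projections; the layer norms and the final output projection are omitted.

#+begin_src python
# m: [n_seq, n_res, c_m] MSA representation, z: [n_res, n_res, c_z] pair rep.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def row_attention_with_pair_bias(m, z, n_head=8, c=32, rng=np.random):
    c_m, c_z = m.shape[-1], z.shape[-1]
    wq, wk, wv = (rng.randn(n_head, c_m, c) / np.sqrt(c_m) for _ in range(3))
    wb = rng.randn(c_z, n_head) / np.sqrt(c_z)   # pair-bias projection
    wg = rng.randn(n_head, c_m, c)               # gate projection
    q = np.einsum('sic,hcd->hsid', m, wq) / np.sqrt(c)
    k = np.einsum('sic,hcd->hsid', m, wk)
    v = np.einsum('sic,hcd->hsid', m, wv)
    b = np.einsum('ijc,ch->hij', z, wb)          # bias from the pair representation
    a = softmax(np.einsum('hsid,hsjd->hsij', q, k) + b[:, None], axis=-1)
    g = 1 / (1 + np.exp(-np.einsum('sic,hcd->hsid', m, wg)))  # sigmoid gate
    o = g * np.einsum('hsij,hsjd->hsid', a, v)
    # concatenate heads; the learned output projection back to c_m is omitted
    return o.transpose(1, 2, 0, 3).reshape(m.shape[0], m.shape[1], n_head * c)
#+end_src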

EvoFormer: Column wise Gated Attention cite:jumperHighlyAccurateProtein2021

./imgs/columnwise-gated-attention.png

Algorithm 8 Column wise Gated Attention cite:jumperHighlyAccurateProtein2021

./imgs/algo8_evo_col_attention.png

EvoFormer: MSA Translation Layer cite:jumperHighlyAccurateProtein2021

./imgs/msa-translation-layer.png

Algorithm 9 MSA Translation Layer cite:jumperHighlyAccurateProtein2021

./imgs/algo9_msa_transition.png

EvoFormer: Outer-Product Mean cite:jumperHighlyAccurateProtein2021

./imgs/outer-product-mean.png

Algorithm 10 Outer-Product Mean cite:jumperHighlyAccurateProtein2021

./imgs/algo10_outer_product.png
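
A NumPy sketch of the outer-product mean (cf. Algorithm 10): the MSA representation is projected to two small channel sets, their outer product is averaged over sequences, and the result is projected to the pair-representation width. The weight matrices are random stand-ins for the learned linears; layer norm is omitted.

#+begin_src python
import numpy as np

def outer_product_mean(m, c_hidden=32, c_z=128, rng=np.random):
    """m: [n_seq, n_res, c_m] -> pair update of shape [n_res, n_res, c_z]."""
    c_m = m.shape[-1]
    wa = rng.randn(c_m, c_hidden) / np.sqrt(c_m)
    wb = rng.randn(c_m, c_hidden) / np.sqrt(c_m)
    wo = rng.randn(c_hidden * c_hidden, c_z) / c_hidden
    a = m @ wa                                   # [n_seq, n_res, c_hidden]
    b = m @ wb
    # mean over sequences of the outer product a_i (x) b_j
    o = np.einsum('sic,sjd->ijcd', a, b) / m.shape[0]
    return o.reshape(o.shape[0], o.shape[1], -1) @ wo
#+end_src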

EvoFormer: Residue Pairs cite:jumperHighlyAccurateProtein2021

./imgs/model-evoformer-pair1.png

./imgs/model-evoformer-pair2.png

EvoFormer: Triangular Multiplicative Update cite:jumperHighlyAccurateProtein2021

./imgs/triangular-mult-update.png

Algorithm 11 Triangular Multiplicative Update: outward cite:jumperHighlyAccurateProtein2021

./imgs/algo11_triangular_out.png

Algorithm 12 Triangular Multiplicative Update: inward cite:jumperHighlyAccurateProtein2021

./imgs/algo12_triangular_in.png
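
A NumPy sketch of the "outgoing" update (cf. Algorithm 11); the "incoming" variant (Algorithm 12) contracts over $a_{ki} b_{kj}$ instead. Layer norms are omitted and the weights are random stand-ins for the learned linears.

#+begin_src python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def triangle_multiply_outgoing(z, c=128, rng=np.random):
    """z: [n_res, n_res, c_z] pair representation."""
    c_z = z.shape[-1]
    wa, wb = (rng.randn(c_z, c) / np.sqrt(c_z) for _ in range(2))
    wag, wbg = (rng.randn(c_z, c) / np.sqrt(c_z) for _ in range(2))
    wg = rng.randn(c_z, c_z) / np.sqrt(c_z)   # output gate
    wo = rng.randn(c, c_z) / np.sqrt(c)
    a = sigmoid(z @ wag) * (z @ wa)           # gated edge features a_ik
    b = sigmoid(z @ wbg) * (z @ wb)           # gated edge features b_jk
    # outgoing-edge contraction: sum_k a_ik * b_jk, elementwise over channels
    triangle = np.einsum('ikc,jkc->ijc', a, b)
    return sigmoid(z @ wg) * (triangle @ wo)  # gated projection back to c_z
#+end_src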

EvoFormer: Triangular Self-Attention cite:jumperHighlyAccurateProtein2021

./imgs/triangular-self-attention.png

Algorithm 13 Triangular Self-Attention: start cite:jumperHighlyAccurateProtein2021

./imgs/algo13_triangular_attention_start.png

Algorithm 14 Triangular Self-Attention: end cite:jumperHighlyAccurateProtein2021

./imgs/algo14_triangular_end.png

Algorithm 15 Transition layer in the pair stack cite:jumperHighlyAccurateProtein2021

./imgs/algo15_pair_transition.png

Algorithm 16 Template pair stack cite:jumperHighlyAccurateProtein2021

./imgs/algo16_template_pair_stack.png

Algorithm 17 Template pointwise attention cite:jumperHighlyAccurateProtein2021

./imgs/algo17_template_point_attention.png

Algorithm 18 Extra MSA stack cite:jumperHighlyAccurateProtein2021

./imgs/algo18_extra_msa_stack.png

Algorithm 19 MSA global column-wise gated self-attention cite:jumperHighlyAccurateProtein2021

./imgs/algo19_msa_global_col_attention.png

Structure Module

Structure Module: Overview cite:jumperHighlyAccurateProtein2021

./imgs/model-structure.png

Structure Module: Frame Representation

./imgs/TransformationMatrix1.png

  • rotation + translation transforms $T_i := (R_i,t_i)$
  • no reflection, scaling, or shear
  • they construct ground-truth frames from the positions of three atoms in the ground-truth PDB structures, using a Gram–Schmidt process (Algorithm 21, sketched below)
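
Algorithm 21 is compact enough to transcribe directly; a NumPy version of rigidFrom3Points, taking the N, CA and C positions of one residue:

#+begin_src python
import numpy as np

def rigid_from_3_points(x1, x2, x3):
    """x1=N, x2=CA, x3=C, each of shape [3]; returns rotation R and origin t."""
    v1 = x3 - x2
    v2 = x1 - x2
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - e1 * (e1 @ v2)          # remove the component along e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)             # right-handed frame fixes chirality
    R = np.stack([e1, e2, e3], axis=-1)
    return R, x2                      # translation is the CA position
#+end_src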

Structure Module: Invariant point attention (IPA) cite:jumperHighlyAccurateProtein2021

./imgs/ipa.png

Structure Module: Algorithm Part 1 cite:jumperHighlyAccurateProtein2021

./imgs/algo20-part1.png

Structure Module: Algorithm Part 2 cite:jumperHighlyAccurateProtein2021

./imgs/algo20-part2.png

Structure Module: Algorithm Part 3 cite:jumperHighlyAccurateProtein2021

./imgs/algo20-part3.png

Table 2 Rigid atomic groups from torsion angles cite:jumperHighlyAccurateProtein2021

./imgs/table2_rigid_atom_groups.png

Algorithm 21 Frame construction from ground truth atom positions cite:jumperHighlyAccurateProtein2021

./imgs/algo21_gram_schmidt.png

Algorithm 22 Invariant point attention (IPA) cite:jumperHighlyAccurateProtein2021

./imgs/algo22_ipa.png

Algorithm 23 Backbone update cite:jumperHighlyAccurateProtein2021

./imgs/algo23_backbone-update.png

Algorithm 24 Compute all atom coordinates cite:jumperHighlyAccurateProtein2021

./imgs/algo24_all-atom-coords-algo.png

Algorithm 25 Make a transformation that rotates around the x-axis cite:jumperHighlyAccurateProtein2021

./imgs/algo25_xaxis-transform.png

Table 3 Ambiguous atom renaming swaps cite:jumperHighlyAccurateProtein2021

./imgs/table3_atom_renaming.png

Algorithm 26 Rename symmetric ground truth atoms cite:jumperHighlyAccurateProtein2021

./imgs/algo26_rename-truth-atoms.png

Distograms cite:jumperHighlyAccurateProtein2021

./imgs/distogram_example.png

Structure Module: Output

  • predicts backbone frames $T_i$ and torsion angles $α^f_i$
  • then computes atom coordinates by applying the torsion angles to the corresponding amino acid structure with idealized bond angles and bond lengths.
  • We attach a local frame to each rigid group (see Table 2), such that the torsion axis is the x-axis, and store the ideal literature atom coordinates [97] for each amino acid relative to these frames in a table $\tilde{x}^{\text{lit}}_{r,f,a}$, where $r \in \{\text{ALA}, \text{ARG}, \text{ASN}, \dots\}$ denotes the residue type, $f \in S_{\text{torsion names}}$ denotes the frame, and $a$ the atom name. We further pre-compute rigid transformations that map atom coordinates from each frame to the frame one higher up in the hierarchy, e.g. $T^{\text{lit}}_{r,(\chi_2 \to \chi_1)}$ maps atoms of amino-acid type $r$ from the $\chi_2$-frame to the $\chi_1$-frame. As we are only predicting heavy atoms, the extra backbone rigid groups $\omega$ and $\phi$ do not contain atoms, but the corresponding frames contribute to the FAPE loss for alignment to the ground truth (like all other frames). cite:jumperHighlyAccurateProtein2021

Amber Relaxation cite:jumperHighlyAccurateProtein2021

  • Iterative restrained energy minimization procedure:
    • minimization of the AMBER99SB force field
      • additional harmonic restraints (keep system near its input structure)
      • restraints applied independently to heavy atoms, with a spring constant of 10 kcal/mol/Å²
  • Once minimizer has converged, determine which residues still contain violations
    • remove restraints from all atoms within these residues and perform restrained minimization once again
    • process repeated until all violations are resolved.
  • Note: In the CASP14 assessment only one iteration was used
  • Full energy minimization and hydrogen placement were performed using the OpenMM simulation package (a minimal sketch follows below)
    • tolerance of 2.39 kcal/mol and an unlimited number of steps (OpenMM default values)
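
A sketch of one restrained-minimization round in OpenMM, assuming a PDB file of the prediction as input; the file name is a placeholder and the violation-check loop from the procedure above is omitted.

#+begin_src python
from openmm import app, unit
import openmm

pdb = app.PDBFile("prediction.pdb")                    # hypothetical input file
ff = app.ForceField("amber99sb.xml")
model = app.Modeller(pdb.topology, pdb.positions)
model.addHydrogens(ff)                                 # hydrogen placement
system = ff.createSystem(model.topology, constraints=None)

# Harmonic restraints tether heavy atoms to their input coordinates
# (spring constant 10 kcal/mol/Å², as above).
k = 10.0 * unit.kilocalories_per_mole / unit.angstroms**2
restraint = openmm.CustomExternalForce(
    "0.5 * k * ((x-x0)^2 + (y-y0)^2 + (z-z0)^2)")
restraint.addGlobalParameter("k", k)
for name in ("x0", "y0", "z0"):
    restraint.addPerParticleParameter(name)
for atom in model.topology.atoms():
    if atom.element.symbol != "H":
        restraint.addParticle(atom.index, model.positions[atom.index])
system.addForce(restraint)

sim = app.Simulation(model.topology, system,
                     openmm.LangevinIntegrator(300, 1.0, 0.002))
sim.context.setPositions(model.positions)
sim.minimizeEnergy(tolerance=2.39 * unit.kilocalories_per_mole,
                   maxIterations=0)                    # 0 = unlimited steps
#+end_src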

Loss Functions

Loss Functions cite:jumperHighlyAccurateProtein2021

./imgs/loss-eq.png

  • the total loss is a weighted sum of the individual losses
  • each example's loss is additionally scaled by $\sqrt{N_{res}}$ (after cropping) to reduce the relative importance of short sequences

Loss Functions & Auxiliary Heads cite:jumperHighlyAccurateProtein2021

  1. Side chain and backbone torsion angle loss (sec. 1.9.1)
  2. Frame aligned point error (FAPE) (sec. 1.9.2)
    • Configurations with FAPE(X,Y) = 0 (sec. 1.9.4)
    • Metric properties of FAPE (sec. 1.9.5)
  3. Chiral properties of AlphaFold and its loss (sec. 1.9.3)
    • the transforms $T_i$ are not invariant under reflection (see eqs. 11 to 17)
    • atom positions via backbone frames and $χ$ angles
  4. Model confidence prediction (pLDDT) (sec. 1.9.6)
  5. TM-score prediction (sec. 1.9.7)
  6. Distogram prediction (sec. 1.9.8)
  7. Masked MSA prediction (sec. 1.9.9)
  8. “Experimentally resolved” prediction (sec. 1.9.10)
  9. Structural violations (sec. 1.9.11)

Algorithm 27 Side chain and backbone torsion angle loss cite:jumperHighlyAccurateProtein2021

./imgs/algo27_sidechain-backbonetorsion-loss.png

Loss Functions: FAPE

./imgs/algo28_fape.png

  • Variant of the commonly used root-mean-squared deviation (RMSD) of atomic positions
  • unlike RMSD, it is not invariant to reflections, which prevents proteins of the wrong chirality cite:rubieraAlphaFoldHereWhat, cite:jumperHighlyAccurateProtein2021
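
A NumPy sketch of the clamped FAPE computation (cf. Algorithm 28 above), using the paper's 10 Å clamp and 10 Å normalization: every atom is expressed in every residue's local frame before taking distances.

#+begin_src python
import numpy as np

def fape(R_pred, t_pred, x_pred, R_true, t_true, x_true,
         d_clamp=10.0, z=10.0, eps=1e-8):
    """R_*: [n_frames, 3, 3], t_*: [n_frames, 3], x_*: [n_atoms, 3]."""
    # express every atom in every local frame: x_local = R^T (x - t)
    local_pred = np.einsum('kji,kaj->kai', R_pred, x_pred[None] - t_pred[:, None])
    local_true = np.einsum('kji,kaj->kai', R_true, x_true[None] - t_true[:, None])
    # N_frames x N_atoms distances, clamped and scaled (an L1-style penalty)
    d = np.sqrt(((local_pred - local_true) ** 2).sum(-1) + eps)
    return np.clip(d, None, d_clamp).mean() / z
#+end_src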

Algorithm 29 Predict model confidence pLDDT cite:jumperHighlyAccurateProtein2021

./imgs/algo29_confidence-pLDDT.png

TM-score prediction cite:jumperHighlyAccurateProtein2021

  • Global superposition metric of $C_α$ atoms (eqs. 31-33)
    1. approximated (eqs. 34-36)
    2. probabilistic lower-bound maximum-of-expectation score (eqs. 37-38)
    3. TM-score approximated using a pairwise $C_α$-based computation (the $e_{ij}$ matrix) and the above (see eq. 39 and adjacent text)
    4. the TM-score of any residue subset $D$ can be computed (eq. 40)
      • the $e_{ij}$ matrix could also be used to estimate GDT, FAPE, and RMSD (not done)
      • used for confident domain-packing visualizations

Distogram prediction cite:jumperHighlyAccurateProtein2021

  • pair representation is linearly projected to bins; directed edges are summed, i.e. $z_{ij} + z_{ji}$
  • 64 bins (2 Å to 22 Å)
  • prediction targets: one-hot bin encodings $y^b_{ij}$
    • from ground-truth $C_β$ atoms of all residues (except glycine, which uses $C_α$)

#+ATTR_LATEX: :scale 0.4 :caption \caption{Cross-entropy distogram loss averaged over all pairs (eq. 41)}
./imgs/distogram_loss.png
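
A NumPy sketch of this loss; the projection weights are random stand-ins for the learned linear layer, and the uniform bin edges over 2 Å to 22 Å are a simplification.

#+begin_src python
import numpy as np

def distogram_loss(z, d_true, w):
    """z: [n_res, n_res, c_z] pair rep; d_true: [n_res, n_res] Cβ distances (Å);
    w: [c_z, 64] projection weights (stand-ins for the learned linear)."""
    edges = np.linspace(2.0, 22.0, 63)           # 63 edges -> 64 bins
    logits = (z + z.transpose(1, 0, 2)) @ w      # sum directed edges z_ij + z_ji
    m = logits.max(-1, keepdims=True)            # log-softmax over bins
    logp = logits - (m + np.log(np.exp(logits - m).sum(-1, keepdims=True)))
    y = np.digitize(d_true, edges)               # target bin per residue pair
    i, j = np.indices(d_true.shape)
    return -logp[i, j, y].mean()                 # cross-entropy over all pairs
#+end_src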

[.] Masked MSA prediction cite:jumperHighlyAccurateProtein2021

[.] “Experimentally resolved” prediction (fine tuning) cite:jumperHighlyAccurateProtein2021

[.] Structural violations (fine tuning) cite:jumperHighlyAccurateProtein2021

Training & Inference Details

Recycling iterations

Algorithm 30 Generic recycling inference procedure cite:jumperHighlyAccurateProtein2021

./imgs/algo30_recycling.png

Algorithm 31 Generic recycling training procedure cite:jumperHighlyAccurateProtein2021

./imgs/algo31_generic-recycling.png

Algorithm 32 Embedding of evoformer and structure module outputs for recycling cite:jumperHighlyAccurateProtein2021

./imgs/algo32_recycling-embedding.png
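
A JAX-flavoured sketch of the generic recycling training procedure (cf. Algorithm 31), where f stands in for one full network pass; gradients are blocked through the recycled features, so only the final pass is differentiated.

#+begin_src python
import jax

def recycled_forward(f, params, inputs, prev_init, n_cycle):
    """f(params, inputs, prev) -> (outputs, recycled_features)."""
    prev = prev_init                            # zero-initialized on the first pass
    for _ in range(n_cycle - 1):
        _, prev = f(params, inputs, prev)
        prev = jax.lax.stop_gradient(prev)      # no gradients through recycling
    out, _ = f(params, inputs, prev)            # loss is taken on this pass only
    return out

# During training n_cycle is sampled uniformly per example; at inference the
# model runs 3 recycling iterations, i.e. 4 passes in total.
#+end_src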

[.] Training stages cite:jumperHighlyAccurateProtein2021

[.] MSA resampling and ensembling cite:jumperHighlyAccurateProtein2021

Optimization details cite:jumperHighlyAccurateProtein2021

  • Adam: $lr = 10^{-3}$, $β_1 = 0.9$, $β_2 = 0.999$, $ε = 10^{-6}$
    • learning rate warmed up linearly over the first $0.128 \cdot 10^6$ samples, then decreased by a factor of 0.95 after $6.4 \cdot 10^6$ samples
  • batch size: 128
  • gradient clipping by per-sample global norm, clip value 0.1 (a sketch of this setup follows below)
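
One possible rendering of this setup with optax, assuming batch size 128 so that 1,000 steps ≈ 0.128×10⁶ samples and 50,000 steps ≈ 6.4×10⁶ samples; note the paper clips gradients per sample, which plain clip_by_global_norm only approximates.

#+begin_src python
import optax

schedule = optax.warmup_exponential_decay_schedule(
    init_value=0.0, peak_value=1e-3,
    warmup_steps=1_000,            # linear warm-up over ~0.128e6 samples
    transition_steps=50_000,       # decay by 0.95 after ~6.4e6 samples
    decay_rate=0.95, staircase=True)

optimizer = optax.chain(
    optax.clip_by_global_norm(0.1),   # batch-level stand-in for per-sample clipping
    optax.adam(schedule, b1=0.9, b2=0.999, eps=1e-6),
)
#+end_src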

[.] Parameters initialization cite:jumperHighlyAccurateProtein2021

[.] Loss clamping details cite:jumperHighlyAccurateProtein2021

[.] Dropout details cite:jumperHighlyAccurateProtein2021

[.] Evaluator setup cite:jumperHighlyAccurateProtein2021

[.] Reducing the memory consumption cite:jumperHighlyAccurateProtein2021

Results

[.] CASP14 Assessment cite:jumperHighlyAccurateProtein2021

They did well

Ablation Studies cite:jumperHighlyAccurateProtein2021

Baseline for all ablation models: the full model without noisy-student self-distillation. Ablations:

  1. With noisy-student self-distillation training
  2. No templates
  3. No raw MSA (use MSA pairwise frequencies)
  4. No triangles, biasing, or gating (use axial attention)
  5. No recycling
  6. No IPA (use direct projection)
  7. No IPA & no recycling
  8. No end-to-end structure gradients (keep auxiliary heads)
  9. No auxiliary distogram head
  10. No auxiliary masked MSA head

Ablation Results (in main paper) cite:jumperHighlyAccurateProtein2021

./imgs/ablation_results.png

Ablation Results cite:jumperHighlyAccurateProtein2021

./imgs/fig10_ablation_results.png

[.] Network Probing cite:jumperHighlyAccurateProtein2021

todo

[.] Novel Folds

They did well

[.] Attention Visualization cite:jumperHighlyAccurateProtein2021

todo

[.] Additional Results cite:jumperHighlyAccurateProtein2021

todo

Biblio

Bibliography

\printbibliography