# AlphaFold: Revolutionizing Protein Structure Prediction
## What is AlphaFold?
Developed by DeepMind (a subsidiary of Alphabet/Google), [AlphaFold](https://deepmind.google/technologies/alphafold/) is an artificial intelligence (AI) system that predicts the 3D structure of proteins from their amino acid sequences. It addresses the long-standing "protein folding problem," which has puzzled scientists for decades. Proteins are essential to nearly all biological processes, and their functions are determined by their intricate 3D shapes. Traditional experimental methods such as X-ray crystallography and cryo-electron microscopy (cryo-EM) are time-consuming and costly, which makes accurate computational prediction a game-changer.

## How AlphaFold Works
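AlphaFold is a deep neural network that draws on three sources of information:
- Protein Data Bank (PDB): A repository of experimentally determined protein structures. These structures are used as training data and, optionally, as structural templates at prediction time.
- Evolutionary Data: Multiple sequence alignments (MSAs) of related sequences. Residue pairs that mutate together across evolution tend to be close together in 3D space.
- Physical Constraints: Geometric and chemical rules (e.g., bond lengths and angles, steric clashes). These are enforced through the training loss and an optional Amber relaxation step.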
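
A toy calculation shows why MSAs carry structural information. Positions that are in contact tend to mutate together, so their alignment columns share information. The sketch below computes the mutual information between two MSA columns using only the Python standard library. The tiny alignment is made up for illustration. AlphaFold does not compute this statistic explicitly; its network learns such co-variation signals directly from the MSA.

```python
import math
from collections import Counter


def column(msa, i):
    """Return the residues at alignment position i (0-based) across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    p_i = Counter(column(msa, i))
    p_j = Counter(column(msa, j))
    p_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Hypothetical toy MSA: columns 0 and 3 co-vary (K pairs with E, R pairs with D),
# while column 1 is fully conserved and therefore carries no pairing signal.
toy_msa = [
    "KAVE",
    "RAVD",
    "KALE",
    "RAID",
]

print(round(mutual_information(toy_msa, 0, 3), 3))  # co-varying pair -> 1.0
print(round(mutual_information(toy_msa, 1, 3), 3))  # conserved column -> 0.0
```

A high value flags a candidate contact. Real co-evolution methods also correct for phylogenetic bias and indirect couplings, which this sketch ignores.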

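The core of AlphaFold 2 is the Evoformer, a transformer-based (attention) network. It models interactions between amino acids by jointly refining an MSA representation and a residue-pair representation. A structure module then converts these representations into 3D atomic coordinates. For many proteins, the resulting predictions approach experimental accuracy.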
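
The basic operation behind transformer-style networks is scaled dot-product attention. Each residue's representation is updated as a weighted average of all residues' representations, with weights derived from pairwise similarity. The sketch below is a generic, single-head version in plain Python. It uses made-up 2-dimensional embeddings and no learned projection matrices. It is not AlphaFold's Evoformer, which additionally biases attention with the pair representation and attends along both rows and columns of the MSA.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention on lists of vectors.

    queries, keys, values: one vector (list of floats) per residue.
    Returns (outputs, weights) where weights[i][j] is how much residue i attends to j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this residue's query to every residue's key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # New representation: attention-weighted average of the value vectors
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical embeddings for a 3-residue peptide (self-attention: Q = K = V)
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(residues, residues, residues)
for row in weights:
    print([round(x, 2) for x in row])
```

Each row of `weights` sums to 1. In a trained network, the learned projections decide which residue pairs attend strongly to each other.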

## Applications of AlphaFold
- Drug Discovery:
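    - Accelerates identification of drug targets by predicting structures of proteins involved in diseases such as cancer and Alzheimer’s.
    - Enables structure-based drug design, including for proteins with no experimental structure (e.g., in 2020 DeepMind released predicted structures of several understudied SARS-CoV-2 proteins).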
- Understanding Genetic Diseases:
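    - Helps interpret how mutations (e.g., in cystic fibrosis or sickle cell anemia) may disrupt protein structure and function. AlphaFold is not designed to predict the structural effect of single point mutations directly.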
- Enzyme Engineering:
    - Supports the design of enzymes for industrial applications (e.g., biofuel production, plastic degradation) by predicting the structures of candidate variants.
- Synthetic Biology:
    - Helps validate artificial proteins by checking whether a designed sequence is predicted to fold into the intended structure.
- Antibiotic Development:
    - Predicts structures of bacterial proteins, including those involved in antibiotic resistance, to help guide the development of new antibiotics.
- Basic Research:
    - Provides structural insights for poorly characterized proteins, expanding biological knowledge.

AlphaFold’s accuracy was validated in the blind CASP (Critical Assessment of Structure Prediction) competition. At CASP14 in 2020, AlphaFold 2 achieved unprecedented accuracy, with many predictions rivaling experimentally determined structures.

# LocalColabFold: Democratizing AlphaFold’s Power
## What is LocalColabFold?
[LocalColabFold](https://github.com/YoshitakaMo/localcolabfold) is an open-source, community-maintained installer for running ColabFold locally. ColabFold itself combines AlphaFold with fast MMseqs2-based MSA generation and a simplified interface. With LocalColabFold, researchers can run protein structure predictions on their own hardware instead of relying on cloud services like Google Colab.

Key Features:
- Accessibility:
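    - Runs structure prediction on your own hardware instead of Google Colab's cloud resources.
    - By default, `colabfold_batch` still sends query sequences to the public ColabFold MMseqs2 server to build MSAs. For sensitive sequences (e.g., proprietary or medical data), build the MSAs locally instead, for example with `colabfold_search` against locally installed databases.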
- Speed & Efficiency:
    - Uses MMseqs2 (instead of AlphaFold's jackhmmer/HHblits pipeline) to build multiple sequence alignments (MSAs) much faster.
    - Much smaller storage footprint: the default setup does not need AlphaFold's multi-terabyte sequence databases on local disk.
- Ease of Use:
    - Simplified setup via an installer script that creates a self-contained Conda environment.
    - Uses NVIDIA GPUs (CUDA) when available; CPU-only runs are possible but much slower.
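
ColabFold stores the MMseqs2 alignment in A3M format. This is a FASTA-like format in which lowercase letters mark insertions relative to the query. The sketch below parses an A3M file and reports the number of sequences and the per-position coverage, a rough indicator of MSA depth. Coverage here means how many aligned sequences have a residue rather than a gap at each query position. The glob path is an assumption: `colabfold_batch` typically writes an `.a3m` file per query into its output directory, but check your own `results/` folder. The coverage calculation assumes a single-chain MSA.

```python
import glob
import os


def parse_a3m(lines):
    """Parse A3M text lines into a list of (header, sequence) tuples.

    Blank lines and lines starting with '#' (ColabFold may write such a
    header line) are skipped.
    """
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def coverage(records):
    """Per query position, count sequences with a residue (not a gap '-')."""
    # Dropping lowercase insertion states aligns every row to the query length
    aligned = ["".join(c for c in seq if not c.islower()) for _, seq in records]
    return [sum(1 for row in aligned if row[i] != "-") for i in range(len(aligned[0]))]


# Assumed location: the results/ directory used in the run step of this guide
for path in glob.glob("/home/ec2-user/results/*.a3m"):
    with open(path) as f:
        recs = parse_a3m(f)
    cov = coverage(recs)
    print(f"{os.path.basename(path)}: {len(recs)} sequences, "
          f"coverage min {min(cov)} / max {max(cov)}")
```

Positions with very low coverage are worth noting, because shallow MSAs usually mean less reliable predictions for that region.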

## Applications of LocalColabFold
- Academic Research:
    - Enables small labs to predict structures for hypothesis testing.
    - Useful for teaching structural biology concepts.
- Personalized Medicine:
    - Predicts structures of patient-specific protein variants.
- Structural Genomics:
    - Scales predictions for large protein datasets (e.g., metagenomic studies).
- Collaborative Projects:
    - Integrates with high-performance computing (HPC) clusters for batch processing.
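
For batch processing, `colabfold_batch` accepts a single FASTA file containing many sequences, and each record becomes its own prediction job. The sketch below writes such a file from a Python dictionary. The sequence IDs and the file name `batch.fasta` are hypothetical placeholders. IDs are restricted to letters, digits, and underscores because ColabFold reuses them in output file names.

```python
import re

VALID_ID = re.compile(r"[A-Za-z0-9_]+")
VALID_SEQ = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")  # 20 standard amino acids


def write_batch_fasta(sequences, path):
    """Write {id: sequence} as a multi-record FASTA file (one record per job)."""
    cleaned = {}
    # Validate everything first so a bad entry never leaves a half-written file
    for seq_id, seq in sequences.items():
        seq = seq.replace(" ", "").upper()
        if not VALID_ID.fullmatch(seq_id):
            raise ValueError(f"Unsafe sequence ID: {seq_id!r}")
        if not VALID_SEQ.fullmatch(seq):
            raise ValueError(f"Non-standard residues in {seq_id!r}")
        cleaned[seq_id] = seq
    with open(path, "w") as f:
        for seq_id, seq in cleaned.items():
            f.write(f">{seq_id}\n{seq}\n")


# Hypothetical batch: insulin A and B chains as two separate single-chain jobs
write_batch_fasta(
    {
        "insulin_A": "GIVEQCCTSICSLYQLENYCN",
        "insulin_B": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT",
    },
    "batch.fasta",
)
```

The file can then be passed as the input, e.g. `colabfold_batch batch.fasta batch_results/`. The strict residue check is a conservative choice; relax it if your sequences legitimately contain other letters.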
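
## Limitations
- **Computational Resources**: LocalColabFold still requires a GPU for optimal performance.
- **Accuracy**: May be slightly lower than the original AlphaFold pipeline for some proteins, because MMseqs2-generated MSAs can differ from (and be shallower than) those built against AlphaFold's full databases.
- **Multimer Support**: Early versions struggled with protein complexes; current versions predict complexes using AlphaFold-Multimer weights.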
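
ColabFold's input convention for a complex is one FASTA record with the chains joined by a colon (`:`). With the default `--model-type auto` setting, such a record is predicted with AlphaFold-Multimer weights. The sketch below builds such a record, assuming that convention. The record name `insulin_AB` and the file name `complex.fasta` are hypothetical.

```python
def complex_record(name, chains):
    """Return one FASTA record for a complex, with chains separated by ':'."""
    if len(chains) < 2:
        raise ValueError("A complex needs at least two chains")
    return f">{name}\n" + ":".join(c.strip().upper() for c in chains) + "\n"


# Hypothetical example: insulin A and B chains predicted together as one complex
record = complex_record(
    "insulin_AB",
    ["GIVEQCCTSICSLYQLENYCN", "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"],
)
with open("complex.fasta", "w") as f:
    f.write(record)
print(record, end="")
```

For a homo-oligomer, repeat the same chain once per copy.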



## Set Up an AWS SageMaker Notebook Instance

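**Instance Type**: Choose a GPU-enabled instance type so that CUDA is available, e.g., `ml.g4dn.xlarge` (NVIDIA T4 GPU) or `ml.p3.2xlarge` (NVIDIA V100 GPU). On SageMaker notebook instances, only files under `/home/ec2-user/SageMaker` persist when the instance is stopped and restarted. An installation under `/home/ec2-user/` may therefore need to be repeated after a restart.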
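
Once the instance is running, you can confirm from a notebook cell or a Python shell that a CUDA GPU is visible. The sketch below calls `nvidia-smi`, which ships with the NVIDIA driver on GPU instances, and returns each GPU's name and total memory. It returns an empty list when `nvidia-smi` is missing or fails, as on a CPU-only instance.

```python
import shutil
import subprocess


def list_gpus():
    """Return [(name, total_memory), ...] from nvidia-smi, or [] if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.strip().splitlines():
        # Each line looks like "<name>, <memory>"
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


gpus = list_gpus()
print(gpus if gpus else "No NVIDIA GPU detected; predictions would run on the much slower CPU.")
```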

## Install LocalColabFold

In JupyterLab, open a terminal via "File" -> "New" -> "Terminal". In the newly opened Terminal tab, run the commands below to download `install_colabbatch_linux.sh` from the LocalColabFold repository and run it:
```bash
cd /home/ec2-user/
wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh
bash install_colabbatch_linux.sh
```

If installation is successful, you will see the following message:

```text
Installation of ColabFold finished.
Add /home/ec2-user/localcolabfold/colabfold-conda/bin to your PATH environment variable to run 'colabfold_batch'.
i.e. for Bash:
        export PATH="/home/ec2-user/localcolabfold/colabfold-conda/bin:$PATH"
For more details, please run 'colabfold_batch --help'.
```


Add the install directory to `PATH`, and its library directory to `LD_LIBRARY_PATH`, in `~/.bashrc` by running the following commands in the terminal:
```bash
echo 'export PATH="/home/ec2-user/localcolabfold/colabfold-conda/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="/home/ec2-user/localcolabfold/colabfold-conda/lib/:$LD_LIBRARY_PATH"' >> ~/.bashrc
source ~/.bashrc
```
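
Changes to `~/.bashrc` only affect new shells, so a Jupyter kernel that was already running will not see the updated `PATH`. This matters if you call `colabfold_batch` from a notebook cell, for example via `subprocess`. The sketch below looks for `colabfold_batch` on `PATH` and falls back to the default install location from the installer message above. That location is an assumption; adjust `DEFAULT_BIN` if you installed elsewhere.

```python
import os
import shutil

# Default location from the installer message; adjust if you installed elsewhere
DEFAULT_BIN = "/home/ec2-user/localcolabfold/colabfold-conda/bin"


def find_colabfold_batch(default_bin=DEFAULT_BIN):
    """Return the path to colabfold_batch, or None if it cannot be found."""
    found = shutil.which("colabfold_batch")
    if found:
        return found
    candidate = os.path.join(default_bin, "colabfold_batch")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


exe = find_colabfold_batch()
print(exe or "colabfold_batch not found: open a new terminal or check the installation")
```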

## Run LocalColabFold

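Run the following commands in the terminal. They write a single-sequence FASTA file containing the 21-residue human insulin A chain. They then run `colabfold_batch` with structural templates from the PDB (`--templates`) and Amber relaxation (`--amber`), writing the predictions to `results/`:

```bash
cd /home/ec2-user/
echo ">Example_Protein|ChainA" > example.fasta
echo "GIVEQCCTSICSLYQLENYCN" >> example.fasta
colabfold_batch --templates --amber example.fasta results/
ls -l results/
```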
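
AlphaFold and ColabFold write each residue's predicted confidence (pLDDT, 0 to 100) into the B-factor column of the output PDB files. The sketch below uses only the standard library. It reads the Cα atoms of each ranked model in `results/` and prints the model's mean pLDDT. The glob pattern mirrors the one in the visualization cell and is an assumption about the output file names.

```python
import glob
import os
import re


def ca_plddt(pdb_text):
    """Return per-residue pLDDT values taken from the B-factor column of CA atoms."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): atom name 13-16, B-factor 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def rank_of(path):
    """Extract the integer rank from names like '..._rank_001_...pdb'."""
    return int(re.search(r"rank_(\d+)", path).group(1))


results_dir = "/home/ec2-user/results"  # output directory of colabfold_batch
pattern = os.path.join(results_dir, "*_rank_*_alphafold2_ptm_model_*.pdb")
for path in sorted(glob.glob(pattern), key=rank_of):
    with open(path) as f:
        plddt = ca_plddt(f.read())
    if plddt:
        print(f"{os.path.basename(path)}: mean pLDDT {sum(plddt) / len(plddt):.1f}")
```

Higher mean pLDDT indicates higher model confidence. Low-scoring stretches often correspond to flexible or disordered regions.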


## Visualization of Prediction Results

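Install py3Dmol, plus ipywidgets for the interactive dropdown, into the notebook kernel:

```python
# Install py3Dmol for visualization; %pip installs into the running kernel's environment
%pip install py3Dmol ipywidgets
```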

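### Visualize Multiple PDB Files with py3Dmol and an Interactive Dropdown

```python
import glob
import os
import re

import py3Dmol
from ipywidgets import Dropdown, interact

# Output directory passed to colabfold_batch
results_dir = "/home/ec2-user/results/"

# 1. Collect the predicted models and sort them by rank (rank_001 = highest-ranked model).
#    This pattern matches single-chain (alphafold2_ptm) models only.
pdb_files = sorted(
    glob.glob(os.path.join(results_dir, "*_rank_*_alphafold2_ptm_model_*.pdb")),
    key=lambda x: int(re.search(r"rank_(\d+)", x).group(1)),
)
if not pdb_files:
    raise FileNotFoundError(f"No ranked PDB files found in {results_dir}")

# 2. Dropdown options as (label, value) pairs: short file name shown, full path passed on
options = [(os.path.basename(f), f) for f in pdb_files]


# 3. Re-render the selected structure whenever the dropdown selection changes
@interact(File=Dropdown(options=options))
def show_structure(File):
    view = py3Dmol.view(width=600, height=400)
    with open(File) as f:
        view.addModel(f.read(), "pdb")
    # Cartoon colored as a rainbow from N- to C-terminus
    view.setStyle({"cartoon": {"color": "spectrum"}})
    view.zoomTo()
    view.show()
```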