# Final Project — Structural Bioinformatics

**Student(s):**  
**Project title:**  
**Track:** (Prediction / Docking / MD-Ensembles / Free Energy / Generative AI / Integrative)  
**Date:**  

---

## Abstract (250–350 words)
(Write this last, but paste it here for the final submission.)


## 1. Background & Motivation (½–1 page)
- What is the biological system?
- Why is this question interesting/important?
- What is already known (at a high level)?


## 2. Research Question / Hypothesis (very specific)
State a **testable** question. Good patterns:
- Does A differ from B?
- What changes with/without ligand?
- How robust is result X to choice Y?
- How does confidence/uncertainty affect interpretation?

**Primary question:**  
**Secondary question (optional):**  
**Success criteria:** what would count as an informative result?


In [None]:
import sys, platform, random, os
import numpy as np

SEED = 123
random.seed(SEED)
np.random.seed(SEED)

print("Python:", sys.version.split()[0])
print("Platform:", platform.platform())
print("Seed:", SEED)


## 3. Setup: Install packages and define paths

**Instructions**
- Keep installs minimal
- Use a single project root directory
- Save all generated outputs (figures, tables) to `outputs/`


In [None]:
from pathlib import Path

# Set to False when not running in Google Colab, or to keep files on the temporary VM disk.
USE_DRIVE = True

if USE_DRIVE:
    from google.colab import drive
    drive.mount('/content/drive')
    PROJECT_ROOT = Path("/content/drive/MyDrive/structbio_final_project")
else:
    PROJECT_ROOT = Path("/content/structbio_final_project")

DATA_DIR = PROJECT_ROOT / "data"
OUT_DIR  = PROJECT_ROOT / "outputs"
CODE_DIR = PROJECT_ROOT / "code"

for d in [DATA_DIR, OUT_DIR, CODE_DIR]:
    d.mkdir(parents=True, exist_ok=True)

print("PROJECT_ROOT:", PROJECT_ROOT)
print("DATA_DIR:", DATA_DIR)
print("OUT_DIR:", OUT_DIR)


In [None]:
# Edit for your project. Keep it short.
!pip -q install biopython mdtraj py3Dmol

# Examples depending on track:
# !pip -q install openmm
# !pip -q install pyemma
# !pip -q install openfe


## 4. Data & Inputs (with provenance)

List exactly what you used:
- PDB IDs / UniProt IDs / sequences
- ligand library sources (ZINC / ChEMBL / curated list)
- any precomputed trajectories you downloaded
- any parameters that matter (force field, water model, docking exhaustiveness)

Include links in plain text if useful, but also store key files in `data/`.
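
One possible way to make provenance checkable is to record a checksum for every file stored in `data/`. The sketch below is a minimal example using only the standard library; the function names (`sha256_of`, `write_manifest`) and the manifest filename `MANIFEST.json` are hypothetical choices for illustration, not part of the course template.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest_name: str = "MANIFEST.json") -> Path:
    """Record relative path, size, and checksum for every file under data_dir."""
    data_dir = Path(data_dir)
    manifest_path = data_dir / manifest_name  # hypothetical filename
    entries = []
    for path in sorted(data_dir.rglob("*")):
        # Skip directories and the manifest itself so reruns stay consistent.
        if path.is_file() and path != manifest_path:
            entries.append({
                "file": path.relative_to(data_dir).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "files": entries,
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path

# Example (assumes DATA_DIR was defined in the setup cell):
# manifest_path = write_manifest(DATA_DIR)
# print("Saved:", manifest_path)
```

Rerunning the function after adding or changing inputs refreshes the manifest, so the submitted archive documents exactly which files the analysis used.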


In [None]:
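import requests
from pathlib import Path

def fetch_pdb(pdb_id: str, out_path: Path) -> Path:
    """Download a PDB-format entry from RCSB and save it to out_path."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
    r = requests.get(url, timeout=30)  # timeout keeps the cell from hanging on a stalled connection
    r.raise_for_status()  # stop here on HTTP errors (e.g. unknown ID) instead of saving an error page
    out_path.write_text(r.text)
    return out_path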

# Example:
# pdb_file = fetch_pdb("1CRN", DATA_DIR/"pdb"/"1CRN.pdb")
# print("Saved:", pdb_file)


## 5. Methods (overview first)
Before any code, write a short narrative of what you will do and how each step helps answer the research question.

Example structure:
1. Prepare structure(s)
2. Generate predictions / dock poses / run or load trajectories
3. Compute metrics (RMSD/RMSF/PCA/MSM/ΔG/etc.)
4. Summarize into figures and tables
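
As an illustration of step 3, the sketch below computes the RMSD between two coordinate sets after optimal superposition (the Kabsch algorithm) using only NumPy. It assumes two arrays of shape `(N, 3)` with atoms in matching order; the function name `kabsch_rmsd` is a hypothetical helper, not part of the template. For full trajectories, a library routine such as `mdtraj.rmsd` is likely more convenient.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Assumes row i of P and row i of Q refer to the same atom.
    The result is in the same length unit as the input coordinates.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip one axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining per-atom deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(((P_rot - Qc) ** 2).sum(axis=1).mean()))
```

Superposing before measuring matters because raw coordinates from different files or frames usually differ by an arbitrary rotation and translation, which would otherwise inflate the RMSD.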


## 7. Results (figures + tables)
Provide:
- 2–6 key figures
- 1–3 tables (if helpful)
- captions that interpret what the figure shows (not just what it is)
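
Tables can be saved alongside figures so they end up in `outputs/` with everything else. The sketch below is one minimal way to do this with the standard library; `save_table` is a hypothetical helper, and the column names in the commented example are placeholders rather than real results.

```python
import csv
from pathlib import Path


def save_table(rows, out_path, fieldnames=None):
    """Write a list of dicts to a CSV file and return its path."""
    rows = list(rows)
    if not rows:
        raise ValueError("rows is empty; nothing to write")
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Default to the keys of the first row as the column order.
    fieldnames = fieldnames or list(rows[0].keys())
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path

# Example (placeholder column names; assumes OUT_DIR from the setup cell
# and a list of dicts called summary_rows built by your analysis):
# save_table(summary_rows, OUT_DIR / "table1_summary.csv")
```

Putting the unit in the column name (for example a hypothetical `mean_rmsd_nm`) keeps the table readable on its own.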


## 8. Interpretation, Limitations, and “What I’d do next”
- What did you learn relative to the original question?
- What limitations might change the conclusion?
- What additional computation/experiment would strengthen the claim?


In [None]:
import shutil

# Example: save a figure to OUT_DIR inside plotting code:
# plt.savefig(OUT_DIR/"figure1_rmsd.png", dpi=200, bbox_inches="tight")

# Bundle the whole project folder into a single zip file, written next to PROJECT_ROOT.
zip_path = shutil.make_archive(str(PROJECT_ROOT), 'zip', root_dir=str(PROJECT_ROOT))
print("Created:", zip_path)


## Submission Checklist
- [ ] Notebook runs top-to-bottom without manual intervention (except Drive authorization and long computations)
- [ ] All outputs saved in `outputs/`
- [ ] Data files used are in `data/` (or clearly downloaded in the notebook)
- [ ] Abstract completed
- [ ] Figures have captions and units where appropriate
- [ ] Limitations section is honest and specific
