# Nipah Inhibitor Discovery Pipeline
**Student:** Vihaan Agrawal  
**Project:** Computational Discovery of Resistance-Proof Antivirals

## Quick Start
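Welcome to the laboratory! This notebook controls the entire scientific pipeline, from downloading the structure to the final verdict. It relies on a few outside pieces: the `vina` executable in `bin/`, OpenBabel's `obabel` and `curl` on the PATH, and the two ligand PDBQT files in `data/`.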

### What this code does:
1.  **Builds the Receptors:** Downloads raw data and creates the mutant structure from scratch.
2.  **Runs the Physics:** Docks the drug candidates into the protein pocket using Vina.
3.  **Verifies Results:** Checks for geometric clashes and calculates the final resistance scores.

---

## Phase 0: Environment Setup
Here we load the Python libraries that act as our "Digital Lab Equipment".
*   **Bio.PDB:** Allows us to manipulate 3D protein structures.
*   **subprocess:** Lets us run external tools like `vina` and `obabel`.
*   **scipy/numpy:** Handles the heavy math (calculating 3D distances).
*   **gemmi:** Converts the downloaded CIF file into the classic PDB format.
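The pipeline calls external programs, so a quick pre-flight check can catch a missing tool before a long run. Below is a minimal sketch that uses the standard library's `shutil.which`. The helper name `find_missing_tools` is hypothetical. It assumes `curl` and `obabel` are on the PATH and that Vina lives at `bin/vina`, the path `VINA_EXEC` points to in Phase 2.

```python
import shutil
from pathlib import Path

def find_missing_tools(path_tools, local_executables=()):
    """Return the names of tools that cannot be found.

    path_tools: command names looked up on the PATH (e.g. "curl", "obabel").
    local_executables: explicit file paths (e.g. bin/vina) that must exist.
    """
    missing = [name for name in path_tools if shutil.which(name) is None]
    missing += [str(p) for p in local_executables if not Path(p).is_file()]
    return missing

# Hypothetical usage with this notebook's tools
missing = find_missing_tools(["curl", "obabel"], [Path("bin/vina")])
if missing:
    print("Missing tools:", missing)
else:
    print("All external tools found.")
```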


In [None]:
import sys
import os
import subprocess
import shutil
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# BioPython is our main tool for reading PDB files
from Bio.PDB import PDBParser, PDBIO, Select
from scipy.spatial import distance
import gemmi  # Used for converting CIF (new format) to PDB (classic format)

# Define where our data lives
BASE_DIR = Path(".").resolve()
DATA_DIR = BASE_DIR / "data"
DATA_DIR.mkdir(exist_ok=True)

print("Digital Laboratory Initialized.")

## Phase 1: Data Preparation ("The Build")
In a real science fair project, you shouldn't just download a ready-made file. You should build it.
This section creates the **W730A Mutant** structure by digitally editing the atoms of the Wild-Type protein.

**Key Concept:** We change Tryptophan (big, bulky indole ring) to Alanine (tiny methyl side chain). Removing the ring creates a "hole" in the protein.
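Comparing heavy-atom names makes the size of the "hole" concrete. This small sketch assumes standard PDB atom naming for tryptophan and alanine and ignores hydrogens. The constant and function names are illustrative and are not part of the pipeline. The atoms it lists are the ones the mutation loop deletes, because that loop keeps only N, CA, C, O and CB.

```python
# Standard PDB heavy-atom names (no hydrogens)
TRP_ATOMS = ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2", "NE1",
             "CE2", "CE3", "CZ2", "CZ3", "CH2"]
ALA_ATOMS = ["N", "CA", "C", "O", "CB"]

def atoms_removed(wild_type, mutant):
    """Atoms present in the wild-type residue but absent from the mutant."""
    keep = set(mutant)
    return [name for name in wild_type if name not in keep]

removed = atoms_removed(TRP_ATOMS, ALA_ATOMS)
print(f"W730A deletes {len(removed)} heavy atoms: {removed}")
# W730A deletes 9 heavy atoms: ['CG', 'CD1', 'CD2', 'NE1', 'CE2', 'CE3', 'CZ2', 'CZ3', 'CH2']
```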

In [None]:
def setup_receptors():
    print("--- Starting Data Build ---")
    
    # 1. Download the Raw Crystal Structure (9KNZ)
    # We use the CIF format because it's the modern standard from the RCSB PDB.
    url = "https://files.rcsb.org/download/9KNZ.cif"
    cif_path = DATA_DIR / "9KNZ.cif"
    if not cif_path.exists():
        print(f"Downloading raw data from {url}...")
        # -f: fail on HTTP errors instead of saving an error page, -L: follow redirects
        subprocess.run(["curl", "-fL", "-o", str(cif_path), url], check=True)

    # 2. Convert to PDB & Clean
    # Raw PDBs are messy. They contain water molecules and sometimes multiple protein chains.
    # We want ONLY Chain A and NO water.
    print("Cleaning Crystal Structure...")
    doc = gemmi.cif.read(str(cif_path))
    structure = gemmi.make_structure_from_block(doc.sole_block())
    raw_pdb = DATA_DIR / "9KNZ.pdb"
    structure.write_pdb(str(raw_pdb))

    # Using BioPython to filter atoms
    parser = PDBParser(QUIET=True)
    s = parser.get_structure("WT", str(raw_pdb))
    
    class CleanSelect(Select):
        def accept_chain(self, chain): 
            return chain.id == "A"  # Only keep Chain A
        def accept_residue(self, residue): 
            return residue.id[0] == " " # Keep standard residues only (drops HETATM: water, ions, ligands)
    
    io = PDBIO()
    io.set_structure(s)
    wt_pdb = DATA_DIR / "receptor_wt.pdb"
    io.save(str(wt_pdb), CleanSelect())
    print("  > Wild-Type Created.")

    # 3. Create the Mutant (W730 -> A730)
    # This is the 'In Silico Mutagenesis' step.
    print("Generating Mutant Structure...")
    s_mut = parser.get_structure("MUT", str(wt_pdb))
    mutation_count = 0
    
    for model in s_mut:
        for chain in model:
            for res in chain:
                # Identify Tryptophan at position 730
                if res.id[1] == 730 and res.resname == "TRP":
                    print(f"  > Found Target: {res.resname} 730")
                    res.resname = "ALA"  # Rename to Alanine
                    
                    # Physically remove the side-chain atoms beyond CB.
                    # We keep the backbone (N, CA, C, O) plus CB, because alanine's side chain is just CB.
                    atoms_to_delete = []
                    for atom in list(res):
                        if atom.name not in ["N", "CA", "C", "O", "CB"]:
                            atoms_to_delete.append(atom.name)
                            res.detach_child(atom.name)
                    print(f"  > Mutated to ALA. Deleted atoms: {atoms_to_delete}")
                    mutation_count += 1

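    if mutation_count == 0:
        raise RuntimeError("TRP 730 not found in chain A; the mutant was not created.")

    mut_pdb = DATA_DIR / "receptor_mut.pdb"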
    io.set_structure(s_mut)
    io.save(str(mut_pdb))

    # 4. Convert to PDBQT (Physics Ready)
    # Vina requires PDBQT, a PDB variant that adds partial charges (Q) and AutoDock atom types (T).
    # We use OpenBabel to add hydrogens and compute Gasteiger partial charges.
    print("Converting to Physics Format (PDBQT)...")
    for pdb in [wt_pdb, mut_pdb]:
        out = pdb.with_suffix(".pdbqt")
        # -xr = write a rigid receptor (no torsion tree), -p 7.4 = add hydrogens for pH 7.4 (human body)
        subprocess.run(["obabel", str(pdb), "-O", str(out), "-xr", "-p", "7.4", "--partialcharge", "gasteiger"], check=True)
    
    print("Setup Complete. Receptors are ready for docking.")

setup_receptors()

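The `CleanSelect` filter depends on Biopython's residue ID convention. Each residue's `id` is a tuple `(hetfield, resseq, icode)`. The `hetfield` is a blank `" "` for standard residues, `"W"` for water, and `"H_"` plus the residue name for other HETATM groups. The sketch below applies the same `accept_residue` test to plain tuples, so the rule can be checked without downloading anything. The example IDs are made up.

```python
def keep_residue(res_id):
    """Same rule as CleanSelect.accept_residue: keep only standard residues."""
    hetfield, _resseq, _icode = res_id
    return hetfield == " "

example_ids = [
    (" ", 730, " "),      # standard amino acid (e.g. TRP 730)
    ("W", 1001, " "),     # water molecule
    ("H_ZN", 1002, " "),  # zinc ion stored as HETATM
]
for rid in example_ids:
    print(rid, "kept" if keep_residue(rid) else "removed")
```

Modified amino acids stored as HETATM, such as selenomethionine (`MSE`), are also dropped by this rule. That is worth checking if the chain shows unexpected gaps.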
## Phase 2: The Docking Engine
This is the heart of the project. We use two custom functions here:
1.  **`run_docking`**: Calls the Vina executable to perform the physics simulation.
2.  **`check_ghost_clash`**: A unique filter I wrote to solve the "Vacuum Hole Fallacy".
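The ghost-clash test comes down to one number: the smallest distance between any ligand atom and any atom of the deleted W730 side chain. Below is a dependency-free sketch of that computation, which matches the intent of the `distance.cdist` plus `np.min` call in `check_ghost_clash`. The coordinates are toy values, not data from 9KNZ. The 1.5 Å cutoff is the one `verify_candidate` uses.

```python
import math

def min_clearance(ligand_coords, ghost_coords):
    """Smallest ligand-to-ghost-atom distance (same units as the input, here Å)."""
    return min(math.dist(a, b) for a in ligand_coords for b in ghost_coords)

# Toy example: two ligand atoms, one ghost atom
ligand = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ghost = [(0.0, 4.0, 0.0)]
clearance = min_clearance(ligand, ghost)
print(f"Clearance: {clearance:.2f} Å")  # Clearance: 4.00 Å
print("Ghost clash" if clearance < 1.5 else "Safe")  # Safe
```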

In [None]:
# CONFIGURATION
# The exact coordinates of the active site box
BOX_CENTER = [133.301, 137.79, 150.667]
BOX_SIZE = [22, 22, 22]
VINA_EXEC = BASE_DIR / "bin/vina"

def check_ghost_clash(ligand_pdbqt, wt_pdb_path=DATA_DIR/"receptor_wt.pdb"):
    """
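    Checks whether the ligand overlaps the 'Ghost Atoms' of W730 (the side-chain atoms deleted in the mutant).
    If the ligand hits the ghost atoms, it is occupying space that exists
    in the Wild-Type protein, meaning it would CLASH in the real virus.
    Returns the minimum ligand-to-ghost distance in Å for the top-ranked pose.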
    """
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("WT", str(wt_pdb_path))
    ghost_atoms = []
    
    # Extract the coordinates of the W730 sidechain (The Ghost)
    chain = structure[0]['A']
    for res in chain:
        if res.id[1] == 730:
            for atom in res:
                # Only the atoms deleted in the mutant: skip the backbone and CB (Ala keeps CB)
                if atom.name not in ["N", "CA", "C", "O", "CB"]:
                    ghost_atoms.append(atom.get_coord())
            break
            
    # Parse the ligand coordinates of the top-ranked pose only.
    # Vina writes several poses (MODEL ... ENDMDL); pose 1 has the best score.
    lig_coords = []
    with open(ligand_pdbqt) as f:
        for line in f:
            if line.startswith("ENDMDL"):
                break  # Stop after the first (best) pose
            if line.startswith(("ATOM", "HETATM")):
                # PDB format uses fixed column positions for X, Y, Z
                lig_coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
                
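    # Fail loudly instead of silently reporting a "safe" distance
    if not ghost_atoms or not lig_coords:
        raise ValueError("No W730 side-chain atoms or no ligand atoms found; cannot check clearance.")

    # Calculate distances between every ligand atom and every ghost atom
    all_distances = distance.cdist(lig_coords, ghost_atoms)
    
    # Return the smallest distance found (minimum clearance, in Å)
    return float(np.min(all_distances))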

def run_docking(receptor, ligand, output):
    """Runs AutoDock Vina"""
    cmd = [
        str(VINA_EXEC), 
        "--receptor", str(receptor), 
        "--ligand", str(ligand),
        "--center_x", str(BOX_CENTER[0]), 
        "--center_y", str(BOX_CENTER[1]), 
        "--center_z", str(BOX_CENTER[2]),
        "--size_x", str(BOX_SIZE[0]), 
        "--size_y", str(BOX_SIZE[1]), 
        "--size_z", str(BOX_SIZE[2]),
        "--exhaustiveness", "16",      # Search thoroughness (Vina default is 8)
        "--cpu", "4",                  # Use 4 CPU cores
        "--out", str(output)           # Save results here
    ]
    # subprocess.run executes the command line tool exactly like a terminal
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    
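    # Parse the score from the output file (look for 'REMARK VINA RESULT').
    # The first such line belongs to the top-ranked pose; its affinity (kcal/mol)
    # is the first number after 'RESULT:', i.e. field index 3 after split().
    with open(output) as f:
        for line in f:
            if "REMARK VINA RESULT" in line:
                return float(line.split()[3])
    raise RuntimeError(f"No 'REMARK VINA RESULT' line found in {output}")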

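Vina's output file contains several poses, each wrapped in `MODEL`/`ENDMDL`, with the best-scoring pose first. The sketch below parses a small synthetic Vina-style file. It shows how the `REMARK VINA RESULT` affinity and the fixed-column coordinates (characters 31 to 54) are read, and why parsing should stop at the first `ENDMDL`. The sample text and the helper names `atom_line` and `parse_top_pose` are made up for illustration.

```python
def atom_line(x, y, z):
    # Columns 1-30 hold record name, serial, atom name, residue and chain;
    # only their total width matters here, so the prefix is padded to 30.
    return "ATOM      1  C   LIG A   1".ljust(30) + f"{x:8.3f}{y:8.3f}{z:8.3f}"

def parse_top_pose(text):
    """Return (affinity, coords) for the first pose in a Vina-style PDBQT."""
    affinity, coords = None, []
    for line in text.splitlines():
        if line.startswith("REMARK VINA RESULT") and affinity is None:
            # e.g. "REMARK VINA RESULT:    -7.5  0.000  0.000" -> field 3
            affinity = float(line.split()[3])
        elif line.startswith(("ATOM", "HETATM")):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
        elif line.startswith("ENDMDL"):
            break  # ignore lower-ranked poses
    return affinity, coords

sample = "\n".join([
    "MODEL 1",
    "REMARK VINA RESULT:    -7.5      0.000      0.000",
    atom_line(133.301, 137.790, 150.667),
    "ENDMDL",
    "MODEL 2",
    "REMARK VINA RESULT:    -6.9      1.234      2.345",
    atom_line(140.000, 137.790, 150.667),
    "ENDMDL",
])
print(parse_top_pose(sample))  # (-7.5, [(133.301, 137.79, 150.667)])
```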
## Phase 3: The Verification Experiment
Now we run the experiment. We test two drugs:
1.  **BMS-986205:** My proposed inhibitor.
2.  **ERDRP-0519:** The experimental control (known to fail).

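We check three things:
*   **Affinity:** Does it stick tightly? (Vina score in kcal/mol; more negative is better, e.g. -7.5; must be -6.5 or lower)
*   **Resilience:** Does it still stick to the mutant? (Delta = mutant score minus Wild-Type score; should be close to 0, at most +0.5 kcal/mol)
*   **Ghost Clash:** Is it cheating by occupying the vacuum hole? (Its closest atom must stay at least 1.5 Å from the deleted W730 side chain)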
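The verdict in `verify_candidate` applies these checks in a fixed order: ghost clash first, then resistance, then potency. Moving that logic into a pure function lets you test the thresholds without running Vina. This is a sketch. The function name `classify` is hypothetical, and the thresholds (1.5 Å, +0.5 kcal/mol, -6.5 kcal/mol) are copied from the notebook's code.

```python
def classify(score_wt, score_mut, clearance,
             clash_cutoff=1.5, max_delta=0.5, max_score_wt=-6.5):
    """Reproduce verify_candidate's verdict from precomputed numbers.

    Scores are Vina affinities in kcal/mol; clearance is in Å.
    """
    delta = score_mut - score_wt  # positive = weaker binding to the mutant
    if clearance < clash_cutoff:
        return "FAILED (Ghost Clash Detected - The 'Vacuum Hole Fallacy')"
    if delta > max_delta:
        return "FAILED (Vulnerable to Resistance)"
    if score_wt > max_score_wt:
        return "FAILED (Too Weak)"
    return "SUCCESS (Potent & Resilient)"

print(classify(-8.0, -7.9, 3.2))  # SUCCESS (Potent & Resilient)
print(classify(-8.0, -7.0, 3.2))  # FAILED (Vulnerable to Resistance)
```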

In [None]:
def verify_candidate(name, lig_file):
    print(f"Experiment: {name}")
    print("--------------------------------------------------")
    
    # 1. Dock into Wild-Type
    out_wt = DATA_DIR / f"{name}_wt.pdbqt"
    print(f"  Running Vina on Wild-Type...", end=" ")
    score_wt = run_docking(DATA_DIR/"receptor_wt.pdbqt", lig_file, out_wt)
    print(f"Done. Affinity: {score_wt} kcal/mol")

    # 2. Dock into Mutant
    out_mut = DATA_DIR / f"{name}_mut.pdbqt"
    print(f"  Running Vina on Mutant...   ", end=" ")
    score_mut = run_docking(DATA_DIR/"receptor_mut.pdbqt", lig_file, out_mut)
    print(f"Done. Affinity: {score_mut} kcal/mol")

    # 3. Analyze Results
    delta = score_mut - score_wt
    clash = check_ghost_clash(out_mut)
    print(f"  > Resistance Delta (Mutant-WT): {delta:+.2f} kcal/mol")
    print(f"  > Ghost Atom Clearance:         {clash:.2f} Å")
    
    # 4. Final Verdict
    print("  VERDICT: ", end="")
    if clash < 1.5:
        print("FAILED (Ghost Clash Detected - The 'Vacuum Hole Fallacy')")
    elif delta > 0.5:
        print("FAILED (Vulnerable to Resistance)")
    elif score_wt > -6.5:
        print("FAILED (Too Weak)")
    else:
        print("SUCCESS (Potent & Resilient)")
    print("\n")

# Run the Comparison
verify_candidate("BMS-986205", DATA_DIR/"ligand_BMS_986205.pdbqt")
verify_candidate("ERDRP-0519", DATA_DIR/"ligand_ERDRP_0519.pdbqt")
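A free-energy difference can be converted into a fold-change in the dissociation constant, which puts the +0.5 kcal/mol resistance threshold in perspective: $K_{d,\text{mut}}/K_{d,\text{WT}} = e^{\Delta\Delta G/RT}$, with $\Delta\Delta G$ equal to the notebook's delta (mutant minus Wild-Type). Vina scores are empirical estimates, not true free energies, so treat the result as a rough guide only. The sketch assumes T = 298.15 K, and the function name `kd_fold_change` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_fold_change(delta_kcal, temperature_k=298.15):
    """Kd(mutant) / Kd(wild type) implied by delta = dG_mut - dG_wt (kcal/mol)."""
    return math.exp(delta_kcal / (R_KCAL * temperature_k))

for delta in (0.0, 0.5, 1.0):
    print(f"delta {delta:+.1f} kcal/mol -> Kd about {kd_fold_change(delta):.1f}x")
```

With these assumptions, the +0.5 kcal/mol cutoff corresponds to roughly a two- to threefold weaker binding to the mutant.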