A complete workflow for protein-ligand docking using Snakemake, based on the TeachOpenCADD tutorial T015 (https://projects.volkamerlab.org/teachopencadd/talktorials/T015_protein_ligand_docking.html).
This pipeline automates the molecular docking process:
- Prepare structures - Extract protein and ligand from PDB
- Convert to PDBQT - Format conversion for docking
- Molecular docking - Run smina (AutoDock Vina fork) with 5 poses per ligand
- Post-processing - Extract docking scores
- Split poses - Write each of the 5 docked poses to its own file
- Visualization - Generate a 3D protein-ligand complex image with PyMOL for each docked pose
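As a rough illustration of the docking step, the sketch below assembles a smina command line that requests 5 poses. The function name, the default exhaustiveness, and the choice to define the search box from the ligand's own coordinates with --autobox_ligand are assumptions for illustration; the Snakefile in this project may configure smina differently.

```python
def build_smina_command(receptor, ligand, out_sdf, num_poses=5,
                        exhaustiveness=8, seed=None):
    """Return the argument list for one smina docking run.

    Assumption: the search box is derived from the ligand coordinates
    (--autobox_ligand). This is a sketch, not the project's actual rule.
    """
    cmd = [
        "smina",
        "--receptor", str(receptor),
        "--ligand", str(ligand),
        "--out", str(out_sdf),
        "--autobox_ligand", str(ligand),
        "--num_modes", str(num_poses),
        "--exhaustiveness", str(exhaustiveness),
    ]
    if seed is not None:
        # A fixed seed makes repeated runs easier to compare.
        cmd += ["--seed", str(seed)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) once smina is available in the active environment.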
- Python 3.10+
- Conda (Miniforge or Anaconda)
git clone <your-repo-url>
cd dock_project
conda env create -f environment.yml
conda activate dock_env
This installs all dependencies:
- snakemake (workflow engine)
- openbabel (structure format conversion)
- pymol-open-source (3D visualization)
- rdkit (molecular toolkit)
- smina (docking engine)
- And supporting libraries
Place PDB files in data/raw/:
data/raw/2ito.pdb
data/raw/1A2C.pdb (example)
...
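Each structure is identified by its file name without the .pdb extension (for example 2ito). As a minimal sketch, the hypothetical helpers below (not part of this repository) list the identifiers present in data/raw/ and report any requested identifier that has no matching file.

```python
from pathlib import Path


def available_pdb_ids(raw_dir="data/raw"):
    """Return the sorted file stems of all .pdb files in raw_dir."""
    return sorted(path.stem for path in Path(raw_dir).glob("*.pdb"))


def missing_pdb_ids(pdb_ids, raw_dir="data/raw"):
    """Return the identifiers in pdb_ids that have no <id>.pdb in raw_dir."""
    present = set(available_pdb_ids(raw_dir))
    return [pdb_id for pdb_id in pdb_ids if pdb_id not in present]
```

For example, missing_pdb_ids(["2ito", "1A2C"]) returns an empty list when both files are in place.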
Update config.yaml with the PDB file names (without .pdb extension). Only include files that are actually in data/raw/:
pdb_files:
- "2ito"
- "1A2C"
snakemake -n # dry run: list the jobs without executing them
snakemake -j 1 # or another number; use -j x to run x jobs in parallel
snakemake -F -j 1 # force re-execution of all rules
data/
├── processed/
│   ├── 2ito/
│   │   ├── 2ito_protein.pdb
│   │   ├── 2ito_ligand.pdb
│   │   ├── 2ito_protein.pdbqt
│   │   └── 2ito_ligand.pdbqt
│   └── 1A2C/
│       ├── 1A2C_protein.pdb
│       ├── 1A2C_ligand.pdb
│       ├── 1A2C_protein.pdbqt
│       └── 1A2C_ligand.pdbqt
└── docking/
    ├── 2ito/
    │   ├── 2ito_docked.sdf    # All docked poses
    │   ├── 2ito_scores.csv    # Affinity scores
    │   ├── poses/
    │   │   ├── pose_00.sdf
    │   │   ├── pose_01.sdf
    │   │   └── ...
    │   └── visualize/
    │       ├── pose_00.png
    │       ├── pose_01.png
    │       └── ...
    └── 1A2C/
        ├── 1A2C_docked.sdf
        ├── 1A2C_scores.csv
        ├── poses/
        │   ├── pose_00.sdf
        │   ├── pose_01.sdf
        │   └── ...
        └── visualize/
            ├── pose_00.png
            ├── pose_01.png
            └── ...
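The poses/ files and the scores CSV in the layout above can both be derived from the multi-pose SDF. The sketch below shows one way to do this with plain Python. It assumes that each SDF record ends with a $$$$ line and that smina stores each pose's score in a data item named minimizedAffinity; the function names and CSV column names are hypothetical, and the project's own scripts may work differently.

```python
import csv
from pathlib import Path


def split_sdf_records(sdf_text):
    """Split multi-molecule SDF text into one string per record.

    Lines after the last $$$$ terminator are ignored.
    """
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def read_sdf_tag(record, tag):
    """Return the first value line of the SDF data item <tag>, or None."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and f"<{tag}>" in line:
            return lines[i + 1].strip() if i + 1 < len(lines) else None
    return None


def write_poses_and_scores(docked_sdf, poses_dir, scores_csv,
                           score_tag="minimizedAffinity"):
    """Write pose_00.sdf, pose_01.sdf, ... and a CSV of per-pose scores."""
    poses_dir = Path(poses_dir)
    poses_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    records = split_sdf_records(Path(docked_sdf).read_text())
    for index, record in enumerate(records):
        (poses_dir / f"pose_{index:02d}.sdf").write_text(record)
        value = read_sdf_tag(record, score_tag)
        # Assumed tag name; a missing tag is recorded as an empty cell.
        rows.append((index, float(value) if value is not None else None))
    with Path(scores_csv).open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["pose", "affinity"])
        writer.writerows(rows)
    return rows
```

Parsing the text directly keeps the sketch free of extra dependencies; a toolkit such as RDKit could be used instead when the molecules themselves need to be inspected.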
A DAG (Directed Acyclic Graph) visualization is included as dag_protein_docking.svg. It shows how the 6 rules depend on one another.
To regenerate the DAG after modifying the Snakefile:
snakemake --dag --snakefile Snakefile | dot -Tsvg > dag_protein_docking.svg
(Requires graphviz: conda install -c conda-forge graphviz)
- Snakefile - Workflow definition with all 6 rules
- environment.yml - Conda dependencies specification
- config.yaml - Project configuration (paths, PDB files)
- scripts/ - Python scripts for each workflow step
  - prepare_structures.py - Extract protein/ligand from PDB
  - convert_to_pdbqt.py - Convert to PDBQT format
  - run_analysis.py - Extract docking scores
  - visualize_pose.py - Generate 3D visualizations
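To illustrate what the structure preparation step involves, here is a minimal sketch that separates protein and ligand records using the fixed-column PDB layout. It is not the code in prepare_structures.py: the function name, the water residue names, and the rule "ATOM records are protein, non-water HETATM records are ligand" are simplifying assumptions, and the real script may rely on a structure library instead.

```python
# Residue names treated as water and dropped (an assumption of this sketch).
WATER_NAMES = {"HOH", "WAT", "DOD"}


def split_protein_ligand(pdb_text, ligand_resname=None):
    """Return (protein_pdb, ligand_pdb) text from the text of a PDB file.

    ATOM records go to the protein. HETATM records that are not water go
    to the ligand; pass ligand_resname to keep only one residue name,
    which helps when ions or cofactors are also present.
    """
    protein, ligand = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()     # columns 1-6: record name
        resname = line[17:20].strip() # columns 18-20: residue name
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM" and resname not in WATER_NAMES:
            if ligand_resname is None or resname == ligand_resname:
                ligand.append(line)
    protein_pdb = "\n".join(protein + ["END"]) + "\n"
    ligand_pdb = "\n".join(ligand + ["END"]) + "\n"
    return protein_pdb, ligand_pdb
```

The two returned strings correspond to the <PDB>_protein.pdb and <PDB>_ligand.pdb files that the later conversion step takes as input.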
Based on TeachOpenCADD T015: Protein-Ligand Docking https://projects.volkamerlab.org/teachopencadd/talktorials/T015_protein_ligand_docking.html
PDB Files used in creation: https://www.rcsb.org/structure/2ITO https://www.rcsb.org/structure/2XNI
FHNW
- Unsupported atom types (e.g. boron): The pipeline converts ligand and protein files to the PDBQT format and runs smina for docking. The AutoDock/Vina-style PDBQT format (used by smina) supports a limited set of atom types. Ligands that contain certain elements (for example, boron, B) may fail during PDBQT parsing and cause the workflow to stop with a parse error.
Detection (quick): run this to list the element symbols found in a ligand PDB:
awk '/^(ATOM|HETATM)/ {print substr($0,77,2)}' data/processed/<PDB>/<PDB>_ligand.pdb | sort | uniq -c
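The same check can be scripted so that it runs before the conversion step. The sketch below counts element symbols from columns 77-78 of ATOM and HETATM records and reports those outside an allow-list. The allow-list is an assumption based on the commonly used AutoDock atom types, not something read from smina, so adjust it to match your installation; the function names are hypothetical.

```python
from collections import Counter

# Assumed allow-list of elements with AutoDock-style atom types.
ASSUMED_SUPPORTED = {
    "H", "C", "N", "O", "F", "MG", "P", "S",
    "CL", "CA", "MN", "FE", "ZN", "BR", "I",
}


def count_elements(pdb_text):
    """Count element symbols (columns 77-78) in ATOM/HETATM records."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            symbol = line[76:78].strip().upper()
            if symbol:
                counts[symbol] += 1
    return counts


def unsupported_elements(pdb_text, supported=ASSUMED_SUPPORTED):
    """Return the sorted element symbols that are not in the allow-list."""
    return sorted(set(count_elements(pdb_text)) - set(supported))
```

A non-empty result, such as ["B"] for a boron-containing ligand, signals that the structure is likely to fail during PDBQT parsing and may be worth excluding from config.yaml.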