CoMR is a proteome prediction and reconstruction workflow for mitochondria and mitochondrion-related organelles (MROs), aimed at model organisms and at eukaryotes with atypical targeting signals. It combines targeting-signal predictors, HMM searches, DIAMOND homology searches, and downstream parsing to produce scored candidate lists.
| If you want to... | Read |
|---|---|
| Install and run the packaged workflow | README.md |
| Understand pipeline logic, outputs, and scoring | README_CoMR.md |
| Build the container images yourself | README_BUILD.md |
The CoMR container includes Snakemake and the runtime software environment, but you still need to provide a few host-side assets:
| Required on host | Why |
|---|---|
| CoMR repository clone | Workflow code and config |
| TargetP 2.0 binary | Licensed external dependency |
| CoMR database bundle | HMMs, alignments, MitoDB, SubtractedDB, UniProt |
| DIAMOND-formatted NR database | Optional but recommended NR homology search |
| NCBI taxonomy files | Needed if taxonomy is not embedded in nr.dmnd |
```bash
git clone https://github.com/theLabUpstairs/CoMR.git
cd CoMR
bash scripts/fetch_third_party.sh
```

The fetch script retrieves MitoFates and Mitoprot II.
TargetP 2.0 is licensed separately. Request a license from DTU Health Tech, download the Linux archive, and unpack it somewhere readable on the host, for example `/your/path/to/targetp-2.0`:

```bash
tar -xzf targetp-2.0.Linux.tar.gz
chmod -R 755 targetp-2.0
```

CoMR expects this directory to be mounted inside the container at `/mnt/software/targetp-2.0`.
Download and extract the CoMR database bundle `CoMR_DB_hmm` from Figshare. It contains:

- `Alignments/`
- `Hmm_profile/`
- `SMD_MitoDB.fasta`
- `SMD_SubtractedDB.fasta`
- `uniprot_sprot.fasta`

Example host location: `/your/path/to/CoMR_DB_hmm`. If you plan to use an optional custom database (CustomDB), place its FASTA file in this same directory.
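Before mounting the bundle, it can help to confirm the extraction is complete. The function below is a minimal sketch, assuming the bundle layout listed above (two sub-directories and three FASTA files at the top level); `check_comr_db` is a hypothetical helper, not part of CoMR.

```bash
# Hypothetical helper (not part of CoMR): check that an extracted
# CoMR_DB_hmm bundle has the entries listed above before mounting it.
check_comr_db() {
  local db_dir="$1"
  local status=0
  local entry
  # Sub-directories expected in the bundle
  for entry in Alignments Hmm_profile; do
    if [ ! -d "$db_dir/$entry" ]; then
      echo "missing directory: $db_dir/$entry" >&2
      status=1
    fi
  done
  # FASTA files expected in the bundle (must exist and be non-empty)
  for entry in SMD_MitoDB.fasta SMD_SubtractedDB.fasta uniprot_sprot.fasta; do
    if [ ! -s "$db_dir/$entry" ]; then
      echo "missing or empty file: $db_dir/$entry" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_comr_db /your/path/to/CoMR_DB_hmm && echo "bundle looks complete"`. Every missing entry is reported, not just the first one.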
If you want the NR search stage, prepare nr.dmnd and record its path.
Example host location: /your/path/to/blastdb/nr.dmnd
Build NR with embedded taxonomy:
```bash
mkdir -p /your/path/to/blastdb
cd /your/path/to/blastdb
wget https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz
# Extract only the two taxonomy files DIAMOND needs
tar -xzf new_taxdump.tar.gz names.dmp nodes.dmp
# Build nr.dmnd with taxonomy embedded
diamond makedb \
  --in nr.gz \
  --db nr \
  --taxonmap prot.accession2taxid.FULL.gz \
  --taxonnodes nodes.dmp \
  --taxonnames names.dmp
```

If you already have a DIAMOND NR database without taxonomy support, also keep these NCBI taxonomy files available:

- `prot.accession2taxid.gz`
- `nodes.dmp`
- `names.dmp`

Example host location: `/your/path/to/taxonomy`
If you do not have an NR database, CoMR can still run with NR searches disabled by passing `enable_nr=false` as a `--config` override.
Copy the template:
```bash
cp config/config.yaml config/config_runtime.yaml
```

The template is intended to run with minimal edits. In practice, these are the settings users most often change:
If your DIAMOND-indexed NR database was built without taxonomy:

```yaml
diamond_search:
  taxonomy_enabled: False
```

To enable an optional CustomDB FASTA:
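```yaml
database:
  customdb: "/mnt/databases/your_custom_db.fasta"
```

If `customdb` is set, also run CoMR with `enable_customdb=true`, and make sure the FASTA file exists in the host database directory (for example `/your/path/to/CoMR_DB_hmm`), which is mounted at `/mnt/databases`.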
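Because the config refers to a container path while the file lives on the host, a missing CustomDB is easy to overlook. The sketch below translates the container path back to its host location and checks it, assuming CustomDB sits under the directory mounted at `/mnt/databases`; `customdb_host_path` and `check_customdb` are hypothetical helpers, not part of CoMR.

```bash
# Hypothetical helpers (not part of CoMR): map the database.customdb value
# (a container path under /mnt/databases) to the host file it comes from.
customdb_host_path() {
  local db_dir="$1"          # host directory mounted at /mnt/databases
  local container_path="$2"  # value of database.customdb in the config
  case "$container_path" in
    /mnt/databases/*)
      printf '%s/%s\n' "${db_dir%/}" "${container_path#/mnt/databases/}"
      ;;
    *)
      echo "customdb is not under /mnt/databases: $container_path" >&2
      return 1
      ;;
  esac
}

# Succeeds only if the mapped host file exists and is non-empty
check_customdb() {
  local host_path
  host_path=$(customdb_host_path "$1" "$2") || return 1
  if [ ! -s "$host_path" ]; then
    echo "CustomDB FASTA not found on host: $host_path" >&2
    return 1
  fi
}
```

Usage: `check_customdb "$DB_DIR" /mnt/databases/your_custom_db.fasta` before adding `enable_customdb=true` to the run.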
Exclude specific taxa from NR hits by taxid:
```yaml
diamond_search:
  excluded_taxids:
    - 9606   # Homo sapiens
    - 10090  # Mus musculus
```

Or exclude taxa listed in a file with one taxid per line:
```yaml
diamond_search:
  excluded_taxids_file: "config/exclusions/listoftaxids.txt"
```

Use taxon exclusion only when `taxonomy_enabled: True`, since excluding hits by taxid requires taxonomy information for each NR hit.
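A stray species name or a Windows line ending in the exclusion file is easy to miss. The following is a minimal sketch of a format check, assuming each non-blank line must be a numeric NCBI taxid; `validate_taxid_file` is a hypothetical helper, and skipping blank lines is an assumption about what CoMR tolerates.

```bash
# Hypothetical helper (not part of CoMR): check a one-taxid-per-line
# exclusion file such as config/exclusions/listoftaxids.txt.
# Blank lines are skipped here (an assumption, not documented CoMR behavior).
validate_taxid_file() {
  local file="$1"
  local line
  local n=0
  local status=0
  # "|| [ -n "$line" ]" also processes a last line with no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # Drop spaces, tabs and Windows carriage returns
    line=$(printf '%s' "$line" | tr -d '[:space:]')
    if [ -z "$line" ]; then
      continue
    fi
    case "$line" in
      *[!0-9]*)
        echo "line $n is not a numeric taxid: $line" >&2
        status=1
        ;;
    esac
  done < "$file"
  return "$status"
}
```

Usage: `validate_taxid_file config/exclusions/listoftaxids.txt`.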
Adjust the misc section to match your hardware:
| Node profile | Total cores / RAM | `threads` | `threads_diamond` | `block_size` | `diamond_slots` | Notes |
|---|---|---|---|---|---|---|
| Workstation | 8 cores / 32 GB | 4-8 | 4-8 | 1 | 1 | Often best to disable NR |
| Mid-size HPC | 32 cores / 128 GB | 24-32 | 8-16 | 1-2 | 1 | Leave CPU headroom for MAFFT and DeepMito |
| Large HPC | 64+ cores / 256+ GB | 32-64 | 32-64 | 4+ | 2 | Only raise diamond_slots if storage/RAM can keep up |
Rule of thumb: DIAMOND typically needs about 2-3 GB RAM per thread plus strong I/O.
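The rule of thumb can be turned into a quick upper bound for `threads_diamond`. This is a minimal sketch, assuming the conservative 3 GB per thread end of the range; `diamond_thread_ceiling` is a hypothetical helper, and its result is a ceiling rather than a recommendation.

```bash
# Hypothetical helper (not part of CoMR): turn the 2-3 GB-per-thread rule of
# thumb into an upper bound for threads_diamond. It uses the conservative
# 3 GB/thread end of the range and never exceeds the available cores.
diamond_thread_ceiling() {
  local cores="$1"   # total cores available to the job
  local ram_gb="$2"  # total RAM available to the job, in GB
  local by_ram=$((ram_gb / 3))
  local ceiling="$cores"
  if [ "$by_ram" -lt "$ceiling" ]; then
    ceiling="$by_ram"
  fi
  # Always allow at least one thread
  if [ "$ceiling" -lt 1 ]; then
    ceiling=1
  fi
  echo "$ceiling"
}
```

For example, `diamond_thread_ceiling 64 96` prints `32`, because RAM rather than cores is the limit. On the mid-size profile (32 cores / 128 GB) it allows 32, well above the 8-16 in the table, which also leaves headroom for MAFFT and DeepMito.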
You can define input FASTA files in the config:
```yaml
fasta_files:
  - /path/to/sample1.fasta
  - /path/to/sample2.fa
```

Or pass them dynamically at runtime, for example:

```bash
--config fasta=sample.pep
--config fasta=/path/to/sample.fasta
--config fasta=/path/to/sample1.fasta,/path/to/sample2.fasta
```

Accepted input extensions: `.fasta`, `.fa`, `.fas`, `.aa`, `.pep`
Each input must resolve to a unique sample basename. For example,
sample.fasta and sample.pep will collide.
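Both rules can be checked before launching a long run. The sketch below assumes the sample name is the file name minus its last extension (consistent with the `sample.fasta` / `sample.pep` collision) and that file names contain no spaces; `check_fasta_inputs` is a hypothetical helper, not part of CoMR.

```bash
# Hypothetical pre-flight check (not part of CoMR): every input must use an
# accepted extension and map to a unique sample basename.
# Assumption: the sample name is the file name minus its last extension.
check_fasta_inputs() {
  local seen=" "   # space-delimited list of sample names seen so far
  local status=0
  local f base name
  for f in "$@"; do
    base=${f##*/}              # strip directory
    case "$base" in
      *.fasta|*.fa|*.fas|*.aa|*.pep) ;;
      *)
        echo "unsupported extension: $f" >&2
        status=1
        continue
        ;;
    esac
    name=${base%.*}            # strip last extension
    case "$seen" in
      *" $name "*)
        echo "duplicate sample basename '$name': $f" >&2
        status=1
        ;;
      *)
        seen="$seen$name "
        ;;
    esac
  done
  return "$status"
}
```

To check a comma-separated `fasta=` value in bash: `IFS=, read -r -a files <<< "$FASTA_INPUT"; check_fasta_inputs "${files[@]}"`.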
To send declared outputs outside the repository, set:
```yaml
output_dir: "/path/to/comr_results"
```

This redirects workflow outputs and logs such as:

- `00_data_format_<FASTA>/`
- `01_analysis_original_<FASTA>/`
- `02_analysis_parsed_<FASTA>/`
- `03_alignments_<FASTA>/`
- `04_trees_<FASTA>/`
- `05_CoMR_<FASTA>/`
- `logs_<FASTA>/`
Internal runtime/cache directories such as .snakemake/ and .inline_cache/
remain under the CoMR installation directory.
Set your local paths once:
```bash
COMR_ROOT=/path/to/CoMR
DB_DIR=/path/to/CoMR_DB_hmm
NR_DMND=/path/to/blastdb/nr.dmnd
TAXONOMY=/path/to/taxonomy
TARGETP=/path/to/targetp-2.0
CORES=32
FASTA_INPUT=proteins.pep
OUTPUT_DIR=/path/to/comr_results
```

Keep the container-internal paths consistent with the config:

- databases under `/mnt/databases`
- NR under `/mnt/blastdb`
- taxonomy under `/mnt/taxonomy`
- TargetP under `/mnt/software/targetp-2.0`
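A wrong host path usually surfaces only as a confusing mount or "file not found" error inside the container. This minimal sketch checks the variables set in the previous step beforehand, assuming NR and separate taxonomy files are optional and left empty when unused; `check_path` and `preflight_host_paths` are hypothetical helpers, not part of CoMR.

```bash
# Hypothetical pre-flight check (not part of CoMR): confirm the host-side
# assets referenced by the path variables exist before they are mounted.
check_path() {
  # $1 = -d (directory) or -f (regular file), $2 = label, $3 = host path
  if [ ! "$1" "$3" ]; then
    echo "$2 not found: '$3'" >&2
    return 1
  fi
}

preflight_host_paths() {
  local status=0
  check_path -d COMR_ROOT "${COMR_ROOT:-}" || status=1
  check_path -f "runtime config" "${COMR_ROOT:-}/config/config_runtime.yaml" || status=1
  check_path -d DB_DIR "${DB_DIR:-}" || status=1
  check_path -d TARGETP "${TARGETP:-}" || status=1
  # NR and separate taxonomy files are optional (enable_nr=false, or taxonomy
  # embedded in nr.dmnd), so they are only checked when the variable is set.
  if [ -n "${NR_DMND:-}" ]; then
    check_path -f NR_DMND "$NR_DMND" || status=1
  fi
  if [ -n "${TAXONOMY:-}" ]; then
    check_path -d TAXONOMY "$TAXONOMY" || status=1
  fi
  return "$status"
}
```

Usage: run `preflight_host_paths || exit 1` after setting the variables and before `docker run` or `singularity exec`.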
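Pull the image:

```bash
docker pull ghcr.io/thelabupstairs/comr:latest
```

Run the workflow. `IMAGE` must match the reference you pulled, and the output directory is created on the host and mounted at the same path so that `output_dir` resolves inside the container:

```bash
IMAGE=ghcr.io/thelabupstairs/comr:latest
mkdir -p "$OUTPUT_DIR"
docker run --rm \
  --user "$(id -u)":"$(id -g)" \
  -e HOME=/opt/CoMR \
  -v "$DB_DIR:/mnt/databases:ro" \
  -v "$NR_DMND:/mnt/blastdb/nr.dmnd:ro" \
  -v "$TAXONOMY:/mnt/taxonomy:ro" \
  -v "$TARGETP:/mnt/software/targetp-2.0:ro" \
  -v "$COMR_ROOT:/opt/CoMR" \
  -v "$OUTPUT_DIR:$OUTPUT_DIR" \
  -w /opt/CoMR \
  "$IMAGE" \
  snakemake --cores "$CORES" \
    --configfile config/config_runtime.yaml \
    --config fasta="$FASTA_INPUT" output_dir="$OUTPUT_DIR"
```

Notes: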
- Add `enable_customdb=true` inside the same `--config` block to enable CustomDB.
- Add `enable_nr=false` inside the same `--config` block to skip NR.
- If you override multiple keys, keep them all after a single `--config`.
- Use repeated `--group-add <gid>` if your filesystem permissions require additional groups.
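When overrides vary between runs, collecting them in one array keeps every `key=value` pair behind a single `--config`. This is a minimal bash sketch; `build_config_args`, `CONFIG_ARGS`, `ENABLE_NR`, and `ENABLE_CUSTOMDB` are hypothetical names chosen for illustration, not CoMR settings.

```bash
# Hypothetical helper (not part of CoMR; requires bash arrays): collect the
# Snakemake overrides in one array so that every key=value pair follows a
# single --config flag. ENABLE_NR and ENABLE_CUSTOMDB are illustrative
# variable names; leave them unset to omit those keys from the overrides.
build_config_args() {
  # $1 = FASTA input(s), $2 = output directory
  CONFIG_ARGS=(--config "fasta=$1" "output_dir=$2")
  if [ -n "${ENABLE_NR:-}" ]; then
    CONFIG_ARGS+=("enable_nr=$ENABLE_NR")
  fi
  if [ -n "${ENABLE_CUSTOMDB:-}" ]; then
    CONFIG_ARGS+=("enable_customdb=$ENABLE_CUSTOMDB")
  fi
}
```

Usage: set `ENABLE_CUSTOMDB=true`, call `build_config_args "$FASTA_INPUT" "$OUTPUT_DIR"`, then end the container command with `snakemake --cores "$CORES" --configfile config/config_runtime.yaml "${CONFIG_ARGS[@]}"`. The array is expanded on the host, so the same approach works for `docker run` and `singularity exec`.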
Example with multiple overrides:

```bash
snakemake --cores "$CORES" \
  --configfile config/config_runtime.yaml \
  --config fasta="$FASTA_INPUT" output_dir="$OUTPUT_DIR" enable_customdb=true
```

For Singularity or Apptainer, download the image `CoMR.sif` from Figshare, then set:

```bash
COMR_IMAGE=/path/to/CoMR.sif
cd "$COMR_ROOT"
# Writable directory for MitoFates' compiled Perl Inline modules (bound below)
mkdir -p "$COMR_ROOT/.inline_cache"
```

On Slurm systems:
```bash
srun singularity exec \
  --bind "$DB_DIR:/mnt/databases:ro" \
  --bind "$NR_DMND:/mnt/blastdb/nr.dmnd:ro" \
  --bind "$TAXONOMY:/mnt/taxonomy:ro" \
  --bind "$TARGETP:/mnt/software/targetp-2.0:ro" \
  --bind "$COMR_ROOT:/opt/CoMR" \
  --bind "$COMR_ROOT/.inline_cache:/opt/software/MitoFates/bin/modules/_Inline" \
  "$COMR_IMAGE" \
  snakemake --cores "$CORES" \
    --configfile config/config_runtime.yaml \
    --config fasta="$FASTA_INPUT" output_dir="$OUTPUT_DIR"
```

On systems without Slurm:
```bash
singularity exec \
  --bind "$DB_DIR:/mnt/databases:ro" \
  --bind "$NR_DMND:/mnt/blastdb/nr.dmnd:ro" \
  --bind "$TAXONOMY:/mnt/taxonomy:ro" \
  --bind "$TARGETP:/mnt/software/targetp-2.0:ro" \
  --bind "$COMR_ROOT:/opt/CoMR" \
  --bind "$COMR_ROOT/.inline_cache:/opt/software/MitoFates/bin/modules/_Inline" \
  "$COMR_IMAGE" \
  snakemake --cores "$CORES" \
    --configfile config/config_runtime.yaml \
    --config fasta="$FASTA_INPUT" output_dir="$OUTPUT_DIR"
```

If your system uses Apptainer, replace `singularity exec` with `apptainer exec`.
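To keep one launch script working on both kinds of systems, the runtime can be chosen automatically. This is a minimal sketch that prefers `apptainer` when it is on `PATH` and falls back to `singularity`; `container_runtime` is a hypothetical helper, not part of CoMR.

```bash
# Hypothetical helper (not part of CoMR): print "apptainer" if it is on PATH,
# otherwise "singularity"; fail if neither command is available.
container_runtime() {
  if command -v apptainer >/dev/null 2>&1; then
    echo apptainer
  elif command -v singularity >/dev/null 2>&1; then
    echo singularity
  else
    echo "neither apptainer nor singularity found on PATH" >&2
    return 1
  fi
}
```

Usage: `RUNTIME=$(container_runtime) || exit 1`, then start the command with `"$RUNTIME" exec` (or `srun "$RUNTIME" exec` under Slurm) and keep the same `--bind` options. If your site loads the runtime through environment modules, load the module before calling the helper.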
If you are on an HPC system, check your local site documentation or ask your
system administrators for the correct module setup, scheduler integration, and
container invocation pattern.
- Record the version and location of every external asset you use.
- Keep host paths and container paths consistent.
- If NR taxonomy is not embedded in `nr.dmnd`, make sure the taxonomy files are mounted separately.
- If you disable NR, CoMR can still run, but the NR-based search and parse stages will be skipped.
- If you enable CustomDB, make sure the FASTA exists inside the mounted database directory and pass `enable_customdb=true`.
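Part of the record-keeping can be automated. The sketch below writes a tab-separated manifest of asset locations, types, and sizes, plus the CoMR git commit when `COMR_ROOT` is a git clone. `write_asset_manifest` and its `LABEL=path` argument format are hypothetical, and versions of TargetP or the NR download date still have to be noted by hand.

```bash
# Hypothetical helper (not part of CoMR): write a tab-separated record of
# where each external asset lives, its type and size, plus the CoMR git
# commit if COMR_ROOT is a git clone. Tool and database versions (TargetP,
# NR download date, etc.) still have to be noted by hand.
write_asset_manifest() {
  local out="$1"
  shift
  local pair label path kind size commit
  {
    echo "# CoMR asset manifest written $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf '# label\tpath\ttype\tsize_kb\n'
    for pair in "$@"; do
      label=${pair%%=*}
      path=${pair#*=}
      if [ -d "$path" ]; then
        kind=dir
      elif [ -f "$path" ]; then
        kind=file
      else
        kind=missing
      fi
      # du -sk reports disk usage in KB; empty if the path is missing
      size=$(du -sk "$path" 2>/dev/null | cut -f1) || size=""
      printf '%s\t%s\t%s\t%s\n' "$label" "$path" "$kind" "${size:-NA}"
    done
    commit=""
    if [ -n "${COMR_ROOT:-}" ]; then
      commit=$(git -C "$COMR_ROOT" rev-parse HEAD 2>/dev/null) || commit=""
    fi
    printf 'COMR_COMMIT\t%s\n' "${commit:-NA}"
  } > "$out"
}
```

Usage: `write_asset_manifest comr_assets.tsv "DB_DIR=$DB_DIR" "NR_DMND=$NR_DMND" "TAXONOMY=$TAXONOMY" "TARGETP=$TARGETP"`. Running `du -sk` on large directories can take a while on network filesystems.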