┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ _ _ _____ _____ ____ __ __ ______ _____ ┃
┃ /\ | \ | | __ \| __ \ / __ \| \/ | ____| __ \ /\ ┃
┃ / \ | \| | | | | |__) | | | | \ / | |__ | | | | / \ ┃
┃ / /\ \ | . ` | | | | _ /| | | | |\/| | __| | | | |/ /\ \ ┃
┃ / ____ \| |\ | |__| | | \ \| |__| | | | | |____| |__| / ____ \ ┃
┃ /_/ \_\_| \_|_____/|_| \_\\____/|_| |_|______|_____/_/ \_\ ┃
┃ ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Alignment-based Nucleotide Detection and Read Optimization for Mapping Errors, Deaminations, and Alterations
ANDROMEDA is a bioinformatics tool that processes mapped sequencing reads and extracts information from ambiguous nucleotide positions. It identifies barcode and UMI sequences at predefined mapping positions, stores them as BAM read tags, and detects mismatched bases resulting from RNA modifications (e.g., deamination by TadA or other base editors). It also supports UMI grouping, consensus calling, and error annotation for high-confidence sequence reconstruction.
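To illustrate the mismatch-detection idea, the sketch below compares a read against the reference over its aligned positions and reports A>G changes, the signature expected when adenosine is deaminated to inosine (read as G). It assumes the alignment is given as (query_pos, ref_pos) pairs like those returned by pysam's `AlignedSegment.get_aligned_pairs()`, and it only handles forward-strand reads (on reverse-strand reads the same change appears as T>C). The function names are hypothetical; this is not ANDROMEDA's implementation.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# One aligned position: (query index, reference index). Either side may be
# None for insertions, deletions, or clipped bases, as in pysam's output.
AlignedPair = Tuple[Optional[int], Optional[int]]


def count_substitutions(
    pairs: Iterable[AlignedPair], query_seq: str, ref_seq: str
) -> Counter:
    """Count ref>read substitutions (e.g. 'A>G') over aligned, non-indel positions."""
    counts: Counter = Counter()
    for qpos, rpos in pairs:
        if qpos is None or rpos is None:
            continue  # skip insertions, deletions, and clipped bases
        ref_base = ref_seq[rpos].upper()
        read_base = query_seq[qpos].upper()
        if ref_base != read_base and "N" not in (ref_base, read_base):
            counts[f"{ref_base}>{read_base}"] += 1
    return counts


def deamination_positions(
    pairs: Iterable[AlignedPair], query_seq: str, ref_seq: str
) -> List[int]:
    """Reference positions where a reference A reads as G (forward strand only)."""
    return [
        rpos
        for qpos, rpos in pairs
        if qpos is not None
        and rpos is not None
        and ref_seq[rpos].upper() == "A"
        and query_seq[qpos].upper() == "G"
    ]


if __name__ == "__main__":
    ref = "ACGTAAGT"
    read = "ACGTGAGT"  # A>G at reference position 4
    pairs = [(i, i) for i in range(len(read))]
    print(count_substitutions(pairs, read, ref))  # Counter({'A>G': 1})
    print(deamination_positions(pairs, read, ref))  # [4]
```

Skipping indel and clipped positions keeps the counts limited to true base-for-base comparisons, which is where deamination shows up.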
- Map-based Barcode & UMI Extraction – Extracts nucleotide sequences from user-defined ambiguous positions in mapped reads.
- BAM Tagging – Annotates BAM files with extracted barcode/UMI sequences for downstream analysis.
- Modification Detection – Identifies mismatched bases caused by base modifications (e.g., deamination).
- UMI Grouping – Uses UMI-tools to cluster reads by UMI.
- PCR Collapsing & Consensus Calling – Collapses each UMI group into a high-confidence consensus sequence, averaging Phred scores at each nucleotide.
- Error & Confidence Tagging – Adds metadata on sequence confidence, ambiguous bases, and read support.
- Modular Design – Individual processing steps can be run independently, allowing seamless integration into other workflows.
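The PCR collapsing and consensus step can be pictured as a per-position majority vote across the reads of one UMI group, with the Phred scores of the reads that agree with the winning base averaged into a consensus quality. The sketch below assumes ungapped reads of equal length, breaks ties by the first base seen, and uses a plain arithmetic mean of Phred values; the function name and these details are assumptions for illustration, not ANDROMEDA's actual algorithm.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def phred_consensus(
    seqs: Sequence[str], quals: Sequence[Sequence[int]]
) -> Tuple[str, List[int], List[float]]:
    """Collapse equal-length reads from one UMI group into a consensus.

    Returns the consensus sequence, the mean Phred score of the reads that
    support each consensus base, and the fraction of reads supporting it.
    """
    if not seqs:
        raise ValueError("empty UMI group")
    length = len(seqs[0])
    if (
        len(quals) != len(seqs)
        or any(len(s) != length for s in seqs)
        or any(len(q) != length for q in quals)
    ):
        raise ValueError("sketch assumes ungapped reads of equal length")

    bases: List[str] = []
    mean_quals: List[int] = []
    support: List[float] = []
    for i in range(length):
        column = Counter(s[i] for s in seqs)
        # most_common() lists tied counts in first-seen order
        base, n = column.most_common(1)[0]
        agreeing = [q[i] for s, q in zip(seqs, quals) if s[i] == base]
        bases.append(base)
        mean_quals.append(round(sum(agreeing) / len(agreeing)))
        support.append(n / len(seqs))
    return "".join(bases), mean_quals, support


if __name__ == "__main__":
    seqs = ["ACGT", "ACGT", "ACTT"]
    quals = [[30] * 4, [40] * 4, [20] * 4]
    seq, q, s = phred_consensus(seqs, quals)
    print(seq, q)  # ACGT [30, 30, 35, 30]
```

The per-position support fraction is one simple way to flag low-confidence consensus bases, in the spirit of the error and confidence tagging listed above.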
To run ANDROMEDA with uv, you will need the following installed:
- uv
- samtools
The `uvx` command does the rest, fetching the correct Python version and all other dependencies!
To run the full pipeline, use the `run-all` command (this example prints its options):
```bash
uvx --from git+https://github.com/MViscardi-UCSC/ANDROMEDA andromeda run-all --help
```
ANDROMEDA requires Python 3.8+ and dependencies such as `pysam`, `UMI-tools`, and `samtools`. Install using:
```bash
# Clone the repository
git clone https://github.com/MViscardi-UCSC/ANDROMEDA.git
cd ANDROMEDA
# Create a virtual environment (optional but recommended)
python -m venv andromeda_env
source andromeda_env/bin/activate  # On Windows use: andromeda_env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
ANDROMEDA consists of multiple modules that can be run independently or as part of a pipeline (coming soon).
The `andromeda` folder currently contains these working modules:
- `ref_pos_picker.py` helps users pick the correct coordinates for their UMIs in their reference.
- `extract.py` takes a mapped BAM file, a reference genome, and a list of positions to extract from the reads, and outputs a new BAM file with the extracted sequences stored in read tags.
- `umi_group.py` takes the output of `extract.py`, groups reads by UMI, and outputs a new BAM file with the UMI sequences in read tags. This step uses UMI-tools.
- `consensus.py` takes the output of `umi_group.py`, generates a consensus sequence for each UMI group, and outputs a new BAM file of the consensus sequences.
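The core of the `extract.py` step, reading the bases a read carries at fixed reference coordinates, can be sketched in plain Python. The sketch assumes 0-based reference positions and an alignment given as (query_pos, ref_pos) pairs like those from pysam's `AlignedSegment.get_aligned_pairs()`; positions the read deletes or does not cover are filled with `N`. The function name and the tag name in the comment are hypothetical; see the module's `--help` output for its real options and tags.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# (query index, reference index); None marks an insertion/clip or a deletion.
AlignedPair = Tuple[Optional[int], Optional[int]]


def extract_at_ref_positions(
    pairs: Iterable[AlignedPair], query_seq: str, ref_positions: Sequence[int]
) -> str:
    """Return the read bases aligned to each 0-based reference position.

    A position that the read deletes or does not cover becomes 'N'.
    """
    # Map reference position -> query position (None marks a deletion).
    ref_to_query: Dict[int, Optional[int]] = {
        rpos: qpos for qpos, rpos in pairs if rpos is not None
    }
    bases: List[str] = []
    for rpos in ref_positions:
        qpos = ref_to_query.get(rpos)
        bases.append(query_seq[qpos] if qpos is not None else "N")
    return "".join(bases)


if __name__ == "__main__":
    # Reference 12 is deleted in the read, query base 4 is an insertion,
    # and reference 20 lies outside the alignment.
    pairs = [(0, 10), (1, 11), (None, 12), (2, 13), (3, 14), (4, None)]
    umi = extract_at_ref_positions(pairs, "ACGTA", [10, 12, 13, 20])
    print(umi)  # ANGN
    # With pysam, the result could then be stored on the read, e.g.
    # read.set_tag("UM", umi)  # "UM" is a hypothetical tag name
```

Filling gaps with `N` keeps every extracted barcode or UMI the same length, which simplifies grouping reads by tag later.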
More specific instructions for each module can be found below, with extended details available via the `--help` flag.
```bash
python andromeda ref_pos_picker <ref.fasta> <output_parent_directory>
python andromeda extract <ref.fasta> <mapped.bam> <output_parent_directory>
python andromeda umi_group <tagged.bam (from extract step)> <output_parent_directory>
python andromeda consensus <ref.fasta> <grouped.bam (from group step)> <output_parent_directory>
python andromeda run-all <ref.fasta> <mapped.bam> <output_parent_directory>
```
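The `umi_group` step hands clustering to UMI-tools. To show roughly what that clustering does, the sketch below re-implements the idea behind UMI-tools' `directional` method in plain Python: a UMI `a` absorbs a UMI `b` when they differ at one position and `count(a) >= 2 * count(b) - 1`. This is a simplified illustration (equal-length UMIs, greedy assignment starting from the most abundant UMI), not UMI-tools' exact code and not ANDROMEDA's; the function names are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, Iterable, List, Set


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(umis: Iterable[str]) -> List[List[str]]:
    """Cluster UMIs with a simplified 'directional' adjacency rule."""
    counts: Dict[str, int] = Counter(umis)
    # Visit UMIs from most to least abundant; ties broken alphabetically.
    order = sorted(counts, key=lambda u: (-counts[u], u))
    assigned: Set[str] = set()
    groups: List[List[str]] = []
    for root in order:
        if root in assigned:
            continue
        group = [root]
        assigned.add(root)
        queue = deque([root])
        # Breadth-first walk along directed edges from abundant to rare UMIs.
        while queue:
            node = queue.popleft()
            for other in order:
                if (
                    other not in assigned
                    and hamming(node, other) == 1
                    and counts[node] >= 2 * counts[other] - 1
                ):
                    assigned.add(other)
                    group.append(other)
                    queue.append(other)
        groups.append(group)
    return groups


if __name__ == "__main__":
    umis = ["AAAA"] * 10 + ["AAAT"] * 2 + ["AATT"] + ["CCCC"] * 3
    print(directional_groups(umis))  # [['AAAA', 'AAAT', 'AATT'], ['CCCC']]
```

The count condition is what makes the method "directional": a rare UMI one mismatch away from an abundant one is treated as a sequencing or PCR error, while two equally abundant neighbours stay separate.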
- Python 3.11+
- samtools
- UMI-tools
- pandas
- biopython
- seaborn and matplotlib for plotting (optional; TODO: make this optional)
- Support for base modifications, not just mismatches due to deamination
- Integration with alternative consensus-calling algorithms
- Enhanced confidence scoring for ambiguous base calls
MIT License. See `LICENSE` for details.
Developed by Marcus Viscardi and Liam Tran in the Arribere Lab at UCSC. Contributions welcome!
For questions or feedback, please contact marcus.viscardi@gmail.com or open an issue on GitHub.