Alignment-based Nucleotide Detection and Read Optimization for Mapping Errors, Deaminations, and Alterations

MViscardi-UCSC/ANDROMEDA

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃            _   _ _____  _____   ____  __  __ ______ _____            ┃
┃      /\   | \ | |  __ \|  __ \ / __ \|  \/  |  ____|  __ \   /\      ┃
┃     /  \  |  \| | |  | | |__) | |  | | \  / | |__  | |  | | /  \     ┃
┃    / /\ \ | . ` | |  | |  _  /| |  | | |\/| |  __| | |  | |/ /\ \    ┃
┃   / ____ \| |\  | |__| | | \ \| |__| | |  | | |____| |__| / ____ \   ┃
┃  /_/    \_\_| \_|_____/|_|  \_\\____/|_|  |_|______|_____/_/    \_\  ┃
┃                                                                      ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Alignment-based Nucleotide Detection and Read Optimization for Mapping Errors, Deaminations, and Alterations


Overview

ANDROMEDA is a bioinformatics tool designed to process mapped sequencing reads and extract information from ambiguous nucleotide positions. The tool identifies barcode and UMI sequences from predefined mapping positions, assigns tags to BAM files, and detects mismatched bases resulting from RNA modifications (e.g., deamination by TadA or other base editors). It also enables UMI grouping, consensus calling, and error annotation to facilitate high-confidence sequence reconstruction.

Key Features

  • Map-based Barcode & UMI Extraction – Extracts nucleotide sequences from user-defined ambiguous positions in mapped reads.
  • BAM Tagging – Annotates BAM files with extracted barcode/UMI sequences for downstream analysis.
  • Modification Detection – Identifies mismatched bases caused by base modifications (e.g., deamination).
  • UMI Grouping – Leverages UMI-tools to cluster reads.
  • PCR Collapsing & Consensus Calling – Collapses UMI groups into high-confidence consensus sequences, averaging Phred scores at each nucleotide.
  • Error & Confidence Tagging – Adds metadata on sequence confidence, ambiguous bases, and read support.
  • Modular Design – Individual processing steps can be run independently, allowing seamless integration into other workflows.
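
To make the modification-detection idea concrete, here is a minimal sketch of mismatch calling on a gap-free alignment. It is not ANDROMEDA's implementation: the function names are hypothetical, the read is assumed to align to the forward strand without indels, and ambiguous positions are assumed to appear as N in the reference. Adenosine deamination (as by TadA) is expected to show up as A>G mismatches under those assumptions.

```python
def find_mismatches(ref_seq, read_seq, ref_start=0):
    """Return (ref_pos, ref_base, read_base) for each mismatch in a gap-free alignment."""
    mismatches = []
    for offset, (ref_base, read_base) in enumerate(zip(ref_seq.upper(), read_seq.upper())):
        # Skip ambiguous positions (assumed to be N), which hold barcode/UMI bases.
        if ref_base == "N" or read_base == "N":
            continue
        if ref_base != read_base:
            mismatches.append((ref_start + offset, ref_base, read_base))
    return mismatches


def count_a_to_g(mismatches):
    """Count A>G mismatches, the pattern expected from adenosine deamination."""
    return sum(1 for _, ref_base, read_base in mismatches if (ref_base, read_base) == ("A", "G"))


# Hypothetical example: a 12-base reference with a 4-base ambiguous region.
ref = "ACGTNNNNACGA"
read = "GCGTTGCAACGG"
hits = find_mismatches(ref, read, ref_start=100)
print(hits)                # [(100, 'A', 'G'), (111, 'A', 'G')]
print(count_a_to_g(hits))  # 2
```

Reads mapped to the reverse strand would show the complementary T>C pattern, which this sketch does not handle.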

Running ANDROMEDA with uv

To run ANDROMEDA with uv, you only need uv itself installed.

The uvx command handles the rest, fetching the correct Python version and all other dependencies.

To see the options for the full pipeline, run:

uvx --from git+https://github.com/MViscardi-UCSC/ANDROMEDA andromeda run-all --help

Full Installation

ANDROMEDA requires Python 3.11+ and dependencies such as pysam, UMI-tools, and samtools. Install them with:

# Clone the repository
git clone https://github.com/MViscardi-UCSC/ANDROMEDA.git
cd ANDROMEDA

# Create a virtual environment (optional but recommended)
python -m venv andromeda_env
source andromeda_env/bin/activate  # On Windows use: andromeda_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Usage

ANDROMEDA consists of multiple modules that can be run independently or together as a single pipeline with run-all.

Within the andromeda folder we currently have these modules working:

  1. ref_pos_picker.py helps users pick the correct coordinates for their UMIs in the reference.
  2. extract.py takes a mapped BAM file, a reference genome, and a list of positions to extract from the reads. It outputs a new BAM file with the extracted sequences stored in the read tags.
  3. umi_group.py takes the output of extract.py and groups reads by UMI using UMI-tools. It outputs a new BAM file with the UMI sequences stored in the read tags.
  4. consensus.py takes the output of umi_group.py and generates a consensus sequence for each UMI group. It outputs a new BAM file of the consensus sequences.

More specific instructions for each module can be found below, with extended details available with the --help flag.

0. Pick Reference Positions for UMIs

python andromeda ref_pos_picker <ref.fasta> <output_parent_directory>
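
As an illustration of what picking reference positions can involve, the sketch below scans a reference sequence for runs of ambiguous bases. It assumes the UMI or barcode region is written as consecutive N characters in the reference, which this README does not state; `find_ambiguous_runs` is a hypothetical helper, not part of ref_pos_picker.py.

```python
import re


def find_ambiguous_runs(ref_seq, min_len=1):
    """Return 0-based, half-open (start, end) spans of consecutive N bases."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"N+", ref_seq.upper())
        if m.end() - m.start() >= min_len
    ]


# Hypothetical reference with an 8-base and a 2-base ambiguous region.
reference = "ACGTNNNNNNNNACGTNNACGT"
print(find_ambiguous_runs(reference))             # [(4, 12), (16, 18)]
print(find_ambiguous_runs(reference, min_len=4))  # [(4, 12)]
```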

1. Extract Barcodes and UMIs from Mapped Reads

python andromeda extract <ref.fasta> <mapped.bam> <output_parent_directory>
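
The core of map-based extraction is reading off the bases that a read places at chosen reference coordinates. The sketch below shows one way to do that, assuming the alignment is available as a list of (query_pos, ref_pos) pairs with None marking indels, the same shape pysam's `get_aligned_pairs()` returns. The function name and the gap character are assumptions for illustration and may differ from what extract.py does.

```python
def extract_at_ref_positions(query_seq, aligned_pairs, ref_start, ref_end, gap_char="-"):
    """Collect the read bases aligned to reference positions [ref_start, ref_end)."""
    # Map each aligned reference position to its query position (None for deletions).
    ref_to_query = {r: q for q, r in aligned_pairs if r is not None}
    bases = []
    for ref_pos in range(ref_start, ref_end):
        query_pos = ref_to_query.get(ref_pos)
        # No query position means a deletion in the read or no coverage at this base.
        bases.append(gap_char if query_pos is None else query_seq[query_pos])
    return "".join(bases)


# Hypothetical read aligned without indels, starting at reference position 100.
query = "ACGTTGCAACGT"
pairs = [(i, i + 100) for i in range(len(query))]
print(extract_at_ref_positions(query, pairs, 104, 108))  # TGCA
```

Inserted read bases (pairs whose reference position is None) are ignored here; a real extractor would need a policy for them.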

2. Group UMIs Using UMI-Tools

python andromeda umi_group <tagged.bam (from extract step)> <output_parent_directory>
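
To give a feel for what UMI grouping does, here is a deliberately simplified greedy clustering sketch: UMIs are visited from most to least abundant, and each one joins the first existing group whose representative is within a Hamming distance threshold. This is not the algorithm UMI-tools uses, and `group_umis` and its `max_dist` parameter are hypothetical names chosen for this example.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def group_umis(umis, max_dist=1):
    """Greedily cluster UMIs around the most abundant sequences."""
    counts = Counter(umis)
    groups = {}  # representative UMI -> list of member UMIs
    # Most abundant first (ties broken alphabetically) so frequent UMIs become representatives.
    for umi, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in groups:
            if hamming(umi, rep) <= max_dist:
                groups[rep].append(umi)
                break
        else:
            groups[umi] = [umi]
    return groups


observed = ["AAAA", "AAAA", "AAAT", "CCCC", "CCCC", "CCCG", "GGGG"]
print(group_umis(observed))
# {'AAAA': ['AAAA', 'AAAT'], 'CCCC': ['CCCC', 'CCCG'], 'GGGG': ['GGGG']}
```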

3. Create Consensus Sequences

python andromeda consensus <ref.fasta> <grouped.bam (from group step)> <output_parent_directory>
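
The sketch below illustrates one plausible form of consensus calling with per-nucleotide Phred averaging: at each column the most-supported base wins, and its quality is the arithmetic mean of the Phred scores of the reads that support it. The tie-breaking rule, the choice to average only supporting reads, and the function name are assumptions; consensus.py may work differently, and the reads are assumed to be already aligned column by column.

```python
from collections import defaultdict


def call_consensus(reads):
    """reads: list of (sequence, qualities) pairs of equal length, column-aligned."""
    length = len(reads[0][0])
    if any(len(seq) != length or len(quals) != length for seq, quals in reads):
        raise ValueError("all sequences and quality lists must have the same length")
    cons_bases, cons_quals = [], []
    for i in range(length):
        support = defaultdict(list)  # base -> Phred scores of reads calling it
        for seq, quals in reads:
            support[seq[i]].append(quals[i])
        # Most reads wins; ties break on summed quality, then on the base letter.
        best = max(support, key=lambda b: (len(support[b]), sum(support[b]), b))
        cons_bases.append(best)
        cons_quals.append(round(sum(support[best]) / len(support[best])))
    return "".join(cons_bases), cons_quals


# Hypothetical UMI group of three reads; the third disagrees at the third base.
group = [
    ("ACGT", [30, 30, 30, 30]),
    ("ACGT", [20, 40, 30, 10]),
    ("ACTT", [40, 20, 5, 35]),
]
print(call_consensus(group))  # ('ACGT', [30, 30, 30, 25])
```

Averaging Phred scores arithmetically is a simplification, since Phred is a log scale; it is used here only because it is the easiest reading of "Phred averaging".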

Or run all the steps together:

python andromeda run-all <ref.fasta> <mapped.bam> <output_parent_directory>
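
For scripting, the individual steps can be assembled from Python. The sketch below only builds and prints the three command lines shown above, using the documented positional arguments; it does not execute anything. The intermediate file names (`tagged.bam`, `grouped.bam`) are hypothetical placeholders, since the actual names written into the output directory are not documented here, and treating run-all as equivalent to these three steps in order is an assumption.

```python
import shlex


def build_pipeline_commands(ref_fasta, mapped_bam, tagged_bam, grouped_bam, out_dir):
    """Return the extract, umi_group, and consensus commands as argument lists."""
    base = ["python", "andromeda"]
    return [
        base + ["extract", ref_fasta, mapped_bam, out_dir],
        base + ["umi_group", tagged_bam, out_dir],
        base + ["consensus", ref_fasta, grouped_bam, out_dir],
    ]


commands = build_pipeline_commands(
    "ref.fasta", "mapped.bam", "tagged.bam", "grouped.bam", "out"
)
for cmd in commands:
    # To actually run a step, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```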

Dependencies

  • Python 3.11+
  • samtools
  • UMI-tools
  • pandas
  • biopython
  • seaborn and matplotlib for plotting (currently required; making them optional is a TODO)

Future Features

  • Support for base modifications, not just mismatches due to deamination
  • Integration with alternative consensus-calling algorithms
  • Enhanced confidence scoring for ambiguous base calls

License

MIT License. See LICENSE for details.

Contributors

Developed by Marcus Viscardi and Liam Tran in the Arribere Lab at UCSC. Contributions welcome!

Contact

For questions or feedback, please contact marcus.viscardi@gmail.com or open an issue on GitHub.
