BWA-FastAlign: Faster and Cheaper Sequence Alignment on Commercial CPUs

BWA-FastAlign is a high-performance, cost-efficient software package for mapping low-divergent sequences against a large reference genome, such as the human genome.

It is designed as a drop-in replacement for the de facto standard BWA-MEM, offering a 2.27× to 3.28× throughput speedup and a 2.54× to 5.65× cost reduction on standard CPU servers, while guaranteeing 100% identical output (SAM/BAM) to BWA-MEM.

🚀 Key Features

High Throughput: Achieves ~2.85× average speedup over BWA-MEM by optimizing both the seeding and extension phases.
Cost Efficient: Delivers a 2.54× to 5.65× cost reduction compared to state-of-the-art CPU and GPU baselines (including BWA-MEM2 and BWA-GPU).
Identical Output: Guarantees 100% output compatibility with BWA-MEM. You can swap it into your existing pipelines without changing downstream analysis results.
Low Memory Footprint: Uses a novel Multi-stage Seeding strategy (Hybrid Index) that improves search performance without the massive memory overhead seen in hash-based or learned-index aligners (e.g., ERT-BWA-MEM2).
Optimized for Modern CPUs: Features an Intra-query Parallel algorithm for the seed-extension phase, utilizing AVX2 instructions to eliminate computation bubbles caused by varying read lengths.

🔧 Technical Innovations

BWA-FastAlign revitalizes the traditional alignment pipeline with two core algorithmic contributions:

Multi-Stage Seeding (Hybrid Index)
- Combines Kmer-Index, FMT-Index (Enhanced FM-Index with prefetching), and Direct-Index.
- Dynamically switches strategies based on seed length and match density.
- Achieves an 18.92× improvement in memory efficiency (bases processed per GB per second).
Intra-Query Parallel Seed-Extension
- Unlike BWA-MEM2 (which uses inter-query parallelism and suffers from load imbalance), BWA-FastAlign parallelizes the Smith-Waterman alignment within a single query.
- Includes Dynamic Pruning to skip zero-alignment scores.
- Implements a Sliding Window mechanism to reduce costly memory gather operations.
- Achieves 3.45× higher SIMD utilization, performing consistently well on both WGS (Whole Genome Sequencing) and WES (Whole Exome Sequencing) data.

📥 Installation

Option 1: Install via Bioconda (Recommended)

BWA-FastAlign is available on Bioconda. This is the easiest way to install as it handles dependencies automatically.

conda install -c bioconda bwa-fastalign

Option 2: Build from Source

Prerequisites

Linux operating system (tested on Ubuntu 22.04).
GCC compiler (version 11.4 or higher recommended).
CPU supporting AVX2 instructions (most modern Intel/AMD CPUs).
zlib development files.
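
Before building, it can help to confirm that the CPU actually advertises AVX2. The sketch below is not part of BWA-FastAlign; the function name cpu_has_avx2 is hypothetical, and it assumes a Linux system that exposes CPU flags through /proc/cpuinfo.

```shell
# Hypothetical helper (not shipped with BWA-FastAlign).
# Succeeds if the given cpuinfo file (default: /proc/cpuinfo) lists the avx2 flag.
cpu_has_avx2() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches "avx2" as a whole word, so flags like "avx" alone do not count.
    grep -qw 'avx2' "$cpuinfo" 2>/dev/null
}

if cpu_has_avx2; then
    echo "AVX2: supported"
else
    echo "AVX2: not detected"
fi
```

If the check reports that AVX2 is not detected, the AVX2-based seed-extension code described above is unlikely to run on that machine.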

Compilation

git clone https://github.com/your-username/BWA-FastAlign.git
cd BWA-FastAlign
make

📖 Usage

BWA-FastAlign follows the same command-line interface as BWA-MEM.

Download Datasets. Download the E. coli reference genome and a set of paired-end sequencing reads.

# Download reference genome
wget http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/GCA_000005845.2_ASM584v2/GCA_000005845.2_ASM584v2_genomic.fna.gz
gzip -d GCA_000005845.2_ASM584v2_genomic.fna.gz
mv GCA_000005845.2_ASM584v2_genomic.fna ref.fasta

# Download sequencing reads
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_1.fastq.gz -O reads_1.fq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_2.fastq.gz -O reads_2.fq.gz
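
Interrupted downloads can leave truncated archives that only fail later, during alignment. As a quick sanity check, the sketch below runs gzip's built-in integrity test on each file. The function name check_gzip_files is hypothetical and not part of BWA-FastAlign.

```shell
# Hypothetical helper: report whether each given .gz file passes `gzip -t`.
# Returns non-zero if at least one file is missing or corrupt.
check_gzip_files() {
    failed=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "ok      $f"
        else
            echo "corrupt $f"
            failed=1
        fi
    done
    return "$failed"
}

# Example, using the file names from the download step:
#   check_gzip_files reads_1.fq.gz reads_2.fq.gz
```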

Index the Reference. Before alignment, you must index your reference genome.

# This will generate the hybrid index files
./fastalign index ref.fasta

Align Reads (Mem). Map single-end or paired-end reads to the reference.

# Single-end alignment
./fastalign mem ref.fasta reads_1.fq.gz > aln.sam

# Paired-end alignment
./fastalign mem ref.fasta reads_1.fq.gz reads_2.fq.gz > aln.sam

# Using multiple threads (Recommended: 32-128 threads for high throughput)
./fastalign mem -t 64 ref.fasta reads_1.fq.gz reads_2.fq.gz > aln.sam

Options. BWA-FastAlign supports the standard BWA-MEM options. Run ./fastalign mem without arguments to print the full list.
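
To check the identical-output claim on your own data, one option is to compare the alignment records from both tools. The sketch below is an illustration, not a tool shipped with BWA-FastAlign: the function name sam_records_identical and the file names bwa.sam and fastalign.sam are hypothetical. It assumes both runs used the same inputs and options, and it skips header lines because the @PG line typically records the program name and command line, which will differ between tools.

```shell
# Hypothetical helper: compare two SAM files, ignoring header lines ("@...").
# Returns 0 if all alignment records are byte-identical and in the same order.
sam_records_identical() {
    a_body=$(mktemp) || return 2
    b_body=$(mktemp) || return 2
    # Drop header lines; only alignment records remain.
    sed '/^@/d' "$1" > "$a_body"
    sed '/^@/d' "$2" > "$b_body"
    cmp -s "$a_body" "$b_body"
    status=$?
    rm -f "$a_body" "$b_body"
    return "$status"
}

# Example (assumes the original `bwa` binary is on PATH and that the reference
# has been indexed for each tool):
#   bwa mem ref.fasta reads_1.fq.gz reads_2.fq.gz > bwa.sam
#   ./fastalign mem ref.fasta reads_1.fq.gz reads_2.fq.gz > fastalign.sam
#   sam_records_identical bwa.sam fastalign.sam && echo "records identical"
```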

📜 Citation

If you find BWA-FastAlign useful in your research, please cite our paper:

@inproceedings{fastalign2026,
  title={Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs},
  author={Zhonghai Zhang and Yewen Li and Ke Meng and Chunming Zhang and Guangming Tan},
  booktitle={Proceedings of the 31st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '26)},
  year={2026}
}
