HyPhy Version Update: 2.5.101
This update introduces high-performance hardware-specific matrix optimizations (AVX & ARM NEON), extensive batch language modernizations, and key bug fixes for sequence cleaning pipelines.
Core C++ Matrix & Likelihood Optimizations
1. Vectorized Matrix Arithmetic (AVX & NEON)
- Scale and Add Optimizations: Refactored _Matrix::ScaleAndAdd to support both AVX (FMA3 and baseline AVX) and ARM NEON intrinsics, improving performance on vector processing architectures.
- Improved Thread Scaling: Optimized loop scheduling and cache friendliness for core matrix multiplications (src/core/matrix_mult.cpp) and likelihood calculations (src/core/likefunc.cpp).
- Tree Evaluator Performance: Boosted tree evaluation throughput (src/core/tree_evaluator.cpp) by optimizing loop structures and dynamic object allocations during pruning sweeps.
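As a rough illustration of the dispatch pattern behind the Scale and Add item, the sketch below selects an AVX (with or without FMA) or AArch64 NEON path at compile time and falls back to a scalar loop. It assumes the operation has the form dest[i] += scale * src[i]; the function name scale_and_add and its signature are hypothetical and are not HyPhy's actual _Matrix::ScaleAndAdd.

```cpp
#include <cstddef>

#if defined(__AVX__)
  #include <immintrin.h>
#elif defined(__ARM_NEON) && defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical helper (not HyPhy code): dest[i] += scale * src[i] for i in [0, n).
void scale_and_add(double* dest, const double* src, double scale, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    // AVX path: four doubles per iteration, unaligned loads and stores.
    const __m256d vscale = _mm256_set1_pd(scale);
    for (; i + 4 <= n; i += 4) {
        __m256d d = _mm256_loadu_pd(dest + i);
        const __m256d s = _mm256_loadu_pd(src + i);
  #if defined(__FMA__)
        d = _mm256_fmadd_pd(s, vscale, d);               // fused: s * vscale + d
  #else
        d = _mm256_add_pd(d, _mm256_mul_pd(s, vscale));  // baseline AVX: multiply, then add
  #endif
        _mm256_storeu_pd(dest + i, d);
    }
#elif defined(__ARM_NEON) && defined(__aarch64__)
    // NEON path: two doubles per iteration (float64x2_t requires AArch64).
    const float64x2_t vscale = vdupq_n_f64(scale);
    for (; i + 2 <= n; i += 2) {
        float64x2_t d = vld1q_f64(dest + i);
        const float64x2_t s = vld1q_f64(src + i);
        d = vfmaq_f64(d, s, vscale);                     // d + s * vscale
        vst1q_f64(dest + i, d);
    }
#endif
    // Scalar loop: handles the tail, or the whole array when no SIMD path is compiled in.
    for (; i < n; ++i) {
        dest[i] += scale * src[i];
    }
}
```

The scalar loop doubles as the portable fallback, so the same source builds on targets without either instruction set. Note that the FMA path rounds once per element rather than twice, so results may differ from the other paths in the last bit for inputs that are not exactly representable.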
HBL Modernization & Refactoring
1. Style & Syntax Modernization
- AnalyzeCodonData & dNdSRateAnalysis: Comprehensively refactored AnalyzeCodonData.bf and dNdSRateAnalysis.bf to modernize scoping braces and convert incremental expressions to the cleaner HBL += operator.
- Modernization Tracker: Added modernized_files.md and detailed progress reports under modernization_progress/ to systematically track the migration of standard analysis templates (e.g., BUSTED.bf, RELAX.bf, AnalyzeNucProtData.bf) to modernized style conventions.
2. Output Format Alignments
- ACD Suffix: Aligned default output formatting in AnalyzeCodonData.bf to the standard .ACD.json suffix convention.
Bug Fixes
1. Sequence Cleaning (cln / rmv shortcuts)
- HBL Comparison Bug: Resolved critical logic bugs in CleanStopCodons.bf caused by string-to-number comparisons:
  - In HBL, string >= number (e.g., filteringOption >= 2) is evaluated by converting the number to a string and performing an alphabetical comparison. Since all option strings (like "No/No", "No/Yes") sort alphabetically after "2", duplicate sequence filtering was erroneously running for all choices. This is resolved by comparing filteringOption directly to specific string options.
  - The modulo comparison filteringOption % 2 (used to decide if sites with gaps should be filtered) is also corrected. In HBL, the % operator with a string left-hand operand acts as a case-insensitive string equality comparison, which always returned 0 when compared to numeric 2.
  - Corrected the typo filterinOption to cln.disallow_stops on line 157.
- Combined, these fixes ensure the Keep all sequences and sites (No/No) option correctly preserves sequences.
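The comparison pitfalls described above can be modeled outside HBL. The C++ sketch below is only an approximation of the semantics as stated in these notes: string_ge_number and string_mod_number are hypothetical stand-ins for HBL's >= and % with a string left-hand operand, and keeps_everything illustrates the style of the fix (an explicit string comparison). None of these names come from CleanStopCodons.bf.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical model of `string >= number`: the number is converted to text
// and the two strings are compared lexicographically.
bool string_ge_number(const std::string& lhs, int rhs) {
    return lhs >= std::to_string(rhs);
}

// Hypothetical model of `string % number`: case-insensitive string equality
// between the left operand and the number's text form.
bool string_mod_number(const std::string& lhs, int rhs) {
    const std::string r = std::to_string(rhs);
    return lhs.size() == r.size() &&
           std::equal(lhs.begin(), lhs.end(), r.begin(),
                      [](unsigned char a, unsigned char b) {
                          return std::tolower(a) == std::tolower(b);
                      });
}

// Style of the fix: test the option against an explicit string value.
bool keeps_everything(const std::string& option) {
    return option == "No/No";
}
```

Under this model, string_ge_number returns true for every option string that begins with a letter, because letters sort after the digit '2' in ASCII, while string_mod_number never matches "2". That mirrors the reported behavior: the duplicate filter always ran, and the modulo check always returned 0.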
Testing & Quality Assurance
1. Modernization & Formatting Verification
- Brace & Operator Formatting: Added automated formatting utility scripts format_braces.py and replace_increment.py to enforce code styling rules.
- Integration Tests: Introduced pairwise_test.py to run combinatorial correctness checks on modernized batch scripts and prevent regression in analysis results.