2.5.101

@spond released this 18 Jun 15:27 (commit 646e16e)

HyPhy Version Update: 2.5.101

This update introduces high-performance hardware-specific matrix optimizations (AVX & ARM NEON), extensive batch language modernizations, and key bug fixes for sequence cleaning pipelines.

Core C++ Matrix & Likelihood Optimizations

1. Vectorized Matrix Arithmetic (AVX & NEON)

  • Scale and Add Optimizations: Refactored _Matrix::ScaleAndAdd to use AVX intrinsics (with FMA3 where available, baseline AVX otherwise) and ARM NEON intrinsics, improving performance on CPUs that provide these vector extensions.
  • Improved Thread Scaling: Optimized loop scheduling and cache locality for core matrix multiplications (src/core/matrix_mult.cpp) and likelihood calculations (src/core/likefunc.cpp).
  • Tree Evaluator Performance: Boosted tree evaluation throughput (src/core/tree_evaluator.cpp) by optimizing loop structures and dynamic object allocations during pruning sweeps.
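
For readers unfamiliar with this kind of kernel, the following is a minimal sketch of a scale-and-add loop with compile-time selection between AVX (with or without FMA), AArch64 NEON, and a scalar fallback. It is not the HyPhy implementation: the free function scale_and_add, its signature, and the assumption that the operation has the form dst[i] += scale * src[i] are all illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__AVX__)
  #include <immintrin.h>
#elif defined(__ARM_NEON) && defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical helper (NOT HyPhy's _Matrix::ScaleAndAdd):
// computes dst[i] += scale * src[i] for i in [0, n).
void scale_and_add(double* dst, const double* src, double scale, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    std::size_t i = 0;

#if defined(__AVX__)
    // Process 4 doubles per iteration with 256-bit registers.
    const __m256d s = _mm256_set1_pd(scale);
    for (; i + 4 <= n; i += 4) {
        __m256d d = _mm256_loadu_pd(dst + i);
        const __m256d x = _mm256_loadu_pd(src + i);
  #if defined(__FMA__)
        d = _mm256_fmadd_pd(x, s, d);               // fused: x * s + d
  #else
        d = _mm256_add_pd(d, _mm256_mul_pd(x, s));  // baseline AVX: multiply, then add
  #endif
        _mm256_storeu_pd(dst + i, d);
    }
#elif defined(__ARM_NEON) && defined(__aarch64__)
    // Process 2 doubles per iteration with 128-bit registers.
    const float64x2_t s = vdupq_n_f64(scale);
    for (; i + 2 <= n; i += 2) {
        float64x2_t d = vld1q_f64(dst + i);
        const float64x2_t x = vld1q_f64(src + i);
        d = vfmaq_f64(d, x, s);                     // d + x * s
        vst1q_f64(dst + i, d);
    }
#endif

    // Scalar tail; also the complete fallback when no vector path is compiled in.
    for (; i < n; ++i) {
        dst[i] += scale * src[i];
    }
}
```

The vector paths are enabled by compiler flags such as -mavx or -mfma on x86-64; without them the same source compiles to the scalar loop, which keeps one code path portable across targets.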

HBL Modernization & Refactoring

1. Style & Syntax Modernization

  • AnalyzeCodonData & dNdSRateAnalysis: Comprehensively refactored AnalyzeCodonData.bf and dNdSRateAnalysis.bf to modernize scoping braces and convert incremental expressions to the cleaner HBL += operator.
  • Modernization Tracker: Added modernized_files.md and detailed progress reports under modernization_progress/ to systematically track the migration of standard analysis templates (e.g., BUSTED.bf, RELAX.bf, AnalyzeNucProtData.bf) to modernized style conventions.
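
As an illustration of the += conversion, here is a minimal sketch of a line-level rewrite from `x = x + expr` to `x += expr`. It is not the project's tooling: the function modernize_increment and its regular expression are hypothetical, and the sketch only handles statements that begin a line.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites a leading "name = name + " into "name += ".
// Only matches when the assignment starts the line (after optional indentation)
// and the same identifier appears on both sides of '='.
std::string modernize_increment(const std::string& line) {
    // Group 1: indentation. Group 2: identifier (dots allowed, as in cln.disallow_stops).
    static const std::regex pattern(
        R"(^(\s*)([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*\2\s*\+\s*)");
    return std::regex_replace(line, pattern, "$1$2 += ");
}
```

For example, `  count = count + 1;` becomes `  count += 1;`, while `total = subtotal + 1;` and `a.x = x + 1;` are left unchanged because the left-hand and right-hand identifiers differ.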

2. Output Format Alignments

  • ACD Suffix: Aligned default output formatting in AnalyzeCodonData.bf to standard .ACD.json suffix conventions.

Bug Fixes

1. Sequence Cleaning (cln / rmv shortcuts)

  • HBL Comparison Bug: Resolved critical logic bugs in CleanStopCodons.bf caused by string-to-number comparisons:
    • In HBL, string >= number (e.g. filteringOption >= 2) is evaluated by converting the number to a string and comparing the two strings alphabetically. Since all option strings (like "No/No", "No/Yes") sort after "2", duplicate sequence filtering was erroneously running for every choice. This is resolved by comparing filteringOption directly to the specific option strings.
    • The modulo check filteringOption % 2 (used to decide whether sites with gaps should be filtered) is also corrected. In HBL, the % operator with a string left-hand operand acts as a case-insensitive string equality comparison, so the expression always returned 0 when the option string was compared to the number 2.
    • Replaced the misspelled variable filterinOption with cln.disallow_stops on line 157.
  • Combined, these fixes ensure that the "Keep all sequences and sites" (No/No) option correctly preserves sequences.
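
The comparison semantics described above can be modeled outside HBL. The sketch below is a C++ simulation of the described behavior, not HyPhy code: string_ge_number, string_mod_number, and matches_option are hypothetical names, and the only option strings used are the two quoted in these notes.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Models `string >= number` as described: the number is converted to text
// and the two strings are compared lexicographically.
bool string_ge_number(const std::string& s, int n) {
    return s >= std::to_string(n);
}

// Case-insensitive string equality.
bool ci_equal(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// Models `string % number` as described: a case-insensitive equality test
// between the string and the textual form of the number (1 if equal, else 0).
int string_mod_number(const std::string& s, int n) {
    return ci_equal(s, std::to_string(n)) ? 1 : 0;
}

// The style of check the fix moves to: compare against an explicit option string.
bool matches_option(const std::string& selected, const std::string& option) {
    return selected == option;
}
```

Under this model, string_ge_number("No/No", 2) and string_ge_number("No/Yes", 2) are both true because 'N' sorts after '2', and string_mod_number returns 0 for both options, so neither expression can distinguish one choice from another. An explicit string comparison such as matches_option(selected, "No/No") can.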

Testing & Quality Assurance

1. Modernization & Formatting Verification

  • Brace & Operator Formatting: Added automated formatting utility scripts format_braces.py and replace_increment.py to enforce code styling rules.
  • Integration Tests: Introduced pairwise_test.py to run combinatorial correctness checks on modernized batch scripts and prevent regression in analysis results.