Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

noodles implementation is slower than rust-htslib, did I do something suboptimal? #218

Closed
bwlang opened this issue Nov 26, 2023 · 5 comments

Comments

@bwlang
Copy link

bwlang commented Nov 26, 2023

rust-htslib:

cat   0.00s user 0.01s system 1% cpu 0.568 total
sudo cargo run --release -- > tandem_reads.bam  0.43s user 0.04s system 82% cpu 0.570 total

noodles:

cat   0.00s user 0.01s system 0% cpu 1.192 total
sudo cargo run --release -- > tandem_reads.bam  1.04s user 0.05s system 91% cpu 1.195 total

i first implemented with htslib, then tried switching to noodles with no architectural changes.
image

for noodles i'm opening like this:

    let mut bam_reader = bam::reader::Builder.build_from_path(&opt.input_bam_file)?;
    let header = bam_reader.read_header()?;
    let mut bam_writer = bam::writer::Builder.build_from_path(&opt.bam_output_file)?;
    for r in bam_reader.records(&header) {
        ...
    }

for htlslib like this:

    let mut bam = bam::Reader::from_path(&opt.input_bam_file).unwrap();
    let header = bam::Header::from_template(bam.header());
    let mut bam_out = bam::Writer::from_path(&opt.bam_output_file, &header, bam::Format::Bam).unwrap();
    for r in bam.records() {
        ...
    }

The internal filtering logic is the same , reasoning about flags insert size , etc - writing out some records.

Any suggestions?

@bwlang
Copy link
Author

bwlang commented Nov 26, 2023

this is on a macbook arm64 computer.

@zaeleus
Copy link
Owner

zaeleus commented Nov 26, 2023

noodles tends to do more record validation and tries to ensure data is spec compliant. However, in your flamegraphs, it seems the DEFLATE implementation htslib linked to is using SIMD intrinsics (see inflate_fast_neon and longest_match_neon) on your CPU. By default, noodles uses the pure Rust implementation miniz_oxide, which can't compete in this case.

Can you retime your tests using libdeflate for both? I.e., cargo add noodles-bgzf --features libdeflate for noodles and cargo add rust-htslib --features libdeflate for rust-htslib. libdeflate is much faster for this application, regardless, and it would be better than comparing differing DEFLATE implementations.

@bwlang
Copy link
Author

bwlang commented Nov 26, 2023

with htslib libdeflate:

cat   0.00s user 0.01s system 1% cpu 0.574 total
sudo cargo run --release -- > tandem_reads.bam  0.45s user 0.03s system 83% cpu 0.577 total

with noodles

cat   0.00s user 0.01s system 1% cpu 0.740 total
sudo cargo run --release -- > tandem_reads.bam  0.61s user 0.05s system 89% cpu 0.743 total
image

much closer... thanks for the suggestion! Anything else I should try?

@zaeleus
Copy link
Owner

zaeleus commented Nov 26, 2023

If you don't need owned records, you can reuse the record buffer.

let mut record = sam::alignment::Record::default();

while bam_reader.read_record(&header, &mut record)? != 0 {
    // ...
}

When doing this, you may also want to use Reader::rc_records in your rust-htslib test for a more fair comparison.

@zaeleus
Copy link
Owner

zaeleus commented Jan 25, 2024

noodles 0.61.0 now allows alignment format records to read and written, but I'm not sure if this will help in a passthrough case like in your tests. Feel free to open a new discussion if you have any further questions.

@zaeleus zaeleus closed this as completed Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants