Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement skani as fastani alternative #30

Merged
merged 8 commits into from
Aug 5, 2023
Merged

Conversation

AroneyS
Copy link
Collaborator

@AroneyS AroneyS commented Jul 14, 2023

Add skani as fastani alternative

  • Refactor fastani implementation to allow enumeration
    • Assert threshold > 1 (in new method?)
    • Move find_representatives and find_memberships back into clusterer.rs
    • Pass Clusterer through above functions so it needs only implement calculate_ani
    • Need get_threshold method?
  • Add skani as fastani alternative
    • Implement calculate_skani
  • Add to cli

@AroneyS AroneyS changed the title Implement skani Implement skani as fastani alternative Jul 19, 2023
src/clusterer.rs Outdated Show resolved Hide resolved
src/clusterer.rs Outdated Show resolved Hide resolved
Copy link
Owner

@wwood wwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh I think that's roughly the idea

src/clusterer.rs Outdated Show resolved Hide resolved
src/clusterer.rs Outdated Show resolved Hide resolved
num_kmers: 1000,
kmer_length: 21,
},
&crate::skani::SkaniClusterer { threshold: 99.0 },
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Higher than for fastani. Skani gives ANI of >98% for all pairs measured.

@AroneyS
Copy link
Collaborator Author

AroneyS commented Jul 25, 2023

Few steps closer. Still need to add to cli.
Also may be faster to let skani handle the fileio in bulk, like in https://github.com/bluenote-1577/skani-lib-example/blob/main/src/main.rs. But then we would need independent logic for skani/fastani, since it would calculate all pairs up-front. Could add it to the initialise method for skani and then reference a look-up table in calculate_ani...

@wwood
Copy link
Owner

wwood commented Jul 26, 2023

Are you thinking per-disconnected component after the preclusterer? Or just in total? If the latter then no point in preclustering, I think.

@AroneyS
Copy link
Collaborator Author

AroneyS commented Jul 26, 2023

Not sure. Both are possibilities, though skani doesn't recommend comparing genomes with <82% ANI, so we would have to deal with that if we skip preclustering, right? Though it says "If the resulting aligned fraction for the two genomes is < 15%, no output is given.", so maybe <82% just doesn't give an answer, rather than giving an unreliable answer.

options: fastani, skani
fastani_min_aligned_threshold,
fastani_fraglen,
),
Preclusterer::Dashing { min_ani, threads } => match self.clusterer {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting a bit unwieldy. I tried a let preclusterer = match, but it doesn't work since they are different types. Should the Preclusterer/Clusterer enum's be defined using the underlying structs instead of dummy ones?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is that here we have to define the behaviour for every combination of clusterer and preclusterer, right?

I think the answer is yes, well enum or dyn, up to you

@AroneyS
Copy link
Collaborator Author

AroneyS commented Jul 27, 2023

Also, I get this warning on compile: warning: the following packages contain code that will be rejected by a future version of Rust: buf_redux v0.8.4, partitions v0.2.4

@AroneyS
Copy link
Collaborator Author

AroneyS commented Jul 27, 2023

--cluster-method isn't in the --full-help. Is there somewhere that I missed? Or do I have to rebuild docs?

@AroneyS AroneyS requested a review from wwood July 27, 2023 23:15
@wwood
Copy link
Owner

wwood commented Jul 28, 2023

I didn't go through every line, but seems about good. I think you need to add skani to the conda yml, and can you enable runs on PR using

on: [push, pull_request]

in the actions yml please?

@wwood
Copy link
Owner

wwood commented Aug 5, 2023

--cluster-method isn't in the --full-help. Is there somewhere that I missed? Or do I have to rebuild docs?

You added that argument, so won't show up until docs are redployed from main/release.

@wwood wwood marked this pull request as ready for review August 5, 2023 22:12
@wwood wwood merged commit 1c3d905 into wwood:main Aug 5, 2023
1 check passed
@AroneyS AroneyS deleted the implement-skani branch August 6, 2023 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants