Implement skani as fastani alternative #30
implement fastani clusterer
Yeh I think that's roughly the idea
num_kmers: 1000,
kmer_length: 21,
},
&crate::skani::SkaniClusterer { threshold: 99.0 },
Higher than for fastani. Skani gives ANI of >98% for all pairs measured.
Few steps closer. Still need to add to cli.
Are you thinking per-disconnected component after the preclusterer? Or just in total? If the latter then no point in preclustering, I think.
Not sure. Both are possibilities, though skani doesn't recommend comparing genomes with <82% ANI, so we would have to deal with that if we skip preclustering, right? Though it says "If the resulting aligned fraction for the two genomes is < 15%, no output is given.", so maybe <82% just doesn't give an answer, rather than giving an unreliable answer.
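For reference, the "per-disconnected component" option could be sketched as below. This is only an illustration, assuming the preclusterer hits are available as pairs of genome indices; the function names `find` and `connected_components` are hypothetical and not taken from this PR.

```rust
use std::collections::BTreeMap;

/// Find the root of `x` in a union-find forest, compressing the path on the way.
fn find(parent: &mut Vec<usize>, x: usize) -> usize {
    let mut root = x;
    while parent[root] != root {
        root = parent[root];
    }
    // Second pass: point every node on the path directly at the root.
    let mut cur = x;
    while parent[cur] != root {
        let next = parent[cur];
        parent[cur] = root;
        cur = next;
    }
    root
}

/// Group `n` genomes into connected components, given the preclusterer's
/// hits as pairs of genome indices (an assumed input format).
pub fn connected_components(n: usize, pairs: &[(usize, usize)]) -> Vec<Vec<usize>> {
    let mut parent: Vec<usize> = (0..n).collect();
    for &(a, b) in pairs {
        let ra = find(&mut parent, a);
        let rb = find(&mut parent, b);
        if ra != rb {
            // Keep the smaller index as the root so the output order is stable.
            parent[ra.max(rb)] = ra.min(rb);
        }
    }
    let mut by_root: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for i in 0..n {
        let root = find(&mut parent, i);
        by_root.entry(root).or_default().push(i);
    }
    by_root.into_values().collect()
}
```

Each component could then be handed to the ANI step on its own, so pairs the preclusterer never linked are never compared.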
options: fastani, skani
fastani_min_aligned_threshold,
fastani_fraglen,
),
Preclusterer::Dashing { min_ani, threads } => match self.clusterer {
This is getting a bit unwieldy. I tried a let preclusterer = match, but it doesn't work since they are different types. Should the Preclusterer/Clusterer enums be defined using the underlying structs instead of dummy ones?
So the issue is that here we have to define the behaviour for every combination of clusterer and preclusterer, right?
I think the answer is yes, well enum or dyn, up to you
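A rough sketch of the dyn route, in case it helps. Everything here is hypothetical: the trait names `PreclusterBackend` and `ClusterBackend`, the stub structs, and all signatures are made up for illustration and are not the crate's actual API. The point is only that each axis gets chosen once, so no match over every preclusterer/clusterer combination is needed.

```rust
/// Hypothetical trait: propose pairs of genome indices worth comparing.
pub trait PreclusterBackend {
    fn candidate_pairs(&self, genomes: &[&str]) -> Vec<(usize, usize)>;
}

/// Hypothetical trait: compute ANI for one pair and report the cutoff.
pub trait ClusterBackend {
    fn calculate_ani(&self, a: &str, b: &str) -> Option<f32>;
    fn threshold(&self) -> f32;
}

/// Stand-in preclusterer that proposes every pair.
pub struct AllPairs;

impl PreclusterBackend for AllPairs {
    fn candidate_pairs(&self, genomes: &[&str]) -> Vec<(usize, usize)> {
        let mut pairs = Vec::new();
        for i in 0..genomes.len() {
            for j in (i + 1)..genomes.len() {
                pairs.push((i, j));
            }
        }
        pairs
    }
}

/// Stand-in clusterer: made-up ANI values based on the first character of
/// the name. A real backend would run fastani or skani here.
pub struct StubAni {
    pub threshold: f32,
}

impl ClusterBackend for StubAni {
    fn calculate_ani(&self, a: &str, b: &str) -> Option<f32> {
        if a.chars().next() == b.chars().next() {
            Some(99.5)
        } else {
            Some(90.0)
        }
    }
    fn threshold(&self) -> f32 {
        self.threshold
    }
}

/// Works with any combination of backends through trait objects.
pub fn linked_pairs(
    pre: &dyn PreclusterBackend,
    clu: &dyn ClusterBackend,
    genomes: &[&str],
) -> Vec<(usize, usize)> {
    pre.candidate_pairs(genomes)
        .into_iter()
        .filter(|&(i, j)| {
            matches!(clu.calculate_ani(genomes[i], genomes[j]),
                     Some(ani) if ani >= clu.threshold())
        })
        .collect()
}
```

With this shape, the CLI code would build one Box<dyn PreclusterBackend> and one Box<dyn ClusterBackend> from the arguments and pass both along. The enum route would look similar, with the enum variants wrapping the real structs.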
Also, I get this warning on compile:
I didn't go through every line, but seems about good. I think you need to add skani to the conda yml, and can you enable runs on PR using on: [push, pull_request] in the actions yml please?
You added that argument, so won't show up until docs are redeployed from main/release.
Add skani as fastani alternative
new
method?) find_representatives
and find_memberships
back into clusterer.rs
Clusterer
through above functions so it needs only implement calculate_ani
get_threshold
method? calculate_skani
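The list above seems to describe a trait where each backend implements only calculate_ani and get_threshold, with the shared logic living in clusterer.rs. A minimal sketch of that shape follows. The method names come from the list, but the signatures, the greedy strategy, and the `StubClusterer` type are assumptions made for illustration, not the PR's actual code.

```rust
pub trait Clusterer {
    /// Backend-specific: ANI between two genomes, or None if no value is reported.
    fn calculate_ani(&self, a: &str, b: &str) -> Option<f32>;
    /// Backend-specific: ANI cutoff for placing two genomes in one cluster.
    fn get_threshold(&self) -> f32;

    /// Shared: greedy pass where a genome becomes a representative unless an
    /// earlier representative already covers it at or above the threshold.
    fn find_representatives(&self, genomes: &[&str]) -> Vec<usize> {
        let mut reps: Vec<usize> = Vec::new();
        for (i, &g) in genomes.iter().enumerate() {
            let covered = reps.iter().any(|&r| {
                self.calculate_ani(genomes[r], g)
                    .map_or(false, |ani| ani >= self.get_threshold())
            });
            if !covered {
                reps.push(i);
            }
        }
        reps
    }

    /// Shared: assign each genome to the representative with the highest ANI
    /// at or above the threshold. A representative always joins its own cluster.
    fn find_memberships(&self, genomes: &[&str], reps: &[usize]) -> Vec<Vec<usize>> {
        let mut clusters: Vec<Vec<usize>> = vec![Vec::new(); reps.len()];
        for (i, &g) in genomes.iter().enumerate() {
            let mut best: Option<(usize, f32)> = None;
            for (ci, &r) in reps.iter().enumerate() {
                let ani = if r == i {
                    Some(100.0)
                } else {
                    self.calculate_ani(genomes[r], g)
                };
                if let Some(ani) = ani {
                    if ani >= self.get_threshold() && best.map_or(true, |(_, b)| ani > b) {
                        best = Some((ci, ani));
                    }
                }
            }
            if let Some((ci, _)) = best {
                clusters[ci].push(i);
            }
        }
        clusters
    }
}

/// Hypothetical stand-in backend with made-up ANI values, used here instead
/// of a real fastani or skani call.
pub struct StubClusterer {
    pub threshold: f32,
}

impl Clusterer for StubClusterer {
    fn calculate_ani(&self, a: &str, b: &str) -> Option<f32> {
        if a.chars().next() == b.chars().next() {
            Some(99.5)
        } else {
            Some(90.0)
        }
    }
    fn get_threshold(&self) -> f32 {
        self.threshold
    }
}
```

A skani backend would then only need its own calculate_ani (perhaps delegating to calculate_skani) and get_threshold, and would inherit both shared methods.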