Good settings for enriched similar sequences #6

Closed
suzukimicro opened this issue Oct 8, 2021 · 8 comments

@suzukimicro

Hi,
We are struggling to perform de novo assembly of metagenomic bacterial samples, selectively cultured from wastewater with antimicrobials, using hifiasm-meta with the default parameters. The sequencing depth seems fine, but the number of circularized bacterial genomes and plasmids is not large, so the resulting contigs do not look good. We suspect the cause is increased sequence redundancy (similar bacterial species and plasmids). Does anyone know of any effective settings for this kind of data?
Thanks!

@xfengnefx
Owner

What's the library size? Could you share the log file (stderr)? Maybe you can try a run with -A using the previous bin files. This does a slightly more aggressive local graph cleaning, would help if you already get circle-ish tangled subgraphs in bandage layout, but probably not useful otherwise.
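
For reference, a minimal sketch of how such a rerun might be scripted. The `-A` option is the one mentioned above; the binary name `hifiasm_meta` and the `-t` / `-o` options are assumptions taken from the usual hifiasm-style command line, and reusing the same output prefix as the first run is assumed to be how the existing bin files get picked up. The function name, prefix and read file are hypothetical.

```python
from typing import List


def build_rerun_command(prefix: str, reads: str, threads: int = 32) -> List[str]:
    """Build the argument list for a rerun with -A (hypothetical helper).

    Assumptions, not confirmed in this thread:
      * the executable is called "hifiasm_meta"
      * "-t" sets the thread count and "-o" the output prefix
      * passing the SAME prefix as the first run reuses its *.bin files
    """
    return ["hifiasm_meta", "-A", "-t", str(threads), "-o", prefix, reads]


# Example with placeholder names; nothing is executed here.
cmd = build_rerun_command("asm", "reads.fq.gz")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, stderr=log_file)` so that stderr ends up in a log file that can be shared.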

@lh3

lh3 commented Oct 8, 2021

the number of circulated bacterial genomes and plasmids is not large

Just curious: what's the number?

@suzukimicro
Author

What's the library size? Could you share the log file (stderr)? Maybe you can try a run with -A using the previous bin files. This does a slightly more aggressive local graph cleaning, would help if you already get circle-ish tangled subgraphs in bandage layout, but probably not useful otherwise.

Thank you for your suggestion. We will try it.
Here are a summary of our hifiasm_meta analysis and the log file.

Hifiasm_meta_p_ctg.pdf
Hifiasm_meta_p_ctg
Hifiasm_meta.log

@suzukimicro
Author

suzukimicro commented Oct 8, 2021

the number of circulated bacterial genomes and plasmids is not large

Just curious: what's the number?

Thank you for your comment. In this case, the number of circular contigs was 50.
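
A count like this can be reproduced from the p_ctg GFA with a short script. The sketch below assumes the contig names follow the pattern `s<subgraph>.[uc]tg<6 digits>` followed by `c` (circular) or `l` (linear); that naming scheme is an assumption here and should be checked against the actual file. The function name is hypothetical.

```python
import re
from typing import Dict, Iterable

# Assumed naming scheme: e.g. "s0.ctg000001c" (circular) or "s1.ctg000002l"
# (linear). Verify against your own GFA before trusting the counts.
NAME_RE = re.compile(r"^s\d+\.[uc]tg\d{6}([cl])$")


def count_by_topology(gfa_lines: Iterable[str]) -> Dict[str, int]:
    """Count circular and linear segments from the S lines of a GFA."""
    counts = {"circular": 0, "linear": 0, "unrecognized": 0}
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip header, link and other record types
        name = line.rstrip("\n").split("\t")[1]
        match = NAME_RE.match(name)
        if match is None:
            counts["unrecognized"] += 1
        elif match.group(1) == "c":
            counts["circular"] += 1
        else:
            counts["linear"] += 1
    return counts
```

Usage would be along the lines of `with open(path) as fh: print(count_by_topology(fh))`, where `path` points at the primary contig GFA.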

@xfengnefx
Owner

I saw you're using r40, so please pull and make before the -A rerun - it was introduced later. The newer revisions also had some bug fixes.

Thanks for the Bandage plot. Since you mentioned the sample was selectively cultured, are there any strains or genera that you expect to see in the assembly but that did not assemble well? The graph looks pretty clean to me. For the disconnected linear contigs (4th row in the Bandage plot; I guess those are in the 300kb-1Mb range?), maybe you can have a look at their ts:B tag in the GFA S lines, or just plot *r_utg.nose.gfa with Bandage. My guess is that the tags are short, and that the unitig graph plot has roughly the same overall structure as the p_ctg one.
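
A rough sketch of how the S lines could be inspected for this. It only assumes the standard GFA/SAM encoding of a B-type tag (`ts:B:<subtype>,<v1>,<v2>,...`) and reports how many values the array holds; what the values mean is not covered here. The class and function names and the default size window are illustrative.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class Segment(NamedTuple):
    name: str
    length: int
    ts_count: Optional[int]  # number of values in the ts:B array, None if absent


def parse_segments(gfa_lines: Iterable[str]) -> Iterator[Segment]:
    """Yield name, length and ts:B array size for every S line."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # malformed S line
        name, seq = fields[1], fields[2]
        ln_tag = None
        ts_count = None
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                ln_tag = int(tag[5:])
            elif tag.startswith("ts:B:"):
                # B-type value looks like "i,1,2,3": subtype, then the values.
                ts_count = len(tag[5:].split(",")) - 1
        # Prefer the real sequence length; fall back to LN:i when seq is "*".
        length = len(seq) if seq != "*" else (ln_tag or 0)
        yield Segment(name, length, ts_count)


def in_size_range(segments: Iterable[Segment],
                  lo: int = 300_000, hi: int = 1_000_000) -> List[Segment]:
    """Keep segments whose length falls in [lo, hi] (the range guessed above)."""
    return [s for s in segments if lo <= s.length <= hi]
```

Printing `in_size_range(parse_segments(fh))` for the p_ctg GFA would list the contigs in that size window next to the size of their ts:B array.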

@xfengnefx
Owner

the number of circulated bacterial genomes and plasmids is not large

Just curious: what's the number?

Thank you for your comment. In this case, the number of circular contigs were 50.

A side note: some small circular ones may be plasmids. In a mock dataset we were able to confirm some thanks to the reference. However, it was hard to directly evaluate in real datasets. I would love to learn if there's any robust way to do the quality check...

@suzukimicro
Author

suzukimicro commented Oct 8, 2021

I saw you're using r40, so please pull and make before the -A rerun - it was introduced later. The newer revisions also had some bug fixes.

Thanks for the bandage plot. Since you mentioned the sample was selectively cultured, is there any strain or genera that you expect to see in the assembly, but is not assembled well? The graph looks pretty clean to me. For the disconnected linear contigs (4th row in the bandage plot, I guess those are in the 300kb-1Mb range?), maybe you can have a look at their ts:B tag in gfa S lines, or just plot *r_utg.nose.gfa with bandage. My guess is the tags are short in length, and the unitig graph plot has roughly the same overall structure compared to the p_ctg one.

Thank you for your kind help. We will let you know the results.

A side note: some small circular ones may be plasmids. In a mock dataset we were able to confirm some thanks to the reference. However, it was hard to directly evaluate in real datasets. I would love to learn if there's any robust way to do the quality check...

We want to analyze AMR-associated plasmids. That's why we selected environmental bacteria with antimicrobials. For plasmids, there is no tool for evaluating completeness comparable to CheckM or CheckV. Checking for circular contigs and for the presence of plasmid replicon genes (which can be detected by MOB-typer) would be ways to evaluate the assembly. We can provide our raw data if it would help improve this wonderful program.
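
As a rough illustration of combining these two lines of evidence, the sketch below cross-tabulates contig topology against replicon hits. It is not a completeness measure. It assumes circular contigs carry a trailing `c` in their names, and that `replicon_hits` is a set of contig names collected separately from whatever replicon-typing output is available; no MOB-typer output format is parsed or assumed. The function name and labels are hypothetical.

```python
from typing import Dict, Iterable, Set


def plasmid_evidence(contig_names: Iterable[str],
                     replicon_hits: Set[str]) -> Dict[str, str]:
    """Label each contig by the evidence it carries for being a plasmid.

    Assumptions:
      * a name ending in "c" marks a circular contig (check your GFA)
      * replicon_hits holds names of contigs with a detected replicon gene
    """
    report = {}
    for name in contig_names:
        circular = name.endswith("c")
        has_replicon = name in replicon_hits
        if circular and has_replicon:
            report[name] = "circular+replicon"
        elif circular:
            report[name] = "circular_only"
        elif has_replicon:
            report[name] = "replicon_only"
        else:
            report[name] = "no_evidence"
    return report
```

Contigs labeled `replicon_only` may be worth a closer look, since they could be plasmids that failed to circularize.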

@xfengnefx
Owner

I see. Plasmids are tricky.

It would be great if you could share the read set with us. We will use it only for method development. My email is xfeng (at) ds.dfci.harvard.edu if you would like to bring the conversation there, thanks!
