Good settings for enriched similar sequences #6

suzukimicro · 2021-10-08T11:55:48Z

Hi,
We are struggling with de novo assembly of metagenomic bacterial samples that were selectively cultured from wastewater with antimicrobials, using hifiasm-meta with the default parameters. The sequencing depth seems fine, but the number of circularized bacterial genomes and plasmids is not large, so we suspect the resulting contigs are not good. We guess the cause is the increased redundancy of sequences (similar bacterial species and plasmids) after enrichment. Does anyone know of effective settings for this kind of data?
Thanks!

xfengnefx · 2021-10-08T14:27:09Z

What's the library size? Could you share the log file (stderr)? Maybe you can try a run with -A using the previous bin files. This does a slightly more aggressive local graph cleaning, which would help if you already get circle-ish tangled subgraphs in the Bandage layout, but is probably not useful otherwise.

lh3 · 2021-10-08T14:31:50Z

> the number of circularized bacterial genomes and plasmids is not large

Just curious: what's the number?

suzukimicro · 2021-10-08T14:54:50Z

> What's the library size? Could you share the log file (stderr)? Maybe you can try a run with -A using the previous bin files. This does a slightly more aggressive local graph cleaning, which would help if you already get circle-ish tangled subgraphs in the Bandage layout, but is probably not useful otherwise.

Thank you for your suggestion. We will try it.
Here are a summary of our hifiasm_meta analysis and the log file.

Hifiasm_meta_p_ctg.pdf

Hifiasm_meta.log

suzukimicro · 2021-10-08T14:56:46Z

> > the number of circularized bacterial genomes and plasmids is not large
>
> Just curious: what's the number?

Thank you for your comment. In this case, the number of circular contigs was 50.

xfengnefx · 2021-10-08T15:21:46Z

I saw you're using r40, so please pull and make before the -A rerun - it was introduced later. The newer revisions also had some bug fixes.

Thanks for the Bandage plot. Since you mentioned the sample was selectively cultured, are there any strains or genera that you expect to see in the assembly but that are not assembled well? The graph looks pretty clean to me. For the disconnected linear contigs (4th row in the Bandage plot, I guess those are in the 300kb-1Mb range?), maybe you can have a look at their ts:B tag in the GFA S lines, or just plot *r_utg.noseq.gfa with Bandage. My guess is that the tags are short in length, and that the unitig graph plot has roughly the same overall structure as the p_ctg one.
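Checking the ts:B tags by eye gets tedious on a large graph. Below is a minimal Python sketch that reads GFA S lines and counts the elements in each segment's `ts` array tag. It assumes only the generic GFA optional-field layout (`TAG:B:<subtype>,v1,v2,...`); the function name `array_tag_lengths` and the sample lines are made up for illustration, and the sketch does not interpret what the values in `ts` mean.

```python
from typing import Dict, Iterable


def array_tag_lengths(gfa_lines: Iterable[str], tag: str = "ts") -> Dict[str, int]:
    """Map each segment name to the number of elements in its B-type `tag`.

    Segments that do not carry the tag are left out of the result.
    """
    lengths: Dict[str, int] = {}
    prefix = f"{tag}:B:"
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # only segment (S) lines are inspected
        fields = line.rstrip("\n").split("\t")
        name = fields[1]
        for field in fields[3:]:  # optional tags follow the name and sequence
            if field.startswith(prefix):
                # A B array is "<subtype>,v1,v2,..."; drop the subtype letter.
                values = field[len(prefix):].split(",")[1:]
                lengths[name] = len(values)
                break
    return lengths


# Hypothetical S lines, only to show the call; real input would be the
# lines of a GFA file opened with open(...).
example = [
    "S\tctgA\t*\tLN:i:350000\tts:B:i,4,8,15\n",
    "S\tctgB\t*\tLN:i:900000\n",
]
for name, n in array_tag_lengths(example).items():
    print(name, n)
```

Sorting the result by count would put the contigs with the shortest tags first, which are the ones worth comparing against the unitig graph.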

xfengnefx · 2021-10-08T15:26:39Z

> > > the number of circularized bacterial genomes and plasmids is not large
> >
> > Just curious: what's the number?
>
> Thank you for your comment. In this case, the number of circular contigs was 50.

A side note: some small circular ones may be plasmids. In a mock dataset we were able to confirm some thanks to the reference. However, it was hard to directly evaluate in real datasets. I would love to learn if there's any robust way to do the quality check...

suzukimicro · 2021-10-08T15:53:15Z

> I saw you're using r40, so please pull and make before the -A rerun - it was introduced later. The newer revisions also had some bug fixes.
>
> Thanks for the Bandage plot. Since you mentioned the sample was selectively cultured, are there any strains or genera that you expect to see in the assembly but that are not assembled well? The graph looks pretty clean to me. For the disconnected linear contigs (4th row in the Bandage plot, I guess those are in the 300kb-1Mb range?), maybe you can have a look at their ts:B tag in the GFA S lines, or just plot *r_utg.noseq.gfa with Bandage. My guess is that the tags are short in length, and that the unitig graph plot has roughly the same overall structure as the p_ctg one.

Thank you for your kind help. We will let you know the result.

> A side note: some small circular ones may be plasmids. In a mock dataset we were able to confirm some thanks to the reference. However, it was hard to directly evaluate in real datasets. I would love to learn if there's any robust way to do the quality check...

We want to analyze AMR-associated plasmids. That's why we selected environmental bacteria with antimicrobials. For plasmids, there seems to be no tool for evaluating completeness comparable to CheckM or CheckV. Checking for circular contigs and for the presence of plasmid replicon genes (detectable with MOB-typer) could serve as ways to evaluate the assembly. We can provide our raw data if it would help improve this wonderful program.
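As a rough illustration of the first check, here is a small Python sketch that lists small circular contigs from GFA S lines as plasmid candidates for a later replicon search. It rests on two assumptions that should be verified against your own output: that circular contigs have names ending in "c" (the hifiasm-style naming), and that a size cutoff separates plasmid-sized circles from chromosomes. The 500 kb default and the function name `small_circular_contigs` are hypothetical choices, not values from this thread.

```python
from typing import Iterable, List, Tuple


def small_circular_contigs(
    gfa_lines: Iterable[str], max_len: int = 500_000
) -> List[Tuple[str, int]]:
    """Return (name, length) for circular contigs no longer than max_len.

    Assumption: a contig name ending in "c" marks a circular contig.
    Length comes from the LN:i tag if present, else from the sequence field.
    """
    hits: List[Tuple[str, int]] = []
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # only segment (S) lines describe contigs
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        length = 0 if seq == "*" else len(seq)
        for field in fields[3:]:
            if field.startswith("LN:i:"):
                length = int(field[len("LN:i:"):])
                break
        if name.endswith("c") and 0 < length <= max_len:
            hits.append((name, length))
    return sorted(hits, key=lambda item: item[1])  # shortest first


# Hypothetical S lines, only to show the call.
example = [
    "S\ts0.ctg000001c\t*\tLN:i:80000\n",
    "S\ts1.ctg000002l\t*\tLN:i:50000\n",
    "S\ts2.ctg000003c\t*\tLN:i:4000000\n",
]
for name, length in small_circular_contigs(example):
    print(name, length)
```

The names returned this way could then be used to pull the matching sequences and run them through a replicon typing tool such as MOB-typer.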

xfengnefx · 2021-10-08T16:26:26Z

I see. Plasmids are tricky..

It would be great if you could share the read set with us. We will use it only for method development. My mail is xfeng (at) ds.dfci.harvard.edu if you would like to bring the conversation there, thanks!

xfengnefx closed this as completed May 10, 2022
