Remove low coverage bases from consensus sequence? #67

jiaan-yu · 2024-04-16T09:30:34Z

Hi Yan,

I use abPOA to generate a consensus from four sequences. Here is the distribtuion of n_seq / n_cov in each base in the consensus sequence. My question is, is there a way to remove low coverage bases? For this consensus sequence, the coverage should be 0.25/0.5/0.75/1; however, the low coverage bases at 0.25 are clearly errors from the input sequences.

Cheers,
Jiaan

yangao07 · 2024-04-16T14:01:48Z

I guess you mean the consesus called by abPOA may not be as accurate as you expected?
Can you share your input sequences here so that I can have more ideas about that?

jiaan-yu · 2024-04-17T02:54:12Z

Hi Yan,

Here is a section of the MSA output from abPOA (everything at default). I outlined the expected behaviour in green box and unexpected in red.

Also attached is the input fastq file. input.fastq.txt

Cheers

yangao07 · 2024-04-17T20:17:53Z

Thanks for the clarification!
abPOA did have an option to output consensus sequence with the most common/frequent base at each position, but was removed now.
Right now, the consensus sequence is a travsal with max edge weight on the graph.

I can re-add this option.
Will post the update here.

yangao07 · 2024-04-21T16:22:26Z

The latest abPOA has an option to chose a different consensus calling method: -a1, which selects the most frequent base at each position. The default is still heaviest bundling.

abpoa -a1 input.fastq.txt -r2

jiaan-yu · 2024-04-25T08:18:42Z

Hi Yan,

Thanks so much for your swift response! I don't know if you have benchmarked the performance of two consensus algo, here is my findings: Sequences are generated with fixed and independent errors
The performance of default algo (-a0) is superior when the number of input sequences is >= 7; on the contrary, the performance of frequency algo (-a1) is much better when the number of input sequences is low...
It is good to know when to use which cons method.

You may close this issue.

Cheers,
Jiaan

yangao07 · 2024-04-25T15:21:59Z

Hi Jiaan,

Thanks for the information. Actually I wasn't aware of this "input-number-dependent" performance difference.
This is interesting to me.
I guess you have a template sequence as the ground truth, and simulated read sets with different read counts, right?
Do you mind providing more details about your simulation, or maybe simply sharing your scripts so that I can look more into this?

jiaan-yu · 2024-04-26T07:21:41Z

Hi Yan,
Here is how I tested abPOA:

(1) I randomly subset sequences from a reference fasta (E.coli etc.) as templates. The length of the templates can be 1K to 5K; but the consensus accuracy is independent of the length anyway.

(2) Simlord to insert errors into the templates https://bitbucket.org/genomeinformatics/simlord/src/master/.

simlord \
    --read-reference $template_fasta \
    -n 10 -ps 0.2 -pi 0.2 -pd 0.2 -t 0.18 \
    --no-sam $error_fasta

-n 10 for 10 repeats; -t 0.18 for overall error rate

(3) Generate consensus sequence with abPOA:
abpoa -Q -r5 $error_fasta > $ccs_fastq

(4) I use edit distance (a python package) to calculate read accuracy. https://pypi.org/project/editdistance/

import editdistance
accuracy = 1-editdistance.eval(ccs_fastq_seq, template_fasta_seq) / float(len(template_fasta_seq))

Cheers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove low coverage bases from consensus sequence? #67

Remove low coverage bases from consensus sequence? #67

jiaan-yu commented Apr 16, 2024

yangao07 commented Apr 16, 2024

jiaan-yu commented Apr 17, 2024

yangao07 commented Apr 17, 2024 •

edited

Loading

yangao07 commented Apr 21, 2024

jiaan-yu commented Apr 25, 2024 •

edited

Loading

yangao07 commented Apr 25, 2024

jiaan-yu commented Apr 26, 2024 •

edited

Loading

Remove low coverage bases from consensus sequence? #67

Remove low coverage bases from consensus sequence? #67

Comments

jiaan-yu commented Apr 16, 2024

yangao07 commented Apr 16, 2024

jiaan-yu commented Apr 17, 2024

yangao07 commented Apr 17, 2024 • edited Loading

yangao07 commented Apr 21, 2024

jiaan-yu commented Apr 25, 2024 • edited Loading

yangao07 commented Apr 25, 2024

jiaan-yu commented Apr 26, 2024 • edited Loading

yangao07 commented Apr 17, 2024 •

edited

Loading

jiaan-yu commented Apr 25, 2024 •

edited

Loading

jiaan-yu commented Apr 26, 2024 •

edited

Loading