Not calling variant present in bam #7

Open
Mailinnia opened this issue Oct 21, 2020 · 2 comments

Comments

@Mailinnia

I have used your tool on one of my BAM files, and I am wondering why it isn't calling a variant that I can clearly see in the file in IGV. There is plenty of coverage, and there are few indels in the reads at this position.

image

I can find it in snp_stats, but I do not understand why it is not written to snps.vcf:
pos,ref,prob_GT,prob_A,prob_G,prob_T,prob_C,DP,freq
42131531,G,0.9530,0.1847,0.9624,0.0013,0.0008,111,0.3063

I'm running the following command:
python ../NanoCaller/scripts/NanoCaller.py -bam gene.sort2.bam -ref hg38_genome.fasta -prefix output -chrom Chr_22 -start 42076077 -end 42176157 --disable_whatshap -sup

@umahsn
Collaborator

umahsn commented Oct 21, 2020

It is present in the snp_stats file because it was picked up as a candidate site, but it was not included in the VCF file because it was determined to be a false positive. NanoCaller calculated the probability of an A base being present as 0.1847, which is too low for a variant call. Are you using Nanopore or PacBio reads? It might help to zoom out in IGV to see the surrounding 1-2000 bp for a better understanding of why this site was regarded as a false positive.

We are planning to release an update that lets you get a different snapshot of the BAM file than the one IGV provides, similar to the one in Fig. 1 of our bioRxiv paper. It will show only the sites with high alternative allele frequency and skip the other bases, which would let you see whether an allele might be false.

@Mailinnia
Author

Mailinnia commented Oct 21, 2020

If I filter for only primary read mappings, then it calls the variant fine:
snps.vcf:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
Chr_22 42131531 . G A 38.810 PASS . GT:DP:FQ 0/1:45:0.4444

snp_stats:
pos,ref,prob_GT,prob_A,prob_G,prob_T,prob_C,DP,freq
42131531,G,0.9283,0.5908,0.9420,0.0000,0.0000,45,0.4444
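
For readers who want to reproduce the primary-only run: the thread does not say how the filtering was done, so the following is only a sketch of one common approach. It drops every record whose SAM flag has the secondary (0x100) or supplementary (0x800) bit set, which is the same selection that `samtools view -F 0x900` makes. The helper names `is_primary` and `keep_primary` are hypothetical and are not part of NanoCaller.

```python
# SAM flag bits as defined in the SAM specification.
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary (chimeric) alignment


def is_primary(flag: int) -> bool:
    """Return True when neither the secondary nor the supplementary bit is set."""
    return flag & (SECONDARY | SUPPLEMENTARY) == 0


def keep_primary(sam_lines):
    """Yield header lines and primary alignment records from SAM text lines."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
        elif is_primary(int(line.split("\t")[1])):
            # Column 2 of an alignment record is the FLAG field.
            yield line
```

This works on SAM text (for example, the output of `samtools view -h`); the filtered records would still need to be converted back to a sorted, indexed BAM before being passed to `-bam`.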

igv_snapshot

I'm trying to understand why it calculates such a low probability of an A base being present when the supplementary reads are included.
I'm guessing that including the supplementary reads introduces too much 'noise' in the surrounding area?
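
To make the comparison concrete, here is a small sketch that parses the two snp_stats rows quoted in this thread and prints them side by side. It assumes that `freq` is the fraction of the `DP` reads carrying the alternative allele, which the thread does not confirm; the function names are hypothetical.

```python
import csv
import io

HEADER = "pos,ref,prob_GT,prob_A,prob_G,prob_T,prob_C,DP,freq"
# Rows copied from the two snp_stats outputs quoted in this thread.
ALL_READS = "42131531,G,0.9530,0.1847,0.9624,0.0013,0.0008,111,0.3063"
PRIMARY_ONLY = "42131531,G,0.9283,0.5908,0.9420,0.0000,0.0000,45,0.4444"


def parse_row(row: str) -> dict:
    """Parse one snp_stats row into a dict keyed by the header names."""
    reader = csv.DictReader(io.StringIO(HEADER + "\n" + row))
    return next(reader)


def alt_read_estimate(rec: dict) -> int:
    """Estimate the number of alt-supporting reads.

    Assumption: freq is the alternative-allele fraction of DP.
    """
    return round(int(rec["DP"]) * float(rec["freq"]))


for label, row in (("all reads", ALL_READS), ("primary only", PRIMARY_ONLY)):
    rec = parse_row(row)
    print(label, "DP=" + rec["DP"], "freq=" + rec["freq"],
          "prob_A=" + rec["prob_A"], "alt_reads~" + str(alt_read_estimate(rec)))
```

Under that assumption, roughly 34 of 111 reads support A when all alignments are included, against roughly 20 of 45 with primary mappings only, so the additional alignments more than double the depth while lowering the alternative allele fraction.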

I'm using ONT data that has been corrected with Netcat. I just wanted to compare the variant calls with and without error correction.
