Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High coverage genotyping outputs only G #27

Open
rhinempi opened this issue Jun 28, 2022 · 10 comments
Open

High coverage genotyping outputs only G #27

rhinempi opened this issue Jun 28, 2022 · 10 comments

Comments

@rhinempi
Copy link

Dear Developer,

I am trying to use NanoCaller on a high coverage (2000x on Microbial genomes) datasets. When I omit the sub-sampling process, the result genotypes all turned to G

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=umi1bins>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Depth">
##FORMAT=<ID=FQ,Number=1,Type=Float,Description="Alternative Allele Frequency">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
xxx 511 . T G 37.230 PASS . GT:DP:FQ 1/1:2519:0.2588
xxx 1077 . C G 53.094 PASS . GT:DP:FQ 1/1:2519:0.2060
xxx 1078 . T G 60.799 PASS . GT:DP:FQ 1/1:2519:0.2557
xxx 1944 . C G 40.271 PASS . GT:DP:FQ 1/1:2518:0.2121
xxx 1949 . C G 31.177 PASS . GT:DP:FQ 1/1:2518:0.1898
xxx 2173 . C G 43.839 PASS . GT:DP:FQ 1/1:2517:0.5161
xxx 2232 . C G 38.336 PASS . GT:DP:FQ 1/1:2504:0.1773

Liren

@umahsn
Copy link
Collaborator

umahsn commented Jun 28, 2022

Hi Liren,

Can you try running NanoCaller with `-nbr_t '0.2,0.6' parameter? Also, is it possible for you to share an IGV plot of what this region 511-2232 looks like?

@rhinempi
Copy link
Author

Hi Mian Umair Ahsan,

thanks for your response.

Screenshot 2022-06-28 at 21 14 40

I have tried the new parameter that you suggested. It seems it removes the SNPs that are close neighbours. But the G genotypes still exist, see below:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=umi1bins>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Depth">
##FORMAT=<ID=FQ,Number=1,Type=Float,Description="Alternative Allele Frequency">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
xxx 511 . T G 33.975 PASS . GT:DP:FQ 1/1:2519:0.2588
xxx 1078 . T G 52.364 PASS . GT:DP:FQ 1/1:2519:0.2557
xxx 2173 . C G 43.977 PASS . GT:DP:FQ 1/1:2517:0.5161

For IGV plot, please see attached.

@pkerbs
Copy link

pkerbs commented Mar 14, 2023

Dear Developers,
I have noticed the same problem. Is there a solution?
Every variant call outputs a G as the ALT base:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	ERR4080473
MN908947.3	92	.	C	G	47.315	PASS	vafator_af=0.0003029843963035904;vafator_ac=2;vafator_dp=6601	GT:DP:FQ	1/1:6601:0.0783
MN908947.3	241	.	C	G	47.345	PASS	vafator_af=0.012721490231712857;vafator_ac=84;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.8358
MN908947.3	337	.	C	G	47.221	PASS	vafator_af=0.0013630168105406633;vafator_ac=9;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.0519
MN908947.3	367	.	C	G	47.336	PASS	vafator_af=0.009995456610631531;vafator_ac=66;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.0698
MN908947.3	397	.	A	G	42.414	PASS	vafator_af=0.05088596092685143;vafator_ac=336;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.0509
MN908947.3	432	.	T	G	44.453	PASS	vafator_af=0.004240496743904286;vafator_ac=28;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.0733
MN908947.3	454	.	A	G	42.383	PASS	vafator_af=0.16159321520520975;vafator_ac=1067;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.1616
MN908947.3	456	.	T	G	44.352	PASS	vafator_af=0.005603513554444949;vafator_ac=37;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.0513
MN908947.3	574	.	A	G	42.236	PASS	vafator_af=0.058003937604119336;vafator_ac=383;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.058
MN908947.3	580	.	T	G	44.345	PASS	vafator_af=0.0016659094351052551;vafator_ac=11;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.0721
MN908947.3	610	.	A	G	42.445	PASS	vafator_af=0.0631531122217174;vafator_ac=417;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.0632
MN908947.3	664	.	C	G	47.232	PASS	vafator_af=0.002120248371952143;vafator_ac=14;vafator_dp=6603	GT:DP:FQ	1/1:6603:0.0566
MN908947.3	665	.	C	G	47.291	PASS	vafator_af=0.005719068210350118;vafator_ac=41;vafator_dp=7169	GT:DP:FQ	1/1:7169:0.1121
MN908947.3	672	.	A	G	42.452	PASS	vafator_af=0.05645386116531921;vafator_ac=405;vafator_dp=7174	GT:DP:FQ	1/1:7174:0.0565
MN908947.3	679	.	C	G	47.325	PASS	vafator_af=0.0019504040122596824;vafator_ac=14;vafator_dp=7178	GT:DP:FQ	1/1:7178:0.0755
MN908947.3	700	.	A	G	42.432	PASS	vafator_af=0.05682451253481894;vafator_ac=408;vafator_dp=7180	GT:DP:FQ	1/1:7180:0.0568
MN908947.3	724	.	T	G	44.388	PASS	vafator_af=0.0015318200807686953;vafator_ac=11;vafator_dp=7181	GT:DP:FQ	1/1:7181:0.1106
MN908947.3	726	.	A	G	42.432	PASS	vafator_af=0.05514552290767303;vafator_ac=396;vafator_dp=7181	GT:DP:FQ	1/1:7181:0.0551
MN908947.3	761	.	A	G	42.394	PASS	vafator_af=0.9338440111420613;vafator_ac=6705;vafator_dp=7180	GT:DP:FQ	1/1:7180:0.9338
MN908947.3	776	.	C	G	47.226	PASS	vafator_af=0.0036211699164345403;vafator_ac=26;vafator_dp=7180	GT:DP:FQ	1/1:7180:0.051
MN908947.3	818	.	T	G	44.452	PASS	vafator_af=0.001671541997492687;vafator_ac=12;vafator_dp=7179	GT:DP:FQ	1/1:7179:0.0744
MN908947.3	956	.	T	G	44.354	PASS	vafator_af=0.0008358874338255781;vafator_ac=6;vafator_dp=7178	GT:DP:FQ	1/1:7178:0.0545

I used the following parameters:

--preset ont
--cpu ${params.cpus}
--min_allele_freq 0.05
--ins_threshold 0.05
--del_threshold 0.05
--maxcov 1000
--mincov 10 

@umahsn
Copy link
Collaborator

umahsn commented Mar 14, 2023

Hi,

Thank you bringing this to our attention and we would like to test this on our end too. Is this data publicly available for us to download and test? If not, would it be possible for you to share a small sample of the data for us to evaluate?

@pkerbs
Copy link

pkerbs commented Mar 14, 2023

Hi,
thank you for testing this.
I used the publicly available SARS CoV2 sample ERR4080473 which is available here:
ftp.sra.ebi.ac.uk/vol1/fastq/ERR408/003/ERR4080473/ERR4080473_1.fastq.gz

MD5SUM:
312b51c9422663b1c0d322f3f083dcad

@umahsn
Copy link
Collaborator

umahsn commented Mar 14, 2023

Thank you, I will get back to you shortly after running some tests.

@umahsn
Copy link
Collaborator

umahsn commented Mar 15, 2023

Hi, the error should be fixed now. There was problem with properly normalizing coverage, and also an integer overflow due to very high coverage. Both issues have been fixed and you should get appropriate variant calls now. With the settings you used, I was able to get the following variants:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
MN908947.3      30      .       A       G       40.002  PASS    .       GT:DP:FQ        1/1:17:0.0588
MN908947.3      241     .       C       T       388.507 PASS    .       GT:DP:FQ        1/1:7400:0.8147
MN908947.3      761     .       A       G       439.735 PASS    .       GT:DP:FQ        1/1:8015:0.9201
MN908947.3      1059    .       C       T       312.051 PASS    .       GT:DP:FQ        1/1:638:0.9404
MN908947.3      3037    .       C       T       305.081 PASS    .       GT:DP:FQ        1/1:2630:0.5856
MN908947.3      6825    .       A       C       440.255 PASS    .       GT:DP:FQ        1/1:1934:0.8268
MN908947.3      14408   .       C       T       371.324 PASS    .       GT:DP:FQ        1/1:7573:0.8575
MN908947.3      23403   .       A       G       420.27  PASS    .       GT:DP:FQ        1/1:2616:0.8375
MN908947.3      25563   .       G       T       407.427 PASS    .       GT:DP:FQ        1/1:2676:0.8647

Thank you for bringing this to our attention. Let me know if you still have any trouble running NanoCaller.

@pkerbs
Copy link

pkerbs commented Mar 15, 2023

Hi,
thank you again for the fast response and I will try it out.
I just noticed that the new release here on github is versioned 3.1.0 while in conda the version is 3.0.1?

Best,
Paul

@pkerbs
Copy link

pkerbs commented Mar 16, 2023

Hi Mian,
I got the same variant calls as you know, except the one at position 30 using the version 3.0.1 from conda.
But it might be due to different pre-processing of the fastq. So I guess its fine :)
Would indel calls also be affected by this issue? Unfortunately, I don't have a sample at hand to test that at the moment.

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	ERR4080473
MN908947.3	241	.	C	T	374.199	PASS	.	GT:DP:FQ	1/1:6603:0.8358
MN908947.3	761	.	A	G	427.729	PASS	.	GT:DP:FQ	1/1:7180:0.9338
MN908947.3	1059	.	C	T	375.889	PASS	.	GT:DP:FQ	1/1:901:0.9489
MN908947.3	3037	.	C	T	291.825	PASS	.	GT:DP:FQ	1/1:2766:0.5983
MN908947.3	6825	.	A	C	426.855	PASS	.	GT:DP:FQ	1/1:1768:0.8405
MN908947.3	14408	.	C	T	404.003	PASS	.	GT:DP:FQ	1/1:7185:0.8703
MN908947.3	23403	.	A	G	405.387	PASS	.	GT:DP:FQ	1/1:2364:0.86
MN908947.3	25563	.	G	T	381.818	PASS	.	GT:DP:FQ	1/1:2418:0.8747

@umahsn
Copy link
Collaborator

umahsn commented Mar 17, 2023

I don't think the indel calling would be affected, but I will take a look at it to make sure. As for v3.0.1 or 3.1.0, I think there may be an issue with syncing version numbers between github and conda. But there was only one release after v3.0.0 so both 3.0.1 on conda and 3.1 on github should be the same. I will try to fix the version numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants