Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allelic depth calculated, but glm & ifRNAE not calculated, MI statistics are off #27

Closed
JohnMCMa opened this issue Oct 21, 2016 · 10 comments

Comments

@JohnMCMa
Copy link

I'm running GIREMI with a list of 2,233 locations, with the first 10 lines as follows:

chr1 135124 135125 Inte 0 #
chr1 944681 944682 NOC2L 0 -
chr1 1296622 1296623 ACAP3 0 -
chr1 1635542 1635543 CDK11B 0 -
chr1 1825442 1825443 GNB1 0 -
chr1 6185501 6185502 RPL22 0 -
chr1 6223953 6223954 ICMT 0 -
chr1 6646402 6646403 DNAJC11 0 -
chr1 7970927 7970928 PARK7 0 +
chr1 7985013 7985014 PARK7 0 +

The commandline output was something like this:

[mpileup] 1 samples in 1 input files
coor:135125 chr:chr1coor:944682 chr:chr1[long list of locations with no newlines]

Calculating the MI values...
meanMI:-nan sdMI:-nan
Estimating the allelic ratios...
R CMD BATCH --no-save --no-restore '--args finput="/scratch/lym_myl_rsch/mma/Biovest/RNA-seq/RNAEditingSites/GIREMI/1112961-Tumor/1112961-Tumor-GIREMI.out-2.txt" fout="/scratch/lym_myl_rsch/mma/Biovest/RNA-seq/RNAEditingSites/GIREMI/1112961-Tumor/1112961-Tumor-GIREMI.out-2.txt.res"' giremi.r /scratch/lym_myl_rsch/mma/Biovest/RNA-seq/RNAEditingSites/GIREMI/1112961-Tumor/1112961-Tumor-GIREMI.out-2.txt.Rout
Analysis DONE!

GIREMI failed to run the R script, and reviewing the intermediate output, it looks like samtools was able to do its job as the allelic depths were counted corrected, but the stats were strange. ifRNAE doesn't exist, the GLM stats are all zero, and while raw MI looks fine, ifMI, pvalue_MI, and allelic ratio were off:

chr coor strand ifsnp gene refB upB downB majorB majorN totN majorR ifmi mi mip ar ifneg iflm lmp type A C G T
chr1 135125 # 0 Inte C C G T 89 89 1 -1 -1 -1.00E+00 0.5 0 0 0 CT 0 0 0 89
chr1 944682 - 0 NOC2L A C G A 92 95 0.968421 2 0.064177 0.00E+00 0.5 0 0 0 TC 92 0 3 0
chr1 1296623 - 0 ACAP3 C G G G 16 18 0.888889 -1 -1 -1.00E+00 0.5 0 0 0 GC 0 2 16 0
chr1 1635543 - 0 CDK11B T G C C 39 39 1 2 0 0.00E+00 0.5 0 0 0 AG 0 39 0 0
chr1 1825443 - 0 GNB1 A A G A 288 290 0.993103 -1 -1 -1.00E+00 0.5 0 0 0 TC 288 0 2 0
chr1 6185502 - 0 RPL22 A G G A 248 250 0.992 2 0.016064 0.00E+00 0.5 0 0 0 TC 248 0 2 0
chr1 6223954 - 0 ICMT A C A A 99 102 0.970588 2 0.059706 0.00E+00 0.5 0 0 0 TC 99 0 3 0
chr1 6646403 - 0 DNAJC11 T G G T 6 9 0.666667 -1 -1 -1.00E+00 0.5 0 0 0 AC 0 0 3 6
chr1 7970928 + 0 PARK7 A G A A 282 283 0.996466 -1 -1 -1.00E+00 0.5 0 0 0 AG 282 0 1 0
chr1 7985014 + 0 PARK7 T G G T 278 292 0.952055 -1 -1 -1.00E+00 0.5 0 0 0 TC 3 7 4 278

I would appreciate any assistance.

PS: I can confirm the chromosome names are fine, and since I used the same index files for GATK, I take that to mean it wasn't corrupted.

@zhqingit
Copy link
Owner

Hi,

How big is the bam file? It looks GIREMI can't calculate the MI value.

Best,
Qing

2016-10-21 11:13 GMT-07:00 John Ma notifications@github.com:

I'm running GIREMI with a list of 2,233 locations, with the first 10 lines
as follows:

chr1 135124 135125 Inte 0 #
chr1 944681 944682 NOC2L 0 -
chr1 1296622 1296623 ACAP3 0 -
chr1 1635542 1635543 CDK11B 0 -
chr1 1825442 1825443 GNB1 0 -
chr1 6185501 6185502 RPL22 0 -
chr1 6223953 6223954 ICMT 0 -
chr1 6646402 6646403 DNAJC11 0 -
chr1 7970927 7970928 PARK7 0 +
chr1 7985013 7985014 PARK7 0 +

The commandline output was something like this:

[mpileup] 1 samples in 1 input files
coor:135125 chr:chr1coor:944682 chr:chr1[long list of locations with no
newlines]

Calculating the MI values...
meanMI:-nan sdMI:-nan
Estimating the allelic ratios...
R CMD BATCH --no-save --no-restore '--args finput="/scratch/lym_myl_rsch/
mma/Biovest/RNA-seq/RNAEditingSites/GIREMI/1112961-Tumor/1112961-Tumor-GIREMI.out-2.txt"
fout="/scratch/lym_myl_rsch/mma/Biovest/RNA-seq/RNAEditingSites/GIREMI/
1112961-Tumor/1112961-Tumor-GIREMI.out-2.txt.res"' giremi.r
/scratch/lym_myl_rsch/mma/Biovest/RNA-seq/RNAEditingSites/GIREMI/
1112961-Tumor/1112961-Tumor-GIREMI.out-2.txt.Rout
Analysis DONE!

GIREMI failed to run the R script, and reviewing the intermediate output,
it looks like samtools was able to do its job as the allelic depths were
counted corrected, but the stats were strange. ifRNAE doesn't exist, the
GLM stats are all zero, and while raw MI looks fine, ifMI, pvalue_MI, and
allelic ratio were off:

chr coor strand ifsnp gene refB upB downB majorB majorN totN majorR ifmi
mi mip ar ifneg iflm lmp type A C G T
chr1 135125 # 0 Inte C C G T 89 89 1 -1 -1 -1.00E+00 0.5 0 0 0 CT 0 0 0 89
chr1 944682 - 0 NOC2L A C G A 92 95 0.968421 2 0.064177 0.00E+00 0.5 0 0 0
TC 92 0 3 0
chr1 1296623 - 0 ACAP3 C G G G 16 18 0.888889 -1 -1 -1.00E+00 0.5 0 0 0 GC
0 2 16 0
chr1 1635543 - 0 CDK11B T G C C 39 39 1 2 0 0.00E+00 0.5 0 0 0 AG 0 39 0 0
chr1 1825443 - 0 GNB1 A A G A 288 290 0.993103 -1 -1 -1.00E+00 0.5 0 0 0
TC 288 0 2 0
chr1 6185502 - 0 RPL22 A G G A 248 250 0.992 2 0.016064 0.00E+00 0.5 0 0 0
TC 248 0 2 0
chr1 6223954 - 0 ICMT A C A A 99 102 0.970588 2 0.059706 0.00E+00 0.5 0 0
0 TC 99 0 3 0
chr1 6646403 - 0 DNAJC11 T G G T 6 9 0.666667 -1 -1 -1.00E+00 0.5 0 0 0 AC
0 0 3 6
chr1 7970928 + 0 PARK7 A G A A 282 283 0.996466 -1 -1 -1.00E+00 0.5 0 0 0
AG 282 0 1 0
chr1 7985014 + 0 PARK7 T G G T 278 292 0.952055 -1 -1 -1.00E+00 0.5 0 0 0
TC 3 7 4 278

I would appreciate any assistance.

PS: I can confirm the chromosome names are fine, and since I used the same
index files for GATK, I take that to mean it wasn't corrupted.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#27, or mute the thread
https://github.com/notifications/unsubscribe-auth/ALL2zXqMhxqoYvswuqhNYJwAw3lgcSDWks5q2QDGgaJpZM4Kddop
.

@JohnMCMa
Copy link
Author

1,179,375kb. This is the default bamout from GATK, of which I start to wonder if it has dropped too much reads. There're several flags in GATK of which I can obtain a more comprehensive set of reads; I may have to play with them.

I'm convinced, however, that if the location list used in GIREMI originates from HaplotypeCaller or MuTect2, then it must be some kind of GATK bamout from the same run that should be used as the BAM as well.

@JohnMCMa
Copy link
Author

JohnMCMa commented Nov 1, 2016

Hi Qing,
Do you think a .bam with 51,996,757 reads (as counted by samtools view -c ) has enough reads for GIREMI?

@JohnMCMa
Copy link
Author

JohnMCMa commented Nov 3, 2016

Hi Qing,

I have just re-ran GIREMI with a fresh more "conventional" BAM but with the same list. This time the situation seems to have improved in that the the mi and if_mi fields seems to be calculated correctly, but the pvalue of MI (mip) in the file and the mean and SDs of MI emitted on the screen are still off. mip is either 0 or -1, and both meanMI and sdMI are -nan.

I have attached the output of the C code for your reference.
SNPiR_1112961-Tumor.UC.GIREMI.results.txt

@zhqingit
Copy link
Owner

zhqingit commented Nov 4, 2016

Hi John,

I found the nucleotide ratios of many variants are very high 0.98~1. I
believed there are many homozygous SNV in your list that might affect the
calculation. Please use more strict filters to remove these SNVs and try
again. Thanks.

Best,
Qing

2016-11-03 12:43 GMT-07:00 John Ma notifications@github.com:

Hi Qing,

I have just re-ran GIREMI with a fresh more "conventional" BAM but with
the same list. This time the situation seems to have improved in that the
the mi and if_mi fields seems to be calculated correctly, but the pvalue of
MI (mip) in the file and the mean and SDs of MI emitted on the screen are
still off. mip is either 0 or -1, and both meanMI and sdMI are -nan.

I have attached the output of the C code for your reference.
SNPiR_1112961-Tumor.UC.GIREMI.results.txt
https://github.com/zhqingit/giremi/files/569949/SNPiR_1112961-Tumor.UC.GIREMI.results.txt


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#27 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ALL2zd_RhNMnKAX5EwBZ9eThqADgFtdwks5q6jlZgaJpZM4Kddop
.

@JohnMCMa
Copy link
Author

JohnMCMa commented Nov 9, 2016

Sorry for the delay; I had other projects to handle.

I have further filtered the input to limit it to SNPs that have alt allelic fraction between (and including) 0.1-0.9 from both UnifiedGenotyper and SNPiR. It comes out the same--mip are either -1 or 0.

The list and the output by the C code attached. The coordinates are from GRCh38, primary chromosomes only, GENCODE version.

Tumor.filtered-rmhomop.GIREMI.list.txt
Tumor.filtered-rmhomop.GIREMI.out.txt

@zhqingit
Copy link
Owner

Hi John,

I found there are no any known SNP in your list.

Best,
Qing

2016-11-09 14:40 GMT-07:00 John Ma notifications@github.com:

Sorry for the delay; I had other projects to handle.

I have further filtered the input to limit it to SNPs that have alt
allelic fraction between (and including) 0.1-0.9 from both UnifiedGenotyper
and SNPiR. It comes out the same--mip are either -1 or 0.

The list and the output by the C code attached. The coordinates are from
GRCh38, primary chromosomes only, GENCODE version.

Tumor.filtered-rmhomop.GIREMI.list.txt
https://github.com/zhqingit/giremi/files/581881/Tumor.filtered-rmhomop.GIREMI.list.txt
Tumor.filtered-rmhomop.GIREMI.out.txt
https://github.com/zhqingit/giremi/files/581882/Tumor.filtered-rmhomop.GIREMI.out.txt


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#27 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ALL2zZAur1SBWMfI0_3MF_OMWo-vXxHTks5q8j24gaJpZM4Kddop
.

@JohnMCMa
Copy link
Author

@zhqingit:
Yes, I have filtered away dbSNP positions beforehand. Are those required for GIREMI calculations?

@zhqingit
Copy link
Owner

Yes, giremi needs these known SNP information to create the MI
distribution.

Best,
Qing

2016-11-10 7:11 GMT-07:00 John Ma notifications@github.com:

@zhqingit https://github.com/zhqingit:
Yes, I have filtered away dbSNP positions beforehand. Are those required
for GIREMI calculations?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#27 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ALL2zXa5pSyBZl0ZO0EyXqocCHXmxMTOks5q8yYvgaJpZM4Kddop
.

@JohnMCMa
Copy link
Author

JohnMCMa commented Nov 14, 2016

@zhqingit: Spiking in dbSNP variants does complete the program. Thanks!
Closing ticket

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants