-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allelic depth calculated, but glm & ifRNAE not calculated, MI statistics are off #27
Comments
Hi, How big is the bam file? It looks GIREMI can't calculate the MI value. Best, 2016-10-21 11:13 GMT-07:00 John Ma notifications@github.com:
|
1,179,375kb. This is the default I'm convinced, however, that if the location list used in GIREMI originates from HaplotypeCaller or MuTect2, then it must be some kind of GATK bamout from the same run that should be used as the BAM as well. |
Hi Qing, |
Hi Qing, I have just re-ran GIREMI with a fresh more "conventional" BAM but with the same list. This time the situation seems to have improved in that the the mi and if_mi fields seems to be calculated correctly, but the pvalue of MI (mip) in the file and the mean and SDs of MI emitted on the screen are still off. mip is either 0 or -1, and both meanMI and sdMI are -nan. I have attached the output of the C code for your reference. |
Hi John, I found the nucleotide ratios of many variants are very high 0.98~1. I Best, 2016-11-03 12:43 GMT-07:00 John Ma notifications@github.com:
|
Sorry for the delay; I had other projects to handle. I have further filtered the input to limit it to SNPs that have alt allelic fraction between (and including) 0.1-0.9 from both UnifiedGenotyper and SNPiR. It comes out the same--mip are either -1 or 0. The list and the output by the C code attached. The coordinates are from GRCh38, primary chromosomes only, GENCODE version. Tumor.filtered-rmhomop.GIREMI.list.txt |
Hi John, I found there are no any known SNP in your list. Best, 2016-11-09 14:40 GMT-07:00 John Ma notifications@github.com:
|
@zhqingit: |
Yes, giremi needs these known SNP information to create the MI Best, 2016-11-10 7:11 GMT-07:00 John Ma notifications@github.com:
|
@zhqingit: Spiking in dbSNP variants does complete the program. Thanks! |
I'm running GIREMI with a list of 2,233 locations, with the first 10 lines as follows:
chr1 135124 135125 Inte 0 #
chr1 944681 944682 NOC2L 0 -
chr1 1296622 1296623 ACAP3 0 -
chr1 1635542 1635543 CDK11B 0 -
chr1 1825442 1825443 GNB1 0 -
chr1 6185501 6185502 RPL22 0 -
chr1 6223953 6223954 ICMT 0 -
chr1 6646402 6646403 DNAJC11 0 -
chr1 7970927 7970928 PARK7 0 +
chr1 7985013 7985014 PARK7 0 +
The commandline output was something like this:
[mpileup] 1 samples in 1 input files
coor:135125 chr:chr1coor:944682 chr:chr1[long list of locations with no newlines]
Calculating the MI values...
meanMI:-nan sdMI:-nan
Estimating the allelic ratios...
R CMD BATCH --no-save --no-restore '--args finput="/scratch/lym_myl_rsch/mma/Biovest/RNA-seq/RNAEditingSites/GIREMI/1112961-Tumor/1112961-Tumor-GIREMI.out-2.txt" fout="/scratch/lym_myl_rsch/mma/Biovest/RNA-seq/RNAEditingSites/GIREMI/1112961-Tumor/1112961-Tumor-GIREMI.out-2.txt.res"' giremi.r /scratch/lym_myl_rsch/mma/Biovest/RNA-seq/RNAEditingSites/GIREMI/1112961-Tumor/1112961-Tumor-GIREMI.out-2.txt.Rout
Analysis DONE!
GIREMI failed to run the R script, and reviewing the intermediate output, it looks like samtools was able to do its job as the allelic depths were counted corrected, but the stats were strange. ifRNAE doesn't exist, the GLM stats are all zero, and while raw MI looks fine, ifMI, pvalue_MI, and allelic ratio were off:
chr coor strand ifsnp gene refB upB downB majorB majorN totN majorR ifmi mi mip ar ifneg iflm lmp type A C G T
chr1 135125 # 0 Inte C C G T 89 89 1 -1 -1 -1.00E+00 0.5 0 0 0 CT 0 0 0 89
chr1 944682 - 0 NOC2L A C G A 92 95 0.968421 2 0.064177 0.00E+00 0.5 0 0 0 TC 92 0 3 0
chr1 1296623 - 0 ACAP3 C G G G 16 18 0.888889 -1 -1 -1.00E+00 0.5 0 0 0 GC 0 2 16 0
chr1 1635543 - 0 CDK11B T G C C 39 39 1 2 0 0.00E+00 0.5 0 0 0 AG 0 39 0 0
chr1 1825443 - 0 GNB1 A A G A 288 290 0.993103 -1 -1 -1.00E+00 0.5 0 0 0 TC 288 0 2 0
chr1 6185502 - 0 RPL22 A G G A 248 250 0.992 2 0.016064 0.00E+00 0.5 0 0 0 TC 248 0 2 0
chr1 6223954 - 0 ICMT A C A A 99 102 0.970588 2 0.059706 0.00E+00 0.5 0 0 0 TC 99 0 3 0
chr1 6646403 - 0 DNAJC11 T G G T 6 9 0.666667 -1 -1 -1.00E+00 0.5 0 0 0 AC 0 0 3 6
chr1 7970928 + 0 PARK7 A G A A 282 283 0.996466 -1 -1 -1.00E+00 0.5 0 0 0 AG 282 0 1 0
chr1 7985014 + 0 PARK7 T G G T 278 292 0.952055 -1 -1 -1.00E+00 0.5 0 0 0 TC 3 7 4 278
I would appreciate any assistance.
PS: I can confirm the chromosome names are fine, and since I used the same index files for GATK, I take that to mean it wasn't corrupted.
The text was updated successfully, but these errors were encountered: