Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interpretation of GIREMI results #28

Open
ma-diroma opened this issue Nov 22, 2016 · 4 comments
Open

Interpretation of GIREMI results #28

ma-diroma opened this issue Nov 22, 2016 · 4 comments

Comments

@ma-diroma
Copy link

Hi,

I used GIREMI to analyze an RNA-seq sample, which was previously processed by GATK to obtain the list of heterozygous SNVs required by your tool.
GIREMI generated 2 different output files: file.res and file.res.res, which differ in some fields

file.res content
chr
coor
strand
ifsnp
gene
refB
upB
downB
majorB
majorN
totN
majorR
ifmi
mi
mip
ar
ifneg
iflm
lmp
type
A
C
G
T

file.res.res content:
chr
coordinate
strand
ifSNP
gene
reference_base
upstream_1base
downstream_1base
major_base
major_count
tot_count
major_ratio
MI
pvalue_MI
estimated_allelic_ratio
ifNEG
RNAE_t
A
C
G
T
ifRNAE

Could you please tell me if file.res is just a temporary file?
Moreover, I see that for some editing sites only 1 base (generally the reference base) was detected, e.g.

chr coordinate strand ifSNP gene reference_base upstream_1base downstream_1base major_base major_count tot_count major_ratio MI pvalue_MI estimated_allelic_ratio ifNEG RNAE_t A C G T ifRNAE
chr1|899917|+ chr1 899917 + 0 KLHL17 A C G A 8 8 1 0 0.002023009 0.9 0 AC 8 0 0 0 1

Could you please tell me why this is identified as an edited site?
Thank you,
Maria Angela

@zhqingit
Copy link
Owner

Hi Angela,

The file.res.res is the final result file.

For these sites, what's the alternative frequencies from the GATK? Could
you also try samtools to see if these sites are the heterozygous SNVs?
Thank you!

Best,
Qing

2016-11-22 9:27 GMT-07:00 Maria Angela Diroma notifications@github.com:

Hi,

I used GIREMI to analyze an RNA-seq sample, which was previously processed
by GATK to obtain the list of heterozygous SNVs required by your tool.
GIREMI generated 2 different output files: file.res and file.res.res,
which differ in some fields

file.res content
chr
coor
strand
ifsnp
gene
refB
upB
downB
majorB
majorN
totN
majorR
ifmi
mi
mip
ar
ifneg
iflm
lmp
type
A
C
G
T

file.res.res content:
chr
coordinate
strand
ifSNP
gene
reference_base
upstream_1base
downstream_1base
major_base
major_count
tot_count
major_ratio
MI
pvalue_MI
estimated_allelic_ratio
ifNEG
RNAE_t
A
C
G
T
ifRNAE

Could you please tell me if file.res is just a temporary file?
Moreover, I see that for some editing sites only 1 base (generally the
reference base) was detected, e.g.

chr coordinate strand ifSNP gene reference_base upstream_1base
downstream_1base major_base major_count tot_count major_ratio MI pvalue_MI
estimated_allelic_ratio ifNEG RNAE_t A C G T ifRNAE
chr1|899917|+ chr1 899917 + 0 KLHL17 A C G A 8 8 1 0 0.002023009 0.9 0 AC
8 0 0 0 1

Could you please tell me why this is identified as an edited site?
Thank you,
Maria Angela


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#28, or mute the thread
https://github.com/notifications/unsubscribe-auth/ALL2zQrjK18HoYBkr-Y5O6WymVXw-iOTks5rAxfjgaJpZM4K5pE5
.

@ma-diroma
Copy link
Author

I had already selected heterozygous SNVs from GATK HaplotypeCaller VCF file, where there is a different count of the alternative base for that position. Allele depth AD=8 for A, which is the reference base (the same count reported in GIREMI output), AD=6 for the alternative base C (0 in GIREMI output)

chr1 899917 . A C 241.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.323;ClippingRankSum=-0.968;DP=14;ExcessHet=3.0103;FS=2.363;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.581;QD=17.27;ReadPosRankSum=-2.905;SOR=0.169 GT:AD:DP:GQ:PL 0/1:8,6:14:99:270,0,2178

Since I used default parameters (-m=5), I expected to see the variant, which is even detected as editing site, but it is pretty weird base count does not correspond to the variant caller results.

Best,
Maria Angela

@zhqingit
Copy link
Owner

zhqingit commented Nov 25, 2016 via email

@ma-diroma
Copy link
Author

Hi Qing,

Yes, I used the same bam file. Thus the different base count may be due to different variant callers. Thanks.
I had another issue with a different alignment file, this time I used GSNAP, followed by GATK variant calling. I removed all homozygous sites from the SNVs list but I obtain only the intermediate out.res file, so I am missing the ifRNAe information.
The only error was found in out.res.Rout:

Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 1
Calls: cbind -> cbind -> data.frame
Esecuzione interrotta

I checked if there were pvalue_MI>0 & pvalue_MI<=0.05 & ifSNP==0 in the intermediate file, it is ok. Could you please suggest what is this new issue?
Please find attached my list of SNVs, out.res and out.res.Rout files.

Thanks for your help,
Maria Angela

giremi_gsnap.err.txt
giremi_gsnap.out.res.Rout.txt
giremi_gsnap.out.res.txt
giremi_gsnap_snvlist.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants