Explanation of SQuIRE Count output file format #28

Open
Winbuntu opened this issue Apr 5, 2019 · 7 comments


@Winbuntu

Winbuntu commented Apr 5, 2019

Hi, could you add an explanation of the format of the files generated by SQuIRE Count? This would help people who want to run their own custom analyses beyond differential expression.

I think a good example of format explanations is from StringTie: http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#output

Specifically for me, I am wondering what each column in *_refGenecounts.txt means. For example,

chr17 26138685 26195811 Axin1 11.196273 + 264 NM_001159598,NM_009733

I think 11.196273 is the TPM and 264 is the read count? Also, how did you handle multi-mapped reads that align to genes?

Thanks.

@Winbuntu
Author

Winbuntu commented Apr 5, 2019

Also, it would be helpful if you could clarify what each column means in the *_TEcounts.txt file.
For example, is "fpkm" computed using "uniq_counts", "tot_counts", or "tot_reads"?

@cpacyna
Collaborator

cpacyna commented Apr 12, 2019

Hello,

Sorry for the long response time. Thanks for your feedback; we'll work on building documentation of the output files.

For *_refGenecounts.txt, here are the headers:
chr: chromosome of transcription
tx_start: coordinate for start of transcription
tx_stop: coordinate for end of transcription
Gene_ID: gene name
fpkm: fragments per kilobase of transcript per million mapped reads
strand: + or - for stranded data
count: read count, computed by transcript length * coverage / readlength
transcript_ID: transcript specifier
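
To illustrate the column layout above, here is a minimal Python sketch that parses the example row from the original question. The names `RefGeneCount` and `parse_refgene_line` are hypothetical helpers, not part of SQuIRE. The sketch assumes whitespace-separated fields, an integer `count` column, and comma-separated transcript IDs, as in the example row; real files may differ.

```python
from typing import List, NamedTuple


class RefGeneCount(NamedTuple):
    """One row of *_refGenecounts.txt, following the header list above."""
    chrom: str
    tx_start: int
    tx_stop: int
    gene_id: str
    fpkm: float
    strand: str
    count: int
    transcript_ids: List[str]


def parse_refgene_line(line: str) -> RefGeneCount:
    """Parse one data row (hypothetical helper, not a SQuIRE function)."""
    fields = line.split()  # assumption: whitespace- or tab-separated columns
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    chrom, tx_start, tx_stop, gene_id, fpkm, strand, count, tx_ids = fields
    return RefGeneCount(
        chrom=chrom,
        tx_start=int(tx_start),
        tx_stop=int(tx_stop),
        gene_id=gene_id,
        fpkm=float(fpkm),
        strand=strand,
        count=int(count),  # assumption: counts are written as integers
        transcript_ids=tx_ids.split(","),
    )


row = "chr17 26138685 26195811 Axin1 11.196273 + 264 NM_001159598,NM_009733"
record = parse_refgene_line(row)
print(record.gene_id, record.fpkm, record.count)
```

Under this reading, the fifth column (11.196273) is the FPKM rather than a TPM, and the seventh column (264) is the read count.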

We used the base StringTie parameters for gene expression counting (-M .95) for guided assembly.
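
The `count` formula in the header list (transcript length * coverage / read length) can be sketched as follows. This is only an illustration of the arithmetic; the function name `coverage_to_count` and the input values are hypothetical, and any rounding applied when the file is written is not covered here.

```python
def coverage_to_count(transcript_length: int, coverage: float, read_length: int) -> float:
    """Estimate a read count from per-base coverage.

    Implements count = transcript_length * coverage / read_length,
    as described for the `count` column above.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return transcript_length * coverage / read_length


# Hypothetical values: a 2000 bp transcript at 12.5x coverage with 100 bp reads.
estimated = coverage_to_count(transcript_length=2000, coverage=12.5, read_length=100)
print(estimated)  # 250.0
```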

We're working on making the full *_TEcounts.txt format documentation, but for the time being, fpkm is computed using total counts.

Thanks for your patience as we sort it out! Let us know if you have other questions in the meantime.

Regards,
Chloe

@bvaldebenitom

Hi there!

Has this been sorted out? I'm running some analyses on these files and need some clarification on the *_TEcounts.txt headers.

@MaxwellShih

MaxwellShih commented Mar 9, 2020

Hello,
I am interested in getting the TPM (transcripts per million) for TEs, which is not provided in the version I am using. I am trying to calculate the TPM from fpkm. (https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/?fbclid=IwAR0gVbmAYOZzIKWtQJU8SHUbQ5Nhh4NaSIi9ObOr0Shoxplft1ZjAv8qPDg)
But before doing so, I want to confirm: was the FPKM calculated from the total count of TEs and refGene genes?
Many thanks for developing this awesome tool!

@emattei

emattei commented Aug 17, 2020

Hi @MaxwellShih,
I started using SQuIRE recently and I am running into all the problems mentioned in the issues that haven't been addressed yet.
I would also like to use TPM, and I think you should use the read_count field as input for the TPM calculation.
I think you can treat read_count as the raw counts.
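
For reference, a minimal sketch of the standard counts-to-TPM calculation. It assumes you already have a length in base pairs for every feature, which SQuIRE's output is not confirmed to provide; `counts_to_tpm` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def counts_to_tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Convert raw counts to TPM given one length (bp) per feature.

    Each count is first divided by its feature length, then the
    length-normalized rates are rescaled so that they sum to one million.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1e6 for rate in rates]


# Hypothetical counts and lengths for three features.
tpm = counts_to_tpm(counts=[264, 100, 50], lengths_bp=[3000, 1000, 500])
print(tpm)
```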

@MaxwellShih

Hello @emattei ,
Thanks for your comments. To calculate TPM from raw counts, I would need the gene length information, which I did not find easy to obtain. So I just calculated it from FPKM instead.
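
The FPKM-to-TPM conversion mentioned here needs no length information, because FPKM is already length-normalized: TPM_i = FPKM_i / sum(FPKM) * 10^6. A minimal sketch, where `fpkm_to_tpm` is a hypothetical helper and the input values are made up:

```python
from typing import List, Sequence


def fpkm_to_tpm(fpkm_values: Sequence[float]) -> List[float]:
    """Rescale FPKM values so that they sum to one million (TPM)."""
    total = sum(fpkm_values)
    if total == 0:
        return [0.0] * len(fpkm_values)
    return [value / total * 1e6 for value in fpkm_values]


# Made-up FPKM values for three features.
print(fpkm_to_tpm([11.196273, 4.0, 0.5]))
```

Note that the result depends on which features are included in the sum.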

@emattei

emattei commented Aug 17, 2020

I see. And in the denominator, did you use the sum of the gene and TE FPKM values combined?
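
To make the question concrete, here is a sketch of a pooled denominator, where gene and TE FPKM values are summed together before scaling. Whether this is the right choice for SQuIRE output is not confirmed anywhere in this thread; `pooled_tpm` and the example values are hypothetical.

```python
from typing import Dict, Tuple


def pooled_tpm(
    gene_fpkm: Dict[str, float], te_fpkm: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Convert gene and TE FPKM to TPM using one shared denominator.

    Assumption: both tables were normalized against the same library,
    so their FPKM values are comparable and can be summed together.
    """
    total = sum(gene_fpkm.values()) + sum(te_fpkm.values())
    if total == 0:
        return ({k: 0.0 for k in gene_fpkm}, {k: 0.0 for k in te_fpkm})
    scale = 1e6 / total
    gene_tpm = {name: value * scale for name, value in gene_fpkm.items()}
    te_tpm = {name: value * scale for name, value in te_fpkm.items()}
    return gene_tpm, te_tpm


# Made-up values: one gene and one TE locus (the TE name is a placeholder).
genes, tes = pooled_tpm({"Axin1": 2.0}, {"TE_example": 6.0})
print(genes, tes)
```

With a pooled denominator the gene and TE TPM values together sum to one million; computing each table separately would instead make each table sum to one million on its own.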
