Normalize based on gene length? #17

Open
transcript opened this issue Aug 31, 2018 · 2 comments

@transcript
Owner

Pedro Torres suggested normalizing counts by gene length, since longer genes tend to collect more reads.

This is probably feasible: map the database to get a geneLength value for each gene, then plug that in later, likely at the R step, to normalize the counts by gene length.
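
As a rough illustration of what that normalization would compute, here is a minimal Python sketch that divides each raw count by the gene's length in kilobases. The function name `reads_per_kilobase`, the dictionary layout, and the example genes and numbers are all hypothetical; nothing here reflects how the pipeline actually stores counts or lengths, and the real step would likely live in R.

```python
def reads_per_kilobase(counts, gene_lengths):
    """Divide each gene's raw read count by its length in kilobases.

    Both arguments are plain dicts keyed by gene name (a hypothetical layout).
    """
    rpk = {}
    for gene, count in counts.items():
        length_bp = gene_lengths[gene]  # raises KeyError if a length is missing
        if length_bp <= 0:
            raise ValueError(f"non-positive length for {gene}")
        rpk[gene] = count / (length_bp / 1000.0)
    return rpk


# Made-up example: two genes with equal raw counts but different lengths.
counts = {"geneA": 200, "geneB": 200}
gene_lengths = {"geneA": 1000, "geneB": 4000}  # lengths in base pairs
rpk = reads_per_kilobase(counts, gene_lengths)
```

With these made-up values the longer gene ends up with a quarter of the normalized value of the shorter one, which is the effect the suggestion is after.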

@transcript transcript self-assigned this Aug 31, 2018
@seb951
Contributor

seb951 commented Aug 31, 2018

I don't recommend doing that for gene expression analyses based on DESeq2. See this section of the DESeq2 vignette:
https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-un-normalized-counts

A gene expression analysis is used to say "These N genes are differentially expressed between group A and B", not to say "Gene X is expressed twice as much as Gene Y in group A". The same should hold for other types of count data, such as the data here.
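
To make the arithmetic behind that point concrete, here is a small Python sketch with made-up numbers (the gene length, counts, and the `fold_change` helper are hypothetical). When the same gene is compared between two groups, its length divides both counts and cancels out of the ratio. This only shows the cancellation; it is not a description of what DESeq2 does internally.

```python
import math


def fold_change(group_a, group_b):
    """Ratio of a gene's expression in group B relative to group A."""
    return group_b / group_a


# Made-up values for a single gene measured in two groups.
length_kb = 4.0            # the gene's length is the same in both groups
raw_a, raw_b = 100, 300    # raw read counts in group A and group B

fc_raw = fold_change(raw_a, raw_b)
fc_norm = fold_change(raw_a / length_kb, raw_b / length_kb)

# The length divides numerator and denominator alike, so the ratio is unchanged.
same = math.isclose(fc_raw, fc_norm)
```

Length only changes the picture when two different genes are compared with each other within one group, which is the second kind of statement in the quote above.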

@transcript
Owner Author

@seb951 Noted. I've pinged Pedro with the link to this issue and invited him to comment on his use case.
