Normalize based on gene length? #17

transcript · 2018-08-31T16:03:05Z

Suggestion from Pedro Torres: normalize counts by gene length, since longer genes tend to accumulate more reads.

This is probably feasible by mapping the database and getting a geneLength value for each gene. That value could be plugged in later, most likely at the R step, and used to normalize counts by gene length.
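For illustration, here is a minimal sketch of one common length normalization, TPM (transcripts per million), written in Python rather than R purely to show the arithmetic. It assumes a per-gene length table like the proposed geneLength values. The names `tpm` and `gene_lengths`, and all numbers, are hypothetical, and TPM is only one possible choice, not something this issue settled on.

```python
# Minimal sketch: convert one sample's raw per-gene counts to TPM.
# ASSUMPTION: gene_lengths stands in for the hypothetical "geneLength" values
# obtained by mapping the database; names and numbers are made up.


def tpm(counts: dict[str, int], gene_lengths: dict[str, int]) -> dict[str, float]:
    """Length-normalize raw read counts for a single sample into TPM."""
    # Reads per kilobase: divide each count by its gene length in kilobases.
    rpk = {g: counts[g] / (gene_lengths[g] / 1000) for g in counts}
    # Rescale so that the values for this sample sum to one million.
    total = sum(rpk.values())
    return {g: v / total * 1e6 for g, v in rpk.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 100}
    gene_lengths = {"geneA": 1000, "geneB": 4000}  # geneB is 4x longer
    print(tpm(counts, gene_lengths))
```

With equal raw counts, the 4x longer gene ends up with a quarter of the TPM of the shorter one, which is exactly the "longer genes get more reads" correction being proposed.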

seb951 · 2018-08-31T17:19:39Z

I don't recommend doing that for gene expression analyses based on DESeq2. See this section of the DESeq2 vignette:
https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-un-normalized-counts

A gene expression analysis is used to say "These N genes are differentially expressed between groups A and B", not "Gene X is expressed twice as much as gene Y in group A". The same should hold for other types of count data, such as the counts here.
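The reasoning can be checked with a little arithmetic. A gene's length is the same in every sample, so dividing by it cancels out of a between-group fold change for that gene. It only matters when comparing different genes. Below is a minimal Python sketch with made-up counts and lengths. It assumes library sizes are already comparable and ignores DESeq2's own size-factor estimation and statistical model. `length_scaled` and `compare` are hypothetical helpers.

```python
# Minimal sketch: per-gene length scaling cancels in a between-group fold change
# for the same gene, but changes a within-group comparison between two genes.
# ASSUMPTION: counts and lengths are made up; library sizes are assumed comparable.

gene_lengths = {"geneX": 2000, "geneY": 500}
counts_group_a = {"geneX": 200, "geneY": 100}
counts_group_b = {"geneX": 100, "geneY": 100}


def length_scaled(counts: dict[str, int]) -> dict[str, float]:
    """Divide each gene's count by that gene's length."""
    return {g: c / gene_lengths[g] for g, c in counts.items()}


def compare() -> dict[str, float]:
    a_scaled = length_scaled(counts_group_a)
    b_scaled = length_scaled(counts_group_b)
    return {
        # Same gene, A vs B: the length divides out of the ratio.
        "x_between_raw": counts_group_a["geneX"] / counts_group_b["geneX"],
        "x_between_scaled": a_scaled["geneX"] / b_scaled["geneX"],
        # Different genes, within A: the length does change the ratio.
        "x_vs_y_within_a_raw": counts_group_a["geneX"] / counts_group_a["geneY"],
        "x_vs_y_within_a_scaled": a_scaled["geneX"] / a_scaled["geneY"],
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value}")
```

The between-group ratio for geneX is 2 with or without length scaling. The within-group geneX vs geneY ratio moves from 2 to 0.5. Only that second kind of comparison depends on gene length, and a differential expression analysis does not make it.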

transcript · 2018-08-31T20:27:58Z

@seb951 Noted. I've pinged Pedro with the link to this issue and invited him to comment on his use case.

transcript added the enhancement label Aug 31, 2018

transcript self-assigned this Aug 31, 2018
