Implement sophisticated seq logo algorithms #94
Comments
The most popular and most highly cited seq logo implementation is Steven Brenner's WebLogo: http://weblogo.berkeley.edu/logo.cgi
Link to the paper: http://weblogo.berkeley.edu/Crooks-2004-GR-WebLogo.pdf |
This is now done with the new biojs-stat-seqs, globally available under It would also support using a custom background or calculating the background from the sequences. |
BTW how do we handle gaps in the alignment?
|
Hi Seb, I don't see in https://github.com/greenify/biojs-stat-seqs any information on how you calculate the sequence conservation at each position in the alignment. If you haven't done so, please use the formula from the WebLogo paper [1]. The height of each base (or amino acid) per site is then its frequency at a specific position times the sequence conservation at this position.
If we have an alignment: Then the frequency of A at position 1 would be 0.75. The information content (i.e. the sequence conservation) would be 4 and the height of stack A then 3. |
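A minimal sketch of the height rule described above, assuming the plain Shannon form of the information content (maximum entropy minus observed entropy, in bits) and leaving out the small-sample correction term from the WebLogo paper. `stackHeights` is a hypothetical helper written for illustration, not a function of biojs-stat-seqs.

```javascript
// Hypothetical helper for illustration, not the biojs-stat-seqs API.
// column: array of single-letter residues at one alignment position.
// alphabetSize: 4 for DNA/RNA, 20 for proteins.
function stackHeights(column, alphabetSize) {
  // Count each letter in the column.
  const counts = {};
  for (const letter of column) {
    counts[letter] = (counts[letter] || 0) + 1;
  }

  // Relative frequencies and Shannon entropy (in bits).
  const freqs = {};
  let entropy = 0;
  for (const letter of Object.keys(counts)) {
    const f = counts[letter] / column.length;
    freqs[letter] = f;
    entropy -= f * Math.log2(f);
  }

  // Information content: maximum entropy minus observed entropy.
  // Assumption: no small-sample correction is applied.
  const information = Math.log2(alphabetSize) - entropy;

  // Height of each letter = its frequency times the information content.
  const heights = {};
  for (const letter of Object.keys(freqs)) {
    heights[letter] = freqs[letter] * information;
  }
  return { information, heights };
}

// A fully conserved DNA column carries the maximum of 2 bits.
console.log(stackHeights(["A", "A", "A", "A"], 4));
```

The letter heights in one column always sum to the information content of that column, so the total stack height shows conservation and the split within the stack shows the letter frequencies.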
https://github.com/greenify/biojs-stat-seqs/blob/master/lib/index.js#L216
I have also written some tests.
trivia
For the identity I ignore gaps.
ic
For the sequence logo I draw "conservation per residue scaled", simply omitting the gaps.
Background
Frequency of the letters
We could offer to calculate the information content against a given background, e.g. UniProt or the alignment itself. |
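One common way to measure information content against a background is the relative entropy (Kullback-Leibler divergence) between the observed column frequencies and the background frequencies. The sketch below assumes that definition. Both function names are hypothetical and not part of biojs-stat-seqs, and `backgroundFromAlignment` counts every character it sees, gaps included, which is an assumption made only for this example.

```javascript
// Hypothetical helpers for illustration, not the biojs-stat-seqs API.

// freqs: observed frequencies at one position, e.g. { A: 0.75, C: 0.25 }.
// background: expected frequencies, e.g. from a database or the alignment.
function relativeEntropy(freqs, background) {
  let sum = 0;
  for (const letter of Object.keys(freqs)) {
    const f = freqs[letter];
    if (f === 0) continue; // 0 * log(0) is treated as 0
    const b = background[letter];
    if (!(b > 0)) {
      throw new Error("No background frequency for letter " + letter);
    }
    sum += f * Math.log2(f / b);
  }
  return sum; // in bits
}

// Derive the background from the alignment itself.
// Assumption: every character is counted, gaps included.
function backgroundFromAlignment(seqs) {
  const counts = {};
  let total = 0;
  for (const seq of seqs) {
    for (const letter of seq) {
      counts[letter] = (counts[letter] || 0) + 1;
      total += 1;
    }
  }
  const background = {};
  for (const letter of Object.keys(counts)) {
    background[letter] = counts[letter] / total;
  }
  return background;
}

const uniformDna = { A: 0.25, C: 0.25, G: 0.25, T: 0.25 };
console.log(relativeEntropy({ A: 1 }, uniformDna));
console.log(backgroundFromAlignment(["AAC", "AAG"]));
```

With a uniform background this reduces to the maximum entropy minus the observed entropy, so a fully conserved DNA column still gives 2 bits.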
Comments: We have the following sample sequence alignment:
|
Yup, but this has nothing to do with the statistics package. The MSA viewer has the task of figuring out whether the user wants to calculate the consensus or already has a consensus sequence (hence two methods).
I can't align sequences (too expensive) and therefore the identity calculation stupidly expects that your sequences are in an optimal alignment (which should be the case).
Information content (aka Entropy)
Conservation: Max. information content - obs. information content
(again I count the gaps as normal letters)
obs. ic / max. ic
It is a feature of the biojs-stat-seqs package and might be useful (e.g. barchart, ...) |
Hi Seb, Two issues here:
Tatyana |
I changed the implementation to be consistent with the one from Biopython. Ignored chars like "-" are also ignored in the frequency calculation, so that { A: 0.75, '-': 0.25 }
Why is this an issue? The stat package accepts any alphabet size. |
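A minimal sketch of what ignoring characters in the frequency calculation can look like. `columnFrequencies` and its `ignored` parameter are hypothetical names for this example, not the actual biojs-stat-seqs or Biopython API. Ignored characters are left out of both the counts and the denominator.

```javascript
// Hypothetical helper for illustration, not the biojs-stat-seqs API.
// column: array of single-letter residues at one alignment position.
// ignored: characters excluded from the counts and from the total.
function columnFrequencies(column, ignored = ["-"]) {
  const counts = {};
  let total = 0;
  for (const letter of column) {
    if (ignored.includes(letter)) continue;
    counts[letter] = (counts[letter] || 0) + 1;
    total += 1;
  }

  // A column made up only of ignored characters yields an empty object.
  const freqs = {};
  for (const letter of Object.keys(counts)) {
    freqs[letter] = counts[letter] / total;
  }
  return freqs;
}

// Gaps counted as a normal letter: { A: 0.75, '-': 0.25 }
console.log(columnFrequencies(["A", "A", "A", "-"], []));

// Gaps ignored: { A: 1 }
console.log(columnFrequencies(["A", "A", "A", "-"]));
```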
|
Fixed 1 & 2 |
count gaps in the seq logo ... |
This issue is a follow-up to #16.
Unfortunately the seqlogo doesn't handle this natively.
(just as a reminder for myself)
original paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC332411/pdf/nar00204-0153.pdf
follow-ups:
http://bioinformatics.oxfordjournals.org/content/28/14/1935.long
http://schneider.ncifcrf.gov/paper/hawaii/latex/paper.pdf