Skip to content

Conversation

@hyanwong
Copy link
Member

@hyanwong hyanwong commented May 6, 2020

This follows on from #570 and picks up the idea of creating an SVG group for each mutation too (see #570 (comment) for justification).

It seems like this is a good idea to me. The main thing to sort out is how we assign classes to the groups. At the moment I assign the parent node id (or "root") plus the child node id to all groups, including mutation groups. Mutation groups then have an additional 'mut" class and the mutation number and site number, whereas node groups have a "node" class. An example would be something like this (assuming an example with mutation 0 over a root node id 8 with 2 daughters, id 4 and 5)

<g class="tree t0">
  <g class="root n8 mut m0 s0">  <!-- a mutation group identifying mutation 0 over node 8 -->  
    <g class="root n8 node">   <!-- a node group identifying node 8-->
      <g class="p8 n4 mut m1 s1">  <!-- a mutation group identifying mutation 1 over child node 4 -->
      ... further mutation or node groups, including the node group for node 4 and descendants
      </g>
      <g class="p8 n5 mut m2 s2">  <!-- a mutation group identifying mutation 2 over child node 5 -->
      ... further mutation or node groups, including the node group for node 5 and descendants
      </g>
      <path class="edge" ...> <!-- the line joining the root node to mutation 0 -->
      <circle ...> <!-- the root node symbol, doesn't strictly need a class, as `edge + *` will do -->
      <text ...> <!-- the root node label, always a text node -->
    </g>
    <path class="edge" ...> <!-- the line above mutation 0 -->
    <rect ...> <!-- the mutation symbol for mutation 0 -->
    <text ...> <!-- the label for mutation 0, always a text node -->
  </g> <!-- close mutation group -->
</g> <!-- close tree group -->

This is all quite complicated to describe, but logical. I think it allows us to target anything we might want, but it could be easier to explain if, for instance, the circles and rects are given a class "symbol", rather than targetted because they always follow an element of class "edge".

Probably easiest if I discuss this with someone (e.g. @benjeffery ) on a Jitsi call?

@codecov
Copy link

codecov bot commented May 6, 2020

Codecov Report

Merging #587 into master will decrease coverage by 0.01%.
The diff coverage is 92.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #587      +/-   ##
==========================================
- Coverage   87.31%   87.30%   -0.02%     
==========================================
  Files          22       22              
  Lines       16758    16791      +33     
  Branches     3281     3293      +12     
==========================================
+ Hits        14633    14659      +26     
- Misses       1040     1043       +3     
- Partials     1085     1089       +4     
Flag Coverage Δ
#c_tests 88.51% <92.92%> (-0.03%) ⬇️
#python_c_tests 90.42% <92.92%> (-0.08%) ⬇️
#python_tests 98.80% <92.92%> (-0.23%) ⬇️
Impacted Files Coverage Δ
python/tskit/trees.py 98.18% <ø> (ø)
python/tskit/drawing.py 98.48% <92.92%> (-1.21%) ⬇️
c/tskit/trees.c 90.69% <0.00%> (+0.06%) ⬆️
c/tskit/core.c 90.66% <0.00%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 801ebbc...f365d51. Read the comment docs.

@benjeffery
Copy link
Member

We just had a call where we thought the symbols and labels should have a class as the HTML entity may change and shouldn't be used as a target.

@hyanwong
Copy link
Member Author

hyanwong commented May 8, 2020

I wonder if we should also add the population id and individual id to the node group (perhaps only if these are not tskit.NULL)? The obvious notation would be

<g class="pPOPNUM iIND_NUM ... p8 n4 mut m1 s1">

but pID for population would conflict with pID used to indicate parent ID. My inclination is to change the notation for parent nodes. Perhaps aID for "ancestorID" or 'hID' for "head node"? I previously suggested "mID" for "mother", but "m" is already reserved for mutation ID.

@petrelharp
Copy link
Contributor

I wonder if we should also add the population id and individual id to the node group (perhaps only if these are not tskit.NULL)?

That'd make implementing the .draw() method even easier! Maybe pop<POP ID> and ind<IND ID>?

@hyanwong hyanwong force-pushed the grouped-svg-mutations branch 2 times, most recently from a7d1c89 to bd377ec Compare May 10, 2020 17:53
@jeromekelleher
Copy link
Member

The more semantic info we're including the better, I guess. Seems like there's no harm in it? We can always provide methods to trim out the info later if it turns out to add too much weight.

@hyanwong hyanwong force-pushed the grouped-svg-mutations branch 5 times, most recently from 8b549aa to b3fe811 Compare May 14, 2020 18:44
@hyanwong hyanwong force-pushed the grouped-svg-mutations branch from b3fe811 to ebf67e9 Compare May 15, 2020 17:09
@hyanwong
Copy link
Member Author

hyanwong commented May 15, 2020

I've rebased, implemented populations and individuals for nodes ("p<POP_ID>" and "i<IND_ID>", with parent nodes being "a<PARENT_NODE_ID>" - other alternative letters welcome, although given the number of times these are repeated, I've stuck to single letters). So this is also ready for review now - the final piece of my SVG jigsaw!

For discussion is Peter's suggestion about whether to by default create a UID for the SVG, to stop user-generated styles spilling into different SVGs in the same document (which I agree is very unexpected, and can play havoc with custom SVGs in Jupyter notebooks). However, I can't think of an elegant way to do this automatically: the easiest way is to require the user to specify an ID in the style specification, which would mean they would have to extract any autogenerated ID anyway, unless we defined some sort of placeholder. Alternatively, we could require the style specifications to be an array of strings, and we automatically prefix each string with "#UID " when writing to the SVG file, but messing with the user-supplied styles like this seems a little wrong.

Finally, I was also thinking that it would be useful to embed the number of mutations, sites, nodes, and edges per tree as metadata into the SVG somehow, as these are useful for animation. However, the most obvious way to add them as data-* attributes is SVG2 only, and not supported by svgwrite. We could possible define our own namespace for this (see the top answer at https://stackoverflow.com/questions/15532371/do-svg-docs-support-custom-data-attributes) but that seems quite heavyweight. Perhaps @benjeffery has a better solution?

When all seems OK, I'll write some docs and amend the commit with those.

@benjeffery
Copy link
Member

benjeffery commented May 15, 2020

For the UID issue - I don't think we want to be too clever here with re-writing styles etc. I think a note in the docs to add a UID to both your styles and top-level attributes is fine.

All browsers support data attributes: https://caniuse.com/#feat=mdn-svg_attributes_data and svgwrite can be persuaded to support any attribute you like:

import svgwrite
svgwrite.data.full11.elements['g'].valid_attributes = frozenset(
    list(svgwrite.data.full11.elements['g'].valid_attributes) + 
    ['data-whatever']
)
svgwrite.data.full11.attributes['data-whatever'] = svgwrite.data.types.SVGAttribute(
    'data-whatever', anim=False, types=frozenset(['number']), const=[]
)
d = svgwrite.Drawing()
d.add(d.g(data_whatever=5))

You need the first two lines just after the import.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good generally, but I haven't gone deeply into the SVG logic. We'll need to make sure we have good test coverage on all the branches.

hyanwong added 2 commits May 26, 2020 15:52
and set "H 0" on branches to allow nice SPR animations
@hyanwong
Copy link
Member Author

I'm not paying much attention to this for the next week or two, but just to note that the main point of this PR, which assigns part of the line that represents an edge to a mut SVG group, means that when you try to animate an SPR between two trees, it's hard to find corresponding edges to transform, because the single line representing an edge may be broken down into pieces.

However, I think it might be possible to have the best of both worlds here, by drawing the line from a true node (not a mutation) upwards through the mutations and back to the previous node. I've coded this up in f365d51 but it will need a little thought/discussion, hopefully in a few weeks time.

@jeromekelleher
Copy link
Member

Hi @hyanwong, we're hoping to get 0.3.0 out in a couple of weeks and we should get this finished first, I guess? How much is left to do here?

@hyanwong
Copy link
Member Author

hyanwong commented Jul 8, 2020

So I stopped developing this @jeromekelleher because I realised that it might make our lives more difficult if we want to map one tree to an adjacent tree in the tree sequence. With the "mutation makes an SVG group" idea, we no longer maintain some identical groups when moving to a new tree, so e.g. animating SPRs becomes more problematic.

I think we need some discussion on the right approach here. @benjeffery normally has useful things to say.

@jeromekelleher
Copy link
Member

I think it's probably OK to leave this go until the next release, unless you think there are backwards incompatible changes that will need to be made later to accommodate mutations? Do you foresee the structure of the SVG changing?

@hyanwong
Copy link
Member Author

hyanwong commented Jul 8, 2020

Well, if we want to create a new subtree group under each mutation (i.e. as implemented in this PR) then yes, the SVG structure will be quite different. Sorry

@jeromekelleher
Copy link
Member

Well, if we want to create a new subtree group under each mutation (i.e. as implemented in this PR) then yes, the SVG structure will be quite different. Sorry.

Hmmm. Well, I'm going to make this your call then. We do want to ship in the next few weeks, so it would be good to have a think about it.

We could version the SVG output, so at least we could tell users about this, and this would let us make changes without having to worry too much about compatability?

@hyanwong hyanwong mentioned this pull request Aug 22, 2020
@hyanwong hyanwong marked this pull request as draft August 22, 2020 23:58
@hyanwong
Copy link
Member Author

hyanwong commented Aug 22, 2020

Converted to draft PR, as I'm not sure that creating groups for each mutation will allow simple SPR comparison / animation.

Some of the included improvements are now in #790

@jeromekelleher jeromekelleher added this to the Python 0.3.1 milestone Aug 24, 2020
@jeromekelleher
Copy link
Member

I've tagged this for 0.3.1, as it's pretty clear we're not going to get it sorted for 0.3.0. I think we'll just have to mark this SVG structure as "in development" and warn people to expect breakage if they depend on the structure.

@hyanwong
Copy link
Member Author

Yes - I think we might want not to implement SVG groups for mutations anyway. I'm debating whether to close this issue and try a new approach. We should probably have a more general discussion some time.

@benjeffery benjeffery changed the base branch from master to main September 28, 2020 12:12
@hyanwong
Copy link
Member Author

hyanwong commented Nov 3, 2020

Closing for the time being, as we are unsure if grouped mutations will be quite as flexible as we had hoped

@hyanwong hyanwong closed this Nov 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants