Conversation

@hyanwong
Member

@hyanwong hyanwong commented Mar 9, 2021

Description

Adds the possibility to put a Y axis & scale onto SVG plots. The axis-adding code is generalised, so it is available to both plots of single trees and whole tree sequences.

Draft at the moment, to solicit feedback

Fixes #580 and #840

PR Checklist:

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

@hyanwong
Member Author

hyanwong commented Mar 9, 2021

Here's an example:

Screenshot 2021-03-09 at 18 09 17

And one with some user-specified ticks and gridlines:

Screenshot 2021-03-09 at 18 09 58

And here's a single tree with both axes:

Screenshot 2021-03-09 at 18 11 36

@codecov

codecov bot commented Mar 9, 2021

Codecov Report

Merging #1236 (da2cfc2) into main (2434cb6) will increase coverage by 0.03%.
The diff coverage is 98.04%.

@@            Coverage Diff             @@
##             main    #1236      +/-   ##
==========================================
+ Coverage   93.70%   93.74%   +0.03%     
==========================================
  Files          26       26              
  Lines       21599    21781     +182     
  Branches      909      963      +54     
==========================================
+ Hits        20239    20418     +179     
  Misses       1321     1321              
- Partials       39       42       +3     
Flag Coverage Δ
c-tests 92.47% <ø> (ø)
lwt-tests 92.97% <ø> (ø)
python-c-tests 94.96% <98.04%> (+0.05%) ⬆️
python-tests 98.61% <98.04%> (-0.01%) ⬇️

Impacted Files Coverage Δ
python/tskit/trees.py 97.42% <ø> (ø)
python/tskit/drawing.py 98.84% <98.04%> (-0.14%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2434cb6...da2cfc2.

@hyanwong hyanwong force-pushed the y-axis-svg branch 3 times, most recently from f75830a to 2e4ddf2 Compare March 10, 2021 00:00
Member

@jeromekelleher jeromekelleher left a comment

Looks great. I've not gone through the logic in detail, ping me when you'd like some more feedback.

@hyanwong
Member Author

hyanwong commented Mar 10, 2021

Looks great. I've not gone through the logic in detail, ping me when you'd like some more feedback.

Thanks @jeromekelleher - I think I've covered most of this now, so some more detailed feedback would be good. If you flip through the SVG files in tests/data/svg/ (e.g. using a web browser) you should be able to see a number of examples of how it works. I think the main thing to decide is the interface (which is why it's not yet documented).

Re interface, I've tried to retain current behaviour. So tree.draw_svg() does the same by default - i.e. is plotted without X or Y axes. But if you call tree.draw_svg(x_label="Pos") then it adds the entire x-axis, and similarly with tree.draw_svg(y_label="Time") (they can be combined).

For a tree sequence, the x axis is on by default (and can't be turned off), so that ts.draw_svg(x_label="Pos") simply adds the axis label. OTOH, ts.draw_svg(y_label="Time") (as with tree-specific version) actually turns the whole axis on.

The other two parameters I've added are y_ticks and y_gridlines, which have similarly "bespoke" behaviour. For both trees and tree sequences, the default, y_ticks=None, sets the ticks at the node positions in the tree or the entire ts respectively. This can often look a bit ugly, as the ticks will be clustered near the tips and usually overlap, but it's an easy default (it should look a bit nicer when using a log scale). You can also provide an iterable to y_ticks, to place ticks wherever you want, e.g. evenly spaced. But I haven't implemented a "tick locator" function that will find a nice set of (say) 8 ticks on a linear or log scale: if you pass y_ticks=8 it simply raises a NotImplementedError.

Gridlines can be force shown by y_gridlines=True or force hidden by y_gridlines=False. The default y_gridlines=None shows gridlines only if a user-specified set of y_ticks has been given. If y_ticks is the default (a tick for each unique node height), then gridlines are not shown by default, because the tick positions are likely to be a bit crappy.
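
The defaults described in this comment can be sketched in plain Python. This is only an illustration of the draft behaviour as described above, not the code in python/tskit/drawing.py; the helper names `resolve_y_ticks` and `resolve_y_gridlines` and the `node_heights` argument are hypothetical.

```python
def resolve_y_ticks(y_ticks, node_heights):
    """Hypothetical helper: choose tick positions for the Y axis."""
    if y_ticks is None:
        # Default: one tick per unique node height.
        return sorted(set(node_heights))
    try:
        # Any iterable of positions is used as given.
        return list(y_ticks)
    except TypeError:
        # e.g. y_ticks=8: automatic tick placement is not implemented in the draft.
        raise NotImplementedError("Automatic tick placement is not implemented") from None


def resolve_y_gridlines(y_ticks, y_gridlines):
    """Hypothetical helper: decide whether gridlines are drawn."""
    if y_gridlines is not None:
        # True or False from the caller always wins.
        return bool(y_gridlines)
    # Default (None): show gridlines only when the caller chose the ticks.
    return y_ticks is not None
```

With these rules, `resolve_y_gridlines(None, None)` is false (default ticks, so no gridlines) while `resolve_y_gridlines([0, 10, 20], None)` is true.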

@hyanwong hyanwong force-pushed the y-axis-svg branch 3 times, most recently from 65030bb to acc4d27 Compare March 10, 2021 12:53
@hyanwong hyanwong marked this pull request as ready for review March 10, 2021 13:18
@hyanwong hyanwong force-pushed the y-axis-svg branch 3 times, most recently from 0734546 to d729ae1 Compare March 11, 2021 10:14
@benjeffery
Member

Looks great Yan, ping me when you want a full review.

@hyanwong
Member Author

Looks great Yan, ping me when you want a full review.

It's ready for review whenever you have time. I think I've covered all the code lines now.

Member

@jeromekelleher jeromekelleher left a comment

Looks great, but I'm not too keen on the current interface. I think we're trying to do too much with too few parameters. How about

x_axis: bool # Whether to show x axis; None for default
y_axis: bool # Whether to show y axis; None for default.
x_label: str # The x axis label; None for default. If x_axis is shown, default is "Genome", otherwise nothing
y_label: str # The y axis label; None for default. If y_axis is shown, default depends on value of time scale, otherwise nothing.
y_gridlines: bool # Whether to plot gridlines. Default is False. (Overly complex making default depend on y_ticks)

So, we have a set of relatively independent params. For example, you can, if you want, set a Y-axis label without a drawn y-axis (which you might want to do). By overloading too much functionality into label parameters I think we'd be (a) making for code that's hard to read and (b) painting ourselves into a corner in terms of flexibility.

What do you think? We don't have to support x_axis=False for TreeSequence, if that's awkward.

@hyanwong
Member Author

Looks great, but I'm not too keen on the current interface. I think we're trying to do too much with too few parameters.

Yep, I'm completely agnostic about the API. Happy to go with your suggestion below. I don't think we should default to a standard axis label text though, as that opens us up to internationalisation issues (and we haven't had it before anyway).

x_axis: bool # Whether to show x axis; None for default
y_axis: bool # Whether to show y axis; None for default.
x_label: str # The x axis label; None for default. If x_axis is shown, default is "Genome", otherwise nothing
y_label: str # The y axis label; None for default. If y_axis is shown, default depends on value of time scale, otherwise nothing.
y_gridlines: bool # Whether to plot gridlines. Default is False. (Overly complex making default depend on y_ticks)

What do you think? We don't have to support x_axis=False for TreeSequence, if that's awkward.

I think it should be relatively easy to support x_axis=False for TreeSequence, but I assume we would keep the defaults being that x_axis=None is equivalent to x_axis=True for a TreeSequence but x_axis=False for a Tree?

@jeromekelleher
Member

I don't think we should default to a standard axis label text though, as that opens us up to internationalisation issues (and we haven't had it before anyway).

I think we can break with the old default here, it's generally handy to know that "genome position" (or something) is what the x axis refers to. Definitely for the Y axis having a good default of "Node rank" or "Time ago" would be really helpful, and is what you want most of the time. Unlabelled axes are one of the biggest mistakes made in data visualisation, so we should do the right thing by default here I think.

I wouldn't worry about i18n - we can deal with that if it becomes an issue.

I think it should be relatively easy to support x_axis=False for TreeSequence, but I assume we would keep the defaults being that x_axis=None is equivalent to x_axis=True for a TreeSequence but x_axis=False for a Tree?

yes, I think that's the right default.
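
The defaults agreed here could be resolved along these lines. This is a minimal sketch, assuming a hypothetical helper `resolve_axes` and a hypothetical `is_tree_sequence` flag; it also assumes the Y axis stays off unless requested, in line with the aim stated earlier of retaining current behaviour.

```python
def resolve_axes(x_axis=None, y_axis=None, *, is_tree_sequence):
    """Hypothetical helper: turn None defaults into concrete booleans."""
    if x_axis is None:
        # None means "on" for a TreeSequence and "off" for a single Tree.
        x_axis = is_tree_sequence
    if y_axis is None:
        # Assumption: the Y axis is opt-in for both kinds of plot.
        y_axis = False
    return bool(x_axis), bool(y_axis)
```

An explicit `x_axis=False` still overrides the TreeSequence default, which is the case mentioned above as optional to support.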

@hyanwong
Member Author

hyanwong commented Mar 11, 2021

I think we can break with the old default here, it's generally handy to know that "genome position" (or something) is what the x axis refers to. Definitely for the Y axis having a good default of "Node rank" or "Time ago" would be really helpful, and is what you want most of the time. Unlabelled axes are one of the biggest mistakes made in data visualisation, so we should do the right thing by default here I think.

If we are creating a default label, I would like it to reflect #495. When that's solved, I think the default should change from "Time" to "Time (generations)" or "Time (years)", or "Time (days)" or whatever. That's another reason I was putting off this decision. But I guess we'd be happy for the defaults to change like this in the future?

Also, I assume we don't put "Log time" because the actual ticks will be labelled as (untransformed) values.

I wouldn't worry about i18n - we can deal with that if it becomes an issue.

OK, your call. It just makes me wary.
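
One way the default label rule under discussion might look is sketched below. The function names are hypothetical, the label strings are simply the candidates mentioned in this thread ("Node rank", "Time"), and the scale names "rank", "time" and "log_time" are assumed to be the accepted values of tree_height_scale; none of this is taken from the merged code.

```python
def default_y_label(tree_height_scale):
    """Hypothetical helper: pick a default Y axis label from the height scale."""
    if tree_height_scale == "rank":
        return "Node rank"
    if tree_height_scale in ("time", "log_time"):
        # Same label for both: ticks on a log scale still show untransformed times.
        return "Time"
    raise ValueError(f"Unknown tree_height_scale: {tree_height_scale!r}")


def resolve_y_label(y_label, y_axis, tree_height_scale):
    """Hypothetical helper: an explicit label wins; otherwise default only with an axis."""
    if y_label is not None:
        return y_label
    return default_y_label(tree_height_scale) if y_axis else None
```

If time units become available, only `default_y_label` would need to change, for example to return "Time (generations)".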

Member

@benjeffery benjeffery left a comment

Wow, as usual viz code is a ton of work! Just a few comments.

@hyanwong hyanwong force-pushed the y-axis-svg branch 3 times, most recently from dec55d4 to e5402b0 Compare March 11, 2021 18:24
@hyanwong
Member Author

OK, it took a fair bit of re-ordering, but I think the new interface with y_axis=True, etc. is working now. y_axis and y_label can be used independently, so you can add an axis label without the axis itself, although I can't imagine that being terribly popular:

Screenshot 2021-03-11 at 18 27 02

As before, have a browse through the files in tests/data/svg to see if the format looks right. We can use some of these files (or the general format) in a viz tutorial, I guess.

I think this is ready for a final review?

Member

@jeromekelleher jeromekelleher left a comment

Looks great! Some minor comments on the code and interface, but basically ready to merge I think.

Looks like you need a few more permutations of x_axis=[True, False], y_axis=[True, False] to bump the test coverage up (looks like the tests are only covering one side of a branch in some cases)

@hyanwong
Member Author

Looks like you need a few more permutations of x_axis=[True, False], y_axis=[True, False] to bump the test coverage up (looks like the tests are only covering one side of a branch in some cases)

Just committed this combinatorial monster:

@pytest.mark.parametrize("x_axis", (True, False))
@pytest.mark.parametrize("y_axis", (True, False))
@pytest.mark.parametrize("x_label", (True, False))
@pytest.mark.parametrize("y_label", (True, False))
@pytest.mark.parametrize("tree_height_scale", ("rank", "time", "log_time"))
@pytest.mark.parametrize("y_ticks", ([], [0, 1], None))
@pytest.mark.parametrize("y_gridlines", (True, False))
def test_draw_svg_parameter_combos(...)

Is that excessive @jeromekelleher ? It still runs in reasonable time on my machine.

    iter(ticks)
    return ticks
except TypeError:
    raise NotImplementedError("Autocalculated tick mark locations not implemented.")
Member

This is still the wrong error class and message. Just return ticks and let the standard errors do the work.

Member Author

Weird. I did change that, and it's not what's now on my branch: https://github.com/hyanwong/tskit/blob/2382583b1f72de16646637aec9255c5bb0c01744/python/tskit/drawing.py#L125. Not sure what's up here @jeromekelleher

@jeromekelleher
Member

Is that excessive @jeromekelleher ? It still runs in reasonable time on my machine.

... yes. How many tests is this? We don't need the cross product of all these things I think, maybe just a few specific combos where the parameters interact.

@hyanwong
Member Author

Is that excessive @jeromekelleher ? It still runs in reasonable time on my machine.

... yes. How many tests is this? We don't need the cross product of all these things I think, maybe just a few specific combos where the parameters interact.

2^5 * 3^2 = 288. Yes, I agree it seems excessive.
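
The count can be checked by enumerating the cross product that stacked parametrize decorators generate. The dictionary below just copies the parameter values from the test posted above; the variable names are illustrative.

```python
from itertools import product

# Values copied from the stacked @pytest.mark.parametrize decorators.
param_values = {
    "x_axis": (True, False),
    "y_axis": (True, False),
    "x_label": (True, False),
    "y_label": (True, False),
    "tree_height_scale": ("rank", "time", "log_time"),
    "y_ticks": ([], [0, 1], None),
    "y_gridlines": (True, False),
}

# Stacked parametrize decorators run the test once per element of this product.
combos = list(product(*param_values.values()))
print(len(combos))  # 288
```

Five two-valued parameters and two three-valued ones give 2^5 * 3^2 = 32 * 9 = 288 test cases.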

@hyanwong
Member Author

hyanwong commented Mar 13, 2021

I removed the x_axis/label specs and log_time, and now it's only 32 combos, which I deem acceptable. Also the re-push seems to have properly updated the PR, so now the NotImplementedError is gone. Note however, that I think the Python docs say that NotImplementedError need not just be used for base classes lacking a method, but can also be used "while the class is being developed to indicate that the real implementation still needs to be added." (which I think covers my use case). It's gone anyway - this is just a ref for my future self.

I guess this can be merged - do you want me to squash the last commit into the previous one @jeromekelleher ?

Member

@jeromekelleher jeromekelleher left a comment

LGTM!

@mergify mergify bot merged commit 94e2b94 into tskit-dev:main Mar 13, 2021
@hyanwong hyanwong deleted the y-axis-svg branch March 17, 2021 18:28
Development

Successfully merging this pull request may close these issues.

[Drawing] Add Y axis ticks to TreeSequence.draw_svg()

3 participants