Skip to content

Conversation

@benjeffery
Copy link
Member

Stacked on #2169

A proposal for how TreeSequence.variants and the Variants class will look. No docs or any changes to tests yet. Note that the as_bytes option has been removed for now - we can easily reinstate it. The other breaking change is that the genotype array on Variant is now read-only. I think this makes sense, it technically could be writeable though.

@codecov
Copy link

codecov bot commented Apr 1, 2022

Codecov Report

Merging #2172 (277631d) into main (b1bbf82) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 277631d differs from pull request most recent head b061737. Consider uploading reports for the commit b061737 to get more accurate results

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2172   +/-   ##
=======================================
  Coverage   93.28%   93.28%           
=======================================
  Files          27       27           
  Lines       26059    26077   +18     
  Branches     1163     1165    +2     
=======================================
+ Hits        24308    24325   +17     
- Misses       1721     1722    +1     
  Partials       30       30           
Flag Coverage Δ
c-tests 92.20% <ø> (ø)
lwt-tests 89.05% <ø> (ø)
python-c-tests 72.17% <37.50%> (-0.10%) ⬇️
python-tests 98.87% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/_tskitmodule.c 90.94% <100.00%> (-0.01%) ⬇️
python/tskit/trees.py 98.01% <100.00%> (-0.04%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1bbf82...b061737. Read the comment docs.

@benjeffery
Copy link
Member Author

benjeffery commented Apr 1, 2022

Here are some timings for iterating through all the variants in this tree:

╔═════════════════════════╗
║TreeSequence             ║
╠═══════════════╤═════════╣
║Trees          │    98427║
╟───────────────┼─────────╢
║Sequence Length│   100000║
╟───────────────┼─────────╢
║Time Units     │  unknown║
╟───────────────┼─────────╢
║Sample Nodes   │    40000║
╟───────────────┼─────────╢
║Total Size     │230.7 MiB║
╚═══════════════╧═════════╝
╔═══════════╤═══════╤═════════╤════════════╗
║Table      │Rows   │Size     │Has Metadata║
╠═══════════╪═══════╪═════════╪════════════╣
║Edges      │1603176│ 48.9 MiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Individuals│  20000│546.9 KiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Migrations │      0│  8 Bytes│          No║
╟───────────┼───────┼─────────┼────────────╢
║Mutations  │4463324│157.5 MiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Nodes      │ 343258│  9.2 MiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Populations│      1│224 Bytes│         Yes║
╟───────────┼───────┼─────────┼────────────╢
║Provenances│      2│  1.7 KiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Sites      │ 100000│  2.4 MiB│          No║
╚═══════════╧═══════╧═════════╧════════════╝

0.4.1: 176s
this branch, default: 223s
this branch, return_variant_copies=False: 213s

Somewhat disappointing, not where the extra time is coming from yet.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM

We should either nuke the low level VariantGenerator object as part of this, or register an issue for it so we don't forget.

@jeromekelleher
Copy link
Member

this branch, default: 223s
this branch, return_variant_copies=False: 213s
Somewhat disappointing, not where the extra time is coming from yet.

Oh, I missed this. Hmm, interesting. Try running through perf record?

@benjeffery benjeffery force-pushed the refactor-python-vargen branch from c34d5bc to a691242 Compare April 4, 2022 14:16
@benjeffery
Copy link
Member Author

this branch, default: 223s
this branch, return_variant_copies=False: 213s
Somewhat disappointing, not where the extra time is coming from yet.

Oh, I missed this. Hmm, interesting. Try running through perf record?

On reflection, I realised this isn't a fair comparison as 0.4.1 had 8bit genotypes. Here are the numbers with only this branch compared to main:
main: 230s
This branch, no copy: 197s (Lower than previously due to changing from kwargs to args for Variant_decode.)

So this means that we've clawed back to much closer to the 8bit genotypes, which is nice.

@benjeffery benjeffery force-pushed the refactor-python-vargen branch from a691242 to 2815fbe Compare April 4, 2022 14:44
@jeromekelleher
Copy link
Member

Aha, great.

@benjeffery benjeffery force-pushed the refactor-python-vargen branch 7 times, most recently from 9d0cbcb to 5b715ea Compare April 4, 2022 23:59
@benjeffery
Copy link
Member Author

@jeromekelleher I think this is ready to go - the breakage in the test suite was more than I expected, mostly due to removing as_bytes.

@benjeffery benjeffery marked this pull request as ready for review April 5, 2022 00:10
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Some minor comments.

that applying a schema to an existing table will no longer necessitate modifying the
existing rows. (:user:`benjeffery`, :issue:`2064`, :pr:`2104`)

- ``tree.mrca`` now takes 2 or more arguments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related, but Tree.mrca isn't a breaking change it's a new feature so may as well move while we're updating.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- ``tree.mrca`` now takes 2 or more arguments.
(:user:`savitakartik`, :issue:`1340`, :pr:`2121`)

- Remove the previously deprecated ``as_bytes`` argument to ``TreeSequence.variants``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's super old, but it would be good to be quantitative about when it was deprecated (version or date, I guess)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing something like this does make this a major version bump, so I guess we should update our milestones accordingly (either 0.5 or 1.0 I guess)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should give some guidance on what people should do as well I suppose. If you do use as_bytes, how do you fix your code now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from the msprime days! 983d969

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we don't need to worry too much then.

I guess the functionality is actually pretty handy though, so I've opened a new issue #2181. We should update this note to tell users to use this method instead. The as_macs method can then use this function too.

@benjeffery
Copy link
Member Author

@jeromekelleher All fixed up - needs a squash before merge.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

See new issue for the proposed replacement for as_bytes

- ``tree.mrca`` now takes 2 or more arguments.
(:user:`savitakartik`, :issue:`1340`, :pr:`2121`)

- Remove the previously deprecated ``as_bytes`` argument to ``TreeSequence.variants``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we don't need to worry too much then.

I guess the functionality is actually pretty handy though, so I've opened a new issue #2181. We should update this note to tell users to use this method instead. The as_macs method can then use this function too.

@benjeffery benjeffery force-pushed the refactor-python-vargen branch from 6ab2ae8 to b061737 Compare April 5, 2022 13:14
@mergify mergify bot merged commit 6743035 into tskit-dev:main Apr 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants