@jeromekelleher
Member

@jeromekelleher jeromekelleher commented Sep 29, 2020

It's pretty useful to have a summary of the number of bytes used by a table collection. This uses a definition based on the total bytes used to encode the data, which seems like as good a definition as any. Trying to capture the actual memory used seems like a fiddly waste of time.

However, this currently doesn't work because of a few limitations:

Closes #54

@jeromekelleher jeromekelleher added this to the Python 0.3.3 milestone Sep 29, 2020
@jeromekelleher jeromekelleher marked this pull request as draft September 29, 2020 17:42
@benjeffery
Member

benjeffery commented Nov 2, 2020

_repr_html_ needs nbytes when this is all working.

@benjeffery
Member

@jeromekelleher Are you happy for me to pick this up and push commits to this PR?

@jeromekelleher
Member Author

Yes please @benjeffery, that would be great, thanks. I think we can define the nbytes as "the total number of bytes required for the values in the dict encoding". I don't think there's any point in including the keys. Therefore, the nbytes for a TreeSequence is the same as its underlying TableCollection.
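A minimal sketch of that definition, assuming a simplified dict encoding whose values are either nested dicts or buffer-like objects. The helper name `nbytes_of_encoding` and the example layout are hypothetical illustrations, not tskit's actual implementation:

```python
from array import array


def nbytes_of_encoding(encoding):
    """Sum the bytes of every value in a dict encoding, ignoring the keys."""
    total = 0
    for value in encoding.values():
        if isinstance(value, dict):
            # Nested dict (e.g. one table inside a collection): recurse.
            total += nbytes_of_encoding(value)
        else:
            # Any buffer-like value (bytes, array.array, ...) reports its size.
            total += memoryview(value).nbytes
    return total


# Hypothetical, simplified encoding of a two-row node table.
node_table = {
    "time": array("d", [0.0, 1.0]),  # 2 doubles, 8 bytes each
    "flags": array("b", [1, 0]),     # 2 signed chars, 1 byte each
    "metadata": b"abc",              # 3 raw bytes
}

# Hypothetical collection: one nested table plus a top-level value.
tables = {
    "nodes": node_table,
    "sequence_length": array("d", [10.0]),
}

print(nbytes_of_encoding(node_table))
print(nbytes_of_encoding(tables))
```

Because only the values are counted, any object that wraps the same encoding would report the same total under this definition.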

@AdminBot-tskit
Collaborator

📖 Docs for this PR can be previewed here

@codecov

codecov bot commented Nov 12, 2020

Codecov Report

Merging #871 (43785e7) into main (667235e) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #871   +/-   ##
=======================================
  Coverage   93.64%   93.65%           
=======================================
  Files          26       26           
  Lines       20680    20696   +16     
  Branches      835      838    +3     
=======================================
+ Hits        19366    19382   +16     
  Misses       1277     1277           
  Partials       37       37           
| Flag | Coverage Δ |
|---|---|
| c-tests | 92.45% <ø> (ø) |
| lwt-tests | 93.57% <ø> (ø) |
| python-c-tests | 94.83% <100.00%> (+<0.01%) ⬆️ |
| python-tests | 98.56% <100.00%> (+<0.01%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| python/tskit/util.py | 100.00% <ø> (ø) |
| python/tskit/tables.py | 99.58% <100.00%> (+<0.01%) ⬆️ |
| python/tskit/trees.py | 97.35% <100.00%> (+<0.01%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 667235e...43785e7.

@benjeffery
Member

benjeffery commented Nov 12, 2020

@jeromekelleher Think I'm happy with this now. It has a few test cleanups included that I came across. Still needs a squash.

@benjeffery benjeffery marked this pull request as ready for review November 12, 2020 15:51
Member

@benjeffery benjeffery left a comment

Approving as it is your PR and mergify needs a non-owner approval.

Member Author

@jeromekelleher jeromekelleher left a comment

Nice! See comment above about pytest patterns, though


- Added ``nbytes`` method to tables, ``TableCollection`` and ``TreeSequence`` which
reports the size in bytes of those objects.
(:user:`jeromekelleher`, :issue:`54`, :pr:`871`)
Member Author

I think you should add your name here too

  left, right = st1.get_interval()
  breakpoints.append(right)
- self.assertAlmostEqual(left, length)
+ assert left == approx(length)
Member Author

I had a good look at this, and I think it'd be better if we used the pattern x == pytest.approx(y). We're already using the pytest. prefix for a bunch of other things, and I don't think there's a good reason for making an exception here. It's also more obvious to me what it's actually doing when I see x == pytest.approx(y) in isolation (i.e., if I was coming in fresh to this code without reading the imports).
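A small sketch of the suggested pattern, assuming pytest is installed. The helper `check_interval_start` and its values are made up for illustration and are not taken from the tskit test suite:

```python
import pytest


def check_interval_start(left, length):
    # The qualified name makes it clear at the call site that this is a
    # tolerance-based comparison, without needing to read the imports.
    assert left == pytest.approx(length)


# 0.1 + 0.2 is not exactly 0.3 in floating point, but it is within the
# default relative tolerance used by pytest.approx.
check_interval_start(0.1 + 0.2, 0.3)
```

The bare `approx(...)` spelling behaves identically; the difference is only in how readable the call site is.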

Member

Yep, looks like something local. Fixed.

@benjeffery
Member

Fixed up, rebased and squashed.

@mergify mergify bot merged commit 0c18d7b into tskit-dev:main Nov 13, 2020
Development

Successfully merging this pull request may close these issues.

Add total_bytes attribute to Table objects

3 participants