Arrayset Subsamples #179
Codecov Report
@@ Coverage Diff @@
## master #179 +/- ##
=========================================
- Coverage 95.55% 95.4% -0.15%
=========================================
Files 71 81 +10
Lines 12542 14252 +1710
Branches 1105 1293 +188
=========================================
+ Hits 11984 13596 +1612
- Misses 362 444 +82
- Partials 196 212 +16
This pull request introduces 7 alerts when merging 9a1674f into fe746ab (view on LGTM.com).
Codecov Report
@@ Coverage Diff @@
## master #179 +/- ##
=========================================
+ Coverage 95.55% 96.1% +0.55%
=========================================
Files 71 86 +15
Lines 12542 15252 +2710
Branches 1105 1403 +298
=========================================
+ Hits 11984 14657 +2673
- Misses 362 386 +24
- Partials 196 209 +13
Force-pushed: bdf6b2e to d4225e4
Force-pushed: 677271d to cc9dbb7
setup.py (outdated)
- from os.path import dirname
- from os.path import join
- from os.path import splitext
+ from os.path import basename, join, splitext
Maybe we should use `Path` everywhere?
`distutils` does not accept `Path`-like objects as the source for compiled C extensions (needed for Cython).
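One way to live with that constraint is to keep `Path` for path manipulation and convert to plain strings only at the point where sources are handed to the build machinery. A minimal sketch, assuming a hypothetical helper `extension_sources` and hypothetical file names (neither comes from this PR's `setup.py`):

```python
import os
from pathlib import Path
from typing import Iterable, List


def extension_sources(paths: Iterable[os.PathLike]) -> List[str]:
    """Convert path-like objects to plain strings.

    Hypothetical helper: the returned list is the kind of value that would
    be passed as the ``sources`` argument when declaring a compiled
    C/Cython extension.
    """
    return [os.fspath(p) for p in paths]


# Hypothetical layout, for illustration only.
src_dir = Path('src') / 'pkg'
sources = extension_sources([src_dir / 'module_a.pyx', src_dir / 'module_b.pyx'])
```

This keeps the `str` conversion in one place, so the rest of the build script can still use `Path` objects.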
@@ -64,15 +60,15 @@ class ReaderCheckout(object):
      """

      def __init__(self,
-                  base_path: os.PathLike, labelenv: lmdb.Environment,
+                  base_path: Path, labelenv: lmdb.Environment,
It could be both `Path` and `str`?
No, this is internal to hangar. We expect a `Path` object.
- click.echo(f'Initialized Arrayset: {aset.name}')
+ variable_shape=variable_,
+ contains_subsamples=subsamples_)
+ click.echo(f'Initialized Arrayset: {aset.arrayset}')
IMO `name` was more intuitive than `arrayset`.
Yeah, but when we update the internal naming from `arrayset` -> `columns`, this is going to have to be more descriptive than `name`. I'm actually going to break this `NamedTuple` class into column-type specific `NamedTuple` classes. It's going to be messy to refactor; this will just help in the future (it's a temporary name).
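To illustrate what column-type specific `NamedTuple` classes could look like, here is a rough sketch. Every class name and field below (`FlatColumnSpec`, `NestedColumnSpec`, `column`, `subsample_keys`, `describe`) is hypothetical and chosen only for illustration; the PR does not show the eventual definitions.

```python
from typing import NamedTuple, Tuple, Union


class FlatColumnSpec(NamedTuple):
    """Hypothetical spec for a column holding one array per key."""
    column: str
    variable_shape: bool


class NestedColumnSpec(NamedTuple):
    """Hypothetical spec for a column holding subsamples under each key."""
    column: str
    variable_shape: bool
    subsample_keys: Tuple[str, ...]


ColumnSpec = Union[FlatColumnSpec, NestedColumnSpec]


def describe(spec: ColumnSpec) -> str:
    """Build a message from a descriptive field rather than a generic `name`."""
    kind = 'nested' if isinstance(spec, NestedColumnSpec) else 'flat'
    return f'Initialized {kind} column: {spec.column}'


flat_msg = describe(FlatColumnSpec(column='images', variable_shape=False))
nested_msg = describe(NestedColumnSpec('patches', True, ('a', 'b')))
```

Splitting the spec by column type lets each class carry only the fields that make sense for it, at the cost of callers needing to handle more than one type.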
…ass. no new tests currently written
… level of the accessor chain
…lass no longer work, but only because they haven't been updated yet. There is a lot of room for code deduplication here
…_flat.py and aset_nested.py to arrayset_flat.py and arrayset_nested.py in order to have related file names align with arrayset.py file in editors when listing alphabetically
Added slots and removed some unused code. All tests pass.
…s __slots__ classes, added getstate and setstate to arraysets as well
Force-pushed: 6adbd4e to f843091
Force-pushed: f843091 to d98d7ab
Motivation and Context
Why is this change required? What problem does it solve?:
This is a large PR which started with the motivation of allowing arraysets to contain subsamples under a common key. Though minimal work was needed for the technical implementation (with essentially no changes made to the hangar core record parsing, history traversal, or tensor storage backends), integrating the API into the current model proved difficult and required some major refactoring of what were previously known as the `ArraysetDataReader` and `ArraysetDataWriter` classes.
Description
Describe your changes in detail:
Rather than try to combine every possible API method needed by `flat` and `nested` arrayset access into a Frankenstein's-monster class, each access convention implements its own API class methods (fully independent from one another). The appropriate constructors are selected based on the `contains_subsamples` argument in `init_arrayset()`. The argument is recorded in the schema so the correct type can be identified in subsequent checkouts.
I'm working on putting together a summary of the API. That will follow shortly.
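The selection mechanism described above, where a flag is recorded in the schema and later used to pick a constructor, might look roughly like the following. This is a simplified sketch, not hangar's implementation: `FlatSamples`, `NestedSamples`, `open_arrayset`, the dict-based schema, and the reduced `init_arrayset` signature are all assumptions made for illustration.

```python
from typing import Dict, Union


class FlatSamples:
    """Hypothetical stand-in for flat access: key -> sample."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}

    def __setitem__(self, key: str, value: object) -> None:
        self._data[key] = value

    def __getitem__(self, key: str) -> object:
        return self._data[key]


class NestedSamples:
    """Hypothetical stand-in for nested access: key -> {subkey: subsample}."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, object]] = {}

    def __setitem__(self, key: str, subsamples: Dict[str, object]) -> None:
        # Merge new subsamples into whatever is already stored under `key`.
        self._data.setdefault(key, {}).update(subsamples)

    def __getitem__(self, key: str) -> Dict[str, object]:
        return self._data[key]


Accessor = Union[FlatSamples, NestedSamples]


def open_arrayset(schema: dict, name: str) -> Accessor:
    """Pick the constructor from the recorded flag, as a later checkout would."""
    if schema[name]['contains_subsamples']:
        return NestedSamples()
    return FlatSamples()


def init_arrayset(schema: dict, name: str,
                  contains_subsamples: bool = False) -> Accessor:
    """Record the flag in the (simplified) schema, then build the accessor."""
    schema[name] = {'contains_subsamples': contains_subsamples}
    return open_arrayset(schema, name)


schema: dict = {}
flat = init_arrayset(schema, 'images')
nested = init_arrayset(schema, 'patches', contains_subsamples=True)
nested['sample_1'] = {'a': 1}
nested['sample_1'] = {'b': 2}
```

Because the two classes share no base implementation here, each can expose only the methods that make sense for its access convention, which is the trade-off the description argues for.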
At the moment, about half the tests for the new `nested` sample container are missing, and I need to re-evaluate some implementation details for how backend file handles are dealt with.
Screenshots (if appropriate):
Types of changes
What types of changes does your code introduce? Put an `x` in all the boxes that apply:
Is this PR ready for review, or a work in progress?
How Has This Been Tested?
Put an `x` in the boxes that apply:
Checklist: