Remove __iter__ protocol from TableCollection #694
I agree that a dict is more natural if we want to return names and tables, and it's trivial to iterate over the dict, especially now that insertion order is guaranteed (as of Python 3.7). I don't think this should affect my code, and if it does, it should be easy to change. In fact, when I iterate over all the tables, I usually don't need the names: when I want to check the type of a table, I just check whether its class is e.g. tskit.ProvenanceTable, which seems less fragile than a string comparison. So I'm wondering whether we should have a method that returns the tables as a list instead and simply drops the table names (but perhaps this is the point of an iterator!)? OTOH it is useful to have a function which returns the names of the tables, and the

Re naming, it does seem weird to have the tree sequence method called "tables_dict" and the tables method called "name_map". I see that we already have an
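To make the dict-versus-list point concrete, here is a minimal sketch using hypothetical ToyTable classes (stand-ins, not tskit's API). A plain dict keeps both names and tables in a fixed order, and an isinstance check on the values does the type filtering without any string comparison.

```python
class ToyTable:
    """Hypothetical stand-in for a table; not tskit's real class."""

    def __init__(self, rows):
        self.rows = list(rows)


class ToyEdgeTable(ToyTable):
    pass


class ToyProvenanceTable(ToyTable):
    pass


# Hypothetical name -> table mapping. Since Python 3.7, dicts preserve
# insertion order, so iteration always visits tables in the order added.
tables = {
    "edges": ToyEdgeTable([(0, 1)]),
    "provenances": ToyProvenanceTable(["simulate"]),
}

# Names are available when they are wanted...
names = list(tables)

# ...but filtering by class only needs the values, not the names.
non_provenance = [
    table for table in tables.values()
    if not isinstance(table, ToyProvenanceTable)
]
```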
Sorry - following my thought above to the natural conclusion. Why not just implement
I started out with them both being called
Well, there's metadata and stuff as well. If we're happy viewing the TableCollection as a dict-like mapping of table names to tables, then yeah, we should definitely just do that. I'm uncertain whether dropping sequence_length and metadata is the correct thing to do, but I'm 90% leaning towards just implementing
What I really meant was that it's simpler to just return the list of tables if you don't actually need the names at all. I guess this ties in with the point below. I agree about dropping the ts form if we do keep a dict-like mapping function, though.
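If we go the dict-like route, Python's collections.abc.Mapping gives a consistent protocol: define __getitem__, __iter__ and __len__, and keys(), values(), items(), get() and membership tests follow. The sketch below (a hypothetical ToyTableCollection, not tskit's class) shows the trade-off raised above: sequence_length and metadata stay ordinary attributes and simply drop out of iteration.

```python
from collections.abc import Mapping


class ToyTableCollection(Mapping):
    """Hypothetical dict-like collection mapping table names to tables.

    sequence_length and metadata are ordinary attributes, so they are not
    part of the mapping and are never visited by iteration.
    """

    def __init__(self, sequence_length, metadata=b""):
        self.sequence_length = sequence_length
        self.metadata = metadata
        # Placeholder "tables": plain lists standing in for real tables.
        self._tables = {"nodes": [], "edges": [], "provenances": []}

    # Mapping requires exactly these three methods; keys(), values(),
    # items(), get(), __contains__ and __eq__ are then derived from them.
    def __getitem__(self, name):
        return self._tables[name]

    def __iter__(self):
        return iter(self._tables)

    def __len__(self):
        return len(self._tables)


tc = ToyTableCollection(sequence_length=100.0)
```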
ISTR I just wanted a way to check equality without checking provenance. It occurs to me that we could do this even if the items returned by
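A provenance-free equality check is easy to express once any name -> table mapping is available. A minimal sketch, assuming plain dicts of placeholder tables and a hypothetical equal_ignoring helper (nothing here is tskit's API):

```python
def equal_ignoring(a, b, skip=("provenances",)):
    """Hypothetical helper: compare two name -> table mappings,
    ignoring the tables whose names are listed in `skip`."""
    if set(a) != set(b):
        return False
    return all(a[name] == b[name] for name in a if name not in skip)


# Placeholder tables: plain lists standing in for real tables.
x = {"edges": [(0, 1)], "provenances": ["run 1"]}
y = {"edges": [(0, 1)], "provenances": ["run 2"]}
```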
Codecov Report
@@ Coverage Diff @@
## master #694 +/- ##
==========================================
- Coverage 87.70% 87.69% -0.01%
==========================================
Files 24 24
Lines 19328 19323 -5
Branches 3618 3618
==========================================
- Hits 16951 16946 -5
Misses 1292 1292
Partials 1085 1085
This is a pretty niche operation, and I don't think we should bias the design of these really high-level operations just to make it a tiny bit more convenient. If we do implement a dictionary or sequence protocol, then we want to be sure that this is generally useful and doesn't lead to nasty gotchas for users. I can't see
I completely agree about not making a special case. It's more to do with whether there's an obvious set of things that a user would expect to iterate over. When I first implemented the

I can't see why you would use
My thoughts on this:
As iteration or length is also ambiguous (did you mean the tables or the attributes?), I would raise an error for both. The error would point to using methods analogous to
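The "error on both" proposal can be sketched as follows. The class is hypothetical, and name_map is borrowed from the naming discussion as an assumed explicit accessor, not a confirmed API. One subtlety: once __len__ raises, bool(tc) would raise too, because Python falls back to __len__ for truth testing, so the sketch defines __bool__ explicitly.

```python
class ToyTableCollection:
    """Hypothetical sketch: iteration and len() are deliberately errors,
    because it is ambiguous whether they refer to the tables or to
    attributes such as sequence_length and metadata."""

    def __init__(self):
        self.sequence_length = 1.0
        self._tables = {"nodes": [], "edges": []}

    @property
    def name_map(self):
        # Assumed explicit alternative: a fresh name -> table dict.
        return dict(self._tables)

    def __iter__(self):
        raise TypeError(
            "TableCollection is not iterable; iterate over .name_map instead"
        )

    def __len__(self):
        raise TypeError(
            "len() is ambiguous for TableCollection; use len(.name_map) instead"
        )

    def __bool__(self):
        # Without this, bool(tc) and `if tc:` would fall back to __len__
        # and raise the TypeError above.
        return True
```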
Yes, I like this. Much clearer. Thanks @benjeffery.
Good call @benjeffery, I'll update the PR with a proposal.
Thanks @jeromekelleher. If we use
I think
Hmm. ISWYM. For my use case (which is, as you say, not a compelling argument), I'm not sure I want the table names, though. What's the use case for wanting the names, rather than (say) having a separate

Or indeed, could we just rely on [edit -
Sorry - an additional (possibly alternative) thought: should the table names be a property of each table, rather than something returned by the table collection directly? Then if we do want something to iterate over the tables, we use
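A minimal sketch of the names-on-tables alternative, using hypothetical NamedTable and table_list names: the collection only hands out tables, and a name -> table dict can be rebuilt on demand when someone does need the names.

```python
class NamedTable:
    """Hypothetical table that carries its own name."""

    def __init__(self, name):
        self.name = name


class ToyTableCollection:
    def __init__(self):
        self.nodes = NamedTable("nodes")
        self.edges = NamedTable("edges")

    @property
    def table_list(self):
        # Hypothetical accessor: just the tables, in a fixed order.
        return [self.nodes, self.edges]


tc = ToyTableCollection()
# Names are recoverable from the tables themselves when needed.
by_name = {table.name: table for table in tc.table_list}
```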
@benjeffery and I decided to merge this one in the interest of getting the 0.3.0 release done soon.
Closes #500
This removes the __iter__ protocol from TableCollection and replaces it with two properties on TableCollection and TreeSequence which do the same thing, more directly. This was done for a couple of reasons:
- Implementing __iter__ without also implementing __getitem__, __len__, etc. isn't great practice.
- We will probably want to implement __getitem__ at some point, and it would probably not be compatible with this definition of __iter__, so we don't want to be constrained by that.

The behaviour was undocumented, so I don't think we're going to break much code with this change.
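The second reason is easiest to see side by side. This sketch assumes, as the thread suggests but does not spell out, that the removed __iter__ yielded (name, table) pairs; both classes are hypothetical. A Mapping's iteration must yield its keys, so the two definitions of iteration cannot coexist.

```python
from collections.abc import Mapping


class PairIterating:
    """Assumed approximation of the removed behaviour: __iter__ yields
    (name, table) pairs, with no __getitem__ or __len__."""

    def __init__(self, tables):
        self._tables = dict(tables)

    def __iter__(self):
        return iter(self._tables.items())


class DictLike(Mapping):
    """A possible future design: as for every Mapping, iteration
    yields the keys (table names), not (name, table) pairs."""

    def __init__(self, tables):
        self._tables = dict(tables)

    def __getitem__(self, name):
        return self._tables[name]

    def __iter__(self):
        return iter(self._tables)

    def __len__(self):
        return len(self._tables)


tables = {"nodes": [], "edges": []}
old_items = list(PairIterating(tables))
new_items = list(DictLike(tables))
```

Code written as `for name, table in tc:` works against the first class but fails against the second: unpacking a key string such as "nodes" into two names raises ValueError.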
The names of the new attributes could probably be improved, so I'm open to ideas there. You could also argue that they should be methods rather than properties; I don't have a strong opinion here.
@hyanwong, I think you're the main user of this - is it going to affect you much?