Skip to content

Conversation

@benjeffery
Copy link
Member

@benjeffery benjeffery commented Oct 22, 2020

Description

Adds indexes (optionally) to the LWT interface. Based on #916 as it needs python access to indexes.
Fixes #870

PR Checklist:

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

@AdminBot-tskit
Copy link
Collaborator

📖 Docs for this PR can be previewed here

@benjeffery benjeffery force-pushed the lwt-index branch 2 times, most recently from bf2772d to d13efbe Compare October 22, 2020 14:22
@benjeffery benjeffery changed the title [WIP] Lwt index Add index to the LWT representation Oct 22, 2020
@benjeffery benjeffery marked this pull request as ready for review October 22, 2020 14:22
@benjeffery
Copy link
Member Author

I recommend merging #916 first then asking mergify to rebase before review.

@codecov
Copy link

codecov bot commented Oct 22, 2020

Codecov Report

Merging #921 into main will decrease coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #921      +/-   ##
==========================================
- Coverage   93.40%   93.40%   -0.01%     
==========================================
  Files          25       25              
  Lines       20335    20357      +22     
  Branches      824      825       +1     
==========================================
+ Hits        18994    19014      +20     
- Misses       1304     1306       +2     
  Partials       37       37              
Impacted Files Coverage Δ
lwt_interface/tskit_lwt_interface.h 94.79% <0.00%> (-0.14%) ⬇️
_tskitmodule.c 90.21% <0.00%> (+<0.01%) ⬆️
tskit/tables.py 99.71% <0.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5626ce8...89295ea. Read the comment docs.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comments.

c/tskit/tables.h Outdated
@endrst
@param self A pointer to a tsk_table_collection_t object.
@param edge_insertion_order Array of tsk_id_t edge ids
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full stop missing here and next

}

/* index */
value = get_table_dict_value(tables_dict, "index", false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best to call this "indexes" just for consistency with the C struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency we should also change the python property on TableCollection to indexes but then that is inconsistent with build_index and has_index which would be breaking changes to move to indexes... we may have to live with the python API being index and the c struct being indexes.

@jeromekelleher
Copy link
Member

Need to do some tests where we add bad indexes from the dict encoding. Things like:

  • Negative/out of bounds edge IDs
  • Nonsensical indexes (np.zeros for both, e.g.). What error occurs here when we try to run ts.trees()?

@benjeffery
Copy link
Member Author

Need to do some tests where we add bad indexes from the dict encoding. Things like:

  • Negative/out of bounds edge IDs
  • Nonsensical indexes (np.zeros for both, e.g.). What error occurs here when we try to run ts.trees()?

As it was nothing happened as indexes were always rebuilt on load_tables I'm adding a build_indexes flag to the python API in #916

@benjeffery benjeffery force-pushed the lwt-index branch 3 times, most recently from 6861686 to 89f158c Compare October 27, 2020 11:40
@benjeffery benjeffery marked this pull request as draft October 27, 2020 11:40
@benjeffery
Copy link
Member Author

Almost there, needs a few more tests so setting to draft.

@benjeffery benjeffery force-pushed the lwt-index branch 2 times, most recently from ae950d1 to c8d6d44 Compare October 27, 2020 13:55
@benjeffery benjeffery marked this pull request as ready for review October 27, 2020 13:59
@benjeffery
Copy link
Member Author

@jeromekelleher I think this is ready to go - I've changed the sematics such that TableCollections without indexes now give None instead of a dict with empty arrays. Previously roundtripping a TableCollection without indexes resulted in adding zero-length indexes, which didn't seem right!

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure None is the right encoding for missing indexes here - see comment above.

[0.1.2] - 2020-10-22
--------------------

- Added optional top-level key ``index`` which has contains ``edge_insertion_order`` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9618a3b

goto out;
}

if (!tsk_table_collection_has_index(self->tables, 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about this. I think it would be logical to return an empty dictionary. That way we're more robust to adding more indexes in the future, where the dictionary may contain the current edge_x indexes, but also maybe any future indexes. So, we might have to deal with the situation where the dictionary contains a variable number of keys - logically then, it makes sense to treat the zero-keys situation in the same way.

Think of tsk_table_collection_has_index(self, 0) as saying "do you have all possible indexes". So, we'd have (roughly)

dict = PyDict_New();
if (tsk_table_collection_has_index(self->tables, 0)) {
    // add key-value pairs for each edge_insertion and edge_removal
}

We can see how this is then extensible to having more indexes in the future - we can just test the existance of the each index on a case-by-case basis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2855dec

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the lwt encoding version is the same as the output of asdict() now? We should probably add a test that these are identical, somewhere.

};

for (j = 0; j < sizeof(table_descs) / sizeof(*table_descs); j++) {
if (strcmp(table_descs[j].name, "indexes") == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do this? I think it'd be better if the dict encoding had the empty dictionary as the output rather than excluding the key. This is what the TableCollection does, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4d81529

@benjeffery benjeffery force-pushed the lwt-index branch 3 times, most recently from c3a2feb to 4d81529 Compare November 2, 2020 14:54
@benjeffery
Copy link
Member Author

@jeromekelleher I've tidied this up and added the test for identical contents in the dicts.

@jeromekelleher
Copy link
Member

LGTM, merge away whenever

@mergify mergify bot merged commit e40dff7 into tskit-dev:main Nov 2, 2020
benjeffery pushed a commit that referenced this pull request Nov 2, 2020
Add index to the LWT representation
@benjeffery benjeffery deleted the lwt-index branch November 10, 2020 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add indexes to the dict encoding

3 participants