Skip to content

Conversation

@petrelharp
Copy link
Contributor

As discussed in #1199, the plan is for

  1. subset to not reorder individuals
  2. canonicalise to sort on (num descendants, first referring node)

WIP.

@petrelharp
Copy link
Contributor Author

This would close #1223.

@petrelharp
Copy link
Contributor Author

petrelharp commented Mar 14, 2021

Well, this is bothersome: I get some VCF-related errors:

____________________________________________________________________________ TestRecordsEqual.test_simple_infinite_sites_random_ploidy ____________________________________________________________________________

tests/test_vcf.py:401: in verify
    pysam_records = list(bcf_file)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OSError: unable to parse next record

pysam/libcbcf.pyx:4108: OSError
---------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------
[E::vcf_parse_format] Number of columns at 1:0 does not match the number of samples (4 vs 5)
============================================================================================= short test summary info =============================================================================================
FAILED tests/test_vcf.py::TestRecordsEqual::test_simple_infinite_sites_random_ploidy - OSError: unable to parse next record

that are apparently triggered by having individual parents - removing these two lines from tsutil.insert_random_ploidy_individuals makes the errors go away.

Edit: oh, I see - it's not actually because of the individual parents, it's just because it changed the randomness, and we happened to get a ploidy-zero individual, i.e., one without any nodes.

@codecov
Copy link

codecov bot commented Mar 14, 2021

Codecov Report

Merging #1226 (f48a9ee) into main (fbba717) will decrease coverage by 0.00%.
The diff coverage is 93.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1226      +/-   ##
==========================================
- Coverage   93.76%   93.76%   -0.01%     
==========================================
  Files          26       26              
  Lines       21798    21907     +109     
  Branches      962      962              
==========================================
+ Hits        20439    20541     +102     
- Misses       1318     1325       +7     
  Partials       41       41              
Flag Coverage Δ
c-tests 92.49% <93.28%> (+<0.01%) ⬆️
lwt-tests 92.97% <ø> (ø)
python-c-tests 94.99% <ø> (+<0.01%) ⬆️
python-tests 98.65% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/tskit/metadata.py 98.60% <ø> (ø)
c/tskit/tables.c 90.93% <93.28%> (+0.03%) ⬆️
python/_tskitmodule.c 91.45% <0.00%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fbba717...f48a9ee. Read the comment docs.

@petrelharp petrelharp marked this pull request as ready for review March 14, 2021 05:54
@petrelharp
Copy link
Contributor Author

I think this is ready for review - @benjeffery?

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for working all this out @petrelharp! Some minor comments.

@petrelharp
Copy link
Contributor Author

Codecov is complaining, but not for a good reason, I think.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it, looks great. A couple of minor clarification points on naming things, but good to merge after squash I think.

c/tskit/tables.c Outdated
static int
tsk_table_sorter_sort_individuals(tsk_table_sorter_t *self)
tsk_individual_table_count_descendants(
tsk_individual_table_t *self, tsk_size_t *num_descendants, tsk_id_t *traversal_order)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nitpicking, but we'd normally put any optional arguments last - you'd never pass in NULL for traversal_order, right?

c/tskit/tables.c Outdated
ret = TSK_ERR_NO_MEMORY;
goto out;
}
memset(new_id_map, 0xff, num_individuals * sizeof(tsk_id_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well use sizeof(*new_id_map)) here for consistency with mallocs above

c/tskit/tables.c Outdated

static int
tsk_table_sorter_sort_individuals(tsk_table_sorter_t *self)
tsk_individual_table_count_descendants(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it's not just counting descendents though. How about tsk_individual_table_topological_sort? This returns a topological sorting of the individuals and optionally a count of their descendents.

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant stuff! Once JKs comments are resolved we can merge and release!

assert tables2.mutations.num_rows == 0

def test_doesnt_reorder_individuals(self):
tables = wf.wf_sim(N=5, ngens=5, num_pops=2, seed=123, record_individuals=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Record individuals is now the default since #1238

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought so, but I didn't see it... oh, I need to rebase.

@petrelharp
Copy link
Contributor Author

There's some kind of bug; I'm tracking it down.

):
individuals.parents[i] = ind_map[individuals.parents[i]]
new_parents[i] = ind_map[individuals.parents[i]]
individuals.parents = new_parents
Copy link
Contributor Author

@petrelharp petrelharp Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doh!!! This was the error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah, that looks like my bad, sorry! I thought this would have been caught by the tests. Also a good argument for allowing assignment in the tables api.

def verify_union_consistency(self, tables, other, node_mapping):
ts1 = tsutil.insert_unique_metadata(tables)
ts2 = tsutil.insert_unique_metadata(other, offset=1000000)
for tsk in [False, True]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking both the tskit union and py_union for consistency; we don't need to do that now that they both pass, I guess - but I did this to find the error in py_union.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great and close #1243

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sweet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To really close that one we should tests mutations and sites too; I've done this and updated.

@petrelharp
Copy link
Contributor Author

I did a much more thorough test for consistency of union; it took longer than expected, but it's a good test. Hopefully everything passes, now?

@benjeffery
Copy link
Member

All looks good, will merge and release tomorrow!

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's merge and ship!

@mergify mergify bot merged commit d911160 into tskit-dev:main Mar 16, 2021
@petrelharp petrelharp deleted the subset_indivs branch March 17, 2021 04:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants