Skip to content

Conversation

@benjeffery
Copy link
Member

@jeromekelleher Just trying to get my head round this by adding it to the python simplify first. Is it this simple if we aren't worrying about consistency with the genetic info?

@jeromekelleher
Copy link
Member

Er, yes, actually, I think that basically does it. There'll be some subtleties, but that should be the fundamental change we need, since we're already remapping the individual IDs.

What we really want here to test this stuff out is a way of simulating individuals in the simple forward simulator we have. I think we should be able to add an option record_individuals=False option and add an individual here and it should all work itself out, ish.

That way we can get some decent sized test data without having to hand-craft tree sequences.

@benjeffery benjeffery changed the title Add remap to python simplify Remap individual ids after simplification Feb 3, 2021
@tskit-dev tskit-dev deleted a comment from codecov bot Feb 3, 2021
@codecov
Copy link

codecov bot commented Feb 3, 2021

Codecov Report

Merging #1186 (40c4b80) into main (b68abf6) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1186   +/-   ##
=======================================
  Coverage   93.72%   93.72%           
=======================================
  Files          26       26           
  Lines       21507    21511    +4     
  Branches      904      904           
=======================================
+ Hits        20157    20161    +4     
  Misses       1312     1312           
  Partials       38       38           
Flag Coverage Δ
c-tests 92.50% <100.00%> (+<0.01%) ⬆️
lwt-tests 92.97% <ø> (ø)
python-c-tests 94.91% <ø> (ø)
python-tests 98.63% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
c/tskit/tables.c 90.83% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b68abf6...40c4b80. Read the comment docs.

@benjeffery
Copy link
Member Author

What we really want here to test this stuff out is a way of simulating individuals in the simple forward simulator we have. I think we should be able to add an option record_individuals=False option and add an individual here and it should all work itself out, ish.

I've had a stab at that in 0ad085b adding the same level of testing that migrations have - which seems to just be counting the rows.

@jeromekelleher
Copy link
Member

I've had a stab at that in 0ad085b adding the same level of testing that migrations have - which seems to just be counting the rows.

Nice! I'd need to take the code for a spin to be sure I know what's happening, but can't right now (should have started this presentation yesterday...)

@benjeffery
Copy link
Member Author

I've had a stab at that in 0ad085b adding the same level of testing that migrations have - which seems to just be counting the rows.

Nice! I'd need to take the code for a spin to be sure I know what's happening, but can't right now (should have started this presentation yesterday...)

No worries - I'm working on the c side now.

@jeromekelleher
Copy link
Member

Working on it now - will tack in a commit with some updates to the Python code.

@jeromekelleher
Copy link
Member

@benjeffery I pushed a commit with some updates. There was a slight logic bug in the individual ID remapping and I fixed up the tests a bit. The test refactoring could probably be done better, but it's a big improvement over what it was anyway.

@jeromekelleher
Copy link
Member

So the tests to be done are now to see if the identity relationships between nodes and individuals are retained through simplification. The easiest way to do that I think is to bung in some metadata into both, simplify, and then see if every node's individual has the same metadata as the node. Then, for those parents that happen to survive simplification (which probably won't be many, most will be associated with unary nodes), the parent relationships should be the same.

For filter_individuals=False, we should just keep the whole pedigree in place.

@jeromekelleher
Copy link
Member

@hyanwong, this addition to the simulator will make your life easier in #1154.

@benjeffery benjeffery force-pushed the simplify_individuals branch from 14ae365 to e78192c Compare February 5, 2021 14:08
@petrelharp
Copy link
Contributor

This is going to be so much easier to test now that we have metadata!

@benjeffery
Copy link
Member Author

I've added some tests using the forward sim.

@benjeffery
Copy link
Member Author

Ok, there is one last thing to decide here - in #1192 @jeromekelleher suggested simplify should cope with unordered individuals. The current re-referencing code in this PR only works with sorted individuals as it works in the same pass building the new table. It's no biggie to add another pass over the individual table though.

@jeromekelleher
Copy link
Member

@petrelharp, your call here. A second linear pass through the individuals is no biggie IMO, and probably worth not enforcing a full topological sort on the input data for.

@petrelharp
Copy link
Contributor

Sounds good - let's do the second pass. SLiM currently might have individuals before their parents (because the 'remembered' ones come up top of the table, so if a child is remembered and its parent is not...), and we could change that, but it might actually be a can of worms (because (a) if a child gets remembered before a parent, they'd be out of order; or (b) when we output, we stick the current generation on to the individual table, with the remembered indiviiduals up top; to enforce the order in these two situations we'd have to do the topological sort...).

@benjeffery benjeffery force-pushed the simplify_individuals branch 2 times, most recently from fdaa464 to 307226e Compare February 17, 2021 01:15
@benjeffery
Copy link
Member Author

@jeromekelleher I've added tests with shuffled individuals (that triggered tsk_bug_assert), then modified the C to cope. Should be ready to go after a squash.

@benjeffery benjeffery marked this pull request as ready for review February 17, 2021 01:17
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor suggestions to reduce the amount of code.

@benjeffery benjeffery force-pushed the simplify_individuals branch 2 times, most recently from 8ac388f to 9e62d17 Compare February 18, 2021 00:31
@benjeffery
Copy link
Member Author

@jeromekelleher Comments addressed, thanks for pointing out the existing shuffle method, I should have had a look for one!

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mergify mergify bot merged commit 028997b into tskit-dev:main Feb 18, 2021
@benjeffery benjeffery deleted the simplify_individuals branch March 14, 2021 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants