Skip to content

Conversation

@petrelharp
Copy link
Contributor

Will close #1120; see discussion there; just getting things started.

In particular, I like the idea of having keep_unary=(True|False|"individuals"), though - seems appropriate, given that keep_unary_individuals should give a strict subset of keep_unary, and extensible. Were I to do this, I'd keep the low-level interface the same, and just do the translation in the python TableCollection.simplify code.

@petrelharp petrelharp marked this pull request as draft January 29, 2021 23:47
@jeromekelleher
Copy link
Member

Yeah, proposal seems good to me @petrelharp. We can see how keep_unary=whatever can be made more complex over time. The only potential headache is that people might have been using Truthy values before, but if we do something like

keep_unary_flag = False
keep_unary_in_individuals = False
if keep_unary == "individuals":
    keep_unary_in_individuals = True
else:
    keep_unary_flag = keep_unary

That should be safe enough?

@petrelharp
Copy link
Contributor Author

Sounds good. I think I'm handing this over to @hyanwong though; LMK if you'd like me to have a go at something (it'll be later in the week, though).

@hyanwong
Copy link
Member

hyanwong commented Feb 1, 2021

I can easily implement this calling syntax, and the simplify fix. It's the forward time simulations for the testing that I'm dreading. FWIW, my test code in SLiM simply slices through a Wright-Fisher TS at a fixed time, allocating individuals to all the nodes, then, post-simplify(keep_unary_in_individuals), I can check that each lineage at that time has a node:

    def num_lineages_at_time(self, ts, focal_time, pos):
        edges = ts.tables.edges
        times = ts.tables.nodes.time
        return np.sum(np.logical_and.reduce((
            times[edges.parent] > focal_time,
            times[edges.child] <= focal_time,
            edges.left <= pos,
            edges.right > pos,
        )))

    for tree in ts.trees():
        assert self.num_lineages_at_time(ts, focal_time, tree.interval.left) == len([
            n for n in tree.nodes()
            if ts.node(n).time==focal_time)
        ])

@hyanwong
Copy link
Member

hyanwong commented Feb 1, 2021

As I mentioned in #1120 I think the main extra extension would be to allow a bit flag specifying the nodes with matching flags to keep if unary. I'm not sure how this would be incorporated into the scheme above that @petrelharp suggests, although I'm sure it would be possible somehow.

@petrelharp
Copy link
Contributor Author

Well, let's see. How about

  1. modify tsutil.insert_individuals to also allow inserting of random individuals for non-sample nodes
  2. insert the originall node IDs into metadata, to make it easy to tell which nodes have been kept
  3. write a verify_simplify_keeps_unary vaguely along the lines of test_wright_fisher.verify_simplify, that picks a few trees, and iterates up the branches of each from each of the samples, checking that nodes are retained iff they are coalescences or have individuals

I can't jump in on this at the moment (but have time to write this up while my MCMCs are running).

@hyanwong
Copy link
Member

hyanwong commented Feb 2, 2021

We can see how keep_unary=whatever can be made more complex over time ... do something like

keep_unary_flag = False
keep_unary_in_individuals = False
if keep_unary == "individuals":
    keep_unary_in_individuals = True
else:
    keep_unary_flag = keep_unary

Should we have some string values that are the equivalent of True and False, so that people can move to using strings. For example, the string values could be "all", "in_individuals", "". I guess we would also want the default to be None, although it would do the same as False?

@hyanwong
Copy link
Member

hyanwong commented Feb 2, 2021

Also, I assume I should add this in to the python implementation of simplify in tests/simplify.py?

@hyanwong
Copy link
Member

hyanwong commented Feb 2, 2021

And I just remembered another options we might want for unary nodes (this is, I think, what we want in tsinfer), which is to keep unary nodes only if they are not unary somewhere else in the tree sequence. In other works, delete unary nodes in the ancestry only if they are unary everywhere.

@hyanwong
Copy link
Member

Closed in favour of #1190

@hyanwong hyanwong closed this Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose keep_unary_in_individuals functionality to Python

3 participants