Add "insert sites" method to TableCollection (and possibly TS) #675

@hyanwong

It might be useful to have an insert_sites method which adds new (empty) sites at user-specified positions and updates the mutations table so that existing mutations still point at the right sites. I can imagine this being a useful prerequisite step for e.g. combining two tree sequences (or maybe VCF files) that have accumulated mutations at sites unique to each, or adding in new sites that have just been reported in an updated sequencing effort. Other existing functions such as map_mutations could then be used to add mutations at any specific (currently empty) site.

It would be useful to sort the sites and return the new IDs of the inserted sites, in the same order that the positions were passed in. I imagine it looking something like this:

import numpy as np
import tskit

def insert_sites(self, positions, ancestral_states, metadata=None):
    # Sketch of a TableCollection method: ancestral_states is a list of str,
    # metadata (if given) is a list of bytes
    if metadata is None:
        self.sites.append_columns(positions, *tskit.pack_strings(ancestral_states))
    else:
        self.sites.append_columns(
            positions,
            *tskit.pack_strings(ancestral_states),
            *tskit.pack_bytes(metadata))

    # Sorting orders sites by position and remaps the site column of the mutations table
    self.sort()
    # Positions are required to be unique, so we can use these to identify the new ones
    new_locations = np.searchsorted(self.sites.position, positions)
    assert np.all(self.sites.position[new_locations] == positions)
    return new_locations
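
To make the ID bookkeeping concrete, here is a rough NumPy-only sketch of what the sort has to achieve. The helper merge_site_positions and its arguments are hypothetical (nothing here is tskit API): it takes the existing site positions, the new positions, and the site column of the mutations table, and returns the merged positions, the new IDs of the inserted sites in the order they were passed in, and the remapped mutation site IDs.

```python
import numpy as np


def merge_site_positions(old_positions, new_positions, mutation_site):
    """
    Hypothetical helper (not part of tskit): merge new site positions into
    the existing ones and work out the resulting site IDs.
    """
    old_positions = np.asarray(old_positions, dtype=np.float64)
    new_positions = np.asarray(new_positions, dtype=np.float64)
    mutation_site = np.asarray(mutation_site, dtype=np.int64)

    # Site positions must be unique, both among the new sites and overall
    if len(np.unique(new_positions)) != len(new_positions):
        raise ValueError("new positions must be unique")
    if np.any(np.isin(new_positions, old_positions)):
        raise ValueError("a site already exists at one of the new positions")

    # Append the new sites, then sort everything by position
    combined = np.concatenate([old_positions, new_positions])
    order = np.argsort(combined, kind="stable")
    merged = combined[order]

    # id_map[i] is the post-sort ID of the row that was at index i pre-sort
    id_map = np.empty(len(combined), dtype=np.int64)
    id_map[order] = np.arange(len(combined))

    remapped_mutation_site = id_map[mutation_site]
    new_ids = id_map[len(old_positions):]
    return merged, new_ids, remapped_mutation_site


merged, new_ids, mut_site = merge_site_positions(
    old_positions=[10.0, 50.0],
    new_positions=[70.0, 20.0],
    mutation_site=[0, 1, 1],
)
print(merged)    # merged site positions, sorted
print(new_ids)   # IDs of the inserted sites, in the order passed in
print(mut_site)  # mutation site IDs after remapping
```

With existing sites at 10 and 50 and new sites at 70 and 20, the inserted sites end up with IDs 3 and 1, and the mutations that pointed at the site at position 50 (old ID 1) now point at ID 2. Raising on duplicate positions is just one choice for the sketch; whether a real method should error or deduplicate is open.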

Labels: C API, Python API, enhancement, future