Releases: tskit-dev/tskit

Python 0.5.6

10 Oct 10:55

Breaking Changes

  • tskit now requires Python 3.8 or later, as Python 3.7 reached end-of-life on 2023-06-27

Bugfixes

  • Fix incompatibility with jsonschema>4.18.6 which caused
    AttributeError: module jsonschema has no attribute _validators
    (@benjeffery, #2844, #2840)

Python 0.5.5

17 May 20:09
fd72573

Performance improvements

  • Methods like ts.at() which seek to a specified position on the sequence from
    a new Tree instance are now much faster (@molpopgen, #2661).

Features

  • Add __repr__ for variants to return a string representation of the raw data
    without spewing megabytes of text (@chriscrsmith, #2695, #2694)

  • Add keep_rows method to table classes to support efficient in-place
    table subsetting (@jeromekelleher, #2700)

Bugfixes

  • Fix UnicodeDecodeError when calling Variant.alleles on the emscripten platform.
    (@benjeffery, #2754, #2737)

C API C_1.1.2

17 May 19:47
fd72573

Performance improvements

  • tsk_tree_seek is now much faster at seeking to arbitrary points along
    the sequence from the null tree (@molpopgen, #2661).

Features

  • The struct tsk_treeseq_t now has the variables min_time and max_time,
    which are the minimum and maximum among the node times and mutation times,
    respectively. min_time and max_time can be accessed using the functions
    tsk_treeseq_get_min_time and tsk_treeseq_get_max_time, respectively.
    (@szhan, #2612, #2271)

  • Add the TSK_SIMPLIFY_NO_FILTER_NODES option to simplify to allow unreferenced
    nodes to be kept in the output (@jeromekelleher, @hyanwong,
    #2606, #2619).

  • Add the TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS option to simplify, which ensures
    that no node sample flags are changed, so that calling code can manage sample status.
    (@jeromekelleher, #2662, #2663).

  • Guarantee that unfiltered tables are not written to unnecessarily
    during simplify (@jeromekelleher #2619).

  • Add x_table_keep_rows methods to provide efficient in-place table subsetting
    (@jeromekelleher, #2700).

  • Add tsk_tree_seek_index function

Python 0.5.4

13 Jan 20:29

Features

  • A new Tree.is_root method avoids the need to search the potentially
    large list of Tree.roots (@hyanwong, #2669, #2620)

  • The TreeSequence object now has the attributes min_time and max_time,
    which are the minimum and maximum among the node times and mutation times,
    respectively. (@szhan, #2612, #2271)

  • The draw_svg methods now have a max_num_trees parameter to truncate
    the total number of trees shown, giving a readable display for tree
    sequences with many trees (@hyanwong, #2652)

  • The draw_svg methods now accept a canvas_size parameter to allow
    extra room on the canvas e.g. for long labels or repositioned graphical
    elements (@hyanwong, #2646, #2645)

  • The Tree object now has the method siblings to get
    the siblings of a node. It returns an empty tuple if the node
    has no siblings, is not a node in the tree, is the virtual root,
    or is an isolated non-sample node.
    (@szhan, #2618, #2616)

  • The msprime.RateMap class has been ported into tskit: functionality should
    be identical to the version in msprime, apart from minor changes in the formatting
    of tabular text output (@hyanwong, @jeromekelleher, #2678)

  • Tskit now supports and has wheels for Python 3.11. This Python version has
    a significant performance boost. (@benjeffery, #2624, #2248)

Breaking Changes

  • The filter_populations, filter_individuals, and filter_sites
    parameters to simplify previously defaulted to True but now default
    to None, which is treated as True. Previously, passing None
    would result in an error. (@hyanwong, #2609, #2608)
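
The "None is treated as True" convention can be sketched in plain Python. The helper below is hypothetical and is not tskit's implementation; it only shows how a tri-state option of this kind is typically resolved.

```python
def resolve_filter_flag(value=None):
    """Hypothetical helper: map a None/True/False option to a bool."""
    if value is None:
        # An unset option falls back to the historical default.
        return True
    return bool(value)


print(resolve_filter_flag())       # unset
print(resolve_filter_flag(None))   # explicit None behaves like unset
print(resolve_filter_flag(False))  # explicit opt-out
```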

Python 0.5.3

03 Oct 18:50
1919abe

Fixes

  • The Variant object can now be initialized with 64 bit numpy ints as
    returned e.g. from np.where (@hyanwong, #2518, #2514)

  • Fix tree.mrca for the case of a tree with multiple roots.
    (@benjeffery, #2533, #2521)

Features

  • The ts.nodes method now takes an order parameter so that nodes
    can be visited in time order (@hyanwong, #2471, #2370)

  • Add samples argument to TreeSequence.genotype_matrix.
    Default is None, where all the sample nodes are selected.
    (@szhan, #2493, #678)

  • ts.draw and the draw_svg methods now have an optional omit_sites
    parameter, aiding drawing large trees with many sites and mutations
    (@hyanwong, #2519, #2516)

Breaking Changes

  • Single statistics computed with TreeSequence.general_stat are now
    returned as numpy scalars if windows=None and either samples is a single
    list or None (for a 1-way stat), or indexes is None or a single list of
    length k (instead of a list of length-k lists).
    (@gtsambos, #2417, #2308)

  • Accessor methods such as ts.edge(n) and ts.node(n) now allow negative
    indexes (@hyanwong, #2478, #1008)

  • ts.subset() produces valid tree sequences even if nodes are shuffled
    out of time order (@hyanwong, #2479, #2473), and the
    same for tables.subset() (@hyanwong, #2489). This involves
    sorting the returned tables, potentially changing the returned edge order.

Performance improvements

  • TreeSequence.link_ancestors no longer continues to process edges once all
    of the sample and ancestral nodes have been accounted for, reducing memory
    overhead and improving overall performance
    (@gtsambos, #2456, #2442)

Python 0.5.2

29 Jul 18:27

Performance improvements

  • TreeSequence.site position search performance greatly improved, with much lower
    memory overhead (@jeromekelleher, #2424).

  • TreeSequence.samples time/population search performance greatly improved, with
    much lower memory overhead (@jeromekelleher, #2424, #1916).

  • The timeasc and timedesc orders for Tree.nodes have much
    improved performance and lower memory overhead
    (@jeromekelleher, #2424, #2423).

Features

  • Variant objects now have a .num_missing attribute and .counts() and
    .frequencies() methods (@hyanwong, #2390, #2393).

  • Add the Tree.num_lineages(t) method to return the number of lineages present
    at time t in the tree (@jeromekelleher, #386, #2422)

  • Efficient array access to table data now provided via attributes like
    TreeSequence.nodes_time, etc (@jeromekelleher, #2424).

Breaking Changes

  • Previously, accessing (e.g.) tables.edges returned a different instance of
    EdgeTable each time. This has been changed to return the same instance
    for the lifetime of a given TableCollection instance. This is technically
    a breaking change, although it's difficult to see how code would depend
    on the property that (e.g.) tables.edges is not tables.edges.
    (@jeromekelleher, #2441, #2080).

C API C_1.1.1

29 Jul 18:13

Bug fixes

  • Fix segfault in tsk_variant_restricted_copy in tree sequences with large
    numbers of alleles or very long alleles
    (@jeromekelleher, #2437, #2429).

Python 0.5.1

14 Jul 12:08
f41eddc

Fixes

  • Copies of a Variant object would cause a segfault when .samples was accessed.
    (@benjeffery, #2400, #2401)

Changes

  • Tables in a table collection can be replaced using the replace_with method
    (@hyanwong, #1489, #2389)

  • SVG drawing routines now return a special string object that is automatically
    rendered in a Jupyter notebook (@hyanwong, #2377)

Features

  • New Site.alleles() method (@hyanwong, #2380, #2385)

  • The variants(), haplotypes() and alignments() methods can now
    take a list of sample ids and a left and right position, to restrict the
    size of the output (@hyanwong, #2092, #2397)

C API 1.1.0

14 Jul 12:07
f41eddc

Features

  • Add num_children to tsk_tree_t, an array which contains the number of child
    nodes of each node in the tree. (@GertjanBisschop, #2274, #2316)

  • Add edge to tsk_tree_t, an array which contains, for each (child) node in the tree,
    the edge_id of the edge encoding the relationship between that node and its parent.
    (@GertjanBisschop, #2304, #2340)

Changes

  • Reduce the maximum number of rows in a table by 1. This removes edge cases so that a tsk_id_t can be
    used to count the number of rows. (@benjeffery, #2336, #2337)

  • Samples are now copied by tsk_variant_restricted_copy. (@benjeffery, #2400, #2401)

Python 0.5.0

22 Jun 15:05

Major Feature Release

Breaking Changes

  • The JSON metadata codec now interprets the empty string as an empty object. This means
    that applying a schema to an existing table will no longer necessitate modifying the
    existing rows. (@benjeffery, #2064, #2104)

  • Remove the previously deprecated as_bytes argument to TreeSequence.variants.
    If you need genotypes in byte form this can be done following the code in the
    to_macs method on line 5573 of trees.py.
    This argument was initially deprecated more than 3 years ago when the code was part of
    msprime.
    (@benjeffery, #605, #2172)

  • Arguments after ploidy in write_vcf are now keyword-only
    (@jeromekelleher, #2329, #2315).

  • When metadata equal to b'' is printed to text or HTML tables it will render as
    an empty string rather than "b''". (@hyanwong, #2349, #2351)

Changes

  • A min_time parameter in draw_svg allows the time of the youngest node to be
    used as the minimum value on the y axis, so that negative times can be displayed.
    (@hyanwong, #2197, #2215)

  • VcfWriter.write now prints the site ID of variants in the ID field of the
    output VCF files.
    (@roohy, #2103, #2107)

  • Make dumping of tables and tree sequences to disk a zero-copy operation.
    (@benjeffery, #2111, #2124)

  • Add copy argument to TreeSequence.variants which if False reuses the
    returned Variant object for improved performance. Defaults to True.
    (@benjeffery, #605, #2172)

  • tree.mrca now takes 2 or more arguments and gives the common ancestor of them all.
    (@savitakartik, #1340, #2121)

  • Add an edge attribute to the Mutation class that gives the ID of the
    edge that the mutation falls on.
    (@jeromekelleher, #685, #2279).

  • Add the TreeSequence.split_edges operation which inserts nodes into
    edges at a specific time.
    (@jeromekelleher, #2276, #2296).

  • Add the TreeSequence.decapitate (and closely related
    TableCollection.delete_older) operation to remove topology and mutations
    older than a given time.
    (@jeromekelleher, #2236, #2302, #2331).

  • Add the TreeSequence.individuals_time and TreeSequence.individuals_population
    methods to return arrays of per-individual times and populations, respectively.
    (@petrelharp, #1481, #2298).

  • Add the sample_mask and site_mask arguments to write_vcf to allow parts
    of an output VCF to be omitted or marked as missing data. Also add the
    as_vcf convenience function, which returns the VCF as a string.
    (@jeromekelleher, #2300).

  • Add support for missing data to write_vcf, and add the isolated_as_missing
    argument. (@jeromekelleher, #2329, #447).

  • Add Tree.num_children_array and Tree.num_children. These return the number
    of child nodes for every node or for a single node in the tree, respectively.
    (@GertjanBisschop, #2318, #2319, #2332)

  • Add Tree.path_length.
    (@jeremyguez, #2249, #2259).

  • Add B1 tree balance index.
    (@jeremyguez, @jeromekelleher, #2251, #2281, #2346).

  • Add B2 tree balance index.
    (@jeremyguez, @jeromekelleher, #2252, #2353, #2354).

  • Add Sackin tree imbalance index.
    (@jeremyguez, @jeromekelleher, #2246, #2258).

  • Add Colless tree imbalance index.
    (@jeremyguez, @jeromekelleher, #2250, #2266, #2344).

  • Add direction argument to TreeSequence.edge_diffs, allowing iteration
    over diffs in the reverse direction. NOTE: this comes with a ~10% performance
    regression as the implementation was moved from C to Python for simplicity
    and maintainability. Please open an issue if this affects your application.
    (@jeromekelleher, @benjeffery, #2120).

  • Add Tree.edge_array and Tree.edge. Returns the edge id of the edge encoding
    the relationship of each node with its parent.
    (@GertjanBisschop, #2361, #2357)
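
For readers unfamiliar with the imbalance indices listed above, the following pure-Python sketch computes the Sackin and Colless indices under their usual textbook definitions: Sackin as the sum of leaf depths, and Colless as the sum over binary internal nodes of the absolute difference in leaf counts between the two child subtrees. It does not use tskit and is not the library's implementation; the children dictionaries and function names are hypothetical.

```python
def leaf_count(children, node):
    """Number of leaves in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(leaf_count(children, k) for k in kids)


def sackin_index(children, root):
    """Sum of the depths (number of ancestors) of all leaves."""
    total = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        if not kids:
            total += depth
        for k in kids:
            stack.append((k, depth + 1))
    return total


def colless_index(children, root):
    """Sum of |leaves(left) - leaves(right)| over binary internal nodes."""
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if len(kids) == 2:
            a, b = kids
            total += abs(leaf_count(children, a) - leaf_count(children, b))
        stack.extend(kids)
    return total


# Two 4-leaf trees given as parent -> list of children (made-up node IDs).
balanced = {6: [4, 5], 4: [0, 1], 5: [2, 3]}
caterpillar = {6: [5, 3], 5: [4, 2], 4: [0, 1]}

print(sackin_index(balanced, 6), colless_index(balanced, 6))
print(sackin_index(caterpillar, 6), colless_index(caterpillar, 6))
```

Both indices grow as the tree becomes less balanced: the fully balanced tree scores lower than the caterpillar tree on each.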