Releases: tskit-dev/tskit
Python 0.5.6
Breaking Changes
- tskit now requires Python 3.8, as Python 3.7 became end-of-life on 2023-06-27
Features
- Tree.tmrca now accepts >2 nodes and returns nicer errors
(@hyanwong, #2808, #2801, #2070, #2611)
- Add TreeSequence.genetic_relatedness_weighted stats method.
(@petrelharp, @brieuclehmann, @jeromekelleher, #2785, #1246)
- Add TreeSequence.impute_unknown_mutations_time method to return an
array of mutation times based on the times of associated nodes
(@duncanMR, #2760, #2758)
- Add asdict to all dataclasses. These are returned when you access a row or
other tree sequence object. (@benjeffery, #2759, #2719)
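To illustrate what a TMRCA over more than two nodes means, here is a minimal plain-Python sketch. It does not call tskit: the toy tree, the `parent` and `time` dictionaries and the helper names are all assumptions made for this example, and tskit's own implementation will differ.

```python
from functools import reduce

# Toy tree as child -> parent links (hypothetical data; -1 marks "no parent").
parent = {0: 4, 1: 4, 2: 5, 3: 6, 4: 5, 5: 6, 6: -1}
time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.5, 6: 4.0}


def ancestors(u):
    """Return the path from u up to the root, u included."""
    path = []
    while u != -1:
        path.append(u)
        u = parent[u]
    return path


def mrca(u, v):
    """Most recent common ancestor of two nodes."""
    seen = set(ancestors(u))
    for node in ancestors(v):
        if node in seen:
            return node
    raise ValueError(f"nodes {u} and {v} share no ancestor")


def tmrca(*nodes):
    """Time of the most recent common ancestor of two or more nodes."""
    if len(nodes) < 2:
        raise ValueError("tmrca needs at least two nodes")
    # Folding the pairwise MRCA over the nodes gives the MRCA of them all.
    return time[reduce(mrca, nodes)]


print(tmrca(0, 1))     # 1.0
print(tmrca(0, 1, 2))  # 2.5
print(tmrca(0, 2, 3))  # 4.0
```

The fold works because the MRCA of a set of nodes is the MRCA of any one node with the MRCA of the rest.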
Bugfixes
- Fix incompatibility with jsonschema>4.18.6 which caused
AttributeError: module jsonschema has no attribute _validators
(@benjeffery, #2844, #2840)
Python 0.5.5
Performance improvements
- Methods like ts.at() which seek to a specified position on the sequence from
a new Tree instance are now much faster (@molpopgen, #2661).
Features
- Add __repr__ for variants to return a string representation of the raw data
without spewing megabytes of text (@chriscrsmith, #2695, #2694)
- Add keep_rows method to table classes to support efficient in-place
table subsetting (@jeromekelleher, #2700)
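The idea behind in-place row subsetting can be sketched with plain Python lists standing in for table columns. This is not tskit code: the `keep_rows` function below, its column dictionary, and the returned old-to-new ID map are assumptions of this sketch, so consult the tskit documentation for the real method's signature and return value.

```python
def keep_rows(columns, keep):
    """Filter every column in place.

    Returns a list mapping each old row ID to its new row ID,
    or -1 if the row was dropped.
    """
    num_rows = len(keep)
    if any(len(col) != num_rows for col in columns.values()):
        raise ValueError("keep must have one entry per row")
    id_map = []
    next_id = 0
    for flag in keep:
        if flag:
            id_map.append(next_id)
            next_id += 1
        else:
            id_map.append(-1)
    for col in columns.values():
        # Slice assignment mutates the existing list rather than rebinding it.
        col[:] = [value for value, flag in zip(col, keep) if flag]
    return id_map


# Hypothetical node table with two columns.
nodes = {"time": [0.0, 0.0, 1.0, 2.0], "flags": [1, 1, 0, 0]}
time_column = nodes["time"]
id_map = keep_rows(nodes, [True, False, True, True])
print(nodes["time"])  # [0.0, 1.0, 2.0]
print(id_map)         # [0, -1, 1, 2]
```

Returning the ID map lets calling code remap references held in other tables to the surviving rows.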
Bugfixes
- Fix UnicodeDecodeError when calling Variant.alleles on the emscripten
platform. (@benjeffery, #2754, #2737)
C API C_1.1.2
Performance improvements
- tsk_tree_seek is now much faster at seeking to arbitrary points along
the sequence from the null tree (@molpopgen, #2661).
Features
- The struct tsk_treeseq_t now has the variables min_time and max_time,
which are the minimum and maximum among the node times and mutation times,
respectively. min_time and max_time can be accessed using the functions
tsk_treeseq_get_min_time and tsk_treeseq_get_max_time, respectively.
(@szhan, #2612, #2271)
- Add the TSK_SIMPLIFY_NO_FILTER_NODES option to simplify to allow unreferenced
nodes to be kept in the output (@jeromekelleher, @hyanwong, #2606, #2619).
- Add the TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS option to simplify, which ensures
no node sample flags are changed, allowing calling code to manage sample status.
(@jeromekelleher, #2662, #2663).
- Guarantee that unfiltered tables are not written to unnecessarily
during simplify (@jeromekelleher, #2619).
- Add x_table_keep_rows methods to provide efficient in-place table subsetting
(@jeromekelleher, #2700).
- Add tsk_tree_seek_index function
Python 0.5.4
Features
- A new Tree.is_root method avoids the need to search the potentially
large list of Tree.roots (@hyanwong, #2669, #2620)
- The TreeSequence object now has the attributes min_time and max_time,
which are the minimum and maximum among the node times and mutation times,
respectively. (@szhan, #2612, #2271)
- The draw_svg methods now have a max_num_trees parameter to truncate
the total number of trees shown, giving a readable display for tree
sequences with many trees (@hyanwong, #2652)
- The draw_svg methods now accept a canvas_size parameter to allow
extra room on the canvas, e.g. for long labels or repositioned graphical
elements (@hyanwong, #2646, #2645)
- The Tree object now has the method siblings to get
the siblings of a node. It returns an empty tuple if the node
has no siblings, is not a node in the tree, is the virtual root,
or is an isolated non-sample node.
(@szhan, #2618, #2616)
- The msprime.RateMap class has been ported into tskit: functionality should
be identical to the version in msprime, apart from minor changes in the formatting
of tabular text output (@hyanwong, @jeromekelleher, #2678)
- Tskit now supports and has wheels for Python 3.11. This Python version has a
significant performance boost. (@benjeffery, #2624, #2248)
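The empty-tuple behaviour described for the siblings method can be sketched in plain Python for the simple case of a single-rooted tree. This does not call tskit: the toy `parent` mapping and the `siblings` helper are assumptions of this example, and it deliberately ignores the virtual root, multiple roots and isolated nodes that the real method has to handle.

```python
# Toy single-rooted tree as child -> parent links (hypothetical data; -1 = root).
parent = {0: 4, 1: 4, 2: 4, 3: 5, 4: 5, 5: -1}


def siblings(u):
    """Return the other children of u's parent as a tuple of node IDs."""
    p = parent.get(u)
    if p is None or p == -1:
        # Unknown node, or the root of a single-rooted tree: no siblings.
        return ()
    return tuple(v for v in sorted(parent) if parent[v] == p and v != u)


print(siblings(0))  # (1, 2)
print(siblings(3))  # (4,)
print(siblings(5))  # ()
print(siblings(9))  # ()
```

Returning an empty tuple rather than raising an error keeps calling code free of special cases when iterating over the result.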
Python 0.5.3
Fixes
- The Variant object can now be initialized with 64 bit numpy ints as
returned e.g. from np.where (@hyanwong, #2518, #2514)
- Fix tree.mrca for the case of a tree with multiple roots.
(@benjeffery, #2533, #2521)
Features
- The ts.nodes method now takes an order parameter so that nodes
can be visited in time order (@hyanwong, #2471, #2370)
- Add samples argument to TreeSequence.genotype_matrix.
Default is None, where all the sample nodes are selected.
(@szhan, #2493, #678)
- ts.draw and the draw_svg methods now have an optional omit_sites
parameter, aiding drawing large trees with many sites and mutations
(@hyanwong, #2519, #2516)
Breaking Changes
- Single statistics computed with TreeSequence.general_stat are now
returned as numpy scalars if windows=None and either samples is a single
list or None (for a 1-way stat), or indexes is None or a single list of
length k (instead of a list of length-k lists).
(@gtsambos, #2417, #2308)
- Accessor methods such as ts.edge(n) and ts.node(n) now allow negative
indexes (@hyanwong, #2478, #1008)
- ts.subset() produces valid tree sequences even if nodes are shuffled
out of time order (@hyanwong, #2479, #2473), and the
same for tables.subset() (@hyanwong, #2489). This involves
sorting the returned tables, potentially changing the returned edge order.
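Negative indexes in accessors such as ts.edge(n) presumably follow the usual Python convention of counting from the end. A minimal sketch of that convention, using a hypothetical `normalise_index` helper that is not part of tskit:

```python
def normalise_index(index, num_rows):
    """Map a possibly negative index onto the range 0..num_rows-1."""
    if not -num_rows <= index < num_rows:
        raise IndexError(f"index {index} out of bounds for {num_rows} rows")
    return index + num_rows if index < 0 else index


print(normalise_index(-1, 5))  # 4
print(normalise_index(2, 5))   # 2
print(normalise_index(-5, 5))  # 0
```

Under this convention an index of -1 refers to the last row, so code that previously relied on a negative ID raising an error would behave differently.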
Python 0.5.2
Fixes
- Iterating over ts.variants() could cause a segfault in tree sequences
with large numbers of alleles or very long alleles
(@jeromekelleher, #2437, #2429).
- Various circular references fixed, lowering peak memory usage
(@jeromekelleher, #2424, #2423, #2427).
- Fix bugs in VCF output when there isn't a 1-1 mapping between individuals
and sample nodes (@jeromekelleher, #2442, #2257, #2446, #2448).
Performance improvements
- TreeSequence.site position search performance greatly improved, with much lower
memory overhead (@jeromekelleher, #2424).
- TreeSequence.samples time/population search performance greatly improved, with
much lower memory overhead (@jeromekelleher, #2424, #1916).
- The timeasc and timedesc orders for Tree.nodes have much
improved performance and lower memory overhead
(@jeromekelleher, #2424, #2423).
Features
- Variant objects now have a .num_missing attribute and .counts() and
.frequencies() methods (@hyanwong, #2390, #2393).
- Add the Tree.num_lineages(t) method to return the number of lineages present
at time t in the tree (@jeromekelleher, #386, #2422)
- Efficient array access to table data now provided via attributes like
TreeSequence.nodes_time, etc (@jeromekelleher, #2424).
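One way to read "the number of lineages present at time t" is as the number of branches that span time t. The sketch below uses plain Python rather than tskit, and it assumes a half-open convention in which a branch covers times from its child node (inclusive) up to its parent node (exclusive); the toy data and the helper are hypothetical, and the tskit documentation is the authority on the real method's boundary behaviour.

```python
# Toy tree as child -> parent links (hypothetical data; -1 = root).
parent = {0: 4, 1: 4, 2: 5, 4: 5, 5: -1}
time = {0: 0.0, 1: 0.0, 2: 0.0, 4: 1.0, 5: 3.0}


def num_lineages(t):
    """Count branches whose time span [child time, parent time) contains t."""
    return sum(
        1
        for child, par in parent.items()
        if par != -1 and time[child] <= t < time[par]
    )


print(num_lineages(0.0))  # 3
print(num_lineages(1.0))  # 2
print(num_lineages(3.0))  # 0
```

With this convention, a node sitting exactly at time t counts as one lineage (the branch above it) rather than as one lineage per child.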
Breaking Changes
- Previously, accessing (e.g.) tables.edges returned a different instance of
EdgeTable each time. This has been changed to return the same instance
for the lifetime of a given TableCollection instance. This is technically
a breaking change, although it's difficult to see how code would depend
on the property that (e.g.) tables.edges is not tables.edges.
(@jeromekelleher, #2441, #2080).
C API C_1.1.1
Bug fixes
- Fix segfault in tsk_variant_restricted_copy in tree sequences with large
numbers of alleles or very long alleles
(@jeromekelleher, #2437, #2429).
Python 0.5.1
Fixes
- Copies of a Variant object would cause a segfault when .samples was accessed.
(@benjeffery, #2400, #2401)
Changes
- Tables in a table collection can be replaced using the replace_with method
(@hyanwong, #1489, #2389)
- SVG drawing routines now return a special string object that is automatically
rendered in a Jupyter notebook (@hyanwong, #2377)
C API 1.1.0
Features
- Add num_children to tsk_tree_t, an array which contains counts of the number of child
nodes of each node in the tree. (@GertjanBisschop, #2274, #2316)
- Add edge to tsk_tree_t, an array which contains the edge_id of the edge encoding
the relationship between the child node and its parent for each (child) node in the tree.
(@GertjanBisschop, #2304, #2340)
Changes
- Reduce the maximum number of rows in a table by 1. This removes edge cases so that a
tsk_id_t can be used to count the number of rows. (@benjeffery, #2336, #2337)
- Samples are now copied by tsk_variant_restricted_copy. (@benjeffery, #2400, #2401)
Python 0.5.0
Major Feature Release
Breaking Changes
- The JSON metadata codec now interprets the empty string as an empty object. This means
that applying a schema to an existing table will no longer necessitate modifying the
existing rows. (@benjeffery, #2064, #2104)
- Remove the previously deprecated as_bytes argument to TreeSequence.variants.
If you need genotypes in byte form this can be done following the code in the
to_macs method on line 5573 of trees.py.
This argument was initially deprecated more than 3 years ago when the code was part of
msprime. (@benjeffery, #605, #2172)
- Arguments after ploidy in write_vcf marked as keyword only
(@jeromekelleher, #2329, #2315).
- When metadata equal to b'' is printed to text or HTML tables it will render as
an empty string rather than "b''". (@hyanwong, #2349, #2351)
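The JSON codec change in the first item above can be sketched with the standard library. The `decode_json_metadata` function is a hypothetical stand-in written for this example, not tskit's codec; it only shows the rule that empty bytes decode to an empty object.

```python
import json


def decode_json_metadata(encoded: bytes):
    """Decode JSON metadata, treating empty bytes as an empty object."""
    if len(encoded) == 0:
        # Rows written before a schema was applied carry no metadata at all.
        return {}
    return json.loads(encoded.decode("utf-8"))


print(decode_json_metadata(b""))                # {}
print(decode_json_metadata(b'{"name": "n0"}'))  # {'name': 'n0'}
```

Without the special case, `json.loads` would raise on empty input, which is why existing rows previously had to be rewritten before a JSON schema could be applied.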
Changes
- A min_time parameter in draw_svg enables the youngest node to be used as the
y axis min value, allowing negative times.
(@hyanwong, #2197, #2215)
- VcfWriter.write now prints the site ID of variants in the ID field of the
output VCF files.
(@roohy, #2103, #2107)
- Make dumping of tables and tree sequences to disk a zero-copy operation.
(@benjeffery, #2111, #2124)
- Add copy argument to TreeSequence.variants which if False reuses the
returned Variant object for improved performance. Defaults to True.
(@benjeffery, #605, #2172)
- tree.mrca now takes 2 or more arguments and gives the most recent common
ancestor of them all.
(@savitakartik, #1340, #2121)
- Add an edge attribute to the Mutation class that gives the ID of the
edge that the mutation falls on.
(@jeromekelleher, #685, #2279).
- Add the TreeSequence.split_edges operation which inserts nodes into
edges at a specific time.
(@jeromekelleher, #2276, #2296).
- Add the TreeSequence.decapitate (and closely related
TableCollection.delete_older) operation to remove topology and mutations
older than a given time.
(@jeromekelleher, #2236, #2302, #2331).
- Add the TreeSequence.individuals_time and TreeSequence.individuals_population
methods to return arrays of per-individual times and populations, respectively.
(@petrelharp, #1481, #2298).
- Add the sample_mask and site_mask arguments to write_vcf to allow parts
of an output VCF to be omitted or marked as missing data. Also add the
as_vcf convenience function, to return VCF as a string.
(@jeromekelleher, #2300).
- Add support for missing data to write_vcf, and add the isolated_as_missing
argument. (@jeromekelleher, #2329, #447).
- Add Tree.num_children_array and Tree.num_children. These return the number of
child nodes for each node or for a single node in the tree, respectively.
(@GertjanBisschop, #2318, #2319, #2332)
- Add Tree.path_length.
(@jeremyguez, #2249, #2259).
- Add B1 tree balance index.
(@jeremyguez, @jeromekelleher, #2251, #2281, #2346).
- Add B2 tree balance index.
(@jeremyguez, @jeromekelleher, #2252, #2353, #2354).
- Add Sackin tree imbalance index.
(@jeremyguez, @jeromekelleher, #2246, #2258).
- Add Colless tree imbalance index.
(@jeremyguez, @jeromekelleher, #2250, #2266, #2344).
- Add direction argument to TreeSequence.edge_diffs, allowing iteration
over diffs in the reverse direction. NOTE: this comes with a ~10% performance
regression as the implementation was moved from C to Python for simplicity
and maintainability. Please open an issue if this affects your application.
(@jeromekelleher, @benjeffery, #2120).
- Add Tree.edge_array and Tree.edge. These return the ID of the edge encoding
the relationship of each node with its parent.
(@GertjanBisschop, #2361, #2357)
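For readers unfamiliar with the imbalance indices listed above, the following plain-Python sketch computes the Sackin index (sum of leaf depths) and the Colless index (sum over internal nodes of the absolute difference in leaf counts between the two child subtrees) on a small binary tree. It does not use tskit; the `children` mapping and the function names are assumptions of this example.

```python
# Toy binary "caterpillar" tree as parent -> children (hypothetical data).
children = {6: [5, 3], 5: [4, 2], 4: [0, 1]}
root = 6


def sackin_index(root):
    """Sum, over all leaves, of the number of edges between leaf and root."""
    total = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        if not kids:
            total += depth
        for kid in kids:
            stack.append((kid, depth + 1))
    return total


def num_leaves(node):
    """Number of leaves in the subtree rooted at node."""
    kids = children.get(node, [])
    return 1 if not kids else sum(num_leaves(kid) for kid in kids)


def colless_index():
    """Sum of |leaves(left) - leaves(right)| over all internal nodes."""
    total = 0
    for node, kids in children.items():
        if len(kids) != 2:
            raise ValueError("this sketch handles binary trees only")
        left, right = kids
        total += abs(num_leaves(left) - num_leaves(right))
    return total


print(sackin_index(root))  # 9
print(colless_index())     # 3
```

A perfectly balanced four-leaf tree would give a Sackin index of 8 and a Colless index of 0, so larger values of either index indicate a less balanced tree.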