tskit-dev · petrelharp · Dec 16, 2021
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -4,13 +4,35 @@
 
 **Breaking changes**:
 
-- 
+- `reference_sequence` is now a tskit attribute, no longer managed by pyslim.
+    It is no longer mutable on tree sequences (only on TableCollections), and
+    calls to `ts.reference_sequence` that previously returned the actual sequence
+    should be replaced by `ts.reference_sequence.data`.
+
+- `annotate_defaults` no longer returns a SlimTreeSequence; wrap the call
+    in `pyslim.SlimTreeSequence( )` to retain previous behavior.
+
+- Old-style "legacy" metadata (previously deprecated) has been removed.
+    See `the documentation <https://tskit.dev/pyslim/docs/previous_versions.html>`_
+    for instructions on migrating your code.
+
+- The `SlimTreeSequence` class is now deprecated: using it produces a
+    warning, and it will be removed at some point in the future (though not soon).
 
 **New features**:
 
 - Added `pyslim.population_size( )` to compute an array giving numbers of
     individuals across a grid of space and time bins. ({user}giliapatterson)
 
+- `recapitate` has been updated to use the new demography features in msprime 1.0.
+
+- Methods of the SlimTreeSequence class are now also available as functions in pyslim:
+    `recapitate`, `individual_parents`, `individual_ages_at`,
+    `has_individual_parents`, `individuals_alive_at`,
+    `mutation_at`, `nucleotide_at`, `slim_time`. For instance, it is now
+    recommended to call `pyslim.recapitate(ts, ...)` instead of
+    `ts.recapitate(...)`.
+
 ********************
 [0.600] - 2021-02-24
 ********************

diff --git a/docs/_toc.yml b/docs/_toc.yml
@@ -11,6 +11,7 @@
     - file: vignette_coalescent_diversity
     - file: vignette_parallel_phylo
     - file: metadata
+    - file: previous_versions
 
 - part: pyslim reference
   chapters:

diff --git a/docs/metadata.md b/docs/metadata.md
@@ -19,7 +19,7 @@ import numpy as np
 import random
 random.seed(23)
 
-ts = pyslim.load("example_sim.trees")
+ts = tskit.load("example_sim.trees")
 tables = ts.tables
 ```
 
@@ -243,117 +243,17 @@ mod_ts.dump("modified_ts.trees")
 
 ### Metadata entries
 
-SLiM records additional information in the metadata columns of Population, Individual, Node, and Mutation tables,
+SLiM records additional information in the metadata columns of Individual, Node, and Mutation tables,
 in a binary format using the python ``struct`` module.
 See {ref}`tskit's metadata documentation <tskit:sec_metadata>`
 for details on how this works.
 Nothing besides this binary information can be stored in the metadata of these tables if the tree sequence is to be used by SLiM,
 and so when ``pyslim`` annotates an existing tree sequence, anything in those columns is overwritten.
+Population metadata, however, is stored as JSON, which is more flexible.
 For more detailed documentation on the contents and format of the metadata, see the SLiM manual.
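For illustration only, here is a stdlib sketch of what a fixed binary layout built with ``struct`` looks like. The layout ``"<q?B"``, the helper names, and the field names are assumptions chosen to mirror the node fields discussed elsewhere in these docs; they are not necessarily the exact layout SLiM writes (see the SLiM manual for that).

```python
import struct

# Hypothetical fixed layout for a node-like record (illustration only):
#   "<"  little-endian, no padding between fields
#   "q"  signed 64-bit integer -> slim_id
#   "?"  one-byte boolean      -> is_null
#   "B"  unsigned byte         -> genome_type
NODE_LAYOUT = struct.Struct("<q?B")

def pack_node_record(slim_id, is_null, genome_type):
    """Encode the three fields as raw bytes (hypothetical helper)."""
    return NODE_LAYOUT.pack(slim_id, is_null, genome_type)

def unpack_node_record(raw):
    """Decode raw bytes into a dict keyed by field name (hypothetical helper)."""
    slim_id, is_null, genome_type = NODE_LAYOUT.unpack(raw)
    return {"slim_id": slim_id, "is_null": is_null, "genome_type": genome_type}

raw = pack_node_record(42, False, 0)
print(len(raw))                 # 10
print(unpack_node_record(raw))  # {'slim_id': 42, 'is_null': False, 'genome_type': 0}

# A fixed-size record leaves no room for extra bytes: unpacking a longer buffer fails.
try:
    unpack_node_record(raw + b"extra")
    extra_rejected = False
except struct.error:
    extra_rejected = True
print(extra_rejected)           # True
```

The fixed size is what makes such a column rigid: a reader expecting this layout cannot make sense of arbitrary extra bytes.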
 
 Of particular note is that *nodes* and *populations* may have empty metadata.
 SLiM will not use the metadata of nodes that are not associated with alive individuals,
 so this can safely be omitted (and makes recapitation easier).
 And, populations not used by SLiM will have empty metadata.
 All remaining metadata are required (besides edges and sites, whose metadata is not used at all).
-
-
-(sec_legacy_metadata)=
-
-## Legacy metadata
-
-In previous versions of pyslim,
-SLiM-specific metadata was provided as customized objects:
-for instance, for a node ``n`` provided by a ``SlimTreeSequence``,
-we'd have ``n.metadata`` as a ``NodeMetadata`` object,
-with attributes ``n.metadata.slim_id`` and ``n.metadata.is_null`` and ``n.metadata.genome_type``.
-However, with tskit 0.3,
-the capacity to deal with structured metadata
-was implemented in {ref}`tskit itself <tskit:sec_metadata>`,
-and so pyslim shifted to using the tskit-native metadata tools.
-As a result, parsed metadata is provided as a dictionary instead of an object,
-so that now ``n.metadata`` would be a dict,
-with entries ``n.metadata["slim_id"]`` and ``n.metadata["is_null"]`` and ``n.metadata["genome_type"]``.
-Annotation should be done with tskit methods (e.g., ``packset_metadata``).
-
-.. note::
-
-    Until pyslim version 0.600, the old-style metadata was still available,
-    but this functionality has been removed.
-
-Here are more detailed notes on how to migrate a script from the legacy
-metadata handling. If you run into issues, please ask (open a discussion on github).
-
-**1.** Use top-level metadata instead of ``slim_provenance``:
-previously, information about the model type and the time counter (generation)
-in SLiM was provided in the Provenances table, made available through
-the ``ts.slim_provenance`` object.  This is still available but deprecated,
-and should be obtained from the *top-level* metadata object, ``ts.metadata["SLiM"]``.
-So, in your scripts ``ts.slim_provenance.model_type`` should be replaced with
-``ts.metadata["SLiM"]["model_type"]``,
-and (although it's not deprecated), probably ``ts.slim_generation`` should
-probably be replaced with
-``ts.metadata["SLiM"]["generation"]``.
-
-**2.** Switch metadata objects to dicts:
-if ``md`` is the ``metadata`` property of a population, individual, or node,
-this means replacing ``md.X`` with ``md["X"]``.
-The ``migration_records`` property of population metadata is similarly
-a list of dicts rather than a list of objects, so instead of
-``ts.population(1).metadata.migration_records[0].source_subpop``
-we would write
-``ts.population(1).metadata["migration_records"][0]["source_subpop"]``.
-
-Mutations were previously a bit different - if ``mut`` is a mutation
-(e.g., ``mut = ts.mutation(0)``)
-then ``mut.metadata`` was previously a list of MutationMetadata objects.
-Now, ``mut.metadata`` is a dict, with a single entry:
-``mut.metadata["mutation_list"]`` is a list of dicts, each containing the information
-that was previously in the MutationMetadata objects.
-So, for instance, instead of ``mut.metadata[0].selection_coeff``
-we would write ``mut.metadata["mutation_list"][0]["selection_coeff"]``.
-
-**3.** The ``decode_X`` and ``encode_X`` methods are now deprecated,
-as this is handled by tskit itself.
-For instance, ``encode_node`` would take a NodeMetadata object
-and produce the raw bytes necessary to encode it in a Node table,
-and ``decode_node`` would do the inverse operation.
-This is now handled by the relevant MetadataSchema object:
-for nodes one can obtain this as ``nms = ts.tables.nodes.metadata_schema``,
-which has the methods ``nms.validate_and_encode_row`` and ``nms.decode_row``.
-Decoding is for the most part not necessary,
-since the metadata is automatically decoded,
-but ``pyslim.decode_node(raw_md)`` could be replaced by ``nms.decode_row(raw_md)``.
-Encoding is necessary to modify tables,
-and ``pyslim.encode_node(md)`` can be replaced by ``nms.validate_and_encode_row(md)``
-(where furthermore ``md`` should now be a dict rather than a NodeMetadata object).
-
-**4.** The ``annotate_X_metadata`` methods are deprecated,
-as again tskit has tools to do this.
-These methods would set the metadata column of a table -
-for instance, if ``metadata`` is a list of NodeMetadata objects, then
-``annotate_node_metadata(tables, metadata)`` would modify ``tables.nodes`` in place
-to contain the (encoded) metadata in the list ``metadata``.
-Now, this could be done as follows (where now ``metadata`` is a list of metadata dicts):
-
-```{code-cell}
-metadata = [ {'slim_id': k, 'is_null': False, 'genome_type': 0}
-            for k in range(tables.nodes.num_rows) ]
-nms = tables.nodes.metadata_schema
-tables.nodes.packset_metadata(
-  [nms.validate_and_encode_row(r) for r in metadata])
-```
-
-If speed is an issue, then ``encode_row`` can be substituted for ``validate_and_encode_row``,
-but at the risk of missing errors in metadata.
-
-**5.** the ``extract_X_metadata`` methods are not necessary,
-since the metadata in the tables of a TableCollection are automatically decoded.
-For instance, ``[ind.metadata["sex"] for ind in tables.individuals]`` will obtain
-a list of sexes of the individuals in the IndividualTable.
-
-:::{warning}
-   It is our intention to remain backwards-compatible for a time.
-   However, the legacy code will disappear at some point in the future,
-   so please migrate over scripts you intend to rely on.
-:::
diff --git a/docs/previous_versions.md b/docs/previous_versions.md
@@ -0,0 +1,159 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.12
+    jupytext_version: 1.9.1
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+```{code-cell}
+:tags: [remove-cell]
+import pyslim, tskit, msprime
+
+ts = tskit.load("example_sim.trees")
+tables = ts.dump_tables()
+```
+
+
+(sec_previous_versions)=
+
+
+# Migrating from previous versions of pyslim
+
+A number of features that were first introduced in pyslim have been made part of core
+tskit functionality. For instance, reference sequence support was provided (although
+only loosely) in pyslim to support SLiM's nucleotide models, but is now part of the standard
+tskit {class}`tskit.TreeSequence`. Similarly, metadata processing in tskit made the
+corresponding code within pyslim obsolete; this "legacy metadata" code has been removed,
+and instructions for migrating your code are {ref}`below <sec_legacy_metadata>`.
+
+In fact, we are now at the (very good) point where we no longer need
+the {class}`pyslim.SlimTreeSequence` class; it only makes code more complicated.
+So, pyslim is migrating to be purely functional: instead of providing the SlimTreeSequence
+class with specialized methods, pyslim will provide functions
+that take in a tree sequence and return something
+(a modified tree sequence or some summary of it).
+Backwards compatibility will be maintained for some time, but we request that you
+switch over soon, as your code will be cleaner and faster.
+
+To migrate, you should:
+
+1. Remove all calls to `pyslim.SlimTreeSequence( )`. They are unnecessary.
+2. Replace `ts.slim_generation` with `ts.metadata['SLiM']['generation']`,
+    and `ts.model_type` with `ts.metadata['SLiM']['model_type']`.
+3. Replace `ts.reference_sequence` with `ts.reference_sequence.data`.
+4. Replace calls to `ts.recapitate(...)` with `pyslim.recapitate(ts, ...)`,
+    and similarly with other SlimTreeSequence methods.
+
+If you encounter difficulties, please post an
+[issue](https://github.com/tskit-dev/pyslim/issues)
+or [discussion](https://github.com/tskit-dev/pyslim/discussions) on github.
+
+
+(sec_legacy_metadata)=
+
+## Legacy metadata
+
+In previous versions of pyslim,
+SLiM-specific metadata was provided as customized objects:
+for instance, for a node ``n`` provided by a ``SlimTreeSequence``,
+we'd have ``n.metadata`` as a ``NodeMetadata`` object,
+with attributes ``n.metadata.slim_id`` and ``n.metadata.is_null`` and ``n.metadata.genome_type``.
+However, with tskit 0.3,
+the capacity to deal with structured metadata
+was implemented in {ref}`tskit itself <tskit:sec_metadata>`,
+and so pyslim shifted to using the tskit-native metadata tools.
+As a result, parsed metadata is provided as a dictionary instead of an object,
+so that now ``n.metadata`` would be a dict,
+with entries ``n.metadata["slim_id"]`` and ``n.metadata["is_null"]`` and ``n.metadata["genome_type"]``.
+Annotation should be done with tskit methods (e.g., ``packset_metadata``).
+
+:::{note}
+   The old-style metadata remained available (though deprecated) through pyslim
+   version 0.600, but this functionality has now been removed.
+:::
+
+Here are more detailed notes on how to migrate a script from the legacy
+metadata handling. If you run into issues, please ask (open a discussion on github).
+
+**1.** Use top-level metadata instead of ``slim_provenance``:
+previously, information about the model type and the time counter (generation)
+in SLiM was provided in the Provenances table, made available through
+the ``ts.slim_provenance`` object. This object is still available but deprecated;
+the same information should instead be obtained from the *top-level* metadata object, ``ts.metadata["SLiM"]``.
+So, in your scripts ``ts.slim_provenance.model_type`` should be replaced with
+``ts.metadata["SLiM"]["model_type"]``,
+and ``ts.slim_generation`` should likewise be replaced with
+``ts.metadata["SLiM"]["generation"]``.
+
+**2.** Switch metadata objects to dicts:
+if ``md`` is the ``metadata`` property of a population, individual, or node,
+this means replacing ``md.X`` with ``md["X"]``.
+The ``migration_records`` property of population metadata is similarly
+a list of dicts rather than a list of objects, so instead of
+``ts.population(1).metadata.migration_records[0].source_subpop``
+we would write
+``ts.population(1).metadata["migration_records"][0]["source_subpop"]``.
+
+Mutations were previously a bit different: if ``mut`` is a mutation
+(e.g., ``mut = ts.mutation(0)``)
+then ``mut.metadata`` was previously a list of MutationMetadata objects.
+Now, ``mut.metadata`` is a dict, with a single entry:
+``mut.metadata["mutation_list"]`` is a list of dicts, each containing the information
+that was previously in the MutationMetadata objects.
+So, for instance, instead of ``mut.metadata[0].selection_coeff``
+we would write ``mut.metadata["mutation_list"][0]["selection_coeff"]``.
+
+**3.** The ``decode_X`` and ``encode_X`` methods are now deprecated,
+as this is handled by tskit itself.
+For instance, ``encode_node`` would take a NodeMetadata object
+and produce the raw bytes necessary to encode it in a Node table,
+and ``decode_node`` would do the inverse operation.
+This is now handled by the relevant MetadataSchema object:
+for nodes one can obtain this as ``nms = ts.tables.nodes.metadata_schema``,
+which has the methods ``nms.validate_and_encode_row`` and ``nms.decode_row``.
+Decoding is for the most part not necessary,
+since the metadata is automatically decoded,
+but ``pyslim.decode_node(raw_md)`` could be replaced by ``nms.decode_row(raw_md)``.
+Encoding is necessary to modify tables,
+and ``pyslim.encode_node(md)`` can be replaced by ``nms.validate_and_encode_row(md)``
+(where furthermore ``md`` should now be a dict rather than a NodeMetadata object).
+
+**4.** The ``annotate_X_metadata`` methods are deprecated,
+as again tskit has tools to do this.
+These methods would set the metadata column of a table -
+for instance, if ``metadata`` is a list of NodeMetadata objects, then
+``annotate_node_metadata(tables, metadata)`` would modify ``tables.nodes`` in place
+to contain the (encoded) metadata in the list ``metadata``.
+Now, this could be done as follows (where now ``metadata`` is a list of metadata dicts):
+
+```{code-cell}
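+# one metadata dict per node, in the same order as the rows of the node table
+metadata = [ {'slim_id': k, 'is_null': False, 'genome_type': 0}
+            for k in range(tables.nodes.num_rows) ]
+# encode each dict with the node table's schema (validating it as we go),
+# then replace the entire metadata column in one step
+nms = tables.nodes.metadata_schema
+tables.nodes.packset_metadata(
+  [nms.validate_and_encode_row(r) for r in metadata]
+)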
+```
+
+If speed is an issue, then ``encode_row`` can be substituted for ``validate_and_encode_row``,
+but at the risk of missing errors in metadata.
+
+**5.** The ``extract_X_metadata`` methods are not necessary,
+since the metadata in the tables of a TableCollection are automatically decoded.
+For instance, ``[ind.metadata["sex"] for ind in tables.individuals]`` will obtain
+a list of sexes of the individuals in the IndividualTable.
+
+:::{warning}
+   It is our intention to remain backwards-compatible for a time.
+   However, the legacy code will disappear at some point in the future,
+   so please migrate any scripts you intend to rely on.
+:::
diff --git a/docs/python_api.md b/docs/python_api.md
@@ -18,7 +18,7 @@ kernelspec:
     from IPython.display import SVG
     import numpy as np
 
-    ts = pyslim.load("example_sim.trees")
+    ts = tskit.load("example_sim.trees")
     tables = ts.tables
 ```
 
@@ -51,12 +51,19 @@ available in pyslim.
 
 ## Summarizing tree sequences
 
-Additionally, ``pyslim`` contains the following summary methods:
+Additionally, ``pyslim`` provides the following functions:
 
 ```{eval-rst}
 .. autosummary::
 
+  has_individual_parents
+  individual_ages_at
+  individual_parents
+  individuals_alive_at
+  mutation_at
+  nucleotide_at
   population_size
+  slim_time
 ```