diff --git a/docs/file_formats.rst b/docs/file_formats.rst
index 851c2204..78f401a9 100644
--- a/docs/file_formats.rst
+++ b/docs/file_formats.rst
@@ -42,12 +42,12 @@ Tree sequences
 The goal of ``tsinfer`` is to infer correlated genealogies from variation data, and it uses the very efficient `succinct tree sequence
-`_ data structure
-to encode this output. Please see the `msprime documentation
-`_ for details on how to
+`_ data structure
+to encode this output. Please see the `tskit documentation
+`_ for details on how to
 process and manipulate such tree sequences. The intermediate ``.ancestors.trees`` file produced by the :ref:`sec_inference_match_ancestors` step is also a tree sequence and can be loaded and analysed using the
-`msprime API `_.
+`tskit API `_.
diff --git a/docs/inference.rst b/docs/inference.rst
index 0b8adcc3..4fbff4db 100644
--- a/docs/inference.rst
+++ b/docs/inference.rst
@@ -36,13 +36,12 @@ Data model
 **********
 The data model for ``tsinfer`` is tightly integrated with
-``msprime``'s `data model `_
+``tskit``'s `data model `_
 and uses the same concepts throughout. The intermediate file formats and APIs described here provide a bridge between this model and existing data sources. For convenience, we provide a brief description of concepts needed for importing
-data into ``tsinfer`` here. Please see the `msprime documentation
-`_ for more detailed
-information.
+data into ``tsinfer`` here. Please see the `tskit documentation
+`_ for more detailed information.
 .. _sec_inference_data_model_individual:
@@ -167,8 +166,9 @@ number of recombination events. The copying path for each ancestor then
 describes its ancestry at every point in the sequence: from a genealogical perspective, we know its parent node. This information is encoded precisely as an `edge
-`_ in a
-`tree sequence `_.
+`_ in a
+`tree sequence
+`_.
 Thus, we refer to the output of this step as the "ancestors tree sequence", which is conventionally stored in a file ending with ``.ancestors.trees``.
@@ -200,7 +200,7 @@ The final phase of a ``tsinfer`` inference consists of a number steps:
 3. Reduce the resulting tree sequence to a canonical form by `simplifying it
- `_.
+ `_.
 .. todo::
 1. Describe path compression here and above in the ancestors
diff --git a/docs/installation.rst b/docs/installation.rst
index 05d3dd37..33d06bba 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -14,10 +14,6 @@ e.g.::
 will install ``tsinfer`` to the Python installation corresponding to your ``python3`` executable. All requirements should be installed automatically.
-However, there are situations (usually where the GSL libraries are not in the default
-locations) where ``msprime`` installation can fail. Please the
-`msprime installation documentation `_
-for details on the various to address this problem.
 To run the command line interface to ``tsinfer`` you can then use::
@@ -36,3 +32,19 @@ first using `venv `_::
 $ source tsinfer-venv/bin/activate
 (tsinfer-venv) $ pip install tsinfer
 (tsinfer-venv) $ tsinfer --help
+
+.. _sec_installation_installation_problems:
+
+****************
+Potential issues
+****************
+
+One of the dependencies of ``tsinfer``,
+`numcodecs `_, is compiled to
+use AVX2 instructions (where available) when installed using pip. This can lead to
+issues when ``numcodecs`` is compiled on a machine that supports AVX2
+and subsequently run on older machines that do not. To resolve this, ``numcodecs`` has a
+``DISABLE_NUMCODECS_AVX2`` variable which can be turned on before calling
+``pip install``, see
+`these instructions `_
+for details.
diff --git a/docs/introduction.rst b/docs/introduction.rst
index 2b886f49..c6099c1c 100644
--- a/docs/introduction.rst
+++ b/docs/introduction.rst
@@ -17,6 +17,6 @@ make two very important gains:
 storing and processing the data that we have.
 The output of ``tsinfer`` is an :class:`msprime.TreeSequence` and so the
-full `msprime API `_ can be used to
-analyse real data, in precisely the same way that it is currently used
-to analyse simulation data.
+full `tskit API `_ can be used to
+analyse real data, in precisely the same way that it is commonly used
+to analyse simulation data, for example, from `msprime `_.