tskit-dev · benjeffery · Jul 9, 2021 · Jun 24, 2021 · Jun 30, 2021 · Jul 9, 2021
diff --git a/_config.yml b/_config.yml
@@ -1,9 +1,9 @@
 # Book settings
 # Learn more at https://jupyterbook.org/customize/config.html
 
-title: Tskit Tutorials
+title: Tree Sequence Tutorials
 author: Tskit Developers
-# logo: logo.png
+logo: _static/tskit_logo.svg
 
 # Force re-execution of notebooks on each build.
 # See https://jupyterbook.org/content/execute.html

diff --git a/_static/tskit_logo.svg b/_static/tskit_logo.svg
diff --git a/basics.md b/basics.md
@@ -119,9 +119,9 @@ SVG(ts.draw_svg(y_axis=True, y_gridlines=True, time_scale="rank"))
 Each tree records the lines of descent along which a piece of DNA has been
 inherited (ignore for the moment the red symbols, which represent a mutation).
 For example, the first tree tells us that DNA from ancestral genome 7 duplicated
-to produce two lineages, which ended up in sample genome 1 and sample genome 4 in the
-current population (since this pattern is seen in all trees, this must have happened
-for all the DNA along the entire 1000 base pair genome).
+to produce two lineages, which ended up in genomes 1 and 4, both of which exist in the
+current population. In fact, since this pattern is seen in all trees, these particular
+lines of inheritance were taken by all the DNA in this 1000 base pair genome.
 
 
 (sec_basics_terminology_nodes)=
@@ -274,17 +274,17 @@ for mutation in ts.mutations():
 
 The mutation can have a {attr}`~Mutation.time` or if, as in this case, the times of
 mutations in the tree sequence are unknown, all mutations can have the special NaN value
-``tskit.UNKNOWN_TIME``. Notice that the genomic position of the mutation is not included.
-Instead, that is a property of the _site_ to which the mutation refers, in this case,
-site ID 0 (which happens to be at position 257):
+{data}`tskit.UNKNOWN_TIME`. Notice that the genomic position of the mutation is not
+included. Instead, that is a property of the _site_ to which the mutation refers, in
+this case, site ID 0 (which happens to be at position 751):
 
 ```{code-cell} ipython3
 print(ts.site(0))  # For convenience, the Python API also returns the mutations at the site
 ```
 
 In the plot above, since the the only mutation is above node 8 in the last tree, and has
 a {attr}`~Mutation.derived_state` of "G", we know that the samples descending from node
-8 in the first tree (sample genomes 2, 4, and 5) have a "G" at {attr}`~Site.position` 257,
+8 in the first tree (sample genomes 2, 3, and 5) have a "G" at {attr}`~Site.position` 751,
 while the others have the {attr}`~Site.ancestral_state` of "T". This means that Ada is
 homozygous for "T", Bob is homozygous for "G", and Cat is heterozygous "T/G".
 In other words the ancestral state and the details of any mutations at that site,

diff --git a/getting_started.md b/getting_started.md
@@ -76,7 +76,7 @@ contains 40 *sample nodes*, one for each genome.
 
 ## Processing trees
 
-A common idiom is to iterate over all the {class}`tree objects<Tree>` in a tree
+A common idiom is to iterate over all the {class}`Tree` objects in a tree
 sequence. This process underlies many tree sequence algorithms, including those we'll
 encounter later in this tutorial for calculating 
 {ref}`population genetic statistics<tskit:sec_stats>`.
@@ -137,7 +137,7 @@ Now that we know all trees have coalesced, we know that at each position in the
 all the 40 sample nodes must have one most recent common ancestor (MRCA). Below, we
 iterate over the trees, finding the IDs of the root (MRCA) node for each tree. The
 time of this root node can be found via the {meth}`tskit.TreeSequence.node` method, which
-returns a {class}`node object<Node>` whose attributes include the node time:
+returns a {class}`Node` object whose attributes include the node time:
 
 ```{code-cell} ipython3
 import matplotlib.pyplot as plt
@@ -194,7 +194,6 @@ for {meth}`~TreeSequence.simplify`.
 It can often be helpful to slim down a tree sequence so that it represents the genealogy
 of a smaller subset of the original samples. This can be done using the powerful
 {meth}`TreeSequence.simplify` method.
-.
 
 The {meth}`TreeSequence.draw_svg` method allows us to draw
 more than one tree: either the entire tree sequence, or
@@ -233,11 +232,11 @@ sites or mutations in your analyses.
 :::
 
 For many purposes it may be better to focus on the genealogy of your samples, rather than
-the sites and
-mutations that
+the {ref}`sites<sec_data_model_definitions_site>` and
+{ref}`mutations<sec_data_model_definitions_mutation>` that
 {ref}`define <sec_what_is_dna_data>` the genome sequence itself. Nevertheless,
-{program}`tskit` also provides efficient ways to return {class}`site objects<Site>` and
-{class}`mutation objects<Mutation>` from a tree sequence.
+{program}`tskit` also provides efficient ways to return {class}`Site` object and
+{class}`Mutation` objects from a tree sequence.
 For instance, under the finite sites model of mutation that we used above, multiple mutations
 can occur at some sites, and we can identify them by iterating over the sites using the
 {meth}`TreeSequence.sites` method:
@@ -512,7 +511,8 @@ in rough order of importance:
     * {meth}`~TreeSequence.simplify()` reduces the number of sample nodes in the tree
         sequence to a specified subset
     * {meth}`~TreeSequence.keep_intervals()` (or its complement,
-        {meth}`~TreeSequence.delete_intervals()`) deletes unwanted parts of the genome
+        {meth}`~TreeSequence.delete_intervals()`) removes genetic information from
+        specific regions of the genome
     * {meth}`~TreeSequence.draw_svg()` plots tree sequences (and {meth}`Tree.draw_svg()`
         plots trees)
     * {meth}`~TreeSequence.at()` returns a tree at a particular genomic position

diff --git a/intro.md b/intro.md
@@ -13,9 +13,7 @@ kernelspec:
 
 (sec_intro)=
 
-# Tree sequence tutorials
-
-**Welcome!**
+# Welcome!
 
 This site contains a number of tutorials to develop your understanding of
 [tree sequences](https://tskit.dev/learn.html#what) and software programs,

diff --git a/viz.md b/viz.md
@@ -36,9 +36,14 @@ def viz_ts():
     ts_small.dump("data/viz_ts_small.trees")
 
     ts_small_mutated = msprime.sim_mutations(ts_small, rate=1e-7, random_seed=342)
+    # 3rd tree should have first site with 2 muts
+    first_site_tree_2 = next(ts_small_mutated.at_index(2).sites())
+    assert len(first_site_tree_2.mutations) == 2
+    # mutation 8 should be above node 16 in the 1st tree
+    assert ts_small_mutated.site(8).mutations[0].id == 8
+    assert ts_small_mutated.site(8).mutations[0].node == 16
     ts_small_mutated.dump("data/viz_ts_small_mutated.trees")
 
-
 def viz_root_mut():
     """
     Taken from the drawing unit tests
@@ -758,7 +763,8 @@ y_ticks = np.delete(y_ticks, np.argwhere(np.ediff1d(y_ticks) <= 0.01))
 
 svg = ts.draw_svg(size=(1000, 350), y_axis=True, y_gridlines=True, y_ticks=y_ticks, style=(
     ".tree .lab {font-family: sans-serif}"
-    ".x-axis .tick .lab {text-anchor: start; transform: rotate(90deg) translate(8px)}"
+    # Normal X axis tick labels have dominant baseline: hanging, but it needs centring when rotated
+    ".x-axis .tick .lab {text-anchor: start; dominant-baseline: central; transform: rotate(90deg)}"
     ".y-axis .grid {stroke: #DDDDDD}"
     + ".tree :not(.leaf).node > .lab  {transform: translate(0,0); text-anchor:middle; fill: white}"
     + ".tree :not(.leaf).node > .sym {transform: scale(3.5)}"
@@ -789,11 +795,12 @@ SVG(ts.draw_svg(style=css_string, time_scale="rank", x_lim=[0, 30]))
 ```
 
 #### Leaf, sample & isolated nodes
-By default, sample nodes are square and non-sample nodes circular. However, neither need
-to be at time 0. Moreover, leaves need not be samples, and samples need not be leaves.
-Here we change the previous tree sequence to make some leaves non-samples and some samples
-internal nodes. To highlight the change, we have plotted sample nodes in green, and
-leaf nodes (if not samples) in blue.
+By default, sample nodes are square and non-sample nodes circular (at the moment this
+can't easily be changed). However, neither need to be at specific times: sample nodes can
+be at times other than 0, and nonsample nodes can be at time 0. Moreover, leaves need not
+be samples, and samples need not be leaves. Here we change the previous tree sequence to
+make some leaves non-samples and some samples internal nodes. To highlight the change,
+we have plotted sample nodes in green, and leaf nodes (if not samples) in blue.
 
 ```{code-cell} ipython3
 :"tags": ["hide-input"]
@@ -812,6 +819,80 @@ means that there can be sample nodes which are "isolated" in a tree. These are d
 unconnected to the main topology in one or more trees (e.g. nodes 7 and 8 above).
 :::
 
+#### Tick labels and gridlines
+
+Y tick labels can be specified explicitly, which allows time scales to be plotted
+e.g. in years even if the tree sequence ticks in generations. The grid lines associated
+with each y tick can also be changed or even hidden individually using the CSS
+[nth-child pseudo-selector](https://www.w3.org/TR/2018/REC-selectors-3-20181106/#nth-child-pseudo),
+where tickmarks are indexed from the bottom. This is used in
+the {ref}`sec_msprime_introgression` tutorial to show lines behind the trees at specific,
+important times. Below we show a slightly simpler example than in that tutorial,
+keeping node and mutation symbols in black, but colouring gridlines instead:
+
+```{code-cell} ipython3
+:"tags": ["hide-input"]
+# Function from the introgression tutorial - see there for justification
+import msprime
+time_units = 1000 / 25  # Conversion factor for kya to generations
+
+def run_simulation(sequence_length, random_seed=None):
+    demography = msprime.Demography()
+    # The same size for all populations; highly unrealistic!
+    Ne = 10**4
+    demography.add_population(name="Africa", initial_size=Ne)
+    demography.add_population(name="Eurasia", initial_size=Ne)
+    demography.add_population(name="Neanderthal", initial_size=Ne)
+
+    # 2% introgression 50 kya
+    demography.add_mass_migration(
+        time=50 * time_units, source='Eurasia', dest='Neanderthal', proportion=0.02)
+    # Eurasian 'merges' backwards in time into Africa population, 70 kya
+    demography.add_mass_migration(
+        time=70 * time_units, source='Eurasia', dest='Africa', proportion=1)
+    # Neanderthal 'merges' backwards in time into African population, 300 kya
+    demography.add_mass_migration(
+        time=300 * time_units, source='Neanderthal', dest='Africa', proportion=1)
+
+    ts = msprime.sim_ancestry(
+        recombination_rate=1e-8,
+        sequence_length=sequence_length,  
+        samples=[
+            msprime.SampleSet(1, ploidy=1, population='Africa'),
+            msprime.SampleSet(1, ploidy=1, population='Eurasia'),
+            # Neanderthal sample taken 30 kya
+            msprime.SampleSet(1, ploidy=1, time=30 * time_units, population='Neanderthal'),
+        ],
+        demography = demography,
+        record_migrations=True,  # Needed for tracking segments.
+        random_seed=random_seed,
+    )
+    return ts
+
+ts = run_simulation(20 * 10**6, 1)
+
+css = ".y-axis .tick .lab {font-size: 85%}"  # Multi-line labels unimplemented: use smaller font 
+css += ".y-axis .tick .grid {stroke: lightgrey}"  # Default gridline type
+css += ".y-axis .ticks .tick:nth-child(3) .grid {stroke-dasharray: 4}"  # 3rd line from bottom
+css += ".y-axis .ticks .tick:nth-child(3) .grid {stroke: magenta}"  # also 3rd line from bottom
+css += ".y-axis .ticks .tick:nth-child(4) .grid {stroke: blue}"  # 4th line from bottom
+css += ".y-axis .ticks .tick:nth-child(5) .grid {stroke: darkgrey}"  # 5th line from bottom
+y_ticks = {0: "0", 30: "30", 50: "Introgress", 70: "Eur origin", 300: "Nea origin", 1000: "1000"}
+y_ticks = {y * time_units: lab for y, lab in y_ticks.items()}
+SVG(ts.draw_svg(
+    size=(1200, 500),
+    x_lim=(0, 25_000),
+    time_scale="log_time",
+    node_labels = {0: "Afr", 1: "Eur", 2: "Neand"},
+    y_axis=True,
+    y_label="Time (kya)",
+    x_label="Genomic position (bp)",
+    y_ticks=y_ticks,
+    y_gridlines=True,
+    style=css))
+```
+
+
 #### Animation
 
 The classes attached to the SVG also allow elements to be animated. Here's a