Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,10 @@ jobs:
uses: actions/checkout@v4

# Install dependencies
- name: Install Graphviz
run: |
sudo apt-get install graphviz {lib,}graphviz-dev

- name: Set up Python 3.11
uses: actions/setup-python@v4
with:
Expand Down
3 changes: 2 additions & 1 deletion _toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,14 @@ parts:
chapters:
- file: simulation_overview
- file: no_mutations
- file: completing_forward_sims
- file: advanced_msprime
sections:
- file: demography
- file: bottlenecks
- file: introgression
- file: completing_forward_sims
- file: forward_sims
- file: more_forward_sims
- caption: Other languages
# TODO: add basic C and maybe Rust tutes
chapters:
Expand Down
344 changes: 202 additions & 142 deletions args.md

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions forward_sims.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ as rows of the the {ref}`sec_individual_table_definition` (a node can then be as
individual by storing that individual's id in the appropriate row of the node table).

An *edge* reflects a transmission event between nodes. An edge is a tuple `(Left, Right, Parent, Child)`
whose meaning is "Child genome $C$ inherited the genomic interval $[L, R)$ from $Parent genome $P$".
whose meaning is "The `Child` genome inherited the genomic interval [`Left`, `Right`) from the `Parent` genome".
In a tree sequence this is stored in a row of the {ref}`sec_edge_table_definition`.

The *time*, in the discrete-time Wright-Fisher (WF) model which we will simulate, is measured in
Expand Down Expand Up @@ -90,7 +90,7 @@ import numpy as np
random_seed = 2
random = np.random.default_rng(random_seed) # A random number generator for general use

L = 50_000 # The sequence length = 50 Kb
L = 50_000 # The sequence length: 50 Kb
```

### Core steps
Expand Down
2 changes: 2 additions & 0 deletions requirements-CI.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,11 @@ jupyter-cache==0.6.1
msprime==1.2.0
networkx==3.1
pandas==2.1.1
pygraphviz==1.10
scikit-allel==1.3.7
stdpopsim==0.2.0
tqdm==4.66.1
tskit==0.5.5
tskit_arg_visualizer==0.0.1
tszip==0.2.2
jsonschema==4.18.6 # Pinned due to 4.19 "AttributeError module jsonschema has no attribute _validators"
2 changes: 2 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,10 @@ jupyter-cache
msprime>=1.0
networkx
pandas
pygraphviz
scikit-allel
stdpopsim
tqdm
tskit>=0.5.4
tskit_arg_visualizer
tszip
139 changes: 126 additions & 13 deletions viz.md
Original file line number Diff line number Diff line change
Expand Up @@ -1503,22 +1503,135 @@ of which are outlined below.

A tree sequence can be treated as a specific form of (directed)
[graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) consisting
of nodes connected by edges. Standard graph visualization software, such as
[graphviz](https://graphviz.org) can therefore be used to represent tree sequence
topologies. This is a relatively common approach to visualizing the full
"Ancestral Recombination Graph" or ARG (a structure in which some nodes are
"recombination nodes", and which is possible to
{ref}`represent as a tree sequence <msprime:sec_ancestry_full_arg>`).
of nodes connected by edges. Standard graph visualization software,
such as [graphviz](https://graphviz.org) can therefore be used to depict tree sequence
topologies. Alternatively, the [tskit_arg_visualizer](https://github.com/kitchensjn/tskit_arg_visualizer)
project will draw a interactive `tskit` graph directly. Below is an example, which uses the
`variable_edge_width` option to highlight the spans of genome inherited through different routes.
Nodes can be dragged horizontally and embedded trees highlighted:

```{code-cell} ipython3
:"tags": ["hide-input"]
%%javascript
require.config({paths: {d3: 'https://d3js.org/d3.v7.min'}});
require(["d3"], function(d3) {window.d3 = d3;});
```

:::{todo}
Link to the ARG tutorial,
[once it is created](https://github.com/tskit-dev/tutorials/issues/43), and show a
picture like this:
```{code-cell} ipython3
import msprime
import tskit_arg_visualizer
ts = msprime.sim_ancestry(4, sequence_length=1000, recombination_rate=0.001, random_seed=3)
d3arg = tskit_arg_visualizer.D3ARG.from_ts(ts=ts)
tip_order = [0, 6, 3, 1, 2, 7, 4, 5] # Found by trial and error for this seed
d3arg.draw(width=500, height=500, variable_edge_width=True, sample_order=tip_order)
```

![A tree sequence (ARG) as a graph](https://user-images.githubusercontent.com/36134434/109398193-2ec6b700-7933-11eb-9cbf-99fdfab46df0.png)
(from [here](https://github.com/tskit-dev/tutorials/issues/43#issuecomment-787124425))
:::
For an `msprime` "full ARG" tree sequence, `edge_type="ortho"` can be used to draw a
traditional "[Ancestral Recombination Graph](sec_args)" style plot (variable edge widths
turned off for clarity):

```{code-cell} ipython3
import msprime
import tskit_arg_visualizer
full_arg_ts = msprime.sim_ancestry(
4, sequence_length=1000, recombination_rate=0.001, random_seed=3, record_full_arg=True)
d3arg = tskit_arg_visualizer.D3ARG.from_ts(ts=full_arg_ts)
d3arg.draw(width=500, height=500, edge_type="ortho", sample_order=tip_order)
```

For more general graph plots, it can be helpful convert the tree sequence to a
[networkx](https://networkx.org) graph first, as described in the
{ref}`sec_args_other_analysis` section of the {ref}`sec_args` tutorial.
This provides interfaces to graph plotting software such as
[graphviz](https://graphviz.org), which provides the `dot` layout engine for
directed graphs:

```{code-cell} ipython3
:"tags": ["hide-input"]
## Networkx conversion code taken from the ARG tutorial

import networkx as nx
import pandas as pd
import tskit

def to_networkx_graph(ts, interval_lists=False):
"""
Make an nx graph from a tree sequence. If `intervals_lists` is True, then
each graph edge will have an ``intervals`` attribute containing a *list*
of tskit.Intervals per parent/child combination. Otherwise each graph edge
will correspond to a tskit edge, with a ``left`` and ``right`` attribute.
"""
D = dict(source=ts.edges_parent, target=ts.edges_child, left=ts.edges_left, right=ts.edges_right)
G = nx.from_pandas_edgelist(pd.DataFrame(D), edge_attr=True, create_using=nx.MultiDiGraph)
if interval_lists:
GG = nx.DiGraph() # Mave a new graph with one edge that can contai
for parent, children in G.adjacency():
for child, edict in children.items():
ilist = [tskit.Interval(v['left'], v['right']) for v in edict.values()]
GG.add_edge(parent, child, intervals=ilist)
G = GG
nx.set_node_attributes(G, {n.id: {'flags':n.flags, 'time': n.time} for n in ts.nodes()})
return G
```

```{code-cell} ipython3
import networkx as nx
from IPython.display import SVG

def graphviz_svg(networkx_graph):
AG = nx.drawing.nx_agraph.to_agraph(networkx_graph) # Convert to graphviz "agraph"
nodes_at_time_0 = [k for k, v in networkx_graph.nodes(data=True) if v['time'] == 0]
AG.add_subgraph(nodes_at_time_0, rank='same') # put time=0 at same rank
return AG.draw(prog="dot", format="svg")

G = to_networkx_graph(ts, interval_lists=True) # Function from the ARG tutorial
print("Converted `ts` to a networkx graph named `G`")
print("Plotting using graphviz...")
SVG(graphviz_svg(G))
```

Alternatively, you can read the `graphviz` positions back into `networkx`
and use the `networkx` drawing functionality, which
relies upon the [matplotlib](https://matplotlib.org) library. This
allows modification of node colours and symbols, labels, rotations,
annotations, etc., as shown below:

```{code-cell} ipython3
:"tags": ["hide-input"]
from matplotlib import pyplot as plt
import string

def get_graphviz_positions(networkx_graph):
AG = nx.drawing.nx_agraph.to_agraph(networkx_graph) # Convert to graphviz "agraph"
nodes_at_time_0 = [k for k, v in networkx_graph.nodes(data=True) if v['time'] == 0]
AG.add_subgraph(nodes_at_time_0, rank='same') # put time=0 at same rank
AG.layout(prog="dot") # create the layout, storing positions in the "pos" attribute
return {n: [float(x) for x in AG.get_node(n).attr["pos"].split(",")] for n in G.nodes()}

pos=get_graphviz_positions(G)
edge_labels = {
edge[0:2]: "\n".join([f"[{int(i.left)},{int(i.right)})" for i in edge[2]["intervals"]])
for edge in G.edges(data=True)
}

samples = set(ts.samples())
nonsamples = set(range(ts.num_nodes)) - samples

plt.figure(figsize=(10, 4))
# Sample nodes as dark green squares with white text
nx.draw(G, pos, node_color="#007700", font_size=9, node_size=250, node_shape="s", nodelist=samples, edgelist=[])
nx.draw_networkx_labels(G, pos, font_size=9, labels={u: u for u in samples}, font_color="white")
# Others as blue circles with alphabetic labels
nx.draw(G, pos, node_color="#22CCFF", font_size=9, edgecolors="black", nodelist=nonsamples, edgelist=[])
nx.draw_networkx_labels(G, pos, font_size=9, labels={u: string.ascii_lowercase[u] for u in nonsamples})

nx.draw_networkx_edges(G, pos, edge_color="lightgrey", arrows=False, width=2);
nx.draw_networkx_edge_labels(G, pos, edge_labels, font_size=7, rotate=False);
```

Note, however, that finding node and edge layout positions that avoid too much overlap
can be tricky, even for the graphviz layout engine, and there is no easy functionality
to place nodes at specific vertical (time) positions.

(sec_tskit_viz_other_demographic)=

Expand Down