# Lineage tree

This example shows how to pass lineage trees to {mod}`moscot`. This is particularly useful for
the {class}`~moscot.problems.time.LineageProblem`, which requires lineage information.
See [moslin](https://github.com/theislab/moslin) {cite}`lange-moslin:23` for examples on real-world data.

{mod}`moscot` accepts lineage information in one of three forms:
1. precomputed cost matrices,
2. barcode information, or
3. a lineage tree as a {class}`~networkx.DiGraph`.

In this notebook, we consider the lineage tree case.

## Imports and data loading

In [1]:
from moscot import datasets
from moscot.problems.time import LineageProblem

Simulate data using {func}`~moscot.datasets.simulate_data`.

In [2]:
adata = datasets.simulate_data(n_distributions=3, key="day", quad_term="tree")
adata

AnnData object with n_obs × n_vars = 60 × 60
    obs: 'day', 'celltype'
    uns: 'trees'

We assume the trees are saved in {attr}`~anndata.AnnData.uns` as a {class}`dict`, where each key is a time point, i.e., a value of the `day` column in {attr}`~anndata.AnnData.obs`, and each value is a {class}`~networkx.DiGraph`.

In [3]:
adata.uns["trees"]

{0: <networkx.classes.digraph.DiGraph at 0x7fb4786c9ae0>,
 1: <networkx.classes.digraph.DiGraph at 0x7fb4786c9720>,
 2: <networkx.classes.digraph.DiGraph at 0x7fb4786c94b0>}
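
To make the expected layout concrete, the following is a minimal sketch of a dictionary with the same shape as `adata.uns["trees"]`, built by hand with {mod}`networkx`. The helper `make_toy_tree` and all node names are hypothetical and only serve as an illustration; they are not part of {mod}`moscot`. We assume that edges point from parent to child and that the leaves stand for the observed cells.

```python
import networkx as nx


def make_toy_tree(prefix: str) -> nx.DiGraph:
    """Build a small, hypothetical lineage tree with three leaves."""
    tree = nx.DiGraph()
    # Edges point from parent to child (an assumption of this sketch).
    tree.add_edges_from(
        [
            (f"{prefix}_root", f"{prefix}_inner"),
            (f"{prefix}_root", f"{prefix}_cell2"),
            (f"{prefix}_inner", f"{prefix}_cell0"),
            (f"{prefix}_inner", f"{prefix}_cell1"),
        ]
    )
    return tree


# One tree per time point, keyed by the time point value.
trees = {day: make_toy_tree(f"day{day}") for day in (0, 1, 2)}

for day, tree in trees.items():
    # Leaves are the nodes without outgoing edges.
    leaves = sorted(n for n in tree.nodes if tree.out_degree(n) == 0)
    print(day, nx.is_tree(tree), leaves)
```

Such a dictionary could then be stored under a key in {attr}`~anndata.AnnData.uns`, provided its keys match the time points of the data.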

## Leaf distance

Now, we can instantiate and prepare the {class}`~moscot.problems.time.LineageProblem`. In `lineage_attr`, we specify where the trees are stored and which cost to compute from them.

In [4]:
lp = LineageProblem(adata)
lp = lp.prepare(
    time_key="day",
    lineage_attr={"attr": "uns", "key": "trees", "cost": "leaf_distance"},
)

INFO     Computing pca with `n_comps=30` for `xy` using `adata.X`
INFO     Computing pca with `n_comps=30` for `xy` using `adata.X`

Internally, cost matrices have been computed from the trees using the [shortest path](https://en.wikipedia.org/wiki/Shortest_path_problem) distance between the leaves. Let us investigate the first few entries of the cost matrix computed from the first lineage tree.

In [5]:
lp[0, 1].x.data_src[:3, :3]

array([[0., 2., 3.],
       [2., 0., 3.],
       [3., 3., 0.]])
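
To illustrate what a leaf distance is, here is a minimal sketch that computes pairwise shortest-path distances between the leaves of a toy tree with {mod}`networkx`. It is not the code {mod}`moscot` runs internally. The tree and its node names are hypothetical, and we assume unweighted edges and distances measured on the undirected version of the tree. The toy tree is chosen so that the result shows the same pattern as the block above.

```python
import networkx as nx
import numpy as np

# Hypothetical toy lineage tree; edges point from parent to child.
tree = nx.DiGraph([("root", "p"), ("root", "l2"), ("p", "l0"), ("p", "l1")])

# Leaves are the nodes without children.
leaves = sorted(n for n in tree.nodes if tree.out_degree(n) == 0)

# Paths between leaves go up and down the tree, so we ignore edge direction.
undirected = tree.to_undirected()

cost = np.zeros((len(leaves), len(leaves)))
for i, src in enumerate(leaves):
    # Number of edges on the shortest path from `src` to every other node.
    lengths = nx.single_source_shortest_path_length(undirected, src)
    for j, tgt in enumerate(leaves):
        cost[i, j] = lengths[tgt]

print(leaves)
print(cost)
```

The siblings `l0` and `l1` are two edges apart, whereas `l2` is three edges away from both of them.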

Similarly, we investigate parts of the cost matrix created from the second tree.

In [6]:
lp[0, 1].y.data_src[:3, :3]

array([[0., 2., 3.],
       [2., 0., 3.],
       [3., 3., 0.]])

Note that the gene expression term is still stored as two point clouds, one per time point. The corresponding cost matrix will be computed by the backend.

In [7]:
lp[0, 1].xy.data_src.shape, lp[0, 1].xy.data_tgt.shape

((20, 30), (20, 30))
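
As a rough illustration of what computing a cost matrix from two point clouds means, the sketch below builds a pairwise cost between two random arrays of the same shapes as above using {mod}`numpy`. We assume a squared Euclidean cost here; the cost actually used, and whether the backend ever materializes the full matrix, depends on how the problem is configured. The arrays `src` and `tgt` are random stand-ins, not the PCA embeddings stored in the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
src = rng.normal(size=(20, 30))  # stand-in for the source point cloud
tgt = rng.normal(size=(20, 30))  # stand-in for the target point cloud

# Squared Euclidean cost via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>.
src_sq = (src**2).sum(axis=1)[:, None]
tgt_sq = (tgt**2).sum(axis=1)[None, :]
cost = src_sq + tgt_sq - 2.0 * src @ tgt.T

# Clip tiny negative values that can arise from floating point round-off.
cost = np.maximum(cost, 0.0)

print(cost.shape)
```

Each row of `cost` corresponds to a source cell and each column to a target cell.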