# Transforms and Iterables

## Creating Iterables From AnnData

By default, `Ann2Data` expects an iterable of AnnData objects when it is called. If you instead want the `Ann2Data` object to take a single large `adata` and split it later using some strategy, pass an instance of any `geome.iterables.ToIterable` subclass to the `adata2iter` parameter of the `geome.ann2data.Ann2Data` constructor.

In [1]:
%load_ext autoreload
%autoreload 2

In [25]:
from geome import iterables
from geome import transforms
import squidpy as sq
import numpy as np
from anndata import AnnData

## Load data

In [26]:
# Load squidpy dataset
adata = sq.datasets.mibitof()

In [27]:
adata.obs['Cluster'].cat.categories

Index(['Endothelial', 'Epithelial', 'Fibroblast', 'Imm_other', 'Myeloid_CD11c',
       'Myeloid_CD68', 'Tcell_CD4', 'Tcell_CD8'],
      dtype='object')

### Create ToIterable object

In [5]:
to_iterable: iterables.ToIterable = iterables.ToCategoryIterable('Cluster', axis='obs')

In [6]:
split_adatas = list(to_iterable(adata)) # split by cluster
split_adatas[:3] # show first 3

[AnnData object with n_obs × n_vars = 115 × 36
     obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
     var: 'mean-0', 'std-0', 'mean-1', 'std-1', 'mean-2', 'std-2'
     uns: 'Cluster_colors', 'batch_colors', 'neighbors', 'spatial', 'umap'
     obsm: 'X_scanorama', 'X_umap', 'spatial'
     obsp: 'connectivities', 'distances',
 AnnData object with n_obs × n_vars = 746 × 36
     obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
     var: 'mean-0', 'std-0', 'mean-1', 'std-1', 'mean-2', 'std-2'
     uns: 'Cluster_colors', 'batch_colors', 'neighbors', 'spatial', 'umap'
     obsm: 'X_scanorama', 'X_umap', 'spatial'
     obsp: 'connectivities', 'distances',
 AnnData object with n_obs × n_vars = 270 × 36
     obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size',
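
To make the splitting idea concrete, here is a minimal sketch of grouping rows by a categorical label. It is not geome's implementation: `split_by_category` is a hypothetical helper, and the sorted category order is an assumption made for illustration.

```python
import numpy as np


def split_by_category(labels):
    """Yield (category, row_indices) pairs, one per distinct label.

    Hypothetical helper for illustration only; not part of geome.
    """
    labels = np.asarray(labels)
    # np.unique returns the distinct labels in sorted order.
    for category in np.unique(labels):
        # Positions of all rows that carry this label.
        yield str(category), np.flatnonzero(labels == category)


labels = ["Tcell_CD4", "Epithelial", "Tcell_CD4", "Fibroblast", "Epithelial"]
groups = {cat: idx.tolist() for cat, idx in split_by_category(labels)}
print(groups)
```

With an AnnData object, each index array could then be used to subset the observations, for example `adata[idx]`.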

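### The role of AnnData2Iterable in Ann2Data

`Ann2DataByCategory` is equivalent to `Ann2DataDefault` with a `ToCategoryIterable` passed as `adata2iter`: the two calls below produce the same list of `Data` objects.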

In [14]:
from geome.ann2data import Ann2DataByCategory, Ann2DataDefault

In [15]:
Ann2DataByCategory(
    fields={'x': ['X']},
    category='Cluster',
).to_list(adata)

[Data(x=[115, 36]),
 Data(x=[746, 36]),
 Data(x=[270, 36]),
 Data(x=[488, 36]),
 Data(x=[168, 36]),
 Data(x=[259, 36]),
 Data(x=[799, 36]),
 Data(x=[464, 36])]

In [16]:
Ann2DataDefault(
    fields={'x': ['X']},
    adata2iter=iterables.ToCategoryIterable('Cluster', axis='obs'),
).to_list(adata)

[Data(x=[115, 36]),
 Data(x=[746, 36]),
 Data(x=[270, 36]),
 Data(x=[488, 36]),
 Data(x=[168, 36]),
 Data(x=[259, 36]),
 Data(x=[799, 36]),
 Data(x=[464, 36])]

## Transforms

Before transforming our `adata`, let's simplify it so that the effect of each transform is easier to see.

In [28]:
adata

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
    var: 'mean-0', 'std-0', 'mean-1', 'std-1', 'mean-2', 'std-2'
    uns: 'Cluster_colors', 'batch_colors', 'neighbors', 'spatial', 'umap'
    obsm: 'X_scanorama', 'X_umap', 'spatial'
    obsp: 'connectivities', 'distances'

In [29]:
simple_adata = AnnData(adata.X)
simple_adata.obs['Cluster'] = adata.obs['Cluster']
simple_adata.obsp['connectivities'] = adata.obsp['connectivities']
simple_adata

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'Cluster'
    obsp: 'connectivities'

In [30]:
adds_edge_index = transforms.AddEdgeIndex(adj_matrix_loc='obsp/connectivities', edge_index_key='edge_index', overwrite=True)

As the name suggests, this transform adds an edge index to the `uns` of the AnnData object.

In [32]:
adds_edge_index(simple_adata)

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'Cluster'
    uns: 'edge_index'
    obsp: 'connectivities'
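
For intuition, the sketch below derives an edge index from a small dense adjacency matrix. It assumes the 2 × E layout (source row, target row) used by PyTorch Geometric; `adjacency_to_edge_index` is a hypothetical helper, and the exact array that geome stores in `uns` may differ.

```python
import numpy as np


def adjacency_to_edge_index(adj):
    """Return a 2 x E array of (source, target) pairs for non-zero entries.

    Hypothetical helper for illustration only; not part of geome.
    """
    adj = np.asarray(adj)
    # np.nonzero lists the non-zero positions in row-major order.
    rows, cols = np.nonzero(adj)
    return np.vstack([rows, cols])


# Toy symmetric adjacency matrix for three nodes: 0 - 1 - 2.
adj = np.array(
    [
        [0.0, 0.5, 0.0],
        [0.5, 0.0, 1.0],
        [0.0, 1.0, 0.0],
    ]
)
edge_index = adjacency_to_edge_index(adj)
print(edge_index)
```

Each column of `edge_index` is one directed edge, so an undirected connection appears twice, once per direction.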

In [35]:
multiple_transforms = transforms.Compose( # you can get creative with this
    [
        transforms.AddEdgeIndex(adj_matrix_loc='obsp/connectivities', edge_index_key='edge_index', overwrite=True),
        transforms.AddEdgeWeight(weight_matrix_loc='obsp/connectivities', edge_index_key='edge_index', edge_weight_key='edge_weight', overwrite=True),
    ]
)

In [36]:
multiple_transforms(simple_adata)

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'Cluster'
    uns: 'edge_index', 'edge_weight'
    obsp: 'connectivities'
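
The chaining behaviour can be sketched in a few lines of plain Python. `ComposeSketch` and the two toy steps are hypothetical stand-ins that operate on a dict instead of an AnnData; they only illustrate that each step receives the output of the previous one, which is why the edge weights can rely on an edge index already being present.

```python
from typing import Any, Callable, Sequence


class ComposeSketch:
    """Apply callables left to right, feeding each result to the next.

    Hypothetical stand-in for illustration only; not geome's Compose.
    """

    def __init__(self, steps: Sequence[Callable[[Any], Any]]):
        self.steps = list(steps)

    def __call__(self, obj: Any) -> Any:
        for step in self.steps:
            obj = step(obj)
        return obj


def add_edge_index(d: dict) -> dict:
    # Toy step: attach a fixed two-edge index.
    return {**d, "edge_index": [[0, 1], [1, 0]]}


def add_edge_weight(d: dict) -> dict:
    # Toy step: needs the edge index written by the previous step.
    n_edges = len(d["edge_index"][0])
    return {**d, "edge_weight": [1.0] * n_edges}


pipeline = ComposeSketch([add_edge_index, add_edge_weight])
result = pipeline({"obs": ["Cluster"]})
print(list(result))
```

Swapping the two steps in this sketch would raise a `KeyError`, so the order of the list matters.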

### The role of transforms

Transform objects can be passed to two different parameters of the `Ann2Data` (A2D) classes. The first is `transform`, which is applied to each split AnnData. The second is `preprocess`, which is used only when the user passes a single AnnData instead of an iterable and has given an `adata2iter` in the constructor; in that case it is applied before `adata2iter` is called.
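
The order of operations described above can be sketched as follows. This is a simplified model under stated assumptions, not geome's code: `to_list_sketch` is hypothetical, plain lists of integers stand in for AnnData objects, and the argument names mirror the `preprocess`, `adata2iter` and `transform` parameters used in this notebook.

```python
def to_list_sketch(adata, preprocess=None, adata2iter=None, transform=None):
    """Hypothetical model of the call order; not part of geome."""
    if adata2iter is not None:
        # A single object was given: preprocess it once, then split it.
        if preprocess is not None:
            adata = preprocess(adata)
        pieces = adata2iter(adata)
    else:
        # The input is already an iterable of pieces.
        pieces = adata
    # The transform runs on every piece individually.
    return [transform(p) if transform is not None else p for p in pieces]


def split_odd_even(values):
    # Toy splitter: odd values first, then even values.
    yield [v for v in values if v % 2 == 1]
    yield [v for v in values if v % 2 == 0]


def times_ten(piece):
    return [v * 10 for v in piece]


result = to_list_sketch(
    [3, 1, 2, 4],
    preprocess=sorted,          # runs once on the whole input
    adata2iter=split_odd_even,  # splits the preprocessed input
    transform=times_ten,        # runs on each split
)
print(result)
```

In this model, `preprocess` is skipped whenever no `adata2iter` is supplied, because the input is then treated as an iterable that has already been split.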