# Basics: Transforms and Preprocessing


## The Difference between Transforms and Preprocessing

```python
from abc import ABC
from collections.abc import Callable

from anndata import AnnData


class Ann2DataAbstract(ABC):
    """Abstract class that transforms an iterable of AnnData to Pytorch Geometric Data objects."""

    def __init__(
        self,
        preprocess: Callable[[AnnData], AnnData] | None = None,
        transform: Callable[[AnnData], AnnData] | None = None,
        # ... (remaining parameters omitted)
    ) -> None:
        pass
```


`Ann2DataAbstract` accepts two optional callables, `preprocess` and `transform`. Both take an `AnnData` and return an `AnnData`; they differ in when they are applied.

- **Preprocessing**: Prepares the `AnnData` objects before they are used in the main analysis or modeling.

- **Transforming**: Applied to each `AnnData` object individually after the data has been split into smaller blocks.
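
The ordering described above can be sketched as follows. This is a minimal illustration, not the `Ann2DataAbstract` implementation: `run_pipeline`, `split_in_halves`, `center`, and the plain lists standing in for `AnnData` objects are all hypothetical, and the sketch assumes that preprocessing runs on each whole object before it is split.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable

# A plain list of numbers stands in for an AnnData object (illustration only).
Data = list


def run_pipeline(
    datasets: Iterable[Data],
    split: Callable[[Data], Iterable[Data]],
    preprocess: Callable[[Data], Data] | None = None,
    transform: Callable[[Data], Data] | None = None,
) -> list[Data]:
    """Apply `preprocess` to each whole object, split it, then apply `transform` to each block."""
    blocks = []
    for data in datasets:
        if preprocess is not None:
            data = preprocess(data)  # runs once per input object
        for block in split(data):
            if transform is not None:
                block = transform(block)  # runs once per block
            blocks.append(block)
    return blocks


def split_in_halves(data: Data) -> list[Data]:
    """Hypothetical splitter: cut the object into two blocks."""
    mid = len(data) // 2
    return [data[:mid], data[mid:]]


def center(data: Data) -> Data:
    """Subtract the mean of whatever input it is given."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


values = [[1.0, 2.0, 3.0, 6.0]]
as_preprocess = run_pipeline(values, split_in_halves, preprocess=center)
as_transform = run_pipeline(values, split_in_halves, transform=center)
```

The same function gives different results depending on where it is plugged in: as `preprocess` it centers on the mean of the whole object, while as `transform` it centers each block on its own mean.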


![Data Processing Workflow](example_data/diag.png)


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from geome import transforms
import squidpy as sq
from anndata import AnnData

## Load data
First, let's load the data and see what it looks like. In this example, assume that we want to split the data by the categories in `adata.obs["Cluster"]`.


In [3]:

adata = sq.datasets.mibitof()
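
Splitting by a categorical column amounts to grouping observations by their label. Here is a minimal sketch with plain Python records standing in for `adata.obs`; the `split_by_key` helper and the sample labels are hypothetical and are not part of geome or squidpy.

```python
from collections import defaultdict


def split_by_key(records: list, key: str) -> dict:
    """Group records by the value stored under `key`, preserving input order."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


# Made-up observations; the labels are illustrative, not taken from the dataset.
obs = [
    {"cell_id": 1, "Cluster": "A"},
    {"cell_id": 2, "Cluster": "B"},
    {"cell_id": 3, "Cluster": "A"},
]
blocks = split_by_key(obs, "Cluster")
```

With a real `AnnData`, the equivalent selection for one label is boolean indexing such as `adata[adata.obs["Cluster"] == label]`.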

## Transforms

Before transforming our `adata`, let's take a look at it so that the effect of each transform is easier to see.

In [4]:
adata

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
    var: 'mean-0', 'std-0', 'mean-1', 'std-1', 'mean-2', 'std-2'
    uns: 'Cluster_colors', 'batch_colors', 'neighbors', 'spatial', 'umap'
    obsm: 'X_scanorama', 'X_umap', 'spatial'
    obsp: 'connectivities', 'distances'

In [5]:
sq_neighbors_args = {"radius": 4.0, "coord_type": "generic"}
adds_edge_index = transforms.AddEdgeIndex(
    edge_index_key="edge_index",
    edge_weight_key="edge_weight",
    func_args=sq_neighbors_args,
    spatial_key="spatial",
    key_added="added",
)

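As the name suggests, calling this transform adds an edge index to `adata.uns`. The output below also shows a new `edge_weight` entry in `uns` and the `added_*` matrices in `obsp`.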

In [6]:
adds_edge_index(adata)

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
    var: 'mean-0', 'std-0', 'mean-1', 'std-1', 'mean-2', 'std-2'
    uns: 'Cluster_colors', 'batch_colors', 'neighbors', 'spatial', 'umap', 'added_neighbors', 'edge_index', 'edge_weight'
    obsm: 'X_scanorama', 'X_umap', 'spatial'
    obsp: 'connectivities', 'distances', 'added_connectivities', 'added_distances'
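
To make the `edge_index` and `edge_weight` entries concrete, here is a minimal sketch of a radius-based neighbor graph stored in the two-row edge-list layout that PyTorch Geometric uses. It is not geome's or squidpy's implementation: `radius_edge_index` is a hypothetical helper, the coordinates are made up, and using the Euclidean distance as the edge weight is an assumption made for illustration.

```python
import math


def radius_edge_index(coords, radius):
    """Connect every ordered pair of distinct points that lie within `radius` of each other.

    Returns ([sources, targets], weights), where each weight is the Euclidean distance.
    """
    sources, targets, weights = [], [], []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if i == j:
                continue  # no self-loops
            distance = math.dist(a, b)
            if distance <= radius:
                sources.append(i)
                targets.append(j)
                weights.append(distance)
    return [sources, targets], weights


# Toy coordinates: the last point is too far away to get any edge.
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (10.0, 10.0)]
edge_index, edge_weight = radius_edge_index(coords, radius=4.0)
```

Each column of `edge_index` is one directed edge, so an undirected neighbor relation appears twice, once per direction.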

In [7]:
multiple_transforms = transforms.Compose(  # you can get creative with this
    [
        transforms.AddAdjMatrix(func_args=sq_neighbors_args, key_added="added2", spatial_key="spatial"),
        transforms.AddEdgeIndexFromAdj(adj_matrix_loc="obsp/added2_connectivities", edge_index_key="edge_index2", edge_weight_key="edge_weight2"),
    ]
)

In [8]:
res = multiple_transforms(adata)
res

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
    var: 'mean-0', 'std-0', 'mean-1', 'std-1', 'mean-2', 'std-2'
    uns: 'Cluster_colors', 'batch_colors', 'neighbors', 'spatial', 'umap', 'added_neighbors', 'edge_index', 'edge_weight', 'added2_neighbors', 'edge_index2', 'edge_weight2'
    obsm: 'X_scanorama', 'X_umap', 'spatial'
    obsp: 'connectivities', 'distances', 'added_connectivities', 'added_distances', 'added2_connectivities', 'added2_distances'
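
Conceptually, composing transforms means calling them one after another, each receiving the previous one's output. The sketch below mirrors the two steps above (store an adjacency matrix, then derive an edge index from it) on a plain nested dict. `SimpleCompose`, `add_adjacency`, `edge_index_from_adj`, and the `"attribute/key"` path lookup are hypothetical stand-ins, not geome's implementation.

```python
class SimpleCompose:
    """Chain callables: the output of each step is the input of the next."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, data):
        for step in self.steps:
            data = step(data)
        return data


def add_adjacency(data):
    """Store a fixed 3-node adjacency matrix under obsp/adj (toy values)."""
    data["obsp"]["adj"] = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    return data


def edge_index_from_adj(adj_matrix_loc, edge_index_key):
    """Build a step that reads a matrix at 'attribute/key' and writes its nonzero positions to uns."""
    attribute, key = adj_matrix_loc.split("/")

    def step(data):
        adjacency = data[attribute][key]
        rows, cols = [], []
        for i, row in enumerate(adjacency):
            for j, value in enumerate(row):
                if value:
                    rows.append(i)
                    cols.append(j)
        data["uns"][edge_index_key] = [rows, cols]
        return data

    return step


pipeline = SimpleCompose([add_adjacency, edge_index_from_adj("obsp/adj", "edge_index")])
result = pipeline({"obsp": {}, "uns": {}})
```

Order matters here: the second step reads what the first one wrote, so swapping them would fail with a `KeyError`.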