# Basics: Transforms and Preprocessing


## The Difference between Transforms and Preprocessing

```python
from abc import ABC
from typing import Callable

from anndata import AnnData


class Ann2DataAbstract(ABC):
    """Abstract class that transforms an iterable of AnnData to PyTorch Geometric Data objects."""

    def __init__(
        self,
        preprocess: Callable[[AnnData], AnnData] | None = None,
        transform: Callable[[AnnData], AnnData] | None = None,
        # ... (remaining parameters elided)
    ) -> None:
        pass
```

In the `Ann2DataAbstract` class, preprocessing and transforming are separate hooks that run at different stages of the data flow. Both are optional callables that take an `AnnData` object and return an `AnnData` object.

- **Preprocessing**: Prepares the `AnnData` objects before they are used in the main analysis or modeling.

- **Transforming**: Applied to each `AnnData` object individually, after the data has been split into smaller blocks.
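
To make the two stages concrete, here is a minimal sketch in plain Python. It is not geome's implementation: `run_pipeline`, `split_into_blocks`, and the list-based stand-in for an `AnnData` object are hypothetical names introduced for illustration, and the sketch assumes that preprocessing runs once on the full dataset before the split.

```python
from typing import Callable, Optional

Block = list[float]  # hypothetical stand-in for one AnnData object


def split_into_blocks(data: Block, size: int) -> list[Block]:
    """Cut the data into consecutive blocks of at most `size` items."""
    return [data[i : i + size] for i in range(0, len(data), size)]


def run_pipeline(
    data: Block,
    preprocess: Optional[Callable[[Block], Block]] = None,
    transform: Optional[Callable[[Block], Block]] = None,
    block_size: int = 2,
) -> list[Block]:
    """Preprocess the whole dataset, split it, then transform each block."""
    if preprocess is not None:
        data = preprocess(data)  # assumption: sees the full dataset once
    blocks = split_into_blocks(data, block_size)
    if transform is not None:
        blocks = [transform(block) for block in blocks]  # sees one block at a time
    return blocks


def scale_by_global_max(data: Block) -> Block:
    """Preprocessing example: needs the maximum over the full dataset."""
    peak = max(data)
    return [x / peak for x in data]


def center_block(block: Block) -> Block:
    """Transform example: only uses values inside the current block."""
    mean = sum(block) / len(block)
    return [x - mean for x in block]


result = run_pipeline([2.0, 4.0, 6.0, 8.0], scale_by_global_max, center_block)
print(result)  # [[-0.125, 0.125], [-0.125, 0.125]]
```

The scaling step depends on a statistic of the whole dataset, so it belongs in preprocessing. The centering step only needs the values inside one block, so it can run as a transform.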


![Data Processing Workflow](example_data/diag.png)


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from geome import transforms
import squidpy as sq
from anndata import AnnData

## Load data
First, let's load the data and see what it looks like. In this example, assume that we want to split the data by the categories in `adata.obs["Cluster"]`.


In [3]:
# Build a minimal AnnData sample: expression matrix, cluster labels, and connectivities
def create_adata():  # noqa: D103
    adata = sq.datasets.mibitof()
    simple_adata = AnnData(adata.X)
    simple_adata.obs["Cluster"] = adata.obs["Cluster"]
    simple_adata.obsp["connectivities"] = adata.obsp["connectivities"]
    return simple_adata


adata = create_adata()
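
As a rough illustration of what splitting by a categorical column means, the sketch below groups row indices by label using only the standard library. The labels `"A"`, `"B"`, and `"C"` are made up and merely stand in for the values of `adata.obs["Cluster"]`; this is not how geome performs the split.

```python
from collections import defaultdict

# Made-up per-cell labels, standing in for adata.obs["Cluster"]
clusters = ["A", "B", "A", "C", "B"]

# Collect the row index of every cell under its cluster label
indices_by_cluster = defaultdict(list)
for cell_index, label in enumerate(clusters):
    indices_by_cluster[label].append(cell_index)

groups = dict(indices_by_cluster)
print(groups)  # {'A': [0, 2], 'B': [1, 4], 'C': [3]}
```

Each list of indices would then select the rows that form one block.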

## Transforms

Before transforming `adata`, let's look at the simplified object so that the effect of each transform is easier to see.

In [4]:
adata

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'Cluster'
    obsp: 'connectivities'

In [6]:
adds_edge_index = transforms.AddEdgeIndexFromAdj(
    adj_matrix_loc="obsp/connectivities", edge_index_key="edge_index"
)

As the name suggests, calling this object adds an edge index to `adata.uns`, stored under the key given by `edge_index_key`.

In [7]:
adds_edge_index(adata)

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'Cluster'
    uns: 'edge_index'
    obsp: 'connectivities'
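
For readers unfamiliar with edge indices, the sketch below shows the general idea on a tiny dense matrix in plain Python. PyTorch Geometric represents edges as a pair of rows, one for source nodes and one for target nodes; the helper `edge_index_from_adjacency` is a hypothetical name, and whether `AddEdgeIndexFromAdj` builds its result in exactly this way is an assumption.

```python
def edge_index_from_adjacency(adj: list[list[float]]) -> list[list[int]]:
    """Return [sources, targets] for every nonzero entry of a dense adjacency matrix."""
    sources: list[int] = []
    targets: list[int] = []
    for i, row in enumerate(adj):
        for j, value in enumerate(row):
            if value != 0:  # a nonzero entry means an edge from node i to node j
                sources.append(i)
                targets.append(j)
    return [sources, targets]


# Path graph 0 - 1 - 2, stored symmetrically
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
edge_index = edge_index_from_adjacency(adj)
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

An undirected edge appears twice, once per direction, because the symmetric matrix has two nonzero entries for it.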

In [10]:
multiple_transforms = transforms.Compose(  # you can get creative with this
    [
        transforms.AddEdgeIndexFromAdj(adj_matrix_loc="obsp/connectivities", edge_index_key="edge_index"),
        transforms.AddEdgeWeight(
            weight_matrix_loc="obsp/connectivities",
            edge_index_key="edge_index",
            edge_weight_key="edge_weight",
            overwrite=True,
        ),
    ]
)

In [11]:
res = multiple_transforms(adata)
res.uns["edge_weight"].shape

torch.Size([71500])
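
The composition pattern used above can be sketched in a few lines of plain Python. `ComposeSketch` and the two toy steps are hypothetical and operate on a dictionary instead of an `AnnData` object; the sketch assumes that a compose helper applies its steps in list order, which is the usual convention but is not confirmed here for `transforms.Compose`.

```python
from typing import Any, Callable, Sequence


class ComposeSketch:
    """Hypothetical compose helper: applies callables from left to right."""

    def __init__(self, steps: Sequence[Callable[[Any], Any]]) -> None:
        self.steps = list(steps)

    def __call__(self, data: Any) -> Any:
        for step in self.steps:
            data = step(data)  # each step receives the previous step's output
        return data


def add_edge_index(data: dict) -> dict:
    """Toy step: store a hard-coded edge list under "edge_index"."""
    return {**data, "edge_index": [[0, 1], [1, 0]]}


def add_edge_weight(data: dict) -> dict:
    """Toy step: reads "edge_index", so it has to run after add_edge_index."""
    n_edges = len(data["edge_index"][0])
    return {**data, "edge_weight": [1.0] * n_edges}


pipeline = ComposeSketch([add_edge_index, add_edge_weight])
out = pipeline({})
print(sorted(out))  # ['edge_index', 'edge_weight']
```

Under this assumption, order matters: a step that reads a key has to come after the step that writes it, which matches the order of `AddEdgeIndexFromAdj` and `AddEdgeWeight` in the cell above.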