This code loads a scanpy object and extracts the normalized gene expression matrix for CellphoneDB.
Optionally, it also generates the metadata table.


**Input:** 
1. Path to the scanpy object. The scanpy anndata object should contain:
    - `adata.uns["cellphoneDB"]` or `adata.raw.X`
        - `adata.uns["cellphoneDB"]` should be a filtered and normalized gene expression matrix where cells are **rows** and genes **columns**.
        - `adata.raw.X` should be a filtered and normalized gene expression matrix where cells are **rows** and genes **columns**.
    - `adata.obs` containing cell ids as index. Cell ids should match the ids used in the metadata file.
    - `adata.raw.var` containing gene ids as index. These should be either Ensembl IDs or gene names.
2. Optional. The name of the `adata.obs` column containing the cell-cluster classification.


**Output:**
1. A pandas dataframe containing the gene expression values, with cells as columns and genes as rows (`df_expr_matrix`).
2. Optional. A pandas dataframe containing the metadata. The metadata is a two-column dataframe where the first column is the cell id and the second column is the cluster id (`df_meta`).
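
To make the expected orientation concrete, here is a minimal sketch built on made-up toy data. The cell ids, gene ids, values and the `toy_*` names are hypothetical and only illustrate the layout described above; they are not taken from a real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 cells (rows) x 2 genes (columns), the orientation of adata.raw.X
cell_ids = ["cell_1", "cell_2", "cell_3"]
gene_ids = ["GENE_A", "GENE_B"]
cells_by_genes = np.array([[1.0, 0.0],
                           [0.5, 2.0],
                           [0.0, 3.0]])

# Expected expression layout: genes as rows, cells as columns
toy_expr = pd.DataFrame(cells_by_genes.T, index=gene_ids, columns=cell_ids)

# Expected metadata layout: cell id first, cluster id second
toy_meta = pd.DataFrame({"Cell": cell_ids,
                         "cell_type": ["celltype_T", "celltype_T", "celltype_B"]})

print(toy_expr)
print(toy_meta)
```

The transpose is the only reshaping step: scanpy stores cells as rows, while the output described above has cells as columns.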

In [12]:
import pandas as pd
import scanpy as sc
import anndata
import numpy as np

In [13]:
# Path to the AnnData file; expression values should be normalised but not log-transformed
path_to_scanpy = "./adata_activation_for_cellphoneDB.h5ad"
celltype_variable_name = 'annotation'

### Extract count matrix

Load the scanpy object, extract the normalized matrix, transpose it, and format it as a pandas dataframe.

In [14]:
adata = sc.read(path_to_scanpy)

if 'cellphoneDB' in adata.uns:
    print('using adata.uns["cellphoneDB"] as expression matrix')
    # Transpose so that genes become rows and cells become columns
    df_expr_matrix = adata.uns["cellphoneDB"].T
    # Densify only when the matrix is stored in a sparse format
    if hasattr(df_expr_matrix, "toarray"):
        df_expr_matrix = df_expr_matrix.toarray()
    df_expr_matrix = pd.DataFrame(df_expr_matrix)
    # Set cell ids as column index
    df_expr_matrix.columns = adata.obs.index
    # Set gene ids as row index
    df_expr_matrix.set_index(adata.raw.var.index, inplace=True)
else:
    print('adata.uns["cellphoneDB"] not found\nUsing adata.raw.X matrix. This assumes cell-normalized NOT log transformed data.')
    adata_count = anndata.AnnData(X=adata.raw.X, var=adata.raw.var, obs=adata.obs)
    t = adata_count.X
    # Densify only when the matrix is stored in a sparse format
    if hasattr(t, "toarray"):
        t = t.toarray()
    # Transpose so that genes become rows and cells become columns
    t = t.T
    # Set cell ids as column index and gene ids as row index
    df_expr_matrix = pd.DataFrame(data=t, columns=adata_count.obs.index, index=adata_count.var_names)

adata.uns["cellphoneDB"] not found
Using adata.raw.X matrix. This assumes cell-normalized NOT log transformed data.
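
Since the fallback branch only assumes that the data are normalized and not log transformed, a quick heuristic check may be useful. The sketch below assumes each cell was scaled to the same total (as scanpy's `sc.pp.normalize_total` does) and that no genes were removed afterwards; the helper name `per_cell_totals_are_constant` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def per_cell_totals_are_constant(expr, rtol=1e-3):
    """Heuristic: True if every cell (column) sums to roughly the same total."""
    totals = expr.sum(axis=0).to_numpy()
    return bool(np.allclose(totals, totals.mean(), rtol=rtol))

# Hypothetical genes x cells matrix in which each cell sums to 10
normalized = pd.DataFrame([[2.0, 5.0], [8.0, 5.0]],
                          index=["GENE_A", "GENE_B"],
                          columns=["cell_1", "cell_2"])
# The same matrix after a log1p transform
logged = np.log1p(normalized)

print(per_cell_totals_are_constant(normalized))
print(per_cell_totals_are_constant(logged))
```

This is only a hint, not a proof: totals also stop being constant if genes were filtered after normalization, so a negative result should prompt a look at the preprocessing rather than be read as a definite sign of log transformation.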


### Extract meta (optional)

In [15]:
# Build the metadata table: cell id and cluster label (prefixed with 'celltype_')
df_meta = pd.DataFrame(data={'Cell': list(adata.obs.index),
                             'cell_type': ['celltype_' + str(i) for i in adata.obs[celltype_variable_name]]})
df_meta.set_index('Cell', inplace=True)
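
Because the cell ids in the metadata must match those in the expression matrix, it can help to compare the two sets before saving. The following is a minimal sketch on toy data; the helper `check_ids_match` and the `toy_*` tables are hypothetical stand-ins for `df_expr_matrix` and `df_meta`.

```python
import pandas as pd

def check_ids_match(expr, meta):
    """Return (ids missing from meta, ids missing from expr); both empty means consistent."""
    expr_ids = set(expr.columns)   # cells are the columns of the expression matrix
    meta_ids = set(meta.index)     # cells are the index of the metadata table
    return sorted(expr_ids - meta_ids), sorted(meta_ids - expr_ids)

# Hypothetical tables: the metadata lacks an entry for cell_3
toy_expr = pd.DataFrame([[1.0, 0.0, 2.0]],
                        index=["GENE_A"],
                        columns=["cell_1", "cell_2", "cell_3"])
toy_meta = pd.DataFrame({"cell_type": ["celltype_T", "celltype_B"]},
                        index=pd.Index(["cell_1", "cell_2"], name="Cell"))

missing_in_meta, missing_in_expr = check_ids_match(toy_expr, toy_meta)
print("missing in meta:", missing_in_meta)
print("missing in expr:", missing_in_expr)
```

In this notebook both tables are derived from the same `adata.obs.index`, so both lists should be empty; the check is mainly useful when the metadata comes from a separate file.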

### Save

Write both tables to disk if needed.

In [18]:
savepath_counts = './counts.csv'
df_expr_matrix.to_csv(savepath_counts)
savepath_meta = './meta.tsv'
df_meta.to_csv(savepath_meta, sep = '\t')
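
One way to confirm that the files were written as intended is to read them back. The sketch below uses toy tables and a temporary directory so that it does not touch real output; the `toy_*` names and values are hypothetical, while the file names and separators mirror the cell above.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-ins for df_expr_matrix and df_meta
toy_expr = pd.DataFrame([[1.0, 0.5], [0.0, 2.0]],
                        index=["GENE_A", "GENE_B"],
                        columns=["cell_1", "cell_2"])
toy_meta = pd.DataFrame({"cell_type": ["celltype_T", "celltype_B"]},
                        index=pd.Index(["cell_1", "cell_2"], name="Cell"))

with tempfile.TemporaryDirectory() as tmpdir:
    counts_path = os.path.join(tmpdir, "counts.csv")
    meta_path = os.path.join(tmpdir, "meta.tsv")
    # Same calls as in the notebook: comma-separated counts, tab-separated metadata
    toy_expr.to_csv(counts_path)
    toy_meta.to_csv(meta_path, sep="\t")
    # Read both files back, using the first column as the index
    expr_back = pd.read_csv(counts_path, index_col=0)
    meta_back = pd.read_csv(meta_path, sep="\t", index_col=0)

print(expr_back.shape)
print(list(meta_back.index) == list(expr_back.columns))
```

Reading with `index_col=0` restores gene ids as the row index of the expression matrix and cell ids as the index of the metadata.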