### Augurpy

This short tutorial shows how to run augurpy on a simulated single-cell data set and how to interpret its per-cell-type results.

In [1]:
import scanpy as sc

from augurpy.estimator import Params, create_estimator
from augurpy.evaluate import calculate_auc
from augurpy.read_load import load

First, we import the data we want to work with. The input can take one of three forms:

- an AnnData object;
- a DataFrame containing the cell type label and condition of each cell;
- an expression DataFrame together with a metadata DataFrame that holds the cell type labels and conditions.

Here we use scanpy to read the simulated sample AnnData data set that comes with augurpy. We then convert it with `load` into the format augurpy expects.

In [2]:
# read the simulated sample data set
adata = sc.read_h5ad("../tests/sc_sim.h5ad")

# convert it into the format augurpy expects
loaded_data = load(adata)

In [3]:
loaded_data

AnnData object with n_obs × n_vars = 600 × 15697
    obs: 'label', 'cell_type', 'y_treatment'
    var: 'name', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'hvg'
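
Conceptually, augurpy treats each cell type as its own prediction task. The cells of one type form a data set, and the estimator tries to predict their condition from them. Here we assume the condition is stored in the `label` column shown above. The pure-Python sketch below illustrates that grouping on hypothetical toy metadata. The condition names `ctrl` and `stim` and the helper `split_by_cell_type` are made up for illustration and are not part of augurpy's API.

```python
from collections import defaultdict

# Hypothetical toy metadata, one (cell_type, label) pair per cell, mirroring
# the `cell_type` and `label` columns of `loaded_data.obs`.
cells = [
    ("CellTypeA", "ctrl"), ("CellTypeA", "stim"),
    ("CellTypeB", "ctrl"), ("CellTypeB", "stim"), ("CellTypeB", "stim"),
    ("CellTypeC", "ctrl"),
]


def split_by_cell_type(cells):
    """Group cell indices and their condition labels by cell type (illustrative helper)."""
    groups = defaultdict(lambda: {"cells": [], "labels": []})
    for idx, (cell_type, label) in enumerate(cells):
        groups[cell_type]["cells"].append(idx)
        groups[cell_type]["labels"].append(label)
    return dict(groups)


per_type = split_by_cell_type(cells)
for cell_type, group in per_type.items():
    print(cell_type, group["cells"], group["labels"])

# A cell type observed under only one condition gives an estimator nothing to separate.
single_condition = [ct for ct, g in per_type.items() if len(set(g["labels"])) < 2]
print(single_condition)  # ['CellTypeC']
```

In this toy example CellTypeC has cells from only one condition, so no estimator could learn to distinguish conditions within it.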

Next, we choose the estimator that measures how well the condition can be predicted within each cell type. Use `random_forest_classifier` or `logistic_regression_classifier` for categorical conditions and `random_forest_regressor` for numerical ones.

In [4]:
random_forest = create_estimator("random_forest_classifier", Params(random_state=42))
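
The rule of thumb for choosing an estimator can be written as a small helper. This is a hypothetical sketch, not an augurpy function. It assumes the conditions arrive as a plain sequence and treats them as numerical only when every value is a real number (booleans excluded). Integer-coded categories would therefore be misread as numerical, so keep categorical conditions as strings.

```python
from numbers import Real


def suggest_estimator_name(conditions, categorical_model="random_forest"):
    """Hypothetical helper: map condition values to one of augurpy's estimator names."""
    values = list(conditions)
    if not values:
        raise ValueError("conditions must not be empty")
    # bool is a subclass of int, so exclude it explicitly from "numerical".
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return "random_forest_regressor"
    if categorical_model == "logistic_regression":
        return "logistic_regression_classifier"
    return "random_forest_classifier"


print(suggest_estimator_name(["ctrl", "stim", "ctrl"]))  # random_forest_classifier
print(suggest_estimator_name([0.5, 1.2, 3.0]))  # random_forest_regressor
print(suggest_estimator_name(["ctrl", "stim"], "logistic_regression"))  # logistic_regression_classifier
```

You could then pass the returned name to `create_estimator` in the same way as in the cell above.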

Then we run augurpy with the function `calculate_auc` and look at the results. 

In [5]:
result_adata, results = calculate_auc(loaded_data, random_forest, random_state=51)

print(results['summary_metrics'])

                  CellTypeA  CellTypeB  CellTypeC
mean_augur_score   0.486111   0.666667   0.791667
mean_auc           0.486111   0.666667   0.791667
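
`calculate_auc` reports an area under the ROC curve (AUC) for each cell type. The AUC is the probability that a randomly chosen cell from one condition receives a higher predicted score than a randomly chosen cell from the other. A value of 0.5 is therefore chance level, and 1.0 is perfect separation. The sketch below computes the AUC for binary labels by pairwise comparison, which is equivalent to the Mann-Whitney U statistic divided by the number of pairs. It then ranks the cell types using the `mean_augur_score` values printed above. It illustrates the metric only and does not reproduce how augurpy trains and evaluates its estimators.

```python
def roc_auc(labels, scores):
    """AUC for binary labels (0/1): fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, which matches the Mann-Whitney U formulation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one cell from each condition")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0 (perfect separation)
print(roc_auc([0, 1, 0, 1], [0.2, 0.4, 0.6, 0.8]))  # 0.75
print(roc_auc([0, 1], [0.5, 0.5]))  # 0.5 (uninformative scores)

# Rank cell types by the mean_augur_score values printed above.
summary = {"CellTypeA": 0.486111, "CellTypeB": 0.666667, "CellTypeC": 0.791667}
ranking = sorted(summary, key=summary.get, reverse=True)
print(ranking)  # ['CellTypeC', 'CellTypeB', 'CellTypeA']
```

Read this way, CellTypeA (about 0.49) sits essentially at chance level in this simulated data set, while the condition is most predictable in CellTypeC.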

The `mean_augur_score` of each cell type is also stored in `result_adata.obs`. The feature importances are available in `results['feature_importances']` for further analysis.
Because estimator performance depends on the data, we recommend trying several estimators.
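
Feature importances can be summarized per cell type, for example to list the genes that contributed most to predicting the condition. The layout of `results['feature_importances']` is not shown in this tutorial. This sketch therefore assumes hypothetical long-format records of `(cell_type, gene, importance)`, possibly with several rows per gene (for instance one per fold or subsample). It averages those rows before taking the top `k`. The gene names and values are toy data, and the helper `top_genes` is illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records (cell_type, gene, importance); the real
# layout of results['feature_importances'] may differ, so inspect it first.
records = [
    ("CellTypeC", "GeneX", 0.25), ("CellTypeC", "GeneX", 0.75),
    ("CellTypeC", "GeneY", 0.125), ("CellTypeC", "GeneY", 0.125),
    ("CellTypeC", "GeneZ", 0.25),
    ("CellTypeB", "GeneY", 0.5), ("CellTypeB", "GeneZ", 0.0625),
]


def top_genes(records, k=2):
    """Average each gene's importance within a cell type and return the k largest."""
    values = defaultdict(list)
    for cell_type, gene, importance in records:
        values[(cell_type, gene)].append(importance)
    per_type = defaultdict(list)
    for (cell_type, gene), importances in values.items():
        per_type[cell_type].append((gene, mean(importances)))
    return {
        cell_type: sorted(genes, key=lambda g: g[1], reverse=True)[:k]
        for cell_type, genes in per_type.items()
    }


print(top_genes(records))
```

Averaging before ranking keeps a single unusually high fold from dominating the list.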