In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import logging

import scipy.stats
import numpy as np

# Simulate data

Here we simulate data from 40 groups. In a perturbation experiment, these would correspond to different genetic groups, i.e. different perturbations.

In [None]:
from batchglm.api.models.nb_glm import Simulator

sim = Simulator(num_observations=400, num_features=20)
sim.generate_sample_description(num_batches=40, num_conditions=0)
# sample parameters from positive truncated N(1, 0.1)
mu=1; phi=0.1
# np.inf replaces the alias np.infty, which was removed in NumPy 2.0
sim.generate_params(rand_fn_loc=lambda size: mu + scipy.stats.truncnorm.rvs(-mu / phi, np.inf, scale=phi, size=size))
sim.generate_data()

# count data
X = sim.X
# sample description
sample_description = sim.sample_description
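
The sampling function passed as `rand_fn_loc` above can be checked in isolation. The following is a minimal sketch, assuming the same `mu` and `phi` as in the cell above; the seed and the sample size of 10,000 are arbitrary choices made for illustration. `scipy.stats.truncnorm` takes its bounds in units of `scale`, so a lower bound of `-mu / phi` cuts the shifted distribution off at zero.

```python
import numpy as np
import scipy.stats

mu, phi = 1.0, 0.1
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Lower bound a = -mu / phi is expressed in units of `scale`, i.e. the
# unshifted variable is truncated at -mu and the shifted one at 0.
draws = mu + scipy.stats.truncnorm.rvs(
    -mu / phi, np.inf, scale=phi, size=10_000, random_state=rng
)

print("all draws non-negative:", bool((draws >= 0).all()))
print("sample mean:", draws.mean())
print("sample standard deviation:", draws.std())
```

With the truncation point ten standard deviations below the mean, the draws behave almost exactly like an untruncated N(1, 0.1) sample.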

# Run differential expression test

The pairwise test performs pairwise comparisons between a set of groups.

It needs a parameter `grouping` which assigns a group to each sample.
This `grouping` can either be a vector of length `num_observations` or a string specifying a column in the sample description.
Since we simulated `num_batches=40` different groups, the full pairwise test matrix would be very large, with $40 \times 40 = 1600$ comparisons per gene.
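
To make the size of the full comparison matrix concrete, the short sketch below counts its entries. It assumes the groups are labelled with the strings `'0'` to `'39'`, which matches the labels queried later in this notebook but is not taken from the simulator's documentation.

```python
from itertools import combinations

num_groups = 40
# Assumed labels '0' ... '39' (illustrative, see the note above).
group_names = [str(i) for i in range(num_groups)]

# Every ordered (group0, group1) cell of the matrix, diagonal included.
full_matrix_entries = num_groups * num_groups

# Unordered pairs of distinct groups, i.e. the non-redundant comparisons.
unique_pairs = list(combinations(group_names, 2))

print("entries in the full matrix:", full_matrix_entries)
print("unique pairs of distinct groups:", len(unique_pairs))
```

Even the non-redundant half of the matrix is large, and each entry holds one value per gene.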

To avoid creating a matrix with all comparisons, we choose the z-test with the option `lazy=True`: comparisons are not all executed up front but are evaluated only when specific pairs are requested. This is very fast for the z-test once the model has been fit.
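
The idea behind lazy evaluation can be sketched as follows. This is a generic two-sample z-test on per-group parameter estimates and their standard errors, not diffxpy's actual implementation: the class `LazyPairwiseZTest`, its method `pvalues`, and the randomly generated estimates are all hypothetical and serve only as an illustration.

```python
import numpy as np
from scipy.stats import norm


class LazyPairwiseZTest:
    """Hypothetical helper: keeps per-group estimates, tests a pair on request."""

    def __init__(self, groups, theta, theta_sd):
        # theta and theta_sd have shape (num_groups, num_genes).
        self.index = {g: i for i, g in enumerate(groups)}
        self.theta = np.asarray(theta, dtype=float)
        self.theta_sd = np.asarray(theta_sd, dtype=float)

    def pvalues(self, group0, group1):
        """Two-sided z-test p-values, one per gene, for a single pair."""
        i, j = self.index[group0], self.index[group1]
        diff = self.theta[i] - self.theta[j]
        # Standard error of the difference of two independent estimates.
        se = np.sqrt(self.theta_sd[i] ** 2 + self.theta_sd[j] ** 2)
        return 2.0 * norm.sf(np.abs(diff / se))


# Placeholder estimates standing in for a fitted model (not real results).
rng = np.random.default_rng(0)
groups = [str(g) for g in range(40)]
theta = rng.normal(size=(40, 20))
theta_sd = np.full((40, 20), 0.1)

lazy_test = LazyPairwiseZTest(groups, theta, theta_sd)
pvals_3_vs_6 = lazy_test.pvalues("3", "6")  # only this pair is computed
print(pvals_3_vs_6.shape)
```

Because each request only touches two rows of the stored estimates, the cost of a query does not depend on the total number of groups.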

In [None]:
logging.getLogger("tensorflow").setLevel(logging.ERROR)
logging.getLogger("batchglm").setLevel(logging.INFO)
logging.getLogger("diffxpy").setLevel(logging.INFO)

import diffxpy.api as de

test = de.test.pairwise(
    data=X,
    grouping="batch",
    test="z-test",
    lazy=True,
    noise_model="nb",
    quick_scale=False,
    sample_description=sample_description
)


# Obtaining the results

## Enquire sets of pairwise comparisons

We can now extract specific pairs of comparisons, for example perturbation 0 (negative control) versus all other perturbations:

In [None]:
test.pval_pairs(groups0='0')

In [None]:
test.pval_pairs(groups0='0').shape

Note that leaving `groups1` at its default resulted in a comparison of group 0 against all groups. The output array has shape 1 (length of `groups0`) x 40 (length of `groups1`) x 20 (number of genes).

One can also look at a gene-wise summary table, which contains summaries across all tests for each gene, i.e. maximum test values across all tests for each gene (refer to the documentation of `test.summary_pairs`):

In [None]:
test.summary_pairs(groups0='0').iloc[:10,:]
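
A gene-wise summary amounts to reducing the comparison axes of the result array. The sketch below shows one such reduction, the smallest p-value per gene, on an array with the same 1 x 40 x 20 layout as above. The p-values are random placeholders rather than real results, and which statistics `summary_pairs` actually reports is not assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder p-values with layout (len(groups0), len(groups1), num_genes).
pvals = rng.uniform(size=(1, 40, 20))

# Collapse both group axes so that one value per gene remains.
min_pval_per_gene = pvals.min(axis=(0, 1))

# Index of the comparison partner that produced each gene's minimum.
best_partner_per_gene = pvals[0].argmin(axis=0)

print(min_pval_per_gene.shape)
print(best_partner_per_gene[:5])
```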

## Enquire a specific pair of groups

Alternatively, you can specifically enquire a comparison between groups 3 and 6:

In [None]:
test.pval_pairs(groups0='3', groups1='6')

In [None]:
test.pval_pairs(groups0='3', groups1='6').shape

Similarly, you can look at the log fold changes and a summary table for your selection:

In [None]:
test.log_fold_change_pairs(groups0='3', groups1='6')

In [None]:
test.summary_pair(group0='3', group1='6').iloc[:10,:]