In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import logging
import scipy.stats
import numpy as np

import diffxpy.api as de

# Simulate data

Here we simulate data from 50 groups. In a perturbation experiment, these would correspond to different genetic groups, i.e. different perturbations.

In [2]:
from batchglm.api.models.glm_nb import Simulator

sim = Simulator(num_observations=2000, num_features=20)
sim.generate_sample_description(num_batches=0, num_conditions=50)
sim.generate_params()
sim.generate_data()

# Run differential expression test

The pairwise test performs pairwise comparisons between a set of groups.

It needs a parameter `grouping`, which assigns a group to each sample.
This `grouping` can be either a vector of length `num_observations` or a string naming a column in the sample description.
Since we simulated `grouping` with `num_conditions=50` different groups, the pairwise test matrix would be very large, with $50 \times 50 = 2500$ comparisons per gene.
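As a quick sanity check on the size of that matrix, the following sketch counts group pairs using only the standard library. It assumes 50 groups and 20 genes, matching `num_conditions=50` and `num_features=20` in the simulation cell. The variable names are illustrative and not part of diffxpy.

```python
from itertools import combinations

num_groups = 50  # assumption: matches num_conditions in the simulation cell
num_genes = 20   # assumption: matches num_features in the simulation cell

# Full groups x groups matrix, counting self-comparisons and both orderings.
full_matrix_entries = num_groups * num_groups

# Distinct unordered pairs of two different groups.
distinct_pairs = sum(1 for _ in combinations(range(num_groups), 2))

# Total number of p-values if the full matrix were stored for every gene.
total_pvalues = full_matrix_entries * num_genes

print(full_matrix_entries, distinct_pairs, total_pvalues)
```

Only about half of the full matrix carries distinct information, since comparing group A with group B mirrors comparing B with A and the diagonal compares each group with itself.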

To avoid creating a matrix with all comparisons, we choose the z-test with the option `lazy=True`, so that the comparisons are not all executed up front but are evaluated only when specific pairs are requested. This is very fast for the z-test once the model has been fit.
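To illustrate why lazy evaluation is cheap for a z-test, here is a minimal sketch of the idea, not diffxpy's actual implementation. It assumes that each group has one fitted estimate and one standard error for a given gene, and that the estimates of two groups are independent so their variances add. The class `LazyPairwiseZTest` and the toy numbers are hypothetical.

```python
import math


class LazyPairwiseZTest:
    """Hypothetical helper: keeps per-group estimates and tests pairs on request."""

    def __init__(self, estimates, std_errors):
        # estimates[g] and std_errors[g] describe group g for a single gene.
        self.estimates = estimates
        self.std_errors = std_errors
        self._cache = {}  # p-values computed so far, keyed by (group0, group1)

    def pval(self, group0, group1):
        key = (group0, group1)
        if key not in self._cache:
            diff = self.estimates[group0] - self.estimates[group1]
            # Assumption: independent estimates, so the variances add up.
            se = math.sqrt(
                self.std_errors[group0] ** 2 + self.std_errors[group1] ** 2
            )
            z = diff / se
            # Two-sided p-value under a standard normal: P(|Z| >= |z|).
            self._cache[key] = math.erfc(abs(z) / math.sqrt(2))
        return self._cache[key]


# Toy numbers, made up for illustration only.
lazy_test = LazyPairwiseZTest(
    estimates={"0": 1.0, "3": 1.5, "6": 1.0},
    std_errors={"0": 0.1, "3": 0.1, "6": 0.2},
)
p_3_vs_6 = lazy_test.pval("3", "6")
print(p_3_vs_6)
```

Once the estimates and standard errors are available, each requested pair costs only a few arithmetic operations, so nothing is lost by deferring the comparisons until they are asked for.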

In [3]:
logging.getLogger("tensorflow").setLevel(logging.ERROR)
logging.getLogger("batchglm").setLevel(logging.INFO)
logging.getLogger("diffxpy").setLevel(logging.INFO)

import diffxpy.api as de

test = de.test.pairwise(
    data=sim.X,
    grouping="condition",
    test="z-test",
    lazy=True,
    noise_model="nb",
    quick_scale=False,
    sample_description=sim.sample_description,
    batch_size=100,
    training_strategy="DEFAULT",
    dtype="float64"
)


INFO:diffxpy.testing.base:Fitting model...




# Obtaining the results

## Enquire sets of pairwise comparisons

We can now extract specific pairs of comparisons, for example perturbation 0 (negative control) versus all other perturbations:

In [4]:
test.pval_pairs(groups0='0').shape



(1, 50, 20)

Note that leaving `groups1` at its default resulted in a comparison of group 0 against all groups. The output array has shape 1 (length of `groups0`) x 50 (length of `groups1`) x 20 (number of genes).

One can also look at a gene-wise summary table, which contains summaries across all tests for each gene, i.e. the maximum test values across all tests for each gene (refer to the documentation of `test.summary_pairs`):

In [5]:
test.summary_pairs(groups0='0').iloc[:10,:]



| | gene | pval | qval | log2fc | mean |
|---|---|---|---|---|---|
| 0 | 0 | 2.66858e-05 | 0.0003978973 | 1.013137 | 638.3275 |
| 1 | 1 | | | 0.950364 | 489.2615 |
| 2 | 2 | 3.411718e-09 | 4.260383e-07 | 0.852783 | 667.756 |
| 3 | 3 | 4.732726e-06 | 0.0001181998 | 0.98311 | 570.1865 |
| 4 | 4 | 0.0001490201 | 0.001431452 | 0.699208 | 645.81 |
| 5 | 5 | 8.07288e-09 | 7.331643e-07 | 1.303666 | 634.893 |
| 6 | 6 | 0.0008365217 | 0.005255882 | 1.047486 | 638.409 |
| 7 | 7 | 2.182553e-07 | 1.147563e-05 | 1.367049 | 606.5365 |
| 8 | 8 | 1.320851e-08 | 1.078521e-06 | 0.631605 | 600.0685 |
| 9 | 9 | 3.033964e-10 | 7.577326e-08 | 1.403712 | 646.724 |


## Enquire a specific pair of groups

Alternatively, you can request a specific comparison, for example between groups 3 and 6:

In [6]:
test.pval_pairs(groups0='3', groups1='6')

array([[[5.97779647e-03, 1.35173718e-07, 8.23822164e-06, 7.28436147e-01,
         2.59056038e-01, 1.28729135e-02, 1.13659330e-01, 2.67091746e-03,
         4.75259172e-03, 1.66697946e-02, 3.53544729e-01, 2.39231748e-01,
         1.35571109e-01, 8.73283311e-01, 5.41368237e-01, 8.79148885e-05,
         2.37785391e-09, 2.26118205e-06, 4.08627777e-01, 4.53535484e-01]]])

In [7]:
test.pval_pairs(groups0='3', groups1='6').shape

(1, 1, 20)

Similarly, you can look at the log fold changes and the summary table of your selection:

In [8]:
test.log_fold_change_pairs(groups0='3', groups1='6')

array([[[-0.44280891, -1.34214846, -0.84676601,  0.08860507,
          0.19500511, -0.51743878, -0.27103217, -0.53845934,
          0.5337684 , -0.42971055, -0.27820667,  0.19151138,
         -0.21618617,  0.03981095,  0.0919207 , -0.72986699,
          0.82566278,  0.82933637, -0.15624134, -0.18243065]]])

In [9]:
test.summary_pair(group0='3', group1='6').iloc[:10,:]

| | gene | pval | qval | log2fc | mean |
|---|---|---|---|---|---|
| 0 | 0 | 0.005977796 | 0.014944 | -0.638838 | 638.3275 |
| 1 | 1 | 1.351737e-07 | 1e-06 | -1.936311 | 489.2615 |
| 2 | 2 | 8.238222e-06 | 4.1e-05 | -1.221625 | 667.756 |
| 3 | 3 | 0.7284361 | 0.766775 | 0.12783 | 570.1865 |
| 4 | 4 | 0.259056 | 0.37008 | 0.281333 | 645.81 |
| 5 | 5 | 0.01287291 | 0.028606 | -0.746506 | 634.893 |
| 6 | 6 | 0.1136593 | 0.206653 | -0.391017 | 638.409 |
| 7 | 7 | 0.002670917 | 0.008903 | -0.776833 | 606.5365 |
| 8 | 8 | 0.004752592 | 0.013579 | 0.770065 | 600.0685 |
| 9 | 9 | 0.01666979 | 0.03334 | -0.619941 | 646.724 |
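The values from `log_fold_change_pairs` and the `log2fc` column of the summary table differ by a constant factor. The sketch below checks, for the first four genes, that dividing the former by $\ln 2$ reproduces the latter, which suggests that `log_fold_change_pairs` reports natural-log fold changes here. This is an observation from the printed numbers in this notebook, not a statement taken from the diffxpy documentation.

```python
import math

# First four values copied from the log_fold_change_pairs output above.
ln_fold_changes = [-0.44280891, -1.34214846, -0.84676601, 0.08860507]

# Matching log2fc entries copied from the summary table above.
summary_log2fc = [-0.638838, -1.936311, -1.221625, 0.12783]

# Change of base: log2(x) = ln(x) / ln(2).
converted = [value / math.log(2) for value in ln_fold_changes]

# Compare up to the rounding of the printed table values.
matches = [
    math.isclose(c, s, abs_tol=1e-5)
    for c, s in zip(converted, summary_log2fc)
]
print(matches)
```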
