CACONET is a computational framework that can be used to distinguish between "diseased" and "healthy" microbial correlation networks inferred from relative abundance data. It can also be used to identify potential signature interactions characteristic of the networks, offering possible targets for further biological and clinical research. CACONET consists of an inference component for compositional-aware correlation inference, a classification component for the classification of the correlation networks, and an explanation component for extraction of signature interactions from the classifier. Please refer to the publication.
Currently, CACONET supports binary classification. Suppose samples
from diseased and healthy subjects were collected and sequenced for
microbiome profiling, resulting in OTU tables containing relative taxa
abundances, X_0
and X_1
for healthy and disease, respectively. An
example demonstrating data preparation using 16S rRNA amplicon
sequencing of CRC fecal microbiome, downloaded from the The
Microbiome Quality Control project,
can be found in mbqc_baseline_data.R
in the MBQC
directory.
Correlation network inference is done separately for X_0
and X_1
,
using a hierarchical Bayesian method,
BAnOCC. Users are
advised to refer to the original publication and relevant
tutorials for setting suitable
parameter values for their own data. Example of using the above real
data to obtain posterior correlation networks corresponding to healthy
and diseased samples is provided in banocc_mbqc.R
CACONET then performs graph-level classification using the combined posterior correlation networks from the inference step, thereby incorporating posterior uncertainty. The DGCNN algorithm was used and implemented with the Python library StellarGraph.
A greedy algorithm was used to find the top n
most important nodes
that best differentiate the correlation networks. We can then extract
these nodes from the result of the inference step and visually examine
their associations. The file dgcnn_mbqc_nnode.py
contains modules
for classification and explanation. The file process_output_mbqc.R
contains several useful functions for visualization and diagnostics.