The reduceKappa_wrapper function reduces redundant pathway enrichment
results by clustering similar categories based on their shared genes,
using kappa scores as described in the Metascape paper. It selects a
representative term for each cluster and retains the relevant pathway
information. The function requires information from four columns to
perform the full clustering process: 1) the unique geneset ID
(geneset_id_col), 2) the descriptive name of the geneset (descrip_col),
3) the genes returned with that category (gene_col), and 4) the
significance of the category (sig_col). Check the default arguments and
change them to match the column names in your data frame.
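To make the similarity measure concrete, here is a minimal sketch of a Cohen's kappa score between two gene sets, treating each gene in a shared universe as present in or absent from each term. The helper name gene_set_kappa, the toy gene names, and the choice of universe are assumptions made for illustration; this is not the package's internal code, and the package may define the universe differently.

```r
# Hypothetical helper: Cohen's kappa for the membership of two gene sets
# within a shared universe of unique gene IDs.
gene_set_kappa <- function(genes_a, genes_b, universe) {
  in_a <- universe %in% genes_a
  in_b <- universe %in% genes_b
  n <- length(universe)

  # Observed agreement: genes that are in both sets or in neither set
  p_observed <- sum(in_a == in_b) / n

  # Agreement expected by chance, from the marginal membership counts
  p_expected <- (sum(in_a) * sum(in_b) + sum(!in_a) * sum(!in_b)) / n^2

  # Kappa is undefined when chance agreement is already perfect
  if (p_expected == 1) return(NA_real_)

  (p_observed - p_expected) / (1 - p_expected)
}

# Toy example with made-up gene names
universe <- paste0("gene", 1:10)
term_a <- paste0("gene", 1:4)
term_b <- paste0("gene", 3:6)

kappa_ab <- gene_set_kappa(term_a, term_b, universe)  # partial overlap
kappa_aa <- gene_set_kappa(term_a, term_a, universe)  # identical sets
```

Terms that share most of their genes score close to 1, while terms that overlap no more than expected by chance score near 0, which is what makes the score usable as a similarity measure for clustering.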
There are various ways to use the function to return full or minimal
information for further interpretation. To follow this example, use the
package data file pathway_res. This file contains two trials of
pathway enrichment, with the data_label column indicating the trial.
# install.packages("pak")
# pak::pak("weitzela/reducekappa")
library(reducekappa)
library(dplyr)
data(pathway_res)
Example 1: Reducing Redundant Pathways for a Single Trial
This example processes enrichment results from a single trial
(identified by data_label == "G") and removes redundant pathways while
keeping the most significant category per cluster as the representative
term. If you want to retain all results, do not include the
filter_representative argument in the function call.
pathway_reduce = pathway_res |>
  filter(data_label == "G", FDR < 0.05) |>
  reduceKappa_wrapper(filter_representative = TRUE)
pathway_reduce |>
  head(5) |>
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |>
  knitr::kable(digits = 100)
| data_label | cluster | cluster_id | cluster_term | cluster_size | Geneset.ID | Description | Genes.Returned | P.value | FDR | Genes.Returned_inCluster |
|---|---|---|---|---|---|---|---|---|---|---|
| G | 1 | GO:0005759 | mitochondrial matrix | 3 | GO:0005759 | mitochondrial matrix | ENSRNOG00000… | 1.911564e-47 | 1.211932e-44 | ENSRNOG00000… |
| G | 2 | GO:0016054 | organic acid catabolic process | 13 | GO:0016054 | organic acid catabolic process | ENSRNOG00000… | 4.398785e-24 | 2.266694e-20 | ENSRNOG00000… |
| G | 3 | GO:0045333 | cellular respiration | 23 | GO:0045333 | cellular respiration | ENSRNOG00000… | 2.807665e-17 | 4.822633e-14 | ENSRNOG00000… |
| G | 4 | GO:0007005 | mitochondrion organization | 1 | GO:0007005 | mitochondrion organization | ENSRNOG00000… | 2.340259e-13 | 1.004946e-10 | ENSRNOG00000… |
| G | 5 | GO:0009063 | cellular amino acid catabolic process | 3 | GO:0009063 | cellular amino acid catabolic process | ENSRNOG00000… | 3.334101e-11 | 1.010625e-08 | ENSRNOG00000… |
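The idea of keeping the most significant category per cluster can be sketched on a toy data frame. The term IDs and p-values below are made up, and this only illustrates the selection rule described above; it is not necessarily how reduceKappa_wrapper implements filter_representative.

```r
library(dplyr)

# Toy clustered results (hypothetical IDs and p-values)
toy_clusters <- data.frame(
  cluster    = c(1, 1, 1, 2, 2),
  Geneset.ID = c("term_A", "term_B", "term_C", "term_D", "term_E"),
  P.value    = c(1e-10, 1e-20, 1e-5, 1e-3, 1e-8)
)

# Keep the row with the smallest p-value within each cluster
representatives <- toy_clusters |>
  slice_min(P.value, n = 1, by = cluster, with_ties = FALSE)
```

Here term_B represents cluster 1 and term_E represents cluster 2, because each has the smallest P.value in its cluster.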
Check out the potentially helpful attributes attached to the returned data object:
attributes(pathway_reduce) |> names()
## [1] "class" "row.names" "names"
## [4] "n_og_terms" "n_reduced_terms" "genes_in_cluster_df"
## [7] "cluster_info"
attr(pathway_reduce, "cluster_info") |> head(5) |> knitr::kable()
| cluster | cluster_id | cluster_term | Geneset.ID | Description | cluster_size |
|---|---|---|---|---|---|
| 1 | GO:0005759 | mitochondrial matrix | GO:0005759 | mitochondrial matrix | 3 |
| 1 | GO:0005759 | mitochondrial matrix | GO:0098798 | mitochondrial protein-containing complex | 3 |
| 1 | GO:0005759 | mitochondrial matrix | GO:0000313 | organellar ribosome | 3 |
| 2 | GO:0016054 | organic acid catabolic process | GO:0016054 | organic acid catabolic process | 13 |
| 2 | GO:0016054 | organic acid catabolic process | GO:0044282 | small molecule catabolic process | 13 |
attr(pathway_reduce, "genes_in_cluster_df") |> head(5) |> knitr::kable()
| cluster_id | cluster_term | Genes.Returned |
|---|---|---|
| GO:0005759 | mitochondrial matrix | ENSRNOG00000002840 |
| GO:0005759 | mitochondrial matrix | ENSRNOG00000017032 |
| GO:0005759 | mitochondrial matrix | ENSRNOG00000006930 |
| GO:0005759 | mitochondrial matrix | ENSRNOG00000024128 |
| GO:0005759 | mitochondrial matrix | ENSRNOG00000006375 |
Example 2: Clustering Across Multiple Trials
To create clusters that are consistent across multiple pathway
enrichment trials, use the genes returned by all significant categories,
from every trial, as input information.
pathway_reduce = pathway_res |>
  filter(FDR < 0.05) |>
  # set group_slice to retain one row per remaining significant cluster for each group
  reduceKappa_wrapper(group_slice = "data_label",
                      geneset_id_col = "Geneset.ID", gene_col = "Genes.Returned",
                      sig_col = "P.value", descrip_col = "Description")
pathway_reduce |>
  slice_head(n = 3, by = "data_label") |>
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |>
  knitr::kable(digits = 100)
| data_label | cluster | cluster_id | cluster_term | cluster_size | Geneset.ID | Description | Genes.Returned | P.value | FDR | Genes.Returned_inCluster |
|---|---|---|---|---|---|---|---|---|---|---|
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | GO:0007606 | sensory perception of chemical stimulus | ENSRNOG00000… | 7.134595e-21 | 3.616526e-17 | ENSRNOG00000… |
| C | 2 | GO:0050906 | detection of stimulus involved in sensory perception | 1 | GO:0050906 | detection of stimulus involved in sensory perception | ENSRNOG00000… | 3.733342e-20 | 9.462156e-17 | ENSRNOG00000… |
| C | 3 | GO:0015629 | actin cytoskeleton | 4 | GO:0015629 | actin cytoskeleton | ENSRNOG00000… | 1.174869e-12 | 4.248438e-10 | ENSRNOG00000… |
| G | 21 | GO:0005759 | mitochondrial matrix | 3 | GO:0005759 | mitochondrial matrix | ENSRNOG00000… | 1.911564e-47 | 1.211932e-44 | ENSRNOG00000… |
| G | 22 | GO:0016054 | organic acid catabolic process | 12 | GO:0016054 | organic acid catabolic process | ENSRNOG00000… | 4.398785e-24 | 2.266694e-20 | ENSRNOG00000… |
| G | 23 | GO:0045333 | cellular respiration | 23 | GO:0045333 | cellular respiration | ENSRNOG00000… | 2.807665e-17 | 4.822633e-14 | ENSRNOG00000… |
Example 3: Comparing pathway results across trials (retaining
insignificant pathways in cases that any trial returns them as
significant)
To compare pathway enrichment results across trials while retaining
information about non-significant pathways, use the approach below. It
keeps every pathway that is significant in at least one trial, even if
it is non-significant in the others.
pathway_reduce = pathway_res |>
  # retain all categories that are significant in at least one "data_label" trial
  (\(x) filter(x, Geneset.ID %in% (filter(x, FDR < 0.05) |> pull(Geneset.ID))))() |>
  # create a new column that holds the returned genes only for significant categories (NA otherwise)
  mutate(sig_pathway_genes = ifelse(FDR < 0.05, Genes.Returned, NA)) |>
  reduceKappa_wrapper(gene_col = "sig_pathway_genes")
pathway_reduce |>
  head(5) |>
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |>
  knitr::kable(digits = 100)
| data_label | cluster | cluster_id | cluster_term | cluster_size | Geneset.ID | Description | Genes.Returned | P.value | FDR | sig_pathway_genes |
|---|---|---|---|---|---|---|---|---|---|---|
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | GO:0007606 | sensory perception of chemical stimulus | ENSRNOG00000… | 7.134595e-21 | 3.616526e-17 | ENSRNOG00000… |
| C | 2 | GO:0050906 | detection of stimulus involved in sensory perception | 1 | GO:0050906 | detection of stimulus involved in sensory perception | ENSRNOG00000… | 3.733342e-20 | 9.462156e-17 | ENSRNOG00000… |
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | hsa04740 | Olfactory transduction | ENSRNOG00000… | 9.081453e-17 | 1.680069e-14 | ENSRNOG00000… |
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | GO:0007608 | sensory perception of smell | ENSRNOG00000… | 5.624818e-17 | 9.504067e-14 | ENSRNOG00000… |
| C | 3 | GO:0015629 | actin cytoskeleton | 4 | GO:0015629 | actin cytoskeleton | ENSRNOG00000… | 1.174869e-12 | 4.248438e-10 | ENSRNOG00000… |
# set group_slice = "data_label" to keep the most significant category in the cluster per group
pathway_reduce = pathway_res |>
  # retain all categories that are significant in at least one "data_label" trial
  (\(x) filter(x, Geneset.ID %in% (filter(x, FDR < 0.05) |> pull(Geneset.ID))))() |>
  # create a new column that holds the returned genes only for significant categories (NA otherwise)
  mutate(sig_pathway_genes = ifelse(FDR < 0.05, Genes.Returned, NA)) |>
  reduceKappa_wrapper(gene_col = "sig_pathway_genes", group_slice = "data_label")
pathway_reduce |>
  slice_head(n = 3, by = "data_label") |>
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |>
  knitr::kable(digits = 100)
| data_label | cluster | cluster_id | cluster_term | cluster_size | Geneset.ID | Description | Genes.Returned | P.value | FDR | sig_pathway_genes | sig_pathway_genes_inCluster |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | GO:0007606 | sensory perception of chemical stimulus | ENSRNOG00000… | 7.134595e-21 | 3.616526e-17 | ENSRNOG00000… | ENSRNOG00000… |
| C | 2 | GO:0050906 | detection of stimulus involved in sensory perception | 1 | GO:0050906 | detection of stimulus involved in sensory perception | ENSRNOG00000… | 3.733342e-20 | 9.462156e-17 | ENSRNOG00000… | ENSRNOG00000… |
| C | 3 | GO:0015629 | actin cytoskeleton | 4 | GO:0015629 | actin cytoskeleton | ENSRNOG00000… | 1.174869e-12 | 4.248438e-10 | ENSRNOG00000… | ENSRNOG00000… |
| G | 21 | GO:0005759 | mitochondrial matrix | 3 | GO:0005759 | mitochondrial matrix | ENSRNOG00000… | 1.911564e-47 | 1.211932e-44 | ENSRNOG00000… | ENSRNOG00000… |
| G | 22 | GO:0016054 | organic acid catabolic process | 12 | GO:0016054 | organic acid catabolic process | ENSRNOG00000… | 4.398785e-24 | 2.266694e-20 | ENSRNOG00000… | ENSRNOG00000… |
| G | 23 | GO:0045333 | cellular respiration | 23 | GO:0045333 | cellular respiration | ENSRNOG00000… | 2.807665e-17 | 4.822633e-14 | ENSRNOG00000… | ENSRNOG00000… |
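The "significant in at least one trial" filter used in Example 3 can also be written as a grouped filter, which some readers may find easier to follow than the anonymous function. The sketch below runs on a made-up toy data frame and assumes dplyr 1.1.0 or later for the .by argument; the column names mirror pathway_res, but the values are hypothetical.

```r
library(dplyr)

# Toy results for two trials (hypothetical term IDs and FDR values)
toy_res <- data.frame(
  data_label = c("C", "G", "C", "G", "C", "G"),
  Geneset.ID = c("term_A", "term_A", "term_B", "term_B", "term_C", "term_C"),
  FDR        = c(0.01, 0.20, 0.30, 0.04, 0.50, 0.60)
)

# Keep every row of a geneset if any of its trials passes the FDR cutoff
kept <- toy_res |>
  filter(any(FDR < 0.05), .by = Geneset.ID)
```

term_A and term_B are retained in both trials because each is significant in one of them, while term_C is dropped because it is not significant in either.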