reducekappa

The reduceKappa_wrapper function reduces redundancy in pathway enrichment results by clustering categories that share many genes, using kappa scores as the similarity measure, as described in the Metascape paper. It selects a representative term for each cluster and retains the relevant pathway information. The function needs four columns to perform the full clustering process: 1) a unique geneset ID, 2) a descriptive name for the geneset, 3) the genes returned with that category, and 4) the significance of the category. Check the default arguments and change them to match the column names in your dataframe.
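
To make the kappa score concrete, here is a minimal sketch of Cohen's kappa between two gene sets, treating every gene in a shared background as one observation (member or non-member of each set). The function name kappa_score, the toy gene IDs, and the choice of background are assumptions for illustration; this is not the package's internal code, which may differ in its details.

```r
# Cohen's kappa between two gene sets over a common background of genes.
# kappa_score() is a hypothetical helper, not a function exported by reducekappa.
kappa_score <- function(genes_a, genes_b, universe) {
  in_a <- universe %in% genes_a
  in_b <- universe %in% genes_b
  # Observed agreement: fraction of genes that are in both sets or in neither.
  observed <- mean(in_a == in_b)
  # Agreement expected by chance, given how large each set is.
  expected <- mean(in_a) * mean(in_b) + mean(!in_a) * mean(!in_b)
  # Kappa is undefined when chance agreement is already perfect.
  if (isTRUE(all.equal(expected, 1))) return(NA_real_)
  (observed - expected) / (1 - expected)
}

# Toy data: made-up gene IDs, not taken from pathway_res.
universe <- paste0("g", 1:10)
term_a <- paste0("g", 1:5)
term_b <- paste0("g", c(1:4, 6))
term_c <- paste0("g", 7:9)

kappa_ab <- kappa_score(term_a, term_b, universe)  # heavy overlap, about 0.6
kappa_ac <- kappa_score(term_a, term_c, universe)  # no overlap, about -0.6
```

Categories whose gene lists overlap heavily score close to 1 and are candidates for the same cluster, while categories with little or no overlap score near or below 0.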

There are various ways to use the function to return full or minimal information for further interpretation. To follow this example, use the package data file pathway_res. This file contains two trials of pathway enrichment, with the data_label column indicating the trial.

# install.packages("pak")
# pak::pak("weitzela/reducekappa")

library(reducekappa)
library(dplyr)
data(pathway_res)

Example 1: Reducing Redundant Pathways for a Single Trial
This example processes enrichment results from a single trial (identified by data_label == "G") and removes redundant pathways, keeping the most significant category in each cluster as the representative term. To retain all results, omit the filter_representative argument from the function call.

pathway_reduce = pathway_res |> 
  filter(data_label == "G", FDR < 0.05) |> 
  reduceKappa_wrapper(filter_representative = TRUE)
pathway_reduce |> 
  head(5) |> 
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |> 
  knitr::kable(digits = 100)

| data_label | cluster | cluster_id | cluster_term | cluster_size | Geneset.ID | Description | Genes.Returned | P.value | FDR | Genes.Returned_inCluster |
|---|---|---|---|---|---|---|---|---|---|---|
| G | 1 | GO:0005759 | mitochondrial matrix | 3 | GO:0005759 | mitochondrial matrix | ENSRNOG00000… | 1.911564e-47 | 1.211932e-44 | ENSRNOG00000… |
| G | 2 | GO:0016054 | organic acid catabolic process | 13 | GO:0016054 | organic acid catabolic process | ENSRNOG00000… | 4.398785e-24 | 2.266694e-20 | ENSRNOG00000… |
| G | 3 | GO:0045333 | cellular respiration | 23 | GO:0045333 | cellular respiration | ENSRNOG00000… | 2.807665e-17 | 4.822633e-14 | ENSRNOG00000… |
| G | 4 | GO:0007005 | mitochondrion organization | 1 | GO:0007005 | mitochondrion organization | ENSRNOG00000… | 2.340259e-13 | 1.004946e-10 | ENSRNOG00000… |
| G | 5 | GO:0009063 | cellular amino acid catabolic process | 3 | GO:0009063 | cellular amino acid catabolic process | ENSRNOG00000… | 3.334101e-11 | 1.010625e-08 | ENSRNOG00000… |

The returned data object also carries several potentially helpful attributes:

attributes(pathway_reduce) |> names()

## [1] "class"               "row.names"           "names"              
## [4] "n_og_terms"          "n_reduced_terms"     "genes_in_cluster_df"
## [7] "cluster_info"

attr(pathway_reduce, "cluster_info") |> head(5) |> knitr::kable()

| cluster | cluster_id | cluster_term | Geneset.ID | Description | cluster_size |
|---|---|---|---|---|---|
| 1 | GO:0005759 | mitochondrial matrix | GO:0005759 | mitochondrial matrix | 3 |
| 1 | GO:0005759 | mitochondrial matrix | GO:0098798 | mitochondrial protein-containing complex | 3 |
| 1 | GO:0005759 | mitochondrial matrix | GO:0000313 | organellar ribosome | 3 |
| 2 | GO:0016054 | organic acid catabolic process | GO:0016054 | organic acid catabolic process | 13 |
| 2 | GO:0016054 | organic acid catabolic process | GO:0044282 | small molecule catabolic process | 13 |

attr(pathway_reduce, "genes_in_cluster_df") |> head(5) |> knitr::kable()

| cluster_id | cluster_term | Genes.Returned |
|---|---|---|
| GO:0005759 | mitochondrial matrix | ENSRNOG00000002840 |
| GO:0005759 | mitochondrial matrix | ENSRNOG00000017032 |
| GO:0005759 | mitochondrial matrix | ENSRNOG00000006930 |
| GO:0005759 | mitochondrial matrix | ENSRNOG00000024128 |
| GO:0005759 | mitochondrial matrix | ENSRNOG00000006375 |
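
One way to use the cluster_info attribute is to list which categories were folded into each representative term. The sketch below is only an illustration: it rebuilds the five cluster_info rows printed above as a small stand-in data frame so that it runs without the package, and the variable names are assumptions. With a real result, attr(pathway_reduce, "cluster_info") would take the place of the stand-in.

```r
# Stand-in for attr(pathway_reduce, "cluster_info"), copied from the rows shown above.
cluster_info <- data.frame(
  cluster = c(1, 1, 1, 2, 2),
  cluster_term = c(rep("mitochondrial matrix", 3),
                   rep("organic acid catabolic process", 2)),
  Geneset.ID = c("GO:0005759", "GO:0098798", "GO:0000313",
                 "GO:0016054", "GO:0044282"),
  Description = c("mitochondrial matrix",
                  "mitochondrial protein-containing complex",
                  "organellar ribosome",
                  "organic acid catabolic process",
                  "small molecule catabolic process")
)

# Group the member descriptions by their representative term.
members <- split(cluster_info$Description, cluster_info$cluster_term)
members[["mitochondrial matrix"]]
```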

Example 2: Clustering Across Multiple Trials
To create consistent clusters that apply across multiple pathway enrichment trials, use all of the genes returned by significant categories, from every trial, as the input information.

pathway_reduce = pathway_res |> 
  filter(FDR < 0.05) |> 
  # set group_slice to retain one row per remaining significant cluster for each group
  reduceKappa_wrapper(group_slice = "data_label", 
                      geneset_id_col = "Geneset.ID", gene_col = "Genes.Returned", 
                      sig_col = "P.value", descrip_col = "Description")
pathway_reduce |> 
  slice_head(n = 3, by = "data_label") |> 
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |> 
  knitr::kable(digits = 100)

| data_label | cluster | cluster_id | cluster_term | cluster_size | Geneset.ID | Description | Genes.Returned | P.value | FDR | Genes.Returned_inCluster |
|---|---|---|---|---|---|---|---|---|---|---|
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | GO:0007606 | sensory perception of chemical stimulus | ENSRNOG00000… | 7.134595e-21 | 3.616526e-17 | ENSRNOG00000… |
| C | 2 | GO:0050906 | detection of stimulus involved in sensory perception | 1 | GO:0050906 | detection of stimulus involved in sensory perception | ENSRNOG00000… | 3.733342e-20 | 9.462156e-17 | ENSRNOG00000… |
| C | 3 | GO:0015629 | actin cytoskeleton | 4 | GO:0015629 | actin cytoskeleton | ENSRNOG00000… | 1.174869e-12 | 4.248438e-10 | ENSRNOG00000… |
| G | 21 | GO:0005759 | mitochondrial matrix | 3 | GO:0005759 | mitochondrial matrix | ENSRNOG00000… | 1.911564e-47 | 1.211932e-44 | ENSRNOG00000… |
| G | 22 | GO:0016054 | organic acid catabolic process | 12 | GO:0016054 | organic acid catabolic process | ENSRNOG00000… | 4.398785e-24 | 2.266694e-20 | ENSRNOG00000… |
| G | 23 | GO:0045333 | cellular respiration | 23 | GO:0045333 | cellular respiration | ENSRNOG00000… | 2.807665e-17 | 4.822633e-14 | ENSRNOG00000… |

Example 3: Comparing Pathway Results Across Trials (retaining non-significant pathways when any trial returns them as significant)
To compare pathway enrichment results across trials while retaining information about non-significant pathways, use the following approach. It keeps every pathway that is significant in at least one trial, even if it is non-significant in the others.

pathway_reduce = pathway_res |> 
  # retain all categories that are significant in either "data_label" trial.
  (\(x) filter(x, Geneset.ID %in% (filter(x, FDR < 0.05) |> pull(Geneset.ID))))() |> 
  # create a new column that includes only the genes returned by significant categories
  mutate(sig_pathway_genes = ifelse(FDR < 0.05, Genes.Returned, NA)) |> 
  reduceKappa_wrapper(gene_col = "sig_pathway_genes")

pathway_reduce |> 
  head(5) |> 
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |> 
  knitr::kable(digits = 100)

| data_label | cluster | cluster_id | cluster_term | cluster_size | Geneset.ID | Description | Genes.Returned | P.value | FDR | sig_pathway_genes |
|---|---|---|---|---|---|---|---|---|---|---|
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | GO:0007606 | sensory perception of chemical stimulus | ENSRNOG00000… | 7.134595e-21 | 3.616526e-17 | ENSRNOG00000… |
| C | 2 | GO:0050906 | detection of stimulus involved in sensory perception | 1 | GO:0050906 | detection of stimulus involved in sensory perception | ENSRNOG00000… | 3.733342e-20 | 9.462156e-17 | ENSRNOG00000… |
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | hsa04740 | Olfactory transduction | ENSRNOG00000… | 9.081453e-17 | 1.680069e-14 | ENSRNOG00000… |
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | GO:0007608 | sensory perception of smell | ENSRNOG00000… | 5.624818e-17 | 9.504067e-14 | ENSRNOG00000… |
| C | 3 | GO:0015629 | actin cytoskeleton | 4 | GO:0015629 | actin cytoskeleton | ENSRNOG00000… | 1.174869e-12 | 4.248438e-10 | ENSRNOG00000… |

# set group_slice = "data_label" to keep the most significant category in the cluster per group
pathway_reduce = pathway_res |> 
  # retain all categories that are significant in either "data_label" trial.
  (\(x) filter(x, Geneset.ID %in% (filter(x, FDR < 0.05) |> pull(Geneset.ID))))() |> 
  # create a new column that includes only the genes returned by significant categories
  mutate(sig_pathway_genes = ifelse(FDR < 0.05, Genes.Returned, NA)) |> 
  reduceKappa_wrapper(gene_col = "sig_pathway_genes", group_slice = "data_label")

pathway_reduce |> 
  slice_head(n = 3, by = "data_label") |> 
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |> 
  knitr::kable(digits = 100)

| data_label | cluster | cluster_id | cluster_term | cluster_size | Geneset.ID | Description | Genes.Returned | P.value | FDR | sig_pathway_genes | sig_pathway_genes_inCluster |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C | 1 | GO:0007606 | sensory perception of chemical stimulus | 4 | GO:0007606 | sensory perception of chemical stimulus | ENSRNOG00000… | 7.134595e-21 | 3.616526e-17 | ENSRNOG00000… | ENSRNOG00000… |
| C | 2 | GO:0050906 | detection of stimulus involved in sensory perception | 1 | GO:0050906 | detection of stimulus involved in sensory perception | ENSRNOG00000… | 3.733342e-20 | 9.462156e-17 | ENSRNOG00000… | ENSRNOG00000… |
| C | 3 | GO:0015629 | actin cytoskeleton | 4 | GO:0015629 | actin cytoskeleton | ENSRNOG00000… | 1.174869e-12 | 4.248438e-10 | ENSRNOG00000… | ENSRNOG00000… |
| G | 21 | GO:0005759 | mitochondrial matrix | 3 | GO:0005759 | mitochondrial matrix | ENSRNOG00000… | 1.911564e-47 | 1.211932e-44 | ENSRNOG00000… | ENSRNOG00000… |
| G | 22 | GO:0016054 | organic acid catabolic process | 12 | GO:0016054 | organic acid catabolic process | ENSRNOG00000… | 4.398785e-24 | 2.266694e-20 | ENSRNOG00000… | ENSRNOG00000… |
| G | 23 | GO:0045333 | cellular respiration | 23 | GO:0045333 | cellular respiration | ENSRNOG00000… | 2.807665e-17 | 4.822633e-14 | ENSRNOG00000… | ENSRNOG00000… |
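
The "significant in at least one trial" filter from Example 3 can be hard to read as an anonymous function, so here is the same idea spelled out on a toy table. The geneset IDs, gene names, comma separator, and FDR values below are made up for illustration and are not taken from pathway_res; only the column names follow the package data.

```r
library(dplyr)

# Hypothetical enrichment results for two trials ("C" and "G").
toy <- data.frame(
  data_label = c("C", "G", "C", "G", "C", "G"),
  Geneset.ID = c("set1", "set1", "set2", "set2", "set3", "set3"),
  FDR = c(0.01, 0.20, 0.30, 0.04, 0.50, 0.60),
  Genes.Returned = c("a,b", "a", "c", "c,d", "e", "e,f")
)

# IDs of categories that pass the threshold in any trial.
sig_ids <- toy |>
  filter(FDR < 0.05) |>
  pull(Geneset.ID) |>
  unique()

kept <- toy |>
  # keep every row of those categories, including the non-significant trial
  filter(Geneset.ID %in% sig_ids) |>
  # blank out genes from rows that are not significant themselves
  mutate(sig_pathway_genes = ifelse(FDR < 0.05, Genes.Returned, NA_character_))
```

Here set1 and set2 each keep both of their rows because one trial is significant, set3 is dropped entirely, and sig_pathway_genes is NA for the two non-significant rows that were retained.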
