reducekappa

The reduceKappa_wrapper function reduces redundancy in pathway enrichment results by clustering categories that share genes, using kappa scores as the similarity measure, as described in the Metascape paper. It selects a representative term for each cluster and retains the relevant pathway information. The full clustering process requires four columns: 1) a unique geneset ID, 2) a descriptive name for the geneset, 3) the genes returned with that category, and 4) the significance of the category. Check the default arguments for these columns (geneset_id_col, descrip_col, gene_col, and sig_col) and change them to match your data frame.
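To make the similarity measure concrete, here is a minimal base R sketch of Cohen's kappa between two gene sets. It is an illustration only, not the package's internal code: the function name kappa_score, the toy gene IDs, and the choice of gene universe (here, just the ten genes in the example) are all assumptions.

```r
# Cohen's kappa between two gene sets, measured over a shared gene universe.
# kappa_score is a hypothetical helper, not a function exported by reducekappa.
kappa_score <- function(genes_a, genes_b, universe) {
  in_a <- universe %in% genes_a
  in_b <- universe %in% genes_b
  # Observed agreement: genes that are in both sets or in neither set.
  observed <- mean(in_a == in_b)
  # Agreement expected by chance, given how large each set is.
  expected <- mean(in_a) * mean(in_b) + mean(!in_a) * mean(!in_b)
  # Kappa is undefined when chance agreement is already complete.
  if (expected == 1) return(NA_real_)
  (observed - expected) / (1 - expected)
}

# Toy data (made up for illustration).
universe <- paste0("g", 1:10)
term_a <- paste0("g", 1:5)
term_b <- c(paste0("g", 1:4), "g6")
term_c <- paste0("g", 7:10)

kappa_score(term_a, term_b, universe)  # terms sharing most of their genes
kappa_score(term_a, term_c, universe)  # terms sharing no genes
```

With these toy sets, the first pair scores about 0.6 and the second about -0.8, so a clustering step based on kappa would group term_a with term_b but not with term_c.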

The function can be used in several ways, returning either full or minimal information for further interpretation. To follow the examples below, use the package data file pathway_res. This file contains two trials of pathway enrichment, with the data_label column indicating the trial.

# install.packages("pak")
# pak::pak("weitzela/reducekappa")

library(reducekappa)
library(dplyr)
data(pathway_res)

Example 1: Reducing Redundant Pathways for a Single Trial
This example processes enrichment results from a single trial (identified by data_label == "G") and removes redundant pathways while keeping the most significant category per cluster as the representative term. If you want to retain all results, do not include the filter_representative argument in the function call.

pathway_reduce = pathway_res |> 
  filter(data_label == "G", FDR < 0.05) |> 
  reduceKappa_wrapper(filter_representative = TRUE)
pathway_reduce |> 
  head(5) |> 
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |> 
  knitr::kable(digits = 100)

data_label	cluster	cluster_id	cluster_term	cluster_size	Geneset.ID	Description	Genes.Returned	P.value	FDR	Genes.Returned_inCluster
G	1	GO:0005759	mitochondrial matrix	3	GO:0005759	mitochondrial matrix	ENSRNOG00000…	1.911564e-47	1.211932e-44	ENSRNOG00000…
G	2	GO:0016054	organic acid catabolic process	13	GO:0016054	organic acid catabolic process	ENSRNOG00000…	4.398785e-24	2.266694e-20	ENSRNOG00000…
G	3	GO:0045333	cellular respiration	23	GO:0045333	cellular respiration	ENSRNOG00000…	2.807665e-17	4.822633e-14	ENSRNOG00000…
G	4	GO:0007005	mitochondrion organization	1	GO:0007005	mitochondrion organization	ENSRNOG00000…	2.340259e-13	1.004946e-10	ENSRNOG00000…
G	5	GO:0009063	cellular amino acid catabolic process	3	GO:0009063	cellular amino acid catabolic process	ENSRNOG00000…	3.334101e-11	1.010625e-08	ENSRNOG00000…

The returned data object also carries several potentially helpful attributes:

attributes(pathway_reduce) |> names()

## [1] "class"               "row.names"           "names"              
## [4] "n_og_terms"          "n_reduced_terms"     "genes_in_cluster_df"
## [7] "cluster_info"

attr(pathway_reduce, "cluster_info") |> head(5) |> knitr::kable()

cluster	cluster_id	cluster_term	Geneset.ID	Description	cluster_size
1	GO:0005759	mitochondrial matrix	GO:0005759	mitochondrial matrix	3
1	GO:0005759	mitochondrial matrix	GO:0098798	mitochondrial protein-containing complex	3
1	GO:0005759	mitochondrial matrix	GO:0000313	organellar ribosome	3
2	GO:0016054	organic acid catabolic process	GO:0016054	organic acid catabolic process	13
2	GO:0016054	organic acid catabolic process	GO:0044282	small molecule catabolic process	13

attr(pathway_reduce, "genes_in_cluster_df") |> head(5) |> knitr::kable()

cluster_id	cluster_term	Genes.Returned
GO:0005759	mitochondrial matrix	ENSRNOG00000002840
GO:0005759	mitochondrial matrix	ENSRNOG00000017032
GO:0005759	mitochondrial matrix	ENSRNOG00000006930
GO:0005759	mitochondrial matrix	ENSRNOG00000024128
GO:0005759	mitochondrial matrix	ENSRNOG00000006375

Example 2: Clustering Across Multiple Trials
To create consistent clusters that apply across multiple pathway enrichment trials, use all of the genes returned by significant categories, from every trial, as the input information. Setting group_slice then retains one row per cluster for each group.

pathway_reduce = pathway_res |> 
  filter(FDR < 0.05) |> 
  # set group_slice to retain one row per remaining significant cluster for each group
  reduceKappa_wrapper(group_slice = "data_label", 
                      geneset_id_col = "Geneset.ID", gene_col = "Genes.Returned", 
                      sig_col = "P.value", descrip_col = "Description")
pathway_reduce |> 
  slice_head(n = 3, by = "data_label") |> 
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |> 
  knitr::kable(digits = 100)

data_label	cluster	cluster_id	cluster_term	cluster_size	Geneset.ID	Description	Genes.Returned	P.value	FDR	Genes.Returned_inCluster
C	1	GO:0007606	sensory perception of chemical stimulus	4	GO:0007606	sensory perception of chemical stimulus	ENSRNOG00000…	7.134595e-21	3.616526e-17	ENSRNOG00000…
C	2	GO:0050906	detection of stimulus involved in sensory perception	1	GO:0050906	detection of stimulus involved in sensory perception	ENSRNOG00000…	3.733342e-20	9.462156e-17	ENSRNOG00000…
C	3	GO:0015629	actin cytoskeleton	4	GO:0015629	actin cytoskeleton	ENSRNOG00000…	1.174869e-12	4.248438e-10	ENSRNOG00000…
G	21	GO:0005759	mitochondrial matrix	3	GO:0005759	mitochondrial matrix	ENSRNOG00000…	1.911564e-47	1.211932e-44	ENSRNOG00000…
G	22	GO:0016054	organic acid catabolic process	12	GO:0016054	organic acid catabolic process	ENSRNOG00000…	4.398785e-24	2.266694e-20	ENSRNOG00000…
G	23	GO:0045333	cellular respiration	23	GO:0045333	cellular respiration	ENSRNOG00000…	2.807665e-17	4.822633e-14	ENSRNOG00000…

Example 3: Comparing pathway results across trials (retaining insignificant pathways in cases that any trial returns them as significant)
To compare pathway enrichment results across trials while retaining information about non-significant pathways, use the following approach. It keeps every pathway that is significant in at least one trial, even if it is non-significant in the others.

pathway_reduce = pathway_res |> 
  # retain all categories that are significant in either "data_label" trial.
  (\(x) filter(x, Geneset.ID %in% (filter(x, FDR < 0.05) |> pull(Geneset.ID))))() |> 
  # create a new column that includes only the genes returned by significant categories
  mutate(sig_pathway_genes = ifelse(FDR < 0.05, Genes.Returned, NA)) |> 
  reduceKappa_wrapper(gene_col = "sig_pathway_genes")

pathway_reduce |> 
  head(5) |> 
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |> 
  knitr::kable(digits = 100)

data_label	cluster	cluster_id	cluster_term	cluster_size	Geneset.ID	Description	Genes.Returned	P.value	FDR	sig_pathway_genes
C	1	GO:0007606	sensory perception of chemical stimulus	4	GO:0007606	sensory perception of chemical stimulus	ENSRNOG00000…	7.134595e-21	3.616526e-17	ENSRNOG00000…
C	2	GO:0050906	detection of stimulus involved in sensory perception	1	GO:0050906	detection of stimulus involved in sensory perception	ENSRNOG00000…	3.733342e-20	9.462156e-17	ENSRNOG00000…
C	1	GO:0007606	sensory perception of chemical stimulus	4	hsa04740	Olfactory transduction	ENSRNOG00000…	9.081453e-17	1.680069e-14	ENSRNOG00000…
C	1	GO:0007606	sensory perception of chemical stimulus	4	GO:0007608	sensory perception of smell	ENSRNOG00000…	5.624818e-17	9.504067e-14	ENSRNOG00000…
C	3	GO:0015629	actin cytoskeleton	4	GO:0015629	actin cytoskeleton	ENSRNOG00000…	1.174869e-12	4.248438e-10	ENSRNOG00000…
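The two preparation steps in the pipeline above (keep every category that is significant in any trial, then blank out the genes of its non-significant rows) can be easier to follow on a tiny table. The sketch below uses base R and made-up data; the geneset IDs, FDR values, and comma-separated gene strings are assumptions for illustration and do not come from pathway_res.

```r
# Toy enrichment results for two trials (values invented for illustration).
toy <- data.frame(
  data_label     = c("C", "G", "C", "G", "C", "G"),
  Geneset.ID     = c("set1", "set1", "set2", "set2", "set3", "set3"),
  FDR            = c(0.01, 0.20, 0.30, 0.40, 0.60, 0.02),
  Genes.Returned = c("a,b", "a", "c", "c,d", "e", "e,f"),
  stringsAsFactors = FALSE
)

# Step 1: IDs of categories that are significant in at least one trial.
sig_ids <- unique(toy$Geneset.ID[toy$FDR < 0.05])

# Keep every row for those categories, including the non-significant ones.
kept <- toy[toy$Geneset.ID %in% sig_ids, ]

# Step 2: keep genes only where the row itself is significant.
kept$sig_pathway_genes <- ifelse(kept$FDR < 0.05, kept$Genes.Returned, NA)

kept
```

Here set2 is dropped because it is never significant, while set1 and set3 each keep both of their rows; only the significant row of each contributes genes to sig_pathway_genes.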

# set group_slice = "data_label" to keep the most significant category in the cluster per group
pathway_reduce = pathway_res |> 
  # retain all categories that are significant in either "data_label" trial.
  (\(x) filter(x, Geneset.ID %in% (filter(x, FDR < 0.05) |> pull(Geneset.ID))))() |> 
  # create a new column that includes only the genes returned by significant categories
  mutate(sig_pathway_genes = ifelse(FDR < 0.05, Genes.Returned, NA)) |> 
  reduceKappa_wrapper(gene_col = "sig_pathway_genes", group_slice = "data_label")

pathway_reduce |> 
  slice_head(n = 3, by = "data_label") |> 
  mutate(across(matches("Genes.|sig_"), ~ stringr::str_trunc(.x, 15))) |> 
  knitr::kable(digits = 100)

data_label	cluster	cluster_id	cluster_term	cluster_size	Geneset.ID	Description	Genes.Returned	P.value	FDR	sig_pathway_genes	sig_pathway_genes_inCluster
C	1	GO:0007606	sensory perception of chemical stimulus	4	GO:0007606	sensory perception of chemical stimulus	ENSRNOG00000…	7.134595e-21	3.616526e-17	ENSRNOG00000…	ENSRNOG00000…
C	2	GO:0050906	detection of stimulus involved in sensory perception	1	GO:0050906	detection of stimulus involved in sensory perception	ENSRNOG00000…	3.733342e-20	9.462156e-17	ENSRNOG00000…	ENSRNOG00000…
C	3	GO:0015629	actin cytoskeleton	4	GO:0015629	actin cytoskeleton	ENSRNOG00000…	1.174869e-12	4.248438e-10	ENSRNOG00000…	ENSRNOG00000…
G	21	GO:0005759	mitochondrial matrix	3	GO:0005759	mitochondrial matrix	ENSRNOG00000…	1.911564e-47	1.211932e-44	ENSRNOG00000…	ENSRNOG00000…
G	22	GO:0016054	organic acid catabolic process	12	GO:0016054	organic acid catabolic process	ENSRNOG00000…	4.398785e-24	2.266694e-20	ENSRNOG00000…	ENSRNOG00000…
G	23	GO:0045333	cellular respiration	23	GO:0045333	cellular respiration	ENSRNOG00000…	2.807665e-17	4.822633e-14	ENSRNOG00000…	ENSRNOG00000…
