Welcome to pyRforest, a comprehensive tool for genomic data analysis that brings scikit-learn Random Forests to R. Tailored for expression data such as RNA-seq or microarray, pyRforest is built for bioinformaticians and researchers who want to explore the relationship between biological features and matched binary or categorical outcome variables using Random Forest models. It integrates scikit-learn's Random Forest methods (imported to R via reticulate) for model development and evaluation, SHapley Additive exPlanations (SHAP), and a custom feature reduction approach based on rank-based permutation. It also integrates with clusterProfiler and g:Profiler for Gene Ontology and enrichment analysis.
Please see our vignette for step-by-step instructions covering model development, evaluation, and feature reduction, as well as the Enrichr enrichment analysis, Gene Ontology, SHAP, and g:Profiler integrations.
- Integration of Python's scikit-learn Random Forest models in R.
- Custom rank-based feature reduction methodology.
- Compatibility with RNA-seq and Microarray data.
- Integration with Gene Ontology, enrichment analysis, SHAP, and g:Profiler.
# Install devtools if not already available
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
# Install pyRforest from GitHub
devtools::install_github("tkolisnik/pyRforest")
# Load pyRforest
library(pyRforest)
Note: pyRforest is developed on R version 4.3.1 on an Apple Mac M1 (arm64 architecture). It also works on Windows (Intel x64) and Linux.
Please see the vignette for full installation and usage instructions.
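Because pyRforest calls scikit-learn through reticulate, it can help to confirm that R can see a Python installation with scikit-learn before fitting any models. The check below is a minimal sketch that uses only reticulate functions. The helper name check_sklearn is hypothetical and not part of pyRforest, and the vignette remains the authoritative setup guide.

```r
# Hypothetical helper (not part of pyRforest): report whether reticulate
# can find Python and import the scikit-learn module ("sklearn").
check_sklearn <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    message("reticulate is not installed; run install.packages(\"reticulate\")")
    return(FALSE)
  }
  # Attempt to initialise Python; treat any error as "not available"
  py_ok <- tryCatch(
    reticulate::py_available(initialize = TRUE),
    error = function(e) FALSE
  )
  if (!isTRUE(py_ok)) {
    message("No usable Python installation was found by reticulate")
    return(FALSE)
  }
  # Check that the sklearn module can be located in that Python
  sk_ok <- isTRUE(tryCatch(
    reticulate::py_module_available("sklearn"),
    error = function(e) FALSE
  ))
  if (!sk_ok) message("Python was found, but scikit-learn is not importable")
  sk_ok
}

sklearn_ready <- check_sklearn()
```

If sklearn_ready is FALSE, reticulate::py_config() shows which Python installation reticulate selected, which is usually the first thing to inspect.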
pyRforest is designed to work with structured genomic data with a focus on classification problems. Your data should be formatted as a list containing four tibbles (training, validation, testing, and target categories). See the vignette for detailed structuring instructions.
Demo data can be loaded with the command data("demo_rnaseq_data", package = "pyRforest")
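As a rough illustration of the input shape described above, the sketch below builds a toy list of four tables. The element names (training, validation, testing, target_categories), the column names, and the make_split and as_tbl helpers are all assumptions made for illustration. The vignette and the bundled demo_rnaseq_data object define the layout pyRforest actually expects.

```r
set.seed(1)

# Convert to a tibble when the tibble package is installed; otherwise
# fall back to a plain data.frame so the sketch still runs.
as_tbl <- function(df) {
  if (requireNamespace("tibble", quietly = TRUE)) tibble::as_tibble(df) else df
}

# Hypothetical helper: n samples, one binary outcome column, three toy genes
make_split <- function(n, prefix) {
  as_tbl(data.frame(
    identifier = paste0(prefix, "_", seq_len(n)),
    target     = sample(0:1, n, replace = TRUE),
    gene_a     = rnorm(n),
    gene_b     = rnorm(n),
    gene_c     = rnorm(n)
  ))
}

# Element names below are illustrative assumptions, not the package's spec
toy_data <- list(
  training          = make_split(12, "train"),
  validation        = make_split(4, "valid"),
  testing           = make_split(4, "test"),
  target_categories = as_tbl(data.frame(
    target = 0:1,
    label  = c("control", "case")
  ))
)

# Compare against the real demo object when pyRforest is installed
if (requireNamespace("pyRforest", quietly = TRUE)) {
  data("demo_rnaseq_data", package = "pyRforest")
  str(demo_rnaseq_data, max.level = 1)
}
```

Inspecting the demo object with str() is the quickest way to see the real element and column names before reshaping your own data to match.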
pyRforest includes functionalities for data preprocessing, model tuning, fitting, evaluation, and obtaining feature importances. Detailed usage examples and function documentation are available in the package vignette.
Advanced functionality includes SHAP value analysis and integration with enrichment analysis, Gene Ontology, and g:Profiler modules.
For more information, detailed documentation, and how to contribute, visit the pyRforest GitHub repository.
Created by Tyler Kolisnik with support from Dr. Olin Silander, Dr. Adam Smith & Faeze Keshavarz.
For queries, contact Tyler Kolisnik at tkolisnik@gmail.com.