FMLE

FMLE is an R package for regime-aware prediction of protein abundance from single-cell transcriptomic features using a fuzzy mixture of linear experts. It combines fuzzy c-means gating in a low-dimensional latent space with expert-specific linear predictors on high-dimensional gene expression features, enabling both interpretable and accurate protein prediction across heterogeneous cellular regimes.

The package supports:

single-task prediction (fmle_train(), fmle_predict())
cross-validation over the number of experts, fuzzifier, and L1 penalty (fmle_cv_parallel(), fmle_cv_mt_parallel())
fuzzy c-means gating (fcm_fit())
predictive uncertainty decomposition from the fitted experts

Overview

FMLE models protein abundance as a mixture of regime-specific RNA–protein mappings with soft, input-dependent gating. Each cell receives graded memberships across regimes based on its position in the latent space, so the model can capture heterogeneous coupling structure that a single global mapping misses.
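
The idea can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: fcm_membership() and mixture_predict() are hypothetical helpers, the membership rule is the textbook fuzzy c-means formula, and the centers and coefficients are made up for the example.

```r
# Hypothetical helper (not part of the FMLE API): textbook fuzzy c-means
# memberships of latent points Z (n x d) with respect to centers C (R x d).
fcm_membership <- function(Z, C, m = 1.8, eps = 1e-12) {
  # Squared Euclidean distance from every point to every center (n x R).
  d2 <- sapply(seq_len(nrow(C)), function(r) {
    rowSums((Z - matrix(C[r, ], nrow(Z), ncol(Z), byrow = TRUE))^2)
  })
  d2 <- matrix(d2, nrow = nrow(Z))     # stay a matrix even when n == 1
  w <- (d2 + eps)^(-1 / (m - 1))       # u_ir is proportional to d_ir^(-2 / (m - 1))
  w / rowSums(w)                       # each row sums to 1
}

# Hypothetical helper: membership-weighted sum of expert-specific linear predictors.
# X: n x p features, U: n x R memberships, B: p x R coefficients, b0: R intercepts.
mixture_predict <- function(X, U, B, b0) {
  expert_pred <- sweep(X %*% B, 2, b0, "+")  # n x R, one column per expert
  rowSums(U * expert_pred)
}

set.seed(1)
C <- rbind(c(0, 0), c(4, 4))                 # two made-up regime centers
Z <- rbind(c(0, 0), c(4, 4), c(2, 2))        # third cell is equidistant
U <- fcm_membership(Z, C, m = 1.8)
X <- matrix(rnorm(3 * 5), 3, 5)
B <- matrix(rnorm(5 * 2), 5, 2)
y_hat <- mixture_predict(X, U, B, b0 = c(0.5, -0.5))
round(U, 3)
```

Cells close to a center are dominated by that regime's expert, while the equidistant cell blends both experts equally. Under this formula, a fuzzifier m closer to 1 pushes memberships toward hard assignment, and a larger m makes them softer.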

Figure 1. FMLE identifies regime-dependent RNA–protein coupling, improves over a single global mapping, and reveals interpretable regime structure across cells.

Installation

# install.packages("remotes")
remotes::install_local("FMLE")
# or
remotes::install_github("vikkyak/FMLE")

Python interoperability

If you want to import AnnData (.h5ad) objects in R, use reticulate in your analysis script, for example:

library(reticulate)
use_condaenv("your_env_name", required = TRUE)
py_config()
anndata <- import("anndata")

Quickstart

Dataset used in the demo

The packaged demo object is derived from the PBMC 10k CITE-seq dataset (10x Genomics, v3 chemistry).

Original dataset:

PBMC 10k CITE-seq (10x Genomics)
https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_protein_v3/pbmc_10k_protein_v3_filtered_feature_bc_matrix.tar.gz

For the FMLE package, this dataset was converted into a lightweight demo object (fmle_demo.rds) to allow examples and the vignette to run quickly.

The demo dataset includes:

a reduced set of gene-expression features
the protein panel used in the example workflow
train/test splits and latent representations (Z)

In the experiments reported in the FMLE manuscript, models were trained using larger feature sets (e.g. ~2000 highly variable genes). The reduced demo dataset is intended only for reproducible examples and fast package demonstrations.

Single-task example

library(FMLE)

demo <- readRDS(system.file("extdata", "fmle_demo.rds", package = "FMLE"))

X_train <- demo$X_train
X_test  <- demo$X_test
Y_train <- demo$Y_train
Y_test  <- demo$Y_test
Z_train <- demo$Z_train
Z_test  <- demo$Z_test
q <- 0.995

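# Fit the response transform on training data only:
# cap at quantile q, apply log1p, then record mean and sd for z-scoring.
cap_and_scale_fit_local <- function(y, q = 0.995, eps = 1e-8) {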
  cap <- as.numeric(stats::quantile(y, probs = q, na.rm = TRUE))
  y_cap <- pmin(y, cap)
  y_log <- log1p(y_cap + eps)
  mu <- mean(y_log, na.rm = TRUE)
  sd <- stats::sd(y_log, na.rm = TRUE)
  if (is.na(sd) || sd == 0) sd <- 1
  list(cap = cap, mu = mu, sd = sd, eps = eps)
}

# Apply a fitted transform (cap, log1p, z-score) to new responses.
cap_and_scale_apply_local <- function(y, tf) {
  y_cap <- pmin(y, tf$cap)
  y_log <- log1p(y_cap + tf$eps)
  (y_log - tf$mu) / tf$sd
}

tf_y <- cap_and_scale_fit_local(Y_train[, 1], q = q)
y_train <- cap_and_scale_apply_local(Y_train[, 1], tf_y)
y_test  <- cap_and_scale_apply_local(Y_test[, 1], tf_y)

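# Pass the raw response: fmle_cv_parallel() applies cap/log/scale internally
# (see "Important preprocessing note").
cv <- fmle_cv_parallel(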
  X = X_train,
  y = Y_train[, 1],
  Z = Z_train,
  R_grid = c(2, 3),
  m_grid = c(1.6, 1.8),
  lambda_grid = c(0, 1e-3),
  folds = 3,
  seed = 1,
  exec = "sequential",
  verbose = FALSE
)

best <- cv$best

# fmle_train() fits the response exactly as supplied, so pass the transformed y_train.
fit <- fmle_train(
  X = X_train,
  y = y_train,
  Z = Z_train,
  R = best$R,
  m = best$m,
  lambda_l1 = best$lambda,
  ridge = 1e-6,
  standardize = TRUE,
  seed = 1
)

pred <- fmle_predict(
  model = fit,
  X_new = X_test,
  Z_new = Z_test,
  return_se = TRUE
)

# Metrics are computed on the transformed (cap/log/z-scored) response scale.
pearson <- cor(pred$mean, y_test, method = "pearson")
spearman <- cor(pred$mean, y_test, method = "spearman")
mse <- mean((pred$mean - y_test)^2)

data.frame(
  metric = c("Pearson", "Spearman", "MSE"),
  value = c(pearson, spearman, mse)
)
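
To get a feel for the cost of the search above, the grid can be enumerated directly. The sketch below assumes one fit per hyperparameter combination per fold. The fold assignment is only an illustration with a hypothetical number of cells; the way fmle_cv_parallel() builds its folds may differ.

```r
# Hyperparameter grid from the single-task example.
grid <- expand.grid(
  R = c(2, 3),
  m = c(1.6, 1.8),
  lambda = c(0, 1e-3)
)
n_folds <- 3
n_fits <- nrow(grid) * n_folds     # combinations x folds

# Illustrative balanced fold assignment (assumption, not FMLE's own splitter).
set.seed(1)
n <- 100                           # hypothetical number of training cells
fold_id <- sample(rep(seq_len(n_folds), length.out = n))

nrow(grid)
n_fits
table(fold_id)
```

Each extra value in R_grid, m_grid, or lambda_grid multiplies the number of combinations, so it is usually worth starting with a coarse grid and refining around the selected values.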

Single-task results across multiple proteins

proteins_to_show <- colnames(Y_train)
res_list <- vector("list", length(proteins_to_show))

for (j in seq_along(proteins_to_show)) {
  prot <- proteins_to_show[j]

  tf_y <- cap_and_scale_fit_local(Y_train[, j], q = q)
  y_train_j <- cap_and_scale_apply_local(Y_train[, j], tf_y)
  y_test_j  <- cap_and_scale_apply_local(Y_test[, j], tf_y)

  # Raw response here; cross-validation preprocesses it internally.
  cv_j <- fmle_cv_parallel(
    X = X_train,
    y = Y_train[, j],
    Z = Z_train,
    R_grid = c(2, 3),
    m_grid = c(1.6, 1.8),
    lambda_grid = c(0, 1e-3),
    folds = 3,
    seed = 1,
    exec = "sequential",
    verbose = FALSE
  )

  best_j <- cv_j$best

  fit_j <- fmle_train(
    X = X_train,
    y = y_train_j,
    Z = Z_train,
    R = best_j$R,
    m = best_j$m,
    lambda_l1 = best_j$lambda,
    ridge = 1e-6,
    standardize = TRUE,
    seed = 1
  )

  pred_j <- fmle_predict(
    model = fit_j,
    X_new = X_test,
    Z_new = Z_test,
    return_se = TRUE
  )

  res_list[[j]] <- data.frame(
    protein = prot,
    R = best_j$R,
    m = best_j$m,
    lambda = best_j$lambda,
    Pearson = cor(pred_j$mean, y_test_j, method = "pearson"),
    Spearman = cor(pred_j$mean, y_test_j, method = "spearman"),
    MSE = mean((pred_j$mean - y_test_j)^2)
  )
}

res_tab <- do.call(rbind, res_list)
res_tab$Pearson <- round(res_tab$Pearson, 3)
res_tab$Spearman <- round(res_tab$Spearman, 3)
res_tab$MSE <- round(res_tab$MSE, 3)
res_tab

Benchmark summary

Across multiple PBMC datasets, FMLE improves RNA→protein prediction relative to scLinear and cTPnet.

Figure 2. FMLE achieves stronger per-protein predictive performance across benchmark datasets and wins more frequently than competing methods.

Zero-shot cross-dataset transfer

FMLE preserves regime structure and predictive advantage under zero-shot dataset transfer, supporting the biological reproducibility of the inferred coupling regimes across independent single-cell multimodal datasets.

Figure 3. FMLE generalizes in a zero-shot cross-dataset setting, preserves structured RNA–protein coupling, and improves unseen target-dataset prediction relative to global and baseline models.

Cross-donor generalization

FMLE preserves regime structure and predictive advantage under donor shift, supporting the biological reproducibility of the inferred coupling regimes.

Figure 4. FMLE regimes generalize across donors, preserve structured RNA–protein coupling, and improve held-out donor prediction relative to global and baseline models.

Vignette

You can also browse installed package vignettes in R with:

browseVignettes("FMLE")

Important preprocessing note

For single-task FMLE, fmle_cv_parallel() internally applies cap/log/scale preprocessing to the response before fold-wise fitting and evaluation, so it should receive the raw response. In contrast, fmle_train() fits the response exactly as supplied and applies no response preprocessing of its own.

After selecting (R, m, lambda) by cross-validation, refit on the full training set with the response on the scale you intend to use for the final model, and apply the same transform to any held-out responses before evaluation.
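
If the final model is fitted on the transformed response, as in the single-task example, its predictions are on that transformed scale too. The sketch below shows one way to map them back to the original scale. cap_and_scale_invert_local() is a hypothetical helper, not part of the FMLE API, and the data are simulated.

```r
# Forward transform, as in the single-task example: cap, log1p, z-score.
cap_and_scale_fit_local <- function(y, q = 0.995, eps = 1e-8) {
  cap <- as.numeric(stats::quantile(y, probs = q, na.rm = TRUE))
  y_log <- log1p(pmin(y, cap) + eps)
  mu <- mean(y_log, na.rm = TRUE)
  sd <- stats::sd(y_log, na.rm = TRUE)
  if (is.na(sd) || sd == 0) sd <- 1
  list(cap = cap, mu = mu, sd = sd, eps = eps)
}

cap_and_scale_apply_local <- function(y, tf) {
  (log1p(pmin(y, tf$cap) + tf$eps) - tf$mu) / tf$sd
}

# Hypothetical inverse: undo the z-score, then the log1p, then the eps shift.
# Capping cannot be undone, so values above tf$cap come back as tf$cap.
cap_and_scale_invert_local <- function(z, tf) {
  expm1(z * tf$sd + tf$mu) - tf$eps
}

set.seed(1)
y <- rexp(200, rate = 0.1)               # simulated non-negative abundances
tf <- cap_and_scale_fit_local(y)
z <- cap_and_scale_apply_local(y, tf)
y_back <- cap_and_scale_invert_local(z, tf)

# The round trip recovers the capped responses.
isTRUE(all.equal(y_back, pmin(y, tf$cap)))
```

Correlation-based metrics such as Spearman are unaffected by this monotone back-transform, but MSE is scale-dependent, so report which scale it was computed on.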
