dppca provides tools for differentially private principal component analysis (PCA) visualization in R.
It supports private PC direction estimation, private scree and proportion of variance explained (PVE) plots, private score plots, grouped score visualizations, and an interactive Shiny app.
You can install the development version from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("yejinjo0220/dppca")
```

The main workflow is:
- estimate private PC directions with `dp_pc_dir()`;
- estimate and plot private scree/PVE summaries with `dp_scree()` and `dp_scree_plot()`;
- compute and plot private PCA score summaries with `dp_score()` and `dp_score_plot()`;
- optionally, use the grouped score visualizations or the Shiny app.
The examples below use the synthetic Gaussian cluster dataset included in the package.
```r
library(dppca)
data(gau, package = "dppca")
X <- gau
```

`dp_pc_dir()` estimates the leading principal component directions under differential privacy.
```r
set.seed(123)
V <- dp_pc_dir(
  X,
  k = 5,
  g_dppca = TRUE,
  eps = 3,
  delta = 1e-4
)
V
```

The returned object contains private principal component directions that can be used for downstream PCA summaries and visualizations.
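When interpreting the private directions, it can help to compare them with an ordinary, non-private PCA. The sketch below uses only base R's `prcomp()`, not dppca, and simulated data (`X_sim`, a hypothetical stand-in for `gau`) so that it runs on its own; with the real data you would pass `X` instead. It shows the kind of object a set of PC directions is: a matrix whose orthonormal columns are the leading directions.

```r
# Non-private PCA baseline using base R only (not part of dppca).
# X_sim is simulated stand-in data; replace it with X to compare against dp_pc_dir().
set.seed(1)
X_sim <- matrix(rnorm(200 * 20), nrow = 200, ncol = 20)

pca <- prcomp(X_sim, center = TRUE, scale. = FALSE)
V_np <- pca$rotation[, 1:5]  # 20 x 5 matrix: leading 5 directions as columns

# The columns are orthonormal, so crossprod(V_np) is the 5 x 5 identity up to rounding
round(crossprod(V_np), 10)
```

Because this baseline uses the raw data without any noise, its output should not be published when the data are sensitive; it is only a reference point for judging how much the private directions differ.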
`dp_scree()` estimates private scree values or proportions of variance explained. The estimator is selected with the `method` argument.
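As a reference point, the non-private quantities behind a scree plot are the eigenvalues of the sample covariance matrix, and the proportion of variance explained by the j-th component is its eigenvalue divided by the sum of all eigenvalues. The base R sketch below computes both on simulated data (`X_sim` is hypothetical and nothing here calls dppca); `dp_scree()` returns private estimates of summaries of this kind.

```r
# Non-private scree values and PVE in base R (illustration only, not dppca).
set.seed(1)
X_sim <- matrix(rnorm(300 * 10), nrow = 300, ncol = 10)

# Scree values: eigenvalues of the sample covariance matrix, in decreasing order
ev <- eigen(cov(X_sim), symmetric = TRUE, only.values = TRUE)$values

pve <- ev / sum(ev)      # proportion of variance explained by each PC
cum_pve <- cumsum(pve)   # cumulative PVE, reaches 1 at the last PC

round(head(pve, 5), 3)
```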
```r
set.seed(123)
scree_clipped <- dp_scree(
  X,
  k = 5,
  method = "clipped",
  control = clipped_control(C_clip = 3),
  eps = 3,
  delta = 1e-4
)
scree_clipped
```

The package currently supports three scree estimation methods:

- `"clipped"`: clipped-mean-based estimator;
- `"pmwm"`: private modified winsorized mean (PMWM) based estimator;
- `"huber"`: Huber-type robust estimator.
Method-specific tuning parameters are specified with the control helper functions `clipped_control()`, `pmwm_control()`, and `huber_control()`.
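To illustrate the idea behind a clipped-mean estimator, the sketch below clips each value to [-C, C], averages, and adds Laplace noise scaled to the sensitivity of that mean. This is the textbook Laplace mechanism (Dwork and Roth, 2014), not dppca's `"clipped"` implementation: `dp_clipped_mean()` and `rlaplace()` are hypothetical helpers, `clip_bound` is only assumed to play a role similar to `C_clip`, and the sample size n is treated as public.

```r
# Hypothetical helper: Laplace(0, scale) draws via a difference of two exponentials
rlaplace <- function(n, scale) {
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

# Hypothetical clipped-mean estimator (textbook Laplace mechanism, not dppca's code)
dp_clipped_mean <- function(x, clip_bound, eps) {
  n <- length(x)                                       # n is treated as public
  x_clipped <- pmin(pmax(x, -clip_bound), clip_bound)  # clip into [-C, C]
  # Replacing one record moves the clipped mean by at most 2 * clip_bound / n
  sensitivity <- 2 * clip_bound / n
  mean(x_clipped) + rlaplace(1, scale = sensitivity / eps)
}

set.seed(123)
x <- c(rnorm(995), rep(100, 5))              # mostly N(0, 1) plus a few gross outliers
dp_clipped_mean(x, clip_bound = 3, eps = 3)  # private; outliers are capped at 3
mean(x)                                      # non-private mean, pulled up by the outliers
```

The clipping bound trades bias for noise: a small bound caps outliers and keeps the noise scale `2 * clip_bound / (n * eps)` small, but it also biases the estimate when much of the data lies outside [-C, C].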
Multiple scree methods can be requested in a single call by passing a character vector to `method` and a named list of control objects to `control`:
```r
set.seed(123)
scree_all <- dp_scree(
  X,
  k = 5,
  method = c("clipped", "pmwm", "huber"),
  control = list(
    clipped = clipped_control(C_clip = 3),
    pmwm = pmwm_control(a = 0, b = 50, trim_const = 10, eta = 0.01),
    huber = huber_control(k_min_m2 = -10, k_max_m2 = 10, m2_frac = 1 / 4)
  ),
  eps = 3,
  delta = 1e-4
)
scree_all
```

`dp_scree_plot()` visualizes private scree values or private proportions of variance explained.
```r
set.seed(123)
scree_plot_all <- dp_scree_plot(
  X,
  k = 5,
  method = c("clipped", "pmwm", "huber"),
  control = list(
    clipped = clipped_control(C_clip = 3),
    pmwm = pmwm_control(a = 0, b = 50, trim_const = 10, eta = 0.01),
    huber = huber_control(k_min_m2 = -10, k_max_m2 = 10, m2_frac = 1 / 4)
  ),
  eps = 3,
  delta = 1e-4
)
scree_plot_all
```

`dp_score()` computes differentially private summaries of two-dimensional PCA scores using histogram-based methods.
```r
set.seed(123)
score_result <- dp_score(
  X,
  eps = 3,
  delta = 1e-4,
  bins = c(8, 8),
  method = "add"
)
score_result
```

Available score methods include:

- `"add"`: additive histogram method;
- `"sparse"`: sparse histogram method.
Use method = "add" or method = "sparse" to run one histogram method, or method = c("add", "sparse") to compute both.
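The additive histogram idea can be sketched in base R: project the data onto two directions, count observations in a fixed grid of bins, and add independent Laplace noise to every cell. This is a generic Laplace-mechanism histogram, not dppca's `"add"` implementation. The bin range [-6, 6], the helper `rlaplace()`, and the simulated `X_sim` are assumptions for illustration, and the projection directions come from non-private `prcomp()`, so the pipeline as a whole is not private; a fully private version would also need privately estimated directions.

```r
# Generic Laplace-mechanism 2-D histogram of PCA scores (illustration, not dppca).
rlaplace <- function(n, scale) {
  # Difference of two iid Exp(rate = 1 / scale) draws is Laplace(0, scale)
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

set.seed(123)
X_sim <- matrix(rnorm(500 * 20), nrow = 500, ncol = 20)
# PC1/PC2 scores from NON-private directions, used here only to get 2-D data
scores <- prcomp(X_sim)$x[, 1:2]

bins <- c(8, 8)
lim <- 6  # assumed data-independent range; scores are clipped into [-lim, lim]
s1 <- pmin(pmax(scores[, 1], -lim), lim)
s2 <- pmin(pmax(scores[, 2], -lim), lim)
brk1 <- seq(-lim, lim, length.out = bins[1] + 1)
brk2 <- seq(-lim, lim, length.out = bins[2] + 1)

# True counts per grid cell (an 8 x 8 integer matrix)
counts <- unclass(table(cut(s1, brk1, include.lowest = TRUE),
                        cut(s2, brk2, include.lowest = TRUE)))

eps <- 3
# Adding or removing one record changes a single cell by 1 (L1 sensitivity 1),
# so Laplace(1 / eps) noise in every cell gives eps-DP for the counts.
noisy <- counts + rlaplace(length(counts), scale = 1 / eps)
noisy <- pmax(noisy, 0)  # post-processing: negative counts set to zero

round(noisy[1:3, 1:3], 2)
```

Fixing the bin edges in advance matters: if the grid were chosen from the data (for example, from the observed score range), the edges themselves could leak information.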
dp_score_plot() draws private score plots based on the histogram summaries returned by dp_score().
If method is omitted, both additive and sparse histogram methods are used.
```r
set.seed(123)
score_plot <- dp_score_plot(
  X,
  eps = 3,
  delta = 1e-4,
  bins = c(15, 15)
)
score_plot$plot$all
```

For data with group labels, `dp_score_group()` and `dp_score_plot_group()` provide grouped versions of the private score summaries and plots.
```r
data(gau_g, package = "dppca")
X_g <- gau_g
```

Compute a grouped private score summary:
```r
set.seed(123)
score_group <- dp_score_group(
  X_g,
  group = "group",
  eps = 3,
  delta = 1e-4,
  bins = c(8, 8),
  method = "add"
)
score_group
```

Draw a grouped private score plot:
```r
set.seed(123)
score_group_plot <- dp_score_plot_group(
  X_g,
  group = "group",
  eps = 3,
  delta = 1e-4,
  bins = c(15, 15)
)
score_group_plot$plot$all
```

`dppca_app()` launches a Shiny app for exploring private scree and score plots through a graphical interface.
```r
dppca_app()
```

You can also launch the app with a user-supplied dataset, for example the grouped example data:
```r
data(gau_g, package = "dppca")
dppca_app(gau_g, group = "group")
```

dppca includes three datasets for examples and demonstrations:
- `gau`: a synthetic 20-dimensional Gaussian cluster dataset;
- `gau_g`: a grouped version of `gau` with an additional `group` column;
- `adult`: a numerical subset of the Adult dataset from the UCI Machine Learning Repository.
The package includes a numerical subset of the Adult dataset from the UCI Machine Learning Repository. The Adult dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This package retains five numerical variables: age, education_num, capital_gain, capital_loss, and hours_per_week.
The package also includes synthetic Gaussian cluster datasets generated by the package authors for reproducible examples.
The methods and examples in dppca are related to the following references.
- Kim, M. and Jung, S. (2025). Robust and Differentially Private Principal Component Analysis. Statistical Analysis and Data Mining: An ASA Data Science Journal, 18(6), e70053. doi:10.1002/sam.70053.
- Dwork, C. and Roth, A. (2014). The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4), 211-407. doi:10.1561/0400000042.
- Ramsay, K. and Spicker, D. (2025). Improved subsample-and-aggregate via the private modified winsorized mean. arXiv:2501.14095.
- Yu, M., Ren, Z., and Zhou, W.-X. (2024). Gaussian differentially private robust mean estimation and inference. Bernoulli, 30(4), 3059-3088.
- Nissim, K., Raskhodnikova, S., and Smith, A. (2007). Smooth Sensitivity and Sampling in Private Data Analysis. In STOC'07: Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 75-84. doi:10.1145/1250790.1250803.
- Wasserman, L. and Zhou, S. (2010). A Statistical Framework for Differential Privacy. Journal of the American Statistical Association, 105(489), 375-389. doi:10.1198/jasa.2009.tm08651.
- Karwa, V. and Vadhan, S. P. (2017). Finite Sample Differentially Private Confidence Intervals. arXiv:1711.03908.
- Becker, B. and Kohavi, R. (1996). Adult dataset. UCI Machine Learning Repository. doi:10.24432/C5XW20.


