NOTICE: although this package is installed from GitHub (wikimedia/wmfdata-r), that repository is a read-only mirror of the wikimedia/discovery/wmf repository hosted on Gerrit. See mediawiki:Developer account for information about creating a Wikimedia Developer account to contribute to this package, MediaWiki, and other Wikimedia projects.
Other packages from the Wikimedia Foundation's Product Analytics team include wmfdata for working with Wikimedia data in Python, waxer for querying the Wikimedia Analytics Query Service in R, and wmfastr for fast dwell-time and search-preference metric calculations in R.
To install:

```r
# Install remotes first if you don't already have it:
# install.packages("remotes", repos = c(CRAN = "https://cran.rstudio.com/"))
remotes::install_github("wikimedia/wmfdata-r")
```

To update:

```r
remotes::update_packages("wmfdata")
```

The package includes:

- `set_proxies` to set HTTP(S) proxies on the analytics cluster
- `global_query` for querying all of our MySQL databases
- Utilities for working with logs, including EventLogging data:
  - `from_mediawiki` and `from_log` (and the corresponding `to_*` functions) to convert between time formats
- `query_hive` for querying our Hadoop cluster via Hive
- `mysql_read` for querying our MariaDB databases
  - uses automatic shard detection; see `?connection_details` for more info
- Sample size calculations:
  - `chisq_test_odds` estimates the sample size for a chi-squared test given an odds ratio
  - `chisq_test_effect` estimates the sample size for a chi-squared test given Cohen's w
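MediaWiki stores timestamps as 14-digit strings of the form `YYYYMMDDHHMMSS` (in UTC). The base R sketch below illustrates the kind of round trip that `from_mediawiki` and `to_mediawiki` handle. It is not the package's implementation, and `parse_mw_timestamp` and `format_mw_timestamp` are hypothetical names used only for illustration.

```r
# Hypothetical helpers for illustration only; NOT part of wmfdata.

# Parse a 14-digit MediaWiki timestamp string into a UTC POSIXct value
parse_mw_timestamp <- function(x) {
  as.POSIXct(x, format = "%Y%m%d%H%M%S", tz = "UTC")
}

# Format a POSIXct value back into the 14-digit MediaWiki form (UTC)
format_mw_timestamp <- function(t) {
  format(t, format = "%Y%m%d%H%M%S", tz = "UTC")
}

ts <- parse_mw_timestamp("20150101010301")
format_mw_timestamp(ts)  # "20150101010301"
```

In practice, use the package's own functions; the sketch only shows why the conversion is a fixed-format parse pinned to UTC.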
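For a rough idea of what `chisq_test_effect` estimates, the base R sketch below applies the standard power calculation for a chi-squared test. Under the alternative hypothesis, the test statistic follows a noncentral chi-squared distribution with noncentrality λ = N·w². The sketch solves for the λ that reaches the target power and divides by w². This is not the package's code: `sample_size_chisq_w` and its defaults (df = 1, α = 0.05, power = 0.8) are assumptions chosen for illustration.

```r
# Hypothetical helper for illustration only; NOT part of wmfdata.
# Returns the smallest total sample size N that gives the requested power
# for a chi-squared test with Cohen's effect size w.
sample_size_chisq_w <- function(w, df = 1, alpha = 0.05, power = 0.8) {
  # Rejection threshold of the test statistic under the null hypothesis
  crit <- qchisq(1 - alpha, df = df)

  # Power as a function of the noncentrality parameter lambda = N * w^2
  power_at <- function(lambda) {
    pchisq(crit, df = df, ncp = lambda, lower.tail = FALSE)
  }

  # Find the lambda at which power equals the target
  lambda <- uniroot(function(l) power_at(l) - power,
                    interval = c(0, 1000), tol = 1e-10)$root

  # Convert lambda back to a sample size, rounding up to whole observations
  ceiling(lambda / w^2)
}

sample_size_chisq_w(w = 0.3)  # 88
```

For w = 0.3 this agrees with `pwr::pwr.chisq.test(w = 0.3, df = 1, power = 0.8)`, which reports N of about 87.2 before rounding up.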
The package also includes the Wikimedia Design visual style colors.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
