This page catalogs a set of idioms that I have found useful in R.
To build a large data.frame out of many small data.frames, you may be tempted to do
big.df <- NULL
for(i in i.vec){
  for(j in j.vec){
    small.df <- data.frame(i, j)
    big.df <- rbind(big.df, small.df)
  }
}
It works, but every call to rbind copies all of the rows accumulated so far, so it is slower than
big.df.list <- list()
for(i in i.vec){
  for(j in j.vec){
    small.df <- data.frame(i, j)
    big.df.list[[paste(i, j)]] <- small.df
  }
}
big.df <- do.call(rbind, big.df.list)
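To check that the two idioms really build the same table, here is a small self-contained sketch. The values of i.vec and j.vec are made up for illustration (this page does not define them), and the names slow.df and fast.df are mine.

```r
# Hypothetical inputs; any two vectors would do.
i.vec <- 1:3
j.vec <- c("a", "b")

# Slow idiom: grow the data.frame with one rbind per iteration.
slow.df <- NULL
for(i in i.vec){
  for(j in j.vec){
    slow.df <- rbind(slow.df, data.frame(i, j))
  }
}

# Faster idiom: collect the pieces in a list, then rbind once.
fast.df.list <- list()
for(i in i.vec){
  for(j in j.vec){
    fast.df.list[[paste(i, j)]] <- data.frame(i, j)
  }
}
fast.df <- do.call(rbind, fast.df.list)

# rbind on a named list uses the list names as row names,
# so reset them before comparing the two results.
rownames(fast.df) <- NULL
stopifnot(isTRUE(all.equal(slow.df, fast.df)))
```

The speed difference is not visible on six rows; it only shows up once the loops produce thousands of small data.frames.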
Because of the versionless install.packages function, it is difficult to conduct truly reproducible research using R. In one of your scripts you may be tempted to write
library(ggplot2)
to indicate that your code uses the ggplot2 package. But there were major backwards-incompatible changes to the ggplot2 package in 2015. How will the future users of your code (including your future self) know which version to use?
Instead, I would recommend writing the following at the top of your R script. It indicates the version of a package from a CRAN-like repository.
works_with_R("3.2.3", ggplot2="1.0.1")
Even better, if the package can be found on GitHub you can indicate the repository that it comes from, and the specific commit that you used.
works_with_R("3.2.3",
             "tdhock/ggplot2@a8b06ddb680acdcdbd927773b1011c562134e4d2")
I recommend defining works_with_R in your ~/.Rprofile.
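To give an idea of what such a function might look like, here is a minimal sketch. It is a simplified, hypothetical stand-in, not the full definition: it assumes named arguments are package=version pairs and unnamed arguments are "user/repo@commit" strings, it only warns about version mismatches, and it never installs anything. Checking the commit of a GitHub package is left out entirely.

```r
# Hypothetical, simplified works_with_R: checks versions and warns,
# but does not install or upgrade any package.
works_with_R <- function(Rvers, ...){
  pkg.spec <- list(...)
  this.R <- as.character(getRversion())
  if(this.R != Rvers){
    warning("script was written for R ", Rvers,
            " but this is R ", this.R)
  }
  spec.names <- names(pkg.spec)
  if(is.null(spec.names)){
    spec.names <- rep("", length(pkg.spec))
  }
  for(spec.i in seq_along(pkg.spec)){
    if(spec.names[[spec.i]] == ""){
      # Unnamed argument, assumed to be "user/repo@commit".
      # This sketch does not verify the installed commit.
      message("not checked: ", pkg.spec[[spec.i]])
      next
    }
    pkg <- spec.names[[spec.i]]
    wanted <- pkg.spec[[spec.i]]
    if(!requireNamespace(pkg, quietly=TRUE)){
      warning("package ", pkg, " is not installed")
      next
    }
    installed <- as.character(utils::packageVersion(pkg))
    if(installed != wanted){
      warning("script was written for ", pkg, " ", wanted,
              " but ", installed, " is installed")
    }
    # Attach the package, as library(pkg) would.
    library(pkg, character.only=TRUE)
  }
  invisible(NULL)
}
```

With this sketch, a call such as works_with_R("3.2.3", ggplot2="1.0.1") at the top of a script attaches ggplot2 and warns if either the R version or the ggplot2 version differs from the one the script was written for.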