---
title: "How the workflowr package works"
subtitle: "workflowr version `r utils::packageVersion('workflowr')`"
author: "John Blischak"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{How the workflowr package works}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
The workflowr package combines many powerful tools in order to produce a
research website. It is absolutely **not** necessary to understand all the
underlying tools to take advantage of workflowr, and in fact that is one of the
primary goals of workflowr: to allow researchers to focus on their analyses
without having to worry too much about the technical details. However, if you
are interested in implementing advanced customization options, contributing to
workflowr, or simply want to learn more about these tools, the sections below
provide some explanations of how workflowr works.

## Overview

[R][] is the computer programming language used to perform the analysis.
[knitr][] is an R package that executes code chunks in an R Markdown file to create a Markdown file.
[Markdown][] is a lightweight markup language that is easier to read and write than HTML.
[rmarkdown][] is an R package that combines the functionality of [knitr][] and the document converter [pandoc][].
[Pandoc][] powers the conversion of [knitr][]-produced Markdown files into HTML, Word, or PDF documents.
Additionally, newer versions of [rmarkdown][] contain functions for building websites.
The styling of the websites is performed by the web framework [Bootstrap][].
[Bootstrap][] implements the navigation bar at the top of the website, has many available themes to customize the look of the site, and dynamically adjusts the website so it can be viewed on a desktop, tablet, or mobile device.
The [rmarkdown][] website configuration file `_site.yml` allows convenient customization of the [Bootstrap][] navigation bar and theme.
[Git][] is a distributed version control system (VCS) that tracks code development.
It has many powerful features, but only a handful of the main functions are required to use workflowr.
[git2r][] is an R package which provides an interface to [libgit2][], which is a portable, pure C implementation of the Git core methods (this is why you don't need to install Git before using workflowr).
[GitHub][] is a website that hosts [Git][] repositories and additionally provides collaboration tools for developing software.
[GitHub Pages][] is a [GitHub][] service that offers free hosting of [static websites][static].
By placing the HTML files for the website in the subdirectory `docs/`, [GitHub Pages][] serves them online.
To aid reproducibility, workflowr provides an R Markdown output format,
`wflow_html()`, that automatically sets a seed for random number generation,
records the session information, and reports the status of the Git repository
(so you always know which version of the code produced the results contained in
that particular file). These reproducibility options are controlled by the
settings in `_workflowr.yml`. workflowr also provides a custom site generator,
`wflow_site()`, that enables `wflow_html()` to work with R Markdown websites.
The website options are controlled in `analysis/_site.yml`.
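
As a rough illustration of what these reproducibility features amount to, the base R sketch below sets a seed and records the session information by hand. The seed value is an arbitrary placeholder chosen for this example (workflowr takes its settings from `_workflowr.yml`), and the Git status report is left out because it needs a repository. This is not workflowr's internal code.

```r
# Hypothetical seed value, used only for this illustration
seed <- 20240101

# Resetting the seed before each draw yields identical random numbers
set.seed(seed)
first <- rnorm(3)
set.seed(seed)
second <- rnorm(3)
identical(first, second)

# Record the R version, platform, and attached packages
info <- sessionInfo()
info$R.version$version.string
```

With `wflow_html()` none of this has to be written in the R Markdown file itself, which is the point of the output format.
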
[R]: https://cran.r-project.org/
[knitr]: https://yihui.org/knitr/
[Markdown]: https://daringfireball.net/projects/markdown/
[rmarkdown]: https://rmarkdown.rstudio.com/
[pandoc]: https://pandoc.org/
[Bootstrap]: https://getbootstrap.com/
[Git]: https://git-scm.com/
[SHA-1]: https://en.wikipedia.org/wiki/SHA-1
[GitHub]: https://github.com/
[GitHub Pages]: https://pages.github.com/
[static]: https://en.wikipedia.org/wiki/Static_web_page

## Where are the figures?

workflowr saves the figures into an organized, hierarchical directory structure
within `analysis/`. For example, the first figure generated by the chunk named
`plot-data` in the file `filename.Rmd` will be saved as
`analysis/figure/filename.Rmd/plot-data-1.png`. Furthermore, the figure files
are _moved_ to `docs/` when `render_site` is run (this is the rmarkdown package
function called by `wflow_build`, `wflow_publish`, and the RStudio Knit button).
The figures have to be committed to the Git repository in `docs/` in order to be
displayed properly on the website. `wflow_publish` automatically commits the
figures in `docs` corresponding to new or updated R Markdown files, and
`analysis/figure/` is in the `.gitignore` file to prevent accidentally
committing duplicate files.
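
The naming scheme described above can be sketched with a small helper. `figure_path()` is a hypothetical function written for this example, not part of workflowr, and the `docs/figure/` location in the second call is an assumption about where the moved files end up.

```r
# Hypothetical helper that mirrors the figure naming scheme described above:
# <dir>/figure/<R Markdown file name>/<chunk name>-<figure index>.<extension>
figure_path <- function(rmd, chunk, index = 1, dir = "analysis", ext = "png") {
  file.path(dir, "figure", basename(rmd), paste0(chunk, "-", index, ".", ext))
}

# First figure generated by the chunk `plot-data` in `filename.Rmd`
figure_path("filename.Rmd", "plot-data")
# "analysis/figure/filename.Rmd/plot-data-1.png"

# Assumed location after the figure has been moved to docs/
figure_path("filename.Rmd", "plot-data", dir = "docs")
```
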
Because workflowr requires the figures to be saved to a specific location in
order to function properly, it will override any custom setting of the knitr
option `fig.path` (which controls where figure files are saved) and insert a
warning into the HTML file to alert the user that their value for `fig.path` was
ignored.

## Additional tools

[Posit Software, PBC][] is a company that develops open source software for R users.
They are the principal developers of [RStudio][], an integrated development environment (IDE) for R, and the [rmarkdown][] package.
Because of this tight integration, new developments in the [rmarkdown][] package are quickly incorporated into the [RStudio][] IDE.
While not strictly required for using workflowr, using [RStudio][] provides many benefits, including:
* RStudio projects make it easier to set up your R environment, e.g. set the correct working directory, and quickly switch between different projects
* The Git pane allows you to conveniently view your changes and run the main Git functions
* The Viewer pane displays the rendered HTML results for immediate feedback
* Clicking the `Knit` button automatically uses the [Bootstrap][] options specified in `_site.yml` and moves the rendered HTML to the website subdirectory `docs/` (requires RStudio version 1.0 or greater)
* Includes an up-to-date copy of [pandoc][] so you don't have to install or update it
* Tons of other cool [features][rstudio-features] like debugging and source code inspection
Another key R package used by workflowr is [rprojroot][].
This package finds the root of the repository, so workflowr functions like `wflow_build` will work the same regardless of the current working directory.
Specifically, [rprojroot][] searches for the RStudio project `.Rproj` file at the base of the workflowr project (so don't delete it!).
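
To make the idea of searching for the project root concrete, here is a minimal base R sketch that walks up the directory tree until it finds a `.Rproj` file. `find_rproj_root()` is a hypothetical helper written for this example and is not how [rprojroot][] is implemented. The demonstration builds a throwaway project under `tempdir()`.

```r
# Hypothetical helper: starting at `path`, move up one directory at a time
# until a directory containing an .Rproj file is found
find_rproj_root <- function(path = getwd()) {
  current <- normalizePath(path, winslash = "/", mustWork = TRUE)
  repeat {
    if (length(list.files(current, pattern = "[.]Rproj$")) > 0) {
      return(current)
    }
    parent <- dirname(current)
    # At the filesystem root, dirname() returns its input unchanged
    if (identical(parent, current)) {
      stop("No .Rproj file found in ", path, " or any parent directory")
    }
    current <- parent
  }
}

# Throwaway project with an .Rproj file at its base and an analysis/ subdirectory
project <- file.path(tempdir(), "myproject")
dir.create(file.path(project, "analysis"), recursive = TRUE, showWarnings = FALSE)
writeLines("Version: 1.0", file.path(project, "myproject.Rproj"))

# The search succeeds even when started from a subdirectory
root <- find_rproj_root(file.path(project, "analysis"))
basename(root)
```

This is why the result does not depend on the current working directory: any directory inside the project leads back to the same root, as long as the `.Rproj` file is still there.
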
[Posit Software, PBC]: https://posit.co/
[RStudio]: https://posit.co/products/open-source/rstudio/
[rstudioapi]: https://github.com/rstudio/rstudioapi
[rprojroot]: https://cran.r-project.org/package=rprojroot
[git2r]: https://cran.r-project.org/package=git2r
[libgit2]: https://libgit2.org/
[rstudio-features]: https://posit.co/products/open-source/rstudio/

## Background and related work

There is a lot of interest in, and development around, reproducible research with R.
Projects like workflowr are possible due to two key developments. First, the R
packages [knitr][] and [rmarkdown][] have made it easy for any R programmer to
generate reports that combine text, code, output, and figures. Second, the
version control software [Git][], the Git hosting site [GitHub][], and the
static website hosting service [GitHub Pages][] have made it easy to share not
only source code but also static HTML files (i.e. no need to purchase a domain
name, set up a server, etc.).
My first attempt at sharing a reproducible project online was [singleCellSeq][].
Basically, I started by copying the documentation website of [rmarkdown][] and
added some customizations to organize the generated figures and to insert the
status of the Git repository directly into the HTML pages. The workflowr R
package is my attempt to simplify my previous workflow and provide helper
functions so that any researcher can take advantage of this workflow.
workflowr does three things: it 1) provides a project template, 2) version
controls the R Markdown and HTML files, and 3) builds a website.
Furthermore, it provides R functions to perform each of these steps. There are
many other related works that provide similar functionality. Some are templates
to be copied, some are R packages, and some involve more complex software (e.g.
static blog software). Depending on your use case, one of the related works
listed at [r-project-workflows][] may better suit your needs. Please check them
out!
[r-project-workflows]: https://github.com/jdblischak/r-project-workflows#readme
[singleCellSeq]: https://jdblischak.github.io/singleCellSeq/analysis/

## Further reading

* How the code, results, and figures are executed and displayed can be customized using [knitr chunk and package options](https://yihui.org/knitr/options/)
* How [R Markdown websites](https://bookdown.org/yihui/rmarkdown/rmarkdown-site.html) are configured
* The many [features][rstudio-features] of the [RStudio][] IDE
* [Directions](https://docs.github.com/articles/configuring-a-publishing-source-for-github-pages) to publish a [GitHub Pages][] site using the `docs/` subdirectory