
Style Guide

Matthew Kay edited this page Feb 6, 2019 · 39 revisions

This is a draft of the style guide for the transparent statistics guidelines, to be updated as we develop it alongside our first set of chapters. Feedback and suggestions for refining this style guide are welcome!

This document contains a style guide for the guidelines: it describes what the guidelines should look like. For a description of the mechanics of how to contribute to the guidelines, see Contributing to the Guidelines.

The Transparent Statistics Guidelines are organized into chapters. Chapter 1 lays out general guiding principles for transparent statistics. The remaining chapters are topics (e.g., Chapter 2 is on effect size). Each topic contains a FAQ on that topic and one or more exemplars.

All Chapters

  • Keep author list updated. If you have contributed substantial content (or know someone who did), please update the author list at the beginning of the chapter. Substantial content means the person has either added new information (even if it's only two sentences or a few lines of code), or significantly improved the presentation (structure, grammar, wording, figures, etc.). "Substantial" doesn't mean people need to contribute a lot, but it does mean that fixing one or two typos may not be enough.

  • Tag incomplete sections with a link to how to contribute. Incomplete / in-progress sections should be tagged "alpha" and given a link to Contributing to the Guidelines using the following block:

    <mark>
    This section is in *alpha*. We welcome help and feedback at all levels!
    If you would like to contribute, please see
    [Contributing to the Guidelines](https://github.com/transparentstats/guidelines/wiki/Contributing-to-the-Guidelines).
    </mark>
    

Chapter 1: Guiding Principles

  • Focus on high-level principles. Do not include recommendations or discussions about statistical methods and procedures such as “should we correct for multiple comparisons” or “should we remove outliers” or “is Bayesian better”. These issues will be discussed in individual topic chapters (FAQs and exemplars). The first chapter covers core high-level principles (axioms) that will serve to motivate specific methodological recommendations in later chapters, and it should be as method-agnostic as possible. It is fine, though, to mention specific methods and procedures as examples, using the example: format. Keep the examples very short and provide a reference if necessary.

  • Focus on statistics. Do not include discussions that pertain to research methods or study design: this is about transparent statistics. Statistics are arguably tightly connected to study design, but we can discuss issues of study design in later chapters, if needed. Anything that pertains to statistical communication (e.g., how we frame conclusions, share material, etc.) is, however, relevant to this chapter.

  • Focus on transparency. This chapter is not about getting our statistics right. The first principle, “faithfulness”, already states that our stats need to be correct, so there is perhaps no need to elaborate further here. The many ways we can get our stats wrong can be covered in much more depth in later chapters.

  • Justify. Add references as much as possible. Also justify using common sense and logic, especially when adding a new principle. The principle should be self-evident in a transparent statistics context, or should logically follow from other principles. Some redundancy in the chapter is OK, but avoid too much overlap between principles. Ask yourself if the new principle is really only a subset (or a superset) of existing principles.

Topic Chapters

  • The chapter should start with a teaser figure: an annotated plot that illustrates or explains the topic addressed in the chapter. The first figure should be as self-contained and self-explanatory as possible.

    • Should we say this should be generated from a hidden chunk of R code? That would make it easier to edit, but also a little more annoying to develop depending on the complexity of the figure. ---Matt
    • I'd side with hidden R code when possible. You can just source the figure out to an external R file. Also, you can set the output format to svg via dev: CairoSVG ---Steve
    • If the whole figure can be generated in R without unreasonable amount of pain, we'll prefer that + including hidden code. Otherwise, I'd suggest generating plots in R (+ include the hidden code) and compose them together elsewhere. (The case in point: I wouldn't be able to reproduce the effect size figure solely in R.) ---Chat
    • I think starting with that approach for quick prototyping of the figure is good, but once we know what we want it might be good to port it back into R at some point to make it easier for anyone to contribute to. The figure you made, for example, would be a pain to make in ggplot but I think would be doable, and once there would be able to be modified through git like everything else. ---Matt
  • The chapter should include an auto-generated floating table of contents for navigation.

  • Internal links should be created by giving the heading an explicit id and linking to that id.

    • Do not use auto-generated ids, as they will break if the heading text changes.
    • Create ids by appending them to headers with {#some_id}
    • Create links like this: [some link](#some_id).
    • Prefix the id with the name of the file plus _faq_ or _exemplar_. Use underscore between words. Filename prefixing is needed (1) to make it easy to link to sections in other files using ids (bookdown resolves these automatically) and (2) to avoid name clashes across files.
    • For example, a heading from the effectsize.Rmd file (note the heading id starts with the filename):
      ## How should effect sizes be reported? {#effectsize_faq_how_reporting}
      
    • Headings in the GitHub wiki, by contrast, must use the <a id= syntax, with the anchor tag placed before the heading (the wiki does not support the other syntax):
      <a id="some_wiki_id"></a>
      
      ## Some wiki section
      

Below are specific guidelines for the two major parts that constitute a topic: the FAQ and the exemplars.

FAQ

  • Address both authors and reviewers. A FAQ should address all researchers, irrespective of their role and the reason why they consult it. Thus "you" should be avoided since it's ambiguous. If at some point you need to address either authors or reviewers, use the third person and specify the subject.

  • Focus on the "what" (what are the accepted practices?) rather than the "how" (how to do things?).

    • The "what" often admits multiple possible approaches: there's rarely a single way to do things. "How" can be too specific / prescriptive; defer such details to exemplars, which show one way to do things rather than the way to do things.
    • This also keeps the FAQ brief and high-level.
    • Nevertheless some approaches are better than others (in terms of the guiding principles from Chapter 1), and there is room to point this out.
  • Use references

    • FAQs should be dense in citations. FAQs are not for original content or personal opinion. Cite (almost) every claim.
    • Prefer citing books and archival publications over Wikipedia or other websites, as web content may change without our awareness. However, if a blog post or online article is particularly useful and informative, feel free to cite it.
    • Use quotes as backup, set in blockquotes (a new line starting with > in markdown).
    • Citations should point to claims that are easy to find: where possible, cite page numbers for papers and chapters for books (book page numbers can change from edition to edition).
  • Briefly define terms when it helps. FAQs should use terms in a consistent manner and, if possible, in a manner consistent with how other researchers generally use them (consider using citations).

  • Be as brief as possible. Avoid repetitions, redundancies, and lengthy explanations (use refs instead).

  • Support random access. Some readers may read only a specific section, instead of reading the entire FAQ. Therefore, include links to relevant exemplars in each FAQ section.

  • Support sequential access. Some readers may wish to read the FAQ in sequence, so order questions logically.
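
A FAQ entry that follows the points above might be laid out roughly as follows. This is only a sketch: the heading id, the exemplar id, the citation key `@author_year`, and the angle-bracket placeholders are all hypothetical, and the `[@key, p. N]` form assumes the chapters use pandoc-style citations.

```markdown
## Why report effect sizes? {#effectsize_faq_why_report}

<one or two sentences stating the accepted practice, in the third person> [@author_year, p. 12].

> <short supporting quote from the cited source>

See the [simple effect size exemplar](#effectsize_exemplar_simple) for one way to do this.
```

The entry stays short, backs its claim with a citation and a quote, and links to an exemplar so that it still works for a reader who lands on this section alone.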

Exemplars

  • Should start by giving the concrete context of a study with a research question.

  • Exemplars should be exemplars. Do not include two ways of doing things if one is clearly more applicable to the given data or research question. However, sometimes there is a partial ordering on what is an exemplar; e.g. approach A and B are equally okay. In this case you may include both in an exemplar, with explanation.

  • Exemplars within a guideline should differ primarily along dimensions strongly related to the core topic of that guideline. Exemplars that do not meet this criterion should be omitted, moved to another guideline, or moved to an appendix. This means, for example:

    • There should not be two different exemplars under Effect Size that calculate a simple effect size using either frequentist or Bayesian approaches, because these differ by inferential approach but are essentially the same with respect to the effect size calculated (the topic of the Effect Size guideline). Instead one of those examples could go in the appendix or another guideline (e.g. one on Bayesian estimation or on inferential statistics or what have you).
    • There should not be two different exemplars under Effect Size that calculate a simple effect size on a design with one independent variable and either a between subjects or within subjects design, because these differ by experimental design but the effect size is still just a mean difference. Instead, a calculation of simple effect size for a simple within subjects design might be placed in a relevant exemplar in a guideline on Repeated Measures or Experimental Design.
    • There could be different exemplars within the Effect Size guideline showing Cohen's d or partial eta-squared, since those exemplars differ by type of effect size and are within the Effect Size guideline.
  • Some guidelines may need anti-exemplars. These may be example analyses that embody some common mistake. Each anti-exemplar must link to a corresponding exemplar showing the correct way to analyze the same dataset.

  • Should include blockquotes for text to include in the paper.

  • Should put graphical presentation before textual presentation. Or at least, the preferred presentation first.

  • Should be designed for random access of exemplars, but not within exemplars. We should not encourage people to cherry pick single code chunks, but rather to understand the context in which the analyses are conducted. This implies:

    • Exemplars should be informative but concise, so they are more likely to be read completely.
    • We should cut down on boilerplate code where possible.
    • Exemplars should not use localized library loading beyond their setup chunk. Rather, they should include a single chunk listing every library needed to run the entire code, so readers can check what is required and install everything at once.
  • Should be as simple as possible and focus on interpretation and reporting. The guidelines are not intended to be an exhaustive statistics textbook. Instead, the best exemplars are minimally complex and focus on helping people understand effective, transparent ways to report results.

Basic outline of an exemplar

This basic structure will not always apply exactly, but should be close:

  1. Setup chunk
  2. Main Scenario: when to use analysis, a study design + research question, the data
  3. Analysis
  4. How to report
  5. [optional] Alternative scenarios
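
As a rough illustration, the outline above could translate into an R Markdown skeleton like the one below. The heading levels, the heading id, the section titles, and the angle-bracket placeholders are assumptions made for this sketch, not a required template.

````markdown
## Exemplar: <short title> {#effectsize_exemplar_short_title}

```{r setup_1, warning = FALSE, message = FALSE}
library(tidyverse) # for data wrangling and plotting
```

### Scenario

<when to use this analysis; the study design and research question; the data>

### Analysis

<named code chunks, each followed by a short explanation of its output>

### Reporting

> <example text that could be put in a paper>

### Alternative scenarios (optional)
````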

Code style

We will adopt the tidyverse style guide, with the additions outlined below.

Concrete examples for this style guide are in the "Effect size" guide. Each of the following sections may contain pointers to particular locations in that guide.

Default packages, setup and boilerplate chunks

For readability, we prefer to use packages in tidyverse over standard R or other packages when possible. Use dplyr for data manipulation and ggplot2 for visualization.

If you use other non-standard packages, make sure to add them to the DESCRIPTION file under Imports (for packages from CRAN) or Remotes (for packages from GitHub). This is necessary for Travis to compile the master branch.
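
For orientation, the relevant part of a DESCRIPTION file might look like the fragment below. The entries are placeholders: `some_user/some_package` is hypothetical, and the existing fields in the real file should be left as they are. As far as we know, a package installed through Remotes also has to be listed under Imports, so the sketch lists it in both places.

```
Imports:
    tidyverse,
    broom,
    some_package
Remotes:
    some_user/some_package
```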

At the beginning of the exemplar, add the following code chunk (replace n in setup_n with the exemplar number in the document):

```{r setup_n, warning = FALSE, message = FALSE}
library(tidyverse) # for data wrangling and plotting
library(broom)     # for tidy(): extracting model results in table format
```

The setup chunk has warnings and messages disabled for concise output.

If only one or two functions are needed from a package (or if the package has the bad behavior of clobbering the namespace), use import::from() instead of library() to load the functions needed. Otherwise, except for common packages with a large set of functions (e.g., tidyverse), include a comment in the setup chunk indicating which functions are used from the library.

Even if some of the libraries needed for this exemplar are already loaded prior to the beginning of the exemplar (e.g., in the FAQ or another exemplar), repeat this code chunk here. This will keep everything in the same locus of attention.
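
For the import::from() case described above, a setup chunk might look like the following sketch. The choice of effsize and cohen.d() is only illustrative, and the exemplar number 1 in the chunk name is arbitrary.

```{r setup_1, warning = FALSE, message = FALSE}
library(tidyverse)              # for data wrangling and plotting
library(broom)                  # for tidy(): extracting model results in table format
import::from(effsize, cohen.d)  # for cohen.d(): standardized mean difference
```

Only cohen.d() is attached from effsize here, so the rest of that package's exports stay out of the namespace.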

After the setup chunk, add the following code chunk (replace n in boilerplate_n with the exemplar number in the document):

```{r boilerplate_n, include = FALSE}
set.seed(12)
# Format numbers to `sigdigits` significant digits; drop a trailing "."
format_num <- function(nums, sigdigits = 3) gsub("\\.$", "", formatC(nums, digits = sigdigits, format = "fg", flag = "#"))
```

Boilerplate not necessary for the gist of the code should be put in this chunk. Change the random seed if desired for your data generation (if you do not use data generation, you can delete the set.seed call).
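
To illustrate what the helper is for, here is a small sketch in base R. It defines a local copy of `format_num()` so the snippet runs on its own, and it assumes `sigdigits` is meant to be passed as `formatC()`'s `digits` argument. The input values are arbitrary.

```r
# Local copy of the helper so this snippet is self-contained.
# Assumption: sigdigits is intended as formatC()'s `digits` argument.
format_num <- function(nums, sigdigits = 3) {
  gsub("\\.$", "", formatC(nums, digits = sigdigits, format = "fg", flag = "#"))
}

# Arbitrary example values, each formatted to 3 significant digits.
example_values <- c(3.14159, 123.456, 0.0012345)
formatted <- format_num(example_values)

formatted
```

The gsub() call strips the trailing decimal point that formatC() can leave behind with flag = "#" when no decimals remain, as can happen for 123.456 at three significant digits.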

Code chunks

  • All code chunks should be named

  • Prefer multiple lines with piping over compact but more difficult to read one-liners when it comes to function composition.

  • Output of code should generally have some explanation of it.

  • Each code chunk should end with a line containing a single variable name. That variable is the main output of the chunk. Such a variable may be a number, a plot (as a ggplot object), a table (as a tibble), or a structure with a special print function (e.g., the output from cohen.d() below).

    • What is the rationale for the above recommendation? If the code chunk itself already gives output and the result is not used in future chunks, I think this makes the code more verbose without making it clearer. ---Matt
  • For model output (e.g., from lm()), use the tidy() function from the broom package to extract important parameters in table format.

Example: (Notice cohen_d.)
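
As a stand-in illustration of the rules above (a sketch, not the example from the Effect size guide), the chunk below is named, pipes across several lines, uses tidy() for model output, and ends with the variable holding its main output. It assumes the setup chunk has already loaded tidyverse and broom. The chunk name, the variable name model_summary, and the use of the built-in mtcars data are hypothetical choices.

```{r model_summary_1}
model_summary <-
  lm(mpg ~ wt, data = mtcars) %>%
  tidy(conf.int = TRUE)  # broom: one row per coefficient, with confidence intervals

model_summary
```

Here the final line, model_summary, plays the role of the chunk's main output.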

Reporting text

For textual reporting of results (e.g. numerical results), write an example of text that could be put in a paper in a blockquote (>). Use an inline R code chunk in the text to refer to actual values in the code, and wrap numerical output with the format_num() function (defined in the boilerplate header above).
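
A reporting blockquote along these lines might look like the sketch below. The variables mean_diff, ci_lower, and ci_upper are hypothetical and would need to be computed in earlier chunks, and the wording of the sentence is only a placeholder.

```markdown
> The mean difference in completion time between the two techniques was
> `r format_num(mean_diff)` seconds, 95% CI [`r format_num(ci_lower)`, `r format_num(ci_upper)`].
```

Because the numbers come from inline R code, the reported values stay in sync with the analysis whenever the document is re-knit.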

Static images

For static images that are not generated by R, place the image file (and source file, if applicable) in figures/<guideline_name>. Then, use knitr::include_graphics(). For example, a teaser figure for the Effect Size guideline would be placed with the following command:

knitr::include_graphics("figures/effectsize/teaser.png")

Building preview

Use RStudio to open the project guidelines.Rproj while modifying the guideline. This ensures that path references are correct. During development, you can use the "Knit" button and select "Knit to HTML" to preview only the page you are working on (resulting in faster turnaround).

Before you make a pull request, you should add your page to the book and recompile the whole book. (The default Knit to HTML template and the book template look slightly different.) The following steps allow you to compile the whole book:

  1. Add your page to _bookdown.yml under the rmd_files: section. For example, if your new page is guides/newpage.Rmd and should be placed after the "Principles" page, make the following addition to the section:

    rmd_files:
      - "index.Rmd"
      - "guides/principles.Rmd"
      - "guides/newpage.Rmd"
    ...

    For testing purposes, you can also comment out other pages by adding # in front of the lines.

  2. Then, find the "Build" pane (menu "View" → "Show Build") and click the "Build Book" button. (The first time, RStudio will ask you to install several required packages.)

The built book should be shown in a preview window. Otherwise, you can find it under _book/index.html.
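
If the Build pane is not available, the book can probably also be built from the R console. The sketch below assumes a standard bookdown project with index.Rmd at the project root (as the rmd_files: listing suggests). It is an alternative we have not verified against this repository, not a documented step of these guidelines. The variable can_build is introduced here only for illustration.

```r
# Build the whole book from the R console (run from the project root).
# The guard skips the build if bookdown or index.Rmd is missing.
can_build <- requireNamespace("bookdown", quietly = TRUE) && file.exists("index.Rmd")

if (can_build) {
  bookdown::render_book("index.Rmd")
}
```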
