
Consistency of emmeans tidiers (with other tidiers) #692

@crsh


While working on a PR to add a new emmeans-tidier, I noticed that the emmeans-tidiers have some internal and external inconsistencies:

  1. The lsmobj-method uses the common arguments conf.int and conf.level as defined in the param_confint template. The other methods (e.g., emmGrid) do not provide these arguments and instead rely on the argument names native to the emmeans summary()-methods (e.g., infer and level).
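For reference, this is a minimal sketch of the emmeans-native arguments mentioned above. It only uses summary() from emmeans with its infer and level arguments, on the warpbreaks model used elsewhere in this issue; it does not show what any broom tidier currently does.

```r
library(emmeans)

fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
emm <- emmeans(fm1, ~ tension)

# emmeans-native interface:
#   infer = c(TRUE, FALSE) requests confidence intervals but no tests
#   level = 0.90 sets the confidence level
s <- summary(emm, infer = c(TRUE, FALSE), level = 0.90)
s
```

With the template arguments, the same request would be spelled conf.int = TRUE, conf.level = 0.90.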

I haven't looked exhaustively at other methods, but I additionally noticed some inconsistencies compared to other contrast tidiers, specifically tidy.TukeyHSD():

  2. tidy.TukeyHSD() reports the contrasted conditions in a single column labelled comparison, in the form a-b. In contrast, the emmeans tidiers return the same information in two columns labelled level1 and level2 (containing a and b).

         library(broom)
         library(emmeans)

         fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)

         # Tukey HSD contrasts: one comparison column, adj.p.value
         thsd <- TukeyHSD(fm1, "tension", ordered = TRUE)
         tidy(thsd)

         # emmeans pairwise contrasts: level1/level2 columns, p.value
         emmp <- pairs(emmeans(fm1, ~ tension))
         tidy(emmp)

  3. In the tibble returned by tidy.TukeyHSD(), the column containing p-values is labelled adj.p.value. In contrast, the emmeans tidiers label this column p.value regardless of whether it has been adjusted for multiple comparisons or not (see code above). Unless I missed something, the use of adj.p.value is currently unique to tidy.TukeyHSD().
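To illustrate that the two contrast layouts carry the same information, here is a rough base-R sketch that converts between an a-b label and a level1/level2 pair. It works on the raw TukeyHSD() matrix rather than on any tidier output, and the names tab, level1 and level2 are only chosen to mirror the column names discussed above. It assumes that factor levels do not themselves contain a hyphen.

```r
fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
thsd <- TukeyHSD(fm1, "tension", ordered = TRUE)

# The raw result stores each contrast as a row name of the form "a-b"
tab <- as.data.frame(thsd$tension)
comparison <- rownames(tab)

# One column -> two columns (breaks if a level name contains "-")
parts <- strsplit(comparison, "-", fixed = TRUE)
tab$level1 <- vapply(parts, `[`, character(1), 1)
tab$level2 <- vapply(parts, `[`, character(1), 2)

# Two columns -> one column
tab$comparison <- paste(tab$level1, tab$level2, sep = "-")
tab
```

Going from two columns to one is always safe, whereas splitting a-b relies on the separator not appearing in level names, which may be an argument for the level1/level2 layout.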

It seems desirable to keep things consistent across methods where possible, and particularly within the set of tidiers for a given package. I would therefore suggest the following changes, which I'd be willing to implement in a PR:

  1. Add the arguments conf.int and conf.level to all emmeans tidiers.
  2. Change reporting of contrast pairs in either tidy.TukeyHSD() or the emmeans-methods. I'm not sure which of the two is preferable here.
  3. Either use adj.p.value in emmeans tidiers whenever p-values are adjusted for multiple comparisons, or use p.value in tidy.TukeyHSD(). Again, I'm not sure which is preferable.
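As a rough sketch of what suggestions 1 and 3 could look like together: the helper tidy_emm_sketch() below is hypothetical and not part of broom or emmeans. It maps conf.int and conf.level onto the infer and level arguments of the emmeans summary() method, and labels the p-value column adj.p.value whenever an adjustment other than "none" is requested. The adjust argument with its "tukey" default is an assumption made for illustration; note that emmeans may substitute a different adjustment than the one requested in some situations, which this sketch does not detect.

```r
library(emmeans)

# Hypothetical helper, for illustration only
tidy_emm_sketch <- function(x, conf.int = FALSE, conf.level = 0.95,
                            adjust = "tukey") {
  # infer = c(intervals?, tests?): always compute tests, intervals on request
  s <- summary(x, infer = c(conf.int, TRUE), level = conf.level,
               adjust = adjust)
  s <- as.data.frame(s)

  # Rename the p-value column only when an adjustment was requested
  if (!identical(adjust, "none")) {
    names(s)[names(s) == "p.value"] <- "adj.p.value"
  }
  s
}

fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
emmp <- pairs(emmeans(fm1, ~ tension))

adj <- tidy_emm_sketch(emmp, conf.int = TRUE)   # Tukey-adjusted, with CIs
raw <- tidy_emm_sketch(emmp, adjust = "none")   # unadjusted, no CIs
```

The alternative in suggestion 3 (always using p.value) would simply drop the renaming step.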

Any thoughts?
