Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broom:::tidy.glht() not working with conf.int = TRUE #1103

Closed
behrman opened this issue Jun 1, 2022 · 7 comments
Closed

broom:::tidy.glht() not working with conf.int = TRUE #1103

behrman opened this issue Jun 1, 2022 · 7 comments

Comments

@behrman
Copy link

behrman commented Jun 1, 2022

In the following code, broom:::tidy.glht() throws an error when conf.int = TRUE.

The cause for this may be that broom:::tidy.confint.glht() expects broom:::glht_term_column() to return a tibble with a term column. Instead, it returns NULL, since the glht object does not have a focus component.

library(tidyverse)

set.seed(660)
n <- 10

data <- 
  tibble(
    y = rnorm(n),
    a = factor(sample(0:1, size = n, replace = TRUE)),
    x = rnorm(n),
  )

fit <- lm(y ~ a * x, data = data)

linfct <- matrix(0, ncol = length(coef(fit)))
colnames(linfct) <- names(coef(fit))
linfct[, c("a1", "a1:x")] <- c(1, 1)

x <- multcomp::glht(fit, linfct = linfct)

broom::tidy(x)
#> # A tibble: 1 × 6
#>   contrast null.value estimate std.error statistic adj.p.value
#>   <chr>         <dbl>    <dbl>     <dbl>     <dbl>       <dbl>
#> 1 1                 0   0.0853     0.905    0.0943       0.928

confint(x)
#> 
#>   Simultaneous Confidence Intervals
#> 
#> Fit: lm(formula = y ~ a * x, data = data)
#> 
#> Quantile = 2.4469
#> 95% family-wise confidence level
#>  
#> 
#> Linear Hypotheses:
#>        Estimate lwr      upr     
#> 1 == 0  0.08529 -2.12862  2.29919

names(x)
#> [1] "model"       "linfct"      "rhs"         "coef"        "vcov"       
#> [6] "df"          "alternative" "type"

broom:::glht_term_column(x)

broom::tidy(x, conf.int = TRUE)
#> Joining, by = c("contrast", "estimate")
#> Error in `select()`:
#> ! Can't subset columns that don't exist.
#> ✖ Column `term` doesn't exist.

Created on 2022-05-31 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       macOS Catalina 10.15.7
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Los_Angeles
#>  date     2022-05-31
#>  pandoc   2.14.0.3 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.1.0)
#>  broom         0.8.0   2022-04-13 [1] CRAN (R 4.1.2)
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.1.0)
#>  cli           3.3.0   2022-04-25 [1] CRAN (R 4.1.2)
#>  codetools     0.2-18  2020-11-04 [1] CRAN (R 4.1.2)
#>  colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.1.2)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.1.2)
#>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.1.2)
#>  dbplyr        2.1.1   2021-04-06 [1] CRAN (R 4.1.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.0)
#>  dplyr       * 1.0.9   2022-04-28 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.1.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.0)
#>  generics      0.1.2   2022-01-31 [1] CRAN (R 4.1.2)
#>  ggplot2     * 3.3.6   2022-05-03 [1] CRAN (R 4.1.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.2)
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
#>  haven         2.5.0   2022-04-15 [1] CRAN (R 4.1.2)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  hms           1.1.1   2021-09-26 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
#>  httr          1.4.3   2022-05-04 [1] CRAN (R 4.1.2)
#>  jsonlite      1.8.0   2022-02-22 [1] CRAN (R 4.1.2)
#>  knitr         1.39    2022-04-26 [1] CRAN (R 4.1.2)
#>  lattice       0.20-45 2021-09-22 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
#>  lubridate     1.8.0   2021-10-07 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.2)
#>  MASS          7.3-57  2022-04-22 [1] CRAN (R 4.1.2)
#>  Matrix        1.4-1   2022-03-23 [1] CRAN (R 4.1.2)
#>  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.1.0)
#>  multcomp      1.4-19  2022-04-26 [1] CRAN (R 4.1.2)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
#>  mvtnorm       1.1-3   2021-10-08 [1] CRAN (R 4.1.0)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.1.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
#>  readr       * 2.1.2   2022-01-30 [1] CRAN (R 4.1.2)
#>  readxl        1.4.0   2022-03-28 [1] CRAN (R 4.1.2)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.1.2)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  rvest         1.0.2   2021-10-16 [1] CRAN (R 4.1.0)
#>  sandwich      3.0-1   2021-05-18 [1] CRAN (R 4.1.0)
#>  scales        1.2.0   2022-04-13 [1] CRAN (R 4.1.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.0)
#>  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.7.0   2022-03-13 [1] CRAN (R 4.1.2)
#>  survival      3.3-1   2022-03-03 [1] CRAN (R 4.1.2)
#>  TH.data       1.1-1   2022-04-26 [1] CRAN (R 4.1.2)
#>  tibble      * 3.1.7   2022-05-03 [1] CRAN (R 4.1.2)
#>  tidyr       * 1.2.0   2022-02-01 [1] CRAN (R 4.1.2)
#>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.1.2)
#>  tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.1.0)
#>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.1.2)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.1.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.31    2022-05-10 [1] CRAN (R 4.1.2)
#>  xml2          1.3.3   2021-11-30 [1] CRAN (R 4.1.0)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.1.2)
#>  zoo           1.8-10  2022-04-15 [1] CRAN (R 4.1.2)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@simonpcouch
Copy link
Collaborator

I appreciate you reporting this! This edge case is now handled:

library(tidyverse)
library(broom)
library(multcomp)
#> Loading required package: mvtnorm
#> Loading required package: survival
#> Loading required package: TH.data
#> Loading required package: MASS
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
#> 
#> Attaching package: 'TH.data'
#> The following object is masked from 'package:MASS':
#> 
#>     geyser

set.seed(660)
n <- 10

data <- 
  tibble(
    y = rnorm(n),
    a = factor(sample(0:1, size = n, replace = TRUE)),
    x = rnorm(n),
  )

fit <- lm(y ~ a * x, data = data)

linfct <- matrix(0, ncol = length(coef(fit)))
colnames(linfct) <- names(coef(fit))
linfct[, 1:4] <- rep(1, 4)

x <- glht(fit, linfct = linfct)

tidy(x, conf.int = TRUE)
#> Joining, by = c("contrast", "estimate")
#> # A tibble: 1 × 7
#>   contrast null.value estimate conf.low conf.high statistic adj.p.value
#>   <chr>         <dbl>    <dbl>    <dbl>     <dbl>     <dbl>       <dbl>
#> 1 1                 0   0.0368    -1.91      1.98    0.0463       0.965

Created on 2022-06-01 by the reprex package (v2.0.1)

@behrman
Copy link
Author

behrman commented Jun 1, 2022

Many thanks to you and your team for this excellent package. I'm currently writing code for a book, and it's wonderful to be able to use broom as a common interface to many packages, something I'm sure students will appreciate.

I'd like to make a suggestion for broom:::tidy.glht(). It would be good if the code could avoid the message from the left_join(), in this case

Joining, by = c("contrast", "estimate")

And do you mean to be joining on the estimate? If it was calculated in two different ways in the two tibbles, the two results could differ slightly.

@simonpcouch
Copy link
Collaborator

Yeah, I noticed that join message while I was putting together the fix.🤔 The tidyselect magic I used for f010098 doesn't do the trick here, and I don't feel confident enough in my understanding of the multcomp internals to alter that join code myself. I'd welcome a PR/suggestion if you have an idea for some solid logic to specify that join. :)

@behrman
Copy link
Author

behrman commented Jun 2, 2022

Fortunately, I don't believe you need to understand multcomp internals to specify the by = columns for the join in broom:::tidy.glht(). The join is joining a tibble created by broom:::tidy.summary.ghlt() with one created by broom:::tidy.confint.glht(), so those two functions can tell us the possible columns.

The first tibble of the join can have these columns:

[term] contrast null.value estimate statistic [adj.p.value | p.value]

term may be present or not (the reason for my original issue), and there is exactly one of adj.p.value or p.value.

The second tibble of the join can have these columns:

[term] contrast estimate conf.low conf.high

where again term may be present or not.

Thus, the present code is equivalent to specifying the by = in two different cases:

  1. If both of the tibbles have a term column: by = c("term", "contrast", "estimate")
  2. If either of the tibbles lacks a term column: by = c("contrast", "estimate")

Someone can double check me on this. I suspect that the estimate column in both tibbles is drawn from the same column in the glht object, so joining on it would not have the problem of doubles being slightly different.

@simonpcouch
Copy link
Collaborator

Yup, that joining on doubles bit is what makes me hesitant, and I'm not confident contrast well-specifies the join in the cases where we don't have term. I'll let this sit over the weekend in case anyone drops a note here before adjusting the join logic. :)

@simonpcouch simonpcouch reopened this Jun 2, 2022
@behrman
Copy link
Author

behrman commented Jun 2, 2022

I agree that the logic of this should be reviewed, especially joining on the double estimate. For example, what would happen if two different estimates have exactly the same value.

@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants