bug with visualize() #424

Closed
andrewpbray opened this issue Oct 20, 2021 · 2 comments · Fixed by #437

@andrewpbray
Collaborator

The problem

There seems to be a plotting glitch that's actually reproduced in the documentation (the full pipeline article).

Reproducible example

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(infer)
r_hat <- gss %>% 
    observe(college ~ sex, success = "no degree",
            stat = "ratio of props", order = c("female", "male"))
set.seed(33)
null_dist <- gss %>%
    specify(college ~ sex, success = "no degree") %>%
    hypothesize(null = "independence") %>% 
    generate(reps = 1000) %>% 
    calculate(stat = "ratio of props", order = c("female", "male"))
#> Setting `type = "permute"` in `generate()`.
visualize(null_dist) +
    shade_p_value(obs_stat = r_hat, direction = "two-sided")

Created on 2021-10-20 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/Los_Angeles         
#>  date     2021-10-20                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source                             
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 4.1.0)                     
#>  cli           3.0.1       2021-07-17 [1] CRAN (R 4.1.0)                     
#>  colorspace    2.0-2       2021-06-24 [1] CRAN (R 4.1.0)                     
#>  crayon        1.4.1       2021-02-08 [1] CRAN (R 4.1.0)                     
#>  curl          4.3.2       2021-06-23 [1] CRAN (R 4.1.0)                     
#>  DBI           1.1.1       2021-01-15 [1] CRAN (R 4.1.0)                     
#>  digest        0.6.27      2020-10-24 [1] CRAN (R 4.1.0)                     
#>  dplyr       * 1.0.7       2021-06-18 [1] CRAN (R 4.1.0)                     
#>  ellipsis      0.3.2       2021-04-29 [1] CRAN (R 4.1.0)                     
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 4.1.0)                     
#>  fansi         0.5.0       2021-05-25 [1] CRAN (R 4.1.0)                     
#>  farver        2.1.0       2021-02-28 [1] CRAN (R 4.1.0)                     
#>  fastmap       1.1.0       2021-01-25 [1] CRAN (R 4.1.0)                     
#>  fs            1.5.0       2020-07-31 [1] CRAN (R 4.1.0)                     
#>  generics      0.1.0       2020-10-31 [1] CRAN (R 4.1.0)                     
#>  ggplot2       3.3.5       2021-06-25 [1] CRAN (R 4.1.0)                     
#>  glue          1.4.2       2020-08-27 [1] CRAN (R 4.1.0)                     
#>  gtable        0.3.0       2019-03-25 [1] CRAN (R 4.1.0)                     
#>  highr         0.9         2021-04-16 [1] CRAN (R 4.1.0)                     
#>  htmltools     0.5.2       2021-08-25 [1] CRAN (R 4.1.0)                     
#>  httr          1.4.2       2020-07-20 [1] CRAN (R 4.1.0)                     
#>  infer       * 1.0.0       2021-08-13 [1] CRAN (R 4.1.0)                     
#>  knitr         1.34        2021-09-09 [1] CRAN (R 4.1.0)                     
#>  labeling      0.4.2       2020-10-20 [1] CRAN (R 4.1.0)                     
#>  lifecycle     1.0.1       2021-09-24 [1] CRAN (R 4.1.0)                     
#>  magrittr      2.0.1       2020-11-17 [1] CRAN (R 4.1.0)                     
#>  mime          0.11        2021-06-23 [1] CRAN (R 4.1.0)                     
#>  munsell       0.5.0       2018-06-12 [1] CRAN (R 4.1.0)                     
#>  pillar        1.6.3       2021-09-26 [1] CRAN (R 4.1.0)                     
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 4.1.0)                     
#>  purrr         0.3.4       2020-04-17 [1] CRAN (R 4.1.0)                     
#>  R6            2.5.1       2021-08-19 [1] CRAN (R 4.1.0)                     
#>  reprex        2.0.0       2021-04-02 [1] CRAN (R 4.1.0)                     
#>  rlang         0.4.11      2021-04-30 [1] CRAN (R 4.1.0)                     
#>  rmarkdown     2.10        2021-08-06 [1] CRAN (R 4.1.0)                     
#>  rstudioapi    0.13.0-9000 2021-07-07 [1] Github (rstudio/rstudioapi@96fad1d)
#>  scales        1.1.1       2020-05-11 [1] CRAN (R 4.1.0)                     
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 4.1.0)                     
#>  stringi       1.7.4       2021-08-25 [1] CRAN (R 4.1.0)                     
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 4.1.0)                     
#>  tibble        3.1.5       2021-09-30 [1] CRAN (R 4.1.0)                     
#>  tidyselect    1.1.1       2021-04-30 [1] CRAN (R 4.1.0)                     
#>  utf8          1.2.2       2021-07-24 [1] CRAN (R 4.1.0)                     
#>  vctrs         0.3.8       2021-04-29 [1] CRAN (R 4.1.0)                     
#>  withr         2.4.2       2021-04-18 [1] CRAN (R 4.1.0)                     
#>  xfun          0.25        2021-08-06 [1] CRAN (R 4.1.0)                     
#>  xml2          1.3.2       2020-04-23 [1] CRAN (R 4.1.0)                     
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 4.1.0)                     
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
@simonpcouch
Collaborator

Didn't end up coming to a fix, but some notes from poking around with this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(infer)

r_hat <- gss %>% 
  observe(college ~ sex, success = "no degree",
          stat = "ratio of props", order = c("female", "male"))

set.seed(33)

null_dist <- gss %>%
  specify(college ~ sex, success = "no degree") %>%
  hypothesize(null = "independence") %>% 
  generate(reps = 1000) %>% 
  calculate(stat = "ratio of props", order = c("female", "male"))
#> Setting `type = "permute"` in `generate()`.

visualize(null_dist) +
  shade_p_value(obs_stat = r_hat, direction = "two-sided")

# just draw the problematic portion
visualize(null_dist) +
  shade_p_value(obs_stat = r_hat, direction = "left")

# or both of them, using a slightly jittered r_hat
r_hat$stat <- .989

visualize(null_dist) +
  shade_p_value(obs_stat = r_hat, direction = "both")

Created on 2021-11-13 by the reprex package (v2.0.0)

The calculated statistic seems to fall exactly on the boundary of one of the bins, which results in the y-coordinate from geom_area being twice what it should be for the right-most bin of the left-hand shaded region. I'm mostly poking around here:

infer/R/shade_p_value.R

Lines 245 to 252 in a4a41d6

left_area <- one_tail_area(
  min(obs_stat, second_border), "left", do_warn = FALSE
)(data)
right_area <- one_tail_area(
  max(obs_stat, second_border), "right", do_warn = FALSE
)(data)
dplyr::bind_rows(left_area, right_area)

Note, in this case, that

obs_stat == second_border
> TRUE
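
To make the note above a bit more concrete, here is a minimal base-R sketch of what coinciding borders could do to the combined data. It is not infer's code: `stack_tails()` is a hypothetical helper, the vertex values (0.95, 1.05, a height of 40) and the mirrored border 1.011 are made up, and summing `y` over identical `x` is only an assumption about how the duplicated point might end up drawn at twice its height.

```r
# Hypothetical stand-in for binding the left and right tail areas.
# Each tail is reduced to two made-up vertices with the same height.
stack_tails <- function(obs_stat, second_border) {
  left_area <- data.frame(
    x = c(0.95, min(obs_stat, second_border)),
    y = c(40, 40)
  )
  right_area <- data.frame(
    x = c(max(obs_stat, second_border), 1.05),
    y = c(40, 40)
  )
  combined <- rbind(left_area, right_area)

  # Assumption: rows sharing an x value get their y values added together,
  # roughly what stacking would do to duplicated points.
  aggregate(y ~ x, data = combined, FUN = sum)
}

# Borders coincide, as in the issue: the shared x appears in both tails.
coincide <- stack_tails(obs_stat = 1.00, second_border = 1.00)

# Borders differ (jittered statistic, made-up mirror): no shared x.
apart <- stack_tails(obs_stat = 0.989, second_border = 1.011)

print(coincide)
print(apart)
```

In this toy setup the shared border ends up with a summed height of 80 when the borders coincide and stays at 40 when they differ, which would be consistent with the jittered `r_hat` drawing correctly. Whether this is what actually happens inside `geom_area` here is not confirmed.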

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 15, 2022