
P-value Rounding #257

Closed

atheobold opened this issue Oct 9, 2019 · 22 comments

@atheobold

Currently, when a test results in a very small p-value, the default output rounds this p-value to 0. This hinders the conversation about properly reporting the size of your p-value when teaching hypothesis testing.

Could we modify the default output for small p-values?

@echasnovski
Collaborator

Could you please provide a small reproducible example of this behavior? The reprex package may be quite handy here.

@atheobold
Author

library(infer)
library(tidyverse)

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

diff_means <- iris %>% 
              filter(Species != "versicolor") %>% 
              specify(Sepal.Length ~ Species) %>% 
              calculate(stat = "diff in means", 
                        order = c("setosa", "virginica"))

permute <- iris %>% 
           filter(Species != "versicolor") %>% 
           specify(Sepal.Length ~ Species) %>% 
           hypothesize(null = "independence") %>% 
           generate(reps = 1000, type = "permute") %>% 
           calculate(stat = "diff in means", 
                     order = c("setosa", "virginica"))


get_pvalue(permute, diff_means, direction = "two_sided")
#> # A tibble: 1 x 1
#>   p_value
#>     <dbl>
#> 1       0

Created on 2019-10-09 by the reprex package (v0.3.0)

@andrewpbray
Collaborator

I believe this is a case where the p-value is actually effectively zero. If you plot the observed difference in the context of the null distribution:

library(infer)
library(tidyverse)

diff_means <- iris %>% 
  filter(Species != "versicolor") %>% 
  specify(Sepal.Length ~ Species) %>% 
  calculate(stat = "diff in means", 
            order = c("setosa", "virginica"))

permute <- iris %>% 
  filter(Species != "versicolor") %>% 
  specify(Sepal.Length ~ Species) %>% 
  hypothesize(null = "independence") %>% 
  generate(reps = 1000, type = "permute") %>% 
  calculate(stat = "diff in means", 
            order = c("setosa", "virginica"))

permute %>%
  visualize() +
  shade_p_value(obs_stat = diff_means, direction = "two_sided")

Created on 2019-10-09 by the reprex package (v0.3.0)

It is such an extreme value that reporting a p-value of zero seems reasonable to me.

@atheobold
Author

Yes, you are correct. In this case, the p-value is negligible and effectively zero. Yet, to students, the p-value reported appears to be exactly zero. Concerningly, when using a parametric approach with a t-test, the p-value reported is not exactly 0 but a very small value.

library(infer)
library(tidyverse)

iris %>% 
  filter(Species != "versicolor") %>% 
  t_test(Sepal.Length ~ Species, 
         order = c("setosa", "virginica"), 
         alternative = "two_sided")
#> # A tibble: 1 x 6
#>   statistic  t_df  p_value alternative lower_ci upper_ci
#>       <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
#> 1     -15.4  76.5 3.97e-25 two.sided      -1.79    -1.38

Created on 2019-10-10 by the reprex package (v0.3.0)

Generally, we teach students that the probability of any value on a continuous distribution is never 0. Furthermore, when employing a permutation test, a p-value is never exactly 0. Instead, it is less than 1 divided by the number of replications you performed.

I suggest two possible options to alleviate these concerns when the p-value is effectively 0 (a rough sketch of option 2 follows this list):

  1. change the output of the permutation test to be ~0 instead of 0
  2. change the output of the permutation test to be < 1/(number of reps)
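
As a minimal sketch of option 2, assuming the p-value is post-processed after get_pvalue() (the helper name format_p_value() is hypothetical, not part of infer):

# Hypothetical helper illustrating option 2: report "< 1/reps" when the
# simulated p-value is exactly zero. Note that this coerces the result
# to a character string.
format_p_value <- function(p_value, reps) {
  if (p_value == 0) paste0("< ", 1 / reps) else as.character(p_value)
}

format_p_value(0, reps = 1000)
#> [1] "< 0.001"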

@echasnovski
Collaborator

For reference: very similar issues are discussed in #205, mainly after this comment.

@atheobold
Author

Yes, the concerns brought up in that issue reflect a similar sentiment towards reporting an exact 0 for a p-value when using a permutation test. I agree with @TimHesterberg that "a p-value of zero is literally impossible. You shouldn't report an impossible p-value" (comment). As @ismayc correctly asserted, a p-value in a permutation test is "dependent on how many samples we generated" (comment).

Therefore, I believe it would be reasonable to take either of the two approaches I proposed above to remedy this output issue. Adding a simple check to see if the p-value is below a given threshold and modifying the output to be "~0" or "< 1/reps" seems to be a reasonable resolution.

@echasnovski
Collaborator

Hmmm... I just realized that from a programming perspective these approaches are tricky. The output of get_pvalue() contains a number, but "~0" and "< 1/reps" are strings. Those are doable only under the assumption that the p-value output is the final product of a pipeline and will not be used in a later computation. Otherwise it will throw errors in many cases.

@ismayc
Collaborator

ismayc commented Oct 13, 2019

Maybe returning a warning stating that a value of 0 is based on the number of simulations chosen, and that the true value is likely to be very small, would suffice instead of trying to return a string here. We could also create a vignette showing this expected behavior so that users are aware of it before it actually happens.

@atheobold
Author

@echasnovski, I see the difficulty with coercing the p-value to a character and how that may result in issues down the road. I agree with @ismayc that adding a warning based on the size of the p-value or the number of simulations would be sufficient to make students think twice before reporting a p-value of 0.

@echasnovski
Collaborator

Based on the discussion, the best compromise seems to be adding a warning if the output is exactly 0, as @ismayc suggested.
Something like "Note that the zero p-value is based on the number of random samples computed in the generate() step. Here it means that the p-value is likely to be less than {1/reps}.", where reps is the number of samples computed in the generate() step (i.e. the number of samples drawn from the null distribution).
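
A minimal sketch of what such a check might look like (the helper name warn_on_zero_p_value() is hypothetical; the eventual implementation inside get_p_value() may differ):

# Hypothetical sketch of the proposed warning, not actual infer internals.
warn_on_zero_p_value <- function(p_value, reps) {
  if (p_value == 0) {
    warning(
      "Note that the zero p-value is based on the number of random samples ",
      "computed in the generate() step. Here it means that the p-value is ",
      "likely to be less than ", 1 / reps, ".",
      call. = FALSE
    )
  }
  p_value
}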
@atheobold, is this good enough?

@TimHesterberg

TimHesterberg commented Jan 3, 2020 via email

@atheobold
Author

@echasnovski, I like your compromise. While @TimHesterberg's suggestion is interesting, I don't feel that the output needs to contain a confidence interval for the p-value.

Rather, I would appreciate it if, when the output of get_pvalue() is 0, it came with a word of caution. This would help the teachers who use the infer package when teaching permutation tests to better convey to students how p-values from permutation tests should be reported.

@TimHesterberg

TimHesterberg commented Jan 3, 2020 via email

@echasnovski
Collaborator

Deciding what to use as an upper bound for the p-value when it is zero proved to be challenging. @TimHesterberg, I'll be honest: I didn't really understand the reasoning behind choosing 3/reps as the upper bound (I got confused by "If the true P-value is 3/reps", because we don't really know the true p-value). However, I really liked the CI approach to the problem.

That got me thinking about how to approach this, which led to the following framework I can understand. Each time a value is generated from the null distribution, it can land inside the critical region (the region whose probability under the null distribution is, by definition, the p-value) or outside of it. So each "is the value inside the region" draw can be viewed as a sample from a Bernoulli distribution whose parameter is exactly the desired p-value. A confidence interval for that parameter can thus be viewed as a confidence interval for the p-value.

I knew this should be something meaningful when I found this answer on Cross Validated, which states that under certain conditions the 95% CI for a zero estimate is [0, 3/reps].

So I suggest the following warning: "Note that the zero p-value is based on the number of random samples computed in the `generate()` step. Here it means that the p-value is 95% likely to be between 0 and {ci[2]}.". The variable ci is a numeric vector of length 2, computed with binom.test() (with default arguments). Its upper bound aligns pretty well with 3/reps and has a nice theoretical background. One question, though: is the phrase "95% likely to be" understandable and correct?
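
As a minimal sketch of that computation (reps = 1000 is just an illustrative choice):

# 0 "successes" (no permuted stats at least as extreme as the observed one)
# out of reps Bernoulli draws; the default Clopper-Pearson 95% CI from
# binom.test() then bounds the unknown p-value.
reps <- 1000
ci <- binom.test(x = 0, n = reps)$conf.int
ci[2]  # upper bound, roughly 0.0037 here -- close to the 3/reps rule of three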

@andrewpbray, @ismayc, any thoughts? By the way, this also means that we can implement a get_p_value_ci() function (if its existence and approach are agreed to be reasonable).
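
A rough sketch of what such a function could compute, assuming a null_dist tibble from calculate() with a stat column and a deliberately simplified two-sided extremeness rule (the name get_p_value_ci() and this counting rule are illustrative, not an agreed design):

# Hypothetical sketch: count permuted stats at least as extreme as the
# observed one and feed that count into binom.test() for a p-value CI.
get_p_value_ci <- function(null_dist, obs_stat) {
  extreme <- sum(abs(null_dist$stat) >= abs(obs_stat$stat))
  binom.test(x = extreme, n = nrow(null_dist))$conf.int
}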

@ismayc
Collaborator

ismayc commented Jan 4, 2020

I fear that adding this much content to a warning message is going to further confuse some introductory statistics students when they see it. Something like “Please be cautious. A value of 0 for the p-value here is an approximation based on the number of reps chosen. The true p-value is a small value likely less than 1/reps.” might be better.

I used 1 instead of 3 above since we probably shouldn’t assume someone using the {infer} package has experience with distributions like the Poisson. Alternatively, a vignette could be created and then linked in the warning message to explain what is going on so the message isn’t so wordy.

@echasnovski
Collaborator

Ah, so not replace 1/reps and reps with actual values? Yeah, this might work and is a simpler implementation. However, using 1/reps does seem like an imprecise alternative.

So I'll soon just add a simple warning without much extra information about the upper bound of the p-value.

@TimHesterberg

TimHesterberg commented Jan 4, 2020 via email

@ismayc
Collaborator

ismayc commented Jan 4, 2020

One reason we avoided all of this is that the stats student using {infer} has been taught to think about the p-value in an intuitive way. I understand the concerns about imprecision, so feel free to adapt as needed, but a warning message saying something like 7.22/1000 is going to be more confusing than it is worth to an intro stats reader while inside R. I recommend adding a simple warning to be cautious of the p-value when it is 0 (or sufficiently small), and then adding a vignette that goes into a lot more detail and linking to it in the warning. In the vignette, all of this discussion can be covered in much more detail. This helps solve both problems: being cautious of the reps size, and providing further information as needed without overwhelming a novice learner.

@echasnovski
Collaborator

I think a vignette is a little bit too much. A section in the help page of get_p_value() should be enough, I think, together with this message (in case the p-value is exactly 0): “Please be cautious. A value of 0 for the p-value here is an approximation based on the number of `reps` chosen in the `generate()` step. The true p-value is a small value likely less than `3/reps`.”

@ismayc
Collaborator

ismayc commented Jan 4, 2020

That’s fine with me too! If someone wanted to contribute a vignette via PR to explain things further, we’d welcome that as well.

@atheobold
Author

I agree with @ismayc that, when making changes, we should be mindful of who is using the infer package and for what purpose. I suspect that many others, like myself, are using the infer package as a more intuitive interface for teaching simulation-based methods in R in an Introductory or Intermediate Statistics course. The students in these courses often struggle with the concept of a p-value, which makes outputting a p-value of exactly 0 problematic.

While statistically I agree with the precision of presenting an upper bound for the p-value of a permutation test, this will likely confuse students. Many Introductory Statistics textbooks that employ simulation-based inference advocate reporting such p-values as less than 1 over the number of replications generated.

I appreciate the revised warning message, but I do think that further information about why the value of 3/reps is being reported would be useful!

@ismayc closed this as completed Jan 27, 2020
@github-actions

github-actions bot commented Mar 8, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Mar 8, 2021