
P-value Rounding #257

Closed

atheobold opened this issue Oct 9, 2019 · 22 comments

@atheobold

Currently, when a test results in a very small p-value, the default output rounds this p-value to 0. This hinders the conversation about properly reporting the size of your p-value when teaching hypothesis testing.

Could we modify the default output for small p-values?

@echasnovski
Collaborator

Could you please provide a small reproducible example of this behavior? The reprex package may be quite handy here.

@atheobold
Author

library(infer)
library(tidyverse)

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

diff_means <- iris %>% 
              filter(Species != "versicolor") %>% 
              specify(Sepal.Length ~ Species) %>% 
              calculate(stat = "diff in means", 
                        order = c("setosa", "virginica"))

permute <- iris %>% 
           filter(Species != "versicolor") %>% 
           specify(Sepal.Length ~ Species) %>% 
           hypothesize(null = "independence") %>% 
           generate(reps = 1000, type = "permute") %>% 
           calculate(stat = "diff in means", 
                     order = c("setosa", "virginica"))


get_pvalue(permute, diff_means, direction = "two_sided")
#> # A tibble: 1 x 1
#>   p_value
#>     <dbl>
#> 1       0

Created on 2019-10-09 by the reprex package (v0.3.0)

@andrewpbray
Collaborator

I believe this is a case where the p-value is actually effectively zero. If you plot the observed difference in the context of the null distribution:

library(infer)
library(tidyverse)

diff_means <- iris %>% 
  filter(Species != "versicolor") %>% 
  specify(Sepal.Length ~ Species) %>% 
  calculate(stat = "diff in means", 
            order = c("setosa", "virginica"))

permute <- iris %>% 
  filter(Species != "versicolor") %>% 
  specify(Sepal.Length ~ Species) %>% 
  hypothesize(null = "independence") %>% 
  generate(reps = 1000, type = "permute") %>% 
  calculate(stat = "diff in means", 
            order = c("setosa", "virginica"))

permute %>%
  visualize() +
  shade_p_value(obs_stat = diff_means, direction = "two_sided")

Created on 2019-10-09 by the reprex package (v0.3.0)

It is such an extreme value that reporting a p-value of zero seems reasonable to me.

@atheobold
Author

Yes, you are correct. In this case, the p-value is negligible and effectively zero. Yet, to students, the p-value reported appears to be exactly zero. Concerningly, when using a parametric approach with a t-test, the p-value reported is not exactly 0 but a very small value.

library(infer)
library(tidyverse)

iris %>% 
  filter(Species != "versicolor") %>% 
  t_test(Sepal.Length ~ Species, 
         order = c("setosa", "virginica"), 
         alternative = "two_sided")
#> # A tibble: 1 x 6
#>   statistic  t_df  p_value alternative lower_ci upper_ci
#>       <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
#> 1     -15.4  76.5 3.97e-25 two.sided      -1.79    -1.38

Created on 2019-10-10 by the reprex package (v0.3.0)

Generally, we teach students that the probability of any value on a continuous distribution is never 0. Furthermore, when employing a permutation test, a p-value is never exactly 0. Instead, it is less than 1 divided by the number of replications you performed.

I suggest two possible options to alleviate these concerns when the p-value is effectively 0 (a rough sketch of option 2 follows this list):

  1. change the output of the permutation test to be ~0 instead of 0
  2. change the output of the permutation test to be < 1/(number of reps)
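
As a minimal sketch of option 2, assuming the p-value is post-processed after get_pvalue() (the helper name format_p_value() is hypothetical, not part of infer):

# Hypothetical helper illustrating option 2: report "< 1/reps" when the
# simulated p-value is exactly zero. Note that this coerces the result
# to a character string.
format_p_value <- function(p_value, reps) {
  if (p_value == 0) paste0("< ", 1 / reps) else as.character(p_value)
}

format_p_value(0, reps = 1000)
#> [1] "< 0.001"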

@echasnovski
Collaborator

For reference: very similar issues are discussed in #205, mainly after this comment.

@atheobold
Author

Yes, the concerns brought up in that issue reflect a similar sentiment towards reporting an exact 0 for a p-value when using a permutation test. I agree with @TimHesterberg that "a p-value of zero is literally impossible. You shouldn't report an impossible p-value" (comment). As @ismayc correctly asserted, a p-value in a permutation test is "dependent on how many samples we generated" (comment).

Therefore, I believe it would be reasonable to take either of the two approaches I proposed above to remedy this output issue. Adding a simple check to see if the p-value is below a given threshold and modifying the output to be "~0" or "< 1/reps" seems to be a reasonable resolution.

@echasnovski
Collaborator

Hmmm... I just realized that from a programming perspective these approaches are tricky. The output of get_pvalue() contains a number, but "~0" and "< 1/reps" are strings. Those are doable only under the assumption that the p-value output is the final product of a pipeline and will not be used in a later computation. Otherwise it will throw errors in many cases.

@ismayc
Collaborator

ismayc commented Oct 13, 2019

Maybe returning a warning stating that a value of 0 is based on the number of simulations chosen, and that the true value is likely to be very small, would suffice instead of trying to return a string here. We could also create a vignette showing this expected behavior so that users are aware of it before it actually happens.

@atheobold
Author

@echasnovski, I see the difficulty with coercing the p-value to a character and how that may result in issues down the road. I agree with @ismayc that adding a warning based on the size of the p-value or the number of simulations would be sufficient to make students think twice before reporting a p-value of 0.

@echasnovski
Collaborator

Based on the discussion, the best compromise seems to be adding a warning if the output is exactly 0, as @ismayc suggested.
Something like "Note that the zero p-value is based on the number of random samples computed in the generate() step. Here it means that the p-value is likely to be less than {1/reps}.", where reps is the number of samples computed in the generate() step (i.e. the number of samples drawn from the null distribution).
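
A minimal sketch of what such a check might look like (the helper name warn_on_zero_p_value() is hypothetical; the eventual implementation inside get_p_value() may differ):

# Hypothetical sketch of the proposed warning, not actual infer internals.
warn_on_zero_p_value <- function(p_value, reps) {
  if (p_value == 0) {
    warning(
      "Note that the zero p-value is based on the number of random samples ",
      "computed in the generate() step. Here it means that the p-value is ",
      "likely to be less than ", 1 / reps, ".",
      call. = FALSE
    )
  }
  p_value
}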
@atheobold, is this good enough?

@TimHesterberg

TimHesterberg commented Jan 3, 2020 via email

@atheobold
Author

@echasnovski, I like your compromise. While @TimHesterberg's suggestion is interesting, I don't feel that the output needs to contain a confidence interval for the p-value.

Rather, I would appreciate it if, when the output of get_pvalue() is 0, it came with a word of caution. This would help the teachers who use the infer package when teaching permutation tests to better convey to students how p-values from permutation tests should be reported.

@TimHesterberg

TimHesterberg commented Jan 3, 2020 via email

@echasnovski
Collaborator

Deciding what to use as an upper bound for the p-value when it is zero proved to be challenging. @TimHesterberg, I'll be honest: I didn't really understand the reasoning behind choosing 3/reps as the upper bound (I got confused by "If the true P-value is 3/reps", because we don't really know the true p-value). However, I really liked the CI approach to the problem.

That got me thinking about how to approach this, which led to the following framework I can understand. Each time a value is generated from the null distribution, it can land inside the critical region (the region whose probability under the null distribution is, by definition, the p-value) or outside of it. So each "is the value inside the region" draw can be viewed as a sample from a Bernoulli distribution whose parameter is exactly the desired p-value. A confidence interval for that parameter can thus be viewed as a confidence interval for the p-value.

I knew this should be something meaningful when I found this answer on Cross Validated, which states that under certain conditions the 95% CI for a zero estimate is [0, 3/reps].

So I suggest the following warning: "Note that the zero p-value is based on the number of random samples computed in the `generate()` step. Here it means that the p-value is 95% likely to be between 0 and {ci[2]}.". The variable ci is a numeric vector of length 2, computed with binom.test() (with default arguments). Its upper bound aligns pretty well with 3/reps and has a nice theoretical background. One question, though: is the phrase "95% likely to be" understandable and correct?
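
As a minimal sketch of that computation (reps = 1000 is just an illustrative choice):

# 0 "successes" (no permuted stats at least as extreme as the observed one)
# out of reps Bernoulli draws; the default Clopper-Pearson 95% CI from
# binom.test() then bounds the unknown p-value.
reps <- 1000
ci <- binom.test(x = 0, n = reps)$conf.int
ci[2]  # upper bound, roughly 0.0037 here -- close to the 3/reps rule of three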

@andrewpbray, @ismayc, any thoughts? By the way, this also means that we can implement a get_p_value_ci() function (if its existence and approach are agreed to be reasonable).
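
A rough sketch of what such a function could compute, assuming a null_dist tibble from calculate() with a stat column and a deliberately simplified two-sided extremeness rule (the name get_p_value_ci() and this counting rule are illustrative, not an agreed design):

# Hypothetical sketch: count permuted stats at least as extreme as the
# observed one and feed that count into binom.test() for a p-value CI.
get_p_value_ci <- function(null_dist, obs_stat) {
  extreme <- sum(abs(null_dist$stat) >= abs(obs_stat$stat))
  binom.test(x = extreme, n = nrow(null_dist))$conf.int
}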

@ismayc
Collaborator

ismayc commented Jan 4, 2020

I fear that adding this much content to a warning message is going to further confuse some introductory statistics students when they see it. Something like “Please be cautious. A value of 0 for the p-value here is an approximation based on the number of reps chosen. The true p-value is a small value likely less than 1/reps.” might be better.

I used 1 instead of 3 above since we probably shouldn’t assume someone using the {infer} package has experience with distributions like the Poisson. Alternatively, a vignette could be created and then linked in the warning message to explain what is going on so the message isn’t so wordy.

@echasnovski
Collaborator

Ah, so not replace 1/reps and reps with actual values? Yeah, this might work and is a simpler implementation. However, using 1/reps does seem like an imprecise alternative.

So I'll soon just add a simple warning without much extra information about the upper bound of the p-value.

@TimHesterberg

TimHesterberg commented Jan 4, 2020 via email

@ismayc
Collaborator

ismayc commented Jan 4, 2020

One reason we avoided all of this is that the stats student using {infer} has been taught to think about the p-value in an intuitive way. I understand the concerns about imprecision, so feel free to adapt as needed, but a warning message saying something like 7.22/1000 is going to be more confusing than it is worth to an intro stats reader while inside R. I recommend adding a simple warning to be cautious of the p-value when it is 0 (or sufficiently small), and then adding a vignette that goes into a lot more detail and linking to it in the warning. In the vignette, all of this discussion can be covered in much more detail. This helps solve both problems: being cautious of the reps size, and providing further information as needed without overwhelming a novice learner.

@echasnovski
Collaborator

I think a vignette is a little bit too much. A section in the help page of get_p_value() should be enough, I think, together with this message (in case the p-value is exactly 0): “Please be cautious. A value of 0 for the p-value here is an approximation based on the number of `reps` chosen in the `generate()` step. The true p-value is a small value likely less than `3/reps`.”

@ismayc
Collaborator

ismayc commented Jan 4, 2020

That’s fine with me too! If someone wanted to contribute a vignette via PR to explain things further, we’d welcome that as well.

@atheobold
Author

I agree with @ismayc that, when making changes, we should be mindful of who is using the infer package and for what purpose. I suspect that many others, like myself, are using the infer package as a more intuitive interface for teaching simulation-based methods in R in an Introductory or Intermediate Statistics course. The students in these courses often struggle with the concept of a p-value, which makes outputting a p-value of exactly 0 problematic.

While statistically I agree with the precision of presenting an upper bound for the p-value of a permutation test, this will likely confuse students. Many Introductory Statistics textbooks that employ simulation-based inference advocate reporting such p-values as less than 1 over the number of replications generated.

I appreciate the revised warning message, but I do think that further information about why the value of 3/reps is being reported would be useful!

@ismayc closed this as completed Jan 27, 2020
@github-actions

github-actions bot commented Mar 8, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Mar 8, 2021