add z argument to prop_test #353

simonpcouch · 2020-12-22T21:24:59Z

[EDIT: fixed visual clutter of diffs!]

Here’s a go at implementing the z argument for prop_test() to report the z statistic instead of the chi-square statistic!

I opted to fully make use of the infer observed statistic pipeline rather than capitalizing on the connection between the standard normal and chi-square to calculate the statistic. Otherwise, the output piggybacks off of the current output, switching out the statistic, removing the chisq_df column, and leaving the rest of the columns.

This implementation also turns off the Yates continuity correction when z = TRUE so that p-values and intervals align with manual calculations as expected. It also introduces a non-breaking correct argument that was previously passed through ... to avoid introducing weird internal logic to overwrite components of the ellipses that were modified elsewhere.

library(infer)  
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  
# one-way --------------------------------------------
# chi-square with prop.test
prop.test(table(gss$college), p = .8)
#> 
#>  1-sample proportions test with continuity correction
#> 
#> data:  table(gss$college), null probability 0.8
#> X-squared = 67.528, df = 1, p-value < 2.2e-16
#> alternative hypothesis: true p is not equal to 0.8
#> 95 percent confidence interval:
#>  0.6082126 0.6934143
#> sample estimates:
#>     p 
#> 0.652

# chi-square with prop_test
prop_test(gss, college ~ NULL, success = "no degree", p = .8)
#> # A tibble: 1 x 4
#>   statistic chisq_df  p_value alternative
#>       <dbl>    <int>    <dbl> <chr>      
#> 1      67.5        1 2.08e-16 two.sided

# z with prop_test
prop_test(gss, college ~ NULL, success = "no degree", p = .8, z = TRUE)
#> # A tibble: 1 x 3
#>   statistic  p_value alternative
#>       <dbl>    <dbl> <chr>      
#> 1     -8.27 1.30e-16 two.sided

# z^2 != 67.5
prop_test(gss, college ~ NULL, success = "no degree", p = .8, z = TRUE) %>%
  select(statistic) %>%
  pull() %>%
  `*`(., .)
#> [1] 68.45

# chi-square, turning off Yates' continuity correction,
# to see z^2 = \chi^2_1
prop_test(gss, college ~ NULL, success = "no degree", p = .8, correct = FALSE)
#> # A tibble: 1 x 4
#>   statistic chisq_df  p_value alternative
#>       <dbl>    <int>    <dbl> <chr>      
#> 1      68.5        1 1.30e-16 two.sided

# two-way --------------------------------------------
prop_test(gss, college ~ sex, order = c("female", "male"), z = TRUE)
#> # A tibble: 1 x 5
#>   statistic p_value alternative lower_ci upper_ci
#>       <dbl>   <dbl> <chr>          <dbl>    <dbl>
#> 1   -0.0985   0.922 two.sided    -0.0965   0.0873

prop_test(gss, college ~ sex, order = c("female", "male"), z = TRUE) %>%
  select(statistic) %>%
  pull() %>%
  `*`(., .)
#> [1] 0.009707371

prop_test(gss, college ~ sex, order = c("female", "male"), correct = FALSE)
#> # A tibble: 1 x 6
#>   statistic chisq_df p_value alternative lower_ci upper_ci
#>       <dbl>    <dbl>   <dbl> <chr>          <dbl>    <dbl>
#> 1   0.00971        1   0.922 two.sided    -0.0965   0.0873

This PR also enables the following code, which used to return the specified gss as is:

gss %>% 
  specify(college ~ sex, success = "degree") %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "z", order = c("female", "male"))
#> # A tibble: 1 x 1
#>     stat
#>    <dbl>
#> 1 0.0985

…to give the same output as this code:

gss %>% 
  specify(college ~ sex, success = "degree") %>%
  calculate(stat = "z", order = c("female", "male"))
#> # A tibble: 1 x 1
#>     stat
#>    <dbl>
#> 1 0.0985

Relatedly, I agree with Evgeni’s comment on my last PR that calculate() deserves a refactor. Probably worth specifying more strictly what calculate()s behavior should be in calculating observed test statistics—I’ll open a PR/issue related to this after the holidays! Namely for this PR, I think deleting the generate() line in an infer pipeline and otherwise leaving code as is should report the appropriate observed statistic. If hypothesize() is omitted when it probably ought not to be (for reporting point estimates), assume a reasonable null value and report it as a warning.

^{Created on 2020-12-22 by the reprex package (v0.3.0.9001)}

Closes #347.

need the new z functionality from #351

allows the user to report a standard normal deviate rather than a chi-square statistic. also enables two-sample z statistic calculation after having hypothesized and introduces a non-breaking `correct` argument to avoid unnecessarily complex logic regarding `...`.

ismayc · 2020-12-22T23:05:53Z

Beautiful! Thanks much and happy holidays!

NEWS.md

R/wrappers.R

echasnovski · 2020-12-26T20:16:02Z

Nice work! Thank you.

github-actions · 2021-03-08T00:16:46Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

simonpcouch added 7 commits December 20, 2020 15:38

add specification of success to examples

871a62c

merge develop into prop_test

4f6befa

need the new z functionality from #351

add additional prop_test example

d177dfc

add z argument unit testing

6f52bc3

update NEWS

4947ae0

Merge branch 'develop' into prop_test

61bdb04

simonpcouch requested review from echasnovski and ismayc December 22, 2020 21:39

simonpcouch mentioned this pull request Dec 22, 2020

success argument in prop_test() #347

Closed

ismayc approved these changes Dec 22, 2020

View reviewed changes

echasnovski reviewed Dec 24, 2020

View reviewed changes

NEWS.md Outdated Show resolved Hide resolved

echasnovski reviewed Dec 24, 2020

View reviewed changes

R/wrappers.R Outdated Show resolved Hide resolved

R/wrappers.R Outdated Show resolved Hide resolved

use internal helper, default correct = NULL

2bbc0b4

echasnovski merged commit 34d296c into develop Dec 26, 2020

simonpcouch deleted the prop_test branch December 26, 2020 20:26

github-actions bot locked and limited conversation to collaborators Mar 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add z argument to prop_test #353

add z argument to prop_test #353

simonpcouch commented Dec 22, 2020 •

edited

Loading

ismayc commented Dec 22, 2020

echasnovski commented Dec 26, 2020

github-actions bot commented Mar 8, 2021

add z argument to prop_test #353

add z argument to prop_test #353

Conversation

simonpcouch commented Dec 22, 2020 • edited Loading

ismayc commented Dec 22, 2020

echasnovski commented Dec 26, 2020

github-actions bot commented Mar 8, 2021

simonpcouch commented Dec 22, 2020 •

edited

Loading