Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add z argument to prop_test #353

Merged
merged 8 commits into from
Dec 26, 2020
Merged

add z argument to prop_test #353

merged 8 commits into from
Dec 26, 2020

Conversation

simonpcouch
Copy link
Collaborator

@simonpcouch simonpcouch commented Dec 22, 2020

[EDIT: fixed visual clutter of diffs!]

Here’s a go at implementing the z argument for prop_test() to report the z statistic instead of the chi-square statistic!

I opted to fully make use of the infer observed statistic pipeline rather than capitalizing on the connection between the standard normal and chi-square to calculate the statistic. Otherwise, the output piggybacks off of the current output, switching out the statistic, removing the chisq_df column, and leaving the rest of the columns.

This implementation also turns off the Yates continuity correction when z = TRUE so that p-values and intervals align with manual calculations as expected. It also introduces a non-breaking correct argument that was previously passed through ... to avoid introducing weird internal logic to overwrite components of the ellipses that were modified elsewhere.

library(infer)  
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  
# one-way --------------------------------------------
# chi-square with prop.test
prop.test(table(gss$college), p = .8)
#> 
#>  1-sample proportions test with continuity correction
#> 
#> data:  table(gss$college), null probability 0.8
#> X-squared = 67.528, df = 1, p-value < 2.2e-16
#> alternative hypothesis: true p is not equal to 0.8
#> 95 percent confidence interval:
#>  0.6082126 0.6934143
#> sample estimates:
#>     p 
#> 0.652

# chi-square with prop_test
prop_test(gss, college ~ NULL, success = "no degree", p = .8)
#> # A tibble: 1 x 4
#>   statistic chisq_df  p_value alternative
#>       <dbl>    <int>    <dbl> <chr>      
#> 1      67.5        1 2.08e-16 two.sided

# z with prop_test
prop_test(gss, college ~ NULL, success = "no degree", p = .8, z = TRUE)
#> # A tibble: 1 x 3
#>   statistic  p_value alternative
#>       <dbl>    <dbl> <chr>      
#> 1     -8.27 1.30e-16 two.sided

# z^2 != 67.5
prop_test(gss, college ~ NULL, success = "no degree", p = .8, z = TRUE) %>%
  select(statistic) %>%
  pull() %>%
  `*`(., .)
#> [1] 68.45

# chi-square, turning off Yates' continuity correction,
# to see z^2 = \chi^2_1
prop_test(gss, college ~ NULL, success = "no degree", p = .8, correct = FALSE)
#> # A tibble: 1 x 4
#>   statistic chisq_df  p_value alternative
#>       <dbl>    <int>    <dbl> <chr>      
#> 1      68.5        1 1.30e-16 two.sided

# two-way --------------------------------------------
prop_test(gss, college ~ sex, order = c("female", "male"), z = TRUE)
#> # A tibble: 1 x 5
#>   statistic p_value alternative lower_ci upper_ci
#>       <dbl>   <dbl> <chr>          <dbl>    <dbl>
#> 1   -0.0985   0.922 two.sided    -0.0965   0.0873

prop_test(gss, college ~ sex, order = c("female", "male"), z = TRUE) %>%
  select(statistic) %>%
  pull() %>%
  `*`(., .)
#> [1] 0.009707371

prop_test(gss, college ~ sex, order = c("female", "male"), correct = FALSE)
#> # A tibble: 1 x 6
#>   statistic chisq_df p_value alternative lower_ci upper_ci
#>       <dbl>    <dbl>   <dbl> <chr>          <dbl>    <dbl>
#> 1   0.00971        1   0.922 two.sided    -0.0965   0.0873

This PR also enables the following code, which used to return the specified gss as is:

gss %>% 
  specify(college ~ sex, success = "degree") %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "z", order = c("female", "male"))
#> # A tibble: 1 x 1
#>     stat
#>    <dbl>
#> 1 0.0985

…to give the same output as this code:

gss %>% 
  specify(college ~ sex, success = "degree") %>%
  calculate(stat = "z", order = c("female", "male"))
#> # A tibble: 1 x 1
#>     stat
#>    <dbl>
#> 1 0.0985

Relatedly, I agree with Evgeni’s comment on my last PR that calculate() deserves a refactor. Probably worth specifying more strictly what calculate()s behavior should be in calculating observed test statistics—I’ll open a PR/issue related to this after the holidays! Namely for this PR, I think deleting the generate() line in an infer pipeline and otherwise leaving code as is should report the appropriate observed statistic. If hypothesize() is omitted when it probably ought not to be (for reporting point estimates), assume a reasonable null value and report it as a warning.

Created on 2020-12-22 by the reprex package (v0.3.0.9001)

Closes #347.

need the new z functionality from #351
allows the user to report a standard normal deviate rather than a chi-square statistic.

also enables two-sample z statistic calculation after having hypothesized and introduces a non-breaking `correct` argument to avoid unnecessarily complex logic regarding `...`.
@ismayc
Copy link
Collaborator

ismayc commented Dec 22, 2020

Beautiful! Thanks much and happy holidays!

NEWS.md Outdated Show resolved Hide resolved
R/wrappers.R Outdated Show resolved Hide resolved
R/wrappers.R Outdated Show resolved Hide resolved
@echasnovski
Copy link
Collaborator

Nice work! Thank you.

@echasnovski echasnovski merged commit 34d296c into develop Dec 26, 2020
@simonpcouch simonpcouch deleted the prop_test branch December 26, 2020 20:26
@github-actions
Copy link

github-actions bot commented Mar 8, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants