
Have any summary function work in calculate #50

Closed
rudeboybert opened this Issue Oct 19, 2017 · 8 comments

Projects
None yet
4 participants
@rudeboybert
Contributor

rudeboybert commented Oct 19, 2017

I really like using The Lady Tasting Tea to motivate hypothesis testing. The following code does the trick:

library(tidyverse)
library(infer)
lady_tasting_tea <- data_frame(
  first = as.factor(c(rep("milk",4), rep("tea",4)))
  )

lady_tasting_tea %>%
  specify(response = first) %>% # alt: first ~ NULL (or first ~ 1)
  hypothesize(null = "point", p = c("milk" = .5, "tea" = .5)) %>% 
  generate(reps = 1000, type = "simulate") %>% 
  calculate(stat = "prop")

However, it would be great if the final stat could be "sum". That way there is one fewer layer of abstraction between the experiment and the null distribution: students can read off the count directly instead of reading off proportions.
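A minimal base-R sketch of what a null distribution of counts looks like, for illustration only. It assumes each of the 8 cups is independently "milk" with probability 0.5 under H0, which is my reading of what a "point" null with p = 0.5 simulates (not Fisher's original design with exactly 4 milk-first cups). The names `n_cups`, `reps` and `null_counts` are just illustrative.

```r
# Simulate the null distribution of the number of "milk"-first cups out of 8,
# assuming each cup is "milk" with probability 0.5 under H0 (independent cups)
set.seed(2017)                      # for reproducibility
n_cups <- 8
reps <- 1000
null_counts <- replicate(reps, {
  cups <- sample(c("milk", "tea"), size = n_cups, replace = TRUE,
                 prob = c(0.5, 0.5))
  sum(cups == "milk")               # statistic is a count, not a proportion
})
table(null_counts)                  # counts 0..8 can be read off directly
```

Each bar of `table(null_counts)` is a whole number of cups, which is the "one fewer layer of abstraction" point above.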

In a more general setting, any many-to-one summary function would be great, for example all those that work with dplyr::summarize().
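To make the "any summary function" idea concrete, here is a rough sketch of a hypothetical helper, `calculate_with()`, which is not part of infer. It assumes the simulated data is a long data frame with a `replicate` column (the shape generate() returns) and applies an arbitrary many-to-one function per replicate using base R's `tapply()`.

```r
# Hypothetical helper (not infer's API): apply any many-to-one summary
# function to each replicate, in the spirit of dplyr::summarize()
calculate_with <- function(sims, response, fun) {
  # sims: long data frame with a `replicate` column, like generate() output
  stats <- tapply(sims[[response]], sims$replicate, fun)
  data.frame(replicate = as.integer(names(stats)), stat = as.vector(stats))
}

# Toy input: 3 replicates of 8 simulated cups each
set.seed(50)
sims <- data.frame(
  replicate = rep(1:3, each = 8),
  first = factor(sample(c("milk", "tea"), 24, replace = TRUE))
)
calculate_with(sims, "first", function(x) sum(x == "milk"))   # counts
calculate_with(sims, "first", function(x) mean(x == "milk"))  # proportions
```

The design choice here is simply to push the summary into a user-supplied function, so "sum", "prop" and anything else become the same code path.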

@ismayc ismayc added the enhancement label Jan 5, 2018

@ismayc

Collaborator

ismayc commented Aug 8, 2018

@echasnovski I have a feeling calculate(sum) might be one case that would work for generalizing here? Do you think that's possible?

@echasnovski

Collaborator

echasnovski commented Aug 8, 2018

I don't think this is actually generalizing; rather, it is adding another acceptable string to calculate().

In my understanding, the implementation of "sum" should be similar to the "mean", "median", and "sd" cases. In the example, however, the response is a factor, and sum() doesn't work with factors.
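A quick base-R check of the factor problem: `sum()` on a factor errors out, while summing a logical comparison against a chosen "success" level works. The choice of "milk" as the success level is just for illustration.

```r
first <- factor(c(rep("milk", 4), rep("tea", 4)))

# sum() has no meaning for factors, so this raises an error
res <- tryCatch(sum(first), error = function(e) e)
inherits(res, "error")   # TRUE

# Counting a chosen "success" level works once it is turned into a logical
sum(first == "milk")     # 4
```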

As this example currently doesn't work (success must be specified for a "prop" statistic), I think an appropriate approach would be to allow calculate("count"), as described in words in the issue. It would behave exactly like "prop" but use sum() instead of mean() in the "prop" code.
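A small sketch of the "same as prop, but sum() instead of mean()" idea. `stat_prop()` and `stat_count()` are hypothetical names used only to show that the two statistics differ in a single summary call and always agree up to the sample size.

```r
# Hypothetical helpers: "prop" and "count" differ only in the summary
# applied to the logical vector of successes, mean() versus sum()
stat_prop  <- function(x, success) mean(x == success)
stat_count <- function(x, success) sum(x == success)

first <- factor(c(rep("milk", 4), rep("tea", 4)))
stat_prop(first, "milk")    # 0.5
stat_count(first, "milk")   # 4

# count == n * prop for any sample
all.equal(stat_count(first, "milk"), length(first) * stat_prop(first, "milk"))  # TRUE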

@ismayc

Collaborator

ismayc commented Aug 8, 2018

I think you are right. This might have to be another special case with string input. The success argument was added later on in development.

@echasnovski

Collaborator

echasnovski commented Aug 8, 2018

So is it calculate("count") with success argument, or calculate("sum") for numerical response, or both? After #173 this should be very straightforward to add.

@ismayc

Collaborator

ismayc commented Aug 8, 2018

I think it’s both.

@mine-cetinkaya-rundel

Contributor

mine-cetinkaya-rundel commented Aug 8, 2018

Is the suggestion to no longer call the statistic "prop"? The reason for that term was to match the name of the parameter on which we do inference.

@echasnovski

Collaborator

echasnovski commented Aug 8, 2018

As I understand it, all current functionality is preserved, including calculate("prop"). A new option will be added: calculate("count"). This will return the number of "successes" in each bootstrap resample.
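For illustration, a base-R sketch of what "number of successes in each bootstrap resample" means for the tea data. It resamples the 8 observed cups with replacement and counts "milk" each time; in infer the analogous pipeline would presumably be `specify(response = first, success = "milk")`, `generate(type = "bootstrap")`, then `calculate(stat = "count")`, but the code below does not use infer and the seed and names are arbitrary.

```r
# Each bootstrap resample draws the 8 observed cups with replacement;
# the statistic is the number of "milk" cups in that resample
set.seed(179)
first <- factor(c(rep("milk", 4), rep("tea", 4)))
boot_counts <- vapply(seq_len(1000), function(i) {
  resample <- sample(first, size = length(first), replace = TRUE)
  sum(resample == "milk")           # one count per resample
}, integer(1))
summary(boot_counts)
```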

ismayc added a commit that referenced this issue Aug 20, 2018

Merge pull request #179 from echasnovski/new-calc-stat
Add new options for `calculate()` (issue #50)

@ismayc ismayc closed this Aug 20, 2018

@ismayc

Collaborator

ismayc commented Aug 20, 2018

Now implemented by @echasnovski on the develop branch via #179
