Success should always be defined #57

mine-cetinkaya-rundel · 2017-12-26T19:16:48Z

In working on a solution for #53 I realized that if success isn't defined by the user we make an assumption and use the first level as success. I don't think this is a good approach because it makes the function "just work" without the student properly formulating the inference chain. I propose removing https://github.com/andrewpbray/infer/blob/f034a8629dd42c83d4fbb33dd1b80324262dbb12/R/calculate.R#L60 and https://github.com/andrewpbray/infer/blob/f034a8629dd42c83d4fbb33dd1b80324262dbb12/R/calculate.R#L61 and instead throwing an error saying success must be defined when the response variable is categorical. I can put that in the same PR as the order solution.

Also, as far as I can tell currently the "diff in props" example isn't working since success functionality isn't built into the difference in proportions case (only the single proportion).

The text was updated successfully, but these errors were encountered:

andrewpbray · 2017-12-26T20:15:20Z

Yeah, changing around those default makes sense to me.

One thing that's frustrating is that in the one sample case, we have a different and maybe more compact way of specifying this sort of thing higher up in the chain:

hypothesize(null = "point", p = ("boat" = .3))

I guess we're addressing another issue here, but another option for the one sample case where the parameter is fully specified in hypothesize() would be to use something like calculate(stat = "point est").

mine-cetinkaya-rundel · 2017-12-26T20:21:02Z

I was just about to comment on this @andrewpbray. That it looks odd to specify in both places -- in hypothesize and in calculate. But I don't think we want to specify in hypothesize fully because then it looks much more different to do a confidence interval. So I think we do need to do with (what appears to be) the redundant route. It would look something like this for the one prop case:

mtcars %>%
  specify(response = am) %>% # alt: am ~ NULL (or am ~ 1)
  hypothesize(null = "point", p = c("1" = 0.25)) %>% 
  generate(reps = 100, type = "simulate") %>% 
  calculate(stat = "prop", success = 1)

Two comments here:

success is not a string. I think it makes more sense for it to match the type of the variable.
Now it looks weird that success and p look different (one is string the other is numerical).

We could say we need to be consistent and make both strings because I can't imagine p = c(1 = 0.25)) would read well (and I'm not even sure if it would work). But I do somewhat feel strongly about type of success matching the input type. Thoughts?

andrewpbray · 2017-12-26T20:25:53Z

Yep, I agree with your assessment. p = c(1 = 0.25) breaks my brain, so I say we hold our noses and go with strings in both places.

Such are the compromises when trying to make onetruetest =/

mine-cetinkaya-rundel · 2017-12-26T20:30:55Z

Maybe the right thing to do is to "specify" the success in the specify? I don't know if this is too big a change, but when I'm thinking about formulating a problem I'd think about what the success is at the beginning, not at the end. Like if I want to do a confidence interval for the proportion of Democrats, I would think "I'm working with the party variable, and I'm interested in the Democrat level", not do the whole thing and then at the end think "what, so which level am I interested in?"

andrewpbray · 2017-12-26T20:39:14Z

Hunh, interesting idea. I had thought of specify() has a narrowly disguised select(), but this does seem like a very natural way to solve this problem. So let's see, we could have this:

# one sample
mtcars %>%
  specify(response = am, success = 1) %>%
  hypothesize(null = "point", p = 0.25) %>% 
  generate(reps = 100, type = "simulate") %>% 
  calculate(stat = "prop")

# two sample
mtcars %>%
  specify(am ~ vs, success = 1) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "diff in props")

# GoF
mtcars %>%
  specify(cyl ~ NULL) %>%
  hypothesize(null = "point", p = c("4" = .5, "6" = .25, "8" = .25)) %>%
  generate(reps = 100, type = "simulate") %>%
  calculate(stat = "Chisq")

That way we'd only need to use the named vector thing for the GoF case.

I'm for it. @ismayc ?

ismayc · 2017-12-26T21:35:18Z

I like adding the success argument to specify().

andrewpbray · 2017-12-27T01:19:14Z

Hopefully this is addressed in the current PR.

mine-cetinkaya-rundel · 2017-12-27T01:40:00Z

@andrewpbray I didn't realize you were working on it. I was independently too. I'm going to wait for this PR to be merged to rework the order functionality then.

andrewpbray · 2017-12-29T23:55:46Z

Closed by fd5bbf7.

github-actions · 2021-03-10T00:10:12Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

andrewpbray closed this as completed Dec 29, 2017

github-actions bot locked and limited conversation to collaborators Mar 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Success should always be defined #57

Success should always be defined #57

mine-cetinkaya-rundel commented Dec 26, 2017

andrewpbray commented Dec 26, 2017

mine-cetinkaya-rundel commented Dec 26, 2017

andrewpbray commented Dec 26, 2017

mine-cetinkaya-rundel commented Dec 26, 2017 •

edited

Loading

andrewpbray commented Dec 26, 2017

ismayc commented Dec 26, 2017

andrewpbray commented Dec 27, 2017

mine-cetinkaya-rundel commented Dec 27, 2017

andrewpbray commented Dec 29, 2017

github-actions bot commented Mar 10, 2021

Success should always be defined #57

Success should always be defined #57

Comments

mine-cetinkaya-rundel commented Dec 26, 2017

andrewpbray commented Dec 26, 2017

mine-cetinkaya-rundel commented Dec 26, 2017

andrewpbray commented Dec 26, 2017

mine-cetinkaya-rundel commented Dec 26, 2017 • edited Loading

andrewpbray commented Dec 26, 2017

ismayc commented Dec 26, 2017

andrewpbray commented Dec 27, 2017

mine-cetinkaya-rundel commented Dec 27, 2017

andrewpbray commented Dec 29, 2017

github-actions bot commented Mar 10, 2021

mine-cetinkaya-rundel commented Dec 26, 2017 •

edited

Loading