Segfault caused by min function in summarise function #1481

Closed
Fija opened this issue Oct 28, 2015 · 6 comments

@Fija

Fija commented Oct 28, 2015

Applying min() inside summarise() to an empty data frame leads to a segfault:

library(dplyr)

df <- tbl_df(data.frame(A = c(1)))
df <- df %>% filter(A > 1)                    # drops the only row, leaving an empty table
df %>% summarise(Min = min(A, na.rm = TRUE))  # segfaults

For more details, see this StackOverflow question.

@romainfrancois
Member

@hadley Should I also throw a warning, as R does?

@hadley
Member

hadley commented Oct 29, 2015

I don't think so. R isn't consistent anyway: compare sum() and max().
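To make the sum() vs max() comparison concrete, here is a small base R illustration (not from the thread). The printed values are standard R behavior; the exact warning text may vary slightly between R versions.

```r
# Base R only; no packages required.

# sum() of an empty vector returns its identity element, 0, with no warning.
sum(numeric(0))   # [1] 0

# max() and min() of an empty vector return -Inf and Inf, each with a warning
# along the lines of "no non-missing arguments to max; returning -Inf".
max(numeric(0))   # [1] -Inf
min(numeric(0))   # [1] Inf
```

So one summary function returns silently while the other two warn, which is the inconsistency being pointed at.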

@bwlewis

bwlewis commented Oct 29, 2015

+/- Inf is not always the expected result. Sorry, I'm not good enough with C++ to write a solution, but I can give an example:

> min(1[-1], na.rm=TRUE)
[1] Inf
Warning message:
In min(1[-1], na.rm = TRUE) :
  no non-missing arguments to min; returning Inf

> min("a"[-1], na.rm=TRUE)
[1] NA
Warning message:
In min("a"[-1], na.rm = TRUE) : no non-missing arguments, returning NA

But in dplyr with the current commit:

> summarise(data.frame(A=1[-1]), Min=min(na.rm=TRUE))
  Min
1 Inf
Warning message:
In min(na.rm = TRUE) : no non-missing arguments to min; returning Inf

> summarise(data.frame(A="a"[-1]), Min=min(na.rm=TRUE))
  Min
1 Inf
Warning message:
In min(na.rm = TRUE) : no non-missing arguments to min; returning Inf

@bwlewis

bwlewis commented Oct 29, 2015

For the record, I think R is idiosyncratic here. I kind of wish min(numeric()) returned numeric(0), and so on for other types. Oh well.

@hadley
Member

hadley commented Oct 29, 2015

I think the contract for min() (and all other summary functions) is that they always return a numeric vector of length 1, so I think returning a 0-length vector would be less helpful, not more.
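If the length-1 contract is kept but Inf is unwanted, one option is a small wrapper that returns a typed NA for empty input. The sketch below is a hypothetical helper (`safe_min` is our name, not part of dplyr or base R), written in plain base R.

```r
# Hypothetical helper, not part of dplyr or base R: keeps the length-1 contract
# but returns a typed NA (instead of Inf plus a warning) when the input is empty.
safe_min <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  # Indexing a zero-length vector with NA_integer_ yields one NA of the same type.
  if (length(x) == 0L) return(x[NA_integer_])
  min(x)
}

safe_min(numeric(0))                 # NA (numeric), no warning
safe_min(c(3, 1, NA), na.rm = TRUE)  # 1
safe_min(character(0))               # NA (character)
```

Inside summarise() it would be called like any other function, e.g. `summarise(Min = safe_min(A, na.rm = TRUE))`; this has not been checked against the dplyr release discussed here.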

@bwlewis

bwlewis commented Oct 29, 2015

Yes, that's what I meant by idiosyncratic. Mathematically, the minimum of a set S is the intersection of S with {infimum(S)}, that is, the smallest value in the set. Mathematica, Matlab, etc. return min(empty set) = empty set, consistent with that definition; R does not.

Sorry, this is really an R issue, not a dplyr one, so this isn't the best place to discuss it!
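The two conventions can be compared with a short derivation, assuming the usual extended-real convention that the infimum of the empty set is $+\infty$ (and its supremum is $-\infty$). The set-valued definition gives an empty result, while R's numeric min() returns the infimum itself:

```latex
\min S = S \cap \{\inf S\}, \qquad
\inf \varnothing = +\infty, \quad \sup \varnothing = -\infty
\quad \text{(in } \overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}\text{)}

\min \varnothing = \varnothing \cap \{+\infty\} = \varnothing
\qquad \text{vs.} \qquad
\texttt{min(numeric(0))} = \inf \varnothing = \texttt{Inf}
```

Read this way, R returns the infimum (a length-1 value) rather than the set-valued minimum, which matches the length-1 contract described above.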

@lock (bot) locked this issue as resolved and limited conversation to collaborators on Jun 9, 2018.
4 participants