I'm not sure whether this is a bug or whether we simply shouldn't reuse just-created summary columns when defining new ones (though in that case an error might be nicer). When I do reuse them, I get slightly different values each time I run summarise, as shown below. It works properly when I avoid reusing the just-created columns. So far I've only seen this when more than one summary column reuses a previously created summary column, which is why all the examples below have both diff1 and diff2.
I'm running dplyr 0.1.1 and R 3.0.2 on OS X Mavericks.
> require('dplyr')
> df <- tbl_df(data.frame(id=c(1,1,2,2,3,3), a=1:6))
> df %.% group_by(id) %.% summarise(biggest=max(a), smallest=min(a), diff1=biggest-smallest, diff2=smallest-biggest)
Source: local data frame [3 x 5]
id biggest smallest diff1 diff2
1 3 6 5 1 0
2 2 4 3 1 0
3 1 2 1 1 0
> # produces some randomly different values when rerun, not always the same ones, and sometimes segfaulted on big tbl_dfs I was working with
> df %.% group_by(id) %.% summarise(biggest=max(a), smallest=min(a), diff1=biggest-smallest, diff2=smallest-biggest)
Source: local data frame [3 x 5]
id biggest smallest diff1 diff2
1 3 6 5 1 32643
2 2 4 3 1 0
3 1 2 1 1 0
> df %.% group_by(id) %.% summarise(biggest=max(a), smallest=min(a), diff1=biggest-smallest, diff2=smallest-biggest)
Source: local data frame [3 x 5]
id biggest smallest diff1 diff2
1 3 6 5 1 -32641
2 2 4 3 1 0
3 1 2 1 1 0
> # but this seems to work consistently
> df %.% group_by(id) %.% summarise(biggest=max(a), smallest=min(a), diff1=max(a)-min(a), diff2=min(a)-max(a)) # seems to work
Source: local data frame [3 x 5]
id biggest smallest diff1 diff2
1 3 6 5 1 -1
2 2 4 3 1 -1
3 1 2 1 1 -1
Thanks. That was a tricky one. The C++ class I used to handle newly created variables (SummarisedSubsetTemplate) used a field to keep track of which group the indices referred to, on the assumption that groups were always handled in sequence.
When making diff1, it assumed it was processing groups 0, 1 and 2; when making diff2, it kept counting and assumed it was processing groups 3, 4 and 5.
I augmented the SlicingIndex class so that it carries both the actual indices and the index of the group.
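The difference between the two approaches can be sketched roughly as follows. This is not dplyr's actual code: the `SlicingIndex` struct here is a simplified stand-in, and `CountingColumn`, `IndexedColumn`, `slots_visited` and `example_groups` are hypothetical names made up for illustration. The sketch only assumes what is described above, namely that the old class tracked the current group with its own counter and that the fix lets the index carry its group position.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simplification of a slicing index: the rows of one group
// plus the position of that group among all groups.
struct SlicingIndex {
    std::vector<int> rows;
    std::size_t group;
};

// Buggy pattern: the column keeps its own counter and assumes it is asked
// about each group exactly once, in order. The counter is never reset.
class CountingColumn {
public:
    explicit CountingColumn(std::size_t n_groups) : n_groups_(n_groups) {}
    // Slot this column would read for the "current" group.
    std::size_t slot(const SlicingIndex&) { return next_++; }
    std::size_t n_groups() const { return n_groups_; }
private:
    std::size_t n_groups_;
    std::size_t next_ = 0;
};

// Fixed pattern: the slot comes from the index itself, so the column
// holds no state between expressions.
class IndexedColumn {
public:
    explicit IndexedColumn(std::vector<int> values) : values_(std::move(values)) {}
    int get(const SlicingIndex& idx) const {
        assert(idx.group < values_.size());
        return values_[idx.group];
    }
private:
    std::vector<int> values_;
};

// Slots visited while evaluating `n_exprs` expressions over all groups.
std::vector<std::size_t> slots_visited(CountingColumn& col,
                                       const std::vector<SlicingIndex>& groups,
                                       int n_exprs) {
    std::vector<std::size_t> out;
    for (int e = 0; e < n_exprs; ++e)
        for (const SlicingIndex& g : groups)
            out.push_back(col.slot(g));
    return out;
}

// Three groups of two rows each, mirroring the shape of the example data.
std::vector<SlicingIndex> example_groups() {
    return { {{0, 1}, 0}, {{2, 3}, 1}, {{4, 5}, 2} };
}
```

With two expressions over three groups, `slots_visited` yields slots 0 to 5, so the second expression lands on slots 3, 4 and 5 of a three-element column. Reading past the end like that would be consistent with the garbage values and occasional segfaults in the report. `IndexedColumn::get` returns the same value for a given group no matter how many expressions were evaluated before it.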