Inconsistent results when overwriting and reusing variable in summarise() #2404

Closed
krlmlr opened this issue Feb 3, 2017 · 3 comments

krlmlr (Member) commented Feb 3, 2017

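library(dplyr)
library(tibble)
# as_tibble() on a bare vector here yields a single column named `value`
a <- as_tibble(seq(1:10))
# Overwrite `value` with its mean, then reuse `value` in sd(); repeat 50 times
for (i in 1:50) print( a %>% summarise(value = mean(value), sd = sd(value)) )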

On my machine the sd result varies between iterations of the same call: it is usually about 1.74, but sometimes Inf or another very large value. Something very strange is going on here.
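
For reference, here is a minimal base-R sketch of the two values one might plausibly expect, depending on whether the second expression sees the original column or the overwritten one. Which of the two dplyr intends is an assumption not settled in this report, and the variable names are illustrative only.

```r
# Illustrative names; not part of the original report.
x <- 1:10

# If sd() saw the original ten-element column:
sd_original <- sd(x)           # about 3.03

# If sd() saw the overwritten, length-one summary (the mean, 5.5):
sd_overwritten <- sd(mean(x))  # NA: sd of a single value is undefined

print(sd_original)
print(sd_overwritten)
```

Neither value matches the results reported above.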

#2169 (comment), CC @nuskcalb.

krlmlr (Member, Author) commented Feb 3, 2017

Running with R --debugger=valgrind, I'm getting the following, among other messages:

==26363== Conditional jump or move depends on uninitialised value(s)
==26363==    at 0x4EACFA8: R_IsNA (arithmetic.c:120)
==26363==    by 0x4F4D2E4: Rf_formatReal (format.c:329)
==26363==    by 0x4F8179C: do_format (paste.c:482)
==26363==    by 0x4F39DC0: bcEval (eval.c:5658)
==26363==    by 0x4F4499F: Rf_eval (eval.c:616)
==26363==    by 0x4F4526A: forcePromise (eval.c:515)
==26363==    by 0x4F45757: FORCE_PROMISE (eval.c:4258)
==26363==    by 0x4F45757: getvar (eval.c:4300)
==26363==    by 0x4F3D3B1: bcEval (eval.c:5425)
==26363==    by 0x4F4499F: Rf_eval (eval.c:616)
==26363==    by 0x4F4661C: Rf_applyClosure (eval.c:1135)
==26363==    by 0x4F4060C: bcEval (eval.c:5630)
==26363==    by 0x4F4499F: Rf_eval (eval.c:616)
==26363== 

krlmlr (Member, Author) commented Feb 3, 2017

Reprex output. Note that the sd column differs in some iterations:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)
a <- as_tibble(seq(1:10))
for (i in 1:50) print( a %>% summarise(value = mean(value), sd = sd(value)) )
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value           sd
#>   <dbl>        <dbl>
#> 1   5.5 5.469883e+96
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value    sd
#>   <dbl> <dbl>
#> 1   5.5   Inf
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value    sd
#>   <dbl> <dbl>
#> 1   5.5   Inf
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value    sd
#>   <dbl> <dbl>
#> 1   5.5   Inf
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value           sd
#>   <dbl>        <dbl>
#> 1   5.5 6.999517e+20
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
#> # A tibble: 1 × 2
#>   value       sd
#>   <dbl>    <dbl>
#> 1   5.5 1.739253
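
The run-to-run variation in the output above can be checked more compactly by collecting only the sd column and listing its distinct values. This is a sketch under assumptions: the input is built with tibble(value = 1:10) instead of as_tibble(), and sds is an illustrative name. On an affected build one would expect more than one distinct value; on a build without the problem, exactly one.

```r
library(dplyr)
library(tibble)

# Same data as the reprex, with the column name given explicitly.
a <- tibble(value = 1:10)

# Run the same summarise() call 50 times and keep only the sd result.
sds <- vapply(
  1:50,
  function(i) summarise(a, value = mean(value), sd = sd(value))$sd,
  numeric(1)
)

# A deterministic implementation yields a single distinct value here.
print(unique(sds))
```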

krlmlr (Member, Author) commented Feb 3, 2017

Similar results on Windows.

I think the hybrid handlers don't handle new summary variables very well. This looks similar to #2312.
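
One arithmetic observation fits this hypothesis, though it is offered as an assumption and not a confirmed diagnosis: the recurring value 1.739253 in the reprex output equals the standard deviation of a ten-element vector whose first element is the new summary value 5.5 and whose remaining elements are zero. That would be consistent with sd() reading ten elements from a length-one result, and with the uninitialised-value warning from valgrind. The name padded is illustrative.

```r
# Hypothetical reconstruction: the length-one summary 5.5 followed by
# nine zeros standing in for whatever memory happens to follow it.
padded <- c(5.5, rep(0, 9))

print(sd(padded))
```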

krlmlr self-assigned this Feb 10, 2017
krlmlr added this to the data frame 1 milestone Feb 10, 2017
krlmlr closed this in #2453 Feb 20, 2017
krlmlr added a commit to krlmlr/dplyr that referenced this issue Feb 21, 2017
lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018