stat_count() respects uniqueness of x #5520

Merged
merged 6 commits into from Dec 14, 2023

teunbrand
Collaborator

This PR aims to fix #4609.

Briefly, as.factor(x) and vec_unique(x) disagree on what constitutes a unique value: as.factor(x) casts x to character, which drops the least significant digits of doubles. This PR counts with rowsum() instead of tapply(), which preserves uniqueness because it doesn't cast the group to a factor (rowsum() is also a bit faster).
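
To illustrate the mismatch outside of ggplot2, here is a minimal base R sketch. It is not the code from this PR: the variable names are made up for the example, and base unique() stands in for vec_unique() on the assumption that both compare doubles exactly.

```r
# Two values that differ only in the last bit of a double
x <- c(1, 1, 1 - .Machine$double.eps, 2)
w <- rep(1, length(x))  # unit weights, so sums are counts

# unique() compares the doubles exactly; as.factor() compares their
# character representation, which keeps at most 15 significant digits
n_unique <- length(unique(x))
n_levels <- nlevels(as.factor(x))

# Counting via a factor merges the two near-identical values
counts_tapply <- tapply(w, as.factor(x), sum)

# rowsum() groups on the raw values, so they stay separate
counts_rowsum <- rowsum(w, x)

print(c(unique = n_unique, levels = n_levels))
print(counts_tapply)
print(counts_rowsum)
```

With the factor-based count there are fewer groups than unique x values, which is the kind of length mismatch that can surface as a confusing error downstream; with rowsum() the number of groups matches the number of unique values.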

Reprex from #4609:

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2
df <- data.frame(x = rep(c(1, 2), 5) + rep(c(0, -2.220446e-16), c(4, 1)))

ggplot(df, aes(x)) + geom_bar()
#> Warning: `position_stack()` requires non-overlapping x intervals

Created on 2023-11-13 with reprex v2.0.2

Of course, position_stack() not unreasonably complains about overlapping intervals, but this is a better hint at what might be going on than the current error.

Member

@thomasp85 thomasp85 left a comment

LGTM

@thomasp85
Member

Please add NEWS bullet

@thomasp85 thomasp85 added this to the ggplot2 3.5.0 milestone Dec 14, 2023
@teunbrand teunbrand merged commit abc70e2 into tidyverse:main Dec 14, 2023
12 checks passed
@teunbrand teunbrand deleted the unique_count_4609 branch December 14, 2023 12:21
Development

Successfully merging this pull request may close these issues.

stat_count gives cryptic error when used on a column of doubles
2 participants