Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Number of bins off by one in geom_contour() #3879

Closed
clauswilke opened this issue Mar 10, 2020 · 10 comments · Fixed by #3976
Closed

Number of bins off by one in geom_contour() #3879

clauswilke opened this issue Mar 10, 2020 · 10 comments · Fixed by #3976

Comments

@clauswilke
Copy link
Member

clauswilke commented Mar 10, 2020

If we explicitly provide a number of bins, we get one too few. This is particularly striking if we set bins = 2, since that removes all contouring. This example of an empty plot is currently on the ggplot2 website (scroll down to 4th example):
https://ggplot2.tidyverse.org/reference/geom_contour.html#examples

I assume this is a regression introduced in 3.3.0.

library(ggplot2)

p <- ggplot(faithfuld, aes(waiting, eruptions, z = density))

p + geom_contour(bins = 3)

p + geom_contour_filled(bins = 3, aes(fill = after_stat(ordered(piece)))) + 
  scale_fill_viridis_d()

p + geom_contour(bins = 6)

p + geom_contour_filled(bins = 6, aes(fill = after_stat(ordered(piece)))) + 
  scale_fill_viridis_d()

Created on 2020-03-09 by the reprex package (v0.3.0)

@clauswilke clauswilke added this to the ggplot2 3.3.1 milestone Mar 10, 2020
@clauswilke
Copy link
Member Author

clauswilke commented Mar 10, 2020

Ok, this appears to be dataset dependent, so it's likely a bug in scales::fullseq().

library(tidyverse)

volcano %>%
  as_tibble() %>%
  rowid_to_column(var = "x") %>%
  gather(key = "y", value = "z", -1) %>%
  mutate(y = as.numeric(gsub("V", "", y))) %>%
  ggplot(aes(x, y, z = z)) + 
  geom_contour_filled(bins = 5, aes(fill = after_stat(ordered(piece)))) + 
  scale_fill_viridis_d()

Created on 2020-03-10 by the reprex package (v0.3.0)

@clauswilke clauswilke removed this from the ggplot2 3.3.1 milestone Mar 10, 2020
@clauswilke
Copy link
Member Author

clauswilke commented Mar 10, 2020

At a minimum, though, the example with bins = 2 needs to be changed.

@thomasp85
Copy link
Member

thomasp85 commented Apr 30, 2020

have you dug deeper into this? should an issue in scales be opened?

@clauswilke
Copy link
Member Author

clauswilke commented Apr 30, 2020

should an issue in scales be opened?

I think so. I gave up because I didn't immediately understand what data input causes it to work or not, but scales::fullseq() is definitely not behaving quite right.

@thomasp85
Copy link
Member

thomasp85 commented Apr 30, 2020

Can I get you to do that and close this issue?

@clauswilke
Copy link
Member Author

clauswilke commented Apr 30, 2020

Actually, looking over everything one more time, it's not clear to me that we can just blame scales::fullseq(). We may not be using it correctly. It does what it promises to do. Because it takes a range and a binwidth as input, it doesn't guarantee a specific number of bins.

In the following reprex, the function make_breaks() encapsulates the logic used by ggplot2. We see that scales::fullseq() does what it is asked to do, but the results aren't necessarily what we want, depending on how the boundaries of the range line up with the binwidth.

library(tidyverse)

volcano_tbl <- volcano %>%
  as_tibble() %>%
  rowid_to_column(var = "x") %>%
  gather(key = "y", value = "z", -1) %>%
  mutate(y = as.numeric(gsub("V", "", y))) 

make_breaks <- function(data, bins) {
  z_range <- range(data, na.rm = TRUE, finite = TRUE)
  binwidth <- diff(z_range) / (bins - 1)
  breaks <- scales::fullseq(z_range, binwidth)
  cat("Range:", z_range, "\n")
  cat("Binwidth", binwidth, "\n")
  cat("Breaks:", breaks, "\n")
  invisible(breaks)
}

# one bin too few
make_breaks(faithfuld$density, 3)  
#> Range: 1.25925e-24 0.0369878 
#> Binwidth 0.0184939 
#> Breaks: 0 0.0184939 0.0369878
make_breaks(faithfuld$density, 5)  
#> Range: 1.25925e-24 0.0369878 
#> Binwidth 0.00924695 
#> Breaks: 0 0.00924695 0.0184939 0.02774085 0.0369878

# shifting the data creates the desired number of bins
make_breaks(faithfuld$density + .01, 3)  
#> Range: 0.01 0.0469878 
#> Binwidth 0.0184939 
#> Breaks: 0 0.0184939 0.0369878 0.0554817
make_breaks(faithfuld$density + .01, 5)  
#> Range: 0.01 0.0469878 
#> Binwidth 0.00924695 
#> Breaks: 0.00924695 0.0184939 0.02774085 0.0369878 0.04623475 0.0554817

# also correct number of bins
make_breaks(volcano_tbl$z, 3)
#> Range: 94 195 
#> Binwidth 50.5 
#> Breaks: 50.5 101 151.5 202
make_breaks(volcano_tbl$z, 5)
#> Range: 94 195 
#> Binwidth 25.25 
#> Breaks: 75.75 101 126.25 151.5 176.75 202

Created on 2020-04-30 by the reprex package (v0.3.0)

@thomasp85
Copy link
Member

thomasp85 commented Apr 30, 2020

👍 we’ll fix it in ggplot2 then

@clauswilke
Copy link
Member Author

clauswilke commented Apr 30, 2020

The question is fundamentally whether we should divide by bins - 1 or bins in this line:

binwidth <- diff(z_range) / (bins - 1)

A lazy fix would be to divide by bins - 1, calculate the breaks, check them out, and if it's one too few rerun the calculation but divide by bins.

@thomasp85
Copy link
Member

thomasp85 commented Apr 30, 2020

I remember fiddling with that in the bin scale as well. I think your suggestion is pragmatic and good

@clauswilke
Copy link
Member Author

clauswilke commented Apr 30, 2020

Ok. I'll prepare a PR.

clauswilke added a commit to wilkelab/ggplot2_archive that referenced this issue Apr 30, 2020
clauswilke added a commit that referenced this issue May 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants