geom_sf mismatches when unknown aesthetic used with non-aesthetic specified colour/fill #5172

Closed
barryrowlingson opened this issue Feb 1, 2023 · 11 comments

Comments

@barryrowlingson

When controlling the colour (or fill) by a vector of colours and also specifying an unknown aesthetic in aes, the colours get displayed on the wrong features.

Example: Make some fake data. This is a 3x4 grid of points with F1, a numeric column, F2, a character column, and mapcolour, a column of colour names.

library(sf)
library(ggplot2)

rgn4 <- st_as_sf(data.frame(expand.grid(x=1:3, y=1:4)), coords=1:2)
rgn4$F1 = 1:nrow(rgn4)
rgn4$F2 = paste0("A",rgn4$F1)
rgn4$mapcolour = "palegreen"
rgn4$mapcolour[11]="red"
rgn4$mapcolour[12]="yellow"

The goal is to plot the points coloured according to the names in the mapcolour column, but to also specify an aesthetic aes(foo=...) which will be F1 or F2.

With F1 (numeric), this works, giving a pale green grid of points but with the top middle (11th) being red and top right (12th) being yellow. There is a warning about the unknown aesthetic:

ggplot() + geom_sf(data=rgn4 , aes(foo=F1), col=rgn4$mapcolour, cex=10)

image

But if I try with F2 (character column), the colours are in the wrong place:

ggplot() + geom_sf(data=rgn4 , aes(foo=F2), col=rgn4$mapcolour, cex=10)

image

Specifying colours numerically from 1 to 12 makes the top row look like a repeat of the previous row in the failure case, which appears to differ from the behaviour above:

ggplot() + geom_sf(data=rgn4 , aes(foo=F1), col=1:nrow(rgn4), cex=10)
ggplot() + geom_sf(data=rgn4 , aes(foo=F2), col=1:nrow(rgn4), cex=10)

image
image

Further experiments with "ignored" aesthetics that are character with different numbers of discrete values show different mismatches of expected colour locations. Try:

rgn4$F3 = paste0("X",c(1,2))
rgn4$F4 = paste0("Y",c(1,2,3))
rgn4$F5 = paste0("Z",c(1,2,3,4))
ggplot() + geom_sf(data=rgn4 , aes(foo=F3), col=rgn4$mapcolour, cex=10)
ggplot() + geom_sf(data=rgn4 , aes(foo=F4), col=rgn4$mapcolour, cex=10)
ggplot() + geom_sf(data=rgn4 , aes(foo=F5), col=rgn4$mapcolour, cex=10)

to see different mismatch patterns.

This can all be worked round by using ggplot() + geom_sf(data=rgn4 , aes(foo=F1, col=mapcolour), cex=10) and adding a scale_ function to map the colours as desired. I've shown the student who first showed me this bug how to do that instead. But I thought I'd submit an issue report because it seems something is not quite right. This also manifests with sf polygons.
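A minimal sketch of that mapped approach, assuming `scale_colour_identity()` is an acceptable choice for the "scale_ function" (the text above does not name one). The colour names in `mapcolour` are then used as-is, and the pairing of colour to feature no longer depends on row order:

```r
library(sf)
library(ggplot2)

# Same fake data as in the report above
rgn4 <- st_as_sf(data.frame(expand.grid(x = 1:3, y = 1:4)), coords = 1:2)
rgn4$F2 <- paste0("A", seq_len(nrow(rgn4)))
rgn4$mapcolour <- "palegreen"
rgn4$mapcolour[11] <- "red"
rgn4$mapcolour[12] <- "yellow"

# Colour is mapped inside aes(), so it travels with each row;
# the identity scale (an assumption here) uses the names directly.
# The unknown aesthetic `foo` still triggers a warning.
p <- ggplot() +
  geom_sf(data = rgn4, aes(foo = F2, colour = mapcolour), size = 10) +
  scale_colour_identity()
```

Printing `p` draws the plot; a manual scale such as `scale_colour_manual()` would be the alternative if the column held category labels rather than colour names.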

The unknown aesthetic is there because the plotly package can utilise a text aesthetic for popup texts on web plots.

I've spent some time narrowing this down, and it only seems to manifest with geom_sf, when there's at least one unknown aesthetic, and when some aspect of the features is set by a vector passed as a parameter outside aes(), as in the example. The problem does not show when using geom_point on a converted version of the sf data; these two look fine:

pts = cbind(st_coordinates(rgn4), st_drop_geometry(rgn4))
ggplot() + geom_point(data=pts , aes(x=X,y=Y,foo=F1), col=pts$mapcolour, cex=10)
ggplot() + geom_point(data=pts , aes(x=X,y=Y,foo=F2), col=pts$mapcolour, cex=10)

Session info follows:

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /opt/R-4.1.1/lib/R/lib/libRblas.so
LAPACK: /opt/R-4.1.1/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.4.0 sf_1.0-9     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7         magrittr_2.0.1     units_0.7-2        munsell_0.5.0     
 [5] tidyselect_1.1.1   colorspace_2.0-2   R6_2.5.1           rlang_1.0.6       
 [9] fansi_0.5.0        dplyr_1.0.7        tools_4.1.1        grid_4.1.1        
[13] gtable_0.3.0       KernSmooth_2.23-20 utf8_1.2.2         cli_3.4.1         
[17] e1071_1.7-9        DBI_1.1.1          withr_2.5.0        class_7.3-19      
[21] assertthat_0.2.1   tibble_3.1.8       lifecycle_1.0.3    farver_2.1.0      
[25] purrr_1.0.0        vctrs_0.5.0        glue_1.6.2         labeling_0.4.2    
[29] proxy_0.4-26       compiler_4.1.1     pillar_1.8.1       scales_1.2.1      
[33] generics_0.1.3     classInt_0.4-3     pkgconfig_2.0.3   
@clauswilke
Member

This is highly non-idiomatic code. If you want to give points different colors you need to set up an aesthetic mapping.

@barryrowlingson
Author

@clauswilke I know. Did you not read the whole of the issue? "This can all be worked round by using ggplot() + geom_sf(data=rgn4 , aes(foo=F1, col=mapcolour), cex=10) and adding a scale_ function to map the colours as desired. I've shown the student who first showed me this bug how to do that instead."

Even non-idiomatic code should not fail this way though.

@clauswilke
Member

Using the API correctly is not a workaround. It's using the API as intended. We've had complaints for years that providing vectors of colors as parameters fails in unexpected ways under various circumstances, and the answer has always been that we're not guaranteeing this to work. Now on occasion we have made changes to the code that would fix such cases, but I'm not even sure that's helpful because it reinforces that this should work when really from a technical perspective it cannot be guaranteed. It's up to the geom/stat to not reshuffle the data, and the geom/stat may have good reasons to do so.

I believe geom_sf() has to do some complex reshuffling to separate out the different things it may have to draw (points, lines, polygons), and it's possible that that algorithm gets confused by the extra aesthetic.

Note that I'm not closing the issue. If somebody wants to hunt this down and can find a way to fix it without breaking anything else, great. But it'd be very low on my priority list.

@clauswilke
Member

I shouldn't even say "gets confused". The algorithm does what it is supposed to do when proper aesthetic mappings are set up. But the internal ordering it creates may depend on additional aesthetics that are provided. For example, groupings are calculated on the basis of all aesthetics provided, even ones that aren't used by any geom.

@clauswilke
Member

Simple example of how a meaningless aesthetic can mess up drawing. Just because the aesthetic is meaningless doesn't mean it isn't considered when generating the plot. The core ggplot2 engine can't know whether foo is used by any geom.

library(tidyverse)

df <- tibble(
  x = rep(1:10, 3),
  y = rep(cos(1:10), 3) + rep(1:3, each = 10),
  a = rep(c('a', 'b', 'c'), each = 10),
  b = rep(letters[1:6], 5)
)

ggplot(df, aes(x, y, color = a)) +
  geom_line()

ggplot(df, aes(x, y, color = a, foo = b)) +
  geom_line()

Created on 2023-02-01 with reprex v2.0.2
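As a rough sketch of how one might check this without looking at the images: `layer_data()` returns the data a layer is drawn from, including the computed `group` column. The helper `n_groups()` is a hypothetical name introduced here for illustration, and the data frame is rebuilt with base R so the snippet stands alone.

```r
library(ggplot2)

df <- data.frame(
  x = rep(1:10, 3),
  y = rep(cos(1:10), 3) + rep(1:3, each = 10),
  a = rep(c("a", "b", "c"), each = 10),
  b = rep(letters[1:6], 5)
)

p_plain <- ggplot(df, aes(x, y, colour = a)) + geom_line()
p_foo   <- ggplot(df, aes(x, y, colour = a, foo = b)) + geom_line()

# Hypothetical helper: count the distinct groups in the first layer
n_groups <- function(p) length(unique(layer_data(p)$group))

n_groups(p_plain)  # one group per level of `a`
n_groups(p_foo)    # more groups, because `foo` also enters the grouping
```

The second plot draws one line per group, which is why the extra aesthetic breaks each coloured line into pieces.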

@barryrowlingson
Author

Yes I see there's lots of potential confusion because the top half of ggplot can't know what the bottom half is doing. I think another example of this is your:

ggplot(df, aes(x, y, color = a, foo = b)) +
  geom_line()

which produces no warnings, since I guess ggplot doesn't know what is and isn't a valid aesthetic in code to come, yet the identical plot, written:

ggplot() + geom_line(data=df, aes(x, y, color = a, foo = b)) 

at least produces a warning:

Warning message:
In geom_line(data = df, aes(x, y, color = a, foo = b)) :
  Ignoring unknown aesthetics: foo

(Side note:
Is that a more idiomatic way of writing ggplot calls? I'm not a big fan of adding arguments to ggplot and letting them be implicit later)

It's an unusual meaning of "Ignoring" in that warning, when it makes such a substantial breaking change to the graphic. I wonder if the plotly package authors are aware, although it shouldn't affect the "text" aspect of plotly plots, since it seems that if your unknown aesthetic has a 1:1 relationship with the color and other grouping aesthetics then the graph functions as expected.

I did try to simplify the example as much as possible. The original problem was with complex mapped polygons, and I thought it might be related to the behaviour you've shown with lines, when polygons are unfolded into a sort of fortified version (I'm not sure fortification happens with geom_sf, but...). What amazed me is that the problem also manifests with points, since point features have one row per feature and need no fortification. But yes, some unpacking of sf point data is getting out of sync with the colour vector and the aesthetic, and that doesn't happen with geom_point.

I am reminded of the great clown Tommy Cooper, who waved his arm at his doctor and said "it hurts when I do that" and the doctor replied "well don't do that then".

@barryrowlingson
Author

If it's possible to fix this in documentation or code warnings, that might be useful. The docs for geom_point describe these args outside aes() thus:

     ...: Other arguments passed on to ‘layer()’. These are often
          aesthetics, used to set an aesthetic to a fixed value, like
          ‘colour = "red"’ or ‘size = 3’. They may also be parameters
          to the paired geom/stat.

which doesn't explicitly say passing a vector can lead to a world of pain. The help for layer() doesn't have "colour" or "size" or "..." as parameters, so it's not clear how these are "passed on to layer()", which makes the help for geom_point not very helpful. Explicit mentions of the passage to a world of pain are welcome.
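For what it's worth, a small sketch of where those arguments seem to end up. This assumes the `aes_params` field of the layer object returned by `geom_point()`, which is an internal detail of ggplot2 rather than documented API and may change between versions:

```r
library(ggplot2)

# A fixed value and a vector, both passed outside aes()
fixed  <- geom_point(colour = "red", size = 3)
vector <- geom_point(colour = c("red", "blue"))

# Arguments that match an aesthetic name are stored on the layer
# (internal field, assumed here for illustration)
fixed$aes_params
vector$aes_params
```

Nothing in the stored parameters ties a vector element to a particular row of the data, so whether the two line up at draw time depends on how the geom or stat orders its rows.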

@clauswilke
Member

To your earlier question: If you put the aesthetic into the ggplot call, then it applies to all geoms in the plot. Geoms that don't know what to do with it just ignore it. By contrast, if you put it into a specific geom, then it applies only to that geom, and then the geom can let you know if it doesn't recognize the aesthetic.

@barryrowlingson
Author

Yes, and given that "Explicit is better than implicit" is a Good Thing, I prefer to not put things in the ggplot(), so that there's no implicit passing of aesthetics or data args in the ggplot call passed down to geoms. Is that better, worse, or idiomatically neutral than stuffing as much as possible into ggplot()? In my eyes that results in parts of a ggplot chain that are dependent on the specifics of the ggplot call and not independent geoms with their own data and aesthetics specified in the place they are used.

@teunbrand
Collaborator

The issue of geom_sf() rearranging points by group was raised before in #4340. A simple fix for panel-level rearrangement has been proposed here: #5170. However, if you're assigning colours outside of aes() with facetted plots, you'd still be at risk of rearrangement.

"Is that better, worse, or idiomatically neutral than stuffing as much as possible into ggplot()?"

To my mind, you put things in the main ggplot() call that are to be re-used among layers. For example: if you have a geom_path() and geom_segment() layer that both use the x/y aesthetic, it's fine to put it there. If there are two layers that use different mappings, there is no explicit guidance but pragmatically I would put what I'd consider the 'main' data in the ggplot() call for code clarity.
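A short sketch of that pattern, using a made-up data frame `df`: the shared x/y mapping goes in `ggplot()`, and each layer adds only what is specific to it.

```r
library(ggplot2)

# Illustrative data, not from the issue
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

p <- ggplot(df, aes(x, y)) +           # x/y shared by both layers
  geom_path() +
  geom_segment(aes(xend = x, yend = 0), linetype = "dashed")  # layer-specific
```

Both layers inherit `x` and `y`; only `geom_segment()` needs the extra `xend`/`yend` mapping, so it is declared where it is used.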

@teunbrand
Collaborator

This is now fixed in the dev version of ggplot2.

library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
devtools::load_all("~/packages/ggplot2/")
#> ℹ Loading ggplot2

rgn4 <- st_as_sf(data.frame(expand.grid(x=1:3, y=1:4)), coords=1:2)
rgn4$F1 = 1:nrow(rgn4)
rgn4$F2 = paste0("A",rgn4$F1)
rgn4$mapcolour = "palegreen"
rgn4$mapcolour[11]="red"
rgn4$mapcolour[12]="yellow"

ggplot() + geom_sf(data=rgn4 , aes(foo=F1), col=rgn4$mapcolour, cex=10)
#> Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat =
#> stat, : Ignoring unknown aesthetics: foo

ggplot() + geom_sf(data=rgn4 , aes(foo=F2), col=rgn4$mapcolour, cex=10)
#> Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat =
#> stat, : Ignoring unknown aesthetics: foo

ggplot() + geom_sf(data=rgn4 , aes(foo=F1), col=1:nrow(rgn4), cex=10)
#> Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat =
#> stat, : Ignoring unknown aesthetics: foo

ggplot() + geom_sf(data=rgn4 , aes(foo=F2), col=1:nrow(rgn4), cex=10)
#> Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat =
#> stat, : Ignoring unknown aesthetics: foo

Created on 2023-03-27 with reprex v2.0.2
