New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce "object 'geometry' not found" error with geom_sf() #2872

Closed
yutannihilation opened this Issue Aug 29, 2018 · 15 comments

Comments

Projects
None yet
2 participants
@yutannihilation
Member

yutannihilation commented Aug 29, 2018

Currently, geom_sf() works without specifying geometry aes only when:

  1. the sf object has a geometry column named geometry
  2. the sf object is passed directly to geom_sf(), not via the plot data

I'm afraid it's a bit confusing that the example of nc won't work if we just replace nc with another sf object.

library(ggplot2)

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
attr(nc, "sf_column")
#> [1] "geometry"

# this works
ggplot(nc) +
  geom_sf(aes(fill = AREA))

nc_gpkg <- sf::st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)
attr(nc_gpkg, "sf_column")
#> [1] "geom"

# this fails
ggplot(nc_gpkg) +
  geom_sf(aes(fill = AREA))
#> Error in FUN(X[[i]], ...): object 'geometry' not found

Created on 2018-08-29 by the reprex package (v0.2.0).

I want to let ggplot(nc_gpkg) + geom_sf(aes(fill = AREA)) simply work. For example, implementing fortify.sf() seems to solve our problem quickly.

fortify.sf <- function(model, data, ...) {
  if (! "geometry" %in% colnames(model)) {
    model$geometry <- sf::st_geometry(model)
  }
  model
}

But, I'm not fully sure this does the right thing... I feel this is similar kind of problem that we wanted to map grouping variable of grouped_df automatically to group aes, but couldn't (#2378). Are there any smarter way to create mappings automatically?

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 30, 2018

There's a third case that works, when you're actually mapping the geometry aesthetic just like all other aesthetics:

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 

nc_gpkg <- sf::st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)

# works, uses proper mapping for geometry column
ggplot(nc_gpkg) +
  geom_sf(aes(fill = AREA, geometry = geom))

Created on 2018-08-29 by the reprex package (v0.2.0).

So if we routinely mapped geometry columns, just like we do with all other data columns, the problem wouldn't exist. Maybe that would be good practice at least within all the ggplot2 documentation and examples?

The downside to your proposed solution is that it is not universal. If a data frame contains a column called geometry but that doesn't contain geometry info then it will break.

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 30, 2018

Thinking some more about this, it seems the problem is that geom_sf() doesn't have access to the original data. This sounds very much like the problem you discussed here: https://yutani.rbind.io/post/2017-11-07-ggplot-add/
So that could be a solution.

However, this got me thinking: Since this is a generic problem that may come up over and over, maybe we could solve it by adding a function such as setup_layer() to the layer ggproto and call it as the very first thing in ggplot_add(), like so:

ggplot_add.Layer <- function(object, plot, object_name) {
  object$setup_layer(plot)
  plot$layers <- append(plot$layers, object)
  ...

This might enable all sorts of new applications. Would just require careful evaluation of whether anything that is currently done in layer() instead should be done in layer()$setup_layer().

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 30, 2018

So if we routinely mapped geometry columns, just like we do with all other data columns, the problem wouldn't exist.

Agreed. Actually, one of the things that frustrate me is the very inconsistency that geometry aes is required in some cases while it's not in some other cases.

One more thing is that I have to copy and paste these lines every time I implement a new geom_sf_*() just to preserve the geom_sf()'s behaviour of inferring the geometry column...

ggplot2/R/sf.R

Lines 238 to 246 in 71cb174

# Automatically determin name of geometry column
if (!is.null(data) && is_sf(data)) {
geometry_col <- attr(data, "sf_column")
} else {
geometry_col <- "geometry"
}
if (is.null(mapping$geometry)) {
mapping$geometry <- as.name(geometry_col)
}

So, in general, I agree with you that always specifying the geometry is an universal solution.

Yet, specifying geometry manually looks redundant, given that a sf object already knows which column represents geometry...

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 30, 2018

setup_layer() sounds super cool!

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 30, 2018

One more thing is that I have to copy and paste these lines every time I implement a new geom_sf_*() just to preserve the geom_sf()'s behaviour of inferring the geometry column...

It would probably make sense to put them into a function, just like geom_column():

ggplot2/R/sf.R

Lines 84 to 94 in 71cb174

geom_column <- function(data) {
w <- which(vapply(data, inherits, TRUE, what = "sfc"))
if (length(w) == 0) {
"geometry" # avoids breaks when objects without geometry list-column are examined
} else {
# this may not be best in case more than one geometry list-column is present:
if (length(w) > 1)
warning("more than one geometry column present: taking the first")
w[[1]]
}
}

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 30, 2018

I mean, there will be many variants of geom_sf(), which are outside of the ggplot2's scope (for example, I think I'm going to implement geom_sf_label_repel()). So, bundling the process into an internal function sounds smart, but it may not be really helpful for the extension packages of ggplot2.

Now I'm thinking I need something like fortify() for mapping. Let me think a bit more...

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 30, 2018

Anyway, this seems reasonable to me. May I create a PR to improve examples?

So if we routinely mapped geometry columns, just like we do with all other data columns, the problem wouldn't exist. Maybe that would be good practice at least within all the ggplot2 documentation and examples?

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 30, 2018

@yutannihilation I have created a PR (#2875) that solves the issue in a general way and also eliminates the need for copying the geometry detection code all over the place. Could you take a look?

@hadley Do you see any showstoppers? The only change visible to the outside world is an additional, optional argument in the layer() function.

Examples:

library(ggplot2)
nc_gpkg <- sf::st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)

# works now
ggplot(nc_gpkg) +
  geom_sf(aes(fill = AREA))

# works now
ggplot(nc_gpkg) +
  stat_sf_coordinates()
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may
#> not give correct results for longitude/latitude data

# still needs explicit mapping because not an sf layer
ggplot(nc_gpkg) +
  geom_errorbarh(
    aes(geometry = geom,
        xmin = stat(x) - 0.1,
        xmax = stat(x) + 0.1,
        y = stat(y),
        height = 0.04),
    stat = "sf_coordinates"
  )
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may
#> not give correct results for longitude/latitude data

Created on 2018-08-30 by the reprex package (v0.2.0).

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 30, 2018

Thanks, it seems fine. But, what was in my mind was a bit different; yours won't cover this case:

I feel this is similar kind of problem that we wanted to map grouping variable of grouped_df automatically to group aes, but couldn't (#2378).

IIUC, requires us to create a new Layer (and new Geom and Stat accordingly) to handle a new object. So, if we want to map group variable automatically for grouped_df, we need to re-implement all of existing Geoms.

I think mechanism of auto-mapping should be an generic function because it depends on the class of object, not the type of the geom, in general. This may be too general, though.

I will send a PR that illustrates my idea...

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 30, 2018

I hadn't thought about the grouping issue, but I think it can be solved with my approach. The logic would be "if the data frame for this layer has grouping information and a group aesthetic is not set for this layer, then map the grouping information to group for this layer". This can be done in the setup_layer() function, and you could do it in the default layer object to have it happen for all geoms and stats.

The key idea is that there are things that we want to be layer specific but they depend on the global data frame and mapping, and setup_layer() has the required info to make those decisions.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 30, 2018

Ah, Thanks, it makes sense.

Here's my PoC. We may eventually need some sort of this auto_mapping() generic in setup_layer(), but I'm not sure for now 🤔 :
https://github.com/yutannihilation/ggplot2/pull/1/files

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 30, 2018

Oops, sorry, I was wrong. My PoC requires .group variable all of the other layers, so we'll see the same problem as #2378... Please ignore the comment above. Your approach seems appropriate.

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 30, 2018

I'm not sure whether a generic is the right approach, because one data frame could match multiple criteria (for example, both be grouped and contain a geometry column). It seems to me more akin to a checklist of conditions and corresponding actions on the mapping.

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 30, 2018

I'm happy to add auto-grouping to my PR, but I'd like to hear @hadley's position first. As it stands right now, the PR is essentially a bug fix, even if it adds a new extension mechanism. If we add auto-grouping, we're fundamentally changing how ggplot2 works and that's not appropriate for the upcoming bug-fix release.

One option could be to implement auto-grouping to make sure the API works and then disable it for the upcoming release. Alternatively, if Hadley thinks this whole approach of setup_layer() is a no-go, at least for now, then there's no point in pushing the code further at this time.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 30, 2018

Sorry, I forgot the upcoming release is for bug-fix... I understand. Will have a closer look on your PR later!

clauswilke added a commit to clauswilke/ggplot2 that referenced this issue Nov 15, 2018

clauswilke added a commit that referenced this issue Nov 15, 2018

Allow creation of custom layers that have access to global plot data (#…
…2875)

* allow creation of custom layers that have access to global plot data

* properly respect inherit.aes when looking for geometry column

* isFALSE() doesn't exist in R 3.4

* incorporate suggested changes by yutannihilation

* Move `setup_layer()` to plot build.

* update news item. fixes #2872
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment