Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement default crs for non-sf objects in coord_sf(). #3659

Merged
merged 31 commits into from
Jun 24, 2020

Conversation

clauswilke
Copy link
Member

Second attempt at implementing the idea proposed in #3654. Based on discussion in #3655.

The main issue that remains to be solved is proper setting of x and y scale limits.

library(ggplot2)
library(readr)
library(ggforce)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

df <- data.frame(
  lat = 35.76667,
  long = -78.63333,
  name = "Raleigh"
)

ggplot(df, aes(x = long, y = lat)) +
  geom_sf(data = nc, size = 0.1, fill = "white", inherit.aes = FALSE) +
  geom_point() +
  geom_text(aes(label = name), hjust = -.1, vjust = 1.1) +
  coord_sf(crs = 2264)

ggplot(nc) +
  geom_sf(size = 0.1, fill = "white") +
  geom_hline(yintercept = 34:36) +
  geom_vline(xintercept = c(-84, -80, -76)) +
  coord_sf(crs = 2264)

df <- read_csv(file = "name,population,lat,long
Charlotte,827097,35.227,-80.843
Raleigh,451066,35.772,-78.639
Greensboro,285342,36.073,-79.792
Durham,257636,35.994,-78.899
Winston-Salem,241218,36.1,-80.244")

ggplot(df, aes(x = long, y = lat)) +
  geom_sf(data = nc, size = 0.1, fill = "white", inherit.aes = FALSE) +
  geom_point() +
  geom_mark_hull() +
  coord_sf(crs = 2264)

ggplot() +
  annotation_map(map_data("state"), fill = "antiquewhite", colour = "darkgrey") +
  geom_sf(data = nc) +
  coord_sf(crs = st_crs(3347))

# this works except the scales can't figure out the correct limits

ggplot(nc) +
  geom_sf() +
  stat_sf_coordinates(geom = "point") +
  coord_sf(crs = st_crs(3347))

ggplot(nc) +
  stat_sf_coordinates(geom = "point") +
    coord_sf(crs = st_crs(3347))

Created on 2019-12-06 by the reprex package (v0.3.0)

R/coord-sf.R Outdated Show resolved Hide resolved
@clauswilke
Copy link
Member Author

clauswilke commented Dec 6, 2019

I haven't figured out yet how to get the scale limits correct. I'll provide some notes as food for thought.

Currently, the scale limits for geometry columns are calculated in StatSf, which calculates a bounding box for each object and then adds columns xmin, xmax, ymin, ymax to the data:

ggplot2/R/stat-sf.R

Lines 6 to 14 in 16ed4d0

compute_group = function(data, scales) {
bbox <- sf::st_bbox(data[[ geom_column(data) ]])
data$xmin <- bbox[["xmin"]]
data$xmax <- bbox[["xmax"]]
data$ymin <- bbox[["ymin"]]
data$ymax <- bbox[["ymax"]]
data
},

This trick works nicely if we're exclusively working in projected coordinates, but it doesn't if non-sf layers are in non-projected coordinates, as then the data units don't match. We could back-transform the bounding boxes to long-lat coordinates, but I'm worried that that would lead to strange limit choices in cases where meridians and parallels are not mostly aligned with the y and x axis, respectively. As much as possible, we should perform limit calculations in transformed coordinates.

Since CoordSf has final say in the limits, maybe it's somehow possible to carry the bounding box info of geometry objects until the point where the final panel params are set up, and only merge then with the limits the scales have determined from the position data:

ggplot2/R/coord-sf.R

Lines 128 to 147 in 16ed4d0

setup_panel_params = function(self, scale_x, scale_y, params = list()) {
# Bounding box of the data
expansion_x <- default_expansion(scale_x, expand = self$expand)
x_range <- expand_limits_scale(scale_x, expansion_x, coord_limits = self$limits$x)
expansion_y <- default_expansion(scale_y, expand = self$expand)
y_range <- expand_limits_scale(scale_y, expansion_y, coord_limits = self$limits$y)
bbox <- c(
x_range[1], y_range[1],
x_range[2], y_range[2]
)
# Generate graticule and rescale to plot coords
graticule <- sf::st_graticule(
bbox,
crs = params$crs,
lat = scale_y$breaks %|W|% NULL,
lon = scale_x$breaks %|W|% NULL,
datum = self$datum,
ndiscr = self$ndiscr
)

I'm just not sure yet how to do this, in particular in a way that works in faceted graphs. On the flip side, free scales don't seem to be currently supported anyways, so maybe that's not a regression.

library(ggplot2)

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

ggplot(nc[1:5, ]) + geom_sf() + facet_wrap(~NAME)

ggplot(nc[1:5, ]) + geom_sf() + facet_wrap(~NAME, scales = "free")
#> Error: coord_sf doesn't support free scales

Created on 2019-12-06 by the reprex package (v0.3.0)

Maybe StatSf can register the bounding boxes with CoordSf, which then can use them when it determines the final scale limits.

@clauswilke
Copy link
Member Author

I think I've got limits working, but it required quite a bit of fiddling and communication back-and-forth between the stat and the coord.

@paleolimbot Could you take a look?

library(ggplot2)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

ggplot(nc) +
  geom_sf() +
  stat_sf_coordinates(geom = "point") +
  coord_sf(crs = st_crs(3347))

ggplot(nc) +
  stat_sf_coordinates(geom = "point") +
    coord_sf(crs = st_crs(3347))

# scale limits are given in longitude/latitude
ggplot(nc) +
  geom_sf() +
  scale_x_continuous(limits = c(-86, -74)) +
  coord_sf(crs = 3347)

# coord limits are given in projected coordinates
ggplot(nc) +
  geom_sf() +
  coord_sf(xlim = c(6800000, 7900000), crs = 3347)

Created on 2019-12-06 by the reprex package (v0.3.0)

@clauswilke
Copy link
Member Author

Some more testing of this code. First, I wanted to see whether it works in cases that span the international dateline. The answer is yes as long as an appropriate CRS is chosen. This is reasonable.

library(tidyverse)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0

world <- ne_countries(scale = "medium", returnclass = "sf")
fiji_polys <- world %>%
  filter(name == "Fiji") %>%
  pull(geometry) %>%
  st_cast("POLYGON") %>% 
  st_sf() %>%
  mutate(label = letters[1:n()])

# doesn't work, long-lat across the dateline causes problems 
ggplot(data = fiji_polys) +
  geom_sf()

# works
ggplot(data = fiji_polys) +
  geom_sf() +
  coord_sf(crs = 3460)

# works even with separate groups on either
# side of the date line
ggplot(data = fiji_polys) +
  geom_sf(aes(fill = label)) +
  coord_sf(crs = 3460)

Created on 2019-12-06 by the reprex package (v0.3.0)

Second, I wanted to see how coordinate expansion is affected by non-sf layers. The answer is that coordinate expansion can change depending on whether the default crs of non-sf layers is set equal to the primary crs or not. I assume there's no way around this. When the default crs is long-lat, then the x and y scales are trained on these values, and the resulting scale limits can be different from what we would choose in an appropriate projection.

library(ggplot2)
library(patchwork)

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

p1 <- ggplot(nc) + geom_sf() +
  coord_sf(crs = 3347)
p2 <- ggplot(nc) + geom_sf() +
  coord_sf(crs = 3347, default_crs = 3347)

# when only sf layers are used, the panel expansion does not
# depend on the default crs
p1 / p2

# however, when there are non-sf layers also, the panel expansion
# depends on the default crs
(p1 + stat_sf_coordinates()) / (p2 + stat_sf_coordinates())

Created on 2019-12-06 by the reprex package (v0.3.0)

@clauswilke
Copy link
Member Author

I think this generally works, but there could be some implications for geom_stars() from the stars package. @edzer, could you comment/take a look?

@paleolimbot
Copy link
Member

This looks awesome! The one limitation I forsee is that multiple panels will all be calling the same record_bbox() function, so there would be no support for a facet with scales = "free" (I don't know if there is currently any support for this, so it may not matter). This PR also breaks the convention that ggproto objects are immutable (except scales). I think that convention has resulted in some very ugly code in ggplot2 (in my just-for-fun R6 rewrite it's one of the first things I ditched), so I have no problem with it here.

@paleolimbot
Copy link
Member

The alternative would be to have a scale_geometry() that would be considered a position aesthetic. Since position aesthetics are hard-coded for the most part, it would require a bit of code duplication, but I think there's nothing that has to change systematically for that to work. We'd need a custom Range that keeps a bbox, but you pretty much have that code written already.

@clauswilke
Copy link
Member Author

We don't currently allow scales = "free" in any coord that enforces a fixed aspect ratio, and it's not clear that we ever will. But in general, I'd prefer a solution that can scale separately for different panels.

I like the idea of a scale_geometry(), I just don't know enough about scales to know how to implement that. Could you give me some pointers and/or take a stab on top of my PR? Would it make sense to create a branch and develop this effort there until it's done?

@paleolimbot
Copy link
Member

I'll give it a shot and see if I can get it to work!

@paleolimbot
Copy link
Member

Ok, I gave it a shot and while the scale_geometry() scale gets trained no problem, everything with the Layout and Coord and Facet are hard-coded as scale_x and scale_y, so there's no room for a scale_geometry without changing a bunch of signatures. I think it's not worth it to change the signatures, but I've attached my scale code in case it's useful.

library(scales)
library(ggplot2)


scale_geometry_continuous <- function() {
  scale <- ggproto(
    NULL, ScaleGeometryContinuous,
    aesthetics = "geometry",
    scale_name = "position_c"
  )
}

scale_type.sfc <- function(x) "continuous"

RangeBbox <- ggproto("RangeBbox", NULL,
  range = NULL,
  target_crs = NULL,
  train = function(self, x) {
    if (all(sf::st_is_empty(x))) {
      return()
    }

    if (is.null(self$range)) {
      self$range <- list()
    }

    if (!is.null(self$target_crs) && !identical(sf::st_crs(x), sf::NA_crs_)) {
      x <- sf::st_transform(x)
    }

    box <- sf::st_bbox(x)

    self$range$xmin <- min(self$bbox$xmin, box["xmin"])
    self$range$xmax <- max(self$bbox$xmax, box["xmax"])
    self$range$ymin <- min(self$bbox$ymin, box["ymin"])
    self$range$ymax <- max(self$bbox$ymax, box["ymax"])
  },

  reset = function(self) {
    self$range <- NULL
  }
)

ScaleGeometryContinuous <- ggproto("ScaleGeometryContinuous", ScaleContinuous,
  range = ggproto(NULL, RangeBbox),
  clone = function(self) {
    new <- ggproto(NULL, self)
    new$range <- ggproto(NULL, RangeBbox)
    new
  },

  train = function(self, x) {
    self$range$train(x)
  },

  map = function(x) x,

  transform = function(x) x
)

nc_tiny_coords <- matrix(
  c(-81.473, -81.741, -81.67, -81.345, -81.266, -81.24, -81.473,
    36.234, 36.392, 36.59, 36.573, 36.437, 36.365, 36.234),
  ncol = 2
)

nc <- sf::st_as_sf(
  data.frame(
    NAME = "ashe",
    geometry = sf::st_sfc(sf::st_polygon(list(nc_tiny_coords)), crs = 4326)
  )
)

geometry_scale <- scale_geometry_continuous()
geometry_scale$train(nc$geometry)
geometry_scale$range$range
#> $xmin
#> [1] -81.741
#> 
#> $xmax
#> [1] -81.24
#> 
#> $ymin
#> [1] 36.234
#> 
#> $ymax
#> [1] 36.59

Created on 2019-12-08 by the reprex package (v0.2.1)

@clauswilke
Copy link
Member Author

Yes, that's what I was expecting was going to be problematic when I looked over the code yesterday. At some point, I think we'll have to rethink hard-coded position scales(*), but it's not something I want to tackle now.

(*) For example, in some cases a third dimension would be useful, e.g. to control z-ordering of elements, or to natively implement something like ggtern.

@paleolimbot
Copy link
Member

That's something I've also thought about...coordinate systems should be able to specify which aesthetics are "position" aesthetics? Outside the scope of this PR obviously, but it might be worth considering before the next release since I've already messed with the Coord internals. I could give it a shot but not for a few weeks.

@clauswilke
Copy link
Member Author

There is indeed a problem with writing changes back to the coord. If we reuse the coord in multiple plots, limits can get messed up. This could be addressed by cloning the coord when adding it, but I'll first want to see if there's a different approach.

library(ggplot2)
library(patchwork)
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# works as expected
p1 <- ggplot(nc[1, ]) + geom_sf() + coord_sf(crs = 3347)
p2 <- ggplot(nc[2, ]) + geom_sf() + coord_sf(crs = 3347)
p1 + p2

# doesn't work, limits in second plot are wrong
crd <- coord_sf(crs = 3347)
p1 <- ggplot(nc[1, ]) + geom_sf() + crd
p2 <- ggplot(nc[2, ]) + geom_sf() + crd
p1 + p2

Created on 2019-12-08 by the reprex package (v0.3.0)

If we decide the coord needs to be cloned, this would have to happen here:

ggplot_add.Coord <- function(object, plot, object_name) {
if (!isTRUE(plot$coordinates$default)) {
message(
"Coordinate system already present. Adding new coordinate ",
"system, which will replace the existing one."
)
}
plot$coordinates <- object
plot
}

@clauswilke
Copy link
Member Author

I was able to avoid interference between different copies of the same coord, by resetting internal parameters to a defined state whenever the plot build begins.

library(ggplot2)
library(patchwork)
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# parameters are regenerated with every plot build, no
# sharing of parameters between plots
crd <- coord_sf(crs = 3347)
p1 <- ggplot(nc[1, ]) + geom_sf() + crd
p2 <- ggplot(nc[2, ]) + geom_sf() + crd
p1 + p2

Created on 2019-12-08 by the reprex package (v0.3.0)

@clauswilke
Copy link
Member Author

@edzer FYI, with the latest version of my code, if you set default_crs = NULL in coord_sf() everything should behave exactly as before. But without that option, I assume there will be cases that have issues. The right way to resolve this would likely involve implementing an sf-aware version of geom_raster() and/or geom_rect(), but I don't understand the geom_stars() code well enough to grasp all the implications.

@paleolimbot
Copy link
Member

I've thought a bit more about this, and I think that this approach introduces considerable complexity for a problem that has a reasonable solution (project coordinates in advance or use something like ggspatial::stat_spatial_identity()). If we do solve it in ggplot2, my vote would be to do projection of non-sf layers early (e.g., in Layer$compute_aesthetics() or Stat$compute_layer()). That approach makes it hard to interpret position aesthetics that aren't x or y, but might be much simpler.

@hadley
Copy link
Member

hadley commented Mar 4, 2020

@clauswilke would you mind writing a quick overview of the PR, updated based on the discussion so far? Otherwise it's going to be hard for me to find the time to read the whole thread.

@clauswilke
Copy link
Member Author

@hadley Sure. Give me a day or two. I'll also resolve the merge conflicts. I'll ping you when I'm ready.

Copy link
Member

@yutannihilation yutannihilation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though we should still wait for Hadley's review, I have no more review comments.

@mdsumner
Copy link

mdsumner commented Mar 6, 2020

gee this is fab only just found it, thanks @clauswilke !!!!!!!!!!

@clauswilke
Copy link
Member Author

@hadley As promised, a summary of the discussion for this PR. I'll write this in a series of comments, so that I don't have to craft one massive comment all at once.

First, what does this PR do? From the news item:

coord_sf() now has an argument default_crs that specifies the coordinate reference system (CRS) for non-sf layers and scale/coord limits. This argument defaults to the World Geodetic System 1984 (WGS84), which means x and y positions are interpreted as longitude and latitude. This is a potentially breaking change for users who use projected coordinates in non-sf layers or in limits. Setting default_crs = NULL recovers the old behavior. Further, authors of extension packages implementing stat_sf()-like functionality are encouraged to look at the source code of stat_sf()'s compute_group() function to see how to provide scale-limit hints to coord_sf() (@clauswilke, #3659).

Here are some basic examples. We can now easily add points or text in long-lat coordinates.

library(tidyverse)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

df <- data.frame(
  lat = 35.76667,
  long = -78.63333,
  name = "Raleigh"
)

ggplot(df, aes(x = long, y = lat)) +
  geom_sf(data = nc, size = 0.1, fill = "white", inherit.aes = FALSE) +
  geom_point() +
  geom_text(aes(label = name), hjust = -.1, vjust = 1.1) +
  coord_sf(crs = 2264)

The framework is general and extends to essentially any geom. Here is an example with geom_mark_hull() from ggforce.

library(ggforce)

df <- read_csv(file = "name,population,lat,long
Charlotte,827097,35.227,-80.843
Raleigh,451066,35.772,-78.639
Greensboro,285342,36.073,-79.792
Durham,257636,35.994,-78.899
Winston-Salem,241218,36.1,-80.244")

ggplot(df, aes(x = long, y = lat)) +
  geom_sf(data = nc, size = 0.1, fill = "white", inherit.aes = FALSE) +
  geom_point() +
  geom_mark_hull() +
  coord_sf(crs = 2264)

And here is an example from annotation_map(), which uses geom_polygon() to draw maps and until now hasn't played nicely with coord_sf().

ggplot() +
  annotation_map(map_data("state"), fill = "antiquewhite", colour = "darkgrey") +
  geom_sf(data = nc) +
  coord_sf(crs = st_crs(3347))

The limits (both for scales and for the coord) are set in the default coordinate system, i.e., long-lat by default.

ggplot() +
  annotation_map(map_data("state"), fill = "antiquewhite", colour = "darkgrey") +
  geom_sf(data = nc) +
  coord_sf(crs = st_crs(3347), xlim = c(-82, -76), ylim = c(34, 37))

Created on 2020-03-08 by the reprex package (v0.3.0)

@clauswilke
Copy link
Member Author

While I developed this PR a number of issues came up, which I'll list here and then address one by one:

  1. We have to keep track of two different coordinate systems, the default one (default_crs) and the one into which we're projecting (crs). This breaks some assumptions in standard ggplot2.

  2. We have to appropriately segmentize straight lines.

  3. Setting limits in default coordinates poses unique challenges, because the coordinates are not cartesian after projection.

  4. In general, things can get wonky in weird coordinates, e.g. when we want to plot a pole or the international date boundary.

  5. Extension developers will need to write more complex code to write sf-aware geoms and stats. This requires additional helper functions.

@clauswilke
Copy link
Member Author

Point 1: Two different coordinate systems

For geometry objects, we want to keep track of their correct limits in projected coordinates. But scales are now trained in default coordinates. Thus, we somehow have to keep track of projected limits outsides of the regular scales. I handled this by implementing a call-back to coord_sf() to record the bounding boxes of all geometry objects:
https://github.com/clauswilke/ggplot2/blob/e4cdcd67498e53885a3124b13959acf5aef53560/R/stat-sf.R#L18-L24

Ultimately, the correct way to handle this would be to implement generic position scales, so that we could add a geometry scale that can be trained on geometry objects. However, that solutions is out of reach at this time.

@clauswilke
Copy link
Member Author

Point 2: Appropriately segmentize straight lines.

I'm segmentizing lines in the default coordinate system and then transform the segments. With the default setting, this generates horizontal and vertical lines that follow parallels and meridians, a behavior that many people might expect:

library(tidyverse)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

ggplot(nc) +
  geom_sf(size = 0.1, fill = "white") +
  geom_hline(yintercept = 34:36) +
  geom_vline(xintercept = c(-84, -80, -76)) +
  coord_sf(crs = 2264)

Created on 2020-03-08 by the reprex package (v0.3.0)

However, @edzer pointed out that this is not always the correct approach, and that we might want to segmentize, for example, along great circles. Doing this is beyond the scope of this PR though (and likely beyond the scope of a simple ggplot2 coord), so I addressed this issue by pointing it out in the documentation:
https://github.com/clauswilke/ggplot2/blob/e4cdcd67498e53885a3124b13959acf5aef53560/R/geom-sf.R#L43-L51

@clauswilke
Copy link
Member Author

clauswilke commented Mar 9, 2020

Point 3: Limits

Specifying limits in default coordinates (either through the min/max values in the data, or via xlim etc.) is complicated, because how these limits translate into regions in projected coordinates can be unintuitive, in particular for very curved coordinate systems. There is no algorithm that always works well, so I implemented a couple of different approaches that have different strengths and weaknesses. The default choice is relatively conservative, i.e., it is unlikely to make the limits too large, but it tends to make limits too narrow for very non-linear projections.

library(ggplot2)

tile_df <- expand.grid(
  xmin = seq(-140, -52, by = 20),
  ymin = seq(40, 70, by = 10)
)

tile_df$xmax <- tile_df$xmin + 20
tile_df$ymax <- tile_df$ymin + 10
tile_df$x <- (tile_df$xmin + tile_df$xmax) / 2
tile_df$y <- (tile_df$ymin + tile_df$ymax) / 2
tile_df$width <- 20
tile_df$height <- 10

p <- ggplot(
  tile_df,
  aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, x = x, y = y)
) + geom_tile(aes(height = 7.5, width = 5))

p + coord_sf(crs = 3979) # default limit method is "cross"

p + coord_sf(crs = 3979, lims_method = "box")

p + coord_sf(crs = 3979, lims_method = "orthogonal")

Created on 2020-03-08 by the reprex package (v0.3.0)

@clauswilke
Copy link
Member Author

clauswilke commented Mar 9, 2020

Point 4: Things can get wonky in weird coordinate systems

The reason for why was already explained under Points 2 and 3. Here are a few more examples with limits:

library(tidyverse)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0

world <- ne_countries(scale = "medium", returnclass = "sf")
fiji_polys <- world %>%
  filter(name == "Fiji") %>%
  pull(geometry) %>%
  st_cast("POLYGON") %>% 
  st_sf() %>%
  mutate(label = letters[1:n()])

# doesn't work, long-lat across the dateline causes problems 
ggplot(fiji_polys) +
  geom_sf()

# doesn't work, due to dateline boundary
ggplot(fiji_polys) +
  geom_sf() +
  coord_sf(crs = 3460)

# setting `default_crs = NULL` always fixes things
ggplot(fiji_polys) +
  geom_sf() +
  coord_sf(crs = 3460, default_crs = NULL)

# changing the limits method also works
ggplot(fiji_polys) +
  geom_sf() +
  coord_sf(crs = 3460, lims_method = "geometry_bbox")

# works
ggplot(fiji_polys) +
  geom_sf() +
  coord_sf(crs = 3460, lims_method = "orthogonal")

# works (not sure why)
ggplot(fiji_polys) +
  geom_sf() +
  coord_sf(crs = 3460, lims_method = "orthogonal") + 
  xlim(-170, 170) +
  ylim(-22, -12)

Created on 2020-03-08 by the reprex package (v0.3.0)

And now an example with straight lines. These plots probably don't look the way we would want them to.

library(ggplot2)

cities <- data.frame(
  x = c(-63.58595, 116.41214, 13.50, -149.75),
  y = c(44.64862, 40.19063, 52.51, 61.20),
  city = c("Halifax", "Beijing", "Berlin", "Anchorage")
)

cities$xend <- cities$x[c(2, 4, 1, 3)]
cities$yend <- cities$y[c(2, 4, 1, 3)]

p <- ggplot(cities, aes(x, y, xend = xend, yend = yend)) +
  geom_point() +
  # view of the north pole
  coord_sf(crs = 3995, lims_method = "box")

p + geom_segment()

p +
  geom_hline(
    aes(yintercept = y, col = city)
  ) +
  geom_vline(
    aes(xintercept = x, lty = city)
  )

Created on 2020-03-08 by the reprex package (v0.3.0)

However, surprisingly, the following works just fine. Note: this does not use geom_sf() at all, only coord_sf()!

library(ggplot2)
library(ggrepel)

cities <- data.frame(
  x = c(-63.58595, 116.41214, 13.50, -149.75),
  y = c(44.64862, 40.19063, 52.51, 61.20),
  city = c("Halifax", "Beijing", "Berlin", "Anchorage")
)

ggplot(cities, aes(x, y)) +
  annotation_map(
    map_data("world"),
    fill = "antiquewhite", colour = "darkgrey", size = 0.1
  ) +
  geom_point() +
  geom_text_repel(aes(label = city), box.padding = unit(20, "pt")) +
  coord_sf(crs = 3995, lims_method = "box")

Created on 2020-03-08 by the reprex package (v0.3.0)

@clauswilke
Copy link
Member Author

Point 5: Additional helper functions

This point applies primarily to a function I needed to write that makes it easy to transform coordinates among different coordinate systems, without having to worry about creating and then unpacking sf objects. I called this function sf_transform_xy():
https://github.com/clauswilke/ggplot2/blob/e4cdcd67498e53885a3124b13959acf5aef53560/R/coord-sf.R#L470-L517

This is how it works:

library(ggplot2)

# location of cities in NC by long (x) and lat (y)
data <- data.frame(
  city = c("Charlotte", "Raleigh", "Greensboro"),
  x =  c(-80.843, -78.639, -79.792),
  y = c(35.227, 35.772, 36.073)
)

# transform to projected coordinates
data_proj <- sf_transform_xy(data, 3347, 4326)
data_proj
#>         city       x         y
#> 1  Charlotte 7275499 -60169.66
#> 2    Raleigh 7474260  44384.13
#> 3 Greensboro 7357835  57438.27

# transform back
sf_transform_xy(data_proj, 4326, 3347)
#>         city       x      y
#> 1  Charlotte -80.843 35.227
#> 2    Raleigh -78.639 35.772
#> 3 Greensboro -79.792 36.073

Created on 2020-03-08 by the reprex package (v0.3.0)

This function is not only useful for extension writers, it can also be useful to endusers who want to, e.g., calculate plot limits in projected coordinates. However, it is somewhat outside the normal scope of what ggplot2 does. So, two questions:

  1. Are we Ok exporting this or should I keep it internal for now?
  2. If we export it, is the current name fine, or should I change it?

This was the last comment in this series. I have addressed everything I could think of.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation — your reasoning sounds solid to me, so I think I think it makes sense to merge this PR. Exporting sf_transform_xy() seems like a reasonable compromise.

Given that the documentation for geom_sf()/coord_sf() is getting so long, we may want to consider a vignette/article in the future.

Thanks for all your hard work on this!

@clauswilke
Copy link
Member Author

Thanks! I think I'll wait merging this until after the 3.3.1 release. It's too much of a change for a minor release.

I agree with the documentation issues. I've also thought about breaking the man page into two, one for geom_sf() and one for coord_sf(), because it's starting to get confusing which parts of the text belong to which function.

@clauswilke clauswilke merged commit ccd94e1 into tidyverse:master Jun 24, 2020
@clauswilke
Copy link
Member Author

This is now merged. Thanks to everybody for the extensive feedback. Let's see how this plays out as people start testing this in the wild.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants