New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement geom_sf_label() and geom_sf_text() #2761

Merged
merged 27 commits into from Aug 24, 2018

Conversation

Projects
None yet
4 participants
@yutannihilation
Member

yutannihilation commented Jul 16, 2018

This PR closes #2742.

Background

It's very convenient that geom_sf() automatically chooses the proper geometries (point, line or polygon) based on the type of the simple feature in the data. But, we sometimes want to choose different type of geoms, for example, texts and labels.

However, adding flexibility to geom_sf() (or underlying sf::st_as_grob()) leads to unmaintainable complexity as discussed in #2111. So, we need to implement it as separate geoms/stats for code simplicity. Here, I propose geom_sf_label() and geom_sf_text(), and their corresponding stat stat_sf_coordinates().

Design

StatSfCoordinates

Texts and labels need pairs of x and y. StatSfCoordinates is a stat to provide these via computed variable x and y (lower letters). stat_sf_coordinates() is the stat_ function corresponding to this.

X and Y dimensions of a sf object can be retrieved by sf::st_coordinates(). But, we cannot simply use sf::st_coordinates() texts and labels require exactly one coordinate per geometry whereas it returns multiple locations for a polygon or a line. Instead, these two steps are needed:

  1. Choose one point per geometry by some function like sf::st_centroid() and sf::st_point_on_surface().
  2. Retrieve coordinates from the points by sf::st_coordinates().

For the first step, users can pass arbitrary function via parameter fun.geometry. By default, function(x) sf::st_point_on_surface(sf::st_zm(x)) is used; I think sf::st_point_on_surface() is more appropriate than sf::st_centroid() since labels and texts usually are intended to be put within the polygon or the line.

geom_sf_label() and geom_sf_text()

geom_sf_label() and geom_sf_text() are basically same with geom_label() and geom_text() for each, except that:

  • they uses StatSfCoordinates for stat
  • they set geometry aesthetic automatically just as geom_sf() does
  • they accept fun.geometry argument

Considerations

Z and M dimension

Simple feature objects may have Z and/or M dimension in addition to X and Y. These dimensions may be useful, but sometimes they are problematic:

  • M dimension makes functions like sf::st_point_on_surface() and sf::st_centroid() fail.
  • Z dimension may do no harm, but Z dimension is not always available in the results.

For simplicity, we simply drop Z and M and use only X and Y.

Warning for longitude/latitude data

Functions that does calculation based on the distances may warn if the on data is in unprojected coordinate system. Though the warnings should not be suppressed, we need to be careful to use geom_sf_label() in examples. Otherwise, the tests fail.

TODOs

  • Add tests
  • Write more docs about stat_sf_coordinates()
  • Write more docs about argument fun.geometry
    • Write some notes about warnings for longitude/latitude data
  • Do we really support Z and M dimension? (no)

Usages

library(ggplot2)

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# texts and labels
p <- ggplot(nc[1:3, ]) +
  geom_sf(aes(fill = AREA))

p + geom_sf_text(aes(label = NAME), colour = "white")
#> Warning in st_point_on_surface.sfc(data$geometry): st_point_on_surface may
#> not give correct results for longitude/latitude data

p + geom_sf_label(aes(label = NAME))
#> Warning in st_point_on_surface.sfc(data$geometry): st_point_on_surface may
#> not give correct results for longitude/latitude data

# using the stat
p + stat_sf_coordinates(colour = "white", size = 10)
#> Warning in st_point_on_surface.sfc(data$geometry): st_point_on_surface may
#> not give correct results for longitude/latitude data

p +
  geom_errorbarh(
    aes(geometry = geometry,
        xmin = stat(x) - 0.1,
        xmax = stat(x) + 0.1,
        y = stat(y),
        height = 0.04),
    colour = "white",
    stat = "sf_coordinates"
  )
#> Warning in st_point_on_surface.sfc(data$geometry): st_point_on_surface may
#> not give correct results for longitude/latitude data

Created on 2018-08-19 by the reprex package (v0.2.0).

@yutannihilation yutannihilation changed the title from [WIP] Implement geom_sf_label() and geom_sf_text() to Implement geom_sf_label() and geom_sf_text() Aug 11, 2018

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 11, 2018

This PR is ready to review now.

But, I don't see why Travis fails. especially, on R 3.1; it fails on test-function-args, which I think I've fixed. What's happening...?

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 11, 2018

The sf package is missing for 3.1. For the more recent releases, I have noticed that Travis seems to have network issues today. If you restart a failed build it may pass the next time.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 11, 2018

Thanks. Hmm, does absense of sf affects the function args? If so, I think I'm missing something. Anyway, will give it a try later.

yutannihilation added some commits Aug 11, 2018

Set the default of fun.geometry to NULL
When sf package is not installed, test-function-args fails.

@yutannihilation yutannihilation force-pushed the yutannihilation:geom-sf-label branch 2 times, most recently from 2228250 to 7111bf7 Aug 11, 2018

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 11, 2018

OK, I think I found the cause. I've made two mistakes below, both of which are not appropriate when sf package is not installed.

  • use sf::st_point_on_surface() as the default value of some functions' arguments
  • reference sf's documents

@yutannihilation yutannihilation force-pushed the yutannihilation:geom-sf-label branch from 7111bf7 to b444c24 Aug 11, 2018

@yutannihilation yutannihilation force-pushed the yutannihilation:geom-sf-label branch from b444c24 to bfe7f3a Aug 12, 2018

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 12, 2018

Confirmed the build on 3.1 passed! Thank you for your advice, Claus.

(devel still fails due to the curl's "Connection timed out" error; I think I'm not lucky enough to let all 5 builds pass at the same time...)

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 12, 2018

You can restart just one particular failed build by clicking on the reload icon to the right of it. In this way, you won't mess up the builds that already succeeded.

I have a few comments/questions:

  1. Have you considered using lower case for x, y, z, m? That seems more in line with typical ggplot2 naming.
  2. Have you considered returning NAs for z and m when those values are not available? Would that make things overall more predictable?
  3. I don't fully understand the comment about errors when z and m are not available. Does my point 2 address this or is there a different type of error that can occur?
@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 12, 2018

You can restart just one particular failed build by clicking on the reload icon to the right of it.

The icon doesn't show; it seems I don't have the permission, unfortunately.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 12, 2018

(Disclaimer: I'm not a GIS expert but a guy who just wants to add labels on geom_sf(). So, I might be wrong, especially about technical phrasing...)

  1. Have you considered using lower case for x, y, z, m? That seems more in line with typical ggplot2 naming.

In sf package (and probably in simple feature access), they are always written as capital. So, I felt it's more natural to use capital letters for computed variables so that we can distinguish them from ggplot's coordinates. That said, I'm still wondering whether they are essentially the same thing, X and Y dimensions of simple feature access, and x and y axis of ggplot2...

  1. Have you considered returning NAs for z and m when those values are not available? Would that make things overall more predictable?

Thanks, this sounds nice.

  1. I don't fully understand the comment about errors when z and m are not available. Does my point 2 address this or is there a different type of error that can occur?

Different. As described above, StatSfCoordinates takes these two steps and the error occurs in the first step:

  1. Choose one point per geometry by some function like sf::st_centroid() and sf::st_point_on_surface().
  2. Retrieve coordinates from the points by sf::st_coordinates().

Your suggestion is about when the first step succeeds but doesn't return Z and M value, but what I was thinking is when the calculation itself fails.

For example, consider we have a sf object of single POLYGON feature and want to determine a location to plot a label:

library(ggplot2)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3

mat_poly_xy <- rbind(
  c(1, 1), c(-1, 1), c(-1, -1), c(1, -1), c(1, 1)
)
poly_xy <- st_polygon(list(mat_poly_xy))
poly_xy
#> POLYGON ((1 1, -1 1, -1 -1, 1 -1, 1 1))

In this case, we can choose a point on the polygon by st_point_on_surface() and extract the coordinates by st_coordinates().

d_poly_xy <- st_sf(geometry = st_sfc(poly_xy))

d_pt_xy <- st_point_on_surface(d_poly_xy)
xy <- st_coordinates(d_pt_xy)

So we can combine the coordinates with the original data

cbind(d_pt_xy, xy)
#> Simple feature collection with 1 feature and 2 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 0 ymax: 0
#> epsg (SRID):    NA
#> proj4string:    NA
#>   X Y    geometry
#> 1 0 0 POINT (0 0)

But, if the polygon has M dimension, the function fails.

mat_poly_xym <- cbind(mat_poly_xy, 1)

poly_xym <- st_polygon(list(mat_poly_xym), dim = "XYM")

poly_xym
#> POLYGON M ((1 1 1, -1 1 1, -1 -1 1, 1 -1 1, 1 1 1))

d_poly_xym <- st_sf(geometry = st_sfc(poly_xym))

st_point_on_surface(d_poly_xym)
#> Error in CPL_geos_op("point_on_surface", x, numeric(0), integer(0), numeric(0), : GEOS does not support XYM or XYZM geometries; use st_zm() to drop M
@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 12, 2018

the comment about errors when z and m are not available

Ah, in short, I commented about errors when Z and/or M are available, which is not always happy. Does my answer make sense?

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 12, 2018

The Z and M issue still isn't entirely clear to me. Do I understand correctly that the default setup fails if the geometries contain Z or M coordinates? In this case, wouldn't it be better for the default to remove Z and M and then run sf::st_point_on_surface()? In other words, something like this:

#' @param fun.geometry
#'   A function that takes a `sfc` object and returns a `sfc_POINT` with the
#'   same length as the input. If `NULL`, `function(x) sf::st_point_on_surface(sf::st_zm(x))` will be
#'   used.
@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 12, 2018

Do I understand correctly that the default setup fails if the geometries contain Z or M coordinates?

Generally, yes. (It may not fail depending on what fun.geometry you use, though.)

Thanks, function(x) sf::st_point_on_surface(sf::st_zm(x)) looks nice to me. For simplicity, now I'm thinking about supporting only X and Y, I mean:

  • always drop Z and M by default (using function(x) sf::st_point_on_surface(sf::st_zm(x)))
  • always return only X and Y
  • use the lower case x and y for computed variables.

Does this look better to you?

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 20, 2018

I have no idea why the Travis build still fails, but it seems unrelated to this PR.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 20, 2018

Ah, do I have to wait for #2838 to be solved?

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 20, 2018

I assume it's because of #2824. That needs to be resolved first.

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 20, 2018

#2838 has been solved, #2824 not yet. #2824 is the problem.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 20, 2018

I got it, Sorry that I don't keep up the discussion...

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 22, 2018

#2824 is now resolved.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 22, 2018

Yay, all checks passed!

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 22, 2018

Looks good to me. @hadley, any concerns from your side?

@hadley

hadley approved these changes Aug 22, 2018

A couple of documentation style issues, and needs a news bullet; otherwise looks good!

R/sf.R Outdated
@@ -6,6 +6,7 @@
#' an unusual geom because it will draw different geometric objects depending
#' on what simple features are present in the data: you can get points, lines,
#' or polygons.
#' For texts and labels, you can use `geom_sf_text` and `geom_sf_label`.

This comment has been minimized.

@hadley

hadley Aug 22, 2018

Member

text -> text; functions should have () after them

R/sf.R Outdated
#' @rdname ggsf
#' @inheritParams geom_label
#' @inheritParams stat_sf_coordinates
#' @seealso [stat_sf_coordinates()]

This comment has been minimized.

@hadley

hadley Aug 22, 2018

Member

I think this seealso belongs in the documentation block above.

#'
#' `stat_sf_coordinates()` extracts the coordinates from 'sf' objects and
#' summarises them to one pair of coordinates (x and y) per geometry. This is
#' convenient when you draw an sf object as geoms like texts and labels (so

This comment has been minimized.

@hadley

hadley Aug 22, 2018

Member

texts -> text (English is weird)

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 23, 2018

Thanks! I was very anxious if my English is strange, so the comments are very helpful...

For a news bullet, should I wait for the release of v3.0.1?

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 23, 2018

No, please add the news bullet now. Thanks!

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 23, 2018

Sure!

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 23, 2018

Added a bullet.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 24, 2018

@clauswilke Could you merge this if this is OK? Then I can create another PR to improve the error message you commented :)

@clauswilke clauswilke merged commit cac3a95 into tidyverse:master Aug 24, 2018

4 checks passed

codecov/patch 90.14% of diff hit (target 77.64%)
Details
codecov/project 77.79% (+0.14%) compared to 5a37ae7
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 24, 2018

Thanks!

@yutannihilation yutannihilation deleted the yutannihilation:geom-sf-label branch Aug 24, 2018

@adrfantini

This comment has been minimized.

adrfantini commented Aug 24, 2018

This is an extremely useful feature. Thanks!!!

@clauswilke

This comment has been minimized.

Member

clauswilke commented Aug 24, 2018

The merge broke the build. Could you look into what's going wrong? I'm in meetings all day.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 24, 2018

Hmm, sorry if this is my fault. I will investigate.

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 24, 2018

Now it's all green. I definitely saw "error" status, but I can't find the errored build in the history. What happened...?

https://travis-ci.org/tidyverse/ggplot2/builds

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 24, 2018

Did I deleted my branch too early?

@yutannihilation

This comment has been minimized.

Member

yutannihilation commented Aug 24, 2018

Anyway, it seems alright now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment