plotting polygon shapefiles

Ben Carlson edited this page Nov 25, 2015 · 22 revisions
Clone this wiki locally

Plotting polygon shapefiles

Goal

Use ggplot2 to plot polygons contained in a shapefile.

Packages

  require("rgdal") # requires sp, will use proj.4 if installed
  require("maptools")
  require("ggplot2")
  require("plyr")

Data

The shapefile can be extracted from

http://www.epa.gov/nheerl/arm/documents/design_doc/ecoregion_design.zip

http://archive.epa.gov/nheerl/arm/web/html/design_files.html

http://archive.epa.gov/nheerl/arm/web/zip/ecoregion_design.zip

Introduction

A shapefile is a group of files containing feature geometry and feature attribute data, as specified by Environmental Systems Research Institute, Inc. (ESRI).  In this example, we are interested in a shapefile that contains the geometry type polygons.  The example shapefile is eco_l3_ut, which contains the ecoregions of Utah.  The group files comprises

  • eco_l3_ut.shp
  • eco_l3_ut.dbf
  • eco_l3_ut.shx
  • eco_l3_ut.prj
  • eco_l3_ut.sbn 
  • eco_l3_ut.sbx

The .shp is the main file and contains feature geometry.  The .dbf file contains the attributes of the feature.  The elements of the two files are linked by their offsets in the file: the first geometric feature (offset 0 in the shp) has its attributes in the first attribute tuple (offset 0 in the dbf file).  There is a one-to-one relationship between geometry tuples and attribute tuples.  This structure will be key when preparing shapefile contents for ggplot2.

The .prj file is the projection of the geometry components.  Using the projection file is often critical when dealing with multiple shapefiles having different projections.  rgdal will use this file to set projection information when creating a spatial object from a shapefile (assuming the .prj file exists or the CRS string is given).  The information can be used to re-project spatial objects as needed.  See the rgdal and sp documentation for details.

Preparation

The shapefile contents are loaded as a spatial object and transformed for presentation by ggplot2.

  utah = readOGR(dsn=".", layer="eco_l3_ut")
  utah@data$id = rownames(utah@data)
  utah.points = fortify(utah, region="id")
  utah.df = join(utah.points, utah@data, by="id")

Notes

Object utah is an instance of class sp::SpatialPolygonsDataFrame, which holds polygons with attributes.  Any spatial manipulation (such as re-projection) is performed on this object.

ggplot2 will render the polygons using geom_polygon, which expects a standard data frame containing polygon verticies and attribute data.  The information contained in utah must be transformed into such a standard data frame.

utah slots are

  • bbox
  • data
  • plotOrder
  • polygons
  • proj4string

The slots of primary interest are utah@data (the geometry attributes) and utah@polygons (the geometry features).  The attributes and features are related to each other by order, so the relationship is implicit.  Relationship attributes need to be explicit so that the geometry attributes can be joined with the geometry features.

  utah@data$id = rownames(utah@data)

explicitly identifies attribute rows by the .dbf offset.

  utah.points = fortify(utah, region="id")

melts the polygons into points, tags each point with the id value of the corresponding attribute row, and tags each point with values from the polygon from which the point was derived.

region selects the feature attribute name to tag each row in the new data frame.  Each geometry feature offset gets the value from the feature attribute at the same offset.  In this case, it is utah@data$id.  The attribute utah.points$id contains the mapped value.  If a different attribute were used (region="LEVEL3"), utah.points$id would contain the corresponding value of utah@data$LEVEL3.

utah.points attributes include spatial coordinates (long and lat), group (coordinates having the same group belongs to the same polygon), and id (each id identifies a feature attribute tuple).

One of the remaining attributes is hole.  Each polygon in utah@polygons is defined by one or more polygons. When defined by multiple polygons, the sub-polygons may be disjunct islands or holes. A hole part should remove geometry from its utah@polgons instance.  When utah.points is created and individual polygons are melted, the hole state is retained, but this information is not used by ggplot2. Depending on the rendering sequence of the polygons, holes may overlay and obscure other polygons, or, more likely and more problematic, polygons are rendered without their fill removed from the hole, which obscures what should be seen through the hole.

  utah.df = join(utah.points, utah@data, by="id")

joins the points to their corresponding attributes and finalizes the data preparation.

Plotting

  ggplot(utah.df) + 
  aes(long,lat,group=group,fill=LEVEL3_NAM) + 
  geom_polygon() +
  geom_path(color="white") +
  coord_equal() +
  scale_fill_brewer("Utah Ecoregion")

Problems

  utah.points = fortify.SpatialPolygonsDataFrame(utah, region="LEVEL3")
  names(utah.points)[which(names(utah.points)=="id")] = "LEVEL3"
  utah.df = join(utah.points, utah@data, by="LEVEL3")
  ggplot(utah.df) + 
    aes(long,lat,group=group,fill=LEVEL3_NAM) + 
    geom_polygon() +
    geom_path(color="white") +
    coord_equal() +
    scale_fill_brewer("Utah Ecoregion")

In this plot, one of the polygons in the ecoregion Wasatch and Uinta Mountains is obscured by a the polygon of ecoregion Colorado Plateaus because the hole is not used to remove fill from the Colorado Plateaus polygon. In the original SpatialFramesDataFrame, we got lucky in that Wasatch and Uinta Mountains were defined as three separate polygon instances in utah@polygons rather than as a single instance of multiple polygons. We also were lucky that the Wasatch and Uinta Mountains polygon inside Colorado Plateaus appears after the Colorado Plateaus polygon. Using id as the region identifier maintained this order, and the plot was properly rendered. If the shapefile had been produced in another way, we might not have been so lucky and have ended up with the Wasatch and Uinta Mountains polygon obscured. Bottom line: plots need to be checked for correctness.

Workaround: forced layering of geom_plot

Force-order the plot to insure obscured polygons are moved above their obscurers.

  utah.points = fortify.SpatialPolygonsDataFrame(utah, region="LEVEL3")
  names(utah.points)[which(names(utah.points)=="id")] = "LEVEL3"
  utah.df = join(utah.points, utah@data, by="LEVEL3")
  ggplot(utah.df) + 
    aes(long,lat,group=group,fill=LEVEL3_NAM) + 
    geom_polygon(data=subset(utah.df,LEVEL3!=19)) +
    geom_polygon(data=subset(utah.df,LEVEL3==19)) +
    geom_path(color="white") +
    coord_equal() +
    scale_fill_brewer("Utah Ecoregion")

Additional techniques

Getting at label points

Felipe points out that sp::coordinates returns the centroid label points from SpatialPolygons.

  label_points = coordinates(utah)

The coordinate points can be bound to the data to specify text location in geom_text.

References

ESRI Shapefile Technical Description http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf

Detailed discussion of package sp classes (and many other spatial topics) can be found in the excellent Bivand, R.S., Pebesma, E.J. & Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R 1st ed., Springer.

See http://cran.r-project.org/web/views/Spatial.html for a complete list of spatial packages in context.