Letter to CF

bekozi edited this page Sep 1, 2016 · 10 revisions

As part of an EarthCube project for advancing netCDF-CF, we are developing an approach to represent simple geometries in enhanced netCDF-4, with a variable-length array backport for netCDF-3. Simple geometries may be used, for example, to associate stream discharge with river lines or surface runoff with watershed polygons. We've drafted an initial approach and reference implementation in the netCDF-CF-simple-geometry project on GitHub and would greatly appreciate feedback from the CF community. We'd like to make sure our scope is appropriate and our approach is acceptable.

Scope

  • The result of this effort will be a standard that the CF timeSeries feature type could use to specify spatial coordinates (define a simple geometry) for a timeSeries variable.
  • For those familiar with the OGC WKT standard geometry types, we will include Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon (WKT primitives and multipart geometries).

We anticipate that the six chosen geometry types will cover the needs of most people generating netCDF data. These types also align with other geospatial data formats such as GeoJSON and ESRI Shapefile. If our approach is well received by the CF community, we may later adapt it to include parametric shapes such as circles and ellipses.

Simple Geometry Encoding Method

Because different features may require different numbers of coordinates to describe their geometries, our approach uses variable-length (VLEN) arrays in enhanced netCDF-4 and contiguous ragged arrays (CRAs) in netCDF-3. We describe the netCDF-4 VLEN approach first; the netCDF-3 CRA description follows.

In our approach, a VLEN coordinate_index variable identifies the indices of geometry coordinates in separate coordinate arrays. The coordinate_index variable includes a coordinates attribute, which stores the names of the coordinate variables, and a geom_type attribute, which indicates the geometry type.

For multipart geometries, the coordinate index variable may include one or more negative integer flags, each indicating the start of a new geometry "part" for the current feature. The first geometry part is not preceded by the flag. The variable shall include an attribute named multipart_break_value identifying the flag's value.

For polygon geometries with holes (also called "interiors"), the coordinate index values shall include a negative integer flagging the start of each hole. In this case, the variable shall include a hole_break_value attribute to indicate the flag value.

Other attributes on the coordinate index variable describe the node order for polygons (clockwise or anticlockwise) and the polygon closure convention. For additional details, see the wiki. With these concepts defined, an example for multipolygons with holes is shown below. You can copy the WKT description below into Wicket if you'd like to see what the geometry in this example looks like.

Well-Known Text (WKT): MULTIPOLYGON(((0 0, 20 0, 20 20, 0 20, 0 0), (1 1, 10 5, 19 1, 1 1), (5 15, 7 19, 9 15, 5 15), (11 15, 13 19, 15 15, 11 15)), ((5 25, 9 25, 7 29, 5 25)), ((11 25, 15 25, 13 29, 11 25)))

Common Data Language (CDL) for netCDF-4 VLEN Arrays:

netcdf multipolygon_example {
types:
  int64(*) geom_VLType ;
dimensions:
  node = 25 ;
  geom = 1 ;
variables:
  geom_VLType coordinate_index(geom) ;
    string coordinate_index:geom_type = "multipolygon" ;
    string coordinate_index:coordinates = "x y" ;
    coordinate_index:multipart_break_value = -1 ;
    coordinate_index:hole_break_value = -2 ;
    string coordinate_index:outer_ring_order = "anticlockwise" ;
    string coordinate_index:closure_convention = "last_node_equals_first" ;
  double x(node) ;
  double y(node) ;
data:

 coordinate_index = 
    {0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24} ;

 x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5, 11, 15, 13, 11 ;

 y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 25, 25, 25, 29, 25 ;
}
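To make the flag convention concrete, here is a minimal Python sketch that decodes the coordinate_index values from the CDL above back into parts and rings, then formats them as WKT. The helper names split_indices and to_wkt are hypothetical and are not taken from the reference implementation. The data is copied into plain lists rather than read from a netCDF file.

```python
# Hypothetical helpers for illustration; not part of the reference implementation.
MULTIPART_BREAK = -1  # multipart_break_value in the CDL example
HOLE_BREAK = -2       # hole_break_value in the CDL example


def split_indices(indices, multipart_break=MULTIPART_BREAK, hole_break=HOLE_BREAK):
    """Split one feature's coordinate_index values into parts and rings.

    Returns a list of parts. Each part is a list of rings, and each ring is a
    list of node indices. The first ring of a part is the exterior ring.
    """
    parts = [[[]]]
    for value in indices:
        if value == multipart_break:
            parts.append([[]])       # start a new part (polygon)
        elif value == hole_break:
            parts[-1].append([])     # start a new hole in the current part
        else:
            parts[-1][-1].append(value)
    return parts


def to_wkt(parts, x, y):
    """Format decoded parts as a MULTIPOLYGON WKT string."""
    polygons = []
    for rings in parts:
        ring_text = ", ".join(
            "(" + ", ".join(f"{x[i]} {y[i]}" for i in ring) + ")" for ring in rings
        )
        polygons.append("(" + ring_text + ")")
    return "MULTIPOLYGON(" + ", ".join(polygons) + ")"


# Values copied from the CDL example above.
coordinate_index = [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2,
                    13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24]
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11,
     5, 9, 7, 5, 11, 15, 13, 11]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15,
     25, 25, 29, 25, 25, 25, 29, 25]

parts = split_indices(coordinate_index)
wkt = to_wkt(parts, x, y)
print(wkt)
```

For this data the sketch yields three parts holding four rings, one ring, and one ring, and the printed string matches the WKT given before the CDL.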

You'll find additional examples for VLEN geometry storage on our wiki.

Variable Length (VLEN) Arrays in netCDF-3

To support netCDF-3, we created an equivalent of VLEN arrays. Inspired by CF contiguous ragged arrays (CRAs), our approach drops the CRA count variable in favor of a stop variable that stores the stop index for each geometry within an array of geometry coordinates. This improves random access to the CRA "elements" by avoiding the need to sum the counts preceding the target element index. The stop variable includes a contiguous_ragged_dimension attribute whose value is the name of the dimension to which the stop indices apply (similar to the CRA sample_dimension attribute). An example showing how strings can be stored with this approach is shown below.

Common Data Language (CDL) for netCDF-3 CRAs:

netcdf dwarf_planets {
dimensions:
	dwarf_planet = 5 ;  // number of dwarf planets described in this file
	dwarf_planet_chars = 28 ;  // total number of characters for all planet names
variables:
	char dwarf_planet_name(dwarf_planet_chars) ;
	int dwarf_planet_name_stop(dwarf_planet) ;
		dwarf_planet_name_stop:contiguous_ragged_dimension = "dwarf_planet_chars" ;
data:
 dwarf_planet_name = "PlutoCeresErisHaumeaMakemake" ;
 dwarf_planet_name_stop = 5, 10, 14, 20, 28 ;
}
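The following Python sketch illustrates how the stop variable can be read. It assumes the stop values are exclusive end offsets, which is consistent with the numbers in the example above; the function name element is hypothetical. It also shows that the stop values equal the running sum of the counts a count-based CRA would store.

```python
from itertools import accumulate


def element(data, stop, i):
    """Return element i of a ragged array stored with exclusive stop indices."""
    start = stop[i - 1] if i > 0 else 0  # the first element starts at offset 0
    return data[start:stop[i]]


# Values copied from the dwarf_planets CDL example above.
dwarf_planet_name = "PlutoCeresErisHaumeaMakemake"
dwarf_planet_name_stop = [5, 10, 14, 20, 28]

names = [
    element(dwarf_planet_name, dwarf_planet_name_stop, i)
    for i in range(len(dwarf_planet_name_stop))
]
print(names)  # ['Pluto', 'Ceres', 'Eris', 'Haumea', 'Makemake']

# A count-based CRA would store the lengths instead. Finding the start of
# element i then requires summing the i counts that precede it.
counts = [len(name) for name in names]
stops_from_counts = list(accumulate(counts))
print(stops_from_counts)  # [5, 10, 14, 20, 28]
```

With stop indices, reading any single element needs at most two lookups in the stop variable, regardless of its position.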

For the multipolygon geometry example above, the netCDF-4 VLEN coordinate_index array is replaced by a netCDF-3 CRA with an accompanying coordinate_index_stop variable:

netcdf multipolygon_example {
dimensions:
  node = 25 ;
  indices = 30 ;
  geom = 1 ;
variables:
  int coordinate_index(indices) ;
    coordinate_index:geom_type = "multipolygon" ;
    coordinate_index:coordinates = "x y" ;
    coordinate_index:multipart_break_value = -1 ;
    coordinate_index:hole_break_value = -2 ;
    coordinate_index:outer_ring_order = "anticlockwise" ;
    coordinate_index:closure_convention = "last_node_equals_first" ;
  int coordinate_index_stop(geom) ;
    coordinate_index_stop:contiguous_ragged_dimension = "indices" ;
  double x(node) ;
  double y(node) ;
data:
 coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24 ;
 coordinate_index_stop = 30 ;
 x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5, 11, 15, 13, 11 ;
 y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 25, 25, 25, 29, 25 ;
}
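The two encodings carry the same information. As a rough illustration, the Python sketch below converts per-feature VLEN rows into a flat array plus stop indices and back again. The function names vlen_to_cra and cra_to_vlen are hypothetical, stop values are assumed to be exclusive end offsets, and the two-feature input at the end is invented purely to show more than one stop value.

```python
def vlen_to_cra(rows):
    """Flatten per-feature index lists into one array plus exclusive stop indices."""
    flat, stop = [], []
    for row in rows:
        flat.extend(row)
        stop.append(len(flat))  # offset one past the last value of this feature
    return flat, stop


def cra_to_vlen(flat, stop):
    """Rebuild per-feature index lists from a flat array and its stop indices."""
    rows, start = [], 0
    for end in stop:
        rows.append(flat[start:end])
        start = end
    return rows


# The single multipolygon feature from the examples above.
vlen_rows = [[0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2,
              13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24]]
coordinate_index, coordinate_index_stop = vlen_to_cra(vlen_rows)
print(coordinate_index_stop)  # [30]

# Invented two-feature input (not from the examples above).
two_flat, two_stop = vlen_to_cra([[0, 1, 2, 0], [3, 4, 5, 3]])
print(two_stop)  # [4, 8]
```

Break flags pass through unchanged in both directions, so a decoder for the flag convention can operate on either representation.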

The CRA method could, of course, also be used in place of VLEN in netCDF-4. See our wiki page on GitHub for more details and examples.

Questions for the CF Community

  1. Are our VLEN netCDF-3 and netCDF-4 approaches acceptable? What changes would you recommend?
  2. Are the geometry types point, line, polygon, and their multipart equivalents sufficient for the community?

Thank you very much for considering our ideas and helping us with your valuable feedback!