AGU 2016 Poster #40

Closed
bekozi opened this issue Nov 9, 2016 · 69 comments

@bekozi
Collaborator

bekozi commented Nov 9, 2016

Template is up: https://docs.google.com/drawings/d/1zwJTWQ9uOkuLxTnDNBdKLlVWDUFoIQ2UcE89P9UzppI/edit.

Please edit as you see fit. I am impressed with Google Drawings for this sort of thing. If we can't get things quite aligned, we can export and refine. Otherwise, I recommend we just continue to use this. Google has PDF and SVG export options.

@bekozi bekozi added the docs label Nov 9, 2016
@bekozi bekozi self-assigned this Nov 9, 2016
@twhiteaker
Owner

I added an image with catchments and streams, along with graphs of (real) streamflow and ET data from August. The highlighted stream and the graph lines use the same cornflower blue as in the simple geometry image to the left. The simple geom images use similar colors to the USGS and NOAA logos in the top left, so I used orange for the catchments to be similar to the UT logo in the top right. Orange is also a complementary color to blue so it makes the image pop. I suggest using a color other than black for the leader lines, graph outlines, and map outline, but that can come later.

How does that look for the middle box? What else do you want in there? Another example, some text, or make the existing example bigger?

What else do you need help with? I was thinking of filling in text for "What is a simple geometry" next.

@dblodgett-usgs
Collaborator

I just attempted to work up the CDL example. This seem reasonable? Totally draft, please help make better.
[Image: screenshot of the draft CDL example, 2016-11-25]

@dblodgett-usgs
Collaborator

The use of 'instance' should be explained somewhere else on the slide. I have a hard time describing 'instance' versus 'element'. Maybe... "each simple feature is an instance that we describe with element variables"

@bekozi
Collaborator Author

bekozi commented Nov 28, 2016

How does that look for the middle box? What else do you want in there? Another example, some text, or make the existing example bigger?

@twhiteaker: Thanks for putting the image together. It looks good, and I agree the color combination is nice. I think we should add a point example (stream gauge, infrastructure). With that, this should be sufficient for a real-world example. Is it possible to put values on the streamflow and evapotranspiration plots? Makes it look more "realistic". Not necessary if difficult.

What else do you need help with? I was thinking of filling in text for "What is a simple geometry" next.

@twhiteaker: Yes, please tackle the simple geometry section. I will work on the text in the other sections.

I just attempted to work up the CDL example. This seem reasonable? Totally draft, please help make better.

@dblodgett-usgs: Graphical layout is excellent with the highlighting and arrows + boxes. I thought we were going to link the CDL example with hydrologic catchment data used in the data graphic? No big deal as this looks sufficient as an explanatory graphic. I think we should try and work in a multi-geometry example here as this is a confusing bit. Is that simple for you to do with your graphic? I noticed the geom_type was multipolygon in the CDL anyway.

We should stick the full Bull Creek CDL with data variables below the hydrologic catchment graphic in any case. I'll add that. It will also fill out the space I expect.

The use of 'instance' should be explained somewhere else on the slide. I have a hard time describing 'instance' versus 'element'. Maybe... "each simple feature is an instance that we describe with element variables"

I'm not sure we should adopt the DSG lingo if we are trying to move towards OGC. I think instance ~= feature and element ~= data. Is this true? But yeah, a paragraph on relationships to DSG makes sense.

@dblodgett-usgs
Collaborator

Are we trying to move toward OGC? I would much rather focus on CF adoption. Yeah. Instances are the features, so any variable defined on the instance dimension only is feature attribute data. Elements are those variables defined on temporal or other element dimensions as well as the instance dimension.
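
One way to picture the instance/element distinction is the Python sketch below. It is only an illustration, not part of the draft spec: `classify_variables` is a hypothetical helper, the variable names `area` and `streamflow` are made up, and the `char` dimension is ignored on the assumption that it only carries string length for character-array identifiers.

```python
def classify_variables(variables, instance_dim="instance", ignore_dims=("char",)):
    """Split variables into feature attributes and element variables.

    `variables` maps a variable name to its tuple of dimension names.
    A variable defined on the instance dimension only is feature attribute
    data. A variable defined on the instance dimension plus a temporal or
    other dimension is an element variable.
    """
    attributes, elements = [], []
    for name, dims in variables.items():
        # Assumption: string-length dimensions do not count as element dimensions.
        kept = tuple(d for d in dims if d not in ignore_dims)
        if instance_dim not in kept:
            continue  # not tied to the features at all
        if kept == (instance_dim,):
            attributes.append(name)
        else:
            elements.append(name)
    return attributes, elements


# Hypothetical variables, for illustration only.
example = {
    "instance_name": ("instance", "char"),
    "area": ("instance",),
    "streamflow": ("instance", "time"),
    "time": ("time",),
}
attributes, elements = classify_variables(example)
```

Under these assumptions `instance_name` and `area` come out as feature attribute data and `streamflow` as an element variable, while `time` is skipped because it has no instance dimension.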

@dblodgett-usgs
Collaborator

I'll work up a multipolygon example if you guys think it's not going to be too confusing. I was sticking to a hole just to make it a simple demonstration of the contiguous_ragged_dimension details.

@bekozi
Collaborator Author

bekozi commented Nov 28, 2016

Are we trying to move toward OGC? I would much rather focus on CF adoption.

I think we should provide crosswalks with OGC at all times at the very least. CF adoption is most important but that doesn't mean compromising unnecessarily - using instance/element is not that much of a compromise.

Yeah. Instances are the features, so any variable defined on the instance dimension only is feature attribute data. Elements are those variables defined on temporal or other element dimensions as well as the instance dimension.

I have a couple questions regarding the use of instance identifiers. We can talk about them at a later date, but I am trying to wrap my head around this approach.

  • How does this work with multiple geometries in a single NetCDF group? Are there multiple variables with the instance_id attribute? It's fine when the geometry count is constant across coordinate index variables, but what happens when there are, say, four catchments with seven gauges?
  • All geometry coordinate index variables will require a unique identifier variable?

I'll work up a multipolygon example if you guys think it's not going to be too confusing. I was sticking to a hole just to make it a simple demonstration of the contiguous_ragged_dimension details.

I'm looking more closely at your example. I just now noticed that it is for a holed polygon and not a polygon on green background. 😶 With that in mind, I don't think we need a multi-polygon example since you are demonstrating the use of break values.

A couple other questions:

  • Are instance identifier variables always strings? Can they be integer data types?
  • The indexing is one-based. This should be indicated on the coordinate index variable with start_index=1. Python is zero-based. R is one-based.

@dblodgett-usgs
Collaborator

How does this work with multiple geometries in a single NetCDF group? Are there multiple variables with the instance_id attribute? It's fine when the geometry count is constant across coordinate index variables, but what happens when there are, say, four catchments with seven gauges?
All geometry coordinate index variables will require a unique identifier variable?

By my understanding of the DSG spec, this would be handled by putting things in separate files. The reason for this is that the 'featureType' attribute is global. Having multiple featureTypes present in the same file would add quite a bit of complexity. For the purposes of our CF 1.* compatible proposal, I think we should probably stick to that model. That said, I do think putting the 'geometry_type' attribute in the coordinate_index rather than the global attributes is a good idea to allow for multiple geometry_types in a single file (e.g. watershed polygons and their associated outlet locations).

Are instance identifier variables always strings? Can they be integer data types?

They don't HAVE to be strings. This is a common practice though and is a nice way to embed identifiers of any type generically rather than requiring coercion to int or some other data type. The code I've worked up uses strings but I'm hoping to loosen that up as I work forward.

The indexing is one-based. This should be indicated on the coordinate index variable with start_index=1. Python is zero-based. R is one-based.

Good call. I missed this in the spec draft. Will add it.

While I'm thinking about it, for the contiguous ragged array indexing, I found it helpful to make a distinction between things that index into the contiguous_ragged_dimension and things that index into the coordinate dimension. In my code I've named variables with 'ind' for things in the contiguous ragged dimension and 'coord' in the coordinate dimension. My naming was arbitrary. My point here is that the distinction between the two kinds of indexing is really critical and is pretty easy to stumble over unless you are really explicit about how you talk about them. Just something to think about in the poster.
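
The two kinds of indexing can be made explicit in a short Python sketch. This is a hedged illustration rather than the code referred to above: `instance_parts` is a hypothetical function, the tiny arrays are made up, and it assumes that `start_index` applies both to the stop values and to the values stored in the coordinate index.

```python
def instance_parts(coordinate_index, stops, i, start_index=1, break_value=-1):
    """Return the parts of instance `i` as lists of zero-based coordinate positions.

    Two index spaces are involved:
      * `stops` holds positions in the contiguous ragged dimension ('ind' space);
      * the values stored in `coordinate_index` are positions in the
        coordinate dimension ('coord' space), or a break value.
    """
    # 'ind' space: the slice of the ragged array that belongs to instance i.
    ind_begin = 0 if i == 0 else stops[i - 1] - start_index + 1
    ind_end = stops[i] - start_index + 1  # exclusive, zero-based

    parts, current = [], []
    for value in coordinate_index[ind_begin:ind_end]:
        if value == break_value:
            # A break value separates parts; it is never a coordinate position.
            parts.append(current)
            current = []
        else:
            # 'coord' space: shift to a zero-based position into x and y.
            current.append(value - start_index)
    parts.append(current)
    return parts


# Made-up data: instance 0 has two parts, instance 1 has one part.
coordinate_index = [1, 2, 3, -1, 4, 5, 6, 7, 8, 9]
stops = [7, 10]
first = instance_parts(coordinate_index, stops, 0)
second = instance_parts(coordinate_index, stops, 1)
```

Note that the stop for the first instance is 7 even though its last coordinate is number 6: the break value occupies a slot in the ragged dimension but not in the coordinate dimension.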

@twhiteaker
Owner

twhiteaker commented Nov 28, 2016

@dblodgett-usgs what do you think of adding x- and y-axes to the CDL example geometry so that users can more easily find the coordinates described in the CDL?

@dblodgett-usgs
Collaborator

Good idea. But not super easy to do. I just screenshot a GIS rendering of the shapefile. Could we hack it in by hand?

@twhiteaker
Owner

If you send me the shapefile I could do this in ArcGIS.

@dblodgett-usgs
Collaborator

Here it is.

sample.zip

@twhiteaker
Owner

Added axis labels to the graphs. Is the font size (10) too small? I didn't think the labels were important enough to make them as big as other fonts on the poster.

@bekozi
Collaborator Author

bekozi commented Nov 28, 2016

Thanks for the explanations, @dblodgett-usgs.

By my understanding of the DSG spec, this would be handled by putting things in separate files. The reason for this is that the 'featureType' attribute is global. Having multiple featureTypes present in the same file would add quite a bit of complexity. For the purposes of our CF 1.* compatible proposal, I think we should probably stick to that model. That said, I do think putting the 'geometry_type' attribute in the coordinate_index rather than the global attributes is a good idea to allow for multiple geometry_types in a single file (e.g. watershed polygons and their associated outlet locations).

I would really, really, really like to make the spec compatible with different geometry counts per data/element variable. It's probably best to avoid the issue with the poster directly provided the point examples have the same count as the polygons and stream segments. If we do not want to propose this, we should at least make sure that it can be proposed during the next "version". It may be as easy as adding an instance_dimension/geom_dimension to the coordinate index variable.

And, yes, definitely keep the geometry type out of the global attributes.

While I'm thinking about it, for the contiguous ragged array indexing, I found it helpful to make a distinction between things that index into the contiguous_ragged_dimension and things that index into the coordinate dimension. In my code I've named variables with 'ind' for things in the contiguous ragged dimension and 'coord' in the coordinate dimension. My naming was arbitrary. My point here is that the distinction between the two kinds of indexing is really critical and is pretty easy to stumble over unless you are really explicit about how you talk about them. Just something to think about in the poster.

Interesting to think about. The Python code uses an object to translate in and out of CRAs and mostly relies on variable-length unless reading/writing.

@twhiteaker
Owner

@dblodgett-usgs I added coordinate grid to the CDL polygon screenshot. How does it look?

@bekozi
Collaborator Author

bekozi commented Nov 28, 2016

Added axis labels to the graphs. Is the font size (10) too small? I didn't think the labels were important enough to make them as big as other fonts on the poster.

Small font is fine. There are ways to read it if someone is desperate. Looks like real data now. Thanks! 😄

@twhiteaker
Owner

FYI, Grid labels in CDL polygon screenshot are Arial 24, RGB (110, 110, 110)

@twhiteaker
Owner

Added simple geometry text. I talk about "features" in there, so we may want to harmonize that with whatever you all decide to use for feature/instance/element.

There's a bullet about what multiparts are for. Not sure if this is important enough to get a bullet, but I think it's something the CF-metadata readers were a little confused about.

@twhiteaker
Owner

Added some fake points to the catchment/river screenshot. Soil moisture is from NLDAS.

@twhiteaker
Owner

Since the graphs look like real data now, I added the data source just under the graph title.

@bekozi
Collaborator Author

bekozi commented Nov 28, 2016

👍

@bekozi
Collaborator Author

bekozi commented Nov 29, 2016

Added Bull Creek CDL to the poster. It captures multiple geometries with time-varying data variables. It differs slightly from @dblodgett-usgs's CDL which uses instance identifiers. I think it's okay to have the different approaches. We can use these examples when deciding on draft spec. It's open for editing now of course.

@dblodgett-usgs: Were you planning to add text for the CRA v. VLen? I think you moved your CDL graphic around a bit.

P.S. Does anyone know the CF standard names for streamflow, evapotranspiration, and soil moisture?

@dblodgett-usgs
Collaborator

I think it would be helpful to keep the Bull Creek example 'conceptual' and give a more basic netCDF-3 example on the poster. I've got a list of things to comment about the Bull Creek CDL, but I'm not sure that's worth providing right now since this is a NetCDF-4 VLEN example.

@dblodgett-usgs
Collaborator

I could draft the text for CRA/VLEN, but think one of you might be better. I'm not familiar with VLEN at all since I'm focused on a CF1.* spec addition.

@twhiteaker
Owner

The Bull Creek CDL has upstream node of river segments instead of the three fake soil moisture stations. I would also remove the GNIS_Name and AreaSqKm variables to simplify things. Ah, and then there's Dave's comment above about using a more basic example.

I suggest:

  1. If you're going to include a Bull Creek CDL, make it just for streamflow for river lines. The idea is to keep the example simple enough that folks can grasp it quickly so they can discuss it with us (Dave) while looking at the poster. The Bull Creek example shows how simple geometry can represent features associated with the data variables, which the simple green polygon example on the right doesn't show.
  2. Can we add a part to the green polygon on the right, and add a purple polygon, so that we have multiparts and multiple geometries in the example? I think at least adding a second polygon feature would be useful since there seemed to be confusion on the use of the stop index on the CF list.

@twhiteaker
Owner

I think adding a section in the top left briefly summarizing what we're doing would be a nice lead-in to the story that unfolds naturally from top left to bottom right. Otherwise, the poster doesn't seem to inform the user of what we're doing until bottom middle. This would require some shuffling of the sections around.

@dblodgett-usgs
Collaborator

  • 👍 to only showing one data variable.
  • I think I'm OK showing the geometry as VLEN actually, as long as conventions is CF 2.0!
  • Need to strip out ALL unneeded attributes and make sure the remaining ones align with the draft spec.

@twhiteaker I'd be happy to do multipolygon for the example. I'll make you the shapefile and switch the CDL in a bit. Should we not do hole then?

@twhiteaker
Owner

Either a hole or a multipart is good for demonstrating break values. A second geometry is good for demonstrating the coordinate index stop. Let's start with a hole and a second geometry. If there's room on the poster and time, I might play with adding a second part to the first geometry. I don't think it would make the CDL too complex, but I don't think it's vital either.

@twhiteaker
Owner

Grid for two poly one multi.
[Image: ex_two_poly_multi]

@twhiteaker
Owner

@dblodgett-usgs change -2 to -1 in your data.

@twhiteaker
Owner

Jeez I'm lazy. Ok, here's what I'm suggesting. Also, I took out the hole break value since no holes. And I fixed the coordinates so that they were all anticlockwise...by hand, so hopefully it's right.

dimensions:
	char = 1 ;
	instance = 2 ;
	coordinate_index = 16 ;
	coordinates = 15 ;
variables:
	char instance_name(instance, char) ;
		instance_name:standard_name = "instance_id" ;
	int coordinate_index(coordinate_index) ;
		coordinate_index:long_name = "ragged index for coordinates and geometry break values" ;
		coordinate_index:geom_coordinates = "x y" ;
		coordinate_index:multipart_break_value = -1 ;
		coordinate_index:start_index = 1 ;
		coordinate_index:outer_ring_order = "anticlockwise" ;
		coordinate_index:closure_convention = "last_node_equals_first" ;
		coordinate_index:geom_type = "multipolygon" ;
	int coordinate_index_stop(instance) ;
		coordinate_index_stop:long_name = "index for last coordinate in each instance geometry" ;
		coordinate_index_stop:contiguous_ragged_dimension = "coordinate_index" ;
	double x(coordinates) ;
		x:units = "degrees_east" ;
		x:standard_name = "geometry_x_node" ;
	double y(coordinates) ;
		y:units = "degrees_north" ;
		y:standard_name = "geometry_y_node" ;

// global attributes:
		:Conventions = "CF-1.8" ;
data:

 instance_name =
  "1",
  "2" ;

 coordinate_index = 1, 2, 3, 4, 5, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ;

 coordinate_index_stop = 11, 16 ;

 x = 35, 30, 25, 26, 35, 22, 22, 15, 10, 22, 30, 30, 20, 10, 30 ;

 y = 25, 30, 28, 23, 25, 22, 27, 25, 20, 22, 10, 20, 20, 15, 10 ;
}

@twhiteaker
Owner

After playing with several colors for highlighting, I didn't find any of them harmonious with the rest of the poster. Color might also be confusing since we use color to associate sections of CDL with features in the map in the other CDL example. In the end I just lightened the black a bit.

@dblodgett-usgs
Collaborator

Oops... @twhiteaker - My code is fine, but the WKT I created that gets read in is encoded wrong. The second polygon is encoded as a hole that is outside the first polygon!

@dblodgett-usgs
Collaborator

Updated a few things. Here's some better CDL created by this script: https://github.com/dblodgett-usgs/NCDFSG/tree/master/demo

netcdf demoPoly {
dimensions:
	char = 1 ;
	instance = 2 ;
	coordinate_index = 16 ;
	coordinates = 15 ;
variables:
	char instance_name(instance, char) ;
		instance_name:units = "unknown" ;
		instance_name:standard_name = "instance_id" ;
	int coordinate_index(coordinate_index) ;
		coordinate_index:long_name = "ragged index for coordinates and geometry break values" ;
		coordinate_index:geom_coordinates = "x y" ;
		coordinate_index:geom_dimension = "instance" ;
		coordinate_index:start_index = 1 ;
		coordinate_index:outer_ring_order = "anticlockwise" ;
		coordinate_index:closure_convention = "last_node_equals_first" ;
		coordinate_index:geom_type = "multipolygon" ;
		coordinate_index:multipart_break_value = -1 ;
	int coordinate_index_stop(instance) ;
		coordinate_index_stop:long_name = "index for last coordinate in each instance geometry" ;
		coordinate_index_stop:contiguous_ragged_dimension = "coordinate_index" ;
	double x(coordinates) ;
		x:units = "degrees_east" ;
		x:standard_name = "geometry_x_node" ;
	double y(coordinates) ;
		y:units = "degrees_north" ;
		y:standard_name = "geometry_y_node" ;

// global attributes:
		:Conventions = "CF-1.8" ;
data:

 instance_name =
  "1",
  "2" ;

 coordinate_index = 1, 2, 3, 4, 5, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ;

 coordinate_index_stop = 11, 16 ;

 x = 35, 26, 25, 30, 35, 22, 10, 15, 22, 22, 30, 10, 20, 30, 30 ;

 y = 25, 23, 28, 30, 25, 22, 20, 25, 27, 22, 10, 15, 20, 20, 10 ;
}

demoShape.zip
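
For readers who want to see how arrays like those in the demoPoly CDL could be produced, here is a minimal Python sketch of a contiguous ragged array encoder. It is not the script linked above: `encode_cra` is a hypothetical function, it handles only multipart breaks (no separate hole break value), and it assumes `start_index` applies to both the coordinate index values and the stop values.

```python
def encode_cra(instances, start_index=1, break_value=-1):
    """Encode geometries as contiguous ragged arrays.

    `instances` is a list of geometries, each a list of parts, each part a
    list of (x, y) tuples with the last node equal to the first.
    Returns (coordinate_index, coordinate_index_stop, x, y).
    """
    coordinate_index, stops, x, y = [], [], [], []
    next_coord = start_index
    for parts in instances:
        for part_number, ring in enumerate(parts):
            if part_number > 0:
                # Separate the parts of one instance with the break value.
                coordinate_index.append(break_value)
            for px, py in ring:
                x.append(px)
                y.append(py)
                coordinate_index.append(next_coord)
                next_coord += 1
        # Position (in the ragged dimension) of this instance's last entry.
        stops.append(len(coordinate_index) - 1 + start_index)
    return coordinate_index, stops, x, y


# Coordinates read off the x and y arrays of the demoPoly CDL.
demo = [
    [  # instance "1": a multipolygon with two parts
        [(35, 25), (26, 23), (25, 28), (30, 30), (35, 25)],
        [(22, 22), (10, 20), (15, 25), (22, 27), (22, 22)],
    ],
    [  # instance "2": a single polygon
        [(30, 10), (10, 15), (20, 20), (30, 20), (30, 10)],
    ],
]
coordinate_index, stops, x, y = encode_cra(demo)
```

With these inputs the sketch gives the same `coordinate_index`, `coordinate_index_stop`, `x`, and `y` values as the CDL data section.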

@dblodgett-usgs
Collaborator

Oh, and 👍 to the grey highlighting!

@dblodgett-usgs dblodgett-usgs modified the milestone: AGU Dec 2, 2016
@twhiteaker
Owner

Thanks for the new CDL, @dblodgett-usgs. After reading #44 (comment), I wondered about y:standard_name = "geometry_y_node" in our CDL. Standard names seem to be reserved for physical quantities or coordinates, e.g., projection_y_coordinate, whereas geometry_y_node is referring to the role that the variable plays. I'd really like to use y:cf_role = "geometry_y_node", although in #44 the discussion seemed to be leaning toward using cf_role as an instance identifier as it is currently used in CF, with the understanding that CF currently likes only one (but sometimes two) variables with cf_role per file. But couldn't our proposal suggest that we expand cf_role to be used for something other than instance identifier? If not, how about geom_role?

Why is instance_name:units = "unknown" ; there?

All of your rings are clockwise. Either reverse them or use coordinate_index:outer_ring_order = "clockwise" ;. I suggest reversing your rings to play nice with any OGC fans in the poster hall.

@dblodgett-usgs
Collaborator

OK, thanks for the in-depth review here, @twhiteaker.

On the standard name, the geometry_y_node is a physical quantity in a way. x/y imply spatial data. This is akin to the projection_x_coordinate, so would it be geometry_x_coordinates? I don't think we need a cf_role for the coordinate data if we can find a suitable standard name.

My code to add instance data doesn't handle units yet. It just takes a table (DBF-like) and dumps it in NetCDF. We can remove that for the poster.

I also hadn't worked on ring order yet. TBH, I don't even know how to programmatically check if they are clockwise or counterclockwise. I have just been naively dumping them in NetCDF files and reading them back out. I'll look into this before next week. dblodgett-usgs/NCDFSG#4
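
A common way to check ring order is the sign of the shoelace (signed area) formula. The Python sketch below is one possible approach, not the fix that went into NCDFSG. The function names are hypothetical, and it treats the coordinates as planar with y increasing upward, which is an approximation for longitude/latitude. It also assumes each ring is closed, as in the last_node_equals_first convention.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring of (x, y) tuples.

    Positive for anticlockwise rings, negative for clockwise rings.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_anticlockwise(ring):
    return signed_area(ring) > 0


def as_anticlockwise(ring):
    """Return the ring with anticlockwise node order, reversing it if needed."""
    return ring if is_anticlockwise(ring) else ring[::-1]


# First ring of the demoPoly CDL posted earlier in this thread.
ring = [(35, 25), (26, 23), (25, 28), (30, 30), (35, 25)]
area = signed_area(ring)
fixed = as_anticlockwise(ring)
```

For this ring the signed area is negative, so it is clockwise, and reversing it gives the same node order as the hand-corrected coordinates posted before the demoPoly CDL.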

@twhiteaker
Owner

To me, geometry_x_coordinates implies something about the role the variable is playing, whereas projection_x_coordinate is ignorant of its role with respect to any feature geometries. CF would establish the role for a projection_x_coordinate variable named lon by using data_var:coordinates = "lat lon", which is analogous to coordinate_index:geom_coordinates = "x y". Ah, and there would also be a lon:axis = "X" attribute. So I don't think we need to use cf_role after all. I think the existing CF constructs and standard names can work to define geometry x node using our proposed coordinate_index:geom_coordinates = "x y" and:

double x(coordinates) ;
		x:units = "degrees_east" ;
		x:standard_name = "longitude" ; // or projection_x_coordinate
		x:axis = "X" ;

@dblodgett-usgs
Collaborator

dblodgett-usgs commented Dec 6, 2016

Got my ring order issue fixed. I regenerated the CDL and updated the poster.

@bekozi
Collaborator Author

bekozi commented Dec 6, 2016

No major additions on my end! Have at it.

@bekozi
Collaborator Author

bekozi commented Dec 7, 2016

A note: I don't think we should use axis on geometry coordinates. The VLEN example has multiple geometry coordinate variables. I don't believe multiple axis attributes are acceptable.

@dblodgett-usgs
Collaborator

My interpretation of axis is that it's a declaration of the axis type to be interpreted when you look at the 'coordinates' of a given data variable. You may be right though.

@dblodgett-usgs
Collaborator

From @bekozi

@dblodgett-usgs / @twhiteaker: Think we need a poster reorg? The flow does not tell a great story. I'm 50/50. All the content is there and is definitely sufficient for talking points. One reorg option:

  • Justification & Motivation moved to left column.
  • Storing geometries in NetCDF moved to middle.
  • Roadmap moved to right.
  • Contiguous... and Proposed... moved to central column.
  • Hydrologic Use Case... moved to right.

Gets a 👍 from me.

@dblodgett-usgs
Collaborator

I just realized I need to print this tomorrow morning! YIPES!!! I really don't want to work tonight if I can help it. Any chance you guys can help finalize?

@twhiteaker
Owner

I'm tied up today. I might have time tonight, but best chance is tomorrow morning. Can you post back say around 4:30pm Central with what still needs to be done?

@bekozi
Collaborator Author

bekozi commented Dec 8, 2016

Yeah. Let me move stuff around and give it a last read.

@bekozi
Collaborator Author

bekozi commented Dec 8, 2016

My editing and review is complete. Spacing forced swapping Justification... and Storing... from what I proposed above. I think it looks pretty good!

@twhiteaker
Owner

I fixed some typos. I changed standard_name to long_name for evapotranspiration.

I think it's good enough. But here are some other lingering things to ignore unless you are a glutton for last-minute punishment.

Is there a reason why stop_encoding = "cra" isn't in the CRA CDL example?

CRA example uses axis="X" but VLen example uses cf_role. Did we decide to stick with cf_role? It's all running together in my head.

Is catchments_coordinate_index:geom_dimension = "catchments_geom" needed since the coordinate index is dimensioned by catchments_geom? Couldn't software just see that the coordinate index dimension matches the catchments_evapotranspiration second dimension? I'm embarrassed to say that I still haven't mastered VLen even though I've been part of this effort for months. I still live in netCDF 3 world along with my compact disc player and blue jean shorts.

@bekozi
Collaborator Author

bekozi commented Dec 8, 2016

CRA example uses axis="X" but VLen example uses cf_role. Did we decide to stick with cf_role? It's all running together in my head.

I think we kind of decided to suffer a bit of disarray. I expect @dblodgett-usgs will speak to the six ways to Sunday nature of these schemas.

Is catchments_coordinate_index:geom_dimension = "catchments_geom" needed since the coordinate index is dimensioned by catchments_geom? Couldn't software just see that the coordinate index dimension matches the catchments_evapotranspiration second dimension?

Not really with VLen. However, it is needed if the example were CRA as the coordinate index variable dimension would be longer than the number of geometries/instances. Hence, it is left on there.

@dblodgett-usgs
Collaborator

I'll take a last pass of these two and strip out stuff that needs to be normalized between them. I think it's better to leave questions than inconsistencies.

@dblodgett-usgs
Collaborator

Done and dusted. See here for analytics: https://goo.gl/#analytics/goo.gl/0NI4Sd/all_time

@bekozi
Collaborator Author

bekozi commented Dec 9, 2016

Congrats, all! 😂

As @twhiteaker said, 'twas a pleasure.
