Use Tile Matrix Set to describe multiscales #44

Open. Wants to merge 8 commits into base: main.

thomas-maschler

This PR implements the changes discussed in #30 and during the Zarr Sprint on Feb. 8, 2024 (participants: @maxrjones and @thomas-maschler).

It refactors the current multiscales metadata attribute and replaces the current dataset definition with the OGC Two Dimensional Tile Matrix Set standard. This change will allow for more flexibility when defining the layout of multiscales and embrace already existing standards instead of reinventing the wheel.

The Tile Matrix Set standard includes all information currently covered by the dataset definition and includes additional information on chunk layout, pixel size, and origin of the matrix.

@felixcremer

Do you have a link to the OGC Tile Matrix Set standard? I am currently working on https://github.com/JuliaDataCubes/PyramidScheme.jl, a Julia package for generating and working with pyramid datasets, mainly for plotting, and I aim to be compliant with GeoZarr in reading and writing these datasets.

I will have a more in-depth look in the next few days and try to implement this standard in Julia.

Co-authored-by: Felix Cremer <felix.cremer@dlr.de>
@thomas-maschler
Author

Here is the link: https://docs.ogc.org/is/17-083r4/17-083r4.html

@wietzesuijker left a comment

Great! I've added a few suggestions and inline questions.

Within the Tile Matrix Set
* the Tile Matrix identifier for each zoom level MUST be the relative path to the Zarr group which holds the DataArray variable
* zoom levels MUST be provided from lowest to highest resolutions
* the `supportedCRS` attribute of the Tile Matrix Set MUST match the crs information defined under **grid_mapping**.

Do we need the duplication?

* MAY list the min and max rows and columns for each zoom level. If omitted, it is assumed that the entire spatial extent is covered (resulting in higher chunk count of the DataArray).

#### Resampling Method
Resampling Method specifies which resampling method is used for generating multiscales. It MUST be one of the following string values. Resampling method MUST be the same across all zoom levels:

Are the options constrained by xarray or tms? It would be good to link to a source that details the options (e.g. could I use the 20th percentile (though not sure why)).

Author

I took the list from rasterio. But looking at this again, I think this should probably be an implementation detail. It will be enough to say that it must be of type string and the same across all zoom levels.

Is the same syntax in the Tile Matrix Set spec? I searched and couldn't find anything. Is this also a requirement for COGs?

Author

No, resampling is not part of TMS or COGs.
This property was already part of the current specs. It is useful for assuring consistency when progressively adding data to the same zarr store. Otherwise you might end up with overview chunks that were resampled using different methods.

+ ]
+ "multiscales":
- {
- "tile_matrix_set": "https://schemas.opengis.net/tms/2.0/json/examples/tilematrixset/WebMercatorQuad.json",

I like how concise this becomes!
I assume this supports any TMS, e.g. one based on an equal-area projection such as EPSG:6933.

Author

Yes, this should work with most CRS as long as you can reference it via URL (such as EPSG), represent it as WKT, or in the ISO 19115 standard.
Well-known Tile Matrix Sets are listed here: https://schemas.opengis.net/tms/2.0/json/examples/tilematrixset/
But you can always define your own.
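To make "define your own" a little more concrete, here is a minimal sketch of how the tile matrices of a WebMercatorQuad-like set could be generated. The constants (EPSG:3857 extent, 256-pixel tiles) and the field names (`id`, `cellSize`, `pointOfOrigin`, `tileWidth`, `tileHeight`, `matrixWidth`, `matrixHeight`) reflect my reading of the TMS 2.0 JSON encoding and should be treated as assumptions; `build_tile_matrices` is a hypothetical helper, not part of this PR.

```python
import math

# Assumed constants for a WebMercatorQuad-like layout (EPSG:3857, 256-pixel tiles).
EARTH_RADIUS_M = 6378137.0
TILE_SIZE = 256
HALF_EXTENT = math.pi * EARTH_RADIUS_M  # half the width of the projected world, in metres


def build_tile_matrices(max_zoom):
    """Return one tile-matrix dict per zoom level, from 0 (coarsest) to max_zoom."""
    matrices = []
    for zoom in range(max_zoom + 1):
        n_tiles = 2 ** zoom  # tiles per axis doubles with every zoom level
        cell_size = 2 * HALF_EXTENT / (TILE_SIZE * n_tiles)
        matrices.append({
            # In this PR the identifier would double as the relative Zarr group path.
            "id": str(zoom),
            "cellSize": cell_size,
            "pointOfOrigin": [-HALF_EXTENT, HALF_EXTENT],  # top-left corner
            "tileWidth": TILE_SIZE,
            "tileHeight": TILE_SIZE,
            "matrixWidth": n_tiles,
            "matrixHeight": n_tiles,
        })
    return matrices


matrices = build_tile_matrices(2)
```

The list is ordered from lowest to highest resolution, which matches the ordering requirement in the proposed spec text.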

thomas-maschler and others added 2 commits February 27, 2024 11:52
Co-authored-by: Wietze <wietze@space-intelligence.com>
Co-authored-by: Wietze <wietze@space-intelligence.com>

@briannapagan

@thomas-maschler, as discussed in the SWG meeting today, it would be helpful to have an example Zarr store for testing interoperability before approving PRs like this. Can you provide one? A few of us are available for testing.

- }
+}
```
#### Using a URI

I wonder if we should lean towards recommending well-known or explicit identifiers instead of URIs, especially to maintain a 'self-describing' format

@thomas-maschler
Author

> @thomas-maschler discussed in the SWG meeting today, it would be helpful before approving PRs like this if we have an example zarr store to test interoperability before approving - can you provide one? A few of us are available for testing.

@briannapagan, initially I discussed with @maxrjones that he would give it a first try; he was planning to add some extra functionality to ndpyramids. However, if he hasn't managed to find the time for this, I should be able to do it and create some example Zarr stores with different overview layouts/TMS.

@maxrjones

> @thomas-maschler discussed in the SWG meeting today, it would be helpful before approving PRs like this if we have an example zarr store to test interoperability before approving - can you provide one? A few of us are available for testing.
>
> @briannapagan, initially I discussed with @maxrjones that he would give it a first try, he was planning to add some extra functionality to ndpyramids. However, if he didn't manage to find the time for this I should be able to do that and create some example Zarr stores with different overview layouts/ TMS.

My apologies, I haven't found time for this yet.

If implemented, each DataArray MUST define the 'multiscales' metadata attribute which includes the following fields:
* `tile_matrix_set`
* `tile_matrix_set_limits` (optional)
* `resampling_method`
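As a rough illustration of the field list above, the following sketch checks a `multiscales` attribute for the two required fields and for the string type of `resampling_method`. `validate_multiscales` is a hypothetical helper; accepting either a URI string or an inline object for `tile_matrix_set` is an assumption based on the "Using a URI" variant in this PR, and `"nearest"` is only an example value.

```python
REQUIRED_FIELDS = ("tile_matrix_set", "resampling_method")


def validate_multiscales(attr):
    """Return a list of problems found in a 'multiscales' attribute dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in attr:
            problems.append(f"missing required field: {field}")

    # Assumption: the Tile Matrix Set is either referenced by URI or embedded inline.
    tms = attr.get("tile_matrix_set")
    if tms is not None and not isinstance(tms, (str, dict)):
        problems.append("tile_matrix_set must be a URI string or an object")

    # The thread suggests only requiring a string, shared by all zoom levels.
    method = attr.get("resampling_method")
    if method is not None and not isinstance(method, str):
        problems.append("resampling_method must be a string")
    return problems


example = {
    "tile_matrix_set": "https://schemas.opengis.net/tms/2.0/json/examples/tilematrixset/WebMercatorQuad.json",
    "resampling_method": "nearest",  # example value only
}
print(validate_multiscales(example))  # prints []
```

`tile_matrix_set_limits` is optional, so the sketch deliberately does not check for it.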

Why is this called resampling_method?
I think of this as an aggregation method, because the high resolution data is aggregated to coarser resolutions to make visualisation easier.

From my point of view, when data is made less detailed, the process isn't just about combining numbers. Resampling specifically refers to the techniques used to interpolate or approximate pixel values as data is transformed to a different spatial resolution (nearest neighbour, bilinear, cubic). The choice of resampling method can greatly influence the quality and interpretive value of the final imagery. While the word "aggregation" might make you think of just adding or averaging numbers, "resampling" points to a broader set of actions and shows that there's more complexity in working with spatial data (e.g. weighted average of the four nearest pixels).
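A small sketch may help show why the choice matters. It downsamples the same 4x4 grid by a factor of two, once by picking a single source pixel per block (a nearest-neighbour style selection) and once by averaging each block. Both function names are hypothetical, and real tools handle edges, nodata, and pixel alignment, which this sketch ignores.

```python
def downsample_nearest(grid, factor=2):
    """Keep the top-left pixel of every factor x factor block."""
    return [row[::factor] for row in grid[::factor]]


def downsample_average(grid, factor=2):
    """Average every factor x factor block (assumes the shape divides evenly)."""
    out = []
    for r in range(0, len(grid), factor):
        out_row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


grid = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(downsample_nearest(grid))  # [[0, 2], [8, 10]]
print(downsample_average(grid))  # [[2.5, 4.5], [10.5, 12.5]]
```

The two overviews differ for every pixel, which is why mixing methods within one store would give inconsistent overview chunks.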

@briannapagan

We have some folks interested in having a dedicated discussion about this PR and understanding some of its implications. Can @maxrjones @thomas-maschler @felixcremer @wietzesuijker join our next bi-weekly zarr call?

@felixcremer

I most likely won't be able to attend this week's GeoZarr call, since Wednesday is a public holiday in Germany.

I worked on implementing the multiscale functionality in PyramidScheme.jl, and I am more and more convinced that the multiscale specification should be independent of the GeoZarr specification. Building pyramids of a dataset is not restricted to geospatial data but is also used in bioimaging, for example. See the ongoing discussion about multiscales in Zarr in zarr-developers/zarr-specs#125. So I would suggest not defining multiscale images as part of the GeoZarr spec, but rather working on a domain-agnostic multiscale convention and, once that is finished, linking to it from the GeoZarr spec.

As a side note, a recurring source of confusion when implementing TMS for GeoZarr was that the concept of "tiles" is central to the TMS specification. In contrast, the Zarr specs present n-dimensional arrays to the user as a single entity, and the chunking structure is an (important) implementation detail. In practice, when users query a subset of a Zarr array `a`, in every Zarr implementation they would simply write some form of `a[start_index:end_index]`; requests are made at the pixel level and the implementation takes care of looking up the correct chunks. Queries into a TMS, on the other hand, are explicitly by tile: the user requests the tiles for a given bounding box and is left with the overhead of concatenating the results.
In my PyramidScheme.jl implementation it felt weird to mix these two worlds of tile-based and element-based access and I would be interested to see an implementation that puts TMS and Zarr together for some inspiration. Until then I tend to favor the multiscale convention proposal linked above, since it seems more in line with the general zarr interface idea.
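The difference between the two access models can be sketched as follows. With element-based access, a window is a single slice such as `a[200:300, 0:600]`. With tile-based access, the caller first has to work out which tiles intersect that window, fetch each one, and stitch the pieces together. `tiles_for_window` is a hypothetical helper and the 256-pixel tile size is an assumption.

```python
def tiles_for_window(row_start, row_stop, col_start, col_stop,
                     tile_height=256, tile_width=256):
    """Return (tile_row, tile_col) pairs that intersect a half-open pixel window."""
    first_row = row_start // tile_height
    last_row = (row_stop - 1) // tile_height  # stop index is exclusive
    first_col = col_start // tile_width
    last_col = (col_stop - 1) // tile_width
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


# The pixel window a[200:300, 0:600] touches two tile rows and three tile columns.
print(tiles_for_window(200, 300, 0, 600))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

A Zarr implementation performs the equivalent chunk lookup internally, so the user never sees it; in a tile-oriented API this bookkeeping is part of the query.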

@christophenoel

> Building pyramids of a dataset is not restricted to geospatial data but is also used in bioimaging

Hi Felix,

Not being restricted to geospatial data is something pyramids share with many aspects covered by GeoZarr, which aims to reuse existing standards (such as OGC Tile Matrix Set) and to indicate which location/placeholder must be used in the encoding.

However, it is important to note that the pyramid structure is a key aspect for GeoZarr as it aims to offer functions equivalent to alternative formats such as COG within the Zarr format. Additionally, pyramid structures for Earth Observation (EO) data have their own particularities, such as resolution, compared to geospatially agnostic pyramids.

@christophenoel

Feel free to check the playlist below for demonstration of the pyramids encoded in Zarr datasets: https://www.youtube.com/watch?v=NYhh66EstnY&list=PLzPGC4s5HQOPdeLoK1MXK6gEa1x2Az8Dn

@thomas-maschler
Author

@briannapagan is it still worth joining tomorrow's call? I will only be able to join during the second half. If this can wait another two weeks, I might be able to prepare a POC.

@mdsumner

I'm following this, but sadly it's unlikely I can make the call.

@maxrjones

I received a calendar notice that the meeting was moved to next week; unfortunately I am unavailable on May 8th at 11 ET but could join in two weeks.
