New Project proposal: Dicing or partitioning Ontology for RDF Data Cubes #1068

Open
6a6d74 opened this issue Aug 8, 2018 · 11 comments

@6a6d74
Contributor

6a6d74 commented Aug 8, 2018

The RDF Data Cube specification supports 'slicing' across one or more dimensions, thereby reducing the dimensionality of the data cube. When the RDF ontology was originally proposed, the UN SDMX statisticians could not agree on a vocabulary for further sub-setting or summarizing.

At the OGC TC in March 2018, there was recognition that there was a commonality underlying many proposed big data cubes, geospatial data cubes, map tiles, vector tile sets, data partitions, result paging, etc.

With the OGC enthusiasm for the newer, more flexible, less schematic, more RESTful Web Feature Service V3.0, there seems to be a push to review the entities that appear in various web services and to generalize them for use across a variety of services and APIs.

It appears to me that there are some very common patterns in data partitioning that could be re-used, especially if the concepts and terminology were refined; e.g. along one dimension, partition according to:

  • item count (give me the first 10 000 values, then the next 10 000, ...) or
  • measure along the dimension (give me everything between 0.0 and 45.0, then 45.0 to 90.0, ...) or
  • data volume (give me the first 10MB of data values, then the next 10MB, ... and by the way, tell me the index value of the dimension boundaries)

These patterns could be applied in 1D (timeseries), 2D (map tiles), 3D (Cesium), or more.
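
As a rough illustration of the three patterns along a single dimension, here is a minimal Python sketch. The function names, the `(coordinate, value)` tuple layout, and the fixed per-observation byte size are all assumptions made for the example; none of them comes from the RDF Data Cube specification or from any existing service.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical layout: (coordinate along the dimension, measured value)
Obs = Tuple[float, float]


def by_item_count(obs: Iterable[Obs], n: int) -> Iterator[List[Obs]]:
    """Yield consecutive chunks of at most n observations."""
    it = iter(obs)
    while chunk := list(islice(it, n)):
        yield chunk


def by_measure(obs: Iterable[Obs], start: float, width: float) -> Dict[int, List[Obs]]:
    """Group observations into half-open intervals
    [start + k*width, start + (k+1)*width), keyed by k."""
    bins: Dict[int, List[Obs]] = {}
    for coord, value in obs:
        k = int((coord - start) // width)
        bins.setdefault(k, []).append((coord, value))
    return bins


def by_volume(
    obs: Iterable[Obs], max_bytes: int, bytes_per_obs: int = 16
) -> Iterator[Tuple[float, float, List[Obs]]]:
    """Yield (first_coord, last_coord, chunk), each chunk holding at most
    max_bytes of data. bytes_per_obs is an assumed fixed encoding size."""
    per_chunk = max(1, max_bytes // bytes_per_obs)
    for chunk in by_item_count(obs, per_chunk):
        yield chunk[0][0], chunk[-1][0], chunk
```

Note that only the volume-based variant has to report the dimension boundaries back to the client, since the other two are determined by the request itself.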

Chris Little sees this as being complementary and orthogonal to the QB4ST work.

Rob Atkinson said he previously had played with URL templates referencing QB components...

This proposal would support arbitrary service interfaces. The W3C DXWG work on profile descriptions might be a pathway to classifying such services. Documenting such services and their various subset relationships is important but not well supported. Some of the DCAT work may help, though it will probably just recommend using an external vocabulary, so leveraging that to justify this new work makes sense.

QB metadata for subsets once transferred is another concern, but would be a use case for the same vocabulary.

@6a6d74
Contributor Author

6a6d74 commented Aug 8, 2018

Request to SDW IG participants:

Please identify if you support this Project proposal - either as is or with amendments, and indicate whether you are keen/able to contribute effort.

Pending responses from the IG, this proposal may be promoted to a SDW IG Project.

@6a6d74
Contributor Author

6a6d74 commented Aug 8, 2018

Chris Little and Rob Atkinson have already noted their support. Will you join them?

@6a6d74 6a6d74 added this to To do in Proposals Aug 8, 2018
@6a6d74 6a6d74 added the proposal label Aug 8, 2018
@6a6d74
Contributor Author

6a6d74 commented Aug 8, 2018

Chris Little notes:

  1. The initial project document gave some simple examples. Tilesets/Levels of Detail give a different kind of instantaneous partitioning of a data cube.
  2. Assume that this will become, initially, a W3C effort, rather than Joint OGC-W3C, as not restricted to geospatial data.
  3. Who else is interested in taking this forward?

@rob-metalinkage
Contributor

Note that another effort looked at the general case - https://github.com/lorenae/qb4olap/wiki

This is not in the W3C canon.

One option is to worry about the spatio-temporal functions, consistent with QB4OLAP but defined standalone with an alignment to it (so no direct dependencies).

I think DGGS as a spatial dimension may also be important.

The UNGGIM stats discussion needs to be brought into the frame here.

So clearly, there is scope for fragmented, inconsistent approaches to propagate, unless a BP or enabling specification can emerge.

@lvdbrink
Contributor

lvdbrink commented Aug 9, 2018

So clearly, there is scope for fragmented, inconsistent approaches to propagate, unless a BP or enabling specification can emerge

Would this fit into the Statistics on the Web BP then? @BillSwirrl

@rob-metalinkage
Contributor

I think there are two aspects -

  1. describing statistics
  2. describing service interfaces according to the operations they perform (and on what dimensions typically)

I don't think the latter is likely to surface strongly in the statistics on the web BP - maybe I'm wrong - but it's a general issue around designing OGC services to be more web friendly - can we describe what they do?

These issues kind of come together when we start to look at distributions of datasets via services (supported explicitly by current revisions of DCAT) - and also how datasets and distributions relate (is one a slice of another?)

I think it's solvable with a BP approach, but we need to first test some ideas and establish the BP :-)

@chris-little
Contributor

Thanks for these comments and links, rob-metalinkage and lvdbrink.
I notice that the OLAP proposal does not seem to have been touched since 2015?

I am not sure that I understand their use of 'dice'. They seem to use it as an aggregation process over the levels of hierarchical dimensions. This does not fit with my naïve view of partitioning 'observations' along various non-hierarchical dimensions (pure QB) into practical chunks. Their proposal seems to automatically calculate the derived first order statistics for each such chunk, and use such statistics as proxies for the underlying data.

As rob-metalinkage says, there seems to be some orthogonality between calculating statistics and partitioning. The simple partitioning may not correspond to the grouping required for meaningful statistics.
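
To make the distinction concrete, here is a small Python sketch with invented sample data and hypothetical function names. Partitioning hands back the observations themselves in chunks, whereas the kind of aggregation that QB4OLAP appears to describe replaces each group with a derived statistic. This is only my reading of the difference, not code taken from either vocabulary.

```python
from collections import defaultdict
from statistics import mean

# Invented sample observations: (region code, year, value). Not real data.
observations = [
    ("UK-ENG", 2016, 10.0), ("UK-ENG", 2017, 12.0),
    ("UK-SCT", 2016, 4.0), ("UK-SCT", 2017, 6.0),
]


def partition(obs, size):
    """Split into chunks of `size`; every observation survives unchanged."""
    return [obs[i:i + size] for i in range(0, len(obs), size)]


def aggregate_by_year(obs):
    """Roll up over regions: one mean per year replaces the observations."""
    groups = defaultdict(list)
    for _region, year, value in obs:
        groups[year].append(value)
    return {year: mean(values) for year, values in groups.items()}
```

With a chunk size of 3, the first chunk mixes both regions and cuts the second region in half, which is the mismatch between practical chunks and statistically meaningful groups.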

@tidoust
Member

tidoust commented Aug 9, 2018

  1. describing service interfaces according to the operations they perform (and on what dimensions typically)

For reference, I note that for what @6a6d74 presents as "item count", there was some standardization work at W3C on a Linked Data Platform Paging spec, which was shelved for lack of implementors.

@VladimirAlexiev

QB4OLAP is definitely worth investigating and extending if possible. "Not touched since 2015" is not enough reason to disregard it. I have the papers listed at https://github.com/lorenae/qb4olap/wiki/4)-Publications in case someone wants them.

QB4OLAP's hierarchical features need to be reconciled with the following:

  • qb:HierarchicalCodelist, which is a way to graft and cobble a codelist from other codelists
  • qb4st:subdivides, which relates two dimension properties where one represents a lower-level refArea division, and the other higher-level

@6a6d74
Contributor Author

6a6d74 commented Jun 23, 2020

@chris-little ... not much happening here! Do you think that this work / concept can be incorporated into the OGC Data Tiles activity?

@chris-little
Contributor

@6a6d74 Many years ago, as we started the Met-Ocean extensions to OGC Web Coverage Service, I also tried to start work on a Web Coverage Tile Service in OGC. There was resistance at the time, and it went into abeyance. There has been a lot more work since, in OGC Interoperability Experiments and Testbeds, on tiling (2D and 3D), and this is now manifesting as input into the OGC API - Tiles standard.

Also, there is now a conceptual model for multi-dimensional tiling in the OGC pipeline, with a concrete 2D implementation extension, "Core Tiling Conceptual and Logical Models for 2D Euclidean Space". It has been out for public comment and is currently subject to an electronic vote for release as an underlying "Abstract Specification Topic".

There is no extension yet for 3D tessellations or for tiling that involves overlaps and gaps. It is not clear to me that any of the OGC 3D tiling work has a solid underlying conceptual model.

There is still a gap between these (real, actual) space tiling efforts and the idea of tiling/tessellating the abstract spaces of the RDF Data Cube Vocabulary, which would encompass paging/partitioning.
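
One way to picture tiling an abstract cube space is as a product of per-dimension partitions, so that each tile is addressed by one interval index per dimension. The Python sketch below assumes exactly that; the function name, the dimension names, and the boundary values are hypothetical, and it handles neither overlaps and gaps nor points outside the outermost boundaries.

```python
from bisect import bisect_right
from typing import Dict, Sequence, Tuple


def tile_index(
    point: Dict[str, float], edges: Dict[str, Sequence[float]]
) -> Tuple[int, ...]:
    """Return one interval index per dimension, in sorted dimension-name order.

    edges[d] is a sorted list of boundaries for dimension d; index k means
    edges[d][k] <= point[d] < edges[d][k + 1] (half-open intervals).
    """
    return tuple(bisect_right(edges[d], point[d]) - 1 for d in sorted(edges))


# Hypothetical cube with a latitude dimension and a year dimension.
edges = {"lat": [0.0, 45.0, 90.0], "year": [2000, 2010, 2020]}
key = tile_index({"lat": 50.0, "year": 2005}, edges)
```

Paging along a single dimension is then just the special case where `edges` has one entry.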

I do not think that the OGC data tiles activities will bridge the gap to ontologies.

What does @lieberjosh think?


6 participants