Absence of mention of units of measure for columns is very surprising #741

Closed
iherman opened this issue Sep 21, 2015 · 4 comments · Fixed by #774

Comments

@iherman
Member

iherman commented Sep 21, 2015

Copy of a comment from Simon Cox on the mailing list, to be handled alongside the others.

I am involved with the Research Data Alliance activity on Data Types and Registries.
The goal of this is to
(i) develop a format/model for the description of the structure of datasets
(ii) allow the descriptions to be registered, so they can be referred to.
Kinda like enhanced MIME types, so that client applications know what's inside a dataset, not just the file format.
A prototype has already been developed by CNRI, with a test deployment.

There is clearly a significant shared concern with CSV on the web, so in preparation for meetings next week I consulted the Candidate Specs, particularly the "Model for Tabular Data and Metadata on the Web". I have not read the full suite of documents in detail, but was surprised to find that 'units of measure' is not mentioned in the set of 'core annotations' for columns http://www.w3.org/TR/tabular-data-model/#columns (in most tables, data in a single column will have a common unit of measure).

I raised this with Jeremy, and he showed me the routes that can be followed: adding a column, or traversing through the QB vocabulary.
However, this is complicated, and not made immediately available or even flagged in the text.
I strongly suggest
(i) at least alerting readers to how this very common requirement can be managed
(ii) better still, consider adding uom as a standard column annotation.

Simon Cox
CSIRO, co-convenor of RDA Data Types activity.

@gkellogg
Member

Just my perspective, but I think the issue is that there is no one standard way of describing units in RDF data. As the basic data model used by CSV on the Web closely corresponds to RDF, the fact that literal values extracted from CSV cells don’t have more dimensions is related to this underlying lack of a data model for describing data with units.

Searching for this indicated a couple of different ways to handle it:

  • Define an OWL datatype which describes the values with units (see http://stackoverflow.com/questions/20248369/units-of-measurement-in-owl-and-rdf)

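    # rdf: and rdfs: are the standard namespaces; unit:, prop: and : are placeholders
    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix unit: <http://example.org/unit#> .
    @prefix prop: <http://example.org/prop#> .
    @prefix :     <http://example.org/material#> .

    # The class of datatypes is rdfs:Datatype (capital D)
    unit:megaPascal rdf:type rdfs:Datatype ;
                    rdfs:label "MPa" .

    unit:Pascal rdf:type rdfs:Datatype ;
                rdfs:label "Pa" .

    :AlMg3 prop:hasTensileStrength "300"^^unit:megaPascal .
    :AlMg3 prop:hasYieldStrength   "2"^^unit:Pascal .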
    

    QUDT (http://www.openphacts.org/specs/units/) also describes similar methods.

    CSVW already supports this by allowing an arbitrary datatype to be identified through the @id property of a datatype description (see http://www.w3.org/TR/tabular-metadata/#datatypes).

    @id If included, @id is a link property that identifies the datatype described by this datatype description. The value of this property becomes the id annotation for the described datatype. It must not start with _: and it must not be the URL of a built-in datatype.

  • Use a structured value to represent the data, for example:

    :AlMg3 prop:hasTensileStrength [ rdf:value 300 ; ex:units unit:megaPascal ] .
    

    This can be supported using the virtual columns feature, which allows additional relationships to be created and columns to be allocated to different values. This might also be useful when the units vary from row to row.

    I think describing a use case for this, and using it as an informative example in one of the documents or in a primer, would be a good way to approach this right now. As common practice emerges, this could be incorporated into a future version of these specs, but this should be done in harmony with a standard way of describing dimensional data in RDF and JSON.
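
A rough sketch of how the two options above might be written as CSVW metadata, expressed as Python that builds the JSON table descriptions. The `example.org` namespaces, the `ex:units` property (`http://example.org/units`), the `strength.csv` file and its `material`/`tensile_strength` columns are hypothetical. The metadata properties used (`datatype` with `@id` and `base`, `aboutUrl`, `propertyUrl`, `valueUrl`, `virtual`, `suppressOutput`, and the `_row` template variable) come from the CSVW metadata vocabulary, but the output should be checked against a CSVW processor before being relied on.

```python
import json

# Hypothetical namespaces for illustration only; a real publication would
# use an established units vocabulary (e.g. QUDT) and its own properties.
UNIT = "http://example.org/unit#"
PROP = "http://example.org/prop#"
EX_UNITS = "http://example.org/units"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
XSD = "http://www.w3.org/2001/XMLSchema#"


def unit_datatype(unit_iri, base="decimal"):
    """Option 1: a datatype description whose @id identifies the unit datatype.

    CSVW says @id must not start with "_:" and must not be the URL of a
    built-in datatype; as a rough approximation, this only rejects "_:"
    and anything in the xsd: namespace.
    """
    if unit_iri.startswith("_:") or unit_iri.startswith(XSD):
        raise ValueError(f"not allowed as a datatype @id: {unit_iri}")
    return {"@id": unit_iri, "base": base}


def datatype_schema(unit_iri):
    """Each tensile_strength cell becomes a literal typed with unit_iri."""
    return {
        "aboutUrl": "#{material}",
        "columns": [
            {"name": "material", "titles": "material", "suppressOutput": True},
            {"name": "tensile_strength", "titles": "tensile_strength",
             "propertyUrl": PROP + "hasTensileStrength",
             "datatype": unit_datatype(unit_iri)},
        ],
    }


def structured_value_schema(unit_iri):
    """Option 2: one node per row carrying rdf:value plus a unit link.

    Virtual columns add the extra triples; CSVW requires them to come
    after all non-virtual columns.
    """
    node = "#strength-{_row}"  # URI template; _row is the row number
    return {
        "columns": [
            {"name": "material", "titles": "material", "suppressOutput": True},
            {"name": "tensile_strength", "titles": "tensile_strength",
             "aboutUrl": node, "propertyUrl": RDF_VALUE,
             "datatype": "decimal"},
            {"name": "strength_link", "virtual": True, "aboutUrl": "#{material}",
             "propertyUrl": PROP + "hasTensileStrength", "valueUrl": node},
            {"name": "strength_unit", "virtual": True, "aboutUrl": node,
             "propertyUrl": EX_UNITS, "valueUrl": unit_iri},
        ],
    }


def table_metadata(url, schema):
    """Wrap a table schema in a minimal CSVW table description."""
    return {"@context": "http://www.w3.org/ns/csvw", "url": url,
            "tableSchema": schema}


if __name__ == "__main__":
    mpa = UNIT + "megaPascal"
    # "strength.csv" is a hypothetical file with a header row
    # material,tensile_strength and rows such as AlMg3,300
    for schema in (datatype_schema(mpa), structured_value_schema(mpa)):
        print(json.dumps(table_metadata("strength.csv", schema), indent=2))
```

The first description keeps the output compact, with the unit folded into each literal's datatype. The second produces a longer schema but gives each value its own node; replacing the fixed `valueUrl` with a template over a unit column would cover the case where units vary by row.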

@6a6d74
Contributor

6a6d74 commented Sep 22, 2015

@dr-shorthair [Simon Cox]

One of my key goals for CSV on the Web was to be able to convey CSV-encoded environmental data as RDF. This means that getting the units of measure explicitly mentioned is an important part of that information.

As Gregg points out, there is not a single convention for expressing units of measure. Gregg describes two, and then there's the RDF Data Cube mechanism of 'attaching' attributes (such as UoM) to the columns of data.
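
A minimal sketch of the Data Cube route, assuming the unit is attached once at the measure-property level through `qb:componentAttachment qb:MeasureProperty` and the SDMX-RDF attribute `sdmx-attribute:unitMeasure`. The Python only assembles Turtle text; `eg:dsd`, `eg:tensileStrength` and the `eg:`/`unit:` namespaces are hypothetical, and mapping CSV rows onto observations is left out.

```python
# qb: and sdmx-attribute: are the published RDF Data Cube and SDMX-RDF
# namespaces; eg: and unit: are placeholder namespaces.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-attribute": "http://purl.org/linked-data/sdmx/2009/attribute#",
    "eg": "http://example.org/ns#",
    "unit": "http://example.org/unit#",
}


def qb_unit_attachment(measure, unit):
    """Return Turtle declaring `measure` with its unit attached to the
    measure property itself, rather than repeated on every observation."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    lines += [
        "",
        "eg:dsd a qb:DataStructureDefinition ;",
        f"    qb:component [ qb:measure {measure} ] ,",
        "        [ qb:attribute sdmx-attribute:unitMeasure ;",
        "          qb:componentAttachment qb:MeasureProperty ] .",
        "",
        f"{measure} a qb:MeasureProperty ;",
        f"    sdmx-attribute:unitMeasure {unit} .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(qb_unit_attachment("eg:tensileStrength", "unit:megaPascal"))
```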

I am satisfied that I can express units of measure in the CSV-metadata ... but agree that it's not entirely straightforward. Your insight into the starting point of a scientist wanting to publish data is useful.

I think that the best way forward here is:
i) to ensure that the Primer that we plan to produce to accompany the Recommendations has a section on "representing scaled values that have units of measure" where we can illustrate each of the 3 mechanisms outlined above.
ii) add a note into the model document ( ref ) indicating that units of measure are not formally part of the tabular data model but that they can be incorporated in a number of ways ... perhaps with a reference to the Primer section. I think the model doc is the best place?

Would that be sufficient to resolve your concerns? (at least in the interim whilst there is no single convention for describing units of measure in RDF)

@dr-shorthair

Thanks Jeremy. Yes, that would probably be enough for the moment. We definitely need to fill the silence in the model document, even if it is just to acknowledge that there is no single solution, and point to the Primer, where some options are illustrated.

@JeniT

JeniT commented Oct 14, 2015

@JeniT to add something in the model document to say that units of measure aren't treated as anything special.
