
collaboration between UNECE Rec20 and QUDT #32

Closed
VladimirAlexiev opened this issue Jan 28, 2022 · 22 comments

Comments

@VladimirAlexiev

@dr-shorthair @steveraysteveray @nissimsan @mgh128

Now that we know UNECE are actively engaging in Linked Data (e.g. see #24),
what's the best way for the two initiatives to collaborate?

Here are some thoughts: #24 (comment). Please contribute more ideas or arguments.

@mgh128

mgh128 commented Jan 28, 2022

Would it be helpful if we do the following:

For QUDT, provide a JSON-LD dataset equivalent to http://qudt.org/2.1/vocab/unit (currently in Turtle), using the improved JSON-LD context / framing ideas suggested by @VladimirAlexiev at qudt/qudt-public-repo#386 (comment)

For UN/CEFACT, provide an enhanced version of https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld in which (as a minimum) we add the correct purely numeric conversion multipliers and conversion offsets (perhaps just by extracting those from the equivalent terms in QUDT). Also prepare an equivalent dataset in which each identifier consists of a common Web URI stem, a forward slash, and the UN ECE Rec20 code value, such as 'KGM'.
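A minimal Python sketch of what such an enhanced Rec20 entry might carry. It assumes a hypothetical URI stem (`https://example.org/rec20/`, not a UN/CEFACT address) and assumes the convention that the SI value equals (value + offset) × multiplier; the factors are hand-entered for illustration, not extracted from QUDT.

```python
from dataclasses import dataclass

# Hypothetical common Web URI stem; not a published UN/CEFACT namespace.
REC20_STEM = "https://example.org/rec20/"


@dataclass(frozen=True)
class Rec20Unit:
    code: str            # Rec20 common code, e.g. "KGM"
    name: str
    multiplier: float    # purely numeric factor to the SI unit
    offset: float = 0.0  # additive offset, assumed to apply before the multiplier

    @property
    def uri(self) -> str:
        # Dereferenceable identifier: common stem followed by the Rec20 code
        return REC20_STEM + self.code

    def to_si(self, value: float) -> float:
        # Assumed convention: SI value = (value + offset) * multiplier
        return (value + self.offset) * self.multiplier


UNITS = {
    u.code: u
    for u in (
        Rec20Unit("KGM", "kilogram", 1.0),
        Rec20Unit("GRM", "gram", 0.001),
        Rec20Unit("CEL", "degree Celsius", 1.0, 273.15),
    )
}

print(UNITS["KGM"].uri)  # https://example.org/rec20/KGM
print(UNITS["CEL"].to_si(25.0))
```

Because the multiplier and offset are plain numbers, a consumer never has to parse a conversion expression.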

I think doing this should help QUDT users who would like the QUDT dataset in JSON-LD or JSON, while also making the UN/CEFACT Rec20 Linked Data dataset more usable by software. Potentially the latter could even include cross-reference links to equivalent unit terms within QUDT, so that the mutual recognition is two-way rather than (as currently) only one-way from QUDT to the UN/CEFACT Rec20 code values (though not yet to UN/CEFACT Linked Data Web URIs based on those Rec20 code values).

If these suggestions would be helpful, I'd be happy to contribute to developing the above.

@VladimirAlexiev
Author

Sounds like a reasonable plan, mate!

But what to do about the customary/weird Rec20 units like "military sticks"? Just leave them with little data?

@dr-shorthair

http://qudt.org/2.1/vocab/unit (currently in Turtle)

I believe that version is the 'source of truth' for building the QUDT web site.
Of course TTL is just a serialization of the underlying graph, and JSON-LD is just another one.
And in both cases there are multiple ways that they could be laid out that are logically equivalent.

Is the requirement here that there is a single file available for download, in JSON to keep the webbies happy?
We need to be very clear about how it would be derived from, and synced with, the reference or canonical information.

@steveraysteveray

Indeed, the Turtle files are the master representations.

@nissimsan
Contributor

Admittedly I struggle to (find time to) keep up. But this sounds like a sketch of a Rosetta Stone for units. Hard not to get excited about the prospects.

The only thing I will note is that the starting point of this project is that everything gets done by NDR-based transformations. However, I'm personally very willing to give up that constraint in exchange for a (freakin') Rosetta Stone. Especially considering the isolated and well-defined scope of Rec20. Looking very much forward to seeing your contributions on this...!! :)

@mgh128
Copy link

mgh128 commented Feb 1, 2022

Hi @nissimsan
Not sure I understand what you mean by the abbreviation NDR - is that Network Data Representation or something else?

It sounds as though the existing Turtle files are the master (rather than the master being a database from which the Turtle files are generated by some script).

What I was proposing is that we develop a repeatable procedure / script using tools such as https://github.com/digitalbazaar/jsonld.js/ to convert whatever the current Turtle source file is into the equivalent JSON-LD using a context resource and framing to make the result as JSON-friendly as possible. I understand that it needs to be a procedure (e.g. node script / shell script) that is repeatable whenever the source Turtle file is updated.
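The real conversion would go through a conforming JSON-LD processor such as jsonld.js. Purely to illustrate what a context buys, here is a stdlib-only Python sketch that reads a small subset of N-Triples (one triple per line, IRI subjects and predicates, IRI or plain/typed literal objects), groups triples by subject and swaps full IRIs for short names from a context. The context terms are hypothetical choices, and the regex is not a full N-Triples or Turtle parser.

```python
import json
import re

# One N-Triples statement: IRI subject, IRI predicate, IRI or literal object.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:\^\^<([^>]*)>)?)\s*\.$'
)

# Hypothetical context: short, JSON-friendly names for frequently used IRIs.
CONTEXT = {
    "qudt": "http://qudt.org/schema/qudt/",
    "unit": "http://qudt.org/vocab/unit/",
    "uneceCommonCode": "http://qudt.org/schema/qudt/uneceCommonCode",
    "ucumCode": "http://qudt.org/schema/qudt/ucumCode",
}


def compact_iri(iri: str) -> str:
    # Prefer an exact term, then a prefix match; otherwise keep the full IRI.
    for term, value in CONTEXT.items():
        if iri == value:
            return term
    for prefix, value in CONTEXT.items():
        if value.endswith("/") and iri.startswith(value):
            return f"{prefix}:{iri[len(value):]}"
    return iri


def ntriples_to_jsonld(text: str) -> dict:
    nodes: dict = {}
    for line in text.splitlines():
        m = TRIPLE.match(line.strip())
        if not m:
            continue  # blank lines, comments, or anything outside the subset
        subj, pred, obj_iri, literal, datatype = m.groups()
        node = nodes.setdefault(subj, {"@id": compact_iri(subj)})
        if obj_iri is not None:
            value = {"@id": compact_iri(obj_iri)}
        elif datatype is not None:
            value = {"@value": literal, "@type": compact_iri(datatype)}
        else:
            value = literal
        node.setdefault(compact_iri(pred), []).append(value)
    # Unwrap single values so the result reads like ordinary JSON.
    for node in nodes.values():
        for key in list(node):
            if isinstance(node[key], list) and len(node[key]) == 1:
                node[key] = node[key][0]
    return {"@context": CONTEXT, "@graph": list(nodes.values())}


SAMPLE = (
    '<http://qudt.org/vocab/unit/M-PER-SEC> '
    '<http://qudt.org/schema/qudt/uneceCommonCode> "MTS" .\n'
    '<http://qudt.org/vocab/unit/M-PER-SEC> '
    '<http://qudt.org/schema/qudt/ucumCode> '
    '"m.s-1"^^<http://qudt.org/schema/qudt/UCUMcs> .'
)
print(json.dumps(ntriples_to_jsonld(SAMPLE), indent=2))
```

Rerunning the script whenever the source file changes gives the repeatable procedure described above; only the parser would need replacing with a real one.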

To combine the most helpful aspects of QUDT with the work from UN/CEFACT, I was thinking that it should be possible to construct one or more SPARQL CONSTRUCT / INSERT queries that would extract and generate the additional triples needed, using the result as an enhanced Turtle file that is then also made available as JSON-LD as described above.
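The CONSTRUCT step could look roughly like the following, written as a plain-Python sketch over (subject, predicate, object) tuples rather than SPARQL so that it runs without a triple store. In SPARQL it would match `?unit qudt:uneceCommonCode ?code` and build the target with `IRI(CONCAT(stem, ?code))`. The Rec20 stem is a hypothetical placeholder. Since skos:exactMatch is declared symmetric, the reverse link is entailed anyway; materializing it just makes it visible to consumers that do no reasoning.

```python
# Full IRIs used by the sketch; the Rec20 stem is a hypothetical placeholder.
QUDT_UNECE = "http://qudt.org/schema/qudt/uneceCommonCode"
SKOS_EXACT = "http://www.w3.org/2004/02/skos/core#exactMatch"
REC20_STEM = "https://example.org/rec20/"


def construct_exact_matches(triples):
    """Plain-Python stand-in for a SPARQL CONSTRUCT:
    for every ?unit qudt:uneceCommonCode ?code, emit skos:exactMatch links
    between ?unit and IRI(CONCAT(stem, ?code)), in both directions."""
    out = set()
    for s, p, o in triples:
        if p == QUDT_UNECE:
            rec20_iri = REC20_STEM + o.strip()
            out.add((s, SKOS_EXACT, rec20_iri))  # QUDT -> Rec20
            out.add((rec20_iri, SKOS_EXACT, s))  # Rec20 -> QUDT
    return out


source = [
    ("http://qudt.org/vocab/unit/M-PER-SEC", QUDT_UNECE, "MTS"),
    ("http://qudt.org/vocab/unit/M-PER-SEC",
     "http://qudt.org/schema/qudt/ucumCode", "m.s-1"),
]
for triple in sorted(construct_exact_matches(source)):
    print(triple)
```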

@VladimirAlexiev
Author

VladimirAlexiev commented Feb 1, 2022

NDR is Naming and Design Rules, I think.

@steveraysteveray
For EPCIS, I used this workflow: https://github.com/gs1/EPCIS/tree/master/Ontology#conversion-to-jsonld.
But as per qudt/qudt-public-repo#386 (comment), your existing "View As.. JSONLD" popup is pretty good, just needs to be turned into a normal response.

@dr-shorthair

The OBO guys have begun a Rosetta Stone service here: https://units-of-measurement.org/
It uses UCUM codes as the keys.
If we could prepare a mapping table from UCUM to UNECE then I believe the CEFACT bridge could be added.

Right @kaiiam ?

@mgh128
Copy link

mgh128 commented Feb 1, 2022

Looking at the Linked Data for https://units-of-measurement.org/m.s-1 and the Linked Data for https://units-of-measurement.org/kW.h there are skos:exactMatch links to equivalent resources at QUDT but I didn't see any links to UN ECE Rec20 code values so far.

QUDT does appear to link to the UCUM code value and UN ECE Rec 20 code value but primarily as strings or typed literals. For example, at http://qudt.org/vocab/unit/M-PER-SEC we can find the following triples:

<http://qudt.org/vocab/unit/M-PER-SEC> <http://qudt.org/schema/qudt/uneceCommonCode> "MTS" .

<http://qudt.org/vocab/unit/M-PER-SEC> <http://qudt.org/schema/qudt/ucumCode> "m.s-1"^^<http://qudt.org/schema/qudt/UCUMcs> .

(That is the UCUM code as a typed literal, but not yet a link to the URI resource https://units-of-measurement.org/m.s-1 )

Having said that, QUDT appears to also have some gaps. For example, at http://qudt.org/vocab/unit/KiloW-HR we can find this triple:
<http://qudt.org/vocab/unit/KiloW-HR> <http://qudt.org/schema/qudt/ucumCode> "kW.h"^^<http://qudt.org/schema/qudt/UCUMcs> .

but we cannot yet find the following triple expressing the corresponding UN ECE Rec 20 code value:

<http://qudt.org/vocab/unit/KiloW-HR> <http://qudt.org/schema/qudt/uneceCommonCode> "KWH" .

So at this stage, I'm not convinced that UCUM is the only starting point for cross-referencing - or even the most appropriate starting point, since I have not yet found actual examples of UCUM pointing to UN ECE Rec20 code values.

We can write SPARQL queries to try to match UCUM resources for units with QUDT resources for units - and since QUDT appears (based on a very small sample above) to have greater coverage of links to Rec20 code values, I think we can populate a third column with the Rec20 code values and see where we have gaps. Filling in those gaps provides useful feedback to improve QUDT.

From that third column, we can find terms within https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld whose rdf:value property matches the Rec20 code in the third column, thus populating a fourth column with the current UN/CEFACT unit resource URIs.
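The four-column table could be assembled with two lookups, sketched below in Python with a few hand-typed rows standing in for real query results: UCUM code to QUDT unit (via qudt:ucumCode), QUDT unit to Rec20 code (via qudt:uneceCommonCode), and Rec20 code to UN/CEFACT resource (via rdf:value). The inputs only mirror the examples above, and the UN/CEFACT IRI is a placeholder, not the real identifier. Missing links show up as None, which makes the gaps easy to list.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    ucum: str                 # column 1: UCUM code
    qudt: Optional[str]       # column 2: QUDT unit IRI
    rec20: Optional[str]      # column 3: Rec20 code from qudt:uneceCommonCode
    unece_iri: Optional[str]  # column 4: UN/CEFACT resource whose rdf:value is the code


def build_table(ucum_codes, qudt_ucum, qudt_unece, unece_by_value):
    """qudt_ucum: {qudt_iri: ucum_code}; qudt_unece: {qudt_iri: rec20_code};
    unece_by_value: {rec20_code: unece_iri}. Missing links become None."""
    # Invert QUDT's ucumCode annotations (if two units share a code, the last wins).
    qudt_by_ucum = {code: iri for iri, code in qudt_ucum.items()}
    rows = []
    for ucum in ucum_codes:
        qudt = qudt_by_ucum.get(ucum)
        rec20 = qudt_unece.get(qudt) if qudt else None
        unece_iri = unece_by_value.get(rec20) if rec20 else None
        rows.append(Row(ucum, qudt, rec20, unece_iri))
    return rows


def qudt_gaps(rows):
    """QUDT units that have no Rec20 code yet: candidates for an issue or PR."""
    return [r for r in rows if r.qudt and not r.rec20]


# Illustrative inputs; the UN/CEFACT IRI is a hypothetical placeholder.
rows = build_table(
    ["m.s-1", "kW.h"],
    {"http://qudt.org/vocab/unit/M-PER-SEC": "m.s-1",
     "http://qudt.org/vocab/unit/KiloW-HR": "kW.h"},
    {"http://qudt.org/vocab/unit/M-PER-SEC": "MTS"},
    {"MTS": "https://example.org/unece/MTS"},
)
for row in rows:
    print(row)
print("gaps:", [r.ucum for r in qudt_gaps(rows)])
```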

I hope that doing this preparatory work of a four-column mapping table might even motivate our colleagues at UN/CEFACT to consider creating equivalent resources formed from a common Web URI stem appended with the Rec20 code value.

@dr-shorthair

The range of skos:exactMatch is skos:Concept, so there needs to be a URI at least.

Yes, QUDT does already have some (many? all?) of the UNECE codes as annotations, so those codes can be traced starting from https://units-of-measurement.org but I suspect we could do better. Mind you, the ambitions of https://units-of-measurement.org are modest - just to expose mappings between these URI sets. After a fairly thorough evaluation, it was decided that UCUM was the best option for the keys.

not yet found actual examples of UCUM pointing to UN ECE Rec20 code values.

UCUM would not do this. As far as Regenstrief and LOINC are concerned, UCUM is self-contained and fit for their purpose. It is widely deployed in medical and health contexts. I have not evaluated NDR so have no opinion on whether this has the same scope and could be an alternative. But UCUM is very very good already.

Note that if you find holes in the QUDT annotations, just raise an issue at https://github.com/qudt/qudt-public-repo/issues or, better still, submit a PR at https://github.com/qudt/qudt-public-repo/pulls. They tend to get processed very quickly, and the whole QUDT service gets rebuilt every month or thereabouts.

@mgh128

mgh128 commented Feb 2, 2022

I've now done some initial experiments using SPARQL CONSTRUCT - please see https://github.com/mgh128/UnitUnity

@kaiiam

kaiiam commented Feb 2, 2022

Thanks for the mention @dr-shorthair. Yes one goal of UOM is to use UCUM codes as a Rosetta stone between existing systems.

@mgh128 UOM currently doesn't support UN ECE Recommendation 20 unit codes, but I'm happy to include such mappings, either as codes or as mappings to the https://service.unece.org/trade/uncefact/vocabulary/rec20/# service (or similar). This would be easy to add to UOM if we had UNECE code mappings to UCUM. If not, I could probably pull from your work to make mappings based on your QUDT/UNECE mappings.

UOM is certainly not trying to step on QUDT's toes in any way. The main difference between the systems is that UOM allows for the automated creation of new units on the fly using UCUM codes, which resolve to our PURL server immediately, whereas with QUDT, creating terms is a manual process (like most ontologies). Both approaches have their advantages and disadvantages. I'm personally in favor of all systems existing and doing the work to map outward to each other to get to more robust interoperability. So please continue any efforts to make requests to QUDT.

That being said, let me know, @mgh128 or @dr-shorthair, if there is any interest in creating UCUM/UNECE mappings, and we would certainly add those to UOM. Happy to help with that effort.

Thanks!
Kai

@kaiiam

kaiiam commented Feb 2, 2022

UOM also currently provides TTL and JSON-LD downloads, e.g. https://units-of-measurement.org/m.s-1?format=json-ld for JSON-LD, if that's helpful.

@mgh128

mgh128 commented Feb 2, 2022

@kaiiam - thanks for this - and the willingness to add additional mappings.
I have now found and checked 159 missing mappings from QUDT to UN ECE Rec20 codes and plan to submit a pull request sometime this week to QUDT, as recommended by @dr-shorthair.
You and others are very welcome to use anything in https://github.com/mgh128/UnitUnity to improve cross-references between these systems.

GS1 standards mostly reference UN ECE Rec20 unit codes, but we'd really like those 2-3 character alphanumeric codes to be easy to dereference (simply by appending them to a common Web URI stem and making a Web request) and to provide multiplicative and additive conversion factors without requiring any string parsing. In some situations, GS1 colleagues have identified gaps in the coverage of UN ECE Rec20 codes; although we do formally request assignment of new Rec20 codes to address those gaps, some have begun using UCUM codes as an alternative within GDSN.

Unit conversion is becoming more important for GS1 because the EPCIS 2.0 standard under development supports reporting of business-relevant summary-level sensor data (it is not intended for embedding raw data within EPCIS event data). A situation can arise where the data is captured in one unit of measure but the query is formulated using a different but interconvertible unit of measure, which is why we developed this: https://gs1.github.io/UnitConverterUNECERec20/
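For that query scenario, once each unit carries a numeric multiplier and offset, restating a threshold in the unit the data was captured in needs no string parsing: go to the SI value and back. A minimal Python sketch, assuming the (value + offset) × multiplier convention and hand-entered factors for three Rec20 codes (KEL, CEL, FAH); it is an illustration, not the code behind the GS1 converter.

```python
# (multiplier, offset) to kelvin for three Rec20 codes, assuming the convention
# SI value = (value + offset) * multiplier. Illustrative hand-entered factors.
FACTORS = {
    "KEL": (1.0, 0.0),           # kelvin
    "CEL": (1.0, 273.15),        # degree Celsius
    "FAH": (5.0 / 9.0, 459.67),  # degree Fahrenheit
}


def convert(value: float, from_code: str, to_code: str) -> float:
    """Convert between two units of the same kind via the SI value.
    No dimension check: both codes must measure the same quantity."""
    m_from, o_from = FACTORS[from_code]
    m_to, o_to = FACTORS[to_code]
    si = (value + o_from) * m_from
    return si / m_to - o_to


# A query threshold of 40 degrees F, restated in degrees C to match captured data.
print(round(convert(40.0, "FAH", "CEL"), 3))  # 4.444
```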

@steveraysteveray

I was just about to publish a new release of QUDT, but I'll wait for @mgh128 to submit the PR so that it contains the new mappings...

@mgh128

mgh128 commented Feb 2, 2022

Thanks, @steveraysteveray - I will try to submit it sometime within the next 18 hours.

@kaiiam

kaiiam commented Feb 3, 2022

@mgh128

You and others are very welcome to use anything in https://github.com/mgh128/UnitUnity to improve cross-references between these systems.

👍 will do thank you.

we'd really like those 2-3 character alphanumeric codes to be easy to dereference (simply by appending to a common Web URI stem and making a Web request)

In UOM we solved that problem with UCUM codes by ordering the elements of the UCUM code and then applying a URL escape function to the final UCUM code to make the unique identifier for a term. For example, imperial ton [lton_av] becomes https://w3id.org/uom/%5Blton_av%5D. Perhaps something similar might be useful for the UN ECE Rec20 unit codes? We settled on UCUM, however, as it provided the most coverage of any existing code system.
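The escaping step can be sketched in a few lines of Python with `urllib.parse.quote`. Only the escaping is shown: the element-ordering step is not reproduced, and escaping `/` as well (`safe=""`) is an assumption about UOM's behavior. Rec20 codes such as KGM are already URL-safe, so the same function would simply append them unchanged.

```python
from urllib.parse import quote

UOM_STEM = "https://w3id.org/uom/"


def uom_iri(ucum_code: str) -> str:
    """Append a percent-encoded UCUM code to the UOM stem.
    safe="" means "/" is escaped too; whether UOM does this is an assumption.
    The element-ordering step is not reproduced here."""
    return UOM_STEM + quote(ucum_code, safe="")


print(uom_iri("[lton_av]"))  # https://w3id.org/uom/%5Blton_av%5D
print(uom_iri("m.s-1"))      # https://w3id.org/uom/m.s-1
```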

@steveraysteveray

UOM plans to provide SSSOM-formatted mappings between UOM terms (derived from UCUM) and all the resources we map to. As I'm sure you know, @dr-shorthair has worked extensively on those (some of which are already in QUDT), but we are working to add more. Happy to share those with the QUDT team once created and curated.
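For anyone preparing mappings in that shape, here is a rough Python sketch of writing an SSSOM/TSV file: '#'-prefixed YAML metadata followed by a tab-separated table. The column names (subject_id, predicate_id, object_id, mapping_justification) and the semapv:ManualMappingCuration justification follow the SSSOM specification as I understand it, so check against the current spec before publishing. The mapping set ID is a hypothetical placeholder, and the license is whatever the publisher chooses.

```python
import csv
import io

CURIE_MAP = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "semapv": "https://w3id.org/semapv/vocab/",
    "uom": "https://w3id.org/uom/",
    "unit": "http://qudt.org/vocab/unit/",
}
COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def to_sssom_tsv(mappings, mapping_set_id, license_iri):
    """mappings: iterable of (subject_curie, object_curie) exact matches."""
    out = io.StringIO()
    # SSSOM/TSV embeds set-level metadata as '#'-prefixed YAML lines.
    out.write(f"#mapping_set_id: {mapping_set_id}\n")
    out.write(f"#license: {license_iri}\n")
    out.write("#curie_map:\n")
    for prefix, iri in CURIE_MAP.items():
        out.write(f"#  {prefix}: {iri}\n")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for subj, obj in mappings:
        writer.writerow([subj, "skos:exactMatch", obj, "semapv:ManualMappingCuration"])
    return out.getvalue()


print(to_sssom_tsv(
    [("uom:m.s-1", "unit:M-PER-SEC")],
    "https://example.org/uom-qudt.sssom.tsv",  # hypothetical mapping set ID
    "https://creativecommons.org/publicdomain/zero/1.0/",
))
```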

@stuchalk is engaged in similar efforts with UMIS and I think it would be ideal if we can all collaborate on creating/curating mappings between the various systems.

@steveraysteveray

Just to let you know that Release 16 of QUDT has now been published, containing all the new UN ECE codes submitted by @mgh128. Also, the EDG interface and the SPARQL endpoint reflect the new release, in addition to the good old web pages.

@nissimsan
Contributor

Amazing - you guys move fast!
Thank you!!

@nissimsan
Contributor

I'll close this ticket. Again: awesome - we're proud that this link has been established! :)

@nissimsan
Contributor

(Noting that there is still some work to be done on our end: #24)

@mgh128

mgh128 commented Apr 1, 2022

We're also very happy about the collaboration. Let us know if you'd like any help with (or review of) your team's work.

6 participants