collaboration between UNECE Rec20 and QUDT #32
Would it be helpful if we do the following:

- For QUDT, provide a JSON-LD dataset equivalent to http://qudt.org/2.1/vocab/unit (currently in Turtle), using the improved JSON-LD context / framing ideas suggested by @VladimirAlexiev at qudt/qudt-public-repo#386 (comment)
- For UN/CEFACT, provide an enhanced version of https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld in which (as a minimum) we add the correct pure numeric conversion multipliers and conversion offsets (perhaps just by extracting those from the equivalent terms in QUDT).
- Also prepare an equivalent dataset in which the identifiers share a common Web URI stem but terminate in a forward slash followed by the UN ECE Rec20 code value, such as 'KGM'.

I think if we did this, it should help QUDT users who would like the QUDT dataset in JSON-LD or JSON, while also making the UN/CEFACT Rec20 Linked Data dataset more usable by software. Potentially the latter could even include cross-reference links to equivalent unit terms within QUDT, so that the mutual recognition is two-way, rather than (as currently) only one-way from QUDT to the UN/CEFACT Rec20 code values (though not yet to UN/CEFACT Linked Data Web URIs based on those Rec20 code values). If these suggestions would be helpful, I'd be happy to contribute to developing the above. |
Sounds like a reasonable plan, mate! But what to do about the customary/weird Rec20 units like "military sticks"? Just leave them with little data? |
I believe that version is the 'source of truth' for building the QUDT web site. Is the requirement here that there is a single file available for download, in JSON to keep the webbies happy? |
Indeed, the Turtle files are the master representations. |
Admittedly I struggle to (find time to) keep up. But this sounds like a sketch of a Rosetta Stone for units. Hard not to get excited about the prospects. The only thing I will note is that the starting point of this project is that everything gets done by NDR-based transformations. However, I'm personally very willing to give up that constraint in exchange for a (freakin') Rosetta Stone. Especially considering the isolated and well-defined scope of Rec20. Looking very much forward to seeing your contributions on this...!! :) |
Hi @nissimsan, it sounds as though the existing Turtle files are the master (rather than the master being a database from which the Turtle files are generated by some script). What I was proposing is that we develop a repeatable procedure / script, using tools such as https://github.com/digitalbazaar/jsonld.js/, to convert whatever the current Turtle source file is into the equivalent JSON-LD, using a context resource and framing to make the result as JSON-friendly as possible. I understand that it needs to be a procedure (e.g. node script / shell script) that is repeatable whenever the source Turtle file is updated. To combine the most helpful aspects of QUDT with the work from UN/CEFACT, I was thinking that it should be possible to construct one or more SPARQL CONSTRUCT / INSERT queries that would extract and generate the additional triples needed, using the result as an enhanced Turtle file that is then also made available as JSON-LD as described above. |
NDR is Naming and Design Rules, I think. @steveraysteveray |
The OBO guys have begun a Rosetta Stone service here: https://units-of-measurement.org/ Right @kaiiam ? |
Looking at the Linked Data for https://units-of-measurement.org/m.s-1 and the Linked Data for https://units-of-measurement.org/kW.h there are QUDT does appear to link to the UCUM code value and UN ECE Rec 20 code value but primarily as strings or typed literals. For example, at http://qudt.org/vocab/unit/M-PER-SEC we can find the following triples:
( for the UCUM code - but not yet to a URI resource https://units-of-measurement.org/m.s-1 ) Having said that, QUDT appears to also have some gaps. For example, at http://qudt.org/vocab/unit/KiloW-HR we can find this triple: but we can not yet find the following triple to express the corresponding UN ECE Rec 20 code value:
So at this stage, I'm not convinced that UCUM is the only starting point for cross-referencing - or even the most appropriate starting point, since I have not yet found actual examples of UCUM pointing to UN ECE Rec20 code values. We can write SPARQL queries to try to match UCUM resources for units with QUDT resources for units - and since QUDT appears (based on a very small sample above) to have greater coverage of links to Rec20 code values, I think we can populate a third column with the Rec20 code values and see where we have gaps. Filling in those gaps provides useful feedback to improve QUDT. From that third column, we can find terms within https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld whose I hope that doing this preparatory work of a four-column mapping table might even motivate our colleagues at UN/CEFACT to consider creating equivalent resources formed from a common Web URI stem appended with the Rec20 code value. |
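The four-column mapping table described above could be sketched roughly as follows. This is only an illustration, not the actual SPARQL-based procedure: the sample rows are hand-written, the Rec20 code `MTS` for metre per second is supplied here rather than taken from the thread, and `REC20_STEM` is a hypothetical URI stem, since no common stem for Rec20 codes has been agreed.

```python
from typing import Optional
from urllib.parse import quote

# Illustrative sample rows only: (UCUM code, QUDT unit IRI, Rec20 code or None).
# The None mirrors the gap reported in this thread for KiloW-HR.
QUDT_ROWS = [
    ("m.s-1", "http://qudt.org/vocab/unit/M-PER-SEC", "MTS"),
    ("kW.h", "http://qudt.org/vocab/unit/KiloW-HR", None),
]

UOM_STEM = "https://units-of-measurement.org/"
REC20_STEM = "https://example.org/rec20/"  # hypothetical stem, an assumption


def rec20_uri(code: Optional[str]) -> Optional[str]:
    """Append the Rec20 code to the common stem, or return None for a gap."""
    return None if code is None else REC20_STEM + quote(code, safe="")


def build_mapping_table(rows):
    """Return one dict per unit with the four columns of the mapping table."""
    return [
        {
            "ucum": UOM_STEM + quote(ucum, safe=""),
            "qudt": qudt_iri,
            "rec20_code": code,
            "rec20_uri": rec20_uri(code),
        }
        for ucum, qudt_iri, code in rows
    ]


def find_gaps(table):
    """Rows where QUDT carries no Rec20 code yet (candidates for a QUDT PR)."""
    return [row for row in table if row["rec20_code"] is None]


if __name__ == "__main__":
    table = build_mapping_table(QUDT_ROWS)
    for row in find_gaps(table):
        print("Missing Rec20 code for", row["qudt"])
```

In practice the rows would come from a SPARQL query over the QUDT and UOM datasets rather than a hard-coded list; the gap report is the part that feeds back into QUDT.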
The range of Yes, QUDT does already have some (many? all?) of the UNECE codes as annotations, so those codes can be traced starting from https://units-of-measurement.org but I suspect we could do better. Mind you, the ambitions of https://units-of-measurement.org are modest - just to expose mappings between these URI sets. After a fairly thorough evaluation, it was decided that UCUM was the best option for the keys.
UCUM would not do this. As far as Regenstrief and LOINC are concerned, UCUM is self-contained and fit for their purpose. It is widely deployed in medical and health contexts. I have not evaluated NDR, so I have no opinion on whether it has the same scope and could be an alternative. But UCUM is very good already. Note that if you find holes in the QUDT annotations, then just raise an issue at https://github.com/qudt/qudt-public-repo/issues or, better still, submit a PR at https://github.com/qudt/qudt-public-repo/pulls - they tend to get processed very quickly and the whole QUDT service gets rebuilt every month or thereabouts. |
I've now done some initial experiments using SPARQL CONSTRUCT - please see https://github.com/mgh128/UnitUnity |
Thanks for the mention @dr-shorthair. Yes, one goal of UOM is to use UCUM codes as a Rosetta Stone between existing systems. @mgh128 UOM currently doesn't support UN ECE Recommendation 20 unit codes, but I'm happy to include such mappings either as codes or as mappings to the https://service.unece.org/trade/uncefact/vocabulary/rec20/# service (or similar). This would be easy to add to UOM if we had UNECE code mappings to UCUM. If not, I could probably pull from your work to make mappings based on your QUDT/UNECE mappings. UOM is certainly not trying to step on QUDT's toes in any way. The main difference between the systems is that UOM allows for the automated creation of new units on the fly using UCUM codes, which resolve to our purl server immediately, whereas with QUDT creating terms is a manual process (like most ontologies). Both approaches have their advantages and disadvantages. I'm personally in favor of all systems existing and doing the work to map outward to each other to get to more robust interoperability. So please continue any efforts to make requests to QUDT. That being said, let me know @mgh128 or @dr-shorthair if there is any interest in creating UCUM/UNECE mappings, and we would certainly add those to UOM. Happy to help with that effort. Thanks! |
UOM also currently provides ttl and json downloads e.g. https://units-of-measurement.org/m.s-1?format=json-ld for json if that's helpful. |
@kaiiam - thanks for this - and the willingness to add additional mappings. GS1 standards mostly reference UN ECE Rec20 unit codes, but we'd really like those 2-3 character alphanumeric codes to be easy to dereference (simply by appending to a common Web URI stem and making a Web request) and to provide multiplicative and additive conversion factors without requiring any string parsing. In some situations, GS1 colleagues have identified gaps in the coverage of UN ECE Rec20 codes - and though we do formally request assignment of new Rec20 codes to address those gaps, some have begun using UCUM codes as an alternative within GDSN. Unit conversion is becoming more important for GS1, as the EPCIS 2.0 standard under development supports reporting of business-relevant summary-level sensor data (not intended for embedding of raw data within EPCIS event data). The situation can arise where the data is captured in one unit of measure but the query is formulated using a different but interconvertible unit of measure, which is why we developed this ( https://gs1.github.io/UnitConverterUNECERec20/ ) |
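To make the point about pure numeric factors concrete, here is a minimal sketch of conversion between interconvertible units keyed by Rec20 code. It is not the GS1 converter linked above. The convention `si = (value + offset) * multiplier` is an assumption chosen for this example and should be checked against whatever dataset supplies the factors; the table values are illustrative, and `REC20_STEM` is a hypothetical URI stem.

```python
from dataclasses import dataclass

REC20_STEM = "https://example.org/rec20/"  # hypothetical stem, an assumption


@dataclass(frozen=True)
class Unit:
    rec20: str          # UN ECE Rec20 code, e.g. "KGM"
    kind: str           # quantity kind, used only to reject nonsense conversions
    multiplier: float   # pure numeric factor to the SI reference unit
    offset: float = 0.0  # additive offset, applied before the multiplier


# Illustrative values under the convention si = (value + offset) * multiplier.
UNITS = {
    u.rec20: u
    for u in [
        Unit("KEL", "temperature", 1.0),
        Unit("CEL", "temperature", 1.0, 273.15),
        Unit("FAH", "temperature", 5.0 / 9.0, 459.67),
        Unit("KGM", "mass", 1.0),
        Unit("LBR", "mass", 0.45359237),
    ]
}


def unit_uri(code: str) -> str:
    """Dereferenceable identifier: common stem plus the Rec20 code."""
    return REC20_STEM + code


def convert(value: float, from_code: str, to_code: str) -> float:
    """Convert via the SI reference unit using only numeric factors."""
    src, dst = UNITS[from_code], UNITS[to_code]
    if src.kind != dst.kind:
        raise ValueError(f"{from_code} and {to_code} are not interconvertible")
    si_value = (value + src.offset) * src.multiplier
    return si_value / dst.multiplier - dst.offset


if __name__ == "__main__":
    print(unit_uri("KGM"))
    print(convert(100.0, "CEL", "FAH"))
    print(convert(1.0, "LBR", "KGM"))
```

Because both factors are plain numbers, a query expressed in one unit can be rewritten against data captured in another without parsing any conversion string.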
I was just about to publish a new release of QUDT, but I'll wait for @mgh128 to submit the PR so that it contains the new mappings... |
Thanks, @steveraysteveray - I will try to submit it sometime within the next 18 hours. |
👍 will do thank you.
In UOM we solved that problem with UCUM codes by ordering the elements of the UCUM code and then applying a URL escape function to the final UCUM code to make the unique identifier for a term. For example, imperial ton [lton_av] becomes https://w3id.org/uom/%5Blton_av%5D. Perhaps something similar might be useful for the UN ECE Rec20 unit codes? We settled on UCUM, however, as it provided the most coverage of any existing code system. UOM plans to provide SSSOM-formatted mappings between UOM terms (derived from UCUM) and all the resources we map to. As I'm sure you know, @dr-shorthair has worked extensively on those (some of which are already in QUDT), but we are working to add more. Happy to share those with the QUDT team once created and curated. @stuchalk is engaged in similar efforts with UMIS, and I think it would be ideal if we can all collaborate on creating/curating mappings between the various systems. |
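The URL-escaping step can be sketched with the Python standard library as below. This is not UOM's actual code: it covers only the percent-encoding, not the ordering of UCUM code elements mentioned above, and the function names are made up for this example.

```python
from urllib.parse import quote, unquote

UOM_STEM = "https://w3id.org/uom/"


def code_to_uri(code: str, stem: str = UOM_STEM) -> str:
    """Percent-encode every reserved character in the code and append to the stem."""
    # safe="" also escapes "/", which would otherwise read as a path separator.
    return stem + quote(code, safe="")


def uri_to_code(uri: str, stem: str = UOM_STEM) -> str:
    """Recover the original code from an identifier built by code_to_uri."""
    if not uri.startswith(stem):
        raise ValueError(f"{uri} does not start with {stem}")
    return unquote(uri[len(stem):])


if __name__ == "__main__":
    uri = code_to_uri("[lton_av]")
    print(uri)
    print(uri_to_code(uri))
```

Rec20 codes are plain alphanumerics, so the same function would leave a code such as 'KGM' unchanged and simply append it to whatever stem is chosen.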
Just to let you know that Release 16 of QUDT has now been published, containing all the new UN ECE codes submitted by @mgh128. Also, the EDG interface and the SPARQL endpoint reflect the new release, in addition to the good old web pages. |
Amazing - you guys move fast! |
I'll close this ticket. Again: awesome - we're proud that this link has been established! :) |
(Noting that there is still some work to be done on our end: #24) |
We're also very happy about the collaboration. Let us know if you'd like any help with (or review of) your team's work |
@dr-shorthair @steveraysteveray @nissimsan @mgh128
Now that we know UNECE are actively engaging in Linked Data (eg see #24),
what's the best way for the two initiatives to collaborate?
Here are some thoughts #24 (comment), please contribute more ideas or arguments.