conflation of identifier: use class traceability:Identifier #571

Open
VladimirAlexiev opened this issue Sep 19, 2022 · 7 comments
Labels
1.0 version 1.0

Comments

@VladimirAlexiev
Contributor

VladimirAlexiev commented Sep 19, 2022

11 schemas use a URL containing /identifier multiple times:

$ grep -cr '/identifier' . | grep -vE ':(0|1)$'
./common/BindingDataRegistrationCredential.yml:2
./common/CrudeOilProduct.yml:2
./common/EntrySummary.yml:5
./common/ImmediateDelivery.yml:2
./common/Inbond.yml:6
./common/NAISMARecordLeveldentifiers.yml:3
./common/NaturalGasProduct.yml:2
./common/ppq203.yml:2
./common/SteelProduct.yml:2
./common/TransferEvent.yml:2
./common/UsdaSc6.yml:5
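The same check can be reproduced in Python. This is a minimal sketch, not part of the repository, and the function name `files_with_repeated_identifier` is hypothetical. Like `grep -c`, it counts matching lines rather than individual occurrences. Comparing the count numerically avoids a pitfall of the textual filter `grep -vE ':(0|1)'`: unless the pattern is anchored with `$`, it also discards files with counts such as 10 or 12.

```python
from pathlib import Path


def files_with_repeated_identifier(root: str, needle: str = "/identifier",
                                   min_lines: int = 2) -> dict[str, int]:
    """Map each file under `root` to the number of lines containing `needle`
    (like `grep -c`), keeping only files with at least `min_lines` such lines."""
    result = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        count = sum(1 for line in text.splitlines() if needle in line)
        if count >= min_lines:  # numeric test, so counts like 10 are kept
            result[str(path.relative_to(root))] = count
    return result
```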

After #570 is fixed, these schemas would probably use the same schema.org/identifier URL multiple times.
That, in turn, suggests they conflate several different identifiers into one field.
I can see a few cases, e.g.:

  1. NaturalGasProduct.yml
  UWI:
    title: Unique Well Identifier
    description: Unique Well Identifier used for individual well identification.
  HSCode:
    title: HSCode
    description: Defines the Harmonized System Code for the Commodity

This conflates two different objects (the well and the gas extracted from it) into the same RDF prop identifier.

  • That's because both are attached to the same object: NaturalGasProduct.UWI, NaturalGasProduct.HSCode.
  • This is despite having sub-objects where the identifiers can be better attached: NaturalGasProduct.facility.UWI, NaturalGasProduct.product.HSCode
  • However, it's hard to express this modeling construct: "NaturalGasProduct is a class where facility has a field UWI"
  • It's easier to express this with inheritance (another argument for fixing #277, "use inheritance not aggregation")
    • "NaturalGasProduct is a subclass of Product and adds a field HSCode, and retargets field facility to OilAndGasFacility"
    • "OilAndGasFacility is a subclass of Facility (or Place) that adds field UWI"
  2. One of the worst offenders is EntrySummary, which conflates 5 values into the same RDF prop identifier. For example, the identifiers of the entry and of the manufacturer below are conflated:
    "entryNumber": "73461882610",
    "manufacturerId": "2300912",
  3. ImmediateDelivery.yml has a slightly different problem:
    "assignedIdentifier": "12345678",
    "assignedIdentifierType": "CBP",
    "entryNumber": "A123456",
    "lineItems": [
        "itemParty": {
          "assignedIdentifier": "12345678",
          "assignedIdentifierType": "CBP"
  • it attaches assignedIdentifier to two different objects (ImmediateDelivery vs. Party): OK
  • but it conflates entryNumber and assignedIdentifier into the same RDF prop identifier
  • Also, it does not allow multiple assignedIdentifier values. Since assignedIdentifierType accommodates multiple agencies, it should allow multiple identifiers.
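For the NaturalGasProduct case, the inheritance reading can be sketched with Python dataclasses. This is illustrative only. The class names follow the issue text, and the `name` field is a hypothetical placeholder. Python type hints are not enforced at runtime, so "retargeting" `facility` here documents intent rather than validating it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Facility:
    name: str  # hypothetical placeholder field


@dataclass
class OilAndGasFacility(Facility):
    UWI: str  # the Unique Well Identifier describes the well, not the gas


@dataclass
class Product:
    name: str  # hypothetical placeholder field
    facility: Optional[Facility] = None


@dataclass
class NaturalGasProduct(Product):
    # Redeclared with a narrower type to "retarget" facility; the field keeps its position.
    facility: Optional[OilAndGasFacility] = None
    HSCode: Optional[str] = None  # the HS code describes the product


gas = NaturalGasProduct(
    name="natural gas",
    facility=OilAndGasFacility(name="well 1", UWI="123456"),
    HSCode="80123456",
)
```

Each identifier now hangs off the object it actually identifies, so UWI and HSCode no longer compete for one property on NaturalGasProduct.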

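The EntrySummary conflation can be made concrete with a small Python sketch. Assume, hypothetically, that the JSON-LD context maps both `entryNumber` and `manufacturerId` to schema.org/identifier, as expected once #570 is fixed. If you expand two documents whose values are swapped, you get exactly the same set of triples. The graph therefore cannot tell which number belongs to the entry and which to the manufacturer.

```python
SCHEMA_IDENTIFIER = "https://schema.org/identifier"

# Hypothetical context: both JSON keys expand to the same RDF property.
CONTEXT = {
    "entryNumber": SCHEMA_IDENTIFIER,
    "manufacturerId": SCHEMA_IDENTIFIER,
}


def to_triples(subject: str, doc: dict[str, str]) -> set[tuple[str, str, str]]:
    """Expand a flat JSON object into (subject, property, value) triples."""
    return {(subject, CONTEXT[key], value) for key, value in doc.items()}


subject = "urn:example:entry-summary/1"  # hypothetical subject IRI
original = to_triples(subject, {"entryNumber": "73461882610", "manufacturerId": "2300912"})
swapped = to_triples(subject, {"entryNumber": "2300912", "manufacturerId": "73461882610"})
print(original == swapped)  # True: the key names are lost after expansion
```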
The cleanest way to solve all these cases is to use "structured identifiers", i.e. simple records that hold not only the identifier value but also its type ("propertyID").
Don't use specific sub-properties of identifier (they would be redundant with the identifier type).
Using schema.org, this can be expressed in Turtle as follows (prefixes omitted for brevity).
The URLs also reflect a "URL policy" for minting URLs for sub-objects instead of using blank nodes:

<naturalGasProduct/1> a :NaturalGasProduct; :identifier <naturalGasProduct/1/id/1>, <naturalGasProduct/1/id/2>;
   :place <facility/1>.
<naturalGasProduct/1/id/1> a :PropertyValue; :propertyID "HSCode"; :value "80123456".
<naturalGasProduct/1/id/2> a :PropertyValue; :propertyID "GTIN"; :value "56190358290187694".

<facility/1> a :OilAndGasFacility; :identifier <facility/1/id/1>, <facility/1/id/2>.
<facility/1/id/1> a :PropertyValue; :propertyID "UWI"; :value "123456".
<facility/1/id/2> a :PropertyValue; :propertyID "GLN"; :value "56109258249087".
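The URL policy used in the Turtle above can be automated. In this Python sketch, `turtle_for` and `turtle_literal` are hypothetical helpers. Prefixes are still omitted, and only backslashes and double quotes are escaped in literals. The code mints one `<subject/id/N>` IRI per identifier instead of a blank node and reproduces the facility lines above.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def turtle_for(subject: str, rdf_type: str, identifiers: list[tuple[str, str]]) -> str:
    """Emit Turtle for `subject` plus one :PropertyValue node per (propertyID, value),
    minting <subject/id/N> IRIs rather than using blank nodes."""
    id_iris = [f"<{subject}/id/{n}>" for n in range(1, len(identifiers) + 1)]
    if not id_iris:
        return f"<{subject}> a :{rdf_type}."
    lines = [f"<{subject}> a :{rdf_type}; :identifier {', '.join(id_iris)}."]
    for iri, (property_id, value) in zip(id_iris, identifiers):
        lines.append(
            f"{iri} a :PropertyValue; :propertyID {turtle_literal(property_id)}; "
            f":value {turtle_literal(value)}."
        )
    return "\n".join(lines)


print(turtle_for("facility/1", "OilAndGasFacility",
                 [("UWI", "123456"), ("GLN", "56109258249087")]))
```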

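Applied to the ImmediateDelivery fields, the structured-identifier approach might look like the Python sketch below. The function `restructure` is hypothetical and not part of the repository. It replaces the flat identifier fields with a list of schema.org PropertyValue records. Two choices are illustrative assumptions: `"entryNumber"` is used as a propertyID, and `assignedIdentifierType` is reused as the propertyID of `assignedIdentifier`. Because the result is a list, identifiers from further agencies can simply be appended, and the same function could be applied to each `itemParty`.

```python
def restructure(record: dict) -> dict:
    """Return a copy of `record` whose flat identifier fields are replaced by a
    list of schema.org PropertyValue records under "identifier"."""
    out = dict(record)
    identifiers = []
    if "entryNumber" in out:
        # Hypothetical propertyID: the legacy field name itself.
        identifiers.append({"@type": "PropertyValue", "propertyID": "entryNumber",
                            "value": out.pop("entryNumber")})
    if "assignedIdentifier" in out:
        # The agency recorded in assignedIdentifierType becomes the propertyID.
        identifiers.append({"@type": "PropertyValue",
                            "propertyID": out.pop("assignedIdentifierType", "unknown"),
                            "value": out.pop("assignedIdentifier")})
    out["identifier"] = identifiers
    return out


delivery = restructure({
    "assignedIdentifier": "12345678",
    "assignedIdentifierType": "CBP",
    "entryNumber": "A123456",
})
```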
Better still, you could define traceability:Identifier as a subclass of :PropertyValue specialized for expressing structured identifiers.
It could also record extra data such as issuer, date issued, valid until, etc.
adms:Identifier has similar properties, and we used it in the euBusinessGraph ontology.
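A Python mirror of the suggested traceability:Identifier might look like the sketch below. The field names `issuer`, `dateIssued` and `validUntil` are hypothetical renderings of "issuer, date issued, valid until". They are not terms taken from schema.org, ADMS or the traceability vocabulary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PropertyValue:
    """Mirror of schema:PropertyValue, limited to the two fields used above."""
    propertyID: str
    value: str


@dataclass(frozen=True)
class Identifier(PropertyValue):
    """Hypothetical mirror of traceability:Identifier with extra provenance fields."""
    issuer: Optional[str] = None
    dateIssued: Optional[date] = None
    validUntil: Optional[date] = None

    def is_valid_on(self, day: date) -> bool:
        """True if `day` lies within [dateIssued, validUntil]; a missing bound is open."""
        if self.dateIssued is not None and day < self.dateIssued:
            return False
        if self.validUntil is not None and day > self.validUntil:
            return False
        return True


uwi = Identifier("UWI", "123456", issuer="urn:example:issuer",
                 dateIssued=date(2022, 1, 1), validUntil=date(2030, 12, 31))
```

Because Identifier subclasses PropertyValue, code that only understands propertyID and value keeps working.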

This was referenced Sep 19, 2022
VladimirAlexiev changed the title "conflation of idenifier: use class traceability:Identifier" to "conflation of identifier: use class traceability:Identifier"
@TallTed

This comment was marked as resolved.

@brownoxford
Collaborator

Discussed on call, @mkhraisha to review.

@mkhraisha mkhraisha self-assigned this Feb 28, 2023
@mkhraisha
Collaborator

I believe the ask here is to use schema.org/identifier for the identifier that identifies the specific object, and to use https://schema.org/propertyID for other identifiers. For example, in NaturalGasProduct.yml we would have:

  1. identifier for HSCode
  2. propertyID for UWI

@mkhraisha
Collaborator

I haven't gotten to this yet; will work on it soon.

@nissimsan
Collaborator

@mkhraisha, progress on this?

There are also parts of this that I should do, so I'm assigning myself as well.

@VladimirAlexiev
Contributor Author

VladimirAlexiev commented Mar 12, 2024

@mkhraisha

I believe the ask here is use schema.org/identifier for the identifiers used to identify the specific object and to use https://schema.org/propertyID for other identifiers

No!

  • In its usual permissive manner, schema.org allows identifier to be pretty much anything, including an ambiguous string that doesn't describe WHAT it is (the conflation described by this issue leads to this exact problem).
  • propertyID is a way to specify the kind of identifier value.
  • It's not good practice to single out one special identifier kind and use a plain string for it, but a structured Identifier for the others: it's best to always use a structured identifier.
  • I opened #944, "describe all IdentifierSystems relevant for Trade and Logistics", to capture data about identifier kinds: tr:IdentifierSystem.
  • It leaves aside the distinction between product (natural gas) and facility (oil well) for pedagogical reasons.
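The "always use a structured identifier" rule is easy to check mechanically. In this Python sketch, `IDENTIFIER_SYSTEMS` is a hypothetical stand-in for tr:IdentifierSystem entries. It holds only a label per system and is not the actual #944 data. The function `identifier_problems`, also hypothetical, flags plain strings, unknown identifier kinds and missing values.

```python
# Hypothetical stand-in for tr:IdentifierSystem entries (labels only).
IDENTIFIER_SYSTEMS = {
    "GLN": "GS1 Global Location Number",
    "GTIN": "GS1 Global Trade Item Number",
    "HSCode": "Harmonized System code",
    "UWI": "Unique Well Identifier",
}


def identifier_problems(identifiers: list) -> list[str]:
    """Describe each identifier that is not a structured record with a known
    propertyID and a value; an empty list means every identifier passes."""
    problems = []
    for i, ident in enumerate(identifiers):
        if not isinstance(ident, dict):
            problems.append(f"#{i}: plain value {ident!r} does not say what kind of identifier it is")
        elif ident.get("propertyID") not in IDENTIFIER_SYSTEMS:
            problems.append(f"#{i}: unknown identifier system {ident.get('propertyID')!r}")
        elif not ident.get("value"):
            problems.append(f"#{i}: missing value")
    return problems
```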

@mkhraisha mkhraisha added the 1.0 version 1.0 label Jul 11, 2024
@mkhraisha
Collaborator

mkhraisha commented Jul 11, 2024

Will take care of this issue soon.
We should have one person clean up the credentials for their vertical:

use the Structured Value With Prefix system as laid out in #944

5 participants