Adopt OWL duck typing #438

Open
plbt5 opened this issue Aug 9, 2022 · 12 comments
@plbt5
Contributor

plbt5 commented Aug 9, 2022

Background

UCO has long implemented Duck Typing by applying the facet pattern. As indicated by the UCO Design Document, section 5:

[A facet] serves to enable Duck Typing as described in Section 5.1 below and flexible characterization of subclasses of observable objects through combinations of facets.

The facet pattern brings about several drawbacks/problems, and we propose to implement Duck Typing by using standard OWL constructs only.

Problem 1 - inconsistent data

Facets, and particularly their subclassing, allow the following inconsistent construct to emerge:

kb:dns-server-set-1
  uco-core:description "This is the pair of DNS servers I found on this system.  I found 2, needed a place to stash them.  Unsure what ObservableObject subclass to use." ;
  uco-core:hasFacet
    [
      a observable:DigitalAddressFacet ;
      observable:addressValue "1.1.1.1" ;
    ] ,
    [
      a observable:IPAddressFacet ;
      observable:addressValue "8.8.8.8" ;
    ]
    ;
  .

This example shows that the subclass convenience accidentally creates a spot to record inconsistent data.

Problem 2 - strange, if not invalid, implicit commitment to reality

The intended application of the facet pattern is to allow Duck Typing, e.g., a query returns things that have storage capabilities without requiring them to be typed as storage devices:

SELECT ?nThing
WHERE {
  ?nThing core:hasFacet ?nFacet .
  ?nFacet observable:hasStorageCapacityInBytes ?lCapacity .
}

Unfortunately, the absence of an explicit commitment to a type allows for flexibility in a way that can result in weird data, e.g.,

kb:person-1
    a uco-identity:Person ;
    uco-core:hasFacet [
        a uco-observable:StorageDeviceFacet ;
        uco-observable:storageCapacityInBytes 2199023255552 ;
    ] ;
    .

That example is a perfectly UCO-0.9.0-conformant manner of representing a person who has a 2TB hard drive in their pocket. However, as opposed to "a person carrying a device that has that storage capacity", what the triples actually assert is that "the person itself has storage capacity". This represents an invalid state of affairs, or is at least a rather inaccurate representation of the actual state of affairs: a human cannot be considered to be a storage device, or to have storage capabilities.
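To make the effect concrete, the facet query and the person example can be replayed over a small in-memory set of triples. This is a minimal sketch using only the Python standard library. It assumes the property name used in the data (uco-observable:storageCapacityInBytes) rather than the one in the query, and kb:flash-drive-1, the blank-node labels and the function name are hypothetical additions for illustration.

```python
HAS_FACET = "uco-core:hasFacet"
CAPACITY = "uco-observable:storageCapacityInBytes"


def things_with_storage_capacity(triples):
    """Mimic the SPARQL query: a thing has a facet, and that facet has a capacity."""
    facets_with_capacity = {s for (s, p, o) in triples if p == CAPACITY}
    return sorted(
        {s for (s, p, o) in triples if p == HAS_FACET and o in facets_with_capacity}
    )


graph = {
    # The person from the example above.
    ("kb:person-1", "rdf:type", "uco-identity:Person"),
    ("kb:person-1", HAS_FACET, "_:facet-1"),
    ("_:facet-1", "rdf:type", "uco-observable:StorageDeviceFacet"),
    ("_:facet-1", CAPACITY, 2199023255552),
    # A hypothetical untyped device, added for contrast.
    ("kb:flash-drive-1", HAS_FACET, "_:facet-2"),
    ("_:facet-2", "rdf:type", "uco-observable:StorageDeviceFacet"),
    ("_:facet-2", CAPACITY, 64000000000),
}

matches = things_with_storage_capacity(graph)
print(matches)
```

In this sketch the query returns the person alongside the flash drive: nothing in the pattern distinguishes "is a storage device" from "carries one".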

One could argue that stakeholders won't construct such weird semantics. However, a community member has said they would happily assign a location:LatLongCoordinatesFacet to an observable:RasterPicture (a subclass of observable:File) if that picture file was a JPEG with lat/long coordinates embedded in its EXIF.

Problem 3 - absence of explicit commitment

Despite the requirement to not enforce strict data typing, the facet defines a set of characteristics in order to represent something. This implies that each and every facet, by token of its specified characteristics, represents a certain typology implicitly: although the facet does not name the type of the typology, the typology de-facto applies.

Following the above SPARQL example, the fact that no name is attached to the category does not prevent us from concluding that the laptop (a computer) and the iPod (a music player) are devices similar to the hard disk, viz. storage devices. The question at the heart of the issue is: do we commit to the conclusion that the laptop, the iPod and the hard disk are instances of one type of thing?

Yes: commit to one type of thing

If we answer the question in the affirmative, then we accept that the facet commits to the existence of a particular type of thing that gathers computers, music players and storage devices, e.g., devices that allow storage of data. Since we commit to it, there is no reason not to attach a name to the type, e.g., ex:DeviceAllowingDataStorage. Consequently, the following three statements are considered valid:

   kb:MyFirstIPod  a ex:DeviceAllowingDataStorage .
   kb:MyNewLaptop  a ex:DeviceAllowingDataStorage .
   kb:MyFlashDrive a ex:DeviceAllowingDataStorage .

which implies that we have successfully characterised a type of thing by means of its characteristics: indeed a proper implementation of Duck typing.

No: these are different types of thing

If we answer the question in the negative, then the characterisation of the facet can be considered incomplete or otherwise invalid. We either have to add more characteristics to differentiate between the distinct types of thing, or we have mistakenly conflated the semantics of is_a with those of has_a. In any case, we have not implemented duck typing correctly.

When we combine both answers, the conclusion is that facets either do NOT implement duck typing, or do implement duck typing properly but in an OWL-unfamiliar way. Considering that the intention of facets was to make the distinction based on characteristics, i.e., duck typing, I'm inclined to acknowledge the objective of the facet, viz. that duck typing is a necessary capability to support, but to consider the design pattern used to implement it incorrect due to the absence of an explicit commitment to the existence of the type.

Problem 4 - OWL-unfamiliar approach

Another consequence of applying facets is that this is not how the Semantic Web, i.e., the OWL language, has been designed to work. OWL does not recognise the facet design pattern as such and draws no conclusions from it out of the box: no code exists to process this design pattern. Consequently, none of the OWL-compliant tools will be able to process this design pattern and show the intended behaviour. Every stakeholder who requires the intended behaviour must implement it alongside the OWL technology.

This characteristic might be acceptable for a local solution to a local problem, but for a worldwide standard it is odd. Moreover, it is very problematic, since it forces local additions to the technology, additions that might even be stakeholder dependent.

Problem 5 - Undefined relationship with core:UcoObject

Although "A facet is a grouping of characteristics unique to a particular aspect of an object" (Definition of core:Facet), no definition exists of the relationship that applies between the facet and the object. Two related problems arise from leaving this relation undefined:

  1. The facet is incomplete: the relationship exists by virtue of the facet's definition, yet the current design prevents committing to the relation and to what it characterises.
  2. No constraints can be defined on the relation, which implies that anything can be asserted about it and inferred from it, leading to invalid semantics.

Requirements

The requirements for Duck typing have been specified already in the UCO Design Document, section 5.1, as three separate requirements.

Requirement 1

CASE uses duck typing which allows data to be defined by its inherent characteristics rather than enforcing strict data typing.

Requirement 2

CASE objects can be assigned any rational combination of facets, such as a file that is an image and a thumbnail. When employing this approach, data types are evaluated with the duck test, allowing data to be represented more truly without imposing a rigid class structure. (...)

Requirement 3

For certain common combinations of facets, it is possible to assign them a higher-level class, such a PDF File or WhatsApp Message.

Risk / Benefit analysis

Benefits

Replacing the facet pattern with an OWL-familiar Duck Typing capability removes the need for stakeholders to provide additional code to support the intended behaviour of the facet (viz. Duck Typing), allows for consistency in the permitted data, and creates the ability to commit explicitly to the inferred typology.

The facet pattern has been used since the start of the development of UCO, and has raised questions and caused confusion ever since. Replacing it with an OWL-supported pattern will clarify how Duck Typing can be applied within UCO. We therefore recommend implementing this CP in version 1.0.0, to consolidate this clear, supported and simple form of Duck Typing rather than suggesting that the facet pattern is a necessary pattern for the UCO standard.

Risks

This CP can be considered a significant overhaul of the UCO design with the risk that community members might decide to turn away from the UCO initiative, given the effort required to implement the change.

Competencies demonstrated

Competency 1

Duck typing: When something has one or more properties, infer that it belongs to the category identified by those properties, e.g., assume that everything that can store data is a storage device.

Competency Question 1.1

Provided the following data:

kb:object-1 ex:hasStorageCapacityInBytes 2000000000000 .

What is the type of thing the individual kb:object-1 represents? In SPARQL:

SELECT ?t
WHERE {
  kb:object-1 a ?t .
}

Result 1.1

The following triple shall be inferred:

kb:object-1 a ex:StorageDevice .

Competency 2

In terms of the UCO DD:

If particular characterizing properties are directly relevant to the object across most use cases they are typically defined as properties directly on the class/object but where they may be characterizing a particular aspect of an object relevant to some but not all use cases they are typically defined as a facet that can be applied to the class/object when appropriate.

Competency Question 2.1

Provided the following two sets of data on the same individual:

kb:raster-picture-f970b1a2-c6f1-4082-a2fb-3e8f4a7913b2 
   observable:pictureType   "jpg" ;
   observable:pictureHeight 12345 ;
   observable:pictureWidth  12345 ;
   observable:bitsPerPixel  2 ;
.
kb:raster-picture-f970b1a2-c6f1-4082-a2fb-3e8f4a7913b2
    observable:fileName       "IMG_0123.jpg" ;
    observable:filePath       "/sdcard/IMG_0123.jpg" ;
    observable:extension      "jpg" ;
    observable:sizeInBytes    35002 ;
.

What is the type of thing the individual kb:raster-picture-f970b1a2-c6f1-4082-a2fb-3e8f4a7913b2 represents? In SPARQL:

SELECT ?t
WHERE {
  kb:raster-picture-f970b1a2-c6f1-4082-a2fb-3e8f4a7913b2 a ?t .
}

Result 2.1

The following triples shall be inferred:

kb:raster-picture-f970b1a2-c6f1-4082-a2fb-3e8f4a7913b2 a observable:RasterPicture .
kb:raster-picture-f970b1a2-c6f1-4082-a2fb-3e8f4a7913b2 a observable:File .

Competency 3

Infer that a datum is a member of a higher-order class, i.e., a superclass, based on the same Duck Typing properties.

Competency Question 3.1

Provided the following data:

ex:dns-server-address-1 observable:addressValue "1.1.1.1" .

What is the (super)type of thing the individual ex:dns-server-address-1 represents? In SPARQL:

SELECT ?t
WHERE {
  ex:dns-server-address-1 a ?t .
}

Result 3.1

The following triples shall be inferred:

ex:dns-server-address-1 a observable:DigitalAddress . 
ex:dns-server-address-1 a observable:IPAddress .

Solution suggestion

The examples use two namespace prefixes to distinguish reusable knowledge-base definitions (kb:) from example data that asserts a particular state of affairs (ex:).

Solution part 1

Use rdfs:domain and rdfs:range statements to implement Duck Typing, as opposed to the facet pattern, for each facet that has been specified as rdfs:subClassOf core:Facet.

Solution implementation

This implies the following modifications:

  1. Add, for each facet, a Class with the name and definition of the facet, e.g.:
  • observable:FileFacet --> observable:File a owl:Class.
  • If this concept already exists, this step can be omitted.
  2. Add, for each characteristic in the facet, a similar relation to the above Class, e.g.:
  • sh:path observable:fileName ; -->
    • observable:fileName rdfs:domain observable:File .
    • observable:fileName a owl:DatatypeProperty .
    • Note that the type of the property can also be an owl:ObjectProperty, depending on the range of the characteristic.
  • You might want to maintain/add the other SHACL constraints for each characteristic to the property just defined.
  3. Remove the facet.

Explanation: The use of rdfs:domain and rdfs:range statements.

Consider the following knowledge graph:

kb:D kb:p kb:R .
kb:p rdfs:domain kb:D ;
   rdfs:range kb:R .

Note:

  • The first knowledge rule, kb:D kb:p kb:R, only defines that p is used to relate D to R. This allows us to say that ex:Shakespeare kb:wrote ex:Hamlet, and subsequently, to get an answer to the question who wrote Hamlet (SELECT ?a WHERE { ?a kb:wrote ex:Hamlet } ==> ex:Shakespeare).
  • The second and third knowledge rules use rdfs:domain and rdfs:range. In contrast to object orientation, these properties are NOT meant to validate data, i.e., they do not state that an instance of the specified class MUST HAVE the specified property. Instead, they work the other way around and establish the type of a datum. For instance, if a datum uses the property about storage capacity, then that datum is considered to belong to the category of storage devices. In pseudo code:
IF
  kb:p rdfs:domain kb:D .
AND
  ex:x kb:p ex:v .
THEN
  ex:x a kb:D .

Formalised in SPARQL, this results in:

Domain Construct Rule
=====================

CONSTRUCT {
  ?x a ?D .
}
WHERE {
  ?p rdfs:domain ?D .
  ?x ?p ?y .
}

Similarly, rdfs:range statements can be made to infer something to be of a certain type based on the range of a property:

Range Construct Rule
====================

CONSTRUCT {
  ?y a ?R .
}
WHERE {
  ?p rdfs:range ?R .
  ?x ?p ?y .
}

In contrast to the facet, both CONSTRUCT rules are already part of the set of inference rules that belong to OWL and do not need to be specified; only the domain and range relations that are used as input to these rules are required to be specified.

(CASE users may already have seen some of the impact of these CONSTRUCT queries. The RDFLib OWL-RL library provides "Graph expansion" features that perform some of this constructive inference. Users of case_validate can make use of the features via the --inference flag. RDFS inferencing runs those above CONSTRUCTs for rdfs:domain and rdfs:range statements that directly reference classes. OWL inferencing can function with more nuanced domains and ranges, involving anonymous classes and owl:unionOf / owl:intersectionOf.)
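The two CONSTRUCT rules above can be sketched in a few lines of Python over an in-memory set of triples. This is only an illustration using the standard library, not how an OWL or RDFS engine is implemented. The function name, the tuple encoding of triples, and the classes kb:Author and kb:Work are assumptions introduced for the example.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def apply_domain_range(triples):
    """Return the type triples entailed by rdfs:domain and rdfs:range.

    `triples` is a set of (subject, predicate, object) tuples. Literal
    objects are not distinguished here; a real engine would not assign
    a class to a literal through the range rule.
    """
    domains = {(s, o) for (s, p, o) in triples if p == RDFS_DOMAIN}
    ranges = {(s, o) for (s, p, o) in triples if p == RDFS_RANGE}
    inferred = set()
    for (x, p, y) in triples:
        for (prop, cls) in domains:
            if p == prop:
                inferred.add((x, RDF_TYPE, cls))  # Domain Construct Rule
        for (prop, cls) in ranges:
            if p == prop:
                inferred.add((y, RDF_TYPE, cls))  # Range Construct Rule
    return inferred


graph = {
    ("kb:hasStorageCapacityInBytes", RDFS_DOMAIN, "kb:StorageDevice"),
    ("kb:wrote", RDFS_DOMAIN, "kb:Author"),  # hypothetical class
    ("kb:wrote", RDFS_RANGE, "kb:Work"),     # hypothetical class
    ("ex:object-1", "kb:hasStorageCapacityInBytes", "2000000000000"),
    ("ex:Shakespeare", "kb:wrote", "ex:Hamlet"),
}

inferred = apply_domain_range(graph)
for triple in sorted(inferred):
    print(triple)
```

Only the domain and range statements and the data are supplied; the type assignments fall out of the two rules.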

Conformance to competencies

CQ 1

For example:

  1. Assume the following knowledge definition, specifying the relationship kb:hasStorageCapacityInBytes as a property of the class kb:StorageDevice:
kb:hasStorageCapacityInBytes
  a owl:DatatypeProperty ;
  rdfs:domain kb:StorageDevice .
  2. Then, if we are provided with the following data triple:
ex:object-1 kb:hasStorageCapacityInBytes 2000000000000 .
  3. the following triple can be inferred:
ex:object-1 a kb:StorageDevice .

This meets CQ 1.

CQ 2

Consider the knowledge that:

  1. Everything that has a file name is a member of the type kb:File:
kb:fileName
  a owl:DatatypeProperty ;
  rdfs:domain kb:File .
  2. Everything that has a pictureType is a member of the type kb:RasterPicture:
kb:pictureType
  a owl:DatatypeProperty ;
  rdfs:domain kb:RasterPicture .
  3. Assume the following data is asserted:
ex:object-2 kb:fileName "IMG_0123.jpg" ;
  kb:pictureType "jpg" .

Then the following triples will be inferred:

ex:object-2 a kb:RasterPicture .
ex:object-2 a kb:File .

This meets CQ 2.

Solution part 2

Combine domain and range statements with rdfs:subClassOf in order to apply subclassing in the Duck Type pattern.

Solution implementation

This implies similar modifications as indicated in Part 1:

  1. Add, for each superfacet, a Class with the name and definition of the superfacet, e.g.:
  • observable:DigitalAddressFacet --> observable:DigitalAddress a owl:Class.
  • If this concept already exists, this can be omitted.
  2. Remove properties that are already available in its subfacets, e.g.:
  • sh:property [ sh:path observable:addressValue ] --> NIL
  3. Add, for each subfacet of this superfacet, a Class with the name and definition of the subfacet as subclass to the superclass, e.g.:
  • observable:IPAddressFacet rdfs:subClassOf observable:DigitalAddressFacet --> observable:IPAddress rdfs:subClassOf observable:DigitalAddress.
  4. Add, for each characteristic in the facet, a domain relation to the above Class, e.g.:
  • sh:path observable:addressValue ; -->
    • observable:addressValue rdfs:domain observable:IPAddress .
    • observable:addressValue a owl:DatatypeProperty .
    • Note that the type of the property can also be an owl:ObjectProperty, depending on the range of the characteristic.
  • You might want to maintain/add the other SHACL constraints for each characteristic to the property just defined.
  5. Remove the facet.

Explanation: combination of inference patterns

The Type Propagation Rule

The basic subclassing inference is induced by kb:B rdfs:subClassOf kb:A. The meaning for rdfs:subClassOf is given by the statements that are inferred from it. In pseudo code:

IF
  kb:B rdfs:subClassOf kb:A .
AND
  ?x a kb:B .
THEN
  ?x a kb:A .

This has been formalised (and included by default) as a knowledge rule in OWL:

Type Propagation Rule
=====================

CONSTRUCT {
  ?x a ?A .
}
WHERE {
  ?B rdfs:subClassOf ?A .
  ?x a ?B .
}

Combination of Type Propagation with Domain and Range

The purpose of this combination is to infer that when it is asserted that the rdfs:domain of a property is a particular class, then it can be inferred that the property also has the superclass of the particular class in its domain. This also holds for rdfs:range properties.

CONSTRUCT { ?p rdfs:domain ?A . }
WHERE {
  ?p rdfs:domain ?B .
  ?B rdfs:subClassOf ?A .
}

Conformance to competencies

CQ 3

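For example, by specifying the knowledge graph:

observable:IPAddress rdfs:subClassOf observable:DigitalAddress .
observable:addressValue rdfs:domain observable:IPAddress .

and adding the datum triple

ex:dns-server-address-1 observable:addressValue "1.1.1.1" .

we can infer that ex:dns-server-address-1 rdf:type observable:IPAddress and, through the subclass relation, that ex:dns-server-address-1 rdf:type observable:DigitalAddress.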

This meets CQ 3.
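As a rough check of the CQ 3 reasoning, the domain rule and the Type Propagation Rule can be chained until no new triples appear. The sketch below uses only the Python standard library and assumes the domain statement from Solution part 2 (observable:addressValue rdfs:domain observable:IPAddress). The function name and tuple encoding are illustrative assumptions, and the naive loop is not how an OWL reasoner works internally.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"


def infer_types(triples):
    """Apply the domain rule and type propagation to a fixpoint.

    Returns only the newly inferred triples.
    """
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            # Domain rule: ?p rdfs:domain ?D . ?x ?p ?y  =>  ?x a ?D
            for (prop, pred, cls) in closure:
                if pred == RDFS_DOMAIN and prop == p:
                    new.add((s, RDF_TYPE, cls))
            # Type propagation: ?B rdfs:subClassOf ?A . ?x a ?B  =>  ?x a ?A
            if p == RDF_TYPE:
                for (sub, pred, sup) in closure:
                    if pred == RDFS_SUBCLASS and sub == o:
                        new.add((s, RDF_TYPE, sup))
        if new <= closure:
            return closure - set(triples)
        closure |= new


graph = {
    ("observable:IPAddress", RDFS_SUBCLASS, "observable:DigitalAddress"),
    ("observable:addressValue", RDFS_DOMAIN, "observable:IPAddress"),
    ("ex:dns-server-address-1", "observable:addressValue", "1.1.1.1"),
}

inferred = infer_types(graph)
for triple in sorted(inferred):
    print(triple)
```

Chaining the two rules gives the same instance-level result as first materialising the extra rdfs:domain statement for the superclass and then applying the domain rule.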

Conclusion

In conclusion, we only need to specify:

  1. the knowledge rule that a property is used in a characteristic way, and
  2. the knowledge about a subclassing relation, and
  3. the data to represent the value of that property,

in order to induce this particular behaviour of Duck Typing in regular OWL, rather than adopting the unclear and complicated facet pattern.

@ajnelson-nist
Contributor

CASE and UCO lack a formal definition of "Duck typing," and I believe that is the source of much confusion among committee members.

By my understanding: Informally, "Duck typing" has implied a combination of two methods of classifying objects, and optionally a third entailment:

  • (M1) Method 1: Classifying objects by reviewing the set of explicit type assignments, each type assignment asserting the object is a member of a class.
  • (M2) Method 2: Characterizing objects by reviewing the object's capabilities, i.e. property-value assignments. This method is agnostic to explicit type assignments, and does not induce class assignment by itself.
  • (E1) Entailment of method 1 and method 2: Assigning a class-membership based on an object capability.

In OWL, inferencing capabilities bring M1, M2, and E1 together. owl:Restrictions and/or OWL rdfs:domain and rdfs:range interpretations can lead an inferencing engine to assign types as an "entailed ontology" (a graph including inferred triples).

I'm not sure CASE and UCO's interpretation of "Duck typing" is more than only M2. I am aware that early drafts of UCO attempted OWL domains for properties erroneously in implementation, and in purpose - domains were attempted for validation use, but OWL doesn't do data validation. Hence we went to SHACL.

CASE and UCO need to formalize their interpretation of "Duck typing," and relate it to OWL's, for us to understand the merits of this proposal. For instance, we must understand what is, and is not, meant to be entailed by:

  • An object having a observable:FileFacet attached via core:hasFacet.
  • An object having observable:fileName associated with it (whether or not any Facet is involved).

@plbt5
Contributor Author

plbt5 commented Aug 12, 2022

@sbarnum, my understanding of the facet has been fuzzy from the beginning. @ajnelson-nist has not been able to clarify it for me. You and I have not had the time nor the incentive to discuss this.

My understanding of the design principle, as described in the UCO design document section 5, is to provide the capability to separate the object from properties.

Question: is my understanding correct? If so, please provide the capabilities that justify the principle. If not, please provide the essence of your interpretation of the principle.

@plbt5
Contributor Author

plbt5 commented Aug 15, 2022

In ontology engineering, the prime purpose is to define categories of things and, more specifically, to commit to the existence of those categories. If we look at a cup of coffee, we all share the intuition that the cup is different from the coffee, and that the two show different behaviour. This is based on what in ontology-speak is termed the "Principle of Identity": that which lets us point at things in reality and collect the things that are very similar to the cup, and the other things that are very similar to the coffee. In short: what identifies something as a cup and what as coffee. The answer is probably along the lines of: the cup can hold coffee, whereas the coffee cannot hold something but requires something to hold it; on dividing the coffee in two, both parts remain coffee, whereas dividing the cup renders it dysfunctional.

The significance of these answers is that they “[...] do not ask for what there is, but for what a given remark or doctrine [...] says there is” (Quine). The prime purpose of UCO and CASE is to make distinctions from the perspective of the Cyber Community; if we adopt a concept, then UCO acknowledges that such a particular thing exists in the Cyber Domain: UCO commits to its existence. This is magnified by the objective of UCO to become the standard in the Cyber Domain. In order to fulfill the objective and apply ontology to achieve its purpose, we need the principle of identity. And the one and only means to implement the principle of identity in ontology is to specify what it means to be of a certain kind, to be a member of a certain category. In other words, define a Class with a unique set of intentions that remain invariant over all its individual members (instances). (This does not imply, btw, that each and every class implements the principle of identity; there is also the principle of application.)

By introduction of the Facet, and by insisting on the potential to decouple between the class and its characterising properties as specified by a facet, the capability to evade the principle of identity has been provided. Because:

  1. by using a facet, we commit to specific properties that are characteristic to something;
  2. by disallowing facets as stand alone representation of things that we acknowledge in reality, we don't commit to these things and deny the principle of identity;
  3. by allowing a collective of multiple facets to "define" a certain category of things, we are capable of combining things that potentially carry fundamentally different characteristics, that commit to incompatible characteristics.

In conclusion, by ignoring ontological rigor in general and the principle of identity specifically, the model that is being created is not an ontology anymore. Whether it is modeled in OWL or not is irrelevant.

(Note that each and every facet commits to a particular set of characteristics, which, by ontological definition, represents a category of things. I.e., it commits to the existence of such category. Consequently, not following ontological rigor does not imply "no ontology applies", but implies "this categorisation applies" by token of the definition of the category. In other words, the opposite of ontology is not "non ontology" but "bad ontology". )

@ajnelson-nist
Contributor

I agree with @plbt5 's remark, and also have some engineering-inspired unsettlements with Facet.

Facets, to me, have long had a smell of some kind of UML artifact encroaching into ontological modeling in a fairly odd, and I believe ultimately harmful, manner.

They have proven difficult to evolve, and have harmed a non-zero number of change proposals. Issue 370 got stuck when we realized the issue enabled (with our understanding of Facets at the time) defining independent Facets on one object with disagreeing values on a Boolean property.

They are a significant discomfort to program, because at least in Python, UcoObjects need to maintain dictionaries of Facet objects by (as best as I've been able to design) facet-class IRI, and somehow provide a property-forwarder between programming objects in order to store properties. UCO's usage of contextual interpretation of properties, particularly observable:sizeInBytes, means that a value for sizeInBytes can't be assigned on the UcoObject. Keeping track of those Facet instance references is awkward.
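A minimal sketch of the kind of bookkeeping described above might look as follows. The class and method names are hypothetical and are not taken from any CASE or UCO library, and the two sizeInBytes values are illustrative only.

```python
class Facet:
    """Hypothetical stand-in for a Facet: a facet-class IRI plus property values."""

    def __init__(self, class_iri, **properties):
        self.class_iri = class_iri
        self.properties = dict(properties)


class UcoObject:
    """Hypothetical object that keeps its Facets in a dictionary keyed by facet-class IRI."""

    def __init__(self, iri):
        self.iri = iri
        self.facets = {}

    def add_facet(self, facet):
        self.facets[facet.class_iri] = facet

    def get(self, property_name, facet_class_iri):
        # The caller must name the Facet, because the same property
        # (e.g. sizeInBytes) may appear on several Facets of one object.
        return self.facets[facet_class_iri].properties[property_name]


picture = UcoObject("kb:raster-picture-1")
picture.add_facet(Facet("observable:FileFacet", sizeInBytes=35002))
picture.add_facet(Facet("observable:ContentDataFacet", sizeInBytes=34990))

file_size = picture.get("sizeInBytes", "observable:FileFacet")
content_size = picture.get("sizeInBytes", "observable:ContentDataFacet")
print(file_size, content_size)
```

Every property access has to go through a facet lookup, which is the forwarding burden the comment refers to.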

They are restricted in ways that make them seem like ...organelles of objects, is the best term I can think of. We've had misunderstandings and disagreements with whether it's ever appropriate to reference them with properties aside from core:hasFacet, and concluded no, but that took a long time to get an explicit stance on.

It would be my strong preference to remove the notion of Facet. But, we are very late in the release cycle for 1.0.0, and we would need a substantial modeling decision on how to treat properties that are "inhering" (which currently solely reside on Facet subclasses), versus properties that somehow demand qualification with a Relationship object, versus properties that are both (such as a spatial relationship that is inhering, but needs annotations---e.g. a file's location within a file system, currently necessitating a observable:DataRangeFacet on a Relationship). We are likely committed to having Facet endure at least the period between 1.0.0 and 2.0.0.

For all the modeling weaknesses that Facets enable in UCO, I think it would be a healthier way forward to look to the permitted arbitrary extensibility as opportunities to refine UCO's model, by finally embracing class disjointedness.

We have an example in this Issue's description that, in some sci-fi contexts, would be a cyborg - a person with a 2TB storage capacity. If the Ontology Committees agree "Please let's not permit that for now," we can stage for UCO 2.0.0 a disjointedness definition between identity:Person and observable:StorageDevice. We can discuss likewise for location:Location and observable:RasterPicture, though I'd be interested to see if we have someone lodge a defense of Augmented Reality applications.
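To illustrate what such a disjointness definition would buy, here is a small Python sketch that flags nodes carrying two classes declared disjoint. It assumes kb:person-1 has already been assigned observable:StorageDevice (for instance through a domain-based inference that UCO does not currently define), and the function name and data layout are hypothetical.

```python
def find_disjointness_violations(types_by_node, disjoint_pairs):
    """Return (node, class_a, class_b) for every node typed with both classes of a disjoint pair."""
    violations = []
    for node, types in sorted(types_by_node.items()):
        for (class_a, class_b) in disjoint_pairs:
            if class_a in types and class_b in types:
                violations.append((node, class_a, class_b))
    return violations


# Asserted plus inferred types per node (illustrative data).
types_by_node = {
    "kb:person-1": {"identity:Person", "observable:StorageDevice"},
    "kb:MyFlashDrive": {"observable:StorageDevice"},
}

# Stand-in for: identity:Person owl:disjointWith observable:StorageDevice
disjoint_pairs = [("identity:Person", "observable:StorageDevice")]

violations = find_disjointness_violations(types_by_node, disjoint_pairs)
print(violations)
```

An OWL reasoner would report such a node as an inconsistency rather than return a list; the sketch only shows which data the disjointness statement would rule out.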

I think this journey starts with making firmer commitments around what I'd named "M1", "M2", and "E1" in my prior comment.

ajnelson-nist added this to the UCO 2.0.0 milestone Aug 17, 2022
@ajnelson-nist
Contributor

When the subgroup meets to discuss this Issue, we should be aware of this demonstration in Oresteia:

casework/CASE-Examples#97

@plbt5
Contributor Author

plbt5 commented Aug 19, 2022

I think it is necessary to call together the subgroup (@ajnelson-nist @sbarnum @eoghanscasey @plbt5), but only after @sbarnum has had the opportunity to describe his explanation on the Facet: Purpose and Approach.
@sbarnum please try to confine the explanation to the essences only, where possible.

@plbt5
Contributor Author

plbt5 commented Aug 19, 2022

In response to @ajnelson-nist comment above:

The "duck typing" concept is usually used to mean the opposite of the "it's a duck" principle. The Martelli usenet posting: "In other words, don't check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with." (Wikipedia talk, section Rewrite)

I understand the M1 definition to equal the "it's a duck" principle, and M2 as the actual "duck typing" concept. Please correct me if I've understood M1 or M2 incorrectly.

@Harm-van-Beek

Harm-van-Beek commented Aug 26, 2022

In general: This reads like a very good idea to me. I can follow the way this could work and how it can make things easier.
The examples match the ideas of "Duck typing" we implemented in Hansken.

Having said that, I am not an ontologist.
This sounds like (already) a need for UCO 2.0 since this will definitely break the UCO interface/API (and thus also CASE 2.0).

Problem 1. Clear example.
Having said that, we do see "inconsistent" data in actual databases in seized devices. By accident (software bug), on purpose (manipulated data), or by design (e.g., multiple bodies in one email with different contents)

Problem 2. I do not see why it is weird to add LatLongCoordinatesFacet to a picture having geo-location details in its EXIF.

Problem 3. My answer to the specific question would be: YES, the laptop, the iPod and the hard disk are instances of devices allowing data storage.
In the end, this is how "normal users" look at such devices. In my opinion, the actual implementation/technology of the devices should not be subordinate to this.

Problem 4. If OWL is the way to go with UCO, then UCO should follow the OWL approach (amongst others to benefit from the tools and techniques available for OWL).

@ajnelson-nist
Contributor

Re: @plbt5

I understand the M1 definition to equal the "it's a duck" principle, and M2 as the actual "duck typing" concept. Please correct me if I've understood M1 or M2 incorrectly.

You understood me correctly.

@ajnelson-nist
Contributor

Re: @Harm-van-Beek

...

This sounds like (already) a need for UCO 2.0 since this will definitely break the UCO interface/API (and thus also CASE 2.0).

Yes, this is under the 2.0.0 milestone.

Problem 1. Clear example. Having said that, we do see "inconsistent" data in actual databases in seized devices. By accident (software bug), on purpose (manipulated data), or by design (e.g., multiple bodies in one email with different contents)

Yes, ObservableObjects that expand their behaviors between multiple unexpected classes are something UCO should continue to support.

Problem 2. I do not see why it is weird to add LatLongCoordinatesFacet to a picture having geo-location details in its EXIF.

Why it's weird is that LatLongCoordinatesFacet encourages a confusion of "is-a" vs. "has-a" relationships between classes. Suppose we didn't have Facets, and instead just had UcoObject subclasses. I see the temptation to put latitude and longitude annotations onto a JPEG file's node---it's right there, in the EXIF! Say someone took a picture near "Null Island", and an analyst characterized that picture with ex:latitude and ex:longitude properties directly on it.

kb:jpeg-1
  a observable:RasterPicture ;
  ex:latitude "1.234"^^xsd:double ;
  ex:longitude "2.345"^^xsd:double ;
  .

If ex:latitude were defined like this:

ex:LatLongCoordinates
  a owl:Class ;
  rdfs:subClassOf location:Location ;
  .

ex:latitude
  a owl:DatatypeProperty ;
  rdfs:range xsd:double ;
  .

# And sim. for ex:longitude

then, as written above, there would be no OWL or RDFS expansion from putting ex:latitude on anything you wanted. However, typical modeling in RDFS and OWL uses rdfs:domain to associate a property with a class. Following that typical modeling pattern, we would also have this statement:

ex:latitude
  rdfs:domain ex:LatLongCoordinates ;
  .

If that domain statement is used, then the presence of ex:latitude on kb:jpeg-1 would expand its classes to include these inferred triples:

kb:jpeg-1
  a
    ex:LatLongCoordinates ,
    location:Location
    ;
  .

ex:LatLongCoordinates would come from RDFS expansion of ex:latitude rdfs:domain ex:LatLongCoordinates, and location:Location would come from RDFS expansion of ex:LatLongCoordinates rdfs:subClassOf location:Location.
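To make that expansion concrete, here is a minimal sketch in plain Python (no RDF library) of the two RDFS entailment rules involved, rdfs2 for rdfs:domain and rdfs9 for rdfs:subClassOf. Triples are modeled as string tuples, and the function name rdfs_expand is only illustrative; a real reasoner covers many more rules than these two.

```python
# Triples are plain (subject, predicate, object) tuples of prefixed names.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"


def rdfs_expand(triples):
    """Apply rules rdfs2 (domain) and rdfs9 (subClassOf) until nothing new appears."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            # rdfs2: (p rdfs:domain C) and (s p o)  =>  (s rdf:type C)
            for prop, q, cls in graph:
                if q == RDFS_DOMAIN and prop == p:
                    new.add((s, RDF_TYPE, cls))
            # rdfs9: (s rdf:type C) and (C rdfs:subClassOf D)  =>  (s rdf:type D)
            if p == RDF_TYPE:
                for sub, q, sup in graph:
                    if q == RDFS_SUBCLASS and sub == o:
                        new.add((s, RDF_TYPE, sup))
        if new <= graph:
            return graph
        graph |= new


graph = {
    ("kb:jpeg-1", RDF_TYPE, "observable:RasterPicture"),
    ("kb:jpeg-1", "ex:latitude", "1.234"),
    ("kb:jpeg-1", "ex:longitude", "2.345"),
    ("ex:latitude", RDFS_DOMAIN, "ex:LatLongCoordinates"),
    ("ex:LatLongCoordinates", RDFS_SUBCLASS, "location:Location"),
}

inferred_types = sorted(
    o for s, p, o in rdfs_expand(graph) if s == "kb:jpeg-1" and p == RDF_TYPE
)
print(inferred_types)
```

The picture node ends up with three types: the asserted observable:RasterPicture plus the two inferred ones.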

Would it make sense to you for a single graph node to be BOTH a location:Location and observable:RasterPicture? UCO should have something in place to separate physical-space phenomena from cyber-space concepts that are manifested only in bit streams. UCO currently does not make that separation, so we have to rely on end users' guts.

I believe the objective of associating a latitude and longitude with a picture is to say "This picture has a relationship with a location with lat Y and long X", not "This picture is a location with lat Y and long X." That is, however, a significant amount of top-level property design and class separation (using owl:disjointWith) that has not happened.
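As a rough illustration of what that class separation would buy us, the sketch below checks asserted types against an owl:disjointWith axiom. Everything here is an assumption for the sake of the example: UCO does not currently declare location:Location and observable:RasterPicture disjoint, ex:hasLocation is a hypothetical "has-a" property, and the check only looks at asserted types rather than doing real OWL reasoning.

```python
RDF_TYPE = "rdf:type"
OWL_DISJOINT = "owl:disjointWith"


def disjointness_violations(triples):
    """Return (node, class_a, class_b) for nodes typed with two classes declared disjoint."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    violations = set()
    for a, p, b in triples:
        if p == OWL_DISJOINT:
            for node, classes in types.items():
                if a in classes and b in classes:
                    violations.add((node, a, b))
    return violations


# Hypothetical axiom; not part of UCO today.
axiom = ("location:Location", OWL_DISJOINT, "observable:RasterPicture")

# "is-a" modeling: the picture itself ends up typed as a location.
is_a_graph = [
    axiom,
    ("kb:jpeg-1", RDF_TYPE, "observable:RasterPicture"),
    ("kb:jpeg-1", RDF_TYPE, "location:Location"),
]

# "has-a" modeling: the picture points at a separate location node.
has_a_graph = [
    axiom,
    ("kb:jpeg-1", RDF_TYPE, "observable:RasterPicture"),
    ("kb:jpeg-1", "ex:hasLocation", "kb:location-1"),  # hypothetical property
    ("kb:location-1", RDF_TYPE, "location:Location"),
]

is_a_violations = disjointness_violations(is_a_graph)
has_a_violations = disjointness_violations(has_a_graph)
print(is_a_violations)
print(has_a_violations)
```

With the axiom in place, the "is-a" graph is flagged and the "has-a" graph is not, which is the distinction the end user currently has to make by gut feeling.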

Facets let UCO users bypass some modeling questions on "is-a" vs. "has-a" relationships. There is a balance to strike here, and your next response highlights the other side of the balance.

Problem 3. My answer to the specific question would be: YES, the laptop, the iPod and the hard disk are instances of devices allowing data storage. In the end, this is how "normal users" look at such devices. In my opinion, the actual implementation/technology of the devices should not be subordinate to this.

Now, suppose I say to an analyst "Please image this desktop tower." A tower is characterized as a storage device, so they say "sure," thinking it's an easy overnight job for their one write blocker. Then they unscrew the case, find eight hard drives in it, and realize what was meant by this graph node handed to them as part of the chain of custody:

kb:tower-5b2188da-67a2-40e5-842c-bb582874ca2b
  a observable:Computer ;
  core:hasFacet [
    a ex:StorageMediumFacet ;
    rdfs:comment "Heads up - the OS reports having 7TB storage.  Didn't know anyone made those.  Box is kinda heavy, too."@en ;
    observable:storageCapacityInBytes 7696581394432 ;
  ] ;
  .

(Aside: ex:StorageMediumFacet should be implemented as observable:StorageMediumFacet soon after 1.0.0.)

Here, the Facet masks a modeling matter where the "Right" thing to do is to model the tower as a composition of multiple component devices, especially as having multiple hard drives, one of which apparently wasn't actively contributing to the available storage. If we didn't have Facets, we would need to address as part of class-design that yes, an observable:Computer can have storageCapacityInBytes, but that's because it is a subclass of something like ex:ThingWithStorageCapacity, and NOT a subclass of ex:ThingProvidingStorageCapacity.
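A small sketch of what the compositional view could give the analyst, again in plain Python over string-tuple triples. The property ex:hasComponent, the kb:drive-N identifiers, and the 1 TiB size per drive are all made up for illustration; only the tower identifier and the reported 7696581394432 bytes come from the node above.

```python
TIB = 1024 ** 4
HAS_COMPONENT = "ex:hasComponent"  # hypothetical composition property
CAPACITY = "observable:storageCapacityInBytes"
TOWER = "kb:tower-5b2188da-67a2-40e5-842c-bb582874ca2b"

graph = set()
# Assumption: eight drives of 1 TiB each, modeled as components of the tower.
for i in range(1, 9):
    drive = f"kb:drive-{i}"
    graph.add((TOWER, HAS_COMPONENT, drive))
    graph.add((drive, CAPACITY, TIB))
# The figure the OS reported for the tower as a whole.
graph.add((TOWER, CAPACITY, 7 * TIB))


def storage_components(triples, device):
    """List the components of `device` that themselves carry a storage capacity."""
    parts = {o for s, p, o in triples if s == device and p == HAS_COMPONENT}
    return sorted(s for s, p, o in triples if p == CAPACITY and s in parts)


drives = storage_components(graph, TOWER)
print(len(drives), "drives to image")
```

With the components in the graph, the analyst can see the number of devices to image before opening the case, instead of inferring it from a single capacity figure on the tower.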

Problem 4. If OWL is the way to go with UCO, then UCO should follow the OWL approach (amongst others to benefit from the tools and techniques available for OWL).

To continue the OWL conversation, UCO needs technology demonstration pipelines that could be integrated into unit testing. There have been some less-than-successful attempts at this, which was part of what led to self-building an OWL conformance suite in SHACL. We certainly welcome receiving guidance or demonstration on OWL mechanisms, but for now, UCO's adoption of OWL goes only as far as some of the more elementary features (e.g. ontology versioning, disjointness semantics, some property-range expression beyond RDFS), and so not yet into OWL inferencing or RDFS domain usage.

@plbt5
Contributor Author

plbt5 commented Aug 29, 2022

Thanks @Harm-van-Beek for your very valuable review and comments. Much appreciated.

@plbt5
Contributor Author

plbt5 commented Sep 30, 2022

Provide for a link to the design doc section that explains the differences between ontologies and schemata
