UCO support for Protégé #449

Closed
14 tasks done
ajnelson-nist opened this issue Aug 15, 2022 · 2 comments · Fixed by #450 or #530

Comments

@ajnelson-nist
Contributor

ajnelson-nist commented Aug 15, 2022

Disclaimer

Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

Background

Protégé is a tool for editing ontologies.

https://protege.stanford.edu/

It is able to open and interact with ontologies stored as local files in a user's desktop environment. It is also capable of resolving all of the ontology imports (encoded as owl:imports statements), recursively retrieving all ontologies referenced from any loaded ontology.
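Recursively resolving owl:imports amounts to computing the transitive closure of the import graph. A minimal Python sketch, assuming RDFLib is installed for reading graph files; `imports_from_file` and `import_closure` are hypothetical helper names, not part of Protégé or UCO:

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def imports_from_file(path: str) -> List[str]:
    """Return the object IRI of every owl:imports triple in one graph file."""
    # Assumption: RDFLib is installed. It is imported lazily so that
    # import_closure can be used (and tested) without it.
    import rdflib
    from rdflib.namespace import OWL

    graph = rdflib.Graph()
    graph.parse(path)  # recent RDFLib versions guess the format from the extension
    return [str(obj) for obj in graph.objects(None, OWL.imports)]


def import_closure(
    focus_iri: str, imports_of: Callable[[str], Iterable[str]]
) -> Set[str]:
    """Return every IRI reachable from focus_iri through owl:imports.

    Breadth-first traversal; the `seen` set stops import cycles from
    looping forever. focus_iri is only included if a cycle leads back to it.
    """
    seen: Set[str] = set()
    queue = deque([focus_iri])
    while queue:
        current = queue.popleft()
        for imported in imports_of(current):
            if imported not in seen:
                seen.add(imported)
                queue.append(imported)
    return seen
```

To walk local files, a caller could pass something like `lambda iri: imports_from_file(iri_to_path[iri])`, where `iri_to_path` is an assumed IRI-to-path dictionary. Taking the lookup as a parameter lets the same traversal work over network retrieval, local files, or an in-memory table in tests.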

By default, Protégé will do an over-the-wire retrieval on encountering an owl:imports statement: The referenced IRI will be downloaded in whatever RDF serialization is offered (seemingly preferring application/rdf+xml). It is possible to include an "Override" XML file that can be interpreted as: "Whenever Protégé encounters this IRI, instead of loading a file from a network retrieval, load a file from this hard-coded relative or absolute path."

There is a slight technical matter with the XML file: It must reside in the same directory as the ontology file one would open with Protégé.
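Protégé's catalog-v001.xml follows the OASIS XML Catalogs format. A minimal standard-library sketch of writing one, assuming that `uri` entries (with `name` holding the ontology IRI and `uri` holding a file path) are all that is needed. Because the catalog must sit next to the ontology file being opened, each path is written relative to that directory. `build_catalog` is a hypothetical name:

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(catalog_dir: str, iri_to_path: Dict[str, str]) -> str:
    """Serialize a catalog mapping each IRI to a file path written
    relative to the directory that will hold catalog-v001.xml."""
    # Emit the OASIS namespace as the default xmlns instead of "ns0:".
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog", {"prefer": "public"})
    for iri in sorted(iri_to_path):  # sorted for stable diffs
        rel_path = os.path.relpath(iri_to_path[iri], start=catalog_dir)
        ET.SubElement(
            root,
            f"{{{CATALOG_NS}}}uri",
            {"name": iri, "uri": rel_path.replace(os.sep, "/")},
        )
    ET.indent(root)  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

Sorting the entries by IRI means regenerating the file yields a stable Git diff, which matters once the catalog is checked in and regenerated in CI.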

Requirements

Requirement 1

UCO should store a Protégé catalog-v001.xml file in ontology/uco/master/, enumerating all UCO ontology files.

Requirement 2

The catalog-v001.xml file's hard-coded enumeration must be tested to be in sync with UCO's imports.

Requirement 3

CASE must provide the same Protégé support as UCO, maintaining its own catalog-v001.xml file in ontology/master/, enumerating all CASE and UCO ontology files. While this might seem out of scope of UCO's purview, this requirement is also to ensure UCO can enable any downstream ontology to provide the same support for Protégé that UCO does.

Risk / Benefit analysis

Benefits

  • Storage of this file would enable anybody loading all of UCO into Protégé to navigate the class hierarchy without requiring network access.
    • Similarly, any pre-release version of UCO can also be loaded into Protégé, preventing conflicts with ontology files that are on the web as part of the most recent release.

Risks

  • The catalog-v001.xml file is necessarily a hard-coded enumeration of resources. Hence, any additions of new .ttl files would need to be kept in sync to maintain referential integrity within the Protégé application.
    • This is believed to be a not-terribly-difficult piece of custom programming to implement as a unit test. However, it is custom software that would add to the ontology-level software base.
  • It's fair to consider that each ontology file should get a catalog-v001.xml in order to locally resolve its imports. Otherwise, Protégé is only compatible with UCO when UCO is loaded in its entirety. This would be additional work to maintain.
  • The import statements in UCO will potentially use owl:versionIRI. If so, these catalog XML files would need to be updated with every release that changes the owl:versionIRI statement.
  • CASE currently tracks UCO as a Git submodule. Directions would need to be added to CASE's documentation (likely in CONTRIBUTE.md) stating that Protégé local resolution would only work if git submodule update --init has been run at least once.

Competencies demonstrated

Competency 1

A user is interested in using Protégé to load all of UCO's current state in the develop branch, which has some changes implemented since the last UCO release.

Competency Question 1.1

How does the user see the current version of observable:File in the develop branch?

Result 1.1

If there is only one catalog-v001.xml in UCO, uco.ttl would need to be opened with Protégé. Then, observable:File's current state would be viewable through the class navigator.

If instead each ontology directory gets a catalog-v001.xml, observable.ttl would need to be opened. The rest is as above.

Solution suggestion

  • Add catalog-v001.xml alongside uco.ttl, in the directory ${top_srcdir}/ontology/uco/master/.
  • Add a program (likely Python) that confirms these sets of IRIs align:
    • For an ontology file X, the set of IRIs in that file's transitive closure (which might only be apparent from the monolithic test build);
    • For the same ontology file X, the set of IRIs mapped in the catalog-v001.xml.
  • Confirm the test program developed for UCO functions for CASE, and thus in theory other downstream adopters.
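The set comparison in the second bullet could look like the following sketch, which assumes the two IRI sets have already been computed (for example, from the import closure and from the parsed catalog). The `also_allowed` parameter is an assumption, included so that a catalog that also lists its own focus ontology does not fail the check; both function names are hypothetical:

```python
from typing import Dict, Iterable, Set


def catalog_sync_report(
    closure_iris: Iterable[str],
    catalog_iris: Iterable[str],
    also_allowed: Iterable[str] = (),
) -> Dict[str, Set[str]]:
    """Compare the owl:imports closure with the IRIs a catalog maps."""
    closure = set(closure_iris)
    catalog = set(catalog_iris)
    return {
        # Imported IRIs that Protégé would still try to fetch over the network.
        "missing_from_catalog": closure - catalog,
        # Stale entries, e.g. left behind after an import was removed.
        "unexpected_in_catalog": catalog - closure - set(also_allowed),
    }


def assert_catalog_in_sync(
    closure_iris: Iterable[str],
    catalog_iris: Iterable[str],
    also_allowed: Iterable[str] = (),
) -> None:
    """Raise AssertionError listing any mismatch, for use in a unit test."""
    report = catalog_sync_report(closure_iris, catalog_iris, also_allowed)
    problems = {key: sorted(iris) for key, iris in report.items() if iris}
    if problems:
        raise AssertionError(f"catalog-v001.xml is out of sync: {problems}")
```

Reporting both directions separately makes a failing CI run say whether a new import lacks a file mapping or an old mapping has gone stale.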

Coordination

  • Tracking in Jira ticket OC-261
  • Requirements to be discussed in OC meeting, 2022-08-16
  • Requirements Review vote occurred, passing, on 2022-08-16
  • Requirements development phase completed.
  • Solution announced to OCs on 2023-03-15
  • Solutions Approval to be discussed in OC meeting, 2023-03-23
  • Solutions Approval vote occurred, passing, on 2023-03-23
  • Solutions development phase completed.
  • Backwards-compatible implementation merged into develop for the next release
  • develop state with backwards-compatible implementation merged into develop-2.0.0
  • Backwards-incompatible implementation merged into develop-2.0.0 (N/A)
  • Backwards-compatible CASE implementation merged into develop for the next release
  • Milestone linked
  • Documentation logged in pending UCO release page
  • Documentation logged in pending CASE release page
ajnelson-nist pushed a commit that referenced this issue Aug 15, 2022
AJN: This is a *partial* cherry-pick of the commit by @DrSnowbird.  The
`index.html` file has been removed from this commit, and the rename of
the "root" `uco.ttl` file has been reverted, in order to save on Git
noise.

A follow-on patch will address the two new files.

References:
* #449

(cherry picked from commit 0747a62)
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Aug 15, 2022
This patch modifies paths in the catalog file, but does not attempt an
import of the UCO co and owl ontologies.  For a yet-undiagnosed reason,
adding those reference resolutions causes Protégé to fail the load.

References:
* #449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist linked a pull request Aug 15, 2022 that will close this issue
12 tasks
@ajnelson-nist
Contributor Author

PR 450 has been filed to start the implementation for this proposal. Thanks again, @DrSnowbird, for contributing the start of this branch.

Unfortunately, a curious issue arose with trying to load the Collections Ontology shape file. There's a chance it will be resolvable with an extra Git-submodule-based interaction; I'm out of time to test tonight.

My current feeling is that the testing infrastructure complexity for this might be too high for integration with the UCO 1.0.0 release, but it is a backwards-compatible change that could be integrated with any 1.x.0.

@ajnelson-nist ajnelson-nist added this to the UCO 1.x.0 milestone Aug 17, 2022
ajnelson-nist added a commit that referenced this issue Feb 8, 2023
This patch adds a Makefile and configuration file to sketch the call
pattern for the catalog creation script.  More work would need to follow
to align the Makefile with the current descent order, but this patch
provides enough for the current pass of development and testing.

References:
* #449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Mar 13, 2023
This implementation pattern should enable every directory with a Turtle
file to be able to call the script with the same command pattern,
whether in UCO or a downstream ontology.  Hard-coded logic moves out of
the script, and into maintaining a tab-separated-values file and Make
calls to specify what ontology file to inspect.  This way, any individual
ontology file can be loaded into Protege if desired.

This patch modifies the demonstrated Makefile call pattern.  After
demonstration by regenerating the catalog XML file for the root
`uco.ttl` graph, future patches will generate other catalog files.

This patch also removes some erroneously copy-pasted script text from
`/ontology/uco/master/Makefile`, and retires the first draft by
@DrSnowbird.

References:
* #449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Mar 13, 2023
With this patch, I can also confirm that opening the `uco.ttl` file in
the affected directory worked for me with Protege.

The steps to regenerate this file are not yet captured in CI:

    cd ontology/uco/master
    make

A future patch will add catalog regeneration to CI.

References:
* #449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 15, 2023
It has been an oversight that prov-o was not included in the transitive
closure construction, given its usage with `case_prov`.  The oversight
became a blocking issue in implementation of UCO Issue 449, which has a
solution in draft that requires the transitive closure be present, and
DCAT imports PROV-O as a dependency.

A follow-on patch will generate PROV-O per this recipe.

References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Mar 15, 2023
The Protege catalog XML construction design involves review of the
transitive `owl:imports` closure of UCO's graph files.  To do this,
RDFLib needs to be available in a Python virtual environment before
descending into the `/ontology` directory.  Before this patch, a virtual
environment with RDFLib was constructed in `/tests` for testing, but not
for other ontology-related maintenance.

This patch moves the virtual environment construction to the UCO
repository root directory, and sets the dependency order of `all` and
`check` to include `/venv` being built before descent into `/ontology`
and `/tests`.  All paths referencing `/tests/venv` under `/tests` have
been updated.

As one bit of code upgrading, the `PYTHON3` selector snippet, originally
written before Python 3.10's release, now looks only for the default
Python 3 if not supplied when calling Make (e.g. with
`make PYTHON3=python3.11 check`).

This patch isolates its effects to moving the virtual environment.
A follow-on patch will integrate catalog construction using the new
placement.

References:
* #449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Mar 15, 2023
The construction script now handles multiple input ontology files, but
with the requirement that they be in the same directory.

Interfaces have also been added to handle imported, possibly non-CDO
ontology references in two ways:
* With a TSV file mapping ontology IRIs or version IRIs to files.
* With optional references to (effectively imported) `catalog-v001.xml`
  files.
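A sketch of loading such a mapping, assuming a hypothetical layout of two tab-separated columns (IRI, then file path) with no header row, and with relative paths resolved against the TSV file's own directory. The actual column layout of UCO's TSV files is not described here, and `load_tsv_mapping` is a hypothetical name:

```python
import csv
import os
from typing import Dict


def load_tsv_mapping(tsv_path: str) -> Dict[str, str]:
    """Read an assumed two-column TSV (IRI <TAB> file path) into a dict.

    Ontology IRIs and version IRIs can both appear as keys, each on its
    own row. Relative paths are resolved against the TSV's directory.
    """
    base_dir = os.path.dirname(os.path.abspath(tsv_path))
    mapping: Dict[str, str] = {}
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or not row[0].strip():
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
            iri, path = row[0].strip(), row[1].strip()
            if iri in mapping:
                raise ValueError(f"duplicate IRI in {tsv_path}: {iri}")
            # os.path.join keeps `path` unchanged if it is already absolute.
            mapping[iri] = os.path.normpath(os.path.join(base_dir, path))
    return mapping
```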

Another behavior change is implemented: the focus ontologies are now
also added to the `catalog-v001.xml` file, in part to support when
multiple graph files are in one directory, and in part to support
re-consumption of `catalog-v001.xml` by the `catalog-v001.xml`
generating script.
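Re-consuming a catalog means reinterpreting its relative `uri` paths from the directory that holds that catalog, not from the directory of the catalog being generated. A minimal sketch, assuming the catalog uses plain relative or absolute file paths (not `file:` URLs) in its `uri` attributes; `read_catalog` is a hypothetical name:

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def read_catalog(catalog_path: str) -> Dict[str, str]:
    """Return {IRI: normalized file path} from an existing catalog-v001.xml.

    Each `uri` attribute is resolved against the directory containing the
    catalog, so entries from a catalog elsewhere keep pointing at the
    right files when merged into a new catalog.
    """
    catalog_dir = os.path.dirname(os.path.abspath(catalog_path))
    root = ET.parse(catalog_path).getroot()
    mapping: Dict[str, str] = {}
    # iter() also finds entries nested inside <group> elements.
    for entry in root.iter(f"{{{CATALOG_NS}}}uri"):
        name, uri = entry.get("name"), entry.get("uri")
        if name and uri:
            mapping[name] = os.path.normpath(os.path.join(catalog_dir, uri))
    return mapping
```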

The rationales for how to handle ontology--file mappings outside the
scope of UCO (in both upstream and downstream directions) include:
* Symbolic links could have been used to pool all file references into
  the `/dependencies` directory.  Windows users that run `git clone`
  without symbolic links enabled for their system would encounter
  significantly counter-intuitive errors.
  - This also would not iterate well with consumers of the catalog
    script outside of UCO (e.g. CASE).
* A Makefile could have been made to normalize the dependent ontology
  files into the same Turtle style (or even away from RDF-XML, which the
  Collections Ontology currently uses as sole format).  However, this
  would again be a point of difficulty for Windows users, as they would
  have to run `make` to create the files referenced in the catalog XML.
* Copying files into a Git repository introduces code-drift issues that
  are difficult to manage.  When the copied files were themselves
  tracked in Git, this is counter to the purpose of Git submodules.

This patch goes on the assumption that Git submodules and recursive
cloning are a reasonable minimal requirement to access full local-file
ontology interaction.

The catalog generating script in this patch state has been tested
(offline) with CASE and CASE-Corpora as users, via a submodule chain
starting with CASE-Corpora.

The `CONTRIBUTE.md` file has also been updated to add usage
documentation, and to fix a copy-paste error from some time ago.

A follow-on patch will regenerate Make-managed files.

References:
* #449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Mar 15, 2023
References:
* #449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist modified the milestones: UCO 1.x.0, UCO 1.2.0 Mar 15, 2023
ajnelson-nist added a commit to casework/CASE that referenced this issue Mar 15, 2023
This commit matches the rationale of UCO commit `66d0c38`, in support of
UCO Issue 449.

The UCO submodule pointer is also bumped due to its own motion of
virtual environment resources in the commit noted above.

References:
* ucoProject/UCO#449
* ucoProject/UCO@66d0c38
ajnelson-nist added a commit to casework/CASE that referenced this issue Mar 15, 2023
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#449
ajnelson-nist added a commit to casework/CASE that referenced this issue Mar 15, 2023
References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 15, 2023
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 15, 2023
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Mar 15, 2023
References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Contributor Author

PR 450 now has an implemented solution for this Issue. The summary of effects is that, with the associated PRs merged, UCO works in Protégé entirely from locally stored (and locally versioned) ontology files, and the generation mechanism is confirmed to work for downstream ontologies. I have tested this with CASE and CASE-Corpora (test links are in PR 450). CASE-Corpora is now able to generate catalog-v001.xml files that cover its ontology import closure.

Some unexpected developments occurred:

  • The build order no longer made sense with the Python virtual environment being nestled under /tests, due to /ontology needing Python to have access to rdflib. That environment is now built in the repository root, for UCO and for CASE. Rationale is described in 66d0c38.
  • The catalog-generating script is designed to create one catalog-v001.xml file per directory with ontology files. This is to mirror the behavior of Protégé, where that XML file has an effect just on graph files in its current directory, and it does not seem to be transitive (i.e. Protégé does not seem to inherit catalog-v001.xml files when following relative-pathed uri references). This is also necessary because file references are only done as relative paths from the XML file's housing directory.
  • The catalog-generating script currently generates an XML file that lists paths to backing files for every imported ontology reference (that is, the object of every $subject owl:imports $object triple), where "every" means through the transitive import closure of whatever ontology file was passed to the script as the "Focus ontology". The hard-coding for this name resolution is done in a few tab-separated value files. The catalog files regenerate as part of CI; see the Makefile revisions for the recipes and /etc for the new files, in UCO, CASE, and CASE-Corpora.
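Since one catalog-v001.xml is generated per directory that holds ontology files, a generator first needs to find those directories. A minimal sketch, assuming graph files end in `.ttl` and that hidden directories such as `.git` should be skipped; `find_turtle_files` and `group_by_directory` are hypothetical names:

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def find_turtle_files(top: str) -> List[str]:
    """Recursively list .ttl files under `top`, skipping hidden directories."""
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune hidden directories (e.g. .git) in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        found.extend(os.path.join(dirpath, f) for f in filenames if f.endswith(".ttl"))
    return sorted(found)


def group_by_directory(graph_files: Iterable[str]) -> Dict[str, List[str]]:
    """Map each directory to the graph files it contains.

    Each key is a directory that would receive its own catalog-v001.xml,
    since Protégé appears to apply a catalog only to files beside it.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in graph_files:
        groups[os.path.dirname(os.path.normpath(path))].append(path)
    return {d: sorted(files) for d, files in sorted(groups.items())}
```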

The overall increase in risk is for projects that track UCO and CASE as Git submodules for the sake of re-using their virtual environment and/or monolithic ontology build. The virtual environment motion means some tracking projects will need to update paths to scripts. I'm aware that this will impact the CASE Python Utilities' monolithic build tracking and the documentation engines (CASE's, UCO's) most, but I suggest that overall this is logistically acceptable, as paths only need to be updated once per tracking Git project.

ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Mar 15, 2023
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Mar 15, 2023
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Mar 15, 2023
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist linked a pull request Mar 28, 2023 that will close this issue
4 tasks
ajnelson-nist added a commit to casework/CASE-Utilities-Python that referenced this issue Mar 28, 2023
This patch, normally a brief edit of `version_info.py`, also makes path
updates to the virtual environments that were enacted as part of UCO
Issue 449.  Test updates made for UCO Issue 508 are also forward-ported.

A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#449
* ucoProject/UCO#508

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to ucoProject/ontology.unifiedcyberontology.org that referenced this issue Apr 24, 2023
This catches a resource move made for UCO Issue 449.

References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to Cyber-Domain-Ontology/CDO-Shapes-Example that referenced this issue Aug 16, 2023
CDO Shape repositories are not necessarily being designed to require UCO
as a Git submodule.  At least one, which will take UCO's OWL-review
shapes into their own repository, will avoid tracking UCO as a submodule
in order to prevent circular Git submodule dependencies.

The catalog construction script created for UCO Issue 449 is still
potentially useful for inspecting shapes for ontologies with one or more
`owl:imports` statements, even if UCO is not available via a Git
submodule.

Hence, this patch copies the catalog construction script from UCO (at
version 1.2.0), as part of moving the script out of UCO.

The inlined NIST license also receives an update in this patch, and the
format review adjusts some syntax style.  So, the version of the script
receives a bump.

No effects were observed on Make-managed files.

References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to Cyber-Domain-Ontology/CDO-Shapes-Example that referenced this issue Aug 16, 2023
CDO Shape repositories are not necessarily being designed to require UCO
as a Git submodule.  At least one, which will take UCO's OWL-review
shapes into their own repository, will avoid tracking UCO as a submodule
in order to prevent circular Git submodule dependencies.

The catalog construction script created for UCO Issue 449 is still
potentially useful for inspecting shapes for ontologies with one or more
`owl:imports` statements, even if UCO is not available via a Git
submodule.

Hence, this patch copies the catalog construction script from UCO (at
version 1.2.0), as part of moving the script out of UCO.

The inlined NIST license also receives an update in this patch, the
format review adjusts some syntax style, and an RDFLib type reference is
updated.  For these patch-level changes, the version of the script
receives a bump.

No effects were observed on Make-managed files.

References:
* ucoProject/UCO#449

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>