Create formal version of the security vocabulary #65

iherman · 2022-10-17T07:29:28Z

I have recently worked on the security vocabulary, namely to create a formal RDF vocabulary (similar to the the data model vocabulary). I started with¹ and I created the same structure as for the data model, re-using the tools I had for PR in². The first version on all this is this PR with the relevant files on³ with the CSV, Turtle, JSON-LD, and HTML files; the latter is better viewed at⁴.

This is definitely a draft PR at this point, because I am sure there are a bunch of mistakes. The fact is that my domain knowledge for ¹ is poor, so I had to try to make sense of what is in that document. I did copy the descriptions, references, etc, into the vocabulary, to increase the quality of the documentation. I also did hit a bunch of issues, though.

Before diving in my issues, though: all this is a draft for now. I would propose not to answer all the issues below in this PR round except, maybe, (1); once the structure of the document (and the generating process thereof) are agreed upon, we can do this in a series of separate, subsequent PRs. I would, however, hope to handle this PR swiftly. Indeed, the discussion in, e.g., w3c/vc-data-model#943 is ongoing and, I believe, it is an example for a set of problems that should be handled in sync with the vocabulary specification. Put it another way, in my view the development of the @context file(s) and the vocabulary specifications should go hand in hand. If we agree on the vocabulary structure and tool-set then we do have the framework to do that properly. That is the essence of this PR, actually.

As for my issues/questions:

Many of the subclass, range and domain statements are not explicit in¹ (and, actually, I believe some of the entries were wrong, mixing up ranges and domains). I tried to deduce these based on the text, the examples, etc. These should be cross-checked to get everything right.
¹ has some texts for some of the classes of the sort "expected properties are...". I do not think it is possible to express that in RDFS or OWL, i.e., I am not sure what we should do with those. Of course, the range statements of the relevant properties may provide that information, but they do not express formal constraints and expectations. Maybe the right approach is to put those types of constraints into a JSON Schema for a VC (but I know that the security vocabulary is not only for VCs, so that is not proper approach either). Another possibility is to add these remarks, if important, to the end of the "label", ie, the longer description (I am happy to do that if that is the consensus).
I of course could not convert the examples into the vocabulary. I am not sure what would be the fate of those. I believe the best place for those examples would either be the spec⁵, or maybe a separate guide. To be discussed.
I have not found a way to add to the vocabulary the statement on property ranges which says "xsd:string or a URL". In terms of RDF this roughly means "range can be anything except for Blank Nodes" (roughly, because there is a constraint on the string's datatype, too). It is possible to express this in OWL (for the aficionados this would be something like <…> rdfs:range [owl:unionOf (xsd:string [owl:complementOf rdfs:Literal])]) but no existing OWL reasoner could handle this for reasons that go beyond this discussion. I.e., it is useless. I would propose to ignore this but I would also seriously look at those properties whether these types of statements are o.k. in the first place. From an abstract vocabulary point of view I think a range to URL's sounds better…
Of course, it would be necessary to review the term descriptions themselves editorially. E.g., I am not sure what the difference is between 'data signature suite' and a 'data signature'; some terms have an extensive description and others are just a sketch (actually a number of descriptions are missing and should be completed). But that is all right; once we have the structure in place, we can discuss that.

B.t.w. I have modified the script in ² to add a separate "see also" column in the CSV file. This means that, in turtle and json-ld, all the references we had in the explanation texts of¹ are separated and appear as, possibly several, rdfs:seeAlso statements. The HTML version⁴ makes those explicit as well.

B.t.w. 2 you may have realized that I could express the fact that some ranges are denoted as "IRI". But this does not translate into a real rdfs:range statement rather that the property is (also) of class owl:ObjectProperty. To make it symmetric, if a property's range is marked as, say, xsd:string, than it is also set to be of class owl:DatatypeProperty. This is just to make it look nice…

TallTed · 2022-10-17T15:56:35Z

I have not reviewed everything, but thought it worth suggesting use of the softer schema:domainIncludes and/or schema:rangeIncludes than the strictly limiting rdfs:domain and/or rdfs:range for some of the (loose) formalization you described...

msporny · 2022-10-17T20:57:41Z

@iherman wrote:

I have recently worked on the security vocabulary, namely to create a formal RDF vocabulary (similar to the the data model vocabulary).

Ivan, this is fantastic, thank you! I have not had a chance to dive into the details yet.

Given that we now have two places where the same-ish .ts scripts live, we should consider packaging and releasing your tool via npm, which would allow us to just specify the CSV file here and make publishing a part of a build process. My concern is that things will get out of sync w/ one another quickly. We might ask @davidlehn to package up and release your csv2vocab tool. Just a thought. It'll be the weekend until I get get to a thorough review of this PR.

iherman · 2022-10-18T04:19:10Z

@iherman wrote:

Given that we now have two places where the same-ish .ts scripts live, we should consider packaging and releasing your tool via npm, which would allow us to just specify the CSV file here and make publishing a part of a build process. My concern is that things will get out of sync w/ one another quickly. We might ask @davidlehn to package up and release your csv2vocab tool. Just a thought. It'll be the weekend until I get get to a thorough review of this PR.

I can release it on npm (I have done that with other tools before) but I would wait a bit: new things may come up. This version is already an extension of the one I used for the VCM vocabulary, and more may come (see the comment of @TallTed on domain and range). I guess this also means that tool should go to its own repository (which is of course ok).

I must admit I am moderately happy with the tool in the sense that CSV has its limit. We may add new and new column types for various aspects of the vocabulary but then the CSV may become unmanageable, too. (Personally, I would prefer to start with a turtle vocabulary and generate the HTML and JSON-LD from there, but I understand that being able to modify the vocabulary description in an Excel/LibreOffice sheet is more attractive for non-RDF people...)

iherman · 2022-10-18T04:21:23Z

I have not reviewed everything, but thought it worth suggesting use of the softer schema:domainIncludes and/or schema:rangeIncludes than the strictly limiting rdfs:domain and/or rdfs:range for some of the (loose) formalization you described...

The problem is that the loose formalization I refer to are related to classes and not properties. In this respect, I do not think the schema: versions would be helpful...

OR13 · 2022-10-19T15:18:10Z

Looks good, prefer to see this stuff autogenerated.

iherman · 2022-10-19T16:10:56Z

The issue was discussed in a meeting on 2022-10-19

no resolutions were taken

View the transcript

2.4. Create formal version of the security vocabulary (pr vc-data-integrity#65)

See github pull request vc-data-integrity#65.

Manu Sporny: do we want to externalise the libraries for vocabularies.

Ivan Herman: the most important thing is that the whole doc with proper vocabulary was based on CCG document with not a good description.

Manu Sporny: Yes, the CCG document is in a half-baked state... it represents a decade of stuff that's been added. :).

Ivan Herman: there are a number of properties that say 'to be done'.

Manu Sporny: I am happy to do a full detailed review as I have been maintaining the vocabulary for a decade.

msporny

I've completed a review of the generated vocabulary and think it's a good first cut that should be merged. In the worst case, I can cherry-pick the .html file and merge that.

Here are the concerns I found, only one of which should be addressed before /everything/ is merged.

MINOR: The git commit history seems to be duplicated, you might want to rewrite history if you want to preserve it, or I can squash it. Up to you.
MINOR: There is no mention of DataIntegrityProof or cryptosuite in this PR, but we can add it in a future one.
CRITICAL: We should only have the source files checked in for the vocabulary. It's fairly trivial to run auto-generation tools for the documents published to the web and there is no need to pollute the github history with tools and auto-generated files. I think this one is important, but can be done in parallel if we just cherry-pick the .html file for now.
FEATURE REQUEST: Deprecated properties should probably be moved to a deprecated section. There are many values in here now that should be deprecated. We should have a bigger warning for deprecated properties (something like the W3C spec out of date warnings).
FEATURE REQUEST: We should migrate the source file from CSV to YAML, but you already know that. :)
FEATURE REQUEST: We should add the ability to include examples, as (up to date ones) can be helpful for context.

Again, fantastic work, Ivan.

If I don't hear from you before Sunday 3pm CET, my plan is to add the DataIntegrityProof/cryptosuite definitions to the .csv, regenerate the .html file, and then cherry pick the .html file so we at least have that in order to resolve the PR in the vc-data-model.

WDYT?

iherman · 2022-10-22T11:22:25Z

@msporny, all,

I have changed the underlying structure, without touching significantly the technical content. Namely:

Following Create formal version of the security vocabulary #65 (comment) I have factored out the tool into a separate script repository. This means that the same tool can be used for this vocabulary as well as the credentials vocabulary, and there is no danger of discrepancy. At a later step I can publish it on NPM, once we are over this initial round
After some discussions, I have switched from CSV to YAML as the format to define the vocabulary itself. (The rest of the script is unchanged, only the input format has changed.) YAML is (much!) easier to edit than CSV. I do not rely on any fancy feature of YAML (like cross references), just the basic stuff. The YAML file of the vocabulary is the file that should be modified whenever the vocabulary has to be changed.

As for the auto-generation: in theory, I would love to see that once the script is mature. But I would need some help in doing that. I have done such auto-generation via github actions, but this situation is similar insofar as the script should be installed as a separate action tool, and I have no idea how to do that. Help would be welcome on this...

iherman · 2022-10-22T11:36:32Z

@msporny,

Here are the concerns I found, only one of which should be addressed before /everything/ is merged.

MINOR: The git commit history seems to be duplicated, you might want to rewrite history if you want to preserve it, or I can squash it. Up to you.

I do not care too much about the history here, to be honest. If you can squash it, please do (I am not such a wizzard of git...). Thanks.

MINOR: There is no mention of DataIntegrityProof or cryptosuite in this PR, but we can add it in a future one.

I intentionally did not add any new terms, and stuck with the CCG document. (Or did I miss something?) That should be a proper, bona fide PR on the evolution of the vocabulary once the structure is settled.

CRITICAL: We should only have the source files checked in for the vocabulary. It's fairly trivial to run auto-generation tools for the documents published to the web and there is no need to pollute the github history with tools and auto-generated files. I think this one is important, but can be done in parallel if we just cherry-pick the .html file for now.

I am not sure what you mean by "cherry-pick the .html file". The HTML file should not be touched (except the template file, of course), only the YAML file.

As for, essentially, creating an auto-generation of the files via, I presume, github actions, I would love to see that but, as I said in #65 (comment) I am not sure how to do that. Help would be good.

FEATURE REQUEST: Deprecated properties should probably be moved to a deprecated section. There are many values in here now that should be deprecated. We should have a bigger warning for deprecated properties (something like the W3C spec out of date warnings).

Good idea. That would require some coding; I would probably restructure the template file to list, in one section, the 'proper' terms and, in another section, the deprecated ones (if applicable). But that would also mean some coding in the HTML generation part to separate these two categories.

I have raised an issue (w3c/yml2vocab#2) in the tool's repository.

FEATURE REQUEST: We should migrate the source file from CSV to YAML, but you already know that. :)

Already done :-)

FEATURE REQUEST: We should add the ability to include examples, as (up to date ones) can be helpful for context.

I am not sure how to do that, though. I know you can add vanilla JSON content into YAML, so the information can be forwarded to the script, but I believe that should require a complete re-thinking of the HTML format, because this tabular setup that is there (and that we inherited from the earlier version of the credential vocabulary) may not work well for that.

Raised an issue (w3c/yml2vocab#3) for this, too.

msporny

Thank you, Ivan. This is good to go, AFAICT.

dlongley

Thanks! +1 to this happening -- any needed tweaks can follow on as detected.

iherman · 2022-10-23T07:04:11Z

@msporny I would prefer you to do the merge, just to keep in good order... whenever you feel like it.

(Once you did that, I will have to update the security vocabulary v2 to use the same tools. A new PR will come to that effect in a few days.)

msporny · 2022-10-23T21:16:53Z

Editorial, multiple reviews, changes requested and made, no objections, merging.

iherman requested review from msporny, dlongley, pchampin, mprorock and OR13 October 17, 2022 07:29

OR13 approved these changes Oct 19, 2022

View reviewed changes

iherman mentioned this pull request Oct 19, 2022

Add DataIntegrityProof to credentials/v2 JSON-LD Context. w3c/vc-data-model#943

Merged

msporny requested changes Oct 22, 2022

View reviewed changes

iherman mentioned this pull request Oct 22, 2022

Deprecated properties should be better highlighted. w3c/yml2vocab#2

Closed

msporny self-requested a review October 22, 2022 16:48

msporny approved these changes Oct 22, 2022

View reviewed changes

dlongley approved these changes Oct 22, 2022

View reviewed changes

iherman marked this pull request as ready for review October 23, 2022 07:02

iherman added 3 commits October 23, 2022 17:14

Add README.md to security vocabulary.

d043c9b

Add security template and YAML file to security vocabulary.

983d983

Add HTML, JSON-LD, and TURTLE renderings for security vocabulary.

909baf7

msporny force-pushed the vocabulary/create-formal-vocab branch from 483c897 to 909baf7 Compare October 23, 2022 21:15

msporny merged commit 6637f15 into main Oct 23, 2022

msporny deleted the vocabulary/create-formal-vocab branch October 23, 2022 21:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create formal version of the security vocabulary #65

Create formal version of the security vocabulary #65

iherman commented Oct 17, 2022

TallTed commented Oct 17, 2022

msporny commented Oct 17, 2022

iherman commented Oct 18, 2022

iherman commented Oct 18, 2022

OR13 commented Oct 19, 2022

iherman commented Oct 19, 2022

2.4. Create formal version of the security vocabulary (pr vc-data-integrity#65)

msporny left a comment •

edited

iherman commented Oct 22, 2022

iherman commented Oct 22, 2022

msporny left a comment

dlongley left a comment

iherman commented Oct 23, 2022

msporny commented Oct 23, 2022

Create formal version of the security vocabulary #65

Create formal version of the security vocabulary #65

Conversation

iherman commented Oct 17, 2022

Footnotes

TallTed commented Oct 17, 2022

msporny commented Oct 17, 2022

iherman commented Oct 18, 2022

iherman commented Oct 18, 2022

OR13 commented Oct 19, 2022

iherman commented Oct 19, 2022

2.4. Create formal version of the security vocabulary (pr vc-data-integrity#65)

msporny left a comment • edited

Choose a reason for hiding this comment

iherman commented Oct 22, 2022

iherman commented Oct 22, 2022

msporny left a comment

Choose a reason for hiding this comment

dlongley left a comment

Choose a reason for hiding this comment

iherman commented Oct 23, 2022

msporny commented Oct 23, 2022

msporny left a comment •

edited