Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create formal version of the security vocabulary #65

Merged
merged 3 commits into from Oct 23, 2022

Conversation

iherman
Copy link
Member

@iherman iherman commented Oct 17, 2022

I have recently worked on the security vocabulary, namely to create a formal RDF vocabulary (similar to the the data model vocabulary). I started with1 and I created the same structure as for the data model, re-using the tools I had for PR in2. The first version on all this is this PR with the relevant files on3 with the CSV, Turtle, JSON-LD, and HTML files; the latter is better viewed at4.

This is definitely a draft PR at this point, because I am sure there are a bunch of mistakes. The fact is that my domain knowledge for 1 is poor, so I had to try to make sense of what is in that document. I did copy the descriptions, references, etc, into the vocabulary, to increase the quality of the documentation. I also did hit a bunch of issues, though.

Before diving in my issues, though: all this is a draft for now. I would propose not to answer all the issues below in this PR round except, maybe, (1); once the structure of the document (and the generating process thereof) are agreed upon, we can do this in a series of separate, subsequent PRs. I would, however, hope to handle this PR swiftly. Indeed, the discussion in, e.g., w3c/vc-data-model#943 is ongoing and, I believe, it is an example for a set of problems that should be handled in sync with the vocabulary specification. Put it another way, in my view the development of the @context file(s) and the vocabulary specifications should go hand in hand. If we agree on the vocabulary structure and tool-set then we do have the framework to do that properly. That is the essence of this PR, actually.

As for my issues/questions:

  1. Many of the subclass, range and domain statements are not explicit in1 (and, actually, I believe some of the entries were wrong, mixing up ranges and domains). I tried to deduce these based on the text, the examples, etc. These should be cross-checked to get everything right.

  2. 1 has some texts for some of the classes of the sort "expected properties are...". I do not think it is possible to express that in RDFS or OWL, i.e., I am not sure what we should do with those. Of course, the range statements of the relevant properties may provide that information, but they do not express formal constraints and expectations. Maybe the right approach is to put those types of constraints into a JSON Schema for a VC (but I know that the security vocabulary is not only for VCs, so that is not proper approach either). Another possibility is to add these remarks, if important, to the end of the "label", ie, the longer description (I am happy to do that if that is the consensus).

  3. I of course could not convert the examples into the vocabulary. I am not sure what would be the fate of those. I believe the best place for those examples would either be the spec5, or maybe a separate guide. To be discussed.

  4. I have not found a way to add to the vocabulary the statement on property ranges which says "xsd:string or a URL". In terms of RDF this roughly means "range can be anything except for Blank Nodes" (roughly, because there is a constraint on the string's datatype, too). It is possible to express this in OWL (for the aficionados this would be something like <…> rdfs:range [owl:unionOf (xsd:string [owl:complementOf rdfs:Literal])]) but no existing OWL reasoner could handle this for reasons that go beyond this discussion. I.e., it is useless. I would propose to ignore this but I would also seriously look at those properties whether these types of statements are o.k. in the first place. From an abstract vocabulary point of view I think a range to URL's sounds better…

  5. Of course, it would be necessary to review the term descriptions themselves editorially. E.g., I am not sure what the difference is between 'data signature suite' and a 'data signature'; some terms have an extensive description and others are just a sketch (actually a number of descriptions are missing and should be completed). But that is all right; once we have the structure in place, we can discuss that.

    B.t.w. I have modified the script in 2 to add a separate "see also" column in the CSV file. This means that, in turtle and json-ld, all the references we had in the explanation texts of1 are separated and appear as, possibly several, rdfs:seeAlso statements. The HTML version4 makes those explicit as well.

    B.t.w. 2 you may have realized that I could express the fact that some ranges are denoted as "IRI". But this does not translate into a real rdfs:range statement rather that the property is (also) of class owl:ObjectProperty. To make it symmetric, if a property's range is marked as, say, xsd:string, than it is also set to be of class owl:DatatypeProperty. This is just to make it look nice…

Footnotes

  1. https://w3c-ccg.github.io/security-vocab/ 2 3 4 5

  2. https://github.com/w3c/vc-data-model/pull/936 2

  3. https://github.com/w3c/vc-data-integrity/tree/vocabulary/create-formal-vocab/vocab/security

  4. https://raw.githack.com/w3c/vc-data-integrity/vocabulary/create-formal-vocab/vocab/security/vocabulary.html 2

  5. https://w3c.github.io/vc-data-integrity/

@TallTed
Copy link
Member

TallTed commented Oct 17, 2022

I have not reviewed everything, but thought it worth suggesting use of the softer schema:domainIncludes and/or schema:rangeIncludes than the strictly limiting rdfs:domain and/or rdfs:range for some of the (loose) formalization you described...

@msporny
Copy link
Member

msporny commented Oct 17, 2022

@iherman wrote:

I have recently worked on the security vocabulary, namely to create a formal RDF vocabulary (similar to the the data model vocabulary).

Ivan, this is fantastic, thank you! I have not had a chance to dive into the details yet.

Given that we now have two places where the same-ish .ts scripts live, we should consider packaging and releasing your tool via npm, which would allow us to just specify the CSV file here and make publishing a part of a build process. My concern is that things will get out of sync w/ one another quickly. We might ask @davidlehn to package up and release your csv2vocab tool. Just a thought. It'll be the weekend until I get get to a thorough review of this PR.

@iherman
Copy link
Member Author

iherman commented Oct 18, 2022

@iherman wrote:

Given that we now have two places where the same-ish .ts scripts live, we should consider packaging and releasing your tool via npm, which would allow us to just specify the CSV file here and make publishing a part of a build process. My concern is that things will get out of sync w/ one another quickly. We might ask @davidlehn to package up and release your csv2vocab tool. Just a thought. It'll be the weekend until I get get to a thorough review of this PR.

I can release it on npm (I have done that with other tools before) but I would wait a bit: new things may come up. This version is already an extension of the one I used for the VCM vocabulary, and more may come (see the comment of @TallTed on domain and range). I guess this also means that tool should go to its own repository (which is of course ok).

I must admit I am moderately happy with the tool in the sense that CSV has its limit. We may add new and new column types for various aspects of the vocabulary but then the CSV may become unmanageable, too. (Personally, I would prefer to start with a turtle vocabulary and generate the HTML and JSON-LD from there, but I understand that being able to modify the vocabulary description in an Excel/LibreOffice sheet is more attractive for non-RDF people...)

@iherman
Copy link
Member Author

iherman commented Oct 18, 2022

I have not reviewed everything, but thought it worth suggesting use of the softer schema:domainIncludes and/or schema:rangeIncludes than the strictly limiting rdfs:domain and/or rdfs:range for some of the (loose) formalization you described...

The problem is that the loose formalization I refer to are related to classes and not properties. In this respect, I do not think the schema: versions would be helpful...

@OR13
Copy link
Contributor

OR13 commented Oct 19, 2022

Looks good, prefer to see this stuff autogenerated.

@iherman
Copy link
Member Author

iherman commented Oct 19, 2022

The issue was discussed in a meeting on 2022-10-19

  • no resolutions were taken
View the transcript

2.4. Create formal version of the security vocabulary (pr vc-data-integrity#65)

See github pull request vc-data-integrity#65.

Manu Sporny: do we want to externalise the libraries for vocabularies.

Ivan Herman: the most important thing is that the whole doc with proper vocabulary was based on CCG document with not a good description.

Manu Sporny: Yes, the CCG document is in a half-baked state... it represents a decade of stuff that's been added. :).

Ivan Herman: there are a number of properties that say 'to be done'.

Manu Sporny: I am happy to do a full detailed review as I have been maintaining the vocabulary for a decade.

Copy link
Member

@msporny msporny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've completed a review of the generated vocabulary and think it's a good first cut that should be merged. In the worst case, I can cherry-pick the .html file and merge that.

Here are the concerns I found, only one of which should be addressed before /everything/ is merged.

  • MINOR: The git commit history seems to be duplicated, you might want to rewrite history if you want to preserve it, or I can squash it. Up to you.
  • MINOR: There is no mention of DataIntegrityProof or cryptosuite in this PR, but we can add it in a future one.
  • CRITICAL: We should only have the source files checked in for the vocabulary. It's fairly trivial to run auto-generation tools for the documents published to the web and there is no need to pollute the github history with tools and auto-generated files. I think this one is important, but can be done in parallel if we just cherry-pick the .html file for now.
  • FEATURE REQUEST: Deprecated properties should probably be moved to a deprecated section. There are many values in here now that should be deprecated. We should have a bigger warning for deprecated properties (something like the W3C spec out of date warnings).
  • FEATURE REQUEST: We should migrate the source file from CSV to YAML, but you already know that. :)
  • FEATURE REQUEST: We should add the ability to include examples, as (up to date ones) can be helpful for context.

Again, fantastic work, Ivan.

If I don't hear from you before Sunday 3pm CET, my plan is to add the DataIntegrityProof/cryptosuite definitions to the .csv, regenerate the .html file, and then cherry pick the .html file so we at least have that in order to resolve the PR in the vc-data-model.

WDYT?

@iherman
Copy link
Member Author

iherman commented Oct 22, 2022

@msporny, all,

I have changed the underlying structure, without touching significantly the technical content. Namely:

  1. Following Create formal version of the security vocabulary #65 (comment) I have factored out the tool into a separate script repository. This means that the same tool can be used for this vocabulary as well as the credentials vocabulary, and there is no danger of discrepancy. At a later step I can publish it on NPM, once we are over this initial round
  2. After some discussions, I have switched from CSV to YAML as the format to define the vocabulary itself. (The rest of the script is unchanged, only the input format has changed.) YAML is (much!) easier to edit than CSV. I do not rely on any fancy feature of YAML (like cross references), just the basic stuff. The YAML file of the vocabulary is the file that should be modified whenever the vocabulary has to be changed.

As for the auto-generation: in theory, I would love to see that once the script is mature. But I would need some help in doing that. I have done such auto-generation via github actions, but this situation is similar insofar as the script should be installed as a separate action tool, and I have no idea how to do that. Help would be welcome on this...

@iherman
Copy link
Member Author

iherman commented Oct 22, 2022

@msporny,

Here are the concerns I found, only one of which should be addressed before /everything/ is merged.

  • MINOR: The git commit history seems to be duplicated, you might want to rewrite history if you want to preserve it, or I can squash it. Up to you.

I do not care too much about the history here, to be honest. If you can squash it, please do (I am not such a wizzard of git...). Thanks.

  • MINOR: There is no mention of DataIntegrityProof or cryptosuite in this PR, but we can add it in a future one.

I intentionally did not add any new terms, and stuck with the CCG document. (Or did I miss something?) That should be a proper, bona fide PR on the evolution of the vocabulary once the structure is settled.

  • CRITICAL: We should only have the source files checked in for the vocabulary. It's fairly trivial to run auto-generation tools for the documents published to the web and there is no need to pollute the github history with tools and auto-generated files. I think this one is important, but can be done in parallel if we just cherry-pick the .html file for now.

I am not sure what you mean by "cherry-pick the .html file". The HTML file should not be touched (except the template file, of course), only the YAML file.

As for, essentially, creating an auto-generation of the files via, I presume, github actions, I would love to see that but, as I said in #65 (comment) I am not sure how to do that. Help would be good.

  • FEATURE REQUEST: Deprecated properties should probably be moved to a deprecated section. There are many values in here now that should be deprecated. We should have a bigger warning for deprecated properties (something like the W3C spec out of date warnings).

Good idea. That would require some coding; I would probably restructure the template file to list, in one section, the 'proper' terms and, in another section, the deprecated ones (if applicable). But that would also mean some coding in the HTML generation part to separate these two categories.

I have raised an issue (w3c/yml2vocab#2) in the tool's repository.

  • FEATURE REQUEST: We should migrate the source file from CSV to YAML, but you already know that. :)

Already done :-)

  • FEATURE REQUEST: We should add the ability to include examples, as (up to date ones) can be helpful for context.

I am not sure how to do that, though. I know you can add vanilla JSON content into YAML, so the information can be forwarded to the script, but I believe that should require a complete re-thinking of the HTML format, because this tabular setup that is there (and that we inherited from the earlier version of the credential vocabulary) may not work well for that.

Raised an issue (w3c/yml2vocab#3) for this, too.

@msporny msporny self-requested a review October 22, 2022 16:48
Copy link
Member

@msporny msporny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Ivan. This is good to go, AFAICT.

Copy link
Contributor

@dlongley dlongley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! +1 to this happening -- any needed tweaks can follow on as detected.

@iherman iherman marked this pull request as ready for review October 23, 2022 07:02
@iherman
Copy link
Member Author

iherman commented Oct 23, 2022

@msporny I would prefer you to do the merge, just to keep in good order... whenever you feel like it.

(Once you did that, I will have to update the security vocabulary v2 to use the same tools. A new PR will come to that effect in a few days.)

@msporny msporny force-pushed the vocabulary/create-formal-vocab branch from 483c897 to 909baf7 Compare October 23, 2022 21:15
@msporny
Copy link
Member

msporny commented Oct 23, 2022

Editorial, multiple reviews, changes requested and made, no objections, merging.

@msporny msporny merged commit 6637f15 into main Oct 23, 2022
@msporny msporny deleted the vocabulary/create-formal-vocab branch October 23, 2022 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants