New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create formal version of the security vocabulary #65
Conversation
I have not reviewed everything, but thought it worth suggesting use of the softer |
@iherman wrote:
Ivan, this is fantastic, thank you! I have not had a chance to dive into the details yet. Given that we now have two places where the same-ish .ts scripts live, we should consider packaging and releasing your tool via npm, which would allow us to just specify the CSV file here and make publishing a part of a build process. My concern is that things will get out of sync w/ one another quickly. We might ask @davidlehn to package up and release your |
I can release it on npm (I have done that with other tools before) but I would wait a bit: new things may come up. This version is already an extension of the one I used for the VCM vocabulary, and more may come (see the comment of @TallTed on domain and range). I guess this also means that tool should go to its own repository (which is of course ok). I must admit I am moderately happy with the tool in the sense that CSV has its limit. We may add new and new column types for various aspects of the vocabulary but then the CSV may become unmanageable, too. (Personally, I would prefer to start with a turtle vocabulary and generate the HTML and JSON-LD from there, but I understand that being able to modify the vocabulary description in an Excel/LibreOffice sheet is more attractive for non-RDF people...) |
The problem is that the loose formalization I refer to are related to classes and not properties. In this respect, I do not think the |
Looks good, prefer to see this stuff autogenerated. |
The issue was discussed in a meeting on 2022-10-19
View the transcript2.4. Create formal version of the security vocabulary (pr vc-data-integrity#65)See github pull request vc-data-integrity#65. Manu Sporny: do we want to externalise the libraries for vocabularies. Ivan Herman: the most important thing is that the whole doc with proper vocabulary was based on CCG document with not a good description.
Ivan Herman: there are a number of properties that say 'to be done'. Manu Sporny: I am happy to do a full detailed review as I have been maintaining the vocabulary for a decade. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've completed a review of the generated vocabulary and think it's a good first cut that should be merged. In the worst case, I can cherry-pick the .html file and merge that.
Here are the concerns I found, only one of which should be addressed before /everything/ is merged.
- MINOR: The git commit history seems to be duplicated, you might want to rewrite history if you want to preserve it, or I can squash it. Up to you.
- MINOR: There is no mention of
DataIntegrityProof
orcryptosuite
in this PR, but we can add it in a future one. - CRITICAL: We should only have the source files checked in for the vocabulary. It's fairly trivial to run auto-generation tools for the documents published to the web and there is no need to pollute the github history with tools and auto-generated files. I think this one is important, but can be done in parallel if we just cherry-pick the .html file for now.
- FEATURE REQUEST: Deprecated properties should probably be moved to a deprecated section. There are many values in here now that should be deprecated. We should have a bigger warning for deprecated properties (something like the W3C spec out of date warnings).
- FEATURE REQUEST: We should migrate the source file from CSV to YAML, but you already know that. :)
- FEATURE REQUEST: We should add the ability to include examples, as (up to date ones) can be helpful for context.
Again, fantastic work, Ivan.
If I don't hear from you before Sunday 3pm CET, my plan is to add the DataIntegrityProof/cryptosuite definitions to the .csv, regenerate the .html file, and then cherry pick the .html file so we at least have that in order to resolve the PR in the vc-data-model.
WDYT?
@msporny, all, I have changed the underlying structure, without touching significantly the technical content. Namely:
As for the auto-generation: in theory, I would love to see that once the script is mature. But I would need some help in doing that. I have done such auto-generation via github actions, but this situation is similar insofar as the script should be installed as a separate action tool, and I have no idea how to do that. Help would be welcome on this... |
I do not care too much about the history here, to be honest. If you can squash it, please do (I am not such a wizzard of git...). Thanks.
I intentionally did not add any new terms, and stuck with the CCG document. (Or did I miss something?) That should be a proper, bona fide PR on the evolution of the vocabulary once the structure is settled.
I am not sure what you mean by "cherry-pick the .html file". The HTML file should not be touched (except the template file, of course), only the YAML file. As for, essentially, creating an auto-generation of the files via, I presume, github actions, I would love to see that but, as I said in #65 (comment) I am not sure how to do that. Help would be good.
Good idea. That would require some coding; I would probably restructure the template file to list, in one section, the 'proper' terms and, in another section, the deprecated ones (if applicable). But that would also mean some coding in the HTML generation part to separate these two categories. I have raised an issue (w3c/yml2vocab#2) in the tool's repository.
Already done :-)
I am not sure how to do that, though. I know you can add vanilla JSON content into YAML, so the information can be forwarded to the script, but I believe that should require a complete re-thinking of the HTML format, because this tabular setup that is there (and that we inherited from the earlier version of the credential vocabulary) may not work well for that. Raised an issue (w3c/yml2vocab#3) for this, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, Ivan. This is good to go, AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! +1 to this happening -- any needed tweaks can follow on as detected.
@msporny I would prefer you to do the merge, just to keep in good order... whenever you feel like it. (Once you did that, I will have to update the security vocabulary v2 to use the same tools. A new PR will come to that effect in a few days.) |
483c897
to
909baf7
Compare
Editorial, multiple reviews, changes requested and made, no objections, merging. |
I have recently worked on the security vocabulary, namely to create a formal RDF vocabulary (similar to the the data model vocabulary). I started with1 and I created the same structure as for the data model, re-using the tools I had for PR in2. The first version on all this is this PR with the relevant files on3 with the CSV, Turtle, JSON-LD, and HTML files; the latter is better viewed at4.
This is definitely a draft PR at this point, because I am sure there are a bunch of mistakes. The fact is that my domain knowledge for 1 is poor, so I had to try to make sense of what is in that document. I did copy the descriptions, references, etc, into the vocabulary, to increase the quality of the documentation. I also did hit a bunch of issues, though.
Before diving in my issues, though: all this is a draft for now. I would propose not to answer all the issues below in this PR round except, maybe, (1); once the structure of the document (and the generating process thereof) are agreed upon, we can do this in a series of separate, subsequent PRs. I would, however, hope to handle this PR swiftly. Indeed, the discussion in, e.g., w3c/vc-data-model#943 is ongoing and, I believe, it is an example for a set of problems that should be handled in sync with the vocabulary specification. Put it another way, in my view the development of the
@context
file(s) and the vocabulary specifications should go hand in hand. If we agree on the vocabulary structure and tool-set then we do have the framework to do that properly. That is the essence of this PR, actually.As for my issues/questions:
Many of the subclass, range and domain statements are not explicit in1 (and, actually, I believe some of the entries were wrong, mixing up ranges and domains). I tried to deduce these based on the text, the examples, etc. These should be cross-checked to get everything right.
1 has some texts for some of the classes of the sort "expected properties are...". I do not think it is possible to express that in RDFS or OWL, i.e., I am not sure what we should do with those. Of course, the range statements of the relevant properties may provide that information, but they do not express formal constraints and expectations. Maybe the right approach is to put those types of constraints into a JSON Schema for a VC (but I know that the security vocabulary is not only for VCs, so that is not proper approach either). Another possibility is to add these remarks, if important, to the end of the "label", ie, the longer description (I am happy to do that if that is the consensus).
I of course could not convert the examples into the vocabulary. I am not sure what would be the fate of those. I believe the best place for those examples would either be the spec5, or maybe a separate guide. To be discussed.
I have not found a way to add to the vocabulary the statement on property ranges which says "xsd:string or a URL". In terms of RDF this roughly means "range can be anything except for Blank Nodes" (roughly, because there is a constraint on the string's datatype, too). It is possible to express this in OWL (for the aficionados this would be something like
<…> rdfs:range [owl:unionOf (xsd:string [owl:complementOf rdfs:Literal])]
) but no existing OWL reasoner could handle this for reasons that go beyond this discussion. I.e., it is useless. I would propose to ignore this but I would also seriously look at those properties whether these types of statements are o.k. in the first place. From an abstract vocabulary point of view I think a range to URL's sounds better…Of course, it would be necessary to review the term descriptions themselves editorially. E.g., I am not sure what the difference is between 'data signature suite' and a 'data signature'; some terms have an extensive description and others are just a sketch (actually a number of descriptions are missing and should be completed). But that is all right; once we have the structure in place, we can discuss that.
B.t.w. I have modified the script in 2 to add a separate "see also" column in the CSV file. This means that, in turtle and json-ld, all the references we had in the explanation texts of1 are separated and appear as, possibly several,
rdfs:seeAlso
statements. The HTML version4 makes those explicit as well.B.t.w. 2 you may have realized that I could express the fact that some ranges are denoted as "IRI". But this does not translate into a real
rdfs:range
statement rather that the property is (also) of classowl:ObjectProperty
. To make it symmetric, if a property's range is marked as, say,xsd:string
, than it is also set to be of classowl:DatatypeProperty
. This is just to make it look nice…Footnotes
https://w3c-ccg.github.io/security-vocab/ ↩ ↩2 ↩3 ↩4 ↩5
https://github.com/w3c/vc-data-model/pull/936 ↩ ↩2
https://github.com/w3c/vc-data-integrity/tree/vocabulary/create-formal-vocab/vocab/security ↩
https://raw.githack.com/w3c/vc-data-integrity/vocabulary/create-formal-vocab/vocab/security/vocabulary.html ↩ ↩2
https://w3c.github.io/vc-data-integrity/ ↩