Is there a way to define additional metadata in JSON-LD @context? #32

Closed
jechterhoff opened this issue Jul 5, 2018 · 33 comments

@jechterhoff

Hello,
I'm currently analysing JSON-LD 1.1 (reviewed the Final Community Group Report) with the goal of semantically enabling JSON data. Is there a way to define additional metadata in a JSON-LD context?

(I hope it is ok to ask this question here. If I should ask somewhere else, for example the mailing list, let me know, please.)

Let me explain what I mean with an example:

{
 "@context": {
  "@base": "http://example.org/baumregister/",
  "@version": 1.1,
  "xsd": "http://www.w3.org/2001/XMLSchema#",
  "geosparql": "http://www.opengis.net/ont/geosparql#",
  "ex": "http://example.org/ontology/flora/",
  "ort": "geosparql:hasGeometry",
  "wkt": {
   "@id": "geosparql:asWKT",
   "@type": "geosparql:wktLiteral"
  },
  "eid": "@id",
  "art": "@type",
  "Eiche": "ex:oak",
  "Walnuss": "ex:walnut",
  "hoehe": {   
   "@id": "ex:height",
   "@type": "xsd:double",
   "@derivedBy": "A specific description ...",
   "uom": "m"
  },
  "alter": {
   "@id": "ex:age",
   "@type": "xsd:integer"
  }
 },
 "art": "Eiche",
 "hoehe": "16",
 "eid": "08218adf-7947-4f28-bcaf-e069ef43e012",
 "alter": 242,
 "ort": {"wkt": "POINT(8.191035 51.899666)"}
}

In this example, the key "hoehe" expands to the IRI http://example.org/ontology/flora/height. In addition, the value type is defined as http://www.w3.org/2001/XMLSchema#double.
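To make that expansion concrete, here is a minimal Python sketch of how the context above turns "hoehe" into a property IRI and a datatype IRI. It only handles simple prefix:suffix compact IRIs and is not a JSON-LD processor (a real one also deals with @vocab, @base, scoped contexts and much more); the helper names expand_compact_iri and expand_term are made up for illustration.

```python
from typing import Optional, Tuple

# Only the parts of the example context needed to resolve "hoehe".
CONTEXT = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/ontology/flora/",
    "hoehe": {"@id": "ex:height", "@type": "xsd:double"},
}


def expand_compact_iri(value: str, context: dict) -> str:
    """Expand 'prefix:suffix' when 'prefix' maps to a plain IRI string in the context."""
    prefix, sep, suffix = value.partition(":")
    mapping = context.get(prefix)
    if sep and isinstance(mapping, str) and not suffix.startswith("//"):
        return mapping + suffix
    return value  # absolute IRI or unknown prefix: left unchanged


def expand_term(term: str, context: dict) -> Tuple[str, Optional[str]]:
    """Return (property IRI, datatype IRI or None) for a term defined in the context."""
    definition = context[term]
    if isinstance(definition, str):  # simple term definition, e.g. "ort": "geosparql:hasGeometry"
        return expand_compact_iri(definition, context), None
    iri = expand_compact_iri(definition["@id"], context)
    datatype = definition.get("@type")
    return iri, (expand_compact_iri(datatype, context) if datatype else None)


print(expand_term("hoehe", CONTEXT))
# ('http://example.org/ontology/flora/height', 'http://www.w3.org/2001/XMLSchema#double')
```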

In the example, I've added two keys as metadata to the definition of term "hoehe":

  • "@derivedBy" - The value is supposed to contain a textual description. I used the leading "@" on purpose for testing.
  • "uom" - An indication of the unit of measure for "hoehe".

When I test this on the JSON-LD dev playground (a great tool, by the way - so helpful!), I get the following syntax errors:

  • For "@derivedBy": Invalid JSON-LD syntax; a term definition must not contain @derivedBy
  • For "uom": Invalid JSON-LD syntax; a term definition must not contain uom

When the two keys are removed, the example "works".
In the JSON-LD 1.1 specification I find the following:

An expanded term definition MUST be a JSON object composed of zero or more keys from @id, @reverse, @type, @language, @context, @prefix, or @container. An expanded term definition SHOULD NOT contain any other keys.

Apparently I should not use the two additional keys, but they do not appear to be explicitly forbidden either. Now I am wondering if the dev playground is too strict or not. Can someone please enlighten me as to where in a JSON-LD context I can add custom keys - if at all - and what restrictions, if any, I need to be aware of? Any additional insight would be highly appreciated.
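The playground's behaviour can be approximated with a small strict check. This is a simplified Python sketch, not the playground's actual code: it takes the allowed keys from the spec sentence quoted above and reports every other key, reusing the wording of the playground's messages. The function name term_definition_errors is hypothetical.

```python
# Keys permitted in an expanded term definition, per the spec sentence quoted above.
ALLOWED_KEYS = {"@id", "@reverse", "@type", "@language", "@context", "@prefix", "@container"}


def term_definition_errors(definition):
    """Return one error message per key that is not allowed in an expanded term definition."""
    if not isinstance(definition, dict):
        return []  # simple (string) term definitions have no keys to check
    return [
        f"Invalid JSON-LD syntax; a term definition must not contain {key}"
        for key in definition
        if key not in ALLOWED_KEYS
    ]


hoehe = {
    "@id": "ex:height",
    "@type": "xsd:double",
    "@derivedBy": "A specific description ...",
    "uom": "m",
}
for message in term_definition_errors(hoehe):
    print(message)
# Invalid JSON-LD syntax; a term definition must not contain @derivedBy
# Invalid JSON-LD syntax; a term definition must not contain uom
```

Note that the check treats "@derivedBy" and "uom" the same way; the distinction between keyword-like and non-keyword-like keys only comes up later in the thread.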

I did come across the following two issues, which may be related to my question:

@gkellogg
Member

We have become more prescriptive about what can be in a context, but could consider issuing warnings for non-keyword-like keys (e.g., "uom" in your example). It might be better to tackle this with a new keyword, such as @comment, which would be explicitly allowed and whose content would be ignored.

@azaroth42
Contributor

@comment only in contexts? Otherwise we should add it to the round tripping discussion.

@BigBlueHat
Member

The uom (Unit of Measure) thing is interesting, but I'm not sure the context is the best place for that--as it seems data related--i.e. you'd want this in the output graph somewhere.

Perhaps something like https://schema.org/QuantitativeValue (or similar) would be useful here?

{
  "hoehe": {
    "@type": "schema:QuantitativeValue",
    "value": 6,
    "unitText": "m"
  }
}

@azaroth42
Contributor

Related to #31 in terms of context adding actual data, which is currently out of scope (and I feel should remain out of scope).

@jechterhoff
Author

Thank you all for your responses and comments so far.

Having @comment as new keyword would be very useful, I think. In my example, it could replace @derivedBy. In my opinion, it would be useful to allow @comment on any level of nesting within a JSON-LD context.

Regarding the "uom" from the example: My use case is to semantically enable JSON data, by adding an external JSON-LD context document as described in section "JSON data as JSON-LD" from the JSON-LD final community group report. I do not intend to add any actual data to the JSON data through the JSON-LD context. That might cause issues with digital signatures of the data.

The idea of having "uom" within the definition of the term "hoehe" in the example was to let any application that processes the context use this application-specific metadata to interpret the height value correctly, without requiring the actual data to contain the unit of measure. However, "uom" is just an example. I was thinking that a term definition could contain any non-keyword-like keys that might be useful for a specific application or community. An application that parses the context should simply ignore such keys when it encounters them in a term definition and does not know them. The JSON-LD playground could issue a warning when it finds such keys, to emphasize what the standard currently defines: that an expanded term definition should not contain any keys other than the keywords defined by the standard. However, if application-specific extensions of term definitions are not explicitly forbidden, the note in the standard could actually be changed to:

An expanded term definition MUST be a JSON object composed of zero or more keys from @id, @reverse, @type, @language, @context, @prefix, or @container. The object MAY contain non-keyword-like keys. Each non-keyword-like key that is unknown to an application that processes the term definition MUST be ignored. The application MAY issue a warning if such a key is encountered.
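A rough Python sketch of the relaxed rule proposed above, under one reading of it: allowed keywords are kept, unknown keyword-like keys (starting with "@") are still rejected, and unknown non-keyword-like keys are dropped with an optional warning. This models the proposal only, not what the spec or the playground does, and apply_relaxed_rule is a made-up name.

```python
import warnings

ALLOWED_KEYS = {"@id", "@reverse", "@type", "@language", "@context", "@prefix", "@container"}


def apply_relaxed_rule(term, definition):
    """Return the term definition with unknown non-keyword-like keys removed."""
    kept = {}
    for key, value in definition.items():
        if key in ALLOWED_KEYS:
            kept[key] = value
        elif key.startswith("@"):
            # Keyword-like keys stay reserved for the spec, so they remain an error.
            raise ValueError(f"a term definition must not contain {key}")
        else:
            # Unknown non-keyword-like keys are ignored; the warning is the optional MAY part.
            warnings.warn(f"ignoring unknown key {key!r} in definition of {term!r}")
    return kept


print(apply_relaxed_rule("hoehe", {"@id": "ex:height", "@type": "xsd:double", "uom": "m"}))
# {'@id': 'ex:height', '@type': 'xsd:double'}  (plus a UserWarning about 'uom')
```

Under this reading, "uom" would pass with a warning, while "@derivedBy" from the original example would still be an error.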

@azaroth42
Contributor

One step toward consensus on this issue is the resolution not to add comment-type data to graph instance data.

The remaining discussion is whether or not to allow comments in contexts and frames. Allowing comments in context data would allow embedded micro-contexts within instance data that contain only a comment (cf. the @language rationale in #5).

WG members with opinions are to add their proposal and rationales to the issue before the call on 2018-08-03.

@msporny
Member

msporny commented Jul 30, 2018

-1 to adding @comment.

I get the use case, but adding a new JSON-LD keyword such as this, while easy from an implementer's perspective, creates added cognitive burden for authors and potentially problematic outcomes (your comment moves to a place you didn't intend it to move to). It's yet another keyword that they have to know.

Comments should be in Vocabulary documents and other human-readable documents. JSON-LD Context files and documents are not intended to be human-readable... especially since we can't place @comment by the item that was originally intended due to not being able to control where it ends up when the JSON object is reserialized.

@azaroth42
Contributor

I am +1 to adding @comment as a reserved keyword, valid in the scope of contexts only. The value of @comment could be any valid JSON construction, and JSON-LD processors MUST ignore the value.

Rationale:

  • All (?) other RDF serializations have syntax level commenting. JSON does not have syntax level commenting, requiring any data not intended for machine processing to be embedded within the same representation in "magic" fields such as @comment. We control the schema for contexts and frames, and thus we are the correct authority to determine where such commentary could be included. This would be bringing JSON-LD up to parity with the functionality of other syntaxes.
  • It gives a consistent place to put comments intended for humans looking at the data, such that they are better able to understand the intent or usage.
  • It gives a consistent place to put metadata about the document, such as creator and date of last modification.
  • It gives a consistent place to put out of band information that is guaranteed not to cause an error or collide with other context-defined mappings. For example, when exporting data from a system, it is useful to include internal metadata not intended for JSON-LD processors but instead for other components in that pipeline. In a generic pipeline, there would be no guarantee that the key chosen did not collide with context-defined mappings. That pipeline might span across systems.

I disagree with @msporny that this adds a significant burden for developers:

  • It is intended to make it easier for developers to understand the data
  • It is (I think) self-evident as to its intent ... more so than some existing keyword usage (e.g. the confusion caused by overloading of @type for both datatype and class experienced by many members of the WG, per the previous call)
  • It would be optional. If the audience of a particular context is solely machines, and no human ever needs to develop using the serialization, then don't put in comments. This is not true for many communities, however, and it would be better to have consistency than all of them using different hacks to work around the syntax level lack of functionality.
  • Unless the comment is intended to be associated with a literal, it can always be within the JSON object, and thus be serialized with it.

I also try to address some arguments against:

  • It doesn't survive round-tripping through other RDF syntaxes when put into the @context of a node within an instance -- Yes, that's intentional, not a bug :) If you need something like that, then put it in the data.
  • JSON shouldn't have comments -- Yet everyone asks for them, and uses them in other machine readable syntaxes.
  • Just put it in the body of the context document -- this works for context documents, but there is the convention of putting the ontology within the body, meaning that the data might collide. Also, it would mean mapping into RDF in order to put arbitrary structure into the data, and defining those terms in the context.
  • Just put it in external documentation -- this is a good computer science solution, but not a good practical one. In this case there is no link to the next level of abstraction from the context document found on the web to the documentation for that context. That problem could be solved separately, but would not meet all of the use cases described above.
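For a processor, "valid in the scope of contexts only, value ignored" could look roughly like this Python sketch: @comment entries are stripped at any depth inside a @context value, whatever JSON value they hold, while @comment in instance data is rejected. It models the proposal as stated in this comment (not adopted by the WG), and strip_context_comments is a hypothetical name.

```python
def _strip_comments(value):
    """Recursively remove '@comment' keys from a context value (any nesting depth)."""
    if isinstance(value, dict):
        return {k: _strip_comments(v) for k, v in value.items() if k != "@comment"}
    if isinstance(value, list):
        return [_strip_comments(v) for v in value]
    return value


def strip_context_comments(node):
    """Drop '@comment' inside @context values; reject it anywhere in the data."""
    if isinstance(node, list):
        return [strip_context_comments(v) for v in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == "@context":
            result[key] = _strip_comments(value)
        elif key == "@comment":
            raise ValueError("@comment is only allowed inside a @context")
        else:
            result[key] = strip_context_comments(value)
    return result


doc = {
    "@context": {
        "@comment": {"author": "J. Doe"},  # any JSON value is allowed and ignored
        "hoehe": {"@id": "http://example.org/ontology/flora/height", "@comment": "tree height"},
    },
    "hoehe": 16,
}
print(strip_context_comments(doc))
# {'@context': {'hoehe': {'@id': 'http://example.org/ontology/flora/height'}}, 'hoehe': 16}
```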

@ajs6f
Member

ajs6f commented Jul 30, 2018

Does anyone know of any serious effort currently afoot to introduce a comment facility for JSON?

@msporny
Member

msporny commented Jul 30, 2018

All (?) other RDF serializations have syntax level commenting. JSON does not have syntax level commenting, requiring any data not intended for machine processing to be embedded within the same representation in "magic" fields such as @comment. We control the schema for contexts and frames, and thus we are the correct authority to determine where such commentary could be included. This would be bringing JSON-LD up to parity with the functionality of other syntaxes.

The point is that the base syntax doesn't have comments. I'll note that JSON still doesn't have comments and developers that use it are just fine without them.

It gives a consistent place to put comments intended for humans looking at the data, such that they are better able to understand the intent or usage.

No, it's not consistent... @comment can find itself anywhere in the object it's included in and can only be used once per object. It doesn't work like a comment that most developers are familiar with (that is, something that sticks next to the code for which it was intended). This @comment can move around depending on the implementation that's reading it. So, calling it @comment is bad because it doesn't actually act like a comment that most developers are used to.
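Both points can be seen with Python's standard json module (other parsers may differ, since RFC 8259 says the behaviour for duplicate member names is unpredictable): a second @comment in the same object silently replaces the first, and re-serializing with sorted keys moves the surviving comment next to a different entry. The comment strings are illustrative.

```python
import json

# An author tries to place one comment after "@id" and another after "@type".
text = """{
  "@id": "ex:height",
  "@comment": "about @id",
  "@type": "xsd:double",
  "@comment": "about @type"
}"""

parsed = json.loads(text)
# Python keeps the last value for a duplicate key, at the first occurrence's position.
print(parsed)
# {'@id': 'ex:height', '@comment': 'about @type', '@type': 'xsd:double'}

# A re-serializer is free to reorder keys; sorting is one common choice.
print(json.dumps(parsed, sort_keys=True))
# {"@comment": "about @type", "@id": "ex:height", "@type": "xsd:double"}
```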

It gives a consistent place to put metadata about the document, such as creator and date of last modification.

How do you know it's metadata about the document instead of metadata about the field it is next to? Won't this just force people to create microsyntaxes for the comment like "Author: J. Doe, Date: 2018-10-19"?

It gives a consistent place to put out of band information that is guaranteed not to cause an error or collide with other context-defined mappings. For example, when exporting data from a system, it is useful to include internal metadata not intended for JSON-LD processors but instead for other components in that pipeline. In a generic pipeline, there would be no guarantee that the key chosen did not collide with context-defined mappings. That pipeline might span across systems.

If the metadata and pipeline spans across systems, it is no longer internal. JSON-LD is the mechanism that can be used for annotating system data. That said, if this is a use case, we should write it up and then see if this new mechanism is the best way to solve it.

Comments are supposed to be metadata that is out of band of the data structure. This suggests that we put something in band of the data structure that is metadata about the data structure. Feels like the wrong level of abstraction.

@msporny
Member

msporny commented Jul 30, 2018

I think a better feature here would be explicit drop support for JSON-LD Processors... as in, you see these terms, drop them... This may be useful when we go to making JSON-LD Signatures throw errors for any unknown value... in some cases, people may not want to map the value, but rather drop the value.

That said, I hesitate to suggest that as a feature at this point in time. The only reason I'm raising it is that it would replace the proposed "@comment" with a more useful feature that can be used for more than just comments.
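One way to picture "explicit drop support" is a pre-processing pass that removes listed terms from the data before it is expanded or signed. In this Python sketch the drop list is taken from terms that the context maps to null, in the spirit of JSON-LD's existing null term mappings; the function names and the "internalId" term are hypothetical.

```python
def terms_to_drop(context):
    """Terms that the context maps to None (JSON null)."""
    return {term for term, definition in context.items() if definition is None}


def drop_terms(node, drop):
    """Recursively remove keys listed in `drop` from a JSON-like structure."""
    if isinstance(node, dict):
        return {k: drop_terms(v, drop) for k, v in node.items() if k not in drop}
    if isinstance(node, list):
        return [drop_terms(v, drop) for v in node]
    return node


context = {"hoehe": "http://example.org/ontology/flora/height", "internalId": None}
data = {
    "hoehe": 16,
    "internalId": "row-42",
    "ort": {"wkt": "POINT(8.191035 51.899666)", "internalId": "geo-7"},
}
print(drop_terms(data, terms_to_drop(context)))
# {'hoehe': 16, 'ort': {'wkt': 'POINT(8.191035 51.899666)'}}
```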

@azaroth42
Contributor

I agree that developers get by without comments in JSON ... clearly. However, the fact that the Stack Overflow question was asked 10 years ago and continues to be active signals to me that it's still of keen interest: https://stackoverflow.com/questions/244777/can-comments-be-used-in-json
That it has been viewed 1.5 million times, with thousands of upvotes, suggests that it's desirable.
The top-ranked workaround is, of course, to create _comment ... which is what we would be standardizing for JSON-LD contexts.

By consistent, I meant (consistently) that it would be the same across contexts. If we don't define it, then thousands of implementers will pick something different to use. I agree it's not consistent with other syntaxes' commenting functionality, and that unordered keys make the comment node unreliable in position. Happy to call it @ignored or @outofband or anything else.

The pipeline use case came up here where the system needs additional information when processing the JSON-LD input to correctly manage it, due to internal-to-the-implementation limitations. I agree that it generates micro-syntaxes, but I would rather these meta- level microsyntaxes live somewhere that is trivial to ignore rather than having to determine if they're real syntax or serialization level annotations.

I could also see the functionality being implemented with @documentation that takes a URI as value. Then humans can go and read the docs, and machines can do conneg to get something useful to them.

@hsolbrig

Comments can also serve as extension points, either explicit, such as appinfo in XML Schema, or by agreed-upon inner syntax, as exemplified by annotations in Java or quoted strings in Python. I wonder whether we couldn't kill two birds with one stone (keeping the namespace clean and allowing extensibility) by allowing any key starting with '@@'?

One might also note that YAML supports comments.
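A rough sketch of how the '@@' idea might be consumed: a tool splits a context into the part a JSON-LD processor sees and the '@@' extension entries, which are kept, at the same path, for applications that understand them. The '@@uom' key and split_extensions are hypothetical; nothing in the spec defines them.

```python
def split_extensions(obj):
    """Split a context object into (core, extensions), recursing into nested objects."""
    core, extensions = {}, {}
    for key, value in obj.items():
        if key.startswith("@@"):
            extensions[key] = value
        elif isinstance(value, dict):
            sub_core, sub_ext = split_extensions(value)
            core[key] = sub_core
            if sub_ext:
                extensions[key] = sub_ext  # keep extensions at the same path
        else:
            core[key] = value
    return core, extensions


context = {
    "ex": "http://example.org/ontology/flora/",
    "hoehe": {"@id": "ex:height", "@type": "http://www.w3.org/2001/XMLSchema#double", "@@uom": "m"},
}
core, extensions = split_extensions(context)
print(core)        # what a JSON-LD processor would be handed
print(extensions)  # {'hoehe': {'@@uom': 'm'}}
```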

@hsolbrig

+1 on azaroth42's suggestion on attaching a URI to documentation -- it would prevent micro-context collision.

@iherman
Member

iherman commented Jul 31, 2018

I continue to have negative feelings about comments in JSON-LD.

Comments in probably all other languages, including the RDF serialization syntaxes, are syntactically very different from the rest, to keep humans from mixing them up with genuine language constructs; hence the use of characters like #, /*, etc. The proposal here is to stay within the framework of the JSON syntax and introduce a language construct for something that, for all intents and purposes, should not be part of the language. I think that may become the source of lots of confusion and misunderstanding.

This issue is not "ours". It is up to the JSON community to solve. After all, we did not invent our own syntax for JSON-LD; we piggybacked it on top of an existing syntax instead. We have to live with the consequences of that; missing comments are not the only nuisance of JSON (has anyone else ever forgotten to remove a trailing , in an array?).

I am also a bit averse to have such a difference in the syntax used for @context and for the rest of the data. That may become an extra source of confusion. We do have a resolution (that I agree with) not to introduce a specific comment syntax for data; I do not see a reason to allow this for contexts.

Bottom line: -1 for me on this issue.

@iherman
Member

iherman commented Jul 31, 2018

One might also note that YAML supports comments.

... and JSON-LD, per our charter, should be automatically available for YAML as well. Maybe we should consider whether JSON-LD processors would accept YAML surface syntax as well, so that people could write context files using YAML... (no, I am not really serious with that idea).

@goofballLogic

-1

We need comments, but this is a JSON problem which we shouldn't attempt to solve with a JSON-LD hack.

I'd be happier with some other keyword indicating a description, because developers may wrongly assume (or may wish) that @comment has no meaning to plain-JSON processors.

@msporny
Member

msporny commented Jul 31, 2018

If we don't define it, then thousands of implementers will pick something different to use.

We've had many years of JSON-LD so far... how many developers are dumping comments into their JSON-LD?

If this is a problem, we should see many attempts at a workaround by this point. We should have some data to back up the need for this feature. It's too easy to add features because a very tiny group of people (the JSON-LD WG members) think it might be useful.

@mixterj

mixterj commented Aug 1, 2018

-1 until there is stricter scoping of this issue. The initial example seems to suggest that the unit of measure was intended to let someone or something process the JSON and derive context for the measurement (that it was in meters). That is very different from simply adding a comment for comment's sake, like "I have no idea why there are so many curly braces here". To @azaroth42 point about other RDF Syntaxes allowing for comments - this is true but they are only ever intended for human reading. JSON is not typically a human-consumed data format, so I do not see what comments would be necessary, unlike in Turtle, where you might read through it like a document. The issue for me comes down to this: is this important for JSON processing or for RDF processing (from JSON-LD)? If it is the former, then I think it is out of scope due to the points made by @iherman and @msporny. If it is the latter, then I am more willing to consider it, but would still have strong reservations based on the popular tendency to process JSON-LD strictly as JSON and not as RDF.

@azaroth42, if there is a use case for something like 'authorship' or 'last updated', that too could be modeled in RDF. It is a bit funky, but we do it in Library Land all the time with Authorities. See this FAST example: http://experimental.worldcat.org/fast/958235/rdf.xml. There is a document that can have a creator and creation date and that points to the intangible thing it focuses on. Again, I am not advocating for this approach, but I think it is cleaner modeling if 'creator' or 'date modified' information is important.

@jechterhoff
Author

Some clarification: the uom in my example was really just to see what is allowed. I am happy to exclude this from the current discussion of introducing a new keyword in context documents only, for human readable consumption.

The early discussion of this issue quickly led to the proposal of having a JSON-LD keyword in a JSON-LD context that is only meant to provide human readable information (in the example I used @derivedBy, then the suggestion of @comment came up, which would be just as fine). The keyword is only intended to be read and used by human developers that need to understand a context document - especially if the developer was not the one who created that document. We have a case where it is not obvious why a specific term is defined in the context as it is. The additional keyword would be used to provide clarification.
It is a pity that JSON itself does not support comments. I am not sure if comments will ever be introduced in the JSON standard. A new JSON-LD keyword, used only in @context documents, would help us. If the name "comment" is difficult, then any other name is fine.

@ajs6f
Member

ajs6f commented Aug 1, 2018

To @azaroth42 point about other RDF Syntaxes allowing for comments - this is true but they are only ever intended for human reading.

I'm not sure this is so at all. N-Triples is about the most machine-centric and least human-readable RDF format you can find, and it has perfectly usable comments.

@mixterj

mixterj commented Aug 1, 2018

@ajs6f yes, I take that point. I would argue that N-Triples are more human-readable than JSON-LD simply because it is just a giant document of, in essence, statements (rudimentary sentences), whereas JSON-LD is a mess of curly braces and brackets that boggle the mind - but that would just be for the sake of arguing ;)

My main points were that:

  • decisions on comments should be carried/driven by the syntax (JSON) not the specific application of the syntax (JSON-ld).

  • I have concerns that comments in JSON-ld might be used for processing of the data, which would not be appropriate. This is particularly true if the comments were processable by a JSON parser regardless of whether a JSON-ld parser ignored them. Though this does NOT sound like it is a valid concern based on @jechterhoff clarification.

So, I am fine with human readable only comments but I do not think it should be the charge of this group to make that decision.

So I am still a -1 right now

@BigBlueHat
Member

I think the proposal of a @comment property only for use in @context objects is not unwarranted. However, I would be concerned that developers would expect it to have meaning and value within a JSON-LD document's "data" space.

In a @context object, my presumption would be that it would relate to the object in which it was used (assuming that an object was used and not just a property/URI pair). Since the @context object is not endlessly hierarchical, then @comment would seem sensible as a property of @context and of any immediate child object of any property.

However, just writing this gives me fears that @comment would begin to be peppered throughout the JSON-LD document space with a wide range of uses for both machines and humans, but in a way that would essentially be para-structural and certainly colloquial.

Sadly, it also doesn't seem to ultimately solve JSON's own lack of commenting constructs because (for all the same reasons), doing that right (and without polluting the data/graph) simply can't be done within a document which only defines a data structure sans comments--i.e. JSON.

tl;dr We should not add comments to JSON-LD for the same reason they were taken out of JSON:

I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.
https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaGSr
+ Hacker News commentary at https://news.ycombinator.com/item?id=3912149

@ajs6f
Member

ajs6f commented Aug 1, 2018

decisions on comments should be carried/driven by the syntax (JSON) not the specific application of the syntax (JSON-ld).

Perhaps +1 folks are thinking of the @context as JSON-LD syntax (with JSON-LD semantics) that, like all JSON-LD syntax, is intentionally entirely within JSON syntax, and -1 folks are thinking of the @context as JSON syntax with JSON-LD semantics?

@tcole3

tcole3 commented Aug 3, 2018

I see ample use cases for wanting to provide information (i.e., context, pardon the semantic collision) about JSON-LD context and framing documents. That said, @comment is misleading since, for developers (as mentioned), it implies an ability to make multiple comments, each about specific lines of code, and to anchor such comments next to those lines of code, none of which is doable in JSON. So, I am -1 to @comment, but I would be open to some other, more narrowly defined key that provides information at the document level about a context or framing document, of potential value to humans.

In particular, I am +1 for @documentation (as suggested by @azaroth42), assuming @documentation has a range of URL and is scoped only to context and framing documents. This would help humans understand more about the source and rationale of the particular context or framing document version being used, something I think would be useful as contexts and framing documents multiply.

A possible limitation of this approach, of course, is that if an instance references multiple context documents, the last @documentation encountered would likely be given priority, which may or may not be the result intended.
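To make that limitation concrete, a small Python sketch: it collects the proposed @documentation value from each context in an "@context" array, checks that it is an absolute http(s) URL (one reading of "a range of URL"), and contrasts keeping only the last link with keeping all of them. The keyword is only a proposal here, documentation_links is a made-up name, and the example.org URLs are placeholders.

```python
from urllib.parse import urlparse


def documentation_links(contexts):
    """Collect the proposed '@documentation' URLs from a list of contexts, in order."""
    links = []
    for ctx in contexts:
        url = ctx.get("@documentation")
        if url is None:
            continue
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"@documentation must be an absolute URL, got {url!r}")
        links.append(url)
    return links


contexts = [
    {"@documentation": "https://example.org/docs/base-context", "ex": "http://example.org/ontology/flora/"},
    {"@documentation": "https://example.org/docs/flora-context", "hoehe": "ex:height"},
]
links = documentation_links(contexts)
print(links[-1])  # "last wins": only the second context's documentation survives
print(links)      # keeping every link preserves both
```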

@ajs6f
Member

ajs6f commented Aug 3, 2018

the last @documentation encountered would likely be given priority, which may or may not be the result intended.

I think this problem might best be solved on the other end of the proposed @documentation link; whatever human- (and hopefully machine-) readable content is on the other end could aggregate other content as appropriate, in some content-type-specific way.

@azaroth42
Contributor

WG Resolutions:

  • We won't add an @comment or similar field to contexts that takes arbitrary constructs intended for comments.
  • We will discuss other proposals for how to reference or embed documentation for the context

Actions before 2018-08-10:

  • @tcole3, @azaroth42: Write up a proposal for a keyword that links to an external resource
  • @BigBlueHat: Write up a proposal to recommend content negotiation on the context document
  • @BigBlueHat, @gkellogg: Write up a proposal to process JSON-LD embedded in HTML as a context document
  • @hsolbrig: Write up a proposal for additional syntax for inline fields
  • Everyone: Comment on the proposals

@azaroth42
Contributor

Closing this issue in favor of the further proposals.
WG Resolution: https://www.w3.org/2018/json-ld-wg/Meetings/Minutes/2018/2018-08-03-json-ld#resolution6

@ghost ghost removed the needs discussion label Aug 4, 2018
@msporny
Member

msporny commented Aug 4, 2018

We won't add an @comment or similar field to contexts that takes arbitrary constructs intended for comments.

+1

We will discuss other proposals for how to reference or embed documentation for the context

+0.5 -- suggest the pointer is to a specification or vocabulary document. We could skip this by adopting a design pattern for constructing JSON-LD Context URLs in which the vocabulary document is provided one level up. For example:

JSON-LD Context: https://w3id.org/security/v1
Documentation (or redirect to documentation): https://w3id.org/security

If we write a normative rule on discovery of JSON-LD Context documentation that does the above, we avoid the need to create yet another reserved word /and/ solve the use case.
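The "one level up" convention could be applied mechanically. A minimal Python sketch, assuming the rule is simply "drop the last path segment"; the comment does not pin down details such as trailing slashes or query strings, so those choices (and the name documentation_url) are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def documentation_url(context_url: str) -> str:
    """Derive a documentation URL by removing the last path segment of a context URL."""
    parts = urlsplit(context_url)
    parent_path = parts.path.rstrip("/").rsplit("/", 1)[0]
    # Query and fragment are dropped; they are assumed not to matter for documentation.
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))


print(documentation_url("https://w3id.org/security/v1"))
# https://w3id.org/security
```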

@BigBlueHat, @gkellogg: Write up a proposal to process JSON-LD embedded in HTML as a context document

Note the many ways this could result in security issues (thinking of financial sector use cases where sender/receiver are switched by a JSON-LD context override and other digital signature shenanigans, specifically).

@gkellogg
Member

gkellogg commented Aug 4, 2018 via email

@msporny
Member

msporny commented Aug 5, 2018

It would be great if you can elaborate on this when the issue gets created. It would seem to be quite similar to content-negotiation re-using the URL for both documentation and context purposes, but with the context embedded in the HTML.

The issue that jumps at me first is one of user-generated context. What happens if there are two contexts in an HTML page... one that the website author intended... and another that was injected via user-generated content? Which one do you use? What if you can get a context through conneg, but the contexts don't match? Which one do you prefer? I don't think these are insurmountable problems... but we should do a security analysis based on using these mechanisms to create digital signatures, as that's where the attacks get dangerously interesting.
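A defensive policy for the "two contexts in one page" case could be as blunt as refusing to pick one. This Python sketch uses the standard html.parser module to collect every <script type="application/ld+json"> block and raises unless the page contains exactly one @context. The policy and the names JsonLdScripts and single_embedded_context are assumptions for illustration, not something proposed in the thread, and a real security analysis would need much more.

```python
import json
from html.parser import HTMLParser


class JsonLdScripts(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._chunks = None  # a list while inside a JSON-LD script, otherwise None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._chunks is not None:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._chunks = None


def single_embedded_context(html):
    """Return the page's only @context, or raise if there are zero or several."""
    parser = JsonLdScripts()
    parser.feed(html)
    parser.close()
    contexts = [b["@context"] for b in parser.blocks if isinstance(b, dict) and "@context" in b]
    if len(contexts) != 1:
        raise ValueError(f"expected exactly one embedded @context, found {len(contexts)}")
    return contexts[0]


page = '<script type="application/ld+json">{"@context": {"hoehe": "http://example.org/ontology/flora/height"}}</script>'
print(single_embedded_context(page))
# {'hoehe': 'http://example.org/ontology/flora/height'}
```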

@BigBlueHat
Member

@msporny when you say "user generated content" are you thinking JSON-LD context documents added via a CMS? or are you referring to some JavaScript-built user generated inline context content? Trying to narrow the scope as I start writing these. 😄

@msporny
Member

msporny commented Aug 6, 2018

@msporny when you say "user generated content" are you thinking JSON-LD context documents added via a CMS?

Yes, this one.

or are you referring to some JavaScript-built user generated inline context content?

I hadn't considered this one... a bit less concerned about it... but it is an attack vector if a client-side JSON-LD tool ever gained traction. For example... LastPass sniffs and modifies every single password page I hit. If it extracted JSON-LD to do stuff, there might be an attack vector there. I think browser extensions are sandboxed from each other, except for the page content they're working on... they share that, making it possible for an attacker to inject something onto the page that makes it possible to pwn another extension.
