
RDFa and JSON-LD are not equivalent #78

Closed
iherman opened this issue Oct 10, 2017 · 12 comments

Comments

@iherman
Member

iherman commented Oct 10, 2017

RDFa and JSON-LD are both serializations of RDF. This means that, when converted to RDF, both should produce equivalent graphs.

However... this does not seem to be the case, at least the way I read it:

  • JSON-LD has a top level items property, which yields, in RDF, one subject (a blank node, actually) that has a number of <items> _:XYZ pairs, where the _:XYZ are blank nodes whose content comes from a specific itemscope
  • RDFa yields a number of _:XYZ triples without any common subject binding them together.

This can be easily solved. Either

  • The JSON-LD structure uses a top level @graph construct, which can be used to specify a number of more or less independent groups of triples with common subjects
  • The RDFa version is extended with artificial HTML code providing the equivalent of the JSON-LD items

I am more in favour of the first approach (see the sketch below), but the second one is also a solution.

(As an aside, the JSON-LD example is incomplete: there is no @context.)

Cc: @gkellogg

@gkellogg
Member

Ivan is correct: for the items property not to create a blank node, the context must include "items": "@graph"; alternatively, the generation of the items entry can simply be eliminated and replaced with "@graph", which is what I did in my implementation.

My implementation produces the following output for the example in the spec:

{
  "@graph": [
    {
      "@context": {
        "@vocab": "https://schema.org/"
      },
      "@type": [
        "https://schema.org/BlogPosting"
      ],
      "headline": [
        "Progress report"
      ],
      "url": [
        {
          "@id": "http://example.com?comments=0"
        }
      ],
      "comment": [
        {
          "@context": {
            "@vocab": "https://schema.org/"
          },
          "@type": [
            "https://schema.org/Comment"
          ],
          "url": [
            {
              "@id": "http://example.com#c1"
            }
          ],
          "creator": [
            {
              "@context": {
                "@vocab": "https://schema.org/"
              },
              "@type": [
                "https://schema.org/Person"
              ],
              "name": [
                "Greg"
              ]
            }
          ],
          "dateCreated": [
            "2013-08-29"
          ]
        }
      ],
      "datePublished": [
        "2013-08-29"
      ]
    }
  ]
}

@chaals
Collaborator

chaals commented Oct 10, 2017

It looks like the example was just copy-pasted from the plain JSON conversion, and is really wrong :(

So it looks like step 4 of the algorithm should say

Add an entry to result called "@graph" whose value is the array items.

@gkellogg, you seem to have done the @context step differently: instead of making it a top-level entry, you added

@context : {
    @vocab :  _vocabulary-identifier_
}  

to each item.

  • Does that matter in getting the JSON-LD right? (I think at first glance the answer is no)
  • Would doing the existing step 5, but with @vocab set to the vocabulary-identifier change the resulting graph? (Again, I think the answer is no)

Seems like we should also clarify, probably at the beginning of the JSON-LD conversion section, that the algorithm is not normatively required to be followed exactly, but that a conversion should produce JSON-LD that represents the same RDF graph.

@gkellogg
Member

Setting @context at the top means that if the vocabulary changes along the way, you won't pick it up. Setting it for each item, while repetitious, does make sure that the vocabulary for each item is picked up if it happens to change.

I believe that setting "@vocab": _vocabulary-identifier_ (when properly quoted) is don't what I do.

@chaals
Collaborator

chaals commented Oct 11, 2017

Setting @context at the top means that if the vocabulary changes along the way, you won't pick it up. Setting it for each item, while repetitious, does make sure that the vocabulary for each item is picked up if it happens to change.

But in microdata, my first instinct is that it cannot change anyway, in which case this protection is unnecessary. (Wondering if I missed something)

I believe that setting "@vocab": vocabulary-identifier (when properly quoted) is don't what I do.

vocabulary-identifier is (meant to be) the URL path for the (first) itemtype. Are you getting something different to use as a value? (I don't understand "don't" in the sentence above)

@gkellogg
Member

But in microdata, my first instinct is that it cannot change anyway, in which case this protection is unnecessary. (Wondering if I missed something)

In section 4.3, the vocabulary identifier is determined based on itemtype, which can certainly change when a new itemscope is encountered; otherwise, you couldn't use properties from different vocabularies. Indeed, the RDFa algorithm is written expecting that the vocabulary identifier can change with each item, which is my read of 4.3 and 5.2:

The item types determine the vocabulary identifier.

I'm not sure where you see that the vocabulary-identifier is only associated with the first itemtype.

@iherman
Member Author

iherman commented Oct 11, 2017

Looking at @gkellogg's generated example tells me that there is a need for a specific @context file for microdata, to make the output simpler and more readable. Lines like

     "url": [
        {
          "@id": "http://example.com?comments=0"
        }
      ],

could then disappear behind the smoke screen of a context. At present, without such a context file, the generated JSON-LD does not seem to be correct...

@iherman
Member Author

iherman commented Oct 11, 2017

@chaals @gkellogg

But in microdata, my first instinct is that it cannot change anyway, in which case this protection is unnecessary. (Wondering if I missed something)

In section 4.3, the vocabulary identifier is determined based on itemtype, which can certainly change when a new itemscope is encountered.

That is correct, but I believe @chaals's comment referred to the @context. While I agree that the vocabulary may change, I guess we can safely say that the @context will not...

@gkellogg
Member

For the vocabulary to be tracked, and for properties to be properly expanded, the @context must change with each item, at least for each item where the current vocabulary changes; otherwise, results will be quite different.

Consider a mixed use of DCMI and schema.org: if the context (and @vocab) is set for one, then items that use an itemtype from the other will no longer come up as part of that vocabulary.

@iherman
Member Author

iherman commented Oct 12, 2017

Ah, true.

Would it be an over-complication in the document if the algorithm included a check on whether @context and/or @vocab is necessary (as far as I can see, a simple check on the DOM tree would be sufficient)?

@gkellogg
Member

Simplest, of course, is to just emit an @context/@vocab for each item, which doesn't require history. (Note that @vocab can only be used within @context.) But it might also be done by passing the current vocabulary into the algorithm and emitting the @context/@vocab whenever the vocabulary identifier differs from the one passed in, including it as part of step 7.2.

@iherman
Member Author

iherman commented Oct 13, 2017

Exactly. I would propose to add that to the algorithm. Because we are generating a proper and, potentially, human-readable serialization of RDF (as opposed to an abstract RDF graph), I think such seemingly minor improvements would help acceptance and deployment.

@chaals
Collaborator

chaals commented Nov 11, 2017

Following discussion at TPAC, our proposal is to remove the JSON-LD conversion: it is just a convenience, and as long as you generate a minimal graph that is the same when converting to an RDF format, it doesn't matter much which format you use...

And I apparently made fewer mistakes in creating the first draft of the RDFa conversion, so we will stick to that. It is also a closer match for the problem space it covers.

chaals pushed a commit that referenced this issue Nov 12, 2017
Fix #78, #80

The JSON-LD direct conversion algorithm is harder to get right, and is
redundant in practice.

Clarify that the "JSON" conversion is explicitly for
`application/microdata+json` and mark it as obsolete but conforming
since it seems to have been removed from current versions of known
implementations.

(And update status)
danbri pushed a commit that referenced this issue Nov 23, 2017
Fix #78, #80