
RDFa and JSON-LD are not equivalent #78

Closed
iherman opened this issue Oct 10, 2017 · 12 comments

Comments

@iherman
Member

iherman commented Oct 10, 2017

RDFa and JSON-LD are both serializations of RDF. This means that, when converted to RDF, both should produce equivalent graphs.

However... this does not seem to be the case, at least the way I read it:

  • JSON-LD has a top level items property, which yields, in RDF, one subject (a blank node, actually) that has a number of <items> _:XYZ pairs, where the _:XYZ are blank nodes whose content comes from a specific itemscope
  • RDFa yields a number of _:XYZ triples without any common subject binding them together.

This can be easily solved. Either

  • The JSON-LD structure uses a top level @graph construct, which can be used to specify a number of more or less independent groups of triples with common subjects
  • The RDFa version is extended with artificial HTML code providing the equivalent of the JSON-LD items

I am more in favour of the first approach (see the sketch below), but the second one is also a solution.

(As an aside, the JSON-LD example is incomplete: there is no @context.)

Cc: @gkellogg

@gkellogg
Member

Ivan is correct: for the items property not to create a blank node, the context must include "items": "@graph"; alternatively, the generation of the items entry can simply be eliminated and replaced with "@graph", which is what I did in my implementation.

My implementation produces the following output for the example in the spec:

{
  "@graph": [
    {
      "@context": {
        "@vocab": "https://schema.org/"
      },
      "@type": [
        "https://schema.org/BlogPosting"
      ],
      "headline": [
        "Progress report"
      ],
      "url": [
        {
          "@id": "http://example.com?comments=0"
        }
      ],
      "comment": [
        {
          "@context": {
            "@vocab": "https://schema.org/"
          },
          "@type": [
            "https://schema.org/Comment"
          ],
          "url": [
            {
              "@id": "http://example.com#c1"
            }
          ],
          "creator": [
            {
              "@context": {
                "@vocab": "https://schema.org/"
              },
              "@type": [
                "https://schema.org/Person"
              ],
              "name": [
                "Greg"
              ]
            }
          ],
          "dateCreated": [
            "2013-08-29"
          ]
        }
      ],
      "datePublished": [
        "2013-08-29"
      ]
    }
  ]
}

@chaals
Collaborator

chaals commented Oct 10, 2017

It looks like the example was just copy-pasted from the plain JSON conversion, and is really wrong :(

So it looks like step 4 of the algorithm should say

Add an entry to result called "@graph" whose value is the array items.

@gkellogg, you seem to have done the @context step differently: instead of making it a top-level entry, you added

@context : {
    @vocab :  _vocabulary-identifier_
}  

to each item.

  • Does that matter in getting the JSON-LD right? (I think at first glance the answer is no)
  • Would doing the existing step 5, but with @vocab set to the vocabulary-identifier change the resulting graph? (Again, I think the answer is no)

Seems like we should also clarify, probably at the beginning of the JSON-LD conversion section, that the algorithm is not normatively required to be followed exactly, but that a conversion should produce JSON-LD that represents the same RDF graph.

@gkellogg
Member

Setting @context at the top means that if the vocabulary changes along the way, you won't pick it up. Setting it for each item, while repetitious, does make sure that the vocabulary for each item is picked up if it happens to change.

I believe that setting "@vocab": _vocabulary-identifier_ (when properly quoted) is don't what I do.

@chaals
Collaborator

chaals commented Oct 11, 2017

Setting @context at the top means that if the vocabulary changes along the way, you won't pick it up. Setting it for each item, while repetitious, does make sure that the vocabulary for each item is picked up if it happens to change.

But in microdata, my first instinct is that it cannot change anyway, in which case this protection is unnecessary. (Wondering if I missed something)

I believe that setting "@vocab": vocabulary-identifier (when properly quoted) is don't what I do.

vocabulary-identifier is (meant to be) the URL path for the (first) itemtype. Are you getting something different to use as a value? (I don't understand "don't" in the sentence above)

@gkellogg
Member

But in microdata, my first instinct is that it cannot change anyway, in which case this protection is unnecessary. (Wondering if I missed something)

In section 4.3, the vocabulary identifier is determined based on itemtype, which can certainly change when a new itemscope is encountered; otherwise, you couldn't use properties from different vocabularies. Indeed, the RDFa algorithm is written expecting that the vocabulary identifier can change with each item, which is my read of 4.3 and 5.2:

The item types determine the vocabulary identifier.

I'm not sure where you see that the vocabulary-identifier is only associated with the first itemtype.

@iherman
Member Author

iherman commented Oct 11, 2017

Looking at @gkellogg's generated example tells me that there is a need for a specific @context file for microdata, to make the output simpler and more readable. Lines like

     "url": [
        {
          "@id": "http://example.com?comments=0"
        }
      ],

could then disappear behind the smoke screen of a context. At present, without such a context file, the generated JSON-LD does not seem to be correct...

@iherman
Member Author

iherman commented Oct 11, 2017

@chaals @gkellogg

But in microdata, my first instinct is that it cannot change anyway, in which case this protection is unnecessary. (Wondering if I missed something)

In section 4.3, the vocabulary identifier is determined based on itemtype, which can certainly change when a new itemscope is encountered.

That is correct, but I believe @chaals's comment referred to the @context. While I agree that the vocabulary may change, I guess we can safely say that the @context will not...

@gkellogg
Member

For the vocabulary to be tracked, and for properties to be properly expanded, the @context must change with each item, at least for each item where the current vocabulary changes; otherwise, results will be quite different.

Consider a mixed use of DCMI and schema.org: if the context (and @vocab) is set for one, then items that use an itemtype from the other will no longer come up as part of that vocabulary.

@iherman
Member Author

iherman commented Oct 12, 2017

Ah, true.

Would it be an over-complication in the document if the algorithm included a check on whether @context and/or @vocab is necessary (as far as I can see, a simple check on the DOM tree would be sufficient)?

@gkellogg
Member

Simplest, of course, is to just emit an @context/@vocab for each item, which doesn't require history. (Note that @vocab can only be used within @context.) But it might also be done by passing the current vocabulary into the algorithm and emitting the @context/@vocab whenever the vocabulary identifier differs from the one passed in, including it as part of step 7.2.

@iherman
Member Author

iherman commented Oct 13, 2017

Exactly. I would propose to add that to the algorithm. Because we are generating a proper and, potentially, human-readable serialization of RDF (as opposed to an abstract RDF graph), I think such seemingly minor improvements would help acceptance and deployment.

@chaals
Collaborator

chaals commented Nov 11, 2017

Following discussion at TPAC, our proposal is to remove the JSON-LD conversion: it is just a convenience, and as long as you generate a minimal graph that is the same when converting to an RDF format, it doesn't matter much which format you use...

And I apparently made fewer mistakes in creating the first draft of the RDFa conversion, so we will stick to that. It is also a closer match for the problem space it covers.

chaals pushed a commit that referenced this issue Nov 12, 2017
Fix #78, #80

The JSON-LD direct conversion algorithm is harder to get right, and is
redundant in practice.

Clarify that the "JSON" conversion is explicitly for
`application/microdata+json` and mark it as obsolete but conforming
since it seems to have been removed from current versions of known
implementations.

(And update status)
danbri pushed a commit that referenced this issue Nov 23, 2017
Fix #78, #80