add JSON-LD conversion #76

chaals · 2017-07-20T11:20:48Z

Note this just uses the changes @lanthaler said were needed back in
2012, so it should be checked before merging.

@lanthaler

fix #29 Note this just uses the changes @lanthaler said were needed back in 2012, so it should be checked before merging.

chaals · 2017-07-20T11:21:58Z

@msporny, @halindrome, @gkellogg, can you take a quick look?


msporny

I only had time to review the raw commits. I don't know if the algorithm is right without implementing it. The output JSON-LD looks a bit off, and I made suggestions on changing that. It feels like you may need a Microdata JSON-LD context that at least contains aliases and entries for things like items and properties.

msporny · 2017-07-20T13:15:58Z

index.html

+   value is the array <var>items</var>.</p></li>
+
+   <li><p>Add an entry to <var>result</var> called "<code>@context</code>" whose value is the following object:</p>
+     <pre class="html">{ "@vocab" : "" }</pre></li>


I don't understand what the injection of this HTML does. Also, if you're converting to JSON-LD, you'll need to specify an @context in the converted JSON-LD. In that context, you might want to alias @id to id and @type to type to make developers' lives easier.

Thinking about this a bit more, perhaps it should be:

'@context': 'https://www.w3.org/ns/microdata/v1'

or maybe this:

'@context': 'https://schema.org/microdata/v1'

@danbri, thoughts?

I don't think aliasing in the default output is particularly useful; after all, the output can always be re-compacted using a different context after the fact.
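To illustrate the point that aliasing can be applied after the fact, here is a minimal Python sketch that renames the `@id` and `@type` keywords to the `id`/`type` aliases proposed above and adds the matching context. It handles keyword aliasing only and is not full JSON-LD compaction; the `ALIASES` mapping and the `rename_keywords`/`compact_keywords` helpers are hypothetical, not part of the spec or of any JSON-LD library.

```python
import json

# Hypothetical alias context, following the review suggestion (id -> @id, type -> @type).
ALIASES = {"id": "@id", "type": "@type"}


def rename_keywords(node, aliases):
    """Recursively replace keyword keys (e.g. "@id") with their aliases (e.g. "id")."""
    reverse = {keyword: alias for alias, keyword in aliases.items()}
    if isinstance(node, list):
        return [rename_keywords(v, aliases) for v in node]
    if isinstance(node, dict):
        return {reverse.get(k, k): rename_keywords(v, aliases) for k, v in node.items()}
    return node


def compact_keywords(node, aliases=ALIASES):
    """Aliased copy of a top-level node; assumes the node has no "@context" of its own."""
    return {"@context": dict(aliases), **rename_keywords(node, aliases)}


item = {"@id": "http://example.com#c1", "@type": ["https://schema.org/Comment"]}
print(json.dumps(compact_keywords(item)))
```

Because the keyword form can be recovered mechanically from the aliased form (and vice versa), the default output can stay unopinionated.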

msporny · 2017-07-20T13:16:40Z

index.html

+<h3>JSON-LD</h3>
+
+  <p>Given a list of nodes <var>nodes</var> in a <code>Document</code>, a user agent must
+  run the following algorithm to extract the Microdata from those nodes into a JSON-LD:</p>


Maybe change to "convert the Microdata from those nodes into a JSON-LD representation".

msporny · 2017-07-20T13:19:48Z

index.html

+
+  </ol>
+
+  <p class="note">This algorithm returns an object with a single property that is an array, instead


If this is the case, put the @context at the very top of this object.
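A minimal sketch of the top-level step, assuming a hypothetical, heavily simplified `Element` model in place of the DOM and a caller-supplied `get_object`. It keeps the PR's `items` array but emits `@context` as the first key, relying on Python dicts preserving insertion order (guaranteed since 3.7). The context value is a placeholder, since which context to use is still being discussed in this thread.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element carrying Microdata attributes."""
    itemscope: bool = False
    itemprop: Optional[str] = None
    itemtype: List[str] = field(default_factory=list)


def is_top_level_item(el: Element) -> bool:
    # A top-level Microdata item has itemscope but no itemprop.
    return el.itemscope and el.itemprop is None


def convert(nodes: List[Element], get_object: Callable[[Element], dict], context: dict) -> dict:
    items = [get_object(el) for el in nodes if is_top_level_item(el)]
    result = {"@context": context}  # inserted first, so it serializes first
    result["items"] = items
    return result


nodes = [
    Element(itemscope=True, itemtype=["https://schema.org/BlogPosting"]),
    Element(itemscope=True, itemprop="creator"),  # nested item, not top-level
    Element(),  # not an item at all
]
result = convert(nodes, lambda el: {"@type": el.itemtype}, {"@vocab": "https://schema.org/"})
print(json.dumps(result))
```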

msporny · 2017-07-20T13:20:48Z

index.html

+   <li><p>Add <var>item</var> to <var>memory</var>.</p></li>
+
+   <li><p>If the <var>item</var> has any <a>item types</a>, add an entry to <var>result</var>
+   called "<code>@type</code>" whose value is an array listing the


You may want to alias @type to type to make JS developers' lives easier, so they can write obj.type instead of obj['@type']; the latter creates sad pandas.

Keep it simple; no need to be opinionated in the flavor of the output here, IMO.


msporny · 2017-07-20T13:21:06Z

index.html

+   <code><a>itemtype</a></code> attribute.</p>
+
+   </li><li><p>If the <var>item</var> has a <a>global identifier</a>, add an entry to
+   <var>result</var> called "<code>@id</code>" whose value is the <a>global


Same here... maybe alias @id to id.

msporny · 2017-07-20T13:22:37Z

index.html

+
+     <li><p>If <var>value</var> is an <a data-lt="concept item">item</a>, then:
+     If <var>value</var> is in <var>memory</var>, then let <var>value</var> be
+     the string "<code>ERROR</code>". Otherwise, <a>get the object</a> for


It feels strange to set a value to the string ERROR. It seems like you should throw an exception, or do something less extreme than throwing but more extreme than returning an 'ERROR' string.

Actually, in this case, shouldn't it be a node reference? After all, the value is an item, and we should just reference that item.

msporny · 2017-07-20T13:23:12Z

index.html

+   URL was <code>http://blog.example.com/progress-report</code>):</p>
+
+   <pre>{
+  "items": [


If this is supposed to be a JSON-LD object, you're missing a @context.

gkellogg

I'll take a crack at implementing this, but as is, it will not generate anything close to the same output as the RDFa mechanism. Also, parsed values (datetime, object, meter, etc.) are not extracted properly. This could potentially be handled by adding entries to an item-level @context, which keeps the actual generated JSON cleaner.
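For reference, here is a sketch of how the Microdata spec's property-value rules, as we read them, pick the value of an `itemprop` element that is not itself an item; this is where the `datetime`, `object` and `meter` values come from. The `(tag, attrs, text)` element model and the `property_value` helper are hypothetical. The returned flag marks URL values so they can later be emitted as `{"@id": ...}`.

```python
from urllib.parse import urljoin

# Elements whose value is a URL, and the attribute it is read from.
URL_ATTRIBUTES = {
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data",
}


def property_value(tag, attrs, text, base):
    """Return (value, is_url) for an itemprop element that is not itself an item."""
    if tag == "meta":
        return attrs.get("content", ""), False
    if tag in URL_ATTRIBUTES:
        raw = attrs.get(URL_ATTRIBUTES[tag])
        # Resolve relative URLs against the document base; a missing attribute gives "".
        return (urljoin(base, raw), True) if raw is not None else ("", False)
    if tag in ("data", "meter"):
        return attrs.get("value", ""), False
    if tag == "time":
        # The datetime attribute if present, otherwise the element's text.
        return attrs.get("datetime", text), False
    return text, False
```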

gkellogg · 2017-07-20T21:36:21Z

index.html

+   <a>top-level Microdata item</a>, and if it is then <a>get the object</a> for that element and add it to <var>items</var>.</p></li>
+
+   <li><p>Add an entry to <var>result</var> called "<code>items</code>" whose
+   value is the array <var>items</var>.</p></li>


Having a top-level entry "items" is not useful unless "items" is aliased to "@graph", which is used to contain top-level node definitions in JSON-LD. But if there is no top-level @context (see note elsewhere), then perhaps result should simply be an array.
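As a sketch of the aliasing option: JSON-LD lets a term alias a keyword, so a context of `{"items": "@graph"}` makes the PR's `items` entry mean `@graph`. The hypothetical `top_level_nodes` helper below shows a consumer recovering the top-level node objects from an aliased document, a plain `@graph` document, or a bare array; it only inspects an inline top-level context.

```python
def top_level_nodes(doc):
    """Return the top-level node objects of a (simplified) JSON-LD document."""
    if isinstance(doc, list):
        return doc  # a bare array of node objects
    ctx = doc.get("@context", {})
    graph_keys = ["@graph"]
    if isinstance(ctx, dict):
        # Any term mapped to the "@graph" keyword acts as an alias for it.
        graph_keys += [term for term, value in ctx.items() if value == "@graph"]
    for key in graph_keys:
        if key in doc:
            return doc[key]
    return [doc]  # a single node object


post = {"@type": ["https://schema.org/BlogPosting"]}
aliased = {"@context": {"items": "@graph"}, "items": [post]}
```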

gkellogg · 2017-07-20T21:38:46Z

index.html

+   <li><p>Add <var>item</var> to <var>memory</var>.</p></li>
+
+   <li><p>If the <var>item</var> has any <a>item types</a>, add an entry to <var>result</var>
+   called "<code>@type</code>" whose value is an array listing the


In fact, you probably need to create an item-level @context, and set @vocab to the vocabulary identifier for the item; otherwise, the properties will not expand to IRIs and will be ignored.
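A minimal sketch of that item-level context, assuming (as in the sample output later in this thread) that the vocabulary is the first item type truncated after its last `/` or `#`. The Microdata spec's own definition of the vocabulary identifier may differ, and both helpers are hypothetical.

```python
def vocabulary_identifier(item_types):
    """Hypothetical rule: first item type, cut after its last "/" or "#"."""
    if not item_types:
        return None
    first = item_types[0]
    cut = max(first.rfind("/"), first.rfind("#"))
    return first[: cut + 1] or None


def start_item_object(item_types):
    # Begin an item's JSON object with a local @vocab so that bare property
    # names expand to IRIs instead of being dropped during JSON-LD processing.
    obj = {}
    vocab = vocabulary_identifier(item_types)
    if vocab:
        obj["@context"] = {"@vocab": vocab}
    if item_types:
        obj["@type"] = list(item_types)
    return obj
```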

gkellogg · 2017-07-20T21:41:59Z

index.html

+       <li><p>If there is no entry named <var>name</var> in <var>result</var>,
+       then add an entry named <var>name</var> to <var>result</var> whose
+       value is an empty array.</p></li>
+


This loses the fact that the value may be a URL relationship, which should be encoded as {"@id": "..."}; alternatively, to simplify the syntax, an entry for name could be created in the item-specific @context with {"@type": "@id"}.
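The two encodings can be sketched side by side. Both helpers are hypothetical; `value` and `is_url` would come from the property-value step, and `result` is the item's JSON object, assumed to carry an inline (dict) `@context` if it has one.

```python
def add_property_expanded(result, name, value, is_url):
    # Option 1: wrap URL values as node references; strings stay plain literals.
    result.setdefault(name, []).append({"@id": value} if is_url else value)


def add_property_coerced(result, name, value, is_url):
    # Option 2: keep the bare string, but declare in the item's @context that
    # this term's values are IRIs. The coercion then applies to every value
    # of `name` in this item, including any plain-text ones.
    if is_url:
        result.setdefault("@context", {})[name] = {"@type": "@id"}
    result.setdefault(name, []).append(value)


a, b = {}, {"@context": {"@vocab": "https://schema.org/"}}
add_property_expanded(a, "url", "http://example.com#c1", True)
add_property_coerced(b, "url", "http://example.com#c1", True)
```

The first option is unambiguous per value; the second keeps the JSON terser, but is only safe when every value of that property in the item is a URL.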

gkellogg · 2017-07-20T21:44:19Z

cc/ @niklasl

gkellogg · 2017-07-20T23:49:22Z

I have an implementation alongside my normal, RDFa-based Microdata parser. It largely follows the algorithm, with the changes from the comments I've previously made.

The property_value logic is a dumbed-down version of the one used in the native RDF parser, so it could fairly easily be extended to obtain datatypes for values and parse numbers.

It allocates an @id using a generated BNode identifier when it encounters a reference to another item that doesn't already have an @id, and allows this to be used when an item is already found in memory.
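A sketch of that behavior, which also covers the earlier suggestion to emit a node reference rather than the string `ERROR` when an item refers back to an ancestor. Items are modeled as hypothetical dicts with optional `id`, `types`, and `props` (a list of `(name, value)` pairs); treating `memory` as the set of the current item's ancestors is an assumption, and the `_:bN` blank node labels are an arbitrary choice.

```python
import itertools


class ObjectBuilder:
    """Hypothetical sketch; items are dicts with optional "id", "types", "props"."""

    def __init__(self):
        self._bnodes = (f"_:b{n}" for n in itertools.count())
        self._objects = {}  # id(item) -> the JSON object built for it

    def _ensure_id(self, item):
        # Give an already-built object an @id, allocating a blank node if needed.
        obj = self._objects[id(item)]
        if "@id" not in obj:
            obj["@id"] = next(self._bnodes)
        return obj["@id"]

    def get_object(self, item, memory=frozenset()):
        obj = {}
        self._objects[id(item)] = obj
        if item.get("id"):
            obj["@id"] = item["id"]  # the item's global identifier
        if item.get("types"):
            obj["@type"] = list(item["types"])
        memory = memory | {id(item)}  # ancestors of any nested items
        for name, value in item.get("props", []):
            if isinstance(value, dict):
                if id(value) in memory:
                    # Back-reference to an ancestor: emit a node reference.
                    value = {"@id": self._ensure_id(value)}
                else:
                    value = self.get_object(value, memory)
            obj.setdefault(name, []).append(value)
        return obj
```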

It creates an item-level @context containing @vocab when it finds a local vocabulary for an item. It probably wouldn't be difficult to avoid this when the parent of the item has the same vocabulary.
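A sketch of that optimization: JSON-LD contexts are inherited by nested node objects, so a nested item only needs its own `@vocab` when its vocabulary differs from the one already in effect. The item shape, with a precomputed `vocab` key, is a hypothetical model for illustration.

```python
def to_json(item, inherited_vocab=None):
    """Hypothetical item shape: {"vocab": str or None, "types": [...], "props": [(name, value)]}."""
    vocab = item.get("vocab") or inherited_vocab
    obj = {}
    if vocab and vocab != inherited_vocab:
        # Only declare @vocab when it differs from the one inherited from the parent.
        obj["@context"] = {"@vocab": vocab}
    if item.get("types"):
        obj["@type"] = list(item["types"])
    for name, value in item.get("props", []):
        if isinstance(value, dict):
            value = to_json(value, vocab)
        obj.setdefault(name, []).append(value)
    return obj
```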

It always uses @graph, but this could be optimized in case there is only a single top-level item.
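The single-item optimization could look like this hypothetical helper, which returns a lone node object directly and otherwise wraps the nodes in `@graph`.

```python
def wrap_top_level(nodes):
    """Emit a single top-level node directly; otherwise use @graph."""
    nodes = list(nodes)
    if len(nodes) == 1:
        return nodes[0]  # any item-level @context travels with the node
    return {"@graph": nodes}
```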

It infers language and base-URL by introspection into the DOM.
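A rough sketch of that introspection, assuming a hypothetical model where an element's ancestry is given as attribute dicts from the element up to the root. Only the `lang` attribute and `<base href>` are considered here; other HTML language and base sources are ignored.

```python
from urllib.parse import urljoin


def inherited_language(ancestors):
    """Nearest lang attribute wins; lang="" means the language is unknown."""
    for attrs in ancestors:
        if "lang" in attrs:
            return attrs["lang"] or None
    return None


def document_base(document_url, base_href=None):
    # A <base href> is resolved against the document URL; without one, the
    # document URL itself is the base.
    return urljoin(document_url, base_href) if base_href else document_url


def text_value(text, language):
    # Plain strings carry no language; a value object can.
    return {"@value": text, "@language": language} if language else text
```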

Edit: it also trims whitespace around values, which at least makes the output look a bit better.

gkellogg · 2017-07-20T23:53:12Z

For consideration, here's my output for your example:

{
  "@graph": [
    {
      "@context": {"@vocab": "https://schema.org/"},
      "@type": ["https://schema.org/BlogPosting"],
      "headline": ["Progress report"],
      "url": [{"@id": "http://example.com?comments=0"}],
      "comment": [
        {
          "@context": {"@vocab": "https://schema.org/"},
          "@type": ["https://schema.org/Comment"],
          "url": [{"@id": "http://example.com#c1"}],
          "creator": [
            {
              "@context": {"@vocab": "https://schema.org/"},
              "@type": ["https://schema.org/Person"],
              "name": ["Greg"]
            }
          ],
          "dateCreated": ["2013-08-29"]
        }
      ],
      "datePublished": ["2013-08-29"]
    }
  ]
}

chaals · 2017-07-21T23:10:34Z

Quick reaction: my sincere thanks, @gkellogg and @msporny, for the work and comments. The one that struck me even before asking was the vocab, but it looks like there is more to deal with. Given that microdata only has two datatypes (string and URL), I'm going to leave it that you can (and, in a grown-up format, probably should) infer that, for example, a datetime is actually a date, without requiring that in the microdata spec. I'm pushing to get this done ASAP, but it's not the only priority in the world, so I hope to have sensible replies over the weekend.
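As an illustration of the kind of inference a consumer could layer on top, without the microdata spec requiring it, here is a hypothetical post-processing step that tags ISO 8601 date and date-time strings with XSD datatypes and leaves all other strings untouched. The regular expressions are deliberately narrow assumptions, not the full HTML `datetime` grammar, and only the calendar-date part is validated.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
# xsd:dateTime requires seconds; the time zone is optional.
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?")


def _valid_calendar_date(value):
    try:
        date.fromisoformat(value[:10])  # rejects e.g. month 13
        return True
    except ValueError:
        return False


def typed_literal(value):
    """Return a JSON-LD value object for dates/dateTimes, else the string unchanged."""
    if DATE_RE.fullmatch(value) and _valid_calendar_date(value):
        return {"@value": value, "@type": XSD + "date"}
    if DATETIME_RE.fullmatch(value) and _valid_calendar_date(value):
        return {"@value": value, "@type": XSD + "dateTime"}
    return value
```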

danbri

As discussed, this looks plausible and solid to me and ought to go into the spec for implementer sanity checking.

danbri · 2017-09-06T19:58:27Z

On the context question, if the generated output were to always explicitly mark literals as ID or literal/text, would that minimize or remove the need for context declarations?

gkellogg · 2017-09-06T22:43:55Z

Yes, you can always generate expanded nodes/literals. However, if you know enough to use "@vocab": "http://schema.org", you can also take advantage of using it as a context and assume all of its term definitions.
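A sketch of the fully explicit alternative danbri asks about: with a single known vocabulary, property names can be expanded to absolute IRIs and every plain literal wrapped in `{"@value": ...}`, after which no `@context` is needed. The `expand_with_vocab` helper is hypothetical, assumes one vocabulary for the whole tree and already-absolute `@type` values (as in the sample output above), and is not the JSON-LD expansion algorithm.

```python
def expand_with_vocab(node, vocab):
    """Rewrite a node so it needs no @context (single-vocabulary assumption)."""
    out = {}
    for key, value in node.items():
        if key == "@context":
            continue  # nothing left to resolve once everything is explicit
        if key.startswith("@"):
            out[key] = value  # keywords such as @id, @type, @value pass through
            continue
        iri = key if ":" in key else vocab + key  # leave absolute IRIs alone
        values = value if isinstance(value, list) else [value]
        out[iri] = [
            expand_with_vocab(v, vocab) if isinstance(v, dict) else {"@value": v}
            for v in values
        ]
    return out
```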

add JSON-LD conversion

42f0364

fix #29 Note this just uses the changes @lanthaler said were needed back in 2012, so it should be checked before merging.

chaals requested a review from danbri July 20, 2017 11:21

chaals mentioned this pull request Jul 20, 2017

No description of how numeric property values are obtained. #62

Closed

fix respect issue

50fa4dc

was generating two definitions with the same name / identifiers

msporny suggested changes Jul 20, 2017


gkellogg suggested changes Jul 20, 2017


danbri approved these changes Sep 6, 2017


danbri merged commit 72d85b8 into gh-pages Sep 6, 2017


