Skip to content
This repository has been archived by the owner on Nov 11, 2019. It is now read-only.

add JSON-LD conversion #76

Merged
merged 2 commits into from
Sep 6, 2017
Merged

add JSON-LD conversion #76

merged 2 commits into from
Sep 6, 2017

Conversation

chaals
Copy link
Collaborator

@chaals chaals commented Jul 20, 2017

fix #29

Note this just uses the changes @lanthaler said were needed back in
2012, so it should be checked before merging.

fix #29

Note this just uses the changes @lanthaler said were needed back in
2012, so it should be checked before merging.
@chaals chaals requested a review from danbri July 20, 2017 11:21
@chaals
Copy link
Collaborator Author

chaals commented Jul 20, 2017

@msporny, @halindrome @gkellogg can you take a quick look?

was generating two definitions with the same name / identifiers
Copy link
Member

@msporny msporny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only had time to review the raw commits. I don't know if the algorithm is right w/o implementing it. The output JSON-LD looks a bit off, and I made suggestions on changing that. It feels like you may need a Microdata JSON-LD context that at least contains aliases, and things like items and properties.

value is the array <var>items</var>.</p></li>

<li><p>Add an entry to <var>result</var> called "<code>@context</code>" whose value is the following object:</p>
<pre class="html">{ "@vocab" : "" }</pre></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what the injection of this HTML does. Also, if you're converting to JSON-LD, you'll need to specify an @context in the converted JSON-LD. In that context, you might want to alias @id to id and @type to type to make developers lives easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a bit more, perhaps it should be:

  '@context': 'https://www.w3.org/ns/microdata/v1' 

or maybe this:

  '@context': '**https://schema.org/microdata/v1' 

@danbri, thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't thing aliasing in the default output is particularly useful, after all, it can always be re-compacted using a different context after the fact.

index.html Outdated
<h3>JSON-LD</h3>

<p>Given a list of nodes <var>nodes</var> in a <code>Document</code>, a user agent must
run the following algorithm to extract the Microdata from those nodes into a JSON-LD:</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change to "convert the Microdata from those nodes into a JSON-LD representation":


</ol>

<p class="note">This algorithm returns an object with a single property that is an array, instead
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the case, put the @context at the very top of this object.

<li><p>Add <var>item</var> to <var>memory</var>.</p></li>

<li><p>If the <var>item</var> has any <a>item types</a>, add an entry to <var>result</var>
called "<code>@type</code>" whose value is an array listing the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to alias @type to type to make JS developers lives easier. So they can do obj.type instead of obj['@type']... the latter creates sad pandas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep it simple; no need to be opinionated in the flavor of the output here, IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, you probably need to create an item-level @context, and set @vocab to the vocabulary identifier for the item; otherwise, the properties will not expand to IRIs and will be ignored.

<code><a>itemtype</a></code> attribute.</p>

</li><li><p>If the <var>item</var> has a <a>global identifier</a>, add an entry to
<var>result</var> called "<code>@id</code>" whose value is the <a>global
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here... maybe alias @id to id.


<li><p>If <var>value</var> is an <a data-lt="concept item">item</a>, then:
If <var>value</var> is in <var>memory</var>, then let <var>value</var> be
the string "<code>ERROR</code>". Otherwise, <a>get the object</a> for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels strange to set a value to the string ERROR... feels like you should throw an exception or do something less extreme than throwing but more extreme than returning an 'ERROR' string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, in this case, shouldn't it be a node reference? After all, the value is an item, and we should just reference that item.

URL was <code>http://blog.example.com/progress-report</code>):</p>

<pre>{
"items": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is supposed to be a JSON-LD object, you're missing a @context.

Copy link
Member

@gkellogg gkellogg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a crack at implementing, but as is, this will not generate anything close the the same output as the RDFa mechanism. Also, parsed values (datetime, object, meter, etc.) are not extracted properly. This could potentially be done by adding to an item-level @context to keep the actual generated JSON cleaner.

index.html Outdated
<a>top-level Microdata item</a>, and if it is then <a>get the object</a> for that element and add it to <var>items</var>.</p></li>

<li><p>Add an entry to <var>result</var> called "<code>items</code>" whose
value is the array <var>items</var>.</p></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a top-level entry "items" is not useful, unless "items" is aliased to "@graph". "@graph" is used to contain top-level node definitions in JSON-LD. But, if there is no top-level @context (see note elsewhere), then perhaps result is simply an array.

<li><p>Add <var>item</var> to <var>memory</var>.</p></li>

<li><p>If the <var>item</var> has any <a>item types</a>, add an entry to <var>result</var>
called "<code>@type</code>" whose value is an array listing the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, you probably need to create an item-level @context, and set @vocab to the vocabulary identifier for the item; otherwise, the properties will not expand to IRIs and will be ignored.

<li><p>If there is no entry named <var>name</var> in <var>result</var>,
then add an entry named <var>name</var> to <var>result</var> whose
value is an empty array.</p></li>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looses the fact that the value may be a URL relationship, which should be encoded as {"@id": "..."}; alternatively, to simplify the syntax, and entry in the item-specific @context could be created for name, with {"@type": "@id"}.

@gkellogg
Copy link
Member

cc/ @niklasl

@gkellogg
Copy link
Member

gkellogg commented Jul 20, 2017

I have an implementation alongside my normal, and RDFa-based Microdata parser. It pretty much follows the algorithm, with the comments I've previously made.

The property_value id dumbed-down from that used in the native RDF parser, so it could relatively easily obtain datatypes for values and parse numbers.

It allocates an @id using a generated BNode identifier when it encounters a reference to another item that doesn't already have an @id, and allows this to be used when an item is already found in memory.

It creates an item-level @context containing @vocab when it finds a local vocabulary for an item. It probably wouldn't be difficult to avoid this when the parent of the item has the same vocabulary.

It always uses @graph, but this could be optimized in case there is only a single top-level item.

It infers language and base-URL by introspection into the DOM.

Edit: it also trims whitespace around values, which at least makes the output look a bit better.

@gkellogg
Copy link
Member

For consideration, here's my output to your example

{
  "@graph": [
    {
      "@context": {"@vocab": "https://schema.org/"},
      "@type": ["https://schema.org/BlogPosting"],
      "headline": ["Progress report"],
      "url": [{"@id": "http://example.com?comments=0"}],
      "comment": [
        {
          "@context": {"@vocab": "https://schema.org/"},
          "@type": ["https://schema.org/Comment"],
          "url": [{"@id": "http://example.com#c1"}
          ],
          "creator": [
            {
              "@context": {"@vocab": "https://schema.org/"},
              "@type": ["https://schema.org/Person"],
              "name": ["Greg"]
            }
          ],
          "dateCreated": ["2013-08-29"]
        }
      ],
      "datePublished": ["2013-08-29"]
    }
  ]
}

@chaals
Copy link
Collaborator Author

chaals commented Jul 21, 2017

Quick reaction:My sincere thanks @gkellogg @msporny for the work and comments. The one that struck me even before asking was the vocab but it looks liike there is more to deal with. Given that microdata only has two datatypes (string and URL), I'm going to leave it that you can - and probably should in a grown-up format - infer that for example a datetime is actually a date, without requiring that in the microdata spec. I'm pushing to get this done ASAP, but it's not the only priority in the world, so I hope I will have sensible replies over the weekend.

Copy link
Contributor

@danbri danbri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this looks plausible and solid to me and ought to go into the spec for implementator sanity checking.

@danbri danbri merged commit 72d85b8 into gh-pages Sep 6, 2017
@danbri
Copy link
Contributor

danbri commented Sep 6, 2017

On the context question, if the generated output were to always explicitly mark literals as ID or literal/text, would that minimize or remove the need for context declarations?

@gkellogg
Copy link
Member

gkellogg commented Sep 6, 2017

Yes, you can always generate expanded nodes/literal. However, if you know enough to use "@vocab": "http://scheme.org", you can also take advantage of using as a contex, and assume all of its term definitions.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Describe microdata -> JSON-LD conversion?
4 participants