-
Notifications
You must be signed in to change notification settings - Fork 19
Conversation
fix #29 Note this just uses the changes @lanthaler said were needed back in 2012, so it should be checked before merging.
@msporny, @halindrome @gkellogg can you take a quick look? |
was generating two definitions with the same name / identifiers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only had time to review the raw commits. I don't know if the algorithm is right w/o implementing it. The output JSON-LD looks a bit off, and I made suggestions on changing that. It feels like you may need a Microdata JSON-LD context that at least contains aliases, and things like items
and properties
.
value is the array <var>items</var>.</p></li> | ||
|
||
<li><p>Add an entry to <var>result</var> called "<code>@context</code>" whose value is the following object:</p> | ||
<pre class="html">{ "@vocab" : "" }</pre></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what the injection of this HTML does. Also, if you're converting to JSON-LD, you'll need to specify an @context in the converted JSON-LD. In that context, you might want to alias @id
to id
and @type
to type
to make developers lives easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, perhaps it should be:
'@context': 'https://www.w3.org/ns/microdata/v1'
or maybe this:
'@context': '**https://schema.org/microdata/v1'
@danbri, thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't thing aliasing in the default output is particularly useful, after all, it can always be re-compacted using a different context after the fact.
index.html
Outdated
<h3>JSON-LD</h3> | ||
|
||
<p>Given a list of nodes <var>nodes</var> in a <code>Document</code>, a user agent must | ||
run the following algorithm to extract the Microdata from those nodes into a JSON-LD:</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change to "convert the Microdata from those nodes into a JSON-LD representation":
|
||
</ol> | ||
|
||
<p class="note">This algorithm returns an object with a single property that is an array, instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the case, put the @context at the very top of this object.
<li><p>Add <var>item</var> to <var>memory</var>.</p></li> | ||
|
||
<li><p>If the <var>item</var> has any <a>item types</a>, add an entry to <var>result</var> | ||
called "<code>@type</code>" whose value is an array listing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to alias @type
to type
to make JS developers lives easier. So they can do obj.type
instead of obj['@type']
... the latter creates sad pandas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep it simple; no need to be opinionated in the flavor of the output here, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, you probably need to create an item-level @context
, and set @vocab
to the vocabulary identifier for the item; otherwise, the properties will not expand to IRIs and will be ignored.
<code><a>itemtype</a></code> attribute.</p> | ||
|
||
</li><li><p>If the <var>item</var> has a <a>global identifier</a>, add an entry to | ||
<var>result</var> called "<code>@id</code>" whose value is the <a>global |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here... maybe alias @id
to id
.
|
||
<li><p>If <var>value</var> is an <a data-lt="concept item">item</a>, then: | ||
If <var>value</var> is in <var>memory</var>, then let <var>value</var> be | ||
the string "<code>ERROR</code>". Otherwise, <a>get the object</a> for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels strange to set a value to the string ERROR
... feels like you should throw an exception or do something less extreme than throwing but more extreme than returning an 'ERROR' string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, in this case, shouldn't it be a node reference? After all, the value is an item, and we should just reference that item.
URL was <code>http://blog.example.com/progress-report</code>):</p> | ||
|
||
<pre>{ | ||
"items": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is supposed to be a JSON-LD object, you're missing a @context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a crack at implementing, but as is, this will not generate anything close the the same output as the RDFa mechanism. Also, parsed values (datetime, object, meter, etc.) are not extracted properly. This could potentially be done by adding to an item-level @context
to keep the actual generated JSON cleaner.
index.html
Outdated
<a>top-level Microdata item</a>, and if it is then <a>get the object</a> for that element and add it to <var>items</var>.</p></li> | ||
|
||
<li><p>Add an entry to <var>result</var> called "<code>items</code>" whose | ||
value is the array <var>items</var>.</p></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<li><p>Add <var>item</var> to <var>memory</var>.</p></li> | ||
|
||
<li><p>If the <var>item</var> has any <a>item types</a>, add an entry to <var>result</var> | ||
called "<code>@type</code>" whose value is an array listing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, you probably need to create an item-level @context
, and set @vocab
to the vocabulary identifier for the item; otherwise, the properties will not expand to IRIs and will be ignored.
<li><p>If there is no entry named <var>name</var> in <var>result</var>, | ||
then add an entry named <var>name</var> to <var>result</var> whose | ||
value is an empty array.</p></li> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looses the fact that the value may be a URL relationship, which should be encoded as {"@id": "..."}
; alternatively, to simplify the syntax, and entry in the item-specific @context
could be created for name
, with {"@type": "@id"}
.
cc/ @niklasl |
I have an implementation alongside my normal, and RDFa-based Microdata parser. It pretty much follows the algorithm, with the comments I've previously made. The It allocates an It creates an item-level It always uses It infers language and base-URL by introspection into the DOM. Edit: it also trims whitespace around values, which at least makes the output look a bit better. |
For consideration, here's my output to your example {
"@graph": [
{
"@context": {"@vocab": "https://schema.org/"},
"@type": ["https://schema.org/BlogPosting"],
"headline": ["Progress report"],
"url": [{"@id": "http://example.com?comments=0"}],
"comment": [
{
"@context": {"@vocab": "https://schema.org/"},
"@type": ["https://schema.org/Comment"],
"url": [{"@id": "http://example.com#c1"}
],
"creator": [
{
"@context": {"@vocab": "https://schema.org/"},
"@type": ["https://schema.org/Person"],
"name": ["Greg"]
}
],
"dateCreated": ["2013-08-29"]
}
],
"datePublished": ["2013-08-29"]
}
]
} |
Quick reaction:My sincere thanks @gkellogg @msporny for the work and comments. The one that struck me even before asking was the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, this looks plausible and solid to me and ought to go into the spec for implementator sanity checking.
On the context question, if the generated output were to always explicitly mark literals as ID or literal/text, would that minimize or remove the need for context declarations? |
Yes, you can always generate expanded nodes/literal. However, if you know enough to use "@vocab": "http://scheme.org", you can also take advantage of using as a contex, and assume all of its term definitions. |
fix #29
Note this just uses the changes @lanthaler said were needed back in
2012, so it should be checked before merging.