Text and tests for using HTML base for embedded JSON-LD. #51

Merged
merged 4 commits into from
Dec 1, 2018

38 changes: 13 additions & 25 deletions common/extract-examples.rb
@@ -15,6 +15,7 @@
require 'fileutils'
require 'colorize'
require 'yaml'
require 'cgi'

PREFIXES = {
dc: "http://purl.org/dc/terms/",
@@ -49,8 +50,8 @@
# Remove highlighting and commented out sections
def justify(str)
str = str.
sub(/^\s*<!--\s*$/, '').
sub(/^\s*-->\s*$/, '').
gsub(/^\s*<!--\s*$/, '').
gsub(/^\s*-->\s*$/, '').
gsub('****', '').
gsub(/####([^#]*)####/, '')

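Switching `justify` from `sub` to `gsub` matters when one example holds more than one commented-out block: `sub` replaces only the first match in the string, so any later `<!--` or `-->` line would survive. A minimal sketch on a hypothetical sample string (not an example taken from the spec), reusing the two line patterns from `justify`:

```ruby
# Hypothetical example text with two commented-out sections.
sample = <<~TEXT
  {
  <!--
    "hidden": 1,
  -->
    "shown": 2
  <!--
    "hidden": 3
  -->
  }
TEXT

# The same patterns used in justify: a line holding only "<!--" or only "-->".
open_re  = /^\s*<!--\s*$/
close_re = /^\s*-->\s*$/

with_sub  = sample.sub(open_re, '').sub(close_re, '')
with_gsub = sample.gsub(open_re, '').gsub(close_re, '')

# Count remaining opening markers (scan counts substrings; String#count does not).
puts with_sub.scan('<!--').size  # => 1, the second block is still commented out
puts with_gsub.scan('<!--').size # => 0
```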
@@ -222,7 +223,7 @@ def save_example(examples:, element:, title:, example_number:, error:, warn:)
examples[title] = {
title: title,
filename: fn,
content: content,
content: content.to_s.gsub(/^\s*< !\s*-\s*-/, '<!--').gsub(/-\s*- >/, '-->'),
content_type: element.attr('data-content-type'),
number: example_number,
ext: ext,
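The new `content:` value rewrites the spaced-out ("faked") comment markers `< !--` and `-- >` back into real `<!--` and `-->` before the example is stored. A minimal sketch applying the same two regular expressions; the helper name `restore_faked_comments` and the sample string are hypothetical:

```ruby
# Hypothetical helper wrapping the two substitutions from save_example:
# "< !--" at the start of a line (including its indentation) becomes "<!--",
# and "-- >" anywhere becomes "-->".
def restore_faked_comments(content)
  content.to_s.gsub(/^\s*< !\s*-\s*-/, '<!--').gsub(/-\s*- >/, '-->')
end

# Hypothetical HTML example using faked comment markers.
sample = "<script type=\"application/ld+json\">\n  < !--\n  {\"@id\": \"\"}\n  -- >\n</script>"
puts restore_faked_comments(sample)
```

Note that the first pattern consumes the leading indentation, while the second leaves it in place; `to_s` also makes a `nil` content harmless.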
@@ -302,6 +303,7 @@ def save_example(examples:, element:, title:, example_number:, error:, warn:)
# Perform example syntactic validation based on extension
case ex[:ext]
when 'json', 'jsonld', 'jsonldf'
content = CGI.unescapeHTML(content)
begin
::JSON.parse(content)
rescue JSON::ParserError => exception
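Validation of `json`/`jsonld` examples now unescapes HTML character references first, since text copied out of HTML source can still carry `&quot;` (which `JSON.parse` rejects) or `&amp;` (which would otherwise be kept literally inside a string). A minimal sketch; the escaped sample and the `valid_json?` helper are hypothetical:

```ruby
require 'cgi'
require 'json'

# Hypothetical helper mirroring the begin/rescue around ::JSON.parse.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

# Hypothetical example text as it might appear HTML-escaped in the source.
escaped = '{&quot;@id&quot;: &quot;http://example.com/?a=1&amp;b=2&quot;}'

puts valid_json?(escaped)                   # => false
puts valid_json?(CGI.unescapeHTML(escaped)) # => true
```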
@@ -325,22 +327,16 @@ def save_example(examples:, element:, title:, example_number:, error:, warn:)
ex[:base] = html_base.to_s if html_base

script_content = doc.at_xpath(xpath)
if script_content
# Remove (faked) XML comments and unescape sequences
content = script_content
.inner_html
.sub(/^\s*< !\s*-\s*-/, '')
.sub(/-\s*- >\s*$/, '')
.gsub(/&lt;/, '<')
end


# Remove (faked) XML comments and unescape sequences
content = CGI.unescapeHTML(script_content.inner_html) if script_content
rescue Nokogiri::XML::SyntaxError => exception
errors << "Example #{ex[:number]} at line #{ex[:line]} parse error: #{exception.message}"
$stdout.write "F".colorize(:red)
next
end
when 'table'
# already in parsed form
content = Nokogiri::HTML.parse(content)
when 'ttl', 'trig'
begin
reader_errors = []
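The rewritten extraction replaces the hand-rolled `gsub(/&lt;/, '<')` with `CGI.unescapeHTML`, which also reverses `&gt;`, `&amp;`, `&quot;`, `&apos;` and numeric references. A minimal comparison on a hypothetical string standing in for `script_content.inner_html`:

```ruby
require 'cgi'

# Hypothetical inner_html of a JSON-LD script element.
inner = '{"@id": "http://example.com/?a=1&amp;b=2", "label": "&lt;b&gt;"}'

old_style = inner.gsub(/&lt;/, '<') # removed approach: only &lt; is reversed
new_style = CGI.unescapeHTML(inner)  # new approach: all supported references

puts old_style # => {"@id": "http://example.com/?a=1&amp;b=2", "label": "<b&gt;"}
puts new_style # => {"@id": "http://example.com/?a=1&b=2", "label": "<b>"}
```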
@@ -443,10 +439,7 @@ def save_example(examples:, element:, title:, example_number:, error:, warn:)
# Set argument to referenced content to be parsed
args[0] = if examples[ex[:result_for]][:ext] == 'html' && method == :expand
# If we are expanding, and the reference is HTML, find the first script element.
doc = Nokogiri::HTML.parse(
examples[ex[:result_for]][:content]
.sub(/^\s*< !\s*-\s*-/, '')
.sub(/-\s*- >\s*$/, ''))
doc = Nokogiri::HTML.parse(examples[ex[:result_for]][:content])

# Get base from document, if present
html_base = doc.at_xpath('/html/head/base/@href')
@@ -458,15 +451,10 @@ def save_example(examples:, element:, title:, example_number:, error:, warn:)
$stdout.write "F".colorize(:red)
next
end
StringIO.new(script_content
.inner_html
.gsub(/&lt;/, '<'))
StringIO.new(CGI.unescapeHTML(script_content.inner_html))
elsif examples[ex[:result_for]][:ext] == 'html' && ex[:target]
# Only use the targeted script
doc = Nokogiri::HTML.parse(
examples[ex[:result_for]][:content]
.sub(/^\s*< !\s*-\s*-/, '')
.sub(/-\s*- >\s*$/, ''))
doc = Nokogiri::HTML.parse(examples[ex[:result_for]][:content])
script_content = doc.at_xpath(xpath)
unless script_content
errors << "Example #{ex[:number]} at line #{ex[:line]} references example #{ex[:result_for].inspect} with no JSON-LD script element"
@@ -565,7 +553,7 @@ def save_example(examples:, element:, title:, example_number:, error:, warn:)
$stderr.puts "expected:\n" + expected.to_trig if verbose
when 'table'
expected = begin
table_to_dataset(content)
table_to_dataset(content.xpath('/html/body/table'))
rescue
errors << "Example #{ex[:number]} at line #{ex[:line]} raised error reading table: #{$!}"
RDF::Dataset.new
35 changes: 9 additions & 26 deletions index.html
@@ -4671,40 +4671,15 @@ <h3>Extract Script Content Algorithm</h3>

<section class="informative">
<h4>Overview</h4>
<p>As a <a data-cite="HTML52/semantics-scripting.html#data-block">data block</a>
may be inside a comment, and may be escaped, the algorithm extracts the JSON from any comment,
removes REVERSE SOLIDUS escapes,
and reverses <a data-cite="HTML5/syntax.html#character-references">HTML Character references</a>.
<p>The algorithm reverses <a data-cite="HTML5/syntax.html#character-references">HTML Character references</a>.
</section>

<section>
<h4>Algorithm</h4>
<p>The algorithm takes a single required input variable: <var>source</var>,
the <a data-cite="DOM#dom-node-textcontent">textContent</a> of an HTML <a data-cite="HTML52/semantics-scripting.html#the-script-element">script element</a>.</p>
<p>For the purpose of this algorithm, the following tokens are defined in [[ABNF]]:</p>

<pre class="nohighlight">
<dfn>space-character</dfn> = %20 ; SPACE
/ %09 ; CHARACTER TABULATION (tab)
/ %0A ; LINE FEED (LF)
/ %0C ; FORM FEED (FF)
/ %0D ; CARRIAGE RETURN (CR)
<dfn>comment-open</dfn> = *<a>space-character</a> <code>"&lt;!--"</code> *<a>space-character</a>
<dfn>comment-close</dfn> = *<a>space-character</a> <code>"--&gt;"</code> *<a>space-character</a>
</pre>

<ol>
<li>If <var>source</var> begins with <a>comment-open</a> and ends with <a>comment-close</a>,
remove those sequences from <var>source</var>.</li>
<li>If <var>source</var> contains <a>comment-open</a> or <a>comment-close</a>,
an <a data-link-for="JsonLdErrorCode">invalid script element</a> has been detected, and processing is aborted.</li>
<li>For all occurrences of any of the character sequences
<code>&lt;\script</code>,
<code>&lt;\/script</code>,
<code>&lt;\!--</code>,
or <code>--\&gt;</code>
in <var>source</var> using a case-insensitive match,
replace the sequence with the equivalent sequence excluding the REVERSE SOLIDUS (<code>\</code>).</li>
<li>For all occurrences of an <a data-cite="HTML5/syntax.html#character-references">HTML Character reference</a> in <var>source</var>,
replace the sequence with the equivalent Unicode character as defined
in <a data-cite="HTML52/syntax.html#named-character-references">Named character references</a> in [[HTML52]].</li>
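As the new overview states, the algorithm now only reverses HTML character references in `source`. A minimal Ruby sketch of that step, assuming only decimal and hexadecimal references plus the five named references `lt`, `gt`, `amp`, `quot` and `apos` need handling (the named-reference table in HTML 5.2 is much larger); `extract_script_content` and `NAMED_REFS` are hypothetical names:

```ruby
# Hypothetical subset of the HTML named character references.
NAMED_REFS = {
  'lt' => '<', 'gt' => '>', 'amp' => '&', 'quot' => '"', 'apos' => "'"
}.freeze

# Reverse character references in source (the script element's textContent)
# in a single left-to-right pass. Unknown named references are left untouched;
# out-of-range code points are not validated.
def extract_script_content(source)
  source.gsub(/&(#[xX]\h+|#\d+|[A-Za-z]+);/) do
    ref = Regexp.last_match(1)
    if ref.start_with?('#x', '#X')
      [ref[2..].to_i(16)].pack('U') # hexadecimal reference, e.g. &#x3C;
    elsif ref.start_with?('#')
      [ref[1..].to_i].pack('U')     # decimal reference, e.g. &#60;
    else
      NAMED_REFS.fetch(ref, "&#{ref};")
    end
  end
end

puts extract_script_content('{"a": "&lt;b&gt; &#60; &#x3C; &amp;amp; &nbsp;"}')
# => {"a": "<b> < < &amp; &nbsp;"}
```

Because the pass is single, `&amp;amp;` decodes once to `&amp;` rather than collapsing to `&`.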
@@ -4866,6 +4841,14 @@ <h3>The <dfn>JsonLdProcessor</dfn> Interface</h3>
a <a>string</a> representing the <a>IRI</a> of a remote document,
extract the content of the <a>JSON-LD script element</a>(s) into <var>original input</var>:
<ol>
<li>Set <a>base IRI</a> to the <a data-cite="HTML52/infrastructure.html#document-base-url">Document Base URL</a>
of <var>original input</var>, as defined in [[HTML52]],
using the existing <a>base IRI</a> as the document's URL.
<div class="issue atrisk">
The use of the <a data-cite="HTML52/infrastructure.html#document-base-url">Document Base URL</a>
from [[HTML52]] for setting the <a>base IRI</a> of the enclosed JSON-LD
is an experimental feature, which may be changed in a future version of this specification.
</div>
<li>If the original passed <a data-lt="jsonldprocessor-expand-input">input</a> parameter
contains a <a data-cite="RFC3986#section-3.5">fragment identifier</a>,
set <var>source</var> to the <a data-cite="DOM#dom-node-textcontent">textContent</a>
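The new first step sets the base IRI from the HTML Document Base URL, using the existing base IRI as the document's URL. A minimal sketch under simplifying assumptions: the caller passes the `href` of the first `<base>` element (or `nil`), and edge cases from the HTML definition are ignored. `embedded_base_iri` is hypothetical; the inputs mirror tests h019, h020 and h021, which all use `http://a.example.com/doc` as the `base` option:

```ruby
require 'uri'

# Hypothetical helper. document_url plays the role of the existing base IRI;
# base_href is the href of the first <base> element, or nil when there is none.
def embedded_base_iri(document_url, base_href = nil)
  return document_url if base_href.nil?

  # A relative href is resolved against the document's URL.
  URI.join(document_url, base_href).to_s
end

doc = 'http://a.example.com/doc' # "base" option of th019, th020 and th021
puts embedded_base_iri(doc)                              # h019: no <base>
puts embedded_base_iri(doc, 'http://a.example.com/base') # h020: absolute <base>
puts embedded_base_iri(doc, 'base')                      # h021: relative <base>
```

In each of these tests the script's `"@id": ""` resolves to the resulting base, which is what h019-out, h020-out and h021-out expect.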
56 changes: 40 additions & 16 deletions tests/expand-manifest.jsonld
@@ -1353,22 +1353,6 @@
"input": "expand/h007-in.html",
"expect": "expand/h007-out.jsonld",
"option": {"specVersion": "json-ld-1.1", "extractAllScripts": true}
}, {
"@id": "#th008",
"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
"name": "Expands embedded JSON-LD script element with comments",
"purpose": "Tests embedded JSON-LD in HTML with comments",
"input": "expand/h008-in.html",
"expect": "expand/h008-out.jsonld",
"option": {"specVersion": "json-ld-1.1"}
}, {
"@id": "#th009",
"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
"name": "Expands embedded JSON-LD script element with escaped tokens",
"purpose": "Tests embedded JSON-LD in HTML with escapes",
"input": "expand/h009-in.html",
"expect": "expand/h009-out.jsonld",
"option": {"specVersion": "json-ld-1.1"}
}, {
"@id": "#th010",
"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
@@ -1433,6 +1417,46 @@
"input": "expand/h017-in.html",
"expect": "invalid script element",
"option": {"specVersion": "json-ld-1.1"}
}, {
"@id": "#th018",
"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
"name": "Expands embedded JSON-LD script element relative to document base",
"purpose": "Tests embedded JSON-LD in HTML",
"input": "expand/h018-in.html",
"expect": "expand/h018-out.jsonld",
"option": {"specVersion": "json-ld-1.1"}
}, {
"@id": "#th019",
"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
"name": "Expands embedded JSON-LD script element relative to base option",
"purpose": "Tests embedded JSON-LD in HTML",
"input": "expand/h019-in.html",
"expect": "expand/h019-out.jsonld",
"option": {"specVersion": "json-ld-1.1", "base": "http://a.example.com/doc"}
}, {
"@id": "#th020",
"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
"name": "Expands embedded JSON-LD script element relative to HTML base",
"purpose": "Tests embedded JSON-LD in HTML",
"input": "expand/h020-in.html",
"expect": "expand/h020-out.jsonld",
"option": {"specVersion": "json-ld-1.1", "base": "http://a.example.com/doc"}
}, {
"@id": "#th021",
"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
"name": "Expands embedded JSON-LD script element relative to relative HTML base",
"purpose": "Tests embedded JSON-LD in HTML",
"input": "expand/h021-in.html",
"expect": "expand/h021-out.jsonld",
"option": {"specVersion": "json-ld-1.1", "base": "http://a.example.com/doc"}
}, {
"@id": "#th022",
"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
"name": "Expands targeted JSON-LD script element with fragment and HTML base",
"purpose": "Tests embedded JSON-LD in HTML with fragment identifier",
"input": "expand/h022-in.html#second",
"expect": "expand/h022-out.jsonld",
"option": {"specVersion": "json-ld-1.1"}
}, {
"@id": "#tm001",
"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
13 changes: 13 additions & 0 deletions tests/expand/h018-in.html
@@ -0,0 +1,13 @@
<html>
<head>
<script type="application/ld+json">
{
"@context": {
"foo": {"@id": "http://example.com/foo"}
},
"@id": "",
"foo": [{"@value": "bar"}]
}
</script>
</head>
</html>
4 changes: 4 additions & 0 deletions tests/expand/h018-out.jsonld
@@ -0,0 +1,4 @@
[{
"@id": "https://w3c.github.io/json-ld-api/tests/expand/h018-in.html",
"http://example.com/foo": [{"@value": "bar"}]
}]
13 changes: 13 additions & 0 deletions tests/expand/h019-in.html
@@ -0,0 +1,13 @@
<html>
<head>
<script type="application/ld+json">
{
"@context": {
"foo": {"@id": "http://example.com/foo"}
},
"@id": "",
"foo": [{"@value": "bar"}]
}
</script>
</head>
</html>
4 changes: 4 additions & 0 deletions tests/expand/h019-out.jsonld
@@ -0,0 +1,4 @@
[{
"@id": "http://a.example.com/doc",
"http://example.com/foo": [{"@value": "bar"}]
}]
14 changes: 14 additions & 0 deletions tests/expand/h020-in.html
@@ -0,0 +1,14 @@
<html>
<head>
<base href="http://a.example.com/base" />
<script type="application/ld+json">
{
"@context": {
"foo": {"@id": "http://example.com/foo"}
},
"@id": "",
"foo": [{"@value": "bar"}]
}
</script>
</head>
</html>
4 changes: 4 additions & 0 deletions tests/expand/h020-out.jsonld
@@ -0,0 +1,4 @@
[{
"@id": "http://a.example.com/base",
"http://example.com/foo": [{"@value": "bar"}]
}]
14 changes: 14 additions & 0 deletions tests/expand/h021-in.html
@@ -0,0 +1,14 @@
<html>
<head>
<base href="base" />
<script type="application/ld+json">
{
"@context": {
"foo": {"@id": "http://example.com/foo"}
},
"@id": "",
"foo": [{"@value": "bar"}]
}
</script>
</head>
</html>
4 changes: 4 additions & 0 deletions tests/expand/h021-out.jsonld
@@ -0,0 +1,4 @@
[{
"@id": "http://a.example.com/base",
"http://example.com/foo": [{"@value": "bar"}]
}]
20 changes: 20 additions & 0 deletions tests/expand/h022-in.html
@@ -0,0 +1,20 @@
<html>
<head>
<base href="http://a.example.com/base" />
<script id="first" type="application/ld+json">
{
"@context": {
"foo": {"@id": "http://example.com/foo"}
},
"foo": [{"@value": "bar"}]
}
</script>
<script id="second" type="application/ld+json">
{
"@context": {"ex": "http://example.com/"},
"@id": "",
"ex:bar": "foo"
}
</script>
</head>
</html>
4 changes: 4 additions & 0 deletions tests/expand/h022-out.jsonld
@@ -0,0 +1,4 @@
[{
"@id": "http://a.example.com/base",
"http://example.com/bar": [{"@value": "foo"}]
}]
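Test th022 combines both features: its `input` IRI carries a fragment identifier (`#second`) that selects one script element, while the HTML `<base>` still supplies the base IRI. A minimal sketch of splitting such an input into the IRI used to retrieve the document and the id of the targeted element; the manifest URL is an assumption inferred from the `@id` in h018-out:

```ruby
require 'uri'

# Assumed location of the manifest; relative test inputs resolve against it.
manifest = 'https://w3c.github.io/json-ld-api/tests/expand-manifest.jsonld'
input = URI.join(manifest, 'expand/h022-in.html#second')

target_id = input.fragment                          # id of the script element to extract
document  = input.dup.tap { |u| u.fragment = nil }  # IRI used to retrieve the HTML

puts target_id     # => second
puts document.to_s # => https://w3c.github.io/json-ld-api/tests/expand/h022-in.html
```

Only the `second` script is expanded, and its `"@id": ""` resolves against the `<base href>` rather than the document IRI, matching `http://a.example.com/base` in h022-out.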