HTML Serialization Use Cases #147

tcole3 · 2016-02-01T01:47:01Z

The issue of how to serialize Web annotations in HTML has come up a number of times on the WG list, in WG calls, and in regard to at least one open issue (#87). However, as became clear during the WG call on 27 Jan 2016, what we collectively mean by HTML Serialization needs more definition. In particular we need well-defined use cases and better definition of scope before we can provide guidance to implementers and discuss HTML serialization in WG Recommendations or our other documents. We also need to determine which facets of the HTML Serialization issue need to be dealt with before model and vocabulary go to CR, and which facets should be deferred to next Charter.

Please add HTML serialization use cases to this issue. This will allow us to cluster and raise new issues dealing with specific aspects of HTML serialization as required and to better identify the technologies and approaches that can be used to best address the full range of HTML Serialization use cases.

To help get you started, for this discussion all of the following should be considered potentially in scope (pending review of use cases submitted), and likely you will think of additional categories of HTML Serialization use cases:

• HTML as another serialization format for Web Annotations – for example, expressing an annotation using RDFa, RDFa Lite, microdata.

• Expressing a Web Annotation by mapping our vocabulary directly to HTML

• Web Annotations embedded in HTML documents which also contain the annotation target(s) and/or body(ies).

• Use cases that require dynamically updating the HTML DOM as footnotes, comments and other forms of annotation of the HTML are added.

Technologies implementing some of these use cases exist and may be referenced in use cases submitted, but at this point in the process, our focus should be on defining and describing the use cases you think the WG should address. The best approach(es) to use will best be hashed out in more specific issue threads once we have a better sense of scope and have done some sorting and prioritizing.

iherman · 2016-02-01T13:51:02Z

Use case: Dynamically extending a Web page with annotation(s)

An interactive annotation system incorporates the annotation into the annotated HTML page. This is done by extending the DOM of the HTML page at load (or interaction) time (eg, retrieving the annotations from a server), using the (DOM version of the) HTML serialization of each annotation. By doing so, it leaves the display of the annotation to the browser's (or the reading system's) display engine.

The user/reader of the HTML page can add CSS statements to style the annotations themselves. Because the annotations use a standard, the CSS can refer to the standard set of elements and attributes; the effect will be the same regardless of which annotation system is used.

In effect, by using a standard HTML serialization, the content and the style become strictly separated.

Characterizations

It is immaterial whether the serialization is defined in terms of new elements and/or new attributes (as extensions to HTML) or specific values for already existing attributes (eg, RDFa or microdata), as long as it is clear how to make the right CSS selections.
In this use case the target and most of the body (eg, the textual body, but possibly other media, too) are, after the DOM injection, in the same document as far as the browser is concerned. But, because this is done dynamically at load time, this may not have much influence on the serialization.

tcole3 · 2016-04-08T03:53:32Z

## 3 Annotation Use Cases – 3 Serialization Options

The JSON-LD 1.0 Specification in section 6.20 [1] says, "HTML script tags can be used to embed blocks of data in documents. This way, JSON-LD content can be easily embedded in HTML by placing it in a script element with the type attribute set to application/ld+json." (This section is non-normative.) Elsewhere, in appendices [2] [3] that are also non-normative, the specification illustrates how JSON-LD would transform to RDFa or Microdata; but importantly for us, the JSON-LD specification in the Microdata appendix says that, "the JSON-LD representation of the Microdata information stays true to the desires of the Microdata community to avoid contexts and instead refer to items by their full IRI." (The same is effectively true for RDFa serializations.) Because our @context shortens some vocabulary items and hides the namespaces from which we borrow properties and classes, this means that developers choosing RDFa or Microdata to serialize annotations in HTML would need to use full property names and namespaces, e.g., oa:hasBody, oa:hasTarget, dcterms:creator, etc.
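To make the difference concrete, here is a minimal sketch of what the @context hides. The alias table is a hypothetical three-entry excerpt, not our actual context document, and the `expand` helper is invented for illustration; the oa namespace IRI is the published one, the dcterms IRI is the usual Dublin Core terms namespace.

```python
# Prefix-to-namespace table (only the two prefixes used in this excerpt).
NAMESPACES = {
    "oa": "http://www.w3.org/ns/oa#",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical excerpt of context-style aliases: short JSON key -> prefixed term.
ALIASES = {
    "body": "oa:hasBody",
    "target": "oa:hasTarget",
    "creator": "dcterms:creator",
}


def expand(key: str) -> str:
    """Return the full IRI for a short JSON key or for a prefixed term."""
    term = ALIASES.get(key, key)           # short key -> prefixed term, if aliased
    prefix, sep, local = term.partition(":")
    if sep and prefix in NAMESPACES:
        return NAMESPACES[prefix] + local  # prefixed term -> full IRI
    raise KeyError(f"cannot expand {key!r}")


body_iri = expand("body")
creator_iri = expand("dcterms:creator")
```

A JSON-LD author only ever writes `body`; an RDFa or Microdata author has to deal with the prefixed or fully expanded form.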

For this reason, if we write a WG note regarding the serialization of annotations in HTML documents, we may want to recommend or highlight the approach of embedding JSON serializations of annotations in HTML as JSON-LD-in-script elements, rather than the use of RDFa or Microdata to serialize annotations. (Of course there could be 4th, 5th, etc. option(s) -- e.g., extending HTML directly with new elements and/or attributes -- for serializing annotations in HTML that we might prefer to JSON-LD-in-script element – please suggest.)

I illustrate the approach of using JSON-LD-in-script elements in HTML to embed annotations [4] for the 3 HTML annotation use cases described below. I also illustrate Microdata [5] and RDFa [6] options for the first use case, but only for the first use case, since I think it likely we will prefer JSON-LD-in-script or something else to RDFa or Microdata.

Finally, though this gets into issues of interface (not our bailiwick), to help illustrate the JSON-LD-in-script approach, I include in [4] some JavaScript that dynamically modifies the HTML based on the annotations that have been added to the HTML in script elements (e.g., adding anchors, tooltips, footnotes as appropriate). This code was written with Janina Sarol and is not meant to be generic, but is provided simply as proof-of-concept.
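For readers who want to see the first step of such processing without a browser, the following is a rough sketch (in Python rather than the JavaScript used in [4], and not derived from that code) of pulling JSON-LD-in-script annotations back out of a page. The class and function names are invented, the sample page and its URLs are placeholders, and the sketch assumes each script element carries a unique id.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every application/ld+json script element."""

    def __init__(self):
        super().__init__()
        self._buffer = None      # text chunks while inside a matching script element
        self._current_id = None
        self.annotations = {}    # script id -> parsed JSON object

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("type") == "application/ld+json":
            self._buffer = []
            self._current_id = attr.get("id")

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.annotations[self._current_id] = json.loads("".join(self._buffer))
            self._buffer = None


def extract_annotations(html_text):
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.annotations


# Placeholder page; the URLs are illustrative only.
PAGE = """<html><body><p>the HMS Tonnant</p>
<script id="Anno1" type="application/ld+json">
{"type": "Annotation", "body": "http://example.org/body", "target": "http://example.org/page"}
</script></body></html>"""

found = extract_annotations(PAGE)
```

What happens after extraction (anchors, tooltips, footnotes) is interface behaviour and is deliberately left out.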

Use cases illustrated:

A. Viewing an HTML blog entry about the birthday of Francis Scott Key [7], Tim wants to annotate the mention of the HMS Tonnant with a link to its Wikipedia page. The target is the mention of the Tonnant in the HTML page and the body is the Wikipedia page (Resource). The annotation is embedded in the Web page in a script element with id='Anno1'. An HTML fragment (i.e., #Anno1) is then appended to the page URL to create the Annotation URI (is this acceptable practice? What are the identity requirements for annotations serialized in HTML?).

B. Viewing this same HTML blog entry, Tim wants to annotate the portrait of Key embedded in the HTML page with a Textual Body (noting that the portrait was painted many years after the death of its subject). The target is a SpecificResource having the image as its source and the HTML Page URI as its scope. The script element id is 'Anno2' and the Annotation URI is created in the same way.

C. Viewing this same HTML blog entry, Tim wants to annotate the mention of the first publication of what became the US National Anthem with its full citation – essentially add a footnote. The TextualBody is the text of the citation, and the target is text in the HTML page. A second body, the link to the digitized article, is included. The script element containing the third annotation has attribute id='Anno3'.
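As a sketch of the identifier scheme used in A through C (page URL plus the script element id as fragment), the helper below builds the script element for one annotation. The function name, the example.org URLs and the annotation content are placeholders; the context URL is the one published with the final Recommendation and is an assumption relative to the examples in [4]. Whether this fragment-based identity is acceptable is exactly the open question raised under A.

```python
import json


def embed_annotation(page_url: str, script_id: str, annotation: dict) -> str:
    """Serialize one annotation as a JSON-LD script element for embedding in a page."""
    # Annotation URI = page URL + "#" + script id, as described in use case A.
    body = dict(annotation, id=f"{page_url}#{script_id}")
    payload = json.dumps(body, indent=2)
    # Keep a literal "</script>" inside a string value from ending the element early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script id="{script_id}" type="application/ld+json">\n{payload}\n</script>'


anno = {
    "@context": "http://www.w3.org/ns/anno.jsonld",  # assumed context URL
    "type": "Annotation",
    "body": "http://example.org/wiki/HMS_Tonnant",   # placeholder body
    "target": "http://example.org/blog/entry.html",  # placeholder target
}

html_snippet = embed_annotation("http://example.org/blog/entry.html", "Anno1", anno)
```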

[1] https://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents
[2] https://www.w3.org/TR/json-ld/#rdfa
[3] https://www.w3.org/TR/json-ld/#microdata
[4] http://w3c.github.io/web-annotation/htmlSerialization/BlogEntryAnnotatedJSON.html
[5] http://w3c.github.io/web-annotation/htmlSerialization/BlogEntryAnnotatedMicrodata.html
[6] http://w3c.github.io/web-annotation/htmlSerialization/BlogEntryAnnotatedRDFa.html
[7] http://w3c.github.io/web-annotation/htmlSerialization/BlogEntry.html

csarven · 2016-04-08T12:46:16Z

dokieli - a decentralised authoring, annotations, and social notifications tool. It stores all articles and Web Annotations natively in HTML+RDFa by default in personal data stores.

https://www.youtube.com/watch?v=tH_wMWSEzlE is a 1 minute screencast demonstrating an annotation interaction:

Sign-in to an article using personal WebID by authenticating with personal data store
Annotation UI (highlight text, write content, select license, submit)
Store annotation at personal data store (at a different URL and domain than the article)
Send notification to the article's inbox about the annotation
Article looks at its own inbox for notifications, retrieves the annotation from the remote location, parses HTML+RDFa, generates a similar HTML+RDFa and inserts into the DOM for view

There can be variations to the process and mechanism above, but it works the same for replies, footnotes, references, bookmarking, and other social interactions.

For more details on dokieli: http://csarven.ca/dokieli .

iherman · 2016-04-08T13:18:02Z

@csarven, just to make sure I understand correctly: as far as the HTML+RDFa is concerned, that is done 100% by the system and not the user, right? In other words, the possible complexity of the HTML+RDFa does not raise usability issues, because it is invisible to the end user. Knowing the complexity of HTML+RDFa, this is a very important aspect.

csarven · 2016-04-08T14:05:11Z

@iherman Correct, the HTML+RDFa is done by the system. The user is only involved in highlighting the text which they want to annotate, write their content in natural language, select and assign the license (from the dropdown) and submit. The dokieli JavaScript assembles the information (in HTML+RDFa), and sends it to the user's access controlled personal storage. The "complexity" of the source of the information is never made visible to the user who is 1) publishing the annotation, and 2) viewing the article with the annotation.

iherman · 2016-04-08T17:55:39Z

Looking at the comment put in by @tcole3 :

First of all, what I like in all these approaches is that they work out of the box today, without any need for an extension of HTML. That is a major plus. However, I believe that, for practical purposes, we could cross Microdata off the list. Microdata, as far as I know, is used only by schema.org (which is of course important!); I do not know of any other environments, tools, etc, that would process Microdata.

One of the main complications (maybe the major complication) of the RDFa encoding is that, being a true RDF serialization, it relies on a number of namespaces (duly set in a @prefix attribute). RDF experts/users have no problem with that, it is in their blood :-), but the Web Application community frowns on that (nay, they vehemently refuse doing that). We have avoided this problem in JSON-LD with the help of the appropriate @context file but, alas!, nothing like that exists in RDFa.

If we expect the RDFa encoding ever to be done by human users and not only by machines behind the scenes, we may have to address this. There is an approach to do that, but I have to ask my RDF friends to hold their nose:-): we can define a single namespace vocabulary that consists of nothing other than a series of owl:sameAs statements (or equivalents for classes and properties) to the resources in the OA Vocabulary. Ie, it would provide, essentially, aliases (via owl:sameAs) to oa terms, dc terms, etc. A fake @context file, thus. If we do this, we can greatly simplify the RDFa encoding:

<p vocab="http://my.fake.vocabulary.ns">On August 1, 1779, F...
        from the <span id="Anno1" typeOf="Annotation" resource="http://w3c.github.io/web-annotation/htmlSerialization/BlogEntryAnnotatedRDFa.html#Anno1">
            <time property="created" datatype="dateTime" datetime="2015-01-28T12:00:00Z"></time>
            <span property="creator" resource="http://www.library.illinois.edu/people/bios/t-cole3/" typeOf="Person"><meta property="name" content="Tim Cole"/></span>
            ...

etc.

It is an ugly hack from an RDF point of view, although perfectly "legal". But it works, and may become then a fairly acceptable way of encoding an annotation in RDFa.
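To give a feel for what such an alias document could contain, here is a small sketch that emits Turtle for a handful of terms. Everything in it is an assumption made for illustration: the five-term table, the choice of owl:equivalentClass and owl:equivalentProperty as the "equivalents for classes and properties", and the trailing "#" added to the placeholder namespace (RDFa concatenates the vocab value and the term, so the namespace needs a separator at the end).

```python
# Placeholder alias namespace, with a trailing "#" so that vocab + term is a usable IRI.
ALIAS_NS = "http://my.fake.vocabulary.ns#"

# Hypothetical excerpt: alias -> (IRI of the original term, kind of term).
TARGETS = {
    "Annotation": ("http://www.w3.org/ns/oa#Annotation", "class"),
    "created": ("http://purl.org/dc/terms/created", "property"),
    "creator": ("http://purl.org/dc/terms/creator", "property"),
    "Person": ("http://xmlns.com/foaf/0.1/Person", "class"),
    "name": ("http://xmlns.com/foaf/0.1/name", "property"),
}

# One possible reading of "owl:sameAs or equivalents for classes and properties".
PREDICATE = {
    "class": "owl:equivalentClass",
    "property": "owl:equivalentProperty",
}


def alias_vocabulary(targets):
    """Return a Turtle document with one equivalence statement per alias."""
    lines = ["@prefix owl: <http://www.w3.org/2002/07/owl#> .", ""]
    for alias, (iri, kind) in sorted(targets.items()):
        lines.append(f"<{ALIAS_NS}{alias}> {PREDICATE[kind]} <{iri}> .")
    return "\n".join(lines)


turtle = alias_vocabulary(TARGETS)
```

With a document like this behind the vocab IRI, the RDFa author writes only `creator` or `Person`, and an RDF consumer that follows the equivalences still ends up at the original terms.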

Take a deep breath before you answer:-)

halindrome · 2016-04-08T18:13:00Z

Speaking as an old RDF person... it's fine. Not even that ugly. I have been considering it for ages. Isn't it what schema.org uses?

iherman · 2016-04-08T18:56:05Z

Not exactly. Afaik they don't define aliases; they just define their own terms.

halindrome · 2016-04-08T21:23:38Z

Huh - interesting. I thought the datamodel mapped back to the original ontologies when that was appropriate. Guess not.

tcole3 · 2016-04-10T22:07:10Z

Let's be concrete. Our JSON-LD context document currently maps 10 classes, 23 properties and 1 attribute to a total of 8 namespaces (based on a quick count). I may have missed a few enumerations and/or values we draw from these and other namespaces (12 namespaces in addition to our own are referenced in the current draft of our JSON-LD context document), but that may not matter, since arguably you might want to keep these. It's the borrowed properties that are the main issue.

as: "http://www.w3.org/ns/activitystreams#"
as:Application
as:first
as:generator
as:items, "@container": "@list"
as:last
as:next
as:OrderedCollection
as:OrderedCollectionPage
as:partOf
as:prev
as:startIndex
as:totalItems

dc: "http://purl.org/dc/elements/1.1/"
dc:format

dcterms: "http://purl.org/dc/terms/"
dcterms:conformsTo
dcterms:creator
dcterms:issued
dcterms:modified
dcterms:rights

dctypes: "http://purl.org/dc/dcmitype/"
dctypes:Dataset
dctypes:MovingImage
dctypes:Sound
dctypes:StillImage
dctypes:Text

foaf: "http://xmlns.com/foaf/0.1/"
foaf:homepage
foaf:mbox
foaf:mbox_sha1sum
foaf:name
foaf:nick
foaf:Organization
foaf:Person

rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
rdf:value

rdfs: "http://www.w3.org/2000/01/rdf-schema#"
rdfs:label

schema: "http://schema.org/"
schema:audience

That's a lot of owl:sameAs assertions, but assuming no name collisions (I don't think we have any), I personally have no strong opinions one way or another. Namespaces are convenient in XML and some other serializations, not really so much in JSON. There are advantages in not being seen to re-invent the wheel, and not having to maintain vocabulary terms in parallel, but as long as we acknowledge inspiration, I could live with this kind of change given a strong enough rationale. By the way, http://schema.org/docs/schema_org_rdfa.html does acknowledge source, it just doesn't use owl:sameAs. Personally I'd rather be more explicit and link term-by-term to original namespace using owl:sameAs, as was suggested.
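The "no name collisions" assumption is easy to check mechanically for the terms listed above. The sketch below strips the prefixes and looks for local names that occur under more than one prefix. It covers only the borrowed terms in this comment, compares names case-sensitively, and does not check them against the terms in our own namespace; the function name is invented.

```python
from collections import defaultdict

# The borrowed classes and properties exactly as listed above.
BORROWED = """
as:Application as:first as:generator as:items as:last as:next
as:OrderedCollection as:OrderedCollectionPage as:partOf as:prev
as:startIndex as:totalItems
dc:format
dcterms:conformsTo dcterms:creator dcterms:issued dcterms:modified dcterms:rights
dctypes:Dataset dctypes:MovingImage dctypes:Sound dctypes:StillImage dctypes:Text
foaf:homepage foaf:mbox foaf:mbox_sha1sum foaf:name foaf:nick
foaf:Organization foaf:Person
rdf:value
rdfs:label
schema:audience
""".split()


def find_collisions(terms):
    """Map each local name used under more than one prefix to those prefixes."""
    by_local = defaultdict(set)
    for term in terms:
        prefix, _, local = term.partition(":")
        by_local[local].add(prefix)
    return {local: sorted(prefixes)
            for local, prefixes in by_local.items() if len(prefixes) > 1}


collisions = find_collisions(BORROWED)
```

For the list above `collisions` comes out empty; adding, say, `dc:creator` next to `dcterms:creator` would be reported as a clash on `creator`.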

However, because we also shorten some names in our JSON-LD context document, I'm not sure just addressing the namespaced classes and properties issue alone is sufficient to fully facilitate the mapping between RDFa and JSON-LD in HTML, if that's really what we want to do. In our own namespace we have about a dozen of these shortened aliases:

body, hasBody
target, hasTarget
source, hasSource
selector, hasSelector
state, hasState
scope, hasScope
startSelector, hasStartSelector
endSelector, hasEndSelector
motivation, motivatedBy
purpose, hasPurpose
stylesheet, styledBy
cached, cachedSource
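For anyone going back and forth between the two spellings, the pairs above amount to a simple renaming table. A minimal sketch follows; the helper name and the sample structure are invented, and the code only renames keys (it does not touch values, types, or namespaces).

```python
# Short JSON-LD key -> vocabulary term, exactly the pairs listed above.
SHORT_TO_VOCAB = {
    "body": "hasBody",
    "target": "hasTarget",
    "source": "hasSource",
    "selector": "hasSelector",
    "state": "hasState",
    "scope": "hasScope",
    "startSelector": "hasStartSelector",
    "endSelector": "hasEndSelector",
    "motivation": "motivatedBy",
    "purpose": "hasPurpose",
    "stylesheet": "styledBy",
    "cached": "cachedSource",
}
VOCAB_TO_SHORT = {vocab: short for short, vocab in SHORT_TO_VOCAB.items()}


def rename_keys(node, table):
    """Recursively rename dict keys found in `table`; values are left untouched."""
    if isinstance(node, dict):
        return {table.get(key, key): rename_keys(value, table)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_keys(item, table) for item in node]
    return node


# Invented sample structure using the short keys.
short_form = {
    "body": "http://example.org/body",
    "target": {"source": "http://example.org/page", "selector": {"type": "TextQuoteSelector"}},
}
long_form = rename_keys(short_form, SHORT_TO_VOCAB)
round_trip = rename_keys(long_form, VOCAB_TO_SHORT)
```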

We already argued a bit about these. Not sure we want to re-open this discussion at this late date. I personally think it desirable to maintain backward compatibility. But in keeping with idea of eliminating keys in foreign namespaces, if compelling enough case could be made, we could maintain 'superseded' terms as schema.org does (e.g., schema:review supersedes schema:reviews, schema:provider supersedes schema:carrier) or in some other way maintain longer term while preferring shorter term in RDFa as well as in json-ld. For those going back and forth between json-ld and other rdf serializations, it would make life a tiny bit easier.

All in all, seems like a lot of work preceded by extensive discussion (and potentially heated argument). Do not want to get derailed. But if there is consensus that we want the RDFa to look more like our JSON-LD (which is what schema.org clearly wanted) in order to facilitate serialization in HTML, these changes would go a long way in that direction.

Ultimately may depend on the strength of disagreements within the Group and the balance we settle on for HTML Serialization note between JSON-LD, RDFa, and extensions to HTML (the latter would presumably require its own distinct mapping).

What do others think? Go ahead. Be honest (but keep it clean and no personal attacks).

iherman · 2016-04-11T10:08:03Z

You are right, this is an issue I did not consider.

I do not want to get into this argument either. As I see it, we have two alternative strategies here, and we should strictly limit ourselves to these two.

(1) For RDFa we use the terms in the vocabulary. No change at all, just in the namespace.
(2) For RDFa we use the terms of JSON-LD. Again, no change at all, just an alias from that term to the vocabulary.

I do not think we should reopen any terminology issue on these, it is bikeshedding at this point.

There are pros and cons for both. I am personally tempted to go for (2), because if users want to mix JSON-LD in a script element with RDFa (which is a perfectly viable option), then (2) avoids a mess. On the other hand, hard-core RDF people would want to rely on the formal vocabulary terms (but, then again, hard-core RDF people would have no problem using namespaces, ie, this aliasing exercise may be of no interest to them in the first place).

BigBlueHat · 2016-05-18T14:08:35Z

👍 to the "use the terms of JSON-LD" option. Despite the nose holding from our RDF friends. 😉

iherman · 2016-05-18T14:10:48Z

Discussed at F2F, 18.05.16: agreed to move on with a note documenting the existing possibilities (ie, JSON-LD in script, and RDFa with one giant namespace document).

tcole3 · 2017-02-15T18:05:06Z

The use cases were needed to help write the HTML Serialization Note. Completion and ratification of this Note by the WG (final ed. draft: http://w3c.github.io/web-annotation/serialization-html-note/) closes this issue.

tcole3 added the serialization label Feb 1, 2016

iherman added the editor_action label May 18, 2016

azaroth42 added the selector note label May 18, 2016

azaroth42 assigned tcole3 May 18, 2016

iherman added html note and removed serialization selector note labels Oct 14, 2016

iherman added this to the V1 Rec milestone Jan 23, 2017

tcole3 closed this as completed Feb 15, 2017
