@gkellogg
Member

@gkellogg gkellogg commented Apr 8, 2015

Remove the `property value` term, referring instead to the appropriate annotation value within the model document.
Fixes #463.
@gkellogg gkellogg changed the title from "Change the value term in the model document to annotation value." to "Issue 463: Change the value term in the model document to annotation value." Apr 8, 2015
@iherman
Member

iherman commented Apr 9, 2015

I feel some unease, but I am not sure I can put my finger on it very precisely.

This set of changes was mostly triggered by the approach that the model document and the conversions should stand on their own, because other specifications may be created in the future that might create an annotated table through some other means than what is described in the metadata document. And if such specs are created, we do not want to redo, say, the metadata spec. I hope I understand this right.

However, ideally, what this should mean is that the syntax document should actually not refer to the metadata document in any normative way. However, that would require a much more thorough re-write. Let me take two examples for my unease (referring to the version in the branch-to-be-merged).

On the one hand, the metadata document describes aboutUrl as a URI Template property with what we have before in section 4.7. On the other hand, the syntax document says (in 3.5):

about URL — a URL for the entity that this cell provides information about, or null. The value of this annotation is derived as described in (reference to section 4.7)

Surely then this should be changed, right? How the aboutUrl came into existence is irrelevant as far as the model document is concerned; other specifications may end up producing the URL by other means. The metadata document should not define an annotation value!

Another, similar, issue is around natural language properties (as we call them in the metadata document). The syntax document says

titles — any number of human-readable titles for the column, each of which has an associated language

whereas the metadata document describes them in much more operational detail:

The annotation value of a natural language property is an object whose properties are language codes and where the values of those properties are an array of strings (see Language Maps in [JSON-LD]).

But... is it o.k. to actually define the annotation value in the metadata document? Shouldn't the definition be part of the syntax document entirely? The language map structure is, in this sense, irrelevant for the syntax document, isn't it?

In general, I believe all references to the metadata document from the syntax document should be considered and, if possible, removed, more exactly replaced by the definition of the annotation value in that document.

Sorry if I am not entirely systematic in my rambling...

@gkellogg
Member Author

gkellogg commented Apr 9, 2015

It is too late this evening for me to dive into this, but I take your point. This change did not introduce references from the model document to the metadata document, but it has exacerbated the problem.

I'll work on trying to stick to a purely functional definition of what these annotations mean in the model, and an operational view of the properties and how they affect the annotation values from the metadata document.

I was worried that too many changes this late in the game would be destabilizing, but I'll see what I can do. In the meantime, please suggest any specific changes you'd like to see.

Regarding about url in particular, it is used in two places: the column-about-url and cell-about-url. The Metadata document directly correlates to the column-about-url, but the cell-about-url is derived using the description of the URL template property in the metadata document. How do you think this might be reconciled? Moving the entire description of how URL template properties are turned into URLs into the model document?

Many other things are simpler, and can be described in terms of how the metadata property is used to create the corresponding annotation value. The title annotation, though, should probably more abstractly talk about multiple language-tagged values, rather than the fact that they are in an object whose properties are language tags having multiple values all of that language.

@iherman
Member

iherman commented Apr 9, 2015

[@gkellogg:] I was worried that too many changes this late in the game would be destabilizing, but I'll see what I can do.

Yep, that was my worry as well. But, well, we decided yesterday we would do it even if this means shifting the publication date. Thinking about all this a little bit I seriously doubt we can make the publication next week (at least if we go down that line), but we shall see.

[@gkellogg:] Regarding about url in particular, it is used in two places: the column-about-url and cell-about-url. The Metadata document directly correlates to the column-about-url, but the cell-about-url is derived using the description of the URL template property in the metadata document. How do you think this might be reconciled? Moving the entire description of how URL template properties are turned into URLs into the model document?

Actually, I believe the aboutUrl and friends are "simply" absolute URL-s in the model document, i.e., as table annotations. The fact that, in the metadata document, they are derived using a particular template syntax, is irrelevant as far as the model is concerned (other specifications may decide to use very different means to generate those URL-s).

While this means a simple change in the syntax and model documents, we have to be careful that this also modifies the conversion document. Indeed, in section 3.1 of the csv2rdf document it says:

aboutUrl is the evaluation of the URI template property aboutUrl for the current cell.

which probably should be removed; in terms of the conversion, aboutUrl is simply an absolute URL as defined in the syntax.

Once the model and syntax documents are changed, the conversion documents will have to go through a thorough rewrite, too! @6a6d74 :-(

[@gkellogg:] Many other things are simpler, and can be described in terms of how the metadata property is used to create the corresponding annotation value. The title annotation, though, should probably more abstractly talk about multiple language-tagged values, rather than the fact that they are in an object whose properties are language tags having multiple values all of that language.

Well, the current specification of the title annotation is fine I believe:

any number of human-readable titles for the column, each of which has an associated language

It is the metadata document that should change, removing the reference to JSON-LD structures altogether; processing the metadata produces essentially an array ("any number") of strings with an associated language as described in the model (this also means a slight modification of the merging algorithm).

I think the most spectacular change that we have to do concerns the datatypes. Indeed, the cell values are defined in the syntax document as

value — the semantic value of the cell; this MAY be of a datatype other than a string, MAY be a list, and MAY be null. For example, annotations might enable a processor to understand the string value of the cell as representing a number or a date. By default, if the string value is an empty string, the semantic value of the cell is null. See Parsing Cells in [tabular-metadata] for details about how to compute the cell value.

I believe that the syntax document should clearly remove the cell parsing reference, but it should include the allowed datatypes. Essentially, the whole of 4.11 from the model document should be moved into the syntax, because those datatypes are constraints on what datatypes the model may include and they also drive the datatypes used in the conversions. Parsing the cells (in the model document) produces such values (so the parsing algorithm stays as it is and where it is).

(B.t.w., the definition of values above should also include a reference to the language information, too!)

Sigh... yes, it is a lot of work.

/Cc: @JeniT

@gkellogg
Member Author

gkellogg commented Apr 9, 2015

@iherman I removed most of the explicit references to metadata, at least as they describe how the annotation values are derived. I still need to make sure the metadata document properly describes how these annotation values are created.

Note that I ended up changing the titles definition to reflect the object structure used for normalized titles created in the metadata; this is really the most useful thing for conversion documents, and conforms with the statement at the top of the metadata document about the kinds of values annotations may take. Note that these are not JSON-LD structures, but our own representation of Natural language properties which is similar, but not the same; I think it entirely appropriate to use them here too.

I don't think that the datatypes section from metadata needs to be moved over, as it's used to derive the datatypes values. Really, these values are actually RDF Literals, and perhaps should be described as such; we could even get rid of cell-value-URL and just have this be included in the cell-value, and make it an RDF Term (exclusive of BNodes). This is really what they are, even if a serialization may not represent it that way. It's also how my implementation works, and seems the most logical. Alternatively, we could re-invent this, and just say that the values have string, datatype, and language facets, and may be absolute URLs.

See what you think.

@gkellogg
Member Author

gkellogg commented Apr 9, 2015

After working on the metadata document, I do believe that much of the Datatype and Parsing Cells needs to be moved to the model document. The syntactic requirements for Datatypes must remain so that they can be normalized to form the datatype annotation on the column.

Annotations on Rows and Cells should probably be moved to the model document and normatively describe creating Row and Cell annotations. The Parsing Tabular Data section needs to have a non-normative subsection containing the current algorithm, but needs to reference Parsing Cells, including Datatype parsing and the creation of the value of the cell, along with other cell annotations.

We should consider merging the value-url and value annotations on a cell.

/cc @JeniT, @iherman, @6a6d74

… enough of Datatypes in metadata to describe how the annotations are created.

Fixes #463.
@gkellogg gkellogg removed the Blocked label Apr 10, 2015
@gkellogg
Member Author

I think this last set of massive edits accomplishes the separation we need. There is still a reference to URI Template processing in the metadata document, but it seems reasonable to leave this there.

@iherman
Member

iherman commented Apr 10, 2015

First of all, deep bow towards San Francisco:-)

I have made some edits to the text; I will commit them (with comments) separately, so that you can accept them or change them. I also have some comments below that may require some discussion, so I did not want to change them.

  • I was struggling a bit with the presence of the aboutUrl and friends in the column annotations. Do we need them there? The final, purely URL values appear on the cell level annotation, and that seems to be enough. (Of course, implementations may optimize these things for conversions, i.e., extract the common aboutUrl values for cells, but that is not something for the model)
  • The title property in the syntax is defined as "any number of human-readable titles for the column, each of which has an associated language represented as an object whose properties MUST be language codes as defined by [[!BCP47]] and whose values are arrays of strings related to that language." Is this correct, grammatically? But, more importantly... I personally would have preferred to keep it as it was, i.e., something like "any number of human readable titles for the column, each of which has an associated language as defined in [[!BCP47]]". However, if we make that change, the metadata document has to change too (for natural language properties), so I did not make any change.
  • I note that there are still two references to the metadata documents (beyond the url templates): foreign keys and transformations. I have no problem for the latter, I wonder whether the former should not be rather defined in the syntax document.
  • I see that you have kept the datatype diagram and corresponding text in the metadata document. It is not strictly necessary; after all, the definition is now in the syntax document. I do not have any strong feeling whether it is good to keep it in the metadata document or not; I thought I would flag it nevertheless...

/Cc @JeniT

@iherman
Member

iherman commented Apr 10, 2015

Just for the good order, here are the changes I made in d567e9c:

Syntax document:

  • I have added some words in the abstract to make it clear that other applications may come with other means of creating annotations, although the standard metadata format is the one we have defined
  • In 3.1 I have changed "resources" to "tables", to be consistent with the changes we introduced elsewhere
  • I changed a bit the reference to common properties. Other mechanisms may generate those additional annotations through different means, and the current text read as if those would come only from the common properties
  • I also removed the dependency of notes on the metadata document. It does not really bring too much to refer to it.
  • I have added a reference to BCP47 for the lang annotation for columns. The document already uses that for the titles, and this restriction is needed for conversions.

Metadata document:

  • I have added (well, copied from the text) a paragraph from the abstract. It defines the role of this document better...

@6a6d74
Contributor

6a6d74 commented Apr 10, 2015

All - the dependencies from csv2* docs back to the model doc (e.g. the description of aboutUrl and friends referring to URI Template Properties) remain in place. I'm guessing that these references could also be wrapped up in @JeniT's edit relating to ISSUE #445 in order to remove reference to "properties" in favour of "annotations"?

I'm reticent to begin another round of edits but will do so if there's consensus. (need to avoid creating merge conflicts though!)

cc/ @JeniT

@iherman
Member

iherman commented Apr 10, 2015

On 10 Apr 2015, at 17:48 , Jeremy Tandy notifications@github.com wrote:

All - the dependencies from csv2* docs back to the model doc (e.g. the description of aboutUrl and friends referring to URI Template Properties) remain in place. I'm guessing that these references could also be wrapped up in @JeniT's edit relating to ISSUE #445 in order to remove reference to "properties" in favour of "annotations"?

Very honestly I do not know. And the problem is that Jeni seems to be on the road (at least that is what she said...)

Ivan

I'm reticent to begin another round of edits but will do so if there's consensus. (need to avoid creating merge conflicts though!)

cc/ @JeniT




Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704

@gkellogg
Member Author

The point of having column about URL separate from cell about URL (and for property and value URLs) is to record the original template on the column, for similar reasons that we might want to record the schema on Table. The cell * URL annotations are fully resolved, and can be used by the csv2* documents without needing to refer to any URL template language.
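
A minimal sketch of that split, assuming a hypothetical `expand_simple_template` helper that handles only plain `{name}` expressions (a small subset of URI Template expansion), plus an invented table URL and row. The column keeps the template, while each cell receives a fully resolved absolute URL:

```python
import re
from urllib.parse import quote, urljoin


def expand_simple_template(template, variables):
    """Expand plain `{name}` expressions only (hypothetical helper, not full RFC 6570)."""
    def substitute(match):
        value = variables.get(match.group(1))
        # Treat undefined variables as empty; percent-encode all but unreserved characters.
        return "" if value is None else quote(str(value), safe="")
    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


def cell_about_url(column_about_url, row_values, table_url):
    """Derive the resolved cell-level URL from the template recorded on the column."""
    return urljoin(table_url, expand_simple_template(column_about_url, row_values))


# Illustrative data only: the column annotation keeps the template ...
column_about_url = "#gid-{GID}"
row = {"GID": "1", "on_street": "ADDISON AV"}
# ... while the cell annotation is an absolute URL with no template syntax left in it.
print(cell_about_url(column_about_url, row, "http://example.org/tree-ops.csv"))
```

Under this arrangement a conversion only ever reads the resolved cell value, so it would not need to know which template language (if any) produced it.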

If you like, I can take a pass at updating the csv2rdf document language as part of this PR.

@gkellogg
Member Author

@iherman:

I was struggling a bit with the presence of the aboutUrl and friends in the column annotations. Do we need them there? The final, purely URL values appear on the cell level annotation, and that seems to be enough. (Of course, implementations may optimize these things for conversions, i.e., extract the common aboutUrl values for cells, but that is not something for the model)

This was the result of issue #446, which we didn't ever discuss. But, it is consistent with recording the value of inherited properties as annotations on the columns.

@6a6d74
Contributor

6a6d74 commented Apr 10, 2015

@gkellogg - if you can take a pass thru the csv2* docs that would be great.
Happy to review ...

On Friday, 10 April 2015, Gregg Kellogg notifications@github.com wrote:

The point of having column about URL separate from cell about URL (and
for property and value URLs) is to record the original template on the
column, for similar reasons that we might want to record the schema on
Table. The cell * URL annotations are fully resolved, and can be used by
the csv2* documents without needing to refer to any URL template language.

If you like, I can take a pass at updating the csv2rdf document language
as part of this PR.



@gkellogg
Member Author

@iherman:

The title property in the syntax is defined as "any number of human-readable titles for the column, each of which has an associated language represented as an object whose properties MUST be language codes as defined by [[!BCP47]] and whose values are arrays of strings related to that language." Is this correct, grammatically? But, more importantly... I personally would have preferred to keep it as it was, i.e., something like "any number of human readable titles for the column, each of which has an associated language as defined in [[!BCP47]]". However, if we make that change, the metadata document has to change too (for natural language properties), so I did not make any change.

Ultimately, the form needed in the model annotation should be whatever is most convenient for the conversion documents, IMO. This also relates to a previous statement I made that it might be better if the cell value were actually described as an RDF Term or Literal. In this case, the title annotations could be a set of RDF Literals (either of type xsd:string, or rdf:langString). That makes them easiest to use in the conversion documents, but it introduces some conceptual baggage.

Grammatically, I think the sentence is correct, but it certainly is multi-layered. However, I can revert this to make it more vague, if that doesn't just make life harder for the conversion documents.

@gkellogg
Member Author

@iherman:

I note that there are still two references to the metadata documents (beyond the url templates): foreign keys and transformations. I have no problem for the latter, I wonder whether the former should not be rather defined in the syntax document.

The table foreign keys annotation is somewhat similar to the column about URL annotation, in that it captures the information from the metadata document. The actual foreign key relationships are part of the row referenced rows annotation. I'll leave this for @jenitt to figure out.

@gkellogg
Member Author

@iherman:

I see that you have kept the datatype diagram and corresponding text in the metadata document. It is not strictly necessary; after all, the definition is now in the syntax document. I do not have any strong feeling whether it is good to keep it in the metadata document or not; I thought I would flag it nevertheless...

I think it's useful to have the diagram in both places. As a metadata author, it prevents having to bounce back and forth between the metadata and model documents.

@gkellogg
Member Author

I believe I've made necessary changes to both csv2rdf and csv2json documents. @6a6d74, if you'd please check them out.

@gkellogg
Member Author

Also, it might be worth considering taking the common parts of examples (annotation descriptions, mostly) and putting them in common included files rather than repeating them inline, since the repetition makes fixing issues consistently more difficult.

@iherman
Member

iherman commented Apr 11, 2015

On 10 Apr 2015, at 19:17 , Gregg Kellogg notifications@github.com wrote:

The point of having column about URL separate from cell about URL (and for property and value URLs) is to record the original template on the column, for similar reasons that we might want to record the schema on Table. The cell * URL annotations are fully resolved, and can be used by the csv2* documents without needing to refer to any URL template language.

Yep, I can see the point. (B.t.w., the schema annotation is not yet in the document.) Maybe it is worth emphasizing that these annotations are there as a historical record, i.e., that other applications generating annotations may ignore them.

Ivan

If you like, I can take a pass at updating the csv2rdf document language as part of this PR.



@iherman
Member

iherman commented Apr 11, 2015

On 10 Apr 2015, at 19:30 , Gregg Kellogg notifications@github.com wrote:

@iherman:

I note that there are still two references to the metadata documents (beyond the url templates): foreign keys and transformations. I have no problem for the latter, I wonder whether the former should not be rather defined in the syntax document.

The table foreign keys annotation is somewhat similar to the column about URL annotation, in that it captures the information from the metadata document. The actual foreign key relationships are part of the row referenced rows annotation. I'll leave this for @jenitt to figure out.

Ok

Ivan



@iherman
Member

iherman commented Apr 11, 2015

On 10 Apr 2015, at 19:31 , Gregg Kellogg notifications@github.com wrote:

@iherman:

I see that you have kept the datatype diagram and corresponding text in the metadata document. It is not strictly necessary; after all, the definition is now in the syntax document. I do not have any strong feeling whether it is good to keep it in the metadata document or not; I thought I would flag it nevertheless...

I think it's useful to have the diagram in both places. As a metadata author, it prevents having to bounce back and forth between the metadata and model documents.

Ok



@iherman
Member

iherman commented Apr 11, 2015

On 10 Apr 2015, at 19:35 , Gregg Kellogg notifications@github.com wrote:

@iherman:

Just for the good order, here are the changes I made in d567e9c:

My confusion was that I think you meant ba2d36b, rather than d567e9c (I thought those edits looked familiar, but it did find an error).

I am sorry... :-)

Ivan

+1 all these changes are great!



@iherman
Member

iherman commented Apr 11, 2015

On 10 Apr 2015, at 19:27 , Gregg Kellogg notifications@github.com wrote:

@iherman:

The title property in the syntax is defined as "any number of human-readable titles for the column, each of which has an associated language represented as an object whose properties MUST be language codes as defined by [[!BCP47]] and whose values are arrays of strings related to that language." Is this correct, grammatically? But, more importantly... I personally would have preferred to keep it as it was, i.e., something like "any number of human readable titles for the column, each of which has an associated language as defined in [[!BCP47]]". However, if we make that change, the metadata document has to change too (for natural language properties), so I did not make any change.

Ultimately, the form needed in the model annotation should be whatever is most convenient for the conversion documents, IMO.

Well, for RDF, a simple array of language tagged literals is certainly easier; the JSON-LD form has to be unpacked, so to say. But the difference is not big.

What about:

"any number of human-readable titles for the column; titles are grouped by language codes as defined by [[!BCP47]], each group consisting of any number of titles in that language."

which is a bit less implementation specific and (maybe) clearer?
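
To make the two shapes under discussion concrete, here is a small sketch with invented example titles. The function names `flatten_titles` and `group_titles` are hypothetical and do not come from any of the documents; the sketch only shows that the grouped form and a flat list of language-tagged titles carry the same information:

```python
from typing import Dict, List, Tuple


def flatten_titles(titles_by_language: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Unpack a language-keyed grouping into (title, language) pairs."""
    return [(title, language)
            for language, titles in titles_by_language.items()
            for title in titles]


def group_titles(tagged_titles: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Regroup (title, language) pairs by language code, preserving order."""
    grouped: Dict[str, List[str]] = {}
    for title, language in tagged_titles:
        grouped.setdefault(language, []).append(title)
    return grouped


# Invented example titles for one column.
titles = {"en": ["Street", "Road"], "fr": ["Rue"]}
pairs = flatten_titles(titles)
print(pairs)
print(group_titles(pairs) == titles)
```

The flat list is closer to what an RDF conversion would consume, while the grouped form matches the proposed wording; converting between them is a one-line comprehension either way.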

Ivan

This also relates to a previous statement I made that it might be better if the cell value were actually described as an RDF Term or Literal. In this case, the title annotations could be a set of RDF Literals (either of type xsd:string, or rdf:langString). That makes them easiest to use in the conversion documents, but it introduces some conceptual baggage.

Grammatically, I think the sentence is correct, but it certainly is multi-layered. However, I can revert this to make it more vague, if that doesn't just make life harder for the conversion documents.



@JeniT

JeniT commented Apr 11, 2015

Just to say, I did a bunch of edits on the plane on this branch on the conversion documents, but there are lots of merge conflicts on them now. I'm in an all-day meeting today, but either later today or during tomorrow I will resolve those and merge this in. All times are US times for me...

@JeniT

JeniT commented Apr 11, 2015

(Still in the process of merging.) I have noticed one mismatch in expectations as I go so flagging it in case it requires discussion: in a cell value that is a list, each value can have its own datatype and language. There were lots of examples that made this clear but I'm not sure where they are now.

@gkellogg
Member Author

If cell sub-values can have different datatypes or languages, we probably need a term to capture this. This is consistent with how we set the datatype and language on these values in the cell parsing steps, though. We should note that creating such annotations through this algorithm where these annotations are different is not possible now, though. This might go with the LCCR issue on multiple datatypes per column, though.
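
A rough sketch of the distinction, under stated assumptions: `SubValue` and `parse_list_cell` are hypothetical names, datatypes are plain strings, and the parsing step is reduced to a split on a separator. The structure allows each element its own datatype and language, but a column-driven parse like this one can only ever produce uniform elements:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SubValue:
    """One element of a list-valued cell, carrying its own datatype and language."""
    lexical: str
    datatype: str
    language: Optional[str] = None


def parse_list_cell(string_value: str, separator: str,
                    datatype: str, language: Optional[str] = None) -> Optional[List[SubValue]]:
    """Split a cell's string value; every element inherits the column-level settings."""
    if string_value == "":
        return None  # assumed default: an empty string value yields a null cell value
    return [SubValue(part, datatype, language) for part in string_value.split(separator)]


# Produced by the column-driven sketch: all elements share one datatype.
uniform = parse_list_cell("1;2;3", ";", "integer")
# Representable in the structure, but not reachable through parse_list_cell.
mixed = [SubValue("1", "integer"), SubValue("un", "string", "fr")]
print(uniform)
print(mixed)
```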

Jeni Tennison added 3 commits April 12, 2015 09:59
@JeniT

JeniT commented Apr 12, 2015

I'm merging this now because there are lots of large changes and it will be a pain to apply other changes without these in place.

JeniT pushed a commit that referenced this pull request Apr 12, 2015
Issue 463: Change the `value` term in the model document to `annotation value`.
@JeniT JeniT merged commit 9456c82 into gh-pages Apr 12, 2015
@JeniT JeniT deleted the issue-463-property-values branch April 12, 2015 14:09
gkellogg added a commit that referenced this pull request Apr 12, 2015
… (comment):

> The title property in the syntax is defined as "any number of human-readable titles for the column, each of which has an associated language represented as an object whose properties MUST be language codes as defined by [[!BCP47]] and whose values are arrays of strings related to that language." Is this correct, grammatically? But, more importantly... I personally would have preferred to keep it as it was, i.e., something like "any number of human readable titles for the column, each of which has an associated language as defined in [[!BCP47]]". However, if we make that change, the metadata document has to change too (for natural language properties), so I did not make any change.
Development

Successfully merging this pull request may close these issues.

Property values

5 participants