Should tables have a schema annotation? #444

Closed
JeniT opened this issue Apr 5, 2015 · 12 comments · Fixed by #477

@JeniT

JeniT commented Apr 5, 2015

The data model for tables currently does not support a reference to the schema that the table adheres to. Being able to retrieve the schema for a table could be useful to an application, e.g. to support data entry or display, and it might be useful to be able to tell which tables within a given table group have the same schema.

So I wonder whether we should add a schema annotation on tables that would be populated through the tableSchema property within a metadata file. My thinking was that the annotation would hold a URL and be populated if and only if the schema for the table had an @id property (e.g. if it was referenced by URL in the first place). Thoughts?

@iherman
Member

iherman commented Apr 5, 2015

@JeniT, could you give an example of what you are considering? I am not sure I understand.

@JeniT
Author

JeniT commented Apr 5, 2015

If the metadata contains:

"tableSchema": "http://example.org/schema.json"

then the data model would include a table whose schema annotation would have the value http://example.org/schema.json.
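
As a rough illustration of the mapping described here, the sketch below models a table that carries a schema annotation. The `AnnotatedTable` class, the `build_table` function, and the table URL are hypothetical names invented for this example; they are not taken from the specification or from any implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedTable:
    """Hypothetical stand-in for a table in the data model."""
    url: str
    schema: Optional[str] = None  # the proposed schema annotation


def build_table(description: dict) -> AnnotatedTable:
    """Populate the schema annotation from a string-valued tableSchema."""
    table_schema = description.get("tableSchema")
    # Only the by-reference case is covered here: a string is a schema URL.
    schema_url = table_schema if isinstance(table_schema, str) else None
    return AnnotatedTable(url=description["url"], schema=schema_url)


table = build_table({
    "url": "http://example.org/data.csv",  # assumed table URL
    "tableSchema": "http://example.org/schema.json",
})
print(table.schema)
```

A table description without a tableSchema property would leave the annotation empty in this sketch.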

@iherman
Member

iherman commented Apr 6, 2015

Ah. So, in a sense, it is sort of preserving history when creating the overall data model, knowing where a particular schema comes from (if loaded from the outside). But this is not something the author of the metadata is supposed to create, rather it is added by the process building the annotated tabular data, right?

The question is: what if the author does add a schema property to a schema, and that property value is not identical to the URL where the file comes from?

I am also not clear how the URL of the file of the schema (http://example.org/schema.json in your example) and the value of the @id (if any) in that schema relate. Would they both be added to the schema property? Only the URL?

@JeniT
Author

JeniT commented Apr 6, 2015

Yes, preserving history.

To answer your second query, the @id comes during normalisation:

If the property is an object property with a string value, the string is a URL referencing a JSON document containing a single object. Dereference this URL and replace the string value with that object, adding @id using the original URL unless it already exists. Normalize each value in the resulting or original object recursively using this algorithm.

I don't understand your first question. Perhaps you were asking about what happens if the author adds a @id property to the schema explicitly? If so, that's the one that's used, even if it's different from the location of the schema file.
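
The normalisation step quoted above can be sketched roughly as follows. This is a minimal illustration, not the reference algorithm: the `normalize_object_property` function, the `fetch` callback, and the in-memory `DOCUMENTS` table are assumptions made so the example runs without network access, and the recursive normalisation of nested values is left out.

```python
from typing import Callable


def normalize_object_property(value, fetch: Callable[[str], dict]):
    """Apply the quoted rule to the value of an object property."""
    if isinstance(value, str):
        # The string is a URL referencing a JSON document with one object.
        obj = dict(fetch(value))
        # Add @id from the original URL unless the document already has one.
        obj.setdefault("@id", value)
        return obj
    return value


# Hypothetical in-memory "web" standing in for real HTTP dereferencing.
DOCUMENTS = {
    "http://example.org/schema.json": {"columns": []},
    "http://example.org/moved.json": {
        "@id": "http://example.org/canonical.json",
        "columns": [],
    },
}

referenced = normalize_object_property(
    "http://example.org/schema.json", DOCUMENTS.__getitem__)
explicit = normalize_object_property(
    "http://example.org/moved.json", DOCUMENTS.__getitem__)
print(referenced["@id"])
print(explicit["@id"])
```

The first schema receives the URL it was loaded from as its @id; the second keeps the @id its author wrote, even though it differs from the location of the file.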

@iherman
Member

iherman commented Apr 6, 2015

Yes, preserving history.

Thanks, I now understand what you are aiming for :-)

To answer your second query, the @id comes during normalisation:

If the property is an object property with a string value, the string is a URL referencing a JSON document containing a single object. Dereference this URL and replace the string value with that object, adding @id using the original URL unless it already exists. Normalize each value in the resulting or original object recursively using this algorithm.

I.e., the @id value wins even if it differs from the original URL.

I don't understand your first question. Perhaps you were asking about what happens if the author adds a @id property to the schema explicitly? If so, that's the one that's used, even if it's different from the location of the schema file.

No, that was not my question. Let us say the original metadata refers to http://ex.org/foo.json, and the author of the latter includes, in that file,

  "@id" : "http://ex.org/foo.json",
  "schema" : "http://ex.org/bar.json"

How would we interpret http://ex.org/bar.json at the end of the day? It is probably an error but this is something that should be documented.

Actually... I wonder whether we should not make it even more restrictive, saying that this is one of those annotation properties (like row properties) that are generated by the process, but it is not part of the properties listed in the metadata document in the sense that the user cannot set that explicitly. That may avoid stupid errors.

B.t.w., I am fine with the original idea, just trying to flesh it out...

@gkellogg
Member

gkellogg commented Apr 6, 2015

I don't understand your first question. Perhaps you were asking about what happens if the author adds a @id property to the schema explicitly? If so, that's the one that's used, even if it's different from the location of the schema file.

No, that was not my question. Let us say the original metadata refers to http://ex.org/foo.json, and the author of the latter includes, in that file,

"@id" : "http://ex.org/foo.json",
"schema" : "http://ex.org/bar.json"

How would we interpret http://ex.org/bar.json at the end of the day? It is probably an error but this is something that should be documented.

I presume you mean "tableSchema" and not "schema", which is not a metadata property. In any case, that file is, itself, a Schema, and cannot have a "tableSchema" property, as that is used on a Table, not a Schema. I don't see how such an ambiguity might arise.

@iherman
Member

iherman commented Apr 6, 2015

No, I meant "schema", the (new) property @JeniT proposes!


@gkellogg
Member

gkellogg commented Apr 6, 2015

Ah, okay; if we want such a property, I'd suggest that we continue to call it tableSchema, as if the model is serialized, it would be difficult to use schema. In general, keeping the same properties in both the metadata and the abstract model makes sense to me. (In response to a request by @6a6d74, I did actually allow an Abstract Table to be serialized to JSON in my implementation, so the use of a schema property would be a complication).

If the Table Description has a Schema containing @id (after normalization), the value of @id forms the schema (or tableSchema) annotation on the Annotated Table.

If the referenced file contained a schema annotation itself, it would not be a valid Schema (as schema is not a defined property for a Schema). This would also be a problem if the Table Definition contained a schema property.

@iherman
Member

iherman commented Apr 6, 2015

Ah, okay; if we want such a property, I'd suggest that we continue to call it tableSchema, as if the model is serialized, it would be difficult to use schema.

I am not sure about this. The way I understand @JeniT is that if I have a metadata of the form:

{
  "resources" : [{
    "url" : "http://www.ex.org/meta",
    "tableSchema" : "http://www.ex.org/schema"
  }]
  ...

then, after dereferencing the schema file, one would get something like

{
  "resources" : [{
    "url" : "http://www.ex.org/meta",
    "tableSchema" : {
       "schema" : "http://www.ex.org/schema",
       "columns" : [{
          ...
       }, {
          ...
       }],
       "aboutUrl" : "..."
       ...
    }
  }]
  ...

The roles of the schema and tableSchema properties are very different in this configuration, i.e., I do not think we should use the same property. If nothing else, tableSchema is a Link Property in our jargon, whereas schema would probably be, simply, an Atomic Property...

I.e., I believe having schema is fine for what I called maintaining history. My only problem is whether this is a "derived" property in the model or an explicitly defined property that the user can also set. My preference would be the former.

iherman added a commit that referenced this issue Apr 6, 2015
the SVG and PNG files will have to wait for the resolution of issue #444
@JeniT
Author

JeniT commented Apr 6, 2015

No no, @iherman you are completely misunderstanding me.

I am not suggesting any new JSON property in the metadata document. I am suggesting a new annotation in the model, based on the existing tableSchema property in the JSON. (I am trying to be consistent in terminology: properties are used in the JSON metadata document; annotations are used on the conceptual model of the table.)

@iherman
Member

iherman commented Apr 7, 2015

@JeniT, I do not think we disagree, or that I completely misunderstand you :-). The "resulting" metadata object in my comment is what is conceptually produced by a processor. We currently have no formalism to describe new annotation properties in the model and in the model only... My preference was also that this is not expressed by the user, so we are on the same page.

I think it is fine to have this. My only issue is how exactly this should be expressed editorially. As far as I know, this is the first annotation property that must be generated by the processor but it is not part of the metadata vocabulary. I guess:

  • it has to be formally defined as an annotation property in the model
  • it should be mentioned, probably in the processing section in the model document, that the processor must generate that extra annotation property

There are two extra questions though.

  1. tableSchema is not the only link property; we have others. Would we want to keep the history the same way as for table schemas? E.g., for dialects? I am not really in favor of it, but I thought I would flag that as a question.
  2. I can also imagine that an RDF generation may want to use this information as part of the provenance block. Do we want to work that out, or leave it for the LCCR? (I would prefer to leave it for now).

@gkellogg
Member

At this point, I think it could easily be accomplished by adding some wording to the @id property of a Schema:

The value of this property becomes the value of the schema annotation for this table.

No further logic should be required, as if it is defined after normalization, then it can be used; it doesn't matter that it was originally a reference to an external schema, or that the @id property was added as a result of dereferencing the URL.
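
A minimal sketch of this rule, assuming the table description has already been normalized so that tableSchema is always an object: the annotation is simply the schema's @id when one is present. The `schema_annotation` function and the sample descriptions are hypothetical and only serve the illustration.

```python
from typing import Optional


def schema_annotation(normalized_table: dict) -> Optional[str]:
    """Return the normalized schema's @id, or None when it has none."""
    schema = normalized_table.get("tableSchema")
    if isinstance(schema, dict):
        return schema.get("@id")
    return None


# Schema that was referenced by URL: normalization added the @id.
by_reference = {"tableSchema": {"@id": "http://example.org/schema.json",
                                "columns": []}}
# Schema written inline without an @id: no annotation is produced.
inline_anonymous = {"tableSchema": {"columns": []}}

print(schema_annotation(by_reference))
print(schema_annotation(inline_anonymous))
```

Because the function only looks at the normalized object, it does not need to know whether the @id was written by the author or added while dereferencing the URL.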

@JeniT JeniT self-assigned this Apr 12, 2015
JeniT pushed a commit that referenced this issue Apr 12, 2015
@JeniT JeniT assigned gkellogg and unassigned JeniT Apr 12, 2015