Which is the relationship between the format and the type of External Web resources? #308

Open
gsergiu opened this issue Jun 16, 2016 · 5 comments

@gsergiu

gsergiu commented Jun 16, 2016

The current list of classes for type has no counterparts for the model, message and multipart top-level media types.
Multipart seems to be used for binary streams, model for 3D models, and message for self-contained messages, which are a special kind of application/structured data.

  1. Should 3D model and BinaryStream classes be added to the possible types of external resources?
  2. Does it really make sense to use the dctypes instead of MIME types? (I think that MIME types have much higher usage/coverage in web resources)
  3. There is a clear correlation/correspondence between the Format and the Type, and this is a source of inconsistencies in the External Resources. I think an explicit note should explain this relationship and state which property takes precedence when the data is inconsistent (I suppose this should be the format, as it is the more precise and more important information)
@gsergiu
Author

gsergiu commented Jun 16, 2016

PS: I think that the type classes are redundant when the format is present; however, they are more user-friendly representations, which are also useful for applying search filters.

@iherman iherman added this to the v2 milestone Jun 17, 2016
@azaroth42
Collaborator

The list of types is not exclusive -- Binary and 3D could easily be used if there was agreement as to the class to use. Which I don't think there is.

The mime types are in dc:format.

And there isn't necessarily a clear correlation. For example application/pdf is a Text, whereas text/csv is a Dataset. This is already in section 3.2.2: http://w3c.github.io/web-annotation/model/wd2/#classes
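
To make those two examples concrete, here is a minimal sketch in Python of an annotation whose body and target each carry both a type and a format. It is an illustration only, not text from the spec: the example.org IRIs and the helper name `top_level_media_type` are made up, and the resources are written as plain dictionaries mirroring the JSON-LD serialization.

```python
# Hypothetical annotation; all example.org IRIs are placeholders.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno1",
    "type": "Annotation",
    "body": {
        "id": "http://example.org/report.pdf",
        "type": "Text",                # class of the resource
        "format": "application/pdf",   # media type of the resource
    },
    "target": {
        "id": "http://example.org/data.csv",
        "type": "Dataset",
        "format": "text/csv",
    },
}


def top_level_media_type(resource):
    """Return the part of the media type before the slash (illustrative helper)."""
    return resource["format"].split("/", 1)[0]


# Print class, format and top-level media type side by side.
for role in ("body", "target"):
    resource = annotation[role]
    print(role, resource["type"], resource["format"], top_level_media_type(resource))
```

In this sketch the first part of the media type ("application", "text") does not predict the class ("Text", "Dataset"), which is why a client cannot derive one property from the other.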

@gsergiu
Author

gsergiu commented Jun 22, 2016

Hi @azaroth42 ,

Actually I think that the explanation included in the classes section is misleading, as it relies on a wrong interpretation of the MIME types.

For resources that do not have obvious media types, such as many data formats, it is also useful for a client to know that a resource with the format text/csv should not simply be rendered as plain text, despite the first part of the media type, whereas application/pdf may be able to be rendered by the user agent despite the main type being 'application'.

I think that the goal of these classes is to advertise the type of the carried information (content).

  • application/pdf is perfectly correct, as the client needs an external application in order to render PDFs.
  • text/csv is perfectly correct, meaning that this content can be rendered by (probably) any text editor. If some clients want to use other applications (e.g. Excel-like ones) to render CSV content, that's fine. But the annotations shouldn't "prevent/impose" the usage of any application on the client side. CSV per se is not a Dataset, as the format is only tokenized (comma-separated) text. It might be a textual serialization of a dataset, but even in this case the correct type is text and not dataset!
    One can embed a kind of schema in the CSV file and transform it into a Dataset format ... still, without special applications that are able to extract the schema, the CSV cannot be considered a Dataset.

@gsergiu
Author

gsergiu commented Jun 22, 2016

So ... my conclusions:

  1. The explanation of the classes needs improvement (in V1)!
  2. Reading the specification in comparison with TextualBody and SpecificResource, I would expect the implicit class of external resources to be ExternalResource. In order to be consistent with the definition of the External Resources, I suggest changing the definition of the type in the classes section, by replacing:
    "The type of the Body or Target resource" with "The type of the External Resource's content (in Body or Target)"

This will become consistent with the language and format references in the definition of the external resource:

Web Resources are identified with a IRI and have various properties, often including a format or language for the resource's content.

@tcole3
Contributor

tcole3 commented Jun 27, 2016

While recognizing that it may eventually be possible (and desirable) to improve/clarify the current discussion of resource format and type properties as found in Sections 3.2.1 and 3.2.2 of the Web Annotation Data Model (Bodies and Targets / External Web Resources and Bodies and Targets / Classes), and that it may eventually be worthwhile to review in the context of external resources our decision to align these properties with dc:format and rdf:type (Web Annotation Vocabulary, Sections 3.2.10 [dc:format] and 3.2.23 [rdf:type]), more experience with the use of these properties in the context of Web annotations is required before doing so.

Reconsidering our use of dc:format and rdf:type as properties of resources and making revisions such as proposed now to Version 1 of the Data Model document would be premature given what we know of experience in other domains (e.g., the Dublin Core community which has long distinguished between dc:format and dc:type[1], schema.org which distinguishes between schema:fileFormat and schema:additionalType[2, 3], etc.). As it stands, the current text of 3.2.1 and 3.2.2 with regard to type and format aligns well with previous experience in other domains, while leaving room for future refinement based on annotation-specific experience.

For example, I anticipate (but it is too early to assume) that in an annotation context, resource type (class) may be important not only for rendering, but also for filtering annotations. A user (or software agent) may want only annotations of images, or only annotations of texts, or more specifically only annotations of texts that are of a specific genre, e.g., novel or article, etc. (see definition of dc:type), regardless of whether the format of the Resource is text/plain, application/pdf, text/xml, or application/tei+xml. This use case would further illustrate the need to distinguish between format and type in much the way we have it now.
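
The filtering use case could look roughly like the following Python sketch. It is hypothetical: the helper names `as_list` and `has_target_of_type`, the sample annotations, and the example.org IRIs are all invented for illustration, and the code only assumes that a target may be a single resource or a list, and that type may be a single class or a list of classes.

```python
def as_list(value):
    """Normalize a missing, single, or list value to a list (illustrative helper)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def has_target_of_type(annotation, wanted_type):
    """True if any target of the annotation declares the wanted class."""
    for target in as_list(annotation.get("target")):
        # A target given as a bare IRI string carries no type information.
        if isinstance(target, dict) and wanted_type in as_list(target.get("type")):
            return True
    return False


# Hypothetical sample data; IRIs are placeholders.
annotations = [
    {"id": "http://example.org/anno1",
     "target": {"id": "http://example.org/novel.txt", "type": "Text", "format": "text/plain"}},
    {"id": "http://example.org/anno2",
     "target": {"id": "http://example.org/article.pdf", "type": "Text", "format": "application/pdf"}},
    {"id": "http://example.org/anno3",
     "target": {"id": "http://example.org/photo.jpg", "type": "Image", "format": "image/jpeg"}},
    {"id": "http://example.org/anno4",
     "target": {"id": "http://example.org/edition.xml", "type": "Text", "format": "application/tei+xml"}},
]

# Select annotations of texts, whatever their format.
text_annotation_ids = [a["id"] for a in annotations if has_target_of_type(a, "Text")]
```

The filter never inspects format, so annotations on text/plain, application/pdf and application/tei+xml resources are all selected by the same class.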

And while it is reasonable to talk about assigning an rdf:type to the content of a resource rather than to the resource itself, practice in the other communities we looked at was to associate type with the resource rather than with the content of the resource. Again the idea is to start in Version 1 by taking advantage of experience in other domains and then check back later (when ready to work on next major version) to see if we need to refine or clarify meaning based on implementer experience.

So, given support for this perspective in the most recent WG meeting, we will leave the "postpone for future version of the Data Model and Vocabulary" tag on this issue and return to it when ready to work on Version 2.

[1] http://dublincore.org/documents/dcmi-terms/
[2] https://schema.org/fileFormat
[3] https://schema.org/additionalType
