Definition of specific resources: @Type #137

Closed
gsergiu opened this issue Jan 15, 2016 · 17 comments

@gsergiu

gsergiu commented Jan 15, 2016

Here is the current definition for Type of Specific resources:
https://www.w3.org/TR/annotation-model/#specific-resources
@type
The class of the Specific Resource
The Specific Resource MAY have the SpecificResource class.

I would expect that the Specific Resource "MUST have" a type that is SpecificResource or one of its derivatives/children (in contrast to "MAY have").

I would also imply that @type is mandatory for the Specific Resources (in order to make it clear for all implementors)

@iherman
Member

iherman commented Jan 15, 2016

I respectfully disagree. Setting the type for resources where this is not necessary is an extra load on users where this does not add any relevant information. That section of the document has a number of examples where the type information is not present.

Implementers may find the type of the resources based on the properties that are used. In the RDF world, this is a fairly standard approach if the vocabulary clearly specifies the domain and/or the range of predicates. (There may be ambiguous cases where the type information is good to have.) Having the vocabulary and its usage be simpler for users has a higher priority than saving some extra work for implementers...
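To make the property-based approach concrete, here is a minimal sketch of how a consumer might decide the class of a body or target when no type is given. The class name `SimpleResource` and the key names in `SPECIFIC_RESOURCE_HINTS` are assumptions made for illustration; the exact JSON keys depend on the draft of the model being implemented.

```python
# Keys assumed to appear only on specific resources (hypothetical list;
# check the draft of the model you are implementing).
SPECIFIC_RESOURCE_HINTS = {"source", "selector", "state", "scope"}


def infer_resource_class(resource):
    """Return a class name for a body/target given as an IRI string or a dict."""
    if isinstance(resource, str):
        # A bare IRI carries no structure, so it is treated as a plain resource.
        return "SimpleResource"
    declared = resource.get("@type")
    if isinstance(declared, str):
        # An explicit type always wins over inference.
        return declared
    if resource.keys() & SPECIFIC_RESOURCE_HINTS:
        # At least one property that only makes sense on a specific resource.
        return "SpecificResource"
    return "SimpleResource"
```

The sketch only handles a single string-valued `@type`; a real consumer would also have to cope with a list of types.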

@gsergiu
Author

gsergiu commented Jan 15, 2016

well ... should every parser and serializer implement some complicated rules in order to extract type which is already explicit in the client applications that create the annotation?

Yes, I agree that type is not mandatory for Body and Targets, but I would expect that in this case, the default one should be used ... which is the simple resource identified by @id.

Maybe this issue is closely related to the one indicated in the text:
Issue 2
A future version of the specification may REQUIRE the use of SpecificResources for Body and Target in order to ensure consistency at the expense of additional structure.

If this change is implemented and all resources are treated as Specific Resources by default, the situation will change.

In support of that change:
I must also confess that I had the feeling that the representation of specific resources is a kind of "bottom-up" approach, and I find the "top-down" a more natural representation.
(e.g. the face detector on freebase is just a circle overlay on an image. So .. when tagging a person in an image you basically have only the URL of the image and the Selector for the circle, but no id for the image region).... I think that in many cases the "Specific Resources" are not required to have their own ID (like the sections in an HTML document, which might have an anchor defined or might not ...)

@azaroth42
Collaborator

This discussion already took place with the outcome that is currently in the documents that @type/rdf:type was not mandatory for SpecificResource. I don't see any new information to reopen that issue beyond feedback that type is expected by implementers.

I'm going to leave this open, but I do not think we need to discuss it until there's additional feedback from the community.

@iherman
Member

iherman commented Jan 15, 2016

> This discussion already took place with the outcome that is currently in the documents that @type/rdf:type was not mandatory for SpecificResource. I don't see any new information to reopen that issue beyond feedback that type is expected by implementers.
>
> I'm going to leave this open, but I do not think we need to discuss it until there's additional feedback from the community.

Actually, for our own admin, it would be better to close this issue, referring to the discussion that happened (and closed) elsewhere.

@gsergiu
Author

gsergiu commented Jan 15, 2016

Actually I’m a developer implementing the WA standard.
I have to parse the json annotation to a domain model.
As the bodies can be SimpleResource or SpecificResources I must know to which class I have to parse the body.

Given the definition of the type attribute, I would expect that this holds the information I need to decide when I have to parse the body to a SimpleResource and when to parse to a SpecificResource.

@type
Used to set the data type of a node (https://www.w3.org/TR/json-ld/#dfn-node) or typed value (https://www.w3.org/TR/json-ld/#dfn-typed-value). This keyword is described in section 6.4 Typed Values (https://www.w3.org/TR/json-ld/#typed-values).

I don’t think that every client should implement custom rules to identify the class of the body…

If a missing @type attribute should imply that the body is a SimpleResource, this is a reasonable implication.
However, if the @type is missing for a SpecificResource, I don’t find it reasonable to evaluate the values of a list of properties in order to guess that the application that created the annotation was using a SpecificResource in the body….

I would suggest that the @type property is not mandatory for SimpleResource but that it should be mandatory for SpecificResource.

In any case, I think that the standard should include at least a non-normative note to clarify this issue (how to identify the @type if it is missing).
Are there other developers having a different opinion?
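
As an illustration of the rule suggested above (no @type means a simple resource, an explicit @type selects the class), a parser could look roughly like the sketch below. The two dataclasses and the keys `source` and `selector` are hypothetical stand-ins for a real domain model.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SimpleResource:
    id: str


@dataclass
class SpecificResource:
    source: str
    selector: Optional[dict] = None


def parse_body(body) -> Union[SimpleResource, SpecificResource]:
    """Map a JSON body (IRI string or dict) onto a domain class."""
    if isinstance(body, str):
        return SimpleResource(id=body)
    body_type = body.get("@type")
    if body_type is None:
        # Assumed default: a missing @type means a simple resource.
        return SimpleResource(id=body["@id"])
    if body_type == "SpecificResource":
        return SpecificResource(source=body["source"], selector=body.get("selector"))
    raise ValueError(f"unsupported body type: {body_type!r}")
```

With this rule the parser never has to inspect other properties to choose a class, which is the simplification being argued for.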

@iherman
Member

iherman commented Jan 16, 2016

@gsergiu, without going into details: the question is whether it is possible or whether it is impossible to implement what is necessary. What I hear from you is that it is possible, though a bit awkward. I understand that, but what we have to weigh against it is how awkward it is for a user to add that typing information in cases where, for the user, this step is not really obvious and does not "feel" necessary.

Maybe we have to look at each case separately to get a feeling for it but, as a general rule, I believe that wherever something can be implemented without too much complication, and it does simplify the life of the end user, then the latter takes it...

@csarven
Member

csarven commented Jan 16, 2016

Only meant to add to/expand on @iherman's comment above: we need to weigh the cost of publishing and the cost of consuming against one another.

@gsergiu
Author

gsergiu commented Jan 18, 2016

hi @iherman @csarven,

Now I understand your concern. I would add that annotation applications work in a kind of "write once - read many" scenario. If publishing here means creation, and consumption means reading, I would like to mention that the workflow actually includes more actions, and processing needs to happen at each of them.
Typically, there is a client application implementing the user interface and a server that stores and publishes annotations, and then there might be more clients consuming (reading) the annotations. (for reference you can see the following presentation: http://de.slideshare.net/antoineisaac/modelling-and-exchanging-annotations-swib15 )

So ... I'm writing this because it is important to understand what a typical environment looks like and how the responsibilities are separated between the User Information, Client Application and Server:

  1. The user should select an annotation editor and provide only his relevant knowledge (e.g. select a resource and write a text, possibly also select a language and an area).
  2. The client application must know the "type" of the annotation (i.e. from the annotation editor), serialize it and submit it to the server. Other technical information, like serializedAt/By, creator, created, etc. should be inferred by the Client Application, and a consistent Annotation has to be submitted to the server.
  3. The server must parse the annotation to the same "type", check its consistency and store it.
  4. Other client applications may read the newly created annotation from the server, and the server needs to serialize the annotation again to the same representation (e.g. the jsonld that it got from the first client), and the "second" client must parse the annotation to the same "type" with which the first client created it.
  5. A second user of the second client must understand the Annotation in the same way as the first user of the first client, who actually created it.

(I think that the standard must ensure the consistency between the representation and interpretation of the annotations in all of these 5 steps)

By Annotation "type" here, I mean something like an "annotation having a Semantic Tag in the body".

As you can see in the workflow the parse/serialize is called several times by each client and server, and these clients must "understand" in the same way the content of the annotations. Consequently the "@type" property is very important for ensuring the "common understanding" between the server and different clients.

And this is why I would recommend that for each property of the annotation that holds an object the "@type" must/should be specified if it is not the default one (including body, target, selectors, agents, etc.). The standard should also explicitly state which is the default "@type".
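
A rough sketch of that recommendation, assuming a single agreed default: the serializer writes "@type" only when it differs from the default, and the parser restores the default when the key is absent. `DEFAULT_TYPE` and the function names are hypothetical; the point is only that the type survives any number of client/server hops.

```python
import json

DEFAULT_TYPE = "SimpleResource"  # hypothetical default; the standard would have to name it


def serialize(resource_type, fields):
    """Write @type only when it is not the default one."""
    doc = dict(fields)
    if resource_type != DEFAULT_TYPE:
        doc["@type"] = resource_type
    return json.dumps(doc, sort_keys=True)


def parse(text):
    """Read @type back, falling back to the default when it is absent."""
    doc = json.loads(text)
    resource_type = doc.pop("@type", DEFAULT_TYPE)
    return resource_type, doc


def round_trip(resource_type, fields, hops=3):
    """Simulate several serialize/parse steps between clients and server."""
    for _ in range(hops):
        resource_type, fields = parse(serialize(resource_type, fields))
    return resource_type, fields
```

Because both sides apply the same default, no step in the workflow has to guess the type from other properties.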

So .. these are my expectations as an implementor of the standard.
(basically, this can be reduced to the sentence that 2 different developers must have the same understanding of the standard's text).

@gobengo

gobengo commented Jan 19, 2016

@gsergiu

> well ... should every parser and serializer implement some complicated rules in order to extract type which is already explicit in the client applications that create the annotation?

Keep an eye on this Social WG Editor's Draft from @tantek
https://www.w3.org/wiki/Post-type-discovery

@gsergiu
Author

gsergiu commented Jan 19, 2016

Well … I don’t say it is impossible. I say it is complicated and it is a must (one way or the other).
So … I think these are legitimate questions:

  1.  Why should the client application (or serializer) drop the type information when it has it and the server (parser) try to guess it, when it can simply read it?
    
  2.  What are the arguments against making the type mandatory (when for any mapping tool this is mandatory information)? Are these arguments stronger than the simplicity of serialize/parse functionality?
    

Br,
Sergiu

@iherman
Member

iherman commented Jan 19, 2016

@gsergiu,

I do not want to go into all the details of your client-server description. Suffice it to say that I agree with what you describe in general, although I may not agree with all the details (e.g., you say "server must parse the annotation": I am not sure the "must" is really a must in all cases). But this is a detail.

My concern comes from a very different viewpoint.

I think that, first of all, we can agree that there are and will be many more clients than servers. These clients may be fully automated programs (like the one you describe) but may also be humans: for example, the annotation model is referred to by such documents as the Metadata Vocabulary for Tabular Data, whereby the "author" of a dataset may want to add extra, essentially unstructured metadata in the form of an annotation using our model (do not be confused by a, alas!, different terminology: that document uses the term "annotation" in a very different way, but has a field for "notes" which corresponds to an annotation as we refer to it in this WG). If there are (many) more clients (humans or machines) then we clearly have to optimize what clients should do, and minimize the load on them even if this means slightly more complicated processing on the server side. I hope we can agree on that.

So the question is: is adding a "type" to a structure (the full annotation, a specific resource, or whatever) such a big deal? Is it an acceptable requirement for the user?

Well... the answer depends. You or I, or some others in the group who have gone through the blessings or the curses (depends whom you ask:-) of learning and being familiar with Semantic Web concepts have no problems with the concept of a type, and we may not consider it a problem to add such information explicitly. However, we have to realize and accept that this may be an alien concept and an extra cognitive load for many. This includes not only the human clients, but also the implementers of the clients running as programs. “Why add a type when it is obvious?” is the question they would ask. And, you know what? They are right in asking that. Why indeed, if a human, or a program, can easily deduce the type information? (Tantek’s document is a great example: let a program discover the type instead of imposing the extra load of setting it on the client...)

The underlying issue is actually more general. We must realize (this has been an issue with the work in the group ever since it started) that we are shaping a JSON vocabulary that is (also) supposed to be used by people (humans and implementers) who are not Semantic Web people, and who may, God Forbid!, be actually very averse to the Semantic Web. Hence we have to think twice before pushing something "Semantic Webby" (and package it nicely). We have to do this while we maintain a strict adherence to the Semantic Web for those who care. I.e., we are trying to satisfy two sometimes very different communities by trying to find compromises.

Hence the approach I am trying to defend: add a type if and only if the type cannot be deduced. I.e., when it is essential information that is absolutely necessary for processing to happen properly. Yes, server implementers may be unhappy, but they can do it nevertheless. But if this leads to a more diverse set of clients, it is worth it.

(B.t.w., this approach is not sooo alien for Semantic Web people either. This is why one sets domains and ranges for properties: by using RDFS inferencing a SW client can deduce the type of the object or the subject, respectively, so that the client would not have to bother. This is the same concept, after all, but one which has often been forgotten because RDF environments rarely implement RDFS inferencing. Unfortunately...)
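
For readers less familiar with RDFS, the domain/range inference mentioned here can be sketched in a few lines. The `ex:` property and class names are made up for the example and are not declarations taken from the annotation vocabulary; only the two rules themselves (a property's domain types the subject, its range types the object) are standard RDFS.

```python
RDF_TYPE = "rdf:type"


def infer_types(triples, domains, ranges):
    """Apply the RDFS domain and range rules once and return the new type triples."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in domains:
            # rdfs:domain: the subject is an instance of the domain class.
            inferred.add((subject, RDF_TYPE, domains[predicate]))
        if predicate in ranges:
            # rdfs:range: the object is an instance of the range class.
            inferred.add((obj, RDF_TYPE, ranges[predicate]))
    return inferred - set(triples)


# Hypothetical vocabulary declarations and data.
domains = {"ex:hasSelector": "ex:SpecificResource"}
ranges = {"ex:hasSelector": "ex:Selector"}
triples = [("_:body1", "ex:hasSelector", "_:sel1")]
new_triples = infer_types(triples, domains, ranges)
```

Here `new_triples` contains a type for both `_:body1` and `_:sel1`, although neither was typed in the data.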

@gsergiu
Author

gsergiu commented Jan 20, 2016

Hi Ivan,

  1. I’m claiming that at serialization time, the type should be written explicitly in the annotation if it is not the default one. This will be handled by software applications; I am not saying that human users should provide this information, which is “technical” by nature and not “functional”.

  2. I don’t think that the Metadata Vocabulary for Tabular Data is representative of client applications. However, if the claim is that there are some exceptional scenarios that don’t need the type information (e.g. because only one type is used for each object included in the annotation), I can accept it. I would estimate that the share of annotations created in this kind of scenario will be less than 1%. Therefore … I have no problem if the standard says “SHOULD serialize the type” … instead of MUST…

  3. I don’t find Tantek’s document to be a good example. Why should we use a software application to guess types when we already know them (at creation time)?

  4. Why is the type obvious? For whom is the type obvious?

  5. “Type” is a keyword that is understood by anyone who has at least a minimal computer science background .. and not only by them. See https://en.wikipedia.org/wiki/Instance_%28computer_science%29

     The meaning of the term "type" in computer science is rather similar to the meaning of the word "type" in everyday language. For example, a barman can ask a client what type of beverage does he or she want – coffee, tea or beer? A particular cup of coffee that the client receives is in the role of an instance, while two cups of coffee would form a set of two instances of coffee, determining its type at the same time.

  6. It is not only the servers, but also the clients. Annotations are serialized by clients and parsed by servers at creation time, and the other way around during retrieval. So there is a common interest in sharing the types between servers and clients. Therefore I do not support the following conclusion: “Yes, server implementers may be unhappy, but they can do it nevertheless. But if this leads to a more diverse set of clients, it is worth it.”

  7. From my point of view a standard is made to reach a common understanding and not for diversification purposes. Therefore I do not support the following conclusion: “But if this leads to a more diverse set of clients, it is worth it.”

  8. Sorry if I defend the developer’s point of view. But knowing is better than guessing in any situation…


Br,

Sergiu

@gsergiu
Author

gsergiu commented Jan 20, 2016

Let's take a practical example from the real world, inspired by the Wikipedia example.

If an Italian comes to Vienna and orders a coffee, and the waiter brings an American coffee, this will result in an embarrassing situation for both the Italian customer and the waiter.

In most cases, by "a coffee" the Italian means an espresso:
https://en.wikipedia.org/wiki/Espresso

and being in Vienna the default coffee should be Melange:
https://en.wikipedia.org/wiki/List_of_coffee_drinks#Wiener_or_Viennese_melange

And it is rather an exception for Europeans to drink Cafe Americano:
https://en.wikipedia.org/wiki/List_of_coffee_drinks#Caff.C3.A8_Americano

This is why I claim that the type of the coffee is a "must" both during the ordering process (creation time) and in the billing process (delivery time).

I also want to claim that this is the natural behaviour for any customer!
What I really don't understand is why the Semantic Web people are so keen to get rid of semantics. (For me the type is the core of the semantics.)

@azaroth42
Collaborator

Proposal: SHOULD for type: SpecificResource instead of MAY.

Can we live with that?
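
For implementers wondering what the difference between the three requirement levels means in practice, one possible reading is sketched below: under MAY a missing type is silent, under SHOULD it produces a warning, and under MUST it is an error. The function name and the message text are hypothetical, and the check assumes the caller already knows the dict is meant to be a specific resource.

```python
def check_type(resource, requirement="SHOULD"):
    """Return (errors, warnings) for a dict already known to be a specific resource."""
    errors, warnings = [], []
    if resource.get("@type") != "SpecificResource":
        message = f"specific resource {requirement} have @type SpecificResource"
        if requirement == "MUST":
            errors.append(message)
        elif requirement == "SHOULD":
            warnings.append(message)
        # MAY: the type is optional, so nothing is reported.
    return errors, warnings
```

Under this reading, SHOULD keeps untyped annotations valid while still nudging producers to write the type.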

@iherman
Member

iherman commented Jan 21, 2016

Fine. I believe we are running in circles, repeating the same arguments. This is an acceptable compromise.

@gsergiu
Author

gsergiu commented Jan 21, 2016

Thanks for the openness to compromise. :)
