Profile negotiation [RPFN] #74

Open
jpullmann opened this issue Jan 18, 2018 · 47 comments
Comments

@jpullmann

Profile negotiation [RPFN]

Create a way to negotiate choice of profile between clients and servers


Related requirements: Profile definition [RPFDF] 
Related use cases: Detailing and requesting additional constraints (profiles) beyond content types [ID2] Standard APIs for metadata profile negotiation [ID30] 
@akuckartz

👍

@nicholascar
Contributor

I have put up a proposal at https://github.com/w3c/dxwg/tree/profiledesc-working/profiledesc/profileneg

@kcoyle
Contributor

kcoyle commented May 16, 2018

Nick, profile negotiation is its own deliverable, as per the charter, and is so far based on a proposal by Lars and Ruben: https://profilenegotiation.github.io/I-D-Accept--Schema/I-D-accept-schema. It would be best not to start a separate effort, but to further what is already proposed. Also note that any "solutions" must be based on use cases and requirements. As I have mentioned before, we appear to be lacking use cases that would lead to the profileDesc work and this profile negotiation proposal.

@nicholascar
Contributor

I think that the work I’ve outlined above is compatible with Lars’ & Ruben’s work.

In the implementations we’ve used before, a _format Query String Argument (QSA) is used as an override for the Accept header, and a _view QSA is effectively the equivalent of Accept-Profile.
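
The precedence described here can be sketched as follows. This is only an illustration under stated assumptions, not the code of any of the APIs mentioned: the function name `resolve_request` is hypothetical, header values are treated as opaque strings (no q-value parsing), and only `_format`, `_view`, `Accept` and `Accept-Profile` come from the comment itself.

```python
from urllib.parse import urlsplit, parse_qs


def resolve_request(url, headers):
    """Return (profile, media_type) for a request.

    Hypothetical helper: a _view or _format query string argument, when
    present, takes precedence over the corresponding HTTP header.
    """
    query = parse_qs(urlsplit(url).query)
    # _view plays the role of the Accept-Profile header.
    profile = query.get("_view", [headers.get("Accept-Profile")])[0]
    # _format overrides the Accept header.
    media_type = query.get("_format", [headers.get("Accept")])[0]
    return profile, media_type
```

One practical caveat with query string arguments: `parse_qs` decodes a literal `+` as a space, so a media type such as `application/rdf+xml` would need its `+` percent-encoded as `%2B` to survive this kind of parsing.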

I would be able to implement Profile headers in the 6 or so APIs delivering different profiles in operation now if I can get persistent URIs for the profiles.

We have discussed the registration of Profiles within our Govt Linked Data WG as registration would give them a persistent URI. We will likely register a series of Profiles for purposes such as an energy sector profile of DCAT (2018) but currently we are unclear about whether a catalogue of known profiles is needed or even possible. We may make such a thing for Aust Gov-approved profiles.

@larsgsvensson
Contributor

I think we should be careful about trying to standardise a way of putting profile information into URIs/URLs by mandating the use of _format or _view. I agree that it's one way of doing it, but there are others as well. The URLs to the specific resource versions can be propagated using HTTP Link headers or HTML link elements (and of course as normal <a href=... in the HTML pages).
A registry for profiles sounds good. There could even be several, community-specific registries.
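
As a rough illustration of the Link-header option mentioned above, a server could advertise its other representations like this. The function name and all URLs are hypothetical, and carrying the profile URI in a `profile` target attribute is an assumption made for this sketch, not something any specification cited in this thread defines.

```python
def alternate_links(representations):
    """Format a Link header value listing alternate representations.

    `representations` is a list of (url, media_type, profile_uri) tuples.
    The `profile` target attribute is an assumption for illustration.
    """
    return ", ".join(
        f'<{url}>; rel="alternate"; type="{media_type}"; profile="{profile}"'
        for url, media_type, profile in representations
    )
```

The server stays free to choose whatever URLs it likes for each representation; the client only follows the links it is given.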

@nicholascar
Contributor

I agree that URI QSAs are only one of many ways of doing it, and perhaps even a secondary way with HTTP headers being the primary. However, I think ways that are easy for humans to use are very useful, hence my Use Case https://github.com/w3c/dxwg/issues/239

Since we are providing profile guidance, not just a single standard, I think we can base URI methods on (to be compatible with) HTTP methods.

@larsgsvensson
Contributor

I don't disagree that we need easy ways for humans to address profiled versions of documents. What I disagree with is to say that we should mandate the use of _format or _view. There are other ways we can do that in the URL, e.g. by using a syntax à la http://example.org/entity.profile.filetype (e. g. http://example.org/myCatalogue.dcat-ap-de.ttl identifying the turtle serialisation of a dcat-catalogue using the DCAT-AP.de profile) instead of using http://example.org/myCatalogue?_view=dcat-ap-de&_format=turtle
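
To make the comparison concrete, here is a small sketch showing that both URL styles carry the same two pieces of information (profile and format). The function names are hypothetical, and the parsing assumes exactly the two shapes quoted above.

```python
from urllib.parse import urlsplit, parse_qs


def from_suffix_url(url):
    """Split http://example.org/entity.profile.filetype into its parts."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    # Split from the right so that dots inside the entity name survive.
    entity, profile, filetype = last_segment.rsplit(".", 2)
    return entity, profile, filetype


def from_query_url(url):
    """Read the same information from _view and _format arguments."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    entity = parts.path.rsplit("/", 1)[-1]
    return entity, query["_view"][0], query["_format"][0]
```

Applied to the two example URLs, both functions return `myCatalogue` and `dcat-ap-de`; only the format token differs (`ttl` versus `turtle`), which is a naming choice of the server.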

@RubenVerborgh
Member

Let's not break the Web; no spec should mandate the URL structure of a server.

A secondary way can just be to follow links, i.e., opening the main profile URI in the browser results in an HTML document with links to other representations (for which the server can determine the URIs of its own).

@larsgsvensson
Contributor

+1 to @RubenVerborgh

@agreiner
Contributor

agreiner commented Jun 1, 2018

I understand that the motivation for profile negotiation is that some users want to be able to use the same URL for versions of a dataset that conform to different profiles. Can anyone explain why some users want that? It does work to model the versions as representations of a resource, but it would also work to model them as separate resources (as in Ruben's suggestion above).

@RubenVerborgh
Member

RubenVerborgh commented Jun 1, 2018

I understand that the motivation for profile negotiation is that some users want to be able to use the same URL for versions of a dataset that conform to different profiles.

No, the motivation is to have the same resource available in different profiles.
And resources on the Web happen to be identified by URLs.

Note that each representation still can have its own URL. We will just provide the mechanism to get from resource to representation.

Can anyone explain why some users want that?

  • to get from a resource to its representations
  • to see what other representations a resource has

It does work to model the versions as representations of a resource, but it would also work to model them as separate resources (as in Ruben's suggestion above).

Both models are the exact same, really.

To understand this, it's important to see that the "representation" concept is a relative notion. E.g., in the sentence "A is a representation of B", B is the resource that A is the representation of. However, A is a resource in its own right.

An example to clarify:

  1. http://example.org/weather/amsterdam/2018-06-01 is the weather report for Amsterdam for 1 June
  2. http://example.org/weather/amsterdam/2018-06-01.html is the weather report for Amsterdam for 1 June in HTML

Regardless of whether 2 has its own URL, all of the following hold:

  • 1 is a resource
  • 2 is a resource
  • 2 is a representation of 1

@agreiner
Contributor

agreiner commented Jun 1, 2018

I'm talking about the motivation to use negotiation. If the only motivation is to have the same resource available in conformance to different profiles, I don't see any particular reason to have profile negotiation that works like content negotiation. Having multiple profiles available is realized already by just offering a version of the dataset that conforms to one profile under one URL and a version that applies to another under another URL. Sorry I can't recall where it was expressed, but the idea of one URL for multiple profiles came from someone else in the group (maybe Lars?).

@akuckartz

@agreiner

Create a way to negotiate choice of profile between clients and servers
https://www.w3.org/TR/dcat-ucr/#RPFN

@RubenVerborgh
Member

RubenVerborgh commented Jun 2, 2018

I'm talking about the motivation to use negotiation.

Negotiation is what gets clients to the representation with their preferred profile.

If the only motivation is to have the same resource available in conformance to different profiles

No, that's not the motivation. We can do that with existing technologies already.

What existing technologies don't do is automatically get a resource represented in a profile the client understands.

I don't see any particular reason to have profile negotiation that works like content negotiation.

It's just like negotiating between XML or JSON, except more fine-grained:
https://ruben.verborgh.org/articles/fine-grained-content-negotiation/

Having multiple profiles available is realized already by just offering a version of the dataset that conforms to one profile under one URL and a version that applies to another under another URL.

But how does the client get from one to the other?
Our answer: content negotiation.

@kcoyle
Contributor

kcoyle commented Jun 2, 2018

Can we use DCAT as an example? I'm going to toss one out but it may not be correct. What if you have a dataset that has a whole lot of census-type data, which includes a wide range of elements that can be seen as about people (age, race, employment, location). Not every use of the data wants to make use of all of the columns in the table. Would different profiles be the way to get the view of the data that you desire? If so, could there be a direct correlation between profiles and services? Or could it be that one person's profile is another person's service?

@kcoyle
Contributor

kcoyle commented Jun 12, 2018

Yes, my serialization is your media type.

"It might or might not have its own identifier " - if there is no identifier, how will it be accessed/transmitted?

It's fine to have a "work" identifier (although again I caution that one needs to think very hard about what that identifier identifies), but any resource on the web has an identifier for the resource, not just the work. This is why I recommend that this work vs. actual thing distinction be thought through carefully, and the relationship between those be clear. I don't know DCAT terribly well but this seems to be a difference between dataset and distribution. Obviously, the response to content negotiation is some form of distribution (in DCAT terms). In the FRBR sense, the work is an abstract concept with no physical/digital presence, and only when it is manifested (distributed) is there a non-abstract thing. So as long as the URI for the dataset refers to an abstraction, that makes sense, but I'm not clear on what the non-abstraction consists of.

@RubenVerborgh
Member

RubenVerborgh commented Jun 12, 2018 via email

@kcoyle
Contributor

kcoyle commented Jun 12, 2018

I think you misunderstood my question about non-abstractions, so let me make it clearer.

As I understand it:
DCAT dataset is an abstraction. It is only the distributions that are "real" - that is, that can be accessed. There is no access to a dataset EXCEPT through a distribution (in DCAT).

Therefore the non-abstraction above refers to a distribution (as defined in DCAT, which has some permanence) or some other "thing" that is returned from content negotiation. What I'm wondering is whether, in DCAT parlance, what is served through conneg is a distribution, or if it is something else, or if this isn't the right question?

Adding (from DCAT):
dcat:Catalog represents the catalog
dcat:Dataset represents a dataset in a catalog.
dcat:Distribution represents an accessible form of a dataset as for example a downloadable file, an RSS feed or a web service that provides the data.

@RubenVerborgh
Member

RubenVerborgh commented Jun 12, 2018 via email

@nicholascar
Contributor

nicholascar commented Jun 12, 2018

@kcoyle & @RubenVerborgh: are you saying above that one can interpret a resource and profiles of it as a Dataset and Distributions of it?

If so, I think this is problematic. I see many more types of Resources and Profiles of them than DCAT will allow for. E.g., a Sample identified by URI with profiles of metadata for different purposes. The Resource + Profiles pattern holds here but not Dataset +Distributions.

I can think of other cases: Datasets are just too “big” a thing for many Resources to be sensibly interpreted as them

@RubenVerborgh
Member

@kcoyle & @RubenVerborgh: are you saying above that one can interpret a resource and profiles of it as a Dataset and Distributions of it?

I'm saying that a dataset is a resource, and that representations of that dataset conforming to certain profiles and serialized in a certain media type are distributions.

I see many more types of Resources and Profiles of them than DCAT will allow for.

That's fine. The mechanism is more generic than that. Just because a dataset is a resource does not mean that all resources are datasets.

@dr-shorthair
Contributor

dr-shorthair commented Jun 12, 2018

The alignment of DCAT and FRBR [1] is incomplete -

  • frbr:Work is effectively implemented in dcat:Dataset
  • frbr:Manifestation is implemented in dcat:Distribution
  • frbr:Expression is not implemented in DCAT - probably because in practice there is no artefact

dct:conformsTo provides a hook to indicate the standard (which can be a schema or profile) that a resource conforms to. But in DCAT that is associated with dcat:Resource|dcat:Dataset and not with dcat:Distribution. How is it typically used?

In order to fully match FRBR we would need a way to indicate different schematic representations of a dataset (i.e. conforming to different profiles), alongside the different serializations (media-types). Maybe add dct:conformsTo to dcat:Distribution where it should be used to indicate the schema/profile/view that this representation takes.

[1] https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records

@dr-shorthair
Contributor

dr-shorthair commented Jun 12, 2018

dcat:Distribution represents an accessible form of a dataset as for example a downloadable file, an RSS feed or a web service that provides the data.

@kcoyle - now we have added an explicit class for services (dcat:DataService and sub-classes) dcat:Distribution should not be used for a service. The definitions in the DCAT Editors Draft [1] have been tweaked slightly, but certainly could be further improved.

[1] https://w3c.github.io/dxwg/dcat/

@dr-shorthair
Contributor

In an email that has not made it into this GitHub thread, @agreiner takes us back to Fielding's analysis of web architecture, which distinguishes only Resource and Representation. The issue with that is that it conflates schematic representation and serialization into the one step.

As I understand it, this requirement (Profile Negotiation) is aimed at allowing the schematic representation to be made explicit.

Meanwhile, @kcoyle has pointed out how this correlates with the FRBR conceptualization, which I've attempted to make more explicit two comments up.

@RubenVerborgh
Member

The issue with that is that it conflates schematic representation and serialization into the one step.

Not conflates, but combines. Why is that an issue?

A representation can be negotiated over multiple dimensions, including media type, profile, language, etc.
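
A minimal sketch of what negotiating over several dimensions at once could mean on the server side. Everything here is hypothetical (the table of representations, the URLs, the function name `select`); it uses exact matching only and ignores preference weights.

```python
# Hypothetical representations of one resource; every URL is made up.
REPRESENTATIONS = [
    {"url": "http://example.org/r.a.ttl", "media_type": "text/turtle",
     "profile": "http://example.org/profile/a", "language": "en"},
    {"url": "http://example.org/r.b.ttl", "media_type": "text/turtle",
     "profile": "http://example.org/profile/b", "language": "en"},
    {"url": "http://example.org/r.b.html", "media_type": "text/html",
     "profile": "http://example.org/profile/b", "language": "pl"},
]


def select(representations, **wanted):
    """Return the first representation matching every requested dimension.

    Dimensions the client does not mention are left unconstrained.
    """
    for rep in representations:
        if all(rep.get(dim) == value for dim, value in wanted.items()):
            return rep
    return None
```

In this sketch media type, profile and language are simply independent keys, so a single representation is the combined outcome of all of them rather than the result of separate steps.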

As I understand it, this requirement (Profile Negotiation) is aimed at allowing the schematic representation to be made explicit.

Yes.

@dr-shorthair
Contributor

Yes, combines - that is a better word. Not a problem, but an issue that is being teased out in the discussion here. Yes, multiple dimensions. FRBR privileges schematic representation very high up the conceptual stack, with its own class, while somehow the web had neglected it until now!

@kcoyle
Contributor

kcoyle commented Jun 13, 2018

Actually, FRBR is based on documents and doesn't really fit well with data - the whole "work/expression" thing is very text-based, and even librarians complain that they can't fit it well into music, film, etc. Rather than reference FRBR, why not simply say that there is an abstraction of the dataset which has certain metadata functionality (e.g. describes the dataset apart from any specific instances of it), and there are one or more distributions which have byte-presence.

@kcoyle
Contributor

kcoyle commented Jun 13, 2018

@nicholascar In my mind, a profile defines a distribution. Presumably conneg requests a distribution that conforms to a profile. I'm not sure what you mean by "a Sample identified by URI with profiles of metadata for different purposes." This seems to be analogous to the library case, where there is a physical thing (book) that is described by metadata; and there can be profiles governing what metadata is distributed. Is that the same?

@nicholascar
Contributor

nicholascar commented Jun 14, 2018

I am a little lost in the multiple things being considered here.

Can I check this RDF as an assertion, based on Ruben's comment #74 (comment):

:Dataset_X dcat:distribution :Distribution_Y .

:Distribution_Y 
    # a dimension, currently missing
    dct:conformsTo :Profile_Z ;  
    # another dimension, currently often catered for 
    dct:format <some_format> ; 
    # another example of a dimension
    # dct:language is indicated for Catalogue & Dataset only in DCAT1.0 but could be here due to no fixed DCT domain
    dct:language <http://id.loc.gov/vocabulary/iso639-1/en> .

You could get Distribution_Y by asking for Dataset_X with a distribution conforming to Profile_Z.
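
The lookup described in that sentence can be sketched without any RDF library by holding the triples as plain tuples. This is only an illustration: the function name is hypothetical, and the prefixed names are kept as opaque strings rather than expanded to full IRIs.

```python
# The first two assertions from the snippet above, as (subject, predicate, object).
TRIPLES = {
    (":Dataset_X", "dcat:distribution", ":Distribution_Y"),
    (":Distribution_Y", "dct:conformsTo", ":Profile_Z"),
}


def distributions_conforming_to(triples, dataset, profile):
    """Return the dataset's distributions that conform to the given profile."""
    candidates = {o for s, p, o in triples
                  if s == dataset and p == "dcat:distribution"}
    return sorted(d for d in candidates
                  if (d, "dct:conformsTo", profile) in triples)
```

Asking for `:Dataset_X` with `:Profile_Z` yields `:Distribution_Y`; asking for a profile that no distribution conforms to yields an empty list.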

Interpretation using ProfileDesc:

  • no inference is drawn linking a dcat:Distribution to a prof:ImplResourceDesc due to no fixed DCT domains however usage makes them look related
  • the Profile referenced by the Distribution could be linked to validating tools via the prof:resource property, as per intended ProfileDesc use

:Profile_Z prof:resource :ImplResDesc_A ;
    dct:conformsTo <A_validation_standard> ;  
    dct:format <some_other_format> ; 
    prof:resourceRole rolesvoc:ConformanceTest .

@RubenVerborgh
Member

The RDF snippet works for me.

@nicholascar
Contributor

nicholascar commented Jun 14, 2018

@kcoyle in #74 (comment): I think there is an analog of sorts between my Sample example and your Book example but I'm keen to avoid any inferencing whereby someone then thinks that a Sample (or a Book) is then a Dataset. This would mean ensuring that while a profile could govern metadata distributed, what is distributed need not necessarily be a Distribution.

We can achieve this by having ProfileDesc as the general purpose ontology and ProfileDesc-like functionality allowed in DCAT, as indicated in my comment immediately above.

@nicholascar
Contributor

nicholascar commented Jun 19, 2018

The test implementation of the Media Types Linked Data API I just set up implements both QSA & HTTP format & language negotiation within QSA & HTTP profile negotiation, e.g.:

Format:
Entry for https://w3id.org/mediatype/text/csv in RDF (turtle), default profile:
curl -L -H "Accept: text/turtle" http://w3id.org/mediatype/text/csv

Entry for https://w3id.org/mediatype/text/csv in HTML, ‘alternates’ profile (‘view’ as the API calls it) requested using the URI https://promsns.org/def/alt:
curl -L -H "Accept-Profile: <https://promsns.org/def/alt>" https://w3id.org/mediatype/text/csv

As above but in RDF (JSON-LD):
curl -L -H "Accept-Profile: <https://promsns.org/def/alt>" -H "Accept: application/rdf+json" https://w3id.org/mediatype/text/csv

Demo of weighted profile neg with not available view being ignored (not receiving HTTP 406):
curl -L -H "Accept-Profile: <http://example.org/notavailable>, <https://promsns.org/def/alt>; q=0.5" -H "Accept: application/rdf+json" https://w3id.org/mediatype/text/csv

Entry for https://w3id.org/mediatype/text/csv, ‘alternates’ profile indicated by QSA using token & Media Type also indicated by QSA:
curl -L http://w3id.org/mediatype/text/html\?_view=alternates\&_format=application/rdf+xml

Entry for https://w3id.org/mediatype/text/csv default profile with format indicated by QSA using token overriding HTTP Accept header:
curl -L -H "Accept: application/rdf+xml" http://w3id.org/mediatype/text/html\?_format=text/turtle
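
The weighted case above, where an unavailable profile is skipped instead of producing an HTTP 406, can be sketched as follows. This is not the code of the test implementation: the function names are hypothetical, and the header syntax assumed is only what the curl examples show (URIs in angle brackets, an optional `q` parameter, a default weight of 1.0).

```python
import re

# One list item: <uri> optionally followed by ;q=<weight>.
_ITEM = re.compile(r"<([^>]*)>(?:\s*;\s*q=([0-9.]+))?")


def parse_accept_profile(value):
    """Return (uri, weight) pairs, highest weight first."""
    items = [(uri, float(q) if q else 1.0) for uri, q in _ITEM.findall(value)]
    # sorted() is stable, so equally weighted profiles keep header order.
    return sorted(items, key=lambda item: item[1], reverse=True)


def choose_profile(value, available):
    """Pick the most preferred profile the server can actually deliver."""
    for uri, _weight in parse_accept_profile(value):
        if uri in available:
            return uri
    return None  # the caller decides whether to fall back or answer 406
```

With the header from the weighted example and a server that only offers `https://promsns.org/def/alt`, `choose_profile` skips `http://example.org/notavailable` and returns the alternates profile.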

Language:
A Media Type, default view, HTML, in Polish:
https://w3id.org/mediatype/audio/3gpp?_lang=pl

A Media Type, default view, HTML, in Polish (preferred), using HTTP headers
curl -L -H "Accept: text/html" -H "Accept-Language: pl, en" https://w3id.org/mediatype/audio/3gpp

In this configuration, both the format and language dimensions of the resource are dependent on (configured for a particular) profile. The alternates view of a Media Type shows all the options:

https://w3id.org/mediatype/audio/3gpp?_view=alternates

Note that the alternates view itself is only available in English and that the non-HTML serialisations of the “mt” view, while supposedly being in Polish, actually are not. This is an error for the dataset implementer (me) to fix with RDF lang mappings but the API is operating correctly now with both format & lang within profile QSA and HTTP-based negotiation.

Not Implemented yet:
A lot of things:

  • HTTP-based requests for profiles available for instance
  • Profile Description Ontology terminology - still using the Alternates View RDF

This is just a start.

@azaroth42

A concrete use case that we have at the Getty today, that might help some of the commenters or at least provide an avenue for further clarifications:

The Getty Vocabularies are available as Linked Open Data. We currently provide exactly one schema which is a large super-set of SKOS. This schema is appropriate if you want to know absolutely every last thing that we know about the thesaurus terms. This is true for almost no one, it turns out ;)

We also manage data in the institution using a profile of CIDOC-CRM, with which SKOS is not very well-aligned natively but is trivially mappable. For consistency with these other holdings, we would like to make the vocabularies available at the same URIs using this profile. This demonstrates two points:

  • The media type is orthogonal to the profile, as you could ask for the full mega-skos profile in turtle, json-ld or rdf/xml, and the CRM profile in any of those formats too.
  • The URI is critically important to be the same for vocabulary entries, as a different URI would mean a different concept.

We also intend to have a pure SKOS profile for consumers that don't care about everything, but do need SKOS. Again, the format and profile are orthogonal in the same way, and the URI being the same is critical.
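
To illustrate the orthogonality point, the variants available for a single vocabulary URI are simply the cross product of profiles and media types. The profile tokens and the example URI below are hypothetical placeholders, not Getty identifiers.

```python
from itertools import product

# Hypothetical tokens for the three profiles described above.
PROFILES = ["full", "crm", "skos"]
MEDIA_TYPES = ["text/turtle", "application/ld+json", "application/rdf+xml"]


def variants(uri):
    """List every (uri, profile, media_type) combination for one concept URI."""
    return [(uri, profile, media_type)
            for profile, media_type in product(PROFILES, MEDIA_TYPES)]
```

Every combination shares the same URI, which is the property the use case depends on: three profiles in three formats give nine representations of one concept, not nine concepts.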

Please compare:

@kcoyle
Contributor

kcoyle commented Jun 21, 2018

Rob's example above is what I would call the output from a "cross-walk" - data is converted from some database or metadata schema to another, and these schemas, in some cases, may be application profiles depending on their contents and functionality. It isn't clear to me if every use of metadata is a profile, however, so referring to profiles in the conneg work may not meet our definition of "profile", which is not (AFAIK) "any metadata schema." And not including non-profile metadata schemas may not satisfy the needs of conneg. We are going to have to spend some time on definitions. Note that we have (so far) defined profiles as:

A profile is a named set of constraints on one or more identified base specifications,
including the identification of any implementing subclasses of datatypes,
semantic interpretations, vocabularies, options and parameters of those base specifications
necessary to accomplish a particular function.

I think this is more restrictive than "arbitrary metadata schema".

Wanting to serve the same data using a different metadata schema has the reputation of being lossy (in terms of absolute semantics). Rob says: "a different URI would mean a different concept." But I'm not so sure that we aren't talking about different concepts, although I realize that this becomes philosophical at a point. I believe this is what is bothering @agreiner. These are different datasets. That doesn't mean that you can't give an identifier to your data in all of its forms, but the same data served with different metadata schemas as a result of a conversion process is indeed a different dataset. But what is really troubling me is the use of "profile".

(I know that "schema" isn't a great word to use here - substitute "model" or whatever you prefer if it bothers you.)

@azaroth42

I believe that our use case falls under that definition, in that both profiles have multiple base specifications, with subclasses, specific interpretations, identified vocabularies for the data instances and are there to accomplish particular functions.

We are not talking about two different real world concepts of "gold", and hence the URI must be the same. If RDF/XML and Turtle are not different datasets, but SKOS and CIDOC-CRM are, then it seems the philosophy of the content negotiation deliverable is not aligned with the DCAT deliverable.

As a reductio ad absurdum, if in model (A) the requirement is to use dc:title, and in model (B) the requirement is to use rdfs:label but (A) and (B) are otherwise identical, that would be two different datasets. This seems ... undesirable.

@kcoyle
Contributor

kcoyle commented Jun 21, 2018

Rob, I do see the problem as the alignment between the use of the term "profile" in the two different deliverables. Whether we can align them, we'll have to see. The use of "application profile" in deliverable 2 (guidance for APs) becomes quite broad if we are to cover ANY metadata. Yet the conneg use case may need to allow for any metadata schema, not just those that meet our definition of "profile."

As for if (A) and (B) are different datasets, the definition that I find in the DCAT document is:

"A dataset in DCAT is defined as a "collection of data, published or curated by a single agent, and available for access or download in one or more formats". A dataset is a conceptual entity, and may be represented by one or more distributions that serialize the dataset for transfer. "

Earlier discussion has likened DCAT datasets to FRBR:work (lots of warts there), so your definition of dataset coincides with the DCAT one, and I used "dataset" perhaps more in line with DCAT's "distribution" which reads: "Definition: | Connects a dataset to its available distributions." That definition seems to be undergoing discussion, and the emphasis on "serialization" may be an issue. I also note that "format" is dct:format, aka IANA media type. However, I'll try to be more in line with DCAT definitions in the future.

@larsgsvensson
Contributor

The Use Case that @azaroth42 presents sounds very similar to the one we have in the DNB where we want to and was described above. Good to hear we're not alone!

@nicholascar
Contributor

de-tagging as Profile Negotiation


10 participants