Profiles are uniquely named sets of data constraints, such as data elements (e.g. classes, properties, value domains) that describe (meta)data. #275

Open
nicholascar opened this issue Jun 27, 2018 · 32 comments

Comments

@nicholascar
Contributor

Entered from Google Doc

@nicholascar nicholascar changed the title Requirement: Profiles are "named collections of properties" or metadata terms (if not RDF) [ID41] (5.41) Profiles are "named collections of properties" or metadata terms (if not RDF) [ID41] (5.41) Sep 1, 2018
@aisaac
Contributor

aisaac commented Nov 7, 2018

Could this be reworded as "data" or "(meta)data"? It's perhaps better to make it more general.

@kcoyle
Contributor

kcoyle commented Nov 7, 2018

The issue here was the use of "properties" and whether that applies only to RDF. So "or metadata terms" was (awkwardly) added to show that solutions other than RDF are included. Any wording that you can come up with that gets that across would be acceptable to me.

@aisaac
Contributor

aisaac commented Nov 8, 2018

ok I won't add to the currently ongoing work by suggesting new wording now, but this is worth keeping in mind, thanks.

@jpullmann

Please consider rewording in more requirement-oriented, imperative terms of RFC 2119, e.g.
"Profiles may/should comprise a named collection of properties or metadata terms, if not specified by RDF."

@kcoyle
Contributor

kcoyle commented Nov 20, 2018

I'd prefer:

Profiles should comprise a collection of metadata terms
(The "if not RDF" was added because of the use of "properties" but I think that "metadata terms" covers both.) I think "comprise" is a difficult word and will try to think of something better.

(adding:)
Profiles should be made up of a collection of metadata terms that have been defined in published vocabularies.

This matches the DCMI definition, although as I recall Annette thought that profiles should be able to define new terms.

@aisaac
Contributor

aisaac commented Nov 20, 2018

Actually I would argue against 'comprise' in @jpullmann's suggestion, because it feels strange to say that a profile comprises a collection with a name, while the original idea is that the name of the collection is the name of the profile itself.
@kcoyle 's rewording fixes this, but then the name is gone!

Then the fact that the elements of a profile are from published vocabularies is probably relevant but I would prefer to rely on other existing requirements to carry this instead of extending the scope of this one. E.g. we could piggyback on the one that says that profiles are based on existing specifications. Such a requirement is probably a better fit, as profiling a specification is probably nonsensical if that specification has not been published anywhere.

And talking about @agreiner I guess she may argue against restricting the scope to 'metadata' - she has made a comment about this for the intro of the Profile guidance ;-)

@kcoyle
Contributor

kcoyle commented Nov 20, 2018

@aisaac I'm open to your suggestions - maybe:

Profiles should be made up of a collection of properties or terms, and the collection should have a name.

The problem is that it is a named collection but it can't be "made up of a named collection" (because it can't be made up of a single thing, a named collection) although it can be that they "are a named collection".

@aisaac
Copy link
Contributor

aisaac commented Nov 20, 2018

"Profiles should be made up of a collection of properties or data terms and have a name"?

Sorry I wasn't very constructive: in fact my only issue about the original wording was on the specific aspect of data vs metadata.

@kcoyle
Contributor

kcoyle commented Dec 19, 2018

I have suggested this wording in the profiles guidance document. It may need to be expanded:

2.1 Profiles are named collections of properties

Profiles SHOULD be made up of a collection of properties or terms, and that collection SHOULD have a name. Properties are selected from existing vocabularies. Because profiles as described here are not limited to RDF, collections could be of metadata terms from any type of metadata schema.

@aisaac
Contributor

aisaac commented Dec 19, 2018

I agree with the general idea of the proposal, but I would nitpick over two words:

  • "properties" seems too narrow even in the RDF context. A profile can be made of classes, too. We could perhaps have both "properties or classes", or just "data terms" or maybe "data elements"
  • following @agreiner 's earlier comment, we should try to employ "data" instead of "metadata". Perhaps "(meta)data" would be a way to mention both.

@kcoyle
Contributor

kcoyle commented Dec 19, 2018

I very much agree about "properties" and am struggling to find a better word or phrase:

  • vocabulary terms (which could include classes)
  • data elements (rather old-fashioned but could also include classes)
  • data terms (Singapore framework uses "metadata terms" - "data terms" sounds odd to me)
  • elements from selected vocabularies or ontologies - longer, but also doesn't address whether one can define new elements within a vocabulary, which has also come up in the past.

I get the desire to speak of "data" as well as "metadata" but I wouldn't want to confound a profile with the instance data in a dataset, such as statistical data. The profile is not instance data (except of the profile itself, but that seems tautological), but a set of terms and constraints that are used to create instance data that usually has a descriptive role.

Maybe we need to define data and metadata?

@aisaac
Contributor

aisaac commented Jan 11, 2019

From @agreiner in email on 20-12-2018:

I agree with using the word "metadata" in this context.

@aisaac
Contributor

aisaac commented Jan 11, 2019

@kcoyle before I make a proposal about the wording itself, a bit of clarification about metadata vs data is perhaps needed, indeed. Or more precisely about 'data terms (or elements)' vs 'metadata terms (or elements)', because I feel that this is the problem (since I guess we both agree on what metadata and data are, and that metadata can be seen as a kind of data).

I call 'data terms (or elements)' the classes and properties that are used to create (instance) data. And 'metadata terms (or elements)' the classes and properties that are used to create metadata. Profiles can be for either level, e.g. there can be DCAT profiles for data catalogues (which is metadata) and profiles for the statistical data that is in a dataset. In any case, in my view writing that profiles are made up of (meta)data terms (or elements) wouldn't confound profiles with instance data. Are we on the same line?
I reckon this is probably a very naive and superfluous check from me. But one never knows - and maybe some of the wording above can be used to clarify things in the Guidance doc, as a nice side effect ;-)

@kcoyle
Contributor

kcoyle commented Jan 11, 2019

@aisaac "there can be DCAT profiles for data catalogues (which is metadata) and profiles for the statistical data that is in a dataset." I totally agree on the separation between descriptive data (metadata) and instance data, and that profiles can consist of element sets for either type.

Profiles are sets of elements intended to define data. Those elements have names (e.g. dct:title). The named elements may be for units of instance data or units of descriptive metadata. The profile itself should have an identifier by which it can be referred.

So:
Profiles are sets of elements that define instance data or metadata. Profiles should/must have a unique identifier or name.

?? It's very hard to do as a single sentence, and I don't know if we've decided that something is NOT a profile if it doesn't have an identifier. Even saying "must" does not mean that a profile without an identifier is not a profile, only that it doesn't meet our best practices. So having the name in the definition gives us a philosophical problem, IMO, and that should instead be a requirement, not a definition.

I'm not sure this helps, but you should give it your best shot and then I think we should consider this done.

@aisaac
Contributor

aisaac commented Jan 13, 2019

I like very much your proposal @kcoyle . Indeed it's hard and the progress looks minor, but I believe this word- and concept-smithing is precious.

I would suggest adapting your suggestion to an even more "requirement-focused" approach (as @jpullmann suggested earlier) by having a should/must for the first part of the requirement too, and keeping classes and properties as examples, in order to make the thing easier to relate to existing approaches.

"Profiles must be made up of sets of elements, such as classes and properties, that define instance data or metadata. Profiles must have a unique identifier or name."

I'm opting for "must" over "should" because I think this reflects the original requirement. I.e. for the corresponding use case I guess that a profile that doesn't bring a set of elements and that doesn't have a name is useless.

The WG may argue about this later on, as you suggest, but I believe this is a fair re-writing of the requirement. We could add a note about it, calling for feedback. But at least we would have solved a first round of discussion about this requirement :-)

@kcoyle
Contributor

kcoyle commented Jan 13, 2019

Great, @aisaac. I say let's go with your re-write and we'll see if anything that follows in the document causes us to re-think it.

@rob-metalinkage
Contributor

I don't feel very comfortable with this rewrite yet...

"Profiles must be made up of sets of elements, such as classes and properties, that define instance data or metadata. Profiles must have a unique identifier or name"

is confusing data and metadata (of course), but I don't think we want to join the list of failed attempts to distinguish these (failed as in we don't have an obvious consensus yet!).

Specifically, I think profiles consist of "statements about" "data elements" (and thus can be seen as metadata about such elements, and indirectly about data within such elements). The profile represents a set (i.e. a named set) of such statements.

@kcoyle
Contributor

kcoyle commented Jan 14, 2019

I could go for "statements about data elements" - now we need to fit that into the rest. The trick here is:

statements
about data elements
that

I'm not sure whether "that" refers to the statements or the data elements or "statements about data elements". Does anyone have a better grasp of grammar rules for this?

@rob-metalinkage
Contributor

I think both the data elements and the statements about the data elements define aspects of the data instances (but probably not a full definition either: for example, if we say an author identifier must be an ORCID id, there is a lot of external context defining what an ORCID id is that would never be replicated within a profile that requires it).

So maybe "that" = "where the data element definition and the statements about it defined by a profile combine to provide metadata about data instances" ...

@larsgsvensson
Contributor

If it doesn't have to be one sentence, we could write "Profiles are made up of sets of statements about data elements, such as classes and properties. Those statements define instance data or metadata. Profiles must have a unique identifier or name."
And I'm against saying that "profiles must be made up of sets of elements". We are writing a definition of what a profile is, which means that if something is a set of statements about data elements and has a unique name, then it's a profile. Perhaps it would be even better to say "A profile is a uniquely identified set of statements about data elements, such as classes and properties. Those statements define instance data or metadata."

@aisaac
Contributor

aisaac commented Jan 14, 2019

I'm a bit confused about this exchange. To me the point about metadata in the last wording was about saying that the data created according to profiles can happen to be metadata for other data (just as there are profiles for the metadata in data catalogues). I didn't want to stress that statements in a profile are metadata (even though, yes, they are). I.e. I wanted to raise the possibility of profiles-for-metadata, not profiles-as-metadata.
Maybe 'that apply to instance data or metadata' would be clearer than 'that define instance data or metadata'?

@larsgsvensson I don't understand your last comment: where do we say that "if something is a set of statements about data elements and has a unique name, then it's a profile"?
Note that in my view the rewording here is only for the requirement title, as we were not happy with the current title. The impact on the definition remains to be handled, and 'negotiated' against other requirements and preoccupations.

@larsgsvensson
Contributor

@aisaac Perhaps it's only a matter of style how we write definitions. One way is to say "A profile MUST have A, B and C". Another way is to say "If a resource has A, B and C, then it's a profile" (kind of indirect typing). I think I'd prefer the second one and then go on "If your resource turns out to have a profile, then it MUST have those attributes, too".

And yes, that's a discussion for later since this discussion is about this requirement's title. Perhaps "Profiles are uniquely named sets of statements about data elements, such as classes and properties, that describe how instance data is structured". That would highlight that profiles help to understand the inner structure of a (meta)data collection.

@aisaac
Contributor

aisaac commented Jan 15, 2019

@larsgsvensson ok it's a matter of style indeed. Let's discuss this later :-)
I am ready to change the title but would like to hear from @rob-metalinkage (and you) whether it's ok to keep 'metadata' as in my wording. I'm fine merging the sentences and adding the 'structure' point. But I'm -1 on "sets of statements" as this ventures into other requirements (metadata about profile elements, including constraints).

So I'd now propose something like

  1. "Profiles are uniquely named sets of data elements, such as classes and properties, that structure instance (meta)data"
    or a longer but more explicit expression:
  2. "Profiles are uniquely named sets of data elements, such as classes and properties. These elements structure the data (or metadata) that instantiate the profile"

And if neither of these two is ok, then I suggest wielding the axe and removing any attempt to clarify 'metadata terms' in the original title. That would be:
3. "Profiles are uniquely named sets of data elements, such as classes and properties."

@agreiner
Contributor

I really don't care what we do with the title of the use case, but I do have an idea for the definition, when we get around to that: "Profiles are uniquely named sets of constraints, such as prescribed classes or properties, applied to data elements. They can be used to ensure consistency in instance data or metadata."

@larsgsvensson
Contributor

@aisaac yes, the use of "metadata" is perfectly fine with me.
@agreiner I'm a bit hesitant to use "prescribed" since profiles can also define optional elements.

And looking at this definition for the umpteenth time: if we talk only about "classes and properties", it has a very RDFy touch, and it feels as if we rule out the possibility of having profiles for XML documents or other markup languages. Should we extend to "e.g. classes, properties or markup elements"?

@kcoyle
Contributor

kcoyle commented Jan 21, 2019

"data elements"?

@aisaac
Contributor

aisaac commented Jan 22, 2019

@larsgsvensson thanks for the feedback

@larsgsvensson @kcoyle at this stage the expression is "data elements, such as classes and properties" and I think this is the best we can get as we've discussed it many times ;-)

@aisaac
Contributor

aisaac commented Feb 5, 2019

As per the discussion on #435 (for example this message from @kcoyle ) it seems that this requirement is, in terms of the categorization currently made in the UCR document, both a general 'definition' requirement and a requirement that indicates a function of profiles. This means it would fit both into sections 6.10 and 6.11.
The easiest option seems to be to duplicate the requirement, maybe with a slight difference in the explanatory text.
This would address the editor's note in the section where the requirement is currently presented. In fact, maybe the short-term solution is to change the editor's note to indicate that the requirement should be duplicated, until we sort out the other discussion on this requirement.

@jpullmann would you be ok with such duplication?
Any objection from the others?

@aisaac
Contributor

aisaac commented Feb 19, 2019

@jpullmann I'd really like to hear your opinion about possible duplication of this in the UCR!

@aisaac
Contributor

aisaac commented Feb 19, 2019

For a month now the discussion has not been very active, and it seems that in the last exchanges people were rather keen on discussing the general definition of profiles, not the title of this requirement, which captures only one specific dimension of it.

So I'm renaming it to one of the last proposals - I'm picking the shorter one:
"Profiles are uniquely named sets of data elements, such as classes and properties, that structure instance (meta)data"
Btw, some of the reluctance wrt the RDF focus or the possible confusion between data and metadata will be clarified in the guidance section that is going to be about this requirement.

@aisaac aisaac changed the title Profiles are "named collections of properties" or metadata terms (if not RDF) [ID41] (5.41) Profiles are uniquely named sets of data elements, such as classes and properties, that structure instance (meta)data [ID41] (5.41) Feb 19, 2019
@aisaac
Contributor

aisaac commented Feb 28, 2019

In the discussion 28-02-2019: "Profiles are uniquely named sets of constraints, such as data elements (e.g. classes and properties) that describe (meta)data."

@agbeltran
Member

Further refinement from discussion 2019-02-18: "Profiles are uniquely named sets of data constraints, such as data elements (e.g. classes, properties, value domains) that describe (meta)data."

@aisaac aisaac changed the title Profiles are uniquely named sets of data elements, such as classes and properties, that structure instance (meta)data [ID41] (5.41) Profiles are uniquely named sets of data constraints, such as data elements (e.g. classes, properties, value domains) that describe (meta)data. Mar 19, 2019

8 participants