flattening AccessPoint structure seems to mint new terms #76

Closed
GoogleCodeExporter opened this issue Mar 13, 2015 · 31 comments

Comments

@GoogleCodeExporter

http://terms.gbif.org/wiki/Audubon_Core_Term_List_(1.0_normative)#hasServiceAccessPoint

In the notes about the flattening case, consider the text "we recommend to select 
from among term names of the form "AB" where "A" is one of thumbnail, trailer, 
lowerQuality, mediumQuality, goodQuality, bestQuality, offline and "B" is one 
of AccessURI, Format, Extent, FurtherInformationURL, LicensingException, 
ServiceExpectation (example: thumbnailAccessURI)."

I don't see any way to interpret this as a suggestion to mint new terms in the 
AC namespace.
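
For reference, here is a minimal sketch of what the quoted "AB" pattern produces. The two lists are copied from the quoted note; the function name `ab_term_names` is only an illustrative helper, not anything defined by AC.

```python
from itertools import product

# "A" values (variants) and "B" values (access point properties),
# copied from the note quoted above.
VARIANTS = ["thumbnail", "trailer", "lowerQuality", "mediumQuality",
            "goodQuality", "bestQuality", "offline"]
PROPERTIES = ["AccessURI", "Format", "Extent", "FurtherInformationURL",
              "LicensingException", "ServiceExpectation"]


def ab_term_names(variants=VARIANTS, properties=PROPERTIES):
    """Return every compound name of the form A + B (hypothetical helper)."""
    return [a + b for a, b in product(variants, properties)]


names = ab_term_names()
print(len(names))  # 7 variants x 6 properties = 42 names
print(names[0])    # thumbnailAccessURI
```

Whether each of these names is a term in some namespace or only a local column label is exactly the question this issue raises; the sketch only counts them.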

Original issue reported on code.google.com by morris.bob on 4 Jun 2013 at 5:26

@GoogleCodeExporter

"I don't see any way to interpret this as a suggestion to mint new terms in the 
AC namespace." -- is this missing a not? It is certainly a suggestion that 
additional terms are needed, it only leaves it open in which namespace they 
would be.

Original comment by g.m.hage...@gmail.com on 4 Jun 2013 at 6:44

@GoogleCodeExporter

Is thumbnailAccessURI a term or a value?  It seems to be a term of the form AB. 
 If it is a term, in what namespace?  If a value, of what term?

Original comment by morris.bob on 4 Jun 2013 at 7:14

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

Original comment by morris.bob on 5 Jul 2013 at 4:02

@GoogleCodeExporter

Original comment by morris.bob on 5 Jul 2013 at 4:04

@GoogleCodeExporter

Gregor: I have adjusted the "simple table" example in 
http://terms.gbif.org/wiki/Audubon_Core_(DRAFT_of_1.0_normative)#Multiplicity.2FCardinality
to be an example of what I think we must mean. If you agree with 
that change to the table, I will adjust the corresponding flat XML example 
above it.

My confusion arose from the fact that the value of hasServiceAccessPoint is 
simply an identifier; whether that is restricted to a URI or not becomes an 
implementation and/or community question and we need to clarify that in the 
Notes for hasServiceAccessPoint. It seems that in this case there is no need to 
introduce hasServiceAccessPointLiteral.  

Original comment by morris.bob on 14 Jul 2013 at 12:20

@GoogleCodeExporter

No, wait. I am still confused. Consider the "simple table" example with 
ac:variant added and do so also in the XML example immediately above it. In 
these cases, what is added by a value for ac:hasAccessPoint?  In fact, in the 
"AB" pattern mentioned above, A can always be supplied by ac:variant and B by 
one of the other terms in the 
http://terms.gbif.org/wiki/Audubon_Core_Term_List_(1.0_normative)#Service_Access_Point_Vocabulary

The comment about the "AB" pattern seems pointless to me at the moment and I 
propose to take it out.

Original comment by morris.bob on 15 Jul 2013 at 4:02

@GoogleCodeExporter

Sorry, I am confused by your comments as well. Were we ever talking about 
adding "a value for ac:hasAccessPoint?"

When you write "In fact, in the "AB" pattern mentioned above, A can always be 
supplied by ac:variant and B by one of the other terms in the 
http://terms.gbif.org/wiki/Audubon_Core_Term_List_(1.0_normative)#Service_Access_Point_Vocabulary",
I agree with this. And your approach in the repeated-value cases, adding 
additional records for each service access point, may work for some cases but 
not for all. It certainly does not work in a DarwinCoreArchive-based exchange. 
There you need to serialize the access point variants to side by side columns, 
which I thought was the subject of this issue 76.

The point is whether we should 
1. recommend the AB-constructed terms as one option when repeated records 
or nested repetitions are not available
2. in addition already supply a subset of acs:thumbnailAccessURI etc. terms.

I favor 1 and 2.

Original comment by g.m.hage...@gmail.com on 15 Jul 2013 at 4:37

@GoogleCodeExporter

I'm not sure we can settle this without a conversation with audio. We seem to 
be confusing each other.  Here's my present source of confusion:

a. You wrote "There you need to serialize the access point variants to side by 
side columns".  Be that as it may for DwCA, there is no requirement that AC can 
be directly serialized to DwCA.  The only requirement is that there be the 
ability to have flat representations.  Having a single row for each "AB pair" 
does not, as far as I can tell, prevent someone from extracting the information 
from the table and rearranging to some other flat representation. 

b. In your point 1, I am not sure what you mean by "when repeated records are 
not available."  Repeated records must always be available for some uses of AC 
in flat representations.  We document such use cases in 
http://terms.gbif.org/wiki/Audubon_Core_(1.0_normative)#Multiplicity.2FCardinality
For example, if someone wants multi-lingual ac:title, they need separate 
records with different ac:metadataLanguage values.  This is a generic issue 
when there are relations between terms but structure relating them is 
disallowed. It's an essential problem of flatness.

c. I assume your point 2 in comment 10 to this issue has a typo and a 
superfluous "already" in it. I read it to mean that you favor: "in 
addition, supply a subset of ac:thumbnailAccessURI, etc. terms." That is an 
answer to my question in comment 2 of this issue. I understand you to mean that 
ac:thumbnailAccessURI would then be the object of ac:hasServiceAccessPoint. In 
this case, you propose to mint a URI for every AB pair. That is where the 
entire thread started. It mints 42 new URIs just to help out people who not 
only can make nothing but spreadsheets, but can make them only in specific forms.
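
To make point a. concrete, here is a minimal sketch of rearranging a one-row-per-variant table into a one-row-per-resource table. The column names (`identifier`, `variant`, `AccessURI`, `Format`), the example.org URLs, and the helper `flatten` are all illustrative assumptions, not AC terms or an agreed serialization.

```python
def flatten(records, key="identifier"):
    """Collapse one-row-per-variant records into one row per media resource.

    Each non-key property is renamed to variant + property, i.e. the
    "AB" pattern (hypothetical helper, plain dicts stand in for table rows).
    """
    wide = {}
    for rec in records:
        row = wide.setdefault(rec[key], {key: rec[key]})
        variant = rec["variant"]
        for prop, value in rec.items():
            if prop not in (key, "variant"):
                row[variant + prop] = value
    return list(wide.values())


# Two rows describing the same resource, one per variant (made-up data).
long_rows = [
    {"identifier": "img1", "variant": "thumbnail",
     "AccessURI": "http://example.org/img1_t.jpg", "Format": "image/jpeg"},
    {"identifier": "img1", "variant": "bestQuality",
     "AccessURI": "http://example.org/img1.tif", "Format": "image/tiff"},
]

wide_rows = flatten(long_rows)
print(wide_rows[0]["thumbnailAccessURI"])
print(sorted(wide_rows[0]))
```

The rearrangement itself is mechanical; what it leaves open is what the resulting column headers denote, which is the disagreement in this thread.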

Original comment by morris.bob on 15 Jul 2013 at 8:42

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

I guess my position is that, based on my previous experience in exchanging this 
type of data in Key to Nature, I am convinced that people will mint these terms 
to make AC usable for their Excel/plain DB table environments. So my question 
is: do they all mint their own terms, or do we provide a shared fallback?

My proposal is to mint these terms not in the ac: namespace, but in some acs 
(serialization) or aca (ac auxiliary) namespace.

I feel that would be a good compromise. I already introduced that above, but 
did not discuss it, apologies.

Original comment by g.m.hage...@gmail.com on 15 Jul 2013 at 9:00

@GoogleCodeExporter

OK. I think we understand each other now. See my interspersed comments


Yes, I agree that this is the central question for this issue. My
thought is that it is a rather general problem, especially in RDF, let
alone in more abstract data models such as AC is trying to be at this
stage. I raise this because there is increasing momentum in TDWG---or at
least from Joel Sachs in his role as RDF IG convener, and from the
contributors to the draft DwC RDF Guide---to put generic problems on the
table and look for generic solutions. One such frequently occurring problem is
how to model individuals and their type when there is a notion of
individuals and classes to which they may belong. Your solution,
which makes a new namespace to isolate some preferred individuals'
identities, \may/ solve the problem in this one case, for one term,
of one vocabulary.  But it leaves open a host of others that (a) might
best be addressed in a broader context and (b) might lead to chaos if
not addressed at the same time as the individuals' identities are
minted. Among the problems: what exactly should be the rdf:type of
these URI references? Another class?  How is the namespace managed,
e.g. who gets to add new values of ac:hasServiceAccessPoint in a case
where they may multiply rapidly (as you suggest happened in your
application)?

I can think of other solutions, possibly even more appealing, but also
needing more consideration, by more people, than just you and me. For
example these objects could be skos:Concept objects, reducing the
problem to "who gets to mint ConceptSchemes?"  And even that is not
without problems---they are just less frequent ones.  In an analogous
case, the Open Annotation Community ontology draft refers to a class
of objects called oa:Motivation, and declares (a) that oa:Motivation is a
subclass of skos:Concept and (b) 12 specific instances it
deems of wide use [1].  It also declares a management regime [2] for
adding new skos:Concepts.  My guess is that something like this is
even more appealing to you, and probably to me, than what you propose.
But my point is that it needs more discussion, possibly very early on,
but, I believe, not before AC is accepted by the EC.  (By the way, for
actual RDF implementations, there are even some pitfalls for this
approach---which I bet also lurk in terms.gbif.org. Namely, given the
rule in [2] that new Motivations must be instances of oa:Motivation,
you cannot validate against this rule unless the declaration is either
explicit, i.e. rdf:type oa:Motivation, or the validating agent has
access to a triple store with enough in it to deduce that <X>
rdf:type oa:Motivation.  This is a nuisance to communities who want to
consume objects made by producers that implicitly assume the entire
consuming community has the original definition of the term at hand.)
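
The validation pitfall in the parenthetical can be sketched with plain tuples standing in for triples. This is only an illustration under stated assumptions: the `ex:` subjects are made up, the helper `has_explicit_type` is hypothetical, and no real RDF library or reasoner is involved.

```python
RDF_TYPE = "rdf:type"


def has_explicit_type(subject, cls, triples):
    """True only if the data itself asserts `subject rdf:type cls`."""
    return (subject, RDF_TYPE, cls) in triples


# Data as a consumer might receive it (made-up example triples).
triples = {
    ("ex:reviewing", "rdf:type", "oa:Motivation"),    # type stated explicitly
    ("ex:grading", "skos:broader", "oa:commenting"),  # type left implicit
}

print(has_explicit_type("ex:reviewing", "oa:Motivation", triples))  # True
print(has_explicit_type("ex:grading", "oa:Motivation", triples))    # False
```

The second resource may well be an oa:Motivation, but a consumer holding only these triples cannot confirm it without the producer's definitions, which is the nuisance described above.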

[1] http://www.openannotation.org/spec/core/core.html#Motivations
[2] http://www.openannotation.org/spec/core/appendices.html#ExtendingMotivations

Original comment by morris.bob on 15 Jul 2013 at 10:24

@GoogleCodeExporter

Bob, I don't think we understand each other. You keep bringing up "new values 
of ac:hasServiceAccessPoint" or similar. In my mind this is NOT under debate. 
If the flattening terms ("AB" scheme like acs:thumbnailAccessURI) are used, 
ac:hasServiceAccessPoint is NOT used. acs:thumbnailAccessURI is an additional 
term. Being an additional term used only under certain exchange requirements, I 
propose to isolate it into its own namespace and SKOS scheme. 

In your discussion above you raise valid, very general points, but I think they 
don't apply to the pragmatic topic under discussion. The basic question is: If 
you need to provide or manage your data flattened, which terms do you use for 
acs:thumbnailAccessURI and acs:bestQualityAccessURI? Do you invent your own, or 
does AC provide terms?

If you have an exchange format that supports the ac:hasServiceAccessPoint 
(which requires a graph), the whole scheme and namespace containing 
acs:thumbnailAccessURI and acs:bestQualityAccessURI is none of your concern, 
you can safely ignore it.

Original comment by g.m.hage...@gmail.com on 16 Jul 2013 at 12:55

@GoogleCodeExporter

Perhaps you could make a second "simple table" illustrating what you
mean. Assume a namespace with prefix acs: and also give an example of
genuine RDF which means the same thing.  Then I should be able to
understand (a) why there is no possible flat way without
introducing more terms, and (b) what is needed to provide for data
exchange between the two serializations.

I hope your last sentence ending "...you can safely ignore it" doesn't
mean that data serialized with genuine RDF and having several variants
can never be compared to data serialized with a flat serialization
having, human-semantically, the same information.

Robert A. Morris

Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390

IT Staff
Filtered Push Project
Harvard University Herbaria
Harvard University

email: morris.bob@gmail.com
web: http://efg.cs.umb.edu/
web: http://wiki.filteredpush.org
http://www.cs.umb.edu/~ram

Original comment by morris.bob on 16 Jul 2013 at 4:12

@GoogleCodeExporter

We are not talking here about RDF, we are talking about a flat, DWC-A kind of 
Excel table, where the column headers are terms (and ideally, mapped inside 
Excel to real URIs - a feature supported by Excel). If you want to have one 
column for the thumbnail URI and another for the medium resolution and another 
for the highest resolution, you need three terms in this flat representation.

Please tell me where (WIKI URI!) you want that simple table to be created...

Original comment by g.m.hage...@gmail.com on 16 Jul 2013 at 8:47

@GoogleCodeExporter

Use http://terms.gbif.org/wiki/AC_Flat_Examples

1. We have to also talk about RDF in order to understand what
difficulties will arise if there is a need to exchange data between
two different serializations.
2. At the same time, I want to
understand why data in the scheme I put in the table at
http://terms.gbif.org/wiki/Audubon_Core_(1.0_normative)#Multiplicity.2FCardinality
cannot easily be exchanged with a serialization of what
you are trying to accomplish.

Original comment by morris.bob on 16 Jul 2013 at 2:23

@GoogleCodeExporter

Done. Not sure it helps. 

The mechanism is intended for those with limited access to computing resources 
who need to make things work within a flat framework. The row-duplication 
method described in the AC introduction is rather difficult to make work. 
Because so many fields are affected, it may work for the case of multiple 
metadata languages (rare in a simple workflow!). But duplicating every row for 
the different resolutions can be hard to maintain.

I see no problems with conversion, provided you have access to software 
development resources.
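
As a rough idea of what such a conversion involves, here is a minimal sketch that unfolds one flat row with "AB" column headers back into one record per variant. The column names, the key `identifier`, and the helper `unfold` are illustrative assumptions; the variant list is the one quoted at the top of this issue.

```python
# Variant names from the "AB" note quoted at the top of this issue.
VARIANTS = ("thumbnail", "trailer", "lowerQuality", "mediumQuality",
            "goodQuality", "bestQuality", "offline")


def unfold(row, key="identifier"):
    """Split AB-style columns of one flat row into one record per variant."""
    records = {}
    for column, value in row.items():
        for variant in VARIANTS:
            # A column belongs to a variant if it starts with the variant name.
            if column.startswith(variant) and column != variant:
                rec = records.setdefault(
                    variant, {key: row[key], "variant": variant})
                rec[column[len(variant):]] = value
                break
    return list(records.values())


# One flat row for one media resource (made-up data).
wide_row = {
    "identifier": "img1",
    "thumbnailAccessURI": "http://example.org/img1_t.jpg",
    "thumbnailFormat": "image/jpeg",
    "bestQualityAccessURI": "http://example.org/img1.tif",
}

for record in unfold(wide_row):
    print(record)
```

The sketch relies on the variant names being known in advance and on none of them being a prefix of another, which holds for the list above but would need checking if variants are added.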

Original comment by g.m.hage...@gmail.com on 16 Jul 2013 at 4:13

@GoogleCodeExporter

I cannot see an argument why unfolding the table in your example is difficult. 
It is certainly not necessary to maintain copies of all the variant-independent 
data in the access point table. It merely requires a second table with the 
access-independent data. That table is in fact a subtable of a single-table 
solution. See my example 2 in http://terms.gbif.org/wiki/AC_Flat_Examples

I am very opposed to introducing new terminology without broad community 
support, which would seriously delay submission for approval.

Original comment by morris.bob on 21 Aug 2013 at 10:19

@GoogleCodeExporter

Bob wrote in the wiki page:
"Example 2: This unfolds Example 1 and shows the approach illustrated near the 
bottom of the section Multiplicity/Cardinality in the AC normative page. As 
there, this is simply an unfolding of the table of Example 1, which comes only 
at the cost of two additional cells per variant, namely the repeated required 
terms. It also requires a second, single-row table that is a proper subtable of 
Example 1. It needs no new terminology, and if it did, so would all the other 
cases mentioned in Multiplicity/Cardinality. There is no requirement stated in 
AC that a tabular representation have only one row."

My analysis is that this is not flattening the data structure. Bob simply 
proposes how the nested access point variants would be expressed in standard 
RDBMS modelling. I suggest replacing the text above for example 2 with the 
following:

"Should you have a relational database and the ability to work with two tables, 
you can avoid the flattening and use a "1:n" relation to express the variants:"

While this may be useful to some people who have never used a 1:n two-table 
relation, I think it is misleading to present this under the heading of 
"flattening".
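
For readers unfamiliar with the "1:n" arrangement under discussion, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`media`, `access_point`, `access_uri`) and the data are made up for illustration; they are not AC terms.

```python
import sqlite3

# In-memory database with one table for variant-independent metadata
# and one table holding n access points per media resource.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media (identifier TEXT PRIMARY KEY, title TEXT);
CREATE TABLE access_point (
    identifier TEXT REFERENCES media(identifier),
    variant    TEXT,
    access_uri TEXT
);
""")
conn.execute("INSERT INTO media VALUES (?, ?)", ("img1", "Example image"))
conn.executemany(
    "INSERT INTO access_point VALUES (?, ?, ?)",
    [("img1", "thumbnail", "http://example.org/img1_t.jpg"),
     ("img1", "bestQuality", "http://example.org/img1.tif")],
)

# Joining the two tables yields one row per variant, not one per resource.
rows = conn.execute("""
    SELECT m.identifier, m.title, a.variant, a.access_uri
    FROM media AS m
    JOIN access_point AS a ON a.identifier = m.identifier
    ORDER BY a.variant
""").fetchall()
for row in rows:
    print(row)
```

The join returns two rows for the single resource, which illustrates why this arrangement does not meet a requirement of one row per media resource.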

Original comment by g.m.hage...@gmail.com on 21 Aug 2013 at 11:26

@GoogleCodeExporter

Gregor: The debate is not fundamentally about the use of the word "flatten." 
For purposes of this discussion we can use flatten to mean "Produce a table in 
which every media resource is described in a single row." 

I find that your argument is two-fold: (a) flattening can only be done by 
introducing new AC terminology as table headers, one term for each recommended 
value of ac:variant; (b) that introduction should be done before adoption of 1.0 
so that the terminology is part of 1.0.

I am slightly dubious about (a), mainly because I think it needs substantially 
more thought. For example, one could instead describe a best practice for 
constructing table headers along the lines of the "AB" mechanism described in 
early comments in this issue. Then the discussion (which I am not urging for 
now) becomes one of what burdens fall upon whom when attempting to consume or 
produce a flat media description.

As to (b), you are urging a limited solution to this general problem, which we 
know is broader than ac:variant, and is even broader than access points. My 
concern is that doing so now could make it quite difficult to adopt other 
solutions to the flattening problem, in that it would likely lead to an 
imposition on developers to support multiple methods of flattening. 

All of this is why I am opposed to addressing this before 1.0.  The general issue 
will still need approaches even were we to decree a new set of terms for table 
headers specifically for ac:variant.

Original comment by morris.bob on 22 Aug 2013 at 12:38

@GoogleCodeExporter

Issue 83 has been merged into this issue.

Original comment by morris.bob on 22 Aug 2013 at 12:52

@GoogleCodeExporter

Note that the "AB mechanism" is itself the introduction of a new terminology. 
The only difference is that everyone creates separate URIs, because everyone 
needs to come up with their own namespace.

There may be more cases where this is relevant, but my argument hinges on the 
analysis that no one can use AC in a flat table situation without doing this 
for the one central property: the URI of the media items that AC provides 
metadata about. This is what AC is supposed to do. How to solve some 
related issues is something we can postpone. I think your concerns about 
generality are valid, but here I advise being pragmatic. AC is worthless 
without the ability to express the URIs for the media items.

I am involved in two projects where we have these flattening issues. We are 
introducing the AB notation, but I wonder whether it is a good idea if everyone 
does that independently.

Original comment by g.m.hage...@gmail.com on 22 Aug 2013 at 3:11

@GoogleCodeExporter

Original comment by morris.bob on 24 Aug 2013 at 7:36

  • Added labels: Type-Defect

@GoogleCodeExporter

No, it probably is not a good idea. But I think that it may be well 
accomplished with a Best Practices document or an applicability statement, 
presumably including introducing a preferred  namespace.

The most immediate problem I see is that if we do this now in official AC, we 
are essentially declaring that everyone should do this.  We would also tend to 
bind ourselves to the "AB" solution for all the circumstances where this problem appears.  
These are myriad, consisting at least of N*M cases where N=number of 
accessPoint terms and M=number of variants, including any that may be 
introduced in the future. This does not even count the cases that arise from 
terms having nothing to do with access points, but do not permit repeated 
values. For example, consider the case where several languages must be 
supported and it is desired to have a title in multiple languages.  This will 
give rise to the requirement to have a term name for each of the several 
hundred(?) ISO language names. 

In summary, I think this needs more careful consideration because the flat 
(i.e. one-line per resource) case is important but shouldn't stand in the way 
of ratification.

I have already put this on Issue #84, the list of issues requiring a TDWG task 
group. There may be multiple approaches such a group could recommend about flat 
serializations, just as there might be for any other kind of serialization, such 
as RDF or XML-Schema. For neither of these are we delaying the approval of the 
normative spec.

I am resolving this as WontFix (before 1.0).  If nobody reopens it before 
2013-08-26T:22 I will be declaring AC ready for the Review Manager's approval 
for submission to the EC.


Original comment by morris.bob on 25 Aug 2013 at 3:56

  • Changed state: WontFix

@GoogleCodeExporter

(Restatement of poor first paragraph in comment  26)
I agree it is not a good idea to have everyone introduce a solution without 
coordination. But I think such coordination may be well accomplished with a 
Best Practices document or an applicability statement, presumably including 
introducing a preferred  namespace.

Original comment by morris.bob on 25 Aug 2013 at 4:48

@GoogleCodeExporter

Original comment by morris.bob on 21 Oct 2013 at 11:08

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

Reopening in order to address the original issue about namespaces for, and 
provision of, compound terms for flattening. The namespace proposal is 
http://rs.tdwg.org/acf/terms/ ("ac flattened terms")

We also need a document that at least lists the terms to be compounded, and we 
need a brief description of the problem and proposed solution.

Original comment by morris.bob on 27 Feb 2014 at 2:33

  • Changed state: Accepted

@baskaufs

This issue has languished now for five years with no resolution. The AC Structure document provides an example in Section 3.2.2 that shows a bunch of "made-up" properties for flattening service access points into a single table row and says "Note: acf: (for “Audubon Core Flat”) is a made-up namespace. Communities of interest might mint such terms in order to use this kind of structure." As far as I know, no one has indicated a burning need to create these kinds of terms, so until that need becomes more apparent, I'm closing this issue. When somebody wants those kinds of terms, they can create term proposals.

ancillary documentation/guides automation moved this from To do to Done Jul 22, 2020