SchemaChangeLog091EarlyBetaVersions.txt,v
head 1.21;
access;
symbols;
locks; strict;
comment @# @;
1.21
date 2009.11.25.03.14.37; author GarryJolleyRogers; state Exp;
branches;
next 1.20;
1.20
date 2009.11.20.02.45.29; author LeeBelbin; state Exp;
branches;
next 1.19;
1.19
date 2007.03.06.17.30.00; author TWikiGuest; state Exp;
branches;
next 1.18;
1.18
date 2005.09.28.17.10.44; author GregorHagedorn; state Exp;
branches;
next 1.17;
1.17
date 2004.10.06.09.15.18; author GregorHagedorn; state Exp;
branches;
next 1.16;
1.16
date 2004.07.15.18.02.00; author GregorHagedorn; state Exp;
branches;
next 1.15;
1.15
date 2004.06.11.09.19.00; author GregorHagedorn; state Exp;
branches;
next 1.14;
1.14
date 2004.05.28.17.24.06; author GregorHagedorn; state Exp;
branches;
next 1.13;
1.13
date 2004.05.28.15.05.00; author GregorHagedorn; state Exp;
branches;
next 1.12;
1.12
date 2004.05.25.08.13.51; author GregorHagedorn; state Exp;
branches;
next 1.11;
1.11
date 2004.05.24.13.33.00; author GregorHagedorn; state Exp;
branches;
next 1.10;
1.10
date 2004.05.24.12.20.00; author GregorHagedorn; state Exp;
branches;
next 1.9;
1.9
date 2004.05.11.13.04.00; author GregorHagedorn; state Exp;
branches;
next 1.8;
1.8
date 2004.05.10.11.33.00; author GregorHagedorn; state Exp;
branches;
next 1.7;
1.7
date 2004.05.05.16.08.00; author GregorHagedorn; state Exp;
branches;
next 1.6;
1.6
date 2004.05.03.13.41.00; author GregorHagedorn; state Exp;
branches;
next 1.5;
1.5
date 2004.04.29.01.20.07; author BobMorris; state Exp;
branches;
next 1.4;
1.4
date 2004.04.23.16.24.00; author GregorHagedorn; state Exp;
branches;
next 1.3;
1.3
date 2004.03.25.12.50.37; author GregorHagedorn; state Exp;
branches;
next 1.2;
1.2
date 2004.03.22.13.35.00; author GregorHagedorn; state Exp;
branches;
next 1.1;
1.1
date 2004.03.16.10.49.07; author GregorHagedorn; state Exp;
branches;
next ;
desc
@none
@
1.21
log
@none
@
text
@%META:TOPICINFO{author="GarryJolleyRogers" date="1259118877" format="1.1" version="1.21"}%
%META:TOPICPARENT{name="SchemaChangeLog"}%
---+!! %TOPIC%
<h1>Changes in 0.91 beta 15 (relative to the 0.9 Dec. 1. 2003 release)</h1>
<strong>This is an updated version containing most of the minor changes discussed at the [[SDD2004Berlin][meeting in Berlin]]. Some changes are still pending.
The current version of the SDD schema can always be found at CurrentSchemaVersion. Please do read through the report of changes, except perhaps for the few trivial ones at the start.
Please take a look at the schema to verify that you agree with the changes and that they make sense to you.</strong>
<strong>Note:</strong> I have tried to document changes, but I cannot guarantee that everything is properly documented.
In fact, since <nop>GenerationMetadata and <nop>ProjectDefinition are heavily changed in an attempt to find common
ground between the various GBIF standards (current discussion involves only ABCD so far), I have given up on documenting all detailed changes therein (but some are commented).
<h3>Trivial omissions that were present in 0.9, corrected in 0.91<a name="trivial"/></h3>
* audiencekey in <nop>ProjectDefinition/Audiences/Audience was specified to have a pattern in the documentation, but the pattern was not defined in the schema; a regular expression pattern has been added to schema 0.91.
* <nop>RevisionData were required in Description, Keys, and <nop>GlossaryEntries, now made optional.
* The Keys collection could be missing, or empty (0 to unlimited Key objects), now changed to 1 to unlimited Key objects.
<h3>Non-trivial changes enacted plus proposals not enacted<a name="nontrivial"/></h3>
*Root*
* In an attempt to converge with ABCD:
* Document root element changed to <nop>DataSets/<nop>DataSet collection. <nop>DataSet takes the place of the original Document. Multiple "Projects" can now be transported in one file or data stream. This is not urgent for SDD, but does not hurt either.
* <nop>GenerationMetadata changed to <nop>TransformationHistory, conceived as a collection of at least one, possibly multiple Transformation elements. Alternative names: <nop>ConversionHistory, <nop>UBIF.DerivationHistory, <nop>HistoryMetadata, <nop>ContentHistoryMetadata, or <nop>DataHistoryMetadata.
*<nop>ProjectDefinition*
* Element itself changed to <nop>ProjectMetadata
* <nop>AudienceSpecificData/Representation split into <nop>Description/Representation and <nop>IPRStatements/Representation.
* <nop>IPRStatements is a list of various copyright, terms of use, disclaimer, acknowledgment etc. statements (new type common to SDD and ABCD schema). However, this is also present in <nop>TransformationHistory!
* <nop>ProjectDefinition/HistoryWebAddress dropped. Annotation was: "@@@@ To be discussed. The idea is that a project may point to a web resource that informs about details about the history of the data (previous versions or a detailed log of changes)." Unless somebody needs it now, I propose that this should be an addition in a later version rather than included in the first release.
* <nop>ProjectDefinition/Icon moved to new <nop>ProjectDefinition/Description/Representation, thus making it audience specific.
Icons (or logos) are not necessarily language independent since they may include text!
* <nop>ProjectDefinition/WebAddress moved as well, different audiences/languages may be referred to different URIs!
* _New_ after Berlin meeting: attempt to use across standards (see UBIF.SchemaDiscussion), therefore audience-dependent project Description and IPR-Statements changed to language dependent. Language should simplify the adoption of common framework elements for all TDWG/GBIF standards.
* _New_ after Berlin meeting: Version structure revised.
* Version/PublicationDate changed to <nop>VersionReleaseDate to avoid possible confusion with <nop>LastRevision or data generation date in online situations.
* A Modifier element added (for beta, rel. candidate, etc.).
* Increment removed (because considered application-internal management mechanism, no need for interoperability).
* Major and Minor left as integers to improve interoperability and comparability (nobody commented on the proposal "change version to string" posed in previous version of the change log.)
* _New_ after Berlin meeting: The narrative (unconstrained text) elements <nop>GeographicCoverage and <nop>TaxonomicCoverage in <nop>ProjectDefinition|<nop>ProjectMetadata/Description/Representation combined to Coverage.
Constrained <nop>ClassScope added, __OtherScope needs a proposal for how to link it to other vocabularies. <nop>SourcePublication changed from a single to possibly several, and considered a scoping mechanism as well.
* _NOTE_: Project Definition could also be called "Envelope". (This avoids "project", which is meaningful in SDD, but perhaps problematic in ABCD/taxon names?)
* _QUESTION_: Can project definition be merged with transformation history?
* _PROPOSAL_: Need documentation of quality control methods and standards, e. g.
* <nop>QualityControlStandard: Name (and version, if applicable) of the published or internally documented quality control standard used.
* <nop>QualityControlDescription: Free-form description of methods used to ensure the quality of the descriptive data. In the absence of a standard, this should be a short description of the quality control procedures taken.
* _QUESTION_: <nop>ProjectDefinition/RevisionData/InitiationDate is xml:dateTime and required, which may cause problems in legacy projects. See discussion under InitiationDateForImportedLegacyData. The proposal makes sense in the context of project definition.
However, <nop>RevisionDataType is also used in several other contexts (single descriptions, glossary, characters, etc.) and the proposal does not make sense there. Do we need two slightly derived types? Does anybody have a better idea?
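As an illustration of the Version decision above (Major and Minor kept as integers, plus a Modifier), the following Python sketch shows why integer fields compare more reliably than a version string. The modifier names and their ranking are assumptions made for this example; the change log itself only mentions beta and release candidate as examples.

```python
from typing import NamedTuple

# Hypothetical ordering of Modifier values (assumed, not part of the schema).
# The empty string stands for a final release.
MODIFIER_RANK = {"alpha": 0, "beta": 1, "rc": 2, "": 3}


class Version(NamedTuple):
    major: int
    minor: int
    modifier: str = ""

    def sort_key(self):
        # Integers compare numerically, so minor 10 sorts after minor 9.
        return (self.major, self.minor, MODIFIER_RANK[self.modifier])


def is_newer(a: Version, b: Version) -> bool:
    """Return True if version a is strictly newer than version b."""
    return a.sort_key() > b.sort_key()
```

With string versions, "0.10" would sort before "0.9" because the comparison runs character by character; with separate integer fields this problem does not arise.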
*GeneralDeclarations*
* New root section "<nop>GeneralDeclarations" created for concepts not specific to SDD, but needed in the schema. Alternative names for this section are:
<nop>GeneralDefinitions, <nop>OverarchingIssues/Functions, <nop>CrosscuttingIssues/Functions, <nop>GeneralTerminology, <nop>GeneralTerms, <nop>GeneralVocabulary (the latter three do not
cover the possible inclusion of "language rules"). The following elements moved there:
* <nop>ProjectDefinition/Audiences
* Terminology/<nop>CodingStatusValues
* Terminology/<nop>UnivariateStatisticalMeasures (was <nop>StatisticalMeasures)
* (Newly created:) Global definitions for <nop>MeasurementUnits (Character definition Numerical/<nop>MeasurementUnit is consequently changed to a ref type). The optional generalization makes it possible to define relations between units, so that two size measures, one expressed in mm and the other in cm, become comparable.
* In each of <nop>CodingStatus, <nop>UnivariateStatisticalMeasure, <nop>MeasurementUnit, the "Generalization" element
(containing the machine-readable partial semantics of an object) was renamed to Specification.
* The Audience definitions lang and expertiselevel, previously defined as attributes, have been reorganized to follow the pattern of Label + Specification.
* The defaultaudience attribute present at Audience was only appropriately placed because all audience definitions were considered part of the project definition.
Now it is separated and moved to <nop>ProjectDefinition/DefaultAudience.
* <nop>StatisticalMeasures renamed to <nop>UnivariateStatisticalMeasures (compare Bob's comment on TWIKI about ClosedTopicMultivariateStatistics).
* Related: the fact that Char. def. Numerical/StatisticalMeasures had both a ref and a key confused several reviewers. To clarify, the key has now been renamed from ref to <nop>GeneralDeclarationRef and both this and the key on <nop>GeneralDefinitions/<nop>UnivariateStatisticalMeasures/<nop>UnivariateStatisticalMeasure are typed as <nop>StatisticalMeasureKeyValue.
* Element "Dimensionless" added to Specification of <nop>UnivariateStatisticalMeasures (answers whether the measurement unit applies to a statistic or not).
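The unit relations mentioned above for <nop>MeasurementUnits (making a value in mm comparable with one in cm) could work roughly as follows. This is only a sketch: the table layout, the choice of mm as base unit, and the function names are assumptions and are not taken from the schema.

```python
from fractions import Fraction

# Hypothetical unit table: each unit names a base unit and an exact factor.
UNIT_SPEC = {
    "mm": ("mm", Fraction(1)),
    "cm": ("mm", Fraction(10)),
    "m": ("mm", Fraction(1000)),
}


def to_base(value: str, unit: str):
    """Convert a decimal string in the given unit to (base unit, exact value)."""
    base, factor = UNIT_SPEC[unit]
    return base, Fraction(value) * factor


def comparable(unit_a: str, unit_b: str) -> bool:
    """Two units are comparable when they reduce to the same base unit."""
    return UNIT_SPEC[unit_a][0] == UNIT_SPEC[unit_b][0]
```

Exact fractions are used instead of floats so that 2.5 cm and 25 mm compare as equal without rounding effects.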
*Terminology*
* Sequence of sections changed, Terminology section placed after Entities and Resources sections.
* Terminology/Glossary (= ontology definitions) strongly changed
* Multiple new ontological relations between terms added and subsumed under a new Ontology element. This urgently needs review!
* <nop>SensuLabel and <nop>KindOfTerm added. The first makes it possible to distinguish between multiple definitions of a term (Term does not have to be unique, but Term + <nop>SensuLabel has to be!), the latter categorizes terms (is that doubtful??).
* With the introduction of <nop>SensuLabel, Term is no longer a keyref in the ontological definitions (synonym, antonym, etc.). Replaced with <nop>TermListType = List of <nop>GlossaryEntryRefType.
* Ontology now refers to <nop>GlossaryEntry keys rather than Term strings in a specific language. This is partly necessitated by the introduction of a <nop>SensuLabel.
* As a result, other parts of the <nop>GlossaryEntry (Citations, <nop>RevisionData) have now been made language/audience-independent as well. This also resolves some anomalies, e.g. that <nop>RevisionData were on the audience-specific part instead of on the language-independent object as in all other cases in SDD.
* <nop>ExternalReference changed to <nop>ExternalDefinitionURI
* <nop>CharacterDefType
* Label changed from <nop>LabelPlusAbbreviationType to <nop>SimpleLabelType. This simplifies the model: Only a single label can be defined at the character level, all extended concepts (abbreviations, export tokens, images) are definable only in concept trees. Since concept trees require a terminal node for each character, the same expressiveness is maintained.
* Type changed to <nop>MeasurementScale, value list completed to include "ratio".
* Section Assumptions added to the character definition, <nop>MeasurementScale moved there
* Categorical and Numerical are tentatively changed to a choice rather than co-occurring. This needs discussion!
* PlausibilityRange added to numeric character definition. Applies to all values and statistics, except those that are dimensionless (like variance).
* <nop>GenericStates renamed to <nop>ConceptStates (= states that are present at nodes in the concept tree; this is the only place where <nop>GenericStates
was present). "Generic" was considered to be confusing since for biologists it may be understood as referring to states describing a Genus.
* "Probability modifiers" have been renamed back to "Certainty modifiers" (they were previously called "Uncertainty modifiers" before changing to "Probability"). As
already discussed in Brazil (but later forgotten), Probability is ambiguous since low occurrence frequency of a state also results in a low probability that a given object has a given character state.
* Terminology/Modifiers/Sets (intended to define reusable modifier sets which would then be associated with characters) and <nop>CharacterDefType/ModifierSets were both replaced with a new Concept/ApplicableModifiers element in the concept trees. For the modifier sets a key and a label had to be defined so they could be selected in each character through a keyref. The new solution avoids both the label and the key/keyref mechanism: The concept label also identifies the modifier set, and the characters are already defined by all characters included in a concept branch. The disadvantage is that some tree-walking is required to find which modifier is applicable to which character.
* In frequency modifiers "ProbabilityRange" was changed to "CertaintyRange".
* Frequency and Certainty modifiers changed to now contain the Range definition inside a Specification element.
* Concept trees: An organizing element "Specification" added (similar to definitions in <nop>GeneralDeclarations). The types, roles, etc. inside were reorganized and the enumerations changed (e. g., <nop>MethodHierarchy to <nop>InstrumentationHierarchy, <nop>PartHierarchy split into <nop>PartOfHierarchy and <nop>PartGeneralizationHierarchy). Also please criticize the current structure: "DesignedFor/Role=Filtering". Do the element and value names make sense to native speakers? Any better suggestions?
* _PROPOSAL_: Rename <nop>AutoAddStates to <nop>UpdateStateRefsTriggers (those states from a generic state set must be added as <nop>StateReference in Character/Categorical/States). GH: I believe it should be the other way round, i.e. instead of a state-set reference at the character, there should be a list of characters referenced at the place concept node. I have started to do this, but not yet finished! See "####" at the end of the document!
* _QUESTION_: Allow multiple mappings of fine-grained states to coarse-grained states, and make these mappings expertise-specific (part of audience definition)?
Do we need multiple state sets within a character? Broad categories and narrow categories? Currently, the mapping of states is within a single character, and the two state sets need to be detected by the application (those present minus those mapped away. Note: mapping can be indirect a-> b-> c, only c should remain.)
Do we need multiple <em>named</em> mapping definitions in the future? See StateMapping for further discussion.
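The indirect state mapping discussed in the question above (a -> b -> c, only c should remain) can be sketched in Python. The dictionary representation and the function names are hypothetical; SDD does not prescribe how an application detects the two state sets.

```python
def resolve_state(state, mapping):
    """Follow a -> b -> c chains until a state with no further mapping is reached."""
    seen = {state}
    while state in mapping:
        state = mapping[state]
        if state in seen:
            raise ValueError("cyclic state mapping at " + repr(state))
        seen.add(state)
    return state


def effective_states(states, mapping):
    """States present minus those mapped away, in their original order."""
    return [s for s in states if s not in mapping]
```

The cycle check is an addition of this sketch; the text above does not say whether cyclic mappings are possible, but an application following chains would have to guard against them.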
*Entities*
* The "connector" metaphor was not well received and not considered intuitive. As an attempt, I propose to use a proxy metaphor: The proxy object is a local object "standing-in" for the external, often asynchronously available resource on the internet. In programming this is called the "proxy-pattern". As a variation proxy objects may, however, also "stand-in" if no external object can be found and a local object (e.g. in biology: taxon name, specimen) has to be defined. Specific changes:
* <nop>ResourceConnectorBaseType changed to <nop>ProxyBaseType
* <nop>ClassNameConnectorType, <nop>ClassHierarchyConnectorType, <nop>DescribedObjectConnectorType, etc. all changed to <nop>...ProxyType
* Within the <nop>ProxyBaseType, the <nop>FreeFormDescription was changed to Label. For all internal SDD objects like characters or states, Label signifies a human readable representation, which is the intent of this data element as well.
* The ID/external object linking was strongly changed. The previous version (which was never really worked out so far) worked only if the object query could be embedded into a single URI query string, or if the old <nop>ServiceProvider referred to a web service wsdl with a single method and a single parameter. Now the <nop>ObjectLink rather than the old "ExternalID" points to the object in case of a single URI query string. The method and parameter names, and the ID-values are now given separately for web services. Furthermore, ABCD does not plan to provide a single or unified ID for collection units, but uses three separate variables that together uniquely refer to a specimen object. This is supported, but it would still be desirable to have a single ID to simplify ID comparison and distinguish ID from other parameter values that may be required to use a webservice method (but may be constant for different objects).
* In addition to URL and webservice, tentative support for DOI (digital object identifiers) and <nop>LifeScience ID (LSID) was added (including a pattern constraint defining LSIDs).
* _New_ after Berlin meeting: Sequence of Label (= <nop>FreeFormDescription in 0.9) and <nop>ObjectLink changed; Label is now first. This agrees with the use of Label throughout the other parts of the schema (characters, states, etc.).
* Entities/Classes changed to Entities/ClassNames, //Class to //ClassName. Note: in addition to the <nop>ClassName (taxon name) pointers present we may need alternative pointers into the class concepts (taxon concepts) as present in <nop>ClassHierarchy!
* "<nop>TaxonNameInSource" renamed to "<nop>ClassNameInSource". Related open issue: Combine with Location? Else we need to have a <nop>CitationBaseType without <nop>ClassNameInSource used in Glossary and Keys, and a derived type used in Descriptions!
* _New_ after Berlin meeting: <nop>ClassIdentification changed to <nop>ClassAssignment; the process will be an identification, but the result is assigning the object description to a class. The term Identification caused confusion in the discussion.
* Bob pointed out the inconsistency of declaring the standard to be independent of the biodiversity domain (thus using class/object instead of taxon/specimen) and still having taxon, taxonauthor, etc. in UBIF.FormattedText. For the time being I have removed these (they are still preserved in an unused backup version of the type, so they can easily be put back).
* Similarly, the biology-specific elements Sex and Stage were removed from <nop>ClassNameProxyType (= <nop>ClassNameConnectorType in 0.9; = the type of the proxy object defining links to external name databases).
SDD assumes that <nop>ClassNameConnectorType in the future will connect to nomenclators or species databases and these are unlikely to provide separate records for sex and stage. It would have been possible to move Sex and Stage to <nop>DescriptionBaseType, but they are required at the end of the diagnostic keys as well (sexes or stages may be keyed out separately!). Thus, a new type <nop>ClassRefWithAdditionalClassifierType has been derived from the <nop>ClassRefType and used for <nop>DescriptionBaseType/Class (which is the basis for coded as well as natural language descriptions) and <nop>StoredKeyDefType/Lead/Class. Furthermore the Object identifications may be sex/stage specific (but also many objects will have multiple stages in a single specimen...). At the moment the new <nop>ClassRefWithAdditionalClassifierType has also been used at <nop>DescribedObjectConnectorType/ClassIdentification.
* The above mentioned type <nop>ClassRefWithAdditionalClassifierType should be defined in a generalized way, avoiding biology-specific concepts like sex and stage.
* See SecondaryClassifiersProposal (and earlier: TheProblemOfSex)!
* <nop>ClassHierarchies was previously restricted to a single hierarchy, now allows multiple <nop>ClassHierarchy objects. A <nop>ClassHierarchy is the only way available in SDD to define
taxon subsets (character subsets are defined in the <nop>ConceptTrees).
* _PROPOSAL_: Add an Abbreviation element to Class and Object in Entities? Would not likely be updated by service, but may be useful or even required for reports. Update problem is related to problem with updating the Caption of <nop>MediaResources.
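The proxy metaphor introduced at the start of this section (a local object standing in for an external resource, and also usable when no external object exists) might look like this in application code. This is a minimal sketch under assumptions: the class, its attributes, the fetch callable, and the "name" field of the fetched record are all hypothetical and do not correspond to schema elements beyond Label and <nop>ObjectLink.

```python
class Proxy:
    """Local stand-in for an external resource (hypothetical sketch)."""

    def __init__(self, label, object_link=None, fetch=None):
        self.label = label              # human-readable Label, always present
        self.object_link = object_link  # URI of the external object, if any
        self._fetch = fetch             # callable that retrieves the object
        self._cached = None

    def resolve(self):
        """Return the external object, or None if it cannot be reached."""
        if self._cached is None and self.object_link and self._fetch:
            try:
                self._cached = self._fetch(self.object_link)
            except OSError:
                # Resource unavailable: the proxy keeps standing in.
                self._cached = None
        return self._cached

    def display_name(self):
        """Prefer the external name; fall back to the local Label."""
        external = self.resolve()
        return external["name"] if external else self.label
```

A proxy created without an object link behaves as a purely local object, which matches the variation described above where no external object can be found.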
*Descriptions*
* In coded and natural language descriptions a Header element was introduced to improve the overview and organization of information.
* <nop>CharacterData_BaseType/Sequence with values "terminology" or "description" was considered difficult to understand. Bob proposed to replace it with a boolean "<nop>StatesAreOrdered" which has been done.
* _PROPOSAL_: Rename <nop>CodedDescriptions to <nop>SymbolicDescriptions, see Analytical Philosophy (I only checked the Enc. Britannica, I am no expert in this!)
*Keys*
* Keys/Key was changed to <nop>IdentificationKey/IdentificationKeys. The term "key" was perceived as too general, causing misunderstanding especially for non-biologists like programmers.
Instead of the deprecated "guided key" other terms are "Pathway key" and "Stored key". "Dichotomous key" is inappropriate.
* <nop>CodedStatements in Keys (coded terminology equivalent to the natural language key statement) used to be a simple list of states. To accommodate the frequently occurring more complex statements in keys, e. g. "margin of fruitbody yellow (or orange and hairy)" -> i.e. not if only orange, or "margin of fruitbody yellow, never with denticles" -> other surface structures may be present, a boolean operator logic modeled after <nop>MathML has been added to <nop>CodedStatements inside Keys.
* Related: Should Boolean logic (not, and, or) be added to any natural language markup?
* Should guided keys be marked up using the natural language markup method rather than using a separate section, as currently proposed? Currently, the key markup was thought to follow the coded description model, but now it has been extended. Problem: Boolean logic is frequently found in the lead statements of keys, but rarely in natural language taxon descriptions. However, if Boolean logic operators are introduced to both, it would be a strong argument to use the same method in NLD and Keys, rather than having three variants.
* Alternatively, we may want to extend the <nop>CodedDescriptions and provide Boolean logic operators there as well. This would be a heavy burden on database-oriented descriptive data processing, however. Or can someone provide a simple model for how to handle arbitrary logical and/or combinations in a relatively simple database model?
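The Boolean operator logic described above for <nop>CodedStatements can be illustrated with a small expression evaluator. This is a sketch only: the nested-tuple representation, the operator names, and the state labels are assumptions chosen for the example and do not reproduce the schema or <nop>MathML syntax.

```python
def evaluate(expr, observed):
    """Evaluate a nested (operator, operands...) expression against observed states."""
    op = expr[0]
    if op == "state":
        return expr[1] in observed
    if op == "not":
        return not evaluate(expr[1], observed)
    if op == "and":
        return all(evaluate(e, observed) for e in expr[1:])
    if op == "or":
        return any(evaluate(e, observed) for e in expr[1:])
    raise ValueError("unknown operator: " + repr(op))


# "margin of fruitbody yellow (or orange and hairy)"
LEAD = ("or",
        ("state", "margin yellow"),
        ("and", ("state", "margin orange"), ("state", "margin hairy")))

# "margin of fruitbody yellow, never with denticles"
LEAD_NOT = ("and",
            ("state", "margin yellow"),
            ("not", ("state", "margin denticulate")))
```

A specimen that is only orange does not satisfy the first lead, while one that is orange and hairy does, which is the distinction a simple list of states cannot express.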
*General*
* <nop>CitationType: optional <nop>LastVerified and <nop>InvalidSince date elements added, important for volatile online publications.
* The application-specific data container (= extension mechanism to store non-SDD data) has been renamed from <nop>ApplicationData/Application to <nop>CustomExtensions/CustomExtension. Several applications may agree on common extensions, in which case the old names would not have been appropriate. The mechanism itself remains unchanged.
* Model groups like "(Rich)AnnotationGroup" containing only optional elements have themselves been made optional. This changes nothing in the validation and schema, but seems to help when using Castor data binding.
* In the <nop>LabelPlusAbbreviationRepresentationType (used frequently in Label/Representation elements) the Selector element containing media (usually images) was renamed to <nop>MediaResources. This is the same element name used generically throughout the schema.
* The name "Selectors" was intended to express that only certain media should be added here - those that are sufficiently informative and concise at the same time to be used as selectors instead of text labels. However, the use of Selector led to more confusion than clarification, and the purpose of the media is expressed through the Label context, i.e. these are labeling images etc.
* The only other media resource is Icon which remains semantically labeled.
---
<h1>Open Questions<a name="questions"/></h1>
* Class names (= taxon names referenced in descriptions or keys) may have to be audience specific! See LanguageSpecificClassNames for a discussion!
* Descriptions generalization questions, i.e. inferring descriptions from other descriptions:
* Main.PrometheusII proposes to explicitly reference descriptions that are to be included or generalized into a current description. Currently we expect in SDD this to rely on an automatic "description resource discovery" mechanism, i. e. _all_ object descriptions with the same class name are generalized, and classes are generalized to higher classes following the class (taxon) hierarchy defined in Entities.
* <nop>BioLink proposes (correct?) to explicitly flag which characters or states allow generalization, and whether from above or below.
* (= the first is explicit generalization on the object/class hierarchy level, the second explicit which characters/states are included in generalization.)
* Related: SDD probably needs a mechanism to mark the results of aggregation/generalization, computed characters, calculated statistics to document whether they are calculated / inherited or directly entered.
* Related: Do we have to document original terminology labels during data entry (i.e. in the language/audience representation used during scoring)? The audience itself may be interesting (as a code), but even more the terminology may have been changed slightly (evolution of terminology) since scoring. A record of score-time representation would increase the trust in the coded scores and allow some backtracking of problems.
* In Descriptions we call an element <nop>GeographicalScope, in <nop>ProjectDefinition basically the same thing <nop>GeographicalCoverage! However, Descriptions refers to defined objects in Resources, whereas in <nop>ProjectDefinition it is free-form text (modeled directly after <nop>DublinCore). Make this consistent and always use Resources/Geography/Location object references?
* Problem of storing calculated data and marking them as "autogenerated" (or which term to use?). Related to problem of inheriting information up and down taxonomic tree. Similar problems are already marked up in the "Origin" element in character and NLD data, and in the inherited attribute associated with character ratings. In the case of statistical measures, marking the Origin as calculated would refer to the raw data in an observation set. However, there is some discussion on the Wiki (see RepeatedObservations) whether we need a keyref to exactly one observation set or not.
* We probably need to have more than one class hierarchy and add a marker to indicate which hierarchy is formal, and which contains non-taxonomic groupings. In Brazil Kevin reported on Lucid providing a "tag" mechanism to mark "silly characters" intended only to group items like "100 worst weed species: yes/no". XPER reported a similar tag mechanism for items (instead of characters as in Lucid) to tag items for specific problems: diseases / quarantine species / disease vectors. To me both kinds of problems seem to be most appropriately handled as a non-taxonomic class hierarchy. Any proposals how to handle this? As a first step an additional attribute "IsPhylogenetic" in the class hierarchy is proposed (already done).
* Glossary:
* Do we need some method to express ranges for cardinality: How many legs may there be etc.?
* Do we need some method to associate states with properties/types?
* Should the natural language markup be brought closer to xhtml by using &lt;span class=""&gt; for markup?
* Basing character states on concept states (= reuse of state sets in multiple characters) causes a problem with ordered (ordinal scale) characters. The states in a character may be inherited from multiple concept nodes. Each of these will probably have order in the concept, but the final order can only be defined in each character. This seems unfortunate.
* Can we describe images? Is this automatically implied in reversing the association between a description and an image or not?
Images may only illustrate parts of the description.
* Can we format numeric values in reports? See DELTA *DECIMAL PLACES. How do we format sets of statistical measures in natural language or other reports? The (min-) lowerrange - central - upperrange (-max) format is not necessarily universal. Currently it is nevertheless fixed in application code and cannot be defined by users. Since many variants exist as to which individual measures are present, this can probably not be done with a <nop>TextBefore/After strategy (possible for Min, Max, but not for ranges with/without mean: "3-6", "5", "3-5-6"). Also, open ranges exist, which should be output as "at least 3 cm long" in natural language. Also: formats are audience/language-specific!
* Can we find a smart method to format related and dependent values like width x length?
* Using polymorphism for character definitions. Color as separate character type?
* Media Resource may need a location detail (if figure has multiple labeled fragments). Perhaps call this <nop>FragmentLabel?
* Media-"FragmentLabels", but even more the "Location" in Citations may be language sensitive! "table 1", "tab. 2", "figure 3" in English, "Abbildung 3" in German etc.!
---
<h1>Problems I believe cannot be solved in xml schema</h1>
(please tell me if you disagree!)
* We have a frequently used type that prevents validation of requiredness in SDD schema: Most labels use <nop>FormattedSimpleTextType, which if the element is required should always be non-empty. However, in contrast to simple text strings, <nop>FormattedSimpleTextType allows limited formatting (sup/sub etc.) and has a mixed content model. As a result, it is not possible in xml schema to require the length of it to be at least 1. This may be a case where we have to make a recommendation not to output empty elements, and a requirement that a missing element and an empty element are to be considered identical (applications should not attach different semantics to empty elements).
---
The missing element issue seems approachable by declaring things nillable and allowing xsi:nil="true" to distinguish from the missing case. This arose also in the discussion ResolvedTopicIsDiGIRadequateForBDI.SDD -- Main.BobMorris - 29 Apr 2004
I cannot follow your argument. The problem I state above is that I cannot constrain the Labels to actually contain a string, the element must be present but may contain nothing. There seems to be no mechanism in schema to prevent that. I know you warned us against mixed content models! -- Gregor Hagedorn - 3. May.
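Since the schema cannot enforce a non-empty mixed-content Label, the recommendation above (treat a missing element and an empty element identically) would have to be applied by the consuming application. The following is only a minimal sketch of such a check in Python, using the standard library. The element names Characters and Character, the sample data, and the helper name label_text are illustrative assumptions and are not taken from the schema.

```python
import xml.etree.ElementTree as ET


def label_text(parent, tag="Label"):
    """Return the visible text of parent/tag, or None if the element is
    missing or contains no visible text (both cases are treated alike)."""
    elem = parent.find(tag)
    if elem is None:
        return None
    # itertext() walks mixed content: text before, inside, and after
    # formatting children such as sup/sub.
    text = "".join(elem.itertext()).strip()
    return text or None


# Hypothetical sample data: one usable label, two empty ones, one missing.
doc = ET.fromstring(
    "<Characters>"
    "<Character><Label>Leaf area (cm<sup>2</sup>)</Label></Character>"
    "<Character><Label/></Character>"
    "<Character><Label><sub></sub> </Label></Character>"
    "<Character/>"
    "</Characters>"
)
results = [label_text(c) for c in doc]
print(results)
```

The empty element, the element holding only empty formatting markup, and the missing element all yield None, so an application built this way attaches no different semantics to them.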
---
<h3>Appendix, see discussion marked "####" above:</h3>
Current situation in 0.9:
<pre>
Concept
Concept
Concept key="123"
ConceptStates
StateDefinition key="1"
StateDefinition key="2"
StateDefinition key="3"
Char
Categorical/States/
StateReference ref="1"
StateReference ref="2"
StateReference ref="3"
AutoAddStates ref="123"
</pre>
Proposed reversal:
<pre>
Concept
Concept
Concept key="123"
ConceptStates
StateDefinition key="1"
StateDefinition key="2"
StateDefinition key="3"
UpdateStateRefsTrigger
Character ref="123"
Char key="123"
Categorical/States/
StateReference ref="1"
StateReference ref="2"
StateReference ref="3"
</pre>
One reason why this is relevant is that I believe we have to introduce a similar mechanism for <nop>StatisticalMeasures, to allow defining sets of statistical measures centrally (min-max range, a simple range/mean type like DELTA, extensions including variance and sample size, etc.).
Also, we have modifier sets as well. Can we also run them over a concept-node-based system, so that we have very similar systems for States, Measures, and modifiers? That would seem to improve the schema. Unfortunately, with modifiers I am uncertain how well this works. Modifiers almost cry for inheritance down the concept tree, something we have not done so far!
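To make the proposed reversal more concrete, here is a small sketch in Python of how an application might process it. It assumes that the trigger at a concept node means "every state defined in this concept must be present as a StateReference in each listed character". The dictionary layout and the function name update_state_refs are hypothetical and only mirror the outline in the appendix.

```python
def update_state_refs(concepts, characters):
    """For each concept, make sure every character listed in its trigger
    references all states defined in that concept (existing order is kept)."""
    for concept in concepts:
        state_keys = [state["key"] for state in concept["states"]]
        for char_key in concept["update_state_refs_trigger"]:
            refs = characters[char_key]["state_refs"]
            for key in state_keys:
                if key not in refs:
                    refs.append(key)
    return characters


# Hypothetical data mirroring the outline: Concept key="123" with three
# state definitions, and a trigger pointing at Char key="123".
concepts = [
    {
        "key": "123",
        "states": [{"key": "1"}, {"key": "2"}, {"key": "3"}],
        "update_state_refs_trigger": ["123"],
    }
]
characters = {"123": {"state_refs": ["1"]}}

update_state_refs(concepts, characters)
print(characters["123"]["state_refs"])
```

With the list of characters held at the concept node, the same loop could presumably serve centrally defined sets of statistical measures or modifiers, provided they follow the same pattern.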
---
Looking for the most recent schema file? See CurrentSchemaVersion!
-- Gregor Hagedorn - 25 May 2004
%META:FILEATTACHMENT{name="SDD_091beta3.zip" attr="h" comment="SDD 0.91 Beta 3" date="1079962204" path="C:\Data\Desktop\DESCR\TDWG-SDD\Schema\091\SDD_091beta3.zip" size="52796" user="GregorHagedorn" version="1.1"}%
%META:FILEATTACHMENT{name="SDD_091beta6.zip" attr="h" comment="SDD 0.91 Beta 6" date="1082737634" path="C:\Data\Desktop\DESCR\TDWG-SDD\Schema\091\SDD_091beta6.zip" size="57560" user="GregorHagedorn" version="1.1"}%
%META:FILEATTACHMENT{name="SDD_091beta7.zip" attr="h" comment="SDD 0.91 Beta 7" date="1083591586" path="C:\Data\Desktop\DESCR\TDWG-SDD\Schema\091\SDD_091beta7.zip" size="56869" user="GregorHagedorn" version="1.1"}%
%META:FILEATTACHMENT{name="SDD_091beta9.zip" attr="h" comment="SDD 0.91 Beta 9" date="1083773230" path="C:\Data\Desktop\DESCR\TDWG-SDD\Schema\091\SDD_091beta9.zip" size="57050" user="GregorHagedorn" version="1.1"}%
%META:FILEATTACHMENT{name="SDD_091beta10.zip" attr="h" comment="SDD 0.91 Beta 10" date="1084188580" path="C:\Data\Desktop\DESCR\TDWG-SDD\Schema\091\SDD_091beta10.zip" size="58257" user="GregorHagedorn" version="1.1"}%
%META:FILEATTACHMENT{name="SDD_091beta11.zip" attr="h" comment="Beta 11 = Final for Berlin meeting!" date="1084279915" path="C:\Data\Desktop\DESCR\TDWG-SDD\Schema\091\SDD_091beta11.zip" size="77014" user="GregorHagedorn" version="1.1"}%
%META:TOPICMOVED{by="GregorHagedorn" date="1079962486" from="SDD.SchemaChangeLog091EarlyBetaVersion" to="SDD.SchemaChangeLog091EarlyBetaVersions"}%
@
1.20
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="LeeBelbin" date="1258685129" format="1.1" reprev="1.20" version="1.20"}%
d8 1
a8 1
The current version of the BDI.SDD schema can always be found at CurrentSchemaVersion. Please do read through the report of changes, except perhaps for the few trivial at the start.
d24 1
a24 1
* Document root element changed to <nop>DataSets/<nop>DataSet collection. <nop>DataSet takes the place of the original Document. Multiple "Projects" can now be transported in one file or data stream. This is not urgent for BDI.SDD, but does not hurt either.
d30 1
a30 1
* <nop>IPRStatements is a list of various copyright, terms of use, disclaimer, acknowledgment etc. statements (new type common to BDI.SDD and ABCD schema). However, this is also present in <nop>TransformationHistory!
d43 1
a43 1
* _NOTE_: Project Definition could also be called "Envelope". This avoids "project", which is meaningful in BDI.SDD, but perhaps problematic in ABCD/taxon names?
d52 1
a52 1
* New root section "<nop>GeneralDeclarations" created for concepts not specific to BDI.SDD, but needed in the schema. Alternative names for this section are:
d75 1
a75 1
* As a result, other parts of the <nop>GlossaryEntry (Citations, <nop>RevisionData) have now been made language/audience-independent as well. This also resolves some anomalies, e.g. that <nop>RevisionData were on the audience-specific part instead of on the language-independent object as in all other cases in BDI.SDD.
d101 1
a101 1
* Within the <nop>ProxyBaseType, the <nop>FreeFormDescription was changed to Label. For all internal BDI.SDD objects like characters or states, Label signifies a human readable representation, which is the intent of this data element as well.
d110 1
a110 1
BDI.SDD assumes that <nop>ClassNameConnectorType in the future will connect to nomenclators or species databases and these are unlikely to provide separate records for sex and stage. It would have been possible to move Sex and Stage to <nop>DescriptionBaseType, but they are required at the end of the diagnostic keys as well (sexes or stages may be keyed out separately!). Thus, a new type <nop>ClassRefWithAdditionalClassifierType has been derived from the <nop>ClassRefType and used for <nop>DescriptionBaseType/Class (which is the basis for coded as well as natural language descriptions) and <nop>StoredKeyDefType/Lead/Class. Furthermore the Object identifications may be sex/stage specific (but also many objects will have multiple stages in a single specimen...). At the moment the new <nop>ClassRefWithAdditionalClassifierType has also been used at <nop>DescribedObjectConnectorType/ClassIdentification.
d113 1
a113 1
* <nop>ClassHierarchies was previously restricted to single hierarchy, now allows multiple <nop>ClassHierarchy objects. A <nop>ClassHierarchy is the only way available in BDI.SDD to define
d134 1
a134 1
* The application-specific data containers (= extension mechanism to store non-BDI.SDD data) have been renamed from <nop>ApplicationData/Application to <nop>CustomExtensions/CustomExtension. Several applications may agree on common extensions, in which case the old names would not have been appropriate. The mechanism itself remains unchanged.
d147 1
a147 1
* Main.PrometheusII proposes to explicitly reference descriptions that are to be included or generalized into a current description. Currently in BDI.SDD we expect this to rely on an automatic "description resource discovery" mechanism, i. e. _all_ object descriptions with the same class name are generalized, and classes are generalized to higher classes following the class (taxon) hierarchy defined in Entities.
d150 1
a150 1
* Related: BDI.SDD probably needs a mechanism to mark the results of aggregation/generalization, computed characters, and calculated statistics, to document whether they are calculated / inherited or directly entered.
d186 1
a186 1
* We have a frequently used type that prevents validation of requiredness in BDI.SDD schema: Most labels use <nop>FormattedSimpleTextType, which if the element is required should always be non-empty. However, in contrast to simple text strings, <nop>FormattedSimpleTextType allows limited formatting (sup/sub etc.) and has a mixed content model. As a result, it is not possible in xml schema to require the length of it to be at least 1. This may be a case where we have to make a recommendation not to output empty elements, and a requirement that a missing element and an empty element are to be considered identical (applications should not attach different semantics to empty elements).
@
1.19
log
@Added topic name via script
@
text
@d1 2
a4 2
%META:TOPICINFO{author="GregorHagedorn" date="1127927444" format="1.0" version="1.18"}%
%META:TOPICPARENT{name="SchemaChangeLog"}%
d8 1
a8 1
The current version of the SDD schema can always be found at CurrentSchemaVersion. Please do read through the report of changes, except perhaps for the few trivial at the start.
d16 3
a18 3
* audiencekey in <nop>ProjectDefinition/Audiences/Audience was specified to have a pattern in the documentation, but the pattern was not defined in the schema; a regular expression pattern has been added to schema 0.91.
* <nop>RevisionData were required in Description, Keys, and <nop>GlossaryEntries, now made optional.
* The Keys collection could be missing, or empty (0 to unlimited Key objects), now changed to 1 to unlimited Key objects.
d23 3
a25 3
* In an attempt to converge with ABCD:
* Document root element changed to <nop>DataSets/<nop>DataSet collection. <nop>DataSet takes the place of the original Document. Multiple "Projects" can now be transported in one file or data stream. This is not urgent for SDD, but does not hurt either.
* <nop>GenerationMetadata changed to <nop>TransformationHistory, conceived as a collection of at least one, possibly multiple Transformation elements. Alternative names: <nop>ConversionHistory, <nop>UBIF.DerivationHistory, <nop>HistoryMetadata, <nop>ContentHistoryMetadata, or <nop>DataHistoryMetadata.
d28 22
a49 22
* Element itself changed to <nop>ProjectMetadata
* <nop>AudienceSpecificData/Representation split into <nop>Description/Representation and <nop>IPRStatements/Representation.
* <nop>IPRStatements is a list of various copyright, terms of use, disclaimer, acknowledgment etc. statements (new type common to SDD and ABCD schema). However, this is also present in <nop>TransformationHistory!
* <nop>ProjectDefinition/HistoryWebAddress dropped. Annotation was: "@@@@ To be discussed. The idea is that a project may point to a web resource that informs about details about the history of the data (previous versions or a detailed log of changes)." Unless somebody needs it now, I propose that this should be an addition in a later version rather than included in the first release.
* <nop>ProjectDefinition/Icon moved to new <nop>ProjectDefinition/Description/Representation, thus making it audience specific.
Icons (or logos) are not necessarily language independent since they may include text!
* <nop>ProjectDefinition/WebAddress moved as well, different audiences/languages may be referred to different URIs!
* _New_ after Berlin meeting: attempt to use across standards (see UBIF.SchemaDiscussion), therefore audience-dependent project Description and IPR-Statements changed to language dependent. Language should simplify the adoption of common framework elements for all TDWG/GBIF standards.
* _New_ after Berlin meeting: Version structure revised.
* Version/PublicationDate changed to <nop>VersionReleaseDate to avoid possible confusion with <nop>LastRevision or data generation date in online situations.
* A Modifier element added (for beta, rel. candidate, etc.).
* Increment removed (because considered application-internal management mechanism, no need for interoperability).
* Major and Minor left as integers to improve interoperability and comparability (nobody commented on the proposal "change version to string" posed in previous version of the change log.)
* _New_ after Berlin meeting: The narrative (unconstrained text) elements <nop>GeographicCoverage and <nop>TaxonomicCoverage in <nop>ProjectDefinition|Projectmetadata/Description/Representation combined to Coverage.
Constrained <nop>ClassScope added, __OtherScope needs a proposal how to link it to other vocabularies. <nop>SourcePublication changed from a single to possibly several, and considered a scoping mechanism as well.
* _NOTE_: Project Definition could also be called "Envelope". This avoids "project", which is meaningful in SDD, but perhaps problematic in ABCD/taxon names?
* _QUESTION_: Can project definition be merged with transformation history?
* _PROPOSAL_: Need documentation of quality control methods and standards, e. g.
* <nop>QualityControlStandard: Name (and version, if applicable) of the published or internally documented quality control standard used.
* <nop>QualityControlDescription: Free-form description of methods used to ensure the quality of the descriptive data. In the absence of a standard, this should be a short description of the quality control procedures taken.
* _QUESTION_: <nop>ProjectDefinition/RevisionData/InitiationDate is xml:dateTime and required, which may cause problems in legacy projects. See discussion under InitiationDateForImportedLegacyData. The proposal makes sense in the context of project definition.
However, <nop>RevisionDataType is also used in several other contexts (single descriptions, glossary, characters, etc.) and the proposal does not make sense there. Do we need two slightly derived types? Has anybody a better idea?
d52 15
a66 15
* New root section "<nop>GeneralDeclarations" created for concepts not specific to SDD, but needed in the schema. Alternative names for this section are:
<nop>GeneralDefinitions, <nop>OverarchingIssues/Functions, <nop>CrosscuttingIssues/Functions, <nop>GeneralTerminology, <nop>GeneralTerms, <nop>GeneralVocabulary (the latter three do not
cover the possible inclusion of "language rules"). The following elements moved there:
* <nop>ProjectDefinition/Audiences
* Terminology/<nop>CodingStatusValues
* Terminology/<nop>UnivariateStatisticalMeasures (was <nop>StatisticalMeasures)
* (Newly created:) Global definitions for <nop>MeasurementUnits (Character definition Numerical/<nop>MeasurementUnit is consequently changed to a ref type). The optional generalization makes it possible to define relations between units such that two size measures, one expressed in mm and the other in cm, become comparable.
* In each of <nop>CodingStatus, <nop>UnivariateStatisticalMeasure, <nop>MeasurementUnit, the "Generalization" element
(containing the machine-readable partial semantics of an object) was renamed to Specification.
* The Audience definitions lang and expertiselevel, previously defined as attributes, have been reorganized to follow the pattern of Label + Specification.
* The defaultaudience attribute present at Audience was only appropriately placed because all audience definitions were considered part of the project definition.
Now it is separated and moved to <nop>ProjectDefinition/DefaultAudience.
* <nop>StatisticalMeasures renamed to <nop>UnivariateStatisticalMeasures (compare Bob's comment on TWIKI about ClosedTopicMultivariateStatistics).
* Related: the fact that Char. def. Numerical/StatisticalMeasures had both a ref and a key confused several reviewers. To clarify, the key has now been renamed from ref to <nop>GeneralDeclarationRef and both this and the key on <nop>GeneralDefinitions/<nop>UnivariateStatisticalMeasures/<nop>UnivariateStatisticalMeasure is typed as <nop>StatisticalMeasureKeyValue.
* Element "Dimensionless" added to Specification of <nop>UnivariateStatisticalMeasures (answers whether the measurement unit applies to a statistic or not).
d69 27
a95 27
* Sequence of sections changed, Terminology section placed after Entities and Resources sections.
* Terminology/Glossary (= ontology definitions) strongly changed
* Multiple new ontological relations between terms added and subsumed under a new Ontology element. This urgently needs review!
* <nop>SensuLabel and <nop>KindOfTerm added. The first makes it possible to distinguish between multiple definitions of a term (Term does not have to be unique, but Term + <nop>SensuLabel has to be!), the latter categorizes terms (is that doubtful??).
* With the introduction of <nop>SensuLabel, Term is no longer a keyref in the ontological definitions (synonym, antonym, etc.). Replaced with <nop>TermListType = List of <nop>GlossaryEntryRefType.
* Ontology now refers to <nop>GlossaryEntry keys rather than Term strings in a specific language. This is partly necessitated by the introduction of a <nop>SensuLabel.
* As a result, other parts of the <nop>GlossaryEntry (Citations, <nop>RevisionData) have now been made language/audience-independent as well. This also resolves some anomalies, e.g. that <nop>RevisionData were on the audience-specific part instead of on the language-independent object as in all other cases in SDD.
* <nop>ExternalReference changed to <nop>ExternalDefinitionURI
* <nop>CharacterDefType
* Label changed from <nop>LabelPlusAbbreviationType to <nop>SimpleLabelType. This simplifies the model: Only a single label can be defined at the character level, all extended concepts (abbreviations, export tokens, images) are definable only in concept trees. Since concept trees require a terminal node for each character, the same expressiveness is maintained.
* Type changed to <nop>MeasurementScale, value list completed to include "ratio".
* Section Assumptions added to the character definition, <nop>MeasurementScale moved there
* Categorical and Numerical are tentatively changed to a choice rather than co-occurring. This needs discussion!
* PlausibilityRange added to numeric character definition. Applies to all values and statistics, except those that are dimensionless (like variance).
* <nop>GenericStates renamed to <nop>ConceptStates (= states that are present at nodes in the concept tree; this is the only place where <nop>GenericStates
was present). "Generic" was considered to be confusing since for biologists it may be understood as referring to states describing a Genus.
* "Probability modifiers" have been renamed back to "Certainty modifiers" (they were previously called "Uncertainty modifiers" before changing to "Probability"). As
already discussed in Brazil (but later forgotten), Probability is ambiguous since low occurrence frequency of a state also results in a low probability that a given object has a given character state.
* Terminology/Modifiers/Sets (intended to define reusable modifier sets which would then be associated with characters) and <nop>CharacterDefType/ModifierSets were both replaced with a new Concept/ApplicableModifiers element in the concept trees. For the modifier sets a key and a label had to be defined so they could be selected in each character through a keyref. The new solution avoids both the label and the key/keyref mechanism: The concept label also identifies the modifier set, and the characters are already defined by all characters included in a concept branch. The disadvantage is that some tree-walking is required to find which modifier is applicable to which character.
* In frequency modifiers "ProbabilityRange" was changed to "CertaintyRange".
* Frequency and Certainty modifiers changed to now contain the Range definition inside a Specification element.
* Concept trees: An organizing element "Specification" added (similar to definitions in <nop>GeneralDeclarations). The types, roles, etc. inside were reorganized and the enumerations changed (e. g., <nop>MethodHierarchy to <nop>InstrumentationHierarchy, <nop>PartHierarchy split into <nop>PartOfHierarchy and <nop>PartGeneralizationHierarchy). Also please criticize the current structure: "DesignedFor/Role=Filtering". Do the element and value names make sense to native speakers? Any better suggestions?
* _PROPOSAL_: Rename <nop>AutoAddStates to <nop>UpdateStateRefsTriggers (those states from a generic state set must be present as <nop>StateReference in Character/Categorical/States). GH: I believe it should be the other way round, i.e. instead of a state-set reference at the character, there should be a list of characters referenced at the concept node. I have started to do this, but not yet finished! See "####" at the end of the document!
* _QUESTION_: Allow multiple mappings of fine-grained states to coarse-grained states, and make these mappings expertise-specific (part of audience definition)?
Do we need multiple state sets within a character? Broad categories and narrow categories? Currently mapping of state is within a single character, and the two state sets need to be detected by application (those present minus those mapped away. Note: mapping can be indirect a-> b-> c, only c should remain.)
Do we need multiple <em>named</em> mapping definitions in the future? See StateMapping for further discussion.
d98 17
a114 17
* The "connector" metaphor was not well received and not considered intuitive. As an attempt, I propose to use a proxy metaphor: The proxy object is a local object "standing-in" for the external, often asynchronously available resource on the internet. In programming this is called the "proxy-pattern". As a variation proxy objects may, however, also "stand-in" if no external object can be found and a local object (e.g. in biology: taxon name, specimen) has to be defined. Specific changes:
* <nop>ResourceConnectorBaseType changed to <nop>ProxyBaseType
* <nop>ClassNameConnectorType, <nop>ClassHierarchyConnectorType, <nop>DescribedObjectConnectorType, etc. all changed to <nop>...ProxyType
* Within the <nop>ProxyBaseType, the <nop>FreeFormDescription was changed to Label. For all internal SDD objects like characters or states, Label signifies a human readable representation, which is the intent of this data element as well.
* The ID/external object linking was strongly changed. The previous version (which was never really worked out so far) worked only if the object query could be embedded into a single URI query string, or if the old <nop>ServiceProvider referred to a web service wsdl with a single method and a single parameter. Now the <nop>ObjectLink rather than the old "ExternalID" points to the object in case of a single URI query string. The method and parameter names, and the ID-values are now given separately for web services. Furthermore, ABCD does not plan to provide a single or unified ID for collection units, but uses three separate variables that together uniquely refer to a specimen object. This is supported, but it would still be desirable to have a single ID to simplify ID comparison and distinguish ID from other parameter values that may be required to use a webservice method (but may be constant for different objects).
* In addition to URL and webservice, tentative support for DOI (digital object identifiers) and <nop>LifeScience ID (LSID) was added (including a pattern constraint defining LSIDs).
* _New_ after Berlin meeting: Sequence of Label (= <nop>FreeFormDescription in 0.9) and <nop>ObjectLink changed; Label is now first. This agrees with the use of Label throughout the other parts of the schema (characters, states, etc.).
* Entities/Classes changed to Entities/ClassNames, //Class to //ClassName. Note: in addition to the <nop>ClassName (taxon name) pointers present we may need alternative pointers into the class concepts (taxon concepts) as present in <nop>ClassHierarchy!
* "<nop>TaxonNameInSource" renamed to "<nop>ClassNameInSource". Related open issue: Combine with Location? Else we need to have a <nop>CitationBaseType without <nop>ClassNameInSource used in Glossary and Keys, and a derived type used in Descriptions!
* _New_ after Berlin meeting: <nop>ClassIdentification changed to <nop>ClassAssignment; the process will be an identification, but the result is assigning the object description to a class. The term Identification caused confusion in the discussion.
* Bob pointed out the inconsistency of declaring the standard to be independent of the biodiversity domain (thus using class/object instead of taxon/specimen) and still having taxon, taxonauthor, etc. in UBIF.FormattedText. For the time being I have removed these (they are still preserved in an unused backup version of the type, so they can easily be put back).
* Similarly, the biology-specific elements Sex and Stage were removed from <nop>ClassNameProxyType (= <nop>ClassNameConnectorType in 0.9; = the type of the proxy object defining links to external name databases).
SDD assumes that <nop>ClassNameConnectorType in the future will connect to nomenclators or species databases and these are unlikely to provide separate records for sex and stage. It would have been possible to move Sex and Stage to <nop>DescriptionBaseType, but they are required at the end of the diagnostic keys as well (sexes or stages may be keyed out separately!). Thus, a new type <nop>ClassRefWithAdditionalClassifierType has been derived from the <nop>ClassRefType and used for <nop>DescriptionBaseType/Class (which is the basis for coded as well as natural language descriptions) and <nop>StoredKeyDefType/Lead/Class. Furthermore the Object identifications may be sex/stage specific (but also many objects will have multiple stages in a single specimen...). At the moment the new <nop>ClassRefWithAdditionalClassifierType has also been used at <nop>DescribedObjectConnectorType/ClassIdentification.
* The above mentioned type <nop>ClassRefWithAdditionalClassifierType should be defined generalized, avoiding biology-specific concepts like sex and stage.
* See SecondaryClassifiersProposal (and earlier: TheProblemOfSex)!
* <nop>ClassHierarchies was previously restricted to single hierarchy, now allows multiple <nop>ClassHierarchy objects. A <nop>ClassHierarchy is the only way available in SDD to define
taxon subsets (character subsets are defined in the <nop>ConceptTrees).
d116 1
a116 1
* _PROPOSAL_: Add an Abbreviation element to Class and Object in Entities? Would not likely be updated by service, but may be useful or even required for reports. Update problem is related to problem with updating the Caption of <nop>MediaResources.
d119 2
a120 2
* In coded and natural language descriptions a Header element was introduced to improve the overview and organization of information.
* <nop>CharacterData_BaseType/Sequence with values "terminology" or "description" was considered difficult to understand. Bob proposed to replace it with a boolean "<nop>StatesAreOrdered" which has been done.
d122 1
a122 1
* _PROPOSAL_: Rename <nop>CodedDescriptions to <nop>SymbolicDescriptions, see Analytical Philosophy (I only checked the Enc. Britannica, I am no expert in this!)
d125 6
a130 6
* Keys/Key was changed to <nop>IdentificationKey/IdentificationKeys. The term "key" was perceived as too general, causing misunderstanding especially among non-biologists like programmers.
Instead of the deprecated "guided key" other terms are "Pathway key" and "Stored key". "Dichotomous key" is inappropriate.
* <nop>CodedStatements in Keys (coded terminology equivalent to the natural language key statement) used to be a simple list of states. To accommodate the frequently occurring more complex statements in keys, e. g. "margin of fruitbody yellow (or orange and hairy)" -> i.e. not if only orange, or "margin of fruitbody yellow, never with denticles" -> other surface structures may be present, a boolean operator logic modeled after <nop>MathML has been added to <nop>CodedStatements inside Keys.
* Related: Should Boolean logic (not, and, or) be added to any natural language markup?
* Should guided keys be marked up using the natural language markup method rather than using a separate section, as currently proposed? Currently, the key markup was thought to follow the coded description model, but now it has been extended. Problem: Boolean logic is frequently found in the lead statements of keys, but rarely in natural language taxon descriptions. However, if Boolean logic operators are introduced to both, it would be a strong argument to use the same method in NLD and Keys, rather than having three variants.
* Alternatively, we may want to extend the <nop>CodedDescriptions and provide Boolean logic operators there as well. This would be a heavy burden on database-oriented descriptive data processing, however. Or can someone provide a simple model how to handle arbitrary logical and/or combinations in a relatively simple database model?
d133 6
a138 6
* <nop>CitationType: optional <nop>LastVerified and <nop>InvalidSince date elements added, important for volatile online publications.
* The application-specific data containers (= extension mechanism to store non-SDD data) have been renamed from <nop>ApplicationData/Application to <nop>CustomExtensions/CustomExtension. Several applications may agree on common extensions, in which case the old names would not have been appropriate. The mechanism itself remains unchanged.
* Model groups like "(Rich)AnnotationGroup" containing only optional elements have themselves been made optional. This changes nothing in the validation and schema, but seems to help when using Castor data binding.
* In the <nop>LabelPlusAbbreviationRepresentationType (used frequently in Label/Representation elements) the Selector element containing media (usually images) was renamed to <nop>MediaResources. This is the same element name used generically throughout the schema.
* The name "Selectors" was intended to express that only certain media should be added here - those that are sufficiently informative and concise at the same time to be used as selectors instead of text labels. However, the use of Selector led to more confusion than clarification, and the purpose of the media is expressed through the Label context, i.e. these are labeling images etc.
* The only other media resource is Icon which remains semantically labeled.
d144 1
a144 1
* Class names (= taxon names referenced in descriptions or keys) may have to be audience specific! See LanguageSpecificClassNames for a discussion!
d146 6
a151 6
* Descriptions generalization questions, i.e. inferring descriptions from other descriptions:
* Main.PrometheusII proposes to explicitly reference descriptions that are to be included or generalized into a current description. Currently in SDD we expect this to rely on an automatic "description resource discovery" mechanism, i. e. _all_ object descriptions with the same class name are generalized, and classes are generalized to higher classes following the class (taxon) hierarchy defined in Entities.
* <nop>BioLink proposes (correct?) to explicitly flag which characters or states allow generalization, and whether from above or below.
* (= the first is explicit generalization on the object/class hierarchy level, the second explicit which characters/states are included in generalization.)
* Related: SDD probably needs a mechanism to mark the results of aggregation/generalization, computed characters, calculated statistics to document whether they are calculated / inherited or directly entered.
* Related: Do we have to document original terminology labels during data entry (i.e. in the language/audience representation used during scoring)? The audience itself may be interesting (as a code), but even more the terminology may have been changed slightly (evolution of terminology) since scoring. A record of score-time representation would increase the trust in the coded scores and allow some backtracking of problems.
d153 1
a153 1
* In Descriptions we call an element <nop>GeographicalScope, in <nop>ProjectDefinition basically the same thing <nop>GeographicalCoverage! However, Descriptions refers to defined objects in Resources, whereas in <nop>ProjectDefinition it is free-form text (modeled directly after <nop>DublinCore). Make this consistent and always use Resources/Geography/Location object references?
d155 1
a155 1
* Problem of storing calculated data and marking them as "autogenerated" (or which term to use?). Related to problem of inheriting information up and down taxonomic tree. Similar problems are already marked up in the "Origin" element in character and NLD data, and in the inherited attribute associated with character ratings. In the case of statistical measures, marking the Origin as calculated would refer to the raw data in an observation set. However, there is some discussion on the Wiki (see RepeatedObservations) whether we need a keyref to exactly one observation set or not.
d157 1
a157 1
* We probably need to have more than one class hierarchy and add a marker to indicate which hierarchy is formal, and which contains non-taxonomic groupings. In Brazil Kevin reported on Lucid providing a "tag" mechanism to mark "silly characters" intended only to group items like "100 worst weed species: yes/no". XPER reported a similar tag mechanism for items (instead of characters as in Lucid) to tag items for specific problems: diseases / quarantine species / disease vectors. To me both kinds of problems seem to be most appropriately handled as a non-taxonomic class hierarchy. Any proposals how to handle this? As a first step an additional attribute "IsPhylogenetic" in the class hierarchy is proposed (already done).
d159 3
a161 3
* Glossary:
* Do we need some method to express ranges for cardinality: How many legs may there be etc.?
* Do we need some method to associate states with properties/types?
d163 1
a163 1
* Should the natural language markup be brought closer to xhtml by using <span class=""> for markup?
d165 1
a165 1
* Basing character states on concept states (= reuse of state sets in multiple characters) causes problems with ordered (ordinal scale) characters. The states in a character may be inherited from multiple concept nodes. Each of these will probably have an order in the concept, but the final order can only be defined in each character. This seems unfortunate.
d167 2
a168 2
* Can we describe images? Is this automatically implied in reversing the association between a description and an image or not?
Images may only illustrate parts of the description.
d170 1
a170 1
* Can we format numeric values in reports? See DELTA *DECIMAL PLACES. How do we format sets of statistical measures in natural language or other reports? The (min-) lowerrange - central - upperrange (-max) format is not necessarily universal. Currently it is nevertheless fixed in application code and cannot be defined by users. Since many variants exist as to which individual measures are present, this can probably not be done with a <nop>TextBefore/After strategy (possible for Min, Max, but not for ranges with/without mean, "3-6", "5", "3-5-6"). Also, open ranges exist, which should be output as "at least 3 cm long" in natural language. Also: formats are audience/language-specific!
d172 1
a172 1
* Can we find a smart method to format related and dependent values like width x length?
d174 1
a174 1
* Using polymorphism for character definitions. Color as separate character type?
d176 1
a176 1
* Media Resource may need a location detail (if figure has multiple labeled fragments). Perhaps call this <nop>FragmentLabel?
d178 1
a178 1
* Media-"FragmentLabels", but even more the "Location" in Citations may be language sensitive! "table 1", "tab. 2", "figure 3" in English, "Abbildung 3" in German etc.!
d186 1
a186 1
* We have a frequently used type that prevents validation of requiredness in SDD schema: Most labels use <nop>FormattedSimpleTextType, which if the element is required should always be non-empty. However, in contrast to simple text strings, <nop>FormattedSimpleTextType allows limited formatting (sup/sub etc.) and has a mixed content model. As a result, it is not possible in xml schema to require the length of it to be at least 1. This may be a case where we have to make a recommendation not to output empty elements, and a requirement that a missing element and an empty element are to be considered identical (applications should not attach different semantics to empty elements).
d188 1
a188 1
The missing element issue seems approachable by declaring things nillable and allowing xsi:nil="true" to distinguish from the missing case. This arose also in the discussion ResolvedTopicIsDiGIRadequateForSDD -- Main.BobMorris - 29 Apr 2004
d199 5
a203 5
Concept key="123"
ConceptStates
StateDefinition key="1"
StateDefinition key="2"
StateDefinition key="3"
d207 4
a210 4
StateReference ref="1"
StateReference ref="2"
StateReference ref="3"
AutoAddStates ref="123"
d217 7
a223 7
Concept key="123"
ConceptStates
StateDefinition key="1"
StateDefinition key="2"
StateDefinition key="3"
UpdateStateRefsTrigger
Character ref="123"
d227 3
a229 3
StateReference ref="1"
StateReference ref="2"
StateReference ref="3"
@
1.18
log
@none
@
text
@d1 2
@
1.17
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1097054118" format="1.0" version="1.17"}%
d3 237
a239 236
<h1>Changes in 0.91 beta 15 (relative to the 0.9 Dec. 1. 2003 release)</h1>
<strong>This is an updated version containing most of the minor changes discussed at the [[SDD2004Berlin][meeting in Berlin]]. Some changes are still pending.
The current version of the SDD schema can always be found at CurrentSchemaVersion. Please do read through the report of changes, except perhaps for the few trivial ones at the start.
Please take a look at the schema to verify that you agree with the changes and that they make sense to you.</strong>
<strong>Note:</strong> I have tried to document changes, but I cannot guarantee that everything is properly documented.
In fact, since <nop>GenerationMetadata and <nop>ProjectDefinition are heavily changed in an attempt to find common
ground between the various GBIF standards (current discussion involves only ABCD so far), I have given up on documenting all detailed changes therein (but some are commented).
<h3>Trivial omissions that were present in 0.9, corrected in 0.91<a name="trivial"/></h3>
* audiencekey in <nop>ProjectDefinition/Audiences/Audience was specified to have a pattern in the documentation, but the pattern was not defined in the schema; a regular expression pattern was added to schema 0.91.
* <nop>RevisionData were required in Description, Keys, and <nop>GlossaryEntries, now made optional.
* The Keys collection could be missing, or empty (0 to unlimited Key objects), now changed to 1 to unlimited Key objects.
<h3>Non-trivial changes enacted plus proposals not enacted<a name="nontrivial"/></h3>
*Root*
* In an attempt to converge with ABCD:
* Document root element changed to <nop>DataSets/<nop>DataSet collection. <nop>DataSet takes the place of the original Document. Multiple "Projects" can now be transported in one file or data stream. This is not urgent for SDD, but does not hurt either.
* <nop>GenerationMetadata changed to <nop>TransformationHistory, conceived as a collection of at least one, possibly multiple Transformation elements. Alternative names: <nop>ConversionHistory, <nop>UBIF.DerivationHistory, <nop>HistoryMetadata, <nop>ContentHistoryMetadata, or <nop>DataHistoryMetadata.
*<nop>ProjectDefinition*
* Element itself changed to <nop>ProjectMetadata
* <nop>AudienceSpecificData/Representation split into <nop>Description/Representation and <nop>IPRStatements/Representation.
* <nop>IPRStatements is a list of various copyright, terms of use, disclaimer, acknowledgment etc. statements (new type common to SDD and ABCD schema). However, this is also present in <nop>TransformationHistory!
* <nop>ProjectDefinition/HistoryWebAddress dropped. Annotation was: "@@@@ To be discussed. The idea is that a project may point to a web resource that informs about details about the history of the data (previous versions or a detailed log of changes)." Unless somebody needs it now, I propose that this should be an addition in a later version rather than included in the first release.
* <nop>ProjectDefinition/Icon moved to new <nop>ProjectDefinition/Description/Representation, thus making it audience specific.
Icons (or logos) are not necessarily language independent since they may include text!
* <nop>ProjectDefinition/WebAddress moved as well, different audiences/languages may be referred to different URIs!
* _New_ after Berlin meeting: attempt to use across standards (see UBIF.SchemaDiscussion), therefore audience-dependent project Description and IPR-Statements changed to language dependent. Language should simplify the adoption of common framework elements for all TDWG/GBIF standards.
* _New_ after Berlin meeting: Version structure revised.
* Version/PublicationDate changed to <nop>VersionReleaseDate to avoid possible confusion with <nop>LastRevision or data generation date in online situations.
* A Modifier element added (for beta, rel. candidate, etc.).
* Increment removed (because considered application-internal management mechanism, no need for interoperability).
* Major and Minor left as integers to improve interoperability and comparability (nobody commented on the proposal "change version to string" posed in previous version of the change log.)
* _New_ after Berlin meeting: The narrative (unconstrained text) elements <nop>GeographicCoverage and <nop>TaxonomicCoverage in <nop>ProjectDefinition|Projectmetadata/Description/Representation combined to Coverage.
Constrained <nop>ClassScope added, __OtherScope needs a proposal how to link it to other vocabularies. <nop>SourcePublication changed from a single to possibly several, and considered a scoping mechanism as well.
* _NOTE_: Project Definition could also be called "Envelope". This avoids "project", which is meaningful in SDD, but perhaps problematic in ABCD/taxon names?
* _QUESTION_: Can project definition be merged with transformation history?
* _PROPOSAL_: Need documentation of quality control methods and standards, e. g.
* <nop>QualityControlStandard: Name (and version, if applicable) of the published or internally documented quality control standard used.
* <nop>QualityControlDescription: Free-form description of methods used to ensure the quality of the descriptive data. In the absence of a standard, this should be a short description of the quality control procedures taken.
* _QUESTION_: <nop>ProjectDefinition/RevisionData/InitiationDate is xml:dateTime and required, which may cause problems in legacy projects. See discussion under InitiationDateForImportedLegacyData. The proposal makes sense in the context of project definition.
However, <nop>RevisionDataType is also used in several other contexts (single descriptions, glossary, characters, etc.) and the proposal does not make sense there. Do we need two slightly derived types? Has anybody a better idea?
*GeneralDeclarations*
* New root section "<nop>GeneralDeclarations" created for concepts not specific to SDD, but needed in the schema. Alternative names for this section are:
<nop>GeneralDefinitions, <nop>OverarchingIssues/Functions, <nop>CrosscuttingIssues/Functions, <nop>GeneralTerminology, <nop>GeneralTerms, <nop>GeneralVocabulary (the latter three do not
cover the possible inclusion of "language rules"). The following elements moved there:
* <nop>ProjectDefinition/Audiences
* Terminology/<nop>CodingStatusValues
* Terminology/<nop>UnivariateStatisticalMeasures (was <nop>StatisticalMeasures)
* (Newly created:) Global definitions for <nop>MeasurementUnits (Character definition Numerical/<nop>MeasurementUnit is consequently changed to a ref type). The optional generalization allows defining relations between units such that two size measures, one expressed in mm the other in cm, become comparable.
* In each of <nop>CodingStatus, <nop>UnivariateStatisticalMeasure, <nop>MeasurementUnit, the "Generalization" element
(containing the machine-readable partial semantics of an object) was renamed to Specification.
* The Audience definitions lang and expertiselevel, previously defined as attributes, have been reorganized to follow the pattern of Label + Specification.
* The defaultaudience attribute present at Audience was only appropriately placed because all audience definitions were considered part of the project definition.
Now it is separated and moved to <nop>ProjectDefinition/DefaultAudience.
* <nop>StatisticalMeasures renamed to <nop>UnivariateStatisticalMeasures (compare Bob's comment on TWIKI about ClosedTopicMultivariateStatistics).
* Related: the fact that Char. def. Numerical/StatisticalMeasures had both a ref and a key confused several reviewers. To clarify, the key has now been renamed from ref to <nop>GeneralDeclarationRef and both this and the key on <nop>GeneralDefinitions/<nop>UnivariateStatisticalMeasures/<nop>UnivariateStatisticalMeasure are typed as <nop>StatisticalMeasureKeyValue.
* Element "Dimensionless" added to Specification of <nop>UnivariateStatisticalMeasures (answers whether the measurement unit applies to a statistic or not).
*Terminology*
* Sequence of sections changed, Terminology section placed after Entities and Resources sections.
* Terminology/Glossary (= ontology definitions) strongly changed
* Multiple new ontological relations between terms added and subsumed under a new Ontology element. This urgently needs review!
* <nop>SensuLabel and <nop>KindOfTerm added. The first allows distinguishing between multiple definitions of a term (Term does not have to be unique, but Term + <nop>SensuLabel has to be!), the latter categorizes terms (is that doubtful??).
* With the introduction of <nop>SensuLabel, Term is no longer a keyref in the ontological definitions (synonym, antonym, etc.). Replaced with <nop>TermListType = List of <nop>GlossaryEntryRefType.
* Ontology now refers to <nop>GlossaryEntry keys rather than Term strings in a specific language. This is partly necessitated by the introduction of a <nop>SensuLabel.
* As a result, other parts of the <nop>GlossaryEntry (Citations, <nop>RevisionData) have now been made language/audience-independent as well. This also resolves some anomalies, e.g. that <nop>RevisionData were on the audience-specific part instead of on the language-independent object as in all other cases in SDD.
* <nop>ExternalReference changed to <nop>ExternalDefinitionURI
* <nop>CharacterDefType
* Label changed from <nop>LabelPlusAbbreviationType to <nop>SimpleLabelType. This simplifies the model: Only a single label can be defined at the character level, all extended concepts (abbreviations, export tokens, images) are definable only in concept trees. Since concept trees require a terminal node for each character, the same expressiveness is maintained.
* Type changed to <nop>MeasurementScale, value list completed to include "ratio".
* Section Assumptions added to the character definition, <nop>MeasurementScale moved there
* Categorical and Numerical are tentatively changed to a choice rather than co-occurring. This needs discussion!
* PlausibilityRange added to numeric character definition. Applies to all values and statistics, except those that are dimensionless (like variance).
* <nop>ResolvedTopicGenericStates renamed to <nop>ConceptStates (= states that are present at nodes in the concept tree; this is the only place where <nop>ResolvedTopicGenericStates
was present). "Generic" was considered to be confusing since for biologists it may be understood as referring to states describing a Genus.
* "Probability modifiers" have been renamed back to "Certainty modifiers" (they were previously called "Uncertainty modifiers" before changing to "Probability"). As
already discussed in Brazil (but later forgotten), Probability is ambiguous since low occurrence frequency of a state also results in a low probability that a given object has a given character state.
* Terminology/Modifiers/Sets (intended to define reusable modifier sets which would then be associated with characters) and <nop>CharacterDefType/ModifierSets were both replaced with a new Concept/ApplicableModifiers element in the concept trees. For the modifier sets a key and a label had to be defined so they could be selected in each character through a keyref. The new solution avoids both the label and the key/keyref mechanism: The concept label also identifies the modifier set, and the characters are already defined by all characters included in a concept branch. The disadvantage is that some tree-walking is required to find which modifier is applicable to which character.
* In frequency modifiers "ProbabilityRange" was changed to "CertaintyRange".
* Frequency and Certainty modifiers changed to now contain the Range definition inside a Specification element.
* Concept trees: An organizing element "Specification" added (similar to definitions in <nop>GeneralDeclarations). The types, roles, etc. inside were reorganized and the enumerations changed (e. g., <nop>MethodHierarchy to <nop>InstrumentationHierarchy, <nop>PartHierarchy split into <nop>PartOfHierarchy and <nop>PartGeneralizationHierarchy). Also please criticize the current structure: "DesignedFor/Role=Filtering". Do the element and value names make sense to native speakers? Any better suggestions?
* _PROPOSAL_: Rename <nop>AutoAddStates to <nop>UpdateStateRefsTriggers (those states from a generic state set must be present as <nop>StateReference in Character/Categorical/States). GH: I believe it should be the other way round, i.e. instead of a state-set reference at the character, there should be a list of characters referenced at the concept node. I have started to do this, but not yet finished! See "####" at the end of the document!
* _QUESTION_: Allow multiple mappings of fine-grained states to coarse-grained states, and make these mappings expertise-specific (part of audience definition)?
Do we need multiple state sets within a character? Broad categories and narrow categories? Currently mapping of states is within a single character, and the two state sets need to be detected by the application (those present minus those mapped away. Note: mapping can be indirect a-> b-> c, only c should remain.)
Do we need multiple <em>named</em> mapping definitions in the future? See StateMapping for further discussion.
*Entities*
* The "connector" metaphor was not well received and not considered intuitive. As an attempt, I propose to use a proxy metaphor: The proxy object is a local object "standing-in" for the external, often asynchronously available resource on the internet. In programming this is called the "proxy-pattern". As a variation proxy objects may, however, also "stand-in" if no external object can be found and a local object (e.g. in biology: taxon name, specimen) has to be defined. Specific changes:
* <nop>ResourceConnectorBaseType changed to <nop>ProxyBaseType
* <nop>ClassNameConnectorType, <nop>ClassHierarchyConnectorType, <nop>DescribedObjectConnectorType, etc. all changed to <nop>...ProxyType
* Within the <nop>ProxyBaseType, the <nop>FreeFormDescription was changed to Label. For all internal SDD objects like characters or states, Label signifies a human readable representation, which is the intent of this data element as well.
* The ID/external object linking was strongly changed. The previous version (which was never really worked out so far) worked only if the object query could be embedded into a single URI query string, or if the old <nop>ServiceProvider referred to a web service wsdl with a single method and a single parameter. Now the <nop>ObjectLink rather than the old "ExternalID" points to the object in case of a single URI query string. The method and parameter names, and the ID-values are now given separately for web services. Furthermore, ABCD does not plan to provide a single or unified ID for collection units, but uses three separate variables that together uniquely refer to a specimen object. This is supported, but it would still be desirable to have a single ID to simplify ID comparison and distinguish ID from other parameter values that may be required to use a webservice method (but may be constant for different objects).
* In addition to URL and webservice, tentative support for DOI (digital object identifiers) and <nop>LifeScience ID (LSID) was added (including a pattern constraint defining LSIDs).
* _New_ after Berlin meeting: Sequence of Label (= <nop>FreeFormDescription in 0.9) and <nop>ObjectLink changed; Label is now first. This agrees with the use of Label throughout the other parts of the schema (characters, states, etc.).
* Entities/Classes changed to Entities/ClassNames, //Class to //ClassName. Note: in addition to the <nop>ClassName (taxon name) pointers present we may need alternative pointers into the class concepts (taxon concepts) as present in <nop>ClassHierarchy!
* "<nop>TaxonNameInSource" renamed to "<nop>ClassNameInSource". Related open issue: Combine with Location? Else we need to have a <nop>CitationBaseType without <nop>ClassNameInSource used in Glossary and Keys, and a derived type used in Descriptions!
* _New_ after Berlin meeting: <nop>ClassIdentification changed to <nop>ClassAssignment; the process will be an identification, but the result is assigning the object description to a class. The term Identification caused confusion in the discussion.
* Bob pointed out the inconsistency of declaring the standard to be independent of the biodiversity domain (thus using class/object instead of taxon/specimen) and still having taxon, taxonauthor, etc. in UBIF.FormattedText. For the time being I have removed these (they are still preserved in an unused backup version of the type, so they can easily be put back).
* Similarly, the biology-specific elements Sex and Stage were removed from <nop>ClassNameProxyType (= <nop>ClassNameConnectorType in 0.9; = the type of the proxy object defining links to external name databases).
SDD assumes that <nop>ClassNameConnectorType in the future will connect to nomenclators or species databases and these are unlikely to provide separate records for sex and stage. It would have been possible to move Sex and Stage to <nop>DescriptionBaseType, but they are required at the end of the diagnostic keys as well (sexes or stages may be keyed out separately!). Thus, a new type <nop>ClassRefWithAdditionalClassifierType has been derived from the <nop>ClassRefType and used for <nop>DescriptionBaseType/Class (which is the basis for coded as well as natural language descriptions) and <nop>StoredKeyDefType/Lead/Class. Furthermore the Object identifications may be sex/stage specific (but also many objects will have multiple stages in a single specimen...). At the moment the new <nop>ClassRefWithAdditionalClassifierType has also been used at <nop>DescribedObjectConnectorType/ClassIdentification.
* The above mentioned type <nop>ClassRefWithAdditionalClassifierType should be defined in a generalized way, avoiding biology-specific concepts like sex and stage.
* See SecondaryClassifiersProposal (and earlier: TheProblemOfSex)!
* <nop>ClassHierarchies was previously restricted to a single hierarchy, now allows multiple <nop>ClassHierarchy objects. A <nop>ClassHierarchy is the only way available in SDD to define
taxon subsets (character subsets are defined in the <nop>ConceptTrees).
* _PROPOSAL_: Add an Abbreviation element to Class and Object in Entities? Would not likely be updated by service, but may be useful or even required for reports. Update problem is related to problem with updating the Caption of <nop>MediaResources.
*Descriptions*
* In coded and natural language descriptions a Header element was introduced to improve the overview and organization of information.
* <nop>CharacterData_BaseType/Sequence with values "terminology" or "description" was considered difficult to understand. Bob proposed to replace it with a boolean "<nop>StatesAreOrdered" which has been done.
* _PROPOSAL_: Rename <nop>CodedDescriptions to <nop>SymbolicDescriptions, see Analytical Philosophy (I only checked the Enc. Britannica, I am no expert in this!)
*Keys*
* Keys/Key was changed to <nop>IdentificationKey/IdentificationKeys. The term "key" was perceived as too general, causing misunderstanding especially for non-biologists like programmers.
Instead of the deprecated "guided key" other terms are "Pathway key" and "Stored key". "Dichotomous key" is inappropriate.
* <nop>CodedStatements in Keys (coded terminology equivalent to the natural language key statement) used to be a simple list of states. To accommodate the frequently occurring more complex statements in keys, e. g. "margin of fruitbody yellow (or orange and hairy)" -> i.e. not if only orange, or "margin of fruitbody yellow, never with denticles" -> other surface structures may be present, a boolean operator logic modeled after <nop>MathML has been added to <nop>CodedStatements inside Keys.
* Related: Should Boolean logic (not, and, or) be added to any natural language markup?
* Should guided keys be marked up using the natural language markup method rather than using a separate section, as currently proposed? Currently, the key markup was thought to follow the coded description model, but now it has been extended. Problem: Boolean logic is frequently found in the lead statements of keys, but rarely in natural language taxon descriptions. However, if Boolean logic operators are introduced to both, it would be a strong argument to use the same method in NLD and Keys, rather than having three variants.
* Alternatively, we may want to extend the <nop>CodedDescriptions and provide Boolean logic operators there as well. This would be a heavy burden on database-oriented descriptive data processing, however. Or can someone provide a simple model how to handle arbitrary logical and/or combinations in a relatively simple database model?
*General*
* <nop>CitationType: optional <nop>LastVerified and <nop>InvalidSince date elements added, important for volatile online publications.
* The application-specific data containers (= extension mechanism to store non-SDD data) have been renamed from <nop>ApplicationData/Application to <nop>CustomExtensions/CustomExtension. Several applications may agree on common extensions, in which case the old names would not have been appropriate. The mechanism itself remains unchanged.
* Model groups like "(Rich)AnnotationGroup" containing only optional elements have themselves been made optional. This changes nothing in the validation and schema, but seems to help when using Castor data binding.
* In the <nop>LabelPlusAbbreviationRepresentationType (used frequently in Label/Representation elements) the Selector element containing media (usually images) was renamed to <nop>MediaResources. This is the same element name used generically throughout the schema.
* The name "Selectors" was intended to express that only certain media should be added here - those that are sufficiently informative and concise at the same time to be used as selectors instead of text labels. However, the use of Selector led to more confusion than clarification, and the purpose of the media is expressed through the Label context, i.e. these are labeling images etc.
* The only other media resource is Icon which remains semantically labeled.
---
<h1>Open Questions<a name="questions"/></h1>
* Class names (= taxon names referenced in descriptions or keys) may have to be audience specific! See LanguageSpecificClassNames for a discussion!
* Descriptions generalization questions, i.e. inferring descriptions from other descriptions:
* Main.PrometheusII proposes to explicitly reference descriptions that are to be included or generalized into a current description. Currently we expect this in SDD to rely on an automatic "description resource discovery" mechanism, i. e. _all_ object descriptions with the same class name are generalized, and classes are generalized to higher classes following the class (taxon) hierarchy defined in Entities.
* <nop>BioLink proposes (correct?) to explicitly flag which characters or states allow generalization, and whether from above or below.
* (= the first is explicit generalization on the object/class hierarchy level, the second explicit which characters/states are included in generalization.)
* Related: SDD probably needs a mechanism to mark the results of aggregation/generalization, computed characters, calculated statistics to document whether they are calculated / inherited or directly entered.
* Related: Do we have to document original terminology labels during data entry (i.e. in the language/audience representation used during scoring)? The audience itself may be interesting (as a code), but even more the terminology may have been changed slightly (evolution of terminology) since scoring. A record of score-time representation would increase the trust in the coded scores and allow some backtracking of problems.
* In Descriptions we call an element <nop>GeographicalScope, in <nop>ProjectDefinition basically the same thing <nop>GeographicalCoverage! However, Descriptions refers to defined objects in Resources, whereas in <nop>ProjectDefinition it is free-form text (modeled directly after <nop>DublinCore). Make this consistent and always use Resources/Geography/Location object references?
* Problem of storing calculated data and marking them as "autogenerated" (or which term to use?). Related to problem of inheriting information up and down taxonomic tree. Similar problems are already marked up in the "Origin" element in character and NLD data, and in the inherited attribute associated with character ratings. In the case of statistical measures, marking the Origin as calculated would refer to the raw data in an observation set. However, there is some discussion on the Wiki (see RepeatedObservations) whether we need a keyref to exactly one observation set or not.
* We probably need to have more than one class hierarchy and add a marker to indicate which hierarchy is formal, and which contains non-taxonomic groupings. In Brazil Kevin reported on Lucid providing a "tag" mechanism to mark "silly characters" intended only to group items like "100 worst weed species: yes/no". XPER reported a similar tag mechanism for items (instead of characters as in Lucid) to tag items for specific problems: diseases / quarantine species / disease vectors. To me both kinds of problems seem to be most appropriately handled as a non-taxonomic class hierarchy. Any proposals how to handle this? As a first step an additional attribute "IsPhylogenetic" in the class hierarchy is proposed (already done).
* Glossary:
* Do we need some method to express ranges for cardinality: How many legs may there be etc.?
* Do we need some method to associate states with properties/types?
* Should the natural language markup be brought closer to xhtml by using <span class=""> for markup?
* Basing character states on concept states (= reuse of state sets in multiple characters) causes problems with ordered (ordinal scale) characters. The states in a character may be inherited from multiple concept nodes. Each of these will probably have an order in the concept, but the final order can only be defined in each character. This seems unfortunate.
* Can we describe images? Is this automatically implied in reversing the association between a description and an image or not?
Images may only illustrate parts of the description.
* Can we format numeric values in reports? See DELTA *DECIMAL PLACES. How do we format sets of statistical measures in natural language or other reports? The (min-) lowerrange - central - upperrange (-max) format is not necessarily universal. Currently it is nevertheless fixed in application code and cannot be defined by users. Since many variants exist as to which individual measures are present, this can probably not be done with a <nop>TextBefore/After strategy (possible for Min, Max, but not for ranges with/without mean, "3-6", "5", "3-5-6"). Also, open ranges exist, which should be output as "at least 3 cm long" in natural language. Also: formats are audience/language-specific!
* Can we find a smart method to format related and dependent values such as width x length?
* Using polymorphism for character definitions. Color as separate character type?
* Media Resource may need a location detail (if figure has multiple labeled fragments). Perhaps call this <nop>FragmentLabel?
* Media-"FragmentLabels", but even more the "Location" in Citations may be language sensitive! "table 1", "tab. 2", "figure 3" in English, "Abbildung 3" in German etc.!
---
<h1>Problems I believe cannot be solved in xml schema</h1>
(please tell me if you disagree!)
* We have a frequently used type that prevents validation of requiredness in SDD schema: Most labels use <nop>FormattedSimpleTextType, which if the element is required should always be non-empty. However, in contrast to simple text strings, <nop>FormattedSimpleTextType allows limited formatting (sup/sub etc.) and has a mixed content model. As a result, it is not possible in xml schema to require the length of it to be at least 1. This may be a case where we have to make a recommendation not to output empty elements, and a requirement that a missing element and an empty element are to be considered identical (applications should not attach different semantics to empty elements).
---
The missing element issue seems approachable by declaring things nillable and allowing xsi:nil="true" to distinguish from the missing case. This arose also in the discussion ResolvedTopicIsDiGIRadequateForSDD -- Main.BobMorris - 29 Apr 2004
I cannot follow your argument. The problem I state above is that I cannot constrain the Labels to actually contain a string: the element must be present but may contain nothing. There seems to be no mechanism in schema to prevent that. I know you warned us against mixed content model! -- Gregor Hagedorn - 3. May.
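A minimal sketch of the two positions above, assuming a simplified stand-in for <nop>FormattedSimpleTextType. The sup/sub-only content model and the Label declaration are illustrative and not copied from the SDD schema; the xs prefix is assumed to be bound to http://www.w3.org/2001/XMLSchema.
<pre>
&lt;!-- Assumed, simplified stand-in for the real type: mixed content with limited formatting --&gt;
&lt;xs:complexType name="FormattedSimpleTextType" mixed="true"&gt;
  &lt;xs:choice minOccurs="0" maxOccurs="unbounded"&gt;
    &lt;xs:element name="sup" type="xs:string"/&gt;
    &lt;xs:element name="sub" type="xs:string"/&gt;
  &lt;/xs:choice&gt;
&lt;/xs:complexType&gt;

&lt;!-- Hypothetical required label (minOccurs defaults to 1), declared nillable as Bob suggests --&gt;
&lt;xs:element name="Label" type="FormattedSimpleTextType" nillable="true"/&gt;

&lt;!-- Instances:
  &lt;Label&gt;H&lt;sub&gt;2&lt;/sub&gt;O&lt;/Label&gt;   valid, has content
  &lt;Label xsi:nil="true"/&gt;       valid, explicitly marked as nil
  &lt;Label/&gt;                      still valid: mixed content has no length facet to forbid it
  (Label absent)                invalid where the element is required
--&gt;
</pre>
Under these assumptions, nillable makes "explicitly nil" distinguishable from "missing", but the empty, non-nil Label still passes validation, which is the case the recommendation above is meant to cover.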
---
<h3>Appendix, see discussion marked "####" above:</h3>
Current situation in 0.9:
<pre>
Concept
Concept
Concept key="123"
ConceptStates
StateDefinition key="1"
StateDefinition key="2"
StateDefinition key="3"
Char
Categorical/States/
StateReference ref="1"
StateReference ref="2"
StateReference ref="3"
AutoAddStates ref="123"
</pre>
Proposed reversal:
<pre>
Concept
Concept
Concept key="123"
ConceptStates
StateDefinition key="1"
StateDefinition key="2"
StateDefinition key="3"
UpdateStateRefsTrigger
Character ref="123"
Char key="123"
Categorical/States/
StateReference ref="1"
StateReference ref="2"
StateReference ref="3"
</pre>
One reason why this is relevant is that I believe we have to introduce a similar mechanism for <nop>StatisticalMeasures, to allow defining sets of statistical measures centrally (min-max range, a simple range/mean type like DELTA, extensions including variance and sample size, etc.).
We have modifier sets as well. Can we also run them over a concept-node-based system, so that we have very similar systems for States, Measures, and Modifiers? That would seem to improve the schema. Unfortunately, with modifiers I am uncertain how well this works. Modifiers almost cry out for inheritance down the concept tree, something we have not done so far!
---
Looking for the most recent schema file? See CurrentSchemaVersion!
-- Gregor Hagedorn - 25 May 2004
@
1.16
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1089914520" format="1.0" version="1.16"}%
d106 1
a106 1
* Bob pointed out the inconsistency of declaring the standard to be independent of the biodiversity domain (thus using class/object instead of taxon/specimen) and still having taxon, taxonauthor, etc. in FormattedText. For the time being I have removed these (they are still preserved in an unused backup version of the type, so they can easily be put back).
@
1.15
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1086945540" format="1.0" version="1.15"}%
d23 1
a23 1
* <nop>GenerationMetadata changed to <nop>TransformationHistory, conceived as a collection of at least one, possibly multiple Transformation elements. Alternative names: <nop>ConversionHistory, <nop>DerivationHistory, <nop>HistoryMetadata, <nop>ContentHistoryMetadata, or <nop>DataHistoryMetadata.
d33 1
a33 1
* _New_ after Berlin meeting: attempt to use across standards (see UnifiedBioInfoFramework), therefore audience-dependent project Description and IPR-Statements changed to language dependent. Language should simplify the adoption of common framework elements for all TDWG/GBIF standards.
@
1.14
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085765046" format="1.0" version="1.14"}%
d33 1
a33 1
* _New_ after Berlin meeting: attempt to use across standards (see OverarchingPatternsForTdwgSchemata), therefore audience-dependent project Description and IPR-Statements changed to language dependent. Language should simplify the adoption of common framework elements for all TDWG/GBIF standards.
d142 1
a142 1
* Class names (= taxon names referenced in descriptions or keys) may have to be audience specific! See AudienceSpecificClassNames for a discussion!
@
1.13
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085756700" format="1.0" version="1.13"}%
d81 1
a81 1
* <nop>GenericStates renamed to <nop>ConceptStates (= states that are present at nodes in the concept tree; this is the only place where <nop>GenericStates
@
1.12
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085472830" format="1.0" version="1.12"}%
d62 1
a62 1
* <nop>StatisticalMeasures renamed to <nop>UnivariateStatisticalMeasures (compare Bob's comment on TWIKI about MultivariateStatistics).
d186 1
a186 1
The missing element issue seems approachable by declaring things nillable and allowing xsi:nil="true" to distinguish from the missing case. This arose also in the discussion IsDiGIRadequateForSDD -- Main.BobMorris - 29 Apr 2004
@
1.11
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085405580" format="1.0" version="1.11"}%
d3 1
a3 1
<h1>Changes in 0.91 beta 14 (relative to the 0.9 Dec. 1. 2003 release)</h1>
d5 3
a7 1
<strong>This is an updated version containing most of the minor changes discussed at the [[SDD2004Berlin][meeting in Berlin]]. Some changes are still pending. The current version of the SDD schema can always be found at CurrentSchemaVersion. Please do read through the report of changes, except perhaps for the few trivial at the start. Please take a look at the schema to verify that you agree with the changes and that they make sense to you.</strong>
d69 1
d71 3