
1. Overview of Latimer Core

Sarah Vincent edited this page Nov 30, 2023 · 9 revisions

1.1 Summary

Digitisation of objects held in natural history collections is accelerating worldwide, but object-level digitisation is a time- and resource-intensive process. By describing the objects in their care at a higher level, organisations are able to publicise and share the core characteristics of their collections more easily. Collection-level descriptions are an established concept in the natural science community, but no standards or frameworks exist in this space. Latimer Core has been developed to fill this gap: it is designed to be a flexible, sustainable framework of terms that can be assembled to create records that accurately represent a grouping of objects at any level of granularity, from an institution's entire collection to a few objects in a single drawer.

1.2 What is Latimer Core?

Latimer Core (LtC), named after Marjorie Courtenay-Latimer and the Coelacanth, is a data standard for describing collections. It has been designed to support the representation and discovery of the groups of items that those collections and their subcomponents encompass. The LtC classes and their properties (collectively called terms) aim to represent information that describes these groups of things in enough detail to inform deeper discovery of the resources they contain.

The LtC standard has significant overlap with existing data standards that represent concepts such as individual objects and occurrences (Darwin Core, ABCD) and organisations, people and activities (W3C ORG Ontology, W3C PROV Ontology, Schema.org). Where possible, the LtC standard either borrows terms directly from these other standards or aligns with them less formally. As far as possible, LtC terms are defined so that they do not preclude use across domains. The LtC standard also imposes more rules on data structure than some other TDWG standards (e.g., Darwin Core): properties may only be used within the context of their parent class, and some limitations apply to the relationships that may exist between classes.
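To make the parent-class rule concrete, the sketch below checks whether a record uses only properties defined for its class. It is an illustration, not part of the standard: the `ALLOWED_PROPERTIES` map and the `misplaced_properties` function are hypothetical, and only the two terms named on this page are included. A real check would load the full term list from the LtC normative documentation.

```python
# Hypothetical, deliberately tiny map of which properties belong to which class.
# Only the two terms named on this page are included; a real check would load
# the full term list from the LtC normative documentation.
ALLOWED_PROPERTIES = {
    "ltc:ObjectGroup": {"ltc:collectionName"},
    "dwc:Taxon": {"dwc:genus"},
}


def misplaced_properties(class_name: str, record: dict) -> list:
    """Return the properties in record that are not defined for class_name."""
    allowed = ALLOWED_PROPERTIES.get(class_name, set())
    return sorted(key for key in record if key not in allowed)


group = {"ltc:collectionName": "Lepidoptera drawer 4", "dwc:genus": "Papilio"}
print(misplaced_properties("ltc:ObjectGroup", group))  # ['dwc:genus']
```

Here `dwc:genus` is flagged because, under this rule, taxonomic information belongs in a linked taxon record rather than directly on the object group.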

LtC is intended to be sufficiently flexible and scalable to apply to a wide range of use cases, from describing the overall collections holdings of an institution to the contents of a single drawer of material. Various approaches are used to enable this flexibility, including generic classes to represent organisations, people, roles and identifiers, and flexible relationships for constructing data models that meet different needs. The Latimer Core Scheme concept is introduced to enable adopters to specify rules and constraints on the use of the LtC standard for a specific implementation. For example, a scheme may specify the subset of LtC classes and properties relevant to the use case, which of those are mandatory or optional, and a set of metrics (e.g., object counts or digitisation percentages) to be captured for a breakdown of the collections.
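As an illustration of a scheme-defined metric, the sketch below derives a digitisation percentage for each part of a hypothetical collection breakdown. The input field names (`objectCount`, `digitisedCount`) and the output keys, modelled loosely on Darwin Core's `measurementType`, `measurementValue` and `measurementUnit`, are assumptions for illustration rather than LtC-defined names.

```python
# Hypothetical breakdown of one collection into object groups. The field
# names objectCount and digitisedCount are illustrative assumptions.
breakdown = [
    {"group": "Insects", "objectCount": 2000, "digitisedCount": 500},
    {"group": "Birds", "objectCount": 400, "digitisedCount": 400},
]


def digitisation_metrics(groups):
    """Yield one metric record per group giving the percentage digitised."""
    for g in groups:
        total = g["objectCount"]
        # Guard against empty groups rather than dividing by zero.
        pct = 100.0 * g["digitisedCount"] / total if total else 0.0
        yield {
            "group": g["group"],
            "measurementType": "digitisation percentage",
            "measurementValue": round(pct, 1),
            "measurementUnit": "%",
        }


for metric in digitisation_metrics(breakdown):
    print(metric["group"], metric["measurementValue"])
# Insects 25.0
# Birds 100.0
```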

The central concept of the standard is the ltc:ObjectGroup class, which represents ‘an intentionally grouped set of objects with one or more common characteristics’. Arranged around ltc:ObjectGroup are: a set of classes commonly used to describe and classify the objects within an ltc:ObjectGroup; classes covering the custodianship, management and tracking of the collections; a generic class (ltc:MeasurementOrFact) for storing qualitative or quantitative measures for the ltc:ObjectGroup; and a set of classes that describe the dataset's structure and links. A summary of the classes within the standard is shown in Figure 1 below.



Figure 1: Overview and informal categorisation of Latimer Core standard classes for describing the object group’s characteristics (green), collections custody (purple), generic reusable information types (dark blue), metrics (red), and data structure and links (light blue).
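A minimal sketch of an ltc:ObjectGroup record with one attached ltc:MeasurementOrFact, expressed as a Python dictionary and serialised to JSON. Apart from the class names and `collectionName`, which appear on this page, the keys, the nesting and the `metric` helper are assumptions; the relationship model an implementation actually uses would be set by its scheme or application profile.

```python
import json

# A hypothetical ltc:ObjectGroup record with one attached ltc:MeasurementOrFact.
# The class names and collectionName come from this page; the other keys and
# the nesting are illustrative assumptions.
object_group = {
    "type": "ltc:ObjectGroup",
    "collectionName": "Example fish collection, drawer 3",
    "measurementsOrFacts": [
        {
            "type": "ltc:MeasurementOrFact",
            "measurementType": "objectCount",
            "measurementValue": 12,
            "measurementUnit": "objects",
        }
    ],
}


def metric(group: dict, measurement_type: str):
    """Return the first measurementValue of the given type, or None."""
    for m in group.get("measurementsOrFacts", []):
        if m.get("measurementType") == measurement_type:
            return m.get("measurementValue")
    return None


print(json.dumps(object_group, indent=2))
print(metric(object_group, "objectCount"))  # 12
```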

1.3 How is Latimer Core structured?

The LtC standard needs to be flexible in order to support the wide range of use cases identified for the structural, qualitative, and quantitative aspects of collections. A few rules govern which classes may be linked together, and a very small number of mandatory elements enforce a basic level of consistency across LtC implementations. Within these constraints, there is considerable scope for using the standard with different data modelling paradigms, such as relational, graph and dimensional approaches.

This flexibility makes it harder to ensure that LtC datasets achieve the required degree of interoperability. It is anticipated that application profiles will be developed for different use cases to support more targeted and simplified implementations within the wider scope of the LtC standard. For example, these may define:

  • which classes and properties from the overall standard are included, which are mandatory, and which may be repeated
  • the controlled vocabularies to be used for particular terms
  • the relationship model to be used
  • the metrics and descriptors to be included using the Measurement Or Fact class (see Metrics and Narratives section)

The Latimer Core Scheme class, in conjunction with the related Scheme Term and Scheme Measurement Or Fact classes, provides a rudimentary starting point for defining these application profiles. Depending on the serialisation and/or data platform used to manage LtC data for a given implementation, more specific and detailed approaches may also be used, such as JSON Schema, XML Schema, RDF Shape Expressions or relational database schemas.
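The kinds of rule an application profile can express (mandatory properties, repeatability and controlled vocabularies) can be sketched as a small validator. The `PROFILE` structure, the `discipline` property and its vocabulary are hypothetical; in practice the same constraints might be written in JSON Schema, XML Schema, Shape Expressions or a database schema, as noted above.

```python
# A hypothetical application profile. Each entry states whether a property is
# mandatory, whether it may repeat, and optionally a controlled vocabulary.
# The property "discipline" and its vocabulary are illustrative assumptions.
PROFILE = {
    "collectionName": {"mandatory": True, "repeatable": False},
    "discipline": {
        "mandatory": False,
        "repeatable": True,
        "vocabulary": {"Botany", "Palaeontology", "Zoology"},
    },
}


def validate(record: dict, profile: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for prop, rule in profile.items():
        value = record.get(prop)
        if value is None:
            if rule["mandatory"]:
                problems.append(f"missing mandatory property: {prop}")
            continue
        # Treat a single value and a list of values uniformly.
        values = value if isinstance(value, list) else [value]
        if len(values) > 1 and not rule["repeatable"]:
            problems.append(f"property may not repeat: {prop}")
        vocabulary = rule.get("vocabulary")
        if vocabulary is not None:
            for v in values:
                if v not in vocabulary:
                    problems.append(f"{prop}: {v!r} not in controlled vocabulary")
    # Properties outside the profile are not part of this implementation.
    for prop in record:
        if prop not in profile:
            problems.append(f"property not in profile: {prop}")
    return problems


print(validate({"discipline": ["Zoology", "Mycology"]}, PROFILE))
```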

1.4 Where and how can Latimer Core be used?

Latimer Core is intended to be a generic data standard that allows collection descriptions to be represented in a range of formats to suit different use cases. These may include simple CSV files, exchange and linked data formats such as JSON(-LD), XML and RDF, and relational and non-relational database models. While the majority of reference examples are currently demonstrated with JSON, work will continue on building up a suite of examples and reference implementations using other formats.
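The same record can be written to more than one of these formats. The sketch below, using only the Python standard library, writes one hypothetical flat record as JSON and as CSV; the property names are illustrative assumptions. CSV suits a single class with simple values, whereas nested relationships between classes fit more naturally in JSON(-LD), XML or RDF.

```python
import csv
import io
import json

# A hypothetical flat record; the property names are illustrative assumptions.
record = {"collectionName": "Herbarium sheets, drawer 12", "objectCount": 350}


def to_json(rec: dict) -> str:
    """Serialise one record as a JSON object."""
    return json.dumps(rec, sort_keys=True)


def to_csv(rows: list) -> str:
    """Serialise flat records as CSV, one column per property."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)  # values containing commas are quoted automatically
    return buf.getvalue()


print(to_json(record))
print(to_csv([record]))
```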

Audiences

There are three main audiences for this documentation:

  1. Data aggregators - users or groups who have a need to receive standardised collection descriptions.
  2. Data providers - users or groups who have collections information that they would like to share.
  3. Data users - users or groups who want to understand in more detail the information provided by LtC collection records and collection catalogues implementing the LtC standard.

Given the variety of use cases for the standard (documented in this wiki's Use Case section, a GitHub document, and a Google Sheet), data aggregators are expected to define the minimal set of classes, properties and relationships that best suits their needs. The ltc:LatimerCoreScheme class is important in this context as a way of communicating to data providers the shape and types of information they need to provide.

Who uses Latimer Core?

Whilst the initial use cases for the LtC standard were contributed by natural history museums and biodiversity informatics communities, the standard is not intended to preclude extension to other types of collection. It is intended to be broadly useful for:

  • data providers and users who need to share and aggregate structured collections information
  • data aggregators and collections registries who need to standardise collection- and institution-level data from multiple providers
  • collections managers who need to inventory backlogs and prioritise collections management and digitisation efforts
  • museum staff who need to generate collection metrics and summaries
  • communities who need to track and share collection history and provenance
  • developers of collections management systems and other digital tools who need to support collection-level data and functionality within their platforms

1.5 Relationships with other standards

Many of the data concepts relevant to collection descriptions are similar to those reflected in specimen-level data standards - such as the type of object, geographic origins and taxonomy - or else are generic concepts like people, addresses and identifiers. Consequently, a concerted effort was made to borrow appropriate terms from existing standards rather than defining them anew, which brings the benefit of closer alignment with related standards.

To improve interoperability and reduce redundancy, some LtC properties reference and re-use properties from existing standards. Where the LtC version of a property’s definition or permitted usage differs from the original, it will be narrower in scope. The provenance of any borrowed properties is referenced in the LtC normative documentation. Any definitions which have been modified will note the LtC definition in the ‘Usage’ field and the original in the ‘Definition’ field, for example:

Label: Genus
Definition: The full scientific name of the genus in which the taxon is classified.
Usage: The full scientific name of the genus in which the collection’s taxa are classified.
Existing property: dwc:genus
Existing class: dwc:Taxon

Table 1: Example of a property used in the Latimer Core standard that has been borrowed from the dwc:Taxon class

Terms that are not borrowed directly from elsewhere but are conceptually aligned with terms in other standards note this alignment in the LtC term-level ‘notes’ field.

The full list and provenance of borrowed terms can be found in the LtC normative documentation. A summary is provided in Table 2, which also includes sources that informed the development of LtC and from which terms were informally derived.

Standard name | Namespace prefix | Documentation
--- | --- | ---
Darwin Core (including extensions) | dwc | https://dwc.tdwg.org/terms/
ABCD(EFG) | abcd | https://abcd.tdwg.org/
AIISO | aiiso | https://vocab.org/aiiso/
Dublin Core | dcterms | https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
PROV (Provenance) | prov | https://www.w3.org/TR/prov-overview
The Organization Ontology | org | http://www.w3.org/TR/vocab-org/
FOAF | foaf | http://xmlns.com/foaf/spec/
Schema.org | schema | https://schema.org/

Table 2: List of standards from which terms in LtC have been borrowed or derived.
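The prefixes in Table 2 are shorthand for namespace IRIs, which are not always the same as the documentation URLs listed. A minimal sketch of expanding a prefixed name into a full IRI: the `ltc` and `dcterms` namespaces are those used in the SKOS mapping example (Table 4), and `dwc` and `skos` are the published namespaces of Darwin Core and SKOS; the `expand` function itself is hypothetical.

```python
# Namespace IRIs for a few prefixes. Other prefixes from Table 2 would need
# their namespace IRIs added before they can be expanded.
NAMESPACES = {
    "ltc": "http://rs.tdwg.org/ltc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dwc:genus' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return NAMESPACES[prefix] + local


print(expand("dwc:genus"))  # http://rs.tdwg.org/dwc/terms/genus
```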

Data type | Standard
--- | ---
Date | ISO 8601

Table 3: List of data type standards recommended for use with Latimer Core records.
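A sketch of checking that date values follow ISO 8601, using Python's `datetime.date.fromisoformat`. It accepts a full calendar date (`YYYY-MM-DD`) or a `start/end` interval, the ISO 8601 interval form; reduced-precision values such as `1890` are deliberately out of scope, and the function name is hypothetical.

```python
from datetime import date


def parse_iso_date_or_interval(value: str) -> tuple:
    """Parse an ISO 8601 calendar date or a 'start/end' interval.

    Returns (start, end); a single date yields the same value twice.
    Reduced-precision forms such as '1890' or '1890-05' are not handled.
    """
    start_text, sep, end_text = value.partition("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text) if sep else start
    if end < start:
        raise ValueError(f"interval ends before it starts: {value!r}")
    return start, end


print(parse_iso_date_or_interval("2023-11-30"))
print(parse_iso_date_or_interval("1890-01-01/1910-12-31"))
```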

1.6 Mapping using SKOS and SSSOM

Over the course of LtC development, efforts were made to identify and align with relevant standards and vocabularies, and adopt existing terms from them where possible. During the expert review phase, a more structured approach was proposed and implemented using the Simple Knowledge Organization System (SKOS) mappingRelation vocabulary. This exercise helped to better describe the nature of the mappings between new LtC terms and related terms in other standards, and also to validate decisions around the borrowing of existing terms for inclusion in LtC.

Properties used:

Table 4 shows an example of the mapping between the ltc:collectionName property and the Dublin Core dcterms:title property.

term_localName | skos_mappingRelation | related_termName
--- | --- | ---
http://rs.tdwg.org/ltc/terms/collectionName | broadMatch | http://purl.org/dc/terms/title

Table 4: Example of SKOS mapping
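The mapping in Table 4 can also be expressed as a single RDF statement, read as ‘dcterms:title has a broader meaning than ltc:collectionName’. The sketch below serialises it as an N-Triples line; the SKOS namespace and the five mapping properties are those defined in the SKOS Reference, while the helper function is hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The sub-properties of skos:mappingRelation defined by the SKOS Reference.
MAPPING_PROPERTIES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def skos_mapping_ntriple(subject_iri: str, relation: str, object_iri: str) -> str:
    """Serialise one SKOS mapping as an N-Triples statement."""
    if relation not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {relation}")
    return f"<{subject_iri}> <{SKOS}{relation}> <{object_iri}> ."


print(skos_mapping_ntriple(
    "http://rs.tdwg.org/ltc/terms/collectionName",
    "broadMatch",
    "http://purl.org/dc/terms/title",
))
```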

A provisional set of the LtC SKOS mappings can be found in the Latimer Core GitHub repository.

A further exercise used elements of the Simple Standard for Sharing Ontological Mappings (SSSOM) to begin developing a more comprehensive set of metadata around these mappings, including the justification for each mapping decision and provenance about when the mappings were created and by whom. The SSSOM mappings are also available in the LtC GitHub repository and, like the SKOS mappings, are expected to be refined and expanded over time.
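A sketch of what one such SSSOM record might look like, written as SSSOM/TSV with an embedded metadata header. It uses the core SSSOM slots `subject_id`, `predicate_id`, `object_id` and `mapping_justification` plus the optional `comment`, to the best of our understanding of the SSSOM specification. It omits required mapping-set metadata such as `mapping_set_id` and `license`, and the comment text is a placeholder rather than an actual LtC justification; the mapping files in the LtC repository remain the authoritative source.

```python
import csv
import io

FIELDS = ["subject_id", "predicate_id", "object_id", "mapping_justification", "comment"]

# One mapping row using CURIEs; the comment is a hypothetical placeholder.
rows = [{
    "subject_id": "ltc:collectionName",
    "predicate_id": "skos:broadMatch",
    "object_id": "dcterms:title",
    "mapping_justification": "semapv:ManualMappingCuration",
    "comment": "hypothetical note explaining the mapping decision",
}]


def to_sssom_tsv(rows: list) -> str:
    """Write mapping rows as TSV preceded by a commented curie_map block."""
    buf = io.StringIO()
    # Embedded YAML metadata; set-level fields such as mapping_set_id and
    # license are omitted from this sketch.
    buf.write("#curie_map:\n")
    buf.write("#  ltc: http://rs.tdwg.org/ltc/terms/\n")
    buf.write("#  dcterms: http://purl.org/dc/terms/\n")
    buf.write("#  skos: http://www.w3.org/2004/02/skos/core#\n")
    buf.write("#  semapv: https://w3id.org/semapv/vocab/\n")
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_sssom_tsv(rows))
```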

1.7 Extending the standard

Latimer Core is intended to be a standard that can be expanded to include any genre of collection, physical or digital. As such, an attempt has been made to ensure that the classes and properties it contains do not preclude such extensions.

Flexibility

  • The modular design of the standard allows a high degree of flexibility in schema definition: composition and architecture can be designed around the requirements of the designer's system or environment, while class and term definitions, data types and vocabularies are controlled to support and encourage re-use of schemas and of LtC itself. Flexibility lies in configuration, not customisation.
  • Mandatory fields are rarely specified in the LtC standard. They are intended to be defined at the schema level, according to the requirements of the specific use-case or implementation.
  • Generic classes are used where possible to allow non-standard or unqualified metrics and identifiers to be added to an LtC record, but class properties should hold enough information to allow logical rules to be defined for automated parsing, ingestion and data quality checks.
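Because a generic class such as ltc:MeasurementOrFact carries its own type information, automated checks can be driven by that type. A minimal sketch, assuming Darwin Core-style `measurementType` and `measurementValue` keys and two hypothetical rule types; records with an unknown type are routed to manual review rather than silently accepted.

```python
# Hypothetical data-quality rules keyed by measurementType. Because each
# generic record states its own type, the matching rule can be chosen
# automatically. The type names are illustrative assumptions.
RULES = {
    "objectCount": lambda v: isinstance(v, int) and v >= 0,
    "digitisationPercentage": lambda v: isinstance(v, (int, float)) and 0 <= v <= 100,
}


def check(records):
    """Split records into (passed, failed); unknown types fail for manual review."""
    passed, failed = [], []
    for r in records:
        rule = RULES.get(r.get("measurementType"))
        if rule is not None and rule(r.get("measurementValue")):
            passed.append(r)
        else:
            failed.append(r)
    return passed, failed


records = [
    {"measurementType": "objectCount", "measurementValue": 120},
    {"measurementType": "digitisationPercentage", "measurementValue": 140},
    {"measurementType": "shelfColour", "measurementValue": "green"},
]
passed, failed = check(records)
print(len(passed), len(failed))  # 1 2
```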

Extensibility

  • Extensibility was a priority during design: requirements change, and most of the collections that we are seeking to describe using the standard haven’t been digitised before, so novel use cases are likely to emerge.
  • Generic classes support extensibility: if new concrete classes are required, they can be added without affecting the standard definition or breaking functionality of existing implementations/schema (state/behaviour will not diverge significantly from the parent class/other implementations of the same parent class).
  • Smaller, single-responsibility classes are favoured: they are easier to understand and to communicate to users, have fewer dependencies, and can be extended without impacting other classes.
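An object-oriented analogy for the extensibility principle: a new concrete class is added under a generic parent, and code written against the parent keeps working unchanged. The `Identifier` and `DOIIdentifier` classes and their fields are hypothetical Python illustrations, not LtC class definitions.

```python
from dataclasses import dataclass


@dataclass
class Identifier:
    """Hypothetical generic identifier: a value plus a free-text type."""
    value: str
    identifier_type: str


@dataclass
class DOIIdentifier(Identifier):
    """Hypothetical concrete subclass added later for a new use case."""

    def resolver_url(self) -> str:
        return "https://doi.org/" + self.value


def describe(identifier: Identifier) -> str:
    # Written against the generic class, so it also accepts the subclass.
    return f"{identifier.identifier_type}: {identifier.value}"


ids = [
    Identifier("ABC-123", "catalogue number"),
    DOIIdentifier("10.1234/example", "DOI"),
]
for i in ids:
    print(describe(i))
print(ids[1].resolver_url())  # https://doi.org/10.1234/example
```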