Darwin Core has become a broadly used standard for biodiversity data sharing since its ratification as a standard by Biodiversity Information Standards (TDWG) in 2009. Despite, or because of, its popularity, people trying to use the standard continue to have questions about how to use Darwin Core and associated extensions such as Audubon Core, Resource Relationship, and Measurement Or Fact. This webinar series looks at open questions related to Darwin Core. Though the topic is broad, individual chapters in the series will focus on specific topics to an adequate level of depth. We encourage people to bring questions and to have open discussions in each webinar.
Chapter 1. Darwin Core Hour: Introduction to Darwin Core
In this first webinar in the Darwin Core Hour series, the basics of how Darwin Core functions in the biodiversity community as a dynamic standard will be covered. Topics will include what the standard consists of, how to use it, how it can be changed, sources of information, and where to go for help.
Chapter 2. Darwin Core Hour: Even Simple is Hard
In this second chapter of the Darwin Core Hour series, we will glance at the list of Darwin Core terms that are recommended to be populated with values from controlled vocabularies. From that list we'll look at some basic terms (e.g., dcterms:type, dwc:basisOfRecord) for which there are clearly recommended controlled vocabularies. We will explore how they are used in practice and how that usage differs from the recommendations. We'll discuss the possible consequences of not following controlled vocabularies and, conversely, of following them. We'll look at some of the secrets of how aggregators deal with the problem and the lessons we can learn from them. Finally, we will review questions that have been submitted on this particular subject and open the webinar for further discussion.
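To make the idea concrete, here is a minimal sketch of the kind of check an aggregator or data provider might run on dwc:basisOfRecord values. The vocabulary set and the normalization rules below are illustrative assumptions, not the normative recommendation discussed in the webinar.

```python
# Sketch: flag dwc:basisOfRecord values that stray from a controlled
# vocabulary. This vocabulary list is illustrative, not normative.
RECOMMENDED_BASIS_OF_RECORD = {
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "MaterialSample", "HumanObservation", "MachineObservation",
}

def check_basis_of_record(values):
    """Return (raw value, matched vocabulary value or None) pairs.

    Matching ignores case, spaces, and underscores, so common
    near-misses like 'Preserved Specimen' are still recognized.
    """
    normalized = {v.lower(): v for v in RECOMMENDED_BASIS_OF_RECORD}
    report = []
    for v in values:
        key = v.strip().replace(" ", "").replace("_", "").lower()
        report.append((v, normalized.get(key)))
    return report

for raw, match in check_basis_of_record(
        ["Preserved Specimen", "specimen", "HumanObservation"]):
    print(raw, "->", match or "NOT IN VOCABULARY")
```

A check like this only catches lexical drift; it cannot tell whether a value that does match the vocabulary was actually the semantically correct choice, which is part of what the webinar discussion addresses.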
Chapter 3. Darwin Core Hour: Thousands of shades for “Controlled” Vocabularies
In this third chapter in the Darwin Core Hour series we will visit some of the most colorful Darwin Core terms for which the use of controlled vocabularies is recommended. First, as a follow-up to Chapter 2, we will expose the current content of particular terms as they are published right now via different aggregators. Second, we will try to disentangle the reasons, purposes, and circumstances that lead us to observe this diversity of values in such fields. Then, we will explore the current availability of controlled vocabularies in different disciplines within our community and the initiatives that are addressing the problem from different perspectives. Finally, we will try to understand if there is actually a pot at the end of the rainbow: can we come up with solid, community-built controlled vocabularies?
Chapter 4. Darwin Core Hour: Evolution of Darwin Core Terms and Extensions - two extant examples for community input
Overview. Darwin Core Hour #4 shifts focus to understanding how the DwC standard terms and extensions evolve. Community input and implementation drive development of DwC. For this webinar, we selected dwc:preparations, dwc:occurrenceStatus, and dwc:establishmentMeans to show how the process works and highlight the need for outside input. Two presenters invite you to join them and bring your expertise and insights to current issues surrounding these terms.
Chapter 4, Part 1. The Need for a Darwin Core preparation extension.
Presenter Andrew Bentley, Ichthyology Collections Manager and Specify Software Usability lead, starts with an overview of the data now being shared using the dwc:preparations term. From aggregators' datasets, we can look inside and see the need for a preparations extension to Darwin Core rather than just one term. As Andy writes, the preparation field in Darwin Core is fraught with inconsistency. A quick review of the field in GBIF indicates over 120,000 distinct values. This is largely due to data providers sharing information about every preparation (skin, skeleton, tissue, etc.), preparation method (cleared, dried, stained), storage medium (in alcohol, formalin, etc.), and multiple preparation locations and counts for a given specimen, in one field, dwc:preparations. As such, a preparation extension is being proposed that will allow for the mapping of multiple preparations, the separation of these multiple concepts, and the addition of valuable additional fields of information to describe each preparation. Input from the community is sought to gauge interest in such an extension, to outline the necessary fields to be included, and to incorporate relevant work already done, such as that of Apple Core (see reading materials).
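The atomization that a preparations extension would make explicit can be sketched as a simple split of today's concatenated values. The delimiter convention and the field names (`preparationType`, `preservationMethod`) below are hypothetical, chosen only to illustrate the separation of concepts; the actual extension fields are exactly what the community input is meant to settle.

```python
# Sketch: split one concatenated dwc:preparations value into separate
# preparation entries. Delimiters and field names are hypothetical.
def split_preparations(raw):
    """Split e.g. 'skin; skeleton; tissue (in ethanol)' into dicts."""
    preps = []
    for part in raw.split(";"):
        part = part.strip()
        if not part:
            continue
        entry = {"preparationType": part, "preservationMethod": None}
        # A trailing parenthetical is treated as a preservation method.
        if "(" in part and part.endswith(")"):
            ptype, method = part.split("(", 1)
            entry["preparationType"] = ptype.strip()
            entry["preservationMethod"] = method[:-1].strip()
        preps.append(entry)
    return preps

print(split_preparations("skin; skeleton; tissue (in ethanol)"))
```

In practice no single parser can cope with 120,000+ free-text variants, which is precisely the argument for capturing each preparation in its own structured record at the source rather than reverse-engineering it downstream.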
Chapter 4, Part 2. Improving Darwin Core for Invasive Species Research. (dwc:establishmentMeans, dwc:occurrenceStatus)
Presenter Quentin Groom, at the Botanic Garden Meise, is a Data Scientist with an interest in invasive species. Quentin writes: biological invasions are an issue for species conservation and cause a wide range of socio-economic problems. Within the community of invasive species scientists it is widely recognized that rapid, early action and integrated policies are needed to prevent, or at least slow, invasions. Towards this goal the community has proposed various data standards for specific data elements related to biological invasions. Moreover, numerous databases of invasive species incorporate bespoke terminologies. In discussions with invasive species experts, four data types of particular interest have been identified. These are, first, an expression of whether an organism is native to the place it was found; second, an expression of whether the organism exists at the location (or is extinct or absent); third, an expression of how the organism got to the location (e.g., planted); and finally, an expression of how well established the organism is. Currently, there is little guidance on how to express these concepts in Darwin Core, and in some cases there is no method. In this presentation I will discuss how we can build consensus about terms and controlled vocabularies in a community. I will present current Darwin Core extensions for invasive species, but also issues related to core Darwin Core terms and how they could be improved. Lastly, I will mention why it is worth making the effort and what these improvements will enable us to do.
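One way to see the gap is to try writing the four concepts onto a single occurrence record. In the sketch below, the species, the chosen values, and the last two field names are illustrative assumptions: Darwin Core offers dwc:occurrenceStatus and dwc:establishmentMeans for the first two concepts, while the pathway and degree-of-establishment concepts have no dedicated core terms, which is part of the improvement being discussed.

```python
# Sketch: the four invasive-species data types on one occurrence record.
# Values are illustrative; the last two keys are hypothetical placeholders
# for concepts that lack dedicated Darwin Core terms.
occurrence = {
    "dwc:scientificName": "Impatiens glandulifera",
    "dwc:occurrenceStatus": "present",       # does it exist at the location?
    "dwc:establishmentMeans": "introduced",  # is it native to the place?
    "pathway": "horticultural escape",       # how did it get there?
    "degreeOfEstablishment": "naturalised",  # how well established is it?
}

for term, value in occurrence.items():
    print(f"{term}: {value}")
```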
Chapter 5. Darwin Core Hour: Darwin Core in Practice: Introduction to the GBIF IPT
This webinar introduces the GBIF IPT, the most widely used tool to publish and share biodiversity data through the GBIF network. In a demonstration of how to publish a species occurrence dataset, special emphasis will be given to properly mapping fields to DwC terms while complying with vocabularies and satisfying GBIF's required and recommended terms. It will end with a brief look at the tool's roadmap, including some essential information all IPT administrators need to be aware of.
Chapter 6. Darwin Core Hour: Where am I, exactly? Darwin Core Georeferencing Terms
In this chapter of the Darwin Core Hour series, we will look in depth at two Darwin Core georeferencing terms that can be confusing when sharing location data: coordinateUncertaintyInMeters and coordinatePrecision. We will cover these specific Location terms, along with related terms needed to understand their definitions, importance, and uses. Finally, we will use these terms to demonstrate the process by which community feedback leads to improved Darwin Core documentation.
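The distinction between the two terms can be illustrated with a small back-of-the-envelope calculation. The sketch below assumes a spherical Earth (roughly 111,320 m per degree of latitude) and shows only the uncertainty contributed by coordinate rounding; real dwc:coordinateUncertaintyInMeters values must also account for GPS error, datum issues, and the extent of the locality itself.

```python
import math

# Sketch: coordinatePrecision describes the decimal precision of the
# coordinates as given (e.g. 0.01 degrees for two decimal places), while
# coordinateUncertaintyInMeters is the radius of the smallest circle
# containing the whole locality. This converts the former into a rough
# lower bound on the latter, assuming a spherical Earth.
METERS_PER_DEGREE_LAT = 111_320

def precision_to_meters(coordinate_precision, latitude_deg):
    """Ground distance spanned by one unit of coordinate precision."""
    dlat = coordinate_precision * METERS_PER_DEGREE_LAT
    dlon = (coordinate_precision * METERS_PER_DEGREE_LAT
            * math.cos(math.radians(latitude_deg)))
    # Diagonal of the precision cell: the rounding contribution alone.
    return math.hypot(dlat, dlon)

# Coordinates recorded to two decimal places at 45 degrees latitude:
print(round(precision_to_meters(0.01, 45.0)), "m")
```

The point of the exercise is that a small coordinatePrecision does not license a small coordinateUncertaintyInMeters, and vice versa; the webinar covers how the two should be populated together.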
Chapter 7a. Darwin Core Hour: Aggregators - a Darwin Core View
In this next part of the Darwin Core Hour series, we shift to the viewpoint of large biodiversity data aggregators such as GBIF, iDigBio, VertNet, ALA, and Canadensys. In this session, we welcome GBIF and iDigBio.
GBIF aggregates the world's biodiversity data from observations to checklists to biological specimen data. iDigBio aggregates specimen data, not observations or checklists. The Darwin Core standard plays a key role in the standardization of biodiversity data and in the design and implementation of strategies to improve data quality. Many people wonder what happens to their data after they provide it to an aggregator. Find out the answers to questions such as: What does the aggregator do to assess fitness of the data? What are the most common data issues seen? What does the aggregator do to make the data easier to find when searching its database? And how does sharing data with an aggregator benefit me as a collection manager, curator, researcher, or data scientist?
GBIF: free and open access to global biodiversity data
GBIF—the Global Biodiversity Information Facility—is an open-data research infrastructure funded by the world’s governments and aimed at providing anyone, anywhere access to data about all types of life on Earth. Coordinated through its Secretariat in Copenhagen, the GBIF network of member states and organizations—formally known as Participants—provides data-holding institutions around the world with common standards and open-source tools that enable them to share information about where and when species have been recorded. This knowledge derives from many sources, including everything from museum specimens collected in the 18th and 19th century to geotagged smartphone photos shared by amateur naturalists in recent days and weeks.
The GBIF network draws all these sources together through the use of the Darwin Core standard, which forms the basis of GBIF.org's index of hundreds of millions of species occurrence records. In the process, a number of checks and validation steps ensure data consistency and completeness of core elements. Publishers provide open access to their datasets using machine-readable Creative Commons licence designations, allowing scientists, researchers and others to apply the data in hundreds of peer-reviewed publications and policy papers each year. Many of these analyses—which cover topics from the impacts of climate change and the spread of invasive and alien pests to priorities for conservation and protected areas, food security and human health—would not be possible without these data.
iDigBio: aggregating and enhancing vouchered global biocollections data
iDigBio's scope is focussed on vouchered specimen data. In addition, we accept all attendant information related to the specimen including media, relevant genetic information, and trait data. iDigBio preserves the original data as sent to us by the data provider, and enhances the data via an index according to a set of data quality metrics for improved searchability for researchers and the data providers alike. Through iDigBio, any biocollection on the planet can extend the reach of their collections. We facilitate research use of biocollections data and strive to make it easy for researchers to show what's possible with the data.
Chapter 7b. Darwin Core Hour: The Aggregator's Viewpoint - (More Than Vert)Net
In this Darwin Core Hour we follow up on the series of aggregator perspectives with a view from VertNet. Though the taxonomic scope of VertNet as a biodiversity data aggregator is focussed on Chordates, VertNet serves a broader biodiversity data mobilization role that has no taxonomic or geographic boundaries. As an Associate Participant in GBIF, VertNet is also involved in a wide variety of other community services, including the development of, promotion of, and training on biodiversity data standards and data quality. In this webinar we will explore the series of services that VertNet provides, such as migration and data quality processes, as well as its unique extracting and searching capabilities, which allow content such as trait data (e.g., body mass and length) to be sought and retrieved. We will discuss VertNet's role in the broader data mobilization framework and how that relates to other biodiversity data sharing initiatives.
Chapter 8. Darwin Core Hour: A bite from the core - testing for data quality
In this upcoming episode of Darwin Core Hour (DCH), we head down under to join Arthur D. Chapman and Lee Belbin for a conversation about Darwin Core and Data Quality. We'll hear about collaborative efforts across the aggregator community (specifically GBIF, ALA, iDigBio, and VertNet) to harmonize data quality algorithms for downstream data use by researchers, developers, data providers, etc. These efforts are the result of the Biodiversity Information Standards (TDWG) Data Quality Interest Group (DQIG) work to develop a suite of common, shared tests and data format expectations for use by aggregators. Read more about these Tests and Assertions on the DQIG Wiki at https://github.com/tdwg/bdq/wiki/Task-Group-2-(Tests-and-Assertions)-of-the-'Data-Quality'-Interest-Group-seek-your-comments
Scientists, aggregators, custodians and curators of biodiversity data have varying understandings and requirements of 'quality' when it comes to biodiversity-related data. Some users are only interested in having accurate names of the organisms, others are focused on location, others on the date of collection, or on various others of the 150+ Darwin Core terms. Aggregators are primarily interested in delivering any data they can find, knowing that a subset will be useful to some users. They are also keen to present data in as comprehensive a form as possible, with documented quality, so that users can determine if the data they seek are fit for their use. Data custodians and curators want the data that they are responsible for to be as good as they can be for as many purposes as possible. To date, testing for these data quality requirements has been highly inconsistent and largely haphazard, and the documentation and annotations that accompany those data similarly inconsistent.
Task Group 2 of the TDWG Data Quality Interest Group has been working over the past two years to develop a set of core tests that can be consistently applied by all users, aggregators and data custodians. Once set this task, the group quickly realised that it was virtually impossible to run consistent tests for all the fields in the databases – or even all the fields (terms) documented in the Darwin Core standard. For this reason, a subset of Darwin Core terms – those that represent the what, where and when of the data – were the focus. A bite of the core, so to speak. There was also a recognition that such core tests could be implemented by most. Tests for all these were gathered from the key groups known to be conducting data quality tests, and a consistent set of tests was aligned in a template based on a set of principles. A second step has been to develop a consistent set of annotations or assertions about the data so that reporting on quality can be done in a consistent and stable manner. This will help users know that the data they obtain from one source have been tested in the same way and documented in the same way as data from another source.
Generic code is now being developed for each of the 110 or so data quality tests along with test data sets for ensuring consistent implementations. Institutions can take the core set of tests (or a subset of them) and use the generic code and test data set to implement a consistent in-house data quality ‘test and assertion’ regime. Different Database Management Systems use different software and structures and thus local implementations may initially produce different results. By running the tests against the reference test data set, users will be able to determine if their implementation produces the consistent result that it was meant to and modify the implementation accordingly.
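The reference-dataset mechanism described above can be sketched in a few lines. The example test, the record shape, and the expected results below are all illustrative assumptions standing in for one of the roughly 110 real tests; the point is the pattern of pairing each reference input with the assertion a conforming implementation must produce.

```python
# Sketch: validate a local implementation of one data quality test against
# a reference test dataset. The test and expected results are illustrative.
def validate_country_code(record):
    """Example test: dwc:countryCode must be two uppercase letters."""
    code = record.get("dwc:countryCode", "")
    return len(code) == 2 and code.isalpha() and code.isupper()

# Reference dataset: each input is paired with the expected assertion.
reference = [
    ({"dwc:countryCode": "AR"}, True),
    ({"dwc:countryCode": "Argentina"}, False),
    ({"dwc:countryCode": "ar"}, False),
    ({}, False),
]

# A local implementation is conformant when it reproduces every
# expected result on the reference data.
mismatches = [(rec, expected) for rec, expected in reference
              if validate_country_code(rec) != expected]
print("implementation consistent:", not mismatches)  # prints: implementation consistent: True
```

This is exactly the workflow sketched in the paragraph above: if a local implementation disagrees with the reference results, it is the implementation that gets modified, so that every institution's in-house regime produces comparable assertions.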
Several of the large data aggregators, including GBIF, the Atlas of Living Australia and iDigBio have agreed to implement the core set of tests and assertions and are in the process of implementing them now.
TDWG2017. Standards in Action: Darwin Core Hour
Darwin Core (Wieczorek et al. 2012) has become broadly used for biodiversity data sharing since its ratification as a standard in 2009. Despite its popularity, or perhaps because of it, questions about Darwin Core, its definitions, and its applications continue to arise. However, no easy mechanism previously existed for users of the standard to ask their questions and to have them answered and documented in a strategic and timely way. In order to close this gap, a double initiative was developed: the Darwin Core Hour (DCH) (Darwin Core Hour Team 2017a) and the Darwin Core Questions & Answers Site (Darwin Core Hour Team 2017b). The Darwin Core Hour (Zermoglio et al. 2017) is a webinar series in which particular topics concerning the Darwin Core standard and its use are presented by more experienced and vested community members for the benefit of, and discussion amongst, the community as a whole. All webinars are recorded for broader accessibility. The Darwin Core Questions & Answers Site is a GitHub repository where questions from the community are submitted as issues, then discussed and answered in the repository as a means of building up documentation. These two initiatives are tightly linked and feed each other (Fig. 1). Questions from the community, some arising during the webinars, turn into issues and are then answered and shaped into documentation, while some questions give birth to new webinar topics for further discussion. So far, this double-initiative model has proved useful in bringing together communities from different geographic locales, levels of expertise, and degrees of involvement in open dialogue for the collaborative evolution of the standard. At the time of this presentation, the group has produced nine webinar sessions and provided a corpus of documentation on several topics.
We will discuss its current status, origins and potential of the double-initiative model, community feedback, and future directions, in addition to inviting the TDWG community to join efforts to keep the Darwin Core standard "in action".
TDWG2017. Darwin Core Hour Extensions
Special edition of the Darwin Core Hour given at TDWG 2017 as part of a larger presentation on the Darwin Core Hour in general, Standards in Action: Darwin Core Hour. This mini-hour (15 minutes in length) explains what Darwin Core extensions are and how they work in biodiversity data publishing.
Chapter 9. Darwin Core Hour: Kurator Web - for Cleaner Biodiversity Data
When we think about biodiversity data and its use, we often immediately wonder about the data quality and how to improve it. The Kurator project is one of the initiatives addressing this issue, and aims at building data quality control and enhancement tools that can be readily used by people with different technical skill levels. One such tool is the Kurator Web application. This tool allows users to execute web-based data quality improvement workflows on biodiversity data, especially on data shared using Darwin Core terms, without the need for programming expertise. In this webinar we will demonstrate the use of Kurator Web to run pre-existing data quality workflows, some of which include: aligning datasets with Darwin Core terms, comparing unique values in datasets with controlled vocabularies, and providing suggestions for improvements for existing data. Finally, we will provide some context on how these tools can integrate with broader workflows and some perspective on the future of biodiversity data quality.
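One of the workflow steps mentioned above, aligning datasets with Darwin Core terms, can be sketched as a normalized comparison of column headers against the term list. This is not Kurator's actual implementation, just a minimal illustration of the idea; the term list is a small illustrative subset of the standard.

```python
# Sketch: map a dataset's column headers to Darwin Core terms by
# normalized comparison. The term list is a small illustrative subset.
DWC_TERMS = ["scientificName", "eventDate", "decimalLatitude",
             "decimalLongitude", "basisOfRecord", "catalogNumber"]

def align_headers(headers):
    """Map each header to a Darwin Core term when normalized forms match."""
    index = {t.lower(): t for t in DWC_TERMS}

    def norm(h):
        return h.strip().lower().replace(" ", "").replace("_", "")

    return {h: index.get(norm(h)) for h in headers}

print(align_headers(["Scientific Name", "event_date", "collector"]))
```

Headers that come back unmapped (like `collector` here, which in Darwin Core would need a judgment call between terms such as dwc:recordedBy) are where tools like Kurator add value beyond simple string matching, by suggesting candidate terms for a human to confirm.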
Chapter 10. Darwin Core Hour: Audubon Core and 3D Biodiversity Data: Metadata, Practice, and Unification of Efforts
A growing mode of digitization of natural history collection objects is 3D digitization, which includes three main acquisition techniques: surface scanning (structured light or laser scanners), volumetric scanning (microCT or MRI), and photogrammetry (structure from motion). There is now burgeoning interest in and tremendous need for describing 3D data files with standard vocabularies in the interest of promoting broad accessibility and long-term digital preservation. Audubon Core is an existing vocabulary and extension to Darwin Core that is used to describe digital media files representing natural history objects. It is not an entirely new vocabulary; many of its terms are borrowed from Dublin Core, Darwin Core, and others. It is also intended to describe different kinds of digital data representing different creation methods and file formats. We overview several different 3D data collection modalities, specifying the details needed to understand how 3D data were generated and processed, and we investigate the utility of Audubon Core for describing these 3D modalities. Questions we ask are: which existing terms can be used to describe new 3D modalities; whether new terms are needed; whether certain 3D modalities need specific terms not applicable to other modalities; which 3D data formats should be emphasized for preservation and access; and how to pursue formally acquiring new terms, either through the creation of new vocabularies or by extending existing ones.
Chapter 11. Darwin Core Hour: Brainstorming – Inviting the Community to Plan for Next Year
Hi there! Our last Darwin Core Hour of the year is coming! We would like to take this opportunity to invite you all to an open conversation. During this webinar we will briefly go through the experience of putting together a Darwin Core Hour, i.e., how it works: from the inside. We will assess the topics covered and the ones yet to come, and we will put together a plan for next year. We would love to have you participate, bring your input and ideas for making this initiative grow and address the interests and concerns of the community.
Chapter 12. Darwin Core Hour: Making DNA and tissue collections available by using the GGBN extensions with IPT
In this Darwin Core Hour we will give a brief overview of the Global Genome Biodiversity Network (GGBN) and the GGBN Data Standard. This standard covers facts about DNA and tissue samples and complements Darwin Core and ABCD. It can be used with both and is not a stand-alone solution. GGBN is dedicated to DNA and tissue collections and makes use of the GBIF infrastructure to reference the underlying specimens. We will demonstrate in detail how to use the GGBN extensions with the IPT and how the result looks in the GGBN Data Portal. Today more than 400,000 DNA and tissue samples and 200,000 voucher specimens are available through the GGBN portal. Specify has included an automated export for the GGBN extensions in its latest release, and many other partners have also started to implement the standard in their systems (e.g., Arctos); more than 1.7 million new sample records will be made available in 2018.
Chapter 13. The Problem of Time: Dealing with Paleontological and Zooarchaeological Specimens in Darwin Core
The temporality of specimens is a quintessential piece of information that undeniably adds research value. Part of what makes specimens so invaluable is the fact that they represent life in a certain form at a particular place and time. However, there are some complications when it comes to reporting and comparing collection dates and the ages or chronology of ‘modern’ specimens to those from paleontological or zooarchaeological relevant time frames. This Darwin Core Hour aims to take a deeper look at some of these complexities, to begin community discussions of how these important data should be portrayed in Darwin Core, and to provide a starting point for these discussions by introducing the Chronometric extension for Darwin Core that is currently under development.
Chapter TBD. Darwin Core Hour: Challenges in Combining Neontological and Paleontological Data (tentative title)