PIDs Persistent Identifiers

Ondřej Košarko edited this page Jun 12, 2017 · 3 revisions

Persistent identifiers

We summarise policies and our usage of PIDs below.

What should a PID be resolved to (bits, web page, metadata)?

By default, a PID resolves to a landing page (web page), where a user can read more information about the data (metadata) and download the data (if possible). With a specific query, it resolves to "pure metadata" instead; this is intended mainly for automation.

Technical details

Should it point to a state in time or can it change (what about new versions, new formats)?

A PID should point to a state of the data in time. "Data" here means primarily the abstract information, so a trivial change in format is OK. A non-trivial format change is borderline, and it is better to assign it a new PID. A format change that changes the information in the data, like exporting PDT 2.5 from PML to the CoNLL format (which can only store a subset of the information), clearly requires a new PID.

Who should be able to change the actual data where a PID points to?

Only repository administrators. Persistence is very important. That being said, not all changes are equal, and maybe submitters (and definitely Editors) could add new files (like readmes, documentation) and change metadata. Submitters' changes should still go through the Editors.

We should also simplify creating new versions (derived records) from existing records (and specify the relation of records by dc.relation: subset, version-of, supersedes, etc.).

What should be the granularity (file, submissions, word, it depends)?

Submission. This way submitters can decide the granularity. They submit (and describe) what should be preserved with a PID (a corpus, a sub-corpus of an existing corpus, one document, or even a special sentence).

PIDs could also be used for pointing to words in a corpus, for instance, but there are significantly more effective ways to do that.

New handle subsystem implementation in DSpace

New handle functionality was implemented in DSpace. This includes the following:

Per community handle prefix

Functionality to define different handle prefixes for different communities was added. This can be useful for instance in case of merging multiple repositories into one, where the former repositories are transformed into communities.

Example configuration:

# per community pid configurations for pid prefixes of the format:
# community=<community ID>,prefix=<prefix>,type=<local|epic>,canonical_prefix=<URL of handle>,subprefix=<subprefix>
# multiple configurations can be given separated by semicolon
# default configuration should have asterisk as the community ID
# subprefix is only used for local handles = community=*,prefix=11858,type=epic,canonical_prefix=,subprefix=1

The subprefix keyword is optional and can be used to generate handles of the form: <prefix>/<subprefix>-<handle>.
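The configuration format described in the comments above can be illustrated with a minimal parsing sketch. This is not the DSpace implementation: the names `PidConfig`, `parse_pid_configs`, `config_for` and `format_local_handle` are hypothetical, the community ID `42` is made up, and the sketch assumes that values contain no commas or semicolons. It only models local handles, since the configuration comment states that the subprefix applies to those.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PidConfig:
    community: str                   # community ID, or "*" for the default
    prefix: str
    type: str                        # "local" or "epic"
    canonical_prefix: str
    subprefix: Optional[str] = None  # optional keyword


def parse_pid_configs(value: str) -> Dict[str, PidConfig]:
    """Parse 'k=v,k=v;k=v,k=v' into configurations keyed by community ID."""
    configs: Dict[str, PidConfig] = {}
    for chunk in value.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the first '=' only, so empty values such as
        # 'canonical_prefix=' are kept as empty strings.
        fields = {}
        for pair in chunk.split(","):
            key, _, val = pair.partition("=")
            fields[key.strip()] = val.strip()
        cfg = PidConfig(
            community=fields["community"],
            prefix=fields["prefix"],
            type=fields["type"],
            canonical_prefix=fields.get("canonical_prefix", ""),
            subprefix=fields.get("subprefix") or None,
        )
        configs[cfg.community] = cfg
    return configs


def config_for(configs: Dict[str, PidConfig], community_id: str) -> PidConfig:
    """Fall back to the default ('*') entry when a community has no own entry."""
    return configs.get(community_id, configs["*"])


def format_local_handle(cfg: PidConfig, local_id: str) -> str:
    """Build <prefix>/<subprefix>-<handle>, or <prefix>/<handle> without subprefix."""
    if cfg.subprefix:
        return f"{cfg.prefix}/{cfg.subprefix}-{local_id}"
    return f"{cfg.prefix}/{local_id}"


if __name__ == "__main__":
    raw = ("community=*,prefix=11234,type=local,canonical_prefix=,subprefix=5;"
           "community=42,prefix=11234,type=local,canonical_prefix=")
    configs = parse_pid_configs(raw)
    print(format_local_handle(config_for(configs, "7"), "CESILKO-URL"))
    print(format_local_handle(config_for(configs, "42"), "CESILKO-URL"))
```

Keying the parsed entries by community ID keeps the fallback to the asterisk entry a single dictionary lookup.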

With #766, the default (community=*) is also used for new collections and communities.

In the LINDAT/CLARIN project, the following subprefixes are used for new items starting from June 17th, 2014:

| *Subprefix* | *Description*         |
|  1          | Common submissions    |
|  5          | Weblicht Web Services |
|  6          | Demos                 |

External handles

New functionality to store links to external resources in the handle table (so-called external handles) was added. It is now possible to have records that do not point to an object in the database, but rather to a defined absolute URL.

This serves two main purposes:

  • it can be used as a means to point multiple handles at the same record if needed (for instance, if an item is a duplicate of some existing item and has to be deleted, or if the handle prefix changes but old handles should remain resolvable)
  • it can be used to point to external non-data entities such as services

The handle table was extended with a new url column:

                 Table "public.handle"
      Column      |          Type           | Modifiers 
 handle_id        | integer                 | not null
 handle           | character varying(256)  | 
 resource_type_id | integer                 | 
 resource_id      | integer                 | 
 url              | character varying(2048) | 

The following record is an example of an external handle pointing at an external resource:

 handle_id |             handle             | resource_type_id | resource_id |                                     url                                     
       288 | 11234/5-CESILKO-URL            |                  |             |

The following record is an example of an external handle pointing at an existing handle:

 handle_id |             handle             | resource_type_id | resource_id |                                     url                                     
       123 | 11858/00-097C-0000-0001-4870-7 |                  |             |
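How a resolver might treat the two kinds of rows can be sketched as follows. This is an illustration under assumptions, not the DSpace code: `HandleRow`, `resolve_target` and `landing_page_for` are hypothetical names, and the `example.org` URLs and the numeric IDs are placeholders. The only rule taken from the text above is that a row with a non-empty `url` points to that absolute URL rather than to an object in the database.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HandleRow:
    """One row of the handle table, using the column names shown above."""
    handle_id: int
    handle: str
    resource_type_id: Optional[int] = None
    resource_id: Optional[int] = None
    url: Optional[str] = None


def resolve_target(row: HandleRow,
                   landing_page_for: Callable[[int, int], str]) -> str:
    """Return the URL a handle should resolve to."""
    if row.url:
        # External handle: no database object, just an absolute URL.
        return row.url
    if row.resource_type_id is None or row.resource_id is None:
        raise LookupError(f"handle {row.handle} has no target")
    # Internal handle: delegate to whatever builds the landing page URL.
    return landing_page_for(row.resource_type_id, row.resource_id)


if __name__ == "__main__":
    def landing_page_for(type_id: int, obj_id: int) -> str:
        # Placeholder URL scheme, assumed for this sketch only.
        return f"https://example.org/object/{type_id}/{obj_id}"

    internal = HandleRow(1, "11234/1-EXAMPLE", resource_type_id=2, resource_id=10)
    external = HandleRow(2, "11234/5-EXAMPLE-URL", url="https://example.org/service")
    print(resolve_target(internal, landing_page_for))
    print(resolve_target(external, landing_page_for))
```

Checking `url` first means an external handle never needs a matching database object, which is the property both use cases above rely on.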

Changing handle prefix

Although a tool for changing the handle prefix already existed in DSpace as a command line tool (bin/dspace), the functionality was reimplemented and corrected, and a GUI was added to facilitate this task and prevent user-made mistakes. Old handles are preserved in item metadata and, if needed, they are also archived, i.e. kept in the handle table but pointed to the new handles via the external handle functionality mentioned above.
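The archiving step can be pictured with a small in-memory sketch. It is not the DSpace implementation: `change_handle_prefix` is a hypothetical function, rows are plain dictionaries keyed by the handle table columns, and the `http://hdl.handle.net/` resolver URL used in the example is an assumed canonical prefix. The sketch only rewrites handles that point to database objects and does not model the copy kept in item metadata.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def change_handle_prefix(rows: List[Row], old_prefix: str, new_prefix: str,
                         canonical_prefix: str,
                         archive_old: bool = True) -> List[Row]:
    """Return a new row list with old_prefix replaced by new_prefix.

    When archive_old is set, each old handle is kept as an external
    handle whose url points at the new handle.
    """
    result: List[Row] = []
    archived: List[Row] = []
    next_id = max((r["handle_id"] for r in rows), default=0) + 1
    for row in rows:
        handle = row["handle"]
        is_internal = row.get("url") is None
        if is_internal and handle.startswith(old_prefix + "/"):
            # Keep the suffix (including the slash), swap only the prefix.
            new_handle = new_prefix + handle[len(old_prefix):]
            result.append({**row, "handle": new_handle})
            if archive_old:
                archived.append({
                    "handle_id": next_id,
                    "handle": handle,
                    "resource_type_id": None,
                    "resource_id": None,
                    "url": canonical_prefix + new_handle,
                })
                next_id += 1
        else:
            result.append(dict(row))
    return result + archived


if __name__ == "__main__":
    table = [{"handle_id": 1, "handle": "11858/1-ABC",
              "resource_type_id": 2, "resource_id": 10, "url": None}]
    for r in change_handle_prefix(table, "11858", "11234",
                                  "http://hdl.handle.net/"):
        print(r)
```

Returning a new list instead of mutating the input makes it easy to review the planned changes before anything is written back.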

New GUI for managing handles

A new GUI for managing handles has been developed. It enables users to conveniently browse through existing handles, add new external handles, edit existing external handles, and change the handle prefix.

Screenshot of the listing of existing handles:


Screenshot of form for changing handle prefix:

Change handle prefix

Screenshot of form for editing external handle:

Edit handle
