RFC Prefix normalised PubMed ids with pmid: #25

Open
@lnielsen

The following PubMed ID is not correctly detected because it is also a valid EAN8 number:
https://www.ncbi.nlm.nih.gov/pubmed/?term=26037202

>>> import idutils
>>> idutils.is_pmid('26037202')
<_sre.SRE_Match at 0x10b774608>
>>> idutils.detect_identifier_schemes('26037202')
['ean8']
>>> idutils.detect_identifier_schemes('pmid:26037202')
['pmid']
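
To illustrate why the bare number is ambiguous, here is a minimal sketch of the standard EAN-8 check-digit rule (weights 3, 1, 3, ... over the first seven digits). The function name `is_valid_ean8` is hypothetical and is not claimed to be how idutils implements its own EAN-8 check.

```python
import re


def is_valid_ean8(value: str) -> bool:
    """Return True if value is 8 ASCII digits with a correct EAN-8 check digit.

    Hypothetical helper for illustration only, not part of idutils.
    """
    if re.fullmatch(r"[0-9]{8}", value) is None:
        return False
    digits = [int(c) for c in value]
    # Weights alternate 3, 1, 3, ... across the first seven digits.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits[:7]))
    # The eighth digit must bring the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10 == digits[7]


# The PubMed ID from this issue happens to satisfy the checksum.
print(is_valid_ean8("26037202"))
```

For `26037202` the weighted sum of the first seven digits is 38, so the expected check digit is 2, which matches the last digit. Any purely numeric PMID with eight digits has roughly a one in ten chance of colliding in this way.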

I think the main problem arises when scheme detection is used together with normalisation:

>>> idutils.normalize_pmid('pmid:26037202')
'26037202'
>>> idutils.detect_identifier_schemes(idutils.normalize_pmid('pmid:26037202'))
['ean8']
>>> idutils.detect_identifier_schemes('pmid:26037202')
['pmid']

I propose that we change PubMed normalisation to include the pmid: prefix, so that the following holds true:

idutils.detect_identifier_schemes(idutils.normalize_pmid('pmid:26037202')) == idutils.detect_identifier_schemes('pmid:26037202')

This is not strictly correct, since a PubMed ID proper is just the integer, but having bare integers as identifiers is a bad idea anyway because they are ambiguous across schemes.
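
A minimal sketch of what the proposed normalisation could look like, independent of idutils internals. The name `normalize_pmid_prefixed` and the regular expression are assumptions made for illustration; the actual change would go into `idutils.normalize_pmid`.

```python
import re

# Hypothetical pattern: an optional "pmid:" prefix followed by ASCII digits.
_PMID_RE = re.compile(r"^\s*(?:pmid:\s*)?([0-9]+)\s*$", re.IGNORECASE)


def normalize_pmid_prefixed(value: str) -> str:
    """Normalise a PubMed ID to the prefixed form 'pmid:<digits>'.

    Illustrative sketch only, not the idutils implementation.
    """
    match = _PMID_RE.match(value)
    if match is None:
        raise ValueError(f"not a PubMed ID: {value!r}")
    return "pmid:" + match.group(1)


# Prefixed and bare inputs both normalise to the same prefixed value.
print(normalize_pmid_prefixed("pmid:26037202"))
print(normalize_pmid_prefixed("26037202"))
```

Because the normalised value keeps its prefix, normalising is idempotent and a later scheme detection pass sees an unambiguous `pmid:` string instead of a bare integer.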
