Description
The following PubMed ID is not detected as a PMID by detect_identifier_schemes because it is also a valid EAN8 number:
https://www.ncbi.nlm.nih.gov/pubmed/?term=26037202
>>> import idutils
>>> idutils.is_pmid('26037202')
<_sre.SRE_Match at 0x10b774608>
>>> idutils.detect_identifier_schemes('26037202')
['ean8']
>>> idutils.detect_identifier_schemes('pmid:26037202')
['pmid']
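To show why '26037202' passes as EAN-8, here is a minimal, standalone sketch of the EAN-8 check-digit rule. The function name is_valid_ean8 is hypothetical and is not the idutils implementation; it only applies the standard rule (weights 3, 1, 3, 1, 3, 1, 3 on the first seven digits).

```python
import re


def is_valid_ean8(value: str) -> bool:
    """Hypothetical helper: True if value is 8 digits with a correct EAN-8 check digit."""
    if not re.fullmatch(r"\d{8}", value):
        return False
    digits = [int(c) for c in value]
    # Weight the first seven digits 3, 1, 3, 1, ... from the left.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits[:7]))
    check = (10 - total % 10) % 10
    return check == digits[7]


# 2*3 + 6 + 0*3 + 3 + 7*3 + 2 + 0*3 = 38, so the check digit is 2, which matches.
print(is_valid_ean8("26037202"))
```

For any fixed first seven digits exactly one of the ten possible last digits is a valid check digit, so roughly one in ten arbitrary 8-digit numbers, PMIDs included, will also validate as EAN-8.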
I think the main problem arises when scheme detection is used together with normalisation:
>>> idutils.normalize_pmid('pmid:26037202')
'26037202'
>>> idutils.detect_identifier_schemes(idutils.normalize_pmid('pmid:26037202'))
['ean8']
>>> idutils.detect_identifier_schemes('pmid:26037202')
['pmid']
I would propose that we change PubMed normalisation to include the pmid: prefix, so that the following holds true:
idutils.detect_identifier_schemes(idutils.normalize_pmid('pmid:26037202')) == idutils.detect_identifier_schemes('pmid:26037202')
This is not strictly correct, since the PMID itself is just the number, but bare integers are a bad idea as identifiers anyway: they can be valid in several schemes at once.
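A minimal sketch of the proposed behaviour, using hypothetical stand-ins rather than the real idutils functions: normalize_pmid_prefixed always returns the "pmid:<digits>" form, and toy_detect_schemes is a deliberately simplified detector. With the prefix kept, the round-trip property above holds.

```python
import re
from typing import List

# Hypothetical stand-ins for illustration only; NOT the idutils implementation.
PMID_PATTERN = re.compile(r"^(?:pmid:\s*)?(\d+)$", re.IGNORECASE)


def normalize_pmid_prefixed(value: str) -> str:
    """Proposed behaviour: normalise any PMID form to 'pmid:<digits>'."""
    match = PMID_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not a PMID: {value!r}")
    return "pmid:" + match.group(1)


def toy_detect_schemes(value: str) -> List[str]:
    """Toy detector: a 'pmid:' prefix is unambiguous.

    Simplification: any bare 8-digit string is reported as EAN-8,
    ignoring the EAN-8 check digit.
    """
    if value.lower().startswith("pmid:"):
        return ["pmid"]
    if re.fullmatch(r"\d{8}", value):
        return ["ean8"]
    return []


raw = "pmid:26037202"
# The invariant from the proposal: normalising must not change the detected scheme.
assert toy_detect_schemes(normalize_pmid_prefixed(raw)) == toy_detect_schemes(raw)
```

Because the prefixed form is also accepted as input, this normalisation is idempotent: normalising "pmid:26037202" again yields the same string.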