Github mirror of "analytics/wmde/WD/WD_identifierLandscape" - our actual code is hosted with Gerrit (please see for contributing)

Wikidata Identifier Landscape: Public Datasets

They are hosted in:

- WD_ExtIdentifiers_UpdateInfo.csv – The timestamp of the latest update. The dashboard will be updated manually until the WD JSON dump copy in HDFS is productionized (Phab T209655).
- WD_ExternalIdentifiers_Co-Occurence.csv – A symmetric identifier x identifier co-occurrence matrix.
- WD_ExternalIdentifiers_DataFrame.csv – A list of all external identifiers with (a) their P numbers, (b) their labels, (c) the classes to which they belong (in the sense of P31), and (d) the labels of those classes.
- WD_ExternalIdentifiers_JaccardDistance.csv – A symmetric identifier x identifier Jaccard distance matrix.
- WD_ExternalIdentifiers_Stats.csv – Essential statistics on WD external identifier usage.
- WD_ExternalIdentifiers_Usage.csv – Essentially the same data set as WD_ExternalIdentifiers_DataFrame.csv, except that it also includes the identifier usage statistics.
- WD_ExternalIdentifiers_tsneMap.csv – The 2D t-SNE solution coordinates.
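
For readers unfamiliar with how a co-occurrence matrix relates to a Jaccard distance matrix, the sketch below shows one common way to derive the latter from the former. It is an illustration only, not the code used to produce these data sets. It assumes that each diagonal cell holds the number of items that use the identifier and each off-diagonal cell holds the number of items that use both identifiers; the actual layout of the published files may differ. The function name and the toy matrix are hypothetical.

```python
def jaccard_distance(cooc):
    """Return a Jaccard distance matrix from a symmetric co-occurrence matrix.

    Assumption (not confirmed by the data set description):
    cooc[i][i] is the number of items using identifier i, and
    cooc[i][j] is the number of items using both identifiers i and j.
    """
    n = len(cooc)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # distance of an identifier to itself stays 0
            # |A union B| = |A| + |B| - |A intersect B|
            union = cooc[i][i] + cooc[j][j] - cooc[i][j]
            similarity = cooc[i][j] / union if union > 0 else 0.0
            dist[i][j] = 1.0 - similarity
    return dist


# Toy 3 x 3 matrix with made-up counts, for illustration only.
toy = [
    [4, 2, 0],
    [2, 3, 1],
    [0, 1, 2],
]
dist = jaccard_distance(toy)
```

Two identifiers that never appear on the same item get a distance of 1, and the resulting matrix is symmetric because the input is.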

--- NOTES:

(1) Feedback should be sent to:
(2) Wikidata Identifier Landscape is produced by Goran S. Milovanovic working as a contractor for WMDE, via a contract established between Data Kolektiv, Belgrade and WMDE, Berlin.
