
Opportunities Citation Resolution

wittkensis edited this page May 18, 2026 · 1 revision

Status: Implemented into schema.

8 new tables + 1 column modification:

  • artifact_identifiers: N:1 alias concordance (any identifier resolves to a P-number)
  • artifact_identifier_evidence: standard evidence pattern for disputed identifications
  • artifact_identifier_decisions: standard decision pattern for contested ID assignments
  • publications: structured bibliographic registry with series membership and supersession chains
  • publication_authors: structured authorship linking publications to the scholars table
  • artifact_editions: artifact-publication bridge with an is_current_edition consensus flag and a per-artifact supersedes chain
  • artifact_edition_evidence: standard evidence pattern for publication links
  • artifact_edition_decisions: audit trail for the is_current_edition designation
  • annotation_runs: 1 new column, publication_id (nullable FK to publications, alongside the existing publication_ref freetext)

4 new migration steps added.
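A minimal SQLite sketch of the three core tables and the annotation_runs change may make the list above more concrete. Only identifier_normalized, reference_normalized, edition_type, pages, is_current_edition, supersedes_id, publication_id and publication_ref are taken from this page. Every other column name (id, artifact_id, identifier_type, identifier_raw, short_title, year) is a hypothetical placeholder, and the evidence, decision and publication_authors tables are left out.

```python
import sqlite3

# Trimmed-down, hypothetical DDL. Column names not mentioned on this wiki
# page (id, artifact_id, identifier_type, identifier_raw, short_title, year)
# are placeholders, not the project's actual schema.
DDL = """
CREATE TABLE publications (
    id            INTEGER PRIMARY KEY,
    short_title   TEXT NOT NULL,                        -- e.g. RIME 4 (assumed)
    year          INTEGER,                              -- assumed
    supersedes_id INTEGER REFERENCES publications(id)   -- series-level supersession
);

CREATE TABLE artifact_identifiers (
    id                    INTEGER PRIMARY KEY,
    artifact_id           TEXT NOT NULL,   -- P-number, e.g. P363653
    identifier_type       TEXT NOT NULL,   -- museum_no, excavation_no, publication, accession, ARK
    identifier_raw        TEXT NOT NULL,   -- string as found in the source
    identifier_normalized TEXT NOT NULL    -- lookup key for Path A
);
-- Non-unique on purpose: one normalized string may map to several P-numbers.
CREATE INDEX idx_identifier_normalized
    ON artifact_identifiers(identifier_normalized);

CREATE TABLE artifact_editions (
    id                   INTEGER PRIMARY KEY,
    artifact_id          TEXT NOT NULL,
    publication_id       INTEGER NOT NULL REFERENCES publications(id),
    reference_normalized TEXT NOT NULL,    -- lookup key for Path B
    edition_type         TEXT,             -- hand_copy, full_edition, digital_edition
    pages                TEXT,
    is_current_edition   INTEGER NOT NULL DEFAULT 0,
    supersedes_id        INTEGER REFERENCES artifact_editions(id)  -- per-artifact chain
);
CREATE INDEX idx_reference_normalized
    ON artifact_editions(reference_normalized);

-- Stand-in for the existing table, reduced to its freetext citation column.
CREATE TABLE annotation_runs (
    id              INTEGER PRIMARY KEY,
    publication_ref TEXT
);

-- The one column modification: a nullable FK next to the freetext.
ALTER TABLE annotation_runs
    ADD COLUMN publication_id INTEGER REFERENCES publications(id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

columns = [row[1] for row in conn.execute("PRAGMA table_info(annotation_runs)")]
print(columns)  # ['id', 'publication_ref', 'publication_id']
```

The identifier index is deliberately non-unique: the same normalized string can legitimately point at several P-numbers, which is exactly what the ambiguous-identifier query looks for.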

Schema Update Summary

Changes

  • 8 new tables, 1 new column on annotation_runs
  • Table count: ~45 (pre-citation) -> ~53 (with citation resolution)
  • 3 core tables + 2 supporting + 3 trust (evidence/decision), following existing patterns exactly

Tradeoffs

  • Dual resolution paths? Yes. Publication references are stored in both artifact_identifiers (fast single-index lookup) and artifact_editions (rich metadata: page numbers, edition_type, currency). The duplication is accepted for query flexibility.
  • Two supersession levels? Yes. publications.supersedes_id handles series-level supersession ("RIME 4 supersedes VAB 6"); artifact_editions.supersedes_id handles per-artifact granularity, since one publication may supersede another for some texts but not all.
  • Freetext parsing? Partial coverage initially. CDLI publication_history is inconsistent freetext, so import what regex can parse, preserve the raw string, and flag unparsed entries. No data loss.
  • publication_id vs publication_ref? Both: a nullable FK is added alongside the existing freetext. Migration is incremental: publication_id stays NULL until matched, and the freetext remains as a fallback for unresolved citations.
  • Flat columns stay? Yes. artifacts.museum_no and excavation_no remain for display convenience, but artifact_identifiers is authoritative for resolution queries.
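The freetext-parsing policy (parse what a regex can, keep the raw string, flag the rest) can be sketched in a few lines of Python. The ";" separator and the "series siglum plus number" pattern are assumptions chosen for illustration; they are not a description of how CDLI actually formats publication_history, and parse_publication_history / ParsedCitation are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern: a capitalised series siglum followed by a numeric
# reference such as "RIME 4.3.6.1" or "VAB 6, 1". Real CDLI strings vary far more.
CITATION_RE = re.compile(r"^(?P<series>[A-Z][A-Za-z]*)\s+(?P<ref>\d+(?:\s*[.,]\s*\d+)*)$")


@dataclass
class ParsedCitation:
    raw: str                   # always kept, whether or not parsing succeeded
    series: Optional[str]
    reference: Optional[str]
    parsed: bool               # False means: flag for manual curation


def parse_publication_history(field: str) -> List[ParsedCitation]:
    """Split on ';' (an assumed separator) and parse each piece independently."""
    results: List[ParsedCitation] = []
    for piece in field.split(";"):
        raw = piece.strip()
        if not raw:
            continue
        match = CITATION_RE.match(raw)
        if match:
            # Unify "6, 1" and "6.1" so both forms share one reference key.
            reference = re.sub(r"\s*[.,]\s*", ".", match.group("ref"))
            results.append(ParsedCitation(raw, match.group("series"), reference, True))
        else:
            # Unparsed: nothing is discarded, the raw string is stored and flagged.
            results.append(ParsedCitation(raw, None, None, False))
    return results


for item in parse_publication_history("VAB 6, 1; RIME 4.3.6.1; unpublished, photo only"):
    print(item.parsed, item.series, item.reference, repr(item.raw))
```

Because every piece yields a row, coverage can be raised later by widening the pattern and re-running only the rows with parsed = False.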

What This Enables

Two resolution paths for any citation:

  • Path A (fast): artifact_identifiers WHERE identifier_normalized = 'rime 4.3.6.1' -> P363653
  • Path B (rich): artifact_editions WHERE reference_normalized = 'rime 4.3.6.1' -> P363653 + edition_type, pages, currency
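Both paths hinge on one shared normalization key. The sketch below assumes a hypothetical normalize() that only trims, collapses whitespace and case-folds (the project's real normalization rules are not specified here), an artifact_id column holding the P-number, and trimmed two-table schemas. The rows mirror the RIME 4.3.6.1 -> P363653 example; pages and the currency flag are placeholder values.

```python
import re
import sqlite3


def normalize(citation: str) -> str:
    """Hypothetical normalization: trim, collapse whitespace, case-fold."""
    return re.sub(r"\s+", " ", citation.strip()).casefold()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact_identifiers (
    artifact_id TEXT, identifier_type TEXT, identifier_normalized TEXT);
CREATE TABLE artifact_editions (
    artifact_id TEXT, reference_normalized TEXT, edition_type TEXT,
    pages TEXT, is_current_edition INTEGER);
""")
# Rows mirror the RIME 4.3.6.1 -> P363653 example; pages and the flag are placeholders.
conn.execute("INSERT INTO artifact_identifiers VALUES ('P363653', 'publication', 'rime 4.3.6.1')")
conn.execute("INSERT INTO artifact_editions VALUES ('P363653', 'rime 4.3.6.1', 'full_edition', NULL, 0)")


def resolve_fast(conn: sqlite3.Connection, citation: str) -> list:
    """Path A: one indexed lookup that returns P-numbers only."""
    rows = conn.execute(
        "SELECT DISTINCT artifact_id FROM artifact_identifiers "
        "WHERE identifier_normalized = ? ORDER BY artifact_id",
        (normalize(citation),),
    )
    return [row[0] for row in rows]


def resolve_rich(conn: sqlite3.Connection, citation: str) -> list:
    """Path B: same key, but edition metadata comes back with the P-number."""
    return conn.execute(
        "SELECT artifact_id, edition_type, pages, is_current_edition "
        "FROM artifact_editions WHERE reference_normalized = ?",
        (normalize(citation),),
    ).fetchall()


print(resolve_fast(conn, "RIME  4.3.6.1"))
print(resolve_rich(conn, "rime 4.3.6.1"))
```

Path A returns a list rather than a single value, so an ambiguous identifier surfaces as more than one P-number instead of being silently collapsed.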

Then, starting from the resolved P-number, the full edition history follows the recursive supersedes_id chain:

VAB 6.1 (1907, hand_copy) -> RIME 4.3.6.1 (1990, full_edition) -> Oracc/OBMC (2010, digital_edition)
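Walking the per-artifact chain can be expressed as a recursive CTE over artifact_editions.supersedes_id. In this sketch, reference and year are stored directly on the edition row for brevity (in the real schema the year presumably lives on publications), the column names other than supersedes_id and edition_type are assumptions, and the depth cap is a guard against accidental cycles. The three rows reproduce the VAB 6.1 -> RIME 4.3.6.1 -> Oracc/OBMC chain above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact_editions (
    id INTEGER PRIMARY KEY,
    artifact_id TEXT NOT NULL,
    reference TEXT NOT NULL,      -- display form (assumed column)
    year INTEGER,                 -- simplification: would normally come from publications
    edition_type TEXT,
    supersedes_id INTEGER REFERENCES artifact_editions(id)
);
INSERT INTO artifact_editions VALUES
    (1, 'P363653', 'VAB 6.1',      1907, 'hand_copy',       NULL),
    (2, 'P363653', 'RIME 4.3.6.1', 1990, 'full_edition',    1),
    (3, 'P363653', 'Oracc/OBMC',   2010, 'digital_edition', 2);
""")

# Start at the edition that supersedes nothing and follow the rows that
# supersede it; depth < 50 stops runaway recursion if a cycle slips in.
HISTORY_SQL = """
WITH RECURSIVE chain(id, reference, year, edition_type, depth) AS (
    SELECT id, reference, year, edition_type, 0
      FROM artifact_editions
     WHERE artifact_id = ? AND supersedes_id IS NULL
    UNION ALL
    SELECT e.id, e.reference, e.year, e.edition_type, c.depth + 1
      FROM artifact_editions AS e
      JOIN chain AS c ON e.supersedes_id = c.id
     WHERE c.depth < 50
)
SELECT reference, year, edition_type FROM chain ORDER BY depth
"""


def edition_history(conn: sqlite3.Connection, artifact_id: str) -> str:
    """Render the chain oldest-first in the arrow notation used on this page."""
    rows = conn.execute(HISTORY_SQL, (artifact_id,)).fetchall()
    return " -> ".join(f"{ref} ({year}, {etype})" for ref, year, etype in rows)


print(edition_history(conn, "P363653"))
```

Walking forward from the root gives oldest-first order directly; walking backward from the current edition would work equally well with the join reversed.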

Key Queries Enabled

  • Resolve any identifier format to a P-number (museum_no, excavation_no, publication, accession, ARK)
  • Find current edition of any artifact
  • Get full edition history with supersession chain
  • Find ambiguous identifiers (same string -> multiple P-numbers)
  • Research gaps: texts without digital editions
  • All publications by a scholar
  • All artifacts published in a given series
  • Direct bibliographic link from any token_reading/lemmatization/translation via annotation_runs.publication_id
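Three of the queries listed above (ambiguous identifiers, current edition, research gaps) can be sketched as plain SQL. The schemas are trimmed to the columns each query needs, artifacts.artifact_id as the P-number column is an assumption, and the placeholder P-numbers and identifier strings are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifacts (artifact_id TEXT PRIMARY KEY);
CREATE TABLE artifact_identifiers (artifact_id TEXT, identifier_normalized TEXT);
CREATE TABLE artifact_editions (
    artifact_id TEXT, reference_normalized TEXT,
    edition_type TEXT, is_current_edition INTEGER);
-- Invented placeholder rows, for illustration only.
INSERT INTO artifacts VALUES ('P000001'), ('P000002'), ('P000003');
INSERT INTO artifact_identifiers VALUES
    ('P000001', 'museum x 1'), ('P000002', 'museum x 1'), ('P000003', 'museum x 2');
INSERT INTO artifact_editions VALUES
    ('P000001', 'series a 1',  'full_edition',    0),
    ('P000001', 'project b 1', 'digital_edition', 1),
    ('P000002', 'series a 2',  'hand_copy',       1);
""")

# Same normalized string pointing at more than one P-number.
AMBIGUOUS_SQL = """
SELECT identifier_normalized, COUNT(DISTINCT artifact_id) AS n
  FROM artifact_identifiers
 GROUP BY identifier_normalized
HAVING COUNT(DISTINCT artifact_id) > 1
"""

# The consensus edition for one artifact.
CURRENT_SQL = """
SELECT reference_normalized, edition_type
  FROM artifact_editions
 WHERE artifact_id = ? AND is_current_edition = 1
"""

# Research gaps: artifacts with no digital_edition row at all.
GAPS_SQL = """
SELECT a.artifact_id
  FROM artifacts AS a
 WHERE NOT EXISTS (
       SELECT 1 FROM artifact_editions AS e
        WHERE e.artifact_id = a.artifact_id
          AND e.edition_type = 'digital_edition')
 ORDER BY a.artifact_id
"""

print(conn.execute(AMBIGUOUS_SQL).fetchall())
print(conn.execute(CURRENT_SQL, ("P000001",)).fetchall())
print([row[0] for row in conn.execute(GAPS_SQL)])
```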

Trust Dimensions

  • artifact_identifiers: Trust that two identifiers refer to the same physical object
  • publications: Trust in the scholarly record itself (supersession makes edition authority explicit)
  • artifact_editions: Trust in edition currency (is_current_edition is the consensus flag with full decision audit trail)
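One way to keep is_current_edition and its decision audit trail in step is to move the flag and write the decision row in a single transaction. This is a sketch under two assumptions: that an artifact has at most one current edition (enforced here with a SQLite partial unique index), and that artifact_edition_decisions carries the hypothetical columns shown. set_current_edition and the editor names are illustrative only.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE artifact_editions (
    id INTEGER PRIMARY KEY,
    artifact_id TEXT NOT NULL,
    is_current_edition INTEGER NOT NULL DEFAULT 0
);
-- Assumed invariant: at most one current edition per artifact
CREATE UNIQUE INDEX one_current_edition
    ON artifact_editions(artifact_id) WHERE is_current_edition = 1;
CREATE TABLE artifact_edition_decisions (
    id INTEGER PRIMARY KEY,
    artifact_id TEXT NOT NULL,
    edition_id INTEGER NOT NULL REFERENCES artifact_editions(id),
    previous_edition_id INTEGER,
    decided_by TEXT NOT NULL,
    rationale TEXT,
    decided_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def set_current_edition(conn: sqlite3.Connection, artifact_id: str, edition_id: int,
                        decided_by: str, rationale: str) -> None:
    """Move the consensus flag and log the decision in one transaction."""
    with conn:  # commits on success, rolls back if anything below raises
        row = conn.execute(
            "SELECT id FROM artifact_editions WHERE artifact_id = ? AND is_current_edition = 1",
            (artifact_id,)).fetchone()
        previous: Optional[int] = row[0] if row else None
        # Clear first so the partial unique index never sees two current rows.
        conn.execute("UPDATE artifact_editions SET is_current_edition = 0 WHERE artifact_id = ?",
                     (artifact_id,))
        cur = conn.execute(
            "UPDATE artifact_editions SET is_current_edition = 1 WHERE id = ? AND artifact_id = ?",
            (edition_id, artifact_id))
        if cur.rowcount != 1:
            raise ValueError(f"edition {edition_id} does not belong to {artifact_id}")
        conn.execute(
            "INSERT INTO artifact_edition_decisions "
            "(artifact_id, edition_id, previous_edition_id, decided_by, rationale) "
            "VALUES (?, ?, ?, ?, ?)",
            (artifact_id, edition_id, previous, decided_by, rationale))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO artifact_editions (id, artifact_id) VALUES (?, ?)",
                 [(1, "P363653"), (2, "P363653")])
set_current_edition(conn, "P363653", 1, "editor_a", "initial designation")
set_current_edition(conn, "P363653", 2, "editor_b", "newer edition accepted")
```

Recording previous_edition_id on each decision means the full history of who moved the flag, and from where, can be replayed from the decisions table alone.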

Seed Data Sources

  • artifacts.museum_no (~353k rows), excavation_no, primary_publication
  • CDLI API publications[] array (structured bibtex)
  • CDLI CSV publication_history (partial regex parse, raw preserved)
  • CDLI accession_no, external_id fields
  • ORACC project membership (one digital_edition per project)
  • ORACC catalogue cross-references
  • Manual curation for major series (RIME, SAA, RINAP, VAB, PBS, ATU)
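Seeding artifact_identifiers from the flat artifacts.museum_no and excavation_no columns can be sketched as one INSERT ... SELECT with a UNION ALL arm per column. The Python normalizer is registered as a SQL function so that seeding and later lookups share one rule; normalize_identifier, the artifact_id column and the placeholder rows are all assumptions for illustration.

```python
import re
import sqlite3
from typing import Optional


def normalize_identifier(value: Optional[str]) -> Optional[str]:
    """Hypothetical normalizer: trim, collapse whitespace, case-fold."""
    if value is None:
        return None
    return re.sub(r"\s+", " ", value.strip()).casefold()


conn = sqlite3.connect(":memory:")
# Expose the Python normalizer to SQL so seeding and lookups share one rule.
conn.create_function("normalize_identifier", 1, normalize_identifier)
conn.executescript("""
CREATE TABLE artifacts (
    artifact_id TEXT PRIMARY KEY, museum_no TEXT, excavation_no TEXT);
CREATE TABLE artifact_identifiers (
    artifact_id TEXT NOT NULL, identifier_type TEXT NOT NULL,
    identifier_raw TEXT NOT NULL, identifier_normalized TEXT NOT NULL);
""")
# Invented placeholder rows: note the NULL museum_no and the double space.
conn.executemany("INSERT INTO artifacts VALUES (?, ?, ?)", [
    ("P000001", "Museum X 1", "Exc  7"),
    ("P000002", None, "Exc 8"),
])

# One arm per flat column; NULL or blank values produce no identifier row.
SEED_SQL = """
INSERT INTO artifact_identifiers
       (artifact_id, identifier_type, identifier_raw, identifier_normalized)
SELECT artifact_id, 'museum_no', museum_no, normalize_identifier(museum_no)
  FROM artifacts WHERE trim(coalesce(museum_no, '')) <> ''
UNION ALL
SELECT artifact_id, 'excavation_no', excavation_no, normalize_identifier(excavation_no)
  FROM artifacts WHERE trim(coalesce(excavation_no, '')) <> ''
"""
with conn:
    conn.execute(SEED_SQL)

print(conn.execute(
    "SELECT artifact_id, identifier_type, identifier_normalized "
    "FROM artifact_identifiers ORDER BY artifact_id, identifier_type").fetchall())
```

The raw string is kept next to the normalized key, so the normalization rule can be revised later and re-applied without going back to the source rows.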

What This Does NOT Yet Cover (Future Work)

  • Citation Resolution API (REST/GraphQL endpoints)
  • Fuzzy matching with Elasticsearch/Levenshtein
  • Browser extension for auto-linking citations in web pages
  • Citation generation (Chicago, APA, BibTeX formatting)
  • Citation graph (which publications cite which texts)
  • Subscription/notification system for new editions
  • ML-based citation extraction from PDFs

These remain in the vision doc (Assyriology Opportunities.md, section 6). The schema additions provide the data-model foundation they will build on.
