Opportunities Citation Resolution

Status: Implemented into schema.

8 new tables + 1 column modification:

artifact_identifiers: N:1 alias concordance (any identifier resolves to a P-number)
artifact_identifier_evidence: standard evidence pattern for disputed identifications
artifact_identifier_decisions: standard decision pattern for contested ID assignments
publications: structured bibliographic registry with series membership and supersession chains
publication_authors: structured authorship linking publications to the scholars table
artifact_editions: artifact-publication bridge with an is_current_edition consensus flag and a per-artifact supersedes chain
artifact_edition_evidence: standard evidence pattern for publication links
artifact_edition_decisions: audit trail for the is_current_edition designation
annotation_runs: 1 new column, publication_id (nullable FK to publications, alongside the existing publication_ref freetext)

4 new migration steps added.
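As a rough illustration of how the nullable publication_id can sit beside the existing publication_ref freetext, the sketch below backfills the FK on exact matches only and falls back to the freetext when reading. It uses an in-memory SQLite database; the column short_title, the sample rows, and both function names are assumptions made for this example, not part of the actual schema or migration steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publications (
    id INTEGER PRIMARY KEY,
    short_title TEXT NOT NULL            -- hypothetical column name
);
CREATE TABLE annotation_runs (
    id INTEGER PRIMARY KEY,
    publication_ref TEXT,                -- existing freetext column
    publication_id INTEGER REFERENCES publications(id)  -- new nullable FK
);
INSERT INTO publications VALUES (1, 'RIME 4');
INSERT INTO annotation_runs VALUES
    (1, 'RIME 4', NULL),
    (2, 'unpublished notes', NULL);      -- illustrative unmatched freetext
""")

def backfill_publication_ids():
    """Fill publication_id where the freetext matches a registry entry exactly.

    Rows without a match keep publication_id = NULL.
    """
    conn.execute("""
        UPDATE annotation_runs
        SET publication_id = (
            SELECT p.id FROM publications AS p
            WHERE p.short_title = annotation_runs.publication_ref
        )
        WHERE publication_id IS NULL
    """)

def display_citation(run_id):
    """Prefer the structured publication; fall back to the raw freetext."""
    row = conn.execute("""
        SELECT COALESCE(p.short_title, r.publication_ref)
        FROM annotation_runs AS r
        LEFT JOIN publications AS p ON p.id = r.publication_id
        WHERE r.id = ?
    """, (run_id,)).fetchone()
    return row[0]

backfill_publication_ids()
```

Because publication_ref is never overwritten, the backfill can be re-run as the publications registry grows without risking the original strings.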

Schema Update Summary

Changes

8 new tables, 1 new column on annotation_runs
Table count: ~45 (pre-citation) -> ~53 (with citation resolution)
3 core tables + 2 supporting + 3 trust (evidence/decision), following existing patterns exactly

Tradeoffs

Dual resolution paths? Yes. Publication references are stored in both artifact_identifiers (fast single-index lookup) and artifact_editions (rich metadata with page numbers, edition_type, currency). The duplication is accepted for query flexibility.
Two supersession levels? Yes. publications.supersedes_id works at the series level ("RIME 4 supersedes VAB 6"); artifact_editions.supersedes_id gives per-artifact granularity (one publication may supersede another for some texts but not all).
Freetext parsing? Partial coverage initially. CDLI publication_history is inconsistent freetext. Import what regex can parse, preserve the raw string, and flag unparsed entries. No data is lost.
publication_id vs publication_ref? Both. A nullable FK is added alongside the existing freetext. Migration is incremental: publication_id stays NULL until matched, and the freetext remains as a fallback for unresolved citations.
Flat columns stay? Yes. artifacts.museum_no and excavation_no remain for display convenience. artifact_identifiers is authoritative for resolution queries.
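The freetext-parsing tradeoff (parse what a regex can handle, keep the raw string, flag the rest) could look roughly like the sketch below. The pattern is an assumption covering only citations shaped like a series abbreviation plus a dotted or comma-separated number; the real publication_history strings are more varied. ParsedCitation and parse_citation are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern: a series abbreviation starting with an uppercase letter,
# then a number with dot or comma separators, e.g. "RIME 4.3.6.1", "VAB 6, 1".
CITATION_RE = re.compile(
    r"^\s*(?P<series>[A-Z][A-Za-z]*)\s+(?P<number>\d+(?:\s*[.,]\s*\d+)*)\s*$"
)

@dataclass
class ParsedCitation:
    raw: str                 # original string, always preserved
    series: Optional[str]    # None when the regex did not match
    number: Optional[str]    # separators unified to "."
    parsed: bool             # False flags the entry for later review

def parse_citation(raw: str) -> ParsedCitation:
    """Parse one citation string; never discard the input."""
    m = CITATION_RE.match(raw)
    if m is None:
        return ParsedCitation(raw=raw, series=None, number=None, parsed=False)
    number = re.sub(r"\s*[.,]\s*", ".", m.group("number"))
    return ParsedCitation(raw=raw, series=m.group("series"), number=number, parsed=True)
```

An unmatched string comes back with parsed=False and its raw value intact, so it can be stored and revisited once the patterns improve.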

What This Enables

Two resolution paths for any citation:

Path A (fast): artifact_identifiers WHERE identifier_normalized = 'rime 4.3.6.1' -> P363653
Path B (rich): artifact_editions WHERE reference_normalized = 'rime 4.3.6.1' -> P363653 + edition_type, pages, currency
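The two paths can be sketched against an in-memory SQLite database as follows. The column names identifier_normalized, reference_normalized, edition_type, and is_current_edition come from this page; p_number, pages, the index name, and the normalize rule (lowercase, collapse whitespace) are assumptions for illustration, and the pages value is left NULL rather than invented.

```python
import sqlite3

def normalize(citation):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(citation.lower().split())

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact_identifiers (
    identifier_normalized TEXT NOT NULL,
    p_number TEXT NOT NULL               -- hypothetical column name
);
CREATE INDEX idx_identifier_normalized
    ON artifact_identifiers (identifier_normalized);

CREATE TABLE artifact_editions (
    reference_normalized TEXT NOT NULL,
    p_number TEXT NOT NULL,              -- hypothetical column name
    edition_type TEXT,
    pages TEXT,                          -- hypothetical column name
    is_current_edition INTEGER
);
INSERT INTO artifact_identifiers VALUES ('rime 4.3.6.1', 'P363653');
INSERT INTO artifact_editions VALUES
    ('rime 4.3.6.1', 'P363653', 'full_edition', NULL, 0);
""")

def resolve_fast(citation):
    """Path A: single indexed lookup, returns matching P-numbers only."""
    rows = conn.execute(
        "SELECT p_number FROM artifact_identifiers "
        "WHERE identifier_normalized = ?",
        (normalize(citation),),
    ).fetchall()
    return [r[0] for r in rows]

def resolve_rich(citation):
    """Path B: returns P-number plus edition metadata."""
    return conn.execute(
        "SELECT p_number, edition_type, pages, is_current_edition "
        "FROM artifact_editions WHERE reference_normalized = ?",
        (normalize(citation),),
    ).fetchall()
```

Both functions return lists because the same string may resolve to more than one P-number.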

Then: full edition history via the recursive supersedes_id chain:
VAB 6.1 (1907, hand_copy) -> RIME 4.3.6.1 (1990, full_edition) -> Oracc/OBMC (2010, digital_edition)
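One way to walk the supersedes_id chain is a recursive common table expression. The sketch below starts from the current edition and follows supersedes_id back to the oldest one, then returns the rows oldest first. It assumes that each edition row points at the edition it supersedes; the columns id, p_number, reference, and year are hypothetical names, and the three rows mirror the example chain above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact_editions (
    id INTEGER PRIMARY KEY,
    p_number TEXT NOT NULL,              -- hypothetical column name
    reference TEXT NOT NULL,             -- hypothetical column name
    year INTEGER,                        -- hypothetical column name
    edition_type TEXT,
    is_current_edition INTEGER NOT NULL DEFAULT 0,
    supersedes_id INTEGER REFERENCES artifact_editions(id)
);
INSERT INTO artifact_editions VALUES
    (1, 'P363653', 'VAB 6.1',      1907, 'hand_copy',       0, NULL),
    (2, 'P363653', 'RIME 4.3.6.1', 1990, 'full_edition',    0, 1),
    (3, 'P363653', 'Oracc/OBMC',   2010, 'digital_edition', 1, 2);
""")

HISTORY_SQL = """
WITH RECURSIVE chain(id, reference, year, edition_type, supersedes_id, depth) AS (
    -- Anchor: the edition currently flagged as the consensus edition.
    SELECT id, reference, year, edition_type, supersedes_id, 0
    FROM artifact_editions
    WHERE p_number = ? AND is_current_edition = 1
    UNION ALL
    -- Step: the edition that the previous row supersedes.
    SELECT e.id, e.reference, e.year, e.edition_type, e.supersedes_id, c.depth + 1
    FROM artifact_editions AS e
    JOIN chain AS c ON e.id = c.supersedes_id
)
SELECT reference, year, edition_type FROM chain ORDER BY depth DESC
"""

def edition_history(p_number):
    """Return (reference, year, edition_type) tuples, oldest edition first."""
    return conn.execute(HISTORY_SQL, (p_number,)).fetchall()
```

With UNION ALL, a cycle in supersedes_id would recurse without end, so real data would likely need a depth cap or a constraint that prevents cycles.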

Key Queries Enabled

Resolve any identifier format to a P-number (museum_no, excavation_no, publication, accession, ARK)
Find current edition of any artifact
Get full edition history with supersession chain
Find ambiguous identifiers (same string -> multiple P-numbers)
Research gaps: texts without digital editions
All publications by a scholar
All artifacts published in a given series
Direct bibliographic link from any token_reading/lemmatization/translation via annotation_runs.publication_id
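Two of the queries listed above, ambiguous identifiers and texts without digital editions, can be sketched like this. The tables are cut down to the columns the queries need; p_number as the join key and all sample rows are placeholders invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifacts (p_number TEXT PRIMARY KEY);
CREATE TABLE artifact_identifiers (identifier_normalized TEXT, p_number TEXT);
CREATE TABLE artifact_editions (p_number TEXT, edition_type TEXT);

-- Placeholder rows for illustration only.
INSERT INTO artifacts VALUES ('P000001'), ('P000002'), ('P000003');
INSERT INTO artifact_identifiers VALUES
    ('example 1', 'P000001'),
    ('example 1', 'P000002'),
    ('example 2', 'P000003');
INSERT INTO artifact_editions VALUES
    ('P000001', 'hand_copy'),
    ('P000001', 'digital_edition'),
    ('P000002', 'full_edition');
""")

# Same normalized string attached to more than one P-number.
AMBIGUOUS_SQL = """
SELECT identifier_normalized, COUNT(DISTINCT p_number) AS n
FROM artifact_identifiers
GROUP BY identifier_normalized
HAVING COUNT(DISTINCT p_number) > 1
ORDER BY identifier_normalized
"""

# Artifacts with no edition row of type 'digital_edition'.
GAPS_SQL = """
SELECT a.p_number
FROM artifacts AS a
WHERE NOT EXISTS (
    SELECT 1 FROM artifact_editions AS e
    WHERE e.p_number = a.p_number AND e.edition_type = 'digital_edition'
)
ORDER BY a.p_number
"""

def ambiguous_identifiers():
    return conn.execute(AMBIGUOUS_SQL).fetchall()

def texts_without_digital_edition():
    return [row[0] for row in conn.execute(GAPS_SQL).fetchall()]
```

The gap query counts an artifact with no edition rows at all as a gap, which matches the "texts without digital editions" wording.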

Trust Dimensions

artifact_identifiers: Trust that two identifiers refer to the same physical object
publications: Trust in the scholarly record itself (supersession makes edition authority explicit)
artifact_editions: Trust in edition currency (is_current_edition is the consensus flag with full decision audit trail)

Seed Data Sources

artifacts.museum_no (~353k rows), excavation_no, primary_publication
CDLI API publications[] array (structured bibtex)
CDLI CSV publication_history (partial regex parse, raw preserved)
CDLI accession_no, external_id fields
ORACC project membership (one digital_edition per project)
ORACC catalogue cross-references
Manual curation for major series (RIME, SAA, RINAP, VAB, PBS, ATU)

What This Does NOT Yet Cover (Future Work)

Citation Resolution API (REST/GraphQL endpoints)
Fuzzy matching with Elasticsearch/Levenshtein
Browser extension for auto-linking citations in web pages
Citation generation (Chicago, APA, BibTeX formatting)
Citation graph (which publications cite which texts)
Subscription/notification system for new editions
ML-based citation extraction from PDFs

These remain in the vision doc (Assyriology Opportunities.md, section 6). The schema additions provide the data model foundation they will build on.

Source: github.com/wittkensis/glintstone
