Opportunities Citation Resolution
- artifact_identifiers: N:1 alias concordance (any identifier resolves to a P-number)
- artifact_identifier_evidence: standard evidence pattern for disputed identifications
- artifact_identifier_decisions: standard decision pattern for contested ID assignments
- publications: structured bibliographic registry with series membership and supersession chains
- publication_authors: structured authorship linking publications to the scholars table
- artifact_editions: artifact-publication bridge with is_current_edition consensus flag and per-artifact supersedes chain
- artifact_edition_evidence: standard evidence pattern for publication links
- artifact_edition_decisions: audit trail for is_current_edition designation
- annotation_runs: 1 new column, publication_id (nullable FK to publications, alongside existing publication_ref freetext)
4 new migration steps added.
8 new tables, 1 new column on annotation_runs.
Table count: ~45 (pre-citation) -> ~53 (with citation resolution).
3 core tables + 2 supporting + 3 trust (evidence/decision), following existing patterns exactly.
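The annotation_runs change can be sketched as a small migration plus backfill. This is a minimal illustration using Python's stdlib sqlite3, not the project's actual migration: the `short_title_normalized` match key, the integer `id` columns, the sample rows, and the `backfill_publication_ids` helper are all assumptions made for the example. Only `publication_id` and `publication_ref` come from the summary above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Minimal stand-ins; the real tables have more columns.
CREATE TABLE publications (
    id INTEGER PRIMARY KEY,
    short_title_normalized TEXT UNIQUE   -- hypothetical match key
);
CREATE TABLE annotation_runs (
    id INTEGER PRIMARY KEY,
    publication_ref TEXT                 -- existing freetext, kept as-is
);
-- Made-up sample rows for illustration only.
INSERT INTO publications VALUES (1, 'rime 4');
INSERT INTO annotation_runs VALUES (1, 'RIME 4'), (2, 'unpublished hand notes');

-- Migration step: add the nullable FK next to the freetext column.
ALTER TABLE annotation_runs
    ADD COLUMN publication_id INTEGER REFERENCES publications(id);
""")


def backfill_publication_ids(conn: sqlite3.Connection) -> int:
    """Link runs whose freetext matches a publication; return rows linked."""
    cur = conn.execute("""
        UPDATE annotation_runs
        SET publication_id = (
            SELECT p.id FROM publications p
            WHERE p.short_title_normalized =
                  lower(trim(annotation_runs.publication_ref))
        )
        WHERE publication_id IS NULL
          AND EXISTS (
            SELECT 1 FROM publications p
            WHERE p.short_title_normalized =
                  lower(trim(annotation_runs.publication_ref))
          )
    """)
    conn.commit()
    return cur.rowcount


linked = backfill_publication_ids(conn)
rows = conn.execute(
    "SELECT id, publication_ref, publication_id FROM annotation_runs ORDER BY id"
).fetchall()
```

Unmatched rows keep `publication_id` as NULL and their freetext is never modified, so the backfill can be re-run as more publications are registered.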
- Dual resolution paths? Yes. Publication references are stored in both artifact_identifiers (fast single-index lookup) and artifact_editions (rich metadata with page numbers, edition_type, currency). The duplication is accepted for query flexibility.
- Two supersession levels? Yes. publications.supersedes_id is series-level ("RIME 4 supersedes VAB 6"); artifact_editions.supersedes_id gives per-artifact granularity (one publication may supersede another for some texts but not all).
- Freetext parsing? Partial coverage initially. CDLI publication_history is inconsistent freetext. Import what regex can parse, preserve the raw string, and flag unparsed entries. No data loss.
- publication_id vs publication_ref? Both. A nullable FK is added alongside the existing freetext. Incremental migration: NULL until matched; freetext stays as the fallback for unresolved citations.
- Flat columns stay? Yes. artifacts.museum_no and excavation_no remain for display convenience. artifact_identifiers is authoritative for resolution queries.
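The freetext parsing decision (parse what regex can handle, preserve the raw string, flag the rest) could look roughly like the sketch below. The pattern, the `ParsedCitation` record, the helper names, and the `;` separator are assumptions for illustration; the real CDLI publication_history format is less regular and would need a larger rule set.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern: a series abbreviation, a volume number, then optional
# dotted or comma-separated item numbers, e.g. "RIME 4.3.6.1" or "VAB 6, 1".
CITATION_RE = re.compile(
    r"^\s*(?P<series>[A-Z][A-Za-z]*)\s+(?P<volume>\d+)(?P<rest>(?:[.,]\s*\d+)*)\s*$"
)


@dataclass
class ParsedCitation:
    raw: str                             # original string, always preserved
    series: Optional[str]
    reference_normalized: Optional[str]
    parsed: bool                         # False flags the entry for review


def parse_citation(raw: str) -> ParsedCitation:
    """Parse one citation; on failure keep the raw text and flag it."""
    m = CITATION_RE.match(raw)
    if m is None:
        return ParsedCitation(raw, None, None, False)
    parts = [m.group("volume")] + re.findall(r"\d+", m.group("rest"))
    normalized = f"{m.group('series').lower()} {'.'.join(parts)}"
    return ParsedCitation(raw, m.group("series"), normalized, True)


def split_history(publication_history: str) -> List[ParsedCitation]:
    """Split a freetext history on ';' (assumed separator) and parse each part."""
    return [
        parse_citation(part.strip())
        for part in publication_history.split(";")
        if part.strip()
    ]
```

For example, `split_history("VAB 6, 1; RIME 4.3.6.1; see also notes")` yields two parsed entries and one flagged entry whose `raw` field still holds the original text, so nothing is lost on import.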
Two resolution paths for any citation:
Path A (fast): artifact_identifiers WHERE identifier_normalized = 'rime 4.3.6.1' -> P363653
Path B (rich): artifact_editions WHERE reference_normalized = 'rime 4.3.6.1' -> P363653 + edition_type, pages, currency
Then: full edition history via recursive supersedes_id chain:
VAB 6.1 (1907, hand_copy) -> RIME 4.3.6.1 (1990, full_edition) -> Oracc/OBMC (2010, digital_edition)
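The two paths and the supersession walk can be sketched with stdlib sqlite3 and a recursive CTE. The column names `p_number`, `id`, and `year`, the function names, and the choice to mark the 2010 digital edition as current are assumptions for this example; the identifiers and edition chain are the ones quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact_identifiers (
    identifier_normalized TEXT NOT NULL,
    p_number TEXT NOT NULL               -- assumed column name
);
CREATE INDEX idx_identifier ON artifact_identifiers (identifier_normalized);

CREATE TABLE artifact_editions (
    id INTEGER PRIMARY KEY,
    p_number TEXT NOT NULL,
    reference_normalized TEXT NOT NULL,
    edition_type TEXT NOT NULL,
    year INTEGER,
    is_current_edition INTEGER NOT NULL DEFAULT 0,
    supersedes_id INTEGER REFERENCES artifact_editions(id)
);

INSERT INTO artifact_identifiers VALUES
    ('vab 6.1', 'P363653'), ('rime 4.3.6.1', 'P363653');
INSERT INTO artifact_editions VALUES
    (1, 'P363653', 'vab 6.1',      'hand_copy',       1907, 0, NULL),
    (2, 'P363653', 'rime 4.3.6.1', 'full_edition',    1990, 0, 1),
    (3, 'P363653', 'oracc/obmc',   'digital_edition', 2010, 1, 2);
""")


def resolve_fast(conn, identifier):
    """Path A: single indexed lookup, returns matching P-numbers."""
    rows = conn.execute(
        "SELECT DISTINCT p_number FROM artifact_identifiers "
        "WHERE identifier_normalized = ?", (identifier,)).fetchall()
    return [r[0] for r in rows]


def resolve_rich(conn, reference):
    """Path B: same lookup against editions, with edition metadata."""
    return conn.execute(
        "SELECT p_number, edition_type, year, is_current_edition "
        "FROM artifact_editions WHERE reference_normalized = ?",
        (reference,)).fetchall()


def edition_history(conn, p_number):
    """Walk supersedes_id back from the current edition; oldest first."""
    return conn.execute("""
        WITH RECURSIVE chain(id, ref, edition_type, year, supersedes_id, depth) AS (
            SELECT id, reference_normalized, edition_type, year, supersedes_id, 0
            FROM artifact_editions
            WHERE p_number = ? AND is_current_edition = 1
            UNION ALL
            SELECT e.id, e.reference_normalized, e.edition_type, e.year,
                   e.supersedes_id, chain.depth + 1
            FROM artifact_editions e
            JOIN chain ON e.id = chain.supersedes_id
            WHERE chain.depth < 100      -- guard against accidental cycles
        )
        SELECT ref, edition_type, year FROM chain ORDER BY depth DESC
    """, (p_number,)).fetchall()
```

Path A returns only the P-number, while Path B returns the edition row as well; `edition_history(conn, 'P363653')` reproduces the VAB -> RIME -> Oracc chain in chronological order.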
- Resolve any identifier format to a P-number (museum_no, excavation_no, publication, accession, ARK)
- Find current edition of any artifact
- Get full edition history with supersession chain
- Find ambiguous identifiers (same string -> multiple P-numbers)
- Research gaps: texts without digital editions
- All publications by a scholar
- All artifacts published in a given series
- Direct bibliographic link from any token_reading/lemmatization/translation via annotation_runs.publication_id
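Two of the queries in this list, ambiguous identifiers and research gaps, might be expressed as in the sketch below. The `p_number` column name, the function names, and the sample rows (including the placeholder P-numbers P000001 and P000002) are assumptions for illustration; the gap query also only sees artifacts that have at least one row in artifact_editions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact_identifiers (identifier_normalized TEXT, p_number TEXT);
CREATE TABLE artifact_editions (p_number TEXT, edition_type TEXT);

-- Made-up sample rows: 'bm 12345' deliberately maps to two artifacts.
INSERT INTO artifact_identifiers VALUES
    ('bm 12345', 'P000001'),
    ('bm 12345', 'P000002'),
    ('rime 4.3.6.1', 'P363653');
INSERT INTO artifact_editions VALUES
    ('P000001', 'hand_copy'),
    ('P363653', 'full_edition'),
    ('P363653', 'digital_edition');
""")


def ambiguous_identifiers(conn):
    """Identifiers whose normalized string resolves to more than one P-number."""
    return conn.execute("""
        SELECT identifier_normalized, COUNT(DISTINCT p_number) AS n
        FROM artifact_identifiers
        GROUP BY identifier_normalized
        HAVING COUNT(DISTINCT p_number) > 1
        ORDER BY identifier_normalized
    """).fetchall()


def without_digital_edition(conn):
    """Artifacts that have editions on record but no digital_edition."""
    rows = conn.execute("""
        SELECT DISTINCT p_number
        FROM artifact_editions
        WHERE p_number NOT IN (
            SELECT p_number FROM artifact_editions
            WHERE edition_type = 'digital_edition'
        )
        ORDER BY p_number
    """).fetchall()
    return [r[0] for r in rows]
```

With the sample rows, `ambiguous_identifiers(conn)` reports `bm 12345` with two candidates, and `without_digital_edition(conn)` lists only the artifact that has a hand copy but no digital edition.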
- artifact_identifiers: Trust that two identifiers refer to the same physical object
- publications: Trust in the scholarly record itself (supersession makes edition authority explicit)
- artifact_editions: Trust in edition currency (is_current_edition is the consensus flag with full decision audit trail)
- artifacts.museum_no (~353k rows), excavation_no, primary_publication
- CDLI API publications[] array (structured bibtex)
- CDLI CSV publication_history (partial regex parse, raw preserved)
- CDLI accession_no, external_id fields
- ORACC project membership (one digital_edition per project)
- ORACC catalogue cross-references
- Manual curation for major series (RIME, SAA, RINAP, VAB, PBS, ATU)
- Citation Resolution API (REST/GraphQL endpoints)
- Fuzzy matching with Elasticsearch/Levenshtein
- Browser extension for auto-linking citations in web pages
- Citation generation (Chicago, APA, BibTeX formatting)
- Citation graph (which publications cite which texts)
- Subscription/notification system for new editions
- ML-based citation extraction from PDFs
These remain in the vision doc (Assyriology Opportunities.md, section 6). The schema additions provide the data model foundation they will build on.
Source: github.com/wittkensis/glintstone