Opportunities Knowledge Graph
This change adds 11 tables to the schema, grouped into three interconnected systems:
Named Entity Registry (5 tables): named_entities stores the graph nodes (persons, deities, places, institutions) extracted from ORACC lemmatization POS tags (PN, DN, GN, TN, etc.). entity_aliases records deduplication merges. entity_mentions links tokens or lines to named entities with full provenance. entity_mention_evidence and entity_mention_decisions provide the trust infrastructure. ORACC glossary OIDs serve as the primary deduplication key across projects.
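The dual token/line anchor on entity_mentions can be enforced by the database itself. Below is a minimal sketch using Python's built-in sqlite3 module; the production database engine is not stated on this page, and every column other than token_id and line_id (entity_id, oracc_oid, name, entity_type, text_id) is a hypothetical name chosen for illustration. It also assumes each mention anchors on exactly one of the two columns, which is one reading of "token OR line".

```python
import sqlite3

# Hypothetical minimal DDL; only the table names and token_id/line_id come from the wiki.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE lines  (line_id  INTEGER PRIMARY KEY, text_id TEXT NOT NULL);
CREATE TABLE tokens (token_id INTEGER PRIMARY KEY,
                     line_id  INTEGER NOT NULL REFERENCES lines(line_id));
CREATE TABLE named_entities (
    entity_id   INTEGER PRIMARY KEY,
    oracc_oid   TEXT UNIQUE,              -- primary dedup key when present
    name        TEXT NOT NULL,
    entity_type TEXT NOT NULL             -- ORACC POS tag, e.g. 'PN', 'DN', 'GN', 'TN'
);
CREATE TABLE entity_mentions (
    mention_id INTEGER PRIMARY KEY,
    entity_id  INTEGER NOT NULL REFERENCES named_entities(entity_id),
    token_id   INTEGER REFERENCES tokens(token_id),  -- tokenized texts
    line_id    INTEGER REFERENCES lines(line_id),    -- ATF-only texts
    -- Assumption: exactly one anchor per mention (XOR of the two nullable FKs).
    CHECK ((token_id IS NULL) <> (line_id IS NULL))
);
""")

conn.execute("INSERT INTO lines VALUES (1, 'P000001'), (2, 'P000002')")  # placeholder text IDs
conn.execute("INSERT INTO tokens VALUES (10, 1)")
conn.execute("INSERT INTO named_entities VALUES (1, 'o0000001', 'Enlil', 'DN')")  # placeholder OID

conn.execute("INSERT INTO entity_mentions (entity_id, token_id) VALUES (1, 10)")  # tokenized text
conn.execute("INSERT INTO entity_mentions (entity_id, line_id) VALUES (1, 2)")    # ATF-only text


def rejected(sql):
    """Return True if SQLite refuses the statement with a constraint error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


both_anchors = rejected(
    "INSERT INTO entity_mentions (entity_id, token_id, line_id) VALUES (1, 10, 1)")
no_anchor = rejected("INSERT INTO entity_mentions (entity_id) VALUES (1)")
print(both_anchors, no_anchor)  # True True
```

The CHECK compares two booleans, so a mention with both anchors or with neither is refused, while the nullable foreign keys still validate whichever anchor is present.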
Entity Relationships (4 tables): relationship_predicates is a controlled vocabulary of ~13 stable Tier 1 predicates (father_of, patron_deity_of, located_in, etc.). entity_relationships stores the graph edges with hybrid predicate governance: Tier 1 predicates are referenced through an enum FK, while volatile Tier 2-3 predicates (defeated, vassal_of, trade_partner) are stored as free text. Temporal scope is anchored primarily on canonical period names (~50 values), with optional BCE dates as a secondary anchor (available for ~5% of texts). entity_relationship_evidence and entity_relationship_decisions follow the standard trust pattern.
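The XOR rule behind hybrid predicate governance can be expressed as a CHECK constraint alongside an ordinary foreign key. This is a minimal sqlite3 sketch, not the project's DDL: predicate_enum and predicate_custom come from the decision table on this page, while subject_id, object_id, period_name, start_bce and end_bce are assumed names, 'Old Babylonian' stands in for a canonical period name, and the foreign keys from subject/object to named_entities are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE relationship_predicates (predicate TEXT PRIMARY KEY);  -- Tier 1 vocabulary
CREATE TABLE entity_relationships (
    relationship_id  INTEGER PRIMARY KEY,
    subject_id       INTEGER NOT NULL,
    object_id        INTEGER NOT NULL,
    predicate_enum   TEXT REFERENCES relationship_predicates(predicate),  -- Tier 1
    predicate_custom TEXT,                                                -- Tier 2-3
    period_name      TEXT NOT NULL,  -- primary temporal anchor (canonical period)
    start_bce        INTEGER,        -- optional absolute dates (secondary anchor)
    end_bce          INTEGER,
    -- Exactly one of the two predicate columns must be set.
    CHECK ((predicate_enum IS NULL) <> (predicate_custom IS NULL))
);
""")
conn.executemany("INSERT INTO relationship_predicates VALUES (?)",
                 [("father_of",), ("patron_deity_of",), ("located_in",)])


def add_edge(enum, custom):
    """Insert an illustrative edge between entities 1 and 2 and report the outcome."""
    try:
        conn.execute(
            "INSERT INTO entity_relationships "
            "(subject_id, object_id, predicate_enum, predicate_custom, period_name) "
            "VALUES (1, 2, ?, ?, 'Old Babylonian')",
            (enum, custom),
        )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    add_edge("father_of", None),   # Tier 1: controlled vocabulary
    add_edge(None, "vassal_of"),   # Tier 2-3: free text
    add_edge("father_of", "x"),    # both columns set: CHECK fails
    add_edge(None, None),          # neither set: CHECK fails
    add_edge("fathr_of", None),    # typo outside the vocabulary: FK fails
]
print(results)  # ['ok', 'ok', 'rejected', 'rejected', 'rejected']
```

The foreign key is what gives Tier 1 its consistency: a misspelled stable predicate is refused instead of silently becoming a new one, while debated predicates stay open-ended in predicate_custom.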
Authority Reconciliation (2 tables): authority_links maps entities to external identifiers (Wikidata, VIAF, Pleiades, GeoNames, PeriodO), recording a match_type (exact/broad/narrow/related/uncertain) and a confidence value. authority_reconciliation_disputes captures scholarly disagreement about a reconciliation. authority_links is seeded from the existing pleiades_id columns on the artifacts and excavation_sites tables.
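Four of the five match_type values line up closely with the SKOS mapping properties (skos:exactMatch, skos:broadMatch, skos:narrowMatch, skos:relatedMatch), which suggests one way to publish links to federated consumers. The sketch below assumes that correspondence, treats 'uncertain' as not exportable, and uses a hypothetical confidence threshold, a placeholder base URI for local entities, and assumed URI templates for only two authorities; none of this is taken from the project's code, and the external IDs are placeholders.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumption: match_type mirrors SKOS mapping properties; 'uncertain' has no
# SKOS counterpart and is therefore never exported.
MATCH_TO_SKOS = {
    "exact": SKOS + "exactMatch",
    "broad": SKOS + "broadMatch",
    "narrow": SKOS + "narrowMatch",
    "related": SKOS + "relatedMatch",
    "uncertain": None,
}

# Hypothetical URI templates (only two authorities shown) and local base URI.
AUTHORITY_URI = {
    "wikidata": "http://www.wikidata.org/entity/{}",
    "pleiades": "https://pleiades.stoa.org/places/{}",
}
ENTITY_BASE = "https://example.org/entity/"  # placeholder, not a real Glintstone URI


def to_ntriple(entity_id, authority, external_id, match_type, confidence,
               min_confidence=0.8):
    """Render one authority link as an N-Triples line, or None if not exportable."""
    if match_type not in MATCH_TO_SKOS:
        raise ValueError(f"unknown match_type: {match_type!r}")
    prop = MATCH_TO_SKOS[match_type]
    if prop is None or confidence < min_confidence:
        return None  # withhold uncertain or low-confidence links
    target = AUTHORITY_URI[authority].format(external_id)
    return f"<{ENTITY_BASE}{entity_id}> <{prop}> <{target}> ."


print(to_ntriple(42, "pleiades", "123456", "exact", 0.95))
print(to_ntriple(42, "wikidata", "Q1", "uncertain", 0.99))  # None
```

In SKOS, "A skos:broadMatch B" states that B is the broader concept, so the direction implied by 'broad' and 'narrow' in authority_links would need to follow the same convention for this mapping to hold.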
| Decision | Trade-off | Gain |
|---|---|---|
| Hybrid predicate governance | Two columns (predicate_enum, predicate_custom) with an XOR constraint | Consistency for stable predicates plus flexibility for debated relationships |
| Period names as temporal anchor | Less precise than BCE dates | 100% coverage (every artifact has a period) vs ~5% coverage for absolute dates |
| entity_mentions dual FK (token_id OR line_id) | CHECK constraint and nullable columns | Covers 100% of texts: tokenized (5.5%) and ATF-only (94.5%) |
| ORACC OID as dedup key | Depends on the stability of ORACC's IDs | No cross-project duplicates; deterministic fallback for non-ORACC data |
| Separate evidence/decision tables per entity type | 4 more tables than a polymorphic design | Real FK enforcement, consistent with the rest of the trust architecture |
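The "ORACC OID first, deterministic fallback otherwise" rule can be sketched as a pure function. The fallback recipe here (NFC-normalize, trim and lowercase the citation form, then hash it together with its source and POS tag) is an assumption; the page only says the fallback is deterministic. Keeping the source in the hash is deliberately conservative: homonymous persons from different non-ORACC sources stay separate until a scholar merges them through entity_aliases.

```python
import hashlib
import unicodedata
from typing import Optional


def dedup_key(oracc_oid: Optional[str], source: str, citation_form: str, pos: str) -> str:
    """Hypothetical entity dedup key: ORACC OID first, deterministic hash otherwise."""
    if oracc_oid:
        # Same OID in any ORACC project -> same key -> one named_entities row.
        return f"oracc:{oracc_oid}"
    # Fallback: normalize so case and whitespace differences collapse.
    norm = unicodedata.normalize("NFC", citation_form).strip().lower()
    payload = f"{source}|{norm}|{pos}".encode("utf-8")
    return "local:" + hashlib.sha256(payload).hexdigest()[:16]


# Re-imports are idempotent because the key depends only on the input values.
a = dedup_key(None, "local-import", "Enlil", "DN")
b = dedup_key(None, "local-import", "  enlil ", "DN")
c = dedup_key("o0000001", "saao", "Enlil", "DN")  # placeholder OID
print(a == b, c)  # True oracc:o0000001
```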
For the knowledge graph vision: named entities become queryable, first-class citizens. "Show all texts mentioning Enlil" becomes a single JOIN instead of a manual CDLI search, and entity relationships deliver on the "seconds instead of weeks" promise: "Who were Hammurabi's contemporaries?" becomes one query.
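Because a mention is anchored on either a token or a line, "texts mentioning Enlil" has to resolve both paths back to a text. A minimal sqlite3 sketch, assuming tokens carry a line_id and lines carry a text_id (hypothetical column names, placeholder P-numbers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lines  (line_id INTEGER PRIMARY KEY, text_id TEXT NOT NULL);
CREATE TABLE tokens (token_id INTEGER PRIMARY KEY, line_id INTEGER NOT NULL);
CREATE TABLE named_entities (entity_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE entity_mentions (
    mention_id INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL,
    token_id INTEGER, line_id INTEGER);
INSERT INTO lines VALUES (1, 'P000001'), (2, 'P000002'), (3, 'P000003');
INSERT INTO tokens VALUES (10, 1);
INSERT INTO named_entities VALUES (1, 'Enlil'), (2, 'Ninlil');
INSERT INTO entity_mentions VALUES
    (1, 1, 10, NULL),   -- Enlil in a tokenized text
    (2, 1, NULL, 2),    -- Enlil in an ATF-only text
    (3, 2, NULL, 3);    -- Ninlil, should not match
""")

TEXTS_MENTIONING = """
SELECT DISTINCT l.text_id
FROM named_entities e
JOIN entity_mentions m ON m.entity_id = e.entity_id
LEFT JOIN tokens t     ON t.token_id = m.token_id  -- token path (NULL for ATF-only)
JOIN lines l           ON l.line_id = COALESCE(m.line_id, t.line_id)
WHERE e.name = ?
ORDER BY l.text_id
"""
texts = [row[0] for row in conn.execute(TEXTS_MENTIONING, ("Enlil",))]
print(texts)  # ['P000001', 'P000002']
```

Matching on the name keeps the example short; a real lookup would more likely start from an entity_id or go through entity_aliases.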
For academic trust: entity identification (prosopography) is the most interpretive level of cuneiform scholarship. Every identification carries provenance: who asserted that a damaged name refers to Hammurabi, how they determined it, and what evidence supports it. Competing identifications coexist through the is_consensus mechanism.
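One way to let competing identifications coexist while still naming a single consensus reading is a partial unique index: any number of decision rows per mention, but at most one flagged is_consensus. The sqlite3 sketch below assumes that design; apart from is_consensus, the entity_mention_decisions columns (mention_id, entity_id, decided_by, method) and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_mention_decisions (
    decision_id  INTEGER PRIMARY KEY,
    mention_id   INTEGER NOT NULL,
    entity_id    INTEGER NOT NULL,   -- proposed identification
    decided_by   TEXT NOT NULL,      -- who
    method       TEXT NOT NULL,      -- how
    is_consensus INTEGER NOT NULL DEFAULT 0 CHECK (is_consensus IN (0, 1))
);
-- Competing rows may coexist, but at most one per mention is the consensus.
CREATE UNIQUE INDEX one_consensus_per_mention
    ON entity_mention_decisions(mention_id) WHERE is_consensus = 1;
""")
conn.executemany(
    "INSERT INTO entity_mention_decisions "
    "(mention_id, entity_id, decided_by, method, is_consensus) VALUES (?, ?, ?, ?, ?)",
    [
        (7, 101, "scholar_a", "sign traces + archive context", 1),  # candidate A
        (7, 102, "scholar_b", "letter formula parallels", 0),       # candidate B
    ],
)

try:
    conn.execute("UPDATE entity_mention_decisions SET is_consensus = 1 "
                 "WHERE decided_by = 'scholar_b'")
    second_consensus_allowed = True
except sqlite3.IntegrityError:
    second_consensus_allowed = False

consensus = conn.execute(
    "SELECT entity_id, decided_by FROM entity_mention_decisions "
    "WHERE mention_id = 7 AND is_consensus = 1").fetchall()
competing = conn.execute(
    "SELECT COUNT(*) FROM entity_mention_decisions WHERE mention_id = 7").fetchone()[0]
print(second_consensus_allowed, consensus, competing)  # False [(101, 'scholar_a')] 2
```

Under this design, moving the consensus to another reading means clearing the old flag and setting the new one inside a single transaction.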
For federation: authority links bridge to Wikidata, VIAF, Pleiades, and other external knowledge systems. Reconciliation disputes enforce scholarly rigor at that boundary, so wrong links don't silently pollute federated queries.
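At query time, keeping wrong links out of federated results can come down to a NOT EXISTS filter over the disputes table. This sqlite3 sketch assumes dispute rows carry a status of 'open', 'link_upheld' or 'link_rejected', and that uncertain matches are withheld; the status vocabulary, the column names other than match_type and confidence, and the sample identifiers are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE authority_links (
    link_id     INTEGER PRIMARY KEY,
    entity_id   INTEGER NOT NULL,
    authority   TEXT NOT NULL,
    external_id TEXT NOT NULL,
    match_type  TEXT NOT NULL
        CHECK (match_type IN ('exact', 'broad', 'narrow', 'related', 'uncertain')),
    confidence  REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1)
);
CREATE TABLE authority_reconciliation_disputes (
    dispute_id INTEGER PRIMARY KEY,
    link_id    INTEGER NOT NULL REFERENCES authority_links(link_id),
    status     TEXT NOT NULL CHECK (status IN ('open', 'link_upheld', 'link_rejected'))
);
""")
conn.executemany("INSERT INTO authority_links VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, "pleiades", "111", "exact", 0.95),     # undisputed
    (2, 11, "wikidata", "Q222", "exact", 0.90),    # dispute still open
    (3, 12, "wikidata", "Q333", "uncertain", 0.40),
    (4, 13, "viaf", "444", "broad", 0.85),         # dispute settled in its favor
    (5, 14, "geonames", "555", "exact", 0.90),     # dispute settled against it
])
conn.executemany("INSERT INTO authority_reconciliation_disputes VALUES (?, ?, ?)", [
    (1, 2, "open"), (2, 4, "link_upheld"), (3, 5, "link_rejected"),
])

FEDERATABLE = """
SELECT l.link_id
FROM authority_links l
WHERE l.match_type <> 'uncertain'
  AND NOT EXISTS (
      SELECT 1 FROM authority_reconciliation_disputes d
      WHERE d.link_id = l.link_id AND d.status IN ('open', 'link_rejected'))
ORDER BY l.link_id
"""
federatable = [row[0] for row in conn.execute(FEDERATABLE)]
print(federatable)  # [1, 4]
```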
Deferred to future work (outside this schema change):
- Social network analysis views (e.g. the Mari letters correspondence graph): future SQL views on entity_relationships
- SPARQL/GraphQL query interface: future application layer, not a schema concern
- Visualization (geographic, timeline, concept evolution): future application layer
- Inference rules (e.g. temporal overlap -> contemporary_of): future computed relationships recorded via annotation_runs with source_type='algorithm'
- Integration with the Wikidata query service: enabled by authority_links, but needs application code
- Scale target of 1M+ entities / 10M+ edges: current data is ~1,836 entities and grows with each ORACC project import
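The deferred temporal-overlap rule could start as a small batch job whose output is stored as algorithm-sourced relationships. The sketch below is one hypothetical version: it compares BCE ranges when both entities have them and otherwise falls back to shared canonical period names, mirroring the primary/secondary temporal anchors of entity_relationships. The Entity shape and the proposal dictionary are assumptions, and the reign dates are approximate middle-chronology figures used only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Entity:
    entity_id: int
    name: str
    periods: frozenset           # canonical period names (primary anchor)
    bce: Optional[tuple] = None  # (start, end) in years BCE, start >= end (secondary)


def overlaps_bce(a: tuple, b: tuple) -> bool:
    """BCE years count down, so [s1, e1] and [s2, e2] overlap iff s1 >= e2 and s2 >= e1."""
    return a[0] >= b[1] and b[0] >= a[1]


def infer_contemporaries(entities):
    """Propose contemporary_of edges; callers would store them as algorithm output."""
    proposals = []
    for a, b in combinations(entities, 2):
        if a.bce and b.bce:
            overlap = overlaps_bce(a.bce, b.bce)   # precise, ~5% coverage
        else:
            overlap = bool(a.periods & b.periods)  # coarse, full coverage
        if overlap:
            proposals.append({
                "subject_id": a.entity_id,
                "object_id": b.entity_id,
                "predicate": "contemporary_of",
                "source_type": "algorithm",  # as on annotation_runs
            })
    return proposals


people = [
    Entity(1, "Hammurabi", frozenset({"Old Babylonian"}), (1792, 1750)),
    Entity(2, "Zimri-Lim", frozenset({"Old Babylonian"}), (1775, 1761)),
    Entity(3, "Ur-Nammu", frozenset({"Ur III"})),
]
edges = infer_contemporaries(people)
print([(e["subject_id"], e["object_id"]) for e in edges])  # [(1, 2)]
```

The period fallback is deliberately coarse, since one period can span centuries, so edges like these would presumably pass through the same evidence and decision review as scholarly ones before being treated as consensus.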
Population plan:
- Seed (automated): glossary_entries with an entity POS -> named_entities (~1,836); lemmas with an entity POS -> entity_mentions (~4,995)
- Enrich (import): additional ORACC projects (etcsri, hbtin, rinap, saao, etc.) -> new entity_mentions via annotation_runs
- Authority (semi-automated): pleiades_id on artifacts/excavation_sites -> authority_links; manual Wikidata/VIAF reconciliation later
- Relationships (scholarly): manual input via a future UI, or structured imports from prosopographic databases
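The automated Seed step maps naturally onto two INSERT ... SELECT statements. This sqlite3 sketch assumes glossary_entries exposes oid, cf (citation form) and pos, and that lemmas carry the OID of their glossary entry plus a token_id; those column names, the sample rows and the UNIQUE constraints used to make reruns idempotent are all assumptions. ENTITY_POS lists only the tags named on this page, whose own list ends in "etc.".

```python
import sqlite3

# Only the POS tags named on this page; the real list is longer ("etc.").
ENTITY_POS = ("PN", "DN", "GN", "TN")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE glossary_entries (oid TEXT PRIMARY KEY, cf TEXT NOT NULL, pos TEXT NOT NULL);
CREATE TABLE lemmas (lemma_id INTEGER PRIMARY KEY, token_id INTEGER NOT NULL,
                     oid TEXT NOT NULL, pos TEXT NOT NULL);
CREATE TABLE named_entities (entity_id INTEGER PRIMARY KEY, oracc_oid TEXT UNIQUE,
                             name TEXT NOT NULL, entity_type TEXT NOT NULL);
CREATE TABLE entity_mentions (mention_id INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL,
                              token_id INTEGER NOT NULL, UNIQUE (entity_id, token_id));
""")
conn.executemany("INSERT INTO glossary_entries VALUES (?, ?, ?)", [
    ("o1", "Enlil", "DN"), ("o2", "Nippur", "GN"), ("o3", "šarru", "N"),  # placeholder OIDs
])
conn.executemany("INSERT INTO lemmas VALUES (?, ?, ?, ?)", [
    (1, 100, "o1", "DN"), (2, 101, "o2", "GN"), (3, 102, "o3", "N"), (4, 103, "o1", "DN"),
])


def seed(conn):
    marks = ",".join("?" * len(ENTITY_POS))
    # 1. One node per glossary entry with an entity POS (OID is the dedup key).
    conn.execute(f"""
        INSERT OR IGNORE INTO named_entities (oracc_oid, name, entity_type)
        SELECT oid, cf, pos FROM glossary_entries WHERE pos IN ({marks})""", ENTITY_POS)
    # 2. One token-anchored mention per entity-tagged lemma.
    conn.execute(f"""
        INSERT OR IGNORE INTO entity_mentions (entity_id, token_id)
        SELECT e.entity_id, l.token_id
        FROM lemmas l JOIN named_entities e ON e.oracc_oid = l.oid
        WHERE l.pos IN ({marks})""", ENTITY_POS)


seed(conn)
seed(conn)  # rerun is a no-op thanks to the UNIQUE constraints
counts = [conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("named_entities", "entity_mentions")]
print(counts)  # [2, 3]
```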
Source: github.com/wittkensis/glintstone