Entity dataset for movies, books, and related content (characters, locations, items, people, organizations).
npm install
npm run pipeline -- titles.json
This runs the full pipeline:
- Extract - LLM extracts entities (characters, locations, etc.) from each title
- Generate - LLM generates detailed entity JSON files
- Validate - Checks for issues (missing targets, duplicates, etc.)
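The Validate step can be pictured with a minimal sketch like the one below. It is not the project's actual validator (that lives in scripts/ and may work differently). The function name `validateEntities` and the relationship shape `{ type, target }`, with `target` holding another entity's id, are assumptions made for illustration.

```javascript
// Hypothetical sketch of the two checks named above: duplicates and missing targets.
// Assumes each relationship looks like { type, target }, where target is an entity id.
function validateEntities(entities) {
  const issues = [];
  const ids = new Set(entities.map((e) => e.id));
  const seen = new Map(); // "type:name" (lowercased) -> id of the first entity seen

  for (const entity of entities) {
    // Duplicate check: same type and same canonical name, ignoring case.
    const key = `${entity.type}:${entity.name.toLowerCase()}`;
    if (seen.has(key)) {
      issues.push({ id: entity.id, issue: "duplicate", of: seen.get(key) });
    } else {
      seen.set(key, entity.id);
    }

    // Missing-target check: every relationship must point at a known entity id.
    for (const rel of entity.relationships ?? []) {
      if (!ids.has(rel.target)) {
        issues.push({ id: entity.id, issue: "missing-target", target: rel.target });
      }
    }
  }
  return issues;
}
```

An empty result means neither kind of issue was found; the real `npm run validate` may report more categories than these two.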
entities/ # Individual entity JSON files (914 files)
prompts/ # LLM prompt templates
schemas/ # JSON schema for entities
scripts/ # Node.js scripts
batches/ # Generated batch configs (intermediate files)
titles.json # Canonical list of all titles
| Command | Description |
|---|---|
| `npm run pipeline -- <titles.json>` | Full end-to-end generation |
| `npm run extract -- <type> <title>` | Extract entities from a single title |
| `npm run generate -- <type> <name> [source]` | Generate a single entity |
| `npm run generate-batch -- <config.json>` | Generate entities from batch config |
| `npm run validate` | Validate all entities |
| `npm run build` | Build search index (see below) |
| `npm run reconcile` | Interactive relationship reconciliation |
# Full pipeline
npm run pipeline -- titles.json
# Extract only (review before generating)
npm run pipeline -- titles.json --extract-only
# Skip extraction, use existing batch files
npm run pipeline -- titles.json --skip-extract
# Process in smaller chunks (for API rate limits)
npm run pipeline -- titles.json --chunk-size=10
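As a rough illustration of what a chunk size means here, the sketch below splits a title list into groups of at most N entries so that each group can be sent to the API separately. The helper name `chunkTitles` is hypothetical, and how the pipeline actually batches requests is an assumption.

```javascript
// Hypothetical helper: split the title list into chunks of at most chunkSize entries.
function chunkTitles(titles, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < titles.length; i += chunkSize) {
    // slice() stops at the end of the array, so the last chunk may be shorter.
    chunks.push(titles.slice(i, i + chunkSize));
  }
  return chunks;
}
```

With 25 titles and a chunk size of 10, this yields three chunks of 10, 10, and 5 titles.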
- Add titles to `titles.json`: `{ "type": "movie", "name": "New Movie Title" }`
- Run the pipeline: `npm run pipeline -- titles.json`
Or generate individually:
npm run extract -- movie "New Movie" --save
npm run generate-batch -- batches/new-movie.json
npm run validate
If you need a combined index file for a search interface:
npm run build
This creates build/index.json with all entities bundled together, indexed by ID, type, and tag. The build folder is gitignored.
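The idea of indexing by ID, type, and tag can be sketched as follows. The actual layout of build/index.json is not documented here, so the key names `byId`, `byType`, and `byTag` and the function name `buildIndex` are assumptions for illustration only.

```javascript
// Hypothetical sketch of a combined index; the real build/index.json layout may differ.
function buildIndex(entities) {
  const index = { byId: {}, byType: {}, byTag: {} };
  for (const entity of entities) {
    // Full entity stored once, keyed by its id.
    index.byId[entity.id] = entity;
    // Type and tag lookups store ids only, to avoid duplicating entity bodies.
    (index.byType[entity.type] ??= []).push(entity.id);
    for (const tag of entity.tags ?? []) {
      (index.byTag[tag] ??= []).push(entity.id);
    }
  }
  return index;
}
```

A search interface could then resolve a tag to a list of ids and fetch each entity from `byId`.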
Each entity has:
- id - UUID
- type - person, character, location, item, organization, movie, book, franchise
- name - canonical name
- description - 1-2 sentence summary
- content - array of {title, body} sections
- aliases - alternative names
- relationships - links to other entities
- properties - type-specific attributes
- tags - categorization tags
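To make the field list concrete, here is an illustrative entity together with a minimal presence check. All values are invented placeholders, and the shapes of `relationships` and `properties` are assumptions; the JSON schema in schemas/ is the authoritative definition.

```javascript
// Field names taken from the list above; whether all are strictly required is an assumption.
const REQUIRED_FIELDS = [
  "id", "type", "name", "description", "content",
  "aliases", "relationships", "properties", "tags",
];

// Placeholder entity: every value here is made up for illustration.
const exampleEntity = {
  id: "00000000-0000-4000-8000-000000000000",
  type: "character",
  name: "Example Character",
  description: "A placeholder character used to illustrate the entity shape.",
  content: [{ title: "Background", body: "Longer free-form text goes here." }],
  aliases: ["The Example"],
  // Assumed relationship shape: a relation type plus the target entity's id.
  relationships: [{ type: "appears_in", target: "00000000-0000-4000-8000-000000000001" }],
  properties: { species: "human" },
  tags: ["example"],
};

// Return the names of any listed fields that are absent from the entity.
function missingFields(entity) {
  return REQUIRED_FIELDS.filter((field) => !(field in entity));
}
```

Calling `missingFields(exampleEntity)` returns an empty array, while an object with only an `id` would report the other eight fields.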
For entity generation, set:
OPENAI_API_KEY_CONTENTGEN=<api-key>
OPENAI_API_BASE_URL=<base-url> # optional
OPENAI_API_ORG=<org-id> # optional
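A script might read these variables along the lines of the sketch below. The function name `loadApiConfig` and the returned property names are hypothetical; only the three environment variable names come from this README.

```javascript
// Hypothetical sketch: read the generation settings from the environment.
// Only the key is mandatory; base URL and organization fall back to undefined.
function loadApiConfig(env = process.env) {
  const apiKey = env.OPENAI_API_KEY_CONTENTGEN;
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY_CONTENTGEN must be set for entity generation");
  }
  return {
    apiKey,
    baseURL: env.OPENAI_API_BASE_URL || undefined,
    organization: env.OPENAI_API_ORG || undefined,
  };
}
```

Failing early on a missing key avoids starting a long pipeline run that would only error on the first API call.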