Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
user1303836
left a comment
There was a problem hiding this comment.
Review
Overall
Correct architectural change for multi-guild support. Clean separation between article collections (single global) and message chunk collections (per-guild). Well-tested.
Issues
-
No migration path for existing data. The old global
message_chunkscollection at{data_dir}/message_chunks/will be orphaned. New guild-scoped collections live at{data_dir}/message_chunks/{guild_id}/. The_ensure_message_chunk_indexwill correctly rebuild from SQLite, but the old directory is never cleaned up or warned about. Consider adding a log warning or cleanup for the orphaned directory on startup. -
auto_start_ingestionbreak removal -- sequential constraint. Thebreakremoval is the correct fix (all guilds should get ingestion, not just the first). However,start_ingestion_for_guildchecksif self._ingestion_service.is_running: return, andis_runningis a single boolean. So in practice, only the first guild with pending work will actually get backfilled per startup -- subsequent guilds return early because the first backfill is still running. The testtest_auto_start_uses_all_guildspasses because the mock doesn't simulateis_runningreturning True. This may be fine for now, but worth noting that true parallel multi-guild ingestion would require per-guild backfill tracking. -
Lazy collection initialization.
_message_chunksis no longer initialized ininitialize(). Collections are lazy-created on first access via_message_chunk_collection(guild_id, create=True). This means initialization failures that used to surface at startup now surface at first use. Acceptable tradeoff but a behavior change. -
search_message_chunksreturns[]instead of raising. When the guild collection doesn't exist (create=FalsereturnsNone), it returns an empty list. The old code raisedRuntimeError("VectorStore not initialized"). The new behavior is arguably better -- searching an unindexed guild returns nothing rather than crashing. -
Merge conflict with PR #233. Both PRs modify
vector_store.pyandrepository.pyfrom the same base. Recommend merging #233 first, then rebasing this PR.
CI
Lint failure is pre-existing (7 unrelated files). Tests and type checks pass.
Verdict
Near-ready. Address the orphaned collection cleanup (even just a warning log), and rebase onto #233 after it merges.
Summary
auto_start_ingestion()break so ingestion lifecycle matches the new storage modelTesting
PYTHONPATH=/Users/user1303836/Development/intelstream-codex-lore-guild-index/src /Users/user1303836/Development/intelstream/.venv/bin/pytest tests/test_vector_store.py tests/test_services/test_message_ingestion.py tests/test_discord/test_lore.py tests/test_database.py -q/Users/user1303836/Development/intelstream/.venv/bin/ruff check src/intelstream/database/vector_store.py src/intelstream/database/repository.py src/intelstream/services/message_ingestion.py src/intelstream/discord/cogs/lore.py tests/test_vector_store.py tests/test_services/test_message_ingestion.py tests/test_discord/test_lore.py/Users/user1303836/Development/intelstream/.venv/bin/ruff format --check src/intelstream/database/vector_store.py src/intelstream/database/repository.py src/intelstream/services/message_ingestion.py src/intelstream/discord/cogs/lore.py tests/test_vector_store.py tests/test_services/test_message_ingestion.py tests/test_discord/test_lore.pyCloses #227