Closed
Description
Summary
The lore vector index is currently global, not guild-scoped. MessageChunkMeta stores guild_id, but the zvec message_chunks collection stores only vectors keyed by chunk_id, and search_message_chunks() has no guild filter.
Why this matters
/lore is intended to answer questions about one Discord server's history. With the current architecture, once the command is re-enabled, retrieval can mix chunks from different servers.
That creates two bad options:
- Search the global collection and risk cross-guild leakage.
- Post-filter after retrieval. This avoids leakage but can still miss relevant in-guild results: the ANN step was not guild-constrained, so the top-k slots may be filled by other guilds' chunks that are then discarded.
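To illustrate the second option, here is a minimal sketch that uses exact brute-force cosine search as a stand-in for the ANN step. The Chunk type and the global_topk, post_filtered and pre_filtered helpers are hypothetical and are not part of intelstream or zvec. In the toy data, guild 1's chunks occupy every top-k slot, so post-filtering returns nothing for guild 2 even though guild 2 has relevant chunks; restricting candidates to the guild before ranking returns them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical stand-in for a stored lore chunk and its metadata.
    chunk_id: str
    guild_id: int
    vector: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def global_topk(chunks: list[Chunk], query: tuple[float, ...], topk: int) -> list[Chunk]:
    # Exact search standing in for the ANN step: ranks every chunk, ignoring guild_id.
    return sorted(chunks, key=lambda c: cosine(c.vector, query), reverse=True)[:topk]


def post_filtered(chunks: list[Chunk], query: tuple[float, ...], topk: int, guild_id: int) -> list[Chunk]:
    # Option 2 above: retrieve globally, then drop other guilds' chunks.
    return [c for c in global_topk(chunks, query, topk) if c.guild_id == guild_id]


def pre_filtered(chunks: list[Chunk], query: tuple[float, ...], topk: int, guild_id: int) -> list[Chunk]:
    # Guild-constrained retrieval: restrict the candidate set before ranking.
    return global_topk([c for c in chunks if c.guild_id == guild_id], query, topk)


CHUNKS = [
    Chunk("a1", 1, (1.0, 0.0)),
    Chunk("a2", 1, (0.99, 0.1)),
    Chunk("a3", 1, (0.98, 0.2)),
    Chunk("b1", 2, (0.9, 0.3)),
    Chunk("b2", 2, (0.5, 0.5)),
]
QUERY = (1.0, 0.0)

print([c.chunk_id for c in post_filtered(CHUNKS, QUERY, 3, guild_id=2)])  # []
print([c.chunk_id for c in pre_filtered(CHUNKS, QUERY, 3, guild_id=2)])   # ['b1', 'b2']
```

Post-filtering never leaks here, but guild 2 gets an empty answer because the unconstrained top 3 were all guild 1 chunks.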
This is mostly hidden today because /lore is still disabled, which makes now the right time to fix it.
Evidence in the code
- src/intelstream/database/models.py: MessageChunkMeta persists guild_id and channel_id.
- src/intelstream/services/message_ingestion.py: lore ingestion writes all chunks into one shared message_chunks collection.
- src/intelstream/database/vector_store.py: search_message_chunks() accepts only an embedding and topk; there is no tenant/guild constraint.
- src/intelstream/discord/cogs/lore.py: message buffering and ingestion listeners are guild-agnostic and run for every guild the bot can see.
Suggested fix
- Use a separate lore collection per guild, or store searchable metadata that supports pre-filtered retrieval.
- Add tests that prove a guild can never retrieve chunks from another guild.
- Treat this as a blocker before re-enabling /lore.
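A minimal sketch of the first suggested fix, together with the kind of isolation test the second bullet asks for, assuming one collection per guild. GuildScopedLoreIndex, its add/search methods and the test name are hypothetical: an in-memory dict keyed by guild_id stands in for separate zvec collections, and none of this reflects the actual vector_store.py or zvec API.

```python
import math
from collections import defaultdict


def _cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class GuildScopedLoreIndex:
    """Hypothetical in-memory stand-in for one lore collection per guild."""

    def __init__(self) -> None:
        # guild_id -> {chunk_id: vector}; each guild gets its own "collection".
        self._collections: dict[int, dict[str, tuple[float, ...]]] = defaultdict(dict)

    def add(self, guild_id: int, chunk_id: str, vector: tuple[float, ...]) -> None:
        self._collections[guild_id][chunk_id] = vector

    def search(self, guild_id: int, query: tuple[float, ...], topk: int) -> list[str]:
        # guild_id is required and only that guild's collection is scanned,
        # so chunks from other guilds can never be returned.
        collection = self._collections.get(guild_id, {})
        scored = sorted(
            collection.items(), key=lambda item: _cosine(item[1], query), reverse=True
        )
        return [chunk_id for chunk_id, _ in scored[:topk]]


def test_guild_cannot_retrieve_other_guild_chunks() -> None:
    index = GuildScopedLoreIndex()
    index.add(1, "g1-chunk", (1.0, 0.0))
    # Identical vector stored under a different guild: the worst case for leakage.
    index.add(2, "g2-chunk", (1.0, 0.0))
    assert index.search(1, (1.0, 0.0), topk=10) == ["g1-chunk"]
    assert index.search(2, (1.0, 0.0), topk=10) == ["g2-chunk"]
    # A guild with no ingested history gets nothing rather than another guild's lore.
    assert index.search(3, (1.0, 0.0), topk=10) == []
```

Because search() cannot be called without a guild_id, the unscoped query path disappears from the interface. The metadata alternative would keep one shared collection but only helps if the vector store applies the guild filter inside the ANN search rather than afterwards; whether zvec supports that should be checked against its documentation.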