
[P1] Partition the lore vector index by guild before re-enabling /lore #227

@user1303836

Description

Summary

The lore vector index is currently global rather than guild-scoped. MessageChunkMeta stores guild_id, but the zvec message_chunks collection stores each vector with only its chunk_id, and search_message_chunks() has no guild filter.

Why this matters

/lore is intended to answer questions about a single Discord server's history. Under the current architecture, retrieval could mix chunks from different servers as soon as the command is re-enabled.

That creates two bad options:

  • Search the global collection and risk cross-guild leakage.
  • Post-filter after retrieval. This avoids leakage but can still miss relevant in-guild results: the ANN step is not guild-constrained, so chunks from other guilds can fill the top-k slots and leave few or no in-guild results after filtering.
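Both failure modes can be reproduced with a toy model. This is a hypothetical sketch: Chunk, search_global and search_post_filtered are illustrative names, not code from src/intelstream, and an exhaustive cosine ranking stands in for the zvec ANN step. With topk=3 and a query from guild 1, guild 2's closer chunks take every slot, so the unfiltered search leaks them and post-filtering leaves guild 1 with nothing.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical record: in the issue, guild_id lives in MessageChunkMeta,
    # not in the vector collection itself.
    chunk_id: str
    guild_id: int
    vector: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_global(chunks: list[Chunk], query: tuple[float, ...], topk: int) -> list[Chunk]:
    # Option 1: rank every chunk regardless of guild (stand-in for the unfiltered ANN step).
    return sorted(chunks, key=lambda c: cosine(c.vector, query), reverse=True)[:topk]


def search_post_filtered(
    chunks: list[Chunk], query: tuple[float, ...], topk: int, guild_id: int
) -> list[Chunk]:
    # Option 2: take the global top-k first, then drop other guilds' chunks.
    return [c for c in search_global(chunks, query, topk) if c.guild_id == guild_id]


# Hypothetical data: all of guild 2's chunks are closer to the query than guild 1's.
CHUNKS = [
    Chunk("g1-a", 1, (1.0, 0.5)),
    Chunk("g1-b", 1, (0.5, 1.0)),
    Chunk("g2-a", 2, (1.0, 0.0)),
    Chunk("g2-b", 2, (1.0, 0.05)),
    Chunk("g2-c", 2, (1.0, 0.1)),
]

if __name__ == "__main__":
    query = (1.0, 0.0)  # a question asked in guild 1
    # ['g2-a', 'g2-b', 'g2-c']: every result comes from the other guild
    print([c.chunk_id for c in search_global(CHUNKS, query, topk=3)])
    # []: both guild-1 chunks were crowded out before the filter ran
    print(search_post_filtered(CHUNKS, query, topk=3, guild_id=1))
```

Raising topk only hides the problem: the number of foreign chunks that can crowd out in-guild results grows with the size of every other guild's history.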

The problem is largely hidden today because /lore is still disabled, which makes now the right time to fix it.

Evidence in the code

  • src/intelstream/database/models.py: MessageChunkMeta persists guild_id and channel_id.
  • src/intelstream/services/message_ingestion.py: lore ingestion writes all chunks into one shared message_chunks collection.
  • src/intelstream/database/vector_store.py: search_message_chunks() accepts only an embedding and topk; there is no tenant/guild constraint.
  • src/intelstream/discord/cogs/lore.py: message buffering and ingestion listeners are guild-agnostic and run for every guild the bot can see.

Suggested fix

  • Use a separate lore collection per guild, or store guild_id as searchable metadata on each vector so the ANN search itself is constrained to one guild (pre-filtered retrieval).
  • Add tests that prove a guild can never retrieve chunks from another guild.
  • Treat this as a blocker before re-enabling /lore.
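A minimal sketch of the first suggested fix together with the isolation test from the second bullet. GuildPartitionedIndex is a hypothetical in-memory stand-in, not zvec's API: each guild gets its own partition (in practice, its own lore collection), and search() requires a guild_id and only ever scans that guild's partition, so no other guild's chunk can enter the candidate set. Exhaustive cosine ranking again stands in for ANN.

```python
import math


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class GuildPartitionedIndex:
    """Hypothetical stand-in for "one lore collection per guild"."""

    def __init__(self) -> None:
        # guild_id -> [(chunk_id, vector), ...]; one partition per guild.
        self._partitions: dict[int, list[tuple[str, tuple[float, ...]]]] = {}

    def add(self, guild_id: int, chunk_id: str, vector: tuple[float, ...]) -> None:
        # Ingestion must route each chunk to the partition of the guild it came from.
        self._partitions.setdefault(guild_id, []).append((chunk_id, vector))

    def search(self, guild_id: int, query: tuple[float, ...], topk: int) -> list[str]:
        # Only this guild's partition is ranked, so the top-k is filled
        # entirely from in-guild chunks.
        partition = self._partitions.get(guild_id, [])
        ranked = sorted(partition, key=lambda item: cosine(item[1], query), reverse=True)
        return [chunk_id for chunk_id, _ in ranked[:topk]]


def test_guild_never_retrieves_other_guild_chunks() -> None:
    index = GuildPartitionedIndex()
    index.add(1, "g1-a", (1.0, 0.5))
    index.add(1, "g1-b", (0.5, 1.0))
    # Guild 2's chunks match the query better: this is exactly the leak case.
    index.add(2, "g2-a", (1.0, 0.0))
    index.add(2, "g2-b", (1.0, 0.05))

    query = (1.0, 0.0)
    assert index.search(1, query, topk=2) == ["g1-a", "g1-b"]
    assert index.search(2, query, topk=2) == ["g2-a", "g2-b"]
    # A guild with no ingested history gets nothing, never a fallback to other guilds.
    assert index.search(3, query, topk=2) == []


if __name__ == "__main__":
    test_guild_never_retrieves_other_guild_chunks()
    print("isolation test passed")
```

The assertions only inspect search results, so the same test cases could be pointed at a metadata-filtered implementation if that option is chosen instead of per-guild collections.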

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
